OaVision

Image and video runtime built on Vulkan compute. Load, decode, transform, and normalise on the accelerator. Every frame arrives as an OaDeviceMatrix — the same type used by the ML stack — ready for inference or training without a copy.

GPU image transforms: 7
Video codecs: H.264 / H.265 / AV1
Tensor output format: BF16 NCHW
CPU copies on fast path: 0

Design

Vision follows the same function API rule as the ML module. Stateless transforms live in OaFnVision::* and operate on OaDeviceMatrix. Long-lived resources — decoders, encoders, image pipelines — are explicit classes. The output type is always OaDeviceMatrix in NCHW layout, so vision and ML ops compose without any adapter layer:

Pipeline.cpp

auto x = OaFnMatrix::Zeros({1, 3, 224, 224});
auto y = OaFnVision::Normalize(rt, x, mean, std);
auto z = model.Forward(y); // OaDeviceMatrix all the way through
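
To make the NCHW layout concrete, here is a minimal sketch of how a flat element offset is derived from (n, c, y, x). It assumes a dense, row-major buffer with no padding between rows or planes; the NchwShape struct and NchwOffset function are hypothetical helpers for illustration and are not part of OaVision. How OaDeviceMatrix stores its data internally is not stated above.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper, not an OaVision type: dimensions of a dense NCHW tensor.
struct NchwShape {
    std::size_t N, C, H, W;
};

// Flat element offset of (n, c, y, x) in a dense, row-major NCHW buffer.
// W varies fastest, then H, then C, then N.
inline std::size_t NchwOffset(const NchwShape& s, std::size_t n, std::size_t c,
                              std::size_t y, std::size_t x) {
    assert(n < s.N && c < s.C && y < s.H && x < s.W);
    return ((n * s.C + c) * s.H + y) * s.W + x;
}
```

For a [1, 3, 224, 224] tensor, each channel occupies one contiguous plane of 224 × 224 elements, so a per-channel operation such as normalisation reads and writes contiguous memory.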

Tutorial — Image Viewer

Full display application: load a JPEG, show it in a GPU window, switch between RGB and individual colour channels with keyboard shortcuts. Verified on NVIDIA RTX 5090 Laptop GPU. Full source: Tutorial/Vision/TutorialImageViewer.cpp.

TutorialImageViewer.cpp

// Tutorial/Vision/TutorialImageViewer.cpp
// Load and display an image with channel inspection.
// Excerpt: ViewMode and SingleChannel() are not shown here.
class OaImageViewerApp : public OaGpuiApp {
public:
    // Image path; main() replaces it with argv[1] when one is given.
    const char* Path_ = "Asset/Image/Realm1024px.jpg";

    void OnInit(OaGpui& gpui) override {
        auto& rt = *OaVkComputeEngine::GetGlobal();
        Image_ = OaUiImage::LoadFile(rt, Path_).Unwrap();
        Planes_ = OaImagePlanes::LoadFile(rt, Path_).Unwrap();

        // Keyboard shortcuts: R = full RGB, 1 = red channel, Escape = quit
        auto& input = gpui.Input();
        input.RegisterAction({.Name = "rgb", .Binding = {.Key = OuiKey::R},
                              .Callback = [this] { Mode_ = ViewMode::RGB; }});
        input.RegisterAction({.Name = "red", .Binding = {.Key = OuiKey::Num1},
                              .Callback = [this] { Mode_ = ViewMode::R; }});
        input.RegisterAction({.Name = "quit", .Binding = {.Key = OuiKey::Escape},
                              .Callback = [this] { Quit(); }});
    }

    void OnRender(Oui& oui) override {
        // Fit the image to the window width and centre it vertically,
        // keeping the aspect ratio
        OaF32 a = (OaF32)Image_.Width / (OaF32)Image_.Height;
        OaI32 dW = Gpui().Width(), dH = (OaI32)(dW / a);
        OaI32 x = 0, y = (Gpui().Height() - dH) / 2;
        oui.BeginPanel("image", {.X = x, .Y = y, .W = dW, .H = dH});
        if (Mode_ == ViewMode::RGB)
            oui.Image(Image_.BindlessIndex(), Image_.Width, Image_.Height);
        else
            oui.ImagePlanar(SingleChannel(Mode_)); // R / G / B grayscale
        oui.EndPanel();
    }

private:
    OaUiImage Image_;
    OaImagePlanes Planes_;
    ViewMode Mode_ = ViewMode::RGB;
};

int main(int argc, char** argv) {
    OaImageViewerApp app;
    if (argc > 1) app.Path_ = argv[1];
    return app.Run({.Title = "OA Image Viewer", .Width = 1280, .Height = 720}).IsOk() ? 0 : 1;
}
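
The OnRender step in the listing always fits the image to the window width, so an image that is relatively taller than the window ends up with a panel higher than the window. A minimal sketch of a letterbox calculation that handles both cases follows. FitRect and LetterboxFit are hypothetical names for illustration only and are not part of OaVision or Oui.

```cpp
#include <cassert>

// Hypothetical plain struct for illustration; not an OaVision or Oui type.
struct FitRect {
    int X, Y, W, H;
};

// Largest rectangle with the image's aspect ratio that fits inside the
// window, centred on both axes.
inline FitRect LetterboxFit(int imgW, int imgH, int winW, int winH) {
    assert(imgW > 0 && imgH > 0 && winW > 0 && winH > 0);
    // Compare aspect ratios by cross-multiplication to avoid float rounding:
    // imgW / imgH >= winW / winH  <=>  imgW * winH >= winW * imgH
    const long long lhs = 1LL * imgW * winH;
    const long long rhs = 1LL * winW * imgH;
    int w, h;
    if (lhs >= rhs) {  // image is relatively wider: fit to width
        w = winW;
        h = (int)(1LL * winW * imgH / imgW);
    } else {           // image is relatively taller: fit to height
        h = winH;
        w = (int)(1LL * winH * imgW / imgH);
    }
    return {(winW - w) / 2, (winH - h) / 2, w, h};
}
```

For a square 1024 × 1024 image in the 1280 × 720 window from main(), this yields a 720 × 720 panel at X = 280, Y = 0. The resulting fields could be passed to the panel in place of x, y, dW, and dH.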

Image Preprocessing

Load a JPEG or PNG from disk or a memory buffer and receive a normalised device tensor in a few calls. Resizing and per-channel normalisation, for example with ImageNet mean and standard deviation, run on the GPU.

Image_ingest.cpp

// File → GPU upload → optional transform → OaDeviceMatrix (BF16 NCHW)
auto img = OaJpegDecoder::DecodeFileToGpu(rt, "photo.jpg");
auto sized = OaFnVision::Resize(rt, img, 224, 224);
auto tensor = OaFnVision::Normalize(rt, sized, imagenet_mean, imagenet_std);
// tensor → [1, 3, 224, 224] BF16 — feed directly to model.Forward()
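
As a way to check GPU output against known values, the normalisation step can be written as a CPU reference. The sketch below is not the OaFnVision::Normalize implementation. It assumes interleaved RGBA8 input, drops alpha, scales each channel to [0, 1] before applying (v - mean) / std, and writes float rather than BF16. Whether the library scales to [0, 1] first is an assumption. The function name and the two constants are hypothetical; the numeric values are the commonly used ImageNet statistics.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// CPU reference: interleaved RGBA8 (HWC) -> planar float NCHW with N = 1, C = 3.
// Alpha is dropped. Each channel is scaled to [0, 1], then (v - mean) / std.
inline std::vector<float> NormalizeRgba8ToNchw(const std::vector<std::uint8_t>& rgba,
                                               std::size_t width, std::size_t height,
                                               const std::array<float, 3>& mean,
                                               const std::array<float, 3>& stddev) {
    assert(rgba.size() == width * height * 4);
    std::vector<float> out(3 * width * height);
    for (std::size_t c = 0; c < 3; ++c) {
        assert(stddev[c] != 0.0f);
        for (std::size_t y = 0; y < height; ++y) {
            for (std::size_t x = 0; x < width; ++x) {
                const float v = rgba[(y * width + x) * 4 + c] / 255.0f;
                out[(c * height + y) * width + x] = (v - mean[c]) / stddev[c];
            }
        }
    }
    return out;
}

// Commonly used ImageNet per-channel statistics, in RGB order.
constexpr std::array<float, 3> kImagenetMean = {0.485f, 0.456f, 0.406f};
constexpr std::array<float, 3> kImagenetStd = {0.229f, 0.224f, 0.225f};
```

Running the same image through this reference and through the GPU path, then comparing within a BF16-sized tolerance, is one way to validate a new device.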

Video Decode

Open a decoder session for a codec and resolution, then submit compressed access units. The decoded NV12 frame is transitioned to a shader-readable layout, sampled through a hardware YCbCr sampler (VK_KHR_sampler_ycbcr_conversion), and converted to a BF16 tensor without leaving the device. A compute-shader fallback activates automatically on devices without hardware YCbCr conversion.
Currently verified device: NVIDIA RTX 5090 Laptop GPU. Coverage for Intel, AMD, and Qualcomm devices is in progress.

Video_decode.cpp

// Hardware decode — NV12 stays device-local through to BF16
auto decoder = OaVideoDecoder::Create(rt, {
    .Codec = OaVideoCodec::H264,
    .Width = 1920,
    .Height = 1080,
});
// vkCmdDecodeVideoKHR → NV12 VkImage
// → hardware YCbCr sampler (VK_KHR_sampler_ycbcr_conversion)
// → CvtNv12YcbcrToBf16.slang
auto tensor = decoder.DecodeFrameToBf16(access_unit);
// tensor → [1, 3, 1080, 1920] BF16
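
The NV12-to-RGB step can be illustrated with a CPU reference. This is a sketch under stated assumptions, not the contents of CvtNv12YcbcrToBf16.slang: it assumes BT.709 limited-range video, tightly packed planes, and nearest-neighbour chroma (no chroma interpolation), and it outputs float in [0, 1] rather than BF16. Real streams signal their own colour matrix and range. The function name Nv12ToRgbPlanar is hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// CPU reference: NV12 (full-size Y plane, then half-size interleaved UV plane)
// -> planar RGB float in [0, 1], laid out as [3, height, width].
// Assumes BT.709 limited range; the colour matrix is an assumption.
inline std::vector<float> Nv12ToRgbPlanar(const std::vector<std::uint8_t>& nv12,
                                          std::size_t width, std::size_t height) {
    assert(width % 2 == 0 && height % 2 == 0);
    assert(nv12.size() == width * height * 3 / 2);
    const std::size_t plane = width * height;
    const std::uint8_t* yPlane = nv12.data();
    const std::uint8_t* uvPlane = nv12.data() + plane;
    std::vector<float> out(3 * plane);
    auto clamp01 = [](float v) { return std::min(1.0f, std::max(0.0f, v)); };
    for (std::size_t y = 0; y < height; ++y) {
        for (std::size_t x = 0; x < width; ++x) {
            const std::size_t i = y * width + x;
            // One (U, V) pair covers a 2 x 2 block of luma samples.
            const std::size_t uv = (y / 2) * width + (x / 2) * 2;
            const float Y = 1.164383f * (yPlane[i] - 16);
            const float U = uvPlane[uv] - 128.0f;
            const float V = uvPlane[uv + 1] - 128.0f;
            out[i] = clamp01((Y + 1.792741f * V) / 255.0f);
            out[plane + i] = clamp01((Y - 0.213249f * U - 0.532909f * V) / 255.0f);
            out[2 * plane + i] = clamp01((Y + 2.112402f * U) / 255.0f);
        }
    }
    return out;
}
```

A hardware YCbCr sampler performs the equivalent matrix and range expansion at sample time, which is why the compute-shader fallback needs to reproduce the same arithmetic.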

Transform Reference

Function                    Description
OaFnVision::Resize          Bilinear resize to target W × H
OaFnVision::Normalize       Per-channel mean/std normalisation
OaFnVision::GaussianBlur    Separable Gaussian blur, configurable kernel
OaFnVision::Crop            Rectangular crop by pixel coordinates
OaFnVision::Flip            Horizontal or vertical flip
OaFnVision::Rotate          Arbitrary-angle rotation
OaFnVision::ResizeBatch     Batch resize: N images in one dispatch
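
For reference, bilinear resizing of a single channel plane can be sketched on the CPU as below. This is not the OaFnVision::Resize implementation. The pixel-centre (half-pixel) alignment and the clamping at the image edges are assumptions, and the function name ResizeBilinearPlane is hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// CPU reference: bilinear resize of one row-major float plane.
// Assumes half-pixel centre alignment and clamp-to-edge sampling.
inline std::vector<float> ResizeBilinearPlane(const std::vector<float>& src,
                                              std::size_t srcW, std::size_t srcH,
                                              std::size_t dstW, std::size_t dstH) {
    assert(srcW > 0 && srcH > 0 && dstW > 0 && dstH > 0);
    assert(src.size() == srcW * srcH);
    std::vector<float> dst(dstW * dstH);
    const float sx = (float)srcW / (float)dstW;
    const float sy = (float)srcH / (float)dstH;
    for (std::size_t dy = 0; dy < dstH; ++dy) {
        // Map the destination pixel centre back into source coordinates.
        float fy = ((float)dy + 0.5f) * sy - 0.5f;
        fy = std::min((float)(srcH - 1), std::max(0.0f, fy));
        const std::size_t y0 = (std::size_t)fy;
        const std::size_t y1 = std::min(y0 + 1, srcH - 1);
        const float wy = fy - (float)y0;
        for (std::size_t dx = 0; dx < dstW; ++dx) {
            float fx = ((float)dx + 0.5f) * sx - 0.5f;
            fx = std::min((float)(srcW - 1), std::max(0.0f, fx));
            const std::size_t x0 = (std::size_t)fx;
            const std::size_t x1 = std::min(x0 + 1, srcW - 1);
            const float wx = fx - (float)x0;
            // Blend horizontally on the two rows, then vertically.
            const float top = src[y0 * srcW + x0] * (1.0f - wx) + src[y0 * srcW + x1] * wx;
            const float bot = src[y1 * srcW + x0] * (1.0f - wx) + src[y1 * srcW + x1] * wx;
            dst[dy * dstW + dx] = top * (1.0f - wy) + bot * wy;
        }
    }
    return dst;
}
```

Applied per channel to an NCHW tensor, this gives a baseline to compare against the GPU transform; small differences are expected if the library uses a different alignment convention.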

Codec Support

Format          Decode path                Output
H.264 / AVC     Vulkan Video hardware      NV12 → BF16 NCHW
H.265 / HEVC    Vulkan Video hardware      NV12 → BF16 NCHW
AV1             Vulkan Video hardware      NV12 → BF16 NCHW
JPEG            CPU decode → GPU upload    RGBA8 → OaDeviceMatrix
PNG             CPU decode → GPU upload    RGBA8 → OaDeviceMatrix
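
The BF16 output named in the table keeps the sign, the 8 exponent bits, and the top 7 mantissa bits of an IEEE 754 float32. A minimal sketch of the conversion with round-to-nearest-even follows, useful when comparing a downloaded tensor against a float reference on the host. The function names are hypothetical, and the rounding mode OaVision uses on the device is not stated in this document.

```cpp
#include <cstdint>
#include <cstring>

static_assert(sizeof(float) == sizeof(std::uint32_t), "float must be 32 bits");

// float32 -> bfloat16 bit pattern, rounding to nearest with ties to even.
inline std::uint16_t FloatToBf16(float f) {
    std::uint32_t bits;
    std::memcpy(&bits, &f, sizeof bits);
    if ((bits & 0x7FFFFFFFu) > 0x7F800000u) {
        // NaN: truncate and force a mantissa bit so it stays a NaN.
        return (std::uint16_t)((bits >> 16) | 0x0040u);
    }
    // Add 0x7FFF plus the lowest kept bit, so exact ties round to even.
    bits += 0x7FFFu + ((bits >> 16) & 1u);
    return (std::uint16_t)(bits >> 16);
}

// bfloat16 bit pattern -> float32; exact, since BF16 is a float32 prefix.
inline float Bf16ToFloat(std::uint16_t h) {
    const std::uint32_t bits = (std::uint32_t)h << 16;
    float f;
    std::memcpy(&f, &bits, sizeof f);
    return f;
}
```

With only 8 bits of precision (7 stored mantissa bits plus the implicit leading one), BF16 values near 1.0 are spaced 2^-7 apart, which sets a sensible tolerance for host-side comparisons.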