Vision
OaVision
Image and video runtime built on Vulkan compute. Load, decode, transform, and normalise on the accelerator. Every frame arrives as an OaDeviceMatrix — the same type used by the ML stack — ready for inference or training without a copy.
GPU image transforms: 7
Video codecs: H.264 / H.265 / AV1
Tensor output format: BF16 NCHW
CPU copies on fast path: 0
Design
Vision follows the same function API rule as the ML module. Stateless transforms live in OaFnVision::* and operate on OaDeviceMatrix. Long-lived resources — decoders, encoders, image pipelines — are explicit classes. The output type is always OaDeviceMatrix in NCHW layout, so vision and ML ops compose without any adapter layer:
Pipeline.cpp
auto x = OaFnMatrix::Zeros({1, 3, 224, 224});
auto y = OaFnVision::Normalize(rt, x, mean, std);
auto z = model.Forward(y); // OaDeviceMatrix all the way through
Tutorial — Image Viewer
Full display application: load a JPEG, show it in a GPU window, switch between RGB and individual colour channels with keyboard shortcuts. Verified on NVIDIA RTX 5090 Laptop GPU. Full source: Tutorial/Vision/TutorialImageViewer.cpp.
TutorialImageViewer.cpp
// Tutorial/Vision/TutorialImageViewer.cpp
// Load and display an image with channel inspection.
// Excerpt: ViewMode, SingleChannel(), and Path_ are not shown here.
class OaImageViewerApp : public OaGpuiApp {
public:
    void OnInit(OaGpui& gpui) override {
        auto& rt = *OaVkComputeEngine::GetGlobal();
        Image_ = OaUiImage::LoadFile(rt, "Asset/Image/Realm1024px.jpg").Unwrap();
        Planes_ = OaImagePlanes::LoadFile(rt, "Asset/Image/Realm1024px.jpg").Unwrap();

        // Keyboard shortcuts: R = full colour, 1 = red channel, Esc = quit
        auto& input = gpui.Input();
        input.RegisterAction({.Name = "rgb", .Binding = {.Key = OuiKey::R},
                              .Callback = [this] { Mode_ = ViewMode::RGB; }});
        input.RegisterAction({.Name = "red", .Binding = {.Key = OuiKey::Num1},
                              .Callback = [this] { Mode_ = ViewMode::R; }});
        input.RegisterAction({.Name = "quit", .Binding = {.Key = OuiKey::Escape},
                              .Callback = [this] { Quit(); }});
    }

    void OnRender(Oui& oui) override {
        // Letterbox: maintain aspect ratio
        OaF32 a = (OaF32)Image_.Width / (OaF32)Image_.Height;
        OaI32 dW = Gpui().Width(), dH = (OaI32)(dW / a);
        OaI32 x = 0, y = (Gpui().Height() - dH) / 2;

        oui.BeginPanel("image", {.X = x, .Y = y, .W = dW, .H = dH});
        if (Mode_ == ViewMode::RGB)
            oui.Image(Image_.BindlessIndex(), Image_.Width, Image_.Height);
        else
            oui.ImagePlanar(SingleChannel(Mode_)); // R / G / B grayscale
        oui.EndPanel();
    }

private:
    OaUiImage Image_;
    OaImagePlanes Planes_;
    ViewMode Mode_ = ViewMode::RGB;
};

int main(int argc, char** argv) {
    OaImageViewerApp app;
    if (argc > 1) app.Path_ = argv[1];
    return app.Run({.Title = "OA Image Viewer", .Width = 1280, .Height = 720}).IsOk() ? 0 : 1;
}
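The letterbox arithmetic in OnRender fits the image to the window width only, so an image that is tall relative to the window would get a panel higher than the window. A more general fit picks the smaller of the two scale factors. The sketch below is standalone C++ and does not use any OA types; `FitRect` and `LetterboxFit` are hypothetical names introduced here for illustration.

```cpp
#include <algorithm>

// Hypothetical helper type, not part of the OA API.
struct FitRect { int X, Y, W, H; };

// Scale an image to fit inside a window while keeping its aspect ratio,
// then centre it. All four dimensions are expected to be positive.
inline FitRect LetterboxFit(int imgW, int imgH, int winW, int winH) {
    // Use the smaller scale factor so neither axis overflows the window.
    const double scale = std::min(static_cast<double>(winW) / imgW,
                                  static_cast<double>(winH) / imgH);
    const int w = static_cast<int>(imgW * scale);
    const int h = static_cast<int>(imgH * scale);
    // Centre the scaled image; the leftover space becomes the bars.
    return {(winW - w) / 2, (winH - h) / 2, w, h};
}
```

The returned rectangle could be passed to a panel call in place of the hand-computed `x`, `y`, `dW`, `dH` values.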
Image Preprocessing
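Load a JPEG or PNG from disk or a memory buffer and get back a normalised device tensor in a short chain of calls. The file is decoded on the CPU and uploaded (see Codec Support); resizing and per-channel normalisation, for example with ImageNet mean and standard deviation, then run on the GPU.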
Image_ingest.cpp
// File → GPU upload → optional transform → OaDeviceMatrix (BF16 NCHW)
auto img = OaJpegDecoder::DecodeFileToGpu(rt, "photo.jpg");
auto sized = OaFnVision::Resize(rt, img, 224, 224);
auto tensor = OaFnVision::Normalize(rt, sized, imagenet_mean, imagenet_std);
// tensor → [1, 3, 224, 224] BF16; feed directly to model.Forward()
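To make the normalisation step concrete, here is a CPU reference sketch of per-channel mean/std normalisation that also converts interleaved RGBA8 pixels to planar CHW order. It is not the OA implementation: the function name `NormalizeRgba8ToNchw` is hypothetical, the output is `float` rather than BF16, and scaling pixel values to [0, 1] before subtracting the mean is an assumption (it is the usual convention when the ImageNet statistics are given as fractions such as 0.485).

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <vector>

// CPU reference, not the OA implementation (hypothetical name).
// Input:  interleaved RGBA8, width * height pixels.
// Output: planar [3, H, W] floats, each computed as (x / 255 - mean[c]) / std[c].
inline std::vector<float> NormalizeRgba8ToNchw(const std::vector<std::uint8_t>& rgba,
                                               std::size_t width, std::size_t height,
                                               const std::array<float, 3>& mean,
                                               const std::array<float, 3>& stddev) {
    const std::size_t plane = width * height;
    std::vector<float> out(3 * plane);
    for (std::size_t p = 0; p < plane; ++p) {
        for (std::size_t c = 0; c < 3; ++c) {  // alpha (index 3) is dropped
            // Assumption: scale to [0, 1] before applying mean and std.
            const float v = static_cast<float>(rgba[4 * p + c]) / 255.0f;
            out[c * plane + p] = (v - mean[c]) / stddev[c];
        }
    }
    return out;
}
```

With the commonly used ImageNet statistics, `mean` would be `{0.485f, 0.456f, 0.406f}` and `stddev` would be `{0.229f, 0.224f, 0.225f}`.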
Video Decode
Open a decoder session for a codec and resolution, then submit compressed access units. The decoded NV12 frame is transitioned to a shader-readable layout, sampled through a hardware YCbCr sampler (VK_KHR_sampler_ycbcr_conversion), and converted to a BF16 tensor without leaving the device. A compute shader fallback activates automatically on devices where hardware YCbCr conversion is unavailable.
Current verified device: NVIDIA RTX 5090 Laptop GPU. Intel, AMD, and Qualcomm coverage in progress.
Video_decode.cpp
// Hardware decode: NV12 stays device-local through to BF16
auto decoder = OaVideoDecoder::Create(rt, {
    .Codec = OaVideoCodec::H264,
    .Width = 1920,
    .Height = 1080,
});

// vkCmdDecodeVideoKHR → NV12 VkImage
//   → hardware YCbCr sampler (VK_KHR_sampler_ycbcr_conversion)
//   → CvtNv12YcbcrToBf16.slang
auto tensor = decoder.DecodeFrameToBf16(access_unit);
// tensor → [1, 3, 1080, 1920] BF16
Transform Reference
| Function | Description |
|---|---|
| OaFnVision::Resize | Bilinear resize to target W × H |
| OaFnVision::Normalize | Per-channel mean/std normalisation |
| OaFnVision::GaussianBlur | Separable Gaussian blur, configurable kernel |
| OaFnVision::Crop | Rectangular crop by pixel coordinates |
| OaFnVision::Flip | Horizontal or vertical flip |
| OaFnVision::Rotate | Arbitrary-angle rotation |
| OaFnVision::ResizeBatch | Batch resize: N images in one dispatch |
Codec Support
| Format | Decode path | Output |
|---|---|---|
| H.264 / AVC | Vulkan Video hardware | NV12 → BF16 NCHW |
| H.265 / HEVC | Vulkan Video hardware | NV12 → BF16 NCHW |
| AV1 | Vulkan Video hardware | NV12 → BF16 NCHW |
| JPEG | CPU decode → GPU upload | RGBA8 → OaDeviceMatrix |
| PNG | CPU decode → GPU upload | RGBA8 → OaDeviceMatrix |
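Every video path in the table ends in BF16. BF16 keeps the sign and 8-bit exponent of an IEEE 754 float32 and shortens the significand to 7 stored bits, so a conversion amounts to keeping the upper 16 bits with rounding. The sketch below shows a round-to-nearest-even conversion on the CPU. The function names are hypothetical, and the source does not say which rounding mode OA's shaders use.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper: float32 → BF16 bit pattern, round to nearest even.
inline std::uint16_t FloatToBf16(float value) {
    std::uint32_t bits;
    std::memcpy(&bits, &value, sizeof bits);
    if ((bits & 0x7FFFFFFFu) > 0x7F800000u) {
        // NaN: set a mantissa bit so truncation cannot turn it into infinity.
        return static_cast<std::uint16_t>((bits >> 16) | 0x0040u);
    }
    // Add 0x7FFF, plus 1 when the kept LSB is odd, so ties round to even.
    const std::uint32_t bias = 0x7FFFu + ((bits >> 16) & 1u);
    return static_cast<std::uint16_t>((bits + bias) >> 16);
}

// Hypothetical helper: BF16 bit pattern → float32 (exact, no rounding needed).
inline float Bf16ToFloat(std::uint16_t half) {
    const std::uint32_t bits = static_cast<std::uint32_t>(half) << 16;
    float value;
    std::memcpy(&value, &bits, sizeof value);
    return value;
}
```

Because only 8 significant bits survive, normalised pixel values keep roughly two to three decimal digits of precision, which is generally considered adequate for inference inputs.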