Concepts

OaDeviceMatrix & OaFnMatrix

There is one matrix type: OaDeviceMatrix — GPU-resident, used uniformly across ML, Vision, and HPC. All operations live in the OaFnMatrix stateless namespace. No separate tensor class, no implicit conversion.

Usage

Matrix.cpp

// OaDeviceMatrix: the one matrix type used everywhere
OaDeviceMatrix x = OaFnMatrix::FromBytes(pixels, OaShape2D(batch, 784));
OaDeviceMatrix w = OaFnMatrix::RandKaimingUniform(OaShape2D(128, 784), dtype);
// SetRequiresGrad marks a matrix for autograd tracking; set it before the forward pass
w.SetRequiresGrad(true);
// OaFnMatrix: stateless ops namespace, all ops dispatch to GPU compute shaders
auto h = OaFnMatrix::Relu(OaFnMatrix::Matmul(x, w)); // [batch, 128]
auto s = OaFnMatrix::Scale(h, 1.0f / 255.0f); // scalar multiply
// logits and labels are assumed to be defined elsewhere (later layers, dataset)
auto loss = OaFnMatrix::CrossEntropyLoss(logits, labels);
OaFnGrad::Backward(loss); // reverse-mode from scalar loss
// w.Grad() is now populated on GPU
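
To make the shapes in the forward pass concrete, here is a minimal CPU reference sketch in plain C++. It is not the library's implementation. It assumes row-major storage and assumes that the product of x [batch, 784] and w [128, 784] contracts over the shared 784 dimension (that is, x times the transpose of w) to give the [batch, 128] result noted in the comment. The function name ReluLinearRef is hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// CPU reference for h = relu(x * w^T) with row-major storage.
// x: [batch, in], w: [out, in], result: [batch, out].
// Assumption: the product contracts over the shared `in` dimension.
std::vector<float> ReluLinearRef(const std::vector<float>& x,
                                 const std::vector<float>& w,
                                 std::size_t batch, std::size_t in,
                                 std::size_t out) {
    std::vector<float> h(batch * out, 0.0f);
    for (std::size_t b = 0; b < batch; ++b) {
        for (std::size_t o = 0; o < out; ++o) {
            float acc = 0.0f;
            for (std::size_t k = 0; k < in; ++k) {
                acc += x[b * in + k] * w[o * in + k];
            }
            // Relu clamps negative pre-activations to zero.
            h[b * out + o] = std::max(acc, 0.0f);
        }
    }
    return h;
}
```

A reference like this can be useful for checking GPU results on small inputs, for example by comparing against values read back from the device.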

Shapes

Matrix.cpp

// OaShape2D / OaShape1D — explicit shape types
OaDeviceMatrix m2 = OaFnMatrix::Zeros(OaShape2D(32, 128)); // 2D
OaDeviceMatrix m1 = OaFnMatrix::Zeros(OaShape1D(128)); // 1D (labels)
// Reshape — returns new view, no copy
auto flat = OaFnMatrix::Reshape(emb, OaShape2D(batch, context * embed_dim));
// Size — dimension access
OaI64 rows = m2.Size(0);
OaI64 cols = m2.Size(1);
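
The "new view, no copy" behaviour of Reshape can be illustrated with a small CPU stand-in. This is a sketch only: View2D and ReshapeView are hypothetical names, the storage is a host vector rather than a GPU buffer, and the element-count check is an assumption about how an invalid reshape would be rejected.

```cpp
#include <cstdint>
#include <memory>
#include <stdexcept>
#include <vector>

// Hypothetical CPU stand-in for a 2D matrix view; not the library's type.
struct View2D {
    std::shared_ptr<std::vector<float>> storage;  // shared, never copied by ReshapeView
    std::int64_t rows;
    std::int64_t cols;

    // Dimension access: 0 gives rows, anything else gives cols.
    std::int64_t Size(int dim) const { return dim == 0 ? rows : cols; }
};

// Returns a view with a new shape over the same storage.
// Throws if the total element count would change.
View2D ReshapeView(const View2D& src, std::int64_t rows, std::int64_t cols) {
    if (rows * cols != src.rows * src.cols) {
        throw std::invalid_argument("reshape must preserve element count");
    }
    return View2D{src.storage, rows, cols};
}
```

Because both views share one allocation, a write through the reshaped view is visible through the original, which is the property that makes reshaping free of copies.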

OaFnMatrix Operations

Category        Functions
Factory         Zeros, Ones, RandKaimingUniform, RandGlorotUniform, FromBytes
Arithmetic      Scale, Add, Sub, Mul
Linear algebra  Matmul, Transpose, Reshape
Activations     Relu, Tanh, Softmax, Gelu, Silu
Loss            CrossEntropyLoss (accepts UInt8 class indices directly)
Norm            LayerNorm, RmsNorm
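
As an illustration of what a cross-entropy loss over UInt8 class indices computes, here is a CPU reference sketch. It assumes the inputs are unnormalized logits and that the loss is averaged over the batch; the document does not state either, so treat both as assumptions. CrossEntropyRef is a hypothetical name.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// CPU reference: mean cross-entropy over a batch of row-major logits.
// logits: [batch, classes], labels: [batch] class indices stored as uint8.
// Assumptions: inputs are raw logits, reduction is the batch mean.
float CrossEntropyRef(const std::vector<float>& logits,
                      const std::vector<std::uint8_t>& labels,
                      std::size_t batch, std::size_t classes) {
    float total = 0.0f;
    for (std::size_t b = 0; b < batch; ++b) {
        const float* row = logits.data() + b * classes;
        // Subtract the row maximum before exponentiating for numerical stability.
        const float max_logit = *std::max_element(row, row + classes);
        float sum_exp = 0.0f;
        for (std::size_t c = 0; c < classes; ++c) {
            sum_exp += std::exp(row[c] - max_logit);
        }
        // -log softmax(row)[label] = log(sum_exp) + max_logit - row[label]
        total += std::log(sum_exp) + max_logit - row[labels[b]];
    }
    return total / static_cast<float>(batch);
}
```

Storing labels as one byte per sample limits the sketch to 256 classes, which is the natural range of a UInt8 index.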

Memory Model

Every OaDeviceMatrix is a device-local VkBuffer registered in the bindless descriptor heap. The CPU reads values via .At() or .DataAs<T>() after a sync. The GPU addresses each buffer by its bindless index, so no per-dispatch descriptor writes are needed.
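
The idea behind bindless dispatch can be sketched without any Vulkan calls. The following is a conceptual CPU model only: BufferHandle, BindlessHeap, and DispatchArgs are hypothetical names, and the counter simply stands in for descriptor updates. It shows that each buffer costs one descriptor write at registration time, after which a dispatch carries only small integer indices.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a device buffer handle.
struct BufferHandle {
    std::uint64_t id;
};

// Conceptual model of a bindless heap: one slot per registered buffer.
class BindlessHeap {
public:
    // Registers a buffer once and returns its stable slot index.
    std::uint32_t Register(BufferHandle buffer) {
        slots_.push_back(buffer);
        ++descriptor_writes_;  // the only descriptor write this buffer ever needs
        return static_cast<std::uint32_t>(slots_.size() - 1);
    }

    BufferHandle At(std::uint32_t index) const { return slots_.at(index); }
    std::uint32_t DescriptorWrites() const { return descriptor_writes_; }

private:
    std::vector<BufferHandle> slots_;
    std::uint32_t descriptor_writes_ = 0;
};

// Per-dispatch payload: indices only, small enough to pass as push constants.
struct DispatchArgs {
    std::uint32_t input;
    std::uint32_t weight;
    std::uint32_t output;
};
```

In this model, issuing many dispatches over the same three buffers leaves the write counter at three, since each dispatch only fills in a DispatchArgs value.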