Concepts
OaDeviceMatrix & OaFnMatrix
There is one matrix type: OaDeviceMatrix — GPU-resident, used uniformly across ML, Vision, and HPC. All operations live in the OaFnMatrix stateless namespace. No separate tensor class, no implicit conversion.
Usage
Matrix.cpp
    // OaDeviceMatrix: the one matrix type used everywhere
    OaDeviceMatrix x = OaFnMatrix::FromBytes(pixels, OaShape2D(batch, 784));
    OaDeviceMatrix w = OaFnMatrix::RandKaimingUniform(OaShape2D(128, 784), dtype);

    // OaFnMatrix: stateless ops namespace, all dispatch to GPU compute shaders
    auto h = OaFnMatrix::Relu(OaFnMatrix::Matmul(x, w)); // [batch, 128]
    auto s = OaFnMatrix::Scale(h, 1.0f / 255.0f);        // scalar multiply
    auto loss = OaFnMatrix::CrossEntropyLoss(logits, labels);

    // SetRequiresGrad marks a matrix for autograd tracking
    w.SetRequiresGrad(true);
    OaFnGrad::Backward(loss); // reverse-mode from scalar loss
    // w.Grad() is now populated on GPU
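The usage snippet multiplies a [batch, 784] input by a [128, 784] weight and annotates the result as [batch, 128]. One reading consistent with those shapes is that the product contracts over the shared 784 dimension, that is, x times the transpose of w. The source does not state this, so the CPU reference below is only a sketch under that assumption. `ReluLinearRef` is a hypothetical helper name, not part of OaFnMatrix, and it could serve as a host-side check against GPU results.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical CPU reference, not part of the library.
// Assumed layout: x is row-major [batch, in], w is row-major [out, in].
// Returns row-major [batch, out] holding max(0, x * w^T).
std::vector<float> ReluLinearRef(const std::vector<float>& x,
                                 const std::vector<float>& w,
                                 std::size_t batch,
                                 std::size_t in,
                                 std::size_t out) {
    assert(x.size() == batch * in);
    assert(w.size() == out * in);
    std::vector<float> h(batch * out, 0.0f);
    for (std::size_t b = 0; b < batch; ++b) {
        for (std::size_t o = 0; o < out; ++o) {
            float acc = 0.0f;
            // Contract over the shared inner dimension.
            for (std::size_t k = 0; k < in; ++k) {
                acc += x[b * in + k] * w[o * in + k];
            }
            // Relu: clamp negative pre-activations to zero.
            h[b * out + o] = std::max(acc, 0.0f);
        }
    }
    return h;
}
```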
Shapes
Matrix.cpp
    // OaShape2D / OaShape1D: explicit shape types
    OaDeviceMatrix m2 = OaFnMatrix::Zeros(OaShape2D(32, 128)); // 2D
    OaDeviceMatrix m1 = OaFnMatrix::Zeros(OaShape1D(128));     // 1D (labels)

    // Reshape: returns new view, no copy
    auto flat = OaFnMatrix::Reshape(emb, OaShape2D(batch, context * embed_dim));

    // Size: dimension access
    OaI64 rows = m2.Size(0);
    OaI64 cols = m2.Size(1);
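To illustrate what "returns new view, no copy" implies, here is a minimal host-side sketch. It assumes row-major storage, which the source does not confirm. `ViewRef`, `ZerosRef`, and `ReshapeRef` are hypothetical stand-ins, not library types. A reshaped view shares the same storage, so a write through one view is visible through the other.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <vector>

// Hypothetical host-side stand-in for a 2D matrix view; not the library's type.
struct ViewRef {
    std::shared_ptr<std::vector<float>> storage;  // shared between views
    std::int64_t rows;
    std::int64_t cols;

    // Row-major element access (assumed layout).
    float& At(std::int64_t r, std::int64_t c) {
        assert(r >= 0 && r < rows && c >= 0 && c < cols);
        return (*storage)[static_cast<std::size_t>(r * cols + c)];
    }

    // Dimension access, mirroring Size(0) and Size(1) in the snippet above.
    std::int64_t Size(int dim) const {
        assert(dim == 0 || dim == 1);
        return dim == 0 ? rows : cols;
    }
};

// Allocates zero-filled storage of rows * cols elements.
ViewRef ZerosRef(std::int64_t rows, std::int64_t cols) {
    assert(rows > 0 && cols > 0);
    auto data = std::make_shared<std::vector<float>>(
        static_cast<std::size_t>(rows * cols), 0.0f);
    return ViewRef{data, rows, cols};
}

// Reinterprets the same storage under a new shape; element count must match.
ViewRef ReshapeRef(const ViewRef& m, std::int64_t rows, std::int64_t cols) {
    assert(rows * cols == m.rows * m.cols);
    return ViewRef{m.storage, rows, cols};
}
```

Because only the shape metadata changes, the cost of a reshape in this model is independent of the matrix size.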
OaFnMatrix Operations
| Category | Functions |
|---|---|
| Factory | Zeros, Ones, RandKaimingUniform, RandGlorotUniform, FromBytes |
| Arithmetic | Scale, Add, Sub, Mul |
| Linear algebra | Matmul, Transpose, Reshape |
| Activations | Relu, Tanh, Softmax, Gelu, Silu |
| Loss | CrossEntropyLoss — accepts UInt8 class indices directly |
| Norm | LayerNorm, RmsNorm |
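As a rough illustration of the Loss row, the sketch below computes cross-entropy on the CPU from raw logits and one UInt8 class index per row. It assumes row-major logits and a mean reduction over the batch; neither is stated in the source. `CrossEntropyRef` is a hypothetical name. Note that a UInt8 index can address at most 256 classes.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical CPU reference, not part of the library.
// logits: row-major [batch, classes]; labels: one uint8 class index per row.
// Returns the mean negative log-likelihood (mean reduction is an assumption).
float CrossEntropyRef(const std::vector<float>& logits,
                      const std::vector<std::uint8_t>& labels,
                      std::size_t batch,
                      std::size_t classes) {
    assert(batch > 0 && classes > 0);
    assert(logits.size() == batch * classes);
    assert(labels.size() == batch);
    double total = 0.0;
    for (std::size_t b = 0; b < batch; ++b) {
        const float* row = logits.data() + b * classes;
        assert(labels[b] < classes);
        // Subtract the row maximum so the exponentials cannot overflow.
        const double mx = *std::max_element(row, row + classes);
        double sum = 0.0;
        for (std::size_t c = 0; c < classes; ++c) {
            sum += std::exp(static_cast<double>(row[c]) - mx);
        }
        // log softmax of the target class.
        const double log_prob =
            static_cast<double>(row[labels[b]]) - mx - std::log(sum);
        total -= log_prob;
    }
    return static_cast<float>(total / static_cast<double>(batch));
}
```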
Memory Model
Every OaDeviceMatrix is a device-local VkBuffer registered in the bindless descriptor heap. The CPU reads values via .At() or .DataAs<T>() after a sync. GPU dispatches reference the buffer by its bindless index, so no per-dispatch descriptor writes are needed.
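The bindless idea can be modelled on the host without any Vulkan calls. In the sketch below, a buffer's descriptor is written once at registration, and each dispatch carries only slot indices. All names (`BindlessHeapRef`, `ScaleArgsRef`, `DispatchScaleRef`) are hypothetical. Passing the indices in a small argument struct, similar to push constants, is an assumption, since the source does not say how the index reaches the shader.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical host-side model of a bindless heap; no Vulkan calls are made.
struct BindlessHeapRef {
    std::vector<std::vector<float>*> slots;  // slot i stands in for descriptor i
    int descriptor_writes = 0;               // counts simulated descriptor updates

    // Registering a buffer writes its descriptor once and returns the slot index.
    std::uint32_t Register(std::vector<float>* buffer) {
        assert(buffer != nullptr);
        slots.push_back(buffer);
        ++descriptor_writes;
        return static_cast<std::uint32_t>(slots.size() - 1);
    }
};

// Stand-in for per-dispatch arguments: the kernel receives only slot indices.
struct ScaleArgsRef {
    std::uint32_t src;
    std::uint32_t dst;
    float factor;
};

// Stand-in for a compute dispatch: resolves indices through the heap and
// performs no descriptor writes of its own.
void DispatchScaleRef(BindlessHeapRef& heap, const ScaleArgsRef& args) {
    assert(args.src < heap.slots.size() && args.dst < heap.slots.size());
    const std::vector<float>& in = *heap.slots[args.src];
    std::vector<float>& out = *heap.slots[args.dst];
    assert(in.size() == out.size());
    for (std::size_t i = 0; i < in.size(); ++i) {
        out[i] = in[i] * args.factor;
    }
}
```

In this model the descriptor write count depends only on how many buffers were registered, not on how many dispatches ran.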