Dispatch Levels

Level 1 — OaModule + Autograd

Subclass OaModule, register layers, and override Forward. OaFnGrad::Backward propagates gradients, and OaAdamW updates the weights. The same pattern scales from a two-layer MLP to a full transformer.

When to Use

  • Prototyping new architectures
  • Research and experimentation
  • Models where topology changes between steps
  • Starting point before moving to Level 2 for production

Full Example

Level1.cpp

// Tutorial/Ml/TutorialMnistClassifier.cpp
// Fashion-MNIST — 83.2% test accuracy, 244K samples/s (RTX 5090 Laptop)
class OaMnistClassifier : public OaModule {
public:
    OaMnistClassifier() {
        Fc1_ = OaMakeSharedPtr<OaLinear>(784, 128);
        Fc2_ = OaMakeSharedPtr<OaLinear>(128, 10);
        RegisterModule("fc1", Fc1_);
        RegisterModule("fc2", Fc2_);
    }

    OaDeviceMatrix Forward(const OaDeviceMatrix& x) override {
        auto h = OaFnMatrix::Scale(x, 1.0f / 255.0f);  // normalize pixels to [0, 1]
        h = OaFnMatrix::Relu(Fc1_->Forward(h));        // 784 -> 128, ReLU
        return Fc2_->Forward(h);                       // 128 -> 10 logits
    }

private:
    OaSharedPtr<OaLinear> Fc1_, Fc2_;
};

int main() {
    auto rt = OaEngine::Create({.AppName = "Mnist"}).Unwrap();
    OaMnistClassifier model;
    OaAdamW opt(model.AllParameterPtrs(), 0.001f);
    OaFnGrad::SetMode(OaGradMode::Dynamic);

    // sampler, batchX, and batchY come from data-loading setup omitted in this excerpt.
    for (OaI32 step = 0; step < 2000; ++step) {
        sampler.NextBatch(batchX, batchY);
        auto logits = model.Forward(batchX);
        auto loss = OaFnMatrix::CrossEntropyLoss(logits, batchY);
        OaFnGrad::Backward(loss);  // fill gradients for every registered parameter
        opt.Step();                // apply the AdamW update
        opt.ZeroGrad();            // clear gradients before the next batch
    }
}
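
The example relies on RegisterModule so that AllParameterPtrs can hand every weight to the optimizer. The following standalone sketch shows one way such a registry could work. It is not the library's implementation: Module, Param, RegisterParam, AllParams, Linear, Mlp, and CountScalars are hypothetical stand-ins written for illustration, and the CPU vectors here only approximate what a device tensor would hold.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a trainable tensor: flat values plus gradients.
struct Param {
    std::vector<float> Value;
    std::vector<float> Grad;
    explicit Param(std::size_t n) : Value(n, 0.0f), Grad(n, 0.0f) {}
};

// Hypothetical module base: owns named parameters and named child modules.
class Module {
public:
    virtual ~Module() = default;

    // Depth-first walk: own parameters first, then each child in registration order.
    std::vector<Param*> AllParams() {
        std::vector<Param*> out;
        Collect(out);
        return out;
    }

protected:
    Param* RegisterParam(std::string name, std::size_t size) {
        assert(size > 0);
        Params_.emplace_back(std::move(name), std::make_unique<Param>(size));
        return Params_.back().second.get();
    }
    void RegisterModule(std::string name, std::shared_ptr<Module> child) {
        assert(child != nullptr);
        Children_.emplace_back(std::move(name), std::move(child));
    }

private:
    void Collect(std::vector<Param*>& out) {
        for (auto& p : Params_) out.push_back(p.second.get());
        for (auto& c : Children_) c.second->Collect(out);
    }
    std::vector<std::pair<std::string, std::unique_ptr<Param>>> Params_;
    std::vector<std::pair<std::string, std::shared_ptr<Module>>> Children_;
};

// A linear layer registers a weight matrix and a bias vector.
class Linear : public Module {
public:
    Linear(std::size_t in, std::size_t out) {
        Weight = RegisterParam("weight", in * out);
        Bias = RegisterParam("bias", out);
    }
    Param* Weight;
    Param* Bias;
};

// Same 784 -> 128 -> 10 shape as the classifier in the example.
class Mlp : public Module {
public:
    Mlp() {
        RegisterModule("fc1", std::make_shared<Linear>(784, 128));
        RegisterModule("fc2", std::make_shared<Linear>(128, 10));
    }
};

// Total number of scalar weights reachable from a module.
std::size_t CountScalars(Module& m) {
    std::size_t n = 0;
    for (Param* p : m.AllParams()) n += p->Value.size();
    return n;
}
```

For the two-layer shape used here, the walk yields four parameter tensors (two weights, two biases) holding 784·128 + 128 + 128·10 + 10 = 101,770 scalars. An optimizer built on this design only needs the flat pointer list, which is why adding a layer requires no optimizer changes beyond registering it.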

Available Layers

Layer          Description
OaLinear       Fully connected layer with optional bias
OaEmbedding    Token embedding lookup
OaLayerNorm    Layer normalisation
OaRMSNorm      RMS normalisation (no mean shift)
OaSequential   Ordered container of sub-modules
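
The "no mean shift" distinction between the two normalisation layers can be made concrete with a small standalone sketch. It uses the standard textbook definitions on a single CPU row; whether OaLayerNorm and OaRMSNorm use the same epsilon or apply learned scale and shift parameters is not stated here, so those details are assumptions. LayerNormRow and RmsNormRow are hypothetical helper names.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// LayerNorm over one row: subtract the mean, divide by the standard deviation.
// Learned scale/shift parameters are omitted to keep the comparison minimal.
std::vector<float> LayerNormRow(const std::vector<float>& x, float eps = 1e-5f) {
    assert(!x.empty());
    float mean = 0.0f;
    for (float v : x) mean += v;
    mean /= static_cast<float>(x.size());
    float var = 0.0f;
    for (float v : x) var += (v - mean) * (v - mean);
    var /= static_cast<float>(x.size());
    const float inv = 1.0f / std::sqrt(var + eps);
    std::vector<float> y(x.size());
    for (std::size_t i = 0; i < x.size(); ++i) y[i] = (x[i] - mean) * inv;
    return y;
}

// RMSNorm over one row: divide by the root mean square, with no mean subtraction.
std::vector<float> RmsNormRow(const std::vector<float>& x, float eps = 1e-5f) {
    assert(!x.empty());
    float meanSq = 0.0f;
    for (float v : x) meanSq += v * v;
    meanSq /= static_cast<float>(x.size());
    const float inv = 1.0f / std::sqrt(meanSq + eps);
    std::vector<float> y(x.size());
    for (std::size_t i = 0; i < x.size(); ++i) y[i] = x[i] * inv;
    return y;
}
```

Applied to an all-positive row, LayerNormRow returns values centred on zero, while RmsNormRow only rescales the row and keeps every sign. Skipping the mean also saves one reduction pass per row.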

OaFnMatrix and OaFnGrad ops used in training

  • OaFnMatrix::Relu, Tanh, Softmax
  • OaFnMatrix::Scale — scalar multiply (e.g. ÷255 normalisation)
  • OaFnMatrix::CrossEntropyLoss — accepts U8 class indices directly
  • OaFnGrad::Backward(loss) — reverse-mode autodiff from scalar loss
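
To illustrate what a cross-entropy loss over raw logits and 8-bit class indices computes, here is a minimal CPU sketch. It is an assumption-laden illustration, not the body of OaFnMatrix::CrossEntropyLoss: the function name CrossEntropyFromLogits, the row-major layout, and the mean reduction over the batch are all choices made for this example.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Mean cross-entropy over a batch of row-major logits [batch x classes],
// with one uint8 class index per row.
float CrossEntropyFromLogits(const std::vector<float>& logits,
                             const std::vector<std::uint8_t>& labels,
                             std::size_t classes) {
    assert(classes > 0 && !labels.empty());
    assert(logits.size() == labels.size() * classes);
    float total = 0.0f;
    for (std::size_t row = 0; row < labels.size(); ++row) {
        const float* z = logits.data() + row * classes;
        assert(labels[row] < classes);
        // Subtract the row maximum so exp() cannot overflow.
        const float zMax = *std::max_element(z, z + classes);
        float sumExp = 0.0f;
        for (std::size_t c = 0; c < classes; ++c) sumExp += std::exp(z[c] - zMax);
        // -log softmax(z)[label] = log(sum exp(z - max)) - (z[label] - max)
        total += std::log(sumExp) - (z[labels[row]] - zMax);
    }
    return total / static_cast<float>(labels.size());
}
```

With all-zero logits over 10 classes the loss is ln(10), roughly 2.30, which is a useful sanity check for the first training step of an untrained classifier. Taking the labels as uint8 indices avoids building one-hot targets, at the cost of limiting the class count to 256.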

Optimizers

  • OaAdamW — AdamW with decoupled weight decay (default: 0.01)
  • OaSGD — stochastic gradient descent with momentum
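
"Decoupled weight decay" means the decay shrinks each weight directly rather than being added to the gradient. The sketch below shows one AdamW step on a flat CPU vector under that definition. It is a generic illustration, not OaAdamW's source: AdamWConfig, AdamWState, and AdamWStep are hypothetical names, the learning rate and weight decay reuse the 0.001 and 0.01 values quoted in this section, and the beta and epsilon values are the commonly used defaults, assumed here.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical hyperparameter bundle for this sketch.
struct AdamWConfig {
    float Lr = 0.001f;
    float Beta1 = 0.9f;    // assumed default
    float Beta2 = 0.999f;  // assumed default
    float Eps = 1e-8f;     // assumed default
    float WeightDecay = 0.01f;
};

struct AdamWState {
    std::vector<float> M;  // first-moment estimate
    std::vector<float> V;  // second-moment estimate
    int Step = 0;
};

// One AdamW update applied in place to `w` given gradients `g`.
void AdamWStep(std::vector<float>& w, const std::vector<float>& g,
               AdamWState& s, const AdamWConfig& cfg) {
    assert(w.size() == g.size());
    if (s.M.empty()) {
        s.M.assign(w.size(), 0.0f);
        s.V.assign(w.size(), 0.0f);
    }
    assert(s.M.size() == w.size() && s.V.size() == w.size());
    ++s.Step;
    // Bias-correction terms for the zero-initialised moment estimates.
    const float c1 = 1.0f - std::pow(cfg.Beta1, static_cast<float>(s.Step));
    const float c2 = 1.0f - std::pow(cfg.Beta2, static_cast<float>(s.Step));
    for (std::size_t i = 0; i < w.size(); ++i) {
        // Decoupled decay: shrink the weight directly instead of adding
        // WeightDecay * w to the gradient.
        w[i] -= cfg.Lr * cfg.WeightDecay * w[i];
        s.M[i] = cfg.Beta1 * s.M[i] + (1.0f - cfg.Beta1) * g[i];
        s.V[i] = cfg.Beta2 * s.V[i] + (1.0f - cfg.Beta2) * g[i] * g[i];
        const float mHat = s.M[i] / c1;
        const float vHat = s.V[i] / c2;
        w[i] -= cfg.Lr * mHat / (std::sqrt(vHat) + cfg.Eps);
    }
}
```

Because the decay is applied outside the adaptive term, a weight with zero gradient still shrinks by a factor of (1 - Lr · WeightDecay) per step, independent of the second-moment estimate.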