Dispatch Levels

Level 2 — Compiled Graph Replay

Call OaFnGrad::SetMode(OaGradMode::Compiled) with any Level 1 model. The first forward pass records the compute graph; every subsequent step replays the compiled command buffer with near-zero CPU overhead.
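The record-then-replay idea can be illustrated with a small, library-independent sketch. Everything here (ToyGraph, ToyStep, and their members) is hypothetical and only models the control flow described above; it is not the framework's implementation, and a real command buffer would hold GPU commands rather than std::function objects.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical stand-in for a compiled command buffer: an ordered list of ops.
class ToyGraph {
public:
    // Run an op immediately; while recording, also append it to the buffer.
    void Dispatch(std::function<void()> op) {
        op();
        ++DispatchCount_;
        if (Recording_) Ops_.push_back(std::move(op));
    }
    void BeginRecord() { Ops_.clear(); Recording_ = true; }
    void EndRecord() { Recording_ = false; Compiled_ = true; }
    // Replay every recorded op in order without re-running the model code.
    void Replay() const {
        assert(Compiled_ && "Replay() requires a recorded graph");
        for (const auto& op : Ops_) op();
    }
    bool IsCompiled() const { return Compiled_; }
    std::size_t OpCount() const { return Ops_.size(); }
    std::size_t DispatchCount() const { return DispatchCount_; }

private:
    std::vector<std::function<void()>> Ops_;
    bool Recording_ = false;
    bool Compiled_ = false;
    std::size_t DispatchCount_ = 0;
};

// One training step: record on the first call, replay on every later call.
// `forward` stands in for the model code that issues ops through the graph.
inline void ToyStep(ToyGraph& graph,
                    const std::function<void(ToyGraph&)>& forward) {
    if (!graph.IsCompiled()) {
        graph.BeginRecord();
        forward(graph);
        graph.EndRecord();
    } else {
        graph.Replay();
    }
}
```

In this toy, the model code runs and dispatches per op only on the first step; later steps walk the recorded list directly, which is the source of the CPU savings. It also shows why a fixed graph matters: the replayed list cannot react to a change in the sequence of ops.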

When to Use

  • Production training runs with static input/output shapes
  • Long runs where per-step CPU recording overhead accumulates
  • Runs where you want no manual switching: Auto mode stays on Dynamic for the first 3 steps, then compiles automatically

Example — OaGradMode options

Same OaModule code as Level 1. Only the grad mode changes:


// Tutorial/Ml/TutorialTextGenerationRnn.cpp
// Char-level LM: 300 steps, loss 0.897, 165K samples/s (RTX 5090 Laptop)
// Level 1 autograd; OaFnGrad::SetMode(Compiled) promotes to Level 2 replay
class OaTextModel : public OaModule {
public:
    OaTextModel() {
        Embed_ = OaMakeSharedPtr<OaEmbedding>(27, 16);
        Fc1_ = OaMakeSharedPtr<OaLinear>(128, 64);
        Head_ = OaMakeSharedPtr<OaLinear>(64, 27);
        RegisterModule("embed", Embed_);
        RegisterModule("fc1", Fc1_);
        RegisterModule("head", Head_);
    }

    OaDeviceMatrix Forward(const OaDeviceMatrix& x) override {
        auto e = Embed_->Forward(x);                 // [batch*8, 16]
        auto h = OaFnMatrix::Tanh(Fc1_->Forward(e)); // [batch, 64]
        return Head_->Forward(h);                    // [batch, 27]
    }

private:
    OaSharedPtr<OaEmbedding> Embed_;
    OaSharedPtr<OaLinear> Fc1_, Head_;
};

int main() {
    auto rt = OaEngine::Create({.AppName = "Text"}).Unwrap();
    OaTextModel model;
    OaAdamW opt(model.AllParameterPtrs(), 0.01f);

    // Dynamic:  per-op dispatch (Level 1 default)
    // Compiled: compile on first forward, replay after (Level 2)
    // Auto:     compile after 3 dynamic steps (transitions automatically)
    OaFnGrad::SetMode(OaGradMode::Dynamic); // pass OaGradMode::Compiled for Level 2 replay

    // `input` and `targets` hold the training batch; their setup is not shown in this excerpt.
    for (OaI32 step = 0; step < 300; ++step) {
        auto logits = model.Forward(input);
        auto loss = OaFnMatrix::CrossEntropyLoss(logits, targets);
        OaFnGrad::Backward(loss);
        opt.Step();
        opt.ZeroGrad();
    }
    model.Save("text.oam");
}

Grad Modes

Mode      Behaviour                           Best for
Dynamic   Per-op dispatch every step          Prototyping, variable topology
Compiled  Record on step 0, replay after      Production, static shapes
Auto      Dynamic for 3 steps, then Compiled  General use (no manual switching)
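
The three modes can be read as a per-step schedule. The sketch below is a hypothetical model of that schedule, not the library's code: ToyGradMode, ToyStepKind, ToyStepKindFor, and kToyAutoWarmupSteps are illustrative names, and it assumes that Auto records the graph on the step immediately after its 3 dynamic steps, which the table does not state explicitly.

```cpp
#include <cassert>

// Hypothetical mirror of the three grad modes; not the library's definition.
enum class ToyGradMode { Dynamic, Compiled, Auto };

// What a single training step does under a given mode.
enum class ToyStepKind { Dispatch, Record, Replay };

// Assumption: Auto runs 3 dynamic warm-up steps, then records on the next one.
constexpr int kToyAutoWarmupSteps = 3;

// Map (mode, zero-based step index) to the kind of work that step performs.
inline ToyStepKind ToyStepKindFor(ToyGradMode mode, int step) {
    assert(step >= 0 && "step index must be non-negative");
    switch (mode) {
        case ToyGradMode::Dynamic:
            return ToyStepKind::Dispatch;
        case ToyGradMode::Compiled:
            return step == 0 ? ToyStepKind::Record : ToyStepKind::Replay;
        case ToyGradMode::Auto:
            if (step < kToyAutoWarmupSteps) return ToyStepKind::Dispatch;
            return step == kToyAutoWarmupSteps ? ToyStepKind::Record
                                               : ToyStepKind::Replay;
    }
    return ToyStepKind::Dispatch; // not reached for valid enum values
}
```

Under these assumptions, a 300-step run in Compiled mode has 1 recording step and 299 replay steps, while Auto has 3 dispatch steps, 1 recording step, and 296 replay steps.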