Concepts
Autograd
Reverse-mode automatic differentiation. Tape-based. Every OaTensor operation records to the computation graph. Backward() traverses the tape and accumulates gradients.
How It Works
- Operations record to a tape (computation graph)
- loss.Backward() traverses the tape in reverse
- Each op's backward kernel computes gradients via GPU compute shaders
- Gradients accumulate in .Grad() on each tensor
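The steps above can be illustrated with a minimal CPU-only scalar tape. This is a sketch of the general technique, not the library's implementation: `Tape`, `Node`, `Op`, and `RunExample` are hypothetical names, values are plain doubles rather than OaTensor objects, and the backward rules run inline instead of as GPU compute shaders.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical scalar tape for illustration only; not the OaTensor API.
enum class Op { Leaf, Add, Mul };

struct Node {
    Op op;             // which operation produced this node
    std::size_t a, b;  // indices of the input nodes (unused for leaves)
    double value;      // forward result
    double grad;       // accumulated gradient of the loss w.r.t. this node
};

struct Tape {
    std::vector<Node> nodes;  // nodes in the order they were recorded

    std::size_t Leaf(double v) {
        nodes.push_back({Op::Leaf, 0, 0, v, 0.0});
        return nodes.size() - 1;
    }

    std::size_t Add(std::size_t a, std::size_t b) {
        nodes.push_back({Op::Add, a, b, nodes[a].value + nodes[b].value, 0.0});
        return nodes.size() - 1;
    }

    std::size_t Mul(std::size_t a, std::size_t b) {
        nodes.push_back({Op::Mul, a, b, nodes[a].value * nodes[b].value, 0.0});
        return nodes.size() - 1;
    }

    // Walk the tape in reverse, applying each op's backward rule and
    // accumulating (+=) into the gradients of its inputs.
    void Backward(std::size_t loss) {
        nodes[loss].grad = 1.0;
        for (std::size_t i = loss + 1; i-- > 0;) {
            const Node n = nodes[i];  // copy, so the writes below cannot alias it
            switch (n.op) {
                case Op::Leaf:
                    break;
                case Op::Add:
                    nodes[n.a].grad += n.grad;
                    nodes[n.b].grad += n.grad;
                    break;
                case Op::Mul:
                    nodes[n.a].grad += nodes[n.b].value * n.grad;
                    nodes[n.b].grad += nodes[n.a].value * n.grad;
                    break;
            }
        }
    }
};

// Builds loss = x * y + x and returns the tape after the backward pass.
// Node 0 is x, node 1 is y, node 2 is x * y, node 3 is the loss.
Tape RunExample(double x, double y) {
    Tape t;
    std::size_t ix = t.Leaf(x);
    std::size_t iy = t.Leaf(y);
    std::size_t loss = t.Add(t.Mul(ix, iy), ix);
    t.Backward(loss);
    return t;
}
```

Because x feeds both the multiplication and the addition, its gradient receives two contributions during the reverse walk, which is why gradients are accumulated rather than overwritten.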
Supported Operations
All standard ops have backward implementations: Matmul, Add, Sub, Mul, Scale, ReLU, GELU, SiLU, SwiGLU, Softmax, CrossEntropy, RMSNorm, LayerNorm, Embedding, Conv1d.
No-Gradient Context
Use OaNoGrad to disable autograd recording (inference, generation):
{
    OaNoGrad noGrad;
    auto output = model.Forward(input); // no tape recording
}
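The braces in the example suggest that OaNoGrad is a scope guard. One common way to build such a guard is RAII over a thread-local flag, sketched below. This is an assumption about the pattern, not a description of OaNoGrad's internals: `NoGradGuard`, `IsRecording`, and `RecordingFlag` are hypothetical names.

```cpp
// Hypothetical scope guard for illustration; OaNoGrad's real internals
// are not shown in this document.
class NoGradGuard {
public:
    // Remember the current state, then switch recording off.
    NoGradGuard() : previous_(RecordingFlag()) { RecordingFlag() = false; }

    // Restore whatever state was active when the guard was created,
    // so nested guards unwind correctly.
    ~NoGradGuard() { RecordingFlag() = previous_; }

    NoGradGuard(const NoGradGuard&) = delete;
    NoGradGuard& operator=(const NoGradGuard&) = delete;

    // Operations would check this before recording to the tape.
    static bool IsRecording() { return RecordingFlag(); }

private:
    // One flag per thread; recording is on by default.
    static bool& RecordingFlag() {
        thread_local bool recording = true;
        return recording;
    }

    bool previous_;
};
```

Saving and restoring the previous state, rather than unconditionally re-enabling recording in the destructor, keeps an outer guard in effect after an inner one goes out of scope.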