Concepts

Autograd

Autograd implements tape-based reverse-mode automatic differentiation. Every OaTensor operation is recorded to the computation graph. Backward() then traverses the tape in reverse and accumulates gradients.

How It Works

  1. Operations record to a tape (computation graph)
  2. loss.Backward() traverses the tape in reverse
  3. Each op's backward kernel computes gradients via GPU compute shaders
  4. Gradients accumulate in .Grad() on each tensor
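
The four steps above can be illustrated with a minimal scalar tape. This is only a sketch under stated assumptions: `Tape`, `Leaf`, `Add`, `Mul`, and the index-based node handles are hypothetical names invented for illustration, everything runs on the CPU, and none of it is the library's OaTensor API or its GPU kernels.

```cpp
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical scalar tape for illustration; not the library's OaTensor API.
struct Tape {
    std::vector<double> value;                         // forward value of each node
    std::vector<double> grad;                          // accumulated gradient of each node
    std::vector<std::function<void(Tape&)>> backward;  // one backward closure per node, in recording order

    // Records an input node. Leaves have nothing to propagate.
    std::size_t Leaf(double v) {
        value.push_back(v);
        grad.push_back(0.0);
        backward.push_back([](Tape&) {});
        return value.size() - 1;
    }

    // Step 1: each operation records its result and its backward rule.
    std::size_t Add(std::size_t a, std::size_t b) {
        std::size_t out = Leaf(value[a] + value[b]);
        backward[out] = [a, b, out](Tape& t) {
            t.grad[a] += t.grad[out];
            t.grad[b] += t.grad[out];
        };
        return out;
    }

    std::size_t Mul(std::size_t a, std::size_t b) {
        std::size_t out = Leaf(value[a] * value[b]);
        backward[out] = [a, b, out](Tape& t) {
            t.grad[a] += t.value[b] * t.grad[out];
            t.grad[b] += t.value[a] * t.grad[out];
        };
        return out;
    }

    // Steps 2 to 4: seed the loss gradient, walk the tape in reverse,
    // and let each backward rule accumulate into its inputs.
    void Backward(std::size_t loss) {
        grad[loss] = 1.0;
        for (std::size_t i = loss + 1; i-- > 0;) {
            backward[i](*this);
        }
    }
};
```

For z = x * y + x with x = 2 and y = 3, calling Backward on z leaves grad[x] = 4 and grad[y] = 2. The gradient of x receives two contributions (one from the product, one from the sum), which is why the rules use += rather than plain assignment.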

Supported Operations

All standard ops have backward implementations: Matmul, Add, Sub, Mul, Scale, ReLU, GELU, SiLU, SwiGLU, Softmax, CrossEntropy, RMSNorm, LayerNorm, Embedding, Conv1d.
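
To make "backward implementation" concrete, here is a CPU reference sketch of what a forward/backward pair computes for ReLU. The function names and the std::vector signatures are assumptions made for illustration. The library's own kernels run as GPU compute shaders and may be organized differently.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical CPU reference for illustration; not the library's kernel.
// Forward: y[i] = max(x[i], 0).
std::vector<float> ReluForward(const std::vector<float>& x) {
    std::vector<float> y(x.size());
    for (std::size_t i = 0; i < x.size(); ++i) {
        y[i] = x[i] > 0.0f ? x[i] : 0.0f;
    }
    return y;
}

// Backward: the upstream gradient passes through where the input was positive
// and is blocked elsewhere. The result is accumulated into gradIn, which must
// already have the same size as x.
void ReluBackward(const std::vector<float>& x,
                  const std::vector<float>& gradOut,
                  std::vector<float>& gradIn) {
    for (std::size_t i = 0; i < x.size(); ++i) {
        if (x[i] > 0.0f) {
            gradIn[i] += gradOut[i];
        }
    }
}
```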

No-Gradient Context

Use OaNoGrad to disable autograd recording, for example during inference or generation:

{
	OaNoGrad noGrad;
	auto output = model.Forward(input);  // no tape recording
}
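
A scoped guard like this is commonly built with RAII: the constructor switches recording off and the destructor restores the previous state. The sketch below shows that pattern under stated assumptions. `RecordingFlag` and `NoGradGuard` are hypothetical names, and nothing here is taken from how OaNoGrad is actually implemented.

```cpp
// Hypothetical per-thread recording switch; a stand-in for whatever state
// the library consults before recording an op to the tape.
inline bool& RecordingFlag() {
    thread_local bool recording = true;
    return recording;
}

// Hypothetical RAII guard: disables recording for the lifetime of the object
// and restores the previous state on destruction, so guards can be nested.
class NoGradGuard {
public:
    NoGradGuard() : previous_(RecordingFlag()) { RecordingFlag() = false; }
    ~NoGradGuard() { RecordingFlag() = previous_; }

    // Non-copyable: a copy would restore the flag twice.
    NoGradGuard(const NoGradGuard&) = delete;
    NoGradGuard& operator=(const NoGradGuard&) = delete;

private:
    bool previous_;
};
```

Saving and restoring the previous value, rather than unconditionally setting the flag back to true, keeps recording disabled when an inner guard goes out of scope inside an outer one.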