Compute

OaComputeGraph

A CPU-side directed acyclic graph (DAG) of compute dispatches. It tracks read/write dependencies per buffer, inserts only the barriers those dependencies require, and supports compile-once, replay-many execution.
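A minimal sketch of how per-buffer tracking can decide where barriers go. It assumes nodes are processed in submission order and that each barrier is a full memory barrier that synchronizes every earlier access. `Access`, `Binding`, `Node`, and `PlanBarriers` are hypothetical names for illustration, not OaComputeGraph's API.

```cpp
#include <cstddef>
#include <unordered_set>
#include <vector>

// Hypothetical access mode of one buffer binding in a dispatch.
enum class Access { Read, Write, ReadWrite };

struct Binding {
    int buffer;     // hypothetical buffer id
    Access access;
};

using Node = std::vector<Binding>;  // one dispatch = its buffer bindings

static bool reads(Access a) { return a != Access::Write; }
static bool writes(Access a) { return a != Access::Read; }

// Returns barrier_before[i] == true if a barrier must precede node i.
// Accesses since the last barrier are "pending"; a new barrier is needed only
// on a read-after-write, write-after-write, or write-after-read hazard.
std::vector<bool> PlanBarriers(const std::vector<Node>& nodes) {
    std::vector<bool> barrier_before(nodes.size(), false);
    std::unordered_set<int> pending_reads, pending_writes;
    for (std::size_t i = 0; i < nodes.size(); ++i) {
        bool hazard = false;
        for (const Binding& b : nodes[i]) {
            if (reads(b.access) && pending_writes.count(b.buffer)) hazard = true;  // RAW
            if (writes(b.access) && (pending_writes.count(b.buffer) ||           // WAW
                                     pending_reads.count(b.buffer))) hazard = true;  // WAR
        }
        if (hazard) {
            barrier_before[i] = true;
            pending_reads.clear();  // the barrier synchronizes all earlier accesses
            pending_writes.clear();
        }
        for (const Binding& b : nodes[i]) {
            if (reads(b.access)) pending_reads.insert(b.buffer);
            if (writes(b.access)) pending_writes.insert(b.buffer);
        }
    }
    return barrier_before;
}
```

Dispatches that write disjoint buffers, or only read shared inputs, get no barrier between them, which is the case a barrier-after-every-dispatch scheme pays for unnecessarily.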

Dynamic Execution

For one-shot graphs, or topologies that change between runs, build the graph and call Execute():

Dynamic.cpp

OaComputeGraph graph;
// Add(kernel name, buffers, per-buffer access, push data, push data size, dispatch groups)
graph.Add("rmsnorm", bufs1, access1, &push1, sizeof(push1), groups1);
graph.Add("matmul", bufs2, access2, &push2, sizeof(push2), groups2);
graph.Add("silu", bufs3, access3, &push3, sizeof(push3), groups3);
OA_RETURN_IF_ERROR(graph.Execute(rt)); // topo-sort, barrier insertion, submit, wait
graph.Reset(); // clear the graph before building the next one
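Execute() performs a topological sort before inserting barriers. One common way to combine the sort with barrier minimization is to group nodes into levels with Kahn's algorithm, so that nodes sharing a level run between a single pair of barriers. This sketch assumes dependencies have already been derived as edges; `TopoLevels` is a hypothetical helper, and OaComputeGraph's actual scheduling may differ.

```cpp
#include <algorithm>
#include <cstddef>
#include <queue>
#include <stdexcept>
#include <vector>

// Kahn's algorithm over a dependency DAG: edges[i] lists the nodes that depend
// on node i. Returns each node's level (length of the longest path from a
// source). Nodes on the same level have no dependency path between them, so a
// barrier is only needed at each level boundary. Throws on a cycle.
std::vector<int> TopoLevels(const std::vector<std::vector<int>>& edges) {
    const std::size_t n = edges.size();
    std::vector<int> indegree(n, 0), level(n, 0);
    for (const auto& outs : edges)
        for (int v : outs) ++indegree[v];

    std::queue<int> ready;
    for (std::size_t i = 0; i < n; ++i)
        if (indegree[i] == 0) ready.push(static_cast<int>(i));

    std::size_t visited = 0;
    while (!ready.empty()) {
        int u = ready.front();
        ready.pop();
        ++visited;
        for (int v : edges[u]) {
            level[v] = std::max(level[v], level[u] + 1);  // longest path so far
            if (--indegree[v] == 0) ready.push(v);        // all inputs scheduled
        }
    }
    if (visited != n) throw std::runtime_error("dependency cycle");
    return level;
}
```

For a diamond graph where node 0 feeds nodes 1 and 2, which both feed node 3, nodes 1 and 2 share level 1, so the three levels need only two barriers.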

Compile + Replay

For static topologies where the graph is identical every step, such as ML training or inference loops: compile once, then replay thousands of times with no per-step command-recording cost on the CPU.

Compiled.cpp

// Init: build the graph and compile once
OaComputeGraph graph;
graph.Add("rmsnorm", bufs, access, &push, sizeof(push), groups);
// ... add all nodes ...
OA_RETURN_IF_ERROR(graph.Compile(rt)); // records a secondary command buffer

// Hot path: replay every step (no command re-recording on the CPU)
OA_RETURN_IF_ERROR(graph.Replay(rt)); // replays the recorded command buffer
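The compile-once, replay-many split can be illustrated with a CPU-only analogy: compilation does the ordering and barrier placement once and flattens the result into a step list, and each replay only walks that list. In OaComputeGraph, Compile() records a secondary command buffer instead; `Step`, `CompilePlan`, and `ReplayPlan` here are hypothetical stand-ins, and the per-node levels are assumed to come from a scheduling pass like a topological sort.

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <numeric>
#include <vector>

// Hypothetical compiled plan entry: either a barrier or one dispatch.
struct Step {
    bool is_barrier;
    int node;  // dispatch index; -1 for barriers
};

// Runs once: order nodes by level and put a barrier between consecutive levels.
std::vector<Step> CompilePlan(const std::vector<int>& level) {
    std::vector<int> order(level.size());
    std::iota(order.begin(), order.end(), 0);
    std::stable_sort(order.begin(), order.end(),
                     [&](int a, int b) { return level[a] < level[b]; });
    std::vector<Step> steps;
    for (std::size_t k = 0; k < order.size(); ++k) {
        if (k > 0 && level[order[k]] != level[order[k - 1]])
            steps.push_back({true, -1});  // level boundary needs a barrier
        steps.push_back({false, order[k]});
    }
    return steps;
}

// Runs every step: no sorting or hazard analysis, just a walk over the plan.
void ReplayPlan(const std::vector<Step>& steps,
                const std::function<void(int)>& dispatch,
                const std::function<void()>& barrier) {
    for (const Step& s : steps) {
        if (s.is_barrier) barrier();
        else dispatch(s.node);
    }
}
```

The replay loop's cost grows with the number of steps, which is consistent with a per-dispatch replay cost rather than literally zero CPU work.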

Measured Performance

Measured on an Intel HD Graphics 620 iGPU (2017 laptop, 24 EUs):

Metric | Result
--- | ---
Replay speedup | 2-4.56x vs. Execute()
Memory aliasing | 71-92% VRAM savings
Barrier elimination | 60-70% fewer barriers
Per-replay cost | ~17 µs per dispatch
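As a rough worked estimate, assuming the ~17 µs per-dispatch figure holds regardless of graph size (the measurement above does not state how it scales), replay cost grows linearly with dispatch count. The three-node rmsnorm/matmul/silu graph and a hypothetical 100-dispatch graph would cost roughly:

```latex
T_{\text{replay}} \approx N_{\text{dispatch}} \times 17\,\mu\text{s}
\qquad
N = 3:\; T \approx 51\,\mu\text{s}
\qquad
N = 100:\; T \approx 1.7\,\text{ms}
```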

Memory Aliasing

Because the graph sees every node before it runs, it knows each buffer's exact lifetime, from its first write to its last read. Buffers whose lifetimes do not overlap share the same device memory; on the hardware above, this saved 71-92% of VRAM on transient activations.
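A minimal sketch of lifetime-based aliasing using a greedy first-fit assignment over lifetime intervals: buffers are visited by first write, and each one reuses any memory slot whose current occupant's last read happened earlier. `Transient` and `AssignSlots` are hypothetical; the source does not describe OaComputeGraph's actual allocation strategy, and a real GPU allocator must also respect alignment and memory-type constraints that this sketch ignores.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical transient buffer with an inclusive lifetime
// [first_write, last_read], measured in dispatch order.
struct Transient {
    int first_write;
    int last_read;
    std::size_t bytes;
};

// Returns the slot index for each buffer and fills slot_bytes with each
// slot's size (the largest buffer it ever holds). Two buffers share a slot
// only if their lifetimes do not overlap.
std::vector<int> AssignSlots(const std::vector<Transient>& bufs,
                             std::vector<std::size_t>& slot_bytes) {
    std::vector<int> order(bufs.size());
    for (std::size_t i = 0; i < order.size(); ++i) order[i] = static_cast<int>(i);
    std::sort(order.begin(), order.end(), [&](int a, int b) {
        return bufs[a].first_write < bufs[b].first_write;
    });

    std::vector<int> slot_of(bufs.size(), -1);
    std::vector<int> slot_busy_until;  // last_read of each slot's current occupant
    slot_bytes.clear();
    for (int i : order) {
        int chosen = -1;
        for (std::size_t s = 0; s < slot_busy_until.size(); ++s) {
            if (slot_busy_until[s] < bufs[i].first_write) {  // occupant is dead
                chosen = static_cast<int>(s);
                break;
            }
        }
        if (chosen < 0) {  // no free slot: allocate a new one
            chosen = static_cast<int>(slot_busy_until.size());
            slot_busy_until.push_back(-1);
            slot_bytes.push_back(0);
        }
        slot_of[i] = chosen;
        slot_busy_until[chosen] = bufs[i].last_read;
        slot_bytes[chosen] = std::max(slot_bytes[chosen], bufs[i].bytes);
    }
    return slot_of;
}
```

For three chained activations with lifetimes [0,1] (100 bytes), [1,2] (200 bytes), and [2,3] (100 bytes), the first and third never overlap and share a slot, so the total drops from 400 to 300 bytes.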