Compute
OaEngine
The assembled compute context. One object owns the GPU device, memory allocator, pipeline registry, bindless descriptor heap, and stream pool. Create it once. Everything flows through it.
Create
Every binary starts by creating OaEngine. It auto-registers as the process-wide global context. OaTensor and OaModule dispatch to GPU immediately, with no manual setup.
Main.cpp
```cpp
#include <oa/runtime/engine.h>

int main(int argc, char** argv) {
    auto rt = OaEngine::Create({.AppName = "MyApp"}).Unwrap();
    // Global context is set: tensors and modules dispatch to GPU.

    // Configure shader search paths (debug only):
    rt.AddShaderSearchPath("spirv");

    RunApp(rt, argc, argv);
    rt.Destroy();
}
```
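One way to picture the auto-registration is a process-wide pointer that creation sets and destruction clears. The following is a minimal sketch under that assumption; `SketchEngine`, `Global`, and `Slot` are hypothetical names for illustration and are not part of the OaEngine API, whose internals this page does not show.

```cpp
#include <atomic>
#include <cassert>
#include <string>
#include <utility>

// Hypothetical stand-in for the engine; not the real OaEngine.
class SketchEngine {
public:
    // Creates an engine and registers it as the process-wide context.
    static SketchEngine* Create(std::string appName) {
        auto* engine = new SketchEngine(std::move(appName));
        Slot().store(engine, std::memory_order_release);
        return engine;
    }

    // Clears the global slot if it still points at this engine, then frees it.
    static void Destroy(SketchEngine* engine) {
        SketchEngine* expected = engine;
        Slot().compare_exchange_strong(expected, nullptr);
        delete engine;
    }

    // Returns the current global context, or nullptr if none is registered.
    static SketchEngine* Global() {
        return Slot().load(std::memory_order_acquire);
    }

    const std::string& AppName() const { return mAppName; }

private:
    explicit SketchEngine(std::string appName) : mAppName(std::move(appName)) {}

    // Function-local static: initialized on first use, shared by the whole process.
    static std::atomic<SketchEngine*>& Slot() {
        static std::atomic<SketchEngine*> slot{nullptr};
        return slot;
    }

    std::string mAppName;
};
```

With a slot like this, a tensor or module type could look up `SketchEngine::Global()` internally instead of taking an engine parameter, which would match the "no manual setup" behavior described above.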
What It Owns
| Component | Purpose |
|---|---|
| OaDevice | Physical + logical GPU device, SAM detection |
| OaAllocator | GPU memory allocator |
| OaPipelineRegistry | Compute pipeline cache, shared_mutex protected |
| OaBindlessHeap | 64K-slot global descriptor set |
| Stream pool | Persistent async compute streams (free-list stack) |
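
The table describes the pipeline registry as a cache protected by a shared_mutex. A common shape for such a cache is a shared (read) lock on lookup and an exclusive lock on insertion. The sketch below illustrates that pattern only; `SketchPipelineRegistry`, `SketchPipeline`, and `GetOrCreate` are hypothetical names, and real pipeline creation is replaced by handing out an integer id.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <mutex>
#include <shared_mutex>
#include <string>
#include <unordered_map>

// Hypothetical stand-in for a compiled compute pipeline handle.
struct SketchPipeline {
    std::uint64_t id;
};

// Illustrative cache; not the real OaPipelineRegistry.
class SketchPipelineRegistry {
public:
    // Returns the cached pipeline for `name`, creating it on first use.
    SketchPipeline GetOrCreate(const std::string& name) {
        {
            // Fast path: any number of threads may hold the shared lock at once.
            std::shared_lock<std::shared_mutex> read(mMutex);
            auto it = mCache.find(name);
            if (it != mCache.end()) return it->second;
        }
        // Slow path: take the exclusive lock and re-check, because another
        // thread may have inserted the entry between the two locks.
        std::unique_lock<std::shared_mutex> write(mMutex);
        auto it = mCache.find(name);
        if (it != mCache.end()) return it->second;
        SketchPipeline created{mNextId++};  // a real registry would build the pipeline here
        mCache.emplace(name, created);
        return created;
    }

    // Number of distinct pipelines created so far.
    std::size_t Size() const {
        std::shared_lock<std::shared_mutex> read(mMutex);
        return mCache.size();
    }

private:
    mutable std::shared_mutex mMutex;
    std::unordered_map<std::string, SketchPipeline> mCache;
    std::uint64_t mNextId = 1;
};
```

The double lookup keeps the common case (pipeline already cached) free of exclusive locking, which matters when many threads record dispatches for the same few kernels.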
Compute Streams
Each stream owns a persistent command pool, command buffer, and timeline semaphore. The engine manages a pool of streams via AcquireStream / ReleaseStream.
Streams.cpp
```cpp
// Batched dispatch
auto* stream = rt.AcquireStream();
stream->Begin();
stream->Record(rt, "rmsnorm", bufs1, &push1, sizeof(push1), groups1);
stream->Record(rt, "silu", bufs2, &push2, sizeof(push2), groups2);
stream->SubmitAndWait(rt);
rt.ReleaseStream(stream);
```
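The acquire/release pair maps naturally onto a free-list stack: released streams are pushed, and the next acquire pops the most recently released one, so persistent resources are reused rather than recreated. The following is a rough sketch of that idea, not the engine's implementation; `SketchStreamPool` and `SketchStream` are hypothetical, and the stream holds only an index where a real one would own its command pool, command buffer, and timeline semaphore.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <mutex>
#include <vector>

// Hypothetical stand-in for a compute stream.
struct SketchStream {
    std::size_t index;
};

// Illustrative pool; not the real engine stream pool.
class SketchStreamPool {
public:
    // Returns a free stream, creating a new one only when the free list is empty.
    SketchStream* Acquire() {
        std::lock_guard<std::mutex> lock(mMutex);
        if (mFree.empty()) {
            mStreams.push_back(
                std::make_unique<SketchStream>(SketchStream{mStreams.size()}));
            return mStreams.back().get();
        }
        SketchStream* stream = mFree.back();  // LIFO: most recently released first
        mFree.pop_back();
        return stream;
    }

    // Returns a stream to the free list; the stream object itself stays alive.
    void Release(SketchStream* stream) {
        std::lock_guard<std::mutex> lock(mMutex);
        mFree.push_back(stream);
    }

    // Total number of streams ever created by this pool.
    std::size_t Created() const {
        std::lock_guard<std::mutex> lock(mMutex);
        return mStreams.size();
    }

private:
    mutable std::mutex mMutex;
    std::vector<std::unique_ptr<SketchStream>> mStreams;  // owns every stream
    std::vector<SketchStream*> mFree;                     // free-list stack
};
```

Because the pool owns the streams through `unique_ptr`, pointers handed out by `Acquire` stay valid for the pool's lifetime even as the owning vector grows.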
Synchronization
All stream synchronization uses timeline semaphores. Pipeline barriers are inserted automatically and minimally, only where read-after-write hazards exist. Queue submissions are serialized via std::mutex.
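To make "only where read-after-write hazards exist" concrete, here is a small sketch of one possible tracking scheme: remember which buffers have been written since the last barrier, and request a barrier only when a later dispatch reads one of them. This is an assumption about how such tracking could work, not the engine's actual algorithm; `SketchDispatch`, `BufferId`, and `PlanBarriers` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_set>
#include <vector>

using BufferId = std::uint32_t;

// Hypothetical description of one recorded dispatch.
struct SketchDispatch {
    std::vector<BufferId> reads;
    std::vector<BufferId> writes;
};

// Returns the indices of dispatches that need a barrier recorded before them.
std::vector<std::size_t> PlanBarriers(const std::vector<SketchDispatch>& dispatches) {
    std::vector<std::size_t> barrierBefore;
    std::unordered_set<BufferId> pendingWrites;  // buffers written since the last barrier

    for (std::size_t i = 0; i < dispatches.size(); ++i) {
        // A hazard exists if this dispatch reads anything still pending.
        bool hazard = false;
        for (BufferId id : dispatches[i].reads) {
            if (pendingWrites.count(id) != 0) {
                hazard = true;
                break;
            }
        }
        if (hazard) {
            barrierBefore.push_back(i);
            pendingWrites.clear();  // the barrier covers all earlier writes
        }
        // This dispatch's own writes become pending for later dispatches.
        pendingWrites.insert(dispatches[i].writes.begin(), dispatches[i].writes.end());
    }
    return barrierBefore;
}
```

For a chain such as rmsnorm writing a buffer that silu then reads, this plan yields a single barrier before the second dispatch, while dispatches touching unrelated buffers are recorded back to back with none.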