High Performance Compute
The Oa Library
A hand-crafted C++ compute library. GPU acceleration, machine learning primitives, post-quantum cryptography, and networking in a single static binary. No vendor SDK. No framework. No dependencies.
| Metric | Value |
|---|---|
| GPU speedup vs std | 5.1x |
| GPU vendors | 5+ |
| Dependencies | 0 |
| Binary | 1 |
Three Lines to GPU Compute
Every binary starts by creating OaEngine. It owns the GPU device, memory allocator, pipeline registry, and stream pool. Nothing else needed.
Main.cpp
    #include <oa/runtime/engine.h>

    int main() {
        auto rt = OaEngine::Create({.AppName = "MyApp"}).Unwrap();
        // GPU compute is ready: tensors, kernels, everything.
        rt.Destroy();
    }
Four Pillars
Memory
Inline assembly optimizations. Zero-copy streaming buffers. No framework overhead, no dispatch layer — your data hits the hardware directly. AVX2 memcpy that outperforms std::memcpy by 14.87x on large buffers.
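To make "zero-copy streaming" concrete, here is a minimal sketch of the idea, not Oa's actual implementation. The names `ByteView` and `StreamBuffer` are hypothetical and do not come from the library. The producer writes directly into storage owned by the buffer, and the consumer reads a view of those same bytes, so no intermediate copy is made. The sketch is single-threaded and skips wraparound for brevity.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical illustration only; these names are not part of the Oa API.
// A view into memory owned by someone else: pointer plus length, no copy.
struct ByteView {
    std::uint8_t* data;
    std::size_t   size;
};

// Byte stream over one fixed allocation.
// Not thread-safe: a concurrent version would need atomics or fences.
class StreamBuffer {
public:
    explicit StreamBuffer(std::size_t capacity) : storage_(capacity) {}

    // Producer: expose the contiguous region that can be written in place.
    ByteView WritableRegion() {
        return {storage_.data() + write_, storage_.size() - write_};
    }

    // Producer: publish n bytes that were written into WritableRegion().
    void Commit(std::size_t n) {
        assert(n <= storage_.size() - write_);
        write_ += n;
    }

    // Consumer: view the published bytes without copying them.
    ByteView ReadableRegion() {
        return {storage_.data() + read_, write_ - read_};
    }

    // Consumer: release n bytes; rewind once everything has been consumed.
    void Consume(std::size_t n) {
        assert(n <= write_ - read_);
        read_ += n;
        if (read_ == write_) {
            read_ = 0;
            write_ = 0;
        }
    }

private:
    std::vector<std::uint8_t> storage_;
    std::size_t read_  = 0;
    std::size_t write_ = 0;
};
```

A caller would fill `WritableRegion().data`, call `Commit(n)`, then hand `ReadableRegion()` to the consumer. The pointer the consumer sees is the same one the producer wrote through.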
Compute Kernels
Slang compute kernels compile once and run on every GPU vendor: NVIDIA, AMD, Intel, Qualcomm, and Apple (via Metal translation). Desktop, server, mobile. One binary, everywhere. All kernels use a bindless descriptor model with a single global heap.
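The bindless model can be pictured with a small CPU-side analogy. This is a sketch under assumptions, not the library's API: `DescriptorHeap`, `BufferHandle`, `SaxpyArgs`, and `DispatchSaxpy` are hypothetical names. Every buffer lives in one global table, and a kernel receives only integer indices into that table instead of per-kernel resource bindings.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical CPU-side model; none of these names come from the Oa API.
using BufferHandle = std::uint32_t;

// The single global heap: every buffer is a slot in one table.
struct DescriptorHeap {
    std::vector<std::vector<float>> buffers;

    // Store a buffer and return its slot index.
    BufferHandle Register(std::vector<float> data) {
        buffers.push_back(std::move(data));
        return static_cast<BufferHandle>(buffers.size() - 1);
    }

    // Resolve a handle back to its buffer.
    std::vector<float>& At(BufferHandle h) {
        assert(h < buffers.size());
        return buffers[h];
    }
};

// What a kernel would receive: a small block of plain values, no bindings.
struct SaxpyArgs {
    BufferHandle  x;
    BufferHandle  y;
    float         a;
    std::uint32_t count;
};

// Stand-in for a kernel dispatch: y[i] = a * x[i] + y[i] for each element.
void DispatchSaxpy(DescriptorHeap& heap, const SaxpyArgs& args) {
    std::vector<float>& x = heap.At(args.x);
    std::vector<float>& y = heap.At(args.y);
    assert(args.count <= x.size() && args.count <= y.size());
    for (std::uint32_t i = 0; i < args.count; ++i) {
        y[i] = args.a * x[i] + y[i];
    }
}
```

The design point is that adding a new kernel does not add a new binding layout. Only the argument struct changes, and the heap stays the same for every dispatch.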
Cryptography
Post-quantum signatures accelerated on GPU. Dilithium-3 (ML-DSA-65) batch verification at 1.26M ops/sec. SHAKE-256 hashing, Merkle trees, KMAC-256 — all at hardware speed.
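For readers unfamiliar with Merkle trees, the following sketch shows how a root is folded from a list of leaves. It is an illustration under stated assumptions, not Oa's implementation: the function names are hypothetical, the hash is 64-bit FNV-1a standing in for SHAKE-256 (it is not cryptographic), an odd node is paired with a copy of itself, and leaf/interior domain separation is omitted.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Placeholder hash (64-bit FNV-1a). NOT cryptographic; it only stands in
// for SHAKE-256 so the sketch stays self-contained.
std::uint64_t PlaceholderHash(const std::uint8_t* data, std::size_t size) {
    std::uint64_t h = 14695981039346656037ull;  // FNV offset basis
    for (std::size_t i = 0; i < size; ++i) {
        h ^= data[i];
        h *= 1099511628211ull;                  // FNV prime
    }
    return h;
}

// Hash the raw bytes of one leaf.
std::uint64_t HashLeaf(const std::string& leaf) {
    return PlaceholderHash(
        reinterpret_cast<const std::uint8_t*>(leaf.data()), leaf.size());
}

// Hash two child digests, serialized little-endian as left then right.
std::uint64_t HashPair(std::uint64_t left, std::uint64_t right) {
    std::uint8_t bytes[16];
    for (int i = 0; i < 8; ++i) {
        bytes[i]     = static_cast<std::uint8_t>(left  >> (8 * i));
        bytes[8 + i] = static_cast<std::uint8_t>(right >> (8 * i));
    }
    return PlaceholderHash(bytes, sizeof bytes);
}

// Fold the leaves level by level until a single root digest remains.
std::uint64_t MerkleRoot(const std::vector<std::string>& leaves) {
    assert(!leaves.empty());
    std::vector<std::uint64_t> level;
    for (const std::string& leaf : leaves) {
        level.push_back(HashLeaf(leaf));
    }
    while (level.size() > 1) {
        // Assumption: an odd node is paired with a copy of itself.
        if (level.size() % 2 != 0) {
            level.push_back(level.back());
        }
        std::vector<std::uint64_t> parent;
        for (std::size_t i = 0; i < level.size(); i += 2) {
            parent.push_back(HashPair(level[i], level[i + 1]));
        }
        level = std::move(parent);
    }
    return level[0];
}
```

Each level's pair hashes are independent of one another, which is what makes the structure a natural fit for batch execution on a GPU.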
Integration
Machine learning, cryptography, and general compute share the same substrate. One library. One binary. The ML training loop and the blockchain validator use the same OaEngine.
Cross-Vendor
Native GPU compute is the only path. No CUDA. No ROCm. No vendor lock-in. The same kernel binary executes on every GPU that exposes native compute, which is all of them.
| Vendor | Status |
|---|---|
| NVIDIA (discrete) | Validated |
| Intel (integrated) | Validated |
| AMD (discrete + integrated) | Supported |
| Qualcomm (mobile) | Supported |
| Apple (via Metal translation) | Supported |
| lavapipe (CI software) | Validated |