High Performance Compute

The Oa Library

A hand-crafted C++ compute library. GPU acceleration, machine learning primitives, post-quantum cryptography, and networking in a single static binary. No vendor SDK. No framework. No dependencies.

5.1x GPU speedup vs std
5+ GPU vendors
0 dependencies
1 binary

Three Lines to GPU Compute

Every binary starts by creating an OaEngine. It owns the GPU device, memory allocator, pipeline registry, and stream pool. Nothing else is needed.

Main.cpp

#include <oa/runtime/engine.h>

int main() {
    auto rt = OaEngine::Create({.AppName = "MyApp"}).Unwrap();
    // GPU compute is ready: tensors, kernels, everything.
    rt.Destroy();
}
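The sample above calls Destroy() by hand, so an early return or an exception between Create and Destroy would skip the cleanup. The sketch below shows one possible way to guard against that with a small scope guard. It is not part of the Oa library: `ScopedDestroy`, `FakeEngine`, and `DestroyCallsAfterEarlyExit` are hypothetical names introduced only for illustration, and whether OaEngine already releases its resources in its destructor is not stated in the source.

```cpp
#include <cassert>
#include <utility>

// Hypothetical scope guard (not an Oa API): calls Destroy() on the wrapped
// object when the guard leaves scope. Works with any movable type that
// exposes a Destroy() member function.
template <typename T>
class ScopedDestroy {
public:
    explicit ScopedDestroy(T value) : value_(std::move(value)) {}
    ~ScopedDestroy() { value_.Destroy(); }

    // Non-copyable so Destroy() runs exactly once per guarded object.
    ScopedDestroy(const ScopedDestroy&) = delete;
    ScopedDestroy& operator=(const ScopedDestroy&) = delete;

    T& operator*() { return value_; }
    T* operator->() { return &value_; }

private:
    T value_;
};

// Hypothetical stand-in for the engine so the sketch compiles on its own.
struct FakeEngine {
    int* destroyCount;  // counts Destroy() calls for the demonstration
    void Destroy() { ++*destroyCount; }
};

// Returns how many times Destroy() ran once the guarded scope was left.
int DestroyCallsAfterEarlyExit() {
    int count = 0;
    {
        ScopedDestroy<FakeEngine> rt(FakeEngine{&count});
        assert(count == 0);  // the engine is still alive inside the scope
        // An early return or a thrown exception here would still run Destroy().
    }
    return count;
}
```

With the real engine, the same guard would wrap the result of `OaEngine::Create(...).Unwrap()`, assuming that type is movable, which the source does not confirm.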

Four Pillars

Memory

Inline assembly optimizations. Zero-copy streaming buffers. No framework overhead, no dispatch layer: your data hits the hardware directly. The AVX2 memcpy outperforms std::memcpy by 14.87x on large buffers.
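To make the AVX2 copy idea concrete, here is a minimal sketch of a 32-byte-wide copy loop. It uses compiler intrinsics rather than the inline assembly the library describes, it is not the library's implementation, and it makes no claim to reproduce the 14.87x figure. `CopyAvx2`, `CopyBytes`, and `CopyRoundTrips` are hypothetical names. The AVX2 path assumes GCC or Clang on x86-64; on any other target the sketch falls back to std::memcpy.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
#include <immintrin.h>
#define SKETCH_HAS_AVX2_PATH 1

// AVX2 is enabled for this function only, so the file builds without
// -mavx2. Callers must confirm CPU support before calling it.
__attribute__((target("avx2")))
static void CopyAvx2(void* dst, const void* src, std::size_t n) {
    auto* d = static_cast<std::uint8_t*>(dst);
    const auto* s = static_cast<const std::uint8_t*>(src);
    std::size_t i = 0;
    // Move 32 bytes per iteration with unaligned loads and stores.
    for (; i + 32 <= n; i += 32) {
        __m256i v = _mm256_loadu_si256(reinterpret_cast<const __m256i*>(s + i));
        _mm256_storeu_si256(reinterpret_cast<__m256i*>(d + i), v);
    }
    // Copy the remaining tail (fewer than 32 bytes).
    std::memcpy(d + i, s + i, n - i);
}
#endif

// Dispatches to the AVX2 loop when the CPU supports it, else std::memcpy.
// The buffers must not overlap, as with std::memcpy.
void CopyBytes(void* dst, const void* src, std::size_t n) {
#ifdef SKETCH_HAS_AVX2_PATH
    if (__builtin_cpu_supports("avx2")) {
        CopyAvx2(dst, src, n);
        return;
    }
#endif
    std::memcpy(dst, src, n);
}

// Fills a buffer with a byte pattern, copies it, and checks for equality.
bool CopyRoundTrips(std::size_t n) {
    assert(n > 0);
    std::vector<std::uint8_t> src(n), dst(n, 0);
    for (std::size_t i = 0; i < n; ++i) {
        src[i] = static_cast<std::uint8_t>(i * 31u + 7u);
    }
    CopyBytes(dst.data(), src.data(), n);
    return src == dst;
}
```

The sketch only demonstrates correctness of the wide copy loop. Any speedup over std::memcpy depends on buffer size, alignment, and the platform's own memcpy, so it should be measured rather than assumed.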

Compute Kernels

Slang compute kernels compile once and run on GPUs from every supported vendor, including NVIDIA, AMD, Intel, and Qualcomm. Desktop, server, mobile. One binary, everywhere. All kernels use a bindless descriptor model with a single global heap.

Cryptography

Post-quantum signatures accelerated on GPU. Dilithium-3 (ML-DSA-65) batch verification at 1.26M ops/sec. SHAKE-256 hashing, Merkle trees, KMAC-256 — all at hardware speed.

Integration

Machine learning, cryptography, and general compute share the same substrate. One library. One binary. The ML training loop and the blockchain validator use the same OaEngine.

Cross-Vendor

Native GPU compute is the only path. No CUDA. No ROCm. No vendor lock-in. The same kernel binary executes on every GPU listed below.

Vendor                          Status
NVIDIA (discrete)               Validated
Intel (integrated)              Validated
AMD (discrete + integrated)     Supported
Qualcomm (mobile)               Supported
Apple (via Metal translation)   Supported
lavapipe (CI software)          Validated