Machine Learning

Train and deploy through one Vulkan compute runtime.

OA is a GPU-first C++ and Python machine-learning stack built around OaMatrix, graph-recorded operators, automatic differentiation, reusable modules, and portable model artifacts.

RNN · GRU · Transformer · Dense and sparse MoE · Mamba-3 · Empyrealm

One execution model

Operators record into the current compute context. Modules compose those operators, OaGradientTape builds the backward pass, and the runtime submits the resulting work to Vulkan. The same public model is used from C++ and the oa.ml Python package.

Autograd

Forward operations register their gradient rules; no separate handwritten training graph is required.

Modules

Linear, embedding, recurrent, attention, Transformer, MoE, VQ, and optimizer components share one parameter contract.

Artifacts

.oam files persist architecture metadata, weights, optimizer state, and checkpoint identity.

Measured training

Training logs distinguish wall time, GPU time, validation loss, throughput, and task-specific quality metrics.

Core API

Type or namespace	Role
`OaMatrix`	Device-backed tensor shared by operators, modules, gradients, Vision, and Audio.
`OaModule`	Parameter and submodule container with recursive serialization.
`OaFnMatrix::*`	Stateless matrix, activation, loss, normalization, quantization, and sequence operations.
`OaGradientTape`	Reverse-mode automatic differentiation over the recorded forward graph.
`OaAdamW`	Optimizer over the recursive parameter list returned by a module.
`OaItTraining`	Training metrics, validation, checkpoints, throughput, and GPU timing.

One module, two front ends

Switching between the C++ and Python front ends changes the example's syntax, not OA's execution model.

Linear.cpp

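// Linear layer with 32 input features and 64 output features.
OaLinear layer(32, 64);
// Random input of shape {8, 32}.
OaMatrix input = OaFnMatrix::RandN(OaMatrixShape{8, 32});
// Forward() records the operator into the current compute context.
OaMatrix output = layer.Forward(input);

// Submit the recorded work, then synchronize with the device.
OaContext::GetDefault().Execute();
OaContext::GetDefault().Sync();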

Verified reference workload

The native NLP suite covers byte, BPE, and character tokenization across RNN, GRU, Transformer, sparse MoE Transformer, and experimental SSM backbones. Every run trains, evaluates, generates text, and verifies checkpoint round-trip behavior. The current Intel Iris Xe reference uses FP32; other precision routes are enabled only when the device reports the required capability and are never assumed.

Applications

Text to motion · USD

OaAlm

Generate prompt-conditioned motion tokens and decode them into a canonical USD skeleton clip through one native OA model bundle.

Android · Physical Vulkan

OA Mobile Lab

Train the same five OA NLP architectures on a physical Adreno GPU, inspect live GPU metrics, and verify generation plus checkpoint replay from one Android application.
