Architecture

OaAdapter — Model Interface

Two routes to build models. One interface to run them all. From PyTorch-style prototyping to batched production dispatch, the adapter bridges any architecture to any application.

Two Routes, One Stack

Every model in Realm can be built two ways. Both produce the same .oam checkpoint, run on the same GPU compute engine, and plug into the same train/chat/serve applications.

Aspect	Easy Route (Level 1)	Production Route (Level 2)
API	`OaTensor` + `OaModule` (layers from `nn.h`) + autograd	`OaModule::Dispatch` + batched command buffers
Style	PyTorch / Keras	Manual GPU programming
Dispatch	One GPU submit per op	~30 ops in one submit
Memory	Automatic (autograd tape)	All buffers allocated in Init, reused
Gradients	Automatic (`loss.Backward()`)	Manual backward shaders
Best for	Prototyping, research, small models	Production training, inference serving
Example	`TinyLlm`	`OaLlm`, `OaGpt`

The Easy Route

Define an OaModule with layers from nn.h. Forward returns a tensor. Autograd records the forward ops on a tape and computes every gradient when you call loss.Backward(). If you know PyTorch, you know this.

Easy.cpp

// Level 1 — The Easy Route
// Define layers, forward, backward, step. Just like PyTorch.

class TinyLlm : public OaModule {
public:
	TinyLlm(OaI32 D, OaI32 DFF, OaI32 NL) {
		Embed_ = OaMakeShared<OaEmbedding>(256, D);
		for (OaI32 i = 0; i < NL; ++i) {
			Norm_.push_back(OaMakeShared<OaRMSNorm>(D));
			Up_.push_back(OaMakeShared<OaLinear>(D, DFF));
			Down_.push_back(OaMakeShared<OaLinear>(DFF, D));
		}
		Head_ = OaMakeShared<OaLinear>(D, 256);
	}

	OaTensor Forward(const OaTensor& x) override {
		auto h = Embed_->Forward(x);
		for (OaI32 i = 0; i < (OaI32)Norm_.size(); ++i) {
			auto r = Norm_[i]->Forward(h);
			r = Up_[i]->Forward(r).SiLU();
			h = h + Down_[i]->Forward(r);
		}
		return Head_->Forward(h);
	}
};

// Training loop. `input` and `target` are batches supplied by the caller (not shown).
auto rt = OaEngine::Create({.AppName = "Train"}).Unwrap();
TinyLlm model(64, 256, 2);
OaAdamW optimizer(model.Parameters(), 3e-4f);

for (int step = 0; step < 1000; ++step) {
	auto loss = OaCrossEntropyLoss(model.Forward(input), target);
	loss.Backward();
	optimizer.Step();
	optimizer.ZeroGrad();
}
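
To make the autograd tape concrete, here is a minimal sketch of reverse-mode differentiation on scalars. It is not Realm's implementation: `Tape`, `Leaf`, `Add`, `Mul`, and this `SiLU` are hypothetical names chosen for illustration, and a real tensor library would record tensor ops and GPU buffers rather than doubles. The idea it shows is the one the Easy Route relies on: each forward op appends a node with its local derivatives, and the backward pass walks the nodes in reverse.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical scalar autograd tape (illustration only, not OaTensor).
struct Tape {
	struct Node {
		int A, B;       // parent node indices, -1 when unused
		double GA, GB;  // local partial derivatives w.r.t. each parent
	};
	std::vector<Node> Nodes;
	std::vector<double> Value;

	int Leaf(double v) { return Push(v, -1, -1, 0.0, 0.0); }
	int Add(int a, int b) { return Push(Value[a] + Value[b], a, b, 1.0, 1.0); }
	int Mul(int a, int b) { return Push(Value[a] * Value[b], a, b, Value[b], Value[a]); }
	int SiLU(int a) {
		double x = Value[a];
		double s = 1.0 / (1.0 + std::exp(-x));
		// d/dx [x * sigmoid(x)] = s * (1 + x * (1 - s))
		return Push(x * s, a, -1, s * (1.0 + x * (1.0 - s)), 0.0);
	}

	// Walk the tape in reverse, accumulating d(out)/d(node) for every node.
	// Parents always have smaller indices than their children, so a single
	// descending pass visits each node after all of its consumers.
	std::vector<double> Backward(int out) const {
		assert(out >= 0 && out < static_cast<int>(Nodes.size()));
		std::vector<double> grad(Nodes.size(), 0.0);
		grad[out] = 1.0;
		for (int i = out; i >= 0; --i) {
			const Node& n = Nodes[i];
			if (n.A >= 0) grad[n.A] += n.GA * grad[i];
			if (n.B >= 0) grad[n.B] += n.GB * grad[i];
		}
		return grad;
	}

private:
	int Push(double v, int a, int b, double ga, double gb) {
		Nodes.push_back({a, b, ga, gb});
		Value.push_back(v);
		return static_cast<int>(Nodes.size()) - 1;
	}
};
```

For y = x*w + x with x = 3 and w = 2, the tape yields dy/dx = w + 1 and dy/dw = x. The cost of this convenience is the per-op bookkeeping that the Production Route removes.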

The Production Route

For maximum throughput: allocate all GPU buffers once in Init(), record every dispatch into a single command buffer, and submit once per step. No per-op submit. No autograd tape. Backward passes are hand-written shaders.

Production.cpp

// Level 2 — The Production Route
// Manual GPU dispatch. Buffers allocated once, reused every step.
// ~30 dispatches batched into a single command buffer submission.

struct OaLlm : public OaModule {
	OaStatus Init(OaEngine& rt, const OaLlmConfig& cfg) {
		LoadShaders(rt, shaderTable);  // embedded SPIR-V
		for (auto& layer : Layers_)
			layer.Init(rt, cfg);       // Alloc all buffers
		InitWeights();                 // Fill via MappedPtr
		return OaStatus::Ok();
	}

	OaStatus Forward(OaEngine& rt) {
		BeginBatch(rt);
		Dispatch(rt, "byte_embed", ...);
		for (auto& layer : Layers_) {
			Dispatch(rt, "rmsnorm", ...);
			Dispatch(rt, "matmul", ...);
			Dispatch(rt, "silu", ...);
			// ... remaining ops; ~30 dispatches total in the batch
		}
		return FlushDispatches(rt);  // one submit, one wait
	}

	OaStatus Backward(OaEngine& rt) { /* mirror of Forward */ }
	OaStatus Step(OaEngine& rt, OaI64 step, OaF32 lr) { /* AdamW */ }
};
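
The difference between the two dispatch styles can be sketched without a GPU. The block below is an assumption-laden illustration, not Realm code: `MockQueue`, `RunPerOp`, and `Batch` are hypothetical names, and the queue only counts submissions instead of executing shaders. It mirrors the BeginBatch / Dispatch / FlushDispatches shape shown above.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a GPU queue: it only counts what it receives.
struct MockQueue {
	std::size_t Submits = 0;   // number of queue submissions
	std::size_t Executed = 0;  // number of commands received in total
	void Submit(const std::vector<std::string>& commands) {
		++Submits;
		Executed += commands.size();
	}
};

// Level 1 style: every op is submitted on its own.
inline void RunPerOp(MockQueue& q, const std::vector<std::string>& ops) {
	for (const auto& op : ops)
		q.Submit(std::vector<std::string>{op});
}

// Level 2 style: record every dispatch, then submit the whole list once.
class Batch {
public:
	void Begin() { Commands_.clear(); }
	void Dispatch(std::string shader) { Commands_.push_back(std::move(shader)); }
	void Flush(MockQueue& q) {
		if (!Commands_.empty())
			q.Submit(Commands_);
		Commands_.clear();
	}

private:
	std::vector<std::string> Commands_;
};
```

With four ops, `RunPerOp` produces four submissions while `Batch` produces one carrying the same four commands. On real hardware each submission also implies driver work and usually a synchronization point, which is the overhead the batched route is designed to avoid.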

The Adapter

Models from either route are exposed to applications through OaAdapterLm, a unified interface for train, generate, save, and load. The application layer never touches model internals. A factory creates the right adapter from an architecture name.

Adapter.h

// OaAdapter — The Bridge
// Unified interface between app (train/chat) and any model architecture.

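struct OaAdapter {
	// Virtual destructor so deleting a derived adapter through
	// OaUnique<OaAdapter> is well-defined.
	virtual ~OaAdapter() = default;
	virtual OaStatus Init(OaEngine& rt) = 0;
	virtual void Destroy() = 0;
	virtual OaStringView GetArchitecture() const = 0;
};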

struct OaAdapterLm : OaAdapter {
	virtual OaStatus Train(OaEngine& rt, const OaU8* data, ...) = 0;
	virtual OaStatus Generate(OaEngine& rt, ...) = 0;
	virtual OaStatus Save(const OaString& path) = 0;
	virtual OaStatus Load(OaEngine& rt, const OaString& path) = 0;
};

// Factory — architecture selection by name.
OaUnique<OaAdapter> OaCreateAdapter(OaStringView arch);
// Returns OaAdapterLlm for "OaLlm", OaAdapterGpt for "OaGpt", etc.
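
One plausible way to implement a name-based factory is a lookup table from architecture name to constructor. The sketch below uses standard-library types in place of Realm's (`std::unique_ptr` for `OaUnique`, `std::string` for `OaStringView`); `Adapter`, `AdapterLlm`, `AdapterGpt`, `Registry`, and `CreateAdapter` are hypothetical stand-ins, and returning a null pointer for an unknown name is an assumption, since the source does not say how `OaCreateAdapter` reports that case.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <memory>
#include <string>

// Hypothetical stand-ins; the real OaAdapter declares more methods.
struct Adapter {
	virtual ~Adapter() = default;
	virtual std::string GetArchitecture() const = 0;
};
struct AdapterLlm : Adapter {
	std::string GetArchitecture() const override { return "OaLlm"; }
};
struct AdapterGpt : Adapter {
	std::string GetArchitecture() const override { return "OaGpt"; }
};

using AdapterCtor = std::function<std::unique_ptr<Adapter>()>;

// Name -> constructor table. Supporting a new architecture means adding one row.
inline const std::map<std::string, AdapterCtor>& Registry() {
	static const std::map<std::string, AdapterCtor> table = {
		{"OaLlm", []() -> std::unique_ptr<Adapter> { return std::make_unique<AdapterLlm>(); }},
		{"OaGpt", []() -> std::unique_ptr<Adapter> { return std::make_unique<AdapterGpt>(); }},
	};
	return table;
}

// Assumption: an unknown name yields nullptr, so callers must check the result.
inline std::unique_ptr<Adapter> CreateAdapter(const std::string& arch) {
	const auto& table = Registry();
	auto it = table.find(arch);
	if (it == table.end())
		return nullptr;
	return it->second();
}
```

A table keeps the application free of per-architecture branches: the only place that names concrete adapter types is the registry itself.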

Plug and Play

The train and chat binaries are architecture-agnostic. Pass --arch OaLlm or --arch OaGpt and the factory wires everything up. Same binary, any model.

Train.cpp

// The app doesn't care which model is underneath.
// Same train binary runs any architecture.

struct TrainApp : OaVkApp {
	OaUnique<OaAdapter> Adapter;
	OaAdapterLm* Model = nullptr;

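	OaStatus Init() override {
		// archName comes from the --arch flag (parsing not shown).
		Adapter = OaCreateAdapter(archName);
		// Unchecked downcast: assumes the factory returned a non-null
		// language-model adapter for this name. Validate before use in real code.
		Model = static_cast<OaAdapterLm*>(Adapter.get());
		return Model->Init(Rt);
	}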

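	OaStatus Tick() override {
		// batch, step, and checkpointPath are app state (declarations not shown).
		auto status = Model->Train(Rt, batch, ...);
		if (step % 500 == 0)
			Model->Save(checkpointPath);  // note: Save's status is dropped here
		return status;
	}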
};

// One binary. Any architecture. Plug and play.
// ./train --arch OaLlm -d data.txt --d-model 384 --layers 6
// ./train --arch OaGpt -d data.txt --d-model 768 --layers 12
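
The command lines above could be turned into a config struct along these lines. This is a sketch under stated assumptions, not the train binary's actual parser: `TrainOptions` and `ParseTrainArgs` are hypothetical, only the four flags shown above are handled, and the zero defaults are placeholders rather than Realm's real defaults.

```cpp
#include <cassert>
#include <cstddef>
#include <exception>
#include <string>
#include <vector>

// Hypothetical options struct; field names and defaults are illustrative.
struct TrainOptions {
	std::string Arch;      // --arch
	std::string DataPath;  // -d
	int DModel = 0;        // --d-model
	int Layers = 0;        // --layers
};

// Parses "flag value" pairs. Returns false on an unknown flag, a missing
// value, a non-integer number, or when --arch was never given.
inline bool ParseTrainArgs(const std::vector<std::string>& args, TrainOptions& out) {
	for (std::size_t i = 0; i < args.size(); ++i) {
		const std::string& flag = args[i];
		if (i + 1 >= args.size())
			return false;  // every flag handled here takes a value
		const std::string& value = args[++i];
		try {
			if (flag == "--arch") out.Arch = value;
			else if (flag == "-d") out.DataPath = value;
			else if (flag == "--d-model") out.DModel = std::stoi(value);
			else if (flag == "--layers") out.Layers = std::stoi(value);
			else return false;  // unknown flag
		} catch (const std::exception&) {
			return false;  // std::stoi rejected the value
		}
	}
	return !out.Arch.empty();
}
```

The parsed `Arch` string is what would be handed to the factory, so the rest of the application never needs to branch on the architecture.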

Data Flow

Layer	Component	What it does
Application	`TrainApp`, `ChatApp`	CLI, data loading, main loop
Adapter	`OaAdapterLm`	Train / Generate / Save / Load interface
Model	`OaModule`	Forward / Backward / Step (Level 1 or 2)
Engine	`OaEngine`	GPU dispatch, pipeline cache, memory