Architectures

OaGptOss — Transformer

GPT-2-style transformer running on Vulkan GPU compute, built as a direct comparison target for nanoGPT. It uses a byte-level vocabulary (256 tokens) and Level 1 autograd dispatch.
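With a byte-level vocabulary, tokenization reduces to taking the raw bytes of the text. Every token ID falls in 0..255 and no merge table is needed. The Python below is a minimal sketch that assumes UTF-8 input text; `encode`, `decode`, and `VOCAB_SIZE` are hypothetical names, not part of OaGptOss.

```python
from typing import List

VOCAB_SIZE = 256  # one token per possible byte value

def encode(text: str) -> List[int]:
    # Each UTF-8 byte becomes one token ID in [0, 255]; no merge table is needed.
    return list(text.encode("utf-8"))

def decode(ids: List[int]) -> str:
    # Rebuild the bytes; an incomplete multi-byte character decodes to U+FFFD.
    return bytes(ids).decode("utf-8", errors="replace")
```

A non-ASCII character costs more than one token: `encode("hé")` gives `[104, 195, 169]`. Because generation also proceeds byte by byte, a sampled sequence can end in the middle of a character, which is why `decode` uses `errors="replace"`.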

Architecture

Standard GPT-2 transformer with pre-norm (LayerNorm before attention and FFN):

TokenEmbedding(256, D) + PositionalEmbedding(SeqLen, D)
→ N × [ LayerNorm → MultiHeadAttention → +res
         LayerNorm → Linear(D→4D) → GELU → Linear(4D→D) → +res ]
→ LayerNorm → Linear(D→256) → CrossEntropy
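The pre-norm wiring in the diagram (LayerNorm applied before each sublayer, residual added after it) can be sketched in plain Python on lists of per-position vectors. This is a simplified illustration under stated assumptions, not the OaGptOss compute shaders: attention is single-head with identity Q/K/V projections and no output projection, LayerNorm has no learned gain or bias, FFN biases are omitted, and every function name is hypothetical. GELU uses the tanh approximation from GPT-2.

```python
import math
from typing import Callable, List

Vec = List[float]   # one position's D features
Seq = List[Vec]     # a sequence of positions

def layer_norm(x: Vec, eps: float = 1e-5) -> Vec:
    # Normalize to zero mean / unit variance (learned gain and bias omitted).
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]

def gelu(v: float) -> float:
    # tanh approximation of GELU, as in GPT-2.
    return 0.5 * v * (1.0 + math.tanh(math.sqrt(2.0 / math.pi) * (v + 0.044715 * v ** 3)))

def causal_attention(xs: Seq) -> Seq:
    # Single-head scaled dot-product attention with identity Q/K/V projections.
    d = len(xs[0])
    out = []
    for t, q in enumerate(xs):
        # Causal mask: position t only attends to positions 0..t.
        scores = [sum(a * b for a, b in zip(q, xs[s])) / math.sqrt(d) for s in range(t + 1)]
        m = max(scores)
        w = [math.exp(sc - m) for sc in scores]  # stable softmax numerators
        z = sum(w)
        out.append([sum(w[s] * xs[s][i] for s in range(t + 1)) / z for i in range(d)])
    return out

def make_ffn(w1: List[Vec], w2: List[Vec]) -> Callable[[Vec], Vec]:
    # w1 has 4D rows of length D (D -> 4D); w2 has D rows of length 4D (4D -> D). Biases omitted.
    def ffn(x: Vec) -> Vec:
        h = [gelu(sum(w * v for w, v in zip(row, x))) for row in w1]
        return [sum(w * v for w, v in zip(row, h)) for row in w2]
    return ffn

def pre_norm_block(xs: Seq, attn: Callable[[Seq], Seq], ffn: Callable[[Vec], Vec]) -> Seq:
    # x = x + Attn(LN(x))
    attended = attn([layer_norm(x) for x in xs])
    xs = [[a + b for a, b in zip(x, y)] for x, y in zip(xs, attended)]
    # x = x + FFN(LN(x))
    return [[a + b for a, b in zip(x, ffn(layer_norm(x)))] for x in xs]
```

Because of the causal mask, the output at position t depends only on positions 0..t, which is what allows the model to be trained on every position in parallel and then sampled one byte at a time.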

Model Sizes

Size    D     Heads  Layers  Params  VRAM (FP32)
atom    192   6      6       ~9.5M   ~150 MB
small   768   12     12      ~85M    ~1.3 GB
medium  1024  16     24      ~303M   ~4.6 GB
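The Params column for the small and medium sizes can be checked by counting the weights in the architecture diagram. Each block holds about 12 x D² weights (4 x D² in attention, 8 x D² in the FFN), so small gives 12 x 12 x 768² ≈ 85M and medium gives 24 x 12 x 1024² ≈ 302M before embeddings. The VRAM column is roughly consistent with 16 bytes per parameter, that is, FP32 weights, gradients, and AdamW's two moment buffers. That breakdown is an assumption, and activations are not counted. In the sketch below, the `seq_len` value and the untied output head with a bias are also assumptions, and both function names are hypothetical.

```python
def gpt2_param_count(d: int, layers: int, seq_len: int, vocab: int = 256) -> int:
    # Per block: attention QKV (D x 3D + 3D) and output projection (D x D + D),
    # FFN D->4D (4D^2 + 4D) and 4D->D (4D^2 + D), plus two LayerNorms (4D).
    per_block = 12 * d * d + 13 * d
    embeddings = vocab * d + seq_len * d  # token + positional tables
    final_ln = 2 * d
    head = d * vocab + vocab              # assumes an untied Linear(D->256) with bias
    return layers * per_block + embeddings + final_ln + head

def fp32_training_bytes(params: int) -> int:
    # Weights + gradients + AdamW first and second moments, 4 bytes each.
    # Activations and workspace buffers are NOT included.
    return params * 4 * 4
```

For example, `gpt2_param_count(768, 12, 1024)` returns 86,235,904 with SeqLen = 1024 assumed, and `fp32_training_bytes(9_500_000)` returns 152,000,000 bytes, close to the ~150 MB listed for atom.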

nanoGPT Comparison

OaGptOss is a direct nanoGPT equivalent on Vulkan GPU compute: the same architecture (GPT-2), the same training procedure (AdamW with a cosine learning-rate schedule), and a byte-level vocabulary. All parameters are OaDeviceMatrix instances; there is no separate tensor type.
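The cosine learning-rate schedule can be sketched as below. It follows the shape nanoGPT uses (linear warmup, cosine decay to a floor, then constant). Whether OaGptOss includes the warmup phase, and every hyperparameter value, are assumptions, and `cosine_lr` is a hypothetical name.

```python
import math

def cosine_lr(step: int, max_lr: float, min_lr: float, warmup_steps: int, decay_steps: int) -> float:
    # Assumes 0 < warmup_steps < decay_steps.
    # Linear warmup from max_lr / warmup_steps up to max_lr.
    if step < warmup_steps:
        return max_lr * (step + 1) / warmup_steps
    # After decay_steps, hold at the floor.
    if step >= decay_steps:
        return min_lr
    # Cosine decay from max_lr down to min_lr between warmup_steps and decay_steps.
    progress = (step - warmup_steps) / (decay_steps - warmup_steps)
    coeff = 0.5 * (1.0 + math.cos(math.pi * progress))  # goes from 1 to 0
    return min_lr + coeff * (max_lr - min_lr)
```

With `max_lr=6e-4`, `min_lr=6e-5`, `warmup_steps=100`, and `decay_steps=1000`, step 550 lands halfway through the decay and returns about 3.3e-4, the midpoint of `max_lr` and `min_lr`.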

Training

Training uses Level 1 autograd dispatch. Parameters are OaDeviceMatrix instances with SetRequiresGrad(true). OaFnGrad::Backward propagates gradients from the scalar cross-entropy loss, and OaAdamW applies the parameter updates in GPU compute shaders.
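The math behind the first and last steps of that pipeline can be sketched on the CPU. The gradient of cross-entropy with respect to the logits is softmax(logits) - one_hot(target), and AdamW applies bias-corrected Adam moments with decoupled weight decay. The Python below is a scalar, single-sample sketch with hypothetical names and PyTorch-like default hyperparameters. It is not the OaFnGrad or OaAdamW API.

```python
import math
from typing import List, Tuple

def cross_entropy(logits: List[float], target: int) -> Tuple[float, List[float]]:
    # Numerically stable log-softmax: subtract the max logit before exponentiating.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    loss = -(logits[target] - m - math.log(total))
    # dLoss/dlogits = softmax(logits) - one_hot(target)
    grad = [e / total for e in exps]
    grad[target] -= 1.0
    return loss, grad

class AdamWState:
    # Per-parameter first (m) and second (v) moment buffers plus the step counter.
    def __init__(self, n: int) -> None:
        self.m = [0.0] * n
        self.v = [0.0] * n
        self.t = 0

def adamw_step(params: List[float], grads: List[float], state: AdamWState,
               lr: float = 1e-3, beta1: float = 0.9, beta2: float = 0.999,
               eps: float = 1e-8, weight_decay: float = 0.01) -> None:
    state.t += 1
    bc1 = 1.0 - beta1 ** state.t  # bias correction for m
    bc2 = 1.0 - beta2 ** state.t  # bias correction for v
    for i, g in enumerate(grads):
        state.m[i] = beta1 * state.m[i] + (1.0 - beta1) * g
        state.v[i] = beta2 * state.v[i] + (1.0 - beta2) * g * g
        m_hat = state.m[i] / bc1
        v_hat = state.v[i] / bc2
        # Decoupled weight decay: applied to the weight directly, not folded into g.
        params[i] -= lr * (m_hat / (math.sqrt(v_hat) + eps) + weight_decay * params[i])
```

For a uniform prediction over the 256 bytes, the loss is ln 256 ≈ 5.545, which is a useful sanity check for the first training steps of a byte-level model.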

Inference

Generation is autoregressive, one byte at a time, with temperature sampling. The context window is SeqLen tokens, and longer contexts are truncated automatically.
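The generation loop can be sketched as follows. The sketch assumes a model callable that maps a context of byte IDs to 256 next-byte logits, and it assumes truncation keeps the most recent SeqLen bytes, the usual choice and the one nanoGPT's `generate` makes. `Model`, `generate`, and `sample_next` are hypothetical names, not the OaGptOss API.

```python
import math
import random
from typing import Callable, List

# Hypothetical model interface: context of byte IDs -> 256 next-byte logits.
Model = Callable[[List[int]], List[float]]

def sample_next(logits: List[float], temperature: float, rng: random.Random) -> int:
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    # Dividing by temperature sharpens (<1) or flattens (>1) the distribution.
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    weights = [math.exp(z - m) for z in scaled]  # unnormalized softmax
    return rng.choices(range(len(weights)), weights=weights, k=1)[0]

def generate(model: Model, prompt: List[int], max_new: int, seq_len: int,
             temperature: float = 1.0, seed: int = 0) -> List[int]:
    rng = random.Random(seed)
    tokens = list(prompt)
    for _ in range(max_new):
        # Assumed truncation policy: keep only the most recent seq_len bytes.
        context = tokens[-seq_len:]
        tokens.append(sample_next(model(context), temperature, rng))
    return tokens
```

Lower temperatures make the output more repetitive and predictable; higher temperatures make it more varied but noisier. Because the context is re-truncated on every step, the model never sees more than `seq_len` bytes regardless of how long the output grows.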