One verified stack. Pick your model, pick your trust boundary.
Every path below runs through the same from-raw Helix toolchain — a compiler rebuildable
from 299 hand-typed bytes, emitting the kernels that execute the model. What differs is the
model size, the hardware path, and exactly where the trusted-once boundary sits. Every
number on this page is transcribed from a committed fail-closed gate output
(docs/HELIX_GPT2_DEMO_RUNBOOK.md + the gate logs it cites);
where a number does not exist yet, the cell says not yet measured — never an estimate.
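The "fail-closed, never an estimate" rule above can be sketched in a few lines. This is a minimal illustration, not the repository's gate code: the log format (a literal `GATE PASS` line plus `key=value` fields) and the names `table_cell` and `GateFailed` are hypothetical.

```python
from typing import Optional

NOT_MEASURED = "not yet measured"


class GateFailed(Exception):
    """Raised when a gate log exists but does not record an explicit pass."""


def table_cell(gate_log: Optional[str], value_key: str) -> str:
    """Return a table cell taken from a gate log, or the 'not yet measured' marker.

    gate_log:  full text of a gate log (hypothetical format), or None if the
               gate has not been run yet.
    value_key: name of a hypothetical 'key=value' field to transcribe.
    """
    if gate_log is None:
        return NOT_MEASURED  # no gate output: never substitute an estimate

    lines = [line.strip() for line in gate_log.splitlines()]
    if "GATE PASS" not in lines:
        # Fail closed: anything other than an explicit pass is a failure.
        raise GateFailed("gate log has no explicit pass line")

    for line in lines:
        key, sep, value = line.partition("=")
        if sep and key.strip() == value_key:
            return value.strip()  # transcribed verbatim, not recomputed

    # The gate passed but did not report this quantity.
    return NOT_MEASURED
```

The design choice worth noting is that a missing value and a failed gate are handled differently: the first yields a visible "not yet measured" cell, the second stops the page from being built at all.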
The paths, one by one
gpt2_gpu_mvp.sh · gpt2_demo_attest.sh
… config.json), and still matches the oracle token-for-token.
gpt2_scale.sh (evidence: scripts/scale_results.txt)
gpt2_scale.sh · helix_serve_gate.sh (scripts/_gate_run.log)
gpt2_cpu_parity.sh
scripts/llama_model_gate.sh (G-L1/G-L2). docs/HELIX_LLAMA_PLAN.md
Measured, side by side
Same columns as the proof page's table, extended with the in-progress row. Every green pill cites a committed fail-closed gate; amber means the gates are still running.
| Model · path | Params · layers | Parity vs independent oracle | max-abs logit diff | Measured speed | Trusted-once boundary |
|---|---|---|---|---|---|
| GPT-2 124M · GPU | 124M · 12 | argmax 262 EXACT 25/25 ids | 2.59e-04 (logits ~130) | seconds (warm), per gpt2_gpu_mvp.sh | hand-auditable to PTX; ptxas/driver trusted-once |
| GPT-2-Large 774M · GPU | 774M · 36 | argmax 262 25/25 ids | 3.8e-05 | not separately timed (same kernels), per gpt2_scale.sh | same as 124M GPU; zero new ops at scale |
| GPT-2-XL 1.5B · GPU (the chat model) | 1.5B · 48 | argmax 262 25/25 ids served == offline | 4.4e-05 | ≈ 9.8 s/token (195.5 s / 20 tok, serve gate) | same as 124M GPU — fits the 8 GB sm_86 box at fp32 |
| GPT-2 124M · CPU no-ptxas | 124M · 12 | argmax 262 == oracle token-for-token (measured) | 2.75e-04 (block-0 hidden: 1.144e-04) | ≈ 130 s/token — slow by design | no GPU boundary at all — zero trusted arithmetic above the seed (shared host TCB disclosed) |
| SmolLM2-135M · GPU Llama-arch, NEW | 135M · 30 (per config) | token-for-token 25/25; argmax-exact, max-abs 4.9e-05 / 49,152 | 3.2e-05 (layer-0 parity) | not yet measured | same as the GPU paths; verified to PTX, then ptxas/driver trusted-once |
Sources: scripts/gpt2_gpu_mvp.sh, scripts/gpt2_scale.sh
(+ committed PRIMARY-mode evidence in scripts/scale_results.txt),
scripts/gpt2_cpu_parity.sh, scripts/helix_serve_gate.sh
(scripts/_gate_run.log), scripts/llama_ops_parity.sh (SmolLM2 G-L0;
full-model gates in progress — docs/HELIX_LLAMA_PLAN.md). fp32 everywhere; greedy
decoding; the oracle is an independent numpy fp32 implementation of each model's spec. No
cuBLAS/vendor comparisons are claimed anywhere — that is not what this stack is for.
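The two parity columns in the table (greedy ids matched, and max-abs logit diff) can be illustrated with a short sketch. It assumes both implementations have already produced one logit vector per decoding step; the function names are hypothetical and this is not the gate scripts' code. Plain Python lists stand in for the oracle's numpy arrays to keep the example dependency-free.

```python
from typing import Dict, List, Sequence


def argmax(xs: Sequence[float]) -> int:
    """Index of the largest logit (first index wins on ties): the greedy token id."""
    return max(range(len(xs)), key=lambda i: xs[i])


def max_abs_diff(a: Sequence[float], b: Sequence[float]) -> float:
    """Largest element-wise absolute difference between two logit vectors."""
    if len(a) != len(b):
        raise ValueError("logit vectors must have the same length")
    return max(abs(x - y) for x, y in zip(a, b))


def parity_report(stack_logits: List[Sequence[float]],
                  oracle_logits: List[Sequence[float]]) -> Dict[str, object]:
    """Compare per-step logits from the stack under test against the oracle."""
    if len(stack_logits) != len(oracle_logits):
        raise ValueError("both runs must cover the same number of steps")
    stack_ids = [argmax(step) for step in stack_logits]
    oracle_ids = [argmax(step) for step in oracle_logits]
    return {
        "ids_matched": sum(s == o for s, o in zip(stack_ids, oracle_ids)),
        "ids_total": len(oracle_ids),
        "token_for_token": stack_ids == oracle_ids,
        "max_abs_logit_diff": max(
            max_abs_diff(s, o) for s, o in zip(stack_logits, oracle_logits)
        ),
    }


# Toy data: two steps over a three-token vocabulary.
report = parity_report(
    [[0.1, 2.0, -1.0], [3.0, 0.5, 0.25]],
    [[0.1, 2.0, -1.0], [3.0, 0.5, 0.5]],
)
```

On the toy data both runs pick ids `[1, 0]`, so the report shows 2/2 ids matched with a max-abs diff of 0.25. A table cell such as "25/25 ids" alongside "2.59e-04" reports the same two quantities.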
"Trusted once" — what each boundary actually means
Verification is only honest if you can say exactly where it stops. These are the stops, in plain language (the full residuals list lives on the proof page).
GPU paths · TO PTX
Everything from the 299 hand-typed bytes up to the PTX text of the kernels is
rebuildable and hand-auditable (hex0 → seed → kovc → PTX). Below PTX,
NVIDIA's closed ptxas assembler, the GPU driver, and the C CUDA-FFI launcher
are trusted-once: audited at the interface, not rebuilt from source. That boundary
is stated on every page — "complete to PTX, not to SASS."
CPU path · NO VENDOR BOUNDARY
The no-ptxas path removes the GPU vendor entirely: every arithmetic operation in the forward pass runs in code compiled by the from-raw toolchain. Zero trusted arithmetic above the seed. The price is honesty's favorite number on this site: ≈ 130 s/token.
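The per-token speeds on this page are wall-clock totals divided by token counts; the serve-gate row, for instance, reports 195.5 s for 20 tokens. A minimal sketch of that conversion and its inverse follows, with hypothetical helper names. It only converts units of numbers already stated on the page.

```python
def seconds_per_token(total_seconds: float, tokens: int) -> float:
    """Wall-clock seconds divided by the number of generated tokens."""
    if tokens <= 0:
        raise ValueError("tokens must be positive")
    return total_seconds / tokens


def tokens_per_second(s_per_token: float) -> float:
    """The same speed expressed as a rate."""
    if s_per_token <= 0:
        raise ValueError("s_per_token must be positive")
    return 1.0 / s_per_token


serve_gate = seconds_per_token(195.5, 20)     # 9.775, shown on the page as ≈ 9.8
serve_rate = tokens_per_second(serve_gate)    # about 0.102 tokens/s
cpu_rate = tokens_per_second(130.0)           # about 0.0077 tokens/s
```

The GPU and CPU figures come from different models (1.5B and 124M), so the two rates are not a like-for-like comparison.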
Both paths · SHARED HOST TCB
Below the audited seed, the usual platform is still trusted: OS / kernel / gcc (for the bootstrap harness) / libc / binutils / loader / CPU + microcode / RAM. Disclosed, not hidden — the claim has always been about the compute stack above the seed. The independent oracle also shares each model's spec (not its code): it catches implementation bugs, not a shared misunderstanding of the architecture.