The model you know, on a stack you can verify.
Trust chain — from 299 bytes to the sentence above
Every link is rebuilt, not asserted. Hover any hash for its full 64-character value.
Gates — fail-closed, every number from a real run
Each tile is a gate the wrapper composes. A red line anywhere blocks the attestation — every tile here is green.
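As an illustration of what fail-closed composition means, here is a minimal sketch. It assumes each gate can be modeled as a callable that returns True only on an explicit pass. The `attest` helper and the gate names are hypothetical, not the wrapper's actual interface. The point it shows is that a crash or any non-True result counts as red and blocks the attestation.

```python
from typing import Callable, Dict, List, Tuple


def attest(gates: Dict[str, Callable[[], bool]]) -> Tuple[bool, List[str]]:
    """Run every gate; attest only if each one explicitly returns True."""
    red: List[str] = []
    for name, gate in gates.items():
        try:
            # Anything other than a literal True is treated as a red line.
            passed = gate() is True
        except Exception:
            # A crashing gate fails closed, never open.
            passed = False
        if not passed:
            red.append(name)
    return (len(red) == 0, red)


# Hypothetical gates, for illustration only.
all_green = {"parity": lambda: True, "fence": lambda: True}
one_red = {"parity": lambda: True, "fence": lambda: None}

print(attest(all_green))
print(attest(one_red))
```

Treating a missing or ambiguous result as a failure is the design choice that makes the composition fail-closed: a gate has to actively pass, it cannot pass by default.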
Two paths, one set of kernels
The same kovc-emitted kernels run on the GPU for speed and on a pure-CPU path for the deepest trust claim — with no GPU boundary at all.
Static fence & model provenance
A hard fence keeps host glue out of the compute-trust chain; the model itself is public GPT-2, imported unchanged.
Measured, side by side — every number from a committed gate output
The same 8 kovc-emitted kernels, three model sizes, two execution paths. Nothing below is a projection — each row cites the fail-closed gate that produced it. Slow is part of the pitch: the product is verifiability, not speed.
| Model · path | Params · layers | Parity vs independent oracle | max-abs logit diff | Measured speed | Trust boundary |
|---|---|---|---|---|---|
| GPT-2 124M · GPU | 124M · 12 | argmax 262 EXACT 25/25 ids | 2.59e-04 (logits ~130) | seconds (warm) — gpt2_gpu_mvp.sh | hand-auditable to PTX; ptxas/driver trusted-once |
| GPT-2-Large 774M · GPU | 774M · 36 | argmax 262 25/25 ids | 3.8e-05 | — (not separately timed; same kernels) — gpt2_scale.sh | same as 124M GPU — zero new ops at scale |
| GPT-2-XL 1.5B · GPU (the chat model) | 1.5B · 48 | argmax 262 25/25 ids served == offline | 4.4e-05 | ≈ 9.8 s/token (195.5 s / 20 tok, serve gate) | same as 124M GPU — fits the 8 GB sm_86 box at fp32 |
| GPT-2 124M · CPU no-ptxas | 124M · 12 | argmax 262 == oracle token-for-token (measured) | 2.75e-04 (block-0 hidden: 1.144e-04) | ≈ 130 s/token — slow by design | no GPU boundary at all — zero trusted arithmetic above the seed (shared host TCB disclosed in the residuals) |
Sources: scripts/gpt2_gpu_mvp.sh, scripts/gpt2_scale.sh
(+ the committed PRIMARY-mode evidence in scripts/scale_results.txt),
scripts/gpt2_cpu_parity.sh, scripts/helix_serve_gate.sh
(scripts/_gate_run.log). fp32 everywhere; greedy decoding; the oracle is an
independent numpy fp32 implementation of the GPT-2 spec. No cuBLAS/vendor comparisons are
claimed anywhere — that is not what this stack is for.
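For readers who want to see how the table's columns relate to raw logits, the sketch below computes the two parity metrics (greedy argmax agreement and max-abs logit diff) on toy data, then reproduces the per-token arithmetic from the serve row. The function names and toy logits are illustrative assumptions, not the gate scripts' code. The memory line uses the table's rounded 1.5B parameter count and covers weights only.

```python
from typing import List, Sequence

Matrix = Sequence[Sequence[float]]


def argmax(row: Sequence[float]) -> int:
    """Index of the largest logit (the first index wins on ties)."""
    return max(range(len(row)), key=lambda i: row[i])


def greedy_ids(logits: Matrix) -> List[int]:
    """Greedy decoding picks the argmax token at every position."""
    return [argmax(row) for row in logits]


def max_abs_diff(ours: Matrix, oracle: Matrix) -> float:
    """Largest absolute element-wise gap between two logit matrices."""
    return max(
        abs(a - b)
        for row_ours, row_oracle in zip(ours, oracle)
        for a, b in zip(row_ours, row_oracle)
    )


# Toy logits: 2 positions, vocabulary of 3 (made-up values).
oracle = [[0.10, 2.00, -1.00], [3.00, 0.50, 0.25]]
ours = [[0.10, 2.0002, -1.00], [2.9999, 0.50, 0.25]]

print(greedy_ids(ours) == greedy_ids(oracle))  # token-for-token parity
print(max_abs_diff(ours, oracle))              # numeric drift, about 2e-04

# Serve-gate arithmetic from the table: 195.5 s for 20 tokens.
seconds_per_token = 195.5 / 20
print(seconds_per_token)

# fp32 weights at 4 bytes per parameter, using the rounded 1.5B figure.
weights_gb = 1.5e9 * 4 / 1e9
print(weights_gb)
```

The two metrics answer different questions: argmax agreement says the decoded tokens are identical, while max-abs diff bounds how far the underlying numbers drift from the oracle.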
Prove it yourself — one command, no faith required
Don't take this page's word for any of it. On any x86-64 Linux box (no GPU, no Python, no model weights needed):
git clone https://github.com/Questeria/helix && cd helix && bash scripts/reproduce_trust.sh
That deletes every pre-built compiler rung, rebuilds hex0 → seed → kovc from the
299 hand-typed bytes, re-runs the self-host fixpoint and the gcc diverse-double-compile, and
asserts the pinned anchors — printing REPRODUCE_TRUST: PASS in about a minute. The
same check runs in CI on a clean ubuntu-latest runner.
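The checks that script performs can be pictured with a small sketch. It assumes the pinned anchors are SHA-256 digests (suggested by the 64-character hashes mentioned above, but not confirmed here) and uses placeholder bytes rather than any real build artifact. The helper names are hypothetical and this is not the contents of reproduce_trust.sh.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the 64-character hex SHA-256 digest of a byte string."""
    return hashlib.sha256(data).hexdigest()


def matches_anchor(artifact: bytes, pinned_hex: str) -> bool:
    """A rebuilt artifact passes only if its digest equals the pinned value."""
    return sha256_hex(artifact) == pinned_hex


def same_build(build_a: bytes, build_b: bytes) -> bool:
    """Fixpoint and double-compile checks both reduce to byte-identical outputs."""
    return build_a == build_b


# Placeholder bytes standing in for a 299-byte seed; NOT the real seed.
placeholder_seed = bytes(299)
pinned = sha256_hex(placeholder_seed)

print(len(pinned))                                      # digest length in hex chars
print(matches_anchor(placeholder_seed, pinned))         # untouched artifact
print(matches_anchor(placeholder_seed + b"\x00", pinned))  # one extra byte
```

Because a single changed byte produces a different digest, a rebuilt rung either matches its anchor exactly or the run fails.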
Tier A — repo-only
The from-raw trust core above: fully third-party-reproducible from the committed repo alone. This is the load-bearing claim.
Tier B — demo legs
The GPT-2 parity/serve legs additionally need the public HuggingFace GPT-2 weights (MIT), converted by the committed Python-free C importer, plus an independent numpy oracle (ours ships with the demo bundle, or supply your own — independence is the point). They are reproducible given those artifacts, not repo-only.
Honest residuals — we state these unprompted
Trust is only as good as its disclosed edges. These are the boundaries of the claim, in plain language.