Helix × GPT-2
DEMO_ATTEST_PASS

The model you know, on a stack you can verify.

Headline generation (GPU path, greedy decoding, 20 new tokens)

Trust chain — from 299 bytes to the sentence above

Every link is rebuilt, not asserted.
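A minimal sketch of what "rebuilt, not asserted" means for a chain of links. It assumes the 64-character values are hex-encoded SHA-256 digests, which the page does not state. Each rung is regenerated from its predecessor and its digest is compared with a pinned value, stopping at the first mismatch. The `Link` type, the rung names, and the toy build steps are hypothetical.

```python
import hashlib
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Link:
    name: str                        # hypothetical rung name, e.g. "seed"
    build: Callable[[bytes], bytes]  # regenerates this rung from the previous artifact
    pinned_sha256: str               # 64-hex-char digest committed in advance (assumed SHA-256)


def verify_chain(root: bytes, links: List[Link]) -> bytes:
    """Rebuild every link starting from `root`; raise on the first digest mismatch."""
    artifact = root
    for link in links:
        artifact = link.build(artifact)  # always rebuilt, never read from a cache
        digest = hashlib.sha256(artifact).hexdigest()
        if digest != link.pinned_sha256:
            # Fail closed: nothing downstream of a bad link is trusted.
            raise ValueError(f"{link.name}: rebuilt {digest}, pinned {link.pinned_sha256}")
    return artifact


def reverse_bytes(b: bytes) -> bytes:  # toy "build" step
    return b[::-1]


def append_zero(b: bytes) -> bytes:    # toy "build" step
    return b + b"\x00"


if __name__ == "__main__":
    root = bytes(range(8))  # stands in for the hand-typed root bytes
    chain = [
        Link("stage1", reverse_bytes, hashlib.sha256(reverse_bytes(root)).hexdigest()),
        Link("stage2", append_zero, hashlib.sha256(append_zero(reverse_bytes(root))).hexdigest()),
    ]
    print(verify_chain(root, chain).hex())
```

The point of the design is that a pinned digest is only ever compared against something freshly rebuilt, so a tampered intermediate cannot be slipped in by editing a file on disk.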

Gates — fail-closed, every number from a real run

Each tile is a gate that the wrapper composes. A single red (failing) result anywhere blocks the attestation; every tile here is green.
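The fail-closed rule can be sketched as follows. This is a generic illustration, not the wrapper's actual code: the `attest` function and the gate names are hypothetical. The design choice that matters is that anything other than an explicit pass (an exception, a non-boolean result, an empty gate list) blocks the attestation.

```python
from typing import Callable, Dict

# Hypothetical gate signature: a gate returns True only on an explicit pass.
Gate = Callable[[], bool]


def attest(gates: Dict[str, Gate]) -> bool:
    """Fail-closed composition: attest only if every gate explicitly passes."""
    if not gates:
        return False  # no evidence is not a pass
    for name, gate in gates.items():
        try:
            ok = gate()
        except Exception as exc:  # a crashing gate counts as a failing gate
            print(f"BLOCKED: {name} raised {exc!r}")
            return False
        if ok is not True:  # None, 1, "yes" and similar do not count as a pass
            print(f"BLOCKED: {name} returned {ok!r}")
            return False
    return True


if __name__ == "__main__":
    print(attest({"parity": lambda: True, "fence": lambda: True}))
    print(attest({"parity": lambda: True, "fence": lambda: None}))
```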

Two paths, one set of kernels

The same kovc-emitted kernels run on the GPU for speed and on a pure-CPU path for the deepest trust claim — with no GPU boundary at all.

Static fence & model provenance

A hard fence keeps host glue out of the compute-trust chain; the model itself is public GPT-2, imported unchanged.

Measured, side by side — every number from a committed gate output

The same 8 kovc-emitted kernels, three model sizes, two execution paths. Nothing below is a projection — each row cites the fail-closed gate that produced it. Slow is part of the pitch: the product is verifiability, not speed.

| Model · path | Params · layers | Parity vs independent oracle | Max-abs logit diff | Measured speed | Trust boundary |
|---|---|---|---|---|---|
| GPT-2 124M · GPU | 124M · 12 | argmax 262; 25/25 ids EXACT | 2.59e-04 (logits ~130) | seconds (warm); gpt2_gpu_mvp.sh | hand-auditable to PTX; ptxas/driver trusted once |
| GPT-2-Large 774M · GPU | 774M · 36 | argmax 262; 25/25 ids | 3.8e-05 | not separately timed (same kernels); gpt2_scale.sh | same as 124M GPU; zero new ops at scale |
| GPT-2-XL 1.5B · GPU (the chat model) | 1.5B · 48 | argmax 262; 25/25 ids; served == offline | 4.4e-05 | ≈ 9.8 s/token (195.5 s / 20 tokens, serve gate) | same as 124M GPU; fits the 8 GB sm_86 box at fp32 |
| GPT-2 124M · CPU (no ptxas) | 124M · 12 | argmax 262; == oracle token-for-token (measured) | 2.75e-04 (block-0 hidden: 1.144e-04) | ≈ 130 s/token (slow by design) | no GPU boundary at all; zero trusted arithmetic above the seed (shared host TCB disclosed in the residuals) |
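The two parity columns can be read as the checks below: a max-abs difference over a logit vector, plus an exact comparison of the emitted token ids (presumably "argmax 262" is the index of the top logit on the test prompt). This is a minimal stdlib sketch under stated assumptions, not the committed gates' code; in particular `tol` is a hypothetical parameter, since the page reports measured differences rather than the gates' thresholds.

```python
from typing import Dict, List, Sequence


def max_abs_diff(a: Sequence[float], b: Sequence[float]) -> float:
    """Largest element-wise |a - b| over two equal-length logit vectors."""
    if len(a) != len(b):
        raise ValueError("logit vectors differ in length")
    return max(abs(x - y) for x, y in zip(a, b))


def argmax(v: Sequence[float]) -> int:
    """Index of the largest logit (the first one wins on ties)."""
    return max(range(len(v)), key=lambda i: v[i])


def parity_report(kernel_logits: Sequence[float], oracle_logits: Sequence[float],
                  kernel_ids: List[int], oracle_ids: List[int],
                  tol: float) -> Dict[str, object]:
    """Compare one logit vector and one greedy id sequence against the oracle."""
    diff = max_abs_diff(kernel_logits, oracle_logits)
    same_argmax = argmax(kernel_logits) == argmax(oracle_logits)
    matched = sum(k == o for k, o in zip(kernel_ids, oracle_ids))
    ids_exact = len(kernel_ids) == len(oracle_ids) and matched == len(oracle_ids)
    return {
        "max_abs_diff": diff,
        "argmax": argmax(kernel_logits),
        "ids": f"{matched}/{len(oracle_ids)}",  # e.g. "25/25"
        "pass": diff <= tol and same_argmax and ids_exact,
    }
```

Reporting the raw difference alongside the id match keeps both kinds of evidence visible: a small diff shows the arithmetic agrees, and the exact id match shows that no near-tie was flipped along the way.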

Sources: scripts/gpt2_gpu_mvp.sh, scripts/gpt2_scale.sh (+ the committed PRIMARY-mode evidence in scripts/scale_results.txt), scripts/gpt2_cpu_parity.sh, scripts/helix_serve_gate.sh (scripts/_gate_run.log). fp32 everywhere; greedy decoding; the oracle is an independent numpy fp32 implementation of the GPT-2 spec. No cuBLAS/vendor comparisons are claimed anywhere — that is not what this stack is for.
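Greedy decoding is what makes a token-for-token comparison meaningful: there is no sampling, so each step takes the argmax, and two implementations that agree on every argmax emit identical ids. A minimal sketch, assuming a hypothetical `logits_fn` that maps a token-id prefix to next-token logits. It stands in for either the kernels or the oracle and is not the demo's API.

```python
from typing import Callable, List, Sequence

# Hypothetical model interface: full token-id prefix -> next-token logits.
LogitsFn = Callable[[Sequence[int]], Sequence[float]]


def greedy_decode(logits_fn: LogitsFn, prompt_ids: List[int], new_tokens: int) -> List[int]:
    """Append the highest-scoring token `new_tokens` times; no sampling, no temperature."""
    ids = list(prompt_ids)
    for _ in range(new_tokens):
        logits = logits_fn(ids)
        next_id = max(range(len(logits)), key=lambda i: logits[i])  # first max wins ties
        ids.append(next_id)
    return ids[len(prompt_ids):]  # only the newly generated ids


def toy_logits(ids: Sequence[int]) -> List[float]:
    """Toy 5-token 'model' that always prefers (last id + 1) mod 5."""
    v = [0.0] * 5
    v[(ids[-1] + 1) % 5] = 1.0
    return v


if __name__ == "__main__":
    print(greedy_decode(toy_logits, [3], 4))
```

Because every step is deterministic, small fp32 differences between two paths only change the output if they reorder the top two logits at some step.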

Prove it yourself — one command, no faith required

Don't take this page's word for any of it. On any x86-64 Linux box (no GPU, no Python, no model weights needed):

git clone https://github.com/Questeria/helix && cd helix && bash scripts/reproduce_trust.sh

That command deletes every prebuilt compiler rung, rebuilds hex0 → seed → kovc from the 299 hand-typed bytes, re-runs the self-host fixpoint and the diverse double-compile against gcc, and asserts the pinned anchors. It prints REPRODUCE_TRUST: PASS in about a minute. The same check runs in CI on a clean ubuntu-latest runner.
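The two compiler checks follow a standard pattern, sketched here generically. `run`, `toy_run`, and the byte layout are hypothetical stand-ins, not the internals of reproduce_trust.sh. The self-host fixpoint says that once a compiler is built by itself, rebuilding it again gives identical bytes. Diverse double-compiling, in its usual form, rebuilds the compiler through an independent compiler (gcc here), lets that result compile the source once more, and requires a bit-identical match with the shipped binary.

```python
import hashlib
from typing import Callable

# Hypothetical callback: run(compiler_binary, source) -> output binary.
# A real script would execute actual binaries instead.
Run = Callable[[bytes, bytes], bytes]


def self_host_fixpoint(run: Run, bootstrap: bytes, source: bytes) -> bool:
    """stage1 is built by the bootstrap; stage2 and stage3 by the compiler itself.
    A deterministic, correct compiler reaches a fixpoint: stage2 == stage3."""
    stage1 = run(bootstrap, source)
    stage2 = run(stage1, source)
    stage3 = run(stage2, source)
    return stage2 == stage3


def diverse_double_compile(run: Run, trusted: bytes, source: bytes, shipped: bytes) -> bool:
    """Rebuild through an independent trusted compiler, then let that result
    compile the source again and demand a bit-identical match with `shipped`."""
    stage1 = run(trusted, source)  # different bytes, same semantics (if honest)
    stage2 = run(stage1, source)   # now produced by the compiler's own semantics
    return stage2 == shipped


def toy_run(binary: bytes, source: bytes) -> bytes:
    """Toy model only: a 'binary' is b'<tag>|<source it implements>', and its
    output depends on the source it implements, never on who built it."""
    _, implemented = binary.split(b"|", 1)
    tag = hashlib.sha256(implemented).hexdigest()[:8].encode()
    return tag + b"|" + source


if __name__ == "__main__":
    src = b"compiler source"
    seed = b"seed|hand-typed"
    shipped = toy_run(toy_run(seed, src), src)
    print(self_host_fixpoint(toy_run, seed, src))
    print(diverse_double_compile(toy_run, b"gcc|gcc source", src, shipped))
```

A binary whose output leaks who built it, or a shipped binary that differs from what the source implies, fails one of the two comparisons.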

Tier A — repo-only

The from-raw trust core above is fully reproducible by a third party from the committed repo alone. This is the load-bearing claim.

Tier B — demo legs

The GPT-2 parity/serve legs additionally need the public HuggingFace GPT-2 weights (MIT), converted by the committed Python-free C importer, plus an independent numpy oracle (ours ships with the demo bundle, or supply your own — independence is the point). They are reproducible given those artifacts, not repo-only.

Honest residuals — we state these unprompted

Trust is only as good as its disclosed edges. These are the boundaries of the claim, in plain language.