The model you know, on a stack you can verify from the first byte.
Real, unchanged GPT-2-XL (1.5B) generating text on a compiler you can rebuild from 299 hand-typed bytes — its output matched token-for-token against an independent oracle, byte-reproducible, with every per-layer kernel launch streamed to your browser as it happens. SmolLM2-135M (modern Llama architecture) is being added on the same stack — its fail-closed gates are running now, and it appears here only when they are green.
Two honest modes — the page always tells you which one you're in
Live: when the GPU box is up
A C HTTP+SSE worker streams the real forward pass from an RTX 3070 — every
layer_begin, every kernel launch, every token, as it
happens, at the real ≈ 10 s/token pace. Liveness is health-gated: the chat page
claims LIVE only after /api/health reports a ready
worker, and the GPU runs one generation at a time (you may briefly wait your turn —
the page says so plainly).
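If you want to read the stream yourself rather than through the page, a minimal sketch of a Server-Sent Events parser follows. It assumes the worker uses standard SSE framing (`event:` and `data:` fields, with a blank line ending each event). Only `layer_begin` is named on this page; the `token` event and both JSON payloads in the sample are hypothetical stand-ins, not the worker's documented schema.

```python
def parse_sse(text):
    """Split a Server-Sent Events payload into (event, data) pairs.

    Standard SSE framing: "field: value" lines, a blank line dispatches the
    event, and lines starting with ":" are comments (often keep-alives).
    An event that is not terminated by a blank line is dropped.
    """
    events = []
    event, data = "message", []  # "message" is the SSE default event name
    for line in text.splitlines():
        if line == "":
            if data:
                events.append((event, "\n".join(data)))
            event, data = "message", []
            continue
        if line.startswith(":"):
            continue  # comment line, carries no event
        name, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]  # SSE strips a single leading space
        if name == "event":
            event = value
        elif name == "data":
            data.append(value)
    return events


# Hypothetical sample: the payload shapes are assumptions for illustration.
sample = (
    'event: layer_begin\ndata: {"layer": 0}\n\n'
    'event: token\ndata: {"id": 464}\n\n'
)
parsed = parse_sse(sample)
```

Keeping the parser a pure function over text makes it easy to check against a captured stream before pointing it at a live connection.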
Captured replay: always available
No GPU online? You still see something real: a replay of an actual gated GPT-2-XL run — its ids, text and totals are verbatim from the fail-closed serve gate (served output == oracle, token-for-token 25/25). The replay is labeled on every surface, its animation is disclosed as time-compressed, and it never pretends to be live.
The default is replay. The public page never claims a live run it cannot back; when the backend comes up, the chat page offers an explicit "Switch to LIVE" — it never silently upgrades. (A third state, an amber MOCK layout preview, exists for development and is unmistakably labeled as simulated.)
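The rule above (replay by default, LIVE only when health-gated and explicitly chosen) can be sketched as a small state function. This is an illustration under stated assumptions, not the page's actual code: the `{"worker": "ready"}` health shape and every name below are hypothetical.

```python
from enum import Enum


class Mode(Enum):
    REPLAY = "replay"
    LIVE = "live"


def live_available(health):
    """Fail closed: only an explicit ready report counts as live-capable.

    `health` stands in for the parsed /api/health response; its shape is
    an assumption made for this sketch.
    """
    return isinstance(health, dict) and health.get("worker") == "ready"


def next_mode(current, health, user_clicked_switch):
    """Return the mode the page may claim after a health poll."""
    if not live_available(health):
        return Mode.REPLAY  # never claim a live run the backend cannot back
    if current is Mode.LIVE:
        return Mode.LIVE  # already live and still healthy
    # Backend is ready, but the upgrade happens only on an explicit click.
    return Mode.LIVE if user_clicked_switch else Mode.REPLAY
```

Note the asymmetry: the drop back to replay is automatic, while the move to LIVE needs a click. That is what "never silently upgrades" means in practice.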
Prove it yourself — one command, no faith required
Don't take this site's word for any of it. On any x86-64 Linux box — no GPU, no Python, no model weights:
git clone https://github.com/Questeria/helix && cd helix && bash scripts/reproduce_trust.sh
The script deletes every pre-built compiler rung, rebuilds hex0 → seed → kovc from
the 299 hand-typed bytes, re-runs the self-host fixpoint and the gcc diverse-double-compile,
and asserts the pinned anchors, printing REPRODUCE_TRUST: PASS in about a minute.
The same check runs green in CI on a clean ubuntu-latest runner.
(The GPT-2 demo legs additionally need the public Hugging Face weights and an independent
oracle; see the proof page for the two reproducibility tiers.)
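To make two of those checks concrete, here is a minimal sketch of a self-host fixpoint test and a pinned-anchor test. It is not the contents of `scripts/reproduce_trust.sh`: using SHA-256 as the digest, the function names, and the artifact names are all assumptions made for illustration.

```python
import hashlib


def is_fixpoint(built_by_previous, built_by_itself):
    """Self-host fixpoint: the compiler rebuilt by itself is byte-identical."""
    return built_by_previous == built_by_itself


def check_anchors(artifacts, pinned):
    """Compare each artifact's digest to its pinned value, failing closed.

    artifacts: name -> bytes of a rebuilt file (hypothetical names)
    pinned:    name -> expected SHA-256 hex digest (digest choice assumed)
    """
    for name, expected in pinned.items():
        blob = artifacts.get(name)
        if blob is None:
            return "FAIL: missing " + name  # an absent artifact is a failure
        if hashlib.sha256(blob).hexdigest() != expected:
            return "FAIL: mismatch " + name
    return "PASS"


# Toy data standing in for rebuilt binaries.
demo = {"stage_a": b"\x7fELF-demo", "stage_b": b"compiler-demo"}
pins = {name: hashlib.sha256(blob).hexdigest() for name, blob in demo.items()}
result = check_anchors(demo, pins)
```

A missing file and a changed byte both produce a failure, never a skipped check. That is the fail-closed property the real script is described as enforcing.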
What this is — and what it is not
This is
- A verifiable execution layer: real public models, unchanged, running on a compiler stack rebuildable from 299 hand-typed bytes — with the rebuild one command away.
- Fail-closed and deterministic: greedy decoding, byte-identical re-runs, every claim behind a gate that refuses to fake a green.
- Independently cross-checked: output matched token-for-token against an independent numpy oracle (25/25 at 124M, 774M and 1.5B).
- Honest about its edges: the residuals are stated unprompted on every page — verified to PTX (not SASS) on the GPU path, fp32-only, single GPU, oracle shares the spec.
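The token-for-token cross-check in the list above comes down to an exact comparison that refuses to report anything but a full match. A minimal sketch, with hypothetical function and argument names and made-up token ids:

```python
def serve_gate(served_ids, oracle_ids):
    """Return "n/n" only when served output equals the oracle exactly.

    Fail closed: an empty oracle, a length difference, or any differing
    token raises instead of reporting a partial score as a pass.
    """
    if not oracle_ids:
        raise AssertionError("gate needs at least one oracle token")
    if len(served_ids) != len(oracle_ids):
        raise AssertionError(
            "length mismatch: %d served vs %d oracle"
            % (len(served_ids), len(oracle_ids))
        )
    for position, (served, oracle) in enumerate(zip(served_ids, oracle_ids)):
        if served != oracle:
            raise AssertionError(
                "token %d differs: served %r, oracle %r"
                % (position, served, oracle)
            )
    return "%d/%d" % (len(oracle_ids), len(oracle_ids))


# Illustrative ids only; these are not from a real run.
report = serve_gate([464, 3290, 318], [464, 3290, 318])
```

Greedy decoding is what makes an exact comparison meaningful: with no sampling, the same prompt should yield the same ids on every run.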
This is not
- Not an assistant. GPT-2-XL is a 2019 base completion model: it continues text. It doesn't follow instructions, chat, or aim for factual accuracy — and this site never dresses it up as if it did.
- Not fast. Live XL runs at ≈ 10 s/token measured (the CPU purest-trust path: ≈ 130 s/token). Deliberate: the product is trust, not speed — no speed claims are made anywhere.
- Not open-source. The code is source-available and non-commercial under the Helix Non-Commercial License: free to use, study, modify and share for any non-commercial purpose; commercial use requires a separate license. Read the LICENSE (the LICENSE file at the repo root).
- Not a frontier-capability claim. No "fully verified GPU", no benchmark-beating, no AGI: never claimed, anywhere.