Helix THE VERIFIABLE EXECUTION LAYER
github.com/Questeria/helix
Real models · from-raw compiler · fail-closed proof

The model you know, on a stack you can verify from the first byte.

Real, unchanged GPT-2-XL (1.5B) generating text on a compiler you can rebuild from 299 hand-typed bytes — its output matched token-for-token against an independent oracle, byte-reproducible, with every per-layer kernel launch streamed to your browser as it happens. SmolLM2-135M (modern Llama architecture) is being added on the same stack — its fail-closed gates are running now, and it appears here only when they are green.

  • 299 B → full compiler, rebuilt in ~1 min
  • token-for-token vs independent oracle · 25/25
  • 1.5B params on one 8 GB GPU · fp32
  • live XL ≈ 10 s/token, by design: trust, not speed
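
As a rough sanity check on the "1.5B params on one 8 GB GPU · fp32" figure, the weight footprint can be estimated from the parameter count alone. This is a back-of-the-envelope sketch: it uses the nominal 1.5B figure from this page rather than an exact parameter count, and it counts weights only (no activations or other runtime buffers). The helper name `weight_bytes` is illustrative.

```python
BYTES_PER_FP32 = 4  # one fp32 value occupies 4 bytes


def weight_bytes(n_params: int, bytes_per_param: int = BYTES_PER_FP32) -> int:
    """Bytes needed to hold n_params weights at the given precision."""
    return n_params * bytes_per_param


# Assumption: "1.5B" taken at face value; the exact count is not stated here.
nominal_params = 1_500_000_000
total = weight_bytes(nominal_params)

print(f"{total / 1e9:.1f} GB ({total / 2**30:.2f} GiB) of fp32 weights")
```

Under that assumption the weights alone come to about 6 GB, which leaves some headroom on an 8 GB card.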

Two honest modes — the page always tells you which one you're in

Live WHEN THE GPU BOX IS UP

A C HTTP+SSE worker streams the real forward pass from an RTX 3070 — every layer_begin, every kernel launch, every token, as it happens, at the real ≈ 10 s/token pace. Liveness is health-gated: the chat page claims LIVE only after /api/health reports a ready worker, and the GPU runs one generation at a time (you may briefly wait your turn — the page says so plainly).

Captured replay ALWAYS AVAILABLE

No GPU online? You still see something real: a replay of an actual gated GPT-2-XL run — its ids, text and totals are verbatim from the fail-closed serve gate (served output == oracle, token-for-token 25/25). The replay is labeled on every surface, its animation is disclosed as time-compressed, and it never pretends to be live.
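
The "served output == oracle, token-for-token" condition can be pictured as a strict equality check over token ids that refuses to report anything on a mismatch. The following is a minimal sketch, assuming both runs are available as plain lists of integer ids; `gate_token_match` and `GateFailure` are hypothetical names, not taken from the Helix repository.

```python
from typing import Sequence


class GateFailure(Exception):
    """Raised when the served output and the oracle output disagree."""


def gate_token_match(served: Sequence[int], oracle: Sequence[int]) -> str:
    """Fail closed: return a summary only if every token id matches.

    Hypothetical helper for illustration; not the project's actual gate.
    """
    if not served or not oracle:
        raise GateFailure("empty run: nothing to compare")
    if len(served) != len(oracle):
        raise GateFailure(
            f"length mismatch: served={len(served)} oracle={len(oracle)}"
        )
    for i, (s, o) in enumerate(zip(served, oracle)):
        if s != o:
            raise GateFailure(f"token {i} differs: served={s} oracle={o}")
    return f"{len(served)}/{len(oracle)}"
```

A gate shaped like this has no partial-credit path: it either returns a full match count or raises, so a caller cannot display a green result for a run that diverged.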

The default is replay. The public page never claims a live run it cannot back; when the backend comes up, the chat page offers an explicit "Switch to LIVE" — it never silently upgrades. (A third state, an amber MOCK layout preview, exists for development and is unmistakably labeled as simulated.)
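
The replay-by-default rule and the explicit switch can be modelled as a small state machine. The sketch below is written under stated assumptions: the health report is taken to be a dictionary with a boolean `ready` field (the real /api/health response shape is not specified here), falling back to replay when the backend stops reporting ready is an assumed behavior, and `PageState`, `on_health` and `on_switch_clicked` are illustrative names.

```python
from dataclasses import dataclass
from enum import Enum


class Mode(Enum):
    REPLAY = "replay"
    LIVE = "live"


@dataclass(frozen=True)
class PageState:
    mode: Mode = Mode.REPLAY      # the default is replay
    live_offered: bool = False    # whether "Switch to LIVE" is shown


def on_health(state: PageState, health: dict) -> PageState:
    """Update the page from a health report (assumed shape: {"ready": bool})."""
    ready = health.get("ready") is True  # anything else counts as not ready
    if not ready:
        # Assumed behavior: no ready worker means replay and no LIVE offer.
        return PageState(mode=Mode.REPLAY, live_offered=False)
    # A ready worker only makes the switch available; the mode is untouched.
    return PageState(mode=state.mode, live_offered=True)


def on_switch_clicked(state: PageState) -> PageState:
    """Enter LIVE only on an explicit user action while the offer stands."""
    if state.live_offered:
        return PageState(mode=Mode.LIVE, live_offered=True)
    return state
```

The point of the design is that `on_health` can never return `Mode.LIVE` unless the state was already live; only `on_switch_clicked` makes that transition.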

Prove it yourself — one command, no faith required

Don't take this site's word for any of it. On any x86-64 Linux box — no GPU, no Python, no model weights:

git clone https://github.com/Questeria/helix && cd helix && bash scripts/reproduce_trust.sh

The script deletes every pre-built compiler rung, rebuilds hex0 → seed → kovc from the 299 hand-typed bytes, re-runs the self-host fixpoint and the gcc diverse double-compile, and asserts the pinned anchors, printing REPRODUCE_TRUST: PASS in about a minute. The same check runs green in CI on a clean ubuntu-latest runner. (The GPT-2 demo legs additionally need the public HuggingFace weights and an independent oracle; see the proof page for the two reproducibility tiers.)
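
The three kinds of check named above (pinned anchors, a self-host fixpoint, and a diverse double-compile) share one shape: compare bytes, and refuse to print PASS unless every comparison holds. The sketch below illustrates that shape only. It does not reproduce the internals of scripts/reproduce_trust.sh, which are not shown on this page; the function names, the argument layout, and the simplified treatment of the diverse route as a single byte comparison are all assumptions.

```python
import hashlib


class TrustFailure(Exception):
    """Raised when any trust check fails; no PASS line is produced."""


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def check_trust(seed: bytes, stage1: bytes, stage2: bytes,
                diverse: bytes, pinned_seed_sha256: str) -> str:
    """Return the PASS line only if every check holds (hypothetical helper).

    seed    : the hand-typed seed bytes
    stage1  : the compiler binary as built by the previous rung
    stage2  : the compiler binary rebuilt by stage1 from the same source
    diverse : the same compiler obtained through an independent toolchain
    """
    # Pinned anchor: the seed must hash to a value fixed in advance.
    if sha256_hex(seed) != pinned_seed_sha256:
        raise TrustFailure("seed does not match pinned anchor")
    # Self-host fixpoint: compiling the compiler with itself changes nothing.
    if stage1 != stage2:
        raise TrustFailure("self-host fixpoint not reached")
    # Diverse route (simplified): an independent path yields the same bytes.
    if stage2 != diverse:
        raise TrustFailure("diverse double-compile mismatch")
    return "REPRODUCE_TRUST: PASS"
```

Because each comparison is over raw bytes, a single differing bit anywhere in the chain turns the result into a failure rather than a warning.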

What this is — and what it is not

This is

  • A verifiable execution layer: real public models, unchanged, running on a compiler stack rebuildable from 299 hand-typed bytes — with the rebuild one command away.
  • Fail-closed and deterministic: greedy decoding, byte-identical re-runs, every claim behind a gate that refuses to fake a green.
  • Independently cross-checked: output matched token-for-token against an independent numpy oracle (25/25 at 124M, 774M and 1.5B).
  • Honest about its edges: the residuals are stated unprompted on every page — verified to PTX (not SASS) on the GPU path, fp32-only, single GPU, oracle shares the spec.
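
Greedy decoding is what makes byte-identical re-runs possible at the sampling level: each step takes the highest-scoring token, so no random number generator is involved. The sketch below uses a toy stand-in for the model (`toy_logits`, a fixed arithmetic function over a vocabulary of 8) and breaks ties toward the lowest token id; both are assumptions chosen for illustration, not details of Helix.

```python
import hashlib
from typing import Callable, List


def greedy_decode(logits_fn: Callable[[List[int]], List[float]],
                  prompt: List[int], n_tokens: int) -> List[int]:
    """Generate n_tokens ids by always taking the argmax of the logits."""
    ids = list(prompt)
    for _ in range(n_tokens):
        logits = logits_fn(ids)
        # Argmax with ties broken toward the lowest index: fully deterministic.
        best = max(range(len(logits)), key=lambda i: (logits[i], -i))
        ids.append(best)
    return ids[len(prompt):]


def toy_logits(ids: List[int]) -> List[float]:
    """Toy stand-in for a model: a fixed function of the context."""
    return [float((sum(ids) * 7 + 3 * v) % 11) for v in range(8)]


def run_digest(ids: List[int]) -> str:
    """Hash a run so that two runs can be compared byte for byte."""
    return hashlib.sha256(",".join(map(str, ids)).encode()).hexdigest()


first = greedy_decode(toy_logits, [1, 2, 3], 25)
second = greedy_decode(toy_logits, [1, 2, 3], 25)
print(run_digest(first) == run_digest(second))
```

Removing sampling randomness is only one part of reproducibility. On real hardware, bit-identical floating-point results also depend on the arithmetic being evaluated in the same order on every run.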

This is not

  • Not an assistant. GPT-2-XL is a 2019 base completion model: it continues text. It doesn't follow instructions, chat, or aim for factual accuracy — and this site never dresses it up as if it did.
  • Not fast. Live XL runs at a measured ≈ 10 s/token (≈ 130 s/token on the CPU purest-trust path). This is deliberate: the product is trust, not speed, and no speed claims are made anywhere.
  • Not open-source. The code is source-available under the Helix Non-Commercial License: free to use, study, modify and share for any non-commercial purpose; commercial use requires a separate license. See the LICENSE file at the repo root.
  • Not a frontier-capability claim. No "fully verified GPU", no benchmark-beating, no AGI — never claimed, anywhere.