Helix · The verifiable execution layer

The model you know, on a stack you can verify from the first byte.

Every AI stack asks for blind trust in compilers built by compilers nobody audits. Helix is a self-hosting compiler, rebuildable from 299 hand-typed bytes, that runs real, unchanged GPT-2 (124M → 1.5B) inference through its own emitted GPU kernels. Its output is matched token-for-token against an independent oracle, is byte-reproducible, and ships with a signed attestation. The product is verified execution, deliberately not speed.

The trust chain — rebuilt, not asserted

299 B hand-typed hex0 seed 9837db12… → kovc fixpoint 0992dddd… (K2=K3=K4) → 8 compiler-emitted GPU kernels → GPT-2's verified output

Corroborated by gcc diverse-double-compile (84363adb…) — an unrelated compiler lineage reproduces a key rung byte-for-byte: the canonical defense against the Ken Thompson "trusting trust" attack.
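
The two checks in this chain can be sketched in a few lines. This is a minimal illustration, not the project's gate: the function names, the placeholder byte strings and the choice of SHA-256 are all assumptions made for the example. A self-host fixpoint means successive self-compiled binaries (K2, K3, K4) are byte-identical; diverse double-compiling means a binary reached through an unrelated compiler lineage matches the self-hosted one.

```python
import hashlib


def digest(blob):
    """Hex digest of a binary image (SHA-256 is an assumption of this sketch)."""
    return hashlib.sha256(blob).hexdigest()


def is_fixpoint(stages):
    """True when at least two successive self-compiled binaries are byte-identical."""
    return len(stages) >= 2 and len({digest(s) for s in stages}) == 1


def ddc_agrees(self_hosted, via_other_lineage):
    """True when an unrelated compiler lineage reproduces the same bytes."""
    return digest(self_hosted) == digest(via_other_lineage)


# Placeholder bytes standing in for real compiler binaries (hypothetical data).
k2 = k3 = k4 = b"placeholder compiler image"
other_lineage = b"placeholder compiler image"
tampered = b"placeholder compiler image + hidden payload"

print(is_fixpoint([k2, k3, k4]))
print(ddc_agrees(k4, other_lineage))
print(ddc_agrees(k4, tampered))
```

A compiler backdoor of the kind Thompson described would have to survive both comparisons, including the one routed through a toolchain it never touched.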

Measured results (committed, fail-closed gates)

Leg                     Result
GPT-2 124M · GPU        argmax exact · 25/25 ids vs oracle · max-abs 2.59e-04
774M + 1.5B · GPU       same 8 kernels, zero new ops · 25/25 ids · 3.8e-05 / 4.4e-05
Live XL chat            served == offline oracle 25/25 · ≈ 9.8 s/token (by design)
124M · CPU, no ptxas    zero trusted arithmetic above the seed · token-for-token · ~130 s/token
Reproducibility         byte-identical re-runs · one-command from-raw rebuild (~1 min, CI-corroborated)
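
As a rough illustration of what "argmax exact" and "max-abs" measure, the sketch below compares per-step logits from a candidate run against an oracle run. It is an assumption-laden example, not the committed gate: `compare_run`, the toy logits and the `tolerance` value are hypothetical, and the real thresholds are not stated here. The check fails closed: a length mismatch, a non-finite difference or a single differing token id fails the run.

```python
import math


def argmax(row):
    """Index of the largest value in a row of logits."""
    return max(range(len(row)), key=row.__getitem__)


def compare_run(candidate, oracle, tolerance):
    """Return (matching_ids, total_ids, max_abs, passed) for two logit traces."""
    total = len(oracle)
    if total == 0 or len(candidate) != total:
        return 0, total, math.inf, False
    matches, max_abs = 0, 0.0
    for cand, ref in zip(candidate, oracle):
        if not ref or len(cand) != len(ref):
            return matches, total, math.inf, False
        for c, r in zip(cand, ref):
            diff = abs(c - r)
            if not math.isfinite(diff):  # NaN or inf fails the run outright
                return matches, total, math.inf, False
            max_abs = max(max_abs, diff)
        if argmax(cand) == argmax(ref):
            matches += 1
    passed = matches == total and max_abs <= tolerance
    return matches, total, max_abs, passed


# Toy three-entry vocabulary, two decoding steps (hypothetical numbers).
oracle = [[0.1, 2.0, -1.0], [3.0, 0.5, 0.2]]
close = [[0.1002, 2.0001, -1.0], [2.9999, 0.5, 0.2001]]
flipped = [[2.0, 0.1, -1.0], [3.0, 0.5, 0.2]]

print(compare_run(close, oracle, tolerance=1e-3))
print(compare_run(flipped, oracle, tolerance=1e-3))
```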

What exists today

  • From-raw toolchain (hex0→seed→kovc), self-host fixpoint + gcc-DDC, reproducible by anyone in ~1 min.
  • Python-free production path: C tokenizer + C weight importer, bit-exact gated.
  • Live chat demo: C HTTP+SSE server streams real per-layer/per-kernel telemetry from the XL forward; website-ready replay of a gated captured run.
  • Signed attestation binding the anchors, the model hash and the output of each green run.
  • Next (authored, pre-gate): the 3 kernels that extend the same verified stack to a modern Apache-2.0 Llama-arch model (RMSNorm · RoPE · SwiGLU; GQA is host wiring).
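
The attestation item above binds three things together: the anchors, the model hash and the output of a run. One way to picture that binding is sketched below. Everything in it is an assumption for illustration: the record layout, the function names and the placeholder values are hypothetical, and HMAC-SHA256 from the Python standard library stands in for whatever signature scheme the project actually uses.

```python
import hashlib
import hmac
import json


def _payload(record):
    """Canonical byte encoding of the record, so equal records sign identically."""
    return json.dumps(record, sort_keys=True, separators=(",", ":")).encode()


def build_attestation(anchors, model_bytes, output_ids, key):
    """Bind anchors, model hash and run output under one authentication tag."""
    record = {
        "anchors": anchors,
        "model_sha256": hashlib.sha256(model_bytes).hexdigest(),
        "output_sha256": hashlib.sha256(json.dumps(output_ids).encode()).hexdigest(),
    }
    tag = hmac.new(key, _payload(record), hashlib.sha256).hexdigest()
    return {"record": record, "tag": tag}


def verify_attestation(attestation, key):
    """Recompute the tag over the record and compare in constant time."""
    expected = hmac.new(key, _payload(attestation["record"]), hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, attestation["tag"])


# Hypothetical placeholder inputs; none of these are real Helix values.
key = b"demo-signing-key"
anchors = {"seed": "seed-digest-placeholder", "fixpoint": "fixpoint-digest-placeholder"}
att = build_attestation(anchors, b"placeholder weights", [101, 7, 42], key)

# Swapping the model hash after signing invalidates the tag.
forged = {"record": dict(att["record"], model_sha256="0" * 64), "tag": att["tag"]}

print(verify_attestation(att, key))
print(verify_attestation(forged, key))
```

Because the tag covers the whole record, changing any one field (a different model, a different output, a different anchor) invalidates it.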

The honesty that sells it

Disclosed residuals, unprompted: verified to PTX, not SASS (ptxas/driver trusted-once — the CPU path removes even that); fp32-only; single RTX-3070-class GPU; the oracle shares GPT-2's spec, not its code; a 2019 base model demonstration, not frontier capability — and never claimed otherwise. Intentionally slow: ≈10 s/token at 1.5B. Trust, not speed.

Check it yourself (no GPU, no Python, no weights)

git clone https://github.com/Questeria/helix && cd helix && bash scripts/reproduce_trust.sh
# → REPRODUCE_TRUST: PASS (~1 minute, CPU-only; same check runs green in CI)

github.com/Questeria/helix · demo: index.html (chat) · dashboard.html (proof)
All figures from committed gate logs · 2026-06