Nine rungs from 299 bytes to a compiler.
No pre-built compiler is trusted anywhere in the chain. Each rung is built only by the rung before it, starting from 299 hand-authored bytes you can verify by reading. The chain was declared closed on 2026-06-07, after a live joint reproduction — with every residual disclosed alongside the claim.
Every rung built only by the rung before it.
From hex characters typed by hand to kovc, the Helix compiler written in Helix. Nothing on this ladder was ever compiled by a compiler you didn't watch get built.
| Rung | Size | Role | Built by |
|---|---|---|---|
| hex0 | 299 B | Hex characters → bytes. The only thing you trust by reading — 299 hand-authored bytes at stage0/hex0/hex0.bin, sha cc1d1741… | your own eyes |
| hex1 | 622 B | Adds single-character labels | hex0 |
| hex2 | 1,519 B | Long labels and linking | hex1 |
| catm | 299 B | File concatenation | hex2 |
| M0 | 1,684 B | Macro assembler | catm + hex2 |
| cc_amd64 | 17,976 B | Minimal C compiler | M0 |
| M2-Planet | 200,561 B | Full C compiler — the last vendored rung | cc_amd64 |
| seed | 62,467 B | Original C-subset bootstrap compiler — seed.c, sha256 9837db12… | M2-Planet |
| kovc | 698,392 B | The Helix compiler, written in Helix — helixc/bootstrap/{lexer,parser,kovc}.hx | seed |
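The bottom rung's job, turning hex characters into bytes, is small enough to sketch in a few lines. The following is an illustrative model only, not the 299-byte hex0 itself. It assumes whitespace is ignored and that `#` or `;` starts a comment running to end of line; neither convention is confirmed by the text above. The name `hex0_decode` is hypothetical.

```python
def hex0_decode(text: str) -> bytes:
    """Translate hand-typed hex text into raw bytes.

    Assumed conventions (not confirmed by the source): whitespace is
    ignored, and '#' or ';' starts a comment running to end of line.
    """
    out = bytearray()
    pending = None  # high nibble waiting for its low-nibble partner
    for line in text.splitlines():
        for ch in line:
            if ch in "#;":
                break  # the rest of this line is a comment
            if ch.isspace():
                continue
            nibble = int(ch, 16)  # raises ValueError on a non-hex character
            if pending is None:
                pending = nibble
            else:
                out.append((pending << 4) | nibble)
                pending = None
    if pending is not None:
        raise ValueError("odd number of hex digits")
    return bytes(out)


# Hypothetical input: four bytes followed by a comment.
sample = "7F 45 4C 46  # ELF magic\n"
print(hex0_decode(sample))  # b'\x7fELF'
```

Because the mapping is two hex digits to one byte with no other state, a reader can check a program like this against its hex listing by eye, which is the property the first rung relies on.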
About the committed binaries
The rung binaries committed in the repo are reference copies only — the reproduction script deletes every one of them and rebuilds the entire ladder from raw before checking a single hash. The vendored rungs (through M2-Planet) come from upstream bootstrap projects and keep their own upstream licenses.
Four claims, separately verified.
Each claim stands on its own check, with its own pinned hashes — none of them depends on believing the others.
A hand-typed root becomes a compiler
Nine rungs, each built only by the prior rung, every artifact's SHA pinned — from hex0 (sha cc1d1741…) through the audited seed.c (sha256 9837db12…) up to kovc. No pre-built compiler is trusted anywhere.
K2 == K3 == K4, byte-identical
seed compiles K1, K1 compiles K2, and onward: K2, K3 and K4 are byte-identical — sha 0992dddd…. This is the same test a self-hosted C compiler passes when stage2 equals stage3, and it is asserted on every reproduction.
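The fixpoint assertion reduces to comparing digests of the later stages. A minimal sketch follows, assuming SHA-256 is the hash in use (the text above says only "sha" for this anchor). The function name `assert_fixpoint`, the exception type, and the demo bytes are hypothetical.

```python
import hashlib


class FixpointError(Exception):
    """Raised when self-compiled stages are not byte-identical."""


def assert_fixpoint(stages: dict) -> str:
    """Return the shared SHA-256 digest of all stages, or raise.

    `stages` maps a stage name (for example "K2") to that stage's bytes.
    """
    if not stages:
        raise ValueError("no stages given")
    digests = {name: hashlib.sha256(blob).hexdigest()
               for name, blob in stages.items()}
    if len(set(digests.values())) != 1:
        raise FixpointError(f"stages differ: {digests}")
    return next(iter(digests.values()))


# Hypothetical stand-in bytes; a real check would read the built binaries.
demo = {"K2": b"demo-binary", "K3": b"demo-binary", "K4": b"demo-binary"}
print(assert_fixpoint(demo))
```

K1 is left out of the comparison on purpose: it is produced by seed rather than by a Helix-built compiler, so only K2 onward are expected to match each other.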
Wheeler diverse double-compile
gcc — an independent lineage with zero M2-Planet ancestry — and the from-raw seed both compile k1src.hx to a byte-identical K1: sha 84363adb…. gcc is only an auditor, never the shipped root. Scope, stated plainly: the DDC covers the seed→K1 rung.
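The cross-check can be modelled as running the same source through two independently produced compilers and requiring identical bytes. The sketch below is a deliberately simplified toy: the two "lineages" are stand-in functions that implement the same tiny translation in different ways, and `cross_check`, `lineage_a`, `lineage_b`, and `backdoored` are all hypothetical names. It assumes SHA-256 for the reported digest.

```python
import hashlib
from typing import Callable

Compiler = Callable[[bytes], bytes]


class LineageMismatch(Exception):
    """Raised when two lineages emit different bytes for the same source."""


def cross_check(source: bytes, auditor: Compiler, shipped: Compiler) -> str:
    """Compile `source` with both lineages; return the shared digest or raise."""
    out_a = auditor(source)
    out_b = shipped(source)
    if out_a != out_b:
        raise LineageMismatch("outputs differ between lineages")
    return hashlib.sha256(out_b).hexdigest()


# Toy "compilers": same specification (hex-encode the input), written twice.
def lineage_a(src: bytes) -> bytes:
    return src.hex().encode("ascii")


def lineage_b(src: bytes) -> bytes:
    return "".join(f"{b:02x}" for b in src).encode("ascii")


# A tampered variant that injects extra output.
def backdoored(src: bytes) -> bytes:
    return lineage_b(src) + b"00"


source = b"fn main() {}"
print(cross_check(source, lineage_a, lineage_b))

try:
    cross_check(source, lineage_a, backdoored)
    detected = False
except LineageMismatch:
    detected = True
print(detected)  # True
```

The check is only as strong as the independence of the two lineages: a change present in one lineage shows up as a byte difference, while a change present in both would not.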
The transformer capstone
A 2-layer transformer trains end-to-end on kovc-emitted GPU kernels (RTX 3070), converging to within 0.0009% of an independent numpy oracle — the bar was 2% — with a sampled finite-difference gradient check and a load-bearing negative control. The oracle computes its own loss curve from the shared initial weights; it never reads Helix's trajectory as input.
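Two of the checks named here, a relative-error bar against an oracle and a sampled finite-difference gradient check with a negative control, can be sketched in plain Python. This is a toy on a quadratic loss, not the capstone's actual harness. The step size, the tolerance, and every function name are assumptions chosen for illustration.

```python
def relative_error_percent(value: float, reference: float) -> float:
    """Relative error of `value` against `reference`, in percent."""
    return abs(value - reference) / abs(reference) * 100.0


def finite_difference_grad(f, w, indices, eps=1e-5):
    """Central-difference estimate of df/dw[i] for each sampled index i."""
    grads = {}
    for i in indices:
        up = list(w)
        up[i] += eps
        down = list(w)
        down[i] -= eps
        grads[i] = (f(up) - f(down)) / (2 * eps)
    return grads


def loss(w):
    return sum(x * x for x in w)


def analytic_grad(w):
    return [2 * x for x in w]


def broken_grad(w):
    # Negative control: a deliberately wrong gradient that the check must reject.
    return [3 * x for x in w]


w = [0.5, -1.25, 2.0, 0.75]
sampled = finite_difference_grad(loss, w, indices=[0, 2])

exact = analytic_grad(w)
max_gap = max(abs(sampled[i] - exact[i]) for i in sampled)

wrong = broken_grad(w)
control_gap = max(abs(sampled[i] - wrong[i]) for i in sampled)

TOL = 1e-6  # assumed tolerance for this toy
print(max_gap < TOL)      # the correct gradient passes
print(control_gap < TOL)  # the negative control must fail
```

The negative control matters because a gradient check that cannot fail proves nothing; feeding it a known-bad gradient shows the tolerance is tight enough to discriminate.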
No Python in the toolchain
Exactly one committed .py file exists in the repo — verification/oracle/oracle_train.py, a fenced numpy audit oracle that is never on the compile or run path. The compiler and runtime are Helix plus a small hand-authored C subset.
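A claim like this is easy to check mechanically with an allowlist. The sketch below scans a directory tree for `.py` files and reports any that are not on the list; it flags extras only and does not verify that the oracle file itself is present. The function name is hypothetical, and the demo runs on a throwaway directory rather than a real checkout.

```python
import tempfile
from pathlib import Path

ALLOWED_PY = {"verification/oracle/oracle_train.py"}


def unexpected_python_files(repo_root, allowed=ALLOWED_PY):
    """Return sorted repo-relative paths of .py files not on the allowlist."""
    root = Path(repo_root)
    found = {p.relative_to(root).as_posix()
             for p in root.rglob("*.py") if p.is_file()}
    return sorted(found - set(allowed))


# Demo on a temporary tree standing in for a checkout.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "verification" / "oracle").mkdir(parents=True)
    (root / "verification" / "oracle" / "oracle_train.py").write_text("# oracle\n")
    clean_result = unexpected_python_files(root)

    (root / "scripts").mkdir()
    (root / "scripts" / "helper.py").write_text("# stray file\n")
    dirty_result = unexpected_python_files(root)

print(clean_result)  # []
print(dirty_result)  # ['scripts/helper.py']
```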
How the chain was declared closed
The trust chain was declared closed on 2026-06-07, at tag v1.3-release, by the project owner after a live joint reproduction. The declaration rests on: the committed one-command reproduction running green in CI; four whole-repo read-only review passes by a reviewer of a different model lineage (ChatGPT), with findings remediated; a context-isolated fresh auditor that independently rebuilt from a clean clone and re-derived every hash; and five earlier context-isolated adversarial reproductions.
One committed command. About a minute.
On a clean checkout, CPU-only, no GPU, no model weights, no oracle. Fail-closed: it exits nonzero if anything mismatches.
git clone https://github.com/Questeria/helix && cd helix
bash scripts/reproduce_trust.sh
What that one command does: it deletes every pre-built rung binary in the checkout, rebuilds the entire ladder from the raw sources, runs the self-host fixpoint and the gcc diverse double-compile, asserts the pinned anchors, and exits nonzero on any mismatch. It runs green in CI on a clean GitHub ubuntu-latest runner (.github/workflows/trust-reproduce.yml) on every push — push-button for any third party with a fork.
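The fail-closed behaviour described above can be sketched as a pin check that collects every mismatch and maps any failure to a nonzero exit code. This is a model of the idea, not the contents of reproduce_trust.sh. It assumes SHA-256 pins, and the artifact names, bytes, and function names are hypothetical.

```python
import hashlib


def check_pins(artifacts: dict, pins: dict) -> list:
    """Compare each pinned digest with the rebuilt artifact's SHA-256.

    Returns a list of human-readable failures; an empty list means all
    pins matched. A pinned artifact that is missing counts as a failure.
    """
    failures = []
    for name, expected in pins.items():
        blob = artifacts.get(name)
        if blob is None:
            failures.append(f"{name}: missing")
            continue
        actual = hashlib.sha256(blob).hexdigest()
        if actual != expected:
            failures.append(f"{name}: expected {expected}, got {actual}")
    return failures


def exit_code(failures: list) -> int:
    """Fail closed: any failure at all yields a nonzero status."""
    return 1 if failures else 0


# Hypothetical rebuilt artifacts and their pins.
rebuilt = {"rung_a": b"first rung bytes", "rung_b": b"second rung bytes"}
pins = {name: hashlib.sha256(blob).hexdigest() for name, blob in rebuilt.items()}

print(exit_code(check_pins(rebuilt, pins)))  # 0

tampered = dict(rebuilt, rung_b=b"second rung bytez")
print(exit_code(check_pins(tampered, pins)))  # 1
```

Treating a missing artifact as a failure, rather than skipping it, is what keeps the check fail-closed: an incomplete rebuild cannot pass by omission.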
Beyond the trust core, every kovc build is held to the universal gate, scripts/gate_kovc.sh: the self-host fixpoint, a 109-program feature corpus, 4 negative-diagnostic checks, and a PTX byte-diff.
Further reading: CLEAN_REPRODUCTION.md · QUICKSTART.md · reproduce_trust.sh itself.
Isn't this just “Reflections on Trusting Trust” again?
It's the answer to it, scoped honestly. Wheeler's diverse double-compile uses two compilers of independent lineage: if gcc (zero M2-Planet ancestry) and the from-raw seed produce a byte-identical K1, a self-reproducing backdoor would have to live in both independent lineages at once. The DDC covers the seed→K1 rung — that scope is stated, not hidden.
Why are there binaries in the repo at all?
Convenience only. They are reference copies — the reproduction script deletes them before doing anything else and rebuilds from raw. Nothing in the verification path ever executes a committed binary it didn't just rebuild and hash-match.
Who has checked this besides the author?
A different-model-lineage reviewer (ChatGPT) made four whole-repo read-only review passes, with findings remediated; a context-isolated fresh auditor rebuilt from a clean clone and re-derived every hash; five earlier context-isolated adversarial reproductions preceded that; and the public CI reruns the whole thing on every push. Reproduction by a party with no connection to the author is the one outstanding increment — now push-button via the public CI and a fork.
If gcc is involved, doesn't the chain trust gcc?
No. gcc is only an auditor in the diverse double-compile — a second, independent lineage used to cross-check one rung. The shipped root is the from-raw ladder; nothing that ships was built by gcc.
What remains trusted, stated plainly.
These residuals are stated so the claim is precise, not inflated. A closed trust chain has edges; here are all of them.
- The shared TCB. Host OS and kernel, filesystem, shell and coreutils, gcc/libc/binutils/loader, CPU and microcode, RAM, and the audited seed.c source remain trusted. A diverse double-compile says nothing about layers both compilers share. seed.c is auditable line-by-line, but it is trusted-by-reading, and we say so.
- Complete to PTX, not SASS. The CPU path is all-the-way-down from raw binary. The GPU path is hex0→PTX, then trusts NVIDIA's closed ptxas (the one trusted-once boundary) plus the C host launcher. Porting the launcher to Helix would move that boundary, not close it.
- GPU performance is ~50–67% of cuBLAS, not parity. The end-to-end capstone speedup is 7.0–8.7×, Amdahl-bound. Helix emits correct, reasonably performant kernels; it does not beat NVIDIA's hand-tuned library, and never claimed to.
- The broader v1.1 language surface is checked behaviorally, not byte-identically. Generics, traits, closures, turbofish, wide-field, and bf16 are cross-checked by a zero-lineage interpreter, where byte-identical comparison is impossible by construction. That witness is kept out-of-tree, so it is not clean-checkout reproducible. The byte-identical, hash-pinned DDC covers the seed→K1 rung.
- Single hardware target. sm_86, fp32. No other targets or precisions are claimed.
- One increment outstanding. Reproduction by a party with no connection to the author — now push-button via the public CI and a fork.
See what runs on it.
The same chain that closes here emits the GPU kernels that ran GPT-2 and SmolLM2 — verified token-for-token against an independent reference.