from raw binary · trust chain closed · v1.3-release

Nine rungs from 299 bytes to a compiler.

No pre-built compiler is trusted anywhere in the chain. Each rung is built only by the rung before it, starting from 299 hand-authored bytes you can verify by reading. The chain was declared closed on 2026-06-07 after a live joint reproduction, with every residual trust assumption disclosed alongside the claim.

The ladder

Every rung built only by the rung before it.

From hex characters typed by hand to kovc, the Helix compiler written in Helix. Nothing on this ladder was ever compiled by a compiler you didn't watch get built.

Rungs: 9 · Trust root: 299 hand-typed bytes · Fixpoint: K2 == K3 == K4, sha 0992dddd… · gcc-DDC: K1 sha 84363adb… · Reproduction: ~1 min, CPU-only
| Rung | Size | Role | Built by |
| --- | --- | --- | --- |
| hex0 | 299 B | Hex characters → bytes. The only thing you trust by reading: 299 hand-authored bytes at stage0/hex0/hex0.bin, sha cc1d1741… | your own eyes |
| hex1 | 622 B | Adds single-character labels | hex0 |
| hex2 | 1,519 B | Long labels and linking | hex1 |
| catm | 299 B | File concatenation | hex2 |
| M0 | 1,684 B | Macro assembler | catm + hex2 |
| cc_amd64 | 17,976 B | Minimal C compiler | M0 |
| M2-Planet | 200,561 B | Full C compiler; the last vendored rung | cc_amd64 |
| seed | 62,467 B | Original C-subset bootstrap compiler: seed.c, sha256 9837db12… | M2-Planet |
| kovc | 698,392 B | The Helix compiler, written in Helix: helixc/bootstrap/{lexer,parser,kovc}.hx | seed |
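
To make the root rung concrete, here is a minimal shell sketch of the transformation hex0 performs: hex text in, raw bytes out. It is an illustration only, not the audited hex0.bin. The comment characters (`#` and `;`, assumed here following the common stage0 convention), the rule of silently skipping every other non-hex character, and the name `hex0_sketch` are all assumptions of this sketch.

```shell
#!/usr/bin/env bash
# Illustrative sketch of hex0's job (NOT the audited stage0/hex0/hex0.bin):
# read hex text on stdin and write the bytes it spells to stdout.
# Assumptions: '#' and ';' start a comment that runs to end of line, and any
# other non-hex character (spaces, newlines) is ignored.
set -euo pipefail

hex0_sketch() {
  local digits pair i
  # Drop comments, then keep only hex digits.
  digits=$(sed 's/[#;].*//' | tr -cd '0-9A-Fa-f')
  if (( ${#digits} % 2 != 0 )); then
    echo "hex0_sketch: odd number of hex digits" >&2
    return 1
  fi
  # Emit one byte per digit pair. The pair is converted to a 3-digit octal
  # escape (e.g. 48 -> \110), which printf's format string understands.
  for (( i = 0; i < ${#digits}; i += 2 )); do
    pair=${digits:i:2}
    printf "\\$(printf '%03o' "0x$pair")"
  done
}
```

Fed `48 65 6c ; "Hel"` on one line and `6c 6f # "lo"` on the next, the sketch prints `Hello`: the comments are discarded before any digit is read.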

About the committed binaries

The rung binaries committed in the repo are reference copies only — the reproduction script deletes every one of them and rebuilds the entire ladder from raw before checking a single hash. The vendored rungs (through M2-Planet) come from upstream bootstrap projects and keep their own upstream licenses.

The evidence

Four claims, separately verified.

Each claim stands on its own check, with its own pinned hashes — none of them depends on believing the others.

No Python in the toolchain

Exactly one committed .py file exists in the repo — verification/oracle/oracle_train.py, a fenced numpy audit oracle that is never on the compile or run path. The compiler and runtime are Helix plus a small hand-authored C subset.
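
A reader can check the one-.py-file claim directly on a clean clone. The helper below, `check_single_py`, is a hypothetical reader-side check, not something the repo is said to ship. It relies on `git ls-files`, whose `*.py` pathspec also matches files in subdirectories, and it should be run from the repository root.

```shell
#!/usr/bin/env bash
# Hypothetical reader-side check (not shipped by the repo): list every tracked
# .py file and fail unless there is exactly one. Run from the repository root.
set -euo pipefail

check_single_py() {
  local -a files=()
  # git pathspec wildcards match across '/', so this finds .py files at any depth.
  mapfile -t files < <(git ls-files -- '*.py')
  if (( ${#files[@]} != 1 )); then
    echo "expected exactly one tracked .py file, found ${#files[@]}" >&2
    return 1
  fi
  printf '%s\n' "${files[0]}"
}
```

If the claim holds, running `check_single_py` at the root of a fresh clone should print `verification/oracle/oracle_train.py` and exit 0.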

How the chain was declared closed

The trust chain was declared closed on 2026-06-07, at tag v1.3-release, by the project owner after a live joint reproduction. The declaration rests on: the committed one-command reproduction running green in CI; four whole-repo read-only review passes by a reviewer of a different model lineage (ChatGPT), with findings remediated; a context-isolated fresh auditor that independently rebuilt from a clean clone and re-derived every hash; and five earlier context-isolated adversarial reproductions.

Reproduce it yourself

One committed command. About a minute.

On a clean checkout, CPU-only, no GPU, no model weights, no oracle. Fail-closed: it exits nonzero if anything mismatches.

clean checkout · tier A · trust core
git clone https://github.com/Questeria/helix && cd helix
bash scripts/reproduce_trust.sh
~1 minute · CPU-only · fail-closed asserts: seed 9837db12 · fixpoint 0992dddd · DDC K1 84363adb

What that one command does: it deletes every pre-built rung binary in the checkout, rebuilds the entire ladder from the raw sources, runs the self-host fixpoint and the gcc diverse double-compile, asserts the pinned anchors, and exits nonzero on any mismatch. It runs green in CI on every push, on a clean GitHub ubuntu-latest runner (.github/workflows/trust-reproduce.yml), so any third party with a fork can rerun it push-button.
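
The fail-closed anchor check can be pictured as below. This is a hypothetical helper written for illustration, not an excerpt of scripts/reproduce_trust.sh; the name `assert_sha_prefix` and the use of GNU coreutils `sha256sum` are assumptions. It compares 8-hex-digit prefixes because that is all this page shows; checking full digests works the same way.

```shell
#!/usr/bin/env bash
# Hypothetical fail-closed anchor check (illustration only, not taken from
# scripts/reproduce_trust.sh). Requires GNU coreutils sha256sum.
set -euo pipefail

# assert_sha_prefix FILE PREFIX
# Prints an "ok" line if sha256(FILE) starts with PREFIX; otherwise prints
# the mismatch to stderr and exits the whole script with status 1.
assert_sha_prefix() {
  local file=$1 prefix=$2 actual
  # A missing file makes sha256sum fail, which also aborts under set -e.
  actual=$(sha256sum -- "$file" | cut -d' ' -f1)
  case $actual in
    "$prefix"*) echo "ok   $file ${actual:0:${#prefix}}" ;;
    *)
      echo "FAIL $file: expected ${prefix}..., got $actual" >&2
      exit 1
      ;;
  esac
}
```

For instance, `assert_sha_prefix path/to/K2 0992dddd` (the path is hypothetical) would stop the run unless that binary hashed to the pinned fixpoint prefix.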

Beyond the trust core, every kovc build is held to the universal gate, scripts/gate_kovc.sh: the self-host fixpoint, a 109-program feature corpus, 4 negative-diagnostic checks, and a PTX byte-diff.
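
The self-host fixpoint (K2 == K3 == K4) amounts to this: let each stage compile the compiler's own source, and require the output to stop changing. A hypothetical sketch follows. The calling convention `COMPILER SRC -o OUT`, the single source path, and the name `fixpoint_check` are assumptions for illustration; this page does not show kovc's actual command line.

```shell
#!/usr/bin/env bash
# Hypothetical self-host fixpoint check. The "COMPILER SRC -o OUT" calling
# convention is an assumption for illustration, not kovc's real interface.
set -euo pipefail

# fixpoint_check K1 SRC DIR
# Builds K2 with K1, K3 with K2 and K4 with K3 (all from the same SRC, into
# DIR), then prints K2's sha256 if K2, K3 and K4 are byte-identical.
fixpoint_check() {
  local prev=$1 src=$2 dir=$3 n
  for n in 2 3 4; do
    "$prev" "$src" -o "$dir/K$n"
    chmod +x "$dir/K$n"
    prev=$dir/K$n   # the stage just built compiles the next one
  done
  if cmp -s "$dir/K2" "$dir/K3" && cmp -s "$dir/K3" "$dir/K4"; then
    sha256sum -- "$dir/K2" | cut -d' ' -f1
  else
    echo "fixpoint FAILED: K2, K3 and K4 are not identical" >&2
    return 1
  fi
}
```

Comparing three successive stages, rather than just two, also catches a compiler whose output oscillates or keeps drifting between generations.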

Further reading: CLEAN_REPRODUCTION.md · QUICKSTART.md · reproduce_trust.sh itself.

Isn't this just “Reflections on Trusting Trust” again?

It's the answer to it, scoped honestly. Wheeler's diverse double-compile uses two compilers of independent lineage: if the seed built by gcc (zero M2-Planet ancestry) and the seed built from raw produce a byte-identical K1, a self-reproducing backdoor would have to live in both independent lineages at once. The DDC covers the seed→K1 rung; that scope is stated, not hidden.
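
The byte-identical comparison at the heart of this DDC, as described here, can be sketched as follows. Two seed binaries of independent lineage each compile the same kovc sources, and the two K1 outputs must match bit for bit. The `SEED SRC -o OUT` calling convention and the name `ddc_compare` are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the seed->K1 diverse double-compile comparison.
# SEED_A and SEED_B stand for the two independently built seed binaries; the
# "SEED SRC -o OUT" calling convention is an assumption for illustration.
set -euo pipefail

# ddc_compare SEED_A SEED_B SRC DIR
# Builds K1 from SRC once with each seed and prints its sha256 only if the
# two results are byte-identical; otherwise reports failure and returns 1.
ddc_compare() {
  local seed_a=$1 seed_b=$2 src=$3 dir=$4
  "$seed_a" "$src" -o "$dir/K1.a"
  "$seed_b" "$src" -o "$dir/K1.b"
  if cmp -s "$dir/K1.a" "$dir/K1.b"; then
    sha256sum -- "$dir/K1.a" | cut -d' ' -f1
  else
    echo "DDC FAILED: K1 differs between the two lineages" >&2
    return 1
  fi
}
```

The two seed binaries themselves will normally differ byte for byte, since different compilers built them; only their output is expected to agree, which is why the comparison happens one rung later, at K1.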

Why are there binaries in the repo at all?

Convenience only. They are reference copies — the reproduction script deletes them before doing anything else and rebuilds from raw. Nothing in the verification path ever executes a committed binary it didn't just rebuild and hash-match.

Who has checked this besides the author?

A different-model-lineage reviewer (ChatGPT) made four whole-repo read-only review passes, with findings remediated; a context-isolated fresh auditor rebuilt from a clean clone and re-derived every hash; five earlier context-isolated adversarial reproductions preceded that; and the public CI reruns the whole thing on every push. Reproduction by a party with no connection to the author is the one outstanding increment — now push-button via the public CI and a fork.

If gcc is involved, doesn't the chain trust gcc?

No. gcc is only an auditor in the diverse double-compile — a second, independent lineage used to cross-check one rung. The shipped root is the from-raw ladder; nothing that ships was built by gcc.

The honest boundary

What remains trusted, stated plainly.

These residuals are stated so the claim is precise, not inflated. A closed trust chain has edges; here are all of them.

  1. The shared TCB. Host OS and kernel, filesystem, shell and coreutils, gcc/libc/binutils/loader, CPU and microcode, RAM — and the audited seed.c source — remain trusted. A diverse double-compile says nothing about layers both compilers share. seed.c is auditable line-by-line, but it is trusted-by-reading, and we say so.
  2. Complete to PTX, not SASS. The CPU path is all-the-way-down from raw binary. The GPU path is hex0→PTX, then trusts NVIDIA's closed ptxas — the one trusted-once boundary — plus the C host launcher. Porting the launcher to Helix would move that boundary, not close it.
  3. GPU performance is ~50–67% of cuBLAS, not parity. The end-to-end capstone speedup is 7.0–8.7×, Amdahl-bound. Helix emits correct, reasonably-performant kernels; it does not beat NVIDIA's hand-tuned library, and never claimed to.
  4. The broader v1.1 language surface is checked behaviorally, not byte-identically. Generics, traits, closures, turbofish, wide-field, and bf16 are cross-checked against a zero-lineage interpreter. Byte-identical comparison is impossible by construction there, since an interpreter produces no binary to diff, and that witness is kept out-of-tree, so it is not clean-checkout reproducible. The byte-identical, hash-pinned DDC covers only the seed→K1 rung.
  5. Single hardware target. sm_86, fp32. No other targets or precisions are claimed.
  6. One increment outstanding. Reproduction by a party with no connection to the author — now push-button via the public CI and a fork.
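
On "Amdahl-bound" (item 3): suppose a fraction p of the baseline run time is spent in the kernels Helix accelerates, and those kernels get faster by a factor s. Amdahl's law then caps the end-to-end speedup S however large s becomes. Reading the 7.0–8.7× capstone figure this way is an interpretation, since the page does not state p or s. Under that reading, the figure bounds the share of baseline time that lies outside the accelerated kernels:

```latex
\begin{aligned}
S &= \frac{1}{(1-p) + p/s} \;<\; \frac{1}{1-p}
  \quad\Longrightarrow\quad 1-p \;<\; \frac{1}{S}
  \qquad (0 < p \le 1,\ s \text{ finite}),\\[4pt]
S = 7.0 &\;\Rightarrow\; 1-p < \tfrac{1}{7.0} \approx 0.143,
\qquad
S = 8.7 \;\Rightarrow\; 1-p < \tfrac{1}{8.7} \approx 0.115.
\end{aligned}
```

In words: less than about 14% (at 7.0×) or 11.5% (at 8.7×) of the baseline time sits outside the accelerated kernels, and it is that remainder, not kernel speed, that limits further end-to-end gains.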

From 299 bytes to GPT-2

See what runs on it.

The same chain that closes here emits the GPU kernels that ran GPT-2 and SmolLM2 — verified token-for-token against an independent reference.