from raw binary · trust chain closed · v1.3-release

Nine rungs from 299 bytes to a compiler.

No pre-built compiler is trusted anywhere in the chain. Each rung is built only by the rung before it, starting from 299 hand-authored bytes you can verify by reading. The chain was declared closed on 2026-06-07, after a live joint reproduction — with every residual disclosed alongside the claim.

The ladder

Every rung built only by the rung before it.

From hex characters typed by hand to kovc, the Helix compiler written in Helix. Nothing on this ladder was ever compiled by a compiler you didn't watch get built.

rungs: 9 · trust root: 299 hand-typed bytes · fixpoint: K2 == K3 == K4, sha 0992dddd… · gcc-DDC: K1, sha 84363adb… · reproduction: ~1 min, CPU-only

Rung	Size	Role	Built by
hex0	299 B	Hex characters → bytes. The only thing you trust by reading — 299 hand-authored bytes at stage0/hex0/hex0.bin, sha cc1d1741…	your own eyes
hex1	622 B	Adds single-character labels	hex0
hex2	1,519 B	Long labels and linking	hex1
catm	299 B	File concatenation	hex2
M0	1,684 B	Macro assembler	catm + hex2
cc_amd64	17,976 B	Minimal C compiler	M0
M2-Planet	200,561 B	Full C compiler — the last vendored rung	cc_amd64
seed	62,467 B	Original C-subset bootstrap compiler — seed.c, sha256 9837db12…	M2-Planet
kovc	698,392 B	The Helix compiler, written in Helix — helixc/bootstrap/{lexer,parser,kovc}.hx	seed
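
To make the bottom rung concrete, here is a minimal sketch of the transformation hex0 performs: pairs of hex characters become bytes. It is an illustration in Python, not the 299-byte binary itself. The function name hex0, the treatment of # and ; as line-comment markers, and the error on a dangling digit are assumptions made for this sketch.

```python
def hex0(text: str) -> bytes:
    """Turn pairs of hex digits into bytes, skipping whitespace and comments."""
    out = bytearray()
    high = None  # first nibble of the pair being assembled, if any
    for line in text.splitlines():
        for ch in line:
            if ch in "#;":  # assumed comment markers: ignore the rest of the line
                break
            if ch in "0123456789abcdefABCDEF":
                nibble = int(ch, 16)
                if high is None:
                    high = nibble
                else:
                    out.append((high << 4) | nibble)
                    high = None
            # any other character (spaces, tabs) is ignored
    if high is not None:
        # choice made for this sketch: refuse input with an unpaired digit
        raise ValueError("odd number of hex digits")
    return bytes(out)
```

Everything above this rung only adds convenience (labels, linking, macros) on top of that one mapping, which is why the root can be checked by reading.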

About the committed binaries

The rung binaries committed in the repo are reference copies only — the reproduction script deletes every one of them and rebuilds the entire ladder from raw before checking a single hash. The vendored rungs (through M2-Planet) come from upstream bootstrap projects and keep their own upstream licenses.

The evidence

Four claims, separately verified.

Each claim stands on its own check, with its own pinned hashes — none of them depends on believing the others.

Claim I · provenance

A hand-typed root becomes a compiler

Nine rungs, each built only by the prior rung, every artifact's SHA pinned — from hex0 (sha cc1d1741…) through the audited seed.c (sha256 9837db12…) up to kovc. No pre-built compiler is trusted anywhere.

Claim II · self-host fixpoint

K2 == K3 == K4, byte-identical

seed compiles K1, K1 compiles K2, K2 compiles K3, and K3 compiles K4. K2, K3 and K4 are byte-identical (sha 0992dddd…). This is the same test a self-hosted C compiler passes when stage2 equals stage3, and it is asserted on every reproduction.
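
The fixpoint assertion amounts to hashing each stage and requiring a single digest. The sketch below assumes SHA-256 and takes the stage binaries as in-memory bytes; sha256_hex and check_fixpoint are hypothetical names, not functions from the repo.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Hex SHA-256 digest of a byte string."""
    return hashlib.sha256(data).hexdigest()


def check_fixpoint(stages: dict) -> str:
    """Return the common digest if all stages are byte-identical, else raise.

    `stages` maps a stage name (for example "K2") to that stage's bytes.
    """
    digests = {name: sha256_hex(blob) for name, blob in stages.items()}
    if len(set(digests.values())) != 1:
        raise AssertionError(f"fixpoint broken: {digests}")
    return next(iter(digests.values()))
```

In practice the three byte strings would be read from the freshly rebuilt K2, K3 and K4 files, and the returned digest compared against the pinned anchor.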

Claim III · trusting-trust defense

Wheeler diverse double-compile

gcc — an independent lineage with zero M2-Planet ancestry — and the from-raw seed both compile k1src.hx to a byte-identical K1: sha 84363adb…. gcc is only an auditor, never the shipped root. Scope, stated plainly: the DDC covers the seed→K1 rung.
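
The shape of the diverse double-compile check can be sketched with two independent builders modelled as plain functions from source bytes to output bytes. This is a toy illustration under that assumption. The names Compiler, first_difference and diverse_double_compile are hypothetical, and the real check compares the K1 produced by each lineage rather than calling Python functions.

```python
from typing import Callable, Optional

Compiler = Callable[[bytes], bytes]


def first_difference(a: bytes, b: bytes) -> Optional[int]:
    """Offset of the first differing byte, or None if the inputs are identical."""
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return i
    if len(a) != len(b):
        return min(len(a), len(b))  # one input is a strict prefix of the other
    return None


def diverse_double_compile(source: bytes, auditor: Compiler, root: Compiler) -> bytes:
    """Compile `source` along both lineages and insist on byte-identical output."""
    out_auditor = auditor(source)
    out_root = root(source)
    diff = first_difference(out_auditor, out_root)
    if diff is not None:
        raise AssertionError(f"outputs diverge at byte offset {diff}")
    return out_root
```

A backdoor present in only one lineage would change that lineage's output and surface as a reported offset. The check says nothing about anything both lineages share.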

Claim IV · real capability

The transformer capstone

A 2-layer transformer trains end-to-end on kovc-emitted GPU kernels (RTX 3070), converging to within 0.0009% of an independent numpy oracle — the bar was 2% — with a sampled finite-difference gradient check and a load-bearing negative control. The oracle computes its own loss curve from the shared initial weights; it never reads Helix's trajectory as input.
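
A sampled finite-difference gradient check of the kind named above can be illustrated on a toy loss. This is a minimal sketch, not the project's checker: loss, analytic_grad and sampled_grad_check are hypothetical names, the loss is a stand-in for a model loss, and plain Python lists are used instead of numpy to keep the example self-contained.

```python
import random


def loss(w):
    """Toy loss: sum of squares plus a coupling term between first and last weight."""
    return sum(x * x for x in w) + w[0] * w[-1]


def analytic_grad(w):
    """Hand-derived gradient of `loss`."""
    g = [2.0 * x for x in w]
    g[0] += w[-1]
    g[-1] += w[0]
    return g


def sampled_grad_check(f, grad, w, samples=3, eps=1e-5, seed=0):
    """Max relative error between grad(w) and central differences at sampled indices."""
    rng = random.Random(seed)
    g = grad(w)
    worst = 0.0
    for i in rng.sample(range(len(w)), samples):
        hi = list(w)
        hi[i] += eps
        lo = list(w)
        lo[i] -= eps
        numeric = (f(hi) - f(lo)) / (2 * eps)
        denom = max(abs(numeric), abs(g[i]), 1e-12)
        worst = max(worst, abs(numeric - g[i]) / denom)
    return worst
```

A negative control in the same spirit would pass a deliberately wrong gradient (for example one that drops the coupling term) with samples set to the full length of w, and confirm that the reported error becomes large. A check that cannot fail on a wrong gradient proves nothing.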

No Python in the toolchain

Exactly one committed .py file exists in the repo — verification/oracle/oracle_train.py, a fenced numpy audit oracle that is never on the compile or run path. The compiler and runtime are Helix plus a small hand-authored C subset.
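
The single-file claim is easy to check mechanically on a checkout. The sketch below walks a working tree and lists every .py file. The name committed_py_files is hypothetical, and walking the directory is only an approximation of "committed": it would also pick up untracked files, so it assumes a clean checkout.

```python
import os


def committed_py_files(root: str) -> list:
    """Return sorted root-relative paths of every .py file under root, skipping .git."""
    found = []
    for dirpath, dirnames, filenames in os.walk(root):
        # prune the VCS metadata directory in place so os.walk does not descend into it
        dirnames[:] = [d for d in dirnames if d != ".git"]
        for name in filenames:
            if name.endswith(".py"):
                rel = os.path.relpath(os.path.join(dirpath, name), root)
                found.append(rel.replace(os.sep, "/"))
    return sorted(found)
```

On a clean clone the expected result, per the claim above, is a one-element list containing verification/oracle/oracle_train.py.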

How the chain was declared closed

The trust chain was declared closed on 2026-06-07, at tag v1.3-release, by the project owner after a live joint reproduction. The declaration rests on: the committed one-command reproduction running green in CI; four whole-repo read-only review passes by a reviewer of a different model lineage (ChatGPT), with findings remediated; a context-isolated fresh auditor that independently rebuilt from a clean clone and re-derived every hash; and five earlier context-isolated adversarial reproductions.

Reproduce it yourself

One committed command. About a minute.

On a clean checkout, CPU-only, no GPU, no model weights, no oracle. Fail-closed: it exits nonzero if anything mismatches.

Tier A · trust core · ~1 min · CPU-only

The step-by-step walkthrough

One committed command on a clean checkout deletes the pre-built rungs, rebuilds the entire ladder from raw, runs the self-host fixpoint and the gcc diverse-double-compile, and asserts the pinned anchors — fail-closed, and green in CI on a clean ubuntu-latest runner on every push. The full guide, the expected output, and the current anchors are on the Reproduce page.
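
Fail-closed anchor checking can be sketched as follows: every pinned artifact must exist and hash to its pinned value, and any problem yields a nonzero exit status. This is an illustration, not reproduce_trust.sh. The names verify_anchors and main, the use of SHA-256, and the in-memory artifact dictionaries are assumptions of the sketch.

```python
import hashlib
import sys


def verify_anchors(artifacts: dict, pinned: dict) -> list:
    """Return human-readable mismatches; an empty list means every anchor matched.

    `artifacts` maps a name to rebuilt bytes, `pinned` maps a name to a hex digest.
    """
    problems = []
    for name, expected in pinned.items():
        blob = artifacts.get(name)
        if blob is None:
            problems.append(f"{name}: artifact missing")
            continue
        actual = hashlib.sha256(blob).hexdigest()
        if actual != expected:
            problems.append(f"{name}: expected {expected}, got {actual}")
    return problems


def main(artifacts: dict, pinned: dict) -> int:
    """Report mismatches on stderr and return the process exit status."""
    problems = verify_anchors(artifacts, pinned)
    for problem in problems:
        print("MISMATCH", problem, file=sys.stderr)
    return 1 if problems else 0
```

A caller would finish with sys.exit(main(artifacts, pinned)). Treating a missing artifact as a failure rather than a skip is what makes the check fail-closed.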

Beyond the trust core, every kovc build is held to the universal gate, scripts/gate_kovc.sh: the self-host fixpoint, a 109-program feature corpus, 4 negative-diagnostic checks, and a PTX byte-diff.

Further reading: CLEAN_REPRODUCTION.md · QUICKSTART.md · reproduce_trust.sh itself.

Isn't this just “Reflections on Trusting Trust” again?

It's the answer to it, scoped honestly. Wheeler's diverse double-compile uses two compilers of independent lineage: if gcc (zero M2-Planet ancestry) and the from-raw seed produce a byte-identical K1, a self-reproducing backdoor would have to live in both independent lineages at once. The DDC covers the seed→K1 rung — that scope is stated, not hidden.

Why are there binaries in the repo at all?

Convenience only. They are reference copies — the reproduction script deletes them before doing anything else and rebuilds from raw. Nothing in the verification path ever executes a committed binary it didn't just rebuild and hash-match.

Who has checked this besides the author?

A different-model-lineage reviewer (ChatGPT) made four whole-repo read-only review passes, with findings remediated; a context-isolated fresh auditor rebuilt from a clean clone and re-derived every hash; five earlier context-isolated adversarial reproductions preceded that; and the public CI reruns the whole thing on every push. Reproduction by a party with no connection to the author is the one outstanding increment — now push-button via the public CI and a fork.

If gcc is involved, doesn't the chain trust gcc?

No. gcc is only an auditor in the diverse double-compile — a second, independent lineage used to cross-check one rung. The shipped root is the from-raw ladder; nothing that ships was built by gcc.

The honest boundary

What remains trusted, stated plainly.

These residuals are stated so the claim is precise, not inflated. A closed trust chain has edges; here are all of them.

The shared TCB. Host OS and kernel, filesystem, shell and coreutils, gcc/libc/binutils/loader, CPU and microcode, RAM — and the audited seed.c source — remain trusted. A diverse double-compile says nothing about layers both compilers share. seed.c is auditable line-by-line, but it is trusted-by-reading, and we say so.
Complete to PTX, and, as of v1.5, to SASS for one kernel. The CPU path is all-the-way-down from raw binary. The GPU path is hex0→PTX, then trusts NVIDIA's closed ptxas (the one trusted-once boundary) plus the C host launcher. As of v1.5 that boundary is broken for a first named kernel (vector_add, sm_86): a from-scratch translation validator independently decodes and interprets the SASS that ptxas emits and proves that it computes the spec for all inputs, removing ptxas from the trusted base for that kernel. It is a per-compilation witness covering one kernel so far: ptxas still runs and the GPU still executes the SASS. Extending it across kernels is ongoing. Porting the C host launcher to Helix would move the remaining boundary, not close it.
GPU performance is ~50–67% of cuBLAS, not parity. The end-to-end capstone speedup is 7.0–8.7×, Amdahl-bound. Helix emits correct, reasonably-performant kernels; it does not beat NVIDIA's hand-tuned library, and never claimed to.
The broader v1.1 language surface is checked behaviorally, not byte-identically. Generics, traits, closures, turbofish, wide-field, bf16 are cross-checked by a zero-lineage interpreter — byte-identical comparison is impossible by construction there — and that witness is kept out-of-tree, so it is not clean-checkout reproducible. The byte-identical, hash-pinned DDC covers the seed→K1 rung.
Single hardware target. sm_86, fp32. No other targets or precisions are claimed.
One increment outstanding. Reproduction by a party with no connection to the author — now push-button via the public CI and a fork.

From 299 bytes to GPT-2

See what runs on it.

The same chain that closes here emits the GPU kernels that ran GPT-2 and SmolLM2 — verified token-for-token against an independent reference.
