Helix — the verifiable execution layer
Watch a 1.5-billion-parameter model think on a stack you can rebuild from 299 bytes.
GPT-2-XL is a 2019 base completion model. It continues text; it is not tuned to chat, follow instructions, or be factual. What is being proven live here is the verified compute underneath: every layer and kernel comes from the from-raw Helix toolchain.
1.5B params · 48 layers
8 kovc-emitted kernels
fp32 · forward-only · greedy
live XL ≈ 10 s/token — intentionally slow; the pitch is trust, not speed
Conversation — text completion
GPT-2-XL · fp32 · greedy
Give GPT-2 some text to continue
Pick a seed below or type your own. GPT-2-XL will continue it token-by-token while the
48 transformer layers and kovc kernels light up on the right. It is a base model — expect
continuations, not answers.
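For readers who want to see what "greedy, token-by-token" means, here is a minimal sketch. It is not the Helix code path: `next_logits` is a hypothetical stand-in for the model's forward pass, and `toy_logits` is a 4-token toy used only to make the example runnable.

```python
def greedy_decode(next_logits, prompt_ids, n_new):
    """Append n_new tokens, always taking the highest-scoring next token."""
    ids = list(prompt_ids)
    for _ in range(n_new):
        logits = next_logits(ids)  # one forward pass over the context so far
        best = max(range(len(logits)), key=logits.__getitem__)  # argmax
        ids.append(best)
    return ids


def toy_logits(ids):
    """Toy stand-in for the model: always prefers (last token + 1) mod 4."""
    scores = [0.0] * 4
    scores[(ids[-1] + 1) % 4] = 1.0
    return scores


print(greedy_decode(toy_logits, [0], 3))
```

Greedy decoding has no sampling step, so the same prompt and the same weights yield the same continuation every time.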
Conversation = repeated completion with carried context.
Each turn re-sends the conversation so far as one completion prompt — the model itself is
stateless between requests: a 2019 base completion model, not an assistant. The live
server caps the prompt at ~320 tokens (--max-ctx); when the carried text would
blow that budget, the oldest text is cut first and the page says so.
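The carried-context rule above can be sketched roughly as follows. This is an illustration under assumptions, not the page's actual code: `build_prompt` is a hypothetical helper, and whitespace splitting stands in for the real GPT-2 tokenizer, so token counts here will not match the server's.

```python
def build_prompt(turns, max_ctx=320, tokenize=str.split, detokenize=" ".join):
    """Join all turns into one completion prompt, cutting the oldest tokens first.

    Returns (prompt, truncated) so the caller can tell the user when text was cut.
    The tokenizer is an assumption: whitespace split, not GPT-2 BPE.
    """
    tokens = []
    for turn in turns:
        tokens.extend(tokenize(turn))
    truncated = len(tokens) > max_ctx
    if truncated:
        tokens = tokens[-max_ctx:]  # keep only the newest max_ctx tokens
    return detokenize(tokens), truncated


prompt, truncated = build_prompt(["a b c", "d e"], max_ctx=3)
print(prompt, truncated)
```

Because the model is stateless between requests, every turn rebuilds the whole prompt this way and sends it again.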
Enter ↵ to run · Shift+Enter for a newline
GPU busy — one generation at a time (single-flight; the server keeps no queue, so the page just waits politely and retries).
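A wait-and-retry client for a single-flight server might look like the sketch below. How the real server signals "busy" is not stated on this page, so `GpuBusy`, `send`, and the default delay and attempt count are all hypothetical.

```python
import time


class GpuBusy(Exception):
    """Hypothetical signal that the server is already running a generation."""


def generate_with_retry(send, prompt, delay_s=2.0, max_tries=30, sleep=time.sleep):
    """Call send(prompt); if the GPU is busy, wait delay_s and try again.

    The server keeps no queue, so all of the waiting happens on the client.
    """
    for _ in range(max_tries):
        try:
            return send(prompt)
        except GpuBusy:
            sleep(delay_s)
    raise TimeoutError("GPU stayed busy for every attempt")
```

Passing `sleep` as a parameter keeps the retry loop easy to exercise without real delays.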
Honest residuals: fp32-only · complete-to-PTX-not-SASS · single GPU (sm_86) ·
base-model-not-assistant · oracle-shares-spec · never-claimed-AGI.
This is a demonstration of verifiable execution — not a claim of model quality, speed records,
or full-GPU verification. No live parity verdict appears in this chat.