OpenMythos Visualizer

An interactive guide to the Recurrent-Depth Transformer — a theoretical reconstruction of the hypothesized Claude Mythos architecture. Same weights, more loops → deeper reasoning.

Embedding (vocab → dim)
↓
Prelude (dense blocks · ×N)
↓
Recurrent (MoE block · looped ×T)
↓
Coda (dense blocks · ×M)
↓
LM Head (→ logits)

Prelude runs once, Recurrent loops with shared weights, Coda runs once. The recurrent block is the unique core.
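To make the staging concrete, here is a minimal sketch of the pipeline in plain Python. It is an illustration under stated assumptions, not the model's actual code: `make_block`, `apply_block`, `init_params`, and `forward` are hypothetical names, each "block" is reduced to a single weight matrix with a tanh residual, and input injection and MoE routing are left out. The point it shows is structural: the Prelude and Coda hold N and M separate blocks that run once, while the Recurrent stage holds one block that is reused for every loop.

```python
import math
import random

def make_block(dim, rng):
    """Hypothetical stand-in for a transformer block: one dim x dim weight matrix."""
    scale = 1.0 / math.sqrt(dim)
    return [[rng.gauss(0.0, scale) for _ in range(dim)] for _ in range(dim)]

def apply_block(w, h):
    """Residual update h + tanh(W h); a placeholder for attention plus MLP/MoE."""
    return [hi + math.tanh(sum(wij * hj for wij, hj in zip(row, h)))
            for hi, row in zip(h, w)]

def init_params(vocab, dim, n_prelude, m_coda, seed=0):
    rng = random.Random(seed)
    return {
        "embedding": [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(vocab)],
        "prelude": [make_block(dim, rng) for _ in range(n_prelude)],   # N blocks
        "recurrent": make_block(dim, rng),                             # ONE block
        "coda": [make_block(dim, rng) for _ in range(m_coda)],         # M blocks
        "lm_head": [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(vocab)],
    }

def forward(token_id, params, loops):
    h = list(params["embedding"][token_id])      # Embedding: vocab -> dim
    for w in params["prelude"]:                  # Prelude: runs once
        h = apply_block(w, h)
    for _ in range(loops):                       # Recurrent: same weights, T times
        h = apply_block(params["recurrent"], h)
    for w in params["coda"]:                     # Coda: runs once
        h = apply_block(w, h)
    # LM Head: one logit per vocabulary entry
    return [sum(wj * hj for wj, hj in zip(row, h)) for row in params["lm_head"]]

params = init_params(vocab=16, dim=8, n_prelude=2, m_coda=2)
shallow = forward(3, params, loops=1)   # same parameters ...
deep = forward(3, params, loops=8)      # ... more loops, different logits
```

Changing `loops` changes the amount of computation and the output, but `params` is untouched, which is the sense in which depth comes from loops rather than from extra parameters.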

h_{t+1} = A·h_t + B·e + Transformer(h_t, e)

The recurrent update rule, applied once per loop. Everything on this site is computed client-side from the model’s formulas.
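As a rough illustration of the update rule, the sketch below applies it elementwise. Several things here are assumptions made for brevity and are not stated by the source: A and B are treated as diagonal (stored as lists `a_diag` and `b_diag`), the state starts at zero, and `transformer_stub` is a hypothetical bounded function standing in for the real transformer block.

```python
import math

def transformer_stub(h, e):
    """Hypothetical stand-in for Transformer(h_t, e): a small bounded elementwise mix."""
    return [0.1 * math.tanh(hi + ei) for hi, ei in zip(h, e)]

def recurrent_step(h, e, a_diag, b_diag):
    """One loop: h_{t+1} = A·h_t + B·e + Transformer(h_t, e), with diagonal A and B."""
    t_out = transformer_stub(h, e)
    return [a * hi + b * ei + ti
            for a, b, hi, ei, ti in zip(a_diag, b_diag, h, e, t_out)]

def run_loops(e, a_diag, b_diag, loops):
    """Apply the update rule `loops` times; the encoded input e is re-used every loop."""
    h = [0.0] * len(e)   # assumption: zero initial state
    for _ in range(loops):
        h = recurrent_step(h, e, a_diag, b_diag)
    return h

e = [1.0, -2.0]
h_after_4 = run_loops(e, a_diag=[0.5, 0.9], b_diag=[1.0, 0.5], loops=4)
```

Note that `e` appears in every call to `recurrent_step`, so the original input contributes to the state at each loop instead of only at the start.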

Why it is not a vanilla transformer

Looped recurrence

One transformer block run many times — depth comes from loops, not from more parameters.

Input injection

The encoded input e is re-injected every loop, keeping the original signal alive at any depth.

Adaptive compute

ACT halting lets easy tokens stop early while hard tokens keep reasoning — in the same batch.
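A minimal sketch of per-token halting, assuming ACT refers to Adaptive Computation Time in its usual form: each token accumulates a halting probability per loop, stops once the total reaches 1 - eps, and its output is the halting-weighted sum of its intermediate states. The names `act_loop`, `step_fn`, and `halt_fn` are hypothetical, states are scalars for readability, and the real halting rule may differ.

```python
def act_loop(states, step_fn, halt_fn, max_loops, eps=0.01):
    """Loop each token until its cumulative halting probability reaches 1 - eps."""
    states = list(states)        # do not mutate the caller's list
    n = len(states)
    cum = [0.0] * n              # cumulative halting probability per token
    out = [0.0] * n              # halting-weighted output per token
    steps = [0] * n              # loops actually used per token
    active = [True] * n
    for t in range(max_loops):
        for i in range(n):
            if not active[i]:
                continue         # halted tokens skip further loops
            states[i] = step_fn(states[i])
            steps[i] += 1
            p = halt_fn(states[i])
            last = (cum[i] + p >= 1.0 - eps) or (t == max_loops - 1)
            w = 1.0 - cum[i] if last else p   # remainder goes to the final step
            out[i] += w * states[i]
            cum[i] += w
            active[i] = not last
        if not any(active):
            break                # whole batch has halted
    return out, steps

# Toy batch: one "easy" token (0.4) and one "hard" token (4.0).
out, steps = act_loop(
    [0.4, 4.0],
    step_fn=lambda s: 0.5 * s,                          # stand-in for one recurrent loop
    halt_fn=lambda s: 1.0 if abs(s) < 0.3 else 0.1,     # stand-in halting head
    max_loops=8,
)
# steps == [1, 4]: the easy token stops after one loop, the hard one after four
```

In a real batched implementation the inner Python loop would be a masked tensor operation, but the bookkeeping is the same.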

Stable by construction

An LTI-constrained update keeps the spectral radius ρ(A) < 1 by construction, so the linear part of the recurrence cannot blow up as loops are added.
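One way to get ρ(A) < 1 by construction is to parameterize A so that no parameter value can violate the bound. The sketch below is an assumption for illustration, not the documented parameterization: A is taken to be diagonal with entries sigmoid(-θ), which lie in (0, 1) in exact arithmetic, so the spectral radius is simply the largest entry. It then iterates only the linear part h_{t+1} = A·h_t + B·e and checks that the state settles at the fixed point B·e / (1 - A) instead of growing.

```python
import math

def stable_diag(theta):
    """Map unconstrained parameters to diagonal entries sigmoid(-theta) in (0, 1).

    Fine for moderate theta; very large values would overflow math.exp.
    """
    return [1.0 / (1.0 + math.exp(t)) for t in theta]

def spectral_radius_diag(a_diag):
    """For a diagonal matrix the spectral radius is the largest absolute entry."""
    return max(abs(a) for a in a_diag)

def iterate_linear(a_diag, b_diag, e, loops):
    """Iterate h <- A·h + B·e (linear part only, Transformer term omitted)."""
    h = [0.0] * len(e)
    for _ in range(loops):
        h = [a * hi + b * ei for a, b, hi, ei in zip(a_diag, b_diag, h, e)]
    return h

theta = [-2.0, 0.0, 3.0]            # any real values give a contraction
a_diag = stable_diag(theta)
b_diag = [1.0, 1.0, 1.0]
e = [1.0, -1.0, 2.0]

rho = spectral_radius_diag(a_diag)  # below 1 regardless of theta
h = iterate_linear(a_diag, b_diag, e, loops=500)
fixed_point = [b * ei / (1.0 - a) for a, b, ei in zip(a_diag, b_diag, e)]
```

This bounds only the linear term; the contribution of Transformer(h_t, e) is not covered by the sketch.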

Explore

Architecture

The full Prelude to Coda pipeline.

Recurrent Loop

Input injection, ACT halting, depth-wise LoRA.

Attention

MLA vs GQA and KV-cache tradeoffs.

MoE

Per-token, per-loop expert routing.

Stability

LTI injection and spectral radius bounds.

Depth

More loops at inference, and overthinking.

Variants

Model scales from 1B to 1T parameters.

MoDA

Experimental depth-aware attention and MoE.

References

Papers, threads, and citations.