References
Papers, threads, and citations behind OpenMythos.
Papers
The research the architecture draws on, grouped by theme.
Core — looped & recurrent-depth
The recurrent-depth reasoning lineage that OpenMythos reconstructs: looping a shared block to trade compute for effective depth.
Attention
Compressed KV caches and grouped queries that keep the recurrent decode loop affordable.
Mixture-of-Experts
Fine-grained expert segmentation and shared-expert isolation behind the routed FFN.
Foundations
The building blocks — adaptive computation, normalization, and positional encoding.
Threads & discussion
Community analysis and debate on X about looped transformers.
Why Claude Mythos is so good — looped transformer theory
Sigrid Jin
Looped transformer (LT) implicit reasoning over parametric knowledge unlocks generalization
Yuekun Yao
Looped transformer cyclic trajectories and input injection
rosinality
Parcae scaling laws for stable looped language models — thread
Hayden Prairie
RoPE-like loop index embedding idea
davidad
On the Looped Transformers Controversy
ChrisHayduk
On the Looped Transformers Controversy — Summary
Sigrid Jin
Source map
Every architectural concept on this site maps to a real span of the OpenMythos implementation.
Full forward pass: Prelude → Recurrent → Coda
open_mythos/main.py · lines 992-1034 · OpenMythos.forward
The recurrent loop with ACT early exit
open_mythos/main.py · lines 825-891 · RecurrentBlock.forward
Stable input injection — ρ(A) < 1 by construction
open_mythos/main.py · lines 684-742 · LTIInjection
Per-position halting probability head
open_mythos/main.py · lines 750-780 · ACTHalting
Sinusoidal loop-index signal over recurrence depth
open_mythos/main.py · lines 541-570 · loop_index_embedding
Depth-wise LoRA: per-loop low-rank delta
open_mythos/main.py · lines 578-624 · LoRAAdapter
Fine-grained MoE: routed + shared experts
open_mythos/main.py · lines 456-533 · MoEFFN
Multi-head Latent Attention (MLA) with compressed KV cache
open_mythos/main.py · lines 284-418 · MLAttention
Grouped Query Attention
open_mythos/main.py · lines 177-276 · GQAttention
Pre-norm block: attention + (MoE | dense) FFN
open_mythos/main.py · lines 627-676 · TransformerBlock
SwiGLU feed-forward expert
open_mythos/main.py · lines 426-453 · Expert
Full configuration surface
open_mythos/main.py · lines 17-81 · MythosConfig
Pre-configured scales 1B → 1T
open_mythos/variants.py · lines 1-199 · variants
Depth-aware unified attention (experimental)
open_mythos/moda.py · lines 671-821 · MoDAAttention
DeepSeek-style MoE with shared experts (experimental)
open_mythos/moda.py · lines 452-630 · DeepSeekMoE
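The "ρ(A) < 1 by construction" entry in the source map can be illustrated with a small sketch. It assumes a diagonal state matrix A whose entries come from passing unconstrained parameters through a sigmoid, so every entry lies in (0, 1) and the spectral radius is below 1 for any parameter value. The names `stable_diagonal` and `inject`, the diagonal form, and the choice b = 1 - a are hypothetical and chosen for illustration only; they are not taken from `LTIInjection`.

```python
import math


def stable_diagonal(raw):
    """Map unconstrained reals to decay factors in (0, 1) via a sigmoid."""
    return [1.0 / (1.0 + math.exp(-r)) for r in raw]


def inject(h, e, a, b):
    """One loop step h <- a * h + b * e, elementwise (diagonal A and B)."""
    return [ai * hi + bi * ei for ai, hi, bi, ei in zip(a, h, b, e)]


raw = [-2.0, 0.0, 3.0]            # hypothetical unconstrained parameters
a = stable_diagonal(raw)          # diagonal of A, each entry in (0, 1)
rho = max(abs(x) for x in a)      # spectral radius of a diagonal matrix
b = [1.0 - x for x in a]          # assumed input gain: puts the fixed point at e

e = [1.0, -0.5, 2.0]              # injected input, re-applied on every loop
h = [0.0, 0.0, 0.0]               # initial hidden state
for _ in range(500):
    h = inject(h, e, a, b)

print(rho < 1.0)                  # stability holds for any choice of raw
print([round(x, 6) for x in h])   # the state settles near e instead of diverging
```

Because the constraint lives in the parameterization rather than in a penalty term, no setting of `raw` can push the recurrence into an unstable regime, which is what keeps repeated looping over the same block bounded in this sketch.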
Citation
If you reference OpenMythos, please cite it as follows.