Stability

LTI injection and the spectral radius ρ(A) < 1.

Ignoring the nonlinear transformer term, the recurrent hidden state is a discrete linear time-invariant (LTI) system, h_{t+1} = A·h_t + B·e. Its stability is governed entirely by the spectral radius ρ(A). When ρ < 1, the state contracts onto a fixed point. When ρ > 1, it grows geometrically. At ρ = 1 it no longer contracts, and under a constant input it can grow linearly. In both of the last two cases the loop fails to settle and training can explode. OpenMythos sidesteps this failure mode entirely: Parcae parameterizes A so that ρ(A) < 1 by construction.
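Unrolling the linear recurrence shows why ρ(A) is the deciding number. The short derivation below assumes the same e is injected at every step, as the class docstring states. It writes h_0 for the initial state, and its last line specializes to the diagonal A used here, with b_i the entries of B:

```latex
\begin{aligned}
h_t &= A^t h_0 + \sum_{k=0}^{t-1} A^k B e \\
\rho(A) < 1 \;&\Longrightarrow\; A^t \to 0, \qquad \sum_{k=0}^{\infty} A^k = (I - A)^{-1} \\
h^\ast &= \lim_{t \to \infty} h_t = (I - A)^{-1} B e \\
A = \operatorname{diag}(a_i),\ B = \operatorname{diag}(b_i) \;&\Longrightarrow\; h^\ast_i = \frac{b_i e_i}{1 - a_i}
\end{aligned}
```

If ρ(A) ≥ 1, A^t does not vanish and the geometric sum does not converge, which is the divergence described above.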

Spectral radius ρ(A)
This single number decides stability. At log_dt = log_A = 0 (the module's zero init), ρ(A) = 0.3679, which is stable by construction because ρ(A) < 1.

Injection parameters (computed, not measured)
log_dt = 0.00, log_A = 0.00
Δt = exp(log_dt) = 1.0000
A_c = −exp(log_A) = −1.0000
A_disc = 0.3679 ∈ (0, 1) ✓

Whatever values the learned parameters take, A_disc never leaves (0, 1), so ρ(A) never reaches 1. The discretization takes two steps:

$$A_c = -\exp(\log A)$$
$$A_{\text{disc}} = \exp(\Delta t \cdot A_c)$$

Combining them in log space gives the form that LTIInjection.get_A() computes, with the exponent clamped to [−20, 20]:

$$A_{\text{disc}} = \exp\!\big(-\exp(\log\Delta t + \log A)\big)$$

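Because $A_c < 0$ always and $\Delta t > 0$, the exponent $\Delta t \cdot A_c$ is strictly negative. The zero-order-hold (ZOH) map $\exp(\Delta t \cdot A_c)$ therefore lands strictly in $(0, 1)$ for every choice of the learned parameters. That guarantee is exact in real arithmetic. In float32, however, $\exp(-\exp(x))$ can round to exactly 1.0 once $x$ falls below about $-17$, and to 0.0 once $x$ exceeds about 4.5. At the far ends of get_A()'s clamp range, the stored value can therefore sit on the boundary.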
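The sketch below compares the two forms numerically, using NumPy float32 as a stand-in for PyTorch float32 tensors; this substitution is an assumption, since the real get_A() runs in torch. The helper names a_disc_two_step and a_disc_log_space are hypothetical. For ordinary parameters the two forms agree. With extreme values, the two-step form hits 0 · inf = NaN, which is why the source computes in log space.

```python
import numpy as np

# Hypothetical helpers; NumPy float32 stands in for PyTorch float32 tensors.
def a_disc_two_step(log_dt: float, log_A: float) -> np.float32:
    """ZOH discretization written as two separate steps."""
    dt = np.exp(np.float32(log_dt))        # Δt = exp(log_dt) > 0
    A_c = -np.exp(np.float32(log_A))       # A_c = -exp(log_A) < 0
    return np.exp(dt * A_c)                # A_disc = exp(Δt · A_c)

def a_disc_log_space(log_dt: float, log_A: float) -> np.float32:
    """Combined log-space form, clamped to [-20, 20] like get_A()."""
    x = np.clip(np.float32(log_dt) + np.float32(log_A), np.float32(-20), np.float32(20))
    return np.exp(-np.exp(x))

# Ordinary parameters: both forms agree and land in (0, 1).
print(a_disc_two_step(0.0, 0.0), a_disc_log_space(0.0, 0.0))  # both ≈ 0.3679

# Extreme parameters: Δt underflows to 0 and |A_c| overflows to inf.
with np.errstate(over="ignore", invalid="ignore"):
    print(a_disc_two_step(-110.0, 100.0))  # nan, from 0 * (-inf)
print(a_disc_log_space(-110.0, 100.0))     # ≈ 0.99995, since -110 + 100 = -10
```

In the two-step form, a NaN here would poison h on the next loop. Adding the logs before exponentiating keeps the exponent finite, which is what the comment inside get_A() describes.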

Hidden-state trajectory
h_t over loop iterations for the current constrained A: the gap to the fixed point shrinks by a factor of a on every loop.

Parameters: a = A_disc = 0.3679, b = 0.1 (the model's B init), e = 1. Because ρ(A) = a < 1, h_t converges to the fixed point h* = b·e / (1 − a) ≈ 0.1582.
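Here is a minimal scalar sketch of the trajectory above, with the transformer term dropped. It assumes h_0 = 0, which the page does not state, and the function name trajectory is hypothetical. It prints h_t alongside its distance to the fixed point b·e / (1 − a).

```python
import math

def trajectory(a: float, b: float, e: float, h0: float = 0.0, steps: int = 12) -> list[float]:
    """Iterate the scalar recurrence h_{t+1} = a*h_t + b*e (transformer term dropped)."""
    hs = [h0]
    for _ in range(steps):
        hs.append(a * hs[-1] + b * e)
    return hs

a = math.exp(-math.exp(0.0 + 0.0))  # A_disc at log_dt = log_A = 0, ≈ 0.3679
b, e = 0.1, 1.0                      # B init and unit input, as on this page
h_star = b * e / (1.0 - a)           # fixed point b*e / (1 - a), ≈ 0.1582

hs = trajectory(a, b, e)             # h0 = 0 is an assumption; the page does not state it
for t, h in enumerate(hs[:5]):
    print(f"t={t}  h={h:.4f}  gap={abs(h - h_star):.4f}")
```

The gap column shrinks by a factor of a ≈ 0.37 per loop, because h_{t+1} − h* = a·(h_t − h*). This is the geometric contraction that ρ(A) < 1 guarantees.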

Constrained vs unconstrained
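Push the unconstrained a past 1 and the state explodes: this is the failure Parcae prevents.
In the setting shown, the unconstrained a is 1.05. On the scale used here, 0 is stable, 1.0 is the boundary and 1.6 explodes, so the unconstrained system is flagged UNSTABLE (ρ(A) ≥ 1).
In the comparison plot, the constrained (Parcae) trajectory converges, while the unconstrained one diverges (its curve is clipped).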

In this linear picture, a run can only diverge through ρ(A) ≥ 1, and Parcae's parameterization makes ρ(A) < 1 impossible to violate.
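To make the contrast concrete, the sketch below runs the same scalar recurrence for many loops with the constrained value, the boundary value a = 1 and the unconstrained a = 1.05. Like the page's plots, it uses b = 0.1 and e = 1. It also assumes h_0 = 0 and drops the transformer term. The function name run is hypothetical.

```python
import math

def run(a: float, b: float = 0.1, e: float = 1.0, h0: float = 0.0, steps: int = 200) -> float:
    """Return h after `steps` loops of h <- a*h + b*e (scalar, transformer term dropped)."""
    h = h0
    for _ in range(steps):
        h = a * h + b * e
    return h

a_parcae = math.exp(-math.exp(0.0))  # constrained: exp(-exp(x)) is in (0, 1) for real x (exact arithmetic)
a_free = 1.05                         # the unconstrained setting in the comparison above

print(run(a_parcae))  # settles at b*e / (1 - a), ≈ 0.1582
print(run(1.0))       # ≈ 20: at the boundary the state grows linearly, t*b*e
print(run(a_free))    # grows geometrically, roughly like 1.05**t
```

The boundary case shows why ρ(A) = 1 is already a failure: the state never settles, even though it grows only linearly rather than exponentially.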

ρ(A) across all log_A
The constraint holds for every log_A: the curve approaches ρ = 1 as log_A → −∞ but never touches it.

For log_dt = 0.00, ρ(A) stays inside (0, 1) for every log_A. Current point: ρ = 0.3679.
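The sweep can be reproduced numerically with Python floats (float64). The sketch assumes a log_A range of −4 to 4, which matches the page's parameter controls, and the function name rho is hypothetical. Because A is diagonal, ρ(A) for a single channel is just that channel's A_disc.

```python
import math

def rho(log_dt: float, log_A: float) -> float:
    """One channel's A_disc = exp(-exp(log_dt + log_A)); with diagonal A, rho(A) is the max over channels."""
    return math.exp(-math.exp(log_dt + log_A))

log_dt = 0.0
grid = [-4.0 + 0.5 * i for i in range(17)]  # log_A from -4 to 4 (assumed plot range)
values = [rho(log_dt, la) for la in grid]

for la, r in zip(grid, values):
    print(f"log_A={la:+.1f}  rho={r:.4g}")
```

ρ falls monotonically as log_A grows, and in float64 every grid point stays strictly inside (0, 1).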

Verify it locally
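Cross-check the gauge against a real model init:
import torch
# `model` is an instantiated OpenMythos model (construction not shown here)
A = model.recurrent.injection.get_A()  # shape (dim,): the diagonal of A_disc
rho = torch.linalg.eigvals(torch.diag(A)).abs().max()  # < 1; equals A.abs().max() because A is diagonal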

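The value shown on this page is computed in TypeScript from the same formula as LTIInjection.get_A(). It is not measured from a trained checkpoint. Plugging a checkpoint's log_dt and log_A into the formula reproduces that checkpoint's get_A() up to floating-point precision: the page uses JavaScript's float64 numbers, while PyTorch parameters default to float32. Because the constraint is structural, it applies to a trained checkpoint just as it does at init.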
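Because A is diagonal, the eigenvalue call in the check is equivalent to taking the largest entry. The NumPy sketch below confirms this, with NumPy standing in for torch and made-up parameter values for illustration.

```python
import numpy as np

# Made-up per-channel A_disc values, shaped like get_A()'s 1-D output.
A = np.exp(-np.exp(np.array([-2.0, -0.5, 0.0, 1.0, 2.5])))

# Spectral radius from the eigenvalues of the full diagonal matrix, as in the check above...
rho_eig = np.abs(np.linalg.eigvals(np.diag(A))).max()
# ...equals the largest entry, because a diagonal matrix's eigenvalues are its diagonal entries.
rho_direct = np.abs(A).max()

print(rho_eig, rho_direct)
```

For large dim, taking the max directly avoids building a dim × dim matrix and running an O(dim³) eigendecomposition.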

open_mythos/main.py · lines 684-742 · LTIInjection

Stable input injection — ρ(A) < 1 by construction

python
# imports needed to run this excerpt on its own
import torch
import torch.nn as nn


class LTIInjection(nn.Module):
    """
    Stable input injection for the recurrent update rule (Parcae, Prairie et al., 2026).

    The recurrent hidden state evolves as:
        h_{t+1} = A · h_t  +  B · e  +  Transformer(h_t, e)

    where e is the encoded input injected at every loop step to prevent drift.
    Without constraints, A can develop spectral radius ≥ 1, causing the hidden
    state to explode across loop iterations and destabilize training.

    This class guarantees ρ(A) < 1 by construction via a ZOH discretization:
        A_continuous = Diag(-exp(log_A))       always negative diagonal
        A_discrete   = exp(Δt · A_continuous)  element-wise, values in (0, 1)

    where log_A and log_dt are learned parameters and exp ensures positivity.
    This makes looped model training robust to hyperparameter choices and stable
    even at high learning rates.
    """

    def __init__(self, dim: int):
        """
        Args:
            dim -- hidden state dimension; one scalar per channel for A and B
        """
        super().__init__()
        self.log_A = nn.Parameter(torch.zeros(dim))  # log of A_continuous magnitude
        self.log_dt = nn.Parameter(torch.zeros(1))  # log of discretization step Δt
        self.B = nn.Parameter(torch.ones(dim) * 0.1)

    def get_A(self) -> torch.Tensor:
        """
        Compute the discretized diagonal state matrix A_discrete.

        Returns:
            1-D tensor of shape (dim,) with all values strictly in (0, 1),
            guaranteeing ρ(A) < 1 regardless of learned parameter values.
        """
        # Compute in log space to avoid 0 * inf = NaN when log_dt → -∞, log_A → +∞.
        # dt * A_c = -exp(log_dt) * exp(log_A) = -exp(log_dt + log_A)
        # Clamp keeps the product finite in float32 for any gradient step size.
        return torch.exp(-torch.exp((self.log_dt + self.log_A).clamp(-20, 20)))

    def forward(
        self, h: torch.Tensor, e: torch.Tensor, transformer_out: torch.Tensor
    ) -> torch.Tensor:
        """
        Compute h_{t+1} = A·h_t + B·e + transformer_out.

        Args:
            h               -- current hidden state (B, T, dim)
            e               -- encoded input from Prelude, frozen across loops (B, T, dim)
            transformer_out -- output of the recurrent TransformerBlock at this step (B, T, dim)

        Returns:
            Updated hidden state of shape (B, T, dim)
        """
        A = self.get_A()
        return A * h + self.B * e + transformer_out
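The NumPy float32 mirror of the class below lets you probe shapes and numerics without PyTorch. It is a sketch, not the repo code: NumPy stands in for torch, and get_A and forward here are plain functions that take the parameters explicitly. It checks that the (dim,) parameters broadcast over (batch, T, dim) as forward() expects. It also shows what happens at the two ends of the clamp range.

```python
import math
import numpy as np

# NumPy float32 mirror of LTIInjection; names follow the class, but this is not the repo code.
def get_A(log_dt: float, log_A: np.ndarray) -> np.ndarray:
    """exp(-exp(clamp(log_dt + log_A, -20, 20))), computed in float32."""
    x = np.float32(log_dt) + log_A.astype(np.float32)
    x = np.clip(x, np.float32(-20), np.float32(20))
    return np.exp(-np.exp(x))

def forward(h, e, transformer_out, log_dt, log_A, B):
    """h_{t+1} = A*h_t + B*e + transformer_out; the (dim,) vectors broadcast over (batch, T, dim)."""
    return get_A(log_dt, log_A) * h + B * e + transformer_out

dim = 4
log_A = np.zeros(dim, dtype=np.float32)      # zero init, as in __init__
B = np.full(dim, 0.1, dtype=np.float32)      # torch.ones(dim) * 0.1
h = np.zeros((2, 3, dim), dtype=np.float32)  # (batch, T, dim)
e = np.ones((2, 3, dim), dtype=np.float32)
h_next = forward(h, e, np.zeros_like(h), 0.0, log_A, B)
print(h_next.shape)  # (2, 3, 4)

# Clamp edges: exp(-exp(-20)) rounds to exactly 1.0 in float32, and exp(-exp(20)) to 0.0.
print(np.float32(math.exp(-math.exp(-20.0))))  # 1.0
print(get_A(0.0, np.array([-20.0, 20.0])))     # float32 path: about [1.0, 0.0]
```

The lower edge is the one to watch. Once log_dt + log_A drops below about −17, the float32 A_disc equals 1.0, so that channel stops contracting. Separately, torch.clamp passes zero gradient for inputs outside [−20, 20], so a channel pushed past either edge no longer receives gradient through get_A().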