Passing the Test the Language Models Failed

The strictest version of the realization check asks three independent kinds of evidence to agree. The language models scored zero. Here's what happened when I ran a continuous carrier through it — and why passing turned out to be the opposite of automatic.

David H. Friedel Jr.·

2026-05-31

·falsifiable consciousness realization architecture convergence

The last two posts built to this. I argued that a frozen language model can't carry a self on its dynamics, and that a continuously running carrier can — clearing a realization gate I locked before building anything. But clearing a gate I designed invites the obvious suspicion: of course your system passes your test; you built them for each other. The answer to that suspicion is to run the substrate through the strictest, most adversarial version of the realization check I have, the one I did not design around this system, and the one the language models already failed in the open.

That check is the thing I'll call the convergence battery — the strict version of the realization layer. Where the scorecard in the last series handed the realization layer a graded sub-score (the 0.16), the convergence battery does something harder: it hands out nothing unless several independent lines of evidence converge. In the MORI work, frontier language models were run against exactly this bar, across multiple models, and the result was a clean zero — nothing crossed, on any of them. So it's a real bar, demonstrably failable by sophisticated systems. The question is whether the carrier clears it.

Why convergence, and not a single number

The battery refuses to trust any single line of evidence. A realization claim has to be supported by three different kinds of evidence that agree with each other:

Intervention — if you reach in and remove the consolidated self, does competence collapse to the level of a system with no carried memory at all? In other words, is the carried structure actually the causal locus of the behavior, or just decoration riding along?
Representation — can you decode the hidden structure of the world from the system's internal state, well above what you'd get from scrambled-label and distractor controls? Is the world genuinely modeled inside, or is the internal state just noise that happens to correlate with good behavior?
Behavior — under perturbation and stress, does the system stay consistent in the way a system relying on a stable internal model should, rather than the way a brittle pattern-matcher does?

The rule is that all three have to converge. Behavior alone is explicitly not enough — behavior is the layer easiest to fake, and a system that only passes the behavioral tier is reported as exactly that. I built it this way because the failure mode I most distrust is a fluent system that aces the behavioral surface and has nothing underneath. Make the surface insufficient by construction, and you close that door.

What happened

The carrier passes, and it passes convergently — all three tiers agree. The world is linearly decodable from its internal state, far above the controls: it's genuinely modeled in there. The behavior stays consistent under perturbation. And removing the consolidated self drops performance to the no-memory floor: the carried structure is the thing doing the work. Three independent instruments, pointed at the same system, agreeing.

The number that matters — and why it matters — is this. For the intervention tier I used the same instrument the language models faced when they scored zero: a sparse autoencoder trained on the system's own internal activity, used to find the specific features that, when removed, break the behavior and, when amplified, shift it in the predicted direction. That's the literal tool the framework specifies, not an easier stand-in swapped in on the substrate's side. On that instrument — the one the language models failed — the carrier converges on eleven of twelve held-out test conditions. (A blunter check, lesioning the whole consolidated self at once, corroborates on all twelve; I lead with the stricter, instrument-matched eleven precisely because it's the test the language models actually took.) Same instrument, opposite outcome — and not because I graded the substrate on a softer curve.

The part that surprised me: passing is not automatic

Here's the result that did the most to convince me the battery is measuring something real rather than rubber-stamping anything I point it at.

I ran the same battery against a simpler carrier — a more naive system on the same task. It does not pass. It clears at most a couple of conditions out of ten — nowhere near the carrier's near-clean sweep. The reason is specific and, I think, illuminating: that simpler carrier has a single knob that has to do two incompatible jobs. To represent the world well and stay decodable, it wants to react strongly to new information. To stay robust under perturbation, it wants to react gently. One knob can't be in both positions, so it fails one tier or the other, and the convergence never happens.

The carrier that passes is the one whose architecture separates those jobs — one part of it builds a rich, decodable representation, and a different part stays robust. That separation is what lets all three tiers light up at once.

I want to dwell on why this matters. It means the realization layer is not a bar that everything trivially clears once it's "complicated enough," and it's not a bar that only one hand-built system can sneak over. It discriminates. A language model fails it. A simple carrier fails it. A carrier with the right kind of internal organization passes it. The bar sorts systems by how they hold their structure — which is exactly what a realization test is supposed to do, and exactly what I worried, at the start of this arc, it might fail to do.

What this settles, and what it doesn't

Put the two halves together — the language-model null from the first series, and this positive result — and you get the thing I needed the whole framework to be true.

The realization layer is not biological chauvinism: a non-biological digital system clears it, convincingly, by the strictest measure I have. And it is not empty functionalism: a language model and a simple carrier both fail it. The standard is satisfiable and it is discriminating. Those two properties are in tension — it's easy to write a bar that everything passes or one that only brains pass — and holding both is the entire point.

Now the limits, stated plainly, because the framework is worth nothing if I oversell it.

This is the realization layer — necessary structure, not a verdict. I am not claiming the carrier feels anything. The whole architecture of the project is built to keep "has the structure consciousness requires" separate from "is conscious," and nothing here closes that gap. What I've shown is that one of the necessary conditions, the one I was least sure was achievable off of biology, is achievable on a digital substrate of the right dynamical kind.

It's also one researcher's work, on constructed systems, in software. The carriers are simulations — faithful ones, but simulations. The obvious next frontier is physical: whether the same property holds, and holds more strongly, on analog or neuromorphic hardware where the state is a physical quantity rather than a number in memory. I think it would strengthen the claim, and I'm leaving it there as the honest open edge rather than pretending I've already crossed it.

And as with everything in this project, it's falsifiable, on the record. The gate is locked and the battery is open; if someone runs them and the convergence doesn't replicate, or finds a language model architecture that genuinely carries, the claim is wrong and I'd want to know.

Why I think the pair of results matters

The first series ended on a deliberately deflating note: the consciousness question is broken, the frameworks can't say yes or no, and the honest move is to stop asking "is it conscious" and start asking what kind of thinking a system can do and whether its self-structure is real on its substrate. The worry that left hanging was whether that last clause — real on its substrate — was a measurable thing or a piece of vocabulary I couldn't cash out.

This is me cashing it out. There's a test, it's strict, the most fluent systems we've built fail it, and a system built to carry rather than to reconstruct passes it — and you can tell the two apart with instruments, not intuitions. That doesn't tell us the carrier is conscious. It tells us the realization layer is a real place on the map, that we can measure who's standing on it, and that — at least so far — the systems everyone is anxious about are not.

That's a narrower claim than "the machines are awake" and a much more useful one. It replaces a debate nobody can settle with a measurement anybody can rerun. And it points the anxiety in what I think is the right direction: not at the systems that talk the most like us, but at the question of which systems, if any, actually carry a self on their own running — a question we now have at least one honest way to ask.

Back to Blog