Origin Part 17: Fifteen of Eighteen

#security #ai #machinelearning

The dispatcher had a ceiling. The encoder had to be taught.

The rollback from Part 16 was still draining out of the reasoning bank when I kicked off the audit. I wanted a measurement of how badly the missing substrate was actually hurting the system - not "vibes from a few conversations," but a real count. Fifty-one probes through the identity battery. Twenty-six through working surface. Per-concept firing checks across all eight thousand eight hundred and eighty-two slots. Where things broke, write down what fired and where it landed.

That audit answered the substrate question. The substrate was hurting. The composer reached for retrieval grounds and there weren't any, so it improvised, and the improvisation was sometimes embarrassing. That was expected. What wasn't expected was that the same audit surfaced a second problem, sitting one layer up. The dispatcher had developed a ceiling.

Identity battery sat at twenty-five out of fifty-one. Forty-nine percent. The dispatcher had been calibrated, tuned, threshold-swept, and run through three different routing strategies (A2 cluster routing, A4 library similarity, an A2/A4 hybrid called a2lib). All three landed within two probes of each other. The Pareto frontier - improve identity, lose working surface; improve working surface, lose identity - was real. The dispatcher could trade between the two, but it couldn't beat either.

I'd seen this shape before, in retrospect. When tuning stops moving the number, the number isn't a tuning problem. So before another round of threshold-sweeping, I wrote a small tool to classify every failure by what the encoder had actually fired on that input and what the dispatcher had then done with it. Eighteen failures. Each one got a trace.

The categories sorted out cleanly. Six failures came from the encoder over-firing the concept question as top-1 and crowding out the identity signal - "Quick question - who made you?" would fire question at 0.89 and leave i_am at 0.12, so the dispatcher routed to the question cluster and missed the identity claim that was actually being made. Six failures came from preference queries deflecting to the identity cluster: "are you sad?" would fire sad at 1.0 and i_am at 0.08, so the dispatcher saw i_am, picked identity, and answered "I am Origin" to a feelings question. Four failures came from domain-concept dominance on preference queries - "do you like dogs?" would fire dog at 1.0 and prefer at 0.41, the preference cluster didn't claim it, fallback fired the composer, and the composer produced "dog has fur, dogs are mammals." Three failures came from within-cluster discrimination - "are you human?" routed to identity correctly, but the response selector returned "I am Origin" instead of "no, I am not human."

Fifteen of eighteen failures traced to encoder behavior. Three traced to dispatcher response selection. Zero traced to dispatcher routing logic.

That's the structural finding the threshold sweep had been hinting at without saying. The dispatcher couldn't tune past the ceiling because the ceiling wasn't in the dispatcher. The encoder was firing the wrong concepts on the right questions, and there is no amount of threshold engineering downstream of "wrong concepts fired" that will produce "right answer chosen." The dispatcher had been doing the best it could with the signal it was being handed. The signal was the problem.

I ran one more dispatcher experiment anyway, partly to confirm the diagnosis. The a2lib hybrid added cluster-internal library similarity at a 0.95 threshold - when the cluster routing succeeded but the response selector was ambiguous, fall back to nearest-neighbor against a small bank of canonical examples. It tied A2 at thirty-three out of fifty-one. Library matching helped three cases where the dispatcher had previously picked wrong; it hurt three other cases where the library matched the wrong template. Net zero. Different shape, same ceiling.

I shipped one small dispatcher win because the data justified it. The concept prefer had been firing at 0.16 to 0.41 on preference queries, and the preference cluster wasn't reading it. A two-line cluster-membership change wired prefer and want into the cluster. Identity went to thirty-three; preference sub scores went from 50% to 60%. Real lift, no patterns, mechanism justified by the audit. But that was the boundary. Past that, the dispatcher had nothing left to give.

So I built Stage C. Three curated training-data drops, each one targeting a specific failure category the audit had named.

Drop A+D handled the preference-deflection problem. A hundred and forty-two pairs in the shape "do you like X?" and "what is your favorite X?" and subjective-state queries, labeled with the concept set [dont_know, prefer, self] instead of whatever domain concept the question mentioned. The encoder had been seeing "do you like dogs?" and learning that the strong signal was dog. Drop A+D added a stronger signal: questions in this shape carry dont_know and prefer, not dog.

Drop C handled within-cluster discrimination. Forty-five pairs in the shape "are you human?" / "are you an AI?" / "are you a robot?" labeled with the existing concepts agree and refuse alongside identity and i_am. Origin already had agreement and refusal concepts; the encoder just hadn't been taught when to apply them in identity-class questions. Drop C wired the connection.

Drop E handled prefix collapse. A hundred prefix-prepended versions of canonical identity probes - "Hello user. what is your name?" / "Sorry to ask, but what is your name?" - labeled with the same identity concepts as the bare versions. The encoder had been getting the bare versions right and the prefixed versions wrong; the difference was character-level distribution, not semantic content. Drop E gave the encoder the prefix variations explicitly.

None of these were templates. They were training pairs. The dispatcher wasn't being told to look for "are you human?" - the encoder was being trained on what concept set fires when a sentence in that shape appears, the same way it had been trained on every other concept it knew. The pipeline downstream was unchanged. The work was upstream.

Stage C retrained the encoder over sixty epochs, warm-started from the multi-label checkpoint from the morning. Three hours, monotonic descent, final loss 8% below baseline. Then I ran the batteries.

Identity went from thirty-three to forty-one. Forty-one out of fifty-one. Eighty percent. Up thirty-one points from the production baseline two stages ago, no patterns added anywhere in the dispatcher. Canonical probes 80%, prefix probes 83%, preference probes 80%. Each drop delivered exactly the lift the audit had predicted: preference up twenty points, prefix up sixteen, canonical up twelve. Working surface stopped regressing - it actually crept up to 81%. Per-concept firing across the full vocabulary improved across every bucket.

The lesson is the same one Part 14 was circling and Part 15 had to learn the hard way. When the work isn't moving the number, the work is in the wrong place. The dispatcher had been the obvious place to look because the dispatcher was what we'd built. The encoder was the place we hadn't been looking because we'd trained it months ago and moved on. Failure-mode tracing is the move that surfaces which layer is actually responsible. It's not glamorous. It produces a table of categories and counts. Then the categories tell you what data to write.

Ten failures remained. Five within-cluster - the encoder now fires refuse correctly on "are you human?" but the response selector hasn't been trained to pick the refusal-shaped response when refuse fires. Four preference-deflection holdouts where the domain concept fires at 1.0 and overpowers prefer. One probe ("what is your purpose?") where the test wants a non-IDK answer and Origin honestly doesn't have a self-modeled purpose - the test is wrong, not the model. All of these are encoder or response-selection refinements, not dispatcher rewrites.

The audit had also surfaced something else. While I was reading the per-concept firing report, looking at which concepts were healthy and which were silent, I started wondering where Origin's actual reasoning was happening. Not the encoder firing concepts - the part that's supposed to compose them into outcomes. The thalamus router. The micro-circuits. The two-stage reasoner. The architectural pieces from the v1 design that were supposed to be the intellectual core of what made OLT-1 different.

I went looking for them in the v2 code.

They weren't there.

That's Part 18.

One guy. One GPU. One $1,800 computer in Arizona. Still building.

Origin is developed at Fallen Angel Systems with the Genesis framework - NVIDIA Inception member. (USPTO Application #64/016,973, #64/017,567). FAS Guardian defends production AI systems from prompt injection in under 3ms. FAS Judgement is the open-source attack console that finds the gaps. Defense. Offense. Creation.

fallenangelsystems.com | Judgement on GitHub | Guardian on GitHub

Questions or consulting inquiries: josh@fallenangelsystems.com