We built something to replace the teacher. It worked. Then something else went wrong.
Part 6 ended with a problem we couldn't patch: a token model cannot reliably grade a concept model. The mismatch isn't fixable with a better rubric or a better teacher model. It's architectural.
So we stopped trying to fix the teacher and built a replacement.
Discovery: The Teacher Replacement
The idea was simple. Instead of asking Gemma to generate questions and grade responses, we'd build a rule-based system that already knew the right answers.
Each rule is a (pattern, expected response signature) pair. "does ice float?" expects a response containing "float" and "water." "what is your name?" expects a response containing "origin." No LLM anywhere in the loop. No drift. No mode collapse. No token-fluency bias.
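A minimal sketch of the shape, with hypothetical names rather than the system's actual code:

RULES = [
    ("does ice float?", ["float", "water"]),
    ("what is your name?", ["origin"]),
    # ...one entry per behavior we care about
]

def run_suite(model):
    """Run every rule against a model (a callable: prompt -> response)."""
    passed = 0
    for prompt, expected in RULES:
        response = model(prompt).lower()
        # A test passes iff every expected keyword appears in the response
        if all(word in response for word in expected):
            passed += 1
    return passed / len(RULES)

Deterministic in, deterministic out: the same model produces the same score every run, which is the whole point.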
We called it Discovery. We ran the first test.
The numbers: 0.79 seconds for 180 tests. 94.6% pass rate on Tier 1. Zero duplicates. Zero hallucinations.
Compare that to Gemma: 20 minutes for 200 rounds, 50%+ duplicates, and a 65.6% pass rate that was actually measuring fluency, not understanding.
Discovery was 1,300x faster, gave a cleaner signal, and actually measured what we cared about. We committed the code. Gemma went into reference-only status. The teacher loop was retired.
Then Discovery exposed the next problem.
What Discovery Actually Exposed
Running clean evaluations against a decoder we thought was "working" revealed something we'd been hiding from ourselves: most of the decoder wasn't understanding at all. It was text-matching.
The decoder had heads like:
if "hello" in text: return "hello."
if "what is your name" in text: return "my name is origin."
if "count to three" in text: return "one two three."
Every "working" response was a text substring lookup. The encoder's concept activations barely influenced routing. Tier 1 and Tier 2 had been passing at 100% on our deterministic suite because the decoder was pattern-matching against the same keyword lists the grader used. A pattern-matcher acing a test written by a pattern-matcher. Circular.
When you typed "hello," the decoder matched the string "hello" and returned "hello." The encoder might as well not have been there.
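Here is the whole circular loop in miniature, a hypothetical sketch built on the condensed respond head above, not the real test harness:

KEYWORDS = {"hello": ["hello"]}  # grader's keyword list, hypothetical

def grade(prompt, response):
    # The grader checks for the same substrings the heads hard-code
    return all(word in response for word in KEYWORDS.get(prompt, []))

grade("hello", respond("hello"))  # True - and it proves nothing about concepts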
We'd spent weeks calling it concept-driven when it was text-driven, with concepts as decoration.
The Moment It Broke Open
The way we caught it was anticlimactic. After Discovery reported 100% pass rates, we opened an interactive chat and typed:
you > how are you
origin > i don't know
Every tier test had passed. The most basic conversational question failed.
Why? "how are you" wasn't in any head's pattern list. The encoder might have fired relevant concepts - self, question, state - but the decoder wasn't looking at the encoder. It was scanning the input string for known trigger phrases and hadn't been given that one.
The 100% had been measuring whether the patterns we'd written matched the patterns we'd tested for. Nothing more.
That's what Discovery exposed by running clean. And that's the wall v2 had to break through next.
Part 8 is the day we did.
*
Origin is developed at Fallen Angel Systems with the Genesis framework — NVIDIA Inception member. (USPTO Application #64/016,973, #64/017,567). FAS Guardian defends production AI systems from prompt injection in under 3ms. FAS Judgement is the open-source attack console that finds the gaps. Defense. Offense. Creation.
*
fallenangelsystems.com | Judgement on GitHub | Guardian on GitHub
*
Questions or consulting inquiries: josh@fallenangelsystems.com