Tim Maximov
Your Knowledge, Your Model — Part 3: Determinism Is Not Accuracy

Two agents. Same knowledge base. Same question. Different answers.

Both answers are internally consistent. Both are traceable to real sources. Neither agent made anything up. And yet they disagree.

This is not a hallucination problem. It's not an agent quality problem. It's a determinism problem — and it's the one nobody talks about.


What determinism means in a knowledge system

Most people ask two things of their knowledge system:

  • Is the information there? (completeness)
  • Is it correct? (accuracy)

This method adds a third requirement that almost nobody names explicitly:

Any agent, reading your sources in any order, must arrive at the same model of the system.

This is not the same as accuracy. Data can be accurate in every individual file and still produce different models depending on reading order. The failure modes are subtle:

  • A connection is described from side A but not from side B
  • The same concept has two different names in two different places
  • One file says "may", another says "always" — for the same behavior
  • A rule exists in one layer but not in the layer where agents expect to find it

None of these are factual errors. Every file is "correct." But the system as a whole is non-deterministic — its output depends on which file the agent happened to read first.

If two agents read the same knowledge base and build different models — the knowledge base is non-deterministic. That's a bug, not a disagreement, not a matter of interpretation.


Why this is harder to catch than hallucination

Hallucination is visible in the wrong direction. The output doesn't match anything in the sources — you can check.

Non-determinism is invisible because the output matches something in the sources. It's just not the right something. If you ask "is this in my knowledge base?" — the answer is yes. The answer is always yes. You just got the wrong version of yes.

This is why COLLAPSE markers matter so much. Without them, every silent choice looks like a confident answer. With them, you can see exactly where the system branched — and which branch was taken.

But COLLAPSE markers only help after a choice has been made. Determinism is about preventing the ambiguity that forces the choice in the first place.


The three sources of non-determinism

1. Asymmetric connections

If your knowledge includes relationships between concepts — A connects to B — both sides need to describe the connection. If only A mentions B but B doesn't mention A, then an agent starting from B will build a model where that connection doesn't exist.

This is the most common failure. It feels like thorough documentation. It isn't.

The test: for every connection in your system, can you find the description from both sides? If not — you have asymmetric coverage, and reading order determines what agents know.
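This test is mechanical enough to automate once the relationships have been extracted. A minimal sketch, assuming connections are collected into a per-concept adjacency map (the concept names below are hypothetical):

```python
# Find asymmetric connections: A mentions B, but B never mentions A back.
# The knowledge base content here is invented for illustration.

connections = {
    "orders":    {"payments", "inventory"},
    "payments":  {"orders"},
    "inventory": set(),  # never describes its link back to "orders"
}

def asymmetric_pairs(conn):
    """Return (a, b) pairs where a mentions b but b does not mention a."""
    return sorted(
        (a, b)
        for a, targets in conn.items()
        for b in targets
        if a not in conn.get(b, set())
    )

print(asymmetric_pairs(connections))  # [('orders', 'inventory')]
```

Every pair this reports is a connection that only exists for agents who happened to start reading from the right side.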

2. Terminology drift

The same concept named differently in different places. "Decision" in one file, "resolution" in another, "outcome" in a third — all meaning the same thing.

Each individual file is internally consistent. But across files, an agent has no way to know these are the same thing. It builds three separate concepts. And when it reasons about them, the models diverge.

The fix is not renaming everything in one pass — that introduces iatrogenics. The fix is a terminology map: here are all the names we use for this concept, and this one is canonical. Then agents can normalize before reasoning.
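A terminology map can be as small as a dictionary from alias to canonical name. A sketch, using the example aliases from above:

```python
# Terminology map: every known alias points at one canonical name.
# Agents normalize terms through this map before reasoning.

TERMINOLOGY = {
    "decision":   "decision",  # canonical
    "resolution": "decision",
    "outcome":    "decision",
}

def normalize(term: str) -> str:
    """Map a known alias to its canonical name; pass unknown terms through."""
    return TERMINOLOGY.get(term.lower(), term)

print(normalize("Resolution"))  # decision
```

The point is not the lookup — it's that the map is a written, reviewable artifact, so "are these the same concept?" has one answer instead of three.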

3. Layer violations

When information lives on the wrong layer, agents reading only the correct layer miss it. An agent doing a deep-spec pass reads specs — and misses the business rule that was written in the navigation layer because it seemed important at the time.

This creates a specific kind of non-determinism: the model depends not just on reading order, but on reading depth. An agent doing a shallow pass and an agent doing a deep pass build different models — even from the same starting point.
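Layer violations can also be audited mechanically, assuming each extracted fact is tagged with its kind and the layer file it was found in. A sketch with hypothetical layer names and tags:

```python
# Audit where facts live vs. where they belong.
# Layer names, fact kinds, and the facts themselves are invented.

ALLOWED = {
    "navigation": {"pointer", "summary"},
    "spec":       {"business_rule", "field"},
    "scenario":   {"example"},
}

facts = [
    {"id": "F1", "kind": "pointer",        "layer": "navigation"},
    {"id": "F2", "kind": "business_rule",  "layer": "navigation"},  # wrong layer
    {"id": "F3", "kind": "field",          "layer": "spec"},
]

violations = [f["id"] for f in facts if f["kind"] not in ALLOWED[f["layer"]]]
print(violations)  # ['F2']
```

F2 is the business rule that "seemed important at the time": a deep-spec pass that reads only the spec layer will never see it.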


How to test for determinism

Three practical tests. You don't need multiple agents to run them — you can do them manually.

Test 1 — The reverse order test

Pick your five most important questions about your domain. Answer them from your knowledge base, starting from the first file you'd naturally open.

Now answer them again, but start from a file you'd normally read last.

Do the answers change? If yes — you have reading-order dependency. Something is determined by entry point, not by content.
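The mechanism behind this test can be made concrete. Sketch an agent that keeps the first definition it encounters, then build its model from the same two files in both orders (file names and claims are hypothetical):

```python
# Demonstrate reading-order dependency: an agent that keeps the first
# definition it sees builds different models from different entry points.

files = {
    "overview.md": {"retry": "may retry"},
    "spec.md":     {"retry": "always retries"},
}

def build_model(order):
    model = {}
    for name in order:
        for concept, claim in files[name].items():
            model.setdefault(concept, claim)  # first definition wins
    return model

a = build_model(["overview.md", "spec.md"])
b = build_model(["spec.md", "overview.md"])
print(a == b)  # False: the model is determined by entry point, not content
```

The two source files are the "may" vs "always" failure from earlier: each is internally consistent, and together they make the answer a function of where you started reading.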

Test 2 — The two-path test

Pick one connection or relationship in your system. Find where it's described from side A. Find where it's described from side B (if it exists). Do both descriptions agree on: what the connection is, when it applies, who initiates it?

If they disagree on any of these — you have a COLLAPSE:RED that hasn't been marked yet.

If side B doesn't exist — you have an asymmetric connection. An agent starting from B will never know this relationship exists.
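The agreement half of this test can be run as a field-by-field comparison, assuming both sides' descriptions have been extracted into the same three fields. The values below are hypothetical:

```python
# Two-path test: compare how each side describes the same connection
# on the three questions that matter. Field values are invented.

side_a = {"what": "event stream", "when": "on commit", "initiator": "A"}
side_b = {"what": "event stream", "when": "on merge",  "initiator": "A"}

def disagreements(a, b):
    """Fields where the two descriptions of one connection differ."""
    return [k for k in ("what", "when", "initiator") if a.get(k) != b.get(k)]

print(disagreements(side_a, side_b))  # ['when']
```

A non-empty result is exactly the unmarked COLLAPSE:RED the test describes; an absent `side_b` is the asymmetric case.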

Test 3 — The empty layer test

For each layer in your pyramid, ask: what would an agent know if it read only this layer and nothing below?

Then ask: is that the right amount for this layer to communicate?

If the navigation layer contains business rules — agents reading only the navigation layer will build an overconfident model. If the spec layer is missing fields that appear in the scenario layer — agents building from specs will have an incomplete model.

The right answer for each layer is: exactly what belongs here, nothing more, nothing less.


What done looks like

A knowledge system is deterministic when:

  • Every important question has one unambiguous answer traceable to a specific source
  • The same question answered by any agent produces the same answer regardless of reading order
  • Every connection is described from both sides with consistent details
  • Every layer contains exactly what belongs there — no more, no less
  • Every contradiction has a COLLAPSE marker — none are silent
  • Incomplete coverage is labeled explicitly, not hidden

That last point matters more than it sounds. "I've covered the main sources" is not a status. "I've read 14 of 20 sources; the remaining 6 require a second pass for: [specific topics]" is a status.

The difference is whether you know what you don't know. A system that knows its own gaps is more useful than one that presents itself as complete.


The linter vs the method

A linter checks form: file exists, link not broken, syntax valid. You can have a perfect linter score and a completely non-deterministic knowledge system. All files present, all links valid, all formats correct — and two agents still build different models.

Determinism is semantic. It's checked by:

  • Running the hallucination trap catalog on every file
  • Marking every COLLAPSE — no exceptions, no obvious choices
  • Verifying symmetric connections
  • Auditing layer distribution
  • Running the three tests above

This is not a one-time setup. It's a recurring check — every time you add significant content, every time you cross a domain boundary, every time you connect two previously separate knowledge areas.

The linter checks form. This method checks meaning.


Where we go from here

The series has covered the full method:

Part 1 — five core principles: write everything explicitly, use layers, catalog hallucination traps, mark collapses, be the gateway.

Part 2 — agent specialization as protection against iatrogenics, and the failure patterns that look like work but aren't.

Part 3 (this post) — determinism as the third requirement, its three sources, and how to test for it.

Next: we leave method and go into domain. Real failure modes, real examples — starting with the one that caught OpenAI off guard: confabulation versus hallucination, and why the distinction changes how you build.


Method developed from a real working system. The principle works with any stack, any tools, any domain.

