📌 The full version (with interactive SVG figures, the drift curve, the five-whys hub, the document-vs-context split, and the Harness concentric layers) is hosted on my blog:
👉 https://okikusan-public.pages.dev/context-is-the-gap.en
This dev.to post is the condensed version. The visualisations live on the original.
Introduction
Increasingly, an AI agent's accuracy is decided by its context, not its prompt.
But "context" here does not mean a polished spec. What really moves the needle is the gap between the spec and the Source of Truth, and the reasons behind the drift.
An AI fed only the spec replays "past truth." Feed it the drift reasons too, and it approaches "today's truth." This post lays out the blind spot of Spec-Driven Development (SDD) and what Harness Engineering is really about.
TL;DR
- The frontier of AI-agent accuracy has shifted: model → prompt → context
- If you mistake "context" for a polished spec, the AI just replays "past truth": the more time passes, the further the spec drifts from the Source of Truth (running code, ops, field judgement)
- What actually makes the difference is the reasons for the drift. Five whys (why the spec was changed, why an exception was allowed, why the implementation compromised, why the issue was argued the way it was, why the review came out that way) decide the quality of the AI's output
- Documents are polished; context is accumulated. Put the spec at the core of the Harness, and layer the drift reasons around it
Spec vs Source of Truth — the gap is inevitable
The spec describes what should be: a snapshot of what was agreed at one moment, internally coherent and neatly polished.
As implementation and operations evolve, the actual "truth" drifts elsewhere:
- The running code — hard-coded values, exception handlers, commented-out branches
- The DB schema and the live data — migration history, unexpected records, exceptional values
- The actual API behaviour — undocumented responses, unofficial endpoints
- Customer-side operating decisions — approval routes never written down, tacit exceptions
- Field judgement — choices an operator made on the spot
These are the Source of Truth (SoT). The spec inevitably drifts away from the SoT over time. This is not laziness — it's structural.
The problem is not that the gap exists. It's that the gap is never explained.
An AI fed only the spec replays "past truth"
Typical failures:
- "The spec says X is correct, but the code shows Y." → The AI trusts the spec, returns X, and drifts from reality
- "The spec has no exception handling, so edge cases can be ignored." → A misjudgement: in real operation, those edge cases cannot be ignored
- "I implemented per the latest API docs." → The unofficial operating rules get missed
This is not the AI's fault. The context you fed it is frozen at a point in time, and the AI is faithful to that point. The cleaner the spec, the more confidently the AI quotes "past truth."
Reverse-engineering alone is not enough either. Code reveals "what is implemented and how," but never "why it became that."
Five whys to accumulate — that's strong context
| # | What to keep | Where it lives |
|---|---|---|
| 01 | Why was the spec changed? | Change log / meeting notes / Slack |
| 02 | Why was the exception allowed? | Ops decision log / case-by-case memos |
| 03 | Why was the implementation compromised? | Code comments / PR comments |
| 04 | Why was the issue argued this way? | Issues / discussion |
| 05 | Why did the review come out this way? | PR review comments |
Keeping these "whys" is exactly the Externalisation step in Nonaka's SECI model. The twist: you're externalising the process, not the conclusion. That's how judgement patterns become reproducible in other contexts.
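To make the table concrete, here is a minimal Python sketch of how the five whys might be stored as structured records that an agent can retrieve per spec section. Everything in it is hypothetical: `WhyKind`, `DriftReason`, `reasons_for`, and the sample entries are illustrative names and data, not an existing tool. The `reason` field is meant to hold the process behind a decision, not just its conclusion, in line with the SECI point above.

```python
from dataclasses import dataclass
from enum import Enum


class WhyKind(Enum):
    """The five kinds of 'why' from the table (hypothetical encoding)."""
    SPEC_CHANGE = "Why was the spec changed?"
    EXCEPTION = "Why was the exception allowed?"
    COMPROMISE = "Why was the implementation compromised?"
    ISSUE = "Why was the issue argued this way?"
    REVIEW = "Why did the review come out this way?"


@dataclass(frozen=True)
class DriftReason:
    """One accumulated 'why': which part of the spec diverged, and the reason."""
    kind: WhyKind
    spec_ref: str  # the spec section this reason is about
    reason: str    # the process behind the decision, not just the conclusion
    source: str    # where it lives: PR comment, ops memo, Slack thread...


def reasons_for(spec_ref: str, log: list[DriftReason]) -> list[DriftReason]:
    """Return every recorded reason for one spec section, in the order appended."""
    return [r for r in log if r.spec_ref == spec_ref]


if __name__ == "__main__":
    # Hypothetical sample data for illustration only.
    log = [
        DriftReason(WhyKind.SPEC_CHANGE, "refunds",
                    "Window extended after a customer escalation", "meeting notes"),
        DriftReason(WhyKind.EXCEPTION, "refunds",
                    "Enterprise refunds approved manually", "ops memo"),
        DriftReason(WhyKind.REVIEW, "auth",
                    "Token caching rejected in review", "PR review"),
    ]
    for r in reasons_for("refunds", log):
        print(f"[{r.kind.name}] {r.reason} ({r.source})")
```

Keying each record to a spec section means the agent can pull up "why this part no longer matches the spec" right next to the spec text itself.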
Documents are polished; context is accumulated
| | Documents | Context |
|---|---|---|
| Target | Humans / clients | AI agents |
| Nature | Coherence, consistency, polish | Judgement material, contradictions, wobbles |
| Examples | Proposals / final specs / articles / manuals | Issues / PR reviews / ops notes / failure logs / rough notes |
| Verb | Polish | Accumulate |
Tolerating contradiction is the key. If you treat context as a record of the "thinking process," contradictions are natural: human judgement wobbles constantly, and organisational decisions get overwritten. Whether you can keep those wobbles without sanding them down decides whether your AI agent can reproduce "your kind of judgement."
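One way to see the difference between the two verbs is to treat them as two update rules. The Python sketch below is a hypothetical illustration, and the `Decision` type and the `polish`, `accumulate`, and `contradictions` functions are made up for this post. Polishing overwrites a topic with its latest verdict. Accumulating appends every verdict together with its reason, so a later contradiction sits next to the earlier one instead of replacing it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Decision:
    """One judgement on a topic, with the reason behind it (hypothetical type)."""
    topic: str
    verdict: str
    why: str


def polish(doc: dict[str, str], d: Decision) -> None:
    """Document-style update: keep only the latest, coherent verdict."""
    doc[d.topic] = d.verdict  # the earlier verdict and its 'why' are lost


def accumulate(ctx: list[Decision], d: Decision) -> None:
    """Context-style update: append, even if it contradicts earlier entries."""
    ctx.append(d)


def contradictions(ctx: list[Decision], topic: str) -> list[str]:
    """Every distinct verdict ever recorded for a topic, in first-seen order."""
    seen: list[str] = []
    for d in ctx:
        if d.topic == topic and d.verdict not in seen:
            seen.append(d.verdict)
    return seen


if __name__ == "__main__":
    doc: dict[str, str] = {}
    ctx: list[Decision] = []
    for d in (
        Decision("retries", "no retries", "spec v1: the endpoint is not idempotent"),
        Decision("retries", "retry 3x", "ops: upstream timeouts at peak hours"),
    ):
        polish(doc, d)
        accumulate(ctx, d)
    print(doc)                             # {'retries': 'retry 3x'}
    print(contradictions(ctx, "retries"))  # ['no retries', 'retry 3x']
```

The polished dict can only answer "what is true now." The accumulated list can also answer "what did we believe before, and why did it change?"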
Spec at the core of the Harness; drift reasons on the outer rings
Agent = Model + Harness (Karpathy's framing). SDD alone is not enough: you also have to design the rings that sit outside the spec.
"Issue Driven Development (IDD)" pairs well with this. SDD = the spec is the truth. IDD = the drift reasons are the truth. Let them coexist.
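Here is a rough sketch of the layering, assuming the context is assembled as plain text before it is handed to the model. The spec is rendered first as the core, and each ring of accumulated drift reasons (ops notes, PR reviews, issues, and so on) follows it. `Harness`, `add`, and `build_context` are hypothetical names used for illustration, not an API from any framework.

```python
from dataclasses import dataclass, field


@dataclass
class Harness:
    """Hypothetical harness: the spec is the core, drift reasons form outer rings."""
    spec: str  # SDD side: the agreed spec
    rings: dict[str, list[str]] = field(default_factory=dict)  # IDD side: ring -> whys

    def add(self, ring: str, why: str) -> None:
        """Accumulate one drift reason on a named ring; nothing is overwritten."""
        self.rings.setdefault(ring, []).append(why)

    def build_context(self) -> str:
        """Render the context: spec first, then each ring in the order it was created."""
        lines = ["## Spec (core)", self.spec]
        for ring, whys in self.rings.items():
            lines.append(f"## Why it diverged: {ring}")
            lines.extend(f"- {w}" for w in whys)
        return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical sample data for illustration only.
    h = Harness(spec="Refunds are accepted within 14 days of purchase.")
    h.add("ops notes", "Enterprise refunds go through manual approval, case by case.")
    h.add("PR reviews", "Window check kept lenient: timezone bugs caused false rejections.")
    print(h.build_context())
```

Putting the spec first gives the agent the agreed baseline. The rings that follow tell it where reality moved away from that baseline and why, which is the SDD/IDD coexistence described above.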
Good AI = how much it lowers verification load
In May 2026, announcing the Linux kernel 7.1 RC4 release, Linus Torvalds publicly declared the security mailing list "almost entirely unmanageable" because of the flood of AI-generated vulnerability reports[1]. What was a stream of 2-3 reports per week two years ago has ballooned to 5-10 reports per day.
Linus himself does not dismiss AI in security work. He asks researchers to "understand the code and contribute a patch," not just send the alert. That is AI-agent operations in miniature. The value of an AI is not its output volume; it is how much it lowers the human's verification, correction, and review load.
A spec-only AI mass-produces plausible-looking output. It reads right, but it's drifted from the SoT and a human has to check every line to use it — the textbook case of "Slop" (low-quality, generic, templated AI output). Only the AI fed the drift reasons becomes the kind that actually lowers human verification load.
Conclusion — accumulate, don't polish
What sharpens an AI agent is no longer the model or the prompt. It is whether you can accumulate the gap between spec and Source of Truth, and the reasons for the drift.
- Polish documents (for humans / clients)
- Accumulate context (for AI agents — keep the contradictions and wobbles)
- Spec at the core of the Harness; layer "why it diverged" on the outside
Many organisations pour energy into "polishing the spec" because of the SDD boom. But the real differentiation lies elsewhere: not in polishing the spec, but in accumulating the gap with the SoT. To stop building AIs that replay "past truth," stop polishing — start accumulating.
📌 Full version with interactive SVGs: https://okikusan-public.pages.dev/context-is-the-gap.en
- FIG.0 — THE GAP (spec vs SoT drift curve)
- FIG.1 — SPEC-ONLY VS SPEC + GAP (two AIs)
- FIG.2 — FIVE WHYS (the accumulating hub)
- FIG.3 — DOCUMENTS VS CONTEXT (polish vs accumulate)
- FIG.4 — HARNESS LAYERS (spec at the core, drift reasons on the outside)
If this resonates, a 🦄 / ❤️ / 💬 helps a lot. Feedback welcome.
Related posts on my blog
- AI agents enter the territory code can't write — long-tail × tacit knowledge × tacit thoughts — the philosophical premise of this post
- Hermes Agent — execution engine for your Second Brain — a concrete Harness execution base
- "Tasks, not jobs" — reading Microsoft Suleyman's 18-month forecast — Applied Engineer / FDE
1. The Register (2026-05-18): Linus Torvalds says AI-powered bug hunters have made Linux security mailing list 'almost entirely unmanageable' / Tom's Hardware (2026-05-18): Linus Torvalds says flood of duplicate AI-generated vulnerability reports...