OKIKUSAN-PUBLIC

Posted on • Originally published at okikusan-public.pages.dev

Don't build an AI that replays yesterday's spec — the gap between spec and source of truth is the real context

📌 The full version (with interactive SVG figures, the drift curve, the five-whys hub, the document-vs-context split, and the Harness concentric layers) is hosted on my blog:

👉 https://okikusan-public.pages.dev/context-is-the-gap.en

This dev.to post is the condensed version. The visualisations live on the original.

Introduction

More and more often, an AI agent's accuracy is decided by its context, not its prompting.

But "context" here is not a polished spec. What really moves the needle is the gap between the spec and the source of truth, and the reasons behind the drift.

An AI fed only the spec replays "past truth." Feed it the drift reasons too, and it approaches "today's truth." This post lays out the blind spot of Spec-Driven Development (SDD) and the real core of Harness Engineering.

TL;DR

  • The frontier of AI-agent accuracy has shifted: model → prompt → context
  • If you mistake "context" for a polished spec, the AI just replays "past truth": the longer time passes, the further a spec drifts from the Source of Truth (running code, ops, field judgement)
  • What actually works is keeping the reasons for the drift. Five whys decide the quality of the AI's output: why the spec was changed, why an exception was allowed, why the implementation compromised, why the issue was argued the way it was, and why the review came out that way
  • Documents are polished; context is accumulated. Put the spec at the core of the Harness, and layer the drift reasons around it

Spec vs Source of Truth — the gap is inevitable

The spec describes what should be. It is a snapshot of an agreement at one moment: internally coherent and neatly polished.

As implementation and operations evolve, the actual "truth" drifts elsewhere:

  • The running code — hard-coded values, exception handlers, commented-out branches
  • The DB schema and the live data — migration history, unexpected records, exceptional values
  • The actual API behaviour — undocumented responses, unofficial endpoints
  • Customer-side operating decisions — approval routes never written down, tacit exceptions
  • Field judgement — choices an operator made on the spot

These are the Source of Truth (SoT). The spec inevitably drifts away from the SoT over time. This is not laziness — it's structural.

The problem is not that the gap exists. It's that the gap is never explained.
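To make "a gap that is never explained" concrete, here is a minimal sketch in Python. It assumes the spec and the observed behaviour can each be reduced to a flat dictionary of settings; the `Gap` and `find_gaps` names and all the example values are hypothetical, invented only for illustration.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


@dataclass
class Gap:
    key: str
    spec_value: Any
    observed_value: Any
    reason: Optional[str]  # None means the drift was never explained


def find_gaps(spec: Dict[str, Any], observed: Dict[str, Any],
              reasons: Dict[str, str]) -> List[Gap]:
    """Compare what the spec says with what the running system does."""
    gaps = []
    for key in sorted(set(spec) | set(observed)):
        spec_value = spec.get(key)          # None if the spec is silent
        observed_value = observed.get(key)  # None if nothing was observed
        if spec_value != observed_value:
            gaps.append(Gap(key, spec_value, observed_value, reasons.get(key)))
    return gaps


# Hypothetical values: the spec, the observed Source of Truth, and recorded reasons.
spec = {"max_retries": 3, "approval_route": "manager"}
observed = {"max_retries": 5, "approval_route": "manager",
            "legacy_endpoint": "/v0/orders"}
reasons = {"max_retries": "Raised to 5 after timeouts on a slow upstream (ops decision)"}

gaps = find_gaps(spec, observed, reasons)
unexplained = [gap.key for gap in gaps if gap.reason is None]
print(unexplained)
```

In this sketch both differences count as gaps, but only the one without a recorded reason lands in `unexplained`. That list is the part worth chasing, not the gap itself.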

[Figure: Spec vs Source of Truth: the gap is the context]

An AI fed only the spec replays "past truth"

Typical failures:

  • "The spec says X is correct, but the code shows Y." → The AI trusts the spec, returns X, and drifts from reality
  • "The spec has no exception handling, so edge cases can be ignored." → Ignoring them is impossible in real operations, so this is a misjudgement
  • "I implemented per the latest API docs." → The unofficial operating rules get missed

This is not the AI's fault. The context you fed it is frozen at a point in time, and the AI is faithful to that point. The cleaner the spec, the more confidently the AI quotes "past truth."

Reverse-engineering alone is not enough either. Code reveals "what is implemented and how," but never "why it ended up that way."

Five whys to accumulate — that's strong context

[Figure: Five whys to accumulate as context]

#  | What to keep                             | Where it lives
01 | Why was the spec changed?                | Change log / meeting notes / Slack
02 | Why was the exception allowed?           | Ops decision log / case-by-case memos
03 | Why was the implementation compromised?  | Code comments / PR comments
04 | Why was the issue argued this way?       | Issues / discussion
05 | Why did the review come out this way?    | PR review comments

Keeping these "whys" is exactly the Externalisation step in Nonaka's SECI model. The twist: you're externalising the process, not the conclusion. That's how judgement patterns become reproducible in other contexts.
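One way to keep the five whys in a form an agent can consume is a small record type. The sketch below is an assumption about shape, not a prescribed schema: the `Why` enum mirrors the five rows of the table, while `DriftReason`, its fields, `render`, and the example entry are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class Why(Enum):
    SPEC_CHANGED = "Why was the spec changed?"
    EXCEPTION_ALLOWED = "Why was the exception allowed?"
    IMPLEMENTATION_COMPROMISED = "Why was the implementation compromised?"
    ISSUE_ARGUED = "Why was the issue argued this way?"
    REVIEW_OUTCOME = "Why did the review come out this way?"


@dataclass(frozen=True)
class DriftReason:
    why: Why
    source: str      # where it lives, e.g. "PR comment"
    process: str     # how the decision was reached, not only the outcome
    conclusion: str  # what was finally decided


def render(reason: DriftReason) -> str:
    """Format one drift reason as a single line of context."""
    return (f"[{reason.why.name}] ({reason.source}) "
            f"{reason.process} => {reason.conclusion}")


# Hypothetical entry, invented for illustration.
reason = DriftReason(
    why=Why.IMPLEMENTATION_COMPROMISED,
    source="PR comment",
    process="Batch import timed out in staging and a streaming rewrite was judged too risky before release",
    conclusion="Keep the hard-coded batch size of 500 for now",
)
print(render(reason))
```

Keeping `process` as its own field is the point of the sketch: it stores how the decision was reached alongside the conclusion, which is the externalisation described above.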

Documents are polished; context is accumulated

Aspect   | Documents                                     | Context
Target   | Humans / clients                              | AI agents
Nature   | Coherence, consistency, polish                | Judgement material, contradictions, wobbles
Examples | Proposals / final specs / articles / manuals  | Issues / PR reviews / ops notes / failure logs / rough notes
Verb     | Polish                                        | Accumulate
Tolerating contradiction is the core. If you treat context as a "thinking process," contradictions are natural. Human judgement wobbles constantly; organisational decisions get overwritten. Whether you can keep that without sanding it down decides whether your AI agent can reproduce "your kind of judgement."
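In code, "accumulate, don't sand down" roughly means append-only storage. The following is a minimal sketch under that assumption; `ContextLog` and the example notes are hypothetical, and a real store would likely also keep timestamps and authors.

```python
from collections import defaultdict
from typing import Dict, List


class ContextLog:
    """Append-only notes per topic; earlier notes are never rewritten."""

    def __init__(self) -> None:
        self._notes: Dict[str, List[str]] = defaultdict(list)

    def record(self, topic: str, note: str) -> None:
        # Append instead of overwriting, so contradictions stay visible.
        self._notes[topic].append(note)

    def history(self, topic: str) -> List[str]:
        # Return a copy so callers cannot edit the stored history.
        return list(self._notes.get(topic, []))


# Hypothetical notes that contradict each other on purpose.
log = ContextLog()
log.record("refund-approval", "Refunds over 100 EUR need manager approval")
log.record("refund-approval", "Support lead may approve up to 300 EUR during peak season")
print(log.history("refund-approval"))
```

Both notes survive in the order they were written. A document would reconcile them into one rule; the log deliberately does not.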

Spec at the core of the Harness; drift reasons on the outer rings

[Figure: Harness layers: spec at the core, drift reasons outside]

Agent = Model + Harness (Karpathy framing). Spec-Driven Development (SDD) alone is not enough: you also need to design the rings around the spec.
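As a rough illustration of the layering, the sketch below assembles the text handed to a model with the spec first and the drift reasons around it. Everything here is an assumption made for the example: `build_context`, the section labels, the closing instruction to prefer drift reasons, and the sample strings are hypothetical, and a real harness would do far more than concatenate text.

```python
from typing import List


def build_context(spec_text: str, drift_reasons: List[str]) -> str:
    """Place the spec at the core and layer drift reasons after it."""
    sections = ["SPEC (core)", spec_text.strip()]
    if drift_reasons:
        sections.append("DRIFT REASONS (outer rings)")
        sections.extend(f"- {reason}" for reason in drift_reasons)
        # Assumption for this sketch: a recorded drift reason is treated
        # as newer information than the spec it contradicts.
        sections.append(
            "Where the spec and a drift reason disagree, "
            "treat the drift reason as the more recent information."
        )
    return "\n".join(sections)


# Hypothetical inputs, invented for illustration.
spec_text = "Orders are retried at most 3 times."
drift_reasons = [
    "Retries raised to 5 after timeouts on a slow upstream (ops decision log)",
    "Legacy endpoint /v0/orders kept for one customer (PR review comment)",
]
context = build_context(spec_text, drift_reasons)
print(context)
```

With an empty list of drift reasons the function returns the spec alone, which corresponds to the spec-only case this post warns about.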

"Issue Driven Development (IDD)" pairs well with this. SDD = the spec is the truth. IDD = the drift reasons are the truth. Let them coexist.

Good AI = how much it lowers verification load

In May 2026, with the Linux kernel 7.1 RC4 release, Linus Torvalds publicly declared the security mailing list "almost entirely unmanageable" due to the flood of AI-generated vulnerability reports [1]. What was a stream of 2-3 reports per week two years ago has ballooned to 5-10 reports per day.

Linus himself does not dismiss AI in security work — he asks researchers to "understand the code and contribute a patch," not just the alert. That's a miniature of AI-agent operations in general. The value of an AI is not output volume — it is how much it lowers the human's verification, correction, and review load.

A spec-only AI mass-produces plausible-looking output. It reads right, but it's drifted from the SoT and a human has to check every line to use it — the textbook case of "Slop" (low-quality, generic, templated AI output). Only the AI fed the drift reasons becomes the kind that actually lowers human verification load.

Conclusion — accumulate, don't polish

What sharpens an AI agent is no longer the model or the prompt. It is whether you can accumulate the gap between spec and Source of Truth, and the reasons for the drift.

  • Polish documents (for humans / clients)
  • Accumulate context (for AI agents — keep the contradictions and wobbles)
  • Spec at the core of the Harness; layer "why it diverged" on the outside

Many organisations pour energy into "polishing the spec" because of the SDD boom. But the real differentiation lies elsewhere: not in polishing the spec, but in accumulating the gap with the SoT. To stop building AIs that replay "past truth," stop polishing — start accumulating.


📌 Full version with interactive SVGs: https://okikusan-public.pages.dev/context-is-the-gap.en

  • FIG.0 — THE GAP (spec vs SoT drift curve)
  • FIG.1 — SPEC-ONLY VS SPEC + GAP (two AIs)
  • FIG.2 — FIVE WHYS (the accumulating hub)
  • FIG.3 — DOCUMENTS VS CONTEXT (polish vs accumulate)
  • FIG.4 — HARNESS LAYERS (spec at the core, drift reasons on the outside)

If this resonates, a 🦄 / ❤️ / 💬 helps a lot. Feedback welcome.


  1. The Register (2026-05-18): Linus Torvalds says AI-powered bug hunters have made Linux security mailing list 'almost entirely unmanageable'
     Tom's Hardware (2026-05-18): Linus Torvalds says flood of duplicate AI-generated vulnerability reports...