<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: OKIKUSAN-PUBLIC</title>
    <description>The latest articles on DEV Community by OKIKUSAN-PUBLIC (@okikusan-public).</description>
    <link>https://dev.to/okikusan-public</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3861953%2Ff5d51bc9-0cb0-4e26-901a-945364a91d28.jpg</url>
      <title>DEV Community: OKIKUSAN-PUBLIC</title>
      <link>https://dev.to/okikusan-public</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/okikusan-public"/>
    <language>en</language>
    <item>
      <title>Don't build an AI that replays yesterday's spec — the gap between spec and source of truth is the real context</title>
      <dc:creator>OKIKUSAN-PUBLIC</dc:creator>
      <pubDate>Tue, 19 May 2026 22:45:49 +0000</pubDate>
      <link>https://dev.to/okikusan-public/dont-build-an-ai-that-replays-yesterdays-spec-the-gap-between-spec-and-source-of-truth-is-the-9m2</link>
      <guid>https://dev.to/okikusan-public/dont-build-an-ai-that-replays-yesterdays-spec-the-gap-between-spec-and-source-of-truth-is-the-9m2</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;📌 &lt;strong&gt;The full version (with interactive SVG figures, the drift curve, the five-whys hub, the document-vs-context split, and the Harness concentric layers) is hosted on my blog:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;👉 &lt;a href="https://okikusan-public.pages.dev/context-is-the-gap.en" rel="noopener noreferrer"&gt;https://okikusan-public.pages.dev/context-is-the-gap.en&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This dev.to post is the condensed version. The visualisations live on the original.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;Introduction&lt;/h2&gt;

&lt;p&gt;Increasingly, an AI agent's accuracy is decided &lt;strong&gt;by its context, not its prompting&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;But "context" here is not a polished spec. What really moves the needle is &lt;strong&gt;the gap between the spec and the source of truth&lt;/strong&gt;, and the &lt;strong&gt;reasons behind the drift&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;An AI fed only the spec replays "past truth." Feed it the drift reasons as well, and it gets closer to "today's truth." This post lays out the blind spot of Spec-Driven Development (SDD) and what I see as the real core of Harness Engineering.&lt;/p&gt;

&lt;h2&gt;TL;DR&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;The frontier of AI-agent accuracy has shifted: &lt;strong&gt;model → prompt → context&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;If you mistake "context" for a polished spec, the AI just &lt;strong&gt;replays "past truth"&lt;/strong&gt;, because specs drift further from the Source of Truth (running code, operations, field judgement) as time passes&lt;/li&gt;
&lt;li&gt;What actually works is the &lt;strong&gt;reasons for the drift&lt;/strong&gt;. Five whys decide the quality of the AI's output: why the spec was changed, why an exception was allowed, why the implementation was compromised, why the issue was argued the way it was, and why the review came out the way it did&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Documents are polished; context is accumulated.&lt;/strong&gt; Put the spec at the core of the Harness, and layer the drift reasons around it&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;Spec vs Source of Truth — the gap is inevitable&lt;/h2&gt;

&lt;p&gt;The spec describes what &lt;strong&gt;should be&lt;/strong&gt;: a snapshot of what was agreed at one moment, internally coherent and neatly polished.&lt;/p&gt;

&lt;p&gt;As implementation and operations evolve, the actual "truth" ends up living elsewhere:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;The running code&lt;/strong&gt; — hard-coded values, exception handlers, commented-out branches&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The DB schema and the live data&lt;/strong&gt; — migration history, unexpected records, exceptional values&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The actual API behaviour&lt;/strong&gt; — undocumented responses, unofficial endpoints&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Customer-side operating decisions&lt;/strong&gt; — approval routes never written down, tacit exceptions&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Field judgement&lt;/strong&gt; — choices an operator made on the spot&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These are the &lt;strong&gt;Source of Truth (SoT)&lt;/strong&gt;. The spec inevitably drifts away from the SoT over time. This is not laziness — it's structural.&lt;/p&gt;

&lt;p&gt;The problem is not that the gap exists. It's that &lt;strong&gt;the gap is never explained&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjdzkkjwyjpggkccvlm9n.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjdzkkjwyjpggkccvlm9n.png" alt="Spec vs Source of Truth: the gap is the context" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;An AI fed only the spec replays "past truth"&lt;/h2&gt;

&lt;p&gt;Typical failures:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;"The spec says X is correct, but the code shows Y." → The AI trusts the spec, returns X, and &lt;strong&gt;drifts from reality&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;"The spec has no exception handling, so edge cases can be ignored." → &lt;strong&gt;Operationally impossible — a misjudgement&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;"I implemented per the latest API docs." → &lt;strong&gt;The unofficial operating rules get missed&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is not the AI's fault. &lt;strong&gt;The context you fed it is frozen at a point in time&lt;/strong&gt;, and the AI is faithful to that point. The cleaner the spec, the more confidently the AI quotes "past truth."&lt;/p&gt;

&lt;p&gt;Reverse-engineering alone is not enough either. Code reveals "&lt;strong&gt;what is implemented and how&lt;/strong&gt;," but never "&lt;strong&gt;why it ended up that way&lt;/strong&gt;."&lt;/p&gt;

&lt;h2&gt;Five whys to accumulate — that's strong context&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvrvtqicby0qebl9gkq4a.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvrvtqicby0qebl9gkq4a.png" alt="Five whys to accumulate as context" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;#&lt;/th&gt;
&lt;th&gt;What to keep&lt;/th&gt;
&lt;th&gt;Where it lives&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;01&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Why was the spec changed?&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Change log / meeting notes / Slack&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;02&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Why was the exception allowed?&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Ops decision log / case-by-case memos&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;03&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Why was the implementation compromised?&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Code comments / PR comments&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;04&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Why was the issue argued this way?&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Issues / discussion&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;05&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Why did the review come out this way?&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;PR review comments&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;blockquote&gt;
&lt;p&gt;Keeping these "whys" is exactly the &lt;strong&gt;Externalisation&lt;/strong&gt; step in Nonaka's SECI model, where tacit knowledge is turned into explicit knowledge. The twist: you're &lt;strong&gt;externalising the process, not the conclusion&lt;/strong&gt;. That is what makes judgement patterns reproducible in other contexts.&lt;/p&gt;
&lt;/blockquote&gt;
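&lt;p&gt;To make the table concrete, here is a minimal Python sketch of what one accumulated "why" could look like as a record. Everything in it is an assumption for illustration: &lt;code&gt;DriftReason&lt;/code&gt;, the &lt;code&gt;WHY_KINDS&lt;/code&gt; labels, and &lt;code&gt;append_reason&lt;/code&gt; are hypothetical names, not part of any library or tool. The one design choice that matters is that records are only ever appended, so a later reason sits next to an earlier one instead of overwriting it.&lt;/p&gt;

&lt;pre class="highlight python"&gt;&lt;code&gt;from dataclasses import dataclass

# Hypothetical labels for the five "whys" in the table above.
WHY_KINDS = ("spec_change", "exception", "compromise", "issue", "review")


@dataclass(frozen=True)
class DriftReason:
    kind: str    # one of WHY_KINDS
    target: str  # what drifted, e.g. a spec section or a module
    why: str     # the reasoning itself, kept as written rather than summarised
    source: str  # where it lives: change log, ops memo, PR, issue, review

    def __post_init__(self):
        # Reject labels outside the five kinds so the log stays queryable.
        if self.kind not in WHY_KINDS:
            raise ValueError(f"unknown kind: {self.kind}")


def append_reason(log, reason):
    """Accumulate, don't polish: return a new log with the reason appended.

    Nothing is merged, deduplicated or deleted, so contradictory reasons
    about the same target are kept side by side in the order recorded.
    """
    return [*log, reason]
&lt;/code&gt;&lt;/pre&gt;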

&lt;h2&gt;Documents are polished; context is accumulated&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;Documents&lt;/th&gt;
&lt;th&gt;Context&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Audience&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Humans / clients&lt;/td&gt;
&lt;td&gt;AI agents&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Nature&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Coherence, consistency, polish&lt;/td&gt;
&lt;td&gt;Judgement material, contradictions, wobbles&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Examples&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Proposals / final specs / articles / manuals&lt;/td&gt;
&lt;td&gt;Issues / PR reviews / ops notes / failure logs / rough notes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Verb&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Polish&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Accumulate&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Tolerating contradiction is the core.&lt;/strong&gt; If you treat context as the record of a thinking process, contradictions are natural: human judgement wobbles constantly, and organisational decisions get overwritten. &lt;strong&gt;Whether you can keep those contradictions without sanding them down&lt;/strong&gt; decides whether your AI agent can reproduce "&lt;strong&gt;your kind of judgement&lt;/strong&gt;."&lt;/p&gt;

&lt;h2&gt;Spec at the core of the Harness; drift reasons on the outer rings&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnn9b3u6qnykxy7htpb83.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnn9b3u6qnykxy7htpb83.png" alt="Harness layers: spec at the core, drift reasons outside" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Agent = Model + Harness&lt;/strong&gt; (Karpathy framing). SDD alone is not enough: you also need to &lt;strong&gt;design the outer rings around SDD&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;"Issue Driven Development (IDD)" pairs well with this. SDD = the spec is the truth. IDD = the drift reasons are the truth. Let them coexist.&lt;/p&gt;

&lt;h2&gt;Good AI = how much it lowers verification load&lt;/h2&gt;

&lt;p&gt;In May 2026, with the Linux kernel 7.1-rc4 release, Linus Torvalds publicly called the security mailing list &lt;strong&gt;"almost entirely unmanageable"&lt;/strong&gt; because of the flood of AI-generated vulnerability reports&lt;sup id="fnref1"&gt;1&lt;/sup&gt;. What was a stream of 2–3 reports per week two years ago has ballooned to &lt;strong&gt;5–10 reports per day&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Linus himself does &lt;strong&gt;not&lt;/strong&gt; dismiss AI in security work; he asks researchers to "&lt;strong&gt;understand the code and contribute a patch&lt;/strong&gt;" rather than just send the alert. That is AI-agent operations in miniature. &lt;strong&gt;The value of an AI is not its output volume but how much it lowers the human's verification, correction, and review load.&lt;/strong&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;A spec-only AI mass-produces plausible-looking output. It reads right, but it has drifted from the SoT, so a human has to check every line before it can be used: the textbook case of &lt;strong&gt;"Slop"&lt;/strong&gt; (low-quality, generic, templated AI output). Only an AI that is also fed the drift reasons becomes the kind that actually lowers human verification load.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;Conclusion — accumulate, don't polish&lt;/h2&gt;

&lt;p&gt;What sharpens an AI agent is no longer the model or the prompt. It is whether you can accumulate &lt;strong&gt;the gap between spec and Source of Truth, and the reasons for the drift&lt;/strong&gt;.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Polish documents&lt;/strong&gt; (for humans / clients)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Accumulate context&lt;/strong&gt; (for AI agents — keep the contradictions and wobbles)&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Spec at the core of the Harness; layer "why it diverged" on the outside&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Many organisations pour energy into "polishing the spec" because of the SDD boom. But the real differentiation lies elsewhere: &lt;strong&gt;not in polishing the spec, but in accumulating the gap with the SoT&lt;/strong&gt;. To stop building AIs that replay "past truth," stop polishing — start accumulating.&lt;/p&gt;




&lt;blockquote&gt;
&lt;p&gt;📌 &lt;strong&gt;Full version with interactive SVGs:&lt;/strong&gt; &lt;a href="https://okikusan-public.pages.dev/context-is-the-gap.en" rel="noopener noreferrer"&gt;https://okikusan-public.pages.dev/context-is-the-gap.en&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;FIG.0 — THE GAP (spec vs SoT drift curve)&lt;/li&gt;
&lt;li&gt;FIG.1 — SPEC-ONLY VS SPEC + GAP (two AIs)&lt;/li&gt;
&lt;li&gt;FIG.2 — FIVE WHYS (the accumulating hub)&lt;/li&gt;
&lt;li&gt;FIG.3 — DOCUMENTS VS CONTEXT (polish vs accumulate)&lt;/li&gt;
&lt;li&gt;FIG.4 — HARNESS LAYERS (spec at the core, drift reasons on the outside)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If this resonates, &lt;strong&gt;a 🦄 / ❤️ / 💬 helps a lot.&lt;/strong&gt; Feedback welcome.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;Related posts on my blog&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://okikusan-public.pages.dev/longtail-tacit-agent.en" rel="noopener noreferrer"&gt;AI agents enter the territory code can't write — long-tail × tacit knowledge × tacit thoughts&lt;/a&gt; — the philosophical premise of this post&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://okikusan-public.pages.dev/hermes-agent-second-brain-engine.en" rel="noopener noreferrer"&gt;Hermes Agent — execution engine for your Second Brain&lt;/a&gt; — a concrete Harness execution base&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://okikusan-public.pages.dev/ai-tasks-not-jobs.en" rel="noopener noreferrer"&gt;"Tasks, not jobs" — reading Microsoft Suleyman's 18-month forecast&lt;/a&gt; — Applied Engineer / FDE&lt;/li&gt;
&lt;/ul&gt;




&lt;ol&gt;

&lt;li id="fn1"&gt;
&lt;p&gt;The Register (2026-05-18): &lt;a href="https://www.theregister.com/security/2026/05/18/linus-torvalds-says-ai-powered-bug-hunters-have-made-linux-security-mailing-list-almost-entirely-unmanageable/" rel="noopener noreferrer"&gt;Linus Torvalds says AI-powered bug hunters have made Linux security mailing list 'almost entirely unmanageable'&lt;/a&gt; / Tom's Hardware (2026-05-18): &lt;a href="https://www.tomshardware.com/software/linux/linus-torvalds-says-ai-bug-reports-have-made-the-linux-security-mailing-list-almost-entirely-unmanageable" rel="noopener noreferrer"&gt;Linus Torvalds says flood of duplicate AI-generated vulnerability reports...&lt;/a&gt; ↩&lt;/p&gt;
&lt;/li&gt;

&lt;/ol&gt;

</description>
      <category>ai</category>
      <category>agents</category>
      <category>llm</category>
      <category>productivity</category>
    </item>
  </channel>
</rss>
