<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Daniel Nwaneri</title>
    <description>The latest articles on DEV Community by Daniel Nwaneri (@dannwaneri).</description>
    <link>https://dev.to/dannwaneri</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3606168%2Fa6853c87-4daa-4bea-87e8-d1bd0d8a59d7.jpg</url>
      <title>DEV Community: Daniel Nwaneri</title>
      <link>https://dev.to/dannwaneri</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/dannwaneri"/>
    <language>en</language>
    <item>
      <title>The Gap Andrej Karpathy Didn't Fill</title>
      <dc:creator>Daniel Nwaneri</dc:creator>
      <pubDate>Tue, 21 Apr 2026 14:10:07 +0000</pubDate>
      <link>https://dev.to/dannwaneri/the-gap-andrej-karpathy-didnt-fill-ohc</link>
      <guid>https://dev.to/dannwaneri/the-gap-andrej-karpathy-didnt-fill-ohc</guid>
      <description>&lt;p&gt;I was debugging a customer support issue and typed the question into Claude.&lt;/p&gt;

&lt;p&gt;Perfect answer. Wrong company. The model told me what the &lt;em&gt;general&lt;/em&gt; best practice was for the problem category — accurate, well-reasoned, completely useless for my situation. Because the actual answer was in our internal runbook. Three paragraphs written eight months ago by a contractor who's since left. Not in any training set. Not in any index. Just sitting in a Notion page nobody searched anymore.&lt;/p&gt;

&lt;p&gt;That's the gap Karpathy didn't fill.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Wikipedia Argument
&lt;/h2&gt;

&lt;p&gt;Karpathy made a compelling point: LLMs are becoming the new Wikipedia. For general knowledge questions, just ask the model. Why build a retrieval pipeline to answer "what is gradient descent" when GPT-4 already knows it better than your docs ever will?&lt;/p&gt;

&lt;p&gt;He's right, for that use case.&lt;/p&gt;

&lt;p&gt;But most knowledge work isn't general knowledge questions. It's "what did we decide about the pricing model in Q3" and "which customer reported this exact error last month" and "what's the current state of the integration with Salesforce." None of that is in any model. None of it ever will be.&lt;/p&gt;

&lt;p&gt;The moment your question is about &lt;em&gt;your&lt;/em&gt; specific context — your decisions, your customers, your code, your history — you need retrieval. The LLM is still doing the reasoning. But it needs your data to reason over.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Cheap/Expensive Pipeline Problem
&lt;/h2&gt;

&lt;p&gt;The frustrating thing about most RAG tutorials is they hand you the expensive version by default.&lt;/p&gt;

&lt;p&gt;Every query embeds. Every query hits the vector index. Every query reruns the full pipeline, even for questions it's already answered. Cache? Optional. Keyword search for exact matches? Not shown. A $0.003-per-call model handling routing decisions? Never comes up.&lt;/p&gt;

&lt;p&gt;The result: hobby projects that work fine, production deployments that surprise you at month end.&lt;/p&gt;

&lt;p&gt;The routing insight that changed how I think about this: not every query needs vector search. "Show me documents tagged 'finance' from last week" is a SQL query. "Find anything mentioning GPT-4o" is BM25. "What do we know about our churn patterns" is the full vector pipeline. Running everything through embeddings when the first two cases cover maybe 40% of real-world queries is just waste.&lt;/p&gt;

&lt;p&gt;V4 of this project classifies query intent first — SQL / BM25 / VECTOR / GRAPH / VISION / OCR — then routes accordingly. In testing, that cuts average embedding cost by 71% compared to running everything through vectors. The expensive path runs when it actually needs to.&lt;/p&gt;
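
&lt;p&gt;As a sketch of that routing step (the heuristics and names below are illustrative, not the project's actual code, which uses a small LLM for the decision), the cheap first pass is a classifier that only falls through to the vector path when nothing structured matches:&lt;/p&gt;

```python
import re

def route_query(query: str) -> str:
    """Cheap first-pass router: structured filters go to SQL,
    exact-term lookups go to BM25, everything else pays for the
    embedding path. Illustrative heuristics only."""
    q = query.lower()
    # Metadata-style filters: tags, date ranges, explicit fields
    if re.search(r"\btagged\b|\bbefore\b|\bafter\b|\blast week\b", q):
        return "SQL"
    # Exact-match lookups: quoted strings or literal-term mentions
    if '"' in query or "mentioning" in q:
        return "BM25"
    # Open-ended semantic questions fall through to the vector pipeline
    return "VECTOR"
```

&lt;p&gt;The three example queries above route to SQL, BM25, and VECTOR respectively; only the last one pays for an embedding.&lt;/p&gt;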




&lt;h2&gt;
  
  
  What's Actually Running
&lt;/h2&gt;

&lt;p&gt;For anyone who wants the specific stack:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Layer&lt;/th&gt;
&lt;th&gt;What's used&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Runtime&lt;/td&gt;
&lt;td&gt;Cloudflare Workers (TypeScript)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Vector store&lt;/td&gt;
&lt;td&gt;Cloudflare Vectorize&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Keyword + metadata store&lt;/td&gt;
&lt;td&gt;D1 (SQLite at the edge)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Embedding (default)&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;@cf/qwen/qwen3-embedding-0.6b&lt;/code&gt; — 1024d&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Reranker&lt;/td&gt;
&lt;td&gt;&lt;code&gt;@cf/baai/bge-reranker-base&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Vision / OCR&lt;/td&gt;
&lt;td&gt;&lt;code&gt;@cf/meta/llama-4-scout-17b-16e-instruct&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Knowledge synthesis&lt;/td&gt;
&lt;td&gt;&lt;code&gt;@cf/moonshot/kimi-k2.5&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Query routing&lt;/td&gt;
&lt;td&gt;&lt;code&gt;@cf/meta/llama-3.2-3b-instruct&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;MCP transport&lt;/td&gt;
&lt;td&gt;Streamable HTTP via Cloudflare Durable Objects&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Everything on Cloudflare. No Pinecone account. No OpenAI bill for embeddings. No Redis for caching. The cache is the CF Cache API, which is global and free at reasonable volumes.&lt;/p&gt;

&lt;p&gt;Rough cost at 1,000 queries/day with intelligent routing: ~$0.11/month. At 10,000/day: $1–5/month. The Workers free tier absorbs the first 100,000 requests/day with no compute charge.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Part That Surprised Me
&lt;/h2&gt;

&lt;p&gt;The reflection layer was the thing I didn't expect to matter as much as it does.&lt;/p&gt;

&lt;p&gt;Standard RAG retrieves documents. It doesn't learn. Every query goes in cold — no memory of patterns across documents, no synthesis of what the knowledge base collectively knows, no growing sense of what questions remain unanswered.&lt;/p&gt;

&lt;p&gt;After every ingest, this system does three things: it finds the most semantically related documents already in the index, asks an LLM to synthesize what's new, how it connects to what exists, and what gap remains, and then embeds and stores that synthesis as &lt;code&gt;doc_type=reflection&lt;/code&gt;. Each reflection is one structured paragraph and gets a 1.5× ranking boost in search results.&lt;/p&gt;

&lt;p&gt;After every 3 ingests, it consolidates the recent reflections into a &lt;code&gt;doc_type=summary&lt;/code&gt; — a compressed view of what the knowledge base has learned across that batch.&lt;/p&gt;
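
&lt;p&gt;The whole loop fits in a few lines of pseudocode. Function and method names here are illustrative, not the project's API:&lt;/p&gt;

```python
def after_ingest(doc, index, llm, ingest_count):
    """Sketch of the reflection layer (names are illustrative):
    find neighbours, synthesize one paragraph, store it as a
    boosted 'reflection' doc; every third ingest, consolidate."""
    neighbours = index.search(doc.embedding, top_k=5)
    reflection = llm.complete(
        "What is new in this document, how does it connect to "
        "these related documents, and what gap remains?",
        context=[doc] + neighbours,
    )
    index.upsert(reflection, doc_type="reflection", rank_boost=1.5)
    if ingest_count % 3 == 0:
        # Consolidate the recent batch into a compressed summary
        recent = index.recent(doc_type="reflection", limit=3)
        summary = llm.complete("Consolidate these reflections.", context=recent)
        index.upsert(summary, doc_type="summary")
```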

&lt;p&gt;The effect: your knowledge base builds a second layer of meaning on top of raw documents. Related concepts get explicitly linked. Contradictions get surfaced. A search that previously returned five independent chunks might now also return a reflection that already synthesized those chunks into one coherent insight.&lt;/p&gt;

&lt;p&gt;It's not magic. It's just LLM synthesis run as a background job, stored where retrieval can find it. But the result feels different from a flat document store in a way that's hard to describe until you see it return the reflection instead of the source.&lt;/p&gt;

&lt;p&gt;Kimi K2.5 handles the reflection and consolidation by default. It has noticeably better multi-document reasoning than smaller models, which is worth the slightly higher cost for synthesis quality. Set &lt;code&gt;REFLECTION_MODEL=llama-3.2-3b&lt;/code&gt; in your env if you want to reduce cost at high volume.&lt;/p&gt;




&lt;h2&gt;
  
  
  Running It
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git clone https://github.com/dannwaneri/vectorize-mcp-worker.git
&lt;span class="nb"&gt;cd &lt;/span&gt;vectorize-mcp-worker
npm &lt;span class="nb"&gt;install&lt;/span&gt;

&lt;span class="c"&gt;# Create a 1024d Vectorize index (qwen3-0.6b default)&lt;/span&gt;
wrangler vectorize create mcp-knowledge-base &lt;span class="nt"&gt;--dimensions&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;1024 &lt;span class="nt"&gt;--metric&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;cosine

&lt;span class="c"&gt;# Create D1 database&lt;/span&gt;
wrangler d1 create mcp-knowledge-db

&lt;span class="c"&gt;# Configure, apply schema, deploy&lt;/span&gt;
&lt;span class="nb"&gt;cp &lt;/span&gt;wrangler.toml.example wrangler.toml
&lt;span class="c"&gt;# paste your database_id into wrangler.toml&lt;/span&gt;
wrangler d1 execute mcp-knowledge-db &lt;span class="nt"&gt;--remote&lt;/span&gt; &lt;span class="nt"&gt;--file&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;./schema.sql
wrangler deploy

&lt;span class="c"&gt;# Set API key&lt;/span&gt;
wrangler secret put API_KEY
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Dashboard at &lt;code&gt;https://your-worker.workers.dev/dashboard&lt;/code&gt;. OpenAPI spec at &lt;code&gt;/openapi.json&lt;/code&gt; for Postman / Bruno / Insomnia.&lt;/p&gt;

&lt;p&gt;Claude Desktop config:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"mcpServers"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"vectorize"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"command"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"npx"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"args"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="s2"&gt;"mcp-remote"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="s2"&gt;"https://your-worker.workers.dev/mcp"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="s2"&gt;"--header"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="s2"&gt;"Authorization: Bearer YOUR_API_KEY"&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Restart Claude Desktop. Six tools appear: &lt;code&gt;search&lt;/code&gt;, &lt;code&gt;ingest&lt;/code&gt;, &lt;code&gt;ingest_image_url&lt;/code&gt;, &lt;code&gt;find_similar_by_url&lt;/code&gt;, &lt;code&gt;delete&lt;/code&gt;, &lt;code&gt;stats&lt;/code&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  What This Is For
&lt;/h2&gt;

&lt;p&gt;This isn't trying to be better than LLMs at general knowledge. It's making LLMs useful on &lt;em&gt;your&lt;/em&gt; data — the knowledge that doesn't exist anywhere else, the decisions your team made that aren't in any training set, the documents that answer the questions your customers actually ask.&lt;/p&gt;

&lt;p&gt;Most RAG tutorials hand you a demo. This is what happens when you build the thing properly: hybrid search, cross-encoder reranking, intelligent routing, knowledge synthesis, multi-tenancy, rate limiting, caching, a native MCP server, batch ingestion, and a test suite. In one deployable Worker. At boring prices.&lt;/p&gt;

&lt;p&gt;The model knows everything except what you know. That asymmetry is the whole problem. Filling it doesn't have to be expensive.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Full source: &lt;a href="https://github.com/dannwaneri/vectorize-mcp-worker" rel="noopener noreferrer"&gt;github.com/dannwaneri/vectorize-mcp-worker&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>opensource</category>
      <category>rag</category>
      <category>webdev</category>
    </item>
    <item>
      <title>AI is quietly making human experts invisible. I built a tool to stop it.</title>
      <dc:creator>Daniel Nwaneri</dc:creator>
      <pubDate>Tue, 14 Apr 2026 17:01:10 +0000</pubDate>
      <link>https://dev.to/dannwaneri/ai-is-quietly-making-human-experts-invisible-i-built-a-tool-to-stop-it-3g2m</link>
      <guid>https://dev.to/dannwaneri/ai-is-quietly-making-human-experts-invisible-i-built-a-tool-to-stop-it-3g2m</guid>
      <description>&lt;p&gt;Every time you ask an AI to write code, something disappears.&lt;/p&gt;

&lt;p&gt;Not the code — the code shows up fine. What disappears is the trail. The GitHub discussion where someone spent two hours explaining &lt;em&gt;why&lt;/em&gt; cursor-based pagination beats offset for live-updating datasets. The Stack Overflow answer from 2019 where one person, after a week of debugging, documented exactly why that approach fails under concurrent writes. The RFC your team wrote six months ago that established the pattern the AI just silently copied.&lt;/p&gt;

&lt;p&gt;The AI consumed all of it. The humans who produced it got nothing.&lt;/p&gt;

&lt;p&gt;And I don't mean "nothing" philosophically. I mean: no citation in the codebase. No way for a new developer to trace why the code is written the way it is. No signal to the person who wrote the original answer that their work mattered.&lt;/p&gt;

&lt;p&gt;Over time, at scale, those people stop contributing. Why maintain a detailed GitHub discussion if AI will summarize it into oblivion and no one will read the original?&lt;/p&gt;

&lt;p&gt;This is the quiet cost of AI-assisted development that nobody is measuring. I've been thinking about it for a while, and I built something to address it.&lt;/p&gt;




&lt;h2&gt;
  
  
  The scenario
&lt;/h2&gt;

&lt;p&gt;A developer joins a team. Six months of AI-assisted codebase. They hit a bug in the pagination logic — cursor-based, unusual implementation, nobody on the team remembers why it was built that way. The original developer who designed it has left.&lt;/p&gt;

&lt;p&gt;Old answer: two days of archaeology. git blame points to a commit message that says "fix pagination." The commit before that says "implement pagination." Dead end.&lt;/p&gt;

&lt;p&gt;With &lt;code&gt;poc.py trace src/utils/paginator.py&lt;/code&gt;, that same developer sees this in thirty seconds:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Provenance trace: src/utils/paginator.py
────────────────────────────────────────────────────────────
  [HIGH]  @tannerlinsley on github
          Cursor pagination discussion
          https://github.com/TanStack/query/discussions/123
          Insight: cursor beats offset for live-updating datasets

Knowledge gaps (AI-synthesized, no human source):
  • Error retry strategy — no human source cited
  • Concurrent write handling — AI chose this arbitrarily
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;They now know exactly where the pattern came from and — critically — which parts of the code have no traceable human source. That second section is what saves them. The concurrent write handling is where the bug lives. AI made a choice nobody reviewed.&lt;/p&gt;

&lt;p&gt;That's what this tool does. Not enforcement first. Archaeology first.&lt;/p&gt;




&lt;h2&gt;
  
  
  What I built
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;&lt;a href="https://github.com/dannwaneri/proof-of-contribution" rel="noopener noreferrer"&gt;proof-of-contribution&lt;/a&gt;&lt;/strong&gt; is a Claude Code skill that keeps the human knowledge chain intact inside AI-assisted codebases.&lt;/p&gt;

&lt;p&gt;The core idea is simple: every AI-generated artifact should stay tethered to the human knowledge that inspired it. Not as a comment at the top of a file that nobody reads. As a structured, queryable, enforceable record that lives next to the code.&lt;/p&gt;

&lt;p&gt;When the skill is active, Claude automatically appends a &lt;strong&gt;Provenance Block&lt;/strong&gt; to every generated output:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;## PROOF OF CONTRIBUTION
Generated artifact: fetch_github_discussions()
Confidence: MEDIUM

## HUMAN SOURCES THAT INSPIRED THIS

[1] GitHub GraphQL API Documentation Team
    Source type: Official Docs
    URL: docs.github.com/en/graphql
    Contribution: cursor-based pagination pattern

[2] GitHub Community (multiple contributors)
    Source type: GitHub Discussions
    URL: github.com/community/community
    Contribution: "ghost" fallback for deleted accounts
                  surfaced in bug reports

## KNOWLEDGE GAPS (AI synthesized, no human cited)
- Error handling / retry logic
- Rate limit strategy

## RECOMMENDED HUMAN EXPERTS TO CONSULT
- github.com/octokit community for pagination
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The section that matters most is &lt;strong&gt;Knowledge Gaps&lt;/strong&gt;. That's where AI admits what it synthesized without a traceable human source. No other tool I know of produces this. It's the part that turns "the AI wrote it" from a shrug into an auditable fact.&lt;/p&gt;




&lt;h2&gt;
  
  
  How Knowledge Gaps actually get detected
&lt;/h2&gt;

&lt;p&gt;This is the part worth explaining carefully, because the obvious assumption — that the AI just introspects and reports what it doesn't know — is wrong. LLMs hallucinate confidently. An AI that could reliably detect its own knowledge gaps wouldn't produce knowledge gaps in the first place.&lt;/p&gt;

&lt;p&gt;The detection mechanism is different. It's a comparison, not introspection.&lt;/p&gt;

&lt;p&gt;When you use &lt;a href="https://github.com/dannwaneri/spec-writer" rel="noopener noreferrer"&gt;spec-writer&lt;/a&gt; before building, it generates a structured spec with an explicit assumptions list — every decision the AI is making that you didn't specify, each one impact-rated. That list is the contract: here is every claim this feature rests on.&lt;/p&gt;

&lt;p&gt;When the code ships, proof-of-contribution cross-checks the final implementation against that contract. Anything the code does that doesn't map to a spec assumption or a cited human source gets flagged as a Knowledge Gap. The AI isn't grading its own exam. The spec is the answer key.&lt;/p&gt;

&lt;p&gt;The result is deterministic. If the retry logic wasn't specified and no human source covers it, the gap appears in the block regardless of how confident the model was when it wrote the code. The boundary holds because it comes from the spec, not from the model's confidence.&lt;/p&gt;
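
&lt;p&gt;Modeled minimally (this is a deliberate simplification of the real tool, with hypothetical names), the cross-check is a set difference:&lt;/p&gt;

```python
def knowledge_gaps(implemented_units, spec_assumptions, cited_sources):
    """Anything the code does that maps neither to a spec assumption
    nor to a cited human source is a gap. The model's confidence at
    generation time never enters the check."""
    covered = set(spec_assumptions) | set(cited_sources)
    return sorted(u for u in implemented_units if u not in covered)
```

&lt;p&gt;If the spec covered cursor pagination and a human source covered CSV ordering, only the retry logic surfaces as a gap, no matter how confidently the model wrote it.&lt;/p&gt;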

&lt;p&gt;This is also why the confidence levels mean something. HIGH means the spec explicitly covered it or the user provided the source directly. MEDIUM means the pattern traces to recognized human-authored work but the exact source isn't pinned. LOW means the model synthesized it — human review strongly recommended before this code goes anywhere near production.&lt;/p&gt;

&lt;p&gt;There's a second detection path that doesn't require spec-writer at all. &lt;code&gt;poc.py verify&lt;/code&gt; runs Python's built-in &lt;code&gt;ast&lt;/code&gt; module against the file and extracts every function definition, conditional branch, and return path. It cross-checks each one against the seeded claims. No API calls. No model confidence. Pure static analysis. When you run it on a file where &lt;code&gt;import-spec&lt;/code&gt; was used first, only the assumptions with no resolved citation surface as gaps. When you run it cold, every uncited structural unit surfaces as a baseline. Either way, the AI's confidence at generation time is irrelevant — the boundary comes from the code's actual structure.&lt;/p&gt;
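
&lt;p&gt;A minimal version of that structural extraction, assuming nothing beyond the standard library (a sketch of the idea, not &lt;code&gt;poc.py&lt;/code&gt;'s actual source):&lt;/p&gt;

```python
import ast

def structural_units(source: str):
    """Extract the units the verify step reasons about: every
    function definition, conditional branch, and return path.
    Pure static analysis, no API calls, no model involved."""
    tree = ast.parse(source)
    units = []
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            units.append(("function", node.name, node.lineno))
        elif isinstance(node, ast.If):
            units.append(("branch", "if", node.lineno))
        elif isinstance(node, ast.Return):
            units.append(("return", "", node.lineno))
    return units
```

&lt;p&gt;On a four-line function with one branch, this finds one function, one branch, and two return paths. The real tool then subtracts everything a seeded claim covers.&lt;/p&gt;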




&lt;h2&gt;
  
  
  Three things the skill does
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Provenance Blocks&lt;/strong&gt; — attached automatically to any generated code, doc, or architecture output. You don't have to ask. It's always there.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Knowledge Graph schema&lt;/strong&gt; — when you're building a system to track contributions at scale. Claude generates a complete graph schema for Neo4j, Postgres, or JSON-LD. Nodes for code artifacts, human sources, individual experts, AI sessions, and knowledge claims. Edges that let you ask: "who are the humans behind this module?" or "what did @username contribute to this codebase?"&lt;/p&gt;
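
&lt;p&gt;A relational rendering of that graph might look like the sketch below. Table and column names are illustrative, not the skill's generated schema; the point is that "who are the humans behind this module?" becomes a plain query:&lt;/p&gt;

```python
import sqlite3

# Illustrative schema: artifacts, human sources, and the edges
# between them, with the block's HIGH/MEDIUM/LOW confidence levels.
SCHEMA = """
CREATE TABLE artifacts (id INTEGER PRIMARY KEY, path TEXT);
CREATE TABLE sources   (id INTEGER PRIMARY KEY, author TEXT, url TEXT);
CREATE TABLE inspired_by (
    artifact_id INTEGER REFERENCES artifacts(id),
    source_id   INTEGER REFERENCES sources(id),
    confidence  TEXT CHECK (confidence IN ('HIGH', 'MEDIUM', 'LOW'))
);
"""

def experts_behind(conn, path):
    """The 'who are the humans behind this module?' query."""
    return [row[0] for row in conn.execute(
        "SELECT s.author FROM sources s "
        "JOIN inspired_by e ON e.source_id = s.id "
        "JOIN artifacts a ON a.id = e.artifact_id "
        "WHERE a.path = ?", (path,))]
```

&lt;p&gt;The same shape translates to Neo4j nodes and relationships or JSON-LD; the relational version is just the easiest to sketch.&lt;/p&gt;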

&lt;p&gt;&lt;strong&gt;Static analyzer (&lt;code&gt;poc.py verify&lt;/code&gt;)&lt;/strong&gt; — runs after the agent builds. Parses the file's structure using Python's AST, cross-checks every function and branch against seeded claims, and reports deterministic Knowledge Gaps. Zero API calls. Exit code 0 means clean, 1 means gaps found — CI-compatible.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;HITL Indexing architecture&lt;/strong&gt; — when you want AI to surface human experts instead of summarizing them. The query interface returns Expert Cards:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Answer: Use cursor-based pagination with GraphQL endCursor.

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
HUMAN EXPERTS ON THIS TOPIC
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
👤 @tannerlinsley  (GitHub)
   Expertise signal: 23 contributions on pagination patterns
   Key contribution: github.com/TanStack/query/discussions/123
   Quote: "Cursor beats offset when rows can be inserted mid-page"
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Not a summary. A pointer. The human expert stays visible.&lt;/p&gt;




&lt;h2&gt;
  
  
  Getting started takes one command
&lt;/h2&gt;

&lt;p&gt;I didn't want this to be another tool that requires you to choose a database before you can do anything. The default is SQLite. It works immediately.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Install the skill&lt;/span&gt;
&lt;span class="nb"&gt;mkdir&lt;/span&gt; &lt;span class="nt"&gt;-p&lt;/span&gt; ~/.claude/skills
git clone https://github.com/dannwaneri/proof-of-contribution.git ~/.claude/skills/proof-of-contribution

&lt;span class="c"&gt;# Scaffold your project (run once, in your repo root)&lt;/span&gt;
python ~/.claude/skills/proof-of-contribution/assets/scripts/poc_init.py
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That creates four things:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;.poc/provenance.db&lt;/code&gt; — SQLite database, local only, gitignored&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;.poc/config.json&lt;/code&gt; — project config, committed&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;.github/PULL_REQUEST_TEMPLATE.md&lt;/code&gt; — PR template with an AI Provenance section&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;.github/workflows/poc-check.yml&lt;/code&gt; — GitHub Action that fails PRs missing attribution&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Then you get a local CLI:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;python poc.py add src/utils/parser.py    &lt;span class="c"&gt;# record attribution interactively&lt;/span&gt;
python poc.py trace src/utils/parser.py  &lt;span class="c"&gt;# show full human attribution chain&lt;/span&gt;
python poc.py report                     &lt;span class="c"&gt;# repo-wide provenance health&lt;/span&gt;
python poc.py experts                    &lt;span class="c"&gt;# top cited humans in your graph&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;poc.py verify&lt;/code&gt; is what catches gaps before they become incidents:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight console"&gt;&lt;code&gt;&lt;span class="go"&gt;python poc.py verify src/utils/csv_exporter.py
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight console"&gt;&lt;code&gt;&lt;span class="go"&gt;Verify: src/utils/csv_exporter.py
────────────────────────────────────────────────────────────
  Structural units detected : 11
  Seeded claims             : 3
  Covered by cited source   : 2
  Deterministic gaps        : 1

Deterministic Knowledge Gaps (no human source):
  • function: handle_concurrent_writes (lines 47–61)
      Seeded assumption: concurrent write handling — AI chose this arbitrarily

  Resolve: python poc.py add src/utils/csv_exporter.py
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;poc.py trace&lt;/code&gt; is what I use the most for the full attribution picture. This is what it looks like on a real file:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Provenance trace: src/utils/csv_exporter.py
────────────────────────────────────────────────────────────
  [HIGH]  @juliandeangelis on github
          Spec Driven Development at MercadoLibre
          https://github.com/mercadolibre/sdd-docs
          Insight: separate functional from technical spec

  [MEDIUM] @tannerlinsley on github
           Cursor pagination discussion
           https://github.com/TanStack/query/discussions/123
           Insight: cursor beats offset for live-updating datasets

Knowledge gaps (AI-synthesized, no human source):
  • Error retry strategy — no human source cited
  • CSV column ordering — AI chose this arbitrarily
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  The GitHub Action is for teams that already find the trace valuable
&lt;/h2&gt;

&lt;p&gt;Once you've used &lt;code&gt;poc.py trace&lt;/code&gt; enough times that it's saved you real hours — that's when you push the GitHub Action. Not before.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git add .github/ .poc/config.json poc.py
git commit &lt;span class="nt"&gt;-m&lt;/span&gt; &lt;span class="s2"&gt;"chore: add proof-of-contribution"&lt;/span&gt;
git push
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;After that, every PR gets checked. If a developer submits AI-assisted code without an &lt;code&gt;## 🤖 AI Provenance&lt;/code&gt; section in the PR description, the action fails and posts a comment explaining what's needed.&lt;/p&gt;

&lt;p&gt;The opt-out is simple: write &lt;code&gt;100% human-written&lt;/code&gt; anywhere in the PR body and the check skips.&lt;/p&gt;

&lt;p&gt;The enforcement works because the tool already saved them hours before they turned it on. The PR check isn't introducing friction — it's standardizing something people already want to do. That's the only version of a mandate that doesn't get gamed.&lt;/p&gt;
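
&lt;p&gt;The gate itself is small. A sketch of its core pass/fail logic (not the actual workflow source):&lt;/p&gt;

```python
def pr_check(pr_body: str) -> int:
    """Sketch of the PR gate: exit 0 if the body carries an
    AI Provenance section or the explicit human-written opt-out,
    else exit 1 (which fails the check and triggers the comment)."""
    if "100% human-written" in pr_body:
        return 0  # opt-out: the check skips entirely
    if "AI Provenance" in pr_body:
        return 0  # attribution section present
    return 1      # fail the PR and post an explanatory comment
```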




&lt;h2&gt;
  
  
  It works with spec-writer
&lt;/h2&gt;

&lt;p&gt;I built &lt;a href="https://github.com/dannwaneri/spec-writer" rel="noopener noreferrer"&gt;spec-writer&lt;/a&gt; first. It turns vague feature requests into structured specs, technical plans, and task breakdowns before the agent starts building. The problem spec-writer solves is ambiguity &lt;em&gt;before&lt;/em&gt; the code exists.&lt;/p&gt;

&lt;p&gt;proof-of-contribution solves attribution &lt;em&gt;after&lt;/em&gt; the code exists.&lt;/p&gt;

&lt;p&gt;They connect at the assumption layer. spec-writer generates an assumptions list — every implicit decision the AI made that you didn't specify, impact-rated, with guidance on when to correct it. Each correction can now carry a citation. Each citation becomes a node in the knowledge graph. By the time a developer runs &lt;code&gt;poc.py trace&lt;/code&gt; on a finished module, the full chain is visible:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;feature request → spec decision → human source → code artifact
                                       ↑
                              poc.py verify closes this loop
                              without asking the AI what it missed
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That chain is what I mean when I say AI should be a pointer to human expertise. Not a replacement. A pointer.&lt;/p&gt;




&lt;h2&gt;
  
  
  Why 2026 is the right time to build this
&lt;/h2&gt;

&lt;p&gt;The tools are mature. Coding agents are shipping code at scale. The question of "who is responsible for this output?" is becoming real — in teams, in code reviews, in enterprise audits.&lt;/p&gt;

&lt;p&gt;The provenance infrastructure doesn't exist yet. git blame tells you who committed. It doesn't tell you what human knowledge shaped the decision. That gap is getting wider every month.&lt;/p&gt;

&lt;p&gt;proof-of-contribution is one piece of the infrastructure. It's not the whole answer. But it's the piece I could build, and it's the piece I think matters most: keeping the humans whose knowledge powers AI visible in the artifacts AI produces.&lt;/p&gt;




&lt;h2&gt;
  
  
  Install
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;mkdir&lt;/span&gt; &lt;span class="nt"&gt;-p&lt;/span&gt; ~/.claude/skills
git clone https://github.com/dannwaneri/proof-of-contribution.git ~/.claude/skills/proof-of-contribution
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Works with Claude Code, Cursor, Gemini CLI, and any agent that supports the Agent Skills standard.&lt;/p&gt;

&lt;p&gt;Repo: &lt;a href="https://github.com/dannwaneri/proof-of-contribution" rel="noopener noreferrer"&gt;github.com/dannwaneri/proof-of-contribution&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>tooling</category>
      <category>opensource</category>
      <category>claudecode</category>
    </item>
    <item>
      <title>Drilling boreholes taught me something AI keeps reminding me of. The formula can be perfect. The model can be correct. Yet the result is still dry. Because the assumption about the ground (or the environment) was wrong.</title>
      <dc:creator>Daniel Nwaneri</dc:creator>
      <pubDate>Fri, 10 Apr 2026 15:43:30 +0000</pubDate>
      <link>https://dev.to/dannwaneri/drilling-boreholes-taught-me-something-ai-keeps-reminding-me-ofthe-formula-can-be-perfect-the-2nof</link>
      <guid>https://dev.to/dannwaneri/drilling-boreholes-taught-me-something-ai-keeps-reminding-me-ofthe-formula-can-be-perfect-the-2nof</guid>
      <description>&lt;div class="ltag__link--embedded"&gt;
  &lt;div class="crayons-story "&gt;
  &lt;a href="https://dev.to/dannwaneri/the-formula-was-exact-the-assumption-was-wrong-thats-not-an-ai-problem-58dm" class="crayons-story__hidden-navigation-link"&gt;The Formula Was Exact. The Assumption Was Wrong. That's Not an AI Problem.&lt;/a&gt;


  &lt;div class="crayons-story__body crayons-story__body-full_post"&gt;
      &lt;a href="https://dev.to/dannwaneri/the-formula-was-exact-the-assumption-was-wrong-thats-not-an-ai-problem-58dm" class="crayons-article__context-note crayons-article__context-note__feed"&gt;&lt;p&gt;Geology lessons on the limits of modeling&lt;/p&gt;

&lt;/a&gt;
    &lt;div class="crayons-story__top"&gt;
      &lt;div class="crayons-story__meta"&gt;
        &lt;div class="crayons-story__author-pic"&gt;

          &lt;a href="/dannwaneri" class="crayons-avatar  crayons-avatar--l  "&gt;
            &lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3606168%2Fa6853c87-4daa-4bea-87e8-d1bd0d8a59d7.jpg" alt="dannwaneri profile" class="crayons-avatar__image"&gt;
          &lt;/a&gt;
        &lt;/div&gt;
        &lt;div&gt;
          &lt;div&gt;
            &lt;a href="/dannwaneri" class="crayons-story__secondary fw-medium m:hidden"&gt;
              Daniel Nwaneri
            &lt;/a&gt;
            &lt;div class="profile-preview-card relative mb-4 s:mb-0 fw-medium hidden m:inline-block"&gt;
              
                Daniel Nwaneri
                
              
              &lt;div id="story-author-preview-content-3477706" class="profile-preview-card__content crayons-dropdown branded-7 p-4 pt-0"&gt;
                &lt;div class="gap-4 grid"&gt;
                  &lt;div class="-mt-4"&gt;
                    &lt;a href="/dannwaneri" class="flex"&gt;
                      &lt;span class="crayons-avatar crayons-avatar--xl mr-2 shrink-0"&gt;
                        &lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3606168%2Fa6853c87-4daa-4bea-87e8-d1bd0d8a59d7.jpg" class="crayons-avatar__image" alt=""&gt;
                      &lt;/span&gt;
                      &lt;span class="crayons-link crayons-subtitle-2 mt-5"&gt;Daniel Nwaneri&lt;/span&gt;
                    &lt;/a&gt;
                  &lt;/div&gt;
                  &lt;div class="print-hidden"&gt;
                    
                      Follow
                    
                  &lt;/div&gt;
                  &lt;div class="author-preview-metadata-container"&gt;&lt;/div&gt;
                &lt;/div&gt;
              &lt;/div&gt;
            &lt;/div&gt;

          &lt;/div&gt;
          &lt;a href="https://dev.to/dannwaneri/the-formula-was-exact-the-assumption-was-wrong-thats-not-an-ai-problem-58dm" class="crayons-story__tertiary fs-xs"&gt;&lt;time&gt;Apr 10&lt;/time&gt;&lt;span class="time-ago-indicator-initial-placeholder"&gt;&lt;/span&gt;&lt;/a&gt;
        &lt;/div&gt;
      &lt;/div&gt;

    &lt;/div&gt;

    &lt;div class="crayons-story__indention"&gt;
      &lt;h2 class="crayons-story__title crayons-story__title-full_post"&gt;
        &lt;a href="https://dev.to/dannwaneri/the-formula-was-exact-the-assumption-was-wrong-thats-not-an-ai-problem-58dm" id="article-link-3477706"&gt;
          The Formula Was Exact. The Assumption Was Wrong. That's Not an AI Problem.
        &lt;/a&gt;
      &lt;/h2&gt;
        &lt;div class="crayons-story__tags"&gt;
            &lt;a class="crayons-tag  crayons-tag--monochrome " href="/t/ai"&gt;&lt;span class="crayons-tag__prefix"&gt;#&lt;/span&gt;ai&lt;/a&gt;
            &lt;a class="crayons-tag  crayons-tag--monochrome " href="/t/webdev"&gt;&lt;span class="crayons-tag__prefix"&gt;#&lt;/span&gt;webdev&lt;/a&gt;
            &lt;a class="crayons-tag  crayons-tag--monochrome " href="/t/career"&gt;&lt;span class="crayons-tag__prefix"&gt;#&lt;/span&gt;career&lt;/a&gt;
            &lt;a class="crayons-tag  crayons-tag--monochrome " href="/t/python"&gt;&lt;span class="crayons-tag__prefix"&gt;#&lt;/span&gt;python&lt;/a&gt;
        &lt;/div&gt;
      &lt;div class="crayons-story__bottom"&gt;
        &lt;div class="crayons-story__details"&gt;
          &lt;a href="https://dev.to/dannwaneri/the-formula-was-exact-the-assumption-was-wrong-thats-not-an-ai-problem-58dm" class="crayons-btn crayons-btn--s crayons-btn--ghost crayons-btn--icon-left"&gt;
            &lt;div class="multiple_reactions_aggregate"&gt;
              &lt;span class="multiple_reactions_icons_container"&gt;
                  &lt;span class="crayons_icon_container"&gt;
                    &lt;img src="https://assets.dev.to/assets/raised-hands-74b2099fd66a39f2d7eed9305ee0f4553df0eb7b4f11b01b6b1b499973048fe5.svg" width="18" height="18"&gt;
                  &lt;/span&gt;
                  &lt;span class="crayons_icon_container"&gt;
                    &lt;img src="https://assets.dev.to/assets/multi-unicorn-b44d6f8c23cdd00964192bedc38af3e82463978aa611b4365bd33a0f1f4f3e97.svg" width="18" height="18"&gt;
                  &lt;/span&gt;
                  &lt;span class="crayons_icon_container"&gt;
                    &lt;img src="https://assets.dev.to/assets/sparkle-heart-5f9bee3767e18deb1bb725290cb151c25234768a0e9a2bd39370c382d02920cf.svg" width="18" height="18"&gt;
                  &lt;/span&gt;
              &lt;/span&gt;
              &lt;span class="aggregate_reactions_counter"&gt;23&lt;span class="hidden s:inline"&gt; reactions&lt;/span&gt;&lt;/span&gt;
            &lt;/div&gt;
          &lt;/a&gt;
            &lt;a href="https://dev.to/dannwaneri/the-formula-was-exact-the-assumption-was-wrong-thats-not-an-ai-problem-58dm#comments" class="crayons-btn crayons-btn--s crayons-btn--ghost crayons-btn--icon-left flex items-center"&gt;
              Comments


              9&lt;span class="hidden s:inline"&gt; comments&lt;/span&gt;
            &lt;/a&gt;
        &lt;/div&gt;
        &lt;div class="crayons-story__save"&gt;
          &lt;small class="crayons-story__tertiary fs-xs mr-2"&gt;
            5 min read
          &lt;/small&gt;
            
              &lt;span class="bm-initial"&gt;
                

              &lt;/span&gt;
              &lt;span class="bm-success"&gt;
                

              &lt;/span&gt;
            
        &lt;/div&gt;
      &lt;/div&gt;
    &lt;/div&gt;
  &lt;/div&gt;
&lt;/div&gt;

&lt;/div&gt;


</description>
      <category>ai</category>
      <category>devjournal</category>
      <category>learning</category>
      <category>machinelearning</category>
    </item>
    <item>
      <title>The Formula Was Exact. The Assumption Was Wrong. That's Not an AI Problem.</title>
      <dc:creator>Daniel Nwaneri</dc:creator>
      <pubDate>Fri, 10 Apr 2026 15:07:48 +0000</pubDate>
      <link>https://dev.to/dannwaneri/the-formula-was-exact-the-assumption-was-wrong-thats-not-an-ai-problem-58dm</link>
      <guid>https://dev.to/dannwaneri/the-formula-was-exact-the-assumption-was-wrong-thats-not-an-ai-problem-58dm</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;Your geology will always govern your geophysics.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;My lecturer said it once. I wrote it down. I didn't fully understand it yet.&lt;/p&gt;

&lt;p&gt;I do now. &lt;/p&gt;




&lt;h2&gt;
  
  
  What He Meant
&lt;/h2&gt;

&lt;p&gt;We were studying Vertical Electrical Sounding at the Federal University of Technology Owerri. VES is how you read the earth without drilling it — you send current into the ground, measure how it returns, and infer what's down there from the resistivity curves. Clean method. Decades of field use. Textbook technique.&lt;/p&gt;

&lt;p&gt;But the method assumes something. It assumes the layers beneath you are horizontal, homogeneous, well-behaved. The formula works perfectly under those conditions. Run your numbers, get your model, trust the output.&lt;/p&gt;

&lt;p&gt;Except Nigeria's basement complex isn't horizontal or homogeneous. It's fractured. Laterally variable. Full of structural surprises that don't announce themselves in your data. You can run perfect VES and still drill a dry borehole — not because the method failed, but because you trusted a reading without interrogating the ground it came from.&lt;/p&gt;

&lt;p&gt;The geology is always prior. The geophysics only tells you what's there if you already understand the conditions under which it's operating. Skip that step and the output is confidently wrong.&lt;/p&gt;

&lt;p&gt;I spent a semester learning this in a classroom. Then I spent six months at the Nigerian Geological Survey Agency watching it happen in the field — solid mineral surveys, real terrain, results that came back and either confirmed your model or told you your assumptions were off.&lt;/p&gt;

&lt;p&gt;I didn't know I was learning how to work with AI.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Same Failure, Different Surface
&lt;/h2&gt;

&lt;p&gt;I've been building production systems in Port Harcourt for years now. Cloudflare Workers, RAG pipelines, MCP servers, edge infrastructure for users where bandwidth costs real money per request.&lt;/p&gt;

&lt;p&gt;And I keep watching the same failure repeat, dressed in different code.&lt;/p&gt;

&lt;p&gt;Developer gets AI-generated output. Output looks right. Tests pass. Ships to production. Fails — not catastrophically, but wrongly. Subtly. In ways that take days to trace.&lt;/p&gt;

&lt;p&gt;The model wasn't broken. The formula was fine. The geology was different.&lt;/p&gt;

&lt;p&gt;AI generates for the environment its training data came from — abundant compute, fast connections, forgiving infrastructure, users on fast networks in cities where the grid is reliable. That's the assumed geology. Most of the time, nobody states that assumption. Nobody interrogates it. The output arrives confident and gets treated as ground truth.&lt;/p&gt;

&lt;p&gt;When I built a VPN service targeting Nigerian users, the AI-suggested architecture was technically correct and completely wrong. Correct for the assumed geology. Wrong for mine. The difference wasn't a bug you could find with a linter. It was a mismatch between what the model assumed about the world and what the world actually was.&lt;/p&gt;

&lt;p&gt;That gap — between assumed geology and actual geology — is where production failures live.&lt;/p&gt;




&lt;h2&gt;
  
  
  The VES Lesson Nobody Teaches in CS
&lt;/h2&gt;

&lt;p&gt;VES nearly died as a professional discipline in the 1980s and 1990s.&lt;/p&gt;

&lt;p&gt;Not because the physics were wrong. The physics were exact. It died because practitioners kept getting bad results — boreholes drilled on confident readings that came back dry. Clients stopped trusting the method. The reputation collapsed.&lt;/p&gt;

&lt;p&gt;The post-mortem was brutal in its simplicity: the formula was exact. The assumption was wrong. Geophysicists had been applying a method that required horizontal homogeneity to terrain that wasn't horizontally homogeneous. The model was rigorous. The geology was inconvenient.&lt;/p&gt;

&lt;p&gt;They fixed it — better field protocols, more explicit assumption-checking, ground-truthing before committing to a reading. The method recovered. But only after the field admitted that confident output isn't the same as correct output.&lt;/p&gt;

&lt;p&gt;We are at that moment with AI.&lt;/p&gt;

&lt;p&gt;The models are rigorous. The outputs are confident. And we are shipping to production without interrogating the geology — the actual environment, the actual users, the actual constraints — that the output will have to survive in.&lt;/p&gt;

&lt;p&gt;Ben Santora has been stress-testing LLMs with logic puzzles designed to expose reasoning failures. His finding: most models are solvers, not judges. They produce an answer. They don't flag when the assumed conditions don't match the actual problem.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"Knowledge collapse happens when solver output is recycled without a strong, independent judging layer to validate it."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The judging layer is the geologist's job. It always was.&lt;/p&gt;




&lt;h2&gt;
  
  
  What Field Work Actually Trains
&lt;/h2&gt;

&lt;p&gt;I did my industrial training at NGSA in 2015. Solid mineral surveys. Real field conditions. Mineralogy in the lab, VES in the terrain.&lt;/p&gt;

&lt;p&gt;The thing fieldwork does that coursework doesn't is this: it makes the gap between model and ground visible in real time. You take your reading. You record your resistivity curves. You run your interpretation. Then you go back the next day and find out if the borehole hit water or came back dry.&lt;/p&gt;

&lt;p&gt;That feedback loop — model, prediction, ground truth, reckoning — is what builds the instinct to hold your interpretation lightly. Not to distrust the method. To distrust the assumption.&lt;/p&gt;

&lt;p&gt;When I ran my SEO audit agent against my own published content this month — seven URLs, seven FAILs — I wasn't surprised. I'd built the agent, I knew what it was checking, and I ran it on myself first because that's the only version of a demo I trust. The agent was right. Three freeCodeCamp tutorials had broken meta descriptions. Two DEV.to article titles were too long for Google to render cleanly.&lt;/p&gt;

&lt;p&gt;That reflex — interrogate the output before you trust it — isn't something I learned from a JavaScript tutorial. It came from standing in terrain that didn't match the model and having to explain why.&lt;/p&gt;

&lt;p&gt;The same thing happened when I shipped The Foundation's clipboard capture in February. The workflow looked right. I documented it, wrote the article, shipped to GitHub. It was broken — capturing only user messages, missing every AI response, missing everything above the visible viewport. I'd reviewed it at the same speed I built it. The geology was inconvenient. I didn't check.&lt;/p&gt;

&lt;p&gt;Five days later I wrote publicly: &lt;em&gt;"I launched The Foundation with big plans. But I underestimated the scope."&lt;/em&gt; Not in a GitHub issue. On DEV.to. The well came back dry. You say so.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Reframe
&lt;/h2&gt;

&lt;p&gt;The conversation about AI in software development keeps getting stuck on the wrong question.&lt;/p&gt;

&lt;p&gt;Not: &lt;em&gt;is the model capable?&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;The model is capable. That's not the problem.&lt;/p&gt;

&lt;p&gt;The question is: &lt;em&gt;does the model know your geology?&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;AI-generated code is optimised for an assumed environment. Plenty of RAM. Reliable connectivity. Users on fast networks. Infrastructure that forgives. Most of the time, nobody states this assumption — it's baked into the training data, invisible until the output meets terrain it wasn't built for.&lt;/p&gt;

&lt;p&gt;The developers who catch this aren't necessarily the most experienced. They're the ones who learned — somewhere, from something — to name the geology before trusting the reading.&lt;/p&gt;

&lt;p&gt;In the field: name the geology before you trust the reading.&lt;/p&gt;

&lt;p&gt;In production: name the environment before you trust the output.&lt;/p&gt;

&lt;p&gt;Same question. Different surface.&lt;/p&gt;




&lt;h2&gt;
  
  
  What Non-CS Backgrounds Actually Transfer
&lt;/h2&gt;

&lt;p&gt;The argument I keep hearing: &lt;em&gt;your background doesn't matter, code is code.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;It's wrong. And it misses the point.&lt;/p&gt;

&lt;p&gt;What transfers from geophysics isn't syntax knowledge. It's the prior question. The one you ask before you trust the output.&lt;/p&gt;

&lt;p&gt;CS tracks teach you to evaluate whether the code is correct. They don't train the instinct to ask whether the assumed conditions match the actual ones. That instinct comes from fields where the gap between model and ground is visible, expensive, and immediately yours to own.&lt;/p&gt;

&lt;p&gt;The guts come from somewhere. For some people it's painful production failures. For some it's a good mentor. For me it was a lecturer in Owerri who said one sentence I've never stopped thinking about.&lt;/p&gt;

&lt;p&gt;Your geology will always govern your geophysics.&lt;/p&gt;

&lt;p&gt;The model doesn't know your terrain. That's not a limitation to wait out. It's a gap you have to close yourself — every time, before you ship.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>webdev</category>
      <category>career</category>
      <category>python</category>
    </item>
    <item>
      <title>I Ran My Own SEO Agent on My Two Domains — It Went from 0/4 to 4/4 PASS in One Afternoon</title>
      <dc:creator>Daniel Nwaneri</dc:creator>
      <pubDate>Tue, 07 Apr 2026 15:12:57 +0000</pubDate>
      <link>https://dev.to/dannwaneri/i-ran-my-own-seo-agent-on-my-two-domains-it-went-from-04-to-44-pass-in-one-afternoon-39an</link>
      <guid>https://dev.to/dannwaneri/i-ran-my-own-seo-agent-on-my-two-domains-it-went-from-04-to-44-pass-in-one-afternoon-39an</guid>
      <description>&lt;p&gt;&lt;code&gt;invoice.naija-vpn.com&lt;/code&gt; was serving the Carter Efe $50K/month Twitch story as its meta description. That page is an invoice generator tool for Nigerian freelancers. Nothing about it has anything to do with Carter Efe.&lt;/p&gt;

&lt;p&gt;The agent caught it on the first run. A scraper wouldn't have — it reads raw HTML before JavaScript executes. The agent uses Playwright, reads the rendered DOM, sees what a browser sees after all the scripts run. The homepage content was leaking in dynamically. The scraper sees an empty description. The agent sees the actual problem.&lt;/p&gt;

&lt;p&gt;I own both domains — naija-vpn.com and naijavpn.com, a Virtual Payment Navigator for Nigerian freelancers and creators. I ran the agent on my own sites to see what it would find.&lt;/p&gt;




&lt;h3&gt;
  
  
  What the agent found
&lt;/h3&gt;

&lt;p&gt;Four URLs. Four FAILs.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;URL&lt;/th&gt;
&lt;th&gt;Title&lt;/th&gt;
&lt;th&gt;Description&lt;/th&gt;
&lt;th&gt;Overall&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;naija-vpn.com/&lt;/td&gt;
&lt;td&gt;FAIL — 65 chars&lt;/td&gt;
&lt;td&gt;FAIL — 185 chars&lt;/td&gt;
&lt;td&gt;FAIL&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;naijavpn.com/&lt;/td&gt;
&lt;td&gt;FAIL — 66 chars&lt;/td&gt;
&lt;td&gt;FAIL — 192 chars&lt;/td&gt;
&lt;td&gt;FAIL&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;/cleva-vs-geegpay&lt;/td&gt;
&lt;td&gt;FAIL — 63 chars&lt;/td&gt;
&lt;td&gt;PASS&lt;/td&gt;
&lt;td&gt;FAIL&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;invoice.naija-vpn.com&lt;/td&gt;
&lt;td&gt;FAIL — no page-specific meta&lt;/td&gt;
&lt;td&gt;FAIL — no page-specific meta&lt;/td&gt;
&lt;td&gt;FAIL&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The homepage title was &lt;code&gt;NaijaVPN - Virtual Payment Navigator for Nigerian Freelancers&lt;/code&gt; — 65 characters when the display limit is 60. The description was 185 characters when the limit is 160.&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;invoice.naija-vpn.com&lt;/code&gt; subdomain was the worst finding. It's a separate subdomain — Google treats it as its own site — but it had no page-specific metadata at all. The agent was reading &lt;code&gt;Carter Efe makes $50K/month from Twitch — here's how he gets paid in Nigeria...&lt;/code&gt; as the meta description for an invoice generator tool. The subdomain is a React SPA and the static HTML &lt;code&gt;&amp;lt;head&amp;gt;&lt;/code&gt; had no &lt;code&gt;&amp;lt;meta name="description"&amp;gt;&lt;/code&gt; tag. Homepage content was leaking in dynamically after JavaScript executed.&lt;/p&gt;

&lt;p&gt;A requests-based scraper would have missed this entirely. It reads the raw HTML before JavaScript executes and would have seen an empty description, not the leaking homepage copy. The agent uses Playwright — it reads the rendered DOM, the same page a browser sees after all the scripts run. That's the difference.&lt;/p&gt;
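&lt;p&gt;The timing difference can be shown with a stdlib-only sketch. The real agent drives Playwright; here the raw and rendered snapshots are hypothetical static strings standing in for the two reads, so the only thing demonstrated is what each read would report:&lt;/p&gt;

```python
from html.parser import HTMLParser

class MetaDescription(HTMLParser):
    """Collects the content of any <meta name="description"> tag."""
    def __init__(self):
        super().__init__()
        self.description = None

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "meta" and a.get("name") == "description":
            self.description = a.get("content")

def read_description(html):
    parser = MetaDescription()
    parser.feed(html)
    return parser.description

# What a requests-based scraper sees: the static head, no meta tag at all.
RAW_HTML = '<html><head><title>Invoice Tool</title></head><body></body></html>'

# What a rendered-DOM reader (Playwright-style) sees after scripts run:
# homepage copy injected into the head. The injected string is a stand-in.
RENDERED_HTML = ('<html><head><title>Invoice Tool</title>'
                 '<meta name="description" content="Leaked homepage story..."></head>'
                 '<body></body></html>')

print(read_description(RAW_HTML))       # None: the scraper flags an absence
print(read_description(RENDERED_HTML))  # the leak the agent actually reports
```

&lt;p&gt;Same checker, two different snapshots, two different findings. The empty-versus-leaking distinction is entirely about when the page is read.&lt;/p&gt;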




&lt;h3&gt;
  
  
  The cost curve routing on real pages
&lt;/h3&gt;

&lt;p&gt;The audit ran in tiered mode. Here's what the routing looked like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;naija-vpn.com/&lt;/code&gt; → Tier 1 — deterministic, zero API cost&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;naijavpn.com/&lt;/code&gt; → Tier 1 — canonical pointing correctly to naija-vpn.com, clean&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;/cleva-vs-geegpay&lt;/code&gt; → Tier 1 — clear structure, no escalation needed&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;invoice.naija-vpn.com&lt;/code&gt; → Tier 2 (Haiku) — title on the boundary, needed a closer look&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Four pages, two tiers touched, one run. Total API cost for the audit: under $0.05.&lt;/p&gt;




&lt;h3&gt;
  
  
  What I actually changed
&lt;/h3&gt;

&lt;p&gt;The rewrite agent ran next — same cost curve applied to the suggestions. Tier 1 truncated the titles deterministically. Haiku generated description suggestions. Sonnet wrote the voice-matched copy for the invoice subdomain opening paragraph.&lt;/p&gt;

&lt;p&gt;Three changes I deployed:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Homepage title:&lt;/strong&gt; &lt;code&gt;NaijaVPN - Get Paid Internationally in Nigeria&lt;/code&gt; — 46 characters. Cut the bloat, kept the value proposition.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Homepage description:&lt;/strong&gt; Trimmed from 185 to 156 characters. The agent's suggestion preserved the Carter Efe reference — that's the hook that makes people click — but cut everything that was padding.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;code&gt;invoice.naija-vpn.com&lt;/code&gt;:&lt;/strong&gt; Added a proper &lt;code&gt;&amp;lt;meta name="description"&amp;gt;&lt;/code&gt; to the static HTML &lt;code&gt;&amp;lt;head&amp;gt;&lt;/code&gt; of the React SPA — the fix lives in &lt;code&gt;index.html&lt;/code&gt;, not in the JavaScript. One line of HTML. The agent's suggested copy: &lt;em&gt;"Generate professional invoices instantly. Receive international payments in Nigeria with Geegpay, Payoneer and Cleva. Free invoice tool for Nigerian freelancers."&lt;/em&gt; — 158 characters, PASS.&lt;/p&gt;




&lt;h3&gt;
  
  
  The verification run
&lt;/h3&gt;

&lt;p&gt;Reset state. Re-ran the audit.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;URL&lt;/th&gt;
&lt;th&gt;Title&lt;/th&gt;
&lt;th&gt;Description&lt;/th&gt;
&lt;th&gt;Tier&lt;/th&gt;
&lt;th&gt;Overall&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;naija-vpn.com/&lt;/td&gt;
&lt;td&gt;PASS — 46 chars&lt;/td&gt;
&lt;td&gt;PASS — 156 chars&lt;/td&gt;
&lt;td&gt;Tier 1&lt;/td&gt;
&lt;td&gt;PASS&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;naijavpn.com/&lt;/td&gt;
&lt;td&gt;PASS — 46 chars&lt;/td&gt;
&lt;td&gt;PASS — 156 chars&lt;/td&gt;
&lt;td&gt;Tier 1&lt;/td&gt;
&lt;td&gt;PASS&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;/cleva-vs-geegpay&lt;/td&gt;
&lt;td&gt;PASS — 52 chars&lt;/td&gt;
&lt;td&gt;PASS — 156 chars&lt;/td&gt;
&lt;td&gt;Tier 1&lt;/td&gt;
&lt;td&gt;PASS&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;invoice.naija-vpn.com&lt;/td&gt;
&lt;td&gt;PASS — 50 chars&lt;/td&gt;
&lt;td&gt;PASS — 149 chars&lt;/td&gt;
&lt;td&gt;Tier 2&lt;/td&gt;
&lt;td&gt;PASS&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;4/4. Zero flags.&lt;/p&gt;

&lt;p&gt;The verification run hit Tier 1 on every page — pure deterministic Python, zero API calls. That's the cost curve completing its own argument: the first run used Haiku and Sonnet because the issues were ambiguous enough to need judgment. The verification run used nothing because the fixes were mechanical and the checks are mechanical. The model cost dropped to zero not because the run was cheaper, but because the problems were gone.&lt;/p&gt;




&lt;h3&gt;
  
  
  The finding that mattered most
&lt;/h3&gt;

&lt;p&gt;The &lt;code&gt;invoice.naija-vpn.com&lt;/code&gt; subdomain inheriting homepage metadata is a serious SEO problem. Google treats subdomains as separate sites — so this wasn't "duplicate metadata on the same site." It was a standalone subdomain with no metadata of its own, rendering the homepage's Carter Efe story as the description for an invoice tool.&lt;/p&gt;

&lt;p&gt;The agent caught it because it reads the rendered DOM. A scraper reads raw HTML — it would have seen an empty description and flagged the absence, not the leak. The agent saw what a real browser sees: the homepage content dynamically injected after JavaScript executed.&lt;/p&gt;

&lt;p&gt;The agent found it automatically, on the first run, and proposed a fix.&lt;/p&gt;

&lt;p&gt;That's the actual value. Not the character counts. The automated surface of a problem that was invisible to a quick look at the page.&lt;/p&gt;




&lt;p&gt;Two full audit passes plus the rewrite run: under $0.15 total. Four pages, three runs, one afternoon.&lt;/p&gt;

&lt;p&gt;The free core at &lt;a href="https://github.com/dannwaneri/seo-agent" rel="noopener noreferrer"&gt;dannwaneri/seo-agent&lt;/a&gt; handles the audit and the basic report. The PDF report (per-page screenshots, severity-sorted issues) and the rewrite suggestions live in the Pro layer. The cost curve runs in both.&lt;/p&gt;

</description>
      <category>python</category>
      <category>automation</category>
      <category>ai</category>
      <category>webdev</category>
    </item>
    <item>
      <title>I Was Paying $0.006 Per URL for SEO Audits Until I Realized Most Needed $0</title>
      <dc:creator>Daniel Nwaneri</dc:creator>
      <pubDate>Fri, 03 Apr 2026 14:24:02 +0000</pubDate>
      <link>https://dev.to/dannwaneri/i-was-paying-0006-per-url-for-seo-audits-until-i-realized-most-needed-0-132j</link>
      <guid>https://dev.to/dannwaneri/i-was-paying-0006-per-url-for-seo-audits-until-i-realized-most-needed-0-132j</guid>
      <description>&lt;p&gt;&lt;a href="https://dev.to/pascal_cescato_692b7a8a20"&gt;Pascal CESCATO&lt;/a&gt; read my SEO audit agent piece and left this in the comments:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"You don't need an LLM for this. Everything you're sending to Claude can be done directly in Python — zero cost, fully deterministic, no hallucination risk."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;He was right. And wrong. And the conversation that followed is the reason I rebuilt the entire thing.&lt;/p&gt;




&lt;h3&gt;
  
  
  What Pascal Actually Said
&lt;/h3&gt;

&lt;p&gt;The audit agent I published checks title length, meta description length, H1 count, and canonical tags. Pascal's point: those are character counts and presence checks. A regex does that. You don't pay $0.006 per URL for a regex.&lt;/p&gt;

&lt;p&gt;I pushed back. The &lt;code&gt;flags&lt;/code&gt; array requires judgment — "title reads like a navigation label rather than a page description" isn't a character count. Pascal conceded, then reframed:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"Two-pass makes more sense. Deterministic Python for binary checks, model call only on pages that pass the mechanical audit but need a second look. You pay per genuinely ambiguous case, not per URL."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That's a better architecture than what I shipped. I said so publicly.&lt;/p&gt;

&lt;p&gt;Then &lt;a href="https://dev.to/aiforwork"&gt;Julian Oczkowski&lt;/a&gt; extended it:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"Deterministic rules first, lightweight models for triage, larger models reserved for genuinely ambiguous edge cases. Keeps latency low, costs predictable, reduces unnecessary LLM dependency."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Three people in a comment thread had just designed something I hadn't thought to name. Pascal called it two-pass. Julian called it tiered. I called it the cost curve — a sliding scale from free to expensive, routed by what the task actually requires.&lt;/p&gt;




&lt;h3&gt;
  
  
  The Cost Curve
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Tier 1 — Deterministic Python. Cost: $0.&lt;/strong&gt;&lt;br&gt;
Title &amp;gt;60 characters? FAIL. Description missing? FAIL. H1 count == 0? FAIL. These are not judgment calls. A model that can reason about Shakespeare does not need to be invoked to count to 60.&lt;/p&gt;
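
&lt;p&gt;A minimal sketch of what a Tier 1 pass can look like, assuming a plain dict snapshot per page (the field names are illustrative; the limits are the 60-character title and 160-character description thresholds used throughout):&lt;/p&gt;

```python
TITLE_LIMIT = 60   # Google title display limit used in the audit
DESC_LIMIT = 160   # meta description display limit

def tier1_audit(page):
    """Deterministic pass/fail checks. No model call, zero API cost.

    `page` is a dict snapshot, e.g.
    {"title": str, "description": str, "h1_count": int}.
    """
    failures = []
    title = page.get("title") or ""
    desc = page.get("description") or ""
    if len(title) > TITLE_LIMIT:
        failures.append(f"title too long ({len(title)} > {TITLE_LIMIT})")
    if not desc:
        failures.append("description missing")
    elif len(desc) > DESC_LIMIT:
        failures.append(f"description too long ({len(desc)} > {DESC_LIMIT})")
    if page.get("h1_count", 0) == 0:
        failures.append("no H1")
    return failures  # empty list means PASS at zero cost

# The pre-fix homepage from the earlier audit: 65-char title, 185-char description.
print(tier1_audit({"title": "x" * 65, "description": "y" * 185, "h1_count": 1}))
```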

&lt;p&gt;&lt;strong&gt;Tier 2 — Haiku. Cost: ~$0.0001 per URL.&lt;/strong&gt;&lt;br&gt;
Title present but 4 characters long. Description present but 30 characters. Status code is a redirect. These pass the mechanical audit but something is off. Haiku is cheap enough that calling it for ambiguous cases costs less than the time you'd spend debugging why the deterministic check missed something.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Tier 3 — Sonnet. Cost: ~$0.006 per URL.&lt;/strong&gt;&lt;br&gt;
Pages Haiku flags as needing semantic judgment. "This title passes length but reads like a navigation label." "This description duplicates the title verbatim." Sonnet earns its cost here. Not everywhere.&lt;/p&gt;

&lt;p&gt;The insight is routing. Most pages on a typical agency site have mechanical issues — missing descriptions, long titles, no canonical. Those never need a model. The interesting cases — pages that pass every binary check but still feel wrong — are where the model earns its place.&lt;/p&gt;

&lt;p&gt;On my last run of 50 URLs, 8 reached Sonnet. The rest resolved at Tier 1. Total cost dropped from ~$0.30 to ~$0.05. The 8 that hit Sonnet were the ones worth paying for.&lt;/p&gt;


&lt;h3&gt;
  
  
  What I Actually Built
&lt;/h3&gt;

&lt;p&gt;I restructured the entire repo around this architecture.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;core/&lt;/code&gt; stays flat and MIT licensed. The original seven modules untouched. Anyone who cloned v1 still runs &lt;code&gt;python core/index.py&lt;/code&gt; and gets identical behavior.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;premium/&lt;/code&gt; adds four modules:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;cost_curve.py&lt;/code&gt; handles the tier routing. &lt;code&gt;audit_url(snapshot, tiered=True)&lt;/code&gt; runs Tier 1 first, escalates to Haiku if something fails, and escalates to Sonnet only if Haiku flags semantic ambiguity.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;multi_client.py&lt;/code&gt; manages project folders. &lt;code&gt;--project acme&lt;/code&gt; reads and writes from &lt;code&gt;projects/acme/&lt;/code&gt; with isolated state, input, and reports.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;enhanced_reporter.py&lt;/code&gt; generates WeasyPrint PDFs with per-URL screenshots, issues sorted by severity, and suggested fixes.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;rewrite_agent.py&lt;/code&gt; is the one Pascal didn't anticipate. After the audit, &lt;code&gt;--rewrite&lt;/code&gt; generates improvement suggestions using the same cost curve: Tier 1 truncates titles deterministically, Haiku writes meta description suggestions, Sonnet rewrites opening paragraphs. Pass &lt;code&gt;--voice-sample ./my-writing.txt&lt;/code&gt; and the prompt includes a sample of your writing, so the suggestions sound like you, not like Claude.&lt;/li&gt;
&lt;/ul&gt;
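
&lt;p&gt;The escalation order can be sketched in a few lines. Everything here is a hypothetical stand-in for the repo's internals: the checker callables are injected toy functions, and the per-URL costs are the figures quoted in this article, not billed rates:&lt;/p&gt;

```python
# $ per URL at each tier, using the article's approximate figures.
COST = {1: 0.0, 2: 0.0001, 3: 0.006}

def route(snapshot, tier1, tier2, tier3):
    """Run the cheap check first; escalate only while a cheaper tier is unsure."""
    failures = tier1(snapshot)
    if failures:                       # mechanical fail: no model needed
        return {"tier": 1, "cost": COST[1], "issues": failures}
    ambiguous = tier2(snapshot)        # Haiku-style triage: "something is off?"
    if not ambiguous:
        return {"tier": 2, "cost": COST[2], "issues": []}
    return {"tier": 3, "cost": COST[3], "issues": tier3(snapshot)}

# Toy checkers, for illustration only.
t1 = lambda s: ["title too long"] if len(s["title"]) > 60 else []
t2 = lambda s: s["title"].isupper()                 # crude "feels off" signal
t3 = lambda s: ["reads like a navigation label"]    # semantic judgment

print(route({"title": "HOME"}, t1, t2, t3)["tier"])                  # escalates to 3
print(route({"title": "A sensible page title"}, t1, t2, t3)["cost"]) # stops at 2
```

&lt;p&gt;The design point is that each tier only runs when the tier below it could not decide, which is what keeps the cost curve sliding instead of flat.&lt;/p&gt;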

&lt;p&gt;&lt;code&gt;main.py&lt;/code&gt; is the unified entry point. Free users run &lt;code&gt;python main.py&lt;/code&gt; and get v1 behavior. Pro users add &lt;code&gt;--pro&lt;/code&gt;, set &lt;code&gt;SEO_AGENT_LICENSE&lt;/code&gt;, unlock the premium layer.&lt;/p&gt;

&lt;p&gt;A full pro run looks like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;python main.py &lt;span class="nt"&gt;--project&lt;/span&gt; client-x &lt;span class="nt"&gt;--pro&lt;/span&gt; &lt;span class="nt"&gt;--tiered&lt;/span&gt; &lt;span class="nt"&gt;--rewrite&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That single command: reads from &lt;code&gt;projects/client-x/input.csv&lt;/code&gt;, routes every URL through the cost curve, generates rewrite suggestions for failing pages, writes a PDF report with screenshots and severity levels, and appends a run record to the audit history.&lt;/p&gt;




&lt;h3&gt;
  
  
  The Architecture Decision That Matters
&lt;/h3&gt;

&lt;p&gt;&lt;code&gt;core/&lt;/code&gt; imports nothing from &lt;code&gt;premium/&lt;/code&gt;. Ever.&lt;/p&gt;

&lt;p&gt;This isn't just clean code. It's a trust contract with anyone who forks the repo. The MIT-licensed core is the public good — auditable, forkable, accepts PRs. The premium layer is proprietary and closed, but it builds on a foundation anyone can inspect.&lt;/p&gt;

&lt;p&gt;Mads Hansen in the comments named the right question: how do you prevent the auditor from developing blind spots on recurring patterns? The answer I didn't have then: run history in &lt;code&gt;state.json&lt;/code&gt;. Each completed run appends a record — timestamps, pass counts, fail counts, report path. Over time, "this page has failed description length for six consecutive runs" is a different signal than "this page failed today." The audit becomes a monitor. Not just a snapshot.&lt;/p&gt;
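&lt;p&gt;The signal is cheap to compute once the history exists. A minimal sketch — the record shape here is illustrative, not the repo's exact &lt;code&gt;state.json&lt;/code&gt; schema:&lt;/p&gt;

```python
# Sketch of the run-history signal: a page failing the same field for
# N consecutive runs is a stronger signal than a one-off failure.
# Record shape is hypothetical, not the repo's actual schema.

def consecutive_failures(history, url, field):
    """Count how many of the most recent runs flagged `field` for `url`."""
    streak = 0
    for run in reversed(history):           # newest first
        if field in run.get("fails", {}).get(url, []):
            streak += 1
        else:
            break
    return streak

history = [
    {"ts": "2026-03-01", "fails": {"/about": ["description_length"]}},
    {"ts": "2026-03-08", "fails": {"/about": ["description_length"]}},
    {"ts": "2026-03-15", "fails": {"/about": ["description_length"]}},
]
```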




&lt;h3&gt;
  
  
  What the Comment Thread Cost
&lt;/h3&gt;

&lt;p&gt;Nothing. And it produced a better architecture than I would have built alone.&lt;/p&gt;

&lt;p&gt;Pascal's pushback separated the deterministic from the semantic. Julian's production framing gave me the three-tier structure. &lt;a href="https://dev.to/apex_stack"&gt;Apex Stack&lt;/a&gt;'s 89K-page site showed me where the orphan detection problem lives. &lt;a href="https://dev.to/mads_hansen_27b33ebfee4c9"&gt;Mads Hansen&lt;/a&gt; named the blind spot question I hadn't asked.&lt;/p&gt;

&lt;p&gt;None of that was in the original article. All of it is in the repo now.&lt;/p&gt;

&lt;p&gt;The public comment thread is the architecture review I didn't schedule. That's what happens when you publish the honest version — the demo that fails on your own content — instead of the staged one.&lt;/p&gt;

&lt;p&gt;The staged demo would have passed. The honest one compounded.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Full repo: &lt;a href="https://github.com/dannwaneri/seo-agent" rel="noopener noreferrer"&gt;dannwaneri/seo-agent&lt;/a&gt;. Core is MIT. Premium requires a license key. The freeCodeCamp tutorial covers the v1 build in detail: &lt;a href="https://www.freecodecamp.org/news/how-to-build-a-local-seo-audit-agent-with-browser-use-and-claude-api" rel="noopener noreferrer"&gt;How to Build a Local SEO Audit Agent with Browser Use and Claude API&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>python</category>
      <category>ai</category>
      <category>webdev</category>
      <category>automation</category>
    </item>
    <item>
      <title>Agents Don't Just Do Unauthorized Things. They Cause Humans to Do Unauthorized Things.</title>
      <dc:creator>Daniel Nwaneri</dc:creator>
      <pubDate>Mon, 30 Mar 2026 14:36:52 +0000</pubDate>
      <link>https://dev.to/dannwaneri/agents-dont-just-do-unauthorized-things-they-cause-humans-to-do-unauthorized-things-51j4</link>
      <guid>https://dev.to/dannwaneri/agents-dont-just-do-unauthorized-things-they-cause-humans-to-do-unauthorized-things-51j4</guid>
      <description>&lt;p&gt;A comment thread shouldn't produce original research. This one did.&lt;/p&gt;

&lt;p&gt;Last week I published a piece about agent governance — the gap between what an agent is authorized to do and what it effectively becomes authorized to do through accumulated actions. I used a quant fund analogy: five independent strategies, each within its own risk limits, collectively overweight the same sector. No single decision was wrong. The aggregate outcome was unauthorized.&lt;/p&gt;

&lt;p&gt;The comment section built something I hadn't anticipated.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://dev.to/kalpaka"&gt;Kalpaka&lt;/a&gt;,&lt;a href="https://dev.to/vicchen"&gt;Vic Chen&lt;/a&gt;, &lt;a href="https://dev.to/acytryn"&gt;Andre Cytryn&lt;/a&gt;, &lt;a href="https://dev.to/euphorie"&gt;Stephen Lee&lt;/a&gt;, and &lt;a href="https://dev.to/connor_gallic"&gt;Connor Gallic&lt;/a&gt; spent three days extending the argument in directions I hadn't gone. What follows is an attempt to assemble what they built — with attribution, because the thread earned it.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Unit Problem
&lt;/h2&gt;

&lt;p&gt;The first thing Kalpaka named was the hardest: what do you measure?&lt;/p&gt;

&lt;p&gt;Actions taken is too noisy. Resources touched is closer but misses compounding. The unit that matters, they argued, is state delta — the diff between what the system could affect before and after a given action sequence.&lt;/p&gt;

&lt;p&gt;A vendor approval doesn't just place an order. It creates a supply chain dependency. A database read doesn't just return rows. It establishes an access pattern the agent now relies on.&lt;/p&gt;

&lt;p&gt;So the audit object isn't an action log. It's a capability graph — resources and permissions as nodes, actually-exercised access paths as edges. Reconciliation compares the declared graph against the exercised one. The delta is your behavioral exposure.&lt;/p&gt;
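&lt;p&gt;In code, the reconciliation step is a set difference. The edge encoding below is illustrative — (actor, resource) pairs stand in for whatever a real capability graph would store:&lt;/p&gt;

```python
# Sketch of reconciliation: declared capability graph vs the access
# paths the agent actually exercised. The delta is the behavioral
# exposure the declaration missed. Edge encoding is hypothetical.

declared = {("agent", "orders_db:read"),
            ("agent", "vendor_api:approve")}

exercised = {("agent", "orders_db:read"),
             ("agent", "vendor_api:approve"),
             ("agent", "supplier_x:dependency")}  # created, never declared

behavioral_exposure = exercised - declared   # what reconciliation surfaces
stale_grants = declared - exercised          # declared but unused: prune candidates
```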

&lt;p&gt;This is the frame that makes the quant fund analogy precise rather than decorative. A 13F filing is exactly this: a reconciliation between declared strategy and actual exposure, anchored to external cadence rather than internal computation speed. The quarterly filing forces the moment when someone has to look at the full graph, not just the individual positions.&lt;/p&gt;




&lt;h2&gt;
  
  
  Two Kinds of Edges
&lt;/h2&gt;

&lt;p&gt;The capability graph has a shape problem I hadn't considered.&lt;/p&gt;

&lt;p&gt;Direct edges are legible. The agent called the database, touched the file, invoked the API. These edges can be tracked. They decay naturally — an access pattern not exercised in N reconciliation cycles can be pruned. The agent hasn't used that path, so it drops from the exercised graph.&lt;/p&gt;

&lt;p&gt;Induced edges are different. These are the edges created when an agent's output causes a human to take an action the agent itself didn't take. The Meta Sev 1 incident last week is the exact pattern: the agent didn't write anything unauthorized. It gave advice that caused an engineer to widen access. The exposure persisted for two hours. The agent's direct action log showed nothing unusual.&lt;/p&gt;

&lt;p&gt;Induced edges don't decay the same way direct edges do. The human decision the agent caused doesn't un-happen when the agent stops referencing it. The widened access persists independently of the agent's continued activity. These edges have a much longer half-life — effectively permanent until someone explicitly reconciles them.&lt;/p&gt;

&lt;p&gt;This is where the governance architecture splits. Direct edges decay on a usage clock. Induced edges require active reconciliation. The first can be automated. The second can't — which is exactly where the 13F analogy holds strongest. Quarterly isn't a technical choice. It's a regulatory one. Someone external decided that interval was often enough for fund positions. Agent capability graphs need the same external anchor.&lt;/p&gt;
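&lt;p&gt;The two decay regimes can be sketched directly. Everything below — the edge records, the field names, the prune window — is hypothetical, an illustration of the split rather than a real implementation:&lt;/p&gt;

```python
# Sketch of the two clocks: direct edges age out on a usage clock;
# induced edges never auto-expire — they persist until a human
# explicitly reconciles them. All names here are illustrative.

PRUNE_AFTER = 3  # reconciliation cycles without use

def prune(edges, cycle):
    kept = []
    for e in edges:
        idle = cycle - e["last_exercised"]
        if e["kind"] == "direct":
            if idle >= PRUNE_AFTER:
                continue              # unused direct path: drop it
            kept.append(e)
        else:  # induced
            if e.get("reconciled"):
                continue              # explicitly closed out by review
            kept.append(e)            # otherwise effectively permanent

    return kept

edges = [
    {"kind": "direct", "last_exercised": 9},                      # fresh: kept
    {"kind": "direct", "last_exercised": 4},                      # idle: pruned
    {"kind": "induced", "last_exercised": 0},                     # kept forever
    {"kind": "induced", "last_exercised": 0, "reconciled": True}, # closed out
]
```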




&lt;h2&gt;
  
  
  The Irreversibility Problem
&lt;/h2&gt;

&lt;p&gt;Andre Cytryn raised threshold drift: you calibrate materiality at deployment time, but the agent's decision space expands with use.&lt;/p&gt;

&lt;p&gt;The per-strategy limits in the fund example weren't wrong at design time. They were wrong at failure time because the correlation structure changed. Continuous recalibration is the obvious fix. But continuous recalibration creates its own attack surface — a mechanism the agent can game structurally, not intentionally. Enough accumulated decisions that each looks within tolerance can shift the calibration baseline until the new thresholds permit what the original thresholds wouldn't.&lt;/p&gt;

&lt;p&gt;The only way out Kalpaka identified: decouple recalibration authority from the agent's own execution context entirely. Thresholds reviewed by something with no stake in the agent's performance.&lt;/p&gt;

&lt;p&gt;This requires knowing which actions are irreversible — but irreversibility is often only visible in retrospect. The Meta incident proves it. Nobody flagged the permission-widening action as irreversible-scope-changing while it was happening. The agent's advice looked like a routine technical suggestion. The irreversibility was only visible after the state had already changed.&lt;/p&gt;

&lt;p&gt;Kalpaka's resolution was to invert the default assumption. Don't try to classify irreversibility upfront. Treat every induced edge as irreversible. Over-reconcile first, and let the reconciliation history generate the labeled data you need to build the upstream classifier later.&lt;/p&gt;

&lt;p&gt;This is how financial compliance actually bootstrapped. Early fund compliance teams didn't start with sophisticated materiality filters. They reconciled everything quarterly and learned which position changes were material through decades of accumulated review history. The filters came from the data. The data came from over-reconciling.&lt;/p&gt;

&lt;p&gt;We're in the over-reconcile phase. Expensive, noisy, generates false positives. But it's the only way to produce the labeled examples that make the classifier possible later. Trying to solve the detection problem at inception is trying to skip the generation of training data.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Fatigue Problem
&lt;/h2&gt;

&lt;p&gt;Over-reconciling creates a human cost. Every flagged induced state change needs a review decision. Volume in early cycles will be high by design.&lt;/p&gt;

&lt;p&gt;The compliance history the 13F analogy draws on has a second chapter nobody likes to mention: the review process degrades under load. Humans start rubber-stamping. False positives stop being caught. The labeled data gets noisy before the classifier can be trained on it.&lt;/p&gt;

&lt;p&gt;Kalpaka's answer was adaptive rather than prescribed: track reviewer consistency over time. Same reviewer, same flag type — does the approval rate shift as volume increases? That's your fatigue signal. Once you can detect it, you don't need a hard cap. You have evidence-based throttling.&lt;/p&gt;
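&lt;p&gt;That signal is cheap to compute. A hedged sketch — the window size and drift threshold are hypothetical knobs, not values anyone in the thread proposed:&lt;/p&gt;

```python
# Sketch of the fatigue signal: compare a reviewer's recent approval
# rate on one flag type against their own baseline. A sharp rise under
# volume suggests rubber-stamping, not better agent behavior.

def approval_rate(decisions):
    return sum(decisions) / len(decisions) if decisions else 0.0

def fatigue_signal(decisions, window=20, drift=0.15):
    """decisions: booleans per review, oldest first (True = approved)."""
    if len(decisions) >= 2 * window:
        baseline = approval_rate(decisions[:-window])
        recent = approval_rate(decisions[-window:])
        return recent - baseline > drift   # evidence-based throttle trigger
    return False                           # not enough history yet
```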

&lt;p&gt;The auditor rotation parallel holds here too. Financial auditors rotate precisely because of this problem. Fresh eyes are the simplest fatigue mitigation. The question for agent behavioral review is whether there are enough qualified reviewers to rotate through.&lt;/p&gt;

&lt;p&gt;That's harder than it sounds. Financial auditors share a professional vocabulary — they all know what material means, what a restatement implies, what a concentration risk looks like. That shared language is what makes fresh eyes useful. Agent behavioral accounting doesn't have that vocabulary yet. The capability graph framing this thread built is an attempt to construct it. Until reviewers have a shared framework for what "consequential scope change" looks like in practice, rotating fresh eyes doesn't transfer the same way.&lt;/p&gt;

&lt;p&gt;The qualified reviewer problem might be the bootstrapping constraint underneath the bootstrapping constraint.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Enforcement Floor
&lt;/h2&gt;

&lt;p&gt;Connor Gallic brought the piece back to earth.&lt;/p&gt;

&lt;p&gt;Theory is one thing. What teams actually ship is another. The simplest version of enforcement is a write checkpoint: every agent action that changes state passes through a policy evaluation before it executes. Not token monitoring, not post-hoc audit. A deterministic gate between intent and action.&lt;/p&gt;

&lt;p&gt;The aggregate authorization problem gets simpler when you centralize that policy layer. If every agent's writes flow through the same checkpoint, cross-agent scope becomes visible — because the governance layer has the view that individual agents don't.&lt;/p&gt;
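&lt;p&gt;A minimal sketch of that checkpoint — the policy table, verbs, and ledger shape are illustrative, not any particular framework's API:&lt;/p&gt;

```python
# Sketch of the write checkpoint: a deterministic policy gate every
# state-changing action passes through before execution. Centralizing
# the ledger is what makes cross-agent scope visible.

POLICY = {
    "orders_db": {"read", "write"},
    "vendor_api": {"read"},          # approvals stay with a human
}

def checkpoint(agent, resource, verb, ledger):
    """Evaluate the action against policy and record it either way."""
    allowed = verb in POLICY.get(resource, set())
    ledger.append({"agent": agent, "resource": resource,
                   "verb": verb, "allowed": allowed})
    return allowed

ledger = []  # shared across all agents: the governance layer's view
```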

&lt;p&gt;The 13F analogy works because quarterly filings are boring, reliable, and externally anchored. Agent governance needs the same properties. The hard part isn't the theory. It's making enforcement boring enough that teams actually ship it.&lt;/p&gt;

&lt;p&gt;The write checkpoint wins on that dimension. It's implementable today. It handles direct scope violations cleanly. It's the enforcement floor.&lt;/p&gt;

&lt;p&gt;The capability graph is the ceiling. It handles induced state changes, threshold drift, cross-agent composition. It's more expensive, harder to build, requires the vocabulary that doesn't fully exist yet. It's also necessary for complete governance.&lt;/p&gt;

&lt;p&gt;The right architecture is probably layered: checkpoint as the floor that catches direct violations cheaply, capability graph as the audit layer that catches induced drift over time. You ship the checkpoint first because it's boring enough to actually build. You develop the capability graph because the checkpoint leaves a class of failures uncovered.&lt;/p&gt;




&lt;h2&gt;
  
  
  What the Thread Built
&lt;/h2&gt;

&lt;p&gt;None of this was in the original piece.&lt;/p&gt;

&lt;p&gt;The two-clock distinction was there. The 13F analogy was there. The governance debt framing was there.&lt;/p&gt;

&lt;p&gt;The capability graph, the direct/induced edge split, the over-reconcile bootstrap, the adaptive fatigue ceiling, the layered enforcement architecture — those came from Kalpaka, Andre, Vic, Stephen, and Connor over three days in a comment section.&lt;/p&gt;

&lt;p&gt;That's worth naming directly. Not as a courtesy, but because it demonstrates the problem Foundation is designed to solve. This conversation happened in public, in a thread that will eventually scroll off everyone's feed, attributed to usernames rather than preserved as structured knowledge. The architecture it built is worth more than that.&lt;/p&gt;

&lt;p&gt;The agent governance problem doesn't have a consensus solution yet. What it has is a comment thread that got further than most papers. The qualified reviewer problem is still open. The capability graph needs implementation. The adaptive ceiling needs tooling.&lt;/p&gt;

&lt;p&gt;But the vocabulary exists now. That's what the thread produced. And vocabulary, as someone wiser than me wrote recently, turns vague frustration into specific, solvable problems.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;The original piece: &lt;a href="https://dev.to/dannwaneri/your-agent-is-making-decisions-nobody-authorized-2bc7"&gt;Your Agent Is Making Decisions Nobody Authorized&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>discuss</category>
      <category>architecture</category>
      <category>webdev</category>
      <category>ai</category>
    </item>
    <item>
      <title>I Built a Local AI Agent That Audits My Own Articles. It Flagged Every Single One.</title>
      <dc:creator>Daniel Nwaneri</dc:creator>
      <pubDate>Mon, 30 Mar 2026 14:33:56 +0000</pubDate>
      <link>https://dev.to/dannwaneri/i-built-a-local-ai-agent-that-audits-my-own-articles-it-flagged-every-single-one-pkh</link>
      <guid>https://dev.to/dannwaneri/i-built-a-local-ai-agent-that-audits-my-own-articles-it-flagged-every-single-one-pkh</guid>
      <description>&lt;p&gt;Not as a gotcha. As a result.&lt;/p&gt;

&lt;p&gt;Seven URLs. Seven FAILs.&lt;/p&gt;

&lt;p&gt;My Hashnode profile is missing an H1. Three freeCodeCamp tutorials have meta descriptions that are either missing or over 160 characters. Two DEV.to articles have titles too long for Google to render cleanly.&lt;/p&gt;

&lt;p&gt;I built the agent. I ran it on my own content first. That's the honest version of the demo.&lt;/p&gt;




&lt;h2&gt;
  
  
  The problem I was actually solving
&lt;/h2&gt;

&lt;p&gt;Every digital marketing agency has someone whose job is basically this: open a spreadsheet, visit each client URL, check the title tag, check the description, check the H1, note broken links, paste everything into a report. Repeat weekly.&lt;/p&gt;

&lt;p&gt;That person costs money. The work is deterministic. The only reason it's still manual is that nobody built the alternative.&lt;/p&gt;

&lt;p&gt;I built it in a weekend.&lt;/p&gt;




&lt;h2&gt;
  
  
  The stack
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Browser Use&lt;/strong&gt; — Python-native browser automation. The agent navigates real pages in a visible Chromium window. Not a headless scraper. Persistent sessions, real rendering, the same page a human would see.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Claude API (Sonnet)&lt;/strong&gt; — reads the page snapshot and returns structured JSON: title status, description status, H1 count, canonical tag, flags. One API call per URL.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;httpx&lt;/strong&gt; — async HEAD requests for broken link detection. Capped at 50 links per page, concurrent, 5-second timeout per request.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Flat JSON files&lt;/strong&gt; — &lt;code&gt;state.json&lt;/code&gt; tracks what's been audited. Interrupt mid-run, restart, it picks up exactly where it stopped. No database needed.&lt;/li&gt;
&lt;/ul&gt;
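&lt;p&gt;The link-check step above can be sketched with httpx's async client. This is illustrative, not the repo's exact code — &lt;code&gt;cap_links&lt;/code&gt; and &lt;code&gt;check_links&lt;/code&gt; are hypothetical names:&lt;/p&gt;

```python
import asyncio

def cap_links(links, limit=50):
    """Dedupe and cap outbound links before checking (the 50-link cap)."""
    seen, out = set(), []
    for url in links:
        if url not in seen:
            seen.add(url)
            out.append(url)
        if len(out) == limit:
            break
    return out

async def check_links(links, timeout=5.0):
    """Concurrent HEAD requests; returns {url: status_or_error_name}."""
    import httpx  # third-party: pip install httpx

    async with httpx.AsyncClient(follow_redirects=True) as client:
        async def head(url):
            try:
                r = await client.head(url, timeout=timeout)
                return url, r.status_code
            except httpx.HTTPError as exc:
                return url, type(exc).__name__   # broken or unreachable

        results = await asyncio.gather(*(head(u) for u in cap_links(links)))
    return dict(results)

# usage: asyncio.run(check_links(["https://example.com/a", ...]))
```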

&lt;p&gt;Seven Python files. 956 lines total. Runs on a Windows laptop.&lt;/p&gt;




&lt;h2&gt;
  
  
  The part most tutorials skip: HITL
&lt;/h2&gt;

&lt;p&gt;The agent hits a login wall. Throws an exception. Run dies.&lt;/p&gt;

&lt;p&gt;That's most automation tutorials.&lt;/p&gt;

&lt;p&gt;This one doesn't work that way.&lt;/p&gt;

&lt;p&gt;When the agent detects a non-200 status, a redirect to a login page, or a title containing "sign in" or "access denied", it pauses. In interactive mode: skip, retry, or quit. In &lt;code&gt;--auto&lt;/code&gt; mode it skips automatically, logs the URL to &lt;code&gt;needs_human[]&lt;/code&gt; in state, and continues.&lt;/p&gt;
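&lt;p&gt;The decision logic fits in a few lines. The detection strings and field names below are illustrative stand-ins for the repo's actual implementation:&lt;/p&gt;

```python
# Sketch of the HITL gate: detect a blocked page, then either pause
# (interactive) or skip-and-log (--auto). Heuristics are illustrative.

BLOCK_MARKERS = ("sign in", "access denied")

def needs_human(status, final_url, title):
    if status != 200:
        return True
    if "login" in final_url:
        return True                  # redirected to a login wall
    lowered = title.lower()
    return any(m in lowered for m in BLOCK_MARKERS)

def handle(url, status, final_url, title, state, auto=True):
    if needs_human(status, final_url, title):
        if auto:
            state.setdefault("needs_human", []).append(url)  # skip and log
            return "skipped"
        return "pause"               # interactive: skip, retry, or quit
    return "audit"
```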

&lt;p&gt;An agent that knows its limits is more useful than one that fails silently. That's the design decision most people don't make because tutorials don't cover it.&lt;/p&gt;




&lt;h2&gt;
  
  
  What the audit actually found
&lt;/h2&gt;

&lt;p&gt;I ran it against my own published content across three platforms:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;URL&lt;/th&gt;
&lt;th&gt;Failing fields&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;hashnode.com/&lt;a class="mentioned-user" href="https://dev.to/dannwaneri"&gt;@dannwaneri&lt;/a&gt;
&lt;/td&gt;
&lt;td&gt;H1 missing&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;freeCodeCamp — how-to-build-your-own-claude-code-skill&lt;/td&gt;
&lt;td&gt;Meta description&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;freeCodeCamp — how-to-stop-letting-ai-agents-guess&lt;/td&gt;
&lt;td&gt;Meta description&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;freeCodeCamp — build-a-production-rag-system&lt;/td&gt;
&lt;td&gt;Title + meta description&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;freeCodeCamp — author/dannwaneri&lt;/td&gt;
&lt;td&gt;Meta description&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;dev.to — the-gatekeeping-panic&lt;/td&gt;
&lt;td&gt;Title too long&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;dev.to — i-built-a-production-rag-system&lt;/td&gt;
&lt;td&gt;Title too long&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The freeCodeCamp description issues are partly platform-level — freeCodeCamp controls the template and sometimes truncates or omits meta descriptions. The DEV.to title issues are mine. Article titles that read well as headlines often exceed 60 characters in the &lt;code&gt;&amp;lt;title&amp;gt;&lt;/code&gt; tag.&lt;/p&gt;

&lt;p&gt;The agent didn't care. It checked the standard and reported the result.&lt;/p&gt;




&lt;h2&gt;
  
  
  The schedule play
&lt;/h2&gt;

&lt;p&gt;&lt;code&gt;python index.py --auto&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;Add a &lt;code&gt;.bat&lt;/code&gt; file that sets the API key and calls that command. Schedule it in Windows Task Scheduler for Monday 7am. Check &lt;code&gt;report-summary.txt&lt;/code&gt; with your coffee.&lt;/p&gt;

&lt;p&gt;That's the agency workflow. No babysitting. Edge cases in &lt;code&gt;needs_human[]&lt;/code&gt; for human review. Everything else processed and reported automatically.&lt;/p&gt;




&lt;h2&gt;
  
  
  What this actually costs
&lt;/h2&gt;

&lt;p&gt;One Sonnet API call per URL. Roughly $0.002 per page. A 20-URL weekly audit costs less than $0.05. The Playwright browser runs locally — no cloud browser fees, no Browserbase subscription.&lt;/p&gt;

&lt;p&gt;The whole thing runs on a $5/month philosophy. Same one I use for everything else.&lt;/p&gt;




&lt;h2&gt;
  
  
  The code
&lt;/h2&gt;

&lt;p&gt;GitHub: &lt;a href="https://github.com/dannwaneri/seo-agent" rel="noopener noreferrer"&gt;dannwaneri/seo-agent&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Clone it, add your URLs to &lt;code&gt;input.csv&lt;/code&gt;, set &lt;code&gt;ANTHROPIC_API_KEY&lt;/code&gt; in your environment, run &lt;code&gt;pip install -r requirements.txt&lt;/code&gt;, run &lt;code&gt;playwright install chromium&lt;/code&gt;, then &lt;code&gt;python index.py&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;The freeCodeCamp tutorial walks through each module — browser integration, the Claude extraction prompt, the async link checker, the HITL logic. Link in the comments when it's live.&lt;/p&gt;




&lt;h2&gt;
  
  
  The shift worth naming
&lt;/h2&gt;

&lt;p&gt;Browser automation has been a developer tool for a decade. Playwright, Selenium, Puppeteer — all powerful, all requiring someone to write and maintain selectors. The moment a button's class name changes, the script breaks.&lt;/p&gt;

&lt;p&gt;This agent doesn't use selectors. It reads the page the way Claude reads it — semantically, through the accessibility tree. A "Submit" button is still a "Submit" button even if the CSS class changed.&lt;/p&gt;

&lt;p&gt;The extraction logic is in the prompt, not in the code.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Old way:&lt;/strong&gt; Automation breaks when the page changes.&lt;br&gt;
&lt;strong&gt;New way:&lt;/strong&gt; Reasoning adapts. The code doesn't need to.&lt;/p&gt;

&lt;p&gt;That's the actual shift. Not "AI does the work" but "the brittleness moved." From selectors to prompts. From maintenance to reasoning. The failure modes are different. So is the recovery.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Built this as the first in a series on practical local AI agent setups for agency operations. The freeCodeCamp step-by-step tutorial is coming. Repo is live now.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>python</category>
      <category>automation</category>
      <category>ai</category>
      <category>webdev</category>
    </item>
    <item>
      <title>Imposter Syndrome Didn't Go Away. It Got Quieter.</title>
      <dc:creator>Daniel Nwaneri</dc:creator>
      <pubDate>Thu, 26 Mar 2026 13:05:44 +0000</pubDate>
      <link>https://dev.to/dannwaneri/imposter-syndrome-didnt-go-away-it-got-quieter-4mcc</link>
      <guid>https://dev.to/dannwaneri/imposter-syndrome-didnt-go-away-it-got-quieter-4mcc</guid>
      <description>&lt;p&gt;I noticed something last year. The imposter syndrome posts disappeared.&lt;/p&gt;

&lt;p&gt;Not gradually. They were everywhere — "I've been coding for three years and I still Google how to center a div," "just got promoted to senior and I have no idea what I'm doing," threads with thousands of likes, people in the replies saying &lt;em&gt;me too, me too, me too&lt;/em&gt;. Then the LLMs arrived, and the posts stopped. I thought maybe the tools cured it. That we'd finally found the thing that made everyone feel competent.&lt;/p&gt;

&lt;p&gt;I was wrong. The feeling didn't go away. It just changed shape.&lt;/p&gt;




&lt;p&gt;The old imposter syndrome had a specific texture. You knew what you didn't know. You couldn't center a div, you'd never touched Kubernetes, you were faking it in standups about GraphQL. The gap was legible. You could name it, study it, fill it. That legibility is what made the posts shareable. "I don't know X" is a sentence you can say out loud.&lt;/p&gt;

&lt;p&gt;The new version is harder to name. You shipped the feature. The tests pass. Your manager is happy. But when someone asks you to walk through the implementation, something is off. You can narrate what the code does. You can't always explain why it's structured the way it is, what the model assumed, where it would break under pressure. The code is yours the same way a house is yours when you hired the contractor. You own it. You don't know every decision that went into the walls.&lt;/p&gt;

&lt;p&gt;Engineering culture has long tied ownership to authorship. You wrote the code, therefore you understood it. You understood it, therefore you were responsible for it. That chain held for decades. AI broke it — not dramatically, not all at once, but consistently enough that a new kind of doubt has moved in where the old one used to live.&lt;/p&gt;




&lt;p&gt;The data is starting to catch up to what developers already feel.&lt;/p&gt;

&lt;p&gt;METR previously published a paper finding that AI tools caused a 20% slowdown on tasks completed by experienced open-source developers — a result strange enough that they ran a follow-up study. That follow-up hit its own problem: developers refused to participate because they did not want to work without AI. Between 30% and 50% of developers told researchers they were declining to submit certain tasks because they didn't want to do them without it.&lt;/p&gt;

&lt;p&gt;Read that twice. Not "AI makes me faster." Not even "AI makes me better." Developers who cannot or will not attempt the work without the tool present. That's not productivity. That's dependency.&lt;/p&gt;

&lt;p&gt;Luciano Nooijen noticed this before the research did. He used AI tools heavily at work, stopped for a side project, and hit a wall. "I was feeling so stupid because things that used to be instinct became manual, sometimes even cumbersome." The instincts didn't fail slowly. They went quiet.&lt;/p&gt;




&lt;p&gt;AI coding tools can make you more productive. They can make you feel more confident. They can also produce developers who don't understand the context behind the code they've written or how to debug it.&lt;/p&gt;

&lt;p&gt;Stack Overflow called this "the illusion of expertise." I'd call it something slightly different. It's not that the expertise is fake. It's that the path that usually builds expertise — the struggle, the failure, the iteration — got shortened. When AI generates code without the developer engaging deeply with the implementation, they skip over all of those learning iterations. The developer doesn't struggle with the problem, doesn't try multiple approaches, doesn't experience the failures that build intuition.&lt;/p&gt;

&lt;p&gt;The old imposter syndrome was painful but self-correcting. You felt like a fraud, so you studied. You filled the gap. The new version doesn't have an obvious gap to fill. The code shipped. You're a senior developer with a good job and passing tests and no visible evidence that anything is wrong. The doubt is quieter. That's what makes it harder.&lt;/p&gt;




&lt;p&gt;We're living in the most empowering time to be a developer. The barrier to building things has never been lower. And yet, I've never felt less sure of my own skills.&lt;/p&gt;

&lt;p&gt;That's Pranav Reveendran, writing in December. He's not alone. A 2025 survey by Perceptyx found a "confidence gap" where 71% of employees use AI, but only 35% of individual contributors feel they understand it well enough. The adoption curve went up. The confidence curve didn't follow.&lt;/p&gt;

&lt;p&gt;This is what the posts were about, when people still wrote them. Not "do I belong in tech?" but "do I understand what I'm building?" The first question had a community around it. Thousands of replies, conferences, entire career tracks built around addressing it. The second question doesn't have a community yet. It's the thing people say in DMs but not in threads. It's the thing I've heard from developers who've been building for years: I know how to use the tools. I'm not sure I know the work anymore.&lt;/p&gt;




&lt;p&gt;I don't think this resolves neatly. The tools aren't going away and the productivity is real and I'm not arguing for anyone to stop using them. I use them. I built most of Foundation's evaluator with AI assistance and I'd do it again.&lt;/p&gt;

&lt;p&gt;But I think the silence is worth naming. The imposter syndrome posts didn't stop because the feeling stopped. They stopped because the feeling changed into something that's harder to share. "I don't know how to center a div" is embarrassing but legible. "I shipped a feature I can't fully explain" is a different kind of statement. It implicates the work, not just the person. It's harder to say &lt;em&gt;me too&lt;/em&gt; to.&lt;/p&gt;

&lt;p&gt;The old imposter syndrome asked: &lt;em&gt;am I good enough?&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;The new one asks: &lt;em&gt;is the work mine?&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;That's a harder question. And so far, it's mostly going unasked.&lt;/p&gt;

</description>
      <category>discuss</category>
      <category>career</category>
      <category>webdev</category>
      <category>ai</category>
    </item>
    <item>
      <title>The Future Is Not the Agent Using a Human Interface</title>
      <dc:creator>Daniel Nwaneri</dc:creator>
      <pubDate>Mon, 23 Mar 2026 13:44:20 +0000</pubDate>
      <link>https://dev.to/dannwaneri/the-future-is-not-the-agent-using-a-human-interface-48cg</link>
      <guid>https://dev.to/dannwaneri/the-future-is-not-the-agent-using-a-human-interface-48cg</guid>
      <description>&lt;p&gt;Carl Pei said it at SXSW last week. His company, Nothing, makes smartphones. He stood on stage and told the room: "The future is not the agent using a human interface. You need to create an interface for the agent to use."&lt;/p&gt;

&lt;p&gt;A consumer hardware CEO publicly declaring that the product category he sells is the wrong shape for what's coming. That's not a hot take. That's a company repositioning before the floor disappears.&lt;/p&gt;

&lt;p&gt;The room moved on. Most of the coverage focused on the "apps are dying" framing. That's the wrong thing to argue about. Apps are not dying next Tuesday. The question worth sitting with is quieter and more immediately useful: what does it mean that the thing we've been building — apps designed for human eyes, human fingers, human intuition — is increasingly being navigated by something that has none of those?&lt;/p&gt;




&lt;p&gt;For thirty years, software design started with a user. A person with a screen, a mouse or finger, a working memory of roughly four things, limited patience, inconsistent behavior. Every design decision flowed from that starting point. Navigation was visual because users have eyes. Flows were linear because users lose context. Confirmation dialogs exist because users make mistakes and need to undo them.&lt;/p&gt;

&lt;p&gt;Those constraints weren't arbitrary. They were load-bearing. The entire architecture of how software works was built around the specific limitations and capabilities of a human being sitting in front of it.&lt;/p&gt;

&lt;p&gt;Agents don't have eyes. They don't navigate menus — they call functions. They don't get confused by non-linear flows — they parse structured outputs. They don't need confirmation dialogs — they need permission boundaries defined at initialization, not presented as popups mid-task.&lt;/p&gt;

&lt;p&gt;When an agent tries to use an app designed for a human, it's doing something like what Pei described: hiring a genius employee and making them work using elevator buttons. The capability is real. The interface is friction. The agent scrapes what it can, simulates the clicks, and works around the design rather than with it.&lt;/p&gt;

&lt;p&gt;This works. Barely. Temporarily.&lt;/p&gt;




&lt;p&gt;The split is already visible in how developers describe their workflows.&lt;/p&gt;

&lt;p&gt;Karpathy, in a March podcast, described moving from writing code to delegating to agents running 16 hours a day. His framing: macro actions over repositories, not line-by-line editing. The unit of work is no longer a file. It's a task.&lt;/p&gt;

&lt;p&gt;Addy Osmani wrote about the same shift at the interface level: the control plane becoming the primary surface, the editor becoming one instrument underneath it. What used to be the center of developer work is becoming a specialized tool for specific moments — the deep inspection, the edge case, the thing the agent got almost right and subtly wrong.&lt;/p&gt;

&lt;p&gt;These descriptions share a structure: something that was primary becomes secondary. Something that was implicit becomes explicit. The developer who used to navigate the editor now supervises agents. The app that used to be the product now needs to also be a legible interface for non-human callers.&lt;/p&gt;




&lt;p&gt;Here's what that means practically, for anyone building software right now.&lt;/p&gt;

&lt;p&gt;The apps built for human navigation will still work for humans. That's not going away. But increasingly, those apps will also be called by agents acting on behalf of humans — booking the flight, filing the form, triggering the workflow. And when the agent calls your app, it doesn't navigate. It looks for a contract: what can I call, what will you return, what happens when I'm wrong.&lt;/p&gt;

&lt;p&gt;Most apps don't have that contract. They have a UI. They have an API if you're lucky. But the API was designed as a developer convenience, not as a primary interface for autonomous callers. The rate limits assume human usage patterns. The error responses assume a developer reading them. The authentication assumes a human with a session.&lt;/p&gt;

&lt;p&gt;None of those assumptions hold for agents.&lt;/p&gt;

&lt;p&gt;The developers who see this clearly are already building differently. Not abandoning the human interface — users still need screens, still need control surfaces, still need to understand what's happening on their behalf. But building the agent interface in parallel. Structured outputs. Explicit capability declarations. Error responses designed to be parsed, not read. Permission boundaries that don't require a human to click through them.&lt;/p&gt;
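&lt;p&gt;As a sketch of the difference, here is the same failure described twice: once for a reader, once for a parser. This is an illustrative shape, not an existing standard; the field names (&lt;code&gt;retryable&lt;/code&gt;, &lt;code&gt;allowed_actions&lt;/code&gt;) are assumptions:&lt;/p&gt;

```python
import json

def human_error():
    # Assumes a developer will read this and infer what to do next.
    return "Something went wrong. Please try again later."

def agent_error():
    # Assumes a program will match on fields and act without a human.
    return json.dumps({
        "error": "rate_limited",             # stable, machine-matchable code
        "retryable": True,                   # the caller's next move, stated
        "retry_after_seconds": 30,           # no prose to parse
        "allowed_actions": ["export.read"],  # explicit capability declaration
        "permission_boundary": "read_only",  # set at initialization, not mid-task
    })

payload = json.loads(agent_error())
assert payload["retryable"] is True
```

&lt;p&gt;The point is not the JSON. It is that every decision the caller needs (retry or not, wait how long, allowed to do what) is a field, not a sentence.&lt;/p&gt;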




&lt;p&gt;The spec is where this shows up first.&lt;/p&gt;

&lt;p&gt;A product spec written for human implementation describes features. What the user sees. What they can do. How the flow works. A spec written for agent implementation describes contracts. What the system accepts. What it returns. What it guarantees. Where the boundaries are.&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;[ASSUMPTION: ...]&lt;/code&gt; tags in spec-writer exist because that boundary — between what was specified and what the agent decided — is where agent implementations go wrong. Not in the happy path. In the assumptions that weren't stated because a human developer would have asked a clarifying question.&lt;/p&gt;

&lt;p&gt;An agent doesn't ask. It fills the gap with whatever its training suggests is most plausible. If the assumption was wrong, you find out in production.&lt;/p&gt;

&lt;p&gt;The spec that makes agent implementation reliable is the one that surfaces the assumptions before the agent starts. Not because agents are unreliable — they're remarkably capable within a well-defined contract. But because the contract has to be explicit in a way it never had to be when the implementer was human and could pick up the phone.&lt;/p&gt;




&lt;p&gt;Carl Pei's argument is about device form factors. The smartphone built around apps and home screens doesn't fit a world where agents intermediate between intention and execution.&lt;/p&gt;

&lt;p&gt;The same argument applies to every level of the stack.&lt;/p&gt;

&lt;p&gt;The database schema built for human-readable queries. The API designed for developer convenience. The workflow tool that requires clicking through five screens. The SaaS product whose entire value lives in a visual interface with no programmatic equivalent.&lt;/p&gt;

&lt;p&gt;None of these are broken today. They will accumulate friction as agent usage grows — the same way command-line tools accumulated friction when GUIs arrived, the same way desktop software accumulated friction when everything moved to the web.&lt;/p&gt;

&lt;p&gt;The difference is pace. The GUI transition took a decade. The web transition took another. The agent transition is compressing.&lt;/p&gt;




&lt;p&gt;The developers who will build the next layer are already asking a different question. Not "how does the user navigate this?" but "how does the agent call this?" Not "what does the confirmation dialog say?" but "what are the permission boundaries at initialization?" Not "how do we make the flow intuitive?" but "what does the contract guarantee?"&lt;/p&gt;

&lt;p&gt;These are not new questions. They're the questions API designers have always asked. What's new is that they now apply to everything — not just the backend service, but the whole product. The human interface remains. It becomes one layer, not the only layer.&lt;/p&gt;

&lt;p&gt;The future is not the agent using a human interface. The future is building interfaces designed for both — and knowing which decisions belong to which layer.&lt;/p&gt;

&lt;p&gt;The developers who get that distinction early will have built the right foundation before it becomes mandatory. The ones who don't will spend the next three years retrofitting contracts onto systems that were never designed to have them.&lt;/p&gt;

&lt;p&gt;That's the same pattern as every previous transition. The only thing that changes is how much runway you have before the friction becomes a crisis.&lt;/p&gt;

&lt;p&gt;Right now, you still have some.&lt;/p&gt;

</description>
      <category>webdev</category>
      <category>discuss</category>
      <category>ai</category>
      <category>architecture</category>
    </item>
    <item>
      <title>Your Agent Is Making Decisions Nobody Authorized</title>
      <dc:creator>Daniel Nwaneri</dc:creator>
      <pubDate>Thu, 19 Mar 2026 18:02:23 +0000</pubDate>
      <link>https://dev.to/dannwaneri/your-agent-is-making-decisions-nobody-authorized-2bc7</link>
      <guid>https://dev.to/dannwaneri/your-agent-is-making-decisions-nobody-authorized-2bc7</guid>
      <description>&lt;p&gt;A quant fund ran five independent strategies. Every one passed its individual risk limits. Every quarterly filing looked reasonable in isolation. But all five strategies were overweight the same sector.&lt;/p&gt;

&lt;p&gt;Aggregate exposure exceeded anything anyone had authorized — because the complexity budget was scoped per-strategy, never cross-strategy. No single decision was wrong. The aggregate outcome was unauthorized. Nobody was watching the right scope.&lt;/p&gt;

&lt;p&gt;This is governance debt. It accumulates invisibly. Each decision individually correct. Aggregate outcome unauthorized. The failure only surfaces when the sector moves against the fund and the concentration nobody explicitly built turns out to have been built anyway — one individually-reasonable decision at a time.&lt;/p&gt;

&lt;p&gt;The token economy has no accounting entry for this. Nothing on the infrastructure bill reflects what happened. The cost appears later, in a different quarter, a different system, a different team's incident report.&lt;/p&gt;




&lt;h2&gt;
  
  
  Two Clocks
&lt;/h2&gt;

&lt;p&gt;Every agent system has at least two clocks running simultaneously.&lt;/p&gt;

&lt;p&gt;The execution clock measures computation — tokens consumed, API calls made, latency per response. This is what most monitoring systems track. It is visible, quantifiable, and entirely the wrong thing to govern.&lt;/p&gt;

&lt;p&gt;The governance clock measures consequences — decisions made, thresholds crossed, exposures accumulated. This clock runs at a fundamentally different speed than execution. It is also running on a fundamentally different metric.&lt;/p&gt;

&lt;p&gt;Execution counts tokens. Governance counts consequences.&lt;/p&gt;

&lt;p&gt;Most agent architectures try to govern at execution-layer granularity. They instrument every API call, set token budgets, alert on cost spikes. The result is a monitoring system that costs more attention than the decisions it is protecting. The governance layer becomes noise. Worse than useless — it actively degrades decision quality by demanding attention on non-material changes.&lt;/p&gt;

&lt;p&gt;The fix is not better monitoring. It is a different master clock.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Master Clock
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://dev.to/vibeyclaw"&gt;Vic Chen&lt;/a&gt; builds institutional investor analysis tooling around quarterly SEC filing data. The production domain he works in — 13F analysis, the quarterly disclosures hedge funds file showing their equity positions — has solved the governance clock problem in a way most software systems haven't.&lt;/p&gt;

&lt;p&gt;The expensive thing in 13F analysis is not parsing the filings. That part is mechanical. Run the filing through the pipeline, extract the positions, compare against the previous quarter. Cheap. Deterministic. Fast.&lt;/p&gt;

&lt;p&gt;The expensive thing is deciding which signals actually warrant human attention. Every false positive costs analyst time. Every false negative costs trust in the system. That judgment call does not map cleanly to token costs. You cannot optimize it by switching to a cheaper model or batching the API calls. The bottleneck is not computation. It is consequence.&lt;/p&gt;

&lt;p&gt;The master clock in Vic's system is anchored to SEC reporting cadence — filing deadlines, amendment windows, restatement periods. The quarterly 13F disclosure cycle is one of the few externally-anchored materiality signals in finance. It already encodes a judgment: this change was significant enough to report.&lt;/p&gt;

&lt;p&gt;When a fund flips a position intra-quarter and back again, that event never surfaces in the filing. This is not a detection failure. The governance architecture is working correctly. An intra-quarter flip that reverses before the filing deadline was not a conviction change. The master clock — anchored to external cadence rather than internal computation — correctly filters it out.&lt;/p&gt;

&lt;p&gt;This is filtering by design rather than detection by volume. The governance layer is not trying to catch everything. It is anchored to what the domain has already decided matters.&lt;/p&gt;

&lt;p&gt;When the system detects an NT 13F — a late filing notification — or an amendment chain exceeding two revisions, the complexity budget automatically expands. Slower parsing. Deeper cross-referencing. Human-in-the-loop checkpoints. The external signal triggers the governance response because the external signal already encodes the materiality judgment.&lt;/p&gt;
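&lt;p&gt;The trigger logic is simple enough to sketch. This is a hypothetical reconstruction, not Vic's actual code; the event names and budget fields are assumptions:&lt;/p&gt;

```python
# Budget levels are illustrative. The external filing event, not internal
# computation volume, decides which one applies.
BASE_BUDGET = {"parse_depth": "fast", "cross_reference": False, "human_review": False}
EXPANDED_BUDGET = {"parse_depth": "deep", "cross_reference": True, "human_review": True}

def complexity_budget(filing_events):
    """Expand the budget only when an external materiality signal fires."""
    late_filing = "NT_13F" in filing_events                  # late filing notification
    long_amendment_chain = filing_events.count("13F_A") > 2  # more than two revisions
    if late_filing or long_amendment_chain:
        return dict(EXPANDED_BUDGET)
    return dict(BASE_BUDGET)

# A routine quarterly filing stays on the cheap, deterministic path:
assert complexity_budget(["13F"])["human_review"] is False
# A late-filing notice expands the budget automatically:
assert complexity_budget(["NT_13F", "13F"])["human_review"] is True
```

&lt;p&gt;Note what is absent: no token counter, no cost alert. The budget expands because the domain's own signal already decided the event was material.&lt;/p&gt;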

&lt;p&gt;The master clock is not a timer. It is a materiality filter.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Over-Calibration Trap
&lt;/h2&gt;

&lt;p&gt;The first failure mode most governance architectures hit is over-calibration.&lt;/p&gt;

&lt;p&gt;Early 13F monitoring systems flagged every rounding discrepancy between a fund's 13F and its 13D as potential drift. The noise-to-signal ratio made the governance layer worse than useless. Analysts were spending attention on non-material changes. The system meant to improve decision quality was degrading it.&lt;/p&gt;

&lt;p&gt;Vic's framing of the fix: governance cost should scale with expected loss from undetected drift, not with the volume of changes observed.&lt;/p&gt;

&lt;p&gt;A $50M position shift in a mega-cap is noise. A $50M position shift in a micro-cap is a thesis change. Same signal magnitude. Completely different materiality. The governance architecture has to encode that domain knowledge or it cannot distinguish between them.&lt;/p&gt;

&lt;p&gt;This is where the "significant drift threshold" becomes practical rather than theoretical. Without it, you are building a governance layer that is perpetually anxious — flagging everything, earning attention for nothing, training operators to ignore alerts. With it, the governance layer fires on what matters and stays quiet on what does not.&lt;/p&gt;

&lt;p&gt;The threshold is not a technical parameter. It is a domain judgment encoded into architecture.&lt;/p&gt;
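&lt;p&gt;A minimal sketch of that scaling, using relative position size as a stand-in for expected loss (a real system would encode a richer materiality model):&lt;/p&gt;

```python
def is_material(position_change_usd, market_cap_usd, threshold_pct=0.01):
    """Flag a shift only when it is large relative to the company.
    The 1% threshold is an illustrative assumption: a domain judgment
    encoded as a parameter, not a universal constant."""
    return abs(position_change_usd) / market_cap_usd > threshold_pct

MEGA_CAP = 2_000_000_000_000   # a $2T company
MICRO_CAP = 250_000_000        # a $250M company

assert is_material(50_000_000, MEGA_CAP) is False   # same $50M shift: noise
assert is_material(50_000_000, MICRO_CAP) is True   # same $50M shift: thesis change
```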




&lt;h2&gt;
  
  
  Governance Debt
&lt;/h2&gt;

&lt;p&gt;Back to the five-strategy fund.&lt;/p&gt;

&lt;p&gt;The token economy version of that failure is identical in structure. An agent that makes two expensive API calls but expands the decision space by introducing correlated hypotheses is under token budget and creating governance debt simultaneously. The cost does not appear on the infrastructure bill. It appears when the downstream decision fails in ways that trace back to the expanded hypothesis space nobody reviewed.&lt;/p&gt;

&lt;p&gt;An agent that makes 100 cheap API calls but narrows a decision space from 5,000 options to 3 is adding value regardless of token cost. An agent that makes two expensive calls but introduces unreviewed complexity is accruing debt regardless of token efficiency.&lt;/p&gt;

&lt;p&gt;The decision economy is the accounting system that captures the difference.&lt;/p&gt;

&lt;p&gt;Three signals that matter in the decision economy and do not appear in token accounting:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Decision scope.&lt;/strong&gt; How many downstream choices does this agent action constrain or enable? An action that narrows the decision space is value-generating. An action that expands the decision space without corresponding resolution is debt-generating. The token cost of both can be identical.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Consequence materiality.&lt;/strong&gt; Not all decisions are equal. The governance clock runs faster when the expected loss from undetected drift is higher. A rounding discrepancy in a mega-cap position and a position reversal in a micro-cap both generate the same token cost to detect. Their materiality is orders of magnitude apart.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Authorization scope.&lt;/strong&gt; Was this decision within the scope of what was explicitly authorized, or did it cross a threshold that required a decision nobody made? Governance debt accumulates at the boundary between authorized and implicitly permitted.&lt;/p&gt;
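&lt;p&gt;One way to make those three signals concrete is a per-action record. This is a hedged sketch; the field names are hypothetical, and the deliberate omission is that &lt;code&gt;token_cost&lt;/code&gt; plays no part in the debt calculation:&lt;/p&gt;

```python
from dataclasses import dataclass

@dataclass
class DecisionRecord:
    action: str
    options_before: int       # decision scope: choices open before the action
    options_after: int        # ...and after it
    expected_loss_usd: float  # consequence materiality if this drifts undetected
    authorized: bool          # within explicit authorization scope?
    token_cost: int = 0       # tracked for billing, not what governance counts

    def narrows_decision_space(self):
        return self.options_before > self.options_after

    def accrues_debt(self):
        # Debt: crossing the authorization boundary, or expanding the
        # decision space without resolution, at any token price.
        return (not self.authorized) or (self.options_after > self.options_before)

# 100 cheap calls that narrow 5,000 options to 3: value, not debt.
triage = DecisionRecord("rank vendors", 5000, 3, 0.0, authorized=True, token_cost=90_000)
assert triage.narrows_decision_space() and not triage.accrues_debt()

# Two expensive calls that quietly widen the hypothesis space: debt.
expand = DecisionRecord("add correlated hypotheses", 10, 40, 250_000.0,
                        authorized=False, token_cost=800)
assert expand.accrues_debt()
```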

&lt;p&gt;None of these signals are invisible. They require a different accounting system — one that tracks consequences rather than computation, materiality rather than volume, authorization scope rather than token spend.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Rate Problem
&lt;/h2&gt;

&lt;p&gt;Static agents accumulate governance debt slowly. An agent that makes the same decisions in the same scope every session creates predictable exposure. You can audit it. You can scope the complexity budget correctly once and leave it.&lt;/p&gt;

&lt;p&gt;An agent with memory that promotes knowledge automatically accumulates governance debt at the rate the memory compounds. Each session, the promoted knowledge expands the hypothesis space the agent operates in. Each expansion is individually reasonable. The aggregate effect is an authorization scope that widens faster than anyone is reviewing it.&lt;/p&gt;

&lt;p&gt;The governance layer has to keep pace with the learning rate or it falls further behind with every session — not linearly, but exponentially.&lt;/p&gt;
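&lt;p&gt;A toy model of that rate, with invented numbers: promoted knowledge compounds per session while review capacity stays flat. The 30% growth rate and the capacity of five are assumptions for illustration only:&lt;/p&gt;

```python
promoted = 10.0        # knowledge entries in the agent's memory
review_capacity = 5.0  # entries a human can audit per session
backlog = 0.0
history = []

for session in range(12):
    promoted = promoted * 1.3            # memory compounds each session
    newly_promoted = promoted * 0.3      # fraction promoted this session
    backlog = max(0.0, backlog + newly_promoted - review_capacity)
    history.append(backlog)

# Early sessions look fine; the reviewer keeps up. Then the compounding
# knowledge base outruns the flat review budget and the gap accelerates.
assert 2 > history[2]
assert history[-1] > 200
```

&lt;p&gt;Nothing in this loop is pathological on any single session. The debt is entirely in the rate.&lt;/p&gt;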

&lt;p&gt;This is why the evaluator architecture in Foundation anchors promotion criteria to human judgment rather than automating it. Not because automation is wrong in principle. Because the governance debt from unchecked automatic promotion compounds at the same rate as the knowledge base — and the rate is the problem, not any individual promoted insight.&lt;/p&gt;

&lt;p&gt;The master clock that works for 13F analysis works for agent memory for the same reason: it is anchored to external materiality judgment rather than internal computation speed. The filing cadence enforces a review interval that the domain has already validated. The human promotion gate enforces a review interval that the knowledge system needs.&lt;/p&gt;

&lt;p&gt;Both are the same governance architecture. One is thirty years old. One is being built now.&lt;/p&gt;




&lt;h2&gt;
  
  
  What the Decision Economy Actually Measures
&lt;/h2&gt;

&lt;p&gt;Token costs are visible on the infrastructure bill. Governance debt is not.&lt;/p&gt;

&lt;p&gt;This asymmetry creates predictable incentives. Teams optimize for what they can measure. Token spend gets instrumented, budgeted, alerted. Governance debt accumulates until it manifests as a production failure, a trust breakdown, a concentration exposure nobody authorized across five individually-reasonable decisions.&lt;/p&gt;

&lt;p&gt;The token economy prices execution correctly. It is the wrong accounting system for judgment costs because judgment costs do not appear until downstream consequences surface — often in a different quarter, a different system, a different team's incident report.&lt;/p&gt;

&lt;p&gt;The precedent that prices judgment correctly is not a token count. It is a record of what decisions were made under what conditions, what thresholds were crossed, what downstream exposure was created, and whether any of it was explicitly authorized.&lt;/p&gt;

&lt;p&gt;That record is what makes governance debt visible before it becomes a production problem.&lt;/p&gt;

&lt;p&gt;Execution counts tokens. Governance counts consequences.&lt;/p&gt;

&lt;p&gt;The decision economy is what you build when you understand the difference.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;This is part of a series on what AI actually changes in software development. Previous pieces: &lt;a href="https://dev.to/dannwaneri/the-gatekeeping-panic-what-ai-actually-threatens-in-software-development-5b9l"&gt;The Gatekeeping Panic&lt;/a&gt;, &lt;a href="https://dev.to/dannwaneri/the-meter-was-always-running-44c4"&gt;The Meter Was Always Running&lt;/a&gt;, &lt;a href="https://dev.to/dannwaneri/who-said-what-to-whom-5914"&gt;Who Said What to Whom&lt;/a&gt;, &lt;a href="https://dev.to/dannwaneri/the-token-economy-3cd9"&gt;The Token Economy&lt;/a&gt;, &lt;a href="https://dev.to/dannwaneri/building-the-evaluator-36kg"&gt;Building the Evaluator&lt;/a&gt;, &lt;a href="https://dev.to/dannwaneri/i-shipped-broken-code-and-wrote-an-article-about-it-98p"&gt;I Shipped Broken Code and Wrote an Article About It&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;The governance debt examples and 13F production evidence in this piece draw from analysis by &lt;a href="https://dev.to/vibeyclaw"&gt;Vic Chen&lt;/a&gt;, who builds institutional investor analysis tooling around quarterly SEC filing data. The five-strategy concentration example is his.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>webdev</category>
      <category>architecture</category>
      <category>discuss</category>
    </item>
    <item>
      <title>I Built a Skill That Writes Your Specs For You</title>
      <dc:creator>Daniel Nwaneri</dc:creator>
      <pubDate>Tue, 17 Mar 2026 11:35:53 +0000</pubDate>
      <link>https://dev.to/dannwaneri/i-built-a-skill-that-writes-your-specs-for-you-1o2n</link>
      <guid>https://dev.to/dannwaneri/i-built-a-skill-that-writes-your-specs-for-you-1o2n</guid>
      <description>&lt;p&gt;Julián Deangelis published a piece this week that hit 194k+ views in days. The argument: AI coding agents don't fail because the model is weak. They fail because the instructions are ambiguous.&lt;/p&gt;

&lt;p&gt;He called it Spec Driven Development. Four steps: specify what you want, plan how to build it technically, break it into tasks, implement one task at a time. Each step reduces ambiguity. By the time the agent starts writing code, it has everything it needs — what the feature does, what the edge cases are, what the tests should verify.&lt;/p&gt;

&lt;p&gt;The piece is right. The problem is the workflow takes discipline most sessions don't have. Under deadline pressure, the spec step disappears and you're back to prompting directly.&lt;/p&gt;

&lt;p&gt;So I built a skill that does the spec-writing for you.&lt;/p&gt;




&lt;h2&gt;
  
  
  The silent decisions problem
&lt;/h2&gt;

&lt;p&gt;Julián's example stuck with me:&lt;/p&gt;

&lt;p&gt;"Add a feature to manage items from the backoffice."&lt;/p&gt;

&lt;p&gt;The agent reads the codebase, picks a pattern, writes the feature. At first it looks fine. Then you click "add item" again and it inserts the same item twice. All the assumptions you thought were obvious were never in the prompt.&lt;/p&gt;

&lt;p&gt;Which backoffice — internal or seller-facing? Should the operation be idempotent? Admin-only or all users? Which storage layer? Which error handling strategy?&lt;/p&gt;

&lt;p&gt;Each one is a silent decision. The agent guesses. Some guesses are right. Some are wrong. And you don't find out which until the feature is in production behaving in ways you didn't expect.&lt;/p&gt;

&lt;p&gt;The spec is what makes those decisions visible before the agent guesses. But writing a spec from scratch, for every feature, before every session — that's the friction that kills the habit.&lt;/p&gt;




&lt;h2&gt;
  
  
  Generate first, flag assumptions inline
&lt;/h2&gt;

&lt;p&gt;The standard approach to spec tools is Q&amp;amp;A: the tool asks clarifying questions before generating anything. What's the user role? What's the auth model? What should happen on error?&lt;/p&gt;

&lt;p&gt;The problem with Q&amp;amp;A is you have to know what you don't know. If you knew exactly what questions to ask, you'd be close to having the spec already.&lt;/p&gt;

&lt;p&gt;spec-writer takes a different approach. It generates the full spec immediately and marks every decision it made that you didn't specify:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight gherkin"&gt;&lt;code&gt;&lt;span class="nf"&gt;Given &lt;/span&gt;an authenticated user requesting an export
&lt;span class="err"&gt;When the export contains more than 1,000 rows [ASSUMPTION&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="err"&gt;async&lt;/span&gt; &lt;span class="err"&gt;for&lt;/span&gt; &lt;span class="err"&gt;large&lt;/span&gt; &lt;span class="err"&gt;exports]&lt;/span&gt;
&lt;span class="nf"&gt;Then &lt;/span&gt;the export is generated asynchronously and the user is notified by email
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;At the end, every assumption is collected into a prioritized list:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="gu"&gt;## Assumptions to review&lt;/span&gt;
&lt;span class="p"&gt;
1.&lt;/span&gt; Async generation for exports over 1,000 rows — Impact: HIGH
   Correct this if: your order volumes are low and synchronous is fine
&lt;span class="p"&gt;
2.&lt;/span&gt; Date range filter is required — Impact: MEDIUM
   Correct this if: the first version should export all orders with no filtering
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Correcting a draft is faster than answering questions cold. The &lt;code&gt;[ASSUMPTION: ...]&lt;/code&gt; tags show you exactly where the agent would have guessed wrong — before it guesses. That's the value. Not the spec itself. The visible assumptions.&lt;/p&gt;




&lt;h2&gt;
  
  
  A real example
&lt;/h2&gt;

&lt;p&gt;I ran it on something I'm actually building — CLI capture for Foundation, my knowledge commons project. Foundation currently captures Claude.ai browser conversations via a browser extension. Claude Code sessions live in &lt;code&gt;~/.claude/projects/&lt;/code&gt; and disappear after 30 days. I needed to capture those too.&lt;/p&gt;

&lt;p&gt;The prompt:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;/spec-writer Add CLI capture for Claude Code sessions stored in ~/.claude/projects/
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Here's a slice of what came back:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;From the spec:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight gherkin"&gt;&lt;code&gt;&lt;span class="nf"&gt;Given &lt;/span&gt;a developer with Claude Code sessions in ~/.claude/projects/
&lt;span class="nf"&gt;When &lt;/span&gt;they run the Foundation CLI capture command
&lt;span class="nf"&gt;Then &lt;/span&gt;a list of uncaptured sessions is displayed, sorted by most recent first

&lt;span class="nf"&gt;Given &lt;/span&gt;a session already in Foundation
&lt;span class="nf"&gt;When &lt;/span&gt;the developer runs capture again
&lt;span class="nf"&gt;Then &lt;/span&gt;that session does not appear in the review list

&lt;span class="nf"&gt;Given &lt;/span&gt;a malformed .jsonl file
&lt;span class="nf"&gt;When &lt;/span&gt;the capture command processes it
&lt;span class="nf"&gt;Then &lt;/span&gt;an error is logged for that file and processing continues on remaining sessions
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;From the plan:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;CLI capture runs locally — it needs to read the local filesystem, which 
Cloudflare Workers cannot do. [ASSUMPTION: CLI capture is a local Node.js 
or Bun script that calls the Foundation API to insert sessions, rather 
than a Worker itself]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That assumption stopped me. I hadn't thought through the architecture split — Foundation runs on Cloudflare Workers, which can't read local filesystems. The skill surfaced it before I'd written a line of code.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The assumptions summary:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="p"&gt;1.&lt;/span&gt; CLI capture is a local script calling the Foundation API, not a Worker
   Impact: HIGH
   Correct this if: you want a purely serverless approach
&lt;span class="p"&gt;
2.&lt;/span&gt; Manual curation before capture, not automatic bulk import
   Impact: HIGH
   Correct this if: you want automatic background capture
&lt;span class="p"&gt;
3.&lt;/span&gt; Session ID from .jsonl filename is the deduplication key
   Impact: MEDIUM
   Correct this if: session IDs are stored differently in your schema
&lt;span class="p"&gt;
4.&lt;/span&gt; No sensitive data scrubbing in v1
   Impact: MEDIUM
   Correct this if: your sessions contain credentials or keys
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Four assumptions, two of them high-impact. The architecture one I would have hit mid-implementation. The sensitive data one I wouldn't have thought about until someone complained.&lt;/p&gt;

&lt;p&gt;That's the value. Not the spec itself. The visible assumptions.&lt;/p&gt;




&lt;h2&gt;
  
  
  How it fits into SDD
&lt;/h2&gt;

&lt;p&gt;spec-writer gets you to Spec-First immediately, with no ceremony. One command, full output, correct the assumptions, hand to the agent.&lt;/p&gt;

&lt;p&gt;Julián describes two levels beyond that. Spec-Anchored: the spec lives in the repo and evolves with the code. Spec-as-Source: the spec is the primary artifact, code is regenerated to match. If you want to move toward Spec-Anchored, save the output to &lt;code&gt;specs/feature-name.md&lt;/code&gt; in your repo. The skill produces something worth keeping.&lt;/p&gt;

&lt;p&gt;The methodology is Julián's. The skill is the friction remover.&lt;/p&gt;




&lt;h2&gt;
  
  
  Install
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;mkdir&lt;/span&gt; &lt;span class="nt"&gt;-p&lt;/span&gt; ~/.claude/skills
git clone https://github.com/dannwaneri/spec-writer.git ~/.claude/skills/spec-writer
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;/spec-writer [your feature description]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Works across Claude Code, Cursor, Gemini CLI, and any agent that supports the Agent Skills standard. The same SKILL.md file works everywhere.&lt;/p&gt;
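&lt;p&gt;For reference, the minimal shape of a skill under that standard looks something like this. This is a hypothetical sketch of the format, not the actual contents of spec-writer's SKILL.md (those are in the repo):&lt;/p&gt;

```markdown
---
name: spec-writer
description: Generates a complete feature spec from a one-line description,
  marking every unstated decision with an inline [ASSUMPTION] tag.
---

Generate the full spec immediately. Do not ask clarifying questions first.
Tag every decision the user did not specify as [ASSUMPTION: reason], and end
with a prioritized list of assumptions to review, each with an impact level.
```

&lt;p&gt;The frontmatter is what the agent reads to decide when the skill applies; the body is the instruction it follows once invoked.&lt;/p&gt;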

&lt;p&gt;The repo is at &lt;a href="https://github.com/dannwaneri/spec-writer" rel="noopener noreferrer"&gt;github.com/dannwaneri/spec-writer&lt;/a&gt;. The README has the full output format and a worked example.&lt;/p&gt;




&lt;p&gt;The agents are getting better at implementing. The bottleneck was always specification — knowing what to build precisely enough that the agent doesn't have to guess. spec-writer doesn't remove that requirement. It makes it faster to satisfy.&lt;/p&gt;

&lt;p&gt;The spec isn't the output. The assumptions are.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Built on the Spec Driven Development methodology — operationalized by tools like &lt;a href="https://github.com/github/spec-kit" rel="noopener noreferrer"&gt;GitHub Spec Kit&lt;/a&gt; and &lt;a href="https://github.com/Fission-AI/OpenSpec" rel="noopener noreferrer"&gt;OpenSpec&lt;/a&gt;. Julián Deangelis's writing on SDD at MercadoLibre was the direct inspiration.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Other skills: &lt;a href="https://github.com/dannwaneri/voice-humanizer" rel="noopener noreferrer"&gt;voice-humanizer&lt;/a&gt; — checks your writing against your own voice, not generic AI patterns.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>productivity</category>
      <category>webdev</category>
      <category>claudecode</category>
      <category>agentskills</category>
    </item>
  </channel>
</rss>
