<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Daniel Nwaneri</title>
    <description>The latest articles on DEV Community by Daniel Nwaneri (@dannwaneri).</description>
    <link>https://dev.to/dannwaneri</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3606168%2F7684e1e1-b986-4ee3-ae5b-56db2b97d286.jpg</url>
      <title>DEV Community: Daniel Nwaneri</title>
      <link>https://dev.to/dannwaneri</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/dannwaneri"/>
    <language>en</language>
    <item>
      <title>Hi @ben. You liked this yesterday: https://dev.to/dannwaneri/the-loop-is-not-the-product-466d
Hours later, Sloan flagged it as AI-generated.
Your founder liked the piece. Your bot flagged it.
Sloan isn't catching AI. It's catching good writing.....</title>
      <dc:creator>Daniel Nwaneri</dc:creator>
      <pubDate>Wed, 10 Jun 2026 01:45:57 +0000</pubDate>
      <link>https://dev.to/dannwaneri/hi-ben-you-liked-this-yesterday-httpsdevtodannwanerithe-loop-is-not-the-product-466d-2cpo</link>
      <guid>https://dev.to/dannwaneri/hi-ben-you-liked-this-yesterday-httpsdevtodannwanerithe-loop-is-not-the-product-466d-2cpo</guid>
      <description>&lt;div class="ltag__link--embedded"&gt;
  &lt;div class="crayons-story "&gt;
  &lt;a href="https://dev.to/dannwaneri/the-loop-is-not-the-product-466d" class="crayons-story__hidden-navigation-link"&gt;The Loop Is Not the Product&lt;/a&gt;


  &lt;div class="crayons-story__body crayons-story__body-full_post"&gt;
      &lt;a href="https://dev.to/dannwaneri/the-loop-is-not-the-product-466d" class="crayons-article__context-note crayons-article__context-note__feed"&gt;&lt;p&gt;AI compute costs vs human labor&lt;/p&gt;

&lt;/a&gt;
    &lt;div class="crayons-story__top"&gt;
      &lt;div class="crayons-story__meta"&gt;
        &lt;div class="crayons-story__author-pic"&gt;

          &lt;a href="/dannwaneri" class="crayons-avatar  crayons-avatar--l  "&gt;
            &lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3606168%2F7684e1e1-b986-4ee3-ae5b-56db2b97d286.jpg" alt="dannwaneri profile" class="crayons-avatar__image"&gt;
          &lt;/a&gt;
        &lt;/div&gt;
        &lt;div&gt;
          &lt;div&gt;
            &lt;a href="/dannwaneri" class="crayons-story__secondary fw-medium m:hidden"&gt;
              Daniel Nwaneri
            &lt;/a&gt;
            &lt;div class="profile-preview-card relative mb-4 s:mb-0 fw-medium hidden m:inline-block"&gt;
              
                Daniel Nwaneri
                &lt;a href="/++"&gt;&lt;img alt="Subscriber" class="subscription-icon" src="https://assets.dev.to/assets/subscription-icon-805dfa7ac7dd660f07ed8d654877270825b07a92a03841aa99a1093bd00431b2.png"&gt;&lt;/a&gt;
              
              &lt;div id="story-author-preview-content-3846240" class="profile-preview-card__content crayons-dropdown branded-7 p-4 pt-0"&gt;
                &lt;div class="gap-4 grid"&gt;
                  &lt;div class="-mt-4"&gt;
                    &lt;a href="/dannwaneri" class="flex"&gt;
                      &lt;span class="crayons-avatar crayons-avatar--xl mr-2 shrink-0"&gt;
                        &lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3606168%2F7684e1e1-b986-4ee3-ae5b-56db2b97d286.jpg" class="crayons-avatar__image" alt=""&gt;
                      &lt;/span&gt;
                      &lt;span class="crayons-link crayons-subtitle-2 mt-5"&gt;Daniel Nwaneri&lt;/span&gt;
                    &lt;/a&gt;
                  &lt;/div&gt;
                  &lt;div class="print-hidden"&gt;
                    
                      Follow
                    
                  &lt;/div&gt;
                  &lt;div class="author-preview-metadata-container"&gt;&lt;/div&gt;
                &lt;/div&gt;
              &lt;/div&gt;
            &lt;/div&gt;

          &lt;/div&gt;
          &lt;a href="https://dev.to/dannwaneri/the-loop-is-not-the-product-466d" class="crayons-story__tertiary fs-xs"&gt;&lt;time&gt;Jun 9&lt;/time&gt;&lt;span class="time-ago-indicator-initial-placeholder"&gt;&lt;/span&gt;&lt;/a&gt;
        &lt;/div&gt;
      &lt;/div&gt;

    &lt;/div&gt;

    &lt;div class="crayons-story__indention"&gt;
      &lt;h2 class="crayons-story__title crayons-story__title-full_post"&gt;
        &lt;a href="https://dev.to/dannwaneri/the-loop-is-not-the-product-466d" id="article-link-3846240"&gt;
          The Loop Is Not the Product
        &lt;/a&gt;
      &lt;/h2&gt;
        &lt;div class="crayons-story__tags"&gt;
            &lt;a class="crayons-tag crayons-tag--filled  " href="/t/discuss"&gt;&lt;span class="crayons-tag__prefix"&gt;#&lt;/span&gt;discuss&lt;/a&gt;
            &lt;a class="crayons-tag  crayons-tag--monochrome " href="/t/ai"&gt;&lt;span class="crayons-tag__prefix"&gt;#&lt;/span&gt;ai&lt;/a&gt;
            &lt;a class="crayons-tag  crayons-tag--monochrome " href="/t/webdev"&gt;&lt;span class="crayons-tag__prefix"&gt;#&lt;/span&gt;webdev&lt;/a&gt;
            &lt;a class="crayons-tag  crayons-tag--monochrome " href="/t/productivity"&gt;&lt;span class="crayons-tag__prefix"&gt;#&lt;/span&gt;productivity&lt;/a&gt;
        &lt;/div&gt;
      &lt;div class="crayons-story__bottom"&gt;
        &lt;div class="crayons-story__details"&gt;
          &lt;a href="https://dev.to/dannwaneri/the-loop-is-not-the-product-466d" class="crayons-btn crayons-btn--s crayons-btn--ghost crayons-btn--icon-left"&gt;
            &lt;div class="multiple_reactions_aggregate"&gt;
              &lt;span class="multiple_reactions_icons_container"&gt;
                  &lt;span class="crayons_icon_container"&gt;
                    &lt;img src="https://assets.dev.to/assets/exploding-head-daceb38d627e6ae9b730f36a1e390fca556a4289d5a41abb2c35068ad3e2c4b5.svg" width="18" height="18"&gt;
                  &lt;/span&gt;
                  &lt;span class="crayons_icon_container"&gt;
                    &lt;img src="https://assets.dev.to/assets/multi-unicorn-b44d6f8c23cdd00964192bedc38af3e82463978aa611b4365bd33a0f1f4f3e97.svg" width="18" height="18"&gt;
                  &lt;/span&gt;
                  &lt;span class="crayons_icon_container"&gt;
                    &lt;img src="https://assets.dev.to/assets/sparkle-heart-5f9bee3767e18deb1bb725290cb151c25234768a0e9a2bd39370c382d02920cf.svg" width="18" height="18"&gt;
                  &lt;/span&gt;
              &lt;/span&gt;
              &lt;span class="aggregate_reactions_counter"&gt;16&lt;span class="hidden s:inline"&gt;&amp;nbsp;reactions&lt;/span&gt;&lt;/span&gt;
            &lt;/div&gt;
          &lt;/a&gt;
            &lt;a href="https://dev.to/dannwaneri/the-loop-is-not-the-product-466d#comments" class="crayons-btn crayons-btn--s crayons-btn--ghost crayons-btn--icon-left flex items-center"&gt;
              

              19&lt;span class="hidden s:inline"&gt;&amp;nbsp;comments&lt;/span&gt;
            &lt;/a&gt;
        &lt;/div&gt;
        &lt;div class="crayons-story__save"&gt;
          &lt;small class="crayons-story__tertiary fs-xs mr-2"&gt;
            6 min read
          &lt;/small&gt;
            
              &lt;span class="bm-initial crayons-icon c-btn__icon"&gt;
                

              &lt;/span&gt;
              &lt;span class="bm-success crayons-icon c-btn__icon"&gt;
                

              &lt;/span&gt;
            
        &lt;/div&gt;
      &lt;/div&gt;
    &lt;/div&gt;
  &lt;/div&gt;
&lt;/div&gt;

&lt;/div&gt;


</description>
    </item>
    <item>
      <title>The Loop Is Not the Product</title>
      <dc:creator>Daniel Nwaneri</dc:creator>
      <pubDate>Tue, 09 Jun 2026 14:35:13 +0000</pubDate>
      <link>https://dev.to/dannwaneri/the-loop-is-not-the-product-466d</link>
      <guid>https://dev.to/dannwaneri/the-loop-is-not-the-product-466d</guid>
      <description>&lt;p&gt;A tweet landed on my timeline from Peter Steinberger — OpenClaw founder, now at OpenAI:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"Here's your monthly reminder that you shouldn't be prompting coding agents anymore. You should be designing loops that prompt your agents."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;He's right about the mechanic. He's not asking the harder question.&lt;/p&gt;




&lt;p&gt;Before agents, we had cron jobs.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;0 2 * * * ./process_reports.sh&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;That's the whole contract. Run at 2am. Do what you said. Fail loudly or silently. Nobody wrote a think piece about cron jobs disrupting knowledge work. Nobody raised a seed round on a well-tuned crontab.&lt;/p&gt;

&lt;p&gt;But structurally? A cron job is a loop that prompts a process on a schedule. It just had the decency to be honest about what it was.&lt;/p&gt;

&lt;p&gt;Cron jobs → Airflow → event-driven pipelines → agents. Each layer added adaptability and removed legibility. Cron is maximally legible. You can read the entire logic in one line. An agent doing "the same job" is a probability distribution with a system prompt and a credit card attached.&lt;/p&gt;




&lt;p&gt;Now we've gone further. We have multi-agent systems. Specialist agents. Orchestrator agents that decide which specialist to call. Verification agents that check the output. Agents that self-correct when they fail.&lt;/p&gt;

&lt;p&gt;And companies are quietly running the math and going pale.&lt;/p&gt;

&lt;p&gt;Uber burned through its entire annual AI budget in four months. An NVIDIA vice president said publicly that AI computing costs now exceed employee labor costs. The FinOps Foundation's 2026 State of FinOps report found 73% of enterprises say AI costs exceeded original projections. Not a few bad actors. Not early adopters who didn't know better. Seventy-three percent.&lt;/p&gt;

&lt;p&gt;The mechanism has a name now: the agentic loop multiplier. A simple query in 2023 cost $0.04 per interaction. A multi-step orchestrated agent workflow in 2026 costs $1.20 — thirty times higher. Gartner puts the range at 5-30x more tokens per task than the chatbot pilots that justified the budget. The ROI calculations that approved the deployment assumed chatbot-level consumption. The invoices arrived with agent-level reality.&lt;/p&gt;

&lt;p&gt;A mid-level developer runs $80-120k. Fully loaded with benefits and overhead, maybe $250k. That sounds expensive until the token bill lands.&lt;/p&gt;
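
&lt;p&gt;To make that comparison concrete, here is a rough break-even calculation that takes the figures above at face value: $0.04 per simple query, $1.20 per orchestrated agent workflow, and a $250k fully loaded developer. It works in integer cents so the arithmetic stays exact. The function name is illustrative, and real bills depend on volume, retries and model pricing.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;
```typescript
// Rough break-even sketch using the article's figures (assumptions, not measurements).
// All money values are integer cents so the arithmetic stays exact.
const CHATBOT_CENTS_PER_TASK = 4; // $0.04 per simple query (2023 figure)
const AGENT_CENTS_PER_TASK = 120; // $1.20 per orchestrated workflow (2026 figure)
const LOADED_SALARY_CENTS = 250_000 * 100; // ~$250k fully loaded developer

// How many tasks per year before the token bill equals one salary?
function breakEvenTasks(centsPerTask: number, budgetCents: number): number {
  return Math.ceil(budgetCents / centsPerTask);
}

const multiplier = AGENT_CENTS_PER_TASK / CHATBOT_CENTS_PER_TASK;
const agentTasksPerYear = breakEvenTasks(AGENT_CENTS_PER_TASK, LOADED_SALARY_CENTS);
const chatbotTasksPerYear = breakEvenTasks(CHATBOT_CENTS_PER_TASK, LOADED_SALARY_CENTS);
const agentTasksPerDay = Math.ceil(agentTasksPerYear / 365);

console.log(`multiplier: ${multiplier}x`);
console.log(`agent break-even: ${agentTasksPerYear} tasks/year (~${agentTasksPerDay}/day)`);
console.log(`chatbot break-even: ${chatbotTasksPerYear} tasks/year`);
```
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Under those assumptions, an agent fleet matches one developer's fully loaded cost at roughly 208,000 workflow runs a year, about 571 a day, against 6.25 million runs at the chatbot-era price. Retries and coordination overhead raise the per-task cost, which pulls the break-even point lower still.&lt;/p&gt;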

&lt;p&gt;The human compounds. They learn your codebase, your culture, your shortcuts. They remember the decision you made last quarter and why. The agent starts fresh every session. Every morning you're paying for the same orientation meeting. Context reconstruction (re-reading docs, re-loading state, re-establishing what "done" means) isn't free. You're being billed for memory the human already had.&lt;/p&gt;

&lt;p&gt;The demo never shows you this. The demo is a single agent, single task, cherry-picked problem, running for 90 seconds while someone claps at a conference. The production reality is a fleet burning tokens on retries, tool calls that fail and get reattempted, coordination overhead between agents nobody budgeted for.&lt;/p&gt;

&lt;p&gt;You've built a bureaucracy. A token-denominated bureaucracy with no union and no lunch breaks and no salary cap.&lt;/p&gt;




&lt;p&gt;Back to Steinberger's tweet.&lt;/p&gt;

&lt;p&gt;"Designing loops that prompt your agents" is a real architectural upgrade over manual prompting. If you're still narrating every step to an agent like you're dictating to a secretary, the loop is the upgrade. Prompts from state — test results, diffs, error logs — not from you typing.&lt;/p&gt;

&lt;p&gt;But designing the loop is just procrastination with better posture if there's no customer at the end of it.&lt;/p&gt;

&lt;p&gt;Because someone still has to decide what the loop optimizes for. What "done" looks like. When to break. What counts as a failure worth stopping for. That's not automation — that's system design with higher stakes, because now the mistakes compound before anyone sees them.&lt;/p&gt;

&lt;p&gt;And "designing loops" is genuinely hard in a way prompting isn't. Most people who can write a good prompt cannot design a feedback loop with appropriate exit conditions, cost governors, and human checkpoints. The tweet makes the upgrade sound like switching from tabs to spaces. It's closer to switching from writing functions to designing distributed systems.&lt;/p&gt;

&lt;p&gt;What I want to know: what breaks in the loop that a prompt would have caught? Every abstraction hides something. Prompting hides scale. Loops hide drift. At some point the agent has been running for six hours optimizing a metric nobody remembers choosing, and the loop is beautiful and the output is garbage.&lt;/p&gt;




&lt;p&gt;Here's what nobody in the agent hype cycle wants to sit with:&lt;/p&gt;

&lt;p&gt;The old model had a forcing function built in. You shipped, a human used it, something broke, you fixed it. Feedback was physical. A user opened a ticket. A client called. Reality interrupted the loop.&lt;/p&gt;

&lt;p&gt;Agents don't have that governor. The loop is the product. And when the loop is the product, you can optimize indefinitely without ever confronting whether the output matters.&lt;/p&gt;

&lt;p&gt;Token burn becomes a proxy for progress. Iteration velocity becomes a stand-in for value creation. The agent looks productive because it never stops — but stopping is exactly what would force the question.&lt;/p&gt;

&lt;p&gt;Autonomy used to mean delegated judgment. You trust someone to make calls because they understand the goal and can feel when something's off. What most agents have is delegated execution. They can do the steps. They have no stake in the outcome, no access to the silence that follows a bad result, no way to know the customer churned three weeks later because the feature was technically correct and completely wrong.&lt;/p&gt;




&lt;p&gt;Automate the tedious middle of a known, stable process. Data pipeline, alert triage, code linting, content reformatting. Stuff where the definition of done is actually defined. That's real. That's useful. A cron job with taste.&lt;/p&gt;

&lt;p&gt;The inflated version — the one burning the tokens — is the agent as a substitute for product thinking. If you don't know what to build, an agent that builds constantly feels like momentum.&lt;/p&gt;

&lt;p&gt;It isn't. It's expensive randomness with good logging.&lt;/p&gt;




&lt;p&gt;Consider Spotify.&lt;/p&gt;

&lt;p&gt;A company that built its entire brand on one rule: only ship what users ask for. Feature requests drove the roadmap. That's it.&lt;/p&gt;

&lt;p&gt;Then AI became mainstream and the calculus changed publicly. Spotify's workforce went from 7,721 employees at the start of 2024 to 7,242 by Q3 — shrinking every quarter while revenue grew 19% year over year. Their filings note it plainly: profitability driven by "lower personnel and related costs." They're doing more with fewer people. The numbers look good on a slide.&lt;/p&gt;

&lt;p&gt;But nobody's asking the follow-up question. The features that built Spotify's loyalty, like Discover Weekly, came from people who understood the product, the listener, and the culture of music discovery. Accumulated judgment. What does the agent fleet ship? Which user asked for it? What happens when "only build what users want" gets replaced by "ship what the loop produces"?&lt;/p&gt;

&lt;p&gt;We don't know yet. The invoices look better. The product debt is still accumulating.&lt;/p&gt;




&lt;p&gt;I built &lt;a href="https://github.com/dannwaneri/seo-agent" rel="noopener noreferrer"&gt;seo-agent&lt;/a&gt; — an open-source SEO audit agent using Python, Browser Use, Claude API, and Playwright.&lt;/p&gt;

&lt;p&gt;I could leave it burning tokens 24/7. I didn't. Not because of the money. Because I couldn't answer the basic question: what would it actually be doing?&lt;/p&gt;

&lt;p&gt;I wired a cron job to run it on a schedule. It analyzes logs. It surfaces what's broken. Then I look at the output, decide what matters, and go into my codebase with Claude Code to write the fix and the test. The agent handles the tedious middle. I handle the judgment at the edges.&lt;/p&gt;

&lt;p&gt;Call that old-fashioned. I'd call it honest.&lt;/p&gt;

&lt;p&gt;The loop runs. But it runs to me. Not into a void.&lt;/p&gt;




&lt;p&gt;My Bookmark Brain, a RAG system built over 50,000 of my own X bookmarks, flagged this pattern when I showed it the tweet:&lt;/p&gt;

&lt;p&gt;&lt;em&gt;"Designing the loop is just procrastination with better posture if there's no customer at the end of it. Automated nobody is still nobody."&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;The stack was never the problem. It was always the most comfortable place to hide from the problem.&lt;/p&gt;

&lt;p&gt;Cron jobs ran quietly and failed loudly. Agents run loudly and fail quietly. The failure is just spread across enough API calls that the bill arrives before the reckoning does.&lt;/p&gt;

&lt;p&gt;Design better loops. Ship to someone who asked.&lt;/p&gt;




&lt;p&gt;This article used AI tools for research verification and editing.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>webdev</category>
      <category>discuss</category>
      <category>productivity</category>
    </item>
    <item>
      <title>I built a tool to stop Claude from forgetting everything then forgot about it myself</title>
      <dc:creator>Daniel Nwaneri</dc:creator>
      <pubDate>Wed, 03 Jun 2026 14:45:42 +0000</pubDate>
      <link>https://dev.to/dannwaneri/i-built-a-tool-to-stop-claude-from-forgetting-everything-then-forgot-about-it-myself-2e7f</link>
      <guid>https://dev.to/dannwaneri/i-built-a-tool-to-stop-claude-from-forgetting-everything-then-forgot-about-it-myself-2e7f</guid>
      <description>&lt;p&gt;&lt;em&gt;This is a submission for the &lt;a href="https://dev.to/challenges/github-2026-05-21"&gt;GitHub Finish-Up-A-Thon Challenge&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;I wired edge-context-mode into my own Claude Code setup. Then I stopped using it. Not because it didn't work — because I didn't understand what I'd built.&lt;/p&gt;

&lt;p&gt;Six weeks later, this challenge made me come back. What I found: a tool that was half-finished, a Durable Object secretly lying to me, and — when I finally looked at the code honestly — something actually worth finishing.&lt;/p&gt;




&lt;h2&gt;
  
  
  What I Built
&lt;/h2&gt;

&lt;p&gt;The problem is simple to describe and annoying to live with.&lt;/p&gt;

&lt;p&gt;You start a session with Claude. You read a file, run a command, ask a question. Twenty minutes in, the answers get worse. Forty minutes in, it's forgotten the context. An hour in, you're hitting limits and starting over.&lt;/p&gt;

&lt;p&gt;The cause: &lt;strong&gt;raw output floods the context window&lt;/strong&gt;. A &lt;code&gt;cat&lt;/code&gt; on a 500-line file puts 500 lines in context. &lt;code&gt;npm list&lt;/code&gt; adds 200 more. &lt;code&gt;git log&lt;/code&gt; adds more. The context fills with output the LLM will never reference again, and the things that actually matter — decisions, architecture, what you chose and why — get pushed out.&lt;/p&gt;

&lt;p&gt;edge-context-mode intercepts that:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Normal:        cat large-file.ts  →  500 lines flood into context
edge-context:  ctx_execute(...)   →  [ctx:ab3f9x] + "12 line(s): interface User..."
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The raw output goes to Cloudflare D1 at the edge. The LLM gets a reference token and a 50-word summary. Context stays clean. Sessions stay coherent.&lt;/p&gt;
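
&lt;p&gt;A rough sketch of that interception step, assuming the summary format shown above (a line count followed by the start of the output) and a 50-word cap. &lt;code&gt;makeRef&lt;/code&gt;, &lt;code&gt;summarize&lt;/code&gt; and &lt;code&gt;intercept&lt;/code&gt; are hypothetical helpers, and the returned entry stands in for the row written to D1; the real tool's ID scheme and summarizer may differ.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;
```typescript
// Hypothetical helpers sketching how raw command output could become a reference + summary.
// Only the visible "[ctx:id]" and "N line(s): ..." formats come from the post.

// Build a "[ctx:...]" token from random base-36 characters.
function makeRef(length = 10): string {
  let id = "";
  while (id.length !== length) {
    id += Math.floor(Math.random() * 36).toString(36);
  }
  return `[ctx:${id}]`;
}

// Summarize as "N line(s): first words..." with at most maxWords words.
function summarize(raw: string, maxWords = 50): string {
  const lines = raw.split("\n").filter((line) => line.trim() !== "");
  const words = lines.join(" ").split(/\s+/).filter(Boolean);
  const head = words.slice(0, maxWords).join(" ");
  const ellipsis = words.length > maxWords ? "..." : "";
  return `${lines.length} line(s): ${head}${ellipsis}`;
}

interface StoredEntry {
  ref: string;
  summary: string;
  rawOutput: string; // kept in storage (D1 in the real tool), never sent to the model
}

// Only the ref and the one-line summary go back into the model's context.
function intercept(raw: string): { entry: StoredEntry; contextText: string } {
  const entry: StoredEntry = { ref: makeRef(), summary: summarize(raw), rawOutput: raw };
  return { entry, contextText: `${entry.ref}\n${entry.summary}` };
}

// Example: 500 lines of output shrink to two lines of context.
const fakeOutput = Array.from({ length: 500 }, (_, i) => `line ${i + 1}`).join("\n");
const { contextText } = intercept(fakeOutput);
console.log(contextText);
```
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;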

&lt;p&gt;Search is hybrid: D1's FTS5 gives you BM25 keyword matching out of the box. Pair it with &lt;a href="https://github.com/dannwaneri/vectorize-mcp-worker" rel="noopener noreferrer"&gt;vectorize-mcp-worker&lt;/a&gt; — another tool I built — and &lt;code&gt;ctx_search&lt;/code&gt; runs semantic vector search on top. Same stored data, same reference tokens. The retrieval layer just gets smarter.&lt;/p&gt;
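
&lt;p&gt;A minimal sketch of one way to merge the two result lists, assuming reciprocal rank fusion (RRF), a common technique for hybrid keyword-plus-vector search; it is not necessarily what &lt;code&gt;ctx_search&lt;/code&gt; does. Because both layers return the same reference tokens, fusing by ref needs no extra bookkeeping. The constant &lt;code&gt;k = 60&lt;/code&gt; is the value usually quoted for RRF.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;
```typescript
// Reciprocal rank fusion (RRF): each ranking adds 1 / (k + rank) to a ref's score.
// Assumption: each ranking is a list of ctx refs, best match first.
function fuseRankings(rankings: string[][], k = 60): string[] {
  const scores: { [ref: string]: number } = {};
  for (const ranking of rankings) {
    ranking.forEach((ref, index) => {
      // Ranks are 1-based, so the top hit contributes 1 / (k + 1).
      scores[ref] = (scores[ref] ?? 0) + 1 / (k + index + 1);
    });
  }
  // Highest fused score first; ties keep first-seen order because sort is stable.
  return Object.keys(scores).sort((a, b) => scores[b] - scores[a]);
}

// Example: one list from FTS5/BM25, one from the vector index.
const keywordHits = ["[ctx:mw71nx4d2q]", "[ctx:kx92ma3b1p]"];
const vectorHits = ["[ctx:kx92ma3b1p]", "[ctx:ab3f9x]"];
// kx92ma3b1p ranks first because it appears in both lists.
console.log(fuseRankings([keywordHits, vectorHits]));
```
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;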

&lt;p&gt;That's the design. What was actually shipped in April was a different story.&lt;/p&gt;




&lt;h3&gt;
  
  
  This directly addresses the compaction problem
&lt;/h3&gt;

&lt;p&gt;If you've used Claude Code for a long session, you've probably seen this:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;"Anyone else notice that compaction seems to lose more details than normal? It never seemed to matter before, but I'm seeing it frequently now."&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That's the same problem, one layer up. When Claude Code hits context limits, it compacts — auto-summarises the conversation to make room. The details that disappear are the exact things that matter: error messages from 30 minutes ago, what was tried and failed, the architectural choice that explains why the code looks the way it does.&lt;/p&gt;

&lt;p&gt;They disappear because they were sitting raw in the context window. Compaction summarises them aggressively and the specifics are gone.&lt;/p&gt;

&lt;p&gt;edge-context-mode attacks this in two ways. First, prevention: every &lt;code&gt;ctx_execute&lt;/code&gt; call keeps raw output out of the context entirely — only a 50-word summary and a reference token go in. Less in context means compaction triggers less often. Second, survival: everything stored via &lt;code&gt;ctx_execute&lt;/code&gt; and &lt;code&gt;ctx_annotate&lt;/code&gt; lives in D1, outside the context. Compaction can't touch it. After compaction wipes your conversation:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;ctx_history(session_id: "myproject-2026-05-22")
→ full chronological list of everything that happened, pulled from D1

ctx_reflect(session_id: "myproject-2026-05-22")
→ "Session has 14 entries over ~47 min. Fixed D1 FK constraint, added
   ctx_get tool, updated README, decided 512KB cap on raw_output..."
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The session memory didn't compact. Only the conversation did.&lt;/p&gt;
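
&lt;p&gt;The reason this survives compaction is that history and reflection are queries over stored rows, not over the conversation. A sketch under that framing, with a hypothetical row shape and an in-memory array standing in for the D1 table; the reflect function only reproduces the header line from the example above.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;
```typescript
// Hypothetical row shape; the real D1 schema may use different columns and type names.
interface ContextEntry {
  id: string;
  sessionId: string;
  type: "execute" | "annotation"; // "execute" is an assumed name for command entries
  summary: string;
  createdAt: number; // Unix epoch milliseconds
}

// ctx_history-style view: every entry for one session, oldest first.
function sessionHistory(entries: ContextEntry[], sessionId: string): ContextEntry[] {
  return entries
    .filter((e) => e.sessionId === sessionId)
    .sort((a, b) => a.createdAt - b.createdAt);
}

// ctx_reflect-style header: entry count and rough duration in minutes.
function reflectHeader(history: ContextEntry[]): string {
  if (history.length === 0) return "Session has 0 entries.";
  const first = history[0].createdAt;
  const last = history[history.length - 1].createdAt;
  const minutes = Math.round((last - first) / 60_000);
  return `Session has ${history.length} entries over ~${minutes} min.`;
}

// Example rows, deliberately out of order and mixed with another session.
const t0 = Date.UTC(2026, 4, 22, 9, 0);
const rows: ContextEntry[] = [
  { id: "b", sessionId: "myproject-2026-05-22", type: "annotation", summary: "decided 512KB cap", createdAt: t0 + 47 * 60_000 },
  { id: "a", sessionId: "myproject-2026-05-22", type: "execute", summary: "1 line(s): ...", createdAt: t0 },
  { id: "c", sessionId: "other-session", type: "execute", summary: "unrelated", createdAt: t0 + 5 * 60_000 },
];
console.log(reflectHeader(sessionHistory(rows, "myproject-2026-05-22"))); // Session has 2 entries over ~47 min.
```
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;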

&lt;p&gt;The same day I wrote this section, the following exchange appeared on X:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F002obg5m0aesfio3zgxd.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F002obg5m0aesfio3zgxd.png" alt="X thread: @ankkala on compaction models, @dannwaneri reply on compression without loss" width="800" height="621"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;@ankkala: "There should be an entire new class of LLM dedicated just to compaction." My reply: "Compression without loss is the oldest hard problem in cognition. Summarization fails the same way bad thinking fails — not because the words are wrong, but because the underlying structure was never identified in the first place. A specialized compaction model doesn't fix that. It just obscures where the reasoning broke down."&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;The argument for a dedicated compaction model assumes the problem is compression efficiency. It isn't. The problem is that the wrong things are in the context in the first place. A better compressor produces confident-sounding summaries of the wrong things. edge-context-mode doesn't make compaction smarter — it reduces what has to be compacted at all.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;One caveat, now partially lifted:&lt;/strong&gt; edge-context-mode is an MCP server. When I wrote this it was wired only into Claude Code. By the time I finished, I'd registered it in VS Code and GitHub Copilot discovered all 8 tools automatically — same server, zero changes to the code. ChatGPT and Gemini still require function-calling adapters (v1.1). But "Claude Code only" undersold it from the start.&lt;/p&gt;




&lt;h2&gt;
  
  
  Demo
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;GitHub:&lt;/strong&gt; &lt;a href="https://github.com/dannwaneri/edge-context-mode" rel="noopener noreferrer"&gt;https://github.com/dannwaneri/edge-context-mode&lt;/a&gt;&lt;br&gt;
&lt;strong&gt;Release:&lt;/strong&gt; &lt;a href="https://github.com/dannwaneri/edge-context-mode/releases/tag/v1.0.0" rel="noopener noreferrer"&gt;https://github.com/dannwaneri/edge-context-mode/releases/tag/v1.0.0&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Local setup — 4 commands, no cloud account:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git clone https://github.com/dannwaneri/edge-context-mode
&lt;span class="nb"&gt;cd &lt;/span&gt;edge-context-mode
npm &lt;span class="nb"&gt;install
&lt;/span&gt;npm run migrate:local
npm run &lt;span class="nb"&gt;local&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Register with Claude Code:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;claude mcp add edge-context-mode &lt;span class="nt"&gt;--&lt;/span&gt; node /path/to/edge-context-mode/src/local.ts
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;What a session looks like now:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Run a command — only a reference token enters context
ctx_execute("node -e \"require('./package.json').dependencies\"", "check deps")
→ [ctx:kx92ma3b1p]
→ 1 line(s): { hono: '^4.7.11', '@modelcontextprotocol/sdk': '^1.12.0'...

# Need the actual output? Pull it by reference
ctx_get("[ctx:kx92ma3b1p]")
→ ref:     [ctx:kx92ma3b1p]
→ summary: 1 line(s): { hono: '^4.7.11'...
→ --- raw output ---
→ { hono: '^4.7.11', '@modelcontextprotocol/sdk': '^1.12.0', ... }

# Save a decision without running a command
ctx_annotate("decided to cap raw_output at 512KB — D1 row limit is ~1MB, leaving headroom")
→ [ctx:mw71nx4d2q]

# Search past context semantically
ctx_search("D1 storage decisions")
→ [ctx:mw71nx4d2q] [score:1.84] annotation: decided to cap raw_output at 512KB...

# Health check
ctx_doctor
→ { "d1": "ok", "execution_mode": "local-stdio", "vectorize_mcp": "configured", "sessions": 6, "entries": 9 }
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgz6y4yfncl8a3g85pfcm.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgz6y4yfncl8a3g85pfcm.jpg" alt="Copilot Chat agentic loop: created executor.spec.ts, ran npm test, all 4 passing" width="800" height="1062"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fltokiw5ozh4gu4dlgh21.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fltokiw5ozh4gu4dlgh21.jpg" alt="Copilot Chat edge case analysis: 8 cases including Windows \r\n line endings" width="800" height="1062"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  The Comeback Story
&lt;/h2&gt;

&lt;p&gt;On April 15th, I shipped the initial release. Two commits, deployed to Cloudflare, wired into my own setup. Then I moved on.&lt;/p&gt;

&lt;p&gt;Coming back for this challenge, I read the code properly for the first time. Here's what was actually there.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Durable Object was returning a placeholder.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This is the one I'm most embarrassed about. &lt;code&gt;ExecutorDO.ts&lt;/code&gt; — the Cloudflare Durable Object that was supposed to sandbox execution in Workers mode — had this in production:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="na"&gt;stdout&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;`[DO received: &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;command&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt; &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;args&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt; &lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)}&lt;/span&gt;&lt;span class="s2"&gt;]`&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;exit_code&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;timed_out&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If you deployed to Cloudflare Workers and called &lt;code&gt;ctx_execute&lt;/code&gt;, you'd get back a fake success: exit code 0 and a stdout that only echoed the command back. No real output. No error. Just a quietly wrong result. I'd left a comment, &lt;em&gt;"integrate with a Workers AI function or trusted external runner"&lt;/em&gt;, and never did it.&lt;/p&gt;

&lt;p&gt;The fix wasn't to build the external runner. Cloudflare Workers genuinely cannot spawn subprocesses, and building a remote execution service in two weeks isn't the right call. The honest fix was to say so: replace the silent stub with a clear error that tells you exactly what to run instead.&lt;/p&gt;
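
&lt;p&gt;A minimal sketch of that kind of fix, reusing the result fields from the stub above (&lt;code&gt;stdout&lt;/code&gt;, &lt;code&gt;exit_code&lt;/code&gt;, &lt;code&gt;timed_out&lt;/code&gt;) and adding a hypothetical &lt;code&gt;stderr&lt;/code&gt; field. The function name and message wording are illustrative, not the repo's actual code. The change that matters is the non-zero exit code: a caller can no longer mistake the Workers path for a successful run.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;
```typescript
// Result shape from the original stub, plus a hypothetical stderr field.
interface ExecResult {
  stdout: string;
  stderr: string;
  exit_code: number;
  timed_out: boolean;
}

// Workers cannot spawn subprocesses, so fail loudly and point at local mode instead.
function executeInWorkers(command: string, args: string[]): ExecResult {
  const attempted = [command, ...args].join(" ");
  return {
    stdout: "",
    stderr:
      `ctx_execute is not supported in Cloudflare Workers mode (tried: ${attempted}). ` +
      "Run the local stdio server instead: npm run local",
    exit_code: 1,
    timed_out: false,
  };
}

const result = executeInWorkers("cat", ["large-file.ts"]);
console.log(result.exit_code, result.stderr);
```
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;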

&lt;p&gt;&lt;strong&gt;&lt;code&gt;ctx_get&lt;/code&gt; didn't exist.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The entire architecture depends on &lt;code&gt;[ctx:id]&lt;/code&gt; references being retrievable. Store a summary, get back a token, pull the original when you need it. That was the design. There was no tool to do the pulling. Every reference was write-only. I'd built half a memory system and hadn't noticed.&lt;/p&gt;

&lt;p&gt;Added &lt;code&gt;ctx_get&lt;/code&gt; — strips the &lt;code&gt;[ctx:]&lt;/code&gt; prefix, queries D1 by ID, checks expiry, returns the summary and raw output. If it's gone: &lt;code&gt;"Entry not found or expired."&lt;/code&gt; No crash, no drama.&lt;/p&gt;
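
&lt;p&gt;A small sketch of that lookup, with an in-memory object standing in for the D1 query. The prefix stripping, the expiry check and the &lt;code&gt;"Entry not found or expired."&lt;/code&gt; message come from the description above; the row shape, the &lt;code&gt;expiresAt&lt;/code&gt; field and the output layout (modeled on the demo session) are assumptions.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;
```typescript
// Hypothetical row shape; expiresAt is epoch milliseconds, or null for "never expires".
interface ContextRow {
  summary: string;
  rawOutput: string | null; // NULL for rows written before raw output was stored
  expiresAt: number | null;
}

const NOT_FOUND = "Entry not found or expired.";

// Accepts "[ctx:abc123]" or a bare "abc123".
function parseRef(ref: string): string {
  return ref.trim().replace(/^\[ctx:/, "").replace(/\]$/, "");
}

// In-memory stand-in for looking the row up by ID in D1.
function ctxGet(store: { [id: string]: ContextRow | undefined }, ref: string, now = Date.now()): string {
  const id = parseRef(ref);
  const row = store[id];
  // Missing and expired entries get the same calm message instead of an exception.
  if (row === undefined) return NOT_FOUND;
  if (now >= (row.expiresAt ?? Infinity)) return NOT_FOUND;
  return [
    `ref:     [ctx:${id}]`,
    `summary: ${row.summary}`,
    "--- raw output ---",
    row.rawOutput ?? "(no raw output stored for this entry)",
  ].join("\n");
}

const store: { [id: string]: ContextRow | undefined } = {
  kx92ma3b1p: { summary: "1 line(s): { hono: '^4.7.11'...", rawOutput: "{ hono: '^4.7.11' }", expiresAt: null },
  old1234567: { summary: "stale entry", rawOutput: null, expiresAt: 0 },
};

console.log(ctxGet(store, "[ctx:kx92ma3b1p]"));
console.log(ctxGet(store, "[ctx:old1234567]")); // Entry not found or expired.
```
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;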

&lt;p&gt;&lt;strong&gt;&lt;code&gt;ctx_annotate&lt;/code&gt; didn't exist either.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Context only accumulated through &lt;code&gt;ctx_execute&lt;/code&gt; — shell commands. You couldn't save why you made a decision. You couldn't annotate an architectural choice. You couldn't store a note without wrapping it in a fake command. The tool only captured what you ran, not what you thought.&lt;/p&gt;

&lt;p&gt;Added &lt;code&gt;ctx_annotate&lt;/code&gt; — takes text, stores it as &lt;code&gt;type: "annotation"&lt;/code&gt;, shows up in &lt;code&gt;ctx_search&lt;/code&gt; and &lt;code&gt;ctx_history&lt;/code&gt;. The session history now reflects intent, not just execution.&lt;/p&gt;
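
&lt;p&gt;A rough sketch shows why an annotation needs no fake command. It is just another row with a different &lt;code&gt;type&lt;/code&gt;, so anything that lists entries picks it up. The in-memory array, the ID generator and the &lt;code&gt;"execution"&lt;/code&gt; type name are hypothetical. Only &lt;code&gt;"annotation"&lt;/code&gt; comes from the article.&lt;/p&gt;

```typescript
// "execution" is a hypothetical name for command-derived entries;
// "annotation" is the type the article describes.
type EntryType = "execution" | "annotation";

interface Entry {
  id: string;
  type: EntryType;
  summary: string;
  createdAt: number;
}

// In-memory stand-in for the D1 table.
const entries: Entry[] = [];

// Hypothetical 10-character random ID, similar in spirit to "[ctx:8wo1rh1buy]".
function newId(): string {
  return Math.random().toString(36).slice(2, 12).padEnd(10, "0");
}

// ctx_annotate: store free text directly, with no shell command involved.
function ctxAnnotate(text: string, now: number = Date.now()): string {
  const id = newId();
  entries.push({ id, type: "annotation", summary: text, createdAt: now });
  return `[ctx:${id}]`;
}

// ctx_history: newest first, annotations interleaved with executions.
function ctxHistory(): Entry[] {
  return [...entries].sort((a, b) => b.createdAt - a.createdAt);
}
```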

&lt;p&gt;&lt;strong&gt;Raw output was never stored.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;There's a comment in the original schema that says exactly this: &lt;code&gt;"raw output is NEVER stored here — only the summary."&lt;/code&gt; Deliberate design. The problem: &lt;code&gt;ctx_get&lt;/code&gt; needs something to return. You can't have a retrieval tool with nothing to retrieve.&lt;/p&gt;

&lt;p&gt;Migration &lt;code&gt;0002_raw_output.sql&lt;/code&gt; recreates the table with a &lt;code&gt;raw_output TEXT&lt;/code&gt; column. It also updates the CHECK constraint to accept &lt;code&gt;annotation&lt;/code&gt; as a valid entry type. Full stdout is now stored in D1, capped at 512KB. Older entries simply have &lt;code&gt;raw_output&lt;/code&gt; set to &lt;code&gt;NULL&lt;/code&gt;, and retrieval handles that gracefully.&lt;/p&gt;
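
&lt;p&gt;The article doesn't say how the 512KB cap is measured. Here is a plausible sketch, assuming the limit is 512 * 1024 UTF-8 bytes and that oversized output is truncated rather than rejected. &lt;code&gt;capRawOutput&lt;/code&gt; and &lt;code&gt;RAW_OUTPUT_MAX_BYTES&lt;/code&gt; are hypothetical names.&lt;/p&gt;

```typescript
// Assumption: "512KB" means 512 * 1024 bytes of UTF-8, not 512 * 1024 characters.
const RAW_OUTPUT_MAX_BYTES = 512 * 1024;

// Hypothetical helper: truncate stdout before writing it to the raw_output column.
function capRawOutput(stdout: string): string {
  const bytes = new TextEncoder().encode(stdout);
  if (bytes.byteLength > RAW_OUTPUT_MAX_BYTES) {
    // Cutting at a byte offset can split a multi-byte character; the decoder turns
    // that partial sequence into U+FFFD, which is stripped so stored text stays clean.
    const cut = new TextDecoder().decode(bytes.subarray(0, RAW_OUTPUT_MAX_BYTES));
    return cut.replace(/\uFFFD+$/, "");
  }
  return stdout;
}
```

&lt;p&gt;Truncating rather than rejecting keeps &lt;code&gt;ctx_get&lt;/code&gt; useful for huge logs, at the cost of losing the tail.&lt;/p&gt;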

&lt;p&gt;&lt;strong&gt;Setup required three services before anything ran.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The original README listed as prerequisites: Cloudflare account with Workers Paid plan, a D1 database, and a separately deployed &lt;a href="https://github.com/dannwaneri/vectorize-mcp-worker" rel="noopener noreferrer"&gt;vectorize-mcp-worker&lt;/a&gt;. Three services, three secrets, before the server would start. Most people would give up at step two.&lt;/p&gt;

&lt;p&gt;Local mode now works with zero cloud setup. Four commands. The vectorize-mcp-worker is an optional semantic search upgrade — worth deploying once you're running sessions regularly, but not a requirement to get started.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;code&gt;ctx_search&lt;/code&gt; had a silent phrase-matching bug — found it live.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This one I found after v1.0.0 shipped, while testing the tools in a real session. I stored an annotation, then searched for words I knew were in it: &lt;em&gt;"Vectorize optional"&lt;/em&gt;. No results.&lt;/p&gt;

&lt;p&gt;The bug was in &lt;code&gt;ftsPhrase()&lt;/code&gt; in &lt;code&gt;store.ts&lt;/code&gt;. It wrapped the entire query in double quotes:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Before — strict phrase search&lt;/span&gt;
&lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="s2"&gt;`"&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;q&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;replace&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sr"&gt;/"/g&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;""&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)}&lt;/span&gt;&lt;span class="s2"&gt;"`&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="c1"&gt;// "Vectorize optional" only matches if those two words are adjacent&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The annotation text said &lt;em&gt;"Vectorize &lt;strong&gt;stays&lt;/strong&gt; optional"&lt;/em&gt;. One word between them. No match.&lt;/p&gt;

&lt;p&gt;The fix: quote each term individually so FTS5 treats them as implicit AND — all terms must appear in the document, anywhere, not consecutively:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// After — per-term quoting&lt;/span&gt;
&lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;q&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;trim&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;split&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sr"&gt;/&lt;/span&gt;&lt;span class="se"&gt;\s&lt;/span&gt;&lt;span class="sr"&gt;+/&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;filter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;Boolean&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;map&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;word&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="s2"&gt;`"&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;word&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;replace&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sr"&gt;/"/g&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;""&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)}&lt;/span&gt;&lt;span class="s2"&gt;"`&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt; &lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="c1"&gt;// "Vectorize" "optional" — matches regardless of what's between them&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
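
&lt;p&gt;To see the difference without a database, here is the "after" builder as a standalone function, plus a toy matcher. The matcher is a deliberate simplification of FTS5, not SQLite's real tokenizer. It lowercases text, splits on non-alphanumeric characters, and compares a strict phrase match with an "all terms present" match. The names &lt;code&gt;tokens&lt;/code&gt;, &lt;code&gt;phraseMatches&lt;/code&gt; and &lt;code&gt;allTermsMatch&lt;/code&gt; are hypothetical.&lt;/p&gt;

```typescript
// The per-term builder from store.ts, wrapped as a standalone function.
function ftsPhrase(q: string): string {
  return q
    .trim()
    .split(/\s+/)
    .filter(Boolean)
    .map((word) => `"${word.replace(/"/g, '""')}"`)
    .join(" ");
}

// Toy tokenizer, loosely in the spirit of FTS5's default unicode61 tokenizer.
function tokens(text: string): string[] {
  return text.toLowerCase().split(/[^a-z0-9]+/).filter(Boolean);
}

// Old behaviour: the whole query is one phrase, so terms must be adjacent and in order.
function phraseMatches(doc: string, query: string): boolean {
  const d = tokens(doc);
  const q = tokens(query);
  return d.some((_, i) => q.every((term, j) => d[i + j] === term));
}

// New behaviour: implicit AND, every term must appear somewhere in the document.
function allTermsMatch(doc: string, query: string): boolean {
  const d = new Set(tokens(doc));
  return tokens(query).every((term) => d.has(term));
}
```

&lt;p&gt;With the annotation text above, &lt;code&gt;phraseMatches&lt;/code&gt; returns &lt;code&gt;false&lt;/code&gt; for "Vectorize optional" and &lt;code&gt;allTermsMatch&lt;/code&gt; returns &lt;code&gt;true&lt;/code&gt;. That mirrors the bug and the fix.&lt;/p&gt;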



&lt;p&gt;Special-character safety (hyphens, numbers) is preserved, because each term is still wrapped in double quotes with any embedded quotes doubled. Fixed and shipped as a post-v1.0.0 patch the same day.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The migration runner was silently wiping all data on every restart.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This one I found after the article was mostly written, while investigating why &lt;code&gt;ctx_get&lt;/code&gt; returned &lt;code&gt;(raw output not available)&lt;/code&gt; on every annotation.&lt;/p&gt;

&lt;p&gt;The migration runner in &lt;code&gt;local.ts&lt;/code&gt; used a try/catch pattern:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;try&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;db&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;exec&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;sql&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;catch&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="cm"&gt;/* already applied */&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The assumption: if the SQL fails, the migration was already applied. The problem: migration &lt;code&gt;0002&lt;/code&gt; starts with &lt;code&gt;DROP TABLE IF EXISTS context_entries&lt;/code&gt;. &lt;code&gt;IF EXISTS&lt;/code&gt; never throws — so the catch never fires. Every server restart ran &lt;code&gt;0002&lt;/code&gt; from scratch, dropping the entire &lt;code&gt;context_entries&lt;/code&gt; table and recreating it empty. All stored context wiped. Silently.&lt;/p&gt;

&lt;p&gt;The fix: a &lt;code&gt;_migrations&lt;/code&gt; table that records each &lt;code&gt;.sql&lt;/code&gt; file by name. On startup, already-applied files are skipped entirely:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="nx"&gt;db&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;exec&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;`CREATE TABLE IF NOT EXISTS _migrations (name TEXT PRIMARY KEY, applied_at INTEGER)`&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="c1"&gt;// ...&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;already&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;db&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;prepare&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;SELECT 1 FROM _migrations WHERE name = ?&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;f&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;already&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;continue&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
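
&lt;p&gt;The difference between the two runners is easier to see end to end. This sketch runs against a hypothetical in-memory &lt;code&gt;FakeDb&lt;/code&gt; rather than SQLite:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Clearing an array stands in for &lt;code&gt;DROP TABLE IF EXISTS context_entries&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;An object stands in for the &lt;code&gt;_migrations&lt;/code&gt; table.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;0001_init.sql&lt;/code&gt; is a made-up file name.&lt;/li&gt;
&lt;/ul&gt;

```typescript
// Hypothetical in-memory stand-in for the local SQLite database.
class FakeDb {
  contextEntries: string[] = []; // stand-in for the context_entries table
  migrations: { [name: string]: number } = {}; // stand-in for _migrations (name -> applied_at)
}

interface Migration {
  name: string;
  apply: (db: FakeDb) => void;
}

// Tracked runner: each file runs at most once and is recorded by name after it applies.
function runMigrations(db: FakeDb, files: Migration[]): string[] {
  const ran: string[] = [];
  for (const f of files) {
    if (Object.prototype.hasOwnProperty.call(db.migrations, f.name)) continue; // already applied
    f.apply(db);
    db.migrations[f.name] = Date.now();
    ran.push(f.name);
  }
  return ran;
}

const files: Migration[] = [
  { name: "0001_init.sql", apply: () => {} }, // hypothetical first migration
  // 0002 starts with DROP TABLE IF EXISTS, which never throws; modelled as wiping all rows.
  { name: "0002_raw_output.sql", apply: (db) => { db.contextEntries = []; } },
];

const db = new FakeDb();
runMigrations(db, files); // first start: both files run
db.contextEntries.push("annotation: Vectorize stays optional");
const onRestart = runMigrations(db, files); // restart: nothing re-runs, so the row survives
```

&lt;p&gt;Recording the name only after a file applies means a crash partway through will retry that file on the next start. Wrapping each file and its &lt;code&gt;_migrations&lt;/code&gt; insert in one transaction would make that all-or-nothing.&lt;/p&gt;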



&lt;p&gt;Data now survives restarts. This is the kind of bug that only shows up when you actually use the thing — not in tests, not in code review, only when you store something, close the terminal, reopen it, and find nothing there.&lt;/p&gt;




&lt;h2&gt;
  
  
  My Experience with GitHub Copilot
&lt;/h2&gt;

&lt;p&gt;I want to be honest here because vague Copilot praise is exactly what's eroding trust in challenge submissions right now.&lt;/p&gt;

&lt;p&gt;I did the code work in this finish-up (the migration, the new tools, the server fixes) with Claude Code. I pointed Copilot at the part of the project that had been failing silently since April: &lt;strong&gt;tests&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;vitest&lt;/code&gt; was in &lt;code&gt;package.json&lt;/code&gt; from the initial commit. Zero tests had ever been written. I'd been aware of this the way you're aware of a leak you haven't fixed.&lt;/p&gt;

&lt;p&gt;I opened &lt;code&gt;src/tools/executor.ts&lt;/code&gt; in VS Code and gave Copilot Chat this prompt:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;"Write vitest unit tests for the &lt;code&gt;validateCommand&lt;/code&gt; function in this file. Test: a whitelisted command like 'node' passes, an unknown binary like 'rm' is rejected with COMMAND_NOT_ALLOWED, a path traversal attempt with '../etc' is blocked, and a git subcommand not in the allowlist is rejected."&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Copilot opened &lt;code&gt;package.json&lt;/code&gt; to check the test configuration, created &lt;code&gt;executor.spec.ts&lt;/code&gt; (+51 lines), ran &lt;code&gt;npm test --silent&lt;/code&gt; to verify, and reported back: &lt;em&gt;"Tests added and run: all 4 passing."&lt;/em&gt; The agentic loop — read config, write file, run tests, confirm results — without me prompting each step.&lt;/p&gt;
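
&lt;p&gt;For context on what those four tests exercise, here is a hypothetical &lt;code&gt;validateCommand&lt;/code&gt; that would satisfy them. The allowlists and the &lt;code&gt;PATH_TRAVERSAL&lt;/code&gt; and &lt;code&gt;GIT_SUBCOMMAND_NOT_ALLOWED&lt;/code&gt; codes are assumptions. Only &lt;code&gt;COMMAND_NOT_ALLOWED&lt;/code&gt; appears in the prompt, and the real function in &lt;code&gt;executor.ts&lt;/code&gt; may differ.&lt;/p&gt;

```typescript
// Hypothetical allowlists; the real ones in src/tools/executor.ts aren't shown in the article.
const ALLOWED_BINARIES = new Set(["node", "npm", "git", "ls", "cat"]);
const ALLOWED_GIT_SUBCOMMANDS = new Set(["status", "log", "diff", "show"]);

type Validation = { ok: true } | { ok: false; code: string; reason: string };

function validateCommand(command: string): Validation {
  const parts = command.trim().split(/\s+/).filter(Boolean);
  const binary = parts[0] ?? "";
  if (!ALLOWED_BINARIES.has(binary)) {
    return { ok: false, code: "COMMAND_NOT_ALLOWED", reason: `"${binary}" is not allowlisted` };
  }
  // Reject any argument with a ".." path segment (either slash style).
  if (parts.some((p) => p.split(/[\\/]/).includes(".."))) {
    return { ok: false, code: "PATH_TRAVERSAL", reason: "arguments may not contain .. segments" };
  }
  if (binary === "git") {
    const sub = parts[1] ?? "";
    if (!ALLOWED_GIT_SUBCOMMANDS.has(sub)) {
      return { ok: false, code: "GIT_SUBCOMMAND_NOT_ALLOWED", reason: `git ${sub} is not allowlisted` };
    }
  }
  return { ok: true };
}
```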

&lt;p&gt;Then I asked it to look at &lt;code&gt;summarise&lt;/code&gt; for missing coverage. It came back with eight specific edge cases. These included empty output, whitespace-only lines, Windows line endings where &lt;code&gt;\r&lt;/code&gt; could leak into summaries stored in D1, lines with leading spaces, the &lt;code&gt;SUMMARY_MAX_CHARS&lt;/code&gt; boundary, and mixed empty and non-empty lines.&lt;/p&gt;

&lt;p&gt;The Windows one stopped me. I'm on Windows. &lt;code&gt;\r\n&lt;/code&gt; line endings are something I live with and had completely stopped thinking about. A summary with a trailing &lt;code&gt;\r&lt;/code&gt; stored in D1 and returned to the LLM is a subtle, real bug I would not have found on my own.&lt;/p&gt;
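
&lt;p&gt;A stray &lt;code&gt;\r&lt;/code&gt; typically comes from splitting on &lt;code&gt;\n&lt;/code&gt; alone. Below is a sketch of a &lt;code&gt;summarise&lt;/code&gt; that handles it, along with the other edge cases listed above. It is not the project's implementation. The &lt;code&gt;SUMMARY_MAX_CHARS&lt;/code&gt; value and the "keep non-blank lines" rule are assumptions.&lt;/p&gt;

```typescript
// Hypothetical limit; the article names the constant but not its value.
const SUMMARY_MAX_CHARS = 200;

// Hypothetical summarise: normalise line endings, drop blank lines, cap the length.
function summarise(output: string): string {
  const lines = output
    .split(/\r?\n/) // accept both LF and CRLF line endings
    .map((line) => line.trimEnd()) // drop trailing whitespace, including any stray \r
    .filter((line) => line.trim() !== ""); // skip empty and whitespace-only lines
  const joined = lines.join("\n"); // leading indentation is preserved
  return joined.length > SUMMARY_MAX_CHARS ? joined.slice(0, SUMMARY_MAX_CHARS) : joined;
}
```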

&lt;p&gt;That's what honest Copilot use looks like. It found what I'd been ignoring and made me own it.&lt;/p&gt;




&lt;h2&gt;
  
  
  Post-Publication: It Runs in GitHub Copilot Too
&lt;/h2&gt;

&lt;p&gt;After the article went live, I added a short server entry to &lt;code&gt;.vscode/mcp.json&lt;/code&gt; and restarted VS Code. GitHub Copilot discovered all 8 tools automatically.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fs55pi4d1hc3piefrdqx5.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fs55pi4d1hc3piefrdqx5.jpg" alt="GitHub Copilot calling ctx_doctor via edge-context-mode MCP Server — Status: ok, Sessions tracked: 12, Entries stored: 33" width="800" height="1062"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhtmz54g4u3gb3kj8gg31.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhtmz54g4u3gb3kj8gg31.jpg" alt="GitHub Copilot calling ctx_annotate via edge-context-mode MCP Server — note saved confirmed" width="800" height="1062"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Same server. Zero code changes. The claim that this is "Claude Code only" was wrong the moment MCP support shipped in VS Code.&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;.vscode/mcp.json&lt;/code&gt; config:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"servers"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"edge-context-mode"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"stdio"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"command"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"cmd"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"args"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"/c"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"C:&lt;/span&gt;&lt;span class="se"&gt;\\&lt;/span&gt;&lt;span class="s2"&gt;path&lt;/span&gt;&lt;span class="se"&gt;\\&lt;/span&gt;&lt;span class="s2"&gt;to&lt;/span&gt;&lt;span class="se"&gt;\\&lt;/span&gt;&lt;span class="s2"&gt;edge-context-mode&lt;/span&gt;&lt;span class="se"&gt;\\&lt;/span&gt;&lt;span class="s2"&gt;start-mcp.cmd"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  What's Next
&lt;/h2&gt;

&lt;p&gt;The question I keep getting: &lt;em&gt;can this work outside Claude Code?&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Already answered: Cursor, Windsurf, Claude Desktop, GitHub Copilot in VS Code — anything that speaks MCP works today. "Claude Code only" was underselling it from the start.&lt;/p&gt;

&lt;p&gt;Genuinely universal requires one more step: function-calling adapters for OpenAI and Gemini. The storage layer — D1, FTS5, the reference system — doesn't change at all. Same data, same search, same &lt;code&gt;[ctx:id]&lt;/code&gt; tokens. You'd just expose the tools as OpenAI function definitions or Gemini tool declarations instead of MCP tool registrations.&lt;/p&gt;
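
&lt;p&gt;The adapter layer could look roughly like this. One transport-neutral tool description is mapped into the OpenAI Chat Completions &lt;code&gt;tools&lt;/code&gt; format and into a Gemini &lt;code&gt;functionDeclarations&lt;/code&gt; entry. The &lt;code&gt;query&lt;/code&gt; parameter and the description strings are assumptions. Check both vendor shapes against the current API docs before use.&lt;/p&gt;

```typescript
// Transport-neutral tool description (hypothetical shape, not the project's types).
interface ToolSpec {
  name: string;
  description: string;
  parameters: {
    type: "object";
    properties: { [param: string]: { type: string; description: string } };
    required: string[];
  };
}

// Assumed schema for ctx_search; check the MCP registration for the real one.
const ctxSearchSpec: ToolSpec = {
  name: "ctx_search",
  description: "Full-text search over stored context entries and annotations.",
  parameters: {
    type: "object",
    properties: { query: { type: "string", description: "Words to search for." } },
    required: ["query"],
  },
};

// OpenAI Chat Completions wraps each tool as { type: "function", function: {...} }.
function toOpenAITool(spec: ToolSpec) {
  return {
    type: "function" as const,
    function: { name: spec.name, description: spec.description, parameters: spec.parameters },
  };
}

// Gemini groups declarations under functionDeclarations inside one tools entry.
function toGeminiTools(specs: ToolSpec[]) {
  return [
    {
      functionDeclarations: specs.map((s) => ({
        name: s.name,
        description: s.description,
        parameters: s.parameters,
      })),
    },
  ];
}
```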

&lt;p&gt;The Workers HTTP mode is already the bridge. Any LLM with tool/function calling can hit the deployed endpoint directly if you wire up the schema on their side.&lt;/p&gt;
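
&lt;p&gt;Here is a sketch of that wiring on the client side. It assumes an OpenAI-style tool call, where &lt;code&gt;arguments&lt;/code&gt; arrives as a JSON string, and a hypothetical &lt;code&gt;/tools/{name}&lt;/code&gt; route on the deployed Worker. The article doesn't document the real HTTP paths.&lt;/p&gt;

```typescript
// OpenAI Chat Completions returns tool calls with arguments as a JSON-encoded string.
interface ToolCall {
  id: string;
  function: { name: string; arguments: string };
}

interface BridgeRequest {
  url: string;
  init: { method: string; headers: { [header: string]: string }; body: string };
}

// Hypothetical route shape {baseUrl}/tools/{toolName}; check the Worker for the real one.
function buildBridgeRequest(baseUrl: string, call: ToolCall): BridgeRequest {
  const args: unknown = JSON.parse(call.function.arguments); // throws early on malformed JSON
  return {
    url: `${baseUrl.replace(/\/+$/, "")}/tools/${encodeURIComponent(call.function.name)}`,
    init: {
      method: "POST",
      headers: { "content-type": "application/json" },
      body: JSON.stringify(args),
    },
  };
}
```

&lt;p&gt;Sending the request is then &lt;code&gt;await fetch(req.url, req.init)&lt;/code&gt;, and the response body goes back to the model as the tool result. MCP clients do that step for you.&lt;/p&gt;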

&lt;p&gt;The seamless part — where raw output is automatically kept out of context before you even think about it — still requires the client to route through edge-context-mode. MCP does that natively. For non-MCP LLMs you'd call &lt;code&gt;ctx_execute&lt;/code&gt; manually instead of running commands directly. Less automatic. Still useful. The memory survives either way.&lt;/p&gt;

&lt;p&gt;That's v1.1: OpenAI and Gemini adapters, same core, no storage changes. If you're building on a different stack and want to help, the repo is open.&lt;/p&gt;

&lt;p&gt;One more gap worth naming: &lt;strong&gt;you can't retroactively capture a session that started before edge-context-mode was running.&lt;/strong&gt; If you worked for an hour before registering the MCP server, that context lived in the Claude Code conversation window only — edge-context-mode never saw it.&lt;/p&gt;

&lt;p&gt;The workaround right now is &lt;code&gt;ctx_annotate&lt;/code&gt; — manually summarise what happened before the tool was active. It works but it's manual.&lt;/p&gt;

&lt;p&gt;I tested this on the very session in which I'm writing this article. It hit context limits once, compacted, and continued. I opened the &lt;code&gt;.jsonl&lt;/code&gt; file, found the compaction summary (stored as a &lt;code&gt;type: "user"&lt;/code&gt; entry with &lt;code&gt;isCompactSummary: true&lt;/code&gt;), and ran one &lt;code&gt;ctx_annotate&lt;/code&gt; call:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;ctx_annotate("SESSION IMPORT (finishupathon, 2026-05-22) — edge-context-mode v1.0.0 decisions:
ExecutorDO stub → honest error. ctx_get added. ctx_annotate added. raw_output migration.
Local mode: 4 commands, zero cloud. ftsPhrase() per-term fix.")
→ [ctx:8wo1rh1buy]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;One compaction, one call. That context is now in D1. If this session compacts again tomorrow, &lt;code&gt;ctx_search("edge-context-mode v1.0.0")&lt;/code&gt; will surface exactly what was decided and why.&lt;/p&gt;

&lt;p&gt;The proper fix: Claude Code stores every session as a &lt;code&gt;.jsonl&lt;/code&gt; file in &lt;code&gt;~/.claude/projects/&lt;/code&gt;. Full conversation, every tool call, every output. A &lt;code&gt;ctx_import&lt;/code&gt; command that reads those files and bulk-loads them into D1 would close the gap completely — retroactive context, searchable, surviving all future compaction. The storage layer already handles it. It just needs a reader for the &lt;code&gt;.jsonl&lt;/code&gt; format (compact summaries are the &lt;code&gt;isCompactSummary: true&lt;/code&gt; entries) and a bulk insert path. That's v1.2.&lt;/p&gt;
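
&lt;p&gt;The reader half of that &lt;code&gt;ctx_import&lt;/code&gt; could start as small as this. It relies only on what's described above: one JSON object per line, with compaction summaries flagged by &lt;code&gt;isCompactSummary: true&lt;/code&gt;. The function name is hypothetical. Extracting the summary text would need the rest of the entry schema, which isn't covered here.&lt;/p&gt;

```typescript
// Only the fields the article mentions are typed; everything else is left open.
interface SessionEntry {
  type?: string;
  isCompactSummary?: boolean;
  [key: string]: unknown;
}

// Hypothetical reader: return every compaction summary in a Claude Code .jsonl transcript.
function findCompactSummaries(jsonl: string): SessionEntry[] {
  const found: SessionEntry[] = [];
  for (const line of jsonl.split(/\r?\n/)) {
    if (line.trim() === "") continue;
    let entry: SessionEntry | null;
    try {
      entry = JSON.parse(line) as SessionEntry | null;
    } catch {
      continue; // skip a truncated or malformed line instead of aborting the import
    }
    if (entry?.isCompactSummary === true) found.push(entry);
  }
  return found;
}
```

&lt;p&gt;Each hit could then go through the same insert path that &lt;code&gt;ctx_annotate&lt;/code&gt; uses.&lt;/p&gt;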




&lt;p&gt;&lt;em&gt;Built with TypeScript, Cloudflare Workers, Durable Objects, D1, and a belated appreciation for what I'd actually made.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>devchallenge</category>
      <category>githubchallenge</category>
      <category>discuss</category>
      <category>githubfinishupathon</category>
    </item>
    <item>
      <title>Google Is One Feature Away From Killing an Entire Startup Category</title>
      <dc:creator>Daniel Nwaneri</dc:creator>
      <pubDate>Tue, 02 Jun 2026 08:14:41 +0000</pubDate>
      <link>https://dev.to/dannwaneri/google-is-one-feature-away-from-killing-an-entire-startup-category-jk</link>
      <guid>https://dev.to/dannwaneri/google-is-one-feature-away-from-killing-an-entire-startup-category-jk</guid>
      <description>&lt;p&gt;I use NotebookLM as an attack layer before I publish anything.&lt;/p&gt;

&lt;p&gt;Not for summaries. Not for research. I upload my draft, let the audio overview run, and listen to two AI hosts interrogate my argument. If the narration exposes a gap — a claim that doesn't land, a section that wanders — I go back and fix it before the article goes live.&lt;/p&gt;

&lt;p&gt;It works. Better than re-reading. Better than asking a colleague. The distance of hearing your argument spoken back at you by something that doesn't share your assumptions is genuinely useful.&lt;/p&gt;

&lt;p&gt;One thing is missing.&lt;/p&gt;

&lt;p&gt;The voice. My voice.&lt;/p&gt;




&lt;h2&gt;
  
  
  What NotebookLM already does
&lt;/h2&gt;

&lt;p&gt;The hard problems are solved.&lt;/p&gt;

&lt;p&gt;NotebookLM ingests PDFs, Google Docs, audio files, YouTube links, and web URLs. It grounds its narration in your sources — it doesn't hallucinate beyond them. It produces coherent two-host audio that actually sounds like a conversation, not a text-to-speech dump. It maintains source fidelity across long documents.&lt;/p&gt;

&lt;p&gt;These are not small engineering problems. Source grounding alone eliminates an entire class of failure that plagues every generic AI summarizer. The coherence of the narration — the way the two hosts disagree, interrupt, and redirect — required significant work to build.&lt;/p&gt;

&lt;p&gt;Most people use it to summarize research papers. Some use it for meeting notes. I use it to stress-test arguments before they're public.&lt;/p&gt;

&lt;p&gt;What nobody is talking about: Google solved the hard part already.&lt;/p&gt;




&lt;h2&gt;
  
  
  The one feature that changes everything
&lt;/h2&gt;

&lt;p&gt;Personalized voice.&lt;/p&gt;

&lt;p&gt;Not a voice clone of a podcast host. Not a generic British narrator. Your voice. Trained on your recordings, matched to your cadence, deployed on your content.&lt;/p&gt;

&lt;p&gt;The moment Google ships that, three things happen simultaneously:&lt;/p&gt;

&lt;p&gt;Audio overviews stop sounding like someone else reading your work. They start sounding like you, presenting your argument, in your register. The use case expands from "summarize this" to "publish this." And every startup selling AI-powered personal audio becomes a feature comparison in a product Google already owns.&lt;/p&gt;




&lt;h2&gt;
  
  
  The countdown timers
&lt;/h2&gt;

&lt;p&gt;Let's name them.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;ElevenLabs&lt;/strong&gt; built a genuinely impressive voice cloning product. The quality is real. The API is well-documented. Developers use it. But ElevenLabs' core value proposition — "your voice, anywhere" — is exactly what Google would ship as a NotebookLM toggle. Not a new product. A settings page.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Podcastle&lt;/strong&gt; sells AI-powered podcast creation with voice cloning and audio cleanup. It's a prosumer tool aimed at creators who want professional audio without a studio. It's also a collection of features that sit inside what NotebookLM already does structurally, minus the voice layer.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Wondercraft&lt;/strong&gt; is an AI audio platform for turning written content into audio. Good product. Direct overlap with NotebookLM's architecture. One product update away from redundancy.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Descript&lt;/strong&gt; is the most defensible of the group — it has a video editing layer, a timeline, a collaboration workflow. It isn't purely an audio generation tool. But its AI voice layer, "Overdub," is exactly the feature that would become noise the day Google ships personal voice to NotebookLM.&lt;/p&gt;

&lt;p&gt;None of them are bad products. That's not the argument.&lt;/p&gt;

&lt;p&gt;The argument is that their moat is a gap Google hasn't prioritized filling. That's a countdown timer, not a competitive advantage.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Google tax
&lt;/h2&gt;

&lt;p&gt;This pattern has a name. Developers who've been around long enough know it.&lt;/p&gt;

&lt;p&gt;Google Workspace has a task manager. It's called Tasks. It's fine. Todoist, TickTick, and Things 3 all exist because Tasks is fine and not great, and "fine" left a gap big enough to build companies in.&lt;/p&gt;

&lt;p&gt;Google Calendar handles scheduling. Calendly became a $3 billion company because Calendar doesn't do one specific thing — let other people book time in your calendar without an email thread. One feature. Entire company.&lt;/p&gt;

&lt;p&gt;Google Keep exists. Notion exists anyway. The overlap is real; the gap was real enough to matter.&lt;/p&gt;

&lt;p&gt;Some of these survive. Calendly survived because the booking workflow is genuinely distinct from what Calendar does natively. Notion survived because it expanded beyond the gap — documents, databases, wikis — before Google closed it.&lt;/p&gt;

&lt;p&gt;The AI audio startups don't have that runway. They're not building adjacent to NotebookLM. They're building inside it. Their entire value proposition sits in the gap between what NotebookLM already does and the one feature Google hasn't shipped.&lt;/p&gt;




&lt;h2&gt;
  
  
  Why Google probably won't ship it next quarter
&lt;/h2&gt;

&lt;p&gt;Let's be honest about the timeline.&lt;/p&gt;

&lt;p&gt;Google is not aggressively pursuing this. NotebookLM is a Google Labs product — impressive, genuinely useful, and clearly not the company's primary focus. The team is small relative to the broader Gemini push. Personal voice cloning has real regulatory and ethical baggage — deepfakes, consent, liability — that slows any large company down in ways a startup can ignore.&lt;/p&gt;

&lt;p&gt;The AI audio startups might have 18 months. Maybe 24.&lt;/p&gt;

&lt;p&gt;But here's the thing about building on a platform gap: the clock isn't running on whether Google ships it. The clock is running on whether the market believes Google will ship it. The moment that belief takes hold — a Google I/O demo, a leak, a product page — the fundraising environment for AI audio personalization startups changes overnight.&lt;/p&gt;

&lt;p&gt;They're not being hunted. They're being ignored.&lt;/p&gt;

&lt;p&gt;Ignored by Google is its own kind of death sentence.&lt;/p&gt;




&lt;h2&gt;
  
  
  What this actually means for developers
&lt;/h2&gt;

&lt;p&gt;If you're building in this space, the question isn't "can we build a better voice cloning product than ElevenLabs?" The question is: "What does this product do that would survive a Google I/O keynote?"&lt;/p&gt;

&lt;p&gt;The companies that survive the Google tax survive it by expanding beyond the gap before it closes. Calendly survived by becoming a scheduling platform — reminders, routing, integrations — not just a booking link. Notion survived by becoming a workspace, not just a note-taking tool.&lt;/p&gt;

&lt;p&gt;AI audio startups that survive will be the ones that embed voice into a workflow Google doesn't own. Video production pipelines. Podcast distribution. Live audio. Language learning. The ones that build deep into a workflow Google has no reason to touch.&lt;/p&gt;

&lt;p&gt;The ones building "NotebookLM but with your voice" are the countdown timers.&lt;/p&gt;




&lt;p&gt;I still use NotebookLM every time I publish. I still listen to two AI hosts interrogate my arguments while I cook.&lt;/p&gt;

&lt;p&gt;I just do it in someone else's voice.&lt;/p&gt;

&lt;p&gt;For now.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>discuss</category>
      <category>startup</category>
      <category>productivity</category>
    </item>
    <item>
      <title>What Building My Own AI Bot Taught Me About Generative AI</title>
      <dc:creator>Daniel Nwaneri</dc:creator>
      <pubDate>Wed, 27 May 2026 19:57:35 +0000</pubDate>
      <link>https://dev.to/dannwaneri/what-building-my-own-ai-bot-taught-me-about-generative-ai-57il</link>
      <guid>https://dev.to/dannwaneri/what-building-my-own-ai-bot-taught-me-about-generative-ai-57il</guid>
      <description>&lt;p&gt;I built a bot trained on my own X bookmarks and likes. Around 50,000 of them, accumulated over years of lurking, arguing, and clicking the save button on things that made me stop scrolling.&lt;/p&gt;

&lt;p&gt;The technical part isn't complicated in principle. You pull your export, embed the text, build a RAG pipeline, add a style prompt derived from your own writing patterns, and you get something that responds to prompts by retrieving your most relevant saved content and riffing from there. I called it Bookmark Brain, which is either clever or embarrassing — I haven't decided.&lt;/p&gt;
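
&lt;p&gt;For anyone who hasn't built one, the retrieval step is small enough to sketch. This is a toy, not Bookmark Brain's code, and all names here are hypothetical:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;Word counts stand in for a real embedding model.&lt;/li&gt;
&lt;li&gt;Cosine similarity ranks saved posts against the query.&lt;/li&gt;
&lt;li&gt;The top matches are pasted into the prompt ahead of the question.&lt;/li&gt;
&lt;/ul&gt;

```typescript
// Sparse word-count vector: a toy stand-in for a learned embedding.
type Vector = { [term: string]: number };

function embed(text: string): Vector {
  const v: Vector = Object.create(null); // no prototype keys such as "constructor"
  for (const word of text.toLowerCase().match(/[a-z0-9]+/g) ?? []) {
    v[word] = (v[word] ?? 0) + 1;
  }
  return v;
}

function cosine(a: Vector, b: Vector): number {
  let dot = 0;
  for (const term of Object.keys(a)) dot += a[term] * (b[term] ?? 0);
  const norm = (v: Vector) => Math.sqrt(Object.values(v).reduce((sum, x) => sum + x * x, 0));
  const denom = norm(a) * norm(b);
  return denom === 0 ? 0 : dot / denom;
}

interface Bookmark {
  text: string;
  vector: Vector;
}

function indexBookmarks(texts: string[]): Bookmark[] {
  return texts.map((text) => ({ text, vector: embed(text) }));
}

// Retrieve the k nearest bookmarks, then assemble the prompt the model actually sees.
function buildPrompt(query: string, index: Bookmark[], stylePrompt: string, k = 3): string {
  const q = embed(query);
  const top = [...index]
    .sort((a, b) => cosine(q, b.vector) - cosine(q, a.vector))
    .slice(0, k);
  return [stylePrompt, "Relevant saved posts:", ...top.map((b) => `- ${b.text}`), `Question: ${query}`].join("\n");
}
```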

&lt;p&gt;What I didn't expect was how much it would clarify my thinking about what generative AI actually is.&lt;/p&gt;




&lt;p&gt;The bot works too well. That's the problem.&lt;/p&gt;

&lt;p&gt;When I ask it about API design opinions or takes on the current AI hype cycle, it returns something that sounds like me — specific, slightly annoyed, grounded in a particular set of concerns — better than most general-purpose LLMs do when I prompt them with "write in my voice." The difference isn't the model. It's the retrieval layer. The model in both cases is doing the same approximate thing. What changes is what it retrieves before it starts generating.&lt;/p&gt;

&lt;p&gt;That realization landed harder than I expected: a significant chunk of what we call AI "intelligence" is retrieval. The system finds related content, mixes it with the query, and produces output shaped by that specific neighborhood of the embedding space. It's not thinking. It's not understanding. It's doing something closer to extremely sophisticated autocomplete with a memory. The illusion of reasoning comes from the quality of what was retrieved, not from inference happening in any deep sense.&lt;/p&gt;

&lt;p&gt;The uncomfortable follow-on: I started noticing the same thing in myself. A lot of what I'd been calling original thinking was my brain doing something structurally similar — retrieving from a curated internal dataset of influences, combining them in ways that felt novel, outputting with enough fluency to pass as insight. The bot didn't make me feel smarter. It made me suspicious of my own cognition.&lt;/p&gt;

&lt;p&gt;My bot sounds coherent because my bookmarks are coherent. I've spent years curating a specific worldview — skeptical of tech hype, interested in systems and incentives, irritated by vague abstraction. That worldview is baked into the dataset. Retrieval finds it. The model outputs it in grammatical sentences. The whole thing looks like intelligence from the outside.&lt;/p&gt;




&lt;p&gt;Then the Granta thing happened.&lt;/p&gt;

&lt;p&gt;If you missed it: Granta, the literary magazine, ran a piece that AI detectors flagged. The writing turned out to be human, and older than the detectors themselves. It was written before 2022, before the tools used to assess it even existed.&lt;/p&gt;

&lt;p&gt;The writer, understandably, was furious. The editorial response was clumsy. What struck me was the confidence behind the process — the idea that a detector score constitutes evidence of anything meaningful.&lt;/p&gt;

&lt;p&gt;It doesn't. AI detectors are probabilistic classifiers trained on distributional differences between human and AI writing. Dense, formal, or unusual prose trips them constantly. Academic writing, translated text, anything with a compressed or structured style — all of these get flagged. The detector isn't reading. It's pattern-matching statistical features. And those features shift as models improve, as writing styles evolve, as the gap between the training distribution and current reality widens.&lt;/p&gt;

&lt;p&gt;Watching publications, employers, and universities lean on these tools as if they're reliable is the same energy as relying on a polygraph. The tool isn't detecting deception. It's detecting nervousness, or formality, or the wrong register for the context. The conclusion isn't what the tool thinks it is.&lt;/p&gt;

&lt;p&gt;What the Granta situation made concrete for me: we have a collective problem with mistaking a signal for the thing the signal supposedly measures. Perplexity score is not authenticity. Semantic similarity is not understanding. And this is the same confusion that inflates most AI capability claims.&lt;/p&gt;




&lt;p&gt;Here's the irony I live with every day.&lt;/p&gt;

&lt;p&gt;I use AI heavily. I build with it, write with it, prototype faster because of it. I'm not performing skepticism while secretly relying on it — I'm actually relying on it, out in the open, and also genuinely skeptical of what it's doing and why the claims around it are so often overconfident.&lt;/p&gt;

&lt;p&gt;Yes, I'm part of the problem. I know that. But I built Bookmark Brain precisely because I wanted to understand what the problem actually is — not at the level of takes and op-eds, but at the level of retrieval logs and embedding distances and why a particular output came out the way it did. The people most confident about AI — evangelists and critics alike — are usually the ones who haven't built anything with it. They're responding to the outputs. I wanted to see the pipes.&lt;/p&gt;

&lt;p&gt;My bot makes this concrete in a specific way. Because I can see exactly what it's doing — retrieve, compose, style-match — I can no longer pretend the underlying process is mysterious. It isn't. It's a very good pattern engine. And the patterns it's good at are the ones humans have already made enough times to constitute a retrievable signal.&lt;/p&gt;

&lt;p&gt;The things it can't do are equally clear. It can't tell me something genuinely new. It can't resolve contradictions in my bookmarks; it just retrieves whichever side of an argument is more semantically proximate to my query. It has no persistent sense of what I care about most. Whatever preference it reflects is implicit in the stored embeddings and the retrieval ranking, not in anything like a value structure. If I've saved content across five years on Nigerian economic policy, it can retrieve that content. It cannot tell me what I should think about a new development that doesn't yet exist in those embeddings.&lt;/p&gt;
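&lt;p&gt;The "whichever side is more semantically proximate" behaviour falls straight out of how similarity retrieval ranks results. Here is a minimal sketch. It assumes toy three-dimensional vectors in place of real embeddings and uses hypothetical bookmark texts; it is not Bookmark Brain's code. Two bookmarks that disagree are ranked only by cosine similarity to the query, and the top result wins regardless of which claim is right.&lt;/p&gt;

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity: the angle between two vectors, ignoring their length."""
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norms

def retrieve(query_vec: list[float], bookmarks: list[dict], k: int = 1) -> list[dict]:
    """Return the k bookmarks closest to the query. No step compares the bookmarks
    with each other, so a contradiction between them is never noticed."""
    ranked = sorted(bookmarks, key=lambda b: cosine(query_vec, b["vec"]), reverse=True)
    return ranked[:k]

# Hypothetical bookmarks that disagree, with made-up vectors.
bookmarks = [
    {"text": "Policy X helps exporters", "vec": [0.9, 0.1, 0.0]},
    {"text": "Policy X hurts importers", "vec": [0.2, 0.9, 0.1]},
]
query = [0.8, 0.3, 0.0]  # a query phrased closer to the first framing
print(retrieve(query, bookmarks)[0]["text"])
```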

&lt;p&gt;That's not a criticism. It's just an accurate description of what the tool is. The criticism is when people — including, honestly, past me — talk about these systems as if they're operating at a different level entirely.&lt;/p&gt;




&lt;p&gt;Most people initially misunderstand generative AI the same way. They see the output and map it to human cognition because that's the only reference frame available. The output sounds like thinking. Therefore it is thinking. The logic is understandable and wrong.&lt;/p&gt;

&lt;p&gt;What's actually happening is closer to: the system has compressed a large representation of existing human expression, retrieves the most contextually relevant parts, and generates a continuation that's statistically consistent with that neighborhood. That's not nothing. In fact it's remarkable. But it's not reasoning. It's not understanding. And it absolutely is not reliable in domains where the training distribution doesn't match the actual problem.&lt;/p&gt;
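&lt;p&gt;A deliberately tiny illustration of "a continuation that's statistically consistent with that neighborhood" is a greedy bigram model. It is nothing like a transformer in scale or mechanism, and the corpus below is a made-up example. Still, it makes the same basic move: emit the most frequent next word given the context. It also shows what happens when the context falls outside anything it has seen.&lt;/p&gt;

```python
from collections import Counter, defaultdict

def train_bigrams(corpus: str) -> dict[str, Counter]:
    """Count which word follows which: the whole 'model' is this table."""
    table: dict[str, Counter] = defaultdict(Counter)
    words = corpus.lower().split()
    for prev, nxt in zip(words, words[1:]):
        table[prev][nxt] += 1
    return table

def continue_text(table: dict[str, Counter], start: str, length: int) -> str:
    """Greedily append the most frequent next word, up to `length` extra words."""
    out = [start]
    for _ in range(length):
        options = table.get(out[-1])
        if not options:
            break  # nothing ever followed this word, so there is nothing to predict
        out.append(options.most_common(1)[0][0])
    return " ".join(out)

table = train_bigrams("fluency is not understanding and fluency is not reasoning and fluency is cheap")
print(continue_text(table, "and", 3))    # a familiar neighborhood
print(continue_text(table, "novel", 3))  # outside the data: no continuation at all
```

&lt;p&gt;A real model does not stop at unfamiliar context the way this toy does. It still produces fluent output, and that is the harder case to spot.&lt;/p&gt;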

&lt;p&gt;Building Bookmark Brain made this concrete rather than abstract. I could watch the retrieval logs. I could see what it was pulling. I could trace why a particular response came out the way it did. That transparency — available only because I built it — is exactly what's missing when people interact with closed systems and anthropomorphize the outputs.&lt;/p&gt;




&lt;p&gt;The piece of this I'm still sitting with is about curation.&lt;/p&gt;

&lt;p&gt;My bot is useful because I curated carefully for years. The quality of the output is downstream of the quality of the input — not the model, not the prompt engineering, the input. 50,000 bookmarks that reflect a consistent set of concerns, an identifiable worldview, real opinions.&lt;/p&gt;

&lt;p&gt;If I'd bookmarked everything uncritically, the bot would be incoherent. Garbage in, garbage out, but at scale and with a convincing fluency that would make the garbage harder to spot.&lt;/p&gt;

&lt;p&gt;That's the thing about generative AI broadly: it doesn't make bad data good. It makes it fluent. And fluency is exactly the property that makes it hard for people — including detectors, including reviewers, including people who should know better — to evaluate what's actually in front of them.&lt;/p&gt;

&lt;p&gt;I built a tool that sounds like me. It works because of what I put into it, not because of anything the model does that's particularly special. The model is a compositor. The dataset is the author.&lt;/p&gt;

&lt;p&gt;That's the most clarifying thing I've learned. It's what almost every discussion about these systems gets wrong.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>webdev</category>
      <category>rag</category>
      <category>llm</category>
    </item>
    <item>
      <title>Toward a Standard Model for Agent Memory</title>
      <dc:creator>Daniel Nwaneri</dc:creator>
      <pubDate>Tue, 26 May 2026 17:29:36 +0000</pubDate>
      <link>https://dev.to/dannwaneri/toward-a-standard-model-for-agent-memory-3807</link>
      <guid>https://dev.to/dannwaneri/toward-a-standard-model-for-agent-memory-3807</guid>
      <description>&lt;p&gt;Most agent memory systems are digital attics.&lt;/p&gt;

&lt;p&gt;You put things in. You hope to find them later. You mostly don't. The retrieval is fuzzy, the context is lost, and the agent that needs to remember why a deployment failed three weeks ago gets back something that &lt;em&gt;looks&lt;/em&gt; related but carries none of the causal weight.&lt;/p&gt;

&lt;p&gt;This is the wrong mental model for memory. Not because the retrieval is bad (though it often is), but because storage is the wrong frame entirely.&lt;/p&gt;

&lt;p&gt;If memory is storage, you're building a place things go to accumulate. If memory is infrastructure, you're building something load-bearing. Agents depending on memory for causal context — why this failed, what fixed it, how that decision connected to this outcome — need load-bearing infrastructure. Not a warehouse. A power grid.&lt;/p&gt;

&lt;p&gt;The difference is consequential. Storage fails silently. You put something in and nothing comes out, or something wrong comes out, and the agent keeps going on degraded information it has no way to recognize as degraded. Infrastructure fails loudly, because the system depending on it stops working. Load-bearing memory makes failures visible. That's not a downside. That's the point.&lt;/p&gt;

&lt;p&gt;I've been building production agent workflows on Cloudflare Workers for two years, long enough to feel this distinction in concrete terms. The vectorize-mcp-worker — hybrid vector + BM25 search, cross-encoder reranking, a Gemma 4 MoE reflection layer — started as a storage system. Every architectural decision I've made since has been moving it toward infrastructure. That shift didn't happen all at once. It happened because the storage model kept producing the same failure: agents that couldn't distinguish between what &lt;em&gt;looked&lt;/em&gt; relevant and what &lt;em&gt;caused&lt;/em&gt; the thing they were trying to understand.&lt;/p&gt;

&lt;p&gt;A comment thread on Ken Walger's "Engineering Agent Memory" article clarified something I'd been working around without having language for.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Sequencing Problem
&lt;/h2&gt;

&lt;p&gt;The obvious fix for agent memory is write-time tagging. Tag what you know when you know it. Mark failures as failures. Mark resolutions as resolutions. Build the causal chain at the moment it happens.&lt;/p&gt;

&lt;p&gt;The problem is that causality is only visible in retrospect.&lt;/p&gt;

&lt;p&gt;An agent logs: "deployment failed due to timeout." That's a real memory. It happened. It's worth keeping. Later — same session, different session, a week later — the agent logs: "switched to async pattern, deployment succeeded." Also real. Also worth keeping.&lt;/p&gt;

&lt;p&gt;These two memories belong together. They're the before and after of the same causal chain. But at write-time, you don't know that. When the failure happens, there's no resolution to link it to. When the resolution happens, the failure might be in a different session, under a different key, already buried in the retrieval index.&lt;/p&gt;

&lt;p&gt;Vector search won't find the link reliably either. "Deployment failed due to timeout" and "switched to async pattern" don't look similar in embedding space. They're semantically distant. A similarity search for one won't surface the other. The causal connection is invisible to the retrieval layer.&lt;/p&gt;
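&lt;p&gt;Here is a rough way to see the gap. It assumes plain bag-of-words cosine as a stand-in for embedding similarity; real embedding models score pairs differently, so treat the numbers as illustrative only. The failure and its resolution share only "deployment" and "to". A hypothetical failure with a different cause scores about twice as high against the original failure as the actual fix does.&lt;/p&gt;

```python
import math
import re
from collections import Counter

def bag_of_words(text: str) -> Counter:
    """Lowercased word counts, punctuation dropped."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norms = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norms

failure = bag_of_words("deployment failed due to timeout")
resolution = bag_of_words("switched to async pattern, deployment succeeded")
other_failure = bag_of_words("deployment failed due to memory limit")  # hypothetical lookalike

print(round(cosine(failure, resolution), 2))     # the causal pair
print(round(cosine(failure, other_failure), 2))  # a lookalike with a different cause
```

&lt;p&gt;Under this scoring, a search seeded with the failure ranks the lookalike (0.73) above the fix (0.37). That is exactly the wrong answer for an agent asking what resolved it.&lt;/p&gt;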

&lt;p&gt;This is the sequencing problem. Write-time tagging is premature because the thing you need to tag — the causal relationship — doesn't exist yet when the first memory lands. And post-hoc retrieval is unreliable because the link you need to recover isn't semantic. It's structural. It's temporal. It's the kind of connection that requires knowing what happened before and after, not just what looks alike.&lt;/p&gt;

&lt;p&gt;Most memory systems stub this out. Summarize the session. Hope the summary captures enough. Move on.&lt;/p&gt;

&lt;p&gt;It doesn't. And the failure mode is subtle enough that you don't notice until the agent is confidently reasoning from a memory that's missing the half that would have changed the conclusion.&lt;/p&gt;




&lt;h2&gt;
  
  
  Instrumented Capture + Temporal Mirror
&lt;/h2&gt;

&lt;p&gt;The sequencing problem has two parts. They need two different solutions.&lt;/p&gt;

&lt;p&gt;The first part — what to capture at write-time — is solved by instrumented capture. Not tagging outcomes. Tagging intent. When an agent makes a tool call, the instrumentation layer sees not just the result but the active context: what the agent was attempting, what state it was in, what it expected to happen. "Attempting calibration sequence v2" is richer than "calibration failed." The failure is the outcome. The attempt is the context. You need both, and only one of them exists at write-time.&lt;/p&gt;
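&lt;p&gt;Here is a minimal sketch of the idea. It assumes a hypothetical &lt;code&gt;instrumented&lt;/code&gt; wrapper and an in-memory list standing in for the memory store; this is not the vectorize-mcp-worker or MCP SDK API. The wrapper records what the agent was attempting and what it expected before the tool runs, then attaches the outcome. The resulting entry carries intent and result together, including when the call raises.&lt;/p&gt;

```python
import time
from dataclasses import dataclass
from typing import Any, Callable

@dataclass
class CaptureEntry:
    intent: str        # what the agent says it is attempting
    expected: str      # what it expects to happen
    tool: str
    started_at: float
    outcome: str = "pending"
    detail: Any = None

MEMORY: list[CaptureEntry] = []  # stand-in for a real memory store

def instrumented(tool_name: str, fn: Callable[..., Any], intent: str, expected: str, *args: Any) -> Any:
    """Record intent before the call and the outcome after it, failures included."""
    entry = CaptureEntry(intent=intent, expected=expected, tool=tool_name, started_at=time.time())
    MEMORY.append(entry)
    try:
        result = fn(*args)
    except Exception as exc:
        entry.outcome, entry.detail = "error", repr(exc)
        raise  # capture must not change what the agent sees
    entry.outcome, entry.detail = "ok", result
    return result
```

&lt;p&gt;In an MCP server, something like this would sit in the tool handler, where the call's context is still available. Whether intent can be captured that cheaply depends on what the agent actually passes with each call.&lt;/p&gt;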

&lt;p&gt;MCP is the right layer for this. If the tool call routes through an MCP server, the server sees the full reasoning context — intent, failure mode, action taken — in real time, not reconstructed from a cold transcript later. Instrumentation at the call site captures signal that post-hoc analysis can't recover. The question is fidelity, which I'll come back to.&lt;/p&gt;

&lt;p&gt;The second part — bridging the gap between a failure tag and a resolution that lands later — is what Ken Walger calls the Temporal Mirror. A post-write reflection pass that runs across recent entries and surfaces causal candidates: memories that aren't similar in embedding space but are temporally adjacent and structurally complementary. The failure and its resolution. The question and the answer it didn't know was coming.&lt;/p&gt;
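&lt;p&gt;The candidate-generation half of that pass can be sketched without any model. The sketch makes three assumptions. Entries carry a hypothetical &lt;code&gt;kind&lt;/code&gt; tag and a timestamp. A failure pairs with any later resolution inside a time window. The &lt;code&gt;judge&lt;/code&gt; callback is a placeholder for the model call that decides whether a pair is actually causally linked (Gemma 4 MoE in the setup described here). Embedding similarity is never consulted.&lt;/p&gt;

```python
from dataclasses import dataclass
from typing import Callable

@dataclass(frozen=True)
class Entry:
    entry_id: str
    kind: str   # "failure", "resolution", or anything else
    text: str
    ts: float   # seconds since epoch

def causal_candidates(entries: list[Entry], window_s: float,
                      judge: Callable[[Entry, Entry], bool]) -> list[tuple[str, str]]:
    """Pair each failure with resolutions that land after it inside the window,
    then let `judge` (a stand-in for the reflection model) keep plausible links."""
    failures = [e for e in entries if e.kind == "failure"]
    resolutions = [e for e in entries if e.kind == "resolution"]
    pairs = []
    for f in failures:
        for r in resolutions:
            gap = r.ts - f.ts
            if gap > 0 and window_s >= gap and judge(f, r):
                pairs.append((f.entry_id, r.entry_id))
    return pairs
```

&lt;p&gt;The window bounds how many pairs the model has to judge. Structural complementarity itself is left entirely to &lt;code&gt;judge&lt;/code&gt;.&lt;/p&gt;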

&lt;p&gt;In my setup, that reflection pass runs via Gemma 4 MoE after ingestion. Not a local model. The reason is specific: causal candidate identification requires enough reasoning capacity to recognize structural complementarity across entries that don't look alike on the surface. A smaller local model handles classification well. It misses the non-obvious links. And the non-obvious links are exactly where the causal chain value lives — if the link were obvious, vector search would have found it already.&lt;/p&gt;

&lt;p&gt;The token cost is real. It's also bounded. The reflection pass runs once per ingestion event, not per query. A fixed overhead at write-time rather than a compounding cost every time the memory is accessed. That trade-off only makes sense if the reflection pass actually improves retrieval precision — which brings us to how the link gets stored once it's found.&lt;/p&gt;




&lt;h2&gt;
  
  
  Forensic Receipt: Pre-paying for Precision
&lt;/h2&gt;

&lt;p&gt;Once the reflection pass identifies a causal link, the question is how to store it so retrieval can use it deterministically.&lt;/p&gt;

&lt;p&gt;The answer isn't another embedding. It's a UUID.&lt;/p&gt;

&lt;p&gt;Ken Walger calls this the Forensic Receipt: a unique identifier that links a failure entry to its resolution entry, independent of their semantic similarity. The agent doesn't need to search for the connection. It's already encoded. Written with readable names instead of raw UUIDs, "deployment-failure-2024-11-04" links directly to "async-resolution-2024-11-07" via a stored causal edge, not via a similarity score that might or might not surface the right entry depending on how the query is phrased.&lt;/p&gt;
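&lt;p&gt;Here is a minimal sketch of a causal edge keyed by a UUID. It assumes a plain dictionary as the store, and the class and field names are hypothetical; a real deployment would persist the edges in a database table. Once the reflection pass records the edge, following it from the failure is a lookup, not a search.&lt;/p&gt;

```python
import uuid
from dataclasses import dataclass

@dataclass(frozen=True)
class CausalEdge:
    receipt: str        # the Forensic Receipt: a UUID naming this link
    failure_id: str
    resolution_id: str

class ReasoningLedger:
    def __init__(self) -> None:
        self.edges: dict[str, CausalEdge] = {}
        self.by_failure: dict[str, list[str]] = {}

    def link(self, failure_id: str, resolution_id: str) -> str:
        """Called by the reflection pass once it has confirmed a causal pair."""
        receipt = str(uuid.uuid4())
        self.edges[receipt] = CausalEdge(receipt, failure_id, resolution_id)
        self.by_failure.setdefault(failure_id, []).append(receipt)
        return receipt

    def resolutions_for(self, failure_id: str) -> list[str]:
        """Deterministic: the same answer however the question is phrased."""
        return [self.edges[r].resolution_id for r in self.by_failure.get(failure_id, [])]

ledger = ReasoningLedger()
ledger.link("deployment-failure-2024-11-04", "async-resolution-2024-11-07")
print(ledger.resolutions_for("deployment-failure-2024-11-04"))
```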

&lt;p&gt;This is the difference between a Reasoning Ledger and a Digital Attic. The attic accumulates. The ledger traces. When an agent queries memory for context about a deployment failure, the ledger doesn't return what &lt;em&gt;looks like&lt;/em&gt; deployment failures — it returns the specific failure and its resolution, linked by a chain of evidence that was built deliberately at ingestion time.&lt;/p&gt;

&lt;p&gt;The cost argument is cleaner than it sounds. Every fuzzy vector search that fails to surface the right memory is a cost: tokens spent, context window consumed, agent confidence degraded on a premise that's missing a piece. The reflection pass that builds the Forensic Receipt is an upfront investment against that compounding failure cost. You're pre-paying for retrieval precision at ingestion rather than paying repeatedly for imprecision at query time.&lt;/p&gt;

&lt;p&gt;The attic charges you every time you look for something and can't find it. The ledger charges you once, when you put it in correctly.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Observer's Tax
&lt;/h2&gt;

&lt;p&gt;There's a constraint that sits underneath all of this that the architecture can't ignore.&lt;/p&gt;

&lt;p&gt;Ken Walger, whose background is in forensic auditing, named it the Observer's Tax: if your instrumentation is heavy enough to change the latency or behavior of the agent, you've lost the high-fidelity signal you were trying to capture. The agent you're logging isn't the agent anymore. The causal chain you're preserving is the chain of a degraded system.&lt;/p&gt;

&lt;p&gt;Instrumented capture only works if the instrumentation is cheap enough to leave on in production. An MCP layer that adds 400ms to every tool call changes the agent's decision timing. A reflection pass that blocks ingestion until it completes changes the agent's memory availability mid-session. The Observer's Tax isn't a theoretical concern — it's the boundary condition that determines whether the whole architecture is describing a real system or an idealized one.&lt;/p&gt;

&lt;p&gt;The practical implication: prefer lightweight instrumentation over comprehensive instrumentation. Every additional signal the capture layer records is a cost paid in latency and behavioral change. The goal isn't maximum fidelity. It's minimum-viable fidelity: enough signal to build the causal chain, cheap enough not to corrupt it.&lt;/p&gt;
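&lt;p&gt;One way to keep the tax low is sketched below under assumptions: the class and its methods are hypothetical, and a bounded &lt;code&gt;collections.deque&lt;/code&gt; stands in for whatever buffer a real runtime offers. The hot path only appends a small tuple and never waits on I/O or a model. Everything expensive happens in a separate drain step that can run after the response is sent; on Cloudflare Workers, for example, that step could be deferred with &lt;code&gt;ctx.waitUntil()&lt;/code&gt;. The trade-off is that a full buffer drops its oldest signals instead of slowing the agent.&lt;/p&gt;

```python
import time
from collections import deque

class CheapCapture:
    """Hot path: one bounded append. Cold path: drain() does the real work."""

    def __init__(self, max_pending: int = 1000) -> None:
        # A full deque discards its oldest item on append, so capture never blocks.
        self.pending: deque = deque(maxlen=max_pending)

    def record(self, kind: str, summary: str) -> None:
        # Minimum-viable fidelity: a timestamp, a kind, a short summary. No I/O here.
        self.pending.append((time.time(), kind, summary))

    def drain(self, sink) -> int:
        """Move everything buffered into `sink` (a store, a reflection queue, ...)."""
        moved = 0
        while self.pending:
            sink(self.pending.popleft())
            moved += 1
        return moved
```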

&lt;p&gt;Event-driven triggering applies the same principle to the reflection pass. Running it after every write is expensive. Running it on a schedule risks the gap: a failure tag sitting unlinked for hours before the next sweep. The better trigger is structural: the reflection pass fires when a write contains specific signals — error states, resolution markers, state transitions — rather than on a timer or on every entry. The signal-to-noise ratio on causal candidates improves significantly. The cost stays bounded.&lt;/p&gt;
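&lt;p&gt;The structural trigger might look like the sketch below. The three signal types (error states, resolution markers, state transitions) come from the paragraph above. The field names, marker words, and status values are hypothetical placeholders. A production version would more likely read structured fields from the capture layer than match keywords in text.&lt;/p&gt;

```python
from typing import Optional

RESOLUTION_MARKERS = ("fixed", "resolved", "succeeded", "workaround")  # assumed vocabulary

def should_reflect(entry: dict, previous_status: Optional[str]) -> bool:
    """Fire the reflection pass only for writes that can open or close a causal chain."""
    status = entry.get("status")
    if status == "error":
        return True  # error state
    text = entry.get("text", "").lower()
    if any(marker in text for marker in RESOLUTION_MARKERS):
        return True  # resolution marker
    if status is not None and previous_status is not None and status != previous_status:
        return True  # state transition
    return False
```

&lt;p&gt;Routine writes fall through all three checks, which is what keeps the reflection cost bounded.&lt;/p&gt;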




&lt;h2&gt;
  
  
  Toward a Standard Model
&lt;/h2&gt;

&lt;p&gt;Taken together, these pieces are what turn memory from storage into infrastructure. Storage charges you at query time. The Reasoning Ledger charges you once, when you build it correctly.&lt;/p&gt;

&lt;p&gt;I'm calling this a proposal, not a standard. The pieces are real — they come from production systems, from a comment thread that surfaced the right vocabulary, from architectural decisions made under Cloudflare's CPU constraints where every overhead is visible immediately. But the boundary conditions aren't fully mapped. High-fidelity capture as the load-bearing requirement is the right frame. How cheap is cheap enough? What's the minimum-viable reflection pass for a given workflow complexity? Those questions don't have clean answers yet.&lt;/p&gt;

&lt;p&gt;What the architecture gives you right now is a way to think about the problem that storage framing doesn't. Not where to put things. What to build so that agents can reason from them.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Open Problem
&lt;/h2&gt;

&lt;p&gt;The open problem isn't retrieval. It's capture fidelity.&lt;/p&gt;

&lt;p&gt;Every part of this architecture downstream of the instrumentation layer depends on the capture layer getting the signal right. The Temporal Mirror can only find causal connections that the ingestion pipeline actually received. The Forensic Receipt can only link entries that contain enough structural signal to recognize as complementary. The Observer's Tax names the constraint but doesn't solve it: we don't yet have a principled way to determine what minimum-viable instrumentation looks like for a given agent workflow.&lt;/p&gt;

&lt;p&gt;That's the next thing to figure out. Not how to retrieve memory better — vector search, BM25, cross-encoder reranking, all of that is solved enough to build on. How to capture what the agent actually did in a form that makes causal reasoning possible later, without the capture itself changing what the agent does.&lt;/p&gt;

&lt;p&gt;Most of this piece came out of a comment thread on Ken Walger's &lt;a href="https://dev.to/kenwalger/engineering-agent-memory-4a42"&gt;"Engineering Agent Memory"&lt;/a&gt; on DEV.to. Ken coined both sides of the central frame — "digital attic" and "power grid for reasoning" — in the same sentence, and named the Temporal Mirror, the Forensic Receipt, the Observer's Tax, and the Standard Model. The sequencing problem, event-driven triggering, and Instrumented Capture came from my end. The production specifics — Gemma 4 MoE reflection pass, Cloudflare CPU constraints, vectorize-mcp-worker architecture — are mine. Everything else emerged from the exchange.&lt;/p&gt;

&lt;p&gt;If you want to see how these ideas developed before this write-up, the thread is worth reading. Ken thinks carefully about forensic integrity in ways that transfer directly to agentic systems.&lt;/p&gt;

&lt;p&gt;The Standard Model isn't finished. The open constraint — how cheap is cheap enough for a given workflow — doesn't have a clean answer yet. But the frame is more useful than the one it replaces. You can build on infrastructure. You can't build on an attic.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>agents</category>
      <category>architecture</category>
      <category>memory</category>
    </item>
    <item>
      <title>Built a 100k-Document RAG System by Hand. Hermes Read the Architecture in 47 Seconds.</title>
      <dc:creator>Daniel Nwaneri</dc:creator>
      <pubDate>Thu, 21 May 2026 15:43:57 +0000</pubDate>
      <link>https://dev.to/dannwaneri/built-a-100k-document-rag-system-by-hand-hermes-read-the-architecture-in-47-seconds-14ge</link>
      <guid>https://dev.to/dannwaneri/built-a-100k-document-rag-system-by-hand-hermes-read-the-architecture-in-47-seconds-14ge</guid>
      <description>&lt;p&gt;&lt;em&gt;This is a submission for the &lt;a href="https://dev.to/challenges/hermes-agent-2026-05-15"&gt;Hermes Agent Challenge&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;For the past six months, I have been building what Hermes does by hand.&lt;/p&gt;

&lt;p&gt;Not as a thought experiment. Actually building it: a hybrid BM25 + vector search engine on Cloudflare Workers, a Gemma 4 MoE reflection layer, 100,000+ documents indexed, an MCP server with Durable Objects, multimodal image ingestion with Llama 4 Scout. I won the OpenClaw Challenge writing about spec-writing agents. I built bookmark-cli — a personal knowledge engine with 45,053 tweets indexed from years of reading.&lt;/p&gt;

&lt;p&gt;When the Hermes Agent Challenge dropped, I had one question: does the tool do what I taught myself to do?&lt;/p&gt;

&lt;p&gt;This is that experiment.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Setup
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;My machine:&lt;/strong&gt; Windows 11, Python 3.14, Git Bash (MSYS2), no WSL2.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The repo:&lt;/strong&gt; &lt;code&gt;vectorize-mcp-worker&lt;/code&gt; — my production hybrid RAG system. Six months of architectural decisions baked into TypeScript. V4 routing. Six specialized query routes. Cost analytics down to the millisecond. The kind of codebase where you know exactly what a competent agent &lt;em&gt;should&lt;/em&gt; say about it.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The goal:&lt;/strong&gt; Install Hermes, configure it to use Claude, have it summarise the architecture in 5 bullets, document every friction point.&lt;/p&gt;




&lt;h2&gt;
  
  
  Friction Point 1: The Installer Doesn't Know You Exist
&lt;/h2&gt;

&lt;p&gt;The official install command:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl &lt;span class="nt"&gt;-fsSL&lt;/span&gt; https://raw.githubusercontent.com/NousResearch/hermes-agent/main/scripts/install.sh | bash
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I ran it. The script loaded, detected my OS, and exited immediately:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight powershell"&gt;&lt;code&gt;&lt;span class="n"&gt;Windows&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;detected.&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;Please&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;use&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;the&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;PowerShell&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;installer:&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="n"&gt;irm&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;https://raw.githubusercontent.com/.../install.ps1&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;iex&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's not in the README. The PowerShell installer exists — it's right there in the repo — but nothing in the public docs points Windows users to it. You find out by reading the bash script's source.&lt;/p&gt;

&lt;p&gt;The install UX assumes you're on Linux or macOS. Windows is supported, but you're expected to find your own way there.&lt;/p&gt;




&lt;h2&gt;
  
  
  Friction Point 2: WSL2 Would Have Saved Me This
&lt;/h2&gt;

&lt;p&gt;I don't have WSL2 installed. That's on me. Every agent tooling project I've touched in the last year has quietly assumed it. Hermes is no different.&lt;/p&gt;

&lt;p&gt;If you're on Windows and planning to work with CLI agents: install WSL2 first. Don't assume Git Bash is equivalent. It almost never is.&lt;/p&gt;




&lt;h2&gt;
  
  
  Friction Point 3: PyPI to the Rescue (Unofficially)
&lt;/h2&gt;

&lt;p&gt;Rather than chase the PowerShell installer, I checked PyPI:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip index versions hermes-agent
&lt;span class="c"&gt;# Available versions: 0.13.0&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;hermes-agent 0.13.0&lt;/code&gt; exists on PyPI. The official flow never mentions this. I installed it directly:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;hermes-agent
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Four minutes, some progress bars, done.&lt;/p&gt;




&lt;h2&gt;
  
  
  Friction Point 4: Dependency Hell
&lt;/h2&gt;

&lt;p&gt;The install succeeded but left me with this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;browser-use 0.12.6 requires openai==2.16.0, but you have openai 2.24.0
browser-use 0.12.6 requires requests==2.32.5, but you have requests 2.33.0
browser-use 0.12.6 requires rich==14.3.1, but you have rich 14.3.3
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Hermes bumped &lt;code&gt;openai&lt;/code&gt; from 2.16.0 to 2.24.0. Another tool I use (&lt;code&gt;browser-use&lt;/code&gt;) pins the older version. Pip resolved it in Hermes's favor. Something will break later. I don't know what yet.&lt;/p&gt;

&lt;p&gt;This is the tax you pay for a rich ecosystem. Every major agent framework is in a dependency arms race.&lt;/p&gt;
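&lt;p&gt;Running &lt;code&gt;pip check&lt;/code&gt; after any install lists broken requirements like these. To see specifically which exact pins a package has lost, here is a small sketch using the standard library's &lt;code&gt;importlib.metadata&lt;/code&gt;. It only handles simple &lt;code&gt;name==version&lt;/code&gt; pins and skips ranges and anything with extras or environment markers, so treat it as a diagnostic aid, not a resolver. The helper names are illustrative.&lt;/p&gt;

```python
import re
from importlib import metadata

def _norm(name: str) -> str:
    """Normalize a distribution name so 'Python_Dotenv' matches 'python-dotenv'."""
    return re.sub(r"[-_.]+", "-", name).lower()

def broken_exact_pins(requirements: list[str], installed: dict[str, str]) -> list[tuple[str, str, str]]:
    """(name, pinned, installed) for each simple 'name==version' pin that no longer holds."""
    by_name = {_norm(k): v for k, v in installed.items()}
    conflicts = []
    for req in requirements:
        if ";" in req:
            continue  # conditional requirement (extra or marker): skipped in this sketch
        spec = req.replace("(", "").replace(")", "")
        if "==" not in spec:
            continue  # ranges such as >= are out of scope here
        name, pinned = (part.strip() for part in spec.split("==", 1))
        have = by_name.get(_norm(name))
        if have is not None and have != pinned:
            conflicts.append((name, pinned, have))
    return conflicts

def check_installed(dist_name: str) -> list[tuple[str, str, str]]:
    """Compare a package's declared pins with what is actually installed."""
    installed = {
        d.metadata.get("Name"): d.version
        for d in metadata.distributions()
        if d.metadata.get("Name")
    }
    return broken_exact_pins(metadata.requires(dist_name) or [], installed)
```

&lt;p&gt;In the environment above, &lt;code&gt;check_installed("browser-use")&lt;/code&gt; should list the pins the Hermes install moved. The durable fix is still a separate virtual environment per tool, which &lt;code&gt;hermes doctor&lt;/code&gt; also recommends.&lt;/p&gt;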




&lt;h2&gt;
  
  
  Friction Point 5: The Binary Isn't on PATH
&lt;/h2&gt;

&lt;p&gt;After install, &lt;code&gt;hermes&lt;/code&gt; doesn't work:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$ &lt;/span&gt;hermes doctor
bash: hermes: &lt;span class="nb"&gt;command &lt;/span&gt;not found
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The executables land in &lt;code&gt;%APPDATA%\Python\Python314\Scripts&lt;/code&gt;, which isn't on PATH by default. Pip warns you — in small print, at the end of a long install log.&lt;/p&gt;

&lt;p&gt;Fixed with the full path:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;/c/Users/DELL/AppData/Roaming/Python/Python314/Scripts/hermes.exe doctor
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  &lt;code&gt;hermes doctor&lt;/code&gt; — Where It Gets Good
&lt;/h2&gt;

&lt;p&gt;Once you find the binary, the experience shifts. &lt;code&gt;hermes doctor&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;◆ Python Environment
  ✓ Python 3.14.4
  ⚠ Not in virtual environment (recommended)

◆ Required Packages
  ✓ OpenAI SDK
  ✓ Rich (terminal UI)
  ✓ python-dotenv

◆ Tool Availability
  ✓ browser
  ✓ terminal
  ✓ file
  ✓ memory
  ⚠ browser-cdp (system dependency not met)
  ⚠ web (missing EXA_API_KEY, TAVILY_API_KEY...)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Every warning includes the fix. ✓ or ⚠, then the exact command. You don't go looking for the next step. After an install that gave you nothing, this lands differently.&lt;/p&gt;




&lt;h2&gt;
  
  
  Friction Point 6: &lt;code&gt;hermes model&lt;/code&gt; Is Interactive-Only
&lt;/h2&gt;

&lt;p&gt;I wanted to configure Claude from a script:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;hermes model &lt;span class="nt"&gt;--provider&lt;/span&gt; anthropic &lt;span class="nt"&gt;--model&lt;/span&gt; claude-sonnet-4-5
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;No such flags. &lt;code&gt;hermes model&lt;/code&gt; opens an interactive TUI, and I found no way to set the default model non-interactively from the CLI.&lt;/p&gt;

&lt;p&gt;I wrote the API key directly to &lt;code&gt;~/.hermes/.env&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight properties"&gt;&lt;code&gt;&lt;span class="py"&gt;ANTHROPIC_API_KEY&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;sk-ant-...&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then passed the model inline at runtime:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;hermes chat &lt;span class="nt"&gt;-m&lt;/span&gt; &lt;span class="s2"&gt;"anthropic/claude-sonnet-4-5"&lt;/span&gt; &lt;span class="nt"&gt;-q&lt;/span&gt; &lt;span class="s2"&gt;"..."&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Friction Point 7: &lt;code&gt;--source&lt;/code&gt; Doesn't Set the Working Directory
&lt;/h2&gt;

&lt;p&gt;I tried:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;hermes chat &lt;span class="nt"&gt;--source&lt;/span&gt; &lt;span class="s2"&gt;"C:/path/to/vectorize-mcp-worker/src"&lt;/span&gt; &lt;span class="nt"&gt;-q&lt;/span&gt; &lt;span class="s2"&gt;"Summarise..."&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Hermes ignored the path and started from my home directory, listing unrelated projects:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;I can see:
- bookmark-cli
- dannwaneri
- siteclinic

Which project should I analyze?
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;--source&lt;/code&gt; doesn't mean "work in this directory." I had to &lt;code&gt;cd&lt;/code&gt; into the repo first. Small thing, confusing when you're scripting.&lt;/p&gt;
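&lt;p&gt;When scripting, the same fix can live in the caller: start Hermes with the repo as its working directory. Here is a sketch using Python's &lt;code&gt;subprocess&lt;/code&gt;. The flags are only the ones used in this post, the helper names are illustrative, and &lt;code&gt;exe&lt;/code&gt; can be the full path to &lt;code&gt;hermes.exe&lt;/code&gt; if it isn't on PATH.&lt;/p&gt;

```python
import subprocess
from pathlib import Path

def hermes_argv(model: str, query: str, exe: str = "hermes") -> list[str]:
    """Build the command used in this post: chat, -m model, -q query, -Q."""
    return [exe, "chat", "-m", model, "-q", query, "-Q"]

def ask_repo(repo: Path, model: str, query: str, exe: str = "hermes") -> str:
    """Run one query with `repo` as the working directory (the scripted `cd`).
    check=True raises CalledProcessError if hermes exits non-zero."""
    result = subprocess.run(
        hermes_argv(model, query, exe),
        cwd=repo,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout

# Example (not run here):
# print(ask_repo(Path("vectorize-mcp-worker"), "anthropic/claude-sonnet-4-5",
#                "Summarise the architecture of this codebase in exactly 5 bullet points."))
```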




&lt;h2&gt;
  
  
  The Actual Session
&lt;/h2&gt;

&lt;p&gt;Running from the repo directory with Claude as the model:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;cd &lt;/span&gt;vectorize-mcp-worker
hermes chat &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-m&lt;/span&gt; &lt;span class="s2"&gt;"anthropic/claude-sonnet-4-5"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-q&lt;/span&gt; &lt;span class="s2"&gt;"Summarise the architecture of this codebase in exactly 5 bullet points."&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-Q&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Output (verbatim, 47 seconds):&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Cloudflare Workers-based RAG Platform&lt;/strong&gt; — Serverless hybrid RAG system deployed on Cloudflare Workers with Vectorize (384d/1024d vector store), D1 (SQLite for full-text and metadata), Workers AI (embeddings via bge-small/bge-m3/qwen3-0.6b), service bindings to multimodal-pro-worker for image processing, and MCP Durable Objects for stateful agent sessions.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Intelligent V4 Routing Architecture&lt;/strong&gt; — Query router (RouteSelector) uses Llama 3.2 3B intent classifier to dispatch to 6 specialized routes: ENTITY_LOOKUP (SQL), SEMANTIC_SEARCH (vector), KEYWORD_EXACT (BM25), OCR_DOCUMENT (text extraction), VISUAL_ANALYSIS (Llama 4 Scout vision), and GRAPH_REASONING (relationship traversal), with automatic fallback to vector search on route failure.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Hybrid Search with RRF Fusion&lt;/strong&gt; — Dual-path retrieval combining vector search (Vectorize query) and BM25 keyword search (D1 full-text index), merged via Reciprocal Rank Fusion (RRF) with k=60, followed by optional cross-encoder reranking (bge-reranker-base), semantic highlighting, and Cloudflare Cache API for cacheable queries.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Chunking and Multi-Tenancy Pipeline&lt;/strong&gt; — Ingestion engine performs recursive chunking with 15% overlap, parallel batch embedding (up to 100 docs/batch), deduplication via parent_id, and tenant isolation enforced by middleware that injects tenant_id filters into all search/delete operations.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Model Context Protocol Server&lt;/strong&gt; — Exposes tools via modern MCP Streamable HTTP transport backed by VectorizeMcpAgent Durable Object at /mcp endpoint, with legacy JSON-RPC endpoints, cost analytics tracking (totalMs/embeddingMs/vectorMs/rerankerMs), rate limiting on mutating endpoints, and reflection system for synthetic training data generation.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;I read that three times.&lt;/p&gt;

&lt;p&gt;It got every major component right. RRF with k=60 — correct. Six routing modes, named accurately. The Durable Objects backing the MCP server. I've had reviewers describe the search as "just Vectorize" — missing the BM25 path entirely. Hermes got all of it in 47 seconds.&lt;/p&gt;

&lt;p&gt;It missed the Gemma 4 MoE reflection layer specifically, and didn't distinguish between embedding model options by dimension size the way I would. Details inside details. The architecture read is accurate.&lt;/p&gt;
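&lt;p&gt;For anyone who hasn't met Reciprocal Rank Fusion: each document scores 1/(k + rank) in every result list it appears in, and the scores are summed. Agreement between the vector and BM25 paths therefore tends to outweigh a single high rank. Here is a minimal sketch of the standard formula with k=60, using made-up document IDs. It is not the vectorize-mcp-worker source.&lt;/p&gt;

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Sum 1 / (k + rank) for each document across ranked lists (rank starts at 1)."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

vector_hits = ["doc-a", "doc-b", "doc-c"]  # hypothetical vector-search order
bm25_hits = ["doc-c", "doc-a", "doc-d"]    # hypothetical BM25 order
for doc_id, score in reciprocal_rank_fusion([vector_hits, bm25_hits]):
    print(doc_id, round(score, 4))
```

&lt;p&gt;doc-a and doc-c appear in both lists, so they land above doc-b and doc-d, which each appear once.&lt;/p&gt;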




&lt;h2&gt;
  
  
  What I Actually Think
&lt;/h2&gt;

&lt;p&gt;Hermes got the architecture right. Not close enough — right. And that's the thing that mattered most to me, because it's the hardest thing to fake. You can summarise README bullet points. You can't summarise a codebase you didn't actually read.&lt;/p&gt;

&lt;p&gt;The install is rough if you're on Windows. Not broken — rough. The PyPI path works, the dependency conflicts are manageable, but "check the bash script source to find the PowerShell installer" is not an onboarding flow.&lt;/p&gt;

&lt;p&gt;The gap I expected — between six months of building this by hand and a first session with a tool someone else built — was smaller than I thought it would be. That means either Hermes is further along than the install suggests, or I built something closer to commodity than I wanted to admit.&lt;/p&gt;

&lt;p&gt;Probably both.&lt;/p&gt;

&lt;p&gt;Multi-step sessions are what I actually want to test next — whether Hermes holds architectural context across a conversation the way I hold it across a codebase. That's the harder question. This was one query.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Honest Take
&lt;/h2&gt;

&lt;p&gt;If you're a Windows developer coming to Hermes cold: budget 30 minutes to fight the install, not 5. Once you're past it, the tool is real.&lt;/p&gt;

&lt;p&gt;If you're a builder who's been assembling RAG pipelines by hand: Hermes won't replace what you know. It uses what you know. The architectural understanding you built making those systems is exactly what makes you dangerous with a tool like this.&lt;/p&gt;

&lt;p&gt;The difference is now I can ask the codebase questions instead of just reading it.&lt;/p&gt;

&lt;p&gt;That's not nothing.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;All commands run on Windows 11, Python 3.14, Git Bash. Hermes Agent v0.13.0. Claude Sonnet 4.5 via Anthropic API.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;vectorize-mcp-worker: hybrid RAG on Cloudflare Workers — &lt;a href="https://github.com/dannwaneri/vectorize-mcp-worker" rel="noopener noreferrer"&gt;github.com/dannwaneri/vectorize-mcp-worker&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>hermesagentchallenge</category>
      <category>devchallenge</category>
      <category>agents</category>
      <category>ai</category>
    </item>
    <item>
      <title>MCP Just Landed on Your Phone: What Google AI Edge Gallery Actually Does</title>
      <dc:creator>Daniel Nwaneri</dc:creator>
      <pubDate>Wed, 20 May 2026 17:09:24 +0000</pubDate>
      <link>https://dev.to/dannwaneri/mcp-just-landed-on-your-phone-what-google-ai-edge-gallery-actually-does-1567</link>
      <guid>https://dev.to/dannwaneri/mcp-just-landed-on-your-phone-what-google-ai-edge-gallery-actually-does-1567</guid>
      <description>&lt;p&gt;&lt;em&gt;This is a submission for the &lt;a href="https://dev.to/challenges/google-io-writing-2026-05-19"&gt;Google I/O Writing Challenge&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;




&lt;p&gt;I was already running MCP servers on my desktop — connected to Claude, wired into my daily workflow — when Google announced at I/O 2026 that AI Edge Gallery now supports MCP connections on Android. I pulled out my Pixel and started testing.&lt;/p&gt;

&lt;p&gt;First attempt: "No eligible devices." The app requires capable hardware. Second device — it opened.&lt;/p&gt;

&lt;p&gt;What I found is more interesting than the announcement.&lt;/p&gt;




&lt;h2&gt;
  
  
  What Google AI Edge Gallery Is
&lt;/h2&gt;

&lt;p&gt;Google AI Edge Gallery is an open-source Android app from Google Research. Large language models run entirely on your device — no internet required for inference, no data leaves your phone. Every prompt, every image, every audio clip stays local.&lt;/p&gt;

&lt;p&gt;That part isn't new. What changed at I/O 2026: the app now supports agents. Not a chat interface with a web search button — a proper agent runtime with toggleable skills, calendar integration, scheduled reminders, and experimental MCP connections. The same protocol the rest of the serious agent ecosystem runs on.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Model List Surprised Me
&lt;/h2&gt;

&lt;p&gt;Opening the Models panel, I expected a Gemma showcase. That's not what this is.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Model&lt;/th&gt;
&lt;th&gt;Size&lt;/th&gt;
&lt;th&gt;Notes&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Gemma-4-E2B-it&lt;/td&gt;
&lt;td&gt;2.6 GB&lt;/td&gt;
&lt;td&gt;Recommended across most use cases, 32K context&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Gemma-4-E4B-it&lt;/td&gt;
&lt;td&gt;3.7 GB&lt;/td&gt;
&lt;td&gt;Multi-modal, 32K context&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Gemma-3n-E2B-it&lt;/td&gt;
&lt;td&gt;3.7 GB&lt;/td&gt;
&lt;td&gt;Text, vision, audio, 4096 context&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Gemma-3n-E4B-it&lt;/td&gt;
&lt;td&gt;4.9 GB&lt;/td&gt;
&lt;td&gt;Text, vision, audio, 4096 context&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Gemma3-1B-IT&lt;/td&gt;
&lt;td&gt;584 MB&lt;/td&gt;
&lt;td&gt;4-bit quantized&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Qwen2.5-1.5B-Instruct&lt;/td&gt;
&lt;td&gt;1.6 GB&lt;/td&gt;
&lt;td&gt;Alibaba's model, LiteRT-LM ready&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;DeepSeek-R1-Distill-Qwen-1.5B&lt;/td&gt;
&lt;td&gt;1.8 GB&lt;/td&gt;
&lt;td&gt;Reasoning model, fully on-device&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;TinyGarden-270M&lt;/td&gt;
&lt;td&gt;289 MB&lt;/td&gt;
&lt;td&gt;Fine-tuned FunctionGemma, task automation&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;MobileActions-270M&lt;/td&gt;
&lt;td&gt;289 MB&lt;/td&gt;
&lt;td&gt;Fine-tuned FunctionGemma, device control&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;DeepSeek and Qwen sitting in a Google app isn't accidental. The actual product here is &lt;strong&gt;LiteRT-LM&lt;/strong&gt; — Google's mobile inference runtime — not Gemma. Which model you run on top is your choice.&lt;/p&gt;




&lt;h2&gt;
  
  
  Gemma 4: The Number That Matters
&lt;/h2&gt;

&lt;p&gt;Gemma 4 E2B runs in 2.6 GB and opens a &lt;strong&gt;32K context window&lt;/strong&gt;. Gemma 3n had 4096 tokens. Multi-modal input — text, images, and audio — in one model.&lt;/p&gt;

&lt;p&gt;That context gap is what makes the agent use case real. Tool call outputs, calendar data, conversation history: there's room to feed all of it back to the model without truncating. Gemma 3n, the previous on-device Gemma generation here, topped out at 4096 tokens. 32K is an eightfold jump, running offline on a phone.&lt;/p&gt;

&lt;p&gt;All models run through LiteRT-LM, Google's LLM runtime built on LiteRT (the framework formerly known as TensorFlow Lite).&lt;/p&gt;




&lt;h2&gt;
  
  
  Agent Skills: What It Actually Looks Like
&lt;/h2&gt;

&lt;p&gt;This is the part I came to test. Here's what's actually inside.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Skills Architecture
&lt;/h3&gt;

&lt;p&gt;Agent Skills ships with 12 skills, split into two tiers:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Built-in skills&lt;/strong&gt; (Google's own):&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;calculate-hash&lt;/code&gt; — hash a given text&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;create-calendar-event&lt;/code&gt; — write to OS calendar&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;interactive-map&lt;/code&gt; — show a map view for a location&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Community skills&lt;/strong&gt; (user-created, same interface):&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;read-calendar-events&lt;/code&gt; — read OS calendar for a specific date&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;schedule-notification&lt;/code&gt; — schedule a one-time or repeating daily notification&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;query-wikipedia&lt;/code&gt; — pull a Wikipedia summary on a topic&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;qr-code&lt;/code&gt; — generate a QR code for a URL&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;mood-tracker&lt;/code&gt; — stores daily mood and comments, tracks history&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;send-email&lt;/code&gt; — send an email&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;learn-something-new&lt;/code&gt; — daily learning companion with image card and scheduled notification&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;kitchen-adventure&lt;/code&gt; — dungeon master RPG set in a world of sentient kitchen appliances&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;text-spinner&lt;/code&gt; — "Spin the given text on my head"&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Two things worth noticing. First: calendar read and calendar write are &lt;strong&gt;separate skills&lt;/strong&gt; with separate toggles. That's a granular permissions model — the agent can write events without having read access, or vice versa. Second: &lt;code&gt;send-email&lt;/code&gt; means an offline on-device model can send emails through a skill. That's not a demo capability.&lt;/p&gt;

&lt;p&gt;Every skill has a "View" button — inspect the full skill definition before enabling it. Each is individually toggleable. The chat bar shows a live count: &lt;strong&gt;Skills 8 | MCP 0&lt;/strong&gt;. Skills and MCP are tracked separately in the same toolbar.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Four ways to add skills:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Featured list — curated community contributions&lt;/li&gt;
&lt;li&gt;Load from URL — any web-hosted skill directory&lt;/li&gt;
&lt;li&gt;Import local skill — from the device directly, no server needed&lt;/li&gt;
&lt;li&gt;GitHub Discussions — browse the full community&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Local import is the developer detail. You can build and test a skill entirely on-device without hosting anything. The iteration loop for custom skill development doesn't require a deployed server.&lt;/p&gt;

&lt;p&gt;Agent Skills only accepts two models — both Gemma 4. Gemma 3, DeepSeek, Qwen: available for AI Chat, locked out of agents. Google drew a capability line and stuck to it.&lt;/p&gt;

&lt;h3&gt;
  
  
  The MCP Setup
&lt;/h3&gt;

&lt;p&gt;Tap the MCP counter in the toolbar → empty state with a single button: &lt;strong&gt;+ Add MCP server&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;The dialog asks for:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Your MCP server URL&lt;/li&gt;
&lt;li&gt;Authorization: &lt;strong&gt;None&lt;/strong&gt;, &lt;strong&gt;Request header&lt;/strong&gt;, or &lt;strong&gt;OAuth (WIP)&lt;/strong&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;OAuth isn't ready yet. That's the honest limitation for anyone planning to connect enterprise or authenticated MCP servers: for now you're limited to no auth or a static request header, such as a bearer token. Public MCP servers work today. Servers that require OAuth will have to wait for it to ship.&lt;/p&gt;

&lt;p&gt;Worth noting what this architecture means even without OAuth: tool-selection logic runs on-device, and only the structured API call leaves the phone. For healthcare or legal tooling that can't send raw queries to a server, that's a meaningful trust boundary — not a workaround.&lt;/p&gt;

&lt;p&gt;For developers already running public MCP servers: enter the URL, and the app fetches the tool manifest and loads the tool definitions into the system prompt alongside your active skills. The model handles invocation from there.&lt;/p&gt;
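
&lt;p&gt;Under the MCP specification, "fetches the tool manifest" means a JSON-RPC 2.0 &lt;code&gt;tools/list&lt;/code&gt; request sent after the &lt;code&gt;initialize&lt;/code&gt; handshake. Each returned tool carries a &lt;code&gt;name&lt;/code&gt;, an optional &lt;code&gt;description&lt;/code&gt;, and a JSON Schema &lt;code&gt;inputSchema&lt;/code&gt;. The app doesn't show how it formats those definitions internally. &lt;code&gt;buildToolPrompt&lt;/code&gt;, the &lt;code&gt;Skill&lt;/code&gt; shape, and the prompt layout below are therefore hypothetical. The sketch only illustrates the general pattern:&lt;/p&gt;

```typescript
// Tool entry shape in an MCP tools/list result (name, description, inputSchema per the MCP spec)
interface McpTool {
  name: string;
  description?: string;
  inputSchema: object;
}

// Hypothetical stand-in for one of the app's toggleable skills
interface Skill {
  name: string;
  description: string;
  enabled: boolean;
}

// JSON-RPC 2.0 body a client POSTs to the server URL, after the initialize handshake
function toolsListRequest(id: number) {
  return { jsonrpc: "2.0", id, method: "tools/list", params: {} };
}

// Hypothetical: merge enabled skills and MCP tools into one system-prompt section.
// Disabled skills are left out entirely, mirroring the per-skill toggles.
function buildToolPrompt(skills: Skill[], mcpTools: McpTool[]): string {
  const lines: string[] = ["Available tools:"];
  for (const skill of skills) {
    if (skill.enabled) lines.push(`- skill ${skill.name}: ${skill.description}`);
  }
  for (const tool of mcpTools) {
    const args = JSON.stringify(tool.inputSchema);
    lines.push(`- mcp ${tool.name}: ${tool.description ?? "(no description)"} args=${args}`);
  }
  return lines.join("\n");
}

// Example: two skills (one toggled off) and one tool from a hypothetical MCP server
const toolPrompt = buildToolPrompt(
  [
    { name: "query-wikipedia", description: "Pull a Wikipedia summary", enabled: true },
    { name: "send-email", description: "Send an email", enabled: false },
  ],
  [{ name: "search", description: "Hybrid search", inputSchema: { type: "object" } }],
);
console.log(toolPrompt);
```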




&lt;h2&gt;
  
  
  The Rest of the App
&lt;/h2&gt;

&lt;p&gt;Each use case — AI Chat, Ask Image, Audio Scribe, Prompt Lab — links directly to API documentation and example code from its own screen. This is a teaching environment, not just a demo. Google built it for developers to read, fork, and build on.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;AI Chat&lt;/strong&gt; supports Thinking Mode — Gemma 4's step-by-step reasoning exposed inline. Useful before you wire a model into anything production-facing.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Mobile Actions&lt;/strong&gt; runs on MobileActions-270M — 289 MB, fully offline. A 270M parameter model doing device automation. For context, Gemma 4 E2B is roughly 10x that size and handles general reasoning. The argument being made with that design: narrow fine-tunes at sub-300MB can do discrete tasks better than a general model, and they fit anywhere.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Tiny Garden&lt;/strong&gt; — 289 MB, natural language gardening game — is the same point made playfully. Watch how function-calling works on-device in a consequence-free environment.&lt;/p&gt;




&lt;h2&gt;
  
  
  Getting Started
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;1. Install Google AI Edge Gallery (Play Store)
   Requires capable Android hardware — tested on Pixel

2. Download Gemma-4-E2B-it (2.6 GB)
   Only Gemma 4 models run Agent Skills

3. Open Agent Skills → tap the Skills or MCP button in the toolbar

4. For built-in skills: toggle on what you need
   For MCP: tap MCP → Add MCP server → enter URL + auth

5. Start chatting — the model sees your active skills and connected tools
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Source code: &lt;a href="https://github.com/google-ai-edge/gallery" rel="noopener noreferrer"&gt;github.com/google-ai-edge/gallery&lt;/a&gt;. Community skills are shared via GitHub Discussions.&lt;/p&gt;




&lt;h2&gt;
  
  
  What Developers Should Take From This
&lt;/h2&gt;

&lt;p&gt;The Gemma 4-only lock on Agent Skills is the tell. This isn't a checkbox feature. Google shipped agentic tool use where the model can handle it reliably, and locked out weaker models until that changes. That's a better decision than letting anything run and degrading silently.&lt;/p&gt;

&lt;p&gt;The OAuth (WIP) flag on MCP auth is the other honest signal. Public MCP servers work today. Enterprise-grade authenticated connections aren't there yet. That's not a failure — it's a preview of where this is going, with the current edges visible rather than hidden.&lt;/p&gt;

&lt;p&gt;The 270M fine-tuned models are the underrated part of this release. MobileActions and TinyGarden are evidence of a different architecture: specialized micro-models for narrow tasks, general models for reasoning, LiteRT-LM as the runtime connecting them. At 289 MB each, those models fit anywhere.&lt;/p&gt;

&lt;p&gt;MCP being here matters because it's the same protocol across Claude, Cursor, VS Code extensions, and now Google's on-device runtime. Build a tool as an MCP server once and every compatible client picks it up. That's not small.&lt;/p&gt;




&lt;h2&gt;
  
  
  Verdict
&lt;/h2&gt;

&lt;p&gt;AI Edge Gallery is the most developer-forward release from Google I/O 2026. Not a consumer product — a reference implementation of what on-device agents look like when an open protocol, a capable model family, and a mobile inference runtime land in the same place.&lt;/p&gt;

&lt;p&gt;If you're building with MCP today, install the app and point it at your existing server. If it's public or accepts a request-header token, your tools already work. That's not a coincidence; it's what a protocol looks like when it actually wins.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Written with assistance from Claude (Anthropic). Hands-on testing on Pixel: model list, Agent Skills interface, MCP setup flow, and skills management observed directly. Gemma-4-E2B-it downloaded; model inference and chat results not included in this article.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>googleiochallenge</category>
      <category>gemma</category>
      <category>mcp</category>
      <category>devchallenge</category>
    </item>
    <item>
      <title>Cloudflare Deprecated My Production Model. The Recommended Upgrade Costs $4/M Tokens. Gemma 4 MoE Doesn't.</title>
      <dc:creator>Daniel Nwaneri</dc:creator>
      <pubDate>Tue, 19 May 2026 11:55:01 +0000</pubDate>
      <link>https://dev.to/dannwaneri/cloudflare-deprecated-my-production-model-the-recommended-upgrade-costs-4m-tokens-gemma-4-moe-3hd7</link>
      <guid>https://dev.to/dannwaneri/cloudflare-deprecated-my-production-model-the-recommended-upgrade-costs-4m-tokens-gemma-4-moe-3hd7</guid>
      <description>&lt;p&gt;&lt;em&gt;This is a submission for the &lt;a href="https://dev.to/challenges/google-gemma-2026-05-06"&gt;Gemma 4 Challenge: Build with Gemma 4&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  What I Built
&lt;/h2&gt;

&lt;p&gt;On May 8, Cloudflare posted a deprecation notice.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;@cf/moonshotai/kimi-k2.5&lt;/code&gt;, the model synthesising knowledge across 45,000 of my saved tweets, was going away on May 30.&lt;/p&gt;

&lt;p&gt;I had a live production system, a daily cron, and 100,000+ indexed documents depending on that model. I had 22 days.&lt;/p&gt;

&lt;p&gt;Cloudflare's recommended replacement: &lt;code&gt;@cf/google/gemma-4-26b-a4b-it&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;So I migrated and benchmarked every step. Here's what I found, what broke, and why Gemma 4 MoE was the right call even after a better Kimi arrived.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;bookmark-cli&lt;/strong&gt; is a personal knowledge engine I built after getting frustrated with X's native search. It syncs my bookmarks and likes into local SQLite, then pushes everything into a Cloudflare Worker for semantic retrieval.&lt;/p&gt;

&lt;p&gt;The numbers:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;45,053 tweets (11,835 bookmarks + 33,218 likes)&lt;/li&gt;
&lt;li&gt;7,155 photo tweets enriched by Llama 4 Scout vision descriptions&lt;/li&gt;
&lt;li&gt;100,302 total documents in the vector index&lt;/li&gt;
&lt;li&gt;Daily cron syncing new content automatically&lt;/li&gt;
&lt;li&gt;$5/month total running cost&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The architecture: bookmark-cli calls vectorize-mcp-worker, which runs hybrid BM25 + vector search, cross-encoder reranking, and a knowledge reflection layer that synthesises connections across documents.&lt;/p&gt;

&lt;p&gt;One question worth answering upfront: &lt;em&gt;if the data is from 2023, what good is it?&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;This isn't a news feed — it's a thinking tool. When you liked a tweet about RAG failure modes two years ago, you were signalling "this matters to me." The reflection engine connects that to four other things you saved that week across different topics and surfaces the thread you didn't consciously notice. The index only contains what you chose to save. No engagement algorithm, no ads, no recency bias — just your own curation, made searchable and cross-referenced. Google searches the internet. This searches your mind.&lt;/p&gt;

&lt;p&gt;A reflection the engine generated from tweets I saved about AI and work — none of which said this:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"Non-technical users are increasingly using AI agents to 'vibe-code' large amounts of software without manual code review or verification. This reliance on generated outputs often involves a level of blind trust that bypasses the rigorous research and scrutiny essential to traditional programming. Although this method can appear highly productive, the lack of technical expertise makes debugging these systems exceptionally difficult and prone to subtle, painful failures."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That's three fragments from different weeks, connected by the model into one coherent insight. The technical details are below.&lt;/p&gt;




&lt;h2&gt;
  
  
  Demo
&lt;/h2&gt;

&lt;p&gt;  &lt;iframe src="https://www.youtube.com/embed/bC0trhgE8VU"&gt;
  &lt;/iframe&gt;
&lt;/p&gt;

&lt;p&gt;Live dashboard: &lt;a href="https://vectorize-mcp-worker.fpl-test.workers.dev/dashboard" rel="noopener noreferrer"&gt;vectorize-mcp-worker.fpl-test.workers.dev/dashboard&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Code
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;vectorize-mcp-worker&lt;/strong&gt;: &lt;a href="https://github.com/dannwaneri/vectorize-mcp-worker" rel="noopener noreferrer"&gt;github.com/dannwaneri/vectorize-mcp-worker&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;bookmark-cli&lt;/strong&gt;: &lt;a href="https://github.com/dannwaneri/bookmark-cli" rel="noopener noreferrer"&gt;github.com/dannwaneri/bookmark-cli&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Benchmark endpoint: &lt;code&gt;POST /benchmark&lt;/code&gt; with &lt;code&gt;Authorization: Bearer&lt;/code&gt; header&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  How I Used Gemma 4
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Why Gemma 4 MoE specifically
&lt;/h3&gt;

&lt;p&gt;Three Gemma 4 variants exist on Workers AI. I needed to pick one.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Model&lt;/th&gt;
&lt;th&gt;Active params&lt;/th&gt;
&lt;th&gt;Best for&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;gemma-4-e4b-it&lt;/td&gt;
&lt;td&gt;4B total (dense)&lt;/td&gt;
&lt;td&gt;Local / memory-constrained&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;gemma-4-27b-it&lt;/td&gt;
&lt;td&gt;27B dense&lt;/td&gt;
&lt;td&gt;Max quality, more compute&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;gemma-4-26b-a4b-it&lt;/td&gt;
&lt;td&gt;26B total, &lt;strong&gt;4B active&lt;/strong&gt; (MoE)&lt;/td&gt;
&lt;td&gt;Edge inference, reasoning depth&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The reflection layer does multi-document synthesis — it reads 5 related chunks and produces a structured 3-sentence insight. That's not a summarisation task, it's a reasoning task. The 4B dense model would have been too shallow. The 27B dense would have been too slow at the edge.&lt;/p&gt;

&lt;p&gt;4B active parameters per forward pass. 26B total. At the edge, you need the first number. For multi-document synthesis, you need the second. The MoE architecture is how one model gives you both.&lt;/p&gt;
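
&lt;p&gt;The two numbers can differ because of how a mixture-of-experts layer works. A small gating network scores every expert for each token, and only the top-k experts actually run. Below is a toy single-layer sketch of top-k routing. The expert count, k, and the expert functions are invented for illustration and are not Gemma 4's real configuration:&lt;/p&gt;

```typescript
type Expert = (x: number[]) => number[];

// Toy MoE layer: score all experts, run only the top-k, and mix their outputs
// with softmax weights computed over the selected gate scores.
function moeForward(x: number[], gateScores: number[], experts: Expert[], k: number) {
  // Rank experts by gate score, highest first, and keep k of them
  const chosen = gateScores
    .map((score, index) => ({ score, index }))
    .sort((a, b) => b.score - a.score)
    .slice(0, k);

  // Softmax over the chosen scores (subtracting the max for numerical stability)
  const maxScore = chosen[0].score;
  const weights = chosen.map((c) => Math.exp(c.score - maxScore));
  const total = weights.reduce((sum, w) => sum + w, 0);

  // Only the chosen experts execute; unchosen experts cost no compute for this token
  const output = x.map(() => 0);
  chosen.forEach((c, i) => {
    const y = experts[c.index](x);
    y.forEach((value, d) => {
      output[d] += (weights[i] / total) * value;
    });
  });
  return { output, activeExperts: chosen.map((c) => c.index) };
}

// 8 toy experts; expert i simply scales its input by (i + 1)
const experts: Expert[] = Array.from({ length: 8 }, (_, i) => (x: number[]) => x.map((v) => v * (i + 1)));
const gate = [0.1, 2.0, 0.3, 1.5, -1.0, 0.0, 0.2, 0.4];
const result = moeForward([1, 1], gate, experts, 2);
console.log(result.activeExperts); // [ 1, 3 ]
```

&lt;p&gt;In the toy, two of eight experts do work per token. Real MoE models apply the same idea at far larger scale, with shared layers that always run, and that is what a split like 4B active out of 26B total describes.&lt;/p&gt;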

&lt;p&gt;The entire pipeline (embed, retrieve, rerank, reflect) runs inside one Cloudflare Worker. Gemma 4 MoE is called through the native Workers AI binding. No external API call. No data leaving the edge.&lt;/p&gt;

&lt;h3&gt;
  
  
  The migration
&lt;/h3&gt;

&lt;p&gt;The codebase already had a &lt;code&gt;REFLECTION_MODEL&lt;/code&gt; env var. The model registry needed one addition:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;REFLECTION_MODELS&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;gemma-4&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;@cf/google/gemma-4-26b-a4b-it&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="kd"&gt;const&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;label&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Gemma 4 26B MoE (4B active)&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;note&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Recommended. 4B active params via MoE — edge-native, no external hop.&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;kimi-k2.5&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;@cf/moonshotai/kimi-k2.5&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="kd"&gt;const&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;label&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Kimi K2.5&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;note&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Deprecated May 30 2026.&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="p"&gt;},&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;wrangler secret put REFLECTION_MODEL
&lt;span class="c"&gt;# enter: gemma-4&lt;/span&gt;

wrangler deploy
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That was the migration. The reflection engine reads &lt;code&gt;env.REFLECTION_MODEL&lt;/code&gt; dynamically. Nothing else changed.&lt;/p&gt;
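
&lt;p&gt;Reading the model "dynamically" amounts to a registry lookup at request time. Below is a minimal sketch, assuming the secret holds a registry key such as &lt;code&gt;gemma-4&lt;/code&gt;. The &lt;code&gt;resolveReflectionModel&lt;/code&gt; helper and its fall-back-to-Gemma behaviour are hypothetical, not taken from the repository:&lt;/p&gt;

```typescript
// Trimmed copy of the registry above (note fields omitted)
const REFLECTION_MODELS = {
  "gemma-4": { id: "@cf/google/gemma-4-26b-a4b-it", label: "Gemma 4 26B MoE (4B active)" },
  "kimi-k2.5": { id: "@cf/moonshotai/kimi-k2.5", label: "Kimi K2.5" },
} as const;

type ReflectionModelKey = keyof typeof REFLECTION_MODELS;

// Hypothetical helper: map the REFLECTION_MODEL secret to a Workers AI model ID.
// hasOwnProperty avoids matching inherited keys such as "toString".
// Falls back to Gemma 4 when the secret is unset or unrecognised.
function resolveReflectionModel(env: { REFLECTION_MODEL?: string }): string {
  const key = env.REFLECTION_MODEL ?? "";
  if (Object.prototype.hasOwnProperty.call(REFLECTION_MODELS, key)) {
    return REFLECTION_MODELS[key as ReflectionModelKey].id;
  }
  return REFLECTION_MODELS["gemma-4"].id;
}

console.log(resolveReflectionModel({ REFLECTION_MODEL: "gemma-4" })); // @cf/google/gemma-4-26b-a4b-it
```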

&lt;h3&gt;
  
  
  Three gotchas worth knowing
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;1. max_tokens.&lt;/strong&gt; Gemma 4 is a thinking model. It writes a full reasoning chain before producing output. With &lt;code&gt;max_tokens: 180&lt;/code&gt; set for the old model, Gemma 4 was spending all its tokens on internal reasoning and returning empty content. Bumping to &lt;code&gt;max_tokens: 2048&lt;/code&gt; fixed it.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Response extraction.&lt;/strong&gt; For thinking models, use &lt;code&gt;choices[0].message.content&lt;/code&gt; — not &lt;code&gt;.reasoning&lt;/code&gt; and not &lt;code&gt;.response&lt;/code&gt;. The reasoning field is the internal chain of thought, not the answer.&lt;/p&gt;
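
&lt;p&gt;Both of the first two gotchas can be guarded in one place. This sketch assumes the OpenAI-style response shape described above: &lt;code&gt;choices[0].message.content&lt;/code&gt; plus a separate &lt;code&gt;reasoning&lt;/code&gt; field. The &lt;code&gt;extractReflection&lt;/code&gt; name and the fail-on-empty policy are illustrative choices, not the worker's actual code:&lt;/p&gt;

```typescript
// Assumed OpenAI-style response shape for a thinking model on Workers AI
interface ChatResponse {
  choices?: { message?: { content?: string | null; reasoning?: string } }[];
}

// Return only the final answer. The reasoning field is the chain of thought
// and should never be stored as the reflection.
function extractReflection(res: ChatResponse): string {
  const message = res.choices?.[0]?.message;
  const content = (message?.content ?? "").trim();
  if (content === "") {
    // Empty content alongside reasoning likely means max_tokens ran out
    // during thinking (gotcha 1); fail loudly instead of storing nothing.
    const hint = message?.reasoning ? " (reasoning present: max_tokens likely too low)" : "";
    throw new Error("empty reflection" + hint);
  }
  return content;
}

// Inside a Worker (not runnable here), roughly:
//   const res = await env.AI.run("@cf/google/gemma-4-26b-a4b-it", { messages, max_tokens: 2048 });
//   const reflection = extractReflection(res);
```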

&lt;p&gt;&lt;strong&gt;3. Prompt format.&lt;/strong&gt; Verbose rule-lists trigger Gemma 4's constraint-analysis behaviour — it restates your rules as bullet points instead of following them. Keep prompts simple and end with a direct action cue:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Read the new source and related sources below, then write 3 plain prose sentences
that synthesise them into a knowledge base entry. No bullets. No analysis. No preamble.
Just 3 sentences.

New: "..."
Related: ...

Write the 3-sentence synthesis now:
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
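
&lt;p&gt;The same prompt can be assembled in code. This sketch assumes the new source is one string and the related sources are the retrieved chunks. &lt;code&gt;buildReflectionPrompt&lt;/code&gt; and the numbered "Related" layout are hypothetical, since the prompt above elides how related sources are rendered:&lt;/p&gt;

```typescript
// Hypothetical builder for the reflection prompt above.
// Instructions stay short and the prompt ends with a direct action cue (gotcha 3).
function buildReflectionPrompt(newSource: string, related: string[]): string {
  // JSON.stringify quotes each source and escapes embedded quotes and newlines
  const relatedBlock = related
    .map((text, i) => `${i + 1}. ${JSON.stringify(text)}`)
    .join("\n");
  return [
    "Read the new source and related sources below, then write 3 plain prose sentences",
    "that synthesise them into a knowledge base entry. No bullets. No analysis. No preamble.",
    "Just 3 sentences.",
    "",
    `New: ${JSON.stringify(newSource)}`,
    `Related:\n${relatedBlock}`,
    "",
    "Write the 3-sentence synthesis now:",
  ].join("\n");
}

console.log(buildReflectionPrompt("new tweet", ["older tweet", "another"]));
```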



&lt;h3&gt;
  
  
  The benchmark
&lt;/h3&gt;

&lt;p&gt;I built a &lt;code&gt;/benchmark&lt;/code&gt; endpoint that runs both models in parallel against the same query, logs latency and response to D1, and returns side-by-side results.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="err"&gt;POST&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;/benchmark&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"query"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"What are the common failure modes of RAG systems?"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
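
&lt;p&gt;For a result like Kimi's FAILED row to be recorded at all, one model's error can't be allowed to abort the other's measurement. Below is a minimal sketch of the parallel run using &lt;code&gt;Promise.allSettled&lt;/code&gt;. &lt;code&gt;callModel&lt;/code&gt; is a hypothetical stand-in for the real Workers AI call, the row shape is illustrative, and the D1 logging step is left out:&lt;/p&gt;

```typescript
// Stand-in for the real model call; in the Worker this would wrap env.AI.run (assumption)
async function callModel(modelId: string, query: string) {
  return `answer from ${modelId} to: ${query}`;
}
type RunModel = typeof callModel;

// Run every model on the same query in parallel and time each call.
// allSettled records a rejection as a failed row instead of aborting the whole run.
async function benchmark(query: string, modelIds: string[], runModel: RunModel = callModel) {
  const settled = await Promise.allSettled(
    modelIds.map(async (modelId) => {
      const start = Date.now();
      const response = await runModel(modelId, query);
      return { modelId, ms: Date.now() - start, response };
    }),
  );
  return settled.map((outcome, i) =>
    outcome.status === "fulfilled"
      ? { ...outcome.value, ok: true }
      : { modelId: modelIds[i], ms: null, response: String(outcome.reason), ok: false },
  );
}

benchmark("What are the common failure modes of RAG systems?", [
  "@cf/google/gemma-4-26b-a4b-it",
  "@cf/moonshotai/kimi-k2.5",
]).then((rows) => console.log(rows));
```

&lt;p&gt;Inside a Worker, &lt;code&gt;Date.now()&lt;/code&gt; only advances after I/O. That is fine here because each timed span wraps a model call.&lt;/p&gt;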



&lt;p&gt;Results from D1 (9 real queries):&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Query&lt;/th&gt;
&lt;th&gt;Gemma 4 MoE latency&lt;/th&gt;
&lt;th&gt;Kimi K2.5 latency&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;RAG failure modes&lt;/td&gt;
&lt;td&gt;12.9s&lt;/td&gt;
&lt;td&gt;12.4s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Embedding model selection&lt;/td&gt;
&lt;td&gt;9.9s&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;90.7s ⚠️&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;BM25 vs vector search&lt;/td&gt;
&lt;td&gt;19.6s&lt;/td&gt;
&lt;td&gt;7.3s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Reducing hallucination&lt;/td&gt;
&lt;td&gt;19.0s&lt;/td&gt;
&lt;td&gt;6.9s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Chunking strategies&lt;/td&gt;
&lt;td&gt;9.3s&lt;/td&gt;
&lt;td&gt;9.0s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Edge AI model selection&lt;/td&gt;
&lt;td&gt;11.8s&lt;/td&gt;
&lt;td&gt;8.3s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;MoE efficiency at scale&lt;/td&gt;
&lt;td&gt;16.5s&lt;/td&gt;
&lt;td&gt;8.0s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cloudflare Workers AI&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;22.8s&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;FAILED&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;KB maintenance&lt;/td&gt;
&lt;td&gt;10.1s&lt;/td&gt;
&lt;td&gt;5.5s&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Kimi K2.5 was faster on 7 of 9 queries. But it took 90.7 seconds on one query and failed outright on another, within a single benchmark run. A model that's faster on most queries but unreliable isn't a production model.&lt;/p&gt;
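
&lt;p&gt;How much faster depends on the statistic. The sketch below uses the latencies from the table above and excludes Kimi's failed query. Kimi wins on the median, the typical query. Gemma wins on the mean, because Kimi's single 90.7s response dominates its average:&lt;/p&gt;

```typescript
// Latencies in seconds from the benchmark table (Kimi's FAILED query excluded)
const gemma = [12.9, 9.9, 19.6, 19.0, 9.3, 11.8, 16.5, 22.8, 10.1];
const kimi = [12.4, 90.7, 7.3, 6.9, 9.0, 8.3, 8.0, 5.5];

const mean = (xs: number[]) => xs.reduce((sum, x) => sum + x, 0) / xs.length;

function median(xs: number[]): number {
  const sorted = [...xs].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  // Even-length lists average the two middle values
  return sorted.length % 2 === 0 ? (sorted[mid - 1] + sorted[mid]) / 2 : sorted[mid];
}

console.log(mean(gemma).toFixed(1), median(gemma).toFixed(2)); // 14.7 12.90
console.log(mean(kimi).toFixed(1), median(kimi).toFixed(2)); // 18.5 8.15
```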

&lt;p&gt;Gemma 4 MoE was consistent. Every query returned. Every response was coherent. Latency was predictable.&lt;/p&gt;

&lt;p&gt;Beyond the latency numbers, the Kimi K2.5 reflections in the index all started with &lt;code&gt;"Here are the 3 sentences:"&lt;/code&gt; — the model was leaking the instruction prefix into every stored reflection. Gemma 4 produces clean prose output with the right prompt.&lt;/p&gt;

&lt;h3&gt;
  
  
  What's live now
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight http"&gt;&lt;code&gt;&lt;span class="err"&gt;GET /stats → models.reflection: "@cf/google/gemma-4-26b-a4b-it"
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The live dashboard is at &lt;a href="https://vectorize-mcp-worker.fpl-test.workers.dev/dashboard" rel="noopener noreferrer"&gt;vectorize-mcp-worker.fpl-test.workers.dev/dashboard&lt;/a&gt; — open it and the active reflection model is listed in the stats panel. Gemma 4 MoE, running in production.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1,525 reflections generated since the migration.&lt;/strong&gt; The cron added more this morning. Verify live: &lt;a href="https://vectorize-mcp-worker.fpl-test.workers.dev/public-stats" rel="noopener noreferrer"&gt;&lt;code&gt;/public-stats&lt;/code&gt;&lt;/a&gt; — no API key needed.&lt;/p&gt;

&lt;p&gt;A second reflection, this one on AI and management:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"AI is increasing individual contributor leverage and is frequently marketed as a labor replacement, driving companies to prioritize cost-cutting and individual productivity. This trend often places pressure on managers to perform individual contributor roles, potentially devaluing the necessity of human oversight and organizational management. Relying on these technologies also introduces risks involving accountability for failures, misunderstandings of AI's true capabilities, and the loss of human-centric benefits like upskilling."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;This came from unrelated tweets saved across different weeks, connected by the engine into a single coherent insight, stored back into the index so it surfaces when I search anything adjacent to AI, management, or developer tooling. That's the reflection layer working as intended.&lt;/p&gt;
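
&lt;p&gt;Storing a reflection back into the index is an embed-then-upsert. The sketch below covers the record-building step, which can run outside a Worker. The embedding model ID (&lt;code&gt;bge-small-en-v1.5&lt;/code&gt;, 384 dimensions), the &lt;code&gt;VECTORIZE&lt;/code&gt; binding name, the metadata fields, and &lt;code&gt;toReflectionRecord&lt;/code&gt; itself are assumptions rather than the repository's code:&lt;/p&gt;

```typescript
// Hypothetical vector record for a reflection; the worker's real metadata fields may differ
interface VectorRecord {
  id: string;
  values: number[];
  metadata: { doc_type: string; text: string; source_ids: string };
}

const BGE_SMALL_DIMS = 384; // output size of bge-small-en-v1.5 (assumed embedding model)

function toReflectionRecord(id: string, embedding: number[], text: string, sourceIds: string[]): VectorRecord {
  // Catch a model mismatch before it reaches an index built for 384 dimensions
  if (embedding.length !== BGE_SMALL_DIMS) {
    throw new Error(`expected ${BGE_SMALL_DIMS} dims, got ${embedding.length}`);
  }
  return {
    id,
    values: embedding,
    // Source IDs joined into one string to keep metadata simple (a design choice)
    metadata: { doc_type: "reflection", text, source_ids: sourceIds.join(",") },
  };
}

// Inside a Worker (not runnable here), roughly:
//   const { data } = await env.AI.run("@cf/baai/bge-small-en-v1.5", { text: [reflection] });
//   await env.VECTORIZE.upsert([toReflectionRecord(id, data[0], reflection, sourceIds)]);
```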

&lt;p&gt;Full pipeline:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;bookmark-cli → vectorize-mcp-worker
  embed &lt;span class="o"&gt;(&lt;/span&gt;BGE Small&lt;span class="o"&gt;)&lt;/span&gt; →
  retrieve &lt;span class="o"&gt;(&lt;/span&gt;Vectorize + BM25&lt;span class="o"&gt;)&lt;/span&gt; →
  rerank &lt;span class="o"&gt;(&lt;/span&gt;BGE cross-encoder&lt;span class="o"&gt;)&lt;/span&gt; →
  reflect &lt;span class="o"&gt;(&lt;/span&gt;Gemma 4 MoE&lt;span class="o"&gt;)&lt;/span&gt; ← NEW
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Everything stays inside one Cloudflare Worker. No external hop for the reasoning layer.&lt;/p&gt;
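&lt;p&gt;The diagram doesn't say how the Vectorize and BM25 candidate lists are combined before the cross-encoder sees them. A minimal sketch, assuming Reciprocal Rank Fusion (RRF), a common way to merge hybrid retrieval results; the function name and the &lt;code&gt;k=60&lt;/code&gt; constant are illustrative, not taken from the worker's code:&lt;/p&gt;

```python
# Hypothetical sketch: Reciprocal Rank Fusion (RRF) for merging the vector
# and BM25 result lists. The worker may merge differently; RRF is one
# common choice for hybrid retrieval.


def reciprocal_rank_fusion(ranked_lists, k=60):
    """Merge ranked lists of doc IDs into one list ordered by RRF score.

    Each document earns 1 / (k + rank) from every list it appears in
    (rank starts at 1), so items ranked well by both retrievers rise.
    """
    scores = {}
    for ranked in ranked_lists:
        for rank, doc_id in enumerate(ranked, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken by doc ID for stable output.
    return sorted(scores, key=lambda d: (-scores[d], d))


vector_hits = ["doc-a", "doc-b", "doc-c"]  # e.g. from Vectorize
bm25_hits = ["doc-c", "doc-a", "doc-d"]    # e.g. from a BM25 index
print(reciprocal_rank_fusion([vector_hits, bm25_hits]))
# ['doc-a', 'doc-c', 'doc-b', 'doc-d']
```

&lt;p&gt;RRF only needs ranks, not comparable scores, which is why it suits merging a similarity-ranked list with a BM25 list. The fused list would then go to the BGE cross-encoder for the final ordering.&lt;/p&gt;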

&lt;p&gt;Gemma 4 MoE isn't here because of a challenge. It's here because Cloudflare deprecated the model it replaced, and Gemma 4 MoE was the right replacement. It will still be running after June 4.&lt;/p&gt;

&lt;h3&gt;
  
  
  The verdict
&lt;/h3&gt;

&lt;p&gt;On average, Gemma 4 MoE is not faster than Kimi K2.5 was. If raw speed were the only metric, and if Kimi K2.5 were staying around, I'd have a harder decision.&lt;/p&gt;

&lt;p&gt;But it isn't staying around.&lt;/p&gt;

&lt;p&gt;Cloudflare has since released Kimi K2.6 — 1T parameters, 262k context window, reasoning, vision, tool calling. It's impressive. It's also $0.95/M input tokens and $4.00/M output tokens. The reflection layer synthesises on every ingest. At that pricing, running it across a 100k-document backlog would end the $5/month cost story in a single batch. Gemma 4 MoE, as a native Workers AI model, stays within the free tier. The upgrade path wasn't really an upgrade for this use case.&lt;/p&gt;
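&lt;p&gt;To make the backlog arithmetic concrete, here is a rough estimate using the K2.6 prices above. The per-document token counts are hypothetical placeholders, not measurements from this pipeline:&lt;/p&gt;

```python
# Back-of-envelope cost of one reflection call per document on a
# per-token-priced model. Prices are the Kimi K2.6 figures quoted above;
# token counts are hypothetical placeholders, not measured values.

INPUT_PRICE_PER_M = 0.95   # USD per 1M input tokens
OUTPUT_PRICE_PER_M = 4.00  # USD per 1M output tokens


def batch_cost(num_docs, input_tokens_per_doc, output_tokens_per_doc):
    """Return the USD cost of one reflection call per document."""
    input_cost = num_docs * input_tokens_per_doc / 1_000_000 * INPUT_PRICE_PER_M
    output_cost = num_docs * output_tokens_per_doc / 1_000_000 * OUTPUT_PRICE_PER_M
    return round(input_cost + output_cost, 2)


# Assumption: ~1,000 input tokens (new doc plus retrieved neighbours)
# and ~200 output tokens per reflection.
print(batch_cost(100_000, 1_000, 200))  # 175.0
```

&lt;p&gt;Under those assumptions, one pass over the backlog costs about $175, thirty-five times the whole monthly budget. Halve the token counts and it is still well over $80.&lt;/p&gt;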

&lt;p&gt;And for a reflection layer specifically, Gemma 4 MoE is the right model: the task is multi-document synthesis, it needs reasoning depth more than raw throughput, and the entire pipeline should stay edge-native. The MoE architecture is why it fits. 4B active parameters give you the inference speed you need at the edge. 26B total parameters give you the knowledge depth the task requires.&lt;/p&gt;
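&lt;p&gt;A toy illustration of why active and total parameter counts differ in a mixture-of-experts model: a gate scores every expert, but only the top-k actually run for a given input. The expert count, &lt;code&gt;k&lt;/code&gt;, and gating here are made up for illustration and do not describe Gemma 4's actual configuration:&lt;/p&gt;

```python
import math

# Toy mixture-of-experts layer: the gate scores all experts, but only the
# top-k are evaluated. Illustrative only; not Gemma 4's real expert count,
# k, or gating function.


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def moe_forward(x, experts, gate_logits, k=2):
    """Run only the k highest-scoring experts and mix their outputs."""
    top = sorted(range(len(experts)), key=lambda i: gate_logits[i], reverse=True)[:k]
    weights = softmax([gate_logits[i] for i in top])  # renormalise over chosen experts
    output = sum(w * experts[i](x) for w, i in zip(weights, top))
    return output, top


# Eight tiny "experts"; in a real model each is a feed-forward block.
experts = [lambda x, s=s: x * s for s in range(1, 9)]
gate_logits = [0.1, 2.0, 0.3, 0.0, 1.0, -1.0, 0.5, 0.2]
y, used = moe_forward(3.0, experts, gate_logits, k=2)
print(used)  # [1, 4]: only 2 of 8 experts were evaluated
```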

&lt;p&gt;At $4/M output tokens, Kimi K2.6 wasn't an upgrade for this pipeline. Gemma 4 MoE is still the right call. The daily cron doesn't know it's in a challenge. It ran this morning.&lt;/p&gt;




&lt;h2&gt;
  
  
  What's next for Gemma 4 MoE in this pipeline
&lt;/h2&gt;

&lt;p&gt;The reflection layer is one use. The code already has a second.&lt;/p&gt;

&lt;p&gt;After every third new ingest, the pipeline runs a consolidation pass: Gemma 4 MoE reads the 10 most recent reflections and merges them into a single document with &lt;code&gt;doc_type='summary'&lt;/code&gt; containing the dominant theme, two or three specific non-obvious facts, and the most persistent open question across all the reflections. The summary lands in Vectorize and surfaces in search exactly like a reflection does. Reflections capture individual connections. Summaries capture patterns across connections. Both come from Gemma 4 MoE, both are edge-native, and both add to the index without touching the $5/month cost ceiling.&lt;/p&gt;
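&lt;p&gt;The trigger logic for that consolidation pass fits in a few lines. This is a hypothetical outline, not the worker's code; &lt;code&gt;summarise&lt;/code&gt; stands in for the Gemma 4 MoE call and the returned dict shape is illustrative:&lt;/p&gt;

```python
# Sketch of the consolidation trigger: on every third ingest, take the ten
# most recent reflections and pass them to a summariser. `summarise` is a
# hypothetical stand-in for the Gemma 4 MoE call; the dict shape is
# illustrative, not the worker's actual schema.

CONSOLIDATE_EVERY = 3
REFLECTIONS_PER_SUMMARY = 10


def on_ingest(ingest_count, reflections, summarise):
    """Return a summary document if a consolidation pass is due, else None.

    `ingest_count` starts at 1; `reflections` is ordered oldest to newest.
    """
    if ingest_count % CONSOLIDATE_EVERY != 0:
        return None
    recent = reflections[-REFLECTIONS_PER_SUMMARY:]
    return {"doc_type": "summary", "text": summarise(recent), "sources": len(recent)}


fake_summarise = lambda texts: f"{len(texts)} reflections merged"
history = [f"reflection {i}" for i in range(25)]
print(on_ingest(4, history, fake_summarise))             # None: not a third ingest
print(on_ingest(6, history, fake_summarise)["sources"])  # 10
```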

&lt;p&gt;That's the current state. Four extensions are already scoped:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Query-time answer synthesis.&lt;/strong&gt; Right now the pipeline retrieves chunks and returns them. The next layer uses Gemma 4 MoE to read the top 5 retrieved chunks and produce a direct answer — not a list of results, an actual response grounded in what you saved. The retrieval already works. The synthesis step is the same task the reflection layer already does, with a different prompt.&lt;/p&gt;
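&lt;p&gt;A sketch of what that synthesis step might look like on the prompt side, assuming each retrieved chunk is a dict with a &lt;code&gt;text&lt;/code&gt; field. The prompt wording is an assumption, and the model call itself is left out:&lt;/p&gt;

```python
# Hypothetical sketch of query-time synthesis: build one grounded prompt
# from the top-5 retrieved chunks. Prompt wording and the "text" field are
# assumptions; the Gemma 4 MoE call is not shown.

TOP_K = 5


def build_answer_prompt(question, chunks, top_k=TOP_K):
    """Number each retrieved chunk so the answer can cite it by [n]."""
    context = "\n\n".join(
        f"[{n}] {chunk['text']}" for n, chunk in enumerate(chunks[:top_k], start=1)
    )
    return (
        "Answer the question using only the saved notes below. "
        "Cite notes by their [n] number. If the notes do not answer it, say so.\n\n"
        f"Notes:\n{context}\n\nQuestion: {question}\nAnswer:"
    )
```

&lt;p&gt;Numbering the chunks lets the answer point back at the saved items it used, which keeps the response grounded and checkable.&lt;/p&gt;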

&lt;p&gt;&lt;strong&gt;Routing upgrade.&lt;/strong&gt; The V4 intelligent router currently runs on Llama 3.2 3B — fast classification into six query routes (SQL, BM25, vector, graph, etc.). Moving that to Gemma 4 MoE's thinking mode means the router can reason about ambiguous queries instead of classifying them. A question like "what did I save about RAG that I disagreed with?" hits multiple routes simultaneously. A 3B classifier guesses. A 26B MoE reasons.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Gap detection.&lt;/strong&gt; The reflection engine already identifies gaps — questions the combined knowledge doesn't answer. A weekly pass that reads all gap annotations across the index and surfaces the three most persistent unanswered questions would make the tool actively useful for research, not just reactive to search queries. One scheduled cron, one Gemma 4 MoE call per week, zero additional cost in the free tier.&lt;/p&gt;
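&lt;p&gt;The counting half of that weekly pass could look like this sketch. It assumes gap annotations are stored as plain question strings; the normalisation is deliberately crude and would miss paraphrases that a model or embedding comparison would catch:&lt;/p&gt;

```python
from collections import Counter

# Sketch of the weekly gap pass: count how often each open question appears
# across gap annotations and surface the most persistent ones. Assumes
# annotations are plain strings; normalising case, whitespace, and a
# trailing "?" is a crude stand-in for real duplicate detection.


def normalise(question):
    return " ".join(question.lower().split()).rstrip("?")


def most_persistent_gaps(gap_annotations, n=3):
    """Return the n most frequent normalised questions, most frequent first."""
    counts = Counter(normalise(q) for q in gap_annotations)
    return [question for question, _ in counts.most_common(n)]
```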

&lt;p&gt;&lt;strong&gt;Personal preference reranker.&lt;/strong&gt; The index contains 100k+ documents across AI, politics, sports, and everything else saved since 2016. Every bookmark and like is a signal: this person found this worth keeping. The longer-term path is fine-tuning a small cross-encoder on that signal — not domain expertise, but preference prediction. A model trained on "did this person save this or not" beats every general reranker at one narrow task: knowing what you care about. It slots into the existing pipeline as a final reranking layer after BGE, before the reflection pass. The training data is a decade of curation. The narrow task is yours alone.&lt;/p&gt;
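&lt;p&gt;A sketch of where such a model would slot in, assuming the BGE scores and the preference probability are both in [0, 1]. Blending the two with a weight &lt;code&gt;alpha&lt;/code&gt; is one hypothetical design; a fine-tuned model could also take over the final ordering outright:&lt;/p&gt;

```python
# Hypothetical sketch: after the BGE cross-encoder scores candidates, blend
# in a per-user "would I save this?" probability and re-sort. `alpha` and
# `preference_score` are assumptions; a fine-tuned cross-encoder trained on
# save/no-save signals would supply the preference score.


def preference_rerank(candidates, preference_score, alpha=0.3):
    """candidates: list of (doc_id, bge_score) with scores in [0, 1]."""
    blended = [
        (doc_id, (1 - alpha) * bge + alpha * preference_score(doc_id))
        for doc_id, bge in candidates
    ]
    return sorted(blended, key=lambda pair: pair[1], reverse=True)
```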

&lt;p&gt;The reflection layer was the migration. These four are the reason it stays.&lt;/p&gt;

</description>
      <category>gemmachallenge</category>
      <category>gemma</category>
      <category>devchallenge</category>
    </item>
    <item>
      <title>I Ran SERP Feature Detection on 8 Nigerian Creator Queries. Every Single One Had an AI Overview.</title>
      <dc:creator>Daniel Nwaneri</dc:creator>
      <pubDate>Thu, 14 May 2026 14:40:49 +0000</pubDate>
      <link>https://dev.to/dannwaneri/i-ran-serp-feature-detection-on-8-nigerian-creator-queries-every-single-one-had-an-ai-overview-51ko</link>
      <guid>https://dev.to/dannwaneri/i-ran-serp-feature-detection-on-8-nigerian-creator-queries-every-single-one-had-an-ai-overview-51ko</guid>
      <description>&lt;p&gt;I built a SERP feature detection module for my SEO agent. Then I ran it on the queries I'm targeting for a site about how Nigerian creators get paid online.&lt;/p&gt;

&lt;p&gt;The results were more uniform than I expected.&lt;/p&gt;




&lt;h2&gt;
  
  
  What the Module Does
&lt;/h2&gt;

&lt;p&gt;The module calls SerpApi for each target query and checks the structured JSON response for seven SERP features:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;AI Overview&lt;/li&gt;
&lt;li&gt;Featured snippet&lt;/li&gt;
&lt;li&gt;People Also Ask (PAA)&lt;/li&gt;
&lt;li&gt;Image pack&lt;/li&gt;
&lt;li&gt;Video results&lt;/li&gt;
&lt;li&gt;Local pack&lt;/li&gt;
&lt;li&gt;Knowledge panel&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Each query comes back with a feature matrix and a list of content opportunities. No browser, no CAPTCHA, no bot detection — SerpApi handles the residential proxy infrastructure on their end.&lt;/p&gt;
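&lt;p&gt;The detection step amounts to checking which top-level keys are present and non-empty in the JSON. A minimal sketch; the key names below are assumptions based on SerpApi's Google results format and should be checked against a live response:&lt;/p&gt;

```python
# Sketch: map each SERP feature to the top-level key assumed to hold it in
# SerpApi's Google results JSON. Verify the key names against a live
# response before relying on them.

FEATURE_KEYS = {
    "ai_overview": "ai_overview",
    "featured_snippet": "answer_box",  # may also hold calculator/weather answers
    "people_also_ask": "related_questions",
    "image_pack": "inline_images",
    "video_results": "inline_videos",
    "local_pack": "local_results",
    "knowledge_panel": "knowledge_graph",
}


def detect_features(serp_json):
    """Return {feature_name: bool} for one query's SerpApi response."""
    return {feature: bool(serp_json.get(key)) for feature, key in FEATURE_KEYS.items()}
```

&lt;p&gt;A stricter version would inspect &lt;code&gt;answer_box&lt;/code&gt; more closely before counting it as a featured snippet, since that field may carry other answer types.&lt;/p&gt;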

&lt;p&gt;The free tier is 100 searches/month. I have 8 target queries, so that's 12 full runs a month with room to spare.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Results
&lt;/h2&gt;

&lt;p&gt;I ran it on these queries for naija-vpn.com:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;does twitch pay nigerians
how to receive money from twitch in nigeria
cleva vs geegpay nigeria
payoneer nigeria freelancers
how nigerians get paid on youtube
how to receive fiverr payment in nigeria
does tiktok pay nigerians
best dollar account for nigerian freelancers
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The output:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Query&lt;/th&gt;
&lt;th&gt;AI Overview&lt;/th&gt;
&lt;th&gt;Feat. Snippet&lt;/th&gt;
&lt;th&gt;PAA&lt;/th&gt;
&lt;th&gt;Images&lt;/th&gt;
&lt;th&gt;Video&lt;/th&gt;
&lt;th&gt;Local&lt;/th&gt;
&lt;th&gt;KP&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;does twitch pay nigerians&lt;/td&gt;
&lt;td&gt;✓&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;td&gt;✓&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;td&gt;✓&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;how to receive money from twitch in nigeria&lt;/td&gt;
&lt;td&gt;✓&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;td&gt;✓&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;td&gt;✓&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;cleva vs geegpay nigeria&lt;/td&gt;
&lt;td&gt;✓&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;td&gt;✓&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;td&gt;✓&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;payoneer nigeria freelancers&lt;/td&gt;
&lt;td&gt;✓&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;td&gt;✓&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;td&gt;✓&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;how nigerians get paid on youtube&lt;/td&gt;
&lt;td&gt;✓&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;td&gt;✓&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;td&gt;✓&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;how to receive fiverr payment in nigeria&lt;/td&gt;
&lt;td&gt;✓&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;td&gt;✓&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;td&gt;✓&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;does tiktok pay nigerians&lt;/td&gt;
&lt;td&gt;✓&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;td&gt;✓&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;td&gt;✓&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;best dollar account for nigerian freelancers&lt;/td&gt;
&lt;td&gt;✓&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;td&gt;✓&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;td&gt;✓&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;8 out of 8 queries: AI Overview ✓, PAA ✓, Video ✓.&lt;/p&gt;

&lt;p&gt;No featured snippets. No local pack. No knowledge panels. Clean organic SERPs with three consistent rich features sitting above the fold.&lt;/p&gt;
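&lt;p&gt;Patterns like "8 out of 8" fall out of a simple tally over the per-query flags. A sketch, assuming the matrix is a dict mapping each query to its feature flags:&lt;/p&gt;

```python
# Tally a feature matrix across queries. Input shape (an assumption):
# {query: {feature: bool}}, e.g. the output of a per-query detector.


def feature_coverage(matrix):
    """Return {feature: number of queries where it appeared}."""
    totals = {}
    for flags in matrix.values():
        for feature, present in flags.items():
            totals[feature] = totals.get(feature, 0) + int(present)
    return totals


queries = ["does twitch pay nigerians", "does tiktok pay nigerians"]
matrix = {
    q: {"ai_overview": True, "featured_snippet": False, "people_also_ask": True}
    for q in queries
}
for feature, count in feature_coverage(matrix).items():
    print(f"{feature}: {count}/{len(matrix)}")
```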




&lt;h2&gt;
  
  
  What This Actually Means
&lt;/h2&gt;

&lt;h3&gt;
  
  
  AI Overview on every query
&lt;/h3&gt;

&lt;p&gt;Google is summarising all of these topics directly in the search results. The old model — rank #1, get the click — has a layer on top of it now. The AI Overview reads the top sources and generates a summary. If your content is the clearest, most direct answer, there's a chance you get cited inside the summary. If it's buried in context, you don't.&lt;/p&gt;

&lt;p&gt;The implication for writing: the first paragraph of every article in this niche now needs to be a complete, direct answer. Not "In this article, we'll explore..." — an actual answer. 40–60 words, no preamble.&lt;/p&gt;
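&lt;p&gt;That rule is easy to check automatically. A minimal sketch; the preamble phrases are illustrative, and the word count ignores stand-alone dashes:&lt;/p&gt;

```python
# Sketch of a "direct answer first" check: the opening paragraph should be
# 40-60 words and should not start with a preamble phrase. The preamble
# list is illustrative, not exhaustive.

PREAMBLES = ("in this article", "in this guide", "in this post", "welcome to")


def check_opening(paragraph, min_words=40, max_words=60):
    """Return a list of problems; an empty list means the paragraph passes."""
    # Count whitespace-separated tokens containing a letter or digit,
    # so stand-alone dashes are not counted as words.
    words = [tok for tok in paragraph.split() if any(c.isalnum() for c in tok)]
    problems = []
    if min_words > len(words) or len(words) > max_words:
        problems.append(f"{len(words)} words (target {min_words}-{max_words})")
    if paragraph.strip().lower().startswith(PREAMBLES):
        problems.append("opens with a preamble instead of an answer")
    return problems
```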

&lt;h3&gt;
  
  
  PAA on every query
&lt;/h3&gt;

&lt;p&gt;People Also Ask boxes are Google surfacing the secondary questions that the same user is likely to have. They're also secondary ranking opportunities — each box is its own small search result.&lt;/p&gt;

&lt;p&gt;The catch: to rank in a PAA box, your heading needs to match the question phrasing closely. "Common Questions" sections with paraphrased questions miss the slot. The exact wording matters.&lt;/p&gt;

&lt;p&gt;I pulled the &lt;code&gt;related_questions&lt;/code&gt; array from the SerpApi response for my two priority queries:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;"does twitch pay nigerians":&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Is Nigeria eligible for Twitch monetization?&lt;/li&gt;
&lt;li&gt;How much money for 1000 views on Twitch?&lt;/li&gt;
&lt;li&gt;Does streaming pay in Nigeria?&lt;/li&gt;
&lt;li&gt;How much is 20k gifted subs on Twitch in Nigeria?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;"how to receive money from twitch in nigeria":&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Is Nigeria eligible for Twitch monetization?&lt;/li&gt;
&lt;li&gt;How to make money from Twitch in Nigeria?&lt;/li&gt;
&lt;li&gt;How much is 20k gifted subs on Twitch in Nigeria?&lt;/li&gt;
&lt;li&gt;How many viewers on Twitch to make $1000 a month?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These became the &lt;code&gt;&amp;lt;h3&amp;gt;&lt;/code&gt; headings in my FAQ section — verbatim, with full answers under each one.&lt;/p&gt;
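&lt;p&gt;Because the two queries share questions, the FAQ outline needs a merge that keeps the exact wording and drops repeats. A sketch, assuming each &lt;code&gt;related_questions&lt;/code&gt; item carries its text in a &lt;code&gt;question&lt;/code&gt; field:&lt;/p&gt;

```python
# Merge PAA questions from several queries into one FAQ outline, keeping
# exact wording and first-seen order. The {"question": ...} item shape is
# an assumption about the SerpApi related_questions entries.


def faq_headings(*related_question_lists):
    seen = set()
    headings = []
    for items in related_question_lists:
        for item in items:
            question = item["question"]
            if question not in seen:
                seen.add(question)
                headings.append(question)
    return headings


twitch_pay = [{"question": "Is Nigeria eligible for Twitch monetization?"},
              {"question": "How much money for 1000 views on Twitch?"},
              {"question": "Does streaming pay in Nigeria?"},
              {"question": "How much is 20k gifted subs on Twitch in Nigeria?"}]
twitch_receive = [{"question": "Is Nigeria eligible for Twitch monetization?"},
                  {"question": "How to make money from Twitch in Nigeria?"},
                  {"question": "How much is 20k gifted subs on Twitch in Nigeria?"},
                  {"question": "How many viewers on Twitch to make $1000 a month?"}]
for heading in faq_headings(twitch_pay, twitch_receive):
    print(heading)  # each becomes an h3, word for word
```

&lt;p&gt;The two lists above collapse to six unique headings. Matching is exact on purpose: normalising case or punctuation would drift from the PAA phrasing the headings are meant to mirror.&lt;/p&gt;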

&lt;h3&gt;
  
  
  Video on every query
&lt;/h3&gt;

&lt;p&gt;YouTube videos are ranking for all 8 queries. I don't have videos. That's a gap I'm noting but not chasing right now — text content first, get indexed and cited, video is a later play.&lt;/p&gt;

&lt;h3&gt;
  
  
  No featured snippets
&lt;/h3&gt;

&lt;p&gt;The AI Overview is eating what would otherwise be featured snippets. This isn't surprising — Google uses featured snippets to feed AI Overviews. If you're getting cited in the AI Overview, the featured snippet slot effectively doesn't matter.&lt;/p&gt;




&lt;h2&gt;
  
  
  What I Changed on the Pages
&lt;/h2&gt;

&lt;p&gt;I updated three pages (Twitch, TikTok, YouTube) with the same two interventions:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Opening paragraph rewrite&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Before:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Quick Answer:&lt;/strong&gt; Nigerian streamers receive Twitch payments by using virtual US dollar accounts...&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;After:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Yes, Twitch pays Nigerians. Nigerian streamers can earn from subscriptions, bits, and ads through the Twitch Affiliate and Partner programs. To receive payments, you need a virtual dollar account — Cleva or Geegpay — which gives you real US bank details for Twitch's ACH transfers. Setup takes under 10 minutes with your NIN and BVN.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The difference: the second version opens with "Yes" (direct answer to the query), names the mechanism (virtual dollar account), names the specific products (Cleva, Geegpay), and gives the time estimate. All within 55 words. That's extractable.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. FAQ section with exact PAA wording&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Replaced generic Q&amp;amp;A sections with &lt;code&gt;&amp;lt;h3&amp;gt;&lt;/code&gt; headings matching the SerpApi PAA questions word for word. Full paragraph answers under each.&lt;/p&gt;

&lt;p&gt;I also tested adding &lt;code&gt;@type: FAQPage&lt;/code&gt; structured data. I removed it before publishing: in August 2023 Google restricted FAQ rich results to well-known, authoritative government and health sites, and pulled it from their documentation entirely in September 2024. For a site like this, the schema does nothing now. The &lt;code&gt;&amp;lt;h3&amp;gt;&lt;/code&gt; headings with exact PAA wording are what actually matter for ranking in PAA boxes, not the schema.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Tool That Did This
&lt;/h2&gt;

&lt;p&gt;&lt;code&gt;serp-features&lt;/code&gt; is a module in my open-source SEO agent. You give it a list of queries, and it returns a feature matrix and opportunity notes. It's about 260 lines of Python.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Install&lt;/span&gt;
pip &lt;span class="nb"&gt;install &lt;/span&gt;httpx

&lt;span class="c"&gt;# Set your SerpApi key (free tier, no credit card)&lt;/span&gt;
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;SERPAPI_KEY&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"your-key-here"&lt;/span&gt;

&lt;span class="c"&gt;# Run on a single query&lt;/span&gt;
python main.py serp-features &lt;span class="nt"&gt;--query&lt;/span&gt; &lt;span class="s2"&gt;"does twitch pay nigerians"&lt;/span&gt; &lt;span class="nt"&gt;--project&lt;/span&gt; naija-payments

&lt;span class="c"&gt;# Run on a file of queries&lt;/span&gt;
python main.py serp-features &lt;span class="nt"&gt;--queries&lt;/span&gt; queries.txt &lt;span class="nt"&gt;--project&lt;/span&gt; naija-payments
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The output is a markdown file with the feature matrix and per-query opportunity notes.&lt;/p&gt;

&lt;p&gt;GitHub: &lt;a href="https://github.com/dannwaneri/seo-agent" rel="noopener noreferrer"&gt;github.com/dannwaneri/seo-agent&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  What I Learned Building It
&lt;/h2&gt;

&lt;p&gt;The first version used Playwright to visit the real Google SERP. It worked for 1–3 queries before Google's bot detection kicked in. I tried user agents, delays, &lt;code&gt;networkidle&lt;/code&gt; waits — none of it made a meaningful difference. Google's detection isn't based on request timing; it's based on IP reputation at scale.&lt;/p&gt;

&lt;p&gt;The solution was to stop trying to scrape Google and use an API that already solved the infrastructure problem. SerpApi uses residential proxy networks — the same approach DataForSEO uses, just accessible to individuals on a free tier.&lt;/p&gt;

&lt;p&gt;The rewrite was cleaner than the original. No Playwright dependency, no browser window opening, no CAPTCHA prompts. One &lt;code&gt;httpx.get()&lt;/code&gt; call per query, structured JSON in, feature flags out.&lt;/p&gt;
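&lt;p&gt;That single call per query might look like the sketch below. &lt;code&gt;engine&lt;/code&gt;, &lt;code&gt;q&lt;/code&gt;, &lt;code&gt;api_key&lt;/code&gt;, &lt;code&gt;gl&lt;/code&gt;, and &lt;code&gt;hl&lt;/code&gt; are SerpApi query parameters; the function names and the Nigeria/English defaults are assumptions, not the module's actual code:&lt;/p&gt;

```python
import os

SERPAPI_URL = "https://serpapi.com/search.json"


def build_params(query, api_key, country="ng", language="en"):
    """Query-string parameters for one Google search through SerpApi.

    `gl` (country) and `hl` (language) are optional; defaulting to
    Nigeria/English is an assumption for this niche.
    """
    return {"engine": "google", "q": query, "api_key": api_key, "gl": country, "hl": language}


def fetch_serp(query, api_key=None, timeout=30.0):
    """One GET per query: structured JSON in, ready for feature detection."""
    import httpx  # imported here so build_params works without httpx installed

    key = api_key or os.environ["SERPAPI_KEY"]
    response = httpx.get(SERPAPI_URL, params=build_params(query, key), timeout=timeout)
    response.raise_for_status()
    return response.json()
```

&lt;p&gt;Each call spends one search from the free tier, so caching responses to disk while iterating on the detection logic is worth the few extra lines.&lt;/p&gt;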

&lt;p&gt;Sometimes the right answer is to not fight the infrastructure problem yourself.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;This is part of an ongoing series on building an open-source SEO co-pilot. The full agent handles core audits, GSC analysis, backlink scoring, internal link mapping, SERP feature detection, and LLM visibility checking — all local, all free or near-free to run.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>seo</category>
      <category>python</category>
      <category>opensource</category>
    </item>
    <item>
      <title>OpenSEO Has 1.7k GitHub Stars. I Built the Same Thing for $0.</title>
      <dc:creator>Daniel Nwaneri</dc:creator>
      <pubDate>Thu, 14 May 2026 14:37:44 +0000</pubDate>
      <link>https://dev.to/dannwaneri/openseo-has-17k-github-stars-i-built-the-same-thing-for-0-1dip</link>
      <guid>https://dev.to/dannwaneri/openseo-has-17k-github-stars-i-built-the-same-thing-for-0-1dip</guid>
      <description>&lt;p&gt;I saw OpenSEO trending and did what every developer does.&lt;/p&gt;

&lt;p&gt;I starred it before reading the pricing.&lt;/p&gt;

&lt;p&gt;Then I read the pricing.&lt;/p&gt;




&lt;h2&gt;
  
  
  The appeal is real
&lt;/h2&gt;

&lt;p&gt;The pitch is clean: open source, self-hostable, pay-as-you-go. No Semrush subscription. No bloat. Fork it and add your own features. For developers tired of paying $200/month for tools that do 10x more than they need, it lands perfectly.&lt;/p&gt;

&lt;p&gt;1.7k stars. 196 forks. Active releases. The community is real.&lt;/p&gt;

&lt;p&gt;I get it. I would have starred it too.&lt;/p&gt;




&lt;h2&gt;
  
  
  Then I opened the pricing section
&lt;/h2&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;"OpenSEO itself remains free. It works by using DataForSEO's APIs, which is a paid third-party service."&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;So it's free the same way a printer is free.&lt;/p&gt;

&lt;p&gt;Here's what DataForSEO actually costs:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Minimum top-up: &lt;strong&gt;$50&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Backlinks API: &lt;strong&gt;$100/month commitment&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;100 keyword research requests: &lt;strong&gt;$3.50–$7.00&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;100 domain overviews: &lt;strong&gt;$4.01&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Rank tracking at scale: climbs fast depending on keywords and devices&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The stars came from developers who love the idea. The cost reality hits after setup.&lt;/p&gt;




&lt;h2&gt;
  
  
  What I built instead
&lt;/h2&gt;

&lt;p&gt;Part of what I do is build &lt;a href="https://dannwaneri.com/seo-automation/" rel="noopener noreferrer"&gt;real-browser SEO automation tools&lt;/a&gt; — agents that visit pages the way Google does, not the way a scraper does.&lt;/p&gt;

&lt;p&gt;My SEO agent does a full site audit — titles, meta descriptions, H1s, canonical tags, broken links, GSC quick wins, internal link clusters — in a real Chromium browser.&lt;/p&gt;

&lt;p&gt;Not an API. An actual browser visiting each page.&lt;/p&gt;

&lt;p&gt;Here's what it costs to run:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Browser visits:&lt;/strong&gt; $0. Playwright is free.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;GSC data:&lt;/strong&gt; $0. Google already collected it for you.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Claude API calls:&lt;/strong&gt; fractions of a cent per page on Haiku.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Total for a full audit:&lt;/strong&gt; under $0.01 for most sites.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I wrote about the exact cost breakdown &lt;a href="https://dev.to/dannwaneri/i-was-paying-0006-per-url-for-seo-audits-until-i-realized-most-needed-0-132j"&gt;here&lt;/a&gt;. The short version: I was paying $0.006 per URL until I realized most URLs needed $0.&lt;/p&gt;
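&lt;p&gt;The idea behind "most URLs needed $0" can be sketched as a free rule-based pass that decides which pages are worth a model call. The checks, the 60-character title limit, and the page fields are hypothetical; the $0.006 figure is the per-URL cost mentioned above:&lt;/p&gt;

```python
# Sketch: run free rule-based checks on every page and only send pages that
# fail to the paid model. Checks, title limit, and page fields are
# hypothetical; $0.006 per URL is the figure quoted in the article.

COST_PER_LLM_CALL = 0.006  # USD per URL that needs a model call


def needs_llm(page):
    """Free checks: flag pages with a missing or over-long title, or no H1."""
    title = page.get("title", "")
    return not title or len(title) > 60 or not page.get("h1")


def audit_cost(pages):
    """Return (flagged URLs, estimated USD spend on model calls)."""
    flagged = [p["url"] for p in pages if needs_llm(p)]
    return flagged, round(len(flagged) * COST_PER_LLM_CALL, 4)


pages = [
    {"url": "/", "title": "Home", "h1": "Welcome"},
    {"url": "/about", "title": "", "h1": "About"},
]
print(audit_cost(pages))  # (['/about'], 0.006)
```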




&lt;h2&gt;
  
  
  The technical difference that matters
&lt;/h2&gt;

&lt;p&gt;OpenSEO pulls data from DataForSEO's index. That index is updated periodically. It tells you what DataForSEO's crawlers saw, when they saw it.&lt;/p&gt;

&lt;p&gt;My agent visits the page right now, in a real browser, and extracts what's actually there — rendered JavaScript, actual title tags, real canonical values, live broken links.&lt;/p&gt;

&lt;p&gt;If a page has a client-side rendering issue that hides the H1 from crawlers, a scraper-based tool misses it. A real browser catches it.&lt;/p&gt;
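&lt;p&gt;A rendered DOM on its own shows what the browser sees; comparing it with the raw server HTML is what exposes a client-side rendering problem. A sketch of that comparison using Python's &lt;code&gt;html.parser&lt;/code&gt; and Playwright's sync API; the function names are hypothetical and this is not the agent's actual code:&lt;/p&gt;

```python
from html.parser import HTMLParser

# Sketch: compare H1s in raw server HTML with H1s in the rendered DOM.
# The Playwright half needs `pip install playwright` and
# `playwright install chromium`.


class H1Collector(HTMLParser):
    """Collect the text of every h1 in a raw HTML string (no JavaScript run)."""

    def __init__(self):
        super().__init__()
        self.h1s, self._inside, self._buffer = [], False, []

    def handle_starttag(self, tag, attrs):
        if tag == "h1":
            self._inside, self._buffer = True, []

    def handle_endtag(self, tag):
        if tag == "h1" and self._inside:
            self.h1s.append("".join(self._buffer).strip())
            self._inside = False

    def handle_data(self, data):
        if self._inside:
            self._buffer.append(data)


def raw_h1s(html):
    parser = H1Collector()
    parser.feed(html)
    return parser.h1s


def rendered_h1s(url):
    """H1 texts after a real Chromium has run the page's JavaScript."""
    from playwright.sync_api import sync_playwright  # only needed for live pages

    with sync_playwright() as p:
        browser = p.chromium.launch()
        page = browser.new_page()
        page.goto(url, wait_until="networkidle")
        texts = page.locator("h1").all_inner_texts()
        browser.close()
    return texts


def diagnose_h1(raw, rendered):
    if not rendered:
        return "no H1 even after rendering"
    if not raw:
        return "H1 only exists after JavaScript runs"
    return "H1 present in raw HTML and rendered DOM"
```

&lt;p&gt;Usage would be &lt;code&gt;diagnose_h1(raw_h1s(server_html), rendered_h1s(url))&lt;/code&gt;, where &lt;code&gt;server_html&lt;/code&gt; is the response body fetched without running JavaScript.&lt;/p&gt;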

&lt;p&gt;This is the same principle behind the &lt;a href="https://dannwaneri.com/cloudflare-automation/" rel="noopener noreferrer"&gt;Cloudflare-based automations&lt;/a&gt; I build for clients — edge-deployed, real output, not cached assumptions.&lt;/p&gt;

&lt;p&gt;That's not a criticism of OpenSEO. It's a different architectural choice with a real tradeoff.&lt;/p&gt;




&lt;h2&gt;
  
  
  What OpenSEO has that I don't
&lt;/h2&gt;

&lt;p&gt;I'll be honest:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Rank tracking over time — I don't have this&lt;/li&gt;
&lt;li&gt;Keyword research at scale — not built&lt;/li&gt;
&lt;li&gt;Backlink analysis — not in my agent&lt;/li&gt;
&lt;li&gt;A polished UI — mine outputs JSON&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you need those features and you're comfortable with the DataForSEO cost model, OpenSEO is a reasonable choice.&lt;/p&gt;

&lt;p&gt;But if your core need is: &lt;em&gt;does this page have what Google needs to rank it&lt;/em&gt; — a real browser costs less and sees more.&lt;/p&gt;




&lt;h2&gt;
  
  
  One more thing
&lt;/h2&gt;

&lt;p&gt;OpenSEO's contributor list includes &lt;strong&gt;&lt;a class="mentioned-user" href="https://dev.to/claude"&gt;@claude&lt;/a&gt;&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;So does mine.&lt;/p&gt;

&lt;p&gt;We're both &lt;a href="https://dannwaneri.com/ai-agents/" rel="noopener noreferrer"&gt;building production tools with the Claude API&lt;/a&gt;. The difference is what you're optimizing for — features, or cost per insight.&lt;/p&gt;

&lt;p&gt;I chose cost per insight. My sites are proof it works.&lt;/p&gt;




&lt;h2&gt;
  
  
  The agent is open source
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://github.com/dannwaneri/seo-agent" rel="noopener noreferrer"&gt;github.com/dannwaneri/seo-agent&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Run it on your own site. It's resumable, so if it crashes at URL 47 it picks up at URL 48. No DataForSEO account needed.&lt;/p&gt;
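&lt;p&gt;Resumability like that usually comes down to a checkpoint written after each URL. A minimal sketch; the file name and JSON format are assumptions, not the agent's actual state file:&lt;/p&gt;

```python
import json
from pathlib import Path

# Sketch of resumable crawling: record each finished URL in a JSON
# checkpoint so a restart skips work already done. File name and format
# are assumptions, not the seo-agent's actual state file.


def load_done(checkpoint):
    path = Path(checkpoint)
    return set(json.loads(path.read_text())) if path.exists() else set()


def run_audit(urls, audit_one, checkpoint="audit_progress.json"):
    """Audit each URL not already in the checkpoint; return this run's results."""
    done = load_done(checkpoint)
    results = {}
    for url in urls:
        if url in done:
            continue  # finished in a previous run
        results[url] = audit_one(url)
        done.add(url)
        # Write after every URL so a crash loses at most the page in flight.
        Path(checkpoint).write_text(json.dumps(sorted(done)))
    return results
```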

&lt;p&gt;If you want a version that runs on &lt;a href="https://dannwaneri.com/cloudflare-automation/" rel="noopener noreferrer"&gt;Cloudflare Workers at the edge&lt;/a&gt;, that's something I build for clients too.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;I build AI agents and SEO automation tools at &lt;a href="https://dannwaneri.com" rel="noopener noreferrer"&gt;dannwaneri.com&lt;/a&gt;. Everything I ship, I've run on my own domains first.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>seo</category>
      <category>opensource</category>
      <category>python</category>
      <category>ai</category>
    </item>
    <item>
      <title>The Language Wars Are Over. The Ground Shifted Without You.</title>
      <dc:creator>Daniel Nwaneri</dc:creator>
      <pubDate>Tue, 12 May 2026 14:12:05 +0000</pubDate>
      <link>https://dev.to/dannwaneri/the-language-wars-are-over-the-ground-shifted-without-you-49pb</link>
      <guid>https://dev.to/dannwaneri/the-language-wars-are-over-the-ground-shifted-without-you-49pb</guid>
      <description>&lt;p&gt;I stopped caring which language someone uses. Somewhere in the last eighteen months, that happened without me deciding it.&lt;/p&gt;

&lt;p&gt;Not because I became a better person. Because the argument stopped mattering.&lt;/p&gt;

&lt;p&gt;In August 2025, TypeScript surpassed both Python and JavaScript as the most-used language on GitHub for the first time ever. Not because developers sat down and decided TypeScript won. Because AI tools handle it better, so it spread. The debate didn't resolve. The ground shifted underneath it and most people are still fighting on the old map.&lt;/p&gt;




&lt;h2&gt;
  
  
  The War That Already Ended
&lt;/h2&gt;

&lt;p&gt;The Python vs JavaScript argument ran for a decade. Rust evangelism became a personality type. C++ veterans looked down on everyone. The fight was never really about syntax — it was about belonging. Who gets to call themselves a real developer. Who gets filtered out at the interview. Who gets taken seriously in the architecture meeting.&lt;/p&gt;

&lt;p&gt;That argument is over.&lt;/p&gt;

&lt;p&gt;Not because anyone won. Because something else became the constraint.&lt;/p&gt;




&lt;h2&gt;
  
  
  What Replaced It
&lt;/h2&gt;

&lt;p&gt;The new constraints aren't linguistic. Tokens — how much context a session can hold before the model starts forgetting what it's building. Context windows — how much of your codebase an agent can actually see at once. Prompt discipline — whether your instructions are tight enough that the agent doesn't guess. Three things. None of them are in any job description yet.&lt;/p&gt;

&lt;p&gt;Nobody voted on this shift. There was no announcement. It just became true while we were arguing about whether Rust was worth learning.&lt;/p&gt;

&lt;p&gt;The developer who ships consistently now isn't the one who knows the most syntax. It's the one who can structure a spec tightly enough that the agent doesn't hallucinate the requirements, manage a context window without losing architectural coherence across sessions, and catch what the model got confidently wrong before it reaches production. I've been experimenting heavily with this in my own &lt;a href="https://dannwaneri.com/ai-agents/" rel="noopener noreferrer"&gt;production AI agents&lt;/a&gt; and real-browser automation workflows.&lt;/p&gt;

&lt;p&gt;That's a different skill. No bootcamp teaches it yet. Most job descriptions don't list it.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Gate Didn't Disappear. It Moved.
&lt;/h2&gt;

&lt;p&gt;Language gatekeeping excluded people by syntax preference. You didn't know pointers? Not a real programmer. You used PHP? Embarrassing. You learned with a framework instead of from scratch? Shortcuts.&lt;/p&gt;

&lt;p&gt;The new gatekeeping is quieter. You're not excluded for your language anymore.&lt;/p&gt;

&lt;p&gt;You're excluded for your context budget.&lt;/p&gt;

&lt;p&gt;Token limits are a billing problem dressed as a technical one. But knowing how to structure prompts, manage agent memory, and stay coherent across a long multi-step workflow — these compound. The developer who can do this produces dramatically better output than the one who can't. The gap is real and it grows with complexity.&lt;/p&gt;

&lt;p&gt;Same exclusion mechanism. Different surface. Less visible, which makes it harder to name and harder to argue against.&lt;/p&gt;

&lt;p&gt;The old gatekeeping was at least honest about what it was filtering for. The new one looks like a productivity difference.&lt;/p&gt;




&lt;h2&gt;
  
  
  What Doesn't Change
&lt;/h2&gt;

&lt;p&gt;Not everything shifted.&lt;/p&gt;

&lt;p&gt;The person who decides whether the output is right is still you.&lt;/p&gt;

&lt;p&gt;The things that actually matter — judgment, accountability, knowing when the confident answer is wrong — those don't change with the terrain. They get &lt;em&gt;more&lt;/em&gt; important as generation gets cheaper.&lt;/p&gt;

&lt;p&gt;Uncle Bob Martin, who spent months coding with Claude and wrote about it publicly, noticed something: Claude codes faster, holds more details, but can't hold the big picture. It doesn't foresee the disaster it's creating. Someone still has to see that. Someone still has to slow down and ask whether this is right, not just whether it compiles.&lt;/p&gt;

&lt;p&gt;But the marker of competence shifted. The proxy changed. The new proxy is harder to fake than the old one.&lt;/p&gt;

&lt;p&gt;You can memorize syntax. You can pass a whiteboard interview on language trivia. You can't fake knowing how to structure a ten-step agent workflow without the context collapsing at step seven, or how to write a spec that gives an agent something real to work with instead of something it'll interpret five different ways.&lt;/p&gt;

&lt;p&gt;This is exactly why I built my own &lt;a href="https://dannwaneri.com/seo-automation/" rel="noopener noreferrer"&gt;SEO automation agent&lt;/a&gt; that runs unsupervised on Cloudflare.&lt;/p&gt;

&lt;p&gt;The old gate was about what you'd memorized. The new one is about how you think.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Part That's Still Unresolved
&lt;/h2&gt;

&lt;p&gt;I don't know if the new gate is better than the old one.&lt;/p&gt;

&lt;p&gt;The old gatekeeping protected a social hierarchy more than it protected code quality. CS degrees, whiteboard interviews, years-of-experience requirements — they controlled access. They decided who got to call themselves real engineers. That architecture was never really about quality.&lt;/p&gt;

&lt;p&gt;The new constraints are at least about something real. Context discipline, prompt structure, verification habits — these produce actual output differences. The filter is less arbitrary.&lt;/p&gt;

&lt;p&gt;But "less arbitrary" isn't the same as "fair." Token budgets cost money. The developer in Lagos with a $20 API limit and the developer in San Francisco with a $200 plan are not operating in the same environment. The new constraint is technical and financial simultaneously. That's not a coincidence — it's just the old hierarchy in different clothes.&lt;/p&gt;

&lt;p&gt;We spent years arguing about languages. Now the argument is how well you can give instructions.&lt;/p&gt;

&lt;p&gt;That's not obviously worse. It's just different.&lt;/p&gt;

&lt;p&gt;And we haven't decided yet whether the new gate is better than the old one, or just less visible.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>discuss</category>
      <category>programming</category>
      <category>productivity</category>
    </item>
  </channel>
</rss>
