<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Umesh Malik</title>
    <description>The latest articles on DEV Community by Umesh Malik (@umesh_malik).</description>
    <link>https://dev.to/umesh_malik</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3777486%2F9bb4f37b-acd0-4752-9675-5e1cf9dd0b78.jpg</url>
      <title>DEV Community: Umesh Malik</title>
      <link>https://dev.to/umesh_malik</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/umesh_malik"/>
    <language>en</language>
    <item>
      <title>Agentic Browsing in PageSpeed Insights: How to Make Your Website AI-Ready (2026)</title>
      <dc:creator>Umesh Malik</dc:creator>
      <pubDate>Fri, 19 Jun 2026 10:00:11 +0000</pubDate>
      <link>https://dev.to/umesh_malik/agentic-browsing-in-pagespeed-insights-how-to-make-your-website-ai-ready-2026-n1e</link>
      <guid>https://dev.to/umesh_malik/agentic-browsing-in-pagespeed-insights-how-to-make-your-website-ai-ready-2026-n1e</guid>
      <description>&lt;h2&gt;
  
  
  TL;DR
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;PageSpeed Insights now grades your site for AI agents, not just humans.&lt;/strong&gt; Lighthouse 13.3 (May 7, 2026) added an &lt;strong&gt;Agentic Browsing&lt;/strong&gt; category to the default config; PageSpeed Insights inherited it within two weeks. It sits next to Performance, Accessibility, Best Practices and SEO — and reports a ratio like &lt;code&gt;3/3&lt;/code&gt;, not a score out of 100.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;It checks three things by default:&lt;/strong&gt; a clean &lt;strong&gt;accessibility tree&lt;/strong&gt;, a &lt;strong&gt;stable layout&lt;/strong&gt; (low CLS), and a valid &lt;strong&gt;&lt;code&gt;llms.txt&lt;/code&gt;&lt;/strong&gt; at your domain root. The wider Lighthouse category also audits &lt;strong&gt;WebMCP&lt;/strong&gt; — annotated forms and registered agent tools.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Google added it because agents are now real traffic.&lt;/strong&gt; Operator, Computer Use, Project Mariner, Perplexity and ChatGPT's browse mode visit sites on a person's behalf. A growing share of requests hitting your server are software, not people.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;You don't fail for lacking AI features&lt;/strong&gt; — the category is informational. But "AI-ready" is now a measurable, public number, and it's about to become a competitive one.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Proof it's achievable:&lt;/strong&gt; a real Lighthouse 13.4 run on my own site, &lt;code&gt;umesh-malik.com&lt;/code&gt;, scores &lt;strong&gt;Agentic Browsing 3/3&lt;/strong&gt; and &lt;strong&gt;100 on Accessibility, Best Practices and SEO&lt;/strong&gt; — and it passes &lt;strong&gt;7/7 on the &lt;a href="https://isitagentready.com/" rel="noopener noreferrer"&gt;isitagentready.com&lt;/a&gt; protocol-discovery checklist&lt;/strong&gt; — using nothing but static files and one Cloudflare Worker. Here's exactly how, so you can copy it.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Your Website Is Now Being Graded for Robots
&lt;/h2&gt;

&lt;p&gt;Open &lt;a href="https://pagespeed.web.dev/" rel="noopener noreferrer"&gt;pagespeed.web.dev&lt;/a&gt;, run a report, and you'll see something that wasn't there a month ago: a category called &lt;strong&gt;Agentic Browsing&lt;/strong&gt;, sitting right alongside Performance and SEO. It doesn't show a number out of 100. It shows a ratio — &lt;code&gt;2/3&lt;/code&gt;, &lt;code&gt;3/3&lt;/code&gt; — and a short list of checks with names like "accessibility tree" and "llms.txt".&lt;/p&gt;

&lt;p&gt;That ratio is not measuring how fast your page loads or whether your headings are in order. It's measuring &lt;strong&gt;how well an AI agent can read your page, understand it, and act on it — with no human in the loop.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This is the quiet half of a shift that's been building all year. We spent two decades optimizing for two audiences: humans who read, and crawlers that index. There's now a third, and it behaves like neither. An agent doesn't skim your hero copy or admire your animations. It wants structure it can parse, facts it can extract, and tools it can call. Google just turned "are you ready for that audience?" into a number anyone can pull up.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;💡 &lt;strong&gt;Key insight&lt;/strong&gt;: SEO made your site &lt;em&gt;findable&lt;/em&gt;. GEO made it &lt;em&gt;quotable&lt;/em&gt;. Agentic Browsing makes it &lt;em&gt;usable by software&lt;/em&gt;. These are three different jobs, and the third one is now scored in the same tool you already use for Core Web Vitals.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  What Is the Agentic Browsing Category in PageSpeed Insights?
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Agentic Browsing is a Lighthouse category that scores how ready a page is for an AI agent to read it, understand it, and act on it without a human driving.&lt;/strong&gt; It was introduced in &lt;strong&gt;Lighthouse 13.3 on May 7, 2026&lt;/strong&gt;, moved straight into the default config, and PageSpeed Insights picked it up within a couple of weeks. As of mid-June 2026 it's live for everyone.&lt;/p&gt;

&lt;p&gt;A few things make it behave differently from every other Lighthouse category:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;It reports a ratio, not a score out of 100.&lt;/strong&gt; You'll see &lt;code&gt;3/3&lt;/code&gt;, not &lt;code&gt;92&lt;/code&gt;. It's a count of passed checks, not a weighted index.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;It's marked "under development."&lt;/strong&gt; The exact audits and the way they're scored will change. Don't carve a &lt;code&gt;3/3&lt;/code&gt; into your OKRs yet.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;It won't fail you for having no AI features.&lt;/strong&gt; &lt;code&gt;example.com&lt;/code&gt; — a page with almost nothing on it — earns a perfect ratio. This is a checklist of opportunities, not a penalty box.&lt;/li&gt;
&lt;/ul&gt;

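&lt;p&gt;In PageSpeed Insights, the default result is built from &lt;strong&gt;three checks&lt;/strong&gt;: a well-formed accessibility tree, a stable layout (low CLS), and an &lt;code&gt;llms.txt&lt;/code&gt; that follows the recommended format. The broader Lighthouse category runs &lt;strong&gt;four audits&lt;/strong&gt;, including WebMCP checks for annotated forms and registered agent tools.&lt;/p&gt;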

&lt;p&gt;Notice what's &lt;em&gt;not&lt;/em&gt; there: nothing about keywords, backlinks, or meta descriptions. This audit is about &lt;strong&gt;machine comprehension and machine action&lt;/strong&gt;, full stop.&lt;/p&gt;
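&lt;p&gt;Of the three default checks, &lt;code&gt;llms.txt&lt;/code&gt; is the easiest to approximate before you deploy. Here is a minimal Python sketch of a pre-deploy lint, assuming the layout from the llmstxt.org proposal (one H1 title, a blockquote summary, then Markdown link lists). It is not Lighthouse's actual audit logic, and &lt;code&gt;lint_llms_txt&lt;/code&gt; is a hypothetical name.&lt;/p&gt;

```python
import re

# Matches a Markdown list link such as "- [Post title](https://example.com/post)".
LINK_RE = re.compile(r"^\s*-\s*\[([^\]]+)\]\((\S+?)\)")


def lint_llms_txt(text: str) -> list[str]:
    """Return a list of problems; an empty list means the rough checks passed.

    Stricter than the llmstxt.org proposal: the blockquote summary is
    treated as required here, matching the "real description" advice.
    """
    content = [line.rstrip() for line in text.splitlines() if line.strip()]
    if not content:
        return ["file is empty"]

    problems = []
    # 1. The first non-blank line must be the H1 title ("# Site name").
    if not content[0].startswith("# "):
        problems.append("first non-blank line is not an H1 title")
    if sum(1 for line in content if line.startswith("# ")) != 1:
        problems.append("expected exactly one H1")

    # 2. A blockquote summary ("> ...") should describe the site.
    if not any(line.startswith("> ") for line in content):
        problems.append("missing blockquote summary")

    # 3. At least one Markdown link should point at real content.
    if not any(LINK_RE.match(line) for line in content):
        problems.append("no Markdown links found")

    return problems
```

&lt;p&gt;Wire it into CI so an empty or malformed file never reaches production.&lt;/p&gt;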

&lt;h2&gt;
  
  
  Why Google Added It — Agents Are Real Traffic Now
&lt;/h2&gt;

&lt;p&gt;The cynical read is "Google wants another number to chase." The accurate read is simpler: &lt;strong&gt;a meaningful and growing share of the requests hitting public web servers are agents, not humans&lt;/strong&gt; — and the tooling finally caught up to that reality.&lt;/p&gt;

&lt;p&gt;Look at who's browsing on a user's behalf in 2026: OpenAI's Operator, Anthropic's Computer Use, Google's own Project Mariner, Perplexity, and ChatGPT's browse mode. These don't issue a query and read ten blue links. They get a &lt;em&gt;task&lt;/em&gt; — "compare these three products and book the cheapest one that ships by Friday" — and they execute it across multiple sites. To do that, they have to read your page, model what's on it, and act.&lt;/p&gt;

&lt;p&gt;When an agent hits a page built only for human eyes, three things go wrong:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Comprehension is expensive.&lt;/strong&gt; Feeding raw HTML or a screenshot into a model burns tokens and invites mistakes. A clean accessibility tree is an order of magnitude cheaper to reason over.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Action is fragile.&lt;/strong&gt; If your "Add to cart" is a &lt;code&gt;&amp;lt;div&amp;gt;&lt;/code&gt; with an &lt;code&gt;onclick&lt;/code&gt;, an agent has to &lt;em&gt;guess&lt;/em&gt;. Annotated forms and registered tools remove the guessing.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Discovery is a coin flip.&lt;/strong&gt; Without an &lt;code&gt;llms.txt&lt;/code&gt; or a tool manifest, the agent has to reverse-engineer your site's structure every single visit.&lt;/li&gt;
&lt;/ol&gt;
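&lt;p&gt;The discovery problem is concrete: an agent probes a handful of well-known paths. Here is a minimal Python sketch that builds those candidate URLs for any origin. The path list mirrors files this post mentions (robots.txt, llms.txt, llms-full.txt, the MCP server card) and is an assumption, not a standard manifest.&lt;/p&gt;

```python
from urllib.parse import urljoin, urlparse

# Paths taken from the files this post discusses; adjust to what your site ships.
DISCOVERY_PATHS = [
    "/robots.txt",
    "/llms.txt",
    "/llms-full.txt",
    "/.well-known/mcp/server-card.json",
]


def discovery_urls(origin: str) -> list[str]:
    """Return absolute URLs an agent could probe on the given origin."""
    parsed = urlparse(origin)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"not an http(s) origin: {origin!r}")
    # Resolve against the site root so any path on the input URL is ignored.
    root = f"{parsed.scheme}://{parsed.netloc}/"
    return [urljoin(root, path) for path in DISCOVERY_PATHS]
```

&lt;p&gt;Probing them is then one GET per URL with whatever HTTP client you already use; a 404 tells you which surface is missing.&lt;/p&gt;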

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;The honest caveat&lt;/strong&gt;&lt;br&gt;
This category is new and explicitly "under development." &lt;code&gt;llms.txt&lt;/code&gt; in particular isn't yet widely consumed by AI tools — even the Lighthouse team says so. None of this is a guaranteed ranking lever today. It's a low-cost bet on where the web is obviously heading, made measurable a year or two before most sites will bother. That early-mover window is the whole point.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  How It Helps Developers (and Agents)
&lt;/h2&gt;

&lt;p&gt;For &lt;strong&gt;agents&lt;/strong&gt;, the payoff is obvious: cheaper comprehension, reliable action, less hallucination about what your site does.&lt;/p&gt;

&lt;p&gt;For &lt;strong&gt;developers&lt;/strong&gt;, the wins are quieter but real.&lt;/p&gt;

&lt;h2&gt;
  
  
  How to Make Any Website AI-Ready
&lt;/h2&gt;

&lt;p&gt;Here's where it gets practical. The PageSpeed category is the &lt;em&gt;headline&lt;/em&gt;, but the fuller checklist lives at &lt;strong&gt;&lt;a href="https://isitagentready.com/" rel="noopener noreferrer"&gt;isitagentready.com&lt;/a&gt;&lt;/strong&gt; — a free scanner that groups agent-readiness into five categories. I've used its taxonomy to structure the work below, because it maps cleanly onto what agents actually look for.&lt;/p&gt;

&lt;p&gt;You don't need all five. A blog needs the first four and can ignore commerce entirely. A store needs all five. Work top-down — discoverability is the cheapest and highest-leverage.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;💡 &lt;strong&gt;Key insight&lt;/strong&gt;: The single highest-leverage change most sites can make is also the most boring — &lt;strong&gt;stop rendering text and diagrams as images.&lt;/strong&gt; An image of a table is invisible to the accessibility tree. A real &lt;code&gt;&amp;lt;table&amp;gt;&lt;/code&gt; (or a component that renders one) is readable by screen readers, crawlers, &lt;em&gt;and&lt;/em&gt; agents in one shot. I'll come back to this, because it's exactly how I scored my own site.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;If you want to go deeper on the callable layer, I wrote a full walkthrough of &lt;a href="https://umesh-malik.com/blog/how-to-build-mcp-server" rel="noopener noreferrer"&gt;building a production MCP server&lt;/a&gt; and &lt;a href="https://umesh-malik.com/blog/deploy-mcp-server-cloudflare-workers" rel="noopener noreferrer"&gt;deploying it on Cloudflare Workers&lt;/a&gt; — that's the same server backing the numbers below.&lt;/p&gt;

&lt;h2&gt;
  
  
  Proof: How AI-Ready Is umesh-malik.com?
&lt;/h2&gt;

&lt;p&gt;Talk is cheap, so here's the receipt. I ran my own site — &lt;code&gt;umesh-malik.com&lt;/code&gt; — through &lt;strong&gt;Lighthouse 13.4&lt;/strong&gt; (the engine behind PageSpeed Insights) and the &lt;a href="https://isitagentready.com/" rel="noopener noreferrer"&gt;isitagentready.com&lt;/a&gt; checklist. This site is a SvelteKit SSG — &lt;strong&gt;fully static, one Cloudflare Worker, no special infrastructure.&lt;/strong&gt; Everything below ships from files in a public repo.&lt;/p&gt;

&lt;p&gt;The Agentic Browsing category comes back &lt;strong&gt;3/3&lt;/strong&gt; — every weighted check passing:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2Fhw9vgtwyf35is2y02p6c.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2Fhw9vgtwyf35is2y02p6c.png" alt="Lighthouse 13.4 Agentic Browsing result for umesh-malik.com: a green 3 out of 3 score, with three passed audits — accessibility tree well-formed, Cumulative Layout Shift of 0, and llms.txt follows recommendations — plus three WebMCP audits marked Not Applicable" width="800" height="1118"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Here's the honest, category-by-category breakdown — including what the site &lt;strong&gt;doesn't&lt;/strong&gt; have:&lt;/p&gt;

&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Check&lt;/th&gt;
&lt;th&gt;Status&lt;/th&gt;
&lt;th&gt;Details&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;…&lt;/td&gt;
&lt;td&gt;…&lt;/td&gt;
&lt;td&gt;….md and /llms.txt&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;llms.txt + llms-full.txt&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;Valid H1, description, ~3,500 words, every post linked&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;MCP server card&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;&lt;code&gt;/.well-known/mcp/server-card.json&lt;/code&gt;: 4 tools, Streamable HTTP&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Agent Skills + API catalog + WebMCP&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;&lt;code&gt;agent-skills/index.json&lt;/code&gt;, RFC 9727 linkset, &lt;code&gt;navigator.modelContext&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;OAuth discovery + auth.md&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;td&gt;Honestly declares the site as public/anonymous; no fake auth server&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;DNS-AID + Web Bot Auth&lt;/td&gt;
&lt;td&gt;Not yet&lt;/td&gt;
&lt;td&gt;On the roadmap; newer, lower-leverage for a content site&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Commerce (x402, MPP, UCP, ACP)&lt;/td&gt;
&lt;td&gt;N/A&lt;/td&gt;
&lt;td&gt;It's a portfolio + blog. Nothing to sell, nothing to fake.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;

&lt;p&gt;The result: the site passes &lt;strong&gt;every category that applies to it&lt;/strong&gt; and scores &lt;strong&gt;7/7 on protocol discovery&lt;/strong&gt;. The 3/3 above is a real Lighthouse run, not a mockup — and I'm not pretending DNS-AID and Web Bot Auth are done, because they aren't. (Note the three WebMCP audits show as &lt;em&gt;Not Applicable&lt;/em&gt; in the screenshot: the site registers WebMCP tools in code, but Lighthouse's WebMCP audits are still informational and didn't score them on this page — an honest nuance of a category that's openly "under development.") That candor is the point. &lt;strong&gt;Agent-readiness is a real engineering state, not a vanity badge.&lt;/strong&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  How I Actually Got Here (and How You Can Copy It)
&lt;/h3&gt;

&lt;p&gt;None of this required a backend rewrite. The whole agent-discovery layer is one Cloudflare Worker plus a handful of static files.&lt;/p&gt;

&lt;p&gt;The most important decision was rendering &lt;strong&gt;diagrams as DOM components&lt;/strong&gt;. The original of this post on umesh-malik.com is the proof: every chart, table and step list there is a real Svelte component rendering semantic HTML, not a PNG. That's why an agent (or a screen reader) can read all of it, and it's a large part of why the accessibility-tree check passes. One decision, paid back across three audiences.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;The reusable principle&lt;/strong&gt;&lt;br&gt;
You don't need Cloudflare, SvelteKit, or my stack. The pattern generalizes: &lt;strong&gt;reuse the content you already publish&lt;/strong&gt; (your feed, your Markdown, your profile) and expose it through machine-readable surfaces — robots.txt, llms.txt, Link headers, an MCP endpoint. The data already exists. Agent-readiness is mostly about &lt;em&gt;presenting&lt;/em&gt; it in formats agents understand, not creating new content.&lt;/p&gt;
&lt;/blockquote&gt;
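&lt;p&gt;As one concrete instance of that principle, here is a minimal Python sketch that renders an &lt;code&gt;llms.txt&lt;/code&gt; body from posts you already publish. &lt;code&gt;Post&lt;/code&gt; and &lt;code&gt;build_llms_txt&lt;/code&gt; are hypothetical names; load the post list from your feed or Markdown front matter however your build already does. The output follows the llmstxt.org layout.&lt;/p&gt;

```python
from dataclasses import dataclass


@dataclass
class Post:
    title: str
    url: str
    summary: str


def build_llms_txt(site_name: str, description: str, posts: list[Post]) -> str:
    """Render an llms.txt body: H1 title, blockquote summary, then a link list."""
    lines = [f"# {site_name}", "", f"> {description}", "", "## Posts", ""]
    for post in posts:
        # One Markdown link per post, with a short note after the colon.
        lines.append(f"- [{post.title}]({post.url}): {post.summary}")
    return "\n".join(lines) + "\n"
```

&lt;p&gt;Generating the file at build time keeps it in step with what you actually publish, instead of a hand-edited list that drifts.&lt;/p&gt;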

&lt;h2&gt;
  
  
  Common Mistakes to Avoid
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Chasing the ratio instead of the readiness.&lt;/strong&gt; &lt;code&gt;example.com&lt;/code&gt; scores a perfect ratio with nothing real behind it. A green &lt;code&gt;3/3&lt;/code&gt; on a site full of image-of-text content is a lie you're telling yourself. Optimize the underlying state, not the badge.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Faking an auth server.&lt;/strong&gt; If your site is public, &lt;em&gt;say so&lt;/em&gt; in auth.md and OAuth discovery. Advertising endpoints that don't exist breaks the agents that trust them. Honest "anonymous, no auth" beats a fictional token endpoint.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Treating llms.txt as a keyword dump.&lt;/strong&gt; It's a map, not a meta-keywords tag. Give it a clear H1, a real description, and links to your genuinely best content. Stuffing it is the 2007 SEO mistake in a new file.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Shipping images of text.&lt;/strong&gt; The single most common thing that quietly fails the accessibility-tree check. If a human needs to read words in it, it should be real text, not a screenshot.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Ignoring CLS because "it's just SEO."&lt;/strong&gt; Layout that jumps confuses screenshot-based agents the same way it annoys users. Your &lt;a href="https://umesh-malik.com/blog/core-web-vitals-optimization-guide" rel="noopener noreferrer"&gt;Core Web Vitals work&lt;/a&gt; now pays an agentic dividend too.&lt;/li&gt;
&lt;/ul&gt;
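&lt;p&gt;On the CLS point: Cumulative Layout Shift is not a plain sum of every shift. As web.dev defines it, shifts are grouped into session windows (a window closes after a 1-second gap, or once it spans 5 seconds), and CLS is the largest window's total. The Python sketch below reproduces that grouping from hypothetical &lt;code&gt;(timestamp_ms, score)&lt;/code&gt; pairs; in a browser those come from &lt;code&gt;layout-shift&lt;/code&gt; performance entries, skipping any with &lt;code&gt;hadRecentInput&lt;/code&gt; set.&lt;/p&gt;

```python
def cumulative_layout_shift(shifts: list[tuple[float, float]]) -> float:
    """Largest session-window total from (timestamp_ms, score) pairs.

    A new window starts when the gap since the previous shift reaches
    1000 ms, or when the window would span 5000 ms or more.
    """
    best = 0.0
    window_total = 0.0
    window_start = None
    previous = None
    for timestamp, score in sorted(shifts):
        starts_new = (
            window_start is None
            or timestamp - previous >= 1000
            or timestamp - window_start >= 5000
        )
        if starts_new:
            window_start = timestamp
            window_total = 0.0
        window_total += score
        previous = timestamp
        best = max(best, window_total)
    return best
```

&lt;p&gt;This is why one large shift during load can cost more than many tiny ones spread across a long session.&lt;/p&gt;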

&lt;h2&gt;
  
  
  The Bottom Line
&lt;/h2&gt;

&lt;p&gt;Agentic Browsing in PageSpeed Insights is small today: a ratio, marked "under development," consumed by tools that are themselves barely a year old. It would be easy to dismiss. Don't.&lt;/p&gt;

&lt;p&gt;The trajectory is unmistakable: agents are becoming a first-class audience for the web, Google just made their needs &lt;em&gt;measurable&lt;/em&gt;, and the work to satisfy them is cheap, mostly static, and overlaps almost entirely with accessibility and good engineering you should be doing anyway. The sites that ship a clean accessibility tree, a real &lt;code&gt;llms.txt&lt;/code&gt;, and a callable MCP endpoint &lt;strong&gt;now&lt;/strong&gt; will be the ones agents reach for when the rest of the web is still serving them screenshots.&lt;/p&gt;

&lt;p&gt;I made my own site agent-ready with a Worker and some static files, and I documented every move so you can do the same. Start with robots.txt and llms.txt this week. Then decide how deep into protocol discovery your site deserves to go.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Written for &lt;a href="https://umesh-malik.com" rel="noopener noreferrer"&gt;umesh-malik.com&lt;/a&gt; — no-fluff technical writing on AI, Web Dev, and Engineering. Curious how the callable layer works? Read &lt;a href="https://umesh-malik.com/blog/how-to-build-mcp-server" rel="noopener noreferrer"&gt;How to Build a Production MCP Server&lt;/a&gt; next.&lt;/em&gt;&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Originally published at &lt;a href="https://umesh-malik.com/blog/agentic-browsing-pagespeed-ai-ready" rel="noopener noreferrer"&gt;umesh-malik.com&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Keep reading on umesh-malik.com:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://umesh-malik.com/blog/autonomous-ai-agents-production-gap-2026" rel="noopener noreferrer"&gt;AI Agents That Run the Business in 2026: Why 77% Never Reach Production (and What the 23% Do Differently)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://umesh-malik.com/blog/how-to-build-mcp-server" rel="noopener noreferrer"&gt;How to Build a Production MCP Server (I Added One to My Site)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://umesh-malik.com/blog/build-rag-pipeline-from-scratch" rel="noopener noreferrer"&gt;Build a RAG Pipeline From Scratch (Production Patterns That Actually Matter)&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>agenticbrowsing</category>
      <category>pagespeedinsights</category>
      <category>aiagents</category>
      <category>geo</category>
    </item>
    <item>
      <title>AI Agents That Run the Business in 2026: Why 77% Never Reach Production (and What the 23% Do Differently)</title>
      <dc:creator>Umesh Malik</dc:creator>
      <pubDate>Sun, 14 Jun 2026 17:00:02 +0000</pubDate>
      <link>https://dev.to/umesh_malik/ai-agents-that-run-the-business-in-2026-why-77-never-reach-production-and-what-the-23-do-23j7</link>
      <guid>https://dev.to/umesh_malik/ai-agents-that-run-the-business-in-2026-why-77-never-reach-production-and-what-the-23-do-23j7</guid>
      <description>&lt;h2&gt;
  
  
  TL;DR
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Building an AI agent is easy. Shipping one that runs your business is where roughly 77% of projects die.&lt;/strong&gt; The demo-to-production gap is the real story of agentic AI in 2026 — not the model benchmarks.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Three things kill agents in production:&lt;/strong&gt; compounding error across long tool chains, fuzzy accountability when the agent acts on its own, and the unglamorous integration work nobody puts in a demo.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The 23% who ship all do the same five things:&lt;/strong&gt; pick a narrow, high-volume task; keep a human on the risky steps; scope permissions tightly; build an eval harness before scaling; and graduate from shadow mode to autonomy instead of flipping a switch.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Proof it works when it's bounded:&lt;/strong&gt; Lassie raised $35M in June 2026 to run medical- and dental-practice back offices for 700+ businesses, reclaiming about 250,000 staff-hours a year.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Everyone Is Building AI Agents. Almost Nobody Is Shipping Them.
&lt;/h2&gt;

&lt;p&gt;Lassie just raised $35 million to make small businesses run themselves. Andreessen Horowitz led the round in June 2026, and the pitch is exactly as ambitious as it sounds: &lt;strong&gt;autonomous AI agents&lt;/strong&gt; that don't just help a medical practice with its back office — they &lt;em&gt;run&lt;/em&gt; it. Payment enrollment, reconciliation, insurance appeals, follow-up. The software does the work, not the staff.&lt;/p&gt;

&lt;p&gt;Here's the uncomfortable part. For every Lassie, there are a hundred agent projects quietly dying in a sandbox. McKinsey's 2026 numbers say it plainly: &lt;strong&gt;62% of organizations are experimenting with agents, but only 23% have scaled them.&lt;/strong&gt; Gartner expects 40% of enterprise apps to embed task-specific agents by the end of 2026 — up from less than 5% — which means the gap between "we built an agent" and "the business runs on it" is about to become the most expensive gap in software.&lt;/p&gt;

&lt;p&gt;That funnel is the whole article. The winners aren't the teams with the smartest model — by 2026 everyone has access to roughly the same frontier models. The winners are the teams that treated &lt;em&gt;autonomous&lt;/em&gt; as an outcome to earn, not a switch to flip. Let's break down exactly where the 77% fall out, and what the survivors do differently.&lt;/p&gt;

&lt;h2&gt;
  
  
  What an Autonomous AI Agent That Runs the Business Actually Means
&lt;/h2&gt;

&lt;p&gt;The word "agent" got stretched into meaninglessness in 2025. Half the products calling themselves agents are chatbots with a system prompt. So let's be precise.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Definition&lt;/strong&gt;&lt;br&gt;
An &lt;strong&gt;autonomous AI agent&lt;/strong&gt; is software that pursues a goal by deciding its own steps and taking real actions across tools and systems — not just answering a prompt. A copilot suggests; an agent acts. An agent that "runs the business" owns a workflow end to end, with a human in the loop by exception rather than by default.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;It helps to see it as a ladder:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A &lt;strong&gt;chatbot&lt;/strong&gt; answers questions. It has no hands.&lt;/li&gt;
&lt;li&gt;A &lt;strong&gt;copilot&lt;/strong&gt; drafts and suggests. You review every output and you take the action.&lt;/li&gt;
&lt;li&gt;An &lt;strong&gt;autonomous agent&lt;/strong&gt; takes the action itself — it books, files, reconciles, emails — and only escalates to a human when its own policy says to.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The jump that matters is from &lt;em&gt;suggesting&lt;/em&gt; to &lt;em&gt;acting&lt;/em&gt;. That single step is where reliability, trust, and accountability stop being nice-to-haves and start being the entire engineering problem. It's also why "agentic" is not the same as "automation." Classic automation follows a fixed script you wrote. An agent chooses the path at runtime — which is exactly what makes it powerful and exactly what makes it hard to ship.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why This Is Suddenly Real in 2026
&lt;/h2&gt;

&lt;p&gt;Agents aren't new as an idea. What changed in 2026 is that three curves crossed at once.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Models got good enough and cheap enough.&lt;/strong&gt; The June 2026 release wave pushed frontier capability up and token prices sharply down. Reasoning that was research-grade in 2024 is now a line item.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Integration got standardized.&lt;/strong&gt; The Model Context Protocol turned "wire the agent into your stack" from a bespoke six-week project into a connector you can reuse. Plumbing was the silent blocker, and it just got much smaller.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The economics finally make sense for the back office.&lt;/strong&gt; This is the part founders underrate. The money isn't in flashy consumer demos — it's in the boring, expensive work every small business drowns in.&lt;/p&gt;

&lt;p&gt;Andreessen Horowitz called small businesses "the next frontier for AI" for exactly this reason: a single medical practice can burn over 100 hours a month and roughly $200,000 a year on administrative work that is repetitive, rule-bound, and perfect for an agent — &lt;em&gt;if&lt;/em&gt; you can get the agent into production. Which brings us to the hard part.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2Fv6p084l8o499nztjx3nt.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2Fv6p084l8o499nztjx3nt.png" alt="Funnel showing 62% of organizations experimenting with AI agents but only 23% reaching production — the 77% that stall between pilot and deployment" width="800" height="420"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Why 77% of AI Agents Never Reach Production
&lt;/h2&gt;

&lt;p&gt;The gap is not a model problem. It's a systems problem. Here are the five failure modes that kill agents between an impressive demo and a dependable deployment.&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Compounding error is the silent killer
&lt;/h3&gt;

&lt;p&gt;A demo runs three steps and looks like magic. Production runs twenty and falls apart. Reliability multiplies — it doesn't average.&lt;/p&gt;

&lt;p&gt;An agent chaining 20 tool calls at 95% per-step reliability succeeds end to end only about &lt;strong&gt;36% of the time&lt;/strong&gt; (0.95^20 ≈ 0.36). That's not an agent you can ship; it's a bet you'd lose nearly two times out of three. Push per-step reliability to a heroic 99% and you're still only at 82% across 20 steps.&lt;/p&gt;
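&lt;p&gt;The arithmetic is easy to reproduce. Here is a short Python check. It assumes each step succeeds or fails independently; real failures are often correlated, so treat this as a model, not a measurement.&lt;/p&gt;

```python
def chain_success(per_step: float, steps: int) -> float:
    """Probability that all steps succeed, assuming independent steps."""
    return per_step ** steps


for p in (0.95, 0.99):
    print(f"{p:.0%} per step, 20 steps: {chain_success(p, 20):.0%} end to end")
# 95% per step, 20 steps: 36% end to end
# 99% per step, 20 steps: 82% end to end
```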

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2F5f0i67ojjyk07nw3mfgu.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2F5f0i67ojjyk07nw3mfgu.png" alt="Compounding error chart: at 95% per-step reliability, an autonomous agent's end-to-end success rate falls to about 36% over 20 steps" width="800" height="420"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The fix is not a cleverer prompt. It's fewer steps, verification between steps, and retries that actually check their work. The teams that ship design &lt;em&gt;short, checkpointed&lt;/em&gt; chains. The teams that stall keep adding steps and hoping.&lt;/p&gt;
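&lt;p&gt;Here is a minimal Python sketch of a short, checkpointed chain. Each step's output must pass a verifier before the next step sees it. Failed steps are retried a bounded number of times, and a step that never verifies escalates instead of passing bad state forward. The step and verifier callables are hypothetical stand-ins for your own tool calls and checks.&lt;/p&gt;

```python
from typing import Any, Callable

Step = Callable[[Any], Any]
Verifier = Callable[[Any, Any], bool]  # (input state, step output) -> ok?


class EscalateToHuman(Exception):
    """Raised when a step cannot produce verified output."""


def run_checkpointed(state: Any, steps: list[tuple[str, Step, Verifier]],
                     max_attempts: int = 3) -> Any:
    for name, step, verify in steps:
        for _ in range(max_attempts):
            result = step(state)
            # Checkpoint: only verified output is allowed into the next step.
            if verify(state, result):
                state = result
                break
        else:
            # No attempt verified: stop the chain rather than compound the error.
            raise EscalateToHuman(f"step {name!r} failed verification {max_attempts} times")
    return state
```

&lt;p&gt;If failures are independent and the verifier catches every bad output, three attempts at 95% lift a step to 1 − 0.05³ ≈ 99.99%. Real verifiers miss things, so read that as a ceiling, not a promise.&lt;/p&gt;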

&lt;h3&gt;
  
  
  2. Nobody owns the outcome
&lt;/h3&gt;

&lt;p&gt;When a copilot suggests something wrong, a human catches it. When an agent files the insurance appeal, posts the transaction, or emails the customer, there is no catch — unless you built one.&lt;/p&gt;

&lt;p&gt;Demos hide this because the person demoing is the safety net. Production has to encode the safety net as policy: approval gates on irreversible actions, reversibility where you can manage it, and an audit trail for everything. "Who is accountable when the agent is wrong?" is a question you answer in your architecture, not your marketing.&lt;/p&gt;
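&lt;p&gt;One way to encode that safety net as policy, sketched in Python: every action is tagged reversible or not, irreversible actions need an approver before they run, and each decision is appended to an audit log. &lt;code&gt;Action&lt;/code&gt;, &lt;code&gt;PolicyGate&lt;/code&gt;, and the &lt;code&gt;approve&lt;/code&gt; callback are hypothetical names, not any framework's API.&lt;/p&gt;

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Callable


@dataclass
class Action:
    name: str
    reversible: bool
    payload: dict


@dataclass
class PolicyGate:
    approve: Callable[[Action], bool]  # e.g. backed by a human review queue
    audit_log: list = field(default_factory=list)

    def execute(self, action: Action, run: Callable[[Action], object]):
        # Reversible actions run directly; irreversible ones need approval.
        approved = action.reversible or self.approve(action)
        self.audit_log.append({
            "at": datetime.now(timezone.utc).isoformat(),
            "action": action.name,
            "reversible": action.reversible,
            "approved": approved,
        })
        if not approved:
            return None  # blocked: nothing irreversible happened
        return run(action)
```

&lt;p&gt;In production the log would go to durable, append-only storage rather than an in-memory list.&lt;/p&gt;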

&lt;h3&gt;
  
  
  3. The integration tax nobody demos
&lt;/h3&gt;

&lt;p&gt;The exciting part is reasoning. The expensive part is plumbing — connecting to the practice-management system, the payment processor, the ledger, the CRM, the half-documented internal API from 2014.&lt;/p&gt;

&lt;p&gt;Most pilots stall here. Not because the agent can't &lt;em&gt;think&lt;/em&gt;, but because it can't reliably &lt;em&gt;act&lt;/em&gt; in the messy systems a real business runs on. Standardization like &lt;a href="https://umesh-malik.com/blog/how-to-build-mcp-server" rel="noopener noreferrer"&gt;the Model Context Protocol made this tractable&lt;/a&gt; — it did not make it trivial. Budget for the integration tax or it will quietly eat your timeline.&lt;/p&gt;

&lt;h3&gt;
  
  
  4. No evals, no production
&lt;/h3&gt;

&lt;p&gt;If you can't measure whether the agent did the job, you cannot ship it. Yet most teams still test by vibes: try a few prompts, it looks good, ship it.&lt;/p&gt;

&lt;p&gt;Production needs an eval harness — a labeled set of real tasks, an automated grader, and a single number you can watch move as you change things. This is the same discipline behind &lt;a href="https://umesh-malik.com/blog/spec-driven-development-ai-agents-addy-osmani" rel="noopener noreferrer"&gt;spec-driven development&lt;/a&gt;: write down what "done" means before you trust a machine to do it. No harness, no honest answer to "is it good enough yet?"&lt;/p&gt;
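&lt;p&gt;The core of such a harness fits in a few lines. Here is a minimal Python sketch: run the agent over labeled cases, grade each output, and return the pass rate plus the failures to read. &lt;code&gt;EvalCase&lt;/code&gt;, &lt;code&gt;run_evals&lt;/code&gt;, and the exact-match grader are illustrative assumptions; real graders are often rubric- or model-based.&lt;/p&gt;

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class EvalCase:
    task: str
    expected: str


def exact_match(output: str, expected: str) -> bool:
    # Simplest possible grader; swap in a rubric or model-based grader as needed.
    return output.strip().lower() == expected.strip().lower()


def run_evals(agent: Callable[[str], str], cases: list[EvalCase],
              grader: Callable[[str, str], bool] = exact_match) -> tuple[float, list[EvalCase]]:
    """Return (pass rate, failed cases) for the agent over the labeled set."""
    failures = [case for case in cases if not grader(agent(case.task), case.expected)]
    pass_rate = 1 - len(failures) / len(cases) if cases else 0.0
    return pass_rate, failures
```

&lt;p&gt;Run it on every prompt, model, or tool change; a drop in the pass rate is a regression you caught before your users did.&lt;/p&gt;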

&lt;h3&gt;
  
  
  5. Cost and latency at scale
&lt;/h3&gt;

&lt;p&gt;A run that costs $0.40 and takes 90 seconds is delightful in a demo and brutal at 50,000 runs a day. The unit economics of the agent loop — tokens, retries, tool round-trips — decide whether the pilot survives contact with real volume.&lt;/p&gt;
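&lt;p&gt;Those demo numbers are worth multiplying out. Here is a quick Python check using only the figures above: $0.40 and 90 seconds per run, 50,000 runs a day.&lt;/p&gt;

```python
COST_PER_RUN_USD = 0.40
SECONDS_PER_RUN = 90
RUNS_PER_DAY = 50_000

daily_cost = COST_PER_RUN_USD * RUNS_PER_DAY
yearly_cost = daily_cost * 365
agent_hours_per_day = SECONDS_PER_RUN * RUNS_PER_DAY / 3600

print(f"${daily_cost:,.0f}/day, ${yearly_cost:,.0f}/year, {agent_hours_per_day:,.0f} agent-hours/day")
# $20,000/day, $7,300,000/year, 1,250 agent-hours/day
```

&lt;p&gt;Spread evenly over 24 hours, 1,250 agent-hours a day also means roughly 52 runs in flight at any moment, before a single retry.&lt;/p&gt;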

&lt;blockquote&gt;
&lt;p&gt;💡 &lt;strong&gt;Key insight:&lt;/strong&gt; The teams that ship treat reliability as an engineering budget to spend, not a model property to wait for.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  What the 23% Do Differently
&lt;/h2&gt;

&lt;p&gt;The survivors are almost boring about it. They don't chase the most autonomous agent they can build — they build the most &lt;em&gt;bounded&lt;/em&gt; agent that solves a real problem, then earn autonomy from there.&lt;/p&gt;

&lt;p&gt;Notice what's missing: "use a bigger model." Model choice matters, but it's table stakes. The differentiator is operating discipline.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Autonomy Spectrum: It's a Dial, Not a Switch
&lt;/h2&gt;

&lt;p&gt;The biggest framing mistake teams make is treating autonomy as binary — either the human does it or the agent does. In reality it's a spectrum, and the credible 2026 deployments cluster in the middle.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2F3cedxuplz4dyb5zy23vs.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2F3cedxuplz4dyb5zy23vs.png" alt="Autonomy spectrum from L0 assist to L4 full autonomy, with most credible 2026 business deployments sitting at L2–L3 supervised autonomy" width="800" height="420"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;L4 makes the headlines and the funding decks. L2 and L3 make the money. A supervised agent that handles 90% of cases autonomously and escalates the weird 10% to a human is worth far more than a fully autonomous agent that's right 70% of the time and unaccountable for the other 30%. Earn your way up the ladder; don't start at the top.&lt;/p&gt;
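&lt;p&gt;In code, supervised autonomy often comes down to a routing decision. A minimal sketch, assuming the agent (or a separate verifier) emits a confidence score and every proposed action is tagged as reversible or not; the field names and the 0.9 threshold are placeholders you'd tune against your eval data, not a standard.&lt;/p&gt;

```typescript
// Supervised-autonomy router (L2/L3 on the spectrum). Names and the default
// threshold are illustrative assumptions, not a standard API.
interface ProposedAction {
  kind: string;        // e.g. 'reconcile_payment'
  confidence: number;  // 0..1, from the agent or a separate verifier
  reversible: boolean; // can a human cheaply undo it?
}

type Route = 'auto' | 'human_review';

function route(action: ProposedAction, threshold = 0.9): Route {
  // Irreversible steps always go to a person, no matter how confident the model is.
  if (!action.reversible) return 'human_review';
  // Reversible and confident: act autonomously. Otherwise escalate.
  return action.confidence >= threshold ? 'auto' : 'human_review';
}
```

&lt;p&gt;Moving up the ladder then means adjusting the threshold or reclassifying actions as the numbers justify it, rather than flipping a switch.&lt;/p&gt;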

&lt;h2&gt;
  
  
  The Mistakes That Keep Teams Stuck
&lt;/h2&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Avoid these five traps&lt;/strong&gt;&lt;br&gt;
The same anti-patterns show up in almost every stalled agent project. If you recognize more than one, that's your roadmap.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;The "run my whole company" fantasy.&lt;/strong&gt; Broad, open-ended scope is undemoable and unshippable. Narrow until the task is boring, then ship the boring version.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Demo-driven development.&lt;/strong&gt; Optimizing for the three-step happy path that looks great on stage and ignores the long tail that breaks in production.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Over-permissioned agents.&lt;/strong&gt; Handing the agent god-mode credentials "to move fast." You're one prompt injection away from regret. Scope everything.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Skipping evals.&lt;/strong&gt; Without a number, "good enough" is a feeling, and feelings don't survive a board meeting after the agent fails publicly.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Ignoring the integration tax.&lt;/strong&gt; Treating the messy back-office plumbing as an afterthought, then discovering it &lt;em&gt;is&lt;/em&gt; the project.&lt;/li&gt;
&lt;/ul&gt;
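&lt;p&gt;"Scope everything" can start as a deny-by-default allowlist that is checked before every tool call. A minimal sketch; the agent and tool names are hypothetical.&lt;/p&gt;

```typescript
// Deny-by-default tool allowlist. Agent and tool names are hypothetical.
const AGENT_SCOPES: { [agent: string]: string[] } = {
  'billing-reconciler': ['read_invoice', 'read_payment', 'draft_adjustment'],
};

// Call before executing any tool the agent requests.
function authorize(agent: string, tool: string): void {
  const allowed = AGENT_SCOPES[agent] ?? []; // unknown agents get no tools
  if (!allowed.includes(tool)) {
    throw new Error(`Agent "${agent}" is not allowed to call "${tool}"`);
  }
}
```

&lt;p&gt;Note that this agent can draft an adjustment but not post one; the irreversible step stays with a human.&lt;/p&gt;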

&lt;h2&gt;
  
  
  Case Study: How Lassie Actually Ships Autonomy
&lt;/h2&gt;

&lt;p&gt;Lassie is a useful case study precisely because it isn't trying to do everything. It picked one vertical with a brutal admin burden and went deep.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2Fw7w48p2u6633e9d913lh.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2Fw7w48p2u6633e9d913lh.png" alt="Case study diagram of Lassie's autonomous agents handling medical-practice back office: payment reconciliation, appeals, reporting — reclaiming 250,000 hours a year" width="800" height="420"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The lesson isn't "build a Lassie." It's that their design choices are the production playbook in disguise: a narrow vertical (bounded scope), high-volume repetitive tasks (testable reliability), and a workflow with clear success criteria (eval-friendly). They didn't win by being more autonomous than everyone else. They won by being autonomous about the &lt;em&gt;right, small thing&lt;/em&gt; — and being able to prove it worked.&lt;/p&gt;

&lt;h2&gt;
  
  
  Is Your Agent Actually Production-Ready?
&lt;/h2&gt;

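&lt;p&gt;Before you let an agent touch anything a customer or a regulator will see, check it against the production criticals: a bounded scope, a human on the irreversible steps, scoped permissions, an eval harness with a number you trust, and a shadow-mode run that earned that trust. If you can't tick all of them, you have a demo, not a deployment.&lt;/p&gt;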
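&lt;p&gt;If you want the go/no-go call to be mechanical rather than a feeling, the criteria can live in code. This is a hypothetical gate built from the disciplines this article argues for; the field names, the 0.95 pass-rate bar and the 500-run shadow minimum are placeholder assumptions, not recommendations.&lt;/p&gt;

```typescript
// Hypothetical production-readiness gate. Field names and default bars are
// placeholder assumptions; set your own from eval and shadow-mode data.
interface ReadinessReport {
  scopeIsNarrow: boolean;                  // one bounded, high-volume task
  humanApprovesIrreversibleSteps: boolean; // a person on the dangerous steps
  permissionsScoped: boolean;              // least-privilege tool access
  evalPassRate: number | null;             // null = no eval harness yet
  shadowModeRunsCompleted: number;         // runs observed without acting
}

// Returns the list of gaps: empty means "deployment", anything else means "demo".
function readinessGaps(r: ReadinessReport, minPassRate = 0.95, minShadowRuns = 500): string[] {
  const gaps: string[] = [];
  if (!r.scopeIsNarrow) gaps.push('scope is not bounded');
  if (!r.humanApprovesIrreversibleSteps) gaps.push('no human on irreversible steps');
  if (!r.permissionsScoped) gaps.push('permissions are not scoped');
  if (r.evalPassRate === null) {
    gaps.push('no eval harness');
  } else if (minPassRate > r.evalPassRate) {
    gaps.push(`eval pass rate ${r.evalPassRate} is below ${minPassRate}`);
  }
  if (minShadowRuns > r.shadowModeRunsCompleted) gaps.push('not enough shadow-mode runs');
  return gaps;
}
```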

&lt;h2&gt;
  
  
  The Bottom Line
&lt;/h2&gt;

&lt;p&gt;In 2026, building an autonomous AI agent is a weekend project. Building one your business can actually run on is a discipline — and it's a discipline most teams skip on the way to a demo that wins applause and a deployment that never arrives.&lt;/p&gt;

&lt;p&gt;Stop trying to flip the autonomy switch. Pick one narrow, high-volume, boring task. Put a human on the dangerous steps. Scope the permissions. Build the eval harness. Run it in shadow mode until the numbers earn your trust; then, and only then, let it act on its own. That's how the 23% ship while everyone else demos.&lt;/p&gt;

&lt;p&gt;If this was useful, read &lt;a href="https://umesh-malik.com/blog/agentic-ai-enterprise-security-model" rel="noopener noreferrer"&gt;how agentic AI breaks the enterprise security model&lt;/a&gt; next — because the moment your agent can act, security stops being optional. Then learn to &lt;a href="https://umesh-malik.com/blog/how-to-build-mcp-server" rel="noopener noreferrer"&gt;build the integration layer with MCP&lt;/a&gt; and to &lt;a href="https://umesh-malik.com/blog/spec-driven-development-ai-agents-addy-osmani" rel="noopener noreferrer"&gt;pin down "done" with spec-driven development&lt;/a&gt;.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Written for &lt;a href="https://umesh-malik.com" rel="noopener noreferrer"&gt;umesh-malik.com&lt;/a&gt; — no-fluff technical writing on AI, Web Dev, and Engineering.&lt;/em&gt;&lt;/p&gt;





&lt;p&gt;&lt;em&gt;Originally published at &lt;a href="https://umesh-malik.com/blog/autonomous-ai-agents-production-gap-2026" rel="noopener noreferrer"&gt;umesh-malik.com&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Keep reading on umesh-malik.com:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://umesh-malik.com/blog/how-to-build-mcp-server" rel="noopener noreferrer"&gt;How to Build a Production MCP Server (I Added One to My Site)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://umesh-malik.com/blog/rag-vs-fine-tuning-llms-2026" rel="noopener noreferrer"&gt;RAG vs Fine-Tuning for LLMs in 2026: A Production Decision Framework With Real Tradeoffs&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://umesh-malik.com/blog/agentic-browsing-pagespeed-ai-ready" rel="noopener noreferrer"&gt;Agentic Browsing in PageSpeed Insights: How to Make Your Website AI-Ready (2026)&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>aiagents</category>
      <category>agenticai</category>
      <category>llmengineering</category>
      <category>aiinproduction</category>
    </item>
    <item>
      <title>How to Write a CLAUDE.md That Actually Helps</title>
      <dc:creator>Umesh Malik</dc:creator>
      <pubDate>Fri, 12 Jun 2026 20:24:05 +0000</pubDate>
      <link>https://dev.to/umesh_malik/how-to-write-a-claudemd-that-actually-helps-2o3j</link>
      <guid>https://dev.to/umesh_malik/how-to-write-a-claudemd-that-actually-helps-2o3j</guid>
      <description>&lt;p&gt;Most CLAUDE.md files are useless in one of two ways: they're empty, or they're bloated with things the agent could figure out in five seconds. Both waste the one thing the file exists to spend well — the agent's attention at the start of every session.&lt;/p&gt;

&lt;p&gt;A good CLAUDE.md isn't documentation. It's a &lt;strong&gt;briefing for a senior engineer joining your team today&lt;/strong&gt; who happens to read fast and forget nothing. You don't hand that person a file tree. You tell them how to run the thing, how the pieces fit, the conventions that aren't obvious, and the traps that already bit you.&lt;/p&gt;

&lt;p&gt;I maintain CLAUDE.md files across my projects, and the difference between a good one and a bad one is the difference between an agent that moves like a teammate and one that re-derives your architecture every session. Here's what actually works.&lt;/p&gt;

&lt;h2&gt;
  
  
  TL;DR
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;CLAUDE.md is a context file Claude Code reads automatically&lt;/strong&gt; — treat it as a briefing, not documentation.&lt;/li&gt;
&lt;li&gt;Include the &lt;strong&gt;non-obvious&lt;/strong&gt;: commands, big-picture architecture, cross-cutting conventions, and gotchas.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Leave out&lt;/strong&gt; what the agent discovers instantly (file trees) or what rots (generic best-practice filler).&lt;/li&gt;
&lt;li&gt;Use a &lt;strong&gt;root file for the big picture&lt;/strong&gt; and &lt;strong&gt;nested files&lt;/strong&gt; for subproject-specific rules.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Keep it alive.&lt;/strong&gt; A CLAUDE.md that drifts from reality is worse than none — it actively misleads.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  What a CLAUDE.md actually is
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;CLAUDE.md is a Markdown file that Claude Code loads as project context at the start of a session.&lt;/strong&gt; Whatever you put in it becomes part of what the agent knows before it reads a single line of your code. That's the whole mechanism — and it's why the file is so easy to get wrong. Anything you write is "free" knowledge the agent starts with; anything you omit, it has to rediscover (and sometimes guess at) every time.&lt;/p&gt;

&lt;p&gt;So the real question isn't "what could I document?" It's &lt;strong&gt;"what does the agent most need to know that it can't quickly find out itself?"&lt;/strong&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;💡 &lt;strong&gt;Key insight&lt;/strong&gt;: Optimize for the agent's first five minutes. What would a sharp new hire need to be productive — and what would they figure out on their own without being told?&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  What actually belongs in it
&lt;/h2&gt;

&lt;p&gt;Four things earn their place. Almost nothing else does.&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Commands
&lt;/h3&gt;

&lt;p&gt;How to build, test, lint, run, and deploy — including the non-obvious incantations. If running a single test needs a specific flag, if dev mode needs two terminals, if there's a pre-commit gate, that's gold. The agent will otherwise guess, and guess wrong.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="gu"&gt;## Commands&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; &lt;span class="sb"&gt;`pnpm dev`&lt;/span&gt; — main site on :5173
&lt;span class="p"&gt;-&lt;/span&gt; &lt;span class="sb"&gt;`pnpm check`&lt;/span&gt; — typecheck; MUST pass before commit
&lt;span class="p"&gt;-&lt;/span&gt; &lt;span class="sb"&gt;`pnpm build`&lt;/span&gt; — proves the static prerender works
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  2. The big-picture architecture
&lt;/h3&gt;

&lt;p&gt;Not every file — the &lt;em&gt;shape&lt;/em&gt;. How the major pieces fit, what talks to what, where the boundaries are. The things that require reading five files to understand. A short prose map or a simple diagram here saves the agent (and you) enormous time.&lt;/p&gt;

&lt;p&gt;This is where "the analytics API is the only server-side code" or "sub-apps are built separately and copied in at build time" belongs — facts that aren't visible from any single file.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Cross-cutting conventions
&lt;/h3&gt;

&lt;p&gt;The rules that span the codebase and that the agent would otherwise violate: "everything ships static, never introduce SSR," "use runes only, no legacy reactive syntax," "canonical domain is X, use the central config." State them as rules, with the &lt;em&gt;why&lt;/em&gt; when it isn't obvious.&lt;/p&gt;

&lt;h3&gt;
  
  
  4. Gotchas and hard-won lessons
&lt;/h3&gt;

&lt;p&gt;The traps. "This build step fails silently if X." "Don't edit the generated covers by hand." "Running git add from inside the subdir breaks paths." These are the highest-value lines in the file because they're the ones nobody could infer.&lt;/p&gt;

&lt;h2&gt;
  
  
  What to leave out
&lt;/h2&gt;

&lt;p&gt;Every line that doesn't earn its place dilutes the ones that do. Cut:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Exhaustive file trees and component lists.&lt;/strong&gt; The agent can run &lt;code&gt;ls&lt;/code&gt; and &lt;code&gt;grep&lt;/code&gt; faster than you can maintain a manifest. Describe structure only where it's non-obvious.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Generic best practices.&lt;/strong&gt; "Write unit tests." "Handle errors gracefully." "Don't commit secrets." The agent already knows. These read as noise and train it to skim.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Restating the code.&lt;/strong&gt; If a function's behavior is clear from its name and body, don't narrate it in CLAUDE.md.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Anything that rots.&lt;/strong&gt; Version numbers, line counts, "currently we're working on X" — unless you'll actually keep them current. Stale instructions are worse than missing ones because the agent &lt;em&gt;trusts&lt;/em&gt; them.&lt;/li&gt;
&lt;/ul&gt;

&lt;blockquote&gt;
&lt;p&gt;💡 &lt;strong&gt;Key insight&lt;/strong&gt;: A CLAUDE.md that's wrong is worse than one that's empty. An empty file makes the agent investigate; a wrong file makes it confidently do the wrong thing.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  A structure that works
&lt;/h2&gt;

&lt;p&gt;You don't need a rigid template, but this shape covers the essentials without bloat:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="gh"&gt;# CLAUDE.md&lt;/span&gt;

&lt;span class="gu"&gt;## What This Repo Is&lt;/span&gt;
[One paragraph: what it is, the stack, how it's organized.]

&lt;span class="gu"&gt;## Architecture / How the Pieces Fit&lt;/span&gt;
[The big picture. A diagram or short prose map. Boundaries and data flow.]

&lt;span class="gu"&gt;## Common Commands&lt;/span&gt;
[Build, test, run, deploy — including the non-obvious ones.]

&lt;span class="gu"&gt;## Conventions&lt;/span&gt;
[Cross-cutting rules the agent must follow, with the why.]

&lt;span class="gu"&gt;## Gotchas&lt;/span&gt;
[Traps, footguns, things that bit you before.]

&lt;span class="gu"&gt;## Keeping This File Up To Date&lt;/span&gt;
[A note that this is a living map, updated as part of normal work.]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For a monorepo, go further: a &lt;strong&gt;root CLAUDE.md&lt;/strong&gt; for the big picture, plus a &lt;strong&gt;nested CLAUDE.md inside each subproject&lt;/strong&gt; for rules specific to it. Claude Code loads the CLAUDE.md files in the directory you start it from and in its parent directories, and pulls in a nested file when it works with files in that subdirectory, so subproject rules stay close to the code they govern. There's also a user-level &lt;code&gt;~/.claude/CLAUDE.md&lt;/code&gt; for personal preferences that follow you across every project.&lt;/p&gt;

&lt;h2&gt;
  
  
  Common mistakes
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;The kitchen sink.&lt;/strong&gt; Dumping everything turns the signal-to-noise ratio against you. Be ruthless.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Write-once, never-update.&lt;/strong&gt; The fastest way to make a CLAUDE.md harmful is to let it drift. Update it &lt;em&gt;in the same change&lt;/em&gt; that alters what it describes.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Documenting the obvious.&lt;/strong&gt; If the agent learns it in one &lt;code&gt;grep&lt;/code&gt;, it doesn't belong.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;No commands.&lt;/strong&gt; The single most useful section, and the one people most often skip.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Vague rules.&lt;/strong&gt; "Follow good practices" tells the agent nothing. "Never use &lt;code&gt;transition-all&lt;/code&gt;; always name the exact property" tells it exactly what to do.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Best practices
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Bootstrap with &lt;code&gt;/init&lt;/code&gt;, then trim.&lt;/strong&gt; Claude Code's &lt;code&gt;/init&lt;/code&gt; generates a first draft from your repo. Treat it as a starting point and cut it down to the non-obvious essentials.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Write rules, not essays.&lt;/strong&gt; Short, imperative, specific. "Do X. Never Y. Because Z."&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Put rules near the code.&lt;/strong&gt; Root file for the big picture; nested files for subproject specifics.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Make it a living document.&lt;/strong&gt; Update it as part of the change that affects it — not as a someday chore.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Re-read it as the agent would.&lt;/strong&gt; If a line wouldn't change what the agent does, or it could learn it instantly, delete it.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Lead with commands and gotchas.&lt;/strong&gt; They're the highest-leverage content in the file.&lt;/li&gt;
&lt;/ol&gt;
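&lt;p&gt;One way to make "living document" enforceable is a small CI check that fails when CLAUDE.md mentions a command the repo no longer has. A minimal sketch, assuming a pnpm project whose commands live in &lt;code&gt;package.json&lt;/code&gt; scripts. The script, its &lt;code&gt;findMissingScripts&lt;/code&gt; helper and its list of pnpm built-ins are illustrative and not part of Claude Code.&lt;/p&gt;

```typescript
import { readFileSync } from 'node:fs';

// pnpm built-ins that aren't package.json scripts (partial list, an assumption).
const PNPM_BUILTINS = ['install', 'i', 'add', 'remove', 'update', 'exec', 'dlx'];

// Return script names that CLAUDE.md mentions in backticks but package.json doesn't define.
function findMissingScripts(claudeMd: string, scripts: { [name: string]: string }): string[] {
  const missing: string[] = [];
  // Matches `pnpm dev`, `pnpm run lint` and `pnpm test:unit` style mentions.
  for (const match of claudeMd.matchAll(/`pnpm (?:run )?([\w:-]+)/g)) {
    const name = match[1];
    const known = name in scripts || PNPM_BUILTINS.includes(name);
    if (!known) {
      if (!missing.includes(name)) missing.push(name);
    }
  }
  return missing;
}

// Reads CLAUDE.md and package.json from the current directory; exits non-zero on drift.
function main(): void {
  const claudeMd = readFileSync('CLAUDE.md', 'utf8');
  const pkg = JSON.parse(readFileSync('package.json', 'utf8'));
  const missing = findMissingScripts(claudeMd, pkg.scripts ?? {});
  if (missing.length > 0) {
    console.error(`CLAUDE.md references unknown scripts: ${missing.join(', ')}`);
    process.exit(1);
  }
  console.log('CLAUDE.md commands match package.json');
}

if (process.argv.includes('--check')) main();
```

&lt;p&gt;Run it with &lt;code&gt;--check&lt;/code&gt; from the repo root in CI. It only catches one kind of rot, but a stale command is exactly the kind of wrong instruction the agent will trust.&lt;/p&gt;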

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;A CLAUDE.md is the cheapest leverage you have over how well an AI agent works in your codebase — and almost everyone either skips it or stuffs it. Write it like a briefing for a sharp teammate: the commands, the architecture that isn't obvious, the conventions that matter, the traps that bite. Cut everything they'd discover on their own. Then keep it honest.&lt;/p&gt;

&lt;p&gt;Do that, and the agent stops re-learning your project every session and starts acting like it already knows it.&lt;/p&gt;

&lt;p&gt;Going deeper on agentic coding? See &lt;a href="https://umesh-malik.com/topics/claude-code" rel="noopener noreferrer"&gt;Claude Code — Guides &amp;amp; Deep Dives&lt;/a&gt; and &lt;a href="https://umesh-malik.com/topics/ai-coding-agents" rel="noopener noreferrer"&gt;AI Coding Agents — Agentic AI for Developers&lt;/a&gt;, or the &lt;a href="https://docs.claude.com/en/docs/claude-code" rel="noopener noreferrer"&gt;official Claude Code documentation&lt;/a&gt; for the full feature set.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Explore more:&lt;/strong&gt; &lt;a href="https://umesh-malik.com/topics/claude-code" rel="noopener noreferrer"&gt;Claude Code&lt;/a&gt; · &lt;a href="https://umesh-malik.com/topics/ai-coding-agents" rel="noopener noreferrer"&gt;AI Coding Agents&lt;/a&gt; · &lt;a href="https://umesh-malik.com/topics/llm-engineering" rel="noopener noreferrer"&gt;LLM Engineering&lt;/a&gt;&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Originally published at &lt;a href="https://umesh-malik.com/blog/how-to-write-claude-md" rel="noopener noreferrer"&gt;umesh-malik.com&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Keep reading on umesh-malik.com:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://umesh-malik.com/blog/cursor-vs-claude-code-vs-copilot" rel="noopener noreferrer"&gt;Cursor vs Claude Code vs Copilot (2026): Which AI Coding Tool, for What&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://umesh-malik.com/blog/claude-fable-5-streaming-microservice-one-day" rel="noopener noreferrer"&gt;How I Built a Full Audio/Video Streaming Microservice in One Day with Claude Fable 5 Auto Mode&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://umesh-malik.com/blog/anthropic-code-review-claude-code-guide" rel="noopener noreferrer"&gt;Claude Code Review by Anthropic: Multi-Agent PR Reviews, Pricing, Setup Guide, and Limits (2026)&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>claudecode</category>
      <category>claudemd</category>
      <category>aicodingagents</category>
      <category>developertooling</category>
    </item>
    <item>
      <title>How to Build a Production MCP Server (I Added One to My Site)</title>
      <dc:creator>Umesh Malik</dc:creator>
      <pubDate>Fri, 12 Jun 2026 20:23:33 +0000</pubDate>
      <link>https://dev.to/umesh_malik/how-to-build-a-production-mcp-server-i-added-one-to-my-site-2mbh</link>
      <guid>https://dev.to/umesh_malik/how-to-build-a-production-mcp-server-i-added-one-to-my-site-2mbh</guid>
      <description>&lt;p&gt;Most sites are built for humans to read and for crawlers to scrape. But the agents showing up now — Claude, ChatGPT, Cursor — don't want your HTML. They want to &lt;em&gt;call&lt;/em&gt; you. Parsing a page to extract three facts is wasteful and fragile; calling a typed tool that returns those three facts is neither.&lt;/p&gt;

&lt;p&gt;That's what the &lt;strong&gt;Model Context Protocol (MCP)&lt;/strong&gt; is for. And the fastest way to understand it is to build one. So I added a production MCP server to this site — it lets an agent search my posts, fetch one as clean Markdown, list my topic hubs, and read my profile — and this is exactly how I did it, with the real code.&lt;/p&gt;

&lt;p&gt;No framework, no database, about 300 lines on a Cloudflare Worker.&lt;/p&gt;

&lt;h2&gt;
  
  
  TL;DR
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;An &lt;strong&gt;MCP server exposes tools&lt;/strong&gt; (functions) that AI agents call over JSON-RPC 2.0 — turning your site from &lt;em&gt;agent-readable&lt;/em&gt; into &lt;em&gt;agent-callable&lt;/em&gt;.&lt;/li&gt;
&lt;li&gt;Use the &lt;strong&gt;Streamable HTTP transport&lt;/strong&gt;: one endpoint, &lt;code&gt;POST /mcp&lt;/code&gt;, that speaks JSON-RPC. A &lt;strong&gt;stateless&lt;/strong&gt; server that returns plain JSON is fully spec-compliant and the easiest to run.&lt;/li&gt;
&lt;li&gt;You need exactly four method handlers: &lt;code&gt;initialize&lt;/code&gt;, &lt;code&gt;tools/list&lt;/code&gt;, &lt;code&gt;tools/call&lt;/code&gt;, and &lt;code&gt;ping&lt;/code&gt; — plus a no-op for notifications.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;You don't need new infrastructure.&lt;/strong&gt; Back your tools with assets you already publish (a JSON feed, your Markdown pages). One source of truth, nothing to sync.&lt;/li&gt;
&lt;li&gt;Make it discoverable with a manifest at a well-known URL, an entry in your API catalog, and a &lt;code&gt;Link&lt;/code&gt; header.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  What is an MCP server?
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;An MCP server is a small service that exposes tools an AI agent can invoke over a standard protocol.&lt;/strong&gt; The protocol is JSON-RPC 2.0; the "tools" are named functions with a JSON-Schema for their arguments. When an agent connects, it asks the server "what can you do?" (&lt;code&gt;tools/list&lt;/code&gt;), gets back a list of tools, then calls them (&lt;code&gt;tools/call&lt;/code&gt;) and receives structured results.&lt;/p&gt;

&lt;p&gt;Think of it as a typed API designed specifically for language models. Where a REST API is built for your frontend, an MCP server is built for an agent's reasoning loop: the descriptions are written for a model to read, the inputs are schema-validated, and errors are reported in a way the model can recover from.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;💡 &lt;strong&gt;Key insight&lt;/strong&gt;: REST is for your app. MCP is for the agent. The difference isn't the wire format — it's that every field is written to be understood by a model, not a developer.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  Why build one for your own site
&lt;/h2&gt;

&lt;p&gt;Search and chat are moving inside agents. When someone asks Claude or ChatGPT about a topic you've written about, the model is far more likely to use you well if it can call a &lt;code&gt;search_posts&lt;/code&gt; tool than if it has to guess your URL structure and scrape rendered HTML.&lt;/p&gt;

&lt;p&gt;Three concrete wins:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Precision over scraping.&lt;/strong&gt; A tool returns exactly the fields the agent needs — title, URL, summary — with no markup noise.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;You control the surface.&lt;/strong&gt; You decide what's callable and what each tool returns. That's a far stronger signal than hoping a crawler parses your page correctly.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;It compounds with the rest of your AI-readiness.&lt;/strong&gt; An MCP server sits naturally alongside &lt;code&gt;llms.txt&lt;/code&gt;, structured data, and an API catalog as part of making your site first-class for agents.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;It will not, on its own, make every agent "pick" your site — that still depends on relevance and authority. But it removes every technical reason an agent &lt;em&gt;couldn't&lt;/em&gt; use you well.&lt;/p&gt;

&lt;h2&gt;
  
  
  What we're building
&lt;/h2&gt;

&lt;p&gt;Four read-only tools:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Tool&lt;/th&gt;
&lt;th&gt;What it does&lt;/th&gt;
&lt;th&gt;Backed by&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;search_posts&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Ranked search over blog posts&lt;/td&gt;
&lt;td&gt;a JSON feed I already publish&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;get_post&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Returns one post as clean Markdown&lt;/td&gt;
&lt;td&gt;prerendered &lt;code&gt;/blog/&amp;lt;slug&amp;gt;.md&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;list_topics&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Lists curated topic hubs&lt;/td&gt;
&lt;td&gt;a small constant&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;get_profile&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Returns the author profile&lt;/td&gt;
&lt;td&gt;my existing &lt;code&gt;llms.txt&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The whole thing runs on a &lt;strong&gt;Cloudflare Worker&lt;/strong&gt; as a &lt;strong&gt;stateless&lt;/strong&gt; JSON-RPC handler. Stateless matters: with no session to track, every request is self-contained, which is the simplest possible thing to host and scale.&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 1 — The transport
&lt;/h2&gt;

&lt;p&gt;MCP defines two transports. For local tools you use stdio; for a &lt;strong&gt;remote&lt;/strong&gt; server you use &lt;strong&gt;Streamable HTTP&lt;/strong&gt; — a single endpoint that accepts JSON-RPC messages over &lt;code&gt;POST&lt;/code&gt;. The spec lets the server reply with either an SSE stream or a plain JSON body. A read-only server has no streaming notifications to push, so &lt;strong&gt;plain JSON is the right call&lt;/strong&gt; and the simplest.&lt;/p&gt;

&lt;p&gt;Every MCP message is JSON-RPC 2.0. Two tiny helpers cover all our responses:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;rpcResult&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;unknown&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;result&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;unknown&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;jsonrpc&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;2.0&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;result&lt;/span&gt; &lt;span class="p"&gt;};&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;rpcError&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;unknown&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;code&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;number&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;message&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;string&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;jsonrpc&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;2.0&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;error&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;code&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;message&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="p"&gt;};&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The endpoint parses the POST body, routes on &lt;code&gt;method&lt;/code&gt;, and returns the JSON-RPC response. Requests carry an &lt;code&gt;id&lt;/code&gt;; &lt;strong&gt;notifications don't&lt;/strong&gt; — and a notification gets no response body, just a &lt;code&gt;202 Accepted&lt;/code&gt;.&lt;/p&gt;
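&lt;p&gt;Here's a minimal sketch of that routing, assuming a Fetch-API runtime such as a Cloudflare Worker. It repeats the two helpers so it stands alone. &lt;code&gt;dispatch&lt;/code&gt;, &lt;code&gt;handleMcp&lt;/code&gt;, the single demo tool and the list of supported protocol versions are illustrative assumptions rather than the site's actual code.&lt;/p&gt;

```typescript
// Minimal stateless MCP endpoint over Streamable HTTP (plain JSON responses).
// SUPPORTED_VERSIONS, the server name and the demo tool are placeholders.
function rpcResult(id: unknown, result: unknown) {
  return { jsonrpc: '2.0', id, result };
}
function rpcError(id: unknown, code: number, message: string) {
  return { jsonrpc: '2.0', id, error: { code, message } };
}

const SUPPORTED_VERSIONS = ['2025-06-18', '2025-03-26']; // newest first (assumption)

// One demo tool; replace with your real TOOLS array and handlers.
const TOOLS = [
  { name: 'list_topics', description: 'Lists curated topic hubs.', inputSchema: { type: 'object', properties: {} } },
];
const HANDLERS = new Map([
  ['list_topics', (_args: any) => JSON.stringify(['claude-code', 'ai-coding-agents', 'llm-engineering'])],
]);

// Routes one JSON-RPC message. Returns null for notifications, which get no body.
function dispatch(msg: any) {
  if (typeof msg !== 'object' || msg === null || Array.isArray(msg)) {
    return rpcError(null, -32600, 'Invalid Request');
  }
  if (!('id' in msg)) return null; // no id: it's a notification
  const { id, method, params } = msg;
  switch (method) {
    case 'initialize': {
      // Echo the client's version if supported; otherwise offer our newest one.
      const requested = params?.protocolVersion;
      const protocolVersion = SUPPORTED_VERSIONS.includes(requested) ? requested : SUPPORTED_VERSIONS[0];
      return rpcResult(id, {
        protocolVersion,
        capabilities: { tools: {} },
        serverInfo: { name: 'example-site', version: '1.0.0' },
      });
    }
    case 'ping':
      return rpcResult(id, {});
    case 'tools/list':
      return rpcResult(id, { tools: TOOLS });
    case 'tools/call': {
      const handler = HANDLERS.get(params?.name);
      if (!handler) return rpcError(id, -32602, `Unknown tool: ${params?.name}`);
      try {
        const text = handler(params.arguments ?? {});
        return rpcResult(id, { content: [{ type: 'text', text }] });
      } catch (err) {
        // Tool failures go in the result (isError) so the model can read them and recover.
        return rpcResult(id, { content: [{ type: 'text', text: String(err) }], isError: true });
      }
    }
    default:
      return rpcError(id, -32601, `Method not found: ${method}`);
  }
}

function json(body: unknown, status: number) {
  return new Response(JSON.stringify(body), { status, headers: { 'Content-Type': 'application/json' } });
}

// HTTP wrapper, e.g. called from a Worker's fetch handler for POST /mcp.
async function handleMcp(request: Request) {
  if (request.method !== 'POST') {
    return new Response('Method Not Allowed', { status: 405, headers: { Allow: 'POST' } });
  }
  let msg: unknown;
  try {
    msg = await request.json();
  } catch {
    return json(rpcError(null, -32700, 'Parse error'), 400);
  }
  const reply = dispatch(msg);
  return reply === null ? new Response(null, { status: 202 }) : json(reply, 200);
}
```

&lt;p&gt;Keeping the pure &lt;code&gt;dispatch&lt;/code&gt; function separate from the HTTP wrapper makes the protocol logic testable without a network, and because nothing is stored between requests, any instance can answer any call.&lt;/p&gt;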

&lt;h2&gt;
  
  
  Step 2 — Define your tools
&lt;/h2&gt;

&lt;p&gt;A tool is metadata plus an input schema. The &lt;code&gt;description&lt;/code&gt; is not for you — it's the prompt the model reads to decide whether and how to call the tool. Write it like you're briefing a smart colleague who can't see your code:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;TOOLS&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
  &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;search_posts&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;title&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Search blog posts&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;description&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
      &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Full-text search across the blog (titles, summaries, tags). Returns matching &lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt;
      &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;posts with slug, title, URL, summary, tags and publish date. Use for topics &lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt;
      &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;like AI engineering, LLMs, RAG, Claude Code, or web development.&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;inputSchema&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;object&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;properties&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="na"&gt;query&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;string&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;description&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Search terms.&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
        &lt;span class="na"&gt;limit&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;integer&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;description&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Max results (default 10, max 30).&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
      &lt;span class="p"&gt;},&lt;/span&gt;
      &lt;span class="na"&gt;required&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;query&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="c1"&gt;// get_post, list_topics, get_profile ...&lt;/span&gt;
&lt;span class="p"&gt;];&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;blockquote&gt;
&lt;p&gt;💡 &lt;strong&gt;Key insight&lt;/strong&gt;: Tool descriptions are prompt engineering. A vague description means the model calls the wrong tool or skips it. Spell out &lt;em&gt;when&lt;/em&gt; to use it and &lt;em&gt;what it returns&lt;/em&gt;.&lt;/p&gt;
&lt;/blockquote&gt;
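&lt;p&gt;The &lt;code&gt;inputSchema&lt;/code&gt; is already machine-readable, so the server can check arguments against the same metadata the model reads. Below is a minimal sketch. The types and the &lt;code&gt;checkArgs&lt;/code&gt; helper are hypothetical names, not an SDK API, and the sketch only covers flat, primitive-typed schemas like the one above:&lt;/p&gt;

```typescript
// Minimal types for the tool metadata shown above (illustrative, not an SDK API).
interface SchemaProperty {
  type: "string" | "integer" | "number" | "boolean";
  description: string;
}

interface ToolDefinition {
  name: string;
  title?: string;
  description: string;
  inputSchema: {
    type: "object";
    properties: { [name: string]: SchemaProperty };
    required?: string[];
  };
}

// The search_posts definition from above, with its description shortened.
const searchPostsTool: ToolDefinition = {
  name: "search_posts",
  title: "Search blog posts",
  description: "Full-text search across the blog (titles, summaries, tags).",
  inputSchema: {
    type: "object",
    properties: {
      query: { type: "string", description: "Search terms." },
      limit: { type: "integer", description: "Max results (default 10, max 30)." },
    },
    required: ["query"],
  },
};

// Returns human-readable problems; an empty list means the arguments pass.
function checkArgs(tool: ToolDefinition, args: { [name: string]: unknown }): string[] {
  const problems: string[] = [];

  for (const name of tool.inputSchema.required ?? []) {
    if (args[name] === undefined) problems.push(`Missing required argument "${name}".`);
  }

  for (const [name, value] of Object.entries(args)) {
    const prop = tool.inputSchema.properties[name];
    if (prop === undefined) {
      problems.push(`Unknown argument "${name}".`);
      continue;
    }
    // JSON Schema "integer" has no typeof equivalent, so it gets its own check.
    const ok = prop.type === "integer" ? Number.isInteger(value) : typeof value === prop.type;
    if (!ok) problems.push(`Argument "${name}" must be of type ${prop.type}.`);
  }

  return problems;
}
```

&lt;p&gt;In the &lt;code&gt;tools/call&lt;/code&gt; path you could throw &lt;code&gt;new Error(problems.join(' '))&lt;/code&gt; when the list is non-empty, so the model gets the message back as a tool error. Rejecting unknown arguments is stricter than plain JSON Schema, which allows extra properties unless &lt;code&gt;additionalProperties&lt;/code&gt; is &lt;code&gt;false&lt;/code&gt;. The benefit is that it catches a model that sends &lt;code&gt;q&lt;/code&gt; instead of &lt;code&gt;query&lt;/code&gt;.&lt;/p&gt;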

&lt;h2&gt;
  
  
  Step 3 — Back tools with data you already have
&lt;/h2&gt;

&lt;p&gt;This is the part most tutorials overcomplicate. &lt;strong&gt;You don't need a database.&lt;/strong&gt; I back every tool with assets the site already prerenders:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;search_posts&lt;/code&gt; fetches my existing &lt;code&gt;/feed.json&lt;/code&gt; (a JSON Feed of every post) and ranks it.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;get_post&lt;/code&gt; fetches the already-generated &lt;code&gt;/blog/&amp;lt;slug&amp;gt;.md&lt;/code&gt; Markdown variant.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;get_profile&lt;/code&gt; returns my &lt;code&gt;llms.txt&lt;/code&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;On a Cloudflare Worker you reach those via the assets binding, so there's one source of truth and nothing to keep in sync:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;searchPosts&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;assets&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;origin&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;query&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;limit&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;res&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;assets&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fetch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;URL&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;/feed.json&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;origin&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;
  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="nx"&gt;res&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;ok&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;throw&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Post index unavailable&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;items&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;res&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;

  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;terms&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;query&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;toLowerCase&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;split&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sr"&gt;/&lt;/span&gt;&lt;span class="se"&gt;\s&lt;/span&gt;&lt;span class="sr"&gt;+/&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;filter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;Boolean&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;items&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;map&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="nx"&gt;item&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;hay&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;item&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;title&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt; &lt;/span&gt;&lt;span class="p"&gt;${(&lt;/span&gt;&lt;span class="nx"&gt;item&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;tags&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="p"&gt;[]).&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt; &lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)}&lt;/span&gt;&lt;span class="s2"&gt; &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;item&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;summary&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;toLowerCase&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
      &lt;span class="c1"&gt;// weight title hits over tags over summary&lt;/span&gt;
      &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;score&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;terms&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;reduce&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="nx"&gt;s&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;t&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;s&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;item&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;title&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;toLowerCase&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;includes&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;t&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;?&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="nx"&gt;item&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;tags&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="p"&gt;[]).&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt; &lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;toLowerCase&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;includes&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;t&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;?&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;hay&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;includes&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;t&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;?&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
      &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;item&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;score&lt;/span&gt; &lt;span class="p"&gt;};&lt;/span&gt;
    &lt;span class="p"&gt;})&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;filter&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="nx"&gt;x&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;x&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;score&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sort&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="nx"&gt;a&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;b&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nx"&gt;b&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;score&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="nx"&gt;a&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;score&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;slice&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;limit&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;map&lt;/span&gt;&lt;span class="p"&gt;(({&lt;/span&gt; &lt;span class="nx"&gt;item&lt;/span&gt; &lt;span class="p"&gt;})&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;title&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;item&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;title&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;url&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;item&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;url&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;summary&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;item&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;summary&lt;/span&gt; &lt;span class="p"&gt;}));&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
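&lt;p&gt;The schema advertises "default 10, max 30", but &lt;code&gt;searchPosts&lt;/code&gt; passes &lt;code&gt;limit&lt;/code&gt; straight to &lt;code&gt;slice&lt;/code&gt;. An omitted limit (&lt;code&gt;slice(0, undefined)&lt;/code&gt;) or a huge one therefore returns every matching post. The small guard below enforces the advertised numbers. The lower bound of 1 is an added assumption:&lt;/p&gt;

```typescript
// Mirrors the inputSchema text "default 10, max 30"; the floor of 1 is an assumption.
const DEFAULT_LIMIT = 10;
const MAX_LIMIT = 30;

function normalizeLimit(raw: unknown): number {
  // Missing, non-numeric, NaN or Infinity: fall back to the advertised default.
  if (typeof raw !== "number" || !Number.isFinite(raw)) return DEFAULT_LIMIT;
  // Drop any fractional part, then clamp into the range [1, MAX_LIMIT].
  return Math.min(MAX_LIMIT, Math.max(1, Math.floor(raw)));
}
```

&lt;p&gt;The caller would then pass &lt;code&gt;normalizeLimit(args.limit)&lt;/code&gt; instead of the raw value.&lt;/p&gt;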



&lt;p&gt;Always &lt;strong&gt;validate inputs&lt;/strong&gt; before using them. &lt;code&gt;get_post&lt;/code&gt; takes a slug straight from the model, so it gets a strict regex check before it ever touches a path:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;SLUG_RE&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sr"&gt;/^&lt;/span&gt;&lt;span class="se"&gt;[&lt;/span&gt;&lt;span class="sr"&gt;a-z0-9&lt;/span&gt;&lt;span class="se"&gt;][&lt;/span&gt;&lt;span class="sr"&gt;a-z0-9-&lt;/span&gt;&lt;span class="se"&gt;]{0,120}&lt;/span&gt;&lt;span class="sr"&gt;$/&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="nx"&gt;SLUG_RE&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;test&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;slug&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;throw&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;`Invalid slug "&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;slug&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;". Use a slug from search_posts.`&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
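&lt;p&gt;The fragment above assumes &lt;code&gt;slug&lt;/code&gt; is already a string, and that assumption can fail. &lt;code&gt;RegExp.prototype.test&lt;/code&gt; converts its argument to a string, so a missing slug becomes &lt;code&gt;"undefined"&lt;/code&gt;, which matches the pattern. The sketch below checks the type first and only then builds the Markdown path. &lt;code&gt;postMarkdownPath&lt;/code&gt; is a hypothetical name:&lt;/p&gt;

```typescript
// Same pattern as above: lowercase letters, digits and hyphens, no leading hyphen, max 121 chars.
const SLUG_RE = /^[a-z0-9][a-z0-9-]{0,120}$/;

// Hypothetical helper: validate first, build the asset path second.
function postMarkdownPath(slug: unknown): string {
  // The typeof check matters: test(undefined) would test the string "undefined",
  // and test(42) the string "42", both of which match the pattern.
  if (typeof slug !== "string" || !SLUG_RE.test(slug)) {
    throw new Error(`Invalid slug "${String(slug)}". Use a slug from search_posts.`);
  }
  // Slashes and dots cannot pass the regex, so the path cannot escape /blog/.
  return `/blog/${slug}.md`;
}
```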



&lt;h2&gt;
  
  
  Step 4 — Handle the protocol
&lt;/h2&gt;

&lt;p&gt;The router is small. Four real methods, plus notification handling:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;handleRpc&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;msg&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;assets&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;origin&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;method&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;params&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;msg&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;isNotification&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;id&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="kc"&gt;undefined&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="nx"&gt;id&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

  &lt;span class="k"&gt;switch &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;method&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;case&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;initialize&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
      &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;rpcResult&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="na"&gt;protocolVersion&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;2025-06-18&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="na"&gt;capabilities&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;tools&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;listChanged&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
        &lt;span class="na"&gt;serverInfo&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;my-site&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;version&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;1.0.0&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
        &lt;span class="na"&gt;instructions&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;Tools for querying my blog and profile.&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;
      &lt;span class="p"&gt;});&lt;/span&gt;
    &lt;span class="k"&gt;case&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;ping&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
      &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;rpcResult&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{});&lt;/span&gt;
    &lt;span class="k"&gt;case&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;tools/list&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
      &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;rpcResult&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;tools&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;TOOLS&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;
    &lt;span class="k"&gt;case&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;tools/call&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;arguments&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;args&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{}&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;params&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="p"&gt;{};&lt;/span&gt;
      &lt;span class="k"&gt;try&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;text&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;callTool&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;assets&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;origin&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;args&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;rpcResult&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;content&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[{&lt;/span&gt; &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;text&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;text&lt;/span&gt; &lt;span class="p"&gt;}],&lt;/span&gt; &lt;span class="na"&gt;isError&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;
      &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;catch &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;err&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="c1"&gt;// Report tool errors IN-BAND so the model can see and react to them.&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;rpcResult&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;content&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[{&lt;/span&gt; &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;text&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;text&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;err&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;message&lt;/span&gt; &lt;span class="p"&gt;}],&lt;/span&gt; &lt;span class="na"&gt;isError&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt; &lt;span class="p"&gt;});&lt;/span&gt;
      &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="nl"&gt;default&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
      &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;isNotification&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;            &lt;span class="c1"&gt;// ignore unknown notifications&lt;/span&gt;
      &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;rpcError&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;32601&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;`Method not found: &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;method&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Three things people get wrong here, and they all live in this function:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
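&lt;strong&gt;&lt;code&gt;initialize&lt;/code&gt; must return a &lt;code&gt;protocolVersion&lt;/code&gt;&lt;/strong&gt; the client understands and declare your &lt;code&gt;capabilities&lt;/code&gt;. Skip it and the handshake fails before any tool runs. Per the spec, the version should be the one the client requested if you support it, otherwise another version you do support. The router above always answers &lt;code&gt;2025-06-18&lt;/code&gt;.&lt;/li&gt;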
&lt;li&gt;
&lt;strong&gt;Tool failures are not protocol errors.&lt;/strong&gt; A bad slug returns a normal result with &lt;code&gt;isError: true&lt;/code&gt; and a message rather than a JSON-RPC &lt;code&gt;error&lt;/code&gt;, so the model can read the failure and retry. Reserve &lt;code&gt;error&lt;/code&gt; (&lt;code&gt;-32601&lt;/code&gt;, &lt;code&gt;-32700&lt;/code&gt;, etc.) for protocol-level problems such as an unknown method or unparseable JSON.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Notifications get no response.&lt;/strong&gt; If &lt;code&gt;notifications/initialized&lt;/code&gt; arrives, acknowledge with &lt;code&gt;202&lt;/code&gt; and an empty body. Returning a JSON-RPC object for a notification breaks strict clients.&lt;/li&gt;
&lt;/ul&gt;
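&lt;p&gt;The router calls &lt;code&gt;rpcResult&lt;/code&gt; and &lt;code&gt;rpcError&lt;/code&gt; without showing them. The 202 rule also has to live in the HTTP code in front of &lt;code&gt;handleRpc&lt;/code&gt;, because the router signals a notification only by returning &lt;code&gt;null&lt;/code&gt;. Below is a minimal sketch of both, assuming the helpers only build JSON-RPC 2.0 envelopes. &lt;code&gt;routeMcpHttp&lt;/code&gt;, &lt;code&gt;Dispatch&lt;/code&gt; and the 400 status for unparseable bodies are illustrative choices, not part of the post:&lt;/p&gt;

```typescript
// Standard error codes from the JSON-RPC 2.0 specification.
const RPC_ERRORS = {
  PARSE_ERROR: -32700,
  INVALID_REQUEST: -32600,
  METHOD_NOT_FOUND: -32601,
  INVALID_PARAMS: -32602,
  INTERNAL_ERROR: -32603,
} as const;

type RpcId = string | number | null;

// Hypothetical implementations of the helpers handleRpc calls; they only build envelopes.
function rpcResult(id: RpcId | undefined, result: unknown) {
  return { jsonrpc: "2.0", id: id ?? null, result };
}

function rpcError(id: RpcId | undefined, code: number, message: string) {
  // JSON-RPC requires a null id when the request id could not be determined.
  return { jsonrpc: "2.0", id: id ?? null, error: { code, message } };
}

interface HttpReply {
  status: number;
  body: string | null; // a Worker would wrap this as new Response(body, { status, headers })
}

// Stand-in for handleRpc(msg, assets, origin): resolves to a JSON-RPC object, or null.
type Dispatch = (msg: any) => unknown;

async function routeMcpHttp(httpMethod: string, rawBody: string, dispatch: Dispatch) {
  const reply = (status: number, body: string | null): HttpReply => ({ status, body });

  // This stateless sketch offers no SSE stream, so anything but POST is refused.
  if (httpMethod !== "POST") return reply(405, null);

  let msg: any;
  try {
    msg = JSON.parse(rawBody);
  } catch {
    return reply(400, JSON.stringify(rpcError(null, RPC_ERRORS.PARSE_ERROR, "Parse error")));
  }

  // Single message objects only; arrays (batches) and bare values are rejected.
  if (typeof msg !== "object" || msg === null || Array.isArray(msg)) {
    return reply(400, JSON.stringify(rpcError(null, RPC_ERRORS.INVALID_REQUEST, "Invalid Request")));
  }

  const isNotification = msg.id === undefined || msg.id === null;
  const result = await dispatch(msg);

  // Notifications are acknowledged with 202 and an empty body, never a JSON-RPC object.
  if (isNotification || result === null) return reply(202, null);
  return reply(200, JSON.stringify(result));
}
```

&lt;p&gt;A Worker's &lt;code&gt;fetch&lt;/code&gt; handler would call this as &lt;code&gt;routeMcpHttp(request.method, await request.text(), (msg) =&amp;gt; handleRpc(msg, assets, url.origin))&lt;/code&gt;, where &lt;code&gt;assets&lt;/code&gt; is your static-assets binding, and wrap the result in a &lt;code&gt;Response&lt;/code&gt;. A production 405 should also send an &lt;code&gt;Allow: POST&lt;/code&gt; header. Handle CORS &lt;code&gt;OPTIONS&lt;/code&gt; requests before calling the function, because it answers them with 405.&lt;/p&gt;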

&lt;h2&gt;
  
  
  Step 5 — Make it discoverable
&lt;/h2&gt;

&lt;p&gt;A server nobody can find is useless. Advertise it three ways:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;A manifest&lt;/strong&gt; at &lt;code&gt;/.well-known/mcp&lt;/code&gt; — name, endpoint, transport, and the tool list.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;An entry in your API catalog&lt;/strong&gt; (&lt;code&gt;/.well-known/api-catalog&lt;/code&gt;, RFC 9727) pointing at the manifest.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;A &lt;code&gt;Link&lt;/code&gt; header&lt;/strong&gt; on your HTML responses: &lt;code&gt;Link: &amp;lt;/.well-known/mcp&amp;gt;; rel="service-desc"; type="application/json"&lt;/code&gt;.&lt;/li&gt;
&lt;/ol&gt;
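&lt;p&gt;Below is a sketch of the manifest and the catalog entry. Treat the manifest shape as an assumption. No standard fixes the fields of a &lt;code&gt;/.well-known/mcp&lt;/code&gt; manifest, so &lt;code&gt;name&lt;/code&gt;, &lt;code&gt;endpoint&lt;/code&gt;, &lt;code&gt;transport&lt;/code&gt; and &lt;code&gt;tools&lt;/code&gt; simply mirror the list above, and the &lt;code&gt;transport&lt;/code&gt; value is illustrative. The catalog uses the RFC 9264 linkset JSON format that RFC 9727 recommends:&lt;/p&gt;

```typescript
interface ToolSummary {
  name: string;
  description: string;
}

// Manifest for /.well-known/mcp. Field names follow the list above; no standard fixes them.
function buildMcpManifest(origin: string, tools: ToolSummary[]) {
  return {
    name: "my-site", // placeholder; keep it in step with serverInfo.name
    endpoint: new URL("/mcp", origin).href,
    transport: "streamable-http", // assumed label for POST-only JSON-RPC over HTTP
    tools: tools.map(({ name, description }) => ({ name, description })),
  };
}

// API catalog entry (RFC 9727) in RFC 9264 linkset form.
function buildApiCatalog(origin: string) {
  return {
    linkset: [
      {
        anchor: new URL("/mcp", origin).href,
        "service-desc": [
          { href: new URL("/.well-known/mcp", origin).href, type: "application/json" },
        ],
      },
    ],
  };
}
```

&lt;p&gt;Serve the manifest as &lt;code&gt;application/json&lt;/code&gt; and the catalog as &lt;code&gt;application/linkset+json&lt;/code&gt;.&lt;/p&gt;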

&lt;p&gt;Then point an MCP client straight at &lt;code&gt;https://yoursite.com/mcp&lt;/code&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Testing your MCP server
&lt;/h2&gt;

&lt;p&gt;You don't need a fancy client to test. Each JSON-RPC message is just an HTTP POST with a JSON body, so &lt;code&gt;curl&lt;/code&gt; is enough. List the tools:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl &lt;span class="nt"&gt;-s&lt;/span&gt; &lt;span class="nt"&gt;-X&lt;/span&gt; POST https://yoursite.com/mcp &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s1"&gt;'Content-Type: application/json'&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="s1"&gt;'{"jsonrpc":"2.0","id":1,"method":"tools/list"}'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Call one:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl &lt;span class="nt"&gt;-s&lt;/span&gt; &lt;span class="nt"&gt;-X&lt;/span&gt; POST https://yoursite.com/mcp &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s1"&gt;'Content-Type: application/json'&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="s1"&gt;'{"jsonrpc":"2.0","id":2,"method":"tools/call",
       "params":{"name":"search_posts","arguments":{"query":"RAG","limit":3}}}'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Work through the lifecycle (&lt;code&gt;initialize&lt;/code&gt; → &lt;code&gt;tools/list&lt;/code&gt; → &lt;code&gt;tools/call&lt;/code&gt;), then confirm the edge cases: an invalid slug returns &lt;code&gt;isError: true&lt;/code&gt;, a notification returns &lt;code&gt;202&lt;/code&gt; with no body, an unknown method returns &lt;code&gt;-32601&lt;/code&gt;, and a &lt;code&gt;GET&lt;/code&gt; returns &lt;code&gt;405&lt;/code&gt;. If all of those behave, you have covered the paths real clients hit most often.&lt;/p&gt;
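&lt;p&gt;These checks are easy to turn into a repeatable smoke test. Below is a minimal sketch with the request bodies written out. The &lt;code&gt;get_post&lt;/code&gt; call assumes that tool's argument is named &lt;code&gt;slug&lt;/code&gt;, and &lt;code&gt;Probe&lt;/code&gt;, &lt;code&gt;Send&lt;/code&gt; and &lt;code&gt;runProbes&lt;/code&gt; are hypothetical names. The body checks are plain substring matches, so they assume compact JSON like &lt;code&gt;JSON.stringify&lt;/code&gt; produces:&lt;/p&gt;

```typescript
interface Probe {
  label: string;
  httpMethod: "GET" | "POST";
  body: object | null;
  expectStatus: number;
  expectInBody?: string; // substring the response text must contain, if set
}

const PROBES: Probe[] = [
  {
    label: "initialize",
    httpMethod: "POST",
    body: {
      jsonrpc: "2.0", id: 1, method: "initialize",
      params: { protocolVersion: "2025-06-18", capabilities: {}, clientInfo: { name: "smoke-test", version: "0.1.0" } },
    },
    expectStatus: 200,
    expectInBody: '"protocolVersion"',
  },
  { label: "initialized notification", httpMethod: "POST", body: { jsonrpc: "2.0", method: "notifications/initialized" }, expectStatus: 202 },
  { label: "tools/list", httpMethod: "POST", body: { jsonrpc: "2.0", id: 2, method: "tools/list" }, expectStatus: 200, expectInBody: '"tools"' },
  {
    label: "invalid slug",
    httpMethod: "POST",
    body: { jsonrpc: "2.0", id: 3, method: "tools/call", params: { name: "get_post", arguments: { slug: "../etc/passwd" } } },
    expectStatus: 200,
    expectInBody: '"isError":true',
  },
  { label: "unknown method", httpMethod: "POST", body: { jsonrpc: "2.0", id: 4, method: "nope/nope" }, expectStatus: 200, expectInBody: "-32601" },
  { label: "wrong verb", httpMethod: "GET", body: null, expectStatus: 405 },
];

// Hypothetical transport: sends one request, resolves to { status, text }.
type Send = (httpMethod: string, body: string | null) => unknown;

// Runs every probe in order; an empty result means every check passed.
async function runProbes(send: Send) {
  const failures: string[] = [];
  for (const p of PROBES) {
    const payload = p.body === null ? null : JSON.stringify(p.body);
    const res = (await send(p.httpMethod, payload)) as { status: number; text: string };
    if (res.status !== p.expectStatus) {
      failures.push(`${p.label}: expected HTTP ${p.expectStatus}, got ${res.status}`);
    } else if (p.expectInBody !== undefined) {
      if (!res.text.includes(p.expectInBody)) failures.push(`${p.label}: response lacks ${p.expectInBody}`);
    }
  }
  return failures;
}
```

&lt;p&gt;Against a live server, &lt;code&gt;send&lt;/code&gt; could wrap &lt;code&gt;fetch('https://yoursite.com/mcp', { method, headers: { 'Content-Type': 'application/json' }, body })&lt;/code&gt; and return &lt;code&gt;{ status: res.status, text: await res.text() }&lt;/code&gt;.&lt;/p&gt;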

&lt;h2&gt;
  
  
  Common mistakes
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Treating tool errors as protocol errors.&lt;/strong&gt; The single most common bug. Use &lt;code&gt;isError: true&lt;/code&gt; in the result; keep JSON-RPC &lt;code&gt;error&lt;/code&gt; for malformed requests only.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Building stateful sessions you don't need.&lt;/strong&gt; A read-only server should be stateless. Sessions add complexity and a scaling headache for zero benefit here.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Thin tool descriptions.&lt;/strong&gt; "Search" tells the model nothing. Say what it searches, what it returns, and when to reach for it.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Duplicating your data.&lt;/strong&gt; Don't copy your content into the server. Point tools at what you already publish so there's nothing to keep in sync.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Forgetting CORS.&lt;/strong&gt; Browser-based MCP clients need it. Handle &lt;code&gt;OPTIONS&lt;/code&gt; and allow the &lt;code&gt;Mcp-Session-Id&lt;/code&gt; / &lt;code&gt;Mcp-Protocol-Version&lt;/code&gt; headers.&lt;/li&gt;
&lt;/ul&gt;
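&lt;p&gt;For the CORS point, here is a sketch of the headers a preflight response could carry. The &lt;code&gt;"*"&lt;/code&gt; origin, the methods list and the one-day max age are illustrative choices, not requirements:&lt;/p&gt;

```typescript
// Headers for a CORS preflight; the concrete values are illustrative.
function corsHeaders(allowOrigin: string) {
  return {
    "Access-Control-Allow-Origin": allowOrigin,
    "Access-Control-Allow-Methods": "POST, OPTIONS",
    // If a client sends a header not listed here, the browser rejects the preflight.
    "Access-Control-Allow-Headers": "Content-Type, Accept, Mcp-Session-Id, Mcp-Protocol-Version",
    // Lets browser code read the session header if you ever add sessions.
    "Access-Control-Expose-Headers": "Mcp-Session-Id",
    "Access-Control-Max-Age": "86400",
  };
}

// Answer OPTIONS with 204, the headers above, and no body.
function preflight(allowOrigin: string) {
  return { status: 204, headers: corsHeaders(allowOrigin), body: null };
}
```

&lt;p&gt;Answer &lt;code&gt;OPTIONS&lt;/code&gt; this way before any JSON-RPC routing. Also add at least &lt;code&gt;Access-Control-Allow-Origin&lt;/code&gt; to the &lt;code&gt;POST&lt;/code&gt; responses, or the browser will hide them from the client.&lt;/p&gt;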

&lt;h2&gt;
  
  
  Best practices
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Stateless first.&lt;/strong&gt; Reach for sessions only when a tool genuinely needs continuity.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Validate every argument.&lt;/strong&gt; Treat tool inputs like any untrusted input — schema plus a guard.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Write descriptions as prompts.&lt;/strong&gt; They're the only thing the model sees when deciding to call a tool.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Reuse existing assets.&lt;/strong&gt; Your feed, your Markdown, your profile file — one source of truth.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Advertise it.&lt;/strong&gt; Manifest + API catalog + &lt;code&gt;Link&lt;/code&gt; header, so agents can find it without being told.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Test the edges, not just the happy path.&lt;/strong&gt; Notifications, unknown methods, invalid inputs, wrong HTTP verb.&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;An MCP server is less code than you expect — a JSON-RPC router, four well-described tools, and a thin layer over content you already ship. The mental shift is the real work: stop thinking of your site as pages to be read and start thinking of it as &lt;strong&gt;capabilities to be called&lt;/strong&gt;. That's the interface agents actually want.&lt;/p&gt;

&lt;p&gt;I built mine on a Cloudflare Worker in an afternoon, and it now sits alongside the rest of this site's agent-readiness as a first-class surface. If you've already got a JSON feed and Markdown pages, you're most of the way there.&lt;/p&gt;

&lt;p&gt;If this was useful, go deeper next: see how the pieces fit together across &lt;a href="https://umesh-malik.com/topics/llm-engineering" rel="noopener noreferrer"&gt;LLM Engineering — RAG, Fine-Tuning &amp;amp; Production LLMs&lt;/a&gt; and &lt;a href="https://umesh-malik.com/topics/ai-coding-agents" rel="noopener noreferrer"&gt;AI Coding Agents — Agentic AI for Developers&lt;/a&gt;, or read the &lt;a href="https://modelcontextprotocol.io" rel="noopener noreferrer"&gt;official MCP specification&lt;/a&gt; for the full protocol.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Explore more:&lt;/strong&gt; &lt;a href="https://umesh-malik.com/topics/ai-coding-agents" rel="noopener noreferrer"&gt;AI Coding Agents&lt;/a&gt; · &lt;a href="https://umesh-malik.com/topics/llm-engineering" rel="noopener noreferrer"&gt;LLM Engineering&lt;/a&gt; · &lt;a href="https://umesh-malik.com/topics/claude-code" rel="noopener noreferrer"&gt;Claude Code&lt;/a&gt;&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Originally published at &lt;a href="https://umesh-malik.com/blog/how-to-build-mcp-server" rel="noopener noreferrer"&gt;umesh-malik.com&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Keep reading on umesh-malik.com:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://umesh-malik.com/blog/deploy-mcp-server-cloudflare-workers" rel="noopener noreferrer"&gt;Deploy an MCP Server on Cloudflare Workers (Free, Stateless, at the Edge)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://umesh-malik.com/blog/autonomous-ai-agents-production-gap-2026" rel="noopener noreferrer"&gt;AI Agents That Run the Business in 2026: Why 77% Never Reach Production (and What the 23% Do Differently)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://umesh-malik.com/blog/agentic-browsing-pagespeed-ai-ready" rel="noopener noreferrer"&gt;Agentic Browsing in PageSpeed Insights: How to Make Your Website AI-Ready (2026)&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>mcp</category>
      <category>modelcontextprotocol</category>
      <category>aiagents</category>
      <category>cloudflareworkers</category>
    </item>
    <item>
      <title>Deploy an MCP Server on Cloudflare Workers (Free, Stateless, at the Edge)</title>
      <dc:creator>Umesh Malik</dc:creator>
      <pubDate>Fri, 12 Jun 2026 20:23:32 +0000</pubDate>
      <link>https://dev.to/umesh_malik/deploy-an-mcp-server-on-cloudflare-workers-free-stateless-at-the-edge-31d6</link>
      <guid>https://dev.to/umesh_malik/deploy-an-mcp-server-on-cloudflare-workers-free-stateless-at-the-edge-31d6</guid>
      <description>&lt;p&gt;You built an MCP server — a JSON-RPC handler with a few well-described tools. Now it has to live somewhere an agent can reach it, 24/7, without you babysitting a server. &lt;strong&gt;Cloudflare Workers is close to the ideal host for this&lt;/strong&gt;, and most of the reasons come down to one property of a read-only MCP server: it's stateless.&lt;/p&gt;

&lt;p&gt;This is the deployment half of the story. If you haven't written the server logic yet, start with &lt;a href="https://umesh-malik.com/blog/how-to-build-mcp-server" rel="noopener noreferrer"&gt;How to Build a Production MCP Server&lt;/a&gt; — this post picks up where that one ends and gets it onto the edge, on the free tier, on your own domain.&lt;/p&gt;

&lt;p&gt;I run exactly this setup for my own site. Here's the whole thing.&lt;/p&gt;

&lt;h2&gt;
  
  
  TL;DR
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;A read-only MCP server is &lt;strong&gt;stateless&lt;/strong&gt;, which is precisely what edge runtimes do best — so Workers is a natural fit, not a compromise.&lt;/li&gt;
&lt;li&gt;The entire deploy is a &lt;strong&gt;&lt;code&gt;wrangler.toml&lt;/code&gt;&lt;/strong&gt;, one &lt;code&gt;wrangler deploy&lt;/code&gt;, and a route check for &lt;code&gt;/mcp&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;run_worker_first = true&lt;/code&gt;&lt;/strong&gt; is the setting people miss — it lets your Worker intercept &lt;code&gt;/mcp&lt;/code&gt; before the static-assets binding serves a file.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Wrangler needs Node.js 22+.&lt;/strong&gt; This is the single most common "it works in CI but not on my machine" gotcha.&lt;/li&gt;
&lt;li&gt;The &lt;strong&gt;free tier (100k requests/day)&lt;/strong&gt; comfortably covers a personal or documentation MCP server.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Why Workers is the right host
&lt;/h2&gt;

&lt;p&gt;The defining trait of a read-only MCP server — one whose tools only &lt;em&gt;fetch&lt;/em&gt; data — is that it holds no state between requests. Every &lt;code&gt;tools/call&lt;/code&gt; is self-contained. That single fact knocks out the usual reasons you'd reach for a long-lived Node process:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;No session store&lt;/strong&gt;, so nothing to persist between requests.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;No warm-up&lt;/strong&gt;, so cold starts don't hurt — there's no database connection pool to spin up.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Embarrassingly parallel&lt;/strong&gt;, so horizontal scaling is automatic.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Stateless request/response at global scale &lt;em&gt;is&lt;/em&gt; the edge-function sweet spot. Add the practical wins — runs in 300+ locations near your users, scales to zero when idle, and the free tier handles &lt;strong&gt;100,000 requests/day&lt;/strong&gt; — and Workers stops being a creative choice and becomes the obvious one.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;💡 &lt;strong&gt;Key insight&lt;/strong&gt;: Don't add a database or sessions to an MCP server that only reads. Statelessness isn't a limitation here — it's the feature that makes edge hosting trivial.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  The whole config: wrangler.toml
&lt;/h2&gt;

&lt;p&gt;Here's the real &lt;code&gt;wrangler.toml&lt;/code&gt; running my server. It does three jobs: point at the Worker, bind the static assets, and run the Worker first.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight toml"&gt;&lt;code&gt;&lt;span class="py"&gt;name&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"my-site"&lt;/span&gt;
&lt;span class="py"&gt;compatibility_date&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"2024-01-01"&lt;/span&gt;
&lt;span class="py"&gt;main&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"worker/index.ts"&lt;/span&gt;

&lt;span class="nn"&gt;[assets]&lt;/span&gt;
&lt;span class="py"&gt;directory&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"./build"&lt;/span&gt;
&lt;span class="py"&gt;binding&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"ASSETS"&lt;/span&gt;
&lt;span class="c"&gt;# Run the Worker before serving static assets so our routes (like /mcp)&lt;/span&gt;
&lt;span class="c"&gt;# are intercepted before the assets binding can short-circuit them.&lt;/span&gt;
&lt;span class="py"&gt;run_worker_first&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's the core of it. &lt;code&gt;main&lt;/code&gt; is your Worker entry. The &lt;code&gt;[assets]&lt;/code&gt; block lets the same Worker also serve a static site from &lt;code&gt;./build&lt;/code&gt; — handy if, like me, your MCP server lives alongside a real website. If your server is standalone, you can drop the assets block entirely.&lt;/p&gt;

&lt;h2&gt;
  
  
  The setting everyone misses: run_worker_first
&lt;/h2&gt;

&lt;p&gt;When you attach a static-assets binding, Cloudflare's default is to &lt;strong&gt;check for a matching file first&lt;/strong&gt; and only fall through to your Worker if there's no file. That's great for a plain static site — and quietly broken for an API route.&lt;/p&gt;

&lt;p&gt;Without &lt;code&gt;run_worker_first = true&lt;/code&gt;, a request to &lt;code&gt;/mcp&lt;/code&gt; can be answered by the assets layer before your Worker ever sees it, for example when your build output contains a file or page at that path. Set it to &lt;code&gt;true&lt;/code&gt; and the order flips: &lt;strong&gt;your Worker runs first&lt;/strong&gt;, handles &lt;code&gt;/mcp&lt;/code&gt;, and explicitly serves static files for everything else via &lt;code&gt;env.ASSETS.fetch()&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;If you ever see your MCP endpoint returning a 404 or an HTML page instead of JSON-RPC, this flag is the first thing to check.&lt;/p&gt;
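&lt;p&gt;To make the ordering concrete, here is a deliberately simplified model of the decision described above. It is not Cloudflare's implementation. It ignores settings such as not-found handling and HTML path handling, and it treats a path as matching only if that path appears in a hypothetical list of asset routes. &lt;code&gt;whoAnswers&lt;/code&gt; is an illustrative name.&lt;/p&gt;

```typescript
// A simplified model (not Cloudflare's implementation) of which layer answers
// a request when a Worker also has a static-assets binding.
type Layer = "worker" | "assets";

function whoAnswers(pathname: string, assetRoutes: string[], runWorkerFirst: boolean): Layer {
  // run_worker_first = true: the Worker always sees the request first and
  // decides for itself whether to call env.ASSETS.fetch().
  if (runWorkerFirst) return "worker";
  // Default ordering: a matching asset wins and the Worker is never invoked.
  return assetRoutes.includes(pathname) ? "assets" : "worker";
}

// Usage: a build that happens to emit a page at /mcp.
const routes = ["/", "/about", "/mcp"];
console.log(whoAnswers("/mcp", routes, false)); // → assets (the page shadows the API)
console.log(whoAnswers("/mcp", routes, true)); // → worker (handleMcp gets the request)
```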

&lt;h2&gt;
  
  
  Routing the endpoint
&lt;/h2&gt;

&lt;p&gt;With the Worker running first, routing is a path check at the top of &lt;code&gt;fetch&lt;/code&gt;, before the asset fallback:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="k"&gt;default&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="nf"&gt;fetch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="na"&gt;request&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;Request&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;env&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;Env&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="nb"&gt;Promise&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;url&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;URL&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;request&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;url&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

    &lt;span class="c1"&gt;// MCP endpoints — handled before anything else&lt;/span&gt;
    &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;url&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;pathname&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;/mcp&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;handleMcp&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;request&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;ASSETS&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;url&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;pathname&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;/.well-known/mcp/server-card.json&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;mcpServerCard&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;request&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="c1"&gt;// Everything else: serve the static site&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;ASSETS&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fetch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;request&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="nx"&gt;satisfies&lt;/span&gt; &lt;span class="nx"&gt;ExportedHandler&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Notice &lt;code&gt;handleMcp&lt;/code&gt; receives &lt;code&gt;env.ASSETS&lt;/code&gt;. That's deliberate: my tools are backed by files the site already publishes (a JSON feed, Markdown pages), and the Worker reads them through the same assets binding. &lt;strong&gt;One source of truth, zero duplicated data&lt;/strong&gt; — the deployment story and the data story are the same story.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// inside a tool: read an asset the site already serves&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;res&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nx"&gt;assets&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fetch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;URL&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;/feed.json&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;origin&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
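&lt;p&gt;Here is a minimal sketch of what a tool like that could look like. It assumes the site publishes a JSON Feed 1.1 document at &lt;code&gt;/feed.json&lt;/code&gt; whose items have &lt;code&gt;title&lt;/code&gt; and &lt;code&gt;url&lt;/code&gt; fields. &lt;code&gt;listRecentPosts&lt;/code&gt; is a hypothetical tool handler, not a library API. The binding is typed structurally as &lt;code&gt;{ fetch: typeof fetch }&lt;/code&gt;, so the same function should accept &lt;code&gt;env.ASSETS&lt;/code&gt; in a Worker or a fake object in a test.&lt;/p&gt;

```typescript
// A minimal sketch, assuming /feed.json is a JSON Feed 1.1 document.
// listRecentPosts is a hypothetical tool handler, not a library API.
type AssetsLike = { fetch: typeof fetch };
type FeedItem = { title?: string; url?: string };

async function listRecentPosts(assets: AssetsLike, origin: string, limit = 5) {
  // Build the URL from the incoming request's origin, so the same code
  // works on localhost, preview deploys and production.
  const res = await assets.fetch(new URL("/feed.json", origin));
  if (!res.ok) throw new Error(`feed.json returned HTTP ${res.status}`);

  const feed = (await res.json()) as { items?: FeedItem[] };
  const lines = (feed.items ?? [])
    .slice(0, limit)
    .map((item) => `- ${item.title ?? "(untitled)"}: ${item.url ?? ""}`);

  // tools/call results are returned as an array of MCP content blocks.
  return { content: [{ type: "text", text: lines.join("\n") }] };
}

// Usage with a fake binding, e.g. in a unit test:
const fakeAssets: AssetsLike = {
  fetch: async () =>
    new Response(JSON.stringify({ items: [{ title: "Deploy an MCP Server", url: "https://example.com/a" }] })),
};
listRecentPosts(fakeAssets, "http://localhost:8787").then((r) => console.log(r.content[0].text));
// → - Deploy an MCP Server: https://example.com/a
```

&lt;p&gt;Passing the binding in as a parameter, rather than reading a global, is what makes the fake possible: the tool has no idea whether it is talking to Cloudflare's asset store or to a test double.&lt;/p&gt;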



&lt;h2&gt;
  
  
  Local development
&lt;/h2&gt;

&lt;p&gt;Test before you ship. &lt;code&gt;wrangler dev&lt;/code&gt; runs the Worker and serves the static assets locally:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npx wrangler dev
&lt;span class="c"&gt;# Ready on http://localhost:8787&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then exercise it with &lt;code&gt;curl&lt;/code&gt; — no special client needed:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl &lt;span class="nt"&gt;-s&lt;/span&gt; &lt;span class="nt"&gt;-X&lt;/span&gt; POST http://localhost:8787/mcp &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s1"&gt;'Content-Type: application/json'&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="s1"&gt;'{"jsonrpc":"2.0","id":1,"method":"tools/list"}'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
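&lt;p&gt;&lt;code&gt;curl&lt;/code&gt; is enough for a single call. To run the whole lifecycle before every deploy, a small script can do it. This is a sketch under stated assumptions: the server replies with plain JSON rather than an SSE stream, &lt;code&gt;2025-06-18&lt;/code&gt; stands in for whichever protocol version your server supports, and the tool call sends empty arguments to the first listed tool unless you pass a name. &lt;code&gt;rpc&lt;/code&gt; and &lt;code&gt;smokeTest&lt;/code&gt; are hypothetical helpers.&lt;/p&gt;

```typescript
// A minimal lifecycle smoke test: initialize -> tools/list -> tools/call.
// Assumes plain-JSON responses (no SSE) and a stateless server.
async function rpc(fetchFn: typeof fetch, endpoint: string, id: number | null, method: string, params?: object) {
  // JSON-RPC notifications carry no id; requests do.
  const message = id === null ? { jsonrpc: "2.0", method, params } : { jsonrpc: "2.0", id, method, params };
  const res = await fetchFn(endpoint, {
    method: "POST",
    headers: { "Content-Type": "application/json", Accept: "application/json, text/event-stream" },
    body: JSON.stringify(message),
  });
  if (id === null) return null; // notifications get no JSON-RPC response body
  const body = (await res.json()) as { result?: unknown; error?: { code: number; message: string } };
  if (body.error) throw new Error(`${method} failed (${body.error.code}): ${body.error.message}`);
  return body.result;
}

async function smokeTest(endpoint: string, toolName?: string, fetchFn: typeof fetch = fetch) {
  const init = await rpc(fetchFn, endpoint, 1, "initialize", {
    protocolVersion: "2025-06-18", // assumption: use a version your server supports
    capabilities: {},
    clientInfo: { name: "smoke-test", version: "0.0.1" },
  });
  await rpc(fetchFn, endpoint, null, "notifications/initialized");

  const list = (await rpc(fetchFn, endpoint, 2, "tools/list")) as { tools: { name: string }[] };
  const names = list.tools.map((t) => t.name);

  // Call the named tool, or the first listed one, with empty arguments.
  const name = toolName ?? names[0];
  const call = name ? await rpc(fetchFn, endpoint, 3, "tools/call", { name, arguments: {} }) : null;
  return { init, tools: names, call };
}

// Usage, with `npx wrangler dev` running:
// smokeTest("http://localhost:8787/mcp").then((r) => console.log(r.tools));
```

&lt;p&gt;Injecting &lt;code&gt;fetchFn&lt;/code&gt; keeps the script usable both against &lt;code&gt;wrangler dev&lt;/code&gt; and against a fake server in CI, where no Worker is running.&lt;/p&gt;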



&lt;blockquote&gt;
&lt;p&gt;⚠️ &lt;strong&gt;The Node version gotcha&lt;/strong&gt;: recent Wrangler (v4+) &lt;strong&gt;requires Node.js 22 or newer&lt;/strong&gt;. If &lt;code&gt;wrangler dev&lt;/code&gt; or &lt;code&gt;wrangler deploy&lt;/code&gt; errors with a version complaint, you're on an older Node. Switch with &lt;code&gt;nvm use 22&lt;/code&gt; (or &lt;code&gt;fnm&lt;/code&gt;). This is the number-one reason a deploy works in CI but fails locally.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  Going live
&lt;/h2&gt;

&lt;p&gt;Two ways, pick one:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Manual deploy&lt;/strong&gt; — one command:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npx wrangler deploy
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It bundles the Worker (esbuild, no config needed), uploads your &lt;code&gt;./build&lt;/code&gt; assets, and your server is live globally in seconds.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Git integration (what I use)&lt;/strong&gt; — connect the repo in the Cloudflare dashboard and every push to &lt;code&gt;main&lt;/code&gt; builds and deploys automatically. The build command runs your site build, and the Worker deploys alongside it. After that, publishing is just &lt;code&gt;git push&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Either way, once the Worker is attached to your domain (a Custom Domain or route in Cloudflare), your MCP endpoint is live at &lt;code&gt;https://yourdomain.com/mcp&lt;/code&gt;, served by the same Worker that serves the site. No separate subdomain, no extra DNS.&lt;/p&gt;

&lt;h2&gt;
  
  
  Common mistakes
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Forgetting &lt;code&gt;run_worker_first&lt;/code&gt;.&lt;/strong&gt; Your &lt;code&gt;/mcp&lt;/code&gt; route returns HTML or 404 because the assets binding ate the request. The fix is one line.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Running an old Node.&lt;/strong&gt; Wrangler v4 needs Node 22+. The error is clear once you read it, but easy to miss in CI logs.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Adding state you don't need.&lt;/strong&gt; Durable Objects and KV are great tools — and overkill for a read-only server. Stay stateless until a tool genuinely requires continuity.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Not handling &lt;code&gt;OPTIONS&lt;/code&gt;/CORS.&lt;/strong&gt; Browser-based MCP clients send a preflight. Return CORS headers and handle &lt;code&gt;OPTIONS&lt;/code&gt;, or those clients silently fail.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Hardcoding the origin.&lt;/strong&gt; Build asset URLs from the incoming request's origin so the same code works on &lt;code&gt;localhost&lt;/code&gt;, preview deploys, and production.&lt;/li&gt;
&lt;/ul&gt;
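&lt;p&gt;For the CORS mistake, here is a minimal sketch of the fix. It assumes a public, read-only endpoint where allowing any origin (&lt;code&gt;*&lt;/code&gt;) is acceptable. The allowed-headers list includes &lt;code&gt;Mcp-Session-Id&lt;/code&gt; and &lt;code&gt;MCP-Protocol-Version&lt;/code&gt;, which MCP clients may send over HTTP; adjust it to what your clients actually send. &lt;code&gt;preflight&lt;/code&gt; and &lt;code&gt;withCors&lt;/code&gt; are hypothetical helper names.&lt;/p&gt;

```typescript
// Assumption: a public, read-only endpoint, so any origin may call it.
const CORS_HEADERS = {
  "Access-Control-Allow-Origin": "*",
  "Access-Control-Allow-Methods": "GET, POST, OPTIONS",
  "Access-Control-Allow-Headers": "Content-Type, Accept, Mcp-Session-Id, MCP-Protocol-Version",
  "Access-Control-Max-Age": "86400", // reuse the preflight for up to a day (browsers may cap it lower)
};

// Answer the browser's OPTIONS preflight with an empty 204.
function preflight() {
  return new Response(null, { status: 204, headers: CORS_HEADERS });
}

// Responses from fetch() (including env.ASSETS.fetch()) can have immutable
// headers, so copy them into a new Response before adding CORS headers.
function withCors(res: Response) {
  const headers = new Headers(res.headers);
  for (const [key, value] of Object.entries(CORS_HEADERS)) headers.set(key, value);
  return new Response(res.body, { status: res.status, statusText: res.statusText, headers });
}

// In the /mcp branch of the router:
//   if (request.method === "OPTIONS") return preflight();
//   return withCors(await handleMcp(request, env.ASSETS));
```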

&lt;h2&gt;
  
  
  Best practices
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Stay stateless.&lt;/strong&gt; It's the whole reason Workers fits. Earn your way into KV/Durable Objects only when a tool needs memory.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Reuse the assets binding for data.&lt;/strong&gt; If your server sits alongside a site, read the files it already publishes instead of duplicating content.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cache where you can.&lt;/strong&gt; Read-only tool data is cacheable — set &lt;code&gt;Cache-Control&lt;/code&gt; on responses backed by static assets.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Pin your Node version.&lt;/strong&gt; Document Node 22+ in your README and CI so "works on my machine" stays true everywhere.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Test the lifecycle locally.&lt;/strong&gt; &lt;code&gt;initialize&lt;/code&gt; → &lt;code&gt;tools/list&lt;/code&gt; → &lt;code&gt;tools/call&lt;/code&gt;, plus the edges, against &lt;code&gt;wrangler dev&lt;/code&gt; before every deploy.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Use your own domain.&lt;/strong&gt; Serving &lt;code&gt;/mcp&lt;/code&gt; from your primary domain is a stronger trust and discovery signal than a throwaway subdomain.&lt;/li&gt;
&lt;/ol&gt;
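&lt;p&gt;A minimal sketch of the caching advice follows. The 300-second max-age is an illustrative assumption, and &lt;code&gt;cachedJson&lt;/code&gt; and &lt;code&gt;withCacheControl&lt;/code&gt; are hypothetical helpers. One caveat: shared caches generally don't store responses to POST requests, so the JSON-RPC calls to &lt;code&gt;/mcp&lt;/code&gt; won't be cached this way. The gains come from GET endpoints such as the server card and from the static files served through &lt;code&gt;env.ASSETS.fetch()&lt;/code&gt;.&lt;/p&gt;

```typescript
// Illustrative default; tune it to how often you publish.
const DEFAULT_MAX_AGE = 300;

// JSON for a read-only GET endpoint (for example, a server card), cacheable for maxAge seconds.
function cachedJson(data: unknown, maxAge = DEFAULT_MAX_AGE) {
  return new Response(JSON.stringify(data), {
    headers: {
      "Content-Type": "application/json",
      "Cache-Control": `public, max-age=${maxAge}`,
    },
  });
}

// Add Cache-Control to an existing response (e.g. from env.ASSETS.fetch()).
// Headers on fetched responses can be immutable, so build a new Response.
function withCacheControl(res: Response, maxAge = DEFAULT_MAX_AGE) {
  if (!res.ok) return res; // never mark errors as cacheable
  const headers = new Headers(res.headers);
  headers.set("Cache-Control", `public, max-age=${maxAge}`);
  return new Response(res.body, { status: res.status, statusText: res.statusText, headers });
}
```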

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Hosting an MCP server sounds like infrastructure work and turns out to be a config file. The reason it's that easy is the reason worth internalizing: &lt;strong&gt;a read-only MCP server is stateless, and stateless request/response at global scale is exactly what the edge is for.&lt;/strong&gt; &lt;code&gt;wrangler.toml&lt;/code&gt;, &lt;code&gt;run_worker_first&lt;/code&gt;, one deploy, your own domain. That's it.&lt;/p&gt;

&lt;p&gt;Build the server logic in &lt;a href="https://umesh-malik.com/blog/how-to-build-mcp-server" rel="noopener noreferrer"&gt;How to Build a Production MCP Server&lt;/a&gt;, then ship it with this. For where MCP fits in the bigger agent picture, see &lt;a href="https://umesh-malik.com/topics/ai-coding-agents" rel="noopener noreferrer"&gt;AI Coding Agents — Agentic AI for Developers&lt;/a&gt; and &lt;a href="https://umesh-malik.com/topics/llm-engineering" rel="noopener noreferrer"&gt;LLM Engineering&lt;/a&gt;, or read the &lt;a href="https://developers.cloudflare.com/workers/" rel="noopener noreferrer"&gt;Cloudflare Workers docs&lt;/a&gt; for the platform details.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Explore more:&lt;/strong&gt; &lt;a href="https://umesh-malik.com/topics/ai-coding-agents" rel="noopener noreferrer"&gt;AI Coding Agents&lt;/a&gt; · &lt;a href="https://umesh-malik.com/topics/llm-engineering" rel="noopener noreferrer"&gt;LLM Engineering&lt;/a&gt; · &lt;a href="https://umesh-malik.com/topics/claude-code" rel="noopener noreferrer"&gt;Claude Code&lt;/a&gt;&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Originally published at &lt;a href="https://umesh-malik.com/blog/deploy-mcp-server-cloudflare-workers" rel="noopener noreferrer"&gt;umesh-malik.com&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Keep reading on umesh-malik.com:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://umesh-malik.com/blog/how-to-build-mcp-server" rel="noopener noreferrer"&gt;How to Build a Production MCP Server (I Added One to My Site)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://umesh-malik.com/blog/agentic-browsing-pagespeed-ai-ready" rel="noopener noreferrer"&gt;Agentic Browsing in PageSpeed Insights: How to Make Your Website AI-Ready (2026)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://umesh-malik.com/blog/autonomous-ai-agents-production-gap-2026" rel="noopener noreferrer"&gt;AI Agents That Run the Business in 2026: Why 77% Never Reach Production (and What the 23% Do Differently)&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>mcp</category>
      <category>modelcontextprotocol</category>
      <category>cloudflareworkers</category>
      <category>wrangler</category>
    </item>
    <item>
      <title>Cursor vs Claude Code vs Copilot (2026): Which AI Coding Tool, for What</title>
      <dc:creator>Umesh Malik</dc:creator>
      <pubDate>Fri, 12 Jun 2026 20:23:00 +0000</pubDate>
      <link>https://dev.to/umesh_malik/cursor-vs-claude-code-vs-copilot-2026-which-ai-coding-tool-for-what-i1b</link>
      <guid>https://dev.to/umesh_malik/cursor-vs-claude-code-vs-copilot-2026-which-ai-coding-tool-for-what-i1b</guid>
      <description>&lt;p&gt;The "best AI coding tool" question is the wrong question. Cursor, Claude Code, and GitHub Copilot aren't three versions of the same thing competing on quality — they're three different &lt;em&gt;interaction models&lt;/em&gt;, and the right one depends entirely on what you're doing. Pick by the shape of the work, not the leaderboard.&lt;/p&gt;

&lt;p&gt;I use all three, daily, for different jobs. Here's how they actually differ and how to choose — without the marketing.&lt;/p&gt;

&lt;h2&gt;
  
  
  TL;DR
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;GitHub Copilot&lt;/strong&gt; — in-editor assistant. Best for fast autocomplete and lightweight chat, lowest friction, lowest price.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cursor&lt;/strong&gt; — AI-first editor. Best when you want agentic multi-file edits &lt;em&gt;and&lt;/em&gt; a polished IDE with inline diffs and tab-completion.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Claude Code&lt;/strong&gt; — terminal agent. Best for autonomous, multi-step tasks across a whole repo, plus scripting and CI.&lt;/li&gt;
&lt;li&gt;The real axis is &lt;strong&gt;autonomy&lt;/strong&gt;: Copilot accelerates your typing; Claude Code does the task. Cursor sits in between with an editor wrapped around it.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;They're not exclusive.&lt;/strong&gt; The strongest setup often runs an in-editor tool &lt;em&gt;and&lt;/em&gt; a terminal agent.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  The one distinction that matters: autonomy
&lt;/h2&gt;

&lt;p&gt;Forget feature checklists for a second. The axis that actually separates these tools is &lt;strong&gt;how much work they do on their own&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Copilot&lt;/strong&gt; completes the line or block you're typing and answers questions in a side panel. &lt;em&gt;You&lt;/em&gt; are driving every keystroke; it predicts the next one.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cursor&lt;/strong&gt; does that too, but adds an agent that can edit multiple files from a single instruction, with diffs you approve in the editor.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Claude Code&lt;/strong&gt; takes a goal — "add auth to these endpoints," "migrate this module," "find and fix the failing test" — and plans, edits, runs commands, and iterates across the repo until it's done.&lt;/li&gt;
&lt;/ul&gt;

&lt;blockquote&gt;
&lt;p&gt;💡 &lt;strong&gt;Key insight&lt;/strong&gt;: Copilot makes &lt;em&gt;you&lt;/em&gt; faster. Claude Code does the task &lt;em&gt;for&lt;/em&gt; you. Cursor lets you slide between the two in one window. That's the whole comparison in one sentence.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  GitHub Copilot
&lt;/h2&gt;

&lt;p&gt;The original, and still the lowest-friction. It lives inside VS Code (and other editors) as inline completions plus a chat panel.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Strengths&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Frictionless autocomplete.&lt;/strong&gt; Best-in-class "finish my line/block" flow.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Deep VS Code + GitHub integration.&lt;/strong&gt; It's right there, no context switch.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cheapest&lt;/strong&gt; of the three, and the easiest to adopt on a team.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Limits&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;It's an &lt;strong&gt;assistant, not an agent.&lt;/strong&gt; Multi-file, multi-step autonomous work isn't its core model (even as it adds more agentic features).&lt;/li&gt;
&lt;li&gt;Output is scoped to what you're editing; it reasons less about the whole repo than a dedicated agent does.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Use it when:&lt;/strong&gt; you want speed-of-typing gains with zero workflow change.&lt;/p&gt;

&lt;h2&gt;
  
  
  Cursor
&lt;/h2&gt;

&lt;p&gt;An AI-first fork of VS Code. You get the familiar editor, plus tab-completion, chat, and an agent mode that edits across files with inline diffs.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Strengths&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Best of both modes&lt;/strong&gt; in one place — completion &lt;em&gt;and&lt;/em&gt; multi-file agent edits, with a real editor UI.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Inline diffs and approval&lt;/strong&gt; make agent edits easy to review without leaving the IDE.&lt;/li&gt;
&lt;li&gt;Strong codebase-aware context and a polished UX.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Limits&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;It's &lt;strong&gt;another editor.&lt;/strong&gt; If you're committed to your current setup (Neovim, JetBrains, plain VS Code), switching is a real cost.&lt;/li&gt;
&lt;li&gt;Heavier and more opinionated than a completion plugin.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Use it when:&lt;/strong&gt; you want agentic editing but you live in a GUI editor and want diffs and tab-completion in the same window.&lt;/p&gt;

&lt;h2&gt;
  
  
  Claude Code
&lt;/h2&gt;

&lt;p&gt;A terminal-based coding agent. You give it a goal; it explores the repo, makes a plan, edits files, runs commands and tests, and iterates — and it's scriptable, so it drops into CI and automation.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Strengths&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Genuine autonomy&lt;/strong&gt; on multi-step, repo-wide tasks: refactors, migrations, "make the tests pass," cross-cutting changes.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Editor-agnostic and scriptable&lt;/strong&gt; — it's a CLI, so it works with any editor and runs headless in pipelines.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Whole-repo reasoning&lt;/strong&gt;, guided by a &lt;a href="https://umesh-malik.com/blog/how-to-write-claude-md" rel="noopener noreferrer"&gt;CLAUDE.md&lt;/a&gt; that teaches it your project's commands, architecture, and conventions.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Limits&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Terminal-first.&lt;/strong&gt; No inline editor diffs by default; you review changes as a diff in the terminal or your git client.&lt;/li&gt;
&lt;li&gt;The autonomy that makes it powerful also means you should scope tasks well and review output — it does a lot per step.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Use it when:&lt;/strong&gt; the unit of work is a &lt;em&gt;task&lt;/em&gt;, not a keystroke — and especially for large or repetitive changes you'd rather delegate.&lt;/p&gt;

&lt;h2&gt;
  
  
  Side by side
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;GitHub Copilot&lt;/th&gt;
&lt;th&gt;Cursor&lt;/th&gt;
&lt;th&gt;Claude Code&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Form factor&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Editor plugin&lt;/td&gt;
&lt;td&gt;AI-first editor&lt;/td&gt;
&lt;td&gt;Terminal agent (CLI)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Interaction&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Completions + chat&lt;/td&gt;
&lt;td&gt;Completions + chat + agent&lt;/td&gt;
&lt;td&gt;Goal → autonomous execution&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Autonomy&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Low (assist)&lt;/td&gt;
&lt;td&gt;Medium (agent in editor)&lt;/td&gt;
&lt;td&gt;High (multi-step agent)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Repo-wide reasoning&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Limited&lt;/td&gt;
&lt;td&gt;Good&lt;/td&gt;
&lt;td&gt;Strong&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Editor lock-in&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;None (plugin)&lt;/td&gt;
&lt;td&gt;Yes (its own editor)&lt;/td&gt;
&lt;td&gt;None (any editor)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Scriptable / CI&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;No&lt;/td&gt;
&lt;td&gt;Yes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Best at&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Fast autocomplete&lt;/td&gt;
&lt;td&gt;Agentic edits + IDE UX&lt;/td&gt;
&lt;td&gt;Autonomous tasks &amp;amp; automation&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  How to actually choose
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;You want minimal change and faster typing&lt;/strong&gt; → Copilot.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;You want agent power but love a GUI editor with diffs&lt;/strong&gt; → Cursor.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;You want to delegate whole tasks, work editor-agnostic, or automate in CI&lt;/strong&gt; → Claude Code.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;You're a power user&lt;/strong&gt; → run an in-editor tool for flow &lt;em&gt;and&lt;/em&gt; Claude Code in the terminal for the heavy lifting. That combination beats any single tool.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Common mistakes
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Judging them on "which has the best model."&lt;/strong&gt; They all use strong frontier models; the &lt;em&gt;interaction model&lt;/em&gt; differentiates them far more than raw model quality.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Expecting Copilot to behave like an agent.&lt;/strong&gt; Different tool for a different job — don't fault a completion engine for not doing migrations.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Refusing to combine them.&lt;/strong&gt; Treating it as a single-winner choice leaves value on the table; the tools compose.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Skipping setup on the agentic tools.&lt;/strong&gt; Cursor's rules and Claude Code's CLAUDE.md are what make their agents good — unconfigured, they underperform.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;There's no single winner because they aren't playing the same game. Copilot accelerates your typing, Cursor wraps an agent in a polished editor, and Claude Code autonomously executes whole tasks across your repo. Choose by the shape of the work — and if you do a lot of different work, use more than one.&lt;/p&gt;

&lt;p&gt;Going deeper on agentic coding? See &lt;a href="https://umesh-malik.com/topics/ai-coding-agents" rel="noopener noreferrer"&gt;AI Coding Agents — Agentic AI for Developers&lt;/a&gt; and &lt;a href="https://umesh-malik.com/topics/claude-code" rel="noopener noreferrer"&gt;Claude Code — Guides &amp;amp; Deep Dives&lt;/a&gt;, and if you adopt Claude Code, start with &lt;a href="https://umesh-malik.com/blog/how-to-write-claude-md" rel="noopener noreferrer"&gt;how to write a CLAUDE.md that actually helps&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Explore more:&lt;/strong&gt; &lt;a href="https://umesh-malik.com/topics/ai-coding-agents" rel="noopener noreferrer"&gt;AI Coding Agents&lt;/a&gt; · &lt;a href="https://umesh-malik.com/topics/claude-code" rel="noopener noreferrer"&gt;Claude Code&lt;/a&gt; · &lt;a href="https://umesh-malik.com/topics/llm-engineering" rel="noopener noreferrer"&gt;LLM Engineering&lt;/a&gt;&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Originally published at &lt;a href="https://umesh-malik.com/blog/cursor-vs-claude-code-vs-copilot" rel="noopener noreferrer"&gt;umesh-malik.com&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Keep reading on umesh-malik.com:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://umesh-malik.com/blog/how-to-write-claude-md" rel="noopener noreferrer"&gt;How to Write a CLAUDE.md That Actually Helps&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://umesh-malik.com/blog/claude-fable-5-streaming-microservice-one-day" rel="noopener noreferrer"&gt;How I Built a Full Audio/Video Streaming Microservice in One Day with Claude Fable 5 Auto Mode&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://umesh-malik.com/blog/claude-swap-multi-account-switcher-guide" rel="noopener noreferrer"&gt;How to Switch Between Multiple Claude Code Accounts Without Re-Logging In (claude-swap Guide)&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>aicodingagents</category>
      <category>claudecode</category>
      <category>cursor</category>
      <category>githubcopilot</category>
    </item>
    <item>
      <title>How to Switch Between Multiple Claude Code Accounts Without Re-Logging In (claude-swap Guide)</title>
      <dc:creator>Umesh Malik</dc:creator>
      <pubDate>Fri, 12 Jun 2026 20:22:26 +0000</pubDate>
      <link>https://dev.to/umesh_malik/how-to-switch-between-multiple-claude-code-accounts-without-re-logging-in-claude-swap-guide-34gd</link>
      <guid>https://dev.to/umesh_malik/how-to-switch-between-multiple-claude-code-accounts-without-re-logging-in-claude-swap-guide-34gd</guid>
      <description>&lt;p&gt;If you use Claude Code seriously, you have hit the wall: a usage limit pops up mid-task, and the only "official" way to keep going is to &lt;code&gt;/logout&lt;/code&gt;, open a browser, run the OAuth dance again, and pray your session state survives. Do that three times a day and it stops being a minor annoyance — it becomes a tax on your focus.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;a href="https://github.com/realiti4/claude-swap" rel="noopener noreferrer"&gt;claude-swap&lt;/a&gt;&lt;/strong&gt; is the open-source CLI that deletes that tax. The short answer is simple: it backs up the OAuth credentials for each of your Claude accounts and swaps them in and out of Claude Code's credential store on demand. Switching accounts goes from a 60-second browser ritual to a single command — &lt;code&gt;cswap --switch&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;If you searched for &lt;strong&gt;how to switch Claude Code accounts&lt;/strong&gt;, &lt;strong&gt;using multiple Claude accounts&lt;/strong&gt;, or &lt;strong&gt;getting past Claude Code rate limits without logging out&lt;/strong&gt;, this is the practical teardown: what it is, why it exists, how it actually works under the hood, how to use it, and — just as importantly — where it falls short.&lt;/p&gt;

&lt;h2&gt;
  
  
  TL;DR
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;claude-swap (CLI: &lt;code&gt;cswap&lt;/code&gt;)&lt;/strong&gt; lets you keep multiple Claude accounts registered and switch the active one in seconds — no logout, no repeated browser OAuth.&lt;/li&gt;
&lt;li&gt;It works by &lt;strong&gt;backing up and restoring Claude Code's OAuth tokens&lt;/strong&gt; plus the &lt;code&gt;oauthAccount&lt;/code&gt; block in &lt;code&gt;~/.claude/.claude.json&lt;/code&gt;, using OS-native secure storage (macOS Keychain, Windows Credential Manager) where available.&lt;/li&gt;
&lt;li&gt;The killer use case is &lt;strong&gt;dodging usage limits&lt;/strong&gt;: when one account taps out, &lt;code&gt;cswap --switch&lt;/code&gt; rotates to the next so you keep working.&lt;/li&gt;
&lt;li&gt;It is a &lt;strong&gt;credential swap, not a live session multiplexer&lt;/strong&gt; — you must restart Claude Code (or the VS Code extension tab) after switching for it to pick up the new tokens.&lt;/li&gt;
&lt;li&gt;It is &lt;strong&gt;unofficial&lt;/strong&gt;, not an Anthropic product. Treat account-juggling as a personal-productivity tool and stay aware of your plan's terms.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  What Is claude-swap?
&lt;/h2&gt;

&lt;blockquote&gt;
&lt;p&gt;💡 &lt;strong&gt;Key insight&lt;/strong&gt;: claude-swap doesn't run multiple Claude sessions at once. It stores the credentials for several accounts and hot-swaps which one Claude Code sees as "logged in."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;claude-swap&lt;/strong&gt; is a multi-account switcher for &lt;a href="https://www.anthropic.com/claude-code" rel="noopener noreferrer"&gt;Claude Code&lt;/a&gt;. You register each account once, and from then on you can rotate between them with a single command instead of logging out and back in through the browser. It works with both the Claude Code CLI and the official VS Code extension, because both read from the same credential store on your machine.&lt;/p&gt;

&lt;p&gt;Under the surface it does exactly one clever thing well: it treats your &lt;strong&gt;OAuth tokens as a swappable asset&lt;/strong&gt;. Claude Code keeps the active account's credentials in one place; claude-swap keeps a backup of &lt;em&gt;every&lt;/em&gt; account's credentials in its own vault, and "switching" simply means copying the right backup back into the live store.&lt;/p&gt;

&lt;p&gt;The tool installs as a Python package and exposes a &lt;code&gt;cswap&lt;/code&gt; command. It's MIT-licensed, actively released (27+ releases by mid-2026), and built by the community — not Anthropic.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why It Exists: The Problem It Solves
&lt;/h2&gt;

&lt;p&gt;Claude Code's value is its flow. You get into a loop with the agent, it's editing files and running tests, and then — limit reached. On Pro and Max plans there are rolling session and weekly caps, and the moment you hit one, the native fix is brutal to your momentum:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
Run &lt;code&gt;/logout&lt;/code&gt; to sign out of the current account.&lt;/li&gt;
&lt;li&gt;Re-authenticate the second account through a browser OAuth redirect.&lt;/li&gt;
&lt;li&gt;Re-establish your working context and hope nothing got lost.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;People run into this constantly because &lt;strong&gt;having more than one Claude account is normal now&lt;/strong&gt;, not exotic:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A &lt;strong&gt;personal&lt;/strong&gt; account and a separate &lt;strong&gt;work or client-billed&lt;/strong&gt; account.&lt;/li&gt;
&lt;li&gt;Multiple &lt;strong&gt;Max subscriptions&lt;/strong&gt; specifically to extend daily working hours.&lt;/li&gt;
&lt;li&gt;A &lt;strong&gt;team setup&lt;/strong&gt; where different accounts map to different billing buckets.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;claude-swap exists because the &lt;em&gt;credentials&lt;/em&gt; for all those accounts are just files and keychain entries. There's no technical reason switching should require a browser round-trip — so the tool removes it. You pay the OAuth cost &lt;strong&gt;once per account&lt;/strong&gt;, and every switch after that is instant.&lt;/p&gt;

&lt;p&gt;The &lt;em&gt;other&lt;/em&gt; escape hatch when you're genuinely out of cloud capacity is to stop depending on the cloud at all and &lt;a href="https://umesh-malik.com/blog/local-llm-coding-revolution-qwen3-coder-desktop" rel="noopener noreferrer"&gt;run a capable coding model locally&lt;/a&gt;. But if you're committed to Claude — and most of us are, for good reason — claude-swap is the pragmatic fix that keeps you in the tool you already like.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;What this is — and isn't&lt;/strong&gt;&lt;br&gt;
claude-swap is a convenience layer over accounts you already own. It is not an Anthropic product, and it does not conjure "free" capacity — every account keeps its own limits and billing. It just removes the browser round-trip between accounts that are yours to begin with.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  How claude-swap Works (Under the Hood)
&lt;/h2&gt;

&lt;p&gt;This is where it gets interesting, because the design is refreshingly simple. Claude Code reads its credentials from one location at startup. claude-swap manages the &lt;em&gt;backup and restore&lt;/em&gt; of that location and never has to modify Claude Code itself.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmmctmvmebkhp59ejt772.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmmctmvmebkhp59ejt772.png" alt="Diagram showing claude-swap's switch flow: add account, back up tokens to a vault, run cswap switch, restore credentials and config to the live store, then restart Claude Code" width="800" height="480"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Where the credentials live
&lt;/h3&gt;

&lt;p&gt;The live store is platform-specific, and claude-swap respects each platform's conventions:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fn1u1l3yyo84z4lh2r7nc.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fn1u1l3yyo84z4lh2r7nc.png" alt="Diagram showing where Claude Code credentials live per platform — macOS Keychain, Linux/WSL plaintext file, Windows Credential Manager — and the claude-swap backup vault holding per-account slots" width="800" height="480"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;macOS&lt;/strong&gt; — the system &lt;strong&gt;Keychain&lt;/strong&gt;, under the service name &lt;code&gt;Claude Code-credentials&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Linux / WSL&lt;/strong&gt; — a plaintext file at &lt;code&gt;~/.claude/.credentials.json&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Windows&lt;/strong&gt; — the &lt;strong&gt;Credential Manager&lt;/strong&gt;, plus the &lt;code&gt;oauthAccount&lt;/code&gt; section of &lt;code&gt;~/.claude/.claude.json&lt;/code&gt;.&lt;/li&gt;
&lt;/ul&gt;
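&lt;p&gt;To make the platform split concrete, here is a minimal Python sketch of the lookup. It assumes only the locations listed above; the function name &lt;code&gt;live_store_for&lt;/code&gt; is hypothetical, and this is not claude-swap's actual code.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import sys
from pathlib import Path


def live_store_for(platform=None):
    """Describe where Claude Code keeps the active account's credentials.

    Hypothetical helper mirroring the list above; not claude-swap's actual code.
    """
    platform = platform or sys.platform
    claude_dir = Path.home() / ".claude"
    if platform == "darwin":
        # macOS: a Keychain item under this service name
        return {"kind": "keychain", "service": "Claude Code-credentials"}
    if platform.startswith("win"):
        # Windows: Credential Manager, plus the oauthAccount section of this file
        return {"kind": "credential-manager", "config": claude_dir / ".claude.json"}
    # Linux / WSL: a plaintext JSON file
    return {"kind": "file", "path": claude_dir / ".credentials.json"}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;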

&lt;p&gt;claude-swap keeps its own backups in a vault directory (&lt;code&gt;~/.local/share/claude-swap/&lt;/code&gt; on Linux, &lt;code&gt;~/.claude-swap-backup/&lt;/code&gt; elsewhere), with one slot per account.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;claude-swap/
├── credentials/
│   ├── .creds-1-you@personal.com.enc
│   └── .creds-2-you@work.com.enc
├── configs/
│   ├── .claude-config-1-you@personal.com.json
│   └── .claude-config-2-you@work.com.json
└── sequence.json        # tracks slots + rotation order
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
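&lt;p&gt;The vault naming above is regular enough to enumerate programmatically. This is a hypothetical Python sketch that assumes the filename pattern and directories described in this section; the helper names &lt;code&gt;vault_dir_for&lt;/code&gt; and &lt;code&gt;list_slots&lt;/code&gt; are illustrative and not part of claude-swap's API.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import re
import sys
from pathlib import Path

# Credential filenames as shown in the tree above, e.g. ".creds-1-you@personal.com.enc"
CREDS_NAME = re.compile(r"^\.creds-(\d+)-(.+)\.enc$")


def vault_dir_for(platform=None):
    """Vault location described above (hypothetical helper)."""
    platform = platform or sys.platform
    if platform.startswith("linux"):
        return Path.home() / ".local" / "share" / "claude-swap"
    return Path.home() / ".claude-swap-backup"


def list_slots(vault):
    """Return (slot number, email) pairs for every credential backup, sorted by slot."""
    slots = []
    for entry in (Path(vault) / "credentials").glob(".creds-*.enc"):
        match = CREDS_NAME.match(entry.name)
        if match:
            slots.append((int(match.group(1)), match.group(2)))
    return sorted(slots)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;On a Linux machine, &lt;code&gt;list_slots(vault_dir_for())&lt;/code&gt; would read the &lt;code&gt;credentials/&lt;/code&gt; folder from the tree above, assuming the real layout matches it.&lt;/p&gt;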



&lt;h3&gt;
  
  
  The switch lifecycle
&lt;/h3&gt;

&lt;p&gt;claude-swap treats each switch as a transaction with recorded steps: &lt;code&gt;credentials_written&lt;/code&gt; (the target account's tokens go into the live store), &lt;code&gt;config_written&lt;/code&gt; (its account config is restored), and &lt;code&gt;sequence_updated&lt;/code&gt; (the rotation state is saved). If a step fails, the completed ones are rolled back, so a failure halfway through reverts cleanly instead of stranding you. That rollback is the part most "swap a config file" hacks get wrong.&lt;/p&gt;
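&lt;p&gt;Here is a minimal Python sketch of that transaction idea for file-based stores (the Linux/WSL case). The function and parameter names are hypothetical. As a simplification, it copies whole files and assumes the sequence file is JSON with an &lt;code&gt;active&lt;/code&gt; key. claude-swap itself handles the &lt;code&gt;oauthAccount&lt;/code&gt; block of &lt;code&gt;.claude.json&lt;/code&gt; and the OS credential stores, so its real code will differ.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import json
from pathlib import Path


def switch_account(live_creds, live_config, sequence_file,
                   backup_creds, backup_config, slot):
    """Copy one account's backups into the live files; undo everything on failure.

    Hypothetical sketch for file-based stores. It copies whole files and assumes
    sequence_file is JSON with an "active" key; claude-swap's real formats may differ.
    """
    live = [Path(live_creds), Path(live_config), Path(sequence_file)]
    # Snapshot whatever already exists so a failed switch can be reverted.
    originals = {path: path.read_bytes() for path in live if path.exists()}
    done = []  # names of the steps that completed, in order
    try:
        live[0].write_bytes(Path(backup_creds).read_bytes())
        done.append("credentials_written")
        live[1].write_bytes(Path(backup_config).read_bytes())
        done.append("config_written")
        sequence = json.loads(live[2].read_text())
        sequence["active"] = slot
        live[2].write_text(json.dumps(sequence))
        done.append("sequence_updated")
    except Exception:
        # Roll back: restore every snapshot and remove live files this call created.
        for path in reversed(live):
            if path in originals:
                path.write_bytes(originals[path])
            elif path.exists():
                path.unlink()
        raise
    return done
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Restoring every snapshot, rather than undoing only the recorded steps, keeps the sketch simple. It also covers a write that failed partway through before its step could be recorded.&lt;/p&gt;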

&lt;p&gt;The one thing it &lt;em&gt;can't&lt;/em&gt; avoid: &lt;strong&gt;Claude Code only reads credentials at startup.&lt;/strong&gt; After a switch, you have to restart the CLI or close and reopen the VS Code extension tab. It's the small price for the fact that nothing has to be patched live.&lt;/p&gt;

&lt;h2&gt;
  
  
  How To Use claude-swap
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Installation
&lt;/h3&gt;

&lt;p&gt;The recommended path is &lt;a href="https://docs.astral.sh/uv/" rel="noopener noreferrer"&gt;uv&lt;/a&gt;, but &lt;code&gt;pipx&lt;/code&gt; works just as well:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Recommended&lt;/span&gt;
uv tool &lt;span class="nb"&gt;install &lt;/span&gt;claude-swap

&lt;span class="c"&gt;# Or with pipx&lt;/span&gt;
pipx &lt;span class="nb"&gt;install &lt;/span&gt;claude-swap

&lt;span class="c"&gt;# From source&lt;/span&gt;
git clone https://github.com/realiti4/claude-swap
&lt;span class="nb"&gt;cd &lt;/span&gt;claude-swap
uv &lt;span class="nb"&gt;sync&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Upgrades are equally boring (the good kind):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;uv tool upgrade claude-swap      &lt;span class="c"&gt;# or: pipx upgrade claude-swap&lt;/span&gt;
cswap &lt;span class="nt"&gt;--upgrade&lt;/span&gt;                  &lt;span class="c"&gt;# self-update from PyPI&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Register your accounts
&lt;/h3&gt;

&lt;p&gt;You add accounts one at a time. Log into the account you want to capture &lt;strong&gt;first&lt;/strong&gt; (via normal Claude Code login), then register it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Log into account A in Claude Code, then:&lt;/span&gt;
cswap &lt;span class="nt"&gt;--add-account&lt;/span&gt;

&lt;span class="c"&gt;# Switch the Claude Code login to account B, then:&lt;/span&gt;
cswap &lt;span class="nt"&gt;--add-account&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Each &lt;code&gt;--add-account&lt;/code&gt; captures whoever is &lt;em&gt;currently&lt;/em&gt; logged in. When a token later expires, re-run &lt;code&gt;cswap --add-account&lt;/code&gt; for that account — it &lt;strong&gt;updates the existing slot&lt;/strong&gt; rather than creating a duplicate.&lt;/p&gt;
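&lt;p&gt;The update-instead-of-duplicate behavior amounts to an upsert keyed by email. This Python sketch assumes a simple in-memory mapping from slot number to email and credentials. Both the data model and &lt;code&gt;add_or_update_account&lt;/code&gt; are hypothetical.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;def add_or_update_account(accounts, email, credentials):
    """Register the logged-in account; re-adding the same email refreshes its slot.

    `accounts` maps slot number to {"email": ..., "credentials": ...}. Hypothetical
    data model mirroring the behavior described above, not claude-swap's code.
    """
    for slot, record in accounts.items():
        if record["email"] == email:
            record["credentials"] = credentials  # refresh expired tokens in place
            return slot
    new_slot = max(accounts, default=0) + 1  # one past the highest existing slot
    accounts[new_slot] = {"email": email, "credentials": credentials}
    return new_slot
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;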

&lt;h3&gt;
  
  
  Switch, list, and check status
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;cswap &lt;span class="nt"&gt;--switch&lt;/span&gt;              &lt;span class="c"&gt;# rotate to the next account in sequence&lt;/span&gt;
cswap &lt;span class="nt"&gt;--switch-to&lt;/span&gt; 2         &lt;span class="c"&gt;# switch to a specific slot number...&lt;/span&gt;
cswap &lt;span class="nt"&gt;--switch-to&lt;/span&gt; you@work.com   &lt;span class="c"&gt;# ...or by email&lt;/span&gt;
cswap &lt;span class="nt"&gt;--list&lt;/span&gt;                &lt;span class="c"&gt;# show accounts, usage metrics, and reset times&lt;/span&gt;
cswap &lt;span class="nt"&gt;--status&lt;/span&gt;              &lt;span class="c"&gt;# show which account is active right now&lt;/span&gt;
cswap &lt;span class="nt"&gt;--tui&lt;/span&gt;                 &lt;span class="c"&gt;# interactive menu with keyboard navigation&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;After any switch, &lt;strong&gt;restart Claude Code or reopen the VS Code extension tab.&lt;/strong&gt; That's the whole loop.&lt;/p&gt;
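&lt;p&gt;Here is a sketch of how a switch target could be resolved from a rotation order. It covers both &lt;code&gt;--switch&lt;/code&gt; and &lt;code&gt;--switch-to&lt;/code&gt;, by slot or by email. The &lt;code&gt;order&lt;/code&gt; list stands in for whatever &lt;code&gt;sequence.json&lt;/code&gt; actually stores, which is an assumption, and &lt;code&gt;resolve_target&lt;/code&gt; is a hypothetical name.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;def resolve_target(order, active, target=None, emails=None):
    """Pick the slot a switch should activate (hypothetical sketch).

    order:  rotation order of slot numbers (assumed to mirror sequence.json)
    active: currently active slot
    target: None for "next in sequence", else a slot number or an email
    emails: mapping of slot number to email, used when target is an email
    """
    if target is None:
        # Like --switch: step to the next slot, wrapping around at the end.
        return order[(order.index(active) + 1) % len(order)]
    if isinstance(target, int) or str(target).isdigit():
        slot = int(target)
    else:
        matches = [s for s, e in (emails or {}).items() if e == target]
        if not matches:
            raise KeyError(f"no account registered for {target}")
        slot = matches[0]
    if slot not in order:
        raise KeyError(f"slot {slot} is not registered")
    return slot
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;In this sketch, wrapping around at the end of the order lets repeated &lt;code&gt;--switch&lt;/code&gt; calls cycle through every registered account.&lt;/p&gt;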

&lt;h3&gt;
  
  
  Headless and CI: register by token
&lt;/h3&gt;

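&lt;p&gt;On a server with no browser, you can't do interactive OAuth. Instead, claude-swap lets you register an account directly from a setup token (for example, one generated with &lt;code&gt;claude setup-token&lt;/code&gt; on a machine that has a browser):&lt;br&gt;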
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;cswap &lt;span class="nt"&gt;--add-token&lt;/span&gt; sk-ant-oat01-...      &lt;span class="c"&gt;# or pipe via stdin with: --add-token -&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Move accounts between machines
&lt;/h3&gt;

&lt;p&gt;Export creates a portable &lt;code&gt;.cswap&lt;/code&gt; file; import restores it on another machine:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;cswap &lt;span class="nt"&gt;--export&lt;/span&gt; backup.cswap             &lt;span class="c"&gt;# all accounts&lt;/span&gt;
cswap &lt;span class="nt"&gt;--export&lt;/span&gt; backup.cswap &lt;span class="nt"&gt;--account&lt;/span&gt; 2 &lt;span class="c"&gt;# just one&lt;/span&gt;
cswap &lt;span class="nt"&gt;--import&lt;/span&gt; backup.cswap             &lt;span class="c"&gt;# restore (use --force to overwrite)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Export files contain live credentials&lt;/strong&gt;&lt;br&gt;
A &lt;code&gt;.cswap&lt;/code&gt; export holds real OAuth tokens. Treat it like a password file — don't commit it, don't drop it in shared storage, and delete it once the migration is done.&lt;/p&gt;
&lt;/blockquote&gt;
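&lt;p&gt;On macOS and Linux, one cheap safeguard is to make the export readable only by you as soon as it is written. Here is a small standard-library Python sketch. &lt;code&gt;restrict_to_owner&lt;/code&gt; is a hypothetical helper, and this sketch does not cover whether claude-swap already sets these permissions.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import os
import stat


def restrict_to_owner(path):
    """Limit a credential-bearing file to owner read/write (like chmod 600).

    POSIX only in practice: on Windows, os.chmod can only toggle the read-only flag.
    Returns the resulting permission bits.
    """
    os.chmod(path, stat.S_IRUSR | stat.S_IWUSR)
    return stat.S_IMODE(os.stat(path).st_mode)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Call it on &lt;code&gt;backup.cswap&lt;/code&gt; right after the export. Tightening permissions does not replace deleting the file once the import is done.&lt;/p&gt;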

&lt;h3&gt;
  
  
  Command reference
&lt;/h3&gt;

&lt;h2&gt;
  
  
  When To Use It (and When Not To)
&lt;/h2&gt;

&lt;h2&gt;
  
  
  What claude-swap Still Misses
&lt;/h2&gt;

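&lt;p&gt;No hype here: this is a small, focused tool, and its gaps are real. It doesn't rotate accounts on its own, every switch still means restarting Claude Code, and it concentrates live OAuth tokens for several accounts on one machine. If you're deciding whether to adopt it, weigh these honestly.&lt;/p&gt;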

&lt;blockquote&gt;
&lt;p&gt;💡 &lt;strong&gt;Key insight&lt;/strong&gt;: The biggest missing feature is automation. claude-swap makes a switch &lt;em&gt;cheap&lt;/em&gt;, but you still have to &lt;em&gt;decide&lt;/em&gt; to switch. A future version that watches &lt;code&gt;--list&lt;/code&gt; usage data and rotates on its own would close the loop.&lt;/p&gt;
&lt;/blockquote&gt;
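&lt;p&gt;Here is a sketch of what such a rotation policy might look like, written in Python. Everything here is hypothetical. claude-swap does not do this today, the 90% threshold is arbitrary, and the &lt;code&gt;usage&lt;/code&gt; mapping only stands in for the per-account usage that &lt;code&gt;--list&lt;/code&gt; displays. Its real format is not assumed.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;def pick_next_account(usage, active, threshold=90.0):
    """Return the account to rotate to, or None to stay on the active one.

    usage: hypothetical mapping of account email to percent of its limit used.
    """
    # Only consider rotating once the active account is close to its limit.
    if threshold - usage[active] > 0:
        return None
    # Other accounts that still have headroom below the threshold.
    spare = {email: pct for email, pct in usage.items()
             if email != active and threshold - pct > 0}
    if not spare:
        return None  # every account is near its limit; nothing better to switch to
    # Prefer the account that has used the least of its allowance.
    return min(spare, key=spare.get)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;A real loop built on this would still need the restart step, since Claude Code only reads credentials at startup.&lt;/p&gt;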

&lt;h2&gt;
  
  
  Best Practices
&lt;/h2&gt;

&lt;h2&gt;
  
  
  How It Compares To The Alternatives
&lt;/h2&gt;

&lt;h2&gt;
  
  
  FAQ
&lt;/h2&gt;

&lt;h2&gt;
  
  
  Final Take
&lt;/h2&gt;

&lt;p&gt;claude-swap is the kind of tool that shouldn't need to exist — and that's exactly why it's good. Anthropic gives you a credential store and a login flow; claude-swap notices that "switching accounts" is really just "swap two files and a keychain entry," and turns a 60-second browser ritual into one command.&lt;/p&gt;

&lt;p&gt;It's not magic. It won't auto-rotate when you hit a limit, it makes you restart after every swap, and it centralizes real OAuth tokens on your machine — so it's a personal-productivity tool, not an enterprise credential platform. Know those edges and they won't surprise you.&lt;/p&gt;

&lt;p&gt;But if you live in Claude Code and bounce between accounts more than once a day, the math is obvious: &lt;strong&gt;pay the OAuth cost once per account, then never pay it again.&lt;/strong&gt; For a free, MIT-licensed CLI, that's a remarkably good trade.&lt;/p&gt;

&lt;p&gt;If you found this useful, read &lt;a href="https://umesh-malik.com/blog/anthropic-code-review-claude-code-guide" rel="noopener noreferrer"&gt;how Anthropic Code Review fits into Claude Code&lt;/a&gt; next — it's the clearest signal of where the rest of the Claude Code workflow is heading, beyond the account you happen to be logged into.&lt;/p&gt;

&lt;h2&gt;
  
  
  Sources
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://github.com/realiti4/claude-swap" rel="noopener noreferrer"&gt;claude-swap on GitHub&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.anthropic.com/claude-code" rel="noopener noreferrer"&gt;Anthropic: Claude Code&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://code.claude.com/docs" rel="noopener noreferrer"&gt;Claude Code Docs&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.astral.sh/uv/" rel="noopener noreferrer"&gt;uv — Python package and project manager&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;&lt;em&gt;Written for &lt;a href="https://umesh-malik.com" rel="noopener noreferrer"&gt;umesh-malik.com&lt;/a&gt; — no-fluff technical writing on AI, Web Dev, and Engineering.&lt;/em&gt;&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Originally published at &lt;a href="https://umesh-malik.com/blog/claude-swap-multi-account-switcher-guide" rel="noopener noreferrer"&gt;umesh-malik.com&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Keep reading on umesh-malik.com:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://umesh-malik.com/blog/local-llm-coding-revolution-qwen3-coder-desktop" rel="noopener noreferrer"&gt;Run an 80B-Parameter LLM on Your Desktop With Zero Cloud Bills: Qwen3-Coder Deep Dive&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://umesh-malik.com/blog/agents-md-ai-coding-agents-study" rel="noopener noreferrer"&gt;AGENTS.md Files Don't Work the Way You Think — A 138-Repo Study Proves It&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://umesh-malik.com/blog/claude-fable-5-streaming-microservice-one-day" rel="noopener noreferrer"&gt;How I Built a Full Audio/Video Streaming Microservice in One Day with Claude Fable 5 Auto Mode&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>claudecode</category>
      <category>claude</category>
      <category>developertools</category>
      <category>cli</category>
    </item>
    <item>
      <title>Claude Code Leak 2026: What Escaped, What Stayed Locked, and the Copyright Irony No One Is Talking About</title>
      <dc:creator>Umesh Malik</dc:creator>
      <pubDate>Fri, 12 Jun 2026 20:22:25 +0000</pubDate>
      <link>https://dev.to/umesh_malik/claude-code-leak-2026-what-escaped-what-stayed-locked-and-the-copyright-irony-no-one-is-talking-4jin</link>
      <guid>https://dev.to/umesh_malik/claude-code-leak-2026-what-escaped-what-stayed-locked-and-the-copyright-irony-no-one-is-talking-4jin</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9igcj2y24pz6wz1uio2l.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9igcj2y24pz6wz1uio2l.png" alt="Claude Code source map leak visualization showing TypeScript code exposed through npm package 2.1.88" width="800" height="420"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The Claude Code source-map leak on &lt;strong&gt;March 31, 2026&lt;/strong&gt; was not a Hollywood breach. It was a mundane packaging mistake that briefly put ~512k lines of TypeScript orchestration logic on the public internet. Within hours, the repo topped GitHub's trending charts and Anthropic fired off DMCA notices - accidentally hitting forks of their &lt;em&gt;own official repository&lt;/em&gt; in the process.&lt;/p&gt;

&lt;p&gt;The model weights and customer data never left the vault, but the architectural blueprint did. And the internet's reaction? Let's just say developers had &lt;em&gt;opinions&lt;/em&gt; about what they found inside.&lt;/p&gt;

&lt;h2&gt;
  
  
  The fast version (TL;DR)
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;An npm publish of &lt;code&gt;@anthropic-ai/claude-code@2.1.88&lt;/code&gt; accidentally shipped a massive &lt;code&gt;cli.js.map&lt;/code&gt;, exposing the full CLI/agent orchestration codebase.&lt;/li&gt;
&lt;li&gt;The leak gives competitors architectural insight and reveals unreleased toggles (like always-on daemon mode, KAIROS flags, and a "buddy" Tamagotchi-like companion experiment). It does &lt;strong&gt;not&lt;/strong&gt; give anyone Claude's model weights, safety data, or hosted inference stack.&lt;/li&gt;
&lt;li&gt;Developers roasted the code quality online - calling it "vibe coded garbage" - but the $2.5B ARR product proves that product-market fit beats code polish.&lt;/li&gt;
&lt;li&gt;Anthropic yanked the bad package, issued DMCA notices that briefly overreached (hitting their own repos), and is rotating internal keys plus tightening pre-publish checks.&lt;/li&gt;
&lt;li&gt;Clean-room reimplementations in Python and Rust appeared within 48 hours, sparking debates about AI copyright that mirror the industry's own training data controversies.&lt;/li&gt;
&lt;li&gt;For teams: clear caches of 2.1.88, upgrade, document removal for audit, and avoid touching leaked repos to stay clear of copyright and CFAA trouble.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  What actually leaked vs. what did not
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Leaked&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;TypeScript orchestration for Claude Code's CLI, tool adapters, agent lifecycle, and feature flags.&lt;/li&gt;
&lt;li&gt;Internal naming and roadmap hints (e.g., &lt;code&gt;KAIROS&lt;/code&gt;, &lt;code&gt;daemon&lt;/code&gt;, "buddy" Tamagotchi-like companion experiments).&lt;/li&gt;
&lt;li&gt;Safety-bypass affordances visible in code paths that handle prompt and tool execution order.&lt;/li&gt;
&lt;/ul&gt;
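&lt;p&gt;Why can one &lt;code&gt;.map&lt;/code&gt; file expose a whole codebase? A Source Map v3 file lists the original files in &lt;code&gt;sources&lt;/code&gt; and may embed their full text in a parallel &lt;code&gt;sourcesContent&lt;/code&gt; array. Here is a minimal Python sketch that recovers those embedded files. It assumes the map uses &lt;code&gt;sourcesContent&lt;/code&gt;, and it does not establish exactly how the 2.1.88 map was built. The function name is illustrative.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import json


def embedded_sources(source_map_text):
    """Map each original file path to its full text, if the source map embeds it.

    Source maps may carry a `sourcesContent` array parallel to `sources`;
    when present, anyone holding the .map can recreate those files verbatim.
    """
    data = json.loads(source_map_text)
    sources = data.get("sources", [])
    contents = data.get("sourcesContent") or []
    return {
        path: text
        for path, text in zip(sources, contents)
        if text is not None  # null entries mean that file's content was not embedded
    }
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;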

&lt;p&gt;&lt;strong&gt;Not leaked&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Claude model weights, safety datasets, or training recipes.&lt;/li&gt;
&lt;li&gt;Production API keys or customer artifacts.&lt;/li&gt;
&lt;li&gt;Hosted inference stack and scaling primitives that make Claude Code performant in production.&lt;/li&gt;
&lt;/ul&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Why that matters&lt;/strong&gt;&lt;br&gt;
The leak is closer to blueprint theft than product theft. You can study the architecture, but you cannot run Claude Code at parity without Anthropic's hosted models and alignment stack. That is why "not everything is available for public usage" is literally true: the brains and the serving muscle stayed private.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  What developers found (and what they said about it)
&lt;/h2&gt;

&lt;p&gt;The code quality debate became almost as viral as the leak itself. Within hours of mirrors appearing, developers were dissecting the codebase and sharing their takes:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"Vibe coded garbage that's making $2.5B ARR. The state of software in 2026."&lt;/p&gt;

&lt;p&gt;"This is what happens when you ship fast and iterate. It works. The code does not have to be beautiful."&lt;/p&gt;

&lt;p&gt;"I've seen worse in production at Fortune 500s. At least this actually works."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The reactions split into two camps:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Camp 1: "This proves code quality does not matter"&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The codebase appeared rapidly developed, with shortcuts and patterns that would not pass a traditional code review&lt;/li&gt;
&lt;li&gt;Yet Claude Code captured ~$2.5B in annualized recurring revenue in under a year&lt;/li&gt;
&lt;li&gt;The lesson: product-market fit and user experience trump architectural purity&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Camp 2: "This is exactly why AI-generated code is concerning"&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Critics argued the codebase reflected the output of AI-assisted development pushed too fast&lt;/li&gt;
&lt;li&gt;The leaked source showed patterns consistent with LLM-generated code that was accepted without thorough review&lt;/li&gt;
&lt;li&gt;The counter-argument: does it matter if it works and ships?&lt;/li&gt;
&lt;/ul&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;The uncomfortable truth&lt;/strong&gt;&lt;br&gt;
The real competitive advantage was never the code. OpenAI's Codex CLI and Google's Gemini CLI are already open source. Claude Code dominates because of the seamless integration between the harness and Anthropic's models - not because the TypeScript is elegant.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  The DMCA chaos: when Anthropic accidentally took down their own repos
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7764jitsfn4ouwvxncqn.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7764jitsfn4ouwvxncqn.png" alt="Flow of the Claude Code leak from npm publish to GitHub mirrors and remediation steps" width="800" height="427"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Anthropic's response was swift - perhaps too swift. According to TechCrunch, the company "took down thousands of GitHub repos trying to yank its leaked source code," which they later characterized as "an accident."&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What went wrong:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Anthropic issued broad DMCA takedown requests targeting any repository containing Claude Code patterns&lt;/li&gt;
&lt;li&gt;The net caught forks of their &lt;em&gt;own official&lt;/em&gt; &lt;code&gt;github.com/anthropics/claude-code&lt;/code&gt; repository&lt;/li&gt;
&lt;li&gt;Legitimate open-source contributions, examples, and tutorials were temporarily nuked&lt;/li&gt;
&lt;li&gt;Developer backlash forced Anthropic to narrow the scope&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;The scale:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Initial sweep: ~8,100 repositories flagged&lt;/li&gt;
&lt;li&gt;After correction: Focus narrowed to repos containing actual leaked source map content&lt;/li&gt;
&lt;li&gt;Collateral damage: Unknown number of legitimate projects temporarily affected&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;The irony:&lt;/strong&gt; Anthropic, a company that has been sued for training on copyrighted content, aggressively pursued copyright enforcement against developers who may have been doing nothing more than forking their public repository.&lt;/p&gt;

&lt;h2&gt;
  
  
  Timeline you can brief leadership with
&lt;/h2&gt;

&lt;h2&gt;
  
  
  The copyright irony nobody wants to talk about
&lt;/h2&gt;

&lt;p&gt;Here is where the story gets uncomfortable. Within 48 hours of the leak, "clean-room implementations" of Claude Code started appearing - developers rewrote the functionality from scratch in Python and Rust, using the leaked code as a reference for architecture but not copying it directly.&lt;/p&gt;

&lt;p&gt;Their argument? The same one AI companies use to justify training on copyrighted content:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"Using AI to rewrite content does not constitute derivative work. This is how learning works."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;The debate:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Anthropic has been sued for training on copyrighted books, articles, and code without permission&lt;/li&gt;
&lt;li&gt;Anthropic argues this is "transformative fair use" and "how learning works"&lt;/li&gt;
&lt;li&gt;Developers now use the same argument to justify clean-room reimplementations of Claude Code&lt;/li&gt;
&lt;li&gt;Critics call it "Anthropic getting a taste of their own medicine"&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;The legal reality:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Violating API ToS through fraudulent accounts is clearer legal ground than training data questions&lt;/li&gt;
&lt;li&gt;But the clean-room reimplementers are not using fraudulent accounts - they are rewriting from public observation&lt;/li&gt;
&lt;li&gt;The frameworks for both situations remain unsettled and actively litigated&lt;/li&gt;
&lt;/ul&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;The uncomfortable parallel&lt;/strong&gt;&lt;br&gt;
The AI industry built norms around training on internet content that favor their business models. Now they are upset when others apply similar logic to their outputs. Whether there is a meaningful legal distinction remains unclear - but the optics are hard to ignore.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  How the leak changes the game (even without weights)
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Faster Claude-like clones&lt;/strong&gt; - Open-model teams can mirror the orchestration pattern with their own models, compressing their time-to-market for developer agents.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Better red-team playbooks&lt;/strong&gt; - Seeing how Claude Code sequences tools and guards prompts gives attackers a richer map for prompt-injection and tool-escape tests.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Enterprise procurement friction&lt;/strong&gt; - Security and legal teams will now ask for stronger SBOMs, pre-publish gates, and attestation from any agent toolchain vendor, not just Anthropic.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Legal chill for builders&lt;/strong&gt; - Using the leaked code directly risks DMCA/CFAA exposure; clean-room reimplementation or open alternatives (e.g., bespoke SvelteKit/Vite agents) are safer paths.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Architectural commoditization&lt;/strong&gt; - The leak confirms that agent harnesses are largely interchangeable; the model is the moat.&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  What to do if you run Claude Code (or ship agents like it)
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Purge and upgrade&lt;/strong&gt;: Clear package-manager caches and remove lockfile entries that pin &lt;code&gt;@anthropic-ai/claude-code@2.1.88&lt;/code&gt;, then install the latest fixed release.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Rotate anyway&lt;/strong&gt;: Even though Anthropic says no secrets leaked, rotate CLI tokens and workstation credentials as a hygiene measure.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Gate your own publishes&lt;/strong&gt;: Add CI checks that block source maps or unusually large artifacts from being published to npm or other registries.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Document removal&lt;/strong&gt;: Keep an audit trail (ticket + commit) recording removal of the leaked artifact, so you can demonstrate non-use if legal questions arise.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Monitor copycats&lt;/strong&gt;: Set GitHub/npm alerts for packages mimicking Claude Code behaviors; add detection rules for suspicious agent execution patterns.&lt;/li&gt;
&lt;/ol&gt;
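&lt;p&gt;The "gate your own publishes" step can be sketched as a small CI check. This is a minimal sketch, not Anthropic's or npm's own tooling: it assumes npm 7 or later, where &lt;code&gt;npm pack --dry-run --json&lt;/code&gt; lists what the tarball would contain (each file with a &lt;code&gt;path&lt;/code&gt; and &lt;code&gt;size&lt;/code&gt;) without publishing anything. The function names, the &lt;code&gt;RUN_PUBLISH_GATE&lt;/code&gt; switch, and the 5 MB threshold are hypothetical choices for illustration.&lt;/p&gt;

```typescript
import { execFileSync } from "node:child_process";

// One entry of the "files" list printed by `npm pack --dry-run --json` (assumes npm 7+).
interface PackedFile {
  path: string;
  size: number;
}

// Hypothetical policy values for illustration; tune them for your own package.
const MAX_FILE_BYTES = 5 * 1024 * 1024; // flag any single file larger than 5 MB
const BLOCKED_SUFFIXES = [".map"]; // source maps should not ship in a CLI tarball

// Pure check: returns one human-readable reason per file that should block the publish.
function findPublishViolations(files: PackedFile[], maxBytes: number = MAX_FILE_BYTES): string[] {
  const problems: string[] = [];
  for (const file of files) {
    if (BLOCKED_SUFFIXES.some((suffix) => file.path.endsWith(suffix))) {
      problems.push(`source map in tarball: ${file.path}`);
    } else if (file.size > maxBytes) {
      problems.push(`oversized file (${file.size} bytes): ${file.path}`);
    }
  }
  return problems;
}

// CI entry point: asks npm what it would publish, then fails the job on any violation.
function runPublishGate(cwd: string = process.cwd()): void {
  // Assumes a POSIX runner; on Windows the executable may need to be "npm.cmd".
  const raw = execFileSync("npm", ["pack", "--dry-run", "--json"], { cwd, encoding: "utf8" });
  const packs: { files: PackedFile[] }[] = JSON.parse(raw);
  const problems = packs.flatMap((pack) => findPublishViolations(pack.files));
  if (problems.length > 0) {
    console.error("Publish blocked:\n" + problems.join("\n"));
    process.exit(1);
  }
  console.log("Publish gate passed: no source maps or oversized files.");
}

// Only shell out to npm when explicitly requested (e.g. RUN_PUBLISH_GATE=1 in CI).
if (process.env.RUN_PUBLISH_GATE === "1") {
  runPublishGate();
}
```

&lt;p&gt;Run it with &lt;code&gt;RUN_PUBLISH_GATE=1&lt;/code&gt; as a CI step before &lt;code&gt;npm publish&lt;/code&gt;. Keeping the policy in a pure function over the file list means the rules can be unit-tested without invoking npm at all.&lt;/p&gt;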

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Legal line to keep clear&lt;/strong&gt;&lt;br&gt;
Downloading or reusing the leaked repository is still copyright infringement. If you need to study the architecture, rely on reporting, snippets quoted in news coverage, or patterns you reconstruct from your own builds - not on hosting the leaked zip.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  Reader-friendly checklist: is this "free Claude Code"?
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Can you run Claude locally now?&lt;/strong&gt; No. The Claude model weights were not part of the leak, and Claude Code still depends on Anthropic's hosted inference.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Can you strip safeguards?&lt;/strong&gt; You can study how safeguards are wired into the client, which helps red-teamers, but production Claude safety lives in the model weights and server-side policies, which you do not have.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Is there sensitive customer data?&lt;/strong&gt; Anthropic says no customer or key material was inside the source map.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Is Anthropic's reputation hurt?&lt;/strong&gt; Yes - supply-chain trust took a hit - but capability control remains intact.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  The bigger picture
&lt;/h2&gt;

&lt;p&gt;This leak is a window into three truths the AI industry does not like to discuss:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Code quality is overrated&lt;/strong&gt; - A "vibe coded" codebase is powering one of the fastest-growing AI products in history. Product-market fit and user experience beat architectural elegance every time.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;The real moat is the model&lt;/strong&gt; - Claude Code's source is now public knowledge, but competitors cannot replicate the experience without Anthropic's models. The harness is a commodity; the AI is the product.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Copyright norms cut both ways&lt;/strong&gt; - AI companies have spent years arguing that learning from copyrighted content is fair use. They cannot be surprised when others apply that logic to their outputs.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Closing
&lt;/h2&gt;

&lt;p&gt;The leak hands the world a blueprint, not a working product. If you are a builder, treat it as a reminder to harden your own release pipelines. If you are an enterprise buyer, tighten the SBOM and release-process requirements you place on vendors. And if you are tempted to grab the code from a mirror - do not. The parts you want most never left Anthropic's servers.&lt;/p&gt;

&lt;p&gt;The official &lt;code&gt;github.com/anthropics/claude-code&lt;/code&gt; repository remains active with 104k stars and 16.4k forks. That is where the legitimate skills, tutorials, and examples live. Everything else is legal risk without the actual value.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Sources: &lt;a href="https://www.axios.com/2026/04/01/anthropic-claude-code-source-leak" rel="noopener noreferrer"&gt;Axios&lt;/a&gt; reporting on the March 31 leak, &lt;a href="https://techcrunch.com/2026/04/02/anthropic-dmca-github-repos-claude-code-leak" rel="noopener noreferrer"&gt;TechCrunch&lt;/a&gt; on the DMCA overreach, &lt;a href="https://build.ms/2026/04/01/claude-code-leak" rel="noopener noreferrer"&gt;build.ms analysis&lt;/a&gt; of code quality observations, GitHub trending data, and community discussions on Hacker News and Twitter/X.&lt;/em&gt;&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Originally published at &lt;a href="https://umesh-malik.com/blog/claude-code-leak-march-2026" rel="noopener noreferrer"&gt;umesh-malik.com&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Keep reading on umesh-malik.com:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://umesh-malik.com/blog/ai-agent-attacks-developer-matplotlib-open-source" rel="noopener noreferrer"&gt;An AI Agent Got Rejected on GitHub, Then Published an Attack Post About the Maintainer — Full Story&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://umesh-malik.com/blog/how-to-write-claude-md" rel="noopener noreferrer"&gt;How to Write a CLAUDE.md That Actually Helps&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://umesh-malik.com/blog/claude-swap-multi-account-switcher-guide" rel="noopener noreferrer"&gt;How to Switch Between Multiple Claude Code Accounts Without Re-Logging In (claude-swap Guide)&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>anthropic</category>
      <category>claudecode</category>
      <category>security</category>
      <category>opensource</category>
    </item>
    <item>
      <title>ChatGPT Now Teaches Math and Science With Interactive Visuals — What You Need to Know</title>
      <dc:creator>Umesh Malik</dc:creator>
      <pubDate>Fri, 12 Jun 2026 20:21:54 +0000</pubDate>
      <link>https://dev.to/umesh_malik/chatgpt-now-teaches-math-and-science-with-interactive-visuals-what-you-need-to-know-2poj</link>
      <guid>https://dev.to/umesh_malik/chatgpt-now-teaches-math-and-science-with-interactive-visuals-what-you-need-to-know-2poj</guid>
      <description>&lt;p&gt;OpenAI launched &lt;strong&gt;interactive math and science visuals in ChatGPT on March 10, 2026&lt;/strong&gt;, and this is exactly the kind of AI update that normal people will care about immediately.&lt;/p&gt;

&lt;p&gt;Not because it wins another benchmark. Not because it adds a new model name. But because it changes one of the most common real-world uses of ChatGPT: &lt;strong&gt;trying to understand something hard&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Put simply: &lt;strong&gt;ChatGPT is moving from "explaining a concept" to "letting you explore a concept."&lt;/strong&gt; Instead of only reading a paragraph about slope, pressure, or the Pythagorean theorem, users can now manipulate variables, see relationships update, and build intuition visually.&lt;/p&gt;

&lt;p&gt;That matters everywhere, but it is especially relevant in the &lt;strong&gt;United States&lt;/strong&gt;. Gallup found that &lt;strong&gt;60% of U.S. adults feel challenged by doing math&lt;/strong&gt;, and parents who feel better about math are far more confident helping their children with it. At the same time, OpenAI already has a &lt;strong&gt;free ChatGPT for Teachers plan for verified U.S. K-12 educators through June 2027&lt;/strong&gt;. So this is not just a product update. It is an education-distribution story with a clear U.S. audience from day one.&lt;/p&gt;

&lt;h2&gt;
  
  
  TL;DR
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;OpenAI launched interactive math and science visuals in ChatGPT on March 10, 2026.&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;OpenAI says &lt;strong&gt;more than 140 million people&lt;/strong&gt; already use ChatGPT for math and science each week.&lt;/li&gt;
&lt;li&gt;The feature starts with &lt;strong&gt;70+ core concepts&lt;/strong&gt; and lets users interact with ideas visually instead of only reading text explanations.&lt;/li&gt;
&lt;li&gt;OpenAI’s examples include topics like the &lt;strong&gt;Pythagorean theorem, slope-intercept form, ideal gas law, Charles' law, Hooke's law, kinetic energy, lens equation, compound interest, and exponential decay&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;As of &lt;strong&gt;March 12, 2026&lt;/strong&gt;, OpenAI says rollout is going to &lt;strong&gt;all logged-in ChatGPT users across plans&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;This is especially relevant in the &lt;strong&gt;U.S.&lt;/strong&gt;, where Gallup found &lt;strong&gt;60% of adults feel challenged by math&lt;/strong&gt; and OpenAI already offers &lt;strong&gt;ChatGPT for Teachers free through June 2027 for verified U.S. K-12 educators&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;The biggest shift is conceptual: &lt;strong&gt;ChatGPT is becoming more exploratory and less purely answer-based&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;The biggest limitation is also obvious: &lt;strong&gt;interactive visuals do not remove the need for teachers, verification, or real understanding&lt;/strong&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fchp0imcoy8vimti1ldzc.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fchp0imcoy8vimti1ldzc.png" alt="ChatGPT interactive learning loop showing concept prompt, live visual exploration, variable changes, and deeper understanding" width="800" height="480"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  What OpenAI Actually Launched on March 10, 2026
&lt;/h2&gt;

&lt;p&gt;The official OpenAI framing is straightforward.&lt;/p&gt;

&lt;p&gt;ChatGPT can now generate &lt;strong&gt;interactive learning visuals&lt;/strong&gt; for math and science topics. Instead of explaining everything in static text, the product can show relationships visually and let the learner change inputs to see what happens.&lt;/p&gt;

&lt;p&gt;That sounds simple, but it changes the experience in an important way:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;text answers tell you &lt;strong&gt;what&lt;/strong&gt; a concept is&lt;/li&gt;
&lt;li&gt;interactive visuals help you feel &lt;strong&gt;how&lt;/strong&gt; a concept behaves&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For learning, that difference is huge.&lt;/p&gt;

&lt;p&gt;If a student changes the slope in a line equation and sees the line rotate, or changes pressure and temperature in a gas law example and watches the relationship update, the concept stops being just another memorized statement. It becomes something they can inspect.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Availability note&lt;/strong&gt;&lt;br&gt;
OpenAI says rollout started on March 10, 2026 for logged-in ChatGPT users across plans. If you do not see the new experience on a given topic yet, the likely cause is topic coverage or rollout timing rather than your plan.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  Why This Story Is Bigger in the U.S.
&lt;/h2&gt;

&lt;p&gt;This is where the audience matters.&lt;/p&gt;

&lt;p&gt;Read as generic AI product news, this looks like another feature release. Read from the perspective of the people most likely to care, it becomes much more interesting:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;students&lt;/strong&gt; who already use ChatGPT for homework and test prep&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;parents&lt;/strong&gt; trying to help with math or science at home&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;teachers&lt;/strong&gt; deciding what productive AI use should actually look like&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The U.S. angle is especially strong for two reasons.&lt;/p&gt;

&lt;p&gt;First, Gallup found that &lt;strong&gt;60% of U.S. adults feel challenged by doing math&lt;/strong&gt;, and parents who feel more positive about math are much more confident helping their children with it. That makes a tool that clarifies concepts more than a student story. It is a family story.&lt;/p&gt;

&lt;p&gt;Second, OpenAI has already built a specific U.S. education distribution path. In &lt;strong&gt;November 2025&lt;/strong&gt;, it launched &lt;strong&gt;ChatGPT for Teachers&lt;/strong&gt;, free through &lt;strong&gt;June 2027&lt;/strong&gt; for verified &lt;strong&gt;U.S. K-12 educators&lt;/strong&gt;. That means the new student-facing visual layer and the U.S.-teacher product story now reinforce each other.&lt;/p&gt;

&lt;p&gt;This is why the launch has real mainstream appeal. It is AI, but it is also:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;homework help&lt;/li&gt;
&lt;li&gt;parent confidence&lt;/li&gt;
&lt;li&gt;classroom workflow&lt;/li&gt;
&lt;li&gt;education anxiety&lt;/li&gt;
&lt;li&gt;a familiar consumer product people already use&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That combination is stronger than a typical model release.&lt;/p&gt;

&lt;h2&gt;
  
  
  OpenAI’s Education Stack Is Getting Clearer
&lt;/h2&gt;

&lt;p&gt;The easiest way to understand this launch is to stop treating it as a random feature and start treating it as another layer in OpenAI’s education strategy.&lt;/p&gt;

&lt;p&gt;Viewed that way, the layers form a stack that makes strategic sense:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;study mode&lt;/strong&gt; helps with pedagogy&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;interactive visuals&lt;/strong&gt; help with conceptual intuition&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;teacher workspaces&lt;/strong&gt; help with real classroom adoption&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Individually, each feature is useful. Together, they suggest OpenAI is serious about being part of the learning workflow, not just a generic chatbot students occasionally visit.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fg913qir31cd2xdqd48lb.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fg913qir31cd2xdqd48lb.png" alt="Topic map showing the kinds of math and science concepts OpenAI highlighted for ChatGPT interactive visuals at launch" width="800" height="507"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  What Topics Matter Most at Launch
&lt;/h2&gt;

&lt;p&gt;OpenAI’s examples are revealing.&lt;/p&gt;

&lt;p&gt;This is not a launch built around obscure graduate-level edge cases. The emphasis is on concepts that are:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;common in school&lt;/li&gt;
&lt;li&gt;visually teachable&lt;/li&gt;
&lt;li&gt;painful to explain with text alone&lt;/li&gt;
&lt;li&gt;familiar enough that parents and teachers immediately understand the value&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Examples OpenAI highlighted include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Pythagorean theorem&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;slope-intercept form&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;difference of squares&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;area of a circle&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;surface area of a cone&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;ideal gas law&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Charles' law&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Coulomb's law&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Hooke's law&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;kinetic energy&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;lens equation&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;compound interest&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;exponential decay&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That list is the real clue.&lt;/p&gt;

&lt;p&gt;OpenAI is not trying to start with “everything.” It is starting with concepts that sit at the intersection of:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;high student demand&lt;/li&gt;
&lt;li&gt;high parent-recognition value&lt;/li&gt;
&lt;li&gt;strong visual payoff&lt;/li&gt;
&lt;li&gt;clear classroom relevance&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  How To Get the Most Useful Experience Tonight
&lt;/h2&gt;

&lt;p&gt;Most people will get more value from this feature if they use it like a learning tool, not like a shortcut machine.&lt;/p&gt;

&lt;h2&gt;
  
  
  Where This Still Falls Short
&lt;/h2&gt;

&lt;p&gt;This launch is strong, but the wrong expectations will ruin it.&lt;/p&gt;

&lt;p&gt;The biggest mistake would be to frame this as “AI solves education now.”&lt;/p&gt;

&lt;p&gt;That is not what happened.&lt;/p&gt;

&lt;p&gt;What happened is more useful and more believable: &lt;strong&gt;OpenAI made ChatGPT better at helping people build intuition in subjects that often feel abstract, intimidating, or hard to explain with words alone.&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Final Take
&lt;/h2&gt;

&lt;p&gt;This is one of the most accessible mainstream AI stories of March 2026 because its value is instantly legible.&lt;/p&gt;

&lt;p&gt;People do not need to understand model architecture to care about this. They only need to recognize one of these situations:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;“My child is stuck on math homework.”&lt;/li&gt;
&lt;li&gt;“I never felt confident in math myself.”&lt;/li&gt;
&lt;li&gt;“I teach students who need a better visual explanation.”&lt;/li&gt;
&lt;li&gt;“I want ChatGPT to help me understand, not just answer.”&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That is why this update matters.&lt;/p&gt;

&lt;p&gt;OpenAI is clearly pushing ChatGPT toward a broader learning role:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;study mode&lt;/strong&gt; for guided thinking&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;teacher tools&lt;/strong&gt; for classroom adoption&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;interactive visuals&lt;/strong&gt; for concept exploration&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For U.S. readers in particular, that combination is compelling because it lands right where the pain already is: homework stress, math anxiety, parent confidence, and teacher workload.&lt;/p&gt;

&lt;p&gt;If OpenAI executes this well, the long-term win is not that ChatGPT becomes a better cheat sheet. The long-term win is that it becomes a more usable bridge between confusion and understanding.&lt;/p&gt;

&lt;p&gt;That is a much more important product story.&lt;/p&gt;

&lt;h2&gt;
  
  
  Sources
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://openai.com/index/new-ways-to-learn-math-and-science-in-chatgpt/" rel="noopener noreferrer"&gt;OpenAI: New ways to learn math and science in ChatGPT&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://openai.com/index/chatgpt-study-mode/" rel="noopener noreferrer"&gt;OpenAI: Introducing study mode&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://openai.com/index/chatgpt-for-teachers/" rel="noopener noreferrer"&gt;OpenAI: A free version of ChatGPT built for teachers&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://news.gallup.com/poll/690956/math-moves-americans-mentally-emotionally.aspx" rel="noopener noreferrer"&gt;Gallup: Math Moves Americans, Mentally and Emotionally&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;&lt;em&gt;Originally published at &lt;a href="https://umesh-malik.com/blog/chatgpt-interactive-math-science-visuals-guide" rel="noopener noreferrer"&gt;umesh-malik.com&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Keep reading on umesh-malik.com:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://umesh-malik.com/blog/chatgpt-adult-mode-delay-guide" rel="noopener noreferrer"&gt;ChatGPT \"Adult Mode\": What OpenAI's Delayed Feature Means for U.S. Adults, Parents, and Privacy&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://umesh-malik.com/blog/openai-gpt-5-4-complete-guide" rel="noopener noreferrer"&gt;OpenAI GPT-5.4 Complete Guide: Benchmarks, Use Cases, Pricing, API, and GPT-5.4 Pro Comparison&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://umesh-malik.com/blog/openai-gpt-5-3-instant-fewer-refusals-better-answers" rel="noopener noreferrer"&gt;OpenAI GPT-5.3 Instant: 26.8% Fewer Hallucinations, Reduced Refusals, and Better Web Answers&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>ai</category>
      <category>openai</category>
      <category>chatgpt</category>
      <category>education</category>
    </item>
    <item>
      <title>ChatGPT "Adult Mode": What OpenAI's Delayed Feature Means for U.S. Adults, Parents, and Privacy</title>
      <dc:creator>Umesh Malik</dc:creator>
      <pubDate>Fri, 12 Jun 2026 20:21:53 +0000</pubDate>
      <link>https://dev.to/umesh_malik/chatgpt-adult-mode-what-openais-delayed-feature-means-for-us-adults-parents-and-privacy-29h9</link>
      <guid>https://dev.to/umesh_malik/chatgpt-adult-mode-what-openais-delayed-feature-means-for-us-adults-parents-and-privacy-29h9</guid>
      <description>&lt;p&gt;OpenAI's reported ChatGPT "adult mode" is one of those AI stories that sounds narrow until you translate it into normal human terms.&lt;/p&gt;

&lt;p&gt;This is not really a niche feature story. It is a &lt;strong&gt;mass-market internet policy story&lt;/strong&gt; about what happens when a product used by hundreds of millions of people tries to give adults more conversational freedom without making life less safe for teenagers.&lt;/p&gt;

&lt;p&gt;The first fact to keep straight is the most important one: &lt;strong&gt;as of March 16, 2026, ChatGPT adult mode is not live.&lt;/strong&gt; What exists right now is a mix of public reporting about the planned feature and official OpenAI documents about age prediction, teen experiences, and parental controls.&lt;/p&gt;

&lt;p&gt;For &lt;strong&gt;U.S. readers&lt;/strong&gt;, that combination matters more than the headline alone suggests. Adults care about whether ChatGPT should stop acting overly paternalistic. Parents care about whether under-18 users could slip into the wrong experience. Privacy-sensitive users care about whether age checks turn into selfie or ID verification. That is why this story is drawing attention well beyond tech circles.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Important status check&lt;/strong&gt;&lt;br&gt;
As of March 16, 2026, OpenAI has not launched adult mode and has not announced a new public release date. The product shape below is based on official OpenAI safety materials plus reporting from Axios, TechCrunch, and The Verge.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  TL;DR
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;ChatGPT adult mode is still delayed as of March 16, 2026.&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;On &lt;strong&gt;March 6, 2026&lt;/strong&gt;, Axios and TechCrunch reported that OpenAI delayed the feature again because it needed more time to get it right.&lt;/li&gt;
&lt;li&gt;On &lt;strong&gt;March 16, 2026&lt;/strong&gt;, The Verge reported that the expected launch version was &lt;strong&gt;text-only&lt;/strong&gt;, not an all-media adult product.&lt;/li&gt;
&lt;li&gt;The reporting suggests the feature is aimed at &lt;strong&gt;verified adults&lt;/strong&gt;, not general users and not minors.&lt;/li&gt;
&lt;li&gt;OpenAI's own safety roadmap already includes &lt;strong&gt;age prediction&lt;/strong&gt; for suspected teens, &lt;strong&gt;more restrictive under-18 experiences&lt;/strong&gt;, and &lt;strong&gt;parental controls&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;The real public-interest question is not "Will ChatGPT get more explicit?" It is &lt;strong&gt;whether a mass-market AI product can give adults more latitude without weakening youth safety or pushing users into invasive age checks.&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;For U.S. readers, this is especially relevant because it intersects with &lt;strong&gt;family use, college-age users, online safety, privacy expectations, and mainstream platform policy&lt;/strong&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcby5w7lapw1eu3xcd3et.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcby5w7lapw1eu3xcd3et.png" alt="Diagram showing the reported scope of ChatGPT adult mode: verified adults, text-only interactions, no images or voice, and unresolved launch questions" width="800" height="480"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Is ChatGPT adult mode live right now?
&lt;/h2&gt;

&lt;p&gt;No.&lt;/p&gt;

&lt;p&gt;That point matters because a lot of social chatter makes this sound like a launch. It is not. The cleanest way to describe the situation on &lt;strong&gt;March 16, 2026&lt;/strong&gt; is this:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;the feature has been reported publicly&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;the product direction is fairly clear&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;the launch is still delayed&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Axios reported on &lt;strong&gt;March 6, 2026&lt;/strong&gt; that OpenAI pushed the feature back again. TechCrunch separately reported the same delay. Then on &lt;strong&gt;March 16, 2026&lt;/strong&gt;, The Verge added more detail about the planned shape of the rollout, saying the first version was expected to focus on &lt;strong&gt;adult text conversations&lt;/strong&gt;, not images, voice, or video.&lt;/p&gt;

&lt;p&gt;That means the story people should read is not "OpenAI launched porn in ChatGPT." The story is closer to this:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;OpenAI appears to be building a more permissive adult conversational mode, but it still does not think the safety, age-gating, and policy edges are ready enough to ship publicly.&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  What would ChatGPT adult mode actually allow?
&lt;/h2&gt;

&lt;p&gt;Based on the reporting available as of &lt;strong&gt;March 16, 2026&lt;/strong&gt;, the likely launch direction is narrower than many people assume.&lt;/p&gt;

&lt;p&gt;The feature is being discussed as an &lt;strong&gt;adult text experience&lt;/strong&gt;, not a full adult-content platform. The Verge reported that the planned rollout was expected to be &lt;strong&gt;text-only at first&lt;/strong&gt;. That is a very different proposition from enabling adult images, live visual generation, or voice-first erotic roleplay.&lt;/p&gt;

&lt;p&gt;That distinction matters because it changes how readers should evaluate the story:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;this is about &lt;strong&gt;conversational permissiveness&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;not an immediate shift into &lt;strong&gt;all-format explicit media&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The strategic shift underneath all of this is simple.&lt;/p&gt;

&lt;p&gt;OpenAI has been moving toward the idea that &lt;strong&gt;adults should get a more adult-appropriate experience&lt;/strong&gt;, while &lt;strong&gt;under-18 users should get a tighter one&lt;/strong&gt;. "Adult mode" is just the most attention-grabbing version of that policy direction.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why did OpenAI delay it?
&lt;/h2&gt;

&lt;p&gt;This is where the story becomes more important than the headline.&lt;/p&gt;

&lt;p&gt;If OpenAI were only optimizing for adult-user demand, it likely would have shipped sooner. The fact that it did not suggests the company thinks the downside risk is real.&lt;/p&gt;

&lt;p&gt;The public reporting points to a cluster of concerns:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;whether under-18 users could still slip through&lt;/li&gt;
&lt;li&gt;whether age prediction is reliable enough for a high-risk feature&lt;/li&gt;
&lt;li&gt;whether adult users would accept stronger verification when the system is uncertain&lt;/li&gt;
&lt;li&gt;whether OpenAI can defend the product decision politically if something goes wrong&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The Verge's March 16 report, citing Wall Street Journal reporting, described internal concern around child-safety implications. That lines up with OpenAI's own official materials, which already show a clear policy hierarchy: when there is tension between &lt;strong&gt;adult freedom&lt;/strong&gt;, &lt;strong&gt;teen safety&lt;/strong&gt;, and &lt;strong&gt;verification accuracy&lt;/strong&gt;, the company is willing to add friction rather than move fast.&lt;/p&gt;

&lt;p&gt;The takeaway is not "OpenAI changed its mind."&lt;/p&gt;

&lt;p&gt;The takeaway is that OpenAI seems to believe the product idea is directionally right, while the &lt;strong&gt;deployment conditions&lt;/strong&gt; are still fragile.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why this matters so much to U.S. adults, parents, and college-age users
&lt;/h2&gt;

&lt;p&gt;This is where the audience really matters.&lt;/p&gt;

&lt;p&gt;Read as generic AI gossip, this story looks unserious. Read from the standpoint of the people most likely to care, it becomes obviously mainstream:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;adult ChatGPT users&lt;/strong&gt; who want a less restrictive assistant&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;parents&lt;/strong&gt; who do not want teens routed into adult experiences by mistake&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;college-age users&lt;/strong&gt; who sit near the adult threshold but still care about privacy and identity checks&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;households&lt;/strong&gt; where ChatGPT is already a shared, familiar product&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The U.S. angle is especially strong because OpenAI developed its parental controls with input from groups and officials U.S. readers will recognize, including &lt;strong&gt;Common Sense Media&lt;/strong&gt; and the attorneys general of &lt;strong&gt;California&lt;/strong&gt; and &lt;strong&gt;Delaware&lt;/strong&gt;. That makes this more than an internal platform experiment. It is part of a broader public-facing trust strategy.&lt;/p&gt;

&lt;p&gt;OpenAI also now says ChatGPT reaches &lt;strong&gt;900 million weekly users&lt;/strong&gt;, which changes the stakes. Once a product is that mainstream, an adult-content policy debate is no longer about edge-case users. It becomes a platform-governance question that ordinary adults, parents, and students understand immediately.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fuobgnr2b5yzmu3qvgjuw.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fuobgnr2b5yzmu3qvgjuw.png" alt="Visual map showing the three-way tension behind ChatGPT adult mode: adult freedom, teen safety, and age-verification privacy" width="800" height="480"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  The real tension: adult freedom, teen safety, and verification privacy
&lt;/h2&gt;

&lt;p&gt;This is the part most headlines flatten.&lt;/p&gt;

&lt;p&gt;OpenAI is trying to satisfy three groups that do not want the same thing:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Adults&lt;/strong&gt; who want the model to stop refusing every mature topic by default.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Parents and safety advocates&lt;/strong&gt; who want strong barriers around teen access.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Privacy-sensitive users&lt;/strong&gt; who do not want a casual chatbot turning into an ID-check funnel.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Those goals can conflict quickly.&lt;/p&gt;

&lt;p&gt;If you make the adult experience too easy to reach, critics will say teens can slip through. If you make it too hard to reach, adults will say the product is infantilizing them. If you solve the problem with aggressive verification, another set of users will object that the privacy cost is too high.&lt;/p&gt;

&lt;p&gt;That is why this story matters. It is not only about sexual content. It is about how mainstream AI products decide &lt;strong&gt;who gets what kind of freedom, under what level of certainty, with what amount of personal-data friction&lt;/strong&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  What to watch before OpenAI ships it
&lt;/h2&gt;

&lt;p&gt;If you want to know whether OpenAI is actually ready to launch adult mode, ignore the loudest hot takes and watch for concrete implementation details: how ChatGPT decides who is an adult, what it does when it is not confident about a user's age, and how much personal data any verification step asks for.&lt;/p&gt;

&lt;p&gt;Those details will tell you more than any headline.&lt;/p&gt;

&lt;h2&gt;
  
  
  Final take
&lt;/h2&gt;

&lt;p&gt;The most useful way to understand ChatGPT adult mode is not as "OpenAI wants erotica in ChatGPT."&lt;/p&gt;

&lt;p&gt;The more accurate read is this:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;OpenAI is trying to draw a sharper line between what adults can do, what teens cannot do, and how confidently the product can tell the difference.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;That is why the story has traction. It touches several instincts at once:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;adult autonomy&lt;/li&gt;
&lt;li&gt;parenting anxiety&lt;/li&gt;
&lt;li&gt;teen online safety&lt;/li&gt;
&lt;li&gt;privacy around age verification&lt;/li&gt;
&lt;li&gt;trust in a chatbot that now operates at mass-market scale&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For U.S. readers especially, that combination is directly relevant. It is about the kind of AI platform ChatGPT is becoming inside ordinary households, campuses, and daily life.&lt;/p&gt;

&lt;p&gt;If OpenAI eventually ships this well, the long-term story will not just be "adult mode exists." The long-term story will be that ChatGPT became more explicitly &lt;strong&gt;age-tiered&lt;/strong&gt;, and that a mainstream AI company finally had to show how adult freedom, child safety, and privacy can coexist in one product.&lt;/p&gt;

&lt;p&gt;That is the real story worth following.&lt;/p&gt;


&lt;h2&gt;
  
  
  Sources
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://www.theverge.com/ai-artificial-intelligence/895130/openai-chatgpt-adult-mode-text-smut-written-erotica" rel="noopener noreferrer"&gt;The Verge: OpenAI wants to give ChatGPT an adult mode&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.axios.com/2026/03/06/openai-chatgpt-adult-mode-delay" rel="noopener noreferrer"&gt;Axios: OpenAI delays ChatGPT's "adult mode" again&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://techcrunch.com/2026/03/06/openai-delays-release-chatgpt-adult-mode/" rel="noopener noreferrer"&gt;TechCrunch: OpenAI delays the release of ChatGPT's adult mode&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://openai.com/index/building-toward-age-appropriate-experiences/" rel="noopener noreferrer"&gt;OpenAI: Building toward age-appropriate experiences&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://openai.com/index/age-prediction-chatgpt/" rel="noopener noreferrer"&gt;OpenAI: Age prediction now available in ChatGPT&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://openai.com/index/introducing-parental-controls-in-chatgpt/" rel="noopener noreferrer"&gt;OpenAI: Introducing parental controls in ChatGPT&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://openai.com/index/openai-for-families/" rel="noopener noreferrer"&gt;OpenAI: OpenAI for Families&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;&lt;em&gt;Originally published at &lt;a href="https://umesh-malik.com/blog/chatgpt-adult-mode-delay-guide" rel="noopener noreferrer"&gt;umesh-malik.com&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Keep reading on umesh-malik.com:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://umesh-malik.com/blog/chatgpt-interactive-math-science-visuals-guide" rel="noopener noreferrer"&gt;ChatGPT Now Teaches Math and Science With Interactive Visuals — What You Need to Know&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://umesh-malik.com/blog/openai-gpt-5-4-complete-guide" rel="noopener noreferrer"&gt;OpenAI GPT-5.4 Complete Guide: Benchmarks, Use Cases, Pricing, API, and GPT-5.4 Pro Comparison&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://umesh-malik.com/blog/openai-gpt-5-3-instant-fewer-refusals-better-answers" rel="noopener noreferrer"&gt;OpenAI GPT-5.3 Instant: 26.8% Fewer Hallucinations, Reduced Refusals, and Better Web Answers&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>ai</category>
      <category>openai</category>
      <category>chatgpt</category>
      <category>onlinesafety</category>
    </item>
    <item>
      <title>Nvidia OpenClaw Explained: What It Means for Your AI Agent Strategy (GTC 2026)</title>
      <dc:creator>Umesh Malik</dc:creator>
      <pubDate>Fri, 12 Jun 2026 20:20:11 +0000</pubDate>
      <link>https://dev.to/umesh_malik/nvidia-openclaw-explained-what-it-means-for-your-ai-agent-strategy-gtc-2026-85m</link>
      <guid>https://dev.to/umesh_malik/nvidia-openclaw-explained-what-it-means-for-your-ai-agent-strategy-gtc-2026-85m</guid>
      <description>&lt;p&gt;On &lt;strong&gt;March 17, 2026&lt;/strong&gt;, Business Insider reported that Jensen Huang told GTC attendees every company "needs to have an OpenClaw strategy."&lt;/p&gt;

&lt;p&gt;That line sounds like classic conference theater until you translate it into plain English.&lt;/p&gt;

&lt;p&gt;Nvidia is saying the next enterprise AI decision is not just &lt;strong&gt;which model&lt;/strong&gt; or &lt;strong&gt;which chip&lt;/strong&gt; you buy. It is whether your company has a plan for &lt;strong&gt;AI agents that can actually do work&lt;/strong&gt;, plus the control layer that keeps those agents from becoming a governance nightmare.&lt;/p&gt;

&lt;p&gt;That is the real story.&lt;/p&gt;

&lt;p&gt;If you want to know what an &lt;strong&gt;Nvidia OpenClaw strategy&lt;/strong&gt; means, where &lt;strong&gt;NemoClaw&lt;/strong&gt; fits, or &lt;strong&gt;why Nvidia cares about AI agents now&lt;/strong&gt;, the short answer is this: Nvidia thinks AI is moving from &lt;strong&gt;answering questions&lt;/strong&gt; to &lt;strong&gt;completing tasks&lt;/strong&gt;, and it wants to own more of that stack than GPUs alone.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Important status check&lt;/strong&gt;&lt;br&gt;
The phrase "OpenClaw strategy" comes from current reporting around GTC 2026. Nvidia's wider GTC materials clearly emphasize agentic AI, but some of the more detailed OpenClaw and NemoClaw framing is still coming through reporting from Business Insider, The Wall Street Journal, WIRED, and Ars Technica rather than from a single Nvidia product page.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  TL;DR
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;On &lt;strong&gt;March 17, 2026&lt;/strong&gt;, Business Insider reported that Jensen Huang told GTC attendees every company "needs to have an OpenClaw strategy."&lt;/li&gt;
&lt;li&gt;Nvidia's own &lt;strong&gt;GTC 2026&lt;/strong&gt; materials already make the broader context clear: the company is centering &lt;strong&gt;agentic AI&lt;/strong&gt;, &lt;strong&gt;AI factories&lt;/strong&gt;, and &lt;strong&gt;physical AI&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;The practical meaning is simple: Nvidia thinks the next AI wave is about &lt;strong&gt;agents that complete work&lt;/strong&gt;, not only chatbots that answer prompts.&lt;/li&gt;
&lt;li&gt;Reporting from &lt;strong&gt;Business Insider&lt;/strong&gt;, &lt;strong&gt;The Wall Street Journal&lt;/strong&gt;, &lt;strong&gt;WIRED&lt;/strong&gt;, and &lt;strong&gt;Ars Technica&lt;/strong&gt; suggests Nvidia is pairing that OpenClaw push with &lt;strong&gt;NemoClaw&lt;/strong&gt;, a more secure enterprise layer around agent execution.&lt;/li&gt;
&lt;li&gt;The strategic bet is that companies will need a plan for &lt;strong&gt;where agents act&lt;/strong&gt;, &lt;strong&gt;what tools they can touch&lt;/strong&gt;, &lt;strong&gt;how they are observed&lt;/strong&gt;, and &lt;strong&gt;how they are stopped when something goes wrong&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;That makes this more than another silicon story. It is Nvidia pushing upward from chips into &lt;strong&gt;runtime&lt;/strong&gt;, &lt;strong&gt;governance&lt;/strong&gt;, and &lt;strong&gt;enterprise software control&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;For &lt;strong&gt;U.S. enterprises&lt;/strong&gt;, the immediate relevance is strongest in regulated, high-trust workflows where "agentic AI" only matters if it can be deployed safely.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Favp53c2j0xmuykgwrhwn.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Favp53c2j0xmuykgwrhwn.png" alt="Diagram showing the shift from query AI to task AI, with OpenClaw representing execution and guardrails between user intent and enterprise actions" width="800" height="480"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  What does Jensen Huang mean by an OpenClaw strategy?
&lt;/h2&gt;

&lt;p&gt;The most useful way to read the line is not as product branding but as a strategic instruction.&lt;/p&gt;

&lt;p&gt;An &lt;strong&gt;OpenClaw strategy&lt;/strong&gt; means your company needs a point of view on all of the following:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;which business tasks AI agents should actually perform&lt;/li&gt;
&lt;li&gt;which tools and systems those agents can access&lt;/li&gt;
&lt;li&gt;what approval, monitoring, and rollback model surrounds them&lt;/li&gt;
&lt;li&gt;how you keep agents useful without letting them become an enterprise liability&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That is the real leap from chatbot thinking to agent thinking.&lt;/p&gt;

&lt;p&gt;In the chatbot era, the core question was: &lt;strong&gt;Which model gives us the best answer?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;In the agent era, the better question is: &lt;strong&gt;Which system can safely take action inside our workflow?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;That shift is why the phrase matters.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why Nvidia thinks AI is moving from queries to tasks
&lt;/h2&gt;

&lt;p&gt;This is the key lesson in the story.&lt;/p&gt;

&lt;p&gt;The old mental model of AI was mostly prompt in, answer out. That model is still useful, but it is increasingly incomplete for enterprise work.&lt;/p&gt;

&lt;p&gt;Nvidia is clearly pushing a new framing:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;a user or system assigns a goal&lt;/li&gt;
&lt;li&gt;the agent plans a sequence of actions&lt;/li&gt;
&lt;li&gt;the runtime manages tool access and execution&lt;/li&gt;
&lt;li&gt;the company audits what happened and decides how much autonomy is acceptable&lt;/li&gt;
&lt;/ul&gt;
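&lt;p&gt;The four steps above are easier to reason about with something concrete. Below is a minimal Python sketch of a policy layer sitting between an agent's plan and real systems: an allowlist of tools, a human-approval gate for risky tools, and an audit log of every attempt. The &lt;code&gt;AgentRuntime&lt;/code&gt; class, the tool names, and the approval rule are hypothetical illustrations, not Nvidia's OpenClaw or NemoClaw API.&lt;/p&gt;

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class AgentRuntime:
    """Hypothetical policy layer between an agent's plan and real systems."""

    tools: dict[str, Callable[..., str]]                   # allowlisted tools only
    needs_approval: set[str] = field(default_factory=set)  # tools gated on a human
    audit_log: list[dict] = field(default_factory=list)    # every attempt, allowed or not

    def _record(self, tool: str, args: dict, result: str) -> str:
        self.audit_log.append({"tool": tool, "args": args, "result": result})
        return result

    def run_step(self, tool: str, approved: bool = False, **args) -> str:
        if tool not in self.tools:
            return self._record(tool, args, "denied")  # not on the allowlist
        if tool in self.needs_approval and not approved:
            return self._record(tool, args, "pending approval")  # wait for a human
        return self._record(tool, args, self.tools[tool](**args))


runtime = AgentRuntime(
    tools={
        "lookup_order": lambda order_id: f"order {order_id}: shipped",
        "issue_refund": lambda order_id: f"refunded {order_id}",
    },
    needs_approval={"issue_refund"},
)
print(runtime.run_step("lookup_order", order_id="A17"))  # order A17: shipped
print(runtime.run_step("issue_refund", order_id="A17"))  # pending approval
print(runtime.run_step("delete_database"))               # denied
```

&lt;p&gt;The design choice worth copying is that denied and pending calls are logged too, so the audit trail shows what the agent tried, not only what it did.&lt;/p&gt;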

&lt;p&gt;That is a much bigger systems problem than autocomplete or chat.&lt;/p&gt;

&lt;p&gt;My inference from the current sources is that Nvidia is trying to make this mental shift feel inevitable. It wants companies to think: if cloud strategy became mandatory, and mobile strategy became mandatory, then &lt;strong&gt;agent strategy&lt;/strong&gt; will become mandatory too.&lt;/p&gt;

&lt;h2&gt;
  
  
  Where NemoClaw fits, and why it matters
&lt;/h2&gt;

&lt;p&gt;This is where the story becomes more interesting than a slogan.&lt;/p&gt;

&lt;p&gt;Reporting from &lt;strong&gt;Business Insider&lt;/strong&gt;, &lt;strong&gt;The Wall Street Journal&lt;/strong&gt;, &lt;strong&gt;WIRED&lt;/strong&gt;, and &lt;strong&gt;Ars Technica&lt;/strong&gt; suggests Nvidia is also advancing &lt;strong&gt;NemoClaw&lt;/strong&gt;, which reads less like a flashy public phrase and more like the answer to the real enterprise question:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How do you let AI agents do useful work without giving them unsafe freedom?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;That is the part CIOs, CISOs, and platform teams actually care about.&lt;/p&gt;

&lt;p&gt;If OpenClaw is the ambition, NemoClaw appears to be the control layer around that ambition.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5rcnons2jpd2tumvcm3m.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5rcnons2jpd2tumvcm3m.png" alt="Diagram showing Nvidia's emerging agent stack: infrastructure and models at the bottom, runtime and policy in the middle, and enterprise AI workflows at the top" width="800" height="480"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Why this is bigger than another chip story
&lt;/h2&gt;

&lt;p&gt;If you only read this as Nvidia hype, you will miss the deeper signal.&lt;/p&gt;

&lt;p&gt;The official Nvidia GTC framing and the recent reporting point in the same direction: Nvidia is trying to extend its relevance upward through the stack.&lt;/p&gt;

&lt;p&gt;The point is not that Nvidia suddenly stopped caring about chips.&lt;/p&gt;

&lt;p&gt;The point is that chips alone are no longer enough to define the strategic narrative. The next fight is around &lt;strong&gt;how agents are deployed&lt;/strong&gt;, &lt;strong&gt;how safe they are&lt;/strong&gt;, and &lt;strong&gt;which company becomes the trusted layer between the model and the enterprise workflow&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;That is a much more durable market position.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why this matters so much to U.S. companies right now
&lt;/h2&gt;

&lt;p&gt;For a broad U.S. business audience, the relevance is immediate because the story sits at the overlap of:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;enterprise productivity pressure&lt;/li&gt;
&lt;li&gt;AI automation ambition&lt;/li&gt;
&lt;li&gt;regulatory and legal caution&lt;/li&gt;
&lt;li&gt;security and data-governance reality&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;My inference from Nvidia's framing and the surrounding reporting is that the company is speaking directly to the people who approve enterprise AI budgets in the United States:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;CTOs deciding where agents are allowed to act&lt;/li&gt;
&lt;li&gt;CIOs trying to standardize enterprise AI stacks&lt;/li&gt;
&lt;li&gt;CISOs worried about tool abuse and data leakage&lt;/li&gt;
&lt;li&gt;product and ops leaders looking for cost-effective automation that does not blow up governance&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That is why this story is more useful than a typical conference recap. It gives readers a practical new lens:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The real bottleneck in enterprise AI is no longer only intelligence. It is controlled execution.&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  If your company agrees with Nvidia, the next move is operational
&lt;/h2&gt;

&lt;p&gt;This is the step most teams skip.&lt;/p&gt;

&lt;p&gt;They hear the strategic message, buy into the future, and then fail to translate it into a controlled rollout model. If you actually think OpenClaw-style planning matters, the right response is not "launch more agents." It is to narrow the scope and raise the discipline.&lt;/p&gt;

&lt;h2&gt;
  
  
  What teams should ask before adopting an OpenClaw strategy
&lt;/h2&gt;

&lt;p&gt;If the phrase sticks, a lot of teams will repeat it without translating it into operational questions.&lt;/p&gt;

&lt;p&gt;That would be a mistake.&lt;/p&gt;

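&lt;p&gt;At minimum, a team should be able to answer where its agents are allowed to act, which tools and systems they can touch, how their actions are observed and approved, and how they are stopped or rolled back when something goes wrong. That is the difference between having a buzzword and having a strategy.&lt;/p&gt;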

&lt;h2&gt;
  
  
  Final take
&lt;/h2&gt;

&lt;p&gt;The most useful reading of Nvidia's OpenClaw strategy line is not "Jensen Huang said something catchy at GTC."&lt;/p&gt;

&lt;p&gt;It is this:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Nvidia is trying to convince the market that AI agents are becoming a first-class enterprise planning problem, and that the winning companies will need a secure runtime around those agents, not just a smart model and fast hardware.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;That is a meaningful shift.&lt;/p&gt;

&lt;p&gt;It tells you where the AI market is going:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;from copilots to agents&lt;/li&gt;
&lt;li&gt;from prompts to tasks&lt;/li&gt;
&lt;li&gt;from model choice to runtime control&lt;/li&gt;
&lt;li&gt;from demo intelligence to enterprise trust&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For U.S. companies, that is a timely message because the next wave of AI adoption will be judged less by how impressive the model sounds and more by whether the system can safely act inside real workflows.&lt;/p&gt;

&lt;p&gt;That is why this is worth paying attention to.&lt;/p&gt;


&lt;h2&gt;
  
  
  Sources
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://www.businessinsider.com/nvidia-ceo-jensen-huang-openclaw-ai-strategy-2026-3" rel="noopener noreferrer"&gt;Business Insider: Nvidia CEO Jensen Huang says every company needs an OpenClaw strategy&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.wsj.com/tech/ai/nvidia-jensen-huang-ai-agents-openclaw-nemoclaw-b1e09bf2" rel="noopener noreferrer"&gt;The Wall Street Journal: Nvidia CEO Jensen Huang is building a platform for AI agents&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://nvidianews.nvidia.com/news/nvidia-ceo-jensen-huang-and-global-technology-leaders-to-showcase-age-of-ai-at-gtc-2026" rel="noopener noreferrer"&gt;NVIDIA Newsroom: Jensen Huang and global technology leaders to showcase the age of AI at GTC 2026&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.nvidia.com/en-us/gtc/keynote/" rel="noopener noreferrer"&gt;NVIDIA GTC 2026 keynote&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.wired.com/story/nvidia-planning-ai-agent-platform-launch-open-source/" rel="noopener noreferrer"&gt;WIRED: Nvidia is planning to launch an open-source AI agent platform&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://arstechnica.com/ai/2026/03/nvidia-is-reportedly-planning-its-own-open-source-openclaw-competitor/" rel="noopener noreferrer"&gt;Ars Technica: Nvidia is reportedly planning its own open-source OpenClaw competitor&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.ft.com/content/9914d633-0343-4b8b-9b5d-e43cb5212e5c" rel="noopener noreferrer"&gt;Financial Times: Nvidia chief says AI chip market could reach $1tn by 2027&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;&lt;em&gt;Originally published at &lt;a href="https://umesh-malik.com/blog/nvidia-openclaw-strategy-ai-agent-plan" rel="noopener noreferrer"&gt;umesh-malik.com&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Keep reading on umesh-malik.com:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://umesh-malik.com/blog/agentic-ai-enterprise-security-model" rel="noopener noreferrer"&gt;Agentic AI Is Changing the Security Model for Enterprise Systems: What CISOs Need to Fix Now&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://umesh-malik.com/blog/autonomous-ai-agents-production-gap-2026" rel="noopener noreferrer"&gt;AI Agents That Run the Business in 2026: Why 77% Never Reach Production (and What the 23% Do Differently)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://umesh-malik.com/blog/anthropic-code-review-claude-code-guide" rel="noopener noreferrer"&gt;Claude Code Review by Anthropic: Multi-Agent PR Reviews, Pricing, Setup Guide, and Limits (2026)&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>ai</category>
      <category>nvidia</category>
      <category>openclaw</category>
      <category>nemoclaw</category>
    </item>
    <item>
      <title>Build a RAG Pipeline From Scratch (Production Patterns That Actually Matter)</title>
      <dc:creator>Umesh Malik</dc:creator>
      <pubDate>Fri, 12 Jun 2026 20:19:37 +0000</pubDate>
      <link>https://dev.to/umesh_malik/build-a-rag-pipeline-from-scratch-production-patterns-that-actually-matter-2gh6</link>
      <guid>https://dev.to/umesh_malik/build-a-rag-pipeline-from-scratch-production-patterns-that-actually-matter-2gh6</guid>
      <description>&lt;p&gt;Most RAG tutorials stop at "embed your docs, do a similarity search, stuff the results in a prompt." That gets you a demo. It does not get you something that gives correct, grounded answers on real data — and the gap between those two is where all the actual engineering lives.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;A RAG pipeline is a series of stages, and a weak link in any one of them caps the quality of the whole thing.&lt;/strong&gt; You can have a frontier model and a beautiful prompt, and still ship garbage if your chunking is wrong. So this is the pipeline end to end, with the production patterns that decide whether it works — not just the happy-path demo.&lt;/p&gt;

&lt;p&gt;If you're still deciding whether RAG is even the right tool versus fine-tuning, read &lt;a href="https://umesh-malik.com/blog/rag-vs-fine-tuning-llms-2026" rel="noopener noreferrer"&gt;RAG vs Fine-Tuning for LLMs&lt;/a&gt; first. This post assumes you've decided to retrieve.&lt;/p&gt;

&lt;h2&gt;
  
  
  TL;DR
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;RAG is a &lt;strong&gt;pipeline&lt;/strong&gt;: ingest → chunk → embed → store → retrieve → generate. The output is only as good as the weakest stage.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Retrieval quality is everything.&lt;/strong&gt; Most "the LLM hallucinated" bugs are actually "the right chunk never got retrieved" bugs.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Chunk on meaning, not character counts.&lt;/strong&gt; Semantic boundaries plus light overlap beat fixed-size splits.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Don't rely on vector search alone.&lt;/strong&gt; Hybrid (keyword + vector) retrieval with a reranker is the production default.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Ground the generation.&lt;/strong&gt; Pass only retrieved context, require citations, and refuse when context is thin.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;You can't improve what you don't measure.&lt;/strong&gt; Build a retrieval eval before you tune anything.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  The pipeline, stage by stage
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. Ingestion
&lt;/h3&gt;

&lt;p&gt;Load your sources and &lt;strong&gt;clean them before anything else&lt;/strong&gt;. Strip boilerplate, nav chrome, and duplicated headers/footers. Garbage in here propagates through every downstream stage and you'll never trace the bad answer back to it. Preserve structure — headings, lists, tables — because that structure is what makes good chunking possible.&lt;/p&gt;
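&lt;p&gt;As one example of the cleaning step, here is a minimal Python sketch that strips running headers and footers by dropping any line that repeats on most pages. It assumes your loader already returns each page as plain text; the function name and the 50% threshold are illustrative assumptions, and real pipelines usually need format-specific extraction (HTML, PDF) before a step like this.&lt;/p&gt;

```python
from collections import Counter


def strip_repeated_lines(pages: list[str], threshold: float = 0.5) -> list[str]:
    """Drop lines that repeat on more than `threshold` of all pages."""
    # Count each distinct non-empty line once per page it appears on.
    counts: Counter = Counter()
    for page in pages:
        counts.update({line.strip() for line in page.splitlines() if line.strip()})

    cutoff = threshold * len(pages)
    # A line must actually repeat (n > 1) to count as a header or footer.
    boilerplate = {line for line, n in counts.items() if n > cutoff and n > 1}

    cleaned = []
    for page in pages:
        # Blank lines are kept, so paragraph structure survives for chunking.
        kept = [line for line in page.splitlines() if line.strip() not in boilerplate]
        cleaned.append("\n".join(kept))
    return cleaned


pages = [
    "ACME Docs\nInstall with pip.\nCopyright ACME",
    "ACME Docs\nSet the API key.\nCopyright ACME",
    "ACME Docs\nRun the server.\nCopyright ACME",
]
print(strip_repeated_lines(pages))  # ['Install with pip.', 'Set the API key.', 'Run the server.']
```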

&lt;h3&gt;
  
  
  2. Chunking — where most pipelines quietly fail
&lt;/h3&gt;

&lt;p&gt;Chunking is the highest-leverage, most-underrated stage. The naive move is to split every document into fixed 500-character windows. Don't. Fixed-size splitting severs sentences and merges unrelated ideas, and then retrieval surfaces fragments that don't mean anything on their own.&lt;/p&gt;

&lt;p&gt;Instead:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Split on semantic boundaries&lt;/strong&gt; — headings, paragraphs, list items. Respect the document's own structure.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;One idea per chunk.&lt;/strong&gt; A chunk should be retrievable and self-contained.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Add light overlap&lt;/strong&gt; so context isn't cut mid-thought between adjacent chunks.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Attach metadata&lt;/strong&gt; to every chunk: source, title, section, date, URL. You'll use it for filtering and citations.
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;chunk = {
  id, text,
  metadata: { source, title, section, url, date }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
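&lt;p&gt;A minimal sketch of structure-aware chunking for Markdown-like text, assuming headings start with &lt;code&gt;#&lt;/code&gt; and paragraphs are separated by blank lines. It splits on headings, packs whole paragraphs up to a soft size limit, carries one paragraph of overlap, and attaches source and section metadata. The function names, the 800-character limit, and the one-paragraph overlap are hypothetical starting points to tune against your retrieval eval, not recommended values.&lt;/p&gt;

```python
import re
from dataclasses import dataclass, field


@dataclass
class Chunk:
    id: str
    text: str
    metadata: dict = field(default_factory=dict)


def _make_chunk(index: int, paragraphs: list[str], source: str, section: str) -> Chunk:
    return Chunk(
        id=f"{source}#{index}",
        text="\n\n".join(paragraphs),
        metadata={"source": source, "section": section},
    )


def chunk_markdown(doc: str, source: str, max_chars: int = 800, overlap: int = 1) -> list[Chunk]:
    """Split on headings, then pack whole paragraphs into chunks of about max_chars.

    `overlap` repeats the last N paragraphs of a chunk at the start of the next
    chunk in the same section, so context is not cut mid-thought.
    """
    chunks: list[Chunk] = []
    # Split just before each markdown heading so no chunk straddles two sections.
    for section in re.split(r"(?m)^(?=#{1,6} )", doc):
        lines = section.strip().splitlines()
        if not lines:
            continue
        has_heading = re.match(r"#{1,6} ", lines[0]) is not None
        title = lines[0].lstrip("#").strip() if has_heading else ""
        body = "\n".join(lines[1:] if has_heading else lines)
        paragraphs = [p.strip() for p in body.split("\n\n") if p.strip()]

        current: list[str] = []
        for para in paragraphs:
            # Flush before this paragraph would push the chunk past max_chars.
            if current and len("\n\n".join(current + [para])) > max_chars:
                chunks.append(_make_chunk(len(chunks), current, source, title))
                current = current[-overlap:] if overlap else []
            current.append(para)
        if current:
            chunks.append(_make_chunk(len(chunks), current, source, title))
    return chunks


doc = "# Install\n\nRun pip install acme.\n\nThen restart.\n\n# Configure\n\nSet ACME_KEY."
for c in chunk_markdown(doc, "guide.md"):
    print(c.id, c.metadata["section"], repr(c.text))
```

&lt;p&gt;Because the limit is checked before adding a paragraph, a single very long paragraph still becomes one chunk; splitting it further (for example by sentence) is left out to keep the sketch short.&lt;/p&gt;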



&lt;blockquote&gt;
&lt;p&gt;💡 &lt;strong&gt;Key insight&lt;/strong&gt;: If retrieval is bad, fix chunking before you touch the model or the prompt. The retriever can only find what chunking made findable.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  3. Embedding
&lt;/h3&gt;

&lt;p&gt;Turn each chunk into a vector with an embedding model. Two rules that save pain later:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Embed the same way at index time and query time.&lt;/strong&gt; Same model, same preprocessing. A mismatch silently wrecks relevance.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Version your embeddings.&lt;/strong&gt; When you change the embedding model, you must re-embed the whole corpus. Track which model produced which vectors so you know when a reindex is due.&lt;/li&gt;
&lt;/ul&gt;
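&lt;p&gt;Both rules can be enforced in code. This sketch assumes a hypothetical model id &lt;code&gt;embed-model-v2&lt;/code&gt; and a pluggable &lt;code&gt;embed_fn&lt;/code&gt; standing in for your provider's embedding call. One shared &lt;code&gt;normalize&lt;/code&gt; function (here it only collapses whitespace) is applied to every text, each stored vector records which model produced it, and a cache key of model plus text hash means unchanged chunks are never re-embedded.&lt;/p&gt;

```python
import hashlib
from typing import Callable

EMBED_MODEL = "embed-model-v2"  # hypothetical model identifier


def normalize(text: str) -> str:
    """Shared preprocessing: call this at index time AND at query time."""
    return " ".join(text.split())


def cache_key(text: str, model: str = EMBED_MODEL) -> str:
    # Same normalized text + same model means the stored vector is still valid.
    digest = hashlib.sha256(normalize(text).encode("utf-8")).hexdigest()
    return f"{model}:{digest}"


def embed_corpus(
    chunks: dict[str, str],
    index: dict,
    embed_fn: Callable[[str], list[float]],
    model: str = EMBED_MODEL,
) -> int:
    """Embed only new or changed chunks; return how many were (re-)embedded."""
    embedded = 0
    for chunk_id, text in chunks.items():
        key = cache_key(text, model)
        if chunk_id in index and index[chunk_id]["key"] == key:
            continue  # unchanged text, same model: reuse the cached vector
        index[chunk_id] = {"key": key, "model": model, "vector": embed_fn(normalize(text))}
        embedded += 1
    return embedded


def stale_ids(index: dict, model: str = EMBED_MODEL) -> list[str]:
    """Ids whose vectors came from a different model and need a reindex."""
    return [cid for cid, rec in index.items() if rec["model"] != model]


def toy_embed(text: str) -> list[float]:
    return [float(len(text))]  # stand-in for a real embedding API call


index: dict = {}
print(embed_corpus({"a": "Hello  world", "b": "Refund policy"}, index, toy_embed))  # 2
print(embed_corpus({"a": "Hello world", "b": "Refund policy"}, index, toy_embed))   # 0
print(embed_corpus({"a": "Hello world"}, index, toy_embed, model="embed-model-v3"))  # 1
print(stale_ids(index, model="embed-model-v3"))  # ['b']
```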

&lt;h3&gt;
  
  
  4. Storage
&lt;/h3&gt;

&lt;p&gt;Store vectors in an index that does fast similarity search &lt;strong&gt;with metadata filtering&lt;/strong&gt;. You don't necessarily need a dedicated vector database — &lt;code&gt;pgvector&lt;/code&gt; on the Postgres you already run handles a surprising amount before a specialized store (Qdrant, Weaviate, Pinecone) earns its keep.&lt;/p&gt;

&lt;p&gt;What actually matters: filtering. "Search only this customer's docs" or "only documents from the last year" is a metadata &lt;code&gt;WHERE&lt;/code&gt; clause combined with vector similarity. Without it, retrieval leaks across boundaries it shouldn't.&lt;/p&gt;
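&lt;p&gt;The filter-then-rank idea, sketched in plain Python over an in-memory list so it runs anywhere. The row layout and the &lt;code&gt;tenant&lt;/code&gt; field are assumptions for illustration; with &lt;code&gt;pgvector&lt;/code&gt; the same logic is a SQL &lt;code&gt;WHERE&lt;/code&gt; on metadata columns combined with ordering by vector distance, executed inside the database.&lt;/p&gt;

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def filtered_search(rows: list[dict], query_vec: list[float], where: dict, k: int = 3) -> list[dict]:
    """Apply the metadata filter first, then rank only the rows that pass it."""
    allowed = [
        row for row in rows
        if all(row["metadata"].get(key) == value for key, value in where.items())
    ]
    return sorted(allowed, key=lambda row: cosine(row["vector"], query_vec), reverse=True)[:k]


rows = [
    {"id": "a1", "vector": [1.0, 0.0], "metadata": {"tenant": "acme"}},
    {"id": "b1", "vector": [1.0, 0.1], "metadata": {"tenant": "globex"}},
    {"id": "a2", "vector": [0.0, 1.0], "metadata": {"tenant": "acme"}},
]
hits = filtered_search(rows, [1.0, 0.0], where={"tenant": "acme"}, k=2)
print([h["id"] for h in hits])  # ['a1', 'a2']
```

&lt;p&gt;Note that &lt;code&gt;b1&lt;/code&gt; is never returned for the &lt;code&gt;acme&lt;/code&gt; tenant even though its vector is almost identical to the query: the boundary is enforced before similarity is considered.&lt;/p&gt;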

&lt;h3&gt;
  
  
  5. Retrieval — go hybrid, then rerank
&lt;/h3&gt;

&lt;p&gt;This is the stage that most separates a demo from a product.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Vector search alone is not enough.&lt;/strong&gt; Embeddings are great at semantic similarity and bad at exact matches — error codes, product SKUs, proper nouns, acronyms. Keyword search (BM25) is the opposite. &lt;strong&gt;Hybrid retrieval runs both and merges the results&lt;/strong&gt;, so you catch both "what they meant" and "the exact term they typed."&lt;/p&gt;
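&lt;p&gt;For the keyword side, BM25 is the standard scoring function. A compact Okapi BM25 sketch follows, using a non-negative IDF variant and the common defaults &lt;code&gt;k1=1.5&lt;/code&gt; and &lt;code&gt;b=0.75&lt;/code&gt;; the tokenizer is a simplifying assumption. A rare exact term like an error code scores only the document that contains it. In production you would normally use the BM25 built into your search engine or database rather than this loop.&lt;/p&gt;

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Lowercase and keep identifiers like "e1234" or "sku-42" as single tokens.
    return re.findall(r"[a-z0-9][a-z0-9_-]*", text.lower())


def bm25_scores(query: str, docs: list[str], k1: float = 1.5, b: float = 0.75) -> list[float]:
    """Okapi BM25 score of every doc against the query (higher is better)."""
    if not docs:
        return []
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    avgdl = sum(len(t) for t in tokenized) / n or 1.0
    # Document frequency: in how many docs each term appears.
    df = Counter(term for toks in tokenized for term in set(toks))
    query_terms = set(tokenize(query))

    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            # Rare terms (error codes, SKUs) get a high IDF weight.
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            denom = tf[term] + k1 * (1 - b + b * len(toks) / avgdl)
            score += idf * tf[term] * (k1 + 1) / denom
        scores.append(score)
    return scores


docs = [
    "Error E1234 means the access token expired.",
    "How to reset your password.",
    "Token refresh happens every hour.",
]
scores = bm25_scores("E1234", docs)
print(max(range(len(docs)), key=scores.__getitem__))  # 0
```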

&lt;p&gt;Then &lt;strong&gt;rerank&lt;/strong&gt;. Initial retrieval optimizes for recall — pull a generous candidate set (say, top 20). A cross-encoder reranker then scores those candidates against the query far more precisely and keeps the top handful you'll actually pass to the model. Retrieve broad, rerank narrow.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;candidates = vectorSearch(q, k=20) ∪ keywordSearch(q, k=20)
top = rerank(q, candidates)[:5]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
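&lt;p&gt;The pseudocode above merges results with a plain union. A runnable version follows, with one substitution: it merges the two ranked lists with reciprocal rank fusion (RRF, conventionally &lt;code&gt;k=60&lt;/code&gt;), a common choice because it needs only ranks, not comparable scores. The reranker is passed in as a function; &lt;code&gt;overlap_score&lt;/code&gt; is a toy stand-in for a real cross-encoder (for example a &lt;code&gt;CrossEncoder&lt;/code&gt; model from the sentence-transformers library), and all names here are hypothetical.&lt;/p&gt;

```python
from typing import Callable


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge ranked id lists; each list adds 1 / (k + rank) to every id it contains."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)


def hybrid_retrieve(
    query: str,
    vector_ids: list[str],
    keyword_ids: list[str],
    texts: dict[str, str],
    rerank_score: Callable[[str, str], float],
    top_n: int = 5,
) -> list[str]:
    """Retrieve broad (fused candidates), then rerank narrow (keep top_n)."""
    candidates = reciprocal_rank_fusion([vector_ids, keyword_ids])
    return sorted(candidates, key=lambda i: rerank_score(query, texts[i]), reverse=True)[:top_n]


def overlap_score(query: str, doc: str) -> float:
    # Toy stand-in for a cross-encoder: count shared lowercase words.
    return float(len(set(query.lower().split()).intersection(doc.lower().split())))


texts = {
    "d1": "reset password steps",
    "d2": "error E1234 token expired",
    "d3": "billing overview",
}
top = hybrid_retrieve("E1234 token expired", ["d1", "d3", "d2"], ["d2"], texts, overlap_score, top_n=2)
print(top)  # ['d2', 'd1']
```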



&lt;h3&gt;
  
  
  6. Grounded generation
&lt;/h3&gt;

&lt;p&gt;Now — and only now — the LLM. The job here is to keep it honest:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Pass only the retrieved context.&lt;/strong&gt; Don't let the model fall back on parametric memory for facts it should be reading.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Require citations.&lt;/strong&gt; Ask it to cite the chunk/source for each claim. Citations are both a UX feature and a hallucination check.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Give it permission to say "I don't know."&lt;/strong&gt; If the retrieved context doesn't answer the question, the correct output is a refusal, not a confident guess. Tell it that explicitly.
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;System: Answer ONLY from the context below. Cite sources by id.
If the context doesn't contain the answer, say you don't know.

Context:
[1] {chunk_1}
[2] {chunk_2}
...

Question: {user_query}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
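&lt;p&gt;The same prompt can be assembled, and its output checked, in code. This sketch numbers the retrieved chunks so they can be cited, refuses outright when nothing was retrieved, and treats any answer without a valid &lt;code&gt;[n]&lt;/code&gt; citation as ungrounded. The refusal text, the citation format, and the function names are assumptions; the model call itself is left out.&lt;/p&gt;

```python
import re
from typing import Optional

REFUSAL = "I don't know based on the provided sources."


def build_prompt(question: str, chunks: list[str]) -> Optional[str]:
    """Number each chunk so the model can cite it. None means: refuse, don't ask."""
    if not chunks:
        return None  # nothing retrieved, so there is nothing to ground an answer in
    context = "\n".join(f"[{i}] {text}" for i, text in enumerate(chunks, start=1))
    return (
        "Answer ONLY from the context below. Cite sources by id.\n"
        "If the context doesn't contain the answer, say you don't know.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )


def citations_valid(answer: str, num_chunks: int) -> bool:
    """True if the answer cites at least one id and every cited id was provided."""
    cited = {int(m) for m in re.findall(r"\[(\d+)\]", answer)}
    return bool(cited) and all(c in range(1, num_chunks + 1) for c in cited)


def finalize(answer: str, num_chunks: int) -> str:
    # An answer with no valid citation is treated as ungrounded and replaced.
    return answer if citations_valid(answer, num_chunks) else REFUSAL


print(build_prompt("How do I rotate API keys?", []))   # None
print(finalize("Rotate keys under Settings [1].", 2))  # Rotate keys under Settings [1].
print(finalize("Rotate keys under Settings [7].", 2))  # I don't know based on the provided sources.
```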



&lt;h2&gt;
  
  
  The patterns that separate prod from demo
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Hybrid + rerank&lt;/strong&gt;, not bare vector search. The single biggest quality jump.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Metadata filtering&lt;/strong&gt; for security and scoping — never retrieve across tenant or permission boundaries.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Citations and refusal&lt;/strong&gt; wired into the prompt, so wrong answers become "I don't know" instead of confident fiction.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Caching.&lt;/strong&gt; Cache embeddings (don't re-embed unchanged chunks) and cache answers to repeated queries.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;A retrieval eval set.&lt;/strong&gt; A fixed set of question → expected-source pairs you can score on every change.&lt;/li&gt;
&lt;/ul&gt;
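&lt;p&gt;A retrieval eval set can be as small as a list of question and expected-source pairs. This sketch scores recall@k (the share of expected sources that show up in the top k results, averaged over questions); &lt;code&gt;toy_retrieve&lt;/code&gt; and the file paths are hypothetical stand-ins for your real retriever and corpus. Rerun it on every chunking, embedding, or retrieval change so regressions show up as a number.&lt;/p&gt;

```python
from typing import Callable


def recall_at_k(
    eval_set: list[tuple[str, set[str]]],
    retrieve: Callable[[str, int], list[str]],
    k: int = 5,
) -> float:
    """Average, over questions, of the share of expected sources found in the top k."""
    per_question = []
    for question, expected in eval_set:
        retrieved = set(retrieve(question, k)[:k])
        per_question.append(len(retrieved.intersection(expected)) / len(expected))
    return sum(per_question) / len(per_question)


EVAL_SET = [
    ("How do I reset my password?", {"docs/password.md"}),
    ("What does error E1234 mean?", {"docs/errors.md", "docs/tokens.md"}),
]


def toy_retrieve(question: str, k: int) -> list[str]:
    # Stand-in for your real retriever (hybrid + rerank).
    canned = {
        "How do I reset my password?": ["docs/password.md", "docs/faq.md"],
        "What does error E1234 mean?": ["docs/errors.md", "docs/faq.md"],
    }
    return canned[question][:k]


print(recall_at_k(EVAL_SET, toy_retrieve, k=2))  # 0.75
```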

&lt;h2&gt;
  
  
  Common mistakes
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Fixed-size chunking.&lt;/strong&gt; The default that quietly caps your ceiling. Chunk on meaning.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Vector-only retrieval.&lt;/strong&gt; You'll routinely miss exact-match queries such as error codes and SKUs. Add keyword search.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;No reranking.&lt;/strong&gt; Stuffing the raw top-k into the prompt wastes context on near-misses.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Tuning the prompt to fix a retrieval problem.&lt;/strong&gt; If the right chunk isn't retrieved, the prompt is irrelevant. Diagnose retrieval first.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;No evaluation.&lt;/strong&gt; "It looks better" isn't a metric. Without an eval set you're guessing, and you'll regress silently.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Best practices
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Measure retrieval separately from generation.&lt;/strong&gt; Most failures are retrieval failures; isolate them. Track recall on your eval set.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Chunk on structure, then iterate.&lt;/strong&gt; Start with semantic boundaries and light overlap; adjust based on retrieval scores.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Default to hybrid + rerank.&lt;/strong&gt; Treat it as the baseline, not an optimization.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Filter by metadata for scope and security.&lt;/strong&gt; Especially in multi-tenant systems.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Force grounding and citations.&lt;/strong&gt; Answer only from context; cite; allow "I don't know."&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Re-embed on model change.&lt;/strong&gt; Version vectors so you know when a reindex is required.&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;RAG isn't one trick — it's a pipeline, and quality is set by its weakest stage. Get chunking right, retrieve hybrid and rerank, ground the generation, and &lt;em&gt;measure retrieval&lt;/em&gt; so you're improving the right thing. Do that and you cross the line from impressive demo to a system people can trust with real questions.&lt;/p&gt;

&lt;p&gt;Skip the engineering — relying on naive chunking and bare vector search — and you'll ship something that demos well and fails the moment real users ask real questions.&lt;/p&gt;

&lt;p&gt;Go deeper across &lt;a href="https://umesh-malik.com/topics/llm-engineering" rel="noopener noreferrer"&gt;LLM Engineering — RAG, Fine-Tuning &amp;amp; Production LLMs&lt;/a&gt;, revisit the &lt;a href="https://umesh-malik.com/blog/rag-vs-fine-tuning-llms-2026" rel="noopener noreferrer"&gt;RAG vs Fine-Tuning decision framework&lt;/a&gt;, or explore &lt;a href="https://umesh-malik.com/topics/ai-coding-agents" rel="noopener noreferrer"&gt;AI Coding Agents&lt;/a&gt; for the agentic side of LLM systems.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Explore more:&lt;/strong&gt; &lt;a href="https://umesh-malik.com/topics/llm-engineering" rel="noopener noreferrer"&gt;LLM Engineering&lt;/a&gt; · &lt;a href="https://umesh-malik.com/topics/ai-coding-agents" rel="noopener noreferrer"&gt;AI Coding Agents&lt;/a&gt; · &lt;a href="https://umesh-malik.com/topics/claude-code" rel="noopener noreferrer"&gt;Claude Code&lt;/a&gt;&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Originally published at &lt;a href="https://umesh-malik.com/blog/build-rag-pipeline-from-scratch" rel="noopener noreferrer"&gt;umesh-malik.com&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Keep reading on umesh-malik.com:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://umesh-malik.com/blog/rag-vs-fine-tuning-llms-2026" rel="noopener noreferrer"&gt;RAG vs Fine-Tuning for LLMs in 2026: A Production Decision Framework With Real Tradeoffs&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://umesh-malik.com/blog/autonomous-ai-agents-production-gap-2026" rel="noopener noreferrer"&gt;AI Agents That Run the Business in 2026: Why 77% Never Reach Production (and What the 23% Do Differently)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://umesh-malik.com/blog/how-to-build-mcp-server" rel="noopener noreferrer"&gt;How to Build a Production MCP Server (I Added One to My Site)&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>rag</category>
      <category>llmengineering</category>
      <category>vectordatabases</category>
      <category>embeddings</category>
    </item>
  </channel>
</rss>
