<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Tom Tokita</title>
    <description>The latest articles on DEV Community by Tom Tokita (@tomtokita).</description>
    <link>https://dev.to/tomtokita</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3840091%2F5ac3193c-0dc1-496a-b6d2-a7eb6e1556e7.jpg</url>
      <title>DEV Community: Tom Tokita</title>
      <link>https://dev.to/tomtokita</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/tomtokita"/>
    <language>en</language>
    <item>
      <title>Your Chatbot's Deflection Rate Went Up. Customers Just Gave Up.</title>
      <dc:creator>Tom Tokita</dc:creator>
      <pubDate>Mon, 29 Jun 2026 03:55:00 +0000</pubDate>
      <link>https://dev.to/tomtokita/your-chatbots-deflection-rate-went-up-customers-just-gave-up-3ldd</link>
      <guid>https://dev.to/tomtokita/your-chatbots-deflection-rate-went-up-customers-just-gave-up-3ldd</guid>
      <description>&lt;p&gt;Last month, I had a problem with a popular mobile banking app in Southeast Asia. Nothing exotic. A transaction didn't go through, and my support ticket had been sitting untouched for two weeks.&lt;/p&gt;

&lt;p&gt;So I opened the app's chatbot. It greeted me warmly, asked how it could help, and then couldn't do a single useful thing. It couldn't look up my transaction. It couldn't check the status of my ticket. It couldn't tell me why my issue was unresolved. It could answer FAQ questions, and that was it.&lt;/p&gt;

&lt;p&gt;I called the hotline instead. Spent an hour navigating prompts, got bounced between menus, and every path ended the same way: "Please contact our chatbot or check your existing ticket." The system was built for deflection, not resolution. The ticket that nobody had touched for fourteen days.&lt;/p&gt;

&lt;p&gt;I gave up. And somewhere in that company's dashboard, my interaction counted as a successful AI chatbot deflection.&lt;/p&gt;

&lt;p&gt;The uncomfortable part: if you shipped a deflection-optimized bot this quarter, a customer somewhere is living this exact loop right now. Your dashboard is calling it a win.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Deflection Metric Everyone Loves (and Nobody Questions)
&lt;/h2&gt;

&lt;p&gt;Deflection rate measures the percentage of customer contacts handled without a human agent. It's cheap to track, easy to celebrate, and it maps directly to cost savings. &lt;a href="https://www.lorikeetcx.ai/articles/resolve-not-deflect" rel="noopener noreferrer"&gt;Industry benchmarks&lt;/a&gt; citing McKinsey's 2026 service operations data put AI resolutions at $0.62 per ticket versus $7.40 for human agents. That's a 12x cost difference. Of course executives love this number.&lt;/p&gt;

&lt;p&gt;But deflection doesn't measure whether the customer's problem got solved. It measures whether the customer stopped asking. Those are very different things.&lt;/p&gt;

&lt;p&gt;This is Goodhart's Law applied to customer experience: when a measure becomes a target, it ceases to be a good measure. Deflection is cheap and easy to optimize. Resolution is hard and expensive to track. So companies optimize the proxy and stop looking at the goal.&lt;/p&gt;

&lt;p&gt;Gartner data, &lt;a href="https://www.forbes.com/sites/cindyrodriguezconstable/2026/05/31/the-missing-variable-in-every-ai-business-case-your-customer/" rel="noopener noreferrer"&gt;as reported by Forbes&lt;/a&gt;, confirms the gap: only 14% of customer issues are fully resolved through self-service channels. Even for the simplest cases, that number climbs to just 36%. Meanwhile, companies report deflection rates north of 60% and call it progress.&lt;/p&gt;

&lt;h2&gt;
  
  
  What It Actually Costs
&lt;/h2&gt;

&lt;p&gt;The financial damage is already showing up. A &lt;a href="https://www.forbes.com/sites/shephyken/2026/06/28/the-most-dangerous-ai-metric-is-the-one-that-says-youre-successful/" rel="noopener noreferrer"&gt;Laivly study published in June 2026&lt;/a&gt; found that 28% of leaders said AI had directly contributed to lost revenue because it couldn't handle complicated support issues. Another 20% knew there was lost revenue but couldn't even quantify it. Nearly half of leaders are aware their AI is costing them money, and many can't tell you how much.&lt;/p&gt;

&lt;p&gt;The market is already correcting. A &lt;a href="https://www.customerexperiencedive.com/news/why-three-quarters-of-enterprises-have-rolled-back-ai-agents/821140/" rel="noopener noreferrer"&gt;Sinch survey of 2,527 senior decision-makers&lt;/a&gt; found that 74% of enterprises have rolled back or shut down a customer-facing AI agent after deployment. The rollback rate was highest, 81%, among organizations with the most mature AI governance. The companies monitoring most closely found the most problems. The ones not monitoring? They're still celebrating.&lt;/p&gt;

&lt;p&gt;PwC's Consumer Intelligence Survey, &lt;a href="https://www.forbes.com/sites/cindyrodriguezconstable/2026/05/31/the-missing-variable-in-every-ai-business-case-your-customer/" rel="noopener noreferrer"&gt;cited in the same Forbes analysis&lt;/a&gt;, puts the downstream cost plainly: 44% of customers stopped buying from a company entirely after a trust breakdown. That revenue leaves quietly, with no complaint and no exit survey that ever reaches leadership.&lt;/p&gt;

&lt;h2&gt;
  
  
  What That Bot Was Actually Missing
&lt;/h2&gt;

&lt;p&gt;I build agentic systems. Not customer-facing support bots, but the same architectural components that make any AI agent useful instead of decorative: record lookup, scoped write access, escalation gates, anti-fabrication checks. These components work the same whether the agent serves you or your customer.&lt;/p&gt;

&lt;p&gt;The chatbot I dealt with had none of them. What frustrates me as a builder: the fix isn't a bigger model or a more expensive API. It's basic architecture.&lt;/p&gt;

&lt;p&gt;The bot knew who I was. I was logged in. It had my account ID, my phone number, my transaction history sitting in a database somewhere. But nobody gave it permission to read any of that, wired it to the ticketing system, or built an escalation rule that said "if a ticket has been open for 14 days with no response, flag it."&lt;/p&gt;

&lt;p&gt;That's not an AI problem. That's a permissions and plumbing problem.&lt;/p&gt;

&lt;h3&gt;
  
  
  What a properly wired support agent actually does
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;What the customer needs&lt;/th&gt;
&lt;th&gt;FAQ-skin chatbot&lt;/th&gt;
&lt;th&gt;Properly wired AI agent&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Recognize me&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Generic greeting, asks for details I already entered&lt;/td&gt;
&lt;td&gt;Auto-lookup: I'm logged in, it already knows my name, account, and open tickets&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Check my records&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;"Please visit our help center"&lt;/td&gt;
&lt;td&gt;Pulls my transaction history, sees the failed payment, checks my ticket status&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Take a basic action&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Can't&lt;/td&gt;
&lt;td&gt;Adds a follow-up note to my ticket, triggers a callback request, updates my case priority&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Read my frustration&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Oblivious, keeps looping through FAQ scripts&lt;/td&gt;
&lt;td&gt;Sentiment detection: my second message is sharper than my first, route to a human now&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Escalate without losing context&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;"Let me transfer you" (I start over from scratch)&lt;/td&gt;
&lt;td&gt;Human agent inherits the full thread, my account state, and what the bot already tried&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Enforce SLA&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Ticket sits for two weeks in silence&lt;/td&gt;
&lt;td&gt;Auto-escalate: no agent response in 48 hours triggers a supervisor notification&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Feel like a real conversation&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Corporate, stale, robotic&lt;/td&gt;
&lt;td&gt;Warm, direct, culturally aware persona that doesn't read like a terms-of-service document&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;None of this requires a frontier model. A well-structured agent with read access to CRM records, write access to ticket notes, and three escalation rules would have resolved my issue in two minutes. The technology exists. The architecture and permissions don't.&lt;/p&gt;

&lt;p&gt;If you want the general-purpose test for whether an AI tool is actually intelligent or just conversational, &lt;a href="https://tokita.online/how-to-spot-an-llm-wrapper/" rel="noopener noreferrer"&gt;I wrote a framework for that&lt;/a&gt;. What matters here is the customer-facing version: can your bot take an action on behalf of a specific customer, or can it only describe what a customer should do themselves?&lt;/p&gt;

&lt;h2&gt;
  
  
  Where Chatbots Actually Work
&lt;/h2&gt;

&lt;p&gt;This isn't an anti-AI argument. Bots are genuinely excellent at killing volume that should never have been a ticket in the first place. Password resets, account balance checks, delivery tracking, store hours, return policies. Deterministic, high-volume, low-context requests where the answer is the same regardless of who's asking.&lt;/p&gt;

&lt;p&gt;The failure boundary is precise: bots break when the answer depends on your specific data, your specific history, or your emotional state. A well-scoped bot that handles the first category and honestly hands off the second is a good system. A bot that pretends to handle both and quietly counts the frustrated customer as "deflected" is a liability dressed as a KPI.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Philippine Irony
&lt;/h2&gt;

&lt;p&gt;There's a specific irony here for the Philippines. This is the country that became the back office of the world by being better at customer empathy than anyone else. Filipino agents built a global BPO industry on patience, warmth, and actually solving the problem.&lt;/p&gt;

&lt;p&gt;Now Philippine banks and telcos are deploying deflection bots that strip out the one thing the BPO industry proved Filipinos do best.&lt;/p&gt;

&lt;p&gt;A &lt;a href="https://insiderph.com/twilio-survey-ai-chatbots-test-filipinos-patience-most" rel="noopener noreferrer"&gt;Twilio/YouGov survey of 7,331 adults across seven Asia-Pacific markets&lt;/a&gt; captured the tension: 90% of Filipino respondents said their society values patience and politeness in daily interactions. But only 32% said they have any patience left for automated customer service. Filipinos become more frustrated than their Asia-Pacific peers when AI systems give scripted or robotic responses.&lt;/p&gt;

&lt;p&gt;The &lt;a href="https://www.swarm.work/philippine-ai-report" rel="noopener noreferrer"&gt;Philippine AI Report 2025&lt;/a&gt; found that 92% of Philippine organizations have experimented with AI, but 65% remain stuck at proof-of-concept. The gap between experimenting with AI and wiring it into the systems that matter is where the deflection illusion lives. The technology arrived. The integration didn't.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Monday Morning Test
&lt;/h2&gt;

&lt;p&gt;Pull your chatbot's deflection rate. Now try to pull its resolution rate. If your dashboard can't show you the second number, you already have your answer.&lt;/p&gt;

&lt;p&gt;The fix isn't bigger models or more sophisticated NLP. It's plumbing. Give your bot read access to customer records. Give it write access to ticket notes. Build three escalation rules: sentiment spike, repeated question, SLA breach. That's the difference between a support agent and a search bar with a personality.&lt;/p&gt;

&lt;p&gt;A sticker on a broken car is still a broken car.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Originally published at &lt;a href="https://tokita.online/ai-chatbot-deflection/" rel="noopener noreferrer"&gt;tokita.online&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Related reading:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://tokita.online/ai-expert-philippines/" rel="noopener noreferrer"&gt;What running AI in production actually looks like&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://tokita.online/how-to-spot-an-llm-wrapper/" rel="noopener noreferrer"&gt;How to tell if your AI tool is just a dropdown&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://tokita.online/ai-agent-production-safety/" rel="noopener noreferrer"&gt;The architecture that stopped an AI agent from deleting a production database&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>ai</category>
      <category>programming</category>
      <category>devops</category>
      <category>machinelearning</category>
    </item>
    <item>
      <title>I Opened DevTools on an "AI-Native" Platform. It Was a Dropdown.</title>
      <dc:creator>Tom Tokita</dc:creator>
      <pubDate>Sun, 14 Jun 2026 17:02:29 +0000</pubDate>
      <link>https://dev.to/tomtokita/i-opened-devtools-on-an-ai-native-platform-it-was-a-dropdown-5g03</link>
      <guid>https://dev.to/tomtokita/i-opened-devtools-on-an-ai-native-platform-it-was-a-dropdown-5g03</guid>
      <description>&lt;p&gt;An LLM wrapper can hide in plain sight. I sat across from an AI startup founder a few weeks ago. He was pitching his platform. "AI-native workspace." "Agentic AI." Over 40 built-in agents. He leaned in and asked if I really understood what his system could do.&lt;/p&gt;

&lt;p&gt;I went home and signed up.&lt;/p&gt;

&lt;p&gt;Then I opened DevTools.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I found in the network tab
&lt;/h2&gt;

&lt;p&gt;The platform had an AI chat feature. I asked it a question. While it was answering, I watched the network requests in my browser.&lt;/p&gt;

&lt;p&gt;One request stood out. The platform's own frontend was sending my query to its backend with two extra fields:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="nl"&gt;"model"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"google/gemini-3.1-pro-preview"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="nl"&gt;"provider"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"openrouter"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's &lt;a href="https://openrouter.ai/" rel="noopener noreferrer"&gt;OpenRouter's&lt;/a&gt; model identifier format. The platform was routing my prompt through OpenRouter to Google's Gemini model. The same Gemini you can access directly through &lt;a href="https://aistudio.google.com/" rel="noopener noreferrer"&gt;Google AI Studio&lt;/a&gt; for free, or through the API for &lt;a href="https://openrouter.ai/google/gemini-3.1-pro-preview" rel="noopener noreferrer"&gt;a few dollars per million tokens&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;The platform charges tens of dollars per user, per month.&lt;/p&gt;

&lt;h2&gt;
  
  
  The denial
&lt;/h2&gt;

&lt;p&gt;I asked the AI agent directly: "Are you Gemini?"&lt;/p&gt;

&lt;p&gt;It responded: "I am not Gemini or affiliated with any other external AI provider."&lt;/p&gt;

&lt;p&gt;The system prompt instructs the model to deny its own identity. Meanwhile, the HTTP request payload contains the model name in plaintext. The network layer doesn't lie, even when the system prompt does.&lt;/p&gt;

&lt;h2&gt;
  
  
  "Not a wrapper" on the same page as "switch your model"
&lt;/h2&gt;

&lt;p&gt;Here's where it gets interesting. This platform's marketing page includes the phrase: "Intelligence baked into the Core. Not a wrapper."&lt;/p&gt;

&lt;p&gt;Three sections below that, on the same page: "Multi-Model Support: Switch your agent's brain between GPT-4o, Claude 3.5, or Gemini Pro."&lt;/p&gt;

&lt;p&gt;Read that again. If your AI's "brain" can be swapped between three different providers with a dropdown, that is the definition of a wrapper. Proprietary AI doesn't have a "switch model" button. That's like a restaurant claiming they cook everything from scratch while the menu says "choose your meal kit provider."&lt;/p&gt;

&lt;h2&gt;
  
  
  What the "agents" actually are
&lt;/h2&gt;

&lt;p&gt;The platform advertises "40+ built-in AI agents." I opened one to see the configuration.&lt;/p&gt;

&lt;p&gt;The entire agent profile consisted of three fields: an icon, a name ("Product Manager"), and a 100-character description ("Leads product development from conception to launch, balancing user needs with business goals.").&lt;/p&gt;

&lt;p&gt;No tools attached, no retrieval pipeline, no reasoning chain. The API response confirmed it: &lt;code&gt;tool_calls&lt;/code&gt; was empty, &lt;code&gt;knowledge_base&lt;/code&gt; was empty, &lt;code&gt;reasoning_steps&lt;/code&gt; was empty, &lt;code&gt;references&lt;/code&gt; was empty.&lt;/p&gt;

&lt;p&gt;These aren't agents. They're prompt templates with a role label. The model receives your question plus a one-liner about who it's supposed to be, and autocompletes. Every "agent" runs on the same model through the same passthrough. The only difference is the 100-character system instruction.&lt;/p&gt;

&lt;p&gt;Compare that to what an actual agentic system requires: &lt;a href="https://tokita.online/ai-agent-pre-action-gate-tutorial/" rel="noopener noreferrer"&gt;mechanical enforcement gates&lt;/a&gt;, model pinning, persistent memory, session continuity, tool orchestration, and the ability to refuse dangerous actions before they happen. That's infrastructure, not a dropdown.&lt;/p&gt;

&lt;h2&gt;
  
  
  How to tell if your AI tool is a wrapper
&lt;/h2&gt;

&lt;p&gt;You don't need to reverse-engineer anything. Five checks, five minutes.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Check&lt;/th&gt;
&lt;th&gt;What to look for&lt;/th&gt;
&lt;th&gt;Wrapper signal&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Model selector&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Can you switch between GPT, Claude, Gemini, or others?&lt;/td&gt;
&lt;td&gt;If yes, the AI is a routing layer, not proprietary.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;BYOK option&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Is there a "Bring Your Own API Key" field in settings?&lt;/td&gt;
&lt;td&gt;If yes, the platform is forwarding your prompts to a third-party provider.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Developer docs&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Does the platform have a public API? Check /developers, /api, /docs.&lt;/td&gt;
&lt;td&gt;If all return 404, there's no programmatic access because there's nothing proprietary to expose.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Network tab&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Open DevTools (F12), send a message, check the request payload.&lt;/td&gt;
&lt;td&gt;Look for model identifiers like &lt;code&gt;gpt-4o&lt;/code&gt;, &lt;code&gt;claude-3.5-sonnet&lt;/code&gt;, &lt;code&gt;gemini-pro&lt;/code&gt;, or provider fields like &lt;code&gt;openrouter&lt;/code&gt;.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Ask the AI&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Ask "What model are you?" then ask "Are you [specific model]?"&lt;/td&gt;
&lt;td&gt;If it deflects the first question and denies the second, a system prompt is hiding the identity.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;If three or more of these checks light up, you're paying a subscription for a UI layer on top of an API you could call directly.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why this matters beyond price
&lt;/h2&gt;

&lt;p&gt;The cost arbitrage is obvious. But there are two less visible problems.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Your data takes an extra hop.&lt;/strong&gt; When you type into one of these platforms, your prompt travels through at least three parties: the wrapper, the routing service (like OpenRouter), and the model provider (like Google). Each party has its own &lt;a href="https://openrouter.ai/docs/guides/privacy/data-collection" rel="noopener noreferrer"&gt;data retention policy&lt;/a&gt;. OpenRouter doesn't store prompts by default, but metadata is always logged, and third-party provider policies vary. Most users never audit past the wrapper's own privacy page.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Token costs are uncontrolled.&lt;/strong&gt; Without &lt;a href="https://tokita.online/context-engineering-vs-prompt-engineering/" rel="noopener noreferrer"&gt;proper context engineering&lt;/a&gt;, every prompt goes raw to whichever model the dropdown points at. No caching, no routing intelligence, no cost gates. The user picks the most expensive model because it's at the top of the list, and the platform has no incentive to optimize because they're not paying the inference bill. You are, through your subscription, subsidizing unoptimized API calls.&lt;/p&gt;

&lt;h2&gt;
  
  
  Wrappers aren't the problem. Dishonesty is.
&lt;/h2&gt;

&lt;p&gt;I want to be clear: there's nothing wrong with building on top of third-party models. I do it every day. My own research tooling routes queries through Gemini. My daily coding environment is a wrapper around Claude. I've adapted an open-source model (Gemma) for private, on-device mobile inference because the use case demanded it. Wrappers, when properly configured with cost controls, prompt engineering, and data governance, are a legitimate architecture.&lt;/p&gt;

&lt;p&gt;The problem is calling it "proprietary" when you're not. The problem is marketing "AI-native intelligence baked into the core" when the core is someone else's API behind a dropdown. The problem is instructing your model to deny its own identity when a customer asks.&lt;/p&gt;

&lt;p&gt;If you're building a great UI on top of Gemini and charging for the convenience, say that. Plenty of successful products do exactly that, honestly. But when you tell a customer your AI is "robust" and "not a wrapper" while your frontend sends &lt;code&gt;"provider": "openrouter"&lt;/code&gt; in every request payload, that's not marketing. That's misrepresentation.&lt;/p&gt;

&lt;h2&gt;
  
  
  The pattern is accelerating
&lt;/h2&gt;

&lt;p&gt;I wrote about &lt;a href="https://tokita.online/llm-wrappers-what-actually-matters/" rel="noopener noreferrer"&gt;how to evaluate AI tools beyond the marketing&lt;/a&gt; back in March. At the time, the wrapper problem was mostly startups padding their pitch decks. It's worse now. These platforms have real revenue, real customers, and real marketing budgets. They're profitable because the per-token cost of routing API calls is fractions of a cent while the per-seat subscription is dollars.&lt;/p&gt;

&lt;p&gt;This isn't going away. If anything, it's going to intensify as model APIs get cheaper and easier to integrate. The barrier to building a "40+ AI agent platform" is a weekend, an OpenRouter account, and a landing page.&lt;/p&gt;

&lt;p&gt;The fix isn't regulation or outrage. It's literacy. Open DevTools. Check the network tab. Read the request payload. The information is right there, in plaintext, every time you send a message.&lt;/p&gt;

&lt;p&gt;Your AI tool's identity isn't a secret. It's just one F12 away.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>programming</category>
      <category>security</category>
      <category>webdev</category>
    </item>
    <item>
      <title>I Built a Live Disaster Tracker for the Philippines in a Weekend</title>
      <dc:creator>Tom Tokita</dc:creator>
      <pubDate>Wed, 10 Jun 2026 01:31:30 +0000</pubDate>
      <link>https://dev.to/tomtokita/i-built-a-live-disaster-tracker-for-the-philippines-in-a-weekend-114e</link>
      <guid>https://dev.to/tomtokita/i-built-a-live-disaster-tracker-for-the-philippines-in-a-weekend-114e</guid>
      <description>&lt;p&gt;The Philippines sits on the Pacific Ring of Fire and right in the typhoon belt. Earthquakes, typhoons, floods, volcanic eruptions, wildfires. If you live here, disaster season isn't a season. It's the calendar.&lt;/p&gt;

&lt;p&gt;When a typhoon hits the Philippines or an earthquake jolts you awake at 3 AM, you're checking PAGASA for storm bulletins, PHIVOLCS for seismic data, GDACS for severity alerts. Each one covers a single hazard type. None of them give you one clean view of what's happening. A bagyo is bearing down on Luzon and you're toggling between three tabs trying to figure out: how bad is it, and where?&lt;/p&gt;

&lt;p&gt;So I built a &lt;a href="https://sakuna.tokita.online/about/" rel="noopener noreferrer"&gt;Philippines disaster tracker called Sakuna&lt;/a&gt;. Real-time earthquake monitoring, typhoon tracking, fire hotspots, flood alerts, and volcanic activity on a single map and dashboard.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Weekend Warrior Approach
&lt;/h2&gt;

&lt;p&gt;I didn't block out a sprint for this. I had a pain point, a free Saturday, and a glass of whiskey. I sat down with Claude (my AI coding partner, via &lt;a href="https://claude.ai/code" rel="noopener noreferrer"&gt;Claude Code&lt;/a&gt;) and we mapped out the requirements in about ten minutes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Aggregate data from every free public disaster API that covers the Philippines&lt;/li&gt;
&lt;li&gt;Normalize everything into one schema so earthquakes, typhoons, and fires show up the same way&lt;/li&gt;
&lt;li&gt;Host it for free. Zero monthly cost.&lt;/li&gt;
&lt;li&gt;Ship it live. Not a side project that sits in a repo.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The entire thing went from a conversation to a production deployment in a weekend's worth of sessions. Not because I'm fast, but because the infrastructure was already there.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Stack: Cloudflare Free Tier, Zero Cost
&lt;/h2&gt;

&lt;p&gt;Everything runs on Cloudflare's free tier. No servers. No database bills. No Hostinger, no Vercel, no AWS.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;CF Worker (cron every 15 min)
  ├── Fetches 5 public APIs
  ├── Normalizes events to unified schema
  ├── Filters to Philippines bounding box
  └── Stores results in R2 object storage

Cloudflare R2
  ├── events.json (all current disaster events)
  └── health metadata (per-API status + timestamps)

Cloudflare Pages (static frontend)
  ├── Leaflet dark-theme disaster map
  ├── Dashboard cards by hazard type
  ├── Filter chips (earthquake, typhoon, fire, volcano, flood, rain)
  └── Staleness indicator (LIVE / DELAYED / OFFLINE)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The worker runs on a cron trigger every 15 minutes and writes static JSON to R2. The frontend polls that cached JSON from Cloudflare Pages (which has unlimited free bandwidth), not the Worker itself. So the Worker stays well within the free tier's 100K requests/day limit, and the frontend can poll every 30 seconds without cost implications. Total Cloudflare bill: $0.&lt;/p&gt;

&lt;h2&gt;
  
  
  Five Free APIs, One Unified Disaster Schema
&lt;/h2&gt;

&lt;p&gt;The hardest part wasn't the code. It was figuring out which government and scientific APIs actually work, which ones have usable data, and how to normalize them into something consistent.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;API&lt;/th&gt;
&lt;th&gt;What It Covers&lt;/th&gt;
&lt;th&gt;Data Format&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://earthquake.usgs.gov/fdsnws/event/1/" rel="noopener noreferrer"&gt;USGS FDSNWS&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;Earthquakes M2.0+ in the Philippines&lt;/td&gt;
&lt;td&gt;GeoJSON&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://eonet.gsfc.nasa.gov/api/v3/events" rel="noopener noreferrer"&gt;NASA EONET&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;Volcanoes, wildfires, storms&lt;/td&gt;
&lt;td&gt;JSON&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://www.gdacs.org/" rel="noopener noreferrer"&gt;GDACS&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;Multi-hazard severity alerts&lt;/td&gt;
&lt;td&gt;RSS/XML&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://firms.modaps.eosdis.nasa.gov/" rel="noopener noreferrer"&gt;NASA FIRMS&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;Satellite fire hotspot detection&lt;/td&gt;
&lt;td&gt;CSV&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://pagasa.dost.gov.ph/" rel="noopener noreferrer"&gt;PAGASA&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;Philippine typhoon bulletins&lt;/td&gt;
&lt;td&gt;HTML (scraped)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Every event from every API gets normalized into one schema:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"usgs_us7000abcd"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"earthquake"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"title"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"M5.2 - 23km SE of Davao"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"severity"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"high"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"lat"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;6.9&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"lon"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;125.5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"occurred_at"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"2026-04-10T03:15:00Z"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"url"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"https://earthquake.usgs.gov/..."&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Severity mapping is specific to each data provider. Earthquakes use magnitude thresholds (M4.0+ = moderate, M5.0+ = high, M6.0+ = critical). GDACS uses its own green/orange/red alert system. FIRMS maps satellite confidence scores. PAGASA typhoon signal numbers translate directly to severity levels.&lt;/p&gt;

&lt;h2&gt;
  
  
  Graceful Degradation: When APIs Go Down
&lt;/h2&gt;

&lt;p&gt;This was a deliberate design decision. If NASA's FIRMS API goes down (it does, regularly), the other four keep working. Each data provider has its own health indicator on the dashboard. One going offline dims its status pill. The earthquake map, typhoon tracking, and flood data keep updating.&lt;/p&gt;

&lt;p&gt;For a disaster tracker, this matters. The worst time for a monitoring tool to break is exactly when people need it.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Philippines Bounding Box Problem
&lt;/h2&gt;

&lt;p&gt;You can't just query these APIs globally and filter later. The data volumes are too high and the APIs rate-limit you. So every request is scoped to a geographic bounding box covering the Philippine archipelago: &lt;code&gt;[116.9, 4.5] to [127.0, 21.5]&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;But the Philippines has a geographic quirk. The southern boundary overlaps with Malaysia's Sabah region. Without a second filter, you get fire hotspots from Borneo showing up as Philippine events.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="kd"&gt;function&lt;/span&gt; &lt;span class="nf"&gt;inPhBbox&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;lat&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;lon&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;lat&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mf"&gt;21.5&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="nx"&gt;lat&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="mf"&gt;4.5&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="nx"&gt;lon&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mf"&gt;127.0&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="nx"&gt;lon&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="mf"&gt;116.9&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="c1"&gt;// Below lat 7: only include Sulu/Tawi-Tawi, exclude Sabah&lt;/span&gt;
  &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;lat&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="mf"&gt;7.0&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="nx"&gt;lon&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="mf"&gt;119.0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Small detail, but it matters. Nobody wants to see Malaysian wildfires on a Philippines disaster map.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why This Didn't Take Longer
&lt;/h2&gt;

&lt;p&gt;I've been running &lt;a href="https://claude.ai/code" rel="noopener noreferrer"&gt;Claude Code&lt;/a&gt; as my primary development tool for months. Over time I've built an operational layer around it: persistent memory, deployment pipelines, API connector patterns, and gates that catch mistakes before they ship.&lt;/p&gt;

&lt;p&gt;When I sat down to build Sakuna, I didn't start from zero. The AI already had &lt;a href="https://tokita.online/context-engineering-vs-prompt-engineering/" rel="noopener noreferrer"&gt;context&lt;/a&gt; on my Cloudflare patterns, my code conventions, and my deploy commands. And the &lt;a href="https://tokita.online/what-is-harness-engineering/" rel="noopener noreferrer"&gt;harness&lt;/a&gt; I'd built around it handled the operational side: verifying deploy targets, enforcing structure, preventing the kind of drift that turns a weekend project into a weekend debugging session.&lt;/p&gt;

&lt;p&gt;I didn't build fast because I typed fast. I built fast because months of workflow engineering compressed the distance between "I have an idea" and "it's in production."&lt;/p&gt;

&lt;h2&gt;
  
  
  The Result
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://sakuna.tokita.online" rel="noopener noreferrer"&gt;Sakuna&lt;/a&gt; is live, free, and serving real data to anyone who needs it. It tracks earthquakes in the Philippines in real time, monitors active typhoons, detects fire hotspots via satellite, and surfaces flood and volcanic alerts. It auto-updates every 15 minutes. It costs nothing to run.&lt;/p&gt;

&lt;p&gt;Is it perfect? No. The PAGASA typhoon parser is regex-based and fragile against page redesigns. PHIVOLCS isn't integrated yet because their SSL certificate has issues. There's no historical archive. These are known limitations, not surprises.&lt;/p&gt;

&lt;p&gt;But it works. It's useful. And it shipped.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Weekend Ritual
&lt;/h2&gt;

&lt;p&gt;I keep coming back to Sakuna between other projects. Add marker clustering one evening. Tighten the bounding box to exclude Sabah fire noise another morning. Build a crawlable about page with FAQ schema over lunch. Wire up a rainfall overlay from Open-Meteo weather stations on a Sunday.&lt;/p&gt;

&lt;p&gt;It's become one of my weekend rituals. Find something that could be better about the tracker, plan the fix with Claude, ship it before dinner.&lt;/p&gt;

&lt;p&gt;The point isn't that I built a disaster tracker. The point is that the tools exist to go from a pain point to production in a weekend, if your workflow is set up for it. The APIs are free. The hosting is free. The AI assistant costs less than a coffee subscription. The only real cost is your time and maybe a decent whiskey.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://sakuna.tokita.online" rel="noopener noreferrer"&gt;Sakuna is live at sakuna.tokita.online&lt;/a&gt;. If you're in the Philippines during the next typhoon or earthquake, it might be useful. If you're a developer with a side project idea collecting dust, maybe this is the nudge.&lt;/p&gt;

&lt;p&gt;What free APIs or Cloudflare patterns have you used for your own weekend projects? I'm always looking for new data pipelines to wire in.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;&lt;a href="https://tokita.online/bio/" rel="noopener noreferrer"&gt;Tom Tokita&lt;/a&gt; is an AI Operations Architect in Manila. He writes about what works (and what breaks) when you put AI agents into production.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>cloudflare</category>
      <category>javascript</category>
      <category>api</category>
      <category>webdev</category>
    </item>
    <item>
      <title>Hackers Didn't Break Into Instagram. They Exposed the Biggest Agentic AI Security Risk in Production.</title>
      <dc:creator>Tom Tokita</dc:creator>
      <pubDate>Wed, 03 Jun 2026 05:35:41 +0000</pubDate>
      <link>https://dev.to/tomtokita/hackers-didnt-break-into-instagram-they-exposed-the-biggest-agentic-ai-security-risk-in-4j2j</link>
      <guid>https://dev.to/tomtokita/hackers-didnt-break-into-instagram-they-exposed-the-biggest-agentic-ai-security-risk-in-4j2j</guid>
      <description>&lt;p&gt;Nobody hacked Instagram. What happened was worse: an AI chatbot security failure that let attackers walk through the front door.&lt;/p&gt;

&lt;p&gt;That needs to be the first thing you understand about what happened on June 1, 2026. There was no zero-day exploit. No SQL injection. No brute-force password cracking. Hackers &lt;a href="https://krebsonsecurity.com/2026/06/hackers-used-metas-ai-support-bot-to-seize-instagram-accounts/" rel="noopener noreferrer"&gt;used a VPN to fake their location&lt;/a&gt;, opened Meta's AI support chatbot, and asked it to change the email on someone else's account.&lt;/p&gt;

&lt;p&gt;The bot did it.&lt;/p&gt;

&lt;p&gt;It sent a verification code to the attacker's email. The attacker verified it. Then they got a password reset link. That was the entire exploit. Instructions for doing it &lt;a href="https://krebsonsecurity.com/2026/06/hackers-used-metas-ai-support-bot-to-seize-instagram-accounts/" rel="noopener noreferrer"&gt;circulated on Telegram&lt;/a&gt; within hours. High-profile accounts fell fast: the &lt;a href="https://www.404media.co/hackers-simply-asked-meta-ai-to-give-them-access-to-high-profile-instagram-accounts-it-worked/" rel="noopener noreferrer"&gt;Obama-era White House Instagram&lt;/a&gt; was defaced with pro-Iran content. The Chief Master Sergeant of the U.S. Space Force lost access. Jane Manchun Wong, a former Meta security engineer, had her &lt;a href="https://x.com/wongmjane/status/2061456887959474393" rel="noopener noreferrer"&gt;password changed without her knowledge&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Meta spokesperson Andy Stone &lt;a href="https://x.com/andymstone/status/2061486724199379186" rel="noopener noreferrer"&gt;confirmed the vulnerability was real&lt;/a&gt; and said they were "securing impacted accounts."&lt;/p&gt;

&lt;p&gt;One user on X summed it up better than any post-mortem could: "We're at the point where one AI stole it and another can't fix it, &lt;a href="https://www.bbc.com/news/articles/c98rzr72dpyo" rel="noopener noreferrer"&gt;zero humans in the loop anywhere&lt;/a&gt;."&lt;/p&gt;

&lt;h2&gt;
  
  
  The Instagram AI Hack Exposed a Deeper Pattern of Autonomous AI Risks
&lt;/h2&gt;

&lt;p&gt;The Instagram AI hack isn't an isolated incident. It's a symptom of a deeper set of autonomous AI risks that the industry keeps ignoring. The pattern always looks the same: an AI system with too much authority, too little verification, and no human checkpoint between intent and execution.&lt;/p&gt;

&lt;p&gt;You've seen this before.&lt;/p&gt;

&lt;p&gt;OpenClaw gave dozens of autonomous agents access to OpenAI's API with no budget gates. The result was a &lt;a href="https://tokita.online/openclaw-ai-agent-cost-reality/" rel="noopener noreferrer"&gt;$1.3 million bill&lt;/a&gt; that nobody noticed until the invoice arrived. Different domain, same architecture: agents running without boundaries, consequences discovered after the damage.&lt;/p&gt;

&lt;p&gt;A startup called PocketOS gave an AI agent write access to a production database with no pre-action gate. The agent &lt;a href="https://tokita.online/ai-agent-production-safety/" rel="noopener noreferrer"&gt;deleted everything in 9 seconds&lt;/a&gt;. There was no confirmation step, no rollback trigger, no human checkpoint.&lt;/p&gt;

&lt;p&gt;Security researchers found &lt;a href="https://tokita.online/ai-supply-chain-attack-575-malicious-skills/" rel="noopener noreferrer"&gt;575 malicious AI skills&lt;/a&gt; published to open registries. Tools that looked legitimate but contained prompt injection payloads, credential harvesting, and data exfiltration. The trust model was: if it's in the registry, it's safe. Nobody verified.&lt;/p&gt;

&lt;p&gt;Four incidents. Four different consequences. One architectural failure.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Incident&lt;/th&gt;
&lt;th&gt;What Failed&lt;/th&gt;
&lt;th&gt;AI Guardrail That Prevents It&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Meta Instagram AI hack&lt;/td&gt;
&lt;td&gt;No identity verification on account changes&lt;/td&gt;
&lt;td&gt;Human-in-the-loop for identity operations&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;OpenClaw $1.3M bill&lt;/td&gt;
&lt;td&gt;No token budget limits on autonomous agents&lt;/td&gt;
&lt;td&gt;Consumption governance with per-agent caps&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;PocketOS database deletion&lt;/td&gt;
&lt;td&gt;No pre-action gate on destructive operations&lt;/td&gt;
&lt;td&gt;Pre-action confirmation for write/delete&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;575 malicious AI skills&lt;/td&gt;
&lt;td&gt;No provenance checks on tool registry&lt;/td&gt;
&lt;td&gt;Supply chain verification&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  Why AI Chatbot Security Fails: The Guru Dream vs. Production Reality
&lt;/h2&gt;

&lt;p&gt;The AI influencer pitch goes like this: deploy autonomous agents, remove humans from the loop, let the AI handle it. Scale your support team with chatbots. Replace your QA with agents. Automate your entire deployment pipeline. The future is autonomous everything.&lt;/p&gt;

&lt;p&gt;That pitch sounds compelling until you see what happens when it ships.&lt;/p&gt;

&lt;p&gt;Meta replaced human support staff with an AI chatbot to handle account recovery. Account recovery is one of the most sensitive operations on any platform because the person asking for access may not be the owner. Marijus Briedis, CTO of NordVPN, &lt;a href="https://www.bbc.com/news/articles/c98rzr72dpyo" rel="noopener noreferrer"&gt;put it plainly&lt;/a&gt;: when AI chatbots have "too much authority and too little verification, they can become a serious security risk."&lt;/p&gt;

&lt;p&gt;This is the meta AI vulnerability in plain language: too much authority, no verification checkpoint, no human override.&lt;/p&gt;

&lt;p&gt;The guru pitch consistently leaves this out. &lt;a href="https://tokita.online/autonomous-ai-agents-production-cost/" rel="noopener noreferrer"&gt;Autonomous agents fail in production&lt;/a&gt; not because the models are bad, but because the &lt;a href="https://tokita.online/what-is-harness-engineering/" rel="noopener noreferrer"&gt;harness is missing&lt;/a&gt;. The models will do exactly what you ask them to do. That's the problem. If you ask a chatbot to change an email address and it has the authority to do so, it will. It won't stop to wonder whether you should be making that request.&lt;/p&gt;

&lt;p&gt;The agentic AI security risks aren't theoretical. They're the documented, repeated consequence of deploying AI systems without gates.&lt;/p&gt;

&lt;h2&gt;
  
  
  If Meta's AI Vulnerability Exposed Millions, What About Your AI Agents?
&lt;/h2&gt;

&lt;p&gt;Meta is &lt;a href="https://www.bbc.com/news/articles/c98rzr72dpyo" rel="noopener noreferrer"&gt;one of the most valuable tech companies on the planet&lt;/a&gt;. They employ some of the best security engineers in the world. They have red teams, bug bounties, and incident response playbooks that most organizations can only dream about.&lt;/p&gt;

&lt;p&gt;And their AI support chatbot was tricked with a VPN and a politely worded request.&lt;/p&gt;

&lt;p&gt;Now think about the solo developer who watched a YouTube tutorial on building AI agents last month. Someone who learned to &lt;a href="https://tokita.online/vibe-coding-risks-vercel-breach/" rel="noopener noreferrer"&gt;vibe code&lt;/a&gt; an LLM into an API, built a prototype over a weekend, showed it to a client, and is now planning to deploy it. No pre-action gate. No human-in-the-loop for sensitive operations. No &lt;a href="https://tokita.online/context-engineering-vs-prompt-engineering/" rel="noopener noreferrer"&gt;context engineering&lt;/a&gt; to constrain what the agent can access. No token budget to limit runaway costs. No drift detection to catch when the agent starts behaving differently from what was intended.&lt;/p&gt;

&lt;p&gt;That developer isn't negligent. They just never learned the fundamentals because the fundamentals aren't what gets amplified. The conference talks are about what AI can do, not what it shouldn't be allowed to do unsupervised.&lt;/p&gt;

&lt;h2&gt;
  
  
  AI Guardrails That Would Have Stopped Every Incident in This Article
&lt;/h2&gt;

&lt;p&gt;This isn't a "don't use AI" argument. AI agents are powerful tools. I run multiple AI systems in production daily and they do real work. But they work because they run inside a &lt;a href="https://tokita.online/what-is-harness-engineering/" rel="noopener noreferrer"&gt;harness&lt;/a&gt; with mechanical constraints, not because they're trustworthy by default.&lt;/p&gt;

&lt;p&gt;Here's a list of AI guardrails that would have stopped every incident above. None of these are new. They've just been drowned out by hype.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Pre-action gates.&lt;/strong&gt; Every sensitive operation needs a verification step before execution. &lt;a href="https://tokita.online/ai-agent-pre-action-gate-tutorial/" rel="noopener noreferrer"&gt;Here's how to build one&lt;/a&gt;. Account changes, data deletion, financial transactions, deployment commands. None of these should execute on a single request without verification.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Human-in-the-loop for identity operations.&lt;/strong&gt; If a process determines who has access to what, a human must be in the decision chain. This isn't optional. Meta learned this the hard way.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Context boundaries.&lt;/strong&gt; An AI agent should only access what it needs for the current task. Meta's support bot had write access to email addresses on any account. That's an authorization failure before it's an AI failure.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Consumption governance.&lt;/strong&gt; &lt;a href="https://tokita.online/tokenmaxxing-enterprise-ai-cost-crisis/" rel="noopener noreferrer"&gt;Token costs are real&lt;/a&gt; and compound fast. Budget caps, per-agent limits, and alert thresholds aren't overhead. They're infrastructure.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Supply chain verification.&lt;/strong&gt; Every tool, plugin, and skill in your agent's registry needs provenance checks. Trusting by default &lt;a href="https://tokita.online/ai-supply-chain-attack-575-malicious-skills/" rel="noopener noreferrer"&gt;is the new attack surface&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Drift detection.&lt;/strong&gt; Agents change behavior as models update, prompts shift, and context windows compress. If you aren't monitoring for behavioral drift, you won't know your system has degraded until a user tells you. Or until it shows up on X.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The gurus will tell you these are easy to implement. They aren't. Each one takes real iteration: building the gate, testing it against actual edge cases, discovering the scenarios you didn't anticipate, and testing again. Automated test suites catch regressions. They don't catch the moment an AI agent interprets a legitimate-looking request in a way no one predicted. These are critical security functions. They need human eyes, human judgment, and human testing before they go anywhere near production. Over-reliance on agentic automation to validate agentic automation is how you end up right back where Meta started.&lt;/p&gt;

&lt;h2&gt;
  
  
  Agentic AI Security Risks Are Architectural, Not Theoretical
&lt;/h2&gt;

&lt;p&gt;Every incident in this article was preventable. Not with better models. Not with bigger budgets. With fundamentals that take days to learn and hours to implement.&lt;/p&gt;

&lt;p&gt;The &lt;a href="https://tokita.online/why-multi-agent-ai-fails/" rel="noopener noreferrer"&gt;multi-agent swarm pitch&lt;/a&gt; will keep getting recycled. The next AI chatbot vulnerability will happen. Another startup will give an agent write access to something it shouldn't have. These aren't predictions. They're extrapolations from a pattern that hasn't changed.&lt;/p&gt;

&lt;p&gt;Agentic AI security risks are architectural problems. They don't get solved by better prompts or smarter models. They get solved by &lt;a href="https://tokita.online/best-llm-for-each-task/" rel="noopener noreferrer"&gt;choosing the right tool for the job&lt;/a&gt;, constraining what that tool can do, and building the verification layers that keep it honest.&lt;/p&gt;

&lt;p&gt;The industry doesn't need more autonomous AI demos. It needs practitioners who understand agentic AI security risks before they build the first agent. People who've read about the failures and internalized the architecture that prevents them.&lt;/p&gt;

&lt;p&gt;If you're building AI systems, start with the constraints. The capabilities are easy. The guardrails aren't optional. They're the product.&lt;/p&gt;

&lt;h2&gt;
  
  
  Frequently Asked Questions
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;What are agentic AI security risks?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Agentic AI security risks are the vulnerabilities that emerge when AI systems have execution authority without verification checkpoints. They include unauthorized actions (Meta's chatbot changing emails without identity verification), uncontrolled spending (OpenClaw's &lt;a href="https://tokita.online/openclaw-ai-agent-cost-reality/" rel="noopener noreferrer"&gt;$1.3M bill&lt;/a&gt; from ungoverned agents), data destruction (&lt;a href="https://tokita.online/ai-agent-production-safety/" rel="noopener noreferrer"&gt;PocketOS's 9-second database deletion&lt;/a&gt;), and supply chain poisoning (&lt;a href="https://tokita.online/ai-supply-chain-attack-575-malicious-skills/" rel="noopener noreferrer"&gt;575 malicious AI skills&lt;/a&gt; in open registries).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What AI guardrails should developers implement?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;At minimum: &lt;a href="https://tokita.online/ai-agent-pre-action-gate-tutorial/" rel="noopener noreferrer"&gt;pre-action gates&lt;/a&gt; on sensitive operations, human-in-the-loop for identity and access decisions, &lt;a href="https://tokita.online/context-engineering-vs-prompt-engineering/" rel="noopener noreferrer"&gt;context boundaries&lt;/a&gt; that limit what an agent can reach, &lt;a href="https://tokita.online/tokenmaxxing-enterprise-ai-cost-crisis/" rel="noopener noreferrer"&gt;consumption governance&lt;/a&gt; with per-agent token budgets, supply chain verification for all tools and plugins, and behavioral drift detection. These aren't advanced techniques. They're fundamentals.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How did hackers exploit Meta's AI chatbot on Instagram?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Attackers used a VPN to spoof the account holder's location, then asked Meta's AI support assistant to link a new email to the target account. The chatbot &lt;a href="https://krebsonsecurity.com/2026/06/hackers-used-metas-ai-support-bot-to-seize-instagram-accounts/" rel="noopener noreferrer"&gt;complied without verifying identity&lt;/a&gt;, sent a verification code to the attacker's email, and enabled a password reset. No technical exploit was required. The AI had the authority to make account changes and no &lt;a href="https://tokita.online/what-is-harness-engineering/" rel="noopener noreferrer"&gt;guardrail&lt;/a&gt; to stop it.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Can autonomous AI agents be deployed safely?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Yes, but only with the right &lt;a href="https://tokita.online/what-is-harness-engineering/" rel="noopener noreferrer"&gt;harness architecture&lt;/a&gt;. The problem isn't autonomy itself. It's autonomy without constraints. &lt;a href="https://tokita.online/autonomous-ai-agents-production-cost/" rel="noopener noreferrer"&gt;Autonomous agents fail in production&lt;/a&gt; when they're given authority without verification gates, budget limits, or human oversight on sensitive operations. Build the constraints first, then add capabilities.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Tom Tokita is the president of Aether Global Technology Inc. and builds production AI operations systems that route between multiple LLMs daily. He writes about what works and what breaks at &lt;a href="https://tokita.online" rel="noopener noreferrer"&gt;tokita.online&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>security</category>
      <category>programming</category>
      <category>devops</category>
    </item>
    <item>
      <title>Tokenmaxxing Is a Symptom. Here's the Disease Every Enterprise Is Ignoring.</title>
      <dc:creator>Tom Tokita</dc:creator>
      <pubDate>Thu, 28 May 2026 07:21:54 +0000</pubDate>
      <link>https://dev.to/tomtokita/tokenmaxxing-is-a-symptom-heres-the-disease-every-enterprise-is-ignoring-44f4</link>
      <guid>https://dev.to/tomtokita/tokenmaxxing-is-a-symptom-heres-the-disease-every-enterprise-is-ignoring-44f4</guid>
      <description>&lt;p&gt;NVIDIA's vice president of applied deep learning, Bryan Catanzaro, said something in an &lt;a href="https://www.techspot.com/news/112209-ai-compute-costs-getting-high-they-starting-rival.html" rel="noopener noreferrer"&gt;Axios interview in April 2026&lt;/a&gt; that should have stopped every enterprise AI roadmap cold:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"For my team, the cost of compute is far beyond the costs of the employees."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That is not a critic talking. That is the VP of the company selling the chips that power every AI datacenter on the planet. When NVIDIA's own leadership admits compute outweighs payroll, the "AI will save you money" narrative has a problem.&lt;/p&gt;

&lt;p&gt;But most companies missed the signal. They were too busy tokenmaxxing.&lt;/p&gt;

&lt;h2&gt;
  
  
  Microsoft Pulled the Plug on Claude Code
&lt;/h2&gt;

&lt;p&gt;In May 2026, Microsoft &lt;a href="https://www.windowscentral.com/microsoft/microsoft-cancels-claude-code-licenses-shifting-developers-to-github-copilot-cli-a-move-likely-driven-by-financial-motives" rel="noopener noreferrer"&gt;began cancelling the majority of its internal Claude Code licenses&lt;/a&gt;, redirecting thousands of engineers to GitHub Copilot CLI instead. The reversal came six months after the company opened broad access to Claude Code across its Experiences + Devices division, the group responsible for Windows, Microsoft 365, Outlook, Teams, and Surface.&lt;/p&gt;

&lt;p&gt;Adoption was fast. Engineers, project managers, and designers embraced it for prototyping and development. The problem wasn't the tool. It was token-based pricing at enterprise scale with no consumption governance. Monthly bills became unpredictable and high enough to trigger a fiscal-year-end pullback.&lt;/p&gt;

&lt;p&gt;Microsoft's $5 billion Foundry deal with Anthropic and Anthropic's $30 billion Azure compute commitment both remain intact. Not a relationship break. A cost-control correction.&lt;/p&gt;

&lt;p&gt;A company with functionally unlimited resources still could not absorb uncapped AI token spend across thousands of users. That should tell you something.&lt;/p&gt;

&lt;h2&gt;
  
  
  Uber Burned Its Entire 2026 AI Budget by April
&lt;/h2&gt;

&lt;p&gt;Uber's CTO, Praveen Neppalli Naga, &lt;a href="https://www.forbes.com/sites/janakirammsv/2026/05/17/uber-burns-its-2026-ai-budget-in-four-months-on-claude-code/" rel="noopener noreferrer"&gt;confirmed to The Information&lt;/a&gt; in April 2026 that the company had exhausted its entire annual AI coding tools budget in four months. Claude Code was rolled out in December 2025. Adoption climbed from 32% of engineers in February to &lt;a href="https://www.forbes.com/sites/janakirammsv/2026/05/17/uber-burns-its-2026-ai-budget-in-four-months-on-claude-code/" rel="noopener noreferrer"&gt;84% classified as agentic coding users by March&lt;/a&gt;. By spring, 95% were using AI tools monthly, roughly 70% of committed code originated from those tools, and 11% of live backend updates were written by agents with no human in the loop.&lt;/p&gt;

&lt;p&gt;The per-engineer cost: &lt;a href="https://www.forbes.com/sites/janakirammsv/2026/05/17/uber-burns-its-2026-ai-budget-in-four-months-on-claude-code/" rel="noopener noreferrer"&gt;$150 to $250 per month on average&lt;/a&gt;, with power users running between $500 and $2,000. Naga himself reported spending $1,200 in a two-hour demo session. The tool didn't fail. Engineers didn't misuse it. They used it for exactly the workloads it was designed to handle. From a productivity standpoint the rollout was a success. From a finance standpoint it was a runaway.&lt;/p&gt;

&lt;p&gt;Uber compounded the dynamic by &lt;a href="https://www.forbes.com/sites/janakirammsv/2026/05/17/uber-burns-its-2026-ai-budget-in-four-months-on-claude-code/" rel="noopener noreferrer"&gt;ranking engineers on internal leaderboards&lt;/a&gt; based on Claude Code usage. That created a cultural incentive to consume more tokens. The teams driving adoption were not the same teams managing the spend.&lt;/p&gt;

&lt;p&gt;They measured who was using AI. They never measured what it cost per unit of output.&lt;/p&gt;

&lt;h2&gt;
  
  
  Tokenmaxxing: When the Metric Becomes the Game
&lt;/h2&gt;

&lt;p&gt;The term "tokenmaxxing" describes employees running trivial or unnecessary tasks through AI tools to inflate their usage numbers. Amazon employees &lt;a href="https://futurism.com/artificial-intelligence/amazon-quotas-ai-use" rel="noopener noreferrer"&gt;admitted to the practice&lt;/a&gt; in May 2026 after the company set internal AI usage targets and tracked consumption through leaderboards. Workers reported feeling pressure to hit token quotas, even though Amazon publicly stated the numbers would not factor into performance reviews.&lt;/p&gt;

&lt;p&gt;At Meta, the same dynamic played out through an internal tracking tool called "Claudeonomics," which ranked employees by their AI token consumption. The leaderboard reportedly &lt;a href="https://fortune.com/2026/04/09/meta-killed-employee-ai-token-dashboard/" rel="noopener noreferrer"&gt;showed 60 trillion tokens consumed in a 30-day period&lt;/a&gt; before Meta killed it after media coverage.&lt;/p&gt;

&lt;p&gt;This is Goodhart's Law in real time. The moment token consumption became a tracked metric, it stopped being a useful measure of anything. Employees optimized for the number, not for the work the number was supposed to represent.&lt;/p&gt;

&lt;p&gt;Tokenmaxxing isn't an employee behavior problem. It is a governance design failure. If you measure consumption without measuring value, you get consumption without value.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Goldman Sachs Math That Should Scare Every CFO
&lt;/h2&gt;

&lt;p&gt;Goldman Sachs &lt;a href="https://www.goldmansachs.com/insights/articles/ai-agents-forecast-to-boost-tech-cash-flow-as-usage-soars" rel="noopener noreferrer"&gt;published a research report&lt;/a&gt; forecasting that agentic AI will drive a 24-fold increase in global token consumption by 2030, reaching 120 quadrillion tokens per month. Their breakdown: a standard chatbot consumes roughly 1,000 tokens per session. An embedded copilot uses over 5,000 tokens per day. A continuously active autonomous agent burns through 100,000 or more tokens per day.&lt;/p&gt;

&lt;p&gt;NVIDIA CEO Jensen Huang has said he expects &lt;a href="https://businesschief.com/news/jensen-huang-nvidia-will-have-100-ai-agents-for-each-worker" rel="noopener noreferrer"&gt;100 AI agents working alongside every human employee&lt;/a&gt; at NVIDIA by 2036.&lt;/p&gt;

&lt;p&gt;Do the multiplication. 100 agents per employee, at 100,000 tokens per day per agent, is 10 million tokens per employee per day. Multiply that by any mid-size engineering team and the numbers become absurd before you even discuss pricing.&lt;/p&gt;

&lt;p&gt;Gartner projects that by 2030, inference costs on a one-trillion-parameter model will be &lt;a href="https://www.gartner.com/en/newsroom/press-releases/2026-03-25-gartner-predicts-that-by-2030-performing-inference-on-an-llm-with-1-trillion-parameters-will-cost-genai-providers-over-90-percent-less-than-in-2025" rel="noopener noreferrer"&gt;over 90% cheaper than in 2025&lt;/a&gt;. But their own analyst, Will Sommer, &lt;a href="https://www.gartner.com/en/newsroom/press-releases/2026-03-25-gartner-predicts-that-by-2030-performing-inference-on-an-llm-with-1-trillion-parameters-will-cost-genai-providers-over-90-percent-less-than-in-2025" rel="noopener noreferrer"&gt;cautioned&lt;/a&gt;: "Chief Product Officers should not confuse the deflation of commodity tokens with the democratization of frontier reasoning." Agentic models require 5 to 30 times more tokens per task than standard models. Consumption growth will outpace falling unit costs. And AI providers are not going to pass through the full savings.&lt;/p&gt;

&lt;p&gt;Cheaper tokens, more tokens per task, exploding number of tasks. The bill goes up.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Pattern Is Obvious. The Fix Is Not Complicated.
&lt;/h2&gt;

&lt;p&gt;Microsoft, Uber, Amazon, Meta. Four of the most technically sophisticated companies on earth. All hit the same wall. The pattern:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Executive mandate pushes broad AI adoption&lt;/li&gt;
&lt;li&gt;Leaderboards or usage metrics track consumption volume&lt;/li&gt;
&lt;li&gt;No mechanism ties consumption to business value&lt;/li&gt;
&lt;li&gt;Token-based pricing creates unpredictable, escalating costs&lt;/li&gt;
&lt;li&gt;Budget blowout triggers reactive pullback or cancellation&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The disease is not AI. The disease is adoption without governance. No consumption gates, no cost ceilings, and no way to tie a token to a deliverable.&lt;/p&gt;

&lt;p&gt;I &lt;a href="https://tokita.online/ai-agent-pre-action-gate-tutorial/" rel="noopener noreferrer"&gt;wrote about pre-action gates&lt;/a&gt; and &lt;a href="https://tokita.online/ai-agent-production-safety/" rel="noopener noreferrer"&gt;agent production safety&lt;/a&gt; months before these headlines. The principle is the same whether you are running 100 Codex agents like &lt;a href="https://tokita.online/openclaw-ai-agent-cost-reality/" rel="noopener noreferrer"&gt;OpenClaw's $1.3 million month&lt;/a&gt; or deploying Claude Code across 10,000 engineers. If there is no gate between the request and the spend, the spend wins.&lt;/p&gt;

&lt;p&gt;The companies that will survive the agentic era are not the ones that adopt fastest. They are the ones that &lt;a href="https://tokita.online/what-is-harness-engineering/" rel="noopener noreferrer"&gt;build harnesses&lt;/a&gt; before they build agents. Measure output, not tokens. Set cost ceilings per user, per team, per task category. Attribute consumption to deliverables, not leaderboard positions.&lt;/p&gt;

&lt;p&gt;Tokenmaxxing is what happens when you skip that step.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>enterprise</category>
      <category>governance</category>
      <category>tokenmaxxing</category>
    </item>
    <item>
      <title>OpenClaw's $1.3 Million OpenAI Bill: What AI Agents Actually Cost in Production</title>
      <dc:creator>Tom Tokita</dc:creator>
      <pubDate>Thu, 21 May 2026 00:51:27 +0000</pubDate>
      <link>https://dev.to/tomtokita/openclaws-13-million-openai-bill-what-ai-agents-actually-cost-in-production-3d9o</link>
      <guid>https://dev.to/tomtokita/openclaws-13-million-openai-bill-what-ai-agents-actually-cost-in-production-3d9o</guid>
      <description>&lt;p&gt;Peter Steinberger spent a decade building &lt;a href="https://pspdfkit.com/" rel="noopener noreferrer"&gt;PSPDFKit&lt;/a&gt; into a PDF framework running on over a billion devices. He &lt;a href="https://steipete.me/posts/2026/openclaw" rel="noopener noreferrer"&gt;joined OpenAI in February 2026&lt;/a&gt;, saying "I want to change the world, not build a large company." A few months later, his open-source project &lt;a href="https://github.com/openclaw/openclaw" rel="noopener noreferrer"&gt;OpenClaw&lt;/a&gt;, the fastest-growing project in GitHub history with over 300,000 stars and 3.2 million users, racked up an OpenAI bill of &lt;a href="https://thenextweb.com/news/openclaw-peter-steinberger-1-3-million-openai-token-bill" rel="noopener noreferrer"&gt;$1,305,088.81 in a single month&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;603 billion tokens. 7.6 million API requests. 100 Codex agents running simultaneously. The OpenClaw cost breakdown is the first real look at what autonomous AI agents cost in production.&lt;/p&gt;

&lt;p&gt;That's $13,000 per agent per month.&lt;/p&gt;

&lt;p&gt;And OpenAI is covering the bill as a "research investment." Regular companies don't get that deal.&lt;/p&gt;

&lt;h2&gt;
  
  
  The OpenClaw Cost Breakdown
&lt;/h2&gt;

&lt;p&gt;OpenClaw is a &lt;a href="https://github.com/openclaw/openclaw" rel="noopener noreferrer"&gt;self-hosted autonomous AI assistant&lt;/a&gt;. It connects to your email, calendar, browser, Slack, Discord, WhatsApp, and iMessage. Agents execute shell commands, manage files, automate web tasks through a growing &lt;a href="https://github.com/openclaw/clawhub" rel="noopener noreferrer"&gt;skill registry&lt;/a&gt;. The 100 agents running on Steinberger's setup were doing real work. Reviewing pull requests, scanning commits for security vulnerabilities, deduplicating GitHub issues, writing and submitting fixes, monitoring performance benchmarks, even attending meetings and generating feature PRs.&lt;/p&gt;

&lt;p&gt;This wasn't a demo. This was production. The distinction matters, because every guru demo stops before the billing cycle starts.&lt;/p&gt;

&lt;p&gt;The primary model was GPT-5.5 running in Fast Mode, which consumed tokens at higher rates. Steinberger noted that &lt;a href="https://thenextweb.com/news/openclaw-peter-steinberger-1-3-million-openai-token-bill" rel="noopener noreferrer"&gt;disabling Fast Mode would drop the bill to roughly $300,000 per month&lt;/a&gt;. A 70% reduction. Still $3,000 per agent per month at the "optimized" rate. Still $3.6 million annually.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why This Matters More Than the Headline
&lt;/h2&gt;

&lt;p&gt;The headline number is dramatic, but the per-agent cost is the real story.&lt;/p&gt;

&lt;p&gt;$13,000 per month per agent on full pricing. $3,000 per month on optimized pricing. These aren't projections from a whitepaper. These are invoiced numbers from someone who works at OpenAI running agents on OpenAI's own infrastructure.&lt;/p&gt;

&lt;p&gt;Now think about the gap between Steinberger and a newcomer. He's an experienced engineer who built billion-device software. He has OpenAI's internal knowledge. He has a "research investment" subsidy covering the bill. He knows to disable Fast Mode for a 70% cost reduction.&lt;/p&gt;

&lt;p&gt;A first-time builder doesn't know any of that. They'll hit the high-rate pricing, run agents longer than necessary, retry failed calls without cost caps, and discover the bill at the end of the month. If Steinberger's optimized setup costs $3,000 per agent, a newcomer's unoptimized setup will cost more. Possibly much more.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Guru Problem
&lt;/h2&gt;

&lt;p&gt;Scroll through YouTube and LinkedIn right now. "Deploy AI agents for your business." "Build an autonomous AI workforce." "Replace your team with agents." The pitch is seductive. Agents are cheap, they scale, they work while you sleep.&lt;/p&gt;

&lt;p&gt;Nobody mentions $13,000 per month per agent.&lt;/p&gt;

&lt;p&gt;Nobody mentions that 100 agents running GPT-5.5 burn through 603 billion tokens in 30 days. Nobody mentions that "Fast Mode" isn't just faster, it's dramatically more expensive. And nobody talks about how even the optimized version, built by someone who works at the company that makes the model, still costs $3.6 million per year.&lt;/p&gt;

&lt;p&gt;The gap between what's being sold and what's being spent is the widest I've seen in tech. And it's widest for the people with the least ability to absorb the surprise. Small businesses, indie developers, and first-time builders who took the guru at their word.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Practitioners Already Knew
&lt;/h2&gt;

&lt;p&gt;I &lt;a href="https://tokita.online/autonomous-ai-agents-production-cost/" rel="noopener noreferrer"&gt;wrote about this months ago&lt;/a&gt;. Autonomous AI agents look great in demos and burn cash in production. The OpenClaw numbers validate what practitioners already knew. The question was never "can agents do the work?" It was always "can you afford to let them?"&lt;/p&gt;

&lt;p&gt;When I build AI systems, cost control isn't an afterthought. It's architecture. It's why &lt;a href="https://tokita.online/what-is-harness-engineering/" rel="noopener noreferrer"&gt;harness engineering&lt;/a&gt; exists as a discipline. The &lt;a href="https://tokita.online/claude-code-mcp-server-persistent-memory/" rel="noopener noreferrer"&gt;memory server I run&lt;/a&gt; has a condensation layer specifically because raw search results were burning through the context window. Hundreds of thousands of characters of raw output compressed to a few thousand. That's not clever engineering. That's survival. Without it, every session would have been its own version of Steinberger's bill, just at a smaller scale.&lt;/p&gt;

&lt;p&gt;I co-founded &lt;a href="https://aether-global.com" rel="noopener noreferrer"&gt;Aether Global Technology&lt;/a&gt;, a Salesforce consulting partner in Manila. When clients ask about AI agent deployment, the first conversation isn't about what the agent can do. It's about what the agent will cost per month, and what happens when it runs unsupervised for a weekend.&lt;/p&gt;

&lt;p&gt;Most agent frameworks ship without cost caps, token budgets, or kill switches. The &lt;a href="https://tokita.online/why-multi-agent-ai-fails/" rel="noopener noreferrer"&gt;agent swarming piece I wrote&lt;/a&gt; covers why multi-agent coordination fails in production. The OpenClaw bill is what that failure looks like in dollars.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Uncomfortable Math
&lt;/h2&gt;

&lt;p&gt;Let's do the math the gurus won't.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Scenario&lt;/th&gt;
&lt;th&gt;Monthly Cost&lt;/th&gt;
&lt;th&gt;Annual Cost&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;1 agent (full pricing)&lt;/td&gt;
&lt;td&gt;$13,000&lt;/td&gt;
&lt;td&gt;$156,000&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;1 agent (optimized)&lt;/td&gt;
&lt;td&gt;$3,000&lt;/td&gt;
&lt;td&gt;$36,000&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;10 agents (optimized)&lt;/td&gt;
&lt;td&gt;$30,000&lt;/td&gt;
&lt;td&gt;$360,000&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;100 agents (optimized)&lt;/td&gt;
&lt;td&gt;$300,000&lt;/td&gt;
&lt;td&gt;$3,600,000&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;100 agents (full pricing)&lt;/td&gt;
&lt;td&gt;$1,300,000&lt;/td&gt;
&lt;td&gt;$15,600,000&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;For context, the median annual salary for a software engineer in the Philippines is &lt;a href="https://www.payscale.com/research/PH/Job=Software_Engineer/Salary" rel="noopener noreferrer"&gt;roughly $15,000-20,000&lt;/a&gt;. One unoptimized AI agent costs the same as a full-time senior developer. Ten agents cost more than a small engineering team.&lt;/p&gt;

&lt;p&gt;"Replace your team with agents" stops sounding cheap when you do the multiplication.&lt;/p&gt;

&lt;h2&gt;
  
  
  What To Actually Do
&lt;/h2&gt;

&lt;p&gt;The OpenClaw bill has four lessons that matter for anyone considering AI agents for real work.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Know your token economics before you deploy.&lt;/strong&gt; Steinberger discovered that Fast Mode was the primary cost driver. That's a setting. One toggle. 70% cost difference. If you don't understand your pricing tier, your model's token consumption pattern, and your request volume, you're deploying blind.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Build cost controls into the architecture.&lt;/strong&gt; Token budgets per agent, spend thresholds that trigger alerts or kill switches, session caps, retry limits. These aren't features you add later. They're load-bearing walls. I wrote a &lt;a href="https://tokita.online/ai-agent-pre-action-gate-tutorial/" rel="noopener noreferrer"&gt;tutorial on building pre-action gates&lt;/a&gt; for exactly this kind of mechanical enforcement.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Start with one agent, not a swarm.&lt;/strong&gt; Steinberger ran 100 agents because he could afford to (OpenAI was paying). You can't. One agent, measured, monitored, optimized. Then scale. The &lt;a href="https://tokita.online/ai-agent-production-safety/" rel="noopener noreferrer"&gt;architecture that prevents AI agents from taking destructive actions&lt;/a&gt; starts with one agent and one set of gates.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Question the subsidy.&lt;/strong&gt; OpenAI covering Steinberger's bill as "research investment" means these costs aren't sustainable at market rates. When your favorite guru says "just deploy agents," ask who's paying the token bill. If the answer involves investor subsidies or promotional pricing, the real cost is being hidden, not eliminated.&lt;/p&gt;

&lt;h2&gt;
  
  
  Frequently Asked Questions
&lt;/h2&gt;

&lt;h3&gt;
  
  
  How much does it cost to run an AI agent in production?
&lt;/h3&gt;

&lt;p&gt;Based on OpenClaw's published numbers, a single autonomous AI agent running GPT-5.5 costs approximately $13,000 per month at full pricing, or $3,000 per month with optimized settings (disabling Fast Mode). Actual costs depend on the model, token consumption patterns, and whether cost controls like retry limits and session caps are in place.&lt;/p&gt;

&lt;h3&gt;
  
  
  Why are AI agent costs so high?
&lt;/h3&gt;

&lt;p&gt;AI agents make many API calls per task, each consuming tokens. OpenClaw's 100 agents generated 7.6 million API requests and consumed 603 billion tokens in 30 days. Unlike a chatbot conversation, an autonomous agent running continuously accumulates token costs around the clock. Fast Mode and retry loops multiply these costs further.&lt;/p&gt;

&lt;h3&gt;
  
  
  Can you reduce AI agent costs?
&lt;/h3&gt;

&lt;p&gt;Yes. Steinberger noted that disabling Fast Mode alone reduced costs by 70%. Other strategies include setting token budgets per agent, implementing spend thresholds with kill switches, routing mechanical tasks to cheaper models instead of running everything on frontier-tier pricing, and starting with a single agent before scaling.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>programming</category>
      <category>devops</category>
      <category>productivity</category>
    </item>
    <item>
      <title>Claude Code Forgets Everything. So I Built It a Memory Server.</title>
      <dc:creator>Tom Tokita</dc:creator>
      <pubDate>Tue, 19 May 2026 11:37:15 +0000</pubDate>
      <link>https://dev.to/tomtokita/claude-code-forgets-everything-so-i-built-it-a-memory-server-581n</link>
      <guid>https://dev.to/tomtokita/claude-code-forgets-everything-so-i-built-it-a-memory-server-581n</guid>
      <description>&lt;p&gt;Everyone's building AI agents. Almost nobody is building memory for them.&lt;/p&gt;

&lt;p&gt;The default Claude Code experience is this: you open a session, you do great work, you close the session, and it's gone. No Claude Code MCP server ships with the product to fix this. Next morning, you open a new session and explain the same project structure, the same deployment rules, the same "don't push to production without checking the allowlist" that you've explained every day this week. Claude is brilliant. Claude is also an amnesiac.&lt;/p&gt;

&lt;p&gt;At one project, that's annoying. Across a live client portfolio, it's a wall. I was burning the first ten minutes of every session on logistics that the system already knew and forgot. Same overview. Same rules. Same warnings. The AI equivalent of training a new hire every morning.&lt;/p&gt;

&lt;p&gt;So I stopped accepting the default and built a custom Claude Code MCP server with persistent memory. What started as a quick fix turned into the core of how I work.&lt;/p&gt;

&lt;h2&gt;
  
  
  What MCP Is (And What It Isn't)
&lt;/h2&gt;

&lt;p&gt;MCP (Model Context Protocol) lets you give Claude tools it doesn't ship with. You run a server, Claude connects to it, your server exposes capabilities that Claude calls during a session. Anthropic's &lt;a href="https://modelcontextprotocol.io/" rel="noopener noreferrer"&gt;docs&lt;/a&gt; cover setup.&lt;/p&gt;

&lt;p&gt;This post isn't about the plumbing. It's about what you build once the plumbing works, and why the interesting problems start after "hello world."&lt;/p&gt;

&lt;h2&gt;
  
  
  What a Claude Code MCP Server Actually Needs (And Why)
&lt;/h2&gt;

&lt;p&gt;My server gives Claude four things it doesn't have by default: persistent memory, context condensation, delegated file reading, and compliance checking. I didn't design any of this upfront. Something broke, I fixed it, and the fix became a feature.&lt;/p&gt;

&lt;h3&gt;
  
  
  Memory That Survives Between Sessions
&lt;/h3&gt;

&lt;p&gt;This came first, because the pain was loudest. I needed Claude to remember things across sessions: project configurations, platform quirks I'd spent hours debugging, deployment rules that came from three broken deploys and a near-miss on a production org.&lt;/p&gt;

&lt;p&gt;The server indexes all of that into a vector database. Thousands of knowledge chunks, searchable by meaning and by keyword. When Claude starts a session, the first thing it does is search the memory server. The rule is simple: check what you already know before guessing.&lt;/p&gt;

&lt;p&gt;I use hybrid search. Vector similarity finds conceptually related content. Keyword search catches exact terms. Neither alone is reliable, and I learned that the hard way. Semantic-only search kept returning adjacent results that missed the specific command or config value I needed. Adding keyword matching fixed the retrieval quality problems, but only after weeks of wondering why search felt "close but wrong."&lt;/p&gt;

&lt;h3&gt;
  
  
  Condensation (The Problem Nobody Warns You About)
&lt;/h3&gt;

&lt;p&gt;When your memory server works too well, it returns too much.&lt;/p&gt;

&lt;p&gt;One operation was returning over 200,000 characters of raw project context. That payload literally couldn't fit in the tool response. Claude would choke before reading a single result. Your memory server becomes a liability the moment it knows more than the context window can hold.&lt;/p&gt;

&lt;p&gt;The fix was a condenser. Results pass through a lighter model before reaching Claude. That model reads the full output and returns a distilled summary. Two hundred thousand characters compress down to a few thousand. Claude gets the answer without the bloat.&lt;/p&gt;

&lt;p&gt;If you're building an MCP server and you don't have a condensation layer, you'll hit this wall the moment your knowledge base grows past a few hundred entries. I know because I ran without one for weeks and couldn't figure out why sessions were getting slower and dumber. The condenser was the fourth thing I built. It should have been the first.&lt;/p&gt;

&lt;h3&gt;
  
  
  Delegated Reading (Keep the Context Window Clean)
&lt;/h3&gt;

&lt;p&gt;Claude's context window is finite. Every file it reads directly consumes capacity that could be used for reasoning. Big file loads are expensive, and the cost isn't dollars. It's degraded output quality three tool calls later, because the window is stuffed with a 2,000-line config file that Claude only needed two lines from.&lt;/p&gt;

&lt;p&gt;So I built a reader. A lighter model scans the file and answers specific questions, returning cited answers with line numbers. Claude asks "what are the deployment rules for this project?" and gets back a sourced answer without loading the entire document.&lt;/p&gt;

&lt;p&gt;Same principle for writing. Mechanical work (session logs, documentation updates, structured captures) gets delegated to a cheaper model. Claude focuses on reasoning. The formatting happens elsewhere. You don't pay senior rates for data entry.&lt;/p&gt;

&lt;h3&gt;
  
  
  Compliance Checking (Because Prompts Drift)
&lt;/h3&gt;

&lt;p&gt;This one came from the most painful failure. I needed Claude to validate proposed actions against rules before executing. Not a prompt instruction. Not "please remember to check the allowlist." Prompts get compressed. Prompts get forgotten. A prompt is a suggestion. A gate is a wall.&lt;/p&gt;

&lt;p&gt;The server accepts a proposed action, checks it against predefined rules, and returns pass or fail. The difference between asking someone to remember a checklist and bolting that checklist to the door so they can't walk through without completing it.&lt;/p&gt;

&lt;p&gt;If you've ever told an AI "don't do X" and then watched it do X forty-five minutes later after a long conversation, you understand why mechanical enforcement exists. The model didn't disobey. It forgot. Forgetting and disobeying look identical from the outside, but only one of them is fixable with infrastructure.&lt;/p&gt;

&lt;h2&gt;
  
  
  Session Loading: How Much Context Is Too Much?
&lt;/h2&gt;

&lt;p&gt;Once you have persistent memory, you face a new question. How much do you load at session start?&lt;/p&gt;

&lt;p&gt;Load everything, and you burn half the context window on background knowledge before the session begins. Load nothing, and you're back to square one.&lt;/p&gt;

&lt;p&gt;I built a tiered loader. One call returns exactly what Claude needs. First tier: core rules, security protocols, workflow constraints. Always loaded, always lean. Second tier: project-specific context. Only loaded when relevant. Both tiers pass through the condenser before returning, so the loaded context is measured in thousands of characters, not hundreds of thousands.&lt;/p&gt;

&lt;p&gt;Claude starts every session knowing the rules, the recent project activity, and what happened yesterday. One call. Under a second. No re-explaining.&lt;/p&gt;

&lt;h2&gt;
  
  
  How Claude Code Loses Memory Mid-Session (And How to Fix It)
&lt;/h2&gt;

&lt;p&gt;This is the problem nobody talks about, and it's the one that will cost you the most debugging time.&lt;/p&gt;

&lt;p&gt;Claude Code compresses your conversation when the context window fills up. Older messages get summarized. In theory, this is efficient. What that means in practice: the deployment rules you loaded at session start can silently vanish mid-session. The behavioral constraints? Gone. The project state? Compressed into a summary that may or may not preserve what matters.&lt;/p&gt;

&lt;p&gt;My server detects this. When Claude calls the session loader a second time in the same session, the server includes a recovery hint: the most recently active project. Claude reloads the relevant context surgically. Not the full knowledge base. Just what the current task needs.&lt;/p&gt;

&lt;p&gt;Before this existed, long sessions would silently lose their constraints around the two-hour mark. I wouldn't notice until Claude deployed a metadata package to the wrong org because the deploy rules from session start had been compressed away. The failures were quiet. That's what made them expensive.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Persistent Memory Changes for Claude Code Workflows
&lt;/h2&gt;

&lt;p&gt;Before the memory server, every session started with ten minutes of setup. Reading project files, re-establishing context, reminding Claude which org belongs to which project. Creative time wasted on logistics.&lt;/p&gt;

&lt;p&gt;After: one call. Rules load, project context loads, recent activity loads. I start working immediately.&lt;/p&gt;

&lt;p&gt;But the real win is compounding. Every session generates learnings. Deployment patterns that worked. API gotchas that burned an hour. Platform quirks that only surface in production. Those learnings get indexed automatically. The next session starts with that knowledge already searchable. The session after that starts with even more.&lt;/p&gt;

&lt;p&gt;I co-founded &lt;a href="https://aether-global.com" rel="noopener noreferrer"&gt;Aether Global Technology&lt;/a&gt;, a Salesforce consulting partner in Manila. The memory server runs alongside that work as a personal R&amp;amp;D system. It doesn't touch client data. What it does is compound operational knowledge across projects and platforms, so Claude rarely encounters a problem it hasn't seen a version of before.&lt;/p&gt;

&lt;p&gt;Mistakes get encoded so they don't repeat. The memory server doesn't make Claude smarter. It makes Claude less likely to be stupid in the same way twice.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I'd Build Differently
&lt;/h2&gt;

&lt;p&gt;I over-indexed on features and under-indexed on condensation. The memory server had rich search, tiered loading, and compliance checking before it had a condenser. That meant every search returned massive payloads that burned through the context window. If I were starting over, condensation would be the first thing I built, not the fourth.&lt;/p&gt;

&lt;p&gt;I'd also start with a smaller embedding model. My instinct was to use the most capable sentence transformer I could find. The difference in search quality between models was marginal. The difference in startup time and memory footprint was not. A lighter model that loads in seconds would have saved weeks of debugging cold-start problems on a machine that was already running six other services.&lt;/p&gt;

&lt;p&gt;And I'd design for compression survival from day one, not bolt it on after losing context mid-session three times. That pattern is now the part of the system I trust most, but it didn't need to take three incidents to build.&lt;/p&gt;

&lt;h2&gt;
  
  
  Frequently Asked Questions
&lt;/h2&gt;

&lt;p&gt;What is a Claude Code MCP server?&lt;/p&gt;

&lt;p&gt;An MCP (Model Context Protocol) server is a custom backend that gives Claude Code capabilities it doesn't have out of the box. You run the server, Claude connects to it, and your server exposes tools that Claude can call during a session. A memory-focused MCP server specifically solves the problem of Claude forgetting everything between sessions by providing persistent, searchable knowledge storage.&lt;/p&gt;

&lt;p&gt;Does Claude Code remember between sessions?&lt;/p&gt;

&lt;p&gt;Not by default. Claude Code starts every session fresh. CLAUDE.md files provide some static context, but they don't scale past a single project. A custom MCP server with a vector database and session loading gives Claude persistent memory across sessions, so it knows your project rules, past learnings, and recent activity without you re-explaining every time.&lt;/p&gt;

&lt;p&gt;What is context compression in Claude Code?&lt;/p&gt;

&lt;p&gt;When your conversation with Claude Code fills the context window, older messages get summarized to make room. This is called context compression. The problem is that rules, constraints, and project state loaded at session start can silently disappear during compression. Without a recovery mechanism, Claude forgets its guardrails mid-session.&lt;/p&gt;

&lt;p&gt;How do I add persistent memory to Claude Code?&lt;/p&gt;

&lt;p&gt;Build an MCP server that indexes your knowledge into a vector database, then expose search and retrieval as MCP tools. Claude calls these tools at session start to load context. Add a condensation layer so large results don't overflow the context window. The key insight is that memory alone isn't enough. You need condensation, tiered loading, and compression recovery to make it work at scale.&lt;/p&gt;

&lt;h2&gt;
  
  
  Try It Yourself
&lt;/h2&gt;

&lt;p&gt;I'm working on a lightweight version of this memory server. Stripped to the core: vector search, session loading, and basic condensation. Enough to give Claude Code persistent memory without the full production infrastructure. Follow &lt;a href="https://github.com/tomtokitajr" rel="noopener noreferrer"&gt;my GitHub&lt;/a&gt; for updates.&lt;/p&gt;

&lt;p&gt;If you want something you can use today, I open-sourced the pre-action gate pattern. Mechanical enforcement that blocks your AI agent from executing before checking the rules. Zero dependencies. Works with Claude and Gemini.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/tomtokitajr/ai-agent-gates" rel="noopener noreferrer"&gt;github.com/tomtokitajr/ai-agent-gates&lt;/a&gt;&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Tom Tokita is co-founder of Aether Global Technology and builds AI operations systems in Manila. He writes about what works in production.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>programming</category>
      <category>devops</category>
      <category>productivity</category>
    </item>
    <item>
      <title>Most AI Tools Are Just LLM Wrappers. Here's What Actually Matters.</title>
      <dc:creator>Tom Tokita</dc:creator>
      <pubDate>Tue, 19 May 2026 00:36:13 +0000</pubDate>
      <link>https://dev.to/tomtokita/most-ai-tools-are-just-llm-wrappers-heres-what-actually-matters-10mg</link>
      <guid>https://dev.to/tomtokita/most-ai-tools-are-just-llm-wrappers-heres-what-actually-matters-10mg</guid>
      <description>&lt;p&gt;&lt;strong&gt;In 2025, AI wrapper startups raised over $10 billion.&lt;/strong&gt; The product? Take an LLM API. Add a text box. Maybe some prompt templates. Charge $30/month. Call it "AI-powered."&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Not mad at the hustle.&lt;/strong&gt; But if your entire product disappears the moment ChatGPT adds your feature for free, you don't have a product. You have a timing play.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Wrapper Test
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;One question tells you everything:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Can you replicate the output by pasting the same input into ChatGPT or Claude?&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;If yes:&lt;/strong&gt; it's a wrapper. You're paying for UI and convenience, not intelligence.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;If no:&lt;/strong&gt; because it's pulling from multiple data sources, applying domain logic, or integrating with real systems, it might be something real.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Most fail the test.&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Thin vs. Thick
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Not all wrappers are equal.&lt;/strong&gt; The market is splitting fast:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;Thin Wrapper&lt;/th&gt;
&lt;th&gt;Thick Wrapper&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;What it does&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;UI + API call + system prompt&lt;/td&gt;
&lt;td&gt;Real integrations, domain logic, data pipelines&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Defensibility&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;None. One platform update kills it&lt;/td&gt;
&lt;td&gt;High. Value is in the connectors&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Example&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;"AI email writer" (GPT call with a system prompt)&lt;/td&gt;
&lt;td&gt;Cursor (reads your codebase, understands project context)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Survival odds&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Low&lt;/td&gt;
&lt;td&gt;Decent&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;The graveyard of 2025–2026&lt;/strong&gt; is littered with thin wrappers that a platform update made irrelevant overnight.&lt;/p&gt;




&lt;h2&gt;
  
  
  What Actually Matters
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Strip away the wrapper.&lt;/strong&gt; Where does the real value live?&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Connectors
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;The ability to talk to real systems:&lt;/strong&gt; Salesforce, Jira, databases, email, file storage, APIs. This is where 80% of the actual work lives.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Getting an AI to generate text is trivial.&lt;/strong&gt; Getting it to read your CRM records, cross-reference tickets, update a database, and notify Slack. That's integration work. That's hard. That's valuable.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Most wrappers don't touch this.&lt;/strong&gt; They live in the text-in, text-out world.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Captured Domain Expertise
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;An AI that's been learning your industry's quirks for months&lt;/strong&gt; is worth more than a fresh GPT-5 instance with a clever prompt.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;Fresh AI + Great Prompt&lt;/th&gt;
&lt;th&gt;AI + 6 Months of Learnings&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Platform quirks&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Discovers them painfully&lt;/td&gt;
&lt;td&gt;Already knows them&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Common mistakes&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Makes them all&lt;/td&gt;
&lt;td&gt;Has guardrails for each&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Your terminology&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Constant correction needed&lt;/td&gt;
&lt;td&gt;Uses it naturally&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Edge cases&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Surprised every time&lt;/td&gt;
&lt;td&gt;Documented patterns&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;The knowledge compounds.&lt;/strong&gt; Every session, every bug fix, every "oh, that's how this actually works" gets captured and fed back.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;No wrapper captures this.&lt;/strong&gt; They start fresh every time.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Methodology
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;How you approach problems with AI&lt;/strong&gt; matters more than which model you use.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The wrapper approach:&lt;/strong&gt; open tool → type request → get output → hope it's right.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The practitioner approach:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Small test:&lt;/strong&gt; constrained input, see what happens&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Evaluate:&lt;/strong&gt; what worked? What broke?&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Capture:&lt;/strong&gt; document the learning&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Adjust:&lt;/strong&gt; update the approach&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Repeat&lt;/strong&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;The tool is 10%. The methodology is 90%.&lt;/strong&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  The "Just Build It" Case
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Here's the uncomfortable truth.&lt;/strong&gt; Building your own system (even ugly, even scrappy) gives you something no wrapper provides: &lt;strong&gt;understanding.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;You know why it works.&lt;/strong&gt; Why it breaks. How to fix it. When the model changes (and it will), you swap the engine. The connectors, the learnings, the guardrails. Those persist. They're yours.&lt;/p&gt;

&lt;h3&gt;
  
  
  Cost at scale:
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;Wrapper Stack&lt;/th&gt;
&lt;th&gt;Custom (Direct API)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Month 1&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;$150/seat, fast setup&lt;/td&gt;
&lt;td&gt;$500 dev time, slower start&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Month 6&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;$150/seat, same capabilities&lt;/td&gt;
&lt;td&gt;$50/month API, growing capabilities&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Year 1 (5 seats)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;$9,000&lt;/td&gt;
&lt;td&gt;~$3,100 + compound knowledge&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Custom costs less AND gets smarter.&lt;/strong&gt; The wrapper costs the same and stays the same.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Philippines advantage:&lt;/strong&gt; smaller teams with direct API access can outperform larger orgs paying for wrapper stacks. When you can't afford $150/seat for 6 different AI tools, you build one system that does what you need. That constraint produces better architecture.&lt;/p&gt;




&lt;h2&gt;
  
  
  When Wrappers DO Make Sense
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Fair is fair:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Speed to market:&lt;/strong&gt; need something running tomorrow without engineering capacity? Wrapper gets you there.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Thick wrappers with real integrations:&lt;/strong&gt; Cursor, Harvey, Perplexity add genuine value beyond the API call.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Exploration phase:&lt;/strong&gt; trying 5 wrappers to understand the capability space before building your own is smart R&amp;amp;D.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;The key question:&lt;/strong&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Are you buying a tool or renting a feature?&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;If the value prop is "we make it easy to talk to an LLM,"&lt;/strong&gt; that feature is getting commoditized in real time. Every model provider is making their native interface better, faster, cheaper.&lt;/p&gt;




&lt;h2&gt;
  
  
  What to Build Instead
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Ready to go beyond wrappers?&lt;/strong&gt; Start here:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Map your connectors.&lt;/strong&gt; What systems does your AI need to talk to? Build those integrations first. Hardest part. Most valuable.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Capture everything.&lt;/strong&gt; Every platform quirk. Every failed approach. Every successful pattern. Your AI should learn from your organization's experience, not start fresh every session.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. Own your methodology.&lt;/strong&gt; Document how you approach problems with AI. Small tests → captured learnings → iteration. More valuable than any tool you can buy.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;4. Accept ugly.&lt;/strong&gt; The most effective AI systems I've built are not pretty. Config files, markdown documents, scripts. They look like plumbing. They work like machines.&lt;/p&gt;




&lt;h2&gt;
  
  
  Bottom Line
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;The moat isn't the model.&lt;/strong&gt; It never was.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;It's the connectors&lt;/strong&gt; that talk to your stack. The domain expertise captured over months. The methodology that turns every failure into a lesson.&lt;/p&gt;

&lt;p&gt;None of that lives in a wrapper.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;I'm Tom Tokita. I run &lt;a href="https://aether-global.com" rel="noopener noreferrer"&gt;Aether Global Technology&lt;/a&gt; out of Manila. We build production AI and Salesforce systems for enterprises that need real integrations, not another wrapper. &lt;a href="https://aether-global.com/contact" rel="noopener noreferrer"&gt;Let's talk.&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Read next: &lt;a href="https://dev.to/blog/context-engineering-vs-prompt-engineering"&gt;Context Engineering: Why Your AI Strategy Needs Infrastructure, Not Better Prompts&lt;/a&gt; · &lt;a href="https://dev.to/blog/autonomous-ai-agents-production-cost"&gt;Autonomous AI Agents Look Great in Demos. Here's What They Cost in Production.&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>programming</category>
      <category>webdev</category>
      <category>machinelearning</category>
    </item>
    <item>
      <title>The Truth About Agent Swarming: What the Gurus Won't Tell You About Cost, Failure, and Security</title>
      <dc:creator>Tom Tokita</dc:creator>
      <pubDate>Sat, 16 May 2026 11:15:26 +0000</pubDate>
      <link>https://dev.to/tomtokita/the-truth-about-agent-swarming-what-the-gurus-wont-tell-you-about-cost-failure-and-security-1775</link>
      <guid>https://dev.to/tomtokita/the-truth-about-agent-swarming-what-the-gurus-wont-tell-you-about-cost-failure-and-security-1775</guid>
      <description>&lt;p&gt;Everyone's building "AI agent teams" right now. Five agents, ten agents, a whole swarm collaborating on complex tasks. At least that's what the YouTube thumbnails promise. The reality? Most of these systems are burning money, leaking data, and failing in ways their builders don't even notice until the invoice arrives.&lt;/p&gt;

&lt;p&gt;I built a multi-agent system. It runs in production, daily. So I'm not here to tell you agent swarming doesn't work. I'm here to tell you that most of the advice circulating about it is dangerously incomplete.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Swarm Hype Cycle Is in Full Swing
&lt;/h2&gt;

&lt;p&gt;Open Twitter or YouTube right now and you'll find a hundred tutorials showing you how to spin up a multi-agent team in under 20 minutes. CrewAI, AutoGen, LangGraph. The frameworks keep multiplying. The demos look incredible: agents researching, agents writing, agents reviewing each other's work, all orchestrated into a beautiful pipeline.&lt;/p&gt;

&lt;p&gt;Here's what the demos don't show: what happens when you run that pipeline 500 times. Or 5,000 times. Or when one agent hallucinates and the next agent treats that hallucination as fact and passes it downstream to a third agent that takes action on it.&lt;/p&gt;

&lt;p&gt;The guru content follows a pattern: show the setup, show one successful run, skip the failure modes, skip the bill, skip the security implications. It's like showing someone how to start a restaurant by filming one perfect dinner service and cutting before the health inspector shows up.&lt;/p&gt;

&lt;p&gt;The latest version of this is "I built an entire company in 30 minutes with AI agents." Someone spins up a framework like &lt;a href="https://github.com/nicepkg/paperclip" rel="noopener noreferrer"&gt;Paperclip&lt;/a&gt; (which, to be fair, has genuinely solid engineering underneath it: heartbeat scheduling, budget caps, task queues, audit trails), and the content that follows makes it sound like you can replace an entire org overnight. The tool isn't the problem. The tool is fine. The problem is the interpretation layer: gurus filming the setup, skipping the part where 48 pre-configured agents wake up every 4 hours on a frontier model and nobody mentions what that costs at the end of the month. Or what happens when agent #23 gets a poisoned input and the other 47 trust its output.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why Multi-Agent AI Fails in Production
&lt;/h2&gt;

&lt;p&gt;The coordination problem is real and it scales badly. &lt;a href="https://galileo.ai/blog/why-multi-agent-systems-fail" rel="noopener noreferrer"&gt;Galileo's research on multi-agent reliability&lt;/a&gt; found that adding agents multiplies failure points exponentially. Four agents create six potential failure points, not four. Ten agents create 45. Every agent-to-agent handoff is a place where context gets lost, instructions get misinterpreted, or outputs get corrupted.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.cio.com/article/4143420/true-multi-agent-collaboration-doesnt-work.html" rel="noopener noreferrer"&gt;CIO reported in March 2026&lt;/a&gt; that true multi-agent collaboration remains largely aspirational. Their testing showed single agents hitting 100% success rates on isolated tasks, while hierarchical multi-agent structures failed 64% of the time and self-organized swarms failed 68%. That's not a rounding error. That's a fundamental coordination tax.&lt;/p&gt;

&lt;p&gt;The failure modes I've seen firsthand:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;No purpose definition.&lt;/strong&gt; Agents exist because someone saw a cool demo, not because the task requires decomposition. A single well-prompted agent with good tools will outperform a badly orchestrated team of five every time.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;No role boundaries.&lt;/strong&gt; Two agents stepping on each other's work, or worse, one agent undoing what another just did. Without strict scoping, you get agents arguing in loops, burning tokens while producing nothing.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cascade failures.&lt;/strong&gt; Agent A hallucinates a "fact." Agent B cites it. Agent C acts on it. By the time a human reviews the output, three layers of confident-sounding nonsense have compounded. &lt;a href="https://galileo.ai/blog/why-multi-agent-systems-fail" rel="noopener noreferrer"&gt;Galileo calls this "propagation of inaccuracies"&lt;/a&gt; and it's the single biggest reliability risk in multi-agent systems.&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Failure Pattern&lt;/th&gt;
&lt;th&gt;What Happens&lt;/th&gt;
&lt;th&gt;How It Scales&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;No purpose definition&lt;/td&gt;
&lt;td&gt;Agents do work a single agent could handle&lt;/td&gt;
&lt;td&gt;Cost multiplies, quality stays flat&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;No role boundaries&lt;/td&gt;
&lt;td&gt;Agents duplicate or undo each other's work&lt;/td&gt;
&lt;td&gt;Token burn scales quadratically with agent count&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cascade hallucination&lt;/td&gt;
&lt;td&gt;Bad output propagates through the chain&lt;/td&gt;
&lt;td&gt;Compounds per hop. 3 agents = 3 layers of compounded error&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Context window overflow&lt;/td&gt;
&lt;td&gt;Shared context exceeds model limits, agents lose thread&lt;/td&gt;
&lt;td&gt;Every agent's output inflates the shared context for every other agent&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Orchestrator bottleneck&lt;/td&gt;
&lt;td&gt;Single coordinator becomes the weakest link&lt;/td&gt;
&lt;td&gt;Orchestrator complexity grows O(n²) with agent count&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  The API Bill Nobody Shows You
&lt;/h2&gt;

&lt;p&gt;Every agent in your swarm is an API call. More accurately, every agent is &lt;em&gt;multiple&lt;/em&gt; API calls: the initial prompt, the tool calls, the retries, the context-sharing between agents. A five-agent team running on a frontier model isn't 5x the cost of one agent. It's often 10-15x once you factor in coordination overhead.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.getmonetizely.com/articles/the-complete-guide-to-agent-swarm-pricing-models-how-should-you-price-collective-ai-intelligence" rel="noopener noreferrer"&gt;Stanford's AI Index Report, cited by Monetizely&lt;/a&gt;, found that coordination overhead alone accounts for 15-25% of total operational costs in mature multi-agent systems. That's before you count the actual task execution.&lt;/p&gt;

&lt;p&gt;Here's how the math works in practice. Say you're running a research-and-write pipeline with five agents (researcher, analyst, writer, editor, fact-checker). Each agent averages 3,000 input tokens and 1,500 output tokens per task. On a frontier model, that's roughly $0.04 per agent per task &lt;em&gt;(pricing as of March 2026; check your provider's current rates)&lt;/em&gt;. Five agents: $0.20 per task. Sounds cheap, right?&lt;/p&gt;

&lt;p&gt;Now add retries (agent disagrees with another agent's output, re-runs). Add context sharing (every agent needs to see what the others produced, and input tokens multiply). Add the orchestrator's overhead. Add recursive thinking where an agent calls itself to refine. In production, that $0.20 task routinely becomes $0.80-$1.50. Run it 100 times a day and you're looking at $80-$150 daily, or $2,400-$4,500 monthly. For a single pipeline.&lt;/p&gt;

&lt;p&gt;The gurus never show you the billing dashboard. I've seen my own costs spike 4x in a single day when an agent hit a retry loop that the orchestrator didn't catch. That's the kind of lesson you only learn in production, not in a 20-minute tutorial. I wrote more about &lt;a href="https://tokita.online/autonomous-ai-agents-production-cost/" rel="noopener noreferrer"&gt;what autonomous agents actually cost in production&lt;/a&gt;, the single-agent version of this problem, which multi-agent compounds.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Security Problem Nobody's Talking About
&lt;/h2&gt;

&lt;p&gt;This is the part that genuinely concerns me. People are downloading MCP servers from GitHub, connecting premade agent builders, and giving their swarm access to production databases, file systems, and APIs, without auditing a single line of the code routing their data.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.covertswarm.com/post/multi-agent-ai-security-risks" rel="noopener noreferrer"&gt;CovertSwarm's January 2026 analysis&lt;/a&gt; exposed how agent-to-agent communication can be exploited through prompt injection, where one compromised agent manipulates another agent's behavior through crafted outputs. In a multi-agent system, a single compromised node can cascade manipulation across the entire swarm.&lt;/p&gt;

&lt;p&gt;The security gaps I see repeated constantly:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;No credential scoping.&lt;/strong&gt; Every agent gets the same API keys with the same permissions. Your research agent has write access to your production database. Your summarizer can send emails. Why?&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;No output boundaries.&lt;/strong&gt; Agent outputs aren't sanitized before being passed to the next agent. That's how prompt injection propagates. A malicious input in a research result becomes an instruction to the next agent.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Unaudited external tools.&lt;/strong&gt; That MCP server you downloaded because it had 200 GitHub stars? Did you read its source? Do you know where it sends your data? Most people don't. &lt;a href="https://tokita.online/llm-wrappers-what-actually-matters/" rel="noopener noreferrer"&gt;Most AI tools are just wrappers&lt;/a&gt; with varying levels of transparency about what happens between your input and the LLM.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;No audit trail.&lt;/strong&gt; When something goes wrong in a five-agent pipeline, can you reconstruct what each agent saw, decided, and produced? Most frameworks don't log at that granularity by default.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  What Actually Works (From Someone Who Built One)
&lt;/h2&gt;

&lt;p&gt;I run a multi-agent system in production. It works. But it works because I built it with specific constraints from day one, not because I followed a framework tutorial.&lt;/p&gt;

&lt;p&gt;Here's what I've learned, without exposing the blueprint:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Start with a purpose.&lt;/strong&gt; Every agent in the system exists because a specific task requires it. If a single agent can do the job, a single agent does the job. The question isn't "how many agents can I add?" It's "what's the minimum number of agents that makes this task decomposition actually valuable?"&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Run it monitored, not autonomous.&lt;/strong&gt; The fantasy is agents running completely on their own, 24/7, while you sleep. The reality is that unmonitored agents drift. They develop patterns you didn't intend. They find edge cases your orchestration doesn't handle. Monitor heavily, especially early on.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Set an end date.&lt;/strong&gt; Bounded execution, not open-ended. An agent swarm should complete its task and stop. "Run this analysis, produce this output, terminate." Not "keep running until I tell you to stop." Open-ended swarms are where costs and drift compound.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Scope each agent's permissions.&lt;/strong&gt; Every agent gets exactly the access it needs and nothing more. Read-only where possible. No shared credentials. If an agent needs to write to a database, that's a deliberate architectural decision with boundaries, not a default.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Audit every external tool before connecting.&lt;/strong&gt; Every MCP server, every API integration, every external data source. Read the code, understand the data flow, verify the trust boundaries. If you can't audit it, don't connect it.&lt;/p&gt;

&lt;p&gt;The pattern underneath all of this: multi-agent systems work when they're purpose-built by someone who understands every component. They fail when they're assembled from YouTube tutorials by people who are optimizing for "cool demo" instead of "reliable production system."&lt;/p&gt;

&lt;h2&gt;
  
  
  Frequently Asked Questions
&lt;/h2&gt;



&lt;p&gt;Are multi-agent AI systems worth building?&lt;span&gt;+&lt;/span&gt;&lt;/p&gt;

&lt;p&gt;Yes, if the task genuinely requires decomposition across specialized roles. Research pipelines, complex analysis workflows, and multi-step processes with distinct skill requirements are legitimate use cases. The problem isn't multi-agent as a concept. It's multi-agent as a default approach when a single well-tooled agent would do the job better, cheaper, and more reliably.&lt;/p&gt;



&lt;p&gt;How much does it cost to run a multi-agent AI system?&lt;span&gt;+&lt;/span&gt;&lt;/p&gt;

&lt;p&gt;It depends on the model, agent count, and task complexity, but multi-agent costs are multiplicative, not additive. A five-agent pipeline on a frontier model can cost 10-15x what a single agent costs per task once you factor in context sharing, retries, and coordination overhead. &lt;a href="https://www.getmonetizely.com/articles/the-complete-guide-to-agent-swarm-pricing-models-how-should-you-price-collective-ai-intelligence" rel="noopener noreferrer"&gt;Stanford's AI Index Report via Monetizely estimates&lt;/a&gt; coordination overhead alone accounts for 15-25% of operational costs. Budget for at least 3-5x your single-agent baseline when planning multi-agent deployments.&lt;/p&gt;



&lt;p&gt;What are the biggest security risks with AI agent swarms?&lt;span&gt;+&lt;/span&gt;&lt;/p&gt;

&lt;p&gt;The top risks are unscoped credentials (every agent gets full access instead of minimum required), unaudited external tools (MCP servers and API integrations you didn't read the source for), and agent-to-agent prompt injection (where a compromised agent manipulates others through crafted outputs). &lt;a href="https://www.covertswarm.com/post/multi-agent-ai-security-risks" rel="noopener noreferrer"&gt;CovertSwarm documented&lt;/a&gt; how inter-agent trust can be exploited in January 2026.&lt;/p&gt;



&lt;p&gt;Should I use CrewAI, AutoGen, or LangGraph for multi-agent AI?&lt;span&gt;+&lt;/span&gt;&lt;/p&gt;

&lt;p&gt;The framework matters less than the architecture decisions you make within it. All three can produce working multi-agent systems, and all three can produce expensive failures. The questions that actually matter: Do you have a clear purpose for each agent? Are permissions scoped per agent? Do you have monitoring and cost controls? Can you audit every external integration? If you can't answer yes to all four, the framework choice is irrelevant. You'll fail regardless of which one you pick.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Bottom Line
&lt;/h2&gt;

&lt;p&gt;Agent swarms aren't bad. Unexamined swarms are. The technology works. I use it daily. But it works because every agent has a purpose, every permission is scoped, every external tool is audited, and the whole system runs monitored with bounded execution.&lt;/p&gt;

&lt;p&gt;The gap in the current conversation isn't technical capability. It's operational maturity. The frameworks are getting better. The models are getting cheaper. But the advice circulating ("just add more agents") is setting people up to build expensive, insecure systems they don't understand.&lt;/p&gt;

&lt;p&gt;Build with purpose. Monitor heavily. Kill when done.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Tom Tokita is the President of &lt;a href="https://aether-global.com" rel="noopener noreferrer"&gt;Aether Global Technology Inc.&lt;/a&gt;, a Salesforce consulting firm in Manila. He built a personal AI operations system as his daily driver. Not planned. Engineered out of necessity. He writes about what works, what breaks, and what the industry keeps getting wrong.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>programming</category>
      <category>security</category>
      <category>machinelearning</category>
    </item>
    <item>
      <title>Someone Called My AI System a Tool. Then They Showed Me Theirs.</title>
      <dc:creator>Tom Tokita</dc:creator>
      <pubDate>Sat, 09 May 2026 16:08:07 +0000</pubDate>
      <link>https://dev.to/tomtokita/someone-called-my-ai-system-a-tool-then-they-showed-me-theirs-4954</link>
      <guid>https://dev.to/tomtokita/someone-called-my-ai-system-a-tool-then-they-showed-me-theirs-4954</guid>
      <description>&lt;p&gt;Someone at a conference asked me what I'd been building. I described a system I use daily. Over 200 sessions of accumulated learnings. 45 mechanical hooks that fire before and after every action. Anti-fabrication gates that block the AI from stating anything it hasn't verified. Memory that survives context compression. Deploy protections that physically prevent wrong-target pushes. A behavioral identity that gets re-injected every message so the system doesn't drift into generic assistant mode.&lt;/p&gt;

&lt;p&gt;He nodded and said, "Oh, so you built a tool."&lt;/p&gt;

&lt;p&gt;Then he described his. "I built something similar," he said. An agent framework. A React dashboard. A task board. Some cron jobs. A dozen agents with names. A job worker that shells out to the agent CLI and captures stdout. He showed me the architecture diagram. Three boxes connected by arrows.&lt;/p&gt;

&lt;p&gt;I asked about guardrails. "What do you mean?" I asked what happens when an agent hallucinates a data point and the next agent downstream treats it as fact. He said that hasn't happened yet. I asked about credential scoping. Every agent had the same API keys with the same permissions. I asked what happens when context compresses mid-task. He didn't know what context compression was.&lt;/p&gt;

&lt;p&gt;We were not building the same thing.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Assembly Pattern
&lt;/h2&gt;

&lt;p&gt;This pattern is everywhere right now. Pull an open-source agent framework. Fork a React cockpit from GitHub. Wire them together with a thin HTTP layer. Add some agent definitions with fun names. Ship a demo. Call it "AI infrastructure."&lt;/p&gt;

&lt;p&gt;It works in the demo. It works for the screenshot. It even works the first five times you run it.&lt;/p&gt;

&lt;p&gt;It stops working when an agent fabricates a statistic and your client reads it. When a retry loop burns $400 in API calls overnight because nothing capped the spend. When an agent with write access to your production database decides to "clean up" records it hallucinated as duplicates.&lt;/p&gt;

&lt;p&gt;The assembly is the easy part. The demo is the easy part. What comes after the demo is where the actual engineering lives.&lt;/p&gt;

&lt;h2&gt;
  
  
  What's Missing From Every Patchwork Build I've Reviewed
&lt;/h2&gt;

&lt;p&gt;I've audited three of these setups in the past year. Internal team builds, partner builds, open-source-assembled stacks. The gaps are identical every time.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;What Production Requires&lt;/th&gt;
&lt;th&gt;What the Patchwork Has&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Pre-action gates (mechanical blocks before execution)&lt;/td&gt;
&lt;td&gt;Nothing. Agent output accepted as final answer&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Anti-fabrication (every claim must trace to a source)&lt;/td&gt;
&lt;td&gt;Nothing. Whatever the LLM says is treated as fact&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Anti-drift detection (behavioral correction over long sessions)&lt;/td&gt;
&lt;td&gt;Nothing. Agents drift silently&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Persistent memory with session recovery&lt;/td&gt;
&lt;td&gt;Stateless. Fresh context every run&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Captured learnings (compound knowledge over time)&lt;/td&gt;
&lt;td&gt;Nothing. Same mistakes are repeatable indefinitely&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Credential scoping per agent&lt;/td&gt;
&lt;td&gt;Shared keys, full permissions, no boundaries&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Human checkpoints on multi-step tasks&lt;/td&gt;
&lt;td&gt;Fully autonomous, no review loop&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The common response: "We'll add that later." In my experience, later means after the first production incident. And the first production incident in an unharnessed AI system is rarely small.&lt;/p&gt;

&lt;h2&gt;
  
  
  Assembly Is Not Engineering
&lt;/h2&gt;

&lt;p&gt;I want to be clear. I'm not against using open-source. I use open-source tools constantly. MIT-licensed projects power parts of my own stack. Pulling from the community is smart and efficient.&lt;/p&gt;

&lt;p&gt;But there's a gap between assembling components and engineering a system. Assembly is connecting boxes. Engineering is understanding what happens at every connection point when things go wrong. What happens when the model hallucinates at step 3 of a 7-step pipeline? What happens when context compresses and the agent forgets the rules you set 40 messages ago? What happens when an agent gets a poisoned input from an unaudited MCP server?&lt;/p&gt;

&lt;p&gt;If you can't answer those questions, you haven't built infrastructure. You've built a demo with a longer runtime.&lt;/p&gt;

&lt;h2&gt;
  
  
  "I'll Just Have My AI Build It"
&lt;/h2&gt;

&lt;p&gt;This is the part that genuinely worries me.&lt;/p&gt;

&lt;p&gt;The assembly pattern is accelerating because people are using AI to do the assembling. "I'll just have Claude/GPT scaffold my agent system." The AI reads some docs, maybe runs a web search, ingests a few blog posts about agent frameworks, and produces something that looks like architecture. Clean folder structure. Reasonable-sounding agent definitions. Maybe even a README with a diagram.&lt;/p&gt;

&lt;p&gt;But it's architecture by hallucination. The AI doesn't know what breaks in production because it's never been in production. It doesn't know that context compression silently erases behavioral rules at message 180. It doesn't know that an unscoped MCP server will happily route your client data through an endpoint you never audited. It doesn't know that "just add a retry" turns a $0.20 task into a $40 task when the retry loop has no ceiling.&lt;/p&gt;

&lt;p&gt;What you get is a system that looks engineered but isn't. It passes the screenshot test. It passes the "show the team" test. It fails the Tuesday afternoon test, when something unexpected happens and there's no gate to catch it, no captured learning to reference, no incident history to draw from.&lt;/p&gt;

&lt;p&gt;AI is intelligent. It can write code, generate configurations, and produce plausible architectures. What it cannot do is architect from pain it hasn't experienced. Every rule in a real harness exists because something specific went wrong. The AI building your system hasn't had things go wrong yet. It's working from blog posts and documentation, not from the 11 PM deploy that almost went to the wrong org.&lt;/p&gt;

&lt;p&gt;The irony is thick. An unharnessed AI building the infrastructure that's supposed to harness AI. The output will be confident, well-structured, and missing every lesson that only production teaches.&lt;/p&gt;

&lt;h2&gt;
  
  
  What "Infrastructure" Actually Means
&lt;/h2&gt;

&lt;p&gt;The system I described at that conference didn't start as infrastructure. It started as a mess. A rules file that grew from 5 entries to 27 because the AI kept finding new ways to surprise me. A hook I wrote at 11 PM because the system nearly pushed metadata to the wrong environment. A memory protocol I built because the AI forgot everything after context compression and started making the same mistakes I'd fixed three hours earlier.&lt;/p&gt;

&lt;p&gt;Every rule in the harness traces to a specific failure. That's not architecture by design. It's architecture by incident. But it compounds. 200+ sessions of captured learnings means the system knows things a fresh agent never will. Platform quirks, client-specific constraints, failure patterns that repeat across projects. None of that lives in an agent framework you pulled from GitHub last Tuesday.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://tokita.online/what-is-harness-engineering/" rel="noopener noreferrer"&gt;I wrote about this convergence pattern recently&lt;/a&gt;. Multiple teams, from OpenAI to Martin Fowler's group to a solo practitioner in Manila, arrived at the same conclusion independently: the harness is the product, not the model. A disciplined harness on a weaker model beats an unconstrained stronger model every time.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Uncomfortable Question
&lt;/h2&gt;

&lt;p&gt;Next time someone shows you their "AI infrastructure," ask them three questions:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;What happens when an agent fabricates a data point? Is there a mechanical gate, or do you just hope it doesn't?&lt;/li&gt;
&lt;li&gt;What happens after context compression? Does the system recover its behavioral rules, or does it revert to a generic assistant?&lt;/li&gt;
&lt;li&gt;Can you trace every rule in your system to a specific incident that forced you to add it?&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;If the answers are "hasn't happened yet," "what's context compression," and a blank stare, you're looking at a patchwork. Not infrastructure.&lt;/p&gt;

&lt;p&gt;And that's fine. Everyone starts with a patchwork. I did. The question is whether you know the difference.&lt;/p&gt;

&lt;p&gt;If you want to start building the real thing, I wrote a &lt;a href="https://tokita.online/ai-agent-pre-action-gate-tutorial/" rel="noopener noreferrer"&gt;hands-on tutorial with three production-tested gates and starter code&lt;/a&gt;. The gates are also packaged as a &lt;a href="https://github.com/tomtokitajr/ai-agent-gates" rel="noopener noreferrer"&gt;ready-to-clone repo on GitHub&lt;/a&gt;. Zero dependencies, works with any LLM provider.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;I'm Tom Tokita. I run &lt;a href="https://aether-global.com" rel="noopener noreferrer"&gt;Aether Global Technology&lt;/a&gt; out of Manila. I've been building and operating a production AI system daily for over 200 sessions. I write about what works, what breaks, and the gap between demos and production. &lt;a href="https://tokita.online" rel="noopener noreferrer"&gt;More on tokita.online.&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>programming</category>
      <category>devops</category>
      <category>security</category>
    </item>
    <item>
      <title>Context Engineering: Why Your AI Strategy Needs Infrastructure, Not Better Prompts</title>
      <dc:creator>Tom Tokita</dc:creator>
      <pubDate>Sat, 09 May 2026 13:07:46 +0000</pubDate>
      <link>https://dev.to/tomtokita/context-engineering-why-your-ai-strategy-needs-infrastructure-not-better-prompts-378j</link>
      <guid>https://dev.to/tomtokita/context-engineering-why-your-ai-strategy-needs-infrastructure-not-better-prompts-378j</guid>
      <description>&lt;p&gt;&lt;strong&gt;Five minutes on LinkedIn&lt;/strong&gt; and you'll find it. Someone sharing "the one prompt that changed everything." A magic system prompt. A secret ChatGPT trick. A "10x framework."&lt;/p&gt;

&lt;p&gt;I've built production AI systems across enterprise consulting, content automation, and internal operations. The prompt is maybe 5% of why any of it works.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The other 95%?&lt;/strong&gt; Infrastructure. Memory. Enforcement. Captured learnings. That's context engineering, and it's the skill that actually matters in 2026.&lt;/p&gt;




&lt;h2&gt;
  
  
  Prompt Engineering Has a Ceiling
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Prompt engineering isn't useless.&lt;/strong&gt; It's just the starting line. Here's what the prompt gurus conveniently leave out:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;What They Show&lt;/th&gt;
&lt;th&gt;What Actually Happens&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Fresh conversation, perfect prompt&lt;/td&gt;
&lt;td&gt;Message 200. Context window full, business rules forgotten&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;One-shot demo, curated input&lt;/td&gt;
&lt;td&gt;Production workflow hitting edge cases the prompt never anticipated&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;"Just tell the AI to be careful"&lt;/td&gt;
&lt;td&gt;AI ignoring that instruction 3 hours into a session&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Prompts are stateless.&lt;/strong&gt; Every conversation starts from zero. Your AI doesn't remember what worked yesterday or what broke last week.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;That's not a prompt problem.&lt;/strong&gt; That's an infrastructure problem.&lt;/p&gt;




&lt;h2&gt;
  
  
  What Is Context Engineering?
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;The short version:&lt;/strong&gt; designing systems that deliver the right information to an AI at the right time, maintain behavioral consistency, and improve through captured experience.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;It's not a prompt template.&lt;/strong&gt; It's architecture.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Prompt engineering&lt;/strong&gt; = giving a new hire a great job description.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Context engineering&lt;/strong&gt; = giving them the job description, an onboarding manual, institutional knowledge, and a manager who catches mistakes before they ship.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Which one performs better on day 30?&lt;/p&gt;




&lt;h2&gt;
  
  
  The Three Layers
&lt;/h2&gt;

&lt;p&gt;Every production AI system I've built operates on three layers.&lt;/p&gt;

&lt;h3&gt;
  
  
  Layer 1: What the AI Knows Right Now
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;The active context:&lt;/strong&gt; current conversation, task at hand, files being worked on. Most people stop here.&lt;/p&gt;

&lt;h3&gt;
  
  
  Layer 2: What It Can Retrieve When Needed
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;The retrieval layer:&lt;/strong&gt; persistent memory, documented learnings, platform-specific knowledge the AI pulls in when relevant. The AI needs to know &lt;em&gt;where to look&lt;/em&gt;, not memorize everything.&lt;/p&gt;

&lt;h3&gt;
  
  
  Layer 3: What It's Mechanically Prevented From Doing Wrong
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;The enforcement layer:&lt;/strong&gt; automated checks that fire before or after AI actions. Not guidelines. Not suggestions. &lt;strong&gt;Mechanical gates.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The gap:&lt;/strong&gt; most AI implementations have Layer 1. Some have Layer 2. Almost nobody has Layer 3.&lt;/p&gt;




&lt;h2&gt;
  
  
  Memory: Teaching AI to Remember
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;The biggest lie in AI tooling&lt;/strong&gt; is that conversation history equals memory. It doesn't.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Conversation history is a rolling buffer&lt;/strong&gt; that gets compressed, truncated, or dropped. Your AI doesn't "remember." It reads what's still in the window.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Production memory looks different:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Persistent state files:&lt;/strong&gt; structured notes the AI reads at session start. Project status, decisions made, open items. Intentional, curated memory, not chat history.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Session recovery:&lt;/strong&gt; what happens after context compression or a new session? If the answer is "start over," you're re-teaching the AI every time.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Platform learnings:&lt;/strong&gt; captured knowledge about specific tools and platforms. Every quirk, every gotcha, every workaround. An AI that's absorbed 100+ sessions of this doesn't make rookie mistakes.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;The compound effect:&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Time&lt;/th&gt;
&lt;th&gt;What the AI Knows&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Day 1&lt;/td&gt;
&lt;td&gt;The prompt&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Week 2&lt;/td&gt;
&lt;td&gt;Prompt + 10 captured learnings&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Month 3&lt;/td&gt;
&lt;td&gt;Prompt + 60 learnings + platform quirks + failure patterns&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Month 6&lt;/td&gt;
&lt;td&gt;Knows your business better than most new hires&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;That's the moat.&lt;/strong&gt; No prompt template replicates six months of captured institutional knowledge.&lt;/p&gt;




&lt;h2&gt;
  
  
  Enforcement: Mechanical Gates, Not Vibes
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;"Be careful" is not a guardrail.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Writing "always verify before acting" in a system prompt&lt;/strong&gt; is a suggestion. The AI follows it when convenient, ignores it when confidence is high. I've watched it happen dozens of times.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Production enforcement is mechanical:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Pre-action gates:&lt;/strong&gt; automated checks that fire &lt;em&gt;before&lt;/em&gt; execution. The AI literally cannot proceed without passing. Not a prompt instruction. A system-level block.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Anti-drift detection:&lt;/strong&gt; AI behavior softens toward generic assistant mode over long sessions. Enforcement catches this and corrects it. Mechanically. Not by asking nicely.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Anti-fabrication:&lt;/strong&gt; every data point traces to a named source. No source? Flagged, not presented as fact. In client work, fabricated data is career-ending.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Scope control:&lt;/strong&gt; the AI does what was asked. Not "while I'm here, let me also improve this." Bug fix ≠ refactor. Enforced.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;blockquote&gt;
&lt;p&gt;Stop thinking about what you &lt;em&gt;want&lt;/em&gt; the AI to do. Start thinking about what you need to &lt;strong&gt;prevent&lt;/strong&gt; it from doing.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  The Methodology: Small Tests, Captured Learnings, Iteration
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;The guru approach:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Craft the perfect prompt&lt;/li&gt;
&lt;li&gt;Ship it&lt;/li&gt;
&lt;li&gt;Hope it works&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;The practitioner approach:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Run a small test&lt;/li&gt;
&lt;li&gt;See what breaks&lt;/li&gt;
&lt;li&gt;Capture the lesson&lt;/li&gt;
&lt;li&gt;Update the system&lt;/li&gt;
&lt;li&gt;Run again&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Boring? Yes. Effective? Absolutely.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Every bug fix becomes a learning.&lt;/strong&gt; Every platform quirk gets documented. Every failure mode gets a guardrail. The system gets smarter not because the model improved, but because you designed it to learn from its own mistakes.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Building from the Philippines,&lt;/strong&gt; we work with smaller teams and tighter budgets. We can't afford an AI that makes the same mistake twice. The methodology isn't a nice-to-have. It's survival.&lt;/p&gt;




&lt;h2&gt;
  
  
  Why Infrastructure Beats Inspiration
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;The "magic prompt" has a half-life.&lt;/strong&gt; Models update. Context windows change. Your clever prompt breaks. You rewrite it. It breaks again. Welcome to the treadmill.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;Magic Prompt&lt;/th&gt;
&lt;th&gt;Context Infrastructure&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Model update&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Breaks, needs rewrite&lt;/td&gt;
&lt;td&gt;Swap the engine, keep the learnings&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Long session&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Degrades, drifts&lt;/td&gt;
&lt;td&gt;Mechanical gates hold&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;New platform&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Starts from zero&lt;/td&gt;
&lt;td&gt;Builds on captured learnings&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Team scales&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Everyone writes their own prompts&lt;/td&gt;
&lt;td&gt;Everyone uses the same system&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Day 200&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Same as Day 1&lt;/td&gt;
&lt;td&gt;200 days of compound knowledge&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;The uncomfortable truth:&lt;/strong&gt; building AI infrastructure is boring. Config files. Memory protocols. Documentation. Capture routines. Doesn't make a great LinkedIn carousel.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;But it's the difference&lt;/strong&gt; between an AI demo and an AI system.&lt;/p&gt;




&lt;h2&gt;
  
  
  Getting Started
&lt;/h2&gt;

&lt;p&gt;You don't need to build everything at once.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Give your AI memory.&lt;/strong&gt; A file it reads at session start: project state, decisions, open items. Even a simple markdown file. Never start from zero.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Add one guardrail.&lt;/strong&gt; Pick your AI's most common failure mode. Build one mechanical check for it. Not a prompt instruction. A gate.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. Capture one learning per session.&lt;/strong&gt; What broke? What worked? What should the AI remember next time? Write it down. Feed it back.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;4. Build from there.&lt;/strong&gt; The system doesn't have to be elegant. It has to work. And improve.&lt;/p&gt;




&lt;h2&gt;
  
  
  Bottom Line
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Prompt engineering gets you started.&lt;/strong&gt; Context engineering gets you to production.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The practitioners who win&lt;/strong&gt; in the next two years won't be the best prompt writers. They'll be the ones who built systems that remember, enforce, and learn.&lt;/p&gt;

&lt;p&gt;The infrastructure is boring. The results aren't.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;I'm Tom Tokita. I run &lt;a href="https://aether-global.com" rel="noopener noreferrer"&gt;Aether Global Technology&lt;/a&gt; out of Manila. We build production AI systems and Salesforce implementations for companies that need things to actually work. Want to talk context engineering or argue about whether prompt engineering is dead? &lt;a href="https://aether-global.com/contact" rel="noopener noreferrer"&gt;Let's go.&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Read next: &lt;a href="https://dev.to/blog/autonomous-ai-agents-production-cost"&gt;Autonomous AI Agents Look Great in Demos. Here's What They Cost in Production.&lt;/a&gt; · &lt;a href="https://dev.to/blog/llm-wrappers-what-actually-matters"&gt;Most AI Tools Are Just LLM Wrappers. Here's What Actually Matters.&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>programming</category>
      <category>architecture</category>
      <category>machinelearning</category>
    </item>
    <item>
      <title>I Didn't Know I Was Doing Harness Engineering</title>
      <dc:creator>Tom Tokita</dc:creator>
      <pubDate>Tue, 05 May 2026 08:59:53 +0000</pubDate>
      <link>https://dev.to/tomtokita/i-didnt-know-i-was-doing-harness-engineering-5a01</link>
      <guid>https://dev.to/tomtokita/i-didnt-know-i-was-doing-harness-engineering-5a01</guid>
      <description>&lt;p&gt;In February 2026, &lt;a href="https://mitchellh.com/writing/my-ai-adoption-journey" rel="noopener noreferrer"&gt;Mitchell Hashimoto&lt;/a&gt; (co-founder of HashiCorp) described his habit of engineering permanent fixes into an AI agent's environment whenever it made a mistake. He called it "engineering the harness." Days later, &lt;a href="https://openai.com/index/harness-engineering/" rel="noopener noreferrer"&gt;OpenAI formalized the concept&lt;/a&gt; in a blog post. Around the same time, without having read either, I wrote my first enforcement hook for a production AI system. Different continent, different scale, different context. Same problem.&lt;/p&gt;

&lt;p&gt;A few weeks later, Birgitta Bockeler &lt;a href="https://martinfowler.com/articles/harness-engineering.html" rel="noopener noreferrer"&gt;formalized it on Martin Fowler's site&lt;/a&gt;. Red Hat published their version. LangChain. Salesforce. By April, the term was everywhere.&lt;/p&gt;

&lt;p&gt;I didn't discover any of this until recently. I was too busy building the thing they were naming.&lt;/p&gt;

&lt;p&gt;That's not a flex. It's something more interesting. When engineers face the same constraints (unreliable model outputs, production stakes, context that evaporates), they converge on the same solutions. Different trails, same summit. And if your messy pile of rules and scripts looks suspiciously like what OpenAI and Fowler describe, that's not coincidence. It's validation.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Is Harness Engineering (And Why It Matters for AI Agents)
&lt;/h2&gt;

&lt;p&gt;Harness engineering is the discipline of building the constraints, gates, memory systems, and feedback loops that wrap around an AI agent to make it reliable in production. The core equation, from Martin Fowler's team: &lt;strong&gt;Agent = Model + Harness.&lt;/strong&gt; The harness is everything around the model that you actually control.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://developers.redhat.com/articles/2026/04/07/harness-engineering-structured-workflows-ai-assisted-development" rel="noopener noreferrer"&gt;Red Hat&lt;/a&gt; puts it differently. "The AI writes better code when you design the environment it works in." Their framing is about structured workflows. Templates. Impact maps. Acceptance criteria.&lt;/p&gt;

&lt;p&gt;Both are right. Neither is complete.&lt;/p&gt;

&lt;p&gt;They describe the architecture. They don't describe the pain that forces you to build it.&lt;/p&gt;

&lt;h2&gt;
  
  
  How My Harness Grew (Without Me Realizing What It Was)
&lt;/h2&gt;

&lt;p&gt;I run a production AI system as a daily driver. Not a demo. Not a proof of concept. A system that manages infrastructure, writes code, deploys to servers, interacts with APIs, and handles real stakes across real projects. I co-founded &lt;a href="https://aether-global.com" rel="noopener noreferrer"&gt;Aether Global Technology&lt;/a&gt;, a Salesforce consulting partner in Manila. The system runs alongside that work.&lt;/p&gt;

&lt;p&gt;I never sat down and said "I'm going to build a harness." I just kept getting burned, and kept adding rules so I wouldn't get burned the same way twice. Looking back, every rule traces to a specific failure.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The anti-fabrication rules&lt;/strong&gt; exist because the AI confidently stated a method existed in a file it hadn't read. I spent 45 minutes debugging code that was never there. The fix wasn't better prompting. It was a mechanical gate: before asserting any method name or file path, the system must verify via tool. No verification, no assertion. That's a feedforward control, in Fowler's language. I just called it "stop making things up."&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The deploy gate&lt;/strong&gt; exists because the system nearly pushed Salesforce metadata to the wrong sandbox. 54 files, wrong org. The fix was a target allowlist per project, checked mechanically before any deploy command executes. A hard block, not a polite suggestion. (Sound familiar? &lt;a href="https://tokita.online/ai-agent-production-safety/" rel="noopener noreferrer"&gt;An AI agent deleted a production database in 9 seconds&lt;/a&gt; because nobody built one of these.)&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The anti-drift rules&lt;/strong&gt; exist because after multiple tool calls, the system's mental model of a file diverges from the file's actual state. It recalls values it read 20 minutes ago, not the values that exist now. The fix: re-read the source before emitting anything external-facing. Grep at write time, not recall time.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The citation requirement&lt;/strong&gt; exists because the system generated a client proposal with a number it pulled from nowhere. In consulting, a wrong number in front of a client is a credibility hit you don't recover from. The rule is simple now: every data claim needs a source. No source, mark it as unverified. No exceptions.&lt;/p&gt;

&lt;p&gt;None of these came from reading a framework. They came from things going wrong on a Tuesday afternoon.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Fowler Gets Right
&lt;/h2&gt;

&lt;p&gt;The dual-control model is real. You need both feedforward controls (rules that prevent bad behavior before it happens) and feedback controls (sensors that catch it after). Relying on just one creates blind spots.&lt;/p&gt;

&lt;p&gt;My system has 40+ feedforward hooks. They fire before tool calls, checking for unauthorized domains, verifying pre-task knowledge checks happened, blocking destructive git operations, enforcing deploy targets. The same problems I wrote about in &lt;a href="https://tokita.online/autonomous-ai-agents-production-cost/" rel="noopener noreferrer"&gt;what autonomous agents actually cost in production&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;The feedback side is thinner. I have post-execution checks and monitoring, but the honest truth is that feedforward controls do most of the heavy lifting. Catching a bad action before it executes is cheaper than cleaning up after it runs.&lt;/p&gt;

&lt;p&gt;Fowler also nails the distinction between computational and inferential controls. My deploy gate is computational. It checks a JSON allowlist. Takes milliseconds. My anti-fabrication system is inferential. It relies on the model itself to flag uncertainty. That's slower, less reliable, and more expensive. But it catches things no deterministic check can.&lt;/p&gt;

&lt;h2&gt;
  
  
  What the Frameworks Miss
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Harnesses are incident-driven, not architecture-driven.&lt;/strong&gt; The literature treats harness engineering as a design discipline. It is, eventually. But every harness I've seen starts as a pile of duct tape applied after something broke. The elegance comes later.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Context survival is the real engineering problem.&lt;/strong&gt; Nobody talks about this enough. AI agents operate in conversation windows. Those windows compress. When they compress, the agent forgets rules, loses project state, and starts making the same mistakes you fixed three hours ago. My harness has a dedicated recovery protocol: when context compresses, reload memory, re-read project state, verify the date (the agent doesn't know what day it is after compression). That's not in any of the frameworks. It should be.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The harness is the product, not the model.&lt;/strong&gt; When people evaluate AI systems, they compare models. Claude vs. GPT vs. Gemini. That's the wrong comparison. The model is interchangeable. I've run the same harness across model versions, and the harness determines output quality more than the model does. A disciplined harness on a weaker model beats an unconstrained stronger model every time.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Human checkpoints aren't optional.&lt;/strong&gt; Red Hat says "human review between planning and implementation." That's correct but undersells it. In my system, any task with three or more steps requires a plan review before execution. Single-step tasks state the intended action and wait. This isn't a nice-to-have. It's the difference between an AI agent that helps and one that creates work.&lt;/p&gt;

&lt;h2&gt;
  
  
  Same Summit, Different Trails
&lt;/h2&gt;

&lt;p&gt;Here's what I find encouraging about this whole thing.&lt;/p&gt;

&lt;p&gt;My first hook was mid-February 2026. By March, I'd codified the principle "mechanical enforcement over behavioral commitment" because telling the model not to do something stopped working the moment context compressed. By April, I had 30+ hooks, a memory layer that survives compression, and a pre-task gate system that forces verification before every edit.&lt;/p&gt;

&lt;p&gt;I built all of this without reading a single blog post about harness engineering. I built it because things kept breaking, and I was tired of fixing the same failures manually.&lt;/p&gt;

&lt;p&gt;OpenAI, Fowler, Red Hat, LangChain, Salesforce. They all arrived at the same architecture from the enterprise side. I arrived from the practitioner side. A guy in Manila running one AI system across 40+ projects, duct-taping rules onto it every time something went wrong.&lt;/p&gt;

&lt;p&gt;The fact that we converged tells you something important: &lt;strong&gt;this isn't a framework you adopt. It's a shape that production forces you into.&lt;/strong&gt; If you're running an AI agent on real work and you've started writing rules, blocking certain commands, requiring verification steps before deploys, you're already doing harness engineering. You just didn't know it had a name.&lt;/p&gt;

&lt;p&gt;The industry version is clean. Diagrams with boxes. Three regulation dimensions. Harness templates.&lt;/p&gt;

&lt;p&gt;The practitioner's version is messier. A behavioral rules file that grew from 5 rules to 13 because the AI kept finding new ways to drift. A hook that blocks web searches because the AI was burning API calls on questions its own knowledge base could answer. A gate that forces the system to check what day it is before referencing time, because it hallucinated the date twice.&lt;/p&gt;

&lt;p&gt;Both versions work. Both are valid. The diagram didn't exist when I needed a solution. The solution existed when the diagram caught up.&lt;/p&gt;

&lt;p&gt;If you're building something like this and wondering whether you're doing it right, check it against Fowler's framework. If your scrappy infrastructure maps to their categories (guides, sensors, computational controls, inferential controls), you're on the right track. The problems are universal. The solutions are convergent. And you don't need permission from a blog post to keep building.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Originally published at &lt;a href="https://tokita.online/what-is-harness-engineering/" rel="noopener noreferrer"&gt;tokita.online&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>programming</category>
      <category>devops</category>
      <category>machinelearning</category>
    </item>
  </channel>
</rss>
