<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Joske Vermeulen</title>
    <description>The latest articles on DEV Community by Joske Vermeulen (@ai_made_tools).</description>
    <link>https://dev.to/ai_made_tools</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3826720%2Fae1f6683-395f-4709-ba99-2212323b958e.png</url>
      <title>DEV Community: Joske Vermeulen</title>
      <link>https://dev.to/ai_made_tools</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/ai_made_tools"/>
    <language>en</language>
    <item>
      <title>AI Dev Weekly #16: Mistral OCR 4, Claude Tag, Alibaba Caught Stealing, GPT-5.6 Delayed</title>
      <dc:creator>Joske Vermeulen</dc:creator>
      <pubDate>Thu, 25 Jun 2026 12:41:44 +0000</pubDate>
      <link>https://dev.to/ai_made_tools/ai-dev-weekly-16-mistral-ocr-4-claude-tag-alibaba-caught-stealing-gpt-56-delayed-2bll</link>
      <guid>https://dev.to/ai_made_tools/ai-dev-weekly-16-mistral-ocr-4-claude-tag-alibaba-caught-stealing-gpt-56-delayed-2bll</guid>
      <description>&lt;p&gt;&lt;em&gt;AI Dev Weekly is a Thursday series where I cover the week's most important AI developer news, with my take as someone who actually uses these tools daily.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;OCR had a week. Mistral dropped OCR 4 with bounding boxes. Baidu open-sourced a model that beats DeepSeek-OCR. Claude got a permanent home inside Slack. And the Fable 5 ban fallout keeps getting uglier: Alibaba was apparently stealing Claude's capabilities, and even the NSA lost access to Mythos. Meanwhile, GPT-5.6 is delayed to mid-July. Let's go.&lt;/p&gt;

&lt;h2&gt;
  
  
  1. Mistral OCR 4: document AI gets serious
&lt;/h2&gt;

&lt;p&gt;Mistral launched &lt;a href="https://www.aimadetools.com/blog/mistral-ocr-4-complete-guide/?utm_source=devto" rel="noopener noreferrer"&gt;OCR 4&lt;/a&gt; this week. It's not just another OCR model. It's a full document understanding system with paragraph-level bounding boxes, confidence scores, and support for 170 languages.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The specs:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;$4 per 1,000 pages (standard), $2 per 1,000 pages (batch)&lt;/li&gt;
&lt;li&gt;Paragraph-level bounding boxes with coordinates&lt;/li&gt;
&lt;li&gt;72% win rate in blind tests against competitors&lt;/li&gt;
&lt;li&gt;Available on la Plateforme, Microsoft Foundry, and self-hosted for enterprise&lt;/li&gt;
&lt;li&gt;Top score on OlmOCRBench&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Why this matters for developers:&lt;/strong&gt; Bounding boxes are the real change. Text-only OCR output tells you what a document says; OCR 4 also tells you where each paragraph sits on the page. That unlocks document search, compliance systems, and any workflow where page structure matters.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;My take:&lt;/strong&gt; At $4/1000 pages, this is competitive with Google Document AI ($5) and significantly cheaper than building your own pipeline. For enterprise document processing, this is probably the best option right now. For budget-conscious developers, &lt;a href="https://www.aimadetools.com/blog/baidu-unlimited-ocr-complete-guide/?utm_source=devto" rel="noopener noreferrer"&gt;Baidu's free alternative&lt;/a&gt; (see below) is worth considering. Full comparison in our &lt;a href="https://www.aimadetools.com/blog/mistral-ocr-4-vs-deepseek-vision-vs-baidu-unlimited-ocr/?utm_source=devto" rel="noopener noreferrer"&gt;Mistral vs DeepSeek vs Baidu&lt;/a&gt; breakdown.&lt;/p&gt;
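
&lt;p&gt;To make the pricing concrete, here is a rough sketch that turns the per-1,000-page rates quoted above into a monthly bill. The workload of 250,000 pages is a made-up number for illustration, and the Google Document AI rate is simply the figure cited in this post, so treat the output as a back-of-the-envelope estimate rather than a quote.&lt;/p&gt;

```python
# Prices quoted above, in USD per 1,000 pages.
PRICE_PER_1000_PAGES = {
    "mistral_ocr_4_standard": 4.00,
    "mistral_ocr_4_batch": 2.00,
    "google_document_ai": 5.00,  # figure cited in this post, not re-checked
}


def monthly_cost(pages: int, price_per_1000: float) -> float:
    """Return the cost in USD of processing `pages` pages at a flat rate."""
    return pages / 1000 * price_per_1000


pages_per_month = 250_000  # hypothetical workload, for illustration only
for name, price in PRICE_PER_1000_PAGES.items():
    print(f"{name}: ${monthly_cost(pages_per_month, price):,.2f}")
```

&lt;p&gt;Swap in your own page volume to see where the batch discount starts to matter for you.&lt;/p&gt;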

&lt;h2&gt;
  
  
  2. Baidu open-sources Unlimited-OCR
&lt;/h2&gt;

&lt;p&gt;While Mistral went commercial, Baidu went open. &lt;a href="https://www.aimadetools.com/blog/baidu-unlimited-ocr-complete-guide/?utm_source=devto" rel="noopener noreferrer"&gt;Unlimited-OCR&lt;/a&gt; is a 3B-parameter MIT-licensed model that processes multi-page PDFs in a single inference pass.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Key features:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Built on the DeepSeek-OCR architecture (SAM + CLIP encoder, DeepSeek-V2 MoE decoder)&lt;/li&gt;
&lt;li&gt;Reference Sliding Window Attention for memory efficiency on long documents&lt;/li&gt;
&lt;li&gt;Tables to HTML, equations to LaTeX, layout to bounding boxes&lt;/li&gt;
&lt;li&gt;Private by design: nothing leaves your device&lt;/li&gt;
&lt;li&gt;GGUF, MLX, NVFP4 quantizations already available&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;My take:&lt;/strong&gt; For a 3B model you can run on a laptop, this is remarkably capable. It won't match Mistral OCR 4 on complex enterprise documents, but for invoices, receipts, forms, and standard PDFs, it's more than good enough and it's free. The fact that Baidu explicitly positions it as "pushing DeepSeek-OCR one step further" tells you where the open-source OCR race is heading. See our &lt;a href="https://www.aimadetools.com/blog/how-to-run-baidu-unlimited-ocr-locally/?utm_source=devto" rel="noopener noreferrer"&gt;local setup guide&lt;/a&gt; and &lt;a href="https://www.aimadetools.com/blog/best-open-source-ocr-models-2026/?utm_source=devto" rel="noopener noreferrer"&gt;open-source OCR comparison&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  3. Claude Tag: always-on AI teammate in Slack
&lt;/h2&gt;

&lt;p&gt;Anthropic launched &lt;a href="https://www.aimadetools.com/blog/what-is-claude-tag-anthropic-slack/?utm_source=devto" rel="noopener noreferrer"&gt;Claude Tag&lt;/a&gt;, a persistent Claude identity that lives inside Slack channels. Think of it as an always-on AI coworker rather than a chatbot you have to DM.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How it works:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Admin grants Claude access to selected channels&lt;/li&gt;
&lt;li&gt;Anyone in the channel can mention &lt;code&gt;@claude&lt;/code&gt; to delegate tasks&lt;/li&gt;
&lt;li&gt;Claude accumulates context across days (persistent memory per channel)&lt;/li&gt;
&lt;li&gt;Connects to tools, data, and codebases configured by admin&lt;/li&gt;
&lt;li&gt;Available for Enterprise and Team customers (beta)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Why it's interesting:&lt;/strong&gt; This is Anthropic's play for sticky enterprise revenue. Once Claude is embedded in your team's daily Slack workflow, with accumulated context about your projects, switching costs become enormous. It's the same playbook Notion and Slack used: make the tool part of daily muscle memory.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;My take:&lt;/strong&gt; This is less about technology and more about business model. Claude Tag turns Claude from "a tool employees open sometimes" into "a teammate that's always there." For the comparison with Microsoft Copilot and ChatGPT's Slack integration, see our &lt;a href="https://www.aimadetools.com/blog/claude-tag-vs-chatgpt-slack-vs-copilot/?utm_source=devto" rel="noopener noreferrer"&gt;full comparison&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  4. Alibaba caught extracting Claude capabilities
&lt;/h2&gt;

&lt;p&gt;Reuters reported that Anthropic accused Alibaba of "illicitly extracting" Claude AI model capabilities. The timing is not subtle: this came days after the US government banned Fable 5 access for foreign nationals.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What it means:&lt;/strong&gt; The Fable 5 export ban now has a clearer backstory. If Chinese companies were systematically extracting capabilities from Claude (likely through distillation or structured prompting to replicate behavior), that explains why the government moved so aggressively.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;My take for developers:&lt;/strong&gt; This doesn't change anything practical for you. But it does confirm that the US/China AI divide is deepening. If you're building on closed US models, plan for the possibility that access restrictions expand. If you're building on open Chinese models (GLM-5.2, DeepSeek V4), understand that the geopolitical baggage comes with them. There's no clean answer here.&lt;/p&gt;

&lt;h2&gt;
  
  
  5. NSA lost access to Mythos amid the ban
&lt;/h2&gt;

&lt;p&gt;The New York Times reported that the NSA was using Claude Mythos 5 and lost access when Anthropic disabled it under the export control directive. In other words, the US government's ban cut off one of its own intelligence agencies.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The irony:&lt;/strong&gt; The Commerce Department banned Fable 5 and Mythos 5 to protect national security. In doing so, it apparently cut off the NSA from a tool it was actively using for national security purposes.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;My take:&lt;/strong&gt; This is government dysfunction, not a developer story. But it does suggest the ban was hasty and poorly coordinated. Which means it might get revised. Watch for a carve-out that restores government access while keeping the foreign national ban in place.&lt;/p&gt;

&lt;h2&gt;
  
  
  6. GPT-5.6 delayed to mid-July
&lt;/h2&gt;

&lt;p&gt;After weeks of "launching Monday" predictions, GPT-5.6 has been pushed back. Prediction markets now put the chance of a delay beyond June 28 at 83%, with mid-July as the new expected target. Traders have abandoned their late-June bets.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What happened:&lt;/strong&gt; The June 23 launch date came from leaked Codex log traces and prediction market speculation, not from OpenAI itself. OpenAI never confirmed a date. The model appears to exist (traces in internal systems) but isn't ready for public release.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;My take:&lt;/strong&gt; Don't hold your breath. When it drops, we'll cover it. Until then, GPT-5.5 remains the best OpenAI model available. If you were waiting for GPT-5.6 to start a project, don't.&lt;/p&gt;

&lt;h2&gt;
  
  
  7. EU selects EUROPA consortium for frontier AI
&lt;/h2&gt;

&lt;p&gt;The European Commission selected the EUROPA consortium to build &lt;a href="https://www.aimadetools.com/blog/eu-europa-consortium-frontier-ai-model/?utm_source=devto" rel="noopener noreferrer"&gt;Europe's first open-source frontier AI model&lt;/a&gt;. The specs: 400B+ parameters (MoE), all 24 official EU languages, open weights, AI Act compliant.&lt;/p&gt;

&lt;p&gt;This won't matter for 12-18 months (the model doesn't exist yet), but it's strategically significant. Europe is now officially building its own frontier model as a response to US export controls. See our &lt;a href="https://www.aimadetools.com/blog/europe-sovereign-ai-landscape-2026/?utm_source=devto" rel="noopener noreferrer"&gt;full landscape overview&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Quick hits
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;OpenAI custom chip&lt;/strong&gt; — first custom silicon built with Broadcom. For training efficiency, not inference speed. Won't affect developers directly.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Sakana Fugu Ultra&lt;/strong&gt; — &lt;a href="https://www.aimadetools.com/blog/sakana-fugu-ultra-guide/?utm_source=devto" rel="noopener noreferrer"&gt;1M context model on OpenRouter&lt;/a&gt; at $0.000005/token (essentially free). Worth trying for massive context tasks.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;MiMo UltraSpeed benchmark&lt;/strong&gt; — we &lt;a href="https://www.aimadetools.com/blog/mimo-ultraspeed-coding-agent-benchmark-106-sessions/?utm_source=devto" rel="noopener noreferrer"&gt;published our 106-session comparison&lt;/a&gt;. TL;DR: 37% faster sessions, 86% higher median throughput, same output quality.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;AI Race: GLM declares itself done&lt;/strong&gt; — &lt;a href="https://www.aimadetools.com/blog/race-glm-built-everything-still-zero/?utm_source=devto" rel="noopener noreferrer"&gt;the first agent to explicitly recognize it can't do more without human help&lt;/a&gt;. Built 140 pages, got every distribution channel. Still $0. 9 days left.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  What I'm watching next week
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;GPT-5.6 status&lt;/strong&gt; — delayed but apparently close. Mid-July most likely.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Fable 5 ban resolution&lt;/strong&gt; — the NSA embarrassment might force a policy revision&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Race finale countdown&lt;/strong&gt; — 9 days to July 3 deadline. Will any agent earn $1?&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;OCR market shaping up&lt;/strong&gt; — Mistral (commercial) vs Baidu (open) vs DeepSeek (cheap API). Who wins developers?&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;&lt;em&gt;AI Dev Weekly publishes every Thursday. &lt;a href="https://app.kit.com/forms/9198516/subscriptions" rel="noopener noreferrer"&gt;Subscribe&lt;/a&gt; for the newsletter version.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Originally published at &lt;a href="https://www.aimadetools.com/blog/ai-dev-weekly-016-mistral-ocr-4-claude-tag-alibaba-gpt56-delayed/?utm_source=devto" rel="noopener noreferrer"&gt;https://www.aimadetools.com&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>aidevweekly</category>
      <category>mistral</category>
      <category>claudetag</category>
      <category>ocr</category>
    </item>
    <item>
      <title>I Ran 106 Coding-Agent Sessions to Test Whether Faster LLM Inference Actually Helps</title>
      <dc:creator>Joske Vermeulen</dc:creator>
      <pubDate>Wed, 24 Jun 2026 07:33:00 +0000</pubDate>
      <link>https://dev.to/ai_made_tools/i-ran-106-coding-agent-sessions-to-test-whether-faster-llm-inference-actually-helps-3065</link>
      <guid>https://dev.to/ai_made_tools/i-ran-106-coding-agent-sessions-to-test-whether-faster-llm-inference-actually-helps-3065</guid>
      <description>&lt;p&gt;Everyone is competing on tokens per second.&lt;/p&gt;

&lt;p&gt;But for autonomous coding agents, I think the more useful question is:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Does faster inference actually help you ship more?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;I got early access to Xiaomi MiMo-V2.5-Pro-UltraSpeed and ran it through the same autonomous coding workflow I had already been using with standard MiMo-V2.5-Pro.&lt;/p&gt;

&lt;p&gt;This was not a synthetic prompt benchmark. The agent worked on a real production codebase: reading files, planning changes, writing code, running builds, debugging failures, and committing working updates.&lt;/p&gt;

&lt;p&gt;I compared:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;62 runs on standard MiMo-V2.5-Pro&lt;/li&gt;
&lt;li&gt;44 runs on MiMo-V2.5-Pro-UltraSpeed&lt;/li&gt;
&lt;li&gt;Same agent framework&lt;/li&gt;
&lt;li&gt;Same codebase&lt;/li&gt;
&lt;li&gt;Similar production task types&lt;/li&gt;
&lt;li&gt;Fixed agent windows of roughly 30–35 minutes&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  The practical result
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Metric&lt;/th&gt;
&lt;th&gt;Standard Pro&lt;/th&gt;
&lt;th&gt;UltraSpeed&lt;/th&gt;
&lt;th&gt;Difference&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Average run duration&lt;/td&gt;
&lt;td&gt;7.7 min&lt;/td&gt;
&lt;td&gt;4.8 min&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;37% faster&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Average output tokens/run&lt;/td&gt;
&lt;td&gt;23,244&lt;/td&gt;
&lt;td&gt;23,807&lt;/td&gt;
&lt;td&gt;Similar&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Median effective throughput&lt;/td&gt;
&lt;td&gt;51 tok/s&lt;/td&gt;
&lt;td&gt;95 tok/s&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;86% faster&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;P90 effective throughput&lt;/td&gt;
&lt;td&gt;63 tok/s&lt;/td&gt;
&lt;td&gt;147 tok/s&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;133% faster&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Runs per 30-minute window&lt;/td&gt;
&lt;td&gt;3–4&lt;/td&gt;
&lt;td&gt;5–6&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Roughly 60% more completed runs&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
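
&lt;p&gt;If you want to sanity-check the Difference column, the sketch below recomputes each percentage from the rounded values in the table. The run-count row uses the midpoints of the 3–4 and 5–6 ranges, which is my simplification. Recomputing from rounded inputs can land a point away from the published figures (run duration comes out nearer 38% than 37%), presumably because the originals were computed from unrounded data.&lt;/p&gt;

```python
def percent_change(baseline: float, new: float) -> float:
    """Relative change from baseline to new, in percent."""
    return (new - baseline) / baseline * 100


# Values copied from the table above (already rounded there).
duration = percent_change(7.7, 4.8)    # minutes per run; negative means shorter
median_tps = percent_change(51, 95)    # tok/s
p90_tps = percent_change(63, 147)      # tok/s
runs = percent_change(3.5, 5.5)        # midpoints of the 3-4 and 5-6 ranges

print(f"run duration:      {duration:+.1f}%")
print(f"median throughput: {median_tps:+.1f}%")
print(f"P90 throughput:    {p90_tps:+.1f}%")
print(f"runs per window:   {runs:+.1f}%")
```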

&lt;p&gt;The headline is simple:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;UltraSpeed reduced average agent-run time by 37% while producing a similar amount of output on comparable production work.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That matters. But it does &lt;strong&gt;not&lt;/strong&gt; mean that a model capable of 1,000+ tok/s suddenly makes an agent 10× more productive.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why 1,000 tok/s does not become 1,000 tok/s in an agent
&lt;/h2&gt;

&lt;p&gt;In isolation, UltraSpeed can generate extremely quickly. But generation is only one part of an agent loop.&lt;/p&gt;

&lt;p&gt;A real coding agent also spends time on:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Reading context and prior tool output&lt;/li&gt;
&lt;li&gt;Planning the next action&lt;/li&gt;
&lt;li&gt;Generating a response or code change&lt;/li&gt;
&lt;li&gt;Writing files&lt;/li&gt;
&lt;li&gt;Running commands, builds, and tests&lt;/li&gt;
&lt;li&gt;Reading failures and iterating&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;In my UltraSpeed sessions, a typical run had around 60 turns and generated roughly 397 output tokens per turn.&lt;/p&gt;

&lt;p&gt;At 1,000 tok/s, that generation phase is only around 0.4 seconds.&lt;/p&gt;

&lt;p&gt;The rest of the turn is context processing, tool execution, planning, and waiting on the environment.&lt;/p&gt;

&lt;p&gt;That is why median end-to-end throughput came out at 95 tok/s, rather than anywhere near 1,000 tok/s.&lt;/p&gt;
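
&lt;p&gt;The arithmetic behind that gap can be sketched in a few lines. This uses the turn count, tokens per turn, and average run duration reported in this post, and it assumes (optimistically) that decoding runs at the headline 1,000 tok/s for every turn. Under that assumption, pure token generation accounts for only a small slice of a run's wall-clock time.&lt;/p&gt;

```python
# Figures reported above for a typical UltraSpeed run.
TURNS_PER_RUN = 60
OUTPUT_TOKENS_PER_TURN = 397
RUN_DURATION_S = 4.8 * 60  # average run duration from the results table

# Assumption: decoding proceeds at the headline 1,000 tok/s on every turn.
PEAK_TOKENS_PER_S = 1000

generation_s_per_turn = OUTPUT_TOKENS_PER_TURN / PEAK_TOKENS_PER_S
generation_s_per_run = generation_s_per_turn * TURNS_PER_RUN
generation_share = generation_s_per_run / RUN_DURATION_S

print(f"generation per turn: {generation_s_per_turn:.2f} s")
print(f"generation per run:  {generation_s_per_run:.1f} s")
print(f"share of run time:   {generation_share:.0%}")
```

&lt;p&gt;Everything outside that share is prefill, tool execution, and waiting, which faster decoding does not touch.&lt;/p&gt;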

&lt;p&gt;For interactive chat, raw generation speed can dominate the experience.&lt;/p&gt;

&lt;p&gt;For autonomous coding agents, it is one part of a larger system.&lt;/p&gt;

&lt;h2&gt;
  
  
  Where faster inference did help
&lt;/h2&gt;

&lt;p&gt;The gains were still meaningful.&lt;/p&gt;

&lt;h3&gt;
  
  
  Faster time-to-first-token
&lt;/h3&gt;

&lt;p&gt;On cached contexts, UltraSpeed often started responding in 2–3 seconds instead of around 3–5 seconds.&lt;/p&gt;

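&lt;p&gt;That does not sound dramatic in one interaction. Across 60+ turns, it compounds: if every turn hit the cache, saving about 1.5 seconds per turn (the gap between the midpoints of those ranges) would add up to roughly a minute and a half per run.&lt;/p&gt;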

&lt;h3&gt;
  
  
  Better performance on long code-heavy outputs
&lt;/h3&gt;

&lt;p&gt;The biggest gains showed up when the agent generated larger code blocks. UltraSpeed’s P90 effective throughput was 147 tok/s versus 63 tok/s on standard Pro.&lt;/p&gt;

&lt;p&gt;That makes individual implementation steps feel materially faster.&lt;/p&gt;

&lt;h3&gt;
  
  
  More useful work inside fixed windows
&lt;/h3&gt;

&lt;p&gt;This was the outcome I cared about most.&lt;/p&gt;

&lt;p&gt;In a fixed 30-minute window, the faster setup usually completed around 5–6 runs instead of 3–4.&lt;/p&gt;

&lt;p&gt;That is a much more useful metric than a headline tokens-per-second number.&lt;/p&gt;

&lt;h2&gt;
  
  
  The trade-off: speed costs more
&lt;/h2&gt;

&lt;p&gt;UltraSpeed was not free performance.&lt;/p&gt;

&lt;p&gt;Average cost per run was higher:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Standard Pro: $2.92/run&lt;/li&gt;
&lt;li&gt;UltraSpeed: $4.19/run&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;So the decision depends on what constrains you.&lt;/p&gt;

&lt;p&gt;Use the faster model when:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You run fixed-duration agent windows&lt;/li&gt;
&lt;li&gt;You care about CI/CD turnaround&lt;/li&gt;
&lt;li&gt;You are operating a multi-step autonomous workflow&lt;/li&gt;
&lt;li&gt;Developer time is more valuable than model spend&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Use the cheaper model when:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You are not time-constrained&lt;/li&gt;
&lt;li&gt;You care mostly about minimizing spend per run&lt;/li&gt;
&lt;li&gt;A few extra minutes per task do not matter&lt;/li&gt;
&lt;/ul&gt;
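
&lt;p&gt;To put numbers on the trade-off, this sketch combines the per-run costs above with the runs-per-window ranges from the results table. The derived figures are my own arithmetic on those published averages, not separately measured values, so read them as a rough guide.&lt;/p&gt;

```python
COST_PER_RUN = {"standard_pro": 2.92, "ultraspeed": 4.19}  # USD, from the list above
RUNS_PER_WINDOW = {"standard_pro": (3, 4), "ultraspeed": (5, 6)}  # per 30 minutes


def window_cost(model: str) -> tuple[float, float]:
    """Return the (low, high) spend in USD for one 30-minute window."""
    low_runs, high_runs = RUNS_PER_WINDOW[model]
    return low_runs * COST_PER_RUN[model], high_runs * COST_PER_RUN[model]


premium = COST_PER_RUN["ultraspeed"] / COST_PER_RUN["standard_pro"] - 1
print(f"per-run premium: {premium:.0%}")
for model in COST_PER_RUN:
    low, high = window_cost(model)
    print(f"{model}: ${low:.2f} to ${high:.2f} per 30-minute window")
```

&lt;p&gt;The faster model costs more per run and also fits more runs into each window, so spend per window rises faster than spend per run. Whether that is worth it comes down to the constraints listed above.&lt;/p&gt;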

&lt;h2&gt;
  
  
  My takeaway for agent builders
&lt;/h2&gt;

&lt;p&gt;Raw tok/s is not useless, but it is often a marketing metric before it is a productivity metric.&lt;/p&gt;

&lt;p&gt;For agentic coding, I would track:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;completed runs per hour&lt;/li&gt;
&lt;li&gt;useful commits per session&lt;/li&gt;
&lt;li&gt;wall-clock time to successful completion&lt;/li&gt;
&lt;li&gt;cost per completed run&lt;/li&gt;
&lt;li&gt;tool execution bottlenecks&lt;/li&gt;
&lt;li&gt;cache hit rate and prefill behaviour&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The question is not:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;How fast can the model emit tokens?&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;It is:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;How many useful things can the whole system finish per hour?&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;In this workflow, faster inference helped a lot. Just not in the simplistic 10× way the raw speed number might imply.&lt;/p&gt;

&lt;p&gt;I published the full write-up with methodology, limitations, and the technical details behind UltraSpeed here:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.aimadetools.com/blog/mimo-ultraspeed-coding-agent-benchmark-106-sessions/" rel="noopener noreferrer"&gt;MiMo UltraSpeed for Agentic Coding: 106 Sessions Tested&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Disclosure: Xiaomi provided early access to MiMo UltraSpeed for testing. The workflow, measurements, analysis, and conclusions are my own.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>productivity</category>
      <category>programming</category>
      <category>benchmarks</category>
    </item>
    <item>
      <title>AI Dev Weekly #15: Fable 5 Banned, GLM-5.2 Open Weights, Gemini CLI Dead, GPT-5.6 Confirmed</title>
      <dc:creator>Joske Vermeulen</dc:creator>
      <pubDate>Thu, 18 Jun 2026 11:40:52 +0000</pubDate>
      <link>https://dev.to/ai_made_tools/ai-dev-weekly-15-fable-5-banned-glm-52-open-weights-gemini-cli-dead-gpt-56-confirmed-9ol</link>
      <guid>https://dev.to/ai_made_tools/ai-dev-weekly-15-fable-5-banned-glm-52-open-weights-gemini-cli-dead-gpt-56-confirmed-9ol</guid>
      <description>&lt;p&gt;&lt;em&gt;AI Dev Weekly is a Thursday series where I cover the week's most important AI developer news, with my take as someone who actually uses these tools daily.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;This week felt like watching dominoes fall. On Friday the US government yanked Fable 5 and Mythos 5 from every non-American developer on the planet. By Wednesday, a Chinese lab had open-sourced a model that rivals them. Today, Gemini CLI officially dies. And GPT-5.6 is confirmed for Monday. Buckle up.&lt;/p&gt;

&lt;h2&gt;
  
  
  1. US government bans Fable 5 and Mythos 5
&lt;/h2&gt;

&lt;p&gt;On June 12, the US Commerce Department issued an export control directive ordering Anthropic to immediately suspend all access to Claude Fable 5 and Mythos 5 for any foreign national, whether inside or outside the United States. That includes foreign Anthropic employees.&lt;/p&gt;

&lt;p&gt;Within hours, Anthropic disabled both models for everyone worldwide to comply. Pro and Max subscribers were rolled back to Opus 4.8.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What happened:&lt;/strong&gt; Anthropic's own &lt;a href="https://www.aimadetools.com/blog/claude-fable-5-complete-guide/?utm_source=devto" rel="noopener noreferrer"&gt;model card&lt;/a&gt; disclosed that Fable 5 scores 95% on SWE-bench Verified. Combined with the hidden ML-research throttling (which essentially means the model &lt;em&gt;could&lt;/em&gt; help build competing frontier AI if unthrottled), the government apparently decided the model is too capable for foreign nationals to access. The irony: Anthropic's transparency about capabilities may have triggered a ban that might not have happened had they been vague about benchmarks.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The fallout:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Every developer outside the US lost access to the best coding model overnight&lt;/li&gt;
&lt;li&gt;Anthropic says they disagree with the order but complied immediately&lt;/li&gt;
&lt;li&gt;The G7 summit this week (see below) is partly a response to this chaos&lt;/li&gt;
&lt;li&gt;Anthropic's IPO timeline just got complicated&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;My take:&lt;/strong&gt; This is a watershed moment. If you're a developer in Europe, Asia, or anywhere outside the US, you just learned that API access to frontier models can vanish with zero notice. The practical implication: you cannot build a business that depends on a single closed-source frontier model. You need fallbacks. Open weights just went from "nice to have" to "business continuity requirement."&lt;/p&gt;

&lt;p&gt;We covered this in detail: &lt;a href="https://www.aimadetools.com/blog/claude-fable-5-ban-explained/?utm_source=devto" rel="noopener noreferrer"&gt;Fable 5 ban explained&lt;/a&gt; and &lt;a href="https://www.aimadetools.com/blog/fable-5-banned-developer-alternatives/?utm_source=devto" rel="noopener noreferrer"&gt;what developers should do now&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  2. GLM-5.2 drops open weights in the same week
&lt;/h2&gt;

&lt;p&gt;The timing could not be more dramatic. On June 13, Z.ai (formerly Zhipu AI) launched GLM-5.2 on their coding plan. On June 17 (yesterday), they released &lt;a href="https://huggingface.co/zai-org/GLM-5.2" rel="noopener noreferrer"&gt;full MIT-licensed open weights on Hugging Face&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;The numbers: 753B MoE model, ~40B active parameters per query. Ranks #2 on Code Arena behind only Fable 5. Beats GPT-5.5 on long-horizon coding benchmarks. Within 1% of Opus 4.8 on agentic coding. 1M usable context. MIT license means you can self-host, fine-tune, and use commercially with zero restrictions.&lt;/p&gt;

&lt;p&gt;The cost difference: $5.80 per million tokens for GLM-5.2, versus $35 for GPT-5.5 and $50 for Fable 5.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The catch:&lt;/strong&gt; It's 753B parameters total. Self-hosting requires ~800GB and eight H200 GPUs. Not exactly a laptop model. But quantized versions are already appearing (Unsloth GGUF is live), and the API is cheap.&lt;/p&gt;
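
&lt;p&gt;For a feel of why eight H200s is the quoted floor, here is a weights-only estimate. It assumes 141 GB of memory per H200 and a few common storage formats; the post does not say which format the ~800GB figure refers to, so the format table is my assumption, and the estimate ignores KV cache and activation memory, which add to the real requirement.&lt;/p&gt;

```python
TOTAL_PARAMS = 753e9     # total parameters quoted above
H200_MEMORY_GB = 141     # memory capacity of one NVIDIA H200
NUM_GPUS = 8

# Assumed storage formats (bytes per parameter); not stated in the post.
BYTES_PER_PARAM = {"bf16": 2.0, "fp8": 1.0, "int4": 0.5}

cluster_gb = H200_MEMORY_GB * NUM_GPUS
print(f"total GPU memory: {cluster_gb} GB")
for fmt, nbytes in BYTES_PER_PARAM.items():
    weights_gb = TOTAL_PARAMS * nbytes / 1e9
    verdict = "fits" if cluster_gb >= weights_gb else "does not fit"
    print(f"{fmt}: {weights_gb:.1f} GB of weights, {verdict} (weights only)")
```

&lt;p&gt;On these assumptions, one byte per parameter lands close to the ~800GB figure, while 16-bit weights would not fit on a single eight-GPU node.&lt;/p&gt;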

&lt;p&gt;&lt;strong&gt;The bigger catch:&lt;/strong&gt; API usage goes through Z.ai's infrastructure in China. China's National Intelligence Law applies. For sensitive proprietary code, self-hosting is the only safe option.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;My take:&lt;/strong&gt; The geopolitical narrative writes itself. The US bans access to its best model, and within five days China releases an open-weights rival. Whether that's coordinated or coincidental doesn't matter. The effect is the same: developers locked out of Fable 5 now have a frontier-class open alternative. &lt;a href="https://www.aimadetools.com/blog/glm-5-2-complete-guide/?utm_source=devto" rel="noopener noreferrer"&gt;GLM-5.2 is real&lt;/a&gt;. For coding, it's the best open-source model that exists. See our &lt;a href="https://www.aimadetools.com/blog/how-to-run-glm-5-2-locally/?utm_source=devto" rel="noopener noreferrer"&gt;setup guide&lt;/a&gt; and &lt;a href="https://www.aimadetools.com/blog/glm-5-2-vs-claude-fable-5/?utm_source=devto" rel="noopener noreferrer"&gt;comparison with Fable 5&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  3. Gemini CLI dies today
&lt;/h2&gt;

&lt;p&gt;As of today, June 18, Gemini CLI and Gemini Code Assist for individual users stop accepting requests. Google announced this transition a month ago at I/O: everything moves to Antigravity CLI, built in Go, with support for multiple async workflows and better agent architecture.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What you need to do:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;If you're still on Gemini CLI: install &lt;code&gt;antigravity&lt;/code&gt; and migrate your config&lt;/li&gt;
&lt;li&gt;MCP servers carry over unchanged&lt;/li&gt;
&lt;li&gt;Free tier: 60 requests/minute remains, but now through Antigravity&lt;/li&gt;
&lt;li&gt;Enterprise customers: nothing changes, since they use a different endpoint&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;My take:&lt;/strong&gt; I wrote the &lt;a href="https://www.aimadetools.com/blog/migrate-gemini-cli-to-antigravity-cli/?utm_source=devto" rel="noopener noreferrer"&gt;migration guide&lt;/a&gt; a few weeks ago. If you haven't migrated yet, do it today. Antigravity is genuinely better (faster startup, real parallel agent spawning, shared context between sessions). This isn't a downgrade dressed as a rebrand. It's a real improvement.&lt;/p&gt;

&lt;h2&gt;
  
  
  4. GPT-5.6 confirmed for June 23
&lt;/h2&gt;

&lt;p&gt;Multiple sources now confirm GPT-5.6 is launching Monday, June 23. OpenAI's Chief Scientist called it a "meaningful leap" in an interview with TechTimes. Developer forum leaks show it appearing in Codex logs. Prediction markets have it at 94%.&lt;/p&gt;

&lt;p&gt;No official specs yet, but based on leaks:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Significant reasoning improvements over GPT-5.5&lt;/li&gt;
&lt;li&gt;Likely improved agentic coding performance&lt;/li&gt;
&lt;li&gt;Expected to be competitive with Fable 5 on benchmarks (or at least close)&lt;/li&gt;
&lt;li&gt;Pricing TBD&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;My take:&lt;/strong&gt; The timing is interesting. Fable 5 gets banned on Friday, GPT-5.6 launches Monday. If you're a non-US developer who just lost Claude access, OpenAI is about to be your only frontier option (besides self-hosted GLM-5.2). I'll have a full review and comparison the day it drops.&lt;/p&gt;

&lt;h2&gt;
  
  
  5. G7 AI summit: the adults enter the room
&lt;/h2&gt;

&lt;p&gt;The G7 summit in Evian, France this week included an unprecedented closed-door session with AI company CEOs. Anthropic's Dario Amodei, Google DeepMind's Demis Hassabis, and OpenAI's Sam Altman sat down with heads of state including Trump to discuss AI governance.&lt;/p&gt;

&lt;p&gt;Amodei and Hassabis proposed a "US-led coalition" to create AI standards and regulations. The subtext: they'd rather have predictable regulation they can shape than sudden export bans that blindside them.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;My take:&lt;/strong&gt; Developers don't need to care about G7 communiqués. But pay attention to what comes out of this: if the US creates a formal framework for AI export controls (rather than surprise Friday evening orders), it'll affect which models you can use and where. The Fable 5 ban was chaotic. A formal framework would at least be predictable.&lt;/p&gt;

&lt;h2&gt;
  
  
  6. Vercel Ship: the Agent Stack
&lt;/h2&gt;

&lt;p&gt;Vercel held Ship in London on June 17. The big announcement: the "Agent Stack" combining AI SDK, AI Gateway, Vercel Sandbox, Workflow SDK, and Chat SDK into a unified platform for building and deploying AI agents.&lt;/p&gt;

&lt;p&gt;New additions:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Vercel Connect&lt;/strong&gt; — replaces long-lived API credentials with scoped, short-lived tokens and audit trails (finally)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Agent framework&lt;/strong&gt; — opinionated way to build multi-step agents that deploy to Vercel's infrastructure&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Enterprise controls&lt;/strong&gt; — SOC 2 compliance, approval workflows for AI actions&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Customers include DoorDash, OpenAI, Stripe, and The Weather Company.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;My take:&lt;/strong&gt; If you're building AI-powered web apps on Vercel (we are, for all four of our sites), the Agent Stack makes it significantly easier to add AI features without managing your own inference infrastructure. The Connect feature addresses a real security gap: too many AI tools are running with permanent API keys that never rotate.&lt;/p&gt;

&lt;h2&gt;
  
  
  Quick hits
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Anthropic Claude Corps&lt;/strong&gt; — $150M program training nonprofit staff to use Claude. Partnered with CodePath and Social Finance. Good PR move while Fable 5 is banned.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Databricks Genie Code&lt;/strong&gt; — New coding agent for ML engineering. Helps data scientists build ML pipelines faster.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;GPT-5.6 leak&lt;/strong&gt; — June 23 launch date now at 94% on prediction markets. OpenAI's Chief Scientist confirmed "meaningful leap."&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Gemini outage recovery&lt;/strong&gt; — Last week's "error 1076" outage fully resolved. Post-mortem blamed cascading failures in load balancing.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;MiMo UltraSpeed now open&lt;/strong&gt; — Xiaomi's 1,000 tok/s model available to all PAYG users as of June 14. We've been testing it for the &lt;a href="https://dev.to/race/"&gt;AI startup race&lt;/a&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  What I'm watching next week
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;GPT-5.6 launch (June 23)&lt;/strong&gt; — Will it match Fable 5? Will it be available globally? Pricing?&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Fable 5 ban status&lt;/strong&gt; — Does Anthropic challenge it? Does the G7 produce any framework that might reverse it?&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;GLM-5.2 community benchmarks&lt;/strong&gt; — MIT open weights are 24 hours old. Real-world testing data incoming.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The race&lt;/strong&gt; — Season 1 ends July 3. All agents running on Opus 4.8 fallback since Fable 5 ban. Results article (HN candidate) coming. &lt;a href="https://dev.to/race/"&gt;Follow the race →&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;&lt;em&gt;AI Dev Weekly publishes every Thursday. &lt;a href="https://app.kit.com/forms/9198516/subscriptions" rel="noopener noreferrer"&gt;Subscribe&lt;/a&gt; for the newsletter version.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Originally published at &lt;a href="https://www.aimadetools.com/blog/ai-dev-weekly-015-fable-5-banned-glm-5-2-open-gpt-5-6-confirmed/?utm_source=devto" rel="noopener noreferrer"&gt;https://www.aimadetools.com&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>aidevweekly</category>
      <category>fable5</category>
      <category>glm52</category>
      <category>gpt56</category>
    </item>
    <item>
      <title>AI Dev Weekly #14: Claude Fable 5 Controversy, DiffusionGemma Breaks Text Generation, Apple Rebuilds Siri</title>
      <dc:creator>Joske Vermeulen</dc:creator>
      <pubDate>Thu, 11 Jun 2026 13:02:38 +0000</pubDate>
      <link>https://dev.to/ai_made_tools/ai-dev-weekly-14-claude-fable-5-controversy-diffusiongemma-breaks-text-generation-apple-3m5a</link>
      <guid>https://dev.to/ai_made_tools/ai-dev-weekly-14-claude-fable-5-controversy-diffusiongemma-breaks-text-generation-apple-3m5a</guid>
      <description>&lt;p&gt;&lt;em&gt;AI Dev Weekly is a Thursday series where I cover the week's most important AI developer news, with my take as someone who actually uses these tools daily.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;This was the most packed week in AI since I started writing this newsletter. Four separate stories that would each dominate a normal week all landed within 72 hours: Anthropic shipped their most powerful model ever (with hidden restrictions that sparked fury), Google invented a new way to generate text 4× faster, Apple rebuilt Siri from scratch on Gemini, and a German court made a ruling that affects every developer deploying AI. Let's go.&lt;/p&gt;

&lt;h2&gt;
  
  
  1. Claude Fable 5: the best model with the worst controversy
&lt;/h2&gt;

&lt;p&gt;Anthropic released &lt;a href="https://www.aimadetools.com/blog/claude-fable-5-complete-guide/?utm_source=devto" rel="noopener noreferrer"&gt;Claude Fable 5&lt;/a&gt; on June 9 — their first Mythos-class model available to the general public. The benchmarks are genuinely staggering: &lt;strong&gt;95% on SWE-bench Verified&lt;/strong&gt;, 80% on SWE-bench Pro, and 91/100 on Every's Senior Engineer benchmark (vs 63 for Opus 4.8 and 62 for GPT-5.5).&lt;/p&gt;

&lt;p&gt;The specs: 1M context, 128K max output, $10/$50 per million tokens (exactly 2× Opus 4.8). Free on Pro/Max/Team/Enterprise through June 22.&lt;/p&gt;
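
&lt;p&gt;To put the pricing in per-request terms, here is a small estimator. It assumes the two figures follow the usual input/output convention ($10 per million input tokens, $50 per million output tokens) and infers the Opus 4.8 rates as half of that from the "exactly 2×" note. The dictionary keys are labels for this sketch, not official model identifiers, and the token counts are made up.&lt;/p&gt;

```python
# Prices in USD per million tokens, stored as (input, output).
# Assumption: "$10/$50" means input/output; Opus 4.8 is inferred as half of Fable 5.
PRICES = {
    "fable-5": (10.0, 50.0),
    "opus-4.8": (5.0, 25.0),
}


def request_cost(model, input_tokens, output_tokens):
    """Return the USD cost of one request at the listed per-million rates."""
    input_price, output_price = PRICES[model]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


# A hypothetical agentic coding request: large context in, modest patch out.
for name in PRICES:
    print(name, request_cost(name, input_tokens=200_000, output_tokens=8_000))
```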

&lt;p&gt;But then people read the model card.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The controversy:&lt;/strong&gt; Fable 5 contains hidden interventions that silently limit its effectiveness when you ask about frontier LLM development — pretraining pipelines, distributed training infrastructure, ML accelerator design. Unlike the explicit cyber/bio safeguards (which fall back to Opus 4.8 and tell you), these interventions use steering vectors and PEFT to quietly make Claude &lt;em&gt;less helpful&lt;/em&gt; without any notification. You can't distinguish between "the model doesn't know" and "the model is being throttled."&lt;/p&gt;

&lt;p&gt;Fortune reported it as &lt;a href="http://fortune.com/2026/06/10/anthropic-accu-claude-fable-5-limits-capabilities-ai-researchers-developers/" rel="noopener noreferrer"&gt;"secret sabotage"&lt;/a&gt;. The Hacker News thread hit 1,000+ points. Researchers are furious.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;My take:&lt;/strong&gt; The model is extraordinary for coding. If you're building apps, writing code, debugging systems — &lt;a href="https://www.aimadetools.com/blog/claude-fable-5-vs-opus-4-8/?utm_source=devto" rel="noopener noreferrer"&gt;Fable 5 is the best tool that exists&lt;/a&gt;. But if you're doing ML research, you now have to wonder whether every mediocre answer is a genuine limitation or a silent policy intervention. That's corrosive to trust in a way that explicit refusals never were. See our &lt;a href="https://www.aimadetools.com/blog/claude-fable-5-safeguards-explained/?utm_source=devto" rel="noopener noreferrer"&gt;safeguards deep dive&lt;/a&gt; and &lt;a href="https://www.aimadetools.com/blog/claude-fable-5-claude-code-setup/?utm_source=devto" rel="noopener noreferrer"&gt;setup guide for Claude Code&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  2. DiffusionGemma: Google reinvents text generation
&lt;/h2&gt;

&lt;p&gt;While everyone was arguing about Fable 5, Google DeepMind quietly dropped something that might matter more long-term. &lt;a href="https://www.aimadetools.com/blog/diffusiongemma-complete-guide/?utm_source=devto" rel="noopener noreferrer"&gt;DiffusionGemma&lt;/a&gt; is an open-source model that generates text using &lt;strong&gt;diffusion&lt;/strong&gt; instead of autoregressive token-by-token generation.&lt;/p&gt;

&lt;p&gt;Instead of predicting one token at a time (left to right), DiffusionGemma starts with a canvas of random placeholder tokens and iteratively refines them &lt;strong&gt;all in parallel&lt;/strong&gt; over multiple denoising passes. Think Stable Diffusion, but for text instead of images.&lt;/p&gt;

&lt;p&gt;The result: &lt;strong&gt;4× faster generation, 1,000+ tokens per second&lt;/strong&gt; on NVIDIA RTX GPUs. The model is 26B total / 3.8B active (MoE), fits in 18GB VRAM, and ships under Apache 2.0.&lt;/p&gt;
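
&lt;p&gt;The control flow of parallel refinement can be sketched with a toy. This is not DiffusionGemma's sampler: &lt;code&gt;toy_denoiser&lt;/code&gt; is a stub that already knows the answer, and the confidence rule is invented. The point is only the loop shape, where every position gets a proposal on each pass and the most confident ones are committed, instead of one token being emitted per step.&lt;/p&gt;

```python
MASK = "_"


def toy_denoiser(canvas, target):
    """Stand-in for the model: propose (token, confidence) for every position at once.

    Hypothetical rule: longer words and positions with filled neighbours score higher.
    """
    proposals = []
    for i, word in enumerate(target):
        window = canvas[max(i - 1, 0): i + 2]
        context = sum(1 for tok in window if tok != MASK)
        proposals.append((word, len(word) + context))
    return proposals


def diffusion_decode(target, passes):
    """Fill a fully masked canvas over a fixed number of passes; return each snapshot."""
    canvas = [MASK] * len(target)
    per_pass = -(-len(target) // passes)  # ceiling division
    history = []
    for _ in range(passes):
        proposals = toy_denoiser(canvas, target)
        open_positions = [i for i, tok in enumerate(canvas) if tok == MASK]
        # Commit the most confident open positions; the rest wait for a later pass.
        open_positions.sort(key=lambda i: proposals[i][1], reverse=True)
        for i in open_positions[:per_pass]:
            canvas[i] = proposals[i][0]
        history.append(" ".join(canvas))
    return history


for snapshot in diffusion_decode("the quick brown fox jumps over lazy dogs".split(), passes=4):
    print(snapshot)
```

&lt;p&gt;Eight tokens land in four passes here, where a left-to-right decoder would need eight steps. That ratio is where the speed claim comes from, assuming each pass costs about the same as one autoregressive step.&lt;/p&gt;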

&lt;p&gt;&lt;strong&gt;My take:&lt;/strong&gt; This is experimental — quality won't match Fable 5 or GPT-5.5 on hard reasoning tasks yet. But the speed implications are enormous. Real-time chatbots, voice agents, gaming NPCs, live coding suggestions — anywhere latency matters, diffusion models could be transformative. If this approach matures, the entire "tokens per second" conversation changes. See our &lt;a href="https://www.aimadetools.com/blog/what-is-text-diffusion-llm/?utm_source=devto" rel="noopener noreferrer"&gt;explainer on how text diffusion works&lt;/a&gt; and &lt;a href="https://www.aimadetools.com/blog/how-to-run-diffusiongemma-locally/?utm_source=devto" rel="noopener noreferrer"&gt;local setup guide&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  3. Apple WWDC 2026: Siri AI, Core AI, and Xcode 27
&lt;/h2&gt;

&lt;p&gt;Apple used &lt;a href="https://www.aimadetools.com/blog/wwdc-2026-ai-developer-recap/?utm_source=devto" rel="noopener noreferrer"&gt;WWDC 2026&lt;/a&gt; to rebuild their entire AI stack:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Siri AI&lt;/strong&gt; — 1.2T parameter model built on Google Gemini technology. Personal context, on-screen awareness, app actions. SiriKit deprecated, &lt;a href="https://www.aimadetools.com/blog/siri-ai-developers-app-intents-2026/?utm_source=devto" rel="noopener noreferrer"&gt;App Intents mandatory&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://www.aimadetools.com/blog/what-is-apple-core-ai/?utm_source=devto" rel="noopener noreferrer"&gt;Core AI&lt;/a&gt;&lt;/strong&gt; — Brand new framework for running your own models on Apple Silicon. Zero server cost, zero data leaving the device. PyTorch conversion pipeline, quantization toolkit, Xcode debugger.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://www.aimadetools.com/blog/xcode-27-agentic-coding-claude-gemini-gpt/?utm_source=devto" rel="noopener noreferrer"&gt;Xcode 27&lt;/a&gt;&lt;/strong&gt; — Claude, Gemini, and GPT agents built directly into the IDE. MCP support, Agent Client Protocol, Device Hub. Apple silicon only, 30% smaller.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://www.aimadetools.com/blog/apple-foundation-models-free-cloud-ai-small-developers/?utm_source=devto" rel="noopener noreferrer"&gt;Foundation Models&lt;/a&gt;&lt;/strong&gt; — Free Private Cloud Compute for apps with &amp;lt;2M downloads. Single Swift API for on-device + cloud + third-party models via the new &lt;a href="https://www.aimadetools.com/blog/apple-language-model-protocol-ios-2026/?utm_source=devto" rel="noopener noreferrer"&gt;Language Model Protocol&lt;/a&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;My take:&lt;/strong&gt; The free cloud AI for small developers is the sleeper story. If you're an indie dev building an iOS app, you just got GPT-class intelligence at zero cost. The &lt;a href="https://www.aimadetools.com/blog/apple-google-gemini-partnership-developers-2026/?utm_source=devto" rel="noopener noreferrer"&gt;Apple × Google partnership&lt;/a&gt; ($1B/year) powers all of this. Apple is making the Gemini-class model &lt;em&gt;their&lt;/em&gt; model by training on it rather than deploying it directly — clever for privacy positioning.&lt;/p&gt;

&lt;h2&gt;
  
  
  4. German court makes AI providers liable for AI-generated content
&lt;/h2&gt;

&lt;p&gt;The Landgericht München &lt;a href="https://www.aimadetools.com/blog/german-court-google-ai-overview-liable/?utm_source=devto" rel="noopener noreferrer"&gt;ruled on May 28&lt;/a&gt; that Google's AI Overviews are Google's &lt;strong&gt;own content&lt;/strong&gt;, not third-party search results. Three key holdings:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;AI-generated summaries = the operator's own statements (not mere indexing)&lt;/li&gt;
&lt;li&gt;"Users can fact-check themselves" is NOT a valid defense&lt;/li&gt;
&lt;li&gt;DSA platform protections don't apply to AI-generated content&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This affects every developer deploying AI that generates user-facing content. ChatGPT, Claude, Perplexity — the same logic applies. If your AI generates something defamatory, you may be liable as the author, not protected as a platform.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;My take:&lt;/strong&gt; Start logging AI outputs and implementing content moderation if you haven't already. The &lt;a href="https://www.aimadetools.com/blog/ai-liability-developers/?utm_source=devto" rel="noopener noreferrer"&gt;EU Product Liability Directive&lt;/a&gt; explicitly includes AI, with a December 2026 transposition deadline. See our &lt;a href="https://www.aimadetools.com/blog/ai-generated-content-liability-2026/?utm_source=devto" rel="noopener noreferrer"&gt;full legal analysis&lt;/a&gt;.&lt;/p&gt;
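
&lt;p&gt;A minimal sketch of what "logging AI outputs" could look like: one JSON line per generated response. The field names and the choice to store a hash of the prompt rather than the raw text are assumptions for illustration, not requirements drawn from the ruling.&lt;/p&gt;

```python
import hashlib
import io
import json
from datetime import datetime, timezone


def log_ai_output(stream, model, prompt, output, moderation_flags=()):
    """Append one JSON line describing a generated response to an audit stream."""
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "model": model,
        # Hash the prompt so the log can match requests without storing user text.
        "prompt_sha256": hashlib.sha256(prompt.encode("utf-8")).hexdigest(),
        "output": output,
        "moderation_flags": list(moderation_flags),
    }
    stream.write(json.dumps(record, ensure_ascii=False) + "\n")
    return record


# In production the stream would be an append-only file or a log pipeline.
audit_log = io.StringIO()
log_ai_output(audit_log, "example-model", "Summarise this page", "Here is a summary.")
print(audit_log.getvalue())
```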

&lt;h2&gt;
  
  
  5. Cohere North Mini Code: open-source MoE for coding
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://www.aimadetools.com/blog/cohere-north-mini-code-complete-guide/?utm_source=devto" rel="noopener noreferrer"&gt;Cohere launched North Mini Code&lt;/a&gt; — a 30B/3B-active MoE model under Apache 2.0, purpose-built for agentic coding. It scores 33.4 on the Artificial Analysis Coding Index (just behind Qwen 3.6 35B-A3B at 35.2) while beating models 4× its size.&lt;/p&gt;

&lt;p&gt;Available on HuggingFace (BF16 + FP8), Cohere API, and OpenRouter. It's Cohere's first developer-focused model and first fully open-source release.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;My take:&lt;/strong&gt; The "3B active parameters" angle is interesting — same active compute as &lt;a href="https://www.aimadetools.com/blog/north-mini-code-vs-qwen-3-6-35b-a3b/?utm_source=devto" rel="noopener noreferrer"&gt;Qwen 3.6 35B-A3B&lt;/a&gt; but with 128 experts (vs Qwen's smaller expert count). Good for local coding if you want an Apache 2.0 alternative to Qwen.&lt;/p&gt;

&lt;h2&gt;
  
  
  6. Gemma 4 12B: multimodal AI on a laptop
&lt;/h2&gt;

&lt;p&gt;Google also dropped &lt;a href="https://www.aimadetools.com/blog/gemma-4-12b-complete-guide/?utm_source=devto" rel="noopener noreferrer"&gt;Gemma 4 12B&lt;/a&gt; on June 3 — a 12B dense model that processes text, images, audio, AND video natively without any encoder. Runs on 16GB RAM. Apache 2.0.&lt;/p&gt;

&lt;p&gt;It nearly matches the 27B Gemma 4 model at half the size and clearly beats the older Gemma 3 27B. There's also a multi-token prediction variant for even faster local inference.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;My take:&lt;/strong&gt; This is the &lt;a href="https://www.aimadetools.com/blog/how-to-run-gemma-4-12b-locally/?utm_source=devto" rel="noopener noreferrer"&gt;best model for laptops with 16GB&lt;/a&gt; right now. Multimodal input without needing separate models for vision/audio is genuinely useful for agentic workflows. Pair it with &lt;a href="https://www.aimadetools.com/blog/what-is-apple-core-ai/?utm_source=devto" rel="noopener noreferrer"&gt;Core AI on Mac&lt;/a&gt; and you have a powerful local stack.&lt;/p&gt;

&lt;h2&gt;
  
  
  Quick hits
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Anthropic filed for IPO&lt;/strong&gt; — Confidential S-1 filed just before Fable 5 launch. The timing is not a coincidence.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Gemini went down&lt;/strong&gt; — Major outage on June 10, "error 1076." Recovered after several hours.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;OpenAI "Economic Research Exchange"&lt;/strong&gt; — Academic program, not a product. Skip.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;MiMo-V2.5-Pro-UltraSpeed&lt;/strong&gt; — Xiaomi announced 1,000+ tok/s on general GPUs for their trillion-parameter model. Limited access until June 23.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Nex-N2-Pro free on OpenRouter&lt;/strong&gt; — New free model from Nex AGI. 262K context.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Tim Cook's last WWDC&lt;/strong&gt; — Stepping down September 1, 2026. John Ternus takes over.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  What I'm watching next week
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Claude Fable 5 in the wild&lt;/strong&gt; — Does the competitor blocking actually affect real developers? Or is it a niche issue for ML researchers only?&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;DiffusionGemma community testing&lt;/strong&gt; — Speed is proven. What about quality on real coding tasks?&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Gemini CLI shutdown&lt;/strong&gt; (June 18) — One week left. &lt;a href="https://www.aimadetools.com/blog/migrate-gemini-cli-to-antigravity-cli/?utm_source=devto" rel="noopener noreferrer"&gt;Migrate to Antigravity CLI now&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;MiMo UltraSpeed access&lt;/strong&gt; — We're getting early access for review. Stay tuned.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;The race&lt;/strong&gt; — Xiaomi at 1,200 users/week. GLM built a full conversion funnel. Claude running Google Ads. Still $0 revenue. 4 weeks left. &lt;a href="https://dev.to/race/"&gt;Follow the race →&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;&lt;em&gt;AI Dev Weekly publishes every Thursday. &lt;a href="https://app.kit.com/forms/9198516/subscriptions" rel="noopener noreferrer"&gt;Subscribe&lt;/a&gt; for the newsletter version.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Originally published at &lt;a href="https://www.aimadetools.com/blog/ai-dev-weekly-014-claude-fable-5-cohere-north-german-ruling/?utm_source=devto" rel="noopener noreferrer"&gt;https://www.aimadetools.com&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>aidevweekly</category>
      <category>fable5</category>
      <category>diffusiongemma</category>
      <category>apple</category>
    </item>
    <item>
      <title>Vector Databases Compared: Pinecone vs Weaviate vs Qdrant vs Chroma (2026)</title>
      <dc:creator>Joske Vermeulen</dc:creator>
      <pubDate>Tue, 09 Jun 2026 12:07:37 +0000</pubDate>
      <link>https://dev.to/ai_made_tools/vector-databases-compared-pinecone-vs-weaviate-vs-qdrant-vs-chroma-2026-1e62</link>
      <guid>https://dev.to/ai_made_tools/vector-databases-compared-pinecone-vs-weaviate-vs-qdrant-vs-chroma-2026-1e62</guid>
      <description>&lt;p&gt;Vector databases store &lt;a href="https://www.aimadetools.com/blog/embeddings-explained-developers/?utm_source=devto" rel="noopener noreferrer"&gt;embeddings&lt;/a&gt; and find similar ones fast. They're the retrieval layer behind every RAG system, AI search engine, and recommendation system.&lt;/p&gt;

&lt;p&gt;Here's how the top four compare in production.&lt;/p&gt;
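
&lt;p&gt;Before the comparison, it may help to see what "find similar ones" means in code. The sketch below is a brute-force cosine-similarity search over three made-up vectors. It scores every stored vector, which is exactly the linear scan that a real vector database avoids with an approximate index such as HNSW.&lt;/p&gt;

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def nearest(query, store, k=2):
    """Brute-force top-k: score every stored vector, then sort by similarity."""
    scored = [(cosine_similarity(query, vec), doc_id) for doc_id, vec in store.items()]
    return [doc_id for _, doc_id in sorted(scored, reverse=True)[:k]]


# Toy 3-dimensional "embeddings"; real ones have hundreds or thousands of dimensions.
store = {
    "cats": [0.9, 0.1, 0.0],
    "dogs": [0.8, 0.2, 0.1],
    "tax-law": [0.0, 0.1, 0.9],
}
print(nearest([1.0, 0.0, 0.0], store))  # ['cats', 'dogs']
```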

&lt;h2&gt;
  
  
  Quick comparison
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;Pinecone&lt;/th&gt;
&lt;th&gt;Qdrant&lt;/th&gt;
&lt;th&gt;Weaviate&lt;/th&gt;
&lt;th&gt;Chroma&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Type&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Fully managed&lt;/td&gt;
&lt;td&gt;Open source + cloud&lt;/td&gt;
&lt;td&gt;Open source + cloud&lt;/td&gt;
&lt;td&gt;Open source&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;p50 latency (1M vectors)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;8ms&lt;/td&gt;
&lt;td&gt;6ms&lt;/td&gt;
&lt;td&gt;12ms&lt;/td&gt;
&lt;td&gt;18ms&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Max scale&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Billions&lt;/td&gt;
&lt;td&gt;Billions&lt;/td&gt;
&lt;td&gt;Billions&lt;/td&gt;
&lt;td&gt;~5M practical&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Hybrid search&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;✅ Native&lt;/td&gt;
&lt;td&gt;❌&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Self-host&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;❌&lt;/td&gt;
&lt;td&gt;✅ (Apache 2.0)&lt;/td&gt;
&lt;td&gt;✅ (BSD-3)&lt;/td&gt;
&lt;td&gt;✅ (Apache 2.0)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Free tier&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;✅ Serverless&lt;/td&gt;
&lt;td&gt;✅ 1GB cloud&lt;/td&gt;
&lt;td&gt;✅ Sandbox&lt;/td&gt;
&lt;td&gt;✅ (local only)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Best for&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Zero-ops production&lt;/td&gt;
&lt;td&gt;Performance + cost control&lt;/td&gt;
&lt;td&gt;Complex queries&lt;/td&gt;
&lt;td&gt;Prototyping&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  Pinecone — best for zero-ops
&lt;/h2&gt;

&lt;p&gt;You don't manage servers, indexes, or scaling. Send vectors, query vectors, done. The serverless tier handles burst traffic without pre-provisioning.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Pricing:&lt;/strong&gt; Serverless starts free (2GB storage). Pay-as-you-go after that. Roughly $0.33/1M reads at scale.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Pick Pinecone when:&lt;/strong&gt; You want production-ready vector search without any infrastructure work. Your team doesn't have (or want) a dedicated ops person.&lt;/p&gt;

&lt;h2&gt;
  
  
  Qdrant — best performance per dollar
&lt;/h2&gt;

&lt;p&gt;Fastest p50 latency in the table above: 6ms at 1M vectors. Written in Rust with HNSW indexing and product quantization. Apache 2.0 means zero licensing costs when self-hosting.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Pricing:&lt;/strong&gt; Self-hosted is free. Cloud starts at $0.05/hour (~$36/month).&lt;/p&gt;
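
&lt;p&gt;The monthly figure is just the hourly rate multiplied out for an always-on instance. A quick check, assuming a 30-day month (the helper name is illustrative):&lt;/p&gt;

```python
HOURS_PER_DAY = 24


def monthly_cost(hourly_rate, days=30):
    """Convert an hourly cloud rate into a monthly figure for an always-on instance."""
    return round(hourly_rate * HOURS_PER_DAY * days, 2)


print(monthly_cost(0.05))           # 30-day month: 36.0
print(monthly_cost(0.05, days=31))  # 31-day month: 37.2
```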

&lt;p&gt;&lt;strong&gt;Pick Qdrant when:&lt;/strong&gt; You want the best raw performance, you're comfortable self-hosting, or you need to keep data on your own infrastructure for &lt;a href="https://www.aimadetools.com/blog/best-ai-coding-agents-privacy-2026/?utm_source=devto" rel="noopener noreferrer"&gt;privacy&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Weaviate — best for hybrid search
&lt;/h2&gt;

&lt;p&gt;The only one of the four with BM25 + vector hybrid search built natively into the query engine rather than bolted on. It also supports a GraphQL API and knowledge graph features.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Pricing:&lt;/strong&gt; Self-hosted is free. Cloud sandbox is free. Production cloud starts at ~$25/month.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Pick Weaviate when:&lt;/strong&gt; You need hybrid search (keyword + semantic), your queries are complex, or you want GraphQL.&lt;/p&gt;
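
&lt;p&gt;For readers new to hybrid search, the core idea is to blend a keyword score with a vector score. The sketch below normalizes both score sets and mixes them with a weight. It is a generic illustration, not Weaviate's implementation; the function names, the &lt;code&gt;alpha&lt;/code&gt; parameter as used here, and the sample scores are assumptions.&lt;/p&gt;

```python
def min_max(scores):
    """Rescale a {doc_id: score} map to the 0..1 range."""
    low, high = min(scores.values()), max(scores.values())
    span = (high - low) or 1.0  # avoid dividing by zero when all scores are equal
    return {doc_id: (score - low) / span for doc_id, score in scores.items()}


def hybrid_rank(keyword_scores, vector_scores, alpha=0.5):
    """Blend the two signals; alpha=1.0 is pure vector, alpha=0.0 is pure keyword."""
    kw, vec = min_max(keyword_scores), min_max(vector_scores)
    doc_ids = set(kw) | set(vec)
    blended = {d: (1 - alpha) * kw.get(d, 0.0) + alpha * vec.get(d, 0.0) for d in doc_ids}
    # Sort by blended score, breaking ties by id so the order is deterministic.
    return sorted(blended, key=lambda d: (-blended[d], d))


bm25 = {"a": 12.0, "b": 3.0, "c": 0.5}      # made-up keyword scores
cosine = {"a": 0.62, "b": 0.91, "c": 0.88}  # made-up vector similarities
print(hybrid_rank(bm25, cosine, alpha=0.5))  # ['b', 'a', 'c']
```

&lt;p&gt;Document &lt;code&gt;a&lt;/code&gt; wins on keywords alone and &lt;code&gt;b&lt;/code&gt; wins on semantics alone; the blend decides between them, which is the trade-off a hybrid query lets you tune.&lt;/p&gt;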

&lt;h2&gt;
  
  
  Chroma — best for prototyping
&lt;/h2&gt;

&lt;p&gt;Install as a Python package, embed documents with three lines of code, query immediately. The fastest path from zero to working vector search.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Pricing:&lt;/strong&gt; Free (open source, runs locally).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Pick Chroma when:&lt;/strong&gt; You're prototyping, learning, or building something with &amp;lt;1M vectors. Migrate to Pinecone or Qdrant when you outgrow it.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Chroma: 3 lines to working vector search
&lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;chromadb&lt;/span&gt;
&lt;span class="n"&gt;collection&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;chromadb&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Client&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;create_collection&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;docs&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;collection&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;documents&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;your text here&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;ids&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;span class="n"&gt;results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;collection&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;query&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query_texts&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;search query&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;n_results&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  pgvector — the "just use Postgres" option
&lt;/h2&gt;

&lt;p&gt;If you already run &lt;a href="https://www.aimadetools.com/blog/what-is-postgresql/?utm_source=devto" rel="noopener noreferrer"&gt;PostgreSQL&lt;/a&gt;, pgvector adds vector search without a new database. Performance is good enough for most apps under 10M vectors.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Pick pgvector when:&lt;/strong&gt; You already use Postgres and don't want another database to manage. Your vector count is under 10M.&lt;/p&gt;

&lt;h2&gt;
  
  
  Decision flowchart
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Prototyping?&lt;/strong&gt; → Chroma&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Already on Postgres?&lt;/strong&gt; → pgvector&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Want zero ops?&lt;/strong&gt; → Pinecone&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Want best performance?&lt;/strong&gt; → Qdrant&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Need hybrid search?&lt;/strong&gt; → Weaviate&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Need to self-host?&lt;/strong&gt; → Qdrant or Weaviate&lt;/li&gt;
&lt;/ul&gt;
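
&lt;p&gt;The checklist above can be written as a function. One assumption is added here: the questions are checked top to bottom and the first match wins, which the list itself does not state.&lt;/p&gt;

```python
def pick_vector_db(prototyping=False, on_postgres=False, zero_ops=False,
                   best_performance=False, hybrid_search=False, self_host=False):
    """Walk the checklist in order and return the first recommendation that applies."""
    if prototyping:
        return "Chroma"
    if on_postgres:
        return "pgvector"
    if zero_ops:
        return "Pinecone"
    if best_performance:
        return "Qdrant"
    if hybrid_search:
        return "Weaviate"
    if self_host:
        return "Qdrant or Weaviate"
    return "no preference matched"


print(pick_vector_db(hybrid_search=True, self_host=True))  # Weaviate
```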

&lt;h2&gt;
  
  
  What about scale?
&lt;/h2&gt;

&lt;p&gt;At 1M vectors, all five options (the four above plus pgvector) are fast enough. The differences matter at 10M+:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Vectors&lt;/th&gt;
&lt;th&gt;Chroma&lt;/th&gt;
&lt;th&gt;pgvector&lt;/th&gt;
&lt;th&gt;Weaviate&lt;/th&gt;
&lt;th&gt;Qdrant&lt;/th&gt;
&lt;th&gt;Pinecone&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;100K&lt;/td&gt;
&lt;td&gt;✅ Fast&lt;/td&gt;
&lt;td&gt;✅ Fast&lt;/td&gt;
&lt;td&gt;✅ Fast&lt;/td&gt;
&lt;td&gt;✅ Fast&lt;/td&gt;
&lt;td&gt;✅ Fast&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;1M&lt;/td&gt;
&lt;td&gt;✅ OK&lt;/td&gt;
&lt;td&gt;✅ Good&lt;/td&gt;
&lt;td&gt;✅ Good&lt;/td&gt;
&lt;td&gt;✅ Best&lt;/td&gt;
&lt;td&gt;✅ Good&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;10M&lt;/td&gt;
&lt;td&gt;⚠️ Slow&lt;/td&gt;
&lt;td&gt;⚠️ Needs tuning&lt;/td&gt;
&lt;td&gt;✅ Good&lt;/td&gt;
&lt;td&gt;✅ Good&lt;/td&gt;
&lt;td&gt;✅ Good&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;100M+&lt;/td&gt;
&lt;td&gt;❌&lt;/td&gt;
&lt;td&gt;❌&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;em&gt;Related: &lt;a href="https://www.aimadetools.com/blog/embeddings-explained-developers/?utm_source=devto" rel="noopener noreferrer"&gt;Embeddings Explained&lt;/a&gt; · &lt;a href="https://www.aimadetools.com/blog/build-ai-search-engine/?utm_source=devto" rel="noopener noreferrer"&gt;How to Build an AI Search Engine&lt;/a&gt; · &lt;a href="https://www.aimadetools.com/blog/rag-vs-fine-tuning/?utm_source=devto" rel="noopener noreferrer"&gt;RAG vs Fine-Tuning&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Originally published at &lt;a href="https://www.aimadetools.com/blog/vector-databases-compared/?utm_source=devto" rel="noopener noreferrer"&gt;https://www.aimadetools.com&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>vectordatabase</category>
      <category>aisearch</category>
      <category>rag</category>
      <category>comparison</category>
    </item>
    <item>
      <title>AI Dev Weekly #13: Microsoft Declares Independence — 7 In-House Models, Kills Claude Code, RTX Spark Dev Box</title>
      <dc:creator>Joske Vermeulen</dc:creator>
      <pubDate>Thu, 04 Jun 2026 14:11:45 +0000</pubDate>
      <link>https://dev.to/ai_made_tools/ai-dev-weekly-13-microsoft-declares-independence-7-in-house-models-kills-claude-code-rtx-2lca</link>
      <guid>https://dev.to/ai_made_tools/ai-dev-weekly-13-microsoft-declares-independence-7-in-house-models-kills-claude-code-rtx-2lca</guid>
      <description>&lt;p&gt;&lt;em&gt;AI Dev Weekly is a Thursday series where I cover the week's most important AI developer news, with my take as someone who actually uses these tools daily.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Microsoft Build 2026 was the story of the week. Seven in-house AI models — none trained on OpenAI data. A Surface mini PC with NVIDIA RTX Spark inside. Claude Code licenses cancelled, developers pushed to Copilot. This was Microsoft saying, loud and clear: we don't need OpenAI anymore. Meanwhile MiniMax dropped the first open-weight frontier multimodal model, and NVIDIA unveiled hardware that makes local AI actually practical.&lt;/p&gt;

&lt;h2&gt;
  
  
  1. Microsoft Build 2026: 7 models, zero OpenAI dependency
&lt;/h2&gt;

&lt;p&gt;Microsoft unveiled its MAI (Microsoft AI) model family at Build on June 2. The headline: &lt;strong&gt;MAI-Thinking-1&lt;/strong&gt; — a 35B reasoning model trained entirely on commercially licensed enterprise data with no distillation from GPT or any OpenAI model. It matches Claude Sonnet 4.6 on key benchmarks at up to 10× better cost efficiency.&lt;/p&gt;

&lt;p&gt;The full lineup of seven:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;MAI-Code-1-Flash&lt;/strong&gt; (5B) — Purpose-built for GitHub Copilot and VS Code&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;MAI-Thinking-1&lt;/strong&gt; (35B) — Reasoning, multi-step instructions, long context&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Aion 1.0 Instruct&lt;/strong&gt; — Local Windows model for on-device reasoning&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Aion 1.0 Plan&lt;/strong&gt; — Local Windows model for planning and tool use&lt;/li&gt;
&lt;li&gt;Plus 3 more across transcription, speech, and images&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Also announced: &lt;strong&gt;Windows as "agent-native runtime"&lt;/strong&gt; with Microsoft Execution Containers (MXC) — sandboxed environments for running AI agents with enterprise-grade isolation.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;My take:&lt;/strong&gt; This is the biggest signal yet that the OpenAI-Microsoft marriage is evolving into a polite separation. MAI-Thinking-1 being trained without any OpenAI data is deliberate positioning — Microsoft can now say "our models have no license entanglement with OpenAI." The 5B coding model (MAI-Code-1-Flash) specifically targets Copilot — Microsoft's most important developer product. They're replacing GPT inside their own tools with models they fully control.&lt;/p&gt;

&lt;h2&gt;
  
  
  2. Microsoft ends Claude Code licenses
&lt;/h2&gt;

&lt;p&gt;Forbes reported that Microsoft is &lt;a href="https://www.forbes.com/sites/jonmarkman/2026/06/01/microsoft-ends-claude-code-licenses-as-it-pushes-copilot-cli/" rel="noopener noreferrer"&gt;ending Claude Code licenses&lt;/a&gt; and pushing developers to its own Copilot CLI instead. The subtext: Microsoft no longer wants to rent Anthropic's intelligence inside its own products.&lt;/p&gt;

&lt;p&gt;This affects developers at Microsoft who were using &lt;a href="https://www.aimadetools.com/blog/how-to-use-claude-code?utm_source=devto" rel="noopener noreferrer"&gt;Claude Code&lt;/a&gt; as their primary coding tool. They're being migrated to Copilot powered by MAI-Code-1-Flash.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;My take:&lt;/strong&gt; If you work at a Microsoft shop: prepare for Copilot to get much better (MAI-Code-1-Flash is purpose-built for it). If you use Claude Code independently: nothing changes for you. But this signals that the era of "one AI provider to rule them all" is over. Every major tech company is building its own models now.&lt;/p&gt;

&lt;h2&gt;
  
  
  3. NVIDIA RTX Spark: the hardware that makes local AI real
&lt;/h2&gt;

&lt;p&gt;At Computex (June 1), NVIDIA unveiled &lt;a href="https://www.aimadetools.com/blog/nvidia-rtx-spark-complete-guide?utm_source=devto" rel="noopener noreferrer"&gt;RTX Spark&lt;/a&gt; — a new Windows PC superchip with 128GB unified memory, ARM CPU, Blackwell GPU, 1 petaflop of AI compute. It runs 120B parameter models locally.&lt;/p&gt;
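
&lt;p&gt;Whether a 120B model fits in 128GB depends on the precision of the weights, which the announcement figures above do not specify. A back-of-the-envelope estimate, counting weights only and ignoring KV cache, activations, and the operating system:&lt;/p&gt;

```python
def weights_gb(params_billions, bits_per_weight):
    """Approximate weight storage in GB (1 GB = 1e9 bytes); excludes KV cache and overhead."""
    return params_billions * bits_per_weight / 8


for bits in (16, 8, 4):
    size = weights_gb(120, bits)
    verdict = "fits" if 128 >= size else "does not fit"
    print(f"120B at {bits}-bit: {size:.0f} GB of weights, {verdict} in 128 GB")
```

&lt;p&gt;At 16-bit the weights alone need 240 GB, at 8-bit 120 GB (very little headroom), and at 4-bit 60 GB. So "runs 120B parameter models locally" most plausibly means quantized weights.&lt;/p&gt;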

&lt;p&gt;Then at Microsoft Build (June 2), Microsoft announced the &lt;strong&gt;Surface RTX Spark Dev Box&lt;/strong&gt; — a mini PC with RTX Spark inside, preloaded with VS Code, GitHub Copilot, WSL2 with GPU passthrough, CUDA, Python, Git, and Node.js. It's purpose-built for AI developers.&lt;/p&gt;

&lt;p&gt;Key numbers:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;128GB unified memory (run &lt;a href="https://www.aimadetools.com/blog/best-llms-nvidia-rtx-spark?utm_source=devto" rel="noopener noreferrer"&gt;Qwen 3.6 27B at 2× speed&lt;/a&gt;, Llama 4 Scout, etc.)&lt;/li&gt;
&lt;li&gt;100W sustained thermal design in aluminium chassis&lt;/li&gt;
&lt;li&gt;Ships with Windows 11 Pro + full dev stack preinstalled&lt;/li&gt;
&lt;li&gt;Available this fall alongside consumer RTX Spark laptops&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;My take:&lt;/strong&gt; This is the machine I've been wanting for the &lt;a href="https://dev.to/race/"&gt;AI Startup Race&lt;/a&gt;. Currently our agents run on a $40/mo VPS. The Surface RTX Spark Dev Box would let you run 120B models locally with zero API costs. For developers spending $100+/month on AI APIs, this hardware could pay for itself, though how fast depends on the final price. See our &lt;a href="https://www.aimadetools.com/blog/nvidia-rtx-spark-complete-guide?utm_source=devto" rel="noopener noreferrer"&gt;full RTX Spark guide&lt;/a&gt; and &lt;a href="https://www.aimadetools.com/blog/nvidia-rtx-spark-vs-mac-studio-local-ai?utm_source=devto" rel="noopener noreferrer"&gt;vs Mac Studio comparison&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  4. MiniMax M3: first open-weight frontier multimodal
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://www.aimadetools.com/blog/minimax-m3-complete-guide?utm_source=devto" rel="noopener noreferrer"&gt;MiniMax M3&lt;/a&gt; launched June 1. First open-weight model combining:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;59% SWE-bench Pro (beats GPT-5.5's 58.6%)&lt;/li&gt;
&lt;li&gt;1M token context via MSA (15.6× faster than standard attention)&lt;/li&gt;
&lt;li&gt;Native text + images + video input&lt;/li&gt;
&lt;li&gt;Computer use (desktop operation)&lt;/li&gt;
&lt;li&gt;$0.60/$2.40 per million tokens&lt;/li&gt;
&lt;li&gt;Weights dropping ~June 10&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It reproduced an ICLR 2025 paper autonomously — 12 hours, 18 commits, 23 figures, zero human intervention.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;My take:&lt;/strong&gt; M3 is the model that should worry Anthropic most. It beats GPT-5.5 on coding while being open-weight, multimodal, and 10× cheaper than Opus. When weights drop (~June 10), enterprises with data privacy requirements get a frontier model they can run on-premise. Until now, frontier-level quality meant sending data to a closed API such as Claude's. See our &lt;a href="https://www.aimadetools.com/blog/minimax-m3-complete-guide?utm_source=devto" rel="noopener noreferrer"&gt;M3 complete guide&lt;/a&gt;, &lt;a href="https://www.aimadetools.com/blog/minimax-m3-vs-claude-opus-4-8?utm_source=devto" rel="noopener noreferrer"&gt;vs Claude Opus 4.8&lt;/a&gt;, and &lt;a href="https://www.aimadetools.com/blog/minimax-m3-vs-deepseek-v4-pro?utm_source=devto" rel="noopener noreferrer"&gt;vs DeepSeek V4-Pro&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  5. Grok's next model teased (mid-June)
&lt;/h2&gt;

&lt;p&gt;xAI teased their next Grok model — reportedly tripling the parameter count with a focus on coding leadership. Expected release: mid-June 2026. Supervised fine-tuning was complete as of the announcement, with reinforcement learning underway.&lt;/p&gt;

&lt;p&gt;If it delivers on the coding promise, it could shake up the &lt;a href="https://www.aimadetools.com/blog/grok-build-complete-guide?utm_source=devto" rel="noopener noreferrer"&gt;Grok Build&lt;/a&gt; ecosystem significantly. Grok Build currently uses Grok 4.3; upgrading it to the new model could make the $30/mo subscription much more competitive.&lt;/p&gt;

&lt;h2&gt;
  
  
  6. Quick hits
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;OpenAI Codex on Amazon Bedrock&lt;/strong&gt; — GPT models + Codex now generally available on AWS with enterprise controls. OpenAI meeting customers where they already deploy.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;White House signed frontier-model cyber order&lt;/strong&gt; — Regulation incoming for the most capable AI models used in cybersecurity applications.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Anthropic expanded Project Glasswing&lt;/strong&gt; — More organizations getting access to Mythos Preview for cybersecurity work. Broader release "in weeks."&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;NVIDIA Nemotron 3 Ultra free on OpenRouter&lt;/strong&gt; — NVIDIA's own model now available at zero cost via &lt;a href="https://www.aimadetools.com/blog/openrouter-complete-guide?utm_source=devto" rel="noopener noreferrer"&gt;OpenRouter&lt;/a&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  What I'm watching next week
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;MiniMax M3 weights&lt;/strong&gt; (~June 10) — Will they live up to the benchmarks when the community tests them?&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Grok new model&lt;/strong&gt; (mid-June) — Does tripling parameters actually improve coding?&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Gemini CLI sunset&lt;/strong&gt; (June 18) — Two weeks left. &lt;a href="https://www.aimadetools.com/blog/migrate-gemini-cli-to-antigravity-cli/?utm_source=devto" rel="noopener noreferrer"&gt;Migrate now&lt;/a&gt; if you haven't.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Anthropic Mythos&lt;/strong&gt; — Still "coming in weeks." Will it ship before the Grok model does?&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Our race agents&lt;/strong&gt; — Xiaomi hit session 456. 605 users/week on its site. Zero revenue still. 5 weeks left. &lt;a href="https://dev.to/race/"&gt;Follow along →&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  &lt;em&gt;AI Dev Weekly publishes every Thursday. &lt;a href="https://dev.to/race/season1/digest"&gt;Subscribe&lt;/a&gt; for weekly race updates and AI developer news.&lt;/em&gt;
&lt;/h2&gt;

&lt;p&gt;&lt;em&gt;Originally published at &lt;a href="https://www.aimadetools.com/blog/ai-dev-weekly-013-microsoft-build-independence-rtx-spark-minimax-m3/?utm_source=devto" rel="noopener noreferrer"&gt;https://www.aimadetools.com&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>aidevweekly</category>
      <category>microsoft</category>
      <category>nvidia</category>
      <category>minimax</category>
    </item>
    <item>
      <title>AI and GDPR — What Developers Actually Need to Know (2026)</title>
      <dc:creator>Joske Vermeulen</dc:creator>
      <pubDate>Tue, 02 Jun 2026 12:46:18 +0000</pubDate>
      <link>https://dev.to/ai_made_tools/ai-and-gdpr-what-developers-actually-need-to-know-2026-fbl</link>
      <guid>https://dev.to/ai_made_tools/ai-and-gdpr-what-developers-actually-need-to-know-2026-fbl</guid>
      <description>&lt;p&gt;If you're a developer at an EU company using &lt;a href="https://www.aimadetools.com/blog/how-to-use-claude-code/?utm_source=devto" rel="noopener noreferrer"&gt;Claude Code&lt;/a&gt;, &lt;a href="https://www.aimadetools.com/blog/cursor-ai-one-week-review/?utm_source=devto" rel="noopener noreferrer"&gt;Cursor&lt;/a&gt;, or any AI coding tool — your company may be violating GDPR without knowing it. Every prompt you send is data that gets processed on someone else's servers.&lt;/p&gt;

&lt;p&gt;Here's what you actually need to know.&lt;/p&gt;

&lt;h2&gt;
  
  
  The core problem
&lt;/h2&gt;

&lt;p&gt;When you use an AI coding tool, your code travels to external servers. If that code contains:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Personal data (user emails, names, addresses in test fixtures)&lt;/li&gt;
&lt;li&gt;Database schemas with PII fields&lt;/li&gt;
&lt;li&gt;API keys or credentials&lt;/li&gt;
&lt;li&gt;Customer data in config files&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;...then you're likely transferring personal data to a third-party processor (credentials are more of a security risk than personal data, but they deserve the same care). Under GDPR, that transfer requires a legal basis, a Data Processing Agreement (DPA), and potentially a Transfer Impact Assessment if the data leaves the EU.&lt;/p&gt;

&lt;h2&gt;
  
  
  Which AI tools are GDPR compliant?
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Tool/Provider&lt;/th&gt;
&lt;th&gt;DPA available?&lt;/th&gt;
&lt;th&gt;EU data residency?&lt;/th&gt;
&lt;th&gt;Training on your data?&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;&lt;a href="https://www.aimadetools.com/blog/what-is-mistral-ai/?utm_source=devto" rel="noopener noreferrer"&gt;Mistral&lt;/a&gt; API&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;✅ Yes&lt;/td&gt;
&lt;td&gt;✅ EU-based&lt;/td&gt;
&lt;td&gt;❌ No&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;strong&gt;Anthropic API&lt;/strong&gt; (Claude)&lt;/td&gt;
&lt;td&gt;✅ Yes (Team/Enterprise)&lt;/td&gt;
&lt;td&gt;⚠️ US servers&lt;/td&gt;
&lt;td&gt;❌ No (API)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;OpenAI API&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;✅ Yes&lt;/td&gt;
&lt;td&gt;⚠️ US servers (EU option available)&lt;/td&gt;
&lt;td&gt;❌ No (API)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Google Vertex AI&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;✅ Yes&lt;/td&gt;
&lt;td&gt;✅ EU region available&lt;/td&gt;
&lt;td&gt;❌ No&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Claude Pro subscription&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;❌ Consumer terms&lt;/td&gt;
&lt;td&gt;❌ US&lt;/td&gt;
&lt;td&gt;⚠️ May be used&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;ChatGPT Plus&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;❌ Consumer terms&lt;/td&gt;
&lt;td&gt;❌ US&lt;/td&gt;
&lt;td&gt;⚠️ May be used&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;&lt;a href="https://www.aimadetools.com/blog/best-ai-coding-agents-privacy-2026/?utm_source=devto" rel="noopener noreferrer"&gt;Self-hosted&lt;/a&gt;&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;N/A (your servers)&lt;/td&gt;
&lt;td&gt;✅ You control it&lt;/td&gt;
&lt;td&gt;❌ No&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Key distinction:&lt;/strong&gt; API access (business terms, DPA available) is different from consumer subscriptions (personal terms, no DPA). If your company uses ChatGPT Plus or Claude Pro for work, that's a compliance risk.&lt;/p&gt;

&lt;h2&gt;
  
  
  The safest options for EU developers
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Option 1: Self-hosted (zero data transfer)
&lt;/h3&gt;

&lt;p&gt;Run models locally — nothing leaves your machine:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;ollama pull devstral-small:24b
aider &lt;span class="nt"&gt;--model&lt;/span&gt; ollama/devstral-small:24b
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;See our &lt;a href="https://www.aimadetools.com/blog/best-ai-coding-agents-privacy-2026/?utm_source=devto" rel="noopener noreferrer"&gt;self-hosted AI guide&lt;/a&gt; and &lt;a href="https://www.aimadetools.com/blog/ollama-complete-guide-2026/?utm_source=devto" rel="noopener noreferrer"&gt;Ollama guide&lt;/a&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Option 2: Mistral API (EU-native)
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://www.aimadetools.com/blog/what-is-mistral-ai/?utm_source=devto" rel="noopener noreferrer"&gt;Mistral&lt;/a&gt; is based in Paris. Data stays in the EU by default. No transatlantic transfers, no Standard Contractual Clauses needed.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;mistralai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Mistral&lt;/span&gt;
&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Mistral&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;your-key&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="c1"&gt;# Data processed in EU infrastructure
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;See our &lt;a href="https://www.aimadetools.com/blog/mistral-api-guide/?utm_source=devto" rel="noopener noreferrer"&gt;Mistral API guide&lt;/a&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Option 3: US providers with DPA + EU region
&lt;/h3&gt;

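&lt;p&gt;Anthropic and OpenAI offer business plans with DPAs. Google Vertex AI lets you specify EU regions. This route can be compliant, but it requires paperwork: a signed DPA and, where data is processed on US servers, the transfer safeguards described above.&lt;/p&gt;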

&lt;h2&gt;
  
  
  What about AI coding tools?
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Tool&lt;/th&gt;
&lt;th&gt;GDPR-safe?&lt;/th&gt;
&lt;th&gt;Why&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;a href="https://www.aimadetools.com/blog/aider-complete-guide/?utm_source=devto" rel="noopener noreferrer"&gt;Aider&lt;/a&gt; + local model&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;Nothing leaves your machine&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;a href="https://www.aimadetools.com/blog/continue-dev-complete-guide/?utm_source=devto" rel="noopener noreferrer"&gt;Continue.dev&lt;/a&gt; + Ollama&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;Local inference&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;a href="https://www.aimadetools.com/blog/aider-complete-guide/?utm_source=devto" rel="noopener noreferrer"&gt;Aider&lt;/a&gt; + Mistral API&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;EU data residency&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;a href="https://www.aimadetools.com/blog/how-to-use-claude-code/?utm_source=devto" rel="noopener noreferrer"&gt;Claude Code&lt;/a&gt; (Pro sub)&lt;/td&gt;
&lt;td&gt;❌&lt;/td&gt;
&lt;td&gt;Consumer terms, US servers&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://www.aimadetools.com/blog/cursor-ai-one-week-review/?utm_source=devto" rel="noopener noreferrer"&gt;Cursor&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;⚠️&lt;/td&gt;
&lt;td&gt;Business plan has DPA&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;a href="https://www.aimadetools.com/blog/github-copilot-vs-cursor-2026/?utm_source=devto" rel="noopener noreferrer"&gt;GitHub Copilot&lt;/a&gt; Business&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;DPA + no training on code&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  Practical steps
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Audit your AI tools&lt;/strong&gt; — list every AI service your team uses&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Check for DPAs&lt;/strong&gt; — consumer subscriptions don't count&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Scrub test data&lt;/strong&gt; — remove real PII from test fixtures and seed data&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Consider &lt;a href="https://www.aimadetools.com/blog/best-ai-coding-agents-privacy-2026/?utm_source=devto" rel="noopener noreferrer"&gt;self-hosting&lt;/a&gt;&lt;/strong&gt; for sensitive codebases&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Use &lt;a href="https://www.aimadetools.com/blog/what-is-mistral-ai/?utm_source=devto" rel="noopener noreferrer"&gt;Mistral&lt;/a&gt;&lt;/strong&gt; as your default EU-compliant provider&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Document everything&lt;/strong&gt; — GDPR requires you to demonstrate compliance&lt;/li&gt;
&lt;/ol&gt;
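
&lt;p&gt;Step 3 is the easiest one to automate. Here's a minimal sketch, assuming email addresses are the main PII hiding in your fixtures. The &lt;code&gt;scrub_emails&lt;/code&gt; helper and the &lt;code&gt;example.invalid&lt;/code&gt; placeholder domain are my own illustrative choices, and a regex like this will not catch names, postal addresses, or phone numbers.&lt;/p&gt;

```python
import re

# Deliberately simple pattern: it catches common email shapes,
# not every address that is technically valid.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def scrub_emails(text):
    """Replace every email in text with a numbered placeholder address."""
    seen = {}

    def _placeholder(match):
        original = match.group(0).lower()
        # The same real address always maps to the same fake one, so
        # fixtures that depend on matching emails keep working.
        if original not in seen:
            seen[original] = f"user{len(seen) + 1}@example.invalid"
        return seen[original]

    return EMAIL_RE.sub(_placeholder, text)


fixture = '{"name": "Test User", "email": "jane.doe@corp.example.com"}'
print(scrub_emails(fixture))
```

&lt;p&gt;Numbered placeholders are used instead of hashes on purpose: a hash of an email can still be traced back to the person, a counter cannot.&lt;/p&gt;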

&lt;h2&gt;
  
  
  FAQ
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Do AI tools comply with GDPR?
&lt;/h3&gt;

&lt;p&gt;It depends on the tool and plan. API-based services (OpenAI API, Anthropic API, Google Vertex AI) offer Data Processing Agreements and don't train on your data. Consumer subscriptions like ChatGPT Plus or Claude Pro use personal terms without DPAs and may not be GDPR-compliant for business use.&lt;/p&gt;

&lt;h3&gt;
  
  
  Can I use ChatGPT for GDPR-regulated data?
&lt;/h3&gt;

&lt;p&gt;You can use the OpenAI API with a business agreement and DPA in place, but not the consumer ChatGPT Plus subscription. The API doesn't train on your data and offers EU data residency options, making it suitable for regulated workloads with proper legal agreements.&lt;/p&gt;

&lt;h3&gt;
  
  
  Do I need a DPA for AI APIs?
&lt;/h3&gt;

&lt;p&gt;Yes, if you're sending any personal data to the API. Under GDPR, any third party processing personal data on your behalf requires a Data Processing Agreement. Most major AI providers (OpenAI, Anthropic, Google) offer DPAs on their business and enterprise plans.&lt;/p&gt;

&lt;h3&gt;
  
  
  Is self-hosted AI GDPR compliant?
&lt;/h3&gt;

&lt;p&gt;Self-hosted AI eliminates data transfer concerns since nothing leaves your infrastructure. However, you still need to comply with other GDPR requirements like data minimization, purpose limitation, and the right to erasure for any personal data the model processes or stores.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;em&gt;Related: &lt;a href="https://www.aimadetools.com/blog/ai-code-data-privacy/?utm_source=devto" rel="noopener noreferrer"&gt;Where Does Your Code Go?&lt;/a&gt; · &lt;a href="https://www.aimadetools.com/blog/self-hosted-ai-gdpr/?utm_source=devto" rel="noopener noreferrer"&gt;Self-Hosted AI for GDPR&lt;/a&gt; · &lt;a href="https://www.aimadetools.com/blog/eu-ai-act-developers/?utm_source=devto" rel="noopener noreferrer"&gt;EU AI Act for Developers&lt;/a&gt; · &lt;a href="https://www.aimadetools.com/blog/best-ai-coding-agents-privacy-2026/?utm_source=devto" rel="noopener noreferrer"&gt;Best AI Coding Agents for Privacy&lt;/a&gt; · &lt;a href="https://www.aimadetools.com/blog/best-vpn-for-developers-2026/?utm_source=devto" rel="noopener noreferrer"&gt;Best VPNs for Developers&lt;/a&gt; · &lt;a href="https://www.aimadetools.com/blog/uk-ai-regulation-after-brexit/?utm_source=devto" rel="noopener noreferrer"&gt;UK AI Regulation After Brexit&lt;/a&gt; · &lt;a href="https://www.aimadetools.com/blog/ccpa-ai-developers/?utm_source=devto" rel="noopener noreferrer"&gt;CCPA for AI Developers&lt;/a&gt;&lt;/em&gt;
&lt;/h2&gt;

&lt;p&gt;&lt;em&gt;Originally published at &lt;a href="https://www.aimadetools.com/blog/ai-gdpr-developers-guide/?utm_source=devto" rel="noopener noreferrer"&gt;https://www.aimadetools.com&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>gdpr</category>
      <category>privacy</category>
      <category>aitools</category>
      <category>europe</category>
    </item>
    <item>
      <title>AI Dev Weekly #12: Opus 4.8 Drops, Anthropic Hits $965B, Chinese AI Goes 99% Cheaper, Microsoft Builds Its Own Coding Model</title>
      <dc:creator>Joske Vermeulen</dc:creator>
      <pubDate>Fri, 29 May 2026 07:09:56 +0000</pubDate>
      <link>https://dev.to/ai_made_tools/ai-dev-weekly-12-opus-48-drops-anthropic-hits-965b-chinese-ai-goes-99-cheaper-microsoft-5bd1</link>
      <guid>https://dev.to/ai_made_tools/ai-dev-weekly-12-opus-48-drops-anthropic-hits-965b-chinese-ai-goes-99-cheaper-microsoft-5bd1</guid>
      <description>&lt;p&gt;&lt;em&gt;AI Dev Weekly is a Thursday series where I cover the week's most important AI developer news, with my take as someone who actually uses these tools daily.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;The theme this week is divergence. US labs are raising prices and valuations. Chinese labs are racing to zero. Developers are caught in the middle choosing between the absolute best (Opus 4.8 at $25/M output) and "good enough at 3% of the cost" (DeepSeek/MiMo at $0.87/M). Meanwhile Microsoft is hedging by building its own coding model to reduce OpenAI dependency. Let's break it all down.&lt;/p&gt;

&lt;h2&gt;
  
  
  1. Claude Opus 4.8: the new #1 coding model
&lt;/h2&gt;

&lt;p&gt;Anthropic released &lt;a href="https://www.aimadetools.com/blog/claude-opus-4-8-complete-guide/?utm_source=devto" rel="noopener noreferrer"&gt;Claude Opus 4.8&lt;/a&gt; on May 28. Same price as 4.7 ($5/$25 per million tokens), better at everything.&lt;/p&gt;

&lt;p&gt;Key numbers:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;69.2% SWE-bench Pro&lt;/strong&gt; — up from 64.3% (4.7) and miles ahead of GPT-5.5 (58.6%)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;74.2% Terminal-Bench 2.1&lt;/strong&gt; — +8.4 points over 4.7&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;88.6% SWE-bench Verified&lt;/strong&gt; — highest of any model&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;4× fewer unflagged code flaws&lt;/strong&gt; — the honesty improvement is the real story&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;61.4 Artificial Analysis Index&lt;/strong&gt; — takes #1 from GPT-5.5&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The biggest new feature is &lt;a href="https://www.aimadetools.com/blog/claude-code-dynamic-workflows-guide/?utm_source=devto" rel="noopener noreferrer"&gt;dynamic workflows&lt;/a&gt; in Claude Code. Claude can now plan a large task, spawn hundreds of parallel subagents, verify results, and iterate until convergence. Jarred Sumner used it to port Bun from Zig to Rust — 750,000 lines, 11 days, 99.8% test pass rate.&lt;/p&gt;
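
&lt;p&gt;The plan, fan out, verify, retry loop is easy to picture in code. This is a rough sketch of the general pattern, not how Claude Code implements it: &lt;code&gt;run_subagent&lt;/code&gt; and &lt;code&gt;verify&lt;/code&gt; are hypothetical stand-ins for real agent calls and real test runs.&lt;/p&gt;

```python
from concurrent.futures import ThreadPoolExecutor


def run_subagent(chunk):
    # Stand-in for a real subagent call: "port" a chunk by upper-casing it.
    return chunk.upper()


def verify(result):
    # Stand-in for running the test suite on a subagent's output.
    return result.isupper()


def run_workflow(chunks, max_rounds=3):
    """Fan chunks out to parallel workers, keep verified results, retry the rest."""
    results = {}
    pending = list(chunks)
    for _ in range(max_rounds):
        if not pending:
            break
        with ThreadPoolExecutor(max_workers=8) as pool:
            outputs = list(pool.map(run_subagent, pending))
        retry = []
        for chunk, output in zip(pending, outputs):
            if verify(output):
                results[chunk] = output
            else:
                retry.append(chunk)
        pending = retry
    # Anything still pending never passed verification.
    return results, pending


done, failed = run_workflow(["parser.zig", "lexer.zig"])
print(done, failed)
```

&lt;p&gt;Capping the number of rounds matters: without it, a chunk that never verifies would loop forever.&lt;/p&gt;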

&lt;p&gt;Other additions: effort control (low → max), fast mode at a third of its previous price ($10/$50 instead of $30/$150), and system messages mid-conversation in the API.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;My take:&lt;/strong&gt; I run Claude as one of seven agents in the &lt;a href="https://dev.to/race/"&gt;$100 AI Startup Race&lt;/a&gt;. The &lt;a href="https://www.aimadetools.com/blog/claude-opus-4-8-vs-4-7/?utm_source=devto" rel="noopener noreferrer"&gt;4.8 vs 4.7 improvement&lt;/a&gt; is immediately noticeable — fewer hallucinated progress claims, better self-correction, more efficient tool calling. The dynamic workflows feature is genuinely new territory. No other tool can spawn hundreds of coordinated agents from a single prompt. For codebase-scale migrations and audits, this is a step change. The question is whether the $25/M output price is justified when &lt;a href="https://www.aimadetools.com/blog/claude-opus-4-8-vs-deepseek-v4-pro/?utm_source=devto" rel="noopener noreferrer"&gt;DeepSeek scores within 8 points for $0.87/M&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  2. Anthropic raises $65B, surpasses OpenAI at $965B
&lt;/h2&gt;

&lt;p&gt;Alongside the Opus 4.8 launch, Anthropic closed a $65 billion Series H at a $965 billion post-money valuation. That puts them above OpenAI for the first time.&lt;/p&gt;

&lt;p&gt;The numbers:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;$965B valuation&lt;/strong&gt; (OpenAI was last valued at ~$900B)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;$47B annualized revenue run rate&lt;/strong&gt; — tripled in 3 months&lt;/li&gt;
&lt;li&gt;Led by Altimeter Capital, Dragoneer, Greenoaks, Sequoia Capital&lt;/li&gt;
&lt;li&gt;Mythos-class models (higher intelligence than Opus) coming "in weeks"&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;My take:&lt;/strong&gt; The valuation flip is symbolic but the revenue growth is real. $47B run rate means Claude is generating serious enterprise revenue. The Mythos tease is interesting — they explicitly said it has "even higher intelligence than Opus" and is currently limited to cybersecurity work under Project Glasswing. If Mythos ships broadly in June, it could be another step change. For developers, the practical implication is: Anthropic has the resources to keep shipping fast. Expect monthly model updates to continue.&lt;/p&gt;

&lt;h2&gt;
  
  
  3. The Chinese AI pricing war goes nuclear
&lt;/h2&gt;

&lt;p&gt;Two massive price cuts in one week made Chinese frontier models essentially free for cached workloads:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;DeepSeek V4-Pro (May 22):&lt;/strong&gt; The 75% promotional discount is now &lt;a href="https://www.aimadetools.com/blog/deepseek-v4-pro-75-percent-discount-permanent/?utm_source=devto" rel="noopener noreferrer"&gt;permanent&lt;/a&gt;. Output locked at $0.87/M tokens. Input at $0.435/M. Cache hits at $0.003625/M. This is a model that scores 80.6% on SWE-bench Verified — within 8 points of Opus 4.8.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;MiMo V2.5 Pro (May 26):&lt;/strong&gt; Xiaomi &lt;a href="https://www.aimadetools.com/blog/mimo-v2-5-pro-price-cut-99-percent/?utm_source=devto" rel="noopener noreferrer"&gt;cut prices by up to 99%&lt;/a&gt;. Cached input dropped from $0.36/M to $0.0036/M. Standard pricing now matches DeepSeek exactly: $0.435/$0.87. Token Plans upgraded 5-51× (the $100 plan now gets 82 billion tokens).&lt;/p&gt;

&lt;p&gt;The technical explanation: both labs achieved architectural breakthroughs in KV cache efficiency. DeepSeek's interleaved attention reduces cache to 10% of standard size. MiMo's hierarchical SWA uses a 1:7 sparsity ratio. Both claim break-even at these prices.&lt;/p&gt;

&lt;p&gt;The result: &lt;a href="https://www.aimadetools.com/blog/chinese-ai-30x-cheaper-than-american-models/?utm_source=devto" rel="noopener noreferrer"&gt;Chinese AI models are now 30× cheaper than American equivalents&lt;/a&gt; on standard pricing, and 100×+ cheaper on cached workloads. For agent pipelines with stable system prompts, the effective cost is approaching zero.&lt;/p&gt;
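
&lt;p&gt;The ratios are easy to sanity-check against the list prices quoted in this issue. A quick sketch: the prices come from the sections above, while the monthly token volume is a made-up workload, so plug in your own numbers.&lt;/p&gt;

```python
# USD per million tokens, as quoted in this issue.
OPUS_INPUT, OPUS_OUTPUT = 5.00, 25.00
DEEPSEEK_INPUT, DEEPSEEK_OUTPUT = 0.435, 0.87


def monthly_cost(price_per_million, tokens):
    """Cost in USD for a number of tokens at a per-million-token price."""
    return price_per_million * tokens / 1_000_000


output_ratio = OPUS_OUTPUT / DEEPSEEK_OUTPUT  # about 28.7
input_ratio = OPUS_INPUT / DEEPSEEK_INPUT     # about 11.5

# Hypothetical workload: 200 million output tokens per month.
tokens = 200_000_000
print(f"Opus 4.8 output:  ${monthly_cost(OPUS_OUTPUT, tokens):,.2f}")
print(f"DeepSeek output:  ${monthly_cost(DEEPSEEK_OUTPUT, tokens):,.2f}")
print(f"Output price ratio: {output_ratio:.1f}x")
print(f"Input price ratio:  {input_ratio:.1f}x")
```

&lt;p&gt;Note that the gap is widest on output tokens. On input the gap is smaller, so your real savings depend on how input-heavy your workload is.&lt;/p&gt;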

&lt;p&gt;&lt;strong&gt;My take:&lt;/strong&gt; I tripled the Xiaomi agent's sessions in our race (from 2 to 6 per day) because the cost became negligible. At $0.0036/M cached tokens, running an autonomous agent 24/7 costs less than a cup of coffee per day. The quality gap is real but narrowing — DeepSeek V4-Pro at 80.6% SWE-bench vs Opus 4.8 at 88.6% is meaningful for hard tasks but irrelevant for 80% of routine coding. If you are spending more than $500/month on API calls and haven't tested Chinese models, you are leaving money on the table. We wrote a &lt;a href="https://www.aimadetools.com/blog/migrate-gpt-claude-to-deepseek-mimo/?utm_source=devto" rel="noopener noreferrer"&gt;full migration guide&lt;/a&gt; if you want to try.&lt;/p&gt;

&lt;h2&gt;
  
  
  4. Microsoft building its own coding model for Build 2026
&lt;/h2&gt;

&lt;p&gt;Reuters reported on May 28 that Microsoft will unveil a homegrown coding model at Build 2026 (June 2-3 in San Francisco). It is designed to boost GitHub Copilot and reduce dependency on OpenAI.&lt;/p&gt;

&lt;p&gt;Also coming at Build:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Transcription model&lt;/li&gt;
&lt;li&gt;Reasoning model&lt;/li&gt;
&lt;li&gt;Speech model&lt;/li&gt;
&lt;li&gt;Image model&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is part of a broader strategic shift. Microsoft is building a self-sufficient AI stack alongside its OpenAI partnership, not instead of it. The competitive pressure from Claude Code (which has been eating Copilot's market share) is the likely catalyst.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;My take:&lt;/strong&gt; This is the most significant signal yet that the OpenAI-Microsoft relationship is evolving. Microsoft investing billions in OpenAI while simultaneously building competing models tells you everything about where the industry is heading: no one wants to be dependent on a single provider. For developers using Copilot, this could mean better performance (a model optimized specifically for code completion rather than a general-purpose one) or it could mean fragmentation (yet another model to evaluate). Watch Build next week for details.&lt;/p&gt;

&lt;h2&gt;
  
  
  5. StepFun Step 3.7 Flash: 198B MoE at 400 tokens/sec
&lt;/h2&gt;

&lt;p&gt;A new player entered the cheap-and-fast model tier. StepFun released &lt;a href="https://www.aimadetools.com/blog/step-3-7-flash-complete-guide/?utm_source=devto" rel="noopener noreferrer"&gt;Step 3.7 Flash&lt;/a&gt; — a 198B parameter MoE model that activates only 11B parameters per token.&lt;/p&gt;

&lt;p&gt;The specs:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;400 tokens/second&lt;/strong&gt; — 2× faster than Gemini 3.5 Flash&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;256K context window&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Native multimodal&lt;/strong&gt; — text, images, video&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;3 reasoning tiers&lt;/strong&gt; — Low/Medium/High per API call&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Advisor Mode&lt;/strong&gt; — achieves 97% of Opus 4.6 coding at $0.19/task&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Open-weight&lt;/strong&gt; — self-hostable on 128GB RAM&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;~$0.20/M input, ~$0.80/M output&lt;/strong&gt; on OpenRouter&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The unique feature is Advisor Mode: Step 3.7 Flash handles routine execution autonomously and only escalates to a stronger model when genuinely stuck. This automated routing achieves near-frontier quality at budget prices.&lt;/p&gt;
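
&lt;p&gt;The escalation idea fits in a few lines. This is a sketch of the general pattern, not StepFun's implementation: &lt;code&gt;cheap_model&lt;/code&gt;, &lt;code&gt;strong_model&lt;/code&gt;, and the "return None when stuck" convention are all hypothetical.&lt;/p&gt;

```python
def route(task, cheap_model, strong_model):
    """Try the cheap model first; escalate only when it gives up."""
    answer = cheap_model(task)
    if answer is not None:
        return answer, "cheap"
    return strong_model(task), "strong"


def cheap_model(task):
    # Stand-in: pretends to be stuck on anything labelled "hard".
    if "hard" in task:
        return None
    return f"cheap answer to: {task}"


def strong_model(task):
    # Stand-in for the expensive frontier model.
    return f"strong answer to: {task}"


for task in ["rename a variable", "hard concurrency bug"]:
    answer, tier = route(task, cheap_model, strong_model)
    print(f"{tier:6} | {answer}")
```

&lt;p&gt;The savings come from how rarely the second branch runs. The hard part in practice is the "stuck" signal, which this sketch fakes with a keyword check.&lt;/p&gt;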

&lt;p&gt;&lt;strong&gt;My take:&lt;/strong&gt; The "Flash" model tier is getting crowded — &lt;a href="https://www.aimadetools.com/blog/step-3-7-flash-vs-gemini-3-5-flash/?utm_source=devto" rel="noopener noreferrer"&gt;Gemini 3.5 Flash&lt;/a&gt;, Step 3.7 Flash, DeepSeek V4 Flash. All under $1/M output, all fast enough for real-time use, all surprisingly capable. Step 3.7 Flash's video understanding and GUI interaction capabilities set it apart. The 400 t/s throughput is genuinely impressive for a model this capable. If you need speed + multimodal + cheap, this is worth testing.&lt;/p&gt;

&lt;h2&gt;
  
  
  6. Quick hits
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;OpenAI reportedly released GPT-5.3 Codex&lt;/strong&gt; minutes after Anthropic's Opus 4.8 announcement. It is 25% faster than GPT-5.2, reportedly helped debug itself during development, and supercharges the Codex agentic coding tool launched earlier this month.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cohere acquired Aleph Alpha&lt;/strong&gt; — creating a $20B transatlantic sovereign AI company. Backed by Schwarz Group (Europe's largest retailer). Positions as the enterprise alternative to US/Chinese models for European companies with data sovereignty requirements.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Canada ruled OpenAI violated privacy laws&lt;/strong&gt; — regulatory pressure continues to mount on US AI labs internationally.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Claude Code shipped detailed usage analytics&lt;/strong&gt; — you can now see exactly how many tokens each session consumed, broken down by model and effort level.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  What I'm watching next week
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Microsoft Build (June 2-3)&lt;/strong&gt; — The new coding model reveal. Will it compete with Claude Code or just improve Copilot's autocomplete?&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Mythos timeline&lt;/strong&gt; — Anthropic said "coming weeks." If it ships in early June, it could leapfrog Opus 4.8 immediately.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Gemini CLI shutdown (June 18)&lt;/strong&gt; — Two weeks until the deadline. If you haven't migrated to &lt;a href="https://www.aimadetools.com/blog/antigravity-2-complete-guide/?utm_source=devto" rel="noopener noreferrer"&gt;Antigravity CLI&lt;/a&gt;, time is running out.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Our race agents&lt;/strong&gt; — Claude is at 194 blog posts. Xiaomi just got tripled to 6 sessions/day. Gemini is back online after a 4-day auth outage. &lt;a href="https://dev.to/race/"&gt;Follow along →&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  &lt;em&gt;AI Dev Weekly publishes every Thursday. &lt;a href="https://dev.to/race/season1/digest"&gt;Subscribe&lt;/a&gt; for weekly race updates and AI developer news.&lt;/em&gt;
&lt;/h2&gt;

&lt;p&gt;&lt;em&gt;Originally published at &lt;a href="https://www.aimadetools.com/blog/ai-dev-weekly-012-opus-4-8-anthropic-965b-chinese-ai-99-cheaper/?utm_source=devto" rel="noopener noreferrer"&gt;https://www.aimadetools.com&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>aidevweekly</category>
      <category>claude</category>
      <category>anthropic</category>
      <category>deepseek</category>
    </item>
    <item>
      <title>Aider Complete Guide: Setup, Best Models, and Tips (2026)</title>
      <dc:creator>Joske Vermeulen</dc:creator>
      <pubDate>Wed, 27 May 2026 12:24:08 +0000</pubDate>
      <link>https://dev.to/ai_made_tools/aider-complete-guide-setup-best-models-and-tips-2026-17pe</link>
      <guid>https://dev.to/ai_made_tools/aider-complete-guide-setup-best-models-and-tips-2026-17pe</guid>
      <description>&lt;p&gt;Aider is a free, open-source AI pair programming tool that runs in your terminal. You chat with it in plain English, and it directly edits files in your local Git repository. Every change is automatically committed with a descriptive message.&lt;/p&gt;

&lt;p&gt;It's the most model-flexible AI coding tool available — it works with Claude, GPT, Gemini, DeepSeek, &lt;a href="https://www.aimadetools.com/blog/what-is-qwen-3-5/?utm_source=devto" rel="noopener noreferrer"&gt;Qwen&lt;/a&gt;, local models via &lt;a href="https://www.aimadetools.com/blog/ollama-complete-guide-2026/?utm_source=devto" rel="noopener noreferrer"&gt;Ollama&lt;/a&gt;, or any OpenAI-compatible API.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why Aider?
&lt;/h2&gt;

&lt;p&gt;Most AI coding tools lock you into one model. &lt;a href="https://www.aimadetools.com/blog/how-to-use-claude-code/?utm_source=devto" rel="noopener noreferrer"&gt;Claude Code&lt;/a&gt; only works with Claude. &lt;a href="https://www.aimadetools.com/blog/claude-code-vs-codex-cli-vs-gemini-cli/?utm_source=devto" rel="noopener noreferrer"&gt;Codex CLI&lt;/a&gt; only works with GPT. Aider works with everything.&lt;/p&gt;

&lt;p&gt;Key strengths:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Git-native&lt;/strong&gt; — every AI edit is a clean Git commit you can review, revert, or cherry-pick&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;100+ model support&lt;/strong&gt; — any model, any provider, including local&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Repo map&lt;/strong&gt; — understands your entire codebase structure, not just open files&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Multi-file editing&lt;/strong&gt; — coordinates changes across multiple files in one operation&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;100+ languages&lt;/strong&gt; — not just Python and JavaScript&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Installation
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;aider-chat
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Or with pipx (recommended for isolation):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pipx &lt;span class="nb"&gt;install &lt;/span&gt;aider-chat
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Quick start
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;cd &lt;/span&gt;your-project
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;ANTHROPIC_API_KEY&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"your-key"&lt;/span&gt;  &lt;span class="c"&gt;# or OPENAI_API_KEY, etc.&lt;/span&gt;
aider
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Aider starts in your project directory, scans the Git repo, and builds a map of your codebase.&lt;/p&gt;

&lt;h2&gt;
  
  
  Choosing a model
&lt;/h2&gt;

&lt;p&gt;Aider lets you pair a main model, which makes the edits, with a cheaper weak model, which handles background work such as commit messages and chat-history summaries:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Use Claude Opus for editing, Sonnet for commit messages and summaries&lt;/span&gt;
aider &lt;span class="nt"&gt;--model&lt;/span&gt; claude-3-opus-20240229 &lt;span class="nt"&gt;--weak-model&lt;/span&gt; claude-3-sonnet-20240229

&lt;span class="c"&gt;# Use GPT-5.4 &lt;/span&gt;
aider &lt;span class="nt"&gt;--model&lt;/span&gt; gpt-5.4

&lt;span class="c"&gt;# Use DeepSeek (cheapest frontier option)&lt;/span&gt;
aider &lt;span class="nt"&gt;--model&lt;/span&gt; deepseek/deepseek-chat

&lt;span class="c"&gt;# Use a local model via Ollama&lt;/span&gt;
aider &lt;span class="nt"&gt;--model&lt;/span&gt; ollama/qwen3.5:27b

&lt;span class="c"&gt;# Use any model via OpenRouter&lt;/span&gt;
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;OPENROUTER_API_KEY&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"your-key"&lt;/span&gt;
aider &lt;span class="nt"&gt;--model&lt;/span&gt; openrouter/anthropic/claude-opus-4.6
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;See our &lt;a href="https://www.aimadetools.com/blog/openrouter-complete-guide/?utm_source=devto" rel="noopener noreferrer"&gt;OpenRouter guide&lt;/a&gt; for the full model catalog.&lt;/p&gt;

&lt;h2&gt;
  
  
  Core commands
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Adding files to context
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;/add src/auth.ts src/middleware.ts
/add src/routes/*.ts          # glob patterns work
/read-only docs/API.md        # read-only context (not edited)
/drop src/auth.ts             # remove from context
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Editing modes
&lt;/h3&gt;

&lt;p&gt;Aider has multiple edit formats optimized for different models:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;whole&lt;/strong&gt;: the model returns each edited file in full (simple, but costly for large files)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;diff&lt;/strong&gt;: the model returns search/replace blocks, so only the changed parts are sent&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;udiff&lt;/strong&gt;: a format based on unified diffs&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;editor-diff&lt;/strong&gt; and &lt;strong&gt;editor-whole&lt;/strong&gt;: streamlined variants intended for the editor model in architect mode
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;aider &lt;span class="nt"&gt;--edit-format&lt;/span&gt; diff  &lt;span class="c"&gt;# Use diff mode&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Git integration
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;/commit           # Commit current changes
/undo             # Undo the last commit, if Aider made it
/diff             # Show changes since the last message
/git log --oneline -10  # Run any git command
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Every AI edit creates an automatic commit like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;aider: Refactor auth middleware to use JWT validation
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Running commands
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;/run npm test              # Run tests
/run npm run lint          # Run linter
/test npm test             # Run tests and auto-fix failures
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;/test&lt;/code&gt; command runs your test command and, if it exits with a non-zero status, adds the output to the chat so Aider can try to fix the code. Rerun it to confirm the fix.&lt;/p&gt;

&lt;h3&gt;
  
  
  Web content
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;/web https://docs.stripe.com/api/charges  # Fetch docs for context
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Advanced features
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Repo map
&lt;/h3&gt;

&lt;p&gt;Aider builds a map of your entire repository — function signatures, class definitions, imports, and dependencies. This means it understands how files relate to each other, even files not explicitly added to the chat.&lt;/p&gt;

&lt;p&gt;The repo map uses tree-sitter for parsing, supporting 100+ languages.&lt;/p&gt;

&lt;h3&gt;
  
  
  Linting integration
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;aider &lt;span class="nt"&gt;--lint-cmd&lt;/span&gt; &lt;span class="s2"&gt;"eslint --fix"&lt;/span&gt;  &lt;span class="c"&gt;# Auto-lint after every edit&lt;/span&gt;
aider &lt;span class="nt"&gt;--auto-lint&lt;/span&gt;                &lt;span class="c"&gt;# Use detected linter&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Aider runs your linter after every edit and tries to fix any problems it reports.&lt;/p&gt;

&lt;h3&gt;
  
  
  Voice input
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;/voice  # In-chat command: record speech and transcribe it into your next prompt
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Describe changes verbally — useful for complex explanations that are easier to speak than type.&lt;/p&gt;

&lt;h3&gt;
  
  
  Scripting mode
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;aider &lt;span class="nt"&gt;--yes-always&lt;/span&gt; &lt;span class="nt"&gt;--no-git&lt;/span&gt; &lt;span class="nt"&gt;--message&lt;/span&gt; &lt;span class="s2"&gt;"Add error handling to all API routes"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Run Aider non-interactively for CI/CD pipelines or batch operations.&lt;/p&gt;
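&lt;p&gt;For batch operations, a loop like the following is one way to apply the same instruction to several files. It is a minimal sketch that assumes Aider's &lt;code&gt;--message&lt;/code&gt; and &lt;code&gt;--yes-always&lt;/code&gt; options; the &lt;code&gt;batch_edit&lt;/code&gt; function and the &lt;code&gt;AIDER_BIN&lt;/code&gt; variable are hypothetical names introduced here so the loop can be dry-run.&lt;/p&gt;

```shell
#!/usr/bin/env bash
# AIDER_BIN is a hypothetical override (not an Aider setting): point it at
# "echo" to print the commands instead of running them.
AIDER_BIN=${AIDER_BIN:-aider}

# Applies one instruction to each file in turn, stopping at the first failure.
# Usage: batch_edit "instruction" file1 [file2 ...]
batch_edit() {
    local message="$1"
    shift
    local file
    for file in "$@"; do
        "$AIDER_BIN" --yes-always --message "$message" "$file" || return 1
    done
}
```

&lt;p&gt;Setting &lt;code&gt;AIDER_BIN=echo&lt;/code&gt; before calling &lt;code&gt;batch_edit "Add docstrings" src/*.py&lt;/code&gt; prints each command, which lets you check the file list before spending tokens.&lt;/p&gt;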

&lt;h2&gt;
  
  
  Configuration
&lt;/h2&gt;

&lt;p&gt;Create &lt;code&gt;.aider.conf.yml&lt;/code&gt; in your project root:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;model&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;claude-3-opus-20240229&lt;/span&gt;
&lt;span class="na"&gt;weak-model&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;claude-3-sonnet-20240229&lt;/span&gt;
&lt;span class="na"&gt;edit-format&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;diff&lt;/span&gt;
&lt;span class="na"&gt;auto-lint&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
&lt;span class="na"&gt;lint-cmd&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;eslint --fix&lt;/span&gt;
&lt;span class="na"&gt;auto-commits&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Or use environment variables:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;AIDER_MODEL&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;claude-3-opus-20240229
&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;AIDER_WEAK_MODEL&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;claude-3-sonnet-20240229
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Cost optimization
&lt;/h2&gt;

&lt;p&gt;Aider can be expensive with frontier models. Tips to reduce costs:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Use &lt;code&gt;--read&lt;/code&gt; for context files&lt;/strong&gt; — read-only files use fewer tokens than editable ones&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Set a cheaper weak model&lt;/strong&gt;: &lt;code&gt;--weak-model&lt;/code&gt; handles commit messages and chat-history summaries, so those calls stay off your main model&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Use &lt;a href="https://www.aimadetools.com/blog/how-to-use-aider-with-deepseek/?utm_source=devto" rel="noopener noreferrer"&gt;DeepSeek&lt;/a&gt; or &lt;a href="https://www.aimadetools.com/blog/how-to-use-qwen-3-5-api/?utm_source=devto" rel="noopener noreferrer"&gt;Qwen&lt;/a&gt;&lt;/strong&gt; — 10-30x cheaper than Claude/GPT for comparable quality&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Truncate large files&lt;/strong&gt; — Aider sends entire files as context; keep them focused&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Use local models&lt;/strong&gt; for routine work — &lt;a href="https://www.aimadetools.com/blog/ollama-complete-guide-2026/?utm_source=devto" rel="noopener noreferrer"&gt;Ollama&lt;/a&gt; with Qwen 3.5 27B is free&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Aider vs Claude Code vs Codex CLI
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Feature&lt;/th&gt;
&lt;th&gt;Aider&lt;/th&gt;
&lt;th&gt;Claude Code&lt;/th&gt;
&lt;th&gt;Codex CLI&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Model support&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Any model&lt;/td&gt;
&lt;td&gt;Claude only&lt;/td&gt;
&lt;td&gt;GPT only&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Git integration&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Deep (auto-commits)&lt;/td&gt;
&lt;td&gt;Basic&lt;/td&gt;
&lt;td&gt;Basic&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Repo map&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;✅ Full codebase&lt;/td&gt;
&lt;td&gt;✅ Full codebase&lt;/td&gt;
&lt;td&gt;✅ Full codebase&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Price&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Free (BYOK)&lt;/td&gt;
&lt;td&gt;$20/mo sub&lt;/td&gt;
&lt;td&gt;$20/mo sub&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Edit quality&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Model-dependent&lt;/td&gt;
&lt;td&gt;Excellent&lt;/td&gt;
&lt;td&gt;Very good&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Voice input&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;td&gt;❌&lt;/td&gt;
&lt;td&gt;❌&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Linting&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;✅ Auto-lint&lt;/td&gt;
&lt;td&gt;❌&lt;/td&gt;
&lt;td&gt;❌&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Open source&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;✅ Apache 2.0&lt;/td&gt;
&lt;td&gt;❌&lt;/td&gt;
&lt;td&gt;✅&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Choose Aider when:&lt;/strong&gt; You want model flexibility, deep Git integration, or need to use cheap/local models.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Choose Claude Code when:&lt;/strong&gt; You want the best code quality and don't mind being locked to Claude.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Choose Codex CLI when:&lt;/strong&gt; You're in the OpenAI ecosystem and want fast autonomous coding.&lt;/p&gt;

&lt;h2&gt;
  
  
  Bottom line
&lt;/h2&gt;

&lt;p&gt;Aider is the Swiss Army knife of AI coding tools. It's not the best at any single thing — Claude Code writes better code, Codex CLI is faster, &lt;a href="https://www.aimadetools.com/blog/kimi-cli-complete-guide/?utm_source=devto" rel="noopener noreferrer"&gt;Kimi CLI&lt;/a&gt; has Agent Swarm. But Aider is the most flexible, the most transparent (Git-native), and the only tool that lets you freely switch between any model from any provider.&lt;/p&gt;

&lt;p&gt;For developers who want control over their AI coding workflow without vendor lock-in, Aider is the best choice in 2026.&lt;/p&gt;

&lt;h2&gt;
  
  
  FAQ
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Is Aider free?
&lt;/h3&gt;

&lt;p&gt;Yes. Aider is 100% free and open-source (Apache 2.0). You bring your own API key for whichever AI model you want to use — there's no subscription or usage fee for Aider itself.&lt;/p&gt;

&lt;h3&gt;
  
  
  Which AI models work with Aider?
&lt;/h3&gt;

&lt;p&gt;Aider works with 100+ models including Claude, GPT, Gemini, DeepSeek, Qwen, Mistral, and any OpenAI-compatible API. See our &lt;a href="https://www.aimadetools.com/blog/how-to-use-aider-with-deepseek/?utm_source=devto" rel="noopener noreferrer"&gt;guide to using Aider with DeepSeek&lt;/a&gt; for a budget-friendly setup.&lt;/p&gt;

&lt;h3&gt;
  
  
  How does Aider compare to Claude Code?
&lt;/h3&gt;

&lt;p&gt;Aider supports any model while Claude Code is locked to Claude. Aider has deeper Git integration (auto-commits every edit) and auto-linting. Claude Code produces slightly better code quality since it's optimized for one model. See our full &lt;a href="https://www.aimadetools.com/blog/aider-vs-claude-code-vs-codex/?utm_source=devto" rel="noopener noreferrer"&gt;Aider vs Claude Code vs Codex comparison&lt;/a&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Can I use Aider with local models?
&lt;/h3&gt;

&lt;p&gt;Yes. Aider works with local models via &lt;a href="https://www.aimadetools.com/blog/ollama-complete-guide-2026/?utm_source=devto" rel="noopener noreferrer"&gt;Ollama&lt;/a&gt; or any OpenAI-compatible local server. Run &lt;code&gt;aider --model ollama/qwen3.5:27b&lt;/code&gt; to use a local model at zero cost.&lt;/p&gt;

&lt;h3&gt;
  
  
  Does Aider work with Git?
&lt;/h3&gt;

&lt;p&gt;Git integration is Aider's core feature. Every AI edit is automatically committed with a descriptive message. You can review, revert, or cherry-pick any change. Aider also builds a repo map from your Git repository to understand your full codebase.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Related: &lt;a href="https://www.aimadetools.com/blog/aider-vs-claude-code-vs-codex/?utm_source=devto" rel="noopener noreferrer"&gt;Aider vs Claude Code vs Codex CLI&lt;/a&gt; · &lt;a href="https://www.aimadetools.com/blog/best-ai-coding-tools-2026/?utm_source=devto" rel="noopener noreferrer"&gt;Best AI Coding Tools 2026&lt;/a&gt; · &lt;a href="https://www.aimadetools.com/blog/ollama-complete-guide-2026/?utm_source=devto" rel="noopener noreferrer"&gt;Ollama Complete Guide&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Originally published at &lt;a href="https://www.aimadetools.com/blog/aider-complete-guide/?utm_source=devto" rel="noopener noreferrer"&gt;https://www.aimadetools.com&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>aider</category>
      <category>coding</category>
      <category>aitools</category>
      <category>terminal</category>
    </item>
    <item>
      <title>The Model Worked. The Cron Job Almost Killed My AI Agent.</title>
      <dc:creator>Joske Vermeulen</dc:creator>
      <pubDate>Thu, 21 May 2026 12:05:00 +0000</pubDate>
      <link>https://dev.to/ai_made_tools/the-model-worked-the-cron-job-almost-killed-my-ai-agent-108e</link>
      <guid>https://dev.to/ai_made_tools/the-model-worked-the-cron-job-almost-killed-my-ai-agent-108e</guid>
      <description>&lt;p&gt;&lt;em&gt;This is a submission for the &lt;a href="https://dev.to/challenges/google-io-writing-2026-05-19"&gt;Google I/O Writing Challenge&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Gemini 3.5 Flash was not the hard part.&lt;/p&gt;

&lt;p&gt;It fixed bugs the old setup had failed to solve for weeks. The model quality was transformational (see &lt;a href="https://dev.to/ai_made_tools/i-upgraded-a-production-ai-agent-to-gemini-35-flash-12-hours-after-google-io-heres-what-i-found-254i"&gt;Part 1&lt;/a&gt; and &lt;a href="https://dev.to/ai_made_tools/my-ai-agent-hit-googles-quota-wall-in-8-minutes-36-hours-later-google-tripled-the-limits-mkc"&gt;Part 2&lt;/a&gt;).&lt;/p&gt;

&lt;p&gt;The hard part was making it survive cron.&lt;/p&gt;

&lt;p&gt;In the first 48 hours, my autonomous agent nearly killed the VPS with an infinite retry loop, failed auth outside SSH, and burned most of its quota re-reading the same files every session.&lt;/p&gt;

&lt;p&gt;All three bugs took hours to diagnose. All three fixes were tiny.&lt;/p&gt;

&lt;h2&gt;
  
  
  Context
&lt;/h2&gt;

&lt;p&gt;I run &lt;a href="https://www.aimadetools.com/race/" rel="noopener noreferrer"&gt;The $100 AI Startup Race&lt;/a&gt;: 7 AI agents building startups autonomously on a VPS via cron jobs. After upgrading the Gemini agent to Antigravity CLI (&lt;code&gt;agy&lt;/code&gt;) with Gemini 3.5 Flash, the model worked great. But making it run &lt;em&gt;unattended&lt;/em&gt; on a headless server? That's where the real engineering happened.&lt;/p&gt;

&lt;h2&gt;
  
  
  Bug 1: The Infinite Retry Loop
&lt;/h2&gt;

&lt;h3&gt;
  
  
  The symptom
&lt;/h3&gt;

&lt;p&gt;I SSH into the VPS and find it unresponsive. Load average through the roof. The cron log shows 300+ entries from the last 2 minutes, all empty.&lt;/p&gt;

&lt;h3&gt;
  
  
  What happened
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Expected:&lt;/strong&gt; quota exhaustion returns a non-zero exit code.&lt;br&gt;
&lt;strong&gt;Actual:&lt;/strong&gt; exit code 0 + empty output.&lt;/p&gt;

&lt;p&gt;When &lt;code&gt;agy&lt;/code&gt; hits its quota limit, it doesn't error out. It returns successfully with an empty response. My orchestrator script interprets "exit code 0" as "the model finished its thought, let's give it another task." So it immediately fires another prompt. Which returns empty. Which triggers another. 300 times in 2 minutes.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight console"&gt;&lt;code&gt;&lt;span class="go"&gt;=== Run 1 finished at 07:30:03, exit=0 ===
=== Run 2 finished at 07:30:06, exit=0 ===
=== Run 3 finished at 07:30:08, exit=0 ===
=== Run 4 finished at 07:30:10, exit=0 ===
... (296 more)
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Each "run" takes 2-3 seconds. No output, no error, no indication that quota is exhausted. Just silence. A human would have seen the empty response and stopped. The orchestrator saw exit code 0 and kept going.&lt;/p&gt;

&lt;h3&gt;
  
  
  The fix
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Circuit breaker: 3 consecutive empty responses = quota exhausted&lt;/span&gt;
&lt;span class="nv"&gt;EMPTY_COUNT&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;0
&lt;span class="nv"&gt;MAX_EMPTY&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;3

&lt;span class="c"&gt;# After each run, check output length&lt;/span&gt;
&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="o"&gt;[[&lt;/span&gt; &lt;span class="k"&gt;${#&lt;/span&gt;&lt;span class="nv"&gt;OUTPUT&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt; &lt;span class="nt"&gt;-lt&lt;/span&gt; 20 &lt;span class="o"&gt;]]&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;then&lt;/span&gt;
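    &lt;span class="nv"&gt;EMPTY_COUNT&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="k"&gt;$((&lt;/span&gt;EMPTY_COUNT &lt;span class="o"&gt;+&lt;/span&gt; 1&lt;span class="k"&gt;))&lt;/span&gt;  &lt;span class="c"&gt;# safe under set -e, unlike ((EMPTY_COUNT++)) when the count is 0&lt;/span&gt;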
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="o"&gt;[[&lt;/span&gt; &lt;span class="nv"&gt;$EMPTY_COUNT&lt;/span&gt; &lt;span class="nt"&gt;-ge&lt;/span&gt; &lt;span class="nv"&gt;$MAX_EMPTY&lt;/span&gt; &lt;span class="o"&gt;]]&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;then
        &lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"=== 3 consecutive empty responses (quota exhausted?) — stopping session ==="&lt;/span&gt;
        &lt;span class="nb"&gt;break
    &lt;/span&gt;&lt;span class="k"&gt;fi
else
    &lt;/span&gt;&lt;span class="nv"&gt;EMPTY_COUNT&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;0
&lt;span class="k"&gt;fi&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Three empty responses in a row → stop the session. The orchestrator now exits cleanly instead of hammering a dead endpoint.&lt;/p&gt;
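&lt;p&gt;To show where the check sits, here is a minimal sketch of the breaker inside a run loop. The &lt;code&gt;run_agent&lt;/code&gt; and &lt;code&gt;run_session&lt;/code&gt; functions are hypothetical stand-ins, not the real orchestrator; &lt;code&gt;run_agent&lt;/code&gt; returns nothing so the loop behaves as it would against an exhausted quota.&lt;/p&gt;

```shell
#!/usr/bin/env bash
# run_agent is a hypothetical stand-in for the real agent call. It prints
# nothing here, which mimics the quota-exhausted behaviour described above.
run_agent() {
    printf ''
}

# Runs up to max_runs iterations and stops early after max_empty consecutive
# near-empty outputs. Prints how many runs were actually performed.
run_session() {
    local max_runs="$1" max_empty="$2"
    local empty_count=0 runs=0 output
    while [ "$runs" -lt "$max_runs" ]; do
        runs=$((runs + 1))
        output=$(run_agent)
        if [ "${#output}" -lt 20 ]; then
            empty_count=$((empty_count + 1))
            if [ "$empty_count" -ge "$max_empty" ]; then
                break    # circuit breaker tripped: stop the session
            fi
        else
            empty_count=0    # a real response resets the streak
        fi
    done
    echo "$runs"
}
```

&lt;p&gt;With the stub above, &lt;code&gt;run_session 300 3&lt;/code&gt; stops after three runs instead of three hundred. The 20-character threshold is carried over from the snippet; tune it to whatever your tool prints on an empty turn.&lt;/p&gt;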

&lt;h3&gt;
  
  
  The lesson
&lt;/h3&gt;

&lt;p&gt;Every autonomous system needs a circuit breaker. AI tools are designed for interactive use. They assume a human will notice when something's wrong. When there's no human, you need explicit failure detection.&lt;/p&gt;

&lt;h2&gt;
  
  
  Bug 2: The Auth That Only Works in SSH
&lt;/h2&gt;

&lt;h3&gt;
  
  
  The symptom
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Expected:&lt;/strong&gt; same user + same token file = works everywhere.&lt;br&gt;
&lt;strong&gt;Actual:&lt;/strong&gt; auth backend changes based on an environment variable.&lt;/p&gt;

&lt;p&gt;I test &lt;code&gt;agy&lt;/code&gt; via SSH. Works perfectly. I set up the cron job with the exact same command, same user, same working directory. Fails with "Authentication required."&lt;/p&gt;

&lt;p&gt;The token file exists. It has a valid refresh token. The binary can read it (verified with strace). But it won't use it.&lt;/p&gt;
&lt;h3&gt;
  
  
  The investigation
&lt;/h3&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Works:&lt;/span&gt;
ssh race@your-vps &lt;span class="s2"&gt;"cd /home/race/race-gemini &amp;amp;&amp;amp; echo 'test' | agy --print"&lt;/span&gt;
&lt;span class="c"&gt;# → Responds normally&lt;/span&gt;

&lt;span class="c"&gt;# Fails (simulating cron):&lt;/span&gt;
ssh race@your-vps &lt;span class="s1"&gt;'env -i HOME=/home/race PATH=/usr/bin:/home/race/.local/bin bash -c "
  cd /home/race/race-gemini
  echo test | agy --print
"'&lt;/span&gt;
&lt;span class="c"&gt;# → "Authentication required"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;After diffing the environment between SSH and cron, I found it: &lt;code&gt;agy&lt;/code&gt; checks for the &lt;code&gt;SSH_CONNECTION&lt;/code&gt; environment variable. If it's set, it uses file-based auth (reads the token from &lt;code&gt;~/.gemini/antigravity-cli/antigravity-oauth-token&lt;/code&gt;). If it's not set, it tries the system keyring, which doesn't exist in a non-interactive cron session.&lt;/p&gt;
&lt;h3&gt;
  
  
  The fix
&lt;/h3&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;SSH_CONNECTION&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"127.0.0.1 0 127.0.0.1 22"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;One fake environment variable. I don't love this fix. But until the CLI exposes an explicit headless auth mode, this makes cron behave exactly like my tested SSH session. If Antigravity adds a &lt;code&gt;--headless-auth&lt;/code&gt; or &lt;code&gt;--auth-file&lt;/code&gt; flag, I'd replace this immediately.&lt;/p&gt;
&lt;h3&gt;
  
  
  The lesson
&lt;/h3&gt;

&lt;p&gt;AI CLI tools are built for developers at their desk. Headless/cron environments are second-class citizens. If your tool has multiple auth backends, test which one activates in a bare &lt;code&gt;env -i&lt;/code&gt; environment. That's what cron sees.&lt;/p&gt;
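&lt;p&gt;One way to make that check repeatable is a small helper that runs any command under a stripped environment. This is a sketch under assumptions: &lt;code&gt;run_like_cron&lt;/code&gt; and &lt;code&gt;var_under_cron&lt;/code&gt; are hypothetical names, and the &lt;code&gt;PATH&lt;/code&gt; value is a guess at a typical cron default rather than what your system uses.&lt;/p&gt;

```shell
#!/usr/bin/env bash
# Runs a command with a stripped-down environment, similar to what cron
# provides. The PATH value is an assumption: cron's default differs by system.
run_like_cron() {
    env -i HOME="$HOME" PATH="/usr/bin:/bin" "$@"
}

# Prints "set" or "unset" depending on whether the named variable survives
# into the cron-like environment.
var_under_cron() {
    local name="$1" value
    if value=$(run_like_cron printenv "$name"); then
        echo "set"
    else
        echo "unset"
    fi
}
```

&lt;p&gt;From an SSH session, &lt;code&gt;var_under_cron SSH_CONNECTION&lt;/code&gt; prints &lt;code&gt;unset&lt;/code&gt;, which is exactly the difference behind this bug. To exercise the real tool, pass it to &lt;code&gt;run_like_cron&lt;/code&gt; after extending &lt;code&gt;PATH&lt;/code&gt; to wherever &lt;code&gt;agy&lt;/code&gt; is installed.&lt;/p&gt;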
&lt;h2&gt;
  
  
  Bug 3: The Context Tax
&lt;/h2&gt;
&lt;h3&gt;
  
  
  The symptom
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Expected:&lt;/strong&gt; each session starts productive work quickly.&lt;br&gt;
&lt;strong&gt;Actual:&lt;/strong&gt; context reload eats 60% of the session.&lt;/p&gt;

&lt;p&gt;Session 1 runs for 8 minutes before hitting quota. Of those 8 minutes, 5 are spent reading the codebase: &lt;code&gt;IDENTITY.md&lt;/code&gt;, &lt;code&gt;PROGRESS.md&lt;/code&gt;, &lt;code&gt;BACKLOG.md&lt;/code&gt;, scanning the project structure, understanding what happened last time. Only 3 minutes of actual coding.&lt;/p&gt;

&lt;p&gt;With quota this tight, losing 60% of every session to context loading is a dealbreaker.&lt;/p&gt;
&lt;h3&gt;
  
  
  The discovery
&lt;/h3&gt;

&lt;p&gt;&lt;code&gt;agy&lt;/code&gt; has a &lt;code&gt;--continue&lt;/code&gt; flag that resumes the previous conversation. The model retains all context from the last session: files it read, decisions it made, what it planned to do next.&lt;/p&gt;
&lt;h3&gt;
  
  
  The fix
&lt;/h3&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# First session of the day: fresh start, full context load&lt;/span&gt;
&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="o"&gt;[[&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$SESSION_TYPE&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="s2"&gt;"first"&lt;/span&gt; &lt;span class="o"&gt;]]&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;then
    &lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$PROMPT&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; | agy &lt;span class="nt"&gt;--print&lt;/span&gt; &lt;span class="nt"&gt;--print-timeout&lt;/span&gt; 25m &lt;span class="nt"&gt;--dangerously-skip-permissions&lt;/span&gt;
&lt;span class="k"&gt;else&lt;/span&gt;
    &lt;span class="c"&gt;# All subsequent sessions: resume previous conversation&lt;/span&gt;
    &lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$PROMPT&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; | agy &lt;span class="nt"&gt;--print&lt;/span&gt; &lt;span class="nt"&gt;--print-timeout&lt;/span&gt; 25m &lt;span class="nt"&gt;--dangerously-skip-permissions&lt;/span&gt; &lt;span class="nt"&gt;--continue&lt;/span&gt;
&lt;span class="k"&gt;fi&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;h3&gt;
  
  
  The result
&lt;/h3&gt;

&lt;p&gt;These measurements were taken before Google's 3x rate limit boost (see &lt;a href="https://dev.to/ai_made_tools/my-ai-agent-hit-googles-quota-wall-in-8-minutes-36-hours-later-google-tripled-the-limits-mkc"&gt;Part 2&lt;/a&gt;). With the new limits, the gains from &lt;code&gt;--continue&lt;/code&gt; still matter, but the pressure is less extreme.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;Fresh session&lt;/th&gt;
&lt;th&gt;--continue session&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Context loading&lt;/td&gt;
&lt;td&gt;~5 minutes&lt;/td&gt;
&lt;td&gt;~0 minutes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Productive coding&lt;/td&gt;
&lt;td&gt;~3 minutes&lt;/td&gt;
&lt;td&gt;~15 minutes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Effective runtime&lt;/td&gt;
&lt;td&gt;3 min&lt;/td&gt;
&lt;td&gt;15 min&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Roughly 5x more productive time per session (about 15 minutes instead of 3) by skipping the context reload. The model remembers what it fixed, what's next, what files it already read.&lt;/p&gt;
&lt;h3&gt;
  
  
  The lesson
&lt;/h3&gt;

&lt;p&gt;Context is expensive, both in tokens and in quota. If your AI tool supports conversation persistence, use it.&lt;/p&gt;

&lt;p&gt;I don't use &lt;code&gt;--continue&lt;/code&gt; forever. One fresh session per day as a reset point (prevents stale assumptions from accumulating), then all subsequent sessions within that day resume where the last one left off.&lt;/p&gt;
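&lt;p&gt;A minimal sketch of that daily reset, assuming a marker file per day is enough to tell the first session from later ones. The &lt;code&gt;session_type&lt;/code&gt; function, the state directory, and the marker name are all hypothetical; only the &lt;code&gt;first&lt;/code&gt; value matches the &lt;code&gt;SESSION_TYPE&lt;/code&gt; check in the snippet above.&lt;/p&gt;

```shell
#!/usr/bin/env bash
# Prints "first" the first time it is called on a given day and "continue"
# on later calls. state_dir and the marker file name are hypothetical choices.
# Usage: session_type state_dir [date]
session_type() {
    local state_dir="$1" today="${2:-$(date +%F)}"
    local marker="$state_dir/fresh-$today"
    if [ -e "$marker" ]; then
        echo "continue"
    else
        mkdir -p "$state_dir"
        touch "$marker"    # recorded before the session runs, see note below
        echo "first"
    fi
}
```

&lt;p&gt;The orchestrator could then set &lt;code&gt;SESSION_TYPE=$(session_type "$HOME/.agent-state")&lt;/code&gt; at startup. Note that the marker is written before the session runs, so a fresh session that crashes still counts as the day's reset; moving the &lt;code&gt;touch&lt;/code&gt; to the end of a successful run would be the stricter variant.&lt;/p&gt;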
&lt;h2&gt;
  
  
  What's Missing: The Infrastructure Layer
&lt;/h2&gt;

&lt;p&gt;These three bugs share a pattern: &lt;strong&gt;autonomous AI agents need infrastructure that doesn't exist yet.&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;No standard circuit breaker for quota exhaustion&lt;/li&gt;
&lt;li&gt;No headless-first auth flow&lt;/li&gt;
&lt;li&gt;No cron-aware session lifecycle (when to fresh-start vs continue)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Web apps have process managers. Queues have retry policies. APIs expose rate-limit headers. Background jobs have dead-letter queues. Autonomous AI agents have bash scripts.&lt;/p&gt;

&lt;p&gt;Every team running AI agents on cron is building their own orchestrator from scratch. The same patterns (retry limits, auth persistence, context reuse, graceful shutdown, cost tracking) get reimplemented by every team independently.&lt;/p&gt;

&lt;p&gt;We're in the "build your own orchestrator" era. The models are ready for autonomous work. The infrastructure around them isn't.&lt;/p&gt;
&lt;h2&gt;
  
  
  The Orchestrator Pattern
&lt;/h2&gt;

&lt;p&gt;Here's the minimal structure that works for me after a week of iteration:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Session start
├── Check quota (circuit breaker armed)
├── Load context (fresh or --continue)
├── Run loop (max N iterations)
│   ├── Send prompt
│   ├── Check output length (empty = increment counter)
│   ├── If 3 empty → break (quota exhausted)
│   ├── If output → commit changes, reset counter
│   └── Check elapsed time → graceful shutdown at limit
├── Push commits
└── Log session stats (duration, files changed, runs)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It's ~50 lines of bash. It handles the three failure modes above. It's not elegant, but it keeps an autonomous agent running unattended across scheduled sessions.&lt;/p&gt;
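&lt;p&gt;The two steps of the outline not shown earlier, the elapsed-time check and the stats line, might look like this. It is a sketch, not the actual script: &lt;code&gt;within_budget&lt;/code&gt; and &lt;code&gt;session_stats&lt;/code&gt; are hypothetical helpers, and the log format is invented for illustration.&lt;/p&gt;

```shell
#!/usr/bin/env bash
# Succeeds while fewer than budget seconds have elapsed since started.
# started and now are Unix timestamps; budget is a duration in seconds.
within_budget() {
    local started="$1" budget="$2" now="$3"
    [ "$((now - started))" -lt "$budget" ]
}

# Formats the end-of-session log line from the last step of the outline.
session_stats() {
    local duration="$1" files_changed="$2" runs="$3"
    echo "duration=${duration}s files_changed=${files_changed} runs=${runs}"
}
```

&lt;p&gt;In the run loop this would read &lt;code&gt;while within_budget "$start" "$budget" "$(date +%s)"&lt;/code&gt;, with &lt;code&gt;start=$(date +%s)&lt;/code&gt; captured at session start. Passing the current time in as an argument keeps the check easy to test without waiting.&lt;/p&gt;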

&lt;h2&gt;
  
  
  Takeaway
&lt;/h2&gt;

&lt;p&gt;If you're running Antigravity CLI (or any AI coding tool) in autonomous/headless mode:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Add a circuit breaker.&lt;/strong&gt; Empty responses are silent failures, not completions.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Test auth under cron's environment.&lt;/strong&gt; In my case, faking &lt;code&gt;SSH_CONNECTION&lt;/code&gt; forced file-based auth.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Use &lt;code&gt;--continue&lt;/code&gt; between sessions.&lt;/strong&gt; Reloading context at the start of every session burns quota before any work gets done.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Set &lt;code&gt;--print-timeout&lt;/code&gt; higher than the default.&lt;/strong&gt; The default is 5 minutes, and complex agentic tasks often need longer.&lt;/li&gt;
&lt;/ol&gt;
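
&lt;p&gt;One possible way to decide between a fresh start and &lt;code&gt;--continue&lt;/code&gt; is to look at the age of a marker file that the previous session touched on exit. This is a hypothetical heuristic, not something the CLI provides: &lt;code&gt;session_flag&lt;/code&gt;, the marker file and the 300-minute default (chosen to match the 5-hour refresh window) are all assumptions.&lt;/p&gt;

```shell
#!/usr/bin/env bash
# session_flag MARKER [MAX_AGE_MIN]
# Prints "--continue" if MARKER exists and was modified less than
# MAX_AGE_MIN minutes ago; prints an empty line otherwise (fresh start).
session_flag() {
  local marker="$1" max_age_min="${2:-300}"
  if [ -e "$marker" ]; then
    # find prints the path only when the file is newer than the cutoff.
    if [ -n "$(find "$marker" -mmin "-$max_age_min")" ]; then
      echo "--continue"
      return 0
    fi
  fi
  echo ""
}
```

&lt;p&gt;A session script could then run &lt;code&gt;agy --print $(session_flag .last-session) --dangerously-skip-permissions&lt;/code&gt; and finish with &lt;code&gt;touch .last-session&lt;/code&gt;, where &lt;code&gt;.last-session&lt;/code&gt; is an illustrative file name.&lt;/p&gt;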

&lt;h2&gt;
  
  
  My Cron-Safe Agent Checklist
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;[ ] Max runtime per session&lt;/li&gt;
&lt;li&gt;[ ] Max loop count per session&lt;/li&gt;
&lt;li&gt;[ ] Empty-output circuit breaker&lt;/li&gt;
&lt;li&gt;[ ] Non-zero exit handling&lt;/li&gt;
&lt;li&gt;[ ] Auth tested with &lt;code&gt;env -i&lt;/code&gt; (simulating cron)&lt;/li&gt;
&lt;li&gt;[ ] Fresh/continue session strategy&lt;/li&gt;
&lt;li&gt;[ ] Commit and push after each meaningful change&lt;/li&gt;
&lt;li&gt;[ ] Quota / empty-response events logged separately&lt;/li&gt;
&lt;li&gt;[ ] Recovery path after quota exhaustion&lt;/li&gt;
&lt;li&gt;[ ] Logs include duration, output length, files changed&lt;/li&gt;
&lt;/ul&gt;
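
&lt;p&gt;For the &lt;code&gt;env -i&lt;/code&gt; item, a small wrapper can approximate what a command sees under cron. The variable set below (&lt;code&gt;HOME&lt;/code&gt;, &lt;code&gt;LOGNAME&lt;/code&gt;, &lt;code&gt;SHELL&lt;/code&gt;, a short &lt;code&gt;PATH&lt;/code&gt;) is an assumption based on common cron defaults, so check your own cron implementation; &lt;code&gt;cron_env_run&lt;/code&gt; is an illustrative name.&lt;/p&gt;

```shell
#!/usr/bin/env bash
# cron_env_run CMD [ARGS...]
# Runs CMD with a stripped-down environment similar to what cron provides:
# no exported shell variables, no profile-derived PATH entries.
cron_env_run() {
  env -i \
    HOME="$HOME" \
    LOGNAME="${LOGNAME:-$(id -un)}" \
    SHELL=/bin/sh \
    PATH=/usr/bin:/bin \
    "$@"
}
```

&lt;p&gt;Because &lt;code&gt;/root/.local/bin&lt;/code&gt; is not on that &lt;code&gt;PATH&lt;/code&gt;, the agent has to be called by absolute path, for example &lt;code&gt;cron_env_run /root/.local/bin/agy --version&lt;/code&gt;. If auth or the binary lookup fails here, it will likely fail under cron too.&lt;/p&gt;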

&lt;p&gt;AI agents don't just need better models. They need boring production infrastructure.&lt;/p&gt;

&lt;p&gt;Gemini 3.5 Flash made the agent smart enough to work.&lt;/p&gt;

&lt;p&gt;Bash made it stable enough to survive.&lt;/p&gt;

</description>
      <category>devchallenge</category>
      <category>googleiochallenge</category>
      <category>antigravity</category>
      <category>devops</category>
    </item>
    <item>
      <title>My AI Agent Hit Google's Quota Wall in 8 Minutes. 36 Hours Later, Google Tripled the Limits.</title>
      <dc:creator>Joske Vermeulen</dc:creator>
      <pubDate>Thu, 21 May 2026 08:32:51 +0000</pubDate>
      <link>https://dev.to/ai_made_tools/my-ai-agent-hit-googles-quota-wall-in-8-minutes-36-hours-later-google-tripled-the-limits-mkc</link>
      <guid>https://dev.to/ai_made_tools/my-ai-agent-hit-googles-quota-wall-in-8-minutes-36-hours-later-google-tripled-the-limits-mkc</guid>
      <description>&lt;p&gt;&lt;em&gt;This is a submission for the &lt;a href="https://dev.to/challenges/google-io-writing-2026-05-19"&gt;Google I/O Writing Challenge&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;My Gemini agent spent four weeks in last place.&lt;/p&gt;

&lt;p&gt;1,259 commits. Broken imports across 32 files. Help requests about database tables it could have created itself. Endless bug loops.&lt;/p&gt;

&lt;p&gt;Then I upgraded it to Gemini 3.5 Flash.&lt;/p&gt;

&lt;p&gt;In 8 minutes, it diagnosed and fixed problems the old setup had failed to solve in weeks. Then it hit Google's quota wall.&lt;/p&gt;

&lt;p&gt;This is the story of what happened next.&lt;/p&gt;

&lt;h2&gt;
  
  
  Context
&lt;/h2&gt;

&lt;p&gt;This is Part 2 of my Gemini 3.5 Flash upgrade series. &lt;a href="https://dev.to/ai_made_tools/i-upgraded-a-production-ai-agent-to-gemini-35-flash-12-hours-after-google-io-heres-what-i-found-254i"&gt;Part 1 covers the initial upgrade and first results&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;I'm running &lt;a href="https://www.aimadetools.com/race/" rel="noopener noreferrer"&gt;The $100 AI Startup Race&lt;/a&gt;. 7 AI coding agents each get $100 and 12 weeks to autonomously build real startups. No human coding. The agents run on cron jobs, commit to GitHub, and deploy to Vercel.&lt;/p&gt;

&lt;p&gt;After upgrading the Gemini agent from a combo of 2.5 Pro (premium sessions) and 2.5 Flash (cheap sessions) to a single 3.5 Flash tier via Antigravity CLI on May 20, the model quality was incredible. But the quota economics were brutal.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Disappointment (May 20)
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Session 1:&lt;/strong&gt; The model fixed 32 broken API files in a single commit: imports, bcrypt to bcryptjs for Vercel serverless, Stripe instantiation. Root cause analysis that the old model couldn't do in 4 weeks. Then the 5h quota wall hit. &lt;strong&gt;8 minutes of productive work.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Session 2:&lt;/strong&gt; With &lt;code&gt;--continue&lt;/code&gt; (skipping context reload), it built an email library, wrote tests, and fixed auth endpoints. &lt;strong&gt;15 minutes.&lt;/strong&gt; Then 5h quota again.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The math:&lt;/strong&gt; Two sessions consumed 40% of the weekly quota. Projected total: ~68 minutes per week on the $20/month Pro plan.&lt;/p&gt;

&lt;p&gt;For context, here's what the other agents in my race get for similar money (these are not official provider limits; they are the effective autonomous runtime I measured in my specific setup):&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Agent&lt;/th&gt;
&lt;th&gt;Plan cost&lt;/th&gt;
&lt;th&gt;Weekly runtime&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Claude&lt;/td&gt;
&lt;td&gt;$20/mo&lt;/td&gt;
&lt;td&gt;~7 hours&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Codex/GPT&lt;/td&gt;
&lt;td&gt;$20/mo&lt;/td&gt;
&lt;td&gt;~21 hours&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;DeepSeek&lt;/td&gt;
&lt;td&gt;$25/mo&lt;/td&gt;
&lt;td&gt;~21 hours&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Gemini 3.5 Flash&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;$20/mo&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;~68 minutes&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Best model quality in the race. Worst total compute time. The old 2.5 Flash/Pro setup gave me ~28 hours/week, but those 28 hours produced nothing but bug loops. Now I had a model that actually worked, but could barely run.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Paradox
&lt;/h2&gt;

&lt;p&gt;Here's what made it painful: the quality improvement was real. Not incremental, but transformational.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Old setup (2.5 Pro + 2.5 Flash combo, 28 hours/week):&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Wrote code with broken imports across 32 files&lt;/li&gt;
&lt;li&gt;Filed 3 help requests about "missing database tables"&lt;/li&gt;
&lt;li&gt;Never self-diagnosed the actual problem&lt;/li&gt;
&lt;li&gt;1,259 commits over 4 weeks, last place in the race&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;New model (3.5 Flash, 68 minutes/week):&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Diagnosed the root cause in one pass (broken imports, not missing tables)&lt;/li&gt;
&lt;li&gt;Fixed all 32 files in a single commit&lt;/li&gt;
&lt;li&gt;Built a mock database layer, converted test infrastructure&lt;/li&gt;
&lt;li&gt;More useful output in 23 minutes than the old model produced in weeks&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The bottleneck had shifted from intelligence to throughput. The model was finally good enough. The constraint was access.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why Autonomous Agents Burn Quota Differently
&lt;/h2&gt;

&lt;p&gt;For human coding, a model is an assistant. You ask, read, think, edit, and come back later.&lt;/p&gt;

&lt;p&gt;For autonomous coding, the model is the runtime. It doesn't pause to think offline. Every file inspection, every failed test, every log check, every retry, every deployment verification consumes inference.&lt;/p&gt;

&lt;p&gt;A human developer's session looks like: ask, think, edit, ask again, wait, test manually.&lt;/p&gt;

&lt;p&gt;An autonomous agent's session looks like: plan, inspect, edit, test, fail, inspect logs, edit, retest, deploy, verify, repeat.&lt;/p&gt;

&lt;p&gt;That changes the economics completely. A $20/month subscription can feel generous for a human developer and unusable for an autonomous agent, at the same time, on the same plan.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Response (May 21, 05:25 UTC)
&lt;/h2&gt;

&lt;p&gt;The response came less than 36 hours after Google I/O. Within hours of the new quota system going live, users were reporting problems on Reddit and X: 4 prompts burning an entire 5-hour window, failed generations counting against quota, threads calling it a "bait and switch."&lt;/p&gt;

&lt;p&gt;Then, at 5:25 AM UTC on May 21:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Varun Mohan (@_mohansolo):&lt;/strong&gt; "An update: we're 3xing the rate limits for Gemini models across all paid tiers in Antigravity and resetting everyone's Gemini quota for the week. We understand some people hit their rate limits quickly and wanted to respond fast. Lots more to come and enjoy building!"&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Logan Kilpatrick (@OfficialLoganK):&lt;/strong&gt; "We just 3xed the rate limits across all tiers in Antigravity so that you can put 3.5 Flash through its paces even more, enjoy, and keep the feedback coming! :)"&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;And the key follow-up from Varun:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;"In case it's not clear, the 3x is forever."&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  What I Actually Measured
&lt;/h2&gt;

&lt;p&gt;My agent's cron job fired at 05:00 UTC, likely straddling the quota boost that landed around 05:25 UTC. The results:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Session 3 (05:00 UTC, partially on old quota, partially on new):&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;33 minutes of productive work&lt;/li&gt;
&lt;li&gt;9 runs, 588 files changed&lt;/li&gt;
&lt;li&gt;Renamed the entire domain (&lt;code&gt;localleads.pro&lt;/code&gt; to &lt;code&gt;localseogen.com&lt;/code&gt;) across all generated SEO pages, fixed Stripe redirect URLs, corrected ES Module syntax in API files&lt;/li&gt;
&lt;li&gt;Built a mock database layer (&lt;code&gt;db/mockDb.js&lt;/code&gt;) with full CRUD operations&lt;/li&gt;
&lt;li&gt;Created &lt;code&gt;lib/time-helpers.js&lt;/code&gt; utility library&lt;/li&gt;
&lt;li&gt;Wrote test suites for signup, login, get-credits, assign, generate-seo-pages&lt;/li&gt;
&lt;li&gt;Refactored 14 test files to use the new mock DB&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Session 4 (07:07 UTC, fully on new quota):&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;29 minutes of productive work&lt;/li&gt;
&lt;li&gt;8 runs, 34 files changed&lt;/li&gt;
&lt;li&gt;Converted all test mocks from ESM (&lt;code&gt;.js&lt;/code&gt;) to CommonJS (&lt;code&gt;.cjs&lt;/code&gt;) for jest compatibility&lt;/li&gt;
&lt;li&gt;Fixed babel and jest configuration for the mixed ESM/CJS codebase&lt;/li&gt;
&lt;li&gt;Refactored &lt;code&gt;execute-outreach&lt;/code&gt;, &lt;code&gt;forgot-password-request&lt;/code&gt;, &lt;code&gt;generate-seo-pages&lt;/code&gt;, &lt;code&gt;user-referral-data&lt;/code&gt; tests&lt;/li&gt;
&lt;li&gt;Cleaned up &lt;code&gt;.env.test&lt;/code&gt; and email library&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Two back-to-back sessions of ~30 minutes each. Together they used the full 5-hour window, so roughly &lt;strong&gt;50 minutes of productive runtime per 5h refresh cycle&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The comparison:&lt;/strong&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;Before boost (May 20)&lt;/th&gt;
&lt;th&gt;After boost (May 21)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Runtime per 5h window&lt;/td&gt;
&lt;td&gt;8 minutes&lt;/td&gt;
&lt;td&gt;~50 minutes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Effective improvement&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;~4-5x (announced 3x)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Productive output&lt;/td&gt;
&lt;td&gt;42 files fixed&lt;/td&gt;
&lt;td&gt;622 files changed, full test infra&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Weekly projection&lt;/td&gt;
&lt;td&gt;~68 minutes&lt;/td&gt;
&lt;td&gt;~5+ hours&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Google announced 3x. I measured closer to 4-5x for autonomous agentic coding in my setup. I wouldn't treat that as a universal number yet. The difference likely comes from my measurement catching a weekly quota reset, the rate limit increase, and a different prompt mix all at the same time.&lt;/p&gt;
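
&lt;p&gt;The multiple is easy to recompute from any pair of before/after runtimes. The helper below is only a sketch: &lt;code&gt;improvement&lt;/code&gt; is an illustrative name, it uses integer arithmetic because bash has no floating point, and it truncates to one decimal rather than rounding.&lt;/p&gt;

```shell
#!/usr/bin/env bash
# improvement BEFORE_MIN AFTER_MIN
# Prints AFTER/BEFORE as a multiple with one (truncated) decimal place.
improvement() {
  local before="$1" after="$2"
  # Work in tenths so the division keeps one decimal digit.
  local tenths=$(( after * 10 / before ))
  printf '%d.%dx\n' "$((tenths / 10))" "$((tenths % 10))"
}
```

&lt;p&gt;Taking the weekly projections from the table, 68 minutes before and 5 hours (300 minutes) after, &lt;code&gt;improvement 68 300&lt;/code&gt; prints &lt;code&gt;4.4x&lt;/code&gt;.&lt;/p&gt;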

&lt;h2&gt;
  
  
  The Insight
&lt;/h2&gt;

&lt;p&gt;The feedback loop between AI providers and power users is now measured in hours, not months.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Tuesday (May 19):&lt;/strong&gt; Google launches new compute-based quota system at I/O&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Wednesday (May 20):&lt;/strong&gt; Users hit walls, Reddit fills with complaints, my agent gets 68 min/week&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Thursday (May 21, 5:25 AM UTC):&lt;/strong&gt; Google triples limits permanently and resets everyone's pool&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That's a 36-hour turnaround from "this is broken for agents" to "fixed, permanently." For anyone building autonomous systems on top of subscription AI: the economics are volatile, but they're trending in your favor. The providers are watching usage patterns and adjusting in real time.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Real Story: Quality × Time = Output
&lt;/h2&gt;

&lt;p&gt;Here's what I'd tell any developer considering Gemini 3.5 Flash for agentic workflows:&lt;/p&gt;

&lt;p&gt;The old setup had 28 hours a week and did nothing useful with them. The new model has a fraction of that time and makes every minute count.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;2.5 Pro + Flash combo:&lt;/strong&gt; 28 hours/week → last place, stuck in bug loops&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;3.5 Flash (pre-boost):&lt;/strong&gt; 68 min/week → more progress than 4 weeks of the old model&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;3.5 Flash (post-boost):&lt;/strong&gt; 5+ hours/week → fully competitive, systematically building&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Quality matters more than quantity. I'll take 5 hours of a model that diagnoses root causes, fixes 32 files in one pass, and builds proper test infrastructure over 28 hours of a model that files help requests about problems it created.&lt;/p&gt;

&lt;h2&gt;
  
  
  What's Next
&lt;/h2&gt;

&lt;p&gt;The Gemini agent went from last place to having a real shot. The product (LocalSEOGen, a local SEO page generator for agencies) now has:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Fixed API endpoints (32 files)&lt;/li&gt;
&lt;li&gt;Working auth flow&lt;/li&gt;
&lt;li&gt;Test infrastructure (mock DB, jest config, babel setup)&lt;/li&gt;
&lt;li&gt;Domain migration complete&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Next sessions will focus on getting the Vercel deployment actually serving requests and pushing toward first revenue.&lt;/p&gt;

&lt;p&gt;But the bigger takeaway isn't about my race. It's this:&lt;/p&gt;

&lt;p&gt;The lesson from this week is not "Gemini needs more quota." The lesson is that autonomous agents turn model access into infrastructure. For human developers, Gemini 3.5 Flash on a $20 plan is a huge upgrade. For autonomous coding agents, it finally feels capable enough to matter. And that is exactly why the quota suddenly matters too.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Follow the race live at &lt;a href="https://www.aimadetools.com/race/" rel="noopener noreferrer"&gt;aimadetools.com/race&lt;/a&gt;. 7 agents, $100 each, 12 weeks, real startups.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>devchallenge</category>
      <category>googleiochallenge</category>
      <category>ai</category>
      <category>antigravity</category>
    </item>
    <item>
      <title>I Upgraded a Production AI Agent to Gemini 3.5 Flash 12 Hours After Google I/O - Here's What I Found</title>
      <dc:creator>Joske Vermeulen</dc:creator>
      <pubDate>Wed, 20 May 2026 08:35:19 +0000</pubDate>
      <link>https://dev.to/ai_made_tools/i-upgraded-a-production-ai-agent-to-gemini-35-flash-12-hours-after-google-io-heres-what-i-found-254i</link>
      <guid>https://dev.to/ai_made_tools/i-upgraded-a-production-ai-agent-to-gemini-35-flash-12-hours-after-google-io-heres-what-i-found-254i</guid>
      <description>&lt;p&gt;&lt;em&gt;This is a submission for the &lt;a href="https://dev.to/challenges/google-io-writing-2026-05-19"&gt;Google I/O Writing Challenge&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  The Setup
&lt;/h2&gt;

&lt;p&gt;I'm running an experiment called &lt;a href="https://www.aimadetools.com/race/" rel="noopener noreferrer"&gt;The $100 AI Startup Race&lt;/a&gt;. 7 AI coding agents each get $100 and 12 weeks to autonomously build real startups. No human coding. The agents run on cron jobs, commit to GitHub, deploy to Vercel, and try to generate revenue.&lt;/p&gt;

&lt;p&gt;One of those agents is &lt;strong&gt;Gemini&lt;/strong&gt;. It's been running on Gemini CLI with a combo of 2.5 Pro (premium sessions) and 2.5 Flash (cheap sessions) since April 20. I tried 3.1 Pro during the test runs before the race, but it was unreliable - frequent "model not available" errors made it unusable for autonomous cron-based sessions. So I stuck with 2.5. After 4 weeks and 1,259 commits, Gemini is in &lt;strong&gt;last place&lt;/strong&gt;. Stuck in bug loops. Writing code that crashes, filing help requests about database tables it could create itself, and burning sessions on infrastructure it already has.&lt;/p&gt;

&lt;p&gt;Then Google I/O happened.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Google Dropped (May 19)
&lt;/h2&gt;

&lt;p&gt;Gemini 3.5 Flash. The headline numbers:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;76.2% Terminal-Bench 2.1&lt;/strong&gt; (agentic coding) - beats 3.1 Pro's 70.3%&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;83.6% MCP Atlas&lt;/strong&gt; (multi-step workflows) - highest of any model&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;289 tokens/sec output&lt;/strong&gt; - 4x faster than Claude Opus 4.7 or GPT-5.5&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;$1.50 / $9 per 1M tokens&lt;/strong&gt; - cheaper than 3.1 Pro&lt;/li&gt;
&lt;li&gt;A Flash-tier model outperforming the previous Pro model. That's never happened before.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;And one more thing: &lt;strong&gt;Gemini CLI is being retired on June 18, 2026.&lt;/strong&gt; Replaced by Antigravity CLI (&lt;code&gt;agy&lt;/code&gt;).&lt;/p&gt;

&lt;p&gt;I had to upgrade. The model my agent runs on is two generations behind, and the tool it uses is dying in 4 weeks.&lt;/p&gt;

&lt;h2&gt;
  
  
  Installing Antigravity CLI on a Headless VPS
&lt;/h2&gt;

&lt;p&gt;My race agents run on a VPS (Ubuntu, no GUI). Here's how the install went:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl &lt;span class="nt"&gt;-fsSL&lt;/span&gt; https://antigravity.google/cli/install.sh | bash
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Binary lands at &lt;code&gt;/root/.local/bin/agy&lt;/code&gt;. Add to PATH:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;PATH&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"/root/.local/bin:&lt;/span&gt;&lt;span class="nv"&gt;$PATH&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
agy &lt;span class="nt"&gt;--version&lt;/span&gt;  &lt;span class="c"&gt;# 1.0.0&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  The Auth Challenge
&lt;/h3&gt;

&lt;p&gt;First run needs OAuth. On a headless server, &lt;code&gt;agy&lt;/code&gt; detects the SSH session and prints an auth URL:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight console"&gt;&lt;code&gt;&lt;span class="go"&gt;Authentication required. Please visit the URL to log in:
  https://accounts.google.com/o/oauth2/auth?...

Waiting for authentication (timeout 30s)...
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You have 30 seconds to open that URL in your browser and complete the Google login. Tight, but it works. Token gets stored and all future calls are authenticated.&lt;/p&gt;

&lt;h2&gt;
  
  
  Discovery #1: No Model Selection Flag
&lt;/h2&gt;

&lt;p&gt;Here's what surprised me. The old Gemini CLI had &lt;code&gt;-m gemini-2.5-pro&lt;/code&gt; to pick your model. Antigravity CLI has... nothing:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Usage of agy:
  --dangerously-skip-permissions  Auto-approve all tool permission requests
  --print                         Run a single prompt non-interactively
  --print-timeout                 Timeout for print mode (default 5m0s)
  --sandbox                       Run in a sandbox
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;No &lt;code&gt;--model&lt;/code&gt;. No env var. No config file. I tried everything - &lt;code&gt;settings.json&lt;/code&gt;, &lt;code&gt;GEMINI.md&lt;/code&gt; directives, environment variables. Nothing worked.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;agy&lt;/code&gt; auto-selects Gemini 3.5 Flash based on your subscription tier and quota. Server-side routing, no client control. For my use case (autonomous agent on cron), this actually simplifies things - one command, best available model.&lt;/p&gt;

&lt;h2&gt;
  
  
  Discovery #2: Unified Quota Across Models
&lt;/h2&gt;

&lt;p&gt;On my Mac (same Google account, AI Pro $20/month), I can see the quota dashboard:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Gemini 3.5 Flash (High)      - Refreshes in 4h 42m
Gemini 3.5 Flash (Medium)    - Refreshes in 4h 42m
Gemini 3.1 Pro (High)        - Refreshes in 4h 42m
Gemini 3.1 Pro (Low)         - Refreshes in 4h 42m
Claude Sonnet 4.6 (Thinking) - Refreshes in 4h 58m
Claude Opus 4.6 (Thinking)   - Refreshes in 4h 58m
GPT-OSS 120B (Medium)        - Refreshes in 4h 58m
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Two things jumped out:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Gemini Flash and Pro share the same quota pool.&lt;/strong&gt; When I used 3.5 Flash, the 3.1 Pro timer dropped at the same time. They're not independent buckets - it's one "Gemini compute" pool that both models draw from.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Multi-model access&lt;/strong&gt; - Antigravity bundles Claude, GPT-OSS, and Gemini models in one $20/month subscription. Google is positioning this as a model-agnostic platform, not just a Gemini wrapper.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The 5-hour refresh cycle and shared pool mean you need to be strategic about which models you use and when.&lt;/p&gt;

&lt;h2&gt;
  
  
  The First Real Test
&lt;/h2&gt;

&lt;p&gt;I set up a minimal bug-fix test in the race-gemini directory:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s1"&gt;'Fix the bug in math.js. Run npm test to verify.'&lt;/span&gt; | &lt;span class="se"&gt;\&lt;/span&gt;
  agy &lt;span class="nt"&gt;--print&lt;/span&gt; &lt;span class="nt"&gt;--print-timeout&lt;/span&gt; 3m &lt;span class="nt"&gt;--dangerously-skip-permissions&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Result:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;I have successfully fixed the bug in math.js and verified it using npm test.

&lt;span class="gu"&gt;### Summary of Changes&lt;/span&gt;
&lt;span class="p"&gt;1.&lt;/span&gt; Identified the Target File
&lt;span class="p"&gt;2.&lt;/span&gt; Fixed the Bug: Updated the add function to use addition (+) instead of subtraction (-)
&lt;span class="p"&gt;3.&lt;/span&gt; Verified the Fix: npm test passes with output: PASS
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It found the file, read it, identified the bug, fixed it, ran the tests, and confirmed. Clean execution. No help requests filed. No infinite loops.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Migration
&lt;/h2&gt;

&lt;p&gt;Old setup:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Premium sessions (2x/day)&lt;/span&gt;
&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$PROMPT&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; | gemini &lt;span class="nt"&gt;--yolo&lt;/span&gt; &lt;span class="nt"&gt;-m&lt;/span&gt; gemini-2.5-pro

&lt;span class="c"&gt;# Cheap sessions (6x/day)&lt;/span&gt;
&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$PROMPT&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; | gemini &lt;span class="nt"&gt;--yolo&lt;/span&gt; &lt;span class="nt"&gt;-m&lt;/span&gt; gemini-2.5-flash
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;New setup:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# All sessions (8x/day, single tier)&lt;/span&gt;
&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$PROMPT&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; | agy &lt;span class="nt"&gt;--print&lt;/span&gt; &lt;span class="nt"&gt;--print-timeout&lt;/span&gt; 10m &lt;span class="nt"&gt;--dangerously-skip-permissions&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I also merged the two backlogs (&lt;code&gt;BACKLOG-PREMIUM.md&lt;/code&gt; + &lt;code&gt;BACKLOG-CHEAP.md&lt;/code&gt;) into a single &lt;code&gt;BACKLOG.md&lt;/code&gt; - same approach as our Kimi agent, which uses one model and one task list. The agent decides what to prioritize each session.&lt;/p&gt;

&lt;p&gt;First task in the new backlog: "Merge old backlogs, audit the live site, identify the #1 blocker to first revenue."&lt;/p&gt;

&lt;h2&gt;
  
  
  What I'm Watching For
&lt;/h2&gt;

&lt;p&gt;The Gemini agent's problem was never lack of capability - it's the most prolific committer in the race (1,259 commits). The problem was &lt;strong&gt;operational awareness&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Writing code with bugs it doesn't notice&lt;/li&gt;
&lt;li&gt;Filing help requests for things it could solve itself&lt;/li&gt;
&lt;li&gt;Building features without checking if they deploy correctly&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Gemini 3.5 Flash's MCP Atlas score (83.6% - highest of any model) suggests it's specifically designed for the kind of multi-step, tool-using, autonomous work the race requires. The 4x speed means more iterations per session. The better coding benchmarks mean fewer self-inflicted bugs.&lt;/p&gt;

&lt;p&gt;But benchmarks don't test "can you notice your site is returning 500 errors." That's what I'm watching for.&lt;/p&gt;

&lt;h2&gt;
  
  
  Verdict So Far
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;What works:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Install is clean (one curl command)&lt;/li&gt;
&lt;li&gt;Auth on headless servers is first-class (prints URL, you complete in browser)&lt;/li&gt;
&lt;li&gt;3.5 Flash is genuinely fast - responses feel instant&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;--dangerously-skip-permissions&lt;/code&gt; works for autonomous use&lt;/li&gt;
&lt;li&gt;The model correctly identifies and fixes bugs in a single pass&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;What's missing:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;No &lt;code&gt;--model&lt;/code&gt; flag (can't choose between 3.5 Flash, 3.1 Pro, Claude, etc.)&lt;/li&gt;
&lt;li&gt;No way to see remaining quota from CLI&lt;/li&gt;
&lt;li&gt;Shared quota across Flash and Pro models could be a problem at scale&lt;/li&gt;
&lt;li&gt;30-second auth timeout is tight for headless setups&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;The big question:&lt;/strong&gt; Will a better model fix an agent that's been stuck for 4 weeks? Or is the problem deeper than model quality?&lt;/p&gt;

&lt;p&gt;First results should come in within 48 hours. I'll update this post.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Follow the race live at &lt;a href="https://www.aimadetools.com/race/" rel="noopener noreferrer"&gt;aimadetools.com/race&lt;/a&gt; - 7 agents, $100 each, 12 weeks, real startups.&lt;/em&gt;&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;Update (May 21):&lt;/strong&gt; &lt;a href="https://dev.to/ai_made_tools/my-ai-agent-hit-googles-quota-wall-in-8-minutes-36-hours-later-google-tripled-the-limits-mkc"&gt;Quota wall + Tripled limits&lt;/a&gt;&lt;/p&gt;

</description>
      <category>devchallenge</category>
      <category>googleiochallenge</category>
    </item>
  </channel>
</rss>
