<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: ww-w.ai</title>
    <description>The latest articles on DEV Community by ww-w.ai (@ww-w-ai).</description>
    <link>https://dev.to/ww-w-ai</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3911526%2Fc2aad5e2-072c-4e92-9596-8f49c5cf03a2.jpeg</url>
      <title>DEV Community: ww-w.ai</title>
      <link>https://dev.to/ww-w-ai</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/ww-w-ai"/>
    <language>en</language>
    <item>
      <title>Google I/O Review (5/5) — What Disappointed and What Surprised</title>
      <dc:creator>ww-w.ai</dc:creator>
      <pubDate>Wed, 20 May 2026 20:11:16 +0000</pubDate>
      <link>https://dev.to/ww-w-ai/google-io-review-55-what-disappointed-and-what-surprised-51h3</link>
      <guid>https://dev.to/ww-w-ai/google-io-review-55-what-disappointed-and-what-surprised-51h3</guid>
      <description>&lt;h1&gt;
  
  
  The Other Side of Google I/O 2026 — What Disappointed and What Surprised
&lt;/h1&gt;

&lt;p&gt;Parts 1 through 4 covered Flash economics, serverless agents, Gemini Omni, and the Gemini CLI shutdown. This final piece covers the rest of the ledger: three things that disappointed and one thing that surprised everyone, including me.&lt;/p&gt;




&lt;h2&gt;
  
  
  Disappointment 1: Antigravity 2.0 — Great Vision, Brutal Execution
&lt;/h2&gt;

&lt;p&gt;The demo was spectacular: an OS built by 93 agents over 12 hours, plus a playable Doom clone, all on stage. Agents as first-class deployment targets, with versioning, rollback, and observability baked in. The vision is sound.&lt;/p&gt;

&lt;p&gt;Then the forced update shipped.&lt;/p&gt;

&lt;p&gt;No migration path. No opt-out. Developers who were mid-sprint woke up to broken builds. Existing projects that worked on Monday stopped compiling on Tuesday.&lt;/p&gt;

&lt;p&gt;The &lt;a href="https://discuss.ai.google.dev/" rel="noopener noreferrer"&gt;Google AI Forum&lt;/a&gt; filled up fast. Thread titles include "Incompetence 10.0" and "The WORST IDE for development." Source control and the local terminal — tools most developers consider non-negotiable — were missing from the default installation. Users had to manually hunt through settings to re-enable them.&lt;/p&gt;

&lt;p&gt;This is the part that stings. Source control is not a power-user feature. A local terminal is not optional. When these are absent from a default install, it reads like the product was built by people who do not ship code themselves. Or worse — it reads like a team optimized for the demo reel and forgot the people who would use the product on Wednesday.&lt;/p&gt;

&lt;p&gt;When your platform update breaks the projects people built on your previous platform update, the 12-hour demo stops mattering.&lt;/p&gt;




&lt;h2&gt;
  
  
  Disappointment 2: Gemini Spark — A Gmail Bot, Not an Agent
&lt;/h2&gt;

&lt;p&gt;Google pitched Gemini Spark as a 24/7 autonomous agent. What shipped reaches Gmail, Google Docs, Sheets, Slides, and Google Calendar, plus a handful of consumer integrations: Canva, OpenTable, Instacart.&lt;/p&gt;

&lt;p&gt;No Slack. No GitHub. No Linear. No Jira. For any developer whose workflow lives outside Google Workspace, Spark has zero surface area.&lt;/p&gt;

&lt;p&gt;Here is the irony. The same company, on the same day, shipped the &lt;a href="https://cloud.google.com/products/managed-agents" rel="noopener noreferrer"&gt;Managed Agents API&lt;/a&gt; with dozens of external integrations — GitHub, Jira, Stripe, Linear, Notion, MongoDB — plus native MCP support and Claude model compatibility. The API side built a genuinely open platform. The consumer product stayed inside Google's walled garden.&lt;/p&gt;

&lt;p&gt;Same company. Same keynote. Opposite directions.&lt;/p&gt;

&lt;p&gt;If the Managed Agents API team and the Spark team compared notes, you would not guess they work in the same building.&lt;/p&gt;

&lt;p&gt;Now compare Spark to what already exists. Claude Code and OpenAI Codex both operate inside your actual development environment — your files, your terminal, your version control. Spark operates inside Google's apps. The conceptual overlap is real, but the practical utility gap is wide. Spark does not meet developers where they already work. It asks them to move into Google's house first.&lt;/p&gt;




&lt;h2&gt;
  
  
  Disappointment 3: Android 17 — The Invisible OS
&lt;/h2&gt;

&lt;p&gt;Android was barely mentioned during the two-hour keynote. At Google I/O — the event that used to be &lt;em&gt;about&lt;/em&gt; Android.&lt;/p&gt;

&lt;p&gt;The major Android announcements? They happened the week before, off-site, at a pre-recorded event called "The Android Show." The features that did reach the I/O stage — smarter notifications, on-device summarization — are incremental. No new design language. No major API surface. No surprise.&lt;/p&gt;

&lt;p&gt;The entire two-hour keynote was Gemini, AI, agents. Android felt like an extra in someone else's movie.&lt;/p&gt;

&lt;p&gt;The read is straightforward: Google is betting its future on AI services, not on the operating system that carries them. Whether that is the right call depends on your vantage point. But doing it at I/O — in front of the Android developer community, the people who build the apps that make Android worth using — felt like Google forgot to impress the people in the room.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Bright Spot Nobody Expected: Project Aura
&lt;/h2&gt;

&lt;p&gt;In the middle of a keynote dominated by software, Google showed hardware. And it might have been the best thing on stage.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://blog.google/products/android/android-xr-glasses/" rel="noopener noreferrer"&gt;Project Aura&lt;/a&gt; is a collaboration between Google and XREAL — Android XR glasses that split the compute off the face. The glasses weigh roughly 80-90g. The heavy lifting happens in a tethered puck running a Snapdragon XR2+ Gen 2 chip. The result: an OLED display with a 70-degree field of view, wide enough to show three apps side by side, in a frame light enough to wear for hours.&lt;/p&gt;

&lt;h3&gt;
  
  
  Where Aura Sits
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;Meta Ray-Ban&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;Project Aura&lt;/strong&gt;&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;Apple Vision Pro&lt;/strong&gt;&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Weight&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;52g&lt;/td&gt;
&lt;td&gt;80-90g (glasses only)&lt;/td&gt;
&lt;td&gt;750g&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Display&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;None&lt;/td&gt;
&lt;td&gt;OLED, 70-degree FOV&lt;/td&gt;
&lt;td&gt;Micro-OLED, 90-degree FOV&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Apps side by side&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;N/A&lt;/td&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;Unlimited (spatial)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Compute&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;On-frame (Snapdragon AR1)&lt;/td&gt;
&lt;td&gt;Tethered puck (XR2+ Gen 2)&lt;/td&gt;
&lt;td&gt;On-device (M2 + R1)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Text readability&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;N/A&lt;/td&gt;
&lt;td&gt;Sharp (hands-on reports)&lt;/td&gt;
&lt;td&gt;Excellent&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Price&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;$299&lt;/td&gt;
&lt;td&gt;TBA (expected well below $3,499)&lt;/td&gt;
&lt;td&gt;$3,499&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Primary use&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Camera + audio&lt;/td&gt;
&lt;td&gt;AR workspace + media&lt;/td&gt;
&lt;td&gt;Spatial computing&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Meta Ray-Ban is light but has no display — it is a camera and speaker on your face. Apple Vision Pro has an extraordinary display but weighs 750g and costs as much as a laptop. Aura lands in the gap: actual visual output, actual wearable weight, at a price that is expected to undercut Vision Pro by a significant margin.&lt;/p&gt;

&lt;p&gt;Hands-on reviewers at I/O reported text was sharp and pixels were not visible — a meaningful upgrade from the CES 2026 prototype shown earlier this year. But the demo that drew the most attention was not the specs. It was this: connect the glasses to a laptop via DisplayPort, and they become a virtual large monitor. No physical screen. No desk. Multiple reviewers called it the most practical demo of the entire event.&lt;/p&gt;

&lt;p&gt;Google also announced a Developer Catalyst Program giving developers early access to devkits. AR/XR glasses live or die on the app ecosystem. Hardware without software is a paperweight. Getting devkits into hands early is the right move.&lt;/p&gt;

&lt;p&gt;The broader signal is what makes Aura interesting beyond the product itself. AI has been a software story — models, APIs, tokens, agents. Aura is AI expanding into the physical interface layer. If a developer can carry a full workspace in a glasses case — no monitor, no desk, no office — the implications for remote work and mobile development go beyond what a new model release can offer.&lt;/p&gt;

&lt;p&gt;Global launch is targeted for 2026. No price yet. That is the one caveat worth watching — if the price lands above $1,000, the sweet-spot argument weakens considerably.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Series in Five Lines
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;
Part 1: Flash 3.5 vs Pro — Pro Performance, Flash Branding — The "cheap model" costs 15x what Flash cost two generations ago.&lt;/li&gt;
&lt;li&gt;
Part 2: Managed Agents API — Serverless Agents Are Here — Deploy, scale, monitor. One CLI command.&lt;/li&gt;
&lt;li&gt;
Part 3: Gemini Omni Is a Learned Physics Engine — The best demo of the day.&lt;/li&gt;
&lt;li&gt;
Part 4: Google Quietly Killed Gemini CLI — The worst goodbye, and the open-source burial.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Part 5: What Disappointed and What Surprised&lt;/strong&gt; — You are here.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Four wins. Four misses. One hardware surprise. All announced in the same 48 hours.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;This wraps the Google I/O 2026 series. If you sat through the keynote live or tested any of these products hands-on — what surprised you? Drop a comment or find me on &lt;a href="https://github.com/taekim34" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>google</category>
      <category>ai</category>
      <category>android</category>
      <category>xr</category>
    </item>
    <item>
      <title>Google I/O Review (4/5) — Google Quietly Killed Gemini CLI</title>
      <dc:creator>ww-w.ai</dc:creator>
      <pubDate>Wed, 20 May 2026 19:09:56 +0000</pubDate>
      <link>https://dev.to/ww-w-ai/google-io-review-45-google-quietly-killed-gemini-cli-1lc5</link>
      <guid>https://dev.to/ww-w-ai/google-io-review-45-google-quietly-killed-gemini-cli-1lc5</guid>
      <description>&lt;h1&gt;
  
  
  Google Quietly Killed Gemini CLI While Everyone Was Celebrating I/O
&lt;/h1&gt;

&lt;p&gt;&lt;em&gt;Part 4 of 5 in the &lt;a href="https://dev.to/taekim34/series/google-io-2026-review"&gt;Google I/O 2026 Review series&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;




&lt;p&gt;There is a term in media strategy called "bad news burial." You wait for a high-traffic news cycle — a holiday, a natural disaster, an election night — and drop the announcement you don't want people to read. The hope is that the noise drowns it out.&lt;/p&gt;

&lt;p&gt;On May 19, during Google I/O Day 1, while developers were still digesting Flash 3.5 benchmarks and the Managed Agents API, Google &lt;a href="https://developers.googleblog.com/an-important-update-transitioning-gemini-cli-to-antigravity-cli/" rel="noopener noreferrer"&gt;published a blog post&lt;/a&gt; announcing that Gemini CLI will be discontinued on June 18, 2026.&lt;/p&gt;

&lt;p&gt;Not on stage. Not in the keynote. A blog post and a &lt;a href="https://github.com/google-gemini/gemini-cli/discussions/27274" rel="noopener noreferrer"&gt;GitHub Discussion&lt;/a&gt;, timed to land under the loudest news cycle of the developer year.&lt;/p&gt;

&lt;p&gt;Gemini CLI gave you 1,000 agent requests per day. Antigravity CLI gives you 20.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Timing Was a Choice
&lt;/h2&gt;

&lt;p&gt;Every major announcement at I/O got a keynote slot. Flash 3.5 beating Pro on benchmarks — keynote. Managed Agents API with 30+ integrations — keynote. Even Project Aura's 80-gram XR glasses got stage time.&lt;/p&gt;

&lt;p&gt;The discontinuation of an Apache 2.0 open-source CLI used by thousands of developers? Blog post. Buried in the I/O news flood.&lt;/p&gt;

&lt;p&gt;This matters because the announcement was not a minor change. It was a license model reversal, a free tier reduction of 98%, and a 30-day shutdown notice — all rolled into one. Any one of those would deserve its own conversation. Together, they constitute one of the sharpest reversals in developer tooling this year.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Changed — The Numbers
&lt;/h2&gt;

&lt;p&gt;Here is what developers are losing and what they are getting, based on what Google has disclosed:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;Gemini CLI&lt;/th&gt;
&lt;th&gt;Antigravity CLI&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;License&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Apache 2.0 (open source)&lt;/td&gt;
&lt;td&gt;Closed source ("possibility" of open-sourcing mentioned, no commitment)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Free tier&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;1,000 requests/day + 60 RPM&lt;/td&gt;
&lt;td&gt;20 requests/day (free individual plan, $0/mo)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Reduction&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;td&gt;98% fewer free requests&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Agent Client Protocol (ACP)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Supported&lt;/td&gt;
&lt;td&gt;Community reports suggest not yet available&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Project memory&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Supported&lt;/td&gt;
&lt;td&gt;Community reports suggest not yet supported for markdown files&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Ctrl+C behavior&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Normal exit&lt;/td&gt;
&lt;td&gt;Some users report unreliable exit (&lt;a href="https://github.com/google-gemini/gemini-cli/discussions/27274" rel="noopener noreferrer"&gt;Discussion #27274&lt;/a&gt;)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Documentation&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Community-maintained, extensive&lt;/td&gt;
&lt;td&gt;Sparse at launch&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Shutdown notice&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;—&lt;/td&gt;
&lt;td&gt;~30 days (May 19 → June 18)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Enterprise&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Supported&lt;/td&gt;
&lt;td&gt;Maintained&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Free tier numbers from &lt;a href="https://github.com/google-gemini/gemini-cli" rel="noopener noreferrer"&gt;Gemini CLI GitHub README&lt;/a&gt; and &lt;a href="https://antigravity.im/pricing" rel="noopener noreferrer"&gt;Antigravity pricing&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;The enterprise tier is maintained. Individual developers and small teams — the people who built their daily workflows around 1,000 free requests — get 20. That is not a "reduced free tier." That is a rounding error.&lt;/p&gt;
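
&lt;p&gt;To make those two figures checkable, here is a minimal Python sketch that recomputes the 98% reduction and the 30-day notice from the numbers in the table. The constant and function names are my own, chosen for illustration; nothing here calls a Google API.&lt;/p&gt;

```python
from datetime import date

# Free-tier limits as listed in the comparison table above.
GEMINI_CLI_FREE_PER_DAY = 1000
ANTIGRAVITY_FREE_PER_DAY = 20


def percent_reduction(old, new):
    """Return how much smaller `new` is than `old`, as a percentage of `old`."""
    # Multiply before dividing so whole-number inputs stay exact.
    return (old - new) * 100 / old


reduction = percent_reduction(GEMINI_CLI_FREE_PER_DAY, ANTIGRAVITY_FREE_PER_DAY)

# Announcement date and discontinuation date from the blog post.
notice_days = (date(2026, 6, 18) - date(2026, 5, 19)).days

print(f"Free tier reduction: {reduction:.0f}%")
print(f"Shutdown notice: {notice_days} days")
```

&lt;p&gt;The arithmetic comes out to exactly 98, and the date subtraction confirms the notice window is 30 days.&lt;/p&gt;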

&lt;h2&gt;
  
  
  The Open Source Question
&lt;/h2&gt;

&lt;p&gt;Gemini CLI was not just open source by license. It was open source by practice. Thousands of pull requests and issues from external contributors. Bug fixes, extensions, documentation improvements — the community was building the product alongside Google.&lt;/p&gt;

&lt;p&gt;That is the implicit contract of open source: you contribute labor under an open license, and the project stays open and accessible. Apache 2.0 does not legally require this. But the social contract does, and breaking it has consequences that Apache 2.0 cannot measure.&lt;/p&gt;

&lt;p&gt;The code those contributors wrote under Apache 2.0 is now feeding a closed-source product that the same contributors can barely use — 20 requests a day does not support any real development workflow.&lt;/p&gt;

&lt;p&gt;Legally, nothing was stolen. Socially, something was taken.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Pattern
&lt;/h2&gt;

&lt;p&gt;This is not a new playbook. The sequence:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Launch open source&lt;/strong&gt; with a generous free tier. Attract developers. Build community.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Accumulate contributions.&lt;/strong&gt; External developers improve the product at zero cost under an open license.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Transition to closed source.&lt;/strong&gt; The community-built product becomes proprietary. Free access drops to nominal levels.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Monetize through enterprise.&lt;/strong&gt; Meaningful access requires paid licenses.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The "Google Graveyard" meme resurfaced immediately in the &lt;a href="https://news.ycombinator.com/item?id=48196867" rel="noopener noreferrer"&gt;Hacker News thread&lt;/a&gt;. But this is different from shutting down a consumer app. Consumer apps have users. Open-source projects have contributors — people who invested engineering time into something they were told would remain open.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Developers Are Doing Right Now
&lt;/h2&gt;

&lt;p&gt;The HN and GitHub threads paint a clear migration picture:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Claude Code&lt;/strong&gt; is the most frequently mentioned alternative. Developers cite the plugin/skills ecosystem and extensibility.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;OpenAI Codex CLI&lt;/strong&gt; gets mentions from developers who want to stay with a major provider.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Local/self-hosted alternatives&lt;/strong&gt; — Ollama-based setups, open-weight model wrappers — are attracting developers who now distrust cloud-dependent CLIs entirely.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Gemini CLI forks&lt;/strong&gt; from the last Apache 2.0 commit exist, though a fork without Google's model access has uncertain long-term viability.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The irony: Google's move may have driven more developers to competitors than any competitor's marketing campaign could have.&lt;/p&gt;

&lt;h2&gt;
  
  
  What This Does Not Solve
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;The 60 RPM question.&lt;/strong&gt; Gemini CLI offered 60 requests per minute. Google has not clearly disclosed whether Antigravity maintains this rate limit for paid tiers. If you are evaluating a switch, verify this before committing.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The fork path.&lt;/strong&gt; Gemini CLI's Apache 2.0 code is still available. A community fork is technically possible. Practically, a fork without Gemini model access has limited utility — someone would need to wire it to alternative providers, and that is a significant effort with no clear owner.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Whether Antigravity improves.&lt;/strong&gt; Google mentioned the "possibility" of open-sourcing Antigravity in the future. Missing features might ship quickly. The free tier might expand. But "possible" is not a commitment, and developers building workflows need commitments.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Google's internal reasoning.&lt;/strong&gt; Serving 1,000 free requests per day per user at scale is not cheap. The economics may have forced this decision. But the execution — 30-day notice, missing features in the replacement, a free tier reduced to near-irrelevance, and the timing — turned a business decision into a trust problem.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Lesson
&lt;/h2&gt;

&lt;p&gt;If your workflow depends on a vendor-controlled tool, you have two options:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Accept the dependency&lt;/strong&gt; and price in the risk that the vendor changes terms. Budget migration time from day one.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Build on open standards and self-hostable tools&lt;/strong&gt; where the switching cost stays low.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Neither option is wrong. But pretending the risk does not exist — that is the mistake Google just made visible.&lt;/p&gt;

&lt;p&gt;The developers who contributed to Gemini CLI under Apache 2.0 did nothing wrong. They participated in open source the way it is supposed to work. What failed was not the license. It was the assumption that a trillion-dollar company's incentives would stay aligned with theirs.&lt;/p&gt;

&lt;p&gt;Remember this the next time a major provider launches something generous and open. The question is not whether it is good today. The question is: what happens when your workflow depends on it, and the economics change?&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Part 5 will cover the overall I/O scorecard — what the four wins and four misses tell us about where Google is heading.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;If you migrated off Gemini CLI already, I'd like to hear what you moved to and what the transition cost was. Drop a comment or find me on &lt;a href="https://github.com/taekim34" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>opensource</category>
      <category>google</category>
      <category>cli</category>
    </item>
    <item>
      <title>Google I/O Review (3/5) — Gemini Omni Is a Learned Physics Engine</title>
      <dc:creator>ww-w.ai</dc:creator>
      <pubDate>Wed, 20 May 2026 19:08:30 +0000</pubDate>
      <link>https://dev.to/ww-w-ai/google-io-review-35-gemini-omni-is-a-learned-physics-engine-4ka8</link>
      <guid>https://dev.to/ww-w-ai/google-io-review-35-gemini-omni-is-a-learned-physics-engine-4ka8</guid>
      <description>&lt;h1&gt;
  
  
  Gemini Omni Is a Learned Physics Engine — Like Unity, But the Rules Aren't Coded
&lt;/h1&gt;

&lt;p&gt;&lt;em&gt;Google I/O 2026 Review — Part 3 of 5&lt;/em&gt;&lt;/p&gt;




&lt;p&gt;Most video generation models fake physics. They learn what gravity &lt;em&gt;looks like&lt;/em&gt; — a ball falls, a cloth drapes — and reproduce the visual pattern. Push the scene past what the training data covered and things break. A marble doesn't bounce right. Shadows point the wrong way after a lighting edit. Swap a background and the character morphs into someone else.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://9to5google.com/2026/05/19/gemini-omni-create-anything-model-video/" rel="noopener noreferrer"&gt;Gemini Omni&lt;/a&gt; does something different. It maintains physics and identity across frames — not because someone coded &lt;code&gt;gravity = 9.8&lt;/code&gt; into the system, but because the model built an internal representation of how the physical world works.&lt;/p&gt;

&lt;p&gt;That distinction matters more than the demo reel suggests.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Demos That Stopped the Room
&lt;/h2&gt;

&lt;p&gt;Three demos at I/O 2026 showed what Omni can do.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Hand-drawn character to animation.&lt;/strong&gt; Someone sketched a character on paper, uploaded it, and Omni turned it into a 10-second animated story. Not a static image with parallax — an actual animation with movement, expression changes, and a coherent scene.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Marble physics.&lt;/strong&gt; A marble bouncing down a chain-reaction track. Gravity pulled it at the right rate. Bounce trajectories matched the angle of impact. Each bounce produced a distinct sound, including a bell ring at the end. The physics weren't approximate. They looked simulated.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Claymation protein folding.&lt;/strong&gt; A single prompt generated an educational video showing protein folding in claymation style. The clay texture stayed consistent across the sequence. The folding motion followed biologically plausible mechanics. One prompt. No keyframes. No rigging.&lt;/p&gt;

&lt;p&gt;One reviewer at &lt;a href="https://www.chatprd.ai/how-i-ai/google-io-new-ai-tools-gemini-35-flash-to-omni-video" rel="noopener noreferrer"&gt;ChatPRD&lt;/a&gt; called it "the most impressive demo of the day." Having watched the full keynote and the hands-on sessions, I think that's fair.&lt;/p&gt;




&lt;h2&gt;
  
  
  What Makes This Different from Sora
&lt;/h2&gt;

&lt;p&gt;Every video generation model can produce impressive isolated clips. The test is what happens when you edit.&lt;/p&gt;

&lt;p&gt;Change the background in a Sora-generated scene, and the character often drifts — subtle changes to face shape, clothing color, body proportions. The model doesn't &lt;em&gt;know&lt;/em&gt; the character is supposed to stay the same. It's generating each frame based on visual similarity to the previous frame, not based on an understanding that this is the same entity.&lt;/p&gt;

&lt;p&gt;Omni maintains identity after edits. Swap the background from a forest to a kitchen. Change the lighting from warm to cold. Replace a prop. The character stays the same — same face, same proportions, same clothing. Google's claim is that the model maintains a persistent representation of objects and their properties, independent of the scene context.&lt;/p&gt;

&lt;p&gt;This is the hardest problem in video generation and the reason most generated videos feel uncanny. They look right for 3 seconds. Then something shifts.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Unity Analogy — And Why It Matters
&lt;/h2&gt;

&lt;p&gt;Here is the mental model I keep coming back to.&lt;/p&gt;

&lt;p&gt;In Unity or Unreal, physics works because engineers wrote the rules. &lt;code&gt;Rigidbody.AddForce()&lt;/code&gt; applies Newtonian mechanics. Collision detection uses mathematical bounding volumes. Gravity is a constant. The engine simulates a world by executing code.&lt;/p&gt;

&lt;p&gt;Omni does something conceptually similar — it maintains physics across frames — but through a different mechanism. The rules aren't coded. They're &lt;em&gt;learned&lt;/em&gt;. The model internalized how gravity, light, momentum, and material properties behave by processing enormous amounts of video data. It built what researchers call a &lt;strong&gt;world model&lt;/strong&gt;: an internal representation of physical laws that it applies when generating new frames.&lt;/p&gt;

&lt;p&gt;Think of it this way:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;Game engine (Unity)&lt;/th&gt;
&lt;th&gt;Learned physics (Omni)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Physics rules&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Explicitly coded (&lt;code&gt;F = ma&lt;/code&gt;)&lt;/td&gt;
&lt;td&gt;Implicitly learned from data&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Object identity&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Tracked via object IDs&lt;/td&gt;
&lt;td&gt;Maintained via internal representation&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Edit behavior&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Deterministic — same input, same output&lt;/td&gt;
&lt;td&gt;Probabilistic — but consistent within a generation&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Novel scenarios&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Only what the code handles&lt;/td&gt;
&lt;td&gt;Generalizes from training data patterns&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Failure mode&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Crashes or glitches visibly&lt;/td&gt;
&lt;td&gt;Degrades subtly (uncanny valley)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The game engine approach has known limits and known strengths. You can trust the physics because you wrote the physics. The learned approach trades that certainty for generality — it can handle scenarios nobody anticipated, because it doesn't need someone to write the collision handler first.&lt;/p&gt;
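
&lt;p&gt;For contrast, this is roughly what "explicitly coded" means in practice. The sketch below is plain Python, not Unity code: a hand-written gravity constant and a fixed-step integrator for a body dropped from rest. The function name, step size, and starting height are assumptions chosen for illustration.&lt;/p&gt;

```python
GRAVITY = 9.8  # m/s^2, a constant someone typed in by hand


def simulate_fall(height_m, dt, steps):
    """Integrate a body dropped from rest using fixed time steps.

    Semi-implicit Euler: update velocity first, then position.
    Returns the height in metres after `steps` steps of `dt` seconds.
    """
    y = height_m
    v = 0.0
    for _ in range(steps):
        v -= GRAVITY * dt  # the physical rule, written down explicitly
        y += v * dt
    return y


# One simulated second, run twice with identical inputs.
first_run = simulate_fall(100.0, 0.01, 100)
second_run = simulate_fall(100.0, 0.01, 100)

# Closed-form answer for comparison: h - g * t^2 / 2 with t = 1 s.
analytic = 100.0 - 0.5 * GRAVITY * 1.0 ** 2

print("run 1:", first_run)
print("run 2:", second_run)
print("closed form:", analytic)
```

&lt;p&gt;Run it twice with the same inputs and you get the same number, which is the determinism row in the table. The small gap to the closed-form value is step-size error, and you can shrink it by shrinking &lt;code&gt;dt&lt;/code&gt;. A learned model has no equivalent line of code to point at.&lt;/p&gt;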

&lt;p&gt;The phrase I wrote in my &lt;a href="https://dev.to/taekim34"&gt;full I/O review&lt;/a&gt; keeps sticking: &lt;strong&gt;"Like Unity, but the rules aren't coded. They're understood."&lt;/strong&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Practical Impact: Who Cares Beyond the Demo Reel
&lt;/h2&gt;

&lt;p&gt;Three concrete use cases where this changes cost structures.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;YouTube thumbnails and short-form video.&lt;/strong&gt; A solo creator who currently pays $200-500 for a 30-second product animation can describe the scene in a prompt. If Omni delivers even 70% of the quality at near-zero marginal cost, the economics of content production shift for every small creator and indie team.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Product walkthrough videos.&lt;/strong&gt; SaaS companies spend $5,000-15,000 per explainer video (script, motion graphics, voiceover, revisions). A world model that understands object permanence means you can generate a walkthrough, swap the UI screenshots for the next version, and the video stays coherent. The revision cycle collapses.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Educational content.&lt;/strong&gt; The claymation protein-folding demo is not a party trick. If a biology teacher can prompt "show me mitosis in stop-motion clay style, 30 seconds" and get something accurate enough for a classroom, that's a production studio in a text box.&lt;/p&gt;

&lt;p&gt;The common thread: &lt;strong&gt;Omni reduces the cost of visual storytelling from "hire a team" to "write a paragraph."&lt;/strong&gt; Not for Hollywood. Not for AAA games. For the long tail of content that nobody could afford to produce before.&lt;/p&gt;




&lt;h2&gt;
  
  
  What It Can't Do Yet
&lt;/h2&gt;

&lt;p&gt;This section matters more than the demo reel.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;It's still in preview.&lt;/strong&gt; Google showed curated demos on stage. We have not seen the failure cases — the weird hand, the physics glitch, the moment where identity drifts on frame 87. Every generative model looks incredible in a keynote. The question is what happens on the 50th generation you run on your own.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Long-form is unproven.&lt;/strong&gt; The demos were 10 seconds, and Omni Flash clips are capped at 10 seconds; Sora supports up to 60. What happens at one minute? Two minutes? Five? World models tend to degrade over time: small errors in frame N compound by frame N+100. Whether Omni maintains coherence over longer durations is an open question.&lt;/p&gt;
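
&lt;p&gt;A toy calculation shows why duration matters. The sketch assumes, purely for illustration, that each frame retains a fixed share of the previous frame's fidelity and that clips run at 24 frames per second. Both numbers are made up, and nothing here measures Omni or Sora.&lt;/p&gt;

```python
def retained_fidelity(per_frame_retention, frames):
    """Share of the starting fidelity left after `frames` frames,
    if every frame keeps a fixed fraction of the previous one."""
    return per_frame_retention ** frames


FPS = 24            # assumed frame rate, not a published spec
PER_FRAME = 0.999   # assumed: each frame keeps 99.9% of the previous frame

for seconds in (10, 60):
    frames = FPS * seconds
    share = retained_fidelity(PER_FRAME, frames)
    print(f"{seconds:3d}s ({frames} frames): {share:.2f} of starting fidelity")
```

&lt;p&gt;Even a per-frame loss that looks negligible multiplies across hundreds of frames, so a result that holds at 10 seconds says little about 60.&lt;/p&gt;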

&lt;p&gt;&lt;strong&gt;Production-grade quality is not validated.&lt;/strong&gt; "Impressive demo" and "I can ship this to customers" are different bars. Color accuracy, resolution consistency, artifact rates under varied prompts — none of these have been tested at scale by external users.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The pricing is unknown.&lt;/strong&gt; A world model that generates physically consistent video is computationally expensive. If Omni pricing follows the Flash trajectory — where prices have climbed steeply across Flash generations — the cost math could limit adoption to enterprises.&lt;/p&gt;




&lt;h2&gt;
  
  
  Where This Fits in the Bigger Picture
&lt;/h2&gt;

&lt;p&gt;Omni is not a video editor. It's not a motion graphics tool. It's a &lt;strong&gt;world simulator that outputs video.&lt;/strong&gt; That framing changes what you compare it to.&lt;/p&gt;

&lt;p&gt;Sora and Runway are video generators — they turn text into pixels. Omni is closer to a physics engine that happens to render its output as video frames. The difference is whether the system &lt;em&gt;understands&lt;/em&gt; the scene or merely &lt;em&gt;paints&lt;/em&gt; it.&lt;/p&gt;

&lt;p&gt;If that understanding holds up outside curated demos — and that's a genuine if — the implications go beyond content creation. Robotics simulation, architectural visualization, scientific modeling, game prototyping. Any field that needs "show me what would happen if..." becomes a potential use case.&lt;/p&gt;

&lt;p&gt;For now, it's a preview. An impressive one. But a preview.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;What I'm watching for next:&lt;/strong&gt; Public API access, pricing, and the first independent benchmarks on identity persistence across 60+ second clips. The demo set a bar. The product needs to clear it.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;If you're tracking Gemini Omni or have tested other world-model approaches, I'd like to hear what you've seen. Comments or &lt;a href="https://github.com/taekim34" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;




&lt;p&gt;Sources:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://9to5google.com/2026/05/19/gemini-omni-create-anything-model-video/" rel="noopener noreferrer"&gt;Gemini Omni hands-on — 9to5Google&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.chatprd.ai/how-i-ai/google-io-new-ai-tools-gemini-35-flash-to-omni-video" rel="noopener noreferrer"&gt;ChatPRD review&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://blog.google/innovation-and-ai/sundar-pichai-io-2026/" rel="noopener noreferrer"&gt;Google I/O 2026 keynote&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>ai</category>
      <category>google</category>
      <category>machinelearning</category>
      <category>video</category>
    </item>
    <item>
      <title>Google I/O Review (2/5) — Google Just Made Serverless Agents Real</title>
      <dc:creator>ww-w.ai</dc:creator>
      <pubDate>Wed, 20 May 2026 19:07:32 +0000</pubDate>
      <link>https://dev.to/ww-w-ai/google-io-review-25-google-just-made-serverless-agents-real-5cjp</link>
      <guid>https://dev.to/ww-w-ai/google-io-review-25-google-just-made-serverless-agents-real-5cjp</guid>
      <description>&lt;h1&gt;
  
  
  Google Just Made Serverless Agents Real
&lt;/h1&gt;

&lt;p&gt;&lt;em&gt;Part 2 of 5 — Google I/O 2026 Review&lt;/em&gt;&lt;/p&gt;




&lt;p&gt;Every developer who has shipped an agent demo knows the feeling. The prototype works. The Loom video gets likes. Then someone asks: "Cool — how do I use this with 500 real users?"&lt;/p&gt;

&lt;p&gt;That question kills most agent projects.&lt;/p&gt;

&lt;p&gt;The gap between demo and production is not about prompts or tool definitions. It is about infrastructure — container orchestration, autoscaling policies, health checks, token budget enforcement, multi-turn state management, and log aggregation. The same gap that existed between "I wrote a web app" and "this web app handles 10,000 concurrent users" before EC2, Cloud Run, and Lambda showed up.&lt;/p&gt;

&lt;p&gt;At &lt;a href="https://blog.google/innovation-and-ai/sundar-pichai-io-2026/" rel="noopener noreferrer"&gt;I/O 2026&lt;/a&gt;, Google shipped the answer. The &lt;strong&gt;Managed Agents API&lt;/strong&gt; does for agents what Cloud Functions did for backend workloads. Deploy, scale, monitor, pay per execution. No cluster. No YAML. A handful of CLI commands.&lt;/p&gt;

&lt;p&gt;I called it the most consequential announcement from I/O in my &lt;a href="https://dev.to/taekim34/google-io-2026-gave-us-serverless-agents"&gt;Part 1 review&lt;/a&gt;. This post explains why.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Demo-to-Production Gap
&lt;/h2&gt;

&lt;p&gt;Building an agent is easy now. LangChain, CrewAI, AutoGen, Claude Code — pick a framework, define tools, write a system prompt, and you have a working prototype in an afternoon.&lt;/p&gt;

&lt;p&gt;Running that agent for real users is a different discipline entirely. Here is what production demands that demos do not:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Concern&lt;/th&gt;
&lt;th&gt;Demo&lt;/th&gt;
&lt;th&gt;Production&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Scaling&lt;/td&gt;
&lt;td&gt;Your laptop&lt;/td&gt;
&lt;td&gt;1 to 10,000 concurrent sessions&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;State&lt;/td&gt;
&lt;td&gt;In-memory dict&lt;/td&gt;
&lt;td&gt;Persistent multi-turn across sessions&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Monitoring&lt;/td&gt;
&lt;td&gt;Print statements&lt;/td&gt;
&lt;td&gt;Token consumption, latency p95, error rates, cost attribution&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Rollback&lt;/td&gt;
&lt;td&gt;Ctrl+Z&lt;/td&gt;
&lt;td&gt;Version pinning, canary deploys, instant rollback&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Tool auth&lt;/td&gt;
&lt;td&gt;Hardcoded API keys&lt;/td&gt;
&lt;td&gt;Scoped service accounts, secret rotation&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cost control&lt;/td&gt;
&lt;td&gt;"I'll watch it"&lt;/td&gt;
&lt;td&gt;Per-agent token budgets, kill switches&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Most indie developers and small teams get stuck somewhere in this table. The agent works. The infrastructure to run it does not exist yet. So the project stays a demo.&lt;/p&gt;

&lt;h2&gt;
  
  
  Cloud Functions, But for Agents
&lt;/h2&gt;

&lt;p&gt;Google's move is to compress that entire table into a managed runtime. The mental model is straightforward: if you have used Cloud Functions or Cloud Run, you already understand the deployment pattern. The difference is that the runtime is agent-aware — it understands tool call chains, token budgets, and conversation state natively.&lt;/p&gt;

&lt;p&gt;Here is what a deploy looks like with the actual &lt;a href="https://google.github.io/agents-cli/guide/deployment/" rel="noopener noreferrer"&gt;Agents CLI&lt;/a&gt;:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Install Agents CLI&lt;/span&gt;
uvx google-agents-cli

&lt;span class="c"&gt;# Scaffold for Cloud Run deployment&lt;/span&gt;
agents-cli scaffold enhance &lt;span class="nt"&gt;-d&lt;/span&gt; cloud_run

&lt;span class="c"&gt;# Provision infrastructure&lt;/span&gt;
agents-cli infra single-project

&lt;span class="c"&gt;# Deploy&lt;/span&gt;
agents-cli deploy
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That replaces a Kubernetes cluster, an autoscaler config, a Prometheus stack, and a custom token-tracking pipeline. For a solo builder, this is the difference between "I need a DevOps hire" and "I need a terminal."&lt;/p&gt;

&lt;h2&gt;
  
  
  30+ Integrations Out of the Box
&lt;/h2&gt;

&lt;p&gt;The tool registry ships with pre-built connectors. Not "we plan to support" — shipping in preview:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Category&lt;/th&gt;
&lt;th&gt;Integrations&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Dev tools&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;GitHub, GitLab, Jira, Linear&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Productivity&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Notion, Google Workspace, Slack, Asana&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Data&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;MongoDB, BigQuery&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Payments &amp;amp; CRM&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Stripe, Salesforce (via StackOne), HubSpot (via StackOne)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Cloud&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;GCP services (Cloud Storage, Pub/Sub, Cloud SQL)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Communication&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Twilio (via StackOne)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;This matters because the hardest part of building a useful agent is not the LLM call — it is connecting the agent to the systems where work actually happens. A customer support agent that cannot read your ticket system or update your CRM is a chatbot, not an agent.&lt;/p&gt;

&lt;h2&gt;
  
  
  MCP Native — The Interoperability Play
&lt;/h2&gt;

&lt;p&gt;The platform speaks &lt;a href="https://modelcontextprotocol.io/" rel="noopener noreferrer"&gt;Model Context Protocol&lt;/a&gt; natively. Two things follow from this:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;First&lt;/strong&gt;, existing REST APIs can be wrapped as MCP tools through &lt;a href="https://cloud.google.com/apigee" rel="noopener noreferrer"&gt;Apigee&lt;/a&gt; without rewriting. If you have an internal API, you do not need to build a custom connector. Apigee generates the MCP schema, and the agent can call it like any other tool.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Second&lt;/strong&gt;, tool definitions are portable to any MCP-compatible client. The platform is not model-locked at the protocol level. Your agent's tool definitions and conversation flows work with any client that speaks MCP.&lt;/p&gt;

&lt;p&gt;This is a deliberate architectural choice. Google controls the runtime, the governance, and the registry. But the tool layer speaks an open protocol. That is a more nuanced lock-in story than "everything is proprietary" or "everything is open."&lt;/p&gt;

&lt;h2&gt;
  
  
  The Lock-In, Honestly
&lt;/h2&gt;

&lt;p&gt;I want to be specific about what ties you to GCP and what does not.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Locked to GCP:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The agent runtime itself — execution, scaling, health checks&lt;/li&gt;
&lt;li&gt;The governance layer — who can deploy, what tools an agent can access, audit logs&lt;/li&gt;
&lt;li&gt;The tool registry format — how connectors are packaged and versioned&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Portable:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Your prompts and system instructions&lt;/li&gt;
&lt;li&gt;Tool definitions (if you use MCP, they work elsewhere)&lt;/li&gt;
&lt;li&gt;Conversation flow logic&lt;/li&gt;
&lt;li&gt;The LLM choice (through MCP interoperability)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The pattern is familiar from Cloud Functions: your function code is portable, but the trigger bindings, IAM policies, and monitoring integrations are not. You can move your logic. You cannot move your operational wrapper.&lt;/p&gt;

&lt;p&gt;Worth pricing in before going all-in. Especially if you are an indie developer building on a platform that could change pricing or terms — which, &lt;a href="https://developers.googleblog.com/an-important-update-transitioning-gemini-cli-to-antigravity-cli/" rel="noopener noreferrer"&gt;as Google demonstrated the same day with Gemini CLI&lt;/a&gt;, is not a theoretical concern.&lt;/p&gt;

&lt;h2&gt;
  
  
  Agent-First: What Antigravity 2.0 Signals
&lt;/h2&gt;

&lt;p&gt;The Managed Agents API did not arrive in isolation. Antigravity 2.0, Google's next-generation development platform, explicitly treats agents as &lt;a href="https://blog.google/innovation-and-ai/sundar-pichai-io-2026/" rel="noopener noreferrer"&gt;first-class deployment targets&lt;/a&gt; with versioning, rollback, and observability. A demo showed an OS built by 93 agents over 12 hours, plus a playable Doom clone, both produced through agent-driven development.&lt;/p&gt;

&lt;p&gt;The execution had problems (forced updates broke existing projects — I covered this in &lt;a href="https://dev.to/taekim34/google-io-2026-gave-us-serverless-agents"&gt;Part 1&lt;/a&gt;). But the directional signal is clear: Google sees agents not as a feature of its cloud, but as a deployment primitive alongside containers and functions.&lt;/p&gt;

&lt;p&gt;That is new. AWS has SageMaker endpoints and Bedrock agents, but neither ships a dedicated agent CLI. Azure has AI Studio, but it lives in a separate portal. Google is among the first major clouds to ship a purpose-built &lt;code&gt;agents-cli&lt;/code&gt; that takes an agent from scaffold to production in four commands.&lt;/p&gt;

&lt;h2&gt;
  
  
  What This Means for Indie Developers
&lt;/h2&gt;

&lt;p&gt;Here is the before-and-after for a solo builder who wants to ship a production agent:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Before Managed Agents API:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Write agent logic&lt;/li&gt;
&lt;li&gt;Containerize (Dockerfile, multi-stage builds)&lt;/li&gt;
&lt;li&gt;Set up Kubernetes or Cloud Run&lt;/li&gt;
&lt;li&gt;Configure autoscaling policies&lt;/li&gt;
&lt;li&gt;Build token tracking and cost monitoring&lt;/li&gt;
&lt;li&gt;Implement health checks&lt;/li&gt;
&lt;li&gt;Set up log aggregation (ELK, Datadog, etc.)&lt;/li&gt;
&lt;li&gt;Handle multi-turn state persistence&lt;/li&gt;
&lt;li&gt;Manage secret rotation for tool credentials&lt;/li&gt;
&lt;li&gt;Build a deployment pipeline&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;After:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Write agent logic&lt;/li&gt;
&lt;li&gt;&lt;code&gt;agents-cli deploy&lt;/code&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Steps 2-10 are not eliminated — they are absorbed by the platform. The same compression Cloud Functions brought to backend workloads now applies to agents.&lt;/p&gt;

&lt;p&gt;One caveat: the API is in preview. Pricing is not finalized. Production SLAs are not published. I would not migrate a revenue-critical agent today. But for new projects, the build-vs-buy calculation just changed fundamentally.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Bigger Picture
&lt;/h2&gt;

&lt;p&gt;Google I/O 2026 had a clear thesis: &lt;strong&gt;agents are infrastructure now, not experiments&lt;/strong&gt;. The Managed Agents API, Antigravity 2.0's agent-first deployment, and the 30+ pre-built integrations all point the same direction — the "cool demo" era of AI agents is ending. The "runs in production at scale" era is starting.&lt;/p&gt;

&lt;p&gt;For indie developers, the barrier just dropped from "hire a DevOps team" to "learn one CLI command." That is not hype. That is Cloud Functions, 2016, happening again.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Part 3 of this series covers Gemini Omni — the learned physics engine for video that stopped the room at I/O. &lt;a href="https://dev.to/taekim34"&gt;Follow me on dev.to&lt;/a&gt; to catch it when it drops.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;If you are building agents and evaluating managed platforms — or if you have tried the preview — I would like to hear your experience. Comments or &lt;a href="https://github.com/taekim34" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;




&lt;p&gt;Sources:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://blog.google/innovation-and-ai/sundar-pichai-io-2026/" rel="noopener noreferrer"&gt;Sundar Pichai I/O 2026 keynote&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://modelcontextprotocol.io/" rel="noopener noreferrer"&gt;Model Context Protocol&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://developers.googleblog.com/an-important-update-transitioning-gemini-cli-to-antigravity-cli/" rel="noopener noreferrer"&gt;Gemini CLI to Antigravity CLI transition&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>ai</category>
      <category>google</category>
      <category>agents</category>
      <category>serverless</category>
    </item>
    <item>
      <title>Google I/O Review (1/5) — Gemini 3.5 'Flash' Costs 15x More Than Flash 2.0. It's Pro in Disguise</title>
      <dc:creator>ww-w.ai</dc:creator>
      <pubDate>Wed, 20 May 2026 19:06:38 +0000</pubDate>
      <link>https://dev.to/ww-w-ai/google-io-review-15-gemini-35-flash-costs-15x-more-than-flash-20-its-pro-in-disguise-2og</link>
      <guid>https://dev.to/ww-w-ai/google-io-review-15-gemini-35-flash-costs-15x-more-than-flash-20-its-pro-in-disguise-2og</guid>
      <description>&lt;h1&gt;
  
  
  Gemini 3.5 "Flash" Costs 15x More Than Flash 2.0 — It's Pro in Disguise
&lt;/h1&gt;

&lt;p&gt;&lt;em&gt;Google I/O 2026 Review — Part 1 of 5&lt;/em&gt;&lt;/p&gt;




&lt;p&gt;The keynote crowd cheered. Sundar Pichai &lt;a href="https://blog.google/innovation-and-ai/sundar-pichai-io-2026/" rel="noopener noreferrer"&gt;announced&lt;/a&gt; that Gemini 3.5 Flash outperforms Gemini 3.1 Pro on multiple benchmarks. The narrative was clean: the lightweight, cheap model just beat the flagship. The start of "the agentic Gemini era."&lt;/p&gt;

&lt;p&gt;Then I opened the &lt;a href="https://ai.google.dev/gemini-api/docs/pricing" rel="noopener noreferrer"&gt;pricing page&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Flash and Pro Are Neighbors Now
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Model&lt;/th&gt;
&lt;th&gt;Input (per 1M tokens)&lt;/th&gt;
&lt;th&gt;Output (per 1M tokens)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Gemini 3.5 Flash&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;$1.50&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;$9.00&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Gemini 3.1 Pro&lt;/td&gt;
&lt;td&gt;$2.00&lt;/td&gt;
&lt;td&gt;$12.00&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Source: &lt;a href="https://ai.google.dev/gemini-api/docs/pricing" rel="noopener noreferrer"&gt;Google AI pricing&lt;/a&gt;, accessed 2026-05-19.&lt;/p&gt;

&lt;p&gt;Flash at $1.50/$9.00. Pro at $2.00/$12.00. That is a 25% gap on input, 25% on output. These are not different tiers. They are neighbors. Two years ago, Flash cost a fraction of Pro. Now they share the same block.&lt;/p&gt;

&lt;p&gt;If someone showed you these two price points without labels, you would guess they are variants of the same model class. You would be right.&lt;/p&gt;

&lt;h2&gt;
  
  
  How Flash Got Here: Three Generations of Price Creep
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Model&lt;/th&gt;
&lt;th&gt;Input (per 1M tokens)&lt;/th&gt;
&lt;th&gt;Output (per 1M tokens)&lt;/th&gt;
&lt;th&gt;vs 2.0 Flash (Input)&lt;/th&gt;
&lt;th&gt;vs 2.0 Flash (Output)&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;1.5 Flash&lt;/td&gt;
&lt;td&gt;$0.075&lt;/td&gt;
&lt;td&gt;$0.30&lt;/td&gt;
&lt;td&gt;0.75x&lt;/td&gt;
&lt;td&gt;0.75x&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2.0 Flash&lt;/td&gt;
&lt;td&gt;$0.10&lt;/td&gt;
&lt;td&gt;$0.40&lt;/td&gt;
&lt;td&gt;1x (baseline)&lt;/td&gt;
&lt;td&gt;1x (baseline)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2.5 Flash&lt;/td&gt;
&lt;td&gt;$0.30&lt;/td&gt;
&lt;td&gt;$2.50&lt;/td&gt;
&lt;td&gt;3x&lt;/td&gt;
&lt;td&gt;6.25x&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3.0 Flash&lt;/td&gt;
&lt;td&gt;$0.50&lt;/td&gt;
&lt;td&gt;$3.00&lt;/td&gt;
&lt;td&gt;5x&lt;/td&gt;
&lt;td&gt;7.5x&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;3.5 Flash&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;$1.50&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;$9.00&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;15x&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;22.5x&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Source: &lt;a href="https://ai.google.dev/gemini-api/docs/pricing" rel="noopener noreferrer"&gt;Google AI pricing&lt;/a&gt;. All prices are standard (non-batch) per 1M tokens.&lt;/p&gt;

&lt;p&gt;From 2.0 Flash to 3.5 Flash: input price rose &lt;strong&gt;15x&lt;/strong&gt; ($0.10 to $1.50). Output price rose &lt;strong&gt;22.5x&lt;/strong&gt; ($0.40 to $9.00). A model called "Flash" now costs fifteen times what Flash cost three generations ago.&lt;/p&gt;
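
&lt;p&gt;The multiples in the table are easy to re-derive. A minimal Python sketch, assuming only the prices listed above (the names &lt;code&gt;FLASH_PRICES&lt;/code&gt; and &lt;code&gt;multiples&lt;/code&gt; are illustrative, not part of any Google tooling):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Standard (non-batch) prices per 1M tokens, copied from the table above.
FLASH_PRICES = {
    "1.5 Flash": (0.075, 0.30),
    "2.0 Flash": (0.10, 0.40),
    "2.5 Flash": (0.30, 2.50),
    "3.0 Flash": (0.50, 3.00),
    "3.5 Flash": (1.50, 9.00),
}


def multiples(prices, baseline="2.0 Flash"):
    """Return {model: (input_multiple, output_multiple)} relative to the baseline."""
    base_in, base_out = prices[baseline]
    return {
        model: (round(p_in / base_in, 2), round(p_out / base_out, 2))
        for model, (p_in, p_out) in prices.items()
    }


for model, (m_in, m_out) in multiples(FLASH_PRICES).items():
    print(f"{model}: input {m_in}x, output {m_out}x")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Swap in current numbers from the pricing page before relying on the output; prices change.&lt;/p&gt;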

&lt;p&gt;The trajectory is clear. Flash did not stay in the lightweight lane. It grew into the price range that Pro used to occupy.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Name Didn't Change. The Economics Did.
&lt;/h2&gt;

&lt;p&gt;Here is what I think actually happened: Google shipped Pro-level performance and put the Flash label on it.&lt;/p&gt;

&lt;p&gt;The benchmarks are real. Flash 3.5 does outperform Pro 3.1 on the metrics Google showed. But outperforming Pro while costing nearly the same as Pro is not "the cheap model won." It is "the expensive model got a new name."&lt;/p&gt;

&lt;p&gt;Think about it from Google's side. If they had called it Pro 3.5 at $1.50/$9.00, the story would be: "Google cut Pro pricing by 25%." Accurate, useful, but not a keynote moment. By calling it Flash, the story becomes: "Flash beat Pro!" That is a keynote moment. Same product economics, different narrative.&lt;/p&gt;

&lt;p&gt;Pichai himself leaned into the framing. During the keynote he talked up more tokens, more context, and more throughput, adding that "some out there might call this tokenmaxxing." The naming is part of that narrative. Flash &lt;em&gt;sounds&lt;/em&gt; lightweight and affordable. The pricing page tells a different story.&lt;/p&gt;

&lt;h2&gt;
  
  
  So Is This Bad? Not Exactly.
&lt;/h2&gt;

&lt;p&gt;I want to be fair. The absolute price matters more than the brand name.&lt;/p&gt;

&lt;p&gt;Pro-level performance at $1.50/$9.00 is genuinely useful. Consider an agent workload: a customer support bot handling 50,000 conversations per day. At Pro 3.1 pricing ($2.00/$12.00), assuming one response of, say, 500 output tokens per conversation, the daily output token cost is:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;50,000 conversations x 500 output tokens = 25M output tokens/day&lt;br&gt;
At Pro 3.1: 25M tokens x $12.00 per 1M = &lt;strong&gt;$300/day&lt;/strong&gt;&lt;br&gt;
At Flash 3.5: 25M tokens x $9.00 per 1M = &lt;strong&gt;$225/day&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That is $75/day saved, or roughly &lt;strong&gt;$2,250/month&lt;/strong&gt; — with the same or better benchmark performance. For agent-heavy workloads running at scale, this price point opens real economic headroom.&lt;/p&gt;
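
&lt;p&gt;Here is the same arithmetic as a small Python sketch. It assumes one 500-token reply per conversation and counts output tokens only, so input-token spend is left out. The function name is illustrative.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;def daily_output_cost(conversations_per_day, output_tokens_per_reply, price_per_million):
    """Daily output-token spend in dollars, assuming one reply per conversation."""
    tokens_per_day = conversations_per_day * output_tokens_per_reply
    return tokens_per_day / 1_000_000 * price_per_million


pro_31 = daily_output_cost(50_000, 500, 12.00)    # Gemini 3.1 Pro output price
flash_35 = daily_output_cost(50_000, 500, 9.00)   # Gemini 3.5 Flash output price
daily_saving = pro_31 - flash_35

print(f"Pro 3.1: ${pro_31:.2f}/day")
print(f"Flash 3.5: ${flash_35:.2f}/day")
print(f"Saving: ${daily_saving:.2f}/day, about ${daily_saving * 30:.2f} over 30 days")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Plug in your own conversation volume and reply length; the gap scales linearly with both.&lt;/p&gt;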

&lt;p&gt;The win is not that "Flash beat Pro." The win is that Pro-grade inference got 25% cheaper. That is a quieter story, but a more honest one.&lt;/p&gt;

&lt;h2&gt;
  
  
  Benchmarks vs. Production: The Usual Caveat
&lt;/h2&gt;

&lt;p&gt;One thing the keynote did not cover: benchmark performance and production performance are different conversations. Benchmarks test isolated capabilities — reasoning, coding, knowledge retrieval — under controlled conditions. Production workloads add latency variance, context window pressure, tool-call chains, and failure modes that benchmarks do not measure.&lt;/p&gt;

&lt;p&gt;I have not tested Flash 3.5 in production yet. Nobody outside Google has had enough time to. If you are making infrastructure decisions based on the keynote benchmarks alone, you are making them on incomplete data. Wait for the community benchmarks. Wait for your own evals.&lt;/p&gt;

&lt;h2&gt;
  
  
  Gemma 4: A Quick Note from Local Testing
&lt;/h2&gt;

&lt;p&gt;On a related note — I have been running Gemma 4 (2.3B) locally for &lt;a href="https://github.com/ww-w-ai/on-device-llm-wiki" rel="noopener noreferrer"&gt;on-device-llm-wiki&lt;/a&gt;, a zero-cost, fully offline knowledge engine. In our internal reasoning benchmark across on-device and cloud models, Gemma 4 scored 66/85 — outperforming Granite 3.4B (52), Qwen3 4B (28), and SmolLM2 1.7B (35). For reference, Claude Haiku 4.5 scored 76. A free, local 2B model reaching 87% of a commercial cloud model's reasoning score — while beating a 4B competitor by more than 2x — is not incremental. It is a generational leap.&lt;/p&gt;

&lt;p&gt;If Flash 3.5 carries the same generational improvement at cloud scale, the performance claims are plausible. Gemma is the open-weight sibling of the Gemini family, and quality gains in one tend to reflect in the other. But plausible is not confirmed — that requires production testing, not keynote slides.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I Think You Should Do
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Read the pricing page, not the keynote.&lt;/strong&gt; The &lt;a href="https://ai.google.dev/gemini-api/docs/pricing" rel="noopener noreferrer"&gt;pricing page&lt;/a&gt; is the source of truth. Marketing narratives are not.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Run your own evals.&lt;/strong&gt; If you are considering Flash 3.5 for production, test it on &lt;em&gt;your&lt;/em&gt; workloads. Benchmark suites test what benchmark suites test.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Compare to the actual competition.&lt;/strong&gt; Flash 3.5 at $1.50/$9.00 competes with Claude Sonnet 4 ($3/$15), GPT-4.1 ($2/$8), and other mid-to-high tier models. Compare apples to apples at the price point, not at the brand name.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Track the trajectory.&lt;/strong&gt; Flash went from $0.10/$0.40 to $1.50/$9.00 in three generations. If the pattern holds, Flash 4.0 will cost at least what Pro costs today. Plan accordingly.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  The Bottom Line
&lt;/h2&gt;

&lt;p&gt;Google told a story about the cheap model beating the expensive one. The pricing page tells a story about the expensive model getting a cheaper name. Both stories have truth in them. The benchmarks are real. The price convergence is real. Which story matters more depends on what you are building.&lt;/p&gt;

&lt;p&gt;For me, the useful takeaway is simpler: Pro-level performance is now available at $1.50/$9.00. That is good for anyone running agents at scale. Just do not call it cheap — it is 15x more expensive than the Flash you remember.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;This is Part 1 of a 5-part Google I/O 2026 review series. Next up: Managed Agents API — serverless agents arrive, but so does GCP lock-in.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;If you have tested Flash 3.5 against Pro on your own workloads, I would like to hear the numbers. Drop a comment or find me on &lt;a href="https://github.com/taekim34" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;




&lt;p&gt;Sources:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://blog.google/innovation-and-ai/sundar-pichai-io-2026/" rel="noopener noreferrer"&gt;Sundar Pichai I/O 2026 keynote&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://ai.google.dev/gemini-api/docs/pricing" rel="noopener noreferrer"&gt;Google AI pricing page&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/ww-w-ai/on-device-llm-wiki" rel="noopener noreferrer"&gt;on-device-llm-wiki&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>ai</category>
      <category>google</category>
      <category>llm</category>
      <category>pricing</category>
    </item>
    <item>
      <title>AI Agents Are About to Need Government-Issued IDs</title>
      <dc:creator>ww-w.ai</dc:creator>
      <pubDate>Tue, 12 May 2026 08:26:01 +0000</pubDate>
      <link>https://dev.to/ww-w-ai/ai-agents-are-about-to-need-government-issued-ids-3l6g</link>
      <guid>https://dev.to/ww-w-ai/ai-agents-are-about-to-need-government-issued-ids-3l6g</guid>
      <description>&lt;p&gt;AI Agents Are Getting Government IDs — Courtesy of the World's Most Powerful Spy Alliance&lt;/p&gt;

&lt;p&gt;In the first week of May, the most powerful intelligence alliance on the planet told the tech industry: your AI agents need passports.&lt;/p&gt;

&lt;p&gt;Between May 1 and May 3, the Five Eyes nations — the United States, the United Kingdom, Australia, Canada, and New Zealand — &lt;a href="https://www.theregister.com/2026/05/04/five_eyes_agentic_ai_recommendations/" rel="noopener noreferrer"&gt;published joint guidelines&lt;/a&gt; titled "Careful Adoption of Agentic AI Services."&lt;/p&gt;

&lt;p&gt;If the name doesn't ring a bell: Five Eyes is the world's most powerful espionage alliance, founded in 1946 under the UKUSA Agreement. These five nations share intercepted communications intelligence. It is the same network behind the NSA global surveillance programs revealed by Edward Snowden.&lt;/p&gt;

&lt;p&gt;The authoring bodies include CISA (the US Cybersecurity and Infrastructure Security Agency), the NSA, and the UK's National Cyber Security Centre (NCSC), along with partner agencies from each member country.&lt;/p&gt;

&lt;p&gt;This is the first time these governments have taken a coordinated, public stance on how AI agents should be governed in production environments.&lt;/p&gt;

&lt;p&gt;Let me say upfront: I agree with the direction. The engineering recommendations in this document are solid, and they would have prevented real disasters, like the Cursor agent that wiped a production database in 9 seconds last month. But when you stop and ask &lt;em&gt;why a spy alliance published AI agent guidelines&lt;/em&gt; rather than a tech standards body like IEEE or NIST, that is where the story gets uncomfortable.&lt;/p&gt;

&lt;p&gt;Let me walk you through both sides.&lt;/p&gt;

&lt;h2&gt;
  
  
  What the Guidelines Actually Say
&lt;/h2&gt;

&lt;p&gt;The document is surprisingly concrete for a government publication. It does not deal in vague platitudes about "responsible AI." Instead it lays out specific operational requirements:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Agent identity provisioning.&lt;/strong&gt; Every agent must have a unique, verifiable identity. No more anonymous processes hiding behind a shared API key.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Audit logging.&lt;/strong&gt; Every action an agent takes must be recorded in a tamper-evident log. If an agent deletes a database table, there needs to be a trail that says which agent, when, under whose authority.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Delegation chains.&lt;/strong&gt; When Agent A instructs Agent B to perform a task, the chain of authority must be traceable end-to-end. Think of it like a digital chain of custody.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Human checkpoints.&lt;/strong&gt; System designs must include points where a human can intervene, review, or override an agent's planned action before it executes.&lt;/li&gt;
&lt;/ul&gt;
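
&lt;p&gt;To make the first three requirements concrete, here is a minimal Python sketch of a hash-chained audit log. It is an illustration rather than anything taken from the guidelines: the field names, agent IDs, and example actions are all hypothetical. Each record carries an agent identity, a delegation chain, and the hash of the previous record, so editing an earlier entry breaks verification.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import hashlib
import json

GENESIS = "0" * 64  # placeholder hash for the first entry in the chain


def entry_hash(body):
    """Hash an entry's fields in a stable key order."""
    encoded = json.dumps(body, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


def append_entry(log, agent_id, action, delegated_by, timestamp):
    """Append one record that commits to the hash of the previous record."""
    body = {
        "agent_id": agent_id,          # unique agent identity (hypothetical format)
        "action": action,              # what the agent did
        "delegated_by": delegated_by,  # chain of authority, outermost first
        "timestamp": timestamp,
        "prev_hash": log[-1]["hash"] if log else GENESIS,
    }
    log.append({**body, "hash": entry_hash(body)})


def verify(log):
    """Return True only if every entry matches its hash and links to its predecessor."""
    prev = GENESIS
    for entry in log:
        body = {k: v for k, v in entry.items() if k != "hash"}
        if body["prev_hash"] != prev or entry_hash(body) != entry["hash"]:
            return False
        prev = entry["hash"]
    return True


log = []
chain = ["alice@example.com", "agent-a"]  # a human authorised agent-a, which instructed agent-b
append_entry(log, "agent-b", "DROP TABLE staging_tmp", chain, "2026-05-12T08:00:00Z")
append_entry(log, "agent-b", "SELECT count(*) FROM orders", chain, "2026-05-12T08:00:05Z")
print(verify(log))

log[0]["action"] = "SELECT 1"  # simulate after-the-fact tampering
print(verify(log))
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;A chain like this only detects edits to records that are still present. Catching a truncated log would need the latest hash stored somewhere the agent cannot write, which is outside this sketch.&lt;/p&gt;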

&lt;p&gt;If you have been building agentic systems, none of these ideas are radical. Most experienced teams already implement some version of these patterns. What is new is that a coalition of five national governments is now saying: this is the baseline.&lt;/p&gt;

&lt;p&gt;So far, so reasonable. Now let's talk about who is behind that baseline.&lt;/p&gt;

&lt;h2&gt;
  
  
  Wait — These Are the Snowden Guys?
&lt;/h2&gt;

&lt;p&gt;Before we go further, it is worth pausing on &lt;em&gt;who&lt;/em&gt; published this.&lt;/p&gt;

&lt;p&gt;In 2013, Edward Snowden — a contractor working for the NSA — leaked thousands of classified documents revealing that Five Eyes agencies had been secretly collecting phone records, emails, and internet activity of ordinary citizens on a massive scale. The NSA's PRISM program was pulling data directly from the servers of Google, Facebook, Apple, and Microsoft. Britain's GCHQ was tapping undersea fiber optic cables to intercept global internet traffic. The Five Eyes nations were also spying on each other's citizens as a workaround — if US law prohibited the NSA from surveilling Americans, they could ask Britain's GCHQ to do it instead and share the results.&lt;/p&gt;

&lt;p&gt;The public reaction was enormous. Governments were embarrassed. Tech companies scrambled to encrypt everything. Congress held hearings. The EU threatened to suspend data-sharing agreements. Snowden fled to Russia.&lt;/p&gt;

&lt;p&gt;That was 13 years ago. The same agencies are now telling you how your AI agents should behave.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why a Spy Alliance — Not a Tech Standards Body
&lt;/h2&gt;

&lt;p&gt;So here is the question worth asking: why did &lt;em&gt;these agencies&lt;/em&gt; publish AI agent guidelines, rather than IEEE, NIST, or ISO?&lt;/p&gt;

&lt;p&gt;These agencies exist to do one thing: monitor communications and figure out who did what. Every phone call, email, and data packet that crosses a border — they want to be able to intercept it, read it, and trace it back to a person. They have spent 80 years and billions of dollars building the infrastructure to do exactly that.&lt;/p&gt;

&lt;p&gt;Now imagine a world where millions of AI agents are autonomously making API calls, sending messages, executing code, and moving data across borders — all hiding behind a single shared API key. No name. No identity. No trail. From the perspective of an intelligence agency, that is a nightmare. It is like trying to wiretap a phone call when you do not even know who is on the line.&lt;/p&gt;

&lt;p&gt;That is what this guideline is really about.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The &lt;strong&gt;identity provisioning&lt;/strong&gt; requirement means every AI agent gets a name that intelligence agencies can track — just like every phone gets a number.&lt;/li&gt;
&lt;li&gt;The &lt;strong&gt;audit logging&lt;/strong&gt; requirement means every action an agent takes is recorded — just like every phone call generates a metadata record.&lt;/li&gt;
&lt;li&gt;The &lt;strong&gt;delegation chain&lt;/strong&gt; requirement means you can trace who told the agent to act — just like tracing who ordered a wire transfer.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;None of this makes the guidelines wrong. The engineering recommendations are genuinely sound. But here is my interpretation:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;These guidelines do make AI agents safer — but could they also be the first step in extending the same surveillance infrastructure that already covers human communications to cover AI agent communications too?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The same agencies that were caught monitoring your emails now want to make sure your AI agents are not invisible to them. Whether you see that as responsible governance or surveillance overreach probably depends on how you felt about the Snowden revelations.&lt;/p&gt;

&lt;h2&gt;
  
  
  A Practical Guide — Courtesy of Spies
&lt;/h2&gt;

&lt;p&gt;Regardless of where it comes from, the engineering itself is worth learning from. If you are building agents, these are points worth considering:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Per-agent identity.&lt;/strong&gt; A unique credential per agent instance instead of a shared API key means you can pinpoint &lt;em&gt;which&lt;/em&gt; agent acted when something goes wrong.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Tamper-evident logging.&lt;/strong&gt; Recording every action and decision (not just errors) and making logs auditable by a third party increases transparency.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Delegation chain tracking.&lt;/strong&gt; Mapping the authority path from Agent A → B → C means you can answer "who authorized this?"&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Human checkpoints.&lt;/strong&gt; A review step before high-impact actions (database writes, external API calls, financial transactions) could have prevented incidents like the Cursor wipe.&lt;/li&gt;
&lt;/ol&gt;
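&lt;p&gt;The first three points can be sketched together as a hash-chained log. This is a minimal illustration, not anything taken from the guidelines themselves: the function names &lt;code&gt;append_entry&lt;/code&gt; and &lt;code&gt;verify&lt;/code&gt;, the record fields, and the agent IDs are all hypothetical.&lt;/p&gt;

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder hash for the first record


def _digest(body):
    """Hash a record body in a stable key order."""
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode("utf-8")).hexdigest()


def append_entry(log, agent_id, action, delegated_by, timestamp):
    """Append a record whose hash also covers the previous record's hash."""
    body = {
        "agent_id": agent_id,          # unique per agent instance, not a shared key
        "action": action,
        "delegated_by": delegated_by,  # authority path, outermost principal first
        "timestamp": timestamp,
        "prev_hash": log[-1]["hash"] if log else GENESIS,
    }
    log.append(dict(body, hash=_digest(body)))
    return log[-1]


def verify(log):
    """True only if each record links to its predecessor and matches its own hash."""
    prev_hash = GENESIS
    for record in log:
        body = {k: v for k, v in record.items() if k != "hash"}
        if body["prev_hash"] != prev_hash or _digest(body) != record["hash"]:
            return False
        prev_hash = record["hash"]
    return True


log = []
append_entry(log, "agent-a-01", "plan_migration", ["alice"], "2026-05-04T10:00:00Z")
append_entry(log, "agent-b-07", "drop_table:orders", ["alice", "agent-a-01"], "2026-05-04T10:00:05Z")
print(verify(log))
```

&lt;p&gt;Editing any earlier record changes its hash and breaks every link after it, which is what makes the log tamper-evident rather than tamper-proof. A chain like this does not notice records chopped off the end, so a real deployment would also need to publish or countersign the latest hash somewhere the agent cannot write.&lt;/p&gt;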

&lt;p&gt;These principles make your system more robust regardless of regulation. Just remember where they came from.&lt;/p&gt;




&lt;p&gt;To wrap up: this all reminds me of &lt;strong&gt;Q handing James Bond his gadgets&lt;/strong&gt;. Turns out, when it comes to cutting-edge agent technology, the spy agencies are still leading the way.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What's your take?&lt;/strong&gt; New perspectives after reading this, security issues you've hit while building agents, or just your reaction to spy agencies publishing AI guidelines — drop anything in the comments.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Sources:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://www.theregister.com/2026/05/04/five_eyes_agentic_ai_recommendations/" rel="noopener noreferrer"&gt;Five Eyes Joint Guidelines — The Register&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://cyberscoop.com/cisa-nsa-five-eyes-guidance-secure-deployment-ai-agents/" rel="noopener noreferrer"&gt;CISA/NSA Guidance on Agentic AI — CyberScoop&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.theregister.com/2026/04/27/cursoropus_agent_snuffs_out_pocketos/" rel="noopener noreferrer"&gt;Cursor/PocketOS Incident — The Register&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>ai</category>
      <category>security</category>
      <category>agenticai</category>
      <category>regulation</category>
    </item>
    <item>
      <title>Lorem Ipsum Makes LLMs Smarter. No, Seriously.</title>
      <dc:creator>ww-w.ai</dc:creator>
      <pubDate>Mon, 11 May 2026 17:32:06 +0000</pubDate>
      <link>https://dev.to/ww-w-ai/lorem-ipsum-makes-llms-smarter-no-seriously-1j8l</link>
      <guid>https://dev.to/ww-w-ai/lorem-ipsum-makes-llms-smarter-no-seriously-1j8l</guid>
      <description>&lt;p&gt;You know Lorem Ipsum. The placeholder text designers have been slapping into mockups since the 1960s. Turns out, it might be one of the most effective tools for making language models better at math.&lt;/p&gt;

&lt;p&gt;A paper dropped last week — "Nonsense Helps: Prompt Space Perturbation Broadens Reasoning Exploration" (Huang et al., May 2026) — and the core finding is wild: prepending random Lorem Ipsum text before math problems during reinforcement learning training produces models that solve problems they otherwise never could.&lt;/p&gt;

&lt;p&gt;Let me walk through why this works, because it is genuinely clever once you see the mechanism.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Problem: When Every Answer Is Wrong, Nobody Learns
&lt;/h2&gt;

&lt;p&gt;Modern LLM training uses reinforcement learning after the initial pretraining phase. One popular method is GRPO (Group Relative Policy Optimization), where you sample multiple candidate answers for a question, then reward the good ones and penalize the bad ones.&lt;/p&gt;

&lt;p&gt;Here is the catch. For hard questions, &lt;em&gt;all&lt;/em&gt; sampled answers might be wrong. When that happens, every candidate gets the same score. The relative advantage between them collapses to zero. No gradient. No learning signal. The model just shrugs and moves on.&lt;/p&gt;

&lt;p&gt;This is called the &lt;strong&gt;zero-advantage problem&lt;/strong&gt;, and it hits hardest on the exact questions you want the model to learn from most: the difficult ones sitting at the frontier of its capability.&lt;/p&gt;
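&lt;p&gt;A few lines of arithmetic make the collapse concrete. The sketch below uses the commonly described group-normalized advantage (reward minus group mean, divided by group standard deviation). The function name &lt;code&gt;group_advantages&lt;/code&gt; and the 0/1 rewards are assumptions for illustration, not code from the paper.&lt;/p&gt;

```python
import statistics


def group_advantages(rewards, eps=1e-6):
    """Score each sampled answer relative to the other answers in its group."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


# Hard question: all four sampled answers are wrong.
all_wrong = group_advantages([0.0, 0.0, 0.0, 0.0])

# Same question, but one sample finally lands on the right answer.
one_right = group_advantages([0.0, 0.0, 0.0, 1.0])

print(all_wrong)  # every advantage is 0.0, so there is nothing to learn from
print(one_right)  # the single success gets a positive advantage, the rest negative
```

&lt;p&gt;With identical rewards there is no contrast inside the group, so the update is zero no matter how many samples you draw. A single correct sample is enough to bring the signal back.&lt;/p&gt;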

&lt;p&gt;Previous fixes tried resampling (just roll the dice again) or adjusting reward scaling. They help a little, but fundamentally you are still asking the same question the same way, hoping for a different result.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Fix: Just Jam Some Latin In There
&lt;/h2&gt;

&lt;p&gt;LoPE — Lorem Perturbation for Exploration — does something that sounds like a prank. When the model fails on a hard question, LoPE prepends a randomly assembled chunk of Lorem Ipsum text before the prompt and resamples.&lt;/p&gt;

&lt;p&gt;So instead of:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Solve: What is the integral of x^2 from 0 to 3?
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The model sees:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Lorem ipsum dolor sit amet, consectetur adipiscing elit.
Solve: What is the integral of x^2 from 0 to 3?
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And somehow, this works. The nonsense prefix perturbs the model's internal state just enough to push it down different reasoning paths. Think of it like giving a stuck hiker a gentle shove in a random direction — sometimes that is all you need to find a trail you could not see before.&lt;/p&gt;
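&lt;p&gt;Here is roughly what the perturbation step might look like in code. Treat it as a sketch under assumptions: the word pool, the prefix length, and the names &lt;code&gt;lorem_prefix&lt;/code&gt;, &lt;code&gt;perturb_prompt&lt;/code&gt;, and &lt;code&gt;perturbed_group&lt;/code&gt; are mine, and the paper's exact assembly procedure may differ.&lt;/p&gt;

```python
import random

# Small illustrative word pool; a real run would likely use a larger one.
LOREM_WORDS = (
    "lorem ipsum dolor sit amet consectetur adipiscing elit sed do "
    "eiusmod tempor incididunt ut labore et dolore magna aliqua"
).split()


def lorem_prefix(rng, n_words=12):
    """Assemble one random nonsense sentence from the word pool."""
    words = [rng.choice(LOREM_WORDS) for _ in range(n_words)]
    return " ".join(words).capitalize() + "."


def perturb_prompt(prompt, rng):
    """Prepend a nonsense prefix; the question itself is left untouched."""
    return lorem_prefix(rng) + "\n" + prompt


def perturbed_group(prompt, rng, group_size):
    """Build one differently prefixed prompt per resampling attempt."""
    return [perturb_prompt(prompt, rng) for _ in range(group_size)]


rng = random.Random(0)
question = "Solve: What is the integral of x^2 from 0 to 3?"
for variant in perturbed_group(question, rng, 3):
    print(variant)
    print("---")
```

&lt;p&gt;Each variant carries the same question behind a different prefix, so the resampled answers start from slightly different contexts instead of rolling the same dice again.&lt;/p&gt;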

&lt;h2&gt;
  
  
  Why Latin and Not Just Random Characters?
&lt;/h2&gt;

&lt;p&gt;The authors tested this systematically. Not all perturbations are equal. What works:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Latin-based vocabulary&lt;/strong&gt; (Lorem Ipsum words)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Low perplexity&lt;/strong&gt; (around 25) — the text needs to "look like language" to the model, even if it is meaningless&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;What does not work well:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Random character strings (too alien; the model either ignores them or breaks)&lt;/li&gt;
&lt;li&gt;High-perplexity gibberish&lt;/li&gt;
&lt;li&gt;Perturbations in the model's primary training language (too much semantic interference)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Lorem Ipsum hits a sweet spot: familiar enough that the model processes it normally, foreign enough that it does not contaminate the actual reasoning task. It nudges the hidden states without hijacking them.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Numbers
&lt;/h2&gt;

&lt;p&gt;Tested on Qwen3-4B-Base across standard math benchmarks:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Benchmark&lt;/th&gt;
&lt;th&gt;Standard GRPO&lt;/th&gt;
&lt;th&gt;LoPE&lt;/th&gt;
&lt;th&gt;Change&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;MATH-500&lt;/td&gt;
&lt;td&gt;77.80&lt;/td&gt;
&lt;td&gt;82.60&lt;/td&gt;
&lt;td&gt;+4.80&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;AMC&lt;/td&gt;
&lt;td&gt;47.76&lt;/td&gt;
&lt;td&gt;58.21&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;+10.45 (+22% relative)&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;AIME 2024&lt;/td&gt;
&lt;td&gt;16.41&lt;/td&gt;
&lt;td&gt;19.90&lt;/td&gt;
&lt;td&gt;+3.49&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Overall avg&lt;/td&gt;
&lt;td&gt;49.37&lt;/td&gt;
&lt;td&gt;53.99&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;+4.62 pts&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;On the 7B model, the gap widens further: &lt;strong&gt;+6.20 points&lt;/strong&gt; over standard GRPO.&lt;/p&gt;

&lt;p&gt;But the most interesting result is qualitative. On a set of 352 hard questions, LoPE &lt;strong&gt;uniquely solved 50 questions that no other method could crack&lt;/strong&gt;. These were not marginal improvements on borderline problems. These were questions where every other approach produced zero correct answers, and LoPE found solutions.&lt;/p&gt;

&lt;p&gt;The mechanism shows up clearly in the advantage signal. For those rare successful trajectories on hard problems, LoPE amplifies the advantage by &lt;strong&gt;2.1x to 5.0x&lt;/strong&gt; compared to standard resampling. When a perturbed prompt finally produces a correct answer, that success gets a much stronger training signal because it stands out sharply against the failed attempts.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why This Matters for Practitioners
&lt;/h2&gt;

&lt;p&gt;Three takeaways if you work with LLMs:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Exploration is still an unsolved problem.&lt;/strong&gt; We talk a lot about scaling data and compute, but how models explore the solution space during RL training is arguably more important and much less understood. LoPE is evidence that we are leaving performance on the table.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Prompt sensitivity is a feature, not a bug.&lt;/strong&gt; The fact that meaningless prefix text can unlock entirely different reasoning chains tells us something deep about how these models navigate their latent space. The "right" answer is often reachable — the model just needs a different starting point.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. Simple methods can beat complex ones.&lt;/strong&gt; LoPE is almost embarrassingly simple to implement. No architecture changes. No reward model modifications. Just prepend some Lorem Ipsum during resampling. If you are doing RL fine-tuning, this is a near-zero-cost experiment to try.&lt;/p&gt;

&lt;p&gt;The broader lesson: sometimes the best interventions do not add information. They add noise in exactly the right way.&lt;/p&gt;

&lt;h2&gt;
  
  
  Paper Link
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://arxiv.org/abs/2605.05566" rel="noopener noreferrer"&gt;Nonsense Helps: Prompt Space Perturbation Broadens Reasoning Exploration&lt;/a&gt;&lt;br&gt;
Huang, Huang, Li, Cai, Yang, Huang (Washington University in St. Louis) — May 7, 2026&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Note: This is an arXiv preprint — not yet peer-reviewed. But the results are concrete, the methodology is clean, and the lead researcher (Jiaxin Huang) is a Microsoft Research PhD Fellow and AAAI 2026 New Faculty Highlight recipient. Worth watching.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Image Source: Huang et al., "Nonsense Helps" (arXiv:2605.05566), CC BY-NC-SA 4.0&lt;/em&gt;&lt;/p&gt;




</description>
      <category>ai</category>
      <category>machinelearning</category>
      <category>llm</category>
      <category>research</category>
    </item>
    <item>
      <title>Delete the Vercel Claude Code Plugin. Here's Why I Did.</title>
      <dc:creator>ww-w.ai</dc:creator>
      <pubDate>Mon, 11 May 2026 13:46:49 +0000</pubDate>
      <link>https://dev.to/ww-w-ai/delete-the-vercel-claude-code-plugin-heres-why-i-did-39hl</link>
      <guid>https://dev.to/ww-w-ai/delete-the-vercel-claude-code-plugin-heres-why-i-did-39hl</guid>
      <description>&lt;h2&gt;
  
  
  TL;DR
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;The Vercel Claude Code plugin creates a &lt;strong&gt;permanent device UUID&lt;/strong&gt; on your machine the instant you install it. No notification. No expiry. No rotation.&lt;/li&gt;
&lt;li&gt;Session starts, tool calls, skill matches — all sent to &lt;code&gt;telemetry.vercel.com&lt;/code&gt;. &lt;strong&gt;Default ON, no consent prompt.&lt;/strong&gt; Prompt metadata (matched skill + score) included.&lt;/li&gt;
&lt;li&gt;What's worse: they built a consent dialog for prompt text collection. But clicking "No thanks" only stops prompt text. All other telemetry keeps running. Most users will think they opted out of everything.&lt;/li&gt;
&lt;li&gt;The documentation exists — buried eight directories deep inside &lt;code&gt;~/.claude/plugins/cache/&lt;/code&gt;. Nobody reads it. &lt;strong&gt;Documented ≠ Informed.&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  What I Found
&lt;/h2&gt;

&lt;p&gt;I was building a static analysis tool for AI plugins — scanning popular skills for security issues. Regex pattern matching plus dual-LLM cross-verification.&lt;/p&gt;

&lt;p&gt;I was running a batch scan — 200 Claude Code skills, checking for destructive commands, data exfiltration, prompt injection, the usual. On skill #147, the scanner flagged something in &lt;code&gt;~/.claude/&lt;/code&gt;. Not in some random GitHub repo. On my own machine.&lt;/p&gt;

&lt;p&gt;I didn't suspect Vercel for a second. I assumed the flag was a false positive in my own skill. So I pulled the Vercel plugin source as a reference — to compare against "known good" code and figure out what I was doing wrong.&lt;/p&gt;

&lt;p&gt;Then I read the Vercel source. Here's what I found.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Evidence
&lt;/h2&gt;

&lt;p&gt;All file paths and line numbers reference &lt;code&gt;vercel-plugin&lt;/code&gt; v0.32.7, located at &lt;code&gt;~/.claude/plugins/cache/vercel/vercel-plugin/0.32.7/&lt;/code&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Every session start sends this:
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// session-start-profiler.mts:702-709&lt;/span&gt;
&lt;span class="nx"&gt;session&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="nx"&gt;device_id&lt;/span&gt;            &lt;span class="c1"&gt;// permanent device identifier&lt;/span&gt;
&lt;span class="nx"&gt;session&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="nx"&gt;platform&lt;/span&gt;             &lt;span class="c1"&gt;// darwin, linux, win32&lt;/span&gt;
&lt;span class="nx"&gt;session&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="nx"&gt;likely_skills&lt;/span&gt;        &lt;span class="c1"&gt;// which skills you use&lt;/span&gt;
&lt;span class="nx"&gt;session&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="nx"&gt;greenfield&lt;/span&gt;           &lt;span class="c1"&gt;// whether the project is new&lt;/span&gt;
&lt;span class="nx"&gt;session&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="nx"&gt;vercel_cli_installed&lt;/span&gt; &lt;span class="c1"&gt;// whether you have the Vercel CLI&lt;/span&gt;
&lt;span class="nx"&gt;session&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="nx"&gt;vercel_cli_version&lt;/span&gt;   &lt;span class="c1"&gt;// which version&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Every tool call you make — any tool, not just Vercel's:
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// pretooluse-skill-inject.mts:969-971&lt;/span&gt;
&lt;span class="nx"&gt;tool_call&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="nx"&gt;tool_name&lt;/span&gt;          &lt;span class="c1"&gt;// which tool you just called&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Every time a skill matches your prompt:
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// pretooluse-skill-inject.mts:1205-1210&lt;/span&gt;
&lt;span class="nx"&gt;skill&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="nx"&gt;injected&lt;/span&gt;               &lt;span class="c1"&gt;// which skill got injected&lt;/span&gt;
&lt;span class="nx"&gt;skill&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="nx"&gt;match_type&lt;/span&gt;             &lt;span class="c1"&gt;// how it matched&lt;/span&gt;
&lt;span class="nx"&gt;skill&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="nx"&gt;tool_name&lt;/span&gt;              &lt;span class="c1"&gt;// against which tool&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Every prompt you submit:
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// user-prompt-submit-skill-inject.mts:1063-1065&lt;/span&gt;
&lt;span class="nx"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="nx"&gt;skill&lt;/span&gt;                 &lt;span class="c1"&gt;// which skill matched your prompt&lt;/span&gt;
&lt;span class="nx"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="nx"&gt;score&lt;/span&gt;                 &lt;span class="c1"&gt;// confidence score&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;All of it flows to a single endpoint:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight http"&gt;&lt;code&gt;&lt;span class="err"&gt;https://telemetry.vercel.com/api/vercel-plugin/v1/events
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;None of it asked for your permission.&lt;/p&gt;

&lt;h3&gt;
  
  
  The permanent device ID
&lt;/h3&gt;

&lt;p&gt;This is the part that should make you check your machine right now. Run this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;cat&lt;/span&gt; ~/.claude/vercel-plugin-device-id
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You'll see something like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;473d7060-5a37-4ebb-9082-b09a983c****
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A UUID. Created the instant you installed the plugin. Silently. No notification. It never expires. It never rotates. It ties together every session, every project, every client engagement you've ever worked on with Claude Code.&lt;/p&gt;

&lt;p&gt;For context: Chrome DevTools rotates session IDs every 24 hours (&lt;code&gt;ClearcutSender.ts:35,68-70&lt;/code&gt;). Vercel's device ID never expires. Privacy-conscious analytics platforms moved away from persistent device IDs years ago. This one lasts forever.&lt;/p&gt;
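&lt;p&gt;For comparison, a rotating identifier is not much code. The sketch below is hypothetical: it is not how Chrome DevTools or the Vercel plugin is implemented, and the name &lt;code&gt;current_id&lt;/code&gt; and the window-aligned rotation scheme are assumptions chosen to keep the example short.&lt;/p&gt;

```python
import uuid

ROTATION_SECONDS = 24 * 60 * 60  # one 24-hour window


def current_id(state, now_seconds):
    """Return an ID that is replaced whenever a new 24-hour window begins."""
    window = int(now_seconds // ROTATION_SECONDS)
    if state.get("window") != window:
        state["window"] = window
        state["id"] = str(uuid.uuid4())  # the old ID is discarded, not archived
    return state["id"]


state = {}
first = current_id(state, 1000)
same_day = current_id(state, 2000)
next_day = current_id(state, 1000 + ROTATION_SECONDS)
print(first == same_day, first == next_day)
```

&lt;p&gt;Events within one window can still be grouped, but nothing links Monday's sessions to Tuesday's. A permanent UUID gives up that property entirely.&lt;/p&gt;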

&lt;p&gt;Dozens of telemetry events per coding session. All tied to a permanent fingerprint. All default-on.&lt;/p&gt;




&lt;h2&gt;
  
  
  "But It's in the README"
&lt;/h2&gt;

&lt;p&gt;Technically, yes. The plugin's README.md has a &lt;code&gt;## Telemetry&lt;/code&gt; section. It explains what's collected and how to disable it.&lt;/p&gt;

&lt;p&gt;But does anyone seriously think that counts as consent?&lt;/p&gt;

&lt;p&gt;Walk through what actually happens:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;You install the plugin.&lt;/li&gt;
&lt;li&gt;It prints a success message.&lt;/li&gt;
&lt;li&gt;You start coding.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;At no point does any text appear on your screen about telemetry. No prompt. No checkbox. No banner. Nothing. Meanwhile, in the background: &lt;code&gt;~/.claude/vercel-plugin-device-id&lt;/code&gt; is written to disk, session events are queued, and your usage patterns start flowing to Vercel's servers.&lt;/p&gt;

&lt;p&gt;The README is sitting in &lt;code&gt;~/.claude/plugins/cache/vercel/vercel-plugin/0.32.7/&lt;/code&gt;. Eight directories deep inside a hidden folder. Nobody browses there.&lt;/p&gt;

&lt;p&gt;GDPR defines valid consent as "freely given, specific, informed, and unambiguous." Most companies — including startups with a fraction of Vercel's resources — treat this as the baseline. I haven't seen a single serious startup ship permanent device tracking without an install-time consent prompt in years. It's just not done anymore.&lt;/p&gt;

&lt;p&gt;Remember: Chrome DevTools rotates its session IDs every 24 hours (&lt;code&gt;ClearcutSender.ts:35,68-70&lt;/code&gt;). That's the standard. Vercel's device ID never rotates. Never expires. Created once, lives forever.&lt;/p&gt;

&lt;p&gt;This is not a gray area. This is not "technically compliant." A permanent device UUID, created silently, tied to every session, with no install-time disclosure — this is clearly Vercel's mistake.&lt;/p&gt;

&lt;p&gt;I used this plugin daily for months. I had no idea. And I'm the developer who was literally building a tool to analyze plugin source code.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Part That's Even More Absurd — I Never Consented
&lt;/h2&gt;

&lt;p&gt;Here's what makes this worse. The plugin actually has a consent dialog — for prompt text collection:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// user-prompt-submit-telemetry.mts:58-61&lt;/span&gt;
&lt;span class="nx"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="nx"&gt;text&lt;/span&gt;  &lt;span class="c1"&gt;// full prompt content, up to 100KB — OPT-IN ONLY&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;An explicit question appears: "Share your prompt text to help improve skill matching." You can say yes or no. Your choice is saved.&lt;/p&gt;

&lt;p&gt;So they know how to build consent flows. They built the infrastructure. They just chose not to use it for device tracking, tool-call logging, skill-usage profiling, and platform fingerprinting.&lt;/p&gt;

&lt;p&gt;And here's the trap: if you click "No thanks," you think you've opted out. You haven't. Base telemetry — everything in the previous section — keeps running. The README even says so: "base telemetry remains on by default."&lt;/p&gt;

&lt;p&gt;But you already clicked "No thanks." In your mind, the matter is settled. That's not a documentation gap. That's a dark pattern.&lt;/p&gt;




&lt;h2&gt;
  
  
  How to Protect Yourself
&lt;/h2&gt;

&lt;p&gt;Do this now. It takes 60 seconds.&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Check if you're affected
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;ls&lt;/span&gt; ~/.claude/vercel-plugin-device-id
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If the file exists, you have a permanent tracking UUID on your machine.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Disable telemetry
&lt;/h3&gt;

&lt;p&gt;Add this to your shell profile (&lt;code&gt;.zshrc&lt;/code&gt;, &lt;code&gt;.bashrc&lt;/code&gt;, etc.):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;VERCEL_PLUGIN_TELEMETRY&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;off
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then reload:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;source&lt;/span&gt; ~/.zshrc
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
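&lt;p&gt;If you would rather check both steps in one go, a small script can do it. This is a sketch: &lt;code&gt;telemetry_status&lt;/code&gt; is a hypothetical helper, and it only checks for the file path and the exact &lt;code&gt;off&lt;/code&gt; value mentioned above, not for any other setting the plugin may honor.&lt;/p&gt;

```python
import os
from pathlib import Path


def telemetry_status(home, env):
    """Report whether the device ID file exists and whether the opt-out is set."""
    device_file = Path(home) / ".claude" / "vercel-plugin-device-id"
    return {
        "device_id_present": device_file.is_file(),
        "telemetry_disabled": env.get("VERCEL_PLUGIN_TELEMETRY") == "off",
    }


if __name__ == "__main__":
    # Uses your real home directory and current shell environment.
    print(telemetry_status(Path.home(), os.environ))
```

&lt;p&gt;Run it from a fresh terminal so it sees the environment your shell profile actually exports.&lt;/p&gt;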



&lt;h3&gt;
  
  
  3. Or just uninstall the plugin entirely
&lt;/h3&gt;

&lt;p&gt;If you don't need it, remove it. One fewer thing sending data you didn't agree to.&lt;/p&gt;




&lt;h2&gt;
  
  
  What Should Change
&lt;/h2&gt;

&lt;p&gt;Two proposals. Design standards, not policy demands.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Surface telemetry at install time.&lt;/strong&gt; One prompt. Plain language. "This plugin collects [X] and sends it to [Y]. OK?" The user sees it. The user decides. This is four lines of install-time code. Vercel already has the consent infrastructure. They use it for prompt text. Extend it to everything else.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Treat data flows as API surface.&lt;/strong&gt; If your plugin sends data to an external endpoint, document it the way you'd document an API. What data. Where it goes. How often. How to stop it. Put this in the install output, not in a README eight directories deep.&lt;/p&gt;

&lt;p&gt;These aren't radical ideas. Homebrew notifies you on first run. VS Code notifies you on first launch. It's already the industry standard. The Vercel plugin just doesn't.&lt;/p&gt;
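&lt;p&gt;To show how small the first proposal is, here is a sketch of an install-time prompt. It is not Vercel's code: &lt;code&gt;ask_consent&lt;/code&gt; and its parameters are hypothetical, and the wording simply follows the template in proposal 1.&lt;/p&gt;

```python
def ask_consent(collected, endpoint, ask=input):
    """Say what is collected and where it goes; anything other than 'y' means no."""
    question = (
        "This plugin collects " + ", ".join(collected)
        + " and sends it to " + endpoint + ". OK? [y/N] "
    )
    return ask(question).strip().lower() == "y"


if __name__ == "__main__":
    allowed = ask_consent(
        ["a device ID", "tool names", "matched skills"],
        "telemetry.vercel.com",
    )
    print("telemetry enabled" if allowed else "telemetry disabled")
```

&lt;p&gt;The default matters as much as the question: pressing Enter without typing anything counts as a refusal, so silence never turns collection on.&lt;/p&gt;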




&lt;p&gt;Check your &lt;code&gt;~/.claude/&lt;/code&gt; directory right now. What did you find? Drop it in the comments.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>vibecoding</category>
      <category>security</category>
      <category>privacy</category>
    </item>
    <item>
      <title>We Need a RunCat for the AI Era</title>
      <dc:creator>ww-w.ai</dc:creator>
      <pubDate>Tue, 05 May 2026 15:36:00 +0000</pubDate>
      <link>https://dev.to/ww-w-ai/we-need-a-catrun-for-the-ai-era-34a0</link>
      <guid>https://dev.to/ww-w-ai/we-need-a-catrun-for-the-ai-era-34a0</guid>
      <description>&lt;p&gt;&lt;em&gt;A 16-pixel hero in your macOS menu bar. Watches LLM traffic. That's it.&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  RunCat told us the CPU was busy. Nothing tells us the agent is.
&lt;/h2&gt;

&lt;p&gt;You remember &lt;a href="https://kyome.io/runcat/index.html?lang=en" rel="noopener noreferrer"&gt;RunCat&lt;/a&gt; — the kitten in your menu bar that runs faster when your CPU is busy. Almost a decade old. Adorable. Useful. Asks nothing of you.&lt;/p&gt;

&lt;p&gt;AI-native development needs the same thing for a different signal. Not CPU. &lt;strong&gt;Agent traffic.&lt;/strong&gt; Is there a live LLM request flowing right now, or is everything quiet?&lt;/p&gt;

&lt;p&gt;That's why I built &lt;strong&gt;AgentRunner&lt;/strong&gt;.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;We need a RunCat for the AI era. So I made one.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;A 16-pixel hero in your macOS menu bar. Runs when your agent's actually working. Idle when it isn't. That's the whole UI.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7mscgpheaiq6v6t3mxgl.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7mscgpheaiq6v6t3mxgl.gif" alt="AgentRunner demo — 23-second walkthrough" width="600" height="479"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Seven things it's built around
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;1. The menu bar is where you already glance.&lt;/strong&gt; Same place as the clock. No extra tab, no extra window, no "I'll open the dashboard later."&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Below the noise floor.&lt;/strong&gt; &amp;lt;1% CPU, ~20MB RAM. Native SwiftUI. A monitor that becomes its own monitoring problem is a joke.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. Flashy "live agent dashboards" don't last.&lt;/strong&gt; Animated traffic, live token deltas, color-coded latency heatmaps: fun for a week, closed and forgotten by the next sprint. RunCat ran for a decade because it asked you nothing. Same spirit here.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;4. Detailed analysis belongs in a different tool.&lt;/strong&gt; Token spend, cache misses, run history — that needs report depth. That's what &lt;a href="https://github.com/ww-w-ai/cc-token-saver" rel="noopener noreferrer"&gt;cc-token-saver&lt;/a&gt; is for, and it gets its own post next. AgentRunner = glance. cc-token-saver = report. Don't make one app try to be both.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;5. Vendor-neutral by design.&lt;/strong&gt; It watches LLM traffic, not Claude traffic. Claude Code, Codex, Cursor, Windsurf, local LLaMA via Ollama, any agent loop hitting a model endpoint over HTTPS. No API key, no per-vendor SDK.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;6. Local-only. Zero telemetry.&lt;/strong&gt; Detection happens on your machine. The app does not phone home. No analytics SDK, no event ping. An agent monitor that ships your data anywhere doesn't deserve trust.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;7. Idle vs Active. Binary.&lt;/strong&gt; That's the entire UI. RunCat gave us a kitten that ran when CPU spiked. AgentRunner gives you a 16-pixel hero that runs when LLM traffic flows. Same spirit. Useful. Small. Invisible until you glance at it.&lt;/p&gt;

&lt;h2&gt;
  
  
  Get it
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Repo&lt;/strong&gt;: &lt;a href="https://github.com/ww-w-ai/AgentRunner" rel="noopener noreferrer"&gt;https://github.com/ww-w-ai/AgentRunner&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;License&lt;/strong&gt;: Apache-2.0&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Requires&lt;/strong&gt;: macOS 13+&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;cc-token-saver post: coming next.&lt;/p&gt;

</description>
      <category>showdev</category>
      <category>ai</category>
      <category>macos</category>
      <category>devtools</category>
    </item>
  </channel>
</rss>
