<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: albe_sf</title>
    <description>The latest articles on DEV Community by albe_sf (@albertomontagnese).</description>
    <link>https://dev.to/albertomontagnese</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3928059%2F8788e7f6-c941-4959-b1cf-18686efc9034.jpg</url>
      <title>DEV Community: albe_sf</title>
      <link>https://dev.to/albertomontagnese</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/albertomontagnese"/>
    <language>en</language>
    <item>
      <title>Anthropic's Dynamic Workflows Aren't Just Another Agent Feature</title>
      <dc:creator>albe_sf</dc:creator>
      <pubDate>Mon, 01 Jun 2026 15:01:48 +0000</pubDate>
      <link>https://dev.to/albertomontagnese/anthropics-dynamic-workflows-arent-just-another-agent-feature-3mj9</link>
      <guid>https://dev.to/albertomontagnese/anthropics-dynamic-workflows-arent-just-another-agent-feature-3mj9</guid>
      <description>&lt;p&gt;Anthropic just shipped Claude Opus 4.8, but the real story isn't the model number. It's a feature called Dynamic Workflows, which orchestrates hundreds of parallel subagents for large-scale projects like codebase migrations. This redefines what a coding agent does, shifting it from interactive assistance to delegated, autonomous execution.&lt;/p&gt;

&lt;h2&gt;
  
  
  what just changed
&lt;/h2&gt;

&lt;p&gt;The latest flagship model, Claude Opus 4.8, was released with a notable capability for Claude Code called Dynamic Workflows. This feature is designed to manage complex, multi-step tasks by breaking them down and running them as parallel subagents. This is a structural departure from the typical agentic model, which tends to operate serially: it takes a prompt, acts, and waits for the next instruction.&lt;/p&gt;

&lt;p&gt;The key use case mentioned is codebase-scale work, which implies a system that can manage dependencies and context across many files and directories simultaneously. Instead of asking an agent to refactor a single file, you can theoretically define a project-level goal, and the workflow engine will orchestrate the necessary changes across the entire codebase. This suggests a higher level of abstraction where the developer acts as a system architect rather than a micromanager of prompts.&lt;/p&gt;

&lt;h2&gt;
  
  
  from copilot to orchestrator
&lt;/h2&gt;

&lt;p&gt;This changes the nature of the work. For years, AI coding tools have been positioned as copilots. They help with line-by-line suggestions, generating boilerplate, and explaining snippets. More advanced agents can tackle multi-file changes, but the interaction remains fundamentally conversational and sequential. You are still in the driver's seat for every major step.&lt;/p&gt;

&lt;p&gt;Dynamic Workflows point to a different interaction model. By allowing for the definition and parallel execution of sub-tasks, the system takes on the role of a project manager or a technical lead. The developer's job shifts from writing code to defining the architecture of the work itself. This requires a different skill: describing a complex change as a graph of dependent tasks that can be safely parallelized.&lt;/p&gt;

&lt;p&gt;This is the kind of work required for daunting tasks like framework upgrades, API deprecations, or migrating a legacy frontend to a new design system. These are projects that involve thousands of repetitive, yet context-sensitive, changes that are painful to execute manually and difficult to specify in a single prompt.&lt;/p&gt;

&lt;h2&gt;
  
  
  defining a workflow
&lt;/h2&gt;

&lt;p&gt;While the exact implementation details are not public, one can imagine a declarative format, perhaps a YAML or JSON file, that defines the stages of a large-scale refactoring. This configuration would serve as the master plan for the swarm of subagents.&lt;/p&gt;

&lt;p&gt;A migration from an old data-fetching library to a new one might be defined like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# workflow.yaml&lt;/span&gt;
&lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;api-client-migration&lt;/span&gt;
&lt;span class="na"&gt;description&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Migrate&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;all&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;components&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;from&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;legacy&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;`ApiService`&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;to&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;the&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;new&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;`GraphQLClient`."&lt;/span&gt;

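&lt;span class="c1"&gt;# Hypothetical `steps` key so the phases below parse as part of the same YAML mapping.&lt;/span&gt;
&lt;span class="na"&gt;steps&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
&lt;span class="c1"&gt;# Phase 1: Identify all call sites of the old service.&lt;/span&gt;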
&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;inventory-call-sites&lt;/span&gt;
  &lt;span class="na"&gt;description&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Scan&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;the&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;codebase&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;and&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;generate&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;a&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;JSON&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;report&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;of&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;all&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;files&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;that&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;import&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;and&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;use&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;`ApiService`."&lt;/span&gt;
  &lt;span class="na"&gt;tool&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;static-analysis&lt;/span&gt;
  &lt;span class="na"&gt;params&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;target&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;src/utils/ApiService.js"&lt;/span&gt;
  &lt;span class="na"&gt;output&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;migration_plan.json"&lt;/span&gt;

&lt;span class="c1"&gt;# Phase 2: Refactor components in parallel.&lt;/span&gt;
&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;refactor-components&lt;/span&gt;
  &lt;span class="na"&gt;description&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;For&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;each&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;component&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;in&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;the&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;report,&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;replace&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;`ApiService`&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;calls&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;with&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;`GraphQLClient`&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;queries."&lt;/span&gt;
  &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;parallel-map&lt;/span&gt;
  &lt;span class="na"&gt;input&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;migration_plan.json"&lt;/span&gt;
  &lt;span class="na"&gt;concurrency&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;50&lt;/span&gt; &lt;span class="c1"&gt;# Run up to 50 subagents at once&lt;/span&gt;
  &lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;tool&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;edit-file&lt;/span&gt;
      &lt;span class="na"&gt;params&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;file&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;{{item.file}}"&lt;/span&gt;
        &lt;span class="na"&gt;prompt&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Replace&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;the&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;data&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;fetching&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;logic&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;here&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;to&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;use&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;GraphQLClient.&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;The&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;required&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;query&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;is&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;{{item.equivalent_query}}."&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;tool&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;run-linter&lt;/span&gt;
      &lt;span class="na"&gt;params&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;file&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;{{item.file}}"&lt;/span&gt;
        &lt;span class="na"&gt;fix&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;

&lt;span class="c1"&gt;# Phase 3: Run integration tests after all refactoring is complete.&lt;/span&gt;
&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;run-tests&lt;/span&gt;
  &lt;span class="na"&gt;description&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Execute&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;the&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;end-to-end&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;test&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;suite&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;to&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;verify&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;the&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;migration."&lt;/span&gt;
  &lt;span class="na"&gt;depends_on&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;refactor-components"&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
  &lt;span class="na"&gt;tool&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;run-command&lt;/span&gt;
  &lt;span class="na"&gt;params&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;command&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;npm&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;run&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;test:e2e"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is speculative, but it illustrates the shift in thinking. The high-value work is in designing the workflow itself: defining the stages, the dependencies, and the instructions for each parallel unit of work.&lt;/p&gt;
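
&lt;p&gt;As a rough illustration of that design work, the sketch below uses Python's standard &lt;code&gt;graphlib&lt;/code&gt; module to turn a dependency graph into batches of stages that could run concurrently. The stage names come from the YAML above. The dependency of &lt;code&gt;refactor-components&lt;/code&gt; on &lt;code&gt;inventory-call-sites&lt;/code&gt; and the &lt;code&gt;parallel_batches&lt;/code&gt; helper are assumptions made for illustration, not a description of how Dynamic Workflows is implemented.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;
```python
from graphlib import TopologicalSorter

# Hypothetical stage graph mirroring the workflow sketch:
# each key is a stage, each value is the set of stages it depends on.
stages = {
    "inventory-call-sites": set(),
    "refactor-components": {"inventory-call-sites"},
    "run-tests": {"refactor-components"},
}


def parallel_batches(graph):
    """Group stages into batches; stages within one batch can run concurrently."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()  # raises graphlib.CycleError if the dependencies loop
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # stages whose dependencies are all done
        batches.append(ready)
        sorter.done(*ready)  # mark the batch finished to unlock its dependents
    return batches


for number, batch in enumerate(parallel_batches(stages), start=1):
    print(f"batch {number}: {batch}")
```
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;A linear chain like this one yields one stage per batch. The parallelism in the example workflow lives inside the second stage, where each file is an independent unit of work.&lt;/p&gt;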

&lt;h2&gt;
  
  
  the so-what
&lt;/h2&gt;

&lt;p&gt;For builders, this is a signal to start thinking about automation at a higher level of abstraction. The new frontier of agentic development may be less about crafting the perfect prompt and more about designing robust, automated workflows that can reliably execute complex engineering projects. If this paradigm holds, the most effective AI-powered developers will be not just expert coders but expert orchestrators.&lt;/p&gt;

&lt;h2&gt;
  
  
  Sources
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://www.anthropic.com/" rel="noopener noreferrer"&gt;https://www.anthropic.com/&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>ai</category>
      <category>llm</category>
      <category>claude</category>
      <category>agents</category>
    </item>
    <item>
      <title>Mistral's Codestral Isn't Another Generalist Model</title>
      <dc:creator>albe_sf</dc:creator>
      <pubDate>Fri, 29 May 2026 15:02:25 +0000</pubDate>
      <link>https://dev.to/albertomontagnese/mistrals-codestral-isnt-another-generalist-model-4j98</link>
      <guid>https://dev.to/albertomontagnese/mistrals-codestral-isnt-another-generalist-model-4j98</guid>
      <description>&lt;p&gt;Mistral AI has released Codestral, a 22B-parameter model built explicitly for code generation. This is a notable release not because it's the largest model, but because it's a specialized one. The takeaway is that the frontier is shifting from massive, general-purpose models to efficient, task-specific architectures for professional tooling.&lt;/p&gt;

&lt;h2&gt;
  
  
  what is codestral
&lt;/h2&gt;

&lt;p&gt;Codestral is an open-weight 22B model trained on a dataset covering over 80 programming languages, including Python, Java, C++, JavaScript, and more specialized ones like Swift and Fortran. Its defining feature is its focus. Unlike generalist models that handle a wide range of text-based tasks, Codestral is engineered for code-centric workflows: function completion, test generation, and filling in partial code blocks.&lt;/p&gt;

&lt;p&gt;The model is released under a “Mistral AI Non-Production License,” which makes it available for research and testing purposes. This “open-weight” approach allows developers to download and experiment with the model's parameters directly, but the licensing implies constraints on commercial production use.&lt;/p&gt;

&lt;p&gt;One of its key technical capabilities is fill-in-the-middle (FIM): the model completes code between an existing prefix and suffix, which is what IDE completion needs when the cursor sits in the middle of a file. This suggests it's optimized for the kind of low-latency, high-frequency interactions common in tools like VS Code and the JetBrains IDEs.&lt;/p&gt;
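
&lt;p&gt;The prefix/suffix split behind FIM can be sketched in a few lines. The example only builds a request body and sends nothing. The &lt;code&gt;prompt&lt;/code&gt; and &lt;code&gt;suffix&lt;/code&gt; field names, the &lt;code&gt;codestral-latest&lt;/code&gt; alias, and the &lt;code&gt;build_fim_request&lt;/code&gt; helper are assumptions modeled on Mistral's FIM completions API, so check them against the current API reference before relying on them.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;
```python
import json


def build_fim_request(source, cursor, model="codestral-latest"):
    """Split a buffer at the cursor into the prefix/suffix pair a FIM model fills between."""
    return {
        "model": model,  # assumed model alias
        "prompt": source[:cursor],  # code before the cursor
        "suffix": source[cursor:],  # code after the cursor
        "max_tokens": 64,  # hypothetical cap to keep completions short
    }


# A buffer with the cursor sitting right after "return ".
buffer = "def add(a, b):\n    return \n\nprint(add(1, 2))\n"
cursor = buffer.index("return ") + len("return ")

print(json.dumps(build_fim_request(buffer, cursor), indent=2))
```
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Because the model sees the code after the cursor as well as before it, it can produce a completion that fits what follows instead of simply continuing from the prefix.&lt;/p&gt;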

&lt;h2&gt;
  
  
  getting access
&lt;/h2&gt;

&lt;p&gt;There are a few ways to use Codestral. For direct integration and IDE tooling, Mistral has provided a dedicated endpoint at &lt;code&gt;codestral.mistral.ai&lt;/code&gt;. This endpoint is intended for developers integrating the model into their tools and is free during a beta period. It is also available on their standard &lt;code&gt;api.mistral.ai&lt;/code&gt; endpoint, where usage is billed per token.&lt;/p&gt;

&lt;p&gt;For local development and experimentation, you can run the model directly. It's available for download from Hugging Face and can be run using tools like Ollama. This allows for offline use and deeper integration into local development environments.&lt;/p&gt;

&lt;p&gt;Here is a basic example of how to interact with the model via the Ollama API after pulling the model:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# First, pull the model with Ollama&lt;/span&gt;
ollama pull codestral

&lt;span class="c"&gt;# Then, send a request to the local API&lt;/span&gt;
curl http://localhost:11434/api/chat &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="s1"&gt;'{
  "model": "codestral",
  "messages": [
    {
      "role": "user",
      "content": "Write a Python function to calculate the Fibonacci sequence."
    }
  ]
}'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Integrations are already available in frameworks like LlamaIndex and LangChain for building agentic applications, and in IDE extensions like Tabnine and Continue.dev.&lt;/p&gt;

&lt;h2&gt;
  
  
  why it matters for builders
&lt;/h2&gt;

&lt;p&gt;The release of a dedicated, high-performance code model from a major lab is significant. It signals a move toward a multi-model future where developers will likely route tasks to specialized systems rather than relying on a single, monolithic AI. For code generation, a model trained specifically on code and fluent in dozens of languages can offer a quality and latency advantage over a generalist counterpart of similar size.&lt;/p&gt;

&lt;p&gt;The 22-billion-parameter size is also an intentional choice. It is large enough to be powerful but small enough to be efficient for its target use cases, particularly code completion, where milliseconds matter. Internal evaluations cited in the announcement suggest it significantly reduces latency for autocomplete while maintaining quality.&lt;/p&gt;

&lt;p&gt;However, the non-production license is a critical detail. While it encourages experimentation and research, it means teams looking to embed this in a commercial product need to carefully evaluate the terms. This is a different path from fully open-source models and represents a hybrid strategy for commercializing foundational models.&lt;/p&gt;

&lt;p&gt;For engineers building AI-powered developer tools, Codestral is a new primitive to work with. It's a powerful, specialized engine for code tasks that can be run locally or accessed via a fast, dedicated API. The focus now shifts to how we build intelligent applications on top of these specialized models.&lt;/p&gt;

&lt;h2&gt;
  
  
  Sources
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://mistral.ai/" rel="noopener noreferrer"&gt;Mistral AI Announcement&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>ai</category>
      <category>llm</category>
      <category>machinelearning</category>
      <category>devtools</category>
    </item>
    <item>
      <title>Anthropic's New Security Tooling is a Wake-Up Call for Agent Builders</title>
      <dc:creator>albe_sf</dc:creator>
      <pubDate>Wed, 27 May 2026 15:04:09 +0000</pubDate>
      <link>https://dev.to/albertomontagnese/anthropics-new-security-tooling-is-a-wake-up-call-for-agent-builders-5gkf</link>
      <guid>https://dev.to/albertomontagnese/anthropics-new-security-tooling-is-a-wake-up-call-for-agent-builders-5gkf</guid>
      <description>&lt;p&gt;Anthropic just shipped a security guidance plugin and a self-hosted sandbox for Claude. This isn't just another incremental feature drop; it's a clear signal that the next phase of AI development is about hardening the agent stack. The takeaway is that security is moving from a manual review afterthought to a critical, automated first pass, and you should be building your systems accordingly.&lt;/p&gt;

&lt;h2&gt;
  
  
  what just shipped
&lt;/h2&gt;

&lt;p&gt;Two new security-focused features for Claude were announced: a security guidance plugin and a self-hosted sandbox. The plugin acts as a proactive vulnerability scanner for developers as they write code. Anthropic reported using it internally and seeing a 30-40% decrease in security-related comments on pull requests, suggesting it serves as an effective lightweight first pass before a full human code review.&lt;/p&gt;

&lt;p&gt;The second component is a self-hosted sandbox, currently in public beta. This allows Claude Managed Agents to operate within a user-controlled environment, including connecting to a user's private servers. This moves agent execution from a multi-tenant cloud environment to your own infrastructure, a significant change for handling sensitive tasks.&lt;/p&gt;

&lt;h2&gt;
  
  
  why this matters for your agent stack
&lt;/h2&gt;

&lt;p&gt;For the past year, building agents has been an exercise in prompt engineering and orchestration logic. Security has often been reduced to a line in a system prompt like "You are a helpful assistant and you will not perform harmful actions." This approach is brittle and insufficient for production systems.&lt;/p&gt;

&lt;p&gt;Anthropic's move signals a necessary shift from prompt-based security to infrastructure-based security. A local, user-controlled sandbox is a fundamental primitive for running agent-generated code safely. It provides a contained environment where an agent can execute tasks, interact with files, and run code without having access to the host system or network by default. This is table stakes for any serious enterprise use case.&lt;/p&gt;

&lt;p&gt;The security plugin reframes AI-generated code. Instead of treating it as a magical, opaque output, it treats it like any other code written by a junior developer: something to be linted, scanned, and analyzed for common pitfalls before it ever gets to a human reviewer. It makes security proactive, not reactive.&lt;/p&gt;

&lt;h2&gt;
  
  
  integrating security analysis into the workflow
&lt;/h2&gt;

&lt;p&gt;Adopting this model means building security checks directly into your agent's code generation and execution loop. The goal is to catch issues before they are ever executed. While the exact implementation of Anthropic's plugin isn't public, you can imagine how it fits into a CI/CD pipeline or a local development environment.&lt;/p&gt;

&lt;p&gt;Here is a hypothetical configuration for a pre-commit hook that uses an AI security scanner on staged Python files. This is the kind of automated, low-friction check that the new tooling enables.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# .pre-commit-config.yaml&lt;/span&gt;
&lt;span class="na"&gt;repos&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
&lt;span class="pi"&gt;-&lt;/span&gt;   &lt;span class="na"&gt;repo&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;local&lt;/span&gt;
    &lt;span class="na"&gt;hooks&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt;   &lt;span class="na"&gt;id&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;claude-security-scan&lt;/span&gt;
        &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Claude Security Scanner&lt;/span&gt;
        &lt;span class="c1"&gt;# claude-sec-scanner is a hypothetical CLI; pre-commit appends the staged file paths to this command.&lt;/span&gt;
        &lt;span class="na"&gt;entry&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;claude-sec-scanner --level=high --fail-on-critical --scope=diff&lt;/span&gt;
        &lt;span class="na"&gt;language&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;system&lt;/span&gt;
        &lt;span class="na"&gt;types&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;python&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
        &lt;span class="na"&gt;stages&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;pre-commit&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This approach automates the first pass of a security review. It doesn't replace a human expert, but it filters out the low-hanging fruit, freeing up senior engineers to focus on more complex architectural issues. The result is a faster, more secure development cycle.&lt;/p&gt;

&lt;h2&gt;
  
  
  the sandbox is the real story
&lt;/h2&gt;

&lt;p&gt;The most significant part of this announcement is the user-controlled sandbox. For any organization working with proprietary code, customer data, or private infrastructure, allowing an external AI model to execute arbitrary code has been a non-starter. A self-hosted sandbox connected to private servers inverts the trust model. Instead of trusting the model provider's environment, you define the environment and its boundaries.&lt;/p&gt;

&lt;p&gt;This unlocks the ability to build agents that can securely perform actions on internal systems. An agent could, for example, be given sandboxed access to a staging database to run diagnostics, or permission to interact with an internal code repository to refactor code, all without that data ever leaving your control.&lt;/p&gt;

&lt;h2&gt;
  
  
  the so-what
&lt;/h2&gt;

&lt;p&gt;The frontier of AI is no longer just about building larger models with higher benchmark scores. It is increasingly about building the professional-grade tooling required to ship products that use those models, safely and reliably. Anthropic is providing a clear template for how to think about agent security.&lt;/p&gt;

&lt;p&gt;As a builder, your focus should be shifting. The interesting work is less about novel agent architectures and more about the boring, critical infrastructure needed to run them in production. How do you containerize agent execution? How do you define fine-grained permissions for tool use? How do you automate security analysis for generated code? These are the problems that need to be solved to move agents from demos to deployed products, and this recent release shows one major lab is thinking the same way.&lt;/p&gt;

&lt;h2&gt;
  
  
  Sources
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://www.securityweek.com/anthropic-releases-new-claude-sandbox-security-guidance-plugin/" rel="noopener noreferrer"&gt;Anthropic Releases New Claude Sandbox, Security Guidance Plugin - SecurityWeek&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>ai</category>
      <category>machinelearning</category>
      <category>devtools</category>
      <category>programming</category>
    </item>
    <item>
      <title>Google's Gemini 3.5 Flash Isn't For Chat. It's For Agents.</title>
      <dc:creator>albe_sf</dc:creator>
      <pubDate>Mon, 25 May 2026 15:01:49 +0000</pubDate>
      <link>https://dev.to/albertomontagnese/googles-gemini-35-flash-isnt-for-chat-its-for-agents-22d3</link>
      <guid>https://dev.to/albertomontagnese/googles-gemini-35-flash-isnt-for-chat-its-for-agents-22d3</guid>
      <description>&lt;p&gt;Google shipped Gemini 3.5 Flash on May 19, the first model in its new 3.5 series. [4] The release is not just another incremental update; it’s a deliberate shift in strategy. Google is framing this model as 'agent-first, not chatbot-first,' a clear signal that the focus is moving from conversational quality to autonomous tool-use and coding. [4]&lt;/p&gt;

&lt;h2&gt;
  
  
  what shipped
&lt;/h2&gt;

&lt;p&gt;Gemini 3.5 Flash was announced at Google I/O 2026 and, unlike many recent releases, went straight to general availability. [4, 15] It's accessible now for developers through the Gemini API and Google AI Studio, and for enterprise customers in the Gemini Enterprise Agent Platform. [15] This is the initial release from the Gemini 3.5 family, positioned as a workhorse model for developers building agentic systems. [13]&lt;/p&gt;

&lt;p&gt;The model is engineered for speed and efficiency, but Google's performance claims place it above its previous-generation Pro model. [13] This combination of speed and capability is aimed squarely at enabling complex, multi-step tasks that provide tangible utility. [13]&lt;/p&gt;

&lt;h2&gt;
  
  
  an agent-first architecture
&lt;/h2&gt;

&lt;p&gt;The most significant aspect of this release is the framing. Google's announcement emphasized the model's strengths in long-horizon tool-use and coding over traditional chat benchmarks. [4] The company claims Gemini 3.5 Flash outperforms Gemini 3.1 Pro on key benchmarks for agentic and coding tasks, including a 76.2% score on Terminal-Bench 2.1. [13]&lt;/p&gt;

&lt;p&gt;This focus matters because it reflects the broader industry's maturation from chatbots to agents. The engineering challenge is no longer just about generating fluent text, but about building systems that can plan, execute, and self-correct over a series of actions. Google is explicitly designing and marketing this model for that purpose. It's part of a larger ecosystem push that includes tools like the Managed Agents API, which provides secure, Google-hosted environments for running custom agents. [13]&lt;/p&gt;

&lt;h2&gt;
  
  
  pricing for value, not volume
&lt;/h2&gt;

&lt;p&gt;While the 'Flash' branding implies speed and low cost, the pricing tells a different story. At $1.50 per million input tokens and $9.00 per million output tokens, Gemini 3.5 Flash is significantly more expensive than previous Flash models like 3.1 Flash-Lite. [15] This price point is closer to the Gemini 3.1 Pro tier. [15]&lt;/p&gt;

&lt;p&gt;This suggests Google is not competing for the cheapest possible text generation. Instead, it is pricing the model based on the value of the agentic tasks it can perform. For developers, this means 3.5 Flash is likely not the right choice for high-volume, low-complexity chat applications. It is intended for higher-value workflows where its advanced reasoning and coding capabilities can justify the cost.&lt;/p&gt;
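
&lt;p&gt;To make the pricing concrete, here is a small cost estimate at the quoted rates. The two workloads and their token counts are hypothetical, chosen only to contrast a short chat turn with a longer agent run, and &lt;code&gt;run_cost&lt;/code&gt; is an illustrative helper, not part of any SDK.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;
```python
# Prices quoted in the article, in USD per million tokens.
INPUT_PRICE_PER_MILLION = 1.50
OUTPUT_PRICE_PER_MILLION = 9.00


def run_cost(input_tokens, output_tokens):
    """Estimate the USD cost of one request or agent run."""
    input_cost = input_tokens / 1_000_000 * INPUT_PRICE_PER_MILLION
    output_cost = output_tokens / 1_000_000 * OUTPUT_PRICE_PER_MILLION
    return input_cost + output_cost


# Hypothetical workloads: token counts are illustrative, not measured.
chat_turn = run_cost(input_tokens=500, output_tokens=300)
agent_run = run_cost(input_tokens=200_000, output_tokens=20_000)

print(f"chat turn: ${chat_turn:.5f}")
print(f"agent run: ${agent_run:.2f}")
```
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;At these rates the hypothetical agent run costs about $0.48, and output accounts for $0.18 of that despite being a tenth of the token volume. Multiplied across thousands of runs, that is the cost the task's value has to justify.&lt;/p&gt;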

&lt;p&gt;Here is a simple configuration for accessing the model via the API:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;google.generativeai&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;genai&lt;/span&gt;

&lt;span class="c1"&gt;# Configure with your API key
&lt;/span&gt;&lt;span class="n"&gt;genai&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;configure&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;YOUR_API_KEY&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Set up the model
&lt;/span&gt;&lt;span class="n"&gt;generation_config&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;temperature&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;top_p&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;0.95&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;top_k&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;max_output_tokens&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;65536&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;genai&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;GenerativeModel&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gemini-3.5-flash&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;generation_config&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;generation_config&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Start a chat session
&lt;/span&gt;&lt;span class="n"&gt;convo&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;start_chat&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;history&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[])&lt;/span&gt;

&lt;span class="n"&gt;convo&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;send_message&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Your agentic prompt here...&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;convo&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;last&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  the so-what for builders
&lt;/h2&gt;

&lt;p&gt;Gemini 3.5 Flash is a clear statement of direction from Google. The future of its AI platform is centered on agents that can automate complex work. For engineers and builders, this means the tools and models are now being explicitly optimized for these more sophisticated use cases.&lt;/p&gt;

&lt;p&gt;The release of Gemini 3.5 Flash isn't just another model to evaluate. It's a signal to start thinking about your own product roadmaps in terms of agentic workflows. The core infrastructure to support these systems is coming online, and the models are being built specifically to power them.&lt;/p&gt;

&lt;h2&gt;
  
  
  Sources
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://cloud.google.com/blog/products/ai-machine-learning/innovations-from-google-io-26-on-google-cloud" rel="noopener noreferrer"&gt;Innovations from Google I/O 26 on Google Cloud&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://codersera.com/blog/ai-models-released-in-may-2026-complete-roundup/" rel="noopener noreferrer"&gt;AI Models Released in May 2026: Complete Roundup&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>ai</category>
      <category>llm</category>
      <category>agents</category>
      <category>devtools</category>
    </item>
    <item>
      <title>Google just commoditized the agent stack with a single API call</title>
      <dc:creator>albe_sf</dc:creator>
      <pubDate>Fri, 22 May 2026 15:03:03 +0000</pubDate>
      <link>https://dev.to/albertomontagnese/google-just-commoditized-the-agent-stack-with-a-single-api-call-3092</link>
      <guid>https://dev.to/albertomontagnese/google-just-commoditized-the-agent-stack-with-a-single-api-call-3092</guid>
      <description>&lt;p&gt;Google's release of Managed Agents in the Gemini API is the signal to pay attention to this week. It packages the messy, stateful, and insecure parts of building agents into a single API endpoint, backed by a new, cost-effective frontier model, Gemini 3.5 Flash. The takeaway is that the infrastructure for running autonomous agents in secure, isolated environments is now a utility.&lt;/p&gt;

&lt;h2&gt;
  
  
  what actually shipped
&lt;/h2&gt;

&lt;p&gt;On May 19, 2026, Google released two things that matter for builders: Gemini 3.5 Flash and the public preview of Managed Agents for the Gemini API. Gemini 3.5 Flash is positioned as a model optimized for performance on agentic and coding tasks. It's the engine.&lt;/p&gt;

&lt;p&gt;The more significant release is Managed Agents. This is the platform. It gives developers the ability to build and deploy autonomous, stateful agents that run in secure, Google-hosted Linux sandbox environments. Instead of managing your own infrastructure for code execution and state, you can now spin up an agent via an API call. The first available general-purpose agent is &lt;code&gt;antigravity-preview-05-2026&lt;/code&gt;, which can plan, reason, write and execute code, manage files, and browse the web inside its container.&lt;/p&gt;

&lt;h2&gt;
  
  
  from a local cli to a server-side platform
&lt;/h2&gt;

&lt;p&gt;This release coincides with a strategic shift. Google is transitioning its popular &lt;code&gt;Gemini CLI&lt;/code&gt; to a new &lt;code&gt;Antigravity CLI&lt;/code&gt;. This isn't just a rename. It reflects a move from a local terminal utility to a client for a unified, server-side agent platform. The new CLI is built in Go for better performance and supports asynchronous workflows, letting you orchestrate multiple agents on complex tasks without locking your terminal.&lt;/p&gt;

&lt;p&gt;This transition acknowledges that real agentic work involves multiple agents and shared context, needs that outgrew the original CLI's scope. By unifying the backend into the Antigravity platform, improvements to the core agent harness are automatically available to the CLI, the desktop app, and the API. For developers, this means the agent you prototype in the terminal shares the same foundation as the one you deploy to production.&lt;/p&gt;

&lt;h2&gt;
  
  
  how you will use this
&lt;/h2&gt;

&lt;p&gt;Building with this new API means focusing less on the infrastructure of agent execution. You are no longer primarily responsible for the security of running model-generated code or persisting state between long-running tasks. You define the task and the tools, and the managed agent handles the execution loop within its sandboxed environment.&lt;/p&gt;

&lt;p&gt;A request to the new Interactions API might look conceptually like the sketch below; treat the module, class, and method names as illustrative. You provide the model, the agent definition, and the user's high-level task, and the platform manages the multi-step execution.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;google.ai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;agents&lt;/span&gt;

&lt;span class="c1"&gt;# Configure the managed agent with a specific toolset and model
&lt;/span&gt;&lt;span class="n"&gt;file_processing_agent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;agents&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;ManagedAgent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;projects/my-project/agents/file-processor&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gemini-3.5-flash&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="c1"&gt;# The platform provides the secure code execution environment
&lt;/span&gt;    &lt;span class="n"&gt;harness&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;antigravity-preview-05-2026&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="n"&gt;agents&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Tool&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;file_reader&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="n"&gt;agents&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Tool&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;data_transformer&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="n"&gt;agents&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Tool&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;report_generator&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Start a stateful session to perform a multi-step task
&lt;/span&gt;&lt;span class="n"&gt;session&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;file_processing_agent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;start_session&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="c1"&gt;# The agent plans and executes steps inside its isolated sandbox
&lt;/span&gt;&lt;span class="n"&gt;final_result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;session&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;execute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Analyze the quarterly sales data in /uploads, identify the top three regions, and generate a PDF summary.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;final_result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_file&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;sales_summary.pdf&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The key change is moving from a request-response loop that you manage to a persistent, stateful agent that you task. The &lt;code&gt;antigravity-preview-05-2026&lt;/code&gt; agent harness provides the core capabilities of file management, web browsing, and code execution out of the box.&lt;/p&gt;

&lt;h2&gt;
  
  
  the so-what for builders
&lt;/h2&gt;

&lt;p&gt;The move toward managed, server-side agents is a significant abstraction layer. For the last year, building a truly autonomous agent meant wrestling with Docker containers, file system permissions, and state management. Google is now offering to handle that plumbing. This lowers the barrier to entry for shipping sophisticated agentic workflows.&lt;/p&gt;

&lt;p&gt;This doesn't eliminate the hard problems of agentic reasoning and reliability. But it does commoditize the execution environment, letting you focus on the agent's actual logic and purpose. It's a platform bet that the future of AI development is less about prompting a model and more about directing a stateful worker.&lt;/p&gt;

&lt;h2&gt;
  
  
  Sources
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://ai.google.dev/" rel="noopener noreferrer"&gt;Release notes | Gemini API - Google AI for Developers&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://developers.googleblog.com/2026/05/an-important-update-transitioning-gemini-cli-to-antigravity-cli.html" rel="noopener noreferrer"&gt;An important update: Transitioning Gemini CLI to Antigravity CLI&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>ai</category>
      <category>llm</category>
      <category>agents</category>
      <category>devtools</category>
    </item>
    <item>
      <title>Google just shifted the agent workflow from the cloud to the desktop</title>
      <dc:creator>albe_sf</dc:creator>
      <pubDate>Wed, 20 May 2026 15:02:17 +0000</pubDate>
      <link>https://dev.to/albertomontagnese/google-just-shifted-the-agent-workflow-from-the-cloud-to-the-desktop-39lp</link>
      <guid>https://dev.to/albertomontagnese/google-just-shifted-the-agent-workflow-from-the-cloud-to-the-desktop-39lp</guid>
      <description>&lt;p&gt;Google's latest announcements for agentic AI are more than just a new model. The release of Gemini 3.5 Flash and the Antigravity 2.0 development platform signals a shift from prompt-driven exploration to a more grounded, local-first engineering workflow for building agents. This matters because it changes the development loop from a slow, cloud-based iteration cycle to a faster, more tangible one on your own machine.&lt;/p&gt;

&lt;h2&gt;
  
  
  what changed: a fast model and a local orchestrator
&lt;/h2&gt;

&lt;p&gt;Two main components define this shift. First is Gemini 3.5 Flash, a new model engineered for speed and efficiency in agentic workflows. It reportedly outperforms Gemini 3.1 Pro on most benchmarks while running significantly faster. This model is positioned as the high-speed engine needed for agents that must perform complex, long-horizon tasks with low latency.&lt;/p&gt;

&lt;p&gt;The second, and more significant, piece is Antigravity 2.0. This is not just an API update; it's a standalone desktop application designed to be a central hub for agent interaction and orchestration. The platform is meant to take developers from an idea to a production-ready application. Its local-first approach lets you manage multiple agents in parallel, schedule background tasks, and integrate directly with tools like Google AI Studio, Android, and Firebase.&lt;/p&gt;

&lt;h2&gt;
  
  
  why it matters: from prompting to engineering
&lt;/h2&gt;

&lt;p&gt;For the last couple of years, building with large models has felt like working through a keyhole. You write a prompt, send it to a remote API, and get a response. Building agents required stringing these calls together with scripts and cloud functions. It worked, but it lacked the immediacy of traditional software development.&lt;/p&gt;

&lt;p&gt;Antigravity 2.0 changes this dynamic. By providing a desktop application and a command-line interface (CLI), it treats agent development less like prompt engineering and more like systems engineering. The ability to orchestrate and deploy agents that can execute tasks in parallel from your local machine is a meaningful change. It encourages you to think about agentic systems as a collection of specialized workers, not a single monolithic model. This move from a simple request-response model to a managed, multi-agent system is where the real productivity gains will come from.&lt;/p&gt;
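
&lt;p&gt;The "collection of specialized workers" idea can be sketched in plain Python. This is not Antigravity's API: the worker names, tasks, and delays are hypothetical, and &lt;code&gt;asyncio.sleep&lt;/code&gt; stands in for real model or tool calls. It only shows the orchestration shape, where every worker starts at once and the results are collected together.&lt;/p&gt;

```python
import asyncio


async def run_worker(name, task, delay):
    """Stand-in for one specialized agent; a real one would call a model or tool."""
    await asyncio.sleep(delay)  # simulate the time spent working
    return f"{name} finished: {task}"


async def orchestrate():
    """Start every worker at once and collect results in submission order."""
    jobs = [
        run_worker("researcher", "collect release notes", 0.03),
        run_worker("coder", "draft the migration script", 0.02),
        run_worker("reviewer", "check the test plan", 0.01),
    ]
    return await asyncio.gather(*jobs)


results = asyncio.run(orchestrate())
for line in results:
    print(line)
```

&lt;p&gt;Because the workers run concurrently, the total wall time is close to the slowest worker rather than the sum of all three, which is the practical gain over a serial request-response loop.&lt;/p&gt;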

&lt;h2&gt;
  
  
  getting started with the new stack
&lt;/h2&gt;

&lt;p&gt;Developers can access Gemini 3.5 Flash through Google AI Studio and what Google is calling Managed Agents in the Gemini API. The managed agent approach aims to remove the friction of infrastructure setup by delivering the power of the Antigravity agent harness through the API.&lt;/p&gt;

&lt;p&gt;For local development, the Antigravity CLI provides a more direct interface. While the exact commands are still being documented, one could imagine a workflow for deploying a managed agent looking something like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Fictional CLI command based on described capabilities&lt;/span&gt;

antigravity agents:deploy &lt;span class="nt"&gt;--name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"daily-report-agent"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"gemini-3.5-flash"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--trigger&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"schedule --cron='0 9 * * *'"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--task-file&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"./tasks/generate_report.json"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--tools&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"google.workspace.sheets,google.workspace.docs"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This workflow, combining a powerful local CLI with cloud-managed execution, feels much closer to modern DevOps practices than the ad-hoc scripting that has characterized agent-building to date. Google AI Studio is also getting more integrated, with a new feature to export entire projects to Antigravity for local development and production deployment with a single click.&lt;/p&gt;

&lt;h2&gt;
  
  
  the takeaway for builders
&lt;/h2&gt;

&lt;p&gt;This year's I/O updates are a clear signal that the infrastructure for building AI agents is maturing. The focus is shifting from the raw capability of a single model to the developer experience of building, testing, and deploying robust, multi-agent systems. We're moving from an era of AI that assists you to one where agents can independently navigate complex tasks across an entire workflow.&lt;/p&gt;

&lt;p&gt;For engineers in the space, the message is clear: the tooling is catching up to the ambition. It's time to start thinking about agentic workflows not as a series of prompts, but as engineered systems that you can build and control from your own machine.&lt;/p&gt;

&lt;h2&gt;
  
  
  Sources
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://blog.google/" rel="noopener noreferrer"&gt;Google I/O 2026 Developer Highlights&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>ai</category>
      <category>devtools</category>
      <category>agents</category>
      <category>machinelearning</category>
    </item>
    <item>
      <title>Google is embedding an agent in Android. Your app is now an API.</title>
      <dc:creator>albe_sf</dc:creator>
      <pubDate>Mon, 18 May 2026 15:03:17 +0000</pubDate>
      <link>https://dev.to/albertomontagnese/google-is-embedding-an-agent-in-android-your-app-is-now-an-api-28kl</link>
      <guid>https://dev.to/albertomontagnese/google-is-embedding-an-agent-in-android-your-app-is-now-an-api-28kl</guid>
      <description>&lt;p&gt;Google's pre-I/O announcements confirmed what many of us have been expecting: the next major platform shift isn't a new device, but a new layer of intelligence embedded directly into the operating system. With Gemini Intelligence, the AI is moving from a chatbot you open into an agentic layer that lives underneath Android, with the ability to operate across apps to complete tasks. This isn't just a feature update; it's the beginning of a fundamental change in how we should think about building mobile experiences.&lt;/p&gt;

&lt;h2&gt;
  
  
  from chatbot to os layer
&lt;/h2&gt;

&lt;p&gt;For the past few years, AI on mobile has been largely confined to specific apps or voice assistants. You open a chat window, you type a query, you get a response. Gemini Intelligence is designed to break that model. It's an underlying service intended to understand the context on your screen and execute multi-step, autonomous actions without you needing to switch between applications.&lt;/p&gt;

&lt;p&gt;The ambition is to move from reactive assistance to proactive task completion. The demos describe workflows like the system finding a class syllabus in an email, extracting the required textbook titles, and then adding them to a shopping cart—a sequence that currently requires manual context switching and user input across multiple UIs. This implies a system where the OS itself becomes the primary user, and our apps become tools it can wield.&lt;/p&gt;

&lt;h2&gt;
  
  
  building for an agent
&lt;/h2&gt;

&lt;p&gt;This shift has direct implications for developers. If the OS can operate your app on a user's behalf, your app needs to expose its capabilities in a machine-readable way. The traditional GUI is no longer the only interface that matters. You now have to design an API for an AI agent.&lt;/p&gt;

&lt;p&gt;Features like "Create My Widget," where a user describes a widget in natural language and Gemini generates it by pulling data from different services, signal this new direction. It suggests a future where apps declare their capabilities, intents, and data sources to the OS. While the exact implementation details are not public, one could imagine a manifest or configuration file where you define your app's agent-callable functions.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"ai.google.com/gemini-intelligence"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"version"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"1.0"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"app_capabilities"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"intent"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"com.example.shop.ADD_TO_CART"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"entities"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"productName"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"productID"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"quantity"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"description"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Adds a specified product to the user's shopping cart."&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"intent"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"com.example.docs.FIND_DOCUMENT"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"entities"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"keyword"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"creationDate"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"description"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Finds a document based on keywords or creation date."&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"intent"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"com.example.music.PLAY_PLAYLIST"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"entities"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"playlistName"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"description"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Starts playback of a named playlist."&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This isn't about just handling intents as we do today. It's about exposing deeper, multi-step functionality that an autonomous agent can chain together with capabilities from other applications to fulfill a high-level user goal.&lt;/p&gt;
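
&lt;p&gt;As a rough illustration of how an agent might use such declarations, the sketch below loads a manifest in the same hypothetical shape as the JSON above and checks a two-step plan against it. The manifest format, the helper names, and the example plan are all assumptions for illustration, not a published Android or Gemini API.&lt;/p&gt;

```python
import json

# Hypothetical manifest in the same illustrative shape as the JSON above.
MANIFEST = json.loads("""
{
  "app_capabilities": [
    {"intent": "com.example.docs.FIND_DOCUMENT",
     "entities": ["keyword", "creationDate"]},
    {"intent": "com.example.shop.ADD_TO_CART",
     "entities": ["productName", "productID", "quantity"]}
  ]
}
""")


def build_registry(manifest):
    """Index the declared capabilities by intent name."""
    return {cap["intent"]: cap for cap in manifest["app_capabilities"]}


def validate_step(registry, intent, arguments):
    """Accept a step only if its intent is declared and it uses declared entities."""
    capability = registry.get(intent)
    if capability is None:
        return False
    return set(arguments).issubset(capability["entities"])


registry = build_registry(MANIFEST)

# Hypothetical plan: find the syllabus, then add a textbook to the cart.
plan = [
    ("com.example.docs.FIND_DOCUMENT", {"keyword": "syllabus"}),
    ("com.example.shop.ADD_TO_CART", {"productName": "Intro to Algorithms", "quantity": 1}),
]
checks = [validate_step(registry, intent, args) for intent, args in plan]
print(checks)
```

&lt;p&gt;Validating each step against what apps have declared is one way an orchestrating agent could refuse to call capabilities an app never exposed.&lt;/p&gt;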

&lt;h2&gt;
  
  
  the platform reset is here
&lt;/h2&gt;

&lt;p&gt;This move doesn't happen in a vacuum. It's a direct response to the broader industry's pivot towards agentic AI. By integrating these capabilities at the OS level, Google is positioning Android as a platform for agents, not just apps. This extends beyond the phone; the announcement of Android XR glasses powered by Gemini 2.5 Pro shows the ambition is for this intelligence layer to be present across different form factors.&lt;/p&gt;

&lt;p&gt;For builders, the takeaway is clear. The era of designing self-contained app experiences is giving way to a new model. We need to start thinking about our apps as a collection of services that can be discovered and orchestrated by a higher-level agent. The apps that thrive will be the ones that expose their functionality most effectively to this new intelligence layer. This is a platform reset, and it's time to start planning for it.&lt;/p&gt;

&lt;h2&gt;
  
  
  Sources
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://thenextweb.com/" rel="noopener noreferrer"&gt;Google I/O 2026: Gemini Intelligence, Googlebooks, Android XR glasses, and what to expect from the keynote&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://beebom.com/what-to-expect-google-io-2024-keynote/" rel="noopener noreferrer"&gt;What We're Expecting from Google I/O 2026 Keynote on May 19&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://towardsai.net/p/machine-learning/google-io-2024-everything-google-is-about-to-announce-on-may-14" rel="noopener noreferrer"&gt;Google I/O 2026: Everything Google Is About to Announce on May 19&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>ai</category>
      <category>llm</category>
      <category>agents</category>
      <category>devtools</category>
    </item>
    <item>
      <title>Google's I/O 2024 announcements just reset the AI developer stack</title>
      <dc:creator>albe_sf</dc:creator>
      <pubDate>Thu, 14 May 2026 06:56:29 +0000</pubDate>
      <link>https://dev.to/albertomontagnese/googles-io-2024-announcements-just-reset-the-ai-developer-stack-51id</link>
      <guid>https://dev.to/albertomontagnese/googles-io-2024-announcements-just-reset-the-ai-developer-stack-51id</guid>
      <description>&lt;p&gt;Google's I/O 2024 developer keynote just laid out a new, more powerful, and integrated stack for building AI products. The key takeaway isn't just one model or tool, but a cohesive set of components—from a frontier model with a massive context window to a production-ready open source model and a backend framework to wire it all together. For builders, this means it's time to re-evaluate your stack.&lt;/p&gt;

&lt;h2&gt;
  
  
  a 2m token context window changes the game
&lt;/h2&gt;

&lt;p&gt;The headline feature for many will be Gemini 1.5 Pro's context window growing to 2 million tokens, offered at first through a waitlist alongside the 1 million token public preview. This isn't an incremental update. A context window of this size allows an application to reason over entire codebases, multiple large documents, or long videos in a single pass. This fundamentally changes the architecture for context-aware applications, potentially simplifying or even replacing complex retrieval-augmented generation (RAG) pipelines that shuttle context in and out of a smaller window.&lt;/p&gt;

&lt;p&gt;For high-frequency or latency-sensitive tasks where the full context isn't needed, Google also introduced Gemini 1.5 Flash, a lighter-weight variant optimized for speed and efficiency. The combination provides two distinct options for developers: a massive-context model for deep, complex reasoning and a faster model for more common, high-volume tasks.&lt;/p&gt;

&lt;h2&gt;
  
  
  open source gets a real contender with gemma 2
&lt;/h2&gt;

&lt;p&gt;On the open-source front, the release of Gemma 2 is a significant development. The new family includes 2B, 9B, and 27B parameter models. The 27-billion parameter variant is particularly notable: Google says it delivers performance that surpasses models more than twice its size. This makes it a compelling choice for teams that want to self-host or fine-tune a powerful model without the infrastructure overhead of much larger models.&lt;/p&gt;

&lt;p&gt;Gemma 2 introduces a new architecture designed for performance and efficiency, using Grouped Query Attention (GQA) for faster inference. For developers building specialized applications, the ability to fine-tune a capable open model like Gemma 2 on proprietary data is a critical advantage.&lt;/p&gt;

&lt;h2&gt;
  
  
  firebase genkit: a new backend for your ai stack
&lt;/h2&gt;

&lt;p&gt;Perhaps the most practical announcement for day-to-day builders is Firebase Genkit, a new open-source framework for building AI-powered features in Node.js backends (with Go support coming soon). Genkit provides the plumbing to orchestrate multi-step AI workflows, manage prompts, call models, and integrate with services like vector databases.&lt;/p&gt;

&lt;p&gt;It's designed to be model-agnostic, with integrations for Gemini, open-source models via Ollama, and vector stores like Pinecone and Chroma. This addresses a common pain point for developers: the significant amount of boilerplate code required to build production-ready AI features. Genkit also includes a local developer UI for testing, debugging, and inspecting execution traces.&lt;/p&gt;

&lt;p&gt;Here's what a simple flow might look like in Genkit:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Sketch against the launch-era (0.5.x) Genkit packages; later releases reorganized these imports.&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;generate&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;@genkit-ai/ai&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;configureGenkit&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;@genkit-ai/core&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;defineFlow&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;@genkit-ai/flow&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;gemini15Pro&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;googleAI&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;@genkit-ai/googleai&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="nx"&gt;z&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;zod&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="nf"&gt;configureGenkit&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;plugins&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="nf"&gt;googleAI&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
  &lt;span class="p"&gt;],&lt;/span&gt;
  &lt;span class="na"&gt;logLevel&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;debug&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;enableTracingAndMetrics&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;

&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;menuSuggestionFlow&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;defineFlow&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
  &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;menuSuggestionFlow&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="na"&gt;inputSchema&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;z&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;object&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;dish&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;z&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;string&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;}),&lt;/span&gt;
    &lt;span class="na"&gt;outputSchema&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;z&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;object&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="na"&gt;suggestion&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;z&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;string&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;}),&lt;/span&gt;
  &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="k"&gt;async &lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt; &lt;span class="nx"&gt;dish&lt;/span&gt; &lt;span class="p"&gt;})&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;llmResponse&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;generate&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
      &lt;span class="na"&gt;model&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;gemini15Pro&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;`Suggest a creative and appealing menu description for a dish called: &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;dish&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;output&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="na"&gt;format&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;text&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;});&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="na"&gt;suggestion&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;llmResponse&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;text&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
    &lt;span class="p"&gt;};&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  the so-what for builders
&lt;/h2&gt;

&lt;p&gt;The announcements from Google I/O provide a more complete and accessible AI stack. You now have a top-tier proprietary model with a uniquely large context window, a competitive open-source model for custom deployments, and a dedicated backend framework to manage the complexity of building and deploying AI features. This combination lowers the barrier to entry for creating sophisticated, context-aware applications and provides the tooling to do it in a structured, production-ready way.&lt;/p&gt;

&lt;h2&gt;
  
  
  Sources
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://blog.google/" rel="noopener noreferrer"&gt;100 things we announced at I/O 2024&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://firebase.blog/posts/2024/05/introducing-firebase-genkit" rel="noopener noreferrer"&gt;Introducing Firebase Genkit&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>ai</category>
      <category>llm</category>
      <category>machinelearning</category>
      <category>devtools</category>
    </item>
    <item>
      <title>Your AI Agents Are Probably Accessing Data They Shouldn't</title>
      <dc:creator>albe_sf</dc:creator>
      <pubDate>Thu, 14 May 2026 06:44:25 +0000</pubDate>
      <link>https://dev.to/albertomontagnese/your-ai-agents-are-probably-accessing-data-they-shouldnt-463</link>
      <guid>https://dev.to/albertomontagnese/your-ai-agents-are-probably-accessing-data-they-shouldnt-463</guid>
      <description>&lt;p&gt;A new report on AI agent security confirms what many of us in the trenches have suspected: we are shipping agents with credentials and permissions that are fundamentally insecure. According to a global study, two-thirds of organizations using AI agents believe those agents have already accessed data beyond their intended scope. The core takeaway is that the identity and access management patterns we built for humans are failing for autonomous, millisecond-speed agents.&lt;/p&gt;

&lt;h2&gt;
  
  
  the detection-to-execution speed mismatch
&lt;/h2&gt;

&lt;p&gt;The fundamental problem is a mismatch of timescales. The study found that it takes organizations an average of 14 hours to detect a compromised AI agent. An agent, however, operates in milliseconds. That massive gap between machine execution speed and human detection speed creates a critical window of vulnerability. A misconfigured or compromised agent can move laterally across multiple core systems using valid credentials long before a human security team even receives an alert.&lt;/p&gt;

&lt;p&gt;This isn't a hypothetical risk. The same report indicates that 61% of organizations have had to revoke or rotate AI agent credentials due to a suspected exposure. The issue isn't that agents are 'breaking in' through novel exploits; they are being given keys to the front door. The problem is one of authorized access that isn't, and cannot be, governed effectively on a human timeline.&lt;/p&gt;

&lt;h2&gt;
  
  
  static credentials are a ticking time bomb
&lt;/h2&gt;

&lt;p&gt;The root of this vulnerability lies in our continued reliance on static, long-lived credentials. We're treating agents like we treat a monolithic application server from 2015, handing them an API key that lives for months or years and often has broad permissions. More than four out of five organizations surveyed stated that a single compromised credential could impact multiple major systems.&lt;/p&gt;

&lt;p&gt;This pattern is familiar to any of us who have shipped a system under pressure. You create a service account, generate a key, and embed it in a configuration file or environment variable. It looks something like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"production"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"database_url"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"..."&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"billing_api_key"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"sk_live_a1b2c3d4e5f6..."&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"storage_service_account_key"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"{&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;type&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;: &lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;service_account&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s2"&gt;, ...}"&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"staging"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"database_url"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"..."&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"billing_api_key"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"sk_test_..."&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"storage_service_account_key"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"..."&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is a liability. That &lt;code&gt;billing_api_key&lt;/code&gt; is a persistent secret. If the agent's host environment is compromised, or if the agent itself has a flaw that allows it to leak its own environment, that key is now active in the wild until someone manually revokes it. Given the 14-hour average detection time, the potential damage is significant.&lt;/p&gt;

&lt;h2&gt;
  
  
  towards ephemeral, just-in-time identity
&lt;/h2&gt;

&lt;p&gt;The report points towards a different model: ephemeral identity. Instead of issuing long-lived keys, agents should be granted credentials that are created just-in-time for a specific task and expire immediately afterward. This approach treats identity not as a static property but as a temporary, dynamically scoped state.&lt;/p&gt;

&lt;p&gt;Implementing this isn't trivial. It requires an infrastructure that can continuously govern agents at runtime, creating and destroying credentials on demand based on the immediate context of the agent's task. But it's the only model that closes the speed gap between machine action and human oversight. If a credential only lives for 500 milliseconds, the window for misuse shrinks dramatically.&lt;/p&gt;
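
&lt;p&gt;As a minimal sketch of the idea, assuming a trusted broker that holds the signing key on the agent's behalf, a short-lived credential can carry its own scope and expiry and be checked on every call. The names &lt;code&gt;mint_token&lt;/code&gt;, &lt;code&gt;authorize&lt;/code&gt;, and &lt;code&gt;BROKER_KEY&lt;/code&gt; are hypothetical, and a production system would more likely lean on its cloud provider's short-lived credential service than on hand-rolled tokens.&lt;/p&gt;

```python
import base64
import hashlib
import hmac
import json
import secrets
import time

# Hypothetical signing key held by the credential broker, never by the agent.
BROKER_KEY = secrets.token_bytes(32)


def mint_token(agent_id, scope, ttl_seconds, now=None):
    """Issue a signed credential bound to one agent, one scope, and a short lifetime."""
    issued_at = time.time() if now is None else now
    claims = {"agent": agent_id, "scope": scope, "exp": issued_at + ttl_seconds}
    payload = base64.urlsafe_b64encode(json.dumps(claims, sort_keys=True).encode())
    signature = hmac.new(BROKER_KEY, payload, hashlib.sha256).hexdigest()
    return payload.decode() + "." + signature


def authorize(token, required_scope, now=None):
    """Return True only if the token is authentic, unexpired, and scoped to this action."""
    current = time.time() if now is None else now
    payload, _, signature = token.partition(".")
    expected = hmac.new(BROKER_KEY, payload.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(signature, expected):
        return False  # tampered or forged
    claims = json.loads(base64.urlsafe_b64decode(payload))
    if current >= claims["exp"]:
        return False  # expired
    return claims["scope"] == required_scope


# The clock is injected here only to make the example deterministic.
token = mint_token("report-agent", "billing:read", ttl_seconds=0.5, now=1000.0)
print(authorize(token, "billing:read", now=1000.2))   # inside the 500 ms window
print(authorize(token, "billing:write", now=1000.2))  # wrong scope
print(authorize(token, "billing:read", now=1001.0))   # after expiry
```

&lt;p&gt;The point of the design is that a leaked token is only useful for one narrowly scoped action and only for a fraction of a second, so nobody has to notice the leak and revoke anything by hand.&lt;/p&gt;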

&lt;p&gt;As builders, we are moving from shipping code to shipping agents. These agents are not just tools; they are autonomous workers integrated into our core business systems. The study's finding that companies are already spending over $1 million on average to manage AI agent security issues shows the financial cost of getting this wrong. We need to stop handing them the equivalent of a master keycard and start building systems that grant access with the precision and speed that these new workers require.&lt;/p&gt;

&lt;h2&gt;
  
  
  Sources
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://www.prnewswire.com/" rel="noopener noreferrer"&gt;The 2026 State of AI Agent Identity Security&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>ai</category>
      <category>agents</category>
      <category>devtools</category>
      <category>security</category>
    </item>
    <item>
      <title>GitHub's New Certification Is a Spec For the Modern AI Engineer</title>
      <dc:creator>albe_sf</dc:creator>
      <pubDate>Thu, 14 May 2026 06:15:19 +0000</pubDate>
      <link>https://dev.to/albertomontagnese/githubs-new-certification-is-a-spec-for-the-modern-ai-engineer-25ib</link>
      <guid>https://dev.to/albertomontagnese/githubs-new-certification-is-a-spec-for-the-modern-ai-engineer-25ib</guid>
      <description>&lt;p&gt;GitHub just quietly released a new role-based certification, and it's one of the highest-signal documents I've seen for where our jobs are headed. The 'GitHub Certified: Agentic AI Developer' exam is a spec sheet for the skills required to build and ship AI agents in production. It confirms the shift we've all felt: moving from prompt-level hacking to designing, supervising, and operating complex, stateful systems.&lt;/p&gt;

&lt;h2&gt;
  
  
  from prompt engineering to system integration
&lt;/h2&gt;

&lt;p&gt;The skills listed for the new GH-600 exam are not about crafting the perfect prompt. They are about system-level concerns. The exam covers how to "configure tools, permissions, and environments for agents." This is the language of infrastructure and operations, not just conversational design. It signals that the core work is no longer just coaxing a model to produce a good output, but integrating it safely and reliably into a larger software development lifecycle.&lt;/p&gt;

&lt;p&gt;Building a real agent requires you to think about its environment. What tools can it call? What are its permissions? Can it write to the file system? Does it have network access? These aren't model problems; they are application security and architecture problems. The certification's focus here tells you that building a secure, contained environment for your agent is now a baseline competency.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Example: Running an agent in a constrained environment&lt;/span&gt;
&lt;span class="c"&gt;# This isn't from the certification, but illustrates the principle.&lt;/span&gt;

podman run &lt;span class="nt"&gt;--rm&lt;/span&gt; &lt;span class="nt"&gt;-it&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--security-opt&lt;/span&gt; no-new-privileges &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--cap-drop&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;ALL &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--network&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;none &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-v&lt;/span&gt; ./agent-workspace:/app/workspace:Z &lt;span class="se"&gt;\&lt;/span&gt;
  my-agent-image:latest &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--instructions&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"Analyze the data in /app/workspace/input.csv and write a report to /app/workspace/output.md"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Treating agents as tier-one applications that require consistent, governed environments is the new standard. This is a world away from tweaking a prompt in a playground.&lt;/p&gt;

&lt;h2&gt;
  
  
  managing state and long-running execution
&lt;/h2&gt;

&lt;p&gt;Another key domain in the certification is the ability to "manage memory, state, and long-running execution." This is the single biggest differentiator between a simple AI-powered feature and a true agent. Agents are not stateless functions. They have goals, they have memory of past actions, and they operate over time. This introduces a host of engineering challenges that are familiar to anyone who has built distributed systems.&lt;/p&gt;

&lt;p&gt;How does your agent persist its state? If the process dies, can it resume its work? How do you handle memory growth in a process that might run for hours or days? These are the questions that separate toy projects from production systems. The fact that GitHub is testing for this shows that the industry expects developers to have answers. You are no longer just a model user; you are the operator of a persistent, autonomous process.&lt;/p&gt;
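
&lt;p&gt;One common answer to the persistence and resume questions is a checkpoint written after every completed step. The sketch below is illustrative rather than anything the certification prescribes: the file layout, the &lt;code&gt;run_agent&lt;/code&gt; loop, and the &lt;code&gt;crash_at&lt;/code&gt; test hook are all assumptions made for the example.&lt;/p&gt;

```python
import json
import os
import tempfile


def save_checkpoint(path, state):
    """Write state atomically so a crash never leaves a half-written checkpoint."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w") as handle:
        json.dump(state, handle)
        handle.flush()
        os.fsync(handle.fileno())
    os.replace(tmp_path, path)  # atomic rename within the same directory


def load_checkpoint(path):
    """Return the last saved state, or a fresh one if no checkpoint exists."""
    if not os.path.exists(path):
        return {"next_step": 0, "results": []}
    with open(path) as handle:
        return json.load(handle)


def run_agent(steps, path, crash_at=None):
    """Run steps in order, checkpointing after each one; a rerun resumes where it stopped."""
    state = load_checkpoint(path)
    for index in range(state["next_step"], len(steps)):
        if crash_at is not None and index == crash_at:
            raise RuntimeError("simulated crash")  # test hook, not real agent logic
        state["results"].append(steps[index]())
        state["next_step"] = index + 1
        save_checkpoint(path, state)
    return state["results"]
```

&lt;p&gt;If the process dies partway through, calling &lt;code&gt;run_agent&lt;/code&gt; again with the same path skips the steps that already finished. Note that this only gives clean resume semantics when each step is safe to retry, since a crash between finishing a step and saving the checkpoint would run that step twice.&lt;/p&gt;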

&lt;h2&gt;
  
  
  evaluation, orchestration, and human oversight
&lt;/h2&gt;

&lt;p&gt;The final piece of the puzzle is about reliability and control. The certification requires developers to know how to "evaluate and improve agent performance," "coordinate multi-agent workflows," and "implement guardrails and human-in-the-loop systems."&lt;/p&gt;

&lt;p&gt;This is the senior-level skillset. Evaluating an agent isn't about running a benchmark once. It's about continuous monitoring and creating feedback loops for improvement. Coordinating multi-agent systems is an architecture problem, requiring you to break down complex tasks and manage communication between specialized agents. And most critically, implementing guardrails and human-in-the-loop (HITL) systems is an admission that these systems are not perfectly reliable. The most important skill is knowing how to design for failure and ensure a human can intervene when the agent gets lost or goes off the rails.&lt;/p&gt;
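
&lt;p&gt;A guardrail with a human in the loop can be as small as a gate in front of every tool call. The following is a minimal sketch, not a pattern taken from the exam: the tool names, the two risk tiers, and the &lt;code&gt;ask_human&lt;/code&gt; callback are hypothetical.&lt;/p&gt;

```python
# Hypothetical tool names and risk tiers, purely for illustration.
AUTO_APPROVED = {"read_file", "search_docs"}
NEEDS_HUMAN = {"write_file", "send_email"}


def gate_tool_call(tool_name, arguments, ask_human):
    """Decide whether an agent's tool call may run.

    ask_human is a callback that shows the request to a reviewer
    and returns True to approve or False to reject.
    """
    if tool_name in AUTO_APPROVED:
        return "allowed"
    if tool_name in NEEDS_HUMAN:
        return "allowed" if ask_human(tool_name, arguments) else "denied"
    return "denied"  # default-deny anything not explicitly listed


def cautious_reviewer(tool_name, arguments):
    """Stand-in for a real approval UI: only approves writes inside the workspace."""
    return str(arguments.get("path", "")).startswith("/app/workspace/")


print(gate_tool_call("read_file", {"path": "/etc/hosts"}, cautious_reviewer))
print(gate_tool_call("write_file", {"path": "/app/workspace/out.md"}, cautious_reviewer))
print(gate_tool_call("write_file", {"path": "/etc/passwd"}, cautious_reviewer))
print(gate_tool_call("delete_repo", {}, cautious_reviewer))
```

&lt;p&gt;The default-deny branch is the part that matters most: a tool the designers never considered should fail closed rather than run unreviewed.&lt;/p&gt;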

&lt;p&gt;The takeaway here is clear. The era of casual experimentation is over. The skills being codified by this certification are about building robust, observable, and controllable AI systems. It's a significant shift in what it means to be a developer in the agentic era. This exam isn't just a way to get a new badge; it's a study guide for staying relevant.&lt;/p&gt;

&lt;h2&gt;
  
  
  Sources
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://techcommunity.microsoft.com/t5/github-community-blog/new-github-certified-agentic-ai-developer/ba-p/4134423" rel="noopener noreferrer"&gt;New GitHub Certified: Agentic AI Developer - Microsoft Community Hub&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>ai</category>
      <category>agents</category>
      <category>devtools</category>
      <category>career</category>
    </item>
    <item>
      <title>Anthropic's 'Dangerous' AI and the Hard Reality of Auditing Code</title>
      <dc:creator>albe_sf</dc:creator>
      <pubDate>Wed, 13 May 2026 19:06:59 +0000</pubDate>
      <link>https://dev.to/albertomontagnese/anthropics-dangerous-ai-and-the-hard-reality-of-auditing-code-2j56</link>
      <guid>https://dev.to/albertomontagnese/anthropics-dangerous-ai-and-the-hard-reality-of-auditing-code-2j56</guid>
      <description>&lt;p&gt;Anthropic's latest model, Claude Mythos, was internally deemed too 'dangerously good' at finding security vulnerabilities for a public release. But when tested against the battle-hardened &lt;code&gt;curl&lt;/code&gt; codebase, it exposed the gap between marketing hype and engineering reality, providing a critical lesson for anyone building with AI security tools. The takeaway is not that these models are useless, but that their output is a signal that still requires rigorous human verification.&lt;/p&gt;

&lt;h2&gt;
  
  
  what is claude mythos
&lt;/h2&gt;

&lt;p&gt;Anthropic announced that an internal AI model, Claude Mythos, demonstrated a powerful, emergent capability for discovering and exploiting software vulnerabilities. The capabilities were reportedly so advanced that the company restricted access, providing it only to a select group of organizations to allow them to patch critical flaws before a potential wider release. The model allegedly found thousands of high-severity vulnerabilities across major operating systems and browsers. This raised an immediate question for builders: are we on the verge of fully automated security auditing, or is this another case of over-indexing on a model's potential?&lt;/p&gt;

&lt;h2&gt;
  
  
  the curl test case
&lt;/h2&gt;

&lt;p&gt;The answer came from a real-world test. Daniel Stenberg, creator of &lt;code&gt;curl&lt;/code&gt;, was granted indirect access to a Mythos analysis of his project's 176,000 lines of C code. The model returned five 'confirmed security vulnerabilities'.&lt;/p&gt;

&lt;p&gt;The result after human review was less dramatic. Of the five findings, four were false positives. One was a legitimate, low-severity bug. This outcome on a mature, heavily scrutinized project like &lt;code&gt;curl&lt;/code&gt; is telling. It suggests that while AI can parse massive codebases and identify potential issues at scale, its signal-to-noise ratio is a critical variable. An AI's declaration of a 'confirmed' vulnerability is not the end of an investigation; it is the start.&lt;/p&gt;

&lt;h2&gt;
  
  
  ai output is a signal, not a verdict
&lt;/h2&gt;

&lt;p&gt;For engineers integrating AI into security pipelines, this is the core lesson. These models are powerful pattern-matchers, but they lack the true context and world model of a seasoned security researcher. They will flag code that looks like a known vulnerability pattern, even when idiomatic usage or surrounding logic renders it harmless. A report from a model like Mythos is not a finished list of CVEs. It's a prioritized list of areas for human experts to investigate.&lt;/p&gt;

&lt;p&gt;Your internal tooling and workflow must reflect this. When an AI flags a potential issue, the process should treat it as an assertion to be validated, not a fact to be remediated. Imagine an automated report from a similar tool:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"vulnerability_id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"AI-GEN-004-RCE"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"file_path"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"/src/app/utils/parser.c"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"line_number"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;242&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"severity"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Critical"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"cwe"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"CWE-120: Buffer Copy without Checking Size of Input ('Classic Buffer Overflow')"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"confidence"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"High"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"description"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"The function `parse_user_input` uses `strcpy` to copy a user-provided buffer `input_buffer` to a fixed-size local variable `dest_buffer`. This is a potential buffer overflow vulnerability if the source buffer exceeds the destination size."&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"recommendation"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Replace `strcpy` with `strncpy` or `snprintf` to prevent buffer overflows by specifying the maximum number of bytes to copy."&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This looks plausible. But without a human checking if &lt;code&gt;input_buffer&lt;/code&gt; is sanitized or length-checked upstream, acting on this report alone is premature. The value is not in the AI's conclusion, but in its ability to direct limited human attention to line 242.&lt;/p&gt;
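
&lt;p&gt;One way to encode "assertion to be validated" in tooling is to give every AI finding a review status that only a human can change, and to let nothing reach remediation until it is confirmed. This is a sketch under that assumption. The &lt;code&gt;Finding&lt;/code&gt; fields mirror the example report above, while the status values, helper names, and reviewer handle are invented for illustration.&lt;/p&gt;

```python
from dataclasses import dataclass, field


@dataclass
class Finding:
    """One AI-reported issue; it stays unverified until a human rules on it."""
    finding_id: str
    file_path: str
    line_number: int
    claimed_severity: str
    status: str = "needs_review"
    reviewer_notes: list = field(default_factory=list)


def record_review(finding, reviewer, is_real, note):
    """Store a human verdict; the model's own confidence never changes the status."""
    finding.status = "confirmed" if is_real else "false_positive"
    finding.reviewer_notes.append(f"{reviewer}: {note}")
    return finding


def actionable(findings):
    """Only human-confirmed findings are eligible for remediation tickets."""
    return [item for item in findings if item.status == "confirmed"]


report = Finding("AI-GEN-004-RCE", "/src/app/utils/parser.c", 242, "Critical")
print(len(actionable([report])))  # nothing is actionable before review

record_review(report, "reviewer-1", False, "input length is checked by the caller")
print(report.status, len(actionable([report])))
```

&lt;p&gt;Keeping the model's claimed severity separate from the human-set status also makes the false-positive rate easy to measure over time, which is the number that tells you how much to trust the tool.&lt;/p&gt;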

&lt;h2&gt;
  
  
  what this means for builders
&lt;/h2&gt;

&lt;p&gt;The Mythos-on-&lt;code&gt;curl&lt;/code&gt; episode is a necessary recalibration. AI will undoubtedly change security auditing, but it will not eliminate the need for human expertise. It transforms the task from finding a needle in a haystack to sorting a pile of needles and pins. For builders, the mandate is clear: build systems that leverage AI for signal generation, but design workflows that depend on human experts for verification. Do not ship a system that blindly trusts an AI's security assessment. The real danger isn't a rogue AI hacker, but an engineering team that outsources its judgment to one.&lt;/p&gt;

&lt;h2&gt;
  
  
  Sources
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://www.anthropic.com/" rel="noopener noreferrer"&gt;Anthropic&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>ai</category>
      <category>machinelearning</category>
      <category>agents</category>
      <category>programming</category>
    </item>
    <item>
      <title>Anthropic on AWS is Not What You Think</title>
      <dc:creator>albe_sf</dc:creator>
      <pubDate>Wed, 13 May 2026 19:01:56 +0000</pubDate>
      <link>https://dev.to/albertomontagnese/anthropic-on-aws-is-not-what-you-think-8cm</link>
      <guid>https://dev.to/albertomontagnese/anthropic-on-aws-is-not-what-you-think-8cm</guid>
      <description>&lt;p&gt;Anthropic's release of the Claude Platform on AWS is the most significant infrastructure shift for builders since model-specific SDKs. It’s not another managed model offering via Bedrock; it’s Anthropic’s full, cutting-edge API stack deployed on AWS infrastructure, accessible through native AWS endpoints. This solves the primary enterprise adoption hurdles—security, billing, and procurement—at the source, making Claude a legitimate alternative to Azure OpenAI for serious AWS shops.&lt;/p&gt;

&lt;h2&gt;
  
  
  what actually changed
&lt;/h2&gt;

&lt;p&gt;On May 11, Anthropic announced the Claude Platform on AWS. Unlike the existing Amazon Bedrock integration, which offers specific Claude models as part of a multi-vendor catalog, this is a dedicated, Anthropic-managed environment running on AWS hardware. For builders, this means you get the best of both worlds: direct access to Anthropic's complete, up-to-the-minute feature set—including the full Messages API, the Files API, Managed Agents, and tool use—while operating within your existing AWS environment.&lt;/p&gt;

&lt;p&gt;The key differences are in the plumbing. You interact with it via native AWS endpoints. Authentication is handled by AWS IAM, not by a separate Anthropic API key you have to manage and rotate. Most importantly, billing is consolidated directly into your AWS account. This isn't a minor convenience; it's a fundamental change that removes massive organizational friction.&lt;/p&gt;

&lt;h2&gt;
  
  
  the enterprise integration tax
&lt;/h2&gt;

&lt;p&gt;For any large organization, adopting a new AI vendor is a procurement and security nightmare. It requires new contracts, new security reviews for data handling, and a separate billing pipeline that finance has to approve. While Bedrock partially solved this by putting various models under a single AWS bill, it often lags behind the native provider's API in terms of features and model availability. You get the convenience, but you sacrifice access to the latest capabilities.&lt;/p&gt;

&lt;p&gt;The new platform collapses this trade-off. A team can now use their existing AWS enterprise agreement, leverage pre-approved IAM roles and policies for access control, and have all of their Claude usage appear as a line item on their monthly AWS bill. The CISO is happy because access is governed by the same robust IAM system used for everything else. The finance department is happy because there isn't a new vendor to onboard. And you, the builder, are happy because you get direct access to the latest from Anthropic without fighting a six-month procurement battle.&lt;/p&gt;

&lt;p&gt;Here’s what invoking a model on this new platform might look like. Note that you're using an AWS SDK like &lt;code&gt;boto3&lt;/code&gt; to call an Anthropic-specific service endpoint, not the generic Bedrock one.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;boto3&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;

&lt;span class="n"&gt;session&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;boto3&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Session&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;profile_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;my-aws-profile&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;region_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;us-east-1&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Note the service name is 'anthropic', not 'bedrock-runtime'
&lt;/span&gt;&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;session&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;client&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;anthropic&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;invoke_model&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;modelId&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;claude-opus-4-7&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;contentType&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;application/json&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;accept&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;application/json&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;body&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dumps&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;anthropic_version&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;bedrock-2023-05-31&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;max_tokens&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;2048&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;messages&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
            &lt;span class="p"&gt;{&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Explain the difference between Anthropic on AWS and Claude on Bedrock.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
            &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="p"&gt;})&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;response_body&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;loads&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;body&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;read&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response_body&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This looks familiar, but the &lt;code&gt;service_name&lt;/code&gt; argument and the &lt;code&gt;modelId&lt;/code&gt; string are doing all the work, routing your request through AWS's front door to Anthropic's dedicated infrastructure.&lt;/p&gt;

&lt;h2&gt;
  
  
  the so-what for builders
&lt;/h2&gt;

&lt;p&gt;This move signals a new phase in the AI platform wars. It’s no longer just about having the best model; it’s about having the most seamless enterprise deployment story. By embedding its native platform inside AWS, Anthropic is meeting enterprise clients where they are, offering a path of least resistance to adopt its latest technology. It’s a direct challenge to the tight integration of OpenAI models within the Azure ecosystem.&lt;/p&gt;

&lt;p&gt;For engineers and technical leads inside companies heavily invested in AWS, the decision of which frontier model to use just got a lot more interesting. The excuse that "it's not integrated with our cloud" is gone. The friction is gone. Now the choice between Claude and its competitors can be based purely on capability, performance, and cost, as it should be.&lt;/p&gt;

&lt;h2&gt;
  
  
  Sources
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://docs.anthropic.com/claude/reference/changelog" rel="noopener noreferrer"&gt;Claude API Docs - Changelog&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>ai</category>
      <category>llm</category>
      <category>claude</category>
      <category>devops</category>
    </item>
  </channel>
</rss>
