<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: albe_sf</title>
    <description>The latest articles on DEV Community by albe_sf (@albertomontagnese).</description>
    <link>https://dev.to/albertomontagnese</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3928059%2F8788e7f6-c941-4959-b1cf-18686efc9034.jpg</url>
      <title>DEV Community: albe_sf</title>
      <link>https://dev.to/albertomontagnese</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/albertomontagnese"/>
    <language>en</language>
    <item>
      <title>Snowflake is Bringing the AI Factory to Your Data Warehouse</title>
      <dc:creator>albe_sf</dc:creator>
      <pubDate>Mon, 22 Jun 2026 15:02:19 +0000</pubDate>
      <link>https://dev.to/albertomontagnese/snowflake-is-bringing-the-ai-factory-to-your-data-warehouse-1kp0</link>
      <guid>https://dev.to/albertomontagnese/snowflake-is-bringing-the-ai-factory-to-your-data-warehouse-1kp0</guid>
      <description>&lt;p&gt;The wall between the data warehouse and the AI development environment is coming down. Snowflake’s recent platform announcements aim to make your data cloud the default place to build and run enterprise AI, not just the place where your data sits.&lt;/p&gt;

&lt;h2&gt;
  
  
  AI functions directly in SQL
&lt;/h2&gt;

&lt;p&gt;The most accessible entry point into Snowflake's AI stack is the expansion of Cortex AI functions. These are SQL-callable functions that give analysts and engineers direct access to large language models from providers like OpenAI, Anthropic, and Meta from within a standard query. The key is that this happens inside Snowflake's secure perimeter, eliminating the need to move sensitive data to an external service for inference.&lt;/p&gt;

&lt;p&gt;Functions like &lt;code&gt;SENTIMENT&lt;/code&gt;, &lt;code&gt;SUMMARIZE&lt;/code&gt;, and &lt;code&gt;TRANSLATE&lt;/code&gt; handle common unstructured data tasks. For more complex needs, &lt;code&gt;AI_COMPLETE&lt;/code&gt; provides general access for reasoning and custom prompts, while &lt;code&gt;AI_EXTRACT&lt;/code&gt; can pull structured fields from documents. This approach allows teams to enrich data and automate parts of their pipelines using familiar SQL workflows.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="c1"&gt;-- Find all support tickets with negative feedback&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt;
  &lt;span class="n"&gt;ticket_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;customer_feedback&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;SNOWFLAKE&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;CORTEX&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;SENTIMENT&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;customer_feedback&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;feedback_sentiment&lt;/span&gt;
&lt;span class="k"&gt;FROM&lt;/span&gt;
  &lt;span class="n"&gt;support_tickets&lt;/span&gt;
&lt;span class="k"&gt;WHERE&lt;/span&gt;
  &lt;span class="n"&gt;feedback_sentiment&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



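&lt;p&gt;This isn't just about convenience. Because inference runs as part of the query, tasks like sentiment analysis, entity extraction, and content classification can be applied across entire tables in a single SQL statement, without exporting data or maintaining a separate inference service.&lt;/p&gt;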

&lt;h2&gt;
  
  
  A unified dev experience
&lt;/h2&gt;

&lt;p&gt;Beyond simple SQL functions, Snowflake is building a more integrated development environment. The introduction of Snowflake Notebooks, now in public preview, provides a single interface for Python, SQL, and Markdown. This environment is natively integrated with the rest of the platform, including Snowpark ML for model development, Streamlit for building data apps, and Cortex AI for LLM access.&lt;/p&gt;

&lt;p&gt;The goal is to shorten the path from prototype to production. By combining tools for data pipelines (like Dynamic Tables and Snowpipe Streaming) with a native notebook experience, developers can build and manage both the data transformations and the AI models in one place. Over 2,900 customers are already using Dynamic Tables to manage production-grade data pipelines.&lt;/p&gt;

&lt;h2&gt;
  
  
  The move toward agents and observability
&lt;/h2&gt;

&lt;p&gt;The highest level of abstraction in the new tooling comes with Cortex Agents. These are designed to handle multi-step, autonomous workflows that can reason across enterprise data and connect with external tools. The platform also introduced Snowflake Intelligence, a natural language interface for business users to ask complex questions without writing SQL.&lt;/p&gt;

&lt;p&gt;To manage this complexity, new observability features are also part of the release. Snowflake Trail, for instance, offers telemetry and distributed tracing to give developers visibility into how code executes within the platform. This becomes critical as applications move from simple queries to multi-step agentic workflows.&lt;/p&gt;

&lt;h2&gt;
  
  
  What this means for builders
&lt;/h2&gt;

&lt;p&gt;The center of gravity for AI development is shifting. Instead of moving massive datasets to external compute, the tooling is maturing to bring the compute and the development lifecycle directly to the data. For engineers and data scientists, this means spending less time on infrastructure setup and data movement, and more time building within a governed and secure environment. It makes the data cloud a more active participant in building AI products, rather than a passive repository.&lt;/p&gt;

&lt;h2&gt;
  
  
  Sources
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://www.snowflake.com/" rel="noopener noreferrer"&gt;https://www.snowflake.com/&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>ai</category>
      <category>machinelearning</category>
      <category>devtools</category>
      <category>python</category>
    </item>
    <item>
      <title>Gemma 2's Architecture: More Performance from Less Model</title>
      <dc:creator>albe_sf</dc:creator>
      <pubDate>Fri, 19 Jun 2026 15:02:16 +0000</pubDate>
      <link>https://dev.to/albertomontagnese/gemma-2s-architecture-more-performance-from-less-model-3moc</link>
      <guid>https://dev.to/albertomontagnese/gemma-2s-architecture-more-performance-from-less-model-3moc</guid>
      <description>&lt;p&gt;Google's new Gemma 2 models are a strong signal for where open-source AI is heading. The 27B parameter model delivers performance competitive with models more than twice its size, and the smaller variants punch well above their weight class. This isn't just about a larger training dataset; it’s the result of specific, practical architectural changes that prioritize efficiency.&lt;/p&gt;

&lt;h2&gt;
  
  
  a hybrid attention mechanism
&lt;/h2&gt;

&lt;p&gt;The core of any transformer is the attention mechanism, but standard self-attention has a quadratic complexity that makes it a computational bottleneck. Gemma 2 addresses this by not committing to just one attention strategy. Instead, it alternates between two types in its layers: local sliding window attention and full global attention.&lt;/p&gt;

&lt;p&gt;The local attention layers use a sliding window of 4096 tokens: each token attends only to the most recent 4096 positions, which keeps the cost of processing immediate context bounded. Interleaved with these are global attention layers that span the full 8192-token context length. This hybrid approach gives the model both the efficiency of local attention and the comprehensive context awareness of global attention, without paying the full quadratic cost at every single layer.&lt;/p&gt;
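&lt;p&gt;To make the local/global split concrete, here is a minimal Python sketch of the two causal masks. It rests on two assumptions. First, the window covers the current token plus the preceding &lt;code&gt;window - 1&lt;/code&gt; tokens. Second, local and global layers simply alternate, starting with a local layer. The function names and toy sizes are illustrative, not Gemma 2's actual implementation.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;def attention_mask(seq_len, window=None):
    """Causal attention mask as a list of boolean rows.

    Row i marks which key positions query position i may attend to.
    window=None gives full (global) causal attention; an integer gives a
    sliding window covering the current token and the window - 1 before it.
    """
    mask = []
    for i in range(seq_len):
        start = 0 if window is None else max(0, i - window + 1)
        allowed = set(range(start, i + 1))
        mask.append([j in allowed for j in range(seq_len)])
    return mask


def layer_kinds(num_layers):
    # Assumption: even-indexed layers are local, odd-indexed layers are global.
    return ["local" if idx % 2 == 0 else "global" for idx in range(num_layers)]


def attended_pairs(mask):
    # Number of (query, key) pairs that get scored: a proxy for attention compute.
    return sum(sum(row) for row in mask)


if __name__ == "__main__":
    seq_len, window = 8, 4  # toy sizes; Gemma 2 uses 8192 and 4096
    local = attention_mask(seq_len, window)
    global_ = attention_mask(seq_len)
    print(layer_kinds(4))
    print("local pairs:", attended_pairs(local))
    print("global pairs:", attended_pairs(global_))
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;With the toy sizes above, a local layer scores 26 query/key pairs and a global layer scores 36. At Gemma 2's real sizes (an 8192-token sequence and a 4096-token window), the same count gives roughly 25.2 million versus 33.6 million pairs. Each local layer therefore does about 75% of the attention work of a global one, and it only needs to keep the most recent 4096 keys and values around.&lt;/p&gt;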

&lt;h2&gt;
  
  
  smarter inference and stability
&lt;/h2&gt;

&lt;p&gt;Beyond the hybrid attention, Gemma 2 incorporates several other established techniques to improve performance and efficiency. One of the most significant is Grouped-Query Attention (GQA). Instead of each query head having its own key and value heads, GQA lets a group of query heads share a single key/value head. This shrinks the key/value cache that must be read from memory at every decoding step. As a result, inference needs less memory bandwidth and generation runs faster. All three Gemma 2 sizes (2B, 9B, and 27B) use GQA.&lt;/p&gt;
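&lt;p&gt;The sketch below shows the bookkeeping behind GQA. It maps each query head to the key/value head it reads from, and it shows how the KV cache shrinks as heads are grouped. Multi-Query Attention (MQA) is the limiting case with a single key/value head. The head counts, layer count, and head dimension are hypothetical values chosen for illustration, not an official Gemma 2 configuration, and 16-bit cache entries are assumed.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;def kv_head_for_query_head(q_head, num_q_heads, num_kv_heads):
    """Return the key/value head that a given query head reads from in GQA."""
    if num_q_heads % num_kv_heads != 0:
        raise ValueError("query heads must divide evenly into KV groups")
    group_size = num_q_heads // num_kv_heads
    return q_head // group_size


def kv_cache_bytes(num_layers, num_kv_heads, head_dim, seq_len, bytes_per_value=2):
    # Keys and values are both cached, hence the leading factor of 2.
    # bytes_per_value=2 assumes 16-bit (bf16/fp16) cache entries.
    return 2 * num_layers * num_kv_heads * head_dim * seq_len * bytes_per_value


if __name__ == "__main__":
    # Hypothetical config: 16 query heads sharing 8 KV heads.
    num_q, num_kv = 16, 8
    print([kv_head_for_query_head(h, num_q, num_kv) for h in range(num_q)])

    # Illustrative sizes only, not an official Gemma 2 configuration.
    layers, head_dim, seq_len = 42, 256, 8192
    mha = kv_cache_bytes(layers, num_q, head_dim, seq_len)  # one KV head per query head
    gqa = kv_cache_bytes(layers, num_kv, head_dim, seq_len)
    print(f"MHA cache: {mha / 2**30:.3f} GiB, GQA cache: {gqa / 2**30:.3f} GiB")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;With 16 query heads sharing 8 key/value heads, query heads 0 and 1 read from KV head 0, heads 2 and 3 read from KV head 1, and so on. The cache is half the size of the equivalent multi-head layout.&lt;/p&gt;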

&lt;p&gt;Training for the smaller models also got a strategic update. The 2B and 9B models were trained using knowledge distillation from a larger, more capable teacher model rather than just standard next-token prediction. This allows the smaller models to learn more nuanced patterns, leading to better performance for their size. Other stability-focused changes include using a hybrid of post-normalization and pre-normalization with RMSNorm and applying logit soft-capping to prevent instability during training.&lt;/p&gt;
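&lt;p&gt;Two of these ideas are compact enough to write down. The Python sketch below shows logit soft-capping as &lt;code&gt;cap * tanh(logit / cap)&lt;/code&gt;, which leaves small logits nearly unchanged but smoothly bounds large ones. It also shows a distillation loss that scores the student against the teacher's full next-token distribution instead of a single one-hot target. The cap value, the toy logits, and the teacher probabilities are assumptions for illustration only.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import math


def soft_cap(logit, cap):
    """Logit soft-capping: smoothly squashes a logit into the range (-cap, cap)."""
    return cap * math.tanh(logit / cap)


def softmax(logits):
    # Subtract the max for numerical stability before exponentiating.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_probs):
    """Cross-entropy of the student against the teacher's full distribution.

    Standard next-token training is the special case where teacher_probs is
    one-hot on the observed token.
    """
    student_probs = softmax(student_logits)
    return -sum(p_t * math.log(p_s)
                for p_t, p_s in zip(teacher_probs, student_probs) if p_t)


if __name__ == "__main__":
    print([round(soft_cap(x, 30.0), 2) for x in (1.0, 30.0, 300.0)])

    student = [2.0, 1.0, 0.1]
    teacher = [0.7, 0.2, 0.1]  # soft targets from a hypothetical teacher
    one_hot = [1.0, 0.0, 0.0]  # the single observed next token
    print(distillation_loss(student, teacher), distillation_loss(student, one_hot))
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;When the teacher distribution is one-hot, the loss reduces to ordinary next-token cross-entropy. That makes it easy to see what distillation adds: the student is also trained to place sensible probability on plausible tokens that did not appear in the data.&lt;/p&gt;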

&lt;h2&gt;
  
  
  what this means for builders
&lt;/h2&gt;

&lt;p&gt;The practical takeaway is that state-of-the-art open models are becoming more accessible. The efficiency gains mean you can run a model like Gemma 2 27B on a single NVIDIA H100 GPU or a comparable TPU host, reducing deployment costs. The smaller models are designed to be efficient enough for on-device and consumer-grade hardware.&lt;/p&gt;
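&lt;p&gt;A quick back-of-the-envelope check shows why a single accelerator is plausible. The sketch counts only the memory needed for the weights and assumes a nominal 27 billion parameters. The KV cache and activations add more on top, so treat these numbers as a lower bound.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;def weight_memory_gib(num_params, bytes_per_param):
    """Memory needed just to hold the model weights, in GiB."""
    return num_params * bytes_per_param / 2**30


if __name__ == "__main__":
    params = 27e9  # nominal parameter count taken from the model name
    for label, nbytes in [("bf16", 2), ("int8", 1), ("4-bit", 0.5)]:
        print(f"{label}: {weight_memory_gib(params, nbytes):.1f} GiB")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;At 16-bit precision the weights come to roughly 50 GiB. That fits within the 80 GB of memory on an H100, with room left for the KV cache. 8-bit and 4-bit quantization shrink the footprint further.&lt;/p&gt;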

&lt;p&gt;For builders, this lowers the barrier to entry for experimenting with and deploying high-quality open models. You can get started with a powerful instruction-tuned model locally using tools like Ollama.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;ollama run gemma2:27b
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
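&lt;p&gt;Once the model is pulled, Ollama also serves it over a local HTTP API, so you can call it from code. The Python sketch below uses only the standard library. It assumes Ollama is listening on its default address, &lt;code&gt;http://localhost:11434&lt;/code&gt;, and the helper names are hypothetical.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local address


def build_generate_request(prompt, model="gemma2:27b"):
    """Build a non-streaming POST request for Ollama's /api/generate endpoint."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )


def generate(prompt, model="gemma2:27b"):
    # Requires a running Ollama server with the model already pulled.
    with urllib.request.urlopen(build_generate_request(prompt, model)) as resp:
        return json.loads(resp.read())["response"]


# Example (needs a running server):
# print(generate("Explain sliding window attention in two sentences."))
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;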



&lt;p&gt;This trend toward architectural efficiency means the performance floor for open models is rising quickly. We are getting more intelligence per parameter, which is a more sustainable and ultimately more useful direction than simply chasing parameter counts.&lt;/p&gt;

&lt;p&gt;The release of Gemma 2 shows that the path forward for open models isn't just about scaling up. It's about clever architectural synthesis—combining proven techniques like sliding window attention, GQA, and knowledge distillation to create models that are both powerful and practical to run. For engineers building on top of these systems, this is a welcome and important shift.&lt;/p&gt;

&lt;h2&gt;
  
  
  Sources
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;a href="https://arxiv.org/abs/2406.16854" rel="noopener noreferrer"&gt;Gemma 2: Improving Open Language Models at a Practical Size (Technical Report)&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;  &lt;a href="https://huggingface.co/collections/google/gemma-2-release-667d73981872114f1eeb3a15" rel="noopener noreferrer"&gt;Gemma 2 on Hugging Face&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>ai</category>
      <category>llm</category>
      <category>machinelearning</category>
      <category>opensource</category>
    </item>
    <item>
      <title>Claude 3.5 Sonnet Isn't Just an Upgrade. It's a New Baseline.</title>
      <dc:creator>albe_sf</dc:creator>
      <pubDate>Wed, 17 Jun 2026 15:03:14 +0000</pubDate>
      <link>https://dev.to/albertomontagnese/claude-35-sonnet-isnt-just-an-upgrade-its-a-new-baseline-27be</link>
      <guid>https://dev.to/albertomontagnese/claude-35-sonnet-isnt-just-an-upgrade-its-a-new-baseline-27be</guid>
      <description>&lt;p&gt;Anthropic just reset the price-to-performance curve for frontier models. The new Claude 3.5 Sonnet is not an incremental update; it delivers intelligence exceeding the previous top-tier Claude 3 Opus, but at twice the speed and a fraction of the cost. This isn't just a new model—it's a new baseline for what you should expect from a workhorse AI, especially for complex coding and agentic tasks.&lt;/p&gt;

&lt;h2&gt;
  
  
  what changed: flagship intelligence at mid-tier cost
&lt;/h2&gt;

&lt;p&gt;The key takeaway is the compression of the intelligence-speed-cost tradeoff. Claude 3.5 Sonnet outperforms Claude 3 Opus on multiple graduate-level reasoning and coding proficiency benchmarks, including GPQA and HumanEval. But it's priced at the original Sonnet's rate: $3 per million input tokens and $15 per million output tokens.&lt;/p&gt;
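&lt;p&gt;The pricing is easy to turn into a per-request estimate. This sketch applies the $3 and $15 per-million-token rates to a hypothetical request. The Claude 3 Opus comparison uses Opus's list prices of $15 and $75 per million input and output tokens. The dictionary keys are just labels, not API model IDs, and the token counts are assumptions, so plug in your own.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# List prices in USD per million tokens: (input, output).
PRICES = {
    "claude-3-5-sonnet": (3.00, 15.00),
    "claude-3-opus": (15.00, 75.00),
}


def request_cost_usd(model, input_tokens, output_tokens):
    """Estimated cost of one call; input and output tokens are billed separately."""
    input_price, output_price = PRICES[model]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


if __name__ == "__main__":
    # Hypothetical request: ~2,000 prompt tokens and ~500 completion tokens.
    for model in PRICES:
        per_call = request_cost_usd(model, 2_000, 500)
        print(f"{model}: ${per_call:.4f} per call, ${per_call * 10_000:,.2f} per 10k calls")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;For 2,000 input and 500 output tokens, that works out to $0.0135 per call on Claude 3.5 Sonnet versus $0.0675 on Claude 3 Opus, a 5x difference.&lt;/p&gt;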

&lt;p&gt;For builders, the most significant metric comes from an internal agentic coding evaluation. Given a natural language description of a bug or feature, Claude 3.5 Sonnet solved 64% of the problems. Claude 3 Opus solved 38% on the same test. This isn't just a benchmark win; it's a step-change in reliability for autonomous code manipulation tasks like updating legacy applications or migrating codebases.&lt;/p&gt;

&lt;p&gt;It also operates at twice the speed of Claude 3 Opus, making it viable for more latency-sensitive applications like context-aware customer support and orchestrating multi-step workflows.&lt;/p&gt;

&lt;h2&gt;
  
  
  how to use it: api access and the new artifacts ui
&lt;/h2&gt;

&lt;p&gt;Accessing the model is straightforward. It's available through the Anthropic API, as well as on Amazon Bedrock and Google Cloud's Vertex AI. The integration is a simple model string update.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;anthropic&lt;/span&gt;

&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;anthropic&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Anthropic&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="c1"&gt;# defaults to os.environ.get("ANTHROPIC_API_KEY")
&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;message&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;claude-3-5-sonnet-20240620&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;max_tokens&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;4096&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Write a Python script to analyze a git repository and identify the top 5 contributors based on commit count.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

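&lt;span class="c1"&gt;# message.content is a list of content blocks; the first one holds the text reply
&lt;/span&gt;&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;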
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;More interestingly, Anthropic also launched a new feature called Artifacts on Claude.ai. When you ask the model to generate content like a code snippet, a document, or a website design, it appears in a dedicated window next to the conversation. You can see, edit, and build upon the generated content in real-time. This transforms the interaction from a simple chat to a collaborative workspace, integrating the AI's output directly into your workflow without constant copy-pasting.&lt;/p&gt;

&lt;h2&gt;
  
  
  the vision and agentic coding leap
&lt;/h2&gt;

&lt;p&gt;Beyond raw intelligence, Claude 3.5 Sonnet is now Anthropic's strongest vision model. It surpasses Opus on standard vision benchmarks, showing marked improvement in interpreting charts, graphs, and transcribing text from imperfect images. This has direct implications for applications in logistics, finance, and retail that need to extract structured data from visual inputs.&lt;/p&gt;
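&lt;p&gt;For visual use cases like these, images go into the same Messages API call as text. Each image is a base64-encoded content block placed before the question. The helper below is a hypothetical convenience function that builds that message. The file name and prompt in the usage comment are placeholders.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import base64


def image_message(image_bytes, question, media_type="image/png"):
    """Build a Messages API user turn with an image followed by a text prompt."""
    encoded = base64.standard_b64encode(image_bytes).decode("ascii")
    return {
        "role": "user",
        "content": [
            {
                "type": "image",
                "source": {"type": "base64", "media_type": media_type, "data": encoded},
            },
            {"type": "text", "text": question},
        ],
    }


# Usage (needs an API key and a real image file):
# import anthropic
# client = anthropic.Anthropic()
# with open("quarterly_chart.png", "rb") as f:  # hypothetical file
#     msg = image_message(f.read(), "Extract every data point in this chart as CSV.")
# reply = client.messages.create(
#     model="claude-3-5-sonnet-20240620", max_tokens=1024, messages=[msg])
# print(reply.content[0].text)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;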

&lt;p&gt;The jump in agentic coding performance is the real story for many of us. The ability to independently write, edit, and execute code with sophisticated reasoning is what we've been chasing. The 64% solve rate on Anthropic's internal eval suggests a higher degree of reliability for tasks that require understanding an existing codebase, reasoning about changes, and implementing them correctly. This makes it a more viable candidate for building agents that can genuinely offload development tasks, not just generate isolated snippets.&lt;/p&gt;

&lt;h2&gt;
  
  
  so what this week
&lt;/h2&gt;

&lt;p&gt;The release of a model that is simultaneously better, faster, and cheaper than the previous flagship is a significant event. For builders, it's an immediate signal to re-evaluate your model stack. Workflows that were too expensive or slow with Opus-level models may now be practical with Claude 3.5 Sonnet. The improvements in coding and vision open up new possibilities for more complex, autonomous agents. The tradeoff curve has shifted, and your default model choice for hard problems should probably shift with it.&lt;/p&gt;

&lt;h2&gt;
  
  
  Sources
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://www.anthropic.com/news/claude-3-5-sonnet" rel="noopener noreferrer"&gt;Introducing Claude 3.5 Sonnet&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>llm</category>
      <category>claude</category>
      <category>machinelearning</category>
    </item>
    <item>
      <title>Claude Fable 5 on Databricks is a step-change for agentic workflows</title>
      <dc:creator>albe_sf</dc:creator>
      <pubDate>Wed, 10 Jun 2026 15:03:37 +0000</pubDate>
      <link>https://dev.to/albertomontagnese/claude-fable-5-on-databricks-is-a-step-change-for-agentic-workflows-4bkj</link>
      <guid>https://dev.to/albertomontagnese/claude-fable-5-on-databricks-is-a-step-change-for-agentic-workflows-4bkj</guid>
      <description>&lt;p&gt;Anthropic's Claude Fable 5 is now generally available on Databricks, and it represents a meaningful capability jump for anyone building autonomous agents on enterprise data. This isn't just another incremental model update; it's a new class of model designed for the long-running, complex workflows that have broken previous generations of AI. The key takeaway is that we can now start delegating entire end-to-end workflows that previously required days or weeks of human effort.&lt;/p&gt;

&lt;h2&gt;
  
  
  what it is
&lt;/h2&gt;

&lt;p&gt;Claude Fable 5 is what Anthropic calls a "Mythos-class" model, built for tasks that are too complex or long-running for other models to handle. Databricks has made it available across AWS, Azure, and Google Cloud, accessible through their Unity AI Gateway. This provides a single, governable endpoint for developers to call the model, with all requests and responses logged for auditability.&lt;/p&gt;

&lt;p&gt;Databricks evaluated Fable 5 on its internal &lt;code&gt;OfficeQA Pro&lt;/code&gt; benchmark, which tests models on difficult document question-answering tasks that require file search, web search, and code execution. Fable 5 achieved 57.9% correctness, setting a new state-of-the-art and outperforming the prior flagship, Claude Opus 4.8, by over 20%.&lt;/p&gt;

&lt;h2&gt;
  
  
  what it means for builders
&lt;/h2&gt;

&lt;p&gt;The most significant change for engineers is Fable 5's reliability in delegating to parallel sub-agents. This is a critical function for building complex agentic systems that can, for example, triage production outages or perform deep analysis of a code repository's history. The model is also significantly better at interpreting dense technical images and screenshots, opening up more sophisticated document AI and multimodal workflows.&lt;/p&gt;

&lt;p&gt;However, this is a quality-first model, not an efficiency play. The performance gains come with trade-offs. Compared to Opus 4.8, Fable 5 is roughly 30% slower and generates 2.5 times more output tokens to answer the same question. This has real implications for both latency and cost. You wouldn't use this model for a simple summarization task; you bring it in for the multi-step, asynchronous jobs where correctness is the primary concern.&lt;/p&gt;

&lt;p&gt;Accessing it is straightforward if you're in the Databricks ecosystem. The Unity AI Gateway provides a standardized API, so switching models is largely a matter of changing the model name in the request rather than rewriting your application code.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"model"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"anthropic.claude-fable-5"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"messages"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"role"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"user"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"content"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Analyze the attached quarterly performance CSV, identify the top 3 variance drivers against the forecast, and generate a python script to visualize the results."&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"max_tokens"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;4096&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"temperature"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;0.2&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The governance layer is a key part of the offering. Administrators can set fine-grained permissions on which users or services can call the model, and every transaction is logged to Unity Catalog.&lt;/p&gt;

&lt;h2&gt;
  
  
  the so-what
&lt;/h2&gt;

&lt;p&gt;Claude Fable 5 on Databricks marks a clear milestone: frontier models are now capable enough to be trusted with sustained, autonomous work on high-value enterprise problems. For engineers building internal platforms, this unlocks the ability to create agents that can perform complex debugging, conduct deep research, or manage data workflows with less human supervision than ever before.&lt;/p&gt;

&lt;p&gt;The trade-offs in speed and cost are significant, but for the right class of problem, the performance jump is substantial. This release is less about a single new model and more about the maturation of the toolset for building and governing production-grade AI agents.&lt;/p&gt;

&lt;h2&gt;
  
  
  Sources
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://www.databricks.com/blog/claude-fable-5-now-available-databricks-fully-governed-through-unity-ai-gateway" rel="noopener noreferrer"&gt;Claude Fable 5 is now available on Databricks, fully governed through Unity AI Gateway&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.anthropic.com/news/claude-fable-5-mythos-5" rel="noopener noreferrer"&gt;Claude Fable 5 and Claude Mythos 5&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>ai</category>
      <category>llm</category>
      <category>machinelearning</category>
      <category>devtools</category>
    </item>
    <item>
      <title>Snowflake's Arctic Model is a Bet on Enterprise-Specific AI</title>
      <dc:creator>albe_sf</dc:creator>
      <pubDate>Mon, 08 Jun 2026 15:03:53 +0000</pubDate>
      <link>https://dev.to/albertomontagnese/snowflakes-arctic-model-is-a-bet-on-enterprise-specific-ai-1cl9</link>
      <guid>https://dev.to/albertomontagnese/snowflakes-arctic-model-is-a-bet-on-enterprise-specific-ai-1cl9</guid>
      <description>&lt;p&gt;A new large language model from Snowflake, named Arctic, is worth your attention this week. It’s an open-source model focused on enterprise workloads that uses a unique architecture to deliver high performance on specific tasks like SQL and code generation, all while maintaining impressive efficiency. This isn’t just another general-purpose model; its design choices have direct implications for developers building AI-powered tools.&lt;/p&gt;

&lt;h2&gt;
  
  
  a different kind of MoE
&lt;/h2&gt;

&lt;p&gt;At its core, Arctic employs a Dense-Mixture-of-Experts (MoE) hybrid transformer architecture. While MoE models are not new, Arctic’s implementation is distinct. It combines a 10B dense transformer model with a large number of 'experts', resulting in 480 billion total parameters.&lt;/p&gt;
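&lt;p&gt;The headline numbers follow from simple arithmetic. This sketch assumes each of the 128 experts holds roughly 3.66B parameters. That is the per-expert size Snowflake has described for Arctic, but treat it as an assumption here. The sketch shows both the total parameter count and the count touched per token with top-2 routing.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;def moe_param_counts(dense_b, num_experts, expert_b, top_k):
    """Total vs. per-token active parameters (in billions) for a dense + MoE hybrid."""
    total = dense_b + num_experts * expert_b  # every expert must be stored
    active = dense_b + top_k * expert_b       # only top_k experts run per token
    return total, active


if __name__ == "__main__":
    # Assumed expert size of ~3.66B parameters each.
    total, active = moe_param_counts(dense_b=10, num_experts=128, expert_b=3.66, top_k=2)
    print(f"total: {total:.1f}B, active per token: {active:.2f}B")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The result, about 478.5B total and 17.3B active, rounds to the 480B and 17B figures quoted for Arctic.&lt;/p&gt;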

&lt;p&gt;However, during inference, it only activates 17 billion of those parameters using a top-2 gating mechanism. This design aims for the best of both worlds: the vast knowledge capacity of a very large model, but the inference efficiency of a much smaller one. The architecture leverages 128 specialized experts, allowing for high performance with fewer active parameters compared to other models. This translates into significant cost and resource savings, a critical factor for deploying AI at scale.&lt;/p&gt;
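&lt;p&gt;Top-2 gating itself is a small routing step. The sketch below is a generic version in the style popularized by GShard and Mixtral. A router scores every expert, only the two best are run, and their outputs are mixed with renormalized weights. The toy scalar experts and router scores are hypothetical, and Arctic's production router may differ in details such as load balancing.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import math


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def top2_route(router_logits):
    """Pick the two highest-scoring experts and renormalize their gate weights."""
    probs = softmax(router_logits)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:2]
    norm = sum(probs[i] for i in top)
    return [(i, probs[i] / norm) for i in top]


def moe_layer(x, experts, router_logits):
    """Combine only the selected experts' outputs, weighted by their gates.

    The remaining experts are never evaluated, which is why compute scales
    with active rather than total parameters.
    """
    return sum(weight * experts[i](x) for i, weight in top2_route(router_logits))


if __name__ == "__main__":
    # Toy setup: 4 scalar "experts" instead of Arctic's 128 feed-forward blocks.
    experts = [lambda x, k=k: (k + 1) * x for k in range(4)]
    logits = [0.1, 2.0, -1.0, 1.5]  # hypothetical router scores for one token
    print(top2_route(logits))
    print(moe_layer(1.0, experts, logits))
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Only experts 1 and 3 are evaluated for this token. The other two cost nothing, which is the source of the gap between total and active parameters.&lt;/p&gt;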

&lt;h2&gt;
  
  
  built for enterprise tasks
&lt;/h2&gt;

&lt;p&gt;Most open-source LLMs are designed for a broad range of general tasks. Arctic is different by design, focusing specifically on enterprise-oriented needs. Its training curriculum was deliberately structured in three stages, with the latter two phases heavily emphasizing enterprise-focused skills with data for code, SQL, and STEM.&lt;/p&gt;

&lt;p&gt;This focus pays off in performance. Arctic shows strong results on benchmarks critical for developer tooling. It performs well on SQL generation (Spider), code generation (HumanEval+ and MBPP+), and instruction following (IFEval). For teams building AI co-pilots for databases or code, this specialized capability makes it a compelling alternative to more generalized models. The model is explicitly designed to be a workhorse for generating SQL queries and various types of code.&lt;/p&gt;

&lt;h2&gt;
  
  
  getting started with arctic
&lt;/h2&gt;

&lt;p&gt;Snowflake has released Arctic with an Apache 2.0 license, providing ungated access to the model weights and code for commercial use. This open approach is a significant advantage for builders who need transparency and the ability to customize.&lt;/p&gt;

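&lt;p&gt;You can run the model using various popular frameworks. For instance, the instruct-tuned version is available directly from Hugging Face and can be loaded with the &lt;code&gt;transformers&lt;/code&gt; library. Keep in mind that although only 17B parameters are active per token, all 480B parameters must still be held in memory, so self-hosting requires a multi-GPU server.&lt;br&gt;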
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;transformers&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;pipeline&lt;/span&gt;

&lt;span class="c1"&gt;# Use a pipeline as a high-level helper
&lt;/span&gt;&lt;span class="n"&gt;pipe&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;pipeline&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text-generation&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Snowflake/snowflake-arctic-instruct&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;trust_remote_code&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;messages&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Write a SQL query to find all users who signed up in the last 30 days and have made more than 5 purchases.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="c1"&gt;# The chat template is handled automatically by the pipeline
&lt;/span&gt;&lt;span class="n"&gt;outputs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;pipe&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;max_new_tokens&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;256&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="c1"&gt;# The pipeline returns a list with one result per input; the reply is the last message
&lt;/span&gt;&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;outputs&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;generated_text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Beyond self-hosting, Arctic is also available through services like NVIDIA NIM and can be deployed from Amazon SageMaker JumpStart, offering more managed deployment options.&lt;/p&gt;

&lt;h2&gt;
  
  
  the takeaway
&lt;/h2&gt;

&lt;p&gt;Snowflake Arctic is a practical model for a specific set of problems. Its unique MoE architecture delivers efficiency, while its training is laser-focused on the high-value enterprise tasks of code and SQL generation. For engineers building AI products in these domains, the combination of an open license, strong domain-specific performance, and architectural efficiency makes Arctic a model you should be evaluating this week.&lt;/p&gt;

&lt;h2&gt;
  
  
  sources
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://www.phdata.io/blog/what-is-snowflake-arctic/" rel="noopener noreferrer"&gt;https://www.phdata.io/blog/what-is-snowflake-arctic/&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.snowflake.com/en/data-cloud/arctic/" rel="noopener noreferrer"&gt;https://www.snowflake.com/en/data-cloud/arctic/&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.datacamp.com/tutorial/snowflake-arctic-tutorial" rel="noopener noreferrer"&gt;https://www.datacamp.com/tutorial/snowflake-arctic-tutorial&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>ai</category>
      <category>llm</category>
      <category>machinelearning</category>
      <category>opensource</category>
    </item>
    <item>
      <title>Apple's Private Cloud Compute Isn't Just About Privacy. It's a New Infrastructure Layer.</title>
      <dc:creator>albe_sf</dc:creator>
      <pubDate>Fri, 05 Jun 2026 15:03:56 +0000</pubDate>
      <link>https://dev.to/albertomontagnese/apples-private-cloud-compute-isnt-just-about-privacy-its-a-new-infrastructure-layer-3ie2</link>
      <guid>https://dev.to/albertomontagnese/apples-private-cloud-compute-isnt-just-about-privacy-its-a-new-infrastructure-layer-3ie2</guid>
      <description>&lt;p&gt;Apple just detailed Private Cloud Compute (PCC), the infrastructure powering its most intensive AI features. This is more than a privacy play. PCC is a new compute layer, built on custom Apple silicon, that sits between on-device processing and the conventional cloud, and it forces a different way of thinking about building intelligent apps. In effect, the device's trust boundary now extends into the data center.&lt;/p&gt;

&lt;h2&gt;
  
  
  what is private cloud compute
&lt;/h2&gt;

&lt;p&gt;Apple Intelligence operates on a hybrid model. By default, it uses powerful on-device models for tasks. But for more complex requests that need larger models, it can offload work to Private Cloud Compute. This isn't a standard cloud deployment. PCC is a completely new infrastructure tier built with custom Apple silicon server hardware. These servers run a hardened, minimal operating system derived from the foundations of iOS and macOS, designed to present an extremely narrow attack surface.&lt;/p&gt;

&lt;p&gt;The entire system is designed to provide the power of large-scale models without resorting to generic cloud processing of user data. It creates a middle ground between purely on-device computation and the full data exposure common in other cloud AI services. This architecture is Apple's answer to scaling AI capabilities while maintaining its privacy promises.&lt;/p&gt;

&lt;h2&gt;
  
  
  the non-negotiable guarantees
&lt;/h2&gt;

&lt;p&gt;Apple has designed PCC to make several hard guarantees about how it handles data. These aren't just policies; they are enforced by the architecture itself.&lt;/p&gt;

&lt;p&gt;First, all computation is stateless. User data is sent to PCC for the sole purpose of fulfilling a specific inference request. It is not retained, logged, or stored once the request is complete. The compute nodes are built so that user data cannot persist beyond the request that needed it.&lt;/p&gt;

&lt;p&gt;Second, user data is inaccessible to anyone at Apple. PCC is designed without privileged runtime interfaces, so even Apple's own operations staff cannot bypass these protections to view user data while it is being processed.&lt;/p&gt;

&lt;p&gt;Third, the system is designed for verifiable transparency. Apple states that independent security researchers can inspect the software images that run on PCC servers to verify these privacy claims. Before sending a request, the device verifies a cryptographic attestation of the PCC node's identity and software configuration. It only sends data to nodes running software that Apple has publicly released for inspection.&lt;/p&gt;
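
&lt;p&gt;The client-side rule behind that last guarantee can be sketched in a few lines of Python. This is a deliberately simplified model resting on two assumptions. First, a software "measurement" is reduced to a SHA-256 hash of an image. Second, the public transparency log is a hard-coded set. Real PCC attestation relies on hardware-rooted keys and signed attestations, which this sketch omits. Names like &lt;code&gt;PUBLISHED_MEASUREMENTS&lt;/code&gt; are hypothetical.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import hashlib
import hmac


def measure(image):
    """Return a measurement of a software image (here simply its SHA-256 hex digest)."""
    return hashlib.sha256(image).hexdigest()


# Hypothetical stand-in for a public transparency log of released software images.
PUBLISHED_MEASUREMENTS = {
    measure(b"pcc-release-image-1.0"),
    measure(b"pcc-release-image-1.1"),
}


def should_send_request(attested_measurement):
    """Release user data only to a node whose attested software appears in the public log."""
    # compare_digest avoids leaking information through comparison timing.
    return any(
        hmac.compare_digest(attested_measurement, published)
        for published in PUBLISHED_MEASUREMENTS
    )


# A node running a published release is accepted; a modified image is refused.
print(should_send_request(measure(b"pcc-release-image-1.1")))
print(should_send_request(measure(b"patched-image-with-extra-logging")))
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The key design point is that the check happens on the device before any data leaves it. A server that quietly runs unpublished software simply never receives the request.&lt;/p&gt;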

&lt;h2&gt;
  
  
  how it changes your build
&lt;/h2&gt;

&lt;p&gt;As a developer, you don't interact with PCC directly. Your portal into Apple Intelligence is the App Intents framework. The system's AI layer is effectively blind to your application's capabilities until you explicitly declare them through well-structured App Intents. When a user makes a request, the system routes the query to the relevant app based on the intents you have exposed.&lt;/p&gt;

&lt;p&gt;For less intensive tasks, you have direct access to on-device models through the new Foundation Models framework, which lets you integrate capabilities like summarization with just a few lines of Swift.&lt;/p&gt;

&lt;p&gt;This means the high-leverage work is not in choosing a cloud provider, but in meticulously defining your app's core functions as intents. A rich set of intents makes your app a first-class citizen in this new intelligent ecosystem.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight swift"&gt;&lt;code&gt;&lt;span class="kd"&gt;import&lt;/span&gt; &lt;span class="kt"&gt;AppIntents&lt;/span&gt;

&lt;span class="c1"&gt;// Expose the core functionality of an app to Apple Intelligence.&lt;/span&gt;
&lt;span class="kd"&gt;struct&lt;/span&gt; &lt;span class="kt"&gt;CreateReminderIntent&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kt"&gt;AppIntent&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kd"&gt;static&lt;/span&gt; &lt;span class="k"&gt;var&lt;/span&gt; &lt;span class="nv"&gt;title&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kt"&gt;LocalizedStringResource&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"Create a New Reminder"&lt;/span&gt;
    &lt;span class="kd"&gt;static&lt;/span&gt; &lt;span class="k"&gt;var&lt;/span&gt; &lt;span class="nv"&gt;description&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="kt"&gt;IntentDescription&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Creates a new reminder in the user's default list."&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="kd"&gt;@Parameter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;title&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s"&gt;"Title"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;var&lt;/span&gt; &lt;span class="nv"&gt;title&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kt"&gt;String&lt;/span&gt;

    &lt;span class="kd"&gt;@Parameter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;title&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s"&gt;"Due Date"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;var&lt;/span&gt; &lt;span class="nv"&gt;dueDate&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kt"&gt;Date&lt;/span&gt;&lt;span class="p"&gt;?&lt;/span&gt;

    &lt;span class="kd"&gt;@MainActor&lt;/span&gt;
    &lt;span class="kd"&gt;func&lt;/span&gt; &lt;span class="nf"&gt;perform&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;throws&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="kd"&gt;some&lt;/span&gt; &lt;span class="kt"&gt;IntentResult&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="c1"&gt;// Your app's existing logic for creating a reminder.&lt;/span&gt;
        &lt;span class="c1"&gt;// By wrapping it in an AppIntent, it becomes available to Siri,&lt;/span&gt;
        &lt;span class="c1"&gt;// Shortcuts, and the new system-wide AI.&lt;/span&gt;
        &lt;span class="kt"&gt;ReminderService&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;shared&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;title&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;title&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;dueDate&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;dueDate&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="c1"&gt;// Return a result to the system.&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;result&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;value&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s"&gt;"Reminder created: &lt;/span&gt;&lt;span class="se"&gt;\(&lt;/span&gt;&lt;span class="n"&gt;title&lt;/span&gt;&lt;span class="se"&gt;)&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This code doesn't touch a server; it only describes an action. The system's models interpret the user's request, on-device or via PCC depending on its complexity. They then invoke the intent, whose &lt;code&gt;perform()&lt;/code&gt; method runs inside your app.&lt;/p&gt;

&lt;h2&gt;
  
  
  the so-what for builders
&lt;/h2&gt;

&lt;p&gt;The line between on-device and cloud is blurring. Apple has effectively created a third option that extends the security perimeter of the iPhone into its own data centers. This isn't just another feature; it's a fundamental platform shift.&lt;/p&gt;

&lt;p&gt;For engineers building on Apple's platforms, the takeaway is clear: the most important work for the next few years is not about managing AI infrastructure. It is about building a rich and descriptive vocabulary of App Intents that plug your application directly into the core intelligence of the OS. That is the new surface area for innovation.&lt;/p&gt;

&lt;h2&gt;
  
  
  Sources
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://security.apple.com/blog/private-cloud-compute/" rel="noopener noreferrer"&gt;Private Cloud Compute: A new frontier for AI privacy in the cloud - Apple Security Research&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://developer.apple.com/apple-intelligence/" rel="noopener noreferrer"&gt;Apple Intelligence - Apple Developer&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>ai</category>
      <category>machinelearning</category>
      <category>devtools</category>
      <category>programming</category>
    </item>
    <item>
      <title>GitHub Copilot's New Desktop App Isn't About Chat. It's About Agents.</title>
      <dc:creator>albe_sf</dc:creator>
      <pubDate>Wed, 03 Jun 2026 15:02:47 +0000</pubDate>
      <link>https://dev.to/albertomontagnese/github-copilots-new-desktop-app-isnt-about-chat-its-about-agents-1p6h</link>
      <guid>https://dev.to/albertomontagnese/github-copilots-new-desktop-app-isnt-about-chat-its-about-agents-1p6h</guid>
      <description>&lt;p&gt;The new GitHub Copilot app, announced at Microsoft Build 2026, is more than just a new place to chat with an AI. It represents a deliberate move to bring agentic workflows into a native desktop experience, shifting the developer assistant from a reactive partner to a proactive orchestrator. This isn't about better autocomplete; it's about changing how you delegate complex, multi-step tasks.&lt;/p&gt;

&lt;h2&gt;
  
  
  what just shipped
&lt;/h2&gt;

&lt;p&gt;At its Build 2026 conference, Microsoft unveiled a preview of a native GitHub Copilot desktop app. This moves Copilot out of the IDE and into its own dedicated environment. The key concept here is enabling 'agentic workflows'. Instead of a simple request-response loop for code snippets, the goal is to manage longer, more complex tasks that might involve multiple files, services, and steps.&lt;/p&gt;

&lt;p&gt;This is coupled with the general availability of Microsoft IQ, a new context layer designed to feed AI agents with real-time information from three sources: workplace knowledge from M365 signals (Work IQ), structured business data (Fabric IQ), and web grounding (Web IQ). The combination of a dedicated agent environment and richer, real-time context is the core of the new developer experience.&lt;/p&gt;

&lt;h2&gt;
  
  
  from inline assistant to orchestrator
&lt;/h2&gt;

&lt;p&gt;The practical difference is moving from 'write me a function that does X' to 'refactor this service to use the new authentication pattern'. The latter requires context, planning, and execution across multiple files. The new Copilot app is the interface for managing these types of tasks.&lt;/p&gt;

&lt;p&gt;Microsoft also introduced seven new MAI models, including MAI-Code-1, which is specifically tuned for GitHub and VS Code, and MAI-Thinking-1, a reasoning model. This points to a future where a central orchestrator can delegate specific sub-tasks to specialized models, choosing the best tool for the job, whether it's code generation, reasoning through a plan, or analyzing business data.&lt;/p&gt;

&lt;p&gt;For builders, this looks less like pair programming and more like delegating a ticket to a junior engineer who has full context on your company's stack.&lt;/p&gt;

&lt;h2&gt;
  
  
  bridging prototype to production
&lt;/h2&gt;

&lt;p&gt;Another piece of this puzzle is Project Rayfin, also in preview. It's a managed backend-as-a-service built on Microsoft Fabric that aims to close the gap between a prototype and a production-ready application. It provides developers with a managed backend that works with GitHub-defined workflows.&lt;/p&gt;

&lt;p&gt;Imagine an agent that not only writes the code but also provisions the necessary backend infrastructure based on the application's needs. A developer might kick off such a task with a single command, something like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Hypothetical CLI interaction with a future Copilot agent&lt;/span&gt;
copilot:agent:run &lt;span class="nt"&gt;--task&lt;/span&gt; &lt;span class="s2"&gt;"Refactor the user-auth service to use the auth-protocol-v3 pattern and deploy it to staging via Project Rayfin."&lt;/span&gt; \
  &lt;span class="nt"&gt;--context&lt;/span&gt; &lt;span class="s2"&gt;"./services/user-auth/*"&lt;/span&gt; \
  &lt;span class="nt"&gt;--context&lt;/span&gt; &lt;span class="s2"&gt;"internal-docs/auth-protocol-v3.md"&lt;/span&gt; \
  &lt;span class="nt"&gt;--set-var&lt;/span&gt; &lt;span class="s2"&gt;"MAI_MODEL=MAI-Thinking-1"&lt;/span&gt; \
  &lt;span class="nt"&gt;--plan-and-execute&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is where the 'agentic' part becomes concrete. It's about defining a high-level goal and providing the necessary context, then letting the agentic system formulate and execute a plan.&lt;/p&gt;
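
&lt;p&gt;The plan-then-execute pattern behind such a command can be sketched in Python. Everything here is hypothetical. &lt;code&gt;make_plan&lt;/code&gt; stands in for a reasoning model that would produce the steps. The entries in &lt;code&gt;HANDLERS&lt;/code&gt; stand in for specialized models or tools. The sketch only shows the control flow: a goal is decomposed into typed steps, and each step is dispatched to the tool suited to it.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;from dataclasses import dataclass
from typing import Callable


@dataclass
class Step:
    kind: str  # hypothetical step categories: "code", "test", "deploy"
    description: str


# Hypothetical handlers; a real system would call a model API or a deployment service here.
HANDLERS: dict[str, Callable[[Step], str]] = {
    "code": lambda step: f"[code model] {step.description}",
    "test": lambda step: f"[test runner] {step.description}",
    "deploy": lambda step: f"[deploy tool] {step.description}",
}


def make_plan(goal):
    """Stand-in planner that returns a fixed list of steps for any goal."""
    return [
        Step("code", f"Apply changes for: {goal}"),
        Step("test", "Run the service's test suite"),
        Step("deploy", "Deploy to staging"),
    ]


def plan_and_execute(goal):
    """Run each planned step through the handler registered for its kind."""
    log = []
    for step in make_plan(goal):
        handler = HANDLERS.get(step.kind)
        if handler is None:
            raise ValueError(f"no handler for step kind {step.kind!r}")
        log.append(handler(step))
    return log


for line in plan_and_execute("move user-auth to auth-protocol-v3"):
    print(line)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The planner is kept separate from the dispatch table. That separation lets a different model or tool be swapped in for each kind of step without touching the planning logic.&lt;/p&gt;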

&lt;h2&gt;
  
  
  the so-what
&lt;/h2&gt;

&lt;p&gt;The era of simple code completion is over. The next frontier for AI developer tools is orchestration and agency. The GitHub Copilot desktop app, combined with new context layers and specialized models, is a clear signal of this shift. As a builder, the takeaway is to start thinking of AI not just as a tool to accelerate individual tasks, but as a system you can delegate entire workflows to. The platforms are being built to support this now.&lt;/p&gt;

&lt;h2&gt;
  
  
  Sources
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://www.tomsguide.com/" rel="noopener noreferrer"&gt;Microsoft's Build 2026 recap post&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>ai</category>
      <category>devtools</category>
      <category>machinelearning</category>
      <category>programming</category>
    </item>
    <item>
      <title>Anthropic's Dynamic Workflows Aren't Just Another Agent Feature</title>
      <dc:creator>albe_sf</dc:creator>
      <pubDate>Mon, 01 Jun 2026 15:01:48 +0000</pubDate>
      <link>https://dev.to/albertomontagnese/anthropics-dynamic-workflows-arent-just-another-agent-feature-3mj9</link>
      <guid>https://dev.to/albertomontagnese/anthropics-dynamic-workflows-arent-just-another-agent-feature-3mj9</guid>
      <description>&lt;p&gt;Anthropic just shipped Claude Opus 4.8, but the real story isn't the model number. It's a feature called Dynamic Workflows, which orchestrates hundreds of parallel subagents for large-scale projects like codebase migrations. This moves the goalposts for what a coding agent does, shifting from interactive assistance to delegated, autonomous execution.&lt;/p&gt;

&lt;h2&gt;
  
  
  what just changed
&lt;/h2&gt;

&lt;p&gt;The latest flagship model, Claude Opus 4.8, was released with a notable capability for Claude Code called Dynamic Workflows. This feature is designed to manage complex, multi-step tasks by breaking them down and running them as parallel subagents. This is a structural departure from the typical agentic model, which tends to operate serially—it takes a prompt, acts, and waits for the next instruction.&lt;/p&gt;

&lt;p&gt;The key use case mentioned is codebase-scale work, which implies a system that can manage dependencies and context across many files and directories simultaneously. Instead of asking an agent to refactor a single file, you can theoretically define a project-level goal, and the workflow engine will orchestrate the necessary changes across the entire codebase. This suggests a higher level of abstraction where the developer acts as a system architect rather than a micromanager of prompts.&lt;/p&gt;

&lt;h2&gt;
  
  
  from copilot to orchestrator
&lt;/h2&gt;

&lt;p&gt;This changes the nature of the work. For years, AI coding tools have been positioned as copilots. They help with line-by-line suggestions, generating boilerplate, and explaining snippets. More advanced agents can tackle multi-file changes, but the interaction remains fundamentally conversational and sequential. You are still in the driver's seat for every major step.&lt;/p&gt;

&lt;p&gt;Dynamic Workflows point to a different interaction model. By allowing for the definition and parallel execution of sub-tasks, the system takes on the role of a project manager or a technical lead. The developer's job shifts from writing code to defining the architecture of the work itself. This requires a different skill: describing a complex change as a graph of dependent tasks that can be safely parallelized.&lt;/p&gt;
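
&lt;p&gt;That skill has a well-known algorithmic core: given a dependency graph, a topological sort tells you which tasks can safely run at the same time. The minimal Python sketch below uses the standard library's &lt;code&gt;graphlib.TopologicalSorter&lt;/code&gt; (Python 3.9+) to group tasks into "waves". The task names are hypothetical, and nothing here reflects how Dynamic Workflows is actually implemented.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;from graphlib import TopologicalSorter


def parallel_waves(deps: dict[str, set[str]]):
    """Group tasks into waves; every task in a wave depends only on earlier waves."""
    sorter = TopologicalSorter(deps)  # maps each task to the tasks it depends on
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # tasks whose dependencies are all done
        waves.append(ready)
        sorter.done(*ready)  # mark the whole wave finished before asking for more
    return waves


# Hypothetical migration: one inventory pass, three independent file refactors, then tests.
tasks = {
    "inventory": set(),
    "refactor:Header.jsx": {"inventory"},
    "refactor:Cart.jsx": {"inventory"},
    "refactor:Profile.jsx": {"inventory"},
    "e2e-tests": {"refactor:Header.jsx", "refactor:Cart.jsx", "refactor:Profile.jsx"},
}

for number, wave in enumerate(parallel_waves(tasks), start=1):
    print(number, wave)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Every task inside a wave can be handed to a separate subagent. The cycle check in &lt;code&gt;prepare()&lt;/code&gt; catches a plan that could never finish before any work starts.&lt;/p&gt;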

&lt;p&gt;This is the kind of work required for daunting tasks like framework upgrades, API deprecations, or migrating a legacy frontend to a new design system. These are projects that involve thousands of repetitive, yet context-sensitive, changes that are painful to execute manually and difficult to specify in a single prompt.&lt;/p&gt;

&lt;h2&gt;
  
  
  defining a workflow
&lt;/h2&gt;

&lt;p&gt;While the exact implementation details are not public, one can imagine a declarative format, perhaps a YAML or JSON file, that defines the stages of a large-scale refactoring. This configuration would serve as the master plan for the swarm of subagents.&lt;/p&gt;

&lt;p&gt;A migration from an old data-fetching library to a new one might be defined like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# workflow.yaml&lt;/span&gt;
&lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;api-client-migration&lt;/span&gt;
&lt;span class="na"&gt;description&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Migrate&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;all&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;components&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;from&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;legacy&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;`ApiService`&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;to&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;the&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;new&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;`GraphQLClient`."&lt;/span&gt;

&lt;span class="c1"&gt;# Phase 1: Identify all call sites of the old service.&lt;/span&gt;
&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;inventory-call-sites&lt;/span&gt;
  &lt;span class="na"&gt;description&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Scan&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;the&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;codebase&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;and&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;generate&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;a&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;JSON&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;report&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;of&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;all&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;files&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;that&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;import&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;and&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;use&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;`ApiService`."&lt;/span&gt;
  &lt;span class="na"&gt;tool&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;static-analysis&lt;/span&gt;
  &lt;span class="na"&gt;params&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;target&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;src/utils/ApiService.js"&lt;/span&gt;
  &lt;span class="na"&gt;output&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;migration_plan.json"&lt;/span&gt;

&lt;span class="c1"&gt;# Phase 2: Refactor components in parallel.&lt;/span&gt;
&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;refactor-components&lt;/span&gt;
  &lt;span class="na"&gt;description&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;For&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;each&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;component&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;in&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;the&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;report,&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;replace&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;`ApiService`&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;calls&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;with&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;`GraphQLClient`&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;queries."&lt;/span&gt;
  &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;parallel-map&lt;/span&gt;
  &lt;span class="na"&gt;input&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;migration_plan.json"&lt;/span&gt;
  &lt;span class="na"&gt;concurrency&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;50&lt;/span&gt; &lt;span class="c1"&gt;# Run up to 50 subagents at once&lt;/span&gt;
  &lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;tool&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;edit-file&lt;/span&gt;
      &lt;span class="na"&gt;params&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;file&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;{{item.file}}"&lt;/span&gt;
        &lt;span class="na"&gt;prompt&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Replace&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;the&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;data&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;fetching&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;logic&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;here&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;to&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;use&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;GraphQLClient.&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;The&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;required&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;query&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;is&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;{{item.equivalent_query}}."&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;tool&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;run-linter&lt;/span&gt;
      &lt;span class="na"&gt;params&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;file&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;{{item.file}}"&lt;/span&gt;
        &lt;span class="na"&gt;fix&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;

&lt;span class="c1"&gt;# Phase 3: Run integration tests after all refactoring is complete.&lt;/span&gt;
&lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;run-tests&lt;/span&gt;
  &lt;span class="na"&gt;description&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Execute&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;the&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;end-to-end&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;test&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;suite&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;to&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;verify&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;the&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;migration."&lt;/span&gt;
  &lt;span class="na"&gt;depends_on&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;refactor-components"&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
  &lt;span class="na"&gt;tool&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;run-command&lt;/span&gt;
  &lt;span class="na"&gt;params&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;command&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;npm&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;run&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;test:e2e"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is speculative, but it illustrates the shift in thinking. The high-value work is in designing the workflow itself—defining the stages, dependencies, and the instructions for each parallel unit of work.&lt;/p&gt;
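
&lt;p&gt;The &lt;code&gt;parallel-map&lt;/code&gt; stage with &lt;code&gt;concurrency: 50&lt;/code&gt; in that speculative file maps onto a familiar pattern: fan out over many items, but cap how many workers run at once. The minimal Python sketch below uses &lt;code&gt;asyncio.Semaphore&lt;/code&gt;. &lt;code&gt;refactor_file&lt;/code&gt; is a hypothetical placeholder for one subagent's work, and the file names are invented.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import asyncio


async def refactor_file(path):
    """Hypothetical stand-in for one subagent editing one file."""
    await asyncio.sleep(0.01)  # simulate model and tool latency
    return f"refactored {path}"


async def parallel_map(items, worker, concurrency):
    """Run worker over items with at most `concurrency` running at once."""
    sem = asyncio.Semaphore(concurrency)
    in_flight = 0
    peak = 0

    async def run(item):
        nonlocal in_flight, peak
        async with sem:
            in_flight += 1
            peak = max(peak, in_flight)  # track the highest observed concurrency
            try:
                return await worker(item)
            finally:
                in_flight -= 1

    # gather returns results in input order, regardless of completion order.
    results = await asyncio.gather(*(run(item) for item in items))
    return results, peak


files = [f"src/components/Widget{i}.jsx" for i in range(120)]
results, peak = asyncio.run(parallel_map(files, refactor_file, concurrency=50))
print(len(results), peak)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Because results come back in input order, each one can be matched to its file even though the work finishes out of order. A later stage, such as the test run, can then consume them directly.&lt;/p&gt;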

&lt;h2&gt;
  
  
  the so-what
&lt;/h2&gt;

&lt;p&gt;For builders, this is a signal to start thinking about automation at a higher level of abstraction. The new frontier of agentic development may be less about crafting the perfect prompt and more about designing robust, automated workflows that can reliably execute complex engineering projects. If this paradigm holds, the most effective AI-powered developers will not be just expert coders, but expert orchestrators.&lt;/p&gt;

&lt;h2&gt;
  
  
  Sources
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://www.anthropic.com/" rel="noopener noreferrer"&gt;https://www.anthropic.com/&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>ai</category>
      <category>llm</category>
      <category>claude</category>
      <category>agents</category>
    </item>
    <item>
      <title>Mistral's Codestral Isn't Another Generalist Model</title>
      <dc:creator>albe_sf</dc:creator>
      <pubDate>Fri, 29 May 2026 15:02:25 +0000</pubDate>
      <link>https://dev.to/albertomontagnese/mistrals-codestral-isnt-another-generalist-model-4j98</link>
      <guid>https://dev.to/albertomontagnese/mistrals-codestral-isnt-another-generalist-model-4j98</guid>
      <description>&lt;p&gt;Mistral AI has released Codestral, a 22B parameter model explicitly for code generation. This is a notable release not because it's the largest model, but because it's a specialized one. The takeaway is that the frontier is shifting from massive, general-purpose models to efficient, task-specific architectures for professional tooling.&lt;/p&gt;

&lt;h2&gt;
  
  
  what is codestral
&lt;/h2&gt;

&lt;p&gt;Codestral is an open-weight 22B model trained on a dataset covering over 80 programming languages, including Python, Java, C++, JavaScript, and more specialized ones like Swift and Fortran. Its defining feature is its focus. Unlike generalist models that handle a wide range of text-based tasks, Codestral is engineered for code-centric workflows: function completion, test generation, and filling in partial code blocks.&lt;/p&gt;

&lt;p&gt;The model is released under the Mistral AI Non-Production License, which makes it available for research and testing. This open-weight approach lets developers download and experiment with the model's weights directly. However, the license does not cover production use, so commercial deployment requires separate terms from Mistral.&lt;/p&gt;

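&lt;p&gt;One of its key technical capabilities is fill-in-the-middle (FIM): given the code before and after the cursor, the model generates the code that belongs in between. This is exactly the shape of an IDE completion request. Because such requests fire repeatedly as the user types, latency is a primary concern. Together, these point to a model tuned for the low-latency, high-frequency interactions common in editors like VS Code and JetBrains IDEs.&lt;/p&gt;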
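
&lt;p&gt;A FIM request carries two pieces of context: the prefix before the cursor and the suffix after it. The Python sketch below builds such a request with only the standard library, without sending it. It rests on three assumptions to check against Mistral's current API reference: the &lt;code&gt;/v1/fim/completions&lt;/code&gt; route on the Codestral endpoint, the &lt;code&gt;codestral-latest&lt;/code&gt; model alias, and the &lt;code&gt;prompt&lt;/code&gt;/&lt;code&gt;suffix&lt;/code&gt; field names. &lt;code&gt;CODESTRAL_API_KEY&lt;/code&gt; is a hypothetical environment variable name.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import json
import os
import urllib.request

# Assumed route and model alias; verify against Mistral's API documentation.
FIM_URL = "https://codestral.mistral.ai/v1/fim/completions"


def build_fim_request(prefix, suffix, api_key):
    """Build (but do not send) a fill-in-the-middle completion request."""
    payload = {
        "model": "codestral-latest",
        "prompt": prefix,  # code before the cursor
        "suffix": suffix,  # code after the cursor
        "max_tokens": 64,
        "temperature": 0,
    }
    return urllib.request.Request(
        FIM_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


prefix = "def is_even(n):\n    "
suffix = "\n\nprint(is_even(4))\n"
req = build_fim_request(prefix, suffix, os.environ.get("CODESTRAL_API_KEY", "YOUR_KEY"))
print(req.get_method(), req.full_url)
# Sending it would be: json.load(urllib.request.urlopen(req))
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;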

&lt;h2&gt;
  
  
  getting access
&lt;/h2&gt;

&lt;p&gt;There are a few ways to use Codestral. For direct integration and IDE tooling, Mistral has provided a dedicated endpoint at &lt;code&gt;codestral.mistral.ai&lt;/code&gt;. This endpoint is intended for developers integrating the model into their tools and is free during a beta period. It is also available on their standard &lt;code&gt;api.mistral.ai&lt;/code&gt; endpoint, where usage is billed per token.&lt;/p&gt;

&lt;p&gt;For local development and experimentation, you can run the model directly. It's available for download from Hugging Face and can be run using tools like Ollama. This allows for offline use and deeper integration into local development environments.&lt;/p&gt;

&lt;p&gt;Here is a basic example of how to interact with the model via the Ollama API after pulling the model:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# First, pull the model with Ollama&lt;/span&gt;
ollama pull codestral

&lt;span class="c"&gt;# Then, send a request to the local API&lt;/span&gt;
curl http://localhost:11434/api/chat &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="s1"&gt;'{
  "model": "codestral",
  "messages": [
    {
      "role": "user",
      "content": "Write a Python function to calculate the Fibonacci sequence."
    }
  ]
}'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
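&lt;p&gt;By default, Ollama's &lt;code&gt;/api/chat&lt;/code&gt; endpoint streams its answer as newline-delimited JSON objects, each carrying a fragment of the reply in &lt;code&gt;message.content&lt;/code&gt;, and ends with an object whose &lt;code&gt;done&lt;/code&gt; field is true. The Python sketch below sends the same request as the curl example using only the standard library and stitches the fragments together. It assumes Ollama is running on its default port with &lt;code&gt;codestral&lt;/code&gt; pulled; the helper names are illustrative.&lt;/p&gt;

```python
import json
import urllib.request
from typing import Iterable

OLLAMA_CHAT_URL = "http://localhost:11434/api/chat"


def build_chat_request(prompt: str, model: str = "codestral") -> urllib.request.Request:
    """Create a POST request equivalent to the curl example."""
    payload = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        OLLAMA_CHAT_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def collect_stream(lines: Iterable[bytes]) -> str:
    """Join the message.content fragments from a streamed /api/chat response."""
    parts = []
    for raw in lines:
        if not raw.strip():
            continue  # ignore blank lines between JSON objects
        chunk = json.loads(raw)
        parts.append(chunk.get("message", {}).get("content", ""))
        if chunk.get("done"):
            break  # the final object signals the end of the reply
    return "".join(parts)


def ask(prompt: str) -> str:
    """Send the prompt to a local Ollama server and return the full reply."""
    with urllib.request.urlopen(build_chat_request(prompt)) as response:
        return collect_stream(response)  # HTTP responses iterate line by line
```

&lt;p&gt;With the server running, &lt;code&gt;ask("Write a Python function to calculate the Fibonacci sequence.")&lt;/code&gt; returns the assembled answer as a single string.&lt;/p&gt;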



&lt;p&gt;Integrations are already available in frameworks like LlamaIndex and LangChain for building agentic applications, and in IDE extensions like Tabnine and Continue.dev.&lt;/p&gt;

&lt;h2&gt;
  
  
  why it matters for builders
&lt;/h2&gt;

&lt;p&gt;The release of a dedicated, high-performance code model from a major lab is significant. It signals a move toward a multi-model future where developers will likely route tasks to specialized systems rather than relying on a single, monolithic AI. For code generation, a model trained specifically on code and fluent in dozens of languages can offer quality and latency advantages over a generalist counterpart.&lt;/p&gt;
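&lt;p&gt;In practice, routing can start as a simple lookup from task type to model. The sketch below is a hypothetical illustration: the task categories, the keyword heuristic, and the &lt;code&gt;general-purpose-model&lt;/code&gt; name are placeholders, and a production router would likely classify requests with something smarter than substring matching.&lt;/p&gt;

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical routing table: task type -> model name.
ROUTES = {
    "completion": "codestral",        # fill-in-the-middle, autocomplete
    "test_generation": "codestral",
    "code_review": "codestral",
    "chat": "general-purpose-model",  # placeholder for a generalist model
}

# Crude signals that a free-form request is about code.
CODE_HINTS = ("def ", "class ", "function ", "import ", "return ")


@dataclass
class Request:
    text: str
    task: Optional[str] = None  # callers may label the task explicitly


def classify(request: Request) -> str:
    """Use the explicit label if it is known, else fall back to keywords."""
    if request.task in ROUTES:
        return request.task
    if any(hint in request.text for hint in CODE_HINTS):
        return "completion"
    return "chat"


def route(request: Request) -> str:
    """Return the name of the model that should handle this request."""
    return ROUTES[classify(request)]
```

&lt;p&gt;Keeping the routing table explicit makes each decision easy to audit; replacing the keyword heuristic with a learned classifier later would only change &lt;code&gt;classify&lt;/code&gt;.&lt;/p&gt;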

&lt;p&gt;The 22-billion-parameter size is also an intentional choice. It is large enough to be capable but small enough to run efficiently in its target use cases, particularly code completion, where milliseconds matter. Internal evaluations cited in the announcement suggest it significantly reduces autocomplete latency while maintaining quality.&lt;/p&gt;
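&lt;p&gt;A back-of-the-envelope calculation shows why 22B is a practical size. Weight memory is roughly the parameter count times the bytes stored per parameter; the sketch below applies that rule at a few common precisions. It counts weights only and ignores the KV cache, activations, and runtime overhead, so real memory use will be higher.&lt;/p&gt;

```python
PARAMS = 22e9  # Codestral's parameter count

# Bytes stored per parameter at common precisions
BYTES_PER_PARAM = {
    "fp16/bf16": 2.0,
    "int8": 1.0,
    "4-bit": 0.5,
}


def weight_memory_gb(params: float, bytes_per_param: float) -> float:
    """Approximate weight memory in decimal gigabytes (1 GB = 1e9 bytes)."""
    return params * bytes_per_param / 1e9


if __name__ == "__main__":
    for precision, size in BYTES_PER_PARAM.items():
        print(f"{precision:10s} ~{weight_memory_gb(PARAMS, size):.0f} GB")
```

&lt;p&gt;At 16-bit precision the weights alone (about 44 GB) exceed the memory of a single 24 GB consumer GPU, while a 4-bit build needs roughly 11 GB for weights, which is part of why running the model locally through tools like Ollama is practical.&lt;/p&gt;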

&lt;p&gt;However, the non-production license is a critical detail. While it encourages experimentation and research, it means teams looking to embed this in a commercial product need to carefully evaluate the terms. This is a different path from fully open-source models and represents a hybrid strategy for commercializing foundational models.&lt;/p&gt;

&lt;p&gt;For engineers building AI-powered developer tools, Codestral is a new primitive to work with. It's a powerful, specialized engine for code tasks that can be run locally or accessed via a fast, dedicated API. The focus now shifts to how we build intelligent applications on top of these specialized models.&lt;/p&gt;

&lt;h2&gt;
  
  
  Sources
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://mistral.ai/" rel="noopener noreferrer"&gt;Mistral AI Announcement&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>ai</category>
      <category>llm</category>
      <category>machinelearning</category>
      <category>devtools</category>
    </item>
    <item>
      <title>Anthropic's New Security Tooling is a Wake-Up Call for Agent Builders</title>
      <dc:creator>albe_sf</dc:creator>
      <pubDate>Wed, 27 May 2026 15:04:09 +0000</pubDate>
      <link>https://dev.to/albertomontagnese/anthropics-new-security-tooling-is-a-wake-up-call-for-agent-builders-5gkf</link>
      <guid>https://dev.to/albertomontagnese/anthropics-new-security-tooling-is-a-wake-up-call-for-agent-builders-5gkf</guid>
      <description>&lt;p&gt;Anthropic just shipped a security guidance plugin and a self-hosted sandbox for Claude. This isn't just another incremental feature drop; it's a clear signal that the next phase of AI development is about hardening the agent stack. The takeaway is that security is moving from a manual review afterthought to a critical, automated first pass, and you should be building your systems accordingly.&lt;/p&gt;

&lt;h2&gt;
  
  
  what just shipped
&lt;/h2&gt;

&lt;p&gt;Two new security-focused features for Claude were announced: a security guidance plugin and a self-hosted sandbox. The plugin acts as a proactive vulnerability scanner for developers as they write code. Anthropic reported using it internally and seeing a 30-40% decrease in security-related comments on pull requests, suggesting it serves as an effective lightweight first pass before a full human code review.&lt;/p&gt;

&lt;p&gt;The second component is a self-hosted sandbox, currently in public beta. This allows Claude Managed Agents to operate within a user-controlled environment, including connecting to a user's private servers. This moves agent execution from a multi-tenant cloud environment to your own infrastructure, a significant change for handling sensitive tasks.&lt;/p&gt;

&lt;h2&gt;
  
  
  why this matters for your agent stack
&lt;/h2&gt;

&lt;p&gt;For the past year, building agents has been an exercise in prompt engineering and orchestration logic. Security has often been reduced to a line in a system prompt like "You are a helpful assistant and you will not perform harmful actions." This approach is brittle and insufficient for production systems.&lt;/p&gt;

&lt;p&gt;Anthropic's move signals a necessary shift from prompt-based security to infrastructure-based security. A local, user-controlled sandbox is a fundamental primitive for running agent-generated code safely. It provides a contained environment where an agent can execute tasks, interact with files, and run code without having access to the host system or network by default. This is table stakes for any serious enterprise use case.&lt;/p&gt;
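&lt;p&gt;To show the most basic form of contained execution, the sketch below runs generated Python in a separate process with a throwaway working directory, an empty environment, and a hard timeout. It is an illustration, not a sandbox: a child process still shares the host's filesystem, network, and kernel. Real sandboxes, including the self-hosted kind discussed here, exist to add the isolation this sketch lacks, typically through containers or virtual machines. The function and field names are hypothetical.&lt;/p&gt;

```python
import subprocess
import sys
import tempfile
from dataclasses import dataclass


@dataclass
class ExecResult:
    stdout: str
    stderr: str
    returncode: int
    timed_out: bool = False


def run_untrusted(code: str, timeout_s: float = 5.0) -> ExecResult:
    """Run generated code in a child Python process (NOT a security boundary)."""
    with tempfile.TemporaryDirectory() as workdir:
        try:
            proc = subprocess.run(
                # -I: isolated mode, ignores PYTHON* env vars and user site-packages
                [sys.executable, "-I", "-c", code],
                cwd=workdir,        # relative file writes land in a throwaway dir
                env={},             # the child inherits no API keys or other secrets
                capture_output=True,
                text=True,
                timeout=timeout_s,  # the child is killed if it runs too long
            )
        except subprocess.TimeoutExpired:
            return ExecResult("", "timed out", -1, timed_out=True)
        return ExecResult(proc.stdout, proc.stderr, proc.returncode)
```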

&lt;p&gt;The security plugin reframes AI-generated code. Instead of treating it as a magical, opaque output, it treats it like any other code written by a junior developer: something to be linted, scanned, and analyzed for common pitfalls before it ever gets to a human reviewer. It makes security proactive, not reactive.&lt;/p&gt;

&lt;h2&gt;
  
  
  integrating security analysis into the workflow
&lt;/h2&gt;

&lt;p&gt;Adopting this model means building security checks directly into your agent's code generation and execution loop. The goal is to catch issues before they are ever executed. While the exact implementation of Anthropic's plugin isn't public, you can imagine how it fits into a CI/CD pipeline or a local development environment.&lt;/p&gt;

&lt;p&gt;Here is a hypothetical configuration for a pre-commit hook that uses an AI security scanner on staged Python files. This is the kind of automated, low-friction check that the new tooling enables.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# .pre-commit-config.yaml&lt;/span&gt;
&lt;span class="na"&gt;repos&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
&lt;span class="pi"&gt;-&lt;/span&gt;   &lt;span class="na"&gt;repo&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;local&lt;/span&gt;
    &lt;span class="na"&gt;hooks&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt;   &lt;span class="na"&gt;id&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;claude-security-scan&lt;/span&gt;
        &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Claude Security Scanner&lt;/span&gt;
        &lt;span class="na"&gt;entry&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;bash -c 'claude-sec-scanner --level=high --fail-on-critical --scope=diff &amp;lt;your_files&amp;gt;'&lt;/span&gt;
        &lt;span class="na"&gt;language&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;system&lt;/span&gt;
        &lt;span class="na"&gt;types&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;python&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
        &lt;span class="na"&gt;stages&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;commit&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This approach automates the first pass of a security review. It doesn't replace a human expert, but it filters out the low-hanging fruit, freeing up senior engineers to focus on more complex architectural issues. The result is a faster, more secure development cycle.&lt;/p&gt;
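&lt;p&gt;Because the scanner in that configuration is hypothetical, here is a deliberately simple stand-in that shows the contract any pre-commit hook must satisfy: it receives the staged filenames as arguments and blocks the commit by exiting non-zero. The pattern list is illustrative and rule-based rather than model-based; an AI-backed scanner would replace &lt;code&gt;scan_source&lt;/code&gt; with a deeper analysis behind the same interface.&lt;/p&gt;

```python
import re
import sys
from pathlib import Path

# Illustrative risky patterns; a real scanner would be far more thorough.
RULES = {
    r"\beval\(": "use of eval() on possibly untrusted input",
    r"\bexec\(": "use of exec() on possibly untrusted input",
    r"shell\s*=\s*True": "subprocess call with shell=True",
    r"\bpickle\.loads?\(": "unpickling data can execute arbitrary code",
    r"verify\s*=\s*False": "TLS certificate verification disabled",
}


def scan_source(text: str) -> list[tuple[int, str]]:
    """Return (line number, message) pairs for lines matching a risky pattern."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for pattern, message in RULES.items():
            if re.search(pattern, line):
                findings.append((lineno, message))
    return findings


def main(paths: list[str]) -> int:
    """Scan each file and return 1 (block the commit) if anything is found."""
    failed = False
    for path in paths:
        for lineno, message in scan_source(Path(path).read_text(encoding="utf-8")):
            print(f"{path}:{lineno}: {message}")
            failed = True
    return 1 if failed else 0


# pre-commit passes the staged filenames as command-line arguments
if __name__ == "__main__" and sys.argv[1:]:
    sys.exit(main(sys.argv[1:]))
```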

&lt;h2&gt;
  
  
  the sandbox is the real story
&lt;/h2&gt;

&lt;p&gt;The most significant part of this announcement is the user-controlled sandbox. For any organization working with proprietary code, customer data, or private infrastructure, allowing an external AI model to execute arbitrary code has been a non-starter. A self-hosted sandbox connected to private servers inverts the trust model. Instead of trusting the model provider's environment, you define the environment and its boundaries.&lt;/p&gt;

&lt;p&gt;This unlocks the ability to build agents that can securely perform actions on internal systems. An agent could, for example, be given sandboxed access to a staging database to run diagnostics, or permission to interact with an internal code repository to refactor code, all without that data ever leaving your control.&lt;/p&gt;
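&lt;p&gt;Defining the environment and its boundaries usually starts with an explicit permission policy checked before every tool call. The sketch below is hypothetical and not tied to Anthropic's sandbox: the tool names, resource patterns, and actions are placeholders. It grants read access to a staging database and read/write access to one internal repository, and denies everything else by default.&lt;/p&gt;

```python
from dataclasses import dataclass, field
from fnmatch import fnmatch


@dataclass(frozen=True)
class Grant:
    tool: str      # e.g. "sql.query" (hypothetical tool name)
    resource: str  # glob pattern, e.g. "staging-db/*"
    actions: frozenset = field(default_factory=frozenset)


class Policy:
    """Deny by default: a call is allowed only if some grant matches it."""

    def __init__(self, grants):
        self.grants = list(grants)

    def allows(self, tool: str, resource: str, action: str) -> bool:
        return any(
            g.tool == tool and fnmatch(resource, g.resource) and action in g.actions
            for g in self.grants
        )


# Hypothetical grants mirroring the diagnostics and refactoring examples.
policy = Policy([
    Grant("sql.query", "staging-db/*", frozenset({"read"})),
    Grant("git", "internal/payments-service", frozenset({"read", "write"})),
])
```

&lt;p&gt;An agent runtime would call &lt;code&gt;policy.allows(...)&lt;/code&gt; before dispatching each tool call and refuse anything that does not match a grant, so adding capabilities means adding grants rather than editing prompts.&lt;/p&gt;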

&lt;h2&gt;
  
  
  the so-what
&lt;/h2&gt;

&lt;p&gt;The frontier of AI is no longer just about building larger models with higher benchmark scores. It is increasingly about building the professional-grade tooling required to ship products that use those models, safely and reliably. Anthropic is providing a clear template for how to think about agent security.&lt;/p&gt;

&lt;p&gt;As a builder, your focus should be shifting. The interesting work is less about novel agent architectures and more about the boring, critical infrastructure needed to run them in production. How do you containerize agent execution? How do you define fine-grained permissions for tool use? How do you automate security analysis for generated code? These are the problems that need to be solved to move agents from demos to deployed products, and this recent release shows one major lab is thinking the same way.&lt;/p&gt;

&lt;h2&gt;
  
  
  Sources
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://www.securityweek.com/anthropic-releases-new-claude-sandbox-security-guidance-plugin/" rel="noopener noreferrer"&gt;Anthropic Releases New Claude Sandbox, Security Guidance Plugin - SecurityWeek&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>ai</category>
      <category>machinelearning</category>
      <category>devtools</category>
      <category>programming</category>
    </item>
    <item>
      <title>Google's Gemini 3.5 Flash Isn't For Chat. It's For Agents.</title>
      <dc:creator>albe_sf</dc:creator>
      <pubDate>Mon, 25 May 2026 15:01:49 +0000</pubDate>
      <link>https://dev.to/albertomontagnese/googles-gemini-35-flash-isnt-for-chat-its-for-agents-22d3</link>
      <guid>https://dev.to/albertomontagnese/googles-gemini-35-flash-isnt-for-chat-its-for-agents-22d3</guid>
      <description>&lt;p&gt;Google shipped Gemini 3.5 Flash on May 19, the first model in its new 3.5 series. [4] The release is not just another incremental update; it’s a deliberate shift in strategy. Google is framing this model as 'agent-first, not chatbot-first,' a clear signal that the focus is moving from conversational quality to autonomous tool-use and coding. [4]&lt;/p&gt;

&lt;h2&gt;
  
  
  what shipped
&lt;/h2&gt;

&lt;p&gt;Gemini 3.5 Flash was announced at Google I/O 2026 and, unlike many recent releases, went straight to general availability. [4, 15] It's accessible now for developers through the Gemini API and Google AI Studio, and for enterprise customers in the Gemini Enterprise Agent Platform. [15] This is the initial release from the Gemini 3.5 family, positioned as a workhorse model for developers building agentic systems. [13]&lt;/p&gt;

&lt;p&gt;The model is engineered for speed and efficiency, yet Google's performance claims place it above its previous-generation Pro model. [13] This combination of speed and capability is aimed squarely at complex, multi-step tasks that deliver tangible results. [13]&lt;/p&gt;

&lt;h2&gt;
  
  
  an agent-first architecture
&lt;/h2&gt;

&lt;p&gt;The most significant aspect of this release is the framing. Google's announcement emphasized the model's strengths in long-horizon tool-use and coding over traditional chat benchmarks. [4] The company claims Gemini 3.5 Flash outperforms Gemini 3.1 Pro on key benchmarks for agentic and coding tasks, including a 76.2% score on Terminal-Bench 2.1. [13]&lt;/p&gt;

&lt;p&gt;This focus matters because it reflects the broader industry's maturation from chatbots to agents. The engineering challenge is no longer just about generating fluent text, but about building systems that can plan, execute, and self-correct over a series of actions. Google is explicitly designing and marketing this model for that purpose. It's part of a larger ecosystem push that includes tools like the Managed Agents API, which provides secure, Google-hosted environments for running custom agents. [13]&lt;/p&gt;
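&lt;p&gt;The plan, execute, and self-correct cycle is easier to reason about as code. The sketch below is a generic tool-use loop, not Gemini's API: &lt;code&gt;scripted_model&lt;/code&gt; is a hypothetical stand-in that returns canned decisions so the example runs offline, and a real implementation would call a model at that point. When a tool fails, the error is fed back as an observation so the next step can correct course, and a step budget stops runaway loops.&lt;/p&gt;

```python
from typing import Callable

# Tools the agent may call; each takes a string argument and returns a string.
TOOLS: dict[str, Callable[[str], str]] = {
    "add": lambda arg: str(sum(int(x) for x in arg.split(","))),
    "upper": lambda arg: arg.upper(),
}


def scripted_model(history: list[dict]) -> dict:
    """Hypothetical stand-in for a model call: picks the next action from history."""
    last = history[-1]
    if last["role"] == "user":
        return {"tool": "add", "arg": "2, three"}  # first attempt has a bad argument
    if last["role"] == "tool" and last["content"].startswith("error"):
        return {"tool": "add", "arg": "2, 3"}      # self-correct after the error
    return {"final": f"The sum is {last['content']}"}


def run_agent(task: str, model=scripted_model, max_steps: int = 5) -> str:
    """Loop: ask the model for an action, run the tool, record the observation."""
    history = [{"role": "user", "content": task}]
    for _ in range(max_steps):
        decision = model(history)
        if "final" in decision:
            return decision["final"]
        tool = TOOLS.get(decision["tool"])
        try:
            observation = tool(decision["arg"]) if tool else f"error: unknown tool {decision['tool']}"
        except Exception as exc:  # surface tool failures to the model instead of crashing
            observation = f"error: {exc}"
        history.append({"role": "tool", "content": observation})
    return "stopped: step budget exhausted"
```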

&lt;h2&gt;
  
  
  pricing for value, not volume
&lt;/h2&gt;

&lt;p&gt;While the 'Flash' branding implies speed and low cost, the pricing tells a different story. At $1.50 per million input tokens and $9.00 per million output tokens, Gemini 3.5 Flash is significantly more expensive than previous Flash models like 3.1 Flash-Lite. [15] This price point is closer to the Gemini 3.1 Pro tier. [15]&lt;/p&gt;

&lt;p&gt;This suggests Google is not competing for the cheapest possible text generation. Instead, it is pricing the model based on the value of the agentic tasks it can perform. For developers, this means 3.5 Flash is likely not the right choice for high-volume, low-complexity chat applications. It is intended for higher-value workflows where its advanced reasoning and coding capabilities can justify the cost.&lt;/p&gt;
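&lt;p&gt;Per-token pricing makes it straightforward to estimate what one agent run costs. The sketch below uses the published rates of $1.50 per million input tokens and $9.00 per million output tokens; the token counts are hypothetical, chosen to resemble a multi-turn run that re-reads its context on every turn.&lt;/p&gt;

```python
INPUT_PRICE_PER_M = 1.50   # USD per million input tokens
OUTPUT_PRICE_PER_M = 9.00  # USD per million output tokens


def run_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimated USD cost of one run at Gemini 3.5 Flash list prices."""
    return (input_tokens * INPUT_PRICE_PER_M + output_tokens * OUTPUT_PRICE_PER_M) / 1_000_000


# Hypothetical agent run: 10 turns, each reading ~20k tokens of context
# and writing ~2k tokens of plans, tool calls, and code.
turns = 10
cost = run_cost(input_tokens=turns * 20_000, output_tokens=turns * 2_000)
print(f"${cost:.2f} per run")
```

&lt;p&gt;For this hypothetical run the script prints &lt;code&gt;$0.48 per run&lt;/code&gt;: $0.30 for input and $0.18 for output. Because an output token costs six times as much as an input token, the 20k generated tokens account for more than a third of the bill despite being a tenth of the volume.&lt;/p&gt;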

&lt;p&gt;Here is a simple configuration for accessing the model via the API:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;google.generativeai&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;genai&lt;/span&gt;

&lt;span class="c1"&gt;# Configure with your API key
&lt;/span&gt;&lt;span class="n"&gt;genai&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;configure&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;YOUR_API_KEY&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Set up the model
&lt;/span&gt;&lt;span class="n"&gt;generation_config&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;temperature&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;top_p&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;0.95&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;top_k&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;max_output_tokens&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;65536&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;genai&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;GenerativeModel&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gemini-3.5-flash&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;generation_config&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;generation_config&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Start a chat session
&lt;/span&gt;&lt;span class="n"&gt;convo&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;start_chat&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;history&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[])&lt;/span&gt;

&lt;span class="n"&gt;convo&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;send_message&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Your agentic prompt here...&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;convo&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;last&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  the so-what for builders
&lt;/h2&gt;

&lt;p&gt;Gemini 3.5 Flash is a clear statement of direction from Google. The future of its AI platform is centered on agents that can automate complex work. For engineers and builders, this means the tools and models are now being explicitly optimized for these more sophisticated use cases.&lt;/p&gt;

&lt;p&gt;The release of Gemini 3.5 Flash isn't just another model to evaluate. It's a signal to start thinking about your own product roadmaps in terms of agentic workflows. The core infrastructure to support these systems is coming online, and the models are being built specifically to power them.&lt;/p&gt;

&lt;h2&gt;
  
  
  Sources
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://cloud.google.com/blog/products/ai-machine-learning/innovations-from-google-io-26-on-google-cloud" rel="noopener noreferrer"&gt;Innovations from Google I/O 26 on Google Cloud&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://codersera.com/blog/ai-models-released-in-may-2026-complete-roundup/" rel="noopener noreferrer"&gt;AI Models Released in May 2026: Complete Roundup&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>ai</category>
      <category>llm</category>
      <category>agents</category>
      <category>devtools</category>
    </item>
    <item>
      <title>Google just commoditized the agent stack with a single API call</title>
      <dc:creator>albe_sf</dc:creator>
      <pubDate>Fri, 22 May 2026 15:03:03 +0000</pubDate>
      <link>https://dev.to/albertomontagnese/google-just-commoditized-the-agent-stack-with-a-single-api-call-3092</link>
      <guid>https://dev.to/albertomontagnese/google-just-commoditized-the-agent-stack-with-a-single-api-call-3092</guid>
      <description>&lt;p&gt;Google's release of Managed Agents in the Gemini API is the signal to pay attention to this week. It packages the messy, stateful, and insecure parts of building agents into a single API endpoint, backed by a new, cost-effective frontier model, Gemini 3.5 Flash. The takeaway is that the infrastructure for running autonomous agents in secure, isolated environments is now a utility.&lt;/p&gt;

&lt;h2&gt;
  
  
  what actually shipped
&lt;/h2&gt;

&lt;p&gt;On May 19, 2026, Google released two things that matter for builders: Gemini 3.5 Flash and the public preview of Managed Agents for the Gemini API. Gemini 3.5 Flash is positioned as a model optimized for performance on agentic and coding tasks. It's the engine.&lt;/p&gt;

&lt;p&gt;The more significant release is Managed Agents. This is the platform. It gives developers the ability to build and deploy autonomous, stateful agents that run in secure, Google-hosted Linux sandbox environments. Instead of managing your own infrastructure for code execution and state, you can now spin up an agent via an API call. The first available general-purpose agent is &lt;code&gt;antigravity-preview-05-2026&lt;/code&gt;, which can plan, reason, write and execute code, manage files, and browse the web inside its container.&lt;/p&gt;

&lt;h2&gt;
  
  
  from a local cli to a server-side platform
&lt;/h2&gt;

&lt;p&gt;This release coincides with a strategic shift. Google is transitioning its popular &lt;code&gt;Gemini CLI&lt;/code&gt; to a new &lt;code&gt;Antigravity CLI&lt;/code&gt;. This isn't just a rename. It reflects a move from a local terminal utility to a client for a unified, server-side agent platform. The new CLI is built in Go for better performance and supports asynchronous workflows, letting you orchestrate multiple agents on complex tasks without locking your terminal.&lt;/p&gt;

&lt;p&gt;This transition acknowledges that real agentic work involves multiple agents and shared context, which outgrew the initial CLI's scope. By unifying the backend into the Antigravity platform, improvements to the core agent harness are automatically available to the CLI, the desktop app, and the API. For developers, this means the agent you prototype in the terminal shares the same foundation as the one you deploy to production.&lt;/p&gt;
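&lt;p&gt;The asynchronous workflows described above boil down to a familiar pattern: start several long-running agent tasks at once and collect the results when all of them finish, rather than blocking on each in turn. The sketch below shows that pattern with Python's &lt;code&gt;asyncio&lt;/code&gt;. It does not use the Antigravity CLI; &lt;code&gt;run_agent&lt;/code&gt; is a hypothetical placeholder that simulates waiting on a server-side agent with a sleep.&lt;/p&gt;

```python
import asyncio
import time


async def run_agent(name: str, task: str, seconds: float) -> str:
    """Hypothetical placeholder for dispatching a task to a remote agent."""
    await asyncio.sleep(seconds)  # stands in for waiting on the server-side agent
    return f"{name}: finished '{task}'"


async def orchestrate() -> list[str]:
    # Launch all agents at once; gather returns results in the order given.
    return await asyncio.gather(
        run_agent("planner", "draft migration plan", 0.3),
        run_agent("coder", "update ORM models", 0.5),
        run_agent("tester", "write regression tests", 0.4),
    )


if __name__ == "__main__":
    start = time.perf_counter()
    for line in asyncio.run(orchestrate()):
        print(line)
    # Wall time tracks the slowest agent (~0.5 s), not the sum (~1.2 s).
    print(f"elapsed: {time.perf_counter() - start:.1f}s")
```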

&lt;h2&gt;
  
  
  how you will use this
&lt;/h2&gt;

&lt;p&gt;Building with this new API means focusing less on the infrastructure of agent execution. You are no longer primarily responsible for the security of running model-generated code or persisting state between long-running tasks. You define the task and the tools, and the managed agent handles the execution loop within its sandboxed environment.&lt;/p&gt;
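&lt;p&gt;For contrast, here is a minimal sketch of the state handling a managed agent takes off your plate: checkpointing a session's progress so a long-running task can resume after a crash or restart. The file layout and field names are hypothetical, and a production system would also need locking, retries, and storage that outlives a single machine.&lt;/p&gt;

```python
import json
import os
import tempfile
from dataclasses import asdict, dataclass, field
from pathlib import Path
from typing import Optional


@dataclass
class SessionState:
    session_id: str
    task: str
    completed_steps: list[str] = field(default_factory=list)


def save(state: SessionState, directory: Path) -> Path:
    """Write the checkpoint to a temp file, then rename it over the old one."""
    path = directory / f"{state.session_id}.json"
    fd, tmp = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as f:
        json.dump(asdict(state), f)
    os.replace(tmp, path)  # atomic rename on POSIX, so readers never see half a file
    return path


def load(session_id: str, directory: Path) -> Optional[SessionState]:
    """Resume a session if a checkpoint exists, else return None."""
    path = directory / f"{session_id}.json"
    if not path.exists():
        return None
    return SessionState(**json.loads(path.read_text(encoding="utf-8")))
```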

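&lt;p&gt;Conceptually, a request looks something like the sketch below: you provide the model, the agent definition, and the user's high-level task, and the platform manages the multi-step execution. The Python client shown is illustrative only; its module, class, and method names are placeholders, not the published Interactions API.&lt;br&gt;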
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;google.ai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;agents&lt;/span&gt;

&lt;span class="c1"&gt;# Configure the managed agent with a specific toolset and model
&lt;/span&gt;&lt;span class="n"&gt;file_processing_agent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;agents&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;ManagedAgent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;projects/my-project/agents/file-processor&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gemini-3.5-flash&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="c1"&gt;# The platform provides the secure code execution environment
&lt;/span&gt;    &lt;span class="n"&gt;harness&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;antigravity-preview-05-2026&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="n"&gt;agents&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Tool&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;file_reader&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="n"&gt;agents&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Tool&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;data_transformer&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="n"&gt;agents&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Tool&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;report_generator&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Start a stateful session to perform a multi-step task
&lt;/span&gt;&lt;span class="n"&gt;session&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;file_processing_agent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;start_session&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="c1"&gt;# The agent plans and executes steps inside its isolated sandbox
&lt;/span&gt;&lt;span class="n"&gt;final_result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;session&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;execute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Analyze the quarterly sales data in /uploads, identify the top three regions, and generate a PDF summary.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;final_result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_file&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;sales_summary.pdf&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The key change is moving from a request-response loop that you manage to a persistent, stateful agent that you task. The &lt;code&gt;antigravity-preview-05-2026&lt;/code&gt; agent harness provides the core capabilities of file management, web browsing, and code execution out of the box.&lt;/p&gt;

&lt;h2&gt;
  
  
  the so-what for builders
&lt;/h2&gt;

&lt;p&gt;The move toward managed, server-side agents is a significant abstraction layer. For the last year, building a truly autonomous agent meant wrestling with Docker containers, file system permissions, and state management. Google is now offering to handle that plumbing. This lowers the barrier to entry for shipping sophisticated agentic workflows.&lt;/p&gt;

&lt;p&gt;This doesn't eliminate the hard problems of agentic reasoning and reliability. But it does commoditize the execution environment, letting you focus on the agent's actual logic and purpose. It's a platform bet that the future of AI development is less about prompting a model and more about directing a stateful worker.&lt;/p&gt;

&lt;h2&gt;
  
  
  Sources
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://ai.google.dev/" rel="noopener noreferrer"&gt;Release notes | Gemini API - Google AI for Developers&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://developers.googleblog.com/2026/05/an-important-update-transitioning-gemini-cli-to-antigravity-cli.html" rel="noopener noreferrer"&gt;An important update: Transitioning Gemini CLI to Antigravity CLI&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>ai</category>
      <category>llm</category>
      <category>agents</category>
      <category>devtools</category>
    </item>
  </channel>
</rss>
