<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Karl Weinmeister</title>
    <description>The latest articles on DEV Community by Karl Weinmeister (@kweinmeister).</description>
    <link>https://dev.to/kweinmeister</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F2926299%2F7c094fc1-b557-4030-b220-fd4fc43ed1bd.jpeg</url>
      <title>DEV Community: Karl Weinmeister</title>
      <link>https://dev.to/kweinmeister</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/kweinmeister"/>
    <language>en</language>
    <item>
      <title>Google Antigravity SDK: The developer guide</title>
      <dc:creator>Karl Weinmeister</dc:creator>
      <pubDate>Tue, 09 Jun 2026 15:01:34 +0000</pubDate>
      <link>https://dev.to/googleai/google-antigravity-sdk-the-developer-guide-4o8m</link>
      <guid>https://dev.to/googleai/google-antigravity-sdk-the-developer-guide-4o8m</guid>
      <description>&lt;p&gt;The &lt;a href="https://antigravity.google/docs/sdk-overview?utm_campaign=CDR_0x2b6f3004_default_b521271009&amp;amp;utm_medium=external&amp;amp;utm_source=blog" rel="noopener noreferrer"&gt;Google Antigravity SDK&lt;/a&gt; is a Python framework for building and running autonomous agents. It decouples your agent’s logic from where it runs, letting you focus on what the agent does while the SDK manages execution and state.&lt;/p&gt;

&lt;p&gt;The Python SDK interfaces with a bundled Go harness over WebSockets. The local Go harness runs the core agentic loop and manages sandboxed tool execution. Python acts as the control plane where you configure tools, safety policies, and lifecycle hooks.&lt;/p&gt;

&lt;p&gt;This guide outlines the SDK’s architecture one layer at a time, referencing the official &lt;a href="https://github.com/google-antigravity/antigravity-sdk-python/tree/main/google/antigravity" rel="noopener noreferrer"&gt;source repository&lt;/a&gt;. Note that the SDK is currently pre-v1.0 and subject to change.&lt;/p&gt;

&lt;h3&gt;
  
  
  Where Antigravity fits in Google’s AI stack
&lt;/h3&gt;

&lt;p&gt;Google’s AI stack offers multiple levels of abstraction for building with Gemini. Choosing the right one depends on how much control you need over the execution loop.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;The&lt;/strong&gt; &lt;a href="https://ai.google.dev/gemini-api/docs" rel="noopener noreferrer"&gt;&lt;strong&gt;Gemini API&lt;/strong&gt;&lt;/a&gt; is stateless. You make an API call and get a response. You manage the entire loop.&lt;/li&gt;
&lt;li&gt;The &lt;a href="https://google.github.io/adk-docs/" rel="noopener noreferrer"&gt;&lt;strong&gt;Agent Development Kit&lt;/strong&gt;&lt;/a&gt; sits one level up. With the ADK, you design the event loops, pick the foundation models, and control how agents route messages to each other.&lt;/li&gt;
&lt;li&gt;The &lt;a href="https://antigravity.google/product/antigravity-sdk" rel="noopener noreferrer"&gt;&lt;strong&gt;Antigravity SDK&lt;/strong&gt;&lt;/a&gt; is a pre-packaged runtime tightly integrated with &lt;a href="https://ai.google.dev/?utm_campaign=CDR_0x2b6f3004_default_b521271009&amp;amp;utm_medium=external&amp;amp;utm_source=blog" rel="noopener noreferrer"&gt;Gemini&lt;/a&gt;. You don’t build the agentic loop; you’re given one. Your role is to govern it.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Getting started
&lt;/h3&gt;

&lt;p&gt;Install the package with &lt;code&gt;pip install google-antigravity&lt;/code&gt;, ensuring that &lt;code&gt;GEMINI_API_KEY&lt;/code&gt; is set in your environment. Then you’re ready to build your first agent!&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;asyncio&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;google.antigravity&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Agent&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;LocalAgentConfig&lt;/span&gt;

&lt;span class="c1"&gt;# 1. Define a tool function with a descriptive docstring
&lt;/span&gt;&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;get_weather&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;location&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Gets the current weather for a location.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;The weather in &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;location&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; is sunny, 72°F.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;main&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="c1"&gt;# 2. Register the tool in the agent configuration
&lt;/span&gt;    &lt;span class="n"&gt;config&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;LocalAgentConfig&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;system_instructions&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;You are a helpful weather assistant.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;get_weather&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# 3. Initialize the agent and query it
&lt;/span&gt;    &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="nc"&gt;Agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;How&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;s the weather in San Diego?&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;token&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;token&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;end&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;""&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;flush&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;asyncio&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;main&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;What’s happening here? The Agent context manager starts the Go harness, establishes a WebSocket connection, and registers the &lt;code&gt;get_weather&lt;/code&gt; function as an available tool. The model automatically decides when to invoke it based on the user’s prompt. When the async with block exits, the harness shuts down and all connections are closed.&lt;/p&gt;

&lt;h3&gt;
  
  
  The three-layer architecture
&lt;/h3&gt;

&lt;p&gt;The SDK separates concerns into three layers, each with a distinct responsibility.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fau0l0170zdvtjdpcagio.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fau0l0170zdvtjdpcagio.png" width="799" height="423"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Layer 1:&lt;/strong&gt; &lt;a href="https://github.com/google-antigravity/antigravity-sdk-python/blob/main/google/antigravity/agent.py" rel="noopener noreferrer"&gt;&lt;strong&gt;Agent&lt;/strong&gt;&lt;/a&gt; &lt;strong&gt;and&lt;/strong&gt; &lt;a href="https://github.com/google-antigravity/antigravity-sdk-python/tree/main/google/antigravity/types.py" rel="noopener noreferrer"&gt;&lt;strong&gt;LocalAgentConfig&lt;/strong&gt;&lt;/a&gt;. The high-level entry point. Manages configuration, session lifecycle, tool wiring, hooks, and triggers. This is where you spend most of your time.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Layer 2:&lt;/strong&gt; &lt;a href="https://github.com/google-antigravity/antigravity-sdk-python/tree/main/google/antigravity/conversation" rel="noopener noreferrer"&gt;&lt;strong&gt;Conversation&lt;/strong&gt;&lt;/a&gt;. The stateful session manager. Wraps the connection and handles message history accumulation, context window compaction, and token usage tracking (including Gemini’s “thinking tokens”).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Layer 3:&lt;/strong&gt; &lt;a href="https://github.com/google-antigravity/antigravity-sdk-python/tree/main/google/antigravity/connections" rel="noopener noreferrer"&gt;&lt;strong&gt;Connection&lt;/strong&gt;&lt;/a&gt; &lt;strong&gt;and&lt;/strong&gt; &lt;a href="https://github.com/google-antigravity/antigravity-sdk-python/tree/main/google/antigravity/connections" rel="noopener noreferrer"&gt;&lt;strong&gt;ConnectionStrategy&lt;/strong&gt;&lt;/a&gt;. The transport abstraction. For local development, LocalConnection communicates via WebSockets with the Go harness. This layer is what makes it possible to eventually swap in remote backends without changing your application code.&lt;/p&gt;

&lt;p&gt;Now let’s look at what you can build on top of those three layers.&lt;/p&gt;

&lt;h3&gt;
  
  
  Tools and MCP
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0icei4zfbsrgy2b3s1rb.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0icei4zfbsrgy2b3s1rb.png" width="799" height="309"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h4&gt;
  
  
  Built-in tools
&lt;/h4&gt;

&lt;p&gt;The Go harness ships with optimized native tools for standard OS interactions: &lt;code&gt;view_file&lt;/code&gt;, &lt;code&gt;edit_file&lt;/code&gt;, &lt;code&gt;create_file&lt;/code&gt;, &lt;code&gt;list_directory&lt;/code&gt;, &lt;code&gt;search_directory&lt;/code&gt;, &lt;code&gt;run_command&lt;/code&gt;, and &lt;code&gt;generate_image&lt;/code&gt;. These run inside the harness process, not in Python, so they’re fast and sandboxed.&lt;/p&gt;

&lt;h4&gt;
  
  
  Custom Python tools
&lt;/h4&gt;

&lt;p&gt;If you need the agent to call your business logic, you write a standard Python function. The SDK’s ToolRunner uses reflection to inspect type hints and parse docstrings, generating the Gemini FunctionDeclaration automatically.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;lookup_customer_tier&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;email&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Looks up a customer&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;s subscription tier.

    Args:
        email: The customer&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;s registered email address.
    &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;tier&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;db&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;query&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;email&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;The customer is on the &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;tier&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; plan.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

&lt;span class="n"&gt;config&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;LocalAgentConfig&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;lookup_customer_tier&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  ToolContext for stateful tools
&lt;/h4&gt;

&lt;p&gt;Sometimes a tool needs to remember things across invocations in the same conversation, like a pagination cursor or a running counter. Passing that state through the LLM wastes tokens and bloats the context window.&lt;/p&gt;

&lt;p&gt;The SDK provides &lt;a href="https://github.com/google-antigravity/antigravity-sdk-python/blob/main/google/antigravity/tools/tool_context.py" rel="noopener noreferrer"&gt;ToolContext&lt;/a&gt;, a conversation-scoped key-value store. Add &lt;code&gt;ctx: ToolContext&lt;/code&gt; to your function signature and the SDK injects it automatically. The model never sees the parameter.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;google.antigravity.tools.tool_context&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;ToolContext&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;process_logs&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;batch_size&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;ToolContext&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Processes the next batch of server logs.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;cursor&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_state&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;log_cursor&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;logs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;fetch_logs&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;offset&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;cursor&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;limit&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;batch_size&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;set_state&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;log_cursor&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;cursor&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;batch_size&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;logs&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  MCP integration
&lt;/h4&gt;

&lt;p&gt;The SDK has native support for the &lt;a href="https://modelcontextprotocol.io/" rel="noopener noreferrer"&gt;Model Context Protocol&lt;/a&gt; using both Stdio transport and Streamable HTTP. Point your agent at an MCP server and it for access to its exposed tools.&lt;/p&gt;

&lt;p&gt;Because MCP tools are integrated at the ToolRunner level, they’re governed by the exact same safety policies as built-in and custom tools.&lt;/p&gt;

&lt;h3&gt;
  
  
  Lifecycle hooks
&lt;/h3&gt;

&lt;p&gt;The SDK treats agent lifecycles through composable middleware using &lt;a href="https://github.com/google-antigravity/antigravity-sdk-python/tree/main/google/antigravity/hooks" rel="noopener noreferrer"&gt;hooks&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9yv0omxn3lr7oiziynur.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9yv0omxn3lr7oiziynur.png" width="800" height="243"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;A common security flaw in custom agent frameworks is the &lt;a href="https://en.wikipedia.org/wiki/Time-of-check_to_time-of-use" rel="noopener noreferrer"&gt;Time-Of-Check to Time-Of-Use&lt;/a&gt;, or TOCTOU, vulnerability. A security hook approves a tool call’s arguments, then a subsequent middleware mutates those arguments before execution. Antigravity prevents this by categorizing hooks into three archetypes, enforced by the type system.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Decide hooks&lt;/strong&gt; are read-only and blocking. They inspect incoming data (like a pending tool call) and return HookResult(allow=True/False). They can’t modify the payload. If any Decide hook denies, execution short-circuits. Example: &lt;a href="https://github.com/google-antigravity/antigravity-sdk-python/tree/main/google/antigravity/hooks" rel="noopener noreferrer"&gt;PreToolCallDecideHook&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Inspect hooks&lt;/strong&gt; are read-only and non-blocking. They receive data after an event and run concurrently. They can’t block the main flow. Example: &lt;code&gt;PostToolCallHook&lt;/code&gt; (writing tool outputs to external systems).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Transform hooks&lt;/strong&gt; are modifying and blocking. They receive data, mutate it, and pass the transformed payload back. Example: &lt;code&gt;OnToolErrorHook&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;OnToolErrorHook&lt;/code&gt; is particularly useful. When a tool throws an exception, instead of crashing the entire loop or dumping a raw Python traceback into the model’s context, you intercept the error and feed strategic recovery guidance:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;typing&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Optional&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;google.antigravity.hooks&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;hooks&lt;/span&gt;

&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;FallbackHook&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;hooks&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;OnToolErrorHook&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Intercepts tool errors and returns recovery guidance.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;

    &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;hooks&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;HookContext&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;Exception&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;Optional&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nf"&gt;isinstance&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;ValueError&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
            &lt;span class="nf"&gt;return &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;[System: Invalid parameters. &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Try &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;search_directory&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt; to find the correct ID.]&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
            &lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;

&lt;span class="n"&gt;config&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;LocalAgentConfig&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;hooks&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nc"&gt;FallbackHook&lt;/span&gt;&lt;span class="p"&gt;()])&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You can stack these hook types together to build a middleware pipeline. For example, you could include rate-limiting via Decide hooks, audit logging via Inspect hooks, and crash recovery via Transform hooks.&lt;/p&gt;

&lt;h3&gt;
  
  
  Safety policies
&lt;/h3&gt;

&lt;p&gt;Giving an autonomous agent access to your system requires guardrails. The SDK employs a declarative, priority-based &lt;a href="https://github.com/google-antigravity/antigravity-sdk-python/tree/main/google/antigravity/hooks" rel="noopener noreferrer"&gt;policy engine&lt;/a&gt; that evaluates every single action at the runtime hook level.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhrpgd52n73btlji7se5l.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhrpgd52n73btlji7se5l.png" width="800" height="225"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Out of the box, the SDK takes a strict security stance. If you spin up an agent with zero configuration, it defaults to &lt;code&gt;confirm_run_command()&lt;/code&gt;: the agent can read and write files, but shell execution requires explicit approval.&lt;/p&gt;

&lt;p&gt;Policies evaluate top-down using a priority model. You configure rules with &lt;code&gt;policy.allow()&lt;/code&gt;, &lt;code&gt;policy.deny()&lt;/code&gt;, and &lt;code&gt;policy.ask_user()&lt;/code&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;google.antigravity&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Agent&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;LocalAgentConfig&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;google.antigravity.hooks&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;policy&lt;/span&gt;

&lt;span class="n"&gt;policies&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="c1"&gt;# Block dangerous arguments instantly
&lt;/span&gt;    &lt;span class="n"&gt;policy&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;deny&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;run_command&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;when&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="k"&gt;lambda&lt;/span&gt; &lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;rm &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;CommandLine&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;""&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="c1"&gt;# Ask the human for any other shell command
&lt;/span&gt;    &lt;span class="n"&gt;policy&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;ask_user&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;run_command&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;handler&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;my_cli_prompt_function&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="c1"&gt;# Allow safe tools silently
&lt;/span&gt;    &lt;span class="n"&gt;policy&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;allow&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;view_file&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="c1"&gt;# Deny everything else
&lt;/span&gt;    &lt;span class="n"&gt;policy&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;deny&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;*&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="n"&gt;config&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;LocalAgentConfig&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;policies&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;policies&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  Human-in-the-loop
&lt;/h4&gt;

&lt;p&gt;The &lt;code&gt;policy.ask_user()&lt;/code&gt; builder pauses the execution loop, invokes your custom handler, and waits for approval before continuing.&lt;/p&gt;

&lt;h4&gt;
  
  
  Disabling vs. denying
&lt;/h4&gt;

&lt;p&gt;There’s an important distinction between disabling vs denying tools. &lt;code&gt;CapabilitiesConfig.disabled_tools&lt;/code&gt; physically removes a tool’s JSON Schema from the context window before sending the prompt to Gemini. The model doesn’t know the tool exists, and you save input tokens. &lt;code&gt;policy.deny()&lt;/code&gt; keeps the tool visible but blocks it at runtime. The model attempts to use it, gets an error message, and learns why it was blocked. It costs tokens for the failed attempt, but enables dynamic, argument-based restrictions and lets the model adapt.&lt;/p&gt;

&lt;h3&gt;
  
  
  Background triggers
&lt;/h3&gt;

&lt;p&gt;True autonomous systems monitor their environment and alert you proactively. The SDK’s &lt;a href="https://github.com/google-antigravity/antigravity-sdk-python/tree/main/google/antigravity/triggers" rel="noopener noreferrer"&gt;triggers&lt;/a&gt; are long-lived async tasks that run alongside the agent session, reacting to external events.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqjayzp2k38e3c62rebd5.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqjayzp2k38e3c62rebd5.png" width="800" height="301"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;When you start an Agent context, the TriggerRunner spawns a separate asyncio task for each registered trigger. A crashing trigger won’t take down the agent. A busy agent won’t block the triggers.&lt;/p&gt;

&lt;p&gt;Each trigger receives a TriggerContext. When it notices something in the outside world, it calls &lt;code&gt;ctx.send(“Message”)&lt;/code&gt; to inject a notification into the agent’s conversation history. The agent reacts as if the user had typed it.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;google.antigravity&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Agent&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;LocalAgentConfig&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;google.antigravity.triggers&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;every&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;TriggerContext&lt;/span&gt;

&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;monitor_queue&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;TriggerContext&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;tickets&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;fetch_pagerduty_alerts&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;tickets&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;send&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;[System Alert]: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tickets&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; new P0 alerts detected.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;config&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;LocalAgentConfig&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;triggers&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nf"&gt;every&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;60&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;monitor_queue&lt;/span&gt;&lt;span class="p"&gt;)])&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The SDK also ships &lt;code&gt;triggers.on_file_change()&lt;/code&gt; for OS-level file watching (great for local coding assistants) and &lt;code&gt;@triggers.trigger&lt;/code&gt; for custom async listeners like GitHub webhook receivers.&lt;/p&gt;

&lt;h3&gt;
  
  
  Streaming and thoughts
&lt;/h3&gt;

&lt;p&gt;When an agent is executing a multi-step task, waiting for a final output can make the application feel frozen.&lt;/p&gt;

&lt;p&gt;The SDK addresses this by streaming execution events in real time. Instead of blocking, &lt;code&gt;await agent.chat()&lt;/code&gt; immediately returns a &lt;code&gt;ChatResponse&lt;/code&gt; object. This object acts as a shared, memory-cached buffer.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fozapsow80nhpy3xkx85a.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fozapsow80nhpy3xkx85a.png" width="799" height="256"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Unlike standard Python generators, which are exhausted once read, ChatResponse lets you attach multiple independent cursors to the same stream. This allows you to route different aspects of the same agent turn concurrently:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Main text stream&lt;/strong&gt; (e.g., rendering markdown chunks to your frontend UI)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Chain-of-thought stream&lt;/strong&gt; (e.g., logging the agent’s internal reasoning to a developer console)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Tool-call stream&lt;/strong&gt; (e.g., displaying a live status widget as the agent invokes tools)
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Write a short story.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Stream raw text tokens
&lt;/span&gt;&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;token&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;token&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;end&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;""&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;flush&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The response.thoughts stream exposes the model’s Chain-of-Thought reasoning in real-time. Token costs are tracked with &lt;code&gt;response.usage_metadata.thoughts_token_count&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;response.tool_calls&lt;/code&gt; stream yields strongly-typed ToolCall objects as soon as the agent dispatches them, so your UI can render updates instantly.&lt;/p&gt;

&lt;h3&gt;
  
  
  Subagents
&lt;/h3&gt;

&lt;p&gt;One of the most common pitfalls in autonomous agents is context window bloat. The SDK solves this through hierarchical delegation.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0gdbm0nymphjwhg7cv00.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0gdbm0nymphjwhg7cv00.png" width="798" height="145"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Instead of doing all the work in a single thread, the main agent invokes the built-in &lt;code&gt;start_subagent&lt;/code&gt; tool. This prompts the harness to spin up a fresh agent session with a clean context window to handle the subtask in isolation. The subagent works through the problem using its own tools and MCP servers, then shuts down. It returns only a synthesized summary of its findings, keeping the main agent’s context window clean and focused on high-level orchestration.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;google.antigravity&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Agent&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;LocalAgentConfig&lt;/span&gt;

&lt;span class="n"&gt;config&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;LocalAgentConfig&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;system_instructions&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;You are a lead developer. Delegate heavy research to subagents.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="nc"&gt;Agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;prompt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Use a subagent to research the /docs directory and &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;write a synthesized lesson plan based on what it finds.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;chat&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;text&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;To prevent privilege escalation, safety policies and hooks cascade hierarchically. If the main agent is restricted from running terminal commands, those same restrictions automatically apply to any subagents it spawns. You can also intercept and inspect subagent lifecycles using the same hook middleware (PreToolCallDecideHook and PostToolCallHook) that governs regular tool calls.&lt;/p&gt;

&lt;h3&gt;
  
  
  What will you build?
&lt;/h3&gt;

&lt;p&gt;Building an agent loop is relatively straightforward, but securing and monitoring it in production is where challenges typically begin. The Antigravity SDK bridges this gap by decoupling your agent’s logic from its execution environment.&lt;/p&gt;

&lt;p&gt;To get started, review the &lt;a href="https://antigravity.google/docs/sdk-overview?utm_campaign=CDR_0x2b6f3004_default_b521271009&amp;amp;utm_medium=external&amp;amp;utm_source=blog" rel="noopener noreferrer"&gt;SDK overview docs&lt;/a&gt; and clone the &lt;a href="https://github.com/google-antigravity/antigravity-sdk-python" rel="noopener noreferrer"&gt;source repository&lt;/a&gt;. Then try out one of the &lt;a href="https://github.com/google-antigravity/antigravity-sdk-python/tree/main/examples" rel="noopener noreferrer"&gt;examples&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Stay tuned for the next agent I’ll build with the Antigravity SDK! Share with me what you’re building on &lt;a href="https://x.com/kweinmeister" rel="noopener noreferrer"&gt;X&lt;/a&gt;, &lt;a href="https://www.linkedin.com/in/karlweinmeister/" rel="noopener noreferrer"&gt;LinkedIn&lt;/a&gt;, or &lt;a href="https://bsky.app/profile/kweinmeister.bsky.social" rel="noopener noreferrer"&gt;Bluesky&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>mcpserver</category>
      <category>googlegemini</category>
      <category>python</category>
      <category>agents</category>
    </item>
    <item>
      <title>A Practical Guide to Evaluating Multi-Turn Agent Trajectories</title>
      <dc:creator>Karl Weinmeister</dc:creator>
      <pubDate>Wed, 27 May 2026 17:33:04 +0000</pubDate>
      <link>https://dev.to/googleai/a-practical-guide-to-evaluating-multi-turn-agent-trajectories-103b</link>
      <guid>https://dev.to/googleai/a-practical-guide-to-evaluating-multi-turn-agent-trajectories-103b</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9y553biysejsf15q8gxl.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9y553biysejsf15q8gxl.jpeg" width="800" height="339"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Would you let an AI agent run in your terminal for hours, executing hundreds of tools, without being able to see what it is doing under the hood?&lt;/p&gt;

&lt;p&gt;Harnesses like &lt;a href="https://github.com/google/antigravity" rel="noopener noreferrer"&gt;Antigravity&lt;/a&gt; or &lt;a href="https://docs.anthropic.com/en/docs/agents-and-tools/claude-code" rel="noopener noreferrer"&gt;Claude Code&lt;/a&gt; can run for hours without intervention. If you’re driving one of these systems, you’re in the driver’s seat: you pick the base model, configure the harness, add skills, and plug in &lt;a href="https://modelcontextprotocol.io" rel="noopener noreferrer"&gt;MCP servers&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;But how do you know if your configuration is working? Response-based grading won’t cut it. You need trajectory-level evaluation that goes beyond analyzing just the final answer. This post walks through a telemetry-driven framework for measuring multi-turn agent systems.&lt;/p&gt;

&lt;h3&gt;
  
  
  Evaluating a Trajectory
&lt;/h3&gt;

&lt;p&gt;Standard LLM evaluation grades a single response to a single prompt, assessing factors like factual correctness and semantic relevance.&lt;/p&gt;

&lt;p&gt;Evaluating only the final output limits what you can see. An agent can stumble into the right answer despite a broken intermediate plan. Likewise, a minor formatting bug at the very end can mask an otherwise successful run.&lt;/p&gt;

&lt;p&gt;To evaluate an agent, you need to examine the entire trajectory: every prompt, thought, tool call, and state change across dozens of turns.&lt;/p&gt;

&lt;p&gt;Take a typical &lt;a href="https://cloud.google.com/kubernetes-engine?utm_campaign=CDR_0x2b6f3004_default_b516743609&amp;amp;utm_medium=external&amp;amp;utm_source=other" rel="noopener noreferrer"&gt;GKE&lt;/a&gt; cluster deployment trajectory. It chains multiple steps from gcloud to kubectl commands beforereporting success:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8vsaqvya560bs34j1mgs.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8vsaqvya560bs34j1mgs.png" width="800" height="65"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The compounding decay of sequential decision-making explains why this matters. If your agent completes ksteps, overall success probability is the product of each step’s reliability. If each step is 95% reliable, look at how quickly your overall success rate drops as the number of steps increases:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fb0wjeep6t22c1mur0bus.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fb0wjeep6t22c1mur0bus.png" width="800" height="401"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This is why performing well in single-turn demos doesn’t necessarily translate into real-world success. One failure can derail your agent or get it stuck in a loop.&lt;/p&gt;

&lt;p&gt;Public benchmarks such as &lt;a href="https://huggingface.co/papers/2311.12983" rel="noopener noreferrer"&gt;GAIA&lt;/a&gt; and &lt;a href="https://www.swebench.com/" rel="noopener noreferrer"&gt;SWE-bench&lt;/a&gt; measure general capabilities, and may not correlate with performance for your use case. Another pitfall is that &lt;a href="https://rdi.berkeley.edu/blog/trustworthy-benchmarks-cont/" rel="noopener noreferrer"&gt;researchers have shown&lt;/a&gt; that agents can game these benchmarks without actually solving the tasks. A well-defined, customized evaluation trajectory can address these challenges.&lt;/p&gt;

&lt;h3&gt;
  
  
  Optimization Metrics
&lt;/h3&gt;

&lt;p&gt;When tuning your agentic stack, tracking success rates with resource consumption is critical.&lt;/p&gt;

&lt;p&gt;Your agent’s &lt;strong&gt;success rate&lt;/strong&gt; is the cornerstone, but binary pass/fail is insufficient. You need structured milestones that provide signal even when a run fails part of the way through. When a run fails, you award partial credit for the furthest milestone reached. That gradient gives you an actionable signal for prompt and tool adjustments.&lt;/p&gt;

&lt;p&gt;Long-running agents are token amplifiers. One prompt can trigger dozens of sequential LLM calls. Because the full conversation history is re-sent each turn, input &lt;strong&gt;token usage&lt;/strong&gt; can grow quadratically. That quadratic growth drives both cost and latency. Monitoring factors like cache hit ratios and total step counts is essential for determining whether your agent is production-viable.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Duration&lt;/strong&gt; , considered in clock time or number of steps, also figures into the cost analysis. A cheaper model may enter endless execution loops or fail entirely, forcing developer intervention. Meanwhile, a premium model that completes the task in fewer turns can cost less overall.&lt;/p&gt;

&lt;h3&gt;
  
  
  Tracing with OpenTelemetry
&lt;/h3&gt;

&lt;p&gt;When high-level metrics slip, you need to know why. &lt;a href="https://cloud.google.com/learn/what-is-opentelemetry?utm_campaign=CDR_0x2b6f3004_default_b516743609&amp;amp;utm_medium=external&amp;amp;utm_source=blog" rel="noopener noreferrer"&gt;OpenTelemetry&lt;/a&gt; exports from your agent harness to capture standardized traces, metrics, and logs for every step.&lt;/p&gt;

&lt;p&gt;At the &lt;strong&gt;interaction&lt;/strong&gt; level, you can log prompt length and turn counters. This is where you monitor plan coherence and detect loops. You can also verify result utilization (does the agent actually &lt;em&gt;use&lt;/em&gt; the tool’s output in its next planning step, or does it ignore it and rely on hallucinated memories?). Be careful about enabling &lt;code&gt;OTEL_LOG_USER_PROMPTS&lt;/code&gt; without privacy controls in place.&lt;/p&gt;

&lt;p&gt;At the &lt;strong&gt;LLM request&lt;/strong&gt; level, capture cache tokens to compute cost and model efficiency. This is where you verify tool selection accuracy: is the agent choosing the right tool? Or is it running a broad web search when it should be running a local database query?&lt;/p&gt;

&lt;p&gt;At the &lt;strong&gt;tool&lt;/strong&gt; level, log structural parameters like the MCP server name and argument payloads, pinpointing exactly which tool call failed. This is where you validate argument extraction (do the arguments match the target JSON schema?) and track active error recovery. When a tool fails, does the agent gracefully recover and try an alternative, or does it just crash?&lt;/p&gt;

&lt;h3&gt;
  
  
  Optimizing your API Spend
&lt;/h3&gt;

&lt;p&gt;Long agent sessions are going to accumulate cost quickly. Combining the right strategies can reduce API spend significantly, while maintaining or even improving reliability.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://medium.com/google-cloud/a-developers-guide-to-model-routing-1f21ecc34d60" rel="noopener noreferrer"&gt;Model routing&lt;/a&gt; analyzes prompt complexity to route routine tasks to cheaper models. &lt;a href="https://platform.claude.com/docs/en/build-with-claude/compaction" rel="noopener noreferrer"&gt;Context compaction&lt;/a&gt; uses verbatim compaction to remove low-signal lines from history, preserving exact code signatures and error codes. &lt;a href="https://ai.google.dev/gemini-api/docs/caching?utm_campaign=CDR_0x2b6f3004_default_b516743609&amp;amp;utm_medium=external&amp;amp;utm_source=blog" rel="noopener noreferrer"&gt;Prompt caching&lt;/a&gt; natively caches stable prefixes (such as system rules and tool schemas) at the model level.&lt;/p&gt;

&lt;p&gt;One catch to keep in mind: prefix caching is sequential. Any change in the middle of your prompt invalidates the cache for everything that follows. How can you address this?&lt;/p&gt;

&lt;p&gt;First, relocate dynamic data to the tail by moving system messages like progress tracking to the end of your prompt. Wrapping them in custom XML tags like &lt;code&gt;&amp;lt;system-reminder&amp;gt;&lt;/code&gt; ensures the model still parses them correctly, isolating churn to the tail and leaving your large static system prompt and tool definitions fully cached.&lt;/p&gt;

&lt;p&gt;Second, sort your tool definitions alphabetically. This keeps tool schemas and subagent configurations byte-identical across runs, allowing different user sessions to share the same cache prefix.&lt;/p&gt;

&lt;p&gt;Finally, freeze the clock by avoiding real-time clock injections. Freezing the &lt;code&gt;datetime&lt;/code&gt; at task start (e.g., “Thursday, April 3, 2026”) ensures that minor clock ticks do not bust the cache.&lt;/p&gt;

&lt;h3&gt;
  
  
  Managing Context Bloat
&lt;/h3&gt;

&lt;p&gt;In long-running tasks, verbose terminal logs and large file dumps clog the context window. This bloat drives up API costs and degrades model quality because the current goal gets buried in stale noise. Keeping an agent stable over multi-hour runs requires active context management.&lt;/p&gt;

&lt;p&gt;Left unmanaged, token usage grows linearly until you hit the context wall. You can tackle this at two levels: the harness and the agent architecture.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fiskp7zvg95b96o742mlb.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fiskp7zvg95b96o742mlb.png" width="800" height="447"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h4&gt;
  
  
  &lt;strong&gt;Harness level compaction&lt;/strong&gt;
&lt;/h4&gt;

&lt;p&gt;Modern harnesses implement auto-compaction to protect the context window. They don’t just drop the oldest messages. Instead, they use strategies like truncating massive stdout payloads, stripping out verbatim file dumps once they are no longer needed, and replacing long conversational exchanges with AI-generated summaries. For example, Claude Code triggers compaction at 95% context usage by default, but you can tune this using the &lt;code&gt;CLAUDE_AUTOCOMPACT_PCT_OVERRIDE&lt;/code&gt; parameter. Antigravity uses a similar approach, focusing on preserving the most recent working state while compressing the history.&lt;/p&gt;

&lt;h4&gt;
  
  
  &lt;strong&gt;Focusing your agent&lt;/strong&gt;
&lt;/h4&gt;

&lt;p&gt;While harness compaction is a great safety net, you can build active compression directly into your agent’s toolset using a focus agent pattern.&lt;/p&gt;

&lt;p&gt;You can expose two tools to the LLM to “start focus” and “complete focus.” When your agent is about to start a complex sub-task, it can call start focus to checkpoint the context. Then, once it finishes, it can call complete focus to summarize what it learned. The harness can then prune all of the messy details like logs from the context that is no longer needed.&lt;/p&gt;

&lt;p&gt;This creates a sawtooth pattern in your token usage. You keep the context window focused on the active problem instead of dragging the entire history of the session along for the ride:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fs036056bhln8z7cf4h3g.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fs036056bhln8z7cf4h3g.png" width="800" height="447"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Applying what you’ve learned
&lt;/h3&gt;

&lt;p&gt;Building agents that run for hours requires a different playbook than single-turn LLM development. Performance decays over long sequences, so trace-level telemetry is how you see what’s actually happening. Alphabetical tool sorting, relocating dynamic data, context compaction, and model routing together can cut API costs.&lt;/p&gt;

&lt;p&gt;With the right evaluation strategy in place, you can deploy agents confidently in production. I’d love to compare notes on how you’re measuring your stack. Find me on &lt;a href="https://www.linkedin.com/in/karlweinmeister/" rel="noopener noreferrer"&gt;LinkedIn&lt;/a&gt;, &lt;a href="https://x.com/kweinmeister" rel="noopener noreferrer"&gt;X&lt;/a&gt;, or &lt;a href="https://bsky.app/profile/kweinmeister.bsky.social" rel="noopener noreferrer"&gt;Bluesky&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>largelanguagemodels</category>
      <category>machinelearning</category>
      <category>agents</category>
      <category>generativeaitools</category>
    </item>
    <item>
      <title>On-Device AI with the Google AI Edge Gallery and Gemma 4</title>
      <dc:creator>Karl Weinmeister</dc:creator>
      <pubDate>Mon, 06 Apr 2026 21:40:03 +0000</pubDate>
      <link>https://dev.to/googleai/on-device-ai-with-the-google-ai-edge-gallery-and-gemma-4-ena</link>
      <guid>https://dev.to/googleai/on-device-ai-with-the-google-ai-edge-gallery-and-gemma-4-ena</guid>
      <description>&lt;p&gt;Until recently, running an LLM on your phone meant one thing: chat. You could have a conversation or maybe summarize some text. You were back to the cloud the moment you needed the model to do something more.&lt;/p&gt;

&lt;p&gt;The &lt;a href="https://github.com/google-ai-edge/gallery" rel="noopener noreferrer"&gt;Google AI Edge Gallery&lt;/a&gt; app, updated with the release of the &lt;a href="https://blog.google/technology/developers/gemma-4/?utm_campaign=CDR_0x2b6f3004_default_b500092006&amp;amp;utm_medium=external&amp;amp;utm_source=blog" rel="noopener noreferrer"&gt;Gemma 4&lt;/a&gt; open-weight model family, shows what’s now possible. It can generate structured code and control device settings with natural language, all running offline on your phone. This post covers the Gallery’s key features, walks through building a custom Agent Skill, and shows how to transition to &lt;a href="https://cloud.google.com/?utm_campaign=CDR_0x2b6f3004_default_b500092006&amp;amp;utm_medium=external&amp;amp;utm_source=blog" rel="noopener noreferrer"&gt;Google Cloud&lt;/a&gt; when you’re ready to try larger model variants.&lt;/p&gt;

&lt;p&gt;  &lt;iframe src="https://www.youtube.com/embed/7vYh-TE2J4o"&gt;
  &lt;/iframe&gt;
&lt;/p&gt;

&lt;h3&gt;
  
  
  Gemma 4 for Edge AI
&lt;/h3&gt;

&lt;p&gt;Let’s start with a brief introduction to Gemma 4, and how it makes agentic AI at the edge possible.&lt;/p&gt;

&lt;p&gt;The Gemma 4 family includes two edge-optimized variants that the Gallery app runs natively: &lt;strong&gt;Gemma 4 E2B&lt;/strong&gt; (Effective 2 Billion parameters) and &lt;strong&gt;Gemma 4 E4B&lt;/strong&gt; (Effective 4 Billion). “Effective” is the keyword: these models use a per-layer embedding architecture that keeps memory footprints tiny, while punching well above their weight class in reasoning benchmarks. All of the Gemma 4 models are fully open-weight, shipping under the &lt;a href="https://www.apache.org/licenses/LICENSE-2.0" rel="noopener noreferrer"&gt;Apache 2.0 license&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;What makes these models useful beyond chat is a combination of three capabilities. First, they’ve been fine-tuned for structured output. Given a tool schema, they reliably emit parsable JSON. Second, a 128K context window, accelerated locally via &lt;a href="https://github.com/google-ai-edge/LiteRT-LM" rel="noopener noreferrer"&gt;LiteRT-LM&lt;/a&gt;, gives the model enough memory to handle long conversations and multi-step interactions without losing track of earlier context. Third, multimodal vision lets E2B and E4B process images and output bounding box coordinates for UI elements, opening the door to screen-aware applications.&lt;/p&gt;

&lt;p&gt;  &lt;iframe src="https://www.youtube.com/embed/rUMvZd8m7vo"&gt;
  &lt;/iframe&gt;
&lt;/p&gt;

&lt;h3&gt;
  
  
  The Google AI Edge Gallery
&lt;/h3&gt;

&lt;p&gt;The &lt;a href="https://github.com/google-ai-edge/gallery" rel="noopener noreferrer"&gt;Google AI Edge Gallery&lt;/a&gt; is an open-source app designed to showcase what on-device generative AI can actually do. It’s available right now on both major mobile platforms:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fef04q5bt9abmkqvg7hr3.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fef04q5bt9abmkqvg7hr3.jpeg" width="800" height="339"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Once installed, you can download Gemma 4 E2B or E4B models directly within the app from &lt;a href="https://huggingface.co/litert-community" rel="noopener noreferrer"&gt;Hugging Face&lt;/a&gt; and see what a fully offline LLM can do on your hardware. The app is &lt;a href="https://github.com/google-ai-edge/gallery" rel="noopener noreferrer"&gt;entirely open-source&lt;/a&gt; (Kotlin on Android, Swift on iOS), so you can study the implementation, fork it, or use it as a reference for integrating &lt;a href="https://github.com/google-ai-edge/LiteRT-LM" rel="noopener noreferrer"&gt;LiteRT-LM&lt;/a&gt; into your own mobile apps.&lt;/p&gt;

&lt;p&gt;If you want to build function calling into your own Android app, the repo’s &lt;a href="https://github.com/google-ai-edge/gallery/blob/main/Function_Calling_Guide.md" rel="noopener noreferrer"&gt;Function Calling Guide&lt;/a&gt; walks through the Kotlin patterns for cloning the Gallery, defining custom ActionType enums, annotating tools with &lt;code&gt;@Tool&lt;/code&gt; and &lt;code&gt;@ToolParam&lt;/code&gt;, and wiring up performAction handlers. iOS developers can reference the same architectural patterns with the open-source Swift implementation.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fctq5fkig7xjvcl0nwdp6.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fctq5fkig7xjvcl0nwdp6.png" width="800" height="1739"&gt;&lt;/a&gt;&lt;/p&gt;
Google AI Edge Gallery UI on iOS



&lt;h3&gt;
  
  
  Prompt Lab
&lt;/h3&gt;

&lt;p&gt;The &lt;a href="https://github.com/google-ai-edge/gallery/wiki/4.-Using-Core-AI-Capabilities" rel="noopener noreferrer"&gt;Prompt Lab&lt;/a&gt; gives you single-turn prompt execution with granular control over temperature, top-k, and other generation parameters. It ships with several task templates: Freeform Prompt, Summarize Text, Rewrite Tone, and Code Snippet.&lt;/p&gt;

&lt;p&gt;To try it out, select Code Snippet, choose Python, and type: &lt;em&gt;“Print the numbers 1 through 10.”&lt;/em&gt; The model generates working code on-device:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;11&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That’s a trivial example, but the point is what’s happening underneath: the model parsed a natural language instruction, selected the correct language target, and emitted structured, executable output. Swap the prompt for something harder (&lt;em&gt;“Write a function that fetches JSON from a URL and retries with exponential backoff”&lt;/em&gt;) and you’ll see the same pattern hold up.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fe0q2rmq3sl9ymjam1ddm.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fe0q2rmq3sl9ymjam1ddm.png" width="800" height="1739"&gt;&lt;/a&gt;&lt;/p&gt;
Prompt Lab UI on iOS



&lt;h3&gt;
  
  
  Agent Skills
&lt;/h3&gt;

&lt;p&gt;The &lt;a href="https://github.com/google-ai-edge/gallery/tree/main/skills" rel="noopener noreferrer"&gt;Agent Skills&lt;/a&gt; feature is where things get interesting. Skills are modular tool packages: each one gives the model a new capability without bloating the system prompt with instructions it doesn’t need for the current task.&lt;/p&gt;

&lt;p&gt;Each skill is defined by a SKILL.md file containing metadata and instructions. The LLM reviews available skill names and descriptions appended to its system prompt, and if a user’s request aligns with a skill, it invokes it automatically. Built-in skills include Wikipedia lookups, interactive maps, QR code generation, and mood tracking. You can load custom skills three ways: from the &lt;a href="https://github.com/google-ai-edge/gallery/tree/main/skills/featured" rel="noopener noreferrer"&gt;community-featured gallery&lt;/a&gt;, via a URL, or by importing from a local file.&lt;/p&gt;

&lt;p&gt;For developers who want to build their own skills, the architecture supports two execution paths: &lt;strong&gt;JavaScript skills&lt;/strong&gt; (custom logic running inside a hidden webview, with full access to the web ecosystem including fetch(), CDN libraries, and even WebAssembly) and &lt;strong&gt;Native App Intents&lt;/strong&gt; (leveraging built-in OS capabilities — currently sending email and text messages out of the box, with the ability to add more by &lt;a href="https://github.com/google-ai-edge/gallery/tree/main/Android/src/app/src/main/java/com/google/ai/edge/gallery/customtasks/agentchat/IntentHandler.kt" rel="noopener noreferrer"&gt;extending the app’s source code&lt;/a&gt;).&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnta6yb5n3qkh0swvj3vr.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnta6yb5n3qkh0swvj3vr.png" width="800" height="1739"&gt;&lt;/a&gt;&lt;/p&gt;
Agent Skills UI on iOS



&lt;h3&gt;
  
  
  Mobile Actions and Beyond
&lt;/h3&gt;

&lt;p&gt;The Gallery also includes &lt;strong&gt;Mobile Actions,&lt;/strong&gt; a feature powered by a fine-tuned &lt;a href="https://huggingface.co/google/functiongemma-270m" rel="noopener noreferrer"&gt;FunctionGemma 270M&lt;/a&gt; model, that demonstrates offline device controls. These include toggling the flashlight, adjusting volume, or launching apps, all triggered by natural language.&lt;/p&gt;

&lt;p&gt;Other workspaces include &lt;strong&gt;AI Chat with Thinking Mode&lt;/strong&gt; (multi-turn conversations where you can toggle the model’s step-by-step reasoning visualization, currently supported for the Gemma 4 family), &lt;strong&gt;Ask Image&lt;/strong&gt; (multimodal object recognition and visual Q&amp;amp;A using your camera or photo gallery), &lt;strong&gt;Audio Scribe&lt;/strong&gt; (on-device voice transcription and translation), and &lt;strong&gt;Model Management &amp;amp; Benchmark&lt;/strong&gt; for profiling how each model performs on your specific hardware.&lt;/p&gt;

&lt;p&gt;For a full walkthrough of every feature, check the &lt;a href="https://github.com/google-ai-edge/gallery/wiki" rel="noopener noreferrer"&gt;Project Wiki&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0ifgxyvx1xwi4so2he32.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0ifgxyvx1xwi4so2he32.png" width="800" height="1739"&gt;&lt;/a&gt;&lt;/p&gt;
Mobile Actions UI on iOS



&lt;h3&gt;
  
  
  Scaling to the Cloud
&lt;/h3&gt;

&lt;p&gt;The Edge Gallery shows you what Gemma 4 can do at the edge. When you’re ready for more power, every model in the Gemma 4 family shares the same &lt;a href="https://ai.google.dev/gemma/docs/capabilities/text/function-calling-gemma4?utm_campaign=CDR_0x2b6f3004_default_b500092006&amp;amp;utm_medium=external&amp;amp;utm_source=blog" rel="noopener noreferrer"&gt;chat template, tokenizer, and function-calling format&lt;/a&gt;. The prompts and skills you develop locally will work the same way with a larger Gemma 4 model running in the cloud.&lt;/p&gt;

&lt;p&gt;Google Cloud provides an &lt;a href="https://cloud.google.com/run/docs/run-gemma-on-cloud-run?utm_campaign=CDR_0x2b6f3004_default_b500092006&amp;amp;utm_medium=external&amp;amp;utm_source=blog" rel="noopener noreferrer"&gt;official guide for deploying Gemma 4 on Cloud Run&lt;/a&gt; using a prebuilt &lt;a href="https://docs.vllm.ai/" rel="noopener noreferrer"&gt;vLLM&lt;/a&gt; container with GPU support, and &lt;a href="https://cloud.google.com/vertex-ai?utm_campaign=CDR_0x2b6f3004_default_b500092006&amp;amp;utm_medium=external&amp;amp;utm_source=blog" rel="noopener noreferrer"&gt;Vertex AI&lt;/a&gt; offers managed endpoints with fine-tuning capabilities for enterprise deployments. The &lt;a href="https://google.github.io/adk-docs/" rel="noopener noreferrer"&gt;Agent Development Kit (ADK)&lt;/a&gt; provides the orchestration framework for building production agents on top of either target.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxonqoqol15e586qpyant.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxonqoqol15e586qpyant.png" width="800" height="413"&gt;&lt;/a&gt;&lt;/p&gt;
Gemma 4 in the Vertex AI Model Garden



&lt;h3&gt;
  
  
  Getting Started
&lt;/h3&gt;

&lt;p&gt;On-device AI just got a lot more capable. The Google AI Edge Gallery makes it easy to see for yourself. Here’s my roadmap to get started:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Download the&lt;/strong&gt; &lt;a href="https://github.com/google-ai-edge/gallery" rel="noopener noreferrer"&gt;&lt;strong&gt;Google AI Edge Gallery&lt;/strong&gt;&lt;/a&gt; on &lt;a href="https://play.google.com/store/apps/details?id=com.google.ai.edge.gallery" rel="noopener noreferrer"&gt;Android&lt;/a&gt; or &lt;a href="https://apps.apple.com/us/app/google-ai-edge-gallery/id6749645337" rel="noopener noreferrer"&gt;iOS&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Try the Code Snippet template&lt;/strong&gt; in the &lt;a href="https://github.com/google-ai-edge/gallery/wiki/4.-Using-Core-AI-Capabilities" rel="noopener noreferrer"&gt;Prompt Lab&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Build a custom Agent Skill&lt;/strong&gt; by following the &lt;a href="https://github.com/google-ai-edge/gallery/tree/main/skills" rel="noopener noreferrer"&gt;Skills guide&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Head to the&lt;/strong&gt; &lt;a href="https://console.cloud.google.com/?utm_campaign=CDR_0x2b6f3004_default_b500092006&amp;amp;utm_medium=external&amp;amp;utm_source=blog" rel="noopener noreferrer"&gt;&lt;strong&gt;Google Cloud Console&lt;/strong&gt;&lt;/a&gt; to spin up a larger Gemma 4 variant on &lt;a href="https://cloud.google.com/run?utm_campaign=CDR_0x2b6f3004_default_b500092006&amp;amp;utm_medium=external&amp;amp;utm_source=blog" rel="noopener noreferrer"&gt;Cloud Run&lt;/a&gt; or &lt;a href="https://cloud.google.com/vertex-ai?utm_campaign=CDR_0x2b6f3004_default_b500092006&amp;amp;utm_medium=external&amp;amp;utm_source=blog" rel="noopener noreferrer"&gt;Vertex AI&lt;/a&gt; for your backend agent.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;If you build something cool with the Google AI Edge Gallery, I’d love to hear about it. You can find me on &lt;a href="https://www.linkedin.com/in/karlweinmeister/" rel="noopener noreferrer"&gt;LinkedIn&lt;/a&gt;, &lt;a href="https://x.com/kweinmeister" rel="noopener noreferrer"&gt;X&lt;/a&gt;, or &lt;a href="https://bsky.app/profile/kweinmeister.bsky.social" rel="noopener noreferrer"&gt;Bluesky&lt;/a&gt;.&lt;/p&gt;




</description>
      <category>aiondevice</category>
      <category>android</category>
      <category>ios</category>
      <category>gemma</category>
    </item>
    <item>
      <title>How to Use the Gemini Deep Research API in Production</title>
      <dc:creator>Karl Weinmeister</dc:creator>
      <pubDate>Wed, 04 Mar 2026 16:08:05 +0000</pubDate>
      <link>https://dev.to/googleai/how-to-use-the-gemini-deep-research-api-in-production-3cif</link>
      <guid>https://dev.to/googleai/how-to-use-the-gemini-deep-research-api-in-production-3cif</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvgvykoqjjzof8mwmrtxc.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvgvykoqjjzof8mwmrtxc.png" alt="Cover image" width="800" height="446"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;How many of us have gone down the research rabbit hole? Way too many tabs, links, and notes in the pursuit of knowledge? It’s all useful stuff, but time-consuming and distracting.&lt;/p&gt;

&lt;p&gt;Since I discovered the &lt;a href="https://ai.google.dev/gemini-api/docs/deep-research?utm_campaign=CDR_0x2b6f3004_default_b488870862&amp;amp;utm_medium=external&amp;amp;utm_source=blog" rel="noopener noreferrer"&gt;Gemini Deep Research Agent&lt;/a&gt;, I haven’t turned back. And best of all, it has a powerful and straightforward API to kick off research programmatically. Let’s explore how to use it, and the patterns for including this in a production architecture.&lt;/p&gt;

&lt;h3&gt;
  
  
  Async changes everything
&lt;/h3&gt;

&lt;p&gt;A single research task can trigger dozens of search queries and take several minutes to complete. The asynchronous &lt;a href="https://ai.google.dev/gemini-api/docs/interactions?utm_campaign=CDR_0x2b6f3004_default_b488870862&amp;amp;utm_medium=external&amp;amp;utm_source=blog" rel="noopener noreferrer"&gt;Interactions API&lt;/a&gt; provides a polling-based interface with a required &lt;code&gt;background=True&lt;/code&gt; parameter to check on progress.&lt;/p&gt;

&lt;p&gt;If you’ve ever worked with a &lt;a href="https://cloud.google.com/pubsub/docs/overview?utm_campaign=CDR_0x2b6f3004_default_b488870862&amp;amp;utm_medium=external&amp;amp;utm_source=blog" rel="noopener noreferrer"&gt;Pub/Sub&lt;/a&gt; pipeline or job queue, this will feel familiar.&lt;/p&gt;

&lt;h3&gt;
  
  
  Meet the Interactions API
&lt;/h3&gt;

&lt;p&gt;The Interactions API is a newer, unified interface for working with Gemini models and agents. It replaces the older &lt;code&gt;generateContent&lt;/code&gt; pattern for scenarios that need state management, tool orchestration, or background execution.&lt;/p&gt;

&lt;p&gt;You create an interaction, point it at the deep research agent, and tell it to run in the background:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;google&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;genai&lt;/span&gt;

&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;genai&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Client&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;GEMINI_API_KEY&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Launch the research agent in the background
&lt;/span&gt;&lt;span class="n"&gt;interaction&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;interactions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="nb"&gt;input&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Research the history and future of Solid State Batteries.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;deep-research-pro-preview-12-2025&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;background&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That call returns immediately with an interaction ID. The agent is now off doing its thing, autonomously planning search queries, reading pages, and iterating on its analysis. Your application is free to do whatever it needs to do in the meantime.&lt;/p&gt;

&lt;h3&gt;
  
  
  Polling for results
&lt;/h3&gt;

&lt;p&gt;Now you need a way to check whether the agent has finished. The status field tells you everything you need to know:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;while&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;interaction&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;interactions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;interaction&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;interaction&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;status&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;completed&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="c1"&gt;# The full research report is ready
&lt;/span&gt;        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;interaction&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;outputs&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;break&lt;/span&gt;
    &lt;span class="k"&gt;elif&lt;/span&gt; &lt;span class="n"&gt;interaction&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;status&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;failed&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Research failed: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;interaction&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;error&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;break&lt;/span&gt;

    &lt;span class="c1"&gt;# Still working. Check again in 10 seconds.
&lt;/span&gt;    &lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sleep&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Taking it to production with Cloud Run
&lt;/h3&gt;

&lt;p&gt;In a notebook, a while True loop gets the job done. In production, you want something that scales, recovers from failures, and doesn’t burn compute waiting. Google Cloud offers three Cloud Run compute models that each map to a different integration pattern with the Deep Research agent.&lt;/p&gt;

&lt;h3&gt;
  
  
  Cloud Run service: webhook-triggered research
&lt;/h3&gt;

&lt;p&gt;A &lt;a href="https://cloud.google.com/run/docs/overview/what-is-cloud-run?utm_campaign=CDR_0x2b6f3004_default_b488870862&amp;amp;utm_medium=external&amp;amp;utm_source=blog" rel="noopener noreferrer"&gt;Cloud Run service&lt;/a&gt; works when you want to trigger research from an HTTP request. The service accepts the request, kicks off the agent, stores the interaction ID, and returns immediately. A separate mechanism (a &lt;a href="https://cloud.google.com/scheduler/docs/overview?utm_campaign=CDR_0x2b6f3004_default_b488870862&amp;amp;utm_medium=external&amp;amp;utm_source=blog" rel="noopener noreferrer"&gt;Cloud Scheduler&lt;/a&gt; cron, a &lt;a href="https://cloud.google.com/workflows/docs/overview?utm_campaign=CDR_0x2b6f3004_default_b488870862&amp;amp;utm_medium=external&amp;amp;utm_source=blog" rel="noopener noreferrer"&gt;Cloud Workflow&lt;/a&gt;, or a callback) handles checking the results later.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;fastapi&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;FastAPI&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;pydantic&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;BaseModel&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;google&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;genai&lt;/span&gt;

&lt;span class="n"&gt;app&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;FastAPI&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;genai&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Client&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;ResearchRequest&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;BaseModel&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;topic&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;

&lt;span class="nd"&gt;@app.post&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;/research&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;start_research&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;req&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;ResearchRequest&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;interaction&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;interactions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="nb"&gt;input&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;req&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;topic&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;deep-research-pro-preview-12-2025&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;background&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Store the ID for later retrieval (e.g., in Firestore or Cloud SQL)
&lt;/span&gt;    &lt;span class="nf"&gt;save_interaction_id&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;interaction&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;req&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;topic&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;interaction_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;interaction&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;status&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;started&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Cloud Run job: batch research tasks
&lt;/h3&gt;

&lt;p&gt;A &lt;a href="https://cloud.google.com/run/docs/overview/what-is-cloud-run?utm_campaign=CDR_0x2b6f3004_default_b488870862&amp;amp;utm_medium=external&amp;amp;utm_source=blog" rel="noopener noreferrer"&gt;Cloud Run job&lt;/a&gt; is a natural fit for one-shot or scheduled research. Jobs execute code and stop, which maps cleanly to “launch, poll, write, exit.” If you have a batch of research topics, you can fan them out as parallel job tasks.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;google&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;genai&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;google.cloud&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;storage&lt;/span&gt;

&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;genai&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Client&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;run_research_job&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="n"&gt;topic&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;environ&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;RESEARCH_TOPIC&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Default research topic&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;interaction&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;interactions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="nb"&gt;input&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;topic&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;deep-research-pro-preview-12-2025&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;background&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Poll until done
&lt;/span&gt;    &lt;span class="k"&gt;while&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;interactions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;interaction&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;status&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;completed&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="c1"&gt;# Write the report to Cloud Storage and exit
&lt;/span&gt;            &lt;span class="n"&gt;bucket&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;storage&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Client&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;bucket&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;my-research-reports&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="n"&gt;bucket&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;blob&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;interaction&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;id&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;.md&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;upload_from_string&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;outputs&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;
            &lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt;
        &lt;span class="k"&gt;elif&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;status&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;failed&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;raise&lt;/span&gt; &lt;span class="nc"&gt;RuntimeError&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Research failed: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;error&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sleep&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nf"&gt;run_research_job&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Cloud Run worker pool: continuous research dispatcher
&lt;/h3&gt;

&lt;p&gt;The most interesting option for a production pipeline is a &lt;a href="https://cloud.google.com/run/docs/deploy-worker-pools?utm_campaign=CDR_0x2b6f3004_default_b488870862&amp;amp;utm_medium=external&amp;amp;utm_source=blog" rel="noopener noreferrer"&gt;Cloud Run worker pool&lt;/a&gt;. Worker pools are designed for continuous, non-HTTP, pull-based background processing. They don’t need a public endpoint, they don’t autoscale by default (you bring your own logic), and they cost &lt;a href="https://cloud.google.com/blog/products/serverless/exploring-cloud-run-worker-pools-and-kafka-autoscaler?utm_campaign=CDR_0x2b6f3004_default_b488870862&amp;amp;utm_medium=external&amp;amp;utm_source=blog" rel="noopener noreferrer"&gt;up to 40% less&lt;/a&gt; than instance-billed services.&lt;/p&gt;

&lt;p&gt;If you’re building a system that continuously pulls research requests from a &lt;a href="https://cloud.google.com/pubsub/docs/overview?utm_campaign=CDR_0x2b6f3004_default_b488870862&amp;amp;utm_medium=external&amp;amp;utm_source=blog" rel="noopener noreferrer"&gt;Pub/Sub&lt;/a&gt; subscription, dispatches them to the agent, and writes completed reports to storage, a worker pool is purpose-built for that pattern.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;google&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;genai&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;google.cloud&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;pubsub_v1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;storage&lt;/span&gt;

&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;genai&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Client&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;subscriber&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pubsub_v1&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;SubscriberClient&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;subscription_path&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;projects/my-project/subscriptions/research-requests&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;handle_message&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;topic&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;decode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;utf-8&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;interaction&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;interactions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="nb"&gt;input&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;topic&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;deep-research-pro-preview-12-2025&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;background&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Poll until done, then write results
&lt;/span&gt;    &lt;span class="k"&gt;while&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;interactions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;interaction&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;status&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;completed&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;bucket&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;storage&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Client&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;bucket&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;my-research-reports&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="n"&gt;bucket&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;blob&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;interaction&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;id&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;.md&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;upload_from_string&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;outputs&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;
            &lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;ack&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt;
        &lt;span class="k"&gt;elif&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;status&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;failed&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;nack&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="c1"&gt;# Retry later
&lt;/span&gt;            &lt;span class="k"&gt;return&lt;/span&gt;
        &lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sleep&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Pull messages continuously (worker pool stays alive)
&lt;/span&gt;&lt;span class="n"&gt;streaming_pull&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;subscriber&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;subscribe&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;subscription_path&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;callback&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;handle_message&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;streaming_pull&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;result&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Grounding with your own data
&lt;/h3&gt;

&lt;p&gt;Web research is powerful, but sometimes you need the agent to work with private data or internal documents. The Deep Research agent supports a file search tool for exactly this. Think of it as RAG, but orchestrated automatically by the agent rather than wired up manually.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;interaction&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;interactions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="nb"&gt;input&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Compare our 2025 fiscal year report against current public web news.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;deep-research-pro-preview-12-2025&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;background&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;file_search&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;file_search_store_names&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;FILE_SEARCH_STORE_NAME&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="p"&gt;}]&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is where the architecture gets interesting for enterprise use cases. The agent can combine internet research with grounded analysis of your internal documents, all within a single research task.&lt;/p&gt;

&lt;h3&gt;
  
  
  Stateful follow-ups
&lt;/h3&gt;

&lt;p&gt;After a research task completes, you can ask follow-up questions that reference the original research context without re-running the entire workflow:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;follow_up&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;interactions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="nb"&gt;input&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Can you elaborate on the key findings?&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gemini-3.1-pro-preview&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;previous_interaction_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;interaction&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;id&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;follow_up&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;outputs&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Getting started
&lt;/h3&gt;

&lt;p&gt;This &lt;a href="https://colab.research.google.com/github/kweinmeister/notebooks/blob/master/deep_research.ipynb" rel="noopener noreferrer"&gt;Deep Research notebook&lt;/a&gt; walks you through the entire flow, from setting up the client to launching research tasks. For pricing details, check the &lt;a href="https://ai.google.dev/gemini-api/docs/pricing#pricing-for-agents?utm_campaign=CDR_0x2b6f3004_default_b488870862&amp;amp;utm_medium=external&amp;amp;utm_source=blog" rel="noopener noreferrer"&gt;Gemini API pricing page&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Ready to stop Googling and start delegating? Grab the notebook and run your first deep research task. I’d love to hear what you build with it. Come find me on &lt;a href="https://www.linkedin.com/in/karlweinmeister/" rel="noopener noreferrer"&gt;LinkedIn&lt;/a&gt;, &lt;a href="https://x.com/kweinmeister" rel="noopener noreferrer"&gt;X&lt;/a&gt;, or &lt;a href="https://bsky.app/profile/kweinmeister.bsky.social" rel="noopener noreferrer"&gt;Bluesky&lt;/a&gt; and share what research tasks you’re automating.&lt;/p&gt;




</description>
      <category>googlecloudrun</category>
      <category>deepresearch</category>
      <category>pubsub</category>
      <category>asynchronousprogramming</category>
    </item>
    <item>
      <title>Skills Made Easy with Google Antigravity and Gemini CLI</title>
      <dc:creator>Karl Weinmeister</dc:creator>
      <pubDate>Thu, 26 Feb 2026 16:52:06 +0000</pubDate>
      <link>https://dev.to/googleai/skills-made-easy-with-google-antigravity-and-gemini-cli-4chb</link>
      <guid>https://dev.to/googleai/skills-made-easy-with-google-antigravity-and-gemini-cli-4chb</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9jkxr5hhcd0zeogdlxdf.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9jkxr5hhcd0zeogdlxdf.jpeg"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;When you ask an AI assistant a question, you have two choices: hope its training is current, or burn through tokens reading documentation. What if you could give your agent the right answer, right away?&lt;/p&gt;

&lt;p&gt;That’s the power of &lt;strong&gt;Agent Skills&lt;/strong&gt;. Skills are reusable packages of knowledge that extend what your agent can do without overwhelming its context window. Defined with a &lt;code&gt;SKILL.md&lt;/code&gt; file, they allow you to teach your agent how to accomplish tasks consistently. Instead of forcing an agent to process an entire library’s worth of documentation at once, Skills act as &lt;strong&gt;on-demand expertise&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;You can learn more about the open standard at &lt;a href="https://agentskills.io" rel="noopener noreferrer"&gt;Agent Skills&lt;/a&gt; and discover community capabilities at &lt;a href="https://skills.sh" rel="noopener noreferrer"&gt;skills.sh&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;In this post, we’ll explore how to manage these skills in the &lt;a href="https://geminicli.com/" rel="noopener noreferrer"&gt;Gemini CLI&lt;/a&gt;, a powerful terminal-native AI assistant, and &lt;a href="https://antigravity.google/" rel="noopener noreferrer"&gt;Antigravity&lt;/a&gt;, an advanced agentic coding assistant.&lt;/p&gt;

&lt;h3&gt;
  
  
  Installing skills
&lt;/h3&gt;

&lt;p&gt;Both the Gemini CLI and Antigravity access skills by reading them from standard directories on your local machine. To add new skills, you can drop them into these locations:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5adxw6li6ear7x0zuye5.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5adxw6li6ear7x0zuye5.png"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Managing Skills in Gemini CLI
&lt;/h3&gt;

&lt;p&gt;Gemini CLI offers built-in &lt;a href="https://geminicli.com/docs/cli/skills/" rel="noopener noreferrer"&gt;skill management&lt;/a&gt;. You can use either interactive slash commands during a session, or terminal commands:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnq8aia6riu2ya4dkf8s4.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnq8aia6riu2ya4dkf8s4.png"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;These commands makes it easy to pull in skills from a Git repository or local directory, and manage whether they are active for your current project.&lt;/p&gt;

&lt;p&gt;For example, if you want to install a specific skill located inside a subdirectory of a larger repository (like Firebase’s &lt;code&gt;firebase-ai-logic-basics&lt;/code&gt;), you can use the &lt;code&gt;--path&lt;/code&gt; flag:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdkbym3egq6v2d06fugjp.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdkbym3egq6v2d06fugjp.gif"&gt;&lt;/a&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;gemini skills &lt;span class="nb"&gt;install &lt;/span&gt;https://github.com/firebase/agent-skills.git — path skills/firebase-ai-logic-basics
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;To audit which skills are currently loaded into your agent’s context, you can run:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;gemini skills list
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This command provides a clear overview of all discovered skills across your workspace and global environments, showing their descriptions and file locations so you know exactly what expertise your agent has access to.&lt;/p&gt;

&lt;h3&gt;
  
  
  Unified management with the skills tool
&lt;/h3&gt;

&lt;p&gt;While Gemini CLI has robust built-in tools, what if you want to manage skills across &lt;em&gt;both&lt;/em&gt; Gemini CLI and Antigravity simultaneously? Managing them by hand across the different &lt;code&gt;~/.gemini/skills/&lt;/code&gt; and &lt;code&gt;~/.gemini/antigravity/skills/&lt;/code&gt; directories can get tedious.&lt;/p&gt;

&lt;p&gt;That’s where the open-source CLI tool from &lt;a href="https://github.com/vercel-labs/skills" rel="noopener noreferrer"&gt;vercel-labs/skills&lt;/a&gt; shines. It uses a &lt;a href="https://en.wikipedia.org/wiki/Symbolic_link" rel="noopener noreferrer"&gt;symlink&lt;/a&gt; approach to easily install, update, and remove skills centrally, sharing them across multiple agents without duplicating files.&lt;/p&gt;

&lt;h3&gt;
  
  
  Getting Started with skills
&lt;/h3&gt;

&lt;p&gt;The easiest way to begin with the unified CLI is by using the add command. You can add the &lt;code&gt;-a&lt;/code&gt; or &lt;code&gt;--agent&lt;/code&gt; parameter for each client you’d like to add the skill to.&lt;/p&gt;

&lt;p&gt;For example, suppose you want to equip your agent with deep knowledge of Firebase to help build full-stack apps. You could run:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npx skills add firebase/agent-skills &lt;span class="nt"&gt;-a&lt;/span&gt; gemini-cli &lt;span class="nt"&gt;-a&lt;/span&gt; antigravity
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flxdcra24h6btiue19tbg.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flxdcra24h6btiue19tbg.gif"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;⚠️ Note that the skill will be added to the Gemini CLI even without the &lt;code&gt;-a&lt;/code&gt; parameter, as it supports the default &lt;code&gt;~/.agents/skills&lt;/code&gt; global directory. The extra parameter provided here for clarity to show both clients in one command.&lt;/p&gt;

&lt;p&gt;This installs the skill and instantly makes it available to both Gemini and Antigravity. By adding firebase/agent-skills, your agents can reliably build and deploy apps with Firebase Auth, Firestore, and more. For more details on how this skill works, read &lt;a href="https://firebase.blog/posts/2026/02/ai-agent-skills-for-firebase" rel="noopener noreferrer"&gt;Introducing Agent Skills for Firebase&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;If you’re looking for skills related to a specific technology, you can search for them directly from your terminal. For instance, if you’re building a mobile app, you might want to find capabilities related to Flutter. You can use the find command to discover relevant skills:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npx skills find flutter
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1zeqtayx9pmwubkb2boy.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1zeqtayx9pmwubkb2boy.gif"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This command searches the community skills registry and returns a list of matching capabilities, displaying the most popular ones first alongside their installation commands. You can quickly copy those commands to add the expertise directly to your active agents.&lt;/p&gt;

&lt;h3&gt;
  
  
  Keeping your agent’s context clean
&lt;/h3&gt;

&lt;p&gt;It’s easy to get excited and install dozens of skills. While progressive disclosure means your agent isn’t reading the &lt;em&gt;entire&lt;/em&gt; instruction manual for every skill on every prompt, simply loading the names, descriptions, and metadata of 50 different skills can still clutter the initial context window, leading to confusion or degraded performance.&lt;/p&gt;

&lt;p&gt;To keep your agents focused and efficient, make sure to keep your essential skills up-to-date with your chosen tool’s update commands. More importantly, if you find you aren’t using a skill anymore, take a moment to disable or remove it (e.g., &lt;code&gt;/skills disable &amp;lt;name&amp;gt;&lt;/code&gt; in Gemini CLI or &lt;code&gt;npx skills remove &amp;lt;name&amp;gt;&lt;/code&gt;) to free up that precious context space.&lt;/p&gt;

&lt;p&gt;By managing skills in &lt;a href="https://geminicli.com/" rel="noopener noreferrer"&gt;Gemini CLI&lt;/a&gt; and &lt;a href="https://antigravity.google/" rel="noopener noreferrer"&gt;Antigravity&lt;/a&gt; with the &lt;a href="https://skills.sh/docs/cli" rel="noopener noreferrer"&gt;skills CLI&lt;/a&gt;, you can tailor and organize your environment to your liking. To get more hands-on experience building skills, you can try out the &lt;a href="https://codelabs.developers.google.com/gemini-cli/how-to-create-agent-skills-for-gemini-cli#0" rel="noopener noreferrer"&gt;Agent Skills&lt;/a&gt; codelab.&lt;/p&gt;

&lt;p&gt;Have you built any interesting workflows using Agent Skills? I’d love to hear how you’re extending your agents. Share what you’ve built with me on &lt;a href="https://www.linkedin.com/in/karlweinmeister/" rel="noopener noreferrer"&gt;LinkedIn&lt;/a&gt;, &lt;a href="https://x.com/kweinmeister" rel="noopener noreferrer"&gt;X&lt;/a&gt;, or &lt;a href="https://bsky.app/profile/kweinmeister.bsky.social" rel="noopener noreferrer"&gt;Bluesky&lt;/a&gt;!&lt;/p&gt;

&lt;p&gt;

  &lt;iframe src="https://www.youtube.com/embed/UVcMo8iV7LU"&gt;
  &lt;/iframe&gt;


&lt;/p&gt;




</description>
      <category>gemini</category>
      <category>antigravity</category>
      <category>agenticai</category>
      <category>agents</category>
    </item>
    <item>
      <title>Performance shouldn’t be an afterthought: Hardening the AI-Assisted SDLC</title>
      <dc:creator>Karl Weinmeister</dc:creator>
      <pubDate>Mon, 26 Jan 2026 17:31:22 +0000</pubDate>
      <link>https://dev.to/googleai/performance-shouldnt-be-an-afterthought-hardening-the-ai-assisted-sdlc-45c8</link>
      <guid>https://dev.to/googleai/performance-shouldnt-be-an-afterthought-hardening-the-ai-assisted-sdlc-45c8</guid>
      <description>&lt;h3&gt;
  
  
  Performance shouldn’t be an afterthought: Hardening the AI-assisted SDLC
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fh7xjx9l2ksb7z4kcgb4v.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fh7xjx9l2ksb7z4kcgb4v.jpeg"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;It’s amazing how quickly you can now build a working application with AI assistance. It’s even more amazing how easily you can harden your application for production. But that’s a step that’s often left out of the “vibe coding” software development lifecycle, or SDLC. I hope to change that.&lt;/p&gt;

&lt;p&gt;Why does it matter? The impact of high latency is lost users, and the impact of excess memory usage is lost budget.&lt;/p&gt;

&lt;p&gt;Study after study shows that your application’s latency directly &lt;a href="https://arxiv.org/pdf/2101.09086" rel="noopener noreferrer"&gt;correlates with user satisfaction&lt;/a&gt;, a key ingredient for business success. Meanwhile, your application’s memory usage impacts your Cloud infrastructure cost. For example, &lt;a href="https://cloud.google.com/run?utm_campaign=CDR_0x2b6f3004_default_b478846417&amp;amp;utm_medium=external&amp;amp;utm_source=blog" rel="noopener noreferrer"&gt;Cloud Run&lt;/a&gt; offers &lt;a href="https://docs.cloud.google.com/run/docs/configuring/services/memory-limits?utm_campaign=CDR_0x2b6f3004_default_b478846417&amp;amp;utm_medium=external&amp;amp;utm_source=blog" rel="noopener noreferrer"&gt;memory limits&lt;/a&gt; at various tiers ranging from 512 MiB to 32 GiB. Not to mention, if you underprovision memory, your application reliability will suffer.&lt;/p&gt;

&lt;p&gt;In this post, I’ll walk through steps I recommend that ensure your application is hardened for production. I’ll use &lt;a href="https://antigravity.google" rel="noopener noreferrer"&gt;Google Antigravity&lt;/a&gt; to build an application with &lt;a href="https://github.com/kweinmeister/perplexity-calculator" rel="noopener noreferrer"&gt;sample application code&lt;/a&gt; available on GitHub.&lt;/p&gt;

&lt;p&gt;

  &lt;iframe src="https://www.youtube.com/embed/hEsZt_Gi-UA"&gt;
  &lt;/iframe&gt;


&lt;/p&gt;

&lt;h3&gt;
  
  
  Discovery and Tool Selection
&lt;/h3&gt;

&lt;p&gt;If you aren’t an expert in the tooling ecosystem for your application’s language, use AI to bridge the gap. Avoid guessing and ask for industry standards. For example, you can ask:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“I need to profile a Python application for both CPU execution time and memory leaks. What are the most modern, low-overhead tools available? I know about cProfile, but are there better options with visualization (like flame graphs)?”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;What modern stack might your AI assistant suggest? &lt;a href="https://github.com/plasma-umass/scalene" rel="noopener noreferrer"&gt;scalene&lt;/a&gt; is a high-performance profiler whose standout capability is separating time spent in Python versus native code. To dig into memory details, &lt;a href="https://github.com/bloomberg/memray" rel="noopener noreferrer"&gt;memray&lt;/a&gt; can track allocations in native extensions and generate flame graphs that make it easy to spot areas for improvement. Finally, &lt;a href="https://pypi.org/project/pytest-benchmark/" rel="noopener noreferrer"&gt;pytest-benchmark&lt;/a&gt; is a useful plugin that handles warm-up rounds and statistical analysis automatically.&lt;/p&gt;

&lt;p&gt;If you’re writing code in other languages, the same strategy applies. You might discover &lt;a href="https://github.com/google/pprof" rel="noopener noreferrer"&gt;pprof&lt;/a&gt; for Go, &lt;a href="https://github.com/nodejs/clinic.js" rel="noopener noreferrer"&gt;clinic.js&lt;/a&gt; for Node.js, and other useful tools.&lt;/p&gt;

&lt;h3&gt;
  
  
  Establish a Baseline
&lt;/h3&gt;

&lt;p&gt;My use case is &lt;a href="https://huggingface.co/docs/transformers/en/perplexity" rel="noopener noreferrer"&gt;calculating the perplexity&lt;/a&gt; of a given text, which is helpful for AI detection and other use cases. The initial implementation started with a naïve algorithm which processes one token at a time, which isn’t uncommon when you simply ask for a solution.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;seq_len&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;current_token&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;input_ids_int64&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

    &lt;span class="c1"&gt;# 1. Construct single-token input
&lt;/span&gt;    &lt;span class="n"&gt;inputs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;input_ids&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;array&lt;/span&gt;&lt;span class="p"&gt;([[&lt;/span&gt;&lt;span class="n"&gt;current_token&lt;/span&gt;&lt;span class="p"&gt;]])&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="n"&gt;inputs&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;update&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;past_key_values&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# 2. Run inference for just this token
&lt;/span&gt;    &lt;span class="n"&gt;outputs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;session&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;inputs&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Optimize for Speed
&lt;/h3&gt;

&lt;p&gt;While this code works, it’s slow. With our tools selected from the research phase, we can ask our AI agent to benchmark the baseline code.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“Generate a Python script using pytest-benchmark to benchmark my perplexity function against a baseline. Create a mock dataset to simulate load.”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Once we have a benchmark, we can then ask our AI agent to optimize it:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“Profile this baseline code and suggest an optimized routine. Focus on throughput.”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;A standard engineering strategy to address loop overhead is &lt;a href="https://en.wikipedia.org/wiki/Array_programming" rel="noopener noreferrer"&gt;vectorization&lt;/a&gt;. The revised approach feeds the entire sequence to the model in one go:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;calculate_perplexity_batch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="c1"&gt;# 1. Encode entire text at once
&lt;/span&gt;    &lt;span class="n"&gt;input_ids&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;tokenizer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;encode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# 2. Single inference call for the whole sequence
&lt;/span&gt;    &lt;span class="n"&gt;outputs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;session&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;inputs&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;logits&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;outputs&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="c1"&gt;# Shape: [1, SeqLen, Vocab]
&lt;/span&gt;
    &lt;span class="c1"&gt;# 3. Vectorized loss calculation (No loops)
&lt;/span&gt;    &lt;span class="c1"&gt;# ... numpy vector operations ...
&lt;/span&gt;    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;float&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;exp&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;mean_nll&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In my test environment, this change led to an overall &lt;strong&gt;2.5x&lt;/strong&gt; speed improvement over the naïve loop.&lt;/p&gt;

&lt;h3&gt;
  
  
  Optimize Memory Usage
&lt;/h3&gt;

&lt;p&gt;Unfortunately, this speed came at a cost. By loading all logits for the entire sequence into memory at once, I created an unbounded memory situation. Long documents would cause peak memory usage to spike uncontrollably. I had solved for latency, but in doing so, I had broken cost constraints.&lt;/p&gt;

&lt;p&gt;How could I prompt Antigravity to help?&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“Analyze my optimized perplexity routine. The target environment is Google Cloud Run with a strict 2GB memory limit. Identify the peak memory usage and refactor the code to stay under this limit without reverting to the slow loop.”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The solution balanced speed and memory, processing data in batches large enough to achieve high throughput but small enough to manage peak memory:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;chunk_size&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;128&lt;/span&gt;
&lt;span class="n"&gt;logits_list&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;

&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;input_ids&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="nf"&gt;append_tokens&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;input_ids&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
    &lt;span class="n"&gt;logits_list&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;get_logits&lt;/span&gt;&lt;span class="p"&gt;()[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;:])&lt;/span&gt;

    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;logits_list&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="n"&gt;chunk_size&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="c1"&gt;# Process this chunk
&lt;/span&gt;        &lt;span class="nf"&gt;_process_logits_chunk&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;logits_list&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;targets_list&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="c1"&gt;# Free memory immediately to clip the peak
&lt;/span&gt;        &lt;span class="n"&gt;logits_list&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Final Thoughts
&lt;/h3&gt;

&lt;p&gt;Before unleashing this process across your codebase, let’s be clear that performance engineering is a rigorous discipline that goes beyond optimizing functions. Industry veteran Brendan Gregg famously warns against the &lt;a href="https://www.brendangregg.com/methodology.html" rel="noopener noreferrer"&gt;Streetlight Anti-Method&lt;/a&gt;: looking for performance problems where it’s easiest, rather than where the problems actually exist.&lt;/p&gt;

&lt;p&gt;Providing your AI assistant the broader context of your application is key, and it’s easy to overlook important details in your prompting. An AI assistant doesn’t know that your production workload is 10 million rows, not the 100 rows in your test script. It can’t see that your database is missing an index or that your network bandwidth is saturated. Most importantly, an AI assistant doesn’t know your intent. If you steer it towards speeding up a query, it will focus on what you asked for, but it likely won’t ask why that data isn’t cached in the first place.&lt;/p&gt;

&lt;p&gt;With those considerations in mind, using AI as a final check is a low-risk, high-reward step. It takes minutes and often catches low-hanging fruit that is overlooked. Then, the next step is maintaining your application’s performance. Consider leveraging tools for &lt;a href="https://cloud.google.com/discover/what-is-application-monitoring?utm_campaign=CDR_0x2b6f3004_default_b478846417&amp;amp;utm_medium=external&amp;amp;utm_source=blog" rel="noopener noreferrer"&gt;continuous application monitoring&lt;/a&gt; to identify regressions and ensure reliability in a live environment.&lt;/p&gt;

&lt;p&gt;I’d love to hear how you’re innovating with your software development lifecycle. Connect with me on &lt;a href="https://www.linkedin.com/in/karlweinmeister/" rel="noopener noreferrer"&gt;LinkedIn&lt;/a&gt;, &lt;a href="https://x.com/kweinmeister" rel="noopener noreferrer"&gt;X&lt;/a&gt;, or &lt;a href="https://bsky.app/profile/kweinmeister.bsky.social" rel="noopener noreferrer"&gt;Bluesky&lt;/a&gt;!&lt;/p&gt;

</description>
      <category>memorymanagement</category>
      <category>googleantigravity</category>
      <category>performance</category>
      <category>ai</category>
    </item>
    <item>
      <title>AI Agent Engineering in Go with the Google ADK</title>
      <dc:creator>Karl Weinmeister</dc:creator>
      <pubDate>Tue, 20 Jan 2026 16:41:44 +0000</pubDate>
      <link>https://dev.to/googleai/ai-agent-engineering-in-go-with-the-google-adk-534o</link>
      <guid>https://dev.to/googleai/ai-agent-engineering-in-go-with-the-google-adk-534o</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9arbbpsbkm70dpah0ziw.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9arbbpsbkm70dpah0ziw.jpeg" width="800" height="339"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;While Python &lt;a href="https://survey.stackoverflow.co/2025/technology#most-popular-technologies-language-prof-ai" rel="noopener noreferrer"&gt;remains popular&lt;/a&gt; for model training and research, the requirements for &lt;em&gt;serving&lt;/em&gt; and &lt;em&gt;orchestrating&lt;/em&gt; AI agents align closely with Go’s strengths: low latency, high concurrency, and type safety.&lt;/p&gt;

&lt;p&gt;Transitioning from a prototype to a production agent introduces engineering challenges that &lt;a href="https://go.dev/doc/install" rel="noopener noreferrer"&gt;Golang&lt;/a&gt; can handle exceptionally well. Go’s static typing eliminates runtime errors when parsing structured LLM outputs. Its &lt;a href="https://go.dev/tour/concurrency/1" rel="noopener noreferrer"&gt;lightweight goroutines&lt;/a&gt;, which start with just a &lt;a href="https://dev.to/jones_charles_ad50858dbc0/in-depth-go-concurrency-a-practical-guide-to-goroutine-performance-nee"&gt;few kilobytes&lt;/a&gt; of stack memory, allow agents to handle thousands of concurrent tool executions without the overhead of heavy thread management.&lt;/p&gt;

&lt;p&gt;In recent years, Go’s adoption for cloud-native microservices has surged: it showed the &lt;a href="https://devecosystem-2025.jetbrains.com/tools-and-trends" rel="noopener noreferrer"&gt;fourth-highest promise&lt;/a&gt; for languages, and maintained a &lt;a href="https://go.dev/blog/survey2024-h2-results" rel="noopener noreferrer"&gt;93% satisfaction rate&lt;/a&gt;. Google’s &lt;a href="https://google.github.io/adk-docs/get-started/go/" rel="noopener noreferrer"&gt;Agent Development Kit&lt;/a&gt;, or ADK, bridges the gap between these architectural advantages and generative AI.&lt;/p&gt;

&lt;p&gt;In this guide, I’ll walk through scaffolding a new project and deploying it as a secure microservice on Google Cloud.&lt;/p&gt;

&lt;h3&gt;
  
  
  Get Started with the Agent Starter Pack
&lt;/h3&gt;

&lt;p&gt;The good news is you don’t need to start from scratch. The &lt;a href="https://github.com/GoogleCloudPlatform/agent-starter-pack" rel="noopener noreferrer"&gt;&lt;strong&gt;Agent Starter Pack&lt;/strong&gt;&lt;/a&gt; is a CLI tool that scaffolds a production-ready folder structure, including CI/CD pipelines, infrastructure configuration, and boilerplate code.&lt;/p&gt;

&lt;p&gt;To get started, just run the create command with &lt;a href="https://docs.astral.sh/uv/getting-started/installation/" rel="noopener noreferrer"&gt;uvx&lt;/a&gt;:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;uvx agent-starter-pack create&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;The CLI guides you through an interactive setup. For this project, I selected:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Project Name:&lt;/strong&gt; my-first-go-agent&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Template:&lt;/strong&gt; Option &lt;strong&gt;6&lt;/strong&gt; (Go ADK, Simple ReAct agent)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;CI/CD:&lt;/strong&gt; Option &lt;strong&gt;3&lt;/strong&gt; (GitHub Actions)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Region:&lt;/strong&gt; us-central1&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkampj5s8no4veqyslwtp.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkampj5s8no4veqyslwtp.png" width="753" height="373"&gt;&lt;/a&gt;&lt;/p&gt;
Agent Starter Pack CLI



&lt;p&gt;The tool automatically authenticates with Google Cloud, enables the necessary Vertex AI APIs, and configures your local environment. Once you see the green &lt;strong&gt;Success!&lt;/strong&gt; message, you’re good to go.&lt;/p&gt;

&lt;h3&gt;
  
  
  Web User Interface
&lt;/h3&gt;

&lt;p&gt;One of the most convenient features of the ADK is the ability to visually debug your agent before deploying it. By running the command &lt;code&gt;make install &amp;amp;&amp;amp; make playground&lt;/code&gt;, you launch a local development server with a built-in UI. Yes, it has a chat window, but it goes way beyond that by tracing events, tool calls, and more.&lt;/p&gt;

&lt;p&gt;In the screenshot below, I’m interacting with the newly created agent. The agent is configured with a &lt;a href="https://ai.google.dev/gemini-api/docs/langgraph-example?utm_campaign=CDR_0x2b6f3004_default_b476693958&amp;amp;utm_medium=external&amp;amp;utm_source=blog" rel="noopener noreferrer"&gt;ReAct&lt;/a&gt; (Reasoning and Acting) pattern — a framework &lt;a href="https://arxiv.org/abs/2210.03629" rel="noopener noreferrer"&gt;introduced by Yao et al. in 2022&lt;/a&gt; that has become foundational in agentic AI. The ReAct pattern’s continuous loop of “Thought,” “Action,” and “Observation” enhances problem-solving and interpretability, making the agent’s decision-making process transparent. It recognized the intent, invoked the get_weather tool, and returned the structured data (“It’s sunny and 72°F”).&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwo0ok9ne047inpvdg60h.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwo0ok9ne047inpvdg60h.png" width="800" height="478"&gt;&lt;/a&gt;&lt;/p&gt;
Agent Development Kit web user interface



&lt;h3&gt;
  
  
  Understanding the Code
&lt;/h3&gt;

&lt;p&gt;Now that we’ve seen the agent in action, let’s look at the Go code that makes this work. The logic lives in &lt;code&gt;agent/agent.go&lt;/code&gt;. This file handles tool definitions, model configuration, and initialization.&lt;/p&gt;

&lt;p&gt;The ADK uses standard Go structs to define how the Large Language Model (LLM) interacts with your code. For example, to define the input parameters for our weather tool, we simply define a struct with &lt;code&gt;json&lt;/code&gt; and &lt;code&gt;jsonschem&lt;/code&gt;a tags:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;type&lt;/span&gt; &lt;span class="n"&gt;GetWeatherArgs&lt;/span&gt; &lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;City&lt;/span&gt; &lt;span class="kt"&gt;string&lt;/span&gt; &lt;span class="s"&gt;`json:"city" jsonschema:"City name to get weather for"`&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;GetWeatherResult defines the structure of the data returned to the agent after the tool executes.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;type&lt;/span&gt; &lt;span class="n"&gt;GetWeatherResult&lt;/span&gt; &lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
 &lt;span class="n"&gt;Weather&lt;/span&gt; &lt;span class="kt"&gt;string&lt;/span&gt; &lt;span class="s"&gt;`json:"weather"`&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;GetWeather is a standard Golang function that accepts &lt;a href="https://pkg.go.dev/google.golang.org/adk/tool#Context" rel="noopener noreferrer"&gt;tool.Context&lt;/a&gt; and the arguments struct, performing the business logic and returning the result struct.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="n"&gt;GetWeather&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;_&lt;/span&gt; &lt;span class="n"&gt;tool&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Context&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;args&lt;/span&gt; &lt;span class="n"&gt;GetWeatherArgs&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;GetWeatherResult&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;error&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
 &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;GetWeatherResult&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="n"&gt;Weather&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="s"&gt;"It's sunny and 72°F in "&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;City&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
 &lt;span class="p"&gt;},&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The NewRootAgent function is responsible for assembling and returning the &lt;a href="https://pkg.go.dev/google.golang.org/adk/agent#Agent" rel="noopener noreferrer"&gt;agent.Agent&lt;/a&gt; instance that the application launcher requires. It begins by initializing the model configuration, creating a &lt;code&gt;gemini-2.5-flash&lt;/code&gt; model instance backed by &lt;code&gt;genai.BackendVertexAI&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Next, it bridges the gap between Go code and the LLM by wrapping the local GetWeather function into a &lt;code&gt;[functiontool]&lt;/code&gt;(&lt;a href="https://pkg.go.dev/google.golang.org/adk/tool/functiontool" rel="noopener noreferrer"&gt;https://pkg.go.dev/google.golang.org/adk/tool/functiontool&lt;/a&gt;). This step registers the tool with the name &lt;code&gt;get\_weather&lt;/code&gt; and provides the necessary description for the model’s context. Finally, it constructs the agent using &lt;a href="https://pkg.go.dev/google.golang.org/adk/agent/llmagent#New" rel="noopener noreferrer"&gt;llmagent.New&lt;/a&gt;, which combines the initialized Gemini model, the system instructions that define the agent’s behavior, and the slice of available tools into a single unit.&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;NewRootAgent&lt;/code&gt; looks like this (with some error-handling removed):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="n"&gt;NewRootAgent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Context&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Agent&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;error&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
 &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;gemini&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;NewModel&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"gemini-2.5-flash"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;genai&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ClientConfig&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="n"&gt;Backend&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="n"&gt;genai&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;BackendVertexAI&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
 &lt;span class="p"&gt;})&lt;/span&gt;

 &lt;span class="n"&gt;weatherTool&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;functiontool&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;New&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;functiontool&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Config&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="n"&gt;Name&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="s"&gt;"get_weather"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;Description&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="s"&gt;"Get the current weather for a city."&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
 &lt;span class="p"&gt;},&lt;/span&gt; &lt;span class="n"&gt;GetWeather&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

 &lt;span class="n"&gt;rootAgent&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;llmagent&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;New&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;llmagent&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Config&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="n"&gt;Name&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="s"&gt;"my-first-go-agent"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;Model&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;Description&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="s"&gt;"A helpful AI assistant."&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;Instruction&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="s"&gt;"You are a helpful AI assistant designed to provide accurate and useful information."&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="n"&gt;Tools&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;&lt;span class="n"&gt;tool&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Tool&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="n"&gt;weatherTool&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
 &lt;span class="p"&gt;})&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Testing
&lt;/h3&gt;

&lt;p&gt;The project contains both unit tests for internal logic, and end-to-end tests for server integration.&lt;/p&gt;

&lt;p&gt;In &lt;code&gt;agent/agent\_test.go&lt;/code&gt;, the GetWeather function is called with a suite of test cases, and verifies that the output string matches its expectations.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="n"&gt;TestGetWeather&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;t&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;testing&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;T&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
 &lt;span class="c"&gt;// tests struct initialized with "San Francisco" and "New York"&lt;/span&gt;

 &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;_&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tt&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="k"&gt;range&lt;/span&gt; &lt;span class="n"&gt;tests&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Run&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tt&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;func&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;t&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;testing&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;T&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
   &lt;span class="c"&gt;// Pass nil for tool.Context since GetWeather doesn't use it&lt;/span&gt;
   &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;GetWeather&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="no"&gt;nil&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;GetWeatherArgs&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="n"&gt;City&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="n"&gt;tt&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;city&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;
   &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Fatalf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"GetWeather() error = %v"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
   &lt;span class="p"&gt;}&lt;/span&gt;
   &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="n"&gt;strings&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Contains&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Weather&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tt&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;wantCity&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Errorf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"GetWeather() = %v, want city %v in response"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Weather&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tt&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;wantCity&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
   &lt;span class="p"&gt;}&lt;/span&gt;
  &lt;span class="p"&gt;})&lt;/span&gt;
 &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The end-to-end tests verify that the agent works correctly when running as a server, specifically checking that A2A or Agent-to-Agent protocol support is working correctly. The E2E tests start a real instance of the server, sending HTTP requests to it, and check the responses. Here’s a snippet from &lt;code&gt;e2e/integration/server\_e2e\_test.go&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="n"&gt;TestA2AMessageSend&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;t&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;testing&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;T&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;testing&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Short&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Skip&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Skipping E2E test in short mode"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="c"&gt;// Start server (local variable to avoid race conditions)&lt;/span&gt;
    &lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Starting server process"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;serverProcess&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;startServer&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;defer&lt;/span&gt; &lt;span class="n"&gt;stopServer&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;serverProcess&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="n"&gt;waitForServer&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="m"&gt;90&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Second&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
   &lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Fatal&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Server failed to start"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="n"&gt;t&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Log&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Server process started"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You can run all tests with &lt;code&gt;make test&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;make &lt;span class="nb"&gt;test                      
&lt;/span&gt;go &lt;span class="nb"&gt;test&lt;/span&gt; &lt;span class="nt"&gt;-v&lt;/span&gt; ./agent/... ./e2e/...
&lt;span class="o"&gt;===&lt;/span&gt; RUN TestGetWeather
&lt;span class="o"&gt;===&lt;/span&gt; RUN TestGetWeather/San_Francisco
&lt;span class="o"&gt;===&lt;/span&gt; RUN TestGetWeather/New_York
&lt;span class="nt"&gt;---&lt;/span&gt; PASS: TestGetWeather &lt;span class="o"&gt;(&lt;/span&gt;0.00s&lt;span class="o"&gt;)&lt;/span&gt;
    &lt;span class="nt"&gt;---&lt;/span&gt; PASS: TestGetWeather/San_Francisco &lt;span class="o"&gt;(&lt;/span&gt;0.00s&lt;span class="o"&gt;)&lt;/span&gt;
    &lt;span class="nt"&gt;---&lt;/span&gt; PASS: TestGetWeather/New_York &lt;span class="o"&gt;(&lt;/span&gt;0.00s&lt;span class="o"&gt;)&lt;/span&gt;
PASS
ok my-first-go-agent/agent 0.218s
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Deployment
&lt;/h3&gt;

&lt;p&gt;The make deploy command automatically builds your application from source using &lt;a href="https://docs.cloud.google.com/docs/buildpacks/overview?utm_campaign=CDR_0x2b6f3004_default_b476693958&amp;amp;utm_medium=external&amp;amp;utm_source=blog" rel="noopener noreferrer"&gt;Google Cloud Buildpacks&lt;/a&gt;, triggered by the &lt;code&gt;--source .&lt;/code&gt; flag. It deploys this image to &lt;a href="https://cloud.google.com/run?utm_campaign=CDR_0x2b6f3004_default_b476693958&amp;amp;utm_medium=external&amp;amp;utm_source=blog" rel="noopener noreferrer"&gt;Cloud Run&lt;/a&gt; with several production-optimized flags: &lt;code&gt;--memory “4Gi”&lt;/code&gt; to provide ample RAM for LLM operations, and &lt;code&gt;--no-cpu-throttling&lt;/code&gt; to ensure the CPU remains allocated 24/7. This configuration is particularly valuable for Go applications.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1qm9xk5x9jrhaute6w07.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1qm9xk5x9jrhaute6w07.png" width="800" height="362"&gt;&lt;/a&gt;&lt;/p&gt;
`make deploy` builds the container and deploys to Cloud Run



&lt;p&gt;To ensure your agent runs securely, the command is enabled with a strict configuration. It uses &lt;code&gt;--no-allow-unauthenticated&lt;/code&gt; to block all public access by default, requiring Identity and Access Management (IAM) authentication for any requests. It also injects environment variables via &lt;code&gt;--update-env-vars&lt;/code&gt;, including the use of Vertex AI &lt;code&gt;GOOGLE\_GENAI\_USE\_VERTEXAI=True&lt;/code&gt;. After running the command, I have a service URL!&lt;/p&gt;

&lt;p&gt;If you want to view the deployed web UI, I recommend deploying with &lt;code&gt;make deploy IAP=true&lt;/code&gt;. This will handle the steps to &lt;a href="https://docs.cloud.google.com/iap/docs/enabling-cloud-run?utm_campaign=CDR_0x2b6f3004_default_b476693958&amp;amp;utm_medium=external&amp;amp;utm_source=blog" rel="noopener noreferrer"&gt;enable IAP for Cloud Run&lt;/a&gt;. You will also need to &lt;a href="https://docs.cloud.google.com/run/docs/securing/identity-aware-proxy-cloud-run#manage_user_or_group_access?utm_campaign=CDR_0x2b6f3004_default_b476693958&amp;amp;utm_medium=external&amp;amp;utm_source=blog" rel="noopener noreferrer"&gt;provide access to users&lt;/a&gt; within your organization following the instructions in the documentation.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8b68rswjmmkvg9r2qb47.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8b68rswjmmkvg9r2qb47.png" width="800" height="478"&gt;&lt;/a&gt;&lt;/p&gt;
Adding a principal to IAP with the Google Cloud Console



&lt;p&gt;With IAP enabled, I can now view the web UI or the deployed &lt;a href="https://google.github.io/adk-docs/a2a/quickstart-consuming/#look-out-for-the-required-agent-card-agent-json-of-the-remote-agent" rel="noopener noreferrer"&gt;Agent Card&lt;/a&gt;. This card serves as your agent’s standard interface, allowing it to be dynamically discovered by other agents, orchestrators, or human-facing UI:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4wpxkxma9ehimsvjs3op.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4wpxkxma9ehimsvjs3op.png" width="800" height="514"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  What’s next?
&lt;/h3&gt;

&lt;p&gt;To continue your journey building production AI agents in Golang:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://google.github.io/adk-docs/" rel="noopener noreferrer"&gt;&lt;strong&gt;ADK Documentation&lt;/strong&gt;&lt;/a&gt;: Complete guides on advanced patterns, multi-agent orchestration, and memory systems&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/GoogleCloudPlatform/agent-starter-pack" rel="noopener noreferrer"&gt;&lt;strong&gt;Agent Starter Pack&lt;/strong&gt;&lt;/a&gt;: Explore templates, including multi-agent systems and complex architectures&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://cloud.google.com/run/docs?utm_campaign=CDR_0x2b6f3004_default_b476693958&amp;amp;utm_medium=external&amp;amp;utm_source=blog" rel="noopener noreferrer"&gt;&lt;strong&gt;Cloud Run Documentation&lt;/strong&gt;&lt;/a&gt;: Deep dives on performance optimization, scaling strategies, and security best practices&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://go.dev/blog/pipelines" rel="noopener noreferrer"&gt;&lt;strong&gt;Go Concurrency Patterns&lt;/strong&gt;&lt;/a&gt;: Understanding goroutines and channels will help you build more efficient agent tooling&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://docs.cloud.google.com/agent-builder/agent-engine/overview?utm_campaign=CDR_0x2b6f3004_default_b476693958&amp;amp;utm_medium=external&amp;amp;utm_source=blog" rel="noopener noreferrer"&gt;&lt;strong&gt;Vertex AI Agent Engine&lt;/strong&gt;&lt;/a&gt;: For managed agent infrastructure with built-in orchestration and tooling&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;As you scale from one agent to many, the engineering decisions we’ve discussed here compound in value. Go’s concurrency model and &lt;a href="https://cloud.google.com/run?utm_campaign=CDR_0x2b6f3004_default_b476693958&amp;amp;utm_medium=external&amp;amp;utm_source=blog" rel="noopener noreferrer"&gt;Cloud Run’s&lt;/a&gt; autoscaling are both necessary ingredients. Share what you’re building with me on &lt;a href="https://www.linkedin.com/in/karlweinmeister/" rel="noopener noreferrer"&gt;LinkedIn&lt;/a&gt;, &lt;a href="https://x.com/kweinmeister" rel="noopener noreferrer"&gt;X&lt;/a&gt;, or &lt;a href="https://bsky.app/profile/kweinmeister.bsky.social" rel="noopener noreferrer"&gt;Bluesky&lt;/a&gt;!&lt;/p&gt;




</description>
      <category>googlecloudrun</category>
      <category>agents</category>
      <category>go</category>
      <category>ai</category>
    </item>
    <item>
      <title>The Six Failures of Text-to-SQL (And How to Fix Them with Agents)</title>
      <dc:creator>Karl Weinmeister</dc:creator>
      <pubDate>Tue, 11 Nov 2025 14:23:12 +0000</pubDate>
      <link>https://dev.to/googleai/the-six-failures-of-text-to-sql-and-how-to-fix-them-with-agents-1n0a</link>
      <guid>https://dev.to/googleai/the-six-failures-of-text-to-sql-and-how-to-fix-them-with-agents-1n0a</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fslsqdjo6qycm1fbnelev.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fslsqdjo6qycm1fbnelev.png"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I’ve written countless SQL queries over the years. Unfortunately, like my golf game, I don’t write SQL enough to be a pro at it. Outside of straightforward SELECT statements, I approach SQL queries iteratively. I’ll inspect the tables, draft a query, and hope for the best. If there are any errors, I’ll go through this loop again.&lt;/p&gt;

&lt;p&gt;While AI models are much better than me at SQL, they aren’t perfect. And that loop I described is just as important for automated approaches to be effective. Text-to-SQL is a &lt;a href="https://arxiv.org/html/2410.01066v1" rel="noopener noreferrer"&gt;deceptively difficult problem&lt;/a&gt; with challenges including linguistic ambiguity and rare SQL operations.&lt;/p&gt;

&lt;p&gt;This is where a multi-agent architecture, built with a framework like Google’s Agent Development Kit (&lt;a href="https://google.github.io/adk-docs/" rel="noopener noreferrer"&gt;ADK&lt;/a&gt;), becomes essential. We can build a “virtual data analyst” by composing a team of specialized agents. A SchemaExtractor can find the right tables, a SqlGenerator can write the draft, and a SqlCorrector can critique and fix it. A SequentialAgent acts as the manager, ensuring the process is followed, every single time.&lt;/p&gt;

&lt;p&gt;In this guide, we’ll walk through the six most common failure points for Text-to-SQL and show how to solve each one by building out our team of agents, moving from a simple script to a full-fledged agentic system. We’ll use the sample project &lt;a href="https://github.com/kweinmeister/text-to-sql-agent" rel="noopener noreferrer"&gt;kweinmeister/text-to-sql-agent&lt;/a&gt; to illustrate these solutions.&lt;/p&gt;

&lt;p&gt;

  &lt;iframe src="https://www.youtube.com/embed/-Vwd_9Lai38"&gt;
  &lt;/iframe&gt;


&lt;/p&gt;

&lt;h3&gt;
  
  
  Problem 1: Agent Order Issues
&lt;/h3&gt;

&lt;p&gt;Here’s the issue with a single &lt;a href="https://google.github.io/adk-docs/agents/llm-agents/" rel="noopener noreferrer"&gt;LlmAgent&lt;/a&gt; that holds all the tools: &lt;em&gt;it&lt;/em&gt; decides the order of operations. It might confidently skip fetching the schema and invent a table name. Or it might try to run a query &lt;em&gt;before&lt;/em&gt; validating it. A single LLM is deciding what to do next, and it can (and will) make mistakes. That’s not a reliable process.&lt;/p&gt;

&lt;h3&gt;
  
  
  Solution: SequentialAgent for Order Control
&lt;/h3&gt;

&lt;p&gt;The ADK gives us “&lt;a href="https://google.github.io/adk-docs/agents/workflow-agents/" rel="noopener noreferrer"&gt;Workflow Agents&lt;/a&gt;” for this. These specialized agents don’t use an LLM for flow control. They’re deterministic.&lt;/p&gt;

&lt;p&gt;The &lt;a href="https://google.github.io/adk-docs/agents/workflow-agents/sequential-agents/" rel="noopener noreferrer"&gt;SequentialAgent&lt;/a&gt; is the simplest and most powerful one to start with. It runs its sub-agents in the &lt;em&gt;exact&lt;/em&gt; order you list them. Using a sequential agent also separates the concerns of “what to do” (our specialized agents) from “the order to do it in” (the workflow agent).&lt;/p&gt;

&lt;p&gt;The &lt;a href="https://google.github.io/adk-docs/agents/workflow-agents/sequential-agents/" rel="noopener noreferrer"&gt;SequentialAgent&lt;/a&gt; also acts as a guardrail. It turns our best practices (“always get the schema first,” “always validate before running”) into enforced infrastructure, not just suggestions in a prompt.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;a href="https://github.com/kweinmeister/text-to-sql-agent/blob/main/src/texttosql/agent.py" rel="noopener noreferrer"&gt;Code Example&lt;/a&gt;: Defining the Workflow Manager
&lt;/h3&gt;

&lt;p&gt;Let’s define our root agent. Instead of a single LlmAgent, our root_agent will be a SequentialAgent. We’ll start by defining the &lt;em&gt;specialists&lt;/em&gt; as stubs (we’ll build them out in the next sections):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;google.adk.agents&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;SequentialAgent&lt;/span&gt;

&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;.agents&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;schema_extractor_agent&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;sql_correction_loop&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;sql_generator_agent&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;.callbacks&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;capture_user_message&lt;/span&gt;

&lt;span class="n"&gt;root_agent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;SequentialAgent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;TextToSqlRootAgent&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;before_agent_callback&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;capture_user_message&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;sub_agents&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="n"&gt;schema_extractor_agent&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;sql_generator_agent&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;sql_correction_loop&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;],&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Problem 2: LLM Schema Hallucinations
&lt;/h3&gt;

&lt;p&gt;This is the classic failure mode. The LLM just doesn’t know your schema.&lt;/p&gt;

&lt;p&gt;A common but flawed fix is to dump the &lt;em&gt;entire&lt;/em&gt; database schema into the prompt. This backfires for two reasons. First, huge enterprise schemas won’t even fit in the context window. Second, even if they did, giving the LLM 100 irrelevant tables to find the 2 relevant ones just drowns it in noise and leads to &lt;em&gt;worse&lt;/em&gt; results.&lt;/p&gt;

&lt;h3&gt;
  
  
  Solution: Dedicated Schema-Retrieval Tool
&lt;/h3&gt;

&lt;p&gt;The answer is dynamic retrieval. Don’t give the agent a static block of schema; give it a &lt;em&gt;tool&lt;/em&gt; to &lt;em&gt;fetch&lt;/em&gt; schema. This lets the LLM reason about what it needs first, and &lt;em&gt;then&lt;/em&gt; request only that specific information.&lt;/p&gt;

&lt;p&gt;We can build a simple Python function for this. The ADK makes it easy to turn any function into an agent-callable tool with &lt;a href="https://google.github.io/adk-docs/tools/function-tools/" rel="noopener noreferrer"&gt;FunctionTool&lt;/a&gt;. The agent automatically figures out how to use it from its docstring, a best practice you’ll see in projects like &lt;a href="https://medium.com/@gabi.preda/building-agentic-applications-with-googles-adk-a-hands-on-sql-agent-example-8b30d888293f" rel="noopener noreferrer"&gt;gabrielpreda/adk-sql-agent&lt;/a&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;a href="https://github.com/kweinmeister/text-to-sql-agent/blob/main/src/texttosql/tools.py" rel="noopener noreferrer"&gt;Code Example&lt;/a&gt;: The Schema Tool
&lt;/h3&gt;

&lt;p&gt;&lt;em&gt;💡 In the&lt;/em&gt; &lt;a href="https://github.com/kweinmeister/text-to-sql-agent" rel="noopener noreferrer"&gt;&lt;em&gt;kweinmeister/text-to-sql-agent&lt;/em&gt;&lt;/a&gt; &lt;em&gt;project, the functions are not wrapped as&lt;/em&gt; &lt;a href="https://cloud.google.com/blog/topics/developers-practitioners/tools-make-an-agent-from-zero-to-assistant-with-adk?utm_campaign=CDR_0x2b6f3004_default_b459252462&amp;amp;utm_medium=external&amp;amp;utm_source=blog" rel="noopener noreferrer"&gt;&lt;em&gt;tools&lt;/em&gt;&lt;/a&gt;&lt;em&gt;, since they are directly called by a deterministic agent. They are provided centrally in a&lt;/em&gt; &lt;em&gt;tools.py file, so that they can be easily leveraged as tools in a future&lt;/em&gt; &lt;a href="https://google.github.io/adk-docs/agents/llm-agents/" rel="noopener noreferrer"&gt;&lt;em&gt;LlmAgent&lt;/em&gt;&lt;/a&gt;&lt;em&gt;.&lt;/em&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;logging&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;typing&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Any&lt;/span&gt;

&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;.config&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;DB_URI&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;.dialects.dialect&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;DatabaseDialect&lt;/span&gt;

&lt;span class="n"&gt;logger&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;logging&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getLogger&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt; &lt;span class="n"&gt;__name__&lt;/span&gt; &lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;load_schema_into_state&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Any&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;dialect&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;DatabaseDialect&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
    Loads the DDL and SQLGlot schema into the state dictionary.
    This function relies on the caching mechanism within the dialect object.
    &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;logger&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;info&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Loading schema for dialect: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;dialect&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;db_uri&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;DB_URI&lt;/span&gt;
    &lt;span class="c1"&gt;# Error handling code omitted
&lt;/span&gt;
    &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;logger&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;info&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Loading schema from database: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;db_uri&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="c1"&gt;# The dialect object handles its own caching.
&lt;/span&gt;        &lt;span class="c1"&gt;# The first call to get_ddl will trigger the DB query and cache the DDL.
&lt;/span&gt;        &lt;span class="n"&gt;logger&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;info&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Calling dialect.get_ddl...&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;schema_ddl&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;dialect&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_ddl&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;db_uri&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;logger&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;info&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;DDL loaded successfully&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="c1"&gt;# The call to get_sqlglot_schema will use the cached DDL if available,
&lt;/span&gt;        &lt;span class="c1"&gt;# then parse it and cache the result.
&lt;/span&gt;        &lt;span class="n"&gt;logger&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;info&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Calling dialect.get_sqlglot_schema...&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;sqlglot_schema&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;dialect&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_sqlglot_schema&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;db_uri&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;logger&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;info&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;SQLGlot schema loaded successfully&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;logger&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;info&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;SQLGlot schema keys: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nf"&gt;list&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;sqlglot_schema&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;keys&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="nb"&gt;Exception&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;error_msg&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Error extracting schema: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="n"&gt;logger&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;error_msg&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;exc_info&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;schema_ddl&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Error loading schema: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;error_msg&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;sqlglot_schema&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;error&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;error_msg&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Problem 3: Query Logic Errors
&lt;/h3&gt;

&lt;p&gt;Even with the right schema, the LLM can still make logical mistakes with complex joins or aggregations. A human analyst would spot the error, critique it (“That join is wrong, you need to use user_id”), and refine it.&lt;/p&gt;

&lt;p&gt;Our &lt;a href="https://google.github.io/adk-docs/api-reference/python/google.adk.agents.html#google.adk.agents.SequentialAgent" rel="noopener noreferrer"&gt;SequentialAgent&lt;/a&gt; is too simple for this. It’s a waterfall. It can’t go backwards and iterate.&lt;/p&gt;

&lt;h3&gt;
  
  
  Solution: LoopAgent for Iterative Refinement
&lt;/h3&gt;

&lt;p&gt;The ADK has another workflow agent for this: the &lt;a href="https://google.github.io/adk-docs/agents/workflow-agents/loop-agents/" rel="noopener noreferrer"&gt;LoopAgent&lt;/a&gt;. This agent runs its sub-agents &lt;em&gt;iteratively&lt;/em&gt; until a condition is met. It’s perfect for a “generate-and-critique” pattern.&lt;/p&gt;

&lt;p&gt;We don’t have to replace our SequentialAgent. We can enhance it by &lt;a href="https://medium.com/@shins777/adk-workflow-the-core-logic-of-ai-agent-8ce4be5c1c40" rel="noopener noreferrer"&gt;&lt;strong&gt;nesting workflow agents&lt;/strong&gt;&lt;/a&gt;. We’ll replace the single query generation step inside our SequentialAgent with a new LoopAgent. This loop will contain a team of two specialists:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;A Writer Agent:&lt;/strong&gt; An LlmAgent that writes the SQL draft.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;A Critic Agent:&lt;/strong&gt; A &lt;em&gt;second LlmAgent&lt;/em&gt; with a different prompt, whose only job is to correct the writer’s SQL.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This is a powerful way to get LLMs to self-correct, which improves the quality of the final query.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;a href="https://github.com/kweinmeister/text-to-sql-agent/blob/main/src/texttosql/agents.py" rel="noopener noreferrer"&gt;Code Example&lt;/a&gt;: Building a “Generate-and-Critique” Loop
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;sql_generator_agent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;sql_generator_agent&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;MODEL_NAME&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;description&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Generates an initial SQL query from a natural language question.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;instruction&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;get_generator_instruction&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;output_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;sql_query&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;after_model_callback&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;clean_sql_query&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;sql_corrector_agent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;sql_corrector_agent&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;MODEL_NAME&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;description&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Corrects a failed SQL query.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;instruction&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;get_corrector_instruction&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;output_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;sql_query&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[],&lt;/span&gt;
    &lt;span class="n"&gt;after_model_callback&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;clean_sql_query&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;sql_correction_loop&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;LoopAgent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;SQLCorrectionLoop&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;sub_agents&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="n"&gt;sql_processor_agent&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;sql_corrector_agent&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="n"&gt;max_iterations&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Problem 4: Agent Performance and Cost
&lt;/h3&gt;

&lt;p&gt;We’re now using three LLM-powered agents. This is great for quality, but it’s slow and costs money with every API call.&lt;/p&gt;

&lt;p&gt;What about simple, deterministic steps? Things like validating SQL syntax, formatting data, or cleaning up LLM output. Using a powerful LLM for these jobs is like using a sledgehammer to hang a picture. It’s slow, expensive, and surprisingly unreliable.&lt;/p&gt;

&lt;h3&gt;
  
  
  Solution: Custom Agents for Code-Based Logic
&lt;/h3&gt;

&lt;p&gt;The ADK isn’t just for LLMs. You can create a “&lt;a href="https://google.github.io/adk-docs/agents/custom-agents/" rel="noopener noreferrer"&gt;Custom Agent&lt;/a&gt;” by inheriting from &lt;a href="https://google.github.io/adk-docs/api-reference/python/google-adk.html#google.adk.agents.BaseAgent" rel="noopener noreferrer"&gt;BaseAgent&lt;/a&gt; and implementing the _run_async_impl method.&lt;/p&gt;

&lt;p&gt;This agent has no LLM. It runs pure Python code. It’s fast and 100% deterministic. We’ll create a custom agent for our next problem: validation.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;a href="https://github.com/kweinmeister/text-to-sql-agent/blob/main/src/texttosql/agents.py" rel="noopener noreferrer"&gt;Code Example&lt;/a&gt;: Building a Non-LLM ValidationAgent
&lt;/h3&gt;

&lt;p&gt;This agent will use the sqlglot library (which we’ll discuss in detail next) and will be a custom BaseAgent.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;SQLProcessor&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;BaseAgent&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
    Agent that handles the mechanical steps of:
    1. Validating the current SQL.
    2. Executing it ONLY if validation passed.
    3. Escalating to exit the loop on successful execution.
    &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;

    &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;_run_async_impl&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;InvocationContext&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;AsyncGenerator&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;Event&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
        &lt;span class="n"&gt;logger&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;info&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;[&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;] Starting SQL processing.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="c1"&gt;# ...
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Problem 5: Dangerous Query Execution
&lt;/h3&gt;

&lt;p&gt;This is the big one. You can’t execute LLM-generated code directly against your database. Ever. It’s a massive security and stability risk.&lt;/p&gt;

&lt;p&gt;We need a fast, reliable check for syntax errors. What if the LLM produces a query that’s syntactically invalid? Or for the wrong SQL dialect?&lt;/p&gt;

&lt;h3&gt;
  
  
  Solution: Non-Destructive Dry Run with sqlglot
&lt;/h3&gt;

&lt;p&gt;This is where our custom SqlValidationAgent shines. We’ll use the &lt;a href="https://github.com/tobymao/sqlglot" rel="noopener noreferrer"&gt;sqlglot&lt;/a&gt; library, a pure-Python SQL parser and transpiler.&lt;/p&gt;

&lt;p&gt;Why sqlglot? It’s fast and local, building a real Abstract Syntax Tree (AST) which is infinitely more reliable than regex. It’s also dialect-aware, so it can catch syntax errors specific to, say, PostgreSQL.&lt;/p&gt;

&lt;p&gt;We can just wrap sqlglot.parse_one(sql) in a try…except block. If it parses, the syntax is valid. If it throws a ParseError, it’s not. This gives us a fast and cheap validation signal.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;a href="https://github.com/kweinmeister/text-to-sql-agent/blob/main/src/texttosql/agents.py" rel="noopener noreferrer"&gt;Code Example&lt;/a&gt;: Full ValidationAgent Implementation
&lt;/h3&gt;

&lt;p&gt;Here is the full implementation of the SqlValidationAgent we previewed with sqlglot validation.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;google.adk.agents&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;BaseAgent&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;google.adk.core&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;InvocationContext&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Event&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;google.genai.types&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Content&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Part&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;sqlglot&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;sqlglot.expressions&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;exp&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;asyncio&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;typing&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;AsyncGenerator&lt;/span&gt;

&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;SQLProcessor&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;BaseAgent&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
    Agent that handles the mechanical steps of:
    1. Validating the current SQL.
    2. Executing it ONLY if validation passed.
    3. Escalating to exit the loop on successful execution.
    &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;

    &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;_run_async_impl&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;InvocationContext&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;AsyncGenerator&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;Event&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
        &lt;span class="n"&gt;logger&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;info&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;[&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;] Starting SQL processing.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;state&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;session&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;state&lt;/span&gt;
        &lt;span class="n"&gt;dialect&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;get_dialect&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

        &lt;span class="n"&gt;val_result&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Any&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;run_sql_validation&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;dialect&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;yield&lt;/span&gt; &lt;span class="nc"&gt;Event&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;author&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;invocation_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;invocation_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;custom_metadata&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;validation_result&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;val_result&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;val_result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;status&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;success&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;exec_result&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Any&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;run_sql_execution&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;dialect&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

            &lt;span class="n"&gt;result_event&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Event&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                &lt;span class="n"&gt;author&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="n"&gt;invocation_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;invocation_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="n"&gt;custom_metadata&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;execution_result&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;exec_result&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
            &lt;span class="p"&gt;)&lt;/span&gt;

            &lt;span class="c1"&gt;# If execution succeeds, this is the final answer.
&lt;/span&gt;            &lt;span class="c1"&gt;# Escalate to exit the loop and provide the final content.
&lt;/span&gt;            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;exec_result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;status&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;success&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="n"&gt;logger&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;info&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                    &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;[&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;] SQL execution successful. Escalating to exit loop.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
                &lt;span class="p"&gt;)&lt;/span&gt;
                &lt;span class="n"&gt;result_event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;actions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;escalate&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;

                &lt;span class="n"&gt;final_query&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;sql_query&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
                &lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;final_sql_query&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;final_query&lt;/span&gt;

                &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;final_query&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                    &lt;span class="n"&gt;result_event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Content&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                        &lt;span class="n"&gt;role&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;model&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;parts&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nc"&gt;Part&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;final_query&lt;/span&gt;&lt;span class="p"&gt;)]&lt;/span&gt;
                    &lt;span class="p"&gt;)&lt;/span&gt;

            &lt;span class="k"&gt;yield&lt;/span&gt; &lt;span class="n"&gt;result_event&lt;/span&gt;
        &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;logger&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;info&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;[&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;] Skipping execution due to validation failure.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;execution_result&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;status&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;skipped&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;reason&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;validation_failed&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Problem 6: Messy LLM Output
&lt;/h3&gt;

&lt;p&gt;One last thing. LLMs are trained to be helpful conversationalists. So when you ask for a SQL query, you often get this:&lt;/p&gt;

&lt;p&gt;“Sure! Here is the SQL query you asked for: SELECT * FROM users;”&lt;/p&gt;

&lt;p&gt;That conversational fluff will break our SqlValidationAgent every single time. We need a way to programmatically clean the LLM’s output &lt;em&gt;before&lt;/em&gt; it’s passed to the next agent.&lt;/p&gt;

&lt;h3&gt;
  
  
  Solution: Callbacks for Post-Processing
&lt;/h3&gt;

&lt;p&gt;We could add another CustomAgent just to strip the text, but that feels a bit heavy for such a simple task.&lt;/p&gt;

&lt;p&gt;The ADK offers a more elegant solution: &lt;strong&gt;Callbacks&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;An &lt;a href="https://google.github.io/adk-docs/callbacks/types-of-callbacks/#after-agent-callback" rel="noopener noreferrer"&gt;AfterAgentCallback&lt;/a&gt; is a function you attach to an agent that’s guaranteed to run immediately after the agent finishes. It can even modify the agent’s final output.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;a href="https://github.com/kweinmeister/text-to-sql-agent/blob/main/src/texttosql/callbacks.py" rel="noopener noreferrer"&gt;Code Example&lt;/a&gt;: Attaching a Cleanup Callback
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;re&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;google.adk.core&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;InvocationContext&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Content&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;cleanup_sql_output&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;InvocationContext&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;agent_output&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Content&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;Content&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
    This callback runs *after* the agent and cleans its output.
    &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;raw_text&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;agent_output&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;parts&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;

    &lt;span class="c1"&gt;# Simple regex to find content within ```
&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="o"&gt;%&lt;/span&gt; &lt;span class="n"&gt;endraw&lt;/span&gt; &lt;span class="o"&gt;%&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="n"&gt;sql&lt;/span&gt;&lt;span class="bp"&gt;...&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="o"&gt;%&lt;/span&gt; &lt;span class="n"&gt;raw&lt;/span&gt; &lt;span class="o"&gt;%&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;match = re.search(r"```
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;&lt;br&gt;
sql\s*(.&lt;em&gt;?)\s&lt;/em&gt;&lt;br&gt;
&lt;br&gt;
```", raw_text, re.DOTALL | re.IGNORECASE)&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;cleaned_text = raw_text
if match:
    cleaned_text = match.group(1)
else:
    # Fallback: simple stripping
    cleaned_text = raw_text.strip().strip("`").strip()

# Add a semicolon if it's missing (another common cleanup)
if not cleaned_text.endswith(";"):
    cleaned_text += ";"

# Return a *new* Content object to *replace* the original output
return Content.from_text(cleaned_text)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;


### Final Architecture

We’ve systematically tackled the six hardest problems in Text-to-SQL, evolving a brittle script into an extensible multi-agent system.

Our final root\_agent is a [SequentialAgent](https://google.github.io/adk-docs/api-reference/python/google.adk.agents.html#google.adk.agents.SequentialAgent) that orchestrates a team of specialists: a schema-fetching agent, a looping agent for iterative query improvement (with its own writer and critic), and a fast, deterministic validation agent using sqlglot.

The point is that modern agent development is about _composition_. You have to choose the right ADK construct for the right task. This table is a cheat sheet for making that decision.

### Agent Design: The “Right Tool for the Job”

![](https://cdn-images-1.medium.com/max/1024/1*OuHQ_Jb7kQ0IaBrlWKVICg.png)

### Conclusion: Building Reliable AI Systems

This pattern of **Specialization** , **Orchestration** , and **Safeguards** is the future of building production-ready AI. It’s not just for SQL, either. You can use this same architecture for autonomous code generation, document analysis, and much more.

So stop trying to build one “super-prompt” and start building teams of specialized agents. Welcome to the world of reliable, agentic systems.

What’s next? Get started in 3 simple steps in the [sample repository](https://github.com/kweinmeister/text-to-sql-agent). If you want a hands-on lab exercise, check out [Build Multi-Agent Systems with ADK](https://codelabs.developers.google.com/codelabs/production-ready-ai-with-gc/3-developing-agents/build-a-multi-agent-system-with-adk?hl=en#0&amp;amp;utm_campaign=CDR_0x2b6f3004_default_b459252462&amp;amp;utm_medium=external&amp;amp;utm_source=blog). To learn about powerful, built-in natural language capabilities in AlloyDB, try out the [AlloyDB AI NL SQL](https://codelabs.developers.google.com/alloydb-ai-nl-sql?hl=en#0&amp;amp;utm_campaign=CDR_0x2b6f3004_default_b459252462&amp;amp;utm_medium=external&amp;amp;utm_source=blog) codelab.

Want to keep the discussion going about multi-agent systems? Connect with me on [LinkedIn](https://www.linkedin.com/in/karlweinmeister/), [X](https://x.com/kweinmeister), or [Bluesky](https://bsky.app/profile/kweinmeister.bsky.social).

* * *
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

</description>
      <category>googleadk</category>
      <category>relationaldatabases</category>
      <category>sql</category>
      <category>agents</category>
    </item>
    <item>
      <title>Deploy Faster with Terraform: Your Guide to vLLM on GKE with Infrastructure-as-Code</title>
      <dc:creator>Karl Weinmeister</dc:creator>
      <pubDate>Sun, 12 Oct 2025 23:24:44 +0000</pubDate>
      <link>https://dev.to/googleai/deploy-faster-with-terraform-your-guide-to-vllm-on-gke-with-infrastructure-as-code-6jh</link>
      <guid>https://dev.to/googleai/deploy-faster-with-terraform-your-guide-to-vllm-on-gke-with-infrastructure-as-code-6jh</guid>
      <description>&lt;p&gt;Somewhere in your AI journey, you’re going to push the limits of what models can do.&lt;/p&gt;

&lt;p&gt;You might need to squeeze out that extra bit of performance, or try to fit a big model right under a GPU’s VRAM limit. All of these situations require tweaking and redeployment. That’s not as simple as it sounds, when the infrastructure includes everything from GPU clusters to storage to networking.&lt;/p&gt;

&lt;p&gt;The solution is to treat your infrastructure the same way you treat your application code. It needs to be versioned in Git. It needs to be tested. And it needs to be deployed through an automated pipeline. This practice, known as &lt;a href="https://cloud.google.com/docs/iac?utm_campaign=CDR_0x2b6f3004_user-journey_b450531330&amp;amp;utm_medium=external&amp;amp;utm_source=blog" rel="noopener noreferrer"&gt;Infrastructure as Code&lt;/a&gt;, or IaC, is the foundation of any serious MLOps strategy.&lt;/p&gt;

&lt;p&gt;This article is a practical guide on how to use Terraform for agile ML engineering. I’ll walk through a real-world example of deploying a high-with &lt;a href="https://docs.vllm.ai/en/stable/" rel="noopener noreferrer"&gt;vLLM&lt;/a&gt; on &lt;a href="https://cloud.google.com/kubernetes-engine?utm_campaign=CDR_0x2b6f3004_user-journey_b450531330&amp;amp;utm_medium=external&amp;amp;utm_source=blog" rel="noopener noreferrer"&gt;Google Kubernetes Engine&lt;/a&gt;. You can follow along with the complete source code on GitHub in the &lt;a href="https://github.com/kweinmeister/vllm-gke-terraform" rel="noopener noreferrer"&gt;vllm-gke-terraform&lt;/a&gt; repository.&lt;/p&gt;

&lt;p&gt;We will use the &lt;a href="https://huggingface.co/Qwen/Qwen3-32B" rel="noopener noreferrer"&gt;Qwen3–32B&lt;/a&gt; model in this article, which can be run on easily accessible &lt;a href="https://cloud.google.com/blog/products/compute/introducing-g2-vms-with-nvidia-l4-gpus?utm_campaign=CDR_0x2b6f3004_user-journey_b450531330&amp;amp;utm_medium=external&amp;amp;utm_source=blog" rel="noopener noreferrer"&gt;NVIDIA L4 GPUs&lt;/a&gt; on Google Cloud. The Terraform script has been tested on larger models, such as the &lt;a href="https://huggingface.co/Qwen/Qwen3-235B-A22B-Instruct-2507" rel="noopener noreferrer"&gt;Qwen/Qwen3–235B-A22B-Instruct-2507&lt;/a&gt; on a cluster with 8 H100 GPUs.&lt;/p&gt;

&lt;p&gt;The scripts currently use GKE standard clusters for maximum flexibility. For production workloads where you want to offload node management and focus purely on the application, it’s recommended to leverage GKE &lt;a href="https://cloud.google.com/blog/products/containers-kubernetes/gke-autopilot-now-available-to-all-qualifying-clusters?utm_campaign=CDR_0x2b6f3004_user-journey_b450531330&amp;amp;utm_medium=external&amp;amp;utm_source=blog" rel="noopener noreferrer"&gt;Autopilot&lt;/a&gt; capabilities.&lt;/p&gt;

&lt;h4&gt;
  
  
  Declarative Infrastructure
&lt;/h4&gt;

&lt;p&gt;Terraform uses a declarative language (&lt;a href="https://developer.hashicorp.com/terraform/language/syntax/configuration" rel="noopener noreferrer"&gt;HCL&lt;/a&gt;) where you define the desired end state of your infrastructure. You specify what you need, and Terraform’s engine calculates the necessary API calls to make the real-world infrastructure match that state. Before applying any changes, you can run the terraform plan command to see a detailed preview of what Terraform will create, modify, or destroy.&lt;/p&gt;

&lt;p&gt;This allows for a thorough review to ensure the proposed changes align with your intentions, preventing unintended modifications. This declarative model is the key to eliminating configuration drift and ensuring that every environment is provisioned identically, a critical requirement for reproducible experiments.&lt;/p&gt;

&lt;p&gt;The &lt;a href="https://registry.terraform.io/providers/hashicorp/google/latest/docs" rel="noopener noreferrer"&gt;Terraform provider for Google Cloud&lt;/a&gt; is the interface between Terraform and Google Cloud. For example, the &lt;a href="https://registry.terraform.io/providers/hashicorp/google/latest/docs/resources/container_cluster" rel="noopener noreferrer"&gt;google_container_cluster&lt;/a&gt; resource is used to manage a GKE cluster. You can find the full set of GKE resources &lt;a href="https://cloud.google.com/kubernetes-engine/docs/terraform?utm_campaign=CDR_0x2b6f3004_user-journey_b450531330&amp;amp;utm_medium=external&amp;amp;utm_source=blog" rel="noopener noreferrer"&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;In our project, the &lt;a href="https://github.com/kweinmeister/vllm-gke-terraform/blob/main/gke.tf" rel="noopener noreferrer"&gt;gke.tf&lt;/a&gt; file declares the desired state of a GKE cluster with specific node pools:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# gke.tf
resource "google_container_cluster" "qwen_cluster" {
  name = local.cluster_name
  location = var.zone
  project = var.project_id
  # ...
}

resource "google_container_node_pool" "gpu_pools" {
  # ...
  node_config {
    machine_type = each.value.machine_type
    guest_accelerator {
      type = each.value.accelerator_type
      count = each.value.accelerator_count
    }
  }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;To manage this, Terraform maintains a state file that maps these definitions to their real-world resources. For team collaboration, using a remote state backend like &lt;a href="https://cloud.google.com/storage?utm_campaign=CDR_0x2b6f3004_user-journey_b450531330&amp;amp;utm_medium=external&amp;amp;utm_source=blog" rel="noopener noreferrer"&gt;Cloud Storage&lt;/a&gt; is recommended. It provides a centralized source of truth and uses locking mechanisms to prevent conflicting changes. Here’s how to instruct Terraform to use GCS as its backend:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# main.tf
terraform {
  backend "gcs" {
    prefix = "terraform/state/vllm-gke"
  }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  Reusable Modules
&lt;/h4&gt;

&lt;p&gt;Terraform modules are the primary mechanism for abstraction and reuse. MLOps teams can create a library of standardized modules for common components like a GKE cluster or a vector database.&lt;/p&gt;

&lt;p&gt;Modules are made reusable through input variables. This allows an engineer to maintain a single, version-controlled set of Terraform files and use variable files (.tfvars) to launch new, isolated deployments.&lt;/p&gt;

&lt;p&gt;To test a new model, you could simply create a new variable file like llama3-test.tfvars. By overriding a few default values, you can spin up an entirely new, isolated environment to test Llama-3–8B on L4 GPUs:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# my-experiment.tfvars
project_id = "my-gcp-project"
name_prefix = "my-llama3-deployment"
model_id = "meta-llama/Llama-3-8B-Instruct"
gpu_type = "l4"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Running terraform apply -var-file=llama3-test.tfvars makes spinning up parallel experiments a trivial, declarative operation, dramatically increasing a team’s experimental throughput.&lt;/p&gt;

&lt;p&gt;For production systems, this same principle allows for sophisticated, zero-downtime strategies like Blue/Green deployments. A second, parallel “green” version of the entire stack is deployed by instantiating the Terraform configuration with a different set of variables. Once the new environment is fully validated, production traffic can be instantly switched at the load balancer or DNS level. The old “blue” environment can then be decommissioned. By codifying these complex release strategies, the entire deployment process becomes a version-controlled, auditable artifact.&lt;/p&gt;

&lt;h4&gt;
  
  
  Configuring the vLLM Engine
&lt;/h4&gt;

&lt;p&gt;Provisioning hardware consistently is the first step. Configuring software to utilize that hardware efficiently is next.&lt;/p&gt;

&lt;p&gt;The sample project uses the popular &lt;a href="https://docs.vllm.ai/en/stable/" rel="noopener noreferrer"&gt;vLLM&lt;/a&gt; inference engine. Let’s show how to effectively link Terraform variables to configuration parameters in vLLM.&lt;/p&gt;

&lt;p&gt;In &lt;a href="https://github.com/kweinmeister/vllm-gke-terraform/blob/main/variables.tf" rel="noopener noreferrer"&gt;variables.tf&lt;/a&gt;, the high-level knobs for experiments are defined:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# variables.tf
variable "gpu_memory_utilization" {
  description = "GPU memory utilization ratio"
  type = number
  default = 0.9
}

variable "max_model_len" {
  description = "The maximum model length."
  type = number
  default = 8192
}

variable "vllm_max_num_seqs" {
  description = "The maximum number of sequences (requests) to batch together."
  type = number
  default = 64
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then, the deployment in kubernetes.tf consumes these variables to construct the vLLM server’s startup arguments:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# kubernetes.tf
...
container {
  name = "vllm-container"
  args = compact([
    # --- Base Model Arguments ---
    "--model",
    var.model_id,
    "--tensor-parallel-size",
    tostring(local.gpu_config.accelerator_count),

    # --- Performance Tuning from Variables ---
    "--gpu-memory-utilization",
    tostring(var.gpu_memory_utilization),
    "--max-model-len",
    tostring(var.max_model_len),
    "--max-num-seqs",
    tostring(var.vllm_max_num_seqs),
  ])
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Production-Grade Architecture
&lt;/h3&gt;

&lt;p&gt;The &lt;a href="https://github.com/kweinmeister/vllm-gke-terraform" rel="noopener noreferrer"&gt;sample project&lt;/a&gt; showcases a blueprint for a production-grade inference endpoint on GKE designed for both performance and cost-efficiency.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2x3aembmosmnkoiypgxx.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2x3aembmosmnkoiypgxx.png"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The &lt;a href="https://github.com/kweinmeister/vllm-gke-terraform/blob/main/gke.tf" rel="noopener noreferrer"&gt;gke.tf&lt;/a&gt; file provisions a GKE cluster with both &lt;a href="https://cloud.google.com/kubernetes-engine/docs/concepts/spot-vms?utm_campaign=CDR_0x2b6f3004_user-journey_b450531330&amp;amp;utm_medium=external&amp;amp;utm_source=blog" rel="noopener noreferrer"&gt;spot&lt;/a&gt; and on-demand GPU node pools, which allows for a flexible and cost-effective approach to managing expensive GPU resources. You can read more &lt;a href="https://cloud.google.com/blog/topics/developers-practitioners/running-gke-application-spot-nodes-demand-nodes-fallback?utm_campaign=CDR_0x2b6f3004_user-journey_b450531330&amp;amp;utm_medium=external&amp;amp;utm_source=blog" rel="noopener noreferrer"&gt;here&lt;/a&gt; about the strategy to back up spot VMs with an on-demand node pool.&lt;/p&gt;

&lt;p&gt;To avoid re-downloading large models on every pod restart, a &lt;a href="https://cloud.google.com/kubernetes-engine/docs/concepts/persistent-volumes?utm_campaign=CDR_0x2b6f3004_user-journey_b450531330&amp;amp;utm_medium=external&amp;amp;utm_source=blog" rel="noopener noreferrer"&gt;kubernetes_persistent_volume_claim&lt;/a&gt; is created in &lt;a href="https://github.com/kweinmeister/vllm-gke-terraform/blob/main/kubernetes.tf" rel="noopener noreferrer"&gt;kubernetes.tf&lt;/a&gt; to provide a persistent cache for the Hugging Face models. A Kubernetes Job, defined in &lt;a href="https://github.com/kweinmeister/vllm-gke-terraform/blob/main/kubernetes_jobs.tf" rel="noopener noreferrer"&gt;kubernetes_jobs.tf&lt;/a&gt;, is then used to download the specified model into this persistent volume. This job runs to completion before the main vLLM deployment is scaled up, ensuring the model is ready before the inference server starts.&lt;/p&gt;

&lt;h3&gt;
  
  
  Automated Workflows
&lt;/h3&gt;

&lt;p&gt;While Terraform itself is a big leap forward from shell scripting, it’s crucial that teams don’t stop there. The next step beyond running manual terraform commands is to embrace an automated, end-to-end CI/CD workflow, often called GitOps. The source control repository becomes the single source of truth for both application code and infrastructure.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fljoz3vquz97smpgo3ise.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fljoz3vquz97smpgo3ise.png"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The sample project includes a basic GitHub Actions workflow that validates the Terraform code on every push and pull request.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# .github/workflows/terraform-validate.yml
name: 'Terraform Validate'
on: [push, pull_request]

jobs:
  validate:
    name: 'Terraform Validate'
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: hashicorp/setup-terraform@v3
      - run: terraform fmt -check -recursive
      - run: terraform init -backend=false
      - run: terraform validate
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A complete CI/CD pipeline would extend this by running terraform plan on pull requests to preview changes and automatically running terraform apply on merge to the main branch to deploy them. This creates a flywheel where code is pushed and infrastructure is updated without manual intervention.&lt;/p&gt;

&lt;h3&gt;
  
  
  Infrastructure-as-Code is a now an AI Competency
&lt;/h3&gt;

&lt;p&gt;The main takeaway is this: mastering &lt;a href="https://cloud.google.com/docs/iac?utm_campaign=CDR_0x2b6f3004_user-journey_b450531330&amp;amp;utm_medium=external&amp;amp;utm_source=blog" rel="noopener noreferrer"&gt;Infrastructure as Code&lt;/a&gt; isn’t an optional “DevOps” skill. It’s a core competency for the modern ML engineer. For any organization serious about productionizing AI, &lt;a href="https://cloud.google.com/docs/terraform?utm_campaign=CDR_0x2b6f3004_user-journey_b450531330&amp;amp;utm_medium=external&amp;amp;utm_source=blog" rel="noopener noreferrer"&gt;Terraform on Google Cloud&lt;/a&gt; is the a key step toward building a scalable engineering culture.&lt;/p&gt;

&lt;p&gt;If you’d like to keep learning more, I recommend the step-by-step guide on using a GKE cluster with Terraform: &lt;a href="https://cloud.google.com/kubernetes-engine/docs/quickstarts/create-cluster-using-terraform?utm_campaign=CDR_0x2b6f3004_user-journey_b450531330&amp;amp;utm_medium=external&amp;amp;utm_source=blog" rel="noopener noreferrer"&gt;Quickstart: Deploy a workload with Terraform&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;From there, I’d love to hear more about your journey with AI and Cloud infrastructure. Connect on &lt;a href="https://www.linkedin.com/in/karlweinmeister/" rel="noopener noreferrer"&gt;LinkedIn&lt;/a&gt;, &lt;a href="https://x.com/kweinmeister" rel="noopener noreferrer"&gt;X&lt;/a&gt;, or &lt;a href="https://bsky.app/profile/kweinmeister.bsky.social" rel="noopener noreferrer"&gt;Bluesky&lt;/a&gt; to continue the discussion!&lt;/p&gt;

&lt;p&gt;

  &lt;iframe src="https://www.youtube.com/embed/qXsAJhIlV9E"&gt;
  &lt;/iframe&gt;


&lt;/p&gt;




</description>
      <category>vllm</category>
      <category>gke</category>
      <category>terraform</category>
      <category>ai</category>
    </item>
    <item>
      <title>A Developer’s Guide to Model Routing</title>
      <dc:creator>Karl Weinmeister</dc:creator>
      <pubDate>Mon, 25 Aug 2025 16:26:04 +0000</pubDate>
      <link>https://dev.to/kweinmeister/a-developers-guide-to-model-routing-85m</link>
      <guid>https://dev.to/kweinmeister/a-developers-guide-to-model-routing-85m</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4qdnw2dr0rvhbqfq2ntb.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4qdnw2dr0rvhbqfq2ntb.jpeg" width="800" height="436"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Not long ago, building with LLMs meant picking one general-purpose model and sticking with it. Today, the landscape is flooded with thousands of options: large and small, open and closed-source, generalist and specialist, each with unique capabilities and costs.&lt;/p&gt;

&lt;p&gt;This explosion of choice has fundamentally changed how we build AI applications. The one-size-fits-all approach is over.&lt;/p&gt;

&lt;p&gt;Instead, we architect systems that select the best model for each task. This is the idea behind model routing. This architectural pattern can be implemented today, and has the potential to change the economics of model inference. Let’s get into it!&lt;/p&gt;

&lt;h3&gt;
  
  
  Understanding Model Routing
&lt;/h3&gt;

&lt;p&gt;As a developer building with LLMs, you’re constantly juggling three competing priorities: performance, cost, and latency.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Performance (Quality):&lt;/strong&gt; For complex reasoning and creative generation, you might reach for state-of-the-art models like Google’s &lt;a href="https://cloud.google.com/vertex-ai/generative-ai/docs/models/gemini/2-5-pro?utm_campaign=CDR_0x2b6f3004_user-journey_b440933914&amp;amp;utm_medium=external&amp;amp;utm_source=blog" rel="noopener noreferrer"&gt;Gemini 2.5 Pro&lt;/a&gt;. These models deliver high-quality, accurate responses.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cost:&lt;/strong&gt; While premium models deliver state-of-the-art performance, they represent a significant investment. The key to a sustainable AI strategy is to reserve these powerful models for tasks where their advanced capabilities provide a clear return on investment. For more routine queries, smaller, highly efficient models can deliver excellent results at a fraction of the cost. Recent studies show this approach can yield &lt;a href="https://lmsys.org/blog/2024-07-01-routellm/" rel="noopener noreferrer"&gt;cost savings&lt;/a&gt; without significantly degrading performance.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Latency:&lt;/strong&gt; In interactive applications like chatbots, a fast response time is critical for a positive user experience. Smaller, specialized models can deliver near-instantaneous responses, making them ideal for real-time, conversational AI. By routing interactive queries to these faster models, you can create a more engaging and responsive application.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Relying on a single model forces an unnecessary compromise. Use a top-tier model for everything, and you pay a premium for power you don’t always need. Use a smaller model for everything, and you sacrifice quality on complex queries. So why are we still forcing ourselves to choose just one?&lt;/p&gt;

&lt;p&gt;Model routing is an architectural pattern designed to solve this optimization problem. It involves maintaining a pool of candidate LLMs and routing each incoming prompt to the most suitable model. That’s often the smallest, fastest, and most cost-effective model that can successfully complete the task.&lt;/p&gt;

&lt;h3&gt;
  
  
  Common Routing Patterns
&lt;/h3&gt;

&lt;p&gt;Implementing a model router involves choosing an architectural pattern that determines how routing decisions are made. These patterns exist on a spectrum of complexity and intelligence, from simple, predefined rules to sophisticated, AI-driven classification. We will focus on dynamic routing patterns that assess the content, intent, and complexity of the prompt to select the optimal model.&lt;/p&gt;

&lt;h3&gt;
  
  
  Rule-Based Routing
&lt;/h3&gt;

&lt;p&gt;This is the simplest form of dynamic routing. It uses hard-coded logic, typically a series of if/else statements, to make routing decisions based on simple characteristics of the prompt.&lt;/p&gt;

&lt;p&gt;The rules are based on easily measurable attributes of the prompt, such as the presence of certain keywords, its overall length, or matches against regular expressions. For instance, a system might check for specific terms to identify a task category or measure the prompt’s length to estimate its complexity.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Pros:&lt;/strong&gt; This approach is predictable, transparent, and fast to execute. It’s an excellent choice for well-defined, simple workflows where task categories can be reliably distinguished by straightforward heuristics.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cons:&lt;/strong&gt; Rule-based systems are brittle and inflexible because they lack a true understanding of language. They can be easily confused by semantic nuance, such as negation or context. The system also becomes difficult to maintain and scale as the number of rules grows.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  LLM-Based Routing
&lt;/h3&gt;

&lt;p&gt;This pattern leverages the intelligence of an LLM to perform the routing task itself. A dedicated, often smaller and faster, “router LLM” acts as a classification engine.&lt;/p&gt;

&lt;p&gt;The user’s prompt is fed into the router LLM. The router LLM is given a prompt that instructs it to analyze the query and classify it into predefined categories. To ensure the output is machine-readable, the router LLM is instructed to respond in a structured format like JSON. The application then parses this JSON output to determine which model to call next.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Pros:&lt;/strong&gt; This is a powerful and flexible approach. The router LLM can understand complex, ambiguous, and nuanced language. It can handle multi-intent queries and can be adapted to new routing tasks simply by updating its system prompt.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cons:&lt;/strong&gt; The primary drawback is significant overhead. This method introduces an additional, full LLM API call into the critical path of every request. This adds both cost and latency, which can undermine the goals of optimization the router was intended to achieve.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Semantic Routing
&lt;/h3&gt;

&lt;p&gt;Semantic routing offers a powerful compromise, combining the speed of rule-based systems with the intelligence of LLM-based approaches. It operates on the principle of semantic similarity in vector space and is the core mechanism we’ll implement.&lt;/p&gt;

&lt;p&gt;The process involves four steps. First, routes are defined, each with a name and a list of representative example phrases, or utterances. Next, a text embedding model converts all of these utterances into high-dimensional numerical vectors that capture their semantic meaning, which are then stored in an efficient index. When a new user query arrives, the same embedding model converts it into a vector. Finally, a vector similarity search is performed between the query’s vector and all the utterance vectors in the index, and the route whose utterances are most similar to the query is selected as the winner.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Pros:&lt;/strong&gt; This method is fast, with decision times often in the milliseconds, because it relies on optimized vector math rather than a slow, generative LLM call. It’s highly scalable to thousands of potential routes and is more robust than simple keyword matching because it understands meaning and context. Modern libraries often allow this configuration to be externalized into declarative files like YAML, separating the routing logic from the application code for better maintainability.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cons:&lt;/strong&gt; The effectiveness of a semantic router is highly dependent on the quality and comprehensiveness of the example utterances provided for each route. It can also struggle with contextual, multi-turn conversational queries where the user’s intent is not explicitly stated in their most recent message.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The choice of routing architecture is governed by the “Router Latency Paradox”: a component designed to reduce overall application latency must itself be exceptionally low-latency. An LLM-based router introduces a full inference step to every request, increasing both latency and cost. For this approach to be a net positive, the downstream savings must consistently outweigh its operational overhead, which is a high bar for most interactive applications. Semantic routing, in contrast, replaces this slow inference with a near-instantaneous vector search. This performance difference establishes semantic routing as the default architectural best practice for dynamic, real-time model routing. LLM-based routing is thus reserved for cases where the routing logic is too complex to be captured by semantic similarity alone and the added latency is an acceptable trade-off.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Gemini 2.5 Model Family
&lt;/h3&gt;

&lt;p&gt;To build an effective router, you need a solid grasp of the candidate models in your pool. For our implementation, we’ll use Google’s Gemini 2.5 family, a suite of models with a tiered structure of capability and cost that’s perfect for a routing architecture.&lt;/p&gt;

&lt;p&gt;A key innovation across the Gemini 2.5 family is their capability as “thinking models.” This means they can be configured to perform internal reasoning steps, akin to a chain of thought, before generating a final response. This feature, controllable via an API parameter known as the “thinking budget,” can significantly improve performance and accuracy on complex tasks. This controllable reasoning becomes another powerful dimension for our routing logic to consider.&lt;/p&gt;

&lt;h3&gt;
  
  
  Gemini 2.5 Pro
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Capabilities:&lt;/strong&gt; &lt;a href="https://cloud.google.com/vertex-ai/generative-ai/docs/models/gemini/2-5-pro?utm_campaign=CDR_0x2b6f3004_user-journey_b440933914&amp;amp;utm_medium=external&amp;amp;utm_source=blog" rel="noopener noreferrer"&gt;Gemini 2.5 Pro&lt;/a&gt; is Google’s flagship model, engineered for maximum performance and state-of-the-art accuracy. It’s optimized for the most complex and demanding tasks, including deep logical reasoning, advanced code generation, and sophisticated multimodal understanding across text, images, audio, and video.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Router Use Case:&lt;/strong&gt; This is our designated &lt;strong&gt;“strong” model&lt;/strong&gt;. We’ll route only the most challenging queries here: prompts that involve complex problem-solving, novel algorithm design, in-depth analysis of dense technical documents, or multi-step logical puzzles.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Thinking:&lt;/strong&gt; For this model, the “thinking” capability is on by default, as it’s integral to its high-end performance.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Gemini 2.5 Flash
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Capabilities:&lt;/strong&gt; &lt;a href="https://cloud.google.com/vertex-ai/generative-ai/docs/models/gemini/2-5-flash?utm_campaign=CDR_0x2b6f3004_user-journey_b440933914&amp;amp;utm_medium=external&amp;amp;utm_source=blog" rel="noopener noreferrer"&gt;Gemini 2.5 Flash&lt;/a&gt; is designed to be the best model in the family in terms of its price-to-performance ratio. It offers well-rounded, powerful capabilities that approach those of Pro but at a significantly lower operational cost. It also features a controllable thinking budget.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Router Use Case:&lt;/strong&gt; This is our &lt;strong&gt;“default” or “go-to” model&lt;/strong&gt;. It’s the workhorse that will handle the majority of general-purpose queries. These are tasks that are more complex than simple classification but don’t require the full power (and expense) of Pro. Ideal use cases include general conversation, creative writing, drafting emails, and performing detailed summarizations.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Gemini 2.5 Flash-Lite
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Capabilities:&lt;/strong&gt; As its name suggests, &lt;a href="https://cloud.google.com/vertex-ai/generative-ai/docs/models/gemini/2-5-flash-lite?utm_campaign=CDR_0x2b6f3004_user-journey_b440933914&amp;amp;utm_medium=external&amp;amp;utm_source=blog" rel="noopener noreferrer"&gt;Gemini 2.5 Flash-Lite&lt;/a&gt; is the fastest and most cost-efficient model in the 2.5 family. It’s highly optimized for low latency and high-throughput scenarios, making it a cost-effective upgrade from previous generations of Flash models.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Router Use Case:&lt;/strong&gt; This is our &lt;strong&gt;fastest model&lt;/strong&gt;. We’ll route simple, high-volume, and latency-sensitive tasks here. It’s perfect for text classification (e.g., sentiment analysis), simple data extraction (e.g., pulling names and dates from text), translation, and answering straightforward factual questions.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Thinking:&lt;/strong&gt; To maximize its speed and cost-efficiency, “thinking” is turned off by default for Flash-Lite. However, it can be optionally enabled, providing granular control for tasks that might need a small boost in reasoning without escalating to the full Flash model.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Implementing a Semantic Router
&lt;/h3&gt;

&lt;p&gt;With the theory covered, let’s get to the code. This section walks through the &lt;a href="https://github.com/kweinmeister/gemini-model-router" rel="noopener noreferrer"&gt;gemini-model-router&lt;/a&gt; project, which builds a semantic router to intelligently distribute queries among the Gemini 2.5 Pro, Flash, and Flash-Lite models. It uses the open-source &lt;a href="https://github.com/aurelio-labs/semantic-router" rel="noopener noreferrer"&gt;semantic-router&lt;/a&gt; library as its engine and serves it all up with &lt;a href="https://fastapi.tiangolo.com/" rel="noopener noreferrer"&gt;FastAPI&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flywxo6ybkf9c2d7czg6b.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flywxo6ybkf9c2d7czg6b.png" width="800" height="596"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Embeddings are created upfront for each route, and then matched to queries at runtime&lt;/em&gt;&lt;/p&gt;
&lt;h3&gt;
  
  
  Project Setup
&lt;/h3&gt;

&lt;p&gt;To get started, clone the repository and follow the setup instructions in the &lt;a href="http://readme.md" rel="noopener noreferrer"&gt;README.md&lt;/a&gt; file, which covers creating the .env file and installing the required dependencies from requirements.txt.&lt;/p&gt;
&lt;h3&gt;
  
  
  Centralizing Configuration
&lt;/h3&gt;

&lt;p&gt;A key architectural decision in the gemini-model-router project is the separation of configuration from code. All routing logic, including the routes, their representative utterances, and the specific LLM assigned to each route, is defined in a single &lt;a href="https://github.com/kweinmeister/gemini-model-router/blob/main/router.yaml" rel="noopener noreferrer"&gt;router.yaml&lt;/a&gt; file. This makes the system highly maintainable and easy to modify without changing the application’s Python code.&lt;/p&gt;

&lt;p&gt;The router.yaml file has two main sections:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;encoder&lt;/strong&gt; : Specifies the embedding model to use for converting text to vectors. In this case, it uses Google’s &lt;a href="https://console.cloud.google.com/vertex-ai/publishers/google/model-garden/gemini-embedding-001?utm_campaign=CDR_0x2b6f3004_user-journey_b440933914&amp;amp;utm_medium=external&amp;amp;utm_source=blog" rel="noopener noreferrer"&gt;gemini-embedding-001&lt;/a&gt; via the semantic-router’s &lt;a href="https://docs.aurelio.ai/semantic-router/client-reference/encoders/google" rel="noopener noreferrer"&gt;GoogleEncoder&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;routes&lt;/strong&gt; : A list of route definitions. Each route has:&lt;/li&gt;
&lt;/ol&gt;

&lt;ul&gt;
&lt;li&gt;name: A unique identifier that maps directly to a Gemini model.&lt;/li&gt;
&lt;li&gt;description: A human-readable explanation of the route’s purpose.&lt;/li&gt;
&lt;li&gt;utterances: A list of example phrases that define the semantic space of the route.&lt;/li&gt;
&lt;li&gt;llm: An object specifying the custom class (GoogleLLM), the Python module where it’s defined (main), and the target model ID (e.g., gemini-2.5-pro).&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Here is a snippet from the router.yaml file, defining the route for complex queries. A key parameter in the full configuration is the score_threshold. When the router compares a query to its routes, it calculates a similarity score. By setting the threshold to 0.0, we ensure that the router always selects the route with the highest similarity, effectively guaranteeing that a decision is always made.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# router.yaml
encoder_name: gemini-embedding-001
encoder_type: google
routes:
- name: gemini-2.5-pro
  description: For complex, multi-step tasks requiring deep reasoning, code generation, and analysis of large documents.
  utterances:
  - Develop a comprehensive, multi-year business plan for a direct-to-consumer sustainable
    fashion brand, including financial projections and marketing strategies.
  - Write a Python script to perform sentiment analysis on a large CSV of customer
    reviews, generate visualizations, and create a summary report.
  - Compare and contrast the philosophical implications of determinism and free will
    in the context of advanced artificial intelligence, citing relevant academic sources.
  llm:
    module: main
    class: GoogleLLM
    model: gemini-2.5-pro
#... other routes for flash and flash-lite follow...
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Routing Logic
&lt;/h3&gt;

&lt;p&gt;The main.py file contains the FastAPI application that serves the router. It includes several key components that work together to bring the YAML configuration to life.&lt;/p&gt;

&lt;h4&gt;
  
  
  The GoogleLLM Wrapper
&lt;/h4&gt;

&lt;p&gt;The semantic-router library requires a compatible LLM object for each route. To integrate with Google’s GenAI SDK, the project defines a custom GoogleLLM class that inherits from &lt;a href="https://docs.aurelio.ai/semantic-router/client-reference/llms/base" rel="noopener noreferrer"&gt;semantic_router.llms.BaseLLM&lt;/a&gt;. This class acts as a bridge, translating the semantic-router’s call signature into an asynchronous request to the Vertex AI Gemini API.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# main.py (simplified)
from semantic_router.llms import BaseLLM
from google import genai

class GoogleLLM(BaseLLM):
    _client: ClassVar[Optional[genai.Client]] = None

    @classmethod
    def get_client(cls) -&amp;gt; genai.Client:
        if cls._client is None:
            project_id = os.getenv("GOOGLE_CLOUD_PROJECT")
            cls._client = genai.Client(vertexai=True, project=project_id)
        return cls._client

    async def __acall__ (self, messages: List[Message], **kwargs) -&amp;gt; Optional[str]:
        contents = kwargs.get("multimodal_contents", messages[0].content)
        config = kwargs.get("config", self.kwargs.get("config", {}))

        response = await self.get_client().aio.models.generate_content(
            model=self.name,
            contents=contents,
            **config,
        )
        return response.text if response else ""
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  The /query Endpoint
&lt;/h4&gt;

&lt;p&gt;The main API endpoint uses a series of helper functions to route and execute the query. The handle_query function orchestrates the process: it extracts text for routing, determines the best route, and executes the LLM call.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# main.py (simplified)
@app.post("/query", response_model=RouterResponse)
async def handle_query(request: QueryRequest, fastapi_request: Request):
    router = fastapi_request.app.state.router
    default_route = fastapi_request.app.state.default_route_name

    # 1. Extract text and determine the route
    text_for_routing = _get_text_for_routing(request.contents)
    route_choice = _determine_route(router, text_for_routing, default_route)
    chosen_route = router.get(route_choice.name)

    # 2. Execute the call using the LLM from the chosen route
    model_response = await _execute_llm_call(
        chosen_route, request.contents, request.config, text_for_routing
    )

    return RouterResponse(
        route_name=chosen_route.name, model_response=model_response
    )
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Deploying to Production
&lt;/h3&gt;

&lt;p&gt;While FastAPI’s web server &lt;a href="https://www.uvicorn.org/" rel="noopener noreferrer"&gt;uvicorn&lt;/a&gt; is perfect for local development, a production deployment requires a robust, scalable hosting environment. &lt;a href="https://cloud.google.com/run?utm_campaign=CDR_0x2b6f3004_user-journey_b440933914&amp;amp;utm_medium=external&amp;amp;utm_source=blog" rel="noopener noreferrer"&gt;Cloud Run&lt;/a&gt; is an ideal choice for this service because it’s a fully managed, serverless platform that takes your containerized application (including the Uvicorn server) and handles all the underlying infrastructure, scaling, and request management.&lt;/p&gt;

&lt;p&gt;To deploy the router, you first need to have the Google Cloud SDK installed and configured. Then, you can deploy the service with a single command:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;gcloud run deploy gemini-model-router \
  --source . \
  --region us-central1
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This command builds a container from your source code, pushes it to the Artifact Registry, and deploys it as a public-facing service. Cloud Run handles all the infrastructure, so you can focus on the application logic.&lt;/p&gt;

&lt;h3&gt;
  
  
  Production Best Practices
&lt;/h3&gt;

&lt;p&gt;Deploying a model router to production requires building an observable and resilient system. An API management platform like Google Cloud’s &lt;a href="https://cloud.google.com/apigee/api-management?utm_campaign=CDR_0x2b6f3004_user-journey_b440933914&amp;amp;utm_medium=external&amp;amp;utm_source=blog" rel="noopener noreferrer"&gt;Apigee&lt;/a&gt; can serve as a unified and secure gateway to your model routing service. It can provide essential capabilities like enforcing security policies, managing traffic with rate limiting and quotas, and offering deep visibility through analytics and monitoring. Let’s review the key principles needed to move beyond a proof-of-concept.&lt;/p&gt;

&lt;p&gt;First, treat the router as a mission-critical, standalone service. Because it can be a single point of failure and a performance bottleneck, it must be independently scalable and fault-tolerant. Containerize the router and deploy it on a platform like &lt;a href="https://cloud.google.com/run?utm_campaign=CDR_0x2b6f3004_user-journey_b440933914&amp;amp;utm_medium=external&amp;amp;utm_source=blog" rel="noopener noreferrer"&gt;Cloud Run&lt;/a&gt; to ensure high availability, allowing it to scale independently of the applications that consume it.&lt;/p&gt;

&lt;p&gt;Second, you cannot optimize what you cannot measure. Implement comprehensive logging and monitoring for every routing decision. For each request, log the chosen route, similarity score, final model, latency, and estimated cost. This data can be fed into &lt;a href="https://cloud.google.com/observability?utm_campaign=CDR_0x2b6f3004_user-journey_b440933914&amp;amp;utm_medium=external&amp;amp;utm_source=blog" rel="noopener noreferrer"&gt;Google Cloud’s observability suite&lt;/a&gt; to create dashboards for tracking key performance indicators like route distribution, cost per query, and P99 latency. This allows you to set up alerts for anomalies, such as a sudden shift in routing patterns or an increase in fallback rates.&lt;/p&gt;

&lt;p&gt;Third, the initial configuration is just a starting point. True optimization requires a data-driven feedback loop. Collect and review production queries to identify misrouted requests, and use this analysis to refine your route utterances. A/B testing frameworks are invaluable for comparing different routing strategies or model configurations in a live environment to validate improvements.&lt;/p&gt;

&lt;p&gt;Finally, enterprise-grade reliability requires planning for failure. Implement a chain of fallbacks that goes beyond a simple default route. For instance, if a request to gemini-2.5-pro fails, the system should automatically retry with exponential backoff. If that also fails, it should fall back to the next best model, gemini-2.5-flash.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Future of Model Routing
&lt;/h3&gt;

&lt;p&gt;There is a broader trend towards more modular and dynamic AI architectures, and model routing is no exception. The future of model routing could include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Multimodal Routing:&lt;/strong&gt; The next logical step is routing on more than just text. The current router simplifies the problem by extracting the text from a multimodal prompt, but the concept of vector similarity works for any modality you can embed.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Hierarchical Routing:&lt;/strong&gt; The concept of system-level model routing is a macro-scale analog of what &lt;a href="https://huggingface.co/blog/moe" rel="noopener noreferrer"&gt;Mixture-of-Experts&lt;/a&gt; or MoE architectures do within a single neural network. In an MoE model, an internal “router” network dynamically selects which “expert” sub-networks should process each token of an input sequence. Our external router does the same thing, but its “experts” are entire, independent LLMs. Future systems may employ hierarchical routing, where a top-level semantic router first selects the best specialized MoE model for a task, which then performs its own fine-grained, internal routing to process the request.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Ultimately, model routing is a foundational building block for the next generation of complex, multi-agent AI systems. As we’ve shown, the combination of a powerful model family like &lt;a href="https://cloud.google.com/vertex-ai/generative-ai/docs/models?utm_campaign=CDR_0x2b6f3004_user-journey_b440933914&amp;amp;utm_medium=external&amp;amp;utm_source=blog" rel="noopener noreferrer"&gt;Google’s Gemini 2.5&lt;/a&gt;, a serverless platform like &lt;a href="https://cloud.google.com/run?utm_campaign=CDR_0x2b6f3004_user-journey_b440933914&amp;amp;utm_medium=external&amp;amp;utm_source=blog" rel="noopener noreferrer"&gt;Cloud Run&lt;/a&gt;, and the open-source &lt;a href="https://github.com/kweinmeister/gemini-model-router" rel="noopener noreferrer"&gt;gemini-model-router&lt;/a&gt; project makes this advanced architecture an achievable engineering task. The tools are here. The patterns are clear.&lt;/p&gt;

&lt;p&gt;It’s time to start building. Share what you’ve built with me on &lt;a href="https://www.linkedin.com/in/karlweinmeister/" rel="noopener noreferrer"&gt;LinkedIn&lt;/a&gt;, &lt;a href="https://x.com/kweinmeister" rel="noopener noreferrer"&gt;X&lt;/a&gt;, or &lt;a href="https://bsky.app/profile/kweinmeister.bsky.social" rel="noopener noreferrer"&gt;Bluesky&lt;/a&gt;!&lt;/p&gt;




</description>
      <category>largelanguagemodels</category>
      <category>googlecloudrun</category>
      <category>routing</category>
      <category>gemini</category>
    </item>
    <item>
      <title>Mastering Agentic Development with Gemini and Roo Code</title>
      <dc:creator>Karl Weinmeister</dc:creator>
      <pubDate>Sun, 20 Jul 2025 04:56:25 +0000</pubDate>
      <link>https://dev.to/kweinmeister/mastering-agentic-development-with-gemini-and-roo-code-4j64</link>
      <guid>https://dev.to/kweinmeister/mastering-agentic-development-with-gemini-and-roo-code-4j64</guid>
      <description>&lt;p&gt;The conversation around AI in software development has matured beyond the “AI as a chatbot” and into sophisticated AI agents. We’re moving toward building a living blueprint that can reason about your code in its entirety and evolve with it over time.&lt;/p&gt;

&lt;p&gt;For developers who want a powerful, all-in-one AI experience, Google’s &lt;a href="https://cloud.google.com/gemini/docs/codeassist/overview?utm_campaign=CDR_0x2b6f3004_user-journey_b431570178&amp;amp;utm_medium=external&amp;amp;utm_source=blog" rel="noopener noreferrer"&gt;Gemini Code Assist&lt;/a&gt; is a fantastic solution that provides a seamless, out-of-the-box experience, bringing the power of Gemini directly into your workflow.&lt;/p&gt;

&lt;p&gt;For those who love to assemble best-in-class technologies from the open ecosystem, this article is for you. We will explore a production-ready stack for those who want a customized and self-hosted solution. This stack combines the &lt;a href="https://roocode.com/" rel="noopener noreferrer"&gt;Roo Code&lt;/a&gt; VS Code extension, powered by Google’s underlying &lt;a href="https://ai.google.dev/?utm_campaign=CDR_0x2b6f3004_default_b431570178&amp;amp;utm_medium=external&amp;amp;utm_source=blog" rel="noopener noreferrer"&gt;Gemini models&lt;/a&gt;, and takes it to the next level with a self-hosted &lt;a href="https://qdrant.tech/" rel="noopener noreferrer"&gt;Qdrant&lt;/a&gt; vector database on &lt;a href="https://cloud.google.com/kubernetes-engine?utm_campaign=CDR_0x2b6f3004_user-journey_b431570178&amp;amp;utm_medium=external&amp;amp;utm_source=blog" rel="noopener noreferrer"&gt;Google Kubernetes Engine&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F944%2F0%2AT6PRKm-ZJFVlSYr6" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F944%2F0%2AT6PRKm-ZJFVlSYr6"&gt;&lt;/a&gt;&lt;/p&gt;
Solution architecture for agentic development with Roo Code, Gemini, and Qdrant



&lt;h3&gt;
  
  
  Solution Components
&lt;/h3&gt;

&lt;p&gt;Roo Code is a VS Code extension that can be thought of as an “AI Dev Team” with modes ranging from Architect to Debug. You can give it a high-level task, like “refactor this module to use the new logging service,” and it will create a plan, identify the necessary code changes, and execute them across multiple files. For a deeper dive, check out the &lt;a href="https://docs.roocode.com/" rel="noopener noreferrer"&gt;Roo Code documentation&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F419%2F0%2ALlT1pB1La4zfEaTX" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F419%2F0%2ALlT1pB1La4zfEaTX"&gt;&lt;/a&gt;&lt;/p&gt;
Using Roo Code to update a project README based on the current codebase



&lt;p&gt;You can take full advantage of Roo Code’s capabilities with the massive context window available in Gemini models. This allows Roo Code to hold a vast amount of code in its “short-term memory,” enabling it to understand the intricate relationships between files and modules and to generate code that is consistent with the entire project. You can learn more about the Gemini API in the &lt;a href="https://ai.google.dev/gemini-api/docs?utm_campaign=CDR_0x2b6f3004_default_b431570178&amp;amp;utm_medium=external&amp;amp;utm_source=blog" rel="noopener noreferrer"&gt;official documentation&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;To make the use of this large context window efficient, Roo Code leverages prompt caching, a feature &lt;a href="https://x.com/roo_code/status/1915590059291811873" rel="noopener noreferrer"&gt;now available&lt;/a&gt; in Gemini models. When Roo Code sends the initial instructions and context to the model, Gemini generates an internal representation and returns a cache reference. On subsequent requests, Roo Code can send this cache reference instead of the full prompt, dramatically reducing token usage and improving latency, which is a key feature for making the system both cost-effective and performant.&lt;/p&gt;

&lt;p&gt;For codebase indexing, Roo Code supports Gemini’s gemini-embedding-001 state-of-the-art &lt;a href="https://deepmind.google/research/publications/157741/" rel="noopener noreferrer"&gt;embedding model&lt;/a&gt;. This is crucial for the accuracy of the semantic search, and you can find more information on Gemini’s &lt;a href="https://ai.google.dev/gemini-api/docs/embeddings?utm_campaign=CDR_0x2b6f3004_default_b431570178&amp;amp;utm_medium=external&amp;amp;utm_source=blog" rel="noopener noreferrer"&gt;embedding models here&lt;/a&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Using Gemini Models in Roo Code
&lt;/h3&gt;

&lt;p&gt;The connection between Roo Code and a model is what enables its agentic capabilities: planning, executing commands, and writing code across your entire project. You can connect to Gemini’s models through the Gemini API or through Google Cloud’s Vertex AI.&lt;/p&gt;

&lt;p&gt;To use the Gemini API, you simply create an API key in &lt;a href="https://aistudio.google.com/" rel="noopener noreferrer"&gt;&lt;strong&gt;Google AI Studio&lt;/strong&gt;&lt;/a&gt;, then in Roo Code’s settings, select the &lt;strong&gt;Google Gemini&lt;/strong&gt; provider, paste your key, and choose a model. For detailed, step-by-step instructions on this process, refer to the &lt;a href="https://docs.roocode.com/providers/gemini" rel="noopener noreferrer"&gt;Roo Code documentation for the Gemini provider&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;For teams and enterprises using Google Cloud, connecting via &lt;strong&gt;Vertex AI&lt;/strong&gt; provides unified billing, IAM permissions, and more. You will create a service account with the “Vertex AI User” role in the Google Cloud Console and download its JSON key file. Within Roo Code’s settings, select the &lt;strong&gt;GCP Vertex AI&lt;/strong&gt; provider, provide the credentials from your JSON key, and enter your Project ID and Region. The &lt;a href="https://docs.roocode.com/providers/vertex" rel="noopener noreferrer"&gt;Roo Code documentation for Vertex AI&lt;/a&gt; provides a complete walkthrough of this setup.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F365%2F0%2A-vyV1YXVRzrlsUcU" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F365%2F0%2A-vyV1YXVRzrlsUcU"&gt;&lt;/a&gt;&lt;/p&gt;
The Vertex AI LLM provider for Gemini in Roo Code



&lt;p&gt;For both connection methods, we recommend starting with &lt;strong&gt;gemini-2.5-pro&lt;/strong&gt; for the best experience. Its powerful reasoning capabilities and large context window are ideal for complex, multi-step tasks. For faster, more cost-effective use, &lt;strong&gt;gemini-2.5-flash&lt;/strong&gt; is an excellent alternative.&lt;/p&gt;

&lt;p&gt;With Roo Code’s reasoning engine now powered by Gemini, the next step is to give it a persistent, long-term memory of your code.&lt;/p&gt;

&lt;h3&gt;
  
  
  Codebase Indexing
&lt;/h3&gt;

&lt;p&gt;Codebase indexing creates a semantic “long-term memory” of your code that the agent can access at any time. This is a multi-stage process that transforms your source code into a searchable knowledge base.&lt;/p&gt;

&lt;h4&gt;
  
  
  Intelligent Chunking
&lt;/h4&gt;

&lt;p&gt;First, Roo Code uses &lt;a href="https://tree-sitter.github.io/tree-sitter/" rel="noopener noreferrer"&gt;Tree-sitter&lt;/a&gt; to parse your code into an Abstract Syntax Tree (AST). This gives it a deep, structural understanding of your code, just like a compiler does. Instead of arbitrarily splitting a file every few hundred lines, the AST is used to intelligently chunk the code into complete, semantic blocks.&lt;/p&gt;

&lt;p&gt;This “semantic chunking” means the pieces of code being indexed are meaningful and self-contained units, such as:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A complete function or method.&lt;/li&gt;
&lt;li&gt;An entire class or struct definition.&lt;/li&gt;
&lt;li&gt;A specific configuration block.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This ensures that the context isn’t lost by splitting a function in half. For unsupported languages, Roo Code falls back to line-based chunking.&lt;/p&gt;

&lt;h3&gt;
  
  
  Generating Embeddings
&lt;/h3&gt;

&lt;p&gt;Once the code is broken down into these intelligent chunks, the next step is to capture their semantic meaning in a way a machine can understand. This is where Gemini’s gemini-embedding-001 model comes in.&lt;/p&gt;

&lt;p&gt;Each semantic chunk produced by Tree-sitter is fed into the embedding model, which outputs a high-dimensional numerical vector. This vector is the &lt;strong&gt;embedding&lt;/strong&gt;  — a mathematical representation of the code’s meaning. The Gemini embedding model captures fine details with 3072 dimensions in every embedding. For a deeper dive into &lt;a href="https://arxiv.org/pdf/2205.13147" rel="noopener noreferrer"&gt;Matryoshka Representation Learning&lt;/a&gt;, a technique used to train the model, see this video:&lt;/p&gt;

&lt;p&gt;  &lt;iframe src="https://www.youtube.com/embed/VQosEgOw84s"&gt;
  &lt;/iframe&gt;
&lt;/p&gt;

&lt;h3&gt;
  
  
  Storing and Searching Embeddings
&lt;/h3&gt;

&lt;p&gt;With the codebase converted into a collection of semantically-rich embeddings, they need a place to be stored and searched efficiently. Roo Code uses Qdrant, a high-performance vector database, for this purpose.&lt;/p&gt;

&lt;p&gt;When you ask a question, Roo Code’s search tool follows this process:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Query:&lt;/strong&gt; Your natural language query (e.g., “where is our user authentication logic?”) is sent to the Gemini embedding model.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Vectorize:&lt;/strong&gt; The model converts your query into an embedding vector.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Search:&lt;/strong&gt; Roo Code performs a vector search in the Qdrant database, looking for the code chunk embeddings that are most similar (i.e., closest in vector space) to your query’s embedding.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Retrieve:&lt;/strong&gt; The tool then returns the most relevant code snippets, along with their file paths and similarity scores.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Roo Code also provides a user-friendly interface for configuring the codebase indexer. You can easily select your embedding provider, enter your API keys, and specify the Qdrant URL. The advanced configuration options allow you to fine-tune the search behavior by adjusting the Search Score Threshold and Maximum Search Results. You can also specify which files to ignore by adding patterns to a .rooignore file.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F413%2F0%2AgE-jXByfgTOYcx-8" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F413%2F0%2AgE-jXByfgTOYcx-8"&gt;&lt;/a&gt;&lt;/p&gt;
Indexing a codebase in Roo Code



&lt;h3&gt;
  
  
  From Local to Centralized Indexing
&lt;/h3&gt;

&lt;p&gt;The easiest way to get started is with a local Qdrant instance. As the official &lt;a href="https://qdrant.tech/documentation/quickstart/" rel="noopener noreferrer"&gt;Qdrant Quickstart&lt;/a&gt; shows, you can be up and running in minutes with a single Docker command:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;docker run -p 6333:6333 -v "$(pwd)/qdrant_storage:/qdrant/storage:z" qdrant/qdrant
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For an individual developer, this is a fantastic way to get all the benefits of codebase indexing without any external dependencies.&lt;/p&gt;

&lt;p&gt;As your team grows, managing dozens of individual Docker instances can become cumbersome. This is where a centralized Qdrant instance provides value — not as a single, conflict-prone shared index, but as a managed, cost-effective platform to host a &lt;em&gt;fleet&lt;/em&gt; of personal indexes.&lt;/p&gt;

&lt;p&gt;Google Kubernetes Engine, or GKE, is an excellent choice for this, offering high availability and enterprise-grade security. The principle is the same regardless of the platform: provide a robust, central service to host many isolated environments. You can deploy the infrastructure within minutes using the &lt;a href="https://cloud.google.com/kubernetes-engine/docs/tutorials/deploy-qdrant?utm_campaign=CDR_0x2b6f3004_user-journey_b431570178&amp;amp;utm_medium=external&amp;amp;utm_source=blog" rel="noopener noreferrer"&gt;GKE tutorial for deploying Qdrant&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Using the instructions in the tutorial, you can easily access it from your local system using &lt;a href="https://kubernetes.io/docs/tasks/access-application-cluster/port-forward-access-application-cluster/" rel="noopener noreferrer"&gt;port forwarding&lt;/a&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;PROJECT_ID="your-project-id"
REGION="us-central1"

gcloud container clusters get-credentials qdrant-cluster --region "$REGION" --project "$PROJECT_ID"

kubectl port-forward service/qdrant 6333:6333
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Roo Code generates a unique Qdrant collection name by hashing the absolute local workspace path. This means that even when using a central Qdrant instance, each developer’s index is completely isolated. To avoid conflicts, each developer needs to ensure they are using a different path:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Developer A: /Users/alice/projects/my-app&lt;/li&gt;
&lt;li&gt;Developer B: /Users/bob/projects/my-app&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Conclusion
&lt;/h3&gt;

&lt;p&gt;The future of AI-assisted development is about choice. Whether you prefer a powerful, all-in-one solution like Google’s &lt;a href="https://cloud.google.com/gemini/docs/codeassist/overview?utm_campaign=CDR_0x2b6f3004_user-journey_b431570178&amp;amp;utm_medium=external&amp;amp;utm_source=blog" rel="noopener noreferrer"&gt;Gemini Code Assist&lt;/a&gt; for a seamless, integrated experience, or the composable stack detailed in this article, the goal is the same: to create a truly intelligent development environment.&lt;/p&gt;

&lt;p&gt;What will you build with Gemini and Roo Code? Feel free to continue the discussion on &lt;a href="https://www.linkedin.com/in/karlweinmeister/" rel="noopener noreferrer"&gt;LinkedIn&lt;/a&gt;, &lt;a href="https://x.com/kweinmeister?ref_src=twsrc%5Egoogle%7Ctwcamp%5Eserp%7Ctwgr%5Eauthor" rel="noopener noreferrer"&gt;X&lt;/a&gt;, and &lt;a href="https://bsky.app/profile/kweinmeister.bsky.social" rel="noopener noreferrer"&gt;Bluesky&lt;/a&gt;.&lt;/p&gt;




</description>
      <category>roocode</category>
      <category>googlegemini</category>
      <category>embedding</category>
      <category>aicodingassistant</category>
    </item>
    <item>
      <title>Getting started with Rust on Google Cloud</title>
      <dc:creator>Karl Weinmeister</dc:creator>
      <pubDate>Thu, 27 Mar 2025 04:29:49 +0000</pubDate>
      <link>https://dev.to/googlecloud/getting-started-with-rust-on-google-cloud-4hln</link>
      <guid>https://dev.to/googlecloud/getting-started-with-rust-on-google-cloud-4hln</guid>
      <description>&lt;p&gt;This post will guide you through deploying a simple “Hello, World!” application on &lt;a href="https://cloud.google.com/run?utm_campaign=CDR_0x2b6f3004_default_b403810548&amp;amp;utm_medium=external&amp;amp;utm_source=blog" rel="noopener noreferrer"&gt;Cloud Run&lt;/a&gt;. You’ll then extend the application by showing how to integrate with Google Cloud services with experimental &lt;a href="https://github.com/googleapis/google-cloud-rust" rel="noopener noreferrer"&gt;Rust client libraries&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;I’ll cover the necessary code, Dockerfile configuration, and deployment steps. I’ll also recommend a robust and scalable stack for building web services, especially when combined with Google Cloud’s serverless platform, Cloud Run.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F1024%2F0%2AX_eDJ5lRKkKc64Ut" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F1024%2F0%2AX_eDJ5lRKkKc64Ut" width="800" height="436"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Why Rust and Axum?
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://www.rust-lang.org/" rel="noopener noreferrer"&gt;Rust&lt;/a&gt; has gained significant traction in backend development, earning the title of &lt;a href="https://survey.stackoverflow.co/2024/technology#2-programming-scripting-and-markup-languages" rel="noopener noreferrer"&gt;most-admired language&lt;/a&gt; in the StackOverflow 2024 Developer Survey. This popularity stems from its core strengths: performance, memory safety, and reliability. Rust’s low-level control and zero-cost abstractions enable &lt;a href="https://nnethercote.github.io/perf-book/title-page.html" rel="noopener noreferrer"&gt;highly performant&lt;/a&gt; applications. Its &lt;a href="https://doc.rust-lang.org/book/ch04-00-understanding-ownership.html" rel="noopener noreferrer"&gt;ownership system&lt;/a&gt; prevents common programming errors like data races and null pointer dereferences. In addition, Rust’s strong &lt;a href="https://doc.rust-lang.org/reference/type-system.html" rel="noopener noreferrer"&gt;type system&lt;/a&gt; and compile-time checks catch errors early in the development process, leading to more reliable software.&lt;/p&gt;

&lt;p&gt;The Rust web framework ecosystem is vibrant and evolving. Popular choices include &lt;a href="https://github.com/tokio-rs/axum" rel="noopener noreferrer"&gt;Axum&lt;/a&gt;, &lt;a href="https://rocket.rs/" rel="noopener noreferrer"&gt;Rocket&lt;/a&gt;, and &lt;a href="https://github.com/actix/actix-web" rel="noopener noreferrer"&gt;Actix&lt;/a&gt;. In this post, I’ll showcase &lt;a href="https://github.com/tokio-rs/axum" rel="noopener noreferrer"&gt;Axum&lt;/a&gt;, but you can apply what you’ve learned here to other Rust web frameworks. Axum’s API is clear and composable, making it easy to build web services. Its modular architecture allows developers to select only the necessary components. Axum is built on &lt;a href="https://tokio.rs/" rel="noopener noreferrer"&gt;Tokio&lt;/a&gt;, a popular asynchronous runtime for Rust, which allows it to handle concurrency and I/O operations efficiently.&lt;/p&gt;

&lt;h3&gt;
  
  
  Hello World Application
&lt;/h3&gt;

&lt;p&gt;Let’s start by exploring a basic “Hello, World!” &lt;a href="https://github.com/tokio-rs/axum/tree/main/examples/hello-world" rel="noopener noreferrer"&gt;example&lt;/a&gt; from the official Axum repository. In each section of this blog post, you will enhance the example to leverage Google Cloud capabilities. You can access the final code sample in the &lt;a href="https://github.com/kweinmeister/cloud-rust-example" rel="noopener noreferrer"&gt;cloud-rust-example&lt;/a&gt; repository.&lt;/p&gt;

&lt;p&gt;First, the &lt;a href="https://github.com/tokio-rs/axum/blob/main/examples/hello-world/Cargo.toml" rel="noopener noreferrer"&gt;Cargo.toml&lt;/a&gt; manifest file defines the project’s metadata and dependencies:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight toml"&gt;&lt;code&gt;&lt;span class="nn"&gt;[package]&lt;/span&gt;
&lt;span class="py"&gt;name&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"example-hello-world"&lt;/span&gt;
&lt;span class="py"&gt;version&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"0.1.0"&lt;/span&gt;
&lt;span class="py"&gt;edition&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"2021"&lt;/span&gt;
&lt;span class="py"&gt;publish&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;

&lt;span class="nn"&gt;[dependencies]&lt;/span&gt;
&lt;span class="py"&gt;axum&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="py"&gt;path&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"../../axum"&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="py"&gt;tokio&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="py"&gt;version&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"1.0"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="py"&gt;features&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;"full"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Within this file, you see:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;[package]&lt;/code&gt;: Contains basic project information like name, version, and the Rust edition. &lt;code&gt;publish = false&lt;/code&gt; prevents accidental publication.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;[dependencies]&lt;/code&gt;: Lists the project’s dependencies — &lt;code&gt;axum&lt;/code&gt; for the web framework and &lt;code&gt;tokio&lt;/code&gt; for asynchronous capabilities.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Now, let’s examine the core application code, &lt;a href="https://github.com/tokio-rs/axum/blob/main/examples/hello-world/src/main.rs" rel="noopener noreferrer"&gt;src/main.rs&lt;/a&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight rust"&gt;&lt;code&gt;&lt;span class="k"&gt;use&lt;/span&gt; &lt;span class="nn"&gt;axum&lt;/span&gt;&lt;span class="p"&gt;::{&lt;/span&gt;&lt;span class="nn"&gt;response&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="n"&gt;Html&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nn"&gt;routing&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="n"&gt;get&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Router&lt;/span&gt;&lt;span class="p"&gt;};&lt;/span&gt;

&lt;span class="nd"&gt;#[tokio::main]&lt;/span&gt;
&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;fn&lt;/span&gt; &lt;span class="nf"&gt;main&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="c1"&gt;// build our application with a route&lt;/span&gt;
    &lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="n"&gt;app&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nn"&gt;Router&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;new&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="nf"&gt;.route&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"/"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;handler&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;

    &lt;span class="c1"&gt;// run it&lt;/span&gt;
    &lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="n"&gt;listener&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nn"&gt;tokio&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nn"&gt;net&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nn"&gt;TcpListener&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;bind&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"127.0.0.1:3000"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;.await&lt;/span&gt;
        &lt;span class="nf"&gt;.unwrap&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
    &lt;span class="nd"&gt;println!&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"listening on {}"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;listener&lt;/span&gt;&lt;span class="nf"&gt;.local_addr&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="nf"&gt;.unwrap&lt;/span&gt;&lt;span class="p"&gt;());&lt;/span&gt;
    &lt;span class="nn"&gt;axum&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;serve&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;listener&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;app&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="k"&gt;.await&lt;/span&gt;&lt;span class="nf"&gt;.unwrap&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;fn&lt;/span&gt; &lt;span class="nf"&gt;handler&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="k"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;Html&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&amp;amp;&lt;/span&gt;&lt;span class="k"&gt;'static&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nf"&gt;Html&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"&amp;lt;h1&amp;gt;Hello, World!&amp;lt;/h1&amp;gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This code sets up a minimal web server using Axum and Tokio. The #[tokio::main] macro enables asynchronous execution. The main function creates a Router to handle requests, defines a single route / that responds with “Hello, World!”, binds the server to 127.0.0.1:3000, and starts the server. The handler function generates the HTML response for the root route.&lt;/p&gt;

&lt;h3&gt;
  
  
  Enhancements for Cloud Run
&lt;/h3&gt;

&lt;p&gt;The basic example above works well for local development, but let’s make some improvements for deploying to Cloud Run. The official example notably does &lt;em&gt;not&lt;/em&gt; include a Dockerfile, which is required for Cloud Run.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Standalone Deployment:&lt;/strong&gt; To make the example standalone and deployable, modify the Cargo.toml file. Change the axum dependency from &lt;code&gt;axum = { path = “../../axum” }&lt;/code&gt; to &lt;code&gt;axum = “0.8”&lt;/code&gt; to use the published version of Axum from &lt;a href="http://crates.io" rel="noopener noreferrer"&gt;crates.io&lt;/a&gt; instead of the local path.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Dynamic Port Configuration:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Cloud Run dynamically assigns a port to your application, which is provided through the PORT environment variable. The original example hardcodes the port to 3000. To make our application Cloud Run-compatible, modify the main function to read the PORT environment variable and use it if available, falling back to a default port such as 8080 if the variable is not set.&lt;/p&gt;

&lt;p&gt;The address should also be changed to 0.0.0.0 to listen on all network interfaces, which is generally preferred for containerized applications.&lt;/p&gt;

&lt;p&gt;Here’s the modified main function:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight rust"&gt;&lt;code&gt;&lt;span class="nd"&gt;#[tokio::main]&lt;/span&gt;
&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;fn&lt;/span&gt; &lt;span class="nf"&gt;main&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="c1"&gt;// Get the port from the environment, defaulting to 8080&lt;/span&gt;
    &lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="n"&gt;port&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nn"&gt;std&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nn"&gt;env&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;var&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"PORT"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="nf"&gt;.unwrap_or_else&lt;/span&gt;&lt;span class="p"&gt;(|&lt;/span&gt;&lt;span class="n"&gt;_&lt;/span&gt;&lt;span class="p"&gt;|&lt;/span&gt; &lt;span class="s"&gt;"8080"&lt;/span&gt;&lt;span class="nf"&gt;.to_string&lt;/span&gt;&lt;span class="p"&gt;());&lt;/span&gt;
    &lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="n"&gt;addr&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nd"&gt;format!&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"0.0.0.0:{}"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;port&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

    &lt;span class="c1"&gt;// build our application with a route&lt;/span&gt;
    &lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="n"&gt;app&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nn"&gt;Router&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;new&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="nf"&gt;.route&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"/"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;handler&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;

    &lt;span class="c1"&gt;// run it&lt;/span&gt;
    &lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="n"&gt;listener&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nn"&gt;tokio&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nn"&gt;net&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nn"&gt;TcpListener&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;bind&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;addr&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;.await&lt;/span&gt;
        &lt;span class="nf"&gt;.unwrap&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
    &lt;span class="nd"&gt;println!&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"listening on {}"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;listener&lt;/span&gt;&lt;span class="nf"&gt;.local_addr&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="nf"&gt;.unwrap&lt;/span&gt;&lt;span class="p"&gt;());&lt;/span&gt;
    &lt;span class="nn"&gt;axum&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;serve&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;listener&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;app&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="k"&gt;.await&lt;/span&gt;&lt;span class="nf"&gt;.unwrap&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;3. Dockerfile:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;To deploy to Cloud Run, you’ll need a Dockerfile. Here’s a simple one that works well for this example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight docker"&gt;&lt;code&gt;&lt;span class="k"&gt;FROM&lt;/span&gt;&lt;span class="s"&gt; rust:1.85.1&lt;/span&gt;
&lt;span class="k"&gt;WORKDIR&lt;/span&gt;&lt;span class="s"&gt; /app&lt;/span&gt;
&lt;span class="k"&gt;COPY&lt;/span&gt;&lt;span class="s"&gt; . .&lt;/span&gt;
&lt;span class="k"&gt;RUN &lt;/span&gt;cargo build &lt;span class="nt"&gt;--release&lt;/span&gt;
&lt;span class="k"&gt;EXPOSE&lt;/span&gt;&lt;span class="s"&gt; 8080&lt;/span&gt;
&lt;span class="k"&gt;CMD&lt;/span&gt;&lt;span class="s"&gt; ["./target/release/example-hello-world"]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This Dockerfile uses the official &lt;a href="https://hub.docker.com/_/rust" rel="noopener noreferrer"&gt;Rust image&lt;/a&gt; as a base, copies the project files, builds the application in release mode, exposes port 8080 (&lt;a href="https://cloud.google.com/run/docs/container-contract#port?utm_campaign=CDR_default_0x80ca756c&amp;amp;utm_medium=external&amp;amp;utm_source=blog" rel="noopener noreferrer"&gt;the default port&lt;/a&gt;), and sets the command to run the compiled executable. You can upgrade to the latest Rust image if you’d like.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;4. .gcloudignore file:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;You can also add a .gcloudignore file to the project root to exclude unnecessary files (like the target directory containing build artifacts) from the deployment:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;.git/
.gitignore
target/
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Deploying to Cloud Run
&lt;/h3&gt;

&lt;p&gt;Before deploying, ensure you have the &lt;a href="https://cloud.google.com/sdk/docs/install-sdk?utm_campaign=CDR_0x2b6f3004_default_b403810548&amp;amp;utm_medium=external&amp;amp;utm_source=blog" rel="noopener noreferrer"&gt;Google Cloud SDK&lt;/a&gt; installed and configured, and you have &lt;a href="https://console.cloud.google.com/flows/enableapi?apiid=run.googleapis.com&amp;amp;utm_campaign=CDR_0x2b6f3004_default_b403810548&amp;amp;utm_medium=external&amp;amp;utm_source=blog" rel="noopener noreferrer"&gt;enabled the Cloud Run API&lt;/a&gt; in your Google Cloud project. You’ll also need to be in the root directory of your Axum project (where the Cargo.toml file is located).&lt;/p&gt;

&lt;p&gt;Before attempting your deployment, you can &lt;a href="https://doc.rust-lang.org/cargo/commands/cargo-check.html" rel="noopener noreferrer"&gt;check&lt;/a&gt; the local package and deployment for errors:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;cargo check
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;To deploy directly to Cloud Run &lt;a href="https://cloud.google.com/run/docs/deploying-source-code?utm_campaign=CDR_default_0x80ca756c&amp;amp;utm_medium=external&amp;amp;utm_source=blog" rel="noopener noreferrer"&gt;from source&lt;/a&gt;, use the following command:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;gcloud run deploy cloud-rust-example &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--source&lt;/span&gt; &lt;span class="nb"&gt;.&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--region&lt;/span&gt; us-central1 &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--allow-unauthenticated&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Here’s what each part of the command means:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;gcloud run deploy cloud-rust-example&lt;/code&gt;: This is the base command to deploy a service to Cloud Run. &lt;code&gt;cloud-rust-example&lt;/code&gt; is the name we’re giving to our service. You can choose a different name.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;—-source .&lt;/code&gt;: This flag tells Cloud Run where to find the source code for your application. The . indicates the current directory. Cloud Run will use the Dockerfile in this directory to build a container image.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;—-region us-central1&lt;/code&gt;: This specifies the Google Cloud region where your service will be deployed. In this case, we’re using us-central1. You can choose a region closer to your users for lower latency.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;—-allow-unauthenticated&lt;/code&gt;: This flag makes your deployed service publicly accessible without requiring authentication. This is convenient for initial testing and simple public services. &lt;strong&gt;For production applications, you should remove this flag and implement proper authentication and authorization.&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Cloud Run will automatically build and deploy your application. You will be provided with a service URL in the output. Accessing this URL in your browser will display the “Hello, World!” message.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F573%2F0%2AQJmbsamFXPgavNTB" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F573%2F0%2AQJmbsamFXPgavNTB" width="573" height="135"&gt;&lt;/a&gt;&lt;/p&gt;
Hello world output from / route



&lt;h3&gt;
  
  
  Integrating with Google Cloud Services
&lt;/h3&gt;

&lt;p&gt;Let’s now show how to integrate our application with Google Cloud services. I’ve selected a straightforward scenario that doesn’t require any project configuration to work. You’ll add a new application route &lt;code&gt;/project&lt;/code&gt; that will display information about your project.&lt;/p&gt;

&lt;p&gt;To implement this, you’ll use the &lt;a href="https://github.com/googleapis/google-cloud-rust" rel="noopener noreferrer"&gt;google-cloud-rust&lt;/a&gt; library to interact with the &lt;a href="https://cloud.google.com/resource-manager/docs?utm_campaign=CDR_0x2b6f3004_default_b403810548&amp;amp;utm_medium=external&amp;amp;utm_source=blog" rel="noopener noreferrer"&gt;Resource Manager&lt;/a&gt; API and retrieve information about your Google Cloud project.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Note:&lt;/strong&gt; The google-cloud-rust library is currently experimental. APIs may change, and it’s important to stay updated with the latest releases and documentation.&lt;/p&gt;

&lt;h4&gt;
  
  
  &lt;strong&gt;Add Dependencies&lt;/strong&gt;
&lt;/h4&gt;

&lt;p&gt;First, add the Resource Manager v3 API and &lt;a href="https://docs.rs/reqwest/latest/reqwest/" rel="noopener noreferrer"&gt;reqwest&lt;/a&gt; HTTP client to your Cargo.toml file:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;cargo add google-cloud-resourcemanager-v3 reqwest
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  &lt;strong&gt;Implement the handler&lt;/strong&gt;
&lt;/h4&gt;

&lt;p&gt;There are four key changes we’ll need to make in src/main.rs:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Add /project Route:&lt;/strong&gt; A new route &lt;code&gt;/project&lt;/code&gt; will display project information, implemented by &lt;code&gt;project_handler()&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight rust"&gt;&lt;code&gt;&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;fn&lt;/span&gt; &lt;span class="nf"&gt;main&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="c1"&gt;// ...&lt;/span&gt;
    &lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="n"&gt;app&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nn"&gt;Router&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;new&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="nf"&gt;.route&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"/"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;handler&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
        &lt;span class="nf"&gt;.route&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"/project"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;project_handler&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
        &lt;span class="nf"&gt;.layer&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;Extension&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nn"&gt;std&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nn"&gt;sync&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nn"&gt;Arc&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;new&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;)));&lt;/span&gt;
    &lt;span class="c1"&gt;// ...&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;project_handler function:&lt;/strong&gt; The project handler will call &lt;a href="https://docs.rs/google-cloud-resourcemanager-v3/latest/google_cloud_resourcemanager_v3/client/struct.Projects.html#method.get_project" rel="noopener noreferrer"&gt;get_project()&lt;/a&gt; to fetch project details. Finally, it formats the project information into an HTML response. Error handling is included to display any errors that occur during the API call.
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight rust"&gt;&lt;code&gt;&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;fn&lt;/span&gt; &lt;span class="nf"&gt;project_handler&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;Extension&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="n"&gt;Extension&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nb"&gt;Arc&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;Projects&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;Html&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nb"&gt;String&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="n"&gt;project_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;PROJECT_ID&lt;/span&gt;&lt;span class="nf"&gt;.get&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="nf"&gt;.expect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Project ID not initialized"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="n"&gt;project_name&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nd"&gt;format!&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"projects/{}"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;project_id&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

    &lt;span class="k"&gt;match&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="nf"&gt;.get_project&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;project_name&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="nf"&gt;.send&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="k"&gt;.await&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="nf"&gt;Ok&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;project&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="n"&gt;project_number&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;project&lt;/span&gt;&lt;span class="py"&gt;.name&lt;/span&gt;&lt;span class="nf"&gt;.strip_prefix&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"projects/"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="nf"&gt;.unwrap_or&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Unknown"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

            &lt;span class="nf"&gt;Html&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nd"&gt;format!&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                &lt;span class="s"&gt;"&amp;lt;h1&amp;gt;Project Info&amp;lt;/h1&amp;gt;&amp;lt;ul&amp;gt;&amp;lt;li&amp;gt;Name: &amp;lt;code&amp;gt;{}&amp;lt;/code&amp;gt;&amp;lt;/li&amp;gt;&amp;lt;li&amp;gt;ID: &amp;lt;code&amp;gt;{}&amp;lt;/code&amp;gt;&amp;lt;/li&amp;gt;&amp;lt;li&amp;gt;Number: &amp;lt;code&amp;gt;{}&amp;lt;/code&amp;gt;&amp;lt;/li&amp;gt;&amp;lt;/ul&amp;gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;project&lt;/span&gt;&lt;span class="py"&gt;.display_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="n"&gt;project_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="n"&gt;project_number&lt;/span&gt;
            &lt;span class="p"&gt;))&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="nf"&gt;Err&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nf"&gt;Html&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nd"&gt;format!&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"&amp;lt;h1&amp;gt;Error getting project info: {}&amp;lt;/h1&amp;gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;)),&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Share client with handler:&lt;/strong&gt; For best performance, any one-time configuration should not reside in the handler. The &lt;a href="https://docs.rs/google-cloud-resourcemanager-v3/latest/google_cloud_resourcemanager_v3/client/struct.Projects.html#" rel="noopener noreferrer"&gt;Projects&lt;/a&gt; client can be initialized in main() and then shared with the handler with Axum’s &lt;a href="https://docs.rs/axum/latest/axum/struct.Extension.html" rel="noopener noreferrer"&gt;Extension&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Add helper function for project metadata&lt;/strong&gt; : To find out the project ID the container is running in, you’ll need to access the &lt;a href="https://cloud.google.com/resource-manager/docs?utm_campaign=CDR_0x2b6f3004_default_b403810548&amp;amp;utm_medium=external&amp;amp;utm_source=blog" rel="noopener noreferrer"&gt;metadata key&lt;/a&gt;. That project ID will then be used to call the Resource Manager API to get more &lt;a href="https://docs.rs/google-cloud-resourcemanager-v3/latest/google_cloud_resourcemanager_v3/model/struct.Project.html" rel="noopener noreferrer"&gt;information about the project&lt;/a&gt;, including its display name and creation time. You can use &lt;a href="https://doc.rust-lang.org/std/sync/struct.LazyLock.html" rel="noopener noreferrer"&gt;LazyLock&lt;/a&gt; to initialize the project only once.
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight rust"&gt;&lt;code&gt;&lt;span class="k"&gt;static&lt;/span&gt; &lt;span class="n"&gt;PROJECT_ID&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;OnceLock&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nb"&gt;String&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nn"&gt;OnceLock&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;new&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;

&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;fn&lt;/span&gt; &lt;span class="nf"&gt;main&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="c1"&gt;// ...&lt;/span&gt;

    &lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="n"&gt;project_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;get_project_id&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="k"&gt;.await&lt;/span&gt;&lt;span class="nf"&gt;.expect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Failed to get project ID"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="n"&gt;PROJECT_ID&lt;/span&gt;&lt;span class="nf"&gt;.set&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;project_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="nf"&gt;.expect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Failed to set PROJECT_ID"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

    &lt;span class="c1"&gt;// ...&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;fn&lt;/span&gt; &lt;span class="nf"&gt;get_project_id&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="k"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;Result&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nb"&gt;String&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;String&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="nf"&gt;Ok&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;project_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;var&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"GOOGLE_CLOUD_PROJECT"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;Ok&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;project_id&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nn"&gt;reqwest&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nn"&gt;Client&lt;/span&gt;&lt;span class="p"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;new&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
    &lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="n"&gt;url&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"http://metadata.google.internal/computeMetadata/v1/project/project-id"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

    &lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;
        &lt;span class="nf"&gt;.get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="nf"&gt;.header&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Metadata-Flavor"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"Google"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="nf"&gt;.send&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="k"&gt;.await&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

    &lt;span class="k"&gt;match&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="nf"&gt;Ok&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;res&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;res&lt;/span&gt;&lt;span class="nf"&gt;.status&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="nf"&gt;.is_success&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                &lt;span class="nf"&gt;Ok&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;res&lt;/span&gt;&lt;span class="nf"&gt;.text&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="k"&gt;.await&lt;/span&gt;&lt;span class="nf"&gt;.map_err&lt;/span&gt;&lt;span class="p"&gt;(|&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;|&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="nf"&gt;.to_string&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;&lt;span class="o"&gt;?&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                &lt;span class="nf"&gt;Err&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nd"&gt;format!&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Metadata server returned error: {}"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;res&lt;/span&gt;&lt;span class="nf"&gt;.status&lt;/span&gt;&lt;span class="p"&gt;()))&lt;/span&gt;
            &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="nf"&gt;Err&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="nf"&gt;Err&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nd"&gt;format!&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Error querying metadata server: {}"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;)),&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  &lt;strong&gt;Set GOOGLE_CLOUD_PROJECT Environment Variable (Locally)&lt;/strong&gt;
&lt;/h4&gt;

&lt;p&gt;For local testing, you’ll need to set the &lt;code&gt;GOOGLE_CLOUD_PROJECT&lt;/code&gt; environment variable to your Google Cloud project ID. You can do this in your terminal before running the application:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;GOOGLE_CLOUD_PROJECT&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;your-project-id
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Replace &lt;code&gt;your-project-id&lt;/code&gt; with your actual project ID. Cloud Run will automatically set this environment variable when deployed.&lt;/p&gt;

&lt;h4&gt;
  
  
  &lt;strong&gt;Enable the Resource Manager API&lt;/strong&gt;
&lt;/h4&gt;

&lt;p&gt;If you haven’t already, make sure to enable the &lt;a href="https://console.cloud.google.com/apis/api/cloudresourcemanager.googleapis.com/overview?utm_campaign=CDR_default_0xd368824c&amp;amp;utm_medium=external&amp;amp;utm_source=blog" rel="noopener noreferrer"&gt;Resource Manager API&lt;/a&gt; within your Google Cloud project.&lt;/p&gt;

&lt;h4&gt;
  
  
  &lt;strong&gt;Provide Resource Manager IAM access&lt;/strong&gt;
&lt;/h4&gt;

&lt;p&gt;You will need to provide the &lt;a href="https://cloud.google.com/resource-manager/docs/access-control-proj#permissions?utm_campaign=CDR_default_0xd368824c&amp;amp;utm_medium=external&amp;amp;utm_source=blog" rel="noopener noreferrer"&gt;resourcemanager.projects.get&lt;/a&gt; role to the appropriate &lt;a href="https://cloud.google.com/run/docs/securing/service-identity#types-of-service-accounts?utm_campaign=CDR_0x2b6f3004_default_b403810548&amp;amp;utm_medium=external&amp;amp;utm_source=blog" rel="noopener noreferrer"&gt;Cloud Run service account&lt;/a&gt;. The instructions here use the Compute Engine default service account. If you are running locally, you’ll also need to provide these permissions to your account.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Redeploy to Cloud Run&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Use the same &lt;code&gt;gcloud run deploy&lt;/code&gt; command as before to redeploy your updated application:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;gcloud run deploy cloud-rust-example &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--source&lt;/span&gt; &lt;span class="nb"&gt;.&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--region&lt;/span&gt; us-central1 &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--allow-unauthenticated&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now, when you visit the service URL provided by Cloud Run and navigate to the &lt;code&gt;/project&lt;/code&gt; path, you should see information about your Google Cloud project.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F716%2F0%2AojC486ePfJcfZ30r" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fcdn-images-1.medium.com%2Fmax%2F716%2F0%2AojC486ePfJcfZ30r" width="716" height="440"&gt;&lt;/a&gt;&lt;/p&gt;
Project information output from /project route



&lt;h3&gt;
  
  
  Conclusion
&lt;/h3&gt;

&lt;p&gt;This guide demonstrates the process of deploying a Rust Axum application on Cloud Run. I started with a basic “Hello, World!” example from the Axum repository, explained its code, and then showed how to enhance it for Cloud Run compatibility by dynamically configuring the port and creating a Dockerfile. By combining Rust and Axum with Cloud Run’s serverless simplicity, you can efficiently build and deploy robust web services. The sample source code is available in the &lt;a href="https://github.com/kweinmeister/cloud-rust-example" rel="noopener noreferrer"&gt;cloud-rust-example&lt;/a&gt; repository.&lt;/p&gt;

&lt;p&gt;For more information about Cloud Run, I recommend the &lt;a href="https://cloud.google.com/run/docs/quickstarts/build-and-deploy/deploy-service-other-languages?utm_campaign=CDR_default_0x80ca756c&amp;amp;utm_medium=external&amp;amp;utm_source=blog" rel="noopener noreferrer"&gt;quickstart&lt;/a&gt; for building and deploying a web application in the documentation. Also, check out &lt;a href="https://www.youtube.com/watch?v=rOMroL3mhO4" rel="noopener noreferrer"&gt;this video&lt;/a&gt; for a video walkthrough of running Rust on Cloud Run. Feel free to connect on &lt;a href="https://www.linkedin.com/in/karlweinmeister/" rel="noopener noreferrer"&gt;LinkedIn&lt;/a&gt;, &lt;a href="https://x.com/kweinmeister?ref_src=twsrc%5Egoogle%7Ctwcamp%5Eserp%7Ctwgr%5Eauthor" rel="noopener noreferrer"&gt;X&lt;/a&gt;, and &lt;a href="https://bsky.app/profile/kweinmeister.bsky.social" rel="noopener noreferrer"&gt;Bluesky&lt;/a&gt; to continue the discussion!&lt;/p&gt;




</description>
      <category>dockerfiles</category>
      <category>web</category>
      <category>axum</category>
      <category>rust</category>
    </item>
  </channel>
</rss>
