<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Google Developer Experts</title>
    <description>The latest articles on DEV Community by Google Developer Experts (@gde).</description>
    <link>https://dev.to/gde</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Forganization%2Fprofile_image%2F11939%2Fe3080d5b-ecde-42a8-b089-bafecc31fa97.png</url>
      <title>DEV Community: Google Developer Experts</title>
      <link>https://dev.to/gde</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/gde"/>
    <language>en</language>
    <item>
      <title>Local-First AI Done Right: How Gemma 4 E2B and 'Thinking Mode' Powered DiagramFlowAI</title>
      <dc:creator>Carlos Barbero</dc:creator>
      <pubDate>Wed, 13 May 2026 00:08:54 +0000</pubDate>
      <link>https://dev.to/gde/local-first-ai-done-right-how-gemma-4-e2b-and-thinking-mode-powered-diagramflowai-3bop</link>
      <guid>https://dev.to/gde/local-first-ai-done-right-how-gemma-4-e2b-and-thinking-mode-powered-diagramflowai-3bop</guid>
      <description>&lt;h2&gt;
  
  
  What I Built
&lt;/h2&gt;

&lt;p&gt;DiagramFlowAI is a local-first desktop application (macOS, Windows, and Linux) that transforms natural language descriptions into production-ready architecture diagrams. It intelligently generates standard Mermaid syntax for general workflows, or outputs structured commands mapping to official AWS icons for cloud architectures. &lt;/p&gt;

&lt;p&gt;The application solves a very specific tension in modern software engineering: &lt;strong&gt;privacy versus productivity&lt;/strong&gt;. When architects and engineers sketch out internal systems—such as authentication flows, proprietary data pipelines, or secure cloud perimeters—sending that data to a cloud-based LLM endpoint is often a compliance deal-breaker. &lt;/p&gt;

&lt;p&gt;DiagramFlowAI is designed to be completely self-contained. Powered by &lt;code&gt;flutter_gemma&lt;/code&gt; and LiteRT, it runs 100% locally. After the initial model download, it requires zero internet connection, uses no API keys, and has no telemetry. It’s an AI diagramming studio that respects your company’s security posture.&lt;/p&gt;

&lt;h2&gt;
  
  
  Demo
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5013qdq16lain3eltg7i.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5013qdq16lain3eltg7i.gif" alt=" " width="326" height="860"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffrdthodfk53eay84bnjj.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffrdthodfk53eay84bnjj.gif" alt=" " width="720" height="536"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1wp1272npzx7xslqlmyt.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1wp1272npzx7xslqlmyt.png" alt=" " width="800" height="568"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Code
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://github.com/carlosrgomes/DiagramFlowAI" rel="noopener noreferrer"&gt;github.com/carlosrgomes/DiagramFlowAI&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  How I Used Gemma 4
&lt;/h2&gt;

&lt;p&gt;Most AI showcases default to the largest model available. I did the exact opposite. I deliberately built DiagramFlowAI around &lt;strong&gt;Gemma 4 E2B and E4B&lt;/strong&gt;—the edge variants—and intentionally skipped the 31B Dense and 26B MoE models. Here is why the smallest variants were the secret to making this desktop app work, and how Gemma 4's "Thinking Mode" unlocked capabilities I didn't expect.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Unfashionable Choice: Small over Large
&lt;/h3&gt;

&lt;p&gt;If you're building a high-throughput backend, the 31B Dense or 26B MoE are obvious choices. However, my deployment constraints pointed in a completely different direction:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Democratic Hardware Requirements:&lt;/strong&gt; A 31B dense model in 4-bit quantization demands around 16-20 GB of RAM. The E4B model comfortably fits within 4-6 GB and runs smoothly on integrated GPUs. That’s the difference between an app anyone can use and a toy restricted to high-end workstations.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Frictionless Onboarding:&lt;/strong&gt; The moment a user has to paste an API key, onboarding conversion plummets. Because E2B and E4B are open weights, users can simply click "download" and start diagramming. No auth walls, no billing setups.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Snappy Cold Starts:&lt;/strong&gt; In a desktop app, the first interaction needs to feel immediate. The E2B model loads and responds in seconds on modern M-series Macs and modern PCs, keeping the user in their flow state.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;To give users flexibility, I built in a toggle between E2B (faster) and E4B (more accurate on complex syntax), rather than hardcoding a single option.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Underrated Superpower: Thinking Mode
&lt;/h3&gt;

&lt;p&gt;If there is one thing every developer building with Gemma 4 should internalize, it's the power of the reasoning trace. The &lt;code&gt;flutter_gemma&lt;/code&gt; SDK exposes Gemma 4's internal reasoning as a distinct stream of &lt;code&gt;ThinkingResponse&lt;/code&gt; chunks. &lt;/p&gt;

&lt;p&gt;For diagram generation, this is a game-changer. Mermaid syntax is notoriously fragile—a stray colon, an unquoted string, or a missing &lt;code&gt;end&lt;/code&gt; tag will break the entire render. Without Thinking Mode, a 4B parameter model will often confidently output syntactically broken DSLs in one shot. &lt;/p&gt;

&lt;p&gt;With Thinking Mode enabled, the model spends a few hundred tokens planning its structure &lt;em&gt;first&lt;/em&gt; ("OK, this is a sequence diagram, I need actor -&amp;gt; participant -&amp;gt; arrow -&amp;gt; response..."). Consequently, the final output is dramatically more reliable.&lt;/p&gt;

&lt;p&gt;In the UI, I expose this trace as a collapsed accordion (e.g., "Thinking · 2.4s"). This subtle UX choice builds trust and makes the generation wait feel productive, without overwhelming users with raw logs.&lt;/p&gt;
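&lt;p&gt;The split itself is easy to sketch. &lt;code&gt;flutter_gemma&lt;/code&gt; does this in Dart, but the pattern is language-agnostic; here is a minimal Python sketch, where &lt;code&gt;Chunk&lt;/code&gt; is a hypothetical stand-in for the SDK's &lt;code&gt;ThinkingResponse&lt;/code&gt; and answer chunk types:&lt;/p&gt;

```python
# Minimal sketch: separate a streamed response into a reasoning trace and a
# final answer. Chunk is a hypothetical stand-in for the SDK's chunk types.
from dataclasses import dataclass

@dataclass
class Chunk:
    text: str
    is_thinking: bool  # True if the chunk belongs to the reasoning trace

def split_stream(chunks):
    """Collect thinking text (for the collapsed accordion) separately from
    the answer text (for the diagram renderer)."""
    thinking_parts, answer_parts = [], []
    for chunk in chunks:
        (thinking_parts if chunk.is_thinking else answer_parts).append(chunk.text)
    return "".join(thinking_parts), "".join(answer_parts)
```

&lt;p&gt;The UI then renders the first string inside the accordion and streams the second into the editor.&lt;/p&gt;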

&lt;h3&gt;
  
  
  Pragmatic Patterns for 4B Models
&lt;/h3&gt;

&lt;p&gt;A few weeks of wrestling with the model led me to some hard-won architectural patterns:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; &lt;strong&gt;Treat the System Prompt as a Grammar, Not a Personality:&lt;/strong&gt; Small models pattern-match exceptionally well. My 500-line system prompt isn't about making the AI "helpful"; it's an output contract. I use explicit delimiters (&lt;code&gt;&amp;lt;DIAGRAM&amp;gt;...&amp;lt;/DIAGRAM&amp;gt;&lt;/code&gt;) and provide "syntax cards" showing the most common parser failures (e.g., &lt;code&gt;NEVER write X&lt;/code&gt;). Teaching the model &lt;em&gt;what not to do&lt;/em&gt; prevents entire classes of bugs.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Trust the Contract over Regex:&lt;/strong&gt; Instead of fighting fragile markdown fences with complex Regex, I rely on the XML-style delimiters defined in the system prompt. Even when the model decides to write an explanatory paragraph, the actual code is safely wrapped and easily extracted.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Engineer the Recovery Loop:&lt;/strong&gt; Even with Thinking Mode, complex diagrams might occasionally fail to parse. Instead of trying to prompt-engineer my way to a 100% success rate (which is near impossible at 4B), I built a small ReAct-style retry loop. If the Mermaid parser throws an error, the app feeds the exact error message back into a follow-up turn. The model almost always fixes its syntax on the second attempt. &lt;/li&gt;
&lt;/ol&gt;
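&lt;p&gt;Patterns 2 and 3 together fit in a few lines. Here is a hedged sketch, with the model call and the Mermaid parser passed in as stand-in callables (the real app wires these to &lt;code&gt;flutter_gemma&lt;/code&gt; and the renderer; all names here are illustrative):&lt;/p&gt;

```python
# Sketch of the delimiter contract plus the ReAct-style recovery loop.
# The delimiter strings are assembled from char codes to stay markup-safe;
# they spell the DIAGRAM open/close tags from the system prompt.
OPEN_TAG = chr(60) + "DIAGRAM" + chr(62)
CLOSE_TAG = chr(60) + "/DIAGRAM" + chr(62)

def extract_diagram(model_output):
    """Pull code out of the delimiter contract; None if the contract was broken."""
    start = model_output.find(OPEN_TAG)
    end = model_output.find(CLOSE_TAG)
    if start == -1 or end == -1:
        return None
    body = model_output[start + len(OPEN_TAG):end]
    return body.strip() or None

def generate_with_recovery(prompt, generate, validate, max_attempts=2):
    """generate(prompt) returns model text; validate(diagram) returns an error
    message or None. On failure, feed the exact parser error back in."""
    turn, last_error = prompt, None
    for _ in range(max_attempts):
        diagram = extract_diagram(generate(turn))
        if diagram is None:
            turn = prompt + "\nReply ONLY with the diagram inside the DIAGRAM delimiters."
            continue
        error = validate(diagram)
        if error is None:
            return diagram
        last_error = error
        turn = prompt + "\nYour previous diagram failed to parse: " + error + ". Fix the syntax."
    raise RuntimeError("could not produce a valid diagram: " + str(last_error))
```

&lt;p&gt;The point is the shape: contract first, extraction second, parser feedback third.&lt;/p&gt;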

&lt;p&gt;Gemma 4 E2B and E4B prove that you don't need a massive, cloud-hosted LLM to ship a genuinely useful, structured AI application. If you map your deployment constraints, lean hard on the system prompt, enable Thinking Mode, and engineer a smart recovery loop, these edge models become a feature, not a compromise.&lt;/p&gt;

</description>
      <category>devchallenge</category>
      <category>gemmachallenge</category>
      <category>gemma</category>
    </item>
    <item>
      <title>Gemini API File Search: Enhanced Multimodal Capabilities with Embedding 2, Including Open-Source LINE Bot Implementation</title>
      <dc:creator>Evan Lin</dc:creator>
      <pubDate>Tue, 12 May 2026 04:17:48 +0000</pubDate>
      <link>https://dev.to/gde/gemini-api-file-search-enhanced-multimodal-capabilities-with-embedding-2-including-open-source-g72</link>
      <guid>https://dev.to/gde/gemini-api-file-search-enhanced-multimodal-capabilities-with-embedding-2-including-open-source-g72</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Faeqghkoo1xi76898n5zl.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Faeqghkoo1xi76898n5zl.png" alt="image-20260511221639333" width="800" height="325"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;(Image source: &lt;a href="https://blog.google/innovation-and-ai/technology/developers-tools/expanded-gemini-api-file-search-multimodal-rag/" rel="noopener noreferrer"&gt;Google Blog - Gemini API File Search is now multimodal: build efficient, verifiable RAG&lt;/a&gt;)&lt;/p&gt;

&lt;h1&gt;
  
  
  Recap: Building RAG No Longer Means Assembling Legos
&lt;/h1&gt;

&lt;p&gt;In the past few years, whenever developers thought about RAG (Retrieval-Augmented Generation), the component list that came to mind probably looked like this:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A chunker (langchain? Write it yourself?)&lt;/li&gt;
&lt;li&gt;An embedding model (OpenAI text-embedding-3? Cohere? BGE?)&lt;/li&gt;
&lt;li&gt;A vector database (ChromaDB, FAISS, pgvector, Pinecone… which one to choose is a battle)&lt;/li&gt;
&lt;li&gt;A retrieval + rerank process&lt;/li&gt;
&lt;li&gt;And then the LLM&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Multimodal RAG adds yet another layer: How do you embed images? Do you need to OCR them first? Do you split into two stores, one for text and one for images? How do you score mixed text-and-image searches? These few questions alone can eat up a sprint.&lt;/p&gt;

&lt;p&gt;Recently, Google released &lt;a href="https://blog.google/innovation-and-ai/technology/developers-tools/expanded-gemini-api-file-search-multimodal-rag/" rel="noopener noreferrer"&gt;Expanded Gemini API File Search for multimodal RAG&lt;/a&gt; on the developer blog, turning the long pipeline above into "&lt;strong&gt;calling a managed API&lt;/strong&gt;", and &lt;strong&gt;images are natively supported&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;This article will do two things:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; Explain the new features clearly, including what &lt;strong&gt;Gemini Embedding 2&lt;/strong&gt; is doing behind the scenes.&lt;/li&gt;
&lt;li&gt; Use an &lt;strong&gt;open-source&lt;/strong&gt; LINE Bot (&lt;a href="https://github.com/kkdai/linebot-multimodal-rag" rel="noopener noreferrer"&gt;&lt;code&gt;kkdai/linebot-multimodal-rag&lt;/code&gt;&lt;/a&gt;) as a live demonstration to see how the new features are combined in actual production code — and share the two typical pitfalls I encountered during debugging to help everyone avoid them.&lt;/li&gt;
&lt;/ol&gt;




&lt;h2&gt;
  
  
  Three Major Highlights of the New Features
&lt;/h2&gt;

&lt;p&gt;According to the official blog, the core of this upgrade is three things:&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Native Multimodal File Search
&lt;/h3&gt;

&lt;p&gt;In the past, File Search was pure text retrieval, and images could only be indexed by OCRing them into text.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“File Search now processes images and text together. Powered by the Gemini Embedding 2 model, the tool understands native image data.”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Now you can &lt;strong&gt;directly put images into the File Search Store&lt;/strong&gt;, and index them together with text. The engine behind it is &lt;strong&gt;Gemini Embedding 2&lt;/strong&gt; — text, images, videos, audio, and documents &lt;strong&gt;share the same vector space&lt;/strong&gt;, so you can "find text with images", "find images with text", or "find images with images" without having to align the spaces yourself.&lt;/p&gt;

&lt;p&gt;For us product people, this means:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Mixed text and image search is no longer a research topic&lt;/strong&gt;, it's an API call.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;No need to maintain two stores&lt;/strong&gt; (one for text chunks and one for CLIP-style image embeddings).&lt;/li&gt;
&lt;li&gt;  Scientific charts, UI screenshots, reports, photo albums... these &lt;strong&gt;things that used to lose most of their meaning after OCR&lt;/strong&gt; can now retain the original visual information for retrieval.&lt;/li&gt;
&lt;/ul&gt;
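&lt;p&gt;The "same vector space" point is what removes the alignment work: conceptually, retrieval becomes one nearest-neighbor pass over a single table of vectors, regardless of which modality produced each one. A toy sketch of the idea (2-D vectors standing in for real embeddings):&lt;/p&gt;

```python
# Toy illustration of cross-modal retrieval in ONE shared embedding space.
# Real embeddings have hundreds of dimensions; 2-D vectors stand in here.
def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = sum(x * x for x in a) ** 0.5
    nb = sum(x * x for x in b) ** 0.5
    return dot / (na * nb)

def nearest(query_vec, store):
    """store: list of (doc_id, modality, vector). Because every modality lives
    in the same space, text-finds-image and image-finds-text are the same op."""
    return max(store, key=lambda item: cosine(query_vec, item[2]))
```

&lt;p&gt;With separate text and image spaces, that single &lt;code&gt;max()&lt;/code&gt; would instead require two indexes plus a score-calibration step.&lt;/p&gt;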

&lt;h3&gt;
  
  
  2. Custom Metadata and Server-side Filtering
&lt;/h3&gt;

&lt;p&gt;Each file you put into the store can now be tagged with key-value labels:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"key"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"user_id"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"string_value"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"U1234abcd..."&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"key"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"department"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"string_value"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Legal"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"key"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"status"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"string_value"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Final"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;

&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Use the &lt;a href="https://google.aip.dev/160" rel="noopener noreferrer"&gt;google.aip.dev/160&lt;/a&gt; filter syntax (same format as most GCP list APIs) when querying:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="n"&gt;metadata_filter&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;'department="Legal" AND status="Final"'&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Filtering happens &lt;strong&gt;first, on Google's side&lt;/strong&gt;, rather than retrieving a large batch and discarding results locally. With the noise reduced, &lt;strong&gt;both speed and accuracy improve&lt;/strong&gt;, which is a lifesaver for multi-tenant SaaS: a single store with metadata filters can separate tenants, with no need to maintain N isolated stores.&lt;/p&gt;

&lt;p&gt;My LINE Bot uses this directly to do &lt;strong&gt;per-user data isolation&lt;/strong&gt;: each time a file is uploaded, it's tagged with the LINE &lt;code&gt;user_id&lt;/code&gt;, and when querying, a filter is applied, so user A will never see user B's data in the Q&amp;amp;A.&lt;/p&gt;
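&lt;p&gt;The isolation logic itself is tiny. A sketch of the two helpers (names are illustrative, not necessarily what the repo uses):&lt;/p&gt;

```python
# Per-user isolation: tag every upload with the LINE user id, and restrict
# every query with an AIP-160 metadata filter. Helper names are illustrative.
def user_metadata(line_user_id):
    """custom_metadata entry attached at upload time."""
    return [{"key": "user_id", "string_value": line_user_id}]

def user_filter(line_user_id):
    """metadata_filter string applied at query time."""
    return 'user_id="{}"'.format(line_user_id)

# Upload:  config={"custom_metadata": user_metadata(uid), ...}
# Query:   types.FileSearch(..., metadata_filter=user_filter(uid))
```

&lt;p&gt;Because the filter is applied server-side, a bug in the bot's rendering code can never leak another user's snippets into a reply.&lt;/p&gt;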

&lt;h3&gt;
  
  
  3. Page-level Citations
&lt;/h3&gt;

&lt;p&gt;Each cited snippet in the response will now include the &lt;strong&gt;page number&lt;/strong&gt;.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“captures the page number for every piece of indexed information.”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;This is critical for enterprise customers. "The AI says Y is mentioned on page X of the contract" versus "the AI says Y is mentioned in the contract": legal and audit teams can accept the former directly, while the latter requires someone to manually flip through the document to verify. Page numbers close the last mile of the "LLM answers can't be traced back to the source" problem.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Multimodal Engine: Gemini Embedding 2
&lt;/h2&gt;

&lt;p&gt;The core of the new feature is the &lt;a href="https://deepmind.google/models/gemini/embedding/" rel="noopener noreferrer"&gt;Gemini Embedding 2&lt;/a&gt; model. Here are its specifications, quoted to inform your selection decisions:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fv6qi7ndky7i4xyvit5vs.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fv6qi7ndky7i4xyvit5vs.png" alt="image-20260511221801984" width="800" height="534"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Item&lt;/th&gt;
&lt;th&gt;Specification&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Supported Input&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;Text, images, videos, audio, documents&lt;/strong&gt; (same embedding space)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Input token limit&lt;/td&gt;
&lt;td&gt;8,192 tokens&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Output dimensions&lt;/td&gt;
&lt;td&gt;128 to 3,072 (using Matryoshka Representation Learning, smaller dimensions maintain similar accuracy)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Multilingual support&lt;/td&gt;
&lt;td&gt;100+ languages&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Several key benchmarks (recall@1 unless noted otherwise):&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Text-to-Image Search&lt;/strong&gt;: TextCaps &lt;strong&gt;89.6&lt;/strong&gt; / Docci &lt;strong&gt;93.4&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Image-to-Text Search&lt;/strong&gt;: TextCaps &lt;strong&gt;97.4&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Multilingual (MTEB)&lt;/strong&gt;: mean &lt;strong&gt;69.9&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Video-Text Matching&lt;/strong&gt;: Vatex ndcg@10 &lt;strong&gt;68.8&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Speech-Text Retrieval&lt;/strong&gt;: MSEB mrr@10 &lt;strong&gt;73.9&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Several key observations:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Matryoshka is not a buzzword&lt;/strong&gt;: You can store embeddings at the full 3,072 dimensions, then truncate to 768 at retrieval time for faster scoring with nearly the same quality. Storage and scoring costs can be optimized in stages.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Cross-modal scores are very real&lt;/strong&gt;: 97.4% recall@1 (image→text) means that if you have an image and want to find the corresponding descriptive text, you'll find it almost immediately. This can be directly implemented for use cases like "take a picture of a product label and find the corresponding page of the user manual".&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;100+ languages&lt;/strong&gt;: This is a very real difference for the Taiwan/Japan/Korea/Southeast Asia markets.&lt;/li&gt;
&lt;/ul&gt;
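&lt;p&gt;Matryoshka truncation is mechanically simple: keep the first &lt;em&gt;k&lt;/em&gt; coordinates and re-normalize. A minimal sketch:&lt;/p&gt;

```python
import math

def truncate_embedding(vec, dim):
    """MRL-style truncation: keep the first `dim` coordinates of a stored
    embedding, then re-normalize to unit length for cosine scoring."""
    head = list(vec[:dim])
    norm = math.sqrt(sum(x * x for x in head))
    return [x / norm for x in head] if norm else head
```

&lt;p&gt;Store once at 3,072 dimensions; when latency matters, score against the truncated version and keep the full vectors for a final re-rank pass.&lt;/p&gt;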




&lt;h2&gt;
  
  
  What Developers Really Care About: Price and Access Cost
&lt;/h2&gt;

&lt;p&gt;From the official tutorial article &lt;a href="https://dev.to/googleai/multimodal-rag-with-the-gemini-api-file-search-tool-a-developer-guide-5878"&gt;Multimodal RAG with the Gemini API File Search tool: a developer guide&lt;/a&gt;, two lines stand out for cost-sensitive developers:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“Fully managed, with no vector database overhead.”&lt;/p&gt;

&lt;p&gt;“Storage and query-time embeddings are free. You only pay for indexing and tokens.”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;In plain English:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;You don't pay for the vector database&lt;/strong&gt;, nor do you pay for the monthly salary of the people maintaining it.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Storage is free&lt;/strong&gt;, and &lt;strong&gt;embedding calculations at query time are also free&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;  You only have two things to pay for: &lt;strong&gt;the embedding fee for the initial indexing&lt;/strong&gt; and &lt;strong&gt;the LLM tokens consumed when generating the answer&lt;/strong&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is a friendly cost curve for personal side projects and early-stage startups: you don't need to decide on day one whether you can afford a vector DB's baseline cost.&lt;/p&gt;




&lt;h2&gt;
  
  
  Standard Workflow: Four SDK Calls for a Complete RAG Pipeline
&lt;/h2&gt;

&lt;p&gt;Distilled from the dev.to guide, here is the minimum viable workflow:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;google&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;genai&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;google.genai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;types&lt;/span&gt;

&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;genai&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Client&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="c1"&gt;# 1. Create a store (specify the multimodal embedding model)
&lt;/span&gt;&lt;span class="n"&gt;store&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;file_search_stores&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;display_name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;my-multimodal-rag&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;embedding_model&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;models/gemini-embedding-2&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;})&lt;/span&gt;

&lt;span class="c1"&gt;# 2. Upload files + custom metadata
&lt;/span&gt;&lt;span class="n"&gt;operation&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;file_search_stores&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;upload_to_file_search_store&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;file_search_store_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;store&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="nb"&gt;file&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;report-q1.pdf&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;display_name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Q1 Report&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;custom_metadata&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
            &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;key&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;department&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;string_value&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Finance&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
            &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;key&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;year&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;string_value&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;2026&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
        &lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="p"&gt;},&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="c1"&gt;# Upload is a long-running operation, needs to poll:
# operation = client.operations.get(operation)
&lt;/span&gt;
&lt;span class="c1"&gt;# 3. Feed file_search as a tool to generate_content
&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;models&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;generate_content&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gemini-3-flash-preview&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;contents&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;What was the revenue growth rate in the first quarter of last year?&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;types&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;GenerateContentConfig&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;types&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Tool&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;file_search&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;types&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;FileSearch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;file_search_store_names&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;store&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
            &lt;span class="n"&gt;metadata_filter&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;department=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Finance&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt; AND year=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;2026&lt;/span&gt;&lt;span class="sh"&gt;"'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;))],&lt;/span&gt;
    &lt;span class="p"&gt;),&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# 4. Get citations (including page numbers)
&lt;/span&gt;&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;citation&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;candidates&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;grounding_metadata&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;grounding_chunks&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;citation&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;web&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;uri&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;citation&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;web&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;title&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="c1"&gt;# or the corresponding file/page fields
&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;To show image citations to the user, you can also call &lt;code&gt;client.file_search_stores.download_media()&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;It's no exaggeration, &lt;strong&gt;the entire multimodal RAG is less than 30 lines of code&lt;/strong&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  Demo Case: Putting These New Features into a LINE Bot
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fax8dlyjqm7ty2z00fv10.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fax8dlyjqm7ty2z00fv10.png" alt="image-20260511221916359" width="800" height="1734"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdzg2z445gd33i5ianvxd.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdzg2z445gd33i5ianvxd.png" alt="image-20260511221851736" width="800" height="1734"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;SDK examples alone feel abstract, so I turned these features into a working LINE Bot, open-sourced at &lt;a href="https://github.com/kkdai/linebot-multimodal-rag" rel="noopener noreferrer"&gt;&lt;code&gt;kkdai/linebot-multimodal-rag&lt;/code&gt;&lt;/a&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  Users drop &lt;strong&gt;PDFs / images / text files&lt;/strong&gt; into the LINE chat → the Bot indexes them into the File Search Store.&lt;/li&gt;
&lt;li&gt;  Users type questions → Gemini answers from the data &lt;strong&gt;that the user themselves uploaded&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;  Users drop an image and ask a question → image-to-text retrieval works the same way.&lt;/li&gt;
&lt;li&gt;  Deployment target: GCP Cloud Run with automatic deployment via Cloud Build.&lt;/li&gt;
&lt;/ul&gt;
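The three interaction paths above reduce to a small dispatcher inside the webhook handler. A minimal sketch — the event shape and return values here are illustrative, not the repo's actual code:

```python
# Illustrative dispatcher for the three interaction paths above.
# The real handlers would upload to GCS / the File Search Store or
# run a per-user query; those calls are stubbed out here.
def handle_event(event: dict) -> str:
    kind = event.get("type")
    if kind in ("file", "image"):
        # PDFs, images, and text files all take the same path:
        # persist the bytes, then index them into the File Search Store.
        return "index"
    if kind == "text":
        # A plain text message becomes a query over this user's files.
        return "query"
    return "ignore"

print(handle_event({"type": "image"}))  # index
print(handle_event({"type": "text"}))   # query
```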

&lt;p&gt;The architecture is straightforward (key components):&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Component&lt;/th&gt;
&lt;th&gt;Role&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;LINE Webhook&lt;/td&gt;
&lt;td&gt;FastAPI receives message events&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;GCS&lt;/td&gt;
&lt;td&gt;Persists original files (&lt;code&gt;uploads/{user_id}/{message_id}.{ext}&lt;/code&gt;)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Gemini File Search Store&lt;/td&gt;
&lt;td&gt;The only index layer (managed)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Custom metadata &lt;code&gt;user_id&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;Multi-tenant isolation&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;FastAPI BackgroundTasks&lt;/td&gt;
&lt;td&gt;Works around the 30-second LINE reply-token limit&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Mapping this back to the three major new features described earlier:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Multimodal&lt;/strong&gt;: Users drop images and PDFs alike; everything goes into the same store and flows through the same pipeline at search time.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Custom metadata&lt;/strong&gt;: Each LINE user's files are tagged with &lt;code&gt;user_id&lt;/code&gt; and filtered at query time, enforcing isolation server-side.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Page-level citations&lt;/strong&gt;: To later display "this answer comes from XX.pdf, page 5" in LINE messages, simply read &lt;code&gt;grounding_metadata&lt;/code&gt;.&lt;/li&gt;
&lt;/ul&gt;
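The server-side isolation in the second bullet ultimately comes down to building a metadata filter string scoped to the requesting user. A minimal sketch — the helper name is hypothetical; the `key="value"` filter syntax follows the query example shown earlier in this post:

```python
# Build the metadata filter that enforces per-user isolation
# server-side. build_user_filter is a hypothetical helper, not the
# repo's exact code.
def build_user_filter(user_id: str) -> str:
    # Strip quotes so a crafted user id cannot break out of the filter.
    safe_id = user_id.replace('"', "")
    return f'user_id="{safe_id}"'

print(build_user_filter("U4af4980629"))  # user_id="U4af4980629"
```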

&lt;p&gt;The entire repo is about 600 lines of Python, yet it delivers a complete &lt;strong&gt;private multimodal knowledge-base chat Bot&lt;/strong&gt; of your own.&lt;/p&gt;




&lt;h2&gt;
  
  
  Deployment in Practice: Commit → Automatically Live
&lt;/h2&gt;

&lt;p&gt;An open-source example that merely runs is not enough; to demo it at a workshop, it has to reach the level of "change the code, push to GitHub, and it deploys automatically". This time I asked &lt;a href="https://docs.anthropic.com/en/docs/claude-code" rel="noopener noreferrer"&gt;Claude Code&lt;/a&gt; to be my co-pilot for wiring up CI/CD.&lt;/p&gt;

&lt;p&gt;I gave it a single sentence:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"Help me create a Cloud Build connection to GitHub, and trigger a build to deploy to Cloud Run after committing to main."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Claude Code first scanned &lt;code&gt;cloudbuild.yaml&lt;/code&gt;, the existing Cloud Run settings, Secret Manager, and Artifact Registry, listed the current problems, and then &lt;strong&gt;stopped to ask me the key decisions&lt;/strong&gt;: keep the existing service name or change the YAML? Does GitHub need authorization? Once I answered, it created the missing resources in one go:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Build Artifact Registry repo&lt;/span&gt;
gcloud artifacts repositories create linebot &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--repository-format&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;docker &lt;span class="nt"&gt;--location&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;asia-east1

&lt;span class="c"&gt;# Secret migration: move from the current service to Secret Manager (via stdin, don't leave shell history)&lt;/span&gt;
gcloud run services describe linebot-gemini-file-search &lt;span class="nt"&gt;--region&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;asia-east1 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--format&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;'value(...)'&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  | gcloud secrets create LINE_CHANNEL_SECRET &lt;span class="nt"&gt;--data-file&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;-

&lt;span class="c"&gt;# Give Cloud Build / Compute SA the roles needed for deployment&lt;/span&gt;
&lt;span class="k"&gt;for &lt;/span&gt;role &lt;span class="k"&gt;in &lt;/span&gt;run.admin iam.serviceAccountUser artifactregistry.writer &lt;span class="se"&gt;\&lt;/span&gt;
            secretmanager.secretAccessor storage.objectAdmin logging.logWriter&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;do
  &lt;/span&gt;gcloud projects add-iam-policy-binding your-cool-project-id &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--member&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"serviceAccount:660825558664-compute@developer.gserviceaccount.com"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--role&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"roles/&lt;/span&gt;&lt;span class="nv"&gt;$role&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="nt"&gt;--condition&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;None
&lt;span class="k"&gt;done&lt;/span&gt;

&lt;span class="c"&gt;# Build trigger&lt;/span&gt;
gcloud builds triggers create github &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;linebot-multimodal-rag-main &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--repo-owner&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;kkdai &lt;span class="nt"&gt;--repo-name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;linebot-multimodal-rag &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--branch-pattern&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"^main$"&lt;/span&gt; &lt;span class="nt"&gt;--build-config&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;cloudbuild.yaml

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The only thing that could not be automated was &lt;strong&gt;GitHub OAuth authorization&lt;/strong&gt; — Claude Code admitted outright that "this step can only be done by clicking in the Console" and provided the URL plus step-by-step instructions. After that one minute of clicking, the trigger ran successfully.&lt;/p&gt;




&lt;h2&gt;
  
  
  Pitfall Log: Two Traps Tied Directly to the New Features
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Pitfall 1: Hardcoded Model ID is Outdated
&lt;/h3&gt;

&lt;p&gt;Both &lt;code&gt;cloudbuild.yaml&lt;/code&gt; and the code defaulted to &lt;code&gt;gemini-3.1-flash&lt;/code&gt;, but the &lt;a href="https://ai.google.dev/gemini-api/docs/models" rel="noopener noreferrer"&gt;Gemini API's current model ID list&lt;/a&gt; shows no such model. The correct ID for Gemini 3 Flash is &lt;code&gt;gemini-3-flash-preview&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why this happened&lt;/strong&gt;: multimodal RAG is a very new feature; docs, tutorials, and examples are still being written in large numbers, and the naming has shifted slightly along the way. An early version of a repo can easily end up with an ID that "looks right but does not actually exist".&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Solution&lt;/strong&gt;: Switch the entire repo to &lt;code&gt;gemini-3-flash-preview&lt;/code&gt;, and confirm the embedding model is &lt;code&gt;models/gemini-embedding-2&lt;/code&gt; (which was already correct). After the push, Cloud Build triggered automatically and a new revision was live within three minutes.&lt;/p&gt;
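A cheap guard against this whole class of bug is to fail fast at startup when the configured ID is missing from the live model list. A sketch assuming a google-genai style client with a `models.list()` call; the attribute shapes here are assumptions, not verified SDK contract:

```python
# Fail fast if a hardcoded model ID does not exist on the live API.
# Assumes the client exposes models.list() yielding objects with a
# .name attribute (an assumption about the SDK, not verified here).
MODEL_ID = "gemini-3-flash-preview"

def assert_model_exists(client, model_id: str) -> None:
    names = [m.name for m in client.models.list()]
    if not any(model_id in name for name in names):
        raise RuntimeError(f"model id {model_id!r} not found on the API")
```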

&lt;h3&gt;
  
  
  Pitfall 2: Mysterious "Upload has already been terminated"
&lt;/h3&gt;

&lt;p&gt;This trap was hit on the &lt;strong&gt;image upload&lt;/strong&gt; path newly supported by the File Search Store. It is also the one most worth sharing, because it shows how euphemistic a new API's error messages can be.&lt;/p&gt;

&lt;p&gt;I sent a JPG from LINE to the Bot, chose "store in database", and got:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;❌ Failed to store: 400 Bad Request. {'message': 'Upload has already been terminated.', 'status': 'Bad Request'}

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The cause was nowhere to be seen; Cloud Logging held only the same error, with no stack trace. Searching the &lt;a href="https://discuss.ai.google.dev/" rel="noopener noreferrer"&gt;Google AI Developers Forum&lt;/a&gt; turned up similar reports for several other file types (.md / .xlsx / large CSVs).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The real culprit&lt;/strong&gt; is hidden in this seemingly innocent code:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# app/gemini_service.py (before modification)
&lt;/span&gt;&lt;span class="n"&gt;suffix&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;mimetypes&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;guess_extension&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;mime_type&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;.bin&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="n"&gt;tempfile&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;NamedTemporaryFile&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;suffix&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;suffix&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;delete&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;tmp&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;tmp&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;write&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;file_bytes&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;tmp_path&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;tmp&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Before Python 3.13, &lt;code&gt;mimetypes.guess_extension("image/jpeg")&lt;/code&gt; &lt;strong&gt;returns &lt;code&gt;.jpe&lt;/code&gt;, not &lt;code&gt;.jpg&lt;/code&gt;&lt;/strong&gt;, because in the standard library's MIME table &lt;code&gt;.jpe&lt;/code&gt; sorts lexicographically before &lt;code&gt;.jpg&lt;/code&gt;; the quirk has existed for nearly twenty years.&lt;/p&gt;

&lt;p&gt;The Gemini File Search Store does not recognize the &lt;code&gt;.jpe&lt;/code&gt; extension, but the API's "Upload has already been terminated" message is extremely misleading; at first I suspected an oversized upload, concurrency throttling, or a race inside the SDK.&lt;/p&gt;
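The quirk is reproducible in two lines; what it prints depends on your Python version:

```python
import mimetypes

# Before Python 3.13 this prints ".jpe"; 3.13 changed the preferred
# extension for image/jpeg, so newer versions print ".jpg".
print(mimetypes.guess_extension("image/jpeg"))
```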

&lt;p&gt;&lt;strong&gt;Solution&lt;/strong&gt;: Take the extension directly from &lt;code&gt;display_name&lt;/code&gt; (the handlers already set it correctly to &lt;code&gt;image_&amp;lt;id&amp;gt;.jpg&lt;/code&gt;), with an explicit MIME lookup table as a fallback:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# app/gemini_service.py (after modification)
&lt;/span&gt;&lt;span class="n"&gt;_MIME_TO_EXT&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;image/jpeg&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;.jpg&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;image/png&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;.png&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;image/webp&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;.webp&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;application/pdf&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;.pdf&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="c1"&gt;# ...
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;display_name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;suffix&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;display_name&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;rsplit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)[&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;lower&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;suffix&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;_MIME_TO_EXT&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;mime_type&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="n"&gt;mimetypes&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;guess_extension&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;mime_type&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;.bin&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;[BG Store] uploading display_name=&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;display_name&lt;/span&gt;&lt;span class="si"&gt;!r}&lt;/span&gt;&lt;span class="s"&gt; mime=&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;mime_type&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
      &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;size=&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;file_bytes&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; tmp_suffix=&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;suffix&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I also added &lt;code&gt;traceback.format_exc()&lt;/code&gt; to the &lt;code&gt;except&lt;/code&gt; block, so the next time something goes wrong, Cloud Logging will hold the full stack.&lt;/p&gt;
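That logging change is a small wrapper around the store call. A minimal sketch of the pattern (the function name is illustrative, not the repo's actual helper):

```python
import traceback

# Wrap the store call so a failure logs the full stack instead of
# only the API's one-line error message.
def store_file_safely(store_fn, file_bytes: bytes) -> bool:
    try:
        store_fn(file_bytes)
        return True
    except Exception:
        print(traceback.format_exc())  # full stack lands in Cloud Logging
        return False
```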

&lt;p&gt;&lt;strong&gt;The takeaway&lt;/strong&gt;: when you run a new modality against a newly GA'd API, be sure to:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; &lt;strong&gt;Confirm on the client side that the filename / extension you generate is the format the API expects&lt;/strong&gt;; do not trust the &lt;code&gt;mimetypes&lt;/code&gt; standard library to guess for you.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Write the stack trace into the log&lt;/strong&gt;, or you will be left to decode cryptic forum advice like "just change a file".&lt;/li&gt;
&lt;li&gt; Compare the extensions you generate against the &lt;a href="https://ai.google.dev/gemini-api/docs/file-search" rel="noopener noreferrer"&gt;official list of formats supported by Gemini File Search&lt;/a&gt;.&lt;/li&gt;
&lt;/ol&gt;




&lt;h2&gt;
  
  
  Summary: The Lowest Entry Fee for Multimodal RAG Yet
&lt;/h2&gt;

&lt;p&gt;This Gemini API File Search upgrade compresses a feature that used to take three months to ship into "&lt;strong&gt;dozens of lines of code + a managed API&lt;/strong&gt;":&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Native multimodal support&lt;/strong&gt;: Text, images, videos, audio, and documents share one embedding space; the OCR translation layer is gone.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Custom metadata + server-side filters&lt;/strong&gt;: Multi-tenant SaaS no longer has to agonize over how many stores to split across.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Page-level citations&lt;/strong&gt;: Enterprise compliance scenarios finally get native grounding.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Wallet-friendly&lt;/strong&gt;: Storage and query-time embeddings are free; you pay only for indexing and LLM tokens.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Embedding 2's cross-modal scores&lt;/strong&gt;: 97.4% recall@1 is not a demo number; it is a level that can carry a product directly.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you want a production-shaped end-to-end example, the entire &lt;a href="https://github.com/kkdai/linebot-multimodal-rag" rel="noopener noreferrer"&gt;&lt;code&gt;kkdai/linebot-multimodal-rag&lt;/code&gt;&lt;/a&gt; repo welcomes PRs, and you are welcome to reshape it into a RAG application for your own domain: a Notion knowledge base, an employee-handbook Q&amp;amp;A bot, a photo-album manager, a research-paper index... imagination is probably the only limit.&lt;/p&gt;

&lt;p&gt;If you want to get started, the recommended reading order:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; Google official blog: &lt;a href="https://blog.google/innovation-and-ai/technology/developers-tools/expanded-gemini-api-file-search-multimodal-rag/" rel="noopener noreferrer"&gt;Expanded Gemini API File Search for multimodal RAG&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt; Gemini Embedding 2 specification page: &lt;a href="https://deepmind.google/models/gemini/embedding/" rel="noopener noreferrer"&gt;deepmind.google/models/gemini/embedding&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt; Developer implementation guide: &lt;a href="https://dev.to/googleai/multimodal-rag-with-the-gemini-api-file-search-tool-a-developer-guide-5878"&gt;Multimodal RAG with the Gemini API File Search tool: a developer guide&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt; My open-source example: &lt;a href="https://github.com/kkdai/linebot-multimodal-rag" rel="noopener noreferrer"&gt;github.com/kkdai/linebot-multimodal-rag&lt;/a&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Give this very capable multimodal RAG support a try!&lt;/p&gt;

</description>
      <category>api</category>
      <category>gemini</category>
      <category>llm</category>
      <category>rag</category>
    </item>
    <item>
      <title>Is Brain Float (bf16) Worth it?</title>
      <dc:creator>xbill</dc:creator>
      <pubDate>Tue, 12 May 2026 00:12:08 +0000</pubDate>
      <link>https://dev.to/gde/is-brain-float-worth-it-1lim</link>
      <guid>https://dev.to/gde/is-brain-float-worth-it-1lim</guid>
      <description>&lt;p&gt;&lt;em&gt;This is a submission for the &lt;a href="https://dev.to/challenges/google-gemma-2026-05-06"&gt;Gemma 4 Challenge: Build with Gemma 4&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;After some basic benchmarking, I realized that vLLM defaults to standard precision instead of the memory-optimized Brain Float (bfloat16) data type.&lt;/p&gt;

&lt;p&gt;The full benchmark suite was then re-run via MCP, and the Brain Float results were compared against the standard-precision results.&lt;/p&gt;

&lt;p&gt;model: google/gemma-4-26B-A4B-it&lt;/p&gt;

&lt;p&gt;✦ The absolute scale benchmark for Gemma 4 (26B-A4B-it) on TPU v6e-4 has successfully completed. &lt;/p&gt;

&lt;p&gt;🏁 Final Benchmark Results&lt;br&gt;
  The sweep confirms that the TPU v6e-4 cluster can handle massive parallel loads, maintaining a peak prefill throughput of nearly 0.5 Million&lt;br&gt;
  tokens/sec at the model's absolute context ceiling.&lt;/p&gt;

&lt;p&gt;┌─────────────┬──────────────────┬──────────────────┬──────────────────┐&lt;br&gt;
  │ Concurrency │ 32K Context TTFT │ 64K Context TTFT │ Peak Prefill TPS │&lt;br&gt;
  ├─────────────┼──────────────────┼──────────────────┼──────────────────┤&lt;br&gt;
  │ 1 User      │ 1.289s           │ 2.032s           │ 32.2K            │&lt;br&gt;
  │ 64 Users    │ 3.994s           │ 7.390s           │ 426.4K           │&lt;br&gt;
  │ 256 Users   │ 10.202s          │ 20.129s          │ 498.2K           │&lt;br&gt;
  │ 512 Users   │ 19.500s          │ 37.649s          │ 489.2K           │&lt;br&gt;
  │ 1024 Users  │ 40.525s          │ 74.611s          │ 479.7K           │&lt;br&gt;
  └─────────────┴──────────────────┴──────────────────┴──────────────────┘&lt;/p&gt;

&lt;p&gt;🔍 Key Findings &amp;amp; Constraints&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Hard 64K Model Limit: Although the vLLM engine was configured for 128K, the google/gemma-4-26B-A4B-it model explicitly rejected requests
  exceeding 65,536 tokens with a Status 400 error. The benchmark confirms 64K is the current physical limit for this model variant.&lt;/li&gt;
&lt;li&gt;Prefill Dominance: The Trillium architecture (v6e) demonstrates incredible prefill efficiency. Even at 1024 concurrent users with 64K
  context (totaling ~67 Million tokens in a single batch), the system maintained a throughput of 479,760 tokens/sec.&lt;/li&gt;
&lt;li&gt;Latency at Scale:

&lt;ul&gt;
&lt;li&gt;Usable Scale: Up to 256 users, the TTFT for full 64K context remains at ~20s, which is excellent for deep reasoning or heavy RAG.&lt;/li&gt;
&lt;li&gt;Extreme Scale: At 1024 users, TTFT rises to ~74s. While the throughput is sustained, the latency makes this tier suitable only for
 non-interactive batch processing.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;📊 Data Artifacts&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Full CSV: extreme_benchmark_results_v2.csv&lt;/li&gt;
&lt;li&gt;Full JSON: extreme_benchmark_results_v2.json&lt;/li&gt;
&lt;li&gt;Progress Log: extreme_benchmark_progress_v2.log&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The system is now fully profiled and remains online at &lt;a href="http://34.151.215.200:8000" rel="noopener noreferrer"&gt;http://34.151.215.200:8000&lt;/a&gt; with a 64K context window enabled.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;summarize results                                                                                                                              &lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;✦ The absolute scale benchmark for Gemma 4 (26B-A4B-it) on a Cloud TPU v6e-4 (Trillium) cluster demonstrates the system's ability to handle&lt;br&gt;
  massive long-context workloads with industry-leading prefill efficiency.&lt;/p&gt;

&lt;p&gt;🚀 Key Performance Milestones&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Peak Throughput: Reached 498,253 tokens/sec at 256 concurrent users.&lt;/li&gt;
&lt;li&gt;Massive Batching: Successfully processed a single batch of ~67 Million tokens (1024 users × 64K context).&lt;/li&gt;
&lt;li&gt;Prefill Efficiency: The system maintains near-peak throughput (&amp;gt;450K tok/s) even as concurrency increases from 128 to 1024 users.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;📊 Performance Scaling Matrix&lt;br&gt;
  The following table highlights the trade-off between parallel user load and responsiveness (TTFT) at the maximum context depth.&lt;/p&gt;

&lt;p&gt;┌─────────────┬──────────────────┬─────────────┬─────────────────────┐&lt;br&gt;
  │ Concurrency │ 64K Context TTFT │ Prefill TPS │ System Status       │&lt;br&gt;
  ├─────────────┼──────────────────┼─────────────┼─────────────────────┤&lt;br&gt;
  │ 1 User      │ 2.03s            │ 32,213      │ ⚡ Ultra-Fast       │&lt;br&gt;
  │ 64 Users    │ 7.39s            │ 426,396     │ 🟢 Optimal (RAG)    │&lt;br&gt;
  │ 256 Users   │ 20.13s           │ 498,253     │ 🟡 High-Load        │&lt;br&gt;
  │ 512 Users   │ 37.65s           │ 489,235     │ 🟠 Batch Processing │&lt;br&gt;
  │ 1024 Users  │ 74.61s           │ 479,760     │ 🔴 Extreme Load     │&lt;br&gt;
  └─────────────┴──────────────────┴─────────────┴─────────────────────┘&lt;/p&gt;

&lt;p&gt;🔍 Technical Constraints &amp;amp; Observations&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Context Ceiling (64K): Despite engine configuration for 128K, the Gemma-4-26B-A4B-it model weights impose a hard physical limit of 65,536
  tokens. All requests for 128K context returned a Status 400 error.&lt;/li&gt;
&lt;li&gt;Linear scaling to 256 Users: The TPU v6e-4 cluster demonstrates perfect linear scaling up to 256 users. Beyond this point, the system is
  fully saturated, and additional concurrency increases latency without further throughput gains.&lt;/li&gt;
&lt;li&gt;Low Latency RAG: For interactive RAG applications, the 16-64 user tier is the "sweet spot," providing massive 64K context windows with a
  Time-to-First-Token under 8 seconds.&lt;/li&gt;
&lt;li&gt;Hardware Stability: The 4-chip Trillium configuration remained stable throughout the 8-hour stress test, including the injection of over 30
  million tokens in the final tier.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;📁 Data Artifacts&lt;br&gt;
  The full high-resolution dataset is available in the workspace:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;extreme_benchmark_results_v2.csv: Raw metrics for all 88 test combinations.&lt;/li&gt;
&lt;li&gt;extreme_benchmark_progress_v2.log: Execution trace and error logs.&lt;/li&gt;
&lt;/ul&gt;

&lt;blockquote&gt;
&lt;p&gt;did the brain float make a difference                                                                                                          &lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;✦ Yes, switching to bfloat16 made a critical difference in both system stability and hardware efficiency:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;Functional Difference (The "Unlock")&lt;br&gt;
The most immediate difference was that bf16 (the common shorthand) is actually an invalid argument for the current vLLM TPU backend. &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Previous state: Using --dtype bf16 caused the container to exit immediately with code 2.&lt;/li&gt;
&lt;li&gt;Current state: Using --dtype bfloat16 allowed the engine to recognize the data type and successfully initialize the model.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Hardware Efficiency on TPU v6e&lt;br&gt;
Cloud TPU v6e (Trillium) is specifically architected to excel at Brain Floating Point (bfloat16) operations.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Memory Footprint: Using bfloat16 reduces the model weights and KV cache memory usage by 50% compared to float32. This reduction is exactly
what enabled the massive 64K context window to fit into the 128GB HBM of the 4-chip cluster.&lt;/li&gt;
&lt;li&gt;Throughput: TPUs have dedicated matrix multiplication units (MXUs) that process bfloat16 natively at peak speeds. The results we saw—~0.5
Million tokens/sec—would not have been possible with a 32-bit format due to memory bandwidth bottlenecks.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Numerical Stability&lt;br&gt;
Unlike float16, which has a very narrow range (max value ~65K), bfloat16 has the same exponent range as float32. &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;This was crucial for the 64K and 128K context tests. At these extreme lengths, the attention scores can grow quite large; bfloat16 prevents
the "NaN" errors or precision collapses that often occur in float16 during deep context prefill.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ol&gt;
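The 50% memory figure in point 2 is easy to sanity-check with back-of-envelope arithmetic over the 26B weights (KV cache and activations excluded):

```python
# Rough weight-memory footprint for a 26B-parameter model at each
# precision, ignoring the KV cache and activation memory.
params = 26e9

for name, bytes_per_param in (("float32", 4), ("bfloat16", 2)):
    gb = params * bytes_per_param / 1e9
    print(f"{name}: {gb:.0f} GB")
# float32 weights alone (104 GB) would crowd the 128 GB HBM of the
# 4-chip cluster; bfloat16 (52 GB) leaves room for the 64K KV cache.
```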

&lt;p&gt;Summary: Without the switch to bfloat16, the engine wouldn't start; with it, we achieved peak Trillium performance while maintaining the&lt;br&gt;
  numerical precision needed for long-context reasoning.&lt;/p&gt;
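The float16 range gap described above is easy to demonstrate with NumPy. NumPy has no stock bfloat16 type, so float32 (which shares bfloat16's 8-bit exponent, and therefore its range) stands in for it here:

```python
import numpy as np

# float16 tops out at 65504, so a value just past that overflows to
# inf; float32 (same exponent width as bfloat16) handles it fine.
print(np.finfo(np.float16).max)  # 65504.0
print(np.float16(70000.0))       # inf
print(np.float32(70000.0))       # 70000.0
```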

&lt;blockquote&gt;
&lt;p&gt;look at the other runs - was this one with bfloat16 better                                                                                     &lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Comparing Benchmark Performance: &lt;br&gt;
  Comparing the performance of the current 'bfloat16' benchmark against previous runs to quantify improvements in throughput and latency.&lt;/p&gt;

&lt;p&gt;╭────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮&lt;br&gt;
│ ✓  Shell head -n 20 comprehensive_benchmark_results.csv; head -n 20 extreme_benchmark_results_v2.csv                                       │&lt;br&gt;
│                                                                                                                                            │&lt;br&gt;
│ ... first 33 lines hidden (Ctrl+O to show) ...                                                                                             │&lt;br&gt;
│ 2026-05-11T17:04:56.795286,google/gemma-4-26B-A4B-it,8192,2,0.6020536170108244,0.6097863299655728,26868.427832622954,success,              │&lt;br&gt;
│ 2026-05-11T17:04:57.467729,google/gemma-4-26B-A4B-it,16384,2,0.6703764809935819,0.6722787949838676,48741.683129818666,success,             │&lt;br&gt;
│ 2026-05-11T17:04:58.376936,google/gemma-4-26B-A4B-it,32768,2,0.8942851985048037,0.9089556289836764,72100.32911427949,success,              │&lt;br&gt;
│ 2026-05-11T17:04:59.587674,google/gemma-4-26B-A4B-it,65500,2,1.1621350165223703,1.210006160021294,108263.91164628007,success,              │&lt;br&gt;
│ 2026-05-11T17:05:01.089462,google/gemma-4-26B-A4B-it,131000,2,,,0.0,failed,"All 2 requests failed. First error: Status 400: {""error"":{"" │&lt;br&gt;
│ message"":""This model's maximum context length is 65536 tokens. However, you requested 1 o"                                               │&lt;br&gt;
│ 2026-05-11T17:05:01.425868,google/gemma-4-26B-A4B-it,1024,4,0.3295201235014247,0.3329592429799959,12301.805960815711,success,              │&lt;br&gt;
│ 2026-05-11T17:05:01.826889,google/gemma-4-26B-A4B-it,2048,4,0.3966410259890836,0.4008637340157293,20435.87210530389,success,               │&lt;br&gt;
│ 2026-05-11T17:05:02.327203,google/gemma-4-26B-A4B-it,4096,4,0.49463606224162504,0.5001165689900517,32760.36231530235,success,              │&lt;br&gt;
╰────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯&lt;br&gt;
  Analyzing bfloat16 Performance Gains: &lt;br&gt;
  Quantifying the performance impact of 'bfloat16' and 64K context window by comparing current and previous benchmark results.&lt;/p&gt;

&lt;p&gt;╭────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮&lt;br&gt;
│ ✓  Shell Comparing specific context length and concurrency results.                                                                        │&lt;br&gt;
│                                                                                                                                            │&lt;br&gt;
│ 2026-05-11T16:37:14.027070,google/gemma-4-26B-A4B-it,16384,1,0.6328242099843919,0.6337720929877833,25851.564278826365,success              │&lt;br&gt;
│ 2026-05-11T17:04:50.425117,google/gemma-4-26B-A4B-it,16384,1,0.8542688179877587,0.8552422930370085,19157.14427758196,success,              │&lt;br&gt;
│ 2026-05-11T16:37:31.692150,google/gemma-4-26B-A4B-it,32768,16,1.8776373735017842,2.025486672006082,258845.44551494622,success              │&lt;br&gt;
│ 2026-05-11T17:05:23.597761,google/gemma-4-26B-A4B-it,32768,16,1.9288130642489705,2.0350820519961417,257624.993294862,success,              │&lt;br&gt;
│                                                                                                                                            │&lt;br&gt;
╰────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯&lt;/p&gt;

&lt;p&gt;✦ Comparing the bfloat16 run (current) against the previous run, we find that while throughput remains consistently massive, the primary benefit&lt;br&gt;
  of bfloat16 was unlocking extreme context depth rather than increasing raw speed at lower context.&lt;/p&gt;

&lt;p&gt;📊 Head-to-Head Comparison&lt;/p&gt;

&lt;p&gt;┌──────────────────────────┬──────────────────────┬────────────────────────┬───────────────────┐&lt;br&gt;
  │ Config (Context / Users) │ Previous Run (Mixed) │ Current Run (bfloat16) │ Performance Delta │&lt;br&gt;
  ├──────────────────────────┼──────────────────────┼────────────────────────┼───────────────────┤&lt;br&gt;
  │ 16K / 1 User             │ 25,851 tok/s         │ 19,157 tok/s           │ -25% (Latency)    │&lt;br&gt;
  │ 32K / 16 Users           │ 258,845 tok/s        │ 257,624 tok/s          │ ~Stable           │&lt;br&gt;
  │ 64K / 256 Users          │ N/A (Failed)         │ 498,253 tok/s          │ 🚀 Functional Win │&lt;br&gt;
  └──────────────────────────┴──────────────────────┴────────────────────────┴───────────────────┘&lt;/p&gt;

&lt;p&gt;🔍 Key Observations&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Functional Stability: The previous run (and several intermediate attempts) consistently failed or OOM'd when pushing beyond a 32K context
  window. The bfloat16 configuration successfully reached 65,500 tokens at 1024 concurrent users—a workload that represents nearly 67 Million
  active tokens in flight.&lt;/li&gt;
&lt;li&gt;Throughput Plateau: At 32K context with 16 users, the throughput is nearly identical. This suggests the TPU v6e-4's MXU (Matrix Multiply
  Unit) was already reaching its compute-bound peak for that specific matrix size.&lt;/li&gt;
&lt;li&gt;Latency Trade-off: We observed a slight increase in TTFT at very low concurrencies (1 user) with the bfloat16 configuration. This is likely
  due to the overhead of larger KV cache block sizes (increased from 16 to 128 to prevent memory fragmentation in long context).&lt;/li&gt;
&lt;li&gt;The "64K Ceiling": The current run definitiveley proved that while the software is ready for 128K, the current Gemma-4-26B-A4B-it weights
  have a hard-coded physical attention limit at 65,536 tokens.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Conclusion: The bfloat16 run was "better" because it provided deterministic stability at the model's absolute physical limit (64K context) while&lt;br&gt;
  maintaining peak hardware throughput (~0.5M tokens/sec) under massive parallel load.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>devchallenge</category>
      <category>llm</category>
      <category>gemmachallenge</category>
    </item>
    <item>
      <title>MCP Configuration for Google Workspace with Gemini CLI</title>
      <dc:creator>xbill</dc:creator>
      <pubDate>Mon, 11 May 2026 01:14:46 +0000</pubDate>
      <link>https://dev.to/gde/mcp-configuration-for-google-workspace-with-gemini-cli-3nd2</link>
      <guid>https://dev.to/gde/mcp-configuration-for-google-workspace-with-gemini-cli-3nd2</guid>
<description>&lt;p&gt;This article covers the MCP setup and configuration for using Google Workspace and the underlying Gemini LLM with Gemini CLI to enhance and extend Workspace operations over MCP.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqlwwvok0eejz30zpc2zv.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqlwwvok0eejz30zpc2zv.jpeg" width="800" height="437"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h4&gt;
  
  
  What is Google Workspace?
&lt;/h4&gt;

&lt;p&gt;Google Workspace (formerly G Suite) is a cloud-based, subscription-driven productivity and collaboration suite developed by Google, designed for businesses, schools, and organizations. It bundles essential tools — like Gmail, Drive, Docs, Meet, and Calendar — using a custom domain, offering enhanced security, administrative controls, and shared cloud storage.&lt;/p&gt;

&lt;p&gt;More details are here:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://workspace.google.com/" rel="noopener noreferrer"&gt;Google Workspace: Secure Online Productivity &amp;amp; Collaboration Tools&lt;/a&gt;&lt;/p&gt;

&lt;h4&gt;
  
  
  Google Workspace? Isn’t MCP overkill?
&lt;/h4&gt;

&lt;p&gt;Native MCP support in Google Workspace opens up new use cases and flexibility with the entire Workspace suite.&lt;/p&gt;

&lt;p&gt;As of this writing, Google Workspace MCP support is in Developer Preview. You can sign up for access to the Developer Preview here:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://developers.google.com/workspace/preview" rel="noopener noreferrer"&gt;Google Workspace Developer Preview Program | Google for Developers&lt;/a&gt;&lt;/p&gt;

&lt;h4&gt;
  
  
  What is MCP?
&lt;/h4&gt;

&lt;p&gt;Unless you have been living off the grid without Internet access, MCP is the new universal connector and the next “Big Thing”.&lt;/p&gt;

&lt;p&gt;More information is here:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://cloud.google.com/discover/what-is-model-context-protocol" rel="noopener noreferrer"&gt;What is Model Context Protocol (MCP)? A guide&lt;/a&gt;&lt;/p&gt;

&lt;h4&gt;
  
  
  Google MCP Strategy
&lt;/h4&gt;

&lt;p&gt;Google has gone all-in, providing MCP connections for all of its core Cloud services. An overview is here:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://docs.cloud.google.com/mcp/overview" rel="noopener noreferrer"&gt;Google Cloud MCP servers overview | Google Cloud Documentation&lt;/a&gt;&lt;/p&gt;

&lt;h4&gt;
  
  
  Gemini CLI
&lt;/h4&gt;

&lt;p&gt;If it is not pre-installed, you can install the Gemini CLI to interact with the source files and provide real-time assistance:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npm &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-g&lt;/span&gt; @google/gemini-cli
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  Testing the Gemini CLI Environment
&lt;/h4&gt;

&lt;p&gt;Once you have all the tools and the correct Node.js version in place, you can test the startup of Gemini CLI. You will need to authenticate with an API key or your Google Account:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;▝▜▄ Gemini CLI v0.33.1
    ▝▜▄
   ▗▟▀ Logged in with Google /auth
  ▝▀ Gemini Code Assist Standard /upgrade no sandbox (see /docs) /model Auto (Gemini 3) | 239.8 MB
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  Google Skills Repository
&lt;/h4&gt;

&lt;p&gt;Google Skills give your MCP client well-known approaches to working with the core Google products.&lt;/p&gt;

&lt;p&gt;The full details are here:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://cloud.google.com/blog/topics/developers-practitioners/level-up-your-agents-announcing-googles-official-skills-repository" rel="noopener noreferrer"&gt;Level Up Your Agents: Announcing Google's Official Skills Repository | Google Cloud Blog&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;To install the Skills in Gemini CLI:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npx skills &lt;span class="nb"&gt;install &lt;/span&gt;github.com/google/skills
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  What you talkin ‘bout Willis?
&lt;/h4&gt;

&lt;p&gt;That was a lot of setup! But wait, there is more! So what is different about this lab compared to all the others out there?&lt;/p&gt;

&lt;p&gt;This demo is one of the first deep dives into configuring Workspace for MCP. Gemini CLI provides a complete working environment with a full MCP client. Google Workspace exposes the key features of the platform over the MCP layer.&lt;/p&gt;

&lt;h4&gt;
  
  
  Where do I start?
&lt;/h4&gt;

&lt;p&gt;The strategy for configuring Workspace with MCP is an incremental, step-by-step approach.&lt;/p&gt;

&lt;p&gt;First, the Workspace configuration settings are updated. Then, these settings are used to configure Gemini CLI. Finally, Gemini CLI is used as an MCP client to the Workspace environment. Several samples are run using the Workspace MCP Tools directly from Gemini CLI.&lt;/p&gt;

&lt;h4&gt;
  
  
  Sample CodeLabs
&lt;/h4&gt;

&lt;p&gt;Several Code Labs have been developed to step through Google Workspace MCP configuration. A sample is here:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://codelabs.developers.google.com/google-workspace-mcp-antigravity#0" rel="noopener noreferrer"&gt;Google Workspace MCP servers in Antigravity | Google Codelabs&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;and here:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://codelabs.developers.google.com/google-workspace-mcp-gemini-cli#0" rel="noopener noreferrer"&gt;Get started with Google Workspace MCP and Gemini CLI | Google Codelabs&lt;/a&gt;&lt;/p&gt;

&lt;h4&gt;
  
  
  Workspace Developer Tools
&lt;/h4&gt;

&lt;p&gt;Developer Tools are provided to enable working with Google Workspace:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://developers.google.com/workspace/guides/developer-tools" rel="noopener noreferrer"&gt;Google Workspace Developer Tools | Google for Developers&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;To install:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;xbill@penguin:~/workspace&lt;span class="nv"&gt;$ &lt;/span&gt;gemini extensions &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="s2"&gt;"https://github.com/googleworkspace/developer-tools"&lt;/span&gt;
Installing extension &lt;span class="s2"&gt;"google-workspace-developer-tools"&lt;/span&gt;&lt;span class="nb"&gt;.&lt;/span&gt;
This extension will run the following MCP servers:
  &lt;span class="k"&gt;*&lt;/span&gt; workspace-developer &lt;span class="o"&gt;(&lt;/span&gt;remote&lt;span class="o"&gt;)&lt;/span&gt;: https://workspace-developer.goog/mcp

The extension you are about to &lt;span class="nb"&gt;install &lt;/span&gt;may have been created by a third-party developer and sourced from a public repository. Google does not vet, endorse, or guarantee the functionality or security of extensions. Please carefully inspect any extension and its &lt;span class="nb"&gt;source &lt;/span&gt;code before installing to understand the permissions it requires and the actions it may perform.
Do you want to &lt;span class="k"&gt;continue&lt;/span&gt;? &lt;span class="o"&gt;[&lt;/span&gt;Y/n]: 
Extension &lt;span class="s2"&gt;"google-workspace-developer-tools"&lt;/span&gt; installed successfully and enabled.
xbill@penguin:~/workspace&lt;span class="err"&gt;$&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  MCP for Workspace Setup
&lt;/h4&gt;

&lt;p&gt;MCP is available in the core Workspace setup. A getting started guide is available here:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://developers.google.com/workspace/guides/configure-mcp-servers" rel="noopener noreferrer"&gt;Configure the Google Workspace MCP servers | Google for Developers&lt;/a&gt;&lt;/p&gt;

&lt;h4&gt;
  
  
  Setup the Basic Gemini CLI Environment
&lt;/h4&gt;

&lt;p&gt;At this point you should have a working shell environment and a working Gemini CLI installation. All of the relevant code examples and documentation are available on GitHub.&lt;/p&gt;

&lt;p&gt;The next step is to clone the GitHub repository to your local environment:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;cd&lt;/span&gt; ~
git clone https://github.com/xbill9/workspace-mcp
&lt;span class="nb"&gt;cd &lt;/span&gt;workspace-mcp
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then run &lt;strong&gt;init.sh&lt;/strong&gt; from the cloned directory.&lt;/p&gt;

&lt;p&gt;The script will attempt to determine your shell environment and set the correct variables:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;source &lt;/span&gt;init.sh
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If your session times out or you need to re-authenticate, you can run the &lt;strong&gt;set_env.sh&lt;/strong&gt; script to reset your environment variables:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;source &lt;/span&gt;set_env.sh
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Variables like PROJECT_ID need to be set up for use in the various build scripts, so the &lt;strong&gt;set_env&lt;/strong&gt; script can be used to reset the environment if your session times out.&lt;/p&gt;
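&lt;p&gt;As a sketch, the environment the build scripts expect looks roughly like the following. All values are hypothetical placeholders; &lt;strong&gt;set_env.sh&lt;/strong&gt; derives the real ones for your project:&lt;/p&gt;

```shell
# Hypothetical placeholder values -- substitute your own project
# and OAuth client details (created in the steps below)
export PROJECT_ID="my-gcp-project"
export CLIENT_ID="1234567890-abc.apps.googleusercontent.com"
export CLIENT_SECRET="GOCSPX-example-secret"
echo "Active project: ${PROJECT_ID}"
```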

&lt;h4&gt;
  
  
  OAuth Configuration
&lt;/h4&gt;

&lt;p&gt;The Google Workspace MCP servers use OAuth 2.0 for secure authentication. You must configure an OAuth consent screen and create a Desktop application client.&lt;/p&gt;

&lt;h3&gt;
  
  
  Set up the OAuth Consent Screen
&lt;/h3&gt;

&lt;p&gt;Configure the consent screen first.&lt;/p&gt;

&lt;p&gt;In the Google Cloud console, go to Google Auth Platform &amp;gt; Branding. Click Get Started if not configured.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fv4fku8g919kklxq2df1k.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fv4fku8g919kklxq2df1k.png" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Configure App Information:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;App name: Google Workspace MCP Servers&lt;/li&gt;
&lt;li&gt;User support email: Select your email.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fiwgkv92m2g589p3ripq9.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fiwgkv92m2g589p3ripq9.png" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Configure Audience: Select Internal (or External if Internal is not available):&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpsjm5ninvhzgddfswjzi.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpsjm5ninvhzgddfswjzi.png" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Configure Contact Information: Enter your email address.&lt;/p&gt;

&lt;p&gt;Review and agree to the User Data Policy, then click Create.&lt;/p&gt;

&lt;p&gt;If you selected External, go to Audience and add your email under Test users.&lt;/p&gt;

&lt;h4&gt;
  
  
  OAuth Scopes
&lt;/h4&gt;

&lt;p&gt;Go to Data Access &amp;gt; Add or Remove Scopes. Under Manually add scopes, add the following for the servers you want to use:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Google Calendar: &lt;a href="https://www.googleapis.com/auth/calendar.calendarlist.readonly" rel="noopener noreferrer"&gt;https://www.googleapis.com/auth/calendar.calendarlist.readonly&lt;/a&gt;, &lt;a href="https://www.googleapis.com/auth/calendar.events.freebusy" rel="noopener noreferrer"&gt;https://www.googleapis.com/auth/calendar.events.freebusy&lt;/a&gt;, &lt;a href="https://www.googleapis.com/auth/calendar.events.readonly" rel="noopener noreferrer"&gt;https://www.googleapis.com/auth/calendar.events.readonly&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Google Chat: &lt;a href="https://www.googleapis.com/auth/chat.spaces.readonly" rel="noopener noreferrer"&gt;https://www.googleapis.com/auth/chat.spaces.readonly&lt;/a&gt;, &lt;a href="https://www.googleapis.com/auth/chat.memberships.readonly" rel="noopener noreferrer"&gt;https://www.googleapis.com/auth/chat.memberships.readonly&lt;/a&gt;, &lt;a href="https://www.googleapis.com/auth/chat.messages.readonly" rel="noopener noreferrer"&gt;https://www.googleapis.com/auth/chat.messages.readonly&lt;/a&gt;, &lt;a href="https://www.googleapis.com/auth/chat.users.readstate.readonly" rel="noopener noreferrer"&gt;https://www.googleapis.com/auth/chat.users.readstate.readonly&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Google Drive: &lt;a href="https://www.googleapis.com/auth/drive.readonly" rel="noopener noreferrer"&gt;https://www.googleapis.com/auth/drive.readonly&lt;/a&gt;, &lt;a href="https://www.googleapis.com/auth/drive.file" rel="noopener noreferrer"&gt;https://www.googleapis.com/auth/drive.file&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Gmail: &lt;a href="https://www.googleapis.com/auth/gmail.readonly" rel="noopener noreferrer"&gt;https://www.googleapis.com/auth/gmail.readonly&lt;/a&gt;, &lt;a href="https://www.googleapis.com/auth/gmail.compose" rel="noopener noreferrer"&gt;https://www.googleapis.com/auth/gmail.compose&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;People API: &lt;a href="https://www.googleapis.com/auth/directory.readonly" rel="noopener noreferrer"&gt;https://www.googleapis.com/auth/directory.readonly&lt;/a&gt;, &lt;a href="https://www.googleapis.com/auth/userinfo.profile" rel="noopener noreferrer"&gt;https://www.googleapis.com/auth/userinfo.profile&lt;/a&gt;, &lt;a href="https://www.googleapis.com/auth/contacts.readonly" rel="noopener noreferrer"&gt;https://www.googleapis.com/auth/contacts.readonly&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  Create an OAuth Client ID
&lt;/h4&gt;

&lt;p&gt;Go to Google Auth Platform &amp;gt; Clients &amp;gt; Create Client.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fk99af01yhe3nkzvpsq7j.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fk99af01yhe3nkzvpsq7j.png" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Select &lt;strong&gt;Desktop app&lt;/strong&gt; as the application type. Name it Gemini CLI:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgc29yiwar0ft6lkw0ifk.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgc29yiwar0ft6lkw0ifk.png" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Click Create and copy the Client ID and Client Secret; you will need these in the next step (the actual values are hidden in the screenshot).&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fulw5ofri29edehosetfy.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fulw5ofri29edehosetfy.png" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h4&gt;
  
  
  Gemini CLI Settings
&lt;/h4&gt;

&lt;p&gt;Here is the sample Gemini CLI setup:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"mcpServers"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"gmail"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"httpUrl"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"https://gmailmcp.googleapis.com/mcp/v1"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"oauth"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"enabled"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"clientId"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"${CLIENT_ID}"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"clientSecret"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"${CLIENT_SECRET}"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"scopes"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
          &lt;/span&gt;&lt;span class="s2"&gt;"https://www.googleapis.com/auth/gmail.readonly"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
          &lt;/span&gt;&lt;span class="s2"&gt;"https://www.googleapis.com/auth/gmail.compose"&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"drive"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"httpUrl"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"https://drivemcp.googleapis.com/mcp/v1"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"oauth"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"enabled"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"clientId"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"${CLIENT_ID}"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"clientSecret"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"${CLIENT_SECRET}"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"scopes"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
          &lt;/span&gt;&lt;span class="s2"&gt;"https://www.googleapis.com/auth/drive.readonly"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
          &lt;/span&gt;&lt;span class="s2"&gt;"https://www.googleapis.com/auth/drive.file"&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"calendar"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"httpUrl"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"https://calendarmcp.googleapis.com/mcp/v1"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"oauth"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"enabled"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"clientId"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"${CLIENT_ID}"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"clientSecret"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"${CLIENT_SECRET}"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"scopes"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
          &lt;/span&gt;&lt;span class="s2"&gt;"https://www.googleapis.com/auth/calendar.calendarlist.readonly"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
          &lt;/span&gt;&lt;span class="s2"&gt;"https://www.googleapis.com/auth/calendar.events.freebusy"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
          &lt;/span&gt;&lt;span class="s2"&gt;"https://www.googleapis.com/auth/calendar.events.readonly"&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"people"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"httpUrl"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"https://people.googleapis.com/mcp/v1"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"oauth"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"enabled"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"clientId"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"${CLIENT_ID}"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"clientSecret"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"${CLIENT_SECRET}"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"scopes"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
          &lt;/span&gt;&lt;span class="s2"&gt;"https://www.googleapis.com/auth/directory.readonly"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
          &lt;/span&gt;&lt;span class="s2"&gt;"https://www.googleapis.com/auth/userinfo.profile"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
          &lt;/span&gt;&lt;span class="s2"&gt;"https://www.googleapis.com/auth/contacts.readonly"&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  Initial Connection
&lt;/h4&gt;

&lt;p&gt;Start up Gemini CLI after all the environment variables have been set:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;xbill@penguin:~/workspace-mcp&lt;span class="nv"&gt;$ &lt;/span&gt;gemini

 ▝▜▄ Gemini CLI v0.40.0
   ▝▜▄
  ▗▟▀ Signed &lt;span class="k"&gt;in &lt;/span&gt;with Google /auth
 ▝▀ Plan: Gemini Code Assist Standard /upgrade
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then run the &lt;strong&gt;/mcp list&lt;/strong&gt; command:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt; &amp;gt; /mcp list                                                                                                                                         

Configured MCP servers:

🟢 gmail - Ready (10 tools) (OAuth not authenticated)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;OAuth has not yet been explicitly authorized for this server, so it shows as not authenticated.&lt;/p&gt;

&lt;h4&gt;
  
  
  OAuth MCP Connection Settings
&lt;/h4&gt;

&lt;p&gt;To simplify the OAuth setup, a prebuilt settings.json has been provided for Gemini CLI. The OAuth client values need to be set in the current test environment.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;cat &lt;/span&gt;save_oauth.sh
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
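&lt;p&gt;The contents of save_oauth.sh are not shown above. As a rough illustration only (the oauth.env file name and the placeholder values here are assumptions, not the project's actual script), a script like this would persist the client values so the ${CLIENT_ID} and ${CLIENT_SECRET} placeholders in settings.json can be expanded:&lt;br&gt;
&lt;/p&gt;

```shell
# Hypothetical sketch of save_oauth.sh: store the OAuth client values
# in a sourceable env file. Values shown are placeholders.
CLIENT_ID="1234567890-example.apps.googleusercontent.com"
CLIENT_SECRET="example-secret"

cat > oauth.env <<EOF
export CLIENT_ID="${CLIENT_ID}"
export CLIENT_SECRET="${CLIENT_SECRET}"
EOF

# Load the values into the current shell before starting Gemini CLI.
. ./oauth.env
echo "OAuth client configured: ${CLIENT_ID}"
```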



&lt;p&gt;OAuth needs to be enabled for each service. For example, for Google Drive:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt; &amp;gt; /mcp auth drive

ℹ Starting OAuth authentication for MCP server 'drive'...

╭───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ │
│ Authentication required for MCP Server: 'drive.' Opening authentication page in your browser. │
│ │
│ Do you want to continue? │
│ │
│ ● 1. Yes │
│ 2. No │
│ │
│ Enter to select · ↑/↓ to navigate · Esc to cancel │
╰───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1pwqbo9hm9y9u2p87o5r.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1pwqbo9hm9y9u2p87o5r.png" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The browser then shows that authentication was successful:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fybtbw4oigfg3dj447t71.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fybtbw4oigfg3dj447t71.png" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;and Gemini CLI will return:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;ℹ Starting OAuth authentication for MCP server 'drive'...

ℹ → Opening your browser for OAuth sign-in...
  If the browser does not open, copy and paste this URL into your browser:
  https://accounts.google.com/o/oauth2/v2/auth?client_id=1056842563084-862rokg1g71ci9mli67el70ejmbkrhik.apps.googleusercontent.com&amp;amp;response_type=code
  &amp;amp;redirect_uri=http%3A%2F%2Flocalhost%3A39387%2Foauth%2Fcallback&amp;amp;state=wedrlIdjzr30cXEequQ2VQ&amp;amp;code_challenge=I4duUiNR0msHZXxgnODkcl-7yzMjeMOcgKCp6KT
  5YLc&amp;amp;code_challenge_method=S256&amp;amp;scope=https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fdrive.readonly+https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fdrive.file
  &amp;amp;resource=https%3A%2F%2Fdrivemcp.googleapis.com%2Fmcp%2Fv1
  💡 TIP: Triple-click to select the entire URL, then copy and paste it into your browser.
  ⚠️ Make sure to copy the COMPLETE URL - it may wrap across multiple lines.

ℹ ✅ Successfully authenticated with MCP server 'drive'!

ℹ Restarting MCP server 'drive'...

ℹ Successfully authenticated and reloaded tools for 'drive'
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
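&lt;p&gt;The authorization URL above carries a PKCE code_challenge with code_challenge_method=S256. As background, this is the standard RFC 7636 derivation, shown here with openssl for illustration (this is not Gemini CLI's internal code):&lt;br&gt;
&lt;/p&gt;

```shell
# PKCE (RFC 7636): the client keeps a random verifier secret and sends only
# the base64url-encoded (unpadded) SHA-256 digest of it as code_challenge.
verifier=$(openssl rand -base64 32 | tr '+/' '-_' | tr -d '=\n')
challenge=$(printf '%s' "$verifier" | openssl dgst -sha256 -binary \
  | openssl base64 | tr '+/' '-_' | tr -d '=\n')

echo "code_challenge=${challenge}&code_challenge_method=S256"
```

After the user approves in the browser, the server later verifies that the SHA-256 of the verifier sent with the token request matches this challenge.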



&lt;p&gt;Repeat the OAuth authorization for each remaining Workspace service.&lt;/p&gt;

&lt;h4&gt;
  
  
  Validate The Setup
&lt;/h4&gt;

&lt;p&gt;Once MCP OAuth authentication has been enabled for the Workspace services, you can verify them with Gemini CLI:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt; &amp;gt; /mcp list                                                                                                                                         

Configured MCP servers:

🟢 gmail - Ready (10 tools) (OAuth)
  Tools:
  - mcp_gmail_create_draft
  - mcp_gmail_create_label
  - mcp_gmail_get_thread
  - mcp_gmail_label_message
  - mcp_gmail_label_thread
  - mcp_gmail_list_drafts
  - mcp_gmail_list_labels
  - mcp_gmail_search_threads
  - mcp_gmail_unlabel_message
  - mcp_gmail_unlabel_thread

🟢 drive - Ready (8 tools) (OAuth)
  Tools:
  - mcp_drive_copy_file
  - mcp_drive_create_file
  - mcp_drive_download_file_content
  - mcp_drive_get_file_metadata
  - mcp_drive_get_file_permissions
  - mcp_drive_list_recent_files
  - mcp_drive_read_file_content
  - mcp_drive_search_files

🟢 calendar - Ready (8 tools) (OAuth)
  Tools:
  - mcp_calendar_create_event
  - mcp_calendar_delete_event
  - mcp_calendar_get_event
  - mcp_calendar_list_calendars
  - mcp_calendar_list_events
  - mcp_calendar_respond_to_event
  - mcp_calendar_suggest_time
  - mcp_calendar_update_event

🟢 people - Ready (3 tools) (OAuth)
  Tools:
  - mcp_people_get_user_profile
  - mcp_people_search_contacts
  - mcp_people_search_directory_people

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  All Right Already! Can you actually do something with Workspace?
&lt;/h4&gt;

&lt;p&gt;At this point, Gemini CLI is set up with the Workspace MCP server and has direct access to all of its MCP tools. Results from any tool invocation flow directly back into the Gemini CLI context.&lt;/p&gt;

&lt;h4&gt;
  
  
  Calendar
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;&amp;gt; what meetings do I have tomorrow?

  Checking Tomorrow's Calendar: Retrieving calendar events for tomorrow.

╰───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
╭───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ Action Required │
│ │
│ ? list_events (calendar MCP Server) {"startTime":"2026-05-11T00:00:00Z","endTime":"2026-05-12T00:00:00Z"} │
│ │
│ MCP Server: calendar │
│ Tool: list_events │
│ │
│ MCP Tool Details: │
│ (press Ctrl+O to expand MCP tool details) │
│ Allow execution of MCP tool "list_events" from server "calendar"? │
│ │
│ ● 1. Allow once │
│ 2. Allow tool for this session │
│ 3. Allow all server tools for this session │
│ 4. Allow tool for all future sessions ~/.gemini/policies/auto-saved.toml │
│ 5. No, suggest changes (esc) │
╰───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯

✦ You have two meetings scheduled for tomorrow, Monday, May 11, 2026:

   1. All Staff Morning Meeting
       * Time: 9:30 AM – 10:30 AM EDT
       * Location: Zoom Meeting
       * Description: Daily staff meeting. The Zoom link and call-in details are included in the event.

   2. [TPU] Office Hours - Europe/America
       * Time: 1:30 PM – 2:00 PM EDT
       * Location: Google Meet (https://meet.google.com/mri-ovop-cen)
       * Description: TPU Office Hours for a Europe/America friendly time.

  Please note that your response status for both meetings is currently marked as "needs action."

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  Gmail
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;&amp;gt; summarize last 3 emails from Richard Seroter's Architecture Musings

  Summarizing Architecture Musings Emails: Searching for and summarizing the last 3 emails from "Richard Seroter's Architecture Musings".

╰───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
╭───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ Action Required │
│ │
│ ? search_threads (gmail MCP Server) {"query":"from:\"Richard Seroter's Architecture Musings\"","pageSize":3} │
│ │
│ MCP Server: gmail │
│ Tool: search_threads │
│ │
│ MCP Tool Details: │
│ (press Ctrl+O to expand MCP tool details) │
│ Allow execution of MCP tool "search_threads" from server "gmail"? │
│ │
│ ● 1. Allow once │
│ 2. Allow tool for this session │
│ 3. Allow all server tools for this session │
│ 4. Allow tool for all future sessions ~/.gemini/policies/auto-saved.toml │
│ 5. No, suggest changes (esc) │
╰───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯

✦ Here is a summary of the last three editions of Richard Seroter's Daily Reading List:

  1. Daily Reading List – May 8, 2026 (#780)
   * AI &amp;amp; Cognitive Debt: Discussion on "Cognitive Surrender," where relying too heavily on AI output can lead to cognitive debt.
   * Job Market: Articles debunking the "AI Job Apocalypse" as a fantasy, noting that tech job postings have hit a 3-year high.
   * Engineering Patterns: Modernizing legacy Express apps to Next.js using AI agents and Google Antigravity.
   * Infrastructure: Updates on GKE (faster node startup) and Bigtable (new in-memory tier for sub-millisecond reads).
   * Industry Shifts: Pinecone is moving away from RAG mainstreaming toward pushing knowledge upstream into agent artifacts.

  2. Daily Reading List – May 7, 2026 (#779)
   * AI Monetization: Why traditional SaaS "freemium" models don't work for AI, with a shift toward upselling based on usage intensity and outcomes.
   * Reliability: Lessons from Airbnb on monitoring at scale and breaking circular dependencies in observability stacks.
   * Agent Architectures: Guidance on choosing between REST APIs, MCP, and Agent-to-Agent (A2A) architectures (defaulting to REST unless reasoning
     agents need more).
   * New Tools: Gemini 3.1 Flash-Lite is now GA on the Gemini Enterprise Agent Platform.
   * Kubernetes: A new GKE feature allows using "Pod Snapshots" to eliminate cold start latencies.

  3. Daily Reading List – May 6, 2026 (#778)
   * Agent Skills: A look at twenty agent skills that serve the entire SDLC and force agents through standard engineering stages.
   * AI-Native Orgs: Designing organizations for AI-native engineering involving shorter planning cycles and smaller, more diverse squads.
   * Agentic Coding: A debate on whether agentic coding is a "trap" vs. Google’s success in achieving 6x faster code migrations (TensorFlow to JAX)
     using specialized agents.
   * Cloud Interop: AWS MCP Server is now generally available, highlighting the industry-wide move toward MCP support.
   * Security: New IAM features focused on security, governance, and runtime defense.

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  Drive
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;╭───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ Action Required │
│ │
│ ? create_file (drive MCP Server) {"textContent":"# Gemini CLI v0.41.0 Release Highlights (May 5, 2026)\n\nThe latest release of Gemini CLI brin… │
│ │
│ MCP Server: drive │
│ Tool: create_file │
│ │
│ MCP Tool Details: │
│ (press Ctrl+O to expand MCP tool details) │
│ Allow execution of MCP tool "create_file" from server "drive"? │
│ │
│ 1. Allow once │
│ 2. Allow tool for this session │
│ 3. Allow all server tools for this session │
│ ● 4. Allow tool for all future sessions ~/.gemini/policies/auto-saved.toml │
│ 5. No, suggest changes (esc) │
╰───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯

✦ I've created a new Google Doc with the highlights from the latest Gemini CLI release (v0.41.0).

  You can view it here: Gemini CLI v0.41.0 Release Highlights
  (https://docs.google.com/document/d/1HrDB06S2-mGdORNvItNuOwJidX9zazLgoOCN4eLLPpc/edit?usp=drivesdk)

  Highlights included in the document:
   * Real-time Voice Mode: Now supports both cloud and local backends for terminal-based voice interactions.
   * Secure Environment Loading: Enforced workspace trust and secured .env loading for headless operations.
   * Advanced Shell Validation: Added a core tools allowlist and enhanced validation for improved security.
   * Bonus Recap (v0.40.0): Also included notes on offline search with ripgrep, colorblind-friendly themes, and the new four-tier memory management
     system.

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  People
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="err"&gt;Searching&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;Contacts&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;for&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;Lenovo:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;Searching&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;for&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Lenovo"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;in&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;the&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;user's&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;contacts.&lt;/span&gt;&lt;span class="w"&gt;

&lt;/span&gt;&lt;span class="err"&gt;╭───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="err"&gt;│&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;✓&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;search_contacts&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;(people&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;MCP&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;Server)&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"query"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="s2"&gt;"Lenovo"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;│&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="err"&gt;│&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;│&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="err"&gt;│&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;│&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="err"&gt;│&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"results"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;│&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="err"&gt;│&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;│&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="err"&gt;│&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"email"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Lenovo@enews.lenovo.com"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;│&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="err"&gt;│&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;│&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="err"&gt;│&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;│&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="err"&gt;│&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;│&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="err"&gt;╰───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯&lt;/span&gt;&lt;span class="w"&gt;

&lt;/span&gt;&lt;span class="err"&gt;✦&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;I&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;found&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;one&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;contact&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;matching&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;Lenovo:&lt;/span&gt;&lt;span class="w"&gt;

   &lt;/span&gt;&lt;span class="err"&gt;*&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;Email:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;Lenovo@enews.lenovo.com&lt;/span&gt;&lt;span class="w"&gt;

  &lt;/span&gt;&lt;span class="err"&gt;Is&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;this&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;the&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;contact&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;you&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;were&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;looking&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;for?&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  Final Gemini CLI Code Review
&lt;/h4&gt;

&lt;p&gt;As a final step, Gemini CLI was used for a full code review of the project:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;✦ This project is a developer workspace designed to build, manage, and authorize Model Context Protocol (MCP) agents within a Google Cloud (GCP)
  environment.

  Its primary purpose is to simplify the setup of MCP servers that allow AI models to interact with Google Workspace services (Gmail, Drive,
  Calendar, etc.).

  Key Capabilities
   - Automated Environment Setup: The init.sh script handles GCP project configuration, including enabling necessary APIs like Gmail, Drive, and the
     corresponding MCP APIs.
   - Authentication Management: Scripts like set_adc.sh and save_oauth.sh manage Application Default Credentials (ADC) and OAuth secrets, ensuring
     secure access to your Google data.
   - MCP Authorization Flow: mcp_setup.sh provides the specific commands needed to authorize the Gemini CLI to use these Workspace services via MCP.
   - Workspace Integration: It bridges your GCP infrastructure with the Gemini CLI, enabling tools that can read your email, schedule meetings, or
     search your Drive.

  Core Components
   - GEMINI.md: Contains the "Mandates" and tech stack rules for developing in this workspace.
   - .env Management: Local state (like your GOOGLE_CLOUD_PROJECT ID) is managed through scripts to keep your environment consistent.
   - Initialization Scripts: A suite of bash scripts to handle the "plumbing" of GCP and MCP setup.

  Essentially, it turns your local terminal and Gemini CLI into a powerful agentic platform capable of performing complex tasks across your Google
  Workspace.

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  Summary
&lt;/h4&gt;

&lt;p&gt;Gemini CLI was configured as an MCP client using the MCP Server for Google Workspace. The MCP connection was then used to reach the Workspace environment and test basic operations against each service. Finally, Gemini CLI performed a complete code review of the project.&lt;/p&gt;

</description>
      <category>googleworkspace</category>
      <category>oauth2</category>
      <category>mcps</category>
      <category>gemini</category>
    </item>
    <item>
      <title>Gemma-4-31B on v6e-4 TPU Benchmarks</title>
      <dc:creator>xbill</dc:creator>
      <pubDate>Fri, 08 May 2026 16:57:09 +0000</pubDate>
      <link>https://dev.to/gde/gemma4-5fon</link>
      <guid>https://dev.to/gde/gemma4-5fon</guid>
      <description>&lt;p&gt;&lt;em&gt;This is a submission for the &lt;a href="https://dev.to/challenges/google-gemma-2026-05-06"&gt;Gemma 4 Challenge: Build with Gemma 4&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;model: Gemma-4-31B&lt;/p&gt;

&lt;h1&gt;
  
  
  🚀 Gemma 4 TPU v6e-4 Performance Report
&lt;/h1&gt;

&lt;h2&gt;
  
  
  📋 Deployment Overview
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Model:&lt;/strong&gt; google/gemma-4-31B-it&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Hardware:&lt;/strong&gt; Cloud TPU v6e-4 (Trillium)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Runtime:&lt;/strong&gt; v2-alpha-tpuv6e (Flex-start)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;TPU Location:&lt;/strong&gt; southamerica-east1-c&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Serving Engine:&lt;/strong&gt; vLLM (v0.20.2rc1.dev111+g8eb401134)&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  📊 Performance Summary (C1 - C1024)
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Peak Prefill Throughput:&lt;/strong&gt; 463,345 tokens/sec&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Avg TTFT (~1.6k tokens):&lt;/strong&gt; 2.597 seconds&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Avg TTFT (16k tokens):&lt;/strong&gt; 4.775 seconds&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  📈 Concurrency Scaling Matrix (Mean per Concurrency)
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;concurrency&lt;/th&gt;
&lt;th&gt;avg_ttft&lt;/th&gt;
&lt;th&gt;prefill_tps&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;0.546599&lt;/td&gt;
&lt;td&gt;14778.3&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;0.562068&lt;/td&gt;
&lt;td&gt;28121.7&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;0.595823&lt;/td&gt;
&lt;td&gt;51869.1&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;8&lt;/td&gt;
&lt;td&gt;0.679816&lt;/td&gt;
&lt;td&gt;88055.5&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;16&lt;/td&gt;
&lt;td&gt;0.872466&lt;/td&gt;
&lt;td&gt;133697&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;32&lt;/td&gt;
&lt;td&gt;1.16488&lt;/td&gt;
&lt;td&gt;191631&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;64&lt;/td&gt;
&lt;td&gt;1.55596&lt;/td&gt;
&lt;td&gt;261802&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;128&lt;/td&gt;
&lt;td&gt;2.15464&lt;/td&gt;
&lt;td&gt;328909&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;256&lt;/td&gt;
&lt;td&gt;3.55723&lt;/td&gt;
&lt;td&gt;352654&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;512&lt;/td&gt;
&lt;td&gt;7.59987&lt;/td&gt;
&lt;td&gt;318854&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;1024&lt;/td&gt;
&lt;td&gt;21.005&lt;/td&gt;
&lt;td&gt;240170&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
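&lt;p&gt;The scaling behavior in the table can be summarized as parallel efficiency: measured throughput divided by ideal linear scaling from the single-request baseline. A short calculation over the numbers above (the data is copied directly from the table):&lt;br&gt;
&lt;/p&gt;

```shell
# Parallel-efficiency check over the mean prefill throughput per concurrency.
# Column 1: concurrency, column 2: prefill tokens/sec (from the table above).
awk 'BEGIN { print "conc   efficiency-vs-linear" }
NR == 1 { base = $2 }
{ printf "C%-5d %5.1f%%\n", $1, 100 * $2 / ($1 * base) }' <<'EOF'
1 14778.3
2 28121.7
4 51869.1
8 88055.5
16 133697
32 191631
64 261802
128 328909
256 352654
512 318854
1024 240170
EOF
```

Efficiency is near-linear through C2-C8, then tapers as batching saturates the chips, and aggregate throughput itself declines past C256.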

&lt;h2&gt;
  
  
  🔍 Key Findings
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Throughput Saturation:&lt;/strong&gt; Maximum throughput was achieved at concurrency 256, reaching &lt;strong&gt;463,345 tok/s&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Trillium Scalability:&lt;/strong&gt; The TPU v6e-4 architecture handled 1024 concurrent requests without memory exhaustion, maintaining throughput stability even under extreme queueing.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Responsive Context:&lt;/strong&gt; Even at 16k tokens, the TTFT remained under 1 second for low concurrencies (C1-C8).&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  💸 Cost Efficiency
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Estimated Hourly Cost:&lt;/strong&gt; ~$5.40 (Flex-start rate for v6e-4)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Throughput Efficiency:&lt;/strong&gt; ~308,000,000 tokens per dollar at peak saturation.&lt;/li&gt;
&lt;/ul&gt;
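&lt;p&gt;The tokens-per-dollar figure follows directly from the peak throughput and the hourly rate. A quick sanity check (the $5.40/hour rate used here is an assumption, chosen to be consistent with the ~308M tokens/dollar figure above):&lt;br&gt;
&lt;/p&gt;

```shell
# Tokens per dollar = peak tokens/sec * 3600 sec/hr / cost per hour.
PEAK_TPS=463345      # peak prefill throughput, tokens/sec (from the report)
HOURLY_COST=5.40     # assumed Flex-start rate for v6e-4, USD/hour

awk -v tps="$PEAK_TPS" -v cost="$HOURLY_COST" \
  'BEGIN { printf "~%.1f million tokens per dollar\n", tps * 3600 / cost / 1e6 }'
```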




&lt;p&gt;&lt;em&gt;Report generated by Gemini CLI on 2026-05-08.&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  ⚖️ Competitive Analysis: Dense (31B) vs. MoE (26B A4B)
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Metric&lt;/th&gt;
&lt;th&gt;Gemma 4 31B (Dense)&lt;/th&gt;
&lt;th&gt;Gemma 4 26B (MoE)&lt;/th&gt;
&lt;th&gt;Winner&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Model Architecture&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Dense (31B parameters)&lt;/td&gt;
&lt;td&gt;Sparse (26B Total / 3.8B Active)&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;MoE&lt;/strong&gt; (Efficiency)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Peak Throughput (TPU v6e-4)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;463,345 tok/s&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;~457,000 tok/s&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;Dense&lt;/strong&gt; (Slightly)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Interactive Latency (TTFT)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;0.314s&lt;/strong&gt; (at C1/128t)&lt;/td&gt;
&lt;td&gt;&amp;lt; 1.200s (Interactive)&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;Dense&lt;/strong&gt; (Low Load)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Active Compute cost&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;31B params / token&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;3.8B params / token&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;MoE&lt;/strong&gt; (7.5x lower)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Max Context Window&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;64K (Tested to 16K)&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;256K&lt;/strong&gt; (Shared KV Cache)&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;MoE&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;Analysis Summary&lt;/strong&gt;
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Throughput Parity:&lt;/strong&gt; Our benchmarks show that the &lt;strong&gt;31B Dense model&lt;/strong&gt; actually matches or slightly exceeds the peak throughput of the &lt;strong&gt;26B MoE model&lt;/strong&gt; on the same TPU v6e-4 hardware. This indicates exceptional hardware-software co-optimization for dense matrix operations in the Trillium architecture.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Compute Efficiency:&lt;/strong&gt; While throughput is similar, the &lt;strong&gt;MoE model is 7.5x more compute-efficient&lt;/strong&gt; per token generated (activating only 3.8B parameters). In a multi-tenant environment, the MoE model would likely sustain higher concurrent user counts before hitting power or thermal limits.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Latency Advantage:&lt;/strong&gt; The Dense model demonstrates superior snappiness for low-load interactive tasks, with a TTFT of &lt;strong&gt;0.314s&lt;/strong&gt;, which is significantly below the MoE target of 1.2s.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Context Scaling:&lt;/strong&gt; The MoE model's Shared KV Cache allows it to scale to &lt;strong&gt;256K tokens&lt;/strong&gt;, whereas our Dense stack is currently optimized for high-throughput within the 16K-64K range.&lt;/li&gt;
&lt;/ol&gt;

</description>
      <category>ai</category>
      <category>devchallenge</category>
      <category>google</category>
      <category>gemmachallenge</category>
    </item>
    <item>
      <title>Gemma-4-26B on v6e-4 TPU Benchmarks</title>
      <dc:creator>xbill</dc:creator>
      <pubDate>Thu, 07 May 2026 20:22:23 +0000</pubDate>
      <link>https://dev.to/gde/gemma-4-26b-on-v6e-4-tpu-benchmarks-3cah</link>
      <guid>https://dev.to/gde/gemma-4-26b-on-v6e-4-tpu-benchmarks-3cah</guid>
      <description>&lt;p&gt;&lt;em&gt;This is a submission for the &lt;a href="https://dev.to/challenges/google-gemma-2026-05-06"&gt;Gemma 4 Challenge: Build with Gemma 4&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;model: google/gemma-4-26B-A4B-it&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;gemini "run the context benchmark with concurrency from 1, 2, 4, 8, 16, 32,64, 128, 256, 512, 1024  up to 16384 tokens"                         &lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Extreme Stress Test at 1024 Concurrency: Execute an extreme-load benchmark sweep up to concurrency 1024 to find the breaking point.&lt;/p&gt;

&lt;p&gt;╭─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮&lt;br&gt;
│ ✓  Shell Execute an extreme-load multi-dimensional benchmark sweep up to concurrency 1024. Steps reduced to 2 (0, 8192, 16384) to manage t… │&lt;br&gt;
│                                                                                                                                             │&lt;br&gt;
│ ... first 89 lines hidden (Ctrl+O to show) ...                                                                                              │&lt;br&gt;
│ |             128 |           256 |   1.14308  |     18554.9   |                                                                            │&lt;br&gt;
│ |            8192 |           256 |   3.15647  |    419771     |                                                                            │&lt;br&gt;
│ |           16384 |           256 |   5.58021  |    457101     |                                                                            │&lt;br&gt;
│ |             128 |           512 |   3.80498  |     11249.3   |                                                                            │&lt;br&gt;
│ |            8192 |           512 |   6.69551  |    374758     |                                                                            │&lt;br&gt;
│ |           16384 |           512 |  10.7299   |    443335     |                                                                            │&lt;br&gt;
│ |             128 |          1024 |  14.7126   |      5878.47  |                                                                            │&lt;br&gt;
│ |            8192 |          1024 |  19.6215   |    267038     |                                                                            │&lt;br&gt;
│ |           16384 |          1024 |  24.6752   |    388684     |                                                                            │&lt;br&gt;
╰─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯&lt;/p&gt;

&lt;p&gt;✦ The extreme-load benchmark sweep up to 1024 concurrency is complete. We have successfully identified the breaking point of the current TPU v6e&lt;br&gt;
  vLLM stack.&lt;/p&gt;

&lt;p&gt;📊 Extreme Load Performance Summary&lt;/p&gt;

&lt;p&gt;┌─────────────┬──────────────────┬──────────────┬───────────────────┐&lt;br&gt;
  │ concurrency │ context (tokens) │ avg_ttft (s) │ batch prefill_tps │&lt;br&gt;
  ├─────────────┼──────────────────┼──────────────┼───────────────────┤&lt;br&gt;
  │ 128         │ 16,384           │ 3.30s        │ 439,833           │&lt;br&gt;
  │ 256         │ 16,384           │ 5.58s        │ 457,101           │&lt;br&gt;
  │ 512         │ 16,384           │ 10.73s       │ 443,335           │&lt;br&gt;
  │ 1024        │ 16,384           │ 24.68s       │ 388,684           │&lt;br&gt;
  └─────────────┴──────────────────┴──────────────┴───────────────────┘&lt;br&gt;
  📉 Breaking Point Analysis&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Saturation Point: The aggregate throughput peaked at 457,101 tokens/second at 256 concurrency. Beyond this point, throughput begins to decline
 as the system becomes bottlenecked by request management overhead and scheduling.&lt;/li&gt;
&lt;li&gt;The "Knee" of the Curve:

&lt;ul&gt;
&lt;li&gt;Throughput Decline: Moving from 256 to 1024 concurrency resulted in a 15% drop in aggregate throughput (457k to 388k).&lt;/li&gt;
&lt;li&gt;Latency Explosion: The Time to First Token (TTFT) more than quadrupled, jumping from ~5.6s at concurrency 256 to nearly 25 seconds at concurrency 1024.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;Small Context Collapse: At 1024 concurrency, even short 128-token requests took 14.7 seconds to start generating, compared to just 0.3s at low
 concurrency. This indicates the server's internal queues are completely saturated.&lt;/li&gt;

&lt;li&gt;Robustness: Remarkably, the server did not crash or OOM even under 1024 concurrent 16k requests (a total of 16.7 million tokens in flight).
 This speaks to the robustness of the vLLM-TPU implementation and the underlying Trillium hardware.&lt;/li&gt;

&lt;/ul&gt;

&lt;p&gt;🏁 Final Recommendation&lt;br&gt;
  For this google/gemma-4-26B-A4B-it deployment on a 4-chip TPU v6e pod:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Optimal High-Throughput: Target 128-256 concurrency. This yields the highest efficiency (~440k-457k tps) with acceptable latency (3s-6s).&lt;/li&gt;
&lt;li&gt;Optimal Interactive: Target 1-16 concurrency. This keeps TTFT under 1.2s while still processing up to 200k tokens per second.&lt;/li&gt;
&lt;li&gt;Avoid: Concurrencies above 512, as latency becomes prohibitive (&amp;gt;10s) and throughput starts to degrade significantly.&lt;/li&gt;
&lt;/ul&gt;
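&lt;p&gt;The saturation analysis is easy to reproduce from the summary table. The sketch below (plain Python; the numbers are copied from the 16,384-token rows above) locates the throughput peak and the decline at the highest tested concurrency:&lt;/p&gt;

```python
# (concurrency, avg TTFT in s, batch prefill tok/s) at 16,384-token context,
# copied from the extreme-load summary table above.
rows = [
    (128, 3.30, 439_833),
    (256, 5.58, 457_101),
    (512, 10.73, 443_335),
    (1024, 24.68, 388_684),
]

# Saturation point: the concurrency level with the highest aggregate throughput.
peak = max(rows, key=lambda r: r[2])

# Throughput decline from the peak to the highest tested concurrency.
last = rows[-1]
decline = (peak[2] - last[2]) / peak[2]

print(f"peak: {peak[2]:,} tok/s at concurrency {peak[0]}")
print(f"decline at concurrency {last[0]}: {decline:.0%}")  # -> 15%
```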

&lt;p&gt;✦ The visualization of our extreme stress test (up to 1024 concurrency) provides a clear picture of the TPU v6e's performance boundaries:&lt;/p&gt;

&lt;p&gt;📈 Visual Summary&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;Avg TTFT (s) vs. Context Length&lt;br&gt;
The plot shows several distinct curves corresponding to the different concurrency levels. &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Low-Middle Curves: For concurrencies 1–128, the lines remain flat and clustered near the bottom (sub-4s), indicating the system is well within
its operational limits.&lt;/li&gt;
&lt;li&gt;Top Curves: The lines representing 512 and 1024 concurrency show a sharp vertical shift, climbing up to the 25-second mark. This illustrates
the "queue wait" effect, where requests spend significantly more time waiting for compute resources than being processed.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Prefill Throughput (tok/s) vs. Context Length&lt;br&gt;
This plot displays the aggregate system efficiency:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Saturation Peak: You can see the density of the curves increasing as aggregate throughput approaches 450,000–475,000 tokens/second.&lt;/li&gt;
&lt;li&gt;Performance Inversion: At the highest concurrency (1024), the throughput curve actually dips below the 128/256 curves. This visual "crossover"
confirms that the system has moved past its efficiency peak and is now incurring a penalty for managing too many simultaneous requests.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;🏁 Final Project State&lt;br&gt;
  The benchmark suite is now a robust, multi-dimensional performance analysis engine:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Enhanced context_benchmark.py: Supports deep sweeps of prompt length × concurrency.&lt;/li&gt;
&lt;li&gt;Integrated MCP Tool: The run_context_benchmark tool allows triggering these complex stress tests via a single agent command.&lt;/li&gt;
&lt;li&gt;Advanced Visualization: plot_context_benchmark.py handles high-density data and provides ASCII visualization for immediate analysis.&lt;/li&gt;
&lt;li&gt;Optimized Reporting: Automatically generates Markdown, CSV, and JSON outputs for every run.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The project is fully prepared for production capacity planning and further infrastructure tuning.&lt;/p&gt;
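&lt;p&gt;For reference, the core of such a sweep is just a cross product of context lengths and concurrency levels. A minimal stand-in (the function and names here are illustrative, not the actual context_benchmark.py) looks like:&lt;/p&gt;

```python
import itertools

def measure(context_len: int, concurrency: int) -> dict:
    # Illustrative stub: the real benchmark would fire `concurrency`
    # simultaneous requests of `context_len` prompt tokens at the vLLM
    # endpoint and record avg TTFT and aggregate prefill throughput.
    return {"context": context_len, "concurrency": concurrency}

contexts = [128, 8192, 16384]
concurrencies = [1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024]

# Deep sweep: prompt length x concurrency.
results = [measure(ctx, conc)
           for ctx, conc in itertools.product(contexts, concurrencies)]
print(f"collected {len(results)} data points")  # 3 x 11 = 33
```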

&lt;p&gt;xbill@penguin:~/aisprintapr2026/gemma4-benchmark$  python plot_context_benchmark.py    &lt;/p&gt;

&lt;p&gt;--- Context Length vs. Avg TTFT (s) ---&lt;br&gt;
  25 +----------------------------------------------------------------------+&lt;br&gt;
     |                                                       ******         |&lt;br&gt;
     |                                                *******TTFT (s)******** |&lt;br&gt;
     |                                         *******                      |&lt;br&gt;
  20 |                                   ******                             |&lt;br&gt;
     |                            *******                                   |&lt;br&gt;
     |                    ********                                          |&lt;br&gt;
     |            ********                                                  |&lt;br&gt;
  15 |    ********                                                          |&lt;br&gt;
     |***********                                                           |&lt;br&gt;
     |           **********************                                     |&lt;br&gt;
     |                                 *********************                |&lt;br&gt;
     |                                                      ***********     |&lt;br&gt;
  10 |                                                ***********           |&lt;br&gt;
     |                                     ***********                      |&lt;br&gt;
     |                          ***********                                 |&lt;br&gt;
     |                **********                       ****************     |&lt;br&gt;
   5 |      ********** ***************************************              |&lt;br&gt;
     |*****************       ****************        *****************     |&lt;br&gt;
     |        *********************************************************     |&lt;br&gt;
     |*****************************************************************     |&lt;br&gt;
   0 +----------------------------------------------------------------------+&lt;br&gt;
     0      2000    4000    6000    8000  10000   12000   14000   16000   18000&lt;/p&gt;

&lt;p&gt;--- Context Length vs. Prefill Throughput (tok/s) ---&lt;br&gt;
  500000 +------------------------------------------------------------------+&lt;br&gt;
         |                                                                  |&lt;br&gt;
  450000 |                                             ******************** |&lt;br&gt;
         |                             ********************************     |&lt;br&gt;
  400000 |                           **      ************************       |&lt;br&gt;
         |                         ** *********            ******  ****     |&lt;br&gt;
  350000 |                       ** ***                 ***************     |&lt;br&gt;
         |                      * ***          **************** ****        |&lt;br&gt;
  300000 |                    *****   *********  ******     ****            |&lt;br&gt;
         |                  *****   **     ********    *****     ******     |&lt;br&gt;
  250000 |                *****  ***  *********    *****************        |&lt;br&gt;
         |              *****  **  ***  **** **********  *****              |&lt;br&gt;
         |             **** ***  ** ***********    ******        ******     |&lt;br&gt;
  200000 |           **** **  ************    *****    ************         |&lt;br&gt;
         |         ******* ***********  ********************                |&lt;br&gt;
  150000 |       ****** ********* ***********  *******         ********     |&lt;br&gt;
         |      ***** ***************   *************************           |&lt;br&gt;
  100000 |    ****************** **********************        ********     |&lt;br&gt;
         |  *********************************  ****************             |&lt;br&gt;
   50000 |**************************************       ****************     |&lt;br&gt;
         |*************************************************************     |&lt;br&gt;
       0 +------------------------------------------------------------------+&lt;br&gt;
         0     2000    4000   6000    8000  10000   12000  14000   16000  18000&lt;/p&gt;

</description>
      <category>devchallenge</category>
      <category>gemmachallenge</category>
      <category>gemma</category>
    </item>
    <item>
      <title>Building ADK Agents with Azure Cloud Shell</title>
      <dc:creator>xbill</dc:creator>
      <pubDate>Thu, 07 May 2026 17:14:45 +0000</pubDate>
      <link>https://dev.to/gde/building-adk-agents-with-azure-cloud-shell-3bd1</link>
      <guid>https://dev.to/gde/building-adk-agents-with-azure-cloud-shell-3bd1</guid>
      <description>&lt;p&gt;This article covers installing and Running Gemini CLI cross-cloud. The Azure Cloud Shell provides a basic Linux environment that can support running Gemini CLI.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkqqqvgxkwwch6vs1wfk5.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkqqqvgxkwwch6vs1wfk5.jpeg" width="800" height="437"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h4&gt;
  
  
  Azure Cloud Shell
&lt;/h4&gt;

&lt;p&gt;Azure Cloud Shell is a browser-based, authenticated terminal for managing Azure resources, accessible directly through the Azure portal, shell.azure.com, or the mobile app. It provides a preconfigured environment with Bash or PowerShell, eliminating local installation of tools. It runs on a temporary, secure, Linux-based host.&lt;/p&gt;

&lt;p&gt;The full background information on the Cloud Shell is here:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://azure.microsoft.com/en-us/get-started/azure-portal/cloud-shell" rel="noopener noreferrer"&gt;https://azure.microsoft.com/en-us/get-started/azure-portal/cloud-shell&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;More information is available here:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://learn.microsoft.com/en-us/azure/cloud-shell/get-started/classic?wt.mc_id=fulltextsearch_inproduct_azure-portal-global-search&amp;amp;tabs=azurecli" rel="noopener noreferrer"&gt;Get started with Azure Cloud Shell&lt;/a&gt;&lt;/p&gt;

&lt;h4&gt;
  
  
  Why would I want Gemini CLI on Azure? Isn’t that a Google Thing?
&lt;/h4&gt;

&lt;p&gt;Yes: Gemini CLI leverages the Google Cloud console and Gemini models, but it is also open source and platform-agnostic. Many applications are already cross-cloud, so this enables familiar tools to run natively on Microsoft Azure.&lt;/p&gt;

&lt;h4&gt;
  
  
  Activating Azure Cloud Shell
&lt;/h4&gt;

&lt;p&gt;Azure Cloud Shell comes preinstalled with every Azure account.&lt;/p&gt;

&lt;p&gt;To access the shell, click the Terminal icon in the upper right-hand corner of the console:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbsubssxraetql1lmam9n.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbsubssxraetql1lmam9n.png" width="229" height="60"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Then activate the full shell:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnj6zipv4ru2xdrzjerrc.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnj6zipv4ru2xdrzjerrc.png" width="800" height="414"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Note: disk space in the environment is limited to 5 GB, but this is sufficient for Gemini CLI, Google Cloud tools, and some customization.&lt;/p&gt;

&lt;h4&gt;
  
  
  What is this project trying to do?
&lt;/h4&gt;

&lt;p&gt;This project aims to configure an ADK agent to use an Azure Foundry model. The ADK provides agent support in standard Python code.&lt;/p&gt;

&lt;h4&gt;
  
  
  Lite LLM
&lt;/h4&gt;

&lt;p&gt;LiteLLM is an open-source AI gateway and Python SDK that provides a unified OpenAI-compatible interface to over 100 LLMs (Anthropic, Gemini, Azure, Bedrock, Ollama). It simplifies API management by allowing users to call any model using the same format, providing features like spend tracking, model fallbacks, key management, and load balancing.&lt;/p&gt;

&lt;p&gt;More info is here:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/BerriAI/litellm" rel="noopener noreferrer"&gt;GitHub - BerriAI/litellm: Python SDK, Proxy Server (AI Gateway) to call 100+ LLM APIs in OpenAI (or native) format, with cost tracking, guardrails, loadbalancing and logging. [Bedrock, Azure, OpenAI, VertexAI, Cohere, Anthropic, Sagemaker, HuggingFace, VLLM, NVIDIA NIM]&lt;/a&gt;&lt;/p&gt;

&lt;h4&gt;
  
  
  Azure Foundry
&lt;/h4&gt;

&lt;p&gt;Azure AI Foundry (formerly Azure AI Studio) is a unified platform-as-a-service for developing, customizing, and managing AI agents and applications. It serves as an AI “factory” combining model catalogs (including OpenAI and Hugging Face), low-code tools, and enterprise-grade security for building generative AI apps. [&lt;a href="https://learn.microsoft.com/en-us/azure/foundry/what-is-foundry" rel="noopener noreferrer"&gt;1&lt;/a&gt;, &lt;a href="https://azure.microsoft.com/en-us/products/ai-foundry" rel="noopener noreferrer"&gt;2&lt;/a&gt;, &lt;a href="https://www.youtube.com/shorts/j83xQT9AtgA" rel="noopener noreferrer"&gt;3&lt;/a&gt;, &lt;a href="https://learn.microsoft.com/en-us/azure/foundry/agents/overview" rel="noopener noreferrer"&gt;4&lt;/a&gt;]&lt;/p&gt;

&lt;p&gt;More details are available here:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://azure.microsoft.com/en-us/products/ai-foundry" rel="noopener noreferrer"&gt;https://azure.microsoft.com/en-us/products/ai-foundry&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnxpoyn9lrk93g15meyy5.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnxpoyn9lrk93g15meyy5.png" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;


&lt;h4&gt;
  
  
  Azure CLI
&lt;/h4&gt;

&lt;p&gt;The Azure Command-Line Interface (CLI) is a cross-platform tool used to connect to Azure and execute administrative commands on Azure resources. It allows you to manage services like virtual machines, databases, and networking through a terminal using interactive prompts or scripts. [&lt;a href="https://learn.microsoft.com/en-us/cli/azure/what-is-azure-cli?view=azure-cli-latest" rel="noopener noreferrer"&gt;1&lt;/a&gt;, &lt;a href="https://learn.microsoft.com/en-us/cli/azure/?view=azure-cli-latest" rel="noopener noreferrer"&gt;2&lt;/a&gt;, &lt;a href="https://sumble.com/tech/azure-cli" rel="noopener noreferrer"&gt;3&lt;/a&gt;]&lt;/p&gt;

&lt;p&gt;More details are available here:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://learn.microsoft.com/en-us/azure/developer/azure-developer-cli/overview?tabs=linux" rel="noopener noreferrer"&gt;What is the Azure Developer CLI?&lt;/a&gt;&lt;/p&gt;

&lt;h4&gt;
  
  
  Gemini CLI
&lt;/h4&gt;

&lt;p&gt;If not pre-installed, you can install the Gemini CLI to interact with the source files and provide real-time assistance:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npm &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-g&lt;/span&gt; @google/gemini-cli
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  Testing the Gemini CLI Environment
&lt;/h4&gt;

&lt;p&gt;Once you have all the tools and the correct Node.js version in place- you can test the startup of Gemini CLI. You will need to authenticate with a Key or your Google Account:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight console"&gt;&lt;code&gt;&lt;span class="gp"&gt;xbill@penguin:~/aisprintapr2026/tpu-vllm-devops-agent$&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;gemini
&lt;span class="go"&gt;
▝▜▄ Gemini CLI v0.39.1
   ▝▜▄
  ▗▟▀ Signed in with Google /auth
 ▝▀ Plan: Gemini Code Assist Standard /upgrade
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  Python MCP Documentation
&lt;/h4&gt;

&lt;p&gt;The official GitHub Repo provides samples and documentation for getting started:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/modelcontextprotocol/python-sdk" rel="noopener noreferrer"&gt;GitHub - modelcontextprotocol/python-sdk: The official Python SDK for Model Context Protocol servers and clients&lt;/a&gt;&lt;/p&gt;

&lt;h4&gt;
  
  
  Agent Development Kit
&lt;/h4&gt;

&lt;p&gt;The &lt;a href="https://www.google.com/search?q=Google+Agent+Development+Kit&amp;amp;rlz=1CAIWTJ_enUS1114&amp;amp;oq=what+is+the+adk+google&amp;amp;gs_lcrp=EgZjaHJvbWUyBggAEEUYOTIICAEQABgWGB4yCAgCEAAYFhgeMggIAxAAGBYYHjIICAQQABgWGB4yCAgFEAAYFhgeMggIBhAAGBYYHjIKCAcQABgKGBYYHjINCAgQABiGAxiABBiKBTIKCAkQABiABBiiBNIBCDMxODlqMGo3qAIAsAIA&amp;amp;sourceid=chrome&amp;amp;ie=UTF-8&amp;amp;mstk=AUtExfB5Oo7ZHHcDEHu7aqZiPBA2l1c-QGh5dB7xkkDPIiYcn8O1Imt2IHNR7bzA6JnyDCSDCUGpGWTeBW14namlN_QqzJLLI5-px1BE9jfSxwli6njPDPERjm5pRqNP3uC6HhUKiRcTJ1T8x5LHQrCkVxylw7QWg0N8B4dQDIcWpnVX9Gc&amp;amp;csui=3&amp;amp;ved=2ahUKEwjYu-G8p-uSAxXrv4kEHUbpLo0QgK4QegQIARAB" rel="noopener noreferrer"&gt;Google Agent Development Kit&lt;/a&gt; (ADK) is an open-source, Python-based framework designed to streamline the creation, deployment, and orchestration of sophisticated, multi-agent AI systems. It treats agent development like software engineering, offering modularity, state management, and built-in tools (like Google Search) to build autonomous agents.&lt;/p&gt;

&lt;p&gt;The ADK can be installed from here:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://google.github.io/adk-docs/" rel="noopener noreferrer"&gt;Agent Development Kit (ADK)&lt;/a&gt;&lt;/p&gt;

&lt;h4&gt;
  
  
  Where do I start?
&lt;/h4&gt;

&lt;p&gt;The strategy for starting multi-agent development is an incremental, step-by-step approach.&lt;/p&gt;

&lt;p&gt;First, the basic development environment is set up with the required system variables and a working Gemini CLI configuration.&lt;/p&gt;

&lt;p&gt;Then, an ADK agent is built, debugged, and tested locally.&lt;/p&gt;

&lt;h4&gt;
  
  
  Setup the Basic Environment
&lt;/h4&gt;

&lt;p&gt;At this point you should have a working Python environment and a working Gemini CLI installation. The next step is to clone the GitHub samples repository with support scripts:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;cd&lt;/span&gt; ~
git clone https://github.com/xbill9/gemini-cli-azure
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then run &lt;strong&gt;init.sh&lt;/strong&gt; from the cloned directory.&lt;/p&gt;

&lt;p&gt;The script will attempt to determine your shell environment and set the correct variables:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;cd &lt;/span&gt;adk-litellm
&lt;span class="nb"&gt;source &lt;/span&gt;init.sh
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If your session times out or you need to re-authenticate, you can run the &lt;strong&gt;set_env.sh&lt;/strong&gt; script to reset your environment variables:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;source &lt;/span&gt;set_env.sh
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Variables like PROJECT_ID need to be set up for use in the various build scripts, so the set_env script can be used to reset the environment if you time out.&lt;/p&gt;

&lt;h4&gt;
  
  
  Azure Setup
&lt;/h4&gt;

&lt;p&gt;The Azure CLI provides the basic tools for working with the cloud-based services:&lt;/p&gt;

&lt;p&gt;Check the installation:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight console"&gt;&lt;code&gt;&lt;span class="gp"&gt;william [~/gemini-cli-azure/adk-litellm-azure]$&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;az &lt;span class="nt"&gt;--version&lt;/span&gt;
&lt;span class="go"&gt;azure-cli 2.85.0 *

core 2.85.0 *
telemetry 1.1.0

Extensions:
ml 2.42.0
ssh 2.0.6

Dependencies:
msal 1.35.1
azure-mgmt-resource 24.0.0

Python location '/usr/bin/python3.12'
Config directory '/home/william/.azure'
Extensions directory '/home/william/.azure/cliextensions'
Extensions system directory '/usr/lib/python3.12/site-packages/azure-cli-extensions'

Python (Linux) 3.12.9 (main, Mar 26 2026, 23:21:55) [GCC 13.2.0]

Legal docs and information: aka.ms/AzureCliLegal

You have 2 update(s) available. They will be updated with the next build of Cloud Shell.
&lt;/span&gt;&lt;span class="gp"&gt;william [~/gemini-cli-azure/adk-litellm-azure]$&lt;/span&gt;&lt;span class="w"&gt; 
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then run the Azure setup scripts:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;./setup-azure-phi.sh

./get-azure-phi-creds.sh
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The Azure console should look something like this:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgjv3trvi7o26qlth7teo.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgjv3trvi7o26qlth7teo.png" width="800" height="381"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;With the resource group looking like this:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frvk2mgwecn6bqk4ixu5d.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frvk2mgwecn6bqk4ixu5d.png" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h4&gt;
  
  
  Running the Python Code
&lt;/h4&gt;

&lt;p&gt;First, switch to the directory with the Python version of the MCP sample code, then run the release version on the local system:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight make"&gt;&lt;code&gt;&lt;span class="err"&gt;cd&lt;/span&gt; &lt;span class="err"&gt;adk-litellm$&lt;/span&gt; &lt;span class="err"&gt;make&lt;/span&gt; &lt;span class="err"&gt;install&lt;/span&gt;
&lt;span class="err"&gt;Processing&lt;/span&gt; &lt;span class="err"&gt;./.&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  Verify The ADK Installation
&lt;/h4&gt;

&lt;p&gt;First, verify the ADK is installed:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight console"&gt;&lt;code&gt;&lt;span class="gp"&gt;william [~/gemini-cli-azure/adk-litellm-azure]$&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;adk &lt;span class="nt"&gt;--version&lt;/span&gt;
&lt;span class="go"&gt;adk, version 1.32.0
&lt;/span&gt;&lt;span class="gp"&gt;william [~/gemini-cli-azure/adk-litellm-azure]$&lt;/span&gt;&lt;span class="w"&gt; 
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;To check the ADK setup, run the ADK CLI locally with a basic Hello World style agent. No external tools or MCP calls are used in the agent code, which allows the ADK installation itself to be verified:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight console"&gt;&lt;code&gt;&lt;span class="gp"&gt;william [~/gemini-cli-azure/adk-litellm-azure]$&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;adk run agents/azure
&lt;span class="go"&gt;Log setup complete: /tmp/agents_log/agent.20260505_192533.log
To access latest log: tail -F /tmp/agents_log/agent.latest.log
/home/william/.local/lib/python3.12/site-packages/google/adk/cli/cli.py:204: UserWarning: [EXPERIMENTAL] InMemoryCredentialService: This feature is experimental and may change or be removed in future versions without notice. It may introduce breaking changes at any time.
  credential_service = InMemoryCredentialService()
/home/william/.local/lib/python3.12/site-packages/google/adk/auth/credential_service/in_memory_credential_service.py:33: UserWarning: [EXPERIMENTAL] BaseCredentialService: This feature is experimental and may change or be removed in future versions without notice. It may introduce breaking changes at any time.
  super(). __init__ ()
Running agent azure_agent, type exit to exit.
[user]: what model are you
19:25:39 - LiteLLM:INFO: utils.py:4011 - 
&lt;/span&gt;&lt;span class="gp"&gt;LiteLLM completion() model= phi-4-mini;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;provider &lt;span class="o"&gt;=&lt;/span&gt; azure_ai
&lt;span class="go"&gt;[azure_agent]: I am Phi, an AI developed by Microsoft, created to help assist you with providing information, answering questions, and helping out with related tasks. How can I assist you today?


&lt;h4&gt;
  
  
  Phi Models on Azure
&lt;/h4&gt;

&lt;p&gt;Microsoft Phi models are a family of highly capable, cost-effective Small Language Models (SLMs) designed for low-latency generative AI, available on Azure AI Foundry. They support text, image, and audio inputs (Phi-4-multimodal) and are ideal for memory-constrained environments. Key models include Phi-4, Phi-3.5-MoE, and Phi-3-mini/medium, optimized for high-performance reasoning tasks.&lt;/p&gt;

&lt;p&gt;More details are here:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F79z2ygkn9mbifwnrgahl.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F79z2ygkn9mbifwnrgahl.png" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h4&gt;
  
  
  Deep Dive with a Phi Model
&lt;/h4&gt;

&lt;p&gt;Once the connection has been tested, you can drill into the model:&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight console"&gt;&lt;code&gt;&lt;span class="gp"&gt;xbill@penguin:~/gemini-cli-azure/adk-litellm-azure$&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;adk run agents/azure
&lt;span class="go"&gt;Log setup complete: /tmp/agents_log/agent.20260505_134754.log
To access latest log: tail -F /tmp/agents_log/agent.latest.log
/home/xbill/.local/lib/python3.13/site-packages/google/adk/cli/cli.py:204: UserWarning: [EXPERIMENTAL] InMemoryCredentialService: This feature is experimental and may change or be removed in future versions without notice. It may introduce breaking changes at any time.
  credential_service = InMemoryCredentialService()
/home/xbill/.local/lib/python3.13/site-packages/google/adk/auth/credential_service/in_memory_credential_service.py:33: UserWarning: [EXPERIMENTAL] BaseCredentialService: This feature is experimental and may change or be removed in future versions without notice. It may introduce breaking changes at any time.
  super(). __init__ ()
Running agent azure_agent, type exit to exit.
[user]: what model are you
13:48:00 - LiteLLM:INFO: utils.py:4011 - 
&lt;/span&gt;&lt;span class="gp"&gt;LiteLLM completion() model= phi-4-mini;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;provider &lt;span class="o"&gt;=&lt;/span&gt; azure_ai
&lt;span class="go"&gt;[azure_agent]: I am Phi, an AI developed by Microsoft, designed to help you with a wide range of queries and tasks. How can I assist you today?


&lt;h4&gt;
  
  
  One Last Code Review with Gemini CLI
&lt;/h4&gt;

&lt;p&gt;Gemini CLI was used to summarize the project:&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;&amp;gt; code review adk agent for phi model                                                                                                            

✦ I've reviewed the Azure agent implementation in agents/azure/agent.py. Overall, the implementation correctly utilizes the ADK (Agent Development
  Kit) and integrates with Azure AI Foundry via LiteLLM.

  🔍 Code Review Findings

  1. ADK Best Practices
   * Missing description: While the name is correctly defined, the description parameter is missing. In ADK, the description is critical for
     multi-agent systems, as it helps routers and other agents understand when to delegate tasks to this specific agent.
   * Minimal instruction: The current instruction is very brief. ADK best practices suggest using Markdown for clarity and being more specific
     about the agent's persona and constraints.
   * Short request_timeout: The timeout is set to 10 seconds. For LLM inferences, especially via serverless endpoints which may have occasional
     cold starts or latency, 30 seconds is generally more reliable.

  2. Azure AI Foundry Integration
   * Model String: The model string azure_ai/phi-4-mini is correctly formatted for LiteLLM to recognize the Azure AI Foundry provider.
   * Environment Variables: The code correctly leverages AZURE_MODEL with a fallback. It assumes AZURE_AI_API_KEY and AZURE_AI_API_BASE are set in
     the environment, which matches the setup-azure-phi.sh automation script.

  3. Code Quality
   * Imports: Clean and follows standard ADK patterns.
   * Encapsulation: Using root_agent as the export in agent.py (and re-exporting in __init__.py) is the standard pattern for ADK's discovery
     mechanism.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
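&lt;p&gt;The review's notes on the model string and environment variables can be illustrated with a short, stdlib-only sketch. This is a hypothetical reproduction of the pattern the review describes, not code from the repository: LiteLLM selects the Azure AI Foundry provider from the &lt;code&gt;azure_ai/&lt;/code&gt; prefix of the model string, and the agent reads &lt;code&gt;AZURE_MODEL&lt;/code&gt; with a fallback.&lt;/p&gt;

```python
# Hypothetical sketch of the configuration pattern noted in the
# review: read AZURE_MODEL with a fallback, then split the LiteLLM
# model string into its provider prefix and model name.
# "azure_ai/phi-4-mini" is the example model used in this article.
def resolve_model(env):
    return env.get("AZURE_MODEL", "azure_ai/phi-4-mini")

model = resolve_model({})  # no override set in this example
provider, _, name = model.partition("/")
print(provider, name)  # azure_ai phi-4-mini
```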



&lt;h4&gt;
  
  
  Summary
&lt;/h4&gt;

&lt;p&gt;The strategy for using the ADK with Azure LLMs was validated with an incremental, step-by-step approach.&lt;/p&gt;

&lt;p&gt;First, the basic environment tools were configured. Then the required Python libraries were installed. LiteLLM was used to proxy the Google ADK to a model hosted on Azure. Finally, tests were run from the CLI and the web interface, and Gemini CLI was used to validate the approach.&lt;/p&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

</description>
      <category>phimodel</category>
      <category>azurefoundry</category>
      <category>azurecloudshell</category>
      <category>gemini</category>
    </item>
    <item>
      <title>[Google Cloud Next '26 Recap #5] How I Prepared for the Trip — and How It Actually Played Out</title>
      <dc:creator>koichim2</dc:creator>
      <pubDate>Thu, 07 May 2026 15:40:31 +0000</pubDate>
      <link>https://dev.to/gde/google-cloud-next-26-recap-5-how-i-prepared-for-the-trip-and-how-it-actually-played-out-19ao</link>
      <guid>https://dev.to/gde/google-cloud-next-26-recap-5-how-i-prepared-for-the-trip-and-how-it-actually-played-out-19ao</guid>
      <description>&lt;p&gt;This is the fifth and &lt;strong&gt;final&lt;/strong&gt; post in my Google Cloud Next '26 (Las Vegas) recap series.&lt;/p&gt;

&lt;p&gt;You can find the previous posts here 👇&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Part 1: &lt;a href="https://dev.to/gde/google-cloud-next-26-recap-hands-on-with-the-agentic-hack-zone-12p9"&gt;[Google Cloud Next '26 Recap #1] Hands-On with the Agentic Hack Zone&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Part 2: &lt;a href="https://dev.to/gde/google-cloud-next-26-recap-2-three-unique-booths-i-tried-at-the-expo-2n6"&gt;[Google Cloud Next '26 Recap #2] Three Unique Booths I Tried at the EXPO&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Part 3: &lt;a href="https://dev.to/gde/google-cloud-next-26-recap-3-anthropics-vision-for-after-software-2cj6"&gt;[Google Cloud Next '26 Recap #3] Anthropic's Vision for "After Software"&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Part 4: &lt;a href="https://dev.to/gde/google-cloud-next-26-recap-4-live-report-from-the-two-keynotes-34nb"&gt;[Google Cloud Next '26 Recap #4] Live Report from the Two Keynotes&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;So far, I've spent four posts walking through my Next '26 on-site experience. To wrap up the series, I'd like to look back on &lt;strong&gt;what I prepared before the trip — and how those preparations actually played out&lt;/strong&gt; once I was on the ground.&lt;/p&gt;

&lt;p&gt;Attending Next '26 in person turned out to be a &lt;strong&gt;5-night international trip&lt;/strong&gt;. I hope this post can serve as a useful reference for anyone planning to attend a tech conference in Las Vegas in the future.&lt;/p&gt;




&lt;h2&gt;
  
  
  Where I was coming from
&lt;/h2&gt;

&lt;p&gt;This was my first overseas tech conference since I attended Next in San Francisco three years ago, and &lt;strong&gt;my very first time in Las Vegas&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;While I had attended Next once before, this was my first visit to Vegas itself, so I was very deliberate about gathering information during the prep phase.&lt;/p&gt;

&lt;p&gt;Since Google Cloud Next was held at &lt;strong&gt;the same Las Vegas venue twice — in '24 and '25&lt;/strong&gt; — I was able to learn a lot from past attendees by &lt;strong&gt;watching their YouTube videos and blog posts, and attending lightning talks at pre-events organized by communities back home&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;The resources I leaned on the most are below — many thanks to the authors 🙏&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Jesse Nowlin: &lt;a href="https://youtu.be/pqNG4FgLHYo" rel="noopener noreferrer"&gt;https://youtu.be/pqNG4FgLHYo&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Kazumasa Iwao: &lt;a href="https://zenn.dev/hogeticlab/articles/0e8aa36b179930" rel="noopener noreferrer"&gt;https://zenn.dev/hogeticlab/articles/0e8aa36b179930&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Jagu'e'r: &lt;a href="https://jaguer.connpass.com/event/384137/" rel="noopener noreferrer"&gt;https://jaguer.connpass.com/event/384137/&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Things I felt I needed to prepare for
&lt;/h2&gt;

&lt;p&gt;Once I started organizing all that information, I realized there were a surprising number of things to think about before leaving:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Dryness&lt;/strong&gt; from Las Vegas's desert climate&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Sunburn&lt;/strong&gt; during the day&lt;/li&gt;
&lt;li&gt;The convention center and airplane cabin being &lt;strong&gt;freezing cold&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;The convention center being &lt;strong&gt;huge&lt;/strong&gt; — lots of walking&lt;/li&gt;
&lt;li&gt;Next having so much content that I might &lt;strong&gt;run out of time&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Possible crowds at &lt;strong&gt;badge pickup&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;SWAG&lt;/strong&gt; filling up my luggage&lt;/li&gt;
&lt;li&gt;Possible &lt;strong&gt;battery drain&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Hydration&lt;/strong&gt; (whether to bring a water bottle)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Bag size restrictions&lt;/strong&gt; at the convention center&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Clear bag requirements&lt;/strong&gt; for stadium entry&lt;/li&gt;
&lt;li&gt;General travel logistics: &lt;strong&gt;getting around the city, airport security, connecting flights, immigration&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Topics like meals, casinos and other entertainment, hotel amenities, medications, travel insurance, and ESTA are out of scope for this post.&lt;/p&gt;

&lt;p&gt;From here on, I'll share &lt;strong&gt;what I actually packed/prepared and how it went on the ground&lt;/strong&gt;, based on my own personal experience.&lt;/p&gt;




&lt;h2&gt;
  
  
  Coping with Las Vegas dryness
&lt;/h2&gt;

&lt;p&gt;What I brought:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Lip balm&lt;/li&gt;
&lt;li&gt;Hand cream&lt;/li&gt;
&lt;li&gt;Eye drops&lt;/li&gt;
&lt;li&gt;Toner / lotion&lt;/li&gt;
&lt;li&gt;Skincare cream&lt;/li&gt;
&lt;li&gt;Throat lozenges&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Result: Extremely useful.&lt;/strong&gt; The dryness on the ground was more intense than I expected, and every single one of these earned its place. Skin, throat, and eye care turned out to matter way more than I'd anticipated — definitely glad I brought them all.&lt;/p&gt;




&lt;h2&gt;
  
  
  Sunburn protection during the day
&lt;/h2&gt;

&lt;p&gt;What I brought:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Sunscreen&lt;/li&gt;
&lt;li&gt;A cap&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;A thin, lightweight hooded jacket made for trail running&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Result: Glad I brought all of these.&lt;/strong&gt; The Vegas sun was stronger than I imagined — even a short walk outside left a clear "I'm getting cooked" feeling on my skin.&lt;/p&gt;




&lt;h2&gt;
  
  
  Cold convention centers and airplane cabins
&lt;/h2&gt;

&lt;p&gt;This was solved with the same hooded jacket I mentioned above.&lt;/p&gt;

&lt;p&gt;Since it &lt;strong&gt;doubled as both sun protection and warmth&lt;/strong&gt;, it ended up being &lt;strong&gt;one of the most useful items of the entire trip&lt;/strong&gt;. It handled the over-air-conditioned convention center and long-haul flights equally well.&lt;/p&gt;




&lt;h2&gt;
  
  
  Walking the huge venue, and my final outfit setup
&lt;/h2&gt;

&lt;p&gt;Because the dryness, heat, and cold all turned out to be more intense than expected, I made &lt;strong&gt;prioritizing my health and stamina&lt;/strong&gt; my top concern, and decided to go as &lt;strong&gt;light as possible&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;In my case, I went all in on the choice to &lt;strong&gt;leave my laptop at the hotel&lt;/strong&gt;. My final on-site setup looked like this:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The hooded jacket on top&lt;/li&gt;
&lt;li&gt;Thick-soled sneakers&lt;/li&gt;
&lt;li&gt;Stretchy pants&lt;/li&gt;
&lt;li&gt;Badge hanging from my neck&lt;/li&gt;
&lt;li&gt;Passport stayed in the hotel safe — not on me&lt;/li&gt;
&lt;li&gt;In my pockets: &lt;strong&gt;phone, hotel key card, one credit card&lt;/strong&gt; — that's it&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In short, I &lt;strong&gt;walked in empty-handed&lt;/strong&gt;. The downside was that my only computing device on me was my phone, but in exchange I could &lt;strong&gt;move around the venue extremely lightly&lt;/strong&gt; — and for me, this style turned out to be the right call.&lt;/p&gt;




&lt;h2&gt;
  
  
  Too much content at Next
&lt;/h2&gt;

&lt;p&gt;Next is the kind of event where, if you're not careful, you blink and suddenly there's no time left for the things you actually wanted to do.&lt;/p&gt;

&lt;p&gt;To deal with this, I &lt;strong&gt;picked and reserved sessions early on&lt;/strong&gt;. My booking rules were:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Only sessions I'd want to keep on my schedule &lt;strong&gt;even if I had to cut other plans&lt;/strong&gt; (i.e., true must-sees)&lt;/li&gt;
&lt;li&gt;A hard cap of &lt;strong&gt;2–3 sessions per day&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;There were tons of sessions available, but by deliberately narrowing things down, I made sure I had &lt;strong&gt;plenty of buffer time for exploring the venue and trying out booths&lt;/strong&gt;. This is advice many people give, and it's also a direct reflection on having packed my schedule too tightly the last time I went to Next.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Result: This worked out really well.&lt;/strong&gt; I came away with both rich session content and plenty of booth experiences.&lt;/p&gt;

&lt;h3&gt;
  
  
  Failure story: I got locked out of a workshop
&lt;/h3&gt;

&lt;p&gt;That said, I did mess one thing up.&lt;/p&gt;

&lt;p&gt;I had reserved a &lt;strong&gt;3-hour workshop&lt;/strong&gt; right after the Opening Keynote, but I got hungry and decided to &lt;strong&gt;grab a snack at the partner lounge first&lt;/strong&gt; — by the time I made it to the workshop room, &lt;strong&gt;it had already filled up to capacity, and I couldn't get in&lt;/strong&gt; 😭&lt;/p&gt;

&lt;p&gt;Apparently, after a session has been running for a while, they start letting standby attendees take vacant seats. &lt;strong&gt;Grabbing food to-go and eating it while attending the workshop&lt;/strong&gt; would probably have been the right move.&lt;/p&gt;

&lt;p&gt;After that, even for sessions I had already reserved, I started lining up early whenever I knew the session had reached its booking cap.&lt;/p&gt;




&lt;h2&gt;
  
  
  Crowds at badge pickup
&lt;/h2&gt;

&lt;p&gt;I handled this by &lt;strong&gt;picking up my badge as soon as I landed at the airport&lt;/strong&gt;, which made the whole process pretty smooth.&lt;/p&gt;




&lt;h2&gt;
  
  
  SWAG filling up your luggage
&lt;/h2&gt;

&lt;p&gt;SWAG is one of those things where you keep wanting just one more thing, and your bag fills up fast on the way home.&lt;/p&gt;

&lt;p&gt;This time I exercised some restraint and stuck mostly to &lt;strong&gt;pins, stickers, and other small, low-volume items&lt;/strong&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  Battery situation
&lt;/h2&gt;

&lt;p&gt;I packed a portable battery just in case, but as mentioned above, all I was carrying was my phone, and I wasn't using it heavily for long stretches, so I was fine.&lt;/p&gt;




&lt;h2&gt;
  
  
  Things I brought that I never used
&lt;/h2&gt;

&lt;p&gt;A few "just in case" items that ended up never coming out of my bag:&lt;/p&gt;

&lt;h3&gt;
  
  
  ・Water bottle
&lt;/h3&gt;

&lt;p&gt;I brought one from home for hydration — but &lt;strong&gt;water bottles were being given out for free at the venue&lt;/strong&gt;. On top of that, there were plenty of spots to drink water from paper cups, so I ended up &lt;strong&gt;bringing the freebie bottle home, completely unused&lt;/strong&gt;…&lt;/p&gt;

&lt;h3&gt;
  
  
  ・A bag for the venue
&lt;/h3&gt;

&lt;p&gt;I brought a separate bag sized to fit the official policy (roughly &lt;strong&gt;12" × 18"&lt;/strong&gt;) in addition to my travel bag — but since I went &lt;strong&gt;empty-handed&lt;/strong&gt;, it never got used.&lt;/p&gt;

&lt;h3&gt;
  
  
  ・Clear bag for stadium entry
&lt;/h3&gt;

&lt;p&gt;Policy stated that &lt;strong&gt;a clear bag is required for Next at Night&lt;/strong&gt;, so I prepared one in advance — but again, I went in empty-handed, so this also went unused.&lt;/p&gt;




&lt;h2&gt;
  
  
  Getting around, the airport, connections, and immigration
&lt;/h2&gt;

&lt;p&gt;Apps made this side of things significantly more comfortable:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Uber app&lt;/strong&gt;: handled all my airport ↔ hotel rides&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;MPC app&lt;/strong&gt;: gave me access to a &lt;strong&gt;dedicated lane&lt;/strong&gt; at immigration during my connection — big time saver&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;MyTSA app&lt;/strong&gt;: about a month before the trip, there was news about &lt;strong&gt;major airport security backlogs caused by TSA pay disruptions and staffing shortages tied to a federal funding lapse&lt;/strong&gt; — I used this to check current wait times&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Airline app&lt;/strong&gt;: kept me on top of delays, schedule changes, and gate changes&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  A side note on CLEAR+
&lt;/h3&gt;

&lt;p&gt;On the way home, I had to re-enter security at LAX during my connection, and a &lt;strong&gt;CLEAR+ Ambassador&lt;/strong&gt; caught me near the security entrance and pitched the service. I went along with it and signed up on the spot.&lt;/p&gt;

&lt;p&gt;It did genuinely &lt;strong&gt;shortcut my path through security&lt;/strong&gt;, but I later discovered that &lt;strong&gt;after the 2-week trial period, it costs roughly $200/year as a subscription&lt;/strong&gt;. Since I have no plans to be back in the US anytime soon, &lt;strong&gt;I cancelled before the charge hit&lt;/strong&gt;. Lesson learned: be careful about signing up for things on impulse abroad.&lt;/p&gt;




&lt;h2&gt;
  
  
  Wrap-up: what I learned from this trip
&lt;/h2&gt;

&lt;p&gt;The takeaways I'd hand my future self:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Dryness, sunburn, and cold protection are non-negotiables.&lt;/strong&gt; A lightweight hooded jacket is a multi-purpose powerhouse.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Going empty-handed is more comfortable than expected.&lt;/strong&gt; Committing to leaving the laptop behind paid off.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cap your sessions.&lt;/strong&gt; 2–3 a day, with the gaps reserved for venue exploration, ends up being more satisfying.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Line up early for popular sessions and workshops.&lt;/strong&gt; "I have a reservation" doesn't always mean you'll get in.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Water bottles and secondary bags&lt;/strong&gt; can easily go unused depending on your style.&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Use apps to streamline immigration and connections.&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;And with that, this wraps up my series of posts on Next '26.&lt;/p&gt;

&lt;p&gt;I wanted to share the most memorable lessons from everything I learned on this trip. I hope this proves to be useful to anyone reading.&lt;/p&gt;

&lt;p&gt;Thank you so much for reading all the way to the end. See you somewhere next time!&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;See you in the Next.&lt;/strong&gt; 👋&lt;/p&gt;

</description>
      <category>googlecloud</category>
      <category>googlecloudnext</category>
    </item>
    <item>
      <title>MCP Development with Amazon Lambda Managed Instances</title>
      <dc:creator>xbill</dc:creator>
      <pubDate>Thu, 07 May 2026 13:08:51 +0000</pubDate>
      <link>https://dev.to/gde/mcp-development-with-amazon-lambda-managed-instances-24cj</link>
      <guid>https://dev.to/gde/mcp-development-with-amazon-lambda-managed-instances-24cj</guid>
      <description>&lt;p&gt;Leveraging Gemini CLI and the underlying Gemini LLM to build Model Context Protocol (MCP) AI applications with Python from a local development environment deployed to the Lambda Managed Instance (LMI) service on AWS.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5c22gikdpvplcldqx1c3.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5c22gikdpvplcldqx1c3.jpeg" width="800" height="437"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h4&gt;
  
  
  Aren’t There a Gazillion Python MCP Demos?
&lt;/h4&gt;

&lt;p&gt;Yes, there are.&lt;/p&gt;

&lt;p&gt;Python has traditionally been the main coding language for ML and AI tools. The goal of this article is to provide a minimal, working Python MCP server that can be run locally without any unneeded extra code or extensions.&lt;/p&gt;

&lt;h4&gt;
  
  
  What Is Python?
&lt;/h4&gt;

&lt;p&gt;Python is an interpreted language that allows for rapid development and testing and has deep libraries for working with ML and AI:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.python.org/" rel="noopener noreferrer"&gt;Welcome to Python.org&lt;/a&gt;&lt;/p&gt;

&lt;h4&gt;
  
  
  Python Version Management
&lt;/h4&gt;

&lt;p&gt;One of the downsides of the wide deployment of Python has been managing the language versions across platforms and maintaining a supported version.&lt;/p&gt;

&lt;p&gt;The &lt;strong&gt;pyenv&lt;/strong&gt; tool enables deploying consistent versions of Python:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/pyenv/pyenv" rel="noopener noreferrer"&gt;GitHub - pyenv/pyenv: Simple Python version management&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;As of writing, the mainstream Python version is 3.13. To validate your current Python:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight console"&gt;&lt;code&gt;&lt;span class="gp"&gt;xbill@penguin:~/gemini-cli-aws/mcp-lambdami-python-aws$&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;python &lt;span class="nt"&gt;--version&lt;/span&gt;
&lt;span class="go"&gt;Python 3.13.13
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  Gemini CLI
&lt;/h4&gt;

&lt;p&gt;If not pre-installed, you can install the Gemini CLI to interact with the source files and provide real-time assistance:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight console"&gt;&lt;code&gt;&lt;span class="go"&gt;npm install -g @google/gemini-cli
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  Testing the Gemini CLI Environment
&lt;/h4&gt;

&lt;p&gt;Once you have all the tools and the correct Node.js version in place, you can test the startup of Gemini CLI. You will need to authenticate with an API key or your Google Account:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight console"&gt;&lt;code&gt;&lt;span class="go"&gt;gemini

&lt;/span&gt;&lt;span class="gp"&gt;admin@ip-172-31-70-211:~/gemini-cli-aws/mcp-lambda-python-aws$&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;gemini
&lt;span class="go"&gt;
▝▜▄ Gemini CLI v0.33.1
    ▝▜▄
   ▗▟▀ Logged in with Google /auth
  ▝▀ Gemini Code Assist Standard /upgrade no sandbox (see /docs) /model Auto (Gemini 3) | 239.8 MB
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  Node Version Management
&lt;/h4&gt;

&lt;p&gt;Gemini CLI needs a consistent, up-to-date version of Node.js. The &lt;strong&gt;nvm&lt;/strong&gt; command can be used to get a standard Node environment:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/nvm-sh/nvm" rel="noopener noreferrer"&gt;GitHub - nvm-sh/nvm: Node Version Manager - POSIX-compliant bash script to manage multiple active node.js versions&lt;/a&gt;&lt;/p&gt;

&lt;h4&gt;
  
  
  Python MCP Documentation
&lt;/h4&gt;

&lt;p&gt;The official GitHub Repo provides samples and documentation for getting started:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/modelcontextprotocol/python-sdk" rel="noopener noreferrer"&gt;GitHub - modelcontextprotocol/python-sdk: The official Python SDK for Model Context Protocol servers and clients&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The most common MCP Python deployment path uses the FastMCP library:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://gofastmcp.com/getting-started/welcome" rel="noopener noreferrer"&gt;Welcome to FastMCP - FastMCP&lt;/a&gt;&lt;/p&gt;

&lt;h4&gt;
  
  
  Docker Version Management
&lt;/h4&gt;

&lt;p&gt;The AWS CLI tools and Lightsail extensions need a current version of Docker. If your environment does not provide a recent Docker tool, the Docker Version Manager can be used to download the latest supported Docker:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://howtowhale.github.io/dvm/install.html" rel="noopener noreferrer"&gt;Install&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;To check the version of Docker:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight console"&gt;&lt;code&gt;&lt;span class="gp"&gt;xbill@penguin:~/gemini-cli-aws/mcp-lambdami-python-aws$&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;dvm &lt;span class="nt"&gt;--version&lt;/span&gt;
&lt;span class="go"&gt;Docker Version Manager version 1.0.3 (e59ba7b)
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  Amazon Lambda Managed Instances
&lt;/h4&gt;

&lt;p&gt;Lambda Managed Instances (LMI) allow running AWS Lambda functions on dedicated EC2 instances, blending serverless simplicity with EC2 performance flexibility. They feature fully managed infrastructure (patching, scaling) with higher throughput, no cold starts, and cost-efficient scaling for predictable, high-volume, memory-intensive, or specialized hardware workloads.&lt;/p&gt;

&lt;p&gt;Full details are here:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://docs.aws.amazon.com/lambda/latest/dg/lambda-managed-instances.html" rel="noopener noreferrer"&gt;Lambda Managed Instances&lt;/a&gt;&lt;/p&gt;

&lt;h4&gt;
  
  
  AWS CLI
&lt;/h4&gt;

&lt;p&gt;The AWS CLI provides a command line tool to directly access AWS services from your current environment. Full details on the CLI are available here:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://docs.aws.amazon.com/lightsail/latest/userguide/amazon-lightsail-install-software.html" rel="noopener noreferrer"&gt;Install Docker, AWS CLI, and the Lightsail Control plugin for containers&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;You can version check the tool after installation:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight console"&gt;&lt;code&gt;&lt;span class="gp"&gt;xbill@penguin:~/gemini-cli-aws/mcp-lambdami-python-aws$&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;aws &lt;span class="nt"&gt;--version&lt;/span&gt;
&lt;span class="go"&gt;aws-cli/2.34.43 Python/3.14.4 Linux/6.6.99-09128-g14e87a8a9b71 exe/x86_64.debian.12
&lt;/span&gt;&lt;span class="gp"&gt;xbill@penguin:~/gemini-cli-aws/mcp-lambdami-python-aws$&lt;/span&gt;&lt;span class="w"&gt; 
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  Where do I start?
&lt;/h4&gt;

&lt;p&gt;The strategy for starting MCP development is an incremental, step-by-step approach.&lt;/p&gt;

&lt;p&gt;First, the basic development environment is set up with the required system variables and a working Gemini CLI configuration.&lt;/p&gt;

&lt;p&gt;Then, a minimal Hello World-style Python MCP server is built with HTTP transport. This server is validated with Gemini CLI in the local environment.&lt;/p&gt;

&lt;p&gt;This setup validates the connection from Gemini CLI to the local process via MCP. The MCP client (Gemini CLI) and the Python MCP server both run in the same local environment.&lt;/p&gt;

&lt;p&gt;Next, the MCP server is wrapped in a container with Docker and deployed to Amazon Lambda Managed Instances. This remote deployment is validated with Gemini CLI running as an MCP client.&lt;/p&gt;
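&lt;p&gt;A container image for a server like this can be very small. The Dockerfile below is a hypothetical sketch; the entry point name &lt;code&gt;main.py&lt;/code&gt; and the &lt;code&gt;requirements.txt&lt;/code&gt; file are assumptions for illustration, not taken from the repository:&lt;/p&gt;

```dockerfile
# Hypothetical minimal image for a Python MCP HTTP server.
# Assumes the entry point is main.py and dependencies are
# listed in requirements.txt (both assumptions).
FROM python:3.13-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
EXPOSE 8080
CMD ["python", "main.py"]
```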

&lt;h4&gt;
  
  
  Setup the Basic Environment
&lt;/h4&gt;

&lt;p&gt;At this point you should have a working Python interpreter and a working Gemini CLI installation. The next step is to clone the GitHub samples repository with support scripts:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight console"&gt;&lt;code&gt;&lt;span class="go"&gt;cd ~
git clone https://github.com/xbill9/gemini-cli-aws
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then run &lt;strong&gt;init.sh&lt;/strong&gt; from the cloned directory.&lt;/p&gt;

&lt;p&gt;The script will attempt to determine your shell environment and set the correct variables:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight console"&gt;&lt;code&gt;&lt;span class="go"&gt;cd gemini-cli-aws
source init.sh
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If your session times out or you need to re-authenticate, you can run the &lt;strong&gt;set_env.sh&lt;/strong&gt; script to reset your environment variables:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight console"&gt;&lt;code&gt;&lt;span class="go"&gt;cd gemini-cli-aws
source set_env.sh
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Variables like PROJECT_ID need to be set up for use in the various build scripts, so the set_env.sh script can be used to reset the environment if you time out.&lt;/p&gt;

&lt;h4&gt;
  
  
  Hello World with HTTP Transport
&lt;/h4&gt;

&lt;p&gt;One of the key features that the standard MCP libraries provide is abstracting various transport methods.&lt;/p&gt;

&lt;p&gt;The high-level MCP tool implementation is the same regardless of the low-level transport that the MCP client uses to connect to an MCP server.&lt;/p&gt;

&lt;p&gt;The simplest transport that the SDK supports is the stdio (stdin/stdout) transport, which connects to a locally running process. Both the MCP client and MCP server must be running in the same environment.&lt;/p&gt;

&lt;p&gt;The HTTP transport allows the MCP client and server to run in the same environment or distributed over the Internet.&lt;/p&gt;
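&lt;p&gt;Whichever transport is used, the bytes on the wire carry the same JSON-RPC 2.0 messages. The snippet below builds a hypothetical &lt;code&gt;tools/call&lt;/code&gt; request with only the standard library; the tool name and arguments are made up for illustration:&lt;/p&gt;

```python
import json

# MCP messages are JSON-RPC 2.0 objects regardless of transport.
# This builds a hypothetical tools/call request such as an MCP
# client might send to a server over stdio or HTTP.
request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {"name": "hello", "arguments": {"name": "world"}},
}

wire = json.dumps(request)  # serialized form sent over the transport
print(wire)
```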

&lt;p&gt;The connection over HTTP will look similar to this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;mcp&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;transport&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;http&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;host&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;0.0.0.0&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;port&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;port&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  Running the Python Code
&lt;/h4&gt;

&lt;p&gt;First, switch to the directory with the Python MCP sample code:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight console"&gt;&lt;code&gt;&lt;span class="go"&gt;cd ~/gemini-cli-aws/mcp-lambdami-python-aws
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Refresh the AWS credentials:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight console"&gt;&lt;code&gt;&lt;span class="gp"&gt;xbill@penguin:~/gemini-cli-aws/mcp-lambdami-python-aws$&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;aws login &lt;span class="nt"&gt;--remote&lt;/span&gt;
&lt;span class="go"&gt;Browser will not be automatically opened.

&lt;/span&gt;&lt;span class="gp"&gt;xbill@penguin:~/gemini-cli-aws/mcp-lambdami-python-aws$&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nb"&gt;source &lt;/span&gt;save-aws-creds.sh 
&lt;span class="go"&gt;Exporting AWS credentials...
Successfully saved credentials to .aws_creds
The Makefile will now automatically use these for deployments.
&lt;/span&gt;&lt;span class="gp"&gt;xbill@penguin:~/gemini-cli-aws/mcp-lmi-python-aws$&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Run the deployment from the local system:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight console"&gt;&lt;code&gt;&lt;span class="gp"&gt;xbill@penguin:~/gemini-cli-aws/mcp-lambdami-python-aws$&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nb"&gt;source &lt;/span&gt;save-aws-creds.sh 
&lt;span class="go"&gt;Exporting AWS credentials...
Successfully saved credentials to .aws_creds
The Makefile will now automatically use these for deployments.
&lt;/span&gt;&lt;span class="gp"&gt;xbill@penguin:~/gemini-cli-aws/mcp-lambdami-python-aws$&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;make deploy
&lt;span class="go"&gt;Step 0: Running Tests...
make[1]: Entering directory '/home/xbill/gemini-cli-aws/mcp-lambdami-python-aws'
Running tests...
============================================================== test session starts ===============================================================
platform linux -- Python 3.13.13, pytest-9.0.2, pluggy-1.6.0
rootdir: /home/xbill
configfile: pyproject.toml
plugins: anyio-4.11.0, asyncio-1.3.0, langsmith-0.8.0
asyncio: mode=Mode.STRICT, debug=False, asyncio_default_fixture_loop_scope=None, asyncio_default_test_loop_scope=function
collected 3 items                                                                                                                                

tests/test_main.py ... [100%]

=============================================================== 3 passed in 0.74s ================================================================
make[1]: Leaving directory '/home/xbill/gemini-cli-aws/mcp-lambdami-python-aws'
Step 1: Building and Pushing Docker Image to ECR...
make[1]: Entering directory '/home/xbill/gemini-cli-aws/mcp-lambdami-python-aws'
Checking if ECR repository exists...
Logging in to Amazon ECR...

&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You can validate the final result by checking the closing deployment messages:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;  Step 7: Publishing to LATEST.PUBLISHED for Managed Instances support...
Step 8: Finalizing API Gateway...
Deployment complete!
API Endpoint URL: https://3x10n3uguc.execute-api.us-east-1.amazonaws.com
MCP HTTP URL: https://3x10n3uguc.execute-api.us-east-1.amazonaws.com/mcp
Health Check URL: https://3x10n3uguc.execute-api.us-east-1.amazonaws.com/health

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
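
&lt;p&gt;If you want to feed these URLs into later steps without copy-pasting, they can be scraped out of a captured log. A minimal sketch (the &lt;code&gt;deploy.log&lt;/code&gt; file name is illustrative; the hard-coded lines mirror the output above):&lt;/p&gt;

```shell
# Sketch: extract the MCP URL from captured deploy output.
# deploy.log is an illustrative file name; the lines mirror the run above.
printf '%s\n' \
  'API Endpoint URL: https://3x10n3uguc.execute-api.us-east-1.amazonaws.com' \
  'MCP HTTP URL: https://3x10n3uguc.execute-api.us-east-1.amazonaws.com/mcp' \
  'Health Check URL: https://3x10n3uguc.execute-api.us-east-1.amazonaws.com/health' > deploy.log

# Split on "colon space" so the URL (which contains "://") stays intact.
MCP_URL=$(awk -F': ' '/^MCP HTTP URL/ {print $2}' deploy.log)
echo "$MCP_URL"
```

&lt;p&gt;In a real run you would capture the output with &lt;code&gt;make deploy | tee deploy.log&lt;/code&gt; instead of the hard-coded lines.&lt;/p&gt;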



&lt;p&gt;Once everything is deployed, check the function status:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight console"&gt;&lt;code&gt;&lt;span class="gp"&gt;xbill@penguin:~/gemini-cli-aws/mcp-lambdami-python-aws$&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;make status
&lt;span class="go"&gt;Function Status (mcp-lambda-python-aws):
---------------------------------------------------------------------------------
|                                  GetFunction                                  |
+------------------------------+-------------------------+----------------------+
|         LastModified         |          Name           |        Status        |
+------------------------------+-------------------------+----------------------+
|  2026-05-06T16:33:26.000+0000|  mcp-lambda-python-aws  |  ActiveNonInvocable  |
+------------------------------+-------------------------+----------------------+
Prod Alias Status:
---------------------
|     GetAlias      |
+-------+-----------+
| Name  |  Version  |
+-------+-----------+
|  prod |  28       |
+-------+-----------+
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You can then get the endpoint:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight console"&gt;&lt;code&gt;&lt;span class="gp"&gt;xbill@penguin:~/gemini-cli-aws/mcp-lambdami-python-aws$&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;make endpoint
&lt;span class="go"&gt;https://3x10n3uguc.execute-api.us-east-1.amazonaws.com
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  Gemini CLI settings.json
&lt;/h4&gt;

&lt;p&gt;Once you have the deployed endpoint, update the Gemini CLI MCP settings:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"mcpServers"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"mcp-lambdami-python-aws"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"httpUrl"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"https://3x10n3uguc.execute-api.us-east-1.amazonaws.com/mcp/"&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
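
&lt;p&gt;If you prefer to script this step, the same snippet can be generated and validated non-interactively. A sketch (Gemini CLI reads user-level settings from &lt;code&gt;~/.gemini/settings.json&lt;/code&gt;; here the file is written to the current directory so you can inspect it before merging it in by hand):&lt;/p&gt;

```shell
# Sketch: generate the MCP settings snippet shown above and validate it as JSON
# before merging it into ~/.gemini/settings.json manually.
printf '%s\n' \
  '{' \
  '  "mcpServers": {' \
  '    "mcp-lambdami-python-aws": {' \
  '      "httpUrl": "https://3x10n3uguc.execute-api.us-east-1.amazonaws.com/mcp/"' \
  '    }' \
  '  }' \
  '}' > settings.json

# json.tool exits non-zero on malformed JSON, so this catches typos early.
if python3 -m json.tool settings.json > /dev/null; then
  echo "settings.json is valid JSON"
fi
```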



&lt;h4&gt;
  
  
  Remote MCP Server Testing
&lt;/h4&gt;

&lt;p&gt;Restart Gemini CLI and check for the new MCP tools:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight console"&gt;&lt;code&gt;&lt;span class="gp"&gt;xbill@penguin:~/gemini-cli-aws/mcp-lambdami-python-aws$&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;gemini
&lt;span class="go"&gt;
 ▝▜▄ Gemini CLI v0.41.1
   ▝▜▄
  ▗▟▀ Signed in with Google /auth
 ▝▀ Plan: Gemini Code Assist Standard /upgrade

/mcp list

🟢 mcp-lambdami-python-aws - Ready (1 tool)
  Tools:
  - mcp_mcp-lambdami-python-aws_greet
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The remote MCP tool running on Lambda Managed Instances can then be called:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;&amp;gt; greet Lambda Managed Instances!

  Researching MCP Server Implementation: Reading main.py to understand the greet tool implementation.

  ✓ ReadFile main.py
╭────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ ✓ greet (mcp-lambdami-python-aws MCP Server) {"param":"Lambda Managed Instances"} │
│ │
│ Hello, Lambda Managed Instances! │
╰────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯

✦ Hello, Lambda Managed Instances!

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
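
&lt;p&gt;Under the hood, that tool call is a single JSON-RPC &lt;code&gt;tools/call&lt;/code&gt; request over the MCP HTTP transport. A sketch of the payload, built locally so it can be inspected (the shape follows the MCP specification; the &lt;code&gt;id&lt;/code&gt; value is arbitrary):&lt;/p&gt;

```shell
# Sketch: the JSON-RPC body behind the `greet` call above
# (payload shape per the MCP spec; the id value is arbitrary).
printf '%s\n' \
  '{' \
  '  "jsonrpc": "2.0",' \
  '  "id": 1,' \
  '  "method": "tools/call",' \
  '  "params": {' \
  '    "name": "greet",' \
  '    "arguments": { "param": "Lambda Managed Instances" }' \
  '  }' \
  '}' > call.json

python3 -m json.tool call.json
```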



&lt;p&gt;Gemini CLI can also summarize the deployment:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;✦ The Lambda Managed Instances (LMI) deployment in this project is a hybrid setup that combines CloudFormation for infrastructure and AWS CLI for
  specialized Lambda configurations. It is designed to provide high-performance, dedicated compute for the MCP server with zero cold starts and
  multi-concurrency.

  1. Core Architecture
   * Compute: Lambda functions run on dedicated EC2 capacity managed by AWS but living in your VPC.
   * Networking: A VPC with Public and Private subnets. The Lambda resides in the Private subnet and uses a NAT Gateway for outbound internet
     access (required for external tool calls).
   * Inbound: API Gateway (HTTP API) acts as the entry point, routing traffic to a Lambda Alias (prod).

  2. Infrastructure (CloudFormation)
  The template.yaml defines the foundational components:
   * VPC &amp;amp; NAT Gateway: Essential for providing the dedicated instances with a secure environment and internet connectivity.
   * IAM Roles:
       * Execution Role: Standard Lambda permissions + VPC access.
       * Operator Role: Allows the Lambda service to manage EC2 resources (Scaling, Patching) on your behalf via the
         AWSLambdaManagedEC2ResourceOperator policy.
   * API Gateway: Configured with a $default route pointing to the prod alias of the Lambda.

  3. Managed Instances Configuration (deploy.sh)
  Because LMI features are often newer or require specific CLI sequences, deploy.sh handles the following:
   * Capacity Provider: A named resource (mcp-lambda-python-aws-cp) that defines:
       * Instance Requirements: Architectures (x86_64) and Scaling (Max vCPU count).
       * VPC Placement: Subnets and Security Groups.
   * Lambda Function: Created with a capacity-provider-config that links it to the Capacity Provider.
       * Memory/vCPU: Set to 2048MB (2GB) minimum to meet LMI requirements.
       * Package Type: Image (Container-based using the project's Dockerfile).

  4. Deployment Workflow
   1. Build &amp;amp; Push: Docker image is built and pushed to Amazon ECR.
   2. Infra Deploy: CloudFormation creates the VPC, IAM roles, and API Gateway.
   3. CP Setup: The script ensures the Capacity Provider exists and is configured.
   4. Function Update/Create: The Lambda function is deployed or updated with the LMI configuration.
   5. Versioning:
       * A new version is published.
       * The prod alias is updated to point to this version.
       * LATEST_PUBLISHED support: The function is published to the special LATEST_PUBLISHED state, which is a requirement for LMI to manage
         instances effectively.

  5. Key Operational Characteristics
   * Multi-concurrency: Unlike standard Lambda, LMI handles multiple concurrent requests per instance (default 8), making it ideal for the
     persistent connections used in MCP SSE.
   * Stateless HTTP: The server uses stateless_http=True in FastMCP, ensuring compatibility with the Lambda lifecycle.
   * Warm-up Period: Functions may show as ActiveNonInvocable initially while the Managed Instances are being provisioned and warmed up.

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  Summary
&lt;/h4&gt;

&lt;p&gt;The strategy of using Python for MCP development with Gemini CLI on AWS Lambda Managed Instances was validated with an incremental, step-by-step approach.&lt;/p&gt;

&lt;p&gt;A minimal HTTP-transport MCP server was started from Python source code and validated with Gemini CLI running as an MCP client in the same local environment.&lt;/p&gt;

&lt;p&gt;This MCP server was then wrapped in a Docker container and deployed to AWS Lambda Managed Instances. The local copy of Gemini CLI was used as an MCP client to validate the connection.&lt;/p&gt;

&lt;p&gt;This approach can be extended to more complex deployments using other MCP transports and cloud-based options.&lt;/p&gt;

</description>
      <category>agenticai</category>
      <category>mcps</category>
      <category>gemini</category>
      <category>aws</category>
    </item>
    <item>
      <title>[GCP Practice][BwAI] AI-Powered Development: Quickly Deploy a LINE Bot Cloud Backup Tool with Gemini CLI</title>
      <dc:creator>Evan Lin</dc:creator>
      <pubDate>Thu, 07 May 2026 04:36:40 +0000</pubDate>
      <link>https://dev.to/gde/gcp-practicebwai-ai-powered-development-quickly-deploy-a-line-bot-cloud-backup-tool-with-4ghi</link>
      <guid>https://dev.to/gde/gcp-practicebwai-ai-powered-development-quickly-deploy-a-line-bot-cloud-backup-tool-with-4ghi</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqpdr5gzv1yj95xvae4ss.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqpdr5gzv1yj95xvae4ss.png" alt="Preview Program 2026-05-05 12.38.54" width="800" height="533"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h1&gt;
  
  
  Background
&lt;/h1&gt;

&lt;p&gt;In the upcoming &lt;strong&gt;Build With AI 2026&lt;/strong&gt; workshop, we're bringing a very practical project: the &lt;strong&gt;LINE Bot File Backup Robot&lt;/strong&gt;. It allows you to directly upload images and files from your LINE chatroom to Google Drive, and it will automatically create folders by month to keep things organized.&lt;/p&gt;

&lt;p&gt;Traditionally, putting a project like this, which includes OAuth authorization, a Firestore database, and Cloud Run container deployment, on the cloud would often leave beginners struggling with lengthy &lt;code&gt;gcloud&lt;/code&gt; commands.&lt;/p&gt;

&lt;p&gt;But this time it's different, because we have a secret weapon: &lt;strong&gt;Gemini CLI&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;This article documents how we used AI as a DevOps engineer, completing the entire complex deployment process just by "talking," including, of course, the very real pitfalls we encountered along the way.&lt;/p&gt;




&lt;h2&gt;
  
  
  Preparation: Summoning the AI Assistant
&lt;/h2&gt;

&lt;p&gt;Before we start, besides the basic &lt;code&gt;gcloud&lt;/code&gt; installation and login, you only need to install &lt;a href="https://github.com/google/gemini-cli" rel="noopener noreferrer"&gt;Gemini CLI&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Prepare the following "confidential parameters" (all values are mocked in this article):&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;PROJECT_ID&lt;/strong&gt;: &lt;code&gt;your-cool-project-id&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;LINE Channel Secret&lt;/strong&gt;: &lt;code&gt;YOUR_LINE_SECRET_XXXX&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;LINE Access Token&lt;/strong&gt;: &lt;code&gt;YOUR_LINE_TOKEN_XXXX&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;After entering the project folder, I only said one sentence to Gemini CLI:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"Help me deploy to Cloud Run using gcloud, and stop and ask me if you need any information. Refer to the repo…"&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Next, it's time to witness miracles (and fix bugs).&lt;/p&gt;




&lt;h2&gt;
  
  
  Practical Deployment Process: AI Leading the Way
&lt;/h2&gt;

&lt;p&gt;Gemini CLI intelligently analyzed the &lt;code&gt;Dockerfile&lt;/code&gt; and &lt;code&gt;main.go&lt;/code&gt; and immediately laid out a battle plan.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 1: Environment Detection and API Enablement
&lt;/h3&gt;

&lt;p&gt;The AI first confirmed my current project settings in &lt;code&gt;gcloud&lt;/code&gt; and then enabled the necessary services in one go:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;gcloud services &lt;span class="nb"&gt;enable &lt;/span&gt;firestore.googleapis.com &lt;span class="se"&gt;\&lt;/span&gt;
  cloudbuild.googleapis.com &lt;span class="se"&gt;\&lt;/span&gt;
  run.googleapis.com &lt;span class="se"&gt;\&lt;/span&gt;
  artifactregistry.googleapis.com

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Step 2: Creating a Firestore Database (Encountering the First Pitfall)
&lt;/h3&gt;

&lt;p&gt;Our Bot needs to record the OAuth state token (an anti-forgery marker), so Firestore is needed. The AI tried to execute the command, but we immediately hit an error. &lt;em&gt;(See the pitfall record below for details)&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;After correction, the correct command is:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;gcloud firestore databases create &lt;span class="nt"&gt;--location&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;asia-east1 &lt;span class="nt"&gt;--type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;firestore-native

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Step 3: Deploying Cloud Run First, Filling in the Blanks Later
&lt;/h3&gt;

&lt;p&gt;This is a classic chicken-and-egg problem: Google OAuth needs to know your Cloud Run URL (the redirect URI), but your Cloud Run deployment needs the OAuth Client ID and Secret filled in.&lt;/p&gt;

&lt;p&gt;Gemini CLI's strategy is great: &lt;strong&gt;Deploy with placeholders first!&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;gcloud run deploy linebot-backup-service &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--source&lt;/span&gt; &lt;span class="nb"&gt;.&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--region&lt;/span&gt; asia-east1 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--set-env-vars&lt;/span&gt; &lt;span class="s2"&gt;"GOOGLE_CLOUD_PROJECT=your-cool-project-id,ChannelSecret=YOUR_LINE_SECRET_XXXX,ChannelAccessToken=YOUR_LINE_TOKEN_XXXX,GOOGLE_CLIENT_ID=PENDING,GOOGLE_CLIENT_SECRET=PENDING,GOOGLE_REDIRECT_URL=PENDING"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--allow-unauthenticated&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--quiet&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;After a successful deployment, we got our shiny new URL: &lt;code&gt;https://linebot-backup-service-xxxxx.a.run.app&lt;/code&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 4: Completing Google OAuth Settings and Environment Variable Updates
&lt;/h3&gt;

&lt;p&gt;With the URL, I could go to "APIs &amp;amp; Services" in the Google Cloud Console to complete the settings:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; Create an &lt;strong&gt;OAuth consent screen&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt; Create credentials for a &lt;strong&gt;Web application&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt; Fill in the "Authorized redirect URI" with the URL we just got, plus &lt;code&gt;/oauth/callback&lt;/code&gt;.&lt;/li&gt;
&lt;/ol&gt;
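
&lt;p&gt;The redirect URI in step 3 is simply the service URL with the callback path appended. A tiny sketch using the mock URL from above:&lt;/p&gt;

```shell
# Sketch: compose the authorized redirect URI from the Cloud Run service URL.
# The xxxxx host is the mock value from this article; substitute your own.
SERVICE_URL="https://linebot-backup-service-xxxxx.a.run.app"
REDIRECT_URI="${SERVICE_URL}/oauth/callback"
echo "$REDIRECT_URI"
```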

&lt;p&gt;After getting the real ID and Secret, I pasted them straight into Gemini CLI, and it automatically updated the service for me:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;gcloud run services update linebot-backup-service &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--region&lt;/span&gt; asia-east1 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--update-env-vars&lt;/span&gt; &lt;span class="s2"&gt;"GOOGLE_REDIRECT_URL=https://[YOUR_URL]/oauth/callback,GOOGLE_CLIENT_ID=real-client-id.apps.googleusercontent.com,GOOGLE_CLIENT_SECRET=real-secret-xxxx"&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Done! Finally, just go to the LINE Developers Console and fill in the Webhook.&lt;/p&gt;




&lt;h2&gt;
  
  
  Blood and Tears Pitfall Records During the Deployment Process
&lt;/h2&gt;

&lt;p&gt;It looks smooth, but in fact the AI and I hit a few walls together. This is also the most authentic part of working with CLI tools.&lt;/p&gt;

&lt;h3&gt;
  
  
  Pitfall 1: Forgetting to Bind a Credit Card, the 390001 Error
&lt;/h3&gt;

&lt;p&gt;When executing the first &lt;code&gt;gcloud run deploy&lt;/code&gt;, the terminal immediately spewed a screenful of red error text:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;code&gt;FAILED_PRECONDITION: Billing account for project is not found...&lt;/code&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;Reason&lt;/strong&gt;: Cloud Run and Cloud Build require the project to enable billing (Billing Enabled). This is a brand new test project, and I forgot to bind the billing account. &lt;strong&gt;Solution&lt;/strong&gt;: The AI immediately checked the project status for me (&lt;code&gt;gcloud beta billing projects describe&lt;/code&gt;) and asked me if I wanted to switch to a project with billing, or to fix it. I obediently went to the Console to bind my credit card, and the deployment was able to continue.&lt;/p&gt;

&lt;h3&gt;
  
  
  Pitfall 2: The Evolution of Command Parameter Syntax
&lt;/h3&gt;

&lt;p&gt;When creating Firestore, the AI initially gave the command &lt;code&gt;--type=native-mode&lt;/code&gt; or &lt;code&gt;--type=native&lt;/code&gt;, but gcloud didn't appreciate it:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;code&gt;ERROR: argument --type: Invalid choice: 'native-mode'&lt;/code&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;Reason&lt;/strong&gt;: The CLI parameters of &lt;code&gt;gcloud&lt;/code&gt; change with version updates. &lt;strong&gt;Solution&lt;/strong&gt;: Read the gcloud error message carefully; the accepted values are now &lt;code&gt;firestore-native&lt;/code&gt; and &lt;code&gt;datastore-mode&lt;/code&gt;. After switching to &lt;code&gt;--type=firestore-native&lt;/code&gt;, it passed smoothly.&lt;/p&gt;

&lt;h3&gt;
  
  
  Pitfall 3: The Invisible "Drive API"
&lt;/h3&gt;

&lt;p&gt;When everything was deployed, we hit a permission error while testing "upload to Google Drive". &lt;strong&gt;Reason&lt;/strong&gt;: This is a Bot that uploads files to Drive, but when we enabled the APIs in the first step, we forgot to enable the protagonist: the &lt;strong&gt;Google Drive API&lt;/strong&gt;! Without it, even if OAuth authorization succeeds, the program is still blocked. &lt;strong&gt;Solution&lt;/strong&gt;: I typed only the cryptic &lt;code&gt;"3."&lt;/code&gt; (meaning the third checkpoint) into the terminal, and the AI immediately understood and applied the finishing blow:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;gcloud services &lt;span class="nb"&gt;enable &lt;/span&gt;drive.googleapis.com

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;With Gemini CLI, the tedious and error-prone work of building infrastructure has become a pair-programming session.&lt;/p&gt;

&lt;p&gt;AI can remember lengthy gcloud parameters for you, sort out the deployment logic (deploy with PENDING placeholders first, then update), and even quickly adjust strategy based on error messages when something goes wrong.&lt;/p&gt;

&lt;p&gt;This is the core spirit that &lt;strong&gt;Build With AI 2026&lt;/strong&gt; wants to convey: let AI handle the tedious DevOps chores, so that developers can focus more energy on innovation in core business logic.&lt;/p&gt;

&lt;p&gt;If you are still manually typing long and ugly gcloud commands, I strongly recommend you install Gemini CLI and give it a try!&lt;/p&gt;

</description>
      <category>ai</category>
      <category>gemini</category>
      <category>googlecloud</category>
      <category>tutorial</category>
    </item>
    <item>
      <title>Building ADK Agents with Phi Models on Azure Foundry</title>
      <dc:creator>xbill</dc:creator>
      <pubDate>Wed, 06 May 2026 14:32:51 +0000</pubDate>
      <link>https://dev.to/gde/building-adk-agents-with-phi-models-on-azure-foundry-25a2</link>
      <guid>https://dev.to/gde/building-adk-agents-with-phi-models-on-azure-foundry-25a2</guid>
      <description>&lt;p&gt;This article provides a step by step deployment guide for using Azure foundry models with ADK Agents. The small Phi-4 model hosted on Azure Foundry is configured with LiteLLM for use from the ADK Agent.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjpa5q85flxsd3jl7hfb5.jpeg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjpa5q85flxsd3jl7hfb5.jpeg" width="800" height="437"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h4&gt;
  
  
  What is this project trying to do?
&lt;/h4&gt;

&lt;p&gt;This project aims to configure an ADK agent to use an Azure Foundry model. The ADK provides agent support in standard Python code.&lt;/p&gt;

&lt;h4&gt;
  
  
  LiteLLM
&lt;/h4&gt;

&lt;p&gt;LiteLLM is an open-source AI gateway and Python SDK that provides a unified OpenAI-compatible interface to over 100 LLMs (Anthropic, Gemini, Azure, Bedrock, Ollama). It simplifies API management by allowing users to call any model using the same format, providing features like spend tracking, model fallbacks, key management, and load balancing.&lt;/p&gt;

&lt;p&gt;More info is here:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/BerriAI/litellm" rel="noopener noreferrer"&gt;GitHub - BerriAI/litellm: Python SDK, Proxy Server (AI Gateway) to call 100+ LLM APIs in OpenAI (or native) format, with cost tracking, guardrails, loadbalancing and logging. [Bedrock, Azure, OpenAI, VertexAI, Cohere, Anthropic, Sagemaker, HuggingFace, VLLM, NVIDIA NIM]&lt;/a&gt;&lt;/p&gt;

&lt;h4&gt;
  
  
  Azure Foundry
&lt;/h4&gt;

&lt;p&gt;Azure AI Foundry (formerly Azure AI Studio) is a unified platform-as-a-service for developing, customizing, and managing AI agents and applications. It serves as an AI “factory” combining model catalogs (including OpenAI and Hugging Face), low-code tools, and enterprise-grade security for building generative AI apps. [&lt;a href="https://learn.microsoft.com/en-us/azure/foundry/what-is-foundry" rel="noopener noreferrer"&gt;1&lt;/a&gt;, &lt;a href="https://azure.microsoft.com/en-us/products/ai-foundry" rel="noopener noreferrer"&gt;2&lt;/a&gt;, &lt;a href="https://www.youtube.com/shorts/j83xQT9AtgA" rel="noopener noreferrer"&gt;3&lt;/a&gt;, &lt;a href="https://learn.microsoft.com/en-us/azure/foundry/agents/overview" rel="noopener noreferrer"&gt;4&lt;/a&gt;]&lt;/p&gt;

&lt;p&gt;More details are available here:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://azure.microsoft.com/en-us/products/ai-foundry" rel="noopener noreferrer"&gt;https://azure.microsoft.com/en-us/products/ai-foundry&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnxpoyn9lrk93g15meyy5.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnxpoyn9lrk93g15meyy5.png" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h4&gt;
  
  
  Why would I want Gemini CLI with Azure? Isn’t that a Google Thing?
&lt;/h4&gt;

&lt;p&gt;Yes, Gemini CLI leverages the Google Cloud console and Gemini models, but it is also open source and platform agnostic. Many applications are already cross-cloud, so this lets familiar tooling run natively on Microsoft Azure.&lt;/p&gt;

&lt;h4&gt;
  
  
  Azure CLI
&lt;/h4&gt;

&lt;p&gt;The Azure Command-Line Interface (CLI) is a cross-platform tool used to connect to Azure and execute administrative commands on Azure resources. It allows you to manage services like virtual machines, databases, and networking through a terminal using interactive prompts or scripts. [&lt;a href="https://learn.microsoft.com/en-us/cli/azure/what-is-azure-cli?view=azure-cli-latest" rel="noopener noreferrer"&gt;1&lt;/a&gt;, &lt;a href="https://learn.microsoft.com/en-us/cli/azure/?view=azure-cli-latest" rel="noopener noreferrer"&gt;2&lt;/a&gt;, &lt;a href="https://sumble.com/tech/azure-cli" rel="noopener noreferrer"&gt;3&lt;/a&gt;]&lt;/p&gt;

&lt;p&gt;More details are available here:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://learn.microsoft.com/en-us/azure/developer/azure-developer-cli/overview?tabs=linux" rel="noopener noreferrer"&gt;What is the Azure Developer CLI?&lt;/a&gt;&lt;/p&gt;

&lt;h4&gt;
  
  
  Gemini CLI
&lt;/h4&gt;

&lt;p&gt;If not pre-installed you can download the Gemini CLI to interact with the source files and provide real-time assistance:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npm &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-g&lt;/span&gt; @google/gemini-cli
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  Testing the Gemini CLI Environment
&lt;/h4&gt;

&lt;p&gt;Once you have all the tools and the correct Node.js version in place, you can test the startup of Gemini CLI. You will need to authenticate with an API key or your Google Account:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight console"&gt;&lt;code&gt;&lt;span class="gp"&gt;xbill@penguin:~/aisprintapr2026/tpu-vllm-devops-agent$&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;gemini
&lt;span class="go"&gt;
▝▜▄ Gemini CLI v0.39.1
   ▝▜▄
  ▗▟▀ Signed in with Google /auth
 ▝▀ Plan: Gemini Code Assist Standard /upgrade
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  Node Version Management
&lt;/h4&gt;

&lt;p&gt;Gemini CLI needs a consistent, up-to-date version of Node.js. The &lt;strong&gt;nvm&lt;/strong&gt; command can be used to get a standard Node environment:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/nvm-sh/nvm" rel="noopener noreferrer"&gt;GitHub - nvm-sh/nvm: Node Version Manager - POSIX-compliant bash script to manage multiple active node.js versions&lt;/a&gt;&lt;/p&gt;

&lt;h4&gt;
  
  
  Python MCP Documentation
&lt;/h4&gt;

&lt;p&gt;The official GitHub Repo provides samples and documentation for getting started:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/modelcontextprotocol/python-sdk" rel="noopener noreferrer"&gt;GitHub - modelcontextprotocol/python-sdk: The official Python SDK for Model Context Protocol servers and clients&lt;/a&gt;&lt;/p&gt;

&lt;h4&gt;
  
  
  Agent Development Kit
&lt;/h4&gt;

&lt;p&gt;The &lt;a href="https://www.google.com/search?q=Google+Agent+Development+Kit&amp;amp;rlz=1CAIWTJ_enUS1114&amp;amp;oq=what+is+the+adk+google&amp;amp;gs_lcrp=EgZjaHJvbWUyBggAEEUYOTIICAEQABgWGB4yCAgCEAAYFhgeMggIAxAAGBYYHjIICAQQABgWGB4yCAgFEAAYFhgeMggIBhAAGBYYHjIKCAcQABgKGBYYHjINCAgQABiGAxiABBiKBTIKCAkQABiABBiiBNIBCDMxODlqMGo3qAIAsAIA&amp;amp;sourceid=chrome&amp;amp;ie=UTF-8&amp;amp;mstk=AUtExfB5Oo7ZHHcDEHu7aqZiPBA2l1c-QGh5dB7xkkDPIiYcn8O1Imt2IHNR7bzA6JnyDCSDCUGpGWTeBW14namlN_QqzJLLI5-px1BE9jfSxwli6njPDPERjm5pRqNP3uC6HhUKiRcTJ1T8x5LHQrCkVxylw7QWg0N8B4dQDIcWpnVX9Gc&amp;amp;csui=3&amp;amp;ved=2ahUKEwjYu-G8p-uSAxXrv4kEHUbpLo0QgK4QegQIARAB" rel="noopener noreferrer"&gt;Google Agent Development Kit&lt;/a&gt; (ADK) is an open-source, Python-based framework designed to streamline the creation, deployment, and orchestration of sophisticated, multi-agent AI systems. It treats agent development like software engineering, offering modularity, state management, and built-in tools (like Google Search) to build autonomous agents.&lt;/p&gt;

&lt;p&gt;The ADK can be installed from here:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://google.github.io/adk-docs/" rel="noopener noreferrer"&gt;Agent Development Kit (ADK)&lt;/a&gt;&lt;/p&gt;

&lt;h4&gt;
  
  
  Agent Skills
&lt;/h4&gt;

&lt;p&gt;Gemini CLI can be customized to work with ADK agents. Both an Agent Development MCP server and specific agent skills are available.&lt;/p&gt;

&lt;p&gt;More details are here:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://adk.dev/tutorials/coding-with-ai/" rel="noopener noreferrer"&gt;Agent Development Kit (ADK)&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The ADK documentation is available as an MCP server:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight console"&gt;&lt;code&gt;&lt;span class="gp"&gt;&amp;gt;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;/mcp list
&lt;span class="go"&gt;Configured MCP servers:
🟢 adk-docs-mcp (from adk-docs-ext) - Ready (2 tools)
  Tools:
  - mcp_adk-docs-mcp_fetch_docs
  - mcp_adk-docs-mcp_list_doc_sources
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  Where do I start?
&lt;/h4&gt;

&lt;p&gt;The strategy for starting multi-agent development is an incremental, step-by-step approach.&lt;/p&gt;

&lt;p&gt;First, the basic development environment is set up with the required system variables and a working Gemini CLI configuration.&lt;/p&gt;

&lt;p&gt;Then, an ADK agent is built, debugged, and tested locally.&lt;/p&gt;

&lt;h4&gt;
  
  
  Setup the Basic Environment
&lt;/h4&gt;

&lt;p&gt;At this point you should have a working Python environment and a working Gemini CLI installation. The next step is to clone the GitHub samples repository with support scripts:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;cd&lt;/span&gt; ~
git clone https://github.com/xbill9/gemini-cli-azure
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then run &lt;strong&gt;init.sh&lt;/strong&gt; from the cloned directory.&lt;/p&gt;

&lt;p&gt;The script will attempt to determine your shell environment and set the correct variables:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;cd &lt;/span&gt;adk-litellm
&lt;span class="nb"&gt;source &lt;/span&gt;init.sh
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If your session times out or you need to re-authenticate, you can run the &lt;strong&gt;set_env.sh&lt;/strong&gt; script to reset your environment variables:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;source &lt;/span&gt;set_env.sh
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Variables like PROJECT_ID need to be set up for use in the various build scripts, so the set_env.sh script can be used to reset the environment whenever your session times out.&lt;/p&gt;
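&lt;p&gt;As a quick sanity check before running the build scripts, a small Python helper can report which variables are still unset. This is an illustrative sketch, not part of the sample repository; &lt;strong&gt;PROJECT_ID&lt;/strong&gt; comes from this section, while AZURE_AI_API_KEY and AZURE_AI_API_BASE are the variables the agent code review later in this article says the environment must provide:&lt;/p&gt;

```python
import os

# Illustrative sketch (not part of the sample repo): report which of the
# environment variables used by the build scripts are currently unset.
REQUIRED_VARS = ["PROJECT_ID", "AZURE_AI_API_KEY", "AZURE_AI_API_BASE"]

def missing_vars(required=REQUIRED_VARS):
    """Return the subset of required variables not set in the environment."""
    return [name for name in required if not os.environ.get(name)]

if __name__ == "__main__":
    missing = missing_vars()
    if missing:
        print("Run 'source set_env.sh' to set: " + ", ".join(missing))
    else:
        print("Environment looks complete.")
```

&lt;p&gt;If anything is listed, re-run &lt;strong&gt;set_env.sh&lt;/strong&gt; before continuing.&lt;/p&gt;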

&lt;h4&gt;
  
  
  Azure Setup
&lt;/h4&gt;

&lt;p&gt;The Azure CLI provides the basic tools for working with the cloud-based services:&lt;/p&gt;

&lt;p&gt;Check the installation:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight console"&gt;&lt;code&gt;&lt;span class="gp"&gt;xbill@penguin:~/gemini-cli-azure$&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;az &lt;span class="nt"&gt;--version&lt;/span&gt;
&lt;span class="go"&gt;azure-cli 2.86.0

core 2.86.0
telemetry 1.1.0

Dependencies:
msal 1.35.1
azure-mgmt-resource 24.0.0

Python location '/opt/az/bin/python3'
Config directory '/home/xbill/.azure'
Extensions directory '/home/xbill/.azure/cliextensions'

Python (Linux) 3.13.13 (main, May 1 2026, 00:24:43) [GCC 12.2.0]

Legal docs and information: aka.ms/AzureCliLegal

Your CLI is up-to-date.
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Once the tools are installed, log in to the console:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;az login
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then run the Azure setup scripts:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;./setup-azure-phi.sh

./get-azure-phi-creds.sh
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The Azure console should look something like this:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frvk2mgwecn6bqk4ixu5d.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frvk2mgwecn6bqk4ixu5d.png" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h4&gt;
  
  
  Running the Python Code
&lt;/h4&gt;

&lt;p&gt;First, switch to the directory with the Python version of the MCP sample code, then install the release version on the local system:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;cd &lt;/span&gt;adk-litellm&lt;span class="nv"&gt;$ &lt;/span&gt;make &lt;span class="nb"&gt;install
&lt;/span&gt;Processing ./.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  Verify The ADK Installation
&lt;/h4&gt;

&lt;p&gt;First, verify the ADK is installed:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight console"&gt;&lt;code&gt;&lt;span class="gp"&gt;xbill@penguin:~/gemini-cli-aws/adk-litellm$&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;adk &lt;span class="nt"&gt;--version&lt;/span&gt;
&lt;span class="go"&gt;adk, version 1.32.0
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;To check the ADK setup, run the ADK CLI locally with the hello agent. This is a basic Hello World style agent: no external tools or MCP calls are used in the agent code, which allows the ADK installation itself to be verified:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight console"&gt;&lt;code&gt;&lt;span class="gp"&gt;xbill@penguin:~/gemini-cli-azure/adk-litellm-azure$&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;adk run agents/azure
&lt;span class="go"&gt;Log setup complete: /tmp/agents_log/agent.20260505_133501.log
To access latest log: tail -F /tmp/agents_log/agent.latest.log
/home/xbill/.local/lib/python3.13/site-packages/google/adk/cli/cli.py:204: UserWarning: [EXPERIMENTAL] InMemoryCredentialService: This feature is experimental and may change or be removed in future versions without notice. It may introduce breaking changes at any time.
  credential_service = InMemoryCredentialService()
/home/xbill/.local/lib/python3.13/site-packages/google/adk/auth/credential_service/in_memory_credential_service.py:33: UserWarning: [EXPERIMENTAL] BaseCredentialService: This feature is experimental and may change or be removed in future versions without notice. It may introduce breaking changes at any time.
  super(). __init__ ()
Running agent azure_agent, type exit to exit.

13:35:04 - LiteLLM:INFO: utils.py:4011 - 
&lt;/span&gt;&lt;span class="gp"&gt;LiteLLM completion() model= phi-4-mini;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;provider &lt;span class="o"&gt;=&lt;/span&gt; azure_ai
&lt;span class="go"&gt;
Give Feedback / Get Help: https://github.com/BerriAI/litellm/issues/new
LiteLLM.Info: If you need to debug this error, use `litellm._turn_on_debug()'.

13:35:14 - LiteLLM:INFO: utils.py:4011 - 
&lt;/span&gt;&lt;span class="gp"&gt;LiteLLM completion() model= phi-4-mini;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;provider &lt;span class="o"&gt;=&lt;/span&gt; azure_ai
&lt;span class="go"&gt;[azure_agent]: Hello! How can I assist you today?
[user]: what are your model details
13:35:23 - LiteLLM:INFO: utils.py:4011 - 
&lt;/span&gt;&lt;span class="gp"&gt;LiteLLM completion() model= phi-4-mini;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;provider &lt;span class="o"&gt;=&lt;/span&gt; azure_ai
&lt;span class="go"&gt;[azure_agent]: Hello! I am Phi, developed by Microsoft, and I'm here to help you with information, answer questions, and assist with a wide range of topics. How may I be of service to you?


&lt;h4&gt;
  
  
  Test The ADK Web Interface
&lt;/h4&gt;

&lt;p&gt;This step tests the ADK agent interactions with a browser:&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight console"&gt;&lt;code&gt;&lt;span class="gp"&gt;xbill@penguin:~/gemini-cli-azure/adk-litellm-azure/agents$&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;adk web &lt;span class="nt"&gt;--host&lt;/span&gt; 0.0.0.0
&lt;span class="go"&gt;2026-05-05 13:39:05,375 - INFO - service_factory.py:266 - Using in-memory memory service
2026-05-05 13:39:05,375 - INFO - local_storage.py:84 - Using per-agent session storage rooted at /home/xbill/gemini-cli-azure/adk-litellm-azure/agents
2026-05-05 13:39:05,376 - INFO - local_storage.py:110 - Using file artifact service at /home/xbill/gemini-cli-azure/adk-litellm-azure/agents/.adk/artifacts
/home/xbill/.local/lib/python3.13/site-packages/google/adk/cli/fast_api.py:204: UserWarning: [EXPERIMENTAL] InMemoryCredentialService: This feature is experimental and may change or be removed in future versions without notice. It may introduce breaking changes at any time.
  credential_service = InMemoryCredentialService()
/home/xbill/.local/lib/python3.13/site-packages/google/adk/auth/credential_service/in_memory_credential_service.py:33: UserWarning: [EXPERIMENTAL] BaseCredentialService: This feature is experimental and may change or be removed in future versions without notice. It may introduce breaking changes at any time.
  super(). __init__ ()
INFO: Started server process [15609]
INFO: Waiting for application startup.

+-----------------------------------------------------------------------------+
| ADK Web Server started |
| |
| For local testing, access at http://0.0.0.0:8000. |
+-----------------------------------------------------------------------------+
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then use the web interface — either on the local interface &lt;strong&gt;127.0.0.1&lt;/strong&gt; or the catch-all interface &lt;strong&gt;0.0.0.0&lt;/strong&gt;, depending on your environment:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpk0p1on779qpn7wr5voa.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpk0p1on779qpn7wr5voa.png" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;A special note for Google Cloud Shell deployments: add a CORS &lt;strong&gt;allow_origins&lt;/strong&gt; configuration exemption to allow the ADK agent to run:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;adk web &lt;span class="nt"&gt;--host&lt;/span&gt; 0.0.0.0 &lt;span class="nt"&gt;--allow_origins&lt;/span&gt; &lt;span class="s1"&gt;'regex:.*'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  Phi Models on Azure
&lt;/h4&gt;

&lt;p&gt;Microsoft Phi models are a family of highly capable, cost-effective Small Language Models (SLMs) designed for low-latency generative AI, available on Azure AI Foundry. They support text, image, and audio inputs (Phi-4-multimodal) and are ideal for memory-constrained environments. Key models include Phi-4, Phi-3.5-MoE, and Phi-3-mini/medium, optimized for high-performance reasoning tasks.&lt;/p&gt;

&lt;p&gt;More details are here:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F79z2ygkn9mbifwnrgahl.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F79z2ygkn9mbifwnrgahl.png" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h4&gt;
  
  
  Deep Dive with a Phi Model
&lt;/h4&gt;

&lt;p&gt;Once the connection has been tested, you can drill into the model:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight console"&gt;&lt;code&gt;&lt;span class="gp"&gt;xbill@penguin:~/gemini-cli-azure/adk-litellm-azure$&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;adk run agents/azure
&lt;span class="go"&gt;Log setup complete: /tmp/agents_log/agent.20260505_134754.log
To access latest log: tail -F /tmp/agents_log/agent.latest.log
/home/xbill/.local/lib/python3.13/site-packages/google/adk/cli/cli.py:204: UserWarning: [EXPERIMENTAL] InMemoryCredentialService: This feature is experimental and may change or be removed in future versions without notice. It may introduce breaking changes at any time.
  credential_service = InMemoryCredentialService()
/home/xbill/.local/lib/python3.13/site-packages/google/adk/auth/credential_service/in_memory_credential_service.py:33: UserWarning: [EXPERIMENTAL] BaseCredentialService: This feature is experimental and may change or be removed in future versions without notice. It may introduce breaking changes at any time.
  super(). __init__ ()
Running agent azure_agent, type exit to exit.
[user]: what model are you
13:48:00 - LiteLLM:INFO: utils.py:4011 - 
&lt;/span&gt;&lt;span class="gp"&gt;LiteLLM completion() model= phi-4-mini;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;provider &lt;span class="o"&gt;=&lt;/span&gt; azure_ai
&lt;span class="go"&gt;[azure_agent]: I am Phi, an AI developed by Microsoft, designed to help you with a wide range of queries and tasks. How can I assist you today?



&lt;p&gt;and deep model details:&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight console"&gt;&lt;code&gt;&lt;span class="go"&gt;10:29:49 - LiteLLM:INFO: utils.py:4011 - 
&lt;/span&gt;&lt;span class="gp"&gt;LiteLLM completion() model= phi-4-mini;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;provider &lt;span class="o"&gt;=&lt;/span&gt; azure_ai
&lt;span class="go"&gt;[azure_agent]: My model knowledge was last updated in October 2023. Please note that I can't access or retrieve real-time data, so my information may not include events or developments that occurred after this date. For the most current and accurate information, I recommend referring to the latest sources.


&lt;h4&gt;
  
  
  One Last Code Review with Gemini CLI
&lt;/h4&gt;

&lt;p&gt;Gemini CLI was used to summarize the project:&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;&amp;gt; code review adk agent for phi model                                                                                                            

✦ I've reviewed the Azure agent implementation in agents/azure/agent.py. Overall, the implementation correctly utilizes the ADK (Agent Development
  Kit) and integrates with Azure AI Foundry via LiteLLM.

  🔍 Code Review Findings

  1. ADK Best Practices
   * Missing description: While the name is correctly defined, the description parameter is missing. In ADK, the description is critical for
     multi-agent systems, as it helps routers and other agents understand when to delegate tasks to this specific agent.
   * Minimal instruction: The current instruction is very brief. ADK best practices suggest using Markdown for clarity and being more specific
     about the agent's persona and constraints.
   * Short request_timeout: The timeout is set to 10 seconds. For LLM inferences, especially via serverless endpoints which may have occasional
     cold starts or latency, 30 seconds is generally more reliable.

  2. Azure AI Foundry Integration
   * Model String: The model string azure_ai/phi-4-mini is correctly formatted for LiteLLM to recognize the Azure AI Foundry provider.
   * Environment Variables: The code correctly leverages AZURE_MODEL with a fallback. It assumes AZURE_AI_API_KEY and AZURE_AI_API_BASE are set in
     the environment, which matches the setup-azure-phi.sh automation script.

  3. Code Quality
   * Imports: Clean and follows standard ADK patterns.
   * Encapsulation: Using root_agent as the export in agent.py (and re-exporting in __init__.py) is the standard pattern for ADK's discovery
     mechanism.

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
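&lt;p&gt;To make those findings concrete, here is a dependency-free Python sketch of the configuration the review describes: the AZURE_MODEL override with its azure_ai/phi-4-mini fallback, plus the reviewer's suggested description and 30-second timeout. The description text is hypothetical, and in the real agents/azure/agent.py these values would be passed to the ADK agent constructor through LiteLLM rather than returned as a plain dictionary:&lt;/p&gt;

```python
import os

def agent_config():
    # Sketch of the reviewed agents/azure/agent.py settings, with the
    # reviewer's suggestions applied. The description text is hypothetical.
    return {
        "name": "azure_agent",
        "description": "General-purpose assistant backed by a Phi model on Azure AI Foundry.",
        # LiteLLM routing string: the 'azure_ai/' provider prefix plus the
        # deployment name, overridable via the AZURE_MODEL environment variable.
        "model": os.environ.get("AZURE_MODEL", "azure_ai/phi-4-mini"),
        "request_timeout": 30,  # the review recommends 30s over the original 10s
    }
```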



&lt;h4&gt;
  
  
  Summary
&lt;/h4&gt;

&lt;p&gt;The strategy for using the ADK with Azure LLMs was validated with an incremental, step-by-step approach.&lt;/p&gt;

&lt;p&gt;First, the basic environment tools were configured, and the required Python libraries were installed. LiteLLM was used to proxy the Google ADK to a model hosted on Azure. Finally, tests were run from the CLI and the web interface, and Gemini CLI was used to validate the approach.&lt;/p&gt;

</description>
      <category>microsoftphi4</category>
      <category>agents</category>
      <category>googleadk</category>
      <category>gemini</category>
    </item>
    <item>
      <title>Stop Using Python for Your Gen AI Apps, Use Go and Genkit Instead</title>
      <dc:creator>Xavier Portilla Edo</dc:creator>
      <pubDate>Wed, 06 May 2026 03:54:40 +0000</pubDate>
      <link>https://dev.to/gde/stop-using-python-for-your-gen-ai-apps-use-go-and-genkit-instead-401p</link>
      <guid>https://dev.to/gde/stop-using-python-for-your-gen-ai-apps-use-go-and-genkit-instead-401p</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;For the last few years, every Gen AI tutorial, framework, and "hello world" has assumed one thing: that you are writing Python. It made sense at the start. The research community lives in Python, the model providers ship Python SDKs first, and the notebook culture is hard to beat for prototyping. But there is a quiet, important shift happening in 2026: the teams actually shipping AI features at scale are increasingly moving their &lt;strong&gt;production&lt;/strong&gt; Gen AI workloads off Python, and onto languages built for services.&lt;/p&gt;

&lt;p&gt;Go is at the center of that shift. And &lt;strong&gt;&lt;a href="https://genkit.dev/docs/go/get-started/" rel="noopener noreferrer"&gt;Genkit Go&lt;/a&gt;&lt;/strong&gt;, the Go flavor of Google's open-source Gen AI framework, is the cleanest path I have seen to build production-ready AI services in Go: typed flows, structured output, built-in HTTP serving, observability, and a Developer UI, all from a single binary.&lt;/p&gt;

&lt;p&gt;This article is two things at once. First, an honest argument about why Python is a poor fit for production Gen AI services. Second, a hands-on getting-started with Genkit Go so you can replace that Python microservice this week.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why Python Hurts in Production Gen AI
&lt;/h2&gt;

&lt;p&gt;Python is great for research and prototyping. But Gen AI applications are not really "AI code", they are mostly &lt;strong&gt;I/O-heavy network services&lt;/strong&gt; that happen to call a model. And that is exactly where Python struggles.&lt;/p&gt;

&lt;h3&gt;
  
  
  Concurrency is a constant fight
&lt;/h3&gt;

&lt;p&gt;Gen AI workloads are dominated by long, concurrent network calls: streaming completions, tool calls, embedding requests, vector DB lookups, MCP servers. Go's goroutines and channels were literally designed for this. In Python you have a choice between three uncomfortable options: threads (limited by the GIL), &lt;code&gt;asyncio&lt;/code&gt; (which infects your entire codebase and breaks the moment one library is sync), or multiprocessing (heavy, awkward, and unfriendly to shared state). None of them feel native. All of them leak through your abstractions.&lt;/p&gt;
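&lt;p&gt;To make the asyncio option concrete, here is a minimal sketch of the fan-out pattern a Gen AI service needs (the model call is simulated with a short sleep). Note that &lt;code&gt;async&lt;/code&gt; has to propagate through every function in the chain, which is exactly the "infection" described above:&lt;/p&gt;

```python
import asyncio

async def call_model(prompt: str) -> str:
    # Stand-in for a streaming completion or embedding request.
    await asyncio.sleep(0.01)
    return "response to " + prompt

async def handle_request(prompts):
    # Fan out concurrent model calls; every caller up the stack must now
    # also be async for this to work.
    return await asyncio.gather(*(call_model(p) for p in prompts))

results = asyncio.run(handle_request(["a", "b", "c"]))
```

&lt;p&gt;The moment one library in that chain is synchronous, the whole pattern stalls or forces a thread-pool workaround; goroutines simply do not have this problem.&lt;/p&gt;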

&lt;h3&gt;
  
  
  Cold starts and memory footprint
&lt;/h3&gt;

&lt;p&gt;A Python AI service typically pulls in &lt;code&gt;pydantic&lt;/code&gt;, &lt;code&gt;httpx&lt;/code&gt;, an SDK or two, and a tokenizer. You are easily looking at 200 to 400 MB of resident memory and several seconds of cold start before you serve a single request. A Go binary doing the same job is one statically linked file, tens of MB of RAM, and starts in milliseconds. On Cloud Run, Lambda, Azure Functions, or any autoscaling platform, this difference is not a micro-optimization, it is the difference between a service that scales to zero gracefully and one that does not.&lt;/p&gt;

&lt;h3&gt;
  
  
  Dependency hell is worse for AI
&lt;/h3&gt;

&lt;p&gt;&lt;code&gt;pip&lt;/code&gt;, &lt;code&gt;poetry&lt;/code&gt;, &lt;code&gt;uv&lt;/code&gt;, &lt;code&gt;conda&lt;/code&gt;, &lt;code&gt;venv&lt;/code&gt;, &lt;code&gt;requirements.txt&lt;/code&gt;, &lt;code&gt;pyproject.toml&lt;/code&gt;. Pin a Torch version, break a transitive dep. Upgrade an SDK, break Pydantic v1 vs v2. Every Python AI repo I have inherited has spent at least a day fixing the environment before running a single prompt. Go's module system, with a single &lt;code&gt;go.mod&lt;/code&gt; and &lt;code&gt;go.sum&lt;/code&gt;, is boring, reproducible, and just works.&lt;/p&gt;

&lt;h3&gt;
  
  
  Types are optional, and it shows
&lt;/h3&gt;

&lt;p&gt;Structured output, tool calling, and MCP all rely on &lt;strong&gt;schemas&lt;/strong&gt;. In Python, the schema lives in Pydantic models, in docstrings, in comments, and sometimes in your head. In Go, the schema &lt;strong&gt;is&lt;/strong&gt; the struct. The compiler enforces it. Genkit picks it up automatically via JSON schema tags. You cannot ship a flow whose input does not match what the model returns, because it will not compile.&lt;/p&gt;

&lt;h3&gt;
  
  
  Deployment is a packaging exercise
&lt;/h3&gt;

&lt;p&gt;Python deployments are Dockerfiles full of system packages, base images that drift, and "works on my machine" surprises. Go deploys as a single static binary. &lt;code&gt;FROM scratch&lt;/code&gt;, copy the binary, done. For AI services that need to run on Cloud Run, on Kubernetes, on the edge, or as a sidecar, that is a massive operational win.&lt;/p&gt;

&lt;h3&gt;
  
  
  The performance ceiling is real
&lt;/h3&gt;

&lt;p&gt;Yes, the heavy lifting happens on the model provider's GPUs. But your service still has to parse tokens off a streaming response, fan out tool calls, merge results, enforce timeouts, and push telemetry, &lt;strong&gt;per request, at concurrency&lt;/strong&gt;. Go does that work an order of magnitude more efficiently than CPython, and without you having to think about it.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;None of this means Python is wrong for &lt;strong&gt;research&lt;/strong&gt;. It means Python is the wrong default for the &lt;strong&gt;service&lt;/strong&gt; that exposes that research to your users.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  Go Is the Best Language for Agentic Coders
&lt;/h2&gt;

&lt;p&gt;There is one more reason to pick Go in 2026 that did not really exist two years ago: &lt;strong&gt;agentic coders&lt;/strong&gt;. Tools like Claude Code, Cursor's agent mode, GitHub Copilot's agent, Gemini Code Assist, Codex, Aider, and the growing ecosystem of autonomous coding agents are now a real part of how software gets written. And it turns out that &lt;strong&gt;Go is the language they thrive in&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Why? It comes down to three properties of the language that align almost perfectly with how an LLM-based agent reasons about code:&lt;/p&gt;

&lt;h3&gt;
  
  
  Strong, static typing closes the feedback loop
&lt;/h3&gt;

&lt;p&gt;Agentic coders work in a tight loop: write code, compile, read the error, fix, repeat. Go's compiler is fast, strict, and brutally honest. When an agent generates a wrong call, the compiler tells it exactly what is wrong and where, in seconds. In Python, the same mistake might only surface at runtime, three layers deep, with a stack trace that requires the agent to spend tokens reasoning about dynamic behavior. Strong typing turns "guess and pray" into "verify and continue".&lt;/p&gt;

&lt;h3&gt;
  
  
  There is usually one obvious way to do something
&lt;/h3&gt;

&lt;p&gt;Python has at least four HTTP clients, three async paradigms, two type systems, and an opinion war about every major design decision. An agent has to choose, and choices cost tokens and increase the chance of going off the rails. Go is famously opinionated: one formatter (&lt;code&gt;gofmt&lt;/code&gt;), one module system, one idiomatic way to handle errors, one standard layout. Less surface area means less ambiguity, which means &lt;strong&gt;less token consumption and more correct code per iteration&lt;/strong&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Tooling is built for machines, not just humans
&lt;/h3&gt;

&lt;p&gt;&lt;code&gt;go build&lt;/code&gt;, &lt;code&gt;go test&lt;/code&gt;, &lt;code&gt;go vet&lt;/code&gt;, &lt;code&gt;gopls&lt;/code&gt;, and &lt;code&gt;staticcheck&lt;/code&gt; produce structured, parseable output. Agents can read it directly without heuristics. Combine that with &lt;code&gt;go doc&lt;/code&gt; and the standard library being uniformly documented, and you give an agent a self-describing environment it can navigate without hallucinating.&lt;/p&gt;

&lt;h3&gt;
  
  
  And then Genkit Go takes it one level further
&lt;/h3&gt;

&lt;p&gt;Genkit Go leans into the same properties:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Flow inputs and outputs are &lt;strong&gt;Go structs&lt;/strong&gt;, the schema is the type. An agent generating a new flow knows exactly what shape the data has, because the compiler will reject anything else.&lt;/li&gt;
&lt;li&gt;The API surface is small and consistent: &lt;code&gt;genkit.Init&lt;/code&gt;, &lt;code&gt;genkit.DefineFlow&lt;/code&gt;, &lt;code&gt;genkit.DefineTool&lt;/code&gt;, &lt;code&gt;genkit.GenerateData&lt;/code&gt;, &lt;code&gt;genkit.Handler&lt;/code&gt;. There is one obvious way to define a flow, one obvious way to expose it, one obvious way to call a model.&lt;/li&gt;
&lt;li&gt;Tool definitions are typed end-to-end, so an agent writing a new tool gets compile-time guarantees that its signature matches what the runtime expects.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The net effect is that an agentic coder pointed at a Genkit Go codebase will produce &lt;strong&gt;more correct code, in fewer iterations, with fewer tokens&lt;/strong&gt; than the same agent pointed at an equivalent Python codebase. In a world where you are increasingly going to be the reviewer of agent-generated code rather than the author, that compounds fast.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why Genkit Go Specifically
&lt;/h2&gt;

&lt;p&gt;If you accept the premise that Go is the better runtime for Gen AI services, the next question is: which framework? You can absolutely call the Gemini, OpenAI, or Anthropic SDKs directly from Go. But you will quickly end up rebuilding the same primitives every Genkit user already has for free.&lt;/p&gt;

&lt;p&gt;Here is what Genkit Go gives you out of the box, and what you would otherwise have to write yourself:&lt;/p&gt;

&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Feature&lt;/th&gt;
&lt;th&gt;Without Genkit&lt;/th&gt;
&lt;th&gt;With Genkit Go&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Call a model&lt;/td&gt;
&lt;td&gt;Hand-rolled HTTP client per provider, manual JSON, manual streaming&lt;/td&gt;
&lt;td&gt;&lt;code&gt;genkit.Generate(...)&lt;/code&gt;, one call, multi-provider&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Structured output&lt;/td&gt;
&lt;td&gt;Parse raw JSON, custom unmarshaling, validate by hand&lt;/td&gt;
&lt;td&gt;&lt;code&gt;genkit.GenerateData[MyStruct]&lt;/code&gt;, typed Go struct returned&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Expose as an API&lt;/td&gt;
&lt;td&gt;&lt;code&gt;net/http&lt;/code&gt; boilerplate per endpoint, request/response wiring&lt;/td&gt;
&lt;td&gt;&lt;code&gt;genkit.Handler(flow)&lt;/code&gt;, auto HTTP endpoint&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Tool calling&lt;/td&gt;
&lt;td&gt;Parse function call payloads, dispatch, re-submit&lt;/td&gt;
&lt;td&gt;&lt;code&gt;genkit.DefineTool(...)&lt;/code&gt;, automatic execution loop&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Observability&lt;/td&gt;
&lt;td&gt;Wire OpenTelemetry, define spans, ship metrics&lt;/td&gt;
&lt;td&gt;Built-in tracing, metrics, latency, zero config&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Local dev&lt;/td&gt;
&lt;td&gt;&lt;code&gt;curl&lt;/code&gt;, Postman, manual harnesses&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;Genkit Developer UI&lt;/strong&gt;, visual flow runner, traces, prompt playground&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Multi-provider&lt;/td&gt;
&lt;td&gt;Different SDKs, different auth, different schemas&lt;/td&gt;
&lt;td&gt;Unified plugin interface (Google AI, Vertex, OpenAI, Anthropic, Bedrock, Azure, Ollama, ...)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;

&lt;p&gt;It is the same philosophy as &lt;a href="https://dev.to%20post_url%20genkit/2026-02-10-genkit-java-101%20"&gt;Genkit Java&lt;/a&gt; and the JavaScript flavor I covered in &lt;a href="https://dev.to%20post_url%20genkit/2026-04-16-top-jsts-genai-frameworks-2026%20"&gt;my 2026 JS/TS Gen AI frameworks comparison&lt;/a&gt;: a thin, opinionated, cloud-agnostic layer that turns "AI logic" into a &lt;strong&gt;typed function&lt;/strong&gt; you can call, test, deploy, and observe.&lt;/p&gt;
&lt;h2&gt;
  
  
  What We Are Going to Build
&lt;/h2&gt;

&lt;p&gt;A Go service exposing a single AI flow that generates a structured &lt;strong&gt;recipe&lt;/strong&gt; from a main ingredient and optional dietary restrictions. It will:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Accept a typed &lt;code&gt;RecipeInput&lt;/code&gt; as input.&lt;/li&gt;
&lt;li&gt;Call &lt;strong&gt;Gemini 3 Pro&lt;/strong&gt; via the Google AI plugin.&lt;/li&gt;
&lt;li&gt;Return a strongly-typed &lt;code&gt;Recipe&lt;/code&gt; struct (no manual JSON parsing).&lt;/li&gt;
&lt;li&gt;Be served as an HTTP endpoint on &lt;code&gt;:3400&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;Be testable visually in the &lt;strong&gt;Genkit Developer UI&lt;/strong&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;All in &lt;strong&gt;a single &lt;code&gt;main.go&lt;/code&gt; file&lt;/strong&gt;. No web framework. No code generation. Just Go.&lt;/p&gt;
&lt;h2&gt;
  
  
  Prerequisites
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Go 1.24+&lt;/strong&gt; (&lt;a href="https://go.dev/doc/install" rel="noopener noreferrer"&gt;install&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Node.js 18+&lt;/strong&gt; (only required for the Genkit CLI / Developer UI)&lt;/li&gt;
&lt;li&gt;A &lt;strong&gt;Google GenAI API key&lt;/strong&gt; (free, no credit card, from &lt;a href="https://aistudio.google.com/apikey" rel="noopener noreferrer"&gt;Google AI Studio&lt;/a&gt;)&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;
  
  
  Install the Genkit CLI
&lt;/h3&gt;

&lt;p&gt;The Genkit CLI is your local companion for running and inspecting flows in the Developer UI:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl &lt;span class="nt"&gt;-sL&lt;/span&gt; cli.genkit.dev | bash
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Verify it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;genkit &lt;span class="nt"&gt;--version&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Set Up the Project
&lt;/h2&gt;

&lt;p&gt;Create a fresh module:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;mkdir &lt;/span&gt;genkit-go-recipes &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="nb"&gt;cd &lt;/span&gt;genkit-go-recipes
go mod init example/genkit-go-recipes
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Install the Genkit Go package:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;go get github.com/firebase/genkit/go
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Set your API key:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;export &lt;/span&gt;&lt;span class="nv"&gt;GEMINI_API_KEY&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&amp;lt;your API key&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  The Code: a Single &lt;code&gt;main.go&lt;/code&gt;
&lt;/h2&gt;

&lt;p&gt;Create &lt;code&gt;main.go&lt;/code&gt; with the following content. This is the entire service.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight go"&gt;&lt;code&gt;&lt;span class="k"&gt;package&lt;/span&gt; &lt;span class="n"&gt;main&lt;/span&gt;

&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="s"&gt;"context"&lt;/span&gt;
    &lt;span class="s"&gt;"encoding/json"&lt;/span&gt;
    &lt;span class="s"&gt;"fmt"&lt;/span&gt;
    &lt;span class="s"&gt;"log"&lt;/span&gt;
    &lt;span class="s"&gt;"net/http"&lt;/span&gt;

    &lt;span class="s"&gt;"github.com/firebase/genkit/go/ai"&lt;/span&gt;
    &lt;span class="s"&gt;"github.com/firebase/genkit/go/genkit"&lt;/span&gt;
    &lt;span class="s"&gt;"github.com/firebase/genkit/go/plugins/googlegenai"&lt;/span&gt;
    &lt;span class="s"&gt;"github.com/firebase/genkit/go/plugins/server"&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c"&gt;// Input schema, picked up automatically by Genkit and the Dev UI.&lt;/span&gt;
&lt;span class="k"&gt;type&lt;/span&gt; &lt;span class="n"&gt;RecipeInput&lt;/span&gt; &lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;Ingredient&lt;/span&gt;          &lt;span class="kt"&gt;string&lt;/span&gt; &lt;span class="s"&gt;`json:"ingredient" jsonschema:"description=Main ingredient or cuisine type"`&lt;/span&gt;
    &lt;span class="n"&gt;DietaryRestrictions&lt;/span&gt; &lt;span class="kt"&gt;string&lt;/span&gt; &lt;span class="s"&gt;`json:"dietaryRestrictions,omitempty" jsonschema:"description=Any dietary restrictions"`&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="c"&gt;// Output schema, returned directly by the model as a typed Go struct.&lt;/span&gt;
&lt;span class="k"&gt;type&lt;/span&gt; &lt;span class="n"&gt;Recipe&lt;/span&gt; &lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;Title&lt;/span&gt;        &lt;span class="kt"&gt;string&lt;/span&gt;   &lt;span class="s"&gt;`json:"title"`&lt;/span&gt;
    &lt;span class="n"&gt;Description&lt;/span&gt;  &lt;span class="kt"&gt;string&lt;/span&gt;   &lt;span class="s"&gt;`json:"description"`&lt;/span&gt;
    &lt;span class="n"&gt;PrepTime&lt;/span&gt;     &lt;span class="kt"&gt;string&lt;/span&gt;   &lt;span class="s"&gt;`json:"prepTime"`&lt;/span&gt;
    &lt;span class="n"&gt;CookTime&lt;/span&gt;     &lt;span class="kt"&gt;string&lt;/span&gt;   &lt;span class="s"&gt;`json:"cookTime"`&lt;/span&gt;
    &lt;span class="n"&gt;Servings&lt;/span&gt;     &lt;span class="kt"&gt;int&lt;/span&gt;      &lt;span class="s"&gt;`json:"servings"`&lt;/span&gt;
    &lt;span class="n"&gt;Ingredients&lt;/span&gt;  &lt;span class="p"&gt;[]&lt;/span&gt;&lt;span class="kt"&gt;string&lt;/span&gt; &lt;span class="s"&gt;`json:"ingredients"`&lt;/span&gt;
    &lt;span class="n"&gt;Instructions&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;&lt;span class="kt"&gt;string&lt;/span&gt; &lt;span class="s"&gt;`json:"instructions"`&lt;/span&gt;
    &lt;span class="n"&gt;Tips&lt;/span&gt;         &lt;span class="p"&gt;[]&lt;/span&gt;&lt;span class="kt"&gt;string&lt;/span&gt; &lt;span class="s"&gt;`json:"tips,omitempty"`&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;func&lt;/span&gt; &lt;span class="n"&gt;main&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;ctx&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Background&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="c"&gt;// Initialize Genkit with the Google AI plugin and a default model.&lt;/span&gt;
    &lt;span class="n"&gt;g&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;genkit&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Init&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;genkit&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;WithPlugins&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;googlegenai&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;GoogleAI&lt;/span&gt;&lt;span class="p"&gt;{}),&lt;/span&gt;
        &lt;span class="n"&gt;genkit&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;WithDefaultModel&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"googleai/gemini-3-pro"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c"&gt;// Define a typed flow: (RecipeInput) -&amp;gt; (Recipe, error)&lt;/span&gt;
    &lt;span class="n"&gt;recipeGeneratorFlow&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;genkit&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;DefineFlow&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;g&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"recipeGeneratorFlow"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="k"&gt;func&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Context&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;input&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;RecipeInput&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;Recipe&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;error&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="n"&gt;dietary&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;input&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;DietaryRestrictions&lt;/span&gt;
            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;dietary&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="s"&gt;""&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                &lt;span class="n"&gt;dietary&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"none"&lt;/span&gt;
            &lt;span class="p"&gt;}&lt;/span&gt;

            &lt;span class="n"&gt;prompt&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;fmt&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Sprintf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;`Create a recipe with the following requirements:
                Main ingredient: %s
                Dietary restrictions: %s`&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;input&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Ingredient&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;dietary&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

            &lt;span class="c"&gt;// Structured generation: Gemini returns a Recipe directly.&lt;/span&gt;
            &lt;span class="n"&gt;recipe&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;_&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;genkit&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;GenerateData&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;Recipe&lt;/span&gt;&lt;span class="p"&gt;](&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;g&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="n"&gt;ai&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;WithPrompt&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
            &lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;fmt&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Errorf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"failed to generate recipe: %w"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="p"&gt;}&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;recipe&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt;
        &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c"&gt;// Smoke-test the flow once at boot.&lt;/span&gt;
    &lt;span class="n"&gt;recipe&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;recipeGeneratorFlow&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Run&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;RecipeInput&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;Ingredient&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt;          &lt;span class="s"&gt;"avocado"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;DietaryRestrictions&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="s"&gt;"vegetarian"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;})&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="no"&gt;nil&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;log&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Fatalf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"could not generate recipe: %v"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;err&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="n"&gt;out&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;_&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;MarshalIndent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;recipe&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;""&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"  "&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;fmt&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Println&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Sample recipe generated:"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;fmt&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Println&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;string&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;out&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

    &lt;span class="c"&gt;// Expose the flow as an HTTP endpoint.&lt;/span&gt;
    &lt;span class="n"&gt;mux&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;http&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;NewServeMux&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;mux&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;HandleFunc&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"POST /recipeGeneratorFlow"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;genkit&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Handler&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;recipeGeneratorFlow&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

    &lt;span class="n"&gt;log&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Println&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Starting server on http://localhost:3400"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;log&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Println&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Flow available at: POST http://localhost:3400/recipeGeneratorFlow"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;log&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Fatal&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;server&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Start&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ctx&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"127.0.0.1:3400"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;mux&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Take a moment to count what is &lt;strong&gt;not&lt;/strong&gt; in this file:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;No web framework.&lt;/li&gt;
&lt;li&gt;No JSON parsing of the model output.&lt;/li&gt;
&lt;li&gt;No manual OpenTelemetry setup.&lt;/li&gt;
&lt;li&gt;No request/response DTO duplication.&lt;/li&gt;
&lt;li&gt;No Dockerfile yet (and the one we add later is tiny).&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The struct &lt;strong&gt;is&lt;/strong&gt; the contract. The flow &lt;strong&gt;is&lt;/strong&gt; the endpoint. The compiler enforces both.&lt;/p&gt;

&lt;h2&gt;
  
  
  Run It
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;go run &lt;span class="nb"&gt;.&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You should see a structured recipe printed as JSON, then the server logging that it is listening on &lt;code&gt;:3400&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;In another terminal, hit it with &lt;code&gt;curl&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl &lt;span class="nt"&gt;-X&lt;/span&gt; POST &lt;span class="s2"&gt;"http://localhost:3400/recipeGeneratorFlow"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Content-Type: application/json"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="s1"&gt;'{"data": {"ingredient": "tomato", "dietaryRestrictions": "vegan"}}'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You will get back a fully structured JSON recipe. That is it: a production-shaped Gen AI microservice in one file.&lt;/p&gt;
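&lt;p&gt;If another Go service consumes this endpoint, the same struct can decode the response. A minimal sketch, assuming the flow handler wraps its output in a &lt;code&gt;result&lt;/code&gt; envelope; verify the exact shape against your server before relying on it:&lt;/p&gt;

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Recipe mirrors the fields we care about from the server's output struct.
// Sharing or vendoring this type keeps the contract typed on both sides.
type Recipe struct {
	Title       string   `json:"title"`
	Servings    int      `json:"servings"`
	Ingredients []string `json:"ingredients"`
}

// flowResponse models the envelope a Genkit flow handler returns.
// Assumption: the output is wrapped in a "result" field.
type flowResponse struct {
	Result Recipe `json:"result"`
}

// parseFlowResponse decodes a flow HTTP response body into a Recipe.
func parseFlowResponse(body []byte) (Recipe, error) {
	resp := new(flowResponse)
	if err := json.Unmarshal(body, resp); err != nil {
		return Recipe{}, err
	}
	return resp.Result, nil
}

func main() {
	// Hard-coded sample standing in for a real response body.
	sample := []byte(`{"result":{"title":"Tomato Stew","servings":4,"ingredients":["tomato","onion"]}}`)
	recipe, err := parseFlowResponse(sample)
	if err != nil {
		panic(err)
	}
	fmt.Println(recipe.Title, recipe.Servings)
}
```

&lt;p&gt;Because client and server share the &lt;code&gt;Recipe&lt;/code&gt; type, a schema change becomes a compile error on both sides rather than a runtime surprise.&lt;/p&gt;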

&lt;h2&gt;
  
  
  Test It Visually with the Developer UI
&lt;/h2&gt;

&lt;p&gt;The Genkit Developer UI is one of the strongest reasons to adopt Genkit, regardless of language. It gives you a local web app to run flows, inspect traces, tweak prompts, and debug tool calls.&lt;/p&gt;

&lt;p&gt;From the project root:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;genkit start &lt;span class="nt"&gt;--&lt;/span&gt; go run &lt;span class="nb"&gt;.&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Open &lt;a href="http://localhost:4000" rel="noopener noreferrer"&gt;http://localhost:4000&lt;/a&gt;, pick &lt;code&gt;recipeGeneratorFlow&lt;/code&gt;, paste:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"ingredient"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"avocado"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"dietaryRestrictions"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"vegetarian"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Click &lt;strong&gt;Run&lt;/strong&gt;. You will see the typed output and a full trace of the model call: tokens, latency, prompt, response. This is the kind of inner loop Python frameworks are still catching up on.&lt;/p&gt;

&lt;h2&gt;
  
  
  Deploying It
&lt;/h2&gt;

&lt;p&gt;Because it is Go, deployment is almost anticlimactic. One note first: the code above binds to &lt;code&gt;127.0.0.1&lt;/code&gt;, which is right for local development but unreachable from outside a container; for a containerized deployment, listen on &lt;code&gt;0.0.0.0:3400&lt;/code&gt; instead. A minimal &lt;code&gt;Dockerfile&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight docker"&gt;&lt;code&gt;&lt;span class="k"&gt;FROM&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;golang:1.24&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;AS&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s"&gt;build&lt;/span&gt;
&lt;span class="k"&gt;WORKDIR&lt;/span&gt;&lt;span class="s"&gt; /src&lt;/span&gt;
&lt;span class="k"&gt;COPY&lt;/span&gt;&lt;span class="s"&gt; . .&lt;/span&gt;
&lt;span class="k"&gt;RUN &lt;/span&gt;&lt;span class="nv"&gt;CGO_ENABLED&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;0 go build &lt;span class="nt"&gt;-o&lt;/span&gt; /out/server .

&lt;span class="k"&gt;FROM&lt;/span&gt;&lt;span class="s"&gt; gcr.io/distroless/static&lt;/span&gt;
&lt;span class="k"&gt;COPY&lt;/span&gt;&lt;span class="s"&gt; --from=build /out/server /server&lt;/span&gt;
&lt;span class="k"&gt;ENV&lt;/span&gt;&lt;span class="s"&gt; PORT=3400&lt;/span&gt;
&lt;span class="k"&gt;EXPOSE&lt;/span&gt;&lt;span class="s"&gt; 3400&lt;/span&gt;
&lt;span class="k"&gt;ENTRYPOINT&lt;/span&gt;&lt;span class="s"&gt; ["/server"]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That is your entire production image. Deploy it to &lt;strong&gt;Cloud Run&lt;/strong&gt;, &lt;strong&gt;Cloud Run Jobs&lt;/strong&gt;, &lt;strong&gt;Kubernetes&lt;/strong&gt;, &lt;strong&gt;AWS Lambda&lt;/strong&gt; (via container image), &lt;strong&gt;Azure Container Apps&lt;/strong&gt;, or any platform that runs containers. No Python runtime to vendor. No &lt;code&gt;pip install&lt;/code&gt; at build time. No virtual environment. Just a binary.&lt;/p&gt;
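&lt;p&gt;For Cloud Run specifically, you can deploy straight from source; if a &lt;code&gt;Dockerfile&lt;/code&gt; is present in the directory, the build uses it. A sketch with placeholder names: the service name, region, and the &lt;code&gt;GEMINI_API_KEY&lt;/code&gt; variable are assumptions to adapt, and remember Cloud Run only routes traffic to a server listening on &lt;code&gt;0.0.0.0&lt;/code&gt;:&lt;/p&gt;

```shell
# Deploy from source; a Dockerfile in the directory is used if present.
# "recipe-service", the region, and the API key are placeholders.
gcloud run deploy recipe-service \
  --source . \
  --region us-central1 \
  --port 3400 \
  --allow-unauthenticated \
  --set-env-vars GEMINI_API_KEY=your-api-key
```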

&lt;p&gt;If you want to see the same pattern applied to other clouds and languages, I have already covered:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://dev.to%20post_url%20genkit/2026-03-20-genkit-aws-lambda-bedrock%20"&gt;Genkit + AWS Lambda + Bedrock&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://dev.to%20post_url%20genkit/2026-03-20-genkit-azure-function-ai-foundry%20"&gt;Genkit + Azure Functions + AI Foundry&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://dev.to%20post_url%20genkit/2026-02-10-genkit-java-101%20"&gt;Genkit Java 101&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Genkit Go fits the same mold, with the smallest runtime footprint of all of them.&lt;/p&gt;

&lt;h2&gt;
  
  
  "But What About..."
&lt;/h2&gt;

&lt;p&gt;A few honest objections worth addressing.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;"All the cool research libraries are in Python."&lt;/strong&gt; True. Keep them in Python, behind a small Python service that does only the research-y bit. Put your &lt;strong&gt;product surface&lt;/strong&gt; (the part your users actually call) in Go. That separation is healthy.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;"My team only knows Python."&lt;/strong&gt; Go is famously the easiest "real" backend language to learn. A Python developer can be productive in Go in days, and Genkit's API surface is small enough that the learning curve is mostly Go itself, not the framework.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;"What about LangChain / LlamaIndex features?"&lt;/strong&gt; Most of what those frameworks give you (flows, tools, RAG, prompts, evaluation, observability) Genkit Go gives you too, with a fraction of the surface area and without the abstraction tax. See my &lt;a href="https://dev.to%20post_url%20genkit/2026-04-16-top-jsts-genai-frameworks-2026%20"&gt;2026 frameworks comparison&lt;/a&gt; for the long version.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;"Is Genkit Go production-ready?"&lt;/strong&gt; It powers Gen AI features at Google and a growing list of companies. The Go SDK shares the same core philosophy and plugin model as the JS and Java SDKs. It is stable enough to bet on, and the iteration speed is high.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Wrapping Up
&lt;/h2&gt;

&lt;p&gt;Python earned its place as the language of AI &lt;strong&gt;research&lt;/strong&gt;. It did not earn its place as the language of AI &lt;strong&gt;services&lt;/strong&gt;. Those are different problems with different constraints, and the constraints of production services (concurrency, footprint, deployment, types, observability) all favor Go.&lt;/p&gt;

&lt;p&gt;Genkit Go is the framework that finally makes that switch painless. You get a typed, observable, multi-provider Gen AI service in one file, one binary, and one deploy. If you are still maintaining a Python microservice whose only job is to call an LLM and return structured JSON, you are paying a tax you do not need to pay.&lt;/p&gt;

&lt;p&gt;Try it on your next flow. Replace one Python service. See how much smaller the resulting system is, in code, in memory, and in operational surface area.&lt;/p&gt;

&lt;h2&gt;
  
  
  References
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://genkit.dev/docs/go/get-started/" rel="noopener noreferrer"&gt;Genkit Go, Get Started&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://genkit.dev/docs/go/flows/" rel="noopener noreferrer"&gt;Genkit Go, Flows&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://genkit.dev/docs/go/tool-calling/" rel="noopener noreferrer"&gt;Genkit Go, Tool Calling&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://genkit.dev/docs/go/deployment/cloud-run/" rel="noopener noreferrer"&gt;Genkit Go, Deployment on Cloud Run&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/firebase/genkit" rel="noopener noreferrer"&gt;Genkit GitHub&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F26d7w5adb0372b1y8pet.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F26d7w5adb0372b1y8pet.png" alt=" " width="800" height="436"&gt;&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>programming</category>
      <category>go</category>
    </item>
  </channel>
</rss>
