<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Evan Lin</title>
    <description>The latest articles on DEV Community by Evan Lin (@evanlin).</description>
    <link>https://dev.to/evanlin</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F409957%2Fc150d4a7-cb20-469d-a230-bac27232c577.jpeg</url>
      <title>DEV Community: Evan Lin</title>
      <link>https://dev.to/evanlin</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/evanlin"/>
    <language>en</language>
    <item>
      <title>[Gemini API] Gemini Batch API and Webhook API practical usage on restaurant survey</title>
      <dc:creator>Evan Lin</dc:creator>
      <pubDate>Mon, 15 Jun 2026 04:09:16 +0000</pubDate>
      <link>https://dev.to/gde/gemini-api-hands-on-6im</link>
      <guid>https://dev.to/gde/gemini-api-hands-on-6im</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2xmga58mup383o4go36l.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2xmga58mup383o4go36l.png" alt="image-20260614175257527" width="800" height="533"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h1&gt;
  
  
  A Powerful Tool for Asynchronous Processing: Gemini Batch API &amp;amp; Webhooks
&lt;/h1&gt;

&lt;p&gt;When developing LLM-based applications, we often need to handle a large number of data analysis tasks—for example, analyzing reviews from dozens of restaurants at once, classifying a large volume of articles, or batch generating translations. If we use traditional synchronous APIs (real-time calls), we would not only face severe &lt;strong&gt;Rate Limit&lt;/strong&gt; blockages but also fail due to network connection timeouts and extremely high computing costs.&lt;/p&gt;

&lt;p&gt;To overcome this limitation, Google has launched the &lt;strong&gt;Gemini Batch API&lt;/strong&gt; and &lt;strong&gt;Webhook API&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://ai.google.dev/gemini-api/docs/batch-api?hl=zh-tw" rel="noopener noreferrer"&gt;Gemini Batch API&lt;/a&gt;&lt;/strong&gt;: Allows developers to package a large number of requests into a JSONL file and upload them all at once. Gemini performs asynchronous scheduled computations in the background, without consuming your daily real-time API quotas (Rate Limits), and its computing cost is usually half that of real-time APIs, making it a perfect choice for non-urgent big data processing.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://ai.google.dev/gemini-api/docs/webhooks?hl=zh-tw" rel="noopener noreferrer"&gt;Webhook API&lt;/a&gt;&lt;/strong&gt;: Traditional Batch tasks require us to constantly write polling logic locally to check the status. With Webhooks, when Gemini completes a Batch computation, it actively sends an HTTP POST callback to your specified URL, instantly notifying you that the task is complete, making the system architecture more elegant and energy-efficient.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This article will document how we integrated these two powerful APIs into our &lt;strong&gt;LINE Bot Restaurant Analysis Assistant&lt;/strong&gt; to achieve one-click deep review and signature dish big data analysis for specific restaurants on mobile devices.&lt;/p&gt;




&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fwww.evanlin.com%2Fimages%2FLINE%25202026-06-14%252017.30.21.tiff" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fwww.evanlin.com%2Fimages%2FLINE%25202026-06-14%252017.30.21.tiff" alt="LINE 2026-06-14 17.30.21" width="800" height="1739"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h1&gt;
  
  
  System Design and Optimized Architecture
&lt;/h1&gt;

&lt;p&gt;Originally, the restaurant analysis function worked by having the Bot list nearby restaurants when a user sent their location, and then providing a generic "Deep Review Analysis (Batch)" button. Clicking it would send all nearby restaurants for analysis at once. However, this led to a poor UX: analyzing all restaurants took too long, and users often only wanted to delve into &lt;strong&gt;one specific restaurant&lt;/strong&gt; they were interested in.&lt;/p&gt;

&lt;p&gt;Therefore, we optimized the function into &lt;strong&gt;dynamic Quick Reply buttons&lt;/strong&gt;:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;The user sends their location, and the Bot searches for nearby restaurants via Google Maps Grounding.&lt;/li&gt;
&lt;li&gt;After the client receives a plain text list of restaurants, the Bot automatically uses Gemini to extract the top 3 highest-rated restaurant names.&lt;/li&gt;
&lt;li&gt;Three customized Quick Reply buttons are generated (e.g., &lt;code&gt;🍴 Analyze Din Tai Fung&lt;/code&gt;).&lt;/li&gt;
&lt;li&gt;After the user clicks a specific restaurant button, the Bot immediately replies "Processing" to avoid LINE timeouts, and submits the Batch task for that single restaurant in the background. Once Gemini completes the computation, it proactively pushes a dedicated big data report.&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  System Architecture Flow
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;graph TD
    A[User Sends Location] --&amp;gt;|Location Message| B[Google Maps Grounding Search]
    B --&amp;gt;|Plain Text Restaurant List| C[Gemini-2.5-flash Extracts Top 3 Restaurants]
    C --&amp;gt;|Dynamically Generates Quick Reply| D[LINE Bot Replies with 3 Customized Analysis Buttons]
    D --&amp;gt;|User Clicks Specific Analysis| E[FastAPI Background Task]
    E --&amp;gt;|Immediate Reply ACK| F[LINE Chat Message]
    E --&amp;gt;|Package JSONL and Upload| G[Gemini Batch API Submission]
    G --&amp;gt;|Computation Complete Webhook/Polling Callback| H[Proactively Pushes Deep Report to User]

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h1&gt;
  
  
  Core Implementation
&lt;/h1&gt;

&lt;h3&gt;
  
  
  1. Precisely Extracting Restaurant Names from Grounding Text using Gemini
&lt;/h3&gt;

&lt;p&gt;In &lt;a&gt;tools/maps_tool.py&lt;/a&gt;, the map search returns a plain text string rich in formatting and descriptions. We use Gemini-2.5-flash's structured output concept to precisely extract restaurant names in JSON format:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;        &lt;span class="c1"&gt;# Extract top three restaurant names for Quick Reply
&lt;/span&gt;        &lt;span class="n"&gt;names&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;place_type&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;restaurant&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="n"&gt;extract_prompt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Please extract all restaurant names from the following text and return them in a JSON array format (e.g., [&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s"&gt;Restaurant A&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s"&gt;, &lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s"&gt;Restaurant B&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s"&gt;]). Please output the JSON array directly, without any markdown tags (like ```
&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="o"&gt;%&lt;/span&gt; &lt;span class="n"&gt;endraw&lt;/span&gt; &lt;span class="o"&gt;%&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;
json) or explanatory text.&lt;/span&gt;&lt;span class="se"&gt;\n\n&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
                &lt;span class="n"&gt;extract_res&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;models&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;generate_content&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gemini-2.5-flash&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                    &lt;span class="n"&gt;contents&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;extract_prompt&lt;/span&gt;
                &lt;span class="p"&gt;)&lt;/span&gt;
                &lt;span class="n"&gt;extract_text&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;extract_res&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;strip&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;extract_res&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="sh"&gt;""&lt;/span&gt;

                &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                    &lt;span class="n"&gt;names&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;loads&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;extract_text&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
                &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="nb"&gt;Exception&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                    &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;re&lt;/span&gt;
                    &lt;span class="n"&gt;array_match&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;re&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;search&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;r&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;\[(.*?)\]&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;extract_text&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;re&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;DOTALL&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
                    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;array_match&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                        &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;ast&lt;/span&gt;
                        &lt;span class="n"&gt;names&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ast&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;literal_eval&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;[&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;array_match&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;group&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;]&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

                &lt;span class="n"&gt;names&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nf"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;n&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;strip&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;n&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;names&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;n&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
                &lt;span class="n"&gt;logger&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;info&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Extracted restaurant names for Quick Reply: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;names&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="nb"&gt;Exception&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;e_extract&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="n"&gt;logger&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Failed to extract restaurant names: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;e_extract&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="o"&gt;%&lt;/span&gt; &lt;span class="n"&gt;raw&lt;/span&gt; &lt;span class="o"&gt;%&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;h3&gt;
  
  
  2. Dynamically Generating LINE Quick Reply Buttons
&lt;/h3&gt;

&lt;p&gt;In &lt;a&gt;main.py&lt;/a&gt;, after obtaining the restaurant list, we dynamically generate &lt;code&gt;QuickReplyButton&lt;/code&gt;. We need to pay special attention to LINE API's length limit for button &lt;code&gt;label&lt;/code&gt;:&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;
python
        quick_reply = None
        if place_type == "restaurant" and result.get("status") == "success":
            restaurant_names = result.get("restaurant_names", [])
            if restaurant_names:
                buttons = []
                for name in restaurant_names[:3]:
                    clean_label = name
                    # LINE label limit is 20 characters
                    if len(clean_label) &amp;gt; 10:
                        clean_label = clean_label[:9] + "…"
                    buttons.append(
                        QuickReplyButton(
                            action=PostbackAction(
                                label=f"🍴 分析 {clean_label}",
                                data=json.dumps({
                                    "action": "specific_foodie_deep_analysis",
                                    "restaurant_name": name
                                }),
                                display_text=f"🔍 進行「{name}」深度評論與招牌菜色分析"
                            )
                        )
                    )
                quick_reply = QuickReply(items=buttons)



&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;




&lt;h1&gt;
  
  
  Major Pitfalls and Solutions
&lt;/h1&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F39x43upsykqln99yroez.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F39x43upsykqln99yroez.png" alt="Finder 2026-06-14 17.53.52" width="800" height="1739"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;During the process of connecting this dynamic Quick Reply to the Batch API, we encountered several critical UX and API limitation issues:&lt;/p&gt;

&lt;h3&gt;
  
  
  Pitfall One: LINE 20-character Limit Causing API Sending Errors
&lt;/h3&gt;

&lt;p&gt;Initially, when implementing, we directly used the full restaurant name in the button's Label, for example: &lt;code&gt;🍴 Analyze Love Hot Pot Ultimate Hot Pot&lt;/code&gt;. As a result, the LINE API immediately returned a 400 error, and the message could not be sent at all:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;
plaintext
LineBotApiError: status_code=400, error_message=The property 'label' must be less than 20 characters.



&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;[Cause Analysis and Solution]&lt;/strong&gt; LINE's official &lt;code&gt;label&lt;/code&gt; limit for Quick Reply is extremely strict; &lt;strong&gt;including emojis and spaces, it can have a maximum of 20 characters&lt;/strong&gt;. To address this, we added a character count check and dynamic truncation mechanism in our code:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;First, the original restaurant name (&lt;code&gt;clean_label&lt;/code&gt;) is truncated: if its length exceeds 10 characters, it is forcibly cut to the first 9 characters and appended with "…" (occupying 10 characters).&lt;/li&gt;
&lt;li&gt;Adding the prefix &lt;code&gt;🍴 Analyze&lt;/code&gt; (a total of 5 characters), the maximum total length becomes 15 characters, safely staying within the 20-character limit, thus eliminating the error!&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Pitfall Two: Batch API Asynchronous Delay and LINE Webhook's "Three-Second Timeout Survival Battle"
&lt;/h3&gt;

&lt;p&gt;When a user clicks the "Analyze Restaurant" button, the Bot must first call Google Search Grounding to collect online reviews for that restaurant, then package the JSONL file and upload it to Gemini to submit the Batch task. This entire sequence usually takes 3 to 8 seconds. However, &lt;strong&gt;the LINE Webhook server requires the Bot to return an HTTP 200 OK response within 3 seconds&lt;/strong&gt;, otherwise it will be deemed a connection failure and re-send the request, leading to severe server congestion.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;[Cause Analysis and Solution]&lt;/strong&gt; We completely asynchronous the processing architecture:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Fast Response&lt;/strong&gt;: When the Bot intercepts a &lt;code&gt;specific_foodie_deep_analysis&lt;/code&gt; Postback action, &lt;strong&gt;it does not execute the analysis directly within the Request flow&lt;/strong&gt;. Instead, it immediately calls LINE's &lt;code&gt;reply_message&lt;/code&gt; to respond to the user: &lt;code&gt;

🔍 Received! Performing deep analysis for you... This will take about 1-2 minutes...&lt;/code&gt;, and then instantly returns HTTP 200 to end that Webhook request.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Background Task Dispatch&lt;/strong&gt;: Use Python &lt;code&gt;asyncio.create_task&lt;/code&gt; to dispatch heavy network search, upload, and submission tasks to FastAPI's background Worker for execution.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Big Data Push&lt;/strong&gt;: When the background Polling listener or Gemini Webhook receives a task completion notification, it then uses LINE's &lt;code&gt;push_message&lt;/code&gt; to proactively send the analysis report to the specific user.&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  Pitfall Three: Gemini Batch API's Queuing and Pending Status
&lt;/h3&gt;

&lt;p&gt;During testing, users sometimes got confused, "Why hasn't there been a reply after three minutes? Is the Bot down?". After checking the system logs, we found that our JSONL file had been successfully uploaded, but the task status on the Gemini server side was stuck at &lt;code&gt;JobState.JOB_STATE_PENDING&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;[Solution]&lt;/strong&gt; This is a characteristic of the Batch API; tasks need to be queued, waiting for Google's server resources. We adopted two major optimizations:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Minimize Workload&lt;/strong&gt;: Reduce the number of restaurants for batch analysis to 1, shrinking the number of request lines in the JSONL to the extreme, to speed up Gemini's scheduling and processing.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;UX Optimization and Deduplication Mechanism&lt;/strong&gt;: When a user clicks to analyze, we first check if that user already has a Batch Job running. If so, we reply: &lt;code&gt;⏳ Your deep analysis task is currently running, please wait patiently&lt;/code&gt;, preventing users from submitting multiple duplicate Batch Jobs due to anxious repeated clicks, which would consume unnecessary resources.&lt;/li&gt;
&lt;/ol&gt;




&lt;h1&gt;
  
  
  Results and Benefits
&lt;/h1&gt;

&lt;p&gt;This optimization of Quick Reply and Gemini Batch API for the &lt;strong&gt;LINE Bot Restaurant Assistant&lt;/strong&gt; has achieved excellent practical value:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Highly Customized Mobile Experience&lt;/strong&gt;: After locating, users don't need to type; they can directly click on a restaurant of interest with one tap to precisely get a summary of its signature dishes and review pain points.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Robust Backend Architecture&lt;/strong&gt;: By leveraging asynchronous background tasks and LINE's character limit safety valve, the risks of Webhook timeouts and LINE API errors have been completely resolved.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cost Advantage for Big Data Processing&lt;/strong&gt;: Through the Batch API's half-price advantage and Webhook's proactive callback, while ensuring user experience, it also saves significant computing resources and API costs for the server.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Through this architecture, the LINE Bot truly achieves a low-latency, highly stable big data deep analysis experience on mobile!&lt;/p&gt;

&lt;p&gt;All development code for this project has been open-sourced on GitHub: &lt;a href="https://github.com/kkdai/linebot-helper-python" rel="noopener noreferrer"&gt;kkdai/linebot-helper-python&lt;/a&gt;. Everyone is welcome to deploy and personally test this one-click analysis function, which we believe can bring a higher level of intelligent experience to your LINE Bot projects!&lt;/p&gt;

</description>
      <category>api</category>
      <category>gemini</category>
      <category>llm</category>
      <category>tutorial</category>
    </item>
    <item>
      <title>[I/O Extended Taipei] Building</title>
      <dc:creator>Evan Lin</dc:creator>
      <pubDate>Sun, 14 Jun 2026 07:12:11 +0000</pubDate>
      <link>https://dev.to/gde/io-extended-taipei-building-cl0</link>
      <guid>https://dev.to/gde/io-extended-taipei-building-cl0</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fn7nklzeowl72fpp27lvn.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fn7nklzeowl72fpp27lvn.png" alt="image-20260612163641980" width="799" height="529"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;(Activity: &lt;a href="https://gdg.community.dev/events/details/google-gdg-taipei-presents-google-io-extended-2026-taipei/" rel="noopener noreferrer"&gt;Google I/O Extended 2026 Taipei&lt;/a&gt; / Presentation: &lt;a href="https://speakerdeck.com/line_developers_tw/building-applications-in-the-gemini-api-family" rel="noopener noreferrer"&gt;SpeakerDeck&lt;/a&gt;)&lt;/p&gt;

&lt;h1&gt;
  
  
  Context: The Gemini API is no longer just "adding one more prompt"
&lt;/h1&gt;

&lt;p&gt;If your impression of the Gemini API is still limited to "select a model, send a prompt, get back a piece of text," then when you see this round of updates in 2026, you'll likely suddenly realize something:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Gemini API has evolved from a simple API interface into a complete platform that can be used to build applications, agents, and asynchronous workflows.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This content is compiled from my talk "Building Applications in the Gemini API Family" at &lt;a href="https://gdg.community.dev/events/details/google-gdg-taipei-presents-google-io-extended-2026-taipei/" rel="noopener noreferrer"&gt;Google I/O Extended 2026 Taipei&lt;/a&gt;. &lt;strong&gt;Evan Lin&lt;/strong&gt;, Technical Director of LINE Taiwan Developer Relations, repeatedly emphasized a core observation at the event: what developers truly need to consider now is no longer just &lt;em&gt;"Should I use Pro or Flash?"&lt;/em&gt;, but rather &lt;em&gt;"How do I string together models, retrieval, agents, callbacks, and cost control into a cohesive system?"&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;In other words, the focus is shifting from &lt;strong&gt;calling APIs&lt;/strong&gt; to &lt;strong&gt;designing systems&lt;/strong&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  First, let's look at the big picture: What's new in the 2026 Gemini API family?
&lt;/h2&gt;

&lt;p&gt;If we view the 2026 Gemini API as a capability map, it can broadly be divided into three layers.&lt;/p&gt;

&lt;h3&gt;
  
  
  Layer 1: Core Models
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Gemini 3.5 Pro&lt;/strong&gt;: Strongest reasoning capability, suitable for complex planning, advanced analysis, and multi-step tasks.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Gemini 3.5 Flash&lt;/strong&gt;: Main model, best balance of speed, cost, and capability, suitable for most product traffic.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Flash-Lite&lt;/strong&gt;: Intent classifier and pre-classifier for high-frequency, low-cost scenarios.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Gemini Embedding 2&lt;/strong&gt;: Supports not only text but also multi-modal vectorization needs.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Layer 2: Key Capability Modules
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Retrieval&lt;/strong&gt;: File Search, Google Search Grounding, URL Context.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Agent / Async&lt;/strong&gt;: Agents API, Webhook, Deep Research agent.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Infrastructure&lt;/strong&gt;: Context caching, Batch API, Live API.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Layer 3: System Design Approach
&lt;/h3&gt;

&lt;p&gt;This layer is arguably the most important. Because once the above capabilities are offered as platform services, many "intermediate layers" that previously had to be built manually suddenly disappear:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;No longer necessarily need to build your own RAG pipeline.&lt;/li&gt;
&lt;li&gt;No longer necessarily need to maintain your own agent loop.&lt;/li&gt;
&lt;li&gt;No longer necessarily need to block the main server with polling while waiting for results.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Core Observation&lt;/strong&gt;: The Gemini API upgrade is not just about "stronger models"; it's about &lt;strong&gt;Google absorbing the complexities that were originally at the application layer into the platform layer&lt;/strong&gt;. This will directly change how we design AI systems.&lt;/p&gt;




&lt;h1&gt;
  
  
  Architectural Turning Point: Three Tools, Three Paradigm Shifts
&lt;/h1&gt;

&lt;p&gt;What's most worth repeatedly digesting from this talk are the architectural changes represented by these three tools.&lt;/p&gt;

&lt;h2&gt;
  
  
  1. File Search: Shifting from Hand-Coded RAG to Managed RAG
&lt;/h2&gt;

&lt;p&gt;Previously, when discussing enterprise knowledge Q&amp;amp;A, the immediate thought was:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Chunking.&lt;/li&gt;
&lt;li&gt;Creating embeddings.&lt;/li&gt;
&lt;li&gt;Storing in a vector DB.&lt;/li&gt;
&lt;li&gt;Writing retrieval code.&lt;/li&gt;
&lt;li&gt;Then manually adding citation and permission control.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Now, with the advent of File Search, developers can focus more on "how documents are governed, how permissions are allocated, and how answers are presented," rather than repeatedly writing that foundational infrastructure.&lt;/p&gt;

&lt;p&gt;More importantly, it doesn't just search text.&lt;/p&gt;

&lt;h3&gt;
  
  
  Why is this File Search particularly noteworthy?
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Images and text in the same space&lt;/strong&gt;: Screenshots, charts, and mixed text-image layouts in PDFs are no longer just attachments, but content understandable by the model.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Metadata filtering&lt;/strong&gt;: Can filter by department, system, and document type, which is crucial for internal enterprise knowledge retrieval.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Precise citation&lt;/strong&gt;: Can refer back to specific page numbers and grounding metadata, making answers more trustworthy.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This represents a very practical shift: much of the time enterprises previously spent on LangChain, vector databases, and chunking strategies can now largely be redirected towards &lt;strong&gt;permission design, UX, and content governance&lt;/strong&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  2. Agents API: Shifting from Client-Side Loop to Server-Side Managed Agent
&lt;/h2&gt;

&lt;p&gt;In the past, to build an agent, the common approach was to maintain your own ReAct or tool loop:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Model decides the next step.&lt;/li&gt;
&lt;li&gt;Calls a tool.&lt;/li&gt;
&lt;li&gt;Receives results.&lt;/li&gt;
&lt;li&gt;Feeds back to the model.&lt;/li&gt;
&lt;li&gt;Repeats until completion.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The problem is that this is full of engineering details: state preservation, timeouts, retries, background execution, long-task monitoring. Ultimately, you'd find yourself spending most of your time maintaining an "agent runtime."&lt;/p&gt;

&lt;p&gt;What the Agents API changes is that you can POST a task to Gemini, allowing it to complete the long process on the server side, even handling complex tasks that take up to 20 minutes.&lt;/p&gt;

&lt;p&gt;The significance behind this is not just "more convenient"; it means developers can finally refocus on:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;How are tasks defined?&lt;/li&gt;
&lt;li&gt;Which tools can be used?&lt;/li&gt;
&lt;li&gt;What are the success criteria?&lt;/li&gt;
&lt;li&gt;How should the product integrate the results when they return?&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  3. Webhook: Shifting from Polling to Event-Driven
&lt;/h2&gt;

&lt;p&gt;Once tasks might run for several minutes, or even more than ten minutes, traditional synchronous requests become unreasonable.&lt;/p&gt;

&lt;p&gt;Therefore, the role of Webhook is actually crucial: it's not a minor feature, but a prerequisite for the entire agent workflow to truly enter production. When Gemini completes a task and actively POSTs the result back to your server, your system can become event-driven:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The frontend first responds to the user with "Task received."&lt;/li&gt;
&lt;li&gt;The Agents API executes in the background.&lt;/li&gt;
&lt;li&gt;Upon completion, the result is pushed back via webhook.&lt;/li&gt;
&lt;li&gt;Your service then notifies the user, updates the database, or triggers the next step.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is particularly important for high-concurrency products, as you finally don't need to hold a bunch of server connections idly waiting.&lt;/p&gt;




&lt;h1&gt;
  
  
  From the Perspective of a LINE Bot, How Should a Gemini Application Be Designed?
&lt;/h1&gt;

&lt;p&gt;A very practical suggestion Evan gave in his talk is to &lt;strong&gt;place a router layer before the LLM&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;This design sounds simple, but it largely determines your cost, latency, and predictability.&lt;/p&gt;

&lt;h2&gt;
  
  
  A Very Pragmatic Routing Approach
&lt;/h2&gt;

&lt;p&gt;First, use the inexpensive &lt;strong&gt;Flash-Lite&lt;/strong&gt; for intent routing:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Quick Q&amp;amp;A&lt;/strong&gt;: Directly handed over to Flash or Flash-Lite for generation.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Query company documents&lt;/strong&gt;: Enters File Search.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Complex long tasks&lt;/strong&gt;: Enters Agents API.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Doing this has three benefits:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Cost control first&lt;/strong&gt;: Not every query directly hits the most expensive, heaviest model.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Latency control first&lt;/strong&gt;: Simple requests should not mistakenly enter long processes.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;System behavior control first&lt;/strong&gt;: Makes the overall process more stable than "throwing everything at a large model for improvisation."&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;If you're building a LINE Bot, customer service assistant, internal knowledge assistant, or workflow agent, this router should almost certainly be the default configuration, rather than an afterthought.&lt;/p&gt;




&lt;h2&gt;
  
  
  Infrastructure is Not Unimportant, But You Don't Have to Rebuild it Yourself Every Time
&lt;/h2&gt;

&lt;p&gt;Another strong message from this talk is that developers' time should be reallocated.&lt;/p&gt;

&lt;p&gt;Previously, much of the man-hours in many AI projects were actually consumed by these tasks:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Vector database operations and maintenance&lt;/li&gt;
&lt;li&gt;Chunking and retrieval parameter tuning&lt;/li&gt;
&lt;li&gt;Long-task scheduling&lt;/li&gt;
&lt;li&gt;Websocket / polling / callback processes&lt;/li&gt;
&lt;li&gt;Token cost optimization&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Now, with File Search, Agents API, Webhook, Context caching, and Batch API, the areas where we should spend more time have shifted to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Business rules and tool boundaries&lt;/li&gt;
&lt;li&gt;Document permissions and data governance&lt;/li&gt;
&lt;li&gt;User interaction experience&lt;/li&gt;
&lt;li&gt;Task decomposition and routing strategies&lt;/li&gt;
&lt;li&gt;Failure recovery and result interpretability&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is also why I strongly agree with Evan's underlying message: &lt;strong&gt;What's truly valuable is not whether you can build your own vector database, but whether you can redirect 80% of your energy back to the product's core.&lt;/strong&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Three Most Valuable Practical Takeaways
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. Place a routing layer before the LLM
&lt;/h3&gt;

&lt;p&gt;Don't send all problems directly to the same model. First classify, then decide whether to generate, retrieve, or enter an agent task.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Embrace asynchronous operations; don't force long tasks into synchronous APIs
&lt;/h3&gt;

&lt;p&gt;If a task might take more than a few seconds, you should seriously consider Agents API + Webhook. This is not an optimization; it's an architectural correctness issue.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Redirect RAG engineering time to permissions and experience
&lt;/h3&gt;

&lt;p&gt;When File Search can handle a large amount of foundational work, developers should be more concerned with: can data be securely queried, can answers be verified, and can citations be trusted by users.&lt;/p&gt;




&lt;h2&gt;
  
  
  Why is this talk worth revisiting repeatedly?
&lt;/h2&gt;

&lt;p&gt;Because it highlights a turning point that many teams are currently facing:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;We are no longer just writing prompts for LLMs; we are designing operating systems for AI applications.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Models are certainly still at the core, but what truly differentiates products is increasingly not "which model you choose," but:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;How you decide when to use which capability.&lt;/li&gt;
&lt;li&gt;How you make the system run reliably for extended periods.&lt;/li&gt;
&lt;li&gt;How you make answers traceable, verifiable, and maintainable.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you still understand generative AI using the 2024 approach of "a single chat endpoint for everything," then you'll easily underestimate the 2026 Gemini API family.&lt;/p&gt;




&lt;h2&gt;
  
  
  Postscript: From API User to AI System Designer
&lt;/h2&gt;

&lt;p&gt;The most valuable aspect of this "Building Applications in the Gemini API Family" talk is not teaching you another new parameter or SDK, but reminding everyone of a more fundamental shift:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The competitiveness of the next phase will not be about who is better at calling models, but who is better at assembling models, retrieval, agents, and event flows into a truly functional system.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;If you are working on a LINE Bot, enterprise knowledge base, internal assistant, customer service process, or any product requiring multi-step AI collaboration, this architectural perspective is well worth using to redraw your current system diagram.&lt;/p&gt;

&lt;p&gt;Often, what truly needs refactoring is not the prompt, but the entire pipeline.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>api</category>
      <category>gemini</category>
      <category>google</category>
    </item>
    <item>
      <title>[Hands-on Gemini 3.5 Live</title>
      <dc:creator>Evan Lin</dc:creator>
      <pubDate>Fri, 12 Jun 2026 06:09:59 +0000</pubDate>
      <link>https://dev.to/gde/hands-on-gemini-35-live-3dh6</link>
      <guid>https://dev.to/gde/hands-on-gemini-35-live-3dh6</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6x1mkub62aoeiv68idtq.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6x1mkub62aoeiv68idtq.png" alt="image-20260610144830233" width="800" height="326"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h1&gt;
  
  
  Brand New API Unveiled: Gemini 3.5 Live Translate
&lt;/h1&gt;

&lt;p&gt;On June 9, 2026, Google officially released its brand new real-time voice translation model — &lt;strong&gt;Gemini 3.5 Live Translate&lt;/strong&gt;. This marks another significant breakthrough for Google in AI voice translation technology. It is currently available for public preview to developers in Google AI Studio and Gemini Live API, and has been simultaneously integrated into services like Google Translate and Google Meet.&lt;/p&gt;

&lt;p&gt;Key features of Gemini 3.5 Live Translate include:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; &lt;strong&gt;Fluent and Natural Bidirectional Voice Translation&lt;/strong&gt;: Supports over 70 languages, automatically detecting the input voice language without manual configuration.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Continuous Stream Generation (Instead of Single-Sentence Turn-Taking)&lt;/strong&gt;: Unlike previous turn-by-turn systems that required the speaker to finish speaking before translation, Gemini 3.5 Live Translate generates translations in real-time while listening. It strikes a balance between contextual understanding and immediacy, with translations lagging only a few seconds behind the speaker, completely avoiding awkward pauses.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Preservation of Intonation and Rhythm&lt;/strong&gt;: The generated voice is not only smooth but also retains the original speaker's tone, intonation, and speaking rhythm.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Robust Noise Cancellation Capability&lt;/strong&gt;: Accurately captures and recognizes speech even in noisy or unstable environments.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This article will document how we developed a native macOS application, &lt;strong&gt;MeetingTranslator&lt;/strong&gt;, using Swift, to integrate with this powerful new API and achieve real-time translation of specific app audio into Traditional Chinese voice and subtitles.&lt;/p&gt;




&lt;h1&gt;
  
  
  System Design and Architecture
&lt;/h1&gt;

&lt;p&gt;Our goal is to develop a Native SwiftUI application that does not require installing virtual sound cards like BlackHole. Instead, it utilizes Apple's official &lt;strong&gt;ScreenCaptureKit&lt;/strong&gt; framework to directly capture the audio stream from a selected application (such as YouTube in Google Chrome or an online meeting) and, through the &lt;strong&gt;Gemini Live WebSocket API&lt;/strong&gt;, achieve ultra-low-latency conversational voice translation.&lt;/p&gt;

&lt;h3&gt;
  
  
  System Architecture Flow
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight dot"&gt;&lt;code&gt;&lt;span class="k"&gt;graph&lt;/span&gt; &lt;span class="nv"&gt;TD&lt;/span&gt;
    &lt;span class="nv"&gt;A&lt;/span&gt;&lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;ScreenCaptureKit&lt;/span&gt; &lt;span class="p"&gt;&amp;lt;&lt;/span&gt;br&lt;span class="p"&gt;&amp;gt;&lt;/span&gt;&lt;span class="nv"&gt;Capture&lt;/span&gt; &lt;span class="nv"&gt;Application&lt;/span&gt; &lt;span class="nv"&gt;Audio&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt; &lt;span class="o"&gt;--&lt;/span&gt;&lt;span class="err"&gt;&amp;gt;|&lt;/span&gt;&lt;span class="mi"&gt;48&lt;/span&gt;&lt;span class="nv"&gt;kHz&lt;/span&gt; &lt;span class="nv"&gt;Stereo&lt;/span&gt; &lt;span class="nv"&gt;Float32&lt;/span&gt;&lt;span class="err"&gt;|&lt;/span&gt; &lt;span class="nv"&gt;B&lt;/span&gt;&lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;AVAudioConverter&lt;/span&gt; &lt;span class="p"&gt;&amp;lt;&lt;/span&gt;br&lt;span class="p"&gt;&amp;gt;&lt;/span&gt;&lt;span class="nv"&gt;Resampling&lt;/span&gt; &lt;span class="nv"&gt;and&lt;/span&gt; &lt;span class="nv"&gt;Channel&lt;/span&gt; &lt;span class="nv"&gt;Conversion&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt;
    &lt;span class="nv"&gt;B&lt;/span&gt; &lt;span class="o"&gt;--&lt;/span&gt;&lt;span class="err"&gt;&amp;gt;|&lt;/span&gt;&lt;span class="mi"&gt;16&lt;/span&gt;&lt;span class="nv"&gt;kHz&lt;/span&gt; &lt;span class="nv"&gt;Mono&lt;/span&gt; &lt;span class="nv"&gt;Int16&lt;/span&gt; &lt;span class="nv"&gt;PCM&lt;/span&gt;&lt;span class="err"&gt;|&lt;/span&gt; &lt;span class="nv"&gt;C&lt;/span&gt;&lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;Gemini&lt;/span&gt; &lt;span class="nv"&gt;Live&lt;/span&gt; &lt;span class="nv"&gt;API&lt;/span&gt; &lt;span class="p"&gt;&amp;lt;&lt;/span&gt;br&lt;span class="p"&gt;&amp;gt;&lt;/span&gt;&lt;span class="nv"&gt;WebSocket&lt;/span&gt; &lt;span class="nv"&gt;Connection&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt;
    &lt;span class="nv"&gt;C&lt;/span&gt; &lt;span class="o"&gt;--&lt;/span&gt;&lt;span class="err"&gt;&amp;gt;|&lt;/span&gt;&lt;span class="nv"&gt;Real&lt;/span&gt;&lt;span class="err"&gt;-&lt;/span&gt;&lt;span class="nv"&gt;time&lt;/span&gt; &lt;span class="nv"&gt;Subtitle&lt;/span&gt; &lt;span class="nv"&gt;Recognition&lt;/span&gt;&lt;span class="err"&gt;|&lt;/span&gt; &lt;span class="nv"&gt;D&lt;/span&gt;&lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;SwiftUI&lt;/span&gt; &lt;span class="nv"&gt;Subtitle&lt;/span&gt; &lt;span class="nv"&gt;HUD&lt;/span&gt; &lt;span class="p"&gt;&amp;lt;&lt;/span&gt;br&lt;span class="p"&gt;&amp;gt;&lt;/span&gt;&lt;span class="nv"&gt;Traditional&lt;/span&gt; &lt;span class="nv"&gt;Chinese&lt;/span&gt; &lt;span class="nv"&gt;Bilingual&lt;/span&gt; &lt;span class="nv"&gt;Subtitles&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt;
    &lt;span class="nv"&gt;C&lt;/span&gt; &lt;span class="o"&gt;--&lt;/span&gt;&lt;span class="err"&gt;&amp;gt;|&lt;/span&gt;&lt;span class="mi"&gt;24&lt;/span&gt;&lt;span class="nv"&gt;kHz&lt;/span&gt; &lt;span class="nv"&gt;Mono&lt;/span&gt; &lt;span class="nv"&gt;Int16&lt;/span&gt; &lt;span class="nv"&gt;PCM&lt;/span&gt; &lt;span class="nv"&gt;Translated&lt;/span&gt; &lt;span class="nv"&gt;Audio&lt;/span&gt;&lt;span class="err"&gt;|&lt;/span&gt; &lt;span class="nv"&gt;E&lt;/span&gt;&lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;AudioPlaybackManager&lt;/span&gt; &lt;span class="p"&gt;&amp;lt;&lt;/span&gt;br&lt;span class="p"&gt;&amp;gt;&lt;/span&gt;&lt;span class="nv"&gt;AVAudioEngine&lt;/span&gt; &lt;span class="nv"&gt;Player&lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h1&gt;
  
  
  Core Implementation One: ScreenCaptureKit Capture and Resampling
&lt;/h1&gt;

&lt;p&gt;&lt;strong&gt;ScreenCaptureKit&lt;/strong&gt;, introduced in macOS 13, frees developers from the pain of relying on kernel audio virtual devices, allowing precise filtering and recording of specific application screens and audio.&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Filter and Select Target App
&lt;/h3&gt;

&lt;p&gt;We use &lt;code&gt;SCShareableContent&lt;/code&gt; to get currently running applications on the system and filter out background services without names and system-自带 services:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight swift"&gt;&lt;code&gt;&lt;span class="kd"&gt;func&lt;/span&gt; &lt;span class="nf"&gt;fetchShareableApps&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="kt"&gt;SCRunningApplication&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;do&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="nv"&gt;content&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;try&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="kt"&gt;SCShareableContent&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;current&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;applications&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;filter&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="n"&gt;app&lt;/span&gt; &lt;span class="k"&gt;in&lt;/span&gt;
            &lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="nv"&gt;name&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;app&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;applicationName&lt;/span&gt;
            &lt;span class="k"&gt;guard&lt;/span&gt; &lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;isEmpty&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
            &lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="nv"&gt;bundleId&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;app&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;bundleIdentifier&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="n"&gt;bundleId&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;hasPrefix&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"com.apple.system"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="n"&gt;bundleId&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="kt"&gt;Bundle&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;main&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;bundleIdentifier&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sorted&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nv"&gt;$0&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;applicationName&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="nv"&gt;$1&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;applicationName&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;catch&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"無法獲取可共享內容: &lt;/span&gt;&lt;span class="se"&gt;\(&lt;/span&gt;&lt;span class="n"&gt;error&lt;/span&gt;&lt;span class="se"&gt;)&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  2. Start Audio Capture Stream
&lt;/h3&gt;

&lt;p&gt;After filtering out the target App (e.g., Google Chrome), we create an &lt;code&gt;SCContentFilter&lt;/code&gt; for it and apply it to &lt;code&gt;SCStream&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight swift"&gt;&lt;code&gt;&lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="nv"&gt;appFilter&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="kt"&gt;SCContentFilter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;display&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;displays&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;first&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;including&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;targetApp&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="nv"&gt;exceptingWindows&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[])&lt;/span&gt;
&lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="nv"&gt;config&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="kt"&gt;SCStreamConfiguration&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;capturesAudio&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
&lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;width&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;32&lt;/span&gt; &lt;span class="c1"&gt;// When only capturing audio, set video frame to minimal to save performance&lt;/span&gt;
&lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;height&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;32&lt;/span&gt;

&lt;span class="n"&gt;stream&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="kt"&gt;SCStream&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;filter&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;appFilter&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;configuration&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;delegate&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;nil&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;try&lt;/span&gt; &lt;span class="n"&gt;stream&lt;/span&gt;&lt;span class="p"&gt;?&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;addStreamOutput&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;audio&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;sampleHandlerQueue&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kt"&gt;DispatchQueue&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;label&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s"&gt;"com.translator.audioQueue"&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="k"&gt;try&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;stream&lt;/span&gt;&lt;span class="p"&gt;?&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;startCapture&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h1&gt;
  
  
  Core Implementation Two: Gemini Live WebSocket Bidirectional Connection
&lt;/h1&gt;

&lt;p&gt;The core of the Gemini Live API lies in using a &lt;code&gt;wss://&lt;/code&gt; connection to transmit microphone/application audio in real-time through a single channel, and simultaneously receive model-generated translated text and translated audio.&lt;/p&gt;

&lt;p&gt;In &lt;a&gt;GeminiLiveConnection.swift&lt;/a&gt;, we maintain this bidirectional pipeline via &lt;code&gt;URLSessionWebSocketTask&lt;/code&gt;. After connecting, a &lt;code&gt;setup&lt;/code&gt; control message must be sent immediately to initialize the model configuration.&lt;/p&gt;




&lt;h1&gt;
  
  
  Major Pitfalls and Solutions
&lt;/h1&gt;

&lt;p&gt;During the process of integrating the system, we encountered three blocking difficulties. Below is our troubleshooting process and solutions:&lt;/p&gt;

&lt;h3&gt;
  
  
  Pitfall One: Gemini Live Exclusive Model Restrictions
&lt;/h3&gt;

&lt;p&gt;Initially, we tried to use standard REST API model names (e.g., &lt;code&gt;gemini-3.5-flash&lt;/code&gt;) in the WebSocket connection, but the server immediately disconnected:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;❌ WebSocket 被 Gemini 伺服器關閉 (CloseCode: 1008, 原因: models/gemini-3.5-flash is not found for API version v1beta, or is not supported for bidiGenerateContent.)

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;【Solution】&lt;/strong&gt; Gemini's bidirectional Live API currently only supports specific optimized real-time models. We must restrict the model field to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;code&gt;gemini-2.0-flash-exp&lt;/code&gt; (standard bidirectional conversation)&lt;/li&gt;
&lt;li&gt;  &lt;code&gt;gemini-3.5-live-translate-preview&lt;/code&gt; (preview model optimized for real-time translation)&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Pitfall Two: Incorrect JSON Payload Field Structure (Hidden Differences Between Documentation and API Versions)
&lt;/h3&gt;

&lt;p&gt;When configuring real-time interpretation, we referred to Google's official documentation and placed the &lt;code&gt;inputAudioTranscription&lt;/code&gt; (input speech-to-text) and &lt;code&gt;outputAudioTranscription&lt;/code&gt; (output speech-to-text) fields within &lt;code&gt;generationConfig&lt;/code&gt;, which resulted in a 1007 error:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;❌ WebSocket 被 Gemini 伺服器關閉 (CloseCode: 1007, 原因: Invalid JSON payload received. Unknown name "inputAudioTranscription" at 'setup.generation_config': Cannot find field.)

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;【Cause Analysis and Solution】&lt;/strong&gt; In the official documentation, for &lt;code&gt;v1alpha&lt;/code&gt; and client SDKs (e.g., JavaScript / Python SDK), these two fields are wrapped within &lt;code&gt;generationConfig&lt;/code&gt;. However, in the current &lt;code&gt;v1beta&lt;/code&gt; WebSocket native endpoint: &lt;code&gt;/ws/google.ai.generativelanguage.v1beta.GenerativeService.BidiGenerateContent&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;These two fields should be located at the &lt;strong&gt;root level&lt;/strong&gt; of the &lt;code&gt;setup&lt;/code&gt; object, while the translation-specific &lt;code&gt;translationConfig&lt;/code&gt; must be placed under &lt;code&gt;generationConfig&lt;/code&gt;. The correct JSON Payload structure is as follows:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight swift"&gt;&lt;code&gt;&lt;span class="n"&gt;setupMessage&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="s"&gt;"setup"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="s"&gt;"model"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s"&gt;"models/&lt;/span&gt;&lt;span class="se"&gt;\(&lt;/span&gt;&lt;span class="n"&gt;modelName&lt;/span&gt;&lt;span class="se"&gt;)&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="s"&gt;"inputAudioTranscription"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[:],&lt;/span&gt; &lt;span class="c1"&gt;// Enable real-time input subtitles, placed at the setup root&lt;/span&gt;
        &lt;span class="s"&gt;"outputAudioTranscription"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[:],&lt;/span&gt; &lt;span class="c1"&gt;// Enable real-time output subtitles, placed at the setup root&lt;/span&gt;
        &lt;span class="s"&gt;"generationConfig"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
            &lt;span class="s"&gt;"responseModalities"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;"AUDIO"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
            &lt;span class="s"&gt;"translationConfig"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
                &lt;span class="s"&gt;"targetLanguageCode"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s"&gt;"zh-TW"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="c1"&gt;// Set target translation language to Traditional Chinese&lt;/span&gt;
                &lt;span class="s"&gt;"echoTargetLanguage"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;
            &lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="p"&gt;]&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;After this modification, the WebSocket setup finally successfully handshaked and no longer crashed!&lt;/p&gt;

&lt;h3&gt;
  
  
  Pitfall Three: "Zero-Byte Silence" Caused by Multi-Channel Stereo Capture
&lt;/h3&gt;

&lt;p&gt;After successfully establishing the WebSocket pipeline and starting to push resampled audio, we found that Gemini still had no translation response. Observing the log output, we discovered that the content of the sent audio blocks was all &lt;code&gt;0&lt;/code&gt; (Silence):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;📊 [WebSocket] 已發送 500 個音訊區塊 | 大小: 640 bytes | 是否為靜音(全0): true

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;【Cause Analysis】&lt;/strong&gt; When the captured object (e.g., Google Chrome playing a YouTube video) outputs stereo (2 Channels) or multi-channel audio, our original method for converting &lt;code&gt;CMSampleBuffer&lt;/code&gt; to &lt;code&gt;AVAudioPCMBuffer&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight swift"&gt;&lt;code&gt;&lt;span class="c1"&gt;// Old method: Directly assumes a single Channel pointer and copies&lt;/span&gt;
&lt;span class="k"&gt;var&lt;/span&gt; &lt;span class="nv"&gt;audioBufferList&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="kt"&gt;AudioBufferList&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="k"&gt;var&lt;/span&gt; &lt;span class="nv"&gt;blockBuffer&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kt"&gt;CMBlockBuffer&lt;/span&gt;&lt;span class="p"&gt;?&lt;/span&gt;
&lt;span class="kt"&gt;CMSampleBufferGetAudioBufferListWithRetainedBlockBuffer&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;...&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;audioBufferList&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;...&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In a multi-channel environment, this would lead to insufficient memory allocation, causing copy interruption or fill failure, resulting in all subsequent audio resampler (AVAudioConverter) inputs being null values (silence).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;【Solution】&lt;/strong&gt; It is necessary to use the &lt;strong&gt;Double-Call technique&lt;/strong&gt; to dynamically allocate memory space for &lt;code&gt;AudioBufferList&lt;/code&gt;:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; &lt;strong&gt;First Call&lt;/strong&gt;: Pass &lt;code&gt;nil&lt;/code&gt; as the buffer output, used only to precisely query the required physical memory size (&lt;code&gt;bufferListSizeNeededOut&lt;/code&gt;) for that &lt;code&gt;sampleBuffer&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Memory Allocation&lt;/strong&gt;: Use &lt;code&gt;UnsafeMutablePointer&amp;lt;AudioBufferList&amp;gt;.allocate&lt;/code&gt; to dynamically allocate space based on the queried size.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Second Call&lt;/strong&gt;: Pass the allocated pointer to safely fill in multi-channel audio data.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Channel Reassembly&lt;/strong&gt;: Based on the multi-channel format (Interleaved/Non-Interleaved), precisely use &lt;code&gt;memcpy&lt;/code&gt; to copy the corresponding data segments into a temporary buffer, then send it to the converter for noise reduction and downsampling.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Core code correction:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight swift"&gt;&lt;code&gt;&lt;span class="kd"&gt;private&lt;/span&gt; &lt;span class="kd"&gt;func&lt;/span&gt; &lt;span class="nf"&gt;audioBufferFromSampleBuffer&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;_&lt;/span&gt; &lt;span class="nv"&gt;sampleBuffer&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kt"&gt;CMSampleBuffer&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;asbd&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kt"&gt;AudioStreamBasicDescription&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="kt"&gt;AVAudioPCMBuffer&lt;/span&gt;&lt;span class="p"&gt;?&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;guard&lt;/span&gt; &lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="nv"&gt;sourceFormat&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;sourceFormat&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="kc"&gt;nil&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="c1"&gt;// 1. Dynamically get the required AudioBufferList memory size&lt;/span&gt;
    &lt;span class="k"&gt;var&lt;/span&gt; &lt;span class="nv"&gt;bufferListSize&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;
    &lt;span class="k"&gt;var&lt;/span&gt; &lt;span class="nv"&gt;status&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="kt"&gt;CMSampleBufferGetAudioBufferListWithRetainedBlockBuffer&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;sampleBuffer&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="nv"&gt;bufferListSizeNeededOut&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;bufferListSize&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="nv"&gt;bufferListOut&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;nil&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="nv"&gt;bufferListSize&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="nv"&gt;blockBufferAllocator&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;nil&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="nv"&gt;blockBufferMemoryAllocator&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;nil&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="nv"&gt;flags&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="nv"&gt;blockBufferOut&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;nil&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;guard&lt;/span&gt; &lt;span class="n"&gt;status&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="n"&gt;noErr&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="kc"&gt;nil&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="c1"&gt;// 2. Allocate a pointer with sufficient space and fill it&lt;/span&gt;
    &lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="nv"&gt;bufferListPointer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="kt"&gt;UnsafeMutablePointer&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="kt"&gt;AudioBufferList&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;.&lt;/span&gt;&lt;span class="nf"&gt;allocate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;capacity&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;bufferListSize&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;defer&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="n"&gt;bufferListPointer&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;deallocate&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="k"&gt;var&lt;/span&gt; &lt;span class="nv"&gt;blockBuffer&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kt"&gt;CMBlockBuffer&lt;/span&gt;&lt;span class="p"&gt;?&lt;/span&gt;
    &lt;span class="n"&gt;status&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="kt"&gt;CMSampleBufferGetAudioBufferListWithRetainedBlockBuffer&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;sampleBuffer&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="nv"&gt;bufferListSizeNeededOut&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;nil&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="nv"&gt;bufferListOut&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;bufferListPointer&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="nv"&gt;bufferListSize&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;bufferListSize&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="nv"&gt;blockBufferAllocator&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;nil&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="nv"&gt;blockBufferMemoryAllocator&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;nil&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="nv"&gt;flags&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="nv"&gt;blockBufferOut&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;blockBuffer&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;guard&lt;/span&gt; &lt;span class="n"&gt;status&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="n"&gt;noErr&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="kc"&gt;nil&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="c1"&gt;// 3. Create an AVAudioPCMBuffer conforming to the source format and safely copy...&lt;/span&gt;
    &lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="nv"&gt;frameCount&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="kt"&gt;AVAudioFrameCount&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;CMSampleBufferGetNumSamples&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;sampleBuffer&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
    &lt;span class="k"&gt;guard&lt;/span&gt; &lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="nv"&gt;pcmBuffer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="kt"&gt;AVAudioPCMBuffer&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;pcmFormat&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;sourceFormat&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;frameCapacity&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;frameCount&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="kc"&gt;nil&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="n"&gt;pcmBuffer&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;frameLength&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;frameCount&lt;/span&gt;

    &lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="nv"&gt;audioBuffers&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="kt"&gt;UnsafeMutableAudioBufferListPointer&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;bufferListPointer&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;index&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;audioBuffer&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;in&lt;/span&gt; &lt;span class="n"&gt;audioBuffers&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;enumerated&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;guard&lt;/span&gt; &lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="nv"&gt;mData&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;audioBuffer&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;mData&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;index&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="kt"&gt;Int&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;sourceFormat&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;channelCount&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="k"&gt;continue&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="c1"&gt;// Differentiate between non-interleaved and interleaved formats for copying&lt;/span&gt;
        &lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="nv"&gt;isNonInterleaved&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;asbd&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;mFormatFlags&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt; &lt;span class="n"&gt;kAudioFormatFlagIsNonInterleaved&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;isNonInterleaved&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="nv"&gt;dst&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pcmBuffer&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;int16ChannelData&lt;/span&gt;&lt;span class="p"&gt;?[&lt;/span&gt;&lt;span class="n"&gt;index&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                &lt;span class="nf"&gt;memcpy&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;dst&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;mData&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;Int&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;audioBuffer&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;mDataByteSize&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
            &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="nv"&gt;dst&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pcmBuffer&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;int16ChannelData&lt;/span&gt;&lt;span class="p"&gt;?[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                &lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="nv"&gt;offset&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;index&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="kt"&gt;Int&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;frameCount&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
                &lt;span class="nf"&gt;memcpy&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;dst&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;advanced&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;by&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;offset&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;mData&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;Int&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;audioBuffer&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="n"&gt;mDataByteSize&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
            &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;pcmBuffer&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;After applying this refactoring, when we played a test video on Chrome's YouTube again, the console finally printed: &lt;code&gt;是否為靜音(全0): false&lt;/code&gt;, and we successfully received Gemini's real-time voice feedback!&lt;/p&gt;




&lt;h1&gt;
  
  
  Results and Benefits
&lt;/h1&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fih2tjzsuyw067ix5845a.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fih2tjzsuyw067ix5845a.png" alt="image-20260610144945151" width="800" height="626"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Full development repo: &lt;a href="https://github.com/kkdai/gemini-live-translate-macos" rel="noopener noreferrer"&gt;https://github.com/kkdai/gemini-live-translate-macos&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Through this architectural upgrade and bug fixes, &lt;strong&gt;MeetingTranslator&lt;/strong&gt; has demonstrated excellent practical value:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; &lt;strong&gt;Zero External Device Dependency&lt;/strong&gt;: No need to set up complex routing like BlackHole or Loopback; it works out of the box.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Accurate and Real-time Subtitles&lt;/strong&gt;: The Gemini Live API can complete English to Traditional Chinese translation within hundreds of milliseconds, smoothly displaying the results in a HUD floating window.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Synchronized Voice Translation Broadcast&lt;/strong&gt;: Through &lt;code&gt;AudioPlaybackManager&lt;/code&gt;, users can listen to the original meeting while simultaneously hearing high-quality 24kHz Traditional Chinese interpretation in their headphones.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;We hope this record of pitfalls encountered with macOS Core Audio / ScreenCaptureKit and the Gemini WebSocket API can provide valuable reference for developers also exploring AI real-time voice applications!&lt;/p&gt;

</description>
      <category>ai</category>
      <category>api</category>
      <category>gemini</category>
      <category>google</category>
    </item>
    <item>
      <title>[AI Practice] Building blazing-Fast AI Mac OS App with Antigravity CLI</title>
      <dc:creator>Evan Lin</dc:creator>
      <pubDate>Fri, 12 Jun 2026 06:09:40 +0000</pubDate>
      <link>https://dev.to/gde/ai-practice-blazing-fast-ai-co-29l7</link>
      <guid>https://dev.to/gde/ai-practice-blazing-fast-ai-co-29l7</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Folr85lchvp9197zg9lvg.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Folr85lchvp9197zg9lvg.png" alt="image-20260612102252662" width="800" height="622"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h1&gt;
  
  
  Foreword: A Developer's New Collaboration Model
&lt;/h1&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fs0iuas8puugsm23a61hw.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fs0iuas8puugsm23a61hw.png" alt="image-20260612102436750" width="800" height="212"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Imagine this scenario: you are developing a real-time meeting translation App that combines macOS low-level audio (CoreAudio/ScreenCaptureKit) with Gemini Live API WebSocket. During the testing phase, the program suddenly crashed with an error, and the audio stream produced a complete silence of all zeros.&lt;/p&gt;

&lt;p&gt;In the past, your troubleshooting process might have been:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Open the terminal and retrieve the log file.&lt;/li&gt;
&lt;li&gt;Copy the entire error message and relevant code.&lt;/li&gt;
&lt;li&gt;Switch to the browser, open an AI chat window, paste it, and ask for the reason.&lt;/li&gt;
&lt;li&gt;After receiving modification suggestions, copy them back to the editor and test manually.&lt;/li&gt;
&lt;li&gt;Repeat the above steps until fixed, then manually write &lt;code&gt;README.md&lt;/code&gt;, write a blog post, create a GitHub repository, commit the code, and push it.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;In this development cycle, we adopted the &lt;strong&gt;AGY CLI (Antigravity-CLI)&lt;/strong&gt; agent designed by Google DeepMind. We were surprised to find that all the tedious context switching mentioned above could be &lt;strong&gt;fully automated&lt;/strong&gt; through conversations with the intelligent agent within the terminal. This article will reconstruct the actual Prompt dialogue flow and share how we collaborated with AGY CLI to build a macOS meeting translation App from scratch.&lt;/p&gt;




&lt;h1&gt;
  
  
  Phase One: Idea Generation and Architecture Design
&lt;/h1&gt;

&lt;p&gt;Everything originated from a development idea and a newly released Google API document. The developer pasted a URL into the terminal for AGY CLI and posed the first core question:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;User&lt;/strong&gt; : Following this example, is it possible to create a Mac OS App that can capture audio from the computer and translate it into other languages in real-time? Start helping me think about how to do this. My main goal is to be able to use it for Zoom / Google Meet meetings to directly translate other countries' languages into Traditional Chinese.&lt;/p&gt;

&lt;p&gt;Here are other documents for your reference: &lt;a href="https://ai.google.dev/gemini-api/docs/live-api/live-translate?hl=zh-tw" rel="noopener noreferrer"&gt;https://ai.google.dev/gemini-api/docs/live-api/live-translate?hl=zh-tw&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Upon receiving the instruction, AGY CLI acted as an architect, quickly analyzing and providing a set of native macOS solutions:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;No virtual sound card installation required&lt;/strong&gt;: It recommends using Apple's &lt;strong&gt;ScreenCaptureKit&lt;/strong&gt; framework, introduced in macOS 13+, to directly capture the clean audio track of selected applications (such as Zoom or Chrome) in Sandbox secure mode.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Real-time audio resampling&lt;/strong&gt;: Utilize &lt;code&gt;AVAudioConverter&lt;/code&gt; to resample multi-channel/stereo 48kHz audio in real-time to the 16kHz mono PCM format supported by Gemini Live.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Bidirectional WebSocket communication&lt;/strong&gt;: Implement a bidirectional long connection using Swift's &lt;code&gt;URLSessionWebSocketTask&lt;/code&gt; to send PCM audio chunks while receiving Traditional Chinese translation results and broadcast audio.&lt;/li&gt;
&lt;/ul&gt;




&lt;h1&gt;
  
  
  Phase Two: Environment Configuration and Compilation Anxiety Elimination
&lt;/h1&gt;

&lt;p&gt;Before starting to write Swift code, the developer raised questions about the macOS compilation environment:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;User&lt;/strong&gt; : Do I absolutely need Xcode to compile?&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;User&lt;/strong&gt; : Help me run xcode-select –install&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Addressing the developer's environment anxiety, AGY CLI provided the answers:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Although it's best to use Xcode to create standard macOS SwiftUI GUI application bundles, we &lt;strong&gt;don't necessarily need to manually open the Xcode IDE&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;AGY CLI helped write an automated build script &lt;code&gt;build_app.sh&lt;/code&gt;, which directly calls the terminal's &lt;code&gt;swiftc&lt;/code&gt; command-line tool to compile all &lt;code&gt;.swift&lt;/code&gt; source files and automatically packages them into a complete &lt;code&gt;.app&lt;/code&gt; directory structure.&lt;/li&gt;
&lt;li&gt;At the same time, in response to the developer's request to install Command Line Tools, AGY CLI proactively invoked the permission tool and directly ran &lt;code&gt;xcode-select --install&lt;/code&gt; locally, automatically configuring the Swift compilation environment.&lt;/li&gt;
&lt;/ul&gt;




&lt;h1&gt;
  
  
  Phase Three: Connection Troubleshooting and Audio Bug Fixes
&lt;/h1&gt;

&lt;p&gt;After the code was initially completed, the developer ran the App from the command line, but the connection status showed abnormalities, and no characters were translated:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;User&lt;/strong&gt; : Didn't see any error messages~ but the connection status is disconnected&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;This was the moment for AGY CLI to demonstrate its "autonomous troubleshooting" power. Upon receiving the prompt, it automatically located the &lt;code&gt;debug.log&lt;/code&gt; file, called &lt;code&gt;tail&lt;/code&gt; to analyze the runtime logs, and identified two critical issues:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Incompatible model name&lt;/strong&gt;: The original program used the standard REST model &lt;code&gt;models/gemini-3.5-flash&lt;/code&gt;, whereas the Live WebSocket API only accepts &lt;code&gt;gemini-3.5-live-translate-preview&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Incorrect JSON configuration level&lt;/strong&gt;: The API documentation used the &lt;code&gt;v1alpha&lt;/code&gt; version SDK, which wrapped &lt;code&gt;inputAudioTranscription&lt;/code&gt; within &lt;code&gt;generationConfig&lt;/code&gt;; however, the native WebSocket's &lt;code&gt;v1beta&lt;/code&gt; endpoint required these two fields to be placed directly under the &lt;code&gt;setup&lt;/code&gt; root directory. This was the culprit behind the &lt;code&gt;CloseCode 1007&lt;/code&gt; crash.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Multi-channel stereo silence Bug&lt;/strong&gt;: The multi-channel audio track captured by &lt;code&gt;ScreenCaptureKit&lt;/code&gt; was truncated to complete silence (all zeros) during copying in the old code due to insufficient AudioBufferList memory allocation.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;AGY CLI immediately proactively modified &lt;a&gt;AudioCaptureManager.swift&lt;/a&gt;, introducing the &lt;strong&gt;"Double-Call" register allocation pointer technique&lt;/strong&gt;, and refactored the Payload structure of &lt;a&gt;GeminiLiveConnection.swift&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;After the modifications were completed, the application ran smoothly, the console log finally printed &lt;code&gt;是否為靜音(全0): false&lt;/code&gt; (Is it silent (all 0s): false), and both real-time bilingual subtitles and real-time broadcast audio functioned correctly!&lt;/p&gt;




&lt;h1&gt;
  
  
  Phase Four: Automated DevOps and GitHub Delivery
&lt;/h1&gt;

&lt;p&gt;Once the developer confirmed that the program was working correctly, the final step was to open-source and share the code:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;User&lt;/strong&gt; : I want to check in the swift-demo folder to my own GitHub repo. Give me a suggested repo name and write a README.md under swift-demo.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;User&lt;/strong&gt; : Help me commit all relevant changes in that folder to &lt;a href="mailto:git@github.com"&gt;git@github.com&lt;/a&gt;:kkdai/gemini-live-translate-macos.git&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;AGY CLI immediately took over the final DevOps tasks:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;It recommended using &lt;code&gt;gemini-live-translate-macos&lt;/code&gt; as the Repo name and wrote the project's English GitHub description and topics tags.&lt;/li&gt;
&lt;li&gt;It automatically completed the full environment preparation, Xcode Sandbox Capabilities settings, command-line script execution steps, and API troubleshooting tips in &lt;a&gt;README.md&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;After obtaining the user's repository URL, AGY CLI proactively ran &lt;code&gt;git init&lt;/code&gt; in the background, wrote &lt;code&gt;.gitignore&lt;/code&gt;, committed all the code, and successfully pushed it to the remote GitHub repository!&lt;/li&gt;
&lt;/ol&gt;




&lt;h1&gt;
  
  
  Conclusion: Development Transformation and Insights
&lt;/h1&gt;

&lt;p&gt;Through this collaborative development with AGY CLI, we experienced an unprecedentedly rapid development process:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Reduced cognitive load&lt;/strong&gt;: Developers only need to express their intentions in natural language (e.g., "help me run the installation," "help me troubleshoot why the connection is broken"), and the AI Agent will autonomously translate them into corresponding system commands and code modifications.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Native system-level control&lt;/strong&gt;: AI can directly read and execute commands, synchronizing with the development environment in real-time, greatly reducing the hallucinations and environment version mismatches that often occurred with traditional Web AI Chat.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;One-stop delivery&lt;/strong&gt;: From the first phrase "think about how to do it" to the final "Push to GitHub repository" with a single click, AGY CLI seamlessly integrated the entire software engineering lifecycle.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This practical experience proves that in the era of Agentic AI, a single developer, paired with a powerful CLI agent, can deliver a high-quality Native application involving system-level foundations and the latest APIs in an extremely short amount of time. See you next time!&lt;/p&gt;

</description>
      <category>ai</category>
      <category>gemini</category>
      <category>productivity</category>
      <category>softwaredevelopment</category>
    </item>
    <item>
      <title>[GCP Practical] LINE Business Card Bot</title>
      <dc:creator>Evan Lin</dc:creator>
      <pubDate>Sun, 07 Jun 2026 15:26:28 +0000</pubDate>
      <link>https://dev.to/gde/gcp-practical-line-business-card-bot-d46</link>
      <guid>https://dev.to/gde/gcp-practical-line-business-card-bot-d46</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F71r8jl4qibup4fxxh4tv.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F71r8jl4qibup4fxxh4tv.png" alt="image-20260607133454831" width="800" height="1739"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h1&gt;
  
  
  Upgrade Preamble
&lt;/h1&gt;

&lt;p&gt;After refactoring the agent based on &lt;strong&gt;Vertex AI ADK&lt;/strong&gt;, our LINE Name Card Assistant Bot (&lt;code&gt;linebot-namecard-python&lt;/code&gt;) entered the production environment for testing. However, in real-world usage scenarios, we quickly identified three core pain points affecting user experience and security:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; &lt;strong&gt;Unstable OCR JSON Parsing&lt;/strong&gt;: Using the standard JSON Mode with a Prompt, Gemini occasionally still outputs Markdown tags or misses fields, causing parser errors.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Excessive Search Results Leading to LINE API 400 Error&lt;/strong&gt;: LINE limits sending a maximum of 5 messages at a time. When search results include 5 cards plus the Agent's text reply, totaling 6, LINE directly rejects it and doesn't reply.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;AI Accidental Modification&lt;/strong&gt;: If a user mentions modification, the Agent directly writes to Firebase without secondary confirmation, easily leading to data corruption due to mishearing or hallucination.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This article will focus on sharing how we conducted a second wave of upgrades to address the above pain points, implementing &lt;strong&gt;Structured Outputs&lt;/strong&gt;, &lt;strong&gt;Disambiguation Lists&lt;/strong&gt;, &lt;strong&gt;Two-Stage Confirmation Mechanism&lt;/strong&gt;, and the major pitfall we encountered during operations and deployment regarding environment variable recovery!&lt;/p&gt;




&lt;h1&gt;
  
  
  Optimization One: Embracing Gemini Structured Outputs
&lt;/h1&gt;

&lt;p&gt;Previously, when calling &lt;code&gt;gemini-3-flash-preview&lt;/code&gt; for name card image parsing, we commanded it via Prompt and manually parsed JSON. To ensure 100% format guarantee, we introduced the native &lt;strong&gt;Structured Outputs&lt;/strong&gt; feature of the Vertex AI API.&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Defining the Name Card Schema
&lt;/h3&gt;

&lt;p&gt;In &lt;a&gt;app/gemini_utils.py&lt;/a&gt;, we defined the constraint Schema for the name card object, forcing Gemini to strictly adhere to this format for output:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;NAMECARD_SCHEMA&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;OBJECT&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;properties&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;STRING&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;description&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;聯絡人姓名，如果看不出來，請填寫 N/A&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="p"&gt;},&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;title&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;STRING&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;description&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;職稱或頭銜，如果看不出來，請填寫 N/A&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="p"&gt;},&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;company&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;STRING&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;description&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;公司名稱，如果看不出來，請填寫 N/A&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="p"&gt;},&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;address&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;STRING&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;description&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;公司或聯絡地址，如果看不出來，請填寫 N/A&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="p"&gt;},&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;phone&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;STRING&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;description&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;電話號碼，格式為 #886-0123-456-789,1234。&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;沒有分機就忽略 ,1234。如果看不出來，請填寫 N/A&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
            &lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="p"&gt;},&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;email&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;STRING&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;description&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;電子郵件信箱，如果看不出來，請填寫 N/A&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;required&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;title&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;company&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;address&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;phone&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;email&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  2. Applying to Generation Config
&lt;/h3&gt;

&lt;p&gt;We only need to specify &lt;code&gt;response_schema&lt;/code&gt; in &lt;code&gt;generation_config&lt;/code&gt; when instantiating &lt;code&gt;GenerativeModel&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;generate_json_from_image&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;img&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;PIL&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Image&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Image&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;object&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;GenerativeModel&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gemini-3-flash-preview&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;generation_config&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;response_mime_type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;application/json&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;response_schema&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;NAMECARD_SCHEMA&lt;/span&gt;
        &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;img_part&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;Part&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;from_data&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nf"&gt;pil_to_bytes&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;img&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;mime_type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;image/jpeg&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;generate_content&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;img_part&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;stream&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;After application, the JSON error rate of the returned response dropped directly to 0%, eliminating complex string cleaning and parser error-prevention logic.&lt;/p&gt;




&lt;h1&gt;
  
  
  Optimization Two: Solving LINE Message Limit with 'Disambiguation List'
&lt;/h1&gt;

&lt;p&gt;LINE Webhook has an iron rule: &lt;strong&gt;the number of message bubbles sent in a single &lt;code&gt;reply_message&lt;/code&gt; must be between 1 and 5&lt;/strong&gt;. If the search results happen to be 5 or more, and a text reply is added, the total will exceed 5, triggering a LINE API 400 error.&lt;/p&gt;

&lt;h3&gt;
  
  
  💡 Solution: Disambiguation List
&lt;/h3&gt;

&lt;p&gt;We modified the search reply judgment in &lt;a&gt;app/line_handlers.py&lt;/a&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  When search results are &lt;strong&gt;1 to 4 items&lt;/strong&gt;: Directly display Carousel detailed name cards (conforming to LINE's 5-item limit).&lt;/li&gt;
&lt;li&gt;  When search results are &lt;strong&gt;5 or more items&lt;/strong&gt;: Do not display large cards; instead, return a &lt;strong&gt;'Name Card Search List' Flex Message Bubble&lt;/strong&gt;. The list itemizes names and companies, with a 'View ❯' Postback button on the right. Clicking it loads and displays that specific name card.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This design not only maintains a clean layout but also completely avoids the pitfall of exceeding the message limit!&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;        &lt;span class="k"&gt;elif&lt;/span&gt; &lt;span class="n"&gt;found_card_ids&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;found_card_ids&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;=&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="c1"&gt;# If the quantity is less than or equal to 4, directly display Carousel detailed name cards
&lt;/span&gt;                &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;card_id&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;found_card_ids&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                    &lt;span class="n"&gt;card_data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;firebase_utils&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_card_by_id&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;card_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
                    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;card_data&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                        &lt;span class="n"&gt;reply_msgs&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                            &lt;span class="n"&gt;flex_messages&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_namecard_flex_msg&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;card_data&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;card_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
                        &lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="c1"&gt;# If the quantity is greater than 4, display as a list Flex Message for disambiguation
&lt;/span&gt;                &lt;span class="n"&gt;cards_list&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
                &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;card_id&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;found_card_ids&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                    &lt;span class="n"&gt;card_data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;firebase_utils&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_card_by_id&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;card_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
                    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;card_data&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                        &lt;span class="n"&gt;cards_list&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
                            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;card_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;card_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;card_data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;N/A&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
                            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;company&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;card_data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;company&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;N/A&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
                            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;title&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;card_data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;title&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;N/A&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
                        &lt;span class="p"&gt;})&lt;/span&gt;
                &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;cards_list&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                    &lt;span class="n"&gt;list_msg&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;flex_messages&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_namecard_list_flex_msg&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                        &lt;span class="n"&gt;cards&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;cards_list&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                        &lt;span class="n"&gt;title_text&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;🔍 Found multiple matching name cards&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
                    &lt;span class="p"&gt;)&lt;/span&gt;
                    &lt;span class="n"&gt;reply_msgs&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;list_msg&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h1&gt;
  
  
  Optimization Three: Contact Modification Safety Lock — Two-Stage Confirmation Mechanism
&lt;/h1&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fe8rumozkznwk41fslmft.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fe8rumozkznwk41fslmft.png" alt="image-20260607133518906" width="800" height="1061"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Under the ADK agent architecture, users can update data through natural conversation (e.g., "Add 'Meeting next Monday' to Evan's memo"). However, if the LLM misinterprets the instruction, Firebase data can be directly overwritten.&lt;/p&gt;

&lt;p&gt;To address this, we implemented a &lt;strong&gt;Two-Stage Confirmation mechanism&lt;/strong&gt;:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; &lt;strong&gt;Delayed Write&lt;/strong&gt;: When the ADK Tool (&lt;code&gt;update_namecard_field&lt;/code&gt; and &lt;code&gt;update_namecard_memo&lt;/code&gt;) is invoked by the model, the system does not directly rewrite Firebase. Instead, it temporarily stores the content to be modified in &lt;code&gt;user_states&lt;/code&gt; in memory and returns &lt;code&gt;True&lt;/code&gt; to allow the Agent to continue generating dialogue.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Display Confirmation Card&lt;/strong&gt;: After the conversation ends, if the main program detects a pending state, it generates a Flex Message card containing 'Confirm Modification' and 'Cancel' buttons.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Write After Confirmation&lt;/strong&gt;: Only after the user clicks 'Confirm Modification' (sending a Postback Event &lt;code&gt;action=confirm_update&lt;/code&gt;) does the system truly write the data to Firebase.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This not only perfectly prevents AI from accidentally triggering tools but also gives users absolute control when modifying data!&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;    &lt;span class="c1"&gt;# Handle confirmation of modification in handle_postback_event
&lt;/span&gt;    &lt;span class="k"&gt;elif&lt;/span&gt; &lt;span class="n"&gt;action&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;confirm_update&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;state&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;user_states&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{})&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;action&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;pending_update&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;update_type&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;update_type&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="n"&gt;card_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;card_id&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="c1"&gt;# Read data from temporary storage based on update_type, and truly write to Firebase...
&lt;/span&gt;            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;success&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="c1"&gt;# Reply with successful modification, and automatically display the updated Flex Card for user verification
&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h1&gt;
  
  
  Ops Pitfall Record: Manual Deployment - The Mysterious Disappearance of Environment Variables
&lt;/h1&gt;

&lt;p&gt;In addition to code refactoring, we also encountered a significant operational pitfall during deployment.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Pitfall
&lt;/h3&gt;

&lt;p&gt;When we attempted to upload a local folder to Cloud Run using the MCP deployment tool locally, because the command did not include environment variable declaration parameters, the previously working LINE Token and Firebase URL on Cloud Run were all cleared and overwritten. Upon restart, the Container crashed directly with an error:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Specify ChannelSecret as environment variable.

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The online service instantly became paralyzed.&lt;/p&gt;

&lt;h3&gt;
  
  
  Recovery Process
&lt;/h3&gt;

&lt;p&gt;Fortunately, Cloud Run fully retains the configuration settings of older versions. We can use the &lt;code&gt;gcloud&lt;/code&gt; command to view previous Revisions and restore the lost variables:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; &lt;strong&gt;Retrieve the detailed configuration of the last successfully running Revision&lt;/strong&gt;:
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;gcloud run revisions describe linebot-namecard-python-00096-d89 &lt;span class="nt"&gt;--project&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;line-vertex &lt;span class="nt"&gt;--region&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;asia-east1

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This will output the environment variable values bound to that version.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; &lt;strong&gt;Re-inject environment variables into the service&lt;/strong&gt;:
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;gcloud run services update linebot-namecard-python &lt;span class="nt"&gt;--project&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;line-vertex &lt;span class="nt"&gt;--region&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;asia-east1 &lt;span class="nt"&gt;--set-env-vars&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"ChannelAccessToken=...,ChannelSecret=..."&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;By restoring the variables, we seamlessly recovered the service within minutes. This also reminds us: when manually deploying to Cloud Run, always pay extra attention to the inheritance or declaration of environment variables to avoid accidentally clearing the official cloud configuration.&lt;/p&gt;




&lt;h1&gt;
  
  
  Summary and Benefits
&lt;/h1&gt;

&lt;p&gt;This optimization brought excellent production-level transformations to our LINE Name Card Bot:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; &lt;strong&gt;100% Format Security&lt;/strong&gt;: Through API native Schema enforcement, the name card recognition format error rate dropped to 0%.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Explosion-Proof Reply Protection&lt;/strong&gt;: Multiple search results are automatically converted into a "Disambiguation List", perfectly complying with LINE's message limit.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Secure Contact Changes&lt;/strong&gt;: The two-stage confirmation mechanism confines AI's write access to a confirmation sandbox, protecting important user data.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Robust Configuration Disaster Recovery&lt;/strong&gt;: Utilizing gcloud historical Revision restoration technology ensures the service can quickly recover within a short period.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The complete and linter-optimized code has been pushed to &lt;a href="https://github.com/kkdai/linebot-namecard-python" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt;. We hope this practical experience helps everyone avoid detours when building production-grade AI Agents! See you next time!&lt;/p&gt;

</description>
      <category>api</category>
      <category>gemini</category>
      <category>google</category>
      <category>python</category>
    </item>
    <item>
      <title>[Gemini][Agent] Google Managed Agents API</title>
      <dc:creator>Evan Lin</dc:creator>
      <pubDate>Wed, 03 Jun 2026 01:01:36 +0000</pubDate>
      <link>https://dev.to/gde/geminiagent-google-managed-agents-api-4e43</link>
      <guid>https://dev.to/gde/geminiagent-google-managed-agents-api-4e43</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fe9nwsti79ib9ae970q7q.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fe9nwsti79ib9ae970q7q.png" alt="image-20260602220526732" width="800" height="542"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;(Image Source: &lt;a href="https://docs.cloud.google.com/gemini-enterprise-agent-platform/build/managed-agents" rel="noopener noreferrer"&gt;Google Cloud Docs - Managed Agents on Agent Platform&lt;/a&gt;)&lt;/p&gt;

&lt;h1&gt;
  
  
  Preamble: The era of hand-rolling your own agent loop is coming to an end
&lt;/h1&gt;

&lt;p&gt;In the past, if you wanted to build an AI agent that could truly " &lt;strong&gt;do things&lt;/strong&gt; ", the component list that came to mind probably looked something like this:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;An LLM main loop (ReAct? Write your own state machine?)&lt;/li&gt;
&lt;li&gt;A sandbox to run LLM-generated code (Docker? Firecracker? E2B?)&lt;/li&gt;
&lt;li&gt;A filesystem to store intermediate files produced by the agent (S3? Local? Temporary or persistent?)&lt;/li&gt;
&lt;li&gt;A search API (Connect to Google Custom Search yourself? SerpAPI?)&lt;/li&gt;
&lt;li&gt;A page fetcher (playwright? readability-lxml?)&lt;/li&gt;
&lt;li&gt;A tool router to connect all of the above&lt;/li&gt;
&lt;li&gt;And only then, how to let the user continue the session&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;And once the session broke, the &lt;code&gt;report.md&lt;/code&gt;, &lt;code&gt;sources.json&lt;/code&gt; that the agent was halfway through writing, and the venv that was halfway running, would all be gone. Nobody wants to do "I'll open a Docker for you, mount a volume, and remember to delete it in 7 days" again.&lt;/p&gt;

&lt;p&gt;These past few days, Google has turned this pipeline into " &lt;strong&gt;calling a managed API&lt;/strong&gt; " in Cloud Docs — &lt;a href="https://docs.cloud.google.com/gemini-enterprise-agent-platform/build/managed-agents" rel="noopener noreferrer"&gt;Gemini Enterprise Agent Platform&lt;/a&gt; launched the &lt;strong&gt;Managed Agents API&lt;/strong&gt; (internal codename Antigravity), which manages the sandbox, filesystem, and toolset entirely. Just pass an environment ID, and the agent's intermediate files from last time will still be waiting for you.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgx0nba37wpvwlppthq9e.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgx0nba37wpvwlppthq9e.png" alt="image-20260602220556522" width="645" height="325"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This article will do two things:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; Break down the core capabilities clearly, including what the underlying &lt;code&gt;antigravity-preview-05-2026&lt;/code&gt; model is doing.&lt;/li&gt;
&lt;li&gt; Use an &lt;strong&gt;open-source&lt;/strong&gt; LINE Research Planner Bot (&lt;a href="https://github.com/kkdai/line-research-bot" rel="noopener noreferrer"&gt;&lt;code&gt;kkdai/line-research-bot&lt;/code&gt;&lt;/a&gt;) as a live demonstration to see how new features are combined in actual production code — and share the &lt;strong&gt;five&lt;/strong&gt; typical Pre-GA pitfalls I encountered during debugging to help you avoid them.&lt;/li&gt;
&lt;/ol&gt;




&lt;h2&gt;
  
  
  Three Key Core Capabilities
&lt;/h2&gt;

&lt;p&gt;According to the official documentation, the core of Managed Agents revolves around three things:&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Persistent Sandbox + Filesystem
&lt;/h3&gt;

&lt;p&gt;In the past, code interpreter-like functions would restart a container with each call, losing all previously &lt;code&gt;pip install&lt;/code&gt;ed packages, written files, and half-open Python interpreters.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“Each agent operates within a sandboxed environment … capable of reasoning, planning, executing code, web searching, and file operations.”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Now, if you make a second interaction &lt;strong&gt;with the same &lt;code&gt;environment_id&lt;/code&gt;&lt;/strong&gt;, the agent will see the &lt;code&gt;/workspace/&lt;/code&gt; from the previous session:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;/workspace/sources.json&lt;/code&gt; is still there&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;/workspace/report.md&lt;/code&gt; was half-written, this time it continues to modify it&lt;/li&gt;
&lt;li&gt;Packages like &lt;code&gt;markdown&lt;/code&gt; installed with &lt;code&gt;pip install&lt;/code&gt; last time don't need to be reinstalled&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For us product builders, this means:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;No need to maintain your own sandbox infrastructure&lt;/strong&gt; (Firecracker, microVM, expiration cleanup).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Agents can truly "complete a big task in multiple turns"&lt;/strong&gt;, instead of starting over each turn.&lt;/li&gt;
&lt;li&gt;A TTL of &lt;strong&gt;7 days&lt;/strong&gt;, during which any interaction automatically refreshes, meaning it stays alive as long as the user uses it once a week.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;My LINE Bot relies on this for " &lt;strong&gt;progressive deepening&lt;/strong&gt; ": the user first says "research X" → the agent writes sources and a report in the sandbox; a few minutes later, the user says "Chapter 2, go deeper" → the agent reads back the original file, modifies Chapter 2, and rewrites it, all within the &lt;strong&gt;same sandbox and the same markdown file&lt;/strong&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Built-in Tools
&lt;/h3&gt;

&lt;p&gt;When building an agent, you just list the tools you want, &lt;strong&gt;without having to connect to APIs yourself&lt;/strong&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;code_execution&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt; &lt;span class="c1"&gt;# Python / bash / persistent venv
&lt;/span&gt;    &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;filesystem&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt; &lt;span class="c1"&gt;# Read/write /workspace
&lt;/span&gt;    &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;google_search&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt; &lt;span class="c1"&gt;# Real Google Search, not Custom Search
&lt;/span&gt;    &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;url_context&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt; &lt;span class="c1"&gt;# Feed URL to automatically fetch content + extract
&lt;/span&gt;    &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;mcp_server&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="c1"&gt;# Any plug-in MCP server
&lt;/span&gt;     &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;grep-search&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
     &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;url&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://mcp.grep.app&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
&lt;span class="p"&gt;]&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Several key observations:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;&lt;code&gt;google_search&lt;/code&gt; is real Google&lt;/strong&gt;, not the basic version that requires you to customize a search engine ID + API key. The return format includes search suggestions and can be used for grounding.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;&lt;code&gt;url_context&lt;/code&gt; is equivalent to free readability + content extraction&lt;/strong&gt;, feed a URL and get the main text. No need to maintain another playwright fleet.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Native MCP support&lt;/strong&gt;: You can directly integrate any &lt;a href="https://modelcontextprotocol.io/" rel="noopener noreferrer"&gt;Model Context Protocol&lt;/a&gt; server. The entire ecosystem is open.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  3. Multi-turn Session Chaining
&lt;/h3&gt;

&lt;p&gt;Each interaction returns an &lt;code&gt;id&lt;/code&gt;. When calling the next turn, pass it as &lt;code&gt;previous_interaction_id&lt;/code&gt;, and the agent will see the &lt;strong&gt;entire conversation history + sandbox state&lt;/strong&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;r1&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;interactions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;research-planner&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="nb"&gt;input&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;PLAN ...&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;environment&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;remote&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt; &lt;span class="c1"&gt;# Open a new sandbox
&lt;/span&gt;    &lt;span class="n"&gt;background&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="c1"&gt;# … poll until completed …
&lt;/span&gt;
&lt;span class="n"&gt;r2&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;interactions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;research-planner&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="nb"&gt;input&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;SEARCH_COMPARE&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="c1"&gt;# No need to restate context
&lt;/span&gt;    &lt;span class="n"&gt;environment&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;r1&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;environment_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="c1"&gt;# Reuse sandbox
&lt;/span&gt;    &lt;span class="n"&gt;previous_interaction_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;r1&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="c1"&gt;# Connect history
&lt;/span&gt;    &lt;span class="n"&gt;background&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This design turns your backend into " &lt;strong&gt;only responsible for deciding what prompt to send each turn&lt;/strong&gt; ". Session state, conversation history, and file system are all server-side managed.&lt;/p&gt;




&lt;h2&gt;
  
  
  Two APIs: Agents for Control Plane, Interactions for Data Plane
&lt;/h2&gt;

&lt;p&gt;The documentation divides into two APIs, with clear responsibilities:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;API&lt;/th&gt;
&lt;th&gt;Path&lt;/th&gt;
&lt;th&gt;What it does&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Agents API&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;/projects/.../agents&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Create, update, delete agent settings (base_agent, tools, system_instruction)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Interactions API&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;/projects/.../interactions:create&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Interact with deployed agents&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Simply put: &lt;strong&gt;Agents = Configuration&lt;/strong&gt;, &lt;strong&gt;Interactions = Execution&lt;/strong&gt;. Creating an agent is a one-time task; running interactions is done every time a user message comes in. My LINE Bot only used the Agents API once during deployment to create the agent, and after that, Cloud Run only calls the Interactions API.&lt;/p&gt;

&lt;p&gt;The underlying base model is hardcoded as &lt;code&gt;antigravity-preview-05-2026&lt;/code&gt;, which is an agent-optimized version of the Gemini series (only this one is available during the Pre-GA preview period).&lt;/p&gt;




&lt;h2&gt;
  
  
  What Developers Truly Care About: Cost and Integration Cost
&lt;/h2&gt;

&lt;p&gt;This API is still in Pre-GA, and the official documentation emphasizes:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“Antigravity is offered as Pre-General Availability software, which means it is not subject to any SLA or deprecation policy. Antigravity is not intended for production use or for use with sensitive data.”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;In plain language:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Cannot be used for production sensitive data&lt;/strong&gt; (for compliance scenarios, please wait for GA).&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;No SLA&lt;/strong&gt;, the API shape might change someday.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Might be discontinued someday&lt;/strong&gt;, don't bet your company's life on it.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Billing is at standard Vertex AI rates&lt;/strong&gt;, with no additional sandbox runtime fees — this is super friendly for demos / internal tools / hackathons.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It's a very suitable entry point for personal side projects and POCs — you &lt;strong&gt;don't need to spend a month setting up sandbox infra yourself&lt;/strong&gt; to build an agent that can get things done. But don't throw enterprise customer data into it.&lt;/p&gt;




&lt;h2&gt;
  
  
  Standard Workflow: 4 SDK Calls to Complete an Agent Interaction
&lt;/h2&gt;

&lt;p&gt;The minimum viable flow after organizing the official colab (&lt;a href="https://github.com/GoogleCloudPlatform/generative-ai/blob/main/agents/managed-agents/intro_managed_agents_python.ipynb" rel="noopener noreferrer"&gt;&lt;code&gt;intro_managed_agents_python.ipynb&lt;/code&gt;&lt;/a&gt;):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;google&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;genai&lt;/span&gt;

&lt;span class="c1"&gt;# 1. Enterprise mode client (this flag is crucial, will explain in pitfalls)
&lt;/span&gt;&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;genai&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Client&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;enterprise&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;project&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;my-project&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;location&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;global&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# 2. Create agent (one-time, reusable)
&lt;/span&gt;&lt;span class="n"&gt;agent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;agents&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="nb"&gt;id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;research-planner&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;base_agent&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;antigravity-preview-05-2026&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;description&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Multi-stage research agent&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;system_instruction&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;You are a research planner. The first line is the stage label PLAN/SEARCH/WRITE …&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;code_execution&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
        &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;filesystem&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
        &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;google_search&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
        &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;url_context&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;],&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# 3. First interaction, open a new sandbox
&lt;/span&gt;&lt;span class="n"&gt;r1&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;interactions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;research-planner&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="nb"&gt;input&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;PLAN&lt;/span&gt;&lt;span class="se"&gt;\n\n&lt;/span&gt;&lt;span class="s"&gt;topic: Selection of SOTA open-source vector databases&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;environment&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;remote&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="n"&gt;background&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="c1"&gt;# ⚠️ Must be True, will explain later
&lt;/span&gt;    &lt;span class="n"&gt;store&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# 4. Continue with the same environment
&lt;/span&gt;&lt;span class="n"&gt;r2&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;interactions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;research-planner&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="nb"&gt;input&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;SEARCH_COMPARE&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;environment&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;r1&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;environment_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;previous_interaction_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;r1&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="c1"&gt;# Connect history
&lt;/span&gt;    &lt;span class="n"&gt;background&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;store&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# poll for results
&lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;time&lt;/span&gt;
&lt;span class="k"&gt;while&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;polled&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;interactions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;r2&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;polled&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;status&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;completed&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;polled&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;output_text&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;break&lt;/span&gt;
    &lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sleep&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;No exaggeration, a &lt;strong&gt;multi-stage agent from scratch is less than 30 lines of code&lt;/strong&gt;. But the devil is in &lt;code&gt;background=True&lt;/code&gt; and that polling loop, which will be discussed in detail in the pitfalls section.&lt;/p&gt;




&lt;h2&gt;
  
  
  Demo Case: LINE Research Planner Bot
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxw3mbf8icvpldu0h8h18.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxw3mbf8icvpldu0h8h18.png" alt="image-20260602221558435" width="724" height="1456"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkndkeuaw3vqrkfv34ccp.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkndkeuaw3vqrkfv34ccp.png" alt="image-20260602221619051" width="800" height="823"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;SDK examples alone are too abstract, so I built it into a working LINE Bot, open-sourced at &lt;a href="https://github.com/kkdai/line-research-bot" rel="noopener noreferrer"&gt;&lt;code&gt;kkdai/line-research-bot&lt;/code&gt;&lt;/a&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  The user sends a &lt;strong&gt;research topic&lt;/strong&gt; in the LINE chat box (e.g., "Research on the selection of SOTA open-source vector databases").&lt;/li&gt;
&lt;li&gt;  The Bot plans 4-8 search queries, runs google_search + url_context, compares sources, writes a report in Traditional Chinese, and publishes it as a public HTML link.&lt;/li&gt;
&lt;li&gt;  The user then sends " &lt;strong&gt;Chapter 2, go deeper, add Japanese sources&lt;/strong&gt; " → The Bot modifies the original file in the &lt;strong&gt;same sandbox&lt;/strong&gt;, re-renders it, and keeps a snapshot of the old version.&lt;/li&gt;
&lt;li&gt;  Deployment targets: GCP Cloud Run + Firestore + GCS + Cloud Tasks.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The architecture is very straightforward:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Component&lt;/th&gt;
&lt;th&gt;Role&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;LINE Webhook&lt;/td&gt;
&lt;td&gt;FastAPI receives message events&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Firestore&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;line_bot_users / line_bot_reports&lt;/code&gt; persistence&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cloud Tasks&lt;/td&gt;
&lt;td&gt;Pushes long-running tasks from webhook to background worker (avoids LINE reply token 60-second limit)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Managed Agent&lt;/td&gt;
&lt;td&gt;Planning + Search comparison + Writing ( &lt;strong&gt;three-stage&lt;/strong&gt; chain)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cloud Run worker&lt;/td&gt;
&lt;td&gt;Renders markdown → HTML → Uploads to GCS ( &lt;strong&gt;Why not in the sandbox? Pitfall 2 will explain&lt;/strong&gt; )&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;GCS Bucket&lt;/td&gt;
&lt;td&gt;Public HTML hosting&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Comparing with the three core capabilities mentioned earlier:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Persistent Sandbox&lt;/strong&gt;: The three stages PLAN → SEARCH_COMPARE → WRITE_REPORT are chained within the same &lt;code&gt;environment_id&lt;/code&gt;, and sources.json written once can be read by all three stages.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Built-in Tools&lt;/strong&gt;: The SEARCH_COMPARE stage uses &lt;code&gt;google_search&lt;/code&gt; + &lt;code&gt;url_context&lt;/code&gt;. The agent decides what to search, which pages to read, and how to summarize.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Multi-turn Session&lt;/strong&gt;: "Progressive deepening" directly uses &lt;code&gt;previous_interaction_id&lt;/code&gt; to continue from the last WRITE_REPORT, and the agent naturally understands "just modify that report".&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The entire repo is about 2,500 lines of Python (including tests), completing a " &lt;strong&gt;runnable, evolvable, traceable&lt;/strong&gt; research agent."&lt;/p&gt;




&lt;h2&gt;
  
  
  Deployment Practice: Commit → Go Live Automatically
&lt;/h2&gt;

&lt;p&gt;It's not enough for the open-source example to just run; this time, the entire GCP infrastructure and CI/CD are integrated.&lt;/p&gt;

&lt;p&gt;I only provided the project ID + LINE secret, and it handled the rest end-to-end:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Enable 6 APIs&lt;/span&gt;
gcloud services &lt;span class="nb"&gt;enable &lt;/span&gt;aiplatform.googleapis.com run.googleapis.com &lt;span class="se"&gt;\&lt;/span&gt;
    cloudtasks.googleapis.com firestore.googleapis.com &lt;span class="se"&gt;\&lt;/span&gt;
    storage.googleapis.com secretmanager.googleapis.com

&lt;span class="c"&gt;# Create service account + assign 8 roles&lt;/span&gt;
gcloud iam service-accounts create line-bot-sa
&lt;span class="k"&gt;for &lt;/span&gt;role &lt;span class="k"&gt;in &lt;/span&gt;aiplatform.user datastore.user cloudtasks.enqueuer &lt;span class="se"&gt;\&lt;/span&gt;
            storage.objectAdmin secretmanager.secretAccessor &lt;span class="se"&gt;\&lt;/span&gt;
            iam.serviceAccountTokenCreator run.invoker logging.logWriter&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;do
  &lt;/span&gt;gcloud projects add-iam-policy-binding line-vertex &lt;span class="se"&gt;\&lt;/span&gt;
      &lt;span class="nt"&gt;--member&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"serviceAccount:line-bot-sa@line-vertex.iam.gserviceaccount.com"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
      &lt;span class="nt"&gt;--role&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"roles/&lt;/span&gt;&lt;span class="nv"&gt;$role&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="nt"&gt;--condition&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;None
&lt;span class="k"&gt;done&lt;/span&gt;

&lt;span class="c"&gt;# Secrets via stdin, no shell history&lt;/span&gt;
&lt;span class="nb"&gt;printf&lt;/span&gt; &lt;span class="s1"&gt;'%s'&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;LINE_TOKEN&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; | gcloud secrets create LINE_CHANNEL_ACCESS_TOKEN &lt;span class="nt"&gt;--data-file&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;-

&lt;span class="c"&gt;# Create Agent (one-time)&lt;/span&gt;
curl &lt;span class="nt"&gt;-sS&lt;/span&gt; &lt;span class="nt"&gt;-X&lt;/span&gt; POST &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Authorization: Bearer &lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;gcloud auth print-access-token&lt;span class="si"&gt;)&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"Content-Type: application/json"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;-d&lt;/span&gt; @agent-body.json &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="s2"&gt;"https://aiplatform.googleapis.com/v1beta1/projects/line-vertex/locations/global/agents"&lt;/span&gt;

&lt;span class="c"&gt;# Deploy Cloud Run&lt;/span&gt;
gcloud run deploy line-research-bot &lt;span class="nt"&gt;--source&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nb"&gt;.&lt;/span&gt; &lt;span class="nt"&gt;--timeout&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;3600 &lt;span class="nt"&gt;--memory&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;2Gi ...

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The entire process took about 40 minutes — but &lt;strong&gt;30 of those minutes were spent chasing the five pitfalls described below&lt;/strong&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  Pitfall Log: Five Pre-GA-Specific Issues
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Pitfall One: Synchronous Calls → Mysterious &lt;code&gt;RESOURCE_PROJECT_INVALID&lt;/code&gt;
&lt;/h3&gt;

&lt;p&gt;The first time I followed the doc and directly POSTed &lt;code&gt;interactions:create&lt;/code&gt; via REST, it returned this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"error"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"code"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;400&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"message"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Invalid resource field value in the request."&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"status"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"INVALID_ARGUMENT"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"details"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"reason"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"RESOURCE_PROJECT_INVALID"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"service"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"aiplatform.googleapis.com"&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}]&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;

&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I spent a full hour and a half wondering:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  Project not allowlisted? (Couldn't find where to apply)&lt;/li&gt;
&lt;li&gt;  Use project number or ID? (Tried both, both wrong)&lt;/li&gt;
&lt;li&gt;  Change region? (All wrong)&lt;/li&gt;
&lt;li&gt;  Change agent? (All wrong)&lt;/li&gt;
&lt;li&gt;  Even &lt;code&gt;gemini-2.0-flash:generateContent&lt;/code&gt; returned &lt;code&gt;RESOURCE_PROJECT_INVALID&lt;/code&gt;!&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Until I carefully read the official colab and saw a line:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;genai&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Client&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;enterprise&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;project&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;...,&lt;/span&gt; &lt;span class="n"&gt;location&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;...)&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It differed from the &lt;code&gt;genai.Client()&lt;/code&gt; we used by one &lt;code&gt;enterprise=True&lt;/code&gt;. Then I ran the colab code and saw:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;stream&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;interactions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="p"&gt;...,&lt;/span&gt;
    &lt;span class="n"&gt;stream&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;background&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;store&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;&lt;code&gt;background=True&lt;/code&gt;&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;I brought this back to REST: wrote SDK + background=True, and it immediately worked:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"error"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"code"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;500&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"message"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Chiliagon path must set background to true."&lt;/span&gt;&lt;span class="p"&gt;}}&lt;/span&gt;&lt;span class="w"&gt;

&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If &lt;code&gt;background&lt;/code&gt; was not included → 500 with a &lt;code&gt;Chiliagon&lt;/code&gt; message (this is an internal Google codename, not in the doc). If &lt;code&gt;enterprise=True&lt;/code&gt; was not included → routed to an old path not for Pre-GA → then returned &lt;code&gt;RESOURCE_PROJECT_INVALID&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Takeaway&lt;/strong&gt;: Pre-GA Managed Agents API currently &lt;strong&gt;only supports asynchronous calls&lt;/strong&gt;. Actual usage requires:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; Using the &lt;code&gt;google-genai&lt;/code&gt; SDK with &lt;code&gt;enterprise=True&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt; &lt;code&gt;interactions.create(background=True, store=True)&lt;/code&gt; to get an interaction ID&lt;/li&gt;
&lt;li&gt; &lt;code&gt;interactions.get(id)&lt;/code&gt; polling until &lt;code&gt;status == "completed"&lt;/code&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Don't waste an hour stubbornly trying raw REST like I did.&lt;/p&gt;

&lt;h3&gt;
  
  
  Pitfall Two: &lt;code&gt;gsutil&lt;/code&gt; in the Sandbox is a &lt;strong&gt;Mock&lt;/strong&gt; (This one is the most insidious)
&lt;/h3&gt;

&lt;p&gt;My LINE Bot was originally designed for the agent to upload HTML to GCS itself:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;gsutil &lt;span class="nt"&gt;-h&lt;/span&gt; &lt;span class="s2"&gt;"Cache-Control:no-cache, max-age=0"&lt;/span&gt; &lt;span class="nb"&gt;cp&lt;/span&gt; /workspace/report.html &lt;span class="se"&gt;\&lt;/span&gt;
    gs://research-line/&lt;span class="o"&gt;{&lt;/span&gt;report_id&lt;span class="o"&gt;}&lt;/span&gt;/index.html
curl &lt;span class="nt"&gt;-sI&lt;/span&gt; https://storage.googleapis.com/research-line/&lt;span class="o"&gt;{&lt;/span&gt;report_id&lt;span class="o"&gt;}&lt;/span&gt;/index.html

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The agent finished happily and returned:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"report_id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"d4302f31..."&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"summary_500"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"This report focuses on mainstream open-source vector databases in 2026…"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"top_citations"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="err"&gt;...&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"new_version"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;

&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;LINE received the Flex card, clicked the button → &lt;strong&gt;404 NoSuchKey&lt;/strong&gt;. GCS was empty.&lt;/p&gt;

&lt;p&gt;I ran a diagnostic interaction to query the sandbox:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;resp&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;interactions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;research-planner&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="nb"&gt;input&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Run these and report verbatim:&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;1. echo &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;X&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt; &amp;gt; /tmp/diag.html&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;2. gcloud auth list 2&amp;gt;&amp;amp;1&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;3. gsutil cp /tmp/diag.html gs://research-line/probe.html 2&amp;gt;&amp;amp;1&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;4. curl -sI https://storage.googleapis.com/research-line/probe.html&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;5. gsutil ls gs://research-line/ 2&amp;gt;&amp;amp;1&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Reply ONLY with: {&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s"&gt;step1&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s"&gt;:&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s"&gt;...&lt;/span&gt;&lt;span class="se"&gt;\"&lt;/span&gt;&lt;span class="s"&gt;, ...}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;environment&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;ENV_ID&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;background&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;store&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The returned JSON made me jump out of my chair:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"step2"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"No credentialed accounts.&lt;/span&gt;&lt;span class="se"&gt;\n\n&lt;/span&gt;&lt;span class="s2"&gt;To login, run:&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s2"&gt; $ gcloud auth login..."&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"step3"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Mock gsutil: simulated copy to cp /tmp/diag.html gs://research-line/..."&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"step4"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"HTTP/2 200 OK&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"step5"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Mock gsutil: simulated copy to ls gs://research-line/..."&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;

&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;The sandbox has a fake command called "Mock gsutil"&lt;/strong&gt;, which returns "simulated copy" for any parameters and always pretends HTTP 200. &lt;code&gt;gcloud auth list&lt;/code&gt; showed &lt;strong&gt;no credentials&lt;/strong&gt;, so even if there was a real gsutil, it wouldn't have permission to write.&lt;/p&gt;

&lt;p&gt;At that moment, I finally understood — the Pre-GA sandbox &lt;strong&gt;does not provide any GCP authentication&lt;/strong&gt;. &lt;code&gt;gsutil&lt;/code&gt; is a placeholder behavior, and the agent doesn't know the upload failed (because curl also returned 200), so it happily reported success.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Solution&lt;/strong&gt;: Completely refactor the architecture. The agent no longer attempts to upload; instead, the &lt;strong&gt;agent returns the complete markdown via the &lt;code&gt;report_md&lt;/code&gt; field&lt;/strong&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# New system_instruction (excerpt)
&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
After writing /workspace/report.md, use code_execution to read it back
and return JSON:
{
  &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;report_md&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;: &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;&amp;lt;full contents of /workspace/report.md&amp;gt;&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;,
  &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;summary_500&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;: &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;...&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;,
&lt;/span&gt;&lt;span class="gp"&gt;  ...&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="n"&gt;DO&lt;/span&gt; &lt;span class="n"&gt;NOT&lt;/span&gt; &lt;span class="n"&gt;run&lt;/span&gt; &lt;span class="n"&gt;gsutil&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt; &lt;span class="n"&gt;DO&lt;/span&gt; &lt;span class="n"&gt;NOT&lt;/span&gt; &lt;span class="n"&gt;run&lt;/span&gt; &lt;span class="n"&gt;curl&lt;/span&gt; &lt;span class="n"&gt;on&lt;/span&gt; &lt;span class="n"&gt;storage&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;googleapis&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;com&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;
&lt;span class="n"&gt;The&lt;/span&gt; &lt;span class="n"&gt;host&lt;/span&gt; &lt;span class="n"&gt;service&lt;/span&gt; &lt;span class="n"&gt;handles&lt;/span&gt; &lt;span class="n"&gt;publishing&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;
&lt;span class="sh"&gt;"""&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then the Cloud Run worker, using a service account with real IAM, takes over:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# app/publisher.py
&lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;markdown&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;google.cloud&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;storage&lt;/span&gt;

&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;GcsPublisher&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;__init__&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;bucket_name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_bucket&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;storage&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Client&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;bucket&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;bucket_name&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;publish&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;report_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;topic&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;report_md&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;version&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;snapshot_previous&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;snapshot_previous&lt;/span&gt; &lt;span class="ow"&gt;is&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;_snapshot&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;report_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;snapshot_previous&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;body&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;markdown&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;markdown&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;report_md&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;extensions&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;fenced_code&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tables&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;footnotes&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
        &lt;span class="n"&gt;html&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;_wrap_with_css&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;topic&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;body&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;version&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;blob&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_bucket&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;blob&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;report_id&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;/index.html&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;blob&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;cache_control&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;no-cache, max-age=0&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="n"&gt;blob&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;upload_from_string&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;html&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;content_type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text/html; charset=utf-8&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://storage.googleapis.com/&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_bucket&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;/&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;report_id&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;/index.html&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Clear division of responsibilities: &lt;strong&gt;the agent is responsible for thinking + writing; Cloud Run is responsible for infra&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Takeaway&lt;/strong&gt;: Do not assume the Pre-GA sandbox can access your GCP resources. For anything that needs to write to external systems, &lt;strong&gt;let the host service do it with a real SA&lt;/strong&gt;, and the agent only returns the payload. By the way, from the forum, it seems that after GA, the sandbox might provide ambient credentials, but not in Pre-GA.&lt;/p&gt;

&lt;h3&gt;
  
  
  Pitfall Three: Cloud Run's &lt;code&gt;/healthz&lt;/code&gt; is Intercepted by Google Frontend
&lt;/h3&gt;

&lt;p&gt;I wrote a &lt;code&gt;/healthz&lt;/code&gt; for Cloud Run health checks:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="nd"&gt;@app.get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;/healthz&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;healthz&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;status&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ok&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;After deployment, I called:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl https://line-research-bot-xxx.run.app/healthz

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It returned &lt;strong&gt;this&lt;/strong&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight html"&gt;&lt;code&gt;&lt;span class="cp"&gt;&amp;lt;!DOCTYPE html&amp;gt;&lt;/span&gt;
&lt;span class="nt"&gt;&amp;lt;title&amp;gt;&lt;/span&gt;Error 404 (Not Found)!!1&lt;span class="nt"&gt;&amp;lt;/title&amp;gt;&lt;/span&gt;
&lt;span class="nt"&gt;&amp;lt;p&amp;gt;&amp;lt;b&amp;gt;&lt;/span&gt;404.&lt;span class="nt"&gt;&amp;lt;/b&amp;gt;&lt;/span&gt; The requested URL /healthz was not found on this server.

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It was &lt;strong&gt;Google Frontend's 404 page&lt;/strong&gt;, not FastAPI's. But &lt;code&gt;/docs&lt;/code&gt;, &lt;code&gt;/webhook&lt;/code&gt;, &lt;code&gt;/openapi.json&lt;/code&gt; all worked. OpenAPI also listed the &lt;code&gt;GET /healthz&lt;/code&gt; route.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;/healthz&lt;/code&gt; is a &lt;strong&gt;special reserved path&lt;/strong&gt; in Cloud Run; Google Frontend intercepts it before the path even reaches the container.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Solution&lt;/strong&gt;: Rename it to &lt;code&gt;/readyz&lt;/code&gt;. Solved in one second.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="nd"&gt;@app.get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;/readyz&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="c1"&gt;# /healthz was intercepted, renamed
&lt;/span&gt;&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;readyz&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;status&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ok&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Pitfall Four: Service Account Needs to &lt;code&gt;actAs&lt;/code&gt; &lt;strong&gt;Itself&lt;/strong&gt; for Cloud Tasks OIDC to Sign
&lt;/h3&gt;

&lt;p&gt;When pushing tasks from the webhook to Cloud Tasks, the task kept dispatching 0 times + dispatchDeadline expired. Cloud Run logs showed:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;PERMISSION_DENIED: The principal lacks IAM permission "iam.serviceAccounts.actAs"
for the resource "line-bot-sa@line-vertex.iam.gserviceaccount.com"

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I thought giving the SA &lt;code&gt;iam.serviceAccountTokenCreator&lt;/code&gt; was enough, right? &lt;strong&gt;Not enough&lt;/strong&gt;. Cloud Tasks needs to sign an OIDC token for the callback, which requires the SA to have &lt;code&gt;actAs&lt;/code&gt; permission for " &lt;strong&gt;itself&lt;/strong&gt; ":&lt;/p&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;
shell
gcloud iam service-accounts add-iam-policy-binding \
    line
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

</description>
      <category>agents</category>
      <category>api</category>
      <category>gemini</category>
      <category>google</category>
    </item>
    <item>
      <title>Using Google's New AI Command-Line Assistant: Antigravity CLI (agy) and YOLO's No-Confirmation Mode</title>
      <dc:creator>Evan Lin</dc:creator>
      <pubDate>Wed, 27 May 2026 09:44:36 +0000</pubDate>
      <link>https://dev.to/gde/using-googles-new-ai-command-line-assistant-antigravity-cli-agy-and-yolos-no-confirmation-mode-10d</link>
      <guid>https://dev.to/gde/using-googles-new-ai-command-line-assistant-antigravity-cli-agy-and-yolos-no-confirmation-mode-10d</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvxwgbpizbh9swaodhn2y.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvxwgbpizbh9swaodhn2y.png" alt="image-20260526212403006" width="769" height="420"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h1&gt;
  
  
  Background
&lt;/h1&gt;

&lt;p&gt;With generative AI entering daily development, the AI assistant in the terminal has also ushered in an epic update! If you are a loyal supporter of the original Gemini CLI, you may already know that this tool will be officially retired on &lt;strong&gt;June 18, 2026&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Taking over the torch of this era is Google's stunning launch at I/O 2026, the next-generation lightweight, Go language-driven multi-agent terminal UI assistant —— &lt;strong&gt;Antigravity CLI (called &lt;code&gt;agy&lt;/code&gt; in the terminal)&lt;/strong&gt;!&lt;/p&gt;

&lt;p&gt;However, the launch of new tools is always accompanied by various pitfalls and surprises. This article will focus on &lt;strong&gt;Antigravity CLI (agy)&lt;/strong&gt;, revealing how to deal with the "invisible color scheme hell", how to enable the addictive &lt;strong&gt;YOLO no-confirmation frenzy mode&lt;/strong&gt;, and those terminal black technologies and setting secrets hidden deep within &lt;code&gt;settings.json&lt;/code&gt;!&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0ga3z2bl9040httext94.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0ga3z2bl9040httext94.png" alt="image-20260526212808645" width="610" height="362"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h1&gt;
  
  
  🛠️ Step 1: Antigravity CLI (agy) Color Scheme Savior for Invisible Text!
&lt;/h1&gt;

&lt;p&gt;When installing and launching &lt;code&gt;agy&lt;/code&gt; for the first time, the first blow that many developers accustomed to macOS / Linux dark background terminals usually face is: &lt;strong&gt;"The font is all black, and the text is completely invisible!"&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This is because the agy default configuration file may be configured with a light (Light) theme. We don't need to compromise and change our favorite terminal background, just modify &lt;code&gt;settings.json&lt;/code&gt; and it can be saved with one click!&lt;/p&gt;

&lt;h3&gt;
  
  
  🛠️ Steps to Fill the Pit
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;Find the global configuration file for Antigravity CLI, the path is usually: &lt;code&gt;~/.gemini/antigravity-cli/settings.json&lt;/code&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Modify the &lt;code&gt;"colorScheme"&lt;/code&gt; setting value from &lt;code&gt;"light"&lt;/code&gt; to &lt;code&gt;"dark"&lt;/code&gt;:&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;After saving the file and restarting the terminal, all outputs will automatically convert to a high-contrast dark mode color scheme, and your eyes will be saved instantly!&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;




&lt;h1&gt;
  
  
  🔥 The Main Event: YOLO Mode —— Unlock the "No Confirmation" Ultimate Move for Unlimited Automated Execution
&lt;/h1&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgmd1ril0pft5r8wos2ip.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgmd1ril0pft5r8wos2ip.png" alt="image-20260526212326057" width="800" height="150"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;When using AI to write code, the most annoying thing is that every time you make a file modification and execute a &lt;code&gt;git&lt;/code&gt; command, the CLI will pop up a question: "Are you sure you want to perform this operation? (y/N)". This is simply a double torment of fingers and spirit when performing large-scale refactoring or batch tasks.&lt;/p&gt;

&lt;p&gt;To this end, agy provides two levels of &lt;strong&gt;YOLO (You Only Live Once) no-confirmation automatic execution mode&lt;/strong&gt;, allowing AI to smoothly and continuously execute autonomously until the task is completed:&lt;/p&gt;

&lt;h3&gt;
  
  
  1. ⚡ Extreme YOLO: &lt;code&gt;--dangerously-skip-permissions&lt;/code&gt; Parameter
&lt;/h3&gt;

&lt;p&gt;If you are in a completely isolated and secure sandbox environment, or have 100% confidence in the instructions generated by AI, you can add this ultimate move when starting:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;agy --dangerously-skip-permissions

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Once this Flag is added, agy will completely skip all tool authorization and command execution confirmation prompts, and enter an "all the way to the top" automatic execution state. Suitable for letting it run complex automated tests or file migrations on its own!&lt;/p&gt;

&lt;h3&gt;
  
  
  2. 🛡️ Moderate Control: &lt;code&gt;/permissions&lt;/code&gt; Fine-grained Settings
&lt;/h3&gt;

&lt;p&gt;If you don't want to risk the AI executing &lt;code&gt;rm -rf&lt;/code&gt;, you can directly enter &lt;code&gt;/permissions&lt;/code&gt; in the CLI or directly modify &lt;code&gt;settings.json&lt;/code&gt;. Through a whitelist mechanism, only specific commands or paths are automatically approved:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;{
  "permissions": {
    "allow": [
      "read_file(/Users/al03034132/Documents)",
      "command(git)",
      "command(npm test)"
    ],
    "deny": [
      "command(rm -rf)"
    ]
  }
}

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This way, you can allow Git operations and unit tests to enter the YOLO no-confirmation state, while also ensuring the security of the core file system!&lt;/p&gt;




&lt;h1&gt;
  
  
  🤫 Those Unknown agy Black Technologies and Setting Secrets
&lt;/h1&gt;

&lt;p&gt;As Google's latest official Code-first agent weapon, agy also has several hidden functions deep within the configuration file and commands, which are rarely seen in newspapers:&lt;/p&gt;

&lt;h3&gt;
  
  
  🧩 1. Asynchronous Subagents
&lt;/h3&gt;

&lt;p&gt;This is definitely agy's most revolutionary multi-agent architecture! You can directly call multiple subagents in the terminal to run complex tasks in the background:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  For example: one subagent goes online to check the latest API documentation, one runs unit tests in the background, and one performs code refactoring.&lt;/li&gt;
&lt;li&gt;  And your main terminal will not be blocked at all! You can enter &lt;code&gt;/agents&lt;/code&gt; to monitor the health and execution progress of all subagents in the background at any time.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  🧠 2. Change Brains at Any Time: &lt;code&gt;/model&lt;/code&gt; Secret
&lt;/h3&gt;

&lt;p&gt;agy not only supports the Gemini series models on Vertex AI, but if you need to, you can also use the built-in &lt;code&gt;/model&lt;/code&gt; slash command to directly switch seamlessly between Gemini, Claude, and even other open-source models with one click, helping you verify the same bug with different thinking models, which is super convenient!&lt;/p&gt;

&lt;h3&gt;
  
  
  🛡️ 3. Multi-Operating System Level Security Sandbox (Terminal Sandbox)
&lt;/h3&gt;

&lt;p&gt;In order to prevent AI from running out of control malicious code in YOLO mode, agy silently implements operating system level sandbox protection at the bottom!&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;code&gt;nsjail&lt;/code&gt; isolation will be automatically enabled on Linux.&lt;/li&gt;
&lt;li&gt;  macOS will automatically call the system's native &lt;code&gt;sandbox-exec&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;  Even if AI writes a script that pollutes the file system, it will be perfectly confined in the sandbox and unable to move!&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  📦 4. Upgrade Old Things: Seamless Migration Mechanism from Gemini CLI
&lt;/h3&gt;

&lt;p&gt;Although Gemini CLI has gone down in history, agy has thoughtfully designed a "one-click import tool". When you start agy for the first time, it will automatically scan the old configuration path and perfectly align and migrate your original plugins, custom skills, and &lt;code&gt;settings.json&lt;/code&gt; accumulated in Gemini CLI!&lt;/p&gt;




&lt;h1&gt;
  
  
  Summary and Suggestions
&lt;/h1&gt;

&lt;p&gt;The upgrade from Gemini CLI to &lt;strong&gt;Antigravity CLI (agy)&lt;/strong&gt; is not just a change in the command line name, but a leap forward from single-model question and answer to &lt;strong&gt;Multi-Agent Workflows&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;By properly setting &lt;code&gt;permissions&lt;/code&gt; in &lt;code&gt;settings.json&lt;/code&gt;, combined with the no-confirmation function of YOLO mode, developers can allow AI to automatically and smoothly complete various medium and large tasks while ensuring the security of the host.&lt;/p&gt;

&lt;p&gt;Quickly open your terminal and enter &lt;code&gt;agy --dangerously-skip-permissions&lt;/code&gt; to experience this futuristic development artifact! See you next time for the actual combat!&lt;/p&gt;

</description>
      <category>ai</category>
      <category>cli</category>
      <category>google</category>
      <category>productivity</category>
    </item>
    <item>
      <title>GCP: Upgrading a LINE Bot with Vertex AI ADK Tools for Smart Business Cards and Backup Search</title>
      <dc:creator>Evan Lin</dc:creator>
      <pubDate>Wed, 27 May 2026 09:44:24 +0000</pubDate>
      <link>https://dev.to/gde/gcp-upgrading-a-line-bot-with-vertex-ai-adk-tools-for-smart-business-cards-and-backup-search-3dpe</link>
      <guid>https://dev.to/gde/gcp-upgrading-a-line-bot-with-vertex-ai-adk-tools-for-smart-business-cards-and-backup-search-3dpe</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7e3kgauizlnidap7vw7u.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7e3kgauizlnidap7vw7u.png" alt="image-20260526210750701" width="677" height="408"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h1&gt;
  
  
  Preface
&lt;/h1&gt;

&lt;p&gt;In the previous article, we successfully upgraded the LINE business card assistant robot (&lt;code&gt;linebot-namecard-python&lt;/code&gt;) from the AI Studio API Key verification mode to the enterprise-grade &lt;strong&gt;Google Cloud Vertex AI&lt;/strong&gt; mechanism, completely freeing us from the 429 quota anxiety.&lt;/p&gt;

&lt;p&gt;However, the original method of searching for business cards had significant limitations: &lt;strong&gt;We had to first fetch all the user's business cards from Firebase, package them into a huge JSON array, and then stuff them into the prompt, asking Gemini to select the most relevant business card object to return&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;This approach has three major drawbacks:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; &lt;strong&gt;Token Waste&lt;/strong&gt;: With many business cards, each search is a ruthless blow to the token balance.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Lack of Flexibility&lt;/strong&gt;: The model can only search passively; it cannot proactively ask for details or perform data updates.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Unable to Link Operations&lt;/strong&gt;: If the user says, "Help me change David Wang's phone number," we have to write a bunch of complex NLP judgments and branches in the Webhook.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;To solve these pain points, we decided to refactor the robot and embrace Google Cloud's latest, powerful, and code-friendly &lt;strong&gt;Agent Development Kit (ADK)&lt;/strong&gt;!&lt;/p&gt;

&lt;p&gt;This article will share with you how we completely refactored Firebase access into &lt;strong&gt;ADK Tools&lt;/strong&gt;, implemented dynamic closures, and the various top-tier blood and tears pitfalls we encountered during deployment on Cloud Run and with the Antigravity CLI tool!&lt;/p&gt;




&lt;h1&gt;
  
  
  Architecture Upgrade: Why Choose ADK and Tools?
&lt;/h1&gt;

&lt;p&gt;&lt;strong&gt;Agent Development Kit (ADK)&lt;/strong&gt; is a code-first agent development framework launched by Google Cloud. Previously, in order for us to allow large models to call external APIs, we had to manually write long OpenAPI schemas or complex function-calling descriptions; ADK simplifies all of this into simple Python functions!&lt;/p&gt;

&lt;p&gt;We planned five core data operation functions for the business card Agent and registered them as &lt;strong&gt;Tools&lt;/strong&gt; of the Agent in the form of &lt;strong&gt;Python functions&lt;/strong&gt;:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; &lt;code&gt;get_all_namecards()&lt;/code&gt;: Reads the list of all business cards (including IDs) for the current user.&lt;/li&gt;
&lt;li&gt; &lt;code&gt;get_namecard_by_id(card_id)&lt;/code&gt;: Retrieves the detailed content of a specific business card.&lt;/li&gt;
&lt;li&gt; &lt;code&gt;display_namecard(card_id)&lt;/code&gt;: The core tool! Called when the model matches a business card, used to tell the Python main program "it's time to display this business card on the screen".&lt;/li&gt;
&lt;li&gt; &lt;code&gt;update_namecard_memo(card_id, memo)&lt;/code&gt;: Updates the business card memo.&lt;/li&gt;
&lt;li&gt; &lt;code&gt;update_namecard_field(card_id, field, value)&lt;/code&gt;: Directly updates the specified fields of the business card (name, phone, email, etc.) in natural language.&lt;/li&gt;
&lt;/ol&gt;




&lt;h1&gt;
  
  
  Core Code Rewrite: Dynamic Closure Tools Implementation
&lt;/h1&gt;

&lt;p&gt;In Webhook development, the most important thing is &lt;strong&gt;security&lt;/strong&gt;. We absolutely cannot allow user A to search or modify user B's business cards.&lt;/p&gt;

&lt;p&gt;Therefore, we cannot implement static, global Database Tools. Instead, in &lt;code&gt;handle_smart_query&lt;/code&gt;, we dynamically create exclusive Tools for each conversation request through the &lt;strong&gt;closure mechanism&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;This approach not only perfectly binds the user's &lt;code&gt;user_id&lt;/code&gt; but also utilizes the &lt;code&gt;found_card_ids&lt;/code&gt; list in the closure to perfectly collect "all business card IDs that the model wants to present to the user" during the decision-making process:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;def make_adk_tools(user_id: str, found_card_ids: list):
    """Dynamically create exclusive Firebase data access and operation tools for a specific user"""
    def get_all_namecards() -&amp;gt; list[dict]:
        """Get the list of all business card data in the Firebase database for the current user.
        Each business card data contains a unique card_id field."""
        cards_dict = firebase_utils.get_all_cards(user_id)
        all_cards_list = []
        for card_id, card_data in cards_dict.items():
            card_data_with_id = card_data.copy()
            card_data_with_id['card_id'] = card_id
            all_cards_list.append(card_data_with_id)
        return all_cards_list

    def get_namecard_by_id(card_id: str) -&amp;gt; dict:
        """Get the detailed fields and data of a single business card through a specific card_id."""
        return firebase_utils.get_card_by_id(user_id, card_id)

    def display_namecard(card_id: str) -&amp;gt; str:
        """Display a specific business card to the user.
        When a business card matching the search is found, be sure to call this tool."""
        if card_id not in found_card_ids:
            found_card_ids.append(card_id)
        return f"已將名片 ID 標記為顯示：{card_id}"

    def update_namecard_memo(card_id: str, memo: str) -&amp;gt; bool:
        """Update the memo/note information of a specific business card."""
        return firebase_utils.update_namecard_memo(card_id, user_id, memo)

    def update_namecard_field(card_id: str, field: str, value: str) -&amp;gt; bool:
        """Update the specified field of a specific business card (optional fields: name, title, company, address, phone, email)."""
        return firebase_utils.update_namecard_field(
            user_id, card_id, field, value
        )

    return [
        get_all_namecards,
        get_namecard_by_id,
        display_namecard,
        update_namecard_memo,
        update_namecard_field
    ]

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Refactored Main Webhook Logic (&lt;code&gt;handle_smart_query&lt;/code&gt;)
&lt;/h3&gt;

&lt;p&gt;Now, when LINE receives a text query, we only need to pass the message to the ADK &lt;code&gt;Runner&lt;/code&gt; to run once. Once the Agent decides to call &lt;code&gt;display_namecard&lt;/code&gt;, we combine the &lt;strong&gt;Agent's friendly Chinese explanation (text reply)&lt;/strong&gt; with the &lt;strong&gt;business card Flex Message (the entire business card)&lt;/strong&gt; in the LINE reply:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;async def handle_smart_query(event: MessageEvent, user_id: str, msg: str):
    found_card_ids = []
    tools = make_adk_tools(user_id, found_card_ids)

    # 1. Create an ADK Agent equipped with exclusive Tools
    agent = Agent(
        name="namecard_agent",
        model="gemini-3-flash-preview",
        instruction=(
            "You are a smart and friendly LINE business card assistant. Your job is to help users manage their business card data.\n"
            "You can use the appropriate tools to read or modify business card records in the Firebase database.\n\n"
            "【Core Operation Guidelines】\n"
            "1. 【Query】When a user queries for someone's or a company's business card, please first call get_all_namecards to get all the data and perform analysis and comparison in the background.\n"
            "2. 【Display】As long as a business card that meets the conditions is found, 『must』 call the display_namecard tool to mark the card_id of that business card for display, so that the system can draw and present it on the LINE screen.\n"
            "3. 【Modify】If the user wants to modify a business card (e.g., phone number, Email, memo), please first compare and find the card_id, and then call the corresponding update tool (such as update_namecard_field or update_namecard_memo) to make the modification. After the modification is successful, 『must』 call display_namecard again to display the updated business card, allowing the user to confirm.\n"
            "4. 【Reply】Finally, please reply to the user with a friendly and concise traditional Chinese tone about the operation results or search progress."
        ),
        tools=tools,
    )

    # 2. Execute the Runner with an in-memory Session
    runner = Runner(
        app_name="namecard_bot_app",
        agent=agent,
        session_service=InMemorySessionService()
    )

    try:
        events = await runner.run_debug(
            msg, user_id=user_id, session_id=user_id
        )

        # Combine the Agent's text reply
        final_text = ""
        for ev in events:
            if ev.content and ev.content.parts:
                for part in ev.content.parts:
                    if part.text:
                        final_text += part.text

        final_text = final_text.strip() or "為您完成處理。"

        reply_msgs = [TextSendMessage(
            text=final_text,
            quick_reply=get_quick_reply_items()
        )]

        # 3. Get the business cards marked for display by the Agent and convert them to Flex Messages
        if found_card_ids:
            for card_id in found_card_ids[:5]:
                card_data = firebase_utils.get_card_by_id(user_id, card_id)
                if card_data:
                    reply_msgs.append(
                        flex_messages.get_namecard_flex_msg(card_data, card_id)
                    )

        await line_bot_api.reply_message(event.reply_token, reply_msgs)

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h1&gt;
  
  
  Blood and Tears Pitfalls During the Migration Process
&lt;/h1&gt;

&lt;p&gt;The refactoring process cannot be smooth sailing. In this upgrade, we encountered three top-tier deep pits, each of which almost prevented the online container from providing services. Here is valuable pit-filling experience:&lt;/p&gt;

&lt;h3&gt;
  
  
  Pitfall 1: Uvicorn Crashes the Event Loop at Startup
&lt;/h3&gt;

&lt;p&gt;When we excitedly pushed the container containing &lt;code&gt;google-adk&lt;/code&gt; onto Cloud Run, the deployment failed due to a health check timeout at the last moment! Checking the GCP Log, we were greeted with this heartbreaking RuntimeError:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;  File "/app/app/bot_instance.py", line 7, in &amp;lt;module&amp;gt;
    session = aiohttp.ClientSession()
  File "/usr/local/lib/python3.10/site-packages/aiohttp/client.py", line 321, in __init__
    loop = loop or asyncio.get_running_loop()
RuntimeError: no running event loop

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Reason&lt;/strong&gt;: Under the new dependency environment, &lt;code&gt;app/bot_instance.py&lt;/code&gt; directly instantiated &lt;code&gt;aiohttp.ClientSession()&lt;/code&gt; globally when it was imported (Import Time). However, at this time, Uvicorn's asyncio Event Loop had not even started! This caused &lt;code&gt;aiohttp&lt;/code&gt; to throw an exception and crash directly because it couldn't find a running event loop.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Solution&lt;/strong&gt;: We designed a lazy-load &lt;code&gt;LazyLineBotApi&lt;/code&gt; wrapper, delaying the creation of &lt;code&gt;ClientSession&lt;/code&gt; and &lt;code&gt;AsyncLineBotApi&lt;/code&gt; until the first LINE Webhook request comes in (at this time, the Event Loop must be running), perfectly avoiding the Import Time initialization crash:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;class LazyLineBotApi:
    def __init__ (self):
        self._api = None
        self.session = None

    def _get_api(self):
        if self._api is None:
            self.session = aiohttp.ClientSession()
            async_http_client = AiohttpAsyncHttpClient(self.session)
            self._api = AsyncLineBotApi(
                config.CHANNEL_ACCESS_TOKEN, async_http_client
            )
        return self._api

    def __getattr__ (self, name):
        return getattr(self._get_api(), name)

line_bot_api = LazyLineBotApi()

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h3&gt;
  
  
  Pitfall 2: GCP's Default &lt;code&gt;GOOGLE_CLOUD_LOCATION&lt;/code&gt; and Region 404
&lt;/h3&gt;

&lt;p&gt;After successfully starting the container, we tried entering text in LINE, but saw a big red error again in the background:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Error executing ADK smart query: 404 NOT_FOUND. 
Publisher Model `projects/line-vertex/locations/asia-east1/publishers/google/models/gemini-3-flash-preview` was not found.

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Reason&lt;/strong&gt;: Because our Cloud Run service is deployed in Taiwan (&lt;code&gt;asia-east1&lt;/code&gt;), GCP will automatically inject &lt;code&gt;GOOGLE_CLOUD_LOCATION=asia-east1&lt;/code&gt; into the environment variables. However, in the Vertex AI ecosystem, many of the latest and most powerful models (such as &lt;code&gt;gemini-3-flash-preview&lt;/code&gt;) &lt;strong&gt;only provide services in the &lt;code&gt;global&lt;/code&gt; region&lt;/strong&gt;! When the underlying SDK of ADK automatically reads &lt;code&gt;asia-east1&lt;/code&gt; to search for models, it will naturally throw a 404.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Solution&lt;/strong&gt;: We directly override the environment variable at the first moment in the system's configuration entry &lt;a&gt;app/config.py&lt;/a&gt;, directing all Vertex AI model searches to the &lt;code&gt;global&lt;/code&gt; region:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Force GOOGLE_CLOUD_LOCATION to global so that Vertex AI and ADK look
# for models in the global region
os.environ["GOOGLE_CLOUD_LOCATION"] = "global"

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h3&gt;
  
  
  Pitfall 3: Insurance Mechanism in Extreme Situations - Local Keyword Backup Search
&lt;/h3&gt;

&lt;p&gt;After the user's LINE bot goes live, any API quota explosion or network timeout should not cause the user to see a cold "server failure". To guarantee production-level SLA, we added a seamless &lt;strong&gt;keyword search backup mechanism (Local Keyword Fallback)&lt;/strong&gt; in the &lt;code&gt;except&lt;/code&gt; block of &lt;code&gt;handle_smart_query&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;If Vertex AI or ADK encounters any exceptions during execution, the system will automatically enable Firebase local keyword matching in the background, still perfectly returning matching business card Flex messages, providing the user with the most elegant protection net:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;    except Exception as e:
        print(f"Error executing ADK smart query: {e}")
        # Backup search mechanism: When Vertex AI or ADK API is abnormal, automatically enable local keyword filtering search to ensure service continuity
        try:
            all_cards_dict = firebase_utils.get_all_cards(user_id)
            fallback_matches = []
            if all_cards_dict:
                for card_id, card_data in all_cards_dict.items():
                    name = card_data.get("name", "").lower()
                    company = card_data.get("company", "").lower()
                    query_lower = msg.lower()
                    if query_lower in name or query_lower in company:
                        fallback_matches.append((card_id, card_data))

            if fallback_matches:
                reply_msgs = [TextSendMessage(
                    text="「智慧搜尋」服務暫時無法取得，"
                         "已自動啟用「關鍵字備援搜尋」為您找到以下相關名片：",
                    quick_reply=get_quick_reply_items()
                )]
                for card_id, card_data in fallback_matches[:5]:
                    reply_msgs.append(
                        flex_messages.get_namecard_flex_msg(card_data, card_id)
                    )
                await line_bot_api.reply_message(event.reply_token, reply_msgs)
                return
        except Exception as fallback_err:
            print(f"Fallback search also failed: {fallback_err}")

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h1&gt;
  
  
  Summary and Benefits
&lt;/h1&gt;

&lt;p&gt;After refactoring into an &lt;strong&gt;ADK Agent + Tools&lt;/strong&gt; architecture, it brought amazing substantial changes:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; &lt;strong&gt;Extreme Token Saving&lt;/strong&gt;: The model only calls &lt;code&gt;get_all_namecards&lt;/code&gt; when it needs to read business cards, and general conversations no longer need to repeatedly transmit huge JSON data.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Multi-step Natural Dialogue Linking&lt;/strong&gt;: The user only needs to type "Help me change David Wang's memo to 'Meeting next Monday'", and the model will automatically and continuously call &lt;code&gt;get_all_namecards()&lt;/code&gt; -&amp;gt; find the ID -&amp;gt; call &lt;code&gt;update_namecard_memo(id, ...)&lt;/code&gt; -&amp;gt; and then call &lt;code&gt;display_namecard(id)&lt;/code&gt; to show the latest results.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Code Quality Leap&lt;/strong&gt;: In this refactoring, we also strictly controlled through &lt;code&gt;flake8&lt;/code&gt;, completing 100% clean code formatting and zero-warning compilation.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The complete and linter-optimized code has been pushed to &lt;a href="https://github.com/kkdai/linebot-namecard-python" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt; simultaneously. I hope this dynamic closure design and Cloud Run, Event Loop pit-filling practice can help everyone avoid more detours when building production-level AI Agent Web applications! See you next time!&lt;/p&gt;

</description>
      <category>ai</category>
      <category>gemini</category>
      <category>google</category>
      <category>python</category>
    </item>
    <item>
      <title>[Workshop][Gemini CLI] Building with AI 2026: Hands-on with Gemini CLI and Official MCP to Launch a Google Drive LINE Bot from Scratch</title>
      <dc:creator>Evan Lin</dc:creator>
      <pubDate>Fri, 15 May 2026 00:45:26 +0000</pubDate>
      <link>https://dev.to/gde/workshopgemini-cli-building-with-ai-2026-hands-on-with-gemini-cli-and-official-mcp-to-launch-a-296d</link>
      <guid>https://dev.to/gde/workshopgemini-cli-building-with-ai-2026-hands-on-with-gemini-cli-and-official-mcp-to-launch-a-296d</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhq8pxwsxuv84cm1bs358.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhq8pxwsxuv84cm1bs358.png" alt="image-20260514235640672" width="800" height="600"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;(Event: &lt;a href="https://developers.google.com/community/gdg" rel="noopener noreferrer"&gt;Build with AI 2026 @ Google Taipei 101&lt;/a&gt; / Presentation: &lt;a href="https://speakerdeck.com/line_developers_tw/20260514-build-with-ai-2026-build-line-bot-with-gemini-cli" rel="noopener noreferrer"&gt;SpeakerDeck&lt;/a&gt; / Materials: &lt;a href="https://github.com/kkdai/BwAI-2026" rel="noopener noreferrer"&gt;&lt;code&gt;kkdai/BwAI-2026&lt;/code&gt;&lt;/a&gt; / Example: &lt;a href="https://github.com/kkdai/bwai2026-sample" rel="noopener noreferrer"&gt;&lt;code&gt;kkdai/bwai2026-sample&lt;/code&gt;&lt;/a&gt;)&lt;/p&gt;

&lt;h1&gt;
  
  
  Background: When the CLI Becomes a "Thinking Colleague"
&lt;/h1&gt;

&lt;p&gt;After Google I/O in 2026, Gemini CLI is no longer just another terminal toy that packages LLM, but a development tool that &lt;strong&gt;can mount MCPs, plan on its own, run &lt;code&gt;gcloud&lt;/code&gt; on its own, and stop to ask you when it doesn't understand&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;In this &lt;strong&gt;Build with AI 2026&lt;/strong&gt; workshop, I compressed this tool flow into two hands-on sessions:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; &lt;strong&gt;Workshop 1: Environment Preparation + Two Essential Official MCPs&lt;/strong&gt; — Connecting Gemini CLI to Google's official knowledge and Maps Platform.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Workshop 2: Tell Gemini CLI a Sentence and Deploy a LINE Bot to Cloud Run&lt;/strong&gt; — No more hand-typing that long and painful &lt;code&gt;gcloud run deploy ...&lt;/code&gt;.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The entire teaching material has been open-sourced at &lt;a href="https://github.com/kkdai/BwAI-2026" rel="noopener noreferrer"&gt;&lt;code&gt;kkdai/BwAI-2026&lt;/code&gt;&lt;/a&gt;, the example project is at &lt;a href="https://github.com/kkdai/bwai2026-sample" rel="noopener noreferrer"&gt;&lt;code&gt;kkdai/bwai2026-sample&lt;/code&gt;&lt;/a&gt;, and the event slides are on &lt;a href="https://speakerdeck.com/line_developers_tw/20260514-build-with-ai-2026-build-line-bot-with-gemini-cli" rel="noopener noreferrer"&gt;SpeakerDeck&lt;/a&gt;. This is the full text version of the on-site walkthrough, including the three pitfalls we encountered on stage that day.&lt;/p&gt;




&lt;h2&gt;
  
  
  Why Gemini CLI + MCP? First, Look at the Timeline
&lt;/h2&gt;

&lt;p&gt;The update pace of Gemini API and its ecosystem has been very dense in the past year:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Time&lt;/th&gt;
&lt;th&gt;New Stuff&lt;/th&gt;
&lt;th&gt;Impact on Workflow&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;2025/08&lt;/td&gt;
&lt;td&gt;Gemini YouTube Video Understanding&lt;/td&gt;
&lt;td&gt;Directly feed URLs of videos to the model&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2025/11&lt;/td&gt;
&lt;td&gt;Gemini File Search&lt;/td&gt;
&lt;td&gt;Managed RAG, no need to connect your own vector DB&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2025/12&lt;/td&gt;
&lt;td&gt;Google Search Grounding (Vertex)&lt;/td&gt;
&lt;td&gt;Model answers can be grounded to search results&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2025/12&lt;/td&gt;
&lt;td&gt;Maps Grounding &amp;amp; Maps Platform Assist MCP&lt;/td&gt;
&lt;td&gt;Native map scenarios&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2026/02&lt;/td&gt;
&lt;td&gt;Google Developer Knowledge API + MCP Server&lt;/td&gt;
&lt;td&gt;Official documentation becomes a tool queryable by LLM&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2026/03&lt;/td&gt;
&lt;td&gt;Gemini 3 Flash + Tool Combo&lt;/td&gt;
&lt;td&gt;Single call chains multiple grounding tools&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Core Observation&lt;/strong&gt;: Google has made each new capability into an &lt;strong&gt;MCP Server&lt;/strong&gt;, which means that Gemini CLI can upgrade the IDE from "an LLM that can write code" to "an LLM that can write code using Google's official resources" with just one line of &lt;code&gt;gemini mcp add&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;This workshop, I chose two MCPs that are most impactful for LINE Bot developers to demonstrate.&lt;/p&gt;




&lt;h1&gt;
  
  
  Workshop 1: Environment Preparation and Official MCP Installation
&lt;/h1&gt;

&lt;h2&gt;
  
  
  Why It's Recommended to Start with Cloud Shell
&lt;/h2&gt;

&lt;p&gt;The biggest fear in on-site workshops is the environment issue like &lt;em&gt;"Teacher, I can't find Python 3.11 here"&lt;/em&gt;. I put the entire demonstration directly on &lt;strong&gt;Google Cloud Shell&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;code&gt;gcloud&lt;/code&gt; is pre-installed.&lt;/li&gt;
&lt;li&gt;  &lt;code&gt;gemini&lt;/code&gt; CLI is pre-installed (the latest Cloud Shell image is built-in).&lt;/li&gt;
&lt;li&gt;  &lt;code&gt;gcloud auth&lt;/code&gt; automatically links with the Cloud Shell account, saving the OAuth dance.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Go to &lt;a href="https://console.cloud.google.com/" rel="noopener noreferrer"&gt;https://console.cloud.google.com/&lt;/a&gt;, &lt;strong&gt;first confirm that the project is the one you just created&lt;/strong&gt; (don't accidentally open the company's official environment), and then click Cloud Shell in the upper right corner:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Verify that both tools are there&lt;/span&gt;
gcloud &lt;span class="nt"&gt;--version&lt;/span&gt;
gemini &lt;span class="nt"&gt;--version&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;blockquote&gt;
&lt;p&gt;[!TIP] If you want to run it locally, you can follow the &lt;a href="https://github.com/google/gemini-cli" rel="noopener noreferrer"&gt;Gemini CLI official installation guide&lt;/a&gt;, but in the workshop, we all use Cloud Shell to avoid the tragedy of "everyone's environment is different".&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  What is MCP? Explained in Three Sentences
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;MCP (Model Context Protocol)&lt;/strong&gt; is an open protocol proposed by Anthropic that allows LLM clients to communicate with &lt;em&gt;external capability providers&lt;/em&gt; in a unified format.&lt;/li&gt;
&lt;li&gt;  Gemini CLI is the MCP &lt;strong&gt;client&lt;/strong&gt;, and you can &lt;code&gt;gemini mcp add ...&lt;/code&gt; to mount any server that complies with the MCP specification.&lt;/li&gt;
&lt;li&gt;  Google itself has now packaged several APIs into official MCP servers, which is equivalent to equipping your AI assistant with "Google's internal knowledge base".&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  MCP #1: Google Developer Knowledge
&lt;/h2&gt;

&lt;p&gt;This MCP turns the official documentation of the Google family (Cloud / Android / Web / Firebase / Workspace…) into a tool that Gemini can call. The advantage over web search is that: &lt;strong&gt;it returns chunks that have been officially indexed, with the correct source URL&lt;/strong&gt;, and will not be misled by outdated blogs.&lt;/p&gt;

&lt;h3&gt;
  
  
  Setup Steps
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt; Enable &lt;strong&gt;Developer Knowledge API&lt;/strong&gt; at &lt;a href="https://console.cloud.google.com/marketplace/product/google/developerknowledge.googleapis.com" rel="noopener noreferrer"&gt;Google Cloud Console&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt; Create an &lt;strong&gt;API Key&lt;/strong&gt; in "Credentials" and restrict it to only call the Developer Knowledge API (the principle of least privilege).&lt;/li&gt;
&lt;li&gt; Run in Cloud Shell:
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;gemini mcp add &lt;span class="nt"&gt;-t&lt;/span&gt; http &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-H&lt;/span&gt; &lt;span class="s2"&gt;"X-Goog-Api-Key: YOUR_API_KEY"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  google-developer-knowledge &lt;span class="se"&gt;\&lt;/span&gt;
  https://developerknowledge.googleapis.com/mcp &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--scope&lt;/span&gt; user

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;--scope user&lt;/code&gt; means that this MCP is valid for all your projects, and you don't need to install it again next time you change repos.&lt;/p&gt;

&lt;h3&gt;
  
  
  Verification
&lt;/h3&gt;

&lt;p&gt;Enter &lt;code&gt;gemini&lt;/code&gt; interactive mode, first type:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;/mcp list

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You should see &lt;code&gt;google-developer-knowledge&lt;/code&gt; with the status &lt;strong&gt;Connected&lt;/strong&gt;. Then throw a typical question:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Please help me query the latest deployment limits of Google Cloud Run (Deployment Limits) and list the top three.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Correct behavior:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  Gemini will call the &lt;code&gt;google-developer-knowledge&lt;/code&gt; tool.&lt;/li&gt;
&lt;li&gt;  The answer content is referenced from official pages like &lt;code&gt;cloud.google.com/run/quotas&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;  Finally, it includes a reference URL.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  MCP #2: Google Maps Platform Code Assist
&lt;/h2&gt;

&lt;p&gt;This MCP is specifically designed to help you write code for Google Maps integration — including the latest calling methods for Maps JavaScript API, Places API, and Routes API. It is extremely friendly to developers who "want map features but are too lazy to flip through three docs".&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;gemini mcp add &lt;span class="nt"&gt;-s&lt;/span&gt; user &lt;span class="nt"&gt;-t&lt;/span&gt; http &lt;span class="se"&gt;\&lt;/span&gt;
  maps-code-assist-mcp &lt;span class="se"&gt;\&lt;/span&gt;
  https://mapscodeassist.googleapis.com/mcp

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Verification
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;I want to embed a Google map in a webpage, please write a basic JavaScript code for me,
with the center point set to Taipei 101.

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Expected behavior:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  Gemini calls &lt;code&gt;maps-code-assist-mcp&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;  The generated code &lt;strong&gt;will not use the deprecated &lt;code&gt;new google.maps.Map()&lt;/code&gt; synchronous loader&lt;/strong&gt;, but will use the currently recommended &lt;code&gt;importLibrary&lt;/code&gt; async pattern.&lt;/li&gt;
&lt;li&gt;  It will proactively remind you to get the Maps JavaScript API Key and make referer restrictions.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you see it still generating the old writing style from 2020, then the MCP is not mounted correctly — re-&lt;code&gt;/mcp list&lt;/code&gt; to check the status.&lt;/p&gt;




&lt;h1&gt;
  
  
  Workshop 2: Deploying a LINE Bot to Cloud Run
&lt;/h1&gt;

&lt;p&gt;This part uses the example project &lt;a href="https://github.com/kkdai/bwai2026-sample" rel="noopener noreferrer"&gt;&lt;code&gt;kkdai/bwai2026-sample&lt;/code&gt;&lt;/a&gt;. It is a &lt;strong&gt;LINE Bot file backup helper&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  Users put images / videos / audio / PDFs into the LINE chat box.&lt;/li&gt;
&lt;li&gt;  The bot automatically saves the files to &lt;em&gt;the user's own&lt;/em&gt; Google Drive, in folders by &lt;code&gt;YYYY-MM&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;  Supports commands like &lt;code&gt;/recent_files&lt;/code&gt;, &lt;code&gt;/search_files &amp;lt;keyword&amp;gt;&lt;/code&gt;, &lt;code&gt;/disconnect_drive&lt;/code&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Tech stack: &lt;strong&gt;Go + LINE Messaging API SDK + Google Drive API + Firestore (to store OAuth token) + Cloud Run&lt;/strong&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git clone https://github.com/kkdai/bwai2026-sample
&lt;span class="nb"&gt;cd &lt;/span&gt;bwai2026-sample

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Deployment Flow Overview
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[Phase One] Get LINE Keys (Channel Secret + Access Token)
      ↓
[Phase Two] GCP Project Setup (Enable Run / Build / Firestore / Artifact / Drive API)
      ↓
[Phase Three] Set up OAuth Consent Screen + Gemini CLI Login
      ↓
[Phase Four] Tell Gemini CLI a sentence in Chinese and deploy to Cloud Run
      ↓
[Phase Five] Fill in the Webhook URL in LINE Developers Console

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Phase One: LINE Keys
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt; Create an official account at &lt;a href="https://manager.line.biz/" rel="noopener noreferrer"&gt;LINE Official Account Manager&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt; In the background, "Settings → Messaging API" &lt;strong&gt;enable Messaging API&lt;/strong&gt;, and create a Provider.&lt;/li&gt;
&lt;li&gt; Back to &lt;a href="https://developers.line.biz/console/" rel="noopener noreferrer"&gt;LINE Developers Console&lt;/a&gt; corresponding Channel:

&lt;ul&gt;
&lt;li&gt;  &lt;code&gt;Basic settings&lt;/code&gt; → Get &lt;strong&gt;Channel Secret&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;  &lt;code&gt;Messaging API&lt;/code&gt; → Click &lt;strong&gt;Issue&lt;/strong&gt; to get &lt;strong&gt;Channel Access Token (long-lived)&lt;/strong&gt;.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Very important&lt;/strong&gt;: Go back to OA Manager and &lt;strong&gt;disable "Auto-reply messages"&lt;/strong&gt;, otherwise your code will never be able to get the messages to reply to.&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Phase Two: GCP Project Activation
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Switch to the clean project used in the workshop&lt;/span&gt;
gcloud config &lt;span class="nb"&gt;set &lt;/span&gt;project your-cool-project-id

&lt;span class="c"&gt;# Enable the entire set of services in one go&lt;/span&gt;
gcloud services &lt;span class="nb"&gt;enable&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  run.googleapis.com &lt;span class="se"&gt;\&lt;/span&gt;
  cloudbuild.googleapis.com &lt;span class="se"&gt;\&lt;/span&gt;
  firestore.googleapis.com &lt;span class="se"&gt;\&lt;/span&gt;
  artifactregistry.googleapis.com &lt;span class="se"&gt;\&lt;/span&gt;
  drive.googleapis.com

&lt;span class="c"&gt;# Build Firestore (used to store per-user OAuth token + state anti-counterfeiting)&lt;/span&gt;
gcloud firestore databases create &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--location&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;asia-east1 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;firestore-native

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;blockquote&gt;
&lt;p&gt;[!NOTE] &lt;code&gt;--type=firestore-native&lt;/code&gt; This value will be explained in the third pitfall, why it's easy to get wrong.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  Phase Three: OAuth Consent Screen + Gemini CLI Login
&lt;/h2&gt;

&lt;p&gt;Because the Bot needs to represent "the user themselves" to upload files to their Google Drive, this path must go through OAuth.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; Go to &lt;a href="https://console.cloud.google.com/apis/credentials/consent" rel="noopener noreferrer"&gt;OAuth Consent Screen&lt;/a&gt;:

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;User Type&lt;/strong&gt;: External.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Application Name&lt;/strong&gt;: &lt;code&gt;My LINE Bot&lt;/code&gt; (or whatever name you want to call it).&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Support Email / Developer Contact Email&lt;/strong&gt;: Fill in your own Gmail.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Be sure to click "Publish App"&lt;/strong&gt; after filling it out — if you don't publish it, only accounts in the Test Users list can use it.&lt;/li&gt;
&lt;li&gt; Create an OAuth client ID:

&lt;ul&gt;
&lt;li&gt;  Select &lt;strong&gt;Web Application&lt;/strong&gt; for the type.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Authorized redirect URI&lt;/strong&gt;: Temporarily fill in &lt;code&gt;https://placeholder/oauth/callback&lt;/code&gt;, and come back to modify it after getting the Cloud Run URL in Phase Four.&lt;/li&gt;
&lt;li&gt;  Save the &lt;strong&gt;Client ID&lt;/strong&gt; and &lt;strong&gt;Client Secret&lt;/strong&gt;.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt; Run locally:
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;gcloud auth application-default login

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This will write ADC (Application Default Credentials) to the local machine, and Gemini CLI will use this credential when running &lt;code&gt;gcloud&lt;/code&gt;, without popping up a browser to re-auth halfway.&lt;/p&gt;

&lt;h2&gt;
  
  
  Phase Four: Deploy to Cloud Run with Gemini CLI (The Highlight)
&lt;/h2&gt;

&lt;p&gt;This part is where the participants in the workshop were most "wow".&lt;/p&gt;

&lt;p&gt;After entering the project directory, start Gemini CLI interactive mode:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;gemini

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then say a sentence:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Help me deploy to Cloud Run using gcloud, and stop to ask me if you need any data.
Refer to repo https://github.com/kkdai/bwai2026-sample,
region use asia-east1, environment variables will use
ChannelSecret, ChannelAccessToken, GOOGLE_CLIENT_ID,
GOOGLE_CLIENT_SECRET, GOOGLE_REDIRECT_URL.

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Gemini CLI will then:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; &lt;strong&gt;&lt;code&gt;ls&lt;/code&gt; and &lt;code&gt;cat Dockerfile&lt;/code&gt; by itself&lt;/strong&gt; to confirm the project structure.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Generate a plan&lt;/strong&gt;: First use &lt;code&gt;PENDING&lt;/code&gt; to reserve the deployment → get the URL → supplement the OAuth redirect → update env vars.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Stop and ask you for confirmation before execution&lt;/strong&gt; (this is the CLI's confirm mode, enabled by default, and will not yolo).&lt;/li&gt;
&lt;li&gt; Run a command that looks like this:
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;gcloud run deploy linebot-backup-service &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--source&lt;/span&gt; &lt;span class="nb"&gt;.&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--region&lt;/span&gt; asia-east1 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--set-env-vars&lt;/span&gt; &lt;span class="s2"&gt;"GOOGLE_CLOUD_PROJECT=your-cool-project-id,&lt;/span&gt;&lt;span class="se"&gt;\&lt;/span&gt;&lt;span class="s2"&gt;
ChannelSecret=YOUR_LINE_SECRET_XXXX,&lt;/span&gt;&lt;span class="se"&gt;\&lt;/span&gt;&lt;span class="s2"&gt;
ChannelAccessToken=YOUR_LINE_TOKEN_XXXX,&lt;/span&gt;&lt;span class="se"&gt;\&lt;/span&gt;&lt;span class="s2"&gt;
GOOGLE_CLIENT_ID=PENDING,&lt;/span&gt;&lt;span class="se"&gt;\&lt;/span&gt;&lt;span class="s2"&gt;
GOOGLE_CLIENT_SECRET=PENDING,&lt;/span&gt;&lt;span class="se"&gt;\&lt;/span&gt;&lt;span class="s2"&gt;
GOOGLE_REDIRECT_URL=PENDING"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--allow-unauthenticated&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--quiet&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;After 3 to 5 minutes, get the Service URL, such as &lt;code&gt;https://linebot-backup-service-xxxxx.a.run.app&lt;/code&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Supplement the Real OAuth Settings
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt; Go back to the Console and change the &lt;code&gt;https://placeholder/oauth/callback&lt;/code&gt; you just filled in to &lt;code&gt;https://linebot-backup-service-xxxxx.a.run.app/oauth/callback&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt; Paste the real Client ID / Secret to Gemini CLI and ask it to help you update:
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;gcloud run services update linebot-backup-service &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--region&lt;/span&gt; asia-east1 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--update-env-vars&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
&lt;span class="s2"&gt;"GOOGLE_REDIRECT_URL=https://linebot-backup-service-xxxxx.a.run.app/oauth/callback,&lt;/span&gt;&lt;span class="se"&gt;\&lt;/span&gt;&lt;span class="s2"&gt;
GOOGLE_CLIENT_ID=real-client-id.apps.googleusercontent.com,&lt;/span&gt;&lt;span class="se"&gt;\&lt;/span&gt;&lt;span class="s2"&gt;
GOOGLE_CLIENT_SECRET=real-secret-xxxx"&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Phase Five: Point the LINE Webhook to Cloud Run
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt; Go back to &lt;a href="https://developers.line.biz/console/" rel="noopener noreferrer"&gt;LINE Developers Console&lt;/a&gt; → Messaging API tab.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Webhook URL&lt;/strong&gt;: Fill in &lt;code&gt;https://linebot-backup-service-xxxxx.a.run.app/callback&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt; Press &lt;strong&gt;Verify&lt;/strong&gt;, and expect to see &lt;code&gt;Success&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt; Toggle &lt;strong&gt;Use webhook&lt;/strong&gt; to on.&lt;/li&gt;
&lt;li&gt; Finally, go back to OA Manager and reconfirm that "Auto-reply messages" is off and "Webhook" is on.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Open LINE, add the Bot as a friend, throw a picture, run OAuth once, and see a folder &lt;code&gt;LINE Bot Uploads/2026-05/...&lt;/code&gt; in Drive — the entire process is complete.&lt;/p&gt;




&lt;h2&gt;
  
  
  Common Maintenance Commands
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Function&lt;/th&gt;
&lt;th&gt;Command&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Redeploy&lt;/td&gt;
&lt;td&gt;&lt;code&gt;gcloud run deploy linebot-backup-service --source . --region asia-east1&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Change env vars&lt;/td&gt;
&lt;td&gt;&lt;code&gt;gcloud run services update linebot-backup-service --update-env-vars "KEY=VALUE"&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Real-time log&lt;/td&gt;
&lt;td&gt;&lt;code&gt;gcloud beta run services logs tail linebot-backup-service&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Check service status&lt;/td&gt;
&lt;td&gt;&lt;code&gt;gcloud run services describe linebot-backup-service --region asia-east1&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The entire maintenance can actually be given to Gemini CLI: "&lt;strong&gt;Help me check the logs of linebot-backup-service for the last 5 minutes, and find 5xx&lt;/strong&gt;" is enough.&lt;/p&gt;




&lt;h2&gt;
  
  
  Workshop On-Site Pitfall Records
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Pitfall One: Billing Not Enabled, Red Error on First Deploy
&lt;/h3&gt;

&lt;p&gt;The first &lt;code&gt;gcloud run deploy&lt;/code&gt; directly spewed:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;FAILED_PRECONDITION: Billing account for project [your-cool-project-id] is not found.
Please ensure that you have linked an active billing account.

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Reason&lt;/strong&gt;: Most workshop participants open new projects to do this, and new projects don't have Billing bound by default. Cloud Run, Cloud Build, and Artifact Registry all require billing to run — even within the free tier, you must have a "billing account with a linked card" attached to the project.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Solution&lt;/strong&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Check the current billing status of the project&lt;/span&gt;
gcloud beta billing projects describe your-cool-project-id

&lt;span class="c"&gt;# List available billing accounts&lt;/span&gt;
gcloud beta billing accounts list

&lt;span class="c"&gt;# Bind&lt;/span&gt;
gcloud beta billing projects &lt;span class="nb"&gt;link &lt;/span&gt;your-cool-project-id &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--billing-account&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;0X0X0X-0X0X0X-0X0X0X

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If you can't or don't want to bind a card, we used the " &lt;strong&gt;sandbox project with billing already&lt;/strong&gt; " as a demonstration on site.&lt;/p&gt;

&lt;h3&gt;
  
  
  Pitfall Two: Firestore type Parameter Name
&lt;/h3&gt;

&lt;p&gt;The first version of the teaching material (even what AI guessed the first time) was written as &lt;code&gt;--type=native&lt;/code&gt; or &lt;code&gt;--type=native-mode&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;ERROR: argument --type: Invalid choice: 'native-mode'.
  Valid choices: ['firestore-native', 'datastore-mode']

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Reason&lt;/strong&gt;: After an update in 2024, &lt;code&gt;gcloud firestore databases create&lt;/code&gt; changed the type parameter value to the more explicit &lt;code&gt;firestore-native&lt;/code&gt; / &lt;code&gt;datastore-mode&lt;/code&gt;. Old documents and old answers (including LLM training data) will give you the old values.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Solution&lt;/strong&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;gcloud firestore databases create &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--location&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;asia-east1 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;firestore-native

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This pitfall just demonstrated why you should install the &lt;strong&gt;Google Developer Knowledge MCP&lt;/strong&gt; — after mounting it, Gemini will check the latest official documentation and will not give you outdated type values.&lt;/p&gt;

&lt;h3&gt;
  
  
  Pitfall Three: Forgot to Enable Drive API, OAuth Passed but Can't Write In
&lt;/h3&gt;

&lt;p&gt;After deployment, Webhook is set up, OAuth consent screen is completed, and the token is obtained, &lt;strong&gt;but the first picture upload is 500&lt;/strong&gt;. Check the log:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;googleapi: Error 403: Google Drive API has not been used in project
your-cool-project-id before or it is disabled.

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Reason&lt;/strong&gt;: If you miss &lt;code&gt;drive.googleapis.com&lt;/code&gt; in the &lt;code&gt;gcloud services enable ...&lt;/code&gt; string in Phase Two, OAuth can pass (because the Consent Screen and Drive API are two different things), but your server will be blocked when it uses the access token to call &lt;code&gt;drive.googleapis.com&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Solution (Quickest)&lt;/strong&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;gcloud services &lt;span class="nb"&gt;enable &lt;/span&gt;drive.googleapis.com

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Solution (Fundamental)&lt;/strong&gt;: Enable all the APIs you need at once, list them in the checklist of the teaching material, and run along with it on site so you won't miss it. I specifically wrote &lt;code&gt;drive.googleapis.com&lt;/code&gt; into the string in Phase Two to block this pitfall.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;[!TIP] A good habit for debugging: &lt;strong&gt;As long as the server has the correct token but is 403&lt;/strong&gt;, first go to &lt;a href="https://console.cloud.google.com/apis/library" rel="noopener noreferrer"&gt;API Library&lt;/a&gt; to confirm that the corresponding API is enabled, then check the OAuth scope, and finally look at IAM. The wrong order will waste a lot of time.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  Why is this combination worth learning?
&lt;/h2&gt;

&lt;p&gt;After the workshop, I asked the on-site participants what moment they felt the most, and the answer was almost unanimous: &lt;strong&gt;"Deploying the service just by speaking Chinese to Gemini CLI"&lt;/strong&gt; that moment.&lt;/p&gt;

&lt;p&gt;So why does it feel that way? Breaking it down:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; &lt;strong&gt;Previously, DevOps was stuck on &lt;em&gt;remembering which command&lt;/em&gt;, now it's stuck on &lt;em&gt;expressing clearly what you want to do&lt;/em&gt;&lt;/strong&gt;. The latter is much lower in threshold, with newcomers getting started in three days vs. three months before daring to touch &lt;code&gt;gcloud&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;MCP injects official knowledge into Gemini in advance&lt;/strong&gt;. You no longer need to RTFM yourself first, then translate it into a prompt for LLM; MCP is equivalent to letting LLM have the ability to RTFM itself.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Error messages return to the tool itself&lt;/strong&gt;. Previously, you had to Google + StackOverflow for errors, now you can directly paste them back to the CLI, which reads the error and then decides the next step — forming a complete plan-act-observe loop.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;The entire workflow is reproducible&lt;/strong&gt;. The teaching materials, examples, and prompts are all in the GitHub repo, and anyone can clone it and follow along, and the results should be consistent.&lt;/li&gt;
&lt;/ol&gt;




&lt;h2&gt;
  
  
  Want to go deeper? Recommended Advanced Reading
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;  Official Materials: &lt;a href="https://github.com/kkdai/BwAI-2026" rel="noopener noreferrer"&gt;&lt;code&gt;kkdai/BwAI-2026&lt;/code&gt;&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;  Example Project: &lt;a href="https://github.com/kkdai/bwai2026-sample" rel="noopener noreferrer"&gt;&lt;code&gt;kkdai/bwai2026-sample&lt;/code&gt;&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;  Slides: &lt;a href="https://speakerdeck.com/line_developers_tw/20260514-build-with-ai-2026-build-line-bot-with-gemini-cli" rel="noopener noreferrer"&gt;SpeakerDeck&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;  Gemini CLI: &lt;a href="https://github.com/google/gemini-cli" rel="noopener noreferrer"&gt;github.com/google/gemini-cli&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;  MCP Specification: &lt;a href="https://modelcontextprotocol.io/" rel="noopener noreferrer"&gt;modelcontextprotocol.io&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;  Extension: &lt;a href="https://dev.to/gde/gemini-cli-google-developer-knowledge-api-and-mcp-server-equipping-your-ai-assistant-with-an-3gee"&gt;Using Gemini CLI + Developer Knowledge MCP&lt;/a&gt;, &lt;a href="https://dev.to/gde/geminigoogle-maps-building-location-aware-ai-apps-with-the-google-maps-grounding-api-4l36"&gt;Map MCP Grounding&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Postscript: Come to LINE and Make Things Together
&lt;/h2&gt;

&lt;p&gt;This workshop is also one of the recruitment events for our LINE Taiwan DevRel. If you read this and feel:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  Want to play with the integration of LINE Messaging API + Google Cloud + Gemini for a long time.&lt;/li&gt;
&lt;li&gt;  Like to write production code while making the process into teaching materials that can be copied by others.&lt;/li&gt;
&lt;li&gt;  Can invest more than three days a week and are willing to become a full-time partner after the internship.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Welcome to send me a private message or email to chat, we have a &lt;strong&gt;flexible internship program of three days a week&lt;/strong&gt;, and if you do well, you have the opportunity to become a long-term partner.&lt;/p&gt;

&lt;p&gt;Finally, thank you to all the developers who came to the site and did hands-on together — those who are willing to spend their weekends on "using new tools to get through the entire pipeline" are always the most admirable group in the community. See you next time!&lt;/p&gt;

</description>
      <category>cli</category>
      <category>gemini</category>
      <category>mcp</category>
      <category>tutorial</category>
    </item>
    <item>
      <title>Gemini API File Search: Enhanced Multimodal Capabilities with Embedding 2, Including Open-Source LINE Bot Implementation</title>
      <dc:creator>Evan Lin</dc:creator>
      <pubDate>Tue, 12 May 2026 04:17:48 +0000</pubDate>
      <link>https://dev.to/gde/gemini-api-file-search-enhanced-multimodal-capabilities-with-embedding-2-including-open-source-g72</link>
      <guid>https://dev.to/gde/gemini-api-file-search-enhanced-multimodal-capabilities-with-embedding-2-including-open-source-g72</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Faeqghkoo1xi76898n5zl.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Faeqghkoo1xi76898n5zl.png" alt="image-20260511221639333" width="800" height="325"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;(Image source: &lt;a href="https://blog.google/innovation-and-ai/technology/developers-tools/expanded-gemini-api-file-search-multimodal-rag/" rel="noopener noreferrer"&gt;Google Blog - Gemini API File Search is now multimodal: build efficient, verifiable RAG&lt;/a&gt;)&lt;/p&gt;

&lt;h1&gt;
  
  
  Recap: RAG Finally Doesn't Need to Build Legos
&lt;/h1&gt;

&lt;p&gt;In the past few years, whenever developers thought about RAG (Retrieval-Augmented Generation), the component list that came to mind probably looked like this:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A chunker (langchain? Write it yourself?)&lt;/li&gt;
&lt;li&gt;An embedding model (OpenAI text-embedding-3? Cohere? BGE?)&lt;/li&gt;
&lt;li&gt;A vector database (ChromaDB, FAISS, pgvector, Pinecone… which one to choose is a battle)&lt;/li&gt;
&lt;li&gt;A retrieval + rerank process&lt;/li&gt;
&lt;li&gt;And then the LLM&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Not to mention that multimodal RAG needs another layer: How to embed images? Do you need to OCR first? Do you need to split two stores, one for text and one for images? How to calculate scores for mixed text and image search? Just these few questions can take up a sprint.&lt;/p&gt;

&lt;p&gt;Recently, Google released &lt;a href="https://blog.google/innovation-and-ai/technology/developers-tools/expanded-gemini-api-file-search-multimodal-rag/" rel="noopener noreferrer"&gt;Expanded Gemini API File Search for multimodal RAG&lt;/a&gt; on the developer blog, turning the long pipeline above into " &lt;strong&gt;calling a managed API&lt;/strong&gt; ", and &lt;strong&gt;images are natively supported&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;This article will do two things:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; Explain the new features clearly, including what &lt;strong&gt;Gemini Embedding 2&lt;/strong&gt; is doing behind the scenes.&lt;/li&gt;
&lt;li&gt; Use an &lt;strong&gt;open-source&lt;/strong&gt; LINE Bot (&lt;a href="https://github.com/kkdai/linebot-multimodal-rag" rel="noopener noreferrer"&gt;&lt;code&gt;kkdai/linebot-multimodal-rag&lt;/code&gt;&lt;/a&gt;) as a live demonstration to see how the new features are combined in actual production code — and share the two typical pitfalls I encountered during debugging to help everyone avoid them.&lt;/li&gt;
&lt;/ol&gt;




&lt;h2&gt;
  
  
  Three Major Highlights of the New Features
&lt;/h2&gt;

&lt;p&gt;According to the official blog, the core of this upgrade is three things:&lt;/p&gt;

&lt;h3&gt;
  
  
  1. True Multimodal File Search (Native Multimodal File Search)
&lt;/h3&gt;

&lt;p&gt;In the past, File Search was pure text retrieval, and images could only be indexed by OCRing them into text.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“File Search now processes images and text together. Powered by the Gemini Embedding 2 model, the tool understands native image data.”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Now you can &lt;strong&gt;directly put images into the File Search Store&lt;/strong&gt;, and index them together with text. The engine behind it is &lt;strong&gt;Gemini Embedding 2&lt;/strong&gt; — text, images, videos, audio, and documents &lt;strong&gt;share the same vector space&lt;/strong&gt;, so you can "find text with images", "find images with text", or "find images with images" without having to align the spaces yourself.&lt;/p&gt;

&lt;p&gt;For us product people, this means:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Mixed text and image search is no longer a research topic&lt;/strong&gt;, it's an API call.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;No need to maintain two stores&lt;/strong&gt; (one for text chunks and one for CLIP-style image embeddings).&lt;/li&gt;
&lt;li&gt;  Scientific charts, UI screenshots, reports, photo albums... these &lt;strong&gt;things that used to lose most of their meaning after OCR&lt;/strong&gt; can now retain the original visual information for retrieval.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  2. Custom Metadata and Server-side Filtering
&lt;/h3&gt;

&lt;p&gt;Each file you put into the store can now be tagged with key-value labels:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"key"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"user_id"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"string_value"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"U1234abcd..."&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"key"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"department"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"string_value"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Legal"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"key"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"status"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"string_value"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Final"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;

&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Use the &lt;a href="https://google.aip.dev/160" rel="noopener noreferrer"&gt;google.aip.dev/160&lt;/a&gt; filter syntax (same format as most GCP list APIs) when querying:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="n"&gt;metadata_filter&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;'department="Legal" AND status="Final"'&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Filtering is done &lt;strong&gt;first on Google's side&lt;/strong&gt;, not retrieving a bunch and then discarding. After reducing the noise, the &lt;strong&gt;speed and accuracy will both increase&lt;/strong&gt;, which is a lifesaver for multi-tenant SaaS — one store with metadata filters can separate tenants, without the need to isolate N stores.&lt;/p&gt;

&lt;p&gt;My LINE Bot uses this directly to do &lt;strong&gt;per-user data isolation&lt;/strong&gt;: each time a file is uploaded, it's tagged with the LINE &lt;code&gt;user_id&lt;/code&gt;, and when querying, a filter is applied, so user A will never see user B's data in the Q&amp;amp;A.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Page-level Citations
&lt;/h3&gt;

&lt;p&gt;Each cited snippet in the response will now include the &lt;strong&gt;page number&lt;/strong&gt;.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“captures the page number for every piece of indexed information.”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;This is super critical for enterprise customers. "AI says Y is mentioned on page X of the contract" vs. "AI says Y is mentioned in the contract" — the former can be directly accepted by legal/auditing, while the latter requires manual effort to flip through the book for verification. Page numbers unlock the final mile of "LLM answers cannot be traced back to the source".&lt;/p&gt;




&lt;h2&gt;
  
  
  The Multimodal Engine: Gemini Embedding 2
&lt;/h2&gt;

&lt;p&gt;The core of the new feature is this &lt;a href="https://deepmind.google/models/gemini/embedding/" rel="noopener noreferrer"&gt;Gemini Embedding 2&lt;/a&gt; model. Quote its specifications for your selection decisions:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fv6qi7ndky7i4xyvit5vs.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fv6qi7ndky7i4xyvit5vs.png" alt="image-20260511221801984" width="800" height="534"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Item&lt;/th&gt;
&lt;th&gt;Specification&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Supported Input&lt;/td&gt;
&lt;td&gt;
&lt;strong&gt;Text, images, videos, audio, documents&lt;/strong&gt; (same embedding space)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Input token limit&lt;/td&gt;
&lt;td&gt;8,192 tokens&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Output dimensions&lt;/td&gt;
&lt;td&gt;128 ～ 3,072 (using Matryoshka Representation Learning, small dimensions can also maintain similar accuracy)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Multilingual support&lt;/td&gt;
&lt;td&gt;100+ languages&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Several key benchmarks (recall@1):&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Text-to-Image Search&lt;/strong&gt;: TextCaps &lt;strong&gt;89.6&lt;/strong&gt; / Docci &lt;strong&gt;93.4&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Image-to-Text Search&lt;/strong&gt;: TextCaps &lt;strong&gt;97.4&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Multilingual (MTEB)&lt;/strong&gt;: mean &lt;strong&gt;69.9&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Video-Text Matching&lt;/strong&gt;: Vatex ndcg@10 &lt;strong&gt;68.8&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Speech-Text Retrieval&lt;/strong&gt;: MSEB mrr@10 &lt;strong&gt;73.9&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Several key observations:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Matryoshka is not a buzzword&lt;/strong&gt;: You can store it with 3072 dimensions first, and when running retrieval, switch to 768 dimensions to run faster and maintain quality. Storage/scoring costs can be optimized in stages.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Cross-modal scores are very real&lt;/strong&gt;: 97.4% recall@1 (image→text) means that if you have an image and want to find the corresponding descriptive text, you'll find it almost immediately. This can be directly implemented for use cases like "take a picture of a product label and find the corresponding page of the user manual".&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;100+ languages&lt;/strong&gt;: This is a very real difference for the Taiwan/Japan/Korea/Southeast Asia markets.&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  What Developers Really Care About: Price and Access Cost
&lt;/h2&gt;

&lt;p&gt;From the official tutorial article &lt;a href="https://dev.to/googleai/multimodal-rag-with-the-gemini-api-file-search-tool-a-developer-guide-5878"&gt;Multimodal RAG with the Gemini API File Search tool: a developer guide&lt;/a&gt;, there are two sections that developers sensitive to cost should highlight:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“Fully managed, with no vector database overhead.”&lt;/p&gt;

&lt;p&gt;“Storage and query-time embeddings are free. You only pay for indexing and tokens.”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;In plain English:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;You don't pay for the vector database&lt;/strong&gt;, nor do you pay for the monthly salary of the people maintaining it.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Storage is free&lt;/strong&gt;, and &lt;strong&gt;embedding calculations at query time are also free&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;  You only have two things to pay for: &lt;strong&gt;the embedding fee for the initial indexing&lt;/strong&gt; and &lt;strong&gt;the LLM tokens consumed when generating the answer&lt;/strong&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This is a friendly cost curve for personal side projects and early startups — you don't need to decide on day one "can I afford the baseline of the vector DB".&lt;/p&gt;




&lt;h2&gt;
  
  
  Standard Workflow: 4 SDK calls to complete a RAG
&lt;/h2&gt;

&lt;p&gt;Organized from the dev.to guide, the minimum viable workflow:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;google&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;genai&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;google.genai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;types&lt;/span&gt;

&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;genai&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Client&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="c1"&gt;# 1. Create a store (specify the multimodal embedding model)
&lt;/span&gt;&lt;span class="n"&gt;store&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;file_search_stores&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;display_name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;my-multimodal-rag&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;embedding_model&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;models/gemini-embedding-2&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;})&lt;/span&gt;

&lt;span class="c1"&gt;# 2. Upload files + custom metadata
&lt;/span&gt;&lt;span class="n"&gt;operation&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;file_search_stores&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;upload_to_file_search_store&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;file_search_store_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;store&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="nb"&gt;file&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;report-q1.pdf&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;display_name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Q1 Report&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;custom_metadata&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
            &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;key&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;department&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;string_value&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Finance&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
            &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;key&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;year&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;string_value&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;2026&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
        &lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="p"&gt;},&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="c1"&gt;# Upload is a long-running operation, needs to poll:
# operation = client.operations.get(operation)
&lt;/span&gt;
&lt;span class="c1"&gt;# 3. Feed file_search as a tool to generate_content
&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;models&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;generate_content&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gemini-3-flash-preview&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;contents&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;What was the revenue growth rate in the first quarter of last year?&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;types&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;GenerateContentConfig&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;types&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Tool&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;file_search&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;types&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;FileSearch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;file_search_store_names&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;store&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
            &lt;span class="n"&gt;metadata_filter&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;department=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Finance&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt; AND year=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;2026&lt;/span&gt;&lt;span class="sh"&gt;"'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;))],&lt;/span&gt;
    &lt;span class="p"&gt;),&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# 4. Get citations (including page numbers)
&lt;/span&gt;&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;citation&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;candidates&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;grounding_metadata&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;grounding_chunks&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;citation&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;web&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;uri&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;citation&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;web&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;title&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="c1"&gt;# or the corresponding file/page fields
&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;To provide citations with images to the user, there is also &lt;code&gt;client.file_search_stores.download_media()&lt;/code&gt; that can be called.&lt;/p&gt;

&lt;p&gt;It's no exaggeration, &lt;strong&gt;the entire multimodal RAG is less than 30 lines of code&lt;/strong&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  Demo Case: Putting These New Features into a LINE Bot
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fax8dlyjqm7ty2z00fv10.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fax8dlyjqm7ty2z00fv10.png" alt="image-20260511221916359" width="800" height="1734"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdzg2z445gd33i5ianvxd.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdzg2z445gd33i5ianvxd.png" alt="image-20260511221851736" width="800" height="1734"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;It's abstract just looking at the SDK examples, so I made it into a LINE Bot that can be put to work, open-sourced at &lt;a href="https://github.com/kkdai/linebot-multimodal-rag" rel="noopener noreferrer"&gt;&lt;code&gt;kkdai/linebot-multimodal-rag&lt;/code&gt;&lt;/a&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  Users drop &lt;strong&gt;PDFs / images / text files&lt;/strong&gt; into the LINE chat box → Bot indexes into the File Search Store.&lt;/li&gt;
&lt;li&gt;  Users type questions → Gemini finds answers from the data &lt;strong&gt;uploaded by the user themselves&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;  Users drop an image and ask a question → The same can be done for image-to-text retrieval.&lt;/li&gt;
&lt;li&gt;  Deployment target: GCP Cloud Run + Cloud Build automatic deployment.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The architecture is very intuitive (key fields):&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Component&lt;/th&gt;
&lt;th&gt;Role&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;LINE Webhook&lt;/td&gt;
&lt;td&gt;FastAPI receives message events&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;GCS&lt;/td&gt;
&lt;td&gt;Persists original files (&lt;code&gt;uploads/{user_id}/{message_id}.{ext}&lt;/code&gt;)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Gemini File Search Store&lt;/td&gt;
&lt;td&gt;The only index layer (managed)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Custom metadata &lt;code&gt;user_id&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;Multi-tenant isolation&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;FastAPI BackgroundTasks&lt;/td&gt;
&lt;td&gt;Avoid the LINE reply token 30-second limit&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Comparing to the three major new features mentioned earlier:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Multimodal&lt;/strong&gt;: Users drop images, drop PDFs, all go into the same store, and all consume the same pipeline during search.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Custom metadata&lt;/strong&gt;: Files for each LINE user are tagged with &lt;code&gt;user_id&lt;/code&gt;, filtered during queries, achieving server-side forced isolation.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Page-level citations&lt;/strong&gt;: In the future, to display "the answer comes from XX.pdf page 5" in LINE messages, directly consume &lt;code&gt;grounding_metadata&lt;/code&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The entire repo is about 600 lines of Python, and it completes a " &lt;strong&gt;your own private multimodal knowledge base chat Bot&lt;/strong&gt; ".&lt;/p&gt;




&lt;h2&gt;
  
  
  Deployment Battle: commit → automatic online
&lt;/h2&gt;

&lt;p&gt;It's not enough for the open-source example to just run; to demonstrate it at the workshop, it needs to be at the level of "code changes, push to GitHub, and automatically deploy". This time, I asked &lt;a href="https://docs.anthropic.com/en/docs/claude-code" rel="noopener noreferrer"&gt;Claude Code&lt;/a&gt; to be my co-pilot to help me connect CI/CD.&lt;/p&gt;

&lt;p&gt;I only dropped one sentence:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"Help me create a Cloud Build connection to GitHub, and trigger a build to deploy to Cloud Run after committing to main."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Claude Code first scanned &lt;code&gt;cloudbuild.yaml&lt;/code&gt;, existing Cloud Run settings, Secret Manager, and Artifact Registry, and listed a "current problem", and then &lt;strong&gt;stopped to ask me a key decision&lt;/strong&gt;: Should I keep the existing service name or change the yaml? Does GitHub need authorization? After I answered, it built the missing resources in one go:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Build Artifact Registry repo&lt;/span&gt;
gcloud artifacts repositories create linebot &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--repository-format&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;docker &lt;span class="nt"&gt;--location&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;asia-east1

&lt;span class="c"&gt;# Secret migration: move from the current service to Secret Manager (via stdin, don't leave shell history)&lt;/span&gt;
gcloud run services describe linebot-gemini-file-search &lt;span class="nt"&gt;--region&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;asia-east1 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--format&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;'value(...)'&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  | gcloud secrets create LINE_CHANNEL_SECRET &lt;span class="nt"&gt;--data-file&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;-

&lt;span class="c"&gt;# Give Cloud Build / Compute SA the roles needed for deployment&lt;/span&gt;
&lt;span class="k"&gt;for &lt;/span&gt;role &lt;span class="k"&gt;in &lt;/span&gt;run.admin iam.serviceAccountUser artifactregistry.writer &lt;span class="se"&gt;\&lt;/span&gt;
            secretmanager.secretAccessor storage.objectAdmin logging.logWriter&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;do
  &lt;/span&gt;gcloud projects add-iam-policy-binding your-cool-project-id &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--member&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"serviceAccount:660825558664-compute@developer.gserviceaccount.com"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--role&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"roles/&lt;/span&gt;&lt;span class="nv"&gt;$role&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="nt"&gt;--condition&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;None
&lt;span class="k"&gt;done&lt;/span&gt;

&lt;span class="c"&gt;# Build trigger&lt;/span&gt;
gcloud builds triggers create github &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;linebot-multimodal-rag-main &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--repo-owner&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;kkdai &lt;span class="nt"&gt;--repo-name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;linebot-multimodal-rag &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--branch-pattern&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"^main$"&lt;/span&gt; &lt;span class="nt"&gt;--build-config&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;cloudbuild.yaml

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The only thing that couldn't be automated was &lt;strong&gt;GitHub OAuth authorization&lt;/strong&gt; — Claude Code directly admitted to me that "this step can only be done by clicking in the Console", and provided the URL and step-by-step instructions. After finishing the one-minute click, the trigger ran through.&lt;/p&gt;




&lt;h2&gt;
  
  
  Pitfalls Record: Two Traps Directly Related to the New Features
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Pitfall 1: Hardcoded Model ID is Outdated
&lt;/h3&gt;

&lt;p&gt;The default values in &lt;code&gt;cloudbuild.yaml&lt;/code&gt; and code both write &lt;code&gt;gemini-3.1-flash&lt;/code&gt;, but after looking at the &lt;a href="https://ai.google.dev/gemini-api/docs/models" rel="noopener noreferrer"&gt;Gemini API's current model id list&lt;/a&gt;: there's no such model at all. The correct ID for Gemini 3 Flash is &lt;code&gt;gemini-3-flash-preview&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why this happened&lt;/strong&gt;: multimodal RAG is a very new feature, and related documents, tutorials, and examples are still being created in large numbers, and the naming has also been slightly adjusted. The initial version of the Repo can easily write an id that "looks like it but doesn't actually exist".&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Solution&lt;/strong&gt;: Change the entire repo to &lt;code&gt;gemini-3-flash-preview&lt;/code&gt;, and also confirm that the embedding model is &lt;code&gt;models/gemini-embedding-2&lt;/code&gt; (correct, didn't step on the trap). After pushing, Cloud Build automatically triggered, and a new revision went online in three minutes.&lt;/p&gt;

&lt;h3&gt;
  
  
  Pitfall 2: Mysterious "Upload has already been terminated"
&lt;/h3&gt;

&lt;p&gt;This trap was directly stepped on the " &lt;strong&gt;image upload&lt;/strong&gt; " path newly supported by File Search Store — it's also the most worth sharing, because it demonstrates that "the error messages of new APIs are sometimes very euphemistic".&lt;/p&gt;

&lt;p&gt;I sent a JPG from LINE to the Bot and clicked "store in database", and the result:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;❌ Failed to store: 400 Bad Request. {'message': 'Upload has already been terminated.', 'status': 'Bad Request'}

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Couldn't see the reason at all. Cloud Logging only had the same error, no stack trace. After looking around on the &lt;a href="https://discuss.ai.google.dev/" rel="noopener noreferrer"&gt;Google AI Developers Forum&lt;/a&gt;, I found that several file types (.md / .xlsx / large CSV) had encountered similar reports.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The real culprit&lt;/strong&gt; is hidden in this seemingly innocent code:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# app/gemini_service.py (before modification)
&lt;/span&gt;&lt;span class="n"&gt;suffix&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;mimetypes&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;guess_extension&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;mime_type&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;.bin&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="n"&gt;tempfile&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;NamedTemporaryFile&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;suffix&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;suffix&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;delete&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;tmp&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;tmp&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;write&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;file_bytes&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;tmp_path&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;tmp&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Before Python 3.13, &lt;code&gt;mimetypes.guess_extension("image/jpeg")&lt;/code&gt; &lt;strong&gt;returns &lt;code&gt;.jpe&lt;/code&gt;, not &lt;code&gt;.jpg&lt;/code&gt;&lt;/strong&gt;. The reason is that in the MIME table of the standard library, &lt;code&gt;.jpe&lt;/code&gt; is lexicographically before &lt;code&gt;.jpg&lt;/code&gt;, and this quirk has existed for nearly twenty years.&lt;/p&gt;

&lt;p&gt;Gemini File Search Store doesn't recognize the file extension &lt;code&gt;.jpe&lt;/code&gt;, but the API's message uses "Upload has already been terminated" in a way that is very easy to mislead — at first, I thought it was because the upload size exceeded, or it was choked by concurrency, or there was a race inside the SDK.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Solution&lt;/strong&gt;: Take the file extension directly from &lt;code&gt;display_name&lt;/code&gt; (handlers have already been correctly set to &lt;code&gt;image_&amp;lt;id&amp;gt;.jpg&lt;/code&gt;), and use an explicit MIME comparison table as a backup:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# app/gemini_service.py (after modification)
&lt;/span&gt;&lt;span class="n"&gt;_MIME_TO_EXT&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;image/jpeg&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;.jpg&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;image/png&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;.png&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;image/webp&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;.webp&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;application/pdf&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;.pdf&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="c1"&gt;# ...
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;display_name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;suffix&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;display_name&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;rsplit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)[&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;lower&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;suffix&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;_MIME_TO_EXT&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;mime_type&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="n"&gt;mimetypes&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;guess_extension&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;mime_type&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;.bin&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;[BG Store] uploading display_name=&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;display_name&lt;/span&gt;&lt;span class="si"&gt;!r}&lt;/span&gt;&lt;span class="s"&gt; mime=&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;mime_type&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
      &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;size=&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;file_bytes&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; tmp_suffix=&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;suffix&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Also, add &lt;code&gt;traceback.format_exc()&lt;/code&gt; to the &lt;code&gt;except&lt;/code&gt; part, so that the next time something goes wrong, Cloud Logging will have the full stack.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The takeaway from this story&lt;/strong&gt;: When you're running on a new modality on a "newly GA'd API", please be sure to:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; &lt;strong&gt;First confirm on the client side that the filename / file extension you generate is the format expected by the API&lt;/strong&gt;, don't trust the &lt;code&gt;mimetypes&lt;/code&gt; standard library to guess for you.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Write the stack trace into the log&lt;/strong&gt;, otherwise you can't save yourself from the esoteric discussions on the forum like "just change a file".&lt;/li&gt;
&lt;li&gt; Compare the file extension you generate with the &lt;a href="https://ai.google.dev/gemini-api/docs/file-search" rel="noopener noreferrer"&gt;Gemini File Search official supported format list&lt;/a&gt;.&lt;/li&gt;
&lt;/ol&gt;




&lt;h2&gt;
  
  
  Summary: The Entry Fee for Multimodal RAG, the Lowest in History
&lt;/h2&gt;

&lt;p&gt;This time's Gemini API File Search upgrade compresses a feature line that used to take 3 months to go online into " &lt;strong&gt;dozens of lines of code + a managed API&lt;/strong&gt; " to run:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Native multimodal support&lt;/strong&gt;: Text, images, videos, audio, and documents share the same embedding space, goodbye to the OCR transition layer.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Custom metadata + server-side filter&lt;/strong&gt;: Multi-tenant SaaS doesn't need to struggle with how many stores to split.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Page-level citations&lt;/strong&gt;: Enterprise compliance scenarios finally have native grounding.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Friendly to money&lt;/strong&gt;: Storage / query embedding are both free, only pay for indexing + LLM tokens.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Cross-modal scores of Embedding 2&lt;/strong&gt;: 97.4% recall@1 is not a demo number, it's the level that can directly support the product.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you want to directly see a production-shaped end-to-end example: &lt;a href="https://github.com/kkdai/linebot-multimodal-rag" rel="noopener noreferrer"&gt;&lt;code&gt;kkdai/linebot-multimodal-rag&lt;/code&gt;&lt;/a&gt; the entire repo PR welcome, and you're also welcome to use it to modify it into your own domain's RAG application — Notion knowledge base, employee manual Q&amp;amp;A machine, photo album manager, research paper index... probably only imagination will limit you.&lt;/p&gt;

&lt;p&gt;If you want to get started, the recommended reading order:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; Google official blog: &lt;a href="https://blog.google/innovation-and-ai/technology/developers-tools/expanded-gemini-api-file-search-multimodal-rag/" rel="noopener noreferrer"&gt;Expanded Gemini API File Search for multimodal RAG&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt; Gemini Embedding 2 specification page: &lt;a href="https://deepmind.google/models/gemini/embedding/" rel="noopener noreferrer"&gt;deepmind.google/models/gemini/embedding&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt; Developer implementation guide: &lt;a href="https://dev.to/googleai/multimodal-rag-with-the-gemini-api-file-search-tool-a-developer-guide-5878"&gt;Multimodal RAG with the Gemini API File Search tool: a developer guide&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt; My open-source example: &lt;a href="https://github.com/kkdai/linebot-multimodal-rag" rel="noopener noreferrer"&gt;github.com/kkdai/linebot-multimodal-rag&lt;/a&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Welcome everyone to try out this very powerful Multimodal RAG support!&lt;/p&gt;

</description>
      <category>api</category>
      <category>gemini</category>
      <category>llm</category>
      <category>rag</category>
    </item>
    <item>
      <title>[GCP Practice][BwAI] AI-Powered Development: Quickly Deploy a LINE Bot Cloud Backup Tool with Gemini CLI</title>
      <dc:creator>Evan Lin</dc:creator>
      <pubDate>Thu, 07 May 2026 04:36:40 +0000</pubDate>
      <link>https://dev.to/gde/gcp-practicebwai-ai-powered-development-quickly-deploy-a-line-bot-cloud-backup-tool-with-4ghi</link>
      <guid>https://dev.to/gde/gcp-practicebwai-ai-powered-development-quickly-deploy-a-line-bot-cloud-backup-tool-with-4ghi</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqpdr5gzv1yj95xvae4ss.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqpdr5gzv1yj95xvae4ss.png" alt="Preview Program 2026-05-05 12.38.54" width="800" height="533"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h1&gt;
  
  
  Background
&lt;/h1&gt;

&lt;p&gt;In the upcoming &lt;strong&gt;Build With AI 2026&lt;/strong&gt; workshop, we're bringing a very practical project: the &lt;strong&gt;LINE Bot File Backup Robot&lt;/strong&gt;. It allows you to directly upload images and files from your LINE chatroom to Google Drive, and it will automatically create folders by month to keep things organized.&lt;/p&gt;

&lt;p&gt;Traditionally, putting a project like this, which includes OAuth authorization, a Firestore database, and Cloud Run container deployment, on the cloud would often leave beginners struggling with lengthy &lt;code&gt;gcloud&lt;/code&gt; commands.&lt;/p&gt;

&lt;p&gt;But this time it's different, we have a secret weapon: &lt;strong&gt;Gemini CLI&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;This article will document how we used AI as a DevOps engineer, completing the entire complex deployment process by "talking," and of course, including the various real pitfalls we encountered along the way.&lt;/p&gt;




&lt;h2&gt;
  
  
  Preparation: Summoning the AI Assistant
&lt;/h2&gt;

&lt;p&gt;Before we start, besides the basic &lt;code&gt;gcloud&lt;/code&gt; installation and login, you only need to install &lt;a href="https://github.com/google/gemini-cli" rel="noopener noreferrer"&gt;Gemini CLI&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Prepare the following "confidential parameters" (all are Mock processed in this article):&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;PROJECT_ID&lt;/strong&gt;: &lt;code&gt;your-cool-project-id&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;LINE Channel Secret&lt;/strong&gt;: &lt;code&gt;YOUR_LINE_SECRET_XXXX&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;LINE Access Token&lt;/strong&gt;: &lt;code&gt;YOUR_LINE_TOKEN_XXXX&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;After entering the project folder, I only said one sentence to Gemini CLI:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"Help me deploy to Cloud Run using gcloud, and stop and ask me if you need any information. Refer to the repo…"&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Next, it's time to witness miracles (and fix bugs).&lt;/p&gt;




&lt;h2&gt;
  
  
  Practical Deployment Process: AI Leading the Way
&lt;/h2&gt;

&lt;p&gt;Gemini CLI intelligently analyzed &lt;code&gt;Dockerfile&lt;/code&gt; and &lt;code&gt;main.go&lt;/code&gt; and immediately listed a set of battle plans.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 1: Environment Detection and API Enablement
&lt;/h3&gt;

&lt;p&gt;The AI first confirmed my current project settings in &lt;code&gt;gcloud&lt;/code&gt; and then enabled the necessary services in one go:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;gcloud services &lt;span class="nb"&gt;enable &lt;/span&gt;firestore.googleapis.com &lt;span class="se"&gt;\&lt;/span&gt;
  cloudbuild.googleapis.com &lt;span class="se"&gt;\&lt;/span&gt;
  run.googleapis.com &lt;span class="se"&gt;\&lt;/span&gt;
  artifactregistry.googleapis.com

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Step 2: Creating a Firestore Database (Encountering the First Pitfall)
&lt;/h3&gt;

&lt;p&gt;Our Bot needs to record the OAuth State anti-counterfeiting mark, so Firestore is needed. The AI tried to execute the command, but we immediately encountered an error. &lt;em&gt;(See the pitfall record below for details)&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;After correction, the correct command is:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;gcloud firestore databases create &lt;span class="nt"&gt;--location&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;asia-east1 &lt;span class="nt"&gt;--type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;firestore-native

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Step 3: Deploying Cloud Run First, Filling in the Blanks Later
&lt;/h3&gt;

&lt;p&gt;This is a classic "chicken or the egg" problem: Google OAuth needs to know your Cloud Run URL (Redirect URI), but your Cloud Run deployment needs to fill in the OAuth Client ID and Secret.&lt;/p&gt;

&lt;p&gt;Gemini CLI's strategy is great: &lt;strong&gt;Deploy with placeholders first!&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;gcloud run deploy linebot-backup-service &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--source&lt;/span&gt; &lt;span class="nb"&gt;.&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--region&lt;/span&gt; asia-east1 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--set-env-vars&lt;/span&gt; &lt;span class="s2"&gt;"GOOGLE_CLOUD_PROJECT=your-cool-project-id,ChannelSecret=YOUR_LINE_SECRET_XXXX,ChannelAccessToken=YOUR_LINE_TOKEN_XXXX,GOOGLE_CLIENT_ID=PENDING,GOOGLE_CLIENT_SECRET=PENDING,GOOGLE_REDIRECT_URL=PENDING"&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--allow-unauthenticated&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--quiet&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;After successful deployment, we got a string of fragrant URLs: &lt;code&gt;https://linebot-backup-service-xxxxx.a.run.app&lt;/code&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 4: Completing Google OAuth Settings and Environment Variable Updates
&lt;/h3&gt;

&lt;p&gt;With the URL, I can go to the "API &amp;amp; Services" in Google Cloud Console to complete the settings:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; Create an &lt;strong&gt;OAuth consent screen&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt; Create credentials for a &lt;strong&gt;Web application&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt; Fill in the "Authorized redirect URI" with the URL we just got, plus &lt;code&gt;/oauth/callback&lt;/code&gt;.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;After getting the real ID and Secret, I directly pasted the information to Gemini CLI, and it automatically updated the service for me:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;gcloud run services update linebot-backup-service &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--region&lt;/span&gt; asia-east1 &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--update-env-vars&lt;/span&gt; &lt;span class="s2"&gt;"GOOGLE_REDIRECT_URL=https://[YOUR_URL]/oauth/callback,GOOGLE_CLIENT_ID=real-client-id.apps.googleusercontent.com,GOOGLE_CLIENT_SECRET=real-secret-xxxx"&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Done! Finally, just go to the LINE Developers Console and fill in the Webhook.&lt;/p&gt;




&lt;h2&gt;
  
  
  Blood and Tears Pitfall Records During the Deployment Process
&lt;/h2&gt;

&lt;p&gt;It looks smooth, but in fact, the AI and I hit a few walls together. This is also the most real experience of using CLI tools.&lt;/p&gt;

&lt;h3&gt;
  
  
  Pitfall 1: Forgetting to Bind a Credit Card, the 390001 Error
&lt;/h3&gt;

&lt;p&gt;When executing the first &lt;code&gt;gcloud run deploy&lt;/code&gt;, the terminal directly spewed red text all over the face:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;code&gt;FAILED_PRECONDITION: Billing account for project is not found...&lt;/code&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;Reason&lt;/strong&gt;: Cloud Run and Cloud Build require the project to enable billing (Billing Enabled). This is a brand new test project, and I forgot to bind the billing account. &lt;strong&gt;Solution&lt;/strong&gt;: The AI immediately checked the project status for me (&lt;code&gt;gcloud beta billing projects describe&lt;/code&gt;) and asked me if I wanted to switch to a project with billing, or to fix it. I obediently went to the Console to bind my credit card, and the deployment was able to continue.&lt;/p&gt;

&lt;h3&gt;
  
  
  Pitfall 2: The Evolution of Command Parameter Syntax
&lt;/h3&gt;

&lt;p&gt;When creating Firestore, the AI initially gave the command &lt;code&gt;--type=native-mode&lt;/code&gt; or &lt;code&gt;--type=native&lt;/code&gt;, but gcloud didn't appreciate it:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;code&gt;ERROR: argument --type: Invalid choice: 'native-mode'&lt;/code&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;Reason&lt;/strong&gt;: The CLI parameters of &lt;code&gt;gcloud&lt;/code&gt; will change with version updates. &lt;strong&gt;Solution&lt;/strong&gt;: Carefully look at the gcloud error message, and now the correct parameter values are &lt;code&gt;firestore-native&lt;/code&gt; or &lt;code&gt;datastore-mode&lt;/code&gt;. After changing to &lt;code&gt;--type=firestore-native&lt;/code&gt;, it passed smoothly.&lt;/p&gt;

&lt;h3&gt;
  
  
  Pitfall 3: The Invisible "Drive API"
&lt;/h3&gt;

&lt;p&gt;When everything was deployed, we encountered a permission error when testing "upload to Google Drive". &lt;strong&gt;Reason&lt;/strong&gt;: This is a Bot that helps you upload files to Drive, but when we enabled the API in the first step, we actually forgot to enable the protagonist: &lt;strong&gt;Google Drive API&lt;/strong&gt;! Without it, even if OAuth authorization is successful, the program will still be blocked. &lt;strong&gt;Solution&lt;/strong&gt;: I only entered the mysterious &lt;code&gt;"3."&lt;/code&gt; (implying the third checkpoint) into the terminal, and the AI immediately understood and added this critical blow:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;gcloud services &lt;span class="nb"&gt;enable &lt;/span&gt;drive.googleapis.com

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Through Gemini CLI, the originally tedious and error-prone infrastructure construction work has become a "two-person pair programming" session.&lt;/p&gt;

&lt;p&gt;AI can help you remember lengthy gcloud parameters, help you sort out the deployment logic (deploy with PENDING first and then update), and even adjust strategies quickly based on error messages when you encounter errors.&lt;/p&gt;

&lt;p&gt;This is the core spirit that &lt;strong&gt;Build With AI 2026&lt;/strong&gt; wants to convey: let AI handle the tedious DevOps chores, so that developers can focus more energy on innovation in core business logic.&lt;/p&gt;

&lt;p&gt;If you are still manually typing long and ugly gcloud commands, I strongly recommend you install Gemini CLI and give it a try!&lt;/p&gt;

</description>
      <category>ai</category>
      <category>gemini</category>
      <category>googlecloud</category>
      <category>tutorial</category>
    </item>
    <item>
      <title>GCP Hands-on: Deploying OpenAB - Building a Gemini ACP Bridge for Telegram on GCE</title>
      <dc:creator>Evan Lin</dc:creator>
      <pubDate>Sat, 02 May 2026 12:01:51 +0000</pubDate>
      <link>https://dev.to/gde/gcp-hands-on-deploying-openab-building-a-gemini-acp-bridge-for-telegram-on-gce-1bd</link>
      <guid>https://dev.to/gde/gcp-hands-on-deploying-openab-building-a-gemini-acp-bridge-for-telegram-on-gce-1bd</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fez62lmp04cnsnbxlwngj.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fez62lmp04cnsnbxlwngj.png" alt="image-20260502171732526" width="800" height="436"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h1&gt;
  
  
  Background
&lt;/h1&gt;

&lt;p&gt;Recently, in order to enable AI coding assistants (such as Claude Code or Gemini CLI) to be used directly on chat platforms, I started researching &lt;strong&gt;&lt;a href="https://openabdev.github.io/openab/" rel="noopener noreferrer"&gt;OpenAB&lt;/a&gt;&lt;/strong&gt;. This is a powerful bridge that can connect Slack, Discord, or Telegram to CLI tools that comply with the &lt;strong&gt;ACP (Agent Client Protocol)&lt;/strong&gt; standard.&lt;/p&gt;

&lt;p&gt;This article documents the complete practical process of deploying &lt;a href="https://openabdev.github.io/openab/" rel="noopener noreferrer"&gt;OpenAB&lt;/a&gt; on Google Cloud, specifically how to bypass authentication restrictions, handle Telegram's HTTPS requirements, and resolve path and permission issues in containerized deployments.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;OpenAB Reference Documentation&lt;/strong&gt;: &lt;a href="https://openabdev.github.io/openab/" rel="noopener noreferrer"&gt;https://openabdev.github.io/openab/&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;OpenAB Repo&lt;/strong&gt;: &lt;a href="https://github.com/openabdev/openab" rel="noopener noreferrer"&gt;https://github.com/openabdev/openab&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Deployment Decision: Why GCE instead of Cloud Run?
&lt;/h2&gt;

&lt;p&gt;Although Cloud Run is my first choice, when dealing with OpenAB, &lt;strong&gt;Google Compute Engine (GCE)&lt;/strong&gt; is the best solution. There are two reasons:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; &lt;strong&gt;Stateful Session:&lt;/strong&gt; OpenAB will start a child process (such as Gemini CLI) for each conversation thread. These processes must reside for a long time to maintain the conversation context. Cloud Run's automatic scaling mechanism will kill these processes, leading to conversation interruption.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Authentication Persistence&lt;/strong&gt;: The AI CLI's Token needs to be stored on the local disk. GCE, combined with Persistent Disk, can ensure that the login status does not disappear after restarting.&lt;/li&gt;
&lt;/ol&gt;




&lt;h2&gt;
  
  
  Practical Steps: Step-by-Step Deployment Process
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Step 1: Writing an Automated Startup Script
&lt;/h3&gt;

&lt;p&gt;To standardize the deployment, we wrote a &lt;code&gt;setup-openab.sh&lt;/code&gt;. Its core task is to install Docker, create persistent directories, and dynamically generate &lt;code&gt;config.toml&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;The most critical part is the &lt;strong&gt;custom Docker Image&lt;/strong&gt;. Since the official OpenAB image does not necessarily include all AI tools, we install Node.js and &lt;code&gt;@google/gemini-cli&lt;/code&gt; on-site through Dockerfile:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight docker"&gt;&lt;code&gt;&lt;span class="k"&gt;FROM&lt;/span&gt;&lt;span class="s"&gt; ghcr.io/openabdev/openab:latest&lt;/span&gt;
&lt;span class="k"&gt;USER&lt;/span&gt;&lt;span class="s"&gt; root&lt;/span&gt;
&lt;span class="k"&gt;RUN &lt;/span&gt;apt-get update &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; apt-get &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-y&lt;/span&gt; curl &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="se"&gt;\
&lt;/span&gt;    curl &lt;span class="nt"&gt;-fsSL&lt;/span&gt; https://deb.nodesource.com/setup_20.x | bash - &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="se"&gt;\
&lt;/span&gt;    apt-get &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-y&lt;/span&gt; nodejs &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="se"&gt;\
&lt;/span&gt;    npm &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-g&lt;/span&gt; @google/gemini-cli
&lt;span class="k"&gt;USER&lt;/span&gt;&lt;span class="s"&gt; 1000&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Step 2: Using gcloud to Create a GCE Instance
&lt;/h3&gt;

&lt;p&gt;We chose the &lt;code&gt;e2-medium&lt;/code&gt; specification and passed sensitive information (such as Bot Token) through Metadata to avoid hardcoding it in the script.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;gcloud compute instances create openab-server &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--project&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;your-project-id &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--zone&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;asia-east1-b &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--machine-type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;e2-medium &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--image-family&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;debian-11 &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--image-project&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;debian-cloud &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--metadata-from-file&lt;/span&gt; startup-script&lt;span class="o"&gt;=&lt;/span&gt;setup-openab.sh &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--metadata&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nv"&gt;tg_bot_token&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;YOUR_BOT_TOKEN

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Step 3: Configuring the Gemini API Key
&lt;/h3&gt;

&lt;p&gt;Unlike Kiro, which requires interactive login, &lt;code&gt;gemini-cli&lt;/code&gt; can directly read environment variables. We inject the API Key into OpenAB's &lt;code&gt;config.toml&lt;/code&gt; to make it run automatically in the background:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight toml"&gt;&lt;code&gt;&lt;span class="nn"&gt;[agent]&lt;/span&gt;
&lt;span class="py"&gt;command&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"gemini"&lt;/span&gt;
&lt;span class="py"&gt;args&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;"--acp"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="py"&gt;env&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="py"&gt;GEMINI_API_KEY&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"AIzaSy..."&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Step 4: Using Cloudflare Tunnel to Solve HTTPS Requirements
&lt;/h3&gt;

&lt;p&gt;Telegram Webhook strictly requires &lt;strong&gt;HTTPS&lt;/strong&gt;. Instead of setting up a complex Nginx + SSL, I chose to use &lt;strong&gt;Cloudflare Quick Tunnel&lt;/strong&gt;:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; Run on VM: &lt;code&gt;cloudflared tunnel --url http://localhost:8080&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt; Get a randomly generated HTTPS URL.&lt;/li&gt;
&lt;li&gt; Register Webhook: &lt;code&gt;curl "https://api.telegram.org/bot&amp;lt;TOKEN&amp;gt;/setWebhook?url=&amp;lt;CF_URL&amp;gt;/webhook/telegram&amp;amp;secret_token=&amp;lt;SECRET&amp;gt;"&lt;/code&gt;
&lt;/li&gt;
&lt;/ol&gt;




&lt;h2&gt;
  
  
  Blood and Tears in the Migration Process: Technical Summary
&lt;/h2&gt;

&lt;p&gt;During the deployment process, we debugged several times, and here are the three major "pits" summarized:&lt;/p&gt;

&lt;h3&gt;
  
  
  Pitfall 1: Confusion of Image Sources
&lt;/h3&gt;

&lt;p&gt;At first, I tried to Pull &lt;code&gt;openabdev/openab&lt;/code&gt; from Docker Hub, but it always failed. Finally, I found that the current stable image of the project is placed in &lt;strong&gt;GitHub Container Registry (GHCR)&lt;/strong&gt;.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Solution&lt;/strong&gt;: You must use &lt;code&gt;ghcr.io/openabdev/openab:latest&lt;/code&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Pitfall 2: Hardcoded Configuration Path
&lt;/h3&gt;

&lt;p&gt;OpenAB's Dockerfile expects the configuration file path to be &lt;code&gt;/etc/openab/config.toml&lt;/code&gt;. I initially mounted it to &lt;code&gt;/app/config.toml&lt;/code&gt;, which caused the container to crash immediately after startup and report an error.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Solution&lt;/strong&gt;: Correct the Docker Volume mount path to &lt;code&gt;/etc/openab/config.toml&lt;/code&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Pitfall 3: Security Secret Token Verification Failed
&lt;/h3&gt;

&lt;p&gt;Even if the URL is correct, Telegram messages are still rejected by the Gateway. The log shows &lt;code&gt;invalid or missing secret_token&lt;/code&gt;.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Reason&lt;/strong&gt;: &lt;code&gt;openab-gateway&lt;/code&gt; generates an internal checksum to prevent illegal requests.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Solution&lt;/strong&gt;: You must extract the Token from the Gateway container and pass it as the &lt;code&gt;secret_token&lt;/code&gt; parameter when &lt;code&gt;setWebhook&lt;/code&gt;.&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Summary: The Perfect AI Bridging Solution
&lt;/h2&gt;

&lt;p&gt;Through this architecture, I successfully built a fully self-hosted, secure, and efficient AI assistant on GCP. It does not rely on expensive subscriptions, but directly utilizes Gemini's API capabilities and uses Telegram as the interaction interface.&lt;/p&gt;

&lt;p&gt;If you also want to set up a dedicated ACP bridge on the cloud, this combination of GCE + Docker + Cloudflare Tunnel will be the most balanced and stable choice.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>gemini</category>
      <category>googlecloud</category>
      <category>tutorial</category>
    </item>
  </channel>
</rss>
