<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Evan Lin</title>
    <description>The latest articles on DEV Community by Evan Lin (@evanlin).</description>
    <link>https://dev.to/evanlin</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F409957%2Fc150d4a7-cb20-469d-a230-bac27232c577.jpeg</url>
      <title>DEV Community: Evan Lin</title>
      <link>https://dev.to/evanlin</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/evanlin"/>
    <language>en</language>
    <item>
      <title>GCP Hands-on: Deploying OpenAB - Building a Gemini ACP Bridge for Telegram on GCE</title>
      <dc:creator>Evan Lin</dc:creator>
      <pubDate>Sat, 02 May 2026 12:01:51 +0000</pubDate>
      <link>https://dev.to/gde/gcp-hands-on-deploying-openab-building-a-gemini-acp-bridge-for-telegram-on-gce-1bd</link>
      <guid>https://dev.to/gde/gcp-hands-on-deploying-openab-building-a-gemini-acp-bridge-for-telegram-on-gce-1bd</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fez62lmp04cnsnbxlwngj.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fez62lmp04cnsnbxlwngj.png" alt="image-20260502171732526" width="800" height="436"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h1&gt;
  
  
  Background
&lt;/h1&gt;

&lt;p&gt;Recently, wanting to use AI coding assistants (such as Claude Code or Gemini CLI) directly from chat platforms, I started researching &lt;strong&gt;&lt;a href="https://openabdev.github.io/openab/" rel="noopener noreferrer"&gt;OpenAB&lt;/a&gt;&lt;/strong&gt;: a bridge that connects Slack, Discord, or Telegram to CLI tools that implement the &lt;strong&gt;ACP (Agent Client Protocol)&lt;/strong&gt; standard.&lt;/p&gt;

&lt;p&gt;This article documents the complete practical process of deploying &lt;a href="https://openabdev.github.io/openab/" rel="noopener noreferrer"&gt;OpenAB&lt;/a&gt; on Google Cloud, specifically how to bypass authentication restrictions, handle Telegram's HTTPS requirements, and resolve path and permission issues in containerized deployments.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;OpenAB Reference Documentation&lt;/strong&gt;: &lt;a href="https://openabdev.github.io/openab/" rel="noopener noreferrer"&gt;https://openabdev.github.io/openab/&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;OpenAB Repo&lt;/strong&gt;: &lt;a href="https://github.com/openabdev/openab" rel="noopener noreferrer"&gt;https://github.com/openabdev/openab&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Deployment Decision: Why GCE instead of Cloud Run?
&lt;/h2&gt;

&lt;p&gt;Although Cloud Run is usually my first choice, &lt;strong&gt;Google Compute Engine (GCE)&lt;/strong&gt; is the better fit for OpenAB, for two reasons:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; &lt;strong&gt;Stateful sessions:&lt;/strong&gt; OpenAB starts a child process (such as Gemini CLI) for each conversation thread, and these processes must stay alive to preserve the conversation context. Cloud Run's autoscaling will kill them, interrupting the conversation.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Authentication persistence&lt;/strong&gt;: The AI CLI's token is stored on local disk. GCE with a Persistent Disk ensures the login state survives restarts.&lt;/li&gt;
&lt;/ol&gt;




&lt;h2&gt;
  
  
  Practical Steps: Step-by-Step Deployment Process
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Step 1: Writing an Automated Startup Script
&lt;/h3&gt;

&lt;p&gt;To standardize the deployment, we wrote a &lt;code&gt;setup-openab.sh&lt;/code&gt; script. Its core tasks are installing Docker, creating persistent directories, and dynamically generating &lt;code&gt;config.toml&lt;/code&gt;.&lt;/p&gt;
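&lt;p&gt;A minimal sketch of those non-Docker tasks, assuming a hypothetical directory layout and the config keys discussed later in this article (on GCE the token would come from the metadata server rather than a plain environment variable):&lt;/p&gt;

```shell
#!/bin/bash
# Sketch of setup-openab.sh's non-Docker tasks: create persistent
# directories and generate config.toml. Paths and config keys are
# illustrative assumptions; adjust to your own layout.
set -eu

BASE_DIR="${BASE_DIR:-./openab-demo}"        # use e.g. /opt/openab on the VM
TG_BOT_TOKEN="${TG_BOT_TOKEN:-PLACEHOLDER}"  # on GCE, fetch from metadata:
# curl -s -H "Metadata-Flavor: Google" \
#   "http://metadata.google.internal/computeMetadata/v1/instance/attributes/tg_bot_token"

mkdir -p "$BASE_DIR/config" "$BASE_DIR/data"

# Dynamically generate config.toml from the injected token
{
  printf '[telegram]\n'
  printf 'bot_token = "%s"\n\n' "$TG_BOT_TOKEN"
  printf '[agent]\n'
  printf 'command = "gemini"\n'
  printf 'args = ["--acp"]\n'
} > "$BASE_DIR/config/config.toml"

echo "config written to $BASE_DIR/config/config.toml"
```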

&lt;p&gt;The most critical part is the &lt;strong&gt;custom Docker image&lt;/strong&gt;. Since the official OpenAB image does not necessarily bundle every AI tool, we install Node.js and &lt;code&gt;@google/gemini-cli&lt;/code&gt; at build time via a Dockerfile:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight docker"&gt;&lt;code&gt;&lt;span class="k"&gt;FROM&lt;/span&gt;&lt;span class="s"&gt; ghcr.io/openabdev/openab:latest&lt;/span&gt;
&lt;span class="k"&gt;USER&lt;/span&gt;&lt;span class="s"&gt; root&lt;/span&gt;
&lt;span class="k"&gt;RUN &lt;/span&gt;apt-get update &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; apt-get &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-y&lt;/span&gt; curl &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="se"&gt;\
&lt;/span&gt;    curl &lt;span class="nt"&gt;-fsSL&lt;/span&gt; https://deb.nodesource.com/setup_20.x | bash - &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="se"&gt;\
&lt;/span&gt;    apt-get &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-y&lt;/span&gt; nodejs &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="se"&gt;\
&lt;/span&gt;    npm &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-g&lt;/span&gt; @google/gemini-cli
&lt;span class="k"&gt;USER&lt;/span&gt;&lt;span class="s"&gt; 1000&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Step 2: Using gcloud to Create a GCE Instance
&lt;/h3&gt;

&lt;p&gt;We chose the &lt;code&gt;e2-medium&lt;/code&gt; machine type and passed sensitive values (such as the Bot Token) through instance metadata, to avoid hardcoding them in the script.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;gcloud compute instances create openab-server &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--project&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;your-project-id &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--zone&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;asia-east1-b &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--machine-type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;e2-medium &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--image-family&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;debian-11 &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--image-project&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;debian-cloud &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--metadata-from-file&lt;/span&gt; startup-script&lt;span class="o"&gt;=&lt;/span&gt;setup-openab.sh &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--metadata&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nv"&gt;tg_bot_token&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;YOUR_BOT_TOKEN

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Step 3: Configuring the Gemini API Key
&lt;/h3&gt;

&lt;p&gt;Unlike Kiro, which requires an interactive login, &lt;code&gt;gemini-cli&lt;/code&gt; can read its credentials from environment variables. We inject the API key into OpenAB's &lt;code&gt;config.toml&lt;/code&gt; so it runs unattended in the background:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight toml"&gt;&lt;code&gt;&lt;span class="nn"&gt;[agent]&lt;/span&gt;
&lt;span class="py"&gt;command&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"gemini"&lt;/span&gt;
&lt;span class="py"&gt;args&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;"--acp"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="py"&gt;env&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="py"&gt;GEMINI_API_KEY&lt;/span&gt; &lt;span class="p"&gt;=&lt;/span&gt; &lt;span class="s"&gt;"AIzaSy..."&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Step 4: Using Cloudflare Tunnel to Solve HTTPS Requirements
&lt;/h3&gt;

&lt;p&gt;Telegram webhooks strictly require &lt;strong&gt;HTTPS&lt;/strong&gt;. Rather than setting up a full Nginx + SSL stack, I used &lt;strong&gt;Cloudflare Quick Tunnel&lt;/strong&gt;:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; Run on the VM: &lt;code&gt;cloudflared tunnel --url http://localhost:8080&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt; Get a randomly generated HTTPS URL.&lt;/li&gt;
&lt;li&gt; Register Webhook: &lt;code&gt;curl "https://api.telegram.org/bot&amp;lt;TOKEN&amp;gt;/setWebhook?url=&amp;lt;CF_URL&amp;gt;/webhook/telegram&amp;amp;secret_token=&amp;lt;SECRET&amp;gt;"&lt;/code&gt;
&lt;/li&gt;
&lt;/ol&gt;
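&lt;p&gt;Quick Tunnel prints its randomly assigned &lt;code&gt;*.trycloudflare.com&lt;/code&gt; URL into the startup log rather than returning it on stdout, so it helps to grep it out before registering the webhook. A small sketch against a sample log line (the exact log format shown is an assumption):&lt;/p&gt;

```shell
#!/bin/bash
# Extract the randomly generated HTTPS URL from cloudflared's log output.
# In practice you would pipe the real cloudflared output into this grep;
# here a sample log line stands in for illustration.
SAMPLE_LOG='2026-05-02T12:00:00Z INF |  https://random-words-here.trycloudflare.com  |'

CF_URL=$(echo "$SAMPLE_LOG" | grep -o 'https://[a-z0-9-]*\.trycloudflare\.com')
echo "Tunnel URL: $CF_URL"

# Step 3 then uses this URL (placeholder variables, do not run as-is):
# curl "https://api.telegram.org/bot$TG_BOT_TOKEN/setWebhook?url=$CF_URL/webhook/telegram"
```

&lt;p&gt;Remember to also append the &lt;code&gt;secret_token&lt;/code&gt; parameter from Step 3 to the &lt;code&gt;setWebhook&lt;/code&gt; call.&lt;/p&gt;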




&lt;h2&gt;
  
  
  Blood and Tears in the Deployment Process: Technical Summary
&lt;/h2&gt;

&lt;p&gt;The deployment took several rounds of debugging. Here are the three biggest pitfalls, summarized:&lt;/p&gt;

&lt;h3&gt;
  
  
  Pitfall 1: Confusion of Image Sources
&lt;/h3&gt;

&lt;p&gt;At first I tried to pull &lt;code&gt;openabdev/openab&lt;/code&gt; from Docker Hub, but it kept failing. It turns out the project's current stable image is published to the &lt;strong&gt;GitHub Container Registry (GHCR)&lt;/strong&gt;.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Solution&lt;/strong&gt;: You must use &lt;code&gt;ghcr.io/openabdev/openab:latest&lt;/code&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Pitfall 2: Hardcoded Configuration Path
&lt;/h3&gt;

&lt;p&gt;OpenAB's Dockerfile expects the configuration file at &lt;code&gt;/etc/openab/config.toml&lt;/code&gt;. I initially mounted it at &lt;code&gt;/app/config.toml&lt;/code&gt;, which made the container crash with an error immediately after startup.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Solution&lt;/strong&gt;: Correct the Docker Volume mount path to &lt;code&gt;/etc/openab/config.toml&lt;/code&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Pitfall 3: Security Secret Token Verification Failed
&lt;/h3&gt;

&lt;p&gt;Even with the correct URL, Telegram messages were still rejected by the Gateway. The log showed &lt;code&gt;invalid or missing secret_token&lt;/code&gt;.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Reason&lt;/strong&gt;: &lt;code&gt;openab-gateway&lt;/code&gt; generates an internal secret token to reject unauthorized requests.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Solution&lt;/strong&gt;: Extract this token from the Gateway container and pass it as the &lt;code&gt;secret_token&lt;/code&gt; parameter when calling &lt;code&gt;setWebhook&lt;/code&gt;.&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Summary: The Perfect AI Bridging Solution
&lt;/h2&gt;

&lt;p&gt;With this architecture, I built a fully self-hosted, secure, and efficient AI assistant on GCP. It does not rely on expensive subscriptions; it uses Gemini's API directly, with Telegram as the interaction surface.&lt;/p&gt;

&lt;p&gt;If you also want a dedicated ACP bridge in the cloud, the combination of GCE + Docker + Cloudflare Tunnel is a balanced and stable choice.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>gemini</category>
      <category>googlecloud</category>
      <category>tutorial</category>
    </item>
    <item>
      <title>GCP in Action: Building a Persistent AI Assistant with GCE, Hermes Agent, and Telegram</title>
      <dc:creator>Evan Lin</dc:creator>
      <pubDate>Sat, 02 May 2026 12:01:42 +0000</pubDate>
      <link>https://dev.to/gde/gcp-in-action-building-a-persistent-ai-assistant-with-gce-hermes-agent-and-telegram-1mlg</link>
      <guid>https://dev.to/gde/gcp-in-action-building-a-persistent-ai-assistant-with-gce-hermes-agent-and-telegram-1mlg</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fr4z6ajwagaqahnm2rv2k.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fr4z6ajwagaqahnm2rv2k.png" alt="image-20260502161538962" width="800" height="436"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h1&gt;
  
  
  Background
&lt;/h1&gt;

&lt;p&gt;After solving the LINE Bot's Vertex AI migration, I started thinking: Could there be an AI assistant that is "more proactive" and "has long-term memory"? At this time, I set my sights on &lt;a href="https://github.com/nousresearch/hermes-agent" rel="noopener noreferrer"&gt;NousResearch's open-source &lt;strong&gt;Hermes Agent&lt;/strong&gt;&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Unlike a typical Chatbot, Hermes is designed as an "operating system that breathes". It can execute Shell commands, write Python scripts, manage long-term memory, and even stay in touch with you via different Gateways (Telegram, Discord) at any time.&lt;/p&gt;

&lt;p&gt;To make it available 24/7, I chose to deploy it on &lt;strong&gt;Google Compute Engine (GCE)&lt;/strong&gt;. This article will document the deployment process from scratch, as well as the pitfalls I encountered when configuring the latest &lt;strong&gt;Gemini 2.5 Flash&lt;/strong&gt; model.&lt;/p&gt;




&lt;h2&gt;
  
  
  Environment Parameter Preparation
&lt;/h2&gt;

&lt;p&gt;Before you start, please make sure you have these necessary parameters:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;PROJECT_ID&lt;/strong&gt;: &lt;code&gt;YOUR_PROJECT_ID&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;LOCATION&lt;/strong&gt;: &lt;code&gt;global&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;GOOGLE_API_KEY&lt;/strong&gt;: &lt;code&gt;YOUR_GOOGLE_API_KEY&lt;/code&gt; (Obtained from Google AI Studio)&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Step 1: Create a GCE Instance
&lt;/h2&gt;

&lt;p&gt;Hermes Agent needs some compute headroom for tool use; the &lt;code&gt;e2-medium&lt;/code&gt; machine type is a recommended starting point.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;gcloud compute instances create hermes-agent-vm &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--project&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;YOUR_PROJECT_ID &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--zone&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;us-central1-a &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--machine-type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;e2-medium &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--image-family&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;ubuntu-2204-lts &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--image-project&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;ubuntu-os-cloud &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--boot-disk-size&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;30GB &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nt"&gt;--metadata&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;startup-script&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;'#!/bin/bash
        apt-get update
        apt-get install -y git curl python3-pip python3-venv nodejs npm
    '&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Step 2: Install Hermes Agent
&lt;/h2&gt;

&lt;p&gt;After SSHing into the VM, use the official one-click installation script directly.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; &lt;strong&gt;Enter the VM&lt;/strong&gt;:
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;gcloud compute ssh hermes-agent-vm &lt;span class="nt"&gt;--zone&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;us-central1-a

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ol&gt;
&lt;li&gt; &lt;strong&gt;Execute the installation&lt;/strong&gt;:
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl &lt;span class="nt"&gt;-fsSL&lt;/span&gt; https://raw.githubusercontent.com/NousResearch/hermes-agent/main/scripts/install.sh | bash
&lt;span class="nb"&gt;source&lt;/span&gt; ~/.bashrc

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Step 3: Configure Gemini 2.5 Flash (SOP Practice)
&lt;/h2&gt;

&lt;p&gt;This is where you are most likely to step on a landmine in the whole exercise: Hermes may default to model identifiers that are outdated or no longer exist.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Create a configuration file&lt;/strong&gt;: In &lt;code&gt;~/.hermes/config.yaml&lt;/code&gt;, specify &lt;strong&gt;Gemini 2.5 Flash&lt;/strong&gt; precisely, and &lt;strong&gt;do not include the &lt;code&gt;google/&lt;/code&gt; prefix&lt;/strong&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Set the API Key&lt;/strong&gt;: Write the key and permission settings in &lt;code&gt;~/.hermes/.env&lt;/code&gt;:&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
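&lt;p&gt;As a concrete sketch, the two files can be generated like this. The YAML key name is a hypothetical placeholder; the parts the article actually depends on are the short model identifier and the &lt;code&gt;GOOGLE_API_KEY&lt;/code&gt; entry in &lt;code&gt;.env&lt;/code&gt;:&lt;/p&gt;

```shell
#!/bin/bash
# Sketch: write the Hermes config and env files.
# The "model:" key name is an assumption; what matters per the article is
# the short identifier gemini-2.5-flash (no "google/" prefix).
HERMES_DIR="${HERMES_DIR:-./hermes-demo}"   # use ~/.hermes on the VM
mkdir -p "$HERMES_DIR"

printf 'model: gemini-2.5-flash\n' > "$HERMES_DIR/config.yaml"
printf 'GOOGLE_API_KEY=YOUR_GOOGLE_API_KEY\n' > "$HERMES_DIR/.env"

echo "wrote $HERMES_DIR/config.yaml and $HERMES_DIR/.env"
```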




&lt;h2&gt;
  
  
  Step 4: Connect to Telegram and Background Persistence
&lt;/h2&gt;

&lt;p&gt;To prevent the Agent from disappearing after the SSH connection is lost, we use &lt;strong&gt;Systemd&lt;/strong&gt; to manage it.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; &lt;strong&gt;Create a Systemd service&lt;/strong&gt; (&lt;code&gt;/etc/systemd/system/hermes.service&lt;/code&gt;):
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight systemd"&gt;&lt;code&gt;&lt;span class="k"&gt;[Unit]&lt;/span&gt;
&lt;span class="nt"&gt;Description&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;Hermes Agent Gateway
&lt;span class="nt"&gt;After&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;network.target

&lt;span class="k"&gt;[Service]&lt;/span&gt;
&lt;span class="nt"&gt;Type&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;simple
&lt;span class="nt"&gt;User&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;root
&lt;span class="nt"&gt;Environment&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;HOME=/root
&lt;span class="nt"&gt;Environment&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;PYTHONUNBUFFERED=1
&lt;span class="nt"&gt;ExecStart&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;/usr/local/lib/hermes-agent/venv/bin/hermes gateway run
&lt;span class="nt"&gt;Restart&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;always
&lt;span class="nt"&gt;RestartSec&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;10

&lt;span class="k"&gt;[Install]&lt;/span&gt;
&lt;span class="nt"&gt;WantedBy&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;multi-user.target

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ol&gt;
&lt;li&gt; &lt;strong&gt;Start the service&lt;/strong&gt;:
&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;sudo &lt;/span&gt;systemctl daemon-reload
&lt;span class="nb"&gt;sudo &lt;/span&gt;systemctl &lt;span class="nb"&gt;enable &lt;/span&gt;hermes
&lt;span class="nb"&gt;sudo &lt;/span&gt;systemctl restart hermes

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Blood and Tears in the Deployment Process: Why Isn't My Agent Responding?
&lt;/h2&gt;

&lt;p&gt;Even with the correct configuration, I still encountered the dilemma of "the Agent reads messages but doesn't reply". After checking the logs (&lt;code&gt;journalctl -u hermes&lt;/code&gt;), I found several deep pitfalls:&lt;/p&gt;

&lt;h3&gt;
  
  
  Pitfall 1: The 404 Ghost of Gemini 3.0
&lt;/h3&gt;

&lt;p&gt;Chasing the latest version, I configured &lt;code&gt;gemini-3-flash-preview&lt;/code&gt;, and the logs spewed out a pile of &lt;strong&gt;404 Model Not Found&lt;/strong&gt; errors.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Reason&lt;/strong&gt;: Hermes' internal &lt;code&gt;auxiliary_client.py&lt;/code&gt; hardcodes &lt;code&gt;gemini-3-flash-preview&lt;/code&gt; as the default in many places. When these auxiliary functions (such as title generation) fail, the reply logic of the entire Gateway is affected.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Solution&lt;/strong&gt;: Explicitly set all &lt;code&gt;auxiliary&lt;/code&gt; models to &lt;code&gt;gemini-2.5-flash&lt;/code&gt; in &lt;code&gt;config.yaml&lt;/code&gt;, or patch the source directly with &lt;code&gt;sed&lt;/code&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Pitfall 2: Prefix Confusion of Model Identifiers
&lt;/h3&gt;

&lt;p&gt;Different SDKs disagree on the identifier: some use &lt;code&gt;google/gemini-2.5-flash&lt;/code&gt;, others &lt;code&gt;gemini-2.5-flash&lt;/code&gt;. &lt;strong&gt;Experience&lt;/strong&gt;: in Hermes' Gemini provider, &lt;strong&gt;the short name &lt;code&gt;gemini-2.5-flash&lt;/code&gt; is the safe choice&lt;/strong&gt;; adding the &lt;code&gt;google/&lt;/code&gt; prefix causes API routing errors instead.&lt;/p&gt;

&lt;h3&gt;
  
  
  Pitfall 3: Conflict between Systemd and "Processes Already Running"
&lt;/h3&gt;

&lt;p&gt;If you run &lt;code&gt;hermes gateway&lt;/code&gt; manually and then start the service, the system reports &lt;code&gt;Gateway already running (PID xxxx)&lt;/code&gt;. &lt;strong&gt;Solution&lt;/strong&gt;: add &lt;code&gt;ExecStartPre=-/usr/bin/pkill -9 -f hermes&lt;/code&gt; before &lt;code&gt;ExecStart&lt;/code&gt; in the unit. The leading &lt;code&gt;-&lt;/code&gt; tells Systemd to ignore pkill's exit code when nothing matches (Exec lines are not run through a shell, so appending &lt;code&gt;|| true&lt;/code&gt; would not work), ensuring a clean environment on every start.&lt;/p&gt;
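&lt;p&gt;In unit-file form, the fix sits in the &lt;code&gt;[Service]&lt;/code&gt; section of the earlier unit (only the affected lines are shown; note the leading &lt;code&gt;-&lt;/code&gt; on the pre-start line, since Systemd Exec lines are not run through a shell):&lt;/p&gt;

```systemd
[Service]
# The "-" prefix makes Systemd ignore pkill's exit code, e.g. when
# no stray gateway process matched the pattern.
ExecStartPre=-/usr/bin/pkill -9 -f hermes
ExecStart=/usr/local/lib/hermes-agent/venv/bin/hermes gateway run
```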




&lt;h2&gt;
  
  
  Summary
&lt;/h2&gt;

&lt;p&gt;Now my dedicated Hermes Agent runs stably on GCE and is reachable via Telegram at any time. It not only helps me look things up, it can also run simple compute scripts for me directly on the cloud VM.&lt;/p&gt;

&lt;p&gt;This deployment taught me: &lt;strong&gt;In the face of rapidly updating models, the official documentation (or MCP tool query) is the only truth&lt;/strong&gt;. Don't blindly pursue the latest version number; ensuring that the identifier matches the current API environment is the key to stable operation.&lt;/p&gt;

&lt;p&gt;If you also want a 24-hour AI digital double, spin up a machine and follow this SOP!&lt;/p&gt;

</description>
      <category>agents</category>
      <category>ai</category>
      <category>googlecloud</category>
      <category>tutorial</category>
    </item>
    <item>
      <title>Gemini 3.1: Native TTS for Easier, More Powerful Summary Reading</title>
      <dc:creator>Evan Lin</dc:creator>
      <pubDate>Sat, 02 May 2026 10:08:42 +0000</pubDate>
      <link>https://dev.to/gde/gemini-31-native-tts-for-easier-more-powerful-summary-reading-2ep9</link>
      <guid>https://dev.to/gde/gemini-31-native-tts-for-easier-more-powerful-summary-reading-2ep9</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwuiw99cubzhnf5ex409y.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwuiw99cubzhnf5ex409y.png" alt="Finder 2026-04-16 21.43.57" width="800" height="1739"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h1&gt;
  
  
  Background
&lt;/h1&gt;

&lt;p&gt;In the previous hands-on session, we used Gemini 3.1 Flash Live for speech recognition and, through a workaround built on the Gemini 2.5 Live API, just barely achieved text-to-speech (TTS).&lt;/p&gt;

&lt;p&gt;But in April 2026, Google officially released &lt;a href="https://blog.google/innovation-and-ai/models-and-research/gemini-models/gemini-3-1-flash-tts/" rel="noopener noreferrer"&gt;&lt;strong&gt;Gemini 3.1 Flash TTS&lt;/strong&gt;&lt;/a&gt;. This is a native model specifically designed for audio output, no longer requiring a Live WebSocket, and can directly output high-quality audio through the standard &lt;code&gt;generate_content&lt;/code&gt; process.&lt;/p&gt;

&lt;p&gt;As a developer, I naturally wanted to adopt the more elegant, native solution right away. This article shares how I upgraded the LINE Bot's text-to-speech summary feature to Gemini 3.1 Native TTS, and the "async pitfall" I hit along the way.&lt;/p&gt;




&lt;h2&gt;
  
  
  Technical Upgrade: From Live API to Native TTS
&lt;/h2&gt;

&lt;p&gt;The previous read-aloud feature was emulated with the Gemini 2.5 Live API. It worked, but had several shortcomings:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; &lt;strong&gt;High complexity&lt;/strong&gt;: Requires managing the WebSocket connection lifecycle.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Model limitations&lt;/strong&gt;: Must use a specific &lt;code&gt;native-audio&lt;/code&gt; model, and primarily supports &lt;code&gt;us-central1&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Fixed return format&lt;/strong&gt;: The sampling rate is usually fixed at 16kHz.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The emergence of &lt;strong&gt;Gemini 3.1 Flash TTS&lt;/strong&gt; changed all this:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Model name&lt;/strong&gt;: &lt;code&gt;gemini-3.1-flash-tts-preview&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Consistent interface&lt;/strong&gt;: Uses the familiar &lt;code&gt;generate_content_stream&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Dynamic parameters&lt;/strong&gt;: The sampling rate can be detected automatically from the returned MIME type (usually 24kHz now, for better sound quality).&lt;/li&gt;
&lt;/ul&gt;
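&lt;p&gt;The "detect the sampling rate from the MIME type" point boils down to parsing a &lt;code&gt;rate=N&lt;/code&gt; parameter out of the returned MIME string, with a fallback when it is absent. A quick sketch (the sample MIME value is an assumption of the format):&lt;/p&gt;

```shell
#!/bin/bash
# Parse the PCM sampling rate out of a returned audio MIME type.
# The sample value below is illustrative; the logic of interest is
# extracting "rate=N" and falling back to 24000 when it is missing.
MIME_TYPE='audio/L16;codec=pcm;rate=24000'

RATE=$(echo "$MIME_TYPE" | grep -o 'rate=[0-9]*' | cut -d= -f2)
RATE="${RATE:-24000}"   # default to 24kHz, matching the Python code's fallback
echo "sample rate: $RATE"
```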




&lt;h2&gt;
  
  
  Core Code Evolution (tools/tts_tool.py)
&lt;/h2&gt;

&lt;p&gt;The new implementation has become more concise, with the focus on the &lt;code&gt;response_modalities=["audio"]&lt;/code&gt; setting:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;text_to_speech&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;tuple&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;bytes&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
    &lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;genai&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Client&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;GOOGLE_AI_API_KEY&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;http_options&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;api_version&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;v1beta&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;

    &lt;span class="n"&gt;contents&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="n"&gt;types&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Content&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;role&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;parts&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;
                &lt;span class="c1"&gt;# Add localization instructions to make the tone more natural
&lt;/span&gt;                &lt;span class="n"&gt;types&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Part&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;from_text&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Please use Traditional Chinese with Taiwanese vocabulary, and read the following summary in a friendly and natural tone. ## Transcript:&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
            &lt;span class="p"&gt;],&lt;/span&gt;
        &lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="p"&gt;]&lt;/span&gt;

    &lt;span class="n"&gt;config&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;types&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;GenerateContentConfig&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;response_modalities&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;audio&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
        &lt;span class="n"&gt;speech_config&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;types&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;SpeechConfig&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;voice_config&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;types&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;VoiceConfig&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                &lt;span class="n"&gt;prebuilt_voice_config&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;types&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;PrebuiltVoiceConfig&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;voice_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Zephyr&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;pcm_chunks&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
    &lt;span class="n"&gt;sample_rate&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;24000&lt;/span&gt; &lt;span class="c1"&gt;# Default value
&lt;/span&gt;
    &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="c1"&gt;# ⚠️ This is the big pit that almost made me stay up all night fixing it
&lt;/span&gt;        &lt;span class="n"&gt;response_stream&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;aio&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;models&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;generate_content_stream&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gemini-3.1-flash-tts-preview&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;contents&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;contents&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;chunk&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;response_stream&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;chunk&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;parts&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;part&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;chunk&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;parts&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;part&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;inline_data&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                        &lt;span class="n"&gt;pcm_chunks&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;part&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;inline_data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
                        &lt;span class="c1"&gt;# Get the sampling rate dynamically from the MIME type (e.g. audio/L16;rate=24000)
&lt;/span&gt;                        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;part&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;inline_data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;mime_type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                            &lt;span class="n"&gt;sample_rate&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;parse_rate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;part&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;inline_data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;mime_type&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="nb"&gt;Exception&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;logger&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;TTS Error: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;raise&lt;/span&gt;

    &lt;span class="n"&gt;pcm_bytes&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;b&lt;/span&gt;&lt;span class="sh"&gt;""&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;pcm_chunks&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;duration_ms&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;int&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;pcm_bytes&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;sample_rate&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;1000&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Subsequently, it is also converted to m4a via ffmpeg and sent to LINE...
&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
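&lt;p&gt;The conversion step mentioned in the last comment can be sketched with the standard library: wrap the raw 16-bit mono PCM in a WAV container using the &lt;code&gt;wave&lt;/code&gt; module, then hand the file to ffmpeg. The helper names and ffmpeg flags below are assumptions for illustration, not code from the repo.&lt;/p&gt;

```python
import io
import subprocess
import wave


def wrap_pcm_as_wav(pcm_bytes, sample_rate):
    """Wrap raw 16-bit mono PCM in a WAV container, in memory."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wf:
        wf.setnchannels(1)        # mono
        wf.setsampwidth(2)        # 16-bit samples, 2 bytes each
        wf.setframerate(sample_rate)
        wf.writeframes(pcm_bytes)
    return buf.getvalue()


def wav_to_m4a(wav_path, m4a_path):
    """Convert WAV to AAC-in-m4a with ffmpeg (flags are an assumption)."""
    subprocess.run(
        ["ffmpeg", "-y", "-i", wav_path, "-c:a", "aac", m4a_path],
        check=True,
    )
```

&lt;p&gt;LINE's audio messages expect m4a, which is why the raw PCM cannot be sent as-is.&lt;/p&gt;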






&lt;h2&gt;
  
  
  The Pitfall: The Missing &lt;code&gt;await&lt;/code&gt;
&lt;/h2&gt;

&lt;p&gt;This upgrade ran into a very subtle &lt;code&gt;TypeError&lt;/code&gt; that kept appearing after the remote deployment:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;code&gt;TypeError: 'async for' requires an object with __aiter__ method, got coroutine&lt;/code&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  ❌ The Incorrect Version
&lt;/h3&gt;

&lt;p&gt;Following the example, I intuitively assumed I could run &lt;code&gt;async for&lt;/code&gt; directly over the method call:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# This is wrong!
&lt;/span&gt;&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;chunk&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;aio&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;models&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;generate_content_stream&lt;/span&gt;&lt;span class="p"&gt;(...):&lt;/span&gt;
    &lt;span class="k"&gt;pass&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  ✅ Correct Solution
&lt;/h3&gt;

&lt;p&gt;In the asynchronous version of the Google GenAI Python SDK, &lt;code&gt;generate_content_stream&lt;/code&gt; is itself an &lt;code&gt;async&lt;/code&gt; function that &lt;strong&gt;returns&lt;/strong&gt; an async iterator. You must first &lt;code&gt;await&lt;/code&gt; the call to obtain that iterator, and only then run &lt;code&gt;async for&lt;/code&gt; over it.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Correct approach: two steps
&lt;/span&gt;&lt;span class="n"&gt;response_stream&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;aio&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;models&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;generate_content_stream&lt;/span&gt;&lt;span class="p"&gt;(...)&lt;/span&gt;
&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;chunk&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;response_stream&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;pass&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This detail doesn't arise in synchronous code or some older SDKs, but when consuming the asynchronous stream from 3.1 Flash TTS, it is the difference between working and crashing.&lt;/p&gt;
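&lt;p&gt;The failure mode is easy to reproduce with plain &lt;code&gt;asyncio&lt;/code&gt;, independent of the SDK. A minimal sketch (all names here are illustrative):&lt;/p&gt;

```python
import asyncio


async def returns_a_stream():
    """Mimics the SDK: an async *function* that returns an async iterator."""
    async def gen():
        for i in range(3):
            yield i
    return gen()


async def main():
    # Wrong: async-for over the un-awaited coroutine raises TypeError
    # ("'async for' requires an object with __aiter__ method, got coroutine").
    try:
        async for _ in returns_a_stream():  # missing await!
            pass
    except TypeError as e:
        print(f"TypeError: {e}")

    # Right: await first to get the iterator, then iterate.
    stream = await returns_a_stream()
    chunks = [c async for c in stream]
    print(chunks)  # [0, 1, 2]


asyncio.run(main())
```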




&lt;h2&gt;
  
  
  Localization Adjustment: Making the Bot Speak "Taiwanese"
&lt;/h2&gt;

&lt;p&gt;Although the summary itself is already in Traditional Chinese, the TTS model sometimes reads it with a non-native accent or word choices. We solved this with prompt engineering:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"Please use &lt;strong&gt;Taiwanese vocabulary&lt;/strong&gt; in Traditional Chinese, and read it in a &lt;strong&gt;friendly and natural&lt;/strong&gt; tone..."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;After adding this one instruction, Gemini's audio output is much closer to the intonation and phrasing habits of Taiwanese users, which makes the spoken summary feel far more natural.&lt;/p&gt;
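&lt;p&gt;In code, this amounts to prepending a style instruction to the text handed to the TTS model. A minimal sketch (the exact wording is an assumption based on the quote above):&lt;/p&gt;

```python
# Hypothetical localization prefix for the TTS prompt.
STYLE_INSTRUCTION = (
    "Please use Taiwanese vocabulary in Traditional Chinese, "
    "and read it in a friendly and natural tone: "
)


def build_tts_prompt(summary_text):
    """Prefix the summary with the localization instruction."""
    return STYLE_INSTRUCTION + summary_text
```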




&lt;h2&gt;
  
  
  Summary: Changes Brought by Native TTS
&lt;/h2&gt;

&lt;p&gt;After migrating from Live API to Native TTS:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; &lt;strong&gt;More stable connection&lt;/strong&gt;: No longer need to maintain a long-term WebSocket.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Improved sound quality&lt;/strong&gt;: Native support for 24kHz sampling rate.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Easy to maintain&lt;/strong&gt;: The amount of code is reduced by about 30%, and the logic is more direct.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This experience is also a reminder that even with a seemingly mature SDK, you should carefully check return value types when working in &lt;code&gt;async&lt;/code&gt; mode.&lt;/p&gt;

&lt;p&gt;If you also want your LINE Bot to speak, Gemini 3.1 Flash TTS is definitely the best choice at the moment.&lt;/p&gt;

&lt;p&gt;The complete code has been updated to &lt;a href="https://github.com/kkdai/linebot-helper-python" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt;, see you next time!&lt;/p&gt;

</description>
      <category>ai</category>
      <category>api</category>
      <category>gemini</category>
      <category>google</category>
    </item>
    <item>
      <title>GCP in Action: Migrating a LINE Bot from AI Studio to Vertex AI to Solve 429 Errors</title>
      <dc:creator>Evan Lin</dc:creator>
      <pubDate>Sat, 02 May 2026 10:08:31 +0000</pubDate>
      <link>https://dev.to/gde/gcp-in-action-migrating-a-line-bot-from-ai-studio-to-vertex-ai-to-solve-429-errors-47jo</link>
      <guid>https://dev.to/gde/gcp-in-action-migrating-a-line-bot-from-ai-studio-to-vertex-ai-to-solve-429-errors-47jo</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4oswk3rx7cz9bcvye4uy.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4oswk3rx7cz9bcvye4uy.png" alt="image-20260421011411264" width="713" height="369"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h1&gt;
  
  
  Background
&lt;/h1&gt;

&lt;p&gt;Recently, the LINE business-card assistant bot (&lt;code&gt;linebot-namecard-python&lt;/code&gt;) deployed on Google Cloud Run suddenly went down. Checking the logs with &lt;code&gt;gcloud logging read&lt;/code&gt; surfaced this merciless error:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;code&gt;google.api_core.exceptions.ResourceExhausted: 429 Your billing account has exceeded its monthly spending cap.&lt;/code&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;It turned out we had been using the API key from Google AI Studio (the &lt;code&gt;google.generativeai&lt;/code&gt; package) for rapid development, and had silently exhausted the monthly free quota.&lt;/p&gt;

&lt;p&gt;As a developer who needs to launch a service, it's time to "level up" the architecture and migrate the model calls to the enterprise-grade &lt;strong&gt;Google Cloud Vertex AI&lt;/strong&gt;, directly using GCP's IAM permissions and billing system. This article will share the migration process and the various pitfalls encountered along the way.&lt;/p&gt;




&lt;h2&gt;
  
  
  Technical Upgrade: From AI Studio to Vertex AI
&lt;/h2&gt;

&lt;p&gt;To migrate a project from the Google AI Studio SDK to Vertex AI, there are three main steps:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Replace the dependency package&lt;/strong&gt;: In &lt;code&gt;requirements.txt&lt;/code&gt;, remove the old &lt;code&gt;google-generativeai&lt;/code&gt; and replace it with &lt;code&gt;google-cloud-aiplatform&lt;/code&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Update environment variable settings&lt;/strong&gt;: In &lt;code&gt;config.py&lt;/code&gt;, we no longer need &lt;code&gt;GEMINI_API_KEY&lt;/code&gt;, but instead use GCP's &lt;code&gt;PROJECT_ID&lt;/code&gt; and &lt;code&gt;LOCATION&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;PROJECT_ID&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getenv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;PROJECT_ID&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;LOCATION&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getenv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;LOCATION&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;global&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="c1"&gt;# Default to global
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ol start="3"&gt;
&lt;li&gt; &lt;strong&gt;Core code rewriting (gemini_utils.py)&lt;/strong&gt;: Although the SDK interface of Vertex AI is similar, the handling of multimodal data (such as images) is slightly stricter. We need to convert &lt;code&gt;PIL.Image&lt;/code&gt; to the &lt;code&gt;vertexai.generative_models.Part&lt;/code&gt; format:&lt;/li&gt;
&lt;/ol&gt;
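&lt;p&gt;A minimal sketch of that conversion, assuming &lt;code&gt;google-cloud-aiplatform&lt;/code&gt; is installed; the helper name is hypothetical, while &lt;code&gt;Part.from_data&lt;/code&gt; follows the Vertex AI SDK:&lt;/p&gt;

```python
import io


def pil_image_to_bytes(image, fmt="PNG"):
    """Serialize a PIL image to raw bytes plus a matching MIME type."""
    buf = io.BytesIO()
    image.save(buf, format=fmt)
    return buf.getvalue(), f"image/{fmt.lower()}"


# With the Vertex AI SDK (sketch):
# from vertexai.generative_models import Part
# data, mime = pil_image_to_bytes(pil_img, fmt="JPEG")
# image_part = Part.from_data(data=data, mime_type=mime)
```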




&lt;h2&gt;
  
  
  Pitfall 1: Residual Old SDK Causing Cloud Run Startup Failure
&lt;/h2&gt;

&lt;p&gt;I happily updated the environment variables with &lt;code&gt;gcloud run services update&lt;/code&gt;, but the Cloud Run deployment failed and the container couldn't even start.&lt;/p&gt;

&lt;p&gt;After checking the logs, I found:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;code&gt;ModuleNotFoundError: No module named 'google.generativeai'&lt;/code&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;Reason&lt;/strong&gt;: Although &lt;code&gt;gemini_utils.py&lt;/code&gt; has been rewritten, the main program &lt;code&gt;app/main.py&lt;/code&gt; still contains &lt;code&gt;import google.generativeai as genai&lt;/code&gt; and the initialization code &lt;code&gt;genai.configure(api_key=...)&lt;/code&gt;. Since the package has been removed from &lt;code&gt;requirements.txt&lt;/code&gt;, the container will naturally fail to find the module and crash during startup.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Solution&lt;/strong&gt;: Grep the entire project, remove every remaining reference to the old SDK, then rebuild the Docker image with Cloud Build and push it again.&lt;/p&gt;




&lt;h2&gt;
  
  
  Pitfall 2: Vertex AI Model Name and Region Restrictions (404 Not Found)
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxectrt95bp3j5f92txsp.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxectrt95bp3j5f92txsp.png" alt="Google Chrome 2026-04-21 01.12.46" width="800" height="705"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;With the code cleaned up, the container started successfully, but when I sent a business-card image on LINE, the bot threw a 500 error. Reviewing the logs again, this time it was:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;code&gt;google.api_core.exceptions.NotFound: 404 Publisher Model ... gemini-1.5-flash was not found or your project does not have access to it.&lt;/code&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;This was the biggest trap of the entire migration! In Google AI Studio you can casually use the alias &lt;code&gt;gemini-1.5-flash&lt;/code&gt;; &lt;strong&gt;but in certain Vertex AI regions (such as &lt;code&gt;asia-east1&lt;/code&gt;, Taiwan), you must specify the exact version number&lt;/strong&gt;, such as &lt;code&gt;gemini-1.5-flash-002&lt;/code&gt;, otherwise the API will flatly tell you the model cannot be found.&lt;/p&gt;

&lt;h3&gt;
  
  
  Advanced Challenge: I want to try Gemini 3.0 Flash Preview!
&lt;/h3&gt;

&lt;p&gt;To solve this, I had an idea: since I was changing the model name anyway, why not upgrade straight to the latest &lt;code&gt;gemini-3-flash-preview&lt;/code&gt;?&lt;/p&gt;

&lt;p&gt;As a result, I wrote a test script and found:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  ❌ &lt;code&gt;asia-east1&lt;/code&gt; (Taiwan): 404 Not Found&lt;/li&gt;
&lt;li&gt;  ❌ &lt;code&gt;us-central1&lt;/code&gt; (Central US): 404 Not Found&lt;/li&gt;
&lt;li&gt;  ✅ &lt;strong&gt;&lt;code&gt;global&lt;/code&gt; (Global): SUCCESS!&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That's right, currently this preview model on Vertex AI &lt;strong&gt;is only available in the &lt;code&gt;global&lt;/code&gt; region&lt;/strong&gt;.&lt;/p&gt;
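&lt;p&gt;The test script boiled down to a loop like the following; &lt;code&gt;try_region&lt;/code&gt; is an assumed callable that, with the real SDK, would wrap &lt;code&gt;vertexai.init(...)&lt;/code&gt; plus a trial &lt;code&gt;generate_content&lt;/code&gt; and raise on 404:&lt;/p&gt;

```python
def probe_regions(regions, try_region):
    """Try a callable per region and record which ones succeed.

    `try_region` is assumed to raise (e.g. a 404 NotFound) when the
    model is unavailable in that region.
    """
    results = {}
    for region in regions:
        try:
            try_region(region)
            results[region] = "SUCCESS"
        except Exception as exc:
            results[region] = f"FAILED: {exc}"
    return results


# With the real SDK, try_region would look roughly like:
# def try_region(region):
#     vertexai.init(project=PROJECT_ID, location=region)
#     GenerativeModel("gemini-3-flash-preview").generate_content("ping")
```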

&lt;p&gt;&lt;strong&gt;Final Solution&lt;/strong&gt;:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; Change the default region in &lt;code&gt;config.py&lt;/code&gt; to &lt;code&gt;global&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt; Call &lt;code&gt;vertexai.init(project="line-vertex", location="global")&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt; The Cloud Run environment variable &lt;code&gt;--update-env-vars="LOCATION=global"&lt;/code&gt; must also be aligned.&lt;/li&gt;
&lt;/ol&gt;




&lt;h2&gt;
  
  
  Summary: Changes Brought by Vertex AI
&lt;/h2&gt;

&lt;p&gt;After some effort, the business-card bot finally came back to life, now running the latest Gemini 3 Flash model. Migrating from AI Studio to Vertex AI brought several significant benefits:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; &lt;strong&gt;No more quota anxiety&lt;/strong&gt;: No longer limited by AI Studio's free quota or spending cap; usage is billed directly through GCP, which suits production environments.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Security enhancement&lt;/strong&gt;: Removed the plaintext API key from the environment variables and switched to GCP's Application Default Credentials (IAM) for authentication, making the architecture more secure.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Stability&lt;/strong&gt;: Enterprise-grade SLA guarantee.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This experience also reminded me that when using Vertex AI on GCP, &lt;strong&gt;you must first check the official documentation to confirm the correspondence between "Region" and "Model Name"&lt;/strong&gt; to avoid being overwhelmed by 404 errors after deployment.&lt;/p&gt;

&lt;p&gt;If you also have a project that is about to move from AI Studio to Vertex AI, I hope this pitfall record can help you avoid some detours!&lt;/p&gt;

</description>
      <category>ai</category>
      <category>googlecloud</category>
      <category>python</category>
      <category>tutorial</category>
    </item>
    <item>
      <title>[Gemini] Building a LINE E-commerce Chatbot That Can "Tell Stories from Images"</title>
      <dc:creator>Evan Lin</dc:creator>
      <pubDate>Sun, 29 Mar 2026 02:08:30 +0000</pubDate>
      <link>https://dev.to/evanlin/gemini-building-a-line-e-commerce-chatbot-that-can-tell-stories-from-images-41i0</link>
      <guid>https://dev.to/evanlin/gemini-building-a-line-e-commerce-chatbot-that-can-tell-stories-from-images-41i0</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fuc7ulj3k2ehr5j0fwdch.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fuc7ulj3k2ehr5j0fwdch.png" alt="image-20260225234804185" width="800" height="860"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxxox886v85qv8909apuo.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxxox886v85qv8909apuo.png" alt="image-20260225234701217" width="800" height="858"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Reference articles:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://ai.google.dev/gemini-api/docs/function-calling?hl=zh-tw#multimodal" rel="noopener noreferrer"&gt;Gemini API - Function Calling with Multimodal&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/kkdai/linebot-gemini-multimodel-funcal" rel="noopener noreferrer"&gt;GitHub: linebot-gemini-multimodel-funcal&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://cloud.google.com/vertex-ai/generative-ai/docs/multimodal/function-calling#mm-fr" rel="noopener noreferrer"&gt;Vertex AI - Multimodal Function Response&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Complete code &lt;a href="https://github.com/kkdai/linebot-gemini-multimodel-funcal" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;h1&gt;
  
  
  Background
&lt;/h1&gt;

&lt;p&gt;I believe many people have used the combination of LINE Bot + Function Calling. When a user asks "What clothes did I buy last month?", the Bot calls the database query function, retrieves the order data, and then Gemini answers based on that JSON:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="err"&gt;Traditional&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;process&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;designed&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;by&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;developers:&lt;/span&gt;&lt;span class="w"&gt;

&lt;/span&gt;&lt;span class="err"&gt;User:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Help me see the jacket I bought before"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="err"&gt;Bot:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="err"&gt;Call&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;get_order_history()&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="err"&gt;Function&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;returns:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"product_name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Brown pilot jacket"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"order_date"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"2026-01-15"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;...&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="err"&gt;Gemini:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"You bought a brown pilot jacket on January 15th for NT$1,890."&lt;/span&gt;&lt;span class="w"&gt;

&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The answer is completely correct, but it always feels like something is missing—the user is talking about "that jacket," and Gemini is just restating the text in the JSON, with no way to "confirm" what the jacket looks like. If there happen to be three jackets in the database, the AI can't even determine which one is the one the user remembers.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;AI can read text, but it can't see pictures&lt;/strong&gt;—this limitation has always been a blind spot in the traditional Function Calling architecture.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvkluxi9r5zkhj1vys100.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvkluxi9r5zkhj1vys100.png" alt="Google Chrome 2026-02-26 10.34.51" width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fw3gpn1tkbj80ifh65vsr.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fw3gpn1tkbj80ifh65vsr.png" alt="Google Chrome 2026-02-26 10.34.58" width="800" height="443"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This problem was truly solved only after Gemini introduced &lt;strong&gt;Multimodal Function Response&lt;/strong&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  What is Multimodal Function Response?
&lt;/h2&gt;

&lt;p&gt;The traditional Function Calling process is as follows:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[User message] → Gemini → [function_call] → [Execute function] → [Return JSON] → Gemini → [Text answer]

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Multimodal Function Response&lt;/strong&gt; changes that middle step. The function can not only return JSON, but also include images (JPEG/PNG/WebP) or documents (PDF) in the same response:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fge58c6ayrjas18sl2qjz.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fge58c6ayrjas18sl2qjz.png" alt="Google Chrome 2026-02-25 23.04.28" width="800" height="439"&gt;&lt;/a&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[User message] → Gemini → [function_call] → [Execute function] → [Return JSON + image bytes] → Gemini → [Text answer that has seen the image]

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;When Gemini generates the next round of answers, it can "see" both the structured data and the image returned by the function, thereby generating richer and more accurate responses.&lt;/p&gt;
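&lt;p&gt;In REST terms, the function response part now carries both the structured result and inline media. A sketch of the payload shape (the camelCase field names follow the Gemini REST convention, but treat the exact structure as an assumption):&lt;/p&gt;

```python
import base64


def build_multimodal_function_response(name, result, image_bytes,
                                       mime_type="image/jpeg"):
    """Assemble a function response carrying JSON data plus an image."""
    return {
        "functionResponse": {
            "name": name,
            "response": result,
            "parts": [
                {
                    "inlineData": {
                        "mimeType": mime_type,
                        # Inline media is base64-encoded in REST payloads.
                        "data": base64.b64encode(image_bytes).decode("ascii"),
                    }
                }
            ],
        }
    }
```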

&lt;p&gt;The media formats officially supported at the moment:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Category&lt;/th&gt;
&lt;th&gt;Supported formats&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Image&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;image/jpeg&lt;/code&gt;, &lt;code&gt;image/png&lt;/code&gt;, &lt;code&gt;image/webp&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Document&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;application/pdf&lt;/code&gt;, &lt;code&gt;text/plain&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The application scenarios for this feature are very broad: e-commerce customer service (identifying product images), medical consultation (analyzing PDF inspection reports), design review (giving suggestions based on screenshots)... almost all scenarios that require "functions to return visual data for AI analysis" are applicable.&lt;/p&gt;




&lt;h2&gt;
  
  
  Project Goal
&lt;/h2&gt;

&lt;p&gt;This time, I used Multimodal Function Response to build a &lt;strong&gt;LINE e-commerce customer service bot&lt;/strong&gt;, demonstrating the following scenario:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;User: "Help me see the jacket I bought before" Bot (traditional): "You bought a brown pilot jacket." Bot (Multimodal): "From the photo, you can see that this is a brown pilot jacket, made of lightweight nylon, with metal zipper pockets on the sides. This is your January 15th order ORD-2026-0115, for a total of NT$1,890, and it has been delivered." + &lt;strong&gt;Product photo&lt;/strong&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The difference is obvious: Gemini really "saw" the jacket, rather than just restating the text in the database.&lt;/p&gt;




&lt;h2&gt;
  
  
  Architecture Design
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Why not use Google ADK?
&lt;/h3&gt;

&lt;p&gt;Originally, this repo used Google ADK (Agent Development Kit) to manage the Agent. The &lt;code&gt;Runner&lt;/code&gt; and &lt;code&gt;Agent&lt;/code&gt; of ADK encapsulated the entire process of Function Calling, which was very convenient.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;But Multimodal Function Response requires manually including image bytes in the &lt;code&gt;parts&lt;/code&gt; of the function response, and ADK fully encapsulates that layer, leaving no way to intervene.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;So this time, I directly used &lt;code&gt;google.genai.Client&lt;/code&gt; to implement the iterative cycle of function calls myself:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Old architecture (ADK)
&lt;/span&gt;&lt;span class="n"&gt;runner&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Runner&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;root_agent&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;...)&lt;/span&gt;
&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;event&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;runner&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run_async&lt;/span&gt;&lt;span class="p"&gt;(...):&lt;/span&gt;
    &lt;span class="p"&gt;...&lt;/span&gt; &lt;span class="c1"&gt;# ADK handles all function calls for you, but you can't control the response content
&lt;/span&gt;
&lt;span class="c1"&gt;# New architecture (directly use google.genai)
&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;aio&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;models&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;generate_content&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;contents&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;contents&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;types&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;GenerateContentConfig&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;ECOMMERCE_TOOLS&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="c1"&gt;# Handle function calls yourself, include images yourself
&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Overall architecture
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;LINE User
    │
    ▼ POST /
FastAPI Webhook Handler
    │
    ▼
EcommerceAgent.process_message(text, line_user_id)
    │
    ├─ ① Call Gemini (with conversation history)
    │
    ├─ ② Gemini decides to call a tool → function_call
    │
    ├─ ③ _execute_tool()
    │ ├─ Execute query function (search_products / get_order_history / get_product_details)
    │ └─ Read real product photos in the img/ directory (Unsplash JPEG)
    │
    ├─ ④ Construct Multimodal Function Response
    │ └─ FunctionResponsePart(inline_data=FunctionResponseBlob(data=image_bytes))
    │
    ├─ ⑤ Call Gemini again (Gemini sees the image + data)
    │
    └─ ⑥ Return (ai_text, image_bytes)
    │
    ▼
LINE Reply:
  TextSendMessage(text=ai_text)
  ImageSendMessage(url=BOT_HOST_URL/images/{uuid}) ← served by the FastAPI /images endpoint

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  How to get product images?
&lt;/h3&gt;

&lt;p&gt;This demo uses real &lt;strong&gt;Unsplash clothing photos&lt;/strong&gt;. Each of the five products corresponds to an actual photo of the item, stored in the &lt;code&gt;img/&lt;/code&gt; directory. The reading logic is very simple:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;generate_product_image&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;product&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;bytes&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Read the product image and return JPEG bytes.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="nf"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;product&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;image_path&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;rb&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;read&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Each product in &lt;code&gt;PRODUCTS_DB&lt;/code&gt; has an &lt;code&gt;image_path&lt;/code&gt; field pointing to the corresponding image file:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Product ID&lt;/th&gt;
&lt;th&gt;Name&lt;/th&gt;
&lt;th&gt;Image&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;P001&lt;/td&gt;
&lt;td&gt;Brown pilot jacket&lt;/td&gt;
&lt;td&gt;tobias-tullius-…-unsplash.jpg&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;P002&lt;/td&gt;
&lt;td&gt;White cotton university T&lt;/td&gt;
&lt;td&gt;mediamodifier-…-unsplash.jpg&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;P003&lt;/td&gt;
&lt;td&gt;Dark blue denim jacket&lt;/td&gt;
&lt;td&gt;caio-coelho-…-unsplash.jpg&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;P004&lt;/td&gt;
&lt;td&gt;Beige knitted shawl&lt;/td&gt;
&lt;td&gt;milada-vigerova-…-unsplash.jpg&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;P005&lt;/td&gt;
&lt;td&gt;Light blue simple T-shirt&lt;/td&gt;
&lt;td&gt;cristofer-maximilian-…-unsplash.jpg&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
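To make the lookup concrete, here is a minimal sketch of what `PRODUCTS_DB` could look like. This is hypothetical: only the `image_path` field is confirmed above; the file names and the `name` field shown here are placeholders (the real files are the Unsplash JPEGs in the table).

```python
# Hypothetical sketch of PRODUCTS_DB. Only the image_path field is described
# in the article; the file names and "name" values here are placeholders.
PRODUCTS_DB = {
    "P001": {"name": "Brown pilot jacket", "image_path": "img/p001.jpg"},
    "P002": {"name": "White cotton university T", "image_path": "img/p002.jpg"},
    "P003": {"name": "Dark blue denim jacket", "image_path": "img/p003.jpg"},
    "P004": {"name": "Beige knitted shawl", "image_path": "img/p004.jpg"},
    "P005": {"name": "Light blue simple T-shirt", "image_path": "img/p005.jpg"},
}

def get_image_path(product_id: str) -> str:
    """Return the image file path for a product ID."""
    return PRODUCTS_DB[product_id]["image_path"]
```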

&lt;p&gt;The image bytes read have two uses:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; Included as a &lt;code&gt;FunctionResponseBlob&lt;/code&gt; for Gemini to analyze: real photos let Gemini describe the actual fabric texture and tailoring details&lt;/li&gt;
&lt;li&gt; Temporarily stored in the &lt;code&gt;image_cache&lt;/code&gt; dict and served to the LINE Bot for display through the FastAPI &lt;code&gt;/images/{uuid}&lt;/code&gt; endpoint&lt;/li&gt;
&lt;/ol&gt;




&lt;h2&gt;
  
  
  Detailed explanation of the core code
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Step 1: Define tools (FunctionDeclaration)
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;google.genai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;types&lt;/span&gt;

&lt;span class="n"&gt;ECOMMERCE_TOOLS&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="n"&gt;types&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Tool&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;function_declarations&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="n"&gt;types&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;FunctionDeclaration&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;get_order_history&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;description&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Query the current user&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;s order history&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;parameters&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;types&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Schema&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                &lt;span class="nb"&gt;type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;types&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Type&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;OBJECT&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="n"&gt;properties&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
                    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;time_range&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;types&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Schema&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                        &lt;span class="nb"&gt;type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;types&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Type&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;STRING&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                        &lt;span class="n"&gt;description&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Time range: all / last_month / last_3_months&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                        &lt;span class="n"&gt;enum&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;all&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;last_month&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;last_3_months&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
                    &lt;span class="p"&gt;),&lt;/span&gt;
                &lt;span class="p"&gt;},&lt;/span&gt;
                &lt;span class="n"&gt;required&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[],&lt;/span&gt;
            &lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="c1"&gt;# ... search_products, get_product_details
&lt;/span&gt;    &lt;span class="p"&gt;])&lt;/span&gt;
&lt;span class="p"&gt;]&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Step 2: Function call cycle (up to 5 iterations)
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;process_message&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;line_user_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;contents&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;_get_history&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;line_user_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="n"&gt;types&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Content&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;role&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;parts&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;types&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Part&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;)])&lt;/span&gt;
    &lt;span class="p"&gt;]&lt;/span&gt;

    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;_iteration&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="c1"&gt;# Up to 5 times, to prevent infinite loops
&lt;/span&gt;        &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;aio&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;models&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;generate_content&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;contents&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;contents&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;types&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;GenerateContentConfig&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                &lt;span class="n"&gt;system_instruction&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;_SYSTEM_INSTRUCTION&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;ECOMMERCE_TOOLS&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="n"&gt;model_content&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;candidates&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;
        &lt;span class="n"&gt;contents&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;model_content&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="c1"&gt;# Find all function_call parts
&lt;/span&gt;        &lt;span class="n"&gt;fc_parts&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;p&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;p&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;model_content&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;parts&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;function_call&lt;/span&gt; &lt;span class="ow"&gt;and&lt;/span&gt; &lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;function_call&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;fc_parts&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="c1"&gt;# No function call → final text response
&lt;/span&gt;            &lt;span class="n"&gt;final_text&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;""&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;p&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;model_content&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;parts&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="k"&gt;break&lt;/span&gt;

        &lt;span class="c1"&gt;# Has function call → execute tool, include image
&lt;/span&gt;        &lt;span class="n"&gt;tool_parts&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;fc_part&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;fc_parts&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;result_dict&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;image_bytes&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;_execute_tool&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                &lt;span class="n"&gt;fc_part&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;function_call&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="nf"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;fc_part&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;function_call&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
                &lt;span class="n"&gt;line_user_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="n"&gt;tool_parts&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;_build_multimodal_response&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;fc_part&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;function_call&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;result_dict&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;image_bytes&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="n"&gt;contents&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;types&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Content&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;role&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tool&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;parts&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;tool_parts&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Step 3: Construct Multimodal Function Response (the most critical step)
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;_build_multimodal_response&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;func_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;result_dict&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;image_bytes&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;multimodal_parts&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;

    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;image_bytes&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="c1"&gt;# ⚠️ Note: Use FunctionResponseBlob here, not types.Blob!
&lt;/span&gt;        &lt;span class="n"&gt;multimodal_parts&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;types&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;FunctionResponsePart&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                &lt;span class="n"&gt;inline_data&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;types&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;FunctionResponseBlob&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                    &lt;span class="n"&gt;mime_type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;image/jpeg&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                    &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;image_bytes&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="c1"&gt;# raw bytes, SDK handles base64 internally
&lt;/span&gt;                &lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;types&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Part&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;from_function_response&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;func_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;result_dict&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="c1"&gt;# Structured JSON data
&lt;/span&gt;        &lt;span class="n"&gt;parts&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;multimodal_parts&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="c1"&gt;# ← Image is here! Gemini can "see" it after receiving it
&lt;/span&gt;    &lt;span class="p"&gt;)&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Gemini will receive both &lt;code&gt;result_dict&lt;/code&gt; (order JSON) and &lt;code&gt;image_bytes&lt;/code&gt; (product image) in the next &lt;code&gt;generate_content&lt;/code&gt; call, and the generated answer can therefore describe the visual content of the image.&lt;/p&gt;
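To see how the conversation grows across one tool round trip, here is a minimal sketch using plain dicts as stand-ins for `types.Content` (structure only; the real objects are Pydantic models from `google.genai`):

```python
# Plain-dict stand-ins for types.Content, showing only the role sequence
# that the loop in process_message builds up across one tool round trip.
contents = [{"role": "user", "parts": [{"text": "Show me product P001"}]}]

# Round 1: the model replies with a function_call part.
contents.append({"role": "model",
                 "parts": [{"function_call": {"name": "get_product_details"}}]})

# The tool is executed and its result (JSON + image) is appended as a tool turn.
contents.append({"role": "tool",
                 "parts": [{"function_response": {"name": "get_product_details"}}]})

# Round 2: the model now sees both the data and the photo, and answers in text.
contents.append({"role": "model",
                 "parts": [{"text": "A brown leather pilot jacket ..."}]})

roles = [c["role"] for c in contents]
```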

&lt;h3&gt;
  
  
  Step 4: LINE Bot simultaneously returns text + image
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# main.py
&lt;/span&gt;
&lt;span class="n"&gt;ai_text&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;image_bytes&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;ecommerce_agent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;process_message&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;msg_text&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;line_user_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;reply_messages&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nc"&gt;TextSendMessage&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;ai_text&lt;/span&gt;&lt;span class="p"&gt;)]&lt;/span&gt;

&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;image_bytes&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;image_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;uuid&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;uuid4&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
    &lt;span class="n"&gt;image_cache&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;image_id&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;image_bytes&lt;/span&gt; &lt;span class="c1"&gt;# Temporary storage
&lt;/span&gt;    &lt;span class="n"&gt;image_url&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;BOT_HOST_URL&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;/images/&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;image_id&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="c1"&gt;# FastAPI provides service
&lt;/span&gt;    &lt;span class="n"&gt;reply_messages&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="nc"&gt;ImageSendMessage&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;original_content_url&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;image_url&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;preview_image_url&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;image_url&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;get_line_bot_api&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;reply_message&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;reply_token&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;reply_messages&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;LINE Bot's &lt;code&gt;reply_message&lt;/code&gt; supports returning multiple messages at once (up to 5), so text and images can be sent simultaneously.&lt;/p&gt;
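Because of that five-message cap, a defensive truncation helper can be useful before calling `reply_message` (a hypothetical guard, not part of the repo's code):

```python
LINE_REPLY_LIMIT = 5  # LINE's reply_message accepts at most 5 message objects

def cap_reply_messages(messages: list) -> list:
    """Keep at most LINE_REPLY_LIMIT messages, dropping extras from the end."""
    return messages[:LINE_REPLY_LIMIT]
```

Calling `reply_messages = cap_reply_messages(reply_messages)` right before the reply keeps the request within LINE's limit even if more messages are ever appended.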




&lt;h2&gt;
  
  
  Pitfalls
&lt;/h2&gt;

&lt;h3&gt;
  
  
  ❌ Pitfall 1: &lt;code&gt;FunctionResponseBlob&lt;/code&gt; is not &lt;code&gt;Blob&lt;/code&gt;
&lt;/h3&gt;

&lt;p&gt;The most common pitfall: when constructing multimodal image parts, &lt;strong&gt;you cannot use &lt;code&gt;types.Blob&lt;/code&gt;&lt;/strong&gt;; you must use &lt;code&gt;types.FunctionResponseBlob&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# ❌ Error (will TypeError)
&lt;/span&gt;&lt;span class="n"&gt;types&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;FunctionResponsePart&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;inline_data&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;types&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Blob&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;mime_type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;image/jpeg&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;image_bytes&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# ✅ Correct
&lt;/span&gt;&lt;span class="n"&gt;types&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;FunctionResponsePart&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;inline_data&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;types&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;FunctionResponseBlob&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;mime_type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;image/jpeg&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;image_bytes&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Although both have &lt;code&gt;mime_type&lt;/code&gt; and &lt;code&gt;data&lt;/code&gt; fields, the &lt;code&gt;inline_data&lt;/code&gt; field of &lt;code&gt;FunctionResponsePart&lt;/code&gt; is typed as &lt;code&gt;FunctionResponseBlob&lt;/code&gt;, so Pydantic validation rejects a &lt;code&gt;Blob&lt;/code&gt; outright. You can confirm this with &lt;code&gt;python -c "from google.genai import types; print(types.FunctionResponsePart.model_fields)"&lt;/code&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  ❌ Pitfall 2: &lt;code&gt;aiohttp.ClientSession&lt;/code&gt; cannot be created at the module level
&lt;/h3&gt;

&lt;p&gt;The original code directly created &lt;code&gt;aiohttp.ClientSession()&lt;/code&gt; at the module level:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# ❌ Old method: module level
&lt;/span&gt;&lt;span class="n"&gt;session&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;aiohttp&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;ClientSession&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="c1"&gt;# Will warn or error if there is no running event loop
&lt;/span&gt;&lt;span class="n"&gt;async_http_client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;AiohttpAsyncHttpClient&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;session&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;When &lt;code&gt;main.py&lt;/code&gt; is imported in pytest, there is no running event loop, so &lt;code&gt;RuntimeError: no running event loop&lt;/code&gt; is raised. The fix is lazy initialization: create the session only the first time it is actually needed:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# ✅ New method: lazy init
&lt;/span&gt;&lt;span class="n"&gt;_line_bot_api&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;get_line_bot_api&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="k"&gt;global&lt;/span&gt; &lt;span class="n"&gt;_line_bot_api&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;_line_bot_api&lt;/span&gt; &lt;span class="ow"&gt;is&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;session&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;aiohttp&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;ClientSession&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="c1"&gt;# Called within the async route handler, guaranteeing an event loop
&lt;/span&gt;        &lt;span class="n"&gt;_line_bot_api&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;AsyncLineBotApi&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;channel_access_token&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nc"&gt;AiohttpAsyncHttpClient&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;session&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;_line_bot_api&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  ❌ Pitfall 3: LINE Bot needs HTTPS URL to send images
&lt;/h3&gt;

&lt;p&gt;Gemini receives raw bytes, but LINE Bot's &lt;code&gt;ImageSendMessage&lt;/code&gt; requires a &lt;strong&gt;publicly accessible HTTPS URL&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;The solution is to add a &lt;code&gt;/images/{image_id}&lt;/code&gt; endpoint in FastAPI, temporarily cache the image bytes in an in-memory &lt;code&gt;image_cache&lt;/code&gt; dict, and let LINE fetch the image through this endpoint:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="nd"&gt;@app.get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;/images/{image_id}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;serve_image&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;image_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;image_bytes&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;image_cache&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;image_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;image_bytes&lt;/span&gt; &lt;span class="ow"&gt;is&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;raise&lt;/span&gt; &lt;span class="nc"&gt;HTTPException&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;status_code&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;404&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;detail&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Image not found&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nc"&gt;Response&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;image_bytes&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;media_type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;image/jpeg&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
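For completeness, the write side of the cache can look like this. The uuid-based IDs, `cache_image`, and `BASE_URL` are my own sketch, not necessarily the repo's exact code:

```python
# Write side of the image cache (sketch): store the bytes under a fresh ID
# and hand LINE an HTTPS URL it can fetch from the /images/{image_id} route.
import uuid

image_cache = {}  # image_id -> raw bytes (in-memory, demo only)
BASE_URL = "https://example.com"  # ngrok or Cloud Run URL in practice


def cache_image(image_bytes: bytes) -> str:
    """Store bytes under a fresh ID and return the public HTTPS URL."""
    image_id = uuid.uuid4().hex
    image_cache[image_id] = image_bytes
    return f"{BASE_URL}/images/{image_id}"
```

The returned URL is what goes into LINE's `ImageSendMessage` as the publicly accessible image URL.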



&lt;p&gt;For local development, use &lt;code&gt;ngrok&lt;/code&gt; to expose port 8000; after deploying to Cloud Run, use the service URL directly.&lt;/p&gt;




&lt;h2&gt;
  
  
  Demo Display
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Mock database (default data for demo)
&lt;/h3&gt;

&lt;p&gt;The system has 5 built-in products (all with real Unsplash photos), and each LINE user is automatically assigned two demo orders the first time they query their order history:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Order number&lt;/th&gt;
&lt;th&gt;Date&lt;/th&gt;
&lt;th&gt;Product&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;ORD-2026-0115&lt;/td&gt;
&lt;td&gt;2026-01-15&lt;/td&gt;
&lt;td&gt;P001 Brown pilot jacket&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;ORD-2026-0108&lt;/td&gt;
&lt;td&gt;2026-01-08&lt;/td&gt;
&lt;td&gt;P003 Dark blue denim jacket&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
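A minimal sketch of this demo data in code. The field names and image paths are my assumptions, not the repo's exact schema; the IDs, dates, and prices come from the article:

```python
# Mock database sketch: products plus the two demo orders bound to each
# user on their first order query.
PRODUCTS = {
    "P001": {"name": "Brown pilot jacket", "price": 1890, "image": "img/p001.jpg"},
    "P003": {"name": "Dark blue denim jacket", "price": 1490, "image": "img/p003.jpg"},
}

DEMO_ORDERS = [
    {"order_id": "ORD-2026-0115", "date": "2026-01-15", "product_id": "P001"},
    {"order_id": "ORD-2026-0108", "date": "2026-01-08", "product_id": "P003"},
]

user_orders = {}  # user_id -> list of that user's orders


def bind_demo_orders(user_id: str) -> list:
    """On a user's first order query, copy the two demo orders to them."""
    if user_id not in user_orders:
        user_orders[user_id] = [dict(o) for o in DEMO_ORDERS]
    return user_orders[user_id]
```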

&lt;h3&gt;
  
  
  Scenario 1: "Help me see the jacket I bought before"
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;User sends: "Help me see the jacket I bought before"

[Gemini → function_call]
  get_order_history(time_range="all")

[_execute_tool execution]
  - get_order_history() returns two orders (P001, P003)
  - Read img/tobias-tullius-...-unsplash.jpg → Brown pilot jacket real photo bytes

[Multimodal Function Response]
  Part.from_function_response(
    name="get_order_history",
    response={"orders": [...], "order_count": 2},
    parts=[FunctionResponsePart(inline_data=FunctionResponseBlob(data=&amp;lt;photo&amp;gt;))]
  )

[Gemini responds after seeing the real photo]
  "From the photo, you can see that this is a brown pilot jacket, made of lightweight nylon with
   a glossy feel, and a metal zipper pocket on the left sleeve. This is your January 15, 2026
   order ORD-2026-0115, for a total of NT$1,890, status: delivered."

LINE displays: [Text] + [Brown pilot jacket real photo]

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
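The plumbing behind this transcript can be sketched as follows. `execute_get_order_history` and its return shape are my simplification of the article's `_execute_tool`: the JSON payload and the raw photo bytes travel together, and the bytes later become the multimodal parts of the function response.

```python
# Sketch: pair the JSON payload with the raw photo bytes that will become
# the multimodal parts of the function response.
def execute_get_order_history(orders, product_images):
    """Return (response_dict, image_bytes_list) for the function response."""
    payload = {"orders": orders, "order_count": len(orders)}
    images = [
        product_images[o["product_id"]]
        for o in orders
        if o["product_id"] in product_images
    ]
    return payload, images
```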



&lt;h3&gt;
  
  
  Scenario 2: "Are there any dark blue jackets?"
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[Gemini → function_call]
  search_products(description="dark blue jacket", color="dark blue")

[Gemini sees the real photo of the P003 dark blue denim jacket]
  "Yes! The dark blue denim jacket (P003) in the photo features retro stitching, a lapel
   with metal buttons, and a well-constructed feel, priced at NT$1,490, with 8 in stock."

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Scenario 3: "What are the features of the P004 knitted shawl?"
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[Gemini → function_call]
  get_product_details(product_id="P004")

[Gemini sees the real photo of the beige knitted shawl]
  "The photo shows a beige handmade crochet shawl with a V-neck design and tassels at the
   bottom. You can see the light, lace-like mesh weave and its elegant texture. Priced at NT$1,290."

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Traditional Function Response vs Multimodal Function Response
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;Traditional&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;Multimodal&lt;/strong&gt;&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Function return&lt;/td&gt;
&lt;td&gt;Pure JSON&lt;/td&gt;
&lt;td&gt;JSON + image/PDF bytes&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Gemini perception&lt;/td&gt;
&lt;td&gt;Text data&lt;/td&gt;
&lt;td&gt;Text + visual content&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Answer quality&lt;/td&gt;
&lt;td&gt;"You bought a brown pilot jacket"&lt;/td&gt;
&lt;td&gt;"You can see the nylon texture in the photo, with a zipper pocket on the left sleeve..."&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;API difference&lt;/td&gt;
&lt;td&gt;&lt;code&gt;Part.from_function_response(name, response)&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;Part.from_function_response(name, response, parts=[...])&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Applicable scenarios&lt;/td&gt;
&lt;td&gt;Pure text data queries&lt;/td&gt;
&lt;td&gt;Scenarios that require visual recognition/confirmation&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  Analysis and Outlook
&lt;/h2&gt;

&lt;p&gt;This implementation gave me a new understanding of Gemini's Function Calling capabilities.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The problem that Multimodal Function Response truly solves&lt;/strong&gt; is letting an AI agent carry visual information inside the very act of "calling an external system", instead of fetching text first and uploading images in a separate step. This will be a foundational capability in vision-heavy domains such as e-commerce, medicine, and design.&lt;/p&gt;

&lt;p&gt;However, there are still a few limitations worth noting:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Image URLs cannot be used directly&lt;/strong&gt;: Gemini's &lt;code&gt;FunctionResponseBlob&lt;/code&gt; requires raw bytes, and URLs cannot be filled in directly (this is different from bringing images directly in the prompt). If the image is originally a URL, you need to download it with &lt;code&gt;requests.get()&lt;/code&gt; to bytes before passing it in.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;display_name is optional&lt;/strong&gt;: The official documentation examples include &lt;code&gt;display_name&lt;/code&gt; and a &lt;code&gt;$ref&lt;/code&gt; JSON reference, but in my testing with google-genai 1.49.0 it works fine without &lt;code&gt;display_name&lt;/code&gt;, and Gemini can still see and analyze the image.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Model limitations&lt;/strong&gt;: The documentation officially lists support for the Gemini 3 series, but in my testing &lt;code&gt;gemini-2.0-flash&lt;/code&gt; also handles it normally, with the same API structure.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
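Point 1 in practice: if a tool returns an image URL, download it to bytes first. The helper names below are mine; `guess_image_mime` is a best-effort guess from the URL path, and the resulting bytes and MIME type then go into `FunctionResponseBlob(mime_type=..., data=...)`:

```python
# Sketch: FunctionResponseBlob needs raw bytes, so a URL has to be
# downloaded first. Helper names are my own.
import mimetypes
from urllib.request import urlopen


def guess_image_mime(url: str) -> str:
    """Best-effort MIME type from the URL path; fall back to JPEG."""
    mime, _ = mimetypes.guess_type(url.split("?")[0])
    return mime if mime and mime.startswith("image/") else "image/jpeg"


def fetch_image_bytes(url: str, timeout: float = 10.0) -> bytes:
    """Download the image content to raw bytes."""
    with urlopen(url, timeout=timeout) as resp:
        return resp.read()
```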

&lt;p&gt;There are many directions this can be extended: let users send their own product photos for the Bot to compare, include PDF catalogs in the function response for Gemini to read directly, or let the Bot analyze report images converted from DICOM in medical scenarios. As long as visual data can be obtained from an external system, Multimodal Function Response can make the AI's answers more in-depth.&lt;/p&gt;

&lt;h2&gt;
  
  
  Summary
&lt;/h2&gt;

&lt;p&gt;The takeaway from this LINE Bot implementation comes down to one sentence: &lt;strong&gt;let the function response carry the image, and Gemini's answer is upgraded from "restating data" to "telling a story based on the picture"&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;The core API is just these few lines, but it takes a lot of details to get the whole process working:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# The complete way for Gemini to see the image returned by the function
&lt;/span&gt;&lt;span class="n"&gt;types&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Part&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;from_function_response&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;get_order_history&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;orders&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[...]},&lt;/span&gt;
    &lt;span class="n"&gt;parts&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="n"&gt;types&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;FunctionResponsePart&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;inline_data&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;types&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;FunctionResponseBlob&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt; &lt;span class="c1"&gt;# ← Not types.Blob!
&lt;/span&gt;                &lt;span class="n"&gt;mime_type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;image/jpeg&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;image_bytes&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="p"&gt;],&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The complete code is on &lt;a href="https://github.com/kkdai/linebot-gemini-multimodel-funcal" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt;, feel free to clone and play with it.&lt;/p&gt;

&lt;p&gt;See you next time!&lt;/p&gt;

</description>
      <category>ai</category>
      <category>api</category>
      <category>gemini</category>
      <category>llm</category>
    </item>
    <item>
      <title>Gemini Tool Combo: Building a LINE Meetup Helper with Maps Grounding and Places API in a Single API Call</title>
      <dc:creator>Evan Lin</dc:creator>
      <pubDate>Sun, 29 Mar 2026 02:07:59 +0000</pubDate>
      <link>https://dev.to/gde/gemini-tool-combo-building-a-line-meetup-helper-with-maps-grounding-and-places-api-in-a-single-api-3ppd</link>
      <guid>https://dev.to/gde/gemini-tool-combo-building-a-line-meetup-helper-with-maps-grounding-and-places-api-in-a-single-api-3ppd</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1ljj7q6yd4dju6v6uxg2.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1ljj7q6yd4dju6v6uxg2.png" alt="image-20260327164715459" width="800" height="446"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Reference articles:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;a href="https://blog.google/innovation-and-ai/technology/developers-tools/gemini-api-tooling-updates/" rel="noopener noreferrer"&gt;Gemini API tooling updates: context circulation, tool combos and Maps grounding for Gemini 3&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;  &lt;a href="https://developers.google.com/maps/documentation/places/web-service/nearby-search" rel="noopener noreferrer"&gt;Google Places API (New) - searchNearby&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;  &lt;a href="https://github.com/kkdai/linebot-spot-finder" rel="noopener noreferrer"&gt;GitHub: linebot-spot-finder&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;  Complete code &lt;a href="https://github.com/kkdai/linebot-spot-finder" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt; (Meeting Helper LINE Bot Spot Finder)&lt;/li&gt;
&lt;/ul&gt;

&lt;h1&gt;
  
  
  Background
&lt;/h1&gt;

&lt;p&gt;The combination of LINE Bot + Gemini is already very common. Whether it's using Google Search Grounding to let the model look up real-time information or using Function Calling to let the model call custom logic, each is mature when used on its own.&lt;/p&gt;

&lt;p&gt;But what if you want to achieve both "map location context" and "query real ratings" &lt;strong&gt;in the same question&lt;/strong&gt;?&lt;/p&gt;

&lt;p&gt;Taking restaurant search as an example, the traditional approach usually looks like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;User: "Help me find a hot pot restaurant nearby with a rating of 4 stars or above"

Solution A (using only Maps Grounding):
Gemini has map context, but the rating information is described by AI itself, and accuracy is not guaranteed.

Solution B (using only Places API):
You can get real ratings, but there is no map context, and Gemini doesn't know where the user is.

To have both, you usually need to make two API calls, or manually connect them yourself.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Letting the AI search maps and call external APIs in a single call&lt;/strong&gt; has always been an awkward gap in the old Gemini API architecture.&lt;/p&gt;

&lt;p&gt;That changed on March 17, 2026, when Google released the &lt;a href="https://blog.google/innovation-and-ai/technology/developers-tools/gemini-api-tooling-updates/" rel="noopener noreferrer"&gt;Gemini API Tooling Updates&lt;/a&gt; (by Mariano Cocirio), providing an official solution to this problem.&lt;/p&gt;




&lt;h2&gt;
  
  
  What are Tool Combinations?
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fl69w6em7cc4jzvdxmdiu.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fl69w6em7cc4jzvdxmdiu.png" alt="image-20260327163136077" width="800" height="341"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Google announced three core features in this &lt;a href="https://blog.google/innovation-and-ai/technology/developers-tools/gemini-api-tooling-updates/" rel="noopener noreferrer"&gt;update&lt;/a&gt;:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Tool Combinations&lt;/strong&gt; Developers can now attach built-in tools (such as Google Search, Google Maps) and custom Function Declarations simultaneously in a &lt;strong&gt;single Gemini API call&lt;/strong&gt;. The model decides which tool to call and when to call it, and finally integrates the results to generate an answer.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Maps Grounding&lt;/strong&gt; Gemini can now directly perceive map data, not just text descriptions of "location", but truly has spatial context—knowing where the user is and what's nearby.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. Context Circulation&lt;/strong&gt; Allows the context between multi-turn tool calls to flow naturally, and the model can fully remember the results of the first tool call when making the second call.&lt;/p&gt;

&lt;p&gt;The key to this change is:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Old approach (two tools cannot coexist)
&lt;/span&gt;&lt;span class="n"&gt;types&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Tool&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;google_search&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;types&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;GoogleSearch&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
&lt;span class="n"&gt;types&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Tool&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;function_declarations&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;MY_FN&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;

&lt;span class="c1"&gt;# New approach (the same Tool object, both coexist)
&lt;/span&gt;&lt;span class="n"&gt;types&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Tool&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;google_maps&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;types&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;GoogleMaps&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
    &lt;span class="n"&gt;function_declarations&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;MY_FN&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A one-line change opens up an entirely new way of combining tools.&lt;/p&gt;




&lt;h2&gt;
  
  
  Project Goal
&lt;/h2&gt;

&lt;p&gt;This time, I used Tool Combinations to transform the existing &lt;strong&gt;linebot-spot-finder&lt;/strong&gt;, upgrading it from "only Maps Grounding for rough answers" to "Google Maps context + Places API real data":&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;After the user sends their GPS location, they enter: "Please find a hot pot restaurant with a rating of 4 stars or above, suitable for group dining, and list the name, address, and review summary."&lt;/p&gt;

&lt;p&gt;Bot (old Maps Grounding): "There are several hot pot restaurants nearby, and the ratings are good." (AI describes it itself, which may not be accurate)&lt;/p&gt;

&lt;p&gt;Bot (new Tool Combo): "Lao Wang Hot Pot | 100 Shimin Avenue, Xinyi District, Taipei City | Rating 4.6 (312) | Reviews: Large portions, great value for money, suitable for group dining; efficient service, fast serving."&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The difference: Gemini now receives both the map context (where you are) and the &lt;strong&gt;real structured data&lt;/strong&gt; (rating numbers, review text) from the Places API, so the answer changes from a "vague description" to "grounded information".&lt;/p&gt;




&lt;h2&gt;
  
  
  Architecture Design
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Overall Message Flow
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;LINE User sends GPS location
    │
    ▼
handle_location() → session.metadata stores lat/lng
    │
    └──► Returns Quick Reply (restaurant / gas station / parking lot)

LINE User sends text question (e.g. "Find a hot pot restaurant with a rating of 4 stars or above")
    │
    ▼
handle_text()
    │
    ├── session has lat/lng?
    │ Yes → tool_combo_search(query, lat, lng) ← Focus of this article
    │ No → fallback: Gemini Chat + Google Search
    │
    └──► Returns natural language answer
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
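The session bookkeeping in the flow above can be sketched like this. `handle_location` and `session.metadata` appear in the diagram; the dict-based `sessions` shape and the return value are my assumptions:

```python
# Sketch: stash the user's GPS coordinates in the session so that later
# text messages can reuse them, and offer the Quick Reply categories.
sessions = {}  # user_id -> {"metadata": {...}}


def handle_location(user_id: str, lat: float, lng: float) -> dict:
    """Store lat/lng in the user's session metadata."""
    session = sessions.setdefault(user_id, {"metadata": {}})
    session["metadata"]["lat"] = lat
    session["metadata"]["lng"] = lng
    # Quick Reply choices offered back to the user
    return {"quick_reply": ["restaurant", "gas station", "parking lot"]}
```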



&lt;h3&gt;
  
  
  Tool Combo Agentic Loop
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;tool_combo_search(query, lat, lng)
         │
         ▼
  Step 1: generate_content()
  tools = [google_maps + search_nearby_restaurants]
         │
         ▼
  response.candidates[0].content.parts has function_call?
       ╱ ╲
      Yes   No
      │     │
      ▼     ▼
  _execute_function()  Directly returns response.text
  → _call_places_api()
    (Places API searchNearby)
    Returns rating, address, reviews
      │
      ▼
  Collect into a single Content(role="user")
  Add to history
      │
      ▼
  Step 3: generate_content(contents=history)
  Gemini integrates map context + Places data
      │
      ▼
  Returns final.text
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Why not put lat/lng in Function Declaration?
&lt;/h3&gt;

&lt;p&gt;This is an important design decision.&lt;/p&gt;

&lt;p&gt;If you add &lt;code&gt;lat&lt;/code&gt;/&lt;code&gt;lng&lt;/code&gt; to the parameters of &lt;code&gt;SEARCH_NEARBY_RESTAURANTS_FN&lt;/code&gt;, Gemini will fill in the coordinates itself—but it fills in the "approximate location" inferred from the conversation, not the user's actual GPS coordinates, and the error can be as high as several kilometers.&lt;/p&gt;

&lt;p&gt;The correct approach is to let the Python dispatcher extract the precise coordinates from &lt;code&gt;session.metadata&lt;/code&gt; and &lt;strong&gt;inject&lt;/strong&gt; them:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;_execute_function&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;lat&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;float&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;lng&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;float&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;name&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;search_nearby_restaurants&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;_call_places_api&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;lat&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;lat&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;lng&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;lng&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="c1"&gt;# ← Inject from session, don't let Gemini guess
&lt;/span&gt;            &lt;span class="n"&gt;keyword&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;keyword&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;""&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
            &lt;span class="n"&gt;min_rating&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nf"&gt;float&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;min_rating&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mf"&gt;4.0&lt;/span&gt;&lt;span class="p"&gt;)),&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Core Code Details
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Step 1: Define Function Declaration
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;google.genai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;types&lt;/span&gt;

&lt;span class="n"&gt;SEARCH_NEARBY_RESTAURANTS_FN&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;types&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;FunctionDeclaration&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;search_nearby_restaurants&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;description&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Search for nearby restaurants using Google Places API, and return the rating, address, and user reviews.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;lat/lng is automatically included by the system and does not need to be provided.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;parameters&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;types&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Schema&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="nb"&gt;type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;types&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Type&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;OBJECT&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;properties&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;keyword&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;types&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Schema&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                &lt;span class="nb"&gt;type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;types&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Type&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;STRING&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="n"&gt;description&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Restaurant type or keyword, such as: hot pot, hot pot, Italian&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="p"&gt;),&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;min_rating&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;types&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Schema&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                &lt;span class="nb"&gt;type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;types&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Type&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;NUMBER&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="n"&gt;description&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Minimum rating threshold (1–5), default 4.0&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="p"&gt;),&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;radius_m&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;types&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Schema&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                &lt;span class="nb"&gt;type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;types&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Type&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;INTEGER&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="n"&gt;description&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Search radius (meters), default 1000&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;),&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The description explicitly tells the model that "lat/lng is included by the system", which prevents the model from filling in coordinates itself in the args.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 2: Places API Call
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;httpx&lt;/span&gt;

&lt;span class="n"&gt;PLACES_API_URL&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://places.googleapis.com/v1/places:searchNearby&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="n"&gt;PLACES_FIELD_MASK&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;places.displayName,&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;places.rating,&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;places.userRatingCount,&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;places.formattedAddress,&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;places.reviews&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;_call_places_api&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;lat&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;lng&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;keyword&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;""&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;min_rating&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;4.0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;radius_m&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1000&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;body&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;includedTypes&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;restaurant&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;maxResultCount&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;locationRestriction&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;circle&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;center&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;latitude&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;lat&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;longitude&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;lng&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;radiusMeters&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;radius_m&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;httpx&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;post&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;PLACES_API_URL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;headers&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;X-Goog-Api-Key&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getenv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;GOOGLE_MAPS_API_KEY&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;X-Goog-FieldMask&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;PLACES_FIELD_MASK&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;},&lt;/span&gt;
        &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;body&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;timeout&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;10.0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;raise_for_status&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="n"&gt;restaurants&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;place&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;places&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;[]):&lt;/span&gt;
        &lt;span class="n"&gt;rating&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;place&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;rating&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;rating&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="n"&gt;min_rating&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;continue&lt;/span&gt;
        &lt;span class="n"&gt;reviews&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
            &lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
            &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;place&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;reviews&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;[])[:&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{}).&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="n"&gt;restaurants&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;name&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;place&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;displayName&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;address&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;place&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;formattedAddress&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;""&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;rating&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;rating&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;rating_count&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;place&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;userRatingCount&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;reviews&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;reviews&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;})&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;restaurants&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;restaurants&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Step 3: Tool Combo Main Function (Agentic Loop)
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;tool_combo_search&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;lat&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;float&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;lng&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;float&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;genai&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Client&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;vertexai&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;project&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getenv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;GOOGLE_CLOUD_PROJECT&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="n"&gt;location&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getenv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;GOOGLE_CLOUD_LOCATION&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;us-central1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="n"&gt;http_options&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;types&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;HttpOptions&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;api_version&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;v1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;enriched_query&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;User&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;s current location: latitude &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;lat&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;, longitude &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;lng&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;.&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Please answer in traditional Chinese using Taiwanese terminology, and do not use markdown format.&lt;/span&gt;&lt;span class="se"&gt;\n\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Question: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;tool_config&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;types&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;GenerateContentConfig&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;
            &lt;span class="n"&gt;types&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Tool&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                &lt;span class="n"&gt;google_maps&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;types&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;GoogleMaps&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; &lt;span class="c1"&gt;# ← Maps grounding
&lt;/span&gt;                &lt;span class="n"&gt;function_declarations&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;SEARCH_NEARBY_RESTAURANTS_FN&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="c1"&gt;# ← Places API
&lt;/span&gt;            &lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# ── Step 1 ──────────────────────────────────────────────────────
&lt;/span&gt;    &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;models&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;generate_content&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;TOOL_COMBO_MODEL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;contents&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;enriched_query&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;tool_config&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;candidates&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;（Unable to get a reply）&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

    &lt;span class="n"&gt;history&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="n"&gt;types&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Content&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;role&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;parts&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;types&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Part&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;enriched_query&lt;/span&gt;&lt;span class="p"&gt;)]),&lt;/span&gt;
        &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;candidates&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;]&lt;/span&gt;

    &lt;span class="c1"&gt;# ── Step 2：Processing function_call ──────────────────────────────────
&lt;/span&gt;    &lt;span class="n"&gt;function_response_parts&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;part&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;candidates&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;parts&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;part&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;function_call&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;fn&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;part&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;function_call&lt;/span&gt;
            &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;_execute_function&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;fn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nf"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;fn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;args&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="p"&gt;{}),&lt;/span&gt; &lt;span class="n"&gt;lat&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;lng&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="n"&gt;function_response_parts&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                &lt;span class="n"&gt;types&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Part&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                    &lt;span class="n"&gt;function_response&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;types&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;FunctionResponse&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                        &lt;span class="nb"&gt;id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;fn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;fn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                    &lt;span class="p"&gt;)&lt;/span&gt;
                &lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;function_response_parts&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;history&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;types&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Content&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;role&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;parts&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;function_response_parts&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

        &lt;span class="c1"&gt;# ── Step 3 ────────────────────────────────────────────────────
&lt;/span&gt;        &lt;span class="n"&gt;final&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;models&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;generate_content&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;TOOL_COMBO_MODEL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;contents&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;history&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;tool_config&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;final&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;（Unable to get a reply）&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;（Unable to get a reply）&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Pitfalls Encountered
&lt;/h2&gt;

&lt;h3&gt;
  
  
  ❌ Pitfall 1: &lt;code&gt;Part.from_function_response()&lt;/code&gt; does not accept the &lt;code&gt;id&lt;/code&gt; parameter
&lt;/h3&gt;

&lt;p&gt;This was the easiest pitfall to hit: the error only surfaces during &lt;strong&gt;real model calls&lt;/strong&gt;, so unit tests almost never catch it.&lt;/p&gt;

&lt;p&gt;Following the official example, I originally wrote it like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# ❌ Error——TypeError occurs at runtime
&lt;/span&gt;&lt;span class="n"&gt;types&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Part&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;from_function_response&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="nb"&gt;id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;fn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="c1"&gt;# ← This parameter does not exist!
&lt;/span&gt;    &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;fn_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The actual signature of &lt;code&gt;from_function_response&lt;/code&gt; is:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;(*, name: str, response: dict, parts: Optional[list] = None) -&amp;gt; Part
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;There is no &lt;code&gt;id&lt;/code&gt; parameter at all. Every time the model actually triggers a function_call, this line throws a &lt;code&gt;TypeError&lt;/code&gt;, the code silently falls into the except branch of Step 3 and returns an error message, and the Places API results never make it back to Gemini.&lt;/p&gt;

&lt;p&gt;The correct way is to directly construct &lt;code&gt;types.FunctionResponse&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# ✅ Correct
&lt;/span&gt;&lt;span class="n"&gt;types&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Part&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;function_response&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;types&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;FunctionResponse&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="nb"&gt;id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;fn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;fn_name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You can immediately confirm the parameter list with &lt;code&gt;python -c "from google.genai import types; help(types.Part.from_function_response)"&lt;/code&gt;.&lt;/p&gt;
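&lt;p&gt;Beyond &lt;code&gt;help()&lt;/code&gt;, the same check can be automated with &lt;code&gt;inspect.signature&lt;/code&gt; from the standard library, which makes a useful guard in tests. A generic sketch, demonstrated on a stand-in function with the same keyword-only shape:&lt;/p&gt;

```python
import inspect

def from_function_response(*, name, response, parts=None):
    """Stand-in with the same keyword-only parameters as the real method."""
    return {"name": name, "response": response, "parts": parts}

params = inspect.signature(from_function_response).parameters
print("id" in params)   # False — passing id= would raise TypeError
print(sorted(params))   # ['name', 'parts', 'response']
```

An assertion like this in the test suite catches the wrong-keyword bug before any real model call is made.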

&lt;h3&gt;
  
  
  ❌ Pitfall 2: &lt;code&gt;include_server_side_tool_invocations=True&lt;/code&gt; causes Pydantic to explode
&lt;/h3&gt;

&lt;p&gt;After seeing it in an official documentation example, I added this parameter:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# ❌ Error
&lt;/span&gt;&lt;span class="n"&gt;types&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;GenerateContentConfig&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[...],&lt;/span&gt;
    &lt;span class="n"&gt;include_server_side_tool_invocations&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="c1"&gt;# ← The installed SDK version does not support it
&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In &lt;code&gt;google-genai 1.49.0&lt;/code&gt;, this field is not among the model fields of &lt;code&gt;GenerateContentConfig&lt;/code&gt;, so Pydantic throws an &lt;code&gt;extra_forbidden&lt;/code&gt; validation error. Removing it restores normal behavior.&lt;/p&gt;
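&lt;p&gt;A version-tolerant pattern is to drop keyword arguments the installed SDK version does not know about before constructing the config. Sketched generically below; the field set is illustrative, while in real code you could pass something like &lt;code&gt;GenerateContentConfig.model_fields&lt;/code&gt;:&lt;/p&gt;

```python
def filter_supported(supported_fields, kwargs):
    """Split kwargs into those the installed SDK version accepts and
    those Pydantic would reject with an extra_forbidden error."""
    ok = {k: v for k, v in kwargs.items() if k in supported_fields}
    dropped = sorted(set(kwargs) - set(supported_fields))
    return ok, dropped

# Illustrative subset of config fields:
fields = {"tools", "temperature", "max_output_tokens"}
ok, dropped = filter_supported(fields, {
    "tools": [], "include_server_side_tool_invocations": True})
print(ok)       # {'tools': []}
print(dropped)  # ['include_server_side_tool_invocations']
```

Logging the &lt;code&gt;dropped&lt;/code&gt; list makes the version mismatch visible instead of crashing at config construction.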

&lt;h3&gt;
  
  
  ❌ Pitfall 3: &lt;code&gt;textQuery&lt;/code&gt; is a parameter of &lt;code&gt;searchText&lt;/code&gt;, not &lt;code&gt;searchNearby&lt;/code&gt;
&lt;/h3&gt;

&lt;p&gt;I thought "if there is a keyword, then bring it into the Places API", and intuitively added it to the request body:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# ❌ Error——Invalid field for searchNearby endpoint
&lt;/span&gt;&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;keyword&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;body&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;textQuery&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;keyword&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;searchNearby&lt;/code&gt; only accepts fields such as &lt;code&gt;includedTypes&lt;/code&gt; and &lt;code&gt;locationRestriction&lt;/code&gt;; &lt;code&gt;textQuery&lt;/code&gt; belongs to the &lt;code&gt;searchText&lt;/code&gt; endpoint. In some versions, adding the field does not even raise an error, but the keyword simply never takes effect.&lt;/p&gt;

&lt;p&gt;The correct approach is to keep the keyword in the Function Declaration's description for Gemini to consult: the model folds the intent into &lt;code&gt;enriched_query&lt;/code&gt;, Maps Grounding handles the keyword semantics, and the Places API is responsible only for returning real rating data.&lt;/p&gt;
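&lt;p&gt;To make the field split concrete, here is a hedged sketch of a valid &lt;code&gt;searchNearby&lt;/code&gt; body. &lt;code&gt;build_search_nearby_body&lt;/code&gt; and its parameters are illustrative names, and there is deliberately no keyword or rating field: keyword semantics stay with Gemini, and rating filtering happens client-side on the returned results.&lt;/p&gt;

```python
def build_search_nearby_body(lat, lng, radius_m=1500.0, max_results=10):
    """Request body for Places API (New) searchNearby: location + types only.

    Note there is no textQuery here; that field exists only on searchText.
    """
    return {
        "includedTypes": ["restaurant"],
        "maxResultCount": max_results,
        "locationRestriction": {
            "circle": {
                "center": {"latitude": lat, "longitude": lng},
                "radius": radius_m,
            }
        },
    }

body = build_search_nearby_body(25.0441, 121.5598)
```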

&lt;h3&gt;
  
  
  ❌ Pitfall 4: No guard for &lt;code&gt;response.candidates[0]&lt;/code&gt;
&lt;/h3&gt;

&lt;p&gt;When the model hits safety filtering, RECITATION, or another abnormal termination, &lt;code&gt;candidates&lt;/code&gt; may be an empty list, and &lt;code&gt;response.candidates[0]&lt;/code&gt; then raises an &lt;code&gt;IndexError&lt;/code&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# ❌ No guard
&lt;/span&gt;&lt;span class="n"&gt;history&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="n"&gt;types&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Content&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;role&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;parts&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;types&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Part&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;enriched_query&lt;/span&gt;&lt;span class="p"&gt;)]),&lt;/span&gt;
    &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;candidates&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="c1"&gt;# ← If candidates is empty, it will explode
&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="c1"&gt;# ✅ Add guard
&lt;/span&gt;&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;candidates&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;（Unable to get a reply）&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

&lt;span class="n"&gt;history&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[...]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
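&lt;p&gt;The same defensiveness is worth extending one level down: even with a non-empty &lt;code&gt;candidates&lt;/code&gt;, &lt;code&gt;content&lt;/code&gt; or &lt;code&gt;parts&lt;/code&gt; can still be missing. A small helper sketch; &lt;code&gt;SimpleNamespace&lt;/code&gt; merely stands in for the SDK's response objects:&lt;/p&gt;

```python
from types import SimpleNamespace

def first_function_call(response):
    """Return the first function_call part, or None if anything is missing."""
    if not getattr(response, "candidates", None):
        return None
    content = response.candidates[0].content
    if content is None or not getattr(content, "parts", None):
        return None
    for part in content.parts:
        fc = getattr(part, "function_call", None)
        if fc is not None:
            return fc
    return None

# Simulated responses; SimpleNamespace stands in for SDK objects:
empty = SimpleNamespace(candidates=[])
blocked = SimpleNamespace(candidates=[SimpleNamespace(content=None)])
ok = SimpleNamespace(candidates=[SimpleNamespace(
    content=SimpleNamespace(parts=[SimpleNamespace(function_call="fc")])
)])
```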






&lt;h2&gt;
  
  
  Demo Display
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvionmd6lsyr2srm5gg87.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvionmd6lsyr2srm5gg87.png" alt="image-20260327163200329" width="800" height="1739"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Scenario 1: "Find a hot pot restaurant with a rating of 4 stars or above for group dining"
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;User sends: GPS location (Xinyi District, Taipei City, 25.0441, 121.5598)

User enters: "Please find a hot pot restaurant with a rating of 4 stars or above, suitable for group dining, and list the name, address, and review summary."

[Step 1: Gemini receives query + map context]
  → Detects the need for restaurant data, emits function_call:
    search_nearby_restaurants(keyword="hot pot", min_rating=4.0)

[Step 2: Python calls Places API]
  → lat=25.0441, lng=121.5598 injected from session
  → Returns 3 restaurants with a rating ≥ 4.0, including review text

[Step 3: Gemini integrates Maps context + Places data]
  → "Lao Wang Hot Pot｜100 Shimin Avenue, Xinyi District｜⭐ 4.6 (312)
      Review summary: Large portions, great value for money, a top choice for friends to dine; fast service, fresh dishes.
     ... (3 restaurants in total)"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Scenario 2: "Are there any high-value Japanese restaurants?"
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;User enters: "Are there any high-value Japanese restaurants nearby?"

[Step 1: Gemini]
  → function_call: search_nearby_restaurants(keyword="Japanese cuisine", min_rating=4.0)

[Step 2: Places API]
  → Returns 2 Japanese restaurants that meet the rating criteria

[Step 3: Gemini]
  → "There are two recommendations:
      Washoku ○○｜...｜⭐ 4.4｜Reviews: Weekday lunch set is only NT$280, very fresh.
      ..."
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Demo Script Quick Test
&lt;/h3&gt;

&lt;p&gt;No LINE Bot needed; run it directly on the local machine:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Only test Tool Combo (main function)&lt;/span&gt;
python demo.py combo

&lt;span class="c"&gt;# Run all three functions&lt;/span&gt;
python demo.py all
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Old Architecture vs. New Architecture
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;Old Architecture (Maps Grounding only)&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;New Architecture (Tool Combo)&lt;/strong&gt;&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Tool&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;google_maps&lt;/code&gt; (built-in)&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;google_maps&lt;/code&gt; + &lt;code&gt;search_nearby_restaurants&lt;/code&gt; (custom)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Rating Data&lt;/td&gt;
&lt;td&gt;Gemini describes it itself (may not be accurate)&lt;/td&gt;
&lt;td&gt;Places API real numbers&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Reviews&lt;/td&gt;
&lt;td&gt;AI generated&lt;/td&gt;
&lt;td&gt;Real user reviews (up to 3)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;API Call Count&lt;/td&gt;
&lt;td&gt;1 call&lt;/td&gt;
&lt;td&gt;1 call (Step 1) + 1 call (Step 3) = 2 calls, transparent to the user&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Accuracy&lt;/td&gt;
&lt;td&gt;Medium&lt;/td&gt;
&lt;td&gt;High&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Custom Filtering&lt;/td&gt;
&lt;td&gt;Rely on prompt&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;min_rating&lt;/code&gt;, &lt;code&gt;radius_m&lt;/code&gt; precise control&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
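&lt;p&gt;The "Custom Filtering" row of the table boils down to a few lines of client-side filtering over the real Places data. A sketch, assuming the Places API (New) field names &lt;code&gt;rating&lt;/code&gt; and &lt;code&gt;userRatingCount&lt;/code&gt;; &lt;code&gt;filter_places&lt;/code&gt; is an illustrative helper, not from the repo:&lt;/p&gt;

```python
def filter_places(places, min_rating=4.0, limit=3):
    """Keep places at or above min_rating, best-rated (then most-reviewed) first."""
    rated = [p for p in places if p.get("rating", 0.0) >= min_rating]
    rated.sort(key=lambda p: (-p["rating"], -p.get("userRatingCount", 0)))
    return rated[:limit]

sample = [
    {"displayName": "A", "rating": 4.6, "userRatingCount": 312},
    {"displayName": "B", "rating": 3.9, "userRatingCount": 900},  # filtered out
    {"displayName": "C", "rating": 4.6, "userRatingCount": 80},
]
top = filter_places(sample)
```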




&lt;h2&gt;
  
  
  Analysis and Outlook
&lt;/h2&gt;

&lt;p&gt;This implementation has given me a clearer understanding of the potential of Gemini Tool Combinations.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The problem Tool Combinations truly solves&lt;/strong&gt; is that Grounding and Function Calling are no longer mutually exclusive. Previously, to get "map context + real external data", you either had to wire the two APIs together yourself at the application layer, or rely on Gemini's text generation to "simulate" external data (unreliable). Now the model itself knows when to use map context and when to call the Places API; developers only need to attach the tools.&lt;/p&gt;

&lt;p&gt;However, there are also a few things to note about this implementation:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;The &lt;code&gt;lat/lng&lt;/code&gt; injection pattern matters&lt;/strong&gt;: never let the model guess the coordinates itself; inject them from the session, or positioning accuracy suffers badly. The same pattern applies to any function-calling scenario that carries session state.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;The cost of two &lt;code&gt;generate_content&lt;/code&gt; calls&lt;/strong&gt;: the agentic loop of Tool Combo requires two model calls, so token consumption is roughly 1.5–2x that of a single call. Weigh this carefully in latency-sensitive scenarios.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;SDK version differences&lt;/strong&gt;: different versions of &lt;code&gt;google-genai&lt;/code&gt; support different fields on &lt;code&gt;GenerateContentConfig&lt;/code&gt;; use new fields like &lt;code&gt;include_server_side_tool_invocations&lt;/code&gt; only after confirming the installed version, otherwise the resulting Pydantic validation errors are hard to track down.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
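&lt;p&gt;Point 1 can be made concrete as a tiny merge step run just before executing the tool. The names here (&lt;code&gt;inject_session_location&lt;/code&gt;, the &lt;code&gt;session&lt;/code&gt; dict) are illustrative rather than from the repo:&lt;/p&gt;

```python
def inject_session_location(fc_args, session):
    """Overwrite lat/lng with server-side session values;
    the model never supplies coordinates itself."""
    args = dict(fc_args)  # don't mutate the model's original args
    args["lat"] = session["lat"]
    args["lng"] = session["lng"]
    return args

model_args = {"keyword": "hot pot", "min_rating": 4.0, "lat": 0.0, "lng": 0.0}
session = {"lat": 25.0441, "lng": 121.5598}
safe_args = inject_session_location(model_args, session)
```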

&lt;p&gt;Future directions that can be extended:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  Connect the Postback quick replies (click the "Find Restaurant" button) to Tool Combo, so that each entry can get real ratings&lt;/li&gt;
&lt;li&gt;  Add the &lt;code&gt;searchText&lt;/code&gt; endpoint to support more complex keyword searches (e.g. Michelin recommendations)&lt;/li&gt;
&lt;li&gt;  Tool Combo combined with other built-in tools (such as &lt;code&gt;google_search&lt;/code&gt;) to achieve more complex multi-tool chaining&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Summary
&lt;/h2&gt;

&lt;p&gt;The core concept of this modification is only one sentence: &lt;strong&gt;Put Google Maps grounding and the Places API function tool in the same &lt;code&gt;types.Tool&lt;/code&gt;, and Gemini will coordinate the two in a single conversation.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The key code is only these few lines:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# This is all the magic of Tool Combo
&lt;/span&gt;&lt;span class="n"&gt;types&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Tool&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;google_maps&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;types&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;GoogleMaps&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; &lt;span class="c1"&gt;# ← Maps context
&lt;/span&gt;    &lt;span class="n"&gt;function_declarations&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;SEARCH_NEARBY_RESTAURANTS_FN&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="c1"&gt;# ← Places API
&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;To make it really work, though, you also need to get a few details right: how &lt;code&gt;FunctionResponse&lt;/code&gt; is constructed, the &lt;code&gt;candidates&lt;/code&gt; guard, the correct fields for each Places API endpoint, and injecting &lt;code&gt;lat/lng&lt;/code&gt; from the session instead of letting the model guess.&lt;/p&gt;

&lt;p&gt;The complete code is on &lt;a href="https://github.com/kkdai/linebot-spot-finder" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt;, feel free to clone and play with it.&lt;/p&gt;

&lt;p&gt;See you next time!&lt;/p&gt;

</description>
    </item>
    <item>
      <title>Gemini 3.1: Real-World Voice Recognition with Flash Live: Making Your LINE Bot Understand You</title>
      <dc:creator>Evan Lin</dc:creator>
      <pubDate>Sun, 29 Mar 2026 02:07:26 +0000</pubDate>
      <link>https://dev.to/gde/gemini-31-real-world-voice-recognition-with-flash-live-making-your-line-bot-understand-you-560o</link>
      <guid>https://dev.to/gde/gemini-31-real-world-voice-recognition-with-flash-live-making-your-line-bot-understand-you-560o</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjl6ycarc8j4uczflmtoj.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjl6ycarc8j4uczflmtoj.png" alt="image-20260328203306501" width="800" height="1739"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h1&gt;
  
  
  Background
&lt;/h1&gt;

&lt;p&gt;Google released &lt;strong&gt;Gemini 3.1 Flash Live&lt;/strong&gt; at the end of March 2026 (&lt;a href="https://blog.google/innovation-and-ai/models-and-research/gemini-models/gemini-3-1-flash-live/" rel="noopener noreferrer"&gt;announcement&lt;/a&gt;), focusing on "making audio AI more natural and reliable." This model is designed specifically for real-time two-way voice conversations, with low latency, interruptibility, and multi-language support.&lt;/p&gt;

&lt;p&gt;I happened to have a LINE Bot project (&lt;a href="https://github.com/kkdai/linebot-helper-python" rel="noopener noreferrer"&gt;linebot-helper-python&lt;/a&gt;) on hand, which already handles text, images, URLs, PDFs, and YouTube, but completely ignores voice messages:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;User sends a voice message
Bot: (Silence)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This time, I'll add voice support and share a few pitfalls I encountered.&lt;/p&gt;




&lt;h2&gt;
  
  
  Design Decision: Flash Live or Standard Gemini API?
&lt;/h2&gt;

&lt;p&gt;The first question: Gemini 3.1 Flash Live is designed for &lt;strong&gt;real-time streaming&lt;/strong&gt;, but LINE's voice messages are &lt;strong&gt;pre-recorded m4a files&lt;/strong&gt;, not real-time audio streams.&lt;/p&gt;

&lt;p&gt;Using Flash Live to process pre-recorded files is like using a live-streaming camera to take photos: technically feasible, but the wrong tool.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;I decided to use the standard Gemini API&lt;/strong&gt;: pass the audio bytes as inline data and get the transcribed text back in a single call. It is simpler and better suited to this scenario.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frdx9agqz7jujs89xky8x.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frdx9agqz7jujs89xky8x.png" alt="image-20260328203340798" width="800" height="322"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Architecture Design
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Integration Approach
&lt;/h3&gt;

&lt;p&gt;This repo already has a complete Orchestrator architecture, which automatically routes to different Agents (Chat, Content, Location, Vision, GitHub) based on the message content. The goal for voice messages is clear:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Convert voice to text, then treat it as a regular text message and pass it into the Orchestrator, so that all existing features automatically support voice input.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;User says "Help me search for nearby gas stations" → transcribed into text → Orchestrator determines it's a location query → LocationAgent processes it. No need to implement separate logic for voice.&lt;/p&gt;

&lt;h3&gt;
  
  
  Complete Flow
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;User sends AudioMessage (m4a)
    │
    ▼ handle_audio_message()
    │
    ├─ ① LINE SDK downloads audio bytes
    │ get_message_content(message_id) → iter_content()
    │
    ├─ ② Gemini transcription
    │ tools/audio_tool.py → transcribe_audio()
    │ model: gemini-3.1-flash-lite-preview
    │
    ├─ ③ Reply #1: "You said: {transcription}"
    │ reply_message() (consumes reply token)
    │
    └─ ④ Reply #2: Orchestrator routing
            handle_text_message_via_orchestrator(push_user_id=user_id)
            ↓
            push_message() (reply token already used, use push instead)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Why two replies?
&lt;/h3&gt;

&lt;p&gt;Splitting the reply in two lets the user &lt;strong&gt;see the transcription result immediately&lt;/strong&gt;, without having to wait for the Orchestrator to finish before knowing whether the Bot understood them.&lt;/p&gt;




&lt;h2&gt;
  
  
  Core Code Explanation
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Step 1: Audio Transcription Tool (tools/audio_tool.py)
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;google&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;genai&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;google.genai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;types&lt;/span&gt;

&lt;span class="n"&gt;TRANSCRIPTION_MODEL&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gemini-3.1-flash-lite-preview&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;transcribe_audio&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;audio_bytes&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;bytes&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;mime_type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;audio/mp4&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
    Transcribe audio bytes to text using Gemini.
    LINE voice messages are always m4a, MIME type is always audio/mp4.
    &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;genai&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Client&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;vertexai&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;project&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getenv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;GOOGLE_CLOUD_PROJECT&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="n"&gt;location&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getenv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;GOOGLE_CLOUD_LOCATION&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;us-central1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;audio_part&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;types&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Part&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;from_bytes&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;audio_bytes&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;mime_type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;mime_type&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;aio&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;models&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;generate_content&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;TRANSCRIPTION_MODEL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;contents&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;
            &lt;span class="n"&gt;types&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Content&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                &lt;span class="n"&gt;role&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="n"&gt;parts&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;
                    &lt;span class="n"&gt;audio_part&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                    &lt;span class="n"&gt;types&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Part&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Please transcribe the above audio content into text completely, preserving the original language, and do not add any explanations or prefixes.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
                &lt;span class="p"&gt;],&lt;/span&gt;
            &lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="sh"&gt;""&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Design principle: The function itself does not catch exceptions, allowing the upper-level handler to handle error responses uniformly.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 2: Handler Main Flow (main.py)
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;handle_audio_message&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;MessageEvent&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Handle audio (voice) messages — transcribe and route through Orchestrator.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;user_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;source&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt;
    &lt;span class="n"&gt;replied&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;False&lt;/span&gt; &lt;span class="c1"&gt;# Track if the reply token has been used
&lt;/span&gt;    &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="c1"&gt;# Download audio
&lt;/span&gt;        &lt;span class="n"&gt;message_content&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;line_bot_api&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_message_content&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;audio_bytes&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;b&lt;/span&gt;&lt;span class="sh"&gt;""&lt;/span&gt;
        &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;chunk&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;message_content&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;iter_content&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
            &lt;span class="n"&gt;audio_bytes&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="n"&gt;chunk&lt;/span&gt;

        &lt;span class="c1"&gt;# Transcription
&lt;/span&gt;        &lt;span class="n"&gt;transcription&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;transcribe_audio&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;audio_bytes&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="c1"&gt;# Empty transcription (silent or too short)
&lt;/span&gt;        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;transcription&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;strip&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
            &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;line_bot_api&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;reply_message&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                &lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;reply_token&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nc"&gt;TextSendMessage&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Unable to recognize voice content, please re-record.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)]&lt;/span&gt;
            &lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt;

        &lt;span class="c1"&gt;# Reply #1: Let the user confirm the transcription result (consumes reply token)
&lt;/span&gt;        &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;line_bot_api&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;reply_message&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;reply_token&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nc"&gt;TextSendMessage&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;You said: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;transcription&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;strip&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)]&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;replied&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;

        &lt;span class="c1"&gt;# Reply #2: Send to Orchestrator, using push_message (token already used)
&lt;/span&gt;        &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;handle_text_message_via_orchestrator&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;transcription&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;strip&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
            &lt;span class="n"&gt;push_user_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="nb"&gt;Exception&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;logger&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;error&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Error handling audio for &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;exc_info&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;error_text&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;LineService&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;format_error_message&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;processing voice message&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;error_msg&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;TextSendMessage&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;error_text&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;replied&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="c1"&gt;# reply token has been consumed, use push instead
&lt;/span&gt;            &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;line_bot_api&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;push_message&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;error_msg&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
        &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;line_bot_api&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;reply_message&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;reply_token&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;error_msg&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Step 3: Enabling Orchestrator to Support External Text Input
&lt;/h3&gt;

&lt;p&gt;The original &lt;code&gt;handle_text_message_via_orchestrator&lt;/code&gt; reads &lt;code&gt;event.message.text&lt;/code&gt; directly. An AudioMessage has no &lt;code&gt;.text&lt;/code&gt; attribute, so we add two optional parameters:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;handle_text_message_via_orchestrator&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;MessageEvent&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="c1"&gt;# ← External text input (voice transcription)
&lt;/span&gt;    &lt;span class="n"&gt;push_user_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="c1"&gt;# ← Use push_message when set
&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;msg&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;text&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;text&lt;/span&gt; &lt;span class="ow"&gt;is&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;strip&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;orchestrator&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;process_text&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;msg&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;response_text&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;format_orchestrator_response&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;reply_msg&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;TextSendMessage&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;response_text&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;push_user_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;line_bot_api&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;push_message&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;push_user_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;reply_msg&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
        &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;line_bot_api&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;reply_message&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;reply_token&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;reply_msg&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
    &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="nb"&gt;Exception&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;error_msg&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;TextSendMessage&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;LineService&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;format_error_message&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;processing your question&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;push_user_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;line_bot_api&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;push_message&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;push_user_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;error_msg&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
        &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;line_bot_api&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;reply_message&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;reply_token&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;error_msg&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Using &lt;code&gt;text is not None&lt;/code&gt; (instead of &lt;code&gt;text or ...&lt;/code&gt;) is intentional: if the voice transcription comes back as an empty string, it should pass through (and then be caught by the upstream &lt;code&gt;if not transcription.strip()&lt;/code&gt; guard) rather than falling back to &lt;code&gt;event.message.text&lt;/code&gt;.&lt;/p&gt;
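&lt;p&gt;The difference is easy to verify in isolation (a standalone sketch, not project code):&lt;/p&gt;

```python
# Standalone sketch: `text or fallback` vs `text if text is not None else fallback`.
def pick_message(text, fallback):
    # Truthiness fallback: an empty transcription would silently fall
    # back to the fallback value, which is wrong here.
    truthy = text or fallback
    # Identity check: only a missing value (None) falls back, so an empty
    # string survives and can be caught by the upstream strip() guard.
    explicit = text if text is not None else fallback
    return truthy, explicit

print(pick_message("", "event.message.text"))  # → ('event.message.text', '')
```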




&lt;h2&gt;
  
  
  Pitfalls Encountered
&lt;/h2&gt;

&lt;h3&gt;
  
  
  ❌ Pitfall 1: &lt;code&gt;Part.from_text()&lt;/code&gt; does not accept positional arguments
&lt;/h3&gt;

&lt;p&gt;The first TypeError encountered:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# ❌ Error (TypeError: Part.from_text() takes 1 positional argument but 2 were given)
&lt;/span&gt;&lt;span class="n"&gt;types&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Part&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;from_text&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Please transcribe the above audio content into text completely, preserving the original language, and do not add any explanations or prefixes.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# ✅ Correct
&lt;/span&gt;&lt;span class="n"&gt;types&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Part&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Please transcribe the above audio content into text completely, preserving the original language, and do not add any explanations or prefixes.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In this version of the SDK, &lt;code&gt;text&lt;/code&gt; in &lt;code&gt;Part.from_text()&lt;/code&gt; is keyword-only; calling the &lt;code&gt;Part(text=...)&lt;/code&gt; constructor directly is the safer choice.&lt;/p&gt;

&lt;h3&gt;
  
  
  ❌ Pitfall 2: LINE reply token can only be used once
&lt;/h3&gt;

&lt;p&gt;LINE's reply token is &lt;strong&gt;one-time use&lt;/strong&gt;. Once &lt;code&gt;reply_message()&lt;/code&gt; is called, the token is invalidated.&lt;/p&gt;

&lt;p&gt;This project's voice flow sends two replies:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; Reply #1 (display transcription text) → &lt;strong&gt;consumes token&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt; Reply #2 (Orchestrator result) → &lt;strong&gt;token is invalid, will receive LINE 400 error&lt;/strong&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The solution is to have the Orchestrator handler support &lt;code&gt;push_message&lt;/code&gt; mode (via the &lt;code&gt;push_user_id&lt;/code&gt; parameter), so Reply #2 is sent with &lt;code&gt;push_message&lt;/code&gt; instead.&lt;/p&gt;

&lt;p&gt;Error handling needs the same care: if the Orchestrator throws an exception after Reply #1 succeeds, the &lt;code&gt;except&lt;/code&gt; block cannot use &lt;code&gt;reply_message&lt;/code&gt; either and must also fall back to &lt;code&gt;push_message&lt;/code&gt;. That is what the &lt;code&gt;replied&lt;/code&gt; flag in the code is for.&lt;/p&gt;
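&lt;p&gt;The whole pattern can be reduced to a small sketch. The &lt;code&gt;FakeLineApi&lt;/code&gt; class below is a hypothetical stand-in that mimics LINE's one-time token behavior; the real SDK calls are async:&lt;/p&gt;

```python
# Sketch of the replied-flag pattern. FakeLineApi is a hypothetical
# stand-in that mimics LINE's one-time reply token behavior.
class FakeLineApi:
    def __init__(self):
        self.token_used = False
        self.sent = []

    def reply_message(self, token, messages):
        if self.token_used:
            # This is the LINE 400 error from the article, in miniature.
            raise RuntimeError("reply token already consumed")
        self.token_used = True
        self.sent.append(("reply", messages))

    def push_message(self, user_id, messages):
        self.sent.append(("push", messages))

def send_safely(api, replied, token, user_id, message):
    # After Reply #1 the one-time token is dead, so every later message,
    # including error messages, must go through push_message instead.
    if replied:
        api.push_message(user_id, [message])
    else:
        api.reply_message(token, [message])

api = FakeLineApi()
send_safely(api, replied=False, token="tok", user_id="U1", message="transcript")  # Reply #1
send_safely(api, replied=True, token="tok", user_id="U1", message="answer")       # Reply #2 via push
```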

&lt;h3&gt;
  
  
  ❌ Pitfall 3: Gemini Flash Live is not suitable for pre-recorded files
&lt;/h3&gt;

&lt;p&gt;Not a real "pitfall", but worth clarifying:&lt;/p&gt;

&lt;p&gt;Gemini 3.1 Flash Live is designed for &lt;strong&gt;real-time two-way streaming&lt;/strong&gt;, which carries the overhead of connection setup and streaming protocols. LINE voice messages are complete pre-recorded m4a files that can be processed in a single request.&lt;/p&gt;

&lt;p&gt;Passing the audio bytes inline to &lt;code&gt;client.aio.models.generate_content()&lt;/code&gt; is simpler, and the latency is perfectly acceptable. Save Flash Live for scenarios that genuinely need real-time conversation.&lt;/p&gt;




&lt;h2&gt;
  
  
  Effect Demonstration
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Scenario 1: Voice Command Query
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;User sends: [Voice] Help me search for cafes near Taipei Main Station

Bot Reply #1: You said: Help me search for cafes near Taipei Main Station
Bot Reply #2: [LocationAgent replies with a list of nearby cafes]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Scenario 2: Voice Question
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;User sends: [Voice] What's the difference between Gemini and GPT-4

Bot Reply #1: You said: What's the difference between Gemini and GPT-4
Bot Reply #2: [ChatAgent with Google Search Grounding replies with comparison results]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Scenario 3: Voice Send URL
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;User sends: [Voice] Help me summarize this article https://example.com/article

Bot Reply #1: You said: Help me summarize this article https://example.com/article
Bot Reply #2: [ContentAgent fetches and summarizes the article]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The text transcribed from voice goes directly into the Orchestrator, and all existing URL detection and intent determination work as usual, with zero extra logic.&lt;/p&gt;




&lt;h2&gt;
  
  
  Traditional Text Input vs. Voice Input
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;Text Input&lt;/th&gt;
&lt;th&gt;&lt;strong&gt;Voice Input&lt;/strong&gt;&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Input Format&lt;/td&gt;
&lt;td&gt;TextMessage&lt;/td&gt;
&lt;td&gt;AudioMessage (m4a)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Pre-processing&lt;/td&gt;
&lt;td&gt;None&lt;/td&gt;
&lt;td&gt;Gemini transcription&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;reply token&lt;/td&gt;
&lt;td&gt;Direct use&lt;/td&gt;
&lt;td&gt;Reply #1 consumes, Reply #2 changes to push&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Orchestrator&lt;/td&gt;
&lt;td&gt;Direct routing&lt;/td&gt;
&lt;td&gt;Route after transcription&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Supported Functions&lt;/td&gt;
&lt;td&gt;All&lt;/td&gt;
&lt;td&gt;All (no additional settings required)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Error Handling&lt;/td&gt;
&lt;td&gt;reply_message&lt;/td&gt;
&lt;td&gt;replied flag determines reply/push&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  Analysis and Outlook
&lt;/h2&gt;

&lt;p&gt;What I am most satisfied with in this integration is that &lt;strong&gt;I hardly need to change the Orchestrator itself&lt;/strong&gt;. As long as the voice is converted to text at the input end, all the routing logic, Agent calls, and error handling are automatically inherited.&lt;/p&gt;

&lt;p&gt;Gemini's multimodal audio understanding performs very reliably in this scenario: Traditional Chinese, Taiwanese accents, and sentences that mix in English are almost always transcribed accurately.&lt;/p&gt;

&lt;p&gt;Future directions for extension:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Automatic multi-language detection&lt;/strong&gt;: tell Gemini to preserve the original language during transcription (Japanese voice → Japanese transcription), and let the Orchestrator decide whether to translate&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Group voice support&lt;/strong&gt;: currently limited to 1:1 chats; voice messages in groups are ignored for now&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Long-recording summaries&lt;/strong&gt;: recordings over a certain length go straight to ContentAgent for summarization instead of being treated as commands&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Extension: 🔊 Read Summary Aloud – Make the Bot Speak
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fghuiv4o4wq6yt1jyupmu.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fghuiv4o4wq6yt1jyupmu.png" alt="Preview Program 2026-03-28 20.33.53" width="800" height="1739"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Voice recognition lets the Bot "understand" what the user is saying. Once that works, the next question naturally arises:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Can the Bot respond by speaking?&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The Gemini Live API has a setting, &lt;code&gt;response_modalities: ["AUDIO"]&lt;/code&gt;, that outputs a PCM audio stream directly. I wired it into another scenario: &lt;strong&gt;reading summaries aloud&lt;/strong&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Function Design
&lt;/h3&gt;

&lt;p&gt;Each time the Bot summarizes a URL, a YouTube video, or a PDF, a "🔊 Read Aloud" QuickReply button appears below the message. When the user presses it, the Bot feeds the summary text into Gemini Live TTS, converts the PCM audio to m4a, and sends it back with &lt;code&gt;AudioSendMessage&lt;/code&gt;.&lt;br&gt;
&lt;/p&gt;
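&lt;p&gt;The QuickReply attached to a summary message boils down to a small JSON payload. The outer shape follows the LINE Messaging API's quick reply and postback action objects; the &lt;code&gt;read_aloud:{id}&lt;/code&gt; postback data format below is only a sketch of this project's own convention, not the actual code:&lt;/p&gt;

```python
# Sketch of the QuickReply payload for the "🔊 Read Aloud" button.
# JSON shape per the LINE Messaging API; the "read_aloud:{id}" postback
# data format is an assumed project convention for illustration.
def read_aloud_quick_reply(summary_id):
    return {
        "items": [
            {
                "type": "action",
                "action": {
                    "type": "postback",
                    "label": "🔊 Read Aloud",
                    # Postback data is length-limited by LINE, so store only
                    # an id here and look the summary text up server-side.
                    "data": "read_aloud:" + summary_id,
                },
            }
        ]
    }

payload = read_aloud_quick_reply("abc123")
```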

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;URL summary complete
    │
    ▼ [🔊 Read Aloud] QuickReply button
    │
User presses the button → PostbackEvent
    │
    ▼ handle_read_aloud_postback()
    │
    ├─ ① Retrieve the summary text from summary_store (10 minutes TTL)
    │
    ├─ ② Gemini Live API → PCM audio
    │ model: gemini-live-2.5-flash-native-audio
    │ response_modalities: ["AUDIO"]
    │
    ├─ ③ ffmpeg transcoding: PCM → m4a
    │ s16le, 16kHz, mono → AAC
    │
    └─ ④ AudioSendMessage sent to the user
            original_content_url: /audio/{uuid}
            duration: {ms}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
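&lt;p&gt;Step ① relies on a small cache with expiry. A minimal sketch of what a &lt;code&gt;summary_store&lt;/code&gt; with a 10-minute TTL could look like (illustrative only, not the project's actual implementation):&lt;/p&gt;

```python
import time

# Minimal TTL store sketch. The clock is injectable so expiry is testable
# without sleeping; the project's real summary_store may differ.
class SummaryStore:
    def __init__(self, ttl_seconds=600, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._items = {}

    def put(self, key, text):
        self._items[key] = (text, self._clock())

    def get(self, key):
        entry = self._items.get(key)
        if entry is None:
            return None
        text, stored_at = entry
        if self._clock() - stored_at > self._ttl:
            # Expired: pressing an old "Read Aloud" button fails cleanly.
            del self._items[key]
            return None
        return text
```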



&lt;h3&gt;
  
  
  Core Code (tools/tts_tool.py)
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;LIVE_MODEL&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gemini-live-2.5-flash-native-audio&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;text_to_speech&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;tuple&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;bytes&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
    &lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;genai&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Client&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;vertexai&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;project&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;VERTEX_PROJECT&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;location&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;us-central1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;config&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;response_modalities&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;AUDIO&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]}&lt;/span&gt;

    &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;aio&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;live&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;connect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;LIVE_MODEL&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;session&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;session&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;send_client_content&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;turns&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;types&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Content&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;role&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;parts&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;types&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Part&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;)]),&lt;/span&gt;
            &lt;span class="n"&gt;turn_complete&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;pcm_chunks&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
        &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;message&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;session&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;receive&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;server_content&lt;/span&gt; &lt;span class="ow"&gt;and&lt;/span&gt; &lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;server_content&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;model_turn&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;part&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;server_content&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;model_turn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;parts&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;part&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;inline_data&lt;/span&gt; &lt;span class="ow"&gt;and&lt;/span&gt; &lt;span class="n"&gt;part&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;inline_data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                        &lt;span class="n"&gt;pcm_chunks&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;part&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;inline_data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;server_content&lt;/span&gt; &lt;span class="ow"&gt;and&lt;/span&gt; &lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;server_content&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;turn_complete&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="k"&gt;break&lt;/span&gt;

    &lt;span class="n"&gt;pcm_bytes&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;b&lt;/span&gt;&lt;span class="sh"&gt;""&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;pcm_chunks&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;duration_ms&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;int&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;pcm_bytes&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="mi"&gt;32000&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;1000&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="c1"&gt;# 16kHz × 16-bit mono
&lt;/span&gt;
    &lt;span class="c1"&gt;# PCM → m4a (temp file mode, avoid moov atom problem)
&lt;/span&gt;    &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="n"&gt;tempfile&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;NamedTemporaryFile&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;suffix&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;.pcm&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;delete&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;write&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;pcm_bytes&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;pcm_path&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;
    &lt;span class="n"&gt;m4a_path&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pcm_path&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;replace&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;.pcm&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;.m4a&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;subprocess&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ffmpeg&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;-y&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;-f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;s16le&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;-ar&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;16000&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;-ac&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
         &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;-i&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;pcm_path&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;-c:a&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;aac&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;m4a_path&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
        &lt;span class="n"&gt;check&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;capture_output&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="nf"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;m4a_path&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;rb&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;read&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; &lt;span class="n"&gt;duration_ms&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
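&lt;p&gt;The magic number 32000 in &lt;code&gt;duration_ms&lt;/code&gt; comes straight from the PCM format: 16,000 samples/second × 2 bytes per 16-bit sample × 1 channel = 32,000 bytes per second. A quick sanity check:&lt;/p&gt;

```python
# Sanity check for the duration math: 16 kHz, 16-bit (2 bytes), mono PCM.
SAMPLE_RATE = 16_000
BYTES_PER_SAMPLE = 2   # s16le means signed 16-bit little-endian
CHANNELS = 1
BYTES_PER_SECOND = SAMPLE_RATE * BYTES_PER_SAMPLE * CHANNELS  # 32000

def pcm_duration_ms(num_bytes):
    return int(num_bytes / BYTES_PER_SECOND * 1000)

# Three seconds of speech occupies 96,000 bytes of raw PCM
assert pcm_duration_ms(3 * BYTES_PER_SECOND) == 3000
```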






&lt;h2&gt;
  
  
  Pitfalls of Read Aloud Function
&lt;/h2&gt;

&lt;h3&gt;
  
  
  ❌ Pitfall 4: Completely Different Model Name
&lt;/h3&gt;

&lt;p&gt;The first attempt at Gemini Live TTS was:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;LIVE_MODEL&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gemini-3.1-flash-live-preview&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This name was extrapolated from the &lt;code&gt;gemini-3.1-flash-lite-preview&lt;/code&gt; model used for voice recognition, and it failed immediately with a 1008 policy violation:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Publisher Model `projects/line-vertex/locations/global/publishers/google/
models/gemini-3.1-flash-live-preview` was not found
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Listing the available models on Vertex AI revealed that the model naming rules for Live/native audio are completely different:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# ✅ Correct
&lt;/span&gt;&lt;span class="n"&gt;LIVE_MODEL&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gemini-live-2.5-flash-native-audio&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;There is &lt;strong&gt;no Live version&lt;/strong&gt; of Gemini 3.1 on Vertex AI. The Live/native audio feature is currently the 2.5 generation, and the naming format is &lt;code&gt;gemini-live-{version}-{variant}-native-audio&lt;/code&gt;, which is completely separate from the general model &lt;code&gt;gemini-{version}-flash-{variant}&lt;/code&gt;.&lt;/p&gt;
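The two naming schemes can be captured with a rough classifier. This is an illustrative sketch distilled from the patterns described above, not an official rule; the authoritative answer always comes from listing the models available on Vertex AI. Note that the trap name below matches the *standard* pattern, which is exactly why the guess looked plausible:

```python
import re

# Illustrative patterns only, distilled from the two naming schemes above.
LIVE_RE = re.compile(r"^gemini-live-[\d.]+-[a-z]+-native-audio$")
STANDARD_RE = re.compile(r"^gemini-[\d.]+-flash(-[a-z]+)*$")

def classify_model(name: str) -> str:
    """Rough classifier: 'live', 'standard', or 'unknown'."""
    if LIVE_RE.match(name):
        return "live"
    if STANDARD_RE.match(name):
        return "standard"
    return "unknown"

print(classify_model("gemini-live-2.5-flash-native-audio"))  # live
print(classify_model("gemini-3.1-flash-lite-preview"))       # standard
print(classify_model("gemini-3.1-flash-live-preview"))       # standard-looking, but no such model exists
```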

&lt;h3&gt;
  
  
  ❌ Pitfall 5: &lt;code&gt;GOOGLE_CLOUD_LOCATION=global&lt;/code&gt; Causes Live API to Disconnect
&lt;/h3&gt;

&lt;p&gt;After changing to the correct model name, the error message was still the same:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Publisher Model `projects/line-vertex/locations/global/...` was not found
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This time the model name was correct, but &lt;code&gt;locations/global&lt;/code&gt; was suspicious – we had clearly set &lt;code&gt;us-central1&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Investigating the source code of the Google GenAI SDK revealed:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# _api_client.py
&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;location&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;location&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="n"&gt;env_location&lt;/span&gt;
&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;location&lt;/span&gt; &lt;span class="ow"&gt;and&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;api_key&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;location&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;global&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt; &lt;span class="c1"&gt;# ← here
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;location or env_location&lt;/code&gt; – if the passed-in &lt;code&gt;location&lt;/code&gt; is an empty string, the SDK falls back to the environment variable, and only when both are missing (and no API key is set) does it default to &lt;code&gt;global&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;The root cause turned out to be a Cloud Run environment variable:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"GOOGLE_CLOUD_LOCATION"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"value"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"global"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;GOOGLE_CLOUD_LOCATION&lt;/code&gt; had been set to the string &lt;code&gt;"global"&lt;/code&gt;, so &lt;code&gt;os.getenv("GOOGLE_CLOUD_LOCATION", "us-central1")&lt;/code&gt; returned &lt;code&gt;"global"&lt;/code&gt; instead of the default &lt;code&gt;"us-central1"&lt;/code&gt;. The SDK then obediently connected to the global endpoint, where &lt;code&gt;gemini-live-2.5-flash-native-audio&lt;/code&gt; does not support &lt;code&gt;BidiGenerateContent&lt;/code&gt;.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Endpoint&lt;/th&gt;
&lt;th&gt;Standard API&lt;/th&gt;
&lt;th&gt;Live API&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;global&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;✅ Available&lt;/td&gt;
&lt;td&gt;❌ Model not here&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;us-central1&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;✅ Available&lt;/td&gt;
&lt;td&gt;✅ &lt;code&gt;gemini-live-2.5-flash-native-audio&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Solution: hardcode the Live API location instead of reading it from the environment variable:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# ❌ Affected by GOOGLE_CLOUD_LOCATION=global
&lt;/span&gt;&lt;span class="n"&gt;VERTEX_LOCATION&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getenv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;GOOGLE_CLOUD_LOCATION&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;us-central1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# ✅ Hardcoded, not affected by env var
&lt;/span&gt;&lt;span class="n"&gt;VERTEX_LOCATION&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;us-central1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="c1"&gt;# Live API needs a regional endpoint
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Voice Recognition vs. Read Summary Aloud
&lt;/h2&gt;

&lt;p&gt;The two functions use completely different Gemini APIs:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;/th&gt;
&lt;th&gt;Voice Recognition&lt;/th&gt;
&lt;th&gt;Read Summary Aloud&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Direction&lt;/td&gt;
&lt;td&gt;Audio → Text&lt;/td&gt;
&lt;td&gt;Text → Audio&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;API&lt;/td&gt;
&lt;td&gt;Standard &lt;code&gt;generate_content&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;Live API &lt;code&gt;BidiGenerateContent&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Model&lt;/td&gt;
&lt;td&gt;&lt;code&gt;gemini-3.1-flash-lite-preview&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;gemini-live-2.5-flash-native-audio&lt;/code&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Location&lt;/td&gt;
&lt;td&gt;Follows env var&lt;/td&gt;
&lt;td&gt;Hardcoded &lt;code&gt;us-central1&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Output Format&lt;/td&gt;
&lt;td&gt;text&lt;/td&gt;
&lt;td&gt;PCM → ffmpeg → m4a&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;LINE Message Type&lt;/td&gt;
&lt;td&gt;Input: &lt;code&gt;AudioMessage&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;Output: &lt;code&gt;AudioSendMessage&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
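The table can be condensed into a small helper. The function below is hypothetical (it does not exist in the repo), but the model names and locations are the ones used in this project:

```python
import os

def gemini_audio_config(task: str) -> dict:
    """Hypothetical helper mirroring the table above."""
    if task == "transcribe":
        # Voice recognition: standard generate_content, region follows the env var
        return {
            "model": "gemini-3.1-flash-lite-preview",
            "location": os.getenv("GOOGLE_CLOUD_LOCATION", "us-central1"),
        }
    if task == "tts":
        # Read-aloud: the Live API needs a regional endpoint, so hardcode it
        return {
            "model": "gemini-live-2.5-flash-native-audio",
            "location": "us-central1",
        }
    raise ValueError(f"unknown task: {task}")
```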




&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;The combination of Gemini 3.1 Flash for transcription and the Gemini Live native-audio models for TTS makes audio AI worth taking seriously. This time, both voice recognition and read-summary-aloud were integrated into the LINE Bot:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Voice Recognition&lt;/strong&gt;: standard Gemini API – a pre-recorded m4a is transcribed in one shot and handed to the existing Orchestrator&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Read Summary Aloud&lt;/strong&gt;: Gemini Live TTS – summary text becomes PCM, ffmpeg converts it to m4a, and the result is returned as an &lt;code&gt;AudioSendMessage&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The most troublesome part was not the feature itself, but &lt;strong&gt;finding the correct model name&lt;/strong&gt; and &lt;strong&gt;tracking down the SDK's location-resolution logic&lt;/strong&gt; – neither is spelled out prominently in the documentation; the answers came only from listing the available models and reading the SDK source code.&lt;/p&gt;

&lt;p&gt;The full code is on &lt;a href="https://github.com/kkdai/linebot-helper-python" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt;, feel free to refer to it.&lt;/p&gt;

&lt;p&gt;See you next time!&lt;/p&gt;

</description>
      <category>ai</category>
      <category>gemini</category>
      <category>python</category>
      <category>tutorial</category>
    </item>
    <item>
      <title>Building an Agent Skill Hub: From Skill Development to Automated Multilingual Documentation Deployment on GitHub Pages</title>
      <dc:creator>Evan Lin</dc:creator>
      <pubDate>Fri, 27 Mar 2026 01:45:20 +0000</pubDate>
      <link>https://dev.to/evanlin/building-an-agent-skill-hub-from-skill-development-to-automated-multilingual-documentation-5ae7</link>
      <guid>https://dev.to/evanlin/building-an-agent-skill-hub-from-skill-development-to-automated-multilingual-documentation-5ae7</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8cu37bxccsz7i7k6wl0n.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8cu37bxccsz7i7k6wl0n.png" alt="image-20260322225856161" width="800" height="692"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Reference links:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://github.com/kkdai/agent-skill-hub" rel="noopener noreferrer"&gt;Agent Skill Hub Repository&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://plateaukao.github.io/whisperASR/" rel="noopener noreferrer"&gt;whisperASR Reference Example&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.github.com/en/pages" rel="noopener noreferrer"&gt;GitHub Pages Official Documentation&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This article documents how, while developing the &lt;strong&gt;Agent Skill Hub (2026 Skill Library)&lt;/strong&gt;, I built a skill description specification from scratch and created a GitHub Pages documentation site that supports both Chinese and English, drawing on minimalist aesthetics.&lt;/p&gt;

&lt;h1&gt;
  
  
  Background
&lt;/h1&gt;

&lt;p&gt;With the popularity of AI Agents (such as OpenClaw or Gemini CLI), "how to get an Agent to quickly understand and execute a specific task" has become the key question. Instead of writing long prompts every time, it is better to package common operations into standardized &lt;strong&gt;Skills&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;To facilitate community communication and Agent reading, I created &lt;code&gt;agent-skill-hub&lt;/code&gt;. But code alone is not enough; we also need a decent "facade" – a documentation website that is both aesthetically pleasing and technically substantial.&lt;/p&gt;




&lt;h2&gt;
  
  
  🛠️ Step 1: Standardize Skill Descriptions (SKILL.md)
&lt;/h2&gt;

&lt;p&gt;In &lt;code&gt;agent-skill-hub&lt;/code&gt;, each skill (such as &lt;code&gt;gcp-helper&lt;/code&gt; or &lt;code&gt;n8n-executor&lt;/code&gt;) has a &lt;code&gt;SKILL.md&lt;/code&gt;. The structure of this file is crucial because it's not just for humans to read, but also for LLMs to read:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Name &amp;amp; Description&lt;/strong&gt;: Let the Agent know what this is.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;When to Use&lt;/strong&gt;: Define trigger scenarios.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Core Pattern&lt;/strong&gt;: Provide standard instruction examples.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Common Mistakes&lt;/strong&gt;: Reduce errors caused by Agent hallucinations.&lt;/li&gt;
&lt;/ul&gt;
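A minimal `SKILL.md` skeleton following the sections above might look like the sketch below. The skill name, frontmatter fields, and example lines are illustrative only; the actual files in the repo are the reference:

```markdown
---
name: gcp-helper
description: Help the Agent run common GCP operations safely.
---

## When to Use
- The user asks to deploy or inspect resources on GCP.

## Core Pattern
    gcloud run deploy my-service --source . --region us-central1

## Common Mistakes
- Do not invent project IDs; ask the user or read them from the environment.
```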




&lt;h2&gt;
  
  
  🎨 Step 2: Design Style — Tribute to Minimalist Aesthetics
&lt;/h2&gt;

&lt;p&gt;When designing the web pages under the &lt;code&gt;docs&lt;/code&gt; directory, I referenced the style of &lt;strong&gt;whisperASR&lt;/strong&gt;. That design of a dark background with bright accent colors (Teal) is very in line with the aesthetics of modern developers:&lt;/p&gt;

&lt;h3&gt;
  
  
  Visual Element Highlights:
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt; &lt;strong&gt;Gradient Title&lt;/strong&gt;: Use &lt;code&gt;linear-gradient&lt;/code&gt; to create a high-end texture.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Teal Accent Color&lt;/strong&gt;: Use &lt;code&gt;#14b8a6&lt;/code&gt; as the highlight color for key buttons and titles.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Card-style Layout&lt;/strong&gt;: Clearly present the icons and introductions of each skill, with good responsive design.&lt;/li&gt;
&lt;/ol&gt;




&lt;h2&gt;
  
  
  🌐 Step 3: Multilingual Support and Automatic Switching
&lt;/h2&gt;

&lt;p&gt;To make it available to developers worldwide, I adopted a directory-structured language management method:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;docs/
├── index.html (Language detection and redirection)
├── en/ (English version)
│   └── skills/
└── zh/ (Traditional Chinese version)
    └── skills/

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I added a simple JavaScript snippet to the root directory's &lt;code&gt;index.html&lt;/code&gt;, which automatically redirects to the correct language based on the user's browser settings:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;lang&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;navigator&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;language&lt;/span&gt; &lt;span class="o"&gt;||&lt;/span&gt; &lt;span class="nb"&gt;navigator&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;userLanguage&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;lang&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;startsWith&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;zh&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nb"&gt;window&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;location&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;href&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;./zh/index.html&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nb"&gt;window&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;location&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;href&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;./en/index.html&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  🚀 Step 4: GitHub Pages Deployment Process
&lt;/h2&gt;

&lt;p&gt;In 2026, the most recommended deployment method is to serve the site from the &lt;code&gt;docs/&lt;/code&gt; directory of the &lt;code&gt;main&lt;/code&gt; branch. This avoids a separate &lt;code&gt;gh-pages&lt;/code&gt; branch and keeps the code and its documentation synchronized in one place.&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Prepare the Directory Structure
&lt;/h3&gt;

&lt;p&gt;Create all the necessary directories at once using the command:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;mkdir&lt;/span&gt; &lt;span class="nt"&gt;-p&lt;/span&gt; docs/en/skills docs/zh/skills docs/assets/css

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  2. Git Commit and Push
&lt;/h3&gt;

&lt;p&gt;After completing HTML/CSS development, execute the standard Git process:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git add docs/
git commit &lt;span class="nt"&gt;-m&lt;/span&gt; &lt;span class="s2"&gt;"docs: add GitHub Pages documentation in English and Chinese"&lt;/span&gt;
git push origin main

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  3. Enable GitHub Pages Settings
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt; Go to &lt;strong&gt;Settings &amp;gt; Pages&lt;/strong&gt; in the GitHub repository.&lt;/li&gt;
&lt;li&gt; Under &lt;strong&gt;Build and deployment&lt;/strong&gt;, in &lt;strong&gt;Branch&lt;/strong&gt;, select the &lt;code&gt;main&lt;/code&gt; branch and the &lt;code&gt;/docs&lt;/code&gt; folder.&lt;/li&gt;
&lt;li&gt; Click &lt;strong&gt;Save&lt;/strong&gt;, and the website will be online in a few minutes.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnj277v30sramun4c3ae5.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnj277v30sramun4c3ae5.png" alt="image-20260322225932252" width="800" height="476"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  🛠️ Common Pitfalls and Troubleshooting
&lt;/h2&gt;

&lt;h3&gt;
  
  
  ❓ Why can't the webpage style (CSS) be loaded?
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Reason:&lt;/strong&gt; HTML files under subdirectories (such as &lt;code&gt;en/skills/&lt;/code&gt;) must reference shared assets with relative paths of the correct depth. &lt;strong&gt;Correction:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight html"&gt;&lt;code&gt;&lt;span class="c"&gt;&amp;lt;!-- In the home page index.html --&amp;gt;&lt;/span&gt;
&lt;span class="nt"&gt;&amp;lt;link&lt;/span&gt; &lt;span class="na"&gt;rel=&lt;/span&gt;&lt;span class="s"&gt;"stylesheet"&lt;/span&gt; &lt;span class="na"&gt;href=&lt;/span&gt;&lt;span class="s"&gt;"../assets/css/style.css"&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&lt;/span&gt;
&lt;span class="c"&gt;&amp;lt;!-- In the skill detail page --&amp;gt;&lt;/span&gt;
&lt;span class="nt"&gt;&amp;lt;link&lt;/span&gt; &lt;span class="na"&gt;rel=&lt;/span&gt;&lt;span class="s"&gt;"stylesheet"&lt;/span&gt; &lt;span class="na"&gt;href=&lt;/span&gt;&lt;span class="s"&gt;"../../assets/css/style.css"&lt;/span&gt;&lt;span class="nt"&gt;&amp;gt;&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  ❓ How to ensure that the Agent can correctly read the document?
&lt;/h3&gt;

&lt;p&gt;We have retained a large number of semantic tags (&lt;code&gt;article&lt;/code&gt;, &lt;code&gt;h2&lt;/code&gt;, &lt;code&gt;pre&lt;/code&gt;, &lt;code&gt;code&lt;/code&gt;) in the HTML, so that the Agent can more accurately capture the core logic when performing RAG (Retrieval-Augmented Generation) or directly reading the webpage.&lt;/p&gt;




&lt;h2&gt;
  
  
  🏁 Conclusion
&lt;/h2&gt;

&lt;p&gt;Through this development, I have realized the importance of "documentation as product". A good AI skill library, in addition to powerful program logic, also needs a clear, intuitive, and multilingual-friendly navigation system.&lt;/p&gt;

&lt;p&gt;If you also want to create a professional facade for your AI project, feel free to borrow this &lt;code&gt;docs/&lt;/code&gt; structure. Happy Coding! 🦞&lt;/p&gt;




</description>
      <category>agents</category>
      <category>automation</category>
      <category>documentation</category>
      <category>github</category>
    </item>
    <item>
      <title>Security Declaration for AI Agents: Deep Dive into A2AS (Agent-to-Agent Security) Certification Mechanism</title>
      <dc:creator>Evan Lin</dc:creator>
      <pubDate>Fri, 27 Mar 2026 01:45:10 +0000</pubDate>
      <link>https://dev.to/evanlin/security-declaration-for-ai-agents-deep-dive-into-a2as-agent-to-agent-security-certification-2okf</link>
      <guid>https://dev.to/evanlin/security-declaration-for-ai-agents-deep-dive-into-a2as-agent-to-agent-security-certification-2okf</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fimg.shields.io%2Fbadge%2FA2AS-CERTIFIED-f3af80" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fimg.shields.io%2Fbadge%2FA2AS-CERTIFIED-f3af80" alt="A2AS-CERTIFIED" width="110" height="20"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Reference links:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://a2as.org" rel="noopener noreferrer"&gt;A2AS.org Official Website&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.a2as.org/certified/agents/kkdai/linebot-adk" rel="noopener noreferrer"&gt;linebot-adk Project Certification Page&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This article documents an interesting Pull Request I received while maintaining &lt;strong&gt;linebot-adk (LINE Bot Agent Development Kit)&lt;/strong&gt;: adding the &lt;strong&gt;A2AS security certificate&lt;/strong&gt; to the project. This is not just a YAML file, but a significant milestone for AI Agents to move towards "industrial-grade security" in 2026.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fa0eydihb3vmxsrfh8k20.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fa0eydihb3vmxsrfh8k20.png" alt="Google Chrome 2026-03-26 22.45.44" width="800" height="598"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h1&gt;
  
  
  Background
&lt;/h1&gt;

&lt;p&gt;When we develop Agents like &lt;code&gt;linebot-adk&lt;/code&gt; that have Tool Use (Function Calling) capabilities, the biggest concern for users is often: "Will this Agent issue commands without my permission?" or "What data can it access?".&lt;/p&gt;

&lt;p&gt;Traditionally, we could only write explanations in &lt;code&gt;README.md&lt;/code&gt;, but that's for humans to read, not for system verification. This is why &lt;strong&gt;A2AS (Agent-to-Agent Security)&lt;/strong&gt; emerged – it's hailed as the "HTTPS of the AI world".&lt;/p&gt;




&lt;h2&gt;
  
  
  🛠️ Step 1: Understanding the BASIC Model of A2AS
&lt;/h2&gt;

&lt;p&gt;A2AS is not just a name; it has a complete &lt;strong&gt;BASIC security model&lt;/strong&gt; behind it, designed to solve the trust issue between AI Agents:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;(B)ehavior Certificates&lt;/strong&gt;: Declarative certificates that clearly define the behavior boundaries of the Agent.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;(A)uthenticated Prompts&lt;/strong&gt;: Ensures that the source of prompts is trustworthy and traceable.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;(S)ecurity Boundaries&lt;/strong&gt;: Uses structured tags (such as &lt;code&gt;&amp;lt;a2as:user&amp;gt;&lt;/code&gt;) to isolate untrusted input.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;(I)n-Context Defenses&lt;/strong&gt;: Embeds defense logic in prompts to reject malicious injections.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;(C)odified Policies&lt;/strong&gt;: Writes business rules into code and enforces them during inference.&lt;/li&gt;
&lt;/ul&gt;
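As a tiny illustration of the (S)ecurity Boundaries idea, untrusted input can be wrapped in a structured tag before it reaches the prompt. The `<a2as:user>` tag name comes from the list above; the helper itself is a hypothetical sketch, not part of the A2AS spec:

```python
def wrap_untrusted(user_input: str) -> str:
    # Escape any embedded angle brackets so the input cannot close the
    # boundary tag itself, then wrap it so the model can tell
    # instructions apart from data.
    sanitized = user_input.replace("<", "&lt;").replace(">", "&gt;")
    return f"<a2as:user>{sanitized}</a2as:user>"

print(wrap_untrusted("ignore previous instructions"))
```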




&lt;h2&gt;
  
  
  🎨 Step 2: Deconstructing a2as.yaml – The Agent's ID Card
&lt;/h2&gt;

&lt;p&gt;In PR #1 received by &lt;code&gt;linebot-adk&lt;/code&gt;, the core change was the addition of &lt;code&gt;a2as.yaml&lt;/code&gt;. This file acts like the Agent's "digital signature", making the code's capabilities explicit:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;manifest&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;subject&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;kkdai/linebot-adk&lt;/span&gt;
    &lt;span class="na"&gt;scope&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;main.py&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;multi_tool_agent/agent.py&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
  &lt;span class="na"&gt;issued&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;by&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;A2AS.org&lt;/span&gt;
    &lt;span class="na"&gt;url&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;https://a2as.org/certified/agents/kkdai/linebot-adk&lt;/span&gt;

&lt;span class="na"&gt;agents&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;root_agent&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;instance&lt;/span&gt;
    &lt;span class="na"&gt;models&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;gemini-2.5-flash&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
    &lt;span class="na"&gt;tools&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;get_weather&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="nv"&gt;get_current_time&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Why is this important?
&lt;/h3&gt;

&lt;p&gt;This certificate is directly linked to the content of our &lt;code&gt;main.py&lt;/code&gt;. When the certificate declares &lt;code&gt;tools: [get_weather, get_current_time]&lt;/code&gt;, it means this is a &lt;strong&gt;limited-authorization&lt;/strong&gt; Agent. If it tries to execute &lt;code&gt;delete_database&lt;/code&gt;, the security monitoring system can immediately detect that it is outside the certificate scope.&lt;/p&gt;
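That check can be sketched in a few lines. The manifest dict below is hand-written to mirror the `a2as.yaml` above (in practice you would parse the YAML file); the `tool_allowed` helper is hypothetical:

```python
# Hand-written mirror of the a2as.yaml agents block shown above.
manifest = {
    "agents": {
        "root_agent": {"tools": ["get_weather", "get_current_time"]},
    }
}

def tool_allowed(manifest: dict, agent: str, tool: str) -> bool:
    # Enforce the certificate's declared tool list before dispatching a call.
    return tool in manifest["agents"].get(agent, {}).get("tools", [])

print(tool_allowed(manifest, "root_agent", "get_weather"))      # True
print(tool_allowed(manifest, "root_agent", "delete_database"))  # False: outside the certificate scope
```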




&lt;h2&gt;
  
  
  🌐 Step 3: Combining Code Logic
&lt;/h2&gt;

&lt;p&gt;In &lt;code&gt;linebot-adk&lt;/code&gt;, we used Google's &lt;strong&gt;ADK (Agent Development Kit)&lt;/strong&gt; to build the Agent. The A2AS certificate can accurately map our program architecture:&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Tool Declaration and Implementation
&lt;/h3&gt;

&lt;p&gt;In &lt;code&gt;multi_tool_agent/agent.py&lt;/code&gt;, we defined two tools:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;get_weather&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;city&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="c1"&gt;# Implement the logic to get the weather
&lt;/span&gt;    &lt;span class="bp"&gt;...&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;get_current_time&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;city&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="c1"&gt;# Implement the logic to get the time
&lt;/span&gt;    &lt;span class="bp"&gt;...&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The A2AS certificate registers these functions in its &lt;code&gt;tools&lt;/code&gt; block, ensuring that the Agent's capability boundaries are transparent and auditable.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Runner and Execution Loop
&lt;/h3&gt;

&lt;p&gt;In &lt;code&gt;main.py&lt;/code&gt;, we start the Agent through &lt;code&gt;Runner&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;runner&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Runner&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;root_agent&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;app_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;APP_NAME&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;session_service&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;session_service&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;manifest.subject.scope&lt;/code&gt; in the certificate marks &lt;code&gt;main.py&lt;/code&gt;, which means the entire startup process (including FastAPI's Webhook processing) is within the A2AS compliant scope.&lt;/p&gt;




&lt;h2&gt;
  
  
  🚀 Step 4: Why is this the "HTTPS of the AI world"?
&lt;/h2&gt;

&lt;p&gt;Imagine if you want a "travel agent Agent" to talk to a "hotel reservation Agent".&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Without A2AS&lt;/strong&gt;: The travel Agent can only "blindly trust" the hotel Agent.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;With A2AS&lt;/strong&gt;: The travel Agent can first check the other party's &lt;code&gt;a2as.yaml&lt;/code&gt; certificate. If the other party claims to have the right to "modify orders" but the certificate doesn't say so, the travel Agent can refuse the transaction.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This &lt;strong&gt;"verify first, then execute"&lt;/strong&gt; model is the trust network that A2AS wants to build.&lt;/p&gt;




&lt;h2&gt;
  
  
  🛠️ Common Pitfalls and Troubleshooting
&lt;/h2&gt;

&lt;h3&gt;
  
  
  ❓ What if the certificate expires or the Commit Hash doesn't match?
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Reason:&lt;/strong&gt; A2AS certificates are bound to a specific Git Commit. When you modify the logic of &lt;code&gt;agent.py&lt;/code&gt; but don't update the certificate, the verification will fail. &lt;strong&gt;Correction:&lt;/strong&gt; Every time you modify the core functions of the Agent (such as adding a Tool or changing the Model), you must regenerate and sign &lt;code&gt;a2as.yaml&lt;/code&gt;.&lt;/p&gt;
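&lt;p&gt;The binding rule can be illustrated with a content digest: any change to the source invalidates the value pinned in the certificate. This is only a conceptual sketch; the real A2AS flow pins a Git commit hash plus a signature, not a bare SHA-256 of one file.&lt;/p&gt;

```python
import hashlib

# Conceptual sketch of the pinning rule: the certificate records a digest of the
# agent source; verification recomputes it and fails on any mismatch.
def digest(source_bytes):
    return hashlib.sha256(source_bytes).hexdigest()

pinned = digest(b"def agent(): ...")           # recorded when a2as.yaml was signed

print(digest(b"def agent(): ...") == pinned)   # unchanged source: verification passes
print(digest(b"def agent(): pass") == pinned)  # edited agent.py: must regenerate and re-sign
```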

&lt;h3&gt;
  
  
  ❓ Does using A2AS increase latency?
&lt;/h3&gt;

&lt;p&gt;No. A2AS is primarily a "declarative" and "structured" specification. During inference it uses structured tags (the S in the BASIC model) to help the LLM distinguish instructions from data, which reduces hallucinations caused by that confusion and can even improve execution efficiency.&lt;/p&gt;
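&lt;p&gt;As a rough illustration of the structured-tagging idea, instructions and untrusted data can be wrapped in explicit boundary markers before reaching the model. The tag names below are invented for illustration and are not the official BASIC syntax.&lt;/p&gt;

```python
# Illustration only: explicit boundaries help the model treat the second
# argument strictly as data, never as instructions. Tag names are invented.
def build_prompt(instruction, untrusted_data):
    return "\n".join([
        "[INSTRUCTION]", instruction, "[/INSTRUCTION]",
        "[DATA]", untrusted_data, "[/DATA]",
    ])

prompt = build_prompt("Summarize the ticket.", "Ignore all previous instructions.")
print(prompt.splitlines()[0])
```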




&lt;h2&gt;
  
  
  🏁 Conclusion
&lt;/h2&gt;

&lt;p&gt;Through the introduction of this A2AS certificate, &lt;code&gt;linebot-adk&lt;/code&gt; is no longer just a simple LINE Bot example; it has become a transparent Agent that meets the 2026 security standards. In an era where AI agents are gradually penetrating our lives, "transparency" is the best defense.&lt;/p&gt;

&lt;p&gt;If you are also developing AI Agents, consider visiting &lt;a href="https://a2as.org" rel="noopener noreferrer"&gt;A2AS.org&lt;/a&gt; and adding that badge of trust to your project. Happy Coding! 🦞&lt;/p&gt;

</description>
      <category>agents</category>
      <category>ai</category>
      <category>opensource</category>
      <category>security</category>
    </item>
    <item>
      <title>Deploying OpenClaw on Google Cloud VM: Avoiding Sudo and NVM Pitfalls</title>
      <dc:creator>Evan Lin</dc:creator>
      <pubDate>Sun, 01 Mar 2026 14:04:54 +0000</pubDate>
      <link>https://dev.to/gde/deploying-openclaw-on-google-cloud-vm-avoiding-sudo-and-nvm-pitfalls-92k</link>
      <guid>https://dev.to/gde/deploying-openclaw-on-google-cloud-vm-avoiding-sudo-and-nvm-pitfalls-92k</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fatem7u193qqdcdo7bfox.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fatem7u193qqdcdo7bfox.png" alt="OpenClaw on GCP" width="800" height="436"&gt;&lt;/a&gt;&lt;em&gt;(Image generated by &lt;a href="https://github.com/kkdai/nanobanana" rel="noopener noreferrer"&gt;Nano Banana&lt;/a&gt; - Gemini Image Generation)&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;References:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://openclaw.ai/" rel="noopener noreferrer"&gt;OpenClaw Official Website&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://yu-wenhao.com/zh-TW/blog/openclaw-tools-skills-tutorial/" rel="noopener noreferrer"&gt;OpenClaw Practical Tutorial: Chinese FAQ and Recommended Skills&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://yu-wenhao.com/zh-TW/blog/2026-02-04-is-openclaw-safe-security-guide/" rel="noopener noreferrer"&gt;OpenClaw Security Guide: Security Enhancement Recommendations&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://youtu.be/FC3Wo3ew130" rel="noopener noreferrer"&gt;YouTube Tutorial: Deploying OpenClaw on GCP&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This article documents the complete solution process for the permission, environment variable, and process persistence issues encountered when installing &lt;strong&gt;OpenClaw (2026 Latest Version)&lt;/strong&gt; in a Debian/Ubuntu environment on Google Cloud Platform (GCP).&lt;/p&gt;

&lt;h1&gt;
  
  
  Preface
&lt;/h1&gt;

&lt;p&gt;The AI Agent field has been very popular recently. &lt;strong&gt;OpenClaw&lt;/strong&gt;, an open-source AI agent that can operate around the clock, has impressed people with its powerful system access and browsing capabilities. For security reasons, deploying it on a cloud VM (such as GCP GCE) is the ideal approach: it ensures 24/7 availability and isolates sensitive local data.&lt;/p&gt;

&lt;p&gt;However, in GCP's default Debian/Ubuntu environment, the permission mechanism differs slightly from a typical desktop Linux, so following the official installation script often leads to pitfalls.&lt;/p&gt;




&lt;h2&gt;
  
  
  🛠️ Basic Installation Process of OpenClaw on GCP
&lt;/h2&gt;

&lt;p&gt;Before we get into troubleshooting, let's quickly go through the standard installation logic:&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Create a VM Instance
&lt;/h3&gt;

&lt;p&gt;Create a new VM in the GCP Console:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Machine type&lt;/strong&gt;: Recommended &lt;code&gt;e2-small&lt;/code&gt; or &lt;code&gt;e2-medium&lt;/code&gt; (depending on your Agent load).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Operating system&lt;/strong&gt;: Recommended to choose &lt;strong&gt;Ubuntu 24.04 LTS&lt;/strong&gt; or &lt;strong&gt;Debian 12&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Hard disk&lt;/strong&gt;: Recommended 20GB or more.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  2. Connect and Basic Updates
&lt;/h3&gt;

&lt;p&gt;After entering the VM via SSH, first perform a system update:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;sudo apt update &amp;amp;&amp;amp; sudo apt upgrade -y
sudo apt install -y git curl build-essential

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  3. Officially Install OpenClaw
&lt;/h3&gt;

&lt;p&gt;The official website provides a one-click installation script:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;curl -fsSL https://openclaw.ai/install.sh | bash

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;But!&lt;/strong&gt; If you directly execute the above script, you will usually encounter the following two serious permission and path problems on GCP.&lt;/p&gt;




&lt;h2&gt;
  
  
  🛠️ Problem 1: "HAL 9000" Style Denial of sudo-rs
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Symptom:&lt;/strong&gt; When executing the official installation script, the following error is encountered with &lt;code&gt;sudo-rs&lt;/code&gt;:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;code&gt;sudo-rs: I'm sorry evanslin. I'm afraid I can't do that&lt;/code&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;strong&gt;Reason:&lt;/strong&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; &lt;strong&gt;Interaction Restriction&lt;/strong&gt;: The script executed via &lt;code&gt;curl ... | bash&lt;/code&gt; cannot obtain password input from the terminal when &lt;code&gt;sudo&lt;/code&gt; is required.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;No Password Account&lt;/strong&gt;: GCP defaults to using SSH Key login, and the user account usually does not have a physical password set, leading to &lt;code&gt;sudo&lt;/code&gt; authentication failure.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;strong&gt;Solution:&lt;/strong&gt; Use &lt;strong&gt;NVM (Node Version Manager)&lt;/strong&gt; to install Node.js, and build the environment under the user directory, completely avoiding the &lt;code&gt;sudo&lt;/code&gt; requirement.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# 1. Install NVM
curl -o- https://raw.githubusercontent.com/nvm-sh/nvm/v0.39.7/install.sh | bash

# Reload shell configuration
source ~/.bashrc

# 2. Install Node.js
nvm install node # Recommended version v25.7.0+

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  🛠️ Problem 2: NVM Path and Environment Variables
&lt;/h2&gt;

&lt;p&gt;After using NVM, although &lt;code&gt;sudo&lt;/code&gt; is avoided, a new problem arises: when you log in again or execute commands using a non-interactive shell, the system may not be able to find the &lt;code&gt;node&lt;/code&gt; or &lt;code&gt;openclaw&lt;/code&gt; command.&lt;/p&gt;

&lt;p&gt;This is because the NVM path is dynamically loaded. It is recommended to ensure that the following content exists in &lt;code&gt;~/.bashrc&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;export NVM_DIR="$HOME/.nvm"
[ -s "$NVM_DIR/nvm.sh" ] &amp;amp;&amp;amp; \. "$NVM_DIR/nvm.sh"
[ -s "$NVM_DIR/bash_completion" ] &amp;amp;&amp;amp; \. "$NVM_DIR/bash_completion"

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  🛠️ Problem 3: How to Make OpenClaw Run 24/7 Stably?
&lt;/h2&gt;

&lt;p&gt;After installation, to keep the Agent running after the SSH window is closed, I switched from GCP's Web SSH to the local &lt;code&gt;gcloud&lt;/code&gt; CLI, where I ran into another small pitfall.&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Why Can't gcloud ssh Find openclaw?
&lt;/h3&gt;

&lt;p&gt;This is usually because GCP's &lt;code&gt;gcloud compute ssh&lt;/code&gt; may create a new username based on your &lt;strong&gt;local account name&lt;/strong&gt;, instead of using the account you used when installing on the VM (e.g., &lt;code&gt;evanslin&lt;/code&gt;).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Verification method:&lt;/strong&gt; Please enter the following in the "Web SSH" and "Local gcloud SSH" windows respectively:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;whoami

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Root cause:&lt;/strong&gt; If the web version shows &lt;code&gt;evanslin&lt;/code&gt; but the gcloud version shows a name like &lt;code&gt;evan_lin_yourdomain_com&lt;/code&gt;, then the two sessions have completely different home directories, so your NVM and OpenClaw settings naturally "disappear".&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Solution:&lt;/strong&gt; When executing the &lt;code&gt;gcloud&lt;/code&gt; command, &lt;strong&gt;explicitly specify&lt;/strong&gt; the account to log in to:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;gcloud compute ssh evanslin@openclaw-evanlin

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This will ensure that you return to the correct environment!&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Use tmux and Startup Script to Achieve Perfect Execution
&lt;/h3&gt;

&lt;p&gt;In order to ensure that environment variables can be loaded correctly in any SSH session (web version or gcloud version), and to keep OpenClaw running stably in the background, it is recommended to use the following "scripted" startup method.&lt;/p&gt;

&lt;h4&gt;
  
  
  Step 1: Create a Startup Script
&lt;/h4&gt;

&lt;p&gt;In a window where you can normally execute &lt;code&gt;openclaw&lt;/code&gt; (usually Web SSH), create a startup script:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;cat &amp;lt;&amp;lt; 'EOF' &amp;gt; ~/start_openclaw.sh
#!/bin/bash
# 1. Force loading NVM path
export NVM_DIR="$HOME/.nvm"
[ -s "$NVM_DIR/nvm.sh" ] &amp;amp;&amp;amp; \. "$NVM_DIR/nvm.sh"

# 2. Automatically correct PATH (please adjust the path according to your Node version)
export PATH="$HOME/.nvm/versions/node/v25.7.0/bin:$PATH"

# 3. Execute command
openclaw "$@"
EOF

# Grant execution permission
chmod +x ~/start_openclaw.sh

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  Step 2: Verify the Script
&lt;/h4&gt;

&lt;p&gt;From now on, no matter where you log in from, please use this script uniformly. Test in the &lt;code&gt;gcloud ssh&lt;/code&gt; window:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;~/start_openclaw.sh gateway

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If it runs successfully, the PATH has been wired up correctly!&lt;/p&gt;

&lt;h4&gt;
  
  
  Step 3: Combine tmux to Solve the Disconnection Problem
&lt;/h4&gt;

&lt;p&gt;Now we combine the script with &lt;code&gt;tmux&lt;/code&gt; to achieve true 24/7 background operation:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt; &lt;strong&gt;Open a new session&lt;/strong&gt;: &lt;code&gt;tmux new -s openclaw&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Execute the script inside&lt;/strong&gt;: &lt;code&gt;~/start_openclaw.sh gateway&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Perfectly detach&lt;/strong&gt;: Press &lt;code&gt;Ctrl + B&lt;/code&gt; and release, then press &lt;code&gt;D&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt; &lt;strong&gt;Reconnect at any time&lt;/strong&gt;: Next time you log in, execute &lt;code&gt;tmux a -t openclaw&lt;/code&gt;.&lt;/li&gt;
&lt;/ol&gt;




&lt;h2&gt;
  
  
  Summary
&lt;/h2&gt;

&lt;p&gt;The key to deploying OpenClaw on GCP is &lt;strong&gt;"user directory priority"&lt;/strong&gt;. By using NVM to avoid the system-level &lt;code&gt;sudo-rs&lt;/code&gt; restriction, not only is the installation process smoother, but it also makes it easier to switch Node.js versions to meet the latest requirements of OpenClaw.&lt;/p&gt;

&lt;p&gt;After successful deployment, don't forget to use &lt;code&gt;openclaw onboard&lt;/code&gt; to start configuring your API Keys and communication channels (such as Telegram or Discord).&lt;/p&gt;

&lt;p&gt;I hope this note can help developers who are also working hard on GCP. See you next time!&lt;/p&gt;

</description>
      <category>ai</category>
      <category>cloud</category>
      <category>google</category>
      <category>tutorial</category>
    </item>
    <item>
      <title>Sharing Good Books: Secrets to Successful WFH</title>
      <dc:creator>Evan Lin</dc:creator>
      <pubDate>Sat, 28 Feb 2026 16:38:19 +0000</pubDate>
      <link>https://dev.to/evanlin/sharing-good-books-secrets-to-successful-wfh-5g8j</link>
      <guid>https://dev.to/evanlin/sharing-good-books-secrets-to-successful-wfh-5g8j</guid>
      <description>&lt;p&gt;&lt;a href="http://moo.im/a/7opqFR" rel="noopener noreferrer"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3pw9cfbmn02qn16w62a1.jpg" width="210" height="298"&gt;&lt;/a&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;WFH在家工作的成功祕訣
美國中小企業最佳CEO教你高效、彈性、具團隊精神的企業競爭新優勢
How to Thrive in the Virtual Workplace : Simple and Effective Tips for Successful， Productive and Empowered Remote Work

作者： 羅伯特・格雷瑟 米克・史隆 原文作者： Robert Glazer Mick Sloan 譯者： 孟令函 出版社：遠流出版

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  Purchase Recommendation Website:
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="http://moo.im/a/7opqFR" rel="noopener noreferrer"&gt;Readmoo Online Book Purchase&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h1&gt;
  
  
  Preface:
&lt;/h1&gt;

&lt;p&gt;This is the eleventh book I've read this year. Since the pandemic broke out internationally in 2020, our company has been accelerating its shift toward a Hybrid Office model, meaning flexible office seating plus flexible remote work. At the height of the pandemic, we even went to full-time WFH.&lt;/p&gt;

&lt;p&gt;Whether you are an employee or a supervisor, are you afraid of WFH, or fond of it? Do you enjoy skipping the commute, yet worry that your home lacks the right equipment, and that you miss real interaction with your colleagues? This book looked interesting when I saw it, so I bought it and read it.&lt;/p&gt;

&lt;h1&gt;
  
  
  Content Introduction:
&lt;/h1&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Build a virtual office and enhance future competitiveness!
"If you are still struggling with remote work, Robert Glazer can provide you with some immediately actionable advice." - Adam Grant (Professor at the Wharton School, author of "Give and Take")

When millions of office workers around the world were suddenly forced to work from home (WFH) to contain the pandemic, business owners found that employees accepted it more readily than expected, and most of the work could still proceed normally. However, not every company or office worker can make the transition overnight, and simply transplanting the procedures and strategies of a physical office is not enough. As remote and hybrid work models become more and more common, companies that do them well will have a clear competitive advantage and attract the best talent.

As the founder and CEO of "Acceleration Partners," a 100% remote organization with 170 employees working from home, Robert Glazer has drawn on more than ten years of experience to distill the right principles, strategies, and tools for managing remote employees, allowing companies to excel in both the virtual and physical worlds.

Office workers will from now on:
✔ Don't have to commute, stay away from the pressure of high housing prices and high rents in the city
✔ Not be disturbed, create their own work schedule and environment
✔ Enjoy the ideal life of balancing family, interests, and work

Companies can even:
✔ Save costs, or can invest more resources in employees and customers
✔ Improve efficiency, and can achieve excellent performance and work results globally
✔ Create an equal and cohesive work environment, retaining outstanding talent
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h1&gt;
  
  
  Chapter Outline
&lt;/h1&gt;

&lt;h2&gt;
  
  
  Part 1: The Winning Mindset for Remote Workers
&lt;/h2&gt;

&lt;p&gt;What exactly is remote work? It is not a product of the pandemic. Even before it, many companies needed sales or customer service staff around the world but couldn't afford to set up physical offices in every region, so employees came from all over and worked from their own homes.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;The Winning Mindset for Remote Workers&lt;/strong&gt;
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Recruit diligent, responsible, and self-disciplined employees&lt;/li&gt;
&lt;li&gt;Give them enough trust&lt;/li&gt;
&lt;li&gt;Perfect work procedures&lt;/li&gt;
&lt;li&gt;Excellent company culture&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  &lt;strong&gt;The Basics for Remote Workers&lt;/strong&gt;
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Develop a work plan and execute it effectively&lt;/li&gt;
&lt;li&gt;Create a suitable work environment&lt;/li&gt;
&lt;li&gt;Establish a clear boundary between work and personal life&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Properly Manage Your Email
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;After switching to remote work, the volume of email exchanges will increase.&lt;/li&gt;
&lt;li&gt;Letting others know your reply cadence is very important.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Methods to Improve Work Efficiency
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Allocate energy well&lt;/li&gt;
&lt;li&gt;Create a buffer before and after work&lt;/li&gt;
&lt;li&gt;Prioritize and allocate time.&lt;/li&gt;
&lt;li&gt;Establish expectations&lt;/li&gt;
&lt;li&gt;Stay focused

&lt;ul&gt;
&lt;li&gt;Try to focus on one thing for at least 15 to 20 minutes a day.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;Take care of yourself

&lt;ul&gt;
&lt;li&gt;Physical and mental health is very important, don't ruin your health because of WFH.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;Establish communication between people

&lt;ul&gt;
&lt;li&gt;Create some chat channels&lt;/li&gt;
&lt;li&gt;Allow more participants to speak in meetings.&lt;/li&gt;
&lt;li&gt;Make good use of asynchronous video (use videos instead of emails or announcements)&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

&lt;h3&gt;
  
  
  Changing Work Location
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Make sure that the region (country) has an office of the company, otherwise there may be problems with salary remittances.&lt;/li&gt;
&lt;li&gt;Due to the difference in labor laws and tax rates in various countries, employee benefits and labor regulations are different.&lt;/li&gt;
&lt;li&gt;Changing countries may result in salary differences, which will be adjusted based on the cost of living in each location.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Part 2: The Success Rules for Remote Work Companies
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Starting from Organizational Culture
&lt;/h3&gt;

&lt;p&gt;Since remote work companies care a lot about employees' autonomous work motivation, every colleague needs to have an in-depth understanding of the organizational culture. (And also be able to deeply identify with it).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Company Culture:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Vision&lt;/li&gt;
&lt;li&gt;Values&lt;/li&gt;
&lt;li&gt;Goals&lt;/li&gt;
&lt;li&gt;Consistency&lt;/li&gt;
&lt;li&gt;Clear and explicit&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  How to Describe the Core Concept of the Company:
&lt;/h4&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;The core concept is to think about a specific future point in time, and use the tone of describing the current facts, as much as possible, to detail what the company and employees will be like at that time, and how they will feel.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  When to Use the Core Concept:
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Recruiting employees&lt;/li&gt;
&lt;li&gt;Major policy decisions&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  How to Recruit Suitable Remote Employees
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Ask the other party if they have remote work experience&lt;/li&gt;
&lt;li&gt;Whether they agree with the core concept&lt;/li&gt;
&lt;li&gt;Look at the other party's concept and handling methods for remote work&lt;/li&gt;
&lt;li&gt;You can ask detailed questions

&lt;ul&gt;
&lt;li&gt;Do you like remote work -&amp;gt; Why do you like it -&amp;gt; How do you arrange it -&amp;gt; Self-adjustment&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

&lt;h3&gt;
  
  
  How to Conduct Remote Interviews
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Fact-based interview questions&lt;/li&gt;
&lt;li&gt;What changes does remote work bring&lt;/li&gt;
&lt;li&gt;Are you troubled because you can't work face-to-face?&lt;/li&gt;
&lt;li&gt;How to communicate effectively without meeting&lt;/li&gt;
&lt;li&gt;How to avoid feeling isolated while working from home
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Will not waste training resources for someone who only meets the average standard
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Notes for Remote Work Colleagues
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Complete onboarding process

&lt;ul&gt;
&lt;li&gt;1 on 1 with each supervisor&lt;/li&gt;
&lt;li&gt;Colleagues breaking the ice&lt;/li&gt;
&lt;li&gt;Setting up equipment&lt;/li&gt;
&lt;li&gt;Related pre-education system&lt;/li&gt;
&lt;li&gt;More special:&lt;/li&gt;
&lt;li&gt;Introduction to company regulations (especially related to remote work)&lt;/li&gt;
&lt;li&gt;Introduction to company culture (to constantly keep everyone on the same core concept)&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;Reduce meetings, especially reduce regular meetings, and change to irregular, fast, and concise discussions with a small number of participants

&lt;ul&gt;
&lt;li&gt;Have participants rate how much they need to attend; if a meeting scores below six points, cancel it.&lt;/li&gt;
&lt;li&gt;Everyone participating in the meeting must speak&lt;/li&gt;
&lt;li&gt;Meeting summaries are very important (to avoid someone not being able to participate)&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;Etiquette in different time zones

&lt;ul&gt;
&lt;li&gt;Emails and messages should clearly indicate the time zone that needs attention. (If possible, convert it to their time zone)&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;Travel strategy

&lt;ul&gt;
&lt;li&gt;Business travel is expected to decrease after the pandemic&lt;/li&gt;
&lt;li&gt;Trips become more targeted: more face-to-face meetings with fewer people&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;Team camaraderie

&lt;ul&gt;
&lt;li&gt;Deepen camaraderie through regular meetings and casual chats after meetings.&lt;/li&gt;
&lt;li&gt;Play some online games&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;Performance management

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;Compared to physical work, remote work requires more feedback and suggestions&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;Don't always think that you only give opinions during performance evaluations.&lt;/li&gt;
&lt;li&gt;This can increase the sense of trust between colleagues.&lt;/li&gt;
&lt;li&gt;&lt;code&gt;Praise, praise immediately!&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;Responsible culture

&lt;ul&gt;
&lt;li&gt;Avoid the strategy of close monitoring&lt;/li&gt;
&lt;li&gt;Rely on weekly reports or regular daily check-ins instead.&lt;/li&gt;
&lt;li&gt;Give more trust and care appropriately.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;Trust crisis:

&lt;ul&gt;
&lt;li&gt;If any violations occur, they need to be handled immediately.&lt;/li&gt;
&lt;li&gt;Announce it (withholding the name, describing only the violation) as a reminder to colleagues&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;Physical employee conference

&lt;ul&gt;
&lt;li&gt;Working remotely doesn't mean never meeting; you can arrange for everyone to gather in the same place once a year.&lt;/li&gt;
&lt;li&gt;It connects people and synchronizes the company culture&lt;/li&gt;
&lt;li&gt;And it helps everyone collaborate more smoothly afterward&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;

&lt;h1&gt;
  
  
  Thoughts
&lt;/h1&gt;

&lt;p&gt;This book was written by the founder of a startup accelerator whose company has long been fully remote. It clearly explains remote work before and during the pandemic, from two major perspectives:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;As a remote worker, how should you adjust?&lt;/li&gt;
&lt;li&gt;As a manager, how should you manage your all-remote team? (or even a full-remote company)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This book gives remote workers the psychological preparation they need. After all, remote work is not just about saving commuting time; it is a transformation of the whole work model. It demands a higher degree of self-discipline and proactivity, because only then can supervisors and colleagues trust you and feel at ease. It is even more important to balance life and work, so that working from home doesn't blur the line between them and lead to early burnout.&lt;/p&gt;

&lt;p&gt;As a manager, you need to pay even more attention to company culture and core concepts. Because employees are scattered everywhere, they cannot absorb the culture from banners and slogans around a physical office; you need to communicate that information frequently, and pay special attention when recruiting. Not every employee can understand and make good use of the benefits remote work brings. This book also spends a lot of time teaching how to build corporate culture and core concepts remotely, which gave me a lot of deeper insight.&lt;/p&gt;

&lt;p&gt;Finally, whether you are a prospective remote worker or a manager who may lead remote workers, this book can help you.&lt;/p&gt;

</description>
      <category>career</category>
      <category>management</category>
      <category>productivity</category>
      <category>resources</category>
    </item>
    <item>
      <title>LINE Bot with Long Memory: Firebase Database, Gemini Pro, and Cloud Functions</title>
      <dc:creator>Evan Lin</dc:creator>
      <pubDate>Sat, 28 Feb 2026 16:38:04 +0000</pubDate>
      <link>https://dev.to/evanlin/line-bot-with-long-memory-firebase-database-gemini-pro-and-cloud-functions-455j</link>
      <guid>https://dev.to/evanlin/line-bot-with-long-memory-firebase-database-gemini-pro-and-cloud-functions-455j</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fe0ewdqrz4j8ctmd78pjh.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fe0ewdqrz4j8ctmd78pjh.png" alt="image-20240413210750427" width="800" height="1731"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h1&gt;
  
  
  Preface:
&lt;/h1&gt;

&lt;p&gt;This is the second in a series of articles for the BUILD WITH AI (BWAI) WORKSHOP, in collaboration with the Google Developer Group on 04/18 (it's unknown how many more articles are needed).&lt;/p&gt;

&lt;p&gt;This article will focus on the following aspects:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Firebase Database setup&lt;/li&gt;
&lt;li&gt;How to access Firebase through the official Golang on Cloud Function&lt;/li&gt;
&lt;li&gt;Using the Firebase Database to make your Gemini remember everything that has been said, optimizing the LINE Bot built in &lt;a href="https://dev.to/evanlin/bwai-workshopgolang-line-oa-cloudfunction-geminipro-firebase-lu-xing-xiao-bang-shou-line-liao-tian-ji-qi-ren-23j9-temp-slug-2266421"&gt;the previous article&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;h1&gt;
  
  
  Article List:
&lt;/h1&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://dev.to/evanlin/bwai-workshopgolang-line-oa-cloudfunction-geminipro-firebase-lu-xing-xiao-bang-shou-line-liao-tian-ji-qi-ren-23j9-temp-slug-2266421"&gt;[BwAI workshop][Golang] LINE OA + CloudFunction + GeminiPro + Firebase = Travel Assistant LINE Chatbot (1): Scenery Recognition Assistant&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;[BwAI workshop][Golang] LINE OA + CloudFunction + GeminiPro + Firebase = Travel Assistant LINE Chatbot (2): Firebase Database gives LINEBot a super long memory&lt;/li&gt;
&lt;/ul&gt;

&lt;h1&gt;
  
  
  Preparation
&lt;/h1&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://developers.line.biz/en/" rel="noopener noreferrer"&gt;LINE Developer Account&lt;/a&gt;&lt;/strong&gt;: You only need a LINE account to apply for a developer account.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://cloud.google.com/functions?hl=zh_cn" rel="noopener noreferrer"&gt;&lt;strong&gt;Google Cloud Functions&lt;/strong&gt;&lt;/a&gt;: The &lt;strong&gt;deployment platform&lt;/strong&gt; for Go code, generating the webhook address for LINEBot.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://firebase.google.com/" rel="noopener noreferrer"&gt;&lt;strong&gt;Firebase&lt;/strong&gt;&lt;/a&gt;: Create a &lt;strong&gt;Realtime database&lt;/strong&gt;, LINE Bot can remember your previous conversations and even answer many interesting questions.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://aistudio.google.com/" rel="noopener noreferrer"&gt;Google AI Studio&lt;/a&gt;&lt;/strong&gt;: You can get the Gemini Key here.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Applying for Firebase Database Service
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Remember to go to &lt;a href="https://console.firebase.google.com/" rel="noopener noreferrer"&gt;Firebase Console&lt;/a&gt; and create a project.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Create a Firebase Realtime Database; it will be used later to store the chat history&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Select the US region&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Start in “locked mode”&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;For ease of development, open up read and write access under “Rules”. Pay close attention to this setting:&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fs2umoeo1n59unkzyx7cf.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fs2umoeo1n59unkzyx7cf.png" alt="image-20240413213202354" width="633" height="413"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Remember the database URL (note: &lt;strong&gt;you must change the permissions back before going live&lt;/strong&gt;), and add an item named “&lt;strong&gt;BwAI&lt;/strong&gt;”&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fouj8ilfmmmoah9bt50u9.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fouj8ilfmmmoah9bt50u9.png" alt="image-20240413213802313" width="800" height="448"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Applying for a Service Account credential to connect Cloud Functions to Google services
&lt;/h2&gt;

&lt;p&gt;For this part you can refer to another article of mine, &lt;a href="https://www.evanlin.com/til-heroku-gcp-key/" rel="noopener noreferrer"&gt;[Learning Document] How to use Golang to access Google Cloud services on Heroku&lt;/a&gt;, but I'll quickly go through it here.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;In the Google Cloud Console, go to IAM &amp;amp; Admin and select “Create Service Account”&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fq73jo7sv5juu64eome4o.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fq73jo7sv5juu64eome4o.png" alt="image-20240413221505536" width="444" height="382"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Choose the service account name yourself; pay attention that the Cloud project and the Firebase &lt;strong&gt;project must be the same&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fr4ys6mumbx82iiwf69zq.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fr4ys6mumbx82iiwf69zq.png" alt="image-20240413222847247" width="651" height="684"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Grant this service account access to the project. For the role, it is easiest to start with Editor (a broad role, so use it with caution)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F541gygfqb74m3zava6ed.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F541gygfqb74m3zava6ed.png" alt="image-20240413223055288" width="800" height="468"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;“Grant users access to this service account” can be left unset&lt;/li&gt;
&lt;li&gt;Click “Manage keys” to prepare to download the credential&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0vgtbig2kdgql9sxhwt8.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0vgtbig2kdgql9sxhwt8.png" alt="image-20240413223225404" width="800" height="296"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Select Add Key -&amp;gt; Create new Key -&amp;gt; Download JSON&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftiutm3is5eitmz93w6a9.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftiutm3is5eitmz93w6a9.png" alt="image-20240413223613244" width="555" height="343"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Things to note when using the Golang Google &lt;code&gt;option&lt;/code&gt; package:
&lt;/h2&gt;

&lt;p&gt;Even though the Firebase Realtime Database has been set to allow everyone to read and write, accessing it through Golang may still return an “Unauthorized request” error. This happens when the project in your credential JSON file differs from your Firebase project. Simply recreate the service account in the correct project and update the JSON content.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fao9k6eymdu1cienk09hb.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fao9k6eymdu1cienk09hb.png" alt="image-20240413220630196" width="800" height="62"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h1&gt;
  
  
  How to import a Service Account credential in Google Cloud Functions?
&lt;/h1&gt;

&lt;p&gt;Next, I will share how to use the credential correctly within Cloud Functions. If you try to have the Cloud Function open the credential JSON file directly, you will keep getting an error saying the credential cannot be obtained.&lt;/p&gt;

&lt;p&gt;At this time, you need to add it through environment variables:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Copy all the content in the JSON file&lt;/li&gt;
&lt;li&gt;Create an environment variable named &lt;code&gt;GOOGLE_APPLICATION_CREDENTIALS&lt;/code&gt; and paste the entire JSON content as its value.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flbhayrqqfametdn4v8w2.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flbhayrqqfametdn4v8w2.png" alt="image-20240413225710980" width="247" height="315"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Next, let's see how to modify the relevant code:
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;    // Init firebase related variables
    ctx := context.Background()
    opt := option.WithCredentialsJSON([]byte(os.Getenv("GOOGLE_APPLICATION_CREDENTIALS")))
    config := &amp;amp;firebase.Config{DatabaseURL: os.Getenv("FIREBASE_URL")}
    app, err := firebase.NewApp(ctx, config, opt)
    if err != nil {
        log.Fatalf("error initializing app: %v", err)
    }
    client, err := app.Database(ctx)
    if err != nil {
        log.Fatalf("error initializing database: %v", err)
    }

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;First, &lt;code&gt;option.WithCredentialsJSON([]byte(os.Getenv("GOOGLE_APPLICATION_CREDENTIALS")))&lt;/code&gt; reads the credential from the environment variable.&lt;/li&gt;
&lt;li&gt;Next, &lt;code&gt;&amp;amp;firebase.Config{DatabaseURL: os.Getenv("FIREBASE_URL")}&lt;/code&gt; points the client at the database specified by &lt;code&gt;FIREBASE_URL&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;With this, the initialization runs correctly. Next, let's look at how the Gemini chat history is handled.&lt;/li&gt;
&lt;/ul&gt;

&lt;h1&gt;
  
  
  How to correctly process Gemini Pro Chat History?
&lt;/h1&gt;
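&lt;p&gt;As a rough sketch of the idea (the &lt;code&gt;ChatMessage&lt;/code&gt; type, the trimming helper, and the storage path below are illustrative, not taken from the original repository): Gemini's chat API expects the history as alternating “user”/“model” turns, so a simple approach is to persist each turn as a small record under the user's key in the Realtime Database, reload it on the next webhook call, and trim it so the prompt stays bounded.&lt;/p&gt;

```go
package main

import (
	"encoding/json"
	"fmt"
)

// ChatMessage is a minimal serializable record of one chat turn.
// Gemini expects the roles "user" and "model" in the history.
type ChatMessage struct {
	Role string `json:"role"`
	Text string `json:"text"`
}

// AppendAndTrim adds a new turn and keeps only the most recent max
// messages, so the stored history (and the prompt) stays bounded.
func AppendAndTrim(history []ChatMessage, msg ChatMessage, max int) []ChatMessage {
	history = append(history, msg)
	if len(history) > max {
		history = history[len(history)-max:]
	}
	return history
}

func main() {
	history := []ChatMessage{}
	history = AppendAndTrim(history, ChatMessage{Role: "user", Text: "Hi"}, 4)
	history = AppendAndTrim(history, ChatMessage{Role: "model", Text: "Hello!"}, 4)
	history = AppendAndTrim(history, ChatMessage{Role: "user", Text: "Where should I travel?"}, 4)

	// This JSON form is what would be stored under a per-user path
	// (e.g. "BwAI/{userID}") in the Realtime Database.
	b, _ := json.Marshal(history)
	fmt.Println(string(b))
}
```

&lt;p&gt;The serialized form above is what would be written with the database client's &lt;code&gt;NewRef(...).Set&lt;/code&gt; and read back with &lt;code&gt;Get&lt;/code&gt; before each new Gemini request.&lt;/p&gt;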

&lt;h1&gt;
  
  
  Full Source Code
&lt;/h1&gt;

&lt;p&gt;You can find the relevant open source code here: &lt;a href="https://github.com/kkdai/linebot-cf-firebase" rel="noopener noreferrer"&gt;https://github.com/kkdai/linebot-cf-firebase&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>gemini</category>
      <category>go</category>
      <category>tutorial</category>
    </item>
  </channel>
</rss>
