Background
While developing a LINE Bot, I wanted to improve its plain text search feature: a user types any question, the AI automatically searches the web, organizes an answer, and supports follow-up questions in the same conversation. The traditional approach chained multiple APIs (Gemini to extract keywords → Google Custom Search → Gemini to summarize), which was not only slow (3 API calls) but also had no conversation memory.
In 2024, however, Google launched the Grounding with Google Search feature. This is the official RAG (Retrieval-Augmented Generation) solution: the Gemini model automatically searches the web and cites its sources, and it natively supports chat sessions! The feature is provided through Vertex AI, so AI responses are no longer based on guesswork but on real web information.
Screen Display
(Results using the old Google Custom Search)
You can see that the results are grounded in Google Search.
Main Repo https://github.com/kkdai/linebot-helper-python
Problems Encountered During Development
Problem 1: Bottlenecks of the Old Implementation
When implementing loader/searchtool.py, I used the traditional search process:
# ❌ Old method - 3 API calls
async def handle_text_message(event, user_id):
    msg = event.message.text

    # 1st call: extract keywords
    keywords = extract_keywords_with_gemini(msg, api_key)

    # 2nd call: Google Custom Search
    results = search_with_google_custom_search(keywords, search_api_key, cx)

    # 3rd call: summarize the results
    summary = summarize_text(results, 300)

    # Return results...
This method has several obvious problems:
❌ No conversation memory - Each time is a new conversation, unable to ask questions continuously
User: "What is Python?"
Bot: [Search results + Summary]
User: "What are its advantages?" # ❌ Bot doesn't know "it" refers to Python
❌ Shallow search results - Only using snippets, unable to deeply read the content of the webpage
❌ Slow and costly - 3 API calls (~6-8 seconds) + Google Custom Search fees ($0.005/time)
Problem 2: Client Closed Error
When I switched to Vertex AI Grounding, I encountered this error:
ERROR:loader.chat_session:Grounding search failed: Cannot send a request, as the client has been closed.
The reason was that I created a local client variable in the function:
# ❌ Incorrect method - client will be garbage collected
def get_or_create_session(self, user_id):
    client = self._create_client()  # Local variable
    chat = client.chats.create(...)
    return chat  # The client is closed after the function ends!
When the function ends, client is garbage collected and closed, causing the chat session created based on it to be unusable.
Correct Solutions
1. Using Vertex AI Grounding with Google Search
Google Search Grounding is the official RAG solution provided by Vertex AI. Comparison with the old Custom Search:
| Feature | Old Version (Custom Search) | New Version (Grounding) |
|---|---|---|
| Number of API Calls | 3 times | 1 time |
| Response Speed | ~6-8 seconds | ~2-3 seconds |
| Conversation Memory | ❌ No | ✅ Native Support |
| Search Quality | ⭐⭐⭐ (snippet) | ⭐⭐⭐⭐⭐ (full webpage) |
| Source Citation | Only links | Full citation |
| Cost | Gemini + Custom Search | Only Vertex AI |
2. Create a Chat Session Manager
First, I created loader/chat_session.py to manage chat sessions:
import os
from datetime import datetime, timedelta
from typing import Dict, List, Tuple

from google import genai
from google.genai import types


class ChatSessionManager:
    def __init__(self, session_timeout_minutes: int = 30):
        self.sessions: Dict[str, dict] = {}
        self.session_timeout = timedelta(minutes=session_timeout_minutes)
        # ✅ Key: create a shared client instance (avoids the client closed error)
        self.client = self._create_client()

    def _create_client(self) -> genai.Client:
        """Create the Vertex AI client"""
        return genai.Client(
            vertexai=True,  # Enable Vertex AI
            project=os.getenv('GOOGLE_CLOUD_PROJECT'),
            location=os.getenv('GOOGLE_CLOUD_LOCATION', 'us-central1'),
            http_options=types.HttpOptions(api_version="v1")
        )

    def get_or_create_session(self, user_id: str) -> Tuple[object, List[dict]]:
        """Get or create the user's chat session"""
        now = datetime.now()

        # Reuse an existing, unexpired session
        if user_id in self.sessions:
            session_data = self.sessions[user_id]
            if not self._is_session_expired(session_data):
                session_data['last_active'] = now
                return session_data['chat'], session_data['history']

        # Create a new session with Google Search Grounding
        config = types.GenerateContentConfig(
            temperature=0.7,
            max_output_tokens=2048,
            # ✅ Enable Google Search
            tools=[types.Tool(google_search=types.GoogleSearch())],
        )

        # Use the shared self.client (it will not be closed)
        chat = self.client.chats.create(
            model="gemini-2.0-flash",
            config=config
        )

        self.sessions[user_id] = {
            'chat': chat,
            'last_active': now,
            'history': [],
            'created_at': now
        }
        return chat, []
Key points of the fix:
- Shared client - self.client is created in __init__(), so its lifecycle matches the ChatSessionManager
- Automatic expiration - a session expires after 30 minutes of inactivity
- Conversation isolation - each user's session is completely independent
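The expiry logic referenced above (`_is_session_expired` and the periodic cleanup) is not shown in the post. A minimal sketch might look like the following, assuming each session stores a `last_active` timestamp as in the manager code; the class name and `cleanup_expired_sessions` are my own, not necessarily what the repo uses.

```python
from datetime import datetime, timedelta


class SessionExpiryMixin:
    """Hypothetical sketch of the expiry helpers the manager relies on."""

    def __init__(self, session_timeout_minutes: int = 30):
        self.sessions: dict = {}
        self.session_timeout = timedelta(minutes=session_timeout_minutes)

    def _is_session_expired(self, session_data: dict) -> bool:
        # A session expires once the user has been idle longer than the timeout.
        return datetime.now() - session_data['last_active'] > self.session_timeout

    def cleanup_expired_sessions(self) -> int:
        # Drop every expired session; return how many were removed.
        expired = [uid for uid, data in self.sessions.items()
                   if self._is_session_expired(data)]
        for uid in expired:
            del self.sessions[uid]
        return len(expired)
```

Calling `cleanup_expired_sessions()` on a timer (or lazily on each request) keeps memory bounded without any external store.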
3. Implement Search and Answer Functions
Next, implement the core function to search and answer using Grounding:
import logging

logger = logging.getLogger(__name__)


async def search_and_answer_with_grounding(
    query: str,
    user_id: str,
    session_manager: ChatSessionManager
) -> dict:
    """Search and answer questions using Vertex AI Grounding"""
    try:
        # Get or create the chat session
        chat, history = session_manager.get_or_create_session(user_id)

        # Build the prompt (Traditional Chinese, no markdown)
        prompt = f"""Please answer the following question in Traditional Chinese using Taiwanese terminology.
If you need the latest information, please search the web and provide accurate answers.
Please provide detailed and useful answers and ensure that the source of information is reliable.
Please do not use markdown format (do not use symbols such as **, ##, -, etc.). Use plain text to answer.

Question: {query}"""

        # Send the message (Gemini decides on its own whether to search)
        response = chat.send_message(prompt)

        # Record the turn in the history
        session_manager.add_to_history(user_id, "user", query)
        session_manager.add_to_history(user_id, "assistant", response.text)

        # Extract source citations from the grounding metadata
        sources = []
        if hasattr(response, 'candidates') and response.candidates:
            candidate = response.candidates[0]
            if hasattr(candidate, 'grounding_metadata'):
                metadata = candidate.grounding_metadata
                if hasattr(metadata, 'grounding_chunks') and metadata.grounding_chunks:
                    for chunk in metadata.grounding_chunks:
                        if hasattr(chunk, 'web'):
                            sources.append({
                                'title': chunk.web.title,
                                'uri': chunk.web.uri
                            })

        return {
            'answer': response.text,
            'sources': sources,
            'has_history': len(history) > 0
        }
    except Exception as e:
        logger.error(f"Grounding search failed: {e}")
        raise
Key Features:
- ✅ Gemini automatically determines when to search
- ✅ Read the full webpage content (not just snippets)
- ✅ Automatically extract source citations
- ✅ Support continuous conversations (remember context)
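The `add_to_history` method called in the function above isn't shown in the post. A plausible minimal version, with an assumed cap on history length so memory doesn't grow unbounded, could look like this (the `MAX_HISTORY` value and the entry shape are my assumptions):

```python
from datetime import datetime

MAX_HISTORY = 20  # assumed cap; the real module may choose differently


def add_to_history(sessions: dict, user_id: str, role: str, content: str) -> None:
    """Append one conversation turn to the user's history, trimming the oldest."""
    if user_id not in sessions:
        return
    history = sessions[user_id]['history']
    history.append({
        'role': role,
        'content': content,
        'timestamp': datetime.now().isoformat(),
    })
    # Keep only the most recent MAX_HISTORY turns
    del history[:-MAX_HISTORY]
```

Note that the Gemini chat object keeps its own server-side context; this local history only exists so the bot can report status and detect whether a conversation is in progress.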
4. Integrate into main.py
Integrate the Grounding function in main.py:
from loader.chat_session import (
    ChatSessionManager,
    search_and_answer_with_grounding,
    format_grounding_response,
    get_session_status_message
)

# Initialize the session manager
chat_session_manager = ChatSessionManager(session_timeout_minutes=30)


async def handle_text_message(event: MessageEvent, user_id: str):
    """Handle plain text messages - using Grounding"""
    msg = event.message.text.strip()

    # Special commands
    if msg.lower() in ['/clear', '/清除']:
        chat_session_manager.clear_session(user_id)
        reply_msg = TextSendMessage(text="✅ Conversation has been reset")
        await line_bot_api.reply_message(event.reply_token, [reply_msg])
        return

    if msg.lower() in ['/status', '/狀態']:
        status_text = get_session_status_message(chat_session_manager, user_id)
        reply_msg = TextSendMessage(text=status_text)
        await line_bot_api.reply_message(event.reply_token, [reply_msg])
        return

    # Search and answer with Grounding
    try:
        result = await search_and_answer_with_grounding(
            query=msg,
            user_id=user_id,
            session_manager=chat_session_manager
        )
        response_text = format_grounding_response(result, include_sources=True)
        reply_msg = TextSendMessage(text=response_text)
        await line_bot_api.reply_message(event.reply_token, [reply_msg])
    except Exception as e:
        logger.error(f"Error in Grounding search: {e}", exc_info=True)
        error_text = "❌ Sorry, an error occurred while processing your question. Please try again later."
        reply_msg = TextSendMessage(text=error_text)
        await line_bot_api.reply_message(event.reply_token, [reply_msg])
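main.py imports `format_grounding_response`, but the post never shows it. Here is a minimal sketch of what such a formatter might look like; the body, parameter names, and layout are my assumptions based on the sample replies in this post, not the repo's actual code.

```python
def format_grounding_response(result: dict, include_sources: bool = True,
                              max_sources: int = 3) -> str:
    """Compose the LINE reply text: conversation marker, answer, source list."""
    parts = []

    # Mark ongoing conversations so the user knows context is being kept
    if result.get('has_history'):
        parts.append("💬 [In conversation]")

    parts.append(result['answer'])

    # Append a numbered list of grounding sources, capped at max_sources
    sources = result.get('sources', [])
    if include_sources and sources:
        lines = ["📚 Reference Source:"]
        for i, src in enumerate(sources[:max_sources], 1):
            lines.append(f"{i}. {src['title']}\n{src['uri']}")
        parts.append("\n".join(lines))

    return "\n\n".join(parts)
```

Capping the source list matters on LINE, where a single text message is limited in length and long URL lists hurt readability.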
Practical Application Examples
The finished feature is surprisingly capable and supports genuinely intelligent conversations:
Example 1: Basic Question and Answer
User: What is Python?
Bot: Python is a high-level, interpreted programming language created by Guido van Rossum in 1991...
📚 Reference Source:
1. Python Official Website
https://www.python.org/
Example 2: Continuous Conversation (Conversation Memory)
User: What is Python?
Bot: [Answer...]
User: What are its advantages? ✅ Bot knows "it" = Python
Bot: 💬 [In conversation]
The main advantages of Python include:
1. Concise and readable syntax
2. Rich standard library
...
Example 3: Latest Information Search
User: Latest earthquake news in Japan
Bot: According to the latest information, in December 2025 Japan...
[Gemini automatically searches the web and organizes the latest information]
📚 Reference Source:
1. Central Weather Bureau
2. NHK News
Use Cases
This kind of setup is especially well suited to:
- 💬 Intelligent Customer Service - Automatically search for the latest product information
- 📰 News Assistant - Track the latest current affairs
- 🎓 Learning Assistant - Answer questions and provide reliable sources
- 🔍 Research Assistant - Quickly search and organize information
Environment Setup
Required Environment Variables
# Vertex AI settings (required)
export GOOGLE_CLOUD_PROJECT="your-project-id"
export GOOGLE_CLOUD_LOCATION="us-central1" # Optional, defaults to us-central1
# Authentication method (choose one)
# Method 1: Use ADC (development environment)
gcloud auth application-default login
# Method 2: Use Service Account (production environment)
export GOOGLE_APPLICATION_CREDENTIALS="/path/to/service-account-key.json"
# Enable Vertex AI API
gcloud services enable aiplatform.googleapis.com
No Longer Needed Environment Variables
Due to the switch to Grounding, the following environment variables are no longer needed:
# ❌ Not needed anymore
# SEARCH_API_KEY=...
# SEARCH_ENGINE_ID=...
This simplifies the configuration and also saves on the Google Custom Search API fees!
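With only two required variables left, a fail-fast startup check is easy to add. This is my own sketch, not code from the repo; it simply validates the environment before the client is created.

```python
import os


def check_vertex_env() -> str:
    """Fail fast when the required Vertex AI variables are missing."""
    project = os.getenv('GOOGLE_CLOUD_PROJECT')
    if not project:
        raise RuntimeError(
            "GOOGLE_CLOUD_PROJECT must be set for Vertex AI Grounding"
        )
    # Location is optional and falls back to the documented default
    location = os.getenv('GOOGLE_CLOUD_LOCATION', 'us-central1')
    return f"Vertex AI project={project} location={location}"
```

Calling this once at module import time surfaces misconfiguration immediately, instead of as an opaque error on the first user message.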
Code Cleanup
Remove Old searchtool Code
Since Grounding is already in use, I performed code cleanup:
- main.py - Remove searchtool import
# ❌ Removed
# from loader.searchtool import search_from_text
# search_api_key = os.getenv('SEARCH_API_KEY')
# search_engine_id = os.getenv('SEARCH_ENGINE_ID')
# ✅ Added
logger.info('Text search using Vertex AI Grounding with Google Search')
- loader/searchtool.py - Marked as DEPRECATED
"""
⚠️ DEPRECATED: This module is no longer used in the main application.
The text search functionality has been replaced by Vertex AI Grounding
with Google Search, which provides better quality results and native
conversation memory.
This file is kept for reference or as a fallback option.
"""
- .env.example and README.md - Remove Custom Search environment variable description
Cleanup Results
| Item | Before Cleanup | After Cleanup |
|---|---|---|
| Required Environment Variables | 4 | 2 |
| API Calls | 3 times | 1 time |
| Code Complexity | High | Low |
| Maintenance Cost | High | Low |
Supported Model List
Currently supported Gemini models for Google Search Grounding:
- ✅ Gemini 3.0 Pro (Preview) (Most powerful)
- ✅ Gemini 2.5 Pro
- ✅ Gemini 2.5 Flash
- ✅ Gemini 2.0 Flash (Recommended)
- ✅ Gemini 2.5 Flash with Live API
- ❌ Gemini 2.0 Flash-Lite (Does not support Grounding)
Performance Improvement
Speed Comparison
| Metric | Old Version | New Version | Improvement |
|---|---|---|---|
| Number of API Calls | 3 times | 1 time | ⬇️ 66% |
| Response Time | ~6-8 seconds | ~2-3 seconds | ⬇️ 60% |
| Search Quality | ⭐⭐⭐ | ⭐⭐⭐⭐⭐ | ⬆️ Significantly improved |
Cost Analysis
Old Version Cost (per question and answer):
1. extract_keywords_with_gemini() → Gemini API
2. Google Custom Search → $0.005
3. summarize_text() → Gemini API
─────────
Total: Gemini + $0.005
New Version Cost (per question and answer):
1. Grounding with Google Search → Vertex AI
─────────
Total: Only Vertex AI
✅ Save Custom Search API fees ✅ Faster response speed ✅ Higher search quality
Current Caveats
1. Must Use Vertex AI
The Google Search Grounding feature does not support the general Gemini Developer API and must be accessed through Vertex AI.
2. Authentication Settings
- Development environment: use gcloud auth application-default login
- Production environment: use a Service Account and set GOOGLE_APPLICATION_CREDENTIALS
3. Supported Models
Ensure that you are using a model that supports Grounding (e.g., gemini-2.0-flash or above), and avoid using the -lite version.
4. Client Lifecycle
Be sure to create a shared client instance in __init__() to avoid the "client closed" error.
5. Prompt Optimization
In the prompt, clearly indicate:
- Use Traditional Chinese
- Do not use markdown format (if plain text is needed)
- Provide reliable sources
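The rules above can be pulled into a single prompt-building helper so every handler applies them consistently. This is just a sketch reusing the wording of the prompt shown earlier; the function name build_prompt is my own.

```python
def build_prompt(query: str) -> str:
    """Wrap a user question in the fixed instruction preamble."""
    return (
        "Please answer the following question in Traditional Chinese "
        "using Taiwanese terminology.\n"
        "If you need the latest information, please search the web and "
        "provide accurate answers.\n"
        "Please provide detailed and useful answers and ensure that the "
        "source of information is reliable.\n"
        "Please do not use markdown format (do not use symbols such as "
        "**, ##, -, etc.). Use plain text to answer.\n\n"
        f"Question: {query}"
    )
```

Keeping the preamble in one place also makes later prompt tweaks (like the markdown rule) a one-line change.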
Development Experience
1. Grounding is a Game Changer
From the traditional "keyword extraction → API search → result summary" process to using Grounding's "one API call to complete everything," this transformation brings not only technical simplification but also a qualitative change in user experience:
Technical Aspects:
- ✅ Code reduction of 70% (from 3 functions to 1)
- ✅ API call reduction of 66% (from 3 times to 1 time)
- ✅ Response time reduction of 60% (from 6-8 seconds to 2-3 seconds)
User Experience:
- ✅ Support continuous conversations (finally able to understand what "it" refers to!)
- ✅ Automatic source citation (increase credibility)
- ✅ More in-depth information (full webpage vs. short snippet)
2. Client Lifecycle Management is Important
The "client closed" error I encountered initially taught me that: When using the google-genai SDK, the client should be a long-lived object, not a new one created every time.
# ❌ Wrong: the client will be garbage collected
def create_session():
    client = genai.Client(...)
    chat = client.chats.create(...)
    return chat  # client is closed, chat can no longer be used

# ✅ Correct: share one client instance
class Manager:
    def __init__(self):
        self.client = genai.Client(...)  # Create only once

    def create_session(self):
        return self.client.chats.create(...)  # Reuse it
This lesson applies to all SDKs that need to manage long connections.
3. RAG Doesn't Necessarily Need to Be Implemented by Yourself
In the past, we needed to implement RAG (Retrieval-Augmented Generation) ourselves:
- Use embedding to create a vector database
- Implement similarity search
- Inject the search results into the prompt
- Manage the context window
But Google Search Grounding has already done all of this for us! It:
- ✅ Automatically determines when to search
- ✅ Uses Google's search engine (much better than what we can do ourselves)
- ✅ Reads the full webpage and extracts important information
- ✅ Automatically cites sources
Conclusion: If your RAG requirement is "search for web information," just use Grounding, don't reinvent the wheel.
4. Session Management is Simpler Than I Thought
When implementing conversation memory, I originally thought I would need:
- Redis persistence
- Complex context management
- Manually maintain conversation history
But in reality, the Gemini Chat API natively supports multi-turn conversations! All you need is:
chat = client.chats.create(...)
chat.send_message("Question 1") # Round 1
chat.send_message("Question 2") # Round 2 (automatically remembers Round 1)
All I need to do is:
- Store the chat object in memory
- Regularly clear expired sessions
- Provide the /clear instruction
Simple, efficient, and reliable!
5. The Importance of Prompt Optimization
The initial responses contained a lot of markdown formatting (**bold**, ## title), which was not aesthetically pleasing when displayed on LINE. Just add a line to the prompt:
prompt = f"""...
Please do not use markdown format (do not use symbols such as **, ##, -, etc.). Use plain text to answer.
Question: {query}"""
And the problem was solved! This made me realize: Good prompt design is as important as good code.
6. Learning from Failure
During this development process, I experienced:
- ❌ Using Custom Search → Found it too slow, too shallow
- ✅ Switching to Grounding → But encountered the client closed error
- ✅ Fixing the client lifecycle → Found the markdown format problem
- ✅ Optimizing the prompt → Perfect!
Every problem is an opportunity to learn. If I had succeeded from the start, I wouldn't have learned so much about SDK design, lifecycle management, and prompt engineering.
Summary
If you are developing an AI application that requires search functionality:
- ✅ Prioritize Grounding - Much simpler than implementing RAG yourself
- ✅ Pay attention to Client Lifecycle - Avoid unnecessary repeated creation
- ✅ Make good use of Chat Session - Native conversation memory is very powerful
- ✅ Invest in Prompt Optimization - Small changes bring big improvements
Google Search Grounding is definitely worth a try!
Testing Steps
1. Start the Application
# Confirm that the environment variables have been set
export GOOGLE_CLOUD_PROJECT=your-project-id
# Restart the application
uvicorn main:app --reload
2. Test Basic Functions
Test in LINE:
Send: What is Python?
Expected: ✅ Receive a detailed answer + source
Send: What are its advantages?
Expected: ✅ See the "💬 [In conversation]" mark, Bot knows "it" = Python
Send: /status
Expected: ✅ Display conversation status
Send: /clear
Expected: ✅ Display "Conversation has been reset"
3. Check the Logs
Should see:
INFO:main:Text search using Vertex AI Grounding with Google Search
INFO:loader.chat_session:Creating new session for user ...
INFO:loader.chat_session:Sending message to Grounding API ...
Should not see:
ERROR:loader.chat_session:Grounding search failed: Cannot send a request, as the client has been closed.
Related Documents
Detailed technical documentation in the project:
- TEXT_SEARCH_IMPROVEMENT.md - Complete solution analysis and comparison
- GROUNDING_IMPLEMENTATION.md - Implementation guide and acceptance checklist
- CLIENT_CLOSED_FIX.md - Client lifecycle error fix
- SEARCHTOOL_CLEANUP.md - Code cleanup summary

