Background
When developing a LINE Bot, I wanted to improve its plain-text search feature: a user should be able to type any question and have the AI automatically search the internet, organize an answer, and support follow-up questions in the same conversation. The traditional approach chained multiple APIs (Gemini to extract keywords → Google Custom Search → Gemini to summarize), which was not only slow (3 API calls) but also had no conversation memory.
In 2024, however, Google launched Grounding with Google Search, its official RAG (Retrieval-Augmented Generation) solution: the Gemini model can search the internet on its own, cite its sources, and natively supports chat sessions. The feature is provided through Vertex AI, so AI responses are grounded in real internet information rather than the model's imagination.
Screen Display
(Results from the old Google Custom Search implementation)
You can see that the answers are now grounded in Google Search results.
Main Repo https://github.com/kkdai/linebot-helper-python
Problems Encountered During Development
Problem 1: Bottlenecks in the Old Implementation
When implementing loader/searchtool.py, I used the traditional search process:
```python
# ❌ Old method - 3 API calls
async def handle_text_message(event, user_id):
    msg = event.message.text

    # 1st call: extract keywords
    keywords = extract_keywords_with_gemini(msg, api_key)

    # 2nd call: Google Custom Search
    results = search_with_google_custom_search(keywords, search_api_key, cx)

    # 3rd call: summarize results
    summary = summarize_text(result_text, 300)

    # Return results...
```
This method has several obvious problems:
❌ No conversation memory - each message starts a new conversation, so follow-up questions fail:

```
User: "What is Python?"
Bot: [Search results + summary]
User: "What are its advantages?"  # ❌ Bot doesn't know "it" refers to Python
```

❌ Shallow search results - only snippets are used; the bot never reads the full webpage content

❌ Slow and costly - 3 API calls (~6-8 seconds) plus Google Custom Search fees ($0.005 per query)
Problem 2: Client Closed Error
When I switched to Vertex AI Grounding, I encountered this error:
```
ERROR:loader.chat_session:Grounding search failed: Cannot send a request, as the client has been closed.
```
The reason was that I created a local client variable in the function:
```python
# ❌ Incorrect method - the client will be garbage collected
def get_or_create_session(self, user_id):
    client = self._create_client()  # Local variable
    chat = client.chats.create(...)
    return chat  # The client is closed after the function ends!
```
After the function returns, `client` is garbage collected and closed, leaving the chat session created from it unusable.
Correct Solution
1. Using Vertex AI Grounding with Google Search
Google Search Grounding is the official RAG solution provided by Vertex AI, compared to the old Custom Search:
| Feature | Old Version (Custom Search) | New Version (Grounding) |
|---|---|---|
| Number of API Calls | 3 times | 1 time |
| Response Speed | ~6-8 seconds | ~2-3 seconds |
| Conversation Memory | ❌ No | ✅ Native support |
| Search Quality | ⭐⭐⭐ (snippet) | ⭐⭐⭐⭐⭐ (complete webpage) |
| Source Citation | Only links | Complete citation |
| Cost | Gemini + Custom Search | Only Vertex AI |
2. Create a Chat Session Manager
First, I created loader/chat_session.py to manage chat sessions:
```python
import os
from datetime import datetime, timedelta
from typing import Dict, List, Tuple

from google import genai
from google.genai import types


class ChatSessionManager:
    def __init__(self, session_timeout_minutes: int = 30):
        self.sessions: Dict[str, dict] = {}
        self.session_timeout = timedelta(minutes=session_timeout_minutes)
        # ✅ Key: create one shared client instance (avoids the "client closed" error)
        self.client = self._create_client()

    def _create_client(self) -> genai.Client:
        """Create a Vertex AI client"""
        return genai.Client(
            vertexai=True,  # Enable Vertex AI
            project=os.getenv('GOOGLE_CLOUD_PROJECT'),
            location=os.getenv('GOOGLE_CLOUD_LOCATION', 'us-central1'),
            http_options=types.HttpOptions(api_version="v1")
        )

    def get_or_create_session(self, user_id: str) -> Tuple[object, List[dict]]:
        """Get or create the user's chat session"""
        now = datetime.now()

        # Reuse an existing, non-expired session
        if user_id in self.sessions:
            session_data = self.sessions[user_id]
            if not self._is_session_expired(session_data):
                session_data['last_active'] = now
                return session_data['chat'], session_data['history']

        # Create a new session with Google Search Grounding
        config = types.GenerateContentConfig(
            temperature=0.7,
            max_output_tokens=2048,
            # ✅ Enable Google Search
            tools=[types.Tool(google_search=types.GoogleSearch())],
        )

        # Use the shared self.client (it will not be closed)
        chat = self.client.chats.create(
            model="gemini-2.0-flash",
            config=config
        )

        self.sessions[user_id] = {
            'chat': chat,
            'last_active': now,
            'history': [],
            'created_at': now
        }
        return chat, []
```
Key points of the fix:
- Shared client - `self.client` is created in `__init__()`, so its lifecycle matches the `ChatSessionManager`
- Automatic expiration - a session automatically expires after 30 minutes of inactivity
- Conversation isolation - each user's session is completely independent
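The post never shows `_is_session_expired()`, but given the session fields above it can be sketched as a simple timestamp check. This is a hypothetical helper written to match the `sessions` dict layout shown earlier; the real `loader/chat_session.py` may differ:

```python
from datetime import datetime, timedelta

def is_session_expired(session_data: dict,
                       session_timeout: timedelta = timedelta(minutes=30)) -> bool:
    """A session counts as expired once its last activity is older than the timeout."""
    return datetime.now() - session_data['last_active'] > session_timeout

# Usage
fresh = {'last_active': datetime.now()}
stale = {'last_active': datetime.now() - timedelta(minutes=45)}
print(is_session_expired(fresh))  # False
print(is_session_expired(stale))  # True
```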
3. Implement Search and Answer Functions
Then implement the core function to search and answer using Grounding:
```python
import logging

logger = logging.getLogger(__name__)


async def search_and_answer_with_grounding(
    query: str,
    user_id: str,
    session_manager: ChatSessionManager
) -> dict:
    """Search and answer questions using Vertex AI Grounding"""
    try:
        # Get or create the chat session
        chat, history = session_manager.get_or_create_session(user_id)

        # Build the prompt (Traditional Chinese, no markdown)
        prompt = f"""Please answer the following question in Traditional Chinese using Taiwanese terminology.
If you need the latest information, please search the internet and provide accurate answers.
Please provide detailed and useful answers, and make sure the information sources are reliable.
Please do not use markdown format (no symbols such as **, ##, -, etc.). Answer in plain text.

Question: {query}"""

        # Send the message (Gemini decides on its own whether to search)
        response = chat.send_message(prompt)

        # Record to history
        session_manager.add_to_history(user_id, "user", query)
        session_manager.add_to_history(user_id, "assistant", response.text)

        # Extract source citations from the grounding metadata
        sources = []
        if hasattr(response, 'candidates') and response.candidates:
            candidate = response.candidates[0]
            if hasattr(candidate, 'grounding_metadata'):
                metadata = candidate.grounding_metadata
                if hasattr(metadata, 'grounding_chunks'):
                    for chunk in metadata.grounding_chunks:
                        if hasattr(chunk, 'web'):
                            sources.append({
                                'title': chunk.web.title,
                                'uri': chunk.web.uri
                            })

        return {
            'answer': response.text,
            'sources': sources,
            'has_history': len(history) > 0
        }
    except Exception as e:
        logger.error(f"Grounding search failed: {e}")
        raise
```
Key features:
- ✅ Gemini automatically determines when to search
- ✅ Read the complete webpage content (not just snippets)
- ✅ Automatically extract source citations
- ✅ Support continuous conversations (remember context)
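`add_to_history()` is called above but its body is not shown in the post. A minimal sketch, assuming it simply appends a turn to the session's `history` list and trims it (hypothetical; the real helper may cap the history differently):

```python
def add_to_history(sessions: dict, user_id: str, role: str, content: str,
                   max_entries: int = 20) -> None:
    """Append one conversation turn and keep only the most recent entries."""
    if user_id not in sessions:
        return
    history = sessions[user_id]['history']
    history.append({'role': role, 'content': content})
    del history[:-max_entries]  # drop everything but the last max_entries turns

# Usage
sessions = {'U123': {'history': []}}
add_to_history(sessions, 'U123', 'user', 'What is Python?')
add_to_history(sessions, 'U123', 'assistant', 'Python is a programming language.')
print(len(sessions['U123']['history']))  # 2
```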
4. Integrate into main.py
Integrate the Grounding function in main.py:
```python
from loader.chat_session import (
    ChatSessionManager,
    search_and_answer_with_grounding,
    format_grounding_response,
    get_session_status_message
)

# Initialize the session manager
chat_session_manager = ChatSessionManager(session_timeout_minutes=30)


async def handle_text_message(event: MessageEvent, user_id: str):
    """Handle plain text messages using Grounding"""
    msg = event.message.text.strip()

    # Special commands
    if msg.lower() in ['/clear', '/清除']:
        chat_session_manager.clear_session(user_id)
        reply_msg = TextSendMessage(text="✅ Conversation has been reset")
        await line_bot_api.reply_message(event.reply_token, [reply_msg])
        return

    if msg.lower() in ['/status', '/狀態']:
        status_text = get_session_status_message(chat_session_manager, user_id)
        reply_msg = TextSendMessage(text=status_text)
        await line_bot_api.reply_message(event.reply_token, [reply_msg])
        return

    # Search and answer with Grounding
    try:
        result = await search_and_answer_with_grounding(
            query=msg,
            user_id=user_id,
            session_manager=chat_session_manager
        )
        response_text = format_grounding_response(result, include_sources=True)
        reply_msg = TextSendMessage(text=response_text)
        await line_bot_api.reply_message(event.reply_token, [reply_msg])
    except Exception as e:
        logger.error(f"Error in Grounding search: {e}", exc_info=True)
        error_text = "❌ Sorry, an error occurred while processing your question. Please try again later."
        reply_msg = TextSendMessage(text=error_text)
        await line_bot_api.reply_message(event.reply_token, [reply_msg])
```
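`format_grounding_response()` is imported above but its body is not shown. A hedged sketch, assuming the result dict returned by `search_and_answer_with_grounding()` (the real formatter in `loader/chat_session.py` may lay the reply out differently):

```python
def format_grounding_response(result: dict, include_sources: bool = True) -> str:
    """Format a Grounding result as a plain-text LINE reply (hypothetical sketch)."""
    lines = []
    if result.get('has_history'):
        lines.append("💬 [In conversation]")   # marker shown in the examples below
    lines.append(result['answer'])
    if include_sources and result.get('sources'):
        lines.append("")
        lines.append("📚 Reference source:")
        for i, src in enumerate(result['sources'], start=1):
            lines.append(f"{i}. {src['title']}")
            lines.append(f"   {src['uri']}")
    return "\n".join(lines)

# Usage
text = format_grounding_response({
    'answer': 'Python is a programming language.',
    'sources': [{'title': 'Python official website', 'uri': 'https://www.python.org/'}],
    'has_history': False,
})
```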
Practical Application Examples
The implemented function is very powerful and can perform intelligent conversations:
Example 1: Basic Question and Answer
```
User: What is Python?
Bot: Python is a high-level, interpreted programming language created by Guido van Rossum in 1991...

📚 Reference source:
1. Python official website
   https://www.python.org/
```
Example 2: Continuous Conversation (Conversation Memory)
```
User: What is Python?
Bot: [Answer...]

User: What are its advantages?   ✅ Bot knows "it" = Python
Bot: 💬 [In conversation]
The main advantages of Python include:
1. Concise and readable syntax
2. Rich standard library
...
```
Example 3: Latest Information Search
```
User: Latest earthquake news in Japan
Bot: According to the latest information, in December 2025 Japan...
[Gemini automatically searches the internet and organizes the latest information]

📚 Reference source:
1. Central Weather Bureau
2. NHK News
```
Usage Scenarios
These application scenarios are particularly suitable for:
- 💬 Intelligent Customer Service - Automatically search for the latest product information
- 📰 News Assistant - Track the latest current affairs
- 🎓 Learning Assistant - Answer questions and provide reliable sources
- 🔍 Research Assistant - Quickly search and organize information
Environment Setup
Required Environment Variables
```bash
# Vertex AI settings (required)
export GOOGLE_CLOUD_PROJECT="your-project-id"
export GOOGLE_CLOUD_LOCATION="us-central1"  # Optional, defaults to us-central1

# Authentication (choose one)
# Method 1: ADC (development environment)
gcloud auth application-default login

# Method 2: Service Account (production environment)
export GOOGLE_APPLICATION_CREDENTIALS="/path/to/service-account-key.json"

# Enable the Vertex AI API
gcloud services enable aiplatform.googleapis.com
```
Environment Variables No Longer Needed
Due to the switch to Grounding, the following environment variables are no longer needed:
```bash
# ❌ No longer needed
# SEARCH_API_KEY=...
# SEARCH_ENGINE_ID=...
```
This simplifies the configuration and also saves the cost of the Google Custom Search API!
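With only the two Vertex AI variables left, a small startup check can fail fast when the required one is missing. This helper is my own hypothetical addition, not code from the repo:

```python
import os

def check_vertex_env() -> dict:
    """Fail fast if the required Vertex AI environment variables are missing."""
    project = os.getenv('GOOGLE_CLOUD_PROJECT')
    if not project:
        raise RuntimeError("GOOGLE_CLOUD_PROJECT must be set for Vertex AI Grounding")
    return {
        'project': project,
        # GOOGLE_CLOUD_LOCATION is optional and defaults to us-central1
        'location': os.getenv('GOOGLE_CLOUD_LOCATION', 'us-central1'),
    }
```

Calling `check_vertex_env()` once at application startup turns a missing variable into an immediate, readable error instead of a failed API call later.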
Code Cleanup
Remove Old searchtool Code
Since Grounding is already in use, I performed code cleanup:
- main.py - Remove searchtool import
```python
# ❌ Removed
# from loader.searchtool import search_from_text
# search_api_key = os.getenv('SEARCH_API_KEY')
# search_engine_id = os.getenv('SEARCH_ENGINE_ID')

# ✅ Added
logger.info('Text search using Vertex AI Grounding with Google Search')
```
- loader/searchtool.py - Marked as DEPRECATED
"""
⚠️ DEPRECATED: This module is no longer used in the main application.
The text search functionality has been replaced by Vertex AI Grounding
with Google Search, which provides better quality results and native
conversation memory.
This file is kept for reference or as a fallback option.
"""
- .env.example and README.md - Remove Custom Search environment variable description
Cleanup Results
| Item | Before Cleanup | After Cleanup |
|---|---|---|
| Required Environment Variables | 4 | 2 |
| API Calls | 3 times | 1 time |
| Code Complexity | High | Low |
| Maintenance Cost | High | Low |
Supported Model List
Currently supported Gemini models for Google Search Grounding:
- ✅ Gemini 3.0 Pro (Preview) (Powerful)
- ✅ Gemini 2.5 Pro
- ✅ Gemini 2.5 Flash
- ✅ Gemini 2.0 Flash (Recommended)
- ✅ Gemini 2.5 Flash with Live API
- ❌ Gemini 2.0 Flash-Lite (Does not support Grounding)
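Based on the list above, a small guard can reject models that are known not to support Grounding before a session is created. This is a hypothetical helper, and the supported-model list changes over time, so treat the rule as an assumption:

```python
def supports_grounding(model_name: str) -> bool:
    """Rough check following the list above: -lite variants lack Grounding."""
    name = model_name.lower()
    return name.startswith("gemini-") and "lite" not in name

print(supports_grounding("gemini-2.0-flash"))       # True
print(supports_grounding("gemini-2.0-flash-lite"))  # False
```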
Performance Improvement
Speed Comparison
| Metric | Old Version | New Version | Improvement |
|---|---|---|---|
| Number of API Calls | 3 times | 1 time | ⬇️ 66% |
| Response Time | ~6-8 seconds | ~2-3 seconds | ⬇️ 60% |
| Search Quality | ⭐⭐⭐ | ⭐⭐⭐⭐⭐ | ⬆️ Significantly improved |
Cost Analysis
```
Old version cost (per question):
1. extract_keywords_with_gemini()  → Gemini API
2. Google Custom Search            → $0.005
3. summarize_text()                → Gemini API
─────────
Total: Gemini + $0.005

New version cost (per question):
1. Grounding with Google Search → Vertex AI
─────────
Total: Vertex AI only
```
✅ Saves the Custom Search API cost ✅ Faster responses ✅ Higher search quality
Things to Note Currently
1. Must Use Vertex AI
The Google Search Grounding feature does not support the general Gemini Developer API and must be accessed through Vertex AI.
2. Authentication Settings
- Development environment: use `gcloud auth application-default login`
- Production environment: use a Service Account and set `GOOGLE_APPLICATION_CREDENTIALS`
3. Supported Models
Make sure to use a model that supports Grounding (such as gemini-2.0-flash or above), and avoid using the -lite version.
4. Client Lifecycle
Be sure to create a shared client instance in `__init__()` to avoid the "client closed" error.
5. Prompt Optimization
In the prompt, clearly indicate:
- Use Traditional Chinese
- Do not use markdown format (if plain text is needed)
- Provide reliable sources
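Even with these prompt instructions, a model can occasionally slip markdown into a reply. A defensive post-processing pass (my own hypothetical addition, not part of the repo) can strip the common symbols before the text is sent to LINE:

```python
import re

def strip_markdown(text: str) -> str:
    """Best-effort removal of the markdown symbols LINE renders poorly."""
    text = re.sub(r'\*\*(.+?)\*\*', r'\1', text)         # **bold**  -> bold
    text = re.sub(r'^#{1,6}\s*', '', text, flags=re.M)   # ## heading -> heading
    text = re.sub(r'^\s*[-*]\s+', '', text, flags=re.M)  # - bullet   -> bullet text
    return text

print(strip_markdown("## Title\n**bold** text\n- item"))
```

Prompt instructions handle the common case; a filter like this only catches the occasional slip-through.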
Development Experience
1. Grounding is a Game Changer
Moving from the traditional "extract keywords → call a search API → summarize results" pipeline to Grounding's single API call is not just a technical simplification; it qualitatively changes the user experience:
Technical Level:
- ✅ Code volume reduced by 70% (from 3 functions to 1)
- ✅ API calls reduced by 66% (from 3 times to 1 time)
- ✅ Response time shortened by 60% (from 6-8 seconds to 2-3 seconds)
User Experience:
- ✅ Support continuous conversations (finally able to understand what "it" refers to!)
- ✅ Automatically cite sources (increase credibility)
- ✅ More in-depth information (complete webpage vs. short snippet)
2. Client Lifecycle Management is Important
The "client closed" error I encountered initially taught me: When using the google-genai SDK, the client should be a long-lived object, not a new one created every time.
```python
# ❌ Wrong: the client will be garbage collected
def create_session():
    client = genai.Client(...)
    chat = client.chats.create(...)
    return chat  # client is closed, chat cannot be used


# ✅ Correct: share one client instance
class Manager:
    def __init__(self):
        self.client = genai.Client(...)  # Create only once

    def create_session(self):
        return self.client.chats.create(...)  # Reuse it
```
This lesson applies to all SDKs that need to manage long connections.
3. RAG Doesn't Have to Be Implemented by Yourself
In the past, we needed to implement RAG (Retrieval-Augmented Generation) ourselves:
- Use embedding to create a vector database
- Implement similarity search
- Inject the search results into the prompt
- Manage the context window
But Google Search Grounding has already done all of this for us! It:
- ✅ Automatically determines when to search
- ✅ Uses Google's search engine (much better than doing it ourselves)
- ✅ Reads the complete webpage and extracts important information
- ✅ Automatically cites sources
Conclusion: If your RAG requirement is "search for internet information", just use Grounding, don't reinvent the wheel.
4. Session Management is Simpler Than You Think
When implementing conversation memory, I originally thought I needed:
- Redis persistence
- Complex context management
- Manually maintain conversation history
But in reality, the Gemini Chat API natively supports multi-turn conversations! Just need:
```python
chat = client.chats.create(...)
chat.send_message("Question 1")  # Round 1
chat.send_message("Question 2")  # Round 2 (automatically remembers Round 1)
```
I only need to do:
- Store the chat object in memory
- Regularly clear expired sessions
- Provide the /clear command
Simple, efficient, and reliable!
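The "regularly clear expired sessions" step can be a simple sweep over the in-memory dict. A sketch under the same session layout as above (the real repo may schedule and structure this differently):

```python
from datetime import datetime, timedelta

def clear_expired_sessions(sessions: dict,
                           timeout: timedelta = timedelta(minutes=30)) -> int:
    """Drop every session idle longer than the timeout; return how many were removed."""
    now = datetime.now()
    expired = [uid for uid, s in sessions.items()
               if now - s['last_active'] > timeout]
    for uid in expired:
        del sessions[uid]
    return len(expired)
```

Running this on a periodic timer (or lazily at the start of each incoming message) keeps memory usage bounded without any external store.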
5. The Importance of Prompt Optimization
The initial responses contained a lot of markdown format (**bold**, ## title), which is not aesthetically pleasing when displayed on LINE. Just add a line in the prompt:
```python
prompt = f"""...
Please do not use markdown format (no symbols such as **, ##, -, etc.). Answer in plain text.

Question: {query}"""
```
The problem is solved! This made me realize: Good prompt design is as important as good code.
6. Learn from Failure
During this development process, I experienced:
- ❌ Using Custom Search → Found it too slow and shallow
- ✅ Switching to Grounding → But encountered the client closed error
- ✅ Fixing the client lifecycle → Found the markdown format problem
- ✅ Optimizing the prompt → Perfect!
Each problem is an opportunity to learn. If I had succeeded from the start, I wouldn't have learned so much about SDK design, lifecycle management, and prompt engineering.
Summary
If you are developing an AI application that requires search functionality:
- ✅ Prioritize Grounding - Much simpler than implementing RAG yourself
- ✅ Pay attention to Client Lifecycle - Avoid unnecessary repeated creation
- ✅ Make good use of Chat Session - Native conversation memory is very powerful
- ✅ Invest in Prompt Optimization - Small changes bring big improvements
Google Search Grounding is definitely worth a try!
Test Steps
1. Start the Application
```bash
# Confirm that the environment variables have been set
export GOOGLE_CLOUD_PROJECT=your-project-id

# Restart the application
uvicorn main:app --reload
```
2. Test Basic Functions
Test in LINE:
```
Send: What is Python?
Expected: ✅ Detailed answer + sources

Send: What are its advantages?
Expected: ✅ "💬 [In conversation]" marker appears; the Bot knows "it" = Python

Send: /status
Expected: ✅ Conversation status is displayed

Send: /clear
Expected: ✅ "Conversation has been reset" is displayed
```
3. Check the Logs
You should see:
```
INFO:main:Text search using Vertex AI Grounding with Google Search
INFO:loader.chat_session:Creating new session for user ...
INFO:loader.chat_session:Sending message to Grounding API ...
```
You should not see:
```
ERROR:loader.chat_session:Grounding search failed: Cannot send a request, as the client has been closed.
```
Related Documents
Detailed technical documentation in the project:
- TEXT_SEARCH_IMPROVEMENT.md - Complete solution analysis and comparison
- GROUNDING_IMPLEMENTATION.md - Implementation guide and acceptance checklist
- CLIENT_CLOSED_FIX.md - Client lifecycle error repair
- SEARCHTOOL_CLEANUP.md - Code cleanup summary

