Background
While developing a LINE Bot, I wanted to improve its plain text search feature: a user types any question, the AI automatically searches the web, organizes an answer, and supports follow-up questions in the same conversation. The traditional approach chained multiple APIs (Gemini to extract keywords → Google Custom Search → Gemini to summarize), which was not only slow (3 API calls) but also had no conversation memory.
In 2024, however, Google launched the Grounding with Google Search feature. This is the official RAG (Retrieval-Augmented Generation) solution: the Gemini model automatically searches the web and cites its sources, and it natively supports chat sessions! The feature is provided through Vertex AI, so AI responses are no longer based on guesswork but on real web information.
Screen Display
(Results using the old Google Custom Search)
You can see that the results are grounded in Google Search.
Main Repo https://github.com/kkdai/linebot-helper-python
Problems Encountered During Development
Problem 1: Bottlenecks of the Old Implementation
When implementing loader/searchtool.py, I used the traditional search process:
# ❌ Old method - 3 API calls
async def handle_text_message(event, user_id):
    msg = event.message.text

    # 1st call: extract keywords
    keywords = extract_keywords_with_gemini(msg, api_key)

    # 2nd call: Google Custom Search
    results = search_with_google_custom_search(keywords, search_api_key, cx)

    # 3rd call: summarize the results
    summary = summarize_text(results, 300)

    # Return results...
This method has several obvious problems:
❌ No conversation memory - Each time is a new conversation, unable to ask questions continuously
User: "What is Python?"
Bot: [Search results + Summary]
User: "What are its advantages?" # ❌ Bot doesn't know "it" refers to Python
❌ Shallow search results - Only using snippets, unable to deeply read the content of the webpage
❌ Slow and costly - 3 API calls (~6-8 seconds) + Google Custom Search fees ($0.005/time)
Problem 2: Client Closed Error
When I switched to Vertex AI Grounding, I encountered this error:
ERROR:loader.chat_session:Grounding search failed: Cannot send a request, as the client has been closed.
The reason was that I created a local client variable in the function:
# ❌ Incorrect method - client will be garbage collected
def get_or_create_session(self, user_id):
    client = self._create_client()  # Local variable
    chat = client.chats.create(...)
    return chat  # The client is closed after the function ends!
When the function ends, client is garbage collected and closed, causing the chat session created based on it to be unusable.
Correct Solutions
1. Using Vertex AI Grounding with Google Search
Google Search Grounding is the official RAG solution provided by Vertex AI. Comparison with the old Custom Search:
| Feature | Old Version (Custom Search) | New Version (Grounding) |
|---|---|---|
| Number of API Calls | 3 times | 1 time |
| Response Speed | ~6-8 seconds | ~2-3 seconds |
| Conversation Memory | ❌ No | ✅ Native Support |
| Search Quality | ⭐⭐⭐ (snippet) | ⭐⭐⭐⭐⭐ (full webpage) |
| Source Citation | Only links | Full citation |
| Cost | Gemini + Custom Search | Only Vertex AI |
2. Create a Chat Session Manager
First, I created loader/chat_session.py to manage chat sessions:
import os
from datetime import datetime, timedelta
from typing import Dict, List, Tuple

from google import genai
from google.genai import types


class ChatSessionManager:
    def __init__(self, session_timeout_minutes: int = 30):
        self.sessions: Dict[str, dict] = {}
        self.session_timeout = timedelta(minutes=session_timeout_minutes)
        # ✅ Key: create a shared client instance (avoids the client closed error)
        self.client = self._create_client()

    def _create_client(self) -> genai.Client:
        """Create the Vertex AI client"""
        return genai.Client(
            vertexai=True,  # Enable Vertex AI
            project=os.getenv('GOOGLE_CLOUD_PROJECT'),
            location=os.getenv('GOOGLE_CLOUD_LOCATION', 'us-central1'),
            http_options=types.HttpOptions(api_version="v1")
        )

    def get_or_create_session(self, user_id: str) -> Tuple[object, List[dict]]:
        """Get or create the user's chat session"""
        now = datetime.now()

        # Reuse an existing, unexpired session
        if user_id in self.sessions:
            session_data = self.sessions[user_id]
            if not self._is_session_expired(session_data):
                session_data['last_active'] = now
                return session_data['chat'], session_data['history']

        # Create a new session with Google Search Grounding
        config = types.GenerateContentConfig(
            temperature=0.7,
            max_output_tokens=2048,
            # ✅ Enable Google Search
            tools=[types.Tool(google_search=types.GoogleSearch())],
        )

        # Use the shared self.client (it will not be closed)
        chat = self.client.chats.create(
            model="gemini-2.0-flash",
            config=config
        )

        self.sessions[user_id] = {
            'chat': chat,
            'last_active': now,
            'history': [],
            'created_at': now
        }
        return chat, []
Key points of the fix:
- Shared client - self.client is created in __init__(), so its lifecycle matches the ChatSessionManager
- Automatic expiration - a session expires after 30 minutes of inactivity
- Conversation isolation - each user's session is completely independent
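The expiry logic referenced above (`_is_session_expired` and the periodic cleanup) is not shown in the post. A minimal sketch might look like the following, assuming each session stores a `last_active` timestamp as in the manager code; the class name and `cleanup_expired_sessions` are my own, not necessarily what the repo uses.

```python
from datetime import datetime, timedelta


class SessionExpiryMixin:
    """Hypothetical sketch of the expiry helpers the manager relies on."""

    def __init__(self, session_timeout_minutes: int = 30):
        self.sessions: dict = {}
        self.session_timeout = timedelta(minutes=session_timeout_minutes)

    def _is_session_expired(self, session_data: dict) -> bool:
        # A session expires once the user has been idle longer than the timeout.
        return datetime.now() - session_data['last_active'] > self.session_timeout

    def cleanup_expired_sessions(self) -> int:
        # Drop every expired session; return how many were removed.
        expired = [uid for uid, data in self.sessions.items()
                   if self._is_session_expired(data)]
        for uid in expired:
            del self.sessions[uid]
        return len(expired)
```

Calling `cleanup_expired_sessions()` on a timer (or lazily on each request) keeps memory bounded without any external store.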
3. Implement Search and Answer Functions
Next, implement the core function to search and answer using Grounding:
import logging

logger = logging.getLogger(__name__)


async def search_and_answer_with_grounding(
    query: str,
    user_id: str,
    session_manager: ChatSessionManager
) -> dict:
    """Search and answer questions using Vertex AI Grounding"""
    try:
        # Get or create the chat session
        chat, history = session_manager.get_or_create_session(user_id)

        # Build the prompt (Traditional Chinese, no markdown)
        prompt = f"""Please answer the following question in Traditional Chinese using Taiwanese terminology.
If you need the latest information, please search the web and provide accurate answers.
Please provide detailed and useful answers and ensure that the source of information is reliable.
Please do not use markdown format (do not use symbols such as **, ##, -, etc.). Use plain text to answer.

Question: {query}"""

        # Send the message (Gemini decides on its own whether to search)
        response = chat.send_message(prompt)

        # Record the turn in the history
        session_manager.add_to_history(user_id, "user", query)
        session_manager.add_to_history(user_id, "assistant", response.text)

        # Extract source citations from the grounding metadata
        sources = []
        if hasattr(response, 'candidates') and response.candidates:
            candidate = response.candidates[0]
            if hasattr(candidate, 'grounding_metadata'):
                metadata = candidate.grounding_metadata
                if hasattr(metadata, 'grounding_chunks') and metadata.grounding_chunks:
                    for chunk in metadata.grounding_chunks:
                        if hasattr(chunk, 'web'):
                            sources.append({
                                'title': chunk.web.title,
                                'uri': chunk.web.uri
                            })

        return {
            'answer': response.text,
            'sources': sources,
            'has_history': len(history) > 0
        }
    except Exception as e:
        logger.error(f"Grounding search failed: {e}")
        raise
Key Features:
- ✅ Gemini automatically determines when to search
- ✅ Read the full webpage content (not just snippets)
- ✅ Automatically extract source citations
- ✅ Support continuous conversations (remember context)
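The `add_to_history` method called in the function above isn't shown in the post. A plausible minimal version, with an assumed cap on history length so memory doesn't grow unbounded, could look like this (the `MAX_HISTORY` value and the entry shape are my assumptions):

```python
from datetime import datetime

MAX_HISTORY = 20  # assumed cap; the real module may choose differently


def add_to_history(sessions: dict, user_id: str, role: str, content: str) -> None:
    """Append one conversation turn to the user's history, trimming the oldest."""
    if user_id not in sessions:
        return
    history = sessions[user_id]['history']
    history.append({
        'role': role,
        'content': content,
        'timestamp': datetime.now().isoformat(),
    })
    # Keep only the most recent MAX_HISTORY turns
    del history[:-MAX_HISTORY]
```

Note that the Gemini chat object keeps its own server-side context; this local history only exists so the bot can report status and detect whether a conversation is in progress.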
4. Integrate into main.py
Integrate the Grounding function in main.py:
from loader.chat_session import (
    ChatSessionManager,
    search_and_answer_with_grounding,
    format_grounding_response,
    get_session_status_message
)

# Initialize the session manager
chat_session_manager = ChatSessionManager(session_timeout_minutes=30)


async def handle_text_message(event: MessageEvent, user_id: str):
    """Handle plain text messages - using Grounding"""
    msg = event.message.text.strip()

    # Special commands
    if msg.lower() in ['/clear', '/清除']:
        chat_session_manager.clear_session(user_id)
        reply_msg = TextSendMessage(text="✅ Conversation has been reset")
        await line_bot_api.reply_message(event.reply_token, [reply_msg])
        return

    if msg.lower() in ['/status', '/狀態']:
        status_text = get_session_status_message(chat_session_manager, user_id)
        reply_msg = TextSendMessage(text=status_text)
        await line_bot_api.reply_message(event.reply_token, [reply_msg])
        return

    # Search and answer with Grounding
    try:
        result = await search_and_answer_with_grounding(
            query=msg,
            user_id=user_id,
            session_manager=chat_session_manager
        )
        response_text = format_grounding_response(result, include_sources=True)
        reply_msg = TextSendMessage(text=response_text)
        await line_bot_api.reply_message(event.reply_token, [reply_msg])
    except Exception as e:
        logger.error(f"Error in Grounding search: {e}", exc_info=True)
        error_text = "❌ Sorry, an error occurred while processing your question. Please try again later."
        reply_msg = TextSendMessage(text=error_text)
        await line_bot_api.reply_message(event.reply_token, [reply_msg])
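main.py imports `format_grounding_response`, but the post never shows it. Here is a minimal sketch of what such a formatter might look like; the body, parameter names, and layout are my assumptions based on the sample replies in this post, not the repo's actual code.

```python
def format_grounding_response(result: dict, include_sources: bool = True,
                              max_sources: int = 3) -> str:
    """Compose the LINE reply text: conversation marker, answer, source list."""
    parts = []

    # Mark ongoing conversations so the user knows context is being kept
    if result.get('has_history'):
        parts.append("💬 [In conversation]")

    parts.append(result['answer'])

    # Append a numbered list of grounding sources, capped at max_sources
    sources = result.get('sources', [])
    if include_sources and sources:
        lines = ["📚 Reference Source:"]
        for i, src in enumerate(sources[:max_sources], 1):
            lines.append(f"{i}. {src['title']}\n{src['uri']}")
        parts.append("\n".join(lines))

    return "\n\n".join(parts)
```

Capping the source list matters on LINE, where a single text message is limited in length and long URL lists hurt readability.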
Practical Application Examples
The finished feature is surprisingly capable and supports genuinely intelligent conversations:
Example 1: Basic Question and Answer
User: What is Python?
Bot: Python is a high-level, interpreted programming language created by Guido van Rossum in 1991...
📚 Reference Source:
1. Python Official Website
https://www.python.org/
Example 2: Continuous Conversation (Conversation Memory)
User: What is Python?
Bot: [Answer...]
User: What are its advantages? ✅ Bot knows "it" = Python
Bot: 💬 [In conversation]
The main advantages of Python include:
1. Concise and readable syntax
2. Rich standard library
...
Example 3: Latest Information Search
User: Latest earthquake news in Japan
Bot: According to the latest information, in December 2025 Japan...
[Gemini automatically searches the web and organizes the latest information]
📚 Reference Source:
1. Central Weather Bureau
2. NHK News
Use Cases
This kind of setup is especially well suited to:
- 💬 Intelligent Customer Service - Automatically search for the latest product information
- 📰 News Assistant - Track the latest current affairs
- 🎓 Learning Assistant - Answer questions and provide reliable sources
- 🔍 Research Assistant - Quickly search and organize information
Environment Setup
Required Environment Variables
# Vertex AI settings (required)
export GOOGLE_CLOUD_PROJECT="your-project-id"
export GOOGLE_CLOUD_LOCATION="us-central1" # Optional, defaults to us-central1
# Authentication method (choose one)
# Method 1: Use ADC (development environment)
gcloud auth application-default login
# Method 2: Use Service Account (production environment)
export GOOGLE_APPLICATION_CREDENTIALS="/path/to/service-account-key.json"
# Enable Vertex AI API
gcloud services enable aiplatform.googleapis.com
No Longer Needed Environment Variables
Due to the switch to Grounding, the following environment variables are no longer needed:
# ❌ Not needed anymore
# SEARCH_API_KEY=...
# SEARCH_ENGINE_ID=...
This simplifies the configuration and also saves on the Google Custom Search API fees!
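With only two required variables left, a fail-fast startup check is easy to add. This is my own sketch, not code from the repo; it simply validates the environment before the client is created.

```python
import os


def check_vertex_env() -> str:
    """Fail fast when the required Vertex AI variables are missing."""
    project = os.getenv('GOOGLE_CLOUD_PROJECT')
    if not project:
        raise RuntimeError(
            "GOOGLE_CLOUD_PROJECT must be set for Vertex AI Grounding"
        )
    # Location is optional and falls back to the documented default
    location = os.getenv('GOOGLE_CLOUD_LOCATION', 'us-central1')
    return f"Vertex AI project={project} location={location}"
```

Calling this once at module import time surfaces misconfiguration immediately, instead of as an opaque error on the first user message.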
Code Cleanup
Remove Old searchtool Code
Since Grounding is already in use, I performed code cleanup:
- main.py - Remove searchtool import
# ❌ Removed
# from loader.searchtool import search_from_text
# search_api_key = os.getenv('SEARCH_API_KEY')
# search_engine_id = os.getenv('SEARCH_ENGINE_ID')
# ✅ Added
logger.info('Text search using Vertex AI Grounding with Google Search')
- loader/searchtool.py - Marked as DEPRECATED
"""
⚠️ DEPRECATED: This module is no longer used in the main application.
The text search functionality has been replaced by Vertex AI Grounding
with Google Search, which provides better quality results and native
conversation memory.
This file is kept for reference or as a fallback option.
"""
- .env.example and README.md - Remove Custom Search environment variable description
Cleanup Results
| Item | Before Cleanup | After Cleanup |
|---|---|---|
| Required Environment Variables | 4 | 2 |
| API Calls | 3 times | 1 time |
| Code Complexity | High | Low |
| Maintenance Cost | High | Low |
Supported Model List
Currently supported Gemini models for Google Search Grounding:
- ✅ Gemini 3.0 Pro (Preview) (Most powerful)
- ✅ Gemini 2.5 Pro
- ✅ Gemini 2.5 Flash
- ✅ Gemini 2.0 Flash (Recommended)
- ✅ Gemini 2.5 Flash with Live API
- ❌ Gemini 2.0 Flash-Lite (Does not support Grounding)
Performance Improvement
Speed Comparison
| Metric | Old Version | New Version | Improvement |
|---|---|---|---|
| Number of API Calls | 3 times | 1 time | ⬇️ 66% |
| Response Time | ~6-8 seconds | ~2-3 seconds | ⬇️ 60% |
| Search Quality | ⭐⭐⭐ | ⭐⭐⭐⭐⭐ | ⬆️ Significantly improved |
Cost Analysis
Old Version Cost (per question and answer):
1. extract_keywords_with_gemini() → Gemini API
2. Google Custom Search → $0.005
3. summarize_text() → Gemini API
─────────
Total: Gemini + $0.005
New Version Cost (per question and answer):
1. Grounding with Google Search → Vertex AI
─────────
Total: Only Vertex AI
✅ Save Custom Search API fees ✅ Faster response speed ✅ Higher search quality
Current Caveats
1. Must Use Vertex AI
The Google Search Grounding feature does not support the general Gemini Developer API and must be accessed through Vertex AI.
2. Authentication Settings
- Development environment: use gcloud auth application-default login
- Production environment: use a Service Account and set GOOGLE_APPLICATION_CREDENTIALS
3. Supported Models
Ensure that you are using a model that supports Grounding (e.g., gemini-2.0-flash or above), and avoid using the -lite version.
4. Client Lifecycle
Be sure to create a shared client instance in __init__() to avoid the "client closed" error.
5. Prompt Optimization
In the prompt, clearly indicate:
- Use Traditional Chinese
- Do not use markdown format (if plain text is needed)
- Provide reliable sources
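The rules above can be pulled into a single prompt-building helper so every handler applies them consistently. This is just a sketch reusing the wording of the prompt shown earlier; the function name build_prompt is my own.

```python
def build_prompt(query: str) -> str:
    """Wrap a user question in the fixed instruction preamble."""
    return (
        "Please answer the following question in Traditional Chinese "
        "using Taiwanese terminology.\n"
        "If you need the latest information, please search the web and "
        "provide accurate answers.\n"
        "Please provide detailed and useful answers and ensure that the "
        "source of information is reliable.\n"
        "Please do not use markdown format (do not use symbols such as "
        "**, ##, -, etc.). Use plain text to answer.\n\n"
        f"Question: {query}"
    )
```

Keeping the preamble in one place also makes later prompt tweaks (like the markdown rule) a one-line change.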
Development Experience
1. Grounding is a Game Changer
From the traditional "keyword extraction → API search → result summary" process to using Grounding's "one API call to complete everything," this transformation brings not only technical simplification but also a qualitative change in user experience:
Technical Aspects:
- ✅ Code reduction of 70% (from 3 functions to 1)
- ✅ API call reduction of 66% (from 3 times to 1 time)
- ✅ Response time reduction of 60% (from 6-8 seconds to 2-3 seconds)
User Experience:
- ✅ Support continuous conversations (finally able to understand what "it" refers to!)
- ✅ Automatic source citation (increase credibility)
- ✅ More in-depth information (full webpage vs. short snippet)
2. Client Lifecycle Management is Important
The "client closed" error I encountered initially taught me that: When using the google-genai SDK, the client should be a long-lived object, not a new one created every time.
# ❌ Wrong: the client will be garbage collected
def create_session():
    client = genai.Client(...)
    chat = client.chats.create(...)
    return chat  # client is closed, chat can no longer be used

# ✅ Correct: share one client instance
class Manager:
    def __init__(self):
        self.client = genai.Client(...)  # Create only once

    def create_session(self):
        return self.client.chats.create(...)  # Reuse it
This lesson applies to all SDKs that need to manage long connections.
3. RAG Doesn't Necessarily Need to Be Implemented by Yourself
In the past, we needed to implement RAG (Retrieval-Augmented Generation) ourselves:
- Use embedding to create a vector database
- Implement similarity search
- Inject the search results into the prompt
- Manage the context window
But Google Search Grounding has already done all of this for us! It:
- ✅ Automatically determines when to search
- ✅ Uses Google's search engine (much better than what we can do ourselves)
- ✅ Reads the full webpage and extracts important information
- ✅ Automatically cites sources
Conclusion: If your RAG requirement is "search for web information," just use Grounding, don't reinvent the wheel.
4. Session Management is Simpler Than I Thought
When implementing conversation memory, I originally thought I would need:
- Redis persistence
- Complex context management
- Manually maintain conversation history
But in reality, the Gemini Chat API natively supports multi-turn conversations! All you need is:
chat = client.chats.create(...)
chat.send_message("Question 1") # Round 1
chat.send_message("Question 2") # Round 2 (automatically remembers Round 1)
All I need to do is:
- Store the chat object in memory
- Regularly clear expired sessions
- Provide the /clear instruction
Simple, efficient, and reliable!
5. The Importance of Prompt Optimization
The initial responses contained a lot of markdown formatting (**bold**, ## title), which was not aesthetically pleasing when displayed on LINE. Just add a line to the prompt:
prompt = f"""...
Please do not use markdown format (do not use symbols such as **, ##, -, etc.). Use plain text to answer.
Question: {query}"""
And the problem was solved! This made me realize: Good prompt design is as important as good code.
6. Learning from Failure
During this development process, I experienced:
- ❌ Using Custom Search → Found it too slow, too shallow
- ✅ Switching to Grounding → But encountered the client closed error
- ✅ Fixing the client lifecycle → Found the markdown format problem
- ✅ Optimizing the prompt → Perfect!
Every problem is an opportunity to learn. If I had succeeded from the start, I wouldn't have learned so much about SDK design, lifecycle management, and prompt engineering.
Summary
If you are developing an AI application that requires search functionality:
- ✅ Prioritize Grounding - Much simpler than implementing RAG yourself
- ✅ Pay attention to Client Lifecycle - Avoid unnecessary repeated creation
- ✅ Make good use of Chat Session - Native conversation memory is very powerful
- ✅ Invest in Prompt Optimization - Small changes bring big improvements
Google Search Grounding is definitely worth a try!
Testing Steps
1. Start the Application
# Confirm that the environment variables have been set
export GOOGLE_CLOUD_PROJECT=your-project-id
# Restart the application
uvicorn main:app --reload
2. Test Basic Functions
Test in LINE:
Send: What is Python?
Expected: ✅ Receive a detailed answer + source
Send: What are its advantages?
Expected: ✅ See the "💬 [In conversation]" mark, Bot knows "it" = Python
Send: /status
Expected: ✅ Display conversation status
Send: /clear
Expected: ✅ Display "Conversation has been reset"
3. Check the Logs
Should see:
INFO:main:Text search using Vertex AI Grounding with Google Search
INFO:loader.chat_session:Creating new session for user ...
INFO:loader.chat_session:Sending message to Grounding API ...
Should not see:
ERROR:loader.chat_session:Grounding search failed: Cannot send a request, as the client has been closed.
Related Documents
Detailed technical documentation in the project:
- TEXT_SEARCH_IMPROVEMENT.md - Complete solution analysis and comparison
- GROUNDING_IMPLEMENTATION.md - Implementation guide and acceptance checklist
- CLIENT_CLOSED_FIX.md - Client lifecycle error fix
- SEARCHTOOL_CLEANUP.md - Code cleanup summary

