Generative AI has shifted from simple chat interfaces to complex, autonomous agents that can reason, plan, and—most importantly—access private data. While Large Language Models (LLMs) like Gemini are incredibly capable, they are limited by their knowledge cutoff and lack of access to your specific business data.
This is where Retrieval-Augmented Generation (RAG) comes in. RAG allows an LLM to retrieve relevant information from a trusted data source before generating a response. However, building a RAG pipeline from scratch—handling vector databases, embeddings, chunking, and ranking—can be a daunting task.
In this tutorial, we will use Vertex AI Agent Builder to create a production-ready RAG agent in minutes. We will connect a Gemini-powered agent to a private data store and expose it via a Python-based interface.
What You Will Build
You will build a "Technical Support Agent" capable of answering complex questions about a specific product documentation set. Unlike a standard chatbot, this agent will:
- Search through a private repository of PDF/HTML documents.
- Ground its answers in the retrieved data to prevent hallucinations.
- Provide citations so users can verify the information.
What You Will Learn
- How to set up a Google Cloud Project for AI development.
- How to create and manage Data Stores in Vertex AI Search.
- How to configure a Gemini-powered chat application.
- How to interact with your agent programmatically using the Python SDK.
- Best practices for grounding and response quality.
Prerequisites
- A Google Cloud Platform (GCP) account with billing enabled.
- Basic knowledge of Python.
- Access to the Google Cloud Console.
- The `gcloud` CLI installed and authenticated (optional but recommended).
The Learning Journey
Before we dive into the code, let's walk through the steps that will transform raw data into a functional AI agent: upload documents to Cloud Storage, index them in a Data Store, connect the store to a Gemini chat app, and query the agent from Python.
Step 1: Project Setup and API Configuration
To begin, you need a GCP project. Vertex AI Agent Builder is a managed service that orchestrates several underlying APIs, including Discovery Engine and Vertex AI.
- Go to the Google Cloud Console.
- Create a new project named `gemini-rag-agent`.
- Open the Cloud Shell or your local terminal and enable the necessary APIs:

```bash
gcloud services enable discoveryengine.googleapis.com \
  storage.googleapis.com \
  aiplatform.googleapis.com
```
Why this is necessary:

- `discoveryengine.googleapis.com`: Powers the search and conversation capabilities.
- `storage.googleapis.com`: Hosts your raw documents.
- `aiplatform.googleapis.com`: Provides access to the Gemini models.
Step 2: Prepare Your Data Source
Vertex AI Agent Builder supports multiple data sources, including Google Cloud Storage (GCS), BigQuery, and even public website URLs. For this tutorial, we will use GCS with a collection of PDF documents.
- Create a GCS bucket:

```bash
export BUCKET_NAME="your-unique-bucket-name"
gsutil mb gs://$BUCKET_NAME
```

- Upload your technical documentation (PDF or JSONL files) to the bucket. If you don't have files ready, you can use a public sample:

```bash
gsutil cp gs://cloud-samples-data/gen-app-builder/search/alphabet-investor-pdfs/*.pdf gs://$BUCKET_NAME/
```
Note on Data Formats: For structured data, use JSONL where each line represents a document. For unstructured data, PDFs and HTML files work best as the service automatically handles text extraction and chunking.
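For the structured case, a JSONL file is just one JSON object per line. A minimal sketch of producing such a file is below; the field names (`id`, `title`, `body`) are illustrative assumptions here, so check the data store's schema requirements for your actual import.

```python
import json

# Hypothetical documents to index; in a real pipeline these would come
# from your own documentation source.
docs = [
    {"id": "doc-1", "title": "Reset procedure", "body": "Hold the power button for 10 seconds."},
    {"id": "doc-2", "title": "Firmware update", "body": "Download the latest image from the portal."},
]

# Write one JSON object per line -- the JSONL shape the importer expects.
with open("documents.jsonl", "w", encoding="utf-8") as f:
    for doc in docs:
        f.write(json.dumps(doc) + "\n")

# Sanity check: every line must parse back to a standalone object.
with open("documents.jsonl", encoding="utf-8") as f:
    parsed = [json.loads(line) for line in f]
print(len(parsed))  # 2
```

Once written, the file can be uploaded to your bucket with `gsutil cp documents.jsonl gs://$BUCKET_NAME/`.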
Step 3: Create a Data Store
A Data Store is the heart of your RAG system. It indexes your files, creates vector embeddings, and prepares them for retrieval.
- In the GCP Console, navigate to Vertex AI Search and Conversation.
- Click Data Stores in the left menu and then Create Data Store.
- Select Cloud Storage as the source.
- Point it to the bucket you created (e.g., `gs://your-unique-bucket-name/*`).
- Choose Unstructured Data as the data type.
- Give your data store a name, such as `tech-docs-store`, and click Create.
Indexing may take several minutes depending on the volume of data. Vertex AI is busy under the hood creating an inverted index and a vector index for semantic search.
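To build intuition for what those two indexes buy you, here is a toy sketch (not Vertex AI's actual implementation): an inverted index maps tokens to documents for keyword lookup, while vector search ranks documents by similarity to the query. The bag-of-words "embedding" below stands in for the learned dense embeddings a real system uses.

```python
import math
from collections import Counter, defaultdict

corpus = {
    "doc-1": "reset the device by holding the power button",
    "doc-2": "update firmware from the customer portal",
}

# Inverted index: token -> set of documents containing it (keyword search).
inverted = defaultdict(set)
for doc_id, text in corpus.items():
    for token in text.split():
        inverted[token].add(doc_id)

# Toy "embedding": a bag-of-words count vector. Real systems use learned
# dense embeddings, but cosine-similarity ranking works the same way.
def embed(text):
    return Counter(text.split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

query = "how do I reset the power"
q = embed(query)
scores = {doc_id: cosine(q, embed(text)) for doc_id, text in corpus.items()}
best = max(scores, key=scores.get)
print(best)  # doc-1
```

Keyword lookup answers "which documents contain this exact word"; the similarity ranking answers "which documents are about this", which is what makes semantic retrieval robust to paraphrasing.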
Step 4: Create the Gemini Chat Application
Now that our data is indexed, we need to create the interface that uses Gemini to reason over that data.
- In the console, click Apps > Create App.
- Select Chat as the app type.
- Enter a name (e.g., `Technical-Support-Agent`) and a company name.
- Click Connect Data Store and select the `tech-docs-store` you created in the previous step.
- Click Create.
Step 5: Configure Grounding and the Gemini Model
Once the app is created, we must configure how the LLM interacts with the data. This is where we ensure the agent doesn't "make things up."
- Go to the Configurations tab of your new app.
- Under Model, select `gemini-1.5-flash` or `gemini-1.5-pro`. Flash is faster and cheaper, while Pro is better for complex reasoning.
- In the System Instructions, provide a persona:

  > "You are a helpful technical support assistant. You only answer questions based on the provided documentation. If the answer is not in the documentation, politely state that you do not know."
- Ensure Grounding is enabled. This forces the model to check the search results from your Data Store before responding.
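Conceptually, grounding means the model never sees your question alone: it sees your question plus the retrieved passages, with instructions to answer only from them. Agent Builder assembles this internally, but a minimal sketch of the idea (the template text here is illustrative, not the service's actual prompt) looks like this:

```python
SYSTEM_INSTRUCTIONS = (
    "You are a helpful technical support assistant. You only answer "
    "questions based on the provided documentation. If the answer is not "
    "in the documentation, politely state that you do not know."
)

def build_grounded_prompt(question, passages):
    """Assemble the kind of grounded prompt a RAG app sends to the model."""
    context = "\n\n".join(f"[{i + 1}] {p}" for i, p in enumerate(passages))
    return (
        f"{SYSTEM_INSTRUCTIONS}\n\n"
        f"Documentation passages:\n{context}\n\n"
        f"Question: {question}\n"
        "Answer using only the passages above, citing them by number."
    )

prompt = build_grounded_prompt(
    "How do I reset the device?",
    ["Hold the power button for 10 seconds.", "The LED blinks twice on reset."],
)
print(prompt)
```

Because the retrieved passages are numbered in the prompt, the model can cite them, which is what makes answer verification possible for the end user.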
The Interaction Flow
The flow through the components we just configured is straightforward: the user's question goes to the Chat app, which retrieves matching passages from the Data Store, passes them to Gemini as grounding context, and returns a summarized, cited answer.
Step 6: Programmatic Access via Python
While the Google Cloud Console provides a "Preview" tab to test your agent, most developers will want to integrate it into their own applications. We will use the `google-cloud-discoveryengine` library.

First, install the library:

```bash
pip install google-cloud-discoveryengine
```
Now, use the following Python script to query your agent. Replace the placeholders with your actual Project ID and Data Store ID.
```python
from google.cloud import discoveryengine_v1beta as discoveryengine


def query_agent(project_id, location, data_store_id, user_query):
    # Initialize the client
    client = discoveryengine.ConversationalSearchServiceClient()

    # The full resource name of the serving config for your data store
    serving_config = client.serving_config_path(
        project=project_id,
        location=location,
        data_store=data_store_id,
        serving_config="default_config",
    )

    # Use "-" as the conversation ID to let the service start a fresh
    # auto-session for this request
    conversation_name = (
        client.data_store_path(
            project=project_id, location=location, data_store=data_store_id
        )
        + "/conversations/-"
    )

    # Build the request, asking for a summary grounded in the top results
    request = discoveryengine.ConverseConversationRequest(
        name=conversation_name,
        query=discoveryengine.TextInput(input=user_query),
        serving_config=serving_config,
        summary_spec=discoveryengine.SearchRequest.ContentSearchSpec.SummarySpec(
            summary_result_count=3,
            include_citations=True,
        ),
    )

    # Execute the request
    response = client.converse_conversation(request=request)

    print(f"Answer: {response.reply.summary.summary_text}")
    print("\nSources:")
    for result in response.search_results:
        data = result.document.derived_struct_data
        link = data["link"] if "link" in data else ""
        print(f"- {link}")


# Configuration Constants
PROJECT_ID = "your-project-id"
LOCATION = "global"
DATA_STORE_ID = "your-data-store-id"

query_agent(PROJECT_ID, LOCATION, DATA_STORE_ID, "What is the revenue for 2023?")
```
What this code does:
- Client Setup: It connects to the `discoveryengine` service.
- Serving Config: It points to the specific configuration of your app.
- Conversational Request: It sends the user query and specifically asks for a summary with citations.
- Handling Output: It prints the grounded answer and the source references.
Understanding the User Journey
To ensure our agent is effective, we must consider the user's experience. A successful RAG agent provides transparency and trust.
Best Practices for Gemini Agents
- Data Quality: Your agent is only as good as your data. Ensure your PDFs are high-quality and text-selectable. If using images, ensure OCR is enabled.
- Prompt Engineering: Use the "System Instructions" to define the tone and constraints. For example, tell the agent to use bullet points for technical steps.
- Chunking Strategies: While Vertex AI Agent Builder handles chunking automatically, for very complex documents, you might want to pre-process data into smaller JSONL objects to provide more granular context.
- Safety Settings: Gemini has built-in safety filters. Adjust these in the Vertex AI console if your domain-specific language (e.g., medical or legal) is being incorrectly flagged.
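If you do decide to pre-chunk documents yourself before building JSONL objects, a common baseline is fixed-size windows with overlap, so sentences that straddle a boundary remain visible in both neighboring chunks. A minimal sketch (sizes are arbitrary assumptions; tune them for your documents):

```python
def chunk_text(text, max_chars=200, overlap=50):
    """Split text into overlapping fixed-size character windows.

    The overlap keeps content near chunk boundaries retrievable from
    either neighboring chunk.
    """
    if max_chars <= overlap:
        raise ValueError("max_chars must exceed overlap")
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start : start + max_chars])
        start += max_chars - overlap
    return chunks


doc = "A" * 450
pieces = chunk_text(doc, max_chars=200, overlap=50)
print([len(p) for p in pieces])  # [200, 200, 150]
```

Each chunk would then become its own JSONL record, giving the retriever more granular units to match against.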
Performance and Scaling
When deploying a RAG agent, consider the latency. The retrieval step adds a small amount of time to the request.
- Keyword Retrieval: Exact-term lookups against the inverted index are extremely fast, but they miss synonyms and paraphrases.
- Vector Retrieval: Approximate nearest-neighbor search over embeddings scales sublinearly, remaining efficient even across millions of documents.
- Gemini 1.5 Flash: Use this model when you need low-latency responses for simpler queries.
Conclusion
Building a RAG-enabled agent used to require a team of data engineers and weeks of infrastructure setup. With Vertex AI Agent Builder, the process is streamlined into a few steps: indexing data, configuring the Gemini model, and connecting the two.
This setup allows you to focus on the "Agentic" part of your application—designing how the agent should behave and what problems it should solve—rather than the plumbing of vector databases.
Next Steps
- Try Multi-Turn Conversations: Modify the Python code to maintain state by passing the `conversation_id` back in subsequent requests.
- Add Tool Use: Explore how Gemini can call external APIs (like a weather API or your own database) to supplement the RAG data.
- Grounding with Google Search: Combine your private data with public web data for a truly comprehensive knowledge base.
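One way to picture the multi-turn pattern: the first request targets the auto-session `conversations/-`, and the response carries back a concrete conversation name that follow-up requests reuse. The sketch below uses a stubbed client to show just that threading logic; the class and its method are stand-ins, not the real SDK.

```python
# A stub standing in for the real API client, to show the session-threading
# pattern without network calls.
class FakeConversationClient:
    def __init__(self):
        self._counter = 0

    def converse(self, conversation_name, query):
        # A request to ".../conversations/-" starts a new session; the
        # response echoes back a concrete conversation name to reuse.
        if conversation_name.endswith("/conversations/-"):
            self._counter += 1
            conversation_name = conversation_name[:-1] + f"session-{self._counter}"
        return {"conversation": conversation_name, "reply": f"echo: {query}"}


client = FakeConversationClient()
base = "projects/p/locations/global/dataStores/ds/conversations/-"

# First turn: auto-session mode creates the conversation.
first = client.converse(base, "What is RAG?")
# Follow-up turn: pass the concrete conversation name back to keep context.
second = client.converse(first["conversation"], "Give me an example.")

print(first["conversation"] == second["conversation"])  # True
```

With the real client, the same pattern means replacing the `"-"` placeholder with the conversation name returned in the previous response.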
Further Reading & Resources
- Vertex AI Agent Builder Documentation - The official guide for creating search and conversation apps on GCP.
- Google Cloud Discovery Engine API Reference - Detailed API specifications for programmatic integration.
- Generative AI on Vertex AI: Best Practices - A guide on prompt engineering and model selection for Gemini.
- Retrieval Augmented Generation (RAG) Explained - The original research context and technical background behind RAG architectures.
- Python SDK for Google Cloud Discovery Engine - Package details and version history for the Python client library.


