Wayne Gakuo

Posted on Mar 10

Mastering Retrieval-Augmented Generation with Gemini API's File Search Tool

#ai #angular #rag #gemini

I first heard about Retrieval-Augmented Generation (RAG) when I was trying to get an AI agent to answer questions about internal documents rather than only the data it was trained on. Initially, I pictured it as a way of providing sensitive information only to authorized users; think company-level access control or NDA-bound information. The goal is to ground the agent's answers in those documents so that it is less likely to hallucinate.

So what exactly is this RAG thing that people are just catching up to? Let's dive in.

What is RAG?

Retrieval-Augmented Generation (RAG) is a technique used to give Large Language Models (LLMs) access to data that they weren't originally trained on.
For example, imagine an LLM as a smart student who has read every book in the world up until 2025. If you ask them about a private company memo written yesterday, they won't know the answer. RAG is like giving that student a library card and a specific book to look up before they answer your question.

For the neuroscience nerds, here's a visual of how a typical human brain stores and retrieves information. Keep this mental model in mind, as it will come in handy as you go through this article.

A visual representation of how human brains store and retrieve information; generated by Gemini's Nano Banana 2

How does RAG work?

RAG operates with a few main steps to help enhance generative AI outputs:

Retrieval & pre-processing: RAG systems use search algorithms to query external data sources such as databases, web pages, and knowledge bases. The retrieved information is then pre-processed, with steps such as tokenization, stemming, and stop-word removal.
Grounded generation: The pre-processed information is then fed into the pre-trained LLM as additional context. With this richer context, the LLM can produce more accurate and informative responses.
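To make the two steps concrete, here is a toy sketch of retrieval followed by grounded prompt assembly. It is not how a production RAG system (or File Search) works internally: the keyword-overlap scoring, the stop-word list, the sample documents, and every function name are assumptions made up for illustration, and stemming is left out to keep it short.

```typescript
// Toy RAG loop: keyword-overlap retrieval followed by prompt assembly.
// All names, the scoring rule, and the sample documents are illustrative only.

const STOP_WORDS = new Set([
  "the", "a", "an", "is", "are", "of", "to", "on", "our", "what", "how", "do", "i",
]);

// Pre-processing: lowercase, split on non-alphanumerics, drop stop words.
function tokenize(text: string): string[] {
  return text
    .toLowerCase()
    .split(/[^a-z0-9]+/)
    .filter((word) => word.length > 0 && !STOP_WORDS.has(word));
}

// Score a document by how many distinct query terms it contains.
function score(queryTerms: string[], doc: string): number {
  const docTerms = new Set(tokenize(doc));
  const uniqueTerms = Array.from(new Set(queryTerms));
  return uniqueTerms.filter((term) => docTerms.has(term)).length;
}

// Retrieval: return up to topK documents with a non-zero score, best first.
function retrieve(question: string, docs: string[], topK: number): string[] {
  const queryTerms = tokenize(question);
  return docs
    .map((doc) => ({ doc, points: score(queryTerms, doc) }))
    .filter((entry) => entry.points > 0)
    .sort((a, b) => b.points - a.points)
    .slice(0, topK)
    .map((entry) => entry.doc);
}

// Grounded generation: put the retrieved passages in the prompt ahead of the question.
function buildGroundedPrompt(question: string, passages: string[]): string {
  const context = passages.map((p, i) => `[${i + 1}] ${p}`).join("\n");
  return `Answer using only the context below.\n\nContext:\n${context}\n\nQuestion: ${question}`;
}

// Made-up sample documents.
const sampleDocs = [
  "Remote work policy: employees may work remotely up to three days per week.",
  "VPN setup: install the corporate VPN client and sign in with your work email.",
  "The cafeteria is open from 8am to 3pm.",
];
const sampleQuestion = "What is our policy on remote work?";
console.log(buildGroundedPrompt(sampleQuestion, retrieve(sampleQuestion, sampleDocs, 1)));
```

The string returned by buildGroundedPrompt is what would be sent to the LLM. Real systems typically replace the keyword scoring with embedding similarity.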

The Traditional RAG Process

Before tools like File Search existed, developers had to build "the nitty-gritty" part of RAG themselves. This involved several complex steps:

Challenges of Traditional RAG

Chunking Logic: How do you split a document such as a PDF? By paragraph? By page? If you cut a sentence in half, the meaning is lost.
Embeddings: You must convert text into high-dimensional vectors (numbers) using an embedding model.
Vector Databases: You need to manage a separate database (like Cloud Firestore) to store and search these vectors.
Maintenance: Keeping the database in sync with your files is a constant chore.
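To give a feel for the work listed above, here is a minimal sketch of two of those pieces: a chunker that only splits on sentence boundaries, and a brute-force cosine-similarity lookup standing in for a vector database. The function names, the character budget, and the hand-written three-dimensional vectors are all assumptions for illustration; real embeddings come from an embedding model and have hundreds or thousands of dimensions.

```typescript
// Pack whole sentences into chunks of at most maxChars, so no sentence is
// cut in half. A single over-long sentence becomes its own chunk.
function chunkBySentence(text: string, maxChars: number): string[] {
  const sentences = text.match(/[^.!?]+[.!?]+|[^.!?]+$/g) ?? [];
  const chunks: string[] = [];
  let current = "";
  for (const raw of sentences) {
    const sentence = raw.trim();
    if (sentence.length === 0) continue;
    const candidate = current.length === 0 ? sentence : `${current} ${sentence}`;
    if (candidate.length <= maxChars || current.length === 0) {
      current = candidate;
    } else {
      chunks.push(current);
      current = sentence;
    }
  }
  if (current.length > 0) chunks.push(current);
  return chunks;
}

// Cosine similarity between two equal-length vectors (0 if either is all zeros).
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("Vectors must have the same length");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    const x = a[i] ?? 0;
    const y = b[i] ?? 0;
    dot += x * y;
    normA += x * x;
    normB += y * y;
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / Math.sqrt(normA * normB);
}

interface StoredChunk {
  text: string;
  vector: number[];
}

// Brute-force stand-in for a vector database query: scan every stored chunk.
function nearestChunk(queryVector: number[], stored: StoredChunk[]): StoredChunk | undefined {
  let best: StoredChunk | undefined;
  let bestScore = -Infinity;
  for (const entry of stored) {
    const similarity = cosineSimilarity(queryVector, entry.vector);
    if (similarity > bestScore) {
      bestScore = similarity;
      best = entry;
    }
  }
  return best;
}

// Made-up text and hand-written vectors, for illustration only.
console.log(chunkBySentence("Laptops are issued on day one. VPN access takes a day. Badges are printed weekly.", 60));
const storedChunks: StoredChunk[] = [
  { text: "laptops", vector: [0.9, 0.1, 0.0] },
  { text: "vpn", vector: [0.1, 0.9, 0.2] },
];
console.log(nearestChunk([0.2, 0.8, 0.1], storedChunks)?.text);
```

Even this simplified version leaves out embedding generation, persistence, and re-indexing when files change, which is where most of the maintenance effort goes.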

File Search

The File Search Tool is a fully managed RAG system built directly into the Gemini API. It abstracts away the retrieval pipeline so you can focus on building great experiences: it handles the chunking, embedding, and vector storage for you. You simply upload a file, and Gemini takes care of the rest.

File Search is simple and affordable for developers: storage and embedding generation at query time are free of charge. You only pay for creating embeddings when you first index your files, at a fixed rate of $0.15 per million tokens, which makes the File Search Tool both easy to use and cost-effective.
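As a quick worked example of that pricing, the sketch below estimates the one-time indexing cost from a token count. The rate is the $0.15 per million tokens quoted above (check the current pricing page before relying on it); the function name and the document counts are made up for illustration.

```typescript
// Indexing rate quoted in this article; treat it as an assumption that may change.
const INDEXING_USD_PER_MILLION_TOKENS = 0.15;

// Estimate the one-time cost of indexing a given number of tokens.
function estimateIndexingCostUsd(totalTokens: number): number {
  if (!Number.isFinite(totalTokens) || totalTokens < 0) {
    throw new RangeError("totalTokens must be a non-negative finite number");
  }
  return (totalTokens / 1e6) * INDEXING_USD_PER_MILLION_TOKENS;
}

// Hypothetical knowledge base: 200 documents of roughly 5,000 tokens each,
// i.e. 1,000,000 tokens in total.
const hypotheticalTokens = 200 * 5000;
console.log(`$${estimateIndexingCostUsd(hypotheticalTokens).toFixed(2)}`); // prints "$0.15"
```

Since storage and query-time embeddings are free according to the pricing described above, this one-time figure is the only File Search charge in the estimate; model input and output tokens for the chat itself are billed separately as usual.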

When you use File Search, you don't worry about "vectors" or "chunks." You interact with Files and Stores.

How it Works in our Demo app

To demonstrate how this works, I built a simple web app called Onboard HQ: a centralized, AI-powered "Command Center" designed to turn a company's fragmented internal documentation into an interactive knowledge base.
The app is built with Angular, Genkit, Firebase, and File Search (through the Google GenAI SDK).

Step 1: Ingesting Documents ("Upload")

We define a flow to upload files to a fileSearchStore. Notice how we don't manually calculate embeddings; we just send the blob to Google's infrastructure.

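// From functions/src/index.ts
// `ai` is the Genkit instance configured elsewhere in this file.
import { GoogleGenAI } from '@google/genai';
import { z } from 'genkit';
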
export const _uploadToFileSearchStoreLogic = ai.defineFlow(
  {
    name: 'uploadToFileSearchStore',
    inputSchema: z.object({
      fileData: z.string(), // Base64 encoded file
      mimeType: z.string(),
      displayName: z.string().optional(),
    }),
    // ...
  },
  async ({fileData, mimeType, displayName}) => {
    const genAI = new GoogleGenAI({});

    // ... (logic to get/create store name)

    const blob = new Blob([Buffer.from(fileData, 'base64')], { type: mimeType });

    // This single call handles chunking and indexing!
    let operation = await genAI.fileSearchStores.uploadToFileSearchStore({
      file: blob,
      fileSearchStoreName: googleFileSearchStoreName,
      config: {
        displayName: displayName || 'uploaded-file',
        mimeType: mimeType,
      }
    });

    // We wait for the background indexing to complete
    while (!operation.done) {
      await new Promise((resolve) => setTimeout(resolve, 5000));
      const operationResult = await genAI.operations.get({ operation });
      operation = operationResult as any;
    }

    return { success: true };
  }
);
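The polling loop in the flow above waits indefinitely and does not check whether indexing failed. Below is a minimal sketch of a bounded polling helper that could wrap that loop. It is not part of the Onboard HQ code: the helper name, the options, and the PollableOperation shape are assumptions, in particular the idea that a finished operation may carry an error field.

```typescript
// Minimal shape of a long-running operation; only the fields this helper reads.
// Assumption: a failed operation exposes an `error` field once it is done.
interface PollableOperation {
  done?: boolean;
  error?: unknown;
}

// Poll `refresh` until the operation reports done, giving up after maxAttempts polls.
async function pollUntilDone<T extends PollableOperation>(
  initial: T,
  refresh: (current: T) => Promise<T>,
  options: { intervalMs: number; maxAttempts: number },
): Promise<T> {
  let operation = initial;
  for (let attempt = 0; !operation.done; attempt++) {
    if (attempt >= options.maxAttempts) {
      throw new Error(`Operation still pending after ${options.maxAttempts} polls`);
    }
    await new Promise<void>((resolve) => setTimeout(resolve, options.intervalMs));
    operation = await refresh(operation);
  }
  if (operation.error !== undefined && operation.error !== null) {
    throw new Error(`Operation failed: ${JSON.stringify(operation.error)}`);
  }
  return operation;
}

// In the upload flow, the loop might become something like:
//   operation = await pollUntilDone(operation,
//     (op) => genAI.operations.get({ operation: op }),
//     { intervalMs: 5000, maxAttempts: 60 });

// Stand-in for a real operation: reports done on the third refresh.
let refreshCount = 0;
const fakeRefresh = async (op: PollableOperation): Promise<PollableOperation> => {
  refreshCount++;
  return { ...op, done: refreshCount >= 3 };
};

pollUntilDone<PollableOperation>({ done: false }, fakeRefresh, { intervalMs: 10, maxAttempts: 5 })
  .then((op) => console.log("finished:", op.done, "after", refreshCount, "polls"));
```

Capping the number of polls keeps a stuck indexing job from holding a Cloud Function open until it times out, and surfacing the error avoids returning success for a file that was never indexed.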

Step 2: Querying ("Retrieval")

When the user asks a question, we don't manually search the database. We tell the Gemini model that it has a tool called fileSearch available.

// From functions/src/index.ts
export const _chatWithFileSearchLogic = ai.defineFlow(
  // ... input: question, history
  async({question, history}) => {
    // We retrieve the store name where our files are indexed
    const storeSnap = await db.collection('obhqFileSearchStores').doc('obhqknowledge').get();
    const googleFileSearchStoreName = storeSnap.data()?.googleFileSearchStoreName;

    const response = await ai.generate({
      system: CHAT_WITH_FILE_SEARCH_SYSTEM_PROMPT,
      messages: [
        ...toGenkitMessages(history ?? []),
        {role: 'user', content: [{text: question}]},
      ],
      config: {
        tools: [
          {
            fileSearch: {
              fileSearchStoreNames: [googleFileSearchStoreName]
            }
          }
        ]
      }
    });

    return response.text;
  }
);
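One detail worth guarding in the flow above: googleFileSearchStoreName comes from an optional Firestore lookup, so it can be undefined if no store has been created yet. Here is a small sketch of a helper that builds the tools array and fails early in that case. The helper and interface names are hypothetical; the object shape simply mirrors the config used in the flow, and the store name in the example is a placeholder.

```typescript
// Shape of the tool entry used in the flow above.
interface FileSearchToolConfig {
  fileSearch: { fileSearchStoreNames: string[] };
}

// Build the tools array, failing early when no store name is available
// rather than sending [undefined] to the model.
function buildFileSearchTools(storeName: string | undefined | null): FileSearchToolConfig[] {
  const trimmed = storeName?.trim();
  if (!trimmed) {
    throw new Error("No File Search store is configured; upload a document first.");
  }
  return [{ fileSearch: { fileSearchStoreNames: [trimmed] } }];
}

// Placeholder store name, for illustration only.
console.log(JSON.stringify(buildFileSearchTools("fileSearchStores/example-store")));
```

In the flow, the result would be passed as config.tools in place of the inline array, so a missing store produces a clear error message instead of a confusing API failure.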

File Search vs. Traditional RAG: A Comparison

Feature	Traditional RAG	File Search
Setup Time	Days/Weeks	Minutes
Infrastructure	Vector DB + Embedding Model + App	Managed by Google
Chunking	Manual configuration	Automatic/AI-optimized
Scalability	You manage shards/indexes	Managed automatically
Developer Focus	Pipelines and Databases	User Experience and Prompts

Summary

Building a RAG pipeline traditionally involved complex processes like data ingestion, chunking, embedding creation, vector database setup, and prompt context injection. Google's new File Search tool automates these tasks, providing a fully managed, end-to-end RAG solution directly within the Gemini API, which our app reaches through Genkit's ai.generate call.

By using File Search, we've offloaded the heavy lifting of data engineering to the model provider. This allows us to focus on building a better onboarding experience for Onboard HQ users, rather than debugging vector similarity thresholds.

There is a lot more to the File Search Tool than I have covered in this article, for example, how to use file metadata and citations for fact-checking and verification. I recommend reading more about it in the official documentation.

RAG is no longer about the "nitty-gritty" of data pipelines—it's about providing the right information to the right user at the right time.

Check out the demo app below to see File Search in action and the code on GitHub:

Onboard HQ demo app: https://onboarding-hq.web.app/
GitHub Repo: https://github.com/waynegakuo/onboardhq

Example Questions

Try asking these common onboarding and policy questions in the Onboard HQ app:

"What is our policy on hybrid and remote work?"
"How do I set up my corporate VPN and email?"
"Tell me about the company's culture"

DEV Community