Gao Dalie (Ilyass)

RAG Will Never Be the Same After Gemini File Search Tool

Last week I heard bad news, and life hit me hard again. Moments like that remind me how fragile everything is — how one day we all leave, and even love can feel temporary.

In the middle of all this, I saw a post on X saying Gemini’s File Search Tool makes RAG super easy and is offered at a really reasonable cost. I don’t know why, but something about it pushed me to try it.

Google announced the File Search Tool, a fully managed retrieval-augmented generation (RAG) system built directly into the Gemini API.

Previously, to build a RAG pipeline, you had to choose a vector database, develop a chunking strategy, call an embedding model, and tie everything together. The File Search Tool handles all of that automatically behind the API.

These were major barriers for companies wanting to adopt AI, but with the introduction of the File Search Tool, all of these steps can now be completed within the Gemini API.

Developers can simply upload files and use standard API calls to generate answers based on their own data, with the response clearly indicating which part of which file was referenced when generating an answer. This helps prevent hallucination, a common problem with generative AI.

The File Search Tool helps developers build file search and ingestion pipelines in a simple, integrated, and flexible way to enhance Gemini’s answers with their own data. File storage and query-time embedding generation are free; there is a one-time fee only for the initial indexing of your files.
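
To make that concrete, here is a minimal sketch of the whole flow, condensed from the full app I walk through later in this post. The model name and the file path are placeholders; swap in your own.

import time
from google import genai
from google.genai import types

client = genai.Client()  # picks up the API key from your environment

# 1. Create a File Search Store
store = client.file_search_stores.create(config={'display_name': 'demo-store'})

# 2. Upload a file; Gemini chunks, embeds, and indexes it behind the API
op = client.file_search_stores.upload_to_file_search_store(
    file='ocean_ai.pdf',  # placeholder path
    file_search_store_name=store.name,
    config={'display_name': 'ocean_ai.pdf'}
)
while not op.done:  # poll until indexing finishes
    time.sleep(2)
    op = client.operations.get(op)

# 3. Ask a question grounded in the uploaded file
response = client.models.generate_content(
    model='gemini-2.5-flash',  # placeholder; any File Search-capable model
    contents='What is Ocean AI?',
    config=types.GenerateContentConfig(
        tools=[types.Tool(file_search=types.FileSearch(
            file_search_store_names=[store.name]
        ))]
    )
)
print(response.text)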

So, let me give you a quick demo of a live chatbot to show you what I mean.

During development, one paper drew my attention: AI is increasingly involved across industries, influencing science and beyond. I will upload this Ocean AI PDF.

I will ask the chatbot a question: “What is Ocean AI, and why is Ocean AI different from OpenAI?” If you take a look at how the chatbot generates the output, you’ll see that the agent first saves my uploaded PDF to a temporary file, then creates a unique FileSearchStore with a random ID.

The agent uploads the PDF into this store and waits while Gemini breaks the document into chunks and builds a searchable index; a wait_operation function polls every 2 seconds until indexing finishes.

When I type my question and hit enter, query_file_search sends it to the Gemini API along with the store name. Gemini automatically searches the indexed PDF chunks, finds the relevant sections about Ocean AI and how it differs from OpenAI, uses those chunks as context, and generates an answer with the selected model.

The response includes the answer text plus grounding metadata showing exactly which parts of the PDF were used, so when I click "View Sources", I can see the citations proving where the information came from. When I'm done, clicking "Clear PDF" deletes the entire store and cleans up all the data.
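
For reference, this is roughly how the app digs the citations out of the response. The attribute names below reflect my reading of the google-genai grounding metadata, so treat them as a sketch; they may shift between SDK versions.

# Sketch: pulling citations out of a File Search response.
# Attribute names follow my reading of the google-genai grounding
# metadata and may differ slightly between SDK versions.
candidate = response.candidates[0]
metadata = candidate.grounding_metadata

if metadata and metadata.grounding_chunks:
    for i, chunk in enumerate(metadata.grounding_chunks, start=1):
        ctx = chunk.retrieved_context  # the indexed passage that was cited
        if ctx:
            print(f"[{i}] {ctx.title}")   # which file it came from
            print(ctx.text[:200], "...")  # the cited snippet itself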

What makes the File Search Tool different?

The Gemini API File Search Tool consolidates these complex processes into a single, fully automated API call, generateContent, letting developers use file search within their existing API workflows and eliminating the complex setup and management work previously required.

Unlike traditional keyword-based searches, the File Search Tool understands the meaning and context of your query and can find relevant information even if exact word matches are not used.

This is achieved through powerful vector search, leveraging the latest Gemini Embedding model.
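
If you want to see what that means in practice, here is a small, hypothetical experiment with the Gemini Embedding model. The model name and the cosine helper are my own choices; the File Search Tool does all of this for you internally.

import math
from google import genai

client = genai.Client()

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# No shared keywords between the first two, but the meaning is close
texts = [
    "How do I reset my password?",
    "Steps to recover account access",
    "Best pizza toppings",
]

result = client.models.embed_content(model="gemini-embedding-001", contents=texts)
vecs = [e.values for e in result.embeddings]

print(cosine(vecs[0], vecs[1]))  # high: same intent, different words
print(cosine(vecs[0], vecs[2]))  # low: unrelated topic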

Even more noteworthy is the implementation of auto-citation, which automatically includes citations to the specific documents used to generate the answer, greatly simplifying verification and fact-checking and making it much more useful for businesses.

Current limitations and expected improvements

The File Search Tool currently has some limitations. The most significant is the limited ability to adjust the number of chunks retrieved. During testing, I confirmed advanced configuration options such as metadata filters, but I hope future updates will allow finer-grained control over the number of retrieved chunks.
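
For what it’s worth, here is roughly what filtering on the custom metadata attached at upload time looks like. The metadata_filter field and its syntax are my reading of the Gemini API docs, so treat this as an assumption rather than verified code.

# Sketch: restricting retrieval to files whose custom metadata matches a
# filter. The metadata_filter field name and syntax are assumptions based
# on the Gemini API documentation, not something I verified in this app.
from google.genai import types

response = client.models.generate_content(
    model='gemini-2.5-flash',
    contents='What is Ocean AI?',
    config=types.GenerateContentConfig(
        tools=[types.Tool(file_search=types.FileSearch(
            file_search_store_names=[store.name],
            metadata_filter='source="streamlit_upload"'  # matches the upload metadata
        ))]
    )
)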

There is also room for improvement in the accuracy of image recognition. Currently, it is possible to extract text from images, but it is not yet at the level of understanding the structure and relationships of diagrams. In particular, it can be difficult to extract meaningful information from documents written in Markdown format or with complex layouts.

File size limitations are also a consideration. Each file is limited to a maximum of 100MB, and the total file search store size per project ranges from 1GB to 1TB depending on the user tier. These limits may affect practicality for large enterprises.
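
A simple guard in the app can catch oversized files before wasting an upload. This check is my own addition, not something the API requires:

MAX_FILE_SIZE = 100 * 1024 * 1024  # the 100MB per-file limit

def check_file_size(uploaded_file):
    """Reject files over the File Search per-file limit before uploading."""
    size = len(uploaded_file.getvalue())  # Streamlit UploadedFile bytes
    if size > MAX_FILE_SIZE:
        st.error(f"File is {size / 1024 / 1024:.1f} MB; the limit is 100 MB.")
        return False
    return True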

Differences from OpenAI/Anthropic

Currently, OpenAI’s Retrieval API and Anthropic’s File Contexts are well-known examples of RAG implementations. These systems use external storage to reference documents, but they require developers to build and manage a vector database, making them difficult to implement.

On the other hand, the File Search Tool completely automates this part, and everything happens within the Gemini API.

Comparing the three, the File Search Tool comes out ahead in terms of both development burden and operational cost, and it is particularly suitable for prototype development and experimental use by individual developers.

In addition, the Gemini Embedding model provides high search accuracy; its ability to surface information with similar meaning even when the wording differs is a major attraction.

Let’s start coding:

Before we dive into our application, we will create an ideal environment for the code to work. For this, we need to install the necessary Python libraries.

pip install streamlit PyPDF2 google-genai python-dotenv

The next step is the usual one: We will import the relevant libraries, the significance of which will become evident as we proceed and perform some basic configuration.

import streamlit as st
import os
import time
import random
import string
import tempfile
from pathlib import Path
from PyPDF2 import PdfReader
from google import genai
from google.genai import types
from dotenv import load_dotenv
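
One thing to note before the helper functions: every snippet below assumes a genai client already exists. A typical initialization looks like this, with the API key loaded from a .env file (the GEMINI_API_KEY variable name is an assumption about your setup):

# Load the API key from a .env file and create the client.
# GEMINI_API_KEY is an assumption about how your .env is set up.
load_dotenv()
client = genai.Client(api_key=os.getenv("GEMINI_API_KEY"))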

I designed these helper functions to handle the main tasks of the app. First, I created get_text(key, lang='en') to get translated text - it just looks up a word or phrase in a translation dictionary, defaults to English if the language doesn't exist, and returns the original key if nothing is found.

Then I built generate_random_id(length=8) to make random IDs for naming stores - it randomly picks 8 characters from letters and numbers and combines them into a string.

I developed wait_operation(client, op, sleep_sec=2, max_wait_sec=300) to wait for background operations to finish - it keeps checking every 2 seconds if the operation is done by calling the API, and if it takes longer than 5 minutes, it stops waiting and throws an error so the app doesn't hang forever.

Next, I made extract_text_from_pdf(pdf_file, lang='en') to pull text out of PDF files - it opens the PDF, goes through each page one by one, grabs the text from each page, adds it all together with line breaks, and returns the complete text.

I wrapped this in error handling, so if the PDF is broken or can't be read, it shows an error message to the user instead of crashing.

# TRANSLATIONS is a {lang: {key: text}} dictionary defined earlier in the app
def get_text(key, lang='en'):
    """Get translated text for the given key and language"""
    return TRANSLATIONS.get(lang, TRANSLATIONS['en']).get(key, key)

# Helper Functions
def generate_random_id(length=8):
    """Generate a random ID for store naming"""
    return ''.join(random.choices(string.ascii_lowercase + string.digits, k=length))

def wait_operation(client, op, sleep_sec=2, max_wait_sec=300):
    """Wait for Operations API to complete with timeout"""
    start = time.time()
    while not op.done:
        if time.time() - start > max_wait_sec:
            raise TimeoutError("Operation timed out.")
        time.sleep(sleep_sec)
        op = client.operations.get(op)
    return op

def extract_text_from_pdf(pdf_file, lang='en'):
    """Extract text content from uploaded PDF file"""
    try:
        pdf_reader = PdfReader(pdf_file)
        text = ""
        for page in pdf_reader.pages:
            text += page.extract_text() + "\n"
        return text
    except Exception as e:
        st.error(get_text('error_pdf_extract', lang).format(e))
        return None

Building on these utilities, I created three more functions to handle file management and storage. The save_uploaded_file(uploaded_file, lang='en') function takes care of saving uploaded files temporarily - it creates a temporary file that won't auto-delete, adds a .pdf extension to it, writes the uploaded file's content into it using getvalue(), and returns the file path so we can use it later with the other functions.

Next, create_file_search_store(client, store_name, lang='en') sets up a new storage space using the random ID from generate_random_id. It calls the API to create a file search store with a custom display name and returns the store object if successful, or shows an error message via get_text and returns None if it fails.

The last function, upload_file_to_store(client, file_path, store_name, display_name, lang='en'), actually uploads files into the store - it sends the file to the specified store using the API, adds some metadata like the source being "streamlit_upload" and a timestamp of when it was uploaded, then waits for the upload to complete using my wait_operation function from earlier, and returns the response once it's done.

def save_uploaded_file(uploaded_file, lang='en'):
    """Save uploaded file to temporary location"""
    try:
        with tempfile.NamedTemporaryFile(delete=False, suffix='.pdf') as tmp_file:
            tmp_file.write(uploaded_file.getvalue())
            return tmp_file.name
    except Exception as e:
        st.error(get_text('error_save_file', lang).format(e))
        return None

def create_file_search_store(client, store_name, lang='en'):
    """Create a new File Search Store"""
    try:
        store = client.file_search_stores.create(
            config={'display_name': store_name}
        )
        return store
    except Exception as e:
        st.error(get_text('error_create_store', lang).format(e))
        return None

def upload_file_to_store(client, file_path, store_name, display_name, lang='en'):
    """Upload file to File Search Store"""
    try:
        upload_op = client.file_search_stores.upload_to_file_search_store(
            file=file_path,
            file_search_store_name=store_name,
            config={
                'display_name': display_name,
                'custom_metadata': [
                    {"key": "source", "string_value": "streamlit_upload"},
                    {"key": "timestamp", "numeric_value": int(time.time())}
                ]
            }
        )
        upload_op = wait_operation(client, upload_op)
        return upload_op.response
    except Exception as e:
        st.error(get_text('error_upload_store', lang).format(e))
        return None

I built two final functions that let users interact with the uploaded files and clean up afterwards. The query_file_search(client, question, store_name, model, lang='en') function is where the magic happens - it takes a user's question and searches through the files in the store by calling the AI model with special file search tools configured.

It passes the question to the model along with a reference to the store name we created earlier, and the model automatically searches through all the uploaded files to find relevant information and generate an answer.

Like the other functions, it uses get_text for error messages and returns None if something goes wrong. After the user is done working with their files, cleanup_store(client, store_name, lang='en') handles the cleanup: it deletes the entire file search store, including all uploaded files, by calling the delete API with the force: True flag to make sure everything gets removed, returns True if successful or False if it fails, and shows an error message using the translation helper if anything breaks during deletion.

def query_file_search(client, question, store_name, model, lang='en'):
    """Query the File Search Store with a question"""
    try:
        response = client.models.generate_content(
            model=model,
            contents=question,
            config=types.GenerateContentConfig(
                tools=[
                    types.Tool(
                        file_search=types.FileSearch(
                            file_search_store_names=[store_name]
                        )
                    )
                ]
            )
        )
        return response
    except Exception as e:
        st.error(get_text('error_query', lang).format(e))
        return None

def cleanup_store(client, store_name, lang='en'):
    """Delete the File Search Store"""
    try:
        client.file_search_stores.delete(
            name=store_name,
            config={'force': True}
        )
        return True
    except Exception as e:
        st.error(get_text('error_cleanup', lang).format(e))
        return False
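
To show how all of these pieces fit together, here is a condensed sketch of the Streamlit flow that wires the helpers up. The widget layout is simplified from the actual app, so treat the model picker and labels as illustrative.

# Condensed sketch of the main Streamlit flow wiring the helpers together.
# The widget layout is simplified; the real app adds translations and styling.
st.title("Gemini File Search Chat")
model = st.sidebar.selectbox("Model", ["gemini-2.5-flash", "gemini-2.5-pro"])

client = genai.Client(api_key=os.getenv("GEMINI_API_KEY"))
uploaded = st.file_uploader("Upload a PDF", type="pdf")

if uploaded and "store" not in st.session_state:
    path = save_uploaded_file(uploaded)
    store = create_file_search_store(client, f"store-{generate_random_id()}")
    if path and store:
        upload_file_to_store(client, path, store.name, uploaded.name)
        st.session_state.store = store.name

if "store" in st.session_state:
    question = st.text_input("Ask a question about the PDF")
    if question:
        response = query_file_search(client, question,
                                     st.session_state.store, model)
        if response:
            st.write(response.text)
    if st.button("Clear PDF"):
        cleanup_store(client, st.session_state.store)
        del st.session_state.store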

Conclusion

The file search tool puts the advanced technology of RAG within the reach of all developers, not just a select few experts. This is truly the “democratisation of RAG.”

Freed from worries about complex infrastructure and costs, developers can focus on developing more creative applications that directly address user challenges.

Combining your unique data with Gemini’s powerful intelligence will create new business value that was previously impossible. Let’s use this new tool to create the applications of the future!
