David Mezzetti for NeuML

Originally published at neuml.hashnode.dev

RAG is more than Vector Search

Retrieval Augmented Generation (RAG) is often associated with vector search. And while that is a primary use case, any search will do.

  • ✅ Vector Search
  • ✅ Web Search
  • ✅ SQL Query

This article will go over a few RAG examples covering different retrieval methods. These examples require txtai 9.3+.
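All three examples share the same retrieval contract: the RAG pipeline accepts any callable that takes a list of queries and a limit, and returns one list of result dicts per query. Here's a minimal sketch of that shape, inferred from the examples below (the function name and placeholder text are illustrative):

# A minimal retrieval callable: one list of {id, text, score} dicts per input query
def retrieve(queries, limit):
    return [
        [{"id": 0, "text": "example context passage", "score": 1.0}][:limit]
        for _ in queries
    ]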

Install dependencies

Install txtai and all dependencies.

pip install txtai[pipeline-data]

# The web search example below also uses smolagents
pip install smolagents

# Download example SQL database
wget https://huggingface.co/NeuML/txtai-wikipedia-slim/resolve/main/documents
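Optionally, a quick check that the installed version meets the 9.3+ requirement (standard library only, nothing txtai-specific):

# Verify the installed txtai version
from importlib.metadata import version
print(version("txtai"))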

RAG with Late Interaction

The first example will cover RAG with ColBERT / Late Interaction retrieval. txtai 9.0 added support for MUVERA and ColBERT multi-vector ranking.

We'll build a pipeline that reads the ColBERT v2 paper, extracts the text into sections and builds an index with a ColBERT model. Then we'll wrap that index as a Reranker pipeline using the same ColBERT model. Finally, a RAG pipeline will use this reranker for retrieval.

Note: This uses the custom ColBERT Muvera Nano model, which has only 970K parameters! That's right, thousands. It's surprisingly effective.

from txtai import Embeddings, RAG, Textractor
from txtai.pipeline import Reranker, Similarity

# Get text from ColBERT v2 paper
textractor = Textractor(sections=True, backend="docling")
data = textractor("https://arxiv.org/pdf/2112.01488")

# MUVERA fixed dimensional encodings
embeddings = Embeddings(content=True, path="neuml/colbert-muvera-nano", vectors={"trust_remote_code": True})
embeddings.index(data)

# Re-rank using same late interaction model
reranker = Reranker(embeddings, Similarity("neuml/colbert-muvera-nano", lateencode=True, vectors={"trust_remote_code": True}))

template = """
  Answer the following question using the provided context.

  Question:
  {question}

  Context:
  {context}
"""

# RAG with late interaction models
rag = RAG(reranker, "Qwen/Qwen3-4B-Instruct-2507", template=template, output="flatten")
print(rag("Write a sentence abstract about this paper", maxlength=2048))
This paper introduces ColBERTv2, a neural information retrieval model that enhances the quality and efficiency of late interaction by combining an aggressive residual compression mechanism with a denoised supervision strategy, achieving state-of-the-art performance across diverse benchmarks while reducing the model's space footprint by 6–10× compared to previous methods.
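Before generation, it's easy to inspect what retrieval sees. Querying the embeddings index directly uses the standard txtai search API; the query string here is illustrative:

# Preview the top indexed sections for a query before they reach the LLM
for result in embeddings.search("late interaction retrieval", 3):
    print(round(result["score"], 4), result["text"][:100])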

RAG with a Web Search

Next, we'll run a RAG pipeline using a web search as the retrieval method, powered by the WebSearchTool from smolagents.

from smolagents import WebSearchTool

tool = WebSearchTool()

def websearch(queries, limit):
    results = []
    for query in queries:
        # Map each web result into the {id, text, score} format the RAG pipeline expects
        result = [
            {"id": i, "text": f'{x["title"]} {x["description"]}', "score": 1.0} for i, x in enumerate(tool.search(query))
        ]
        results.append(result[:limit])

    return results

# RAG with a websearch
rag = RAG(websearch, "Qwen/Qwen3-4B-Instruct-2507", template=template, output="flatten")
print(rag("What is AI?", maxlength=2048))
Artificial intelligence (AI) is the capability of computational systems to perform tasks typically associated with human intelligence, such as learning, reasoning, problem-solving, perception, and decision-making. It involves technologies like machine learning, deep learning, and natural language processing, and enables machines to simulate human-like learning, comprehension, problem solving, decision-making, creativity, and autonomy.
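The retrieval function can also be called on its own to preview the raw context passed to the LLM (the query and limit here are illustrative):

# Preview the web results that would be passed as context
for result in websearch(["What is AI?"], 3)[0]:
    print(result["text"])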

RAG with a SQL Query

The last example we'll cover is running RAG with a SQL query. We'll use the SQL database that's a component of the txtai-wikipedia-slim embeddings database.

Since this is just a database with Wikipedia abstracts, we'll need a way to build a SQL query from a search query. For that we'll use an LLM to extract a keyword to use in a LIKE clause.

Given that the LLM used was released in August 2025, let's ask it a question that can only be accurately answered with external data: Who won the 2025 World Series? That series ended in November, well after the model's release.

import sqlite3

from txtai import LLM

def keyword(query):
    return llm(f"""
        Extract a keyword for this search query: {query}.
        Return only text with no other formatting or explanation.
    """)

def sqlsearch(queries, limit):
    results = []
    sql = "SELECT id, text FROM sections WHERE id LIKE ? LIMIT ?"

    for query in queries:
        # Extract a keyword for this search
        query = keyword(query)

        # Run the SQL query
        results.append([
            {"id": uid, "text": text, "score": 1.0}
            for uid, text in connection.execute(sql, [f"%{query}%", limit])
        ])

    return results

# Load the database (sqlite3 connections support execute directly)
connection = sqlite3.connect("documents")

# Load the LLM
llm = LLM("Qwen/Qwen3-4B-Instruct-2507")

# RAG with a SQL query
rag = RAG(sqlsearch, llm, template=template, output="flatten")
print(rag("Tell me what happened in the 2025 World Series", maxlength=2048))
In the 2025 World Series, the Los Angeles Dodgers defeated the Toronto Blue Jays in seven games to win the championship. The series took place from October 24 to November 1 (ending early on November 2, Toronto time). Dodgers pitcher Yoshinobu Yamamoto was named the World Series MVP. The series was televised by Fox in the United States and by Sportsnet in Canada.
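It can also be useful to check the intermediate keyword the LLM extracts, since that's what drives the LIKE clause:

# Show the keyword extracted for the LIKE clause
print(keyword("Tell me what happened in the 2025 World Series"))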

Wrapping up

This article showed that RAG is about much more than vector search. With txtai 9.3+, any callable is supported as the retrieval step. Enjoy!
