David Mezzetti for NeuML

Originally published at neuml.hashnode.dev

RAG is more than Vector Search

Retrieval Augmented Generation (RAG) is often associated with vector search. And while that is a primary use case, any search will do.

  • ✅ Vector Search
  • ✅ Web Search
  • ✅ SQL Query

This article will go over a few RAG examples covering different retrieval methods. These examples require txtai 9.3+.
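All three examples share the same retrieval contract: the RAG pipeline accepts any callable that takes a list of queries and a limit, and returns one list of result dicts per query. Here's a minimal sketch of that shape, inferred from the examples below (the function name and placeholder text are illustrative):

# A minimal retrieval callable: one list of {id, text, score} dicts per input query
def retrieve(queries, limit):
    return [
        [{"id": 0, "text": "example context passage", "score": 1.0}][:limit]
        for _ in queries
    ]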

Install dependencies

Install txtai and all dependencies.

pip install txtai[pipeline-data]

# The web search example below also uses smolagents
pip install smolagents

# Download example SQL database
wget https://huggingface.co/NeuML/txtai-wikipedia-slim/resolve/main/documents
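Optionally, a quick check that the installed version meets the 9.3+ requirement (standard library only, nothing txtai-specific):

# Verify the installed txtai version
from importlib.metadata import version
print(version("txtai"))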

RAG with Late Interaction

The first example will cover RAG with ColBERT / Late Interaction retrieval. txtai 9.0 added support for MUVERA and ColBERT multi-vector ranking.

We'll build a pipeline that reads the ColBERT v2 paper, extracts the text into sections and builds an index with a ColBERT model. Then we'll wrap that index as a Reranker pipeline using the same ColBERT model. Finally, a RAG pipeline will use this reranker for retrieval.

Note: This uses the custom ColBERT Muvera Nano model, which has only 970K parameters! That's right, thousands. It's surprisingly effective.

from txtai import Embeddings, RAG, Textractor
from txtai.pipeline import Reranker, Similarity

# Get text from ColBERT v2 paper
textractor = Textractor(sections=True, backend="docling")
data = textractor("https://arxiv.org/pdf/2112.01488")

# MUVERA fixed dimensional encodings
embeddings = Embeddings(content=True, path="neuml/colbert-muvera-nano", vectors={"trust_remote_code": True})
embeddings.index(data)

# Re-rank using same late interaction model
reranker = Reranker(embeddings, Similarity("neuml/colbert-muvera-nano", lateencode=True, vectors={"trust_remote_code": True}))

template = """
  Answer the following question using the provided context.

  Question:
  {question}

  Context:
  {context}
"""

# RAG with late interaction models
rag = RAG(reranker, "Qwen/Qwen3-4B-Instruct-2507", template=template, output="flatten")
print(rag("Write a sentence abstract about this paper", maxlength=2048))
This paper introduces ColBERTv2, a neural information retrieval model that enhances the quality and efficiency of late interaction by combining an aggressive residual compression mechanism with a denoised supervision strategy, achieving state-of-the-art performance across diverse benchmarks while reducing the model's space footprint by 6–10× compared to previous methods.
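Before generation, it's easy to inspect what retrieval sees. Querying the embeddings index directly uses the standard txtai search API; the query string here is illustrative:

# Preview the top indexed sections for a query before they reach the LLM
for result in embeddings.search("late interaction retrieval", 3):
    print(round(result["score"], 4), result["text"][:100])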

RAG with a Web Search

Next, we'll run a RAG pipeline using a web search as the retrieval method, powered by the WebSearchTool from smolagents.

from smolagents import WebSearchTool

tool = WebSearchTool()

def websearch(queries, limit):
    results = []
    for query in queries:
        # Map each web result into the {id, text, score} format the RAG pipeline expects
        result = [
            {"id": i, "text": f'{x["title"]} {x["description"]}', "score": 1.0} for i, x in enumerate(tool.search(query))
        ]
        results.append(result[:limit])

    return results

# RAG with a websearch
rag = RAG(websearch, "Qwen/Qwen3-4B-Instruct-2507", template=template, output="flatten")
print(rag("What is AI?", maxlength=2048))
Artificial intelligence (AI) is the capability of computational systems to perform tasks typically associated with human intelligence, such as learning, reasoning, problem-solving, perception, and decision-making. It involves technologies like machine learning, deep learning, and natural language processing, and enables machines to simulate human-like learning, comprehension, problem solving, decision-making, creativity, and autonomy.
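The retrieval function can also be called on its own to preview the raw context passed to the LLM (the query and limit here are illustrative):

# Preview the web results that would be passed as context
for result in websearch(["What is AI?"], 3)[0]:
    print(result["text"])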

RAG with a SQL Query

The last example we'll cover is running RAG with a SQL query. We'll use the SQL database that's a component of the txtai-wikipedia-slim embeddings database.

Since this is just a database with Wikipedia abstracts, we'll need a way to build a SQL query from a search query. For that we'll use an LLM to extract a keyword to use in a LIKE clause.

Given that the LLM used was released in August 2025, let's ask it a question that can only be accurately answered with external data: Who won the 2025 World Series? That series ended in November, well after the model's release.

import sqlite3

from txtai import LLM

def keyword(query):
    return llm(f"""
        Extract a keyword for this search query: {query}.
        Return only text with no other formatting or explanation.
    """)

def sqlsearch(queries, limit):
    results = []
    sql = "SELECT id, text FROM sections WHERE id LIKE ? LIMIT ?"

    for query in queries:
        # Extract a keyword for this search
        query = keyword(query)

        # Run the SQL query
        results.append([
            {"id": uid, "text": text, "score": 1.0}
            for uid, text in connection.execute(sql, [f"%{query}%", limit])
        ])

    return results

# Load the database (sqlite3 connections support execute directly)
connection = sqlite3.connect("documents")

# Load the LLM
llm = LLM("Qwen/Qwen3-4B-Instruct-2507")

# RAG with a SQL query
rag = RAG(sqlsearch, llm, template=template, output="flatten")
print(rag("Tell me what happened in the 2025 World Series", maxlength=2048))
In the 2025 World Series, the Los Angeles Dodgers defeated the Toronto Blue Jays in seven games to win the championship. The series took place from October 24 to November 1 (ending early on November 2, Toronto time). Dodgers pitcher Yoshinobu Yamamoto was named the World Series MVP. The series was televised by Fox in the United States and by Sportsnet in Canada.
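It can also be useful to check the intermediate keyword the LLM extracts, since that's what drives the LIKE clause:

# Show the keyword extracted for the LIKE clause
print(keyword("Tell me what happened in the 2025 World Series"))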

Wrapping up

This article showed that RAG is about much more than vector search. With txtai 9.3+, any callable is supported as the retrieval step. Enjoy!
