
David Mezzetti for NeuML

Originally published at neuml.hashnode.dev

Prompt-driven search with LLMs

This article revisits the RAG pipeline, which has been covered in a number of previous articles. This pipeline is a combination of a similarity instance (embeddings or similarity pipeline) to build a question context and a model that answers questions.

The RAG pipeline recently underwent a number of major upgrades to support the following.

  • Ability to run embeddings searches. When content storage is enabled, text can be retrieved directly from the embeddings instance.
  • Support for text generation models, sequence-to-sequence models and custom pipelines, in addition to extractive question-answering

These changes enable embeddings-guided and prompt-driven search with Large Language Models (LLMs) 🔥🔥🔥
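For example, once txtai is installed (next section), the same RAG constructor accepts any of these model types. This is a minimal sketch and the model names below are only examples.

from txtai import Embeddings, RAG

embeddings = Embeddings(path="sentence-transformers/all-MiniLM-L6-v2", content=True)

# Prompt-driven LLM
rag = RAG(embeddings, "Qwen/Qwen3-4B-Instruct-2507")

# Sequence to sequence model
rag = RAG(embeddings, "google/flan-t5-base")

# Extractive QA model
rag = RAG(embeddings, "distilbert-base-cased-distilled-squad")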

Install dependencies

Install txtai and all dependencies.

# Install txtai and the datasets library
pip install txtai datasets

Create Embeddings and RAG instances

An Embeddings instance defines methods to represent text as vectors and build vector indexes for search.
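For instance, an index can be built and searched in a few lines. This is a minimal standalone sketch with made-up sample text.

from txtai import Embeddings

# Build a small index with content storage enabled so text is retrievable
embeddings = Embeddings(path="sentence-transformers/all-MiniLM-L6-v2", content=True)
embeddings.index(["US tops 5 million confirmed virus cases",
                  "Canada's last fully intact ice shelf has suddenly collapsed"])

# Vector search returns the stored text along with a similarity score
print(embeddings.search("climate change", 1))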

The RAG pipeline is a combination of a similarity instance (embeddings or similarity pipeline) to build a question context and a model that answers questions. The model can be a prompt-driven large language model (LLM), an extractive question-answering model or a custom pipeline.

Let's run a basic example.

from txtai import Embeddings, RAG

# Create embeddings model with content support
embeddings = Embeddings(path="sentence-transformers/all-MiniLM-L6-v2", content=True)

# Create the RAG pipeline
rag = RAG(embeddings, "Qwen/Qwen3-4B-Instruct-2507", template="""
  Answer the following question using the provided context.

  Question:
  {question}

  Context:
  {context}
""")
data = ["Giants hit 3 HRs to down Dodgers",
        "Giants 5 Dodgers 4 final",
        "Dodgers drop Game 2 against the Giants, 5-4",
        "Blue Jays beat Red Sox final score 2-1",
        "Red Sox lost to the Blue Jays, 2-1",
        "Blue Jays at Red Sox is over. Score: 2-1",
        "Phillies win over the Braves, 5-0",
        "Phillies 5 Braves 0 final",
        "Final: Braves lose to the Phillies in the series opener, 5-0",
        "Lightning goaltender pulled, lose to Flyers 4-1",
        "Flyers 4 Lightning 1 final",
        "Flyers win 4-1"]

questions = ["What team won the game?", "What was the score?"]

for query in ["Red Sox - Blue Jays", "Phillies - Braves", "Dodgers - Giants", "Flyers - Lightning"]:
    print("----", query, "----")
    for answer in rag([f"{query} {x}" for x in questions], data):
        print(answer)
    print()

---- Red Sox - Blue Jays ----
{'answer': 'The Blue Jays won the game.'}
{'answer': 'The score was 2-1 in favor of the Blue Jays.'}

---- Phillies - Braves ----
{'answer': 'The Phillies won the game.'}
{'answer': 'The score was 5-0 in favor of the Phillies.'}

---- Dodgers - Giants ----
{'answer': 'The Giants won the game.'}
{'answer': 'The score was Giants 5, Dodgers 4.'}

---- Flyers - Lightning ----
{'answer': 'The Flyers won the game.'}
{'answer': 'The score was Flyers 4, Lightning 1.'}

This code runs a series of questions. First, it runs an embeddings filtering query to find the text most relevant to each query. For example, Red Sox - Blue Jays finds text related to those teams. Then What team won the game? and What was the score? are asked against that filtered context.

This is the same logic found in "Extractive QA with txtai" but uses prompt-based QA instead of extractive QA.
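To see the filtering step in isolation, the embeddings instance can rank the raw text against a query directly. This is an illustrative sketch of that step (not the pipeline internals), reusing the data list from above.

# Rank the raw text against a query and show the top matches
# that would form the question context
query = "Red Sox - Blue Jays"
for uid, score in embeddings.similarity(query, data)[:3]:
    print(round(score, 4), data[uid])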

Embeddings-guided and Prompt-driven Search

Now for the fun stuff. Let's build an embeddings index for the ag_news dataset (a set of news stories from the mid 2000s). Then we'll use prompts to ask questions with embeddings results as the context.

from datasets import load_dataset

dataset = load_dataset("ag_news", split="train")

# Create an embeddings index over the dataset
embeddings = Embeddings(path="sentence-transformers/all-MiniLM-L6-v2", content=True)
embeddings.index(dataset["text"])

# Create RAG instance
rag = RAG(embeddings, "Qwen/Qwen3-4B-Instruct-2507", template="""
  Answer the following question using the provided context.

  Question:
  {question}

  Context:
  {context}
""", output="flatten")

The output="flatten" parameter returns each answer as a plain string rather than a dictionary. Now let's run a prompt-driven search!

question = "Who won the 2004 presidential election?"
answer = rag(question)
print(question, answer)

nquestion = "Who did the candidate beat?"
print(nquestion, rag(f"{question} {answer}. {nquestion}"))

Who won the 2004 presidential election? George W. Bush won the 2004 presidential election.
Who did the candidate beat? George W. Bush beat John F. Kerry in the 2004 presidential election.

And there are the answers. Let's unpack how this works.

The first thing the RAG pipeline does is run an embeddings search to find the most relevant text within the index. A context string is then built using those search results.

After that, a prompt is generated and run, and the answer is printed.
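Those steps can be approximated manually. The sketch below mirrors what the pipeline does rather than reproducing its exact internals.

from txtai import LLM

question = "Who won the 2004 presidential election?"

# 1. Run an embeddings search for the most relevant results
results = embeddings.search(question, 3)

# 2. Build a context string from the result text
context = "\n".join(x["text"] for x in results)

# 3. Generate the prompt and run it with an LLM
prompt = f"""
  Answer the following question using the provided context.

  Question:
  {question}

  Context:
  {context}
"""

llm = LLM("Qwen/Qwen3-4B-Instruct-2507")
print(llm(prompt))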

Additional examples

Before moving on, a couple more example questions.

question = "Who won the World Series in 2004?"
answer = rag(question)
print(question, answer)

nquestion = "What team did the Red Sox beat in the World Series?"
print(nquestion, rag(f"{question} {answer}. {nquestion}"))

Who won the World Series in 2004? The Boston Red Sox won the World Series in 2004.
What team did the Red Sox beat in the World Series? The Boston Red Sox beat the St. Louis Cardinals in the World Series.

rag("Tell me something interesting")

An interesting fact is that herrings communicate by farting—a quirky and unusual discovery that was honored with an Ig Nobel Prize for its oddball research.

Whhaaaattt??? Is this a model hallucination?

Let's run an embeddings query and see if that text is in the results.

# Search the index and check whether the text actually exists in the results
answer = "herrings communicate by farting"
for x in embeddings.search("Tell me something interesting"):
  if answer in x["text"]:
    start = x["text"].find(answer)
    print(x["text"][start:start + len(answer)])

herrings communicate by farting

Sure enough it is 😃

Wrapping up

This article covered how to run embeddings-guided and prompt-driven search with LLMs. This functionality is a major step towards Generative Semantic Search for txtai. More to come, stay tuned!
