Comprehending RAGs with a keyword search [LLM A1]

Large Language Models (LLMs) have reshaped the IT industry over the past few years, and I’m no exception to the wave of adoption. They’ve become a go-to assistant for both day-to-day questions and deeper research tasks.

Whether the goal is business innovation or a personal project, building an effective GenAI workflow means understanding a handful of core capabilities and components. One of the most important is Retrieval-Augmented Generation (RAG).


🧠 What is RAG?

RAG is a technique (or design pattern) that improves the quality and relevance of LLM responses by combining three core capabilities:

  • Retrieval: Pulling relevant documents or facts from an external knowledge source (like a vector database or search engine).
  • Augmentation: Enriching the model’s input prompt by injecting the retrieved content, providing context the LLM alone wouldn't otherwise have.
  • Generation: Using a foundation model (e.g., from OpenAI, AWS, etc.) to generate a natural language answer from the augmented prompt.
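
To make the flow concrete, here's a minimal sketch of how the three steps compose. The `retrieve` and `generate` callables are hypothetical stand-ins for the search backend and model client we'll build below:

def rag_answer(question, retrieve, generate):
    docs = retrieve(question)                                # Retrieval
    context = "\n\n".join(doc["text"] for doc in docs)
    prompt = f"CONTEXT:\n{context}\n\nQUESTION: {question}"  # Augmentation
    return generate(prompt)                                  # Generation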

⚽ Example Use Case

Imagine you’re building a sports Q&A assistant. A user asks:

"Who scored the winning goal in the last Barcelona match?"

Rather than relying solely on the LLM’s trained knowledge (which may be outdated), the RAG pipeline:

  • Retrieves up-to-date match stats, player performance data, and venue context from your internal index or search service.
  • Appends that information to the user's question.
  • Sends the enriched prompt to the LLM, which then generates a precise and relevant answer.
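
The enriched prompt the model ultimately sees is shaped roughly like this (angle-bracket placeholders stand in for the retrieved match details):

CONTEXT:
- Match: Barcelona vs <opponent>, <date>
- Winning goal: <player>, <minute>
- Venue: <stadium>

QUESTION: Who scored the winning goal in the last Barcelona match?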

RAG Architecture


Build a Simple GenAI Workflow with Elasticsearch and OpenAI

🛠️ Setup and Code Walkthrough

1. Start Elasticsearch via Docker

docker run -it \
  --name elasticsearch \
  -p 9200:9200 \
  -p 9300:9300 \
  -e "discovery.type=single-node" \
  -e "xpack.security.enabled=false" \
  docker.elastic.co/elasticsearch/elasticsearch:8.17.6
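
Once the container is up, a quick request to port 9200 should return the cluster info as JSON:

curl http://localhost:9200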

2. Index Documents into Elasticsearch

from elasticsearch import Elasticsearch
from tqdm.auto import tqdm

# Connect to local Elasticsearch
es_client = Elasticsearch("http://localhost:9200")

# Define schema for indexing
index_settings = {
    "settings": {"number_of_shards": 1, "number_of_replicas": 0},
    "mappings": {
        "properties": {
            "text": {"type": "text"},
            "section": {"type": "text"},
            "question": {"type": "text"},
            "course": {"type": "keyword"}
        }
    }
}

index_name = "course-questions"
es_client.indices.create(index=index_name, body=index_settings)

# Index your documents (replace `docs` with your actual dataset)
for doc in tqdm(docs):
    es_client.index(index=index_name, document=doc)
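
Each entry in `docs` should match the mapping defined above. A minimal hypothetical record looks like this:

# One hypothetical FAQ record -- replace with your real dataset
docs = [
    {
        "question": "Can I still join the course?",
        "text": "Yes, you can enroll even after the start date.",
        "section": "General course-related questions",
        "course": "data-engineering-zoomcamp",
    },
]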

3. Define the Retrieval Logic (Elasticsearch)

def elastic_search(query):
    search_query = {
        "size": 5,
        "query": {
            "bool": {
                "must": {
                    "multi_match": {
                        "query": query,
                        "fields": ["question^3", "text", "section"],
                        "type": "best_fields"
                    }
                },
                "filter": {
                    "term": {
                        "course": "data-engineering-zoomcamp"
                    }
                }
            }
        }
    }

    response = es_client.search(index=index_name, body=search_query)

    result_docs = []
    for hit in response['hits']['hits']:
        result_docs.append(hit['_source'])

    return result_docs
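
Before wiring in the LLM, you can sanity-check the retrieval step on its own:

# Should print up to 5 matching FAQ questions from the index
for doc in elastic_search("can I still join the course?"):
    print(doc["question"])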

4. Build the Prompt for the LLM

def build_prompt(q_question, search_results):
    prompt_template = """
You're a course teaching assistant. Answer the QUESTION based on the CONTEXT.
Use only the facts from the CONTEXT when answering the QUESTION.
If the CONTEXT doesn't contain the answer, output NONE.

QUESTION: {question}

CONTEXT: {context}
""".strip()

    # Concatenate the retrieved documents into a single context block
    context = ""
    for doc in search_results:
        context += f"section: {doc['section']}\nquestion: {doc['question']}\nanswer: {doc['text']}\n\n"

    prompt = prompt_template.format(question=q_question, context=context).strip()
    return prompt
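
It's worth printing the assembled prompt once to confirm the retrieved context is injected as expected:

query = "can I still join the course?"
print(build_prompt(query, elastic_search(query)))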

5. Generate the Response with the LLM


from openai import OpenAI

# The client reads the OPENAI_API_KEY environment variable by default
client = OpenAI()

# Function to query the LLM (here, OpenAI's chat completions API)
def query_llm(mod_prompt):
    response = client.chat.completions.create(
        model='gpt-4o',
        messages=[{"role": "user", "content": mod_prompt}]
    )
    return response.choices[0].message.content

# Combine everything in a RAG-style function
def rag(query):
    results = elastic_search(query)
    prompt = build_prompt(query, results)  # Defined in step 4 above
    return query_llm(prompt)

# Try it out
query = "can I still join the course?"
print(rag(query))  # Example output: "Yes, you can still join the course."

