Abdelrahman Adnan

🚀 LLM Zoomcamp Week 1: Retrieval-Augmented Generation (RAG)

🧩 The Big Picture: What is RAG?

Retrieval-Augmented Generation (RAG) is like giving your AI assistant a personalized reference library! It combines:

  1. Retrieval 🔍: A smart search engine that finds relevant facts
  2. Generation ✍️: An LLM that crafts human-like responses

🤖 Why Do We Need RAG?

Standard LLMs have two major limitations:

  • They only know what they were trained on (often outdated) ⏰
  • They sometimes hallucinate (make things up) 🤪

RAG solves both problems by feeding the model fresh, relevant information at query time!

πŸ› οΈ How RAG Actually Works

The Simple Flow:

User Question → Search Engine → Relevant Docs → Enhanced Prompt → LLM → Answer

The Magic Principle 💫

LLMs prioritize information in the immediate context over what they've memorized during training. By placing facts directly in the prompt, we dramatically increase accuracy!
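
A toy illustration of the principle (the date in the context string is invented for the example):

question = "When does the course start?"

# A fact "retrieved" at query time -- purely hypothetical, for illustration
context = "FAQ: The next cohort starts on 15 January."

# The model can now answer from the prompt itself instead of relying on
# whatever (possibly stale) date it memorized during training.
prompt = f"Answer using only this CONTEXT.\n\nCONTEXT: {context}\n\nQUESTION: {question}"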

🔧 Building Your Own RAG System

1️⃣ Set Up Your Knowledge Base

# Download your documents (FAQs, manuals, etc.)
!wget https://raw.githubusercontent.com/DataTalksClub/llm-zoomcamp/main/01-intro/documents.json

# Start Elasticsearch (the search engine we'll use) in Docker
!docker run -d -p 9200:9200 -p 9300:9300 -m 4g \
  -e "discovery.type=single-node" \
  -e "xpack.security.enabled=false" \
  docker.elastic.co/elasticsearch/elasticsearch:8.4.3
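Then load the documents into Python. In this dataset, documents.json groups the FAQ records by course, so we flatten it and tag each record with its course name:

import json

with open("documents.json", "rt") as f_in:
    docs_raw = json.load(f_in)

documents = []
for course_dict in docs_raw:
    for doc in course_dict["documents"]:
        doc["course"] = course_dict["course"]  # remember which course the entry came from
        documents.append(doc)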

2️⃣ Index Your Documents

from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

# Create an index with text mappings; "course" is a keyword field
# so the term filter in step 3 can match it exactly
es.indices.create(
    index="course_questions",
    settings={"number_of_shards": 1, "number_of_replicas": 0},
    mappings={"properties": {"question": {"type": "text"}, "text": {"type": "text"},
                             "section": {"type": "text"}, "course": {"type": "keyword"}}},
)

# Add all documents to ElasticSearch
for doc in documents:
    es.index(index="course_questions", document=doc)
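A quick sanity check that everything landed (refresh first so the count is up to date):

es.indices.refresh(index="course_questions")
print(es.count(index="course_questions")["count"])  # should match len(documents)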

3️⃣ Create Your Search Function

def elastic_search(query: str) -> list:
    """Find documents relevant to the query"""
    search_query = {
        "query": {
            "bool": {
                "must": {
                    "multi_match": {
                        "query": query,
                        "fields": ["question^3", "text", "section"],
                        "type": "best_fields"
                    }
                },
                "filter": {"term": {"course": "data-engineering-zoomcamp"}}
            }
        },
        "size": 5
    }

    response = es.search(index="course_questions", body=search_query)
    results = [hit["_source"] for hit in response["hits"]["hits"]]
    return results
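A quick test of the search function (assuming the index from step 2 is populated):

results = elastic_search("How do I run Kafka?")
for doc in results:
    print(doc["question"])  # the five FAQ questions most relevant to the query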

4️⃣ Build Your Prompt Constructor

def build_prompt(query: str, search_results: list) -> str:
    """Format retrieved context and query into effective prompt"""

    # Format context from search results
    context_parts = []
    for i, doc in enumerate(search_results, 1):
        context_parts.append(
            f"DOCUMENT {i}:\n"
            f"Section: {doc['section']}\n"
            f"Question: {doc['question']}\n"
            f"Answer: {doc['text']}\n"
        )
    context = "\n".join(context_parts)

    # Create instruction-focused prompt
    prompt = f"""You are a course teaching assistant for the DataTalks.Club courses.
CONTEXT:
{context}

QUESTION: {query}

Answer the QUESTION using ONLY information from the CONTEXT.
If you cannot answer based on the CONTEXT alone, say "I don't have enough information to answer that."
Keep your answer concise and helpful for a student."""

    return prompt.strip()

5️⃣ Connect to an LLM

import os

from groq import Groq


def llm(prompt: str, model: str = "llama3-8b-8192") -> str:
    """Generate a response using the Groq LLM API"""
    client = Groq(api_key=os.getenv("GROQ_API_KEY"))

    completion = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}]
    )

    return completion.choices[0].message.content

6️⃣ Combine Everything in a RAG Pipeline

def rag(query: str, model: str = "llama3-8b-8192") -> str:
    """Complete RAG pipeline from query to answer"""
    # 1. Retrieve relevant documents
    search_results = elastic_search(query)

    # 2. Build enhanced prompt with context
    prompt = build_prompt(query, search_results)

    # 3. Generate answer with LLM
    answer = llm(prompt, model)

    return answer
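That's the whole system. One call now runs retrieval, prompt construction, and generation:

answer = rag("The course has already started. Can I still enroll?")
print(answer)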

🔥 Real-World Examples

Example 1: Course FAQ

User: "How do I run Kafka locally?"

Without RAG: "To run Kafka locally, you typically need to download Kafka, extract it, start Zookeeper, then start the Kafka server. You can do this by running bin/zookeeper-server-start.sh and bin/kafka-server-start.sh with their respective configuration files..." (Potentially incorrect or outdated)

With RAG: "According to the course materials, you can run Kafka locally using Docker Compose. The docker-compose.yml file is provided in the 'week_6_stream_processing' folder. Simply navigate to that directory and run 'docker-compose up' to start Kafka and Zookeeper together." (Accurate based on course-specific instructions)

Example 2: Technical Troubleshooting

User: "My Spark job is failing with OOM errors"

Without RAG: "Out of Memory errors in Spark can happen for various reasons..." (Generic advice)

With RAG: "In the Data Engineering Zoomcamp, this issue is addressed in the FAQ. You should try increasing the memory allocation for the Spark driver and executor by setting 'spark.driver.memory' and 'spark.executor.memory' in your SparkSession configuration. The course specifically recommends starting with '2g' for both settings when working with the NYC taxi dataset." (Specific to course materials)

📈 Advanced RAG Techniques

🧠 1. Contextual Compression

Only include the most relevant parts of documents to save context window space.
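
A minimal sketch of the idea, using crude word overlap as a stand-in for the LLM- or reranker-based scoring a production system would use:

def compress_doc(doc_text: str, query: str) -> str:
    """Keep only the sentences that share words with the query."""
    query_words = set(query.lower().split())
    sentences = doc_text.split(". ")
    relevant = [s for s in sentences if query_words & set(s.lower().split())]
    return ". ".join(relevant) or doc_text  # fall back to the full text if nothing matches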

🔄 2. Hybrid Search

Combine keyword search with embedding-based semantic search for better results.
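
One simple fusion strategy is reciprocal rank fusion (RRF). The sketch below assumes a `semantic_search` function, a hypothetical embedding-based retriever you would plug in alongside the `elastic_search` from step 3:

def hybrid_search(query: str, k: int = 60) -> list:
    """Merge keyword and semantic results with reciprocal rank fusion."""
    scores, docs_by_id = {}, {}
    for hits in (elastic_search(query), semantic_search(query)):
        for rank, doc in enumerate(hits):
            doc_id = doc["question"]  # use a real unique ID in production
            docs_by_id[doc_id] = doc
            # Documents ranked high in either list accumulate a larger score
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank + 1)
    ranked = sorted(scores, key=scores.get, reverse=True)
    return [docs_by_id[doc_id] for doc_id in ranked[:5]]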

🧪 3. Self-Reflection

Have the LLM check if its answer actually addresses the question before responding.
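
A bare-bones version built on the `rag` and `llm` functions from above: generate an answer, then ask the model to grade it before returning:

def rag_with_reflection(query: str) -> str:
    """Answer, then have the model check that the answer addresses the question."""
    answer = rag(query)
    check_prompt = (
        f"QUESTION: {query}\n\nANSWER: {answer}\n\n"
        "Does the ANSWER directly address the QUESTION? Reply YES or NO."
    )
    if "YES" in llm(check_prompt).upper():
        return answer
    return "I don't have a reliable answer to that; try rephrasing the question."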

🌟 When to Use RAG

RAG works best for:

  • Question-answering over specific documents 📚
  • Customer support with product documentation 🛎️
  • Technical assistance with code/API docs 👨‍💻
  • Domain-specific knowledge (medical, legal, etc.) 👩‍⚕️

🚦 Getting Started Checklist

  1. ✅ Gather your knowledge documents
  2. ✅ Set up a search system (Elasticsearch/Pinecone/Weaviate)
  3. ✅ Connect to an LLM API (OpenAI/Groq/Anthropic)
  4. ✅ Create your prompt template
  5. ✅ Build the retrieval-generation pipeline
  6. ✅ Test with real questions!

🎬 Final Thoughts

RAG isn't just a technical solution; it's a paradigm shift in how we think about AI assistants. Instead of expecting models to "know everything," we give them the specific knowledge they need, when they need it! 🧠✨

Remember: The quality of your RAG system depends on the quality of your documents, your search capability, and your prompt engineering. Keep improving all three for the best results!
