Abdelrahman Adnan

๐Ÿ—๏ธ Part 1: Foundation - Basic RAG and Agentic Concepts

LLM Zoomcamp Tutorial Series - Building Agentic Assistants with OpenAI Function Calling

Welcome to Part 1 of our LLM Zoomcamp tutorial series! This is the foundation where you'll learn the core concepts of RAG (Retrieval-Augmented Generation) and what makes a system "agentic". Perfect for beginners who want to understand how intelligent AI assistants work!


Understanding the Core Problem (LLM Zoomcamp Challenge)

Welcome to your first LLM Zoomcamp agentic project! Our goal is to create an intelligent assistant that helps course participants by leveraging Frequently Asked Questions (FAQ) documents. These documents contain question-answer pairs about course enrollment, requirements, and procedures.

Think of it like having a smart study buddy who has read all the course materials!

What We Want to Build:

  • Search through FAQ documents intelligently
  • Decide when to use external knowledge vs. built-in knowledge
  • Make multiple search iterations for complex queries
  • Provide contextual, accurate responses

What Makes a System "Agentic"? (LLM Zoomcamp Core Concept)

An agent in AI is like a smart assistant that can think and act independently. Here's what makes it special:

  • Interacts with an environment (in our case, the chat dialogue)
  • Observes and gathers information (through search functions)
  • Performs actions (searching, answering, adding entries)
  • Maintains memory of past actions and context
  • Makes independent decisions about what to do next

The key difference between basic RAG and agentic RAG is decision-making autonomy. Instead of always searching or always using built-in knowledge, an agentic system can intelligently choose the best approach.
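To make that contrast concrete, here is a minimal sketch of the two control flows. The `search`, `llm`, `decide`, and `answer` arguments are stand-in callables invented for illustration, not real API calls:

```python
def basic_rag_sketch(query, search, llm):
    # Basic RAG: a fixed pipeline - it ALWAYS retrieves, then answers.
    context = search(query)
    return llm(f"CONTEXT: {context}\nQUESTION: {query}")

def agentic_rag_sketch(query, search, decide, answer):
    # Agentic RAG: the model first chooses an action, and the program
    # routes on that choice instead of following a fixed script.
    action = decide(query)                       # e.g. "SEARCH" or "ANSWER"
    if action == "SEARCH":
        return answer(query, context=search(query))
    return answer(query, context=None)           # fall back to built-in knowledge
```

The structural difference is exactly one branch: in the agentic version, retrieval is an option the model can take, not a step the program forces.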

๐Ÿ—๏ธ Building Basic RAG Foundation (LLM Zoomcamp Step-by-Step)

Let's start by building the fundamental building blocks! Think of this as learning to walk before we run. ๐Ÿ‘ถ

๐Ÿ› ๏ธ Step 1: Setting Up Your LLM Zoomcamp Environment

# ๐Ÿ“ฆ First, let's install the packages we need
# Think of these as your toolkit for building AI assistants!
pip install openai minsearch requests jupyter markdown
Enter fullscreen mode Exit fullscreen mode

Now let's import our tools one by one:

# Import the libraries (like getting books from a library)
import json          # For working with data in JSON format
import requests      # For downloading data from the internet
from openai import OpenAI              # For talking to the OpenAI API
from minsearch import AppendableIndex  # For searching through documents

# Initialize the OpenAI client (your key to the API)
# Make sure you have OPENAI_API_KEY set in your environment!
client = OpenAI()

LLM Zoomcamp Tip: Think of the OpenAI client as your telephone to ChatGPT. You'll use it to send questions and get answers!

Step 2: Getting and Preparing Our LLM Zoomcamp Data

Now let's get some real FAQ data to work with! This is like downloading all the course materials.

# Step 2a: Download the FAQ documents from the internet
# This URL contains real FAQ data from data engineering courses
docs_url = 'https://github.com/alexeygrigorev/llm-rag-workshop/raw/main/notebooks/documents.json'
docs_response = requests.get(docs_url)
documents_raw = docs_response.json()

print("Downloaded FAQ data successfully!")
print(f"Found {len(documents_raw)} courses with FAQ data")

LLM Zoomcamp Explanation: We're downloading a JSON file that contains FAQ questions and answers from real courses. Think of it as a digital textbook!

# Step 2b: Transform the data into a format we can search
# We're "flattening" the data - turning nested data into a simple list
documents = []

for course in documents_raw:  # Go through each course
    course_name = course['course']  # Get the course name

    for doc in course['documents']:  # Go through each FAQ in that course
        doc['course'] = course_name  # Add the course name to each FAQ
        documents.append(doc)        # Add it to our main list

print(f"Processed {len(documents)} FAQ documents total!")
print("Each document now has: question, answer, section, and course name")

LLM Zoomcamp Explanation: Imagine you have several books (courses), each with many pages (documents). We're taking all the pages and putting them in one big stack, but we label each page with which book it came from!
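The same flattening pattern on a tiny, self-contained example (the toy data below is invented for illustration; it just mimics the shape of the downloaded FAQ file):

```python
# Toy nested data shaped like the downloaded FAQ file:
# a list of courses, each holding its own list of FAQ documents.
toy_raw = [
    {"course": "data-engineering-zoomcamp",
     "documents": [{"question": "How do I enroll?", "text": "Register online."}]},
    {"course": "llm-zoomcamp",
     "documents": [{"question": "Is it free?", "text": "Yes."}]},
]

# Flatten: one list of documents, each labeled with its course name.
toy_documents = []
for course in toy_raw:
    for doc in course["documents"]:
        doc["course"] = course["course"]
        toy_documents.append(doc)

print(len(toy_documents))             # 2
print(toy_documents[0]["course"])     # data-engineering-zoomcamp
```

After flattening, every document carries its own `course` label, so a flat search index can still filter by course.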

# ๐Ÿ—‚๏ธ Step 2c: Create our search index (like a super-smart filing cabinet)
index = AppendableIndex(
    text_fields=["question", "text", "section"],  # Fields we can search in
    keyword_fields=["course"]                     # Fields for exact filtering
)

# ๐Ÿš€ Put all our documents into the search index
index.fit(documents)

print("๐Ÿ—‚๏ธ Created search index successfully!")
print("๐Ÿ” Now we can quickly find relevant FAQ answers!")
Enter fullscreen mode Exit fullscreen mode

๐ŸŽ“ LLM Zoomcamp Explanation: Think of this index like Google for your FAQ documents. Instead of reading every single document, we can ask "find me documents about Docker" and it will instantly find the relevant ones! โšก

๐Ÿ” Step 3: Building Our LLM Zoomcamp Search Function

Now let's create a function that can search through our FAQ documents! This is like having a research assistant. ๐Ÿ•ต๏ธโ€โ™€๏ธ

def search(query):
    """
    Search the FAQ database for relevant entries.

    Think of this as asking a librarian: "Can you find me books about Python?"

    Args:
        query (str): What the user wants to search for (like "Docker setup")

    Returns:
        list: A list of relevant FAQ entries, ranked by relevance
    """

    # Step 3a: Set up boosting (some fields matter more than others)
    # Matches in the question field count 3.0x; section names only 0.5x
    boost = {
        'question': 3.0,    # If the search term appears in a question, it's very relevant!
        'section': 0.5      # If it appears in a section name, it's only somewhat relevant
    }

    # Step 3b: Actually perform the search
    results = index.search(
        query=query,                                          # What to search for
        filter_dict={'course': 'data-engineering-zoomcamp'},  # Only search in this course
        boost_dict=boost,                                     # Use our importance scoring
        num_results=5,                                        # Return top 5 matches
        output_ids=True                                       # Include document IDs
    )

    return results

# Let's test our search function!
test_results = search("How do I install Docker?")
print(f"Found {len(test_results)} results for 'How do I install Docker?'")

# Look at the first result
if test_results:
    first_result = test_results[0]
    print(f"First result question: {first_result['question']}")
    print(f"Relevance score: {first_result.get('score', 'N/A')}")

LLM Zoomcamp Explanation: Our search function is like a smart librarian who:

  1. Knows that questions are more important than section names
  2. Only looks in the specific course we care about
  3. Ranks results by how well they match
  4. Returns the top 5 most relevant answers
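The boosting idea can be shown without minsearch at all. Here is a toy scorer, purely illustrative and not the library's actual algorithm, that weights question matches 3.0 and section matches 0.5:

```python
BOOST = {"question": 3.0, "section": 0.5}

def toy_score(query, doc, boost=BOOST):
    """Count query-word hits per field, weighted by that field's boost."""
    words = query.lower().split()
    score = 0.0
    for field, weight in boost.items():
        text = doc.get(field, "").lower()
        score += weight * sum(w in text for w in words)
    return score

docs = [
    {"question": "How do I install Docker?", "section": "Setup"},
    {"question": "What is a data pipeline?", "section": "Docker install notes"},
]
ranked = sorted(docs, key=lambda d: toy_score("docker install", d), reverse=True)
print(ranked[0]["question"])  # the doc matching in its QUESTION wins
```

Both documents mention "docker install", but the first matches in its question (score 3.0 × 2 = 6.0) while the second matches only in its section (0.5 × 2 = 1.0), so the question match ranks first.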

๐Ÿ—๏ธ Step 4: Creating Our LLM Zoomcamp RAG Pipeline

Now we'll build the complete RAG system step by step! RAG = Retrieval + Augmented + Generation. ๐Ÿ—๏ธ

# ๐Ÿ“ Step 4a: Helper function to format search results
def build_context(search_results):
    """
    ๐Ÿ—๏ธ Build a context string from search results.

    Think of this as organizing your research notes before writing an essay!

    Args:
        search_results (list): Results from our search function

    Returns:
        str: Nicely formatted context for the AI to use
    """
    context = ""

    # ๐Ÿ”„ Go through each search result and format it nicely
    for doc in search_results:
        context += f"section: {doc['section']}\n"          # What section this is from
        context += f"question: {doc['question']}\n"        # The original question
        context += f"answer: {doc['text']}\n\n"           # The answer text

    return context.strip()  # Remove extra whitespace

# ๐Ÿงช Let's test our context builder
test_results = search("Docker installation")
test_context = build_context(test_results)
print("๐Ÿ“ Built context from search results:")
print(test_context[:200] + "..." if len(test_context) > 200 else test_context)
Enter fullscreen mode Exit fullscreen mode

๐ŸŽ“ LLM Zoomcamp Explanation: The build_context function is like organizing your research notes. Instead of giving ChatGPT a messy pile of information, we organize it neatly so the AI can easily understand and use it! ๐Ÿ“‹
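To see exactly what the formatted context looks like, here is the same function run on a single toy result (repeated here with invented data so the snippet runs standalone):

```python
def build_context(search_results):
    """Format search results into labeled blocks the LLM can read."""
    context = ""
    for doc in search_results:
        context += f"section: {doc['section']}\n"
        context += f"question: {doc['question']}\n"
        context += f"answer: {doc['text']}\n\n"
    return context.strip()

toy_results = [
    {"section": "Setup", "question": "How do I install Docker?",
     "text": "Follow the official install guide for your OS."},
]
print(build_context(toy_results))
# section: Setup
# question: How do I install Docker?
# answer: Follow the official install guide for your OS.
```

Each result becomes a clearly labeled three-line block, and `.strip()` removes the trailing blank lines, so the model sees a clean, predictable layout.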

# Step 4b: Function to talk to the LLM
def llm(prompt):
    """
    Send a prompt to the OpenAI API and get an answer back.

    This is like having a conversation with a very smart assistant!

    Args:
        prompt (str): The complete question/instruction for the model

    Returns:
        str: The model's response
    """

    # Make the API call to OpenAI
    response = client.chat.completions.create(
        model='gpt-4o-mini',                           # Which model to use
        messages=[{"role": "user", "content": prompt}] # Our question
    )

    # Extract the text response
    return response.choices[0].message.content

print("LLM function ready - we can now talk to the model!")

LLM Zoomcamp Explanation: This function is your hotline to the model! You send it a prompt (like a detailed question), and it sends back the model's answer. Simple!

# Step 4c: Create the main RAG function
def basic_rag(query):
    """
    Our complete RAG pipeline: Search + Context + Generate Answer

    This is the magic! We combine search results with the LLM to answer questions.

    Args:
        query (str): The user's question (like "How do I join the course?")

    Returns:
        str: A complete, helpful answer
    """

    # Step 1: Search for relevant information
    print(f"Searching for: {query}")
    search_results = search(query)

    # Step 2: Build context from search results
    print(f"Found {len(search_results)} relevant documents")
    context = build_context(search_results)

    # Step 3: Create a detailed prompt for the LLM
    prompt_template = """
You're a helpful course teaching assistant for the LLM Zoomcamp!

Your job is to answer the QUESTION based on the CONTEXT from our FAQ database.
Only use facts from the CONTEXT when answering the QUESTION.

<QUESTION>
{question}
</QUESTION>

<CONTEXT>
{context}
</CONTEXT>

Please provide a helpful, detailed answer!
""".strip()

    # Step 4: Fill in the template with our data
    prompt = prompt_template.format(question=query, context=context)
    print("Created prompt for the LLM")

    # Step 5: Get the answer from the LLM
    print("Getting answer from the LLM...")
    answer = llm(prompt)

    return answer

# Let's test our complete RAG system!
print("Testing our LLM Zoomcamp RAG system!")
test_question = "How do I join the course?"
answer = basic_rag(test_question)

print(f"\nQuestion: {test_question}")
print(f"Answer: {answer}")

LLM Zoomcamp Explanation: Our basic_rag function is like having a research assistant who:

  1. Searches through all course materials
  2. Organizes the relevant information
  3. Asks the LLM a well-structured question
  4. Returns a helpful answer based on real course data!

Making RAG Agentic: Decision-Making Capabilities (LLM Zoomcamp Advanced)

Basic RAG always searches first, then answers. But what if we want our system to be smarter? An agentic system should decide whether to search or to use its own knowledge. Let's make it intelligent!

Enhanced Agentic Prompt (LLM Zoomcamp Magic)

# This is our "smart prompt" that teaches the LLM to make decisions
agentic_prompt_template = """
You're a course teaching assistant for the LLM Zoomcamp!

You're given a QUESTION from a student. You have three superpowers:

1. Answer using the provided CONTEXT (if available and good enough)
2. Use your own knowledge if CONTEXT is EMPTY or not helpful
3. Request a search of the FAQ database if you need more info

Current CONTEXT: {context}

<QUESTION>
{question}
</QUESTION>

If CONTEXT is EMPTY or you need more information, respond with:
{{
"action": "SEARCH",
"reasoning": "Explain why you need to search the FAQ database"
}}

If you can answer using CONTEXT, respond with:
{{
"action": "ANSWER",
"answer": "Your detailed, helpful answer here",
"source": "CONTEXT"
}}

If CONTEXT isn't helpful but you can answer from your own knowledge:
{{
"action": "ANSWER",
"answer": "Your detailed, helpful answer here",
"source": "OWN_KNOWLEDGE"
}}

Remember: always be helpful and explain things clearly!
""".strip()

print("Created our intelligent agentic prompt!")
print("Now the LLM can decide what to do instead of always searching!")

LLM Zoomcamp Explanation: This prompt is like giving the LLM a decision-making flowchart! Instead of always doing the same thing, it can now choose the best action based on the situation. It's like upgrading from a calculator to a smartphone!
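One practical caveat, added here as a hedge rather than taken from the original walkthrough: json.loads will raise if the model wraps its JSON in a markdown fence or adds prose around it, which models sometimes do even when asked for bare JSON. A small helper (the name `parse_decision` is our own) that extracts the first JSON object before parsing makes the pipeline more robust:

```python
import json
import re

def parse_decision(raw):
    """Extract and parse the first {...} JSON object from an LLM reply.

    Models sometimes wrap JSON in ```json fences or add surrounding text;
    this strips that noise before calling json.loads.
    """
    match = re.search(r"\{.*\}", raw, re.DOTALL)
    if match is None:
        raise ValueError(f"No JSON object found in: {raw!r}")
    return json.loads(match.group(0))

# Works on a fenced reply as well as a bare one:
print(parse_decision('```json\n{"action": "SEARCH", "reasoning": "need FAQ"}\n```'))
```

If you adopt this, replace the direct `json.loads(answer_json)` calls with `parse_decision(answer_json)`.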

Implementing Agentic Decision Logic (LLM Zoomcamp Step-by-Step)

Now let's build our smart assistant that can make decisions! This is where the magic happens!

def agentic_rag_v1(question):
    """
    First version of our smart agentic RAG system!

    This assistant can decide whether to search or use its own knowledge.

    Args:
        question (str): The student's question

    Returns:
        dict: The assistant's response with source information
    """

    # Step 1: Start with empty context (no information yet)
    print(f"Starting with question: {question}")
    context = "EMPTY"

    # Step 2: Create the prompt and ask the LLM what to do
    prompt = agentic_prompt_template.format(question=question, context=context)
    print("Asking the LLM to make a decision...")

    # Step 3: Get the LLM's decision
    answer_json = llm(prompt)
    answer = json.loads(answer_json)  # Convert the JSON string to a Python dictionary

    print(f"The LLM decided: {answer['action']}")

    # Step 4: If the LLM wants to search, let's do it!
    if answer['action'] == 'SEARCH':
        print(f"Reason for searching: {answer['reasoning']}")
        print("Performing search...")

        # Search the FAQ database
        search_results = search(question)
        context = build_context(search_results)

        print(f"Found {len(search_results)} relevant documents")

        # Ask the LLM again, now with context
        prompt = agentic_prompt_template.format(question=question, context=context)
        print("Asking the LLM again with search results...")

        answer_json = llm(prompt)
        answer = json.loads(answer_json)

        print(f"Final decision: {answer['action']}")

    return answer

# Let's test our smart assistant!
print("Testing the LLM Zoomcamp Agentic Assistant!")
print("\n" + "="*50)

# Test 1: Course-specific question (should search)
print("Test 1: Course-specific question")
result1 = agentic_rag_v1("How do I join the LLM Zoomcamp course?")
print(f"Answer: {result1['answer'][:200]}...")
print(f"Source: {result1['source']}")

print("\n" + "="*50)

# Test 2: General knowledge question (should use own knowledge)
print("Test 2: General knowledge question")
result2 = agentic_rag_v1("How do I install Python on my computer?")
print(f"Answer: {result2['answer'][:200]}...")
print(f"Source: {result2['source']}")

LLM Zoomcamp Explanation: Our smart assistant works like this:

  1. Think first: "Do I need to search, or do I already know this?"
  2. Search if needed: if it's about the course, search the FAQ
  3. Use knowledge: if it's general knowledge, answer directly
  4. Always cite sources: tell us where the answer came from!

It's like having a study buddy who knows when to check the textbook vs. when they already know the answer!

Key Concepts Introduced in Part 1 (LLM Zoomcamp Fundamentals)

Congratulations! You've just built your first intelligent agent! Let's review what you've learned:

  1. RAG Pipeline: Search → Context Building → LLM Query

    • Like having a research assistant who finds info, organizes it, and writes an answer
  2. Agentic Decision Making: the LLM chooses actions based on available information

    • Your assistant can now think: "Should I search, or do I already know this?"
  3. Structured Output: using JSON format for consistent action parsing

    • Like having a standard form for the AI to fill out its decisions
  4. Context Management: handling empty vs. populated context states

    • Knowing when you have enough information vs. when you need more
  5. Source Attribution: tracking whether answers come from the FAQ or general knowledge

    • Always citing your sources - good academic practice!

Understanding Agent Behavior (LLM Zoomcamp Insights)

Your agentic system now exhibits intelligent behavior:

  • For course-specific questions: recognizes the need to search the FAQ database

    • "How do I join the course?" → search FAQ → answer from course materials
  • For general questions: uses built-in knowledge without unnecessary searches

    • "How do I install Python?" → use own knowledge → direct answer
  • Context awareness: makes decisions based on available information

    • Knows the difference between "I have info" and "I need to find info"
  • Reasoning: provides explanations for its chosen actions

    • Not just doing things, but explaining WHY it's doing them

LLM Zoomcamp Achievement Unlocked: you now understand the fundamental difference between basic RAG and agentic RAG. Your assistant doesn't just follow a script - it makes intelligent decisions!

What's Next?

This foundation prepares you for more sophisticated agentic behaviors in Part 2, where we'll implement:

  • Iterative search strategies that explore topics deeply
  • OpenAI Function Calling for professional tool integration
  • Conversational agents with memory
  • Polished user interfaces

Ready to level up to Part 2?



LLM Zoomcamp Tutorial Series - Part 1 Complete!

Continue your journey with Part 2 to master advanced function calling and iterative search!

#LLMZoomcamp
