How I Built an AI-Powered Q&A System
Have you ever wished you could ask specific questions about a travel destination and get accurate, sourced answers? That's precisely what I set out to build, and in this article I'll walk you through creating a Retrieval-Augmented Generation (RAG) system for Kenya's tourism industry.
The Problem: AI That Makes Things Up
Large Language Models (LLMs) are impressive, but they have a fatal flaw: they confidently generate information that sounds right but might be completely wrong. Ask ChatGPT about the best time to visit the Maasai Mara, and it might give you a reasonable answer, or it might hallucinate facts about wildebeest migration patterns.
This is where RAG comes in. Instead of relying on what the AI "thinks" it knows, we give it a library of trusted documents and teach it to search through them before answering. Think of it as moving from a student who wings their exam to one who brings a cheat sheet with verified facts.
What We're Building
Our system ingests PDF documents about Kenyan tourism destinations (Maasai Mara, Mombasa, Mount Kenya, etc.) and provides a REST API where users can ask questions like the following:
- "What wildlife can I see at Maasai Mara?"
- "What are the best beaches in Mombasa?"
- "How difficult is it to climb Mount Kenya?"
The system will:
- Search through the PDF documents for relevant information
- Extract the most pertinent passages
- Use an LLM to generate a natural language answer based only on those passages
- Return the sources so users can verify the information
The Tech Stack
Here's what we're using and why:
- FastAPI: Lightning-fast Python web framework, perfect for building APIs
- Sentence Transformers: Converts text to embeddings (fancy math that makes similar text have similar numbers)
- ChromaDB: Stores and searches through those embeddings efficiently
- Groq: Blazingly fast LLM inference (seriously, it's ridiculously fast)
- pypdf: Extracts text from PDF documents
Architecture: The 30,000-Foot View
```
PDFs → Text Extraction → Chunking → Embeddings → Vector Database
                                                       ↓
User Query → Embedding → Similarity Search → Context → LLM → Answer
```
We have two main pipelines:
- Ingestion Pipeline (run once): Takes PDFs, breaks them into chunks, converts chunks to vectors, stores in a database.
- Query Pipeline (run every query): Takes question, converts to vector, finds similar chunks, sends to LLM for an answer.
Step 1: Document Ingestion — Teaching the System to Read
Let's start with the ingestion script. This is where the magic of preparing our knowledge base happens.
Extracting Text from PDFs
```python
from pypdf import PdfReader

def extract_text(path):
    reader = PdfReader(path)
    text = ""
    for page in reader.pages:
        page_text = page.extract_text()
        if page_text:
            text += page_text + "\n"
    return text
```
Simple enough: we read each page and concatenate the text. But PDFs are notoriously tricky. Some have scanned images (which need OCR), some have weird encodings, and some have tables that don't extract well. For this project, I assumed clean, text-based PDFs. In production, you'd want more robust error handling.
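As one small example of that kind of robustness, here's a sketch of a page-cleaning helper (a hypothetical addition, not part of the project's code) that drops empty pages and scrubs common extraction junk before the text reaches the chunker:

```python
import re

def clean_page_text(raw):
    """Normalize text extracted from a single PDF page.

    Hypothetical helper: handles None/empty pages, removes soft hyphens
    left over from line wrapping, and collapses runs of whitespace.
    """
    if not raw:
        return ""
    text = raw.replace("\u00ad", "")        # strip soft hyphens
    text = re.sub(r"[ \t]+", " ", text)     # collapse spaces and tabs
    text = re.sub(r"\n{3,}", "\n\n", text)  # cap consecutive blank lines
    return text.strip()
```

You'd call this on each `page.extract_text()` result inside the loop; scanned PDFs would still need an OCR pass, which is out of scope here.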
The Chunking Strategy: Why Size Matters
```python
def chunk_text(text, size=300):
    words = text.split()
    chunks = []
    for i in range(0, len(words), size):
        chunks.append(" ".join(words[i:i+size]))
    return chunks
```
Why chunk at all? LLMs have context windows, and we can't feed them entire books. More importantly, smaller chunks mean more precise retrieval. If your document chunk is an entire chapter about Mombasa and someone asks about beaches, you'll retrieve all of Mombasa's beaches, hotels, restaurants and history. That's too much noise.
I chose 300 words per chunk through experimentation. Too small (100 words) and you lose context. Too large (1000 words) and your retrieval becomes imprecise.
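One refinement worth knowing about (not used in this project) is overlapping chunks: if a key sentence straddles a chunk boundary, a sliding window guarantees it appears whole in at least one chunk. A minimal sketch, assuming word-based chunking as above:

```python
def chunk_text_overlap(text, size=300, overlap=50):
    """Split text into word chunks where consecutive chunks share
    `overlap` words. Requires size > overlap. A variant of chunk_text,
    not part of the article's original code."""
    words = text.split()
    step = size - overlap
    chunks = []
    for i in range(0, len(words), step):
        chunk = " ".join(words[i:i + size])
        if chunk:
            chunks.append(chunk)
        if i + size >= len(words):
            break  # last window already covers the tail
    return chunks
```

The cost is some duplicated storage and embedding work, which is why a simple non-overlapping split is a reasonable starting point.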
Embeddings
```python
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("BAAI/bge-small-en-v1.5")

def normalize_embeddings(embeddings):
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    return (embeddings / norms).tolist()
```
Here's where things get interesting. Embeddings convert text into high dimensional vectors (arrays of numbers). Similar text gets similar vectors. "The lion roared" and "The big cat made a loud sound" will have vectors that are close together in this mathematical space.
I chose BAAI/bge-small-en-v1.5 because:
- It's small (~33M parameters), so inference is fast
- It's good at semantic search tasks
- It's actively maintained and well documented
The normalization step is crucial. It converts vectors to unit length, which makes cosine similarity (how ChromaDB compares vectors) equivalent to the dot product, and the dot product is faster to compute.
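You can verify that equivalence with toy 2-D vectors standing in for real embeddings:

```python
import numpy as np

def normalize_embeddings(embeddings):
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    return embeddings / norms

# Toy vectors in place of real 384-dimensional embeddings.
vecs = np.array([[3.0, 4.0], [1.0, 0.0]])
unit = normalize_embeddings(vecs)

# After normalization every vector has length 1 ...
assert np.allclose(np.linalg.norm(unit, axis=1), 1.0)

# ... so cosine similarity reduces to a plain dot product.
cos = vecs[0] @ vecs[1] / (np.linalg.norm(vecs[0]) * np.linalg.norm(vecs[1]))
dot = unit[0] @ unit[1]
assert np.isclose(cos, dot)
```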
Storing Everything in ChromaDB
```python
import chromadb

client = chromadb.PersistentClient(path="./chromadb")
collection = client.get_or_create_collection(
    name="travel_and_tourism",
    metadata={"description": "Multi PDF Tourism documents"}
)

collection.add(
    documents=all_chunks,
    embeddings=all_embeddings,
    ids=all_ids,
    metadatas=all_metadatas
)
```
ChromaDB is a vector database designed for this exact use case. It:
- Stores embeddings efficiently
- Provides fast similarity search
- Persists data to disk
- Has a simple Python API
The PersistentClient means our vectors survive restarts. We don't have to re-embed all our documents every time we start the server.
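The snippets above reference `all_chunks`, `all_ids`, and `all_metadatas` without showing how they're built. Here's a hedged sketch of that glue (the loop isn't in the article's snippets; `build_ingest_batch` and `docs` are names I'm assuming). `all_embeddings` would come from `model.encode(all_chunks)`, omitted here since it needs the model loaded:

```python
def build_ingest_batch(docs, chunker):
    """Build the parallel lists that collection.add expects.

    `docs` is a list of (filename, full_text) pairs and `chunker` is a
    function like chunk_text. IDs must be unique across the whole
    collection, so each one combines the filename with the chunk index.
    """
    all_chunks, all_ids, all_metadatas = [], [], []
    for filename, text in docs:
        for i, chunk in enumerate(chunker(text)):
            all_chunks.append(chunk)
            all_ids.append(f"{filename}-{i}")
            all_metadatas.append({"source": filename})
    return all_chunks, all_ids, all_metadatas
```

Storing the filename in the metadata is what makes source attribution possible later in the query pipeline.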
Step 2: The Query Pipeline
Now for the fun part: answering questions.
Converting Questions to Vectors
```python
def ask(question: str):
    query_embedding = model.encode([question])
    query_embedding = normalize_embeddings(query_embedding)
```
We use the same embedding model we used for documents. This is critical. If you embed documents with Model A and queries with Model B, the vector spaces won't align.
Similarity Search
```python
results = collection.query(
    query_embeddings=query_embedding,
    n_results=3
)
docs = results["documents"][0]
metadatas = results["metadatas"][0]
```
ChromaDB finds the 3 most similar document chunks to our query. How does it know what's similar? It computes the distance between the query vector and every document vector, then returns the closest ones.
Why 3? Another Goldilocks number. Too few (1) and you might miss important context. Too many (10) and you'll include irrelevant information that confuses the LLM. I tested several values and found 3 provided the best balance.
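A further refinement (not in this project) is to filter by distance rather than trusting a fixed top-3: ChromaDB can also return per-result distances, and for an off-topic question even the closest chunks may be poor matches. A sketch, where the 0.6 cutoff is purely illustrative and the right value depends on your collection's distance metric:

```python
def filter_by_distance(docs, distances, max_distance=0.6):
    """Keep only chunks whose distance to the query is below a cutoff.

    Lets the system answer "I don't know" on off-topic questions
    instead of feeding the LLM weak matches. Hypothetical refinement;
    smaller distance means a closer match.
    """
    return [d for d, dist in zip(docs, distances) if dist < max_distance]
```

If the filtered list comes back empty, you can short-circuit and return a "no relevant information found" response without ever calling the LLM.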
The LLM
```python
import os
from groq import Groq

groq_client = Groq(api_key=os.getenv("GROQ_API_KEY"))

context = "\n\n".join(docs)
response = groq_client.chat.completions.create(
    model="meta-llama/llama-4-scout-17b-16e-instruct",
    messages=[
        {"role": "system", "content": "Answer only using provided context"},
        {"role": "user", "content": f"Context:\n{context}\n\nQuestion:\n{question}"}
    ],
    temperature=0
)
answer = response.choices[0].message.content
```
This is where RAG shines. We give the LLM:
- A system instruction: "Only use the provided context" (reducing hallucinations)
- The retrieved context
- The user's question
The temperature=0 setting makes the model deterministic; the same input always produces the same output. This is crucial for reliability.
Why Groq? Speed. Seriously, it's fast. What takes OpenAI 3-4 seconds, Groq does in under a second. For user-facing applications, this matters.
Source Attribution
```python
sources = list({meta["source"] for meta in metadatas})
return answer, sources
```
We return the source PDFs used to generate the answer. This serves two purposes:
- Users can verify the information
- It builds trust in the system
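One small wrinkle: a set comprehension deduplicates but returns sources in arbitrary order. An order-preserving variant (a suggested tweak, not the article's code) keeps the most relevant source first, since ChromaDB returns chunks in similarity order:

```python
def unique_sources(metadatas):
    """Deduplicate source filenames while preserving retrieval order.

    dict.fromkeys keeps first-seen order, so the source of the
    best-matching chunk stays at the front of the list.
    """
    return list(dict.fromkeys(meta["source"] for meta in metadatas))
```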
Step 3: The FastAPI Layer
```python
from fastapi import FastAPI, HTTPException
from fastapi.middleware.cors import CORSMiddleware

app = FastAPI(title="Travel and Tourism")

app.add_middleware(
    CORSMiddleware,
    allow_origins=["*"],
    allow_methods=["*"],
    allow_headers=["*"],
)

@app.post("/ask", response_model=QuestionResponse)
def ask_question(request: QuestionRequest):
    try:
        answer, sources = ask(request.question)
        return QuestionResponse(answer=answer, sources=sources)
    except Exception as e:
        raise HTTPException(status_code=500, detail=str(e))
```
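The endpoint references `QuestionRequest` and `QuestionResponse` models that aren't shown in the snippets. A minimal version, with field names inferred from how they're used above, might look like:

```python
from pydantic import BaseModel

class QuestionRequest(BaseModel):
    question: str

class QuestionResponse(BaseModel):
    answer: str
    sources: list[str]
```

FastAPI uses these to validate incoming JSON (a request without a `question` string is rejected with a 422 before your handler runs) and to shape the response documented at `/docs`.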
FastAPI gives us:
- Automatic API documentation (visit `/docs` to see it)
- Request validation via Pydantic models
- Type hints that actually work
- Easy async support (though we're not using it here)
The CORS middleware allows frontend applications from any origin to call our API. In production, you'd restrict this to your specific domain.
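A production-leaning version of that middleware setup might look like the following, where the origin is a placeholder for your actual frontend domain:

```python
from fastapi import FastAPI
from fastapi.middleware.cors import CORSMiddleware

app = FastAPI(title="Travel and Tourism")

app.add_middleware(
    CORSMiddleware,
    allow_origins=["https://example.com"],  # placeholder: your frontend's origin
    allow_methods=["POST"],                 # /ask only needs POST
    allow_headers=["Content-Type"],
)
```

Locking down methods and headers alongside origins follows the principle of least privilege: the browser will refuse cross-origin calls that don't match.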
The Results: Does It Actually Work?
Let's test it:
Query: "What wildlife can I see at Maasai Mara?"
Response:
```json
{
  "answer": "At Maasai Mara, you can see the Big Five: lions, elephants, leopards, rhinos, and buffalo. The park is famous for the annual wildebeest migration between July and October, where millions of wildebeest, zebras, and gazelles cross the Mara River. You can also spot cheetahs, hyenas, giraffes, hippos, crocodiles, and over 450 bird species.",
  "sources": ["Maasai_Mara.pdf"]
}
```
Beautiful. The answer is specific, accurate, and sourced.
The Bigger Picture: Why RAG Matters
RAG represents a fundamental shift in how we build AI applications. Instead of:
- Fine-tuning models (expensive, time-consuming, static)
- Relying on model knowledge (outdated, prone to hallucination)
We can:
- Use any LLM as a reasoning engine
- Plug in our own knowledge dynamically
- Update information without retraining
- Provide source attribution for trust
This pattern works for:
- Customer support bots trained on company documentation
- Legal research tools searching case law
- Medical assistants referencing clinical guidelines
- Internal knowledge bases for enterprises
Conclusion
Building this RAG system taught me that the real challenge isn't the AI; it's the data pipeline, retrieval strategy, and user experience. The LLM is just the final step that ties everything together.
RAG won't solve all AI problems. But for question-answering over documents, it's incredibly powerful. And as embedding models improve, vector databases get faster, and LLMs become more capable, RAG systems will only get better.
Code Snippets
All code in this article is available in my GitHub repository [https://github.com/maureenmuthoni-hue/Travel_and_Tourism_RAG_System]. Feel free to star, fork, and adapt it for your own projects!