I've seen a lot of RAG tutorials that explain the concept beautifully — then leave you staring at a 10-line pseudocode example. Not useful. This one is different. We're building an actual working AI chatbot using LangChain and ChromaDB, the kind where you drop in a real PDF and start asking questions immediately. Every file. Every line. Explained. If you've been googling "RAG LangChain tutorial" or "ChromaDB vector database" and landing on half-finished Medium posts — stick around. This is the guide I wish existed when I first started.
💬 "Most teams that think they need fine-tuning actually just need RAG. Fine-tuning costs thousands. RAG costs an API call."
RAG Tutorial for Beginners — What is Retrieval Augmented Generation?
Here's the thing nobody tells you when you first start building with LLMs — they're confidently wrong. GPT-4, Claude, Gemini, it doesn't matter. Ask any of them about your internal documentation, your last product release, or something that happened three months ago and you'll get either a hallucinated answer or a polite "I don't have access to that." Neither works in production. This RAG tutorial is how you fix that properly.
The core idea behind Retrieval Augmented Generation is honestly pretty elegant. Rather than trying to shove all your knowledge into model weights (expensive, slow, inflexible), you just retrieve the relevant pieces at query time and hand them to the model as context. Three steps, and you're done:
- Break your documents into chunks, embed them, store in a ChromaDB vector database
- When someone asks something, run a semantic similarity search to pull the relevant chunks
- Feed those chunks to your LLM as context — it answers from your actual data, not from training memory
RAG Pipeline:
📄 Your Docs → ✂️ Chunking → 🧮 Embeddings → 🗄️ ChromaDB → 🤖 LLM Answer
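The three steps above can be sketched in about twenty lines of plain Python. To keep it runnable without any API keys, simple word overlap stands in for embedding similarity here — the `chunk`, `score`, and `retrieve` helpers are illustrative toys, not part of the actual stack we build below:

```python
# Toy end-to-end RAG flow. Word overlap stands in for embeddings;
# in the real pipeline, ChromaDB + OpenAI embeddings replace score().

def chunk(text: str) -> list[str]:
    # Naive sentence chunking; the tutorial uses RecursiveCharacterTextSplitter
    return [s.strip() for s in text.split(".") if s.strip()]

def score(query: str, passage: str) -> int:
    # Stand-in for cosine similarity between embedding vectors
    return len(set(query.lower().split()) & set(passage.lower().split()))

def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    # Step 2: pull the k most relevant chunks for the query
    return sorted(chunks, key=lambda c: score(query, c), reverse=True)[:k]

docs = ("ChromaDB stores vectors on disk. "
        "LangChain orchestrates loaders, splitters and chains. "
        "The LLM answers using retrieved context only.")

# Step 3: the retrieved chunks become the LLM's context
context = retrieve("what does ChromaDB store", chunk(docs))
prompt = f"Context: {' '.join(context)}\nQuestion: what does ChromaDB store?"
```

The real system swaps each toy piece for a production equivalent, but the control flow is exactly this.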
How to Build AI Chatbot using LangChain — Architecture
Before writing code, here are the 4 components you need to understand:
| Component | What it does |
|---|---|
| 🔀 Document Loader | Ingests PDFs, Word docs, web pages. LangChain has 100+ loaders built-in |
| ✂️ Text Splitter | Breaks docs into overlapping chunks — overlap stops answers being cut mid-thought |
| 🧮 Embedding Model | Converts text to vectors. We use OpenAI text-embedding-3-small — cheap and good |
| 🗄️ Vector Store | ChromaDB locally, Pinecone for production. Switching takes 3 lines |
Environment Setup — LangChain RAG Python Installation
Python 3.10 or higher. Use a virtual environment — mixing global packages is how you end up spending an afternoon debugging import errors that have nothing to do with your actual RAG code.
```bash
# Create virtual environment
python -m venv rag-env
source rag-env/bin/activate   # Windows: rag-env\Scripts\activate

# Install all dependencies for RAG LangChain ChromaDB
pip install langchain langchain-community langchain-openai
pip install chromadb openai tiktoken pypdf
pip install python-dotenv streamlit
```
RAG Document Loader — Load and Split Documents with LangChain
```python
# document_processor.py
from langchain_community.document_loaders import PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter

def load_and_split(file_path: str) -> list:
    loader = PyPDFLoader(file_path)
    documents = loader.load()

    splitter = RecursiveCharacterTextSplitter(
        chunk_size=1000,
        chunk_overlap=200,
        separators=["\n\n", "\n", " ", ""]
    )
    chunks = splitter.split_documents(documents)
    print(f"✅ {len(documents)} pages → {len(chunks)} chunks")
    return chunks
```
The chunk_size and chunk_overlap parameters here are things I've tweaked across multiple projects. 1000 characters per chunk works well for most documents. The 200-character overlap? That's the bit most tutorials skip. Without it, you'll occasionally get answers that feel cut off because the relevant sentence happened to fall right at a chunk boundary.
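You can see the boundary problem with a toy character-level splitter (this is a sketch to illustrate the idea, not the LangChain implementation). Without overlap, a sentence that straddles a chunk boundary ends up split across two chunks and neither one contains it whole:

```python
# Why chunk_overlap matters: with overlap, a sentence that straddles
# a chunk boundary survives intact in at least one chunk.

def split_chars(text: str, size: int, overlap: int) -> list[str]:
    step = size - overlap
    return [text[i:i + size] for i in range(0, len(text), step)]

sentence = "the refund policy is thirty days"
text = "A" * 90 + sentence + "B" * 90  # sentence sits at chars 90-122

no_overlap = split_chars(text, size=100, overlap=0)
with_overlap = split_chars(text, size=100, overlap=40)

# no_overlap: the sentence is cut at char 100, so no chunk holds all of it.
# with_overlap: the chunk starting at char 60 contains the full sentence.
```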
ChromaDB Vector Database Explained — Build Your Vector Store
ChromaDB is what makes this whole setup so easy to get running locally. It's just a folder on your disk — no Docker, no cloud account, no sign-up. You embed your chunks, it saves them, and later you query by meaning instead of keyword matching. That last part is what makes vector search feel almost magical the first time you try it.
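Under the hood, "query by meaning" is just nearest-neighbor search with cosine similarity over embedding vectors. A minimal sketch with made-up 3-dimensional vectors — real text-embedding-3-small vectors have 1536 dimensions, and the numbers below are invented for illustration:

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    # Cosine similarity: dot product over the product of magnitudes
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy "embeddings" — semantically similar texts get nearby vectors
vectors = {
    "refund policy": [0.9, 0.1, 0.0],
    "return an item": [0.85, 0.2, 0.05],  # close in meaning, zero shared words
    "gpu benchmarks": [0.0, 0.1, 0.95],
}
query = [0.88, 0.15, 0.02]  # pretend embedding of "how do I get my money back"
best = max(vectors, key=lambda k: cosine(query, vectors[k]))
```

The query shares no keywords with "refund policy" or "return an item", yet both score near 1.0 while "gpu benchmarks" scores near 0 — that's the whole trick keyword search can't do.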
```python
# vectorstore.py
from langchain_openai import OpenAIEmbeddings
from langchain_community.vectorstores import Chroma

def create_vectorstore(chunks, persist_dir="./chroma_db"):
    embeddings = OpenAIEmbeddings(model="text-embedding-3-small")
    vectorstore = Chroma.from_documents(
        documents=chunks,
        embedding=embeddings,
        persist_directory=persist_dir
    )
    return vectorstore

def load_vectorstore(persist_dir="./chroma_db"):
    embeddings = OpenAIEmbeddings(model="text-embedding-3-small")
    return Chroma(persist_directory=persist_dir, embedding_function=embeddings)
```
Build AI Chatbot with RAG — The Core Chain Logic
```python
# rag_chain.py
from langchain_openai import ChatOpenAI
from langchain.chains import RetrievalQA
from langchain.prompts import PromptTemplate

PROMPT_TEMPLATE = """Use ONLY the context below to answer.
If not in context, say "I don't have enough information."

Context: {context}

Question: {question}

Answer:"""

def build_rag_chain(vectorstore):
    llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)
    chain = RetrievalQA.from_chain_type(
        llm=llm,
        chain_type="stuff",
        retriever=vectorstore.as_retriever(search_kwargs={"k": 4}),
        chain_type_kwargs={"prompt": PromptTemplate.from_template(PROMPT_TEMPLATE)},
        return_source_documents=True
    )
    return chain
```
⚠️ Keep temperature=0. I know it's tempting to bump it up for more "creative" responses — don't. The whole point of RAG is factual grounding. A temperature above 0 lets the model start improvising around your retrieved context, which brings hallucinations right back in through the side door.
Streamlit Chat Interface — Run Your RAG AI Chatbot
```python
# app.py
import streamlit as st
from vectorstore import load_vectorstore
from rag_chain import build_rag_chain

st.title("📄 RAG AI Chatbot — Document Aware")

@st.cache_resource
def load_chain():
    # Cached so the vector store and chain are built once, not on every rerun
    return build_rag_chain(load_vectorstore("./chroma_db"))

chain = load_chain()

if "messages" not in st.session_state:
    st.session_state.messages = []

# Replay the chat history on every rerun
for msg in st.session_state.messages:
    with st.chat_message(msg["role"]):
        st.write(msg["content"])

if prompt := st.chat_input("Ask anything about your document..."):
    # Show and store the user's message too — not just the answer
    with st.chat_message("user"):
        st.write(prompt)
    st.session_state.messages.append({"role": "user", "content": prompt})

    result = chain.invoke({"query": prompt})
    answer = result["result"]
    with st.chat_message("assistant"):
        st.write(answer)
    st.session_state.messages.append({"role": "assistant", "content": answer})
```
Run it:

```bash
streamlit run app.py
```
ChromaDB vs Pinecone vs Weaviate — Which Vector Database to Choose?
Quick honest answer: start with ChromaDB, switch to Pinecone when you ship. I've used all three — ChromaDB is unbeatable for local development because there's literally nothing to set up. Pinecone is where you go when you have real users and need reliability and scale. Weaviate is worth looking at if you need hybrid search.
| Feature | ChromaDB | Pinecone | Weaviate |
|---|---|---|---|
| Setup | pip install · local | Managed cloud | Docker / Cloud |
| Cost | Free | Paid after free tier | Open source |
| Scale | Millions of docs | Billions of vectors | Enterprise scale |
| Best For | Dev · Prototyping | Production SaaS | Hybrid search |
Retrieval Augmented Generation Example — RAG vs Naive LLM
| | ✗ Naive LLM | ✓ RAG System |
|---|---|---|
| Data source | Training data only | Your actual documents |
| Hallucinations | Confident and frequent | Grounded in retrieved context |
| Knowledge | Frozen at cutoff | Update docs anytime |
| Attribution | None | Cites exact source + page |
GitHub Repository
All code from this tutorial is on GitHub:
👉 ragavi-s/rag-langchain-chromadb
```
rag-langchain-chromadb/
├── docs/                   # Put your PDFs here
├── chroma_db/              # Auto-generated vector store
├── document_processor.py   # Step 4 — Load & split
├── vectorstore.py          # Step 5 — ChromaDB logic
├── rag_chain.py            # Step 6 — RAG chain
├── app.py                  # Step 7 — Streamlit UI
├── requirements.txt
└── README.md
```
Recommended Tools for Production RAG
- 🌲 Pinecone — What I'd use when this goes to production. Fully managed, scales without you touching anything.
- 🔷 Weaviate — Open-source vector DB with hybrid search. Self-hostable, solid docs.
- 🦜 LangChain Cloud — LangSmith is genuinely useful once you start debugging why your retrieval is returning wrong chunks.
Before You Close This Tab
What is RAG — Stop thinking of it as a fancy feature. It's just: retrieve relevant chunks, hand them to the LLM, get an answer grounded in your data.
Core Stack — LangChain (orchestration) + ChromaDB (vectors) + OpenAI (embeddings + generation). Each piece is replaceable.
Dev → Prod — Build with ChromaDB locally. Swap to Pinecone when you ship. Seriously, it's 3 lines.
Don't Skip This — temperature=0, chunk_overlap=200, k=4 chunks retrieved. These defaults took me a while to land on — trust them until you have a reason to change them.
What's Next — Once this works, look into re-ranking and hybrid search (BM25 + vector) if your documents have lots of proper nouns or codes that semantic search struggles with.
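To make the hybrid-search idea concrete, here's a minimal sketch of blending a lexical score with a semantic one. The `lexical` and `hybrid` helpers and the alpha weight are invented for illustration — production systems use real BM25 plus embedding similarity, and the semantic score below is a made-up number rather than a model output:

```python
# Toy hybrid scoring: blend exact-token matching (good for proper nouns
# and part codes) with a semantic similarity score.

def lexical(query: str, doc: str) -> float:
    # Fraction of doc tokens that exactly match a query token
    q_tokens = query.lower().split()
    d_tokens = doc.lower().split()
    hits = sum(d_tokens.count(t) for t in q_tokens)
    return hits / (len(d_tokens) or 1)

def hybrid(query: str, doc: str, semantic: float, alpha: float = 0.5) -> float:
    # alpha weights lexical vs semantic; tune it per corpus
    return alpha * lexical(query, doc) + (1 - alpha) * semantic

doc = "Error E-4471 occurs when the GPU driver version is below 535."

# Codes like "E-4471" often embed poorly, so pretend the embedding model
# scored this doc only 0.2 — the lexical component rescues the match.
code_score = hybrid("E-4471", doc, semantic=0.2)
vague_score = hybrid("graphics card", doc, semantic=0.2)
```

With pure semantic search both queries would tie at 0.2; the exact-token component is what lifts the "E-4471" query above the vague one.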
Independent tech writer based in India. I write about AI, Python, and developer tools — mostly things I've actually built and broken myself before writing about them. Founder of Tech Journalism. No sponsored opinions, no hype, just code that works.
If this saved you time — drop a GitHub link in the comments. Always curious to see how people extend it.