I've seen a lot of RAG tutorials that explain the concept beautifully — then leave you staring at a 10-line pseudocode example. Not useful. This one is different. We're building an actual working AI chatbot using LangChain and ChromaDB, the kind where you drop in a real PDF and start asking questions immediately. Every file. Every line. Explained. If you've been googling "RAG LangChain tutorial" or "ChromaDB vector database" and landing on half-finished Medium posts — stick around. This is the guide I wish existed when I first started.
💬 "Most teams that think they need fine-tuning actually just need RAG. Fine-tuning costs thousands. RAG costs an API call."
RAG Tutorial for Beginners — What is Retrieval Augmented Generation?
Here's the thing nobody tells you when you first start building with LLMs — they're confidently wrong. GPT-4, Claude, Gemini, it doesn't matter. Ask any of them about your internal documentation, your last product release, or something that happened three months ago and you'll get either a hallucinated answer or a polite "I don't have access to that." Neither works in production. This RAG tutorial is how you fix that properly.
The core idea behind Retrieval Augmented Generation is honestly pretty elegant. Rather than trying to shove all your knowledge into model weights (expensive, slow, inflexible), you just retrieve the relevant pieces at query time and hand them to the model as context. Three steps, and you're done:
- Break your documents into chunks, embed them, store in a ChromaDB vector database
- When someone asks something, run a semantic similarity search to pull the relevant chunks
- Feed those chunks to your LLM as context — it answers from your actual data, not from training memory
RAG Pipeline:
📄 Your Docs → ✂️ Chunking → 🧮 Embeddings → 🗄️ ChromaDB → 🤖 LLM Answer
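The three steps above can be sketched in about twenty lines of plain Python. To keep it runnable without any API keys, simple word overlap stands in for embedding similarity here — the `chunk`, `score`, and `retrieve` helpers are illustrative toys, not part of the actual stack we build below:

```python
# Toy end-to-end RAG flow. Word overlap stands in for embeddings;
# in the real pipeline, ChromaDB + OpenAI embeddings replace score().

def chunk(text: str) -> list[str]:
    # Naive sentence chunking; the tutorial uses RecursiveCharacterTextSplitter
    return [s.strip() for s in text.split(".") if s.strip()]

def score(query: str, passage: str) -> int:
    # Stand-in for cosine similarity between embedding vectors
    return len(set(query.lower().split()) & set(passage.lower().split()))

def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    # Step 2: pull the k most relevant chunks for the query
    return sorted(chunks, key=lambda c: score(query, c), reverse=True)[:k]

docs = ("ChromaDB stores vectors on disk. "
        "LangChain orchestrates loaders, splitters and chains. "
        "The LLM answers using retrieved context only.")

# Step 3: the retrieved chunks become the LLM's context
context = retrieve("what does ChromaDB store", chunk(docs))
prompt = f"Context: {' '.join(context)}\nQuestion: what does ChromaDB store?"
```

The real system swaps each toy piece for a production equivalent, but the control flow is exactly this.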
How to Build AI Chatbot using LangChain — Architecture
Before writing code, here are the 4 components you need to understand:
| Component | What it does |
|---|---|
| 🔀 Document Loader | Ingests PDFs, Word docs, web pages. LangChain has 100+ loaders built-in |
| ✂️ Text Splitter | Breaks docs into overlapping chunks — overlap stops answers being cut mid-thought |
| 🧮 Embedding Model | Converts text to vectors. We use OpenAI text-embedding-3-small — cheap and good |
| 🗄️ Vector Store | ChromaDB locally, Pinecone for production. Switching takes 3 lines |
Environment Setup — LangChain RAG Python Installation
Python 3.10 or higher. Use a virtual environment — mixing global packages is how you end up spending an afternoon debugging import errors that have nothing to do with your actual RAG code.
```bash
# Create virtual environment
python -m venv rag-env
source rag-env/bin/activate   # Windows: rag-env\Scripts\activate

# Install all dependencies for RAG LangChain ChromaDB
pip install langchain langchain-community langchain-openai
pip install chromadb openai tiktoken pypdf
pip install python-dotenv streamlit
```
RAG Document Loader — Load and Split Documents with LangChain
```python
# document_processor.py
from langchain_community.document_loaders import PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter

def load_and_split(file_path: str) -> list:
    loader = PyPDFLoader(file_path)
    documents = loader.load()

    splitter = RecursiveCharacterTextSplitter(
        chunk_size=1000,
        chunk_overlap=200,
        separators=["\n\n", "\n", " ", ""]
    )
    chunks = splitter.split_documents(documents)
    print(f"✅ {len(documents)} pages → {len(chunks)} chunks")
    return chunks
```
The chunk_size and chunk_overlap parameters here are things I've tweaked across multiple projects. 1000 characters per chunk works well for most documents. The 200-character overlap? That's the bit most tutorials skip. Without it, you'll occasionally get answers that feel cut off because the relevant sentence happened to fall right at a chunk boundary.
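You can see the boundary problem with a toy character-level splitter (this is a sketch to illustrate the idea, not the LangChain implementation). Without overlap, a sentence that straddles a chunk boundary ends up split across two chunks and neither one contains it whole:

```python
# Why chunk_overlap matters: with overlap, a sentence that straddles
# a chunk boundary survives intact in at least one chunk.

def split_chars(text: str, size: int, overlap: int) -> list[str]:
    step = size - overlap
    return [text[i:i + size] for i in range(0, len(text), step)]

sentence = "the refund policy is thirty days"
text = "A" * 90 + sentence + "B" * 90  # sentence sits at chars 90-122

no_overlap = split_chars(text, size=100, overlap=0)
with_overlap = split_chars(text, size=100, overlap=40)

# no_overlap: the sentence is cut at char 100, so no chunk holds all of it.
# with_overlap: the chunk starting at char 60 contains the full sentence.
```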
ChromaDB Vector Database Explained — Build Your Vector Store
ChromaDB is what makes this whole setup so easy to get running locally. It's just a folder on your disk — no Docker, no cloud account, no sign-up. You embed your chunks, it saves them, and later you query by meaning instead of keyword matching. That last part is what makes vector search feel almost magical the first time you try it.
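Under the hood, "query by meaning" is just nearest-neighbor search with cosine similarity over embedding vectors. A minimal sketch with made-up 3-dimensional vectors — real text-embedding-3-small vectors have 1536 dimensions, and the numbers below are invented for illustration:

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    # Cosine similarity: dot product over the product of magnitudes
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy "embeddings" — semantically similar texts get nearby vectors
vectors = {
    "refund policy": [0.9, 0.1, 0.0],
    "return an item": [0.85, 0.2, 0.05],  # close in meaning, zero shared words
    "gpu benchmarks": [0.0, 0.1, 0.95],
}
query = [0.88, 0.15, 0.02]  # pretend embedding of "how do I get my money back"
best = max(vectors, key=lambda k: cosine(query, vectors[k]))
```

The query shares no keywords with "refund policy" or "return an item", yet both score near 1.0 while "gpu benchmarks" scores near 0 — that's the whole trick keyword search can't do.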
```python
# vectorstore.py
from langchain_openai import OpenAIEmbeddings
from langchain_community.vectorstores import Chroma

def create_vectorstore(chunks, persist_dir="./chroma_db"):
    embeddings = OpenAIEmbeddings(model="text-embedding-3-small")
    vectorstore = Chroma.from_documents(
        documents=chunks,
        embedding=embeddings,
        persist_directory=persist_dir
    )
    return vectorstore

def load_vectorstore(persist_dir="./chroma_db"):
    embeddings = OpenAIEmbeddings(model="text-embedding-3-small")
    return Chroma(persist_directory=persist_dir, embedding_function=embeddings)
```
Build AI Chatbot with RAG — The Core Chain Logic
```python
# rag_chain.py
from langchain_openai import ChatOpenAI
from langchain.chains import RetrievalQA
from langchain.prompts import PromptTemplate

PROMPT_TEMPLATE = """Use ONLY the context below to answer.
If not in context, say "I don't have enough information."

Context: {context}

Question: {question}

Answer:"""

def build_rag_chain(vectorstore):
    llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)
    chain = RetrievalQA.from_chain_type(
        llm=llm,
        chain_type="stuff",
        retriever=vectorstore.as_retriever(search_kwargs={"k": 4}),
        chain_type_kwargs={"prompt": PromptTemplate.from_template(PROMPT_TEMPLATE)},
        return_source_documents=True
    )
    return chain
```
⚠️ Keep temperature=0. I know it's tempting to bump it up for more "creative" responses — don't. The whole point of RAG is factual grounding. A temperature above 0 lets the model start improvising around your retrieved context, which brings hallucinations right back in through the side door.
Streamlit Chat Interface — Run Your RAG AI Chatbot
```python
# app.py
import streamlit as st
from vectorstore import load_vectorstore
from rag_chain import build_rag_chain

st.title("📄 RAG AI Chatbot — Document Aware")

@st.cache_resource
def load_chain():
    # Cached so the vector store and chain are built once, not on every rerun
    return build_rag_chain(load_vectorstore("./chroma_db"))

chain = load_chain()

if "messages" not in st.session_state:
    st.session_state.messages = []

# Replay the chat history on every rerun
for msg in st.session_state.messages:
    with st.chat_message(msg["role"]):
        st.write(msg["content"])

if prompt := st.chat_input("Ask anything about your document..."):
    # Show and store the user's message too — not just the answer
    with st.chat_message("user"):
        st.write(prompt)
    st.session_state.messages.append({"role": "user", "content": prompt})

    result = chain.invoke({"query": prompt})
    answer = result["result"]
    with st.chat_message("assistant"):
        st.write(answer)
    st.session_state.messages.append({"role": "assistant", "content": answer})
```
Run it:

```bash
streamlit run app.py
```
ChromaDB vs Pinecone vs Weaviate — Which Vector Database to Choose?
Quick honest answer: start with ChromaDB, switch to Pinecone when you ship. I've used all three — ChromaDB is unbeatable for local development because there's literally nothing to set up. Pinecone is where you go when you have real users and need reliability and scale. Weaviate is worth looking at if you need hybrid search.
| Feature | ChromaDB | Pinecone | Weaviate |
|---|---|---|---|
| Setup | pip install · local | Managed cloud | Docker / Cloud |
| Cost | Free | Paid after free tier | Open source |
| Scale | Millions of docs | Billions of vectors | Enterprise scale |
| Best For | Dev · Prototyping | Production SaaS | Hybrid search |
Retrieval Augmented Generation Example — RAG vs Naive LLM
| | ✗ Naive LLM | ✓ RAG System |
|---|---|---|
| Data source | Training data only | Your actual documents |
| Hallucinations | Confident and frequent | Grounded in retrieved context |
| Knowledge | Frozen at cutoff | Update docs anytime |
| Attribution | None | Cites exact source + page |
GitHub Repository
All code from this tutorial is on GitHub:
👉 ragavi-s/rag-langchain-chromadb
```
rag-langchain-chromadb/
├── docs/                   # Put your PDFs here
├── chroma_db/              # Auto-generated vector store
├── document_processor.py   # Step 4 — Load & split
├── vectorstore.py          # Step 5 — ChromaDB logic
├── rag_chain.py            # Step 6 — RAG chain
├── app.py                  # Step 7 — Streamlit UI
├── requirements.txt
└── README.md
```
Recommended Tools for Production RAG
- 🌲 Pinecone — What I'd use when this goes to production. Fully managed, scales without you touching anything.
- 🔷 Weaviate — Open-source vector DB with hybrid search. Self-hostable, solid docs.
- 🦜 LangChain Cloud — LangSmith is genuinely useful once you start debugging why your retrieval is returning wrong chunks.
Before You Close This Tab
What is RAG — Stop thinking of it as a fancy feature. It's just: retrieve relevant chunks, hand them to the LLM, get an answer grounded in your data.
Core Stack — LangChain (orchestration) + ChromaDB (vectors) + OpenAI (embeddings + generation). Each piece is replaceable.
Dev → Prod — Build with ChromaDB locally. Swap to Pinecone when you ship. Seriously, it's 3 lines.
Don't Skip This — temperature=0, chunk_overlap=200, k=4 chunks retrieved. These defaults took me a while to land on — trust them until you have a reason to change them.
What's Next — Once this works, look into re-ranking and hybrid search (BM25 + vector) if your documents have lots of proper nouns or codes that semantic search struggles with.
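To make the hybrid-search idea concrete, here's a minimal sketch of blending a lexical score with a semantic one. The `lexical` and `hybrid` helpers and the alpha weight are invented for illustration — production systems use real BM25 plus embedding similarity, and the semantic score below is a made-up number rather than a model output:

```python
# Toy hybrid scoring: blend exact-token matching (good for proper nouns
# and part codes) with a semantic similarity score.

def lexical(query: str, doc: str) -> float:
    # Fraction of doc tokens that exactly match a query token
    q_tokens = query.lower().split()
    d_tokens = doc.lower().split()
    hits = sum(d_tokens.count(t) for t in q_tokens)
    return hits / (len(d_tokens) or 1)

def hybrid(query: str, doc: str, semantic: float, alpha: float = 0.5) -> float:
    # alpha weights lexical vs semantic; tune it per corpus
    return alpha * lexical(query, doc) + (1 - alpha) * semantic

doc = "Error E-4471 occurs when the GPU driver version is below 535."

# Codes like "E-4471" often embed poorly, so pretend the embedding model
# scored this doc only 0.2 — the lexical component rescues the match.
code_score = hybrid("E-4471", doc, semantic=0.2)
vague_score = hybrid("graphics card", doc, semantic=0.2)
```

With pure semantic search both queries would tie at 0.2; the exact-token component is what lifts the "E-4471" query above the vague one.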
Independent tech writer based in India. I write about AI, Python, and developer tools — mostly things I've actually built and broken myself before writing about them. Founder of Tech Journalism. No sponsored opinions, no hype, just code that works.
If this saved you time — drop a GitHub link in the comments. Always curious to see how people extend it.