🔗 Live Demo:
https://pdf-chat-rag-fx5nczbrwczzpou6qyczmj.streamlit.app/
📦 GitHub Repo:
https://github.com/aliabdm/pdf-chat-rag
🤔 The Idea
Ever wished you could talk to your documents instead of endlessly scrolling through pages?
That’s exactly what I built using Retrieval-Augmented Generation (RAG) and modern GenAI tools.
Upload a PDF → ask questions → get accurate, context-aware answers in seconds.
❌ The Problem
We’ve all been there:
50-page research papers
Long contracts
Dense technical docs
CVs in recruitment workflows
Ctrl + F isn’t enough when you need:
Summaries
Cross-section answers
Simple explanations
Context-aware responses
✅ The Solution: PDF Chat with RAG
I built a web app that lets you:
Upload any PDF
Ask questions in natural language
Get answers grounded only in your document
👉 Try it live:
https://pdf-chat-rag-fx5nczbrwczzpou6qyczmj.streamlit.app/
🧱 Tech Stack (Why Each Tool Matters)
🧩 LangChain — The RAG Backbone
LangChain makes RAG production-ready by handling:
Document chunking
Embeddings
Retrieval + generation orchestration
```python
from langchain_text_splitters import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,
    chunk_overlap=200
)
chunks = text_splitter.split_text(text)
```
⚡ Groq — Lightning-Fast LLM Inference
Groq uses custom LPU hardware and delivers:
~2s response time
Models like Llama 3.3 70B
Generous free tier
```python
from langchain_groq import ChatGroq

llm = ChatGroq(
    model_name="llama-3.3-70b-versatile",
    temperature=0,
    groq_api_key=api_key
)
```
🔍 FAISS — Vector Similarity Search
When your PDF becomes 100+ chunks, FAISS finds the most relevant ones fast.
```python
from langchain_community.vectorstores import FAISS

vector_store = FAISS.from_texts(chunks, embeddings)
```
🎨 Streamlit — UI in Minutes
Why Streamlit?
No frontend boilerplate
Built-in chat + file upload
Free deployment
```python
import streamlit as st

uploaded_file = st.file_uploader("Upload PDF", type=["pdf"])

if question := st.chat_input("Ask a question"):
    ...
```
🧠 HuggingFace Embeddings
We use all-MiniLM-L6-v2:
Fast
High quality
Runs locally
No API cost
```python
from langchain_community.embeddings import HuggingFaceEmbeddings

embeddings = HuggingFaceEmbeddings(
    model_name="sentence-transformers/all-MiniLM-L6-v2"
)
```
🔄 How RAG Works (Simple Breakdown)
Phase 1 — Document Processing
Upload PDF
Extract text
Split into chunks
Generate embeddings
Store in FAISS
Phase 2 — Question Answering
Embed the question
Retrieve top 3 relevant chunks
Build context
Send to LLM
Return grounded answer
```python
docs = vector_store.similarity_search(question, k=3)
context = "\n\n".join(doc.page_content for doc in docs)

prompt = f"""
Context:
{context}

Question:
{question}

Answer ONLY based on the context above.
"""
answer = llm.invoke(prompt)
```
🧪 Core RAG Logic (That’s It)
```python
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser

def answer_question(question, vector_store, llm):
    # Retrieve the 3 most relevant chunks for the question
    docs = vector_store.similarity_search(question, k=3)
    context = "\n\n".join(doc.page_content for doc in docs)

    prompt = ChatPromptTemplate.from_template("""
    Context: {context}
    Question: {question}
    Provide a detailed answer based on the context.
    """)

    chain = prompt | llm | StrOutputParser()
    return chain.invoke({"context": context, "question": question})
```
🧠 Key Design Decisions
- Chunk overlap (200 chars): keeps sentences from being cut across chunk boundaries
- Temperature = 0: deterministic, grounded answers
- k = 3 chunks: a good speed/accuracy balance for this app
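To make the overlap decision concrete, here is a tiny pure-Python sketch of sliding-window chunking (just the idea, not the LangChain splitter itself): consecutive chunks share `overlap` characters, so a sentence cut at one boundary still appears whole in the next chunk.

```python
def chunk_text(text: str, chunk_size: int = 20, overlap: int = 5) -> list[str]:
    """Split text into fixed-size chunks whose edges overlap."""
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

chunks = chunk_text("RAG grounds answers in retrieved context.")
# Each adjacent pair of chunks shares 5 characters of context
```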
⚠️ Challenges & Fixes
PDF Text Extraction
Some PDFs (especially scanned or image-based ones) return broken or empty text.
✔️ Added validation + clear error messages.
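The fix can be sketched as a small guard (a hypothetical `validate_extracted_text` helper, not the app's exact code) that rejects empty or near-empty extractions with a clear message before they reach the pipeline:

```python
def validate_extracted_text(text: str, min_chars: int = 50) -> str:
    """Raise a clear error when a PDF yields unusable text (e.g. scanned pages)."""
    cleaned = (text or "").strip()
    if len(cleaned) < min_chars:
        raise ValueError(
            "Could not extract readable text from this PDF. "
            "It may be scanned or image-based."
        )
    return cleaned
```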
Context Window Limits
Large docs exceeded limits.
✔️ Limited chunk size + retrieval count.
Answer Quality
Early answers were vague.
✔️ Strong prompt constraints.
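"Strong prompt constraints" here means spelling out the grounding rules in the template. The wording below is illustrative, not the app's exact prompt:

```python
GROUNDED_PROMPT = """You are answering questions about a document.

Context:
{context}

Question: {question}

Rules:
- Answer ONLY from the context above.
- If the context does not contain the answer, say "I don't know."
- Quote the relevant passage when possible.
"""

prompt = GROUNDED_PROMPT.format(
    context="RAG retrieves chunks.",
    question="What does RAG do?",
)
```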
📊 Performance
| Metric | Value |
| --- | --- |
| PDF size | 50 pages |
| Processing time | ~15 s |
| Response time | ~2 s |
| Chunks | 87 |
| Accuracy | ⭐ 8.5 / 10 |
🚀 What’s Next?
- Multi-PDF support
- Conversation memory
- Export chat history
- Word / TXT support
🧑‍💻 Run It Locally
```bash
git clone https://github.com/aliabdm/pdf-chat-rag
cd pdf-chat-rag
pip install -r requirements.txt
streamlit run app.py
```
Deploy on Streamlit Cloud in one click 🚀
🧠 Lessons Learned
- RAG is simpler than it looks
- Speed > model size
- Prompt engineering matters
- Start simple, iterate fast
🔚 Final Thoughts
Modern AI is about orchestration, not reinventing tools.
If this helped you, consider giving the repo a ⭐
🔗 Connect With Me
LinkedIn: https://www.linkedin.com/in/mohammad-ali-abdul-wahed-1533b9171/
GitHub: https://github.com/aliabdm
Dev.to: https://dev.to/maliano63717738
Happy coding 🚀