DEV Community: Harith Y

New Project

Harith Y — Thu, 24 Jul 2025 17:01:02 +0000

Harith Y

Jul 24 '25

RAG Chatbot - MoviesGPT

#rag #ai #chatbot #llm

2 min read

RAG Chatbot - MoviesGPT

Harith Y — Thu, 24 Jul 2025 17:00:51 +0000

What is RAG?

Retrieval-Augmented Generation or RAG is when you change the output of a Large Language Model (LLM) by providing the model more context alongside a user’s input. That way, the model can use its ability to generate text along with extra context to provide accurate answers to users’ questions

Why is RAG useful?

Cost Effective
Models have cut-off dates, after which knowledge isn’t updated.
Covers up for information that does not exist

What is Vector Embedding?

A popular technique to represent information in a format that algorithms, especially deep learning models, can easily process. This ‘information’ can be text, pictures, video or audio.

Step-by-step workflow of MoviesGPT

Data Collection (Wikipedia Scraping)

The project uses Puppeteer (via LangChain) to scrape Wikipedia pages containing lists of movies in various Indian languages for the year 2025.
Each Wikipedia page’s content is fetched and cleaned of HTML tags.

Text Chunking

The scraped content is split into manageable chunks using a text splitter (RecursiveCharacterTextSplitter).
This ensures each chunk is of optimal size for embedding and storage.

Embedding Generation

Each text chunk is sent to NVIDIA’s embedding API (nvidia/nv-embedqa-e5-v5 model) to generate a high-dimensional vector representation.
These embeddings capture the semantic meaning of each chunk.

Database Storage (AstraDB)

The vector embeddings and their corresponding text chunks are stored in AstraDB, a vector database.
The database is set up to support efficient similarity search using the chosen metric (e.g., dot product).

User Interaction (Frontend)

Users interact with a chat interface built with Next.js.
When a user submits a question, it is sent to the backend API.

Query Embedding & Context Retrieval

The backend generates an embedding for the user’s question using the same NVIDIA model.
It then queries AstraDB for the most similar text chunks (context) based on vector similarity to the question embedding.

Prompt Construction

The retrieved context is formatted and combined with the user’s question to create a system prompt.
This prompt instructs the AI to use the provided context to answer the question, but to fall back on its own knowledge if needed.

AI Response Generation

The prompt and chat history are sent to OpenRouter’s chat API (using a model like deepseek/deepseek-chat).
The AI generates a streaming response, which is sent back to the frontend in real time.

User Receives Answer

The user sees the AI’s answer in the chat interface, formatted in markdown for readability.

Workflow Diagram (Textual)

Wikipedia Pages
↓
[Scraping & Cleaning]
↓
[Text Chunking]
↓
[Embedding Generation]
↓
[AstraDB Storage]
↓
(User asks a question)
↓
[Question Embedding]
↓
[Vector Search in AstraDB]
↓
[Relevant Context Retrieved]
↓
[Prompt Construction]
↓
[OpenRouter AI Chat Completion]
↓
[Streaming Response to User]

Summary

Backend: Handles scraping, embedding, storage, and retrieval.
Frontend: Provides a chat interface for users using NextJS via TypeScript.
AI Models: NVIDIA for embeddings, OpenRouter for chat.
Database: AstraDB for vector search and storage.

This workflow ensures that MoviesGPT can answer movie-related questions with up-to-date, contextually relevant information, providing a seamless and intelligent user experience.

Links

GitHub: GitHub
Demo: Demo

New project

Harith Y — Thu, 24 Jul 2025 16:51:47 +0000

Harith Y

Jul 24 '25

I built a Supercharged PDFs-ChatBot with LangChain and Streamlit, and you can too!

#langchain #streamlit #ai #chatbot

4 min read

I built a Supercharged PDFs-ChatBot with LangChain and Streamlit, and you can too!

Harith Y — Thu, 24 Jul 2025 16:49:45 +0000

A step-by-step journey into building a flexible, multi-provider RAG application that lets you talk to your documents.

We’ve all been there: staring at a 100-page PDF, knowing the answer to our question is buried somewhere inside. Skimming through dense academic papers, legal documents, or technical manuals is a tedious process. What if you could just… ask the document a question and get a straight answer instead of doing Ctrl+F multiple times?

That’s exactly what I set out to build: an interactive chatbot that ingests any PDF (not 1 but many hehe) and allows you to have a natural conversation with it.

In this post, I’ll walk you through how I built this PDFs-ChatBot using the power of Streamlit for the UI and LangChain for the AI orchestration. More importantly, I’ll show you how I designed it to be incredibly flexible, allowing users to switch between different AI providers like Google Gemini, Cohere, and NVIDIA on the fly.

You can find the full source code on GitHub.

Also, try it out Here!

The Core Idea: Retrieval-Augmented Generation (RAG)

The magic behind this application is a technique called Retrieval-Augmented Generation (RAG). Instead of just relying on a Large Language Model’s (LLM) general knowledge, we give it access to specific information from our documents.

Here’s the workflow in a nutshell:

Indexing (The “Library”):

First, we process the PDFs and create a searchable knowledge library.

Load & Split: The text from the PDFs is extracted and broken down into smaller, manageable chunks.
Embed: Each chunk is converted into a numerical representation called an “embedding” using a model (like Google’s embedding-001 or NVIDIA’s nv-embedqa-e5-v5, or any other model. The Project supports dynamic addition of LLMs). These embeddings capture the semantic meaning of the text.
Store: These embeddings are stored in a specialized database called a vector store (I’m using FAISS), which is incredibly fast at finding similar vectors.

Retrieval & Generation (The “Conversation”):

When you ask a question:

Your question is also converted into an embedding (Make sure to use the same Embedding model used for processing the PDFs for obtaining optimal results).
The vector store finds the text chunks from the PDF that are most semantically similar to your question.
Finally, your question and these relevant chunks are passed to an LLM (like Gemini or Cohere) with a prompt like: “Based on the following context, answer this question: [Your Question] [Relevant Chunks]”.
The LLM generates a coherent, context-aware answer. This approach ensures the answers are grounded in the document’s content, dramatically reducing hallucinations and providing accurate, relevant information.

Architecting for Flexibility: The Factory Pattern

One of my main goals was to avoid being locked into a single AI provider. I wanted the freedom to experiment with different models. To achieve this, I used a simple but powerful software design pattern: the Factory Pattern.

Instead of cluttering my main app.py with model initialisation logic, I created separate files: llm_providers.py and embedding_providers.py.

# llm_providers.py

import os
from langchain_google_genai import ChatGoogleGenerativeAI
from langchain.llms import Cohere, HuggingFaceHub

# The “Factory” dictionary
llm_providers = {
# Helper function for Initializing and returning the Google Gemini LLM.
‘gemini’: _get_gemini_llm,
# Helper function for Initializing and returning the Cohere LLM.
‘cohere’: _get_cohere_llm
}

# The main factory function
def get_llm(model_choice: str):
“””
Factory function to select and initialize the chosen LLM.
“””
return llm_providers[model_choice]()

This design keeps my code clean and modular. Adding a new LLM provider is as simple as adding a new helper function and one line to the dictionary.

Building the UI with Streamlit

Streamlit is a game-changer for building interactive data and AI applications in Python. It allowed me to create a polished user interface with minimal code.

The sidebar is the control center of the application.

A key part of making a Streamlit app feel seamless is managing its state. I used st.session_state extensively to store the conversation history, the user’s model choices, and the main conversation chain object, ensuring they persist between user interactions.

For the chat bubbles, I used a little custom CSS and HTML templating to create a familiar and visually appealing chat experience.

Final Thoughts and Next Steps

Building this PDF-ChatBot was an incredibly rewarding experience. It clarified the RAG pipeline and highlighted the importance of clean, modular code when working with rapidly evolving AI technologies. Of course, I had the trouble of going through multiple Documentations, I’m not an expert either!

The final application is a powerful tool that not only works well but is also easy to extend. Here are a few ideas for future improvements (Maybe any of you could create a PR, I would be very much happy to accept it :D):

Displaying Sources: Highlighting which parts of the PDF were used to generate an answer.
More Vector Stores: Adding support for other vector databases like Chroma or Pinecone.
Deployment: Packaging the application in Docker and deploying it to the cloud. If you’re looking to get your hands dirty with building practical LLM applications, I highly encourage you to try building a project like this. It’s a fantastic way to learn the fundamentals of the modern AI stack along with scouring the internet, looking through docs and stack overflow!

Thanks for reading! Feel free to check out the project on GitHub and build your own.