Sumit Kumar

I Built a RAG-Based PDF Reader Web App Using Node.js, LangChain, Ollama, and Hugging Face

Turn any PDF into an interactive AI knowledge base using Retrieval-Augmented Generation (RAG).

If you've ever wanted to upload a PDF and chat with it like ChatGPT, this project does exactly that.

I built a RAG-based PDF Reader Web App that allows users to:

  • 📄 Upload a PDF file
  • 🔍 Extract and process the content
  • ✂️ Split the content into chunks
  • 🧠 Generate embeddings locally
  • 💾 Store them in a vector store
  • 🎯 Retrieve relevant sections based on user questions
  • 🤖 Generate grounded answers using a local LLM

This project combines traditional web development with modern AI application design, making it a great hands-on example of how RAG works in practice.


💡 The Idea Behind the Project

The goal of this app is simple:

Upload a PDF and ask questions about its content in natural language.

Instead of manually searching through long reports, research papers, notes, books, or documentation, users can just ask:

  • "What is the main topic of this document?"
  • "Summarize chapter 2"
  • "What are the important conclusions?"
  • "What technologies are mentioned in the PDF?"

The application finds the most relevant content from the uploaded PDF and uses that context to answer the question.


📦 GitHub Repository

You can find the full source code here:

🔗 GitHub: https://github.com/SumitK25/rag-pdf-webapp

If you like the project, feel free to ⭐ star the repository, fork it, or contribute improvements!


🧩 What is RAG?

RAG stands for Retrieval-Augmented Generation.

It is a technique that improves LLM responses by combining two key steps:

  1. Retrieval – Search for the most relevant information from your own data
  2. Generation – Pass that retrieved information to a language model to generate an answer
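The two steps can be sketched in plain JavaScript. This is a minimal, self-contained toy: the 3-dimensional vectors are hand-made stand-ins for real embeddings, and the "generation" step is just a comment, since the real app hands the retrieved text to an LLM.

```javascript
// Toy retrieval step: rank document chunks by cosine similarity to a query.
function cosineSimilarity(a, b) {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

const chunks = [
  { text: "RAG retrieves relevant context.", vector: [0.9, 0.1, 0.0] },
  { text: "CSS styles the chat window.",     vector: [0.0, 0.2, 0.9] },
];

const queryVector = [0.8, 0.2, 0.1]; // pretend this came from an embedding model

// Retrieval: pick the chunk most semantically similar to the query.
const best = chunks
  .map(c => ({ ...c, score: cosineSimilarity(c.vector, queryVector) }))
  .sort((a, b) => b.score - a.score)[0];

// Generation: in the real app, best.text becomes context in the LLM prompt.
console.log(best.text); // → "RAG retrieves relevant context."
```

Real embeddings have hundreds of dimensions, but the ranking logic is the same idea.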

Instead of relying only on what the model already knows, RAG helps the model answer based on specific external knowledge, such as:

  • 📄 PDFs
  • 📝 Documents
  • 📒 Notes
  • 🏢 Internal company data
  • 📚 Research papers
  • 📖 Support manuals

Why RAG is Useful

Large language models are powerful, but they have some limitations:

  • ❌ They may hallucinate
  • ❌ They may not know your private data
  • ❌ They may not know newly added content
  • ❌ They may answer confidently even when wrong

RAG helps solve this by providing the model with relevant context at query time.

In this project, that context comes from the uploaded PDF.


✨ Project Features

Here are the main features of this application:

  • ✅ Upload PDF files from the browser
  • ✅ Extract text content from PDF documents
  • ✅ Split the document into manageable overlapping chunks
  • ✅ Convert chunks into embeddings using a local embedding model
  • ✅ Store embeddings in memory for semantic retrieval
  • ✅ Ask questions about the uploaded PDF
  • ✅ Retrieve the top relevant chunks
  • ✅ Generate answers using Ollama
  • ✅ Keep short chat history for conversational continuity
  • ✅ Clear chat and reset the app when needed
  • ✅ Modern and clean chat-style web interface

🛠️ Tech Stack

This project uses the following technologies.

Frontend

| Technology | Purpose |
| --- | --- |
| HTML | Page structure |
| CSS | Styling and layout |
| Vanilla JavaScript | Interactivity and API calls |

Backend

| Technology | Purpose |
| --- | --- |
| Node.js | Runtime environment |
| Express.js | Web server framework |
| Multer | File upload handling |
| CORS | Cross-origin support |
| dotenv | Environment variables |

AI / RAG Stack

| Technology | Purpose |
| --- | --- |
| LangChain | Orchestration framework |
| PDFLoader | PDF text extraction |
| RecursiveCharacterTextSplitter | Document chunking |
| MemoryVectorStore | In-memory vector database |
| HuggingFaceTransformersEmbeddings | Local embedding generation |
| ChatOllama | Local LLM integration |

Local Model Infrastructure

| Model | Role |
| --- | --- |
| Ollama | Local LLM server |
| llama3.2 | Answer generation |
| Xenova/all-MiniLM-L6-v2 | Embedding generation |

🤔 Why I Built This Project

I wanted to build something practical that demonstrates how AI can work with user-provided documents.

A lot of people hear about RAG in theory, but the best way to understand it is to build a complete project.

This project helped me explore:

  • 📄 How to process PDF files
  • 🔎 How semantic search works
  • 🧠 How embeddings are generated
  • 📦 How vector retrieval powers document Q&A
  • 🤖 How to integrate local LLMs into a web app
  • 🔗 How to connect a frontend chat UI to an AI backend

It's also a useful real-world application because PDFs are everywhere.


🎯 Use Cases

This kind of app can be useful for many scenarios.

🎓 For Students

  • Ask questions from notes or textbooks
  • Summarize chapters
  • Clarify concepts from study material

🔬 For Researchers

  • Query research papers quickly
  • Identify key findings
  • Summarize long documents

💻 For Developers

  • Search technical documentation
  • Ask questions from API docs
  • Understand large manuals

🏢 For Businesses

  • Interact with internal PDFs
  • Extract insights from reports
  • Build internal Q&A assistants

⚙️ How the Application Works

Let's walk through the full architecture and workflow of the application.

🔄 High-Level Flow

1.  User uploads a PDF
2.  Backend receives the file
3.  PDF text is extracted using LangChain
4.  Extracted text is split into chunks
5.  Chunks are converted into embeddings
6.  Embeddings are stored in a vector store
7.  User asks a question
8.  App retrieves the most relevant chunks
9.  Retrieved chunks are used as context in the LLM prompt
10. Ollama generates the answer
11. Answer is shown in the chat UI
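Step 9 — packing retrieved chunks into the LLM prompt — can be sketched in plain JavaScript. The prompt wording below is illustrative, not the exact template used in the repo:

```javascript
// Hypothetical prompt builder: stuffs retrieved chunks into the LLM prompt
// so the model answers from the PDF rather than from its training data.
function buildPrompt(question, retrievedChunks) {
  const context = retrievedChunks
    .map((chunk, i) => `[${i + 1}] ${chunk}`)
    .join("\n\n");
  return `Answer the question using ONLY the context below.

Context:
${context}

Question: ${question}`;
}

const prompt = buildPrompt("What is RAG?", [
  "RAG combines retrieval with generation.",
  "Retrieved chunks ground the model's answer.",
]);
console.log(prompt);
```

Grounding the prompt this way is what keeps answers tied to the uploaded document.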

🖥️ Frontend

The frontend is designed to be simple, clean, and interactive.

It contains:

  • 📌 A heading
  • 📎 A PDF upload section
  • ✅ Upload status messages
  • 💬 A chat interface
  • ⌨️ An input box for questions
  • 🔘 A button to ask questions
  • 🗑️ A clear chat button

📝 Frontend HTML Structure

The UI is built using HTML and styled with CSS.

1. Upload Section

This allows the user to select and upload a PDF.

<div class="upload-section">
  <input type="file" id="pdfFile" accept=".pdf"/>
  <button class="btn-upload" onclick="uploadPDF()">
    Upload PDF
  </button>
</div>

2. Upload Status

This displays upload progress or error messages.

<div id="uploadStatus"></div>

3. Chat Box

This is the main interface for user interaction.

<div class="chat-box">
  <div class="chat-header">
    💬 Chat with your PDF
    <button class="btn-clear" onclick="clearChat()">
      🗑 Clear
    </button>
  </div>
  <div class="messages" id="messages">
    <div class="message ai">
      👋 Upload a PDF and ask me anything about it!
    </div>
  </div>
  <div class="input-area">
    <input
      type="text"
      id="questionInput"
      placeholder="Ask a question about the PDF..."
      onkeydown="if(event.key==='Enter') askQuestion()"
    />
    <button class="btn-ask" id="askBtn" onclick="askQuestion()">
      Ask
    </button>
  </div>
</div>

🎨 Frontend CSS Styling

The app uses a modern card-style design.

* {
  box-sizing: border-box;
  margin: 0;
  padding: 0;
}

body {
  font-family: 'Segoe UI', sans-serif;
  background: #f0f2f5;
  display: flex;
  flex-direction: column;
  align-items: center;
  min-height: 100vh;
  padding: 30px 20px;
}

h1 {
  font-size: 26px;
  margin-bottom: 20px;
  color: #1a1a2e;
}

.upload-section {
  background: white;
  padding: 16px 24px;
  border-radius: 12px;
  box-shadow: 0 2px 10px rgba(0,0,0,0.1);
  display: flex;
  align-items: center;
  gap: 12px;
  margin-bottom: 12px;
  width: 100%;
  max-width: 720px;
}

.chat-box {
  background: white;
  border-radius: 12px;
  box-shadow: 0 2px 10px rgba(0,0,0,0.1);
  width: 100%;
  max-width: 720px;
  display: flex;
  flex-direction: column;
  height: 520px;
}

.chat-header {
  padding: 14px 20px;
  background: #4f46e5;
  color: white;
  border-radius: 12px 12px 0 0;
  display: flex;
  justify-content: space-between;
  align-items: center;
  font-size: 15px;
  font-weight: 600;
}

.messages {
  flex: 1;
  overflow-y: auto;
  padding: 20px;
  display: flex;
  flex-direction: column;
  gap: 12px;
}

.message {
  max-width: 82%;
  padding: 10px 14px;
  border-radius: 12px;
  font-size: 14px;
  line-height: 1.6;
  white-space: pre-wrap;
  word-break: break-word;
}

.message.user {
  background: #4f46e5;
  color: white;
  align-self: flex-end;
  border-bottom-right-radius: 3px;
}

.message.ai {
  background: #f1f5f9;
  color: #1a1a2e;
  align-self: flex-start;
  border-bottom-left-radius: 3px;
}

.message.loading {
  background: #f1f5f9;
  color: #94a3b8;
  align-self: flex-start;
  font-style: italic;
}

.input-area {
  display: flex;
  padding: 14px;
  gap: 10px;
  border-top: 1px solid #e2e8f0;
}

.input-area input {
  flex: 1;
  padding: 10px 14px;
  border: 1px solid #cbd5e1;
  border-radius: 8px;
  font-size: 14px;
  outline: none;
}

.input-area input:focus {
  border-color: #4f46e5;
}

.btn-ask {
  background: #4f46e5;
  color: white;
  border: none;
  padding: 10px 22px;
  border-radius: 8px;
  cursor: pointer;
  font-size: 14px;
  font-weight: 600;
}

.btn-ask:hover {
  background: #4338ca;
}

.btn-ask:disabled {
  background: #a5b4fc;
  cursor: not-allowed;
}

.btn-upload {
  background: #4f46e5;
  color: white;
  border: none;
  padding: 10px 20px;
  border-radius: 8px;
  cursor: pointer;
  font-size: 14px;
  font-weight: 600;
}

.btn-upload:hover {
  background: #4338ca;
}

.btn-clear {
  background: rgba(255,255,255,0.2);
  color: white;
  border: none;
  padding: 6px 14px;
  border-radius: 6px;
  cursor: pointer;
  font-size: 13px;
}

.btn-clear:hover {
  background: rgba(255,255,255,0.35);
}

CSS Design Highlights

  • ✅ Responsive centered layout
  • ✅ Upload card with spacing and shadows
  • ✅ Scrollable message container
  • ✅ Separate styles for user and AI messages
  • ✅ Loading state for thinking indicator
  • ✅ Disabled state for the ask button
  • ✅ Clean indigo color scheme

⚡ Frontend JavaScript Logic

The frontend JavaScript handles:

  • 📤 PDF upload
  • ✅ Status updates
  • 📨 Sending user questions
  • 💬 Showing chat messages
  • ⏳ Displaying loading states
  • 🗑️ Clearing chat
  • 📜 Auto-scrolling the chat window

📤 Uploading a PDF

When the user selects a PDF and clicks Upload PDF, the file is sent to the server using FormData.

async function uploadPDF() {
  const fileInput = document.getElementById("pdfFile");
  const statusDiv = document.getElementById("uploadStatus");

  if (!fileInput.files[0]) {
    statusDiv.style.color = "red";
    statusDiv.textContent = "❌ Please select a PDF.";
    return;
  }

  statusDiv.style.color = "orange";
  statusDiv.textContent = "⏳ Processing PDF...";

  const formData = new FormData();
  formData.append("pdf", fileInput.files[0]);

  try {
    const res = await fetch("/upload", {
      method: "POST",
      body: formData,
    });
    const data = await res.json();

    if (res.ok) {
      statusDiv.style.color = "green";
      statusDiv.textContent =
        `✅ Ready! ${data.chunks} chunks indexed.`;
      addMessage(
        "ai",
        "✅ PDF loaded successfully! Ask me anything."
      );
    } else {
      statusDiv.style.color = "red";
      statusDiv.textContent = "❌ " + data.error;
    }
  } catch (e) {
    statusDiv.style.color = "red";
    statusDiv.textContent = "❌ Upload failed.";
  }
}

What this does:

  • ✅ Checks if a file is selected
  • ✅ Updates UI status with color-coded messages
  • ✅ Sends the PDF to /upload via FormData
  • ✅ Handles success and failure responses
  • ✅ Informs the user when the PDF is ready

❓ Asking Questions

Once the PDF is processed, the user can ask questions about it.

async function askQuestion() {
  const input = document.getElementById("questionInput");
  const askBtn = document.getElementById("askBtn");
  const question = input.value.trim();
  if (!question) return;

  addMessage("user", question);
  input.value = "";
  askBtn.disabled = true;

  const loadingId = addMessage("loading", "🤖 Thinking...");

  try {
    const res = await fetch("/ask", {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ question }),
    });
    const data = await res.json();
    removeMessage(loadingId);

    if (res.ok) {
      addMessage("ai", data.answer);
    } else {
      addMessage("ai", "❌ " + data.error);
    }
  } catch (e) {
    removeMessage(loadingId);
    addMessage("ai", "❌ Request failed.");
  }

  askBtn.disabled = false;
  scrollToBottom();
}

What this does:

  • ✅ Reads the user question
  • ✅ Displays it in the chat as a user message
  • ✅ Sends the question to the backend
  • ✅ Shows a "🤖 Thinking..." message while waiting
  • ✅ Displays the AI response
  • ✅ Handles API errors gracefully
  • ✅ Disables the ask button during processing

🛠️ Message Utility Functions

The UI uses helper functions to manage chat messages dynamically.

function addMessage(type, text) {
  const box = document.getElementById("messages");
  const div = document.createElement("div");
  const id = "msg_" + Date.now() + Math.random();
  div.id = id;
  div.className = `message ${type}`;
  div.textContent = text;
  box.appendChild(div);
  scrollToBottom();
  return id;
}

function removeMessage(id) {
  const el = document.getElementById(id);
  if (el) el.remove();
}

function scrollToBottom() {
  const box = document.getElementById("messages");
  box.scrollTop = box.scrollHeight;
}

What these do:

  • addMessage() — Creates a new message bubble in the chat
  • removeMessage() — Removes a specific message (used for loading indicators)
  • scrollToBottom() — Auto-scrolls the chat to show the latest message

🗑️ Clearing the Chat

The app supports resetting everything with a single click.

async function clearChat() {
  document.getElementById("messages").innerHTML =
    '<div class="message ai">🗑️ Chat cleared! Upload a PDF to start.</div>';
  document.getElementById("uploadStatus").textContent = "";
  await fetch("/clear", { method: "POST" });
}

This clears:

  • 💬 Chat UI messages
  • ✅ Upload status text
  • 🧠 Backend vector store
  • 📝 Backend chat history
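The backend `/clear` route itself isn't shown in this walkthrough, but since the frontend calls `fetch("/clear", { method: "POST" })`, a plausible minimal handler (hypothetical — check the repo for the actual implementation) would simply reset the two in-memory state variables:

```javascript
// Hypothetical /clear handler — a sketch of what the backend route likely
// does: drop the indexed PDF and forget the conversation.
let vectorStore = { indexed: true };               // pretend a PDF is loaded
let chatHistory = [{ role: "user", content: "hi" }];

function clearHandler(req, res) {
  vectorStore = null;   // drop the indexed PDF embeddings
  chatHistory = [];     // forget the conversation history
  res.json({ message: "Session cleared" });
}

// Simulate an Express call with a minimal mock response object:
clearHandler({}, { json: (body) => console.log(body.message) });
```

Because all state lives in these two variables, "clearing" the app is just reassigning them.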

🔧 Backend

Now let's look at the backend, which handles all the AI and document processing logic.


📦 Backend Dependencies

The server imports the following modules:

import "dotenv/config";

import express from "express";
import multer from "multer";
import cors from "cors";
import fs from "fs";
import path from "path";
import { fileURLToPath } from "url";

import { PDFLoader }
  from "@langchain/community/document_loaders/fs/pdf";
import { RecursiveCharacterTextSplitter }
  from "@langchain/textsplitters";
import { MemoryVectorStore }
  from "langchain/vectorstores/memory";
import { HuggingFaceTransformersEmbeddings }
  from "@langchain/community/embeddings/huggingface_transformers";
import { ChatOllama }
  from "@langchain/ollama";

Why These Packages Are Used

| Package | Purpose |
| --- | --- |
| express | Server framework |
| multer | File upload handling |
| cors | Cross-origin request support |
| fs | File system operations |
| dotenv | Environment variable management |
| PDFLoader | Reads PDF content |
| RecursiveCharacterTextSplitter | Splits long text into chunks |
| MemoryVectorStore | Stores embeddings in memory |
| HuggingFaceTransformersEmbeddings | Generates embeddings locally |
| ChatOllama | Connects to the local Ollama model |

🚀 Server Initialization

The app starts by setting up Express and middleware.

const __filename = fileURLToPath(import.meta.url);
const __dirname = path.dirname(__filename);

const app = express();
app.use(cors());
app.use(express.json());
app.use(express.static("public"));

const upload = multer({ dest: "uploads/" });

What this does:

  • ✅ Enables CORS for cross-origin requests
  • ✅ Parses JSON request bodies
  • ✅ Serves static frontend files from the public directory
  • ✅ Configures Multer to store uploads temporarily in the uploads/ folder

💾 Application State

Two in-memory variables are used to track the current session:

let vectorStore = null;
let chatHistory = [];

| Variable | Purpose |
| --- | --- |
| vectorStore | Stores the indexed PDF chunks as embeddings |
| chatHistory | Stores recent conversation pairs for context continuity |

Important note: Because these are stored in memory:

  • ⚠️ They reset when the server restarts
  • ✅ They are suitable for demos and prototypes
  • ❌ They are not ideal for production persistence

🧠 Embedding Model Setup

The app uses a local Hugging Face embedding model:

const embeddings = new HuggingFaceTransformersEmbeddings({
  modelName: "Xenova/all-MiniLM-L6-v2",
});

Why This Model?

Xenova/all-MiniLM-L6-v2 is popular because it is:

  • ✅ Lightweight and fast
  • ✅ Runs locally without an external API
  • ✅ Good for semantic similarity tasks
  • ✅ Suitable for local RAG prototypes
  • ✅ Well-supported in LangChain

Its role is to convert text chunks into numerical vectors (embeddings) that capture the semantic meaning of the text.


🤖 LLM Setup with Ollama

For answer generation, the app uses Ollama with the llama3.2 model.

const llm = new ChatOllama({
  model: "llama3.2",
  temperature: 0,
  baseUrl: "http://127.0.0.1:11434",
});

Why Use Ollama?

Ollama allows you to run open-source LLMs locally.

  • ✅ Local execution — no cloud dependency
  • ✅ Privacy-friendly — your data stays on your machine
  • ✅ No API costs
  • ✅ Easy model switching
  • ✅ Great for experimentation and learning

Why Is the Temperature 0?

A lower temperature makes the output more deterministic and focused, which is ideal for document Q&A where accuracy matters more than creativity.
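The effect of temperature can be seen with a toy softmax over a handful of token scores — a simplified model of how sampling works, since real LLMs operate over tens of thousands of logits:

```javascript
// Softmax with temperature: lower T sharpens the distribution,
// concentrating probability on the top token (more deterministic output).
function softmax(logits, temperature) {
  const scaled = logits.map(x => x / temperature);
  const max = Math.max(...scaled);              // subtract max for numerical stability
  const exps = scaled.map(x => Math.exp(x - max));
  const sum = exps.reduce((a, b) => a + b, 0);
  return exps.map(e => e / sum);
}

const logits = [2.0, 1.0, 0.5];
console.log(softmax(logits, 1.0)); // probability mass is spread across tokens
console.log(softmax(logits, 0.1)); // almost all mass lands on the top token
```

At temperature 0, implementations typically skip sampling entirely and always pick the highest-scoring token (argmax), which is why answers become reproducible.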


📤 Upload Route: POST /upload

This route handles PDF uploading, text extraction, chunking, embedding, and indexing.

app.post("/upload", upload.single("pdf"), async (req, res) => {
  try {
    if (!req.file) {
      return res.status(400).json({ error: "No file uploaded" });
    }

    const filePath = req.file.path;
    chatHistory = [];

    console.log("📄 Loading PDF...");
    const loader = new PDFLoader(filePath);
    const docs = await loader.load();
    console.log(`   → Loaded ${docs.length} page(s)`);

    console.log("✂️  Splitting text...");
    const splitter = new RecursiveCharacterTextSplitter({
      chunkSize: 1000,
      chunkOverlap: 200,
    });
    const splits = await splitter.splitDocuments(docs);
    console.log(`   → Created ${splits.length} chunks`);

    console.log("🧠 Creating LOCAL embeddings...");
    vectorStore = await MemoryVectorStore.fromDocuments(
      splits,
      embeddings
    );

    fs.unlinkSync(filePath);

    console.log("✅ PDF processed successfully!");
    res.json({
      message: "✅ PDF processed successfully!",
      chunks: splits.length,
    });

  } catch (err) {
    console.error("❌ Upload error:", err.message);
    res.status(500).json({ error: "PDF processing failed" });
  }
});

🔍 Step-by-Step Breakdown of /upload

Step 1: Validate the Upload

If no file is uploaded, the server returns an error immediately.

if (!req.file) {
  return res.status(400).json({ error: "No file uploaded" });
}

Step 2: Reset Chat History

Uploading a new PDF starts a fresh document session.

chatHistory = [];

Step 3: Load the PDF

The PDF is parsed into LangChain document objects using PDFLoader.

const loader = new PDFLoader(filePath);
const docs = await loader.load();

Each page of the PDF becomes a separate document object containing:

  • pageContent — the text of the page
  • metadata — page number and source info

Step 4: Split the Text into Chunks

The content is divided into manageable chunks using RecursiveCharacterTextSplitter.

const splitter = new RecursiveCharacterTextSplitter({
  chunkSize: 1000,
  chunkOverlap: 200,
});
const splits = await splitter.splitDocuments(docs);

Step 5: Generate Embeddings and Create the Vector Store

Each chunk is embedded and stored for later retrieval.

vectorStore = await MemoryVectorStore.fromDocuments(
  splits,
  embeddings
);

Step 6: Delete the Uploaded File

The temporary file is removed after processing to save disk space.

fs.unlinkSync(filePath);

Step 7: Return Success Response

The server sends back the chunk count and a success message.

res.json({
  message: "✅ PDF processed successfully!",
  chunks: splits.length,
});

✂️ Why Chunking is Important

Chunking is a critical part of any RAG system.

If you pass an entire PDF into an LLM:

  • ❌ It may exceed token limits
  • ❌ Retrieval becomes inefficient
  • ❌ Answers may become noisy or unfocused
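Overlapping chunks also keep sentences that straddle a boundary retrievable from both sides. Here is a simplified sliding-window version of what the splitter does — the real RecursiveCharacterTextSplitter additionally tries to break on paragraph and sentence boundaries rather than at fixed character offsets:

```javascript
// Simplified character-window chunker with overlap.
// With chunkSize 1000 and chunkOverlap 200 (the app's settings),
// each chunk starts 800 characters after the previous one.
function chunkText(text, chunkSize, chunkOverlap) {
  const chunks = [];
  const stride = chunkSize - chunkOverlap;
  for (let start = 0; start < text.length; start += stride) {
    chunks.push(text.slice(start, start + chunkSize));
    if (start + chunkSize >= text.length) break; // last window reached the end
  }
  return chunks;
}

const text = "x".repeat(2500);
const chunks = chunkText(text, 1000, 200);
console.log(chunks.length);             // → 3
console.log(chunks.map(c => c.length)); // → [1000, 1000, 900]
```

Each chunk fits comfortably in the embedding model's input window, and the 200-character overlap means context at a boundary appears in two chunks.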
