Recently, I was working on a Retrieval-Augmented Generation (RAG) chatbot that takes user-uploaded PDFs as well as structured data scraped from websites. The goal was to convert this data into embeddings, store them in a vector database, and then use them to answer user queries.
Step 1: Extract Raw Text
From a PDF:
import { PdfReader } from 'pdfreader';

// `buffer` holds the uploaded PDF as a Node.js Buffer (see the sketch below)
let text = '';
const pdfReader = new PdfReader();

await new Promise<void>((resolve, reject) => {
  pdfReader.parseBuffer(buffer, (err: any, item: any) => {
    if (err) reject(err);                        // parsing failed
    else if (!item) resolve();                   // no more items: end of file
    else if (item.text) text += item.text + ' '; // accumulate text items
  });
});
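For context, how you obtain buffer depends on your framework. Since the imports later use the @/app alias, here is a minimal sketch assuming a Next.js App Router route handler (the path and handler shape are my assumptions, not shown in the original post):

// app/api/upload/route.ts (hypothetical path)
export async function POST(req: Request) {
  const formData = await req.formData();
  const file = formData.get("file") as File;            // the uploaded PDF
  const buffer = Buffer.from(await file.arrayBuffer()); // Node.js Buffer for pdfreader
  // ...run the pdfreader extraction shown above...
  return Response.json({ ok: true });
}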
From scraped website data:
function extractTextFromScrapedData(data: ScrapedData): string {
  const sections: string[] = [];
  if (data.about) sections.push(`About: ${data.about}`);
  // Put a bullet in front of every service, including the first one
  if (data.services?.length) sections.push(`Services:\n• ${data.services.join('\n• ')}`);
  if (data.contact?.phone) sections.push(`Phone: ${data.contact.phone}`);
  return sections.join('\n\n');
}
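The ScrapedData type isn't shown above; a minimal interface covering just the fields this function touches would look like this (your scraper may return more):

interface ScrapedData {
  about?: string;
  services?: string[];
  contact?: {
    phone?: string;
  };
}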
Step 2: Split Text into Chunks
Long documents need to be split into smaller chunks so they fit within the LLM's context window. I used LangChain's RecursiveCharacterTextSplitter:
import { RecursiveCharacterTextSplitter } from "@langchain/textsplitters";
import { Document } from "@langchain/core/documents";

const doc = new Document({ pageContent: text });

const splitter = new RecursiveCharacterTextSplitter({
  chunkSize: 1000,   // max characters per chunk
  chunkOverlap: 200, // characters shared between consecutive chunks
});

const allSplits = await splitter.splitDocuments([doc]);
console.log("Chunks created:", allSplits.length);
Step 3: Generate Embeddings + Store
This is the most important part:
When you call vectorStore.addDocuments(), LangChain automatically calls your embedding model (e.g. OpenAI, Cohere) and saves the resulting vectors to your configured database (Pinecone, in my case).
import { vectorStore } from "@/app/lib/langchain";
// Store with namespace (client-specific)
await vectorStore.addDocuments(allSplits, { namespace: clientId });
console.log("✅ Documents added to vectorStore");
Step 4: Retrieve Context During Chat
When a user asks a question, we retrieve the top-k similar chunks:
const retrieved = await vectorStore.similaritySearchWithScore(
  userMessage,
  3,                      // top-k chunks to fetch
  { namespace: clientId } // scope the search to this client's data
);

// Drop weak matches; a score of 0.6 worked well as a similarity cutoff here
const relevantMatches = retrieved.filter(([, score]) => score > 0.6);
console.log("Relevant matches:", relevantMatches.length);
Other chunking methods worth knowing (a quick sketch of two of them follows the list):
- Agentic Chunking → LLM-guided, semantic-aware chunking based on meaning and context.
- CharacterTextSplitter → Splits by character count (simple, fast).
- RecursiveCharacterTextSplitter → Splits by hierarchy (paragraph → sentence → word → char). Most commonly used.
- TokenTextSplitter → Splits by tokens, aligns with model tokenization.
- MarkdownTextSplitter → Splits Markdown docs by headers/sections.
- HTMLTextSplitter → Splits HTML while respecting tags.
- CodeTextSplitter → Splits source code by functions, classes, logical blocks.
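Several of these ship with @langchain/textsplitters. A short sketch of two alternatives from the list above (parameters are illustrative, and markdownText stands in for your Markdown source):

import {
  TokenTextSplitter,
  MarkdownTextSplitter,
} from "@langchain/textsplitters";

// Token-based splitting: chunk sizes line up with model tokenization
const tokenSplitter = new TokenTextSplitter({
  chunkSize: 256,  // measured in tokens, not characters
  chunkOverlap: 32,
});
const tokenChunks = await tokenSplitter.splitText(text);

// Markdown-aware splitting: prefers header/section boundaries
const mdSplitter = new MarkdownTextSplitter({
  chunkSize: 1000,
  chunkOverlap: 200,
});
const mdChunks = await mdSplitter.splitText(markdownText);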
Refer to this doc for more details:
DOC
Happy Coding, Keep Striving.