Glaucia Lemos

🚀 Building a Production-Ready RAG System: Zero to Hero with TypeScript, Docker, Google Gemini & LangChain.js

glaucia86 / rag-search-ingestion-langchainjs-gemini

A PDF search ingestion RAG application with Docker + LangChain.js + Gemini

🤖 RAG Search Ingestion - LangChain.js + Docker + Gemini


A complete Retrieval-Augmented Generation (RAG) application for intelligent search across PDF documents, built with TypeScript, Node.js, and modern AI technologies.

📋 Table of Contents

🎯 Overview

This project implements a complete RAG system that lets you ask natural-language questions about the content of PDF documents. The system processes documents, creates vector embeddings, stores them in a PostgreSQL database with pgVector, and answers questions using Google Gemini.

How It Works

  1. Ingestion: The system loads and processes PDF documents, splitting them into chunks
  2. Vectorization: Each chunk is converted into embeddings using Google Gemini
  3. Storage: The embeddings are stored in PostgreSQL with the pgVector extension
  4. Search: When you ask a question, the system finds the most relevant chunks
  5. …

Have you ever wondered how to build an AI system that can answer questions about your specific documents without hallucinating? Welcome to the world of Retrieval-Augmented Generation (RAG) - the game-changing architecture that's revolutionizing how we interact with AI systems!

In this comprehensive tutorial, I'll walk you through building a complete, production-ready RAG system from scratch using modern technologies that every developer should know about.

Full Tutorial HERE

🎯 What You'll Learn

By the end of this tutorial, you'll have a fully functional RAG system that can:

  • ✅ Process PDF documents intelligently
  • ✅ Answer natural language questions with precision
  • ✅ Provide source-grounded responses (no more hallucinations!)
  • ✅ Scale to production environments
  • ✅ Run everything in Docker containers

🔧 Our Tech Stack

We're building this with cutting-edge technologies:

  • TypeScript - For type-safe, maintainable code
  • Docker - For containerized, scalable deployment
  • Google Gemini - For powerful AI embeddings and generation
  • LangChain.js - For seamless AI application orchestration
  • PostgreSQL + pgVector - For efficient vector storage and similarity search
  • Node.js - For robust backend runtime

🧠 Why RAG? The Problem with Traditional LLMs

Large Language Models like GPT, Claude, and Gemini are incredibly powerful, but they have some critical limitations:

The Challenges:

  • Static Knowledge: Limited to training data cutoff dates
  • Hallucinations: Tendency to invent information when uncertain
  • No Domain Context: Can't access your private documents or databases
  • Update Limitations: Can't learn new facts without expensive retraining

The RAG Solution:

RAG elegantly solves these problems by combining two powerful components:

  1. Retrieval Component: Intelligently searches for relevant information in your knowledge base
  2. Generation Component: Uses an LLM to generate responses based exclusively on retrieved context

This ensures your AI responses are always grounded in verifiable sources!

๐Ÿ—๏ธ System Architecture Overview

Our RAG system follows this intelligent pipeline:

PDF Document → Text Extraction → Smart Chunking →
Vector Embeddings → PostgreSQL Storage → Semantic Search →
Context Assembly → AI Response Generation

🚀 Quick Start Guide

Prerequisites

Make sure you have these installed:

  • Node.js 22.0.0+
  • Docker 24.0.0+
  • Git 2.40.0+

1. Project Setup

mkdir rag-system-typescript && cd rag-system-typescript
mkdir src
npm init -y

2. Install Dependencies

Production dependencies:

npm install @google/generative-ai @langchain/core @langchain/community @langchain/textsplitters dotenv pg uuid

Development dependencies:

npm install -D @types/node @types/pg @types/pdf-parse tsx typescript

3. TypeScript Configuration

Create a tsconfig.json with optimized settings:

{
  "compilerOptions": {
    "target": "ES2022",
    "module": "ESNext",
    "moduleResolution": "node",
    "outDir": "./dist",
    "rootDir": "./src",
    "strict": true,
    "esModuleInterop": true,
    "skipLibCheck": true,
    "forceConsistentCasingInFileNames": true
  }
}

4. Docker Infrastructure

Set up PostgreSQL with pgVector using this docker-compose.yml:

services:
  postgres:
    image: pgvector/pgvector:pg17
    container_name: postgres_rag_ts
    environment:
      POSTGRES_USER: postgres
      POSTGRES_PASSWORD: postgres  
      POSTGRES_DB: rag
    ports:
      - "5432:5432"
    volumes:
      - postgres_data:/var/lib/postgresql/data
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U postgres -d rag"]
      interval: 10s
      timeout: 5s
      retries: 5

  bootstrap_vector_ext:
    image: pgvector/pgvector:pg17
    depends_on:
      postgres:
        condition: service_healthy
    entrypoint: ["/bin/sh", "-c"]
    command: >
      PGPASSWORD=postgres
      psql "postgresql://postgres@postgres:5432/rag" -v ON_ERROR_STOP=1
      -c "CREATE EXTENSION IF NOT EXISTS vector;"
    restart: "no"

volumes:
  postgres_data:

🤖 Google Gemini Integration

Here's how we create a robust Google client:

import { GoogleGenerativeAI } from '@google/generative-ai';

export class GoogleClient {
  private genAI: GoogleGenerativeAI;

  constructor() {
    const apiKey = process.env.GOOGLE_API_KEY;
    if (!apiKey) {
      throw new Error('Google API key is required!');
    }
    this.genAI = new GoogleGenerativeAI(apiKey);
  }

  async getEmbeddings(texts: string[]): Promise<number[][]> {
    const embeddings: number[][] = [];
    // Create the embedding model once, outside the loop
    const model = this.genAI.getGenerativeModel({
      model: 'embedding-001',
    });

    for (const text of texts) {
      try {
        const result = await model.embedContent(text);

        if (result.embedding?.values) {
          embeddings.push(result.embedding.values);
        }
      } catch (error) {
        console.error('Error generating embedding:', error);
        // Fallback: push a zero vector so results stay aligned with the input texts
        embeddings.push(new Array(768).fill(0));
      }
    }

    return embeddings;
  }
}

🎯 The Magic of Embeddings

What are embeddings? Think of them as numerical "fingerprints" of text that capture semantic meaning:

"cat" → [0.1, 0.3, 0.5, ..., 0.8]  // 768 dimensions
"dog" → [0.2, 0.4, 0.6, ..., 0.7]  // Similar to "cat"

When vectors are close in mathematical space, the concepts are semantically similar!
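That "closeness" is usually measured with cosine similarity. A minimal TypeScript sketch of the idea (the vectors below are toy 3-dimensional values, not real 768-dimensional Gemini embeddings):

```typescript
// Cosine similarity: 1 means same direction, 0 unrelated, -1 opposite.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Toy vectors standing in for real embeddings
const cat = [0.1, 0.3, 0.5];
const dog = [0.2, 0.4, 0.6];
const car = [0.9, -0.2, 0.1];

console.log(cosineSimilarity(cat, dog) > cosineSimilarity(cat, car)); // true
```

Semantic search is exactly this comparison, done efficiently over thousands of stored vectors.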

📄 Smart Document Processing

Our chunking strategy is crucial for RAG performance:

import { RecursiveCharacterTextSplitter } from '@langchain/textsplitters';

const textSplitter = new RecursiveCharacterTextSplitter({
  chunkSize: 400,        // Optimal for tabular data
  chunkOverlap: 0,       // No overlap needed for tables
});

For tabular PDFs (like our use case), we break documents line by line, preserving table headers in each chunk for maximum semantic clarity.
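To illustrate the header-preservation idea, here is a hypothetical helper (not the repository's implementation) that splits tabular text row by row and repeats the header line in every chunk so each chunk stays self-describing:

```typescript
// Illustrative helper: chunk a table line by line, prepending the header
// row to every chunk so no chunk loses its column context.
function chunkTableWithHeader(text: string, maxChars: number): string[] {
  const [header, ...rows] = text.split('\n');
  const chunks: string[] = [];
  let current: string[] = [];
  let size = header.length;

  for (const row of rows) {
    // Close the current chunk when adding this row would exceed the budget
    if (size + row.length > maxChars && current.length > 0) {
      chunks.push([header, ...current].join('\n'));
      current = [];
      size = header.length;
    }
    current.push(row);
    size += row.length;
  }
  if (current.length > 0) chunks.push([header, ...current].join('\n'));
  return chunks;
}
```

Each chunk now embeds as a complete mini-table, which keeps the semantic signal of the column names in every vector.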

💫 HNSW: The Secret Sauce of Fast Vector Search

Our system uses Hierarchical Navigable Small World (HNSW) indexing - think of it as a GPS for vector space:

  • Hierarchical Structure: Multiple levels for efficient navigation
  • Fast Searches: Millisecond responses even with millions of vectors
  • Scalable: Handles large datasets without performance degradation

-- Create the HNSW index on the embedding column (cosine distance)
CREATE INDEX ON pdf_documents USING hnsw (vector vector_cosine_ops);
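Once the index exists, retrieval is a single ORDER BY on pgVector's distance operator. A hedged sketch of the query side in TypeScript (the `pdf_documents` table and `content`/`vector` column names are assumptions matching the examples above, not confirmed repository schema):

```typescript
// pgvector accepts vectors as a bracketed literal: '[0.1,0.2,...]'
function toVectorLiteral(v: number[]): string {
  return `[${v.join(',')}]`;
}

// Top-k similarity query; <=> is pgvector's cosine-distance operator.
// Table and column names here are illustrative.
const topKQuery = `
  SELECT content, vector <=> $1 AS distance
  FROM pdf_documents
  ORDER BY vector <=> $1
  LIMIT $2
`;

// Sketch of usage with the pg Pool from the dependencies:
// const { rows } = await pool.query(topKQuery, [toVectorLiteral(queryEmbedding), 5]);
```

Because the ORDER BY expression matches the indexed operator class, PostgreSQL can serve this query from the HNSW index instead of scanning every row.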

🎨 Interactive CLI Experience

We've built a user-friendly CLI that includes:

  • Real-time Processing: See your questions being processed
  • System Status: Health checks for all components
  • Smart Commands: help, status, clear, exit
  • Error Handling: Graceful degradation with helpful messages

// Special commands for better UX
if (['exit', 'quit', 'q'].includes(command)) {
  console.log('Thank you for using RAG Chat. Goodbye!');
  break;
}

if (['help', 'h'].includes(command)) {
  printHelp();
  continue;
}

🔐 Environment Configuration

Keep your secrets safe with proper environment management:

# .env file
GOOGLE_API_KEY=your_google_api_key_here
GOOGLE_EMBEDDING_MODEL=models/embedding-001
GOOGLE_CHAT_MODEL=gemini-2.0-flash
DATABASE_URL=postgresql://postgres:postgres@localhost:5432/rag
PG_VECTOR_COLLECTION_NAME=pdf_documents
PDF_PATH=./document.pdf
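Reading these variables through a fail-fast loader catches misconfiguration at startup instead of deep inside a request. A minimal sketch (the variable names match the .env above; the loader itself is illustrative, not the repository's code):

```typescript
// Fail-fast configuration loader: required variables throw immediately,
// optional ones fall back to sensible defaults.
interface AppConfig {
  googleApiKey: string;
  databaseUrl: string;
  collectionName: string;
}

function loadConfig(env: Record<string, string | undefined> = process.env): AppConfig {
  const required = (name: string): string => {
    const value = env[name];
    if (!value) throw new Error(`Missing required environment variable: ${name}`);
    return value;
  };

  return {
    googleApiKey: required('GOOGLE_API_KEY'),
    databaseUrl: required('DATABASE_URL'),
    collectionName: env['PG_VECTOR_COLLECTION_NAME'] ?? 'pdf_documents',
  };
}
```

Passing the environment as a parameter (defaulting to `process.env`) also makes the loader trivially unit-testable.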

🚀 Running Your RAG System

  1. Start Infrastructure:

docker-compose up -d

  2. Ingest Your PDF:

npm run dev:ingest

  3. Start Chatting:

npm run dev:chat

🎯 Production-Ready Features

Our system includes enterprise-grade features:

  • Batch Processing: Optimized API calls with rate limiting
  • Connection Pooling: Efficient database connections
  • Error Recovery: Graceful handling of failures
  • Health Monitoring: System status checks
  • Scalable Architecture: Ready for horizontal scaling
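As an illustration of the batch-processing idea (the batch size and delay below are made-up defaults, not values taken from the repository):

```typescript
// Split work into fixed-size batches.
function toBatches<T>(items: T[], batchSize: number): T[][] {
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += batchSize) {
    batches.push(items.slice(i, i + batchSize));
  }
  return batches;
}

// Run an embedding function batch by batch, pausing between batches.
async function embedInBatches(
  texts: string[],
  embed: (batch: string[]) => Promise<number[][]>,
  batchSize = 10,
  delayMs = 500,
): Promise<number[][]> {
  const results: number[][] = [];
  for (const batch of toBatches(texts, batchSize)) {
    results.push(...(await embed(batch)));
    // Crude rate limiting: pause between batches to stay under API quotas
    await new Promise((resolve) => setTimeout(resolve, delayMs));
  }
  return results;
}
```

Accepting the `embed` callback as a parameter keeps the batching logic independent of any particular API client.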

🔍 Performance Metrics

Real-world performance you can expect:

  • Ingestion: 50-page PDF processed in ~30 seconds
  • Query Response: 2-3 seconds per question
  • Throughput: 100+ questions per minute
  • Accuracy: Source-grounded responses (no hallucinations!)

🛡️ Anti-Hallucination Strategies

We implement several techniques to ensure factual responses:

  • Context-Only Responses: AI only uses retrieved information
  • Low Temperature: Reduces creative/speculative responses
  • Fallback Handling: "I don't know" when information isn't available
  • Source Attribution: Always trace back to original documents
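The context-only and fallback rules typically live in the prompt itself. A hypothetical prompt builder sketching those two rules (the wording is illustrative, not the repository's actual prompt):

```typescript
// Illustrative prompt builder: restrict the model to retrieved context
// and give it an explicit "I don't know" escape hatch.
function buildRagPrompt(context: string[], question: string): string {
  return [
    'Answer using ONLY the context below.',
    'If the answer is not in the context, reply exactly: "I don\'t know."',
    '',
    'CONTEXT:',
    ...context,
    '',
    `QUESTION: ${question}`,
  ].join('\n');
}
```

Combined with a low temperature setting on the chat model, this prompt shape is what makes "I don't know" a valid, expected answer rather than an invitation to guess.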

🔮 Future Roadmap

Exciting enhancements planned:

  • REST API: Easy integration with web applications
  • React Dashboard: Modern web interface
  • Multi-tenancy: Support multiple users and document sets
  • Redis Caching: Faster response times
  • OpenTelemetry: Complete observability

🎓 Want to Learn More?

🚀 Get the Complete Source Code - Clone the repository and start building your own RAG system today!

📚 Additional Resources

๐Ÿค Connect & Learn Together

Building AI systems is more fun with a community! Let's connect:

💭 What's Next?

This RAG system is just the beginning! Here are some exciting directions to explore:

  1. Multi-modal RAG: Add support for images and audio
  2. Real-time Updates: Implement live document synchronization
  3. Advanced Retrieval: Experiment with hybrid search strategies
  4. Custom Models: Fine-tune embeddings for your specific domain

๐Ÿ† Key Takeaways

Building a production-ready RAG system involves:

  • ✅ Smart Architecture: Thoughtful component design
  • ✅ Robust Infrastructure: Docker + PostgreSQL + pgVector
  • ✅ Quality Implementation: TypeScript + LangChain.js
  • ✅ Performance Optimization: HNSW indexing + batch processing
  • ✅ User Experience: Intuitive interfaces and error handling

🎯 Ready to Build? Start your RAG journey now with the complete source code and step-by-step guide!


Questions or feedback? Drop a comment below! I love discussing AI architecture and helping fellow developers build amazing systems.

Found this helpful? Give it a ❤️ and share it with your developer friends who are interested in AI and TypeScript!

Happy coding, and welcome to the future of intelligent document interaction! 🚀✨
