Harshit Chaturvedi

RAG Systems 101: Build Your First Retrieval-Augmented Generation System πŸš€

RAG (Retrieval-Augmented Generation) systems are becoming increasingly popular for building intelligent applications that can answer questions based on specific knowledge.

Building your own RAG system can feel complex: the concepts, frameworks, and practices involved can be a bit overwhelming at first.

The good news is that building a RAG system can be straightforward, and I'll show you how.

In this guide, you will learn how to build your first RAG system in just 30 minutes, using a portfolio website as our example. You can see a live demo at harshitchaturvedi.com - try asking it about the case studies!

Let's jump in.


1. What is RAG and how does it work?

RAG (Retrieval-Augmented Generation) is a technique that combines the power of large language models with your own data to create more accurate and contextual responses.

Instead of relying solely on the LLM's training data, RAG retrieves relevant information from your specific knowledge base and uses that to generate responses.

How RAG Works

❌ Without RAG: LLM generates responses based only on its training data (can be generic or inaccurate)

βœ… With RAG: LLM retrieves relevant information from your data, then generates responses based on that specific context
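
In code, the difference is just one extra step before the LLM call. Here's a rough conceptual sketch of the flow (the retrieve and generate functions are placeholders; later in this guide they become searchContent and the OpenAI chat completion call):

// Conceptual sketch of the retrieve-then-generate flow (not the final implementation).
// The retrieve/generate parameters are placeholders for what we build later.
type Chunk = { content: string };

async function answerWithRAG(
  question: string,
  retrieve: (query: string) => Promise<Chunk[]>,
  generate: (prompt: string) => Promise<string>
): Promise<string> {
  // 1. Retrieve: find chunks from your own data that are similar to the question
  const chunks = await retrieve(question);

  // 2. Augment: inject those chunks into the prompt as context
  const context = chunks.map(c => c.content).join('\n\n');

  // 3. Generate: the LLM answers from the retrieved context, not just its training data
  return generate(`Context:\n${context}\n\nQuestion: ${question}`);
}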

RAG systems can:

  • Answer questions about specific content in your knowledge base
  • Provide accurate, up-to-date information
  • Give context-aware responses based on retrieved data
  • Show source attribution (builds trust!)
  • Handle complex queries about your specific domain

Now that we understand how RAG works, it's time to build one in the next section.


2. Building Your RAG System

We'll build a RAG system using a portfolio website as our example (you can adapt this to any use case), with the following stack:

  • Next.js 15 for the frontend
  • Supabase with pgvector for the vector database
  • OpenAI for embeddings and text generation
  • TypeScript for type safety

Step 1: Project Setup

npx create-next-app@latest my-rag-system --typescript --tailwind --app
cd my-rag-system
npm install @supabase/supabase-js openai

Step 2: Database Setup

Create a Supabase project and set up the vector database:

-- Enable pgvector extension
CREATE EXTENSION IF NOT EXISTS vector;

-- Create embeddings table
CREATE TABLE content_embeddings (
  id bigserial PRIMARY KEY,
  content_id text NOT NULL,
  content_title text NOT NULL,
  section_type text NOT NULL,
  content text NOT NULL,
  metadata jsonb,
  embedding vector(1536),
  created_at timestamp with time zone DEFAULT now()
);

-- Create vector similarity index
CREATE INDEX ON content_embeddings USING ivfflat (embedding vector_cosine_ops)
  WITH (lists = 100);

-- Enable RLS
ALTER TABLE content_embeddings ENABLE ROW LEVEL SECURITY;
CREATE POLICY "Public read access" ON content_embeddings FOR SELECT TO public USING (true);

Step 3: Content Chunking

Break your content into searchable pieces. This is crucial because RAG works best with smaller, focused chunks rather than large documents:

// src/lib/content-chunker.ts
export interface ContentChunk {
  contentId: string;        // Unique identifier for the original content
  contentTitle: string;     // Title for context and display
  sectionType: string;      // Type of section (overview, problem, solution, etc.)
  content: string;          // The actual text content to be embedded
  metadata: Record<string, any>; // Additional context information
}

export function chunkContent(item: any): ContentChunk[] {
  const chunks: ContentChunk[] = [];

  // Create an overview chunk - this gives general context about the content
  chunks.push({
    contentId: item.id,
    contentTitle: item.title,
    sectionType: 'overview',
    content: `${item.title}. ${item.description}`,
    metadata: { description: item.description }
  });

  // Add specific sections based on your content structure
  // This allows the RAG system to find relevant parts of your content
  if (item.problem) {
    chunks.push({
      contentId: item.id,
      contentTitle: item.title,
      sectionType: 'problem',
      content: `Problem: ${item.problem}`,
      metadata: { problem: item.problem }
    });
  }

  return chunks;
}
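
For example, passing a hypothetical case-study item through chunkContent produces one overview chunk plus one chunk per populated section:

// Example usage with a made-up portfolio item
const chunks = chunkContent({
  id: 'case-study-1',
  title: 'Checkout Redesign',
  description: 'Redesigned the checkout flow for a retail client.',
  problem: 'Cart abandonment was high on mobile devices.',
});

console.log(chunks.map(c => c.sectionType)); // ['overview', 'problem']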

Step 4: Embedding Generation

Convert text into numerical vectors that can be searched for similarity. Embeddings capture the semantic meaning of text:

// src/lib/embeddings.ts
import OpenAI from 'openai';

const openai = new OpenAI({
  apiKey: process.env.OPENAI_API_KEY,
});

// Generate a single embedding - converts text to a 1536-dimensional vector
export async function generateEmbedding(text: string): Promise<number[]> {
  const response = await openai.embeddings.create({
    model: 'text-embedding-3-small',  // OpenAI's efficient embedding model
    input: text.trim(),
  });
  return response.data[0].embedding;  // Returns array of 1536 numbers
}

// Generate multiple embeddings in batches for efficiency
export async function generateEmbeddings(texts: string[]): Promise<number[][]> {
  const BATCH_SIZE = 100;  // Process 100 texts at once to avoid rate limits
  const embeddings: number[][] = [];

  for (let i = 0; i < texts.length; i += BATCH_SIZE) {
    const batch = texts.slice(i, i + BATCH_SIZE);
    const response = await openai.embeddings.create({
      model: 'text-embedding-3-small',
      input: batch.map(t => t.trim()),
    });
    embeddings.push(...response.data.map(d => d.embedding));
  }
  return embeddings;
}
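
A quick sanity check (assuming OPENAI_API_KEY is set) shows why the table uses a vector(1536) column - the embedding length must match that dimension:

// Sanity check: the embedding length must match the vector(1536) column
const vector = await generateEmbedding('What problem did the checkout redesign solve?');
console.log(vector.length); // 1536 for text-embedding-3-small (default dimensions)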

Step 5: RAG Service

The core of your RAG system - this searches for relevant content and formats it for the LLM:

// src/lib/rag-service.ts
import { supabase } from './supabase';
import { generateEmbedding } from './embeddings';

export interface RAGContext {
  content: string;      // The actual text content found
  source: string;       // Where this content came from (for attribution)
  similarity: number;   // How similar this content is to the query (0-1)
}

// Search for content similar to the user's query
export async function searchContent(
  query: string,
  options: { threshold?: number; limit?: number } = {}
): Promise<RAGContext[]> {
  const { threshold = 0.2, limit = 8 } = options;  // Lower threshold = more results

  // Convert the user's query into an embedding
  const queryEmbedding = await generateEmbedding(query);

  // Search the database for similar content using vector similarity
  const { data, error } = await supabase.rpc('match_content', {
    query_embedding: queryEmbedding,
    match_threshold: threshold,  // Only return content above this similarity score
    match_count: limit           // Maximum number of results to return
  });

  if (error) return [];

  // Format the results for the LLM
  return data.map((row: any) => ({
    content: row.content,
    source: `${row.content_title} - ${row.section_type}`,  // For source attribution
    similarity: row.similarity
  }));
}

// Format the retrieved context for the LLM prompt
export function formatRAGContext(contexts: RAGContext[]): string {
  if (contexts.length === 0) return '';

  const formatted = contexts.map((ctx, index) => 
    `[${index + 1}] ${ctx.content} (Source: ${ctx.source})`
  ).join('\n\n');

  return `Relevant context:\n\n${formatted}`;
}
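
One file the service above imports but we haven't written yet is the Supabase client in src/lib/supabase.ts. A minimal sketch, assuming the environment variables from Step 10, looks like this:

// src/lib/supabase.ts
import { createClient } from '@supabase/supabase-js';

// Server-side client. Using the service role key assumes this module is only
// imported from server code (API routes) - never expose that key to the browser.
export const supabase = createClient(
  process.env.NEXT_PUBLIC_SUPABASE_URL!,
  process.env.SUPABASE_SERVICE_ROLE_KEY!
);

If you only need read access on the server, the anon key also works here thanks to the public read policy we created in Step 2.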

Step 6: Database Function

This PostgreSQL function performs the actual vector similarity search. It's the heart of your RAG system:

-- Create vector search function
CREATE OR REPLACE FUNCTION match_content(
  query_embedding vector(1536),    -- The user's query as a vector
  match_threshold float default 0.7,  -- Minimum similarity score (0-1)
  match_count int default 5        -- Maximum number of results
)
RETURNS TABLE (
  id bigint,
  content_id text,
  content_title text,
  section_type text,
  content text,
  similarity float                 -- How similar this content is to the query
)
LANGUAGE plpgsql
AS $$
BEGIN
  RETURN QUERY
  SELECT
    content_embeddings.id,
    content_embeddings.content_id,
    content_embeddings.content_title,
    content_embeddings.section_type,
    content_embeddings.content,
    -- Calculate cosine similarity: 1 - distance = similarity score
    1 - (content_embeddings.embedding <=> query_embedding) as similarity
  FROM content_embeddings
  -- Only return content above the similarity threshold
  WHERE 1 - (content_embeddings.embedding <=> query_embedding) > match_threshold
  -- Order by similarity (most similar first)
  ORDER BY content_embeddings.embedding <=> query_embedding
  LIMIT match_count;
END;
$$;

Step 7: Chat API

This is where RAG meets LLM - it retrieves relevant content and uses it to generate accurate responses:

// src/app/api/chat/route.ts
import { NextRequest, NextResponse } from 'next/server';
import OpenAI from 'openai';
import { searchContent, formatRAGContext } from '@/lib/rag-service';

const openai = new OpenAI({
  apiKey: process.env.OPENAI_API_KEY,
});

export async function POST(request: NextRequest) {
  const { message, conversationHistory = [] } = await request.json();

  // Step 1: Search for relevant content using RAG
  const ragContexts = await searchContent(message);
  const ragContext = formatRAGContext(ragContexts);

  // Step 2: Generate response using OpenAI with the retrieved context
  const completion = await openai.chat.completions.create({
    model: 'gpt-4o-mini',  // Efficient model for most use cases
    messages: [
      { role: 'system', content: `Use this context to answer: ${ragContext}` },
      ...conversationHistory,  // Maintain conversation history
      { role: 'user', content: message }
    ],
    temperature: 0.7,  // Balance between creativity and consistency
  });

  return NextResponse.json({
    reply: completion.choices[0]?.message?.content || 'Sorry, I could not generate a response.',
    sources: ragContexts.map(ctx => ctx.source)  // Show where the information came from
  });
}

Step 8: Data Ingestion

This API populates your vector database with your content. Run this once to set up your knowledge base:

// src/app/api/ingest/route.ts
import { NextRequest, NextResponse } from 'next/server';
import { supabase } from '@/lib/supabase';
import { generateEmbeddings } from '@/lib/embeddings';
import { chunkContent } from '@/lib/content-chunker';

export async function POST(request: NextRequest) {
  // Security: Only allow authorized requests (use a secret key)
  const authHeader = request.headers.get('authorization');
  if (authHeader !== `Bearer ${process.env.CRON_SECRET}`) {
    return NextResponse.json({ error: 'Unauthorized' }, { status: 401 });
  }

  // Your content data - replace with your actual content
  const contentItems: any[] = []; // Add your content here

  let totalChunks = 0;
  for (const item of contentItems) {
    // Step 1: Break content into chunks
    const chunks = chunkContent(item);

    // Step 2: Generate embeddings for all chunks
    const texts = chunks.map(chunk => chunk.content);
    const embeddings = await generateEmbeddings(texts);

    // Step 3: Store in database with embeddings
    const records = chunks.map((chunk, index) => ({
      content_id: chunk.contentId,
      content_title: chunk.contentTitle,
      section_type: chunk.sectionType,
      content: chunk.content,
      metadata: chunk.metadata,
      embedding: embeddings[index]  // The vector representation
    }));

    await supabase.from('content_embeddings').insert(records);
    totalChunks += chunks.length;
  }

  return NextResponse.json({
    success: true,
    message: `Successfully ingested ${totalChunks} chunks`
  });
}
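
Once the route is running (locally or deployed), you can trigger ingestion with a single authorized request - for example with a small one-off script (the localhost URL here assumes local development):

// scripts/ingest.ts - one-off script to trigger the ingestion API
const res = await fetch('http://localhost:3000/api/ingest', {
  method: 'POST',
  headers: { Authorization: `Bearer ${process.env.CRON_SECRET}` },
});

console.log(await res.json()); // { success: true, message: 'Successfully ingested N chunks' }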

Step 9: Frontend Interface

A simple chat interface that connects to your RAG system:

// src/components/chat-window.tsx
'use client';
import { useState } from 'react';

export default function ChatWindow() {
  const [messages, setMessages] = useState<any[]>([]);  // Store conversation history
  const [input, setInput] = useState('');               // Current user input
  const [loading, setLoading] = useState(false);        // Loading state

  const sendMessage = async () => {
    if (!input.trim() || loading) return;

    // Add user message to conversation
    const userMessage = { role: 'user', content: input };
    setMessages(prev => [...prev, userMessage]);
    setInput('');
    setLoading(true);

    try {
      // Send message to your RAG API
      const response = await fetch('/api/chat', {
        method: 'POST',
        headers: { 'Content-Type': 'application/json' },
        body: JSON.stringify({
          message: input,
          conversationHistory: messages  // Pass conversation history for context
        })
      });

      const data = await response.json();

      // Add AI response to conversation
      setMessages(prev => [...prev, {
        role: 'assistant',
        content: data.reply,
        sources: data.sources  // Show source attribution
      }]);
    } catch (error) {
      console.error('Chat error:', error);
    } finally {
      setLoading(false);
    }
  };

  return (
    <div className="flex flex-col h-96 border rounded-lg p-4">
      {/* Message display area */}
      <div className="flex-1 overflow-y-auto space-y-4 mb-4">
        {messages.map((message, index) => (
          <div key={index} className={`flex ${message.role === 'user' ? 'justify-end' : 'justify-start'}`}>
            <div className={`max-w-xs lg:max-w-md px-4 py-2 rounded-lg ${
              message.role === 'user' 
                ? 'bg-blue-500 text-white' 
                : 'bg-gray-200 text-gray-800'
            }`}>
              <p>{message.content}</p>
              {/* Show source attribution for AI responses */}
              {message.sources && (
                <p className="text-xs mt-2 opacity-70">
                  Sources: {message.sources.join(', ')}
                </p>
              )}
            </div>
          </div>
        ))}
        {/* Loading indicator */}
        {loading && (
          <div className="flex justify-start">
            <div className="bg-gray-200 text-gray-800 px-4 py-2 rounded-lg">
              <p>Thinking...</p>
            </div>
          </div>
        )}
      </div>

      {/* Input area */}
      <div className="flex space-x-2">
        <input
          type="text"
          value={input}
          onChange={(e) => setInput(e.target.value)}
          onKeyDown={(e) => e.key === 'Enter' && sendMessage()}
          placeholder="Ask about the content..."
          className="flex-1 px-3 py-2 border rounded-lg focus:outline-none focus:ring-2 focus:ring-blue-500"
        />
        <button
          onClick={sendMessage}
          disabled={loading || !input.trim()}
          className="px-4 py-2 bg-blue-500 text-white rounded-lg hover:bg-blue-600 disabled:opacity-50"
        >
          Send
        </button>
      </div>
    </div>
  );
}
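
To use it, render the component anywhere in your app - for example on the home page (a minimal sketch):

// src/app/page.tsx - mount the chat window on the home page
import ChatWindow from '@/components/chat-window';

export default function Home() {
  return (
    <main className="max-w-2xl mx-auto p-8">
      <h1 className="text-2xl font-bold mb-4">Ask me about my work</h1>
      <ChatWindow />
    </main>
  );
}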

Step 10: Environment Variables

Set up your environment variables for API keys and database access:

# .env.local
OPENAI_API_KEY=sk-your-openai-key                    # OpenAI API key for embeddings and chat
NEXT_PUBLIC_SUPABASE_URL=https://your-project.supabase.co  # Supabase project URL
NEXT_PUBLIC_SUPABASE_ANON_KEY=your-anon-key         # Public Supabase key (safe for frontend)
SUPABASE_SERVICE_ROLE_KEY=your-service-role-key     # Private key for server-side operations
CRON_SECRET=your-secret-for-ingestion               # Secret for protecting ingestion API

3. Advanced Features

Smart Context Routing

Route queries to specific content sections based on user intent:

// Detect what type of information the user is looking for
export function detectIntent(query: string): string {
  const lowerQuery = query.toLowerCase();
  if (lowerQuery.includes('team')) return 'team';        // Team-related questions
  if (lowerQuery.includes('problem')) return 'problem';  // Problem statements
  if (lowerQuery.includes('solution')) return 'solution'; // Solutions and approaches
  if (lowerQuery.includes('result')) return 'results';   // Results and outcomes
  return 'overview';  // Default to general overview
}

// Use intent detection to search specific content sections
const intent = detectIntent(message);
const ragContexts = await searchContent(message, { sectionType: intent });
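
Note that the searchContent from Step 5 doesn't accept a sectionType option yet. One rough way to support it, without touching the SQL function, is to filter the matched rows after the RPC call:

// Sketch: extend searchContent with an optional sectionType filter.
// This filters client-side after the vector search; alternatively you could
// add a section_type parameter to the match_content() SQL function.
export async function searchContent(
  query: string,
  options: { threshold?: number; limit?: number; sectionType?: string } = {}
): Promise<RAGContext[]> {
  const { threshold = 0.2, limit = 8, sectionType } = options;

  const queryEmbedding = await generateEmbedding(query);
  const { data, error } = await supabase.rpc('match_content', {
    query_embedding: queryEmbedding,
    match_threshold: threshold,
    match_count: limit,
  });
  if (error) return [];

  return data
    .filter((row: any) => !sectionType || row.section_type === sectionType)
    .map((row: any) => ({
      content: row.content,
      source: `${row.content_title} - ${row.section_type}`,
      similarity: row.similarity,
    }));
}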

Performance Optimization

Add caching to improve response times for repeated queries:

// In-memory cache for RAG responses
const cache = new Map<string, { data: RAGContext[], timestamp: number }>();
const CACHE_TTL = 5 * 60 * 1000; // 5 minutes cache duration

export async function searchContent(query: string, options = {}) {
  const cacheKey = `${query}:${JSON.stringify(options)}`;
  const cached = cache.get(cacheKey);

  // Return cached result if still valid
  if (cached && Date.now() - cached.timestamp < CACHE_TTL) {
    return cached.data;
  }

  // Perform the actual search (the Step 5 logic, refactored into performSearch) and cache the result
  const results = await performSearch(query, options);
  cache.set(cacheKey, { data: results, timestamp: Date.now() });
  return results;
}

Conclusion

Congratulations! πŸŽ‰ You've built a RAG system that can intelligently answer questions about your content.

What you've accomplished:

  1. Set up a vector database with Supabase and pgvector
  2. Created a content chunking system that breaks your data into searchable pieces
  3. Implemented RAG search that finds relevant content based on user queries
  4. Built an interface that provides contextual responses about your content

Next steps:

  • Add more content types: Documentation, FAQs, knowledge bases
  • Implement conversation memory: Remember previous questions in the session
  • Add multimedia support: Images, videos, interactive demos
  • Create analytics: Track what users ask about most

Your RAG system is now ready to provide intelligent, context-aware responses about your content! πŸš€

Want to see this in action? Check out the live demo at harshitchaturvedi.com - try asking it about the case studies!


Have questions? Drop a comment below. Happy coding! ✨
