RAG (Retrieval-Augmented Generation) systems are becoming increasingly popular for building intelligent applications that can answer questions based on specific knowledge.
Building your own RAG system can seem complex: the concepts, frameworks, and practices involved can feel overwhelming at first.
The good news is that building a RAG system can be straightforward, and I'll show you how.
In this guide, you will learn how to build your first RAG system in just 30 minutes, using a portfolio website as our example. You can see a live demo at harshitchaturvedi.com - try asking it about the case studies!
Let's jump in.
1. What is RAG and how does it work?
RAG (Retrieval Augmented Generation) is a technique that combines the power of large language models with your own data to create more accurate and contextual responses.
Instead of relying solely on the LLM's training data, RAG retrieves relevant information from your specific knowledge base and uses that to generate responses.
How RAG Works
Without RAG: the LLM generates responses based only on its training data (which can be generic or inaccurate).
With RAG: the LLM retrieves relevant information from your data, then generates responses based on that specific context.
RAG systems can:
- Answer questions about specific content in your knowledge base
- Provide accurate, up-to-date information
- Give context-aware responses based on retrieved data
- Show source attribution (builds trust!)
- Handle complex queries about your specific domain
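Conceptually, the whole flow fits in a few lines. Here's a minimal sketch (the two helpers are placeholders; we'll build real versions of each piece in the next section):
// Conceptual RAG flow - a sketch only; real implementations follow in Section 2
type Retriever = (question: string) => Promise<string[]>;
type Generator = (prompt: string) => Promise<string>;
async function answerWithRAG(
  question: string,
  retrieve: Retriever, // finds chunks of your data similar to the question
  generate: Generator // calls the LLM with the augmented prompt
): Promise<string> {
  // 1. Retrieve: look up the most relevant chunks from your knowledge base
  const chunks = await retrieve(question);
  // 2. Augment: put those chunks into the prompt as explicit context
  const prompt = `Context:\n${chunks.join('\n\n')}\n\nQuestion: ${question}`;
  // 3. Generate: the LLM answers using that specific context
  return generate(prompt);
}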
Now that we understand what RAG is, it's time to build one in the next section.
2. Building Your RAG System
We'll build the RAG system using a portfolio website as our example, but you can adapt it to any use case. Here's the stack:
- Next.js 15 for the frontend
- Supabase with pgvector for the vector database
- OpenAI for embeddings and text generation
- TypeScript for type safety
Step 1: Project Setup
npx create-next-app@latest my-rag-system --typescript --tailwind --app
cd my-rag-system
npm install @supabase/supabase-js openai
Step 2: Database Setup
Create a Supabase project and set up the vector database:
-- Enable pgvector extension
CREATE EXTENSION IF NOT EXISTS vector;
-- Create embeddings table
CREATE TABLE content_embeddings (
id bigserial PRIMARY KEY,
content_id text NOT NULL,
content_title text NOT NULL,
section_type text NOT NULL,
content text NOT NULL,
metadata jsonb,
embedding vector(1536),
created_at timestamp with time zone DEFAULT now()
);
-- Create vector similarity index
CREATE INDEX ON content_embeddings USING ivfflat (embedding vector_cosine_ops)
WITH (lists = 100);
-- Enable RLS
ALTER TABLE content_embeddings ENABLE ROW LEVEL SECURITY;
CREATE POLICY "Public read access" ON content_embeddings FOR SELECT TO public USING (true);
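The later steps import a shared Supabase client from src/lib/supabase.ts, which create-next-app doesn't generate for you. A minimal version, assuming the environment variables from Step 10, looks like this:
// src/lib/supabase.ts
import { createClient } from '@supabase/supabase-js';
// Server-side client. The service role key lets the ingestion API insert rows;
// if you only ever read, the public anon key is enough.
export const supabase = createClient(
  process.env.NEXT_PUBLIC_SUPABASE_URL!,
  process.env.SUPABASE_SERVICE_ROLE_KEY ?? process.env.NEXT_PUBLIC_SUPABASE_ANON_KEY!
);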
Step 3: Content Chunking
Break your content into searchable pieces. This is crucial because RAG works best with smaller, focused chunks rather than large documents:
// src/lib/content-chunker.ts
export interface ContentChunk {
contentId: string; // Unique identifier for the original content
contentTitle: string; // Title for context and display
sectionType: string; // Type of section (overview, problem, solution, etc.)
content: string; // The actual text content to be embedded
metadata: Record<string, any>; // Additional context information
}
export function chunkContent(item: any): ContentChunk[] {
const chunks: ContentChunk[] = [];
// Create an overview chunk - this gives general context about the content
chunks.push({
contentId: item.id,
contentTitle: item.title,
sectionType: 'overview',
content: `${item.title}. ${item.description}`,
metadata: { description: item.description }
});
// Add specific sections based on your content structure
// This allows the RAG system to find relevant parts of your content
if (item.problem) {
chunks.push({
contentId: item.id,
contentTitle: item.title,
sectionType: 'problem',
content: `Problem: ${item.problem}`,
metadata: { problem: item.problem }
});
}
return chunks;
}
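To sanity-check the chunker, run it against one sample item. The shape below is hypothetical (a portfolio case study); adapt the fields to your own content:
// Example usage - a hypothetical case study object
import { chunkContent } from '@/lib/content-chunker';
const caseStudy = {
  id: 'case-study-1',
  title: 'Checkout Redesign',
  description: 'Reduced cart abandonment by simplifying the checkout flow.',
  problem: 'Users were dropping off at the payment step.'
};
const chunks = chunkContent(caseStudy);
console.log(chunks.map(c => c.sectionType)); // ['overview', 'problem']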
Step 4: Embedding Generation
Convert text into numerical vectors that can be searched for similarity. Embeddings capture the semantic meaning of text:
// src/lib/embeddings.ts
import OpenAI from 'openai';
const openai = new OpenAI({
apiKey: process.env.OPENAI_API_KEY,
});
// Generate a single embedding - converts text to a 1536-dimensional vector
export async function generateEmbedding(text: string): Promise<number[]> {
const response = await openai.embeddings.create({
model: 'text-embedding-3-small', // OpenAI's efficient embedding model
input: text.trim(),
});
return response.data[0].embedding; // Returns array of 1536 numbers
}
// Generate multiple embeddings in batches for efficiency
export async function generateEmbeddings(texts: string[]): Promise<number[][]> {
const BATCH_SIZE = 100; // Process 100 texts at once to avoid rate limits
const embeddings: number[][] = [];
for (let i = 0; i < texts.length; i += BATCH_SIZE) {
const batch = texts.slice(i, i + BATCH_SIZE);
const response = await openai.embeddings.create({
model: 'text-embedding-3-small',
input: batch.map(t => t.trim()),
});
embeddings.push(...response.data.map(d => d.embedding));
}
return embeddings;
}
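You can verify that embeddings really do capture meaning by comparing two related sentences with cosine similarity. A quick one-off script (the script path and import path are assumptions; run it however you run scripts in your project):
// scripts/similarity-check.ts - a quick sanity check, not part of the app
import { generateEmbedding } from '../src/lib/embeddings';
// Cosine similarity between two vectors: closer to 1 = more similar meaning
function cosineSimilarity(a: number[], b: number[]): number {
  const dot = a.reduce((sum, v, i) => sum + v * b[i], 0);
  const magA = Math.sqrt(a.reduce((sum, v) => sum + v * v, 0));
  const magB = Math.sqrt(b.reduce((sum, v) => sum + v * v, 0));
  return dot / (magA * magB);
}
async function main() {
  const [a, b] = await Promise.all([
    generateEmbedding('How do I reset my password?'),
    generateEmbedding('I forgot my login credentials'),
  ]);
  // Related sentences score noticeably higher than unrelated ones
  console.log('similarity:', cosineSimilarity(a, b));
}
main();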
Step 5: RAG Service
The core of your RAG system - this searches for relevant content and formats it for the LLM:
// src/lib/rag-service.ts
import { supabase } from './supabase';
import { generateEmbedding } from './embeddings';
export interface RAGContext {
content: string; // The actual text content found
source: string; // Where this content came from (for attribution)
similarity: number; // How similar this content is to the query (0-1)
}
// Search for content similar to the user's query
export async function searchContent(
query: string,
options: { threshold?: number; limit?: number } = {}
): Promise<RAGContext[]> {
const { threshold = 0.2, limit = 8 } = options; // Lower threshold = more results
// Convert the user's query into an embedding
const queryEmbedding = await generateEmbedding(query);
// Search the database for similar content using vector similarity
const { data, error } = await supabase.rpc('match_content', {
query_embedding: queryEmbedding,
match_threshold: threshold, // Only return content above this similarity score
match_count: limit // Maximum number of results to return
});
if (error) return [];
// Format the results for the LLM
return data.map((row: any) => ({
content: row.content,
source: `${row.content_title} - ${row.section_type}`, // For source attribution
similarity: row.similarity
}));
}
// Format the retrieved context for the LLM prompt
export function formatRAGContext(contexts: RAGContext[]): string {
if (contexts.length === 0) return '';
const formatted = contexts.map((ctx, index) =>
`[${index + 1}] ${ctx.content} (Source: ${ctx.source})`
).join('\n\n');
return `Relevant context:\n\n${formatted}`;
}
Step 6: Database Function
This PostgreSQL function performs the actual vector similarity search. It's the heart of your RAG system:
-- Create vector search function
CREATE OR REPLACE FUNCTION match_content(
query_embedding vector(1536), -- The user's query as a vector
match_threshold float default 0.7, -- Minimum similarity score (0-1)
match_count int default 5 -- Maximum number of results
)
RETURNS TABLE (
id bigint,
content_id text,
content_title text,
section_type text,
content text,
similarity float -- How similar this content is to the query
)
LANGUAGE plpgsql
AS $$
BEGIN
RETURN QUERY
SELECT
content_embeddings.id,
content_embeddings.content_id,
content_embeddings.content_title,
content_embeddings.section_type,
content_embeddings.content,
-- Calculate cosine similarity: 1 - distance = similarity score
1 - (content_embeddings.embedding <=> query_embedding) as similarity
FROM content_embeddings
-- Only return content above the similarity threshold
WHERE 1 - (content_embeddings.embedding <=> query_embedding) > match_threshold
-- Order by similarity (most similar first)
ORDER BY content_embeddings.embedding <=> query_embedding
LIMIT match_count;
END;
$$;
Step 7: Chat API
This is where RAG meets LLM - it retrieves relevant content and uses it to generate accurate responses:
// src/app/api/chat/route.ts
import { NextRequest, NextResponse } from 'next/server';
import OpenAI from 'openai';
import { searchContent, formatRAGContext } from '@/lib/rag-service';
const openai = new OpenAI({
apiKey: process.env.OPENAI_API_KEY,
});
export async function POST(request: NextRequest) {
const { message, conversationHistory = [] } = await request.json();
// Step 1: Search for relevant content using RAG
const ragContexts = await searchContent(message);
const ragContext = formatRAGContext(ragContexts);
// Step 2: Generate response using OpenAI with the retrieved context
const completion = await openai.chat.completions.create({
model: 'gpt-4o-mini', // Efficient model for most use cases
messages: [
{ role: 'system', content: `Use this context to answer: ${ragContext}` },
...conversationHistory, // Maintain conversation history
{ role: 'user', content: message }
],
temperature: 0.7, // Balance between creativity and consistency
});
return NextResponse.json({
reply: completion.choices[0]?.message?.content || 'Sorry, I could not generate a response.',
sources: ragContexts.map(ctx => ctx.source) // Show where the information came from
});
}
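The system prompt above is intentionally minimal. In practice you'll want to tell the model to stay grounded in the retrieved context and to admit when the context doesn't cover a question. One possible wording (tune it to your own voice):
// One possible, more explicit system prompt - adjust the wording to your content
function buildSystemPrompt(ragContext: string): string {
  return `You are an assistant for this website.
Answer questions using ONLY the context below.
If the context does not contain the answer, say you don't know instead of guessing.
Keep answers concise and mention which source you used.

${ragContext}`;
}
// In the route above, replace the system message with:
// { role: 'system', content: buildSystemPrompt(ragContext) }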
Step 8: Data Ingestion
This API populates your vector database with your content. Run this once to set up your knowledge base:
// src/app/api/ingest/route.ts
import { NextRequest, NextResponse } from 'next/server';
import { supabase } from '@/lib/supabase';
import { generateEmbeddings } from '@/lib/embeddings';
import { chunkContent } from '@/lib/content-chunker';
export async function POST(request: NextRequest) {
// Security: Only allow authorized requests (use a secret key)
const authHeader = request.headers.get('authorization');
if (authHeader !== `Bearer ${process.env.CRON_SECRET}`) {
return NextResponse.json({ error: 'Unauthorized' }, { status: 401 });
}
// Your content data - replace with your actual content
const contentItems = []; // Add your content here
let totalChunks = 0;
for (const item of contentItems) {
// Step 1: Break content into chunks
const chunks = chunkContent(item);
// Step 2: Generate embeddings for all chunks
const texts = chunks.map(chunk => chunk.content);
const embeddings = await generateEmbeddings(texts);
// Step 3: Store in database with embeddings
const records = chunks.map((chunk, index) => ({
content_id: chunk.contentId,
content_title: chunk.contentTitle,
section_type: chunk.sectionType,
content: chunk.content,
metadata: chunk.metadata,
embedding: embeddings[index] // The vector representation
}));
await supabase.from('content_embeddings').insert(records);
totalChunks += chunks.length;
}
return NextResponse.json({
success: true,
message: `Successfully ingested ${totalChunks} chunks`
});
}
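Two practical notes on this route. The contentItems array just needs to match the shape your chunker from Step 3 expects (the example below is hypothetical), and re-running ingestion will insert duplicate rows unless you clear the old ones for each item first:
// Example content item - a hypothetical shape matching the chunker from Step 3
const contentItems = [
  {
    id: 'case-study-1',
    title: 'Checkout Redesign',
    description: 'Reduced cart abandonment by simplifying the checkout flow.',
    problem: 'Users were dropping off at the payment step.'
  }
];
// Inside the loop, before inserting: delete previously ingested chunks for this
// item so re-running the route doesn't create duplicates.
await supabase.from('content_embeddings').delete().eq('content_id', item.id);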
Step 9: Frontend Interface
A simple chat interface that connects to your RAG system:
// src/components/chat-window.tsx
'use client';
import { useState } from 'react';
export default function ChatWindow() {
const [messages, setMessages] = useState<any[]>([]); // Store conversation history
const [input, setInput] = useState(''); // Current user input
const [loading, setLoading] = useState(false); // Loading state
const sendMessage = async () => {
if (!input.trim() || loading) return;
// Add user message to conversation
const userMessage = { role: 'user', content: input };
setMessages(prev => [...prev, userMessage]);
setInput('');
setLoading(true);
try {
// Send message to your RAG API
const response = await fetch('/api/chat', {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({
message: input,
conversationHistory: messages // Pass conversation history for context
})
});
const data = await response.json();
// Add AI response to conversation
setMessages(prev => [...prev, {
role: 'assistant',
content: data.reply,
sources: data.sources // Show source attribution
}]);
} catch (error) {
console.error('Chat error:', error);
} finally {
setLoading(false);
}
};
return (
<div className="flex flex-col h-96 border rounded-lg p-4">
{/* Message display area */}
<div className="flex-1 overflow-y-auto space-y-4 mb-4">
{messages.map((message, index) => (
<div key={index} className={`flex ${message.role === 'user' ? 'justify-end' : 'justify-start'}`}>
<div className={`max-w-xs lg:max-w-md px-4 py-2 rounded-lg ${
message.role === 'user'
? 'bg-blue-500 text-white'
: 'bg-gray-200 text-gray-800'
}`}>
<p>{message.content}</p>
{/* Show source attribution for AI responses */}
{message.sources && (
<p className="text-xs mt-2 opacity-70">
Sources: {message.sources.join(', ')}
</p>
)}
</div>
</div>
))}
{/* Loading indicator */}
{loading && (
<div className="flex justify-start">
<div className="bg-gray-200 text-gray-800 px-4 py-2 rounded-lg">
<p>Thinking...</p>
</div>
</div>
)}
</div>
{/* Input area */}
<div className="flex space-x-2">
<input
type="text"
value={input}
onChange={(e) => setInput(e.target.value)}
onKeyDown={(e) => e.key === 'Enter' && sendMessage()}
placeholder="Ask about the content..."
className="flex-1 px-3 py-2 border rounded-lg focus:outline-none focus:ring-2 focus:ring-blue-500"
/>
<button
onClick={sendMessage}
disabled={loading || !input.trim()}
className="px-4 py-2 bg-blue-500 text-white rounded-lg hover:bg-blue-600 disabled:opacity-50"
>
Send
</button>
</div>
</div>
);
}
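To show the chat on your site, render the component from any page. For example (an assumed src/app/page.tsx; place it wherever it fits your layout):
// src/app/page.tsx - assumed location
import ChatWindow from '@/components/chat-window';
export default function Home() {
  return (
    <main className="max-w-2xl mx-auto p-8">
      <h1 className="text-2xl font-bold mb-4">Ask me anything</h1>
      <ChatWindow />
    </main>
  );
}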
Step 10: Environment Variables
Set up your environment variables for API keys and database access:
# .env.local
OPENAI_API_KEY=sk-your-openai-key # OpenAI API key for embeddings and chat
NEXT_PUBLIC_SUPABASE_URL=https://your-project.supabase.co # Supabase project URL
NEXT_PUBLIC_SUPABASE_ANON_KEY=your-anon-key # Public Supabase key (safe for frontend)
SUPABASE_SERVICE_ROLE_KEY=your-service-role-key # Private key for server-side operations
CRON_SECRET=your-secret-for-ingestion # Secret for protecting ingestion API
3. Advanced Features
Smart Context Routing
Route queries to specific content sections based on user intent:
// Detect what type of information the user is looking for
export function detectIntent(query: string): string {
const lowerQuery = query.toLowerCase();
if (lowerQuery.includes('team')) return 'team'; // Team-related questions
if (lowerQuery.includes('problem')) return 'problem'; // Problem statements
if (lowerQuery.includes('solution')) return 'solution'; // Solutions and approaches
if (lowerQuery.includes('result')) return 'results'; // Results and outcomes
return 'overview'; // Default to general overview
}
// Use intent detection to search specific content sections
const intent = detectIntent(message);
const ragContexts = await searchContent(message, { sectionType: intent });
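Note that searchContent from Step 5 only accepts threshold and limit, so you need to thread the section type through yourself. One way to do it is to add a sectionType option and filter the matches client-side (a sketch; alternatively, add a section filter parameter to the match_content SQL function and filter in the database):
// Sketch: extend searchContent with an optional sectionType filter
export async function searchContent(
  query: string,
  options: { threshold?: number; limit?: number; sectionType?: string } = {}
): Promise<RAGContext[]> {
  const { threshold = 0.2, limit = 8, sectionType } = options;
  const queryEmbedding = await generateEmbedding(query);
  const { data, error } = await supabase.rpc('match_content', {
    query_embedding: queryEmbedding,
    match_threshold: threshold,
    match_count: limit
  });
  if (error) return [];
  return data
    // Keep only the requested section, when an intent was detected
    .filter((row: any) => !sectionType || row.section_type === sectionType)
    .map((row: any) => ({
      content: row.content,
      source: `${row.content_title} - ${row.section_type}`,
      similarity: row.similarity
    }));
}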
Performance Optimization
Add caching to improve response times for repeated queries. In the sketch below, performSearch stands in for the original search logic from Step 5, renamed so the cached wrapper can call it:
// In-memory cache for RAG responses
const cache = new Map<string, { data: RAGContext[], timestamp: number }>();
const CACHE_TTL = 5 * 60 * 1000; // 5 minutes cache duration
export async function searchContent(query: string, options = {}) {
const cacheKey = `${query}:${JSON.stringify(options)}`;
const cached = cache.get(cacheKey);
// Return cached result if still valid
if (cached && Date.now() - cached.timestamp < CACHE_TTL) {
return cached.data;
}
// Perform actual search and cache the result
const results = await performSearch(query, options);
cache.set(cacheKey, { data: results, timestamp: Date.now() });
return results;
}
Conclusion
Congratulations! You've built a RAG system that can intelligently answer questions about your content.
What you've accomplished:
- Set up a vector database with Supabase and pgvector
- Created a content chunking system that breaks your data into searchable pieces
- Implemented RAG search that finds relevant content based on user queries
- Built an interface that provides contextual responses about your content
Next steps:
- Add more content types: Documentation, FAQs, knowledge bases
- Implement conversation memory: Remember previous questions in the session
- Add multimedia support: Images, videos, interactive demos
- Create analytics: Track what users ask about most
Your RAG system is now ready to provide intelligent, context-aware responses about your content!
Want to see this in action? Check out the live demo at harshitchaturvedi.com - try asking it about the case studies!
Have questions? Drop a comment below. Happy coding!