Building a RAG Pipeline with Claude API and Supabase
Tags: claude supabase rag ai
Retrieval-Augmented Generation (RAG) is one of those patterns that sounds academic until you actually build one — then you realize it's just smart plumbing. You store knowledge somewhere searchable, retrieve the relevant bits at query time, and feed them to an LLM as context. The LLM stops hallucinating because it's working from your data, not just its training weights.
In this article, I'll walk you through building a production-ready RAG pipeline using:
- Claude API (Anthropic) — for generation
- Voyage AI — for embeddings (Anthropic doesn't serve embeddings itself; it recommends Voyage)
- Supabase — for vector storage via pgvector
- Node.js — the glue
By the end, you'll have a pipeline that ingests documents, embeds them, stores them in Supabase, and answers questions grounded in that knowledge base.
Architecture Overview
[Documents] → [Chunker] → [Embedder] → [Supabase pgvector]
                                               ↓
[User Query] → [Embed Query] → [Similarity Search] → [Top-K Chunks]
                                               ↓
[Claude API + Context] → [Answer]
Two phases: ingestion and retrieval + generation.
Prerequisites
- Node.js 18+
- A Supabase project (free tier works)
- An Anthropic API key
- A Voyage AI API key (the embeddings provider Anthropic recommends)
npm install @anthropic-ai/sdk @supabase/supabase-js dotenv
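The code in this article reads its credentials from environment variables. A minimal `.env` with the variable names used below (values are placeholders; `VOYAGE_API_KEY` is for the Voyage AI embeddings calls):

```shell
# .env (placeholder values, fill in your own)
ANTHROPIC_API_KEY=...
VOYAGE_API_KEY=...
SUPABASE_URL=...
SUPABASE_SERVICE_ROLE_KEY=...
```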
Step 1: Set Up Supabase for Vector Search
In your Supabase project, open the SQL editor and run:
-- Enable the pgvector extension
create extension if not exists vector;
-- Create the documents table
create table documents (
  id bigserial primary key,
  content text not null,
  metadata jsonb,
  embedding vector(1024)  -- voyage-3 embeddings are 1024-dimensional
);
-- Create an index for fast cosine similarity search
create index on documents
using ivfflat (embedding vector_cosine_ops)
with (lists = 100);
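ivfflat is an approximate index: it clusters vectors into lists and scans only a few lists per query. If recall looks low, you can raise the number of probed lists at query time (pgvector's default is 1):

```sql
-- More probes = better recall, slower queries
set ivfflat.probes = 10;
```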
Note: the embedding dimension (1024) matches voyage-3 embeddings from Voyage AI, the embeddings provider Anthropic recommends (Anthropic's own API has no embeddings endpoint). Adjust the dimension if you use a different model.
Step 2: Initialize Clients
One catch up front: the Anthropic SDK only does generation. For embeddings we call Voyage AI's REST API directly with a small helper — Node 18+ ships global fetch, so no extra SDK is needed.

// lib/clients.js
import Anthropic from '@anthropic-ai/sdk';
import { createClient } from '@supabase/supabase-js';
import 'dotenv/config';

export const anthropic = new Anthropic({
  apiKey: process.env.ANTHROPIC_API_KEY,
});

export const supabase = createClient(
  process.env.SUPABASE_URL,
  process.env.SUPABASE_SERVICE_ROLE_KEY
);

// inputType is 'document' at ingestion time, 'query' at search time
export async function embed(inputs, inputType) {
  const res = await fetch('https://api.voyageai.com/v1/embeddings', {
    method: 'POST',
    headers: {
      Authorization: `Bearer ${process.env.VOYAGE_API_KEY}`,
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({ model: 'voyage-3', input: inputs, input_type: inputType }),
  });
  if (!res.ok) throw new Error(`Voyage embeddings request failed: ${res.status}`);
  const { data } = await res.json();
  return data.map((d) => d.embedding);
}
Step 3: Ingestion Pipeline
3a. Chunk your documents
Chunking strategy matters more than people expect. Too large and you dilute relevance; too small and you lose context. A 512-word chunk with a 50-word overlap is a solid starting point (the chunker below counts words, a rough proxy for tokens).
// lib/chunker.js
// Sizes are in words, a rough proxy for tokens that's good enough to start
export function chunkText(text, chunkSize = 512, overlap = 50) {
  const words = text.split(/\s+/).filter(Boolean);
  const chunks = [];
  // Step forward by (chunkSize - overlap) so consecutive chunks share context
  for (let i = 0; i < words.length; i += chunkSize - overlap) {
    const chunk = words.slice(i, i + chunkSize).join(' ');
    if (chunk) chunks.push(chunk);
  }
  return chunks;
}
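A quick sanity check of the overlap behavior, with the chunker inlined and toy sizes so the output is easy to inspect (the w0…w9 words are synthetic test data):

```javascript
// Same word-based chunker, exercised with tiny parameters
function chunkText(text, chunkSize = 512, overlap = 50) {
  const words = text.split(/\s+/).filter(Boolean);
  const chunks = [];
  for (let i = 0; i < words.length; i += chunkSize - overlap) {
    const chunk = words.slice(i, i + chunkSize).join(' ');
    if (chunk) chunks.push(chunk);
  }
  return chunks;
}

const text = 'w0 w1 w2 w3 w4 w5 w6 w7 w8 w9';
const chunks = chunkText(text, 4, 1);
console.log(chunks);
// [ 'w0 w1 w2 w3', 'w3 w4 w5 w6', 'w6 w7 w8 w9', 'w9' ]
// Each chunk repeats the last word of the previous one (overlap = 1),
// and a short tail chunk is kept rather than dropped.
```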
3b. Embed and store
// lib/ingest.js
import { supabase, embed } from './clients.js';
import { chunkText } from './chunker.js';

export async function ingestDocument(text, metadata = {}) {
  const chunks = chunkText(text);

  // Embed in batches: one Voyage call per 128 chunks instead of one per chunk
  // (Voyage caps the number of inputs per request)
  const BATCH = 128;
  const embeddings = [];
  for (let i = 0; i < chunks.length; i += BATCH) {
    const batch = await embed(chunks.slice(i, i + BATCH), 'document');
    embeddings.push(...batch);
  }

  // Bulk insert into Supabase
  const rows = chunks.map((content, i) => ({
    content,
    metadata,
    embedding: embeddings[i],
  }));
  const { error } = await supabase.from('documents').insert(rows);
  if (error) throw new Error(`Supabase insert failed: ${error.message}`);

  console.log(`Ingested ${chunks.length} chunks.`);
}
Step 4: Retrieval
// lib/retrieve.js
import { supabase, embed } from './clients.js';

export async function retrieve(query, topK = 5) {
  // Embed the query with the same model used at ingestion time
  const [queryEmbedding] = await embed([query], 'query');

  // Call the Supabase match function (SQL below)
  const { data, error } = await supabase.rpc('match_documents', {
    query_embedding: queryEmbedding,
    match_count: topK,
  });
  if (error) throw new Error(`Retrieval failed: ${error.message}`);
  return data;
}
Add this SQL function to Supabase:
create or replace function match_documents(
  query_embedding vector(1024),
  match_count int default 5
)
returns table (
  id bigint,
  content text,
  metadata jsonb,
  similarity float
)
language sql stable
as $$
  -- Qualify columns with the table name so they don't clash
  -- with the identically named output columns
  select
    documents.id,
    documents.content,
    documents.metadata,
    1 - (documents.embedding <=> query_embedding) as similarity
  from documents
  order by documents.embedding <=> query_embedding
  limit match_count;
$$;
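The `<=>` operator is pgvector's cosine distance, so `1 - distance` is cosine similarity. For intuition, here is the same score computed in plain JavaScript (a sketch, not something you'd run in production):

```javascript
// Cosine similarity: 1 for parallel vectors, 0 for orthogonal ones
function cosineSimilarity(a, b) {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

console.log(cosineSimilarity([1, 0], [1, 0])); // 1
console.log(cosineSimilarity([1, 0], [0, 1])); // 0
```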
Step 5: Generation with Claude
// lib/generate.js
import { anthropic } from './clients.js';
import { retrieve } from './retrieve.js';

export async function answer(userQuery) {
  const chunks = await retrieve(userQuery);
  const context = chunks
    .map((c, i) => `[${i + 1}] ${c.content}`)
    .join('\n\n');

  const message = await anthropic.messages.create({
    model: 'claude-sonnet-4-20250514',
    max_tokens: 1024,
    system: `You are a helpful assistant. Answer the user's question using ONLY the context provided below.
If the answer isn't in the context, say so — don't make things up.

Context:
${context}`,
    messages: [
      { role: 'user', content: userQuery },
    ],
  });

  return message.content[0].text;
}
Step 6: Wire It Together
// main.js (top-level await needs ESM: set "type": "module" in package.json)
import { ingestDocument } from './lib/ingest.js';
import { answer } from './lib/generate.js';

// --- Ingest phase ---
const doc = `
Supabase is an open-source Firebase alternative built on PostgreSQL.
It provides a real-time database, authentication, edge functions, and storage.
The pgvector extension enables storing and querying high-dimensional vectors directly in Postgres.
`;
await ingestDocument(doc, { source: 'manual', topic: 'supabase' });

// --- Query phase ---
const response = await answer('What is Supabase and what does pgvector do?');
console.log(response);
What Good Output Looks Like
Supabase is an open-source alternative to Firebase, built on PostgreSQL.
It offers a real-time database, authentication, edge functions, and storage.
The pgvector extension extends Postgres to support high-dimensional vectors,
enabling you to store and query embeddings directly in the database — which
is exactly what powers semantic search in RAG pipelines.
Grounded, accurate, no hallucination.
Production Considerations
Chunking
- Experiment with chunk size — 256–1024 tokens is the practical range
- Overlapping chunks help preserve sentence-boundary context
- For structured docs (API references, tables), consider semantic chunking
Retrieval quality
- Add a metadata filter to scope searches: { source: 'docs-v2' }
- Consider re-ranking retrieved chunks with a cross-encoder before generation
- Log retrieval scores — if similarity drops below ~0.75, you may need better chunking
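For the metadata filter, the cleanest place is inside the SQL function (an extra jsonb parameter plus a `where metadata @> filter` clause). As a stopgap you can also post-filter the retrieved rows client-side; a minimal sketch with hypothetical data:

```javascript
// Post-filter retrieved chunks by metadata (fine for small top-K results;
// for real scoping, push the filter into the match_documents SQL instead)
function filterByMetadata(chunks, filter) {
  return chunks.filter((c) =>
    Object.entries(filter).every(([key, value]) => c.metadata?.[key] === value)
  );
}

// Hypothetical retrieval results
const hits = [
  { content: 'new auth flow', metadata: { source: 'docs-v2' } },
  { content: 'legacy auth flow', metadata: { source: 'docs-v1' } },
];
console.log(filterByMetadata(hits, { source: 'docs-v2' }));
// Only the docs-v2 chunk survives the filter
```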
Scaling
- Use ivfflat for up to ~1M vectors; switch to hnsw for larger datasets
- Batch embedding calls during ingestion to stay within rate limits
- Cache embeddings for frequently queried terms
Cost
- Voyage embeddings are significantly cheaper than running inference — embed aggressively
- Claude Haiku works well for simple Q&A RAG; use Sonnet when reasoning depth matters
Wrapping Up
This is the core skeleton of a RAG pipeline that actually works. The real craft is in tuning it — better chunking strategies, hybrid search (BM25 + vector), metadata filtering, and smart context window management. But this foundation will take you from zero to a grounded, retrieval-backed Claude assistant in a single afternoon.
If you extend this with streaming responses, a chat history layer, or a file upload frontend — that's a natural follow-up article. Drop a comment if you'd like to see it.