Building a RAG Pipeline with Claude API and Supabase
Tags: claude supabase rag ai
Retrieval-Augmented Generation (RAG) is one of those patterns that sounds academic until you actually build one — then you realize it's just smart plumbing. You store knowledge somewhere searchable, retrieve the relevant bits at query time, and feed them to an LLM as context. The LLM stops hallucinating because it's working from your data, not just its training weights.
In this article, I'll walk you through building a production-ready RAG pipeline using:
- Claude API (Anthropic) — for generation
- Voyage AI — for embeddings (Anthropic doesn't serve embeddings itself; it recommends Voyage)
- Supabase — for vector storage via pgvector
- Node.js — the glue
By the end, you'll have a pipeline that ingests documents, embeds them, stores them in Supabase, and answers questions grounded in that knowledge base.
Architecture Overview
[Documents] → [Chunker] → [Embedder] → [Supabase pgvector]
                                               ↓
[User Query] → [Embed Query] → [Similarity Search] → [Top-K Chunks]
                                               ↓
[Claude API + Context] → [Answer]
Two phases: ingestion and retrieval + generation.
Prerequisites
- Node.js 18+
- A Supabase project (free tier works)
- An Anthropic API key
- A Voyage AI API key (the embeddings provider Anthropic recommends)
npm install @anthropic-ai/sdk @supabase/supabase-js dotenv
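The code in this article reads its credentials from environment variables. A minimal `.env` with the variable names used below (values are placeholders; `VOYAGE_API_KEY` is for the Voyage AI embeddings calls):

```shell
# .env (placeholder values, fill in your own)
ANTHROPIC_API_KEY=...
VOYAGE_API_KEY=...
SUPABASE_URL=...
SUPABASE_SERVICE_ROLE_KEY=...
```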
Step 1: Set Up Supabase for Vector Search
In your Supabase project, open the SQL editor and run:
-- Enable the pgvector extension
create extension if not exists vector;
-- Create the documents table
create table documents (
  id bigserial primary key,
  content text not null,
  metadata jsonb,
  embedding vector(1024)  -- voyage-3 embeddings are 1024-dimensional
);
-- Create an index for fast cosine similarity search
create index on documents
using ivfflat (embedding vector_cosine_ops)
with (lists = 100);
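ivfflat is an approximate index: it clusters vectors into lists and scans only a few lists per query. If recall looks low, you can raise the number of probed lists at query time (pgvector's default is 1):

```sql
-- More probes = better recall, slower queries
set ivfflat.probes = 10;
```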
Note: the embedding dimension (1024) matches voyage-3 embeddings from Voyage AI, the embeddings provider Anthropic recommends (Anthropic's own API has no embeddings endpoint). Adjust the dimension if you use a different model.
Step 2: Initialize Clients
One catch up front: the Anthropic SDK only does generation. For embeddings we call Voyage AI's REST API directly with a small helper — Node 18+ ships global fetch, so no extra SDK is needed.

// lib/clients.js
import Anthropic from '@anthropic-ai/sdk';
import { createClient } from '@supabase/supabase-js';
import 'dotenv/config';

export const anthropic = new Anthropic({
  apiKey: process.env.ANTHROPIC_API_KEY,
});

export const supabase = createClient(
  process.env.SUPABASE_URL,
  process.env.SUPABASE_SERVICE_ROLE_KEY
);

// inputType is 'document' at ingestion time, 'query' at search time
export async function embed(inputs, inputType) {
  const res = await fetch('https://api.voyageai.com/v1/embeddings', {
    method: 'POST',
    headers: {
      Authorization: `Bearer ${process.env.VOYAGE_API_KEY}`,
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({ model: 'voyage-3', input: inputs, input_type: inputType }),
  });
  if (!res.ok) throw new Error(`Voyage embeddings request failed: ${res.status}`);
  const { data } = await res.json();
  return data.map((d) => d.embedding);
}
Step 3: Ingestion Pipeline
3a. Chunk your documents
Chunking strategy matters more than people expect. Too large and you dilute relevance; too small and you lose context. A 512-word chunk with a 50-word overlap is a solid starting point (the chunker below counts words, a rough proxy for tokens).
// lib/chunker.js
// Sizes are in words, a rough proxy for tokens that's good enough to start
export function chunkText(text, chunkSize = 512, overlap = 50) {
  const words = text.split(/\s+/).filter(Boolean);
  const chunks = [];
  // Step forward by (chunkSize - overlap) so consecutive chunks share context
  for (let i = 0; i < words.length; i += chunkSize - overlap) {
    const chunk = words.slice(i, i + chunkSize).join(' ');
    if (chunk) chunks.push(chunk);
  }
  return chunks;
}
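A quick sanity check of the overlap behavior, with the chunker inlined and toy sizes so the output is easy to inspect (the w0…w9 words are synthetic test data):

```javascript
// Same word-based chunker, exercised with tiny parameters
function chunkText(text, chunkSize = 512, overlap = 50) {
  const words = text.split(/\s+/).filter(Boolean);
  const chunks = [];
  for (let i = 0; i < words.length; i += chunkSize - overlap) {
    const chunk = words.slice(i, i + chunkSize).join(' ');
    if (chunk) chunks.push(chunk);
  }
  return chunks;
}

const text = 'w0 w1 w2 w3 w4 w5 w6 w7 w8 w9';
const chunks = chunkText(text, 4, 1);
console.log(chunks);
// [ 'w0 w1 w2 w3', 'w3 w4 w5 w6', 'w6 w7 w8 w9', 'w9' ]
// Each chunk repeats the last word of the previous one (overlap = 1),
// and a short tail chunk is kept rather than dropped.
```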
3b. Embed and store
// lib/ingest.js
import { supabase, embed } from './clients.js';
import { chunkText } from './chunker.js';

export async function ingestDocument(text, metadata = {}) {
  const chunks = chunkText(text);

  // Embed in batches: one Voyage call per 128 chunks instead of one per chunk
  // (Voyage caps the number of inputs per request)
  const BATCH = 128;
  const embeddings = [];
  for (let i = 0; i < chunks.length; i += BATCH) {
    const batch = await embed(chunks.slice(i, i + BATCH), 'document');
    embeddings.push(...batch);
  }

  // Bulk insert into Supabase
  const rows = chunks.map((content, i) => ({
    content,
    metadata,
    embedding: embeddings[i],
  }));
  const { error } = await supabase.from('documents').insert(rows);
  if (error) throw new Error(`Supabase insert failed: ${error.message}`);

  console.log(`Ingested ${chunks.length} chunks.`);
}
Step 4: Retrieval
// lib/retrieve.js
import { supabase, embed } from './clients.js';

export async function retrieve(query, topK = 5) {
  // Embed the query with the same model used at ingestion time
  const [queryEmbedding] = await embed([query], 'query');

  // Call the Supabase match function (SQL below)
  const { data, error } = await supabase.rpc('match_documents', {
    query_embedding: queryEmbedding,
    match_count: topK,
  });
  if (error) throw new Error(`Retrieval failed: ${error.message}`);
  return data;
}
Add this SQL function to Supabase:
create or replace function match_documents(
  query_embedding vector(1024),
  match_count int default 5
)
returns table (
  id bigint,
  content text,
  metadata jsonb,
  similarity float
)
language sql stable
as $$
  -- Qualify columns with the table name so they don't clash
  -- with the identically named output columns
  select
    documents.id,
    documents.content,
    documents.metadata,
    1 - (documents.embedding <=> query_embedding) as similarity
  from documents
  order by documents.embedding <=> query_embedding
  limit match_count;
$$;
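The `<=>` operator is pgvector's cosine distance, so `1 - distance` is cosine similarity. For intuition, here is the same score computed in plain JavaScript (a sketch, not something you'd run in production):

```javascript
// Cosine similarity: 1 for parallel vectors, 0 for orthogonal ones
function cosineSimilarity(a, b) {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

console.log(cosineSimilarity([1, 0], [1, 0])); // 1
console.log(cosineSimilarity([1, 0], [0, 1])); // 0
```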
Step 5: Generation with Claude
// lib/generate.js
import { anthropic } from './clients.js';
import { retrieve } from './retrieve.js';

export async function answer(userQuery) {
  const chunks = await retrieve(userQuery);
  const context = chunks
    .map((c, i) => `[${i + 1}] ${c.content}`)
    .join('\n\n');

  const message = await anthropic.messages.create({
    model: 'claude-sonnet-4-20250514',
    max_tokens: 1024,
    system: `You are a helpful assistant. Answer the user's question using ONLY the context provided below.
If the answer isn't in the context, say so — don't make things up.

Context:
${context}`,
    messages: [
      { role: 'user', content: userQuery },
    ],
  });

  return message.content[0].text;
}
Step 6: Wire It Together
// main.js (top-level await needs ESM: set "type": "module" in package.json)
import { ingestDocument } from './lib/ingest.js';
import { answer } from './lib/generate.js';

// --- Ingest phase ---
const doc = `
Supabase is an open-source Firebase alternative built on PostgreSQL.
It provides a real-time database, authentication, edge functions, and storage.
The pgvector extension enables storing and querying high-dimensional vectors directly in Postgres.
`;
await ingestDocument(doc, { source: 'manual', topic: 'supabase' });

// --- Query phase ---
const response = await answer('What is Supabase and what does pgvector do?');
console.log(response);
What Good Output Looks Like
Supabase is an open-source alternative to Firebase, built on PostgreSQL.
It offers a real-time database, authentication, edge functions, and storage.
The pgvector extension extends Postgres to support high-dimensional vectors,
enabling you to store and query embeddings directly in the database — which
is exactly what powers semantic search in RAG pipelines.
Grounded, accurate, no hallucination.
Production Considerations
Chunking
- Experiment with chunk size — 256–1024 tokens is the practical range
- Overlapping chunks help preserve sentence-boundary context
- For structured docs (API references, tables), consider semantic chunking
Retrieval quality
- Add a metadata filter to scope searches: { source: 'docs-v2' }
- Consider re-ranking retrieved chunks with a cross-encoder before generation
- Log retrieval scores — if similarity drops below ~0.75, you may need better chunking
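For the metadata filter, the cleanest place is inside the SQL function (an extra jsonb parameter plus a `where metadata @> filter` clause). As a stopgap you can also post-filter the retrieved rows client-side; a minimal sketch with hypothetical data:

```javascript
// Post-filter retrieved chunks by metadata (fine for small top-K results;
// for real scoping, push the filter into the match_documents SQL instead)
function filterByMetadata(chunks, filter) {
  return chunks.filter((c) =>
    Object.entries(filter).every(([key, value]) => c.metadata?.[key] === value)
  );
}

// Hypothetical retrieval results
const hits = [
  { content: 'new auth flow', metadata: { source: 'docs-v2' } },
  { content: 'legacy auth flow', metadata: { source: 'docs-v1' } },
];
console.log(filterByMetadata(hits, { source: 'docs-v2' }));
// Only the docs-v2 chunk survives the filter
```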
Scaling
- Use ivfflat for up to ~1M vectors; switch to hnsw for larger datasets
- Batch embedding calls during ingestion to stay within rate limits
- Cache embeddings for frequently queried terms
Cost
- Voyage embeddings are significantly cheaper than running inference — embed aggressively
- Claude Haiku works well for simple Q&A RAG; use Sonnet when reasoning depth matters
Wrapping Up
This is the core skeleton of a RAG pipeline that actually works. The real craft is in tuning it — better chunking strategies, hybrid search (BM25 + vector), metadata filtering, and smart context window management. But this foundation will take you from zero to a grounded, retrieval-backed Claude assistant in a single afternoon.
If you extend this with streaming responses, a chat history layer, or a file upload frontend — that's a natural follow-up article. Drop a comment if you'd like to see it.