Building a RAG Pipeline with Claude API and Supabase

Omer Farooq

Tags: claude supabase rag ai

Retrieval-Augmented Generation (RAG) is one of those patterns that sounds academic until you actually build one — then you realize it's just smart plumbing. You store knowledge somewhere searchable, retrieve the relevant bits at query time, and feed them to an LLM as context. The LLM stops hallucinating because it's working from your data, not just its training weights.

In this article, I'll walk you through building a production-ready RAG pipeline using:

  • Claude API (Anthropic) — for generation
  • Voyage AI — for embeddings (Anthropic recommends Voyage; the Claude API itself has no embeddings endpoint)
  • Supabase — for vector storage via pgvector
  • Node.js — the glue

By the end, you'll have a pipeline that ingests documents, embeds them, stores them in Supabase, and answers questions grounded in that knowledge base.


Architecture Overview


[Documents] → [Chunker] → [Embedder] → [Supabase pgvector]
                                               ↓
[User Query] → [Embed Query] → [Similarity Search] → [Top-K Chunks]
                                                             ↓
                                               [Claude API + Context] → [Answer]

Two phases: ingestion and retrieval + generation.


Prerequisites

  • Node.js 18+
  • A Supabase project (free tier works)
  • An Anthropic API key
  • A Voyage AI API key (for embeddings)
npm install @anthropic-ai/sdk @supabase/supabase-js dotenv
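The snippets in this walkthrough read their credentials from a `.env` file. The variable names below match what the code uses; `VOYAGE_API_KEY` is for the embedding calls to Voyage AI (values shown are placeholders):

```shell
ANTHROPIC_API_KEY=sk-ant-your-key-here
SUPABASE_URL=https://your-project.supabase.co
SUPABASE_SERVICE_ROLE_KEY=your-service-role-key
VOYAGE_API_KEY=your-voyage-key
```

Keep the service role key server-side only; it bypasses row-level security.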

Step 1: Set Up Supabase for Vector Search

In your Supabase project, open the SQL editor and run:

-- Enable the pgvector extension
create extension if not exists vector;

-- Create the documents table
create table documents (
  id bigserial primary key,
  content text not null,
  metadata jsonb,
  embedding vector(1024)
);

-- Create an index for fast cosine similarity search
create index on documents
using ivfflat (embedding vector_cosine_ops)
with (lists = 100);

Note: The embedding dimension (1024) matches Voyage AI's voyage-3 model. Anthropic doesn't host an embeddings endpoint itself — it recommends Voyage AI, which is what we'll call below. Adjust the dimension if you use a different model. One more gotcha: ivfflat builds its lists from existing rows, so create the index (or reindex) after you've ingested some data.


Step 2: Initialize Clients and an Embedding Helper

// lib/clients.js
import Anthropic from '@anthropic-ai/sdk';
import { createClient } from '@supabase/supabase-js';
import 'dotenv/config';

export const anthropic = new Anthropic({
  apiKey: process.env.ANTHROPIC_API_KEY,
});

export const supabase = createClient(
  process.env.SUPABASE_URL,
  process.env.SUPABASE_SERVICE_ROLE_KEY
);

// Anthropic doesn't serve embeddings, so we call Voyage AI's REST API directly
// (Node 18+ has a global fetch). Accepts a string or an array of strings and
// returns one embedding per input.
export async function embed(input, model = 'voyage-3') {
  const res = await fetch('https://api.voyageai.com/v1/embeddings', {
    method: 'POST',
    headers: {
      Authorization: `Bearer ${process.env.VOYAGE_API_KEY}`,
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({ input: Array.isArray(input) ? input : [input], model }),
  });
  if (!res.ok) throw new Error(`Voyage API error: ${res.status}`);
  const { data } = await res.json();
  return data.map((d) => d.embedding);
}

Step 3: Ingestion Pipeline

3a. Chunk your documents

Chunking strategy matters more than people expect. Too large and you dilute relevance; too small and you lose context. A 512-token chunk with 50-token overlap is a solid starting point. The chunker below counts words rather than tokens — a rough proxy (a word is typically 1–1.5 tokens), which is fine for a first pass.

// lib/chunker.js
// Sizes here are in words, a rough stand-in for tokens.
export function chunkText(text, chunkSize = 512, overlap = 50) {
  if (overlap >= chunkSize) throw new Error('overlap must be smaller than chunkSize');
  const words = text.split(/\s+/);
  const chunks = [];

  for (let i = 0; i < words.length; i += chunkSize - overlap) {
    const chunk = words.slice(i, i + chunkSize).join(' ');
    if (chunk.trim()) chunks.push(chunk);
  }

  return chunks;
}
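A quick standalone sanity check of the chunker's stride — the function is copied inline (minus error handling) so the snippet runs by itself; remember the sizes are words, not tokens:

```javascript
// Inline copy of chunkText for a self-contained demo
function chunkText(text, chunkSize = 512, overlap = 50) {
  const words = text.split(/\s+/);
  const chunks = [];
  for (let i = 0; i < words.length; i += chunkSize - overlap) {
    const chunk = words.slice(i, i + chunkSize).join(' ');
    if (chunk.trim()) chunks.push(chunk);
  }
  return chunks;
}

// 10 words, chunks of 4 with overlap 2: the window advances 2 words at a time
const text = 'w1 w2 w3 w4 w5 w6 w7 w8 w9 w10';
const chunks = chunkText(text, 4, 2);
console.log(chunks);
// ['w1 w2 w3 w4', 'w3 w4 w5 w6', 'w5 w6 w7 w8', 'w7 w8 w9 w10', 'w9 w10']
```

Note that each chunk repeats the last `overlap` words of the previous one, which is what preserves context across boundaries.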

3b. Embed and store

// lib/ingest.js
import { supabase, embed } from './clients.js';
import { chunkText } from './chunker.js';

export async function ingestDocument(text, metadata = {}) {
  const chunks = chunkText(text);

  // Voyage accepts batched input, so one request covers every chunk
  // (very large documents may need to be split across requests)
  const embeddings = await embed(chunks);

  const rows = chunks.map((content, i) => ({
    content,
    metadata,
    embedding: embeddings[i],
  }));

  // Store all rows in Supabase in a single insert
  const { error } = await supabase.from('documents').insert(rows);
  if (error) throw new Error(`Supabase insert failed: ${error.message}`);

  console.log(`Ingested ${chunks.length} chunks.`);
}

Step 4: Retrieval

// lib/retrieve.js
import { supabase, embed } from './clients.js';

export async function retrieve(query, topK = 5) {
  // Embed the query with the same model used at ingestion time
  const [queryEmbedding] = await embed(query);

  // Call the Supabase match function
  const { data, error } = await supabase.rpc('match_documents', {
    query_embedding: queryEmbedding,
    match_count: topK,
  });

  if (error) throw new Error(`Retrieval failed: ${error.message}`);
  return data;
}

Add this SQL function to Supabase:

create or replace function match_documents(
  query_embedding vector(1024),
  match_count int default 5
)
returns table (
  id bigint,
  content text,
  metadata jsonb,
  similarity float
)
language sql stable
as $$
  select
    id,
    content,
    metadata,
    1 - (embedding <=> query_embedding) as similarity
  from documents
  order by embedding <=> query_embedding
  limit match_count;
$$;
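For intuition: `<=>` is pgvector's cosine distance operator, so `1 - (embedding <=> query_embedding)` is cosine similarity. The same quantity in plain JavaScript:

```javascript
// Cosine similarity: dot(a, b) / (|a| * |b|) — what pgvector's
// `1 - (embedding <=> query)` expression computes in SQL
function cosineSimilarity(a, b) {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

console.log(cosineSimilarity([1, 0], [1, 0])); // identical direction: 1
console.log(cosineSimilarity([1, 0], [0, 1])); // orthogonal: 0
```

Values near 1 mean the chunk points in the same direction as the query in embedding space, which is why the SQL orders by distance ascending.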

Step 5: Generation with Claude


// lib/generate.js
import { anthropic } from './clients.js';
import { retrieve } from './retrieve.js';

export async function answer(userQuery) {
  const chunks = await retrieve(userQuery);

  const context = chunks
    .map((c, i) => `[${i + 1}] ${c.content}`)
    .join('\n\n');

  const message = await anthropic.messages.create({
    model: 'claude-sonnet-4-20250514',
    max_tokens: 1024,
    system: `You are a helpful assistant. Answer the user's question using ONLY the context provided below. 
If the answer isn't in the context, say so — don't make things up.

Context:
${context}`,
    messages: [
      { role: 'user', content: userQuery },
    ],
  });

  return message.content[0].text;
}

Step 6: Wire It Together

// main.js (run with `node main.js`; top-level await needs "type": "module" in package.json)
import { ingestDocument } from './lib/ingest.js';
import { answer } from './lib/generate.js';

// --- Ingest phase ---
const doc = `
Supabase is an open-source Firebase alternative built on PostgreSQL.
It provides a real-time database, authentication, edge functions, and storage.
The pgvector extension enables storing and querying high-dimensional vectors directly in Postgres.
`;

await ingestDocument(doc, { source: 'manual', topic: 'supabase' });

// --- Query phase ---
const response = await answer('What is Supabase and what does pgvector do?');
console.log(response);

What Good Output Looks Like

Supabase is an open-source alternative to Firebase, built on PostgreSQL. 
It offers a real-time database, authentication, edge functions, and storage.

The pgvector extension extends Postgres to support high-dimensional vectors, 
enabling you to store and query embeddings directly in the database — which 
is exactly what powers semantic search in RAG pipelines.

Grounded, accurate, no hallucination.


Production Considerations

Chunking

  • Experiment with chunk size — 256–1024 tokens is the practical range
  • Overlapping chunks help preserve sentence-boundary context
  • For structured docs (API references, tables), consider semantic chunking
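As a sketch of the "semantic chunking" idea, here's a naive sentence-boundary packer; real semantic chunkers split on embedding-similarity shifts or document structure, but packing whole sentences into a word budget is the simplest version:

```javascript
// Naive sentence-boundary chunker: packs whole sentences up to a word
// budget so no chunk ends mid-sentence. A rough stand-in for real
// semantic chunking, which picks split points by meaning.
function chunkBySentence(text, maxWords = 120) {
  const sentences = text.match(/[^.!?]+[.!?]+/g) ?? [text];
  const chunks = [];
  let current = [];
  let count = 0;

  for (const s of sentences) {
    const words = s.trim().split(/\s+/).length;
    // Flush the current chunk if adding this sentence would overflow
    if (count + words > maxWords && current.length > 0) {
      chunks.push(current.join(' '));
      current = [];
      count = 0;
    }
    current.push(s.trim());
    count += words;
  }
  if (current.length) chunks.push(current.join(' '));
  return chunks;
}

const out = chunkBySentence('One two three. Four five. Six seven eight nine.', 5);
console.log(out);
// ['One two three. Four five.', 'Six seven eight nine.']
```

Sentence boundaries keep each chunk self-contained, at the cost of slightly uneven chunk sizes.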

Retrieval quality

  • Add a metadata filter to scope searches: { source: 'docs-v2' }
  • Consider re-ranking retrieved chunks with a cross-encoder before generation
  • Log retrieval scores — if similarity drops below ~0.75, you may need better chunking
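Re-ranking just means re-scoring and re-sorting the retrieved chunks before they reach Claude. Here's a minimal sketch; the toy lexical-overlap `score` is a stand-in for a real cross-encoder call so the example runs offline:

```javascript
// Toy relevance scorer: fraction of chunk words that appear in the query.
// In production this would be a cross-encoder or reranker API call.
function score(query, chunk) {
  const qWords = new Set(query.toLowerCase().split(/\s+/));
  const cWords = chunk.toLowerCase().split(/\s+/);
  const hits = cWords.filter((w) => qWords.has(w)).length;
  return hits / cWords.length;
}

// Re-score the retrieved chunks and keep only the best `keep` of them
function rerank(query, chunks, keep = 3) {
  return chunks
    .map((c) => ({ ...c, rerankScore: score(query, c.content) }))
    .sort((a, b) => b.rerankScore - a.rerankScore)
    .slice(0, keep);
}

const ranked = rerank('pgvector extension', [
  { content: 'pgvector is a Postgres extension' },
  { content: 'Firebase offers authentication' },
]);
console.log(ranked[0].content); // 'pgvector is a Postgres extension'
```

The pattern slots in between `retrieve()` and the Claude call: retrieve a generous top-K, re-rank, then pass only the survivors as context.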

Scaling

  • Use ivfflat for up to ~1M vectors; switch to hnsw for larger datasets
  • Batch embedding calls during ingestion to stay within rate limits
  • Cache embeddings for frequently queried terms
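Caching embeddings can be as simple as memoizing by input text. A sketch with a synchronous fake embedder so it runs standalone — the real Voyage call is async, but the same pattern works with promises:

```javascript
// In-memory embedding cache keyed by the input text. `embedFn` stands in
// for the real embedding call; repeated inputs never hit it twice.
function cachedEmbedder(embedFn) {
  const cache = new Map();
  let misses = 0;
  return {
    embed(text) {
      if (!cache.has(text)) {
        misses++; // only count calls that reach the underlying embedder
        cache.set(text, embedFn(text));
      }
      return cache.get(text);
    },
    get misses() { return misses; },
  };
}

// Demo with a fake embedder that "embeds" text as its length
const embedder = cachedEmbedder((t) => [t.length]);
embedder.embed('hello');
embedder.embed('hello'); // second call is served from the cache
console.log(embedder.misses); // 1
```

For an async embedder, cache the promise itself so concurrent requests for the same text also collapse into one API call.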

Cost

  • Voyage embeddings are significantly cheaper than running inference — embed aggressively
  • Claude Haiku works well for simple Q&A RAG; use Sonnet when reasoning depth matters
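One way to act on that: route each query to a model with a cheap heuristic. The Sonnet id is the one used earlier in this article; the Haiku id is an assumed alias, so check Anthropic's current model list before relying on it:

```javascript
// Toy model router: short factual queries go to Haiku, longer or
// reasoning-flavored ones to Sonnet. Thresholds and keywords are
// illustrative; tune them against your own traffic.
function pickModel(query) {
  const words = query.trim().split(/\s+/).length;
  const needsReasoning = /why|how|compare|explain/i.test(query);
  return needsReasoning || words > 20
    ? 'claude-sonnet-4-20250514'
    : 'claude-3-5-haiku-latest';
}

console.log(pickModel('What is pgvector?'));
console.log(pickModel('Explain how ivfflat indexing trades recall for speed'));
```

The returned id drops straight into the `model` field of the `messages.create` call from Step 5.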

Wrapping Up

This is the core skeleton of a RAG pipeline that actually works. The real craft is in tuning it — better chunking strategies, hybrid search (BM25 + vector), metadata filtering, and smart context window management. But this foundation will take you from zero to a grounded, retrieval-backed Claude assistant in a single afternoon.

If you extend this with streaming responses, a chat history layer, or a file upload frontend — that's a natural follow-up article. Drop a comment if you'd like to see it.

