Everyone reaches for Pinecone or Weaviate when they need a vector database. For most production workloads at under a million documents, you already have everything you need: Postgres + pgvector.
Here's the full RAG setup in Next.js with zero new infrastructure.
## What pgvector Gives You
pgvector adds vector storage and similarity search to Postgres. After a one-line extension install, you can:
- Store embeddings as `vector` columns
- Run approximate nearest-neighbor search with the `<->` (L2), `<#>` (inner product), and `<=>` (cosine) operators
- Use standard Postgres indexes (IVFFlat, HNSW) for performance at scale
No new service, no new bill, no data leaving your existing infrastructure.
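For intuition, the cosine operator `<=>` returns 1 minus the cosine similarity of two vectors. Here's what it computes, sketched in plain TypeScript (illustration only — in production Postgres evaluates this in C against the index):

```typescript
// What pgvector's <=> operator computes: cosine *distance*,
// i.e. 1 - cosine similarity. 0 = identical direction, 1 = orthogonal.
function cosineDistance(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i]
    normA += a[i] * a[i]
    normB += b[i] * b[i]
  }
  return 1 - dot / (Math.sqrt(normA) * Math.sqrt(normB))
}
```

This is why the retrieval query later converts distance back to similarity with `1 - (embedding <=> query)`.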
## Setup
Enable pgvector on Supabase (or any Postgres):
```sql
create extension if not exists vector;
```
Create the documents table:
```sql
create table documents (
  id uuid primary key default gen_random_uuid(),
  content text not null,
  embedding vector(1536),
  metadata jsonb default '{}',
  created_at timestamptz default now()
);

-- HNSW index for fast approximate search
create index on documents using hnsw (embedding vector_cosine_ops)
with (m = 16, ef_construction = 64);
```
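One tuning note: HNSW recall at query time is controlled by `hnsw.ef_search` (default 40). Raising it trades latency for recall, and it's a per-session setting — no rebuild required:

```sql
-- Higher ef_search = better recall, slower queries (default is 40)
set hnsw.ef_search = 100;
```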
## Ingestion Pipeline
```typescript
// lib/rag/ingest.ts
import { db } from '@/db'
import { documents } from '@/db/schema'

export async function embed(text: string): Promise<number[]> {
  const response = await fetch('https://api.voyageai.com/v1/embeddings', {
    method: 'POST',
    headers: {
      'Authorization': `Bearer ${process.env.VOYAGE_API_KEY}`,
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({ input: text, model: 'voyage-3' }),
  })
  if (!response.ok) {
    throw new Error(`Embedding request failed: ${response.status}`)
  }
  const data = await response.json()
  return data.data[0].embedding
}

export async function ingestDocument(content: string, metadata = {}) {
  const embedding = await embed(content)
  await db.insert(documents).values({ content, embedding, metadata })
}

export async function ingestBatch(docs: { content: string; metadata?: object }[]) {
  for (const doc of docs) {
    await ingestDocument(doc.content, doc.metadata)
    await new Promise(r => setTimeout(r, 100)) // crude rate limiting for the embeddings API
  }
}
```
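The pipeline above embeds each document whole. Long documents should be split into chunks before embedding so retrieval returns focused passages rather than entire files. A minimal fixed-size chunker with overlap (`chunkText` is my own helper, not part of the post; the 800/100 defaults are assumptions, not recommendations):

```typescript
// Hypothetical chunker: fixed-size character windows with overlap,
// so context isn't lost at chunk boundaries. Tune size/overlap per corpus.
function chunkText(text: string, size = 800, overlap = 100): string[] {
  const chunks: string[] = []
  for (let start = 0; start < text.length; start += size - overlap) {
    chunks.push(text.slice(start, start + size))
    if (start + size >= text.length) break
  }
  return chunks
}
```

Each chunk then goes through `ingestDocument` individually, typically with the source document's id in `metadata`.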
## Retrieval
```typescript
// lib/rag/retrieve.ts
import { sql } from 'drizzle-orm'
import { db } from '@/db'
import { embed } from '@/lib/rag/ingest' // embed() must be exported from the ingest module

export async function retrieveContext(
  query: string,
  limit = 5,
  threshold = 0.7
): Promise<string[]> {
  const queryEmbedding = await embed(query)
  const embeddingStr = `[${queryEmbedding.join(',')}]`

  const results = await db.execute(sql`
    select content, 1 - (embedding <=> ${embeddingStr}::vector) as similarity
    from documents
    where 1 - (embedding <=> ${embeddingStr}::vector) > ${threshold}
    order by embedding <=> ${embeddingStr}::vector
    limit ${limit}
  `)

  return results.rows.map((r: any) => r.content)
}
```
## The RAG API Route
```typescript
// app/api/chat/route.ts
import Anthropic from '@anthropic-ai/sdk'
import { retrieveContext } from '@/lib/rag/retrieve'

const client = new Anthropic()

export async function POST(req: Request) {
  const { messages } = await req.json()
  const lastMessage = messages[messages.length - 1]

  const context = await retrieveContext(lastMessage.content)

  const systemPrompt = context.length > 0
    ? `You are a helpful assistant. Use the following context to answer questions accurately.\n\n<context>\n${context.join('\n\n---\n\n')}\n</context>\n\nIf the context doesn't contain the answer, say so clearly.`
    : 'You are a helpful assistant.'

  const stream = client.messages.stream({
    model: 'claude-sonnet-4-6',
    max_tokens: 1024,
    system: systemPrompt,
    messages,
  })

  return new Response(
    new ReadableStream({
      async start(controller) {
        for await (const chunk of stream) {
          if (chunk.type === 'content_block_delta' && chunk.delta.type === 'text_delta') {
            controller.enqueue(new TextEncoder().encode(chunk.delta.text))
          }
        }
        controller.close()
      },
    }),
    { headers: { 'Content-Type': 'text/plain; charset=utf-8' } }
  )
}
```
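The route streams plain text, so the client side stays simple. A hypothetical consumer (`readChat` is my name, not part of the kit) accumulating the stream into a string:

```typescript
// Hypothetical client-side consumer for the plain-text stream above:
// reads the Response body chunk by chunk and decodes it to a string.
async function readChat(res: Response): Promise<string> {
  const reader = res.body!.getReader()
  const decoder = new TextDecoder()
  let text = ''
  while (true) {
    const { done, value } = await reader.read()
    if (done) break
    text += decoder.decode(value, { stream: true })
  }
  return text
}
```

In a real UI you'd render each chunk as it arrives instead of waiting for the full string.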
## Performance at Scale
pgvector's HNSW index keeps retrieval latency in the low milliseconds — roughly 5-15ms at ~1M vectors on standard Postgres hardware.
| Metric | pgvector (HNSW) | Pinecone |
|---|---|---|
| Query latency (10K docs) | 2-5ms | 20-50ms |
| Query latency (1M docs) | 5-15ms | 15-40ms |
| Setup complexity | Low (extension) | Medium (new service) |
| Cost (10K docs/day) | ~$0 (existing DB) | ~$70/mo |
| Exact search available | Yes | No (ANN only) |
| SQL joins on results | Yes | No |
The SQL joins point is underrated. You can filter by metadata, join to users, restrict by permissions — all in one query. Pinecone requires a round-trip to Postgres after retrieval to do any of that.
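As a sketch, permission-scoped retrieval might look like this — assuming a hypothetical `team_id` key in `metadata` (not part of the schema above); `:team_id` and `:query_vec` stand in for bound parameters:

```sql
-- Hypothetical: similarity search restricted to one team's documents,
-- filter and ranking in a single query, no post-retrieval round-trip.
select content
from documents
where metadata->>'team_id' = :team_id
order by embedding <=> :query_vec::vector
limit 5;
```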
## When to Move to a Dedicated Vector DB
Switch when:
- You're above 5M vectors and seeing latency >50ms
- You need multi-modal embeddings (image + text in same index)
- Your embedding dimensions exceed 2,000 — pgvector's `vector` type stores up to 16,000 dimensions, but HNSW and IVFFlat indexes cap at 2,000
- You need real-time index updates at >1K writes/second
For 95% of SaaS products, you won't hit any of these before you're well-funded enough to care.
## What I'm Building This Into
The AI SaaS Starter Kit includes a pre-built RAG module: pgvector schema, Drizzle queries, ingest pipeline, and the streaming chat route above.
If you want the full production-ready stack without starting from scratch:
- AI SaaS Starter Kit — $99 one-time — includes RAG module, Claude API integration, auth, Stripe billing, and deployment config
- Workflow Automator MCP — $15/mo — gives Claude access to your document store directly from the IDE