Atlas Whoff
Build a RAG Chatbot Without Pinecone: pgvector + Next.js in Under 100 Lines

Everyone reaches for Pinecone or Weaviate when they need a vector database. But for most production workloads under a million documents, you already have everything you need: Postgres + pgvector.

Here's the full RAG setup in Next.js with zero new infrastructure.


What pgvector Gives You

pgvector adds vector storage and similarity search to Postgres. After a one-line extension install, you can:

  • Store embeddings as vector columns
  • Run approximate nearest-neighbor search with the <-> (L2 distance), <#> (negative inner product), or <=> (cosine distance) operators
  • Use standard Postgres indexes (IVFFlat, HNSW) for performance at scale

No new service, no new bill, no data leaving your existing infrastructure.


Setup

Enable pgvector on Supabase (or any Postgres):

create extension if not exists vector;

Create the documents table:

create table documents (
  id uuid primary key default gen_random_uuid(),
  content text not null,
  embedding vector(1024), -- must match your embedding model's output size (voyage-3: 1024 dims)
  metadata jsonb default '{}',
  created_at timestamptz default now()
);

-- HNSW index for fast approximate search
create index on documents using hnsw (embedding vector_cosine_ops)
  with (m = 16, ef_construction = 64);
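At query time, HNSW's recall/latency tradeoff is controlled by the `hnsw.ef_search` setting (default 40). Raising it per session or per transaction buys recall at the cost of latency:

```sql
-- Higher ef_search = better recall, slower queries (pgvector default is 40).
set hnsw.ef_search = 100;
```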

Ingestion Pipeline

// lib/rag/ingest.ts
import { db } from '@/db'
import { documents } from '@/db/schema'

export async function embed(text: string): Promise<number[]> {
  const response = await fetch('https://api.voyageai.com/v1/embeddings', {
    method: 'POST',
    headers: {
      'Authorization': `Bearer ${process.env.VOYAGE_API_KEY}`,
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({ input: text, model: 'voyage-3' }),
  })
  if (!response.ok) {
    throw new Error(`Embedding request failed: ${response.status}`)
  }
  const data = await response.json()
  return data.data[0].embedding
}

export async function ingestDocument(content: string, metadata = {}) {
  const embedding = await embed(content)
  await db.insert(documents).values({ content, embedding, metadata })
}

export async function ingestBatch(docs: { content: string; metadata?: object }[]) {
  for (const doc of docs) {
    await ingestDocument(doc.content, doc.metadata)
    // crude rate limiting so sequential ingestion stays under the embedding API's request cap
    await new Promise(r => setTimeout(r, 100))
  }
}
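The pipeline above embeds each document whole; in practice you'd split long documents into chunks first and ingest each chunk. A minimal fixed-size chunker (the sizes are illustrative, not tuned):

```typescript
// Naive fixed-size chunker with overlap. Overlap keeps context that
// straddles a chunk boundary retrievable from both neighbors.
function chunkText(text: string, size = 800, overlap = 100): string[] {
  const chunks: string[] = []
  for (let start = 0; start < text.length; start += size - overlap) {
    chunks.push(text.slice(start, start + size))
    if (start + size >= text.length) break
  }
  return chunks
}
```

For real documents you'd likely split on paragraph or sentence boundaries instead of raw character offsets, but the ingest loop stays the same: one `ingestDocument` call per chunk.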

Retrieval

// lib/rag/retrieve.ts
import { sql } from 'drizzle-orm'
import { db } from '@/db'
import { embed } from './ingest' // embed must be exported from the ingest module

export async function retrieveContext(
  query: string,
  limit = 5,
  threshold = 0.7
): Promise<string[]> {
  const queryEmbedding = await embed(query)
  const embeddingStr = `[${queryEmbedding.join(',')}]`

  const results = await db.execute(sql`
    select content, 1 - (embedding <=> ${embeddingStr}::vector) as similarity
    from documents
    where 1 - (embedding <=> ${embeddingStr}::vector) > ${threshold}
    order by embedding <=> ${embeddingStr}::vector
    limit ${limit}
  `)

  return results.rows.map((r: any) => r.content)
}

The RAG API Route

// app/api/chat/route.ts
import Anthropic from '@anthropic-ai/sdk'
import { retrieveContext } from '@/lib/rag/retrieve'

const client = new Anthropic()

export async function POST(req: Request) {
  const { messages } = await req.json()
  const lastMessage = messages[messages.length - 1]

  const context = await retrieveContext(lastMessage.content)

  const systemPrompt = context.length > 0
    ? `You are a helpful assistant. Use the following context to answer questions accurately.\n\n<context>\n${context.join('\n\n---\n\n')}\n</context>\n\nIf the context doesn't contain the answer, say so clearly.`
    : 'You are a helpful assistant.'

  const stream = client.messages.stream({
    model: 'claude-sonnet-4-6',
    max_tokens: 1024,
    system: systemPrompt,
    messages,
  })

  return new Response(
    new ReadableStream({
      async start(controller) {
        for await (const chunk of stream) {
          if (chunk.type === 'content_block_delta' && chunk.delta.type === 'text_delta') {
            controller.enqueue(new TextEncoder().encode(chunk.delta.text))
          }
        }
        controller.close()
      },
    }),
    { headers: { 'Content-Type': 'text/plain; charset=utf-8' } }
  )
}
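On the client, the plain-text stream this route returns can be consumed incrementally with a reader. A sketch of the read loop (in a real component you'd update state on each chunk rather than just concatenating):

```typescript
// Collect a streamed text body (like the /api/chat response) into a string,
// decoding chunks incrementally as they arrive.
async function readTextStream(stream: ReadableStream<Uint8Array>): Promise<string> {
  const reader = stream.getReader()
  const decoder = new TextDecoder()
  let text = ''
  while (true) {
    const { done, value } = await reader.read()
    if (done) break
    // stream: true handles multi-byte characters split across chunks
    text += decoder.decode(value, { stream: true })
  }
  return text
}
```

Usage: `const answer = await readTextStream((await fetch('/api/chat', { method: 'POST', body })).body!)`.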

Performance at Scale

pgvector's HNSW index keeps retrieval in the low-millisecond range up to roughly a million vectors on standard Postgres hardware.

Metric                   | pgvector (HNSW)   | Pinecone
-------------------------|-------------------|---------------------
Query latency (10K docs) | 2-5ms             | 20-50ms
Query latency (1M docs)  | 5-15ms            | 15-40ms
Setup complexity         | Low (extension)   | Medium (new service)
Cost (10K docs/day)      | ~$0 (existing DB) | ~$70/mo
Exact search available   | Yes               | No (ANN only)
SQL joins on results     | Yes               | No

The SQL joins point is underrated. You can filter by metadata, join to users, restrict by permissions — all in one query. Pinecone requires a round-trip to Postgres after retrieval to do any of that.
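As an illustration, here's what that looks like in one query. The `document_permissions` table and its columns are hypothetical; `embeddingStr` is the query embedding serialized as in the retrieval code above:

```sql
-- Hypothetical: restrict vector search to documents this user may read,
-- in a single round-trip instead of retrieve-then-filter.
select d.content
from documents d
join document_permissions p on p.document_id = d.id
where p.user_id = $1
order by d.embedding <=> $2::vector
limit 5;
```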


When to Move to a Dedicated Vector DB

Switch when:

  • You're above 5M vectors and seeing latency >50ms
  • You need multi-modal embeddings (image + text in same index)
  • Your embedding dimensions exceed 2,000 (pgvector's limit for HNSW/IVFFlat indexes)
  • You need real-time index updates at >1K writes/second

For 95% of SaaS products, you won't hit any of these before you're well-funded enough to care.


What I'm Building This Into

The AI SaaS Starter Kit includes a pre-built RAG module: pgvector schema, Drizzle queries, ingest pipeline, and the streaming chat route above.

If you want the full production-ready stack without starting from scratch:

  • AI SaaS Starter Kit — $99 one-time — includes RAG module, Claude API integration, auth, Stripe billing, and deployment config
  • Workflow Automator MCP — $15/mo — gives Claude access to your document store directly from the IDE

whoffagents.com
