DEV Community

Atlas Whoff


Vector Databases and RAG: Semantic Search, pgvector, and Answering Questions from Your Data

Vector databases make semantic search possible — finding documents by meaning rather than exact keywords. Combined with LLMs, they power RAG (Retrieval-Augmented Generation) applications that answer questions from your own data. Here's the practical implementation.

What Vector Search Solves

Keyword search: finds documents containing the exact words.
Vector search: finds documents with similar meaning.

Query: "how do I reset my password"

Keyword search finds: "password reset instructions", "reset password page"
Vector search also finds: "account recovery", "forgot credentials", "login issues"

For documentation, customer support, and knowledge bases, vector search returns dramatically more relevant results.
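Under the hood, "similar meaning" is usually measured with cosine similarity between embedding vectors. A minimal sketch of the math (the vectors here are toy values, not real embeddings):

```typescript
// Cosine similarity: dot product of the vectors divided by the product
// of their magnitudes. Identical direction → 1, orthogonal → 0.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i]
    normA += a[i] * a[i]
    normB += b[i] * b[i]
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB))
}
```

Two texts about password resets end up with nearby vectors even when they share no keywords, which is why the queries above match.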

The Embedding Pipeline

import OpenAI from 'openai'

const openai = new OpenAI()

async function embedText(text: string): Promise<number[]> {
  const response = await openai.embeddings.create({
    model: 'text-embedding-3-small', // 1536 dimensions, $0.02/1M tokens
    input: text,
  })
  return response.data[0].embedding
}

// Or with Claude (via Voyage AI)
// voyage-3-lite: fast and cheap for large-scale indexing

Storing Vectors in Postgres with pgvector

-- Enable the pgvector extension
CREATE EXTENSION IF NOT EXISTS vector;

-- Table with a vector column
CREATE TABLE documents (
  id BIGSERIAL PRIMARY KEY,
  content TEXT NOT NULL,
  metadata JSONB,
  embedding vector(1536)  -- dimension matches your model
);

-- Index for fast similarity search
CREATE INDEX ON documents USING ivfflat (embedding vector_cosine_ops)
  WITH (lists = 100);

// Insert a document with its embedding
// (assumes a Prisma client instance, e.g. `const db = new PrismaClient()`)
async function indexDocument(content: string, metadata: object) {
  const embedding = await embedText(content)
  await db.$executeRaw`
    INSERT INTO documents (content, metadata, embedding)
    VALUES (${content}, ${JSON.stringify(metadata)}::jsonb, ${JSON.stringify(embedding)}::vector)
  `
}
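One IVFFlat tuning knob worth knowing before you query: the index only scans a subset of its clusters per search. A sketch (the value 10 is an illustrative starting point, not a recommendation):

```sql
-- IVFFlat scans `ivfflat.probes` of the `lists` clusters per query
-- (default 1). Raising it trades query speed for recall.
SET ivfflat.probes = 10;
```

Also note that IVFFlat's clusters are computed when the index is built, so pgvector's documentation recommends creating the index after the table has data in it.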

Semantic Search Query

async function semanticSearch(query: string, limit = 5) {
  const queryEmbedding = await embedText(query)

  const results = await db.$queryRaw<Array<{
    id: number
    content: string
    metadata: object
    similarity: number
  }>>`
    SELECT id, content, metadata,
      1 - (embedding <=> ${JSON.stringify(queryEmbedding)}::vector) AS similarity
    FROM documents
    ORDER BY embedding <=> ${JSON.stringify(queryEmbedding)}::vector
    LIMIT ${limit}
  `

  return results.filter(r => r.similarity > 0.7) // threshold
}

RAG: Answering Questions from Your Data

import Anthropic from '@anthropic-ai/sdk'

const anthropic = new Anthropic()

async function answerFromDocs(question: string): Promise<string> {
  // 1. Find relevant documents
  const relevantDocs = await semanticSearch(question, 5)

  if (relevantDocs.length === 0) {
    return 'I couldn\'t find relevant information to answer that question.'
  }

  // 2. Build context from retrieved documents
  const context = relevantDocs
    .map((doc, i) => `[${i + 1}] ${doc.content}`)
    .join('\n\n')

  // 3. Ask Claude with grounding context
  const response = await anthropic.messages.create({
    model: 'claude-sonnet-4-6',
    system: 'Answer questions using only the provided context. If the context doesn\'t contain the answer, say so.',
    messages: [{
      role: 'user',
      content: `Context:\n${context}\n\nQuestion: ${question}`,
    }],
  })

  // Content blocks are a union type; narrow to the text block
  const answer = response.content[0]
  return answer.type === 'text' ? answer.text : ''
}

Chunking Strategy

Document chunking significantly affects retrieval quality:

function chunkDocument(text: string, chunkSize = 500, overlap = 50): string[] {
  const words = text.split(' ')
  const chunks: string[] = []

  for (let i = 0; i < words.length; i += chunkSize - overlap) {
    chunks.push(words.slice(i, i + chunkSize).join(' '))
    if (i + chunkSize >= words.length) break
  }

  return chunks
}

// Index each chunk separately, linking it back to its source
// (doc is your source record, with `content` and `id` fields)
for (const chunk of chunkDocument(doc.content)) {
  await indexDocument(chunk, { sourceDocId: doc.id })
}
}
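Fixed word windows can cut a sentence in half, which muddies the resulting embedding. A sentence-aware variant is often a better default; here's an illustrative sketch (the function name and regex are mine, not a library API):

```typescript
// Split on sentence boundaries, then pack whole sentences into chunks
// of at most maxChars characters. Never cuts mid-sentence.
function chunkBySentences(text: string, maxChars = 1000): string[] {
  const sentences = text.match(/[^.!?]+[.!?]+(\s|$)|[^.!?]+$/g) ?? [text]
  const chunks: string[] = []
  let current = ''

  for (const sentence of sentences) {
    if (current.length + sentence.length > maxChars && current.length > 0) {
      chunks.push(current.trim())
      current = ''
    }
    current += sentence
  }
  if (current.trim().length > 0) chunks.push(current.trim())

  return chunks
}
```

For markdown-heavy sources, splitting on headings first and then on sentences within each section tends to retrieve even cleaner context.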

Managed Options

If you don't want to manage pgvector yourself:

  • Pinecone: Fully managed, generous free tier
  • Qdrant: Open-source, self-hostable or cloud
  • Supabase Vector: pgvector on Supabase
  • Neon: pgvector on Neon (same DB as your app)

The AI SaaS Starter at whoffagents.com includes a vector search module with pgvector + Prisma, embedding pipeline, semantic search, and RAG pattern pre-built. $99 one-time.
