DEV Community

Atlas Whoff

Posted on

Vector Databases Explained: Embeddings, Similarity Search, and RAG

Keyword search finds exact matches. Vector search finds semantic similarity: a query for 'car' also returns results about 'vehicle' and 'automobile'. This is the foundation of Retrieval-Augmented Generation (RAG).

What Are Embeddings

An embedding is a list of numbers that represents the semantic meaning of text. The dimension depends on the model — voyage-3, used below, produces 1024. Similar text produces nearby vectors, so the distance between two vectors measures how semantically related they are.

import Anthropic from '@anthropic-ai/sdk';

const client = new Anthropic(); // used for generation later in the pipeline

// Convert text to a vector.
// Note: the Anthropic API doesn't provide embeddings; Anthropic recommends
// Voyage AI for them, so this calls Voyage's REST endpoint directly.
async function embed(text: string): Promise<number[]> {
  const response = await fetch('https://api.voyageai.com/v1/embeddings', {
    method: 'POST',
    headers: {
      Authorization: `Bearer ${process.env.VOYAGE_API_KEY}`,
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({ model: 'voyage-3', input: [text], input_type: 'document' }),
  });
  const { data } = await response.json();
  return data[0].embedding;
}
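Under the hood, "distance" between embeddings usually means cosine distance. A minimal, database-free sketch of the cosine similarity the search below relies on:

```typescript
// Cosine similarity: 1 = same direction, 0 = orthogonal, -1 = opposite.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

cosineSimilarity([1, 0], [1, 0]); // 1 — identical direction
cosineSimilarity([1, 0], [0, 1]); // 0 — unrelated
```

pgvector's `<=>` operator computes cosine *distance* (1 − similarity), which is why the queries below subtract it from 1.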

Storing Vectors in PostgreSQL (pgvector)

-- Enable pgvector extension
CREATE EXTENSION IF NOT EXISTS vector;

-- Table with embedding column
CREATE TABLE documents (
  id     SERIAL PRIMARY KEY,
  text   TEXT NOT NULL,
  embedding VECTOR(1024)  -- Dimension matches your embedding model
);

-- Index for fast approximate similarity search.
-- Build it after loading data: ivfflat picks its cluster centroids from existing rows.
CREATE INDEX ON documents USING ivfflat (embedding vector_cosine_ops);
import { PrismaClient } from '@prisma/client';
const prisma = new PrismaClient();

// Store a document along with its embedding
async function storeDocument(text: string) {
  const embedding = await embed(text);
  await prisma.$executeRaw`
    INSERT INTO documents (text, embedding)
    VALUES (${text}, ${JSON.stringify(embedding)}::vector)
  `;
}

Similarity Search

async function search(query: string, limit = 5) {
  const queryEmbedding = await embed(query);

  const results = await prisma.$queryRaw<{ text: string; similarity: number }[]>`
    SELECT text, 1 - (embedding <=> ${JSON.stringify(queryEmbedding)}::vector) AS similarity
    FROM documents
    ORDER BY embedding <=> ${JSON.stringify(queryEmbedding)}::vector
    LIMIT ${limit}
  `;
  return results;
}
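Not every retrieved row is worth sending to the model. A common refinement is to drop weak matches before building the prompt — sketched here with an illustrative 0.7 cutoff (the right threshold depends on your embedding model and data):

```typescript
type SearchResult = { text: string; similarity: number };

// Keep only results above a minimum cosine similarity.
// The 0.7 default is illustrative, not a universal constant.
function filterRelevant(results: SearchResult[], minSimilarity = 0.7): SearchResult[] {
  return results.filter(r => r.similarity >= minSimilarity);
}

const hits: SearchResult[] = [
  { text: 'Cars and vehicles', similarity: 0.91 },
  { text: 'Banana bread recipe', similarity: 0.42 },
];
filterRelevant(hits); // keeps only the 0.91 match
```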

Full RAG Pipeline

async function ragAnswer(question: string): Promise<string> {
  // 1. Find relevant documents
  const docs = await search(question, 3);

  // 2. Build context from retrieved docs
  const context = docs.map(d => d.text).join('\n\n');

  // 3. Generate answer grounded in context
  const message = await client.messages.create({
    model: 'claude-opus-4-6',
    max_tokens: 1024,
    system: 'Answer based on the provided context only. Say "I don\'t know" if the answer is not in the context.',
    messages: [{
      role: 'user',
      content: `Context:\n${context}\n\nQuestion: ${question}`
    }]
  });

  const first = message.content[0];
  return first.type === 'text' ? first.text : '';
}

Managed Vector Databases

Option                      Best For
pgvector (Neon/Supabase)    Already using Postgres
Pinecone                    Large scale, fully managed
Weaviate                    Open source, self-hosted
Qdrant                      Performance-critical workloads
Chroma                      Local development

Chunking Strategy

// Split large documents into ~500-word chunks with overlap
// (word count is a rough proxy for tokens here)
function chunkText(text: string, chunkSize = 500, overlap = 50): string[] {
  const words = text.split(' ');
  const chunks: string[] = [];

  for (let i = 0; i < words.length; i += chunkSize - overlap) {
    chunks.push(words.slice(i, i + chunkSize).join(' '));
  }
  return chunks;
}
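To sanity-check the stride and overlap, here's the same chunking logic run standalone on a synthetic 1,200-word document (chunkText is repeated so the snippet runs on its own):

```typescript
function chunkText(text: string, chunkSize = 500, overlap = 50): string[] {
  const words = text.split(' ');
  const chunks: string[] = [];
  for (let i = 0; i < words.length; i += chunkSize - overlap) {
    chunks.push(words.slice(i, i + chunkSize).join(' '));
  }
  return chunks;
}

// 1,200 words with a 450-word stride → chunks start at words 0, 450, 900.
const doc = Array.from({ length: 1200 }, (_, i) => `w${i}`).join(' ');
const chunks = chunkText(doc);
chunks.length; // 3

// The last 50 words of one chunk are the first 50 words of the next,
// so a sentence cut at a boundary still appears whole in some chunk.
const tail = chunks[0].split(' ').slice(-50).join(' ');
const head = chunks[1].split(' ').slice(0, 50).join(' ');
tail === head; // true
```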

RAG pipelines and vector search are core to the AI SaaS Starter Kit — pgvector setup, embedding helpers, and RAG endpoint included. $99 at whoffagents.com.
