# Vector Databases Explained: Embeddings, Similarity Search, and RAG
Text search finds exact keyword matches. Vector search finds semantic similarity — a search for 'car' returns results about 'vehicle' and 'automobile'. This is the foundation of RAG (Retrieval-Augmented Generation).
## What Are Embeddings?

An embedding is a list of numbers (the dimension depends on the model; voyage-3 produces 1,024) that represents the semantic meaning of a piece of text. Similar text produces nearby vectors, so the distance between two vectors measures how semantically related they are.
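The standard distance measure for embeddings is cosine similarity: the angle between two vectors, ignoring their lengths. A toy implementation makes the idea concrete (in practice the database computes this for you):

```typescript
// Cosine similarity: 1 = same direction, 0 = unrelated, -1 = opposite.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
```

Identical vectors score 1, orthogonal (unrelated) vectors score 0, which is why the queries below convert pgvector's cosine *distance* into a similarity with `1 - distance`.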
```typescript
import Anthropic from '@anthropic-ai/sdk';

const client = new Anthropic();

// Anthropic's API has no embeddings endpoint; its docs recommend Voyage AI.
// Convert text to a vector via Voyage's REST API:
async function embed(text: string): Promise<number[]> {
  const response = await fetch('https://api.voyageai.com/v1/embeddings', {
    method: 'POST',
    headers: {
      'Content-Type': 'application/json',
      Authorization: `Bearer ${process.env.VOYAGE_API_KEY}`,
    },
    body: JSON.stringify({
      model: 'voyage-3',
      input: [text],
      input_type: 'document', // use 'query' when embedding search queries
    }),
  });
  const { data } = await response.json();
  return data[0].embedding;
}
```
## Storing Vectors in PostgreSQL (pgvector)

```sql
-- Enable the pgvector extension
CREATE EXTENSION IF NOT EXISTS vector;

-- Table with an embedding column
CREATE TABLE documents (
  id SERIAL PRIMARY KEY,
  text TEXT NOT NULL,
  embedding VECTOR(1024) -- dimension must match your embedding model (voyage-3: 1024)
);

-- Approximate index for fast cosine similarity search.
-- Build ivfflat indexes after loading data; tune lists (~rows/1000 up to 1M rows).
CREATE INDEX ON documents USING ivfflat (embedding vector_cosine_ops)
  WITH (lists = 100);
```
```typescript
// Store a document together with its embedding
async function storeDocument(text: string) {
  const embedding = await embed(text);
  // pgvector accepts a JSON array literal like '[1,2,3]' cast to ::vector
  await prisma.$executeRaw`
    INSERT INTO documents (text, embedding)
    VALUES (${text}, ${JSON.stringify(embedding)}::vector)
  `;
}
```
## Similarity Search

```typescript
async function search(query: string, limit = 5) {
  const queryEmbedding = await embed(query);
  const vector = JSON.stringify(queryEmbedding);
  // <=> is pgvector's cosine distance operator; 1 - distance = similarity
  const results = await prisma.$queryRaw<{ text: string; similarity: number }[]>`
    SELECT text, 1 - (embedding <=> ${vector}::vector) AS similarity
    FROM documents
    ORDER BY embedding <=> ${vector}::vector
    LIMIT ${limit}
  `;
  return results;
}
```
## Full RAG Pipeline

```typescript
async function ragAnswer(question: string): Promise<string> {
  // 1. Find relevant documents
  const docs = await search(question, 3);

  // 2. Build context from the retrieved docs
  const context = docs.map(d => d.text).join('\n\n');

  // 3. Generate an answer grounded in that context
  const message = await client.messages.create({
    model: 'claude-opus-4-6',
    max_tokens: 1024,
    system: 'Answer based on the provided context only. Say "I don\'t know" if the answer is not in the context.',
    messages: [{
      role: 'user',
      content: `Context:\n${context}\n\nQuestion: ${question}`
    }]
  });

  // Content blocks are a union type; narrow to the text block
  const block = message.content[0];
  return block.type === 'text' ? block.text : '';
}
```
## Managed Vector Databases
| Option | Best For |
|---|---|
| pgvector (Neon/Supabase) | Already using Postgres |
| Pinecone | Large scale, managed |
| Weaviate | Open source, self-host |
| Qdrant | Performance-critical |
| Chroma | Local development |
## Chunking Strategy

```typescript
// Split large documents into ~500-word chunks (a rough proxy for tokens),
// overlapping so sentences near a boundary keep some surrounding context.
function chunkText(text: string, chunkSize = 500, overlap = 50): string[] {
  const words = text.split(' ');
  const chunks: string[] = [];
  for (let i = 0; i < words.length; i += chunkSize - overlap) {
    chunks.push(words.slice(i, i + chunkSize).join(' '));
  }
  return chunks;
}
```
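To see the stride at work: with the defaults, chunks advance by `chunkSize - overlap = 450` words, so a 1,200-word document yields three chunks starting at words 0, 450, and 900 (the chunker is repeated here so the example runs standalone):

```typescript
// chunkText as defined above, repeated so this snippet is self-contained
function chunkText(text: string, chunkSize = 500, overlap = 50): string[] {
  const words = text.split(' ');
  const chunks: string[] = [];
  for (let i = 0; i < words.length; i += chunkSize - overlap) {
    chunks.push(words.slice(i, i + chunkSize).join(' '));
  }
  return chunks;
}

// A synthetic 1,200-word document: "w0 w1 ... w1199"
const doc = Array.from({ length: 1200 }, (_, i) => `w${i}`).join(' ');
const chunks = chunkText(doc);
// Chunks start at words 0, 450, and 900; the last one holds the final 300 words.
```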
RAG pipelines and vector search are core to the AI SaaS Starter Kit — pgvector setup, embedding helpers, and RAG endpoint included. $99 at whoffagents.com.