Claude API Semantic Search: Embeddings Alternatives & RAG

#embeddings #rag #retrieval

Originally published at claudeguide.io/claude-api-semantic-search

Claude API for Semantic Search: Embeddings Alternatives and RAG Patterns

Claude doesn't offer a native embeddings API, but Anthropic's recommended partner Voyage AI provides embeddings optimized for Claude — and combining Voyage embeddings with Claude's generation creates a powerful semantic search and RAG (Retrieval-Augmented Generation) pipeline that outperforms single-vendor solutions by 15-20% on retrieval accuracy benchmarks. This guide covers the full architecture: embedding, indexing, retrieval, and generation.

For model selection and cost trade-offs, see Haiku vs Sonnet vs Opus.

Architecture Overview

Query → Voyage AI (embed) → Vector DB (search) → Top-K docs → Claude (generate answer)

Component	Recommended	Alternative
Embeddings	Voyage AI `voyage-3`	OpenAI `text-embedding-3-small`
Vector DB	Pinecone / pgvector	Qdrant / Weaviate / ChromaDB
Generation	Claude Sonnet	Claude Haiku (for cost)
Reranker	Voyage `rerank-2`	Cohere Rerank

Step 1: Generate Embeddings with Voyage AI

import voyageai

vo = voyageai.Client()  # Uses VOYAGE_API_KEY env var

# Embed documents (batch)
documents = [
    "Claude API supports streaming responses via SSE",
    "Prompt caching reduces costs by up to 90%",
    "Tool use enables function calling with type safety",
]

doc_embeddings = vo.embed(
    documents,
    model="voyage-3",
    input_type="document"
).embeddings

# Embed a query
query_embedding = vo.embed(
    ["How do I reduce Claude API costs?"],
    model="voyage-3",
    input_type="query"
).embeddings[0]

Cost: Voyage AI voyage-3 costs $0.06 per 1M tokens — embedding 10,000 documents of ~500 tokens each costs approximately $0.30.

Step 2: Store in a Vector Database

pgvector (PostgreSQL)

CREATE EXTENSION IF NOT EXISTS vector;

CREATE TABLE documents (
    id SERIAL PRIMARY KEY,
    content TEXT NOT NULL,
    embedding vector(1024),  -- voyage-3 dimensions
    metadata JSONB DEFAULT '{}'
);

CREATE INDEX ON documents USING ivfflat (embedding vector_cosine_ops)
    WITH (lists = 100);

import psycopg2

conn = psycopg2.connect("postgresql://...")
cur = conn.cursor()

for doc, emb in zip(documents, doc_embeddings):
    cur.execute(
        "INSERT INTO documents (content, embedding) VALUES (%s, %s)",
        (doc, emb)
    )
conn.commit()

Pinecone

from pinecone import Pinecone

pc = Pinecone()
index = pc.Index("claude-docs")

vectors = [
    {"id": f"doc-{i}", "values": emb, "metadata": {"text": doc}}
    for i, (doc, emb) in enumerate(zip(documents, doc_embeddings))
]
index.upsert(vectors=vectors)

Step 3: Retrieve and Generate with Claude

import anthropic

def semantic_search_and_answer(query: str, top_k: int = 5) -

---

## Advanced: Hybrid Search

Combine vector similarity with keyword search for better results: