Originally published at claudeguide.io/claude-api-semantic-search
Claude API for Semantic Search: Embeddings Alternatives and RAG Patterns
Claude doesn't offer a native embeddings API, but Anthropic's recommended partner Voyage AI provides embeddings optimized for Claude — and combining Voyage embeddings with Claude's generation creates a powerful semantic search and RAG (Retrieval-Augmented Generation) pipeline that outperforms single-vendor solutions by 15-20% on retrieval accuracy benchmarks. This guide covers the full architecture: embedding, indexing, retrieval, and generation.
For model selection and cost trade-offs, see Haiku vs Sonnet vs Opus.
Architecture Overview
Query → Voyage AI (embed) → Vector DB (search) → Top-K docs → Claude (generate answer)
| Component | Recommended | Alternative |
|---|---|---|
| Embeddings | Voyage AI voyage-3
|
OpenAI text-embedding-3-small
|
| Vector DB | Pinecone / pgvector | Qdrant / Weaviate / ChromaDB |
| Generation | Claude Sonnet | Claude Haiku (for cost) |
| Reranker | Voyage rerank-2
|
Cohere Rerank |
Step 1: Generate Embeddings with Voyage AI
import voyageai
vo = voyageai.Client() # Uses VOYAGE_API_KEY env var
# Embed documents (batch)
documents = [
"Claude API supports streaming responses via SSE",
"Prompt caching reduces costs by up to 90%",
"Tool use enables function calling with type safety",
]
doc_embeddings = vo.embed(
documents,
model="voyage-3",
input_type="document"
).embeddings
# Embed a query
query_embedding = vo.embed(
["How do I reduce Claude API costs?"],
model="voyage-3",
input_type="query"
).embeddings[0]
Cost: Voyage AI voyage-3 costs $0.06 per 1M tokens — embedding 10,000 documents of ~500 tokens each costs approximately $0.30.
Step 2: Store in a Vector Database
pgvector (PostgreSQL)
CREATE EXTENSION IF NOT EXISTS vector;
CREATE TABLE documents (
id SERIAL PRIMARY KEY,
content TEXT NOT NULL,
embedding vector(1024), -- voyage-3 dimensions
metadata JSONB DEFAULT '{}'
);
CREATE INDEX ON documents USING ivfflat (embedding vector_cosine_ops)
WITH (lists = 100);
import psycopg2
conn = psycopg2.connect("postgresql://...")
cur = conn.cursor()
for doc, emb in zip(documents, doc_embeddings):
cur.execute(
"INSERT INTO documents (content, embedding) VALUES (%s, %s)",
(doc, emb)
)
conn.commit()
Pinecone
from pinecone import Pinecone
pc = Pinecone()
index = pc.Index("claude-docs")
vectors = [
{"id": f"doc-{i}", "values": emb, "metadata": {"text": doc}}
for i, (doc, emb) in enumerate(zip(documents, doc_embeddings))
]
index.upsert(vectors=vectors)
Step 3: Retrieve and Generate with Claude
import anthropic
def semantic_search_and_answer(query: str, top_k: int = 5) -
---
## Advanced: Hybrid Search
Combine vector similarity with keyword search for better results:
python
def hybrid_search(query: str, top_k: int = 10) -
Top comments (0)