Vector Database Toolkit
Vector databases are the backbone of every RAG pipeline, semantic search engine, and recommendation system — but each one has different APIs, indexing strategies, and operational quirks. This toolkit gives you unified setup guides, working code examples, and benchmarking scripts for ChromaDB, Pinecone, Weaviate, and pgvector, plus hybrid search patterns and production operations guides.
Key Features
- Multi-Database Support — Unified Python client abstraction for ChromaDB, Pinecone, Weaviate, and pgvector with consistent CRUD operations
- Setup & Migration Guides — Step-by-step setup for each database, including Docker configs, cloud provisioning, and schema migration scripts
- Indexing Strategies — HNSW, IVF, and PQ index configuration with tuning guides for recall vs. speed tradeoffs
- Hybrid Search — Combine dense vector search with sparse keyword search across all supported backends
- Benchmarking Scripts — Measure query latency, throughput, recall@K, and memory usage across databases with your own data
- Production Operations — Backup/restore procedures, monitoring queries, scaling guides, and cost estimation per database
- Embedding Pipeline — Batch embedding generation with rate limiting, retry logic, and incremental upsert support
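The batching, rate-limiting, and retry behavior described in the Embedding Pipeline feature can be sketched in plain Python. This is a minimal illustration of the pattern, not the toolkit's internals; `chunked` and `with_retries` are names I chose for the sketch.

```python
import time
from typing import Callable, Iterable, Iterator, List, TypeVar

T = TypeVar("T")

def chunked(items: Iterable[T], size: int) -> Iterator[List[T]]:
    """Yield successive fixed-size batches from an iterable."""
    batch: List[T] = []
    for item in items:
        batch.append(item)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch  # final partial batch

def with_retries(call: Callable[[], T], max_retries: int = 3,
                 delay_seconds: float = 1.0) -> T:
    """Invoke `call`, retrying with a linear backoff on any exception."""
    for attempt in range(max_retries + 1):
        try:
            return call()
        except Exception:
            if attempt == max_retries:
                raise
            time.sleep(delay_seconds * (attempt + 1))
    raise AssertionError("unreachable")
```

An embedding pipeline would then wrap each batch's API call in `with_retries` and iterate `chunked(documents, batch_size)`, tracking which IDs were already upserted to support incremental runs.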
Quick Start
```python
from vector_toolkit import VectorClient, EmbeddingPipeline

# 1. Initialize with any backend (same API for all)
client = VectorClient(
    backend="chromadb",
    connection={
        "persist_directory": "./chroma_db",
    },
    collection="product_catalog",
    embedding_model="text-embedding-3-small",
    dimensions=1536,
)

# 2. Index documents
documents = [
    {"id": "doc_1", "text": "Premium leather wallet with RFID blocking", "category": "accessories"},
    {"id": "doc_2", "text": "Wireless noise-canceling headphones", "category": "electronics"},
    {"id": "doc_3", "text": "Organic cotton crew neck t-shirt", "category": "apparel"},
]
client.upsert(documents, text_key="text", metadata_keys=["category"])

# 3. Search
results = client.search("high-quality audio equipment", top_k=5)
for r in results:
    print(f"[{r.score:.3f}] {r.id}: {r.text}")
```
Architecture
```
┌─────────────────────────────────────────────┐
│         VectorClient (Unified API)          │
│                                             │
│  upsert() │ search() │ delete() │ count()   │
└──────────────────────┬──────────────────────┘
                       │
     ┌──────────┬──────┴─────┬──────────┐
     │          │            │          │
     ▼          ▼            ▼          ▼
┌────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐
│ChromaDB│ │ Pinecone │ │ Weaviate │ │ pgvector │
│(local) │ │ (cloud)  │ │ (hybrid) │ │  (SQL)   │
└────────┘ └──────────┘ └──────────┘ └──────────┘
                       │
                       ▼
             ┌─────────────────┐
             │EmbeddingPipeline│
             │  Batch + Rate   │
             │  Limit + Retry  │
             └─────────────────┘
```
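One way to read the architecture: each backend implements a small common interface and registers itself under a name, so `VectorClient` can dispatch without backend-specific code. The following is a sketch of that pattern under my own assumptions — `BackendBase`, `BACKENDS`, and the in-memory toy backend are illustrative, not the toolkit's documented internals.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, List

class BackendBase(ABC):
    """Common interface every storage backend must implement."""

    @abstractmethod
    def upsert(self, docs: List[Dict[str, Any]]) -> None: ...
    @abstractmethod
    def search(self, vector: List[float], top_k: int) -> List[Dict[str, Any]]: ...
    @abstractmethod
    def delete(self, ids: List[str]) -> None: ...
    @abstractmethod
    def count(self) -> int: ...

BACKENDS: Dict[str, type] = {}

def register(name: str):
    """Class decorator that adds a backend to the registry."""
    def wrap(cls):
        BACKENDS[name] = cls
        return cls
    return wrap

@register("memory")
class InMemoryBackend(BackendBase):
    """Toy backend used here only to exercise the interface."""

    def __init__(self) -> None:
        self.docs: Dict[str, Dict[str, Any]] = {}

    def upsert(self, docs):
        for d in docs:
            self.docs[d["id"]] = d

    def search(self, vector, top_k):
        # Real backends rank by distance; the toy just truncates.
        return list(self.docs.values())[:top_k]

    def delete(self, ids):
        for i in ids:
            self.docs.pop(i, None)

    def count(self):
        return len(self.docs)
```

A unified client would then look up `BACKENDS[backend_name]` at construction time and forward every call to the instance.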
Usage Examples
Switch Backends Without Code Changes
```python
# Development: use ChromaDB (local, no setup)
dev_client = VectorClient(backend="chromadb", connection={"persist_directory": "./db"})

# Staging: use pgvector (existing Postgres)
staging_client = VectorClient(
    backend="pgvector",
    connection={
        "host": "localhost",
        "port": 5432,
        "database": "vectors",
        "user": "app_user",
        "password": "${PGVECTOR_PASSWORD}",
    },
)

# Production: use Pinecone (managed, scalable)
prod_client = VectorClient(
    backend="pinecone",
    connection={
        "api_key": "${PINECONE_API_KEY}",
        "environment": "us-east-1",
        "index_name": "product-catalog",
    },
)

# The same call works against any of the three clients:
results = prod_client.search("wireless headphones", top_k=5)
```
Hybrid Search (Dense + Sparse)
```python
from vector_toolkit.search import HybridSearch

hybrid = HybridSearch(
    client=client,
    dense_weight=0.7,
    sparse_weight=0.3,         # BM25 keyword matching
    fusion="reciprocal_rank",  # reciprocal_rank | weighted_sum
)

results = hybrid.search(
    query="error code ERR-4012 connection timeout",
    top_k=10,
    filters={"category": "troubleshooting"},
)

# Dense search finds semantically similar docs about connection issues;
# sparse search catches the exact error code "ERR-4012".
```
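The `reciprocal_rank` fusion option named above is a standard technique (reciprocal rank fusion): each result list contributes `1 / (k + rank)` per document, and documents are re-ranked by their summed score. A minimal standalone sketch, not the toolkit's actual implementation:

```python
from collections import defaultdict
from typing import Dict, List

def reciprocal_rank_fusion(rankings: List[List[str]], k: int = 60) -> List[str]:
    """Fuse ranked ID lists: score(d) = sum over lists of 1 / (k + rank of d)."""
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Higher summed score = better consensus rank across the lists
    return sorted(scores, key=scores.get, reverse=True)
```

A document that appears near the top of both the dense and the sparse list outscores one that tops only a single list, which is exactly the behavior the error-code example relies on.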
Benchmarking
```python
from vector_toolkit.benchmark import Benchmark

bench = Benchmark(
    backends=["chromadb", "pgvector", "pinecone"],
    dataset="benchmark_data/1M_embeddings.npy",
    queries="benchmark_data/1000_queries.npy",
    ground_truth="benchmark_data/ground_truth.json",
    metrics=["latency_p50", "latency_p99", "recall_at_10", "throughput_qps"],
)

report = bench.run()
print(report.table())
# ┌───────────┬─────────────┬─────────────┬───────────┬──────┐
# │ Backend   │ Latency P50 │ Latency P99 │ Recall@10 │ QPS  │
# ├───────────┼─────────────┼─────────────┼───────────┼──────┤
# │ ChromaDB  │ 12ms        │ 45ms        │ 0.94      │ 180  │
# │ pgvector  │ 18ms        │ 62ms        │ 0.92      │ 250  │
# │ Pinecone  │ 22ms        │ 58ms        │ 0.96      │ 1200 │
# └───────────┴─────────────┴─────────────┴───────────┴──────┘
```
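The Recall@10 column is the standard recall@K metric: the fraction of the ground-truth nearest neighbors that actually appear in the top K retrieved results. It can be computed directly from ID lists; the function name here is mine, not the toolkit's.

```python
from typing import Sequence, Set

def recall_at_k(retrieved: Sequence[str], relevant: Set[str], k: int) -> float:
    """Fraction of the relevant IDs found in the top-k retrieved IDs."""
    if not relevant:
        return 0.0
    hits = len(set(retrieved[:k]) & relevant)
    return hits / len(relevant)
```

Averaging this value over every query in the ground-truth file gives the single Recall@10 number shown per backend in the report.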
Index Tuning
```python
from vector_toolkit.indexing import IndexConfig

# HNSW (best for most use cases)
hnsw_config = IndexConfig(
    type="hnsw",
    m=16,                 # Connections per node (higher = better recall, more memory)
    ef_construction=200,  # Build-time accuracy (higher = better index, slower build)
    ef_search=100,        # Query-time accuracy (higher = better recall, slower query)
)

# IVF (better for very large datasets, > 10M vectors)
ivf_config = IndexConfig(
    type="ivf",
    nlist=1024,  # Number of clusters
    nprobe=32,   # Clusters to search (higher = better recall, slower)
)

client.create_index(config=hnsw_config)
```
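On the pgvector backend, these configs ultimately map to plain SQL. Assuming a table named `items` with an `embedding` column and cosine distance (both assumptions of this sketch, not something the toolkit specifies), the equivalent statements look like the strings below; pgvector exposes `ef_search` and `nprobe` as query-time settings rather than index parameters.

```python
# HNSW index (pgvector 0.5+)
hnsw_sql = """
CREATE INDEX ON items
USING hnsw (embedding vector_cosine_ops)
WITH (m = 16, ef_construction = 200);
"""

# ef_search is set per session/query, not baked into the index:
ef_search_sql = "SET hnsw.ef_search = 100;"

# IVF equivalent (pgvector calls it ivfflat; nlist maps to lists)
ivf_sql = """
CREATE INDEX ON items
USING ivfflat (embedding vector_cosine_ops)
WITH (lists = 1024);
"""
nprobe_sql = "SET ivfflat.probes = 32;"
```

Without one of these indexes, pgvector falls back to a sequential scan, which is also the fix listed in the Troubleshooting table below.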
Configuration
```yaml
# vector_toolkit_config.yaml
default_backend: "chromadb"

backends:
  chromadb:
    persist_directory: "./chroma_db"
    anonymized_telemetry: false
  pinecone:
    api_key: "${PINECONE_API_KEY}"
    environment: "us-east-1"
    index_name: "product-catalog"
    metric: "cosine"  # cosine | dotproduct | euclidean
    pod_type: "s1.x1"
  weaviate:
    url: "https://api.example.com"
    api_key: "${WEAVIATE_API_KEY}"
    schema_auto_create: true
  pgvector:
    host: "localhost"
    port: 5432
    database: "vectors"
    user: "${PG_USER}"
    password: "${PG_PASSWORD}"
    pool_size: 10

embedding:
  model: "text-embedding-3-small"
  dimensions: 1536
  batch_size: 100
  rate_limit_rpm: 3000
  retry_max: 3
  retry_delay_seconds: 1

indexing:
  type: "hnsw"
  m: 16
  ef_construction: 200
  ef_search: 100

search:
  default_top_k: 10
  hybrid_enabled: true
  dense_weight: 0.7
  sparse_weight: 0.3
  fusion_method: "reciprocal_rank"

benchmark:
  dataset_sizes: [10000, 100000, 1000000]
  query_count: 1000
  output_dir: "benchmark_results/"
```
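The `${PINECONE_API_KEY}`-style placeholders in the config imply environment-variable expansion at load time, which keeps secrets out of the YAML file. A minimal sketch of how that expansion could work — `expand_env` is an illustrative helper, not a documented toolkit function:

```python
import os
import re

# Matches ${NAME} where NAME is an uppercase env-var identifier
_ENV_PATTERN = re.compile(r"\$\{([A-Z0-9_]+)\}")

def expand_env(value: str) -> str:
    """Replace ${NAME} placeholders with values from the environment.

    Unset variables expand to an empty string; a real loader might
    prefer to raise so missing secrets fail fast at startup.
    """
    return _ENV_PATTERN.sub(lambda m: os.environ.get(m.group(1), ""), value)
```

A config loader would apply this to every string value after parsing the YAML, before handing the settings to `VectorClient`.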
Best Practices
- Start with ChromaDB locally, migrate to managed in production — ChromaDB requires zero setup for prototyping; switch to Pinecone/Weaviate when you need scale.
- Choose the right distance metric — Use cosine for normalized embeddings (most common), dotproduct for unnormalized embeddings, euclidean when absolute distances matter.
- Tune HNSW parameters for your recall target — The defaults m=16, ef_search=100 give roughly 95% recall. For 99%+ recall, increase ef_search to 200 or more.
- Use metadata filters before vector search — Filtering first, then searching the filtered subset is much faster than searching everything and post-filtering.
- Batch your upserts — Insert documents in batches of 100-500. Single-document inserts are 10-50x slower.
- Benchmark with YOUR data — Published benchmarks use synthetic data. Run the benchmarking scripts with your actual embeddings and query patterns.
Troubleshooting
| Problem | Cause | Fix |
|---|---|---|
| Search returns irrelevant results | Wrong distance metric or poor embedding model | Switch to cosine metric; try text-embedding-3-large for better quality |
| Upsert is extremely slow | Single-document inserts or no batching | Use client.upsert_batch() with batch_size=500 |
| pgvector queries slow on large tables | Missing HNSW or IVF index | Run client.create_index() — without an index, pgvector does brute-force scan |
| Pinecone returns timeout errors | Index not fully initialized or quota exceeded | Wait 2-3 minutes after index creation; check plan limits in Pinecone console |
This is 1 of 11 resources in the AI Builder Pro toolkit. Get the complete [Vector Database Toolkit] with all files, templates, and documentation for $39.
Or grab the entire AI Builder Pro bundle (11 products) for $169 — save 30%.