Thesius Code

Posted on • Originally published at datanest-stores.pages.dev

Vector Database Toolkit

Vector databases are the backbone of every RAG pipeline, semantic search engine, and recommendation system — but each one has different APIs, indexing strategies, and operational quirks. This toolkit gives you unified setup guides, working code examples, and benchmarking scripts for ChromaDB, Pinecone, Weaviate, and pgvector, plus hybrid search patterns, index tuning recipes, and production operations guides.

Key Features

  • Multi-Database Support — Unified Python client abstraction for ChromaDB, Pinecone, Weaviate, and pgvector with consistent CRUD operations
  • Setup & Migration Guides — Step-by-step setup for each database, including Docker configs, cloud provisioning, and schema migration scripts
  • Indexing Strategies — HNSW, IVF, and PQ index configuration with tuning guides for recall vs. speed tradeoffs
  • Hybrid Search — Combine dense vector search with sparse keyword search across all supported backends
  • Benchmarking Scripts — Measure query latency, throughput, recall@K, and memory usage across databases with your own data
  • Production Operations — Backup/restore procedures, monitoring queries, scaling guides, and cost estimation per database
  • Embedding Pipeline — Batch embedding generation with rate limiting, retry logic, and incremental upsert support

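The batching, rate limiting, and retry behavior of the embedding pipeline can be sketched in a few lines. This is an illustrative minimal version, not the toolkit's actual implementation; `embed_fn` is a stand-in for whatever provider client you use.

```python
import time

def batch_embed(texts, embed_fn, batch_size=100, retry_max=3, retry_delay=1.0):
    """Embed texts in fixed-size batches with simple retry + exponential backoff.

    embed_fn is a hypothetical callable mapping a list of strings to a list
    of vectors; swap in your embedding provider's client here.
    """
    vectors = []
    for start in range(0, len(texts), batch_size):
        batch = texts[start:start + batch_size]
        for attempt in range(retry_max):
            try:
                vectors.extend(embed_fn(batch))
                break
            except Exception:
                if attempt == retry_max - 1:
                    raise  # out of retries, surface the error
                time.sleep(retry_delay * (2 ** attempt))  # exponential backoff
    return vectors
```

A real pipeline would also track a requests-per-minute budget, but the batch/retry loop above is the core of it.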
Quick Start

from vector_toolkit import VectorClient, EmbeddingPipeline

# 1. Initialize with any backend (same API for all)
client = VectorClient(
    backend="chromadb",
    connection={
        "persist_directory": "./chroma_db",
    },
    collection="product_catalog",
    embedding_model="text-embedding-3-small",
    dimensions=1536,
)

# 2. Index documents
documents = [
    {"id": "doc_1", "text": "Premium leather wallet with RFID blocking", "category": "accessories"},
    {"id": "doc_2", "text": "Wireless noise-canceling headphones", "category": "electronics"},
    {"id": "doc_3", "text": "Organic cotton crew neck t-shirt", "category": "apparel"},
]
client.upsert(documents, text_key="text", metadata_keys=["category"])

# 3. Search
results = client.search("high-quality audio equipment", top_k=5)
for r in results:
    print(f"[{r.score:.3f}] {r.id}: {r.text}")

Architecture

┌─────────────────────────────────────────────┐
│          VectorClient (Unified API)         │
│                                             │
│  upsert() │ search() │ delete() │ count()   │
└──────────────────────┬──────────────────────┘
                       │
    ┌───────────┬──────┴──────┬───────────┐
    │           │             │           │
    ▼           ▼             ▼           ▼
┌────────┐  ┌──────────┐  ┌──────────┐  ┌──────────┐
│ChromaDB│  │ Pinecone │  │ Weaviate │  │ pgvector │
│(local) │  │ (cloud)  │  │(hybrid)  │  │  (SQL)   │
└────────┘  └──────────┘  └──────────┘  └──────────┘
                       │
                       ▼
            ┌─────────────────┐
            │EmbeddingPipeline│
            │ Batch + Rate    │
            │ Limit + Retry   │
            └─────────────────┘

Usage Examples

Switch Backends Without Code Changes

# Development: Use ChromaDB (local, no setup)
dev_client = VectorClient(backend="chromadb", connection={"persist_directory": "./db"})

# Staging: Use pgvector (existing Postgres)
staging_client = VectorClient(
    backend="pgvector",
    connection={
        "host": "localhost",
        "port": 5432,
        "database": "vectors",
        "user": "app_user",
        "password": "${PGVECTOR_PASSWORD}",
    },
)

# Production: Use Pinecone (managed, scalable)
prod_client = VectorClient(
    backend="pinecone",
    connection={
        "api_key": "${PINECONE_API_KEY}",
        "environment": "us-east-1",
        "index_name": "product-catalog",
    },
)

# The same call works against all three clients:
results = prod_client.search("wireless headphones", top_k=5)

Hybrid Search (Dense + Sparse)

from vector_toolkit.search import HybridSearch

hybrid = HybridSearch(
    client=client,
    dense_weight=0.7,
    sparse_weight=0.3,     # BM25 keyword matching
    fusion="reciprocal_rank",  # reciprocal_rank | weighted_sum
)

results = hybrid.search(
    query="error code ERR-4012 connection timeout",
    top_k=10,
    filters={"category": "troubleshooting"},
)
# Dense search finds semantically similar docs about connection issues
# Sparse search catches the exact error code "ERR-4012"
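Reciprocal rank fusion itself is simple enough to sketch. This illustrative version (not the toolkit's code) merges any number of ranked id lists using the conventional `k = 60` constant:

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked result lists into one.

    rankings: list of lists of doc ids, each ordered best-first.
    Each doc scores 1 / (k + rank) summed across the lists, so documents
    that appear near the top of multiple lists rise to the top overall.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)
```

Fusing a dense ranking with a sparse (BM25) ranking this way lets an exact-match hit like `ERR-4012` surface even when it is semantically unremarkable.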

Benchmarking

from vector_toolkit.benchmark import Benchmark

bench = Benchmark(
    backends=["chromadb", "pgvector", "pinecone"],
    dataset="benchmark_data/1M_embeddings.npy",
    queries="benchmark_data/1000_queries.npy",
    ground_truth="benchmark_data/ground_truth.json",
    metrics=["latency_p50", "latency_p99", "recall_at_10", "throughput_qps"],
)

report = bench.run()
print(report.table())
# ┌───────────┬────────────┬────────────┬───────────┬──────────┐
# │ Backend   │ Latency P50│ Latency P99│ Recall@10 │ QPS      │
# ├───────────┼────────────┼────────────┼───────────┼──────────┤
# │ ChromaDB  │ 12ms       │ 45ms       │ 0.94      │ 180      │
# │ pgvector  │ 18ms       │ 62ms       │ 0.92      │ 250      │
# │ Pinecone  │ 22ms       │ 58ms       │ 0.96      │ 1200     │
# └───────────┴────────────┴────────────┴───────────┴──────────┘

Index Tuning

from vector_toolkit.indexing import IndexConfig

# HNSW (best for most use cases)
hnsw_config = IndexConfig(
    type="hnsw",
    m=16,                    # Connections per node (higher = better recall, more memory)
    ef_construction=200,     # Build-time accuracy (higher = better index, slower build)
    ef_search=100,           # Query-time accuracy (higher = better recall, slower query)
)

# IVF (better for very large datasets > 10M vectors)
ivf_config = IndexConfig(
    type="ivf",
    nlist=1024,              # Number of clusters
    nprobe=32,               # Clusters to search (higher = better recall, slower)
)

client.create_index(config=hnsw_config)
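On pgvector, an HNSW `IndexConfig` like the one above ultimately boils down to plain SQL (pgvector 0.5+ syntax). A hypothetical helper that emits that DDL, to make the mapping concrete:

```python
def pgvector_hnsw_ddl(table, column="embedding", m=16, ef_construction=200):
    """Emit pgvector HNSW index DDL (pgvector >= 0.5.0).

    Note: ef_search has no build-time equivalent in pgvector; it is set per
    session before querying, e.g. `SET hnsw.ef_search = 100;`.
    """
    return (
        f"CREATE INDEX ON {table} USING hnsw ({column} vector_cosine_ops) "
        f"WITH (m = {m}, ef_construction = {ef_construction});"
    )
```

Swap `vector_cosine_ops` for `vector_l2_ops` or `vector_ip_ops` to match the distance metric your embeddings were built for.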

Configuration

# vector_toolkit_config.yaml
default_backend: "chromadb"

backends:
  chromadb:
    persist_directory: "./chroma_db"
    anonymized_telemetry: false

  pinecone:
    api_key: "${PINECONE_API_KEY}"
    environment: "us-east-1"
    index_name: "product-catalog"
    metric: "cosine"             # cosine | dotproduct | euclidean
    pod_type: "s1.x1"

  weaviate:
    url: "https://api.example.com"
    api_key: "${WEAVIATE_API_KEY}"
    schema_auto_create: true

  pgvector:
    host: "localhost"
    port: 5432
    database: "vectors"
    user: "${PG_USER}"
    password: "${PG_PASSWORD}"
    pool_size: 10

embedding:
  model: "text-embedding-3-small"
  dimensions: 1536
  batch_size: 100
  rate_limit_rpm: 3000
  retry_max: 3
  retry_delay_seconds: 1

indexing:
  type: "hnsw"
  m: 16
  ef_construction: 200
  ef_search: 100

search:
  default_top_k: 10
  hybrid_enabled: true
  dense_weight: 0.7
  sparse_weight: 0.3
  fusion_method: "reciprocal_rank"

benchmark:
  dataset_sizes: [10000, 100000, 1000000]
  query_count: 1000
  output_dir: "benchmark_results/"
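The `${VAR}` placeholders above imply environment-variable substitution after the YAML is parsed. A minimal sketch of that expansion step (assuming the parsed config is already a plain dict; unknown variables are left untouched rather than silently blanked):

```python
import os
import re

def expand_env(value):
    """Recursively replace ${VAR} placeholders with environment values."""
    if isinstance(value, dict):
        return {k: expand_env(v) for k, v in value.items()}
    if isinstance(value, list):
        return [expand_env(v) for v in value]
    if isinstance(value, str):
        # Keep the placeholder as-is when the variable is not set,
        # so misconfiguration fails loudly downstream instead of silently.
        return re.sub(
            r"\$\{(\w+)\}",
            lambda m: os.environ.get(m.group(1), m.group(0)),
            value,
        )
    return value
```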

Best Practices

  1. Start with ChromaDB locally, migrate to managed in production — ChromaDB requires zero setup for prototyping; switch to Pinecone/Weaviate when you need scale.
  2. Choose the right distance metric — Use cosine for normalized embeddings (most common), dotproduct for unnormalized, euclidean for absolute distances.
  3. Tune HNSW parameters for your recall target — Default m=16, ef=100 gives ~95% recall. For 99%+ recall, increase ef_search to 200+.
  4. Use metadata filters before vector search — Filtering first, then searching the filtered subset is much faster than searching everything and post-filtering.
  5. Batch your upserts — Insert documents in batches of 100-500. Single-document inserts are 10-50x slower.
  6. Benchmark with YOUR data — Published benchmarks use synthetic data. Run the benchmarking scripts with your actual embeddings and query patterns.
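Practice 5 (batch your upserts) in code: chunk the document list before writing. A small illustrative wrapper around the `client.upsert` call from the Quick Start:

```python
def upsert_in_batches(client, documents, batch_size=500):
    """Split documents into fixed-size chunks and upsert each chunk,
    so the backend receives batched writes instead of per-doc inserts."""
    for start in range(0, len(documents), batch_size):
        client.upsert(
            documents[start:start + batch_size],
            text_key="text",
            metadata_keys=["category"],
        )
```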

Troubleshooting

| Problem | Cause | Fix |
| --- | --- | --- |
| Search returns irrelevant results | Wrong distance metric or poor embedding model | Switch to cosine metric; try `text-embedding-3-large` for better quality |
| Upsert is extremely slow | Single-document inserts or no batching | Use `client.upsert_batch()` with `batch_size=500` |
| pgvector queries slow on large tables | Missing HNSW or IVF index | Run `client.create_index()` — without an index, pgvector does a brute-force scan |
| Pinecone returns timeout errors | Index not fully initialized or quota exceeded | Wait 2-3 minutes after index creation; check plan limits in the Pinecone console |

This is 1 of 11 resources in the AI Builder Pro toolkit. Get the complete [Vector Database Toolkit] with all files, templates, and documentation for $39.

Get the Full Kit →

Or grab the entire AI Builder Pro bundle (11 products) for $169 — save 30%.

Get the Complete Bundle →

