Marcus Feldman
What I Learned About Vector Databases When Building Semantic Search

When I first implemented semantic search for an e-commerce platform, I assumed any vector database would suffice. I quickly learned that engineering trade-offs—not theoretical capabilities—dictate success. After testing five open-source solutions against production workloads, here’s what matters for real-world deployment.

Core Architecture Trade-offs
Vector databases solve one problem: finding nearest neighbors efficiently at scale. How they achieve this diverges dramatically.

Memory vs. Disk-Based Indexing

On a 10M-vector dataset (768-dim Cohere embeddings), pure in-memory solutions like Faiss delivered 2ms queries but consumed 120GB of RAM. Disk-optimized systems like Annoy used 8GB of RAM, but latency jumped to 15ms, which is unacceptable for real-time APIs.
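For a sense of scale, the raw float32 embeddings alone account for under 30GB of that footprint; the remaining ~90GB is presumably index structure and runtime overhead rather than the vectors themselves. The arithmetic:

# Raw storage for 10M x 768-dim float32 embeddings, before any index overhead
num_vectors = 10_000_000
dims = 768
bytes_per_float32 = 4

raw_bytes = num_vectors * dims * bytes_per_float32
print(f"Raw vectors: {raw_bytes / 1024**3:.1f} GiB")  # ~28.6 GiB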

Real-Time Updates

Only databases separating storage and compute (e.g., Milvus, Qdrant) handled live writes without rebuild penalties. When simulating user-generated content ingestion:

# Milvus (pymilvus MilvusClient): replace a product's embedding in place
from pymilvus import MilvusClient

client = MilvusClient(uri="http://localhost:19530")
client.delete(collection_name="product_vectors", ids=[item_id])  # Immediate consistency
client.insert(collection_name="product_vectors", data=[{"id": item_id, "vector": new_embedding}])  # Index updated in <100ms

Systems requiring a full index rebuild, like Annoy, introduced 30-minute delays per batch update.

The Filtering Dilemma
Combining vector search with metadata filters seems trivial—until it degrades performance.

Pre- vs. Post-Filtering

Qdrant’s integrated filtering excelled for simple clauses:

filter: { "must": [ { "key": "price", "range": { "gte": 50 } },
                    { "key": "category", "match": { "value": "electronics" } } ] }
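In the Python client, that same clause looks like this (a minimal sketch; the products collection name and the placeholder query vector are my own stand-ins, not the benchmark setup):

from qdrant_client import QdrantClient
from qdrant_client.models import FieldCondition, Filter, MatchValue, Range

client = QdrantClient(url="http://localhost:6333")
query_embedding = [0.0] * 768  # placeholder query vector

hits = client.search(
    collection_name="products",
    query_vector=query_embedding,
    query_filter=Filter(
        must=[
            FieldCondition(key="price", range=Range(gte=50)),
            FieldCondition(key="category", match=MatchValue(value="electronics")),
        ]
    ),
    limit=20,
)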

But in a 50M vector test, complex joins (e.g., user.preferences ∩ product.tags) slowed queries by 4x. Weaviate’s graph traversal compounded latency for interconnected data.

Workaround: pre-filtering in the relational database shrank the candidate set before the vector search:

product_ids = sql_db.query("SELECT id FROM products WHERE price > 50") # Fast
vector_results = vector_db.search(embedding, filter_ids=product_ids)
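A concrete version of that workaround, sketched with SQLite and Qdrant (the database file, collection name, and field names are illustrative; Qdrant's HasIdCondition restricts the search to the pre-filtered IDs):

import sqlite3
from qdrant_client import QdrantClient
from qdrant_client.models import Filter, HasIdCondition

# Step 1: cheap relational pre-filter
con = sqlite3.connect("catalog.db")
product_ids = [row[0] for row in con.execute("SELECT id FROM products WHERE price > 50")]

# Step 2: vector search restricted to the surviving IDs
client = QdrantClient(url="http://localhost:6333")
query_embedding = [0.0] * 768  # placeholder query vector
vector_results = client.search(
    collection_name="products",
    query_vector=query_embedding,
    query_filter=Filter(must=[HasIdCondition(has_id=product_ids)]),
    limit=20,
)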

Consistency Levels: When They Burn You
Most vector DBs default to eventual consistency. This caused bugs:

# Simulated user session - flawed flow
insert_vector(user_query_embedding) # Eventual consistency
recommendations = search(similar_to=user_query_embedding) # May miss new data

Fixed with:

  1. Milvus’ session consistency for user sessions (a sketch follows below)
  2. Qdrant’s write-then-read consistency
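
A minimal sketch of the Milvus fix with pymilvus, assuming a pre-existing user_queries collection whose schema has an auto-generated primary key plus one embedding vector field (all names here are illustrative):

from pymilvus import Collection, connections

connections.connect(uri="http://localhost:19530")
collection = Collection("user_queries")
user_query_embedding = [0.0] * 768  # placeholder query vector

# Column-based insert; column order follows the collection schema (single vector field assumed)
collection.insert([[user_query_embedding]])

# Session consistency guarantees this search sees the insert made by the same client session
recommendations = collection.search(
    data=[user_query_embedding],
    anns_field="embedding",
    param={"metric_type": "COSINE", "params": {}},  # metric must match the collection's index
    limit=10,
    consistency_level="Session",
)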

Hybrid Workload Reality Check
Vector-only benchmarks mislead. Actual search blends vectors, text, and filters:

| System | Vector + Text Search Latency (p95) | Complex Filter Penalty |
| --- | --- | --- |
| Milvus | 34 ms | 2.1x |
| Elasticsearch | 62 ms | 1.3x |
| Qdrant | 28 ms | 3.8x |

Key insight: Elasticsearch’s inverted index aided text-heavy workloads despite slower vector search.
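
That blend is straightforward to express in Elasticsearch 8.x, where a single _search call combines a kNN clause with a BM25 match query. This is a sketch with made-up index and field names, not the benchmark configuration above:

from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")
query_embedding = [0.0] * 768  # placeholder query vector

resp = es.search(
    index="products",
    knn={  # vector side of the hybrid query
        "field": "embedding",
        "query_vector": query_embedding,
        "k": 20,
        "num_candidates": 200,
    },
    query={"match": {"description": "wireless noise-cancelling headphones"}},  # BM25 side
    size=20,
)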

Deployment Considerations
Ignoring these cost me weeks:

  1. Kubernetes Operators: Milvus and Zilliz Cloud Helm charts simplified provisioning. Weaviate required manual StatefulSets.
  2. Index Build Memory: HNSW index creation for 10M vectors needed roughly 2x the steady-state memory, which crashed pods running with default k8s limits.
  3. GPU Acceleration: Faiss with CUDA improved batch inference (9000 QPS) but added NVIDIA driver dependencies (see the sketch after this list).
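
For reference, the Faiss GPU hand-off is a single call once the CUDA-enabled build is installed; this sketch uses random data and an exact inner-product index purely for illustration:

import faiss
import numpy as np

dims = 768
vectors = np.random.rand(100_000, dims).astype("float32")  # illustrative corpus
queries = np.random.rand(64, dims).astype("float32")       # one inference batch

cpu_index = faiss.IndexFlatIP(dims)                    # exact inner-product index
res = faiss.StandardGpuResources()
gpu_index = faiss.index_cpu_to_gpu(res, 0, cpu_index)  # move the index to GPU 0

gpu_index.add(vectors)
distances, ids = gpu_index.search(queries, 10)         # batched top-10 lookup on the GPU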

What I’d Test Next

  1. Recovery Strategies: How systems rebuild indexes after node failure.
  2. Multi-Tenancy: Isolating customer data without performance hits.
  3. Hybrid Cloud: Storing vectors on-prem with cloud query nodes.

Tools are means, not ends. What worked for my 50M-vector product catalog would fail for real-time gaming analytics. Measure your access patterns first.
