Retrospective: How We Built a Vector Search Pipeline with Weaviate 1.25 and Redis 7.4 for 100M Embeddings
Published: [Date], Authors: [Team Name]
Introduction
We needed a vector search pipeline to support 100 million high-dimensional embeddings (768-dimensional, BERT-based) for our e-commerce product recommendation engine. Requirements: <100ms p95 search latency and 99.9% availability. We evaluated several vector databases and caching layers, ultimately choosing Weaviate 1.25 as our primary vector store and Redis 7.4 for hot-cache embedding lookups.
Architecture Overview
Our pipeline had three core layers:
- Ingestion Layer: Kafka streams for embedding batch/real-time ingestion, with a Python-based worker pool that validates, normalizes, and batches embeddings before writing to Weaviate.
- Storage Layer: Weaviate 1.25 cluster (3 nodes, 64 vCPU, 256GB RAM per node, NVMe storage) for persistent vector storage and ANN search. Redis 7.4 (5-node cluster, 32 vCPU, 128GB RAM per node) for caching frequently accessed embeddings and precomputed user vectors.
- Query Layer: GraphQL API gateway that routes search requests: it first checks Redis for cached results, falls back to Weaviate for a full ANN search on a miss, then caches the top results in Redis with a 1-hour TTL (this flow is sketched after the list).
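To make the routing concrete, here is a minimal cache-aside sketch of the query-layer flow. It assumes the Weaviate Python client v4 and redis-py; the connection calls, the `Product` collection name, the cache-key scheme, and the `search` helper are illustrative rather than our production gateway code.

```python
import hashlib
import json

import redis
import weaviate

CACHE_TTL_SECONDS = 3600  # the 1-hour TTL described above

r = redis.Redis(host="redis-cluster", port=6379)  # cluster client in production
wv = weaviate.connect_to_local()                  # illustrative connection
products = wv.collections.get("Product")          # hypothetical collection name

def search(query_vector: list[float], limit: int = 100) -> list[str]:
    # Key the cache on a hash of the (rounded) query vector.
    key = "results:" + hashlib.sha1(
        json.dumps([round(x, 5) for x in query_vector]).encode()
    ).hexdigest()

    cached = r.get(key)
    if cached is not None:
        return json.loads(cached)  # cache hit: skip Weaviate entirely

    # Cache miss: full ANN search in Weaviate.
    res = products.query.near_vector(near_vector=query_vector, limit=limit)
    results = [str(o.uuid) for o in res.objects]

    # Cache the top results for identical queries over the next hour.
    r.set(key, json.dumps(results), ex=CACHE_TTL_SECONDS)
    return results
```

Because entries carry the TTL, stale results simply age out; we never had to build explicit cache invalidation into the gateway.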
Key Implementation Details
Weaviate 1.25 Configuration
Weaviate 1.25 gave us improved HNSW index tuning options and mature multi-tenancy support, which we leveraged to shard embeddings by product category. Key config tweaks (a configuration sketch follows the list):
- Set HNSW efConstruction to 512 and ef to 256 for a balance between index build time and search accuracy (recall@10 > 0.98).
- Enabled product quantization (PQ) compression with 96 segments, reducing per-embedding storage from ~3KB to ~400B and cutting total storage costs by 65%.
- Configured multi-tenancy to isolate high-traffic product categories (e.g., electronics, apparel) to prevent noisy neighbor issues.
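A minimal sketch of this collection setup, assuming the Weaviate Python client v4. The collection name `Product`, the single property, and the connection call are placeholders, and exact parameter names may differ slightly across client versions.

```python
import weaviate
from weaviate.classes.config import Configure, DataType, Property

client = weaviate.connect_to_local()  # in production: connect to the 3-node cluster

client.collections.create(
    name="Product",
    properties=[Property(name="category", data_type=DataType.TEXT)],
    # Embeddings are produced offline (BERT), so no built-in vectorizer.
    vectorizer_config=Configure.Vectorizer.none(),
    vector_index_config=Configure.VectorIndex.hnsw(
        ef_construction=512,  # index build quality
        ef=256,               # query-time search breadth
        quantizer=Configure.VectorIndex.Quantizer.pq(segments=96),  # PQ compression
    ),
    # One tenant per product category to isolate high-traffic categories.
    multi_tenancy_config=Configure.multi_tenancy(enabled=True),
)
client.close()
```

With multi-tenancy enabled, each category lives in its own shard, which is what gives us the per-category isolation described above.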
Redis 7.4 Integration
Redis's vector similarity search (VSS, provided by the RediSearch / Redis Query Engine module) and Redis 7.4's improved cluster failover were critical. We used Redis for two use cases (a cache sketch follows the list):
- Embedding Cache: Stored the ~10% most frequently accessed embeddings (~10M) in Redis, using VSS for low-latency exact or approximate lookups when Weaviate was under load.
- Precomputed User Vectors: Cached personalized user embedding vectors (updated daily) to avoid recomputing them on every search request.
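An illustrative sketch of the embedding cache, assuming redis-py against a Redis deployment with the search module available; the index name `emb_idx`, the `emb:` key prefix, and the helper functions are made up for the example.

```python
import numpy as np
import redis
from redis.commands.search.field import TagField, VectorField
from redis.commands.search.indexDefinition import IndexDefinition, IndexType
from redis.commands.search.query import Query

r = redis.Redis(host="redis-cluster", port=6379)

# One-time index creation for the hot ~10M embeddings.
r.ft("emb_idx").create_index(
    fields=[
        TagField("product_id"),
        VectorField("vec", "HNSW", {"TYPE": "FLOAT32", "DIM": 768, "DISTANCE_METRIC": "COSINE"}),
    ],
    definition=IndexDefinition(prefix=["emb:"], index_type=IndexType.HASH),
)

def cache_embedding(product_id: str, embedding: np.ndarray) -> None:
    # Embeddings are stored as raw float32 bytes in a hash field.
    r.hset(f"emb:{product_id}", mapping={
        "product_id": product_id,
        "vec": embedding.astype(np.float32).tobytes(),
    })

def knn_lookup(query_vec: np.ndarray, k: int = 10):
    # Approximate KNN over the cached vectors, ranked by cosine distance.
    q = Query(f"*=>[KNN {k} @vec $blob AS score]").sort_by("score").dialect(2)
    return r.ft("emb_idx").search(q, {"blob": query_vec.astype(np.float32).tobytes()})
```

A single-node connection is shown to keep the sketch short; in production the same calls go through the 5-node cluster.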
Challenges and Optimizations
Ingestion Throughput
Initial ingestion rates were ~5k embeddings/sec, below our target of 20k/sec. We optimized by:
- Batching Weaviate writes into 1k-embedding chunks, reducing API overhead.
- Using Weaviate's asynchronous (non-blocking) vector indexing so batch writes are acknowledged before the HNSW index is fully updated, letting us parallelize ingestion across worker nodes.
- Pre-sharding embeddings by tenant ID before ingestion to avoid cross-node replication overhead.
Final ingestion rate: 22k embeddings/sec, with index build time for 100M embeddings reduced from 48 hours to 14 hours.
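The batched, per-tenant write path looked roughly like the sketch below, again assuming the Weaviate Python client v4. The `Product` collection and the shape of `records` are illustrative; real workers consume validated, normalized batches from Kafka.

```python
import weaviate

client = weaviate.connect_to_local()  # in production: connect to the 3-node cluster
products = client.collections.get("Product")

def ingest(tenant: str, records) -> None:
    """records: iterable of (properties_dict, vector_list) pairs, pre-sharded by category."""
    coll = products.with_tenant(tenant)
    # 1k-object chunks with a few parallel in-flight requests per worker.
    with coll.batch.fixed_size(batch_size=1000, concurrent_requests=4) as batch:
        for props, vector in records:
            batch.add_object(properties=props, vector=vector)
    if coll.batch.failed_objects:
        print(f"{len(coll.batch.failed_objects)} failed objects for tenant {tenant}")
```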
Search Latency
Initial p95 search latency was ~180ms. Optimizations:
- Added Redis caching for top 100 search results per query, cutting p95 latency to ~65ms for cached queries.
- Tuned Weaviate's HNSW ef parameter dynamically: higher ef for low-traffic categories, lower ef for high-traffic categories to reduce compute (see the sketch after this list).
- Deployed Weaviate nodes in the same availability zone as the query layer to reduce network latency.
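Since ef is an index-level setting in Weaviate rather than a per-query knob, our "dynamic" tuning amounted to periodically updating the index config based on recent traffic. A hedged sketch, assuming the v4 client's Reconfigure API; the traffic threshold and qps source are made up:

```python
import weaviate
from weaviate.classes.config import Reconfigure

client = weaviate.connect_to_local()
products = client.collections.get("Product")

def retune_ef(recent_qps: float) -> None:
    # Low-traffic categories can afford a wider search (better recall);
    # high-traffic categories trade a little recall for lower compute.
    ef = 128 if recent_qps > 500 else 256
    products.config.update(
        vector_index_config=Reconfigure.VectorIndex.hnsw(ef=ef)
    )
```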
Results
After 3 months of production use, our pipeline delivered:
- 100M embeddings indexed with 99.2% recall@10 for ANN search.
- p95 search latency of 82ms (down from 180ms initial), meeting our <100ms target.
- 99.95% availability, with zero unplanned downtime for Weaviate or Redis clusters.
- 40% reduction in infrastructure costs vs. our initial design using a single managed vector database, thanks to Redis offloading read traffic.
Lessons Learned
- Combine specialized tools: Weaviate excels at large-scale ANN search, Redis excels at low-latency caching—using both let us play to each tool's strengths.
- Tune HNSW parameters for your workload: generic defaults will not deliver optimal performance for 100M+ embedding scales.
- Test failure scenarios early: We found Redis cluster failover took 2 seconds initially, so we tuned heartbeat intervals to reduce failover time to <500ms.
- Monitor embedding drift: We added a weekly job to check embedding distribution changes, which caught a model update that broke recall for 3 product categories (a minimal version of that check is sketched below).
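A minimal sketch of such a drift check: compare the centroid of a fresh per-category sample of embeddings against a stored baseline. The cosine threshold and the sampling step are illustrative, not the exact check we ran.

```python
import numpy as np

def drift_detected(sample: np.ndarray, baseline_centroid: np.ndarray,
                   min_cosine: float = 0.99) -> bool:
    """sample: (n, 768) embeddings drawn this week for one product category."""
    centroid = sample.mean(axis=0)
    cos = np.dot(centroid, baseline_centroid) / (
        np.linalg.norm(centroid) * np.linalg.norm(baseline_centroid) + 1e-12
    )
    return cos < min_cosine  # flag the category for manual review
```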
Conclusion
Building a vector search pipeline for 100M embeddings required careful tool selection, iterative tuning, and layered optimization. Weaviate 1.25 and Redis 7.4 proved to be a reliable, cost-effective combination that met our performance and scale requirements. Future work includes migrating to Weaviate's new hybrid search feature and testing Redis 7.4's vector compression for further cost savings.