28/30 Days System Design Questions!

#ai #systemdesign #database #llm

You're building a semantic search feature for a B2B SaaS product.

The corpus: 4 million support articles, docs, and user-generated tickets. Users type natural language queries. They expect Google-quality results — not keyword matching.

Your current stack: PostgreSQL 15, Redis, and a Node.js backend. The search team says ILIKE and pg_trgm aren't cutting it. Embeddings are the answer. Now you need a place to store and query 1536-dimensional vectors (OpenAI ada-002) at <100ms p99.

4 million rows. ~24GB of raw embeddings. Query volume: 300 req/s with weekend spikes to 900 req/s.
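The ~24GB figure checks out as simple arithmetic: 4M rows × 1536 dims × 4 bytes (float32) ≈ 24.6 GB. The same numbers show why an approximate (ANN) index is needed instead of scanning every vector, and what the concurrency target looks like. This sketch assumes uncompressed float32 storage and ignores index, row, and payload overhead. The helper names are illustrative.

```typescript
// Back-of-envelope sizing for the scenario above.
// Assumption: embeddings stored as uncompressed float32 (4 bytes per dimension).
const ROWS = 4_000_000;
const DIMS = 1536; // text-embedding-ada-002 output size
const BYTES_PER_FLOAT32 = 4;

// Raw vector bytes, before any index, row, or payload overhead.
function rawEmbeddingBytes(rows: number, dims: number): number {
  return rows * dims * BYTES_PER_FLOAT32;
}

// Multiply-adds needed to score one query against every row (exact brute-force search).
function bruteForceMacsPerQuery(rows: number, dims: number): number {
  return rows * dims;
}

// Little's law: requests in flight = arrival rate x time each request spends in the system.
function inFlight(reqPerSec: number, latencyMs: number): number {
  return (reqPerSec * latencyMs) / 1000;
}

const rawBytes = rawEmbeddingBytes(ROWS, DIMS);
console.log(`raw embeddings: ${(rawBytes / 1e9).toFixed(1)} GB / ${(rawBytes / 2 ** 30).toFixed(1)} GiB`); // 24.6 GB / 22.9 GiB

for (const rps of [300, 900]) {
  const macsPerSec = bruteForceMacsPerQuery(ROWS, DIMS) * rps;
  console.log(
    `${rps} req/s: ${macsPerSec.toExponential(2)} multiply-adds/s if brute-forced, ` +
      `~${inFlight(rps, 100)} requests in flight if each takes the full 100 ms`,
  );
}
```

Brute force would cost about 6.1 billion multiply-adds per query, or roughly 1.8 trillion per second at 300 req/s, which is why every option on the list relies on an HNSW-style index rather than exact search.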

Where do you store and query those vectors?

A) pgvector extension on your existing PostgreSQL: store embeddings in a new column and query with the <=> (cosine distance) operator.
B) Pinecone — fully managed vector database, serverless tier, no infra to run.
C) Weaviate — open-source vector DB, self-hosted on Kubernetes, full control over indexing.
D) Qdrant — open-source vector DB, Rust-based, self-hosted or cloud, optimized for high-throughput filtering.
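A note on option A's operators: in pgvector, `<->` is Euclidean (L2) distance, `<=>` is cosine distance, and `<#>` is negative inner product. A cosine query looks like `ORDER BY embedding <=> $1 LIMIT 10`, and an HNSW index only serves it if it was built with the matching operator class (`vector_cosine_ops`). The sketch below re-implements both distances in plain TypeScript (illustrative code, not pgvector's) to show why the choice rarely changes the ranking here: OpenAI embeddings are normalized to length 1, and for unit vectors ‖a − b‖² = 2(1 − cos θ). The toy 3-dimensional vectors are made up.

```typescript
// Illustrative re-implementation of what pgvector's `<->` and `<=>` compute (not pgvector code).
type Vec = number[];

function dot(a: Vec, b: Vec): number {
  let sum = 0;
  for (let i = 0; i < a.length; i++) sum += a[i] * b[i];
  return sum;
}

function norm(a: Vec): number {
  return Math.sqrt(dot(a, a));
}

// Same quantity as `<->`: Euclidean (L2) distance.
function l2Distance(a: Vec, b: Vec): number {
  let sum = 0;
  for (let i = 0; i < a.length; i++) {
    const d = a[i] - b[i];
    sum += d * d;
  }
  return Math.sqrt(sum);
}

// Same quantity as `<=>`: cosine distance = 1 - cosine similarity.
function cosineDistance(a: Vec, b: Vec): number {
  return 1 - dot(a, b) / (norm(a) * norm(b));
}

// Scale a vector to length 1, as OpenAI's embeddings already are.
function normalize(a: Vec): Vec {
  const n = norm(a);
  return a.map((x) => x / n);
}

// Document indices sorted from nearest to farthest under `dist`.
function rank(query: Vec, docs: Vec[], dist: (a: Vec, b: Vec) => number): number[] {
  return docs
    .map((doc, i) => ({ i, d: dist(query, doc) }))
    .sort((x, y) => x.d - y.d)
    .map((r) => r.i);
}

const queryVec = normalize([0.2, 0.9, -0.4]);
const docVecs = [[0.1, 0.8, -0.5], [0.9, 0.1, 0.3], [-0.3, 0.5, 0.2]].map(normalize);
// Both rankings come out as 0, 2, 1.
console.log(rank(queryVec, docVecs, l2Distance), rank(queryVec, docVecs, cosineDistance));
```

Because the rankings agree on unit vectors, the practical risk of picking the wrong operator is an index that never gets used, not wrong results.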

All four are used in production at scale. But only one fits this scenario without hidden costs that bite you at 300 req/s.

Pick one — A, B, C, or D — and tell me why. Full breakdown in the comments.

If this is the debate your team is about to have, share it. These decisions are hard to reverse.

Drop your answer 👇

#30DaysOfSystemDesign #SystemDesign #VectorDatabases #MachineLearning

Top comments (4)

Joud Awad • Jun 3

Why D wins (Qdrant):
Qdrant is purpose-built for exactly this workload. Rust core (no garbage-collector pauses) = low latency and predictable memory under load. At 300 req/s you need an index that handles concurrent ANN queries without degrading, and Qdrant's HNSW implementation is tuned for this.

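The killer feature here is payload filtering. In a B2B product, users search within a workspace, tenant, or product line, not globally. Qdrant applies the metadata filter during the HNSW traversal, so vector search and filter happen in one pass. The alternative, post-filtering (fetch the top candidates, then drop rows from other tenants), hurts recall and can return fewer than k results, forcing a retry with a larger candidate list. That is how pgvector's HNSW index behaves by default: the WHERE clause is applied after the index scan.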
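A toy simulation of the recall problem, assuming 10 equally sized tenants (so the filter matches 10% of rows) and exact search standing in for the ANN index. The candidate-list size of 40 mirrors pgvector's default `hnsw.ef_search`; pgvector's own docs use this setup to note that only about 4 rows survive post-filtering on average. All names, data, and the seeded generator are hypothetical.

```typescript
// Toy comparison: post-filtering vs. filtering during search.
// Exact search stands in for the ANN index; `candidates` plays the role of the
// index's candidate list (e.g. pgvector's hnsw.ef_search, default 40).
interface TenantDoc {
  id: number;
  tenant: string;
  score: number; // similarity to the query, higher is better
}

// Park-Miller generator so the example is reproducible (seed in 1..2^31 - 2).
function parkMiller(seed: number): () => number {
  let state = seed;
  return () => {
    state = (state * 16807) % 2147483647;
    return state / 2147483647;
  };
}

function makeCorpus(n: number, tenants: number, seed: number): TenantDoc[] {
  const rand = parkMiller(seed);
  const out: TenantDoc[] = [];
  for (let id = 0; id < n; id++) out.push({ id, tenant: `t${id % tenants}`, score: rand() });
  return out;
}

const byScoreDesc = (a: TenantDoc, b: TenantDoc): number => b.score - a.score;

// Post-filtering: take the global top `candidates`, then drop other tenants.
function postFilter(docs: TenantDoc[], tenant: string, k: number, candidates: number): TenantDoc[] {
  return docs
    .slice()
    .sort(byScoreDesc)
    .slice(0, candidates)
    .filter((d) => d.tenant === tenant)
    .slice(0, k);
}

// Filtering during search: only the tenant's docs are ever candidates.
function filteredSearch(docs: TenantDoc[], tenant: string, k: number): TenantDoc[] {
  return docs.filter((d) => d.tenant === tenant).sort(byScoreDesc).slice(0, k);
}

// 10 tenants, so the filter matches 10% of rows.
const corpus = makeCorpus(100_000, 10, 42);
const postResults = postFilter(corpus, "t3", 10, 40);
const filteredResults = filteredSearch(corpus, "t3", 10);
console.log(`post-filter: ${postResults.length}/10 results, filtered search: ${filteredResults.length}/10`);
```

Filtering during search always fills all 10 slots for the tenant; the post-filtered list is a prefix of those results and is usually far shorter, which is the recall loss the comment describes.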

Self-hosting gives you full control over HNSW params (m, ef_construct), memory-mapped storage, and on-disk vectors and indexes, which becomes critical once your corpus grows past RAM.
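Those knobs are set per collection. Below is a sketch of a request body for Qdrant's REST `PUT /collections/{collection_name}` endpoint, written in TypeScript to match the Node.js backend. The helper names, the values (m = 16, ef_construct = 100), and the RAM model are assumptions for illustration. The model treats layer-0 HNSW links (up to 2·m neighbor ids of 4 bytes each per point) as the whole graph and ignores upper layers and payload.

```typescript
// Hypothetical helper: builds a request body for Qdrant's
// `PUT /collections/{collection_name}` endpoint. Field names follow Qdrant's
// collection config; the values passed in are illustrative, not tuned.
interface QdrantCollectionBody {
  vectors: { size: number; distance: "Cosine" | "Dot" | "Euclid"; on_disk: boolean };
  hnsw_config: { m: number; ef_construct: number; on_disk: boolean };
  on_disk_payload: boolean;
}

interface CollectionOptions {
  dims: number;
  m: number; // edges per node in the HNSW graph
  efConstruct: number; // candidate list size while building the graph
  vectorsOnDisk: boolean; // memory-map raw vectors instead of holding them in RAM
}

function buildCollectionBody(opts: CollectionOptions): QdrantCollectionBody {
  return {
    vectors: { size: opts.dims, distance: "Cosine", on_disk: opts.vectorsOnDisk },
    // Keep the (much smaller) HNSW graph in RAM even when vectors live on disk.
    hnsw_config: { m: opts.m, ef_construct: opts.efConstruct, on_disk: false },
    // Keep payload (tenant id, title, ...) on disk to save RAM.
    on_disk_payload: true,
  };
}

// Rough resident-memory estimate under the stated assumption:
// graph = rows * 2m links * 4 bytes; vectors count only if kept in RAM.
function estimateRamBytes(rows: number, dims: number, m: number, vectorsOnDisk: boolean): number {
  const graph = rows * 2 * m * 4;
  const vectors = vectorsOnDisk ? 0 : rows * dims * 4;
  return graph + vectors;
}

const collectionBody = buildCollectionBody({ dims: 1536, m: 16, efConstruct: 100, vectorsOnDisk: true });
console.log(JSON.stringify(collectionBody));
for (const onDisk of [false, true]) {
  const gb = estimateRamBytes(4_000_000, 1536, 16, onDisk) / 1e9;
  console.log(`vectors on disk = ${onDisk}: ~${gb.toFixed(1)} GB resident (rough)`);
}
```

Under this model the graph is about 0.5 GB against about 24.6 GB of raw vectors, so memory-mapping the vectors is the main lever once the corpus outgrows RAM. Query latency then depends on how much of the vector file the OS page cache keeps hot. The body would be sent as JSON to a Qdrant node (REST listens on port 6333 by default) under a collection name of your choosing.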

Joud Awad • Jun 3

Why A is the trap (pgvector):
pgvector is great for getting started: under ~500k vectors it's often the right call because you already own PostgreSQL. But at 4M rows and 300 req/s, the HNSW index lives in Postgres's buffer pool (shared_buffers), competing with your transactional workload for memory and CPU. As concurrency rises and those resources saturate, latency climbs non-linearly. VACUUM on a large embeddings table is painful, and HNSW index builds are much slower once the graph no longer fits in maintenance_work_mem.

Rule of thumb: pgvector under 500k vectors + low concurrency = fine. Above that, you're borrowing against future pain.
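A simplified M/M/1 queue (Poisson arrivals, exponential service times, one server) is enough to make "latency goes non-linear under concurrency" concrete. In that model, the time a request spends in the system is exponentially distributed with rate μ − λ, so p99 = ln(100) / (μ − λ). The 1,000 queries/s capacity below is a hypothetical number, not a pgvector benchmark. Sharing the box with transactional load effectively lowers μ.

```typescript
// Simplified M/M/1 queue: lambda = arrival rate, mu = service capacity (both per second).
// Time in system is exponential with rate (mu - lambda), so percentiles have a closed form.
function mm1PercentileSec(lambda: number, mu: number, p: number): number {
  if (lambda >= mu) return Infinity; // overloaded: the queue grows without bound
  return -Math.log(1 - p) / (mu - lambda);
}

// Hypothetical capacity: the node can serve 1,000 queries/s when fully busy.
const MU = 1000;
for (const lambda of [300, 600, 900, 950]) {
  const p99ms = mm1PercentileSec(lambda, MU, 0.99) * 1000;
  console.log(`${lambda} req/s -> modeled p99 ~ ${p99ms.toFixed(1)} ms`);
}
```

In this model, tripling load from 300 to 900 req/s raises p99 sevenfold (700/100), and at 950 req/s p99 is about 92 ms, right at the 100 ms budget. Anything that steals capacity from the vector queries pushes that cliff closer.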

Joud Awad • Jun 3

Why B loses (Pinecone):
Pinecone works and teams ship with it fast. But the cost model bites hard at 300 req/s sustained: the serverless tier bills for the read units each query consumes, which can add up to thousands of dollars a month. There's also proprietary lock-in. There is no standard vector-database wire protocol, so migrating means rewriting your ingestion pipeline and query layer against a different API.

Right tool when: speed to market matters more than cost, and query volume is under ~50 req/s.
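The volume side of the cost argument is plain arithmetic. At 300 req/s sustained over a 30-day month you get 777.6M queries, versus 129.6M at the ~50 req/s threshold. The read units per query and the price per million read units are left as parameters because they depend on plan and index size. The helper names are hypothetical and no real rates are assumed.

```typescript
// Query-volume arithmetic plus a parameterized read-cost model.
// readUnitsPerQuery and usdPerMillionReadUnits are placeholders, not real rates.
const SECONDS_PER_30_DAY_MONTH = 30 * 24 * 60 * 60; // 2,592,000

function monthlyQueries(reqPerSec: number): number {
  return reqPerSec * SECONDS_PER_30_DAY_MONTH;
}

function monthlyReadCostUsd(reqPerSec: number, readUnitsPerQuery: number, usdPerMillionReadUnits: number): number {
  return (monthlyQueries(reqPerSec) * readUnitsPerQuery * usdPerMillionReadUnits) / 1_000_000;
}

for (const rps of [50, 300, 900]) {
  console.log(`${rps} req/s sustained: ${(monthlyQueries(rps) / 1e6).toFixed(1)}M queries/month`);
}

// Whatever the real rates are, the read bill scales linearly with sustained volume.
const costRatio = monthlyReadCostUsd(300, 1, 1) / monthlyReadCostUsd(50, 1, 1);
console.log(`300 vs 50 req/s read-cost ratio: ${costRatio.toFixed(2)}`);
```

The ratio is independent of the placeholder prices: at 300 req/s the read bill is six times the bill at 50 req/s, before any weekend spikes.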

Joud Awad • Jun 3

Why C loses (Weaviate):
Weaviate is solid, with good hybrid search (BM25 + vector) and multi-modal support. But its Kubernetes footprint is heavier than Qdrant's for this use case. For a team that isn't already running Kubernetes (this stack is PostgreSQL, Redis, and Node.js), the operational overhead is real. Weaviate shines for multi-modal search or semantic graph relationships; for pure dense vector search with filtering, Qdrant is leaner.
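Weaviate's hybrid search combines a BM25 result list with a vector result list, weighted by an `alpha` parameter (1 = pure vector, 0 = pure keyword). Here is a minimal TypeScript sketch of one such fusion: min-max normalize each list, then take a weighted sum. It is modeled loosely on Weaviate's relative-score fusion and is not its implementation. The doc ids and scores are made up.

```typescript
// Sketch of hybrid score fusion, loosely modeled on Weaviate's relative-score fusion
// (not its implementation): min-max normalize each list, then blend with alpha.
type Scores = Record<string, number>; // doc id -> raw score, higher is better

const hasId = (o: Scores, id: string): boolean => Object.prototype.hasOwnProperty.call(o, id);

function minMaxNormalize(scores: Scores): Scores {
  const ids = Object.keys(scores);
  const values = ids.map((id) => scores[id]);
  const lo = Math.min(...values);
  const hi = Math.max(...values);
  const out: Scores = {};
  for (const id of ids) {
    // If every score is equal, treat them all as the best match.
    out[id] = hi === lo ? 1 : (scores[id] - lo) / (hi - lo);
  }
  return out;
}

// alpha = 1 -> pure vector ranking, alpha = 0 -> pure BM25 ranking.
function hybridFuse(vector: Scores, bm25: Scores, alpha: number): Array<[string, number]> {
  const v = minMaxNormalize(vector);
  const k = minMaxNormalize(bm25);
  const ids = Object.keys(v).concat(Object.keys(k).filter((id) => !hasId(v, id)));
  const fused = ids.map((id): [string, number] => [
    id,
    // A doc missing from one list contributes 0 from that list.
    alpha * (hasId(v, id) ? v[id] : 0) + (1 - alpha) * (hasId(k, id) ? k[id] : 0),
  ]);
  return fused.sort((x, y) => y[1] - x[1]);
}

const vectorHits: Scores = { a: 0.9, b: 0.5, c: 0.1 };
const bm25Hits: Scores = { b: 12, c: 3, d: 7 };
console.log(hybridFuse(vectorHits, bm25Hits, 0.5).map(([id]) => id).join(" > ")); // b > a > d > c
```

Document b wins at alpha = 0.5 because it scores well in both lists, which is the behavior that makes hybrid search attractive for keyword-heavy support content.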