DEV Community

Joud Awad

45/60 Days System Design Questions!

Your search box returns results in 80ms.
Then you add 50M more documents. Now it's 4 seconds.
Your DBA says "add an index." Your search engineer says "that's not how this works."

They're both right — about different things.

Here's the setup:
Platform: SaaS product, 100M documents
Search fields: title, description, tags, partial phrases
Current stack: PostgreSQL full-text search
Current perf: p95 = 4.2s
Target: sub-200ms at p99

Product is escalating. What do you do?

A) Migrate to Elasticsearch — inverted index, purpose-built for full-text, sub-100ms at scale.
B) Add GIN indexes on your PostgreSQL tsvector columns — no new infra, meaningful performance gain.
C) Cache the top 1,000 most common queries in Redis — free win for the majority of traffic.
D) Move to Typesense or Meilisearch — simpler ops than Elasticsearch, built for search.

Three of these solve something real. Only one solves the problem you actually have.

Pick one — A, B, C, or D — and tell me why. I'll drop the full breakdown in the comments (including why two "almost right" answers get you partway there and then hit a wall).

Drop your answer 👇

Top comments (4)

Joud Awad

Why A wins (Elasticsearch):
Elasticsearch is built on an inverted index. When you index a document, Elasticsearch tokenizes the text and writes posting lists: "database" → [doc_1, doc_5, doc_23...], "index" → [doc_2, doc_5, doc_18...]. A query for "database index" does a fast posting-list intersection — not a table scan.
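
To make the posting-list idea concrete, here is a minimal sketch in Python. It is a toy, not how Lucene is implemented: the whitespace tokenizer, the function names, and the five sample documents are all assumptions for illustration.

```python
from collections import defaultdict


def tokenize(text):
    # Naive analyzer (assumption): lowercase and split on whitespace.
    # Real analyzers also strip punctuation, stem, drop stop words, etc.
    return text.lower().split()


def build_index(docs):
    """Map each term to a sorted list of the doc ids that contain it."""
    postings = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            postings[term].add(doc_id)
    return {term: sorted(ids) for term, ids in postings.items()}


def intersect(a, b):
    """Two-pointer intersection of two sorted posting lists."""
    i = j = 0
    out = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out


def search_all_terms(index, query):
    """Return ids of docs containing every query term (AND semantics)."""
    lists = [index.get(term, []) for term in tokenize(query)]
    if not lists:
        return []
    # Start from the shortest list so intermediate results stay small.
    lists.sort(key=len)
    result = lists[0]
    for other in lists[1:]:
        result = intersect(result, other)
    return result


# Hypothetical sample documents.
docs = {
    1: "database tuning guide",
    2: "index maintenance",
    5: "database index internals",
    18: "index bloat",
    23: "database backups",
}
index = build_index(docs)
matches = search_all_terms(index, "database index")
```

The work done per query is proportional to the length of the posting lists involved, not to the number of documents in the collection. That is the whole trick.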

At 100M documents, that intersection still runs in milliseconds because posting lists are stored in delta-compressed sorted form, Lucene segments are immutable (no row locking, no MVCC overhead), horizontal sharding is native (queries fan out in parallel), and BM25 scoring runs during retrieval — not after it.
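
Why does "delta-compressed sorted form" matter? A rough sketch: store the gaps between consecutive doc ids instead of the ids themselves, and small gaps fit in fewer bytes. Lucene's actual on-disk encoding is more elaborate than this; the function names and the sample ids below are assumptions for illustration only.

```python
def delta_encode(doc_ids):
    """Turn a sorted list of doc ids into gaps between consecutive ids."""
    gaps = []
    prev = 0
    for doc_id in doc_ids:
        gaps.append(doc_id - prev)
        prev = doc_id
    return gaps


def delta_decode(gaps):
    """Rebuild the original doc ids with a running sum over the gaps."""
    ids = []
    total = 0
    for gap in gaps:
        total += gap
        ids.append(total)
    return ids


def varint_size(n):
    """Bytes needed for n in a 7-bits-per-byte variable-length encoding."""
    size = 1
    while n >= 128:
        n >>= 7
        size += 1
    return size


# Hypothetical posting list for one term.
postings = [1_000_000, 1_000_003, 1_000_010, 1_000_012]
gaps = delta_encode(postings)

raw_bytes = sum(varint_size(doc_id) for doc_id in postings)
delta_bytes = sum(varint_size(gap) for gap in gaps)
```

In this toy case the four raw ids need 12 bytes and the gaps need 6. Smaller posting lists mean more of the index stays in memory, which is a large part of why intersections stay fast.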

Result: sub-100ms at p99 is achievable even at 100M+ docs, given proper shard sizing and enough hardware behind it. Netflix, Uber, and GitHub have all run Elasticsearch in production for search workloads.

The tradeoffs you accept: new infrastructure to operate (index lifecycle, shard rebalancing), write amplification from segment merging, and eventual consistency for search. With the default 1-second refresh interval, newly indexed docs aren't searchable for up to about a second.
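
Shard sizing and refresh behavior both come down to index settings. A minimal sketch of what that might look like: `number_of_shards`, `number_of_replicas`, and `refresh_interval` are real Elasticsearch index settings, but the 300GB index size, the 30GB-per-shard target, and the helper names are assumptions you would replace with your own measurements.

```python
import json
import math


def plan_shards(index_size_gb, target_shard_gb=30):
    """Primary shard count for a given index size.

    target_shard_gb is an assumed rule of thumb, not a fixed requirement.
    """
    return max(1, math.ceil(index_size_gb / target_shard_gb))


def index_settings(index_size_gb, refresh_interval="1s"):
    """Build a settings body of the shape the create-index API accepts."""
    return {
        "settings": {
            "index": {
                "number_of_shards": plan_shards(index_size_gb),
                "number_of_replicas": 1,
                # How often newly indexed docs become visible to search.
                "refresh_interval": refresh_interval,
            }
        }
    }


# Assumed total index size of 300GB for the 100M documents.
settings = index_settings(300)
print(json.dumps(settings, indent=2))
```

Raising `refresh_interval` during bulk loads trades search freshness for indexing throughput. The primary shard count is much harder to change later, so it deserves the up-front arithmetic.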

Joud Awad

Why B is the trap answer (GIN indexes on PostgreSQL):
GIN indexes on tsvector are genuinely good. For 1M–5M documents, this is the right call — no new infra, no ops burden, real improvement.

At 100M documents, you hit three walls: index size balloons (GIN is large, easily 40–80GB), PostgreSQL has no native way to spread a single search across nodes (read replicas add throughput, not lower per-query latency), and relevance ranking is bolted on. ts_rank is not BM25, and it runs after retrieval by reading the tsvector of every matching row. You get from 4.2s down to maybe 500ms. You need 200ms. Close, but the wrong tool at this scale.

GIN is what you reach for before you need a dedicated search engine, not instead of one.
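
For reference, option B looks roughly like this. It's a sketch under assumptions: a hypothetical `documents` table with `id`, `title`, and `description` columns, PostgreSQL 12+ (for generated columns), and any DB-API driver that uses `%s` placeholders. The names `search_vec` and `documents_search_idx` are made up, and tags are left out to keep it short.

```python
# One-time DDL: a stored tsvector column kept in sync by PostgreSQL itself.
ADD_VECTOR_COLUMN = """
ALTER TABLE documents
  ADD COLUMN search_vec tsvector
  GENERATED ALWAYS AS (
    to_tsvector('english', coalesce(title, '') || ' ' || coalesce(description, ''))
  ) STORED
"""

# The GIN index that lets the @@ match avoid a sequential scan.
CREATE_GIN_INDEX = (
    "CREATE INDEX documents_search_idx ON documents USING GIN (search_vec)"
)

# Match through the index, then rank the matching rows with ts_rank.
SEARCH = """
SELECT id, ts_rank(search_vec, q) AS rank
FROM documents, websearch_to_tsquery('english', %s) AS q
WHERE search_vec @@ q
ORDER BY rank DESC
LIMIT %s
"""


def search(conn, text, limit=20):
    """Run a ranked full-text query on an open DB-API connection."""
    cur = conn.cursor()
    cur.execute(SEARCH, (text, limit))
    return cur.fetchall()
```

The `WHERE search_vec @@ q` part is what the GIN index accelerates. The `ORDER BY rank` part is not: ts_rank has to read the tsvector of every matching row, so broad queries over 100M documents stay expensive even with the index in place.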

Joud Awad

Why C misses the point (Redis cache):
Caching top queries is a real optimization, but it's not a search architecture. Your top 1,000 queries might cover 40% of traffic. The other 60% hits the slow path, and a p99 target means 99% of requests have to be fast. Worse: every document update can make cached results stale, so you now own invalidation logic too. You've added complexity without touching the tail latency that product is actually complaining about.

C is a band-aid. The p99 problem is still there the moment any user types something that isn't in the cache.
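
The arithmetic is easy to check. A small sketch, with assumed numbers: cache hits take 5ms, misses take the current 4.2s (treating the p95 figure as the slow-path latency for simplicity), and the percentile uses the nearest-rank method.

```python
def percentile(samples, p):
    """Nearest-rank percentile for an integer p in 1..100."""
    ordered = sorted(samples)
    # Integer ceiling of p% of the sample count, to avoid float rounding.
    rank = (p * len(ordered) + 99) // 100
    return ordered[max(rank, 1) - 1]


def simulate(hit_ratio, hit_ms, miss_ms, n=1000):
    """Latencies for n requests, a fixed fraction of which hit the cache."""
    hits = round(n * hit_ratio)
    return [hit_ms] * hits + [miss_ms] * (n - hits)


# Assumed latencies: 5ms for a cache hit, 4200ms for the slow path.
at_40 = simulate(hit_ratio=0.40, hit_ms=5, miss_ms=4200)
at_95 = simulate(hit_ratio=0.95, hit_ms=5, miss_ms=4200)
at_99 = simulate(hit_ratio=0.99, hit_ms=5, miss_ms=4200)

p99_at_40 = percentile(at_40, 99)
p99_at_95 = percentile(at_95, 99)
p99_at_99 = percentile(at_99, 99)
```

Under these assumptions, p99 is still 4,200ms at a 40% hit rate and at a 95% hit rate. It only drops once the cache serves 99% of requests, which a top-1,000 list won't do for free-text search.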

Joud Awad

Why D is tempting but limited (Typesense / Meilisearch):
Typesense and Meilisearch are excellent — simpler ops than Elasticsearch, great developer experience, built-in typo tolerance. For 10M–30M documents, D is a legitimate answer that wins on simplicity.

At 100M documents with strict p99 requirements and complex relevance tuning, you start hitting their limits: distributed indexing is less mature, fine-grained scoring customization is constrained, and operational tooling at scale is thinner than Elasticsearch's ecosystem. For a smaller product, D is the right call. At 100M with a p99 SLA, Elasticsearch is where you end up.