DEV Community

Matt Frank

Day 25: Elastic Search Scaling - AI System Design in Seconds

When you're running an e-commerce platform with 100 million products and 50,000 queries per second, a naive search implementation doesn't just fail; it collapses. This architecture challenge sits at the intersection of distributed systems, data structures, and real-time optimization, where every millisecond counts. Today we're exploring how to design a search infrastructure that doesn't just scale to massive query volumes, but also handles worst-case scenarios where a single query could match millions of products.

Architecture Overview

The key to elastic search scaling is breaking the problem into three distinct layers: ingestion, indexing, and query serving. At the ingestion layer, product data flows through message queues like Kafka, which decouple data producers from the search system and provide natural backpressure handling. This data then moves into the indexing pipeline, where Elasticsearch clusters organize products into shards based on product ID ranges or category hierarchies. The critical insight here is that sharding isn't optional; it's foundational. With 100 million products on a single node, you'd hit memory and query latency limits almost immediately. Instead, you distribute the index across dozens of nodes, each responsible for a subset of products.
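To make the sharding idea concrete, here's a minimal sketch of stable hash-based routing. The `shard_for` helper is hypothetical, not an Elasticsearch API (Elasticsearch computes its own routing hash internally), but it shows the core property you need: the same product ID always maps to the same shard, so writes and lookups agree on where a document lives.

```python
import hashlib

def shard_for(product_id: str, num_shards: int) -> int:
    """Map a product ID to one of num_shards shards.

    Uses a stable hash (md5) rather than Python's built-in hash(),
    which is randomized per process and would scatter the same ID
    to different shards across restarts.
    """
    digest = hashlib.md5(product_id.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards

# The same ID always routes to the same shard; different IDs
# spread roughly evenly across the cluster.
shard = shard_for("product-12345", 24)
```

Routing by category instead of product ID co-locates a whole category on one shard, which speeds up category-scoped queries but risks hot shards for popular categories; that's the trade-off behind the "product ID ranges or category hierarchies" choice above.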

The query serving layer sits in front of Elasticsearch and acts as an intelligent router and cache. This is where elasticity really happens. Search queries first hit a caching layer (Redis or Memcached) that captures frequently searched terms and their results. For cache misses, the query router distributes the request across multiple Elasticsearch shards in parallel, collecting results as they arrive. Load balancers ensure even distribution across nodes, and the system scales horizontally by adding more nodes to the cluster as query volume grows. The beauty of this architecture is that each component can scale independently: you can add more shards for indexing capacity, more query nodes for throughput, and more cache servers for hot queries.
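The cache-then-scatter-gather flow can be sketched in a few lines. This is a toy model, not production code: the `cache` dict stands in for Redis, and `search_shard` stands in for a real per-shard Elasticsearch call, so the structure of the routing logic is visible without any external services.

```python
from concurrent.futures import ThreadPoolExecutor

cache: dict = {}  # stand-in for Redis/Memcached

def search_shard(shard_id: int, query: str) -> list:
    # Placeholder for a per-shard Elasticsearch request; each shard
    # returns a tagged hit here so the fan-out is easy to see.
    return [f"shard{shard_id}:{query}"]

def search(query: str, num_shards: int = 4) -> list:
    if query in cache:  # cache hit: skip the cluster entirely
        return cache[query]
    # Cache miss: scatter the query to every shard in parallel,
    # then gather the partial results into one list.
    with ThreadPoolExecutor(max_workers=num_shards) as pool:
        partials = pool.map(search_shard, range(num_shards),
                            [query] * num_shards)
    results = [hit for part in partials for hit in part]
    cache[query] = results  # populate the cache for future hits
    return results
```

In a real deployment the cache entry would carry a TTL, and the gather step would merge and re-rank partial results by relevance score rather than concatenating them; the skeleton is the same.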

Design Insight: The 5 Million Product Query Problem

Here's where things get genuinely tricky. A query that matches 5 million products out of 100 million can't simply fetch and sort every match, at least not within a 100-millisecond budget. The solution relies on progressive filtering and early termination. First, the query applies broad filters that leverage Elasticsearch's inverted index structure, quickly eliminating products that don't match basic criteria like category, price range, or availability status. This dramatically reduces the candidate set from 5 million to perhaps 50,000 products. Second, instead of returning all 50,000 results, the system uses a two-phase approach: compute relevance scores for the top results needed for the current page using approximate algorithms, then only compute exact scores if the user requests deeper pagination. Finally, aggressive caching of common query patterns means many high-volume searches never hit the full pipeline. InfraSketch helps visualize these decision points clearly, showing exactly where filtering, caching, and parallel execution intersect.
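The filter-then-score shape of that pipeline can be sketched with plain Python. This is an illustration under assumed field names (`category`, `price`, `in_stock`, `score`), not how Elasticsearch implements it internally: the point is that cheap filters run before any scoring, and a bounded heap keeps only the k results the current page needs, so memory stays O(k) no matter how many products match.

```python
import heapq

def top_k_results(products, category, max_price, k=10):
    """Return the k highest-scoring products that pass cheap filters.

    products: iterable of dicts with 'category', 'price',
              'in_stock', and 'score' keys (hypothetical schema).
    """
    # Phase 1: broad, cheap filters shrink the candidate set before
    # any scoring work happens (the inverted-index step in the text).
    candidates = (
        p for p in products
        if p["category"] == category
        and p["price"] <= max_price
        and p["in_stock"]
    )
    # Phase 2: keep only the k best-scoring candidates; heapq.nlargest
    # never materializes more than k items at once.
    return heapq.nlargest(k, candidates, key=lambda p: p["score"])
```

Deeper pagination would re-run phase 2 with a larger k (or a search-after cursor) and exact scoring, which is exactly why page one can stay fast while page 500 costs more.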

Watch the Full Design Process

Want to see how this architecture evolved from the ground up? We generated this entire system design in real-time, making trade-off decisions as we built it. Check out the full demonstration on your preferred platform:

Try It Yourself

This is Day 25 of our 365-day system design challenge, and we're proving that complex architectures don't require hours of whiteboarding sessions or expensive design tools. Head over to InfraSketch and describe your system in plain English. In seconds, you'll have a professional architecture diagram, complete with a design document. Whether you're tackling search infrastructure, payment processing, or recommendation engines, you can generate production-ready architectures and see the design process happen in real-time.
