Matt Frank

Day 25: Elastic Search Scaling - AI System Design in Seconds

When you're searching across 100 million products while fielding 50,000 queries per second, traditional database queries simply won't cut it. Elasticsearch becomes the backbone of modern e-commerce search, but scaling it to handle massive result sets without sacrificing latency is where the real engineering challenge begins. This is the kind of problem that separates good architectures from great ones, and it's exactly what we're exploring today on Day 25 of our 365-day system design challenge.

Architecture Overview

At the core of an e-commerce search infrastructure sits Elasticsearch, a distributed search and analytics engine built on top of Lucene. The system needs to handle both the indexing pipeline (products flowing in from your catalog) and the query pipeline (millions of users searching simultaneously). The key is partitioning your data across multiple shards, each holding a subset of your 100 million products. When a query arrives, a coordinating node fans it out to all shards in parallel; each shard ranks its own slice independently, and the coordinator merges the shard-local results into the final response. This horizontal scaling approach is what allows you to handle 50,000 queries per second without overwhelming any single node.
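To make the scatter-gather flow concrete, here is a minimal pure-Python sketch of the idea. The `Shard` class and its crude term-count scoring are stand-ins, not Elasticsearch's actual API or relevance model; the point is the shape of the flow: each shard ranks locally, the coordinator merges only the shard-local winners, and full documents are fetched only for the final page.

```python
import heapq

class Shard:
    """Toy stand-in for one Elasticsearch shard holding a catalog slice."""
    def __init__(self, docs):
        self.docs = docs  # {doc_id: product title}

    def query(self, term, size):
        # Query phase: score locally, return only the top `size` (score, id) pairs.
        scored = [(title.count(term), doc_id)
                  for doc_id, title in self.docs.items() if term in title]
        return heapq.nlargest(size, scored)

    def fetch(self, doc_id):
        # Fetch phase: retrieve the full document by id.
        return self.docs[doc_id]

def search(shards, term, size=2):
    # Scatter: every shard ranks its own matches (in parallel in a real cluster).
    candidates = []
    for i, shard in enumerate(shards):
        candidates += [(score, doc_id, i)
                       for score, doc_id in shard.query(term, size)]
    # Gather: merge shard-local winners into the global top page...
    top = heapq.nlargest(size, candidates)
    # ...and fetch full documents only for that final page.
    return [shards[i].fetch(doc_id) for score, doc_id, i in top]

shards = [
    Shard({1: "gaming laptop laptop", 2: "desk lamp"}),
    Shard({3: "laptop sleeve", 4: "laptop stand laptop laptop"}),
]
print(search(shards, "laptop"))  # the two highest-scoring laptop products
```

Notice that each shard transfers at most `size` lightweight (score, id) pairs to the coordinator, regardless of how many documents matched locally; that asymmetry is what keeps the merge cheap at 100 million products.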

Behind Elasticsearch, you'll typically find a distributed cache layer, such as Redis, sitting between your application and search engine. The cache captures hot queries and frequent product views, reducing the actual load on Elasticsearch. Additionally, you need a robust indexing pipeline that can ingest product updates, deletions, and new catalog entries without disrupting query performance. Message queues like Kafka or RabbitMQ decouple this indexing work from the query path, ensuring that a catalog update doesn't create query latency spikes. Your API layer then coordinates between cache, search engine, and data stores, routing requests intelligently based on whether results are cached or need a fresh search.
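The cache-aside pattern described above can be sketched in a few lines. This is an assumption-laden toy, not production code: a plain dict with TTL stands in for Redis, and the `backend` callable stands in for the Elasticsearch query path.

```python
import time

class SearchCache:
    """Minimal cache-aside layer; a dict stands in for Redis here."""
    def __init__(self, backend, ttl_seconds=60):
        self.backend = backend          # callable: query -> results
        self.ttl = ttl_seconds
        self.store = {}                 # query -> (expires_at, results)
        self.hits = self.misses = 0

    def search(self, query):
        entry = self.store.get(query)
        if entry and entry[0] > time.monotonic():
            self.hits += 1              # hot query: served without touching ES
            return entry[1]
        self.misses += 1                # cold or expired: fall through to ES
        results = self.backend(query)
        self.store[query] = (time.monotonic() + self.ttl, results)
        return results

cache = SearchCache(lambda q: [f"result for {q}"])
cache.search("laptop")           # miss: hits the search backend
cache.search("laptop")           # hit: served from cache
print(cache.hits, cache.misses)  # 1 1
```

The TTL is the knob that trades freshness for load: hot queries stop hammering Elasticsearch, at the cost of results being up to `ttl_seconds` stale after a catalog update.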

Load balancing across multiple Elasticsearch clusters is another critical component. Running a single cluster becomes a single point of failure and a bottleneck. By distributing traffic across geo-distributed clusters or multiple clusters within a data center, you can absorb spikes and degrade gracefully. Query routing logic can send requests to the least-loaded cluster or route by geographic proximity for lower latency.
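A routing policy like the one described might look like the following sketch, where cluster names, regions, and the `in_flight` load metric are all illustrative assumptions: prefer a cluster in the caller's region, and break ties (or handle regions with no local cluster) by current load.

```python
def route(clusters, region=None):
    """Pick a cluster for the next query: prefer geographic proximity,
    then fall back to the least-loaded candidate."""
    local = [c for c in clusters if c["region"] == region]
    pool = local or clusters            # no local cluster? consider them all
    return min(pool, key=lambda c: c["in_flight"])["name"]

clusters = [
    {"name": "us-east-a", "region": "us-east", "in_flight": 480},
    {"name": "us-east-b", "region": "us-east", "in_flight": 120},
    {"name": "eu-west-a", "region": "eu-west", "in_flight": 300},
]
print(route(clusters, region="us-east"))   # least-loaded local cluster
print(route(clusters, region="ap-south"))  # no local cluster: least-loaded overall
```

In practice the load signal would come from health checks or queue depths rather than a static field, but the two-step preference (locality first, load second) is the core of graceful degradation during a regional spike.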

Design Insight: Handling 5 Million Matching Products in Under 100ms

Here's the uncomfortable truth: you can't return all 5 million matching products in 100ms, and you don't need to. The architecture relies on three strategies working together:

1. Pagination and result limiting. A query typically only needs to fetch the top 10 to 100 products, not millions. Elasticsearch uses a technique called "query then fetch": each shard identifies its top N matches, and document details are retrieved only for the final result set.
2. Aggressive filtering in the query phase. Filters reduce the 5 million matches to a much smaller working set before ranking begins, and pre-computing aggregations and facets ahead of time (during indexing) allows you to serve "100 laptops matching your search" instantly.
3. Circuit breakers and timeout policies. These prevent queries from consuming unbounded resources: if a query would require scanning too many documents, it fails fast and returns partial results rather than hanging the entire system.

The combination of these techniques is what makes sub-100ms response times achievable even with massive match sets.
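The fail-fast behavior can be illustrated with a toy scan budget. This is a simplified sketch of the idea, not how Elasticsearch's circuit breakers are actually implemented: the query gives up once it has examined a fixed number of documents and returns whatever it found so far, flagged as partial.

```python
def budgeted_search(docs, predicate, max_docs_scanned=1000, page_size=10):
    """Scan with a hard document budget; on exhaustion, return what we
    have plus a partial-results flag instead of hanging the whole query."""
    results, scanned = [], 0
    for doc in docs:
        if scanned >= max_docs_scanned:
            return results[:page_size], True   # budget tripped: partial page
        scanned += 1
        if predicate(doc):
            results.append(doc)
            if len(results) >= page_size:
                return results, False          # full page before budget hit
    return results[:page_size], False          # exhausted the corpus normally

docs = range(10_000)
page, partial = budgeted_search(docs, lambda d: d % 3 == 0,
                                max_docs_scanned=20, page_size=10)
print(page, partial)
```

The caller can surface the `partial` flag as a "showing partial results" banner, which degrades gracefully instead of letting one expensive query monopolize a shard.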

Watch the Full Design Process

Want to see how this architecture came together? Check out the real-time design session where we sketched out this entire system:

Watching the design process unfold shows you not just the final architecture, but the reasoning behind each component and how they interact under pressure.

Try It Yourself

Ready to design your own search infrastructure? Head over to InfraSketch and describe your system in plain English. In seconds, you'll have a professional architecture diagram, complete with a design document. Whether you're scaling Elasticsearch, designing a recommendation engine, or building a distributed cache strategy, InfraSketch helps you visualize and validate your architecture instantly.
