The Life of a Search Query in OpenSearch: From REST Request to Results
Photo by comparefibre on Unsplash
Every time you send a search request to OpenSearch, you're not just running a query. You're orchestrating a mini-distributed computation across dozens or hundreds of nodes, shards, and segments. The journey from your GET /my-index/_search to the JSON response you get back is a masterclass in distributed systems design -- and understanding it is the key to debugging slow queries, tuning relevance, and operating clusters at scale.
I spend a lot of time working on the OpenSearch search-relevance plugin at Cloudera, and I've found that the engineers who understand the query path end up writing better search applications. So let's trace the complete life of a search query -- from the REST request to the final merged result.
The Entry Point: Coordinating Node
When you send a search request to your OpenSearch cluster, it hits one of the nodes -- any node. That node becomes the coordinating node for this request. It doesn't matter which node you hit; they all speak the cluster language and can route your query to the right places.
The coordinating node does something simple but critical: it figures out which shards need to participate.
Shard Routing: Hashing Your Way to the Right Data
OpenSearch distributes data across the cluster using shards. Each index is split into primary shards (with replica copies for redundancy). Every document lives on exactly one primary shard, determined by this formula:
shard = hash(routing_key) % num_shards
By default, the routing key is the document's _id. But you can specify a custom routing key at index time -- and if you do, you must use the same routing key at search time. This is how OpenSearch keeps writes and reads consistent without any central coordinator for data placement.
The coordinating node consults the cluster state (which it keeps in memory, synced from the cluster manager) to know which nodes currently hold which shards. Then it builds a list of target shards. If you have replicas, it intelligently round-robins between primary and replica shards to spread the load.
The Two-Phase Search Pattern
Here's where it gets interesting. OpenSearch doesn't just broadcast your query and wait for full documents. It uses a two-phase search pattern to minimize network traffic and memory pressure:
- Query Phase -- Lightweight. Ask each shard for the top-K document IDs and their scores.
- Fetch Phase -- Only if needed. Retrieve the full document source for the final top results.
This separation is crucial. Imagine a query that matches 10 million documents across 50 shards. You don't want 50 nodes sending you 10 million full JSON documents. You want them to tell you: "Here are the 100 best matches from my slice of the data." Then you pick the globally best 100 and fetch only those.

Photo by Christina Morillo from Pexels
Phase 1: Query Execution at the Shard Level
Each target shard receives the query and executes it independently. This is the scatter part of the scatter-gather pattern. The shard's local search happens against its Lucene segments -- the immutable, append-only data structures that Lucene (and therefore OpenSearch) uses for indexing.
Segments and Concurrent Search
Each shard contains multiple Lucene segments. In older OpenSearch versions, segments within a shard were searched sequentially. Starting with OpenSearch 3.0, concurrent segment search is enabled by default in "auto" mode.
When active, the segments are divided into slices and searched in parallel. The default slicing mechanism calculates:
max_slices = max(1, min(CPU_cores / 2, 4))
This is a game-changer for aggregations and large-range queries -- but it comes with trade-offs. It increases CPU usage, and it doesn't support certain query types like parent aggregations on join fields or queries with terminate_after. The coordinating node makes this decision per-query based on the query type and any pluggable deciders.
BM25 Scoring: Relevance at the Shard Level
For each matching document, the shard computes a relevance score using BM25 -- the default Okapi BM25 algorithm. This is computed at query time, not pre-computed at index time. The formula considers:
- Term Frequency (TF) -- How often the query term appears in the document
- Inverse Document Frequency (IDF) -- How rare the term is across all documents in the shard
-
Field Length Normalization -- Controlled by the
bparameter (default 0.75)
The b parameter is worth tuning. A higher value means longer documents are penalized more heavily. If your index has mixed-length documents -- say, tweets and blog posts -- you might want to lower b to 0.5 so shorter documents don't get an unfair boost.
Each segment returns its local top-K results (document IDs + scores) to the shard, which merges them and sends the shard's top-K to the coordinating node.
Phase 2: Gather and Merge
The coordinating node now has a collection of top-K results from every participating shard. It merges them by score to produce the global top-K. This is the gather phase.
Only now -- if the client requested full documents (the default for _search) -- does the coordinating node enter the fetch phase. It sends targeted requests to the shards that hold the winning documents, asking for the full _source JSON. The shards return the document bodies, and the coordinating node assembles the final response, preserving the globally sorted order.
This two-phase design means that even if your query touches 100 shards, the expensive full-document fetching only happens for the final size results (default 10). Everything else is just lightweight IDs and scores.
Why This Matters: Query Tuning and Debugging
Understanding this flow explains several common OpenSearch behaviors and optimization strategies:
1. The Refresh Interval and Near-Real-Time Search
When you index a document, it goes to an in-memory buffer and the translog. It's not searchable until the next refresh (default every 1 second), which creates a new Lucene segment. This is why OpenSearch is "near-real-time" -- not truly real-time. You can force a refresh or reduce the interval, but more frequent refreshes mean more segments and more merge pressure.
2. Deep Pagination is Expensive
If you request from=10000, size=100, the coordinating node must ask every shard for its top 10100 results, merge them all, and then throw away 10000 of them. This is why from + size pagination doesn't scale past a few thousand documents. Use search_after or the scroll API for deep pagination instead.
3. Relevance is Shard-Relative, Not Global
Because BM25 IDF is computed per-shard, a term that appears in 50% of documents on one shard but 1% on another will score differently. For small shards, this can cause noticeable scoring inconsistencies. The search_type=dfs_query_then_fetch option pre-computes global term statistics before the query phase -- but it adds a round-trip, so use it only when you need perfectly consistent scoring across shards.
4. Custom Routing for Performance
If your queries naturally filter by a dimension -- user ID, tenant ID, geographic region -- you can use that dimension as a custom routing key. This ensures all documents for that dimension live on the same shard, so only one shard needs to execute the query. The coordinating node becomes mostly a pass-through, and your query latency drops dramatically.
5. Force Merge with Caution
Segments are immutable, so background merging consolidates small segments into larger ones. Fewer segments generally mean faster searches -- but force-merging to a single segment is expensive and can hurt write performance. It also loses the benefits of concurrent segment search, since there's nothing left to parallelize.
The Plugin Perspective: Where the Query Path Can Be Extended
One of the reasons I find OpenSearch powerful is its plugin architecture. At various points in the query path, plugins can inject custom behavior:
- SearchPlugin registers custom query types, aggregations, and scorers -- directly affecting how the query phase executes at the shard level.
- ActionPlugin intercepts and modifies REST requests before they even reach the search logic.
- ConcurrentSearchRequestDecider (new in 2.17+) allows plugins to customize whether concurrent segment search activates for a given query.
In my work on the search-relevance plugin, I've seen how experiments with different scoring models can be plugged in at the SearchPlugin layer and executed during the query phase without changing the core scatter-gather flow. The architecture is modular enough to extend without breaking the fundamentals.
Conclusion: Respect the Query Path
A search query in OpenSearch isn't a simple database lookup. It's a distributed computation that involves:
- Hash routing to locate shards
- Scatter phase: parallel query execution across shards and segments
- BM25 scoring computed locally per shard
- Gather phase: merging and ranking across the cluster
- Optional fetch phase: retrieving full documents only for the winners
When you understand this flow, you stop treating OpenSearch like a black box. You start asking the right questions: Is my query hitting too many shards? Should I use custom routing? Is my refresh interval too aggressive? Why is this aggregation slow on large segments?
The best OpenSearch operators I know all have a mental model of this query path. I hope this post helps you build yours.
I'm Prithvi S, Staff Software Engineer at Cloudera and an OpenSearch contributor. I work on search-relevance and plugin internals. Follow my open-source work on GitHub: https://github.com/iprithv
Tags: opensearch, search-engine, distributed-systems, elasticsearch, lucene, backend, database, performance
Top comments (0)