OpenSearch is an open‑source search and analytics engine built on Apache Lucene. When you send a search request, a complex dance of components runs behind the scenes to turn a simple HTTP call into a ranked list of results. In this article we follow a query from the moment it hits the REST endpoint all the way to the final merged response, explaining each step in plain language while preserving the technical depth that engineers expect.
1. The Entry Point – REST Request
Everything starts with an HTTP request to the OpenSearch REST API, typically a GET /my-index/_search with a JSON body describing the query. The request can include parameters such as size, from, and sorting directives. The client can be anything from curl to a Python SDK, but the wire format is always the same: JSON over HTTP.
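To make this concrete, here is a minimal sketch of such a request using the Python requests library. The cluster URL (http://localhost:9200), the index name my-index, and the title field are all assumptions for illustration, not something from this article.

```python
import requests

# A basic match query. POST is used instead of GET only because some HTTP
# clients are awkward about GET requests with a body; OpenSearch accepts both.
query = {
    "query": {"match": {"title": "opensearch internals"}},
    "from": 0,     # pagination offset
    "size": 10,    # number of hits to return
    "sort": ["_score"],
}

resp = requests.post(
    "http://localhost:9200/my-index/_search",
    json=query,
    timeout=10,
)
resp.raise_for_status()

for hit in resp.json()["hits"]["hits"]:
    print(hit["_score"], hit["_source"])
```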
OpenSearch runs a lightweight HTTP server that parses the request and hands it over to the coordinating node – the node that received the request. In a cluster, any node can act as the coordinating node; it does not need to store data.
2. Routing the Request – Shard Selection
OpenSearch stores data in shards – Lucene indexes that are distributed across the cluster. Each document is assigned to a primary shard based on a routing value, which defaults to the document _id. The routing formula is:
hash(routing) % number_of_primary_shards
The coordinating node uses this hash to work out which shards hold the data the query touches. For a simple term query across a single index, it has to contact one copy (primary or replica) of every shard in that index. If the query targets a specific routing value, the set of shards can shrink dramatically, often to a single shard, which improves latency.
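The toy Python function below illustrates the mechanics. It uses Python's built-in hash() purely for readability; OpenSearch actually uses a Murmur3 hash and also takes settings such as index.routing_partition_size into account.

```python
def select_shard(routing_value: str, number_of_primary_shards: int) -> int:
    """Simplified shard selection: hash(routing) % number_of_primary_shards.

    OpenSearch uses a Murmur3 hash internally; Python's hash() is only a
    stand-in to show the idea.
    """
    return hash(routing_value) % number_of_primary_shards

# By default the routing value is the document _id ...
print(select_shard("doc-42", 5))   # the shard that would hold document "doc-42"

# ... but a custom routing value (e.g. a user ID) lets a query sent with
# ?routing=user-7 touch only one shard instead of all five.
print(select_shard("user-7", 5))
```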
3. Query Phase – Parallel Execution on Shards
Once the responsible shards are known, the coordinating node forwards the query to the data nodes that host those shards (which may include the coordinating node itself). Each shard executes the query locally against its Lucene segments. This phase has two important sub-steps:
3.1 Segment Search
Lucene stores data in immutable segments. During the query phase, each segment is searched independently. OpenSearch can search segments in parallel within a shard – a feature called concurrent segment search, which became generally available in version 2.12. The engine automatically decides how many slices to create based on the number of CPU cores and the size of the segments.
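As a rough mental model, here is a Python sketch that fans the per-segment work out over a thread pool. The "segments", the term-matching logic, and the one-task-per-segment split are all invented for illustration; real concurrent segment search groups segments into slices by size and uses Lucene's own scoring.

```python
from concurrent.futures import ThreadPoolExecutor

# Toy "segments": each is an immutable list of (doc_id, text) pairs.
segments = [
    [(1, "opensearch query phase"), (2, "lucene segments")],
    [(3, "bm25 scoring"), (4, "concurrent segment search")],
    [(5, "fetch phase"), (6, "coordinating node")],
]

def search_segment(segment, term):
    """Stand-in for searching a single segment: return matching doc IDs
    with a trivial 'score' (here just the term count)."""
    return [(doc_id, text.split().count(term))
            for doc_id, text in segment
            if term in text.split()]

# Search all segments of one shard in parallel, then combine the hits.
with ThreadPoolExecutor() as pool:
    per_segment = pool.map(lambda seg: search_segment(seg, "segment"), segments)

hits = sorted((h for part in per_segment for h in part),
              key=lambda pair: pair[1], reverse=True)
print(hits)
```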
3.2 Scoring with BM25
For each matching document, Lucene computes a relevance score using the BM25 algorithm. The key ingredients are term frequency, inverse document frequency, and two tuning parameters: k1, which controls term-frequency saturation (default 1.2), and b, which controls document-length normalisation (default 0.75). Each shard returns only the IDs and scores of its local top hits (from + size of them, 10 by default), not the documents themselves.
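For intuition, here is the textbook BM25 formula in Python with the default parameters. Lucene's implementation differs in small details (for example how it smooths IDF and which constant factors it drops, since they do not change the ranking), so treat the numbers as illustrative only.

```python
import math

def bm25_score(tf, doc_len, avg_doc_len, doc_count, doc_freq, k1=1.2, b=0.75):
    """Textbook BM25 for a single query term in a single document.

    tf           - occurrences of the term in the document
    doc_len      - length of the document's field, in tokens
    avg_doc_len  - average field length across the index
    doc_count    - number of documents in the index (N)
    doc_freq     - number of documents containing the term (df)
    """
    idf = math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))
    norm = k1 * (1 - b + b * doc_len / avg_doc_len)
    return idf * (tf * (k1 + 1)) / (tf + norm)

# Example: a term that occurs twice in a slightly shorter-than-average document.
print(bm25_score(tf=2, doc_len=80, avg_doc_len=100, doc_count=10_000, doc_freq=50))
```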
4. Fetch Phase – Getting Full Documents
The query phase only returns document IDs and scores. If the client also wants the _source field (the default behaviour), a second round called the fetch phase runs once the coordinating node has merged the per-shard results. The coordinating node asks only the shards that hold the winning documents for their full source. Those shards read the stored _source from Lucene's stored fields and send it back.
Because the fetch phase may involve moving larger payloads over the network, OpenSearch tries to keep the number of fetched documents small. This is why pagination (from/size) and stored_fields filters are important performance knobs.
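A hedged sketch of trimming the fetch phase from Python, again assuming a local cluster, an index named my-index, and hypothetical title and published_at fields:

```python
import requests

# Ask the fetch phase for less data: only two source fields and only
# the first five hits.
query = {
    "query": {"match": {"title": "opensearch"}},
    "_source": ["title", "published_at"],  # trim the fetched payload
    "from": 0,
    "size": 5,
}

resp = requests.post("http://localhost:9200/my-index/_search",
                     json=query, timeout=10)
print(resp.json()["hits"]["total"])
```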
5. Merging Results – The Coordinating Node’s Role
After receiving the top-k results from each shard, the coordinating node merges them into a single ranked list. It sorts the combined set by the BM25 scores returned by the shards (or by the custom sort keys, if the query specifies any) and only then applies the global from and size parameters.
The final merged list is then formatted as a JSON response and sent back to the client.
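Conceptually this is a k-way merge of already-sorted per-shard lists. A minimal Python sketch, with scores and document IDs invented for illustration:

```python
import heapq

# Each shard returns its hits already sorted by descending score.
shard_results = [
    [(9.1, "doc-3"), (7.4, "doc-8"), (2.2, "doc-1")],   # shard 0
    [(8.6, "doc-5"), (3.0, "doc-9")],                   # shard 1
    [(9.8, "doc-2"), (5.5, "doc-7"), (1.1, "doc-4")],   # shard 2
]

def merge_hits(per_shard, frm=0, size=10):
    """k-way merge of per-shard (score, doc_id) lists, highest score first,
    then apply the global from/size window."""
    merged = heapq.merge(*per_shard, key=lambda hit: hit[0], reverse=True)
    return list(merged)[frm:frm + size]

print(merge_hits(shard_results, frm=0, size=3))
# [(9.8, 'doc-2'), (9.1, 'doc-3'), (8.6, 'doc-5')]
```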
6. Behind the Scenes – Refresh, Translog, and Near‑Real‑Time Search
While the query is being processed, OpenSearch maintains a near-real-time view of the data. New documents are first written to an in-memory buffer and appended to the translog for durability. Every second (the default refresh interval) a refresh writes the buffer out as a new Lucene segment, making the freshly indexed documents searchable. This means there is typically a sub-second lag between indexing a document and seeing it in search results.
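Both knobs are exposed through the REST API. A short sketch with the Python requests library, assuming a local cluster and an index named my-index:

```python
import requests

BASE = "http://localhost:9200/my-index"

# Relax the refresh interval on a write-heavy index (fewer, larger segments,
# at the cost of documents taking longer to become searchable).
requests.put(f"{BASE}/_settings",
             json={"index": {"refresh_interval": "30s"}}, timeout=10)

# Or force an immediate refresh when a test or workflow needs
# just-indexed documents to be visible right away.
requests.post(f"{BASE}/_refresh", timeout=10)
```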
7. Practical Tips for Optimising Queries
| Issue | Why it Happens | Mitigation |
|---|---|---|
| Slow query latency | Too many shards queried, high segment count | Use routing, configure index.routing_partition_size, force-merge to reduce segments |
| High CPU usage | Concurrent segment search on large shards | Tune search.max_concurrent_shard_requests and search.max_concurrent_segments |
| Stale results | Refresh interval too large for real-time needs | Reduce index.refresh_interval on hot indices |
| Large payloads | Fetching full _source for many docs | Use stored_fields or docvalue_fields, limit size |
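Two of the mitigations above can be exercised directly from Python. This is a sketch under assumptions: a local cluster, an index named my-index that is no longer being written to, a hypothetical owner field, and documents indexed with a user-42 routing value.

```python
import requests

BASE = "http://localhost:9200/my-index"

# Force-merge a read-only index down to one segment per shard,
# so each shard search touches fewer segments.
requests.post(f"{BASE}/_forcemerge",
              params={"max_num_segments": 1}, timeout=300)

# Route the search so only the shard holding the "user-42" documents is
# queried, assuming they were indexed with the same routing value.
query = {"query": {"match": {"owner": "user-42"}}}
resp = requests.post(f"{BASE}/_search",
                     params={"routing": "user-42"},
                     json=query, timeout=10)
print(resp.json()["hits"]["total"])
```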
8. Conclusion
A search query in OpenSearch is more than a simple HTTP call. It involves routing, parallel shard execution, scoring, optional fetching, and a final merge step that stitches everything together. Understanding each stage helps you design better schemas, tune performance, and avoid common pitfalls such as unnecessary shard scans or excessive refresh intervals.
By visualising the journey of a query, you gain the confidence to diagnose latency issues, choose the right indexing strategies, and make the most of OpenSearch’s powerful plugin and analysis ecosystems.
Author bio: I'm Prithvi S, a Staff Software Engineer at Cloudera and an open-source enthusiast. Follow my work on GitHub: https://github.com/iprithv