Understanding the two-phase search architecture that powers every OpenSearch query
Every time you run a search query in OpenSearch, the engine executes a carefully orchestrated two-phase dance across your cluster. The query phase finds the right documents. The fetch phase retrieves their contents. Understanding how these phases work - and how they interact - is the difference between a search that returns in milliseconds and one that times out.
As someone who has spent the last year maintaining the search-relevance plugin and contributing to OpenSearch core, I have seen how often engineers conflate these two phases. They optimize the wrong things, tune the wrong parameters, and wonder why their queries still lag. This post is what I wish I had read when I started.
Why Two Phases?
OpenSearch distributes data across multiple shards, often on multiple nodes. When you search, the coordinating node must ask every shard to participate. But there is a critical insight: finding the most relevant documents is lighter than retrieving their full contents.
If OpenSearch retrieved the full _source of every document from every shard just to find the top 10 results, the network bandwidth and I/O would be catastrophic. Instead, the engine splits the work:
Phase 1 - Query: Each shard finds its local top-N results, returning only lightweight metadata (doc IDs and scores).
Phase 2 - Fetch: The coordinating node identifies the true global top-N, then asks only the relevant shards for the full document contents.
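This is the scatter-gather pattern in practice. The query phase scatters the request to every shard and gathers back lightweight results. The fetch phase then makes a second, much narrower round trip for the documents themselves.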
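To make the split concrete, here is a minimal sketch of both phases over in-memory data. It is a toy model, not how OpenSearch is implemented: the `SHARDS` dictionary, the scores, and the function names are all made up for illustration.

```python
import heapq

# Hypothetical in-memory shards: doc_id -> (precomputed score, _source).
SHARDS = {
    "shard-0": {"a": (2.1, {"title": "alpha"}), "b": (0.4, {"title": "beta"})},
    "shard-1": {"c": (3.7, {"title": "gamma"}), "d": (1.9, {"title": "delta"})},
    "shard-2": {"e": (0.9, {"title": "epsilon"}), "f": (2.8, {"title": "zeta"})},
}

def query_phase(shard_name, top_n):
    """Return the shard's local top-N as (score, shard, doc_id). No _source."""
    docs = SHARDS[shard_name]
    hits = [(score, shard_name, doc_id) for doc_id, (score, _src) in docs.items()]
    return heapq.nlargest(top_n, hits)

def fetch_phase(global_hits):
    """Load _source only for documents that made the global top-N."""
    return [
        {"_id": doc_id, "_score": score, "_source": SHARDS[shard][doc_id][1]}
        for score, shard, doc_id in global_hits
    ]

def search(top_n):
    candidates = []
    for shard_name in SHARDS:  # scatter to every shard
        candidates.extend(query_phase(shard_name, top_n))
    global_hits = heapq.nlargest(top_n, candidates)  # merge on the coordinator
    return fetch_phase(global_hits)  # second, narrower round trip

results = search(top_n=2)
print([hit["_id"] for hit in results])
```

Six candidates are ranked, but only two documents ever have their contents loaded.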
Phase 1: The Query Phase
What Happens on Each Shard
When a search request arrives at the coordinating node, it is forwarded to one copy (primary or replica) of every shard in the index, or only to the relevant shards if routing is specified. Here is what each shard does:
- Parse the query - The query DSL is converted into a Lucene Query object.
- Execute against segments - The query runs against all Lucene segments in the shard. With concurrent segment search (enabled by default in OpenSearch 3.0), these segments may be searched in parallel across CPU slices.
- Score documents - BM25 scoring is computed for each matching document. Term frequency and inverse document frequency are calculated locally within that shard.
- Collect top-K - Using a priority queue, the shard keeps only the top N results by score (where N is from + size).
- Return metadata - The shard returns only document IDs, scores, and sort values - never the full _source.
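The "collect top-K" step above can be sketched with a bounded min-heap. This is an illustration of the technique rather than Lucene's actual collector, and the document IDs and scores are invented.

```python
import heapq

def collect_top_k(scored_docs, k):
    """Keep the k best (score, doc_id) pairs using a min-heap of size k."""
    heap = []
    for doc_id, score in scored_docs:
        if len(heap) < k:
            heapq.heappush(heap, (score, doc_id))
        elif score > heap[0][0]:
            # The new document beats the current worst entry, so swap it in.
            heapq.heapreplace(heap, (score, doc_id))
    return sorted(heap, reverse=True)

# Hypothetical matches on one shard, in the order they were scored.
matches = [("d1", 0.3), ("d2", 2.5), ("d3", 1.1), ("d4", 4.0), ("d5", 0.9)]
from_, size = 1, 2
top = collect_top_k(matches, from_ + size)
print(top)
```

The heap never grows past from + size entries, so memory stays bounded no matter how many documents match.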
The Local Scoring Problem
Here is a subtle but important detail: BM25 scores are computed using local shard statistics. Each shard calculates term frequency and document frequency based only on the documents it holds. This means a term that is rare globally but common on one shard will be scored differently there.
For most use cases, this approximation is good enough. But when you need truly accurate global scoring, OpenSearch offers dfs_query_then_fetch:
GET /my-index/_search?search_type=dfs_query_then_fetch
{
  "query": {
    "match": { "title": "distributed search" }
  }
}
This adds a pre-query round-trip: the coordinating node fetches term statistics from all shards, then broadcasts them back so every shard uses global IDF. The trade-off? One extra network hop for scoring accuracy.
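A small calculation shows how far local and global statistics can diverge. The sketch uses the IDF component of Lucene's BM25 formula. The shard sizes and document frequencies are hypothetical numbers chosen to exaggerate the skew.

```python
import math

def bm25_idf(doc_count, doc_freq):
    """IDF term of BM25: ln(1 + (N - n + 0.5) / (n + 0.5))."""
    return math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))

# Hypothetical stats for one term: (docs in shard, docs containing the term).
shard_stats = {"shard-0": (1000, 5), "shard-1": (1000, 400)}

# query_then_fetch: every shard scores with its own statistics.
local_idf = {name: bm25_idf(n, df) for name, (n, df) in shard_stats.items()}

# dfs_query_then_fetch: statistics are summed first, then shared with all shards.
total_docs = sum(n for n, _ in shard_stats.values())
total_df = sum(df for _, df in shard_stats.values())
global_idf = bm25_idf(total_docs, total_df)

for name, value in local_idf.items():
    print(f"{name}: local idf={value:.3f}, global idf={global_idf:.3f}")
```

With local statistics, the same term is weighted several times more heavily on shard-0 than on shard-1. With global statistics, both shards use the same weight.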
Aggregation Execution
If your query includes aggregations, they execute during the query phase too. Each shard builds its local aggregation buckets, then the coordinating node merges them. This is why deeply nested aggregations on high-cardinality fields can be so expensive - every shard is building and serializing large bucket structures.
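The merge step can be pictured as summing per-shard counters. This is a simplified sketch of a terms aggregation reduce with invented service names and counts. The real reduce also tracks error bounds and handles many aggregation types.

```python
from collections import Counter

# Hypothetical per-shard buckets for a terms aggregation on a service field.
shard_buckets = [
    Counter({"checkout": 120, "search": 80}),
    Counter({"checkout": 30, "auth": 55}),
    Counter({"search": 10, "auth": 5}),
]

def reduce_terms(partials, size):
    """Sum bucket counts across shards and keep the top `size` terms."""
    merged = Counter()
    for partial in partials:
        merged.update(partial)
    return merged.most_common(size)

top = reduce_terms(shard_buckets, size=2)
print(top)
```

Every bucket from every shard has to be serialized, shipped, and merged, which is why bucket count matters far more here than hit count.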
Phase 2: The Fetch Phase
The Document Retrieval Problem
After the query phase completes, the coordinating node holds every shard's local top-N list, but only as document IDs and scores. To return _source fields, highlighting, or script fields, it must now fetch them from the shards that own the winning documents.
What Happens During Fetch
- Merge and rank - The coordinating node merges all shard results, re-sorting by score to produce the true global top-N.
- Request documents - For each of the top-N document IDs, the coordinating node asks the owning shard for the full document content.
- Retrieve stored fields - Each shard fetches the requested fields from its stored field cache or disk.
- Apply highlighting - If requested, the highlighter (unified, plain, or fast vector) generates match snippets by re-analyzing the relevant text.
- Return results - The final documents are assembled and returned to the client.
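The "request documents" step above is worth a closer look, because only shards that own a winning document are contacted. A minimal sketch, with made-up shard names and hits:

```python
from collections import defaultdict

def plan_fetch(global_hits):
    """Group winning doc IDs by owning shard: one fetch request per shard."""
    per_shard = defaultdict(list)
    for shard, doc_id, _score in global_hits:
        per_shard[shard].append(doc_id)
    return dict(per_shard)

# Hypothetical global top-3 after the coordinator's merge: (shard, doc_id, score).
global_top = [("shard-1", "c", 3.7), ("shard-2", "f", 2.8), ("shard-1", "d", 1.9)]

plan = plan_fetch(global_top)
print(plan)
```

In this example shard-0 took part in the query phase but receives no fetch request at all.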
The Hidden Cost
The fetch phase is where hidden costs emerge. If you request:
- Large _source documents (10KB+ each)
- Highlighting on large fields
- Script fields that execute at fetch time
...the fetch phase can dominate your query latency. The query phase might finish in 5ms, but the fetch phase takes 200ms because it is loading and highlighting a handful of very large documents.
Deep pagination (from: 10000, size: 10) is often blamed on the fetch phase too, but its cost lands in the query phase: every shard has to collect and rank from + size candidates, while only the final size documents are ever fetched.
How size and from Control Everything
The size and from parameters are the primary levers for controlling both phases:
GET /my-index/_search
{
  "from": 0,
  "size": 10,
  "query": { "match_all": {} }
}
Query Phase Impact
During the query phase, each shard must return from + size results. If you have 5 shards and request from: 100, size: 10, every shard must find and return its top 110 results. The coordinating node then merges 550 results to find the true global top 10.
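With 10 shards and from: 10000, size: 10? Every shard returns 10,010 results. The coordinating node merges 100,100 results. This is why deep pagination is so expensive in OpenSearch, and why the index.max_result_window setting rejects any request where from + size exceeds 10,000 unless you raise the limit.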
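The arithmetic is simple enough to put in a helper. The function name and return shape are mine, and the numbers are upper bounds: a shard with fewer matches than from + size returns fewer entries.

```python
def pagination_cost(num_shards, from_, size):
    """Upper bound on entries moved around for one from/size request."""
    per_shard = from_ + size
    return {
        "per_shard": per_shard,                      # collected by each shard
        "coordinator_merge": per_shard * num_shards,  # merged on the coordinator
        "fetched": size,                              # documents actually loaded
    }

shallow = pagination_cost(num_shards=5, from_=100, size=10)
deep = pagination_cost(num_shards=10, from_=10000, size=10)
print(shallow)
print(deep)
```

The "fetched" number stays at 10 in both cases. Everything that grows, grows in the query phase.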
Fetch Phase Impact
The fetch phase only retrieves documents for the final returned set. So with size: 10, only 10 documents are fetched, regardless of from. But the query phase already paid the price of finding and ranking all those candidate documents.
The terminate_after Escape Hatch
For use cases where you only need approximate results, terminate_after tells each shard to stop after collecting N matching documents:
GET /my-index/_search
{
  "terminate_after": 1000,
  "query": { "match": { "status": "active" } }
}
This dramatically speeds up the query phase, but you might miss better-scoring documents that the shard would have reached later. Use it for analytics, not for user-facing search.
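The trade-off is easy to reproduce in a toy model. Here a shard is just a list of matching documents in collection order, which is an assumption of the sketch. Real collection order depends on segments and doc IDs.

```python
def shard_query(matching_docs, size, terminate_after=None):
    """Rank the collected docs by score, optionally stopping collection early."""
    if terminate_after is None:
        collected = matching_docs
    else:
        collected = matching_docs[:terminate_after]
    return sorted(collected, key=lambda doc: doc[1], reverse=True)[:size]

# Hypothetical matches in collection order: (doc_id, score).
docs = [("d1", 1.0), ("d2", 0.5), ("d3", 0.8), ("d4", 9.0)]

full = shard_query(docs, size=1)
early = shard_query(docs, size=1, terminate_after=3)
print(full, early)
```

The best document, d4, is collected last, so the early-terminated run never sees it and returns d1 instead.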
Performance Tuning: Query Phase
1. Reduce Shard Count When Possible
More shards = more work in the query phase. If you have 50 shards for a 10GB index, you are creating unnecessary scatter-gather overhead. Aim for shards in the 20-50GB range.
2. Use Routing for Targeted Queries
If your queries always filter by a specific field (like user_id), use custom routing to ensure all of a user's documents land on the same shard:
POST /my-index/_doc?routing=user_123
{
  "user_id": "user_123",
  "content": "..."
}
Now searches with routing=user_123 hit only one shard instead of all of them. On a 10-shard index, that is a tenth of the query-phase fan-out.
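Conceptually, routing is a hash of the routing value modulo the number of primary shards. The sketch below uses CRC32 as a stand-in hash, which is an assumption for illustration only: OpenSearch uses its own hash function, so the shard numbers here will not match a real cluster.

```python
import zlib

def shard_for(routing_value, num_primary_shards):
    """Map a routing value to a shard number.

    Stand-in hash: CRC32 is used here only because it is deterministic and
    in the standard library. It is NOT the hash OpenSearch uses.
    """
    return zlib.crc32(routing_value.encode("utf-8")) % num_primary_shards

NUM_SHARDS = 10  # hypothetical index with 10 primary shards
writes = [("user_123", "doc-1"), ("user_456", "doc-2"), ("user_123", "doc-3")]

placement = {}
for routing_value, doc_id in writes:
    placement.setdefault(shard_for(routing_value, NUM_SHARDS), []).append(doc_id)

target = shard_for("user_123", NUM_SHARDS)
print(f"search with routing=user_123 only needs shard {target}")
```

Because the mapping is deterministic, every document written with the same routing value lands on the same shard, and a search with that routing value can skip the rest.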
3. Leverage Concurrent Segment Search
OpenSearch 3.0 enables concurrent segment search by default. For long-running queries with aggregations, this parallelizes segment processing within each shard. Ensure your CPU has cores to spare, or the overhead of thread coordination may not be worth it.
4. Be Careful with Aggregations
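Terms aggregations on high-cardinality fields (like user_id or ip_address) force each shard to build massive bucket arrays. If you do not need exact counts, use shard_size to control how many candidate buckets each shard sends to the coordinating node (the default is size * 1.5 + 10). Smaller values mean less to serialize and merge; larger values give more accurate counts: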
{
  "aggs": {
    "top_users": {
      "terms": {
        "field": "user_id",
        "size": 10,
        "shard_size": 100
      }
    }
  }
}
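To see why shard_size is an accuracy knob and not only a cost knob, consider a small simulation. The two shards and their counts are invented so that the globally most common term is second place on each shard.

```python
from collections import Counter

def terms_agg(shards, size, shard_size):
    """Each shard reports its top `shard_size` terms; the coordinator sums them."""
    merged = Counter()
    for counts in shards:
        for term, doc_count in counts.most_common(shard_size):
            merged[term] += doc_count
    return merged.most_common(size)

# Hypothetical per-shard term counts. "y" is the true winner with 18 docs.
shards = [
    Counter({"x": 10, "y": 9, "z": 1}),
    Counter({"z": 10, "y": 9, "x": 1}),
]

too_small = terms_agg(shards, size=1, shard_size=1)
large_enough = terms_agg(shards, size=1, shard_size=2)
print(too_small, large_enough)
```

With shard_size of 1, neither shard reports "y", so the coordinator cannot know it is the real top term. Raising shard_size to 2 fixes the result at the cost of shipping more buckets.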
Performance Tuning: Fetch Phase
1. Use docvalue_fields Instead of _source
If you only need specific fields that are stored as doc values (keyword, numeric, date, boolean), request them directly instead of parsing the full _source:
GET /my-index/_search
{
  "query": { "match_all": {} },
  "docvalue_fields": ["status", "created_at"]
}
Doc values are columnar and much faster to retrieve than the full JSON _source.
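The difference between the two layouts can be sketched in a few lines. This is a conceptual model with made-up documents, not the on-disk format: _source behaves like one serialized blob per document, while doc values behave like one array per field.

```python
import json

# Hypothetical three-document index. Field names are illustrative.
docs = [
    {"status": "active", "created_at": 1625097600000, "body": "x" * 5000},
    {"status": "closed", "created_at": 1625184000000, "body": "y" * 5000},
    {"status": "active", "created_at": 1625270400000, "body": "z" * 5000},
]

# Row-oriented, like _source: one serialized blob per document.
source_store = [json.dumps(doc) for doc in docs]

# Column-oriented, like doc values: one array per field, indexed by doc number.
doc_values = {
    "status": [doc["status"] for doc in docs],
    "created_at": [doc["created_at"] for doc in docs],
}

def read_from_source(doc_num, fields):
    parsed = json.loads(source_store[doc_num])  # parses the 5KB body as well
    return {field: parsed[field] for field in fields}

def read_from_doc_values(doc_num, fields):
    return {field: doc_values[field][doc_num] for field in fields}

wanted = ["status", "created_at"]
print(read_from_source(1, wanted) == read_from_doc_values(1, wanted))
```

Both paths return the same values, but the row-oriented path has to decode the large body field it never uses.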
2. Disable _source if You Do Not Need It
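For purely analytic queries where you only care about aggregation results, you do not need hits at all. Setting _source to false stops OpenSearch from loading document bodies, but the default 10 hits are still fetched for their metadata. Set size to 0 as well:
GET /my-index/_search
{
  "size": 0,
  "_source": false,
  "aggs": {
    "status_counts": {
      "terms": { "field": "status" }
    }
  }
}
With size: 0 there are no hits to retrieve, so the fetch phase is skipped entirely.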
3. Avoid Highlighting Large Fields
Highlighting requires re-analyzing the matched text. If your documents have 100KB text fields and you highlight them, the fetch phase will grind. Consider:
- Using the fast vector highlighter (type fvh) with term_vector storage (trades index size for fetch speed)
- Limiting fragment size and count
- Storing a truncated summary field specifically for highlighting
4. Use Search After for Deep Pagination
Instead of from: 10000, use search_after with a point-in-time (PIT) for efficient deep pagination. A PIT search leaves the index name out of the path, because the PIT already pins the indexes it was created on:
GET /_search
{
  "size": 10,
  "pit": { "id": "my-pit-id", "keep_alive": "1m" },
  "search_after": [1625097600000, "doc_123"],
  "sort": [{ "created_at": "asc" }, { "_id": "asc" }]
}
This avoids the from + size explosion in the query phase and is the recommended approach for scrolling through large result sets.
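The cursor logic behind search_after can be simulated over a sorted list. This sketch assumes hits are already sorted by (created_at, _id) and uses invented values. In a real client, the cursor would be the sort array of the last hit on the previous page.

```python
import bisect

# Hypothetical hits already sorted by (created_at, _id), ascending.
SORTED_HITS = [(1, "a"), (1, "b"), (2, "c"), (3, "d"), (3, "e")]

def search_after_page(size, after=None):
    """Return the next `size` hits strictly after the `after` sort values."""
    start = 0 if after is None else bisect.bisect_right(SORTED_HITS, after)
    return SORTED_HITS[start:start + size]

def iterate_all(size):
    """Walk the whole result set, using the last hit of each page as cursor."""
    pages, cursor = [], None
    while True:
        page = search_after_page(size, cursor)
        if not page:
            return pages
        pages.append(page)
        cursor = page[-1]

pages = iterate_all(size=2)
print(pages)
```

Each page costs the same as the first one, however deep you go. The second sort key acts as a tiebreaker: without it, documents sharing a created_at value could be skipped or repeated across page boundaries.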
Real-World Example: Debugging a Slow Query
Let me walk through how I debugged a slow search in production. The query was timing out at 30 seconds:
GET /logs-*/_search
{
  "from": 0,
  "size": 50,
  "query": { "match": { "message": "error" } },
  "highlight": { "fields": { "message": {} } },
  "aggs": {
    "by_service": { "terms": { "field": "service.name", "size": 100 } }
  }
}
The problem: 100 shards, each with 5GB of logs. The query phase was fast (BM25 on "error" is selective). But the fetch phase was retrieving 50 full log messages, each 50KB, and highlighting them across 100 shards.
The fix:
- Reduced size from 50 to 10 (users do not need 50 highlighted logs)
- Added "summary": { "type": "text" } field with 500-char limit for highlighting
- Switched highlighting to the summary field only
- Added "shard_size": 20 to the aggregation
Query time dropped from 30 seconds to 800ms. The query phase was never the problem - the fetch phase was.
The Key Insight
Most OpenSearch performance problems are not query problems. They are fetch problems. The query phase is highly optimized - Lucene is incredibly fast at finding and scoring documents. The fetch phase is where your data model, field sizes, and retrieval strategy matter.
When you are debugging slow searches, ask yourself:
- Is the query phase slow? (Check with size: 0 and no aggregations)
- Is the fetch phase slow? (Compare with and without _source retrieval)
- Am I retrieving too many documents? (Reduce size)
- Am I highlighting too much text? (Use dedicated summary fields)
- Am I forcing deep pagination? (Use search_after)
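One way to work through the first two questions is to derive a query-phase-only probe from the slow request and compare the took value of each response. The helper below is a hypothetical convenience of mine, not part of any client library. It only manipulates the request body.

```python
import copy

def query_phase_only(body):
    """Copy of a search body that matches and scores but returns no hits.

    With size set to 0 there is nothing to fetch, and dropping highlighting
    and aggregations leaves only matching and scoring work.
    """
    variant = copy.deepcopy(body)
    variant["size"] = 0
    for key in ("from", "highlight", "aggs", "aggregations"):
        variant.pop(key, None)
    return variant

# The slow request from the debugging example, as a Python dict.
slow_body = {
    "from": 0,
    "size": 50,
    "query": {"match": {"message": "error"}},
    "highlight": {"fields": {"message": {}}},
    "aggs": {"by_service": {"terms": {"field": "service.name", "size": 100}}},
}

probe = query_phase_only(slow_body)
print(probe)
```

If the probe is fast and the original is slow, the time is going to fetch work or aggregations, not to matching and scoring.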
Conclusion
The two-phase search architecture in OpenSearch is not an implementation detail - it is the fundamental design that makes distributed search feasible. The query phase is about finding and ranking. The fetch phase is about retrieving and formatting. They have different performance characteristics, different tuning parameters, and different failure modes.
Understanding this split lets you diagnose problems faster, tune the right parameters, and build searches that scale. The next time a query is slow, do not just stare at the query DSL. Ask: is this a query phase problem, or a fetch phase problem?
The answer will guide you to the right fix.
I'm Prithvi S, Staff Software Engineer at Cloudera and open-source contributor to the OpenSearch project. I maintain the dashboards-search-relevance plugin and write about the internals of distributed systems. Follow my work on GitHub.