Prithvi S

Posted on Jun 17 • Edited on Jul 4

OpenSearch Query Phase vs Fetch Phase: What Actually Happens on Each Shard

#opensearch #search #database #data

Understanding the two-phase search architecture that powers every OpenSearch query

Every time you run a search query in OpenSearch, the engine executes a carefully orchestrated two-phase dance across your cluster. The query phase finds the right documents. The fetch phase retrieves their contents. Understanding how these phases work - and how they interact - is the difference between a search that returns in milliseconds and one that times out.

As someone who has spent the last year maintaining the search-relevance plugin and contributing to OpenSearch core, I have seen how often engineers conflate these two phases. They optimize the wrong things, tune the wrong parameters, and wonder why their queries still lag. This post is what I wish I had read when I started.

Why Two Phases?

OpenSearch distributes data across multiple shards, often on multiple nodes. When you search, the coordinating node must ask every shard to participate. But there is a critical insight: finding the most relevant documents is lighter than retrieving their full contents.

If OpenSearch retrieved the full _source of every document from every shard just to find the top 10 results, the network bandwidth and I/O would be catastrophic. Instead, the engine splits the work:

Phase 1 - Query: Each shard finds its local top-N results, returning only lightweight metadata (doc IDs and scores).

Phase 2 - Fetch: The coordinating node identifies the true global top-N, then asks only the relevant shards for the full document contents.

This is the scatter-gather pattern in practice. The query phase is the scatter. The fetch phase is the gather.

Phase 1: The Query Phase

What Happens on Each Shard

When a search request arrives at the coordinating node, it is forwarded to every shard in the index (or the relevant shards if routing is specified). Here is what each shard does:

Parse the query - The query DSL is converted into a Lucene Query object.
Execute against segments - The query runs against all Lucene segments in the shard. With concurrent segment search (enabled by default in OpenSearch 3.0), these segments may be searched in parallel across CPU slices.
Score documents - BM25 scoring is computed for each matching document. Term frequency and inverse document frequency are calculated locally within that shard.
Collect top-K - Using a priority queue, the shard keeps only the top N results by score (where N is from + size).
Return metadata - The shard returns only document IDs, scores, and sort values - never the full _source.

The Local Scoring Problem

Here is a subtle but important detail: BM25 scores are computed using local shard statistics. Each shard calculates term frequency and document frequency based only on the documents it holds. This means a term that is rare globally but common on one shard will be scored differently there.

For most use cases, this approximation is good enough. But when you need truly accurate global scoring, OpenSearch offers dfs_query_then_fetch:

GET /my-index/_search?search_type=dfs_query_then_fetch
{
  "query": {
    "match": { "title": "distributed search" }
  }
}

This adds a pre-query round-trip: the coordinating node fetches term statistics from all shards, then broadcasts them back so every shard uses global IDF. The trade-off? One extra network hop for scoring accuracy.

Aggregation Execution

If your query includes aggregations, they execute during the query phase too. Each shard builds its local aggregation buckets, then the coordinating node merges them. This is why deeply nested aggregations on high-cardinality fields can be so expensive - every shard is building and serializing large bucket structures.

Phase 2: The Fetch Phase

The Document Retrieval Problem

After the query phase completes, the coordinating node has a sorted list of global top-N results. But it only has document IDs and scores. If the client requested _source fields, highlighting, or script fields, the coordinating node must now fetch those.

What Happens During Fetch

Merge and rank - The coordinating node merges all shard results, re-sorting by score to produce the true global top-N.
Request documents - For each of the top-N document IDs, the coordinating node asks the owning shard for the full document content.
Retrieve stored fields - Each shard fetches the requested fields from its stored field cache or disk.
Apply highlighting - If requested, the highlighter (unified, plain, or fast vector) generates match snippets by re-analyzing the relevant text.
Return results - The final documents are assembled and returned to the client.

The Hidden Cost

The fetch phase is where hidden costs emerge. If you request:

Large _source documents (10KB+ each)
Highlighting on large fields
Script fields that execute at fetch time
Deep pagination (from: 10000, size: 10)

...the fetch phase can dominate your query latency. The query phase might finish in 5ms, but the fetch phase takes 200ms because it is retrieving and highlighting 10,000 documents just to return 10.

How `size` and `from` Control Everything

The size and from parameters are the primary levers for controlling both phases:

GET /my-index/_search
{
  "from": 0,
  "size": 10,
  "query": { "match_all": {} }
}

Query Phase Impact

During the query phase, each shard must return from + size results. If you have 5 shards and request from: 100, size: 10, every shard must find and return its top 110 results. The coordinating node then merges 550 results to find the true global top 10.

With 10 shards and from: 10000, size: 10? Every shard returns 10,010 results. The coordinating node merges 100,100 results. This is why deep pagination is so expensive in OpenSearch.

Fetch Phase Impact

The fetch phase only retrieves documents for the final returned set. So with size: 10, only 10 documents are fetched, regardless of from. But the query phase already paid the price of finding and ranking all those candidate documents.

The `terminate_after` Escape Hatch

For use cases where you only need approximate results, terminate_after tells each shard to stop after finding N documents:

GET /my-index/_search
{
  "terminate_after": 1000,
  "query": { "match": { "status": "active" } }
}

This dramatically speeds up the query phase but means you might miss better-scoring documents that were found later. Use it for analytics, not for user-facing search.

Performance Tuning: Query Phase

1. Reduce Shard Count When Possible

More shards = more work in the query phase. If you have 50 shards for a 10GB index, you are creating unnecessary scatter-gather overhead. Aim for shards in the 20-50GB range.

2. Use Routing for Targeted Queries

If your queries always filter by a specific field (like user_id), use custom routing to ensure all of a user's documents land on the same shard:

POST /my-index/_doc?routing=user_123
{
  "user_id": "user_123",
  "content": "..."
}

Now searches with routing=user_123 hit only one shard. The query phase just got 10x faster.

3. Leverage Concurrent Segment Search

OpenSearch 3.0 enables concurrent segment search by default. For long-running queries with aggregations, this parallelizes segment processing within each shard. Ensure your CPU has cores to spare, or the overhead of thread coordination may not be worth it.

4. Be Careful with Aggregations

Terms aggregations on high-cardinality fields (like user_id or ip_address) force each shard to build massive bucket arrays. If you do not need exact counts, use shard_size to limit the buckets each shard returns:

{
  "aggs": {
    "top_users": {
      "terms": {
        "field": "user_id",
        "size": 10,
        "shard_size": 100
      }
    }
  }
}

Performance Tuning: Fetch Phase

1. Use `docvalue_fields` Instead of `_source`

If you only need specific fields that are stored as doc values (keyword, numeric, date, boolean), request them directly instead of parsing the full _source:

GET /my-index/_search
{
  "query": { "match_all": {} },
  "docvalue_fields": ["status", "created_at"]
}

Doc values are columnar and much faster to retrieve than the full JSON _source.

2. Disable `_source` if You Do Not Need It

For purely analytic queries where you only care about aggregation results, explicitly disable _source retrieval:

GET /my-index/_search
{
  "_source": false,
  "aggs": {
    "status_counts": {
      "terms": { "field": "status" }
    }
  }
}

This skips the fetch phase entirely for hits.

3. Avoid Highlighting Large Fields

Highlighting requires re-analyzing the matched text. If your documents have 100KB text fields and you highlight them, the fetch phase will grind. Consider:

Using the fast_vector_highlighter with term_vector storage (trades index size for fetch speed)
Limiting fragment size and count
Storing a truncated summary field specifically for highlighting

4. Use Search After for Deep Pagination

Instead of from: 10000, use search_after with a point-in-time (PIT) for efficient deep pagination:

GET /my-index/_search
{
  "size": 10,
  "pit": { "id": "my-pit-id", "keep_alive": "1m" },
  "search_after": [1625097600000, "doc_123"],
  "sort": [{ "created_at": "asc" }, { "_id": "asc" }]
}

This avoids the from + size explosion in the query phase and is the recommended approach for scrolling through large result sets.

Real-World Example: Debugging a Slow Query

Let me walk through how I debugged a slow search in production. The query was timing out at 30 seconds:

GET /logs-*/_search
{
  "from": 0,
  "size": 50,
  "query": { "match": { "message": "error" } },
  "highlight": { "fields": { "message": {} } },
  "aggs": {
    "by_service": { "terms": { "field": "service.name", "size": 100 } }
  }
}

The problem: 100 shards, each with 5GB of logs. The query phase was fast (BM25 on "error" is selective). But the fetch phase was retrieving 50 full log messages, each 50KB, and highlighting them across 100 shards.

The fix:

Reduced size from 50 to 10 (users do not need 50 highlighted logs)
Added "summary": { "type": "text" } field with 500-char limit for highlighting
Switched highlighting to the summary field only
Added "shard_size": 20 to the aggregation

Query time dropped from 30 seconds to 800ms. The query phase was never the problem - the fetch phase was.

The Key Insight

Most OpenSearch performance problems are not query problems. They are fetch problems. The query phase is highly optimized - Lucene is incredibly fast at finding and scoring documents. The fetch phase is where your data model, field sizes, and retrieval strategy matter.

When you are debugging slow searches, ask yourself:

Is the query phase slow? (Check with _source: false and no aggregations)
Is the fetch phase slow? (Compare with and without _source retrieval)
Am I retrieving too many documents? (Reduce size)
Am I highlighting too much text? (Use dedicated summary fields)
Am I forcing deep pagination? (Use search_after)

Conclusion

The two-phase search architecture in OpenSearch is not an implementation detail - it is the fundamental design that makes distributed search feasible. The query phase is about finding and ranking. The fetch phase is about retrieving and formatting. They have different performance characteristics, different tuning parameters, and different failure modes.

Understanding this split lets you diagnose problems faster, tune the right parameters, and build searches that scale. The next time a query is slow, do not just stare at the query DSL. Ask: is this a query phase problem, or a fetch phase problem?

The answer will guide you to the right fix.

About the author: I'm Prithvi S, Staff Software Engineer at Cloudera and Opensource Enthusiast. I contribute to Apache Lucene, OpenSearch, and related projects. Follow my work on GitHub.

DEV Community

OpenSearch Query Phase vs Fetch Phase: What Actually Happens on Each Shard

Why Two Phases?

Phase 1: The Query Phase

What Happens on Each Shard

The Local Scoring Problem

Aggregation Execution

Phase 2: The Fetch Phase

The Document Retrieval Problem

What Happens During Fetch

The Hidden Cost

How `size` and `from` Control Everything

Query Phase Impact

Fetch Phase Impact

The `terminate_after` Escape Hatch

Performance Tuning: Query Phase

1. Reduce Shard Count When Possible

2. Use Routing for Targeted Queries

3. Leverage Concurrent Segment Search

4. Be Careful with Aggregations

Performance Tuning: Fetch Phase

1. Use `docvalue_fields` Instead of `_source`

2. Disable `_source` if You Do Not Need It

3. Avoid Highlighting Large Fields

4. Use Search After for Deep Pagination

Real-World Example: Debugging a Slow Query

The Key Insight

Conclusion

Top comments (0)

Why Two Phases?

Phase 1: The Query Phase

What Happens on Each Shard

The Local Scoring Problem

Aggregation Execution

Phase 2: The Fetch Phase

The Document Retrieval Problem

What Happens During Fetch

The Hidden Cost

How size and from Control Everything

Query Phase Impact

Fetch Phase Impact

The terminate_after Escape Hatch

Performance Tuning: Query Phase

1. Reduce Shard Count When Possible

2. Use Routing for Targeted Queries

3. Leverage Concurrent Segment Search

4. Be Careful with Aggregations

Performance Tuning: Fetch Phase

1. Use docvalue_fields Instead of _source

2. Disable _source if You Do Not Need It

3. Avoid Highlighting Large Fields

4. Use Search After for Deep Pagination

Real-World Example: Debugging a Slow Query

The Key Insight

Conclusion

How `size` and `from` Control Everything

The `terminate_after` Escape Hatch

1. Use `docvalue_fields` Instead of `_source`

2. Disable `_source` if You Do Not Need It