Deep Dive: Query and Fetch Phases in OpenSearch

#opensearch #search #database #data

The Inner Workings of OpenSearch: From Query to Fetch, Plugins to Performance

OpenSearch has become the go‑to open‑source search and analytics engine for many modern data‑intensive applications. While most users interact with it through simple REST calls, the system behind the scenes is a sophisticated, distributed architecture that balances latency, relevance, and scalability. This article walks through the core components that make OpenSearch tick, how you can extend it with plugins, and practical tips to squeeze out performance.

Search Execution: Query Phase vs Fetch Phase
Concurrent Segment Search and Caching
Extending OpenSearch with Plugins
Data Lifecycle: Refresh, Flush, and Translog
Practical Performance Tweaks
Conclusion

Search Execution: Query Phase vs Fetch Phase

When a client sends a search request to OpenSearch, the request lands on a coordinating node. The coordinating node does not hold any data itself – its job is to scatter the request to the relevant primary shards, collect partial results, and gather them into the final response.

Step 1 – Routing – OpenSearch computes the target shard for each document using the routing key (by default the document _id). The coordinating node forwards the query to each shard’s primary copy.
Step 2 – Query Phase – Each shard executes the query locally against its Lucene segments. This phase performs scoring, aggregations, and returns the top‑K hits per shard. No source fields are fetched yet; only doc IDs and scores travel back.
Step 3 – Gather Phase – The coordinating node merges the top‑K lists from all shards, re‑ranks them globally, and if the client requested the _source fields, it issues a fetch phase request to retrieve the full documents.
Step 4 – Fetch Phase – Shards look up the stored fields for the selected doc IDs and return them to the coordinating node, which assembles the final JSON response.

Why the Two‑Phase Model?

Latency – By separating scoring from fetching, OpenSearch can limit network payload to just IDs and scores when the client only needs aggregates.
Bandwidth – Large source documents stay on the shard nodes until they are truly needed, reducing unnecessary data transfer.
Scalability – The query phase can be parallelised across thousands of shards, while the fetch phase only touches a small subset of documents.

Concurrent Segment Search and Caching

Concurrent Segment Search

Starting with OpenSearch 3.0, concurrent segment search is enabled by default in auto mode. Instead of searching each Lucene segment sequentially, the engine slices the work and runs it in parallel across CPU cores.

Slice Count – Determined dynamically: max(1, min(CPU_cores/2, 4)) by default.
Benefits – Lower query latency for heavy aggregations or large result sets.
Trade‑offs – Higher CPU consumption; not all queries benefit (e.g., simple term queries).

Request Cache vs Query Cache

Request Cache – Caches the entire request‑response on a per‑shard basis. Works best for identical queries that hit the same segments. Disabled for queries with search_type=dfs_query_then_fetch or those that modify data.
Query Cache – Stores the filter‑only portion of a query (e.g., term filters). It is reusable across different queries that share the same filter clause. Certain constructs like script, nested, or function_score bypass the query cache.

Tuning Tips

Increase indices.queries.cache.size if you have a high hit‑rate on filter‑heavy workloads.
Set indices.requests.cache.size to a reasonable fraction of heap (e.g., 10%).
Use the _cache/clear API during maintenance windows to avoid stale cache entries.

Extending OpenSearch with Plugins

OpenSearch’s plugin system lets you add new REST endpoints, custom queries, analysers, and more. A plugin is a ZIP file containing a plugin-descriptor.properties file and a JAR with compiled Java code.

Core Extension Points

ActionPlugin – Register new transport and REST actions. Ideal for custom admin APIs.
SearchPlugin – Add custom query builders, aggregations, or suggesters.
AnalysisPlugin – Provide new tokenizers, char filters, or token filters.
ScriptPlugin – Introduce a new scripting language or extend Painless.

Minimal Plugin Example (SearchPlugin)

public class MySearchPlugin extends Plugin implements SearchPlugin {
    @Override
    public List<QuerySpec<?>> getQueries() {
        return Collections.singletonList(
            new QuerySpec<>(MyCustomQueryBuilder.NAME, MyCustomQueryBuilder::new,
                MyCustomQueryBuilder::fromXContent));
    }
}

After building the JAR, install with:

opensearch-plugin install file:///path/to/my-plugin.zip

Plugin Development Workflow

Scaffold with the OpenSearch plugin archetype.
Write unit tests using the OpenSearch test framework.
Verify compatibility with the target OpenSearch version (plugins are version‑locked).
Publish the ZIP to an internal Maven repository or GitHub Releases for CI/CD deployment.

Data Lifecycle: Refresh, Flush, and Translog

OpenSearch’s write path is a balance between durability, visibility, and throughput.

In‑Memory Buffer + Translog – Incoming documents first land in a RAM buffer and a write‑ahead translog on disk. This ensures recoverability after a crash.
Refresh – Every refresh_interval (default 1 s) OpenSearch creates a new Lucene segment and makes recent writes searchable. Refresh is lightweight but can be costly at high write rates.
Flush – Persists the translog to disk and clears the in‑memory buffer. Triggered automatically based on indices.flush.threshold_ops or manually via the _flush API.
Segment Replication (3.x+) – Replicas receive segment files instead of individual document operations, reducing network overhead for bulk indexing.

Tuning Refresh and Flush

Lower refresh_interval for near‑real‑time requirements (e.g., log analytics).
Increase refresh_interval or disable refresh (refresh_interval=-1) for bulk indexing jobs, then manually refresh after the load.
Adjust translog.durability to async for higher throughput at the cost of potential data loss on sudden power failure.

Practical Performance Tweaks

Concurrent Search – Enable search.max_concurrent_shard_requests to cap per‑node parallelism.
Cache Warm‑up – Run representative queries after a node restart to populate request cache.
Segment Merges – Tune indices.merge.scheduler.max_thread_count to avoid I/O spikes during heavy indexing.
Shard Sizing – Aim for 20‑40 GB per shard; too many small shards increase cluster state size and coordination overhead.
Thread Pools – Monitor search and write thread pool queue sizes; increase search.thread_pool.queue_size if you see rejections during peak traffic.

Conclusion

OpenSearch packs a rich set of features that go far beyond a simple search API. Understanding the query‑fetch split, leveraging concurrent segment search, and mastering the plugin architecture empowers you to build low‑latency, extensible search experiences. Coupled with careful lifecycle tuning—refresh, flush, translog—you can keep costs under control while delivering real‑time relevance.