The Inner Workings of OpenSearch: From Query to Fetch, Plugins to Performance
OpenSearch has become the go‑to open‑source search and analytics engine for many modern data‑intensive applications. While most users interact with it through simple REST calls, the system behind the scenes is a sophisticated, distributed architecture that balances latency, relevance, and scalability. This article walks through the core components that make OpenSearch tick, how you can extend it with plugins, and practical tips to squeeze out performance.
Table of Contents
- Search Execution: Query Phase vs Fetch Phase
- Concurrent Segment Search and Caching
- Extending OpenSearch with Plugins
- Data Lifecycle: Refresh, Flush, and Translog
- Practical Performance Tweaks
- Conclusion
Search Execution: Query Phase vs Fetch Phase
When a client sends a search request to OpenSearch, the request lands on a coordinating node. The coordinating node does not hold any data itself – its job is to scatter the request to the relevant primary shards, collect partial results, and gather them into the final response.
-
Step 1 – Routing – OpenSearch computes the target shard for each document using the routing key (by default the document
_id). The coordinating node forwards the query to each shard’s primary copy. - Step 2 – Query Phase – Each shard executes the query locally against its Lucene segments. This phase performs scoring, aggregations, and returns the top‑K hits per shard. No source fields are fetched yet; only doc IDs and scores travel back.
-
Step 3 – Gather Phase – The coordinating node merges the top‑K lists from all shards, re‑ranks them globally, and if the client requested the
_sourcefields, it issues a fetch phase request to retrieve the full documents. - Step 4 – Fetch Phase – Shards look up the stored fields for the selected doc IDs and return them to the coordinating node, which assembles the final JSON response.
Why the Two‑Phase Model?
- Latency – By separating scoring from fetching, OpenSearch can limit network payload to just IDs and scores when the client only needs aggregates.
- Bandwidth – Large source documents stay on the shard nodes until they are truly needed, reducing unnecessary data transfer.
- Scalability – The query phase can be parallelised across thousands of shards, while the fetch phase only touches a small subset of documents.
Concurrent Segment Search and Caching
Concurrent Segment Search
Starting with OpenSearch 3.0, concurrent segment search is enabled by default in auto mode. Instead of searching each Lucene segment sequentially, the engine slices the work and runs it in parallel across CPU cores.
-
Slice Count – Determined dynamically:
max(1, min(CPU_cores/2, 4))by default. - Benefits – Lower query latency for heavy aggregations or large result sets.
- Trade‑offs – Higher CPU consumption; not all queries benefit (e.g., simple term queries).
Request Cache vs Query Cache
-
Request Cache – Caches the entire request‑response on a per‑shard basis. Works best for identical queries that hit the same segments. Disabled for queries with
search_type=dfs_query_then_fetchor those that modify data. -
Query Cache – Stores the filter‑only portion of a query (e.g., term filters). It is reusable across different queries that share the same filter clause. Certain constructs like
script,nested, orfunction_scorebypass the query cache.
Tuning Tips
- Increase
indices.queries.cache.sizeif you have a high hit‑rate on filter‑heavy workloads. - Set
indices.requests.cache.sizeto a reasonable fraction of heap (e.g.,10%). - Use the
_cache/clearAPI during maintenance windows to avoid stale cache entries.
Extending OpenSearch with Plugins
OpenSearch’s plugin system lets you add new REST endpoints, custom queries, analysers, and more. A plugin is a ZIP file containing a plugin-descriptor.properties file and a JAR with compiled Java code.
Core Extension Points
- ActionPlugin – Register new transport and REST actions. Ideal for custom admin APIs.
- SearchPlugin – Add custom query builders, aggregations, or suggesters.
- AnalysisPlugin – Provide new tokenizers, char filters, or token filters.
- ScriptPlugin – Introduce a new scripting language or extend Painless.
Minimal Plugin Example (SearchPlugin)
public class MySearchPlugin extends Plugin implements SearchPlugin {
@Override
public List<QuerySpec<?>> getQueries() {
return Collections.singletonList(
new QuerySpec<>(MyCustomQueryBuilder.NAME, MyCustomQueryBuilder::new,
MyCustomQueryBuilder::fromXContent));
}
}
After building the JAR, install with:
opensearch-plugin install file:///path/to/my-plugin.zip
Plugin Development Workflow
- Scaffold with the OpenSearch plugin archetype.
- Write unit tests using the OpenSearch test framework.
- Verify compatibility with the target OpenSearch version (plugins are version‑locked).
- Publish the ZIP to an internal Maven repository or GitHub Releases for CI/CD deployment.
Data Lifecycle: Refresh, Flush, and Translog
OpenSearch’s write path is a balance between durability, visibility, and throughput.
- In‑Memory Buffer + Translog – Incoming documents first land in a RAM buffer and a write‑ahead translog on disk. This ensures recoverability after a crash.
-
Refresh – Every
refresh_interval(default 1 s) OpenSearch creates a new Lucene segment and makes recent writes searchable. Refresh is lightweight but can be costly at high write rates. -
Flush – Persists the translog to disk and clears the in‑memory buffer. Triggered automatically based on
indices.flush.threshold_opsor manually via the_flushAPI. - Segment Replication (3.x+) – Replicas receive segment files instead of individual document operations, reducing network overhead for bulk indexing.
Tuning Refresh and Flush
- Lower
refresh_intervalfor near‑real‑time requirements (e.g., log analytics). - Increase
refresh_intervalor disable refresh (refresh_interval=-1) for bulk indexing jobs, then manually refresh after the load. - Adjust
translog.durabilitytoasyncfor higher throughput at the cost of potential data loss on sudden power failure.
Practical Performance Tweaks
-
Concurrent Search – Enable
search.max_concurrent_shard_requeststo cap per‑node parallelism. - Cache Warm‑up – Run representative queries after a node restart to populate request cache.
-
Segment Merges – Tune
indices.merge.scheduler.max_thread_countto avoid I/O spikes during heavy indexing. - Shard Sizing – Aim for 20‑40 GB per shard; too many small shards increase cluster state size and coordination overhead.
-
Thread Pools – Monitor
searchandwritethread pool queue sizes; increasesearch.thread_pool.queue_sizeif you see rejections during peak traffic.
Conclusion
OpenSearch packs a rich set of features that go far beyond a simple search API. Understanding the query‑fetch split, leveraging concurrent segment search, and mastering the plugin architecture empowers you to build low‑latency, extensible search experiences. Coupled with careful lifecycle tuning—refresh, flush, translog—you can keep costs under control while delivering real‑time relevance.
Illustration of a distributed OpenSearch cluster with coordinating and data nodes.
Developers building a custom OpenSearch plugin.
Author: Prithvi S – Staff Software Engineer at Cloudera and Open‑source Enthusiast.
Top comments (0)