João Victor

📊🔍 OpenSearch Dashboards: Optimizing Massive Data Queries (Big Data) with Asynchronous Search

Working with logs, telemetry, or other large-scale datasets in OpenSearch can lead to slow, heavy queries. This guide covers what a Tech Lead needs to know about optimizing those searches with the Asynchronous Search API, run from OpenSearch Dashboards. My other repositories and a Portuguese version of this post are available on my GitHub profile.


🧩 Architecture and Concepts

OpenSearch consists of:

  • Cluster → group of nodes that store and process data.
  • Shards → distributed index fragments.
  • Dashboards → visualization interface, including the Dev Tools console for REST requests.
  • Plugins → extensions such as Security, Reports, and Asynchronous Search.

The Asynchronous Search plugin (endpoint _plugins/_asynchronous_search) runs queries in the background, so the client can check progress, retrieve results later, or cancel the search without holding a connection open.

Simplified flow:

[Dashboards / API]
   ↓
POST /_plugins/_asynchronous_search
   ↓
→ Cluster processes the query in the background
   ↓
← Returns ID
   ↓
GET /_plugins/_asynchronous_search/{id}
DELETE /_plugins/_asynchronous_search/{id}
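The flow above can be sketched as a small client loop. This is a minimal sketch, not a definitive implementation: `run_async_search`, the `send` transport, and the `is_done` predicate are hypothetical names introduced here, and the transport is injected so that the HTTP library, TLS, and authentication remain your choice.

```python
import time
from typing import Callable, Optional

BASE = "/_plugins/_asynchronous_search"

# Hypothetical transport: (method, path, body) -> parsed JSON response.
# In practice this would wrap an HTTP client with TLS and authentication.
Transport = Callable[[str, str, Optional[dict]], dict]


def run_async_search(
    send: Transport,
    body: dict,
    is_done: Callable[[dict], bool],
    query_string: str = "",
    poll_seconds: float = 1.0,
    max_polls: int = 60,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Submit a search, poll by ID until is_done(result) is true, then delete it."""
    path = f"{BASE}?{query_string}" if query_string else BASE
    result = send("POST", path, body)
    search_id = result.get("id")

    polls = 0
    while not is_done(result):
        if search_id is None:
            raise RuntimeError("search still running but no ID was returned")
        if polls >= max_polls:
            raise TimeoutError(f"search {search_id} did not finish in time")
        sleep(poll_seconds)
        result = send("GET", f"{BASE}/{search_id}", None)
        polls += 1

    if search_id is not None:
        # Free the stored result once it has been read.
        send("DELETE", f"{BASE}/{search_id}", None)
    return result
```

Passing `is_done` as a parameter keeps the loop independent of the exact status field your cluster returns; supply a predicate that matches the response shape you observe.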

⚙️ Full Request Example

POST /_plugins/_asynchronous_search?index=logs*,events*&keep_on_completion=true&wait_for_completion_timeout=2s
{
  "size": 5000,
  "track_total_hits": false,
  "_source": ["timestamp", "source_ip", "event_type"],
  "query": {
    "bool": {
      "filter": [
        {
          "range": {
            "@timestamp": { "gte": "now-30d" }
          }
        }
      ],
      "must": [
        {
          "query_string": {
            "query": "*malware*",
            "fields": ["message", "event_type"],
            "analyze_wildcard": true,
            "allow_leading_wildcard": true
          }
        }
      ]
    }
  }
}

✅ Expected Response (simplified)

{
  "id": "F3B7E5A6-24C3-11EF-AEA3-12AB34CD56EF",
  "state": "RUNNING",
  "response": {
    "took": 134,
    "timed_out": false,
    "hits": { "total": 2381, "hits": [] }
  }
}
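Deciding whether to keep polling comes down to reading the status field of a response like the one above. The helper below is a sketch under stated assumptions: it looks for a `state` string (treating `RUNNING` as unfinished) and falls back to an `is_running` boolean, since the exact field can differ between API flavors and versions. `search_finished` and `extract_hits` are hypothetical names.

```python
from typing import Any, Dict, List


def search_finished(result: Dict[str, Any]) -> bool:
    """Return True once the response no longer reports a running search.

    Assumption: the status is exposed either as a `state` string or as an
    `is_running` boolean; check which one your cluster actually returns.
    """
    if "state" in result:
        return result["state"] != "RUNNING"
    return not result.get("is_running", False)


def extract_hits(result: Dict[str, Any]) -> List[Any]:
    """Return the (possibly empty) list of hits nested under `response`."""
    return result.get("response", {}).get("hits", {}).get("hits", [])
```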

🧠 Synchronous vs Asynchronous Search

| Type | Endpoint | Blocks client | Ideal for |
|---|---|---|---|
| Synchronous | POST /_search | ✅ Yes | Fast queries (<5s) |
| Asynchronous | POST /_plugins/_asynchronous_search | ❌ No | Big Data, reports, multi-index queries |

An asynchronous search returns after at most wait_for_completion_timeout. If the query is still running at that point, the response carries an ID and processing continues on the server side.


📈 Key Parameters

| Parameter | Description | Example | Notes |
|---|---|---|---|
| index | Target indices | logs*, events* | Use wildcards cautiously |
| keep_on_completion | Keeps result after completion | true | Required for later retrieval |
| wait_for_completion_timeout | Initial wait before returning ID | 2s | Keeps client responsive |
| keep_alive | Result retention period | 1d, 7d | Useful for scheduled reports |
| track_total_hits | Counts all documents | false | Disable for Big Data |
| _source | Returned fields | ["timestamp", "message"] | Avoid wildcard * |
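As an illustration of how the URL-level parameters fit together, the sketch below assembles the submit path from them. `build_submit_path` is a hypothetical helper, and the defaults shown are simply the example values from this post, not plugin defaults.

```python
from typing import Optional, Sequence
from urllib.parse import urlencode


def build_submit_path(
    index: Sequence[str],
    keep_on_completion: bool = True,
    wait_for_completion_timeout: str = "2s",
    keep_alive: Optional[str] = None,
) -> str:
    """Build the POST path for an asynchronous search submission."""
    params = {
        "index": ",".join(index),
        # The REST API expects lowercase true/false, not Python's True/False.
        "keep_on_completion": str(keep_on_completion).lower(),
        "wait_for_completion_timeout": wait_for_completion_timeout,
    }
    if keep_alive is not None:
        params["keep_alive"] = keep_alive
    # Keep "*" and "," readable instead of percent-encoding them.
    return "/_plugins/_asynchronous_search?" + urlencode(params, safe="*,")


print(build_submit_path(["logs*", "events*"]))
```

track_total_hits and _source are not part of this path; they belong in the JSON request body.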

🔍 Query Structure

Optimized example:

{
  "bool": {
    "filter": [
      { "range": { "@timestamp": { "gte": "now-7d" } } }
    ],
    "must": [
      { "match": { "event_type": "unauthorized_access" } }
    ]
  }
}
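The fragment above is the value that goes under the `query` key of a search body. A minimal sketch of assembling the full body programmatically follows; `build_search_body` and its default values are assumptions made for illustration.

```python
import json
from typing import Sequence


def build_search_body(
    event_type: str,
    since: str = "now-7d",
    size: int = 1000,
    fields: Sequence[str] = ("timestamp", "event_type", "source_ip"),
) -> dict:
    """Wrap the bool query in a complete search body."""
    return {
        "size": size,
        "track_total_hits": False,
        "_source": list(fields),
        "query": {
            "bool": {
                # Filter context: no scoring, results can be cached.
                "filter": [{"range": {"@timestamp": {"gte": since}}}],
                # Query context: contributes to the relevance score.
                "must": [{"match": {"event_type": event_type}}],
            }
        },
    }


print(json.dumps(build_search_body("unauthorized_access"), indent=2))
```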

Expected Output:

{
  "hits": {
    "total": 542,
    "max_score": 1.0,
    "hits": [
      {
        "_index": "logs-2025.11",
        "_id": "Dfgh12345",
        "_source": {
          "timestamp": "2025-11-03T22:14:25Z",
          "event_type": "unauthorized_access",
          "source_ip": "192.168.3.24"
        }
      }
    ]
  }
}

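⚡ Performance Tips

  • Prefer filter clauses for exact-match and date-range conditions: they skip scoring and their results can be cached.
  • Avoid leading wildcards: a pattern such as *malware* has to scan the term dictionary and is slow on large indices.
  • Keep size per request small (1k–5k) and page through larger result sets with scroll or search_after.
  • Raise refresh_interval on static indices that no longer receive writes.
  • Monitor JVM heap usage with GET /_nodes/stats/jvm.
  • Disable track_total_hits when an exact hit count is not needed.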
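To illustrate the search_after tip, the sketch below builds the request body for each page from the sort values of the last hit on the previous page. `first_page` and `next_page` are hypothetical helpers, and `event_id` is an assumed unique keyword field used as a tiebreaker; substitute whatever unique sortable field your indices have.

```python
import copy
from typing import Any, Dict, List


def first_page(query: Dict[str, Any], sort: List[Dict[str, str]], size: int = 1000) -> Dict[str, Any]:
    """Body for the first page; search_after requires an explicit sort."""
    return {"size": size, "track_total_hits": False, "query": query, "sort": sort}


def next_page(previous_body: Dict[str, Any], last_hit: Dict[str, Any]) -> Dict[str, Any]:
    """Body for the following page, resuming after the last hit's sort values."""
    body = copy.deepcopy(previous_body)  # leave the caller's body untouched
    body["search_after"] = last_hit["sort"]
    return body


page = first_page(
    {"match": {"event_type": "unauthorized_access"}},
    [{"@timestamp": "asc"}, {"event_id": "asc"}],  # event_id is an assumed field
)
```

Each page is an ordinary search request, so it can be sent to either the synchronous or the asynchronous endpoint; paging stops when a response returns no hits.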

🧮 Monitoring and Management

Check progress

GET /_plugins/_asynchronous_search/F3B7E5A6-24C3-11EF-AEA3-12AB34CD56EF

Expected Output:

{
  "id": "F3B7E5A6-24C3-11EF-AEA3-12AB34CD56EF",
  "state": "SUCCEEDED",
  "response": {
    "took": 3210,
    "hits": { "total": 2381, "hits": [...] }
  }
}

Global statistics

GET /_plugins/_asynchronous_search/stats

Expected Output:

{
  "total": {
    "submitted": 102,
    "completed": 95,
    "running": 7,
    "failed": 0
  }
}

Cancel or delete old searches

DELETE /_plugins/_asynchronous_search/F3B7E5A6-24C3-11EF-AEA3-12AB34CD56EF
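Cleanup can be automated if the client keeps track of what it submitted. The sketch below assumes the caller maintains its own registry of search IDs and submission times (this registry is an assumption of the example, not something the plugin provides here) and returns the DELETE paths for entries older than a chosen age. `expired_search_paths` is a hypothetical name.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, List, Optional


def expired_search_paths(
    submitted: Dict[str, datetime],
    max_age: timedelta,
    now: Optional[datetime] = None,
) -> List[str]:
    """Return DELETE paths for searches submitted more than max_age ago.

    `submitted` maps search ID -> timezone-aware submission time.
    """
    now = now or datetime.now(timezone.utc)
    return [
        f"/_plugins/_asynchronous_search/{search_id}"
        for search_id, submitted_at in sorted(submitted.items())
        if now - submitted_at > max_age
    ]
```

Each returned path can then be issued as a DELETE request, for example from a scheduled job.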

🔒 Security and Access Control

  • Required permission: cluster:admin/opendistro/asynchronous_search/* (the action names keep the legacy opendistro prefix).
  • Results stored in internal indices (.opendistro-asynchronous-search*)
  • Always enable TLS and authentication (BasicAuth, JWT, or SAML).
  • Use DLS/FLS to restrict sensitive data.
  • Perform periodic cleanup of old searches.
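The permission and DLS/FLS points above can be combined into a single role definition. The sketch below builds a role body in the shape the Security plugin's roles API (PUT _plugins/_security/api/roles/<role-name>) is understood to accept; verify the field names against your version before using it. `build_role`, the `tenant` field, and the choice of the `read` action group are assumptions made for illustration.

```python
import json
from typing import Any, Dict, Optional, Sequence


def build_role(
    cluster_permission: str,
    index_patterns: Sequence[str],
    dls_query: Optional[Dict[str, Any]] = None,
    hidden_fields: Sequence[str] = (),
) -> Dict[str, Any]:
    """Build a role body granting async search plus restricted read access."""
    index_permission: Dict[str, Any] = {
        "index_patterns": list(index_patterns),
        "allowed_actions": ["read"],
    }
    if dls_query is not None:
        # Document-level security takes the query as a JSON string.
        index_permission["dls"] = json.dumps(dls_query)
    if hidden_fields:
        # Field-level security: a "~" prefix excludes the field.
        index_permission["fls"] = [f"~{name}" for name in hidden_fields]
    return {
        "cluster_permissions": [cluster_permission],
        "index_permissions": [index_permission],
    }
```

Pass the asynchronous search permission string listed above as `cluster_permission`, and keep the index patterns as narrow as the use case allows.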
