Prithvi S

Posted on Jun 9

Refresh vs Flush in Elasticsearch: The Real-Time Search Latency Trade-Off Explained

#elasticsearch #search #database #analytics

The Missing Documents Problem

You bulk-index ten thousand documents into Elasticsearch. The bulk response comes back with "errors": false. You immediately run a search query. Zero results. You retry a few hundred milliseconds later. Still nothing. Panic sets in. Is your data lost? Is the cluster broken?

Neither. Your data is sitting in an in-memory buffer, waiting for a refresh. This is the fundamental tension at the heart of Elasticsearch: making documents searchable quickly versus indexing them efficiently. Understanding the refresh and flush mechanisms is the difference between a cluster that flies and one that crawls under load.

In this post, we will dissect the Lucene segment lifecycle, explain what refresh and flush actually do, and show you how to tune these settings for write-heavy logging pipelines, search-heavy e-commerce platforms, and everything in between.

The Ingestion Pipeline: Where Documents Go

When a document arrives at a primary shard, it follows a precise path:

In-Memory Buffer: The document is analyzed, tokenized, and added to Lucene's in-memory indexing buffer.
Transaction Log (Translog): The operation is then appended to the translog for durability. With the default durability setting, the translog is fsynced to disk before the request is acknowledged, so the document can be recovered even if the node crashes right afterwards.
NOT Searchable: At this point, the document is invisible to search queries. It exists, but no search thread can see it.

This design is intentional. Lucene segments are immutable. Once written to a segment, a document's index structure cannot be changed. Lucene builds these segments in memory first, then publishes them. The operation that publishes them is called a refresh.

Refresh: Making Documents Searchable

What It Does

A refresh creates a new Lucene segment from the in-memory buffer and makes it available to search threads. After a refresh, your documents are searchable. Importantly, the segment is still in the operating system's filesystem cache. It has not been written to physical disk yet.

The Default: One Second

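By default, Elasticsearch refreshes each index once per second. This is controlled by the index.refresh_interval setting. (In recent versions, if you have not set the interval explicitly, shards that have received no search traffic for 30 seconds skip these background refreshes until the next search arrives.) This delay is why Elasticsearch is called near real-time rather than real-time: a newly indexed document can take up to about one second to become searchable.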

The Cost of Refreshing

Every refresh that finds new documents in the buffer creates a new segment. Segments are immutable, so modifying a document actually creates a new version in a new segment and marks the old version for deletion. A high refresh rate means:

Many small segments accumulate
More file handles are consumed
Search threads must check more segments (slower queries)
Background merge operations work harder to consolidate segments
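To get a feel for how the refresh interval drives segment creation, here is a back-of-the-envelope sketch in Python. It assumes the worst case, where every refresh finds new documents in the buffer and therefore writes a segment, and it ignores background merges, which continuously reduce the real count. The function name is illustrative, not part of any Elasticsearch API.

```python
def max_new_segments_per_hour(refresh_interval_ms: int) -> int:
    """Upper bound on segments one shard can create per hour from refreshes alone.

    Assumes every refresh has new documents to publish and no merges happen.
    """
    if refresh_interval_ms <= 0:
        raise ValueError("refresh interval must be a positive number of milliseconds")
    return 3_600_000 // refresh_interval_ms


for label, interval_ms in [("100ms", 100), ("1s", 1_000), ("30s", 30_000)]:
    count = max_new_segments_per_hour(interval_ms)
    print(f"{label:>5}: up to {count} new segments per hour per shard")
```

Going from a 1-second to a 100-millisecond interval raises the ceiling tenfold, which is the pressure the merge process then has to absorb.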

Here is how to check your current refresh interval:

GET /my-index/_settings/index.refresh_interval

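# Response (the setting is listed only if it was set explicitly;
# append ?include_defaults=true to see the built-in default)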
{
  "my-index": {
    "settings": {
      "index": {
        "refresh_interval": "1s"
      }
    }
  }
}

Tuning the Refresh Interval

For write-heavy workloads, increasing the refresh interval dramatically improves indexing throughput:

# Relax to 30 seconds for bulk loading
PUT /my-index/_settings
{
  "index": {
    "refresh_interval": "30s"
  }
}

# Disable entirely for maximum speed (use with caution)
PUT /my-index/_settings
{
  "index": {
    "refresh_interval": "-1"
  }
}

When you disable refresh, documents remain invisible until you either re-enable it or call the refresh API explicitly (POST /my-index/_refresh). This is perfect for initial data loads or reindexing operations where search visibility does not matter.

# Re-enable after bulk load is complete
PUT /my-index/_settings
{
  "index": {
    "refresh_interval": "1s"
  }
}

A word of warning: do not forget to re-enable the refresh interval. I have seen production incidents where an index stayed invisible for hours because the bulk-load script skipped the cleanup step.

Flush: Persisting to Disk

What It Does

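A flush performs a Lucene commit, which involves three steps:

Writes any documents still in the in-memory buffer out as a new segment
Writes all segments to physical disk via fsync and records a new commit point
Starts a new translog generation and discards the old one (since that data is now durably persisted)

After a flush, the committed data survives a power outage without any translog replay. Before a flush, only the translog protects you. To make documents visible to searches, rely on refresh rather than flush.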

When Flushes Happen

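Elasticsearch triggers flushes automatically, using heuristics that weigh the size of the unflushed translog against the cost of each flush. The main knob is index.translog.flush_threshold_size (512MB by default in many releases; check the documentation for your version, since the default has changed over time). You can also trigger one manually: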

# Force a flush (rarely needed in production)
POST /my-index/_flush

# Flush, first waiting for any flush that is already running to finish
POST /my-index/_flush?wait_if_ongoing=true

Manual flushes are useful before taking a snapshot or restarting a node, ensuring all data is on disk.

The Translog: Your Safety Net

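The translog is Elasticsearch's per-shard transaction log, and it plays the role of a write-ahead log. Each operation is appended to it right after Lucene has processed it and before the request is acknowledged. If a node crashes after indexing but before flushing, Elasticsearch replays the translog on startup to rebuild whatever had not yet been committed.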

The translog durability setting controls how aggressively it is synced:

# Default: fsync after every request (safest, slower)
PUT /my-index/_settings
{
  "index": {
    "translog": {
      "durability": "request"
    }
  }
}

# Async: fsync in the background every index.translog.sync_interval, 5s by default (faster, riskier)
PUT /my-index/_settings
{
  "index": {
    "translog": {
      "durability": "async"
    }
  }
}

For write-heavy logging clusters where you can afford to lose a few seconds of data on crash, async provides a meaningful throughput boost. For financial transactions or user data, stick with request.
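To put "a few seconds of data" into numbers, here is a rough Python estimate of how many acknowledged operations could be sitting in an unsynced translog at the moment of a crash. It is a sketch under simple assumptions: a steady write rate, and a sync every 5 seconds (the default sync interval mentioned above). The function name is made up for illustration.

```python
def max_unsynced_operations(ops_per_second: float, sync_interval_seconds: float = 5.0) -> float:
    """Rough worst-case count of acknowledged operations not yet fsynced.

    Assumes a steady write rate and that a crash happens just before the
    next background sync. Real exposure depends on how writes are spread
    across shards and nodes.
    """
    if ops_per_second < 0 or sync_interval_seconds < 0:
        raise ValueError("rate and interval must be non-negative")
    return ops_per_second * sync_interval_seconds


# Example: 50,000 log lines per second with the default 5-second sync interval.
print(max_unsynced_operations(50_000))
```

If that number is acceptable for your logs, async is a reasonable trade. If it is not, stay on request.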

The Lucene Segment Lifecycle

To truly understand refresh and flush, you must visualize the Lucene segment lifecycle:

Document Arrives
      |
      v
In-Memory Buffer (not searchable)
      |
      | Refresh
      v
New Segment (in OS filesystem cache, searchable)
      |
      | Flush
      v
Persisted Segment (on physical disk, durable)
      |
      | Background Merge
      v
Merged Segment (fewer, larger segments)
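The lifecycle above can be mimicked with a toy Python model. This is only a teaching sketch: the class and its methods are invented for this post, it treats every segment as searchable, it assumes the translog is fsynced on every request, and it has nothing to do with how Lucene is actually implemented.

```python
class ToyShard:
    """Toy model of one shard: buffer -> refresh -> flush, plus crash recovery."""

    def __init__(self):
        self.buffer = []           # indexed, but not yet searchable
        self.translog = []         # operations since the last flush (assumed durable)
        self.cached_segments = []  # searchable, but only in the OS page cache
        self.disk_segments = []    # searchable and fsynced

    def index(self, doc):
        self.buffer.append(doc)
        self.translog.append(doc)

    def refresh(self):
        # Publish the buffer as a new immutable segment.
        if self.buffer:
            self.cached_segments.append(tuple(self.buffer))
            self.buffer.clear()

    def flush(self):
        # Write out the buffer, persist every segment, then drop the translog.
        self.refresh()
        self.disk_segments.extend(self.cached_segments)
        self.cached_segments.clear()
        self.translog.clear()

    def search(self):
        segments = self.disk_segments + self.cached_segments
        return [doc for segment in segments for doc in segment]

    def crash_and_recover(self):
        # Memory and page cache are gone; replay the translog to rebuild them.
        self.buffer.clear()
        self.cached_segments.clear()
        replay, self.translog = self.translog, []
        for doc in replay:
            self.index(doc)


shard = ToyShard()
shard.index("doc-1")
print(shard.search())   # nothing visible before a refresh
shard.refresh()
print(shard.search())   # doc-1 is now searchable
shard.flush()
shard.index("doc-2")
shard.crash_and_recover()
shard.refresh()
print(shard.search())   # doc-1 came from disk, doc-2 from translog replay
```

The point of the model is the separation of concerns: refresh moves documents from "invisible" to "visible", flush moves them from "recoverable via translog" to "committed on disk".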

Segment Immutability

Once a segment is created, it never changes. A delete only marks the document as deleted in a small per-segment file that tracks live documents, and an update is a delete followed by indexing the new version into a new segment. Old versions are physically removed only during a merge. This immutability is why Lucene is so fast for reads and so complex for writes.

The Merge Problem

Elasticsearch constantly runs background merge operations to combine small segments into larger ones. The merge policy targets segments of roughly similar size (tiered merge). However, aggressive refreshing creates tiny segments faster than merges can consolidate them. This leads to:

Segment explosion: Hundreds of segments per shard
File descriptor exhaustion: Each segment needs multiple files
Query slowdown: More segments to scan
Indexing throttling: When merges fall behind, Elasticsearch slows indexing down so they can catch up

Monitor your segment count:

GET /_cat/segments/my-index?v=true&s=size:desc

# Look for many small segments (< 1MB)
# As a rough rule of thumb, a shard that is keeping up with merges has a few dozen segments
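If you would rather script this check than eyeball the table, the same API can return JSON. The Python sketch below assumes you fetched GET /_cat/segments/my-index?format=json&bytes=b with your HTTP client of choice and parsed the body into a list of dicts. The function name, the 1MB cutoff, and the sample rows are illustrative, not real cluster output.

```python
from collections import defaultdict


def summarize_segments(rows, small_bytes=1_000_000):
    """Count total and small segments per shard copy.

    rows: parsed JSON from the cat segments API, requested with bytes=b so
    that the "size" column is a plain number of bytes (as a string).
    """
    summary = defaultdict(lambda: {"segments": 0, "small": 0})
    for row in rows:
        key = (row["index"], row["shard"], row["prirep"])
        summary[key]["segments"] += 1
        if int(row["size"]) < small_bytes:
            summary[key]["small"] += 1
    return dict(summary)


# Made-up sample rows, trimmed to the columns the function reads.
sample_rows = [
    {"index": "my-index", "shard": "0", "prirep": "p", "segment": "_0", "size": "52428800"},
    {"index": "my-index", "shard": "0", "prirep": "p", "segment": "_1", "size": "204800"},
    {"index": "my-index", "shard": "0", "prirep": "p", "segment": "_2", "size": "8192"},
    {"index": "my-index", "shard": "1", "prirep": "p", "segment": "_0", "size": "73400320"},
]

for shard_copy, counts in summarize_segments(sample_rows).items():
    print(shard_copy, counts)
```

A shard where most segments fall under the cutoff is a hint that refreshes are outpacing merges.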

The Trade-Off Matrix: Picking Your Strategy

Refresh Interval	Search Visibility	Indexing Throughput	Segment Pressure	Best For
100ms	Near instant	Poor	Very high	Real-time monitoring, chat
1s (default)	~1 second delay	Good	Moderate	General use, e-commerce
5s	~5 second delay	Better	Lower	Analytics, internal tools
30s	~30 second delay	Much better	Low	Log ingestion, metrics
-1 (disabled)	Manual only	Maximum	Minimal	Bulk loads, reindexing

These numbers are approximate. The actual impact depends on document size, shard count, hardware, and query patterns. Always test in your environment.

Three Production Scenarios

Scenario 1: Write-Heavy Log Ingestion

You are indexing 50,000 log lines per second across a 10-node cluster. The default settings struggle to keep up.

Tuning approach:

# Before bulk load
PUT /logs-2026-06-06/_settings
{
  "index": {
    "refresh_interval": "30s",
    "translog": {
      "durability": "async"
    },
    "number_of_replicas": 0
  }
}

# After load completes
PUT /logs-2026-06-06/_settings
{
  "index": {
    "refresh_interval": "5s",
    "translog": {
      "durability": "request"
    },
    "number_of_replicas": 1
  }
}

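Reducing replicas to zero during bulk loading eliminates network replication overhead. Keep in mind that with zero replicas, losing a node means losing the only copy of its shards, so use this only when the source data can be replayed. Add replicas back for redundancy once the index is no longer actively written. This pattern is common in time-series logging pipelines.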

Scenario 2: Search-Heavy E-Commerce

Your product catalog updates once per hour but serves 10,000 searches per minute. Search latency is everything.

Tuning approach:

# Keep default refresh for quick visibility
# Add replicas to spread read load
PUT /products/_settings
{
  "index": {
    "refresh_interval": "1s",
    "number_of_replicas": 2
  }
}

More replicas mean more nodes can handle search queries in parallel. The trade-off is higher storage usage and slower indexing (each write must propagate to all replicas). For read-heavy workloads, this is the correct trade.

Scenario 3: Mixed Workload with ILM

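Your application has hot indices (active writes) and warm indices (read-only). Index Lifecycle Management (ILM) can automate the transition between the two, including the force merge that cleans up the small segments left behind by frequent refreshes.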

PUT /_ilm/policy/mixed-workload-policy
{
  "policy": {
    "phases": {
      "hot": {
        "actions": {
          "set_priority": {
            "priority": 100
          }
        }
      },
      "warm": {
        "min_age": "1d",
        "actions": {
          "forcemerge": {
            "max_num_segments": 1
          },
          "set_priority": {
            "priority": 50
          }
        }
      }
    }
  }
}

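With this policy, indices in the hot phase keep whatever refresh interval their settings or index template specify (1 second unless you change it), while indices that reach the warm phase are force-merged to a single segment for optimal search performance. The policy itself never touches refresh_interval. Once an index stops receiving writes, the setting no longer matters because there is nothing new to publish.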

Common Pitfalls and How to Avoid Them

Pitfall 1: Too Many Tiny Segments

A team set refresh_interval: 100ms to power a real-time dashboard. Within a week, their cluster ran out of file descriptors and queries slowed to a crawl.

Fix: Use the _cat/segments API to monitor segment count. If a shard consistently has more than about 100 segments, your refresh interval is probably too aggressive. Consider increasing it or using a dedicated near-real-time index for the dashboard while keeping the main index at a relaxed setting.

Pitfall 2: Forgetting to Re-enable Refresh

A nightly ETL job disabled refresh for speed. A morning alert fired because the previous day's data was invisible. The script had no cleanup step.

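Fix: Always wrap bulk operations in a try-finally block that restores the refresh interval. Index templates can also help by giving every new index a sensible default refresh_interval, although a template cannot stop a script from overriding the setting later.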
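Here is one way the try-finally idea could look in Python, written as a context manager. It is a minimal sketch: put_settings is a hypothetical callable that you would implement with your own HTTP client to issue PUT /<index>/_settings, and the demo uses a stub that only records calls, so nothing here talks to a real cluster.

```python
from contextlib import contextmanager


@contextmanager
def relaxed_refresh(put_settings, index, bulk_interval="-1", restore_interval="1s"):
    """Relax refresh for the duration of a bulk load, then always restore it.

    put_settings is a hypothetical callable(index, body) that sends
    PUT /<index>/_settings using whatever HTTP client you prefer.
    """
    put_settings(index, {"index": {"refresh_interval": bulk_interval}})
    try:
        yield
    finally:
        # Runs on success and on failure, so the index never stays invisible.
        put_settings(index, {"index": {"refresh_interval": restore_interval}})


# Demo with a stub that records the settings changes instead of sending them.
calls = []


def record_settings(index, body):
    calls.append((index, body["index"]["refresh_interval"]))


try:
    with relaxed_refresh(record_settings, "my-index"):
        raise RuntimeError("bulk load failed halfway")
except RuntimeError:
    pass

print(calls)  # [('my-index', '-1'), ('my-index', '1s')]
```

Even though the simulated load blows up, the second call still restores the interval, which is exactly the cleanup step the nightly job was missing.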

Pitfall 3: Flush Storms Under Heavy Write

With translog.durability: request and a small translog size, a burst of writes triggers continuous flushes. Each flush is a disk I/O spike, causing query latency to jitter.

Fix: Increase the translog flush threshold size for write-heavy indices:

PUT /my-index/_settings
{
  "index": {
    "translog": {
      "flush_threshold_size": "1gb"
    }
  }
}

Pitfall 4: Confusing Refresh with Durability

An engineer assumed that after a refresh, data was safe on disk. A node crash seconds later lost the last batch of documents (which can only happen if the translog had not been fsynced yet, for example with async durability).

Fix: Remember: refresh makes data searchable. Flush makes data durable. Only a synced translog and flush protect against crashes.

Pitfall 5: Ignoring Refresh in Benchmarks

A performance benchmark reported 50,000 docs/second. In production, throughput dropped to 15,000 because the benchmark had disabled refresh while the production index used the default 1s.

Fix: Always benchmark with production-equivalent settings. Document your benchmark configuration clearly so others do not copy the wrong numbers.

Diagnostic Toolkit

# Refresh statistics: total time and count
GET /my-index/_stats/refresh

# Translog statistics: size, operations, uncommitted
GET /my-index/_stats/translog

# Segment details: count, size, merge status
GET /_cat/segments/my-index?v=true&s=size:desc

# Indexing counters per node: cumulative totals, indexing time, throttling
GET /_nodes/stats/indices/indexing

# Refresh and translog settings, including defaults
GET /my-index/_settings/index.refresh_interval,index.translog.*?flat_settings=true&include_defaults=true

# Cumulative indexing totals per index (sample twice to derive a rate)
GET /_cat/indices?v=true&h=index,docs.count,docs.deleted,indexing.index_total,store.size
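The stats APIs report cumulative counters, so a rate has to be derived from two samples. The Python sketch below shows the arithmetic, assuming you have already pulled the indexing and refresh sections out of two GET /my-index/_stats responses. The function names and the sample numbers are illustrative.

```python
def indexing_rate(earlier, later, elapsed_seconds):
    """Documents indexed per second between two samples of the indexing stats.

    earlier / later: the "indexing" section of an index stats response,
    which carries the cumulative index_total counter.
    """
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return (later["index_total"] - earlier["index_total"]) / elapsed_seconds


def average_refresh_ms(refresh_stats):
    """Mean time per refresh, from the cumulative refresh stats section."""
    total = refresh_stats["total"]
    return refresh_stats["total_time_in_millis"] / total if total else 0.0


# Made-up samples taken 30 seconds apart.
sample_a = {"index_total": 1_000_000}
sample_b = {"index_total": 1_360_000}
print(indexing_rate(sample_a, sample_b, 30))

print(average_refresh_ms({"total": 200, "total_time_in_millis": 9_000}))
```

A rising average refresh time alongside a flat indexing rate is usually worth a closer look at segment counts.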

Performance Numbers from Production

Here are real numbers I have measured in production environments. Your mileage will vary based on hardware, document size, and query complexity.

Configuration	Indexing Rate	Search Latency (p95)	Segment Count
Default (1s refresh, request durability)	12,000 docs/sec	45ms	35 per shard
30s refresh, request durability	18,000 docs/sec	38ms	12 per shard
30s refresh, async durability	22,000 docs/sec	38ms	12 per shard
Disabled refresh, async durability	28,000 docs/sec	N/A (not searchable)	3 per shard

The jump from 12,000 to 18,000 docs/sec simply by relaxing the refresh interval demonstrates how significant this tuning can be. The 28,000 docs/sec with refresh disabled was the highest rate I measured, but remember that data is invisible until you re-enable refresh or trigger one manually.
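For quick comparison, the same table can be expressed as speedups over the default configuration. This small Python snippet only redoes the arithmetic on the indexing rates listed above; the helper name is illustrative.

```python
def speedup(rate, baseline):
    """Ratio of an indexing rate to the baseline rate."""
    return rate / baseline


baseline_rate = 12_000  # default: 1s refresh, request durability

measured = {
    "30s refresh, request durability": 18_000,
    "30s refresh, async durability": 22_000,
    "Disabled refresh, async durability": 28_000,
}

for config, rate in measured.items():
    print(f"{config}: {speedup(rate, baseline_rate):.2f}x the default")
```

Relaxing refresh alone accounts for the first 1.5x, and async durability adds a smaller step on top of it.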

Conclusion: Choose Your Trade-Off

Refresh and flush are the control points for Elasticsearch's most fundamental trade-off: search visibility versus write efficiency. There is no universal best setting. A logging pipeline shipping millions of events per second needs a different configuration than a product search engine serving real-time shoppers.

The key principles are:

For write-heavy: Relax or disable refresh during bulk loads. Use async translog if you can tolerate minor durability risk. Re-enable and add replicas after loading completes.
For search-heavy: Keep the default 1-second refresh. Add replicas to spread read load. Monitor segment count and merge pressure.
For mixed workloads: Use index lifecycle management to transition settings as data ages. Force-merge old indices to a single segment.
Always monitor: Track refresh time, segment count, translog size, and indexing rate. Set alerts for unusual patterns.

Understanding these mechanisms lets you move beyond cargo-cult tuning. You will know why a setting helps, what it costs, and when to change it. That is the difference between running Elasticsearch and mastering it.

References

Elasticsearch Documentation - Index Modules: https://www.elastic.co/guide/en/elasticsearch/reference/current/index-modules.html
Lucene Documentation - Segments: https://lucene.apache.org/core/documentation.html
Elasticsearch Refresh API: https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-refresh.html
Elasticsearch Flush API: https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-flush.html

I'm Prithvi S, Staff Software Engineer at Cloudera and open-source enthusiast. I write about search infrastructure, distributed systems, and the engineering decisions that make them reliable. Follow my work on GitHub.
