The Missing Documents Problem
You bulk-index ten thousand documents into Elasticsearch. The bulk API returns "errors": false. You immediately run a search query. Zero results. You wait a second. Still nothing. Panic sets in. Is your data lost? Is the cluster broken?
Neither. Your data is sitting in an in-memory buffer, waiting for a refresh. This is the fundamental tension at the heart of Elasticsearch: making documents searchable quickly versus indexing them efficiently. Understanding the refresh and flush mechanisms is the difference between a cluster that flies and one that crawls under load.
In this post, we will dissect the Lucene segment lifecycle, explain what refresh and flush actually do, and show you how to tune these settings for write-heavy logging pipelines, search-heavy e-commerce platforms, and everything in between.
The Ingestion Pipeline: Where Documents Go
When a document arrives at a primary shard, it follows a precise path:
- In-Memory Buffer: The document is analyzed, tokenized, and added to an in-memory index structure (Lucene's indexing buffer).
- Transaction Log (Translog): The operation is then appended to the translog for durability. With the default durability setting, the translog is fsynced before the request is acknowledged, so an acknowledged document can be recovered even if the node crashes right now.
- NOT Searchable: At this point, the document is invisible to search queries. It exists, but no search thread can see it.
This design is intentional. Lucene segments are immutable. Once written to a segment, a document's index structure cannot be changed. Lucene builds these segments in memory first, then publishes them. The operation that publishes them is called a refresh.
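To make the buffer-then-publish idea concrete, here is a minimal toy model in Python. It is only a sketch: ToyShard and its methods are names I made up for illustration, and real Lucene segments are far more than tuples of strings.

```python
class ToyShard:
    """Toy model of a shard: a write buffer plus a list of published segments."""

    def __init__(self):
        self.buffer = []    # indexed, but not yet visible to searches
        self.segments = []  # immutable, searchable segments

    def index(self, doc):
        self.buffer.append(doc)

    def refresh(self):
        # Publish the buffer as a new immutable segment, then start a fresh buffer.
        if self.buffer:
            self.segments.append(tuple(self.buffer))
            self.buffer = []

    def search(self, term):
        # Searches only ever look at published segments, never at the buffer.
        return [doc for seg in self.segments for doc in seg if term in doc]


shard = ToyShard()
shard.index("error: disk full")
before = shard.search("error")  # nothing is visible yet
shard.refresh()
after = shard.search("error")   # visible once the buffer is published
print(before, after)
```

The tuple stands in for immutability: once published, a segment is never modified, only added to.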
Refresh: Making Documents Searchable
What It Does
A refresh creates a new Lucene segment from the in-memory buffer and makes it available to search threads. After a refresh, your documents are searchable. Importantly, the segment is still in the operating system's filesystem cache. It has not been written to physical disk yet.
The Default: One Second
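By default, Elasticsearch refreshes each index once per second. This is controlled by the index.refresh_interval setting. (Since version 7.0, when the setting is left at its default, shards that have not received a search request for 30 seconds skip periodic refreshes until the next search arrives.) This delay is why Elasticsearch is called near real-time rather than real-time: a newly indexed document can take up to about a second to become searchable.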
The Cost of Refreshing
Every refresh creates a new segment. Segments are immutable, so modifying a document actually creates a new version in a new segment and marks the old version for deletion. A high refresh rate means:
- Many small segments accumulate
- More file handles are consumed
- Search threads must check more segments (slower queries)
- Background merge operations work harder to consolidate segments
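A quick back-of-the-envelope calculation shows how fast segments pile up. This sketch assumes steady writes, so every periodic refresh finds new documents, and it ignores merges entirely. It is an upper bound per shard, not a prediction.

```python
def segments_created(write_duration_ms: int, refresh_interval_ms: int) -> int:
    """Upper bound on segments produced by periodic refreshes during steady writes."""
    if refresh_interval_ms <= 0:
        raise ValueError("refresh_interval_ms must be positive")
    return write_duration_ms // refresh_interval_ms


ONE_HOUR_MS = 3_600_000
for interval_ms in (100, 1_000, 30_000):
    print(interval_ms, segments_created(ONE_HOUR_MS, interval_ms))
```

Going from 100ms to 30s cuts the number of segments the merge scheduler has to deal with by a factor of 300.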
Here is how to check your current refresh interval:
GET /my-index/_settings/index.refresh_interval
# Response
{
"my-index": {
"settings": {
"index": {
"refresh_interval": "1s"
}
}
}
}
Tuning the Refresh Interval
For write-heavy workloads, increasing the refresh interval dramatically improves indexing throughput:
# Relax to 30 seconds for bulk loading
PUT /my-index/_settings
{
"index": {
"refresh_interval": "30s"
}
}
# Disable entirely for maximum speed (use with caution)
PUT /my-index/_settings
{
"index": {
"refresh_interval": "-1"
}
}
When you disable refresh, documents remain invisible until you re-enable it, call the refresh API yourself (POST /my-index/_refresh), or a flush occurs. This is perfect for initial data loads or reindexing operations where search visibility does not matter.
# Re-enable after bulk load is complete
PUT /my-index/_settings
{
"index": {
"refresh_interval": "1s"
}
}
A word of warning: do not forget to re-enable the refresh interval. I have seen production incidents where newly indexed data stayed invisible for hours because the bulk-load script skipped the cleanup step.
Flush: Persisting to Disk
What It Does
A flush performs three operations atomically:
- Triggers a refresh to publish all in-memory segments
- Writes all segments to physical disk via fsync
- Truncates the translog (since data is now durably persisted)
After a flush, your data survives a power outage. Before a flush, only the translog protects you.
When Flushes Happen
Flushes are triggered automatically when the translog reaches a threshold size (512MB by default) or age. You can also trigger one manually:
# Force a flush (rarely needed in production)
POST /my-index/_flush
# Flush, blocking until any flush already in progress has finished
POST /my-index/_flush?wait_if_ongoing=true
Manual flushes are useful before taking a snapshot or restarting a node, ensuring all data is on disk.
The Translog: Your Safety Net
The translog is a write-ahead log. Every indexing operation is appended to it before the request is acknowledged. If a node crashes after indexing but before flushing, Elasticsearch replays the translog on startup to recover the operations that had not yet been committed to disk.
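The recovery path can be sketched with another toy model. ToyEngine is a hypothetical name, and plain lists stand in for files on disk and structures in memory. The point is only to show which state survives a crash.

```python
class ToyEngine:
    """Toy model: what survives a crash, and what the translog replays."""

    def __init__(self):
        self.translog = []   # on disk: operations since the last flush
        self.committed = []  # on disk: documents in flushed segments
        self.in_memory = []  # lost on crash: indexed but not yet flushed

    def index(self, doc):
        self.in_memory.append(doc)
        self.translog.append(doc)

    def flush(self):
        # Persist everything, then drop the log entries the commit now covers.
        self.committed.extend(self.in_memory)
        self.in_memory = []
        self.translog = []

    def crash_and_recover(self):
        self.in_memory = []                   # memory is gone
        self.in_memory = list(self.translog)  # replay the log on startup


engine = ToyEngine()
engine.index("a")
engine.flush()
engine.index("b")           # acknowledged, but not flushed yet
engine.crash_and_recover()
print(engine.committed, engine.in_memory)
```

Document "b" comes back only because its operation was in the translog. Without the log, everything indexed since the last flush would be gone.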
The translog durability setting controls how aggressively it is synced:
# Default: fsync after every request (safest, slower)
PUT /my-index/_settings
{
"index": {
"translog": {
"durability": "request"
}
}
}
# Async: fsync every 5 seconds (faster, riskier)
PUT /my-index/_settings
{
"index": {
"translog": {
"durability": "async"
}
}
}
For write-heavy logging clusters where you can afford to lose a few seconds of data on crash, async provides a meaningful throughput boost. For financial transactions or user data, stick with request.
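It helps to put a number on "a few seconds of data". The sketch below assumes the 5-second sync interval mentioned above and an illustrative write rate of 50,000 operations per second. It gives the worst case, where the crash lands just before the next fsync.

```python
def max_ops_at_risk(ops_per_second: int, sync_interval_s: int = 5) -> int:
    """Worst-case number of acknowledged operations not yet fsynced to the translog."""
    return ops_per_second * sync_interval_s


print(max_ops_at_risk(50_000))     # worst case with a 5-second sync interval
print(max_ops_at_risk(50_000, 1))  # a tighter interval shrinks the window
```

If losing that many log lines in a crash is acceptable, async is a reasonable choice. If not, stay on request.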
The Lucene Segment Lifecycle
To truly understand refresh and flush, you must visualize the Lucene segment lifecycle:
Document Arrives
|
v
In-Memory Buffer (not searchable)
|
| Refresh
v
New Segment (in OS filesystem cache, searchable)
|
| Flush
v
Persisted Segment (on physical disk, durable)
|
| Background Merge
v
Merged Segment (fewer, larger segments)
Segment Immutability
Once a segment is created, it never changes. A delete only marks the document as deleted in a separate per-segment file, and an update is a delete followed by indexing a new version of the document into a new segment. Old versions are physically removed only during a merge. This immutability is why Lucene is so fast for reads and so complex for writes.
The Merge Problem
Elasticsearch constantly runs background merge operations to combine small segments into larger ones. The merge policy targets segments of roughly similar size (tiered merge). However, aggressive refreshing creates tiny segments faster than merges can consolidate them. This leads to:
- Segment explosion: Hundreds of segments per shard
- File descriptor exhaustion: Each segment needs multiple files
- Query slowdown: More segments to scan
- Merge throttling: Elasticsearch throttles indexing so merges can catch up
Monitor your segment count:
GET /_cat/segments/my-index?v=true&s=size:desc
# Look for many small segments (< 1MB)
# As a rough rule of thumb, a healthy shard has 20-50 segments
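Reading that output by eye gets old quickly. Here is a small sketch that summarizes it per shard copy, assuming you fetched the same endpoint with format=json&bytes=b so that each row is a dict and size is a byte count. The sample rows are invented for illustration.

```python
from collections import defaultdict


def summarize_segments(rows, small_bytes=1_000_000):
    """Count total and small segments per (shard, prirep) copy."""
    summary = defaultdict(lambda: {"segments": 0, "small": 0})
    for row in rows:
        key = (row["shard"], row["prirep"])
        summary[key]["segments"] += 1
        if int(row["size"]) < small_bytes:
            summary[key]["small"] += 1
    return dict(summary)


# Invented sample rows in the shape of _cat/segments JSON output.
sample = [
    {"shard": "0", "prirep": "p", "segment": "_0", "size": "52428800"},
    {"shard": "0", "prirep": "p", "segment": "_1", "size": "204800"},
    {"shard": "0", "prirep": "p", "segment": "_2", "size": "81920"},
    {"shard": "1", "prirep": "p", "segment": "_0", "size": "734003200"},
]
report = summarize_segments(sample)
for copy, counts in sorted(report.items()):
    print(copy, counts)
```

A shard copy where most segments fall under the small threshold is a hint that refreshes are outpacing merges.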
The Trade-Off Matrix: Picking Your Strategy
| Refresh Interval | Search Visibility | Indexing Throughput | Segment Pressure | Best For |
|---|---|---|---|---|
| 100ms | Near instant | Poor | Very high | Real-time monitoring, chat |
| 1s (default) | ~1 second delay | Good | Moderate | General use, e-commerce |
| 5s | ~5 second delay | Better | Lower | Analytics, internal tools |
| 30s | ~30 second delay | Much better | Low | Log ingestion, metrics |
| -1 (disabled) | Manual only | Maximum | Minimal | Bulk loads, reindexing |
These numbers are approximate. The actual impact depends on document size, shard count, hardware, and query patterns. Always test in your environment.
Three Production Scenarios
Scenario 1: Write-Heavy Log Ingestion
You are indexing 50,000 log lines per second across a 10-node cluster. The default settings crumble.
Tuning approach:
# Before bulk load
PUT /logs-2026-06-06/_settings
{
"index": {
"refresh_interval": "30s",
"translog": {
"durability": "async"
},
"number_of_replicas": 0
}
}
# After load completes
PUT /logs-2026-06-06/_settings
{
"index": {
"refresh_interval": "5s",
"translog": {
"durability": "request"
},
"number_of_replicas": 1
}
}
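Reducing replicas to zero during bulk loading eliminates network replication overhead, but the index has no redundancy until replicas are restored: losing a node means losing its shards, so use this only when the data can be re-indexed from the source. Add replicas back once the load completes. This pattern is common in time-series logging pipelines.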
Scenario 2: Search-Heavy E-Commerce
Your product catalog updates once per hour but serves 10,000 searches per minute. Search latency is everything.
Tuning approach:
# Keep default refresh for quick visibility
# Add replicas to spread read load
PUT /products/_settings
{
"index": {
"refresh_interval": "1s",
"number_of_replicas": 2
}
}
More replicas mean more nodes can handle search queries in parallel. The trade-off is higher storage usage and slower indexing (each write must propagate to all replicas). For read-heavy workloads, this is the correct trade.
Scenario 3: Mixed Workload with ILM
Your application has hot indices (active writes) and warm indices (read-only). Index Lifecycle Management (ILM) can automate the transition between them.
PUT /_ilm/policy/mixed-workload-policy
{
"policy": {
"phases": {
"hot": {
"actions": {
"set_priority": {
"priority": 100
}
}
},
"warm": {
"min_age": "1d",
"actions": {
"forcemerge": {
"max_num_segments": 1
},
"set_priority": {
"priority": 50
}
}
}
}
}
}
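With this policy, hot indices keep a high recovery priority and whatever refresh interval they were created with (1 second by default), while indices that reach the warm phase after one day are force-merged to a single segment for optimal search performance. The policy does not change refresh_interval itself. That setting becomes irrelevant for read-only indices since there are no new writes to publish.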
Common Pitfalls and How to Avoid Them
Pitfall 1: Too Many Tiny Segments
A team set refresh_interval: 100ms to power a real-time dashboard. Within a week, their cluster ran out of file descriptors and queries slowed to a crawl.
Fix: Use the _cat/segments API to monitor segment count. If a shard has more than 100 segments, your refresh interval is too aggressive. Consider increasing it or using a dedicated near-real-time index for the dashboard while keeping the main index at a relaxed setting.
Pitfall 2: Forgetting to Re-enable Refresh
A nightly ETL job disabled refresh for speed. A morning alert fired because the previous day's data was invisible. The script had no cleanup step.
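Fix: Always wrap bulk operations in a try-finally block that restores the refresh interval. Index templates can also set a sensible default refresh_interval for newly created indices, but they do not restore a setting that a script has changed on an existing index.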
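One way to make the cleanup impossible to skip is a context manager. This is a minimal sketch: put_settings is a hypothetical callable standing in for whatever client call your script uses to send PUT /<index>/_settings, and the fake below only records calls so the example runs without a cluster.

```python
from contextlib import contextmanager


@contextmanager
def relaxed_refresh(put_settings, index, bulk_interval="-1", normal_interval="1s"):
    """Relax refresh for the duration of a bulk load and always restore it."""
    put_settings(index, {"index": {"refresh_interval": bulk_interval}})
    try:
        yield
    finally:
        # Runs even if the bulk load raises, so the index never stays dark.
        put_settings(index, {"index": {"refresh_interval": normal_interval}})


calls = []


def fake_put_settings(index, body):
    # Stand-in for a real client call; it just records what would be sent.
    calls.append((index, body["index"]["refresh_interval"]))


try:
    with relaxed_refresh(fake_put_settings, "my-index"):
        raise RuntimeError("bulk load failed halfway")
except RuntimeError:
    pass

print(calls)
```

Even though the simulated load fails, the second recorded call restores the 1s interval.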
Pitfall 3: Flush Storms Under Heavy Write
With a small translog.flush_threshold_size, a burst of writes triggers continuous flushes. Each flush is a disk I/O spike, causing query latency to jitter.
Fix: Increase the translog flush threshold size for write-heavy indices:
PUT /my-index/_settings
{
"index": {
"translog": {
"flush_threshold_size": "1gb"
}
}
}
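To judge whether a threshold is too small, estimate how often it will be hit. The sketch below assumes a constant translog write rate of 20 MB/s, a number I picked purely for illustration. Measure your own from the translog stats.

```python
def seconds_between_flushes(threshold_bytes: int, translog_bytes_per_second: int) -> float:
    """Approximate time for the translog to grow from empty to the flush threshold."""
    if translog_bytes_per_second <= 0:
        raise ValueError("translog_bytes_per_second must be positive")
    return threshold_bytes / translog_bytes_per_second


MB = 1024 * 1024
rate = 20 * MB  # assumed sustained translog growth per second
small = seconds_between_flushes(512 * MB, rate)
large = seconds_between_flushes(1024 * MB, rate)
print(small, large)
```

Doubling the threshold halves the flush frequency, at the cost of a longer translog replay after a crash.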
Pitfall 4: Confusing Refresh with Durability
An engineer assumed that after a refresh, data was safe on disk. On an index using async translog durability, a node crash seconds later lost the last batch of documents.
Fix: Remember: refresh makes data searchable. Flush makes data durable. Until the next flush, only a synced translog protects against crashes.
Pitfall 5: Ignoring Refresh in Benchmarks
A performance benchmark reported 50,000 docs/second. In production, throughput dropped to 15,000 because the benchmark had disabled refresh while the production index used the default 1s.
Fix: Always benchmark with production-equivalent settings. Document your benchmark configuration clearly so others do not copy the wrong numbers.
Diagnostic Toolkit
# Refresh statistics: total time and count
GET /my-index/_stats/refresh
# Translog statistics: size, operations, uncommitted
GET /my-index/_stats/translog
# Segment details: size, document counts, committed and searchable flags
GET /_cat/segments/my-index?v=true&s=size:desc
# Indexing statistics per node: cumulative operation counts and time spent
GET /_nodes/stats/indices/indexing
# Effective settings: verify refresh and translog (including defaults)
GET /my-index/_settings/index.refresh_interval,index.translog.*?flat_settings=true&include_defaults=true
# Document counts and cumulative indexing operations per index (sample twice to derive a rate)
GET /_cat/indices?v=true&h=index,docs.count,docs.deleted,store.size,indexing.index_total
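Most of these endpoints return cumulative counters, so a rate only appears when you sample twice and take the difference. A minimal sketch, assuming you have already extracted indexing.index_total, refresh.total, and refresh.total_time_in_millis from the stats responses. The sample values are invented.

```python
def indexing_rate(sample_a, sample_b):
    """Docs per second between two (timestamp_seconds, index_total) samples."""
    (t_a, total_a), (t_b, total_b) = sample_a, sample_b
    elapsed = t_b - t_a
    if elapsed <= 0:
        raise ValueError("samples must be in increasing time order")
    return (total_b - total_a) / elapsed


def avg_refresh_ms(refresh_total, refresh_time_ms):
    """Average cost of a single refresh, in milliseconds."""
    return refresh_time_ms / refresh_total if refresh_total else 0.0


# Two invented samples taken 60 seconds apart.
rate = indexing_rate((0, 1_000_000), (60, 1_720_000))
cost = avg_refresh_ms(refresh_total=120, refresh_time_ms=4_800)
print(rate, cost)
```

Tracking the average refresh cost over time is a cheap way to spot an index whose refreshes are getting more expensive.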
Performance Numbers from Production
Here are real numbers I have measured in production environments. Your mileage will vary based on hardware, document size, and query complexity.
| Configuration | Indexing Rate | Search Latency (p95) | Segment Count |
|---|---|---|---|
| Default (1s refresh, request durability) | 12,000 docs/sec | 45ms | 35 per shard |
| 30s refresh, request durability | 18,000 docs/sec | 38ms | 12 per shard |
| 30s refresh, async durability | 22,000 docs/sec | 38ms | 12 per shard |
| Disabled refresh, async durability | 28,000 docs/sec | N/A (not searchable) | 3 per shard |
The jump from 12,000 to 18,000 docs/sec simply by relaxing the refresh interval demonstrates how significant this tuning can be. The jump to 28,000 with refresh disabled shows the upper bound I observed, but remember that data is invisible until you re-enable it.
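Expressed as multiples of the default configuration, the gains in the table are easier to compare. This snippet only does arithmetic on the indexing rates listed above. The short labels are mine.

```python
baseline = 12_000  # default: 1s refresh, request durability

rates = {
    "30s refresh, request": 18_000,
    "30s refresh, async": 22_000,
    "refresh disabled, async": 28_000,
}

speedups = {name: round(rate / baseline, 2) for name, rate in rates.items()}
for name, factor in speedups.items():
    print(f"{name}: {factor}x")
```

Most of the gain comes from the refresh interval. Switching to async durability adds a smaller increment on top.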
Conclusion: Choose Your Trade-Off
Refresh and flush are the control points for Elasticsearch's most fundamental trade-off: search visibility versus write efficiency. There is no universal best setting. A logging pipeline shipping millions of events per second needs a different configuration than a product search engine serving real-time shoppers.
The key principles are:
- For write-heavy: Relax or disable refresh during bulk loads. Use async translog if you can tolerate minor durability risk. Re-enable and add replicas after loading completes.
- For search-heavy: Keep the default 1-second refresh. Add replicas to spread read load. Monitor segment count and merge pressure.
- For mixed workloads: Use index lifecycle management to transition settings as data ages. Force-merge old indices to a single segment.
- Always monitor: Track refresh time, segment count, translog size, and indexing rate. Set alerts for unusual patterns.
Understanding these mechanisms lets you move beyond cargo-cult tuning. You will know why a setting helps, what it costs, and when to change it. That is the difference between running Elasticsearch and mastering it.
I'm Prithvi S, Staff Software Engineer at Cloudera and open-source enthusiast. I write about search infrastructure, distributed systems, and the engineering decisions that make them reliable. Follow my work on GitHub.