This post lists a few advantages of using time-based indices (as well as data streams) in Elasticsearch:
- Increasing / Decreasing the number of shards becomes easy
- Helps to plan cluster capacity and growth size
- Easily determine optimum number of shards
1. Increasing / Decreasing the number of shards becomes easy
Say an index template that creates day-wise indices is configured with 1 shard in its index settings. If the indexing rate is too slow or the shard size grows too large (> 40-50 GB), the template can simply be modified to raise "number_of_shards" to 3, 5, or n. The change takes effect from the next day, when the next day-wise index is created. Similarly, if a day-wise index pattern is configured with more shards than required (i.e. it is oversharded), reducing the shard count is just as easy: change the template, and the new value applies from the next day's index onward. Existing indices keep their original shard count unless they are reindexed.
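As a minimal sketch, assuming a composable index template (the `PUT _index_template/<name>` API, available since Elasticsearch 7.8) and a hypothetical `logs-YYYY.MM.DD` naming scheme, the Python below builds a template body with a chosen shard count and shows which day-wise index the new value would first apply to. The template name `logs-daily`, the `logs-*` pattern and both helper functions are illustrative assumptions, not taken from the post; the script only prints the request instead of sending it to a cluster.

```python
import json
from datetime import date, timedelta


def build_index_template(index_pattern: str, shards: int, replicas: int = 1) -> dict:
    """Return a composable index template body with the given shard settings."""
    if shards < 1:
        raise ValueError("number_of_shards must be at least 1")
    return {
        "index_patterns": [index_pattern],
        "template": {
            "settings": {
                "number_of_shards": shards,
                "number_of_replicas": replicas,
            }
        },
    }


def daily_index_name(prefix: str, day: date) -> str:
    """Hypothetical day-wise naming scheme, e.g. logs-2024.01.31."""
    return f"{prefix}-{day:%Y.%m.%d}"


if __name__ == "__main__":
    today = date.today()
    body = build_index_template("logs-*", shards=3)
    # A template change only affects indices created after it is saved,
    # so today's index keeps its old shard count and tomorrow's gets 3.
    print("PUT _index_template/logs-daily")
    print(json.dumps(body, indent=2))
    print("first index with new settings:",
          daily_index_name("logs", today + timedelta(days=1)))
```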
2. Helps to plan cluster capacity and growth size
Let's say 100 events per second flow into an Elasticsearch cluster and each event averages about 1 KB in size. Per day, that is:
86,400 seconds * 100 events/second = 8,640,000 events.
Since each event averages about 1 KB, the total size of 8,640,000 events is 8,640,000 * 1 KB = 8,640,000 KB, and 8,640,000 KB / (1024 * 1024) = ~8.24 GB.
Thus, with a day-wise index, the index size would be roughly 9 GB per day (rounding ~8.24 GB up) without any replicas. With 1 replica, the size per day would be ~18 GB, and the size for 30 days would be ~540 GB. This helps with capacity planning and estimating the cluster's growth rate.
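The arithmetic above can be reproduced with a small Python helper, sketched under the post's assumptions (a steady 100 events/second, ~1 KB per event, 1024-based units). The function name `estimate_storage_gb` is hypothetical. It counts raw event bytes only and ignores indexing overhead and compression, so treat the result as a rough starting point rather than an on-disk figure.

```python
def estimate_storage_gb(events_per_sec: float, event_kb: float,
                        replicas: int = 1, days: int = 30) -> dict:
    """Estimate raw daily and retention-period storage in GB (1 GB = 1024 * 1024 KB)."""
    seconds_per_day = 86_400
    events_per_day = seconds_per_day * events_per_sec
    primary_gb_per_day = events_per_day * event_kb / (1024 * 1024)
    # Each replica stores a full copy of the primary data.
    total_gb_per_day = primary_gb_per_day * (1 + replicas)
    return {
        "events_per_day": events_per_day,
        "primary_gb_per_day": primary_gb_per_day,
        "total_gb_per_day": total_gb_per_day,
        "total_gb_for_period": total_gb_per_day * days,
    }


if __name__ == "__main__":
    est = estimate_storage_gb(events_per_sec=100, event_kb=1)
    for key, value in est.items():
        print(f"{key}: {value:,.2f}")
```

With these inputs the helper reports about 8.24 GB of primary data per day, about 16.48 GB per day with one replica, and about 494 GB for 30 days; the ~18 GB and ~540 GB figures in the post come from rounding the daily size up to 9 GB first, which leaves some headroom.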
3. Easily determine optimum number of shards
With a data set of about 9 GB per day for a day-wise index, we could start by setting "number_of_shards" : 1 in the index template, since each primary shard would then be about 9 GB, which is a reasonable size for a single shard. Shards for time-based indices can be in the range of 10-50 GB, as recommended in Elastic's shard sizing guidance. With a bit of trial and error based on the daily ingestion rate, we can arrive at an optimum shard size that helps stabilize the cluster and improve performance.
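One way to turn the 10-50 GB guideline into a starting value for "number_of_shards" is to divide the expected daily primary size by a target shard size and round up. The sketch below assumes a hypothetical 30 GB target (inside the 10-50 GB range); the function name `suggest_shard_count` is also illustrative, and the result is only a first guess to refine through trial and error.

```python
import math


def suggest_shard_count(primary_gb_per_day: float, target_shard_gb: float = 30.0) -> int:
    """Suggest number_of_shards for a day-wise index from its expected primary size."""
    if target_shard_gb <= 0:
        raise ValueError("target shard size must be positive")
    if primary_gb_per_day <= 0:
        return 1
    # Round up so the average primary shard stays at or below the target size.
    return max(1, math.ceil(primary_gb_per_day / target_shard_gb))


if __name__ == "__main__":
    for daily_gb in (9, 45, 120):
        shards = suggest_shard_count(daily_gb)
        print(f"{daily_gb} GB/day -> {shards} shard(s), ~{daily_gb / shards:.1f} GB each")
```

For the ~9 GB/day example this returns 1, matching the suggestion of "number_of_shards" : 1 above; larger daily volumes get proportionally more shards.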