Kafka Data Retention: Symphony of retention.ms & segment.ms

Recently, puzzling over why very old data was still sitting on a Kafka topic led me to a deeper understanding of the often underestimated collaboration between retention.ms and segment.ms. Far from a simple time-to-live setting, these two configurations work in tandem and significantly shape Kafka's data retention behaviour.

Going by the documentation, retention.ms reads as "the" configuration that shapes data retention. Yet the true revelation lies in the intricate dance between retention.ms and its counterpart, segment.ms.

Working in tandem, these configurations shape Kafka's data retention with precision. Each topic partition divides its log into smaller, manageable chunks known as segments. When the log rolls to a new segment is controlled by segment.bytes (size-based) and segment.ms (time-based). Crucially, Kafka deletes data a whole segment at a time and never deletes the currently active segment, so retention.ms only takes effect once a segment has been closed. This dance becomes especially critical for slow-paced topics, where waiting for segment.bytes alone to fill up could delay a rollover, and therefore deletion, far beyond the intended retention window.
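
As a minimal sketch of how the two settings are applied together, here is what adjusting them on an existing topic could look like with the Kafka AdminClient (the topic name "events", the broker address, and the chosen values are illustrative assumptions, not part of the original post):

```java
import java.util.Collection;
import java.util.List;
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AlterConfigOp;
import org.apache.kafka.clients.admin.ConfigEntry;
import org.apache.kafka.common.config.ConfigResource;

public class RetentionTuning {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // adjust for your cluster

        try (AdminClient admin = AdminClient.create(props)) {
            // Hypothetical topic used purely for illustration.
            ConfigResource topic = new ConfigResource(ConfigResource.Type.TOPIC, "events");

            Collection<AlterConfigOp> ops = List.of(
                    // Keep data for 12 hours...
                    new AlterConfigOp(new ConfigEntry("retention.ms", "43200000"), AlterConfigOp.OpType.SET),
                    // ...and roll a new segment at least every 15 minutes, so closed
                    // segments become eligible for deletion reasonably quickly.
                    new AlterConfigOp(new ConfigEntry("segment.ms", "900000"), AlterConfigOp.OpType.SET));

            admin.incrementalAlterConfigs(Map.of(topic, ops)).all().get();
        }
    }
}
```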

The narrative extends beyond local disks for those venturing into tiered storage. The orchestrated dance expands into the cloud, where retention.bytes and retention.ms govern the overall retention of data across tiers. Meanwhile, their local counterpart configurations, retention.local.target.bytes and retention.local.target.ms, take center stage in managing how long data lingers on the brokers' local storage.
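
As a rough sketch, assuming Apache Kafka 3.6+ with tiered storage (KIP-405) enabled on the brokers, where the local counterparts are named local.retention.ms and local.retention.bytes (names can differ by distribution), the topic-level settings might look like this; all values here are illustrative:

```java
import java.util.Map;

public class TieredStorageConfigs {
    public static void main(String[] args) {
        // Illustrative topic-level settings for a tiered-storage topic.
        Map<String, String> configs = Map.of(
                "remote.storage.enable", "true",       // offload closed segments to remote storage
                "retention.ms", "604800000",           // total retention across tiers: 7 days
                "retention.bytes", "-1",               // no size-based cap on total retention
                "local.retention.ms", "43200000",      // keep only 12 hours on local disks
                "local.retention.bytes", "1073741824"  // ...or at most 1 GiB locally, whichever hits first
        );
        configs.forEach((k, v) -> System.out.println(k + "=" + v));
    }
}
```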

Example:

Let's take a festive example featuring the topic "ChristmasGreetings." This topic is a lively hub of holiday messages from various channels, partitioned for efficiency, and configured to showcase the duality of retention.ms and segment.ms:

Number of Partitions & Segment Rollover Config:
The "ChristmasGreetings" topic boasts 4 partitions, each diligently managing its own log segments.

The size of these segments, governed by segment.bytes, is set to 1 MB, ensuring manageable chunks of holiday cheer.

segment.ms is choreographed to 15 minutes, allowing segments to gracefully age and roll over.
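
To make this setup concrete, here is a minimal sketch of creating such a topic with the Kafka AdminClient; the broker address and replication factor are placeholder assumptions, and retention.ms is taken from the 12-hour scenarios below:

```java
import java.util.Map;
import java.util.Properties;
import java.util.Set;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.NewTopic;

public class CreateChristmasGreetings {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // adjust for your cluster

        try (AdminClient admin = AdminClient.create(props)) {
            // 4 partitions; replication factor 1 for a single-broker demo.
            NewTopic topic = new NewTopic("ChristmasGreetings", 4, (short) 1)
                    .configs(Map.of(
                            "segment.bytes", "1048576",    // roll a new segment after ~1 MB
                            "segment.ms", "900000",        // ...or after 15 minutes, whichever comes first
                            "retention.ms", "43200000"));  // keep holiday messages for 12 hours

            admin.createTopics(Set.of(topic)).all().get();
        }
    }
}
```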

Entering retention.ms - A longer segment.ms:

In this scenario, let's double segment.ms to 30 minutes, while retention.ms is set to 12 hours.

Log segments gracefully roll over every 30 minutes due to segment.ms, but messages are eligible for deletion only after 12 hours, as dictated by retention.ms.

This can lead to some messages persisting longer than needed: a segment is only deleted once it has been closed and its newest record is older than retention.ms, so coarser segments accumulate more holiday messages before becoming eligible for deletion. The temporal control enforced by segment.ms still guarantees a rollover, but the deletion granularity no longer tracks the overall retention policy closely.

A shorter segment.ms:

Now, let's flip the scenario, setting segment.ms to 5 minutes and retaining messages for 12 hours.

Log segments roll over every 5 minutes due to segment.ms, but messages are eligible for deletion only after 12 hours according to retention.ms.

This configuration ensures more frequent rollovers, creating a higher number of smaller log segments. Deletion now tracks the retention policy much more closely, but it comes with overhead: more files, indexes, and per-segment bookkeeping for the broker to manage on every partition.

Balancing Act: A moderate segment.ms:

For a balanced configuration, segment.ms is set back to 15 minutes, a small fraction of the 12-hour retention.ms.

Log segments roll over often enough that deletion keeps pace with the overall retention policy, ensuring efficient data management without unnecessary accumulation of stale messages or an explosion of tiny segments.
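
A quick back-of-the-envelope sketch ties the three scenarios together. Assuming a closed segment is only deleted once its newest record is older than retention.ms (and ignoring the broker's periodic retention check interval), a record can outlive retention.ms by up to roughly one segment.ms:

```java
public class WorstCaseRetention {
    public static void main(String[] args) {
        long retentionMs = 12L * 60 * 60 * 1000; // retention.ms = 12 hours

        // segment.ms values from the three scenarios above: 30, 5, and 15 minutes
        long[] segmentMsOptions = {30 * 60_000L, 5 * 60_000L, 15 * 60_000L};

        for (long segmentMs : segmentMsOptions) {
            // Rough worst case: a record written right after a segment opens must wait
            // for the segment to roll (segment.ms) and then age out (retention.ms).
            long worstCaseMs = segmentMs + retentionMs;
            System.out.printf("segment.ms = %2d min -> oldest data may linger ~%.2f hours%n",
                    segmentMs / 60_000, worstCaseMs / 3_600_000.0);
        }
    }
}
```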

Conclusion:
In this holiday message capturing performance, segment.ms and retention.ms dance to different tempos, showcasing the nuances of Kafka's orchestration within the festive realm of "ChristmasGreetings." For anyone aiming to optimize and fortify their data streaming platforms, understanding the dynamic interplay between retention.ms and segment.ms is therefore essential. Something to keep in mind always!
