<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Rajeev</title>
    <description>The latest articles on DEV Community by Rajeev (@rajeev_a954661bb78eb9797f).</description>
    <link>https://dev.to/rajeev_a954661bb78eb9797f</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3766845%2F2963a715-a2d0-453c-bad4-29aa3288e2cf.png</url>
      <title>DEV Community: Rajeev</title>
      <link>https://dev.to/rajeev_a954661bb78eb9797f</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/rajeev_a954661bb78eb9797f"/>
    <language>en</language>
    <item>
      <title>Kafka Safe Producer Defaults and Version Compatibility Explained</title>
      <dc:creator>Rajeev</dc:creator>
      <pubDate>Wed, 01 Apr 2026 04:37:14 +0000</pubDate>
      <link>https://dev.to/rajeev_a954661bb78eb9797f/kafka-safe-producer-defaults-and-version-compatibility-explained-5457</link>
      <guid>https://dev.to/rajeev_a954661bb78eb9797f/kafka-safe-producer-defaults-and-version-compatibility-explained-5457</guid>
      <description>&lt;p&gt;In the previous article &lt;a href="https://rajeevranjan.dev/blog/kafka-retries-idempotent-producer/" rel="noopener noreferrer"&gt;Kafka Retries and Idempotent Producers Explained&lt;/a&gt;, we discussed how idempotent producers prevent duplicate messages in Kafka even with retries.&lt;/p&gt;

&lt;p&gt;In this article, we will explore &lt;strong&gt;Kafka safe producer defaults&lt;/strong&gt;, what they mean, and how &lt;strong&gt;version compatibility between brokers and clients&lt;/strong&gt; affects them.&lt;/p&gt;




&lt;h2&gt;
  
  
  What Does “Safe Producer” Mean in Kafka?
&lt;/h2&gt;

&lt;p&gt;A &lt;strong&gt;safe producer&lt;/strong&gt; ensures that messages are:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Written &lt;strong&gt;without duplicates&lt;/strong&gt; (idempotence)&lt;/li&gt;
&lt;li&gt;Preserved in &lt;strong&gt;correct order per partition&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Retried safely if transient failures occur&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Kafka achieves this with the following producer settings:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight properties"&gt;&lt;code&gt;&lt;span class="py"&gt;enable.idempotence&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;true&lt;/span&gt;
&lt;span class="py"&gt;acks&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;all&lt;/span&gt;
&lt;span class="py"&gt;retries&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;Integer.MAX_VALUE&lt;/span&gt;
&lt;span class="py"&gt;max.in.flight.requests.per.connection&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;5&lt;/span&gt;
&lt;span class="py"&gt;delivery.timeout.ms&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;120000&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
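&lt;p&gt;As a sketch, the same settings can be assembled in Java before constructing a producer. The bootstrap address and serializer classes below are illustrative assumptions; the property keys are the standard producer config names:&lt;/p&gt;

```java
import java.util.Properties;

public class SafeProducerConfig {
    // Builds the safe-producer settings shown above as a Properties object.
    // Keys are plain strings so the sketch needs no kafka-clients dependency;
    // in real code a KafkaProducer would be constructed from these props.
    public static Properties safeProducerProps() {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // assumed address
        props.put("key.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        props.put("enable.idempotence", "true");
        props.put("acks", "all");
        props.put("retries", Integer.toString(Integer.MAX_VALUE));
        props.put("max.in.flight.requests.per.connection", "5");
        props.put("delivery.timeout.ms", "120000");
        return props;
    }

    public static void main(String[] args) {
        System.out.println(safeProducerProps().getProperty("acks")); // prints all
    }
}
```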



&lt;blockquote&gt;
&lt;p&gt;Note: &lt;code&gt;min.insync.replicas&lt;/code&gt; is a &lt;strong&gt;broker/topic-level setting&lt;/strong&gt; and must be configured for full durability.&lt;/p&gt;
&lt;/blockquote&gt;
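&lt;p&gt;For example, a broker-wide default can be set in &lt;code&gt;server.properties&lt;/code&gt; (topic-level overrides are also possible at topic creation time). The value 2 here is a common choice for a replication factor of 3, not a universal recommendation:&lt;/p&gt;

```properties
# server.properties (broker): minimum number of ISR members that must
# acknowledge an acks=all write; can be overridden per topic.
min.insync.replicas=2
```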




&lt;h2&gt;
  
  
  Kafka Version ≥ 3.0 — What It Covers
&lt;/h2&gt;

&lt;p&gt;When we say:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“Kafka ≥ 3.0 has safe producer enabled by default”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;we are talking about the &lt;strong&gt;producer client behavior&lt;/strong&gt;: these defaults were introduced in client version 3.0 (KIP-679) and require a broker that supports idempotent writes.&lt;/p&gt;

&lt;h3&gt;
  
  
  What is enabled automatically?
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;enable.idempotence=true&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;acks=all&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;retries=Integer.MAX_VALUE&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;max.in.flight.requests.per.connection=5&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Important:&lt;/strong&gt; &lt;code&gt;delivery.timeout.ms=120000&lt;/code&gt; is also a default, but it is independent of idempotence.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  What this does NOT cover
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Consumer behavior&lt;/strong&gt; → consumers must still handle duplicates themselves where the application requires it
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Other Kafka components&lt;/strong&gt; → Streams, Connect, etc., are unaffected
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Broker settings&lt;/strong&gt; → durability depends on replication and &lt;code&gt;min.insync.replicas&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;
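&lt;p&gt;Because consumer behavior is out of scope, applications that need exactly-once effects often deduplicate on the consumer side by a business key. A minimal sketch, assuming each record carries a unique event ID (in practice the seen-set would be bounded or persisted):&lt;/p&gt;

```java
import java.util.HashSet;

public class ConsumerDedup {
    // Hypothetical dedup helper: remembers event IDs already processed.
    // A raw HashSet is used so the sketch stays free of angle brackets;
    // in real code you would use a generic HashSet of String.
    private final HashSet seen = new HashSet();

    /** Returns true if the event is new and should be processed. */
    public boolean shouldProcess(String eventId) {
        return seen.add(eventId); // add() returns false for duplicates
    }

    public static void main(String[] args) {
        ConsumerDedup dedup = new ConsumerDedup();
        System.out.println(dedup.shouldProcess("evt-1")); // true
        System.out.println(dedup.shouldProcess("evt-1")); // false: duplicate skipped
    }
}
```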




&lt;h2&gt;
  
  
  Version Compatibility: Broker vs Producer
&lt;/h2&gt;

&lt;p&gt;Kafka &lt;strong&gt;broker version&lt;/strong&gt; and &lt;strong&gt;producer client version&lt;/strong&gt; are separate:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Component&lt;/th&gt;
&lt;th&gt;Role&lt;/th&gt;
&lt;th&gt;Version dependency&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Broker&lt;/td&gt;
&lt;td&gt;Kafka server/cluster&lt;/td&gt;
&lt;td&gt;Defines feature support (≥3.0 enables safe defaults)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Producer&lt;/td&gt;
&lt;td&gt;Kafka client&lt;/td&gt;
&lt;td&gt;Implements safe producer defaults; must match features with broker&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Consumer&lt;/td&gt;
&lt;td&gt;Kafka client&lt;/td&gt;
&lt;td&gt;Reads messages; independent of producer defaults&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h3&gt;
  
  
  What Happens with Mixed Versions?
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Broker Version&lt;/th&gt;
&lt;th&gt;Producer Version&lt;/th&gt;
&lt;th&gt;Safe Producer Defaults Applied?&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;≥ 3.0&lt;/td&gt;
&lt;td&gt;≥ 3.0&lt;/td&gt;
&lt;td&gt;Automatic&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;≥ 3.0&lt;/td&gt;
&lt;td&gt;&amp;lt; 3.0 (e.g., 2.1)&lt;/td&gt;
&lt;td&gt;Must manually enable &lt;code&gt;enable.idempotence&lt;/code&gt;, &lt;code&gt;acks=all&lt;/code&gt;, etc.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&amp;lt; 3.0&lt;/td&gt;
&lt;td&gt;any&lt;/td&gt;
&lt;td&gt;Client defaults may not take effect; configure explicitly and verify the broker supports idempotent writes&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;Key insight:&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Even if your &lt;strong&gt;broker is ≥3.0&lt;/strong&gt;, using an &lt;strong&gt;older producer client&lt;/strong&gt; will not automatically enable safe producer defaults.&lt;/p&gt;


&lt;h2&gt;
  
  
  When Should You Explicitly Configure Safe Producer Settings?
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Legacy systems (Kafka ≤ 2.8)&lt;/strong&gt;  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Always configure &lt;code&gt;enable.idempotence=true&lt;/code&gt;, &lt;code&gt;acks=all&lt;/code&gt;, etc. manually.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Mixed-version clusters&lt;/strong&gt;  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Explicit config ensures consistent behavior across old and new clients.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Critical systems&lt;/strong&gt;  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;For payments, order processing, or inventory management, explicit configs prevent duplicates and maintain ordering.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Upgrades&lt;/strong&gt;  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;When migrating brokers or clients, explicit settings help maintain predictable behavior.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ol&gt;


&lt;h2&gt;
  
  
  Recommended Safe Producer Configuration
&lt;/h2&gt;

&lt;p&gt;Even with Kafka ≥ 3.0, explicitly setting configs can improve clarity:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight properties"&gt;&lt;code&gt;&lt;span class="py"&gt;acks&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;all&lt;/span&gt;
&lt;span class="py"&gt;enable.idempotence&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;true&lt;/span&gt;
&lt;span class="py"&gt;retries&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;Integer.MAX_VALUE&lt;/span&gt;
&lt;span class="py"&gt;max.in.flight.requests.per.connection&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;5&lt;/span&gt;
&lt;span class="py"&gt;delivery.timeout.ms&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;120000&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;blockquote&gt;
&lt;p&gt;This ensures high reliability, correct ordering, and duplicate-free message delivery.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  Bottom Line
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Safe producer&lt;/strong&gt; = producer client behavior, not broker or consumer
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Broker ≥3.0&lt;/strong&gt; supports safe defaults, but older clients must be configured manually
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Explicit configuration&lt;/strong&gt; is still recommended for critical systems or mixed-version clusters
&lt;/li&gt;
&lt;li&gt;Understanding &lt;strong&gt;producer vs broker vs consumer roles&lt;/strong&gt; avoids common pitfalls in Kafka message delivery&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Summary
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Kafka safe producer guarantees &lt;strong&gt;idempotent writes and correct ordering per partition&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Defaults are automatic in &lt;strong&gt;broker ≥3.0 with modern clients&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;For &lt;strong&gt;older clients or mixed clusters&lt;/strong&gt;, safe producer configs must be &lt;strong&gt;explicitly set&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Proper broker settings (&lt;code&gt;min.insync.replicas&lt;/code&gt;) are still required for full durability&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;Ensuring safe producer behavior is essential for reliable Kafka pipelines, especially in distributed, event-driven systems.&lt;/p&gt;




&lt;p&gt;If you found this useful, feel free to leave a comment. I always appreciate feedback and different perspectives.&lt;/p&gt;




&lt;p&gt;Originally published on my personal blog:&lt;/p&gt;

&lt;p&gt;🔗 &lt;a href="https://rajeevranjan.dev/blog/kafka/kafka-safe-producer-defaults-compatibility/" rel="noopener noreferrer"&gt;https://rajeevranjan.dev/blog/kafka/kafka-safe-producer-defaults-compatibility/&lt;/a&gt;&lt;/p&gt;

</description>
      <category>kafka</category>
      <category>distributedsystems</category>
      <category>java</category>
      <category>backend</category>
    </item>
    <item>
      <title>Kafka Retries and Idempotent Producers Explained: Avoid Duplicates and Ensure Reliable Delivery</title>
      <dc:creator>Rajeev</dc:creator>
      <pubDate>Sun, 22 Mar 2026 13:01:56 +0000</pubDate>
      <link>https://dev.to/rajeev_a954661bb78eb9797f/kafka-retries-and-idempotent-producers-explained-avoid-duplicates-and-ensure-reliable-delivery-gj7</link>
      <guid>https://dev.to/rajeev_a954661bb78eb9797f/kafka-retries-and-idempotent-producers-explained-avoid-duplicates-and-ensure-reliable-delivery-gj7</guid>
      <description>&lt;p&gt;In the previous article &lt;a href="https://rajeevranjan.dev/blog/kafka-producer-acks-explained/" rel="noopener noreferrer"&gt;Kafka Producer Acks Explained: Replicas, ISR, and Write Guarantees&lt;/a&gt;, we discussed when a producer considers a write successful and how acknowledgments impact durability and availability.&lt;/p&gt;

&lt;p&gt;But even with correct acknowledgment settings, one important problem still remains:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What happens when a write fails in Kafka?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Or even more interesting:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What happens when Kafka &lt;em&gt;thinks&lt;/em&gt; a write failed, but it actually succeeded?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;This is where &lt;strong&gt;Kafka retries&lt;/strong&gt; and &lt;strong&gt;idempotent producers&lt;/strong&gt; become critical.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Real Problem: Uncertain Failures in Distributed Systems
&lt;/h2&gt;

&lt;p&gt;In distributed systems, failures are not always clear.&lt;/p&gt;

&lt;p&gt;Consider this scenario:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Producer sends a message to the leader.&lt;/li&gt;
&lt;li&gt;Leader writes the message successfully.&lt;/li&gt;
&lt;li&gt;Leader sends acknowledgment.&lt;/li&gt;
&lt;li&gt;Network issue occurs → acknowledgment is lost.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;From Kafka’s perspective:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Broker: Write &lt;strong&gt;succeeded&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Producer: Write &lt;strong&gt;failed&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Now the producer retries.&lt;/p&gt;

&lt;p&gt;👉 The same message gets written again.&lt;/p&gt;

&lt;p&gt;This leads to &lt;strong&gt;duplicate messages in Kafka&lt;/strong&gt;, even though the system behaved correctly.&lt;/p&gt;




&lt;h2&gt;
  
  
  Kafka Retries Explained
&lt;/h2&gt;

&lt;p&gt;Kafka producers support automatic retries to handle transient failures.&lt;/p&gt;

&lt;h3&gt;
  
  
  Kafka Retry Configuration
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight properties"&gt;&lt;code&gt;&lt;span class="py"&gt;retries&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;3&lt;/span&gt;
&lt;span class="py"&gt;retry.backoff.ms&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;100&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  How Kafka Retries Work
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;Producer sends a record.&lt;/li&gt;
&lt;li&gt;If it receives an error (or timeout), it retries.&lt;/li&gt;
&lt;li&gt;This continues until:

&lt;ul&gt;
&lt;li&gt;Retry count is exhausted, or&lt;/li&gt;
&lt;li&gt;The send succeeds&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ol&gt;
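&lt;p&gt;The retry loop above can be sketched in plain Java. The failing operation here is a stand-in that simulates two transient errors before success; it is not the real client API:&lt;/p&gt;

```java
public class RetryWithBackoff {
    static int attempts = 0;

    // Stand-in for a send that hits a transient error twice, then succeeds.
    static boolean trySend() {
        attempts++;
        return attempts > 2;
    }

    // Sketch of the retry loop: on failure, wait backoffMs and try again
    // until the retry budget is exhausted (mirrors retries/retry.backoff.ms).
    public static boolean sendWithRetries(int retries, long backoffMs)
            throws InterruptedException {
        int remaining = retries;
        while (true) {
            if (trySend()) {
                return true; // acknowledged
            }
            if (remaining == 0) {
                return false; // retry budget exhausted
            }
            remaining--;
            Thread.sleep(backoffMs); // retry.backoff.ms
        }
    }

    public static void main(String[] args) throws InterruptedException {
        boolean ok = sendWithRetries(3, 10);
        System.out.println(ok + " after " + attempts + " attempts"); // true after 3 attempts
    }
}
```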




&lt;h2&gt;
  
  
  When Do Kafka Retries Trigger?
&lt;/h2&gt;

&lt;p&gt;Retries typically happen in scenarios like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Temporary network failures&lt;/li&gt;
&lt;li&gt;Leader broker not available&lt;/li&gt;
&lt;li&gt;&lt;code&gt;NOT_ENOUGH_REPLICAS&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;REQUEST_TIMED_OUT&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These are &lt;strong&gt;recoverable errors&lt;/strong&gt;, making retries useful.&lt;/p&gt;




&lt;h2&gt;
  
  
  Problem with Kafka Retries: Duplicate Messages
&lt;/h2&gt;

&lt;p&gt;Retries improve reliability but introduce a major issue:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Duplicate message production&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Why?&lt;/p&gt;

&lt;p&gt;Because the producer cannot always distinguish between:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A failed write&lt;/li&gt;
&lt;li&gt;A successful write with lost acknowledgment&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;So retrying can result in:&lt;/p&gt;

&lt;p&gt;Message A → written&lt;br&gt;&lt;br&gt;
Retry Message A → written again  &lt;/p&gt;


&lt;h2&gt;
  
  
  Message Ordering Issues with Retries
&lt;/h2&gt;

&lt;p&gt;Retries can also impact ordering.&lt;/p&gt;

&lt;p&gt;Example:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Message A is sent&lt;/li&gt;
&lt;li&gt;Message B is sent&lt;/li&gt;
&lt;li&gt;A fails and is retried later&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Now B might be written &lt;strong&gt;before&lt;/strong&gt; A retry.&lt;/p&gt;

&lt;p&gt;👉 This can break ordering guarantees.&lt;/p&gt;

&lt;p&gt;Kafka controls this using:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight properties"&gt;&lt;code&gt;&lt;span class="err"&gt;max.in.flight.requests.per.connection&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;But retries alone cannot guarantee correctness.&lt;/p&gt;




&lt;h2&gt;
  
  
  Kafka Idempotent Producer
&lt;/h2&gt;

&lt;p&gt;To solve duplicate messages in Kafka, we use:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Idempotent Producer&lt;/strong&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  What is Idempotence in Kafka?
&lt;/h2&gt;

&lt;p&gt;Idempotence means:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Sending the same message multiple times results in it being written only once.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;In Kafka:&lt;/p&gt;

&lt;p&gt;👉 Even if retries happen, duplicate messages are &lt;strong&gt;not stored&lt;/strong&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  How Kafka Idempotent Producer Works
&lt;/h2&gt;

&lt;p&gt;Kafka ensures idempotency using:&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Producer ID (PID)
&lt;/h3&gt;

&lt;p&gt;Each producer gets a unique identifier from the broker.&lt;/p&gt;




&lt;h3&gt;
  
  
  2. Sequence Numbers
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Each message has a sequence number per partition&lt;/li&gt;
&lt;li&gt;Broker tracks the latest sequence number&lt;/li&gt;
&lt;/ul&gt;




&lt;h3&gt;
  
  
  Duplicate Detection
&lt;/h3&gt;

&lt;p&gt;On retry:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Same sequence number is sent&lt;/li&gt;
&lt;li&gt;Broker detects duplicate&lt;/li&gt;
&lt;li&gt;Duplicate message is discarded&lt;/li&gt;
&lt;/ul&gt;
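&lt;p&gt;The duplicate-detection idea can be sketched in plain Java. This is a simplified model with hypothetical names; a real broker also validates that sequence numbers advance in order per producer ID and partition:&lt;/p&gt;

```java
import java.util.HashMap;

public class BrokerDedupSketch {
    // Per producer-and-partition key, remember the last sequence number
    // and discard retries that resend it. Raw HashMap keeps the sketch
    // free of angle brackets; values are Integer sequence numbers.
    private final HashMap lastSeq = new HashMap();

    /** Returns true if the record is appended, false if discarded as duplicate. */
    public boolean append(long producerId, int partition, int sequence) {
        String key = producerId + "-" + partition;
        Integer last = (Integer) lastSeq.get(key);
        if (last != null) {
            if (sequence == last.intValue()) {
                return false; // retry of an already-written record: discard
            }
        }
        lastSeq.put(key, Integer.valueOf(sequence));
        return true;
    }

    public static void main(String[] args) {
        BrokerDedupSketch broker = new BrokerDedupSketch();
        System.out.println(broker.append(42L, 0, 0)); // true: first write
        System.out.println(broker.append(42L, 0, 0)); // false: duplicate retry
        System.out.println(broker.append(42L, 0, 1)); // true: next sequence
    }
}
```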




&lt;h2&gt;
  
  
  Enable Idempotent Producer in Kafka
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight properties"&gt;&lt;code&gt;&lt;span class="py"&gt;enable.idempotence&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;true&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is enough to enable duplicate protection.&lt;/p&gt;




&lt;h2&gt;
  
  
  Important Kafka Config Changes with Idempotence
&lt;/h2&gt;

&lt;p&gt;When idempotence is enabled, Kafka automatically enforces:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight properties"&gt;&lt;code&gt;&lt;span class="py"&gt;acks&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;all&lt;/span&gt;
&lt;span class="py"&gt;retries&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;Integer.MAX_VALUE&lt;/span&gt;
&lt;span class="py"&gt;max.in.flight.requests.per.connection&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;5&lt;/span&gt;
&lt;span class="c"&gt;# Note: For idempotent producers, this number should be ≤5 to preserve ordering and ensure no duplicates
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Why These Settings Matter
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;acks=all&lt;/code&gt; → ensures durability
&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;retries=Integer.MAX_VALUE&lt;/code&gt; → safe retry mechanism
&lt;/li&gt;
&lt;li&gt;limited in-flight requests (≤5) → preserves ordering
&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Scope of Idempotent Producer
&lt;/h2&gt;

&lt;h3&gt;
  
  
  What It Guarantees
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;No duplicate messages per partition&lt;/li&gt;
&lt;li&gt;Safe retries&lt;/li&gt;
&lt;li&gt;Ordering guarantees (with correct config)&lt;/li&gt;
&lt;/ul&gt;




&lt;h3&gt;
  
  
  What It Does Not Guarantee
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;No duplicates across producers&lt;/li&gt;
&lt;li&gt;No duplicates across restarts&lt;/li&gt;
&lt;li&gt;End-to-end exactly-once processing&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For that, Kafka provides transactions.&lt;/p&gt;




&lt;h2&gt;
  
  
  Kafka Retries vs Idempotent Producer
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Feature&lt;/th&gt;
&lt;th&gt;Without Idempotence&lt;/th&gt;
&lt;th&gt;With Idempotence&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Retries&lt;/td&gt;
&lt;td&gt;Can create duplicates&lt;/td&gt;
&lt;td&gt;Safe&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Reliability&lt;/td&gt;
&lt;td&gt;Moderate&lt;/td&gt;
&lt;td&gt;High&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Ordering&lt;/td&gt;
&lt;td&gt;Can break&lt;/td&gt;
&lt;td&gt;Preserved&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;




&lt;h2&gt;
  
  
  Recommended Kafka Producer Configuration
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight properties"&gt;&lt;code&gt;&lt;span class="py"&gt;acks&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;all&lt;/span&gt;
&lt;span class="py"&gt;enable.idempotence&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;true&lt;/span&gt;
&lt;span class="py"&gt;retries&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;Integer.MAX_VALUE&lt;/span&gt;
&lt;span class="py"&gt;retry.backoff.ms&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;100&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This setup ensures:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;High reliability&lt;/li&gt;
&lt;li&gt;No duplicate messages&lt;/li&gt;
&lt;li&gt;Strong durability guarantees&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  When to Use Idempotent Producer
&lt;/h2&gt;

&lt;p&gt;Use idempotent producers in:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Payment systems&lt;/li&gt;
&lt;li&gt;Order processing&lt;/li&gt;
&lt;li&gt;Inventory management&lt;/li&gt;
&lt;li&gt;Critical event-driven systems&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In modern Kafka setups:&lt;/p&gt;

&lt;p&gt;👉 It should almost always be enabled.&lt;/p&gt;




&lt;h2&gt;
  
  
  Closing Thoughts
&lt;/h2&gt;

&lt;p&gt;Kafka retries are essential for handling transient failures, but they introduce the risk of duplicate messages.&lt;/p&gt;

&lt;p&gt;Idempotent producers eliminate this risk by making retries safe.&lt;/p&gt;

&lt;p&gt;Together, they ensure:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Reliable message delivery&lt;/li&gt;
&lt;li&gt;No duplication&lt;/li&gt;
&lt;li&gt;Strong consistency at the producer level&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Summary
&lt;/h2&gt;

&lt;p&gt;Kafka retries help recover from failures but can cause duplicate messages.&lt;/p&gt;

&lt;p&gt;Idempotent producers solve this by ensuring each message is written only once per partition (within a single producer session).&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Retries improve fault tolerance&lt;/li&gt;
&lt;li&gt;Idempotence ensures correctness&lt;/li&gt;
&lt;li&gt;Together they enable reliable Kafka pipelines&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;If you found this useful, feel free to leave a comment. I always appreciate feedback and different perspectives.&lt;/p&gt;

</description>
      <category>kafka</category>
      <category>distributedsystems</category>
      <category>java</category>
      <category>backend</category>
    </item>
    <item>
      <title>Kafka Producer Acks Explained: Replicas, ISR, and Write Guarantees</title>
      <dc:creator>Rajeev</dc:creator>
      <pubDate>Thu, 05 Mar 2026 16:55:13 +0000</pubDate>
      <link>https://dev.to/rajeev_a954661bb78eb9797f/kafka-producer-acks-explained-replicas-isr-and-write-guarantees-496p</link>
      <guid>https://dev.to/rajeev_a954661bb78eb9797f/kafka-producer-acks-explained-replicas-isr-and-write-guarantees-496p</guid>
      <description>&lt;p&gt;In the previous article &lt;a href="https://rajeevranjan.dev/blog/kafka-consumer-graceful-shutdown/" rel="noopener noreferrer"&gt;Kafka Consumer Graceful Shutdown Explained&lt;/a&gt;, we discussed how Kafka consumers should shut down gracefully to avoid duplicate processing and offset inconsistencies.&lt;/p&gt;

&lt;p&gt;Now let’s move to the producer side of Kafka.&lt;/p&gt;

&lt;p&gt;When a producer sends a message, an important question arises:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;When should the producer consider the write successful?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Kafka answers this through a configuration called &lt;strong&gt;producer acknowledgments&lt;/strong&gt;, commonly referred to as &lt;strong&gt;acks&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;But before we understand acks, we need to clarify two important Kafka concepts that directly influence reliability:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Replicas&lt;/li&gt;
&lt;li&gt;ISR (In-Sync Replicas)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These concepts define how Kafka maintains durability and availability in a distributed cluster.&lt;/p&gt;




&lt;h2&gt;
  
  
  Replicas vs ISR in Kafka
&lt;/h2&gt;

&lt;p&gt;When a Kafka topic is created, each partition is replicated across multiple brokers. This replication is controlled by the &lt;strong&gt;replication factor&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;For example:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;replication.factor = 3&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;This means a partition has:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;strong&gt;1 Leader replica&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;2 Follower replicas&lt;/strong&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Together they form the &lt;strong&gt;replica set&lt;/strong&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Replicas
&lt;/h3&gt;

&lt;p&gt;Replicas are &lt;strong&gt;all copies of a partition&lt;/strong&gt; across brokers. They are fixed when the topic is created.&lt;/p&gt;

&lt;p&gt;Example:&lt;/p&gt;

&lt;p&gt;Partition-0&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Leader: Broker 1&lt;/li&gt;
&lt;li&gt;Follower: Broker 2&lt;/li&gt;
&lt;li&gt;Follower: Broker 3&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These three together form the replica set.&lt;/p&gt;

&lt;h3&gt;
  
  
  ISR (In-Sync Replicas)
&lt;/h3&gt;

&lt;p&gt;The &lt;strong&gt;ISR&lt;/strong&gt; is a &lt;strong&gt;subset of replicas&lt;/strong&gt; that are:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Alive&lt;/li&gt;
&lt;li&gt;Fully caught up with the leader&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Unlike replicas, &lt;strong&gt;ISR is dynamic&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;If a follower falls behind or becomes unavailable, Kafka removes it from the ISR.&lt;/p&gt;

&lt;p&gt;Example:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Replicas: Broker1, Broker2, Broker3&lt;/li&gt;
&lt;li&gt;ISR: Broker1, Broker2&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Kafka uses &lt;strong&gt;ISR&lt;/strong&gt; — not all replicas — when determining if a write is successful.&lt;/p&gt;

&lt;p&gt;This becomes especially important when &lt;strong&gt;acks=all&lt;/strong&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  Kafka Producer Acknowledgments
&lt;/h2&gt;

&lt;p&gt;The &lt;code&gt;acks&lt;/code&gt; configuration controls &lt;strong&gt;how many brokers must confirm a write before the producer considers it successful&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;There are three main settings:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;acks=0&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;acks=1&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;acks=all&lt;/code&gt; (or &lt;code&gt;acks=-1&lt;/code&gt;)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Each option provides a different balance between &lt;strong&gt;throughput, durability, and latency&lt;/strong&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  acks = 0
&lt;/h2&gt;

&lt;p&gt;With this configuration, the producer &lt;strong&gt;does not wait for any acknowledgment from the broker&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;The producer simply sends the record and moves on.&lt;/p&gt;

&lt;h3&gt;
  
  
  Characteristics
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;No confirmation from the broker&lt;/li&gt;
&lt;li&gt;Highest throughput&lt;/li&gt;
&lt;li&gt;Minimal network overhead&lt;/li&gt;
&lt;li&gt;Possible data loss&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  What Happens Internally
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;Producer sends the message.&lt;/li&gt;
&lt;li&gt;Producer immediately considers the write successful.&lt;/li&gt;
&lt;li&gt;Broker response is not waited for.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;If the leader broker is unavailable or crashes, the producer will not know.&lt;/p&gt;

&lt;h3&gt;
  
  
  Important Note
&lt;/h3&gt;

&lt;p&gt;Retries are ineffective here because the producer never receives failure feedback, so it never knows a retry is needed.&lt;/p&gt;

&lt;h3&gt;
  
  
  Typical Use Cases
&lt;/h3&gt;

&lt;p&gt;This configuration is generally used when &lt;strong&gt;data loss is acceptable&lt;/strong&gt;, for example:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Logs&lt;/li&gt;
&lt;li&gt;Metrics&lt;/li&gt;
&lt;li&gt;Non-critical telemetry data&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  acks = 1
&lt;/h2&gt;

&lt;p&gt;Here the producer waits for &lt;strong&gt;acknowledgment from the leader broker only&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;The leader writes the message to its local log and then sends the acknowledgment back to the producer.&lt;/p&gt;

&lt;h3&gt;
  
  
  What Happens Internally
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;Producer sends record to leader.&lt;/li&gt;
&lt;li&gt;Leader writes record to its log.&lt;/li&gt;
&lt;li&gt;Leader sends acknowledgment.&lt;/li&gt;
&lt;li&gt;Replication to followers happens &lt;strong&gt;asynchronously&lt;/strong&gt;.&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  Risk Scenario
&lt;/h3&gt;

&lt;p&gt;If the leader crashes &lt;strong&gt;before followers replicate the message&lt;/strong&gt;, the data may be lost.&lt;/p&gt;

&lt;h3&gt;
  
  
  Characteristics
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Good throughput&lt;/li&gt;
&lt;li&gt;Moderate durability&lt;/li&gt;
&lt;li&gt;Lower latency compared to acks=all&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Historically, &lt;strong&gt;acks=1 was the default configuration in Kafka up to version 2.x&lt;/strong&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  acks = all (or acks = -1)
&lt;/h2&gt;

&lt;p&gt;This is the &lt;strong&gt;strongest durability guarantee Kafka offers&lt;/strong&gt; for producers.&lt;/p&gt;

&lt;p&gt;Here the leader waits for &lt;strong&gt;all replicas in the ISR to acknowledge the write&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;However, this works together with another configuration:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;min.insync.replicas&lt;/code&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  min.insync.replicas
&lt;/h2&gt;

&lt;p&gt;&lt;code&gt;min.insync.replicas&lt;/code&gt; defines the &lt;strong&gt;minimum number of ISR replicas that must acknowledge a write&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Default value:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;min.insync.replicas = 1&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;If the ISR size drops below this value, the broker rejects writes sent with &lt;code&gt;acks=all&lt;/code&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  Write Flow with acks=all
&lt;/h2&gt;

&lt;p&gt;When a producer sends a message:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Leader appends the record to its log.&lt;/li&gt;
&lt;li&gt;Leader replicates the record to all ISR members.&lt;/li&gt;
&lt;li&gt;Leader waits for acknowledgments.&lt;/li&gt;
&lt;li&gt;Write succeeds only if:&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;code&gt;ISR size &amp;gt;= min.insync.replicas&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;If this condition fails, the broker returns errors such as:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;NOT_ENOUGH_REPLICAS&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;NOT_ENOUGH_REPLICAS_AFTER_APPEND&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The producer then receives an exception.&lt;/p&gt;
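&lt;p&gt;The acceptance condition can be sketched as a small function. This is a simplified model of the broker-side check, not actual broker code:&lt;/p&gt;

```java
public class AcksAllCheck {
    // Sketch of the broker's check for an acks=all write: the write is
    // accepted only when the current ISR size meets min.insync.replicas.
    public static String handleWrite(int isrSize, int minInsyncReplicas) {
        if (isrSize >= minInsyncReplicas) {
            return "ACK";
        }
        return "NOT_ENOUGH_REPLICAS"; // surfaced to the producer as an exception
    }

    public static void main(String[] args) {
        System.out.println(handleWrite(2, 2)); // ACK
        System.out.println(handleWrite(1, 2)); // NOT_ENOUGH_REPLICAS
    }
}
```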




&lt;h2&gt;
  
  
  Kafka 3.x Default Behavior
&lt;/h2&gt;

&lt;p&gt;This is a detail that often confuses people.&lt;/p&gt;

&lt;p&gt;Starting from Kafka 3.0, the producer defaults changed (KIP-679):&lt;/p&gt;

&lt;p&gt;&lt;code&gt;acks=all&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;code&gt;enable.idempotence=true&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;In Kafka 2.x and earlier, the default was &lt;code&gt;acks=1&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Idempotent producers &lt;strong&gt;require&lt;/strong&gt; &lt;code&gt;acks=all&lt;/code&gt;; explicitly setting &lt;code&gt;acks=1&lt;/code&gt; while idempotence is enabled is rejected as an invalid configuration. So modern producers effectively run with &lt;code&gt;acks=all&lt;/code&gt; unless idempotence is disabled.&lt;/p&gt;




&lt;h2&gt;
  
  
  Write Availability vs Durability
&lt;/h2&gt;

&lt;p&gt;Now let's connect these settings to &lt;strong&gt;cluster availability&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Assume:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;replication.factor = 3&lt;/code&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  With acks=0 or acks=1
&lt;/h3&gt;

&lt;p&gt;As long as the &lt;strong&gt;leader broker is available&lt;/strong&gt;, writes can succeed.&lt;/p&gt;

&lt;p&gt;Even if follower replicas are down, the producer can still write.&lt;/p&gt;




&lt;h3&gt;
  
  
  With acks=all
&lt;/h3&gt;

&lt;p&gt;Write availability now depends on &lt;strong&gt;min.insync.replicas&lt;/strong&gt;.&lt;/p&gt;

&lt;h4&gt;
  
  
  Case 1: min.insync.replicas = 1
&lt;/h4&gt;

&lt;p&gt;Writes succeed as long as the &lt;strong&gt;leader is alive&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;This means &lt;strong&gt;two brokers can fail&lt;/strong&gt; and writes will still be accepted.&lt;/p&gt;




&lt;h4&gt;
  
  
  Case 2: min.insync.replicas = 2
&lt;/h4&gt;

&lt;p&gt;Now at least:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Leader&lt;/li&gt;
&lt;li&gt;One follower&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;must be present in ISR.&lt;/p&gt;

&lt;p&gt;This means &lt;strong&gt;one broker failure can be tolerated&lt;/strong&gt;.&lt;/p&gt;




&lt;h4&gt;
  
  
  Case 3: min.insync.replicas = 3
&lt;/h4&gt;

&lt;p&gt;All replicas must acknowledge the write.&lt;/p&gt;

&lt;p&gt;This means &lt;strong&gt;no broker failure can be tolerated&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;While technically possible, this setup is rarely used because Kafka systems are designed to tolerate node failures.&lt;/p&gt;

&lt;p&gt;However, it may be used in scenarios where:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Extremely high durability is required&lt;/li&gt;
&lt;li&gt;Write unavailability is preferred over potential data loss&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  General Rule for Failure Tolerance
&lt;/h2&gt;

&lt;p&gt;If:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;replication.factor = N&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;min.insync.replicas = M&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Then the system can tolerate:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;N - M brokers failing&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;While still accepting writes when using &lt;code&gt;acks=all&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;The ISR size can be &lt;strong&gt;at most equal to the replication factor&lt;/strong&gt;, but may shrink dynamically if followers fall behind.&lt;/p&gt;
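&lt;p&gt;The rule above can be captured in a few lines (a hypothetical helper for illustration, not part of any Kafka API):&lt;/p&gt;

```java
public class FailureTolerance {
    // With replication.factor = N and min.insync.replicas = M,
    // acks=all writes keep succeeding while at most N - M brokers are down.
    public static int tolerableFailures(int replicationFactor, int minInsyncReplicas) {
        return replicationFactor - minInsyncReplicas;
    }

    public static void main(String[] args) {
        System.out.println(tolerableFailures(3, 1)); // 2: the leader alone is enough
        System.out.println(tolerableFailures(3, 2)); // 1: the common production setup
        System.out.println(tolerableFailures(3, 3)); // 0: any failure blocks writes
    }
}
```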




&lt;h2&gt;
  
  
  Choosing the Right Acknowledgment Strategy
&lt;/h2&gt;

&lt;p&gt;A quick rule of thumb used in many production systems:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Setting&lt;/th&gt;
&lt;th&gt;Throughput&lt;/th&gt;
&lt;th&gt;Durability&lt;/th&gt;
&lt;th&gt;Typical Usage&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;acks=0&lt;/td&gt;
&lt;td&gt;Highest&lt;/td&gt;
&lt;td&gt;Lowest&lt;/td&gt;
&lt;td&gt;Logs, metrics&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;acks=1&lt;/td&gt;
&lt;td&gt;High&lt;/td&gt;
&lt;td&gt;Moderate&lt;/td&gt;
&lt;td&gt;General workloads&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;acks=all&lt;/td&gt;
&lt;td&gt;Lower&lt;/td&gt;
&lt;td&gt;Highest&lt;/td&gt;
&lt;td&gt;Financial or critical data&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Most modern Kafka deployments prefer:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;acks=all&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;enable.idempotence=true&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;min.insync.replicas &amp;gt;= 2&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This combination provides strong guarantees without sacrificing too much availability.&lt;/p&gt;
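&lt;p&gt;As a sketch, those preferred settings translate into producer configuration like this. Only &lt;code&gt;java.util.Properties&lt;/code&gt; is used here; in a real application the same object would be passed to &lt;code&gt;new KafkaProducer&amp;lt;&amp;gt;(props)&amp;lt;/code&gt;, and the bootstrap address is a placeholder:&lt;/p&gt;

```java
import java.util.Properties;

public class SafeProducerConfig {
    public static Properties safeDefaults() {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder address

        // Durability-first settings discussed above:
        props.put("acks", "all");                // wait for all ISR members
        props.put("enable.idempotence", "true"); // default since Kafka 3.0

        // Note: min.insync.replicas >= 2 is a broker/topic setting,
        // not a producer property, so it is not set here.
        return props;
    }
}
```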




&lt;h2&gt;
  
  
  Closing Thoughts
&lt;/h2&gt;

&lt;p&gt;Producer acknowledgments directly impact &lt;strong&gt;data durability and system availability&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Understanding the relationship between:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Replication factor&lt;/li&gt;
&lt;li&gt;ISR&lt;/li&gt;
&lt;li&gt;min.insync.replicas&lt;/li&gt;
&lt;li&gt;Producer acks&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;is essential when designing reliable Kafka pipelines.&lt;/p&gt;

&lt;p&gt;Misconfiguring these settings can lead to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Data loss&lt;/li&gt;
&lt;li&gt;Unnecessary write failures&lt;/li&gt;
&lt;li&gt;Reduced throughput&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A balanced configuration ensures both &lt;strong&gt;reliability and performance&lt;/strong&gt; in production systems.&lt;/p&gt;




&lt;h2&gt;
  
  
  Summary
&lt;/h2&gt;

&lt;p&gt;Kafka producer acknowledgments determine when a write is considered successful.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;acks=0 prioritizes throughput but risks data loss&lt;/li&gt;
&lt;li&gt;acks=1 waits for leader acknowledgment&lt;/li&gt;
&lt;li&gt;acks=all ensures replication across ISR members&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Combined with replication.factor and min.insync.replicas,&lt;br&gt;
these settings control the balance between durability and availability.&lt;/p&gt;

&lt;h2&gt;
  
  
  What’s Next?
&lt;/h2&gt;

&lt;p&gt;Now that we understand acknowledgments and replication, the next step is exploring the following:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Retries&lt;/li&gt;
&lt;li&gt;Idempotent producers&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These features build on top of the acknowledgment mechanism and are critical for building robust event-driven systems.&lt;/p&gt;




&lt;p&gt;Originally published on my personal blog:&lt;/p&gt;

&lt;p&gt;🔗 &lt;a href="https://rajeevranjan.dev/blog/kafka-producer-acks-explained/" rel="noopener noreferrer"&gt;https://rajeevranjan.dev/blog/kafka-producer-acks-explained/&lt;/a&gt;&lt;/p&gt;

</description>
      <category>kafka</category>
      <category>distributedsystems</category>
      <category>java</category>
      <category>backend</category>
    </item>
    <item>
      <title>Kafka Consumer Graceful Shutdown: Handle WakeupException and Commit Offsets Safely</title>
      <dc:creator>Rajeev</dc:creator>
      <pubDate>Thu, 26 Feb 2026 13:07:29 +0000</pubDate>
      <link>https://dev.to/rajeev_a954661bb78eb9797f/kafka-consumer-graceful-shutdown-handle-wakeupexception-and-commit-offsets-safely-2a93</link>
      <guid>https://dev.to/rajeev_a954661bb78eb9797f/kafka-consumer-graceful-shutdown-handle-wakeupexception-and-commit-offsets-safely-2a93</guid>
      <description>&lt;p&gt;In the previous article, we looked at how rebalancing works and why consumers pause during deployments.&lt;/p&gt;

&lt;p&gt;Now let’s look at something that causes even more real production issues:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How do you gracefully shut down a Kafka consumer without losing work?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;In real systems:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Pods restart
&lt;/li&gt;
&lt;li&gt;Deployments happen
&lt;/li&gt;
&lt;li&gt;Servers receive SIGTERM
&lt;/li&gt;
&lt;li&gt;Auto-scaling removes instances
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If shutdown is not handled properly, you can:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Reprocess messages
&lt;/li&gt;
&lt;li&gt;Lose uncommitted offsets
&lt;/li&gt;
&lt;li&gt;Cause inconsistent state
&lt;/li&gt;
&lt;li&gt;Trigger unnecessary rebalances
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Kafka does not handle this automatically for you.&lt;br&gt;&lt;br&gt;
You have to shut down the consumer correctly.&lt;/p&gt;


&lt;h2&gt;
  
  
  Why Graceful Shutdown Matters
&lt;/h2&gt;

&lt;p&gt;A typical Kafka consumer runs inside a loop:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="k"&gt;while&lt;/span&gt; &lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
    &lt;span class="nc"&gt;ConsumerRecords&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nc"&gt;String&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="nc"&gt;String&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;records&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt;
        &lt;span class="n"&gt;consumer&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;poll&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;Duration&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;ofMillis&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="o"&gt;));&lt;/span&gt;

    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;ConsumerRecord&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nc"&gt;String&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="nc"&gt;String&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;record&lt;/span&gt; &lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="n"&gt;records&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;process&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;record&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
    &lt;span class="o"&gt;}&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now imagine the application is terminated while:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Records are already processed
&lt;/li&gt;
&lt;li&gt;Offsets are not committed yet
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;When the consumer restarts, Kafka will deliver those records again.&lt;/p&gt;

&lt;p&gt;That may be acceptable.&lt;/p&gt;

&lt;p&gt;But in many systems, it creates duplicate writes, duplicate payments, or inconsistent updates.&lt;/p&gt;

&lt;p&gt;A proper Kafka consumer graceful shutdown should:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Stop polling new records
&lt;/li&gt;
&lt;li&gt;Finish processing current records
&lt;/li&gt;
&lt;li&gt;Commit offsets safely
&lt;/li&gt;
&lt;li&gt;Close the consumer cleanly
&lt;/li&gt;
&lt;/ol&gt;




&lt;h2&gt;
  
  
  The Main Problem: &lt;code&gt;poll()&lt;/code&gt; Blocks
&lt;/h2&gt;

&lt;p&gt;The &lt;code&gt;poll()&lt;/code&gt; method blocks while waiting for records.&lt;/p&gt;

&lt;p&gt;If your application receives a shutdown signal while the thread is inside &lt;code&gt;poll()&lt;/code&gt;, how do you stop it?&lt;/p&gt;

&lt;p&gt;You should not kill the thread.&lt;/p&gt;

&lt;p&gt;Kafka provides a safe mechanism for this.&lt;/p&gt;




&lt;h2&gt;
  
  
  WakeupException in Kafka
&lt;/h2&gt;

&lt;p&gt;Kafka provides this method:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="n"&gt;consumer&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;wakeup&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;When called from another thread, it interrupts the blocking &lt;code&gt;poll()&lt;/code&gt; call.&lt;/p&gt;

&lt;p&gt;Inside the consumer thread, &lt;code&gt;poll()&lt;/code&gt; throws:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;WakeupException
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is the official, safe way to interrupt a Kafka consumer.&lt;/p&gt;

&lt;p&gt;Avoid ad-hoc alternatives such as interrupting or killing the consumer thread.&lt;/p&gt;




&lt;h2&gt;
  
  
  Correct Kafka Consumer Graceful Shutdown Pattern
&lt;/h2&gt;

&lt;p&gt;Let’s build this properly.&lt;/p&gt;

&lt;h3&gt;
  
  
  Step 1: Add a running flag
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="kd"&gt;private&lt;/span&gt; &lt;span class="kd"&gt;volatile&lt;/span&gt; &lt;span class="kt"&gt;boolean&lt;/span&gt; &lt;span class="n"&gt;running&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Step 2: Consumer implementation
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="kd"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;GracefulKafkaConsumer&lt;/span&gt; &lt;span class="kd"&gt;implements&lt;/span&gt; &lt;span class="nc"&gt;Runnable&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;

    &lt;span class="kd"&gt;private&lt;/span&gt; &lt;span class="kd"&gt;final&lt;/span&gt; &lt;span class="nc"&gt;KafkaConsumer&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nc"&gt;String&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="nc"&gt;String&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;consumer&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;
    &lt;span class="kd"&gt;private&lt;/span&gt; &lt;span class="kd"&gt;volatile&lt;/span&gt; &lt;span class="kt"&gt;boolean&lt;/span&gt; &lt;span class="n"&gt;running&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;

    &lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="nf"&gt;GracefulKafkaConsumer&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;KafkaConsumer&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nc"&gt;String&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="nc"&gt;String&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;consumer&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;consumer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;consumer&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;
    &lt;span class="o"&gt;}&lt;/span&gt;

    &lt;span class="nd"&gt;@Override&lt;/span&gt;
    &lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="kt"&gt;void&lt;/span&gt; &lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;try&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
            &lt;span class="n"&gt;consumer&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;subscribe&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;List&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;of&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"my-topic"&lt;/span&gt;&lt;span class="o"&gt;));&lt;/span&gt;

            &lt;span class="k"&gt;while&lt;/span&gt; &lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;running&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
                &lt;span class="nc"&gt;ConsumerRecords&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nc"&gt;String&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="nc"&gt;String&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;records&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt;
                    &lt;span class="n"&gt;consumer&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;poll&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;Duration&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;ofMillis&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="o"&gt;));&lt;/span&gt;

                &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;ConsumerRecord&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nc"&gt;String&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="nc"&gt;String&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;record&lt;/span&gt; &lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="n"&gt;records&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
                    &lt;span class="n"&gt;process&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;record&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
                &lt;span class="o"&gt;}&lt;/span&gt;

                &lt;span class="c1"&gt;// Commit after processing&lt;/span&gt;
                &lt;span class="n"&gt;consumer&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;commitSync&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;
            &lt;span class="o"&gt;}&lt;/span&gt;

        &lt;span class="o"&gt;}&lt;/span&gt; &lt;span class="k"&gt;catch&lt;/span&gt; &lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;WakeupException&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
            &lt;span class="c1"&gt;// Expected during shutdown&lt;/span&gt;
            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;running&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
                &lt;span class="k"&gt;throw&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;
            &lt;span class="o"&gt;}&lt;/span&gt;
        &lt;span class="o"&gt;}&lt;/span&gt; &lt;span class="k"&gt;finally&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
            &lt;span class="k"&gt;try&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
                &lt;span class="n"&gt;consumer&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;commitSync&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt; &lt;span class="c1"&gt;// final safety commit&lt;/span&gt;
            &lt;span class="o"&gt;}&lt;/span&gt; &lt;span class="k"&gt;finally&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
                &lt;span class="n"&gt;consumer&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;close&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt; &lt;span class="c1"&gt;// leave group cleanly&lt;/span&gt;
            &lt;span class="o"&gt;}&lt;/span&gt;
        &lt;span class="o"&gt;}&lt;/span&gt;
    &lt;span class="o"&gt;}&lt;/span&gt;

    &lt;span class="kd"&gt;public&lt;/span&gt; &lt;span class="kt"&gt;void&lt;/span&gt; &lt;span class="nf"&gt;shutdown&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;running&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="o"&gt;;&lt;/span&gt;
        &lt;span class="n"&gt;consumer&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;wakeup&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;
    &lt;span class="o"&gt;}&lt;/span&gt;

    &lt;span class="kd"&gt;private&lt;/span&gt; &lt;span class="kt"&gt;void&lt;/span&gt; &lt;span class="nf"&gt;process&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;ConsumerRecord&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nc"&gt;String&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="nc"&gt;String&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;record&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
        &lt;span class="nc"&gt;System&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;out&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;println&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Processing: "&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;record&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;value&lt;/span&gt;&lt;span class="o"&gt;());&lt;/span&gt;
    &lt;span class="o"&gt;}&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  What Is Happening Here?
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;running&lt;/code&gt; controls the loop.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;consumer.wakeup()&lt;/code&gt; interrupts &lt;code&gt;poll()&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;WakeupException&lt;/code&gt; is expected during shutdown.&lt;/li&gt;
&lt;li&gt;Offsets are committed after processing.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;consumer.close()&lt;/code&gt; triggers proper group leave.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This ensures offsets are committed safely before exit.&lt;/p&gt;




&lt;h2&gt;
  
  
  Why Not Just Kill the Thread?
&lt;/h2&gt;

&lt;p&gt;If you forcefully stop the thread:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Offsets may not be committed
&lt;/li&gt;
&lt;li&gt;The consumer may not leave the group cleanly
&lt;/li&gt;
&lt;li&gt;Rebalances may be delayed
&lt;/li&gt;
&lt;li&gt;Duplicate processing may increase
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Always use &lt;code&gt;consumer.wakeup()&lt;/code&gt; for graceful shutdown.&lt;/p&gt;




&lt;h2&gt;
  
  
  Manual Commit vs Auto Commit
&lt;/h2&gt;

&lt;p&gt;If you are using:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;enable.auto.commit=true
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Offsets are committed periodically.&lt;/p&gt;

&lt;p&gt;But auto commit does not guarantee that processing is finished before commit.&lt;/p&gt;

&lt;p&gt;For production systems, it is usually safer to use:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;enable.auto.commit=false
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And commit offsets only after successful processing.&lt;/p&gt;
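&lt;p&gt;In configuration terms, that looks like the following sketch. Only &lt;code&gt;java.util.Properties&lt;/code&gt; is used; in a real application the object would be passed to the &lt;code&gt;KafkaConsumer&lt;/code&gt; constructor, and the bootstrap address and group id are placeholders:&lt;/p&gt;

```java
import java.util.Properties;

public class ManualCommitConfig {
    public static Properties manualCommit() {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder
        props.put("group.id", "my-group");                // placeholder

        // Disable periodic background commits; the application calls
        // commitSync() itself, only after records are fully processed.
        props.put("enable.auto.commit", "false");
        return props;
    }
}
```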

&lt;p&gt;If you want a deeper explanation of offset commits, I covered it here:&lt;/p&gt;

&lt;p&gt;👉 &lt;a href="https://rajeevranjan.dev/blog/kafka-consumer-auto-commit-misunderstood/" rel="noopener noreferrer"&gt;Kafka Auto Commit Explained (At-Least-Once Processing)&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Offset management and graceful shutdown are closely connected.&lt;/p&gt;




&lt;h2&gt;
  
  
  A Quick Note on Long Processing
&lt;/h2&gt;

&lt;p&gt;If your message processing takes a long time, Kafka may trigger a rebalance even if the consumer is still running.&lt;/p&gt;

&lt;p&gt;This usually relates to how frequently &lt;code&gt;poll()&lt;/code&gt; is called and certain consumer configurations.&lt;/p&gt;

&lt;p&gt;This topic deserves a separate discussion because it directly impacts stability in high-throughput systems.&lt;/p&gt;

&lt;p&gt;We’ll explore it properly in a dedicated article.&lt;/p&gt;




&lt;h2&gt;
  
  
  Connecting Shutdown to JVM Exit
&lt;/h2&gt;

&lt;p&gt;You can connect shutdown logic like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="nc"&gt;GracefulKafkaConsumer&lt;/span&gt; &lt;span class="n"&gt;consumerRunnable&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt;
    &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nf"&gt;GracefulKafkaConsumer&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;consumer&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;

&lt;span class="nc"&gt;Thread&lt;/span&gt; &lt;span class="n"&gt;consumerThread&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Thread&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;consumerRunnable&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
&lt;span class="n"&gt;consumerThread&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;start&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;
&lt;span class="nc"&gt;Thread&lt;/span&gt; &lt;span class="n"&gt;mainThread&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Thread&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;currentThread&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;

&lt;span class="nc"&gt;Runtime&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;getRuntime&lt;/span&gt;&lt;span class="o"&gt;().&lt;/span&gt;&lt;span class="na"&gt;addShutdownHook&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Thread&lt;/span&gt;&lt;span class="o"&gt;(()&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;consumerRunnable&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;shutdown&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;

    &lt;span class="k"&gt;try&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
            &lt;span class="n"&gt;mainThread&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;join&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;
        &lt;span class="o"&gt;}&lt;/span&gt; &lt;span class="k"&gt;catch&lt;/span&gt; &lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;InterruptedException&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
            &lt;span class="nc"&gt;Thread&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;currentThread&lt;/span&gt;&lt;span class="o"&gt;().&lt;/span&gt;&lt;span class="na"&gt;interrupt&lt;/span&gt;&lt;span class="o"&gt;();&lt;/span&gt;
        &lt;span class="o"&gt;}&lt;/span&gt;
&lt;span class="o"&gt;}));&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;When the application receives a termination signal, the consumer exits safely.&lt;/p&gt;




&lt;h2&gt;
  
  
  How This Relates to Rebalancing
&lt;/h2&gt;

&lt;p&gt;In the previous article(👉 &lt;a href="https://rajeevranjan.dev/blog/kafka-eager-vs-cooperative-rebalancing/" rel="noopener noreferrer"&gt;Read: Kafka Eager vs Cooperative Rebalancing Explained&lt;/a&gt;), we discussed eager vs cooperative rebalancing.&lt;/p&gt;

&lt;p&gt;Improper shutdown can trigger unnecessary rebalances.&lt;/p&gt;

&lt;p&gt;A clean shutdown:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Reduces duplicate processing
&lt;/li&gt;
&lt;li&gt;Leaves the consumer group smoothly
&lt;/li&gt;
&lt;li&gt;Minimizes disruption during deployments
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Graceful shutdown is not optional in production systems.&lt;/p&gt;

&lt;p&gt;It is part of building reliable Kafka consumers.&lt;/p&gt;




&lt;h2&gt;
  
  
  Closing Thoughts
&lt;/h2&gt;

&lt;p&gt;Kafka consumer graceful shutdown is simple in concept, but easy to get wrong.&lt;/p&gt;

&lt;p&gt;The correct approach is:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Stop polling
&lt;/li&gt;
&lt;li&gt;Finish processing
&lt;/li&gt;
&lt;li&gt;Commit offsets safely
&lt;/li&gt;
&lt;li&gt;Close the consumer properly
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you skip any of these steps, you increase the chance of duplicate processing or inconsistent state.&lt;/p&gt;




&lt;h2&gt;
  
  
  What’s Next?
&lt;/h2&gt;

&lt;p&gt;Next, we’ll explore Kafka producer internals — specifically how &lt;code&gt;acks&lt;/code&gt;, retries, and idempotent producers impact reliability and delivery guarantees in production systems.&lt;/p&gt;




&lt;p&gt;Originally published on my personal blog:&lt;/p&gt;

&lt;p&gt;🔗 &lt;a href="https://rajeevranjan.dev/blog/kafka-consumer-graceful-shutdown/" rel="noopener noreferrer"&gt;https://rajeevranjan.dev/blog/kafka-consumer-graceful-shutdown/&lt;/a&gt;&lt;/p&gt;

</description>
      <category>kafka</category>
      <category>distributedsystems</category>
      <category>java</category>
      <category>backend</category>
    </item>
    <item>
      <title>Kafka Eager vs Cooperative Rebalancing Explained (Why Consumers Pause During Deployments)</title>
      <dc:creator>Rajeev</dc:creator>
      <pubDate>Thu, 19 Feb 2026 17:30:18 +0000</pubDate>
      <link>https://dev.to/rajeev_a954661bb78eb9797f/kafka-eager-vs-cooperative-rebalancing-explained-why-consumers-pause-during-deployments-4dn9</link>
      <guid>https://dev.to/rajeev_a954661bb78eb9797f/kafka-eager-vs-cooperative-rebalancing-explained-why-consumers-pause-during-deployments-4dn9</guid>
      <description>&lt;p&gt;If you search about Kafka rebalancing, you’ll often find something like this:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“Kafka automatically redistributes partitions when consumers join or leave.”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That statement is correct.&lt;br&gt;&lt;br&gt;
But it hides what really happens during that redistribution.&lt;/p&gt;

&lt;p&gt;In real systems, rebalancing can:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Pause message processing
&lt;/li&gt;
&lt;li&gt;Increase latency
&lt;/li&gt;
&lt;li&gt;Trigger duplicate processing
&lt;/li&gt;
&lt;li&gt;Disrupt rolling deployments
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Most developers only notice it when their consumers suddenly stop for a few seconds during a deployment.&lt;/p&gt;

&lt;p&gt;In this article, we’ll look at what actually happens during a rebalance, how eager rebalancing works, how cooperative rebalancing improves it, and why the difference matters in production.&lt;/p&gt;




&lt;h2&gt;
  
  
  When Does Rebalancing Happen?
&lt;/h2&gt;

&lt;p&gt;Rebalancing is triggered whenever consumer group membership changes.&lt;/p&gt;

&lt;p&gt;Common triggers:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A new consumer instance starts
&lt;/li&gt;
&lt;li&gt;A consumer crashes
&lt;/li&gt;
&lt;li&gt;A deployment restarts pods
&lt;/li&gt;
&lt;li&gt;Topic partitions are increased
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Kafka must redistribute partitions among active consumers.&lt;/p&gt;

&lt;p&gt;Example:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;3 consumers
&lt;/li&gt;
&lt;li&gt;6 partitions
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Each gets 2 partitions.&lt;/p&gt;

&lt;p&gt;If one consumer goes down, Kafka must reassign its partitions to the remaining consumers.&lt;/p&gt;

&lt;p&gt;That reassignment process is the rebalance.&lt;/p&gt;

&lt;p&gt;What matters is &lt;strong&gt;how&lt;/strong&gt; that reassignment happens.&lt;/p&gt;
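&lt;p&gt;To make the redistribution concrete, here is a small stdlib-only simulation of round-robin assignment (illustrative only; this is not how Kafka's assignors are implemented):&lt;/p&gt;

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

public class AssignmentDemo {
    // Distribute partitions 0..partitions-1 over consumers round-robin.
    public static TreeMap<String, List<Integer>> assign(List<String> consumers, int partitions) {
        TreeMap<String, List<Integer>> out = new TreeMap<>();
        consumers.forEach(c -> out.put(c, new ArrayList<>()));
        for (int p = 0; p < partitions; p++) {
            out.get(consumers.get(p % consumers.size())).add(p);
        }
        return out;
    }

    public static void main(String[] args) {
        // 3 consumers, 6 partitions: each consumer gets 2 partitions.
        System.out.println(assign(List.of("c1", "c2", "c3"), 6));
        // One consumer leaves: the same 6 partitions are reassigned, 3 each.
        System.out.println(assign(List.of("c1", "c2"), 6));
    }
}
```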




&lt;h2&gt;
  
  
  Eager Rebalancing: Stop Everything
&lt;/h2&gt;

&lt;p&gt;In the traditional model (used for years), Kafka performs what is called eager rebalancing.&lt;/p&gt;

&lt;p&gt;When a rebalance is triggered:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;All consumers stop fetching records
&lt;/li&gt;
&lt;li&gt;All partitions are revoked from all consumers
&lt;/li&gt;
&lt;li&gt;Kafka computes a new assignment
&lt;/li&gt;
&lt;li&gt;Partitions are reassigned
&lt;/li&gt;
&lt;li&gt;Consumers resume processing
&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Every consumer pauses.&lt;/p&gt;

&lt;p&gt;Even partitions that did not need to move are temporarily revoked.&lt;/p&gt;

&lt;p&gt;This is why Kafka consumers pause during a rebalance.&lt;/p&gt;

&lt;p&gt;It is effectively a full stop across the entire consumer group.&lt;/p&gt;




&lt;h2&gt;
  
  
  Why This Becomes Visible in Production
&lt;/h2&gt;

&lt;p&gt;In static environments, this may not feel significant.&lt;/p&gt;

&lt;p&gt;But in dynamic systems:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Auto-scaling
&lt;/li&gt;
&lt;li&gt;Kubernetes rolling updates
&lt;/li&gt;
&lt;li&gt;Frequent deployments
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Rebalances can happen often.&lt;/p&gt;

&lt;p&gt;Suppose you have:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;4 consumers
&lt;/li&gt;
&lt;li&gt;12 partitions
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;During a rolling deployment, one new consumer joins.&lt;/p&gt;

&lt;p&gt;With eager rebalancing, all 4 consumers pause — even though only a few partitions actually need reassignment.&lt;/p&gt;

&lt;p&gt;With cooperative rebalancing, only the affected partitions move. The rest continue processing.&lt;/p&gt;

&lt;p&gt;With eager rebalancing, each event creates a visible pause in processing.&lt;/p&gt;

&lt;p&gt;If offset handling is not carefully managed, this can also increase duplicate processing.&lt;/p&gt;

&lt;p&gt;The system is correct — but not smooth.&lt;/p&gt;
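
&lt;p&gt;The 12-partition example above can be made concrete with a toy calculation. This is plain Java, not Kafka's actual assignor logic; the counts assume an even, sticky assignment:&lt;/p&gt;

```java
public class RebalanceSketch {
    // Hypothetical model: N partitions, a group growing from oldConsumers
    // to newConsumers members. Not Kafka's real assignor code.

    // Eager protocol: every partition is revoked, even ones that end up
    // back on the same consumer afterwards.
    static int eagerRevoked(int partitions) {
        return partitions;
    }

    // Cooperative/sticky protocol: only enough partitions move to give the
    // joining members their fair share.
    static int cooperativeMoved(int partitions, int oldConsumers, int newConsumers) {
        int perMemberFloor = partitions / newConsumers;
        int joined = newConsumers - oldConsumers;
        return perMemberFloor * joined;
    }

    public static void main(String[] args) {
        // 12 partitions, 4 consumers, 1 joins during a rolling deployment:
        System.out.println("eager revokes: " + eagerRevoked(12));               // 12
        System.out.println("cooperative moves: " + cooperativeMoved(12, 4, 5)); // 2
    }
}
```

&lt;p&gt;Under the eager protocol all 12 partitions pause, even though 10 of them end up exactly where they were; the cooperative protocol revokes only the 2 partitions the new member actually needs.&lt;/p&gt;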




&lt;h2&gt;
  
  
  Cooperative Rebalancing: Move Only What’s Necessary
&lt;/h2&gt;

&lt;p&gt;To reduce disruption, Kafka introduced cooperative (incremental) rebalancing in Apache Kafka 2.4.&lt;/p&gt;

&lt;p&gt;The key difference:&lt;/p&gt;

&lt;p&gt;Instead of revoking all partitions from all consumers, Kafka only revokes the partitions that actually need to move.&lt;/p&gt;

&lt;p&gt;Other partitions continue processing.&lt;/p&gt;

&lt;p&gt;The rebalance happens incrementally rather than all at once.&lt;/p&gt;

&lt;p&gt;The practical result:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;No full stop
&lt;/li&gt;
&lt;li&gt;Reduced latency spikes
&lt;/li&gt;
&lt;li&gt;Smoother scaling
&lt;/li&gt;
&lt;li&gt;Less disruption during deployments
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The system still rebalances — but without unnecessary interruption.&lt;/p&gt;




&lt;h2&gt;
  
  
  What Enables Cooperative Rebalancing?
&lt;/h2&gt;

&lt;p&gt;Rebalancing behavior depends on the partition assignment strategy.&lt;/p&gt;

&lt;p&gt;Older strategies, which use the eager protocol:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;RangeAssignor&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;RoundRobinAssignor&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For cooperative rebalancing, you use:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;partition.assignment.strategy=org.apache.kafka.clients.consumer.CooperativeStickyAssignor&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That configuration enables incremental rebalancing.&lt;/p&gt;

&lt;p&gt;The change is small in configuration, but significant in behavior.&lt;/p&gt;
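
&lt;p&gt;In code, the switch is a single consumer property. A minimal sketch; the broker address and group id are placeholders for illustration:&lt;/p&gt;

```java
import java.util.Properties;

public class CooperativeConfig {
    // Minimal consumer configuration sketch. Only the assignment strategy
    // line is the point here; the rest is standard consumer setup.
    static Properties consumerProps() {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder
        props.put("group.id", "orders-service");          // placeholder
        props.put("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        // The one line that switches the group to incremental rebalancing:
        props.put("partition.assignment.strategy",
                "org.apache.kafka.clients.consumer.CooperativeStickyAssignor");
        return props;
    }
}
```

&lt;p&gt;Note that all members of a consumer group must support the cooperative protocol at the same time; the Kafka upgrade notes describe a safe rolling procedure for switching an existing group away from an eager assignor.&lt;/p&gt;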




&lt;h2&gt;
  
  
  Why This Connects to Offset Management
&lt;/h2&gt;

&lt;p&gt;Rebalancing and offset commits are closely related.&lt;/p&gt;

&lt;p&gt;When partitions are revoked:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You may still be processing records
&lt;/li&gt;
&lt;li&gt;Offsets may or may not be committed
&lt;/li&gt;
&lt;li&gt;Improper handling can lead to duplicates &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you haven't explored how offset commits work, I explained it in detail here:&lt;/p&gt;

&lt;p&gt;👉 &lt;a href="https://rajeevranjan.dev/blog/kafka-consumer-auto-commit-misunderstood" rel="noopener noreferrer"&gt;Kafka Auto Commit Explained (At-Least-Once Processing)&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Even cooperative rebalancing does not eliminate this responsibility.&lt;/p&gt;

&lt;p&gt;It only reduces disruption.&lt;/p&gt;

&lt;p&gt;Correct offset handling is still critical.&lt;/p&gt;




&lt;h2&gt;
  
  
  When Should You Prefer Cooperative Rebalancing?
&lt;/h2&gt;

&lt;p&gt;You should strongly consider it if:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You deploy frequently
&lt;/li&gt;
&lt;li&gt;You auto-scale consumers
&lt;/li&gt;
&lt;li&gt;You run multiple instances
&lt;/li&gt;
&lt;li&gt;You process high-throughput topics
&lt;/li&gt;
&lt;li&gt;You care about latency stability
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In modern distributed systems, these are common conditions.&lt;/p&gt;




&lt;h2&gt;
  
  
  Closing Thoughts
&lt;/h2&gt;

&lt;p&gt;Rebalancing is often treated as an internal Kafka detail.&lt;/p&gt;

&lt;p&gt;In practice, it directly affects:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Throughput
&lt;/li&gt;
&lt;li&gt;Latency
&lt;/li&gt;
&lt;li&gt;Duplicate processing
&lt;/li&gt;
&lt;li&gt;Deployment stability
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Understanding how rebalancing works — and which strategy you are using — is part of building production-ready Kafka consumers.&lt;/p&gt;




&lt;h2&gt;
  
  
  What’s Next?
&lt;/h2&gt;

&lt;p&gt;If you're following along, next we’ll cover graceful shutdown of a Kafka consumer — including how to handle &lt;code&gt;WakeupException&lt;/code&gt;, commit offsets safely, and close the consumer without losing work.&lt;/p&gt;

&lt;p&gt;That’s where many real-world production issues actually happen.&lt;/p&gt;




&lt;p&gt;Originally published on my personal blog:&lt;br&gt;&lt;br&gt;
🔗 &lt;a href="https://rajeevranjan.dev/blog/kafka-eager-vs-cooperative-rebalancing/" rel="noopener noreferrer"&gt;https://rajeevranjan.dev/blog/kafka-eager-vs-cooperative-rebalancing/&lt;/a&gt;&lt;/p&gt;

</description>
      <category>kafka</category>
      <category>distributedsystems</category>
      <category>java</category>
      <category>backend</category>
    </item>
    <item>
      <title>Kafka Consumer Auto-Commit: Why 'At-Least-Once' Is Often Misunderstood</title>
      <dc:creator>Rajeev</dc:creator>
      <pubDate>Wed, 11 Feb 2026 17:23:39 +0000</pubDate>
      <link>https://dev.to/rajeev_a954661bb78eb9797f/kafka-consumer-auto-commit-why-at-least-once-is-often-misunderstood-15hn</link>
      <guid>https://dev.to/rajeev_a954661bb78eb9797f/kafka-consumer-auto-commit-why-at-least-once-is-often-misunderstood-15hn</guid>
      <description>&lt;p&gt;If you search online, you’ll often see statements like:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“Kafka consumers provide at-least-once delivery when auto-commit is enabled.”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;While this statement is not entirely wrong, it is dangerously incomplete. Many production issues happen because developers take this at face value without understanding &lt;em&gt;when&lt;/em&gt; offsets are committed and &lt;em&gt;what exactly&lt;/em&gt; “at-least-once” means in practice.&lt;/p&gt;

&lt;p&gt;In this article, we’ll look at what actually happens inside a Kafka consumer when auto-commit is enabled, why failures can still cause data loss, and when auto-commit is truly safe to use.&lt;/p&gt;




&lt;h2&gt;
  
  
  Why Auto-Commit Feels Safe
&lt;/h2&gt;

&lt;p&gt;By default, the Kafka Java consumer has:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;enable.auto.commit = true&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;auto.commit.interval.ms = 5000&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This gives a comforting impression:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Kafka periodically commits offsets
&lt;/li&gt;
&lt;li&gt;If a consumer crashes, it restarts
&lt;/li&gt;
&lt;li&gt;Messages should be reprocessed
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;So it feels like at-least-once delivery is guaranteed.&lt;/p&gt;

&lt;p&gt;But the real question is:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;At least once relative to what? Polling? Processing? Business logic?&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Kafka only tracks polling. It does not track what your application does after that.&lt;/p&gt;




&lt;h2&gt;
  
  
  What Kafka Actually Commits
&lt;/h2&gt;

&lt;p&gt;Kafka commits &lt;strong&gt;offsets&lt;/strong&gt;, not messages.&lt;/p&gt;

&lt;p&gt;An offset simply means:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“The next record the consumer should read.”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;When auto-commit is enabled, the consumer periodically commits the &lt;strong&gt;latest offsets returned by &lt;code&gt;poll()&lt;/code&gt;&lt;/strong&gt;, regardless of whether your application has finished processing those records.&lt;/p&gt;

&lt;p&gt;Kafka does not know:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Whether you processed the record
&lt;/li&gt;
&lt;li&gt;Whether processing succeeded or failed
&lt;/li&gt;
&lt;li&gt;Whether your database write completed
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;From Kafka’s perspective, once &lt;code&gt;poll()&lt;/code&gt; returns records, those offsets are eligible for commit.&lt;/p&gt;
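
&lt;p&gt;One detail worth pinning down: the committed value is a position, always "last polled offset + 1". When people informally say "offset 120 is committed", the stored commit position is 121. A one-line sketch of that arithmetic:&lt;/p&gt;

```java
public class OffsetSemantics {
    // Kafka stores "the next offset to read", not "the last offset processed".
    static long commitPosition(long lastPolledOffset) {
        return lastPolledOffset + 1;
    }

    public static void main(String[] args) {
        // After poll() returns records 100..120, the commit position is 121,
        // so a restarted consumer resumes at record 121.
        System.out.println(commitPosition(120)); // prints 121
    }
}
```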




&lt;h2&gt;
  
  
  The Actual Timeline (Critical to Understand)
&lt;/h2&gt;

&lt;p&gt;Let’s walk through a realistic scenario:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Consumer calls &lt;code&gt;poll()&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Kafka returns records with offsets &lt;code&gt;100–120&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Auto-commit timer fires&lt;/li&gt;
&lt;li&gt;Offset &lt;code&gt;120&lt;/code&gt; is committed&lt;/li&gt;
&lt;li&gt;Your application is still processing records&lt;/li&gt;
&lt;li&gt;The consumer crashes (OOM, JVM kill, container restart)&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;What happens next?&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Kafka sees offset &lt;code&gt;120&lt;/code&gt; as committed
&lt;/li&gt;
&lt;li&gt;On restart, the consumer resumes from &lt;code&gt;121&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Records &lt;code&gt;100–120&lt;/code&gt; are never re-read
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;From your application’s point of view, those messages are effectively lost.&lt;/p&gt;
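
&lt;p&gt;The timeline above can be simulated without a broker. This is a toy model in plain Java with hypothetical offsets, not the real consumer internals:&lt;/p&gt;

```java
import java.util.HashSet;
import java.util.Set;

public class AutoCommitLossSketch {
    // Toy model of the crash timeline: poll() returned firstPolled..lastPolled,
    // the auto-commit timer fired (commit position = lastPolled + 1), and the
    // consumer crashed after processing only the offsets in `processed`.
    // Returns how many polled-but-unprocessed records are never re-read.
    static long lostRecords(long firstPolled, long lastPolled, Set<Long> processed) {
        long resumeAt = lastPolled + 1; // the committed position survives the crash
        long lost = 0;
        for (long o = firstPolled; o < resumeAt; o++) {
            if (!processed.contains(o)) lost++; // restart resumes past it, so it is skipped forever
        }
        return lost;
    }

    public static void main(String[] args) {
        Set<Long> processed = new HashSet<>();
        processed.add(100L);
        processed.add(101L); // the crash hit before the rest of 100..120 finished

        System.out.println(lostRecords(100, 120, processed)); // prints 19
    }
}
```

&lt;p&gt;Nineteen records were polled, counted as committed, and never processed — exactly the silent loss described above.&lt;/p&gt;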




&lt;h2&gt;
  
  
  Why Is This Still Called “At-Least-Once”?
&lt;/h2&gt;

&lt;p&gt;Because Kafka’s guarantee is scoped narrowly.&lt;/p&gt;

&lt;p&gt;Kafka guarantees:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Records returned by &lt;code&gt;poll()&lt;/code&gt; will be delivered at least once to the consumer
&lt;/li&gt;
&lt;li&gt;Because committed offsets never run ahead of what &lt;code&gt;poll()&lt;/code&gt; has already returned
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Kafka does &lt;strong&gt;not&lt;/strong&gt; guarantee:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;At-least-once processing
&lt;/li&gt;
&lt;li&gt;At-least-once database writes
&lt;/li&gt;
&lt;li&gt;At-least-once business side effects
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This distinction is often overlooked.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Real-World Problem
&lt;/h2&gt;

&lt;p&gt;In real systems:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Processing may be asynchronous
&lt;/li&gt;
&lt;li&gt;There are database writes, API calls, retries
&lt;/li&gt;
&lt;li&gt;Processing can take seconds or even minutes
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Auto-commit is time-based, not processing-based.&lt;/p&gt;

&lt;p&gt;So the larger the gap between:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Polling the record
&lt;/li&gt;
&lt;li&gt;Finishing business processing
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;the higher the risk of data loss if the consumer crashes.&lt;/p&gt;

&lt;p&gt;This is why teams sometimes observe:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Missing records
&lt;/li&gt;
&lt;li&gt;Inconsistent aggregates
&lt;/li&gt;
&lt;li&gt;Silent data loss after restarts
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Kafka usually behaves correctly — the misunderstanding is in how the guarantees are interpreted.&lt;/p&gt;




&lt;h2&gt;
  
  
  When Auto-Commit Is Actually Safe
&lt;/h2&gt;

&lt;p&gt;Auto-commit can be acceptable when:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Processing is very fast
&lt;/li&gt;
&lt;li&gt;Processing is idempotent
&lt;/li&gt;
&lt;li&gt;Losing a small number of records is acceptable
&lt;/li&gt;
&lt;li&gt;The consumer does not maintain critical state
&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Typical examples:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Metrics collection
&lt;/li&gt;
&lt;li&gt;Log aggregation
&lt;/li&gt;
&lt;li&gt;Monitoring events
&lt;/li&gt;
&lt;li&gt;Best-effort analytics
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In these cases, simplicity may outweigh strict correctness.&lt;/p&gt;




&lt;h2&gt;
  
  
  When You Should Avoid Auto-Commit
&lt;/h2&gt;

&lt;p&gt;Avoid auto-commit when:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You write to a database
&lt;/li&gt;
&lt;li&gt;You update business state
&lt;/li&gt;
&lt;li&gt;You perform non-idempotent operations
&lt;/li&gt;
&lt;li&gt;You require strong delivery guarantees
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In these situations, manual offset management provides better control:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Process the record
&lt;/li&gt;
&lt;li&gt;Ensure processing succeeds
&lt;/li&gt;
&lt;li&gt;Commit the offset explicitly
&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;It adds complexity, but it aligns offset commits with business success.&lt;/p&gt;
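
&lt;p&gt;The three steps above can be sketched as a commit-after-process loop. This is a broker-free simulation of the pattern, not real &lt;code&gt;KafkaConsumer&lt;/code&gt; code (there you would call &lt;code&gt;commitSync()&lt;/code&gt; after processing succeeds):&lt;/p&gt;

```java
import java.util.function.LongPredicate;

public class ManualCommitSketch {
    // Commit-after-process pattern, simulated without a broker.
    // Processes offsets firstOffset..lastOffset with `handler` (which may
    // fail, returning false) and returns the resulting commit position.
    static long processBatch(long firstOffset, long lastOffset,
                             long committed, LongPredicate handler) {
        for (long o = firstOffset; o <= lastOffset; o++) {
            if (!handler.test(o)) {
                return committed; // failure: do NOT advance the commit
            }
            committed = o + 1;    // success: commit position moves past this record
        }
        return committed;
    }

    public static void main(String[] args) {
        // Processing fails at offset 110: the commit stays at 110, so
        // offsets 110..120 will be re-delivered after a restart.
        long committed = processBatch(100, 120, 100, o -> o != 110);
        System.out.println(committed); // prints 110
    }
}
```

&lt;p&gt;Because the commit never moves past a failed record, a restart re-reads offset 110 onward: possible duplicates, but no silent loss. That is the at-least-once trade-off.&lt;/p&gt;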




&lt;h2&gt;
  
  
  Manual Commit Is Not Magic Either
&lt;/h2&gt;

&lt;p&gt;Even with manual commits:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Duplicates can still happen
&lt;/li&gt;
&lt;li&gt;Rebalances can interrupt processing
&lt;/li&gt;
&lt;li&gt;Commits can fail or be delayed
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Kafka gives delivery guarantees, not business correctness guarantees.&lt;/p&gt;

&lt;p&gt;Production systems should still be designed with:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Idempotent processing
&lt;/li&gt;
&lt;li&gt;Clear retry strategies
&lt;/li&gt;
&lt;li&gt;Proper failure handling
&lt;/li&gt;
&lt;/ul&gt;
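
&lt;p&gt;Idempotent processing is the piece that makes redelivered duplicates harmless. A minimal sketch, assuming each record carries a stable id; in a real system the dedup check would live in the datastore itself (for example a unique key constraint), not an in-memory set:&lt;/p&gt;

```java
import java.util.HashSet;
import java.util.Set;

public class IdempotentHandlerSketch {
    private final Set<String> seen = new HashSet<>(); // stand-in for a durable dedup store
    private int applied = 0;

    // Apply the side effect only once per record id, so records redelivered
    // after a rebalance or a failed commit do no harm.
    boolean handle(String recordId) {
        if (!seen.add(recordId)) {
            return false; // duplicate delivery: skip
        }
        applied++;        // the real side effect (DB write, API call) goes here
        return true;
    }

    int appliedCount() {
        return applied;
    }
}
```

&lt;p&gt;With this in place, "at-least-once delivery" from Kafka becomes "exactly-once effect" in your system, regardless of which commit strategy you use.&lt;/p&gt;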




&lt;h2&gt;
  
  
  Key Takeaways
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Kafka commits offsets, not processing results
&lt;/li&gt;
&lt;li&gt;Auto-commit is tied to &lt;code&gt;poll()&lt;/code&gt;, not your business logic
&lt;/li&gt;
&lt;li&gt;“At-least-once” does not mean “processed at least once”
&lt;/li&gt;
&lt;li&gt;Auto-commit is fine for best-effort use cases
&lt;/li&gt;
&lt;li&gt;For critical systems, explicit offset control is safer
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Understanding this early can prevent subtle production bugs later.&lt;/p&gt;




&lt;h2&gt;
  
  
  What’s Next
&lt;/h2&gt;

&lt;p&gt;In the next article, we’ll explore:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The difference between eager and cooperative rebalancing&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;👉 Read next: &lt;a href="https://rajeevranjan.dev/blog/kafka-eager-vs-cooperative-rebalancing" rel="noopener noreferrer"&gt;Kafka Eager vs Cooperative Rebalancing Explained&lt;/a&gt;&lt;/p&gt;




&lt;p&gt;Originally published on my personal blog:&lt;br&gt;&lt;br&gt;
🔗 &lt;a href="https://rajeevranjan.dev/blog/kafka-consumer-auto-commit-misunderstood" rel="noopener noreferrer"&gt;https://rajeevranjan.dev/blog/kafka-consumer-auto-commit-misunderstood&lt;/a&gt;&lt;/p&gt;

</description>
      <category>kafka</category>
      <category>java</category>
      <category>distributedsystems</category>
      <category>backend</category>
    </item>
  </channel>
</rss>
