In the previous article, *Kafka Producer Acks Explained: Replicas, ISR, and Write Guarantees*, we discussed when a producer considers a write successful and how acknowledgments impact durability and availability.
But even with correct acknowledgment settings, one important problem still remains:
What happens when a write fails in Kafka?
Or even more interesting:
What happens when Kafka thinks a write failed, but it actually succeeded?
This is where Kafka retries and idempotent producers become critical.
The Real Problem: Uncertain Failures in Distributed Systems
In distributed systems, failures are not always clear.
Consider this scenario:
- Producer sends a message to the leader.
- Leader writes the message successfully.
- Leader sends acknowledgment.
- Network issue occurs → acknowledgment is lost.
From Kafka’s perspective:
- Broker: Write succeeded
- Producer: Write failed
Now the producer retries.
👉 The same message gets written again.
This leads to duplicate messages in Kafka, even though the system behaved correctly.
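The lost-acknowledgment scenario above can be sketched as a toy simulation. The `Broker` and `Producer` classes here are purely illustrative (they are not Kafka client APIs), but they show exactly how a retry after a lost ack produces a duplicate:

```python
# Toy simulation of a lost acknowledgment causing a duplicate write.
# These classes are illustrative only; they are not real Kafka APIs.

class Broker:
    def __init__(self):
        self.log = []  # the partition log

    def write(self, message, ack_lost=False):
        self.log.append(message)   # the write itself always succeeds here
        return not ack_lost        # but the ack may be lost in transit

class Producer:
    def __init__(self, broker):
        self.broker = broker

    def send(self, message, ack_lost_once=False):
        acked = self.broker.write(message, ack_lost=ack_lost_once)
        if not acked:
            # The producer saw no ack, so it retries;
            # the broker writes the same message again.
            self.broker.write(message)

broker = Broker()
producer = Producer(broker)
producer.send("A", ack_lost_once=True)
print(broker.log)  # ['A', 'A'] -- the same message is stored twice
```

Both the producer and the broker behaved correctly; the duplicate is purely a consequence of the lost acknowledgment.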
Kafka Retries Explained
Kafka producers support automatic retries to handle transient failures.
Kafka Retry Configuration
retries=3
retry.backoff.ms=100
How Kafka Retries Work
- Producer sends a record.
- If it receives an error (or timeout), it retries.
- This continues until:
- Retry count is exhausted, or
- The send succeeds
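The loop above can be sketched like this. It is a simplification (the real client retries inside its background I/O thread, and `send_with_retries` and `flaky_send` are made-up names), but it mirrors the `retries` and `retry.backoff.ms` behavior:

```python
import time

def send_with_retries(send_fn, record, retries=3, retry_backoff_ms=100):
    """Mimics retries / retry.backoff.ms: attempt, then retry on error."""
    attempts = 0
    while True:
        try:
            return send_fn(record)           # attempt the send
        except IOError:
            attempts += 1
            if attempts > retries:           # retry count exhausted
                raise
            time.sleep(retry_backoff_ms / 1000)  # back off, then retry

# Usage: a send that fails twice with a transient error, then succeeds.
calls = {"n": 0}
def flaky_send(record):
    calls["n"] += 1
    if calls["n"] < 3:
        raise IOError("transient network failure")
    return "ok"

print(send_with_retries(flaky_send, "A", retries=3, retry_backoff_ms=1))  # ok
```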
When Do Kafka Retries Trigger?
Retries typically happen in scenarios like:
- Temporary network failures
- Leader broker not available
- NOT_ENOUGH_REPLICAS
- REQUEST_TIMED_OUT
These are recoverable errors, making retries useful.
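The client's decision boils down to classifying error codes as retriable or fatal. The error names below mirror Kafka's, but this classifier itself is an illustrative sketch, not the client's actual code:

```python
# Recoverable (retriable) errors vs. fatal ones -- illustrative classifier.
RETRIABLE = {
    "NETWORK_EXCEPTION",      # temporary network failure
    "LEADER_NOT_AVAILABLE",   # leader broker not available
    "NOT_ENOUGH_REPLICAS",
    "REQUEST_TIMED_OUT",
}
FATAL = {
    "MESSAGE_TOO_LARGE",         # retrying cannot fix an oversized record
    "INVALID_TOPIC_EXCEPTION",   # retrying cannot fix a bad topic name
}

def should_retry(error_code):
    return error_code in RETRIABLE

print(should_retry("REQUEST_TIMED_OUT"))   # True
print(should_retry("MESSAGE_TOO_LARGE"))   # False
```

Retrying a fatal error would just fail again, which is why the client only retries the recoverable class.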
Problem with Kafka Retries: Duplicate Messages
Retries improve reliability but introduce a major issue:
Duplicate message production
Why?
Because the producer cannot always distinguish between:
- A failed write
- A successful write with lost acknowledgment
So retrying can result in:
Message A → written
Retry Message A → written again
Message Ordering Issues with Retries
Retries can also impact ordering.
Example:
- Message A is sent
- Message B is sent
- A fails and is retried later
Now B might be written before the retry of A succeeds.
👉 This can break ordering guarantees.
Kafka controls this using:
max.in.flight.requests.per.connection
But retries alone cannot guarantee correctness.
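The reordering hazard can be sketched with a toy delivery model: when more than one request is in flight, a batch that fails on its first attempt lands after a later batch that succeeded. This is not a real Kafka client, just an illustration of the interleaving:

```python
# Toy illustration of reordering with multiple in-flight requests.
# A batch that fails its first attempt is retried *after* later batches land.

def deliver(batches, first_attempt_fails):
    log = []
    retry_queue = []
    for batch in batches:
        if batch in first_attempt_fails:
            retry_queue.append(batch)   # first attempt fails; retry later
        else:
            log.append(batch)           # a later in-flight batch lands first
    log.extend(retry_queue)             # retries are appended afterwards
    return log

print(deliver(["A", "B"], first_attempt_fails={"A"}))  # ['B', 'A']
print(deliver(["A", "B"], first_attempt_fails=set()))  # ['A', 'B']
```

Setting `max.in.flight.requests.per.connection=1` removes the interleaving entirely, at the cost of throughput; the idempotent producer gives a better trade-off.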
Kafka Idempotent Producer
To eliminate duplicate messages in Kafka, we use the:
Idempotent Producer
What is Idempotence in Kafka?
Idempotence means:
Sending the same message multiple times results in it being written only once.
In Kafka:
👉 Even if retries happen, duplicate messages are not stored.
How Kafka Idempotent Producer Works
Kafka ensures idempotency using:
1. Producer ID (PID)
Each producer gets a unique identifier from the broker.
2. Sequence Numbers
- Each message has a sequence number per partition
- Broker tracks the latest sequence number
Duplicate Detection
On retry:
- Same sequence number is sent
- Broker detects duplicate
- Duplicate message is discarded
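The PID-plus-sequence-number mechanism can be sketched as a simplified model of the broker-side check. Real brokers track a small window of recent batch sequences per producer; this sketch keeps only the latest one:

```python
# Simplified broker-side duplicate detection using per-producer,
# per-partition sequence numbers. Illustrative, not broker code.

class PartitionLog:
    def __init__(self):
        self.records = []
        self.last_seq = {}   # producer_id -> last accepted sequence number

    def append(self, producer_id, seq, message):
        if self.last_seq.get(producer_id, -1) >= seq:
            return "duplicate_discarded"   # retry of an already-written record
        self.records.append(message)
        self.last_seq[producer_id] = seq
        return "written"

log = PartitionLog()
print(log.append(producer_id=42, seq=0, message="A"))  # written
print(log.append(producer_id=42, seq=0, message="A"))  # duplicate_discarded
print(log.records)                                     # ['A'] -- stored once
```

The retry carries the same PID and sequence number as the original attempt, so the broker can recognize it even though the producer never saw the first acknowledgment.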
Enable Idempotent Producer in Kafka
enable.idempotence=true
This single setting is enough to enable duplicate protection. (Since Kafka 3.0, it is the default.)
Important Kafka Config Changes with Idempotence
When idempotence is enabled, Kafka automatically enforces:
acks=all
retries=Integer.MAX_VALUE
max.in.flight.requests.per.connection=5
# Note: For idempotent producers, this number should be ≤5 to preserve ordering and ensure no duplicates
Why These Settings Matter
- acks=all → ensures durability
- retries=∞ → safe retry mechanism
- limited in-flight requests → preserves ordering
Scope of Idempotent Producer
What It Guarantees
- No duplicate messages per partition
- Safe retries
- Ordering guarantees (with correct config)
What It Does Not Guarantee
- No duplicates across producers
- No duplicates across restarts
- End-to-end exactly-once processing
For that, Kafka provides transactions.
Kafka Retries vs Idempotent Producer
| Feature | Without Idempotence | With Idempotence |
|---|---|---|
| Retries | Can create duplicates | Safe |
| Reliability | Moderate | High |
| Ordering | Can break | Preserved |
Recommended Kafka Producer Configuration
acks=all
enable.idempotence=true
retries=Integer.MAX_VALUE
retry.backoff.ms=100
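In the Python confluent-kafka client, the same configuration can be expressed as a plain config dict (the key names are assumed from librdkafka; with the Java client the same property names go into a `Properties` object). Shown as a dict only, so it stands alone without a running broker:

```python
# Recommended idempotent-producer settings, in the style of the
# confluent-kafka Python client (key names assumed from librdkafka).
producer_config = {
    "bootstrap.servers": "localhost:9092",  # placeholder address
    "acks": "all",                   # wait for all in-sync replicas
    "enable.idempotence": True,      # PID + sequence numbers, no duplicates
    "retries": 2147483647,           # effectively unlimited; now safe
    "retry.backoff.ms": 100,         # pause between retry attempts
}

# producer = Producer(producer_config)  # would require a running broker
print(producer_config["enable.idempotence"])  # True
```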
This setup ensures:
- High reliability
- No duplicate messages
- Strong durability guarantees
When to Use Idempotent Producer
Use idempotent producers in:
- Payment systems
- Order processing
- Inventory management
- Critical event-driven systems
In modern Kafka setups:
👉 It should almost always be enabled.
Closing Thoughts
Kafka retries are essential for handling transient failures, but they introduce the risk of duplicate messages.
Idempotent producers eliminate this risk by making retries safe.
Together, they ensure:
- Reliable message delivery
- No duplication
- Strong consistency at the producer level
Summary
Kafka retries help recover from failures but can cause duplicate messages.
Idempotent producers solve this by ensuring messages are written exactly once per partition.
- Retries improve fault tolerance
- Idempotence ensures correctness
- Together they enable reliable Kafka pipelines
If you found this useful, feel free to share your thoughts in a comment. I always appreciate feedback and different perspectives.