TL;DR
- Kafka guarantees delivery, not uniqueness. Retries are expected.
- Acknowledgements can fail even when a write succeeds, leading to duplicates.
- Producer idempotency prevents duplicate writes to Kafka, not duplicate business effects.
- Application-level idempotency gives each event a stable identity.
- Consumers must assume every message can be a duplicate.
- Databases (and sometimes caches) enforce idempotency at the point of side effects.
- "Exactly-once" is a practical goal, not a perfect guarantee.
- Idempotency doesn't eliminate retries; it makes retries safe.
If you've worked with Kafka long enough, you've probably seen this happen, or you will soon.
A producer sends a message.
The consumer processes it.
The database write succeeds.
And then… something goes wrong.
The acknowledgement doesn't come back.
The network hiccups.
The consumer restarts.
Kafka does what it's designed to do: it retries.
Suddenly, the same message shows up again.
Now you're left staring at duplicated rows, repeated updates, or inconsistent state, wondering:
"But didn't Kafka already process this?"
Here's the uncomfortable truth:
Kafka guarantees delivery, not uniqueness.
Kafka is excellent at making sure messages are not lost. But when failures occur, and they always do in distributed systems, Kafka will retry. And retries mean duplicates, unless your system is designed to handle them.
This is where many systems quietly break.
Not because Kafka failed.
But because the system assumed acknowledgements were reliable.
Understanding the Duplicate Message Scenario
Let's walk through this duplicate-delivery scenario step by step.
- The producer sends a message (Y) to the Kafka broker.
- The broker successfully appends this message to the topic partition. So far, everything is working as expected.
- However, when the broker sends the acknowledgement back to the producer, that acknowledgement fails to reach the producer, perhaps because of a temporary network issue or a timeout. From the producer's point of view, it has no way of knowing whether the message was actually written or not.
- So the producer does the only safe thing it can do: it retries and sends the same message (Y) again.
- The broker receives this retry and, without additional safeguards, appends the message again to the same partition. Now the partition contains two identical messages, even though the producer intended to send only one.
This is an important realization:
The duplication happened not because Kafka is broken, but because the producer could not trust the acknowledgement.
Kafka chose reliability over guessing. It preferred possibly duplicating a message rather than risking data loss. And that trade-off is intentional.
This is exactly why retries are a fundamental part of Kafka and why idempotency becomes essential when building real-world systems on top of it.
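To make that concrete, here's a minimal sketch with the plain Java producer client. The broker address, topic name, key, and value are purely illustrative; the point is that the client keeps re-sending a record it never saw acknowledged, and the callback only fires once the record is acknowledged or the client gives up retrying.

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class RetryingProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        // How long the client keeps retrying a record it has not seen acknowledged
        // (set explicitly here; two minutes is the default).
        props.put(ProducerConfig.DELIVERY_TIMEOUT_MS_CONFIG, 120_000);

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            producer.send(new ProducerRecord<>("orders", "order-42", "Y"), (metadata, exception) -> {
                // If the acknowledgement was lost after a successful write, the broker may
                // already hold the record even though the client retried or reported an error.
                if (exception != null) {
                    System.err.println("Send failed after retries: " + exception.getMessage());
                }
            });
        }
    }
}
```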
A Simple Analogy
Think of Kafka like a courier service.
You send a package and wait for a confirmation.
If the confirmation doesn't arrive, you send the package again just to be safe.
From the courier's point of view, that's the correct behavior.
From the receiver's point of view, they may now have two identical packages.
Kafka behaves the same way.
Retries are not a bug. They are a feature.
The question is: can your system safely handle receiving the same message more than once?
Enter Idempotency
This is where idempotency comes in.
At a high level, an operation is idempotent if doing it multiple times produces the same final result as doing it once.
In practical terms:
- Processing the same event twice should not corrupt your data.
- Writing the same record again should not create duplicates.
- Retrying should be safe, not dangerous.
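A tiny, self-contained illustration in plain Java (no Kafka involved, and the names are made up): writing an absolute value is idempotent, while applying a delta is not.

```java
import java.util.HashMap;
import java.util.Map;

public class IdempotencyExample {
    public static void main(String[] args) {
        Map<String, Integer> balances = new HashMap<>();

        // Idempotent: writing an absolute value. Applying it twice ends in the same state.
        balances.put("account-1", 100);
        balances.put("account-1", 100);   // still 100

        // Not idempotent: applying a delta. A retry changes the final state.
        balances.merge("account-2", 100, Integer::sum);
        balances.merge("account-2", 100, Integer::sum);   // now 200

        System.out.println(balances); // account-1 is unchanged; account-2 has doubled
    }
}
```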
Kafka provides some idempotency guarantees at the producer level, which help prevent duplicate messages from being written to Kafka itself during retries. That's important, but it's only part of the story.
Because even with an idempotent producer:
- consumers can retry
- acknowledgements can fail
- databases can be written to more than once
Which means true idempotency is not a single setting.
It's a system-wide design choice that spans:
- producers
- Kafka
- consumers
- and the database itself
In this article, we'll walk through how idempotency actually works in real Kafka systems: what Kafka protects you from, what it doesn't, and how to design your pipeline so that retries don't turn into production incidents.
No framework-specific code.
No marketing promises.
Just practical, production-oriented thinking.
Producer-Side Idempotency: Preventing Duplicates at the Source
Let's start at the very beginning of the pipeline: the producer.
When a producer sends a message to Kafka, it expects an acknowledgement in return. If that acknowledgement doesn't arrive, perhaps due to a network glitch or a temporary broker issue, the producer assumes the message was not delivered and sends it again.
From the producer's perspective, this is the safest possible behavior.
But without protection, this retry can result in duplicate messages being written to Kafka, even though the original message may have already been stored successfully.
To handle this, Kafka provides producer-side idempotency.
What Kafka's Idempotent Producer Actually Does
When producer idempotency is enabled, Kafka ensures that retries from the same producer do not result in duplicate records being written to a partition.
Internally, Kafka does this by tracking:
- a unique identity for the producer session, and
- a sequence number for each message sent to a given partition
If the producer retries a message because it didn't receive an acknowledgement, Kafka can recognize that this is a retry of a previously sent message, not a new one, and it avoids writing it again.
The result is simple and powerful:
Even if the producer retries, Kafka will store the message only once.
This gives us a strong guarantee at the Kafka log level.
Why Acknowledgements Matter (acks=all)
Producer idempotency works correctly only when Kafka is allowed to fully confirm writes.
That's why it's typically paired with waiting for acknowledgements from all in-sync replicas.
Why does this matter?
Because a partial acknowledgement can lie.
If the producer receives an acknowledgement before the message is safely replicated, and a failure happens immediately after, Kafka might accept the retry, and now you're back to duplicates or lost data.
Waiting for full acknowledgements ensures that:
- Kafka has durably stored the message.
- retries are handled safely.
- producer idempotency can actually do its job.
In short:
- Fast acknowledgements optimize latency.
- Strong acknowledgements protect correctness.
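With the plain Java client, both of these are ordinary producer settings. A minimal sketch follows; the broker address and serializers are illustrative, and recent client versions (3.0+) already default to these values, but being explicit documents the intent.

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.common.serialization.StringSerializer;

public class IdempotentProducerFactory {
    public static KafkaProducer<String, String> create() {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());

        // The broker deduplicates retries from this producer session, per partition.
        props.put(ProducerConfig.ENABLE_IDEMPOTENCE_CONFIG, true);
        // Acknowledge only after all in-sync replicas have the record.
        props.put(ProducerConfig.ACKS_CONFIG, "all");

        return new KafkaProducer<>(props);
    }
}
```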
The Critical Limitation (This Is Where Many Teams Stop Too Early)
At this point, it's tempting to think:
"Great producer idempotency is enabled. We're safe."
Not quite.
Producer idempotency only guarantees that Kafka won't store duplicate records due to producer retries.
It does not guarantee:
- uniqueness across different producers
- uniqueness across restarts
- uniqueness at the consumer or database level
- business-level correctness
If multiple producers send logically identical events, or if a consumer processes the same message twice, Kafka will not stop that.
This is an important distinction:
Kafka-level idempotency protects delivery. It does not protect business state.
And that's why real-world systems need more than just producer idempotency.
Application-Level Idempotency: Making Duplicates Detectable
Once you accept that Kafka alone cannot guarantee uniqueness, the next question becomes:
How does the rest of the system recognize a duplicate when it sees one?
The answer is application-level idempotency.
At this layer, we stop relying on Kafka to "do the right thing" and instead give our system the ability to identify whether an event has already been processed, regardless of how many times it shows up.
The Core Idea: Stable Event Identity
Application-level idempotency starts with a simple but powerful concept:
Every logical event must have a stable, unique identity.
This identity is not generated by Kafka.
It's generated by the application and travels with the event, end to end.
Think of it like a receipt number.
If you see the same receipt number twice, you immediately know:
- this isn't a new action
- it's a retry or a duplicate
- processing it again would be incorrect
In Kafka systems, this typically means attaching an event ID to every message: something that uniquely represents what happened, not when it was sent.
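Here's one way that can look with the plain Java client. Everything here, the topic, the header name, and the way the ID is derived from an order ID and a state transition, is an illustrative assumption; the point is that the ID comes from business facts, so a retry of the same logical event carries the same identity.

```java
import java.nio.charset.StandardCharsets;
import java.util.UUID;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class EventIdExample {
    static void sendPaymentConfirmed(KafkaProducer<String, String> producer, String orderId) {
        // Derived deterministically from business facts, not timestamps or random values,
        // so a retry of the same logical event produces the same ID.
        String eventId = UUID.nameUUIDFromBytes(
                (orderId + ":PAYMENT_CONFIRMED").getBytes(StandardCharsets.UTF_8)).toString();

        ProducerRecord<String, String> record =
                new ProducerRecord<>("orders", orderId, "{\"status\":\"PAYMENT_CONFIRMED\"}");
        // The ID travels with the message, so every downstream component sees the same identity.
        record.headers().add("event-id", eventId.getBytes(StandardCharsets.UTF_8));

        producer.send(record);
    }
}
```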
Why This Matters Even with Idempotent Producers
Producer idempotency prevents Kafka from writing the same send attempt twice.
But it cannot answer questions like:
- Did another producer emit the same logical event?
- Did this consumer restart and reprocess the message?
- Did a downstream write succeed even though the ack failed?
Only the application can answer those questions and it can only do so if events are identifiable.
That's why application-level idempotency is about business correctness, not messaging mechanics.
What Happens Without Stable Event IDs
Without a stable identifier, the system has no memory.
When a duplicate message arrives, the consumer has no way to know:
- whether this event is new
- whether it was already applied
- whether processing it again would cause harm
So the system does the only thing it can do: process it again.
This is how duplicates silently turn into:
- double inserts
- incorrect counters
- repeated state transitions
- corrupted aggregates
And by the time you notice, the damage is already done.
With Application-Level Idempotency
When every event carries a stable ID, the system can make an informed decision.
At the consumer side, the flow becomes:
- Receive event.
- Check whether this event ID was already seen.
- If yes → skip or safely ignore.
- If no → process and record the ID.
Now retries stop being dangerous.
They become harmless repetitions.
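As a sketch of that flow: the "already seen" store here is a plain in-memory set, purely for illustration. It does not survive restarts; the durable options are covered later in this article.

```java
import java.nio.charset.StandardCharsets;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.common.header.Header;

public class DeduplicatingHandler {
    private final Set<String> processedEventIds = ConcurrentHashMap.newKeySet();

    void handle(ConsumerRecord<String, String> record) {
        Header header = record.headers().lastHeader("event-id");
        if (header == null) {
            return; // no identity attached, so nothing safe to deduplicate on
        }
        String eventId = new String(header.value(), StandardCharsets.UTF_8);

        // add() returns false if the ID was already present: a duplicate, so skip it.
        if (!processedEventIds.add(eventId)) {
            return;
        }
        applyBusinessLogic(record); // hypothetical side effect
    }

    private void applyBusinessLogic(ConsumerRecord<String, String> record) {
        // ... write to the database, update state, etc.
    }
}
```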
A Key Mindset Shift
This is the mental shift many teams miss:
Retries are inevitable. Duplicates are optional, if your system can recognize them.
Kafka will retry.
Networks will fail.
Consumers will restart.
Application-level idempotency is how you design a system that remains correct anyway.
A Reality Check: "Exactly Once" Is a Goal, Not a Guarantee
Before we talk about consumer-side idempotency, it's important to set expectations.
In distributed systems, guaranteeing exactly-once behavior across every component and every failure mode is effectively impossible.
This isn't a limitation of Kafka.
It's a property of distributed systems themselves.
When you have:
- independent processes
- network partitions
- retries
- crashes
- and multiple sources of truth
there will always be edge cases where the system cannot know, with absolute certainty, whether an operation has already happened.
So when we talk about "exactly-once" behavior in Kafka-based systems, what we really mean is:
Practically exactly-once under well-defined failure scenarios.
The goal is not perfection.
The goal is controlled correctness.
Why This Matters
Many teams approach idempotency expecting a magic switch: a configuration that eliminates duplicates forever.
That switch does not exist.
Instead, what Kafka and good system design give you is:
- deterministic behavior
- bounded failure modes
- safe retries
- and recoverable state
Idempotency is about minimizing harm, not eliminating retries.
Kafka's Philosophy Aligns with This Reality
Kafka intentionally chooses:
- at-least-once delivery
- explicit retries
- clear failure semantics
Because losing data is usually worse than processing it twice.
This means Kafka pushes the final responsibility for correctness up to the application.
That's not a weakness.
It's a design decision.
Consumer-Side Idempotency: The Final Line of Defense
With that reality in mind, we now arrive at the most critical part of the system: the consumer.
Even with:
- idempotent producers
- stable event IDs
- careful message design
Consumers will still:
- restart
- reprocess messages
- see the same event more than once
Which means the consumer must assume:
"Every message I receive could be a duplicate."
Consumer-side idempotency is where this assumption is enforced.
What the Consumer Must Do
At a high level, the consumer's job is simple:
- Receive an event.
- Check whether this event ID has already been processed.
- Decide whether to:
- apply the change
- skip it
- or safely update existing state
This check typically happens before any irreversible side effects, especially database writes.
If the consumer does not perform this check, all previous idempotency efforts can still collapse at the last step.
Why the Consumer Is So Important
The consumer is the only component that:
- sees the final event
- performs the side effect
- mutates durable state
That makes it the last opportunity to prevent duplicates from becoming permanent.
If duplicates reach the database unchecked, the system has already lost.
How Consumers Enforce Idempotency in Practice
At the consumer layer, idempotency stops being a theory and becomes a decision-making process.
The consumer receives a message and must answer one question before doing anything else:
Have I already processed this event?
Everything else flows from that.
The Two Common Deduplication Strategies
In practice, consumers enforce idempotency using one of two mechanisms, and sometimes both.
1. Database-Based Deduplication (Most Reliable)
In this approach, the database itself becomes the source of truth for idempotency.
The idea is simple:
- every event has a stable event ID
- the database enforces uniqueness for that ID
- duplicate writes are either ignored or treated as no-ops
This works well because:
- databases are durable
- uniqueness constraints are enforced atomically
- retries become safe by design
From the consumer's point of view:
- if the write succeeds → the event was new
- if the write fails due to duplication → the event was already processed
The key benefit here is correctness under crashes.
Even if:
- the consumer restarts
- the same message is processed again
- the acknowledgement failed previously
…the database prevents corruption.
That's why database-level idempotency is often the strongest safety net in Kafka systems.
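Here's a minimal sketch of that pattern, assuming PostgreSQL, plain JDBC, and a hypothetical deduplication table created as CREATE TABLE processed_events (event_id TEXT PRIMARY KEY). The table, the business write, and the ON CONFLICT clause are assumptions; the pattern, a unique constraint plus a single transaction, is the point.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

public class DatabaseDeduplication {
    /** Returns true if the event was new and applied, false if it was a duplicate. */
    static boolean processOnce(Connection conn, String eventId, String orderId) throws SQLException {
        conn.setAutoCommit(false);
        try {
            // The PRIMARY KEY on event_id enforces uniqueness atomically.
            try (PreparedStatement dedup = conn.prepareStatement(
                    "INSERT INTO processed_events (event_id) VALUES (?) ON CONFLICT (event_id) DO NOTHING")) {
                dedup.setString(1, eventId);
                if (dedup.executeUpdate() == 0) {
                    conn.rollback();   // already processed earlier: treat as a no-op
                    return false;
                }
            }
            // New event: apply the business write in the same transaction, so the
            // deduplication record and the side effect commit or fail together.
            try (PreparedStatement apply = conn.prepareStatement(
                    "UPDATE orders SET status = 'PAYMENT_CONFIRMED' WHERE id = ?")) {
                apply.setString(1, orderId);
                apply.executeUpdate();
            }
            conn.commit();
            return true;
        } catch (SQLException e) {
            conn.rollback();
            throw e;
        }
    }
}
```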
2. Cache-Based Deduplication (Fast but Weaker)
Some systems use an in-memory cache or distributed cache to track processed event IDs.
This approach is typically chosen for:
- very high throughput
- extremely low latency
- short-lived deduplication windows
The flow looks like this:
- consumer checks cache for event ID
- if present → skip
- if not → process and store ID in cache
This can work well, but it comes with trade-offs:
- cache entries expire
- cache can be evicted
- cache can be lost on failure
Which means:
Cache-based deduplication improves performance, but cannot be the only line of defense if correctness is critical.
Many production systems use cache as an optimization, with the database still acting as the final authority.
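A sketch of that short-lived window, using a local in-memory map with a TTL purely for illustration. In production this role is usually played by a shared cache (for example, Redis keys with an expiry) so multiple consumer instances share the same window, but the trade-offs above apply either way.

```java
import java.time.Duration;
import java.util.concurrent.ConcurrentHashMap;

public class RecentEventCache {
    private final ConcurrentHashMap<String, Long> seenAt = new ConcurrentHashMap<>();
    private final long ttlMillis;

    public RecentEventCache(Duration ttl) {
        this.ttlMillis = ttl.toMillis();
    }

    /** Returns true if the event ID has not been seen within the TTL window. */
    public boolean markIfNew(String eventId) {
        long now = System.currentTimeMillis();
        // Lazily drop expired entries; expiry and eviction are exactly why this
        // cannot be the only line of defense when correctness is critical.
        seenAt.values().removeIf(ts -> now - ts > ttlMillis);
        return seenAt.putIfAbsent(eventId, now) == null;
    }
}
```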
Choosing Between Them (or Combining Them)
There is no universal right answer.
The choice depends on:
- how harmful duplicates are
- how long duplicates can appear
- how much latency you can tolerate
- how much complexity you're willing to manage
A common pattern is:
- cache for fast, short-term duplicate filtering
- database for long-term correctness
This balances performance and safety.
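Sketched as one layered check, reusing the two hypothetical helpers from the earlier sketches: the cache gives a cheap early exit for recent duplicates, while the database insert stays the final authority.

```java
import java.sql.Connection;
import java.sql.SQLException;
import java.time.Duration;

public class LayeredDeduplication {
    private final RecentEventCache recentEventCache = new RecentEventCache(Duration.ofMinutes(10));

    /** Cache first for speed; the unique constraint in the database decides for real. */
    boolean processOnce(Connection conn, String eventId, String orderId) throws SQLException {
        if (!recentEventCache.markIfNew(eventId)) {
            return false; // seen very recently: skip without touching the database
        }
        return DatabaseDeduplication.processOnce(conn, eventId, orderId);
    }
}
```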
The Important Ordering Rule
One subtle but critical rule applies regardless of strategy:
Deduplication must happen before side effects.
Once the consumer performs an irreversible action, such as writing to a database or triggering an external call, it's already too late to ask whether the event was a duplicate.
This is why idempotency checks are placed at the very start of message processing.
Why This Still Doesn't Mean "Perfect Idempotency"
Even with all of this in place, edge cases still exist.
There are moments where:
- a write succeeds
- the consumer crashes
- the acknowledgement never happens
- the message is retried
At that point, the system relies entirely on idempotency to remain correct.
And this brings us back to the earlier reality check:
Idempotency doesn't eliminate retries. It makes retries safe.
That's the real objective.
Conclusion: Idempotency Is a System Property, Not a Feature
Kafka is very good at one thing: making sure data is not lost.
It is intentionally not responsible for ensuring that data is processed only once everywhere. That responsibility belongs to the system built on top of Kafka, not Kafka itself.
The approaches discussed in this article represent one practical way to achieve idempotency in real-world systems but they are not the only way.
Depending on the ecosystem and tooling you use, there may be other mechanisms available:
- language-level abstractions
- framework-provided annotations
- transactional helpers
- or platform-specific guarantees
These can simplify implementation, but they do not change the underlying requirement:
the system must still be designed to tolerate retries and detect duplicates.
Frameworks can help. They cannot replace sound system design.
And that's the key takeaway.
Idempotency is not:
- a Kafka configuration
- a producer setting
- a consumer option
- or a database trick
It is a system-wide design decision.
What We've Learned
Let's zoom out and connect the dots.
- Kafka retries are expected, not exceptional.
- Acknowledgements can fail even when writes succeed.
- Producer idempotency prevents duplicate writes to Kafka, not duplicate business effects.
- Application-level event identity makes duplicates detectable.
- Consumer-side idempotency is the final line of defense.
- Databases and caches enforce correctness when retries happen.
- "Exactly-once" is a practical goal, not a mathematical guarantee.
Or put simply:
Kafka guarantees delivery. Your system must guarantee correctness.
Why This Matters in Production
In small systems, duplicates might look harmless.
In large systems, with high throughput, retries, restarts, and partial failures, duplicates silently accumulate and eventually surface as:
- incorrect data
- broken invariants
- painful backfills
- and long debugging sessions
Idempotency is cheaper than recovery.
Systems that are designed to tolerate retries age far better than systems that assume they won't happen.
The Right Mental Model Going Forward
When designing Kafka-based systems, ask these questions early:
- What uniquely identifies a business event?
- What happens if this message is processed twice?
- Where is duplication detected?
- Where is correctness enforced?
- What happens when acknowledgements lie?
If you can answer those clearly, your system is already ahead of most.
Kafka will retry. Failures will happen. Duplicates will appear.
Idempotency is how you make all of that safe.
🔗 Connect with Me
📖 Blog by Naresh B. A.
👨‍💻 Building AI & ML Systems | Backend-Focused Full Stack
🌐 Portfolio: Naresh B A
📫 Let's connect on LinkedIn | GitHub: Naresh B A
Thanks for spending your precious time reading this; it's a personal little corner of my thoughts, and I really appreciate you being here. ❤️

