How to analyze duplicate processing in an async flow

#architecture #discuss #distributedsystems #systemdesign

In one line: deduplication is about the evidence that a side effect has been applied — make it atomic with the effect, visible to everyone, and tied to a unique identifier.

This is just how I think it through: not a tutorial, not the final answer. I'm sharing my reasoning, and I'd love to hear where it breaks.

What comes to mind when you're asked about a duplicate issue?

Maybe we can use the Inbox pattern to solve it.
Maybe we can use Redis SETNX to build a distributed lock.
Maybe ...

Wait a minute, please. Let's step back from the topic and not get stuck on specific tech solutions.

So, my questions:

What does duplicate processing mean, at its core?
What conditions cause it to happen?
Which boundaries together guarantee dedup?

By the way, I tried to enumerate all the failure scenarios that typical implementations aim to prevent, but I gave up: there are too many possibilities to list them all. So I'll start from the essence of duplicate processing instead.

What is duplicate processing, exactly?

So, what is duplicate processing at its core?

It's not about how many times a message is delivered. It's about how many times the side effect is applied.

Delivered N times but the side effect applied once → correct.
Delivered N times, the side effect applied multiple times but idempotent (harmless) → correct, but wasteful.
Delivered N times, the side effect applied more than once and it's not idempotent → that's the real problem.

So at its root: duplicate processing means the same logical intent has its side effect applied more than once.

And one more thing: even when duplication happens, it only causes damage if the side effect is not idempotent. An idempotent side effect makes a duplicate harmless — but still wasteful, and real business logic is often hard to make idempotent. So idempotency isn't our goal here; the discussion below does not assume it.
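To make the three cases concrete, here is a minimal Java sketch. The Account class and its credit and markPaid methods are hypothetical names I made up for illustration; they don't come from any framework.

```java
// Hypothetical example: one logical intent ("credit 100 for order-1") delivered N times.
class Account {
    long balance = 0;
    String status = "PENDING";

    // Not idempotent: every application changes the result.
    void credit(long amount) {
        balance += amount;
    }

    // Idempotent: applying it twice leaves the same state as applying it once.
    void markPaid() {
        status = "PAID";
    }
}

class DuplicateDeliveryDemo {
    // Applies both side effects once per delivery and returns the resulting account.
    static Account deliver(int deliveries) {
        Account account = new Account();
        for (int i = 0; i < deliveries; i++) {
            account.credit(100); // applied N times: the real problem
            account.markPaid();  // applied N times: harmless, but wasted work
        }
        return account;
    }

    public static void main(String[] args) {
        Account account = deliver(2);
        System.out.println(account.balance); // 200, although the intent was to credit 100 once
        System.out.println(account.status);  // PAID
    }
}
```

Two deliveries are fine for markPaid and wrong for credit, and the delivery count is the same in both cases. What differs is how many times a non-idempotent side effect was applied.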

What conditions cause it to happen?

Now, instead of jumping to solutions, let's think the other way around: under what conditions does the side effect get applied more than once?

Here is what I think — it happens if any of these is true:

Identifier: The same logical intent has no unique identifier, so two arrivals are treated as two different things.
Atomicity: The side effect runs, but the evidence that it was applied is not written atomically with it — so after a crash or a failed commit, the system can't tell it was applied, and runs it again.
Visibility: That evidence lives only in one consumer's local memory — so after a rebalance, a restart, or across worker threads, whoever picks the message up next can't see it, and runs it again.
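The Visibility condition is the easiest one to replay in a few lines. This is only a sketch under my own assumptions: LocalMemoryConsumer is a made-up class, and the one-element long array stands in for shared business state such as a database row.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical consumer that keeps its evidence in local memory only.
class LocalMemoryConsumer {
    private final Set<String> seen = new HashSet<>(); // evidence, visible to this instance only
    private final long[] sharedBalance;               // stand-in for shared business state

    LocalMemoryConsumer(long[] sharedBalance) {
        this.sharedBalance = sharedBalance;
    }

    void consume(String intentId, long amount) {
        if (seen.contains(intentId)) {
            return; // this instance knows it already applied the side effect
        }
        sharedBalance[0] += amount; // the side effect
        seen.add(intentId);         // the evidence: lost on restart, invisible to other consumers
    }
}

class VisibilityDemo {
    static long run() {
        long[] balance = {0};
        LocalMemoryConsumer a = new LocalMemoryConsumer(balance);
        LocalMemoryConsumer b = new LocalMemoryConsumer(balance);

        a.consume("order-1", 100); // applied
        a.consume("order-1", 100); // skipped: a sees its own evidence
        b.consume("order-1", 100); // redelivered to b after a rebalance: applied again
        return balance[0];
    }

    public static void main(String[] args) {
        System.out.println(run()); // 200
    }
}
```

The identifier is there and consumer a dedups correctly on its own. The duplicate appears only because b cannot see a's evidence.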

Notice something: each condition above is just a way the guarantee breaks. So if we flip them around, we get the boundaries that guarantee non-duplication.

Which boundaries together guarantee dedup?

Flipping the failure conditions, here are the boundaries:

A unique identifier per logical intent, so that every delivery of that intent maps to the same business side effect.
The side effect and its evidence record (the proof that the side effect for this identifier has been applied) must be written atomically: both succeed or both fail, never just one. (If the side effect crosses into an external system, one local transaction cannot cover both writes, so we have to settle for eventual consistency between them.)
The evidence record must be visible to every processor — not living in any single consumer's memory — so it stays visible across rebalances, restarts, and worker threads, and remains valid for as long as a duplicate could still occur. (Whether it lives in Redis or a relational table is just an implementation choice — the essence is that the evidence exists, with the necessary information.)
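Here is how I picture the three boundaries working together, as a small in-memory sketch. SharedLedger and creditOnce are hypothetical names, and the synchronized method is only a stand-in for a real transaction in whatever store is chosen later.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical in-memory stand-in for a store shared by all processors.
// The synchronized method plays the role of a transaction: the balance change
// and the evidence record become visible together, or not at all.
class SharedLedger {
    private long balance = 0;
    private final Set<String> evidence = new HashSet<>();

    // Returns true if the side effect was applied, false if evidence already existed.
    synchronized boolean creditOnce(String intentId, long amount, boolean failMidway) {
        if (evidence.contains(intentId)) {
            return false; // the identifier was already recorded: reject the duplicate
        }
        long newBalance = balance + amount; // staged, not yet visible
        if (failMidway) {
            // Simulated failure before "commit": neither write becomes visible.
            throw new IllegalStateException("simulated failure before commit");
        }
        balance = newBalance;   // commit the side effect
        evidence.add(intentId); // and its evidence, inside the same boundary
        return true;
    }

    synchronized long balance() {
        return balance;
    }

    synchronized boolean hasEvidence(String intentId) {
        return evidence.contains(intentId);
    }
}

class AtomicEvidenceDemo {
    public static void main(String[] args) {
        SharedLedger ledger = new SharedLedger();
        try {
            ledger.creditOnce("order-1", 100, true);
        } catch (IllegalStateException e) {
            // The attempt failed as a whole, so a retry is safe.
        }
        System.out.println(ledger.balance() + " " + ledger.hasEvidence("order-1")); // 0 false
        System.out.println(ledger.creditOnce("order-1", 100, false)); // true
        System.out.println(ledger.creditOnce("order-1", 100, false)); // false
        System.out.println(ledger.balance()); // 100
    }
}
```

The failed attempt leaves neither a balance change nor evidence behind, so the retry applies the side effect exactly once and every later duplicate is rejected.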

After that, we can decide what the orchestrator and collaborators look like.

Collaborators:

Evidence Checker — checks the evidence record to see whether the side effect for this identifier has already been applied. The record is visible to every processor, independent of any single consumer.
Side-effect Handler — applies the side effect and writes the evidence record atomically.
Offset Committer — confirms to the MQ that the message has been consumed. (No need to tie this to a specific MQ here — that's an implementation trade-off.)
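One possible shape for these collaborators in Java. Every method name and signature here is my own assumption, chosen to keep the sketch small; none of it is a real MQ or library API, and the lambda wiring in main is just an in-memory stand-in that does not try to be atomic.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.function.Consumer;

// Hypothetical collaborator shapes (assumed signatures, not a library API).
interface Message {
    String identifierKey(); // the unique identifier of the logical intent
}

interface CommitHandle {
    void ack(); // whatever the MQ needs in order to confirm consumption
}

interface EvidenceChecker {
    boolean check(String identifierKey); // true if the side effect was already applied
}

interface SideEffectHandler {
    // Runs the business logic and writes the evidence record in one atomic boundary.
    void handle(Message message, Consumer<Message> businessLogic);
}

interface OffsetCommitter {
    void commit(CommitHandle handle);
}

class CollaboratorsDemo {
    public static void main(String[] args) {
        Set<String> evidence = new HashSet<>(); // stand-in for a shared evidence store
        EvidenceChecker checker = evidence::contains;
        SideEffectHandler handler = (msg, logic) -> {
            logic.accept(msg);                 // business logic
            evidence.add(msg.identifierKey()); // evidence record
        };
        OffsetCommitter committer = CommitHandle::ack;

        Message message = () -> "order-1";
        System.out.println(checker.check(message.identifierKey())); // false
        handler.handle(message, m -> { });
        System.out.println(checker.check(message.identifierKey())); // true
        committer.commit(() -> System.out.println("acked"));        // prints "acked"
    }
}
```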

Orchestrator:

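public class ConsumerHandler {

    private final EvidenceChecker checker;
    private final SideEffectHandler handler;
    private final OffsetCommitter committer;

    public ConsumerHandler(EvidenceChecker checker, SideEffectHandler handler, OffsetCommitter committer) {
        this.checker = checker;
        this.handler = handler;
        this.committer = committer;
    }

    public void consume(Message message, CommitHandle handle) {
        log.info(...);

        // Has the side effect for this identifier already been applied?
        boolean alreadyApplied = checker.check(message.identifierKey());

        // Already applied: skip the work, just commit and return.
        if (alreadyApplied) {
            committer.commit(handle);
            return;
        }

        // Assumed signature: the handler receives the message plus the business logic to run.
        handler.handle(message, msg -> {
            // Within the same atomic boundary:
            // 1. apply the side effect (business logic)
            // 2. write the evidence record for this identifier
        });

        // Confirm to the MQ only after the side effect and its evidence are written.
        committer.commit(handle);
    }
}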

A few things to note:

We haven't discussed any concrete tech (RDBMS, Redis) yet.

The early alreadyApplied check is a performance optimization, not a correctness guarantee. Even with an idempotent side effect, reprocessing a duplicate still wastes resources (CPU, DB calls, external requests), so the check lets us skip that work and return fast. But it does NOT prevent duplication by itself: check-then-act leaves a race window in which two processors can both see "not applied" before either one writes. The real guarantee comes from enforcing uniqueness on the identifier at the moment the evidence record is written atomically with the side effect, so the second writer fails instead of applying it again.
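The race window is easier to see when the bad interleaving is replayed by hand. In this sketch EvidenceStore and tryRecord are hypothetical names, and ConcurrentMap.putIfAbsent stands in for whatever uniqueness guarantee the real store offers. The counter increment stands in for the side effect, which in a real system would be committed together with the evidence.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical evidence store: putIfAbsent stands in for a uniqueness guarantee
// on the identifier.
class EvidenceStore {
    private final ConcurrentMap<String, Boolean> applied = new ConcurrentHashMap<>();

    boolean check(String id) {
        return applied.containsKey(id);
    }

    // Returns true only for the first writer of this identifier.
    boolean tryRecord(String id) {
        return applied.putIfAbsent(id, Boolean.TRUE) == null;
    }
}

class CheckThenActRaceDemo {
    // Replays the bad interleaving deterministically: both workers check before either writes.
    static int run() {
        EvidenceStore store = new EvidenceStore();
        int timesApplied = 0;

        boolean workerASawEvidence = store.check("order-1"); // false
        boolean workerBSawEvidence = store.check("order-1"); // false: the race window

        // Both workers passed the early check, so only the write can pick the winner.
        if (!workerASawEvidence && store.tryRecord("order-1")) {
            timesApplied++; // worker A wins and applies the side effect
        }
        if (!workerBSawEvidence && store.tryRecord("order-1")) {
            timesApplied++; // not reached: worker B's write is rejected as a duplicate
        }
        return timesApplied;
    }

    public static void main(String[] args) {
        System.out.println(run()); // 1
    }
}
```

If tryRecord were a plain unconditional write, both workers would have applied the side effect even though both ran the early check.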

No matter what MQ we use (Kafka, RabbitMQ, or something else), the consumer always needs the message and a way to commit/confirm it. Otherwise the MQ can't know the message was consumed and may deliver it again. That's why consume(Message message, CommitHandle handle) is written like this.
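The commit step can also fail on its own, after the side effect is already done. A small replay of that case, with made-up names (RedeliveringConsumer, the commitFails flag) and an in-memory set standing in for shared evidence:

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical replay: the side effect succeeds, the confirmation to the MQ fails,
// and the MQ delivers the same message again.
class RedeliveringConsumer {
    private final Set<String> evidence = new HashSet<>(); // stand-in for shared evidence
    int applied = 0;
    int committed = 0;

    void consume(String intentId, boolean commitFails) {
        if (!evidence.contains(intentId)) {
            applied++;              // side effect
            evidence.add(intentId); // evidence, assumed atomic with the side effect
        }
        if (commitFails) {
            throw new IllegalStateException("simulated commit failure");
        }
        committed++; // the confirmation reached the MQ
    }
}

class RedeliveryDemo {
    public static void main(String[] args) {
        RedeliveringConsumer consumer = new RedeliveringConsumer();
        try {
            consumer.consume("order-1", true); // applied, but never confirmed
        } catch (IllegalStateException e) {
            // No confirmation arrived, so the MQ delivers the message again.
        }
        consumer.consume("order-1", false);    // redelivery: skipped by the evidence, then confirmed
        System.out.println(consumer.applied + " " + consumer.committed); // 1 1
    }
}
```

The message is delivered twice, but the side effect is applied once, because the evidence outlives the failed commit.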

So, what's left?

So far everything is still technology-agnostic — we went from the essence, to the failure conditions, to the boundaries, and finally to an abstract collaboration model. No Redis, no RDBMS yet.

The abstract model is clean. Reality usually isn't. The handler.handle(...) above still treats business logic as a black box — and that box might not be simple. When the side effect is more than one step, what does its evidence record look like then?

So I'll leave it here as a question:

What problems do you think are still hiding? What would you have to design or reason about next? If you have comments that would help refine this post, please share them. Thanks in advance.

The point of this post: find the boundaries first, and every later solution has a place to fit.