The Idempotent Consumer: Dedup Stores That Actually Scale

#kafka #architecture #eventdriven #backend

Book: Event-Driven Architecture Pocket Guide: Saga, CQRS, Outbox, and the Traps Nobody Warns You About
Also by me: Thinking in Go (2-book series) — Complete Guide to Go Programming + Hexagonal Architecture in Go
My project: Hermes IDE | GitHub — an IDE for developers who ship with Claude Code and other AI coding tools
Me: xgabriel.com | GitHub

Someone on your team turned on Kafka's enable.idempotence=true, saw "exactly-once" in the config docs, and closed the ticket. Six weeks later a customer got charged twice and the post-mortem said "we thought the broker handled that."

The broker did not handle that. Exactly-once delivery, as a property you can lean on end-to-end, does not exist. What you actually have is at-least-once delivery and a dedup store you build yourself. The interesting engineering question is not whether to dedup. It is where you keep the record of what you've already seen, and how that store behaves when your throughput climbs.

Why exactly-once is a lie

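Kafka's exactly-once semantics (EOS) are real, but read the scope carefully. On its own, enable.idempotence=true only stops producer retries from writing duplicates into a partition. Full EOS adds transactions and covers a closed loop: consume from a topic, produce to a topic, commit the offset, all inside one transaction. The moment your handler does anything outside that loop, the guarantee evaporates.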

Your handler calls Stripe. It sends an email. It writes a row to a Postgres database that isn't part of the Kafka transaction. Each of those is a side effect the broker cannot roll back. If your process crashes after charging Stripe but before committing the offset, the next poll redelivers the message, and you charge again.
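The crash window described above can be simulated in a few lines. This is a toy sketch, not Kafka client code: `broker`, `run`, and the `crashOnce` flag are names made up for illustration, and the `dedup` map stands in for storage that survives the crash.

```go
package main

// broker is a toy stand-in for an at-least-once broker: it keeps handing
// out the same message until the consumer commits its offset.
type broker struct {
	msgs      []string
	committed int // index of the next message to deliver
}

func (b *broker) poll() (string, bool) {
	if b.committed >= len(b.msgs) {
		return "", false
	}
	return b.msgs[b.committed], true
}

func (b *broker) commit() { b.committed++ }

// run consumes every message and returns how many charges were made.
// crashOnce simulates a process that dies after the side effect but before
// the commit, exactly one time. dedup, when non-nil, is the set of IDs
// already charged and is assumed to survive the crash.
func run(b *broker, dedup map[string]bool, crashOnce bool) (charges int) {
	for {
		id, ok := b.poll()
		if !ok {
			return charges
		}
		if dedup != nil && dedup[id] {
			b.commit() // duplicate: acknowledge and skip
			continue
		}
		charges++ // the external side effect, e.g. a card charge
		if dedup != nil {
			dedup[id] = true
		}
		if crashOnce {
			crashOnce = false
			continue // "crash": offset never committed, message comes back
		}
		b.commit()
	}
}
```

Without a dedup set, one message plus one badly timed crash yields two charges. With it, the redelivery is acknowledged and skipped.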

So the honest model is: the broker delivers at least once, sometimes more. Your job is to make a second delivery a no-op. That requires remembering which messages you've processed, which means a dedup store. Three designs show up in production, and they scale very differently.

Store 1: the message-id dedup table

The default, and the one most teams should start with. Every message carries a unique ID. You record processed IDs in durable storage. Before handling a message, you try to insert its ID. If the insert changes nothing because the row already exists, you've seen it, so you skip.

The atomic primitive matters. In Postgres it's an insert with a unique constraint:

CREATE TABLE processed_events (
    event_id   TEXT PRIMARY KEY,
    handled_at TIMESTAMPTZ NOT NULL DEFAULT now()
);

func (s *PgDedup) Claim(
    ctx context.Context, id string,
) (bool, error) {
    tag, err := s.pool.Exec(ctx,
        `INSERT INTO processed_events (event_id)
         VALUES ($1)
         ON CONFLICT (event_id) DO NOTHING`,
        id,
    )
    if err != nil {
        return false, err
    }
    // RowsAffected == 0 means the row already existed.
    return tag.RowsAffected() == 1, nil
}

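The trick is doing the claim and the side effect in the same transaction when you can. Insert the ID and write the business row together. If the transaction commits, both stick. If it rolls back, neither does, and redelivery retries cleanly. As written, Claim runs on the pool and commits on its own, so to get this guarantee you run the same INSERT on the transaction that writes the business row. It also only covers side effects that live in the same database: a Stripe call cannot join the transaction.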
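A minimal sketch of claim-plus-write in one transaction, using the standard library's database/sql rather than the pool type from the snippet above. The `orders` table, its columns, and every function name here are assumptions for illustration. The `fakeTx` at the bottom exists only so the logic can be exercised without a database.

```go
package main

import (
	"context"
	"database/sql"
	"strings"
)

// execer is the subset of *sql.Tx this sketch needs; *sql.Tx satisfies it.
type execer interface {
	ExecContext(ctx context.Context, query string, args ...any) (sql.Result, error)
}

// claimAndApply runs the dedup insert and the business write on the same
// transaction handle. It reports false when the event was already claimed.
func claimAndApply(ctx context.Context, tx execer, eventID, orderID string) (bool, error) {
	res, err := tx.ExecContext(ctx, `INSERT INTO processed_events (event_id)
		VALUES ($1) ON CONFLICT (event_id) DO NOTHING`, eventID)
	if err != nil {
		return false, err
	}
	n, err := res.RowsAffected()
	if err != nil {
		return false, err
	}
	if n == 0 {
		return false, nil // duplicate: nothing else to write
	}
	// Hypothetical business table and column names.
	_, err = tx.ExecContext(ctx, `UPDATE orders SET status = 'paid' WHERE id = $1`, orderID)
	return err == nil, err
}

// handleInTx wraps claimAndApply in one transaction: both writes commit
// together, or the rollback removes the claim so redelivery can retry.
func handleInTx(ctx context.Context, db *sql.DB, eventID, orderID string) (bool, error) {
	tx, err := db.BeginTx(ctx, nil)
	if err != nil {
		return false, err
	}
	defer tx.Rollback() // has no effect once Commit has succeeded
	applied, err := claimAndApply(ctx, tx, eventID, orderID)
	if err != nil {
		return false, err
	}
	return applied, tx.Commit()
}

// affected and fakeTx are an in-memory fake, not part of the technique.
type affected int64

func (a affected) LastInsertId() (int64, error) { return 0, nil }
func (a affected) RowsAffected() (int64, error) { return int64(a), nil }

type fakeTx struct {
	claimed map[string]bool
	updates int
}

func (f *fakeTx) ExecContext(_ context.Context, q string, args ...any) (sql.Result, error) {
	if strings.HasPrefix(q, "INSERT") {
		id := args[0].(string)
		if f.claimed[id] {
			return affected(0), nil
		}
		f.claimed[id] = true
		return affected(1), nil
	}
	f.updates++
	return affected(1), nil
}

// deliverTwice feeds the same event through claimAndApply twice.
func deliverTwice() (first, second bool, updates int) {
	f := &fakeTx{claimed: map[string]bool{}}
	ctx := context.Background()
	first, _ = claimAndApply(ctx, f, "evt-1", "order-1")
	second, _ = claimAndApply(ctx, f, "evt-1", "order-1")
	return first, second, f.updates
}
```

The second delivery stops at the claim, so the business write runs once.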

What it catches: genuine duplicates from at-least-once delivery, surviving process restarts, broker rebalances, and failover. The record is on disk.

Where it stops scaling: the table grows without bound. Every event ever processed becomes a row. At a few thousand events per second, you're inserting a few thousand rows per second into a table you also have to keep indexed and prune. You add a TTL-style cleanup job (DELETE FROM processed_events WHERE handled_at < now() - interval '7 days'), and now you've got a high-churn table with constant inserts and bulk deletes fighting autovacuum. It works to a point. Past that point the dedup table becomes the bottleneck of the whole consumer.

Store 2: the bloom filter

When the dedup table's write volume hurts, the bloom filter is the obvious next reach. A bloom filter answers one question with a known error profile: have I probably seen this ID? It says "definitely no" or "maybe yes," and it does so in a fixed amount of memory regardless of how many IDs you've added. The snippet below uses bits-and-blooms/bloom.

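import "github.com/bits-and-blooms/bloom/v3"

// Sized for 10M items at a 0.1% false-positive rate:
// about 17 MiB of memory, fixed.
var filter = bloom.NewWithEstimates(10_000_000, 0.001)

// seen reports whether id was probably seen before, and
// records it when it is new. The filter does no locking,
// so guard it with a mutex if several goroutines call this.
func seen(id string) bool {
    if !filter.TestString(id) {
        filter.AddString(id)
        return false // definitely new
    }
    return true // probably seen, treat as duplicate
}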

That memory profile is the appeal. Ten million IDs in 17 megabytes, with lookups in microseconds and no network hop. For a fan-out consumer doing 50k events per second, that's the difference between keeping up and falling behind.
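The memory figure can be checked against the standard bloom filter sizing formulas, m = -n·ln(p)/(ln 2)² bits and k = (m/n)·ln 2 hash functions. The helper names below are made up for this sketch, and rounding both values up is an assumption about how a library would size the filter.

```go
package main

import "math"

// bloomParams returns the bit count m and hash count k from the standard
// sizing formulas, both rounded up.
func bloomParams(n uint, p float64) (m, k uint) {
	bits := math.Ceil(-float64(n) * math.Log(p) / (math.Ln2 * math.Ln2))
	hashes := math.Ceil(math.Ln2 * bits / float64(n))
	return uint(bits), uint(hashes)
}

// mebibytes converts a bit count to MiB.
func mebibytes(bits uint) float64 {
	return float64(bits) / 8 / (1 << 20)
}
```

For 10 million items at a 0.1% false-positive rate this works out to roughly 144 million bits, a little over 17 MiB, with 10 hash functions.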

The catch is the word "maybe." A bloom filter has false positives by design. When it says "seen," it's right most of the time, but sometimes it's wrong, and a false positive means you drop a message you've never actually processed. For analytics ingestion or log fan-out, dropping one event in a thousand is fine. For a payment, it is a customer who paid and never got their order.

The second catch is that a plain bloom filter never forgets and cannot delete. Add IDs forever and the false-positive rate climbs as the filter saturates. Real deployments use a rotating or counting variant, or rebuild the filter on a schedule. And none of it survives a process restart unless you back it with something durable, at which point you've reinvented Store 1 with weaker guarantees.
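One way to picture a rotating variant is two generations of filter: lookups consult both, writes go to the current one, and the older generation is discarded after a fixed number of adds. The sketch below is an assumed design, not the bits-and-blooms API. `tinyBloom` is a deliberately small homemade filter so the example needs only the standard library, and it is not safe for concurrent use.

```go
package main

import "hash/fnv"

// tinyBloom is a minimal bloom filter for illustration only.
type tinyBloom struct {
	bits []uint64
	m    uint64 // number of bits
	k    uint64 // number of probes per ID
}

func newTinyBloom(m, k uint64) *tinyBloom {
	return &tinyBloom{bits: make([]uint64, (m+63)/64), m: m, k: k}
}

// probe derives the i-th bit position from one 64-bit FNV-1a hash using
// double hashing (h1 + i*h2).
func (b *tinyBloom) probe(id string, i uint64) uint64 {
	h := fnv.New64a()
	h.Write([]byte(id))
	sum := h.Sum64()
	h1, h2 := sum&0xffffffff, sum>>32|1 // force h2 odd so probes differ
	return (h1 + i*h2) % b.m
}

func (b *tinyBloom) add(id string) {
	for i := uint64(0); i < b.k; i++ {
		p := b.probe(id, i)
		b.bits[p/64] |= 1 << (p % 64)
	}
}

func (b *tinyBloom) test(id string) bool {
	for i := uint64(0); i < b.k; i++ {
		p := b.probe(id, i)
		if b.bits[p/64]&(1<<(p%64)) == 0 {
			return false
		}
	}
	return true
}

// rotatingBloom keeps two generations. Once the current generation has
// taken `limit` adds, it becomes the previous one and the old previous
// generation is dropped, which bounds saturation and lets old IDs age out.
type rotatingBloom struct {
	cur, prev *tinyBloom
	m, k      uint64
	adds      int
	limit     int
}

func newRotatingBloom(m, k uint64, limit int) *rotatingBloom {
	return &rotatingBloom{
		cur: newTinyBloom(m, k), prev: newTinyBloom(m, k),
		m: m, k: k, limit: limit,
	}
}

// seen reports "maybe seen" and records the ID when it is new.
func (r *rotatingBloom) seen(id string) bool {
	if r.cur.test(id) || r.prev.test(id) {
		return true
	}
	if r.adds >= r.limit {
		r.prev, r.cur, r.adds = r.cur, newTinyBloom(r.m, r.k), 0
	}
	r.cur.add(id)
	r.adds++
	return false
}
```

The price of forgetting is that an ID older than two generations looks new again, so the rotation interval has the same relationship to the redelivery window that a TTL does.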

So the bloom filter is a front-line cheap check, not an authoritative one. Treat its "maybe seen" as a hint that triggers a real lookup, never as the final word for anything a customer can see.

Store 3: the TTL cache

The middle ground, and where most high-throughput systems land. A TTL cache is a dedup table that forgets on purpose. You record processed IDs with a time-to-live tuned to the broker's worst-case redelivery window. Redis with SET NX is the common shape:

func (d *RedisDedup) Claim(
    ctx context.Context, id string,
) (bool, error) {
    key := "dedup:orders:" + id
    // SET key val EX <ttl> NX, one atomic round trip.
    ok, err := d.rdb.SetNX(
        ctx, key, "1", d.ttl,
    ).Result()
    if err != nil {
        // Surface the error instead of guessing: the
        // caller returns it and the broker redelivers.
        return false, err
    }
    return ok, nil // ok == true means first time
}

The TTL is the whole design decision, and it's where teams get burned. The window has to cover the broker's worst-case redelivery gap, not what feels reasonable. Kafka with seven-day retention and the option to seek for a reprocess means a one-hour TTL is wrong: a replay from yesterday sails straight past an expired key and reprocesses. SQS makes an undeleted message visible again once its visibility timeout expires, so a window of a few minutes past that timeout is enough. Set the TTL from the broker's settings, not your intuition.
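One way to keep the TTL tied to broker settings is to derive it in code and refuse to start when it falls short. This is a small sketch under assumptions: the constant values are illustrative, and `checkTTL` and `ttlFor` are hypothetical helpers, not part of any client library.

```go
package main

import (
	"fmt"
	"time"
)

// Illustrative values only; read the real ones from broker configuration.
const (
	kafkaRetention = 7 * 24 * time.Hour // topic retention, the replay horizon
	replayMargin   = time.Hour          // assumed slack on top of retention
	oneHourTTL     = time.Hour          // the "feels reasonable" setting
)

// ttlFor derives the dedup TTL from the redelivery window plus a margin.
func ttlFor(redeliveryWindow, margin time.Duration) time.Duration {
	return redeliveryWindow + margin
}

// checkTTL rejects a dedup TTL shorter than the window in which the broker
// can still hand the message back.
func checkTTL(ttl, redeliveryWindow time.Duration) error {
	if ttl < redeliveryWindow {
		return fmt.Errorf("dedup TTL %s is shorter than redelivery window %s",
			ttl, redeliveryWindow)
	}
	return nil
}
```

Calling checkTTL at startup turns a silent replay bug into a configuration error someone sees on deploy.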

The other trap is that the cache is a cache. If Redis fails over to a replica that hasn't caught up, the most recent dedup keys are gone; if the cluster reboots without persistence, the whole keyspace empties. A retry that lands in that gap looks brand new. The earlier you assume this will happen, the better your design.

SET NX has to be one round trip. If anyone writes GET then SET, the race window between them is wide enough that two concurrent deliveries both see "not seen" and both process. Reject that PR. The atomic conditional write is the entire point.
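The property at stake can be shown with an in-process stand-in for the cache. In this sketch `memDedup` is a hypothetical type, not a Redis client. Its `claim` holds one lock across the check and the write, which is the same guarantee the single SET NX command provides.

```go
package main

import (
	"sync"
	"time"
)

// memDedup is an in-memory stand-in for a TTL cache.
type memDedup struct {
	mu      sync.Mutex
	expires map[string]time.Time
	ttl     time.Duration
}

func newMemDedup(ttl time.Duration) *memDedup {
	return &memDedup{expires: make(map[string]time.Time), ttl: ttl}
}

// claim checks and writes under one lock, so only the first caller inside
// the TTL window gets true.
func (d *memDedup) claim(id string, now time.Time) bool {
	d.mu.Lock()
	defer d.mu.Unlock()
	if exp, ok := d.expires[id]; ok && now.Before(exp) {
		return false
	}
	d.expires[id] = now.Add(d.ttl)
	return true
}

// countWinners fires n concurrent claims for one ID and counts how many
// were told "first time".
func countWinners(n int) int {
	d := newMemDedup(time.Minute)
	now := time.Now()
	var wg sync.WaitGroup
	var mu sync.Mutex
	winners := 0
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			if d.claim("evt-42", now) {
				mu.Lock()
				winners++
				mu.Unlock()
			}
		}()
	}
	wg.Wait()
	return winners
}

// reclaimedAfterExpiry shows the other half of the contract: once the TTL
// has lapsed, the same ID is treated as new again.
func reclaimedAfterExpiry() bool {
	d := newMemDedup(time.Minute)
	start := time.Now()
	d.claim("evt-42", start)
	return d.claim("evt-42", start.Add(2*time.Minute))
}
```

Split the check and the write into two separately locked steps and more than one goroutine can win, which is the GET-then-SET bug in miniature.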

The trade-off table

Store	Memory / storage	Survives restart	False drops	Best for
Message-id table	Grows unbounded, needs pruning	Yes, durable	None	Correctness-critical, moderate volume
Bloom filter	Fixed, tiny	No, unless backed	Yes, by design	Analytics, logs, hot-path hints
TTL cache	Bounded by TTL window	Consumer restart yes; cache failover no	None within window	High-throughput with a known redelivery window

Read it by the column that hurts you. If a dropped message costs real money, the bloom filter's "false drops: yes" rules it out as your authority. If your table's write volume is the bottleneck, the unbounded growth row is your problem and the TTL cache is the fix. If you can't tolerate losing the keyspace on a failover, neither cache is enough on its own and you need the durable table underneath.

What scaling systems actually run

Large systems rarely pick one. They layer cheap-to-expensive and let the design backstop the infrastructure.

func (c *Consumer) Handle(
    ctx context.Context, evt OrderEvent,
) error {
    // Layer 1: bloom filter, in memory, microseconds.
    // Its verdict is only a hint. This sketch records the
    // ID and always falls through to the cache claim, so
    // a false positive can never drop an event.
    if !c.bloom.MaybeSeen(evt.ID) {
        c.bloom.Add(evt.ID)
    }

    // Layer 2: TTL cache, authoritative within its window.
    fresh, err := c.cache.Claim(ctx, evt.ID)
    if err != nil {
        return err // store down: let the broker redeliver
    }
    if !fresh {
        c.metrics.Inc("dedup.cache.hit")
        return nil
    }

    // Layer 3: the handler itself is a state transition.
    // Even if both checks above lied, "WHERE status =
    // 'pending'" applies the change exactly once.
    return c.process(ctx, evt)
}

Each layer earns its place. The bloom filter is the cheap front-line hint: it costs microseconds and no network hop, and the handler above still sends every event to the cache claim so a false positive cannot drop anything. The TTL cache catches what the bloom filter forgets after a restart. The state-transition guard in the handler catches what the cache loses during a failover. A duplicate has to slip past all three before it reaches a customer, and the layers are ordered so the cheap check runs first and the durable guarantee runs last.

That last layer is the one to internalize. The cheapest dedup store is the one you don't need because the operation is naturally idempotent. "Mark order paid" with a WHERE status = 'pending' clause is correct no matter how many times it runs, with no store at all. When the handler can't be modeled that way, you pay the cost in one of the three stores above. The job is choosing which one on purpose, sized to the failure mode you're actually defending against, rather than trusting a broker config that promised more than it can deliver.
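The state-transition guard can be sketched without a database. `orderStore` below is a hypothetical in-memory stand-in, and the SQL in the comment assumes an `orders` table with `id` and `status` columns.

```go
package main

import "sync"

// orderStore is an in-memory stand-in for the orders table.
type orderStore struct {
	mu     sync.Mutex
	status map[string]string
}

// markPaid mirrors the guarded update
//
//	UPDATE orders SET status = 'paid'
//	WHERE id = $1 AND status = 'pending'
//
// and reports whether this call performed the transition.
func (s *orderStore) markPaid(id string) bool {
	s.mu.Lock()
	defer s.mu.Unlock()
	if s.status[id] != "pending" {
		return false
	}
	s.status[id] = "paid"
	return true
}

// applyTimes delivers the same "paid" event n times and counts how many
// deliveries actually changed state.
func applyTimes(n int) (transitions int, final string) {
	s := &orderStore{status: map[string]string{"order-1": "pending"}}
	for i := 0; i < n; i++ {
		if s.markPaid("order-1") {
			transitions++
		}
	}
	return transitions, s.status["order-1"]
}
```

However many times the event arrives, only the first delivery finds the order in 'pending', so any follow-up work keyed on the returned bool runs once.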

What's the dedup store behind your busiest consumer, and what happens to it when the cache fails over? Drop the war story in the comments.

If this was useful

Dedup is the part you can see. The part that bites later is everything around it: the outbox that makes the producer side idempotent in the first place, the saga that compensates the right step when one consumer skips and another doesn't, the replay strategy that doesn't reprocess a Stripe charge from last Tuesday. The Event-Driven Architecture Pocket Guide is built around the traps that show up after the first duplicate-charge post-mortem, which is usually right about when you start caring where your dedup store lives.