Event Ordering and Partition Keys: The Guarantee You Think You Have

#kafka #architecture #eventdriven #backend

Book: Event-Driven Architecture Pocket Guide: Saga, CQRS, Outbox, and the Traps Nobody Warns You About
Me: xgabriel.com | GitHub

You ship three events for one order: OrderCreated, OrderPaid, OrderCancelled. The consumer reads them and decides the order's final state. In staging, every order ends up correct. In production, a few orders a day end up Paid when they should be Cancelled. The events are all there. The payloads are right. They just arrived in the wrong order.

The bug isn't in your consumer. It's in the partition key you picked for the producer. Kafka gave you exactly the ordering guarantee it promised. It just wasn't the one you assumed.

The guarantee Kafka actually makes

Kafka orders messages within a single partition. That's the whole promise. Messages in partition 3 come out in the order they went into partition 3. There is no ordering across partitions. None.

A topic with 12 partitions is 12 independent ordered logs. Consumers read each partition in order, but two partitions advance independently. If OrderPaid lands in partition 4 and OrderCancelled lands in partition 7, the two consumer threads reading those partitions race. Whichever thread is ahead wins, and "ahead" depends on consumer lag, rebalances, and which broker answered first.

So the question that decides your ordering isn't "does Kafka preserve order." It's "which partition does each event land in." And that is decided entirely by the partition key.

The default partitioner hashes the key and takes it modulo the partition count:

partition = hash(key) % partition_count

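Same key, same partition, always (as long as the partition count doesn't change). Different keys, possibly different partitions. No key at all, and the producer spreads records across all partitions: round-robin on older clients, sticky per-batch assignment since Kafka 2.4. That last case is the one that quietly reorders everything.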
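To make the formula concrete, here is a minimal Go sketch of a keyed partitioner. It is an illustration only: partitionFor is a hypothetical helper, and FNV-1a from the standard library stands in for the murmur2 hash that Kafka's Java client uses, so the partition numbers it prints will not match a real cluster.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// partitionFor mimics the shape of a keyed partitioner: hash the key,
// then reduce the hash modulo the partition count.
// ASSUMPTION: FNV-1a is a stand-in; Kafka's Java client uses murmur2.
func partitionFor(key string, partitionCount int) int {
	h := fnv.New32a()
	h.Write([]byte(key))
	return int(h.Sum32() % uint32(partitionCount))
}

func main() {
	// The same key always yields the same partition; a different key may not.
	for _, key := range []string{"A1", "A1", "B2"} {
		fmt.Printf("key=%s -> partition %d\n", key, partitionFor(key, 12))
	}
}
```

The only property that matters for ordering is determinism: for a fixed partition count, a key's partition never changes.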

How a bad key reorders your events

Here's a producer that looks fine in code review and is broken in production:

// BROKEN: no key, round-robin partitioning
ProducerRecord<String, byte[]> record =
    new ProducerRecord<>("orders", serialize(event));
producer.send(record);

No key means the producer, not your data, chooses the partition. The three events for order A1 can land in three different partitions, where three consumer threads pick them up. There is no force on earth that keeps them in order.

Now a subtler break. The key exists, but it's the wrong one:

// BROKEN: keyed by event type, not by order
String key = event.type();   // "OrderCreated", "OrderPaid", ...
ProducerRecord<String, byte[]> record =
    new ProducerRecord<>("orders", key, serialize(event));
producer.send(record);

This passes a code review because it has a key. But now all OrderPaid events for every order share one partition, and all OrderCancelled events share a different one. Events for the same order are scattered by type. The thing you wanted ordered, per-order history, is the exact thing this spreads apart.

Same trap with a customer-region key, a shard id, a random UUID per message. Each one keeps some events together and splits the ones that matter.
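A small simulation can show how the choice of key decides which events travel together. This is a sketch under assumptions: the event struct, the byType and byOrder key functions, and the FNV-based partitionFor are all hypothetical stand-ins, not Kafka's own partitioner.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

type event struct {
	OrderID string
	Type    string
}

// partitionFor is a stand-in keyed partitioner (FNV-1a, not Kafka's murmur2).
func partitionFor(key string, n int) int {
	h := fnv.New32a()
	h.Write([]byte(key))
	return int(h.Sum32() % uint32(n))
}

// partitionsUsed returns the set of partitions that one order's events
// land in when keys are derived with keyOf.
func partitionsUsed(events []event, orderID string, keyOf func(event) string, n int) map[int]bool {
	used := map[int]bool{}
	for _, e := range events {
		if e.OrderID == orderID {
			used[partitionFor(keyOf(e), n)] = true
		}
	}
	return used
}

func byType(e event) string  { return e.Type }    // the broken key
func byOrder(e event) string { return e.OrderID } // the aggregate key

func main() {
	events := []event{
		{"A1", "OrderCreated"}, {"A1", "OrderPaid"}, {"A1", "OrderCancelled"},
		{"B2", "OrderCreated"}, {"B2", "OrderPaid"},
	}
	fmt.Println("A1 keyed by type: ", len(partitionsUsed(events, "A1", byType, 12)), "partition(s)")
	fmt.Println("A1 keyed by order:", len(partitionsUsed(events, "A1", byOrder, 12)), "partition(s)")
}
```

Keyed by order id, A1's events always occupy exactly one partition. Keyed by type, the count depends on how the three type names happen to hash, and A1's OrderPaid is guaranteed to share a partition with B2's OrderPaid, which is grouping nobody asked for.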

Pick the key by aggregate id

The fix is to key by the thing whose history must stay ordered. In event-driven systems that's almost always the aggregate id: the order id, the account id, the cart id, whatever entity the events describe.

// CORRECT: keyed by the aggregate the events belong to
String key = event.orderId();   // all order-A1 events together
ProducerRecord<String, byte[]> record =
    new ProducerRecord<>("orders", key, serialize(event));
producer.send(record);

All events for A1 now hash to the same partition. They go into that log in produce order, and the consumer reads them out in that order. OrderCreated before OrderPaid before OrderCancelled, every time, for that order.

Events for order B2 might land in a different partition. That's fine. You never needed A1 and B2 ordered relative to each other. You only needed each order's own timeline intact. Per-partition ordering gives you exactly that, as long as one aggregate maps to one partition.

In Go with franz-go the shape is the same:

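rec := &kgo.Record{
    Topic: "orders",
    Key:   []byte(event.OrderID), // all events for one order share a partition
    Value: payload,
}
// Produce is asynchronous; the promise reports the per-record result.
client.Produce(ctx, rec, func(_ *kgo.Record, err error) {
    if err != nil {
        log.Printf("produce failed: %v", err)
    }
})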

The rule: the partition key is the unit of ordering. Choose the aggregate whose events must be totally ordered, and key by its id. Nothing finer, nothing coarser.

The producer setting that quietly reorders a single partition

Picking the right key isn't enough. There's one producer config that reorders messages inside one partition on retry.

With retries enabled and more than one in-flight request per connection, a failed-then-retried batch can land behind a batch that was sent later. Same key, same partition, still out of order:

# RISK: in-flight > 1 with retries can reorder on retry
max.in.flight.requests.per.connection=5
retries=2147483647
enable.idempotence=false

The fix is the idempotent producer. It tags each batch with a sequence number so the broker rejects out-of-order writes and preserves order even across retries:

# SAFE: idempotent producer preserves per-partition order
enable.idempotence=true
# idempotence requires max.in.flight.requests.per.connection <= 5 (the default)
# and within that limit Kafka still guarantees order on retry
acks=all

Since Kafka 3.0 the Java client defaults to enable.idempotence=true, but plenty of older configs and hand-tuned producers turn it off chasing throughput. If you keyed correctly and still see reordering on the same key, this is the first setting to check.
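The retry hazard and the sequence-number fix can be modelled in a few lines of Go. This is a deliberately simplified toy, not the broker protocol: partitionLog, write, and deliver are hypothetical names, and the real mechanism also tracks a producer id and epoch and handles several in-flight batches.

```go
package main

import (
	"errors"
	"fmt"
)

var errOutOfOrder = errors.New("out-of-order sequence")

// partitionLog is a toy model of one partition's leader.
type partitionLog struct {
	checkSeq bool     // true models enable.idempotence=true
	nextSeq  int      // next sequence number the log will accept
	records  []string // payloads in the order they were appended
}

// write models one produce request arriving at the broker.
func (l *partitionLog) write(seq int, payload string) error {
	if l.checkSeq {
		if seq < l.nextSeq {
			return nil // duplicate of an already-written batch: ack, don't append
		}
		if seq > l.nextSeq {
			return errOutOfOrder // an earlier batch is missing: reject
		}
		l.nextSeq++
	}
	l.records = append(l.records, payload)
	return nil
}

// deliver replays one scenario: the first attempt of batch 0 is lost,
// batch 1 arrives, then the producer resends everything unacknowledged.
func deliver(l *partitionLog) []string {
	payloads := []string{"OrderPaid", "OrderCancelled"}
	pending := []int{0} // batch 0 never reached the broker
	if err := l.write(1, payloads[1]); err != nil {
		pending = append(pending, 1) // rejected, so it is resent after batch 0
	}
	for _, seq := range pending {
		_ = l.write(seq, payloads[seq])
	}
	return l.records
}

func main() {
	fmt.Println(deliver(&partitionLog{}))               // [OrderCancelled OrderPaid]
	fmt.Println(deliver(&partitionLog{checkSeq: true})) // [OrderPaid OrderCancelled]
}
```

Without the sequence check, the later batch is accepted first and the retry lands behind it. With the check, the broker refuses the later batch until the earlier one has been written, so the log keeps produce order.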

Repartitioning is the other way order breaks

You shipped the right key. Order is stable for months. Then traffic grows, somebody adds partitions to the topic, and ordering breaks for a window nobody expected.

Remember the partitioner: hash(key) % partition_count. Change partition_count and the same key now maps to a different partition. Order A1's old events sit in partition 4 (12-partition math). Its new events go to partition 9 (16-partition math). For the orders in flight during the change, history splits across two partitions and the consumer can interleave them.
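A rough way to see the scale of the remap is to count how many keys change partition when the count goes from 12 to 16. The sketch below assumes a hypothetical FNV-based partitionFor in place of Kafka's murmur2 and made-up order ids, so treat the printed number as indicative rather than exact.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// partitionFor is a stand-in keyed partitioner (FNV-1a, not Kafka's murmur2).
func partitionFor(key string, n int) int {
	h := fnv.New32a()
	h.Write([]byte(key))
	return int(h.Sum32() % uint32(n))
}

// countMoved reports how many keys map to a different partition
// once the partition count changes from oldCount to newCount.
func countMoved(keys []string, oldCount, newCount int) int {
	moved := 0
	for _, k := range keys {
		if partitionFor(k, oldCount) != partitionFor(k, newCount) {
			moved++
		}
	}
	return moved
}

func main() {
	keys := make([]string, 0, 1000)
	for i := 0; i < 1000; i++ {
		keys = append(keys, fmt.Sprintf("order-%d", i)) // hypothetical order ids
	}
	fmt.Printf("%d of %d keys change partition going from 12 to 16\n",
		countMoved(keys, 12, 16), len(keys))
}
```

For any hash value h, h % 12 equals h % 16 only when h % 48 is below 12, so with a well-spread hash roughly three quarters of keys move. Adding partitions is not a small perturbation of the mapping.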

Two ways to avoid the surprise:

Over-provision partitions up front. You can't reduce a topic's partition count and you can't safely increase it without this hazard, so size for years-out throughput on day one. Partitions are cheap; reordering incidents are not.
If you must add partitions, drain first. Stop producing, let consumers catch up to the end of every partition, add partitions, then resume. No in-flight aggregate spans the boundary, so nothing reorders.
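The "consumers have caught up" step in the drain-first approach reduces to a lag check. Here is a minimal sketch, assuming you have already fetched each partition's log-end offset and the group's committed offset (for example from the LAG column of kafka-consumer-groups.sh --describe); the drained function and the sample offsets are hypothetical.

```go
package main

import "fmt"

// drained reports whether every partition's committed offset has reached
// its log-end offset, meaning the consumer group has zero lag everywhere.
// A partition missing from committed counts as offset 0.
func drained(endOffsets, committed map[int32]int64) bool {
	for partition, end := range endOffsets {
		if committed[partition] < end {
			return false
		}
	}
	return true
}

func main() {
	end := map[int32]int64{0: 120, 1: 98}
	fmt.Println(drained(end, map[int32]int64{0: 120, 1: 97})) // false: partition 1 is one record behind
	fmt.Println(drained(end, map[int32]int64{0: 120, 1: 98})) // true: safe to add partitions
}
```

The check only means something while producers are stopped; otherwise the end offsets keep moving and a true result is stale the moment it is returned.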

The same warning applies to changing the partitioner itself: switching from the default hash to a custom one, or moving producers to a client library whose default hash differs from the Java client's murmur2. Any change to how keys map to partitions is a reordering event for keys that are mid-flight.

What to check before you trust the ordering

Run this list against any topic where order matters:

Is there a key on every produce call? No key means round-robin means no per-aggregate order. Grep for ProducerRecord constructors and kgo.Record literals with no Key.
Is the key the aggregate id? Not the event type, not the region, not a per-message UUID. The id of the entity whose timeline must stay intact.
Is enable.idempotence=true? Otherwise a retry can reorder a single partition under load.
Is the consumer single-threaded per partition? If you hand a partition's records to a thread pool, you've reordered them after Kafka did the work of keeping them straight. Process a partition's records in sequence, or use a key-aware executor that pins one key to one worker.
Does anyone have the authority to add partitions? If yes, write down the drain-first runbook before they need it at 2 a.m.

Most "Kafka reordered my events" incidents are one of these five. The broker did its job. The ordering you wanted was always per-partition, and something upstream broke the mapping between your aggregate and its partition: the key, the in-flight config, the consumer threading, or a partition count change.

Global ordering across a whole topic is a different and much harder problem, usually solved with a single-partition topic and all the throughput limits that implies. Most systems don't need it. They need per-aggregate order, and per-aggregate order is a partition-key decision you make once and protect forever.

Pick the key by the aggregate. Keep the idempotent producer on. Decide who's allowed to repartition. Do those three and the ordering you assumed and the ordering you get finally match.

If this was useful

Partition keys are one of the small decisions that decide whether an event-driven system behaves the way the whiteboard said it would. The Event-Driven Architecture Pocket Guide: Saga, CQRS, Outbox, and the Traps Nobody Warns You About walks through the ordering traps alongside the ones that bite later: outboxes that preserve produce order, sagas that depend on per-aggregate sequencing, and the repartitioning runbook in full. If you've ever stared at events that arrived out of order from a broker you trusted, the book is the playbook for the next one.