Tsofnat M

Posted on Dec 10, 2025 • Edited on Dec 15, 2025

How to Guarantee True Ordering in Complex Kafka Replays: Solving the Determinism Nightmare

#kafka #infrastructure #systemdesign #eventdriven

בס"ד
In modern event-driven architectures, data replay is a cornerstone mechanism for incident analysis, data correction, or disaster recovery. But behind the seemingly simple task of "fetch events and send to Kafka," lies a complex world of ordering and consistency challenges.

The challenge isn't the replay itself, but guaranteeing true chronological order in an asynchronous environment. A single mistake in sequencing can lead to corrupted data, race conditions, and a failed recovery.

This post dives deep into the significant determinism issues we faced and the architectural solutions we implemented to ensure a reliable replay.

The System Architecture

To provide context, our replay system consists of the following components:

React Frontend: For range selection and monitoring.
API Service: Generates replay commands.
Worker Service: Queries backups and publishes to Kafka.
Target Data Processor: An asynchronous service that processes events using a thread-per-action model.
WebSocket: For real-time progress tracking.

Challenge #1: The Async Order Problem

The first step in the Replay process seems simple: fetching data from backup based on a date range and sending it to Kafka. In practice, the order in which events are sent is critical.

⚠️ Race Conditions due to Asynchronous Processing

Our target service (Data Processor) operates asynchronously, opening a separate thread for each operation (CREATE/UPDATE/DELETE) and lacking a built-in internal ordering mechanism.

If relevant events are not sent in the correct sequence (e.g., an UPDATE before a CREATE for the same entity), this can lead to Race Conditions and improper data processing.

We also considered an alternative proposal: processing all CREATE operations first, then all UPDATEs, and finally the DELETEs. Even with this approach, improper event processing can still occur. For example, consider a unique NAME column. What happens when an entity is created, deleted, and then created again with the exact same name?

The Solution: Synchronous Replay Pattern

The only way to guarantee full determinism at the publishing level is to enforce Synchronous Behavior:

Send a message to Kafka.
Wait for an acknowledgment from the target service that processing is complete.
Only then move to the next message.

Trade-off Note: This guarantees order but imposes a high cost on Latency and significantly reduces Throughput.

Challenge #2: The Failure of Timestamps (Global Ordering)

Once we adopted a synchronous model, we needed a definitive order to query our backup database. Relying on Kafka's default Timestamp proved insufficient for several reasons. The immediate idea is to sort by the Timestamp provided by Kafka, but this is insufficient.

Two main problems undermine the reliability of using timestamps as the sole sorting tool:

Timestamp Ties

When traffic is high, multiple events can receive the exact same timestamp (typically measured in milliseconds/microseconds). The sorting creates a "tie," and the system is forced to break it using an arbitrary rule (such as database query order), which can result in an incorrect and non-deterministic sequence.

Clock Skew

Every server in the network has its own clock, and even with the use of NTP, small deviations exist. The practical implication is that an event that logically occurred later (in real-time) might receive an earlier timestamp because the clock of the machine that generated it is slower.

The Solution: Global Offset via Postgres Sequence

Since Kafka lacks a built-in continuous global index across all topics, we created one.

Every message published through our API requests a new Sequence Number from PostgreSQL.
This atomic integer is embedded in the Kafka Header as a global_offset.

This provides a stable, source-of-truth order independent of machine clocks.

Challenge #3: Consistency and Order in Optimization Sub-Replays (Overlapping Replay Ranges)

Another ordering problem can occur when a user selects a date range for a replay, and within that range, previous sub-replays have already been executed. To maintain a complete history, we would want to include and replay these sub-ranges as well.

The problem arises when we arrange all relevant ranges on a timeline and discover overlapping ranges. While we want to optimize and avoid processing the same ranges more than once, we must preserve the true order of events. This means the overlapping data must be processed according to the latest source that modified it within the overlapping window.

Last-Wins Resolution

To prevent duplication (Optimization) while ensuring correct processing order, we use the 'Last-Wins' logic as a Pre-Processing step:

The algorithm resolves overlapping ranges using 'Last-Wins' logic instead of processing redundant data.
This ensures that every timestamp is processed only once.
Processing is carried out only by the latest source that created the change in the overlapping time range.

Algorithm Steps (High Level)

All ranges are sorted by their creation time (Crucial for Last-Wins logic).
Iterate through this sorted list.
For each new range, search for overlaps with existing (preceding) ranges.
Trim and merge existing ranges, and insert the new range into a new list with higher precedence.
Finally, the result is sorted by precedence for the final execution order.

public async Task<List<SplitRange>> SplitOverlappingRanges(Guid rootRunId, CancellationToken ct)
{
    var allRuns = await GetOverlappingRunsRecursive(rootRunId, ct);

    // 1. Sort by Creation Time (Crucial for "Last-Wins" logic)
    allRuns.Sort((a, b) => a.CreatedAt.CompareTo(b.CreatedAt));

    var splitRanges = new TreeDictionary<DateTimeOffset, SplitRange>();
    int currentRangeIndex = 0;

    foreach (var run in allRuns)
    {
        var start = run.FromDate;
        var end = run.ToDate;

        // 2. Handle Predecessor (Trim the tail of the previous range)
        if (splitRanges.TryGetValue(start, out var pred) && pred.Value.ToDate > start)
        {
            if (pred.Value.ToDate > end)
            {
                AddSplitSegment(splitRanges, pred.Value, end.AddTicks(1), pred.Value.ToDate);
            }
            pred.Value.ToDate = start.AddTicks(-1); // Trim existing
        }

        // 3. Handle Successors (Remove or trim future overlapping ranges)
        while (splitRanges.TryGetSuccessor(start, out var next) && next.Value.FromDate < end)
        {
            if (next.Value.ToDate > end)
            {
                AddSplitSegment(splitRanges, next.Value, end.AddTicks(1), next.Value.ToDate);
            }
            splitRanges.Remove(next.Key); // Remove completely overlapped part
        }

        // 4. Insert the current "Winning" Range
        splitRanges[start] = new SplitRange
        {
            RunId = run.RunId,
            FromDate = start,
            ToDate = end,
            RangeIndex = currentRangeIndex++
        };
    }

    return splitRanges.Values.OrderBy(x => x.RangeIndex).ThenBy(x => x.FromDate).ToList();
}

Conclusion

Building a truly deterministic data replay system requires moving beyond the basic assumptions of Kafka and timestamps. By enforcing Synchronous Processing, implementing a Global Offset Sequence, and using a Last-Wins Pre-Processing Algorithm, we were able to guarantee absolute chronological order.

Thank you for reading! What are your strategies for dealing with data replay order in large-scale event systems? If you have any questions about the implementation of the Range Splitter or managing Global Offsets in Postgres, I'd be happy to discuss them in the comments.

DEV Community