Vitalii Buhaiov for MarketTrace

Why a single timestamp breaks real-time aggregation

During volatile moves, the aggregator could show a "consensus" order book that never existed on any exchange at any single instant. The bug: one timestamp field hiding three different "nows", one per venue.

I learned this aggregating live order books. The pattern generalizes to any multi-source pipeline.

The setup

I run a service that joins live order books from Binance, Bybit, and OKX into one view. Each producer is a small daemon (one per exchange/asset pair) that holds a websocket open, applies snapshot+diff updates, and publishes the top 200 levels to Redis. A 10 Hz aggregator reads the three Redis keys, bins prices into per-asset buckets ($1 for BTC, $0.10 for ETH, etc.), and writes one unified snapshot.
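For orientation, the loop looks roughly like this. A minimal sketch, not the production service: the Redis key scheme, the symbol, and the client setup are placeholders.

import json
import time

import redis

r = redis.Redis()
EXCHANGES = ("binance", "bybit", "okx")

while True:
    snaps = {}
    for ex in EXCHANGES:
        raw = r.get(f"orderbook:{ex}:BTCUSDT")  # hypothetical key scheme
        if raw is not None:
            snaps[ex] = json.loads(raw)
    # ...bin prices into per-asset buckets, build the unified snapshot,
    # and write it back to Redis for consumers...
    time.sleep(0.1)  # 10 Hz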

There's an obvious question every consumer asks:

what time is this unified snapshot from?

If you're paying attention, this question has two wrong answers before it has a right one.

Wrong answer #1: now()

The aggregator runs every 100 ms. So the snapshot is from… now? Sort of. It's from the moment the aggregator built it. The underlying data is older. Each producer has its own publish cadence, the websocket has its own jitter, and the exchange itself stamped the event some milliseconds before that.

now() is fine as an audit-log field ("the aggregator emitted this snapshot at t=…"). It's wrong if a consumer wants to know how old the data is.

Wrong answer #2: max(producer.ts)

Better: each producer stamps ts = int(time.time() * 1000) on its publish. The aggregator picks the freshest producer and calls that the snapshot's time.
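In aggregator terms that's roughly one line (a sketch; producer_snaps stands in for whatever dict of live payloads the aggregator just read):

snapshot_ts = max(s["ts"] for s in producer_snaps.values())  # looks synchronized; isn't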

This is what I shipped first. It's wrong for a subtler reason: producer ts is wall time on the producer machine, not the exchange's event time. Two producers can be sitting on data that the exchange stamped 200 ms apart yet publish to Redis 5 ms apart on their wall clocks. The snapshot looks synchronized because producer clocks are close, even though the underlying exchange events are not. Even perfectly synchronized producer clocks can't reconstruct exchange event ordering after the fact.

In quiet markets this is invisible. In high-volatility moments (a Fed print, a liquidation cascade) Binance, Bybit, and OKX can stamp their depth events tens to hundreds of milliseconds apart. Producer ts hides this completely. The consensus snapshot reads as a single instant when in fact it's stitched from three.

The two-field rule

The fix that finally stuck: every payload carries two timestamps.

payload = {
    "ts":        int(time.time() * 1000),  # producer wall time (publish moment)
    "event_ts":  top.get("event_ts"),      # exchange-stamped event time
    ...
}
exchange emits event  →  producer receives  →  producer publishes  →  aggregator joins
       event_ts                                       ts

ts is the producer's wall clock at publish. It controls Redis TTL ("is this producer alive?") and is the right field for staleness gates. If a producer dies, ts stops moving, which is the signal to exclude it.

event_ts is whatever the exchange called the time of the underlying event. It's the right field for cross-source alignment. Two producers with event_ts 200 ms apart are showing different instants of the market, even if their ts is identical.

These do different jobs. Conflate them by making one field do both, and each job leaks into the other: gate staleness on event_ts and you get false alarms when a venue throttles its push rate; align on ts and you get fake consensus during a vol spike.

Two fields. Two jobs. Don't combine them.

Treasure hunt: what each venue actually sends

The two-field rule pushes complexity into the producers. Each one has to know what its exchange calls "event time". The answer is different at every venue.

Binance futures depth stream: each event carries E (exchange-emitted event timestamp) and T (transaction time). I prefer E, fall back to T:

ev_ts = ev.get("E") or ev.get("T")

Both are ms epoch ints.

Bybit orderbook.50.<SYMBOL> on v5 linear: a top-level ts field on the wrapping frame, int ms epoch. The inner data carries u (sequence id) but no separate event timestamp.

OKX books channel on v5: the timestamp lives inside each data[] entry, and it arrives as a string-encoded ms epoch the producer has to parse.

Three venues, three shapes:

Venue     Field            Location                  Wire type
Binance   E (fallback T)   per-event object          int ms
Bybit     ts               top-level wrapper         int ms
OKX       ts               per-entry inside data[]   string ms (parse)

After extraction, every producer normalizes to int ms epoch stamped on a field literally named event_ts in Redis. Downstream code never branches on exchange to interpret time.
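In code, that boundary is one small function per producer. A sketch under the shapes in the table above; real messages carry more fields, and the exchange labels here are just illustrative:

def extract_event_ts(exchange: str, msg: dict) -> int | None:
    # Normalize each venue's event time to an int ms epoch, or None if absent.
    if exchange == "binance":
        raw = msg.get("E") or msg.get("T")               # already int ms
    elif exchange == "bybit":
        raw = msg.get("ts")                              # int ms on the wrapper frame
    elif exchange == "okx":
        entries = msg.get("data") or []
        raw = entries[0].get("ts") if entries else None  # string ms, needs parsing
    else:
        raw = None
    return int(raw) if raw is not None else None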

Normalize exchange-specific timestamp semantics at the producer boundary, not in downstream consumers.

The aggregator: two jobs, two fields

With the invariant in place, the aggregator's logic separates cleanly. Stale gate uses ts. Alignment uses event_ts. They never cross paths.

The stale gate:

snap_ts = snap.get("ts", 0)  # producer wall time at publish
age_ms = now_ms - snap_ts if snap_ts else None
if age_ms is None or age_ms > STALE_THRESHOLD_S * 1000:
    sources_status.append({
        "exchange": ex, "status": "stale", "age_ms": age_ms,
    })
    continue  # drop this venue from the union

Threshold is 60 s. If a producer's wall clock hasn't moved in a minute, it's dead. Exclude it from the union. This is the right measure of producer health and nothing else.

For the live sources, extract event_ts and the lag it implies:

ev_ts_raw = snap.get("event_ts")
event_ts = int(ev_ts_raw) if ev_ts_raw is not None else None
event_age_ms = (
    now_ms - event_ts if event_ts is not None else None
)

event_age_ms is the honest measure of how far behind this venue's underlying data is. A producer can be perfectly healthy (recent ts) yet be showing data that's 300 ms behind because the exchange itself is slow under load. That's a different failure mode, and the front end needs to surface it differently. Not "Bybit is down" but "Bybit is lagging."
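A minimal way to surface that distinction, assuming a hypothetical LAG_THRESHOLD_MS alongside the stale gate above:

LAG_THRESHOLD_MS = 300  # hypothetical cut-off for "lagging, not down"

status = "ok"
if event_age_ms is not None and event_age_ms > LAG_THRESHOLD_MS:
    status = "lagging"  # producer healthy, venue data behind

sources_status.append({
    "exchange": ex, "status": status,
    "age_ms": age_ms, "event_age_ms": event_age_ms,
})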

cross_exchange_skew_ms as an honesty metric

Once each live venue has an event_ts, the skew across them falls out:

ok_event_ts = [
    ts for ex, ts in event_ts_per_exch.items()
    if ex in ok_exchanges and ts is not None
]
cross_exchange_skew_ms = None
if len(ok_event_ts) >= 2:
    cross_exchange_skew_ms = max(ok_event_ts) - min(ok_event_ts)

A single integer. Spread between earliest and latest exchange-stamped event in the "consensus" snapshot.

Binance event_ts = 12:00:00.100
Bybit   event_ts = 12:00:00.420
OKX     event_ts = 12:00:00.160

cross_exchange_skew_ms = 320

That "consensus" snapshot spans almost a third of a second. Skew is a different signal from transport latency. It captures how far apart the venues' own clocks place their events, regardless of how fast your producers ran. If the exchanges themselves stamped events 320 ms apart, the snapshot is 320 ms wide.

Practical thresholds:

<100 ms     = normal
100–300 ms  = degraded
>300 ms     = unsafe for microstructure signals

This metric does one important thing: it surfaces dishonesty in the consensus view. A naive aggregator presents a unified snapshot as if it's a single instant. It isn't. By emitting cross_exchange_skew_ms on every payload, every consumer picks its own policy. A 1-second chart can ignore 200 ms of skew. A spoof-detection feature has to discard the snapshot and wait. A "live consensus" UI can display the skew as a number. "Consensus over a 47 ms window" is honest; hiding it isn't.
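One way a consumer could encode that policy, sketched against the bands above (snapshot here is just the aggregated payload, using the field names from this post):

def skew_band(skew_ms: int | None) -> str:
    # Map cross_exchange_skew_ms onto the bands above.
    if skew_ms is None:
        return "unknown"  # fewer than two live venues
    if skew_ms < 100:
        return "normal"
    if skew_ms <= 300:
        return "degraded"
    return "unsafe"

# A microstructure consumer might refuse wide snapshots,
# while a 1-second chart ignores the same skew entirely.
usable_for_microstructure = (
    skew_band(snapshot.get("cross_exchange_skew_ms")) == "normal"
)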

The principle generalizes: when a system can't be honest about precision, it should at least be honest about its imprecision.

When a venue's clock drifts

The two-field rule also protects against a failure mode that took me embarrassingly long to notice: an exchange's clock can be wrong.

Most of the time venue clocks are NTP-disciplined and accurate to a few ms. But under load, after a maintenance window, or around an NTP step, a venue's event_ts can lurch forward (or backward) by hundreds of ms relative to wall time. The producer faithfully forwards the bad timestamp because that's its job.

With one timestamp, you can't tell whether the producer is slow or the venue is, so you either drop the venue or accept a torn consensus snapshot. With two timestamps the failure is visible: event_age_ms goes negative (the venue claims data from the future), or it spikes asymmetrically vs other venues. The skew metric lights up, and you can downgrade that venue specifically, not the whole pipeline.
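A small guard makes that concrete. This sketch would sit next to the event_age_ms extraction above; the tolerance is an assumption, not a measured number:

CLOCK_DRIFT_TOLERANCE_MS = 50  # hypothetical allowance for ordinary clock noise

if event_age_ms is not None and event_age_ms < -CLOCK_DRIFT_TOLERANCE_MS:
    # The venue's event_ts is ahead of our wall clock: its clock has stepped
    # forward. Downgrade this venue only; keep the rest of the pipeline.
    sources_status.append({
        "exchange": ex, "status": "clock_drift", "event_age_ms": event_age_ms,
    })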

Beyond crypto: any multi-source pipeline

The pattern isn't crypto-specific. Anywhere you aggregate real-time data from N independent sources, the same problem shows up:

  • Distributed logs: pod wall-clock time vs the event's actual occurrence (or the trace span's start).
  • Sensor fusion: each sensor's local clock vs the moment your gateway received it.
  • IoT telemetry: device clock (often horribly skewed) vs gateway ingestion time.
  • Cross-region replication: source DB commit time vs replica apply time.

Same temptations. Same wrong answers. Same fix: carry both timestamps to the joining layer. Ingestion time for liveness and TTL. Source time for alignment. Spread between sources as a first-class metric so consumers decide what to trust.
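As a generic schema sketch (names are illustrative, not this post's actual payload):

from dataclasses import dataclass, field

@dataclass
class SourcedRecord:
    source: str      # which producer / sensor / region
    ingest_ts: int   # ms epoch when our side saw it: liveness, TTL
    event_ts: int    # ms epoch when the source says it happened: alignment
    payload: dict = field(default_factory=dict)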

This is also the framing Apache Flink and Beam ship with: event time vs processing time, with watermarks to surface drift. Most ad-hoc real-time systems converge to the same dual-timestamp model eventually. You can skip the eventually.

What I'd tell past me

Two lines.

  1. In the very first payload schema, write both ts and event_ts. Migrating later means rewriting every producer, the aggregator, and every consumer. Adding it on day one is two extra lines per producer.

  2. Emit the skew metric on every aggregated payload, even when it's "always low". The day skew matters, you'll wish you had it on the wire.

The single-timestamp field is one of those defaults that looks fine until it doesn't.

One timestamp tells you when you saw the data.

Two timestamps tell you when the market happened.

Carry both.
