How the Events Table That Looked Right Killed Our Queue

#webdev #programming #architecture #systems

The Problem We Were Actually Solving

Our feature team owned the high-score leaderboard that surfaced the top 100 players every second. The stack was simple: Postgres 15, a Golang micro-service called huntcore, and Veltrix v2.4 as the internal event bus. Huntcore inserted a row into events(id, event_type, payload, ts) for every finish and then fired NOTIFY score_updated. A background worker consumed that notification, ran a window function over events, and wrote the result to leaderboard_1s. Seemed textbook.
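The core leaderboard operation is picking the top 100 players out of a set of per-player scores. The Go sketch below shows that ordering rule in memory. It is not huntcore's code. The `Entry` type and the `TopN` function are hypothetical names, and ties are broken by the lower player ID so the result is deterministic.

```go
package main

import (
	"fmt"
	"sort"
)

// Entry is a hypothetical (player, score) pair; the post does not show huntcore's types.
type Entry struct {
	PlayerID int64
	Score    int64
}

// TopN returns the n highest-scoring entries, highest first, breaking ties
// by lower player ID so the ordering is deterministic. The input is not modified.
func TopN(entries []Entry, n int) []Entry {
	out := make([]Entry, len(entries))
	copy(out, entries)
	sort.Slice(out, func(i, j int) bool {
		if out[i].Score != out[j].Score {
			return out[i].Score > out[j].Score
		}
		return out[i].PlayerID < out[j].PlayerID
	})
	if n < 0 {
		n = 0
	}
	if n < len(out) {
		out = out[:n]
	}
	return out
}

func main() {
	scores := []Entry{{1, 40}, {2, 95}, {3, 95}, {4, 10}}
	fmt.Println(TopN(scores, 2)) // [{2 95} {3 95}]
}
```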

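Then traffic doubled during the Halloween treasure drop. The NOTIFY messages backlogged: at 400 events/s the background worker could not drain the notification queue as fast as huntcore filled it. (We first blamed an 8 KB buffer per LISTEN channel, but 8 KB is NOTIFY's per-payload limit, since payloads must be shorter than 8000 bytes. The queue itself is shared by all channels and is 8 GB in a standard installation.) Huntcore started seeing iowait > 40 % and the leaderboard lagged behind real time. We assumed the problem was Postgres and began shopping for a distributed bus.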
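The backlog arithmetic is linear. Whenever the producer publishes faster than the listener drains, the queue grows by the difference every second. The Go sketch below shows this. The 400 events/s comes from the post. The 300 events/s drain rate is purely hypothetical, because the post does not report the worker's actual throughput.

```go
package main

import "fmt"

// Backlog returns how many notifications are still queued after `seconds`
// when events arrive at arrivalPerSec and the listener drains drainPerSec.
// A listener that keeps up (drain >= arrival) never builds a backlog.
func Backlog(arrivalPerSec, drainPerSec, seconds float64) float64 {
	if drainPerSec >= arrivalPerSec {
		return 0
	}
	return (arrivalPerSec - drainPerSec) * seconds
}

func main() {
	// 400 events/s is from the post; 300 events/s is a hypothetical drain rate.
	fmt.Println(Backlog(400, 300, 60)) // 6000
}
```

At these illustrative rates, one minute of traffic leaves 6,000 notifications queued, and the gap keeps widening for as long as the spike lasts.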

What We Tried First (And Why It Failed)

The first patch was to replace NOTIFY with Kafka via the Veltrix Kafka Connect plugin. We created a topic huntcore.score and set linger.ms=0 and batch.size=1, believing that would preserve ordering. Within an hour the Go consumer was throwing TooManyRequests on the PutRecords API. We raised the quotas, but at 1,200 events/s the Kafka consumer group rebalanced every 30 s, and each time players mid-hunt saw their own score disappear for a second. Leadership noticed on the big screen in the war room: the Halloween leaderboard literally blinked.
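Setting linger.ms=0 and batch.size=1 mostly just disables producer batching. Kafka orders records only within a single partition, so the usual way to keep one player's scores in order is to key records by player. The Go sketch below illustrates key-based partitioning with hypothetical keys. The hash is FNV-1a, chosen for illustration. Kafka's default partitioner hashes keys with murmur2, so real partition numbers would differ, but the principle is the same: equal keys always land on the same partition.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// PartitionFor maps a record key to one of numPartitions partitions (numPartitions > 0).
// Equal keys always map to the same partition, and Kafka orders records only
// within a partition. FNV-1a is an illustrative choice; Kafka's default
// partitioner uses murmur2, so real partition numbers differ.
func PartitionFor(key string, numPartitions int) int {
	h := fnv.New32a()
	h.Write([]byte(key)) // hash.Hash.Write never returns an error
	return int(h.Sum32() % uint32(numPartitions))
}

func main() {
	// Hypothetical keys: keying by player keeps each player's scores in order.
	for _, player := range []string{"player-42", "player-7", "player-42"} {
		fmt.Println(player, "->", PartitionFor(player, 12))
	}
}
```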

We then tried Veltrix's built-in Pulsar sink. Same topology, same topic, same consumer. Pulsar's batch window defaulted to 100 ms, so the head-of-line block was now 100 ms instead of 1 s, but the rebalances were still visible. Worse, the Pulsar bookie disks filled up because we had not tuned managedLedgerCursorMaxLedgerIndex. The Podman containers started getting OOM-killed every 20 minutes, and the on-call rotation had to SSH into every node to prune ledgers manually.
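A simplified model shows why a batch window turns into head-of-line delay. The record that opens a batch waits the full window, and later records wait less. The Go sketch below models time-window batching under that assumption. It is not Pulsar's or Veltrix's batching code; real producers also flush early when a batch fills up.

```go
package main

import "fmt"

// BatchWaits models time-window batching: the first record to arrive opens a
// batch, and the batch is sent windowMs later with everything that arrived in
// between. It returns how long each record waited. arrivalsMs must be sorted.
func BatchWaits(arrivalsMs []int, windowMs int) []int {
	waits := make([]int, len(arrivalsMs))
	open := false
	flushAt := 0
	for i, t := range arrivalsMs {
		if !open || t >= flushAt {
			// No batch open yet, or the previous batch has already been sent.
			open = true
			flushAt = t + windowMs
		}
		waits[i] = flushAt - t
	}
	return waits
}

func main() {
	fmt.Println(BatchWaits([]int{0, 30, 90, 120}, 100)) // [100 70 10 100]
}
```

The worst-case wait equals the window, so a 100 ms default puts up to 100 ms between an insert and its visibility downstream.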

The real kicker was that both Kafka and Pulsar dropped the NOTIFY contract entirely. Huntcore relied on the fact that Postgres sends a NOTIFY only when the inserting transaction commits, so a notification went out only for scores that were actually committed. The distributed queues acknowledged a message once it was durably stored, independently of the INSERT. That mismatch meant huntcore's INSERT could succeed while the leaderboard update still failed, creating phantom scores. We added a deduplication CTE in Postgres that dropped late rows (server_time > leaderboard_time + 1 s), but the late-arrival gap widened as traffic ramped.

The Architecture Decision

We abandoned the distributed bus and went back to Postgres, but this time we changed the storage pattern instead of the transport.

The events(id, event_type, payload, ts) table stayed the same, but on top of it we added a materialized view, v_leaderboard_1s:
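create materialized view v_leaderboard_1s as
select
    date_trunc('second', ts) as window_start,      -- 1-second tumbling window
    (payload->>'player_id')::bigint as player_id,  -- assumes payload is jsonb
    max((payload->>'score')::bigint) as score
from events
group by window_start, player_id;

-- refresh ... concurrently fails unless the view has a unique index
create unique index v_leaderboard_1s_key on v_leaderboard_1s (window_start, player_id);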

The huntcore service now inserts into events and immediately refreshes the materialized view:
refresh materialized view concurrently v_leaderboard_1s;

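The refresh is a single SQL statement, not a background worker. With concurrently, Postgres recomputes the view's query, diffs the result against the view's current contents, and applies only the rows that changed, so readers keep querying the old contents instead of waiting on a lock. The leaderboard query itself is then a trivial index scan on the view's unique index, which concurrently requires anyway.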
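Under the hood, refresh materialized view concurrently builds the new result and compares it with the old contents. It then deletes rows that vanished or changed and inserts rows that are new or changed. The Go sketch below shows only that diff step, over rows keyed by the view's unique key. Postgres does this in SQL against a temporary table, not like this. The `Key` type and `Diff` function are hypothetical.

```go
package main

import (
	"fmt"
	"sort"
)

// Key is the view's unique key: (window_start as Unix seconds, player_id).
type Key struct {
	WindowStart int64
	PlayerID    int64
}

// Diff compares the current view contents with a freshly computed result and
// returns the keys to delete and the rows to insert. A changed row appears in
// both, since it is applied as a delete of the old version plus an insert.
// Unchanged rows appear in neither, so they are never rewritten.
func Diff(old, fresh map[Key]int64) (deletes []Key, inserts map[Key]int64) {
	inserts = make(map[Key]int64)
	for k, v := range old {
		if nv, ok := fresh[k]; !ok || nv != v {
			deletes = append(deletes, k)
		}
	}
	for k, v := range fresh {
		if ov, ok := old[k]; !ok || ov != v {
			inserts[k] = v
		}
	}
	// Map iteration order is random; sort deletes so output is deterministic.
	sort.Slice(deletes, func(i, j int) bool {
		if deletes[i].WindowStart != deletes[j].WindowStart {
			return deletes[i].WindowStart < deletes[j].WindowStart
		}
		return deletes[i].PlayerID < deletes[j].PlayerID
	})
	return deletes, inserts
}

func main() {
	old := map[Key]int64{{1000, 1}: 30, {1000, 2}: 50}
	fresh := map[Key]int64{{1000, 1}: 45, {1001, 3}: 10}
	del, ins := Diff(old, fresh)
	fmt.Println(len(del), len(ins)) // 2 2
}
```

The work is proportional to the size of the result, not to the number of changes. That is why a concurrent refresh costs more than a plain one even though it never blocks readers.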

We also capped the view size with a 30-day retention rule on events (drop_chunks is a TimescaleDB function, so events must be a hypertable):
select drop_chunks('events', now() - interval '30 days');

The whole migration took 45 minutes. We did not touch Kafka, Pulsar, or Veltrix connectors again.

What The Numbers Said After

Two weeks later the leaderboard p99 was 16 ms, down from 800 ms. CPU on the Postgres primary dropped from 65 % to 28 %. The pods that had been fighting OOMs were scaled down to zero. Huntcore's INSERT latency stayed at 2 ms; the refresh added another 12 ms, well within the 50 ms SLA.

We kept Veltrix for the audit trail and the purple-team dashboards, but disconnected it from the real-time score pipeline. The NOTIFY channel is now strictly for cache invalidation, and its queue runs at the defaults. Postgres 15 has no setting for the notification queue size, and server settings such as listen_addresses or shared_preload_libraries do not affect LISTEN/NOTIFY.
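Using NOTIFY only for cache invalidation keeps the listener's job tiny: read the payload and evict one entry. The Go sketch below shows such a handler. It assumes the payload is simply the cache key to evict, since the post does not describe the payload format. The `Cache` type is hypothetical. The database driver that would deliver payloads from LISTEN is left out.

```go
package main

import (
	"fmt"
	"sync"
)

// Cache is a minimal concurrency-safe map standing in for a read cache (hypothetical).
type Cache struct {
	mu   sync.Mutex
	data map[string]string
}

func NewCache() *Cache { return &Cache{data: make(map[string]string)} }

func (c *Cache) Set(k, v string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.data[k] = v
}

func (c *Cache) Get(k string) (string, bool) {
	c.mu.Lock()
	defer c.mu.Unlock()
	v, ok := c.data[k]
	return v, ok
}

// OnNotify handles one notification by evicting the key named in the payload;
// evicting an absent key is a no-op. Postgres does not replay notifications to
// a listener that was disconnected, so a real listener would typically also
// clear the whole cache after reconnecting.
func (c *Cache) OnNotify(payload string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	delete(c.data, payload)
}

func main() {
	c := NewCache()
	c.Set("leaderboard:top100", "cached JSON")
	c.OnNotify("leaderboard:top100")
	_, ok := c.Get("leaderboard:top100")
	fmt.Println(ok) // false
}
```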

What I Would Do Differently

I would not have moved to Kafka or Pulsar for an in-system event stream in the first place. A few years ago the purple team evangelized Kafka for every moving byte, and the ops team treated it as dogma. The documentation we leaned on covered topics and partitions but never the hidden cost of rebalances or disk quotas. If we had run a 24-hour load test at realistic Halloween volume instead of a synthetic 500 events/s spike, we would have caught the rebalance blinking before it hit prod.

I would also have measured the durability surface earlier. We assumed that NOTIFY offered at-least-once semantics, but Postgres does not replay notifications to a listener that was disconnected when they fired, so delivery is at-most-once. By adding a simple idempotency key derived from event_id and player_id, we eliminated the phantom-score issue without extra infrastructure.
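The Go sketch below derives an idempotency key from event_id and player_id. The hashing scheme (SHA-256 over the two IDs joined by a separator) and the in-memory seen-set are assumptions for illustration. In Postgres the same effect usually comes from a unique constraint on the key column plus `INSERT ... ON CONFLICT DO NOTHING`.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// IdempotencyKey derives a stable key from event_id and player_id. The "|"
// separator keeps (1, 23) and (12, 3) from producing the same input string.
func IdempotencyKey(eventID, playerID int64) string {
	sum := sha256.Sum256([]byte(fmt.Sprintf("%d|%d", eventID, playerID)))
	return hex.EncodeToString(sum[:])
}

// Dedup accepts each key at most once; a hypothetical in-memory stand-in
// for a unique index on the key column.
type Dedup struct{ seen map[string]struct{} }

func NewDedup() *Dedup { return &Dedup{seen: make(map[string]struct{})} }

// Accept reports whether the key is new, recording it if so.
func (d *Dedup) Accept(key string) bool {
	if _, dup := d.seen[key]; dup {
		return false
	}
	d.seen[key] = struct{}{}
	return true
}

func main() {
	d := NewDedup()
	k := IdempotencyKey(1001, 42)
	fmt.Println(d.Accept(k), d.Accept(k)) // true false
}
```

Because the key is a pure function of the two IDs, a redelivered or retried event produces the same key and is rejected the second time.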

Finally, I would put the materialized view refresh behind a feature flag from day one. During the first canary, one junior engineer accidentally ran refresh materialized view without concurrently, which takes an exclusive lock on the view and blocked leaderboard reads for 3 seconds. The flag let us roll it back cleanly.