DEV Community

Denis Toropov
Denis Toropov

Posted on

Transactional Outbox with Kafka: How to Stop Losing Events When Syncing Databases

Transactional Outbox with Kafka: How to Stop Losing Events When Syncing Databases

When you sync data between services (or databases) through Kafka, the classic failure looks like this: the database transaction commits, but the Kafka message never gets published (crash, network issue, timeout). Your systems diverge silently, and you only discover it when a user reports incorrect data.

Why the obvious fixes don’t work

Wrapping Kafka send in retries doesn’t guarantee delivery if the process dies after the DB commit. Sending to Kafka before committing the DB creates “phantom events” about changes that never made it into the database. Two-phase commit is usually too complex operationally and doesn’t fit Kafka in a clean, universal way.

The key idea

Make the critical operation one atomic DB transaction: write the business data and the event to an outbox table in the same commit. If the update exists, the event exists too.

Conceptually:

  • In one DB transaction: update domain tables + INSERT into outbox

  • Later: a separate mechanism reads outbox and publishes to Kafka

  • This makes delivery retryable because the event is durably stored

Two ways to deliver events from the outbox

1) Polling Relay

A background worker polls unsent rows (sent_at IS NULL), publishes them to Kafka, then marks them as sent. It’s simple, reliable, and often enough when sub-second latency isn’t a strict requirement. If you run multiple workers, you must ensure each event is claimed once (typically via row locking / “skip locked” patterns).

2) CDC via Debezium

Instead of polling, Debezium streams inserts from the database transaction log (WAL) to Kafka. This reduces latency and removes the need for a poller, but adds infrastructure/ops complexity (Debezium + Kafka Connect).

Don’t forget: duplicates will happen

Outbox pipelines are typically at-least-once, so consumers must be idempotent. The most robust approach is an inbox table with a unique constraint on event_id: first insert the event id; if it already exists, skip processing. This avoids race conditions that occur with “check then insert”.

Common pitfalls

If you run the relay inside every API instance, scaling your API can accidentally scale polling load and hammer the DB. Also, outbox/inbox tables will grow—plan retention/cleanup. Finally, monitor lag: the count of unsent events and the age of the oldest unsent event are simple, high-signal metrics.

Bottom line

Transactional Outbox prevents “DB committed but event lost” by making event creation part of the DB transaction, then reliably delivering events to Kafka via polling or CDC—while consumers protect themselves with idempotency.

Top comments (0)