DEV Community

Alok Ranjan Daftuar

Posted on • Originally published at aloknecessary.github.io

Event-Driven Architecture: The Dual Write Problem and How to Solve It

You have a well-designed order service. It writes to the database and publishes an event to Kafka. Clean, decoupled, event-driven. Then Kafka has a brief network hiccup. The database write succeeds. The event publish fails. The order exists. Fulfillment never hears about it. No alert fires. Just a quietly broken order going nowhere.

This is the dual write problem — an architectural correctness problem that exists the moment you write to two separate systems without a coordination mechanism.


The Problem

A dual write occurs when your application writes to two separate systems as part of a single logical operation without atomicity across both. The dangerous failure modes are silent — the HTTP response returns 200, the client gets a success, and nothing downstream happens.
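To make the silent failure concrete, here is a minimal sketch of the naive version. It assumes SQLite in place of the real database and a hypothetical `FlakyBroker` class in place of a Kafka producer; neither name comes from the full article.

```python
import sqlite3


class BrokerUnavailable(Exception):
    """Raised by the stand-in broker to simulate a network hiccup."""


class FlakyBroker:
    """Hypothetical stand-in for a Kafka producer; fails while `down` is True."""

    def __init__(self):
        self.down = False
        self.published = []

    def publish(self, topic, payload):
        if self.down:
            raise BrokerUnavailable(topic)
        self.published.append((topic, payload))


def place_order_naive(conn, broker, order_id):
    # Write 1: the database commit succeeds on its own.
    with conn:
        conn.execute(
            "INSERT INTO orders (id, status) VALUES (?, 'PLACED')", (order_id,)
        )
    # Write 2: a separate system, with no atomicity across the two.
    try:
        broker.publish("orders", {"type": "OrderPlaced", "order_id": order_id})
    except BrokerUnavailable:
        pass  # swallowed (or merely logged): the caller still sees success
    return 200


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id TEXT PRIMARY KEY, status TEXT)")
broker = FlakyBroker()
broker.down = True  # simulate the brief outage

status = place_order_naive(conn, broker, "order-1")
orders_in_db = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
print(status, orders_in_db, len(broker.published))  # 200 1 0
```

The order is committed, the caller gets a 200, and no event exists anywhere. Nothing in this code path is in a position to notice.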

The naive fixes don't work:

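  • Try/catch with retry: the process can still crash between the database commit and a successful publish, so events can still be lost, and retries that do succeed can publish duplicates, so consumers must be idempotent anyway
  • Publish first, then write to the DB: just reverses which failure mode you're exposed to (an event for an order that was never saved)
  • Distributed transactions (2PC): sacrifices availability and introduces distributed locking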

The real solution: reduce to a single atomic write and derive the event from it.


Solution 1: Transactional Outbox Pattern

Write the event as a row in an outbox table in the same database transaction as your business data. A separate relay process reads from the outbox and publishes to the broker.

  • Both writes succeed or fail together (single DB transaction)
  • Relay publishes each message, then marks it as published; a crash between the two steps means the message is published again on restart
  • Guarantees at-least-once delivery, so consumers must be idempotent

Best for: greenfield services, full control over event schema, teams wanting simplicity.
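A minimal sketch of the pattern, assuming SQLite as the database and a plain callable as the broker client. The table layout, the `place_order` and `relay_once` names, and the polling approach are illustrative assumptions, not the .NET implementation from the full article.

```python
import json
import sqlite3
import uuid

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id TEXT PRIMARY KEY, status TEXT NOT NULL);
    CREATE TABLE outbox (
        seq       INTEGER PRIMARY KEY AUTOINCREMENT,
        event_id  TEXT NOT NULL UNIQUE,
        topic     TEXT NOT NULL,
        payload   TEXT NOT NULL,
        published INTEGER NOT NULL DEFAULT 0
    );
""")


def place_order(conn, order_id):
    """Insert the order and its event in one transaction: both or neither."""
    event = {
        "event_id": str(uuid.uuid4()),
        "type": "OrderPlaced",
        "order_id": order_id,
    }
    with conn:  # commits on success, rolls back if either insert raises
        conn.execute(
            "INSERT INTO orders (id, status) VALUES (?, 'PLACED')", (order_id,)
        )
        conn.execute(
            "INSERT INTO outbox (event_id, topic, payload) VALUES (?, ?, ?)",
            (event["event_id"], "orders", json.dumps(event)),
        )


def relay_once(conn, publish, batch_size=100):
    """Publish pending rows in order, marking each only after publish succeeds.

    A crash between publish and the UPDATE re-sends the row on the next run,
    which is why delivery is at-least-once rather than exactly-once.
    """
    rows = conn.execute(
        "SELECT seq, topic, payload FROM outbox "
        "WHERE published = 0 ORDER BY seq LIMIT ?",
        (batch_size,),
    ).fetchall()
    for seq, topic, payload in rows:
        publish(topic, json.loads(payload))  # expected to raise on failure
        with conn:
            conn.execute("UPDATE outbox SET published = 1 WHERE seq = ?", (seq,))
    return len(rows)


sent = []  # stands in for the broker
place_order(conn, "order-1")
place_order(conn, "order-2")
relayed = relay_once(conn, lambda topic, event: sent.append((topic, event)))
pending = conn.execute(
    "SELECT COUNT(*) FROM outbox WHERE published = 0"
).fetchone()[0]
```

If the broker is down, `publish` raises, the row stays unpublished, and the next relay run picks it up. The event is delayed, not lost.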


Solution 2: Change Data Capture (Debezium)

Read directly from the database's transaction log (WAL/binlog). Every committed write is captured and streamed to Kafka automatically. No application code changes required.

  • Typically sub-second publish latency (log-based, no polling)
  • Captures all state changes, including those made by DB migrations and admin tools
  • Requires infrastructure for Kafka Connect + Debezium

Best for: legacy systems, high-throughput services, capturing all state changes without code modification.
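On the consuming side, each Debezium change event arrives as an envelope with `before`, `after`, `source`, `op`, and `ts_ms` fields. The sketch below maps one such envelope to a simple domain event. It assumes the JSON converter is configured without embedded schemas, and the `to_domain_event` helper, the `id` column, and the output shape are all hypothetical.

```python
import json

# Operation codes used in Debezium's change-event envelope.
OP_NAMES = {"c": "created", "u": "updated", "d": "deleted", "r": "snapshot"}


def to_domain_event(raw_value):
    """Map one change event to a simple domain event dict.

    Assumes schemas are disabled on the converter, so the envelope fields
    sit at the top level of the message value.
    """
    change = json.loads(raw_value)
    op = change["op"]
    # Deletes carry the row in "before"; other operations carry it in "after".
    row = change["before"] if op == "d" else change["after"]
    return {
        "type": f"{change['source']['table']}.{OP_NAMES.get(op, op)}",
        "key": row["id"],  # hypothetical primary-key column
        "data": row,
        "processed_at_ms": change["ts_ms"],
    }


# Hand-written sample shaped like an insert into an `orders` table.
sample = json.dumps({
    "before": None,
    "after": {"id": "order-1", "status": "PLACED"},
    "source": {"connector": "postgresql", "table": "orders"},
    "op": "c",
    "ts_ms": 1700000000000,
})
event = to_domain_event(sample)
print(event["type"], event["key"])  # orders.created order-1
```

Note the trade-off this exposes: the event schema is your table schema unless you add a mapping step like this one, so schema changes in the database ripple out to consumers.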


Solution 3: Event Sourcing

The event log is the source of truth. The database is a derived projection. There is no dual write because there is only one write — appending events to the event store.

  • Eliminates the dual write at its source, as long as projections and any downstream publishing are derived from the event store
  • Introduces significant complexity (schema versioning, aggregate rehydration, eventual consistency)

Best for: domains where history of state changes matters (financial systems, audit-heavy domains).
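A minimal sketch of the idea, assuming an in-memory store. `InMemoryEventStore`, the `Order` aggregate, and the two event types are hypothetical names for illustration, not the C# aggregate from the full article.

```python
from dataclasses import dataclass


class ConcurrencyError(Exception):
    """Raised when a stream changed since the caller last read it."""


class InMemoryEventStore:
    """Hypothetical append-only store keyed by stream ID."""

    def __init__(self):
        self._streams = {}

    def append(self, stream_id, events, expected_version):
        stream = self._streams.setdefault(stream_id, [])
        # Optimistic concurrency: reject the append if someone wrote first.
        if len(stream) != expected_version:
            raise ConcurrencyError(stream_id)
        stream.extend(events)  # the only write in the whole operation

    def load(self, stream_id):
        return list(self._streams.get(stream_id, []))


@dataclass
class Order:
    status: str = "NEW"
    version: int = 0

    def apply(self, event):
        if event["type"] == "OrderPlaced":
            self.status = "PLACED"
        elif event["type"] == "OrderShipped":
            self.status = "SHIPPED"
        self.version += 1


def rehydrate(events):
    """Rebuild current state by replaying the stream from the beginning."""
    order = Order()
    for event in events:
        order.apply(event)
    return order


store = InMemoryEventStore()
store.append("order-1", [{"type": "OrderPlaced"}], expected_version=0)
current = rehydrate(store.load("order-1"))
store.append(
    "order-1", [{"type": "OrderShipped"}], expected_version=current.version
)
order = rehydrate(store.load("order-1"))
print(order.status, order.version)  # SHIPPED 2
```

Current state is never stored directly here; it is always recomputed from the events, which is where the rehydration and schema versioning costs come from.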


Operational Non-Negotiables

  • Consumer idempotency: at-least-once delivery means duplicates will arrive. Deduplicate on event ID.
  • Outbox housekeeping: purge published messages; don't let the table grow unbounded.
  • Replication slot monitoring: for CDC on PostgreSQL, a stuck or stopped connector holds its replication slot open, so WAL accumulates until the disk fills.
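Deduplicating on event ID can be sketched as follows, assuming the consumer has its own SQLite database and that every event carries an `event_id`. The `processed_events` and `shipments` tables and the `handle` function are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE processed_events (event_id TEXT PRIMARY KEY);
    CREATE TABLE shipments (order_id TEXT PRIMARY KEY);
""")


def handle(conn, event):
    """Apply the event's side effect once, keyed on its event ID.

    Returns True if the event was applied, False if it was a duplicate.
    """
    with conn:  # the dedup record and the side effect commit together
        cur = conn.execute(
            "INSERT OR IGNORE INTO processed_events (event_id) VALUES (?)",
            (event["event_id"],),
        )
        if cur.rowcount == 0:
            return False  # this event ID was already recorded: skip it
        conn.execute(
            "INSERT INTO shipments (order_id) VALUES (?)", (event["order_id"],)
        )
    return True


first = {"event_id": "evt-1", "order_id": "order-1"}
second = {"event_id": "evt-2", "order_id": "order-2"}

# The middle delivery is a redelivery of the first event.
results = [handle(conn, first), handle(conn, first), handle(conn, second)]
shipments = conn.execute("SELECT COUNT(*) FROM shipments").fetchone()[0]
print(results, shipments)  # [True, False, True] 2
```

Recording the event ID in the same transaction as the side effect matters: if the two were committed separately, the consumer would have a dual write of its own.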

Read the Full Article

This is a summary of my deep dive into the dual write problem. The full article covers all three solutions with production implementation examples:

👉 The Dual Write Problem and How to Solve It — Full Article

The full article includes:

  • Four failure scenarios with a dual write matrix
  • Transactional Outbox Pattern implementation (.NET with EF Core)
  • Polling relay vs log-tailing relay comparison
  • Debezium PostgreSQL connector configuration
  • Event Sourcing with aggregate pattern (C#)
  • Decision matrix for choosing between the three solutions
  • Operational concerns: housekeeping, replication slot monitoring, consumer idempotency
  • Production deployment checklist
