Why does this design pattern exist in the first place?
Before we define it, let's understand the nightmare it was designed to solve.
Let’s take a simple example.
Suppose you’re in the mood to watch a movie and relax.
You’ve got your popcorn, your comfy sofa, and Netflix open.
You log in. Netflix already recommends movies based on your watch history.
You select one, press “Watch Now,” and the movie starts streaming.
That’s the ideal scenario.
But behind that smooth experience, what is actually happening?
Imagine You’re Designing This System
You are responsible for two critical operations:
- Save the selected movie in the database (Very important — so Netflix can improve recommendations using behavioral data.)
- Notify another service to start streaming the movie (By emitting an event like MovieStarted.)
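The naive implementation of those two operations might look like the sketch below. It uses an in-memory SQLite database and a plain list as stand-ins for the real database and broker; the table, event, and function names are illustrative, not from any real Netflix API.

```python
import sqlite3

# Stand-ins for the real database and message broker.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE watch_history (user_id TEXT, movie_id TEXT)")

published_events = []  # pretend this is Kafka


def emit_event(event):
    published_events.append(event)  # a network call in reality -- it can fail


def watch_movie(user_id, movie_id):
    # Write 1: save the selection so recommendations can use it.
    db.execute("INSERT INTO watch_history VALUES (?, ?)", (user_id, movie_id))
    db.commit()
    # Write 2: tell the streaming service to start.
    # If this line fails, the DB row exists but no event was ever sent.
    emit_event({"type": "MovieStarted", "user": user_id, "movie": movie_id})


watch_movie("alice", "tt0111161")
```

The two writes are independent: either one can succeed while the other fails, which is exactly the inconsistency described next.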
Seems simple.
But here’s where distributed systems start laughing at you.
The Real Problem
What if:
- The database save succeeds
- But the event emission fails?
Or worse:
- The event is emitted
- But the database transaction fails?
Now your system is inconsistent.
The recommendation service thinks the user watched the movie.
The streaming service thinks nothing happened.
Or the opposite.
This is called the Dual Write Problem (the nightmare we need to solve).
You are writing to two different systems:
- A relational database (ACID guarantees: atomicity, consistency, isolation, durability)
- A message broker (asynchronous, eventually consistent)
And there is no single atomic transaction spanning both.
- No shared commit boundary.
- No guaranteed consistency.
- No safety.
Enter the Transactional Outbox Pattern
The idea is simple but powerful.
Instead of:
Save to DB → Emit event to broker
You do:
Save business data
→ Insert event into OUTBOX table
→ Commit transaction
Both operations succeed or both fail.
Now both operations happen inside the same database transaction.
If the commit fails → nothing is persisted.
If it commits → both the state change and the event record are durable.
This solves the atomicity problem.
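A minimal sketch of that transaction, again using SQLite as a stand-in (schema and names are illustrative): the business row and the outbox row are written inside one transaction, so a failure anywhere rolls back both.

```python
import json
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE watch_history (user_id TEXT, movie_id TEXT);
CREATE TABLE outbox (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    payload TEXT NOT NULL,
    processed INTEGER NOT NULL DEFAULT 0
);
""")


def watch_movie(user_id, movie_id):
    # One transaction covers both the state change and the event record.
    with db:  # commits on success, rolls back on any exception
        db.execute("INSERT INTO watch_history VALUES (?, ?)", (user_id, movie_id))
        event = {"type": "MovieStarted", "user": user_id, "movie": movie_id}
        db.execute("INSERT INTO outbox (payload) VALUES (?)", (json.dumps(event),))


watch_movie("alice", "tt0111161")

# Simulate a failure mid-transaction: nothing is persisted.
try:
    with db:
        db.execute("INSERT INTO watch_history VALUES (?, ?)", ("bob", "tt0068646"))
        raise RuntimeError("event serialization failed")
except RuntimeError:
    pass  # the row for "bob" was rolled back along with everything else
```

Note that no broker is involved here at all: the event only becomes a durable row, which is the whole point.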
Ok, so now two tables have the required data — what changed?
The change is that the event is no longer sent directly to Kafka.
Instead:
- You can poll the outbox table, read the unprocessed events, emit them to the message broker, and then mark them as processed.
- You can use CDC (Change Data Capture) on the outbox table, so a connector reads the database changes (from the WAL/binlog) and emits them to the message broker automatically.
- You can even introduce an entirely separate relay service dedicated to this responsibility.
We removed distributed transactions (2PC) and still preserved atomicity between state change and event creation.
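The polling option can be sketched in a few lines (SQLite again as a stand-in; `relay_outbox` and the schema are illustrative names, not a library API):

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE outbox (id INTEGER PRIMARY KEY, payload TEXT, "
    "processed INTEGER NOT NULL DEFAULT 0)"
)
db.executemany(
    "INSERT INTO outbox (payload) VALUES (?)",
    [("MovieStarted:alice",), ("MovieStarted:bob",)],
)
db.commit()

sent = []  # stand-in for the broker


def relay_outbox(db, publish, batch_size=100):
    # Read a batch of unprocessed events in insertion order.
    rows = db.execute(
        "SELECT id, payload FROM outbox WHERE processed = 0 ORDER BY id LIMIT ?",
        (batch_size,),
    ).fetchall()
    for event_id, payload in rows:
        publish(payload)  # broker call; may raise and be retried on the next poll
        # Marking processed only AFTER publishing means a crash between these
        # two steps re-publishes the event on restart: at-least-once delivery.
        with db:
            db.execute("UPDATE outbox SET processed = 1 WHERE id = ?", (event_id,))


relay_outbox(db, sent.append)
```

The publish-then-mark ordering is a deliberate design choice: flipping it would risk losing events instead of duplicating them, and duplicates are the safer failure mode.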
So You Solved the Problem and Saved Your Job… But What If?
- The event is published, the broker acknowledges it, but the application crashes before marking the event as processed. After restart → the event is published again. Now you have duplicates.
- In a horizontally scaled system, multiple instances poll the same outbox table and the same event is picked more than once.
- The DB transaction commits, the event is emitted, but the broker crashes before persisting it. Or a network timeout occurs and you don't know whether the publish succeeded. You retry — and create duplicates.
- You have a high-throughput system and polling the outbox table increases database load, creates lag, and eventually becomes a bottleneck.
For these reasons, you should never rely on the Outbox implementation alone.
Outbox guarantees atomicity — not delivery perfection.
You must design your consumers to handle failure scenarios:
- Consumers should be idempotent
- Use idempotency keys
- Partition by aggregate ID to preserve ordering
- Handle duplicate messages safely
- Use deduplication tables if required
- For high DB load, prefer CDC tools like Debezium over polling
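The first few points above can be combined into one consumer-side sketch: a deduplication table whose primary key is the event's idempotency key, checked in the same transaction as the business write (names and schema are illustrative):

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE ledger (event_id TEXT, amount INTEGER);
CREATE TABLE processed_events (event_id TEXT PRIMARY KEY);  -- dedup table
""")


def handle(event):
    # The outbox row's unique id travels with the event as an idempotency key.
    with db:
        try:
            db.execute("INSERT INTO processed_events VALUES (?)", (event["id"],))
        except sqlite3.IntegrityError:
            return  # duplicate delivery: already handled, skip safely
        # Dedup insert and business write commit atomically together.
        db.execute("INSERT INTO ledger VALUES (?, ?)", (event["id"], event["amount"]))


evt = {"id": "evt-42", "amount": 100}
handle(evt)
handle(evt)  # redelivery from an at-least-once producer: ignored
```

Because the dedup insert and the business write share a transaction, a crash between them cannot leave the event half-processed.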
What I’m Trying to Say Is:
The Outbox Pattern is not a one-stop solution.
Many engineers assume it solves broker reliability.
It does not.
It only guarantees atomicity between state change and event creation.
It guarantees at-least-once delivery, not exactly-once.
If you want to keep your job as a system designer, you must design around its weaknesses — not ignore them.
Now that you understand the Outbox Pattern properly, let’s look at some examples of when and where it should be used — and where it should not.
When To Use the Outbox Pattern
✅ 1. Database is the Source of Truth
Example: Order Management System
Order saved in PostgreSQL
OrderCreated event must be emitted
Losing that event breaks inventory and billing
Strong consistency is required → Outbox is ideal.
✅ 2. Financial or Healthcare Systems
Example: Payment Processing
Transaction written to DB
Event triggers ledger updates, fraud checks, notifications
Losing the event = financial inconsistency.
Outbox ensures atomicity between transaction and event creation.
When NOT To Use the Outbox Pattern
❌ 1. When the Event Log Is the Source of Truth
Example: An Event Sourcing system built around Kafka
In this architecture:
All state changes are written directly to Kafka first.
The database is just a projection (materialized view).
The event log is the system of record.
Here, writing to the database first and then using an outbox adds unnecessary complexity.
You should publish to Kafka as the primary write operation and build state from events.
Outbox is not needed.
❌ 2. Ultra High Throughput Streaming Systems
Example: Real-time clickstream analytics or ad impression tracking
Millions of events per second
Events are transient and not tightly coupled to transactional DB state
Occasional event loss may be acceptable
In such systems, polling a relational database becomes a bottleneck:
Heavy I/O
Lock contention
Index scans
Increased latency
It is better to:
Write directly to Kafka
Use stream processing (Kafka Streams / Flink)
Materialize views downstream
❌ 3. When Eventual Consistency Is Acceptable
Example: Tracking “user viewed product” for analytics
If one tracking event is lost:
It does not break core business logic
No financial or critical data is affected
Using Outbox here adds operational overhead without strong benefit.
❌ 4. When You Don’t Control the Database
Example:
Using a third-party SaaS database
No ability to create tables
No transaction control
Since Outbox relies on atomic database transactions, it cannot be properly implemented.
Closing Thoughts
The Outbox Pattern is not about Kafka, polling, or CDC.
It is about solving the dual write problem in a practical way.
It guarantees atomicity between state change and event creation —
but it does not guarantee broker reliability or exactly-once delivery.
The mistake many engineers make is believing a pattern solves the entire problem.
It doesn’t.
Outbox is a powerful tool — but real reliability comes from designing for failure, not assuming it won’t happen.
If you’re exploring event-driven architectures further, I’ve also written about Kafka Streams and why it matters in real-world systems: