Solving the Dual Write Problem Without Losing Data
Distributed systems fail in uncomfortable ways.
Sometimes the database commit succeeds — but the Kafka publish fails.
Sometimes the event is published — but the transaction rolls back.
And sometimes everything looks successful… until downstream systems realize data is missing.
This is the dual write problem.
If your Go microservice:
- writes to a database
- publishes events
- triggers async workflows
- integrates with Kafka/RabbitMQ/NATS
then you are already dealing with it — whether you realize it or not.
This article explores how the Outbox Pattern solves this problem safely in production Go systems.
1. The Dual Write Problem
Consider a typical order flow:
HTTP Request
↓
Save order to DB
↓
Publish "OrderCreated" event
Naive implementation:
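func CreateOrder(ctx context.Context, order Order) error {
	// Write 1: the order row is committed here.
	if err := db.Insert(order); err != nil {
		return err
	}
	// Write 2: a separate system, with no transaction shared with write 1.
	if err := kafka.Publish("order.created", order); err != nil {
		return err
	}
	return nil
}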
Looks harmless.
But what happens if:
- DB insert succeeds
- Kafka publish fails
Now:
- order exists
- no event emitted
- downstream services never know
Your system is inconsistent.
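The failure can be reproduced without any infrastructure. The following is a minimal sketch, not the article's production code: `fakeDB`, `flakyBroker`, and `createOrderNaive` are hypothetical in-memory stand-ins for the database, the broker, and the naive handler.

```go
package main

import "errors"

// fakeDB and flakyBroker are hypothetical in-memory stand-ins used only
// to illustrate the failure mode; they are not real client APIs.
type fakeDB struct{ orders []string }

func (d *fakeDB) Insert(id string) error {
	d.orders = append(d.orders, id)
	return nil
}

type flakyBroker struct {
	down   bool
	events []string
}

func (b *flakyBroker) Publish(topic, id string) error {
	if b.down {
		return errors.New("broker unavailable")
	}
	b.events = append(b.events, topic+":"+id)
	return nil
}

// createOrderNaive performs the two writes back to back, with nothing
// tying them together.
func createOrderNaive(db *fakeDB, broker *flakyBroker, id string) error {
	if err := db.Insert(id); err != nil {
		return err
	}
	// If this fails, the insert above has already taken effect.
	return broker.Publish("order.created", id)
}
```

With `broker.down` set to true, `createOrderNaive` returns an error, yet `db.orders` already contains the order and `broker.events` stays empty. The caller sees a failure while the database has recorded a success.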
2. Why Distributed Transactions Are Rarely the Answer
Some engineers try:
- two-phase commit
- distributed transactions
- XA protocols
In practice:
- operationally complex
- poor performance
- difficult to scale
- unsupported by many systems
Modern systems usually prefer:
- eventual consistency
- reliable event delivery
This is where the Outbox Pattern shines.
3. Core Idea of the Outbox Pattern
Instead of:
DB write
+
Kafka publish
Do:
DB write
+
Insert event into outbox table
inside the SAME transaction.
Then:
- a background worker reads the outbox table and publishes the events later
Now:
- either the order and its event record both persist
- or neither persists
Atomicity restored.
4. Outbox Table Design
Typical schema:
CREATE TABLE outbox_events (
    id           UUID PRIMARY KEY,
    event_type   TEXT NOT NULL,
    payload      JSONB NOT NULL,
    created_at   TIMESTAMP NOT NULL,
    processed_at TIMESTAMP,
    retries      INT DEFAULT 0
);
Key fields:
- payload
- processed status
- retry count
- timestamps
This table becomes a durable event queue.
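On the Go side, a row of this table could be represented roughly as follows. This is a sketch under assumptions: the `OutboxEvent` type, its `NewOutboxEvent` constructor, and the `Pending` method are illustrative names, not part of any library.

```go
package main

import (
	"encoding/json"
	"time"
)

// OutboxEvent mirrors one row of the outbox_events table.
// The struct and constructor names are illustrative assumptions.
type OutboxEvent struct {
	ID          string          // UUID, generated by the caller
	EventType   string          // e.g. "order.created"
	Payload     json.RawMessage // stored in the JSONB column
	CreatedAt   time.Time
	ProcessedAt *time.Time // nil while the event is still pending
	Retries     int
}

// NewOutboxEvent serializes v and returns a pending event.
func NewOutboxEvent(id, eventType string, v any, now time.Time) (OutboxEvent, error) {
	payload, err := json.Marshal(v)
	if err != nil {
		return OutboxEvent{}, err
	}
	return OutboxEvent{
		ID:        id,
		EventType: eventType,
		Payload:   payload,
		CreatedAt: now,
	}, nil
}

// Pending reports whether the event still has to be published.
func (e OutboxEvent) Pending() bool { return e.ProcessedAt == nil }
```

Using a pointer for `ProcessedAt` keeps the "not yet processed" state distinct from any real timestamp, matching the nullable column.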
5. Writing to the Outbox (Go Example)
Inside transaction:
// uuid is assumed to be github.com/google/uuid.
func CreateOrder(ctx context.Context, db *sql.DB, order Order) error {
	tx, err := db.BeginTx(ctx, nil)
	if err != nil {
		return err
	}
	// Rollback is a no-op once Commit has succeeded.
	defer tx.Rollback()

	_, err = tx.ExecContext(ctx,
		`INSERT INTO orders(id, amount) VALUES($1, $2)`,
		order.ID,
		order.Amount,
	)
	if err != nil {
		return err
	}

	// A payload that cannot be serialized must abort the transaction.
	payload, err := json.Marshal(order)
	if err != nil {
		return err
	}

	_, err = tx.ExecContext(ctx,
		`INSERT INTO outbox_events(id, event_type, payload, created_at)
		 VALUES($1, $2, $3, NOW())`,
		uuid.New(),
		"order.created",
		payload,
	)
	if err != nil {
		return err
	}

	// Both rows become visible together, or not at all.
	return tx.Commit()
}
Now:
- order + event persist atomically
No dual write inconsistency.
6. Background Publisher Worker
Separate worker:
func StartOutboxPublisher(ctx context.Context, db *sql.DB) {
	ticker := time.NewTicker(2 * time.Second)
	defer ticker.Stop() // release the ticker when the worker exits

	for {
		select {
		case <-ctx.Done():
			return
		case <-ticker.C:
			publishPendingEvents(ctx, db)
		}
	}
}
7. Publishing Pending Events
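func publishPendingEvents(ctx context.Context, db *sql.DB) {
	rows, err := db.QueryContext(ctx,
		`SELECT id, event_type, payload
		 FROM outbox_events
		 WHERE processed_at IS NULL
		 ORDER BY created_at
		 LIMIT 100`)
	if err != nil {
		log.Printf("outbox: query failed: %v", err)
		return
	}
	defer rows.Close()

	for rows.Next() {
		var (
			id        string
			eventType string
			payload   []byte
		)
		if err := rows.Scan(&id, &eventType, &payload); err != nil {
			log.Printf("outbox: scan failed: %v", err)
			return
		}
		if err := kafka.Publish(eventType, payload); err != nil {
			// Leave the row unprocessed so the next tick retries it.
			log.Printf("outbox: publish %s failed: %v", id, err)
			continue
		}
		// If this UPDATE fails, the event is published again on the next
		// tick, which is why delivery is at-least-once.
		if _, err := db.ExecContext(ctx,
			`UPDATE outbox_events
			 SET processed_at = NOW()
			 WHERE id = $1`,
			id,
		); err != nil {
			log.Printf("outbox: marking %s processed failed: %v", id, err)
		}
	}
	if err := rows.Err(); err != nil {
		log.Printf("outbox: iteration failed: %v", err)
	}
}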
Now failures become recoverable:
- if Kafka fails → retry later
- event never lost
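The retry behaviour can be checked in isolation. Below is a minimal in-memory sketch, assuming hypothetical `Event`, `Publisher`, `memStore`, and `outageBroker` types that stand in for the outbox table and the broker; `drainOnce` plays the role of one polling pass.

```go
package main

import "errors"

// Event, Publisher, memStore and outageBroker are hypothetical types
// introduced for this sketch; a real store would wrap the outbox table.
type Event struct {
	ID        string
	Type      string
	Payload   []byte
	Processed bool
}

type Publisher interface {
	Publish(topic string, payload []byte) error
}

// memStore keeps events in a slice, standing in for the outbox table.
type memStore struct{ events []*Event }

func (s *memStore) Pending() []*Event {
	var out []*Event
	for _, e := range s.events {
		if !e.Processed {
			out = append(out, e)
		}
	}
	return out
}

// drainOnce is one polling pass: publish each pending event and mark it
// processed only after the broker accepted it. Failed events stay
// pending and are picked up again on the next pass.
func drainOnce(s *memStore, p Publisher) (published int) {
	for _, e := range s.Pending() {
		if err := p.Publish(e.Type, e.Payload); err != nil {
			continue // leave the event untouched; retry on the next pass
		}
		e.Processed = true
		published++
	}
	return published
}

// outageBroker rejects publishes until failuresLeft reaches zero.
type outageBroker struct {
	failuresLeft int
	delivered    []string
}

func (b *outageBroker) Publish(topic string, payload []byte) error {
	if b.failuresLeft > 0 {
		b.failuresLeft--
		return errors.New("broker unavailable")
	}
	b.delivered = append(b.delivered, topic)
	return nil
}
```

With one pending event and `failuresLeft` set to 2, the first two calls to `drainOnce` publish nothing and the event stays pending; the third call delivers it.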
8. The Hidden Problem: Duplicate Delivery
Outbox guarantees:
- at-least-once delivery
Not:
- exactly once
This means:
- consumer may receive duplicates
Consumers MUST be idempotent.
This connects directly to:
- retries
- idempotency keys
- distributed consistency
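One common way to make a consumer idempotent is to remember which event IDs it has already handled. The sketch below assumes each message carries the outbox row's `id`; `Message` and `Deduplicator` are hypothetical names, and the in-memory set is only for illustration.

```go
package main

import "sync"

// Message is a hypothetical envelope; ID is assumed to be the outbox
// row's id, carried along with every published event.
type Message struct {
	ID      string
	Payload []byte
}

// Deduplicator remembers processed message IDs. An in-memory set is
// enough to show the idea; a production consumer would more likely keep
// the IDs in its own database, written together with the side effect.
type Deduplicator struct {
	mu   sync.Mutex
	seen map[string]struct{}
}

func NewDeduplicator() *Deduplicator {
	return &Deduplicator{seen: make(map[string]struct{})}
}

// Handle runs fn for a message ID until it has succeeded once; later
// deliveries with the same ID are skipped. It returns true only when fn
// ran and succeeded during this call.
func (d *Deduplicator) Handle(m Message, fn func(Message) error) (bool, error) {
	// Holding the lock while fn runs serializes handlers, which keeps
	// the sketch simple at the cost of throughput.
	d.mu.Lock()
	defer d.mu.Unlock()

	if _, dup := d.seen[m.ID]; dup {
		return false, nil
	}
	if err := fn(m); err != nil {
		return false, err // not recorded, so a redelivery retries it
	}
	d.seen[m.ID] = struct{}{}
	return true, nil
}
```

Delivering the same `Message` twice runs the handler once: the first call returns true, the second returns false without invoking `fn`.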
9. Handling Retries Properly
Never retry infinitely without control.
Track retries:
retries INT DEFAULT 0
Update:
UPDATE outbox_events
SET retries = retries + 1
WHERE id = $1
Once an event exceeds its retry limit:
- dead-letter queue
- manual inspection
- alerting
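The retry counter becomes useful once something acts on it. The following is a sketch of one possible policy, capped exponential backoff with a retry limit. `RetryPolicy` and its fields are hypothetical, and how the computed delay is persisted (for example in an extra column) is left out.

```go
package main

import "time"

// RetryPolicy is a hypothetical helper; the limits are chosen by the
// caller and are not recommendations.
type RetryPolicy struct {
	MaxRetries int
	BaseDelay  time.Duration
	MaxDelay   time.Duration
}

// Exhausted reports whether an event with the given retry count should
// stop being retried and be handed to a dead-letter path instead.
func (p RetryPolicy) Exhausted(retries int) bool {
	return retries >= p.MaxRetries
}

// NextDelay returns how long to wait before the next attempt:
// BaseDelay doubled once per previous retry, capped at MaxDelay.
func (p RetryPolicy) NextDelay(retries int) time.Duration {
	d := p.BaseDelay
	for i := 0; i < retries; i++ {
		d *= 2
		if d >= p.MaxDelay {
			return p.MaxDelay
		}
	}
	if d > p.MaxDelay {
		return p.MaxDelay
	}
	return d
}
```

A publisher could call `Exhausted(retries)` after each failed attempt and, when it returns true, move the row to a dead-letter table and raise an alert rather than trying again.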
10. Polling vs CDC (Change Data Capture)
Simple approach:
- polling outbox table
Advanced approach:
- Debezium / WAL streaming
- CDC-based event publishing
Tradeoff:
| Polling | CDC |
|---|---|
| Simple | Complex |
| Easier ops | Higher throughput |
| Slight latency | Near real-time |
Most systems should start with polling.
11. Concurrency Pitfall: Multiple Workers
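If multiple publisher instances run:
Two workers may publish the same event.
Solution:
- row locking
Example:
SELECT id, event_type, payload
FROM outbox_events
WHERE processed_at IS NULL
ORDER BY created_at
LIMIT 100
FOR UPDATE SKIP LOCKED
The row locks last only as long as the surrounding transaction, so the SELECT and the later UPDATE must share one.
This matters in Kubernetes deployments, where several replicas of the publisher usually run at once.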
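A minimal sketch of a worker pass built around this query, using `database/sql` and assuming PostgreSQL. `PublishFunc`, `claimed`, and `publishBatch` are hypothetical names, and the broker call is injected so no particular Kafka client is implied.

```go
package main

import (
	"context"
	"database/sql"
)

// PublishFunc is a hypothetical stand-in for the broker client.
type PublishFunc func(ctx context.Context, eventType string, payload []byte) error

const claimQuery = `
SELECT id, event_type, payload
FROM outbox_events
WHERE processed_at IS NULL
ORDER BY created_at
LIMIT 100
FOR UPDATE SKIP LOCKED`

type claimed struct {
	id, eventType string
	payload       []byte
}

// publishBatch claims up to 100 pending rows, publishes them and marks
// them processed, all inside one transaction. Rows locked by another
// worker are skipped rather than waited on.
func publishBatch(ctx context.Context, db *sql.DB, publish PublishFunc) (int, error) {
	tx, err := db.BeginTx(ctx, nil)
	if err != nil {
		return 0, err
	}
	defer tx.Rollback() // no-op after a successful Commit

	rows, err := tx.QueryContext(ctx, claimQuery)
	if err != nil {
		return 0, err
	}
	// Read the whole batch first: the transaction's single connection
	// cannot run the UPDATEs below while the result set is still open.
	var batch []claimed
	for rows.Next() {
		var c claimed
		if err := rows.Scan(&c.id, &c.eventType, &c.payload); err != nil {
			rows.Close()
			return 0, err
		}
		batch = append(batch, c)
	}
	rows.Close()
	if err := rows.Err(); err != nil {
		return 0, err
	}

	published := 0
	for _, c := range batch {
		if err := publish(ctx, c.eventType, c.payload); err != nil {
			continue // stays pending; its lock is released at commit
		}
		if _, err := tx.ExecContext(ctx,
			`UPDATE outbox_events SET processed_at = NOW() WHERE id = $1`,
			c.id,
		); err != nil {
			return published, err
		}
		published++
	}
	return published, tx.Commit()
}
```

One trade-off of this shape is that the transaction, and therefore the row locks, stay open for the duration of the broker calls, so small batches and publish timeouts are worth considering.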
12. Observability Matters
Track:
- pending outbox size
- retry count
- oldest unprocessed event
- publish latency
- dead-letter count
Danger signal:
a growing outbox table
This means the publisher is falling behind or the broker is unavailable, and downstream systems are working from stale data.
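Two of these signals, backlog size and the age of the oldest unprocessed event, are enough for a basic health check. The sketch below uses hypothetical `OutboxStats` and `Thresholds` types; the snapshot is assumed to come from a periodic query against the outbox table, and the limits are example values only.

```go
package main

import "time"

// OutboxStats is a hypothetical snapshot a metrics collector might
// take, for example from:
//   SELECT COUNT(*), MIN(created_at)
//   FROM outbox_events WHERE processed_at IS NULL
type OutboxStats struct {
	Pending       int
	OldestPending time.Time // zero when nothing is pending
}

// Thresholds are alerting limits supplied by the operator.
type Thresholds struct {
	MaxPending int
	MaxAge     time.Duration
}

// Lag returns the age of the oldest unprocessed event at time now.
func (s OutboxStats) Lag(now time.Time) time.Duration {
	if s.Pending == 0 || s.OldestPending.IsZero() {
		return 0
	}
	return now.Sub(s.OldestPending)
}

// Unhealthy reports whether the backlog size or the lag exceeds the
// configured limits.
func (s OutboxStats) Unhealthy(now time.Time, t Thresholds) bool {
	return s.Pending > t.MaxPending || s.Lag(now) > t.MaxAge
}
```

Lag is often the more telling of the two: a small backlog whose oldest event is several minutes old points at a stuck publisher even when the row count looks harmless.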
13. Real Production Failure Story
Classic outage pattern:
- Kafka degraded
- API kept accepting writes
- events silently failed
- downstream inventory never updated
Without outbox:
- permanent inconsistency
After outbox:
- events queued safely
- Kafka recovered later
- system healed automatically
This is resilience.
14. Production Lessons
The outbox pattern teaches an important engineering truth:
Reliability is not preventing failure.
It’s surviving failure without losing correctness.
Distributed systems WILL:
- retry
- duplicate
- reorder
- partially fail
Your architecture must expect this.
Final Thoughts
The Outbox Pattern is one of the most important patterns in modern backend engineering.
It solves:
- dual write inconsistency
- event loss
- partial failures
But it also forces you to think carefully about:
- idempotency
- retries
- observability
- operational recovery
Reliable distributed systems are not built by hoping failures won’t happen.
They are built by assuming they absolutely will.