"Your dashboard shows 'processing'—but the order shipped hours ago."
Command Query Responsibility Segregation (CQRS) promises blazing-fast reads and tangle-free writes. But when your read models lag behind reality, you get:
- Customers seeing outdated balances
- Support teams fighting phantom data
- A creeping distrust of your entire system
Let’s dissect why this happens—and how to fix it.
1. The Root Causes
Pitfall 1: Eventual Consistency Isn’t Eventual Enough
Scenario:
- User cancels an order (`CancelOrderCommand` succeeds)
- Read model still shows "active" 5 seconds later
Why?
- The command updated the write model
- The projection handler hasn’t processed the event yet
Fix:
# Set an SLA for consistency (a custom app setting; Rails does not enforce it)
Rails.application.config.x.eventual_consistency_max_delay = 1.second # Alert if breached
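A setting like this only helps if something measures against it. One way to do that is to compare when an event was recorded with when the projection applied it. The sketch below is plain Ruby with hypothetical names (`ConsistencyMonitor`, `record`, `breaches`); it is not part of Rails or any gem, and the 1-second threshold simply mirrors the setting above.

```ruby
# Minimal sketch: track projection delay per event and flag SLA breaches.
# ConsistencyMonitor and its methods are hypothetical names for illustration.
class ConsistencyMonitor
  Sample = Struct.new(:event_id, :delay)

  def initialize(max_delay:)
    @max_delay = max_delay # seconds
    @samples = []
  end

  # occurred_at:  when the write side stored the event
  # projected_at: when the read model finished applying it
  def record(event_id, occurred_at:, projected_at:)
    @samples << Sample.new(event_id, projected_at - occurred_at)
  end

  # Events whose projection took longer than the SLA allows
  def breaches
    @samples.select { |s| s.delay > @max_delay }
  end
end

monitor = ConsistencyMonitor.new(max_delay: 1.0)
t0 = Time.at(1_700_000_000)
monitor.record("evt-1", occurred_at: t0, projected_at: t0 + 0.2)
monitor.record("evt-2", occurred_at: t0, projected_at: t0 + 5)
monitor.breaches.each { |s| puts "SLA breached for #{s.event_id}: #{s.delay}s" }
```

In a real system the two timestamps would come from the event metadata and the projection handler, and the breach list would feed your alerting.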
Pitfall 2: Poisoned Events
Scenario:
- `OrderShipped` event fails to process (bug in handler)
- Read model never updates
Why?
- No dead letter queue for failed events
- No automated retries
Fix:
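# With RabbitMQ, using the Sneakers gem
class OrderShippedHandler
  include Sneakers::Worker
  from_queue "order_shipped" # queue name is illustrative

  def work(event)
    process_event(event) # your projection logic
    ack! # Only on success
  rescue StandardError => e
    log_error(e)
    # Assumes a retry/dead-letter handler is configured for this queue
    reject!
  end
end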
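Whatever the broker, the policy is the same: retry a bounded number of times, then park the event where someone can inspect it. The following plain-Ruby sketch shows that policy in isolation. `RetryingProcessor` and `dead_letters` are hypothetical names, and an in-memory array stands in for a real dead letter queue.

```ruby
# Minimal sketch: retry a failing handler a bounded number of times,
# then park the event in a dead letter list instead of dropping it.
# All names here are hypothetical; a real system would use broker features.
class RetryingProcessor
  attr_reader :dead_letters

  def initialize(max_attempts: 3, &handler)
    @max_attempts = max_attempts
    @handler = handler
    @dead_letters = []
  end

  # Returns true if the handler eventually succeeded, false if dead-lettered.
  def process(event)
    attempts = 0
    begin
      attempts += 1
      @handler.call(event)
      true
    rescue StandardError => e
      retry if attempts < @max_attempts
      @dead_letters << { event: event, error: e.message, attempts: attempts }
      false
    end
  end
end

calls = 0
flaky = RetryingProcessor.new(max_attempts: 3) do |_event|
  calls += 1
  raise "transient failure" if calls < 3 # succeeds on the third attempt
end
flaky.process({ type: "OrderShipped", order_id: "123" })

poisoned = RetryingProcessor.new(max_attempts: 3) { |_event| raise "bug in handler" }
poisoned.process({ type: "OrderShipped", order_id: "456" })
puts poisoned.dead_letters.length
```

A production version would also wait between attempts (for example with exponential backoff) so that a struggling dependency is not hammered.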
Pitfall 3: Clock Drift Across Services
Scenario:
- Payment service emits `PaymentProcessed` at 2:00:00 PM
- Inventory service processes it at 2:00:03 PM
- Read model shows "paid but out of stock" because the events were applied in the wrong sequence
Why?
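- Each service stamps events with its own wall clock, and those clocks drift
- Ordering by timestamp therefore processes events out of chronological order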
Fix:
# Pair the wall-clock timestamp with a logical counter
# (a simplified step toward hybrid logical clocks)
event = PaymentProcessed.new(
  order_id: "123",
  timestamp: Time.now,
  logical_clock: last_clock + 1
)
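A bare counter next to `Time.now` is only part of the idea. For reference, here is a compact sketch of a hybrid logical clock that follows the commonly published update rules: take the max of local, remote, and physical time, and bump a counter on ties. The class name `HybridLogicalClock` and the injectable `now` lambda are assumptions made for illustration.

```ruby
# Minimal hybrid logical clock sketch. Timestamps are [wall, counter] pairs
# that sort correctly as arrays even when physical clocks disagree.
class HybridLogicalClock
  attr_reader :wall, :counter

  # now: callable returning physical time as an integer (e.g. milliseconds)
  def initialize(now: -> { (Time.now.to_f * 1000).to_i })
    @now = now
    @wall = 0
    @counter = 0
  end

  # Call when emitting a local event
  def tick
    previous = @wall
    @wall = [previous, @now.call].max
    @counter = @wall == previous ? @counter + 1 : 0
    [@wall, @counter]
  end

  # Call when receiving an event stamped [remote_wall, remote_counter]
  def receive(remote_wall, remote_counter)
    previous = @wall
    @wall = [previous, remote_wall, @now.call].max
    @counter =
      if @wall == previous && @wall == remote_wall
        [@counter, remote_counter].max + 1
      elsif @wall == previous
        @counter + 1
      elsif @wall == remote_wall
        remote_counter + 1
      else
        0
      end
    [@wall, @counter]
  end
end

# The payment service's clock runs 3 units ahead of the inventory service's
payment = HybridLogicalClock.new(now: -> { 1_003 })
inventory = HybridLogicalClock.new(now: -> { 1_000 })

paid = payment.tick                 # stamp for PaymentProcessed
reserved = inventory.receive(*paid) # inventory handles it afterwards
puts((paid <=> reserved) == -1)     # the cause sorts before the effect
```

Because the receiver folds the sender's stamp into its own clock, anything inventory emits after handling the payment sorts after it, regardless of which machine's wall clock is ahead.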
2. Debugging a Stale Read Model
Step 1: Check the Event Horizon
# How far behind is the projection?
last_processed_event = ReadModelStatus.last_event_for("orders")
latest_event = EventStore.max_event_id
lag = latest_event - last_processed_event # Alert if > 100
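The same check can be wrapped in a small function that a scheduler or health endpoint calls. This sketch assumes event IDs are monotonically increasing integers; `projection_lag_status` is a hypothetical name, and the default threshold of 100 is taken from the comment above.

```ruby
# Minimal sketch: classify projection lag from two event positions.
# Assumes event IDs are monotonically increasing integers.
def projection_lag_status(latest_event_id:, last_processed_id:, threshold: 100)
  lag = latest_event_id - last_processed_id
  raise ArgumentError, "projection is ahead of the event store" if lag.negative?

  { lag: lag, status: lag > threshold ? :alert : :ok }
end

puts projection_lag_status(latest_event_id: 10_250, last_processed_id: 10_240)
puts projection_lag_status(latest_event_id: 10_250, last_processed_id: 9_000)
```

Raising on a negative lag is a deliberate choice here: a projection that claims to be ahead of the event store usually points to a bookkeeping bug worth surfacing.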
Step 2: Rebuild Suspicious Projections
# Force a rebuild
OrderProjection.rebuild!(order_id)
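`rebuild!` is not something a framework gives you for free. What such a method typically does is discard the projected state and replay that aggregate's events in order. Here is a minimal in-memory sketch, where `EVENTS`, `STATUS_FOR`, `apply`, and `rebuild_order` are hypothetical stand-ins for your event store and projection code:

```ruby
# Minimal sketch: rebuild one order's read model by replaying its events.
# EVENTS stands in for an event store; all names are hypothetical.
EVENTS = [
  { id: 1, order_id: "123", type: "OrderPlaced" },
  { id: 2, order_id: "456", type: "OrderPlaced" },
  { id: 3, order_id: "123", type: "OrderShipped" },
  { id: 4, order_id: "456", type: "OrderCancelled" }
].freeze

STATUS_FOR = {
  "OrderPlaced" => "processing",
  "OrderShipped" => "shipped",
  "OrderCancelled" => "cancelled"
}.freeze

# Fold a single event into the current projection state
def apply(state, event)
  state.merge(status: STATUS_FOR.fetch(event[:type]), last_event_id: event[:id])
end

# Start from an empty state and replay the order's events in ID order
def rebuild_order(order_id, events = EVENTS)
  events
    .select { |e| e[:order_id] == order_id }
    .sort_by { |e| e[:id] }
    .reduce({ order_id: order_id }) { |state, e| apply(state, e) }
end

puts rebuild_order("123")
```

If the rebuilt state differs from what the read model currently holds, the projection missed or misapplied an event, which narrows the search considerably.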
Step 3: Trace the Data Flow
1. Find the command (e.g., `CancelOrderCommand`)
2. Locate its emitted event (`OrderCancelled`)
3. Follow the projection handler (`OrderCancelledHandler`)
3. Prevention Strategies
✅ Set consistency SLAs (e.g., "Reads lag ≤1s")
✅ Add synthetic tests (e.g., "Write then immediately read")
✅ Use idempotent handlers (retries shouldn’t explode)
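Idempotency usually comes down to remembering which events a handler has already applied. The sketch below keeps processed event IDs in a `Set`; in a real system that record would typically be stored alongside the read model so both update together. `IdempotentOrderProjection` and the event shape are assumptions for illustration.

```ruby
require "set"

# Minimal sketch: a projection that ignores events it has already applied,
# so a retried or redelivered event cannot change the read model twice.
class IdempotentOrderProjection
  attr_reader :shipped_count

  def initialize
    @seen_event_ids = Set.new
    @shipped_count = 0
  end

  # Returns true if the event was applied, false if it was a duplicate.
  def handle(event)
    # Set#add? returns nil when the ID is already present
    return false unless @seen_event_ids.add?(event[:id])

    @shipped_count += 1 if event[:type] == "OrderShipped"
    true
  end
end

projection = IdempotentOrderProjection.new
event = { id: "evt-42", type: "OrderShipped", order_id: "123" }
projection.handle(event)
projection.handle(event) # redelivery after a retry
puts projection.shipped_count
```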
"But Our Reads Are Fast Enough!"
Until they’re not. Start small:
- Add lag metrics to one critical read model
- Implement one retry handler
- Run a chaos test (kill a projection worker)
Hit a CQRS landmine? Share your war story below.