DEV Community

Alex Aslam


CQRS Pitfalls: Why Your Read Model is Stale

"Your dashboard shows 'processing'—but the order shipped hours ago."

Command Query Responsibility Segregation (CQRS) promises blazing-fast reads and tangle-free writes. But when your read models lag behind reality, you get:

  • Customers seeing outdated balances
  • Support teams fighting phantom data
  • A creeping distrust of your entire system

Let’s dissect why this happens—and how to fix it.


1. The Root Causes

Pitfall 1: Eventual Consistency Isn’t Eventual Enough

Scenario:

  • User cancels an order (CancelOrderCommand succeeds)
  • Read model still shows "active" 5 seconds later

Why?

  • The command updated the write model
  • The projection handler hasn’t processed the event yet

Fix:

# Set an SLA for consistency (a custom setting: Rails only stores it, your monitoring must alert on breaches)
Rails.application.config.x.eventual_consistency_max_delay = 1.second
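A config value on its own does nothing; something still has to measure the delay and compare it to the target. Here is a minimal sketch of that check in plain Ruby. It assumes each event carries an `occurred_at` time and that the projection records when it applied the event. `ProjectedEvent`, `breaches`, and the field names are hypothetical, not part of Rails or any library.

```ruby
# Hypothetical record of one event as seen by the projection.
ProjectedEvent = Struct.new(:id, :occurred_at, :projected_at) do
  # Seconds between the write and the read model catching up.
  def delay
    projected_at - occurred_at
  end
end

MAX_DELAY = 1.0 # seconds; the 1-second SLA from the example

# Returns the events whose projection delay exceeded the SLA.
def breaches(events, max_delay: MAX_DELAY)
  events.select { |event| event.delay > max_delay }
end

now = Time.at(1_700_000_000)
events = [
  ProjectedEvent.new(1, now, now + 0.2), # within the SLA
  ProjectedEvent.new(2, now, now + 5.0)  # the 5-second lag from the scenario
]

breaches(events).each do |event|
  warn "SLA breached: event #{event.id} took #{event.delay.round(2)}s to project"
end
```

In a real system the `warn` call would be a metric or a pager alert, and the check would run continuously rather than over a fixed list.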

Pitfall 2: Poisoned Events

Scenario:

  • OrderShipped event fails to process (bug in handler)
  • Read model never updates

Why?

  • No dead letter queue for failed events
  • No automated retries

Fix:

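# With RabbitMQ (Sneakers)
class OrderShippedHandler
  include Sneakers::Worker
  from_queue "order_shipped" # Queue name is illustrative

  def work(event)
    process_event(event) # Your projection logic
    ack! # Only on success
  rescue StandardError => e
    log_error(e)
    # Reject instead of silently dropping: with a dead-letter exchange
    # configured on the queue, the message is parked there for retry or inspection
    reject!
  end
end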
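The same policy (retry a few times, then park the event) can be sketched without a broker. `RetryingConsumer`, `max_attempts`, and the in-memory `dead_letters` array are hypothetical stand-ins for whatever your queue provides, and a real setup would also wait between attempts.

```ruby
# Retries a failing handler a bounded number of times, then dead-letters the event.
class RetryingConsumer
  attr_reader :dead_letters

  def initialize(handler, max_attempts: 3)
    @handler = handler           # any object responding to #call(event)
    @max_attempts = max_attempts
    @dead_letters = []           # stand-in for a real dead letter queue
  end

  # Returns :ack on success, :dead_lettered once retries are exhausted.
  def consume(event)
    attempts = 0
    begin
      attempts += 1
      @handler.call(event)
      :ack
    rescue StandardError => e
      retry if attempts < @max_attempts
      @dead_letters << { event: event, error: e.message, attempts: attempts }
      :dead_lettered
    end
  end
end

# A handler that fails twice and then succeeds, like a transient outage.
calls = 0
flaky = lambda do |_event|
  calls += 1
  raise "read store unavailable" if calls < 3
end

consumer = RetryingConsumer.new(flaky)
consumer.consume({ type: "OrderShipped", order_id: "123" }) # => :ack on the third attempt
```

A handler with a genuine bug never succeeds, so it ends up in `dead_letters` instead of blocking every event behind it.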

Pitfall 3: Clock Drift Across Services

Scenario:

  • Payment service emits PaymentProcessed at 2:00:00 PM
  • Inventory service processes it at 2:00:03 PM
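  • Read model shows "paid but out of stock" (events applied in the wrong order)

Why?

  • Each service stamps events with its own wall clock, so timestamps from different machines can't be compared reliably and events get applied out of chronological order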

Fix:

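# Use hybrid logical clocks: pair wall-clock time with a logical counter
# (last_clock is assumed to be the highest logical clock value this service has seen)
event = PaymentProcessed.new(
  order_id: "123",
  timestamp: Time.now.utc,
  logical_clock: last_clock + 1
)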
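A counter that only goes up by one behaves like a Lamport clock. A hybrid logical clock additionally tracks the largest wall-clock value it has seen, locally or from other services. Here is a minimal sketch of the standard update rules. The class and method names are illustrative, and the `now:` lambda is injected so clock drift can be simulated.

```ruby
# Minimal hybrid logical clock (HLC) sketch; names are illustrative.
class HybridLogicalClock
  # Orders first by wall-clock milliseconds, then by the logical counter.
  Timestamp = Struct.new(:wall, :counter) do
    include Comparable

    def <=>(other)
      [wall, counter] <=> [other.wall, other.counter]
    end
  end

  def initialize(now: -> { (Time.now.to_f * 1000).to_i })
    @now = now
    @wall = 0
    @counter = 0
  end

  # Call when emitting an event locally.
  def tick
    physical = @now.call
    if physical > @wall
      @wall = physical
      @counter = 0
    else
      @counter += 1
    end
    Timestamp.new(@wall, @counter)
  end

  # Call when receiving an event stamped by another service.
  def observe(remote)
    physical = @now.call
    previous = @wall
    @wall = [previous, remote.wall, physical].max
    @counter =
      if @wall == previous && @wall == remote.wall
        [@counter, remote.counter].max + 1
      elsif @wall == previous
        @counter + 1
      elsif @wall == remote.wall
        remote.counter + 1
      else
        0
      end
    Timestamp.new(@wall, @counter)
  end
end

payment   = HybridLogicalClock.new(now: -> { 1_000 })
inventory = HybridLogicalClock.new(now: -> { 900 }) # wall clock runs 100 ms behind

paid = payment.tick        # stamped by the payment service
inventory.observe(paid)    # inventory receives PaymentProcessed
reserved = inventory.tick  # stamped by the inventory service

puts reserved > paid # => true, even though inventory's wall clock is behind
```

The read model can then sort by these timestamps instead of raw `Time.now` values, so an event caused by another never sorts before it.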

2. Debugging a Stale Read Model

Step 1: Check the Event Horizon

# How far behind is the projection?
# (assumes event IDs are sequential integers, so the difference is a count of events)
last_processed_event = ReadModelStatus.last_event_for("orders")
latest_event = EventStore.max_event_id
lag = latest_event - last_processed_event # Alert if > 100

Step 2: Rebuild Suspicious Projections

# Force a rebuild
OrderProjection.rebuild!(order_id)
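What `rebuild!` does depends on your projection code, which isn't shown here. One common shape is to throw away the stored row and replay the aggregate's events in sequence order. Here is a sketch over an in-memory event list, where `Event`, `OrderReadModel`, `STATUS_AFTER`, and `rebuild_order` are all hypothetical names.

```ruby
Event = Struct.new(:sequence, :order_id, :type)

# Maps each event type to the status the read model should show afterwards.
STATUS_AFTER = {
  "OrderPlaced"    => "active",
  "OrderShipped"   => "shipped",
  "OrderCancelled" => "cancelled"
}.freeze

OrderReadModel = Struct.new(:order_id, :status, :last_sequence)

# Rebuilds one order's read model from scratch by replaying its events in order.
def rebuild_order(order_id, all_events)
  model = OrderReadModel.new(order_id, nil, 0)
  all_events
    .select { |event| event.order_id == order_id }
    .sort_by(&:sequence)
    .each do |event|
      # Unknown event types leave the status unchanged.
      model.status = STATUS_AFTER.fetch(event.type, model.status)
      model.last_sequence = event.sequence
    end
  model
end

events = [
  Event.new(2, "123", "OrderShipped"),
  Event.new(1, "123", "OrderPlaced"),
  Event.new(3, "456", "OrderPlaced")
]

rebuilt = rebuild_order("123", events)
puts "#{rebuilt.status} (up to event #{rebuilt.last_sequence})" # prints "shipped (up to event 2)"
```

If the rebuilt row differs from what the read store currently holds, the projection missed or misapplied an event, which narrows down where to look next.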

Step 3: Trace the Data Flow

1. Find the command (e.g., `CancelOrderCommand`)
2. Locate its emitted event (`OrderCancelled`)
3. Follow the projection handler (`OrderCancelledHandler`)

3. Prevention Strategies

  • Set consistency SLAs (e.g., "Reads lag ≤1s")
  • Add synthetic tests (e.g., "Write then immediately read")
  • Use idempotent handlers (retries shouldn’t explode)
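To make the last point concrete: an idempotent handler remembers which event IDs it has already applied, so a redelivered event changes nothing. Here is a minimal in-memory sketch. `IdempotentOrderProjection` and the event hash shape are hypothetical, and in a real system the processed-ID record should be committed in the same transaction as the read model update.

```ruby
require "set"

# Projection that applies each event at most once, keyed by event id.
class IdempotentOrderProjection
  attr_reader :shipped_count

  def initialize
    @processed_ids = Set.new # stand-in for a table in the read store
    @shipped_count = 0
  end

  # Returns true if the event was applied, false if it was a duplicate.
  def handle(event)
    # Set#add? returns nil when the id is already present.
    return false unless @processed_ids.add?(event[:id])

    @shipped_count += 1 if event[:type] == "OrderShipped"
    true
  end
end

projection = IdempotentOrderProjection.new
event = { id: "evt-1", type: "OrderShipped" }

projection.handle(event) # => true, applied
projection.handle(event) # => false, redelivery ignored
puts projection.shipped_count # prints 1
```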


"But Our Reads Are Fast Enough!"

Until they’re not. Start small:

  1. Add lag metrics to one critical read model
  2. Implement one retry handler
  3. Run a chaos test (kill a projection worker)

Hit a CQRS landmine? Share your war story below.
