When a task fails in a traditional message queue, it goes to a dead letter queue — a holding area for failed messages that you can inspect, retry, or discard.
Your AI agent needs the same thing.
## The Problem: Silent Failures and Infinite Loops
Most agents handle failures in one of two ways:
- Retry forever — the agent keeps hammering the same broken step
- Fail silently — the agent gives up and moves on, leaving no trace
Both are bad. Infinite retries waste tokens and time. Silent failures mean you never know what broke — until something downstream blows up.
## The Pattern: `failed-tasks.json`
Add this rule to your `SOUL.md`:

```
If a task fails after 3 retries, write it to failed-tasks.json with:

- task_id
- description
- error_reason
- context_snapshot
- timestamp

Then stop and move to the next task.
```
Now your agent has a dead letter queue.
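In code, the rule above looks something like this. A minimal sketch, assuming each task is a dict with `id`, `description`, and a callable `fn` — the helper names and file layout are illustrative, not a fixed API:

```python
import json
from datetime import datetime, timezone
from pathlib import Path

FAILED_TASKS = Path("failed-tasks.json")
MAX_RETRIES = 3

def record_failure(task, error, context):
    """Append a dead-letter entry with the fields from the rule above."""
    entries = json.loads(FAILED_TASKS.read_text()) if FAILED_TASKS.exists() else []
    entries.append({
        "task_id": task["id"],
        "description": task["description"],
        "error_reason": f"{error} after {MAX_RETRIES} retries",
        "context_snapshot": context,
        "timestamp": datetime.now(timezone.utc).isoformat(),
    })
    FAILED_TASKS.write_text(json.dumps(entries, indent=2))

def run_with_dead_letter(task, context):
    """Try the task up to MAX_RETRIES times; on the final failure, record it and move on."""
    for attempt in range(1, MAX_RETRIES + 1):
        try:
            return task["fn"]()
        except Exception as exc:
            if attempt == MAX_RETRIES:
                record_failure(task, exc, context)
    return None  # signals the caller to move to the next task
```

The key design choice: `run_with_dead_letter` never raises. Failure is data, not an exception, so one broken task can't take the whole run down with it.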
## What This Buys You
**Visibility:** You can see exactly what failed and why. No more mystery failures.

**Auditability:** Every failure is timestamped and explained. You know when it started, what the agent was trying to do, and what went wrong.

**Recoverability:** Failed tasks are a backlog, not lost work. You can review them, fix the underlying issue, and replay them.

**Safety:** The agent stops escalating a broken task. It fails gracefully instead of spinning.
## Example `failed-tasks.json` Entry
```json
{
  "task_id": "price-check-AAPL-2026-03-09T03:30:00Z",
  "description": "Fetch current AAPL price from data provider",
  "error_reason": "HTTP 503 after 3 retries — data provider unavailable",
  "context_snapshot": {
    "portfolio_action_pending": "rebalance",
    "dependent_tasks": ["calculate-allocation"]
  },
  "timestamp": "2026-03-09T03:30:47Z"
}
```
The agent moved on. The failure is recorded. The dependent tasks are noted. Nothing was lost.
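Because nothing was lost, a later pass can drain the queue. Here is one way a replay step might look — a sketch that assumes you can map each `task_id` back to a callable handler; the function name and handler signature are made up for illustration:

```python
import json
from pathlib import Path

FAILED_TASKS = Path("failed-tasks.json")

def replay_failed(handlers):
    """Retry each recorded failure; drop entries that now succeed, keep the rest."""
    if not FAILED_TASKS.exists():
        return 0
    entries = json.loads(FAILED_TASKS.read_text())
    still_failing, recovered = [], 0
    for entry in entries:
        try:
            handlers[entry["task_id"]](entry["context_snapshot"])
            recovered += 1
        except Exception:
            still_failing.append(entry)  # leave it in the queue for the next pass
    FAILED_TASKS.write_text(json.dumps(still_failing, indent=2))
    return recovered
```

Entries that fail again simply stay in the file, so replaying is safe to run repeatedly.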
## Extending the Pattern
Once you have a dead letter queue, you can build on it:
- **Alert rules:** If `failed-tasks.json` has more than N entries, write to `outbox.json` for human review
- **Retry jobs:** A separate agent reads `failed-tasks.json` on a schedule and retries resolved issues
- **Pattern detection:** If the same `task_type` fails repeatedly, flag it as a systemic issue
## The Reliability Stack
This pattern pairs well with others:
- **Circuit breaker** — stops retrying after max attempts
- **Dead letter queue** — captures what the circuit breaker stopped
- **Escalation rule** — alerts a human when the queue grows
Together, they turn agent failures from mysteries into manageable backlogs.
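The escalation rule from the stack above might look like this — a minimal sketch where the threshold, file names, and alert shape are all assumptions you'd tune for your own setup:

```python
import json
from pathlib import Path

FAILED_TASKS = Path("failed-tasks.json")
OUTBOX = Path("outbox.json")

def escalate_if_needed(threshold=5):
    """When the dead letter queue exceeds `threshold`, write an alert for human review."""
    entries = json.loads(FAILED_TASKS.read_text()) if FAILED_TASKS.exists() else []
    if len(entries) <= threshold:
        return False
    OUTBOX.write_text(json.dumps({
        "type": "alert",
        "message": f"{len(entries)} tasks in the dead letter queue need human review",
        "task_ids": [e["task_id"] for e in entries],
    }, indent=2))
    return True
```

Run it at the end of each agent cycle: most of the time it's a no-op, and when the queue grows past the threshold, a human gets a single summary instead of N separate failures.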
All of these patterns — dead letter queue, circuit breaker, escalation rules, idempotency keys — are in the Ask Patrick Library. Battle-tested configs you can drop into your SOUL.md today.
→ askpatrick.co — Library + Daily Briefing starts at $9/mo