When a task fails in a traditional message queue, it goes to a dead letter queue — a holding area for failed messages that you can inspect, retry, or discard.
Your AI agent needs the same thing.
## The Problem: Silent Failures and Infinite Loops
Most agents handle failures in one of two ways:
- Retry forever — the agent keeps hammering the same broken step
- Fail silently — the agent gives up and moves on, leaving no trace
Both are bad. Infinite retries waste tokens and time. Silent failures mean you never know what broke — until something downstream blows up.
## The Pattern: `failed-tasks.json`
Add this rule to your `SOUL.md`:

```
If a task fails after 3 retries, write it to failed-tasks.json with:

- task_id
- description
- error_reason
- context_snapshot
- timestamp

Then stop and move to the next task.
```
Now your agent has a dead letter queue.
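In code, the rule above looks something like this. A minimal sketch, assuming each task is a dict with `id`, `description`, and a callable `fn` — the helper names and file layout are illustrative, not a fixed API:

```python
import json
from datetime import datetime, timezone
from pathlib import Path

FAILED_TASKS = Path("failed-tasks.json")
MAX_RETRIES = 3

def record_failure(task, error, context):
    """Append a dead-letter entry with the fields from the rule above."""
    entries = json.loads(FAILED_TASKS.read_text()) if FAILED_TASKS.exists() else []
    entries.append({
        "task_id": task["id"],
        "description": task["description"],
        "error_reason": f"{error} after {MAX_RETRIES} retries",
        "context_snapshot": context,
        "timestamp": datetime.now(timezone.utc).isoformat(),
    })
    FAILED_TASKS.write_text(json.dumps(entries, indent=2))

def run_with_dead_letter(task, context):
    """Try the task up to MAX_RETRIES times; on the final failure, record it and move on."""
    for attempt in range(1, MAX_RETRIES + 1):
        try:
            return task["fn"]()
        except Exception as exc:
            if attempt == MAX_RETRIES:
                record_failure(task, exc, context)
    return None  # signals the caller to move to the next task
```

The key design choice: `run_with_dead_letter` never raises. Failure is data, not an exception, so one broken task can't take the whole run down with it.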
## What This Buys You
**Visibility:** You can see exactly what failed and why. No more mystery failures.

**Auditability:** Every failure is timestamped and explained. You know when it started, what the agent was trying to do, and what went wrong.

**Recoverability:** Failed tasks are a backlog, not lost work. You can review them, fix the underlying issue, and replay them.

**Safety:** The agent stops escalating a broken task. It fails gracefully instead of spinning.
## Example `failed-tasks.json` Entry
```json
{
  "task_id": "price-check-AAPL-2026-03-09T03:30:00Z",
  "description": "Fetch current AAPL price from data provider",
  "error_reason": "HTTP 503 after 3 retries — data provider unavailable",
  "context_snapshot": {
    "portfolio_action_pending": "rebalance",
    "dependent_tasks": ["calculate-allocation"]
  },
  "timestamp": "2026-03-09T03:30:47Z"
}
```
The agent moved on. The failure is recorded. The dependent tasks are noted. Nothing was lost.
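Because nothing was lost, a later pass can drain the queue. Here is one way a replay step might look — a sketch that assumes you can map each `task_id` back to a callable handler; the function name and handler signature are made up for illustration:

```python
import json
from pathlib import Path

FAILED_TASKS = Path("failed-tasks.json")

def replay_failed(handlers):
    """Retry each recorded failure; drop entries that now succeed, keep the rest."""
    if not FAILED_TASKS.exists():
        return 0
    entries = json.loads(FAILED_TASKS.read_text())
    still_failing, recovered = [], 0
    for entry in entries:
        try:
            handlers[entry["task_id"]](entry["context_snapshot"])
            recovered += 1
        except Exception:
            still_failing.append(entry)  # leave it in the queue for the next pass
    FAILED_TASKS.write_text(json.dumps(still_failing, indent=2))
    return recovered
```

Entries that fail again simply stay in the file, so replaying is safe to run repeatedly.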
## Extending the Pattern
Once you have a dead letter queue, you can build on it:
- **Alert rules:** If `failed-tasks.json` has more than N entries, write to `outbox.json` for human review
- **Retry jobs:** A separate agent reads `failed-tasks.json` on a schedule and retries resolved issues
- **Pattern detection:** If the same `task_type` fails repeatedly, flag it as a systemic issue
## The Reliability Stack
This pattern pairs well with others:
- **Circuit breaker** — stops retrying after max attempts
- **Dead letter queue** — captures what the circuit breaker stopped
- **Escalation rule** — alerts a human when the queue grows
Together, they turn agent failures from mysteries into manageable backlogs.
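The escalation rule from the stack above might look like this — a minimal sketch where the threshold, file names, and alert shape are all assumptions you'd tune for your own setup:

```python
import json
from pathlib import Path

FAILED_TASKS = Path("failed-tasks.json")
OUTBOX = Path("outbox.json")

def escalate_if_needed(threshold=5):
    """When the dead letter queue exceeds `threshold`, write an alert for human review."""
    entries = json.loads(FAILED_TASKS.read_text()) if FAILED_TASKS.exists() else []
    if len(entries) <= threshold:
        return False
    OUTBOX.write_text(json.dumps({
        "type": "alert",
        "message": f"{len(entries)} tasks in the dead letter queue need human review",
        "task_ids": [e["task_id"] for e in entries],
    }, indent=2))
    return True
```

Run it at the end of each agent cycle: most of the time it's a no-op, and when the queue grows past the threshold, a human gets a single summary instead of N separate failures.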
All of these patterns — dead letter queue, circuit breaker, escalation rules, idempotency keys — are in the Ask Patrick Library. Battle-tested configs you can drop into your SOUL.md today.
→ askpatrick.co — Library + Daily Briefing starts at $9/mo