Most developers test if their AI agent works. Fewer test if it fails gracefully. The second test is more important.
Here is why: a working agent in perfect conditions is easy to build. An agent that degrades predictably under bad conditions is a production-ready system.
The Two Tests Most Teams Skip
1. The Bad Input Test
Give your agent something it was not designed to handle. A malformed JSON response. An empty result set. A contradictory instruction.
Now watch:
- Does it escalate or hallucinate?
- Does it log the failure with context?
- Does it stop cleanly or keep going with corrupted state?
If the answer to any of these is "I am not sure," you have a gap.
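One way to make the answers concrete is a validation layer that refuses to guess. The sketch below is a minimal example, not a prescribed API: `handle_tool_response` and `EscalationNeeded` are hypothetical names standing in for whatever wraps your agent's tool calls.

```python
import json
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("agent")

class EscalationNeeded(Exception):
    """Raised when the agent cannot safely proceed with the input it was given."""

def handle_tool_response(raw: str) -> dict:
    # Validate before acting: malformed JSON is a stop condition, not a guess.
    try:
        payload = json.loads(raw)
    except json.JSONDecodeError as err:
        # Log the failure with context (the error and a sample of the raw input).
        log.error("Malformed tool response: %s | raw=%r", err, raw[:200])
        raise EscalationNeeded("unparseable tool response") from err
    if not payload:
        # Empty result set: surface it rather than inventing data downstream.
        log.error("Empty result set from tool")
        raise EscalationNeeded("empty result set")
    return payload
```

The exception stops the run cleanly; whatever catches it decides whether to escalate to a human or retry, but the agent never continues with corrupted state.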
2. The Partial Failure Test
Disconnect one dependency mid-run. Kill the API it relies on. Remove a file it expects to exist.
A resilient agent should:
- Detect the failure immediately
- Write its current state to disk
- Log what it was doing and why it stopped
- Exit cleanly without corrupting downstream state
Most agents just crash or continue with bad data. Both outcomes are wrong.
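The four behaviors above can be wired into a single wrapper around each step of the run. This is a sketch under assumptions: `run_step`, `STATE_FILE`, and the single-file checkpoint format are illustrative, not a standard interface.

```python
import json
import sys
import logging
from pathlib import Path

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("agent")

STATE_FILE = Path("agent_state.json")  # hypothetical checkpoint location

def run_step(step_name: str, state: dict, action) -> dict:
    """Run one step of the agent; on any failure, checkpoint and exit cleanly."""
    try:
        state[step_name] = action()
        return state
    except Exception as err:
        # 1. Detect the failure immediately (the exception itself).
        # 2. Write current state to disk so a restart can resume from here.
        STATE_FILE.write_text(json.dumps(
            {"completed": state, "failed_step": step_name}))
        # 3. Log what the agent was doing and why it stopped.
        log.error("Step %r failed: %s (state checkpointed to %s)",
                  step_name, err, STATE_FILE)
        # 4. Exit cleanly; downstream consumers never see partial output.
        sys.exit(1)
```

A restart can then read `agent_state.json`, skip everything in `completed`, and resume at `failed_step` instead of replaying the whole run.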
The Failure Testing Checklist
For every agent you ship, run these five scenarios before production:
- [ ] Empty input — what happens with no data?
- [ ] Bad input — what happens with malformed data?
- [ ] Missing dependency — what happens when a tool is unavailable?
- [ ] Timeout — what happens when a call takes too long?
- [ ] Contradictory instructions — what happens when context conflicts?
For each scenario, you want three things to be true:
- The agent stops before it does damage
- The failure is logged with enough context to debug
- State is preserved so the agent can restart from where it left off
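The checklist can be run as a small harness. Everything here is a stand-in: `run_agent` represents your agent's real entry point, and the toy implementation just rejects anything that is not a usable dict. The first three scenarios are pure input tests; timeout and contradictory instructions need runtime fault injection (killing the API, a wall-clock timer) and are noted but not simulated here.

```python
# Hypothetical agent entry point; swap in your real interface.
def run_agent(payload):
    # Toy behavior: escalate on anything that is not a non-empty dict,
    # instead of pretending the run succeeded.
    if not isinstance(payload, dict) or not payload:
        return {"status": "escalated", "reason": "unusable input"}
    return {"status": "ok"}

SCENARIOS = {
    "empty_input": {},                    # no data
    "bad_input": "%%% not json %%%",      # malformed data
    "missing_dependency": None,           # stand-in for an unavailable tool
    # "timeout" and "contradictory_instructions" require fault injection
    # at runtime and are exercised separately.
}

def run_failure_suite():
    results = {}
    for name, payload in SCENARIOS.items():
        out = run_agent(payload)
        # Pass means the agent stopped (escalated) before doing damage.
        results[name] = out["status"] == "escalated"
    return results
```

A scenario "passes" when the agent escalates; a crash or a fabricated success is a failing result you fix before shipping.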
The Escalation Rule Pattern
The simplest way to build graceful failure is an explicit escalation rule in your agent config:
```
If you encounter input you cannot process, do not guess.
Write the input and your uncertainty to outbox.json.
Stop. Do not continue.
```
That one instruction prevents most silent failure modes. The agent does not hallucinate through uncertainty — it surfaces it.
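In code, the same rule is a few lines. This is a minimal sketch: `escalate`, `process`, and the `"???"` trigger are illustrative placeholders for however your agent detects input it cannot handle; only the `outbox.json` filename comes from the rule above.

```python
import json
from pathlib import Path

OUTBOX = Path("outbox.json")

def escalate(raw_input: str, uncertainty: str) -> None:
    """Record the input and the doubt, then let the caller stop the run."""
    OUTBOX.write_text(json.dumps({
        "input": raw_input,
        "uncertainty": uncertainty,
        "action": "halted pending human review",
    }, indent=2))

def process(raw_input: str) -> str:
    # Hypothetical "cannot process" check; a real agent might ask the model
    # to self-report whether the instruction is actionable.
    if "???" in raw_input:
        escalate(raw_input, "instruction contains unresolved placeholders")
        raise SystemExit(0)  # stop; do not continue
    return raw_input.upper()
```

The key property: uncertainty leaves a durable artifact a human can act on, instead of disappearing into a confident-sounding answer.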
Why This Is a Writing Problem
Graceful failure does not come from smarter models. It comes from clear instructions about what to do when things go wrong.
Most SOUL.md files describe what an agent should do in success cases. The best ones also describe what the agent should do in failure cases — specifically, what "I do not know" looks like and how to express it.
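A failure-case section in a config like that might read as follows. The wording is illustrative, not a template from any real library:

```markdown
## When you do not know

- If an instruction conflicts with earlier context, do not silently pick one.
  Write both interpretations to outbox.json and stop.
- If a tool call fails twice, stop retrying. Log the tool name, the inputs,
  and the error, then wait for a human.
- Express "I do not know" explicitly as an escalation entry, never as a
  confident answer.
```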
What We Ship at Ask Patrick
Every config in the Ask Patrick Library includes a failure handling section. Not just the happy path — the failure path. What the agent escalates, what it logs, and what it never guesses at.
That is the difference between a demo and a system you can leave running while you sleep.
Ask Patrick publishes battle-tested AI agent configurations at askpatrick.co. Library access starts at $9/month.