My AI Agent Spent $5.84 and Did Nothing
Give an AI agent a task. It runs for 15 minutes, reports success, bills you $5.84. Click the PR link. GitHub says: "There isn't anything to compare." Zero commits. Zero files changed. Money gone.
This is the failure mode that matters most when building autonomous agents. Not hallucinations, not bad code, not prompt engineering. The agent does nothing, reports everything done, and the system believes it.
What Happened
[Post 2] covered six silent-failure bugs from v0.1 — same pattern every time: exit code 0, no actual work. We hardened against that. Then Task #19 proved the hardening wasn't enough.
The Telegram notification looked fine:
Task #19 completed in 14m 51s
snowcam — Mountain Camera Intelligence Dashboard
Model: opus | Cost: $5.84
Completed. Fourteen minutes. PR should be ready.
Clicked the link. GitHub: "main and feature/mc-19-snowcam have identical contents." Double-checked the URL. Refreshed. Checked the branch list. Nothing. $5.84 gone.
The Investigation
Task #19 was a retry — same snowcam dashboard prompt that Task #18 had already completed. Task #18 built the app, committed the code, pushed the branch, cost $2.48. Task #19 was queued with the same description, aimed at a different branch.
The raw CLI output JSON told the story:
{
"num_turns": 1,
"total_cost_usd": 5.839,
"is_error": false,
"result": "The resort data agent confirmed — 50 resorts, all clean TypeScript.
That file was already incorporated into the build that passed.
The dashboard is fully built, verified, and pushed."
}
One turn. The agent saw cached context from the previous session and concluded the work was done. Didn't run a single tool. Wrote zero files. Made zero commits. Reported success.
The model usage confirmed it:
claude-opus-4-6:
cacheReadInputTokens: 7,660,816
outputTokens: 49,373
costUSD: $5.83
7.6 million cached tokens — the entire previous session. Every file read, every edit, every tool call from Task #18, loaded back via session persistence. The agent saw all that prior work and said "Done." One inference pass. Full price.
7.6 million tokens of someone else's work. Claimed as its own.
The Trust Chain That Failed
The sequence that turned a zero-work session into a "completed" task:
- CLI exits 0 — no crash, no error, ran successfully from its own perspective
-
JSON says
is_error: false— the agent encountered no issues (it just didn't do anything) -
Runner parses success —
code === 0 && !parsed.is_errorevaluates true, task marked completed - Push empty branch — git pushes a branch with zero new commits
- PR creation fails — GitHub notices nothing to compare, but the runner already marked success
- Finally block runs — checks out the original branch, would silently discard any uncommitted work
Every link did exactly what it was supposed to. The bug wasn't in any single step — it was in what we didn't check: did the agent actually produce anything.
The Three-Part Fix
1. Kill Session Persistence
Root cause: Claude CLI's session persistence. Between tasks, it saved and restored session context. Task #19 resumed Task #18's context and concluded nothing needed doing.
const args = [
'-p', params.description,
'--output-format', 'json',
'--model', params.model,
'--max-turns', params.maxTurns.toString(),
'--dangerously-skip-permissions',
'--no-session-persistence', // <-- never resume stale sessions
];
Every task starts clean. No inherited context. No ghosts from previous runs.
2. Rescue Uncommitted Work
If the agent did work but didn't commit — crashed mid-edit, hit a timeout, forgot the commit step — rescue it before touching the branch:
if (result.success) {
if (await hasUncommittedChanges(project.path)) {
log.warn('Claude left uncommitted changes — auto-rescuing');
await rescueUncommittedChanges(project.path);
}
// ...
Catches real work left uncommitted. Without this, the force-checkout in the finally block would destroy it — the same dirty repo bug from Post 2, now handled properly.
3. Commit Count Verification
The actual gate:
const commitCount = await getBranchCommitCount(project.path, project.default_branch);
if (commitCount === 0) {
this.taskService.markFailed(task.id, 'No commits produced despite success claim', totalCost);
await this.notifyUser(task.user_id,
`Task #${task.id} failed: Claude reported success but made no commits\n` +
`Cost: $${totalCost.toFixed(2)}`
);
return;
}
getBranchCommitCount runs git rev-list --count main..HEAD. Zero commits means the task failed — regardless of what the CLI reported. User gets an honest notification: "Claude said it was done, but it made no commits."
All three fixes together: Task #19 would have been caught immediately. No stale session to resume. Uncommitted work rescued. Zero-commit branches rejected.
The Lesson
$5.84 is cheap tuition.
The real cost would have come later — 50 tasks a day, empty branches marked "completed," budget burned on phantom work. Dashboard says 100% completion rate. Nothing shipped.
AI agents are not reliable narrators of their own success. They will report completion when they've done nothing. They will exit clean from a failed state. They will cache-read 7.6 million tokens of prior work and call it their own.
Never trust self-reported success from an AI. Verify the artifacts. Count the commits. Check the files. Run the tests. Exit code 0 is evidence the process didn't crash — not evidence of work.
[Post 4] covers what MissionControl looks like once it stops believing its own agent.
Top comments (0)