Beni

Posted on Mar 24

My AI Agent Spent $5.84 and Did Nothing

#ai #agents #typescript #devops

My AI Agent Spent $5.84 and Did Nothing

Give an AI agent a task. It runs for 15 minutes, reports success, bills you $5.84. Click the PR link. GitHub says: "There isn't anything to compare." Zero commits. Zero files changed. Money gone.

This is the failure mode that matters most when building autonomous agents. Not hallucinations, not bad code, not prompt engineering. The agent does nothing, reports everything done, and the system believes it.

What Happened

[Post 2] covered six silent-failure bugs from v0.1 — same pattern every time: exit code 0, no actual work. We hardened against that. Then Task #19 proved the hardening wasn't enough.

The Telegram notification looked fine:

Task #19 completed in 14m 51s
snowcam — Mountain Camera Intelligence Dashboard
Model: opus | Cost: $5.84

Completed. Fourteen minutes. PR should be ready.

Clicked the link. GitHub: "main and feature/mc-19-snowcam have identical contents." Double-checked the URL. Refreshed. Checked the branch list. Nothing. $5.84 gone.

The Investigation

Task #19 was a retry — same snowcam dashboard prompt that Task #18 had already completed. Task #18 built the app, committed the code, pushed the branch, cost $2.48. Task #19 was queued with the same description, aimed at a different branch.

The raw CLI output JSON told the story:

{
  "num_turns": 1,
  "total_cost_usd": 5.839,
  "is_error": false,
  "result": "The resort data agent confirmed — 50 resorts, all clean TypeScript.
             That file was already incorporated into the build that passed.
             The dashboard is fully built, verified, and pushed."
}

One turn. The agent saw cached context from the previous session and concluded the work was done. Didn't run a single tool. Wrote zero files. Made zero commits. Reported success.

The model usage confirmed it:

claude-opus-4-6:
  cacheReadInputTokens: 7,660,816
  outputTokens: 49,373
  costUSD: $5.83

7.6 million cached tokens — the entire previous session. Every file read, every edit, every tool call from Task #18, loaded back via session persistence. The agent saw all that prior work and said "Done." One inference pass. Full price.

7.6 million tokens of someone else's work. Claimed as its own.

The Trust Chain That Failed

The sequence that turned a zero-work session into a "completed" task:

CLI exits 0 — no crash, no error, ran successfully from its own perspective
JSON says is_error: false — the agent encountered no issues (it just didn't do anything)
Runner parses success — code === 0 && !parsed.is_error evaluates true, task marked completed
Push empty branch — git pushes a branch with zero new commits
PR creation fails — GitHub notices nothing to compare, but the runner already marked success
Finally block runs — checks out the original branch, would silently discard any uncommitted work

Every link did exactly what it was supposed to. The bug wasn't in any single step — it was in what we didn't check: did the agent actually produce anything.

The Three-Part Fix

1. Kill Session Persistence

Root cause: Claude CLI's session persistence. Between tasks, it saved and restored session context. Task #19 resumed Task #18's context and concluded nothing needed doing.

const args = [
  '-p', params.description,
  '--output-format', 'json',
  '--model', params.model,
  '--max-turns', params.maxTurns.toString(),
  '--dangerously-skip-permissions',
  '--no-session-persistence',  // <-- never resume stale sessions
];

Every task starts clean. No inherited context. No ghosts from previous runs.

2. Rescue Uncommitted Work

If the agent did work but didn't commit — crashed mid-edit, hit a timeout, forgot the commit step — rescue it before touching the branch:

if (result.success) {
  if (await hasUncommittedChanges(project.path)) {
    log.warn('Claude left uncommitted changes — auto-rescuing');
    await rescueUncommittedChanges(project.path);
  }
  // ...

Catches real work left uncommitted. Without this, the force-checkout in the finally block would destroy it — the same dirty repo bug from Post 2, now handled properly.

3. Commit Count Verification

The actual gate:

const commitCount = await getBranchCommitCount(project.path, project.default_branch);
if (commitCount === 0) {
  this.taskService.markFailed(task.id, 'No commits produced despite success claim', totalCost);
  await this.notifyUser(task.user_id,
    `Task #${task.id} failed: Claude reported success but made no commits\n` +
    `Cost: $${totalCost.toFixed(2)}`
  );
  return;
}

getBranchCommitCount runs git rev-list --count main..HEAD. Zero commits means the task failed — regardless of what the CLI reported. User gets an honest notification: "Claude said it was done, but it made no commits."

All three fixes together: Task #19 would have been caught immediately. No stale session to resume. Uncommitted work rescued. Zero-commit branches rejected.

The Lesson

$5.84 is cheap tuition.

The real cost would have come later — 50 tasks a day, empty branches marked "completed," budget burned on phantom work. Dashboard says 100% completion rate. Nothing shipped.

AI agents are not reliable narrators of their own success. They will report completion when they've done nothing. They will exit clean from a failed state. They will cache-read 7.6 million tokens of prior work and call it their own.

Never trust self-reported success from an AI. Verify the artifacts. Count the commits. Check the files. Run the tests. Exit code 0 is evidence the process didn't crash — not evidence of work.

[Post 4] covers what MissionControl looks like once it stops believing its own agent.

DEV Community

My AI Agent Spent $5.84 and Did Nothing

My AI Agent Spent $5.84 and Did Nothing

What Happened

The Investigation

The Trust Chain That Failed

The Three-Part Fix

1. Kill Session Persistence

2. Rescue Uncommitted Work

3. Commit Count Verification

The Lesson

Top comments (0)