DEV Community

RAXXO Studios

Posted on • Originally published at raxxo.shop

5 Claude API Errors That Cost Me Money (And How I Trapped Them)

  • Retry storms turned 1 timeout into 340 duplicate calls billed in 90 seconds

  • Infinite tool loop ran 1,200 iterations before I noticed at 2am

  • Partial stream cleanup stopped half-written DB writes from corrupting records

  • Trap every error class with a circuit breaker and a hard iteration cap

Five Claude API errors quietly drained my account before I built guards around them. None of them threw a loud crash. They just kept billing while I slept. Here is exactly what broke, what it cost, and the traps I now run on every project.

The Retry Storm That Billed 340 Times in 90 Seconds

The most expensive mistake I made was naive retry logic. A single request timed out. My code caught the timeout and retried. The retry also timed out, so it retried again. Within 90 seconds I had fired 340 requests for one piece of work.

The problem was that the Claude API had actually received and processed several of those requests. The timeout happened on my side waiting for the response, not on Anthropic's side. So I was paying for completed work I never saw, then paying again for the retry.

My first version of the retry looked harmless. A while loop, a counter set to 5, a sleep of one second between attempts. The flaw was that the sleep was constant and the counter reset on every new job. Under load, jobs stacked, and each one spawned its own retry chain. That is how 1 timeout became 340 calls.

The fix was exponential backoff with a hard ceiling and a request ID. I now generate a unique idempotency-style key per logical job and refuse to issue a second call for the same key until the first fully resolves or hard-fails. Backoff starts at 2 seconds and doubles after each failed attempt, capped at 32 seconds, and the job gives up after 5 total attempts.


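import random
import time

# call_claude, Timeout, and GiveUp are the application's own helpers.
def call_with_backoff(job_key, max_attempts=5):
    delay = 2  # seconds; doubles after each failure, capped at 32
    for attempt in range(1, max_attempts + 1):
        try:
            return call_claude(job_key)  # one API call for this logical job
        except Timeout:
            if attempt == max_attempts:
                break  # out of attempts, so skip the pointless final sleep
            # Jitter range of 0 to 1 second is an assumed value.
            time.sleep(delay + random.uniform(0, 1))
            delay = min(delay * 2, 32)
    raise GiveUp(job_key)  # hard failure for the caller to handle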

The jitter matters more than it looks. Without it, ten failed jobs all retry at the exact same second and create a synchronized stampede. Jitter spreads them out so the recovery is smooth instead of another spike.
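
The other half of the fix, refusing a second call for a job key while the first is still pending, can be sketched as a small in-process registry. This is a minimal illustration that assumes a single worker process; `InFlightKeys` and `run_once` are hypothetical names, and the guard lives entirely in my own code rather than in any API feature.

```python
import threading


class InFlightKeys:
    """Tracks job keys that currently have a call in progress (hypothetical helper)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._active = set()

    def try_acquire(self, job_key):
        """Return True and mark the key active if no call for it is in flight."""
        with self._lock:
            if job_key in self._active:
                return False
            self._active.add(job_key)
            return True

    def release(self, job_key):
        """Mark the key as resolved, whether it succeeded or hard-failed."""
        with self._lock:
            self._active.discard(job_key)


def run_once(keys, job_key, do_call):
    """Run do_call() only if job_key has no pending call; otherwise return None."""
    if not keys.try_acquire(job_key):
        return None  # a call for this job is already pending
    try:
        return do_call()
    finally:
        keys.release(job_key)
```

With several worker processes the same idea would need shared storage instead of an in-memory set.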

I also added a daily call counter that hard-stops the whole process if it crosses a threshold I would never legitimately hit. If something goes wrong at 3am, the worst case is now a stopped queue, not a four-figure surprise. If you want the broader workflow context, the Claude Blueprint lays out how I structure these jobs end to end.
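
A daily call counter like this one can be sketched in a few lines. The version below keeps the count in memory and resets it when the date changes; the class names and the injectable `today` function are assumptions for illustration, and a real worker would likely persist the count so a restart does not reset it.

```python
import datetime


class DailyCallLimitExceeded(RuntimeError):
    """Raised when the day's call count crosses the configured ceiling."""


class DailyCallCounter:
    def __init__(self, limit, today=datetime.date.today):
        self.limit = limit
        self._today = today  # injectable so the reset can be exercised in tests
        self._day = today()
        self.count = 0

    def record_call(self):
        """Count one API call and hard-stop once the daily limit is exceeded."""
        day = self._today()
        if day != self._day:  # new day: start counting from zero again
            self._day = day
            self.count = 0
        self.count += 1
        if self.count > self.limit:
            raise DailyCallLimitExceeded(
                f"{self.count} calls today exceeds the limit of {self.limit}"
            )
```

Calling `record_call()` before every request and letting the exception propagate is what turns a runaway loop into a stopped queue.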

The Infinite Tool Loop That Ran 1,200 Iterations

Tool use is where agents earn their keep and also where they burn money fastest. I gave Claude a set of tools: search a file, read a file, write a result. The agent was supposed to call two or three tools then finish.

Instead it got stuck. It would call the search tool, read the result, decide it needed to search again with a slightly different query, read that, and repeat. Each loop is a full round trip with the growing message history attached, so each iteration cost more than the last. I caught it at 2am after 1,200 iterations.

The model was not broken. My prompt left it an escape hatch with no exit condition. When it could not find what it wanted, "search again" was always a valid next move, so it always took it.

I fixed this with three traps stacked together. First, a hard iteration cap. No agent run is allowed more than 12 tool cycles. Hit the cap and the run terminates with a clear failure I can inspect later.

Second, a repeat detector. I hash each tool call's name plus its arguments. If the same hash appears three times in one run, I block the call and force the agent to either answer or fail. The "slightly different query" trick still gets caught because near-identical searches usually normalize to the same hash once I strip whitespace and lowercase the arguments.
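
One way such a repeat detector could look is sketched below. The normalization here (trim, collapse runs of whitespace, lowercase every string argument) is one interpretation of the description above, and `call_fingerprint` and `RepeatDetector` are hypothetical names.

```python
import hashlib
import json
from collections import Counter


def _normalize(value):
    """Lowercase strings and collapse their whitespace, recursing into containers."""
    if isinstance(value, str):
        return " ".join(value.split()).lower()
    if isinstance(value, dict):
        return {key: _normalize(item) for key, item in value.items()}
    if isinstance(value, list):
        return [_normalize(item) for item in value]
    return value


def call_fingerprint(tool_name, arguments):
    """Hash the tool name plus normalized arguments into a stable hex digest."""
    payload = json.dumps([tool_name, _normalize(arguments)], sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class RepeatDetector:
    def __init__(self, max_repeats=3):
        self.max_repeats = max_repeats
        self._seen = Counter()

    def should_block(self, tool_name, arguments):
        """Return True once the same fingerprint has appeared max_repeats times."""
        fingerprint = call_fingerprint(tool_name, arguments)
        self._seen[fingerprint] += 1
        return self._seen[fingerprint] >= self.max_repeats
```

A fresh `RepeatDetector` per run keeps the counts scoped to that run.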

Third, a cost meter per run. Every run carries a running token tally. Cross the ceiling and the run stops mid-flight. I would rather get a partial answer and a flag than a perfect answer that cost 40 times the budget.
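
The iteration cap and the per-run cost meter fit naturally into one small guard object. This is a sketch under assumptions: the 12-cycle cap comes from the text above, while the 50,000-token ceiling is a placeholder value and `RunGuard` is a hypothetical name. The token counts would come from whatever usage figures each response reports.

```python
class RunLimitExceeded(RuntimeError):
    """Raised when a run crosses its cycle cap or its token budget."""


class RunGuard:
    def __init__(self, max_cycles=12, max_tokens=50_000):
        self.max_cycles = max_cycles
        self.max_tokens = max_tokens  # placeholder budget, tune per project
        self.cycles = 0
        self.tokens = 0

    def start_cycle(self):
        """Call before each tool cycle; raises once the cap is already used up."""
        if self.cycles >= self.max_cycles:
            raise RunLimitExceeded(f"hit the cap of {self.max_cycles} tool cycles")
        self.cycles += 1

    def add_usage(self, input_tokens, output_tokens):
        """Call after each response; raises once the running tally exceeds the budget."""
        self.tokens += input_tokens + output_tokens
        if self.tokens > self.max_tokens:
            raise RunLimitExceeded(
                f"used {self.tokens} tokens, budget is {self.max_tokens}"
            )
```

Catching `RunLimitExceeded` at the top of the agent loop is where the run would be terminated and flagged for later inspection.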

The iteration cap alone would have saved me that night. The other two stop subtler loops that stay under the cap but still waste calls. I covered the production version of this pattern in Claude Agent SDK in Production, which goes through the guard layers in more detail.

The Partial Stream That Corrupted My Database

Streaming responses feel great until a stream dies halfway. I was streaming Claude's output and writing chunks to a database as they arrived. It seemed efficient. Write as you go, no waiting.

Then a connection dropped at roughly the 60 percent mark of a long response. My code had already written the first 60 percent into the record. The record now held half a product description that ended mid-sentence. Worse, my downstream publishing job did not know the write was incomplete, so it shipped the broken text live.

The root cause was treating a stream like it was guaranteed to complete. Streams are not transactions. A stream can stop at any byte for any reason: network, timeout, server-side hiccup, my own process getting killed.

The trap is to never commit partial stream output. I now accumulate the entire streamed response into a local cache and only write to the database after the stream signals a clean completion event. If the stream errors before that event, I discard everything collected so far and retry the whole job from scratch.


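# StreamError, commit, discard, and requeue are the application's own helpers.
def stream_then_commit(stream, job):
    chunks = []  # buffer in memory; nothing touches the database yet
    try:
        for event in stream:
            chunks.append(event.text)
    except StreamError:
        discard(chunks)      # write nothing
        requeue(job)         # retry the whole job from scratch
        return None
    final = "".join(chunks)
    commit(final)            # only here, after clean finish
    return final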

The cost of accumulating in memory first is tiny compared to publishing garbage. Even a long response fits comfortably in a few hundred kilobytes, so holding it before commit is free in practice.

I also added a completeness check before any publish step. The text has to end with sentence-final punctuation and pass a minimum length gate for its content type. A description under 200 characters is almost always a truncated stream, so it gets flagged and held rather than shipped. That single check has caught more partial writes than I expected, including ones from totally unrelated failures.
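
A completeness check along these lines can be very small. The sketch below assumes the 200-character gate mentioned above as the default and treats a period, exclamation mark, question mark, or ellipsis as sentence-final, ignoring trailing quotes and brackets; `looks_complete` is a hypothetical name.

```python
_CLOSERS = "\"')]\u201d\u2019"  # trailing quotes and brackets to ignore
_TERMINATORS = (".", "!", "?", "\u2026")


def looks_complete(text, min_length=200):
    """Return True if text passes the length gate and ends a sentence cleanly."""
    body = text.strip()
    if len(body) < min_length:
        return False  # too short for its content type: likely a truncated stream
    return body.rstrip(_CLOSERS).endswith(_TERMINATORS)
```

Anything that fails the check would be flagged and held instead of published, with `min_length` set per content type.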

The Malformed tool_use Block That Crashed the Parser

This one did not bill me directly. It cost me time, which at a solo studio is the same thing.

I assumed every tool_use block from Claude would contain valid JSON arguments. Most do. But occasionally, especially with complex nested inputs, the model produces arguments that are not quite parseable, or it references a tool name I had renamed weeks earlier. My parser threw an exception, the whole job died, and because the job died inside a retry wrapper, it triggered the retry storm from section one. Two bugs feeding each other.

The fix was to treat every tool_use block as untrusted input. I wrap the parse in a try block. If the JSON fails, I do not crash. I send a tool_result back to Claude saying the arguments were malformed and asking it to retry the call with valid input. The model corrects itself most of the time within one extra turn.


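import json

# tool_result and run_tool stand in for the application's own helpers.
def handle_tool_use(block):
    try:
        # Assumes block.input holds the raw JSON string for the arguments.
        args = json.loads(block.input)
    except json.JSONDecodeError:
        # Tell the model what went wrong instead of crashing the job.
        return tool_result(
            block.id,
            "Arguments were not valid JSON. Resend with valid JSON.",
            is_error=True,
        )
    return run_tool(block.name, args)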

For unknown tool names, I return a similar error result listing the tools that actually exist. This stops the agent from getting stuck on a phantom tool and gives it a path forward instead of a dead end.
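
The unknown-tool case can be sketched as a dispatcher that builds the error result itself. The dictionary below follows the `tool_result` content block shape (`type`, `tool_use_id`, `content`, `is_error`); the function names and the `handlers` mapping are hypothetical, not taken from the original code.

```python
def unknown_tool_result(tool_use_id, requested_name, available_tools):
    """Build an error tool_result that lists the tools that actually exist."""
    names = ", ".join(sorted(available_tools))
    return {
        "type": "tool_result",
        "tool_use_id": tool_use_id,
        "content": f"Unknown tool '{requested_name}'. Available tools: {names}.",
        "is_error": True,
    }


def dispatch(tool_use_id, name, arguments, handlers):
    """Run the named handler, or return a structured error for a phantom tool."""
    handler = handlers.get(name)
    if handler is None:
        return unknown_tool_result(tool_use_id, name, handlers)
    return {
        "type": "tool_result",
        "tool_use_id": tool_use_id,
        "content": str(handler(**arguments)),
    }
```

Listing the valid names in the error text is what gives the model a concrete next move after a renamed or misspelled tool.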

The bigger lesson was that error results are not just for my code's failures. They are a conversation channel. Telling the model precisely what went wrong lets it self-correct far better than silently failing and retrying blind. Since adding structured error results, my agent runs recover from bad tool calls without any human in the loop, which matters at 2am when the human is asleep. For scheduling the published output afterward, I run mine through Buffer, so the parsing stage and the posting stage stay fully separate.

How I Trap Errors Before They Bill

Across all five incidents the pattern is identical. The error was never the model behaving strangely. It was my code assuming the happy path and having no ceiling when reality disagreed.

So I built four ceilings that run on every project now, no exceptions.

A circuit breaker sits in front of all API calls. After 5 consecutive failures of any kind, it opens and rejects new calls for 60 seconds instead of hammering the API. This single guard would have killed three of the five incidents on its own. It converts a runaway loop into a brief pause and a log line.
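
A minimal sketch of a circuit breaker with those numbers, 5 consecutive failures and a 60-second cooldown, might look like this. `CircuitBreaker` and `CircuitOpen` are hypothetical names, the clock is injectable so the cooldown can be tested, and the behavior after the cooldown (one trial call, where a single failure reopens the circuit) is an assumed design choice.

```python
import time


class CircuitOpen(RuntimeError):
    """Raised when a call is rejected because the breaker is open."""


class CircuitBreaker:
    def __init__(self, failure_threshold=5, cooldown_seconds=60.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.cooldown_seconds = cooldown_seconds
        self._clock = clock
        self._failures = 0      # consecutive failures since the last success
        self._opened_at = None  # time the breaker opened, or None if closed

    def call(self, fn, *args, **kwargs):
        """Run fn through the breaker, rejecting it while the cooldown is active."""
        if self._opened_at is not None:
            if self._clock() - self._opened_at < self.cooldown_seconds:
                raise CircuitOpen("circuit open; call rejected")
            self._opened_at = None  # cooldown over: allow one trial call
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self._failures += 1
            if self._failures >= self.failure_threshold:
                self._opened_at = self._clock()  # open (or reopen) the circuit
            raise
        self._failures = 0  # any success resets the consecutive count
        return result
```

Routing every API call through `breaker.call(...)` means a rejected call costs a log line instead of another billed request.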

A per-job iteration cap of 12 tool cycles and a per-run token budget catch the loops. Anything that exceeds either limit stops cleanly and lands in a dead-letter queue I review by hand. I would rather review 5 stuck jobs in the morning than discover a 1,200-iteration run.

A daily spend tripwire stops the entire worker if total calls cross a number I would never reach in normal operation. It is a blunt instrument and that is the point. When I am not watching, blunt beats clever.

Structured logging ties it together. Every call logs its job key, attempt number, token count, and outcome. When something breaks I can reconstruct the timeline in minutes instead of guessing. I run my whole store on Shopify and the same logging discipline carries over to every integration I bolt onto it.
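
As a rough illustration, each call can be logged as a single JSON line carrying those four fields. The field names, the logger name, and the two helper functions are assumptions for this sketch.

```python
import json
import logging

logger = logging.getLogger("claude_calls")  # hypothetical logger name


def format_call_record(job_key, attempt, tokens, outcome):
    """Serialize one API call as a JSON line with a stable key order."""
    record = {
        "job_key": job_key,
        "attempt": attempt,
        "tokens": tokens,
        "outcome": outcome,
    }
    return json.dumps(record, sort_keys=True)


def log_call(job_key, attempt, tokens, outcome):
    """Emit the structured record through the standard logging module."""
    logger.info(format_call_record(job_key, attempt, tokens, outcome))
```

Because every line is valid JSON keyed by `job_key`, filtering the log for one job reconstructs its full retry timeline.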

The mistakes taught me more than any tutorial. Each one became a permanent trap. None of them have fired in anger since.

Bottom Line

Every error here cost me because I trusted the happy path. A timeout is not a one-time event; it is the start of a retry chain. A tool call is not guaranteed to be the last one; it is an invitation to loop. A stream is not a transaction; it can die at any byte. A tool_use block is not trusted input; it is something to validate.

The traps are simple: exponential backoff with jitter, idempotency keys, hard iteration caps, repeat detection, accumulate-then-commit streaming, structured error results, a circuit breaker, and a daily tripwire. None of them are clever. All of them are boring. Boring is what you want at 2am.

If you are building agents on the Claude API, start with the ceilings before you write the features. The full structure I use is in the Claude Blueprint, and the production guard layers go deeper in Claude Agent SDK in Production. Build the trap first. The bug will come.

This article contains affiliate links. If you sign up through them, I may earn a small commission at no extra cost to you. (Ad)
