An AI agent rarely does one thing at a time. A single turn might call a model, run three tool invocations in parallel, fetch a document, and query a vector store — each with its own latency curve, cost, and failure mode. When one task hangs, the reflexive fix is a timeout. When one fails, the reflexive fix is a retry. Stack both across a dozen concurrent tasks and you get a system that quietly burns tokens on work nobody is waiting for anymore.
The part most agent code gets wrong is ownership: who controls a task's lifecycle once it has started.
Why Promise.race leaks work and money
The most common timeout in TypeScript looks like this:
const result = await Promise.race([
  callModel(prompt),
  new Promise((_, reject) =>
    setTimeout(() => reject(new Error('timeout')), 30_000)),
]);
It looks correct. It is not. Promise.race settles with whichever promise finishes first, but it has no power to stop the others. When the timeout wins, callModel(prompt) is still running. The HTTP request is still open. The provider is still streaming tokens you are still paying for. The promise just has nobody listening.
For one call that is a rounding error. For an agent that fans out several tool calls per turn across hundreds of turns, the leaked work compounds: orphaned model calls, connection-pool exhaustion, and a bill that does not reconcile with your logs.
The leak is invisible in development, where you make one call at a time. It shows up in production as token spend that drifts above the sum of the responses you actually used.
Every Promise.race paired with a setTimeout is a candidate leak site; grep for them.
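The leak can be reproduced without a network. The sketch below is a minimal illustration, not the author's code: slowCall is a hypothetical stand-in for a model call, and the finished flag exists only to show that the losing promise keeps running after the race has settled.

```typescript
// Hypothetical stand-in for a model call: takes 200 ms and records completion.
let finished = false;

function slowCall(): Promise<string> {
  return new Promise((resolve) => {
    setTimeout(() => {
      finished = true; // runs even when nobody is awaiting this promise
      resolve("response");
    }, 200);
  });
}

// Rejects after `ms`; it has no way to stop the promise it races against.
function timeoutAfter(ms: number): Promise<never> {
  return new Promise((_, reject) => {
    setTimeout(() => reject(new Error("timeout")), ms);
  });
}

async function demoLeak(): Promise<{
  raceResult: string;
  finishedAfterRace: boolean;
  finishedLater: boolean;
}> {
  let raceResult: string;
  try {
    raceResult = await Promise.race([slowCall(), timeoutAfter(50)]);
  } catch (err) {
    raceResult = (err as Error).message; // the timeout won the race
  }
  const finishedAfterRace = finished; // the call is still in flight here
  await new Promise((r) => setTimeout(r, 300)); // wait past the call's duration
  return { raceResult, finishedAfterRace, finishedLater: finished };
}
```

The caller has already moved on with a timeout error by the time the call completes. In a real agent, that interval is provider time you are still billed for.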
One owner per task
The fix is to give every task a single owner holding three controls: the signal that cancels it, the timer that enforces its deadline, and the catch block that decides retries. AbortController is the primitive that ties them together.
function withDeadline<T>(
  work: (signal: AbortSignal) => Promise<T>,
  ms: number,
  parent?: AbortSignal,
): Promise<T> {
  const signal = parent
    ? AbortSignal.any([parent, AbortSignal.timeout(ms)])
    : AbortSignal.timeout(ms);
  return work(signal);
}
Two things matter. First, the signal is passed into the work, not wrapped around it. callModel must accept an AbortSignal and forward it to fetch; fetch accepts a signal option, and most current provider SDKs expose one as well. When the deadline fires, the request is aborted and the socket closes, which for a streaming response typically stops the provider from generating further tokens. Second, AbortSignal.any (Node 20.3+, current browsers) lets a task be cancelled by either its own timeout or its parent. When a user cancels a turn, every in-flight tool call beneath it dies in one propagation instead of running to its own deadline.
That is the single-owner idea: a task is never cancelled by a race against an unrelated promise. It is cancelled by a signal its owner controls and that the task itself listens to.
Make AbortSignal a required parameter on every async function that touches the network. If it is optional, someone eventually omits it, and that function becomes an un-cancellable island the rest of your cancellation logic cannot reach.
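To make the parent-to-child propagation concrete, here is a minimal sketch of one turn fanning out three tool calls. fakeToolCall, runTurn, and demoCancel are hypothetical names introduced for illustration, and the tool calls are simulated with timers instead of real network requests. It assumes a runtime with AbortSignal.any.

```typescript
// Hypothetical tool call: resolves after `ms` unless its signal aborts first.
function fakeToolCall(name: string, ms: number, signal: AbortSignal): Promise<string> {
  return new Promise((resolve, reject) => {
    if (signal.aborted) {
      reject(signal.reason);
      return;
    }
    const timer = setTimeout(() => {
      signal.removeEventListener("abort", onAbort);
      resolve(`${name} done`);
    }, ms);
    function onAbort() {
      clearTimeout(timer); // stop the underlying work, not just the waiting
      reject(signal.reason);
    }
    signal.addEventListener("abort", onAbort, { once: true });
  });
}

// Each call listens to the turn's signal and to its own 5-second deadline.
async function runTurn(turn: AbortSignal): Promise<string[]> {
  const own = () => AbortSignal.any([turn, AbortSignal.timeout(5_000)]);
  const settled = await Promise.allSettled([
    fakeToolCall("search", 1_000, own()),
    fakeToolCall("fetchDoc", 2_000, own()),
    fakeToolCall("vectorQuery", 3_000, own()),
  ]);
  return settled.map((s) =>
    s.status === "fulfilled" ? s.value : `cancelled: ${(s.reason as Error).name}`,
  );
}

// Simulates a user cancelling the turn 50 ms after it starts.
async function demoCancel(): Promise<{ results: string[]; elapsedMs: number }> {
  const turn = new AbortController();
  const start = Date.now();
  setTimeout(() => turn.abort(), 50);
  const results = await runTurn(turn.signal);
  return { results, elapsedMs: Date.now() - start };
}
```

All three calls reject shortly after the single abort() call, long before any of their own timers would have fired. The turn holds the only AbortController; the children only ever see signals.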
Retries and timeouts share one budget
Retries and timeouts are usually written on different days by different people, and they fight. A 30-second per-attempt timeout with three retries is a two-minute worst case — long after the user gave up. The fix is one deadline budget that every retry draws down from, instead of a fresh timeout per attempt.
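// isRetryable(err) and sleep(ms) are helpers assumed to be defined elsewhere.
async function retry<T>(
  work: (signal: AbortSignal) => Promise<T>,
  opts: { attempts: number; budgetMs: number; parent?: AbortSignal },
): Promise<T> {
  const start = Date.now();
  for (let i = 1; ; i++) {
    // Each attempt gets whatever is left of the shared budget.
    const left = opts.budgetMs - (Date.now() - start);
    if (left <= 0) throw new Error('deadline exceeded');
    try {
      return await withDeadline(work, left, opts.parent);
    } catch (err) {
      if (i >= opts.attempts || !isRetryable(err)) throw err;
      const backoff = Math.min(500 * 2 ** i, 8_000);
      // Full jitter, capped so the sleep cannot overrun the budget.
      const remaining = opts.budgetMs - (Date.now() - start);
      await sleep(Math.min(Math.random() * backoff, Math.max(remaining, 0)));
    }
  }
}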
Three rules this enforces:
- Each attempt's timeout is the remaining budget, so total wall-clock time never exceeds budgetMs.
- isRetryable must distinguish causes. Retry on 429, 503, and connection resets. Do not retry on 400 or 401: a malformed or unauthorized request fails identically every time, and you have tripled latency for nothing.
- Backoff uses full jitter (Math.random() * backoff), not a fixed delay. When a provider rate-limits your whole agent at once, synchronized retries arrive as a thundering herd and get rate-limited again.
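The retry function relies on two helpers the article does not define. One possible shape is sketched below, under assumptions: errors are taken to carry an HTTP status or a Node-style code field (your SDK's error shape may differ), and abort or timeout errors are treated as non-retryable because they mean the owner has already given up. fullJitterDelay is a hypothetical helper that isolates the jitter formula so it can be checked with a fixed random source.

```typescript
// Assumed error shape: HTTP failures expose `status`, socket failures expose `code`.
interface MaybeTransportError {
  status?: number;
  code?: string;
  name?: string;
}

const RETRYABLE_STATUS = new Set([429, 503]);
const RETRYABLE_CODES = new Set(["ECONNRESET"]);

function isRetryable(err: unknown): boolean {
  if (typeof err !== "object" || err === null) return false;
  const e = err as MaybeTransportError;
  // A fired signal means the deadline passed or the owner cancelled: never retry.
  if (e.name === "AbortError" || e.name === "TimeoutError") return false;
  if (e.status !== undefined) return RETRYABLE_STATUS.has(e.status);
  return e.code !== undefined && RETRYABLE_CODES.has(e.code);
}

// Full jitter: a uniform delay between 0 and min(base * 2^attempt, cap).
function fullJitterDelay(
  attempt: number,
  baseMs = 500,
  capMs = 8_000,
  random: () => number = Math.random,
): number {
  return random() * Math.min(baseMs * 2 ** attempt, capMs);
}

function sleep(ms: number): Promise<void> {
  return new Promise((resolve) => setTimeout(resolve, ms));
}
```

Anything the function does not recognize is treated as non-retryable, so a new error type fails fast until someone decides it is safe to repeat.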
One trap is specific to agents: idempotency. Retrying a model call is safe — it has no side effect beyond cost. Retrying a tool call that sends an email, charges a card, or writes a row is not. Tag each tool as idempotent or not, and let only the idempotent ones into the retry path. The rest should fail loudly on the first error rather than repeat a side effect.
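A minimal sketch of that tagging, assuming a hypothetical Tool descriptor whose idempotent flag is assigned by hand for each tool. Backoff and the shared deadline budget are left out to keep the gating logic visible; a real version would route idempotent tools through the retry function above.

```typescript
// Hypothetical tool descriptor: `idempotent` is declared per tool, not inferred.
interface Tool<T> {
  name: string;
  idempotent: boolean;
  run: (signal: AbortSignal) => Promise<T>;
}

// Idempotent tools get up to `attempts` tries; everything else runs exactly once.
async function runTool<T>(
  tool: Tool<T>,
  attempts: number,
  signal: AbortSignal,
): Promise<T> {
  const maxAttempts = tool.idempotent ? Math.max(1, attempts) : 1;
  let lastError: unknown;
  for (let i = 1; i <= maxAttempts; i++) {
    try {
      return await tool.run(signal);
    } catch (err) {
      lastError = err;
      if (signal.aborted) break; // the owner cancelled: stop trying
    }
  }
  throw lastError; // fail loudly, carrying the most recent error
}
```

Because the flag lives on the tool and not at the call site, a caller cannot opt a side-effecting tool into retries by accident.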
Wire these three patterns together and the payoff is structural: a cancelled turn stops all of its work, a slow provider cannot blow your latency budget, and a retry storm never amplifies an outage. None of it requires a framework — AbortController, AbortSignal.any, and a budget counter are enough.
Originally published at pickuma.com.