Assili Salim

Posted on Jun 27

If AI Agents Run in Parallel, Budget Checks Need to Happen Before Every Provider Call

#ai #javascript #api #agents

OpenAI’s latest Codex usage data shows a clear shift from short assistant interactions to longer, delegated agent work.

By May 2026, 80.6% of sampled individual Codex users had made at least one request estimated to exceed 30 minutes of human work. 70.2% had made one estimated to exceed one hour.

The more interesting detail:

By June 2026, the 99th percentile of daily active OpenAI users regularly generated more than 60 hours of Codex agent turns per day, distributed across multiple parallel agents.

That is the engineering lesson.

A parallel agent workflow is not a prompt.

It is a runtime system.

Runtime systems need budgets.

The naive version

A simple agent loop often looks like this:

while (!task.done) {
  const result = await provider.call({
    model: task.model,
    messages: task.messages,
  });

  task = await applyAgentStep(task, result);
}

This is easy to write.

It is also missing the controls that matter in production.

No max-step limit.

No budget check.

No retry-storm detection.

No prompt-loop detection.

No unknown-pricing block.

No stop when progress stalls.

Now imagine running many of these in parallel.

await Promise.all(tasks.map(runAgent));

This is where the failure mode changes.

One agent overspending is visible.

Ten agents each overspending slightly can look like normal usage until the bill or queue pressure shows up.

The better shape

Before every provider call, the runtime should make a decision.

const decision = guard.beforeCall({
  runId: task.id,
  workflowId: task.workflowId,
  model: task.model,
  messages: task.messages,
  stepCount: task.steps.length,
  retryCount: task.retryCount,
  budgetRemaining: task.budgetRemaining,
  sharedBudgetRemaining: workflow.budgetRemaining,
  previousPrompts: task.previousPrompts,
  progressState: task.progress,
});

if (!decision.allowed) {
  return {
    status: "stopped",
    reason: decision.reason,
    error: decision.error,
  };
}

const result = await provider.call({
  model: task.model,
  messages: task.messages,
});

The API shape does not matter.

The placement matters.

The guard runs before the provider call.

That gives the runtime a chance to stop the next unit of spend before it exists.
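
One way to build such a guard is as an ordered list of small checks, where the first check that objects decides the outcome. The sketch below is an illustration only, not CostGuard's actual API: `createGuard`, the check signature, and the context fields `maxSteps` and `estimatedNextCallCost` are all assumptions made for this example.

```javascript
// Hypothetical guard factory: each check returns null to pass,
// or a reason string to deny the next provider call.
function createGuard(checks) {
  return {
    beforeCall(context) {
      for (const check of checks) {
        const reason = check(context);
        if (reason) {
          return { allowed: false, reason };
        }
      }
      return { allowed: true, reason: null };
    },
  };
}

// Two illustrative checks; a real guard would register more.
const guard = createGuard([
  (ctx) => (ctx.stepCount >= ctx.maxSteps ? "max_steps_exceeded" : null),
  (ctx) =>
    ctx.estimatedNextCallCost > ctx.budgetRemaining
      ? "run_budget_exceeded"
      : null,
]);

const denied = guard.beforeCall({
  stepCount: 3,
  maxSteps: 20,
  estimatedNextCallCost: 5,
  budgetRemaining: 2,
});
// denied.allowed is false and denied.reason is "run_budget_exceeded"
```

Ordering the checks from cheapest to most expensive keeps the guard fast on the common path, and returning the first reason keeps the stop message unambiguous.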

What should be checked?

Known model pricing

If the runtime does not know the model price, it cannot enforce a reliable budget.

if (!pricingCatalog.has(model)) {
  return {
    allowed: false,
    reason: "unknown_model_pricing",
  };
}

Do not guess.

Fail closed.
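
The budget checks that follow need an estimate of what the next call will cost. Here is a minimal sketch of one way to get it, under loud assumptions: the model name and prices are placeholders rather than real rates, the four-characters-per-token rule is a rough heuristic instead of a real tokenizer, and `estimateNextCallCost` is a hypothetical helper.

```javascript
// Hypothetical catalog in USD per one million tokens.
// "example-model" and its prices are placeholders, not real rates.
const pricingCatalog = new Map([
  ["example-model", { inputPerMillion: 2, outputPerMillion: 8 }],
]);

// Rough assumption: about 4 characters per token.
// A real tokenizer would give a tighter number.
function estimateInputTokens(messages) {
  const chars = messages.reduce((sum, m) => sum + m.content.length, 0);
  return Math.ceil(chars / 4);
}

function estimateNextCallCost(model, messages, maxOutputTokens) {
  const price = pricingCatalog.get(model);
  if (!price) {
    // Unknown pricing: report it instead of guessing a number.
    return { known: false, cost: null };
  }
  const inputTokens = estimateInputTokens(messages);
  // Upper bound: assume the model uses its full output allowance.
  const cost =
    (inputTokens * price.inputPerMillion +
      maxOutputTokens * price.outputPerMillion) /
    1_000_000;
  return { known: true, cost };
}
```

Estimating against the maximum output length overstates most calls, which is the safer direction for a guard that fails closed.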

Per-run budget

Each agent run needs its own budget.

if (estimatedNextCallCost > runBudgetRemaining) {
  return {
    allowed: false,
    reason: "run_budget_exceeded",
  };
}

This stops one confused task from consuming more than it should.

Shared workflow budget

Parallel agents also need a shared ceiling.

if (estimatedNextCallCost > workflowBudgetRemaining) {
  return {
    allowed: false,
    reason: "workflow_budget_exceeded",
  };
}

This matters because parallel waste is harder to notice.

Each worker may look reasonable locally while the workflow burns too much globally.
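
A read-only check on a shared number has a gap: two workers can both see enough budget, both pass, and both spend. One way to close it is to reserve the estimate before the call and reconcile with the actual cost afterwards. The sketch below assumes all workers live in one Node.js process, where a synchronous check-and-subtract cannot be interleaved. `SharedBudget`, `reserve`, and `reconcile` are hypothetical names, and amounts are integers in whatever unit you bill in.

```javascript
// Hypothetical shared ceiling for one workflow.
class SharedBudget {
  constructor(limit) {
    this.remaining = limit;
  }

  // Check and subtract in one synchronous step. There is no await
  // between the comparison and the decrement, so no other worker
  // on the same event loop can slip in between them.
  reserve(estimatedCost) {
    if (estimatedCost > this.remaining) {
      return false;
    }
    this.remaining -= estimatedCost;
    return true;
  }

  // After the provider call, replace the estimate with the real cost.
  reconcile(estimatedCost, actualCost) {
    this.remaining += estimatedCost - actualCost;
  }
}

const budget = new SharedBudget(100);
const firstWorker = budget.reserve(60); // reserved, 40 left
const secondWorker = budget.reserve(60); // denied, would exceed the ceiling
budget.reconcile(60, 45); // the first call turned out cheaper than estimated
```

Across multiple processes or machines the same idea would need an atomic operation in a shared store. That part is outside this sketch.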

Max-step limit

Agents should not run forever.

if (stepCount >= maxSteps) {
  return {
    allowed: false,
    reason: "max_steps_exceeded",
  };
}

Simple controls are often the most valuable.

Retry-storm detection

Retries are useful until they become the workload.

if (retryCount > maxRetries && recentErrorsAreSimilar(errors)) {
  return {
    allowed: false,
    reason: "retry_storm_detected",
  };
}

The goal is not to ban retries.

The goal is to prevent blind retries.
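
What counts as "similar" is a design choice. A minimal sketch of `recentErrorsAreSimilar`, assuming errors are objects with a `message` string: it replaces digits with a placeholder so that messages differing only in timings or counters compare equal, then asks whether the last few errors share one signature. The normalization rule and the `windowSize` parameter are assumptions for illustration.

```javascript
// Hypothetical normalization: collapse every run of digits into "#"
// so "timeout after 3012ms" and "timeout after 2988ms" match.
function errorSignature(error) {
  return String(error.message).toLowerCase().replace(/\d+/g, "#").trim();
}

// True when the last `windowSize` errors all share one signature.
function recentErrorsAreSimilar(errors, windowSize = 3) {
  const recent = errors.slice(-windowSize);
  if (recent.length < windowSize) {
    return false; // not enough evidence to call it a storm
  }
  const first = errorSignature(recent[0]);
  return recent.every((e) => errorSignature(e) === first);
}

const timeouts = [
  new Error("timeout after 3012ms"),
  new Error("timeout after 2988ms"),
  new Error("timeout after 3100ms"),
];
const stormSuspected = recentErrorsAreSimilar(timeouts);
// stormSuspected is true: three retries, one underlying failure
```

A mixed sequence, such as a timeout followed by a rate limit followed by a parse error, would not trigger it, which leaves room for retries that are actually trying something different.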

Prompt-loop detection

If the current prompt is too similar to earlier failed prompts, the agent may be stuck.

if (similarToRecentPrompt(currentPrompt, previousPrompts)) {
  return {
    allowed: false,
    reason: "similar_prompt_loop",
  };
}

This catches a common agent failure: the system looks active, but it is asking the same question again.
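
There are many ways to measure prompt similarity. A cheap one that needs no model call is Jaccard overlap on word sets. The sketch below is one possible `similarToRecentPrompt`, not necessarily how CostGuard does it. The `threshold` and `lookback` defaults are assumptions that would need tuning, and the tokenizer only understands ASCII word characters.

```javascript
// Lowercased set of word tokens. Splits on non-word characters,
// so "auth.js" becomes "auth" and "js".
function tokenSet(text) {
  return new Set(text.toLowerCase().split(/\W+/).filter(Boolean));
}

// Jaccard similarity: shared tokens divided by total distinct tokens.
function jaccard(a, b) {
  if (a.size === 0 && b.size === 0) {
    return 1;
  }
  let shared = 0;
  for (const token of a) {
    if (b.has(token)) {
      shared += 1;
    }
  }
  return shared / (a.size + b.size - shared);
}

// True when the current prompt nearly repeats one of the last few prompts.
function similarToRecentPrompt(
  currentPrompt,
  previousPrompts,
  threshold = 0.9,
  lookback = 5
) {
  const current = tokenSet(currentPrompt);
  return previousPrompts
    .slice(-lookback)
    .some((prev) => jaccard(current, tokenSet(prev)) >= threshold);
}
```

Word overlap misses paraphrases, so this only catches near-verbatim repeats. An embedding-based comparison would catch more, at the price of an extra call per check.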

No-progress detection

A run can consume steps without improving the outcome.

Track progress signals:

tests passing
errors decreasing
files changing meaningfully
plan items completing
user-defined success criteria improving

If nothing improves after several steps, stop.
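
If the signals above are folded into a single number per step, for example the count of passing tests, the stop rule becomes a comparison against a baseline. This is a minimal sketch with assumed names: `hasStalled` and `windowSize` are hypothetical, and higher scores are taken to mean better.

```javascript
// history: one numeric progress score per completed step,
// where higher means better (for example, passing test count).
function hasStalled(history, windowSize = 3) {
  if (history.length <= windowSize) {
    return false; // too early to judge
  }
  // Score just before the window started.
  const baseline = history[history.length - windowSize - 1];
  const recent = history.slice(-windowSize);
  // Stalled when no step in the window beat the baseline.
  return recent.every((score) => score <= baseline);
}

const stuck = hasStalled([2, 5, 5, 5, 5]); // true: three steps, no gain over 5
const moving = hasStalled([2, 5, 5, 5, 6]); // false: the last step improved
```

Regressions count as no progress here, so a run that keeps breaking and then repairing the same test will also be stopped.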

Why this matters

OpenAI’s post says agents change the unit of knowledge work from single interactions to delegated, long-horizon tasks. Agents can operate independently for minutes or hours while using tools, interacting with environments, and iterating toward solutions.

That is exactly why runtime control matters.

A chatbot can fail and wait for the next user message.

An agent can fail and continue.

That continuation is the risk.

Where AI CostGuard fits

AI CostGuard is the local-first TypeScript / Node.js runtime safety layer I’m building for this problem.

It is designed to stop agent failure modes before provider calls execute:

retry storms
prompt loops
max-step explosions
no-progress runs
budget overruns
unknown model pricing
runaway agent behavior

The core question is:

Should this next provider call be allowed?

If no, the runtime should stop with a structured reason.

Not after the invoice.

Before the call.

When agents run for minutes or hours, cost control becomes runtime control.

When agents run in parallel, cost control becomes coordination.

Start with one practical rule:

Never call the provider before asking whether the next call is still allowed.

Add a pre-call decision object to your agent loop before adding another dashboard.
https://github.com/salimassili62-afk/ai-costguard

Top comments (6)

Nazar Boyko • Jun 27

Where this gets slippery is the moment two workers check the shared budget at the same instant. Both read "enough left," both pass the guard, and both fire, so the shared ceiling gets blown by exactly the parallelism the guard was meant to protect. The per-run budget is fine because only one worker ever touches it, but the shared one really wants the worker to subtract its estimated cost before the call and reconcile the real number after, otherwise checking and then calling leaves a gap the other workers can slip through. Curious how you handle that part in CostGuard.

Alex Shev • Jun 27

Parallel agents change the cost-control problem from accounting to admission control. A budget check after the run is just a receipt. The useful gate is before every provider call, with shared state across parallel workers, so one runaway branch cannot spend the budget while the others are still waiting.

Frank • Jun 27

This is a great point about pre-call budget
