OpenAI’s latest Codex usage data shows a clear shift from short assistant interactions to longer, delegated agent work.
By May 2026, 80.6% of sampled individual Codex users had made at least one request estimated to exceed 30 minutes of human work. 70.2% had made one estimated to exceed one hour.
The more interesting detail:
By June 2026, the 99th percentile of daily active OpenAI users regularly generated more than 60 hours of Codex agent turns per day, distributed across multiple parallel agents.
That is the engineering lesson.
A parallel agent workflow is not a prompt.
It is a runtime system.
Runtime systems need budgets.
The naive version
A simple agent loop often looks like this:
while (!task.done) {
const result = await provider.call({
model: task.model,
messages: task.messages,
});
task = await applyAgentStep(task, result);
}
This is easy to write.
It is also missing the controls that matter in production.
No max-step limit.
No budget check.
No retry-storm detection.
No prompt-loop detection.
No unknown-pricing block.
No no-progress stop.
Now imagine running many of these in parallel.
await Promise.all(tasks.map(runAgent));
This is where the failure mode changes.
One agent overspending is visible.
Ten agents each overspending slightly can look like normal usage until the bill or queue pressure shows up.
The better shape
Before every provider call, the runtime should make a decision.
const decision = guard.beforeCall({
runId: task.id,
workflowId: task.workflowId,
model: task.model,
messages: task.messages,
stepCount: task.steps.length,
retryCount: task.retryCount,
budgetRemaining: task.budgetRemaining,
sharedBudgetRemaining: workflow.budgetRemaining,
previousPrompts: task.previousPrompts,
progressState: task.progress,
});
if (!decision.allowed) {
return {
status: "stopped",
reason: decision.reason,
error: decision.error,
};
}
const result = await provider.call({
model: task.model,
messages: task.messages,
});
The API shape does not matter.
The placement matters.
The guard runs before the provider call.
That gives the runtime a chance to stop the next unit of spend before it exists.
What should be checked?
Known model pricing
If the runtime does not know the model price, it cannot enforce a reliable budget.
if (!pricingCatalog.has(model)) {
return {
allowed: false,
reason: "unknown_model_pricing",
};
}
Do not guess.
Fail closed.
Per-run budget
Each agent run needs its own budget.
if (estimatedNextCallCost > runBudgetRemaining) {
return {
allowed: false,
reason: "run_budget_exceeded",
};
}
This stops one confused task from consuming more than it should.
Shared workflow budget
Parallel agents also need a shared ceiling.
if (estimatedNextCallCost > workflowBudgetRemaining) {
return {
allowed: false,
reason: "workflow_budget_exceeded",
};
}
This matters because parallel waste is harder to notice.
Each worker may look reasonable locally while the workflow burns too much globally.
Max-step limit
Agents should not run forever.
if (stepCount >= maxSteps) {
return {
allowed: false,
reason: "max_steps_exceeded",
};
}
Simple controls are often the most valuable.
Retry-storm detection
Retries are useful until they become the workload.
if (retryCount > maxRetries && recentErrorsAreSimilar(errors)) {
return {
allowed: false,
reason: "retry_storm_detected",
};
}
The goal is not to ban retries.
The goal is to prevent blind retries.
Prompt-loop detection
If the current prompt is too similar to earlier failed prompts, the agent may be stuck.
if (similarToRecentPrompt(currentPrompt, previousPrompts)) {
return {
allowed: false,
reason: "similar_prompt_loop",
};
}
This catches a common agent failure:
the system looks active, but it is asking the same question again.
No-progress detection
A run can consume steps without improving the outcome.
Track progress signals:
tests passing
errors decreasing
files changing meaningfully
plan items completing
user-defined success criteria improving
If nothing improves after several steps, stop.
Why this matters
OpenAI’s post says agents change the unit of knowledge work from single interactions to delegated, long-horizon tasks. Agents can operate independently for minutes or hours while using tools, interacting with environments, and iterating toward solutions.
That is exactly why runtime control matters.
A chatbot can fail and wait for the next user message.
An agent can fail and continue.
That continuation is the risk.
Where AI CostGuard fits
AI CostGuard is the local-first TypeScript / Node.js runtime safety layer I’m building for this problem.
It is designed to stop agent failure modes before provider calls execute:
retry storms
prompt loops
max-step explosions
no-progress runs
budget overruns
unknown model pricing
runaway agent behavior
The core question is:
Should this next provider call be allowed?
If no, the runtime should stop with a structured reason.
Not after the invoice.
Before the call.
When agents run for minutes or hours, cost control becomes runtime control.
When agents run in parallel, cost control becomes coordination.
Start with one practical rule:
Never call the provider before asking whether the next call is still allowed.
Add a pre-call decision object to your agent loop before adding another dashboard.
https://github.com/salimassili62-afk/ai-costguard
Top comments (2)
Parallel agents change the cost-control problem from accounting to admission control. A budget check after the run is just a receipt. The useful gate is before every provider call, with shared state across parallel workers, so one runaway branch cannot spend the budget while the others are still waiting.
This is a great point about pre-call budget