DEV Community

Assili Salim


The AI-Agent Call You Should Block Before It Happens

The most expensive AI-agent call is not always the biggest one.

Sometimes it is the next one.

The one after the agent already failed.

The one after it retried the same operation.

The one after it called tools without progress.

The one after the prompt changed slightly but the task did not move forward.

That provider call may look valid by itself.

The model name may be correct.

The prompt may be well-formed.

The provider may return a normal response.

But inside the whole agent run, the call should not have happened.

That is why AI-agent cost control needs to happen before provider API calls execute.

Agents are loops

A basic LLM request is simple.

Input goes in.

Output comes back.

You can measure the cost afterward.

Agents are different.

An agent may:

call a model
call a tool
retry
modify the prompt
add more context
switch strategy
call another model
retry again

That loop is where cost failures appear.

One step can look reasonable.

Ten steps can look suspicious.

Forty steps can become a budget problem.

The runtime needs to understand the run, not just the individual call.
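As a rough illustration, run-level state can be as small as a list of call records plus a few derived totals. This is a minimal sketch, not AI CostGuard's actual data model. CallRecord, RunState, and their fields are hypothetical names chosen for this example.

```typescript
// Hypothetical per-call record; field names are illustrative only.
interface CallRecord {
  model: string;
  prompt: string;
  costUsd: number;
  ok: boolean;
}

// Hypothetical run-level state: every guard decision reads from the whole run.
class RunState {
  readonly calls: CallRecord[] = [];

  record(call: CallRecord): void {
    this.calls.push(call);
  }

  // Number of provider calls made so far in this run.
  get steps(): number {
    return this.calls.length;
  }

  // Total spend for this run, not for the month.
  get totalCostUsd(): number {
    return this.calls.reduce((sum, c) => sum + c.costUsd, 0);
  }

  // How many of the most recent calls failed back to back.
  get consecutiveFailures(): number {
    let n = 0;
    for (let i = this.calls.length - 1; i >= 0 && !this.calls[i].ok; i--) n++;
    return n;
  }
}

const run = new RunState();
run.record({ model: "model-a", prompt: "summarize report", costUsd: 0.01, ok: true });
run.record({ model: "model-a", prompt: "summarize report again", costUsd: 0.01, ok: false });
run.record({ model: "model-a", prompt: "summarize the report again", costUsd: 0.01, ok: false });
console.log(run.steps, run.totalCostUsd.toFixed(2), run.consecutiveFailures); // logs: 3 0.03 2
```

Each call on its own looks fine. Only the run-level view shows two failures in a row on nearly the same prompt.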

The boring failure modes

Most AI-agent cost failures are not dramatic.

They are usually boring.

Examples:

retry storms
similar prompt loops
max-step explosions
unknown model pricing
no-progress runs
budget overruns

These are not only billing issues.

They are runtime-control issues.

A billing dashboard can tell you that the failure happened.

A runtime guard can stop the next call before it happens.

That distinction matters.
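Retry storms are the easiest case to picture. Below is a minimal sketch, assuming a retry storm means the same operation failing several times in a row. RetryStormGuard, the operation key format, and the threshold are hypothetical, not part of any published API.

```typescript
// Hypothetical retry-storm guard: blocks an operation after too many consecutive failures.
class RetryStormGuard {
  private failures = new Map<string, number>();
  private readonly maxConsecutiveFailures: number;

  constructor(maxConsecutiveFailures: number) {
    this.maxConsecutiveFailures = maxConsecutiveFailures;
  }

  // Call before each attempt; returns false when the next attempt should be blocked.
  allow(operationKey: string): boolean {
    return (this.failures.get(operationKey) ?? 0) < this.maxConsecutiveFailures;
  }

  // Call after each attempt; a success resets the failure streak for that operation.
  report(operationKey: string, succeeded: boolean): void {
    if (succeeded) {
      this.failures.delete(operationKey);
    } else {
      this.failures.set(operationKey, (this.failures.get(operationKey) ?? 0) + 1);
    }
  }
}

const guard = new RetryStormGuard(3);
const decisions: boolean[] = [];
for (let attempt = 0; attempt < 5; attempt++) {
  const allowed = guard.allow("search:weather");
  decisions.push(allowed);
  if (!allowed) break;
  guard.report("search:weather", false); // simulate a failing tool call
}
console.log(decisions); // [ true, true, true, false ]
```

The fourth attempt never reaches the provider. A dashboard would only have shown the three failed calls after the fact.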

A provider success can still be a runtime failure

This is easy to miss.

A provider API call can succeed and still be the wrong decision.

The API can return 200.

The model can generate text.

The tool can execute.

The trace can look clean.

But the agent may still be stuck.

A naive agent loop looks simple: call the model, run any tools, repeat until done.

But it leaves important questions unanswered.

How many steps are allowed?

How much budget can this run spend?

Are the prompts becoming too similar?

Is the agent making progress?

Is the model price known?

Should the next call be blocked?

The exact guard API is not the point.

The placement is the point.

The check happens before the provider call.
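One way to picture that placement is a wrapper that runs every check first and only then invokes the provider. This is a hedged sketch: guardedCall, checkBeforeCall, Check, BlockedCallError, and the fake provider are hypothetical names, and the real package's middleware may look quite different.

```typescript
// Hypothetical request shape; real provider requests carry many more fields.
interface ProviderRequest {
  model: string;
  prompt: string;
}

// A check returns a reason to block the call, or null to let it through.
type Check = (req: ProviderRequest) => string | null;

// A structured error so callers can tell "blocked by the guard" apart from provider errors.
class BlockedCallError extends Error {
  readonly reason: string;
  constructor(reason: string) {
    super(`Provider call blocked: ${reason}`);
    this.name = "BlockedCallError";
    this.reason = reason;
  }
}

// Runs every check; throws before anything is sent to the provider.
function checkBeforeCall(req: ProviderRequest, checks: Check[]): void {
  for (const check of checks) {
    const reason = check(req);
    if (reason !== null) throw new BlockedCallError(reason);
  }
}

// The provider is only invoked if every check passed.
async function guardedCall(
  req: ProviderRequest,
  checks: Check[],
  provider: (req: ProviderRequest) => Promise<string>,
): Promise<string> {
  checkBeforeCall(req, checks);
  return provider(req);
}

// Usage with a fake provider that counts how often it is actually called.
let providerCalls = 0;
const fakeProvider = async (req: ProviderRequest): Promise<string> => {
  providerCalls++;
  return `echo: ${req.prompt}`;
};
const knownModelsOnly: Check = (req) =>
  req.model === "example-small" ? null : `unknown model "${req.model}"`;

guardedCall({ model: "example-smal", prompt: "hi" }, [knownModelsOnly], fakeProvider).catch(
  (err) => console.log(err instanceof BlockedCallError, providerCalls), // true 0
);
```

The blocked request rejects with a BlockedCallError and the provider counter stays at zero: the decision was made before any tokens were spent.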

What should be checked before the call?

A useful runtime guard can check simple things.

Is the model price known?

If the runtime does not know the price of a model, guessing is risky.

A typo, alias, fallback, or provider change can break cost assumptions.

A safer default is to fail closed: block the call until the model has a known price.
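A fail-closed price lookup can be very small. This sketch assumes prices are kept in a local table in USD per million tokens. The model names and prices are placeholders, not real provider rates, and priceFor and estimateCostUsd are hypothetical helpers.

```typescript
// Hypothetical local price table, USD per 1M tokens. Names and numbers are placeholders.
interface ModelPrice {
  inputPerMTok: number;
  outputPerMTok: number;
}

const PRICES = new Map<string, ModelPrice>([
  ["example-small", { inputPerMTok: 0.5, outputPerMTok: 1.5 }],
  ["example-large", { inputPerMTok: 5, outputPerMTok: 15 }],
]);

// Fail closed: an unknown model is an error, never a guessed or default price.
function priceFor(model: string): ModelPrice {
  const price = PRICES.get(model);
  if (price === undefined) {
    throw new Error(`Unknown pricing for model "${model}"; blocking call`);
  }
  return price;
}

// Worst-case estimate: assumes the model uses its full output-token allowance.
function estimateCostUsd(model: string, inputTokens: number, maxOutputTokens: number): number {
  const p = priceFor(model);
  return (inputTokens * p.inputPerMTok + maxOutputTokens * p.outputPerMTok) / 1_000_000;
}

console.log(estimateCostUsd("example-small", 2000, 500)); // 0.00175
try {
  estimateCostUsd("example-smal", 2000, 500); // typo in the model name
} catch (err) {
  console.log((err as Error).message);
}
```

A Map is used instead of a plain object so that a name such as "toString" cannot resolve to an inherited property and slip past the check.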

Has the run exceeded its budget?

A task-level budget is different from a monthly invoice.

It asks:

How much is this specific run allowed to spend?

Once that limit is reached, the next provider call should not happen.
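A minimal sketch of a per-run budget check, assuming the guard compares a worst-case estimate for the next call against what the run has already spent. RunBudget and its methods are hypothetical. The estimate would typically come from the input token count and the maximum allowed output tokens.

```typescript
// Hypothetical per-run budget: compare a worst-case estimate before the call, charge after.
class RunBudget {
  private spentUsd = 0;
  private readonly limitUsd: number;

  constructor(limitUsd: number) {
    this.limitUsd = limitUsd;
  }

  // Returns false if this call could push the run past its limit.
  canAfford(estimatedUsd: number): boolean {
    return this.spentUsd + estimatedUsd <= this.limitUsd;
  }

  // Record the actual cost once the provider has responded.
  charge(actualUsd: number): void {
    this.spentUsd += actualUsd;
  }

  get remainingUsd(): number {
    return Math.max(0, this.limitUsd - this.spentUsd);
  }
}

const budget = new RunBudget(0.1);
budget.charge(0.06);
console.log(budget.canAfford(0.03)); // true
console.log(budget.canAfford(0.05)); // false
```

Checking a worst-case estimate rather than an average errs on the side of blocking, since the actual cost is only known after the provider responds.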

Has the agent exceeded max steps?

Max-step protection is basic but important.

An agent that cannot finish within a reasonable number of steps may be stuck.

Letting it run forever is not intelligence.

It is missing control.
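Max-step protection can be a counter that is consumed right before each provider call. StepLimiter is a hypothetical name for this sketch, and throwing a plain Error is a simplification.

```typescript
// Hypothetical step limiter: each provider call consumes one step from the run.
class StepLimiter {
  private used = 0;
  private readonly maxSteps: number;

  constructor(maxSteps: number) {
    this.maxSteps = maxSteps;
  }

  // Call immediately before each provider call; throws once the limit is reached.
  take(): number {
    if (this.used >= this.maxSteps) {
      throw new Error(`Max steps (${this.maxSteps}) reached; stopping run`);
    }
    this.used += 1;
    return this.used;
  }
}

const steps = new StepLimiter(3);
let completed = 0;
try {
  while (true) {
    steps.take();
    completed++; // the provider call would happen here
  }
} catch (err) {
  console.log(completed, (err as Error).message); // logs: 3 Max steps (3) reached; stopping run
}
```

Even an agent that would otherwise loop forever stops after exactly three calls.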

Is the prompt too similar?

A similar prompt loop happens when the agent keeps asking nearly the same thing with small changes.

This can burn tokens without producing new information.

The runtime should detect that pattern.
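The post does not say how AI CostGuard measures similarity, so the sketch below swaps in one simple, well-known technique: Jaccard similarity over lowercase word sets, compared against a small window of recent prompts. The threshold, window size, and class name are assumptions for illustration.

```typescript
// Lowercased word set; punctuation and whitespace act as separators.
function wordSet(text: string): Set<string> {
  return new Set(text.toLowerCase().split(/\W+/).filter((w) => w.length > 0));
}

// Jaccard similarity |A ∩ B| / |A ∪ B|, from 0 (disjoint) to 1 (same word set).
function jaccard(a: string, b: string): number {
  const sa = wordSet(a);
  const sb = wordSet(b);
  if (sa.size === 0 && sb.size === 0) return 1;
  let shared = 0;
  for (const w of sa) {
    if (sb.has(w)) shared++;
  }
  return shared / (sa.size + sb.size - shared);
}

// Hypothetical detector: flags a prompt that is near-identical to any recent prompt.
class SimilarPromptDetector {
  private readonly recent: string[] = [];
  private readonly threshold: number;
  private readonly windowSize: number;

  constructor(threshold: number, windowSize: number) {
    this.threshold = threshold;
    this.windowSize = windowSize;
  }

  // Returns true if the next call looks like a loop; remembers the prompt either way.
  isLoop(prompt: string): boolean {
    const loop = this.recent.some((p) => jaccard(p, prompt) >= this.threshold);
    this.recent.push(prompt);
    if (this.recent.length > this.windowSize) this.recent.shift();
    return loop;
  }
}

const detector = new SimilarPromptDetector(0.8, 5);
console.log(detector.isLoop("Find the weather in Paris today")); // false
console.log(detector.isLoop("Find the weather in Paris for today")); // true (6/7, about 0.86)
console.log(detector.isLoop("Write a haiku about autumn")); // false
```

Word-set overlap ignores word order and meaning. Embeddings or character n-grams would catch more paraphrases at a higher cost, and the threshold needs tuning, since legitimately similar prompts (for example, paginated queries) can also score high.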

Is the agent making progress?

No-progress runs are expensive because they look active.

The agent is doing things.

But the task is not moving forward.

A guard should be able to stop the run before the loop turns into wasted spend.
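Progress is hard to define in general. This sketch assumes one crude proxy: a step counts as progress only if it produces an observation (a tool result or extracted fact) that has not been seen earlier in the run, after trimming and lowercasing. ProgressTracker and the stall threshold are hypothetical.

```typescript
// Hypothetical no-progress detector: a step "makes progress" only if it yields
// an observation not seen earlier in the run. What counts as an observation is an assumption.
class ProgressTracker {
  private readonly seen = new Set<string>();
  private stalledSteps = 0;
  private readonly maxStalledSteps: number;

  constructor(maxStalledSteps: number) {
    this.maxStalledSteps = maxStalledSteps;
  }

  // Record the observation produced by a step (e.g. a tool result or extracted fact).
  observe(observation: string): void {
    const key = observation.trim().toLowerCase();
    if (this.seen.has(key)) {
      this.stalledSteps++;
    } else {
      this.seen.add(key);
      this.stalledSteps = 0;
    }
  }

  // Check before the next provider call.
  shouldStop(): boolean {
    return this.stalledSteps >= this.maxStalledSteps;
  }
}

const progress = new ProgressTracker(2);
const observations = ["found 3 flights", "cheapest is $120", "cheapest is $120", "Cheapest is $120 "];
const stopAfter: boolean[] = [];
for (const obs of observations) {
  progress.observe(obs);
  stopAfter.push(progress.shouldStop());
}
console.log(stopAfter); // [ false, false, false, true ]
```

The agent stays busy the whole time, but after two steps that only repeat what it already knew, the guard stops the next call.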

Where AI CostGuard fits

This is the layer I am building with AI CostGuard.

AI CostGuard is a local-first TypeScript / Node.js runtime safety layer for AI agents.

It is designed to catch cost and loop failures before provider API calls execute.

It currently focuses on:

retry storm detection
similar prompt loop detection
unknown model pricing blocks
max-step protection
budget guards
middleware and wrappers
structured errors

It is not a billing ledger.

It is not a hard security boundary.

It is not an enterprise firewall.

It is a pre-call runtime kill switch for AI-agent cost and loop failures.

The package is public as @salimassili/ai-costguard.

The takeaway

Cheaper tokens help.

Better routing helps.

Caching helps.

Dashboards help.

But none of them replace runtime limits.

For AI agents, the important question is not only:

How much did this model call cost?

It is:

Should this next provider call be allowed?

That question should be answered before execution.

https://github.com/salimassili62-afk/ai-costguard
