A pattern keeps showing up in production agent systems:
the agent is cross-cutting, but the controls usually are not.
One agent run can touch:
- multiple model providers
- multiple tools
- multiple tenants
- multiple workers
But the controls teams usually start with still live inside one slice:
- provider spending caps
- observability tools
- framework loop limits
- some Redis counter somebody wrote in an afternoon
Those controls are useful. They solve real problems.
They just do not answer the actual runtime question:
Can this agent, for this customer, on this worker, take the next action right now?
That is not a missing feature in one product.
It is a layer problem.
The shape of the problem
Take a normal SaaS setup.
You have an agent feature serving many customers. A single run might use:
- OpenAI for reasoning
- Anthropic for long context
- a cheaper model for embeddings
- a search API
- a CRM client
- an outbound mailer
- a payment API
- a vector store
- a web fetcher
And it does not all happen in one process. It runs on stateless workers, with retries, fan-out, and concurrent jobs hitting shared budgets.
Now ask a simple question:
Where do we put the budget cap?
That is where things get messy.
Why the obvious controls fall short
1. Provider spending caps
Provider caps are useful safety nets.
They can stop a catastrophic monthly bill on that provider.
But they only see their own billing surface.
An OpenAI cap does not know:
- what you spent on Anthropic
- what the search API billed
- which customer the run belonged to
- whether this is one runaway workflow or normal traffic across fifty tenants
Provider caps govern your spend on one provider’s account.
They do not govern your application’s full runtime surface.
2. Observability tools
Observability tools are also useful.
They help you answer:
- what happened
- where cost went
- which step was slow
- which tool was called
- where the run failed
But they answer those questions after the action already happened.
That is the constraint.
A trace can tell you an agent burned through budget over the weekend. It cannot stop the bad call at iteration 50, while the damage is still small.
Alerts reduce reaction time.
They do not create a pre-execution decision point.
3. Framework limits
Frameworks often provide things like:
- max iterations
- max execution time
- step limits
- tool-call ceilings
Those are sensible defaults.
But they are still local.
They usually apply to one orchestrator instance. Not to:
- multiple workers
- multiple retries
- multiple runs hitting the same budget
- multi-tenant shared infrastructure
They also speak in loop counts, not in money or blast radius.
A loop limit does not know whether the next step costs $0.001 or $4.00.
It does not know whether the action is a harmless read or a customer-facing mutation.
4. DIY counters
This is the classic path.
A team writes a Redis-backed counter. Increment per call. Check before request. Done.
It works for a sprint.
Then reality shows up:
- TOCTOU (time-of-check to time-of-use) races under concurrency
- drift across providers
- retries that double-count or under-count
- counters lost on worker crashes
- awkward logic around estimate vs actual
- tenant isolation problems
- a growing pile of exceptions and patches
What started as “just a counter” turns into a distributed transactional system.
That is usually the moment people realize they are not building a helper anymore. They are building a control plane.
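To make the first failure mode concrete, here is a minimal sketch of the check-then-increment pattern. `NaiveBudget` is a hypothetical in-memory stand-in for the shared counter, not any real client API, and the interleaving is written out by hand so the result is deterministic.

```python
# Hypothetical in-memory stand-in for a shared counter (e.g. a Redis key).
class NaiveBudget:
    def __init__(self, limit_cents: int) -> None:
        self.limit_cents = limit_cents
        self.spent_cents = 0

    def check(self, cost_cents: int) -> bool:
        # Step 1: read the counter and decide.
        return self.spent_cents + cost_cents <= self.limit_cents

    def record(self, cost_cents: int) -> None:
        # Step 2: write the counter, as a separate operation from the check.
        self.spent_cents += cost_cents


budget = NaiveBudget(limit_cents=100)

# Two workers each want to spend 80 cents. Both run the check before
# either one records, which is an interleaving that concurrent workers
# will eventually produce.
worker_a_ok = budget.check(80)
worker_b_ok = budget.check(80)

if worker_a_ok:
    budget.record(80)
if worker_b_ok:
    budget.record(80)

overspent = budget.spent_cents > budget.limit_cents
```

Both checks pass because neither worker has written yet, so the counter ends above its own limit. Nothing in the counter is broken. The gap between the check and the write is the bug.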
The structural mismatch
All of those tools share the same limit.
They govern themselves, not the full agent.
That is the mismatch.
| Control | What it governs |
|---|---|
| Provider cap | One provider’s billing surface |
| Observability | A read path into events and traces |
| Framework limit | One orchestrator instance |
| DIY counter | One local or semi-shared budget view |
An agent, on the other hand, spans all of them.
So the control layer has to span the same surface the agent spans.
Anything inside only one dimension is a partial view.
And a partial view is not governance.
What the missing layer actually needs
Once you accept that the control layer has to live outside any one provider, tool, framework, or worker, the requirements become pretty clear.
External authority
The decision point has to live outside any one runtime slice.
Otherwise it will always be blind to the rest of the system.
Atomic distributed reservations
Two workers cannot both think the same remaining budget is available.
The concurrency problem has to be solved at the control layer itself, not patched afterward.
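A minimal single-process sketch of that requirement: the check and the write happen as one step, so only one of several competing workers gets the budget. `AtomicBudget` is a hypothetical name, and the `threading.Lock` here only stands in for whatever atomic operation a shared store would provide across machines (for example a server-side script or a database transaction).

```python
import threading


class AtomicBudget:
    """Hypothetical budget where check and reserve are one atomic step."""

    def __init__(self, limit_cents: int) -> None:
        self._limit = limit_cents
        self._reserved = 0
        self._lock = threading.Lock()

    def try_reserve(self, cost_cents: int) -> bool:
        # The lock makes "is there room?" and "take it" indivisible.
        with self._lock:
            if self._reserved + cost_cents > self._limit:
                return False
            self._reserved += cost_cents
            return True

    @property
    def reserved_cents(self) -> int:
        with self._lock:
            return self._reserved


budget = AtomicBudget(limit_cents=100)
results = []
results_lock = threading.Lock()


def worker() -> None:
    ok = budget.try_reserve(80)
    with results_lock:
        results.append(ok)


# Eight workers compete for a budget that fits only one 80-cent action.
threads = [threading.Thread(target=worker) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()

granted = sum(results)  # number of workers that got a reservation
```

Whichever thread wins, exactly one reservation is granted and the reserved total never exceeds the limit.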
Hierarchical scope
The same primitive should work across:
- tenant
- workspace
- app
- workflow
- agent
- toolset
That is how you answer both:
- how much can this customer spend?
- how much can this single run spend?
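One way to picture this is a chain of scopes where a reservation succeeds only if every level has room. The sketch below is an assumption about how such a primitive could look, not a reference design: `Scope`, `try_reserve`, and the scope names are all hypothetical, and it ignores concurrency to keep the hierarchy visible.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Scope:
    name: str
    limit_cents: int
    reserved_cents: int = 0

    def has_room(self, cost_cents: int) -> bool:
        return self.reserved_cents + cost_cents <= self.limit_cents


def try_reserve(chain: List[Scope], cost_cents: int) -> Optional[str]:
    """Reserve on every scope in the chain, or on none of them.

    Returns the name of the first scope that blocks, or None on success.
    """
    for scope in chain:
        if not scope.has_room(cost_cents):
            return scope.name
    # Only mutate once every level has agreed.
    for scope in chain:
        scope.reserved_cents += cost_cents
    return None


tenant = Scope("tenant:acme", limit_cents=1000)
run = Scope("run:42", limit_cents=50)
chain = [tenant, run]

first = try_reserve(chain, 40)   # fits both the tenant and the run
second = try_reserve(chain, 40)  # tenant has room, the run does not
```

The second call is blocked by the run-level scope even though the tenant has plenty left, and the tenant's total is untouched by the failed attempt.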
Reserve, commit, release
You need to hold budget before the action runs, then settle with the actual amount after.
Otherwise you end up with one of two bad choices:
- optimistic execution with bad enforcement
- conservative blocking with permanently stranded budget
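The lifecycle can be sketched as a small ledger. Everything here is illustrative: `Ledger` and its method names are hypothetical, amounts are integer cents, and the sketch does not decide what to do when the actual cost exceeds the estimate, which a real system would need a policy for.

```python
import itertools
from typing import Dict, Optional


class Ledger:
    """Hypothetical reserve/commit/release ledger for one budget."""

    def __init__(self, limit_cents: int) -> None:
        self.limit_cents = limit_cents
        self.spent_cents = 0
        self._holds: Dict[int, int] = {}  # reservation id -> held amount
        self._ids = itertools.count(1)

    @property
    def available_cents(self) -> int:
        return self.limit_cents - self.spent_cents - sum(self._holds.values())

    def reserve(self, estimate_cents: int) -> Optional[int]:
        # Hold the estimate before the action runs.
        if estimate_cents > self.available_cents:
            return None
        reservation_id = next(self._ids)
        self._holds[reservation_id] = estimate_cents
        return reservation_id

    def commit(self, reservation_id: int, actual_cents: int) -> None:
        # Settle with the real cost; the unused part of the hold frees up.
        self._holds.pop(reservation_id)
        self.spent_cents += actual_cents

    def release(self, reservation_id: int) -> None:
        # The action never ran (or failed): give the whole hold back.
        self._holds.pop(reservation_id, None)


ledger = Ledger(limit_cents=100)

rid = ledger.reserve(60)          # hold 60 before the call
blocked = ledger.reserve(60)      # a second 60 does not fit while held
ledger.commit(rid, 35)            # the call actually cost 35
after_commit = ledger.available_cents

rid2 = ledger.reserve(60)         # now fits, since only 35 was spent
ledger.release(rid2)              # action cancelled, nothing spent
after_release = ledger.available_cents
```

Committing the actual amount is what avoids stranded budget: the 25 cents of unused hold become available again as soon as the first call settles.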
More than binary allow/deny
Real systems need graceful degradation.
Sometimes the right answer is not just ALLOW or DENY.
Sometimes it is:
- allow, but cap tools
- allow, but downgrade the model
- allow, but reduce context
- allow, but disable optional steps
That requires a real decision layer, not just a counter.
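As a rough illustration, a decision can carry constraints instead of a bare boolean. The names (`Verdict`, `Decision`, `decide`), the 25% threshold, and the specific caps below are all assumptions made up for this sketch.

```python
from dataclasses import dataclass, field
from enum import Enum


class Verdict(Enum):
    ALLOW = "allow"
    ALLOW_WITH_CAPS = "allow_with_caps"
    DENY = "deny"


@dataclass(frozen=True)
class Decision:
    verdict: Verdict
    caps: dict = field(default_factory=dict)


def decide(limit_cents: int, remaining_cents: int, estimate_cents: int) -> Decision:
    """Hypothetical policy: degrade once less than 25% of budget would remain."""
    if estimate_cents > remaining_cents:
        return Decision(Verdict.DENY)

    left_after = remaining_cents - estimate_cents
    if left_after * 4 < limit_cents:
        # Still allowed, but with a cheaper model and fewer tool calls.
        return Decision(
            Verdict.ALLOW_WITH_CAPS,
            caps={"model_tier": "small", "max_tool_calls": 2},
        )
    return Decision(Verdict.ALLOW)


healthy = decide(limit_cents=1000, remaining_cents=800, estimate_cents=50)
tight = decide(limit_cents=1000, remaining_cents=200, estimate_cents=50)
exhausted = decide(limit_cents=1000, remaining_cents=30, estimate_cents=50)
```

The caller then has something to act on: it can switch models or trim optional steps when the decision comes back with caps, rather than failing the run outright.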
Provider- and framework-agnostic design
The control primitive cannot care whether the next action is:
- an OpenAI call
- an Anthropic call
- a Stripe charge
- an outbound email
- a search query
- a database write
If the agent spans all of them, governance has to as well.
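In code, that could mean the control layer only ever sees a generic description of the action, never a provider SDK. A minimal sketch, assuming a hypothetical `Action` record and a `guarded` wrapper; the `kind` strings and the toy `authorize` function are illustrative only.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class Action:
    kind: str            # free-form label, e.g. "llm.call" or "email.send"
    tenant: str
    estimate_cents: int


def guarded(
    action: Action,
    authorize: Callable[[Action], bool],
    run: Callable[[], Any],
) -> Any:
    """Ask for permission first; only then execute the real call."""
    if not authorize(action):
        raise PermissionError(f"{action.kind} denied for {action.tenant}")
    return run()


# Toy authority: one remaining-budget number per tenant.
remaining = {"acme": 100}


def authorize(action: Action) -> bool:
    if action.estimate_cents > remaining[action.tenant]:
        return False
    remaining[action.tenant] -= action.estimate_cents
    return True


# The same gate wraps a model call and an outbound email alike.
llm_result = guarded(Action("llm.call", "acme", 60), authorize, lambda: "summary text")

try:
    guarded(Action("email.send", "acme", 60), authorize, lambda: "sent")
    email_blocked = False
except PermissionError:
    email_blocked = True
```

The gate never inspects what `run` does. It only needs to know who is acting, what kind of action it is, and roughly what it will cost.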
This is a layer, not a feature
It is tempting to think the gap closes with one more feature release.
Maybe observability tools add enforcement.
Maybe providers add richer caps.
Maybe frameworks add distributed counters.
Possible? Sure.
But that would turn each of those tools into a different category of system.
This is why I think the missing piece is not another feature inside one runtime component.
It is an external, cross-cutting authority layer.
Keep the provider cap.
Keep the observability stack.
Keep the framework guardrails.
Keep the local circuit breakers.
But add the layer that answers the one question none of those can answer on their own:
May this agent, for this customer, on this worker, take the next action right now?
That is the control question that actually matters in production.
And it is why agent governance has to be cross-cutting.