Albert Mavashev

Posted on Apr 12 • Originally published at runcycles.io

AI Agents Are Cross-Cutting. Your Controls Aren't.

#ai #devops #architecture #opensource

A pattern keeps showing up in production agent systems:

the agent is cross-cutting, but the controls usually are not.

One agent run can touch:

multiple model providers
multiple tools
multiple tenants
multiple workers

But the controls teams usually start with still live inside one slice:

provider spending caps
observability tools
framework loop limits
some Redis counter somebody wrote in an afternoon

Those controls are useful. They solve real problems.

They just do not answer the actual runtime question:

Can this agent, for this customer, on this worker, take the next action right now?

That is not a missing feature in one product.

It is a layer problem.

The shape of the problem

Take a normal SaaS setup.

You have an agent feature serving many customers. A single run might use:

OpenAI for reasoning
Anthropic for long context
a cheaper model for embeddings
a search API
a CRM client
an outbound mailer
a payment API
a vector store
a web fetcher

And it does not all happen in one process. It runs on stateless workers, with retries, fan-out, and concurrent jobs hitting shared budgets.

Now ask a simple question:

Where do we put the budget cap?

That is where things get messy.

Why the obvious controls fall short

1. Provider spending caps

Provider caps are useful safety nets.

They can stop a catastrophic monthly bill on that provider.

But they only see their own billing surface.

An OpenAI cap does not know:

what you spent on Anthropic
what the search API billed
which customer the run belonged to
whether this is one runaway workflow or normal traffic across fifty tenants

Provider caps govern the provider’s exposure to your account.

They do not govern your application’s full runtime surface.

2. Observability tools

Observability tools are also useful.

They help you answer:

what happened
where cost went
which step was slow
which tool was called
where the run failed

But they answer those questions after the action already happened.

That is the constraint.

A trace can tell you an agent burned through budget over the weekend. It cannot stop the bad call at iteration 50 when the damage was still small.

Alerts reduce reaction time.

They do not create a pre-execution decision point.

3. Framework limits

Frameworks often provide things like:

max iterations
max execution time
step limits
tool-call ceilings

Those are sensible defaults.

But they are still local.

They usually apply to one orchestrator instance. Not to:

multiple workers
multiple retries
multiple runs hitting the same budget
multi-tenant shared infrastructure

They also speak in loop counts, not in money or blast radius.

A loop limit does not know whether the next step costs $0.001 or $4.00.
It does not know whether the action is a harmless read or a customer-facing mutation.

4. DIY counters

This is the classic path.

A team writes a Redis-backed counter. Increment per call. Check before request. Done.

It works for a sprint.

Then reality shows up:

TOCTOU races under concurrency
drift across providers
retries that double-count or under-count
counters lost on worker crashes
awkward logic around estimate vs actual
tenant isolation problems
a growing pile of exceptions and patches

What started as “just a counter” turns into a distributed transactional system.

That is usually the moment people realize they are not building a helper anymore. They are building a control plane.

The structural mismatch

All of those tools share the same limit.

They govern themselves, not the full agent.

That is the mismatch.

Control	What it governs
Provider cap	One provider’s billing surface
Observability	A read path into events and traces
Framework limit	One orchestrator instance
DIY counter	One local or semi-shared budget view

An agent, on the other hand, spans all of them.

So the control layer has to span the same surface the agent spans.

Anything inside only one dimension is a partial view.

And a partial view is not governance.

What the missing layer actually needs

Once you accept that the control layer has to live outside any one provider, tool, framework, or worker, the requirements become pretty clear.

External authority

The decision point has to live outside any one runtime slice.

Otherwise it will always be blind to the rest of the system.

Atomic distributed reservations

Two workers cannot both think the same remaining budget is available.

The concurrency problem has to be solved at the control layer itself, not patched afterward.

Hierarchical scope

The same primitive should work across:

tenant
workspace
app
workflow
agent
toolset

That is how you answer both:

how much can this customer spend?
how much can this single run spend?

Reserve, commit, release

You need to hold budget before the action runs, then settle with the actual amount after.

Otherwise you end up with one of two bad choices:

optimistic execution with bad enforcement
conservative blocking with permanently stranded budget

More than binary allow/deny

Real systems need graceful degradation.

Sometimes the right answer is not just ALLOW or DENY.

Sometimes it is:

allow, but cap tools
allow, but downgrade the model
allow, but reduce context
allow, but disable optional steps

That requires a real decision layer, not just a counter.

Provider- and framework-agnostic design

The control primitive cannot care whether the next action is:

an OpenAI call
an Anthropic call
a Stripe charge
an outbound email
a search query
a database write

If the agent spans all of them, governance has to as well.

This is a layer, not a feature

It is tempting to think the gap closes with one more feature release.

Maybe observability tools add enforcement.
Maybe providers add richer caps.
Maybe frameworks add distributed counters.

Possible? Sure.

But that would turn each of those tools into a different category of system.

This is why I think the missing piece is not another feature inside one runtime component.

It is an external, cross-cutting authority layer.

Keep the provider cap.
Keep the observability stack.
Keep the framework guardrails.
Keep the local circuit breakers.

But add the layer that answers the one question none of those can answer on their own:

May this agent, for this customer, on this worker, take the next action right now?

That is the control question that actually matters in production.

And it is why agent governance has to be cross-cutting.

DEV Community