Albert Mavashev

Posted on • Originally published at runcycles.io

AI Agents Are Cross-Cutting. Your Controls Aren't.

A pattern keeps showing up in production agent systems:

the agent is cross-cutting, but the controls usually are not.

One agent run can touch:

  • multiple model providers
  • multiple tools
  • multiple tenants
  • multiple workers

But the controls teams usually start with still live inside one slice:

  • provider spending caps
  • observability tools
  • framework loop limits
  • some Redis counter somebody wrote in an afternoon

Those controls are useful. They solve real problems.

They just do not answer the actual runtime question:

Can this agent, for this customer, on this worker, take the next action right now?

That is not a missing feature in one product.

It is a layer problem.

The shape of the problem

Take a normal SaaS setup.

You have an agent feature serving many customers. A single run might use:

  • OpenAI for reasoning
  • Anthropic for long context
  • a cheaper model for embeddings
  • a search API
  • a CRM client
  • an outbound mailer
  • a payment API
  • a vector store
  • a web fetcher

And it does not all happen in one process. It runs on stateless workers, with retries, fan-out, and concurrent jobs hitting shared budgets.

Now ask a simple question:

Where do we put the budget cap?

That is where things get messy.

Why the obvious controls fall short

1. Provider spending caps

Provider caps are useful safety nets.

They can stop a catastrophic monthly bill on that provider.

But they only see their own billing surface.

An OpenAI cap does not know:

  • what you spent on Anthropic
  • what the search API billed
  • which customer the run belonged to
  • whether this is one runaway workflow or normal traffic across fifty tenants

Provider caps govern the provider’s exposure to your account.

They do not govern your application’s full runtime surface.

2. Observability tools

Observability tools are also useful.

They help you answer:

  • what happened
  • where cost went
  • which step was slow
  • which tool was called
  • where the run failed

But they answer those questions after the action already happened.

That is the constraint.

A trace can tell you an agent burned through budget over the weekend. It cannot stop the bad call at iteration 50 when the damage was still small.

Alerts reduce reaction time.

They do not create a pre-execution decision point.

3. Framework limits

Frameworks often provide things like:

  • max iterations
  • max execution time
  • step limits
  • tool-call ceilings

Those are sensible defaults.

But they are still local.

They usually apply to one orchestrator instance. Not to:

  • multiple workers
  • multiple retries
  • multiple runs hitting the same budget
  • multi-tenant shared infrastructure

They also speak in loop counts, not in money or blast radius.

A loop limit does not know whether the next step costs $0.001 or $4.00.
It does not know whether the action is a harmless read or a customer-facing mutation.
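To make that concrete, here is a minimal sketch (all names and numbers are hypothetical) of the difference between a loop-count guard and a cost-aware guard deciding whether the next step may run:

```python
# A loop limit treats every step the same; a cost-aware guard does not.

def loop_limit_allows(step: int, max_steps: int = 25) -> bool:
    # Sees only the iteration count, blind to what the step costs.
    return step < max_steps

def cost_guard_allows(spent: float, next_cost: float, budget: float) -> bool:
    # Sees money, not loop counts.
    return spent + next_cost <= budget

# Step 3 of 25 sails through the loop limit...
assert loop_limit_allows(step=3)
# ...even when that single step would blow the remaining budget.
assert not cost_guard_allows(spent=9.50, next_cost=4.00, budget=10.00)
```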

4. DIY counters

This is the classic path.

A team writes a Redis-backed counter. Increment per call. Check before request. Done.

It works for a sprint.

Then reality shows up:

  • TOCTOU races under concurrency
  • drift across providers
  • retries that double-count or under-count
  • counters lost on worker crashes
  • awkward logic around estimate vs actual
  • tenant isolation problems
  • a growing pile of exceptions and patches

What started as “just a counter” turns into a distributed transactional system.

That is usually the moment people realize they are not building a helper anymore. They are building a control plane.
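Here is the TOCTOU gap in its simplest form, a deterministic sketch with a plain dict standing in for the Redis counter (names are illustrative). Two workers check before acting, and nothing ties the check to the write:

```python
# Minimal sketch of the "just a counter" pattern and its check-then-act race.

store = {"budget_spent": 0.0}  # stand-in for a shared Redis counter
BUDGET = 10.0

def naive_check(cost: float) -> bool:
    # Check... (read)
    return store["budget_spent"] + cost <= BUDGET

def naive_commit(cost: float) -> None:
    # ...then act (write). Nothing links the two steps.
    store["budget_spent"] += cost

# Two workers interleave: both read the same remaining budget.
cost = 6.0
worker_a_ok = naive_check(cost)  # True: 0 + 6 <= 10
worker_b_ok = naive_check(cost)  # True: still sees 0 spent
if worker_a_ok:
    naive_commit(cost)
if worker_b_ok:
    naive_commit(cost)

assert store["budget_spent"] == 12.0  # 20% over budget; neither worker saw it coming
```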

The structural mismatch

All of those tools share the same limit.

They govern themselves, not the full agent.

That is the mismatch.

Control            What it governs
Provider cap       One provider’s billing surface
Observability      A read path into events and traces
Framework limit    One orchestrator instance
DIY counter        One local or semi-shared budget view

An agent, on the other hand, spans all of them.

So the control layer has to span the same surface the agent spans.

Anything inside only one dimension is a partial view.

And a partial view is not governance.

What the missing layer actually needs

Once you accept that the control layer has to live outside any one provider, tool, framework, or worker, the requirements become pretty clear.

External authority

The decision point has to live outside any one runtime slice.

Otherwise it will always be blind to the rest of the system.

Atomic distributed reservations

Two workers cannot both think the same remaining budget is available.

The concurrency problem has to be solved at the control layer itself, not patched afterward.
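A minimal in-memory sketch of what "solved at the control layer" means (class and method names are hypothetical): the check and the reservation are one atomic step behind a single lock, a stand-in for whatever primitive the real distributed store provides, so the second worker is refused up front instead of discovering the overdraft later:

```python
import threading

class BudgetLedger:
    """In-memory stand-in for a distributed ledger. One lock makes
    check-and-reserve a single atomic step."""

    def __init__(self, limit: float):
        self._limit = limit
        self._reserved = 0.0
        self._lock = threading.Lock()

    def try_reserve(self, amount: float) -> bool:
        with self._lock:
            # Check and reserve cannot be interleaved by another worker.
            if self._reserved + amount > self._limit:
                return False
            self._reserved += amount
            return True

ledger = BudgetLedger(limit=10.0)
assert ledger.try_reserve(6.0)      # first worker wins the budget
assert not ledger.try_reserve(6.0)  # second worker is refused, not surprised later
```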

Hierarchical scope

The same primitive should work across:

  • tenant
  • workspace
  • app
  • workflow
  • agent
  • toolset

That is how you answer both:

  • how much can this customer spend?
  • how much can this single run spend?
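One way to picture that, purely as an illustrative sketch (scope names and limits invented): a reservation succeeds only if every level of the scope chain, from tenant down to the single run, has headroom, so the customer-level and run-level questions are the same check:

```python
# Hypothetical hierarchical budgets: the same primitive at every level.

limits = {
    ("tenant", "acme"): 100.0,
    ("tenant", "acme", "workflow", "enrich"): 25.0,
    ("tenant", "acme", "workflow", "enrich", "run", "r-42"): 5.0,
}
spent = {scope: 0.0 for scope in limits}

def reserve(scope_chain, amount):
    # All-or-nothing: every ancestor scope must have headroom.
    for scope in scope_chain:
        if spent[scope] + amount > limits[scope]:
            return False
    for scope in scope_chain:
        spent[scope] += amount
    return True

chain = [
    ("tenant", "acme"),
    ("tenant", "acme", "workflow", "enrich"),
    ("tenant", "acme", "workflow", "enrich", "run", "r-42"),
]
assert reserve(chain, 4.0)      # fits at every level
assert not reserve(chain, 2.0)  # the run cap (5.0) is the binding constraint
```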

Reserve, commit, release

You need to hold budget before the action runs, then settle with the actual amount after.

Otherwise you end up with one of two bad choices:

  • optimistic execution with weak enforcement
  • conservative blocking that permanently strands budget
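A sketch of that lifecycle (class and method names are hypothetical): hold an estimate before the action, settle with the actual cost on success, and return the hold on failure so budget is never stranded:

```python
# Hypothetical reserve / commit / release lifecycle for a single budget.

class Budget:
    def __init__(self, limit: float):
        self.limit = limit
        self.committed = 0.0  # settled spend
        self.held = 0.0       # outstanding reservations

    def reserve(self, estimate: float) -> float:
        # Hold the estimate before the action runs.
        if self.committed + self.held + estimate > self.limit:
            raise RuntimeError("budget exhausted")
        self.held += estimate
        return estimate

    def commit(self, estimate: float, actual: float) -> None:
        # Settle with what the action really cost.
        self.held -= estimate
        self.committed += actual

    def release(self, estimate: float) -> None:
        # The action failed or was cancelled: return the hold.
        self.held -= estimate

budget = Budget(limit=10.0)
hold = budget.reserve(4.0)        # optimistic estimate
budget.commit(hold, actual=2.75)  # the call came in cheaper
assert budget.committed == 2.75 and budget.held == 0.0

hold = budget.reserve(4.0)
budget.release(hold)              # the call failed; nothing stays stranded
assert budget.held == 0.0
```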

More than binary allow/deny

Real systems need graceful degradation.

Sometimes the right answer is not just ALLOW or DENY.

Sometimes it is:

  • allow, but cap tools
  • allow, but downgrade the model
  • allow, but reduce context
  • allow, but disable optional steps

That requires a real decision layer, not just a counter.
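As an illustrative sketch (field names and thresholds invented), a decision object can carry constraints instead of a bare boolean, which is what makes graceful degradation expressible at all:

```python
from dataclasses import dataclass, field

@dataclass
class Decision:
    allow: bool
    constraints: dict = field(default_factory=dict)

def decide(remaining: float, estimated: float) -> Decision:
    if remaining <= 0:
        return Decision(allow=False)
    if estimated > remaining * 0.5:
        # Low headroom: allow, but degrade instead of denying outright.
        return Decision(allow=True, constraints={
            "model": "small",
            "max_tool_calls": 2,
            "optional_steps": False,
        })
    return Decision(allow=True)

d = decide(remaining=1.0, estimated=0.8)
assert d.allow and d.constraints["model"] == "small"
```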

Provider- and framework-agnostic design

The control primitive cannot care whether the next action is:

  • an OpenAI call
  • an Anthropic call
  • a Stripe charge
  • an outbound email
  • a search query
  • a database write

If the agent spans all of them, governance has to as well.
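One way to express that neutrality, again purely as a sketch with invented names and a toy policy rule: every action, whatever the provider, reduces to the same request shape before the same gate:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ActionRequest:
    scope: str            # e.g. "tenant/acme/run/r-42"
    estimated_cost: float
    mutating: bool        # harmless read vs customer-facing mutation

def check(req: ActionRequest, remaining: float) -> bool:
    # Same gate whether the action is an LLM call, a Stripe charge, or an email.
    if req.mutating and remaining < req.estimated_cost * 2:
        return False  # toy rule: mutations need extra headroom
    return req.estimated_cost <= remaining

openai_call = ActionRequest("tenant/acme/run/r-42", 0.12, mutating=False)
stripe_charge = ActionRequest("tenant/acme/run/r-42", 0.30, mutating=True)

assert check(openai_call, remaining=5.0)
assert check(stripe_charge, remaining=5.0)
assert not check(stripe_charge, remaining=0.5)  # too risky with that little headroom
```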

This is a layer, not a feature

It is tempting to think the gap closes with one more feature release.

Maybe observability tools add enforcement.
Maybe providers add richer caps.
Maybe frameworks add distributed counters.

Possible? Sure.

But that would turn each of those tools into a different category of system.

This is why I think the missing piece is not another feature inside one runtime component.

It is an external, cross-cutting authority layer.

Keep the provider cap.
Keep the observability stack.
Keep the framework guardrails.
Keep the local circuit breakers.

But add the layer that answers the one question none of those can answer on their own:

May this agent, for this customer, on this worker, take the next action right now?

That is the control question that actually matters in production.

And it is why agent governance has to be cross-cutting.
