Assili Salim

Posted on Jul 3

Gateway Routing Helps AI Apps. Agent Runtimes Still Need Pre-Call Guards.

#agents #ai #api #llm

Vercel added routing rules to AI Gateway on July 2.

Routing rules let teams rewrite or deny model requests at the gateway level. A rewrite rule serves a request for one model using another model. A deny rule blocks a model and returns a 403. Vercel lists use cases like rerouting when a model is down or retired, standardizing on one model, routing an expensive model to a cheaper one, or keeping a team off unapproved models.

That is useful infrastructure.

But if you are building AI agents, it is not the whole safety layer.

A gateway can decide where a request goes.

Your runtime still needs to decide whether the request should happen.

The difference

Gateway routing answers:

"Which model should serve this request?"

Runtime guarding answers:

"Should this request be allowed at all?"

Those are different problems.

A gateway may rewrite:

anthropic/claude-opus-4.8 -> anthropic/claude-haiku-4.5

That can keep traffic moving.

But the gateway may not know:

the agent already retried 12 times
the current prompt is nearly identical to previous failed prompts
the run exceeded its task budget
the agent passed its max-step limit
tool calls are happening without progress
the fallback model keeps the loop alive but does not improve the task

That context usually lives inside the agent runtime.

A naive agent loop

Many agent loops start like this:

while (!task.done) {
  const response = await provider.call({
    model: task.model,
    messages: task.messages,
  });

  task = await applyAgentStep(task, response);
}

This is simple.

It is also missing the controls that matter in production.

There is no budget check.

No max-step check.

No retry-storm detection.

No prompt-loop detection.

No unknown-pricing block.

No stop when the run makes no progress.

If a gateway rewrites the model, this loop may keep running.

That is not always what you want.

Add a pre-call guard

A safer pattern puts a local decision before the provider call:

const decision = guard.beforeCall({
  runId: task.id,
  model: task.model,
  messages: task.messages,
  stepCount: task.steps.length,
  retryCount: task.retryCount,
  previousPrompts: task.previousPrompts,
  budgetRemaining: task.budgetRemaining,
  progressState: task.progress,
});

if (!decision.allowed) {
  return {
    status: "stopped",
    reason: decision.reason,
    error: decision.error,
  };
}

const response = await provider.call({
  model: task.model,
  messages: task.messages,
});

The exact API does not matter.

The placement matters.

The guard runs before the provider call.
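
Here is a minimal sketch of that placement inside a complete loop. Every name in it (runGuardedLoop, Guard, Provider, isDone) is hypothetical, and the provider is injected as a plain async function so the loop can run without any real SDK.

```typescript
// Illustrative shapes only; this is not the API of any real library.
type Message = { role: "user" | "assistant"; content: string };

interface Task {
  model: string;
  messages: Message[];
  stepCount: number;
  done: boolean;
}

interface Decision {
  allowed: boolean;
  reason?: string;
}

interface RunResult {
  status: "done" | "stopped";
  reason?: string;
  task: Task;
}

type Guard = (task: Task) => Decision;
type Provider = (model: string, messages: Message[]) => Promise<string>;

async function runGuardedLoop(
  task: Task,
  guard: Guard,
  callProvider: Provider,
  isDone: (reply: string) => boolean,
): Promise<RunResult> {
  let current = task;
  while (!current.done) {
    // The guard runs first: a denied call never reaches the provider.
    const decision = guard(current);
    if (!decision.allowed) {
      return {
        status: "stopped",
        reason: decision.reason ?? "unspecified",
        task: current,
      };
    }

    const reply = await callProvider(current.model, current.messages);
    current = {
      ...current,
      messages: [...current.messages, { role: "assistant", content: reply }],
      stepCount: current.stepCount + 1,
      done: isDone(reply),
    };
  }
  return { status: "done", task: current };
}
```

With a guard that denies at three steps and a provider that never finishes the task, this loop makes exactly three provider calls and then returns a stopped result instead of running forever.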

What should the runtime check?

Known model pricing

If the runtime cannot price the model, it cannot enforce a reliable budget.

if (!pricingCatalog.has(model)) {
  return {
    allowed: false,
    reason: "unknown_model_pricing",
  };
}

This matters even more when routing and fallback rules exist.

A rewritten model still has a cost profile.

The runtime should know it.
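
One way to cover rewritten models is to check every model a routing rule might substitute, not just the requested one. This sketch assumes the runtime is told the possible rewrite targets up front, which is my assumption and not something a gateway guarantees. The model ids and prices are placeholders invented for illustration.

```typescript
interface ModelPrice {
  inputUsdPerMTok: number;
  outputUsdPerMTok: number;
}

interface PricingDecision {
  allowed: boolean;
  reason?: string;
  model?: string; // the model that could not be priced, if any
}

// Placeholder ids and prices. They do not describe any real model.
const pricingCatalog = new Map<string, ModelPrice>([
  ["example/large", { inputUsdPerMTok: 10, outputUsdPerMTok: 30 }],
  ["example/small", { inputUsdPerMTok: 1, outputUsdPerMTok: 4 }],
]);

// Deny the call if the requested model, or any model it may be
// rewritten to, is missing from the catalog.
function checkKnownPricing(
  requested: string,
  possibleRewrites: string[],
): PricingDecision {
  for (const model of [requested, ...possibleRewrites]) {
    if (!pricingCatalog.has(model)) {
      return { allowed: false, reason: "unknown_model_pricing", model };
    }
  }
  return { allowed: true };
}
```

Returning the unpriced model id alongside the reason makes the stop easier to act on: someone can add the price or fix the routing rule.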

Budget remaining

Task-level budgets are different from account-level limits.

if (estimatedNextCallCost > budgetRemaining) {
  return {
    allowed: false,
    reason: "budget_exceeded",
  };
}

A monthly dashboard can show spend later.

A runtime budget can stop the next call now.
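
The check above needs an estimate before the call happens. A rough sketch, assuming a crude four-characters-per-token heuristic and a worst case where the model uses its full output allowance. Real tokenizers differ, and all function names here are hypothetical.

```typescript
interface ModelPrice {
  inputUsdPerMTok: number;
  outputUsdPerMTok: number;
}

// Crude heuristic: about 4 characters per token. Real tokenizers differ.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Worst-case estimate: full prompt as input plus the maximum allowed output.
function estimateNextCallCost(
  prompt: string,
  maxOutputTokens: number,
  price: ModelPrice,
): number {
  const inputCost = (estimateTokens(prompt) / 1e6) * price.inputUsdPerMTok;
  const outputCost = (maxOutputTokens / 1e6) * price.outputUsdPerMTok;
  return inputCost + outputCost;
}

function checkBudget(
  estimatedNextCallCost: number,
  budgetRemaining: number,
): { allowed: boolean; reason?: string } {
  if (estimatedNextCallCost > budgetRemaining) {
    return { allowed: false, reason: "budget_exceeded" };
  }
  return { allowed: true };
}
```

Estimating high is deliberate. An estimate that overshoots stops a run slightly early. One that undershoots lets the run overspend.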

Max steps

Agents should have explicit stopping rules.

if (stepCount >= maxSteps) {
  return {
    allowed: false,
    reason: "max_steps_exceeded",
  };
}

A model route can change.

The step limit should still apply.

Retry storms

Retries are useful.

Blind retries are not.

if (retryCount > maxRetries && recentErrorsAreSimilar(errors)) {
  return {
    allowed: false,
    reason: "retry_storm_detected",
  };
}

A fallback model can hide a retry storm by keeping the run alive.

The runtime should detect the pattern.
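
The recentErrorsAreSimilar helper can take many forms. One possible sketch, assuming errors arrive as plain message strings: strip the volatile parts (numbers, hex ids) and check whether the last few messages collapse to the same signature. The window size and the normalization rules are my assumptions.

```typescript
// Collapse volatile details so repeated failures compare equal.
function errorSignature(message: string): string {
  return message
    .toLowerCase()
    .replace(/0x[0-9a-f]+/g, "<hex>")
    .replace(/\d+/g, "<n>")
    .replace(/\s+/g, " ")
    .trim();
}

// True when the last `window` errors all share one signature.
function recentErrorsAreSimilar(errors: string[], window = 3): boolean {
  if (errors.length < window) return false;
  const recent = errors.slice(-window).map(errorSignature);
  return recent.every((signature) => signature === recent[0]);
}

function checkRetryStorm(
  retryCount: number,
  maxRetries: number,
  errors: string[],
): { allowed: boolean; reason?: string } {
  if (retryCount > maxRetries && recentErrorsAreSimilar(errors)) {
    return { allowed: false, reason: "retry_storm_detected" };
  }
  return { allowed: true };
}
```

Because the signature ignores request ids and timings, three timeouts with different ids count as the same failure, while a timeout followed by a parse error does not.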

Prompt loops

Agents sometimes ask almost the same thing repeatedly.

if (similarToRecentPrompt(currentPrompt, previousPrompts)) {
  return {
    allowed: false,
    reason: "similar_prompt_loop",
  };
}

If the prompt is not changing meaningfully, the model route may not be the main issue.

The agent may be stuck.
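
A cheap way to approximate "almost the same thing" is word-set overlap (Jaccard similarity). This is only a sketch: the 0.9 threshold and the five-prompt window are arbitrary assumptions, and the word splitting only suits English-like text. An embedding-based comparison would catch paraphrases that this misses.

```typescript
// Lowercased set of words, ignoring punctuation and whitespace.
function wordSet(text: string): Set<string> {
  return new Set(
    text.toLowerCase().split(/\W+/).filter((word) => word.length > 0),
  );
}

// Jaccard similarity: shared words divided by total distinct words.
function jaccard(a: Set<string>, b: Set<string>): number {
  if (a.size === 0 && b.size === 0) return 1;
  let shared = 0;
  a.forEach((word) => {
    if (b.has(word)) shared += 1;
  });
  return shared / (a.size + b.size - shared);
}

// True when the current prompt nearly matches any of the last `window` prompts.
function similarToRecentPrompt(
  currentPrompt: string,
  previousPrompts: string[],
  threshold = 0.9,
  window = 5,
): boolean {
  const current = wordSet(currentPrompt);
  return previousPrompts
    .slice(-window)
    .some((previous) => jaccard(current, wordSet(previous)) >= threshold);
}
```

The threshold is worth tuning per project. Set it too low and legitimate follow-up prompts get blocked. Set it too high and a loop that changes one word per iteration slips through.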

No progress

A run can be active without improving.

Useful progress signals include:

tests passing
errors decreasing
files changing meaningfully
task checklist items completing
user-defined success criteria improving

If the agent consumes steps without progress, stop.
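
A sketch of one way to turn those signals into a stop rule: record a snapshot after each step, reduce it to a score, and stop when the last few steps never beat the earlier best. The snapshot fields, the equal weighting, and the patience of three steps are all assumptions for illustration.

```typescript
interface ProgressSnapshot {
  testsPassing: number;
  errorCount: number;
  checklistDone: number;
}

// Higher is better. Equal weights are a placeholder, not a recommendation.
function progressScore(snapshot: ProgressSnapshot): number {
  return snapshot.testsPassing + snapshot.checklistDone - snapshot.errorCount;
}

// True when none of the last `patience` snapshots beats the best score
// recorded before them.
function isStalled(history: ProgressSnapshot[], patience = 3): boolean {
  if (history.length <= patience) return false;
  const scores = history.map(progressScore);
  const bestBefore = Math.max(...scores.slice(0, -patience));
  return scores.slice(-patience).every((score) => score <= bestBefore);
}

function checkProgress(
  history: ProgressSnapshot[],
): { allowed: boolean; reason?: string } {
  return isStalled(history)
    ? { allowed: false, reason: "no_progress" }
    : { allowed: true };
}
```

Comparing against the earlier best, not just the previous step, stops the agent from resetting the counter by breaking something and then repairing it.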

Layer the controls

A good AI-agent architecture can use both gateway policy and runtime guards.

One possible flow:

agent wants next call
↓
local runtime guard checks run state
↓
gateway applies model routing or deny rules
↓
provider executes request
↓
logs and dashboards record result

The order matters.

The runtime has run context.

The gateway has team-level model policy.

The provider executes.

The dashboard explains.

Do not ask one layer to do all four jobs.
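
The layers also need to agree on what a gateway answer means. A deny rule returns a 403, and a runtime that retries a 403 turns a policy decision into a retry storm. A small sketch of how a runtime might map gateway status codes to its next action. The function name and the retry choices for 429 and 5xx are my assumptions, not documented gateway behavior.

```typescript
type NextAction = "continue" | "retry" | "stop";

function actionForGatewayStatus(status: number): NextAction {
  if (status >= 200 && status < 300) return "continue";
  // A deny rule answers with 403. Retrying would get the same answer.
  if (status === 403) return "stop";
  // Rate limits and server errors are often transient.
  // The guard's retry limits still apply to these.
  if (status === 429 || status >= 500) return "retry";
  // Anything else is treated as a request problem that a retry will not fix.
  return "stop";
}
```

A "retry" here only means the call is eligible for another attempt. The pre-call guard still decides whether that attempt happens.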

Where AI CostGuard fits

AI CostGuard is the local-first TypeScript / Node.js runtime safety layer I’m building for this exact class of problem.

It focuses on pre-call checks for AI-agent projects:

retry storms
prompt loops
max-step explosions
runaway agent execution
unknown model pricing
budget overruns
uncontrolled provider calls

It is not a billing ledger.

It is not a hard security boundary.

It does not replace provider dashboards or gateway routing.

The goal is narrower:

help the agent runtime decide whether the next provider call should execute.

Takeaway

Gateway routing is useful.

It centralizes model policy.

It helps teams move traffic when models are down, retired, too expensive, or not approved.

But routing does not replace runtime safety.

A cheaper fallback can still waste money.

A policy-approved model can still be part of a prompt loop.

A valid request can still exceed the task budget.

For AI agents, the critical question comes before the call:

Should this request exist?

AI CostGuard on GitHub: https://github.com/salimassili62-afk/ai-costguard
