Vercel added routing rules to AI Gateway on July 2.
Routing rules let teams rewrite or deny model requests at the gateway level. A rewrite rule serves a request for one model using another model. A deny rule blocks a model and returns a 403. Vercel lists use cases like rerouting when a model is down or retired, standardizing on one model, routing an expensive model to a cheaper one, or keeping a team off unapproved models.
That is useful infrastructure.
But if you are building AI agents, it is not the whole safety layer.
A gateway can decide where a request goes.
Your runtime still needs to decide whether the request should happen.
The difference
Gateway routing answers:
"Which model should serve this request?"
Runtime guarding answers:
"Should this request be allowed at all?"
Those are different problems.
A gateway may rewrite:
anthropic/claude-opus-4.8 -> anthropic/claude-haiku-4.5
That can keep traffic moving.
But the gateway may not know:
- the agent already retried 12 times
- the current prompt is nearly identical to previous failed prompts
- the run exceeded its task budget
- the agent passed its max-step limit
- tool calls are happening without progress
- the fallback model keeps the loop alive but does not improve the task
That context usually lives inside the agent runtime.
A naive agent loop
Many agent loops start like this:
while (!task.done) {
  const response = await provider.call({
    model: task.model,
    messages: task.messages,
  });
  task = await applyAgentStep(task, response);
}
This is simple.
It is also missing the controls that matter in production.
There is no budget check.
No max-step check.
No retry-storm detection.
No prompt-loop detection.
No unknown-pricing block.
No no-progress stop.
If a gateway rewrites the model, this loop may keep running.
That is not always what you want.
Add a pre-call guard
A safer pattern puts a local decision before the provider call:
const decision = guard.beforeCall({
  runId: task.id,
  model: task.model,
  messages: task.messages,
  stepCount: task.steps.length,
  retryCount: task.retryCount,
  previousPrompts: task.previousPrompts,
  budgetRemaining: task.budgetRemaining,
  progressState: task.progress,
});

if (!decision.allowed) {
  return {
    status: "stopped",
    reason: decision.reason,
    error: decision.error,
  };
}

const response = await provider.call({
  model: task.model,
  messages: task.messages,
});
The exact API does not matter.
The placement matters.
The guard runs before the provider call.
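One way to structure such a guard is as a list of small, independent checks, where the first denial wins. The sketch below is an assumption about shape, not the API of any real library: `CallContext`, `GuardDecision`, `GuardCheck`, `createGuard`, and the limits inside the two sample checks are all hypothetical names and values chosen for illustration.

```typescript
// Hypothetical types. The field names mirror the snippet above,
// not any real library.
interface CallContext {
  runId: string;
  model: string;
  stepCount: number;
  retryCount: number;
  budgetRemaining: number; // assumed to be in USD
}

type GuardDecision =
  | { allowed: true }
  | { allowed: false; reason: string };

type GuardCheck = (ctx: CallContext) => GuardDecision;

// Build a guard from independent checks. Checks run in list order and
// the first denial is returned, so cheap checks can go first.
function createGuard(checks: GuardCheck[]) {
  return {
    beforeCall(ctx: CallContext): GuardDecision {
      for (const check of checks) {
        const decision = check(ctx);
        if (!decision.allowed) return decision;
      }
      return { allowed: true };
    },
  };
}

// Two sample checks. The limit of 20 steps is illustrative only.
const maxStepsCheck: GuardCheck = (ctx) =>
  ctx.stepCount >= 20
    ? { allowed: false, reason: "max_steps_exceeded" }
    : { allowed: true };

const budgetLeftCheck: GuardCheck = (ctx) =>
  ctx.budgetRemaining <= 0
    ? { allowed: false, reason: "budget_exceeded" }
    : { allowed: true };

const guard = createGuard([maxStepsCheck, budgetLeftCheck]);
```

Keeping each check as a plain function makes it easy to test in isolation and to add or remove checks per project.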
What should the runtime check?
- Known model pricing
If the runtime cannot price the model, it cannot enforce a reliable budget.
if (!pricingCatalog.has(model)) {
  return {
    allowed: false,
    reason: "unknown_model_pricing",
  };
}
This matters even more when routing and fallback rules exist.
A rewritten model still has a cost profile.
The runtime should know it.
- Budget remaining
Task-level budgets are different from account-level limits.
if (estimatedNextCallCost > budgetRemaining) {
  return {
    allowed: false,
    reason: "budget_exceeded",
  };
}
A monthly dashboard can show spend later.
A runtime budget can stop the next call now.
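The pricing and budget checks depend on being able to estimate the next call before it runs. Here is a minimal sketch of that estimate, assuming a per-million-token price table and a worst case where the model uses its whole output allowance. The model names, prices, and the `ModelPrice`, `estimateNextCallCost`, and `checkBudget` names are placeholders for illustration, not real provider pricing or a real API.

```typescript
// Hypothetical price table in USD per million tokens.
// Model names and prices are placeholders, not real provider pricing.
interface ModelPrice {
  inputPerMTok: number;
  outputPerMTok: number;
}

const pricingCatalog = new Map<string, ModelPrice>([
  ["example/large-model", { inputPerMTok: 10, outputPerMTok: 30 }],
  ["example/small-model", { inputPerMTok: 1, outputPerMTok: 4 }],
]);

// Worst-case estimate: assumes the full output allowance is used.
// Returns undefined when the model has no known price.
function estimateNextCallCost(
  model: string,
  inputTokens: number,
  maxOutputTokens: number,
): number | undefined {
  const price = pricingCatalog.get(model);
  if (price === undefined) return undefined;
  const microDollars =
    inputTokens * price.inputPerMTok + maxOutputTokens * price.outputPerMTok;
  return microDollars / 1e6;
}

type BudgetDecision =
  | { allowed: true }
  | { allowed: false; reason: string };

// Combines the unknown-pricing check and the budget check.
function checkBudget(
  model: string,
  inputTokens: number,
  maxOutputTokens: number,
  budgetRemaining: number,
): BudgetDecision {
  const cost = estimateNextCallCost(model, inputTokens, maxOutputTokens);
  if (cost === undefined) {
    return { allowed: false, reason: "unknown_model_pricing" };
  }
  if (cost > budgetRemaining) {
    return { allowed: false, reason: "budget_exceeded" };
  }
  return { allowed: true };
}
```

Estimating against the maximum output is deliberately pessimistic. A task that must never overshoot its budget is better served by stopping one call early than one call late.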
- Max steps
Agents should have explicit stopping rules.
if (stepCount >= maxSteps) {
  return {
    allowed: false,
    reason: "max_steps_exceeded",
  };
}
A model route can change.
The step limit should still apply.
- Retry storms
Retries are useful.
Blind retries are not.
if (retryCount > maxRetries && recentErrorsAreSimilar(errors)) {
  return {
    allowed: false,
    reason: "retry_storm_detected",
  };
}
A fallback model can hide a retry storm by keeping the run alive.
The runtime should detect the pattern.
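The `recentErrorsAreSimilar` helper above is left undefined. One rough way to sketch it, assuming errors are available as plain message strings: collapse each message to a signature and check whether the last few signatures are identical. The `errorSignature` helper and the default window of 3 are assumptions for illustration.

```typescript
// Reduce an error message to a rough signature: lowercase, numbers
// collapsed to "#", whitespace normalized. This lets
// "Timeout after 3021ms" match "timeout after 2998ms".
function errorSignature(message: string): string {
  return message
    .toLowerCase()
    .replace(/\d+/g, "#")
    .replace(/\s+/g, " ")
    .trim();
}

// True when the last `windowSize` errors all share one signature.
// With fewer errors than the window, there is not enough evidence.
function recentErrorsAreSimilar(errors: string[], windowSize = 3): boolean {
  if (errors.length < windowSize) return false;
  const recent = errors.slice(-windowSize).map(errorSignature);
  return recent.every((signature) => signature === recent[0]);
}
```

Exact signature matching is a blunt instrument. It catches the common case of the same failure repeating, and it will miss errors that differ in wording but share a cause.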
- Prompt loops
Agents sometimes ask almost the same thing repeatedly.
if (similarToRecentPrompt(currentPrompt, previousPrompts)) {
  return {
    allowed: false,
    reason: "similar_prompt_loop",
  };
}
If the prompt is not changing meaningfully, the model route may not be the main issue.
The agent may be stuck.
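There are many ways to define "almost the same". As one sketch, `similarToRecentPrompt` could compare word sets using Jaccard similarity (shared words divided by total distinct words). The 0.9 threshold, the window of 5 prompts, and the `tokenSet` and `jaccard` helpers are assumptions for illustration. Embedding-based similarity would be a heavier alternative.

```typescript
// Split text into a set of lowercase word tokens.
function tokenSet(text: string): Set<string> {
  return new Set(
    text
      .toLowerCase()
      .split(/\W+/)
      .filter((token) => token.length > 0),
  );
}

// Jaccard similarity: |A ∩ B| / |A ∪ B|, between 0 and 1.
// Two empty sets are treated as identical.
function jaccard(a: Set<string>, b: Set<string>): number {
  if (a.size === 0 && b.size === 0) return 1;
  let shared = 0;
  a.forEach((token) => {
    if (b.has(token)) shared += 1;
  });
  return shared / (a.size + b.size - shared);
}

// True when the current prompt is at least `threshold` similar to any
// of the last `windowSize` prompts.
function similarToRecentPrompt(
  current: string,
  previous: string[],
  threshold = 0.9,
  windowSize = 5,
): boolean {
  const currentTokens = tokenSet(current);
  return previous
    .slice(-windowSize)
    .some((prompt) => jaccard(currentTokens, tokenSet(prompt)) >= threshold);
}
```

Word sets ignore order and repetition, so this only flags prompts that reuse nearly the same vocabulary. Agent prompts that share a long fixed system preamble would need that preamble stripped first, or everything will look similar.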
- No progress
A run can be active without improving.
Useful progress signals include:
- tests passing
- errors decreasing
- files changing meaningfully
- task checklist items completing
- user-defined success criteria improving
If the agent consumes steps without progress, stop.
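A no-progress stop can be sketched as a counter of steps since the run last beat its best result. The example below assumes the runtime records one snapshot per step and uses only two of the signals listed above. The `ProgressSnapshot` shape, the `no_progress` reason string, and the limit of 3 stalled steps are hypothetical.

```typescript
// One snapshot per agent step. Which signals exist depends on the
// task; these two fields are illustrative.
interface ProgressSnapshot {
  testsPassing: number;
  errorCount: number;
}

// Count consecutive trailing steps that failed to beat the best result
// seen so far. Comparing against the best (not just the previous step)
// keeps an agent that oscillates between two states from looking busy.
function countStalledSteps(history: ProgressSnapshot[]): number {
  let stalled = 0;
  let best: ProgressSnapshot | undefined;
  for (const snapshot of history) {
    if (best === undefined) {
      best = snapshot;
      continue;
    }
    const better =
      snapshot.testsPassing > best.testsPassing ||
      snapshot.errorCount < best.errorCount;
    if (better) {
      best = {
        testsPassing: Math.max(best.testsPassing, snapshot.testsPassing),
        errorCount: Math.min(best.errorCount, snapshot.errorCount),
      };
      stalled = 0;
    } else {
      stalled += 1;
    }
  }
  return stalled;
}

type ProgressDecision =
  | { allowed: true }
  | { allowed: false; reason: string };

function checkProgress(
  history: ProgressSnapshot[],
  maxStalledSteps = 3,
): ProgressDecision {
  return countStalledSteps(history) >= maxStalledSteps
    ? { allowed: false, reason: "no_progress" }
    : { allowed: true };
}
```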
Layer the controls
A good AI-agent architecture can use both gateway policy and runtime guards.
One possible flow:
agent wants next call
↓
local runtime guard checks run state
↓
gateway applies model routing or deny rules
↓
provider executes request
↓
logs and dashboards record result
The order matters.
The runtime has run context.
The gateway has team-level model policy.
The provider executes.
The dashboard explains.
Do not ask one layer to do all four jobs.
Where AI CostGuard fits
AI CostGuard is the local-first TypeScript / Node.js runtime safety layer I’m building for this exact class of problem.
It focuses on pre-call checks for AI-agent projects:
- retry storms
- prompt loops
- max-step explosions
- runaway agent execution
- unknown model pricing
- budget overruns
- uncontrolled provider calls
It is not a billing ledger.
It is not a hard security boundary.
It does not replace provider dashboards or gateway routing.
The goal is narrower:
help the agent runtime decide whether the next provider call should execute.
Takeaway
Gateway routing is useful.
It centralizes model policy.
It helps teams move traffic when models are down, retired, too expensive, or not approved.
But routing does not replace runtime safety.
A cheaper fallback can still waste money.
A policy-approved model can still be part of a prompt loop.
A valid request can still exceed the task budget.
For AI agents, the critical question comes before the call:
Should this request exist?
https://github.com/salimassili62-afk/ai-costguard
