You deploy an AI agent. It processes orders. It works fine for a week.
Then an upstream API starts returning intermittent 500s. The agent retries. And retries. And retries. There is no backoff cap. There is no rate limit. There is no cost ceiling.
By the time someone checks the dashboard, the agent has made over 10,000 API calls in under an hour. LLM costs are $130 and climbing. The upstream API has rate-limited your entire API key, so now every other agent in your system is also failing.
This is not a hypothetical. This is what happens when AI agents have no centralized rate control.
## Why Agent Rate Limiting Is Different
Traditional rate limiting protects your API from external callers. Agent rate limiting is the opposite - it protects external APIs (and your budget) from your own agents.
The difference matters because:
Traditional rate limiting - you control the server. You add middleware. You return 429. Done.
Agent rate limiting - you control the client. The agent makes outbound calls. There is no middleware layer between your agent and the APIs it calls. Unless you build one.
Most teams don't build one. They add `time.sleep(1)` between calls and call it rate limiting. That works until:
- The agent spawns sub-agents that each have their own sleep timers
- Multiple agents share the same API key
- Retry loops override the sleep timers
- Nobody is tracking total cost across all agents
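Even before centralizing anything, the runaway-retry failure above needs two things the sleep-timer approach lacks: a cap on backoff growth and a hard retry budget. A minimal sketch (the names, limits, and jitter range here are illustrative, not from any particular library):

```python
import random
import time

class RetryBudgetExhausted(Exception):
    pass

def backoff_delay(attempt, base=1.0, cap=30.0):
    """Exponential backoff with a hard cap and jitter.

    Without the cap, the delay doubles forever; without jitter,
    many agents retrying together synchronize into bursts.
    """
    delay = min(cap, base * (2 ** attempt))
    return delay * random.uniform(0.5, 1.0)

def call_with_retries(func, max_attempts=5, is_transient=lambda e: True):
    """Retry with a fixed attempt budget instead of an open-ended loop."""
    last_error = None
    for attempt in range(max_attempts):
        try:
            return func()
        except Exception as e:
            if not is_transient(e):
                raise
            last_error = e
            time.sleep(backoff_delay(attempt))
    raise RetryBudgetExhausted(f"gave up after {max_attempts} attempts") from last_error
```

This still only protects one process. It does nothing about sub-agents, shared API keys, or aggregate cost, which is why the list above ends where it does.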
## What You Actually End Up Building
If you take rate limiting seriously, you end up with something like this:
```python
import redis
from datetime import datetime

r = redis.Redis()

class RateLimitExceeded(Exception): ...
class CostLimitExceeded(Exception): ...

def rate_limited_call(agent_id, func, *args):
    # Hourly limit (fixed window keyed by calendar hour)
    hour_key = f"rate:{agent_id}:{datetime.now().strftime('%Y%m%d%H')}"
    hourly_count = r.incr(hour_key)
    r.expire(hour_key, 3600)
    if hourly_count > 200:
        raise RateLimitExceeded(f"Hourly limit: {hourly_count}/200")

    # Daily limit
    day_key = f"rate:{agent_id}:{datetime.now().strftime('%Y%m%d')}"
    daily_count = r.incr(day_key)
    r.expire(day_key, 86400)
    if daily_count > 2000:
        raise RateLimitExceeded(f"Daily limit: {daily_count}/2000")

    # Cost tracking (needs a separate cost accumulator; estimate_cost()
    # is your own token/price model, not shown here).
    # Note: this read-then-write is racy under concurrency - a Lua
    # script or WATCH/MULTI would be needed for a correct check.
    cost = estimate_cost(func, *args)
    cost_key = f"cost:{agent_id}:{datetime.now().strftime('%Y%m%d')}"
    current_cost = float(r.get(cost_key) or 0)
    if current_cost + cost > 10.00:
        raise CostLimitExceeded(f"Daily cost: ${current_cost + cost:.2f}/$10.00")
    r.incrbyfloat(cost_key, cost)
    r.expire(cost_key, 86400)

    return func(*args)
```
Redis. Two key patterns. Cost estimation. Expiry management. And this is the simplified version that handles one agent. Now multiply by:
- Per-agent policies (some agents get 200/hour, others get 5,000)
- Multiple breach actions (block vs alert vs require approval)
- A dashboard so ops can see current usage
- An audit trail for cost attribution
- Alerting when agents approach limits
That is 2-3 weeks of work that has nothing to do with your product.
## What This Should Look Like
Set a cost policy on the agent. One API call:
```python
import httpx
import os

api_key = os.environ["AXME_API_KEY"]
base_url = "https://cloud.axme.ai"
headers = {"x-api-key": api_key}
agent_address = "agent://myorg/production/order-processor"

httpx.put(
    f"{base_url}/v1/mesh/agents/{agent_address}/policies/cost",
    headers=headers,
    json={
        "max_intents_per_hour": 200,
        "max_intents_per_day": 2000,
        "max_cost_per_day_usd": 10.00,
        "action_on_breach": "block",
    },
)
```
That is the entire rate limiting implementation. No Redis. No key expiry logic. No cost accumulator.
When the agent exceeds any limit, the gateway returns 429 with a Retry-After header. The agent stops. The other agents on the same workspace keep running because the limit is per-agent, not per-key.
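On the agent side, honoring that response takes a few lines. A sketch, assuming the gateway sends `Retry-After` in delta-seconds form (RFC 9110 also allows an HTTP date, which this helper does not handle):

```python
def wait_seconds_from_429(headers, default=60.0):
    """Seconds to pause before retrying, from a 429 response's headers.

    Falls back to `default` when the header is missing or is not
    a plain number of seconds.
    """
    value = headers.get("Retry-After")
    if value is None:
        return default
    try:
        return max(0.0, float(value))
    except ValueError:
        return default
```

An agent loop would sleep for this long, or park the task, rather than hammering the gateway until the window resets.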
## The Three Limits
AXME cost policies support three dimensions:
| Limit | What it controls |
|---|---|
| `max_intents_per_hour` | Rolling hourly intent count per agent |
| `max_intents_per_day` | Calendar-day intent count per agent |
| `max_cost_per_day_usd` | Estimated USD spend per agent per day |
Each is optional. Set one, two, or all three.
## Breach Actions
When a limit is hit, you choose what happens:
`block` - Gateway returns 429. Agent cannot send more intents until the window resets. This is the hard stop.

`alert` - Intent is delivered, but an alert fires. Use this when you want visibility without disruption. Good for observing normal patterns before setting hard limits.

`require_approval` - Intent is held in a pending state. A human must approve it before delivery continues. Use this for high-cost operations where you want a human checkpoint.
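The three actions amount to a dispatch on breach. A hypothetical sketch of the behavior described above (not AXME's actual implementation; the list arguments stand in for a real alerting channel and approval store):

```python
def on_breach(action, intent, alerts, approval_queue):
    """Hypothetical gateway-side dispatch mirroring the three actions."""
    if action == "block":
        # Hard stop: 429, intent not delivered until the window resets.
        return {"status": 429, "delivered": False}
    if action == "alert":
        # Deliver the intent, but record that the limit was crossed.
        alerts.append(intent)
        return {"status": 200, "delivered": True}
    if action == "require_approval":
        # Hold: accepted but pending a human decision.
        approval_queue.append(intent)
        return {"status": 202, "delivered": False}
    raise ValueError(f"unknown breach action: {action}")
```

The useful property is that the choice lives in policy, not in agent code: switching an agent from `alert` to `block` is a config change, not a redeploy.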
## Timeline: Without vs With
Without rate limiting:

```
09:00  Agent processes 50 orders (normal)
09:15  Upstream API returns 500s intermittently
09:16  Agent retries aggressively (no backoff cap)
09:30  5,000 API calls. $47 in LLM costs.
09:45  12,000 API calls. $130 in costs.
09:45  Upstream rate-limits your API key.
09:45  All other agents start failing.
11:00  Someone finally notices the dashboard is red.
```
With AXME cost policy:

```
09:00  Agent processes 50 orders (normal)
09:15  Upstream API returns 500s intermittently
09:16  Agent retries aggressively
09:16  200 intents/hour limit reached. Gateway returns 429.
09:16  Agent stops. Alert fires. $0.80 spent.
09:16  All other agents continue working normally.
```
The difference: $130 and a system-wide outage vs $0.80 and one agent paused for an hour.
## Checking Usage
You can query the current policy and usage at any time:
```python
response = httpx.get(
    f"{base_url}/v1/mesh/agents/{agent_address}/policies/cost",
    headers=headers,
)
policy = response.json()["policy"]

print(f"Hourly limit: {policy['max_intents_per_hour']}")
print(f"Daily limit: {policy['max_intents_per_day']}")
print(f"Cost cap: ${policy['max_cost_per_day_usd']}")
```
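If you also track current counters, remaining headroom is simple arithmetic. The `usage` field names below are assumptions for illustration (check the actual AXME response shape); the policy fields match the example above:

```python
def remaining_headroom(policy, usage):
    """How much room is left before each limit trips.

    `usage` keys here are hypothetical counter names, not a
    documented AXME response format.
    """
    return {
        "intents_this_hour": policy["max_intents_per_hour"] - usage["intents_this_hour"],
        "intents_today": policy["max_intents_per_day"] - usage["intents_today"],
        "budget_usd": round(policy["max_cost_per_day_usd"] - usage["cost_today_usd"], 2),
    }
```

A cron job or sidecar can poll this and warn agents before they hit a hard `block`.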
Or use the dashboard at mesh.axme.ai for real-time counters across all agents. Rate and cost policies are configured there alongside agent health.
## The Pattern
Rate limiting for AI agents is not the same as rate limiting for APIs. Your agents are the callers, not the receivers. You need the limit enforced between your agents and the outside world - at the gateway.
That is what AXME cost policies do. One API call sets the limits. The gateway enforces them. The dashboard shows usage. The audit trail records breaches.
No Redis. No cron jobs. No custom middleware.
## Try It
Working example with policy setup, agent, and rate-limit trigger:
github.com/AxmeAI/ai-agent-rate-limiting
Built with AXME - rate limiting, cost caps, and usage policies built into the agent mesh. Alpha - feedback welcome.