DEV Community

George Belsky


Your AI Agent Made 10,000 API Calls in an Hour. Here's How to Stop That.

You deploy an AI agent. It processes orders. It works fine for a week.

Then an upstream API starts returning intermittent 500s. The agent retries. And retries. And retries. There is no backoff cap. There is no rate limit. There is no cost ceiling.

By the time someone checks the dashboard, the agent has made 10,000 API calls in an hour. LLM costs are $130 and climbing. The upstream API has rate-limited your entire API key, so now every other agent in your system is also failing.

This is not a hypothetical. This is what happens when AI agents have no centralized rate control.

Why Agent Rate Limiting Is Different

Traditional rate limiting protects your API from external callers. Agent rate limiting is the opposite - it protects external APIs (and your budget) from your own agents.

The difference matters because:

Traditional rate limiting - you control the server. You add middleware. You return 429. Done.

Agent rate limiting - you control the client. The agent makes outbound calls. There is no middleware layer between your agent and the APIs it calls. Unless you build one.

Most teams don't build one. They add time.sleep(1) between calls and call it rate limiting. That works until:

  1. The agent spawns sub-agents that each have their own sleep timers
  2. Multiple agents share the same API key
  3. Retry loops override the sleep timers
  4. Nobody is tracking total cost across all agents
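A minimal sketch of the missing layer, assuming every sub-agent runs as a thread in one Python process. All outbound calls pass through a single thread-safe token bucket, so spawning more sub-agents (item 1) or sharing one API key (item 2) no longer multiplies the allowed rate. `TokenBucket`, `outbound`, and `call_upstream` are hypothetical names, and the 200-calls-per-hour rate is an arbitrary example.

```python
import threading
import time
from functools import wraps


class RateLimitExceeded(Exception):
    pass


class TokenBucket:
    """Thread-safe token bucket: refills at `rate` tokens/second, holds at most `capacity`."""

    def __init__(self, rate: float, capacity: int, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.tokens = float(capacity)
        self.clock = clock
        self.updated = clock()
        self.lock = threading.Lock()

    def try_acquire(self) -> bool:
        with self.lock:
            now = self.clock()
            # Refill in proportion to elapsed time, never above capacity
            self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
            self.updated = now
            if self.tokens >= 1:
                self.tokens -= 1
                return True
            return False


def outbound(bucket: TokenBucket):
    """Decorator: every wrapped call must take a token from the shared bucket."""
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            if not bucket.try_acquire():
                raise RateLimitExceeded(f"{func.__name__}: shared budget exhausted")
            return func(*args, **kwargs)
        return wrapper
    return decorator


# One bucket shared by every sub-agent: ~200 calls/hour, bursts of up to 10
shared_bucket = TokenBucket(rate=200 / 3600, capacity=10)


@outbound(shared_bucket)
def call_upstream(order_id: str) -> str:
    return f"processed {order_id}"  # stand-in for a real HTTP call
```

Retries issued through `call_upstream` draw from the same bucket, so a retry loop (item 3) cannot exceed the budget either. The limit only holds within one process: agents spread across processes or hosts need a shared store or a gateway, and the bucket still says nothing about cost (item 4).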

What You Actually End Up Building

If you take rate limiting seriously, you end up with something like this:

import redis
from datetime import datetime

r = redis.Redis()

class RateLimitExceeded(Exception):
    pass

class CostLimitExceeded(Exception):
    pass

def estimate_cost(func, *args):
    # Hypothetical placeholder: a real estimate depends on the model and token counts
    return 0.004

def rate_limited_call(agent_id, func, *args):
    now = datetime.now()  # one timestamp so every key uses the same window

    # Hourly limit (fixed window keyed by clock hour)
    hour_key = f"rate:{agent_id}:{now.strftime('%Y%m%d%H')}"
    hourly_count = r.incr(hour_key)
    r.expire(hour_key, 3600)
    if hourly_count > 200:
        raise RateLimitExceeded(f"Hourly limit: {hourly_count}/200")

    # Daily limit
    day_key = f"rate:{agent_id}:{now.strftime('%Y%m%d')}"
    daily_count = r.incr(day_key)
    r.expire(day_key, 86400)
    if daily_count > 2000:
        raise RateLimitExceeded(f"Daily limit: {daily_count}/2000")

    # Cost tracking (need a separate cost accumulator).
    # Read-then-increment is not atomic: concurrent agents can overshoot the cap.
    cost = estimate_cost(func, *args)
    cost_key = f"cost:{agent_id}:{now.strftime('%Y%m%d')}"
    current_cost = float(r.get(cost_key) or 0)
    if current_cost + cost > 10.00:
        raise CostLimitExceeded(f"Daily cost: ${current_cost + cost:.2f}/$10.00")
    r.incrbyfloat(cost_key, cost)
    r.expire(cost_key, 86400)

    return func(*args)

Redis. Two key patterns. Cost estimation. Expiry management. And this simplified version applies one hard-coded policy to every agent. Now multiply by:

  • Per-agent policies (some agents get 200/hour, others get 5,000)
  • Multiple breach actions (block vs alert vs require approval)
  • A dashboard so ops can see current usage
  • An audit trail for cost attribution
  • Alerting when agents approach limits

That is 2-3 weeks of work that has nothing to do with your product.

What This Should Look Like

Set a cost policy on the agent. One API call:

import httpx
import os

api_key = os.environ["AXME_API_KEY"]
base_url = "https://cloud.axme.ai"
headers = {"x-api-key": api_key}

agent_address = "agent://myorg/production/order-processor"

response = httpx.put(
    f"{base_url}/v1/mesh/agents/{agent_address}/policies/cost",
    headers=headers,
    json={
        "max_intents_per_hour": 200,
        "max_intents_per_day": 2000,
        "max_cost_per_day_usd": 10.00,
        "action_on_breach": "block",
    },
)
response.raise_for_status()  # fail loudly if the policy was not accepted

That is the entire rate limiting implementation. No Redis. No key expiry logic. No cost accumulator.

When the agent exceeds any limit, the gateway returns 429 with a Retry-After header. The agent stops. The other agents on the same workspace keep running because the limit is per-agent, not per-key.
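The gateway can only stop the agent if the client treats 429 as a stop signal rather than one more error to retry. A minimal client-side sketch, assuming the `Retry-After` header uses one of the two forms HTTP defines (a number of seconds or an HTTP-date); `retry_after_seconds` and its 60-second fallback are hypothetical, not part of an AXME SDK.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def retry_after_seconds(value, now=None, default=60.0):
    """Convert a Retry-After header value into seconds to wait (never negative)."""
    if value is None:
        return default  # header missing: wait a conservative default
    value = value.strip()
    if value.isdigit():
        return float(value)  # delta-seconds form, e.g. "120"
    try:
        when = parsedate_to_datetime(value)  # HTTP-date form
    except (TypeError, ValueError):
        return default  # unparseable header: fall back to the default
    if when.tzinfo is None:
        when = when.replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    return max(0.0, (when - now).total_seconds())
```

With httpx, an agent loop would pause for `retry_after_seconds(response.headers.get("Retry-After"))` seconds after a 429 instead of retrying immediately.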

The Three Limits

AXME cost policies support three dimensions:

| Limit | What it controls |
|---|---|
| max_intents_per_hour | Rolling hourly intent count per agent |
| max_intents_per_day | Calendar-day intent count per agent |
| max_cost_per_day_usd | Estimated USD spend per agent per day |

Each is optional. Set one, two, or all three.
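One way to read the table, as a hypothetical in-memory model rather than AXME's actual gateway code: the hourly limit counts intents in the trailing 3,600 seconds, the daily limits reset when the UTC date changes, and an unset limit (`None`) is never checked. `CostPolicy`, `AgentUsage`, and the per-intent cost estimate passed to `check` are illustrative assumptions.

```python
from collections import deque
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass
class CostPolicy:
    max_intents_per_hour: Optional[int] = None
    max_intents_per_day: Optional[int] = None
    max_cost_per_day_usd: Optional[float] = None


@dataclass
class AgentUsage:
    recent: deque = field(default_factory=deque)  # intent timestamps in the last hour
    day: Optional[str] = None                     # UTC date the daily counters cover
    day_count: int = 0
    day_cost: float = 0.0

    def _roll(self, now: datetime) -> None:
        # Rolling hour: drop timestamps that are an hour old or older
        while self.recent and now - self.recent[0] >= timedelta(hours=1):
            self.recent.popleft()
        # Calendar day: reset daily counters when the date changes
        today = now.date().isoformat()
        if self.day != today:
            self.day, self.day_count, self.day_cost = today, 0, 0.0

    def check(self, policy: CostPolicy, est_cost: float, now: datetime) -> Optional[str]:
        """Return the breached limit's name, or None after recording the intent."""
        self._roll(now)
        p = policy
        if p.max_intents_per_hour is not None and len(self.recent) + 1 > p.max_intents_per_hour:
            return "max_intents_per_hour"
        if p.max_intents_per_day is not None and self.day_count + 1 > p.max_intents_per_day:
            return "max_intents_per_day"
        if p.max_cost_per_day_usd is not None and self.day_cost + est_cost > p.max_cost_per_day_usd:
            return "max_cost_per_day_usd"
        self.recent.append(now)
        self.day_count += 1
        self.day_cost += est_cost
        return None
```

In this sketch a rejected intent is not recorded, so a blocked agent regains capacity as its oldest intents age out of the hour; counting rejections as well would be a stricter design.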

Breach Actions

When a limit is hit, you choose what happens:

block - Gateway returns 429. Agent cannot send more intents until the window resets. This is the hard stop.

alert - Intent is delivered, but an alert fires. Use this when you want visibility without disruption. Good for observing normal patterns before setting hard limits.

require_approval - Intent is held in a pending state. A human must approve it before delivery continues. Use this for high-cost operations where you want a human checkpoint.
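The three actions differ only in what happens to the intent that crossed the limit. A hypothetical sketch of that decision, using the three action strings listed here; `BreachAction`, `BreachOutcome`, and `on_breach` are illustrative names, and alert delivery itself is left out.

```python
from dataclasses import dataclass
from enum import Enum


class BreachAction(str, Enum):
    BLOCK = "block"
    ALERT = "alert"
    REQUIRE_APPROVAL = "require_approval"


@dataclass(frozen=True)
class BreachOutcome:
    deliver_now: bool        # does the intent go through immediately?
    reject_with_429: bool    # does the sending agent get a 429?
    hold_for_approval: bool  # is the intent parked until a human approves it?


def on_breach(action: str) -> BreachOutcome:
    """Map an action_on_breach value to what happens to the over-limit intent."""
    try:
        kind = BreachAction(action)
    except ValueError:
        raise ValueError(f"unknown action_on_breach: {action!r}") from None
    if kind is BreachAction.BLOCK:
        # Hard stop until the window resets
        return BreachOutcome(deliver_now=False, reject_with_429=True, hold_for_approval=False)
    if kind is BreachAction.ALERT:
        # Visibility without disruption: the intent is still delivered
        return BreachOutcome(deliver_now=True, reject_with_429=False, hold_for_approval=False)
    # REQUIRE_APPROVAL: pending until a human approves delivery
    return BreachOutcome(deliver_now=False, reject_with_429=False, hold_for_approval=True)
```

Validating `action_on_breach` against a closed set like this on the client side also catches a typo before the policy is sent.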

Timeline: Without vs With

Without rate limiting:

09:00  Agent processes 50 orders (normal)
09:15  Upstream API returns 500s intermittently
09:16  Agent retries aggressively (no backoff cap)
09:30  5,000 API calls. $47 in LLM costs.
09:45  12,000 API calls. $130 in costs.
09:45  Upstream rate-limits your API key.
09:45  All other agents start failing.
11:00  Someone finally notices the dashboard is red.

With AXME cost policy:

09:00  Agent processes 50 orders (normal)
09:15  Upstream API returns 500s intermittently
09:16  Agent retries aggressively
09:16  200 intents/hour limit reached. Gateway returns 429.
09:16  Agent stops. Alert fires. $0.80 spent.
09:16  All other agents continue working normally.

The difference: $130 and a system-wide outage vs $0.80 and one agent paused for an hour.
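The same policy also bounds the worst case for a whole day. A back-of-the-envelope sketch, assuming a 24-hour (UTC) calendar day and the roughly $0.004 per intent implied by $0.80 for 200 intents in the timeline; `worst_case_daily_spend` is a hypothetical helper, and real per-intent cost will vary.

```python
from typing import Optional


def worst_case_daily_spend(
    per_hour: Optional[int],
    per_day: Optional[int],
    cost_cap: Optional[float],
    cost_per_intent: float,
) -> Optional[float]:
    """Upper bound on one agent's spend in a calendar day; None means unbounded."""
    count_caps = []
    if per_hour is not None:
        # A 24-hour day holds 24 disjoint one-hour spans, each capped at per_hour
        count_caps.append(per_hour * 24)
    if per_day is not None:
        count_caps.append(per_day)
    spend_caps = []
    if count_caps:
        spend_caps.append(min(count_caps) * cost_per_intent)
    if cost_cap is not None:
        spend_caps.append(cost_cap)
    return min(spend_caps) if spend_caps else None


# The example policy, at an assumed ~$0.004 per intent
print(worst_case_daily_spend(200, 2000, 10.00, 0.004))
```

For the example policy the daily intent cap binds first: 200 × 24 = 4,800 hourly slots exceed the 2,000-per-day cap, and 2,000 intents at about $0.004 each is $8, below the $10 cost cap. The cost cap only becomes the binding limit if per-intent cost rises.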

Checking Usage

You can query the current policy and usage at any time:

response = httpx.get(
    f"{base_url}/v1/mesh/agents/{agent_address}/policies/cost",
    headers=headers,
)
response.raise_for_status()
policy = response.json()["policy"]
# Each limit is optional, so a field may be absent or null
print(f"Hourly limit: {policy.get('max_intents_per_hour')}")
print(f"Daily limit:  {policy.get('max_intents_per_day')}")
print(f"Cost cap:     ${policy.get('max_cost_per_day_usd')}")

Or use the dashboard at mesh.axme.ai for real-time counters across all agents:

[Screenshot: Agent Mesh Dashboard]

Rate and cost policies are configured alongside agent health:

[Screenshot: Policies]

The Pattern

Rate limiting for AI agents is not the same as rate limiting for APIs. Your agents are the callers, not the receivers. You need the limit enforced between your agents and the outside world - at the gateway.

That is what AXME cost policies do. One API call sets the limits. The gateway enforces them. The dashboard shows usage. The audit trail records breaches.

No Redis. No cron jobs. No custom middleware.

Try It

Working example with policy setup, agent, and rate-limit trigger:

github.com/AxmeAI/ai-agent-rate-limiting

Built with AXME, which has rate limiting, cost caps, and usage policies built into the agent mesh. AXME is in alpha, so feedback is welcome.
