Rate Limiting Your Own AI Agent: The Runaway Loop Problem Nobody Talks About

#aiagents #devops #productivity #programming

Most AI agent failures aren't dramatic. They're quiet loops.

The agent hits an unexpected state, enters a retry pattern, and spins — burning tokens, writing bad data, or hammering an external API — until you notice the bill.

This is the runaway loop problem, and it's almost entirely preventable.

Why It Happens

Agents are designed to be persistent. They retry on failure, handle edge cases, and keep working. That's the point. But without explicit rate limits on their own behavior, "keep working" becomes "keep going forever."

Three patterns cause most runaway loops:

- Unbounded retries: there is no max_retries rule, so the agent keeps retrying an error that never clears
- Re-queuing bugs: a failed task gets written back to the queue, then picked up again immediately
- State drift: the agent's task file gets corrupted or goes stale, so it re-runs completed work
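As a rough illustration of the re-queuing pattern and one way to bound it, here is a minimal Python sketch. It assumes an in-memory queue of dict tasks; the names `process_queue`, `dead_letter`, and the `attempts` field are hypothetical and not taken from any particular agent framework. The idea is that the attempt count travels with the task, so a re-queued task cannot circulate forever.

```python
from collections import deque

MAX_ATTEMPTS = 3  # assumed cap on total attempts per task


def process_queue(queue, handler, dead_letter):
    """Drain the queue, re-queuing a failed task only while attempts remain."""
    while queue:
        task = queue.popleft()
        try:
            handler(task)
        except Exception as exc:
            # Record the failure on the task itself so the count survives re-queuing.
            task["attempts"] = task.get("attempts", 0) + 1
            task["last_error"] = str(exc)
            if task["attempts"] < MAX_ATTEMPTS:
                queue.append(task)  # bounded re-queue
            else:
                dead_letter.append(task)  # park it for human review
```

Without the `attempts` check, a task whose handler always fails would be appended and popped indefinitely, which is the loop described in the second bullet.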

The Fix: Self-Rate-Limiting Rules

Add these rules to your SOUL.md (or equivalent agent config):

Rate limit rules:
- max_retries: 3 per task
- cooldown_after_failure: 60 seconds before retry
- max_actions_per_session: 50
- If max_actions reached: write summary to outbox.json, stop, wait for human review

These aren't restrictions — they're safety valves. A well-configured agent should never need more than 50 actions per session for a well-scoped task. If it does, something is wrong.
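If your agent runs inside code you control, the retry and cooldown rules can also be enforced outside the prompt. The sketch below is one possible wrapper, not a prescribed implementation: `run_with_retries` is a hypothetical helper, and it reads "max_retries: 3" as one initial attempt plus up to three retries. The `sleep` parameter is injectable only so the wait can be skipped in tests.

```python
import time

MAX_RETRIES = 3        # from the rule: max_retries: 3 per task
COOLDOWN_SECONDS = 60  # from the rule: cooldown_after_failure: 60 seconds


def run_with_retries(action, max_retries=MAX_RETRIES,
                     cooldown=COOLDOWN_SECONDS, sleep=time.sleep):
    """Call action(); after a failure, wait and retry up to max_retries times."""
    last_error = None
    for attempt in range(max_retries + 1):  # 1 initial attempt + retries
        try:
            return action()
        except Exception as exc:
            last_error = exc
            if attempt < max_retries:
                sleep(cooldown)  # cool down before the next attempt
    # All attempts failed: surface the error instead of looping.
    raise RuntimeError(f"gave up after {max_retries} retries") from last_error
```

The final `raise` matters as much as the loop: once the cap is hit, the failure is handed to the caller rather than swallowed, so something upstream can stop and ask for review.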

The Session Budget Connection

This pairs with the session budget pattern. Set explicit limits:

- Max steps: 50 actions
- Max runtime: 10 minutes
- Max token spend: $0.50

When any limit is hit, the agent writes a handoff file and stops. Clean boundary. Recoverable state.

Without these limits, you're trusting the model to know when to stop. Models are optimistic. They'll keep trying.
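To make the three limits concrete, here is a minimal sketch of a budget tracker that a host loop could consult before every action. It assumes each task is a callable that returns its own cost in USD, which is a simplification; `SessionBudget` and `run_session` are hypothetical names, and the fields written to the handoff file are illustrative only.

```python
import json
import time
from dataclasses import dataclass, field


@dataclass
class SessionBudget:
    max_actions: int = 50
    max_runtime_s: float = 600.0  # 10 minutes
    max_spend_usd: float = 0.50
    actions: int = 0
    spend_usd: float = 0.0
    started_at: float = field(default_factory=time.monotonic)

    def record(self, cost_usd=0.0):
        """Count one completed action and add its cost."""
        self.actions += 1
        self.spend_usd += cost_usd

    def exceeded(self):
        """Return the name of the first limit that has been hit, or None."""
        if self.actions >= self.max_actions:
            return "max_actions"
        if time.monotonic() - self.started_at >= self.max_runtime_s:
            return "max_runtime"
        if self.spend_usd >= self.max_spend_usd:
            return "max_spend"
        return None


def run_session(tasks, budget, handoff_path="handoff.json"):
    """Run tasks in order; stop and write a handoff file at the first limit hit."""
    for index, task in enumerate(tasks):
        reason = budget.exceeded()
        if reason:
            with open(handoff_path, "w", encoding="utf-8") as fh:
                json.dump({"stopped_because": reason,
                           "next_task_index": index,
                           "actions": budget.actions,
                           "spend_usd": budget.spend_usd}, fh)
            return reason
        budget.record(task())  # each task returns its cost in USD (assumption)
    return None
```

The check runs in the host code, not in the prompt, so the stop does not depend on the model deciding it has done enough.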

The Monitoring Rule

Add one monitoring check to your agent's boot sequence:

On startup: check if last session ended cleanly.
If action_log.jsonl has more than 100 entries from the last 60 minutes: STOP and alert.

This catches runaway loops from prior sessions before they compound.
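The startup check could be implemented along these lines. This is a sketch under assumptions the article does not specify: each line of action_log.jsonl is taken to be a JSON object with an ISO 8601 timestamp in a field named `ts` (a hypothetical field name), and timestamps without an offset are treated as UTC. The function names are likewise illustrative.

```python
import json
from datetime import datetime, timedelta, timezone


def recent_action_count(log_path, window_minutes=60, now=None):
    """Count log entries whose timestamp falls inside the recent window."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(minutes=window_minutes)
    count = 0
    try:
        with open(log_path, encoding="utf-8") as fh:
            for line in fh:
                line = line.strip()
                if not line:
                    continue
                try:
                    entry = json.loads(line)
                    stamp = datetime.fromisoformat(entry["ts"])  # "ts" is assumed
                except (ValueError, KeyError, TypeError):
                    continue  # skip malformed lines rather than crash at boot
                if stamp.tzinfo is None:
                    stamp = stamp.replace(tzinfo=timezone.utc)
                if stamp >= cutoff:
                    count += 1
    except FileNotFoundError:
        return 0  # no log yet, so nothing to flag
    return count


def safe_to_start(log_path="action_log.jsonl", threshold=100):
    """Return False when the last hour shows more entries than the threshold."""
    return recent_action_count(log_path) <= threshold
```

A boot script would call `safe_to_start()` first and, on `False`, stop and alert instead of loading the task queue.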

What This Looks Like in Practice

Here's a minimal SOUL.md rate limit block:

## Rate Limits
- max_retries_per_task: 3
- cooldown_on_failure: wait 60s before retry
- session_max_actions: 50
- session_max_runtime: 10min
- on_limit_reached: write state to handoff.json, send alert to outbox.json, stop

That's it. Five lines prevent the most common runaway pattern.

The Real Cost

A single runaway loop that runs for 2 hours at GPT-4 rates can cost $15–40 depending on context size. If it's hitting an external API, you might also face rate limit bans or unexpected charges there.

The fix takes 5 minutes to add to your config. The cost of not adding it shows up on your next invoice.

The Ask Patrick Library includes rate limit templates and session budget configs for common agent patterns. See the full collection at askpatrick.co.