DEV Community

Mukunda Rao Katta

My agent called web_search 47 times in 10 seconds. One class stopped it.

Hermes Agent Challenge Submission: Write About Hermes Agent

This is a submission for the Hermes Agent Challenge.

A Hermes agent got stuck in a retry loop, hitting web_search over and over trying to find a paper that didn't exist. By the time I noticed, it had made 47 calls in 10 seconds and tripped the search API's own rate limit.

I needed to enforce limits inside my own code before I hit external ones. That's tool-call-rate-limit.

One call

from tool_call_rate_limit import RateLimiter

limiter = RateLimiter(calls=10, per_seconds=60)

# In your tool dispatch:
limiter.check("web_search")   # ok or raises RateLimitExceeded
result = do_web_search(query)

Per-tool rules

Different tools have different risk profiles. web_search should be tight; read_file can be loose.

limiter = RateLimiter(calls=20, per_seconds=60)  # default for everything
limiter.set_limit("web_search", calls=3, per_seconds=10)  # web_search is strict

limiter.check("web_search")   # 3 calls per 10s limit
limiter.check("read_file")    # 20 calls per 60s limit
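
One way to picture the per-tool lookup is a dictionary of overrides with a default fallback. The sketch below works under that assumption and is not the package's internals; Limit, DEFAULT, OVERRIDES and resolve_limit are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Limit:
    """Hypothetical rule: at most `calls` calls per `per_seconds` seconds."""
    calls: int
    per_seconds: float


# Default rule plus one stricter override, mirroring the example above.
DEFAULT = Limit(calls=20, per_seconds=60)
OVERRIDES = {"web_search": Limit(calls=3, per_seconds=10)}


def resolve_limit(tool):
    """Return the tool's own rule if one was set, else the default."""
    return OVERRIDES.get(tool, DEFAULT)


print(resolve_limit("web_search"))
print(resolve_limit("read_file"))
```

Any tool without an override falls through to the default, so newly added tools are limited from the start.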

Sliding window, not fixed buckets

The window slides with real time instead of resetting on fixed clock ticks. With a 10-second window, if you made 3 calls between t=0 and t=5 and the first landed at t=0, that oldest call expires at t=10 and frees a slot. You don't have to wait for the minute to reset.

limiter = RateLimiter(calls=2, per_seconds=10)
limiter.check("search")  # t=0
limiter.check("search")  # t=0, now at the limit
# ... 11 seconds later ...
limiter.check("search")  # ok: the earlier calls have aged out of the 10s window
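
To make the sliding-window behaviour concrete, here is a minimal sketch of how such a window could be built on collections.deque. It is not the package's actual implementation; the SlidingWindow class, its allow method and the injectable clock parameter are assumptions made so the example can fake the passage of time.

```python
import time
from collections import deque


class SlidingWindow:
    """Illustrative sliding-window counter (hypothetical, not the library's code)."""

    def __init__(self, calls, per_seconds, clock=time.monotonic):
        self.calls = calls
        self.per_seconds = per_seconds
        self.clock = clock       # injectable so the example can fake time
        self.stamps = deque()    # timestamps of accepted calls, oldest first

    def allow(self):
        now = self.clock()
        # Drop timestamps that have aged out of the window.
        while self.stamps and now - self.stamps[0] >= self.per_seconds:
            self.stamps.popleft()
        if len(self.stamps) >= self.calls:
            return False
        self.stamps.append(now)
        return True


# Fake clock so the walkthrough runs instantly.
fake_now = [0.0]
window = SlidingWindow(calls=2, per_seconds=10, clock=lambda: fake_now[0])

first = window.allow()    # t=0, accepted
second = window.allow()   # t=0, accepted, window is now full
third = window.allow()    # t=0, rejected
fake_now[0] = 11.0
fourth = window.allow()   # t=11, earlier calls have expired
print(first, second, third, fourth)  # True True False True
```

Because only accepted calls are stored, memory per tool is bounded by the limit itself.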

RateLimitExceeded tells you when to retry

try:
    limiter.check("web_search")
except RateLimitExceeded as e:
    print(f"Wait {e.retry_after:.1f}s before retrying")
    # e.tool, e.calls, e.per_seconds, e.current_count also available
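
If the agent should wait rather than give up, retry_after can drive a small bounded retry loop. The sketch below is self-contained, so it defines a stand-in exception that models only the retry_after attribute instead of importing the package's RateLimitExceeded. The helper call_when_allowed and its parameters are hypothetical, and the sleep function is injectable so the demo does not actually wait.

```python
import time


class StandInRateLimitExceeded(Exception):
    """Stand-in for the package's exception; only retry_after is modelled."""

    def __init__(self, retry_after):
        super().__init__(f"retry after {retry_after:.1f}s")
        self.retry_after = retry_after


def call_when_allowed(check, action, max_attempts=3, sleep=time.sleep):
    """Run check() then action(); on a rate-limit error, wait and retry."""
    for attempt in range(1, max_attempts + 1):
        try:
            check()
            return action()
        except StandInRateLimitExceeded as exc:
            if attempt == max_attempts:
                raise  # give up so a stuck loop cannot retry forever
            sleep(exc.retry_after)


# Demo: the first check is rejected, the second passes.
outcomes = iter([StandInRateLimitExceeded(2.5), None])
waits = []


def fake_check():
    outcome = next(outcomes)
    if outcome is not None:
        raise outcome


result = call_when_allowed(fake_check, lambda: "search results", sleep=waits.append)
print(result, waits)  # search results [2.5]
```

Capping the attempts matters here: an uncapped wait-and-retry loop would just be a slower version of the runaway loop the limiter is meant to stop.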

Don't raise — return False

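limiter = RateLimiter(calls=5, per_seconds=10, raise_on_limit=False)

def dispatch(tool_name):  # hypothetical dispatcher, for illustration
    # With raise_on_limit=False, check() returns False instead of raising.
    if not limiter.check(tool_name):
        return {"status": "rate_limited"}
    # ... otherwise run the tool here ...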

Inspect state

limiter.call_count("web_search")   # calls in current window
limiter.remaining("web_search")    # slots left before limit (None if no rule)
limiter.is_limited("web_search")   # True if any rule applies

Factory

from tool_call_rate_limit import make_rate_limiter

limiter = make_rate_limiter(
    calls=10, per_seconds=60,
    tool_limits={"web_search": (2, 10), "execute_code": (5, 30)},
)

Zero dependencies

Standard library only: time, collections.deque, dataclasses. Nothing to install beyond the package.

pip install tool-call-rate-limit

Repo: https://github.com/MukundaKatta/tool-call-rate-limit
