This is a submission for the Hermes Agent Challenge.
A Hermes agent got stuck in a retry loop, hitting web_search over and over trying to find a paper that didn't exist. By the time I noticed, it had made 47 calls in 10 seconds and triggered the search API's rate limit on my behalf.
I needed to enforce limits inside my own code before I hit external ones. That's tool-call-rate-limit.
One call
from tool_call_rate_limit import RateLimiter
limiter = RateLimiter(calls=10, per_seconds=60)
# In your tool dispatch:
limiter.check("web_search") # ok or raises RateLimitExceeded
result = do_web_search(query)
Per-tool rules
Different tools have different risk profiles. web_search should be tight; read_file can be loose.
limiter = RateLimiter(calls=20, per_seconds=60) # default for everything
limiter.set_limit("web_search", calls=3, per_seconds=10) # web_search is strict
limiter.check("web_search") # 3 calls per 10s limit
limiter.check("read_file") # 20 calls per 60s limit
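The lookup behind per-tool rules can be pictured as "use the tool's own rule if it has one, otherwise fall back to the default". Here is a minimal sketch, assuming the overrides live in a plain dict. `DEFAULT_LIMIT`, `TOOL_LIMITS`, and `limit_for` are hypothetical names for illustration, not the package's internals.

```python
# Hypothetical names for illustration; not the package's internals.
DEFAULT_LIMIT = (20, 60)                 # (calls, per_seconds) for any tool
TOOL_LIMITS = {"web_search": (3, 10)}    # stricter per-tool override


def limit_for(tool):
    """Return the (calls, per_seconds) rule that applies to a tool."""
    return TOOL_LIMITS.get(tool, DEFAULT_LIMIT)


web_rule = limit_for("web_search")   # per-tool rule wins
file_rule = limit_for("read_file")   # no override, so the default applies
```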
Sliding window, not fixed buckets
The window slides with real time instead of resetting on fixed clock ticks. Say the limit is 3 calls per 10 seconds and you made calls at t=0, t=2, and t=5: the t=0 call expires at t=10 and frees a slot right then, not when the minute resets.
limiter = RateLimiter(calls=2, per_seconds=10)
limiter.check("search") # t=0
limiter.check("search") # t=0 — at limit
# ... 11 seconds later ...
limiter.check("search") # ok — first call has expired
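For anyone curious how a sliding window like this can work, here is a rough sketch built on `collections.deque`. It is my illustration of the general technique, not the package's actual code: the `SlidingWindow` class, its `allow` method, and the injectable `clock` argument are all assumptions made so the example runs without sleeping.

```python
import time
from collections import deque


class SlidingWindow:
    """Illustrative sliding-window counter; not the package's actual code."""

    def __init__(self, calls, per_seconds, clock=time.monotonic):
        self.calls = calls
        self.per_seconds = per_seconds
        self.clock = clock
        self.timestamps = deque()  # call times, oldest on the left

    def allow(self):
        now = self.clock()
        # Drop timestamps that have aged out of the window.
        while self.timestamps and now - self.timestamps[0] >= self.per_seconds:
            self.timestamps.popleft()
        if len(self.timestamps) >= self.calls:
            return False
        self.timestamps.append(now)
        return True


# A fake clock stands in for real time so the demo is instant.
fake_now = [0.0]
window = SlidingWindow(calls=2, per_seconds=10, clock=lambda: fake_now[0])

first = window.allow()    # t=0, allowed
second = window.allow()   # t=0, allowed, now at the limit
third = window.allow()    # t=0, rejected
fake_now[0] = 11.0
fourth = window.allow()   # t=11, earlier calls have expired
```

Because each call carries its own timestamp, slots free up one at a time as individual calls age out, which is the difference from a fixed bucket that resets all at once.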
RateLimitExceeded tells you when to retry
try:
    limiter.check("web_search")
except RateLimitExceeded as e:
    print(f"Wait {e.retry_after:.1f}s before retrying")
    # e.tool, e.calls, e.per_seconds, e.current_count also available
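A retry hint like this falls out of the sliding window naturally: the next slot opens when the oldest call in the window expires. The sketch below shows that arithmetic under my own assumptions. `seconds_until_slot` is a hypothetical helper, and I am not claiming the package computes `retry_after` exactly this way.

```python
from collections import deque


def seconds_until_slot(timestamps, per_seconds, now):
    """Time until the oldest recorded call leaves the window (0 if already free)."""
    if not timestamps:
        return 0.0
    return max(0.0, timestamps[0] + per_seconds - now)


# Three calls at t=0, t=2 and t=5 under a 10-second window, inspected at t=6.
recorded = deque([0.0, 2.0, 5.0])
wait = seconds_until_slot(recorded, per_seconds=10, now=6.0)
print(f"Wait {wait:.1f}s before retrying")  # prints "Wait 4.0s before retrying"
```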
Don't raise — return False
limiter = RateLimiter(calls=5, per_seconds=10, raise_on_limit=False)
if not limiter.check("tool"):
    return {"status": "rate_limited"}
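The `return` there only makes sense inside a function, so here is how that might look in a tool dispatcher. This is a sketch under assumptions: `dispatch` is a hypothetical wrapper, and `AllowFirstN` is a stand-in I wrote so the example runs without the package installed. The only thing it assumes about a limiter is a `check(name)` method that returns a bool.

```python
def dispatch(limiter, tool_name, tool_fn, *args):
    """Run tool_fn unless the limiter refuses; never raises on limit."""
    if not limiter.check(tool_name):
        return {"status": "rate_limited"}
    return {"status": "ok", "result": tool_fn(*args)}


class AllowFirstN:
    """Stand-in limiter for the demo: permits the first n checks, then refuses."""

    def __init__(self, n):
        self.left = n

    def check(self, tool_name):
        if self.left <= 0:
            return False
        self.left -= 1
        return True


stub = AllowFirstN(1)
ok = dispatch(stub, "tool", str.upper, "hello")       # first call goes through
blocked = dispatch(stub, "tool", str.upper, "hello")  # second call is refused
```

Returning a status dict lets the agent see the refusal as an ordinary tool result and decide what to do next, instead of unwinding through an exception.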
Inspect state
limiter.call_count("web_search") # calls in current window
limiter.remaining("web_search") # slots left before limit (None if no rule)
limiter.is_limited("web_search") # True if any rule applies
Factory
from tool_call_rate_limit import make_rate_limiter
limiter = make_rate_limiter(
    calls=10, per_seconds=60,
    tool_limits={"web_search": (2, 10), "execute_code": (5, 30)},
)
Zero dependencies
Standard library only: time, collections.deque, dataclasses. Nothing to install beyond the package.
pip install tool-call-rate-limit