You ship an AI agent that can send emails. It works great in testing.
Then one night, the agent hits a retry loop. A flaky API responds slowly, the agent interprets the delay as failure, and it tries again. And again. By morning, a single user has received 847 confirmation emails. Your support inbox is on fire. Your API provider has suspended your account.
This isn't a hypothetical. It's the kind of thing that happens when you give agents real tools and don't put guardrails around how often they can use them.
In the first article, I introduced Guardio - a policy enforcement proxy that sits between your AI agent and the outside world. Today, I want to show you one of its newest built-in policies: rate limiting.
Why Rate Limiting Is Different for AI Agents
With traditional APIs, rate limiting is simple: a client sends too many requests, the server returns a 429, and the client backs off. Problem solved.
AI agents are messier.
- They can retry silently without you noticing until it's too late
- They don't always respect error signals the way a human-coded client would
- A single agent decision (like "send a daily summary") can be triggered hundreds of times if the agent's context gets corrupted or the loop condition misbehaves
- Different tools deserve different limits - spamming a read-only knowledge base is annoying; spamming a billing endpoint is catastrophic
You need rate limiting that is per-tool, deterministic, and enforced outside the agent - so the agent literally cannot exceed it, regardless of how it behaves.
That's exactly what Guardio's rate-limit-tool policy plugin does.
Quick Recap: What Is Guardio?
Guardio is a proxy you run alongside your AI agent. Every tool call your agent makes (to an MCP server, an external API, a database) passes through Guardio first. Guardio evaluates it against your configured policies, and only forwards it if it's allowed.
AI Agent → Guardio → MCP Tool / External API
No AI in the enforcement path. No prompt engineering. Just hard rules.
Setting Up Guardio
If you haven't set it up yet, one command scaffolds a full project:
```bash
npx create-guardio
```
You'll be prompted to choose:
- A project directory name
- The HTTP port Guardio will listen on (default: 3939)
- A storage backend (SQLite is the easiest to start with)
- Whether to install the dashboard UI
Once scaffolded:
```bash
cd guardio-project
npm install
npm run guardio
```
Then point your AI agent or MCP client at http://127.0.0.1:3939 instead of directly at your tools.
Your config lives in guardio.config.ts. Here's a minimal example with an MCP tool connected:
```typescript
// guardio.config.ts
import type { GuardioConfig } from "@guardiojs/guardio";

const config: GuardioConfig = {
  client: {
    port: 3939,
  },
  servers: [
    {
      name: "email-tool",
      type: "url",
      url: "https://your-mcp-email-server.com/sse",
    },
  ],
  plugins: [
    {
      type: "storage",
      name: "sqlite",
      config: { database: "guardio.sqlite" },
    },
  ],
};

export default config;
```
Your agent connects to http://127.0.0.1:3939/email-tool/sse - Guardio is now in the middle.
Introducing rate-limit-tool
The rate-limit-tool policy plugin enforces a maximum number of calls to any given tool within a fixed time window. It's a built-in plugin shipped with Guardio - no extra installation needed.
The configuration is intentionally simple:
| Field | Type | Description |
|---|---|---|
| `limit` | number (integer ≥ 1) | Maximum calls allowed in the window |
| `windowSeconds` | number (integer ≥ 1) | Duration of the time window, in seconds |
For example: limit: 5, windowSeconds: 60 means no more than 5 calls per minute.
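To make those semantics concrete, here is a minimal in-memory counter that behaves the way the table describes. It's a standalone sketch for illustration, not Guardio's implementation (which persists its state through storage, as described below), and the `FixedWindowCounter` name is made up.

```typescript
// Hypothetical in-memory fixed-window counter, for illustration only.
class FixedWindowCounter {
  private windowIndex = -1;
  private count = 0;

  constructor(
    private readonly limit: number,
    private readonly windowSeconds: number,
  ) {}

  // Returns true if the call is allowed, false if it would exceed the limit.
  tryCall(nowMs: number): boolean {
    const index = Math.floor(nowMs / (this.windowSeconds * 1000));
    if (index !== this.windowIndex) {
      // A new window has started: reset the counter.
      this.windowIndex = index;
      this.count = 0;
    }
    if (this.count >= this.limit) return false;
    this.count += 1;
    return true;
  }
}

// limit: 5, windowSeconds: 60
const counter = new FixedWindowCounter(5, 60);
const t0 = Date.UTC(2025, 2, 18, 12, 0, 0); // start of a UTC minute
const results: boolean[] = [];
for (let i = 0; i < 6; i++) results.push(counter.tryCall(t0 + i * 1000));
console.log(results); // [ true, true, true, true, true, false ]
console.log(counter.tryCall(t0 + 60_000)); // true: the next window has a fresh budget
```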
How It Works Under the Hood
The plugin uses fixed time windows - it doesn't slide. If your window is 60 seconds, windows are 0:00–1:00, 1:00–2:00, etc. Simple and predictable.
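The window arithmetic behind this fits in two functions. They mirror the `Math.floor(now / windowMs)` step in the plugin code shown later: windows are counted from the Unix epoch, so a 60-second window always starts on a whole UTC minute. The function names here are illustrative.

```typescript
// Which fixed window a timestamp falls into, counted from the Unix epoch.
function windowIndex(nowMs: number, windowSeconds: number): number {
  return Math.floor(nowMs / (windowSeconds * 1000));
}

// When the window containing nowMs ends, as an ISO-8601 UTC string.
function resetsAt(nowMs: number, windowSeconds: number): string {
  return new Date((windowIndex(nowMs, windowSeconds) + 1) * windowSeconds * 1000).toISOString();
}

console.log(resetsAt(Date.parse("2025-03-18T12:00:30.000Z"), 60)); // 2025-03-18T12:01:00.000Z

// One millisecond apart, but in different windows, so each gets a fresh budget.
const justBefore = Date.parse("2025-03-18T12:00:59.999Z");
const justAfter = Date.parse("2025-03-18T12:01:00.000Z");
console.log(windowIndex(justAfter, 60) - windowIndex(justBefore, 60)); // 1
```

The trade-off of fixed windows shows up at the boundary: with `limit: 5, windowSeconds: 60`, an agent could make 5 calls at 12:00:59 and 5 more at 12:01:00, so up to twice the limit can land within a couple of seconds. A sliding window avoids that at the cost of more state; for a hard cap against runaway loops, the simpler scheme is usually good enough.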
State (current count and window start) is stored in the PluginRepository - meaning it persists across requests and survives restarts if you're using SQLite or PostgreSQL. If no storage is configured, the plugin fails open (allows all calls) and logs a warning. This is a deliberate design choice: Guardio doesn't silently break your agent in misconfigured environments.
When the limit is exceeded, the agent receives a structured block response - not a raw error, but a clean JSON-RPC success result with a human-readable reason:
Rate limit exceeded: 5/5 calls in 60s window. Resets at 2025-03-18T12:01:00.000Z.
Agent frameworks won't choke on this. They get a clear message they can surface or log.
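Why a success result? In JSON-RPC 2.0 a response carries either a `result` member or an `error` member. Putting the block inside `result`, flagged with the MCP tool-result field `isError: true`, lets clients handle it like any other tool output. The helper below is a hypothetical sketch of that wrapping, using the same `_guardio` fields as the example response later in this article; it is not Guardio's code, and `toBlockedResponse` is an invented name.

```typescript
// Simplified block verdict, mirroring the fields the plugin returns.
interface BlockVerdict { verdict: "block"; code: string; reason: string }

// Hypothetical helper: wraps a block verdict in a JSON-RPC 2.0 success
// response (a `result` member, no `error` member) whose payload follows
// the MCP tool-result convention of `isError: true`.
function toBlockedResponse(id: number | string, policyId: string, block: BlockVerdict) {
  return {
    jsonrpc: "2.0" as const,
    id,
    result: {
      isError: true,
      content: [{ type: "text", text: block.reason }],
      _guardio: { action: "BLOCKED", policyId, code: block.code },
    },
  };
}

const response = toBlockedResponse(42, "rate-limit-tool", {
  verdict: "block",
  code: "RATE_LIMIT_EXCEEDED",
  reason: "Rate limit exceeded: 5/5 calls in 60s window. Resets at 2025-03-18T12:01:00.000Z.",
});
console.log("error" in response); // false
console.log(response.result.isError); // true
```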
Configuring the Policy via the Dashboard
If you installed the Guardio dashboard, configuring rate limits is point-and-click.
- Open the dashboard (`npm run dashboard`)
- Navigate to Policies
- Create a new policy and select `rate-limit-tool`
- Fill in `limit` and `windowSeconds`
- Assign it to the tool(s) you want to protect
You can create multiple instances of the policy with different limits - for example, a strict limit on your email tool and a more generous one on a read-only search tool.
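Conceptually, per-tool limits come down to one counter and one limit per tool, so exhausting one tool's budget never touches another's. A rough sketch: the tool names, the limit values, and the `limits` lookup table are hypothetical, and in Guardio you'd express this as separate policy instances rather than as this object.

```typescript
// Hypothetical per-tool limits; a plain lookup table for illustration,
// not Guardio's policy configuration format.
interface Limit { limit: number; windowSeconds: number }

const limits: Record<string, Limit> = {
  send_email: { limit: 10, windowSeconds: 3600 }, // strict
  search_docs: { limit: 100, windowSeconds: 60 }, // generous
};

// One independent counter per tool, keyed like the plugin's storage key.
const state = new Map<string, { windowStart: number; count: number }>();

function allow(toolName: string, nowMs: number): boolean {
  const cfg = limits[toolName];
  if (!cfg) return true; // no limit assigned to this tool
  const windowStart = Math.floor(nowMs / (cfg.windowSeconds * 1000));
  const key = `ratelimit:${toolName}`;
  const stored = state.get(key);
  const count = stored && stored.windowStart === windowStart ? stored.count : 0;
  if (count >= cfg.limit) return false;
  state.set(key, { windowStart, count: count + 1 });
  return true;
}

const now = Date.parse("2025-03-18T12:00:00.000Z");
let emailAllowed = 0;
for (let i = 0; i < 15; i++) if (allow("send_email", now + i)) emailAllowed++;
console.log(emailAllowed); // 10
console.log(allow("search_docs", now + 20)); // true: unaffected by send_email
```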
Configuring the Policy in Code
If you prefer to manage things programmatically, you can wire up the plugin directly. Here's the full implementation for reference - this is exactly what's shipping in Guardio:
```typescript
import { z } from "zod";
import type {
  PolicyPluginInterface,
  PolicyRequestContext,
  PolicyResult,
  PluginRepository,
} from "@guardiojs/guardio";

// Config validation: both values must be integers >= 1.
const rateLimitToolConfigSchema = z.object({
  limit: z.number().int().min(1),
  windowSeconds: z.number().int().min(1),
});

class RateLimitToolPolicyPlugin implements PolicyPluginInterface {
  readonly name = "rate-limit-tool";

  constructor(
    private readonly limit: number,
    private readonly windowSeconds: number,
    private readonly repo?: PluginRepository,
  ) {}

  async evaluate(context: PolicyRequestContext): Promise<PolicyResult> {
    // No storage configured: fail open and allow the call.
    if (!this.repo) return { verdict: "allow" };

    const windowMs = this.windowSeconds * 1000;
    const now = Date.now();
    // Index of the current fixed window, counted from the Unix epoch.
    const currentWindowStart = Math.floor(now / windowMs);
    // One counter per tool.
    const contextKey = `ratelimit:${context.toolName}`;

    const doc = await this.repo.getDocument(contextKey);
    const stored = doc?.data as { windowStart: number; count: number } | undefined;
    // A count saved during an earlier window no longer applies.
    const isNewWindow = (stored?.windowStart ?? 0) !== currentWindowStart;
    const currentCount = isNewWindow ? 0 : (stored?.count ?? 0);
    const resetsAt = new Date((currentWindowStart + 1) * windowMs).toISOString();

    if (currentCount >= this.limit) {
      return {
        verdict: "block",
        code: "RATE_LIMIT_EXCEEDED",
        reason: `Rate limit exceeded: ${currentCount}/${this.limit} calls in ${this.windowSeconds}s window. Resets at ${resetsAt}.`,
        metadata: { currentCount, limit: this.limit, windowSeconds: this.windowSeconds, resetsAt },
      };
    }

    // Read-then-write, not atomic: see the notes below on concurrency.
    await this.repo.saveDocument(
      contextKey,
      { windowStart: currentWindowStart, count: currentCount + 1 },
      doc?.id,
    );
    return { verdict: "allow" };
  }
}
```
A few things worth noticing here:
- Per-tool keying: the storage key is `ratelimit:{toolName}`, so each tool gets its own independent counter. Exceeding the limit on `send_email` doesn't affect `search_docs`.
- Atomic-ish updates: the plugin reads the current count, increments it, and saves it in sequence. That sequence isn't atomic, so two concurrent calls can read the same count and both be allowed, briefly overshooting the limit. For very high-concurrency scenarios you'd want an atomic increment in the store, but for typical agent workloads this is more than sufficient.
- Clean metadata: the `PolicyResult` carries `currentCount`, `limit`, `windowSeconds`, and `resetsAt` in `metadata`, so your event sink and dashboard can surface real usage data, not just "blocked".
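The read-check-write race behind "atomic-ish" can be reproduced with a deliberately slow in-memory store. This sketch assumes a repository whose `getDocument`/`saveDocument` calls are shaped like the ones the plugin uses; `SlowMemoryRepo`, `checkAndIncrement`, and `serialized` are hypothetical names, and the per-key promise chain is just one possible single-process mitigation.

```typescript
// Hypothetical in-memory stand-in for the repository calls the plugin makes;
// the real PluginRepository interface may differ.
type Doc = { id: string; data: { windowStart: number; count: number } };

class SlowMemoryRepo {
  private docs = new Map<string, Doc>();
  private tick(): Promise<void> {
    return new Promise((resolve) => setTimeout(resolve, 5)); // simulated I/O latency
  }
  async getDocument(key: string): Promise<Doc | undefined> {
    await this.tick();
    return this.docs.get(key);
  }
  async saveDocument(key: string, data: Doc["data"], id?: string): Promise<void> {
    await this.tick();
    this.docs.set(key, { id: id ?? key, data });
  }
}

// Same read-check-write sequence as evaluate(), reduced to a boolean.
async function checkAndIncrement(repo: SlowMemoryRepo, key: string, limit: number, windowStart: number): Promise<boolean> {
  const doc = await repo.getDocument(key);
  const count = doc && doc.data.windowStart === windowStart ? doc.data.count : 0;
  if (count >= limit) return false;
  await repo.saveDocument(key, { windowStart, count: count + 1 }, doc?.id);
  return true;
}

// Per-key promise chain: checks for the same key run one at a time.
const chains = new Map<string, Promise<unknown>>();
function serialized<T>(key: string, fn: () => Promise<T>): Promise<T> {
  const prev = chains.get(key) ?? Promise.resolve();
  const next = prev.then(fn, fn);
  chains.set(key, next.catch(() => undefined));
  return next;
}

async function demo(): Promise<{ racy: number; safe: number }> {
  const key = "ratelimit:send_email";
  const limit = 5;
  const windowStart = 0;

  // Count is already 4, so only one more call should fit.
  const racyRepo = new SlowMemoryRepo();
  await racyRepo.saveDocument(key, { windowStart, count: 4 });
  const racy = await Promise.all([1, 2, 3].map(() => checkAndIncrement(racyRepo, key, limit, windowStart)));

  const safeRepo = new SlowMemoryRepo();
  await safeRepo.saveDocument(key, { windowStart, count: 4 });
  const safe = await Promise.all(
    [1, 2, 3].map(() => serialized(key, () => checkAndIncrement(safeRepo, key, limit, windowStart))),
  );

  return { racy: racy.filter(Boolean).length, safe: safe.filter(Boolean).length };
}

demo().then((r) => console.log(r)); // { racy: 3, safe: 1 }
```

In the unguarded version all three concurrent calls read a count of 4 and pass; the serialized version lets exactly one through. Serializing inside one process doesn't help when several Guardio instances share a database; there you'd need an atomic increment in the store itself.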
A Practical Example: Protecting an Email Tool
Say your agent has access to a send_email MCP tool. You want to allow it to send at most 10 emails per hour - enough for normal operation, but a hard cap against runaway loops.
Set up Guardio with:
limit: 10
windowSeconds: 3600
Assign this policy to the send_email tool in the dashboard (or via config).
Now, when the agent calls send_email for the 11th time in the same hour, it gets back:
```json
{
  "isError": true,
  "content": [
    {
      "type": "text",
      "text": "Rate limit exceeded: 10/10 calls in 3600s window. Resets at 2025-03-18T13:00:00.000Z."
    }
  ],
  "_guardio": {
    "action": "BLOCKED",
    "policyId": "rate-limit-tool",
    "code": "RATE_LIMIT_EXCEEDED"
  }
}
```
The email is never sent. The upstream server never sees the request. And in your dashboard, you have a full audit trail of every allowed and blocked call.
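On the agent side, a tool wrapper can recognize this block and hold off until the window resets instead of retrying immediately. A minimal sketch, assuming the response shape above; `rateLimitWaitMs` is a hypothetical helper, and because it parses the reset time out of the human-readable text, it would break if the message format changed.

```typescript
// Only the fields used here; shape taken from the block response above.
interface ToolResult {
  isError?: boolean;
  content: { type: string; text: string }[];
  _guardio?: { action: string; policyId: string; code: string };
}

// Hypothetical helper: milliseconds to wait before retrying a rate-limited
// call, or null if the result is not a rate-limit block (or has no
// parsable reset time).
function rateLimitWaitMs(result: ToolResult, nowMs: number): number | null {
  if (result._guardio?.code !== "RATE_LIMIT_EXCEEDED") return null;
  const text = result.content.map((c) => c.text).join(" ");
  // In the response above, the reset time only appears in the text.
  const match = /Resets at (\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d{3}Z)/.exec(text);
  if (!match) return null;
  return Math.max(0, Date.parse(match[1]) - nowMs);
}

const blocked: ToolResult = {
  isError: true,
  content: [{ type: "text", text: "Rate limit exceeded: 10/10 calls in 3600s window. Resets at 2025-03-18T13:00:00.000Z." }],
  _guardio: { action: "BLOCKED", policyId: "rate-limit-tool", code: "RATE_LIMIT_EXCEEDED" },
};

console.log(rateLimitWaitMs(blocked, Date.parse("2025-03-18T12:45:00.000Z"))); // 900000
```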
Stacking Policies
Rate limiting doesn't have to stand alone. Guardio evaluates policies as a chain - if any returns block, the call is stopped. This means you can combine rate-limit-tool with other policies:
- `deny-regex-parameter` - block calls where an argument matches a pattern (e.g. block emails to `*@competitor.com`)
- `deny-tool-access` - block the tool entirely for specific agents
- Your own custom policy plugin - any TypeScript class that implements `PolicyPluginInterface`
A real setup might look like: rate limit the email tool to 10/hour, AND block any call where the recipient matches a known bad domain. Both policies apply. Either one can stop the call.
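The chain semantics can be sketched in a few lines. The types below are simplified, hypothetical stand-ins for Guardio's interfaces, the two policies are toy versions of a regex deny and a rate limit, and the order in which Guardio actually runs assigned policies is an assumption here.

```typescript
// Hypothetical, simplified shapes; Guardio's real PolicyPluginInterface,
// PolicyRequestContext and PolicyResult may carry more fields.
type Result =
  | { verdict: "allow" }
  | { verdict: "block"; code: string; reason: string };
interface Ctx { toolName: string; args: Record<string, unknown> }
interface Policy { name: string; evaluate(ctx: Ctx): Promise<Result> }

// Evaluate policies in order; the first "block" stops the chain.
async function evaluateChain(policies: Policy[], ctx: Ctx): Promise<{ result: Result; blockedBy?: string }> {
  for (const policy of policies) {
    const result = await policy.evaluate(ctx);
    if (result.verdict === "block") return { result, blockedBy: policy.name };
  }
  return { result: { verdict: "allow" } };
}

// Stand-in for a regex deny on the `to` argument (pattern is illustrative).
const denyCompetitor: Policy = {
  name: "deny-recipient-domain",
  async evaluate(ctx: Ctx): Promise<Result> {
    const to = String(ctx.args.to ?? "");
    return /@competitor\.com$/i.test(to)
      ? { verdict: "block", code: "DENIED", reason: `Recipient ${to} is not allowed` }
      : { verdict: "allow" };
  },
};

// In-memory fixed-window limiter with an injectable clock for testing.
function rateLimit(limit: number, windowSeconds: number, clock: () => number = Date.now): Policy {
  const counts = new Map<string, { windowStart: number; count: number }>();
  return {
    name: "rate-limit-tool",
    async evaluate(ctx: Ctx): Promise<Result> {
      const windowStart = Math.floor(clock() / (windowSeconds * 1000));
      const stored = counts.get(ctx.toolName);
      const count = stored && stored.windowStart === windowStart ? stored.count : 0;
      if (count >= limit) {
        return { verdict: "block", code: "RATE_LIMIT_EXCEEDED", reason: `Rate limit exceeded: ${count}/${limit}` };
      }
      counts.set(ctx.toolName, { windowStart, count: count + 1 });
      return { verdict: "allow" };
    },
  };
}

// Stateless deny first, so a denied call never consumes rate-limit budget.
const policies = [denyCompetitor, rateLimit(10, 3600, () => Date.parse("2025-03-18T12:30:00.000Z"))];

async function main(): Promise<(string | undefined)[]> {
  const bad: Ctx = { toolName: "send_email", args: { to: "ceo@competitor.com" } };
  console.log((await evaluateChain(policies, bad)).blockedBy); // deny-recipient-domain

  const ok: Ctx = { toolName: "send_email", args: { to: "user@example.com" } };
  const outcomes: (string | undefined)[] = [];
  for (let i = 0; i < 11; i++) outcomes.push((await evaluateChain(policies, ok)).blockedBy);
  console.log(outcomes.filter((o) => o === undefined).length, outcomes[10]); // 10 rate-limit-tool
  return outcomes;
}

main();
```

One consequence worth checking in your own setup: a stateful policy like a rate limiter counts every call it lets through, even one that a later policy then blocks. Running stateless denies first, as in this sketch, keeps rejected calls from eating into the budget.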
Try It
```bash
npx create-guardio
```
🔗 GitHub: https://github.com/radoslaw-sz/guardio
If this solves a problem you've been staring at, a ⭐ on GitHub goes a long way. And if you have a policy use case you'd like to see built in - open an issue.