DEV Community: Radosław

🛡️ How to Stop Your AI Agent from Sending 10,000 Emails in a Loop

Radosław — Wed, 18 Mar 2026 11:37:13 +0000

You ship an AI agent that can send emails. It works great in testing.

Then one night, the agent hits a retry loop. A flaky API responds slowly, the agent interprets the delay as failure, and it tries again. And again. By morning, a single user has received 847 confirmation emails. Your support inbox is on fire. Your API provider has suspended your account.

This isn't a hypothetical. It's the kind of thing that happens when you give agents real tools and don't put guardrails around how often they can use them.

In the first article, I introduced Guardio - a policy enforcement proxy that sits between your AI agent and the outside world. Today, I want to show you one of its newest built-in policies: rate limiting.

Why Rate Limiting Is Different for AI Agents

With traditional APIs, rate limiting is simple: a client sends too many requests, the server returns a 429, and the client backs off. Problem solved.

AI agents are messier.

They can retry silently without you noticing until it's too late
They don't always respect error signals the way a human-coded client would
A single agent decision (like "send a daily summary") can be triggered hundreds of times if the agent's context gets corrupted or the loop condition misbehaves
Different tools deserve different limits - spamming a read-only knowledge base is annoying; spamming a billing endpoint is catastrophic

You need rate limiting that is per-tool, deterministic, and enforced outside the agent - so the agent literally cannot exceed it, regardless of how it behaves.

That's exactly what Guardio's rate-limit-tool policy plugin does.

Quick Recap: What Is Guardio?

Guardio is a proxy you run alongside your AI agent. Every tool call your agent makes (to an MCP server, an external API, a database) passes through Guardio first. Guardio evaluates it against your configured policies, and only forwards it if it's allowed.

AI Agent → Guardio → MCP Tool / External API

No AI in the enforcement path. No prompt engineering. Just hard rules.

Setting Up Guardio

If you haven't set it up yet, one command scaffolds a full project:

npx create-guardio

You'll be prompted to choose:

A project directory name
The HTTP port Guardio will listen on (default: 3939)
A storage backend (SQLite is the easiest to start with)
Whether to install the dashboard UI

Once scaffolded:

cd guardio-project
npm install
npm run guardio

Then point your AI agent or MCP client at http://127.0.0.1:3939 instead of directly at your tools.

Your config lives in guardio.config.ts. Here's a minimal example with an MCP tool connected:

// guardio.config.ts
import type { GuardioConfig } from "@guardiojs/guardio";

const config: GuardioConfig = {
  client: {
    port: 3939,
  },
  servers: [
    {
      name: "email-tool",
      type: "url",
      url: "https://your-mcp-email-server.com/sse",
    },
  ],
  plugins: [
    {
      type: "storage",
      name: "sqlite",
      config: { database: "guardio.sqlite" },
    },
  ],
};

export default config;

Your agent connects to http://127.0.0.1:3939/email-tool/sse - Guardio is now in the middle.

Introducing `rate-limit-tool`

The rate-limit-tool policy plugin enforces a maximum number of calls to any given tool within a fixed time window. It's a built-in plugin shipped with Guardio - no extra installation needed.

The configuration is intentionally simple:

Field	Type	Description
`limit`	number	Maximum calls allowed in the window
`windowSeconds`	number	Duration of the time window, in seconds

For example: limit: 5, windowSeconds: 60 means no more than 5 calls per minute.

How It Works Under the Hood

The plugin uses fixed time windows - it doesn't slide. If your window is 60 seconds, windows are 0:00–1:00, 1:00–2:00, etc. Simple and predictable.
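
To make the fixed-window arithmetic concrete, here is a minimal sketch that mirrors the floor-division approach the plugin uses. The helper name `fixedWindowFor`, the `FixedWindow` shape, and the timestamps are illustrative assumptions, not part of Guardio's API.

```typescript
// Illustrative helper (not a Guardio API): maps a timestamp to its fixed window.
interface FixedWindow {
  index: number;      // window number counted from the Unix epoch
  startsAtMs: number; // inclusive start of the window
  resetsAtMs: number; // exclusive end, i.e. the moment the counter resets
}

function fixedWindowFor(nowMs: number, windowSeconds: number): FixedWindow {
  const windowMs = windowSeconds * 1000;
  const index = Math.floor(nowMs / windowMs);
  return { index, startsAtMs: index * windowMs, resetsAtMs: (index + 1) * windowMs };
}

// Two calls one second apart can land in different windows.
const beforeBoundary = fixedWindowFor(Date.UTC(2026, 2, 18, 12, 0, 59), 60);
const afterBoundary = fixedWindowFor(Date.UTC(2026, 2, 18, 12, 1, 0), 60);
console.log(beforeBoundary.index === afterBoundary.index); // false
console.log(new Date(beforeBoundary.resetsAtMs).toISOString()); // 2026-03-18T12:01:00.000Z
```

Windows computed this way are aligned to the Unix epoch, not to the agent's first call. One consequence of any fixed-window scheme is that a burst straddling a boundary can briefly reach up to twice the limit, which is usually an acceptable trade for the simplicity.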

State (current count and window start) is stored in the PluginRepository - meaning it persists across requests and survives restarts if you're using SQLite or PostgreSQL. If no storage is configured, the plugin fails open (allows all calls) and logs a warning. This is a deliberate design choice: Guardio doesn't silently break your agent in misconfigured environments.

When the limit is exceeded, the agent receives a structured block response - not a raw error, but a clean JSON-RPC success result with a human-readable reason:

Rate limit exceeded: 5/5 calls in 60s window. Resets at 2025-03-18T12:01:00.000Z.

Agent frameworks won't choke on this. They get a clear message they can surface or log.

Configuring the Policy via the Dashboard

If you installed the Guardio dashboard, configuring rate limits is point-and-click.

Open the dashboard (npm run dashboard)
Navigate to Policies
Create a new policy, select rate-limit-tool
Fill in limit and windowSeconds
Assign it to the tool(s) you want to protect

You can create multiple instances of the policy with different limits - for example, a strict limit on your email tool and a more generous one on a read-only search tool.
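
As a rough mental model of "different limits for different tools", here is an in-memory sketch. The class `PerToolLimiterSketch`, its `tryCall` method, and the tool names are hypothetical stand-ins for Guardio's policy assignments, and the clock is passed in explicitly so the behaviour is easy to follow.

```typescript
interface ToolLimitSketch {
  limit: number;
  windowSeconds: number;
}

// Hypothetical in-memory model: one fixed-window counter per tool name.
class PerToolLimiterSketch {
  private readonly limits: Map<string, ToolLimitSketch>;
  private readonly counters = new Map<string, { windowIndex: number; count: number }>();

  constructor(limits: Record<string, ToolLimitSketch>) {
    this.limits = new Map(Object.entries(limits));
  }

  // Returns true if the call is allowed and records it; false if it is blocked.
  tryCall(toolName: string, nowMs: number): boolean {
    const rule = this.limits.get(toolName);
    if (!rule) return true; // no rate limit assigned to this tool

    const windowIndex = Math.floor(nowMs / (rule.windowSeconds * 1000));
    const state = this.counters.get(toolName);
    const count = state && state.windowIndex === windowIndex ? state.count : 0;
    if (count >= rule.limit) return false;

    this.counters.set(toolName, { windowIndex, count: count + 1 });
    return true;
  }
}

const limiter = new PerToolLimiterSketch({
  send_email: { limit: 2, windowSeconds: 3600 },  // strict
  search_docs: { limit: 100, windowSeconds: 60 }, // generous
});
const noon = Date.UTC(2026, 2, 18, 12, 0, 0);
const emailAttempts = [0, 1, 2].map((i) => limiter.tryCall("send_email", noon + i * 1000));
console.log(emailAttempts); // [ true, true, false ]
console.log(limiter.tryCall("search_docs", noon + 3000)); // true: separate counter
console.log(limiter.tryCall("send_email", noon + 3600 * 1000)); // true: a new window has started
```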

Configuring the Policy in Code

If you prefer to manage things programmatically, you can wire up the plugin directly. Here's the full implementation for reference - this is exactly what's shipping in Guardio:

import { z } from "zod";
import type {
  PolicyPluginInterface,
  PolicyRequestContext,
  PolicyResult,
  PluginRepository,
} from "@guardiojs/guardio";

const rateLimitToolConfigSchema = z.object({
  limit: z.number().int().min(1),
  windowSeconds: z.number().int().min(1),
});

class RateLimitToolPolicyPlugin implements PolicyPluginInterface {
  readonly name = "rate-limit-tool";

  constructor(
    private readonly limit: number,
    private readonly windowSeconds: number,
    private readonly repo?: PluginRepository,
  ) {}

  async evaluate(context: PolicyRequestContext): Promise<PolicyResult> {
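    // Fail open: without a repository there is nowhere to keep counters.
    if (!this.repo) return { verdict: "allow" };

    const windowMs = this.windowSeconds * 1000;
    const now = Date.now();
    // Despite the name, this is the window's index since the Unix epoch, not a timestamp.
    const currentWindowStart = Math.floor(now / windowMs);
    // One counter document per tool.
    const contextKey = `ratelimit:${context.toolName}`;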

    const doc = await this.repo.getDocument(contextKey);
    const stored = doc?.data as { windowStart: number; count: number } | undefined;

    const isNewWindow = (stored?.windowStart ?? 0) !== currentWindowStart;
    const currentCount = isNewWindow ? 0 : (stored?.count ?? 0);

    const resetsAt = new Date((currentWindowStart + 1) * windowMs).toISOString();

    if (currentCount >= this.limit) {
      return {
        verdict: "block",
        code: "RATE_LIMIT_EXCEEDED",
        reason: `Rate limit exceeded: ${currentCount}/${this.limit} calls in ${this.windowSeconds}s window. Resets at ${resetsAt}.`,
        metadata: { currentCount, limit: this.limit, windowSeconds: this.windowSeconds, resetsAt },
      };
    }

    await this.repo.saveDocument(contextKey, {
      windowStart: currentWindowStart,
      count: currentCount + 1,
    }, doc?.id);

    return { verdict: "allow" };
  }
}

A few things worth noticing here:

Per-tool keying: the storage key is ratelimit:{toolName}, so each tool gets its own independent counter. Exceeding the limit on send_email doesn't affect search_docs.
Atomic-ish updates: the plugin reads the current count, increments it, and saves it as separate steps, so two concurrent calls can read the same count before either one saves. For very high-concurrency scenarios you'd want to pair this with a more robust store, but for typical agent workloads this is more than sufficient.
Clean metadata: the PolicyResult carries currentCount, limit, and resetsAt in metadata - so your event sink and dashboard can surface real usage data, not just "blocked".
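
To illustrate the "atomic-ish" caveat, the sketch below shows how a read-increment-save sequence loses updates under concurrency, and one possible way to serialize it inside a single process. `KeyedSerializer`, `readIncrementSave`, and the `Map`-based store are hypothetical; none of them are part of Guardio, and the 5 ms pause only simulates storage latency.

```typescript
// Hypothetical helper: runs async tasks for the same key strictly one after another.
class KeyedSerializer {
  private readonly tails = new Map<string, Promise<unknown>>();

  run<T>(key: string, task: () => Promise<T>): Promise<T> {
    const previous: Promise<unknown> = this.tails.get(key) ?? Promise.resolve();
    const result = previous.then(task);
    // Store a tail that never rejects, so one failed task does not block the queue.
    this.tails.set(key, result.catch(() => undefined));
    return result;
  }
}

const pause = (ms: number): Promise<void> => new Promise((resolve) => setTimeout(resolve, ms));

// Mimics the plugin's sequence: read, (slow I/O), save.
async function readIncrementSave(store: Map<string, number>, key: string): Promise<void> {
  const current = store.get(key) ?? 0;
  await pause(5); // a concurrent call can read the same value during this gap
  store.set(key, current + 1);
}

async function compareRacyAndSerialized(): Promise<{ racy: number; serialized: number }> {
  const key = "ratelimit:send_email";

  const racyStore = new Map<string, number>();
  await Promise.all([1, 2, 3].map(() => readIncrementSave(racyStore, key)));

  const safeStore = new Map<string, number>();
  const serializer = new KeyedSerializer();
  await Promise.all(
    [1, 2, 3].map(() => serializer.run(key, () => readIncrementSave(safeStore, key))),
  );

  return { racy: racyStore.get(key) ?? 0, serialized: safeStore.get(key) ?? 0 };
}

compareRacyAndSerialized().then((counts) => console.log(counts)); // { racy: 1, serialized: 3 }
```

In-process serialization like this only helps when a single proxy instance handles all traffic. With several instances sharing one database, the equivalent guarantee would have to come from the store itself, for example an atomic increment or a transaction.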

A Practical Example: Protecting an Email Tool

Say your agent has access to a send_email MCP tool. You want to allow it to send at most 10 emails per hour - enough for normal operation, but a hard cap against runaway loops.

Set up Guardio with:

limit: 10
windowSeconds: 3600

Assign this policy to the send_email tool in the dashboard (or via config).

Now, when the agent calls send_email for the 11th time in the same hour, it gets back:

{
  "isError": true,
  "content": [
    {
      "type": "text",
      "text": "Rate limit exceeded: 10/10 calls in 3600s window. Resets at 2025-03-18T13:00:00.000Z."
    }
  ],
  "_guardio": {
    "action": "BLOCKED",
    "policyId": "rate-limit-tool",
    "code": "RATE_LIMIT_EXCEEDED"
  }
}

The email is never sent. The upstream server never sees the request. And in your dashboard, you have a full audit trail of every allowed and blocked call.
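
On the agent side, a thin wrapper can recognise this response and stop retrying until the window resets. The sketch below is an assumption-heavy illustration: the interfaces only model the fields visible in the example above, the helper names are hypothetical, and parsing the reset time out of the human-readable text relies on the exact message format shown here, which may change.

```typescript
// Only the fields visible in the example response are modelled here.
interface GuardioBlockInfo {
  action: string;
  policyId: string;
  code: string;
}
interface BlockedResultLike {
  isError?: boolean;
  content: Array<{ type: string; text?: string }>;
  _guardio?: GuardioBlockInfo;
}

// Hypothetical helper: returns a one-line summary if Guardio blocked the call, else null.
function describeGuardioBlock(result: BlockedResultLike): string | null {
  const info = result._guardio;
  if (!info || info.action !== "BLOCKED") return null;
  const reason = result.content.find((part) => part.type === "text")?.text ?? "no reason given";
  return `${info.code} (${info.policyId}): ${reason}`;
}

// Hypothetical helper: extracts "Resets at <ISO timestamp>" from the reason text.
function msUntilReset(reasonText: string, nowMs: number): number | null {
  const iso = /Resets at ([0-9T:.\-]+Z)/.exec(reasonText)?.[1];
  if (!iso) return null;
  const resetsAtMs = Date.parse(iso);
  return Number.isNaN(resetsAtMs) ? null : Math.max(0, resetsAtMs - nowMs);
}

const exampleReason =
  "Rate limit exceeded: 10/10 calls in 3600s window. Resets at 2025-03-18T13:00:00.000Z.";
const exampleBlocked: BlockedResultLike = {
  isError: true,
  content: [{ type: "text", text: exampleReason }],
  _guardio: { action: "BLOCKED", policyId: "rate-limit-tool", code: "RATE_LIMIT_EXCEEDED" },
};

console.log(describeGuardioBlock(exampleBlocked));
console.log(msUntilReset(exampleReason, Date.UTC(2025, 2, 18, 12, 30, 0))); // 1800000 (30 minutes)
```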

Stacking Policies

Rate limiting doesn't have to stand alone. Guardio evaluates policies as a chain - if any returns block, the call is stopped. This means you can combine rate-limit-tool with other policies:

deny-regex-parameter - block calls where an argument matches a pattern (e.g. block emails to *@competitor.com)
deny-tool-access - block the tool entirely for specific agents
Your own custom policy plugin - any TypeScript class that implements PolicyPluginInterface

A real setup might look like: rate limit the email tool to 10/hour, AND block any call where the recipient matches a known bad domain. Both policies apply. Either one can stop the call.
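
The "any block stops the call" rule can be sketched in a few lines. This is a simplified, synchronous model rather than Guardio's actual engine: the types `ChainVerdict`, `ToolCallSketch`, and `PolicySketch`, the helper functions, and the `DENIED_PARAMETER` code are all hypothetical, and real policies are asynchronous classes.

```typescript
type ChainVerdict =
  | { verdict: "allow" }
  | { verdict: "block"; code: string; reason: string };

interface ToolCallSketch {
  toolName: string;
  args: Record<string, unknown>;
}
type PolicySketch = (call: ToolCallSketch) => ChainVerdict;

// Runs policies in order; the first "block" wins and later policies are skipped.
function evaluateChain(policies: PolicySketch[], call: ToolCallSketch): ChainVerdict {
  for (const policy of policies) {
    const result = policy(call);
    if (result.verdict === "block") return result;
  }
  return { verdict: "allow" };
}

// Simplified stand-in for a rate limit: counts calls, with no time window.
function maxCalls(limit: number): PolicySketch {
  let count = 0;
  return () => {
    if (count >= limit) {
      return { verdict: "block", code: "RATE_LIMIT_EXCEEDED", reason: `${count}/${limit} calls used` };
    }
    count += 1;
    return { verdict: "allow" };
  };
}

// Simplified stand-in for a parameter check on the "to" argument.
function denyRecipient(pattern: RegExp): PolicySketch {
  return (call) => {
    const to = call.args["to"];
    return typeof to === "string" && pattern.test(to)
      ? { verdict: "block", code: "DENIED_PARAMETER", reason: `recipient ${to} is not allowed` }
      : { verdict: "allow" };
  };
}

const emailChain: PolicySketch[] = [denyRecipient(/@competitor\.com$/i), maxCalls(2)];
const sendTo = (to: string): ChainVerdict =>
  evaluateChain(emailChain, { toolName: "send_email", args: { to } });

console.log(sendTo("ceo@competitor.com").verdict); // block (bad domain)
console.log(sendTo("alice@example.com").verdict);  // allow (1 of 2)
console.log(sendTo("bob@example.com").verdict);    // allow (2 of 2)
console.log(sendTo("carol@example.com").verdict);  // block (rate limit)
```

In this sketch the recipient check runs before the counter, so a denied call does not consume quota. Whether ordering has the same effect in a real Guardio chain depends on how its policies are evaluated, so treat that as something to verify rather than a guarantee.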

Try It

npx create-guardio

🔗 GitHub: https://github.com/radoslaw-sz/guardio

If this solves a problem you've been staring at, a ⭐ on GitHub goes a long way. And if you have a policy use case you'd like to see built in - open an issue.

🛡️ Introducing Guardio — Take Back Control of Your AI Agent's Actions

Radosław — Mon, 09 Mar 2026 11:05:33 +0000

You've built an AI Agent. It's smart, it's fast, and it connects to the real world through tools and APIs.
Then one day it sends 400 emails. Or deletes a file it shouldn't have touched. Or calls a billing endpoint with a parameter you never anticipated.
Sound familiar? This is the unsolved reliability problem of agentic AI - and it's exactly why I built Guardio.

What Is Guardio?

Guardio is a policy enforcement proxy that sits between your AI agents and the outside world. Every call your agent makes - to an MCP tool, an external API, a database - passes through Guardio first. Guardio evaluates it against your rules, and only lets it through if it's allowed.

No AI in the middle. No second-guessing. Just deterministic, guaranteed enforcement of your policies.

The Problem It Solves

Modern AI Agent frameworks give agents a lot of power. That power comes with real risks:

An agent hallucinates a parameter and calls a destructive endpoint
A retry loop causes an API to be hit thousands of times
Different agents in your system have different trust levels, but nothing enforces that
You have no audit trail of what your agent actually did

Traditional middleware can catch some of this - but it requires custom code for every project, every tool, every edge case. Guardio makes it a configuration problem, not a code problem.

How It Works

When a message flows from your agent to a tool or API, Guardio intercepts it and runs it through a policy chain.

Getting Started in One Command

npx create-guardio

Follow the prompts, and Guardio will scaffold a ready-to-run project tailored to your setup.

A Real Policy Example

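Here's what a policy looks like in practice - Guardio's deny-tool-access plugin, which blocks every call to whichever tools it is assigned to (for example, a tool that wraps a DELETE endpoint):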

import type {
  PolicyPluginInterface,
  PolicyRequestContext,
  PolicyResult,
} from "../../interfaces/index.js";
import { logger } from "../../logger.js";

/**
 * UI schema for the generic policy summary widget (agent + tool assignment).
 * Any policy can use this in getUiSchema() to show the summary in the dashboard.
 */
export const POLICY_SUMMARY_UI_SCHEMA: object = {
  effect: {
    "ui:widget": "PolicySummary",
    "ui:readonly": true,
    "ui:label": false,
  },
};

/**
 * Deny tool access policy plugin: always blocks tool calls.
 * Which tools are subject to this policy is determined by assignment outside
 * of the plugin (e.g. which tools have this policy attached). No config.
 */
export class DenyToolAccessPolicyPlugin implements PolicyPluginInterface {
  readonly name = "deny-tool-access";

  getUiSchema(): object {
    return POLICY_SUMMARY_UI_SCHEMA;
  }

  async evaluate(context: PolicyRequestContext): Promise<PolicyResult> {
    logger.debug(
      { toolName: context.toolName, plugin: this.name },
      "Tool blocked by deny-tool-access policy",
    );
    return {
      verdict: "block",
      code: "FORBIDDEN_TOOL",
      reason: `The tool '${context.toolName}' is not allowed by policy.`,
    };
  }
}

Fully Pluggable Architecture

The best part of Guardio is that the core framework is just the engine - everything else is a plugin you own and control:

Policy - Any TypeScript class
Storage - PostgreSQL, MongoDB, Redis
Event handlers - Webhooks, Slack, Datadog

This means Guardio adapts to your stack - not the other way around.

Who Is This For?

Developers building AI Agents with MCP tools or external API integrations
Teams that need audit logs of agent actions for compliance or debugging
Anyone who's ever thought "I hope the agent doesn't do something weird in production"

What's Coming Next

This is an early release - the foundation is solid, and here's what's on the roadmap:

🔐 Per-agent permission scopes - assign different trust levels to different agents
🔌 Official plugin registry - community-contributed storage adapters and handlers
🧪 Simulation mode - dry-run your agent against policies before going live

Try It & Get Involved

🔗 GitHub: https://github.com/radoslaw-sz/guardio
📦 npm: npx create-guardio

If Guardio solves a problem you've run into, give it a ⭐ on GitHub — it genuinely helps. And if you have a use case you'd like to see supported, open an issue. The roadmap is being shaped by real problems right now.

Setting Up Your First Multi-Agent Test with Maia

Radosław — Wed, 24 Sep 2025 12:47:43 +0000

In one of the previous articles, I introduced the MAIA Framework — an open-source toolkit for testing multi-agent AI systems. We discussed what it does, why it exists, and some of its key features such as assertions and validators.

Now it’s time to get practical.

In this post, we’ll set up our first test with MAIA, using both assertions and validators.

Note: I am assuming you have your Python project set up.

Installing the framework.

pip install maia-test-framework

Create a simple test file

import pytest
from maia_test_framework.testing.base import MaiaTest
from maia_test_framework.providers.generic_lite_llm import GenericLiteLLMProvider

class TestContentAssertions(MaiaTest):
    def setup_agents(self):
        self.create_agent(
            name="Alice",
            provider=GenericLiteLLMProvider(config={
                "model": "ollama/mistral",
                "api_base": "http://localhost:11434"
            }),
            system_message="You are a helpful AI assistant. You will follow user instructions precisely."
        )

    @pytest.mark.asyncio
    async def test_basic(self):
        session = self.create_session(["Alice"])

        await session.user_says("Please describe the usual weather in London in July, including temperature and conditions.")

        response = await session.agent_responds("Alice")

        assert "sunny" in response.content

Breaking down the test

Creating an agent

First, we define the agent that will be under test. The create_agent method takes a few key parameters:

name - a unique identifier for the agent in the test.
provider - specifies which model the agent should use. MAIA provides many integrations (e.g., LiteLLM, CrewAI), and you can also create your own.
system_message - the system prompt describing what the agent is.

Creating a session

To use the agent, you first need to create a session. In MAIA, a Session groups agents into communication channels.

For simple agentic systems, you’ll likely use just one session, since all agents can talk to each other freely.

Simulating a simple conversation.

Next, we simulate a short conversation between the user and the agent. The user asks for the weather, and the agent responds.

Finally, we check the content of the agent’s reply for specific patterns (in this case, whether it mentions “sunny”).

This is a very simple example — now let’s extend it with assertions and validators.

Adding Maia assertion

Assertions can be attached to a session, so every message is automatically checked.

For example, the built-in assert_professional_tone ensures that responses don’t contain unprofessional language (such as “lol” or “u r”).

Here is an example of using such an assertion:

from maia_test_framework.testing.assertions.content_patterns import assert_professional_tone

...

session = self.create_session(["Alice"], assertions=[assert_professional_tone])

You can also define your own custom assertions (we’ll cover this in a future article).

Adding Maia validator

Validators check the overall session instead of individual messages. They’re automatically run at the end of a test.

For example, this validator ensures that Alice sends at most one message in the session:

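# The validator used below has to be imported by name. The module path is kept from the
# original import line and is an assumption; check it against your installed version.
from maia_test_framework.testing.validators.performance import agent_message_count_validator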

...

session = self.create_session(["Alice"], validators=[agent_message_count_validator(agent_name="Alice", max_messages=1)])

Visualization

Using Maia assertions and validators lets you see all the results, nicely formatted, in a dashboard.

✨ That’s it! You now have your first working test in MAIA, complete with assertions and validators.

Why Testing Multi-Agent AI Systems is Hard (and Why It Matters)

Radosław — Tue, 16 Sep 2025 06:48:32 +0000

A new era of AI collaboration.

Not long ago, interacting with an AI meant talking to a single assistant. You asked a question, it gave an answer. Simple.

But the landscape is shifting fast. Instead of one assistant doing everything, we’re now seeing multi-agent systems: groups of AI agents working together, each with their own role.

This shift unlocks exciting possibilities — but also a big challenge: how do we test if these agents actually work as intended?

Why multi-agent systems are taking off

There are good reasons for the move toward multi-agent setups:

Specialization: Just like humans, agents can become experts at different tasks (e.g., research, planning, coding).
Parallelization: Multiple agents can work at the same time, speeding up workflows.
Emergent collaboration: By talking to each other, agents can generate ideas or solutions that a single agent wouldn’t reach alone.

Examples are already here:

AI research assistants that brainstorm, fact-check, and summarize.
Customer service bots where one agent answers, another verifies tone and accuracy.
AI “companies” where planning, execution, and oversight are split across agents.

It’s powerful. But it is also complex.

The hidden challenge: testing multi-agent AI

Traditional software engineering has decades of experience in testing. We have unit tests for small functions, integration tests for bigger systems, and QA teams for real-world scenarios.

But AI agents — and especially multi-agent systems — break those familiar patterns. Here’s why:

Emergent behavior

When two or more agents interact, new and unexpected behaviors can emerge.

Maybe two agents start “arguing” endlessly instead of solving the task.
Maybe an agent interprets another’s response in an unintended way.

These weren’t explicitly coded; they emerged from the interaction. And that makes them hard to predict.

Unpredictability

Even single AI agents can behave differently when given the same input twice. Add multiple agents, and this unpredictability compounds.

You might run the same test ten times and get ten different results. Which one is “correct”?

Interoperability

Multi-agent systems often combine different providers or frameworks:

One agent powered by OpenAI.
Another using Anthropic.
Orchestrated through LiteLLM or CrewAI.

Each has different capabilities and limits. Getting them to play nicely together is tricky.

Evaluation complexity

How do you even define success in a multi-agent system? It’s not as simple as: “Did the agent respond?”

Instead, questions look more like:

Did the group reach the intended outcome?
Did they avoid hallucinations or contradictions?
Was the conversation efficient, or did it spiral into loops?

Evaluation itself becomes a challenge.

Why this matters now

You might wonder: “Sure, it’s complicated… but why does this matter?”

Here’s the thing: as multi-agent AI systems leave research labs and enter real-world applications, reliability and trust become non-negotiable.

Without testing, you risk:

Wrong or misleading outputs (dangerous in healthcare, finance, law).
Endless loops or stalled conversations.
Coordination failures that look fine at first but lead to errors later.

Think about it: would you deploy a team of human employees without a way to evaluate their performance? Of course not. The same should apply to AI “teams.”

A new category of tools is needed

In traditional software, we didn’t get to where we are without tools. Unit testing frameworks (like JUnit or pytest), CI/CD pipelines, QA automation — they became the backbone of trustworthy software development.

AI agents (especially multi-agent systems) need the same kind of foundation.

We need to be able to:

Set up agents from different providers.
Simulate conversations between agents and with users.
Orchestrate workflows when multiple agents collaborate.
Judge success or failure against predefined criteria.
Validate outcomes at both the single-message and whole-conversation levels.

Testing isn’t optional — it’s the foundation of trust

The story of software is the story of building trust through testing. We no longer ship code without automated tests, integration pipelines, and validation layers.

Multi-agent AI systems are no different. If anything, the need is greater, because:

Behavior is less predictable.
Interactions are more complex.
Stakes are higher as AI systems handle sensitive tasks.

By treating testing as a first-class citizen in AI development, we can move faster, deploy safer, and unlock the real potential of collaborative AI.

How to catch all of the above?

In the previous post, we explored the basics of Maia - the test framework for multi-agent AI systems. In this post, we described the problems Maia tries to solve.
In the next articles, we will get back to practical examples with Maia to show you the potential of the framework.

Stay tuned!

Maia - Multi-AI Agent Test Framework

Radosław — Wed, 03 Sep 2025 12:00:37 +0000

Hey Dev community!
I want to share my recent open-source project with you: a test framework for multi-agent AI systems.

Website: Maia

The framework is written in Python and uses the standard pytest approach.

The main features are:

Multi-Agent Simulation - Simulate conversations and interactions between multiple AI agents
Extensible Provider Model - Easily integrate with various AI model providers (e.g., LiteLLM, LangChain, CrewAI)
Built-in Assertions - A suite of assertions to verify agent behavior, including content analysis and participation checks
Dashboard for visualization - a Next.js application that shows test results for review and debugging.

You can use the framework to test scenarios such as:

asking various models for the same thing and comparing the results
broadcasting a prompt and waiting for completion without user intervention (using not only CrewAI but also other providers!)
simulating tool calls to check whether your AI agent uses your tools properly
and much more

As an example, see how easy it is to write a test:

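import pytest
from maia_test_framework.testing.base import MaiaTest
from maia_test_framework.providers.generic_lite_llm import GenericLiteLLMProvider
# assert_agent_participated must also be imported from MAIA's assertions package;
# its module path is not shown in this post.

class TestConversationSessions(MaiaTest):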
    def setup_agents(self):
        self.create_agent(
            name="Alice",
            provider=GenericLiteLLMProvider(config={
                "model": "ollama/mistral",
                "api_base": "http://localhost:11434"
            }),
            system_message="You are a weather assistant. Only describe the weather.",
        )

        self.create_agent(
            name="Bob",
            provider=GenericLiteLLMProvider(config={
                "model": "ollama/mistral",
                "api_base": "http://localhost:11434"
            }),
            system_message="You are an assistant who only suggests clothing.",
        )

    @pytest.mark.asyncio
    async def test_agent_to_agent_conversation(self):
        session = self.create_session(["Alice", "Bob"])

        # Alice initiates conversation with Bob
        await session.agent_says("Alice", "Bob", "Given the weather: rainy and 20 degrees Celsius, what clothes should I wear?")
        response = await session.agent_responds("Bob")
        assert_agent_participated(session, "Bob")

        # Bob responds back to Alice
        await session.agent_says("Bob", "Alice", f"Based on my info: {response.content}")
        response = await session.agent_responds("Alice")
        assert_agent_participated(session, "Alice")

Everything is open-source, and it includes a basic dashboard where you can see your test results, including timelines, statuses, and durations.

You can also see the assertions from each test.

The framework itself is in the MVP phase, so more features are on the way.

Official website is here: Maia Framework
Github: Maia
PyPI: maia-test-framework

Looking forward to your feedback!