<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Radosław</title>
    <description>The latest articles on DEV Community by Radosław (@radoslawsz).</description>
    <link>https://dev.to/radoslawsz</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3477033%2F25a7f5f8-f471-4b80-a432-b4a582b8d6d0.png</url>
      <title>DEV Community: Radosław</title>
      <link>https://dev.to/radoslawsz</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/radoslawsz"/>
    <language>en</language>
    <item>
      <title>🛡️ How to Stop Your AI Agent from Sending 10,000 Emails in a Loop</title>
      <dc:creator>Radosław</dc:creator>
      <pubDate>Wed, 18 Mar 2026 11:37:13 +0000</pubDate>
      <link>https://dev.to/radoslawsz/how-to-stop-your-ai-agent-from-sending-10000-emails-in-a-loop-1d26</link>
      <guid>https://dev.to/radoslawsz/how-to-stop-your-ai-agent-from-sending-10000-emails-in-a-loop-1d26</guid>
      <description>&lt;p&gt;You ship an AI agent that can send emails. It works great in testing.&lt;/p&gt;

&lt;p&gt;Then one night, the agent hits a retry loop. A flaky API responds slowly, the agent interprets the delay as failure, and it tries again. And again. By morning, a single user has received 847 confirmation emails. Your support inbox is on fire. Your API provider has suspended your account.&lt;/p&gt;

&lt;p&gt;This isn't a hypothetical. It's the kind of thing that happens when you give agents real tools and don't put guardrails around how often they can use them.&lt;/p&gt;

&lt;p&gt;In the &lt;a href="https://dev.to/radoslawsz/introducing-guardio-take-back-control-of-your-ai-agents-actions-2mik"&gt;first article&lt;/a&gt;, I introduced &lt;strong&gt;Guardio&lt;/strong&gt; - a policy enforcement proxy that sits between your AI agent and the outside world. Today, I want to show you one of its newest built-in policies: &lt;strong&gt;rate limiting&lt;/strong&gt;.&lt;/p&gt;




&lt;h2&gt;Why Rate Limiting Is Different for AI Agents&lt;/h2&gt;

&lt;p&gt;With traditional APIs, rate limiting is simple: a client sends too many requests, the server returns a &lt;code&gt;429&lt;/code&gt;, and the client backs off. Problem solved.&lt;/p&gt;

&lt;p&gt;AI agents are messier.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;They can retry silently without you noticing until it's too late&lt;/li&gt;
&lt;li&gt;They don't always respect error signals the way a human-coded client would&lt;/li&gt;
&lt;li&gt;A single agent decision (like "send a daily summary") can be triggered hundreds of times if the agent's context gets corrupted or the loop condition misbehaves&lt;/li&gt;
&lt;li&gt;Different tools deserve different limits - spamming a read-only knowledge base is annoying; spamming a billing endpoint is catastrophic&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;You need rate limiting that is &lt;strong&gt;per-tool&lt;/strong&gt;, &lt;strong&gt;deterministic&lt;/strong&gt;, and &lt;strong&gt;enforced outside the agent&lt;/strong&gt; - so the agent literally cannot exceed it, regardless of how it behaves.&lt;/p&gt;

&lt;p&gt;That's exactly what Guardio's &lt;code&gt;rate-limit-tool&lt;/code&gt; policy plugin does.&lt;/p&gt;




&lt;h2&gt;Quick Recap: What Is Guardio?&lt;/h2&gt;

&lt;p&gt;Guardio is a proxy you run alongside your AI agent. Every tool call your agent makes (to an MCP server, an external API, a database) passes through Guardio first. Guardio evaluates it against your configured policies, and only forwards it if it's allowed.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;AI Agent → Guardio → MCP Tool / External API
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;No AI in the enforcement path. No prompt engineering. Just hard rules.&lt;/p&gt;




&lt;h2&gt;Setting Up Guardio&lt;/h2&gt;

&lt;p&gt;If you haven't set it up yet, one command scaffolds a full project:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npx create-guardio
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You'll be prompted to choose:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A project directory name&lt;/li&gt;
&lt;li&gt;The HTTP port Guardio will listen on (default: &lt;code&gt;3939&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;A storage backend (SQLite is the easiest to start with)&lt;/li&gt;
&lt;li&gt;Whether to install the dashboard UI&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Once scaffolded:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;cd &lt;/span&gt;guardio-project
npm &lt;span class="nb"&gt;install
&lt;/span&gt;npm run guardio
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then point your AI agent or MCP client at &lt;code&gt;http://127.0.0.1:3939&lt;/code&gt; instead of directly at your tools.&lt;/p&gt;

&lt;p&gt;Your config lives in &lt;code&gt;guardio.config.ts&lt;/code&gt;. Here's a minimal example with an MCP tool connected:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="c1"&gt;// guardio.config.ts&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="kd"&gt;type&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;GuardioConfig&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;@guardiojs/guardio&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;config&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;GuardioConfig&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="na"&gt;client&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="na"&gt;port&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;3939&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="na"&gt;servers&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;email-tool&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;url&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;url&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;https://your-mcp-email-server.com/sse&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="p"&gt;],&lt;/span&gt;
  &lt;span class="na"&gt;plugins&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;storage&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;sqlite&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;config&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;database&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;guardio.sqlite&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;},&lt;/span&gt;
  &lt;span class="p"&gt;],&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;

&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="k"&gt;default&lt;/span&gt; &lt;span class="nx"&gt;config&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Your agent connects to &lt;code&gt;http://127.0.0.1:3939/email-tool/sse&lt;/code&gt; - Guardio is now in the middle.&lt;/p&gt;




&lt;h2&gt;Introducing &lt;code&gt;rate-limit-tool&lt;/code&gt;&lt;/h2&gt;

&lt;p&gt;The &lt;code&gt;rate-limit-tool&lt;/code&gt; policy plugin enforces a maximum number of calls to any given tool within a fixed time window. It's a built-in plugin shipped with Guardio - no extra installation needed.&lt;/p&gt;

&lt;p&gt;The configuration is intentionally simple:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Field&lt;/th&gt;
&lt;th&gt;Type&lt;/th&gt;
&lt;th&gt;Description&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;limit&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;number&lt;/td&gt;
&lt;td&gt;Maximum calls allowed in the window&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;windowSeconds&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;number&lt;/td&gt;
&lt;td&gt;Duration of the time window, in seconds&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;For example: &lt;code&gt;limit: 5, windowSeconds: 60&lt;/code&gt; means no more than 5 calls per minute.&lt;/p&gt;

&lt;h3&gt;How It Works Under the Hood&lt;/h3&gt;

&lt;p&gt;The plugin uses &lt;strong&gt;fixed time windows&lt;/strong&gt; - it doesn't slide. If your window is 60 seconds, windows are &lt;code&gt;0:00–1:00&lt;/code&gt;, &lt;code&gt;1:00–2:00&lt;/code&gt;, etc. Simple and predictable.&lt;/p&gt;
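&lt;p&gt;To make the window arithmetic concrete, here is a minimal sketch of how a timestamp maps to a fixed window. The helper name &lt;code&gt;windowInfo&lt;/code&gt; is hypothetical and exists only for illustration; the formula is the same &lt;code&gt;Math.floor(now / windowMs)&lt;/code&gt; the plugin uses.&lt;/p&gt;

```typescript
// Sketch of the fixed-window arithmetic described above.
// `windowInfo` is a hypothetical helper name, not a Guardio export.
function windowInfo(nowMs: number, windowSeconds: number) {
  const windowMs = windowSeconds * 1000;
  // Windows are aligned to the Unix epoch, not to the first call.
  const index = Math.floor(nowMs / windowMs);
  return {
    index,
    startsAt: new Date(index * windowMs).toISOString(),
    resetsAt: new Date((index + 1) * windowMs).toISOString(),
  };
}

// A call at 12:00:42 UTC with a 60-second window.
const sampleTime = Date.UTC(2025, 2, 18, 12, 0, 42);
const sampleWindow = windowInfo(sampleTime, 60);
console.log(sampleWindow.startsAt); // 2025-03-18T12:00:00.000Z
console.log(sampleWindow.resetsAt); // 2025-03-18T12:01:00.000Z
```

&lt;p&gt;One consequence of fixed windows is worth keeping in mind: because the counter resets at the boundary, a burst at the very end of one window followed by a burst at the start of the next can briefly add up to twice the limit. For a runaway-loop safety net that is usually acceptable.&lt;/p&gt;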

&lt;p&gt;State (current count and window start) is stored in the &lt;code&gt;PluginRepository&lt;/code&gt;, so it persists across requests and survives restarts if you're using SQLite or PostgreSQL. If no storage is configured, the plugin &lt;strong&gt;fails open&lt;/strong&gt; (allows all calls) and logs a warning. This is a deliberate design choice: a misconfigured environment doesn't silently break your agent. The trade-off is that no limit is enforced until storage is configured, so watch for that warning before relying on the policy in production.&lt;/p&gt;

&lt;p&gt;When the limit is exceeded, the agent receives a structured block response: not a protocol-level error, but a regular JSON-RPC result that carries a human-readable reason:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight console"&gt;&lt;code&gt;&lt;span class="go"&gt;Rate limit exceeded: 5/5 calls in 60s window. Resets at 2025-03-18T12:01:00.000Z.
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Agent frameworks won't choke on this. They get a clear message they can surface or log.&lt;/p&gt;




&lt;h2&gt;Configuring the Policy via the Dashboard&lt;/h2&gt;

&lt;p&gt;If you installed the Guardio dashboard, configuring rate limits is point-and-click.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Open the dashboard (&lt;code&gt;npm run dashboard&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;Navigate to &lt;strong&gt;Policies&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Create a new policy, select &lt;code&gt;rate-limit-tool&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Fill in &lt;code&gt;limit&lt;/code&gt; and &lt;code&gt;windowSeconds&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Assign it to the tool(s) you want to protect&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;You can create multiple instances of the policy with different limits - for example, a strict limit on your email tool and a more generous one on a read-only search tool.&lt;/p&gt;




&lt;h2&gt;Configuring the Policy in Code&lt;/h2&gt;

&lt;p&gt;If you prefer to manage things programmatically, you can wire up the plugin directly. Here's the full implementation for reference - this is exactly what's shipping in Guardio:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;z&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;zod&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="kd"&gt;type&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;PolicyPluginInterface&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="nx"&gt;PolicyRequestContext&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="nx"&gt;PolicyResult&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="nx"&gt;PluginRepository&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;@guardiojs/guardio&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;rateLimitToolConfigSchema&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;z&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;object&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;limit&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;z&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;number&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;int&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;min&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
  &lt;span class="na"&gt;windowSeconds&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;z&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;number&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;int&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;min&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;

&lt;span class="kd"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;RateLimitToolPolicyPlugin&lt;/span&gt; &lt;span class="k"&gt;implements&lt;/span&gt; &lt;span class="nx"&gt;PolicyPluginInterface&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;readonly&lt;/span&gt; &lt;span class="nx"&gt;name&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;rate-limit-tool&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

  &lt;span class="nf"&gt;constructor&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="k"&gt;private&lt;/span&gt; &lt;span class="k"&gt;readonly&lt;/span&gt; &lt;span class="nx"&gt;limit&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;number&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="k"&gt;private&lt;/span&gt; &lt;span class="k"&gt;readonly&lt;/span&gt; &lt;span class="nx"&gt;windowSeconds&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;number&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="k"&gt;private&lt;/span&gt; &lt;span class="k"&gt;readonly&lt;/span&gt; &lt;span class="nx"&gt;repo&lt;/span&gt;&lt;span class="p"&gt;?:&lt;/span&gt; &lt;span class="nx"&gt;PluginRepository&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{}&lt;/span&gt;

  &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="nf"&gt;evaluate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;context&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;PolicyRequestContext&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="nb"&gt;Promise&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nx"&gt;PolicyResult&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;repo&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;verdict&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;allow&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="p"&gt;};&lt;/span&gt;

    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;windowMs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;windowSeconds&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;1000&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;now&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;Date&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;now&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;currentWindowStart&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;Math&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;floor&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;now&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="nx"&gt;windowMs&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;contextKey&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;`ratelimit:&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;context&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;toolName&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;`&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;doc&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;repo&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getDocument&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;contextKey&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;stored&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;doc&lt;/span&gt;&lt;span class="p"&gt;?.&lt;/span&gt;&lt;span class="nx"&gt;data&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;windowStart&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;number&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="nl"&gt;count&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kr"&gt;number&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="kc"&gt;undefined&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;isNewWindow&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;stored&lt;/span&gt;&lt;span class="p"&gt;?.&lt;/span&gt;&lt;span class="nx"&gt;windowStart&lt;/span&gt; &lt;span class="o"&gt;??&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;!==&lt;/span&gt; &lt;span class="nx"&gt;currentWindowStart&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;currentCount&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nx"&gt;isNewWindow&lt;/span&gt; &lt;span class="p"&gt;?&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;stored&lt;/span&gt;&lt;span class="p"&gt;?.&lt;/span&gt;&lt;span class="nx"&gt;count&lt;/span&gt; &lt;span class="o"&gt;??&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

    &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;resetsAt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;Date&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="nx"&gt;currentWindowStart&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="nx"&gt;windowMs&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;toISOString&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;

    &lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;currentCount&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;limit&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="na"&gt;verdict&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;block&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="na"&gt;code&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;RATE_LIMIT_EXCEEDED&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="na"&gt;reason&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;`Rate limit exceeded: &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;currentCount&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;/&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;limit&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt; calls in &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;windowSeconds&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;s window. Resets at &lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;resetsAt&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;.`&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;currentCount&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;limit&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;limit&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;windowSeconds&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;windowSeconds&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;resetsAt&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
      &lt;span class="p"&gt;};&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;repo&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;saveDocument&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;contextKey&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="na"&gt;windowStart&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;currentWindowStart&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;count&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;currentCount&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;},&lt;/span&gt; &lt;span class="nx"&gt;doc&lt;/span&gt;&lt;span class="p"&gt;?.&lt;/span&gt;&lt;span class="nx"&gt;id&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;verdict&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;allow&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt; &lt;span class="p"&gt;};&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A few things worth noticing here:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Per-tool keying:&lt;/strong&gt; the storage key is &lt;code&gt;ratelimit:{toolName}&lt;/code&gt;, so each tool gets its own independent counter. Exceeding the limit on &lt;code&gt;send_email&lt;/code&gt; doesn't affect &lt;code&gt;search_docs&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;
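&lt;strong&gt;Atomic-ish updates:&lt;/strong&gt; the plugin reads the current count, increments it, and saves it as three separate steps, not as one atomic operation. Two calls that arrive at the same moment can both read the same count, so the limit can be overshot slightly under high concurrency. For typical agent workloads, where calls to a tool arrive mostly one at a time, this is more than sufficient; for highly concurrent setups you'd want to pair it with a store that supports atomic increments.&lt;/li&gt;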
&lt;li&gt;
&lt;strong&gt;Clean metadata:&lt;/strong&gt; the &lt;code&gt;PolicyResult&lt;/code&gt; carries &lt;code&gt;currentCount&lt;/code&gt;, &lt;code&gt;limit&lt;/code&gt;, &lt;code&gt;windowSeconds&lt;/code&gt;, and &lt;code&gt;resetsAt&lt;/code&gt; in &lt;code&gt;metadata&lt;/code&gt; - so your event sink and dashboard can surface real usage data, not just "blocked".&lt;/li&gt;
&lt;/ul&gt;
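&lt;p&gt;The read-then-save sequence in &lt;code&gt;evaluate&lt;/code&gt; can lose an update when two calls overlap. One possible mitigation, sketched below, is to queue work per storage key inside the process. This is not part of Guardio: &lt;code&gt;withKeyLock&lt;/code&gt; and the demo counter are hypothetical names, and the storage latency is simulated with a timer.&lt;/p&gt;

```typescript
// Hypothetical per-key lock (not a Guardio API): tasks for the same key run
// one after another, so a read-modify-write cannot interleave with another.
type Task = () => any; // may return a value or a promise

const tails = new Map(); // key -> promise that settles when the last queued task ends

function withKeyLock(key: string, task: Task) {
  const previous = tails.get(key) ?? Promise.resolve();
  const run = previous.then(() => task());
  // Store a tail that never rejects, so one failed task does not block the queue.
  tails.set(key, run.catch(() => undefined));
  return run;
}

// Demo: a counter with simulated storage latency between read and write.
let counter = 0;
const sleep = (ms: number) => new Promise((resolve) => setTimeout(resolve, ms));

async function increment() {
  const seen = counter; // read
  await sleep(5);       // simulated storage round trip
  counter = seen + 1;   // write
}

async function demo() {
  counter = 0;
  await Promise.all([increment(), increment()]);
  const unlocked = counter; // both calls read 0, so one update is lost

  counter = 0;
  await Promise.all([
    withKeyLock("ratelimit:send_email", increment),
    withKeyLock("ratelimit:send_email", increment),
  ]);
  const locked = counter; // the second increment waits for the first
  return { unlocked, locked };
}

const demoResult = demo();
demoResult.then((outcome) => console.log(outcome)); // { unlocked: 1, locked: 2 }
```

&lt;p&gt;An in-process queue like this only helps while a single Guardio process handles the traffic. With several instances sharing one database, the counter update would need to be atomic in the store itself.&lt;/p&gt;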




&lt;h2&gt;A Practical Example: Protecting an Email Tool&lt;/h2&gt;

&lt;p&gt;Say your agent has access to a &lt;code&gt;send_email&lt;/code&gt; MCP tool. You want to allow it to send at most &lt;strong&gt;10 emails per hour&lt;/strong&gt; - enough for normal operation, but a hard cap against runaway loops.&lt;/p&gt;

&lt;p&gt;Configure the policy with:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;limit&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;10&lt;/span&gt;
&lt;span class="na"&gt;windowSeconds&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;3600&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Assign this policy to the &lt;code&gt;send_email&lt;/code&gt; tool in the dashboard (or via config).&lt;/p&gt;
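&lt;p&gt;Before wiring this into a live agent, the 10-per-hour behaviour can be checked offline. The sketch below re-implements the plugin's counting logic as a plain function with the clock passed in; the in-memory &lt;code&gt;Map&lt;/code&gt; and the name &lt;code&gt;checkRateLimit&lt;/code&gt; are stand-ins for illustration, not Guardio APIs.&lt;/p&gt;

```typescript
// Offline simulation of `limit: 10, windowSeconds: 3600`.
// The Map stands in for the storage layer; `checkRateLimit` is a hypothetical name.
type Counter = { windowStart: number; count: number };
const memory = new Map(); // storage key -> Counter

function checkRateLimit(tool: string, limit: number, windowSeconds: number, nowMs: number) {
  const key = `ratelimit:${tool}`;
  const currentWindow = Math.floor(nowMs / (windowSeconds * 1000));
  const saved: Counter | undefined = memory.get(key);

  // Carry the count over only if the saved state belongs to the current window.
  let count = 0;
  if (saved !== undefined) {
    if (saved.windowStart === currentWindow) count = saved.count;
  }

  if (count >= limit) return "block";
  memory.set(key, { windowStart: currentWindow, count: count + 1 });
  return "allow";
}

// Eleven calls, one minute apart, starting at 12:00 UTC.
const hourStart = Date.UTC(2025, 2, 18, 12, 0, 0);
const verdicts = Array.from({ length: 11 }, (_unused, i) =>
  checkRateLimit("send_email", 10, 3600, hourStart + i * 60000),
);
console.log(verdicts.filter((v) => v === "allow").length); // 10
console.log(verdicts[10]); // block

// A call in the next clock-aligned hour starts a fresh window.
const nextHourVerdict = checkRateLimit("send_email", 10, 3600, hourStart + 3600000);
console.log(nextHourVerdict); // allow
```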

&lt;p&gt;Now, when the agent calls &lt;code&gt;send_email&lt;/code&gt; for the 11th time within the same one-hour window, it gets back:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"isError"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"content"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"text"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"text"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Rate limit exceeded: 10/10 calls in 3600s window. Resets at 2025-03-18T13:00:00.000Z."&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"_guardio"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"action"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"BLOCKED"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"policyId"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"rate-limit-tool"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"code"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"RATE_LIMIT_EXCEEDED"&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The email is never sent. The upstream server never sees the request. And in your dashboard, you have a full audit trail of every allowed and blocked call.&lt;/p&gt;
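&lt;p&gt;On the agent side, you may want to detect this response and wait instead of retrying immediately. The sketch below assumes the result has exactly the shape shown above; &lt;code&gt;ToolResult&lt;/code&gt; and &lt;code&gt;rateLimitResetTime&lt;/code&gt; are hypothetical names, and pulling the timestamp out of the message text is an assumption that would break if the wording changes.&lt;/p&gt;

```typescript
// Shape copied from the example response above; `_guardio` is assumed to be
// absent on results that were not blocked.
type ToolResult = {
  isError?: boolean;
  content: { type: string; text: string }[];
  _guardio?: { action: string; policyId: string; code: string };
};

// Hypothetical helper: returns the reset time when the call was rate limited,
// or undefined for any other result.
function rateLimitResetTime(result: ToolResult): Date | undefined {
  if (result._guardio?.code !== "RATE_LIMIT_EXCEEDED") return undefined;
  const text = result.content.map((part) => part.text).join(" ");
  // Extract the ISO timestamp that follows "Resets at".
  const match = /Resets at (\d{4}-\d{2}-\d{2}T[\d:.]+Z)/.exec(text);
  const iso = match?.[1];
  if (iso === undefined) return undefined;
  return new Date(iso);
}

const blockedResult: ToolResult = {
  isError: true,
  content: [
    {
      type: "text",
      text: "Rate limit exceeded: 10/10 calls in 3600s window. Resets at 2025-03-18T13:00:00.000Z.",
    },
  ],
  _guardio: { action: "BLOCKED", policyId: "rate-limit-tool", code: "RATE_LIMIT_EXCEEDED" },
};

const resetAt = rateLimitResetTime(blockedResult);
console.log(resetAt?.toISOString()); // 2025-03-18T13:00:00.000Z
```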




&lt;h2&gt;Stacking Policies&lt;/h2&gt;

&lt;p&gt;Rate limiting doesn't have to stand alone. Guardio evaluates policies as a &lt;strong&gt;chain&lt;/strong&gt; - if any policy returns &lt;code&gt;block&lt;/code&gt;, the call is stopped. This means you can combine &lt;code&gt;rate-limit-tool&lt;/code&gt; with other policies:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;deny-regex-parameter&lt;/code&gt;&lt;/strong&gt; - block calls where an argument matches a pattern (e.g. block emails to &lt;code&gt;*@competitor.com&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;deny-tool-access&lt;/code&gt;&lt;/strong&gt; - block the tool entirely for specific agents&lt;/li&gt;
&lt;li&gt;Your own &lt;strong&gt;custom policy plugin&lt;/strong&gt; - any TypeScript class that implements &lt;code&gt;PolicyPluginInterface&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A real setup might look like: rate limit the email tool to 10/hour, AND block any call where the recipient matches a known bad domain. Both policies apply. Either one can stop the call.&lt;/p&gt;
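&lt;p&gt;The chain idea can be sketched in a few lines. This is an illustration, not Guardio's evaluator: the types are simplified, the policies are synchronous, the &lt;code&gt;DENIED_RECIPIENT&lt;/code&gt; code is made up, and stopping at the first &lt;code&gt;block&lt;/code&gt; is an assumption about evaluation order.&lt;/p&gt;

```typescript
// Simplified shapes for this sketch; the real Guardio types carry more fields.
type Verdict =
  | { verdict: "allow" }
  | { verdict: "block"; code: string; reason: string };
type Call = { toolName: string; args: { [name: string]: string } };
type Policy = (call: Call) => Verdict;

// Run policies in order and stop at the first block (assumed behaviour).
function evaluateChain(policies: Policy[], call: Call): Verdict {
  for (const policy of policies) {
    const result = policy(call);
    if (result.verdict === "block") return result;
  }
  return { verdict: "allow" };
}

// Hypothetical policy: block recipients at a known bad domain.
const denyCompetitor: Policy = (call) =>
  (call.args["to"] ?? "").endsWith("@competitor.com")
    ? { verdict: "block", code: "DENIED_RECIPIENT", reason: "Recipient domain is blocked." }
    : { verdict: "allow" };

// Hypothetical policy: a bare counter standing in for rate-limit-tool.
let sent = 0;
const tenPerWindow: Policy = () => {
  if (sent >= 10) {
    return { verdict: "block", code: "RATE_LIMIT_EXCEEDED", reason: "Rate limit exceeded." };
  }
  sent += 1;
  return { verdict: "allow" };
};

const chain = [denyCompetitor, tenPerWindow];
const first = evaluateChain(chain, { toolName: "send_email", args: { to: "a@example.com" } });
const second = evaluateChain(chain, { toolName: "send_email", args: { to: "x@competitor.com" } });
console.log(first.verdict);  // allow
console.log(second.verdict); // block
console.log(sent);           // 1
```

&lt;p&gt;In this sketch the deny rule runs before the counter, so a call rejected for its recipient does not use up any of the rate-limit quota. Whether ordering matters the same way in Guardio depends on how its chain is evaluated.&lt;/p&gt;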




&lt;h2&gt;Try It&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npx create-guardio
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;🔗 GitHub: &lt;a href="https://github.com/radoslaw-sz/guardio" rel="noopener noreferrer"&gt;https://github.com/radoslaw-sz/guardio&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;If this solves a problem you've been staring at, a ⭐ on GitHub goes a long way. And if you have a policy use case you'd like to see built in - open an issue.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>programming</category>
      <category>opensource</category>
      <category>typescript</category>
    </item>
    <item>
      <title>🛡️ Introducing Guardio — Take Back Control of Your AI Agent's Actions</title>
      <dc:creator>Radosław</dc:creator>
      <pubDate>Mon, 09 Mar 2026 11:05:33 +0000</pubDate>
      <link>https://dev.to/radoslawsz/introducing-guardio-take-back-control-of-your-ai-agents-actions-2mik</link>
      <guid>https://dev.to/radoslawsz/introducing-guardio-take-back-control-of-your-ai-agents-actions-2mik</guid>
      <description>&lt;p&gt;You've built an AI Agent. It's smart, it's fast, and it connects to the real world through tools and APIs.&lt;br&gt;
Then one day it sends 400 emails. Or deletes a file it shouldn't have touched. Or calls a billing endpoint with a parameter you never anticipated.&lt;br&gt;
Sound familiar? This is the unsolved reliability problem of agentic AI - and it's exactly why I built Guardio.&lt;/p&gt;
&lt;h2&gt;
  
  
  What Is Guardio?
&lt;/h2&gt;

&lt;p&gt;Guardio is a policy enforcement proxy that sits between your AI agents and the outside world. Every call your agent makes - to an MCP tool, an external API, a database - passes through Guardio first. Guardio evaluates it against your rules, and only lets it through if it's allowed.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;No AI in the middle. No second-guessing. Just deterministic, guaranteed enforcement of your policies.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fc6xn4dm9u2c8kiubs1o0.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fc6xn4dm9u2c8kiubs1o0.png" alt="Guardio Architecture"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h2&gt;
  
  
  The Problem It Solves
&lt;/h2&gt;

&lt;p&gt;Modern AI Agent frameworks give agents a lot of power. That power comes with real risks:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;An agent hallucinates a parameter and calls a destructive endpoint&lt;/li&gt;
&lt;li&gt;A retry loop causes an API to be hit thousands of times&lt;/li&gt;
&lt;li&gt;Different agents in your system have different trust levels, but nothing enforces that&lt;/li&gt;
&lt;li&gt;You have no audit trail of what your agent actually did&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Traditional middleware can catch some of this - but it requires custom code for every project, every tool, every edge case. Guardio makes it a configuration problem, not a code problem.&lt;/p&gt;
&lt;h2&gt;
  
  
  How It Works
&lt;/h2&gt;

&lt;p&gt;When a message flows from your agent to a tool or API, Guardio intercepts it and runs it through a &lt;strong&gt;policy chain&lt;/strong&gt;:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0sp3lw6pt5400sza7f1x.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0sp3lw6pt5400sza7f1x.png" alt="Policy events"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h2&gt;
  
  
  Getting Started in One Command
&lt;/h2&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npx create-guardio
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;Follow the prompts, and Guardio will scaffold a ready-to-run project tailored to your setup.&lt;/p&gt;
&lt;h2&gt;
  
  
  A Real Policy Example
&lt;/h2&gt;

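&lt;p&gt;Here's what a policy looks like in practice: the &lt;code&gt;deny-tool-access&lt;/code&gt; plugin, which blocks every call to the tools it is attached to (for example, a tool that wraps a DELETE endpoint):&lt;br&gt;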
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight typescript"&gt;&lt;code&gt;&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="kd"&gt;type&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="nx"&gt;PolicyPluginInterface&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="nx"&gt;PolicyRequestContext&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="nx"&gt;PolicyResult&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;../../interfaces/index.js&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;import&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;logger&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;from&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;../../logger.js&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="cm"&gt;/**
 * UI schema for the generic policy summary widget (agent + tool assignment).
 * Any policy can use this in getUiSchema() to show the summary in the dashboard.
 */&lt;/span&gt;
&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;POLICY_SUMMARY_UI_SCHEMA&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;object&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="na"&gt;effect&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;ui:widget&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;PolicySummary&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;ui:readonly&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;ui:label&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="p"&gt;},&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;

&lt;span class="cm"&gt;/**
 * Deny tool access policy plugin: always blocks tool calls.
 * Which tools are subject to this policy is determined by assignment outside
 * of the plugin (e.g. which tools have this policy attached). No config.
 */&lt;/span&gt;
&lt;span class="k"&gt;export&lt;/span&gt; &lt;span class="kd"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;DenyToolAccessPolicyPlugin&lt;/span&gt; &lt;span class="k"&gt;implements&lt;/span&gt; &lt;span class="nx"&gt;PolicyPluginInterface&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="k"&gt;readonly&lt;/span&gt; &lt;span class="nx"&gt;name&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;deny-tool-access&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

  &lt;span class="nf"&gt;getUiSchema&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt; &lt;span class="nx"&gt;object&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nx"&gt;POLICY_SUMMARY_UI_SCHEMA&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;

  &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="nf"&gt;evaluate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;context&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;PolicyRequestContext&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="nb"&gt;Promise&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="nx"&gt;PolicyResult&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nx"&gt;logger&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;debug&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
      &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="na"&gt;toolName&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;context&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;toolName&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="na"&gt;plugin&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="k"&gt;this&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;name&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
      &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Tool blocked by deny-tool-access policy&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
      &lt;span class="na"&gt;verdict&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;block&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;code&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;FORBIDDEN_TOOL&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;reason&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;`The tool '&lt;/span&gt;&lt;span class="p"&gt;${&lt;/span&gt;&lt;span class="nx"&gt;context&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;toolName&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;' is not allowed by policy.`&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;};&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Fully Pluggable Architecture
&lt;/h2&gt;

&lt;p&gt;The best part of Guardio is that the core framework is just the engine - everything else is a plugin you own and control:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;em&gt;Policy&lt;/em&gt; - Any TypeScript class&lt;/li&gt;
&lt;li&gt;
&lt;em&gt;Storage&lt;/em&gt; - PostgreSQL, MongoDB, Redis&lt;/li&gt;
&lt;li&gt;
&lt;em&gt;Event handlers&lt;/em&gt; - Webhooks, Slack, Datadog&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This means Guardio adapts to your stack - not the other way around.&lt;/p&gt;

&lt;h2&gt;
  
  
  Who Is This For?
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Developers building AI Agents with MCP tools or external API integrations&lt;/li&gt;
&lt;li&gt;Teams that need audit logs of agent actions for compliance or debugging&lt;/li&gt;
&lt;li&gt;Anyone who's ever thought "I hope the agent doesn't do something weird in production"&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  What's Coming Next
&lt;/h2&gt;

&lt;p&gt;This is an early release - the foundation is solid, and here's what's on the roadmap:&lt;/p&gt;

&lt;p&gt;🔐 Per-agent permission scopes - assign different trust levels to different agents&lt;br&gt;
🔌 Official plugin registry - community-contributed storage adapters and handlers&lt;br&gt;
🧪 Simulation mode - dry-run your agent against policies before going live&lt;/p&gt;

&lt;p&gt;

  &lt;iframe src="https://www.youtube.com/embed/3mkNBbz_u5U"&gt;
  &lt;/iframe&gt;


&lt;/p&gt;

&lt;h2&gt;
  
  
  Try It &amp;amp; Get Involved
&lt;/h2&gt;

&lt;p&gt;🔗 GitHub: &lt;a href="https://github.com/radoslaw-sz/guardio" rel="noopener noreferrer"&gt;https://github.com/radoslaw-sz/guardio&lt;/a&gt;&lt;br&gt;
📦 npm: npx create-guardio&lt;/p&gt;

&lt;p&gt;If Guardio solves a problem you've run into, give it a ⭐ on GitHub — it genuinely helps. And if you have a use case you'd like to see supported, open an issue. The roadmap is being shaped by real problems right now.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>programming</category>
      <category>security</category>
      <category>opensource</category>
    </item>
    <item>
      <title>Setting Up Your First Multi-Agent Test with Maia</title>
      <dc:creator>Radosław</dc:creator>
      <pubDate>Wed, 24 Sep 2025 12:47:43 +0000</pubDate>
      <link>https://dev.to/radoslawsz/setting-up-your-first-multi-agent-test-with-maia-4kab</link>
      <guid>https://dev.to/radoslawsz/setting-up-your-first-multi-agent-test-with-maia-4kab</guid>
      <description>&lt;p&gt;In one of the previous articles, I introduced the MAIA Framework — an open-source toolkit for testing multi-agent AI systems. We discussed what it does, why it exists, and some of its key features such as assertions and validators.&lt;/p&gt;

&lt;p&gt;Now it’s time to get practical.&lt;/p&gt;

&lt;p&gt;In this post, we’ll set up our first test with MAIA, using both assertions and validators.&lt;/p&gt;

&lt;p&gt;Note: this post assumes you already have a Python project set up.&lt;/p&gt;

&lt;h2&gt;
  
  
  Installing the framework
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;maia-test-framework
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Create a simple test file
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;pytest&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;maia_test_framework.testing.base&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;MaiaTest&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;maia_test_framework.providers.generic_lite_llm&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;GenericLiteLLMProvider&lt;/span&gt;

&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;TestContentAssertions&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;MaiaTest&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;setup_agents&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create_agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Alice&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;provider&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nc"&gt;GenericLiteLLMProvider&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;model&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ollama/mistral&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;api_base&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;http://localhost:11434&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
            &lt;span class="p"&gt;}),&lt;/span&gt;
            &lt;span class="n"&gt;system_message&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;You are a helpful AI assistant. You will follow user instructions precisely.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
            &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="nd"&gt;@pytest.mark.asyncio&lt;/span&gt;
    &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;test_basic&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;session&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create_session&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Alice&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;

        &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;session&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;user_says&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Please describe the usual weather in London in July, including temperature and conditions.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;session&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;agent_responds&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Alice&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="k"&gt;assert&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;sunny&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Breaking down the test
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Creating an agent
&lt;/h3&gt;

&lt;p&gt;First, we define the agent that will be under test. The &lt;code&gt;create_agent&lt;/code&gt; method takes a few key parameters:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;name&lt;/strong&gt; - a unique identifier for the agent in the test.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;provider&lt;/strong&gt; - specifies which model the agent should use. MAIA provides many integrations (e.g., LiteLLM, CrewAI), and you can also create your own.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;system_message&lt;/strong&gt; - the system prompt describing what the agent is.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Creating a session
&lt;/h3&gt;

&lt;p&gt;To use the agent, you first need to create a session. In MAIA, a Session groups agents into communication channels.&lt;/p&gt;

&lt;p&gt;For simple agentic systems, you’ll likely use just one session, since all agents can talk to each other freely.&lt;/p&gt;

&lt;h3&gt;
  
  
  Simulating a simple conversation
&lt;/h3&gt;

&lt;p&gt;Next, we simulate a short conversation between the user and the agent. The user asks for the weather, and the agent responds.&lt;/p&gt;

&lt;p&gt;Finally, we check the content of the agent’s reply for specific patterns (in this case, whether it mentions “sunny”).&lt;/p&gt;
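&lt;p&gt;A plain substring check like the one above is case-sensitive, so a reply that says "Sunny" would fail it. A slightly more forgiving check can be sketched in plain Python; &lt;code&gt;mentions_any&lt;/code&gt; is a hypothetical helper written for this post, not part of MAIA, and the reply text is made up for illustration.&lt;/p&gt;

```python
import re


def mentions_any(text, keywords):
    """Return True if any keyword appears in text as a whole word, ignoring case."""
    return any(
        re.search(r"\b" + re.escape(keyword) + r"\b", text, flags=re.IGNORECASE)
        for keyword in keywords
    )


# Illustrative reply text; a real test would pass response.content instead.
reply = "Expect Sunny spells with the occasional shower."
assert mentions_any(reply, ["sunny", "warm"])
```

&lt;p&gt;Accepting several keywords also makes the test less sensitive to how the model happens to phrase its answer.&lt;/p&gt;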

&lt;p&gt;This is a very simple example — now let’s extend it with assertions and validators.&lt;/p&gt;

&lt;h3&gt;
  
  
  Adding a Maia assertion
&lt;/h3&gt;

&lt;p&gt;Assertions can be attached to a session, so every message is automatically checked.&lt;/p&gt;

&lt;p&gt;For example, the built-in &lt;code&gt;assert_professional_tone&lt;/code&gt; ensures that responses don’t contain unprofessional language (such as &lt;em&gt;lol&lt;/em&gt; or &lt;em&gt;u r&lt;/em&gt;).&lt;/p&gt;

&lt;p&gt;Here is an example of using this assertion:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;maia_test_framework.testing.assertions.content_patterns&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;assert_professional_tone&lt;/span&gt;

&lt;span class="bp"&gt;...&lt;/span&gt;

&lt;span class="n"&gt;session&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create_session&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Alice&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;assertions&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;assert_professional_tone&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You can also define your own custom assertions (we’ll cover this in a future article).&lt;/p&gt;
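&lt;p&gt;To make the idea of a per-message check concrete, here is a rough sketch in plain Python. It only illustrates the concept: the function names and the pattern list are hypothetical, and MAIA's real assertion signature and implementation may differ.&lt;/p&gt;

```python
import re

# Hypothetical pattern list; the two examples are the ones mentioned in this post.
UNPROFESSIONAL_PATTERNS = [r"\blol\b", r"\bu r\b"]


def check_professional_tone(message_text):
    """Raise AssertionError if the message contains informal language."""
    for pattern in UNPROFESSIONAL_PATTERNS:
        if re.search(pattern, message_text, flags=re.IGNORECASE):
            raise AssertionError(
                f"Unprofessional language matched {pattern!r} in: {message_text!r}"
            )


def check_all_messages(messages):
    """Apply the check to every message, the way a session-level assertion would."""
    for text in messages:
        check_professional_tone(text)


check_all_messages(["Good afternoon.", "London is usually mild in July."])
```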

&lt;h3&gt;
  
  
  Adding a Maia validator
&lt;/h3&gt;

&lt;p&gt;Validators check the overall session instead of individual messages. They’re automatically run at the end of a test.&lt;/p&gt;

&lt;p&gt;For example, this validator ensures that Alice sends at most one message in the session:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;maia_test_framework.testing.validators.performance&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;performance_validator&lt;/span&gt;

&lt;span class="bp"&gt;...&lt;/span&gt;

&lt;span class="n"&gt;session&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create_session&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Alice&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;validators&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nf"&gt;agent_message_count_validator&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;agent_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Alice&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;max_messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)])&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
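&lt;p&gt;The difference from an assertion is that a validator looks at the finished conversation as a whole. The sketch below shows that idea in plain Python; &lt;code&gt;validate_agent_message_count&lt;/code&gt; and the &lt;code&gt;(sender, text)&lt;/code&gt; history format are assumptions made for this example, not MAIA's actual data model.&lt;/p&gt;

```python
from collections import Counter


def validate_agent_message_count(history, agent_name, max_messages):
    """Check a finished conversation: agent_name may send at most max_messages."""
    # history is assumed to be a list of (sender, text) tuples.
    counts = Counter(sender for sender, _text in history)
    sent = counts[agent_name]
    if sent > max_messages:
        raise AssertionError(
            f"{agent_name} sent {sent} messages, expected at most {max_messages}"
        )
    return sent


history = [
    ("user", "Please describe the usual weather in London in July."),
    ("Alice", "July in London is typically mild, with some sunny spells."),
]
validate_agent_message_count(history, agent_name="Alice", max_messages=1)
```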



&lt;h2&gt;
  
  
  Visualization
&lt;/h2&gt;

&lt;p&gt;Using Maia assertions and validators lets you see all the results in a dashboard.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjyol5jx7guifxygl8mgk.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjyol5jx7guifxygl8mgk.png" alt="Assertions" width="800" height="305"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Feqhteg9095k1qhzf9j8g.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Feqhteg9095k1qhzf9j8g.png" alt="Validators" width="800" height="293"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;✨ That’s it! You now have your first working test in MAIA, complete with assertions and validators.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>python</category>
      <category>testing</category>
      <category>tutorial</category>
    </item>
    <item>
      <title>Why Testing Multi-Agent AI Systems is Hard (and Why It Matters)</title>
      <dc:creator>Radosław</dc:creator>
      <pubDate>Tue, 16 Sep 2025 06:48:32 +0000</pubDate>
      <link>https://dev.to/radoslawsz/why-testing-multi-agent-ai-systems-is-hard-and-why-it-matters-2d3l</link>
      <guid>https://dev.to/radoslawsz/why-testing-multi-agent-ai-systems-is-hard-and-why-it-matters-2d3l</guid>
      <description>&lt;h2&gt;
  
  
  A new era of AI collaboration
&lt;/h2&gt;

&lt;p&gt;Not long ago, interacting with an AI meant talking to a single assistant. You asked a question, it gave an answer. Simple.&lt;/p&gt;

&lt;p&gt;But the landscape is shifting fast. Instead of one assistant doing everything, we’re now seeing multi-agent systems: groups of AI agents working together, each with their own role.&lt;/p&gt;

&lt;p&gt;This shift unlocks exciting possibilities — but also a big challenge: how do we test if these agents actually work as intended?&lt;/p&gt;

&lt;h2&gt;
  
  
  Why multi-agent systems are taking off
&lt;/h2&gt;

&lt;p&gt;There are good reasons for the move toward multi-agent setups:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Specialization: Just like humans, agents can become experts at different tasks (e.g., research, planning, coding).&lt;/li&gt;
&lt;li&gt;Parallelization: Multiple agents can work at the same time, speeding up workflows.&lt;/li&gt;
&lt;li&gt;Emergent collaboration: By talking to each other, agents can generate ideas or solutions that a single agent wouldn’t reach alone.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Examples are already here:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;AI research assistants that brainstorm, fact-check, and summarize.&lt;/li&gt;
&lt;li&gt;Customer service bots where one agent answers, another verifies tone and accuracy.&lt;/li&gt;
&lt;li&gt;AI “companies” where planning, execution, and oversight are split across agents.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It’s powerful. But it is also complex.&lt;/p&gt;

&lt;h2&gt;
  
  
  The hidden challenge: testing multi-agent AI
&lt;/h2&gt;

&lt;p&gt;Traditional software engineering has decades of experience in testing. We have unit tests for small functions, integration tests for bigger systems, and QA teams for real-world scenarios.&lt;/p&gt;

&lt;p&gt;But AI agents — and especially multi-agent systems — break those familiar patterns. Here’s why:&lt;/p&gt;

&lt;h3&gt;
  
  
  Emergent behavior
&lt;/h3&gt;

&lt;p&gt;When two or more agents interact, new and unexpected behaviors can emerge.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Maybe two agents start “arguing” endlessly instead of solving the task.&lt;/li&gt;
&lt;li&gt;Maybe an agent interprets another’s response in an unintended way.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These weren’t explicitly coded; they emerged from the interaction. And that makes them hard to predict.&lt;/p&gt;

&lt;h3&gt;
  
  
  Unpredictability
&lt;/h3&gt;

&lt;p&gt;Even single AI agents can behave differently when given the same input twice. Add multiple agents, and this unpredictability compounds.&lt;/p&gt;

&lt;p&gt;You might run the same test ten times and get ten different results. Which one is “correct”?&lt;/p&gt;
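&lt;p&gt;One common way to deal with this is to stop asking whether a single run passed and instead ask how often the check passes over many runs. The sketch below is a minimal illustration under that assumption; &lt;code&gt;pass_rate&lt;/code&gt; is a hypothetical helper, and the scripted list of outcomes stands in for real agent runs.&lt;/p&gt;

```python
def pass_rate(run_once, runs):
    """Run a nondeterministic check `runs` times and return the fraction that passed."""
    if not runs > 0:
        raise ValueError("runs must be positive")
    passed = sum(1 for _ in range(runs) if run_once())
    return passed / runs


# Stand-in for a real agent test: a scripted sequence of pass/fail outcomes.
outcomes = iter([True, True, False, True, True, True, True, False, True, True])
rate = pass_rate(lambda: next(outcomes), runs=10)

# Accept the behavior if it holds in at least 80% of runs (threshold is arbitrary).
assert rate >= 0.8, f"only {rate:.0%} of runs passed"
```

&lt;p&gt;The threshold becomes an explicit, reviewable decision instead of something hidden in a flaky test.&lt;/p&gt;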

&lt;h3&gt;
  
  
  Interoperability
&lt;/h3&gt;

&lt;p&gt;Multi-agent systems often combine different providers or frameworks:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;One agent powered by OpenAI.&lt;/li&gt;
&lt;li&gt;Another using Anthropic.&lt;/li&gt;
&lt;li&gt;Orchestrated through LiteLLM or CrewAI.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Each has different capabilities and limits. Getting them to play nicely together is tricky.&lt;/p&gt;

&lt;h3&gt;
  
  
  Evaluation complexity
&lt;/h3&gt;

&lt;p&gt;How do you even define success in a multi-agent system? It’s not as simple as: “&lt;em&gt;Did the agent respond?&lt;/em&gt;”&lt;/p&gt;

&lt;p&gt;Instead, questions look more like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;em&gt;Did the group reach the intended outcome?&lt;/em&gt;&lt;/li&gt;
&lt;li&gt;&lt;em&gt;Did they avoid hallucinations or contradictions?&lt;/em&gt;&lt;/li&gt;
&lt;li&gt;&lt;em&gt;Was the conversation efficient, or did it spiral into loops?&lt;/em&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Evaluation itself becomes a challenge.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why this matters now
&lt;/h2&gt;

&lt;p&gt;You might wonder: “Sure, it’s complicated… but why does this matter?”&lt;/p&gt;

&lt;p&gt;Here’s the thing: as multi-agent AI systems leave research labs and enter real-world applications, reliability and trust become non-negotiable.&lt;/p&gt;

&lt;p&gt;Without testing, you risk:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Wrong or misleading outputs (dangerous in healthcare, finance, law).&lt;/li&gt;
&lt;li&gt;Endless loops or stalled conversations.&lt;/li&gt;
&lt;li&gt;Coordination failures that look fine at first but lead to errors later.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Think about it: would you deploy a team of human employees without a way to evaluate their performance? Of course not. The same should apply to AI “teams.”&lt;/p&gt;

&lt;h2&gt;
  
  
  A new category of tools is needed
&lt;/h2&gt;

&lt;p&gt;In traditional software, we didn’t get to where we are without tools. Unit testing frameworks (like JUnit or pytest), CI/CD pipelines, QA automation — they became the backbone of trustworthy software development.&lt;/p&gt;

&lt;p&gt;AI agents (especially multi-agent systems) need the same kind of foundation.&lt;/p&gt;

&lt;p&gt;We need to be able to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Set up agents from different providers.&lt;/li&gt;
&lt;li&gt;Simulate conversations between agents and with users.&lt;/li&gt;
&lt;li&gt;Orchestrate workflows when multiple agents collaborate.&lt;/li&gt;
&lt;li&gt;Judge success or failure against predefined criteria.&lt;/li&gt;
&lt;li&gt;Validate outcomes at both the single-message and whole-conversation levels.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Testing isn’t optional — it’s the foundation of trust
&lt;/h2&gt;

&lt;p&gt;The story of software is the story of building trust through testing. We no longer ship code without automated tests, integration pipelines, and validation layers.&lt;/p&gt;

&lt;p&gt;Multi-agent AI systems are no different. If anything, the need is greater, because:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Behavior is less predictable.&lt;/li&gt;
&lt;li&gt;Interactions are more complex.&lt;/li&gt;
&lt;li&gt;Stakes are higher as AI systems handle sensitive tasks.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;By treating testing as a first-class citizen in AI development, we can move faster, deploy safer, and unlock the real potential of collaborative AI.&lt;/p&gt;

&lt;h2&gt;
  
  
  How do we catch all of the above?
&lt;/h2&gt;

&lt;p&gt;In the previous post, we explored the basics of Maia, a test framework for multi-agent AI systems. In this post, we described the problems Maia tries to solve.&lt;br&gt;
In the next articles, we will return to practical examples that show what the framework can do.&lt;/p&gt;

&lt;p&gt;Stay tuned!&lt;/p&gt;

</description>
      <category>ai</category>
      <category>architecture</category>
      <category>testing</category>
      <category>productivity</category>
    </item>
    <item>
      <title>Maia - Multi-AI Agent Test Framework</title>
      <dc:creator>Radosław</dc:creator>
      <pubDate>Wed, 03 Sep 2025 12:00:37 +0000</pubDate>
      <link>https://dev.to/radoslawsz/maia-multi-ai-agent-test-framework-2opn</link>
      <guid>https://dev.to/radoslawsz/maia-multi-ai-agent-test-framework-2opn</guid>
      <description>&lt;p&gt;Hey DEV community!&lt;br&gt;
I want to share an open-source project I have been working on recently: a test framework for multi-AI-agent systems.&lt;/p&gt;

&lt;p&gt;Website: &lt;a href="https://www.maiaframework.com/" rel="noopener noreferrer"&gt;Maia&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The framework is written in Python and follows the standard pytest approach.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The main features&lt;/strong&gt; are:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Multi-Agent Simulation - Simulate conversations and interactions between multiple AI agents&lt;/li&gt;
&lt;li&gt;Extensible Provider Model - Easily integrate with various AI model providers (e.g., LiteLLM, LangChain, CrewAI)&lt;/li&gt;
&lt;li&gt;Built-in Assertions - A suite of assertions to verify agent behavior, including content analysis and participation checks&lt;/li&gt;
&lt;li&gt;Dashboard for visualization - A Next.js application that shows test results for review and debugging&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;You can use the framework to test &lt;strong&gt;scenarios&lt;/strong&gt; such as:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;asking various models the same question and comparing the results&lt;/li&gt;
&lt;li&gt;broadcasting a prompt and waiting for completion without user intervention (using not only CrewAI but also other providers!)&lt;/li&gt;
&lt;li&gt;simulating tool calls to check whether your AI agent uses your tool properly&lt;/li&gt;
&lt;li&gt;and much more&lt;/li&gt;
&lt;/ul&gt;
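
&lt;p&gt;To make the tool-calling scenario more concrete, here is a small framework-independent sketch of the idea. It does not use Maia's API: RecordingTool and fake_agent are hypothetical names, and the agent is a hard-coded stub rather than a real model.&lt;/p&gt;

```python
class RecordingTool:
    """Hypothetical stand-in for a real tool; it records every call it receives."""

    def __init__(self, name):
        self.name = name
        self.calls = []

    def __call__(self, **kwargs):
        self.calls.append(kwargs)
        return {"status": "ok"}


def fake_agent(prompt, tool):
    """Stub agent: a real agent would let the model decide when to call the tool."""
    if "weather" in prompt.lower():
        tool(city="Warsaw")
    return "done"


get_weather = RecordingTool("get_weather")
fake_agent("What is the weather like today?", get_weather)

# The test asserts on how the tool was used, not on the model's wording.
assert len(get_weather.calls) == 1
assert get_weather.calls[0] == {"city": "Warsaw"}
```

&lt;p&gt;Asserting on recorded calls keeps the test stable even when the wording of the model's reply changes between runs.&lt;/p&gt;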

&lt;p&gt;As an example, here is how easy it is to write a test:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
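&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;pytest&lt;/span&gt;
&lt;span class="c1"&gt;# MaiaTest, GenericLiteLLMProvider and assert_agent_participated come from the maia-test-framework package; see its docs for the import paths.
&lt;/span&gt;
&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;TestConversationSessions&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;MaiaTest&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;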
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;setup_agents&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create_agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Alice&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;provider&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nc"&gt;GenericLiteLLMProvider&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;model&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ollama/mistral&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;api_base&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;http://localhost:11434&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
            &lt;span class="p"&gt;}),&lt;/span&gt;
            &lt;span class="n"&gt;system_message&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;You are a weather assistant. Only describe the weather.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create_agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Bob&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;provider&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nc"&gt;GenericLiteLLMProvider&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;model&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ollama/mistral&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;api_base&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;http://localhost:11434&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
            &lt;span class="p"&gt;}),&lt;/span&gt;
            &lt;span class="n"&gt;system_message&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;You are an assistant who only suggests clothing.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nd"&gt;@pytest.mark.asyncio&lt;/span&gt;
  &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;test_agent_to_agent_conversation&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
      &lt;span class="n"&gt;session&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create_session&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Alice&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Bob&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;

      &lt;span class="c1"&gt;# Alice initiates conversation with Bob
&lt;/span&gt;      &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;session&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;agent_says&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Alice&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Bob&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Given the weather: rainy and 20 degrees Celsius, what clothes should I wear?&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
      &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;session&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;agent_responds&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Bob&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
      &lt;span class="nf"&gt;assert_agent_participated&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;session&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Bob&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

      &lt;span class="c1"&gt;# Bob responds back to Alice
&lt;/span&gt;      &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;session&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;agent_says&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Bob&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Alice&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Based on my info: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
      &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;session&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;agent_responds&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Alice&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
      &lt;span class="nf"&gt;assert_agent_participated&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;session&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Alice&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Everything is open source, and it comes with a basic dashboard where you can see your test results, including the timeline, statuses, durations, etc.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdacpjm5vr49zjmn25npg.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdacpjm5vr49zjmn25npg.png" alt="Test view" width="800" height="413"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;You can also see the assertions from the test:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fb4ycyueums098zxzw2jx.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fb4ycyueums098zxzw2jx.png" alt="Assertion view" width="800" height="440"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The framework is still in the MVP phase, so more features are on the way.&lt;/p&gt;

&lt;p&gt;The official website is here: &lt;a href="https://maiaframework.com/" rel="noopener noreferrer"&gt;Maia Framework&lt;/a&gt;&lt;br&gt;
GitHub: &lt;a href="https://github.com/radoslaw-sz/maia" rel="noopener noreferrer"&gt;Maia&lt;/a&gt;&lt;br&gt;
PyPI: &lt;a href="https://pypi.org/project/maia-test-framework" rel="noopener noreferrer"&gt;maia-test-framework&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Looking forward to your feedback!&lt;/strong&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>python</category>
      <category>opensource</category>
      <category>testing</category>
    </item>
  </channel>
</rss>
