DEV Community

Michael "Mike" K. Saleme


We Built a 332-Test Harness for Multi-Agent AI Systems — What We Found

After several weeks of security testing against multi-agent systems, we open-sourced a framework of 332 executable tests across 24 modules.

The harness is purpose-built for the new attack surface created by autonomous agents: not just whether an agent is authorized, but also whether it remains safe and trustworthy under adversarial conditions.

The core question the framework tests is this:

Can an autonomous agent be trusted to take consequential action under adversarial conditions?

Coverage includes MCP and A2A wire-protocol testing, L402/x402 payment flows, cloud and enterprise platform adapters, and decision-governance scenarios.
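To make the wire-protocol idea concrete, here is a minimal sketch of one kind of protocol-integrity check. It is not code from the repository. It relies only on the fact that MCP messages are JSON-RPC 2.0, and it assumes a hypothetical in-process `handle_message` function standing in for a real agent endpoint. The payload list and the `find_accepted_malformed` helper are likewise illustrative names.

```python
import json

# Hypothetical stand-in for an agent endpoint (an assumption for this sketch).
# A real harness would send these payloads to a live server over its transport.
def handle_message(raw: str) -> dict:
    try:
        msg = json.loads(raw)
    except json.JSONDecodeError:
        # JSON-RPC 2.0 reserves -32700 for unparseable input.
        return {"jsonrpc": "2.0", "id": None,
                "error": {"code": -32700, "message": "Parse error"}}
    if (not isinstance(msg, dict)
            or msg.get("jsonrpc") != "2.0"
            or not isinstance(msg.get("method"), str)):
        # JSON-RPC 2.0 reserves -32600 for structurally invalid requests.
        msg_id = msg.get("id") if isinstance(msg, dict) else None
        return {"jsonrpc": "2.0", "id": msg_id,
                "error": {"code": -32600, "message": "Invalid Request"}}
    return {"jsonrpc": "2.0", "id": msg.get("id"), "result": {}}

# Illustrative malformed inputs; a real test inventory would be much larger.
MALFORMED_PAYLOADS = [
    '{"jsonrpc": "2.0", "method": ',                        # truncated JSON
    '{"jsonrpc": "1.0", "method": "tools/list", "id": 1}',  # wrong version
    '{"jsonrpc": "2.0", "id": 2}',                          # missing method
    '[]',                                                   # empty batch
]

def find_accepted_malformed(handler) -> list:
    """Return the malformed payloads that the handler failed to reject."""
    accepted = []
    for raw in MALFORMED_PAYLOADS:
        response = handler(raw)
        if "error" not in response or "result" in response:
            accepted.append(raw)
    return accepted

if __name__ == "__main__":
    print(find_accepted_malformed(handle_message))
```

An empty list means every malformed message was rejected with an error object. Any payload that comes back in the list is a message the endpoint treated as valid when it should not have.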

Three layers of testing are included:

  • Protocol Integrity
  • Decision Governance
  • Platform-Specific Attack Surfaces

The framework is designed for teams deploying agents into high-impact environments where failures have real consequences. It is not a general-purpose scanner — it is a targeted tool for testing the gap between identity governance and actual agent behavior.
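The gap between identity governance and agent behavior can be illustrated with a small sketch. Everything here is hypothetical and not taken from the repository: the `ActionRequest` type, the `AUTHORIZED_AGENTS` table, the spending limit, and both check functions are assumptions chosen to show the shape of such a test.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ActionRequest:
    agent_id: str
    action: str
    amount: float
    instruction_source: str  # "operator" or "untrusted_content"

# Hypothetical permission table: which actions each agent identity may take.
AUTHORIZED_AGENTS = {"agent-7": {"transfer_funds"}}

def identity_check(req: ActionRequest) -> bool:
    """Identity governance only: is this agent allowed to take this action?"""
    return req.action in AUTHORIZED_AGENTS.get(req.agent_id, set())

def governance_check(req: ActionRequest, limit: float = 100.0) -> bool:
    """Decision governance: also ask where the instruction came from
    and whether the amount stays within an assumed limit."""
    if not identity_check(req):
        return False
    if req.instruction_source != "operator":
        return False  # instruction originated in untrusted content
    return req.amount <= limit

# An authorized agent acting on an instruction injected via untrusted content.
injected = ActionRequest("agent-7", "transfer_funds", 50.0, "untrusted_content")
legitimate = ActionRequest("agent-7", "transfer_funds", 50.0, "operator")

if __name__ == "__main__":
    print(identity_check(injected), governance_check(injected))
    print(identity_check(legitimate), governance_check(legitimate))
```

The injected request passes the identity check because the agent's credentials are valid, yet the action should still be refused. A test that asserts only on authorization would miss this case, which is the kind of gap the paragraph above describes.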

The repository includes documentation, a test inventory, and a section on scope and limitations.

Full repository: https://github.com/msaleme/red-team-blue-team-agent-fabric
