DEV Community

suraj kumar


swarm-test now supports AutoGen — 3 frameworks, 1 reliability testing tool

Quick update: swarm-test v0.2.7 adds AutoGen support.

The Problem

The multi-agent ecosystem is fragmenting. Teams build with CrewAI, LangGraph, AutoGen, or a mix. But the same failure modes show up in all of them:

  • Cascade failures where one agent takes down the chain
  • Context leakage between agents that shouldn't share data
  • Intent drift where instructions get distorted through handoffs
  • Contract violations where Agent A outputs something Agent B doesn't expect

Testing shouldn't fragment just because your framework choice did.
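
To make the last failure mode in the list concrete, here is a minimal Python sketch of an output contract check between two agents. The `REQUIRED_FIELDS` schema and the `find_contract_violations` helper are hypothetical names used for illustration; they are not part of swarm-test's API.

```python
# Hypothetical contract: the fields Agent B expects from Agent A, with their types.
REQUIRED_FIELDS = {"summary": str, "sources": list}


def find_contract_violations(output: dict, contract: dict) -> list[str]:
    """Return a human-readable violation for each missing or mistyped field."""
    violations = []
    for field, expected_type in contract.items():
        if field not in output:
            violations.append(f"missing field: {field}")
        elif not isinstance(output[field], expected_type):
            actual = type(output[field]).__name__
            violations.append(f"{field}: expected {expected_type.__name__}, got {actual}")
    return violations


# Agent A returns `sources` as a string, but Agent B expects a list.
agent_a_output = {"summary": "Q3 revenue rose", "sources": "report.pdf"}
violations = find_contract_violations(agent_a_output, REQUIRED_FIELDS)
print(violations)
```

The check only looks at the shape of the handoff, so it would apply the same way whichever framework produced the output.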

What's New

swarm-test v0.2.7 adds full AutoGen support:

  • GroupChat and GroupChatManager detection
  • ConversableAgent, AssistantAgent, UserProxyAgent extraction
  • Speaker transition mapping (allowed_transitions, speaker_selection_method)
  • Tool/function extraction from agent function maps
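
Speaker transition mapping amounts to turning a GroupChat's transition rules into a directed graph. The sketch below assumes the transitions have already been extracted into a plain dict of agent names. The agent names and the `transitions_to_edges` helper are made up for illustration, and swarm-test's internal representation may differ.

```python
def transitions_to_edges(allowed_transitions: dict[str, list[str]]) -> list[tuple[str, str]]:
    """Flatten {speaker: [next speakers]} into sorted (speaker, next_speaker) edges."""
    return sorted(
        (speaker, nxt)
        for speaker, next_speakers in allowed_transitions.items()
        for nxt in next_speakers
    )


# Hypothetical three-agent chat: the planner hands off to the coder,
# the coder to the reviewer, and the reviewer back to either of them.
allowed_transitions = {
    "planner": ["coder"],
    "coder": ["reviewer"],
    "reviewer": ["planner", "coder"],
}

edges = transitions_to_edges(allowed_transitions)
for speaker, nxt in edges:
    print(f"{speaker} -> {nxt}")
```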

The same 7 reliability tests run on all three frameworks:

  1. Cascade failure
  2. Context leakage
  3. Intent drift
  4. Collusion detection
  5. Blast radius mapping
  6. Timeout resilience
  7. Output contract validation

Usage

pip install swarm-test --upgrade

# Test a CrewAI crew
swarm-test run my_crew.py

# Test a LangGraph graph
swarm-test run my_graph.py

# Test an AutoGen GroupChat
swarm-test run my_groupchat.py

Framework is auto-detected. No flags needed.
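
This post doesn't describe how the detection works. One plausible approach is to inspect the imports of the target file, sketched below with Python's standard `ast` module. The `FRAMEWORK_MODULES` table and the `detect_frameworks` helper are assumptions for illustration, not swarm-test's actual logic, and the import names are the commonly used top-level package names.

```python
import ast

# Top-level import names mapped to the framework they indicate (assumed mapping).
FRAMEWORK_MODULES = {
    "crewai": "CrewAI",
    "langgraph": "LangGraph",
    "autogen": "AutoGen",
}


def detect_frameworks(source: str) -> set[str]:
    """Return the frameworks whose packages are imported anywhere in `source`."""
    found = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            modules = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            # level == 0 skips relative imports such as `from .autogen import x`
            modules = [node.module]
        else:
            continue
        for module in modules:
            top_level = module.split(".")[0]
            if top_level in FRAMEWORK_MODULES:
                found.add(FRAMEWORK_MODULES[top_level])
    return found


sample = "from autogen import GroupChat, GroupChatManager\nimport os\n"
print(detect_frameworks(sample))
```

Returning a set rather than a single name leaves room for projects that mix frameworks in one file.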

With YAML Config

# .swarmtest.yml
fail_on_severity: high
max_blast_radius: 0.75
enabled_tests:
  - cascade
  - blast_radius
  - contract_violation

The same config works across all frameworks. Drop it in your project root and swarm-test picks it up automatically.
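
One way to read this config is as a pass/fail gate: the run fails when any finding is at or above `fail_on_severity`, or when the measured blast radius exceeds `max_blast_radius`. The sketch below assumes that interpretation, a low/medium/high/critical severity scale, and a hypothetical `should_fail` helper. None of these are confirmed by swarm-test's documentation.

```python
# Assumed severity scale, lowest to highest (not confirmed by swarm-test).
SEVERITY_ORDER = ["low", "medium", "high", "critical"]

# Mirrors the two gating keys from .swarmtest.yml above.
config = {"fail_on_severity": "high", "max_blast_radius": 0.75}


def should_fail(finding_severities: list[str], blast_radius: float, config: dict) -> bool:
    """Return True if any finding meets the severity threshold or the radius is too large."""
    threshold = SEVERITY_ORDER.index(config["fail_on_severity"])
    too_severe = any(SEVERITY_ORDER.index(s) >= threshold for s in finding_severities)
    too_wide = blast_radius > config["max_blast_radius"]
    return too_severe or too_wide


print(should_fail(["low", "medium"], 0.5, config))  # below both thresholds
print(should_fail(["low", "high"], 0.5, config))    # one finding at "high"
print(should_fail(["low"], 0.9, config))            # blast radius above 0.75
```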

Why This Matters

Most teams pick a framework and build testing around its specific API. Then they add a second framework for a different use case and their testing breaks. Or they migrate from CrewAI to LangGraph and lose all their reliability coverage.

swarm-test tests the interaction graph, not the framework. The graph topology, blast radius, and failure modes are analyzed the same way whether you built with CrewAI, LangGraph, or AutoGen.
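
Because the analysis runs on the graph, a metric like blast radius can be computed without knowing which framework produced it. A minimal sketch, assuming blast radius means the fraction of other agents reachable downstream of a failing agent. swarm-test's exact definition may differ, and the example graph and the `blast_radius` helper are illustrative only.

```python
from collections import deque


def blast_radius(graph: dict[str, list[str]], failed: str) -> float:
    """Fraction of the other agents reachable downstream of `failed`.

    Assumes every agent appears as a key in `graph`.
    """
    seen = {failed}
    queue = deque([failed])
    while queue:
        agent = queue.popleft()
        for downstream in graph.get(agent, []):
            if downstream not in seen:
                seen.add(downstream)
                queue.append(downstream)
    others = len(graph) - 1
    return (len(seen) - 1) / others if others > 0 else 0.0


# Hypothetical pipeline: researcher feeds writer and fact_checker, both feed editor.
# archivist is not downstream of anyone.
graph = {
    "researcher": ["writer", "fact_checker"],
    "writer": ["editor"],
    "fact_checker": ["editor"],
    "editor": [],
    "archivist": [],
}

print(blast_radius(graph, "researcher"))
print(blast_radius(graph, "writer"))
```

In this example a failure in `researcher` reaches three of the four other agents, while a failure in `writer` reaches only `editor`.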

What's Next

  • Redundancy scoring — how replaceable is each agent?
  • GitHub Action — swarm-test as a CI/CD gate on every PR
  • Interaction heatmap — visual map of agent communication patterns

GitHub: github.com/surajkumar811/swarm-test
