<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: suraj kumar</title>
    <description>The latest articles on DEV Community by suraj kumar (@suraj_kumar_96bb8767435e2).</description>
    <link>https://dev.to/suraj_kumar_96bb8767435e2</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3954828%2F1f39d627-b41e-4efb-9c68-54dd2ab34dd9.jpeg</url>
      <title>DEV Community: suraj kumar</title>
      <link>https://dev.to/suraj_kumar_96bb8767435e2</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/suraj_kumar_96bb8767435e2"/>
    <language>en</language>
    <item>
      <title>swarm-test v0.3.3 — I Visualized My 14-Agent System and the Bottleneck Was Obvious</title>
      <dc:creator>suraj kumar</dc:creator>
      <pubDate>Thu, 18 Jun 2026 18:27:13 +0000</pubDate>
      <link>https://dev.to/suraj_kumar_96bb8767435e2/swarm-test-v033-i-visualized-my-14-agent-system-and-the-bottleneck-was-obvious-73b</link>
      <guid>https://dev.to/suraj_kumar_96bb8767435e2/swarm-test-v033-i-visualized-my-14-agent-system-and-the-bottleneck-was-obvious-73b</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2F2ud8s93sj8m9cu1y7xq9.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Farticles%2F2ud8s93sj8m9cu1y7xq9.png" alt=" " width="800" height="204"&gt;&lt;/a&gt;Documentation rots. You draw your multi-agent architecture once, then six months later it's wrong because the system changed and nobody updated the picture.&lt;/p&gt;

&lt;p&gt;swarm-test v0.3.3 generates the diagram from your actual agent topology, so it stays accurate. And when I ran it on my own 14-agent system, the problem jumped out immediately.&lt;/p&gt;


&lt;p&gt;That red node is OrchestratorAgent — a single point of failure with 92% blast radius. Look at the shape: nearly every one of the 14 agents funnels into two hubs, OrchestratorAgent and TrainerAgent. If either goes down, the system collapses. You can't see that in a console table. You see it in one glance at the diagram.&lt;/p&gt;

&lt;p&gt;Generating it is one command:&lt;/p&gt;

&lt;p&gt;swarm-test graph my_crew.py --format mermaid&lt;/p&gt;

&lt;p&gt;You get Mermaid syntax to paste straight into a GitHub README:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;graph TD
    OrchestratorAgent[OrchestratorAgent ⚠️ SPOF]:::spof
    TrainerAgent[TrainerAgent]:::healthy
    ImageValidatorAgent --&amp;gt; OrchestratorAgent
    classDef spof fill:#ff4444,stroke:#cc0000,color:#fff
    classDef healthy fill:#44cc44,stroke:#22aa22,color:#fff
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;GitHub, GitLab, and Notion render this natively. Single points of failure show red, healthy agents green, moderate-risk yellow — the same risk classification from the reliability analysis.&lt;/p&gt;
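
&lt;p&gt;To make the output format concrete, here is a minimal Python sketch that renders an edge list as Mermaid text in the same shape. It is an illustration only, not swarm-test's implementation: the function name to_mermaid, the spof_agents argument, and the OrchestratorAgent to TrainerAgent edge are assumptions made for this example. The class names and colors are copied from the output above.&lt;/p&gt;

```python
def to_mermaid(edges, spof_agents):
    """Render (source, target) pairs as a Mermaid flowchart.

    spof_agents is the set of agent names to paint red; how that set
    is computed is out of scope for this sketch.
    """
    nodes = sorted({name for edge in edges for name in edge})
    lines = ["graph TD"]
    for name in nodes:
        if name in spof_agents:
            lines.append(f"    {name}[{name} SPOF]:::spof")
        else:
            lines.append(f"    {name}[{name}]:::healthy")
    for source, target in edges:
        lines.append(f"    {source} --> {target}")
    lines.append("    classDef spof fill:#ff4444,stroke:#cc0000,color:#fff")
    lines.append("    classDef healthy fill:#44cc44,stroke:#22aa22,color:#fff")
    return "\n".join(lines)


# Hypothetical two-edge topology, just enough to show both node classes.
edges = [
    ("ImageValidatorAgent", "OrchestratorAgent"),
    ("OrchestratorAgent", "TrainerAgent"),
]
diagram = to_mermaid(edges, {"OrchestratorAgent"})
print(diagram)
```

&lt;p&gt;Because the result is plain text, it can be committed next to the code and reviewed in a diff like any other file.&lt;/p&gt;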

&lt;p&gt;Three formats depending on where the diagram is going:&lt;/p&gt;

&lt;p&gt;Mermaid — for READMEs and wikis. Renders inline, version-controllable as text, diffs cleanly in PRs.&lt;/p&gt;

&lt;p&gt;DOT — for Graphviz pipelines and custom tooling.&lt;/p&gt;

&lt;p&gt;PNG — for slide decks and external docs:&lt;/p&gt;

&lt;p&gt;swarm-test graph my_crew.py --format png --output topology.png&lt;/p&gt;

&lt;p&gt;(PNG needs matplotlib: pip install "swarm-test[png]")&lt;/p&gt;

&lt;p&gt;The value scales with system size. On a 3-agent crew you can already see the structure in the console. But on a 14-agent system with 40+ edges, the diagram reveals clusters, bottlenecks, and isolated agents that a table can't show. The shape of the problem becomes visible — and the fix becomes obvious. In my case: break the Orchestrator bottleneck by distributing routing across multiple agents.&lt;/p&gt;

&lt;p&gt;Because the diagram comes from the same graph analysis that runs the reliability tests, your documentation and your testing never disagree.&lt;/p&gt;

&lt;p&gt;Works across CrewAI, LangGraph, AutoGen, and custom orchestrators.&lt;/p&gt;

&lt;p&gt;pip install swarm-test --upgrade&lt;br&gt;
GitHub: github.com/surajkumar811/swarm-test&lt;/p&gt;

</description>
      <category>ai</category>
      <category>python</category>
      <category>opensource</category>
      <category>testing</category>
    </item>
    <item>
      <title>swarm-test v0.3.2 — Write Your Own Multi-Agent Reliability Tests</title>
      <dc:creator>suraj kumar</dc:creator>
      <pubDate>Wed, 17 Jun 2026 05:29:08 +0000</pubDate>
      <link>https://dev.to/suraj_kumar_96bb8767435e2/swarm-test-v032-write-your-own-multi-agent-reliability-tests-1e7n</link>
      <guid>https://dev.to/suraj_kumar_96bb8767435e2/swarm-test-v032-write-your-own-multi-agent-reliability-tests-1e7n</guid>
      <description>&lt;p&gt;swarm-test v0.3.2 adds a plugin system. You can now write custom reliability tests for your specific multi-agent architecture.&lt;/p&gt;

&lt;p&gt;The built-in tests cover universal failure modes — cascade failures, context leakage, intent drift, collusion, blast radius, timeout resilience, sensitive data detection, and contract violations. But every team has domain-specific risks that a generic tool can't anticipate. Maybe you need to check that your billing agent never communicates directly with your data deletion agent. Maybe you need to verify that no agent chain exceeds 5 hops. Maybe you have compliance requirements unique to your industry.&lt;/p&gt;

&lt;p&gt;Now you can build those checks yourself and they run alongside everything else.&lt;/p&gt;

&lt;p&gt;Writing a plugin takes about 10 lines:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from swarm_test.plugins import BasePlugin, PluginResult
from swarm_test.core.models import Finding


class MaxHopsPlugin(BasePlugin):
    name = "max_hops_check"
    version = "0.1.0"
    description = "Warns if any agent chain exceeds N hops"

    def run(self, graph, agents, edges, config):
        findings = []
        # your test logic using the NetworkX graph
        return PluginResult(
            test_name=self.name,
            status="passed" if not findings else "failed",
            score=100,
            findings=findings,
            duration_ms=0.0
        )
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
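
&lt;p&gt;For the "test logic" placeholder, a max-hops check could look roughly like the sketch below. It is a standalone illustration under stated assumptions: a plain dict stands in for the NetworkX graph so the example has no dependencies, and longest_chain, MAX_HOPS, and the four-agent crew are hypothetical names. Inside a real plugin the same walk could follow graph.successors(node) on the DiGraph instead.&lt;/p&gt;

```python
def longest_chain(adjacency):
    """Return the longest simple path (a list of agents) in a directed graph.

    adjacency maps each agent name to the agents it hands off to.
    Brute-force depth-first search: fine for a handful of agents,
    exponential in the worst case.
    """
    best = []

    def walk(path):
        nonlocal best
        if len(path) > len(best):
            best = list(path)
        for nxt in adjacency.get(path[-1], ()):
            if nxt not in path:  # simple paths only, so cycles terminate
                path.append(nxt)
                walk(path)
                path.pop()

    for start in adjacency:
        walk([start])
    return best


MAX_HOPS = 5  # the limit used as an example earlier in the post
crew = {"A": ["B"], "B": ["C"], "C": ["D"], "D": []}  # hypothetical chain
chain = longest_chain(crew)
hops = len(chain) - 1  # hops are edges, so one fewer than the agents visited
print(chain, hops, "too long" if hops > MAX_HOPS else "ok")
```

&lt;p&gt;A plugin would append one finding per offending chain and leave the findings list empty when every chain is within the limit.&lt;/p&gt;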

&lt;p&gt;Register it in your package's pyproject.toml:&lt;/p&gt;

&lt;p&gt;[project.entry-points."swarm_test.plugins"]&lt;br&gt;
  max_hops_check = "my_package:MaxHopsPlugin"&lt;/p&gt;

&lt;p&gt;Install your package and swarm-test discovers it automatically:&lt;/p&gt;

&lt;p&gt;swarm-test plugins list&lt;/p&gt;

&lt;p&gt;Plugin findings appear everywhere — console output, JSON export, HTML reports, GitHub Action annotations, CI/CD gates. They respect the same YAML config filtering (enabled_tests/disabled_tests) as built-in tests. One failing plugin doesn't crash the rest of the run.&lt;/p&gt;

&lt;p&gt;The graph object your plugin receives is a full NetworkX DiGraph with all agent nodes, edges, and metadata. You have access to every graph algorithm NetworkX provides — centrality, shortest paths, connected components, community detection. The agents and edges lists give you the full swarm-test model with roles, tools, health scores, and redundancy data.&lt;/p&gt;

&lt;p&gt;What I'd love to see the community build: rate limit validation (does any agent path exceed API rate limits), cost estimation plugins (token counting per path), compliance-specific checks (HIPAA, SOC 2, GDPR agent isolation), and framework-specific tests that go deeper than the generic adapters.&lt;/p&gt;

&lt;p&gt;If you build a plugin, open an issue on the repo and I'll add it to a community plugins directory.&lt;/p&gt;

&lt;p&gt;pip install swarm-test --upgrade&lt;br&gt;
GitHub: github.com/surajkumar811/swarm-test&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Feozbwx04330nto0c96je.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Feozbwx04330nto0c96je.png" alt=" " width="800" height="431"&gt;&lt;/a&gt;&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F239nxo5qumjs5z89cdyz.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F239nxo5qumjs5z89cdyz.png" alt=" " width="799" height="461"&gt;&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>python</category>
      <category>opensource</category>
      <category>testing</category>
    </item>
    <item>
      <title>swarm-test v0.3.1 — Interactive HTML Reports and Developer Experience Overhaul</title>
      <dc:creator>suraj kumar</dc:creator>
      <pubDate>Sun, 14 Jun 2026 19:15:30 +0000</pubDate>
      <link>https://dev.to/suraj_kumar_96bb8767435e2/swarm-test-v031-interactive-html-reports-and-developer-experience-overhaul-45d4</link>
      <guid>https://dev.to/suraj_kumar_96bb8767435e2/swarm-test-v031-interactive-html-reports-and-developer-experience-overhaul-45d4</guid>
      <description>&lt;p&gt;Major update to swarm-test — the open-source multi-agent reliability testing tool.&lt;/p&gt;

&lt;p&gt;The problem with CLI output: tools dump everything. You run the test, get 200 lines, and scroll back trying to find what matters. For CI scripts, you need one line. For debugging, you need everything. For daily use, you need something in between. Most tools pick one mode. That's wrong.&lt;/p&gt;

&lt;p&gt;swarm-test v0.3.1 adds three output modes:&lt;/p&gt;

&lt;p&gt;Default — first line is the verdict: "Swarm Score: 0/100 — CRITICAL (5 critical, 1 high findings)" followed by only CRITICAL and HIGH findings with actionable fixes. Lower-severity findings hidden with a note.&lt;/p&gt;

&lt;p&gt;Quiet (--quiet) — one line only. "Swarm Score: 10/100 — CRITICAL (2 critical findings)". Exit code does the rest. 0 = pass. 1 = threshold exceeded. Perfect for CI scripts.&lt;/p&gt;

&lt;p&gt;Verbose (--verbose) — everything. All findings including LOW and INFO. Full graph metrics. All agent health details. Complete redundancy table.&lt;/p&gt;
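
&lt;p&gt;The exit code is enough for most CI gates. If a script also needs the number itself, the one-line verdict is easy to parse. The sketch below assumes the line always has the shape shown in the examples above; the pattern and the parse_verdict name are illustrative and not part of swarm-test.&lt;/p&gt;

```python
import re

# Matches the one-line verdict; \S+ stands in for the dash separator.
VERDICT = re.compile(r"Swarm Score: (\d+)/100 \S+ ([A-Z ]+) \(")


def parse_verdict(line):
    """Return (score, level) from a verdict line, or None if it does not match."""
    match = VERDICT.search(line)
    if match is None:
        return None
    return int(match.group(1)), match.group(2).strip()


result = parse_verdict("Swarm Score: 10/100 — CRITICAL (2 critical findings)")
print(result)
```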

&lt;p&gt;Every finding now ends with a specific fix, not just a problem statement:&lt;/p&gt;

&lt;p&gt;CRITICAL | cascade_failure&lt;br&gt;
  Catastrophic cascade potential: Hub failure cascades to 5 agents&lt;br&gt;
  → Add a fallback agent for 'Hub' or distribute its responsibilities across multiple agents.&lt;/p&gt;

&lt;p&gt;The big addition is the interactive HTML report. Run: swarm-test run crew.py --output-format html --output-path report.html --open&lt;/p&gt;

&lt;p&gt;Your browser opens with a full dashboard:&lt;/p&gt;

&lt;p&gt;Swarm Score Gauge — large circular gauge showing 0-100 with certification level (EXCELLENT, GOOD, NEEDS IMPROVEMENT, AT RISK, CRITICAL). One look tells you the state of your system.&lt;/p&gt;

&lt;p&gt;Agent Interaction Graph — D3 force-directed graph. Nodes are agents, sized by connections, colored by health (green/yellow/red). SPOF agents get a pulsing red border. Drag to reposition, scroll to zoom, click to highlight edges.&lt;/p&gt;

&lt;p&gt;Interaction Heatmap — NxN grid showing which agent pairs communicate most. Darker = more interactions. Red overlay = findings on that edge. Instantly see where the risky connections are.&lt;/p&gt;

&lt;p&gt;Health Scores Table — sortable with colored progress bars. Each agent shows its score, status, and specific risk details like "100% blast radius, SPOF, high cascade depth."&lt;/p&gt;

&lt;p&gt;Redundancy Table — replaceability scores from IRREPLACEABLE (0-20) to FULLY REDUNDANT (81-100). SPOFs highlighted in red with green progress bars for safe agents.&lt;/p&gt;

&lt;p&gt;Findings Section — filter buttons (ALL / CRITICAL / HIGH / MEDIUM / LOW). Each finding is collapsible — click to expand for full description, affected agents, and remediation steps.&lt;/p&gt;

&lt;p&gt;Everything else still works. Same 8 reliability tests (cascade failure, context leakage, intent drift, collusion detection, blast radius, timeout resilience, sensitive data detection, contract violation). Same 3 framework adapters (CrewAI, LangGraph, AutoGen). Same YAML config with auto-discovery. Same GitHub Action for CI/CD gating. Same JSON and Markdown exports. Nothing removed, everything improved.&lt;/p&gt;

&lt;p&gt;Install: pip install swarm-test --upgrade&lt;/p&gt;

&lt;p&gt;What's next: plugin system — write your own custom reliability tests with a simple BasePlugin interface.&lt;/p&gt;

&lt;p&gt;GitHub: github.com/surajkumar811/swarm-test&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2y9fyjuhk0v54dxtkq82.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2y9fyjuhk0v54dxtkq82.png" alt=" " width="800" height="412"&gt;&lt;/a&gt;&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fv7ane5vmwqfb7sd97hth.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fv7ane5vmwqfb7sd97hth.png" alt=" " width="800" height="402"&gt;&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>testing</category>
      <category>opensource</category>
      <category>python</category>
    </item>
    <item>
      <title>swarm-test is now a GitHub Action — multi-agent reliability testing on every PR</title>
      <dc:creator>suraj kumar</dc:creator>
      <pubDate>Sat, 13 Jun 2026 16:32:11 +0000</pubDate>
      <link>https://dev.to/suraj_kumar_96bb8767435e2/swarm-test-is-now-a-github-action-multi-agent-reliability-testing-on-every-pr-2c16</link>
      <guid>https://dev.to/suraj_kumar_96bb8767435e2/swarm-test-is-now-a-github-action-multi-agent-reliability-testing-on-every-pr-2c16</guid>
      <description>&lt;p&gt;swarm-test v0.3.0 turns multi-agent reliability testing into a CI/CD gate.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Setup
&lt;/h2&gt;

&lt;p&gt;Add this to .github/workflows/reliability.yml:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;name: Agent Reliability
on: [pull_request]
jobs:
  swarm-test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: surajkumar811/swarm-test@v0.3.0
        with:
          script: my_crew.py
          fail-on-severity: high
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;That's it. Every PR now gets tested for cascade failures, blast radius, context leakage, intent drift, collusion, timeout resilience, contract violations, and single points of failure.&lt;/p&gt;

&lt;h2&gt;
  
  
  What You See on the PR
&lt;/h2&gt;

&lt;p&gt;Findings show up as inline annotations:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Critical findings → errors (block the merge)&lt;/li&gt;
&lt;li&gt;High findings → warnings&lt;/li&gt;
&lt;li&gt;Medium findings → notices&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Plus a job summary with your Swarm Score and the top findings with remediation steps.&lt;/p&gt;
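
&lt;p&gt;The severity mapping above lines up with GitHub Actions workflow commands, which turn specially formatted lines on standard output into annotations. The sketch below shows that general mechanism, not the action's actual code; LEVELS and annotation are names made up for the example, and a real message would also need its newlines escaped.&lt;/p&gt;

```python
# Severity-to-annotation mapping as listed above; error, warning, and
# notice are the GitHub Actions workflow command names.
LEVELS = {"critical": "error", "high": "warning", "medium": "notice"}


def annotation(severity, message):
    """Format one finding as a GitHub Actions workflow command line."""
    level = LEVELS.get(severity.lower())
    if level is None:
        return None  # lower severities produce no annotation in this sketch
    return f"::{level}::{message}"


line = annotation("CRITICAL", "Hub failure cascades to 5 agents")
print(line)
```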

&lt;h2&gt;
  
  
  Why This Matters
&lt;/h2&gt;

&lt;p&gt;Most teams test individual agents and call it done. But the failures that take down production live in the interactions between agents — and those only surface when you test the whole graph.&lt;/p&gt;

&lt;p&gt;Running this manually means you test when you remember. Running it in CI means you test every single change, automatically, before it merges.&lt;/p&gt;

&lt;h2&gt;
  
  
  Works Across Frameworks
&lt;/h2&gt;

&lt;p&gt;CrewAI, LangGraph, AutoGen — same action, same config. The graph topology is what gets tested, not the framework.&lt;/p&gt;

&lt;p&gt;pip install swarm-test --upgrade&lt;br&gt;
GitHub: github.com/surajkumar811/swarm-test&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1x43q94c3l9royfgnu5y.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1x43q94c3l9royfgnu5y.png" alt=" " width="800" height="473"&gt;&lt;/a&gt;&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkon36fhzfynu4qc2mg84.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkon36fhzfynu4qc2mg84.png" alt=" " width="800" height="303"&gt;&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>opensource</category>
      <category>agents</category>
      <category>python</category>
    </item>
    <item>
      <title>swarm-test v0.2.8 — This Reliability Report Found a Catastrophic Failure Nobody Caught</title>
      <dc:creator>suraj kumar</dc:creator>
      <pubDate>Wed, 10 Jun 2026 10:49:06 +0000</pubDate>
      <link>https://dev.to/suraj_kumar_96bb8767435e2/swarm-test-v028-this-reliability-report-found-a-catastrophic-failure-nobody-caught-4fml</link>
      <guid>https://dev.to/suraj_kumar_96bb8767435e2/swarm-test-v028-this-reliability-report-found-a-catastrophic-failure-nobody-caught-4fml</guid>
      <description>&lt;p&gt;I ran swarm-test v0.2.8 on a 4-agent system. The result: Risk Score 90/100.&lt;br&gt;
One agent is holding the entire system together — and nobody knew.&lt;br&gt;
What the report found:&lt;br&gt;
The "Hub" agent has a health score of 15/100. It's classified as IRREPLACEABLE with a blast radius of 100%. If Hub fails, every downstream agent — Worker1, Worker2, Worker3 — receives corrupted input. The system throws no errors. Logs look clean. Output is silently garbage.&lt;br&gt;
Meanwhile all three workers scored 80-100/100 and are FULLY REDUNDANT. The individual agents are fine. The architecture is the vulnerability.&lt;br&gt;
No code review caught this. No unit test caught this. Only graph-based chaos testing exposed the structural weakness.&lt;br&gt;
What's new in v0.2.8:&lt;br&gt;
Per-agent redundancy scoring with SPOF detection. Every agent in your system gets classified: IRREPLACEABLE, PARTIALLY REDUNDANT, or FULLY REDUNDANT. You see exactly which agents are safe to lose and which ones would take down your entire pipeline.&lt;br&gt;
This sits on top of the 6 chaos tests swarm-test has run since v0.1.0: cascade failure analysis, blast radius mapping, intent drift measurement, context leakage detection, collusion detection, and timeout resilience testing.&lt;br&gt;
Why this matters:&lt;br&gt;
I found 54 of these failures in my own 14-agent production system. 15 were CRITICAL. The system had been running for weeks looking perfectly healthy. It wasn't.&lt;br&gt;
Every team building multi-agent AI systems with CrewAI, LangGraph, or AutoGen has these hidden vulnerabilities. The only question is whether you find them before your users do.&lt;br&gt;
Try it:&lt;br&gt;
Search "swarm-test" on PyPI. MIT licensed. 78 tests passing. Works with CrewAI, LangGraph, and AutoGen.&lt;br&gt;
What does YOUR agent system's risk score look like?&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9da8glzw2h073yj3qj6r.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9da8glzw2h073yj3qj6r.png" alt=" " width="800" height="485"&gt;&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>testing</category>
      <category>python</category>
      <category>agents</category>
    </item>
    <item>
      <title>swarm-test now supports AutoGen — 3 frameworks, 1 reliability testing tool</title>
      <dc:creator>suraj kumar</dc:creator>
      <pubDate>Mon, 08 Jun 2026 07:51:00 +0000</pubDate>
      <link>https://dev.to/suraj_kumar_96bb8767435e2/swarm-test-now-supports-autogen-3-frameworks-1-reliability-testing-tool-5079</link>
      <guid>https://dev.to/suraj_kumar_96bb8767435e2/swarm-test-now-supports-autogen-3-frameworks-1-reliability-testing-tool-5079</guid>
      <description>&lt;p&gt;Quick update: swarm-test v0.2.7 adds AutoGen support.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Problem
&lt;/h2&gt;

&lt;p&gt;The multi-agent ecosystem is fragmenting. Teams build with CrewAI, LangGraph, AutoGen, or a mix. But the failure modes are identical across all of them:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Cascade failures where one agent takes down the chain&lt;/li&gt;
&lt;li&gt;Context leaking between agents that shouldn't share data&lt;/li&gt;
&lt;li&gt;Intent drift where instructions get distorted through handoffs&lt;/li&gt;
&lt;li&gt;Contract violations where Agent A outputs something Agent B doesn't expect&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Testing shouldn't fragment just because your framework choice did.&lt;/p&gt;

&lt;h2&gt;
  
  
  What's New
&lt;/h2&gt;

&lt;p&gt;swarm-test v0.2.7 adds full AutoGen support:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;GroupChat and GroupChatManager detection&lt;/li&gt;
&lt;li&gt;ConversableAgent, AssistantAgent, UserProxyAgent extraction&lt;/li&gt;
&lt;li&gt;Speaker transition mapping (allowed_transitions, speaker_selection_method)&lt;/li&gt;
&lt;li&gt;Tool/function extraction from agent function maps&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Same 7 reliability tests run identically across all three frameworks:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Cascade failure&lt;/li&gt;
&lt;li&gt;Context leakage&lt;/li&gt;
&lt;li&gt;Intent drift&lt;/li&gt;
&lt;li&gt;Collusion detection&lt;/li&gt;
&lt;li&gt;Blast radius mapping&lt;/li&gt;
&lt;li&gt;Timeout resilience&lt;/li&gt;
&lt;li&gt;Output contract validation&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Usage
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;swarm-test &lt;span class="nt"&gt;--upgrade&lt;/span&gt;

&lt;span class="c"&gt;# Test a CrewAI crew&lt;/span&gt;
swarm-test run my_crew.py

&lt;span class="c"&gt;# Test a LangGraph graph&lt;/span&gt;
swarm-test run my_graph.py

&lt;span class="c"&gt;# Test an AutoGen GroupChat&lt;/span&gt;
swarm-test run my_groupchat.py
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Framework is auto-detected. No flags needed.&lt;/p&gt;

&lt;h2&gt;
  
  
  With YAML Config
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="c1"&gt;# .swarmtest.yml&lt;/span&gt;
&lt;span class="na"&gt;fail_on_severity&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;high&lt;/span&gt;
&lt;span class="na"&gt;max_blast_radius&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;0.75&lt;/span&gt;
&lt;span class="na"&gt;enabled_tests&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;cascade&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;blast_radius&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;contract_violation&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The same config works across all frameworks. Drop it in your project root and swarm-test picks it up automatically.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why This Matters
&lt;/h2&gt;

&lt;p&gt;Most teams pick a framework and build testing around its specific API. Then they add a second framework for a different use case and their testing breaks. Or they migrate from CrewAI to LangGraph and lose all their reliability coverage.&lt;/p&gt;

&lt;p&gt;swarm-test tests the interaction graph, not the framework. The graph topology, blast radius, and failure modes are the same whether you built with CrewAI, LangGraph, or AutoGen.&lt;/p&gt;

&lt;h2&gt;
  
  
  What's Next
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Redundancy scoring — how replaceable is each agent?&lt;/li&gt;
&lt;li&gt;GitHub Action — swarm-test as a CI/CD gate on every PR&lt;/li&gt;
&lt;li&gt;Interaction heatmap — visual map of agent communication patterns&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;GitHub: github.com/surajkumar811/swarm-test&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7yy9utd5degg8pdzt3p6.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7yy9utd5degg8pdzt3p6.png" alt=" " width="800" height="486"&gt;&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>testing</category>
      <category>python</category>
      <category>opensource</category>
    </item>
    <item>
      <title>🐝 swarm-test v0.2.4 — Quick Scan CLI</title>
      <dc:creator>suraj kumar</dc:creator>
      <pubDate>Wed, 03 Jun 2026 07:50:16 +0000</pubDate>
      <link>https://dev.to/suraj_kumar_96bb8767435e2/swarm-test-v024-quick-scan-cli-1hk7</link>
      <guid>https://dev.to/suraj_kumar_96bb8767435e2/swarm-test-v024-quick-scan-cli-1hk7</guid>
      <description>&lt;p&gt;Test any agent system in 30 seconds. No Python needed:&lt;/p&gt;

&lt;p&gt;swarm-test scan \&lt;br&gt;
  --agents "Researcher,Analyst,Writer,Reviewer" \&lt;br&gt;
  --edges "Researcher&amp;gt;Analyst,Analyst&amp;gt;Writer,Writer&amp;gt;Reviewer"&lt;/p&gt;

&lt;p&gt;One command. 13 findings. 2 SPOFs. Health scores for every agent.&lt;/p&gt;
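
&lt;p&gt;The two arguments carry enough information to rebuild the whole graph. As a rough sketch of how such strings could be parsed (parse_scan_args is a hypothetical helper, not the tool's own parser), in Python:&lt;/p&gt;

```python
def parse_scan_args(agents_arg, edges_arg):
    """Turn the --agents and --edges strings into an adjacency dict."""
    adjacency = {name.strip(): [] for name in agents_arg.split(",")}
    for pair in edges_arg.split(","):
        source, target = (part.strip() for part in pair.split(">"))
        if source not in adjacency or target not in adjacency:
            raise ValueError(f"edge {pair!r} names an unknown agent")
        adjacency[source].append(target)
    return adjacency


graph = parse_scan_args(
    "Researcher,Analyst,Writer,Reviewer",
    "Researcher>Analyst,Analyst>Writer,Writer>Reviewer",
)
print(graph)
```

&lt;p&gt;The result is a straight four-agent chain. In a chain like this, losing either middle agent cuts the pipeline in two.&lt;/p&gt;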

&lt;p&gt;pip install swarm-test --upgrade&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5fwn5ugujqj6oxvaalct.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5fwn5ugujqj6oxvaalct.png" alt=" " width="799" height="543"&gt;&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>multiagent</category>
      <category>devtools</category>
      <category>opensource</category>
    </item>
    <item>
      <title>I Found 54 Reliability Issues in My 14-Agent AI System — Here's What Broke</title>
      <dc:creator>suraj kumar</dc:creator>
      <pubDate>Sun, 31 May 2026 00:46:10 +0000</pubDate>
      <link>https://dev.to/suraj_kumar_96bb8767435e2/i-found-54-reliability-issues-in-my-14-agent-ai-system-heres-what-broke-2bj7</link>
      <guid>https://dev.to/suraj_kumar_96bb8767435e2/i-found-54-reliability-issues-in-my-14-agent-ai-system-heres-what-broke-2bj7</guid>
      <description>&lt;p&gt;Every testing tool for AI agents tests individual agents. But production failures don't happen inside agents — they happen &lt;strong&gt;between&lt;/strong&gt; them.&lt;/p&gt;

&lt;p&gt;I learned this the hard way.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Problem Nobody Is Solving
&lt;/h2&gt;

&lt;p&gt;I built a 14-agent document processing system using CrewAI. Each agent worked perfectly in isolation. In production, the system failed constantly — and I couldn't figure out why.&lt;/p&gt;

&lt;p&gt;The problem wasn't any single agent. It was the &lt;strong&gt;interactions&lt;/strong&gt;: &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;One agent failing silently took down 12 others&lt;/li&gt;
&lt;li&gt;Agents were sharing sensitive data across boundaries they shouldn't cross&lt;/li&gt;
&lt;li&gt;Three agents formed a communication clique that bypassed the orchestrator&lt;/li&gt;
&lt;li&gt;Every agent depended on one central orchestrator with zero fallback&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;No existing tool could find these issues. Arize, Langfuse, Braintrust — they all monitor individual agents. None of them test the &lt;strong&gt;graph&lt;/strong&gt; of agent interactions.&lt;/p&gt;

&lt;p&gt;So I built one.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I Built: swarm-test
&lt;/h2&gt;

&lt;p&gt;swarm-test builds a NetworkX interaction graph of your multi-agent system and runs 6 chaos engineering tests against it:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Cascade Failure&lt;/strong&gt; — which agents bring down the whole system if they fail&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Context Leakage&lt;/strong&gt; — sensitive data (API keys, PII, credentials) crossing agent boundaries&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Intent Drift&lt;/strong&gt; — agents acting outside their role or being manipulated&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Collusion Detection&lt;/strong&gt; — agents communicating outside the orchestrator's oversight&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Blast Radius&lt;/strong&gt; — single points of failure and critical dependency paths&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Timeout Resilience&lt;/strong&gt; — agents with no fallback if upstream is slow&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;3-line API:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;swarm_test&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;SwarmProbe&lt;/span&gt;

&lt;span class="n"&gt;probe&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;SwarmProbe&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;crew&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;report&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;probe&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run_all&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;report&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;print_summary&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  What It Found On My Real System
&lt;/h2&gt;

&lt;p&gt;I ran swarm-test on my 14-agent system. The results were brutal:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;54 total findings:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;15 CRITICAL (14 cascade failures + 1 SPOF)&lt;/li&gt;
&lt;li&gt;13 HIGH (9 timeout vulnerabilities + 4 collusion cliques)&lt;/li&gt;
&lt;li&gt;26 MEDIUM (13 intent drift + 13 missing timeout handling)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The worst agent: &lt;strong&gt;OrchestratorAgent scored 4 out of 100.&lt;/strong&gt; It's a single point of failure with a 92% blast radius: if it fails, 12 of the other 13 agents go down. It also had zero timeout handling.&lt;/p&gt;

&lt;p&gt;The scariest finding: &lt;strong&gt;EvolutionAgent has 100% blast radius.&lt;/strong&gt; If it fails, every other agent in the system is affected.&lt;/p&gt;
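&lt;p&gt;To make "blast radius" concrete, here is a minimal sketch of one way to compute it. It assumes blast radius means the share of &lt;em&gt;other&lt;/em&gt; agents reachable downstream of the failed one, and it uses a plain dict and a breadth-first search to stay dependency-free. The function name, the toy topology, and the definition itself are my assumptions for illustration, not swarm-test's internals.&lt;/p&gt;

```python
from collections import deque


def blast_radius(graph, failed):
    """Return the fraction of other agents reachable downstream of `failed`."""
    seen = {failed}
    queue = deque([failed])
    while queue:
        current = queue.popleft()
        for dependent in graph.get(current, ()):
            if dependent not in seen:
                seen.add(dependent)
                queue.append(dependent)

    # Collect every agent that appears as a source or a target.
    agents = set(graph)
    for targets in graph.values():
        agents.update(targets)

    others = len(agents) - 1      # everyone except the failed agent
    affected = len(seen) - 1      # reachable agents, excluding the failed one
    return affected / others if others else 0.0


# Hypothetical toy topology: an edge from A to B means "B depends on A".
toy_graph = {
    "orchestrator": ["planner", "writer"],
    "planner": ["writer"],
    "writer": [],
    "logger": [],
}

print(f"{blast_radius(toy_graph, 'orchestrator'):.0%}")  # prints 67%
```

&lt;p&gt;Under the same definition, 12 affected agents out of 13 others is 12/13, or about 92%, and 13 out of 13 is 100%.&lt;/p&gt;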

&lt;p&gt;Three agents (OrchestratorAgent, FileOptimizerAgent, PrintOptimizerAgent) formed a &lt;strong&gt;collusion clique&lt;/strong&gt; — communicating directly with each other and bypassing orchestrator oversight.&lt;/p&gt;

&lt;p&gt;None of this was visible from testing individual agents. It only appeared when I tested the &lt;strong&gt;interaction graph&lt;/strong&gt;.&lt;/p&gt;
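&lt;p&gt;As an illustration of why this only shows up at the graph level, the sketch below looks for three-agent groups in which every pair communicates directly. It assumes a "clique" means a fully connected group in an undirected view of the message log. The agent names and the log are made up, and swarm-test's actual detection logic may differ.&lt;/p&gt;

```python
from itertools import combinations


def find_triangles(edges):
    """Return sorted 3-agent groups in which every pair talks directly."""
    # Treat each message as an undirected link between two agents.
    links = {frozenset(pair) for pair in edges}
    agents = sorted({name for pair in edges for name in pair})

    triangles = []
    for trio in combinations(agents, 3):
        if all(frozenset(pair) in links for pair in combinations(trio, 2)):
            triangles.append(trio)
    return triangles


# Hypothetical (sender, receiver) pairs, not real swarm-test output.
observed = [
    ("orchestrator", "file_optimizer"),
    ("orchestrator", "print_optimizer"),
    ("file_optimizer", "print_optimizer"),
    ("orchestrator", "face_detector"),
]

print(find_triangles(observed))
# prints [('file_optimizer', 'orchestrator', 'print_optimizer')]
```

&lt;p&gt;No single agent's unit tests would reveal this group. It only exists as a property of the edges between agents.&lt;/p&gt;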

&lt;h2&gt;
  
  
  I Shipped 7 Features in 7 Days
&lt;/h2&gt;

&lt;p&gt;After launching, I shipped one feature every day:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Day&lt;/th&gt;
&lt;th&gt;Feature&lt;/th&gt;
&lt;th&gt;Impact&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;td&gt;Launch — 5 chaos tests, GitHub + PyPI&lt;/td&gt;
&lt;td&gt;First multi-agent testing tool on PyPI&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;Timeout resilience test&lt;/td&gt;
&lt;td&gt;Found 22 new issues in my system&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;JSON export&lt;/td&gt;
&lt;td&gt;Another developer integrated it into his runtime gate within hours&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;LangGraph adapter&lt;/td&gt;
&lt;td&gt;Now supports CrewAI + LangGraph&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;Sensitive data detection (23 patterns)&lt;/td&gt;
&lt;td&gt;Catches AWS keys, JWT tokens, credit cards crossing agent boundaries&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;5&lt;/td&gt;
&lt;td&gt;Per-agent health scores (0-100)&lt;/td&gt;
&lt;td&gt;Know exactly which agent to fix first&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;6&lt;/td&gt;
&lt;td&gt;Before/after comparison&lt;/td&gt;
&lt;td&gt;Measure if your refactor actually improved reliability&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;7&lt;/td&gt;
&lt;td&gt;ASCII agent graph&lt;/td&gt;
&lt;td&gt;See your agent topology right in the terminal&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;94 tests passing. Two frameworks supported. And growing.&lt;/p&gt;
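&lt;p&gt;To give a feel for the Day 4 feature, here is a minimal sketch of pattern-based detection on a single inter-agent message. It covers three widely known formats (AWS access key IDs, JWT-shaped tokens, and card numbers checked with the Luhn algorithm). The pattern names, regexes, and function names are my own assumptions for illustration. They are not swarm-test's 23 patterns.&lt;/p&gt;

```python
import re

# Three illustrative patterns based on common public formats.
# These are NOT swarm-test's actual rule set.
PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "jwt": re.compile(r"\beyJ[\w-]+\.[\w-]+\.[\w-]+"),
    "card_number": re.compile(r"\b\d(?:[ -]?\d){12,18}\b"),
}


def luhn_valid(number):
    """Check a candidate card number with the Luhn checksum."""
    digits = [int(ch) for ch in number if ch.isdigit()]
    total = 0
    for index, digit in enumerate(reversed(digits)):
        if index % 2 == 1:
            # Double every second digit and add the digits of the result.
            total += sum(divmod(digit * 2, 10))
        else:
            total += digit
    return total % 10 == 0


def scan_message(text):
    """Return the sorted names of the pattern types found in one message."""
    hits = set()
    for name, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            if name == "card_number" and not luhn_valid(match.group()):
                continue  # long digit run that is not a plausible card number
            hits.add(name)
    return sorted(hits)


message = "key=AKIAIOSFODNN7EXAMPLE card 4111 1111 1111 1111 order 1234567890123"
print(scan_message(message))  # prints ['aws_access_key_id', 'card_number']
```

&lt;p&gt;The Luhn check is there to cut false positives: the 13-digit order number matches the digit pattern but fails the checksum, so it is not reported as a card.&lt;/p&gt;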

&lt;h2&gt;
  
  
  The First Integration
&lt;/h2&gt;

&lt;p&gt;Within 48 hours of launch, another developer built an integration. He has a runtime action-gate that blocks dangerous agent actions before execution. He connected swarm-test's findings as "priors" — so when swarm-test flags an edge as high-risk, his gate becomes more cautious on that edge.&lt;/p&gt;

&lt;p&gt;The result: the same &lt;code&gt;run_sql&lt;/code&gt; action went from "CONFIRM" (risk 62) to "HUMAN_REQUIRED" (risk 78) when swarm-test's cascade finding was attached.&lt;/p&gt;

&lt;p&gt;Structural testing (swarm-test) + runtime enforcement (his gate) = the full reliability stack for multi-agent systems.&lt;/p&gt;
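&lt;p&gt;The "priors" idea can be sketched roughly as follows. Only the two risk values (62 and 78) and the two labels (CONFIRM and HUMAN_REQUIRED) come from the result above. The ALLOW band, the cutoffs, the size of the bump, the agent names, and the function itself are hypothetical, and the other developer's gate almost certainly works differently in detail.&lt;/p&gt;

```python
from bisect import bisect_right

# Hypothetical risk bands and bump size, chosen only so that 62 and 78
# fall into different bands. They are not taken from the real gate.
BANDS = [(0, "ALLOW"), (50, "CONFIRM"), (75, "HUMAN_REQUIRED")]
CASCADE_BUMP = 16


def gate_decision(base_risk, edge, flagged_edges):
    """Return (risk, label), raising risk when the edge has a structural finding."""
    bump = CASCADE_BUMP if edge in flagged_edges else 0
    risk = max(0, min(base_risk + bump, 100))  # clamp to the 0-100 range
    cutoffs = [cutoff for cutoff, _ in BANDS]
    label = BANDS[bisect_right(cutoffs, risk) - 1][1]
    return risk, label


edge = ("planner", "sql_runner")  # made-up agent names

print(gate_decision(62, edge, flagged_edges=set()))   # prints (62, 'CONFIRM')
print(gate_decision(62, edge, flagged_edges={edge}))  # prints (78, 'HUMAN_REQUIRED')
```

&lt;p&gt;The point of the design is that the runtime gate does not need to understand the graph. It only needs a per-edge signal that says "be more careful here".&lt;/p&gt;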

&lt;h2&gt;
  
  
  Why This Matters Now
&lt;/h2&gt;

&lt;p&gt;According to recent industry research:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;88% of organizations report AI agent security incidents&lt;/li&gt;
&lt;li&gt;Only 14.4% of agents go live with full security approval&lt;/li&gt;
&lt;li&gt;OWASP classified cascade failures as ASI08 — a top AI security risk&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Multi-agent systems are going to production faster than anyone can secure them. The tools exist for single-agent monitoring. Nothing existed for multi-agent interaction testing — until now.&lt;/p&gt;

&lt;h2&gt;
  
  
  Try It
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;swarm-test
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;swarm_test&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;SwarmProbe&lt;/span&gt;

&lt;span class="c1"&gt;# Works with CrewAI
&lt;/span&gt;&lt;span class="n"&gt;probe&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;SwarmProbe&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;your_crew&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;report&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;probe&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run_all&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;report&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;print_summary&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;report&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;to_html&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;report.html&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# Interactive D3 graph
&lt;/span&gt;&lt;span class="n"&gt;report&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;to_json&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;report.json&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# Machine-readable for CI/CD
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;GitHub: &lt;a href="https://github.com/surajkumar811/swarm-test" rel="noopener noreferrer"&gt;github.com/surajkumar811/swarm-test&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Open source. MIT licensed. Solo founder building in public.&lt;/p&gt;

&lt;p&gt;What reliability tests would YOU want for your multi-agent systems? Drop a comment — I'm shipping features based on real feedback.&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/..." class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/..." alt="Uploading image" width="800" height="400"&gt;&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>python</category>
      <category>opensource</category>
      <category>testing</category>
    </item>
    <item>
      <title>I Ship One AI Testing Feature Every Day — Here's What 6 Days Looks Like</title>
      <dc:creator>suraj kumar</dc:creator>
      <pubDate>Thu, 28 May 2026 17:13:57 +0000</pubDate>
      <link>https://dev.to/suraj_kumar_96bb8767435e2/i-ship-one-ai-testing-feature-every-day-heres-what-6-days-looks-like-5cdn</link>
      <guid>https://dev.to/suraj_kumar_96bb8767435e2/i-ship-one-ai-testing-feature-every-day-heres-what-6-days-looks-like-5cdn</guid>
      <description>&lt;p&gt;I launched swarm-test 6 days ago. It's the first reliability testing tool for multi-agent AI systems. Here's what I've shipped every single day:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Day 0: Launch with 5 chaos tests, on GitHub and PyPI&lt;/li&gt;
&lt;li&gt;Day 1: Timeout resilience, which found 22 new issues in my 14-agent system&lt;/li&gt;
&lt;li&gt;Day 2: JSON export, which another developer integrated into his runtime gate within hours&lt;/li&gt;
&lt;li&gt;Day 3: LangGraph adapter, so swarm-test now supports CrewAI and LangGraph&lt;/li&gt;
&lt;li&gt;Day 4: Sensitive data detection with 23 pattern types (AWS keys, JWTs, credit cards)&lt;/li&gt;
&lt;li&gt;Day 5: Per-agent health scores, so every agent gets a 0-100 rating&lt;/li&gt;
&lt;li&gt;Day 6: Before/after comparison, to measure whether changes actually helped&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;What I learned: shipping daily does three things:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Forces you to keep features small and shippable&lt;/li&gt;
&lt;li&gt;Gives you something to post about every day&lt;/li&gt;
&lt;li&gt;Shows users the project is alive and actively maintained&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;What's next: an AutoGen adapter, a GitHub Action, YAML config, and a plugin system for community tests.&lt;/p&gt;

&lt;p&gt;Try it: &lt;code&gt;pip install swarm-test&lt;/code&gt;&lt;br&gt;
GitHub: &lt;a href="https://github.com/surajkumar811/swarm-test" rel="noopener noreferrer"&gt;github.com/surajkumar811/swarm-test&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbzuoncbd9ohblhl9wl5r.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbzuoncbd9ohblhl9wl5r.png" alt=" " width="800" height="486"&gt;&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>python</category>
      <category>opensource</category>
      <category>testing</category>
    </item>
    <item>
      <title>🐝 swarm-test v0.1.5: Per-Agent Health Scores</title>
      <dc:creator>suraj kumar</dc:creator>
      <pubDate>Wed, 27 May 2026 19:04:05 +0000</pubDate>
      <link>https://dev.to/suraj_kumar_96bb8767435e2/swarm-test-v015-per-agent-health-scores-8jf</link>
      <guid>https://dev.to/suraj_kumar_96bb8767435e2/swarm-test-v015-per-agent-health-scores-8jf</guid>
      <description>&lt;p&gt;My 14-agent system's results:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;OrchestratorAgent scored 4/100 — single point of failure with 3 collusion cliques&lt;/li&gt;
&lt;li&gt;FaceDetectorAgent scored 44/100 — high cascade depth&lt;/li&gt;
&lt;li&gt;EvolutionAgent scored 50/100 — 100% blast radius&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Stop guessing which agent to fix. Let the data tell you.&lt;/p&gt;

&lt;p&gt;5 features in 5 days. 84 tests. CrewAI + LangGraph.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/surajkumar811/swarm-test" rel="noopener noreferrer"&gt;github.com/surajkumar811/swarm-test&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;#AI #MultiAgent #Reliability #OpenSource&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fiajkjjy2mqo6xstfh2wt.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fiajkjjy2mqo6xstfh2wt.png" alt=" " width="800" height="514"&gt;&lt;/a&gt;&lt;/p&gt;

</description>
      <category>agents</category>
      <category>ai</category>
      <category>showdev</category>
      <category>testing</category>
    </item>
  </channel>
</rss>
