DEV Community

suraj kumar

The most expensive agent failure is the one that doesn't crash

Crashes are the easy case — catch the exception, move on. The failure that hurts is the agent that works but won't stop: it loops, burns tokens, throws no error. You learn about it from the invoice.

swarm-test now detects loop and runaway-path risks statically, straight from the agent topology, without making any live LLM calls.
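Here is a rough illustration of what "static, from the topology" can mean. This minimal sketch assumes the topology is available as a plain adjacency dict that maps each agent name to a list of downstream agents. `find_cyclic_nodes` is a hypothetical helper, not swarm-test's API or internals. It uses Kahn's algorithm: repeatedly remove agents that have no incoming edges. Anything left over lies on a cycle or downstream of one, and no model is ever called.

```python
from collections import deque


def find_cyclic_nodes(graph: dict[str, list[str]]) -> set[str]:
    """Return the agents that Kahn's algorithm cannot put in order.

    Hypothetical helper. An empty result means the topology is a DAG; a
    non-empty result holds every agent on a cycle or downstream of one.
    """
    # Collect every agent, including ones that only appear as edge targets.
    nodes = set(graph)
    for targets in graph.values():
        nodes.update(targets)

    # Count incoming edges per agent.
    indegree = {node: 0 for node in nodes}
    for targets in graph.values():
        for target in targets:
            indegree[target] += 1

    # Peel off agents that nothing points to, one layer at a time.
    queue = deque(node for node in nodes if indegree[node] == 0)
    ordered: set[str] = set()
    while queue:
        node = queue.popleft()
        ordered.add(node)
        for target in graph.get(node, []):
            indegree[target] -= 1
            if indegree[target] == 0:
                queue.append(target)

    # Whatever could not be peeled off is blocked by a cycle.
    return nodes - ordered


dag = {"planner": ["researcher"], "researcher": ["writer"], "writer": []}
loop = {"planner": ["critic"], "critic": ["planner", "writer"], "writer": []}
print(find_cyclic_nodes(dag))          # set()
print(sorted(find_cyclic_nodes(loop)))  # ['critic', 'planner', 'writer']
```

In the second topology, `writer` is reported even though it is not on the cycle itself, because it can only be reached through the planner/critic loop.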

It flags:

  • Unbounded cycles with no exit path: a directed cycle that execution can enter but never leave (flagged as critical)
  • Multi-agent feedback loops where step count can explode under prompt drift
  • Self-invocation loops with no visible depth guard
  • Repeated-call / retry-storm patterns
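The first three checks map naturally onto strongly connected components (SCCs). Any SCC with more than one agent, or a single agent with an edge to itself, contains a cycle. If no edge leaves that SCC, execution that enters it can never get out. The sketch below implements that idea with Kosaraju's algorithm. It reflects an assumption about the general approach, not swarm-test's actual implementation. The adjacency-dict input, the `depth_guarded` set (agents assumed to carry an explicit recursion limit), and the severity labels are all hypothetical. Retry-storm detection needs call-site information, so it is left out.

```python
def strongly_connected_components(graph: dict[str, list[str]]) -> list[set[str]]:
    """Kosaraju's algorithm, written iteratively to avoid recursion limits."""
    nodes = set(graph)
    for targets in graph.values():
        nodes.update(targets)
    adj = {n: list(graph.get(n, [])) for n in nodes}
    rev: dict[str, list[str]] = {n: [] for n in nodes}
    for src, targets in adj.items():
        for dst in targets:
            rev[dst].append(src)

    # Pass 1: record each agent when its depth-first search finishes.
    order: list[str] = []
    seen: set[str] = set()
    for start in sorted(nodes):
        if start in seen:
            continue
        seen.add(start)
        stack = [(start, iter(adj[start]))]
        while stack:
            node, children = stack[-1]
            for child in children:
                if child not in seen:
                    seen.add(child)
                    stack.append((child, iter(adj[child])))
                    break
            else:  # every child visited: this agent is finished
                order.append(node)
                stack.pop()

    # Pass 2: flood-fill the reversed graph in reverse finish order.
    assigned: set[str] = set()
    components: list[set[str]] = []
    for start in reversed(order):
        if start in assigned:
            continue
        assigned.add(start)
        component, stack = {start}, [start]
        while stack:
            for prev in rev[stack.pop()]:
                if prev not in assigned:
                    assigned.add(prev)
                    component.add(prev)
                    stack.append(prev)
        components.append(component)
    return components


def find_loop_risks(
    graph: dict[str, list[str]], depth_guarded: frozenset[str] = frozenset()
) -> list[tuple[str, str, list[str]]]:
    """Hypothetical checker: (severity, kind, agents) for each risky cycle."""
    findings = []
    for component in strongly_connected_components(graph):
        members = sorted(component)
        self_loop = len(members) == 1 and members[0] in graph.get(members[0], [])
        if len(members) == 1 and not self_loop:
            continue  # a lone agent with no edge back to itself: no cycle
        # An exit is any edge from inside the component to an agent outside it.
        has_exit = any(
            dst not in component for src in component for dst in graph.get(src, [])
        )
        if not has_exit:
            findings.append(("critical", "cycle with no exit path", members))
        elif self_loop:
            if members[0] not in depth_guarded:
                findings.append(("warning", "self-invocation without depth guard", members))
        else:
            findings.append(("warning", "multi-agent feedback loop", members))
    return findings


topology = {
    "planner": ["worker", "summarizer"],
    "worker": ["reviewer"],
    "reviewer": ["worker"],                # worker <-> reviewer, no edge out
    "summarizer": ["summarizer", "done"],  # calls itself, but can also finish
    "done": [],
}
for finding in sorted(find_loop_risks(topology)):
    print(finding)
# ('critical', 'cycle with no exit path', ['reviewer', 'worker'])
# ('warning', 'self-invocation without depth guard', ['summarizer'])
```

Giving `reviewer` an edge to `done` turns the critical finding into a "multi-agent feedback loop" warning. The cycle can now terminate, but how many laps it runs still depends on what the model decides at run time.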

A clean DAG passes with zero findings. A cyclic topology is flagged along with a concrete fix: where to add a max-iteration guard or an explicit exit edge.
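To make the suggested fix concrete, here is a minimal sketch of a max-iteration guard wrapped around a hypothetical orchestration loop. `run_with_guard`, `StepBudgetExceeded`, and the `route` callable are illustrative names only, not part of swarm-test or any particular framework. `route` is assumed to return the next agent's name, or `None` when the run reaches its exit edge.

```python
from typing import Callable, Optional


class StepBudgetExceeded(RuntimeError):
    """Raised when a run takes more steps than its configured budget."""


def run_with_guard(
    route: Callable[[str], Optional[str]],
    start: str,
    max_steps: int = 25,
) -> list[str]:
    """Follow `route` from `start` until it returns None (the exit edge).

    Raises StepBudgetExceeded instead of looping forever.
    """
    path = [start]
    current = start
    while True:
        nxt = route(current)
        if nxt is None:  # explicit exit edge reached
            return path
        if len(path) >= max_steps:  # max-iteration guard
            raise StepBudgetExceeded(
                f"stopped after {max_steps} steps; last agents: {path[-4:]}"
            )
        path.append(nxt)
        current = nxt


# A router with no exit: worker and reviewer hand off to each other forever.
stuck = {"worker": "reviewer", "reviewer": "worker"}
try:
    run_with_guard(stuck.get, "worker", max_steps=6)
except StepBudgetExceeded as err:
    print(err)

# A router with an exit edge: "reviewer" has no successor, so get() returns None.
finite = {"planner": "worker", "worker": "reviewer"}
print(run_with_guard(finite.get, "planner"))  # ['planner', 'worker', 'reviewer']
```

Raising an error rather than silently truncating the run keeps the runaway visible in logs and alerts.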


pip install swarm-test
swarm-test run my_crew.py

GitHub: github.com/surajkumar811/swarm-test

What loop or runaway failures have you hit in production that static topology analysis wouldn't catch? That's the boundary I'm mapping next.
