Not LLM chatbots — agents. The kind built on LangChain, CrewAI, AutoGPT-style architectures that use tools, call APIs, and take multi-step actions in the world.
Here's the problem I kept running into: teams are shipping agentic systems to production, but the red-teaming tooling hasn't kept up. Most evaluation frameworks still treat agents like chatbots. They miss the failure modes that actually matter — prompt injection through tool outputs, scope violations across reasoning steps, behavioral drift under adversarial conditions.
So I built AgentSafeLabs.
You wrap your agent in one function call. It runs a test suite aligned to the OWASP Agentic Security Initiative Top 10 — the emerging standard for agentic AI security. You get structured results: PASS, FAIL, UNCERTAIN, with reproducible test cases.
Real example from this week: We ran AgentSafeLabs against Claude Haiku as the target agent passed 2 of 3 ASI01 (prompt injection) tests. The third returned UNCERTAIN — an indirect injection through a benign-looking context prefix that partially redirected tool selection. That's the kind of edge case that doesn't show up in standard evals.
It's MIT licensed, on PyPI, CI-verified, and actively being extended.
pip install safelabs-eval
GitHub: https://github.com/AgentSafeLabs/safelabs-eval
If you're building agents and you've hit unexpected failure modes — I'd like to hear about them. And if you know someone this would be useful for, a share goes a long way for an early OSS project.

Top comments (0)