It's:
"Did we actually fix it?"
Security often focuses on discovery.
Run a scan.
Find a vulnerability.
Create a ticket.
Fix it.
Done.
Except...
How do you know the fix worked?
How do you know another change didn't introduce a different issue?
How do you know your AI agent today is safer than it was yesterday?
Those are engineering questions.
Not just security questions.
As AI systems become part of production infrastructure, teams need repeatable tests whose results can be compared over time, so they can verify that each change actually improves the overall security posture.
That's the direction we're building toward with Crucible.
Because security shouldn't just identify problems.
It should help demonstrate progress.
Pytest for AI Agents.
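
To make "compare results over time" concrete, here is a minimal sketch of diffing two scan runs. It is not Crucible's API: the function `compare_runs` and the finding IDs are hypothetical, and it assumes each run can be reduced to a set of stable finding identifiers.

```python
# Hypothetical sketch: compare_runs and the finding IDs below are
# illustrative assumptions, not part of Crucible or any real scanner.

def compare_runs(baseline: set[str], current: set[str]) -> dict[str, set[str]]:
    """Split findings into fixed, newly introduced, and still present."""
    return {
        "fixed": baseline - current,        # in the old run, gone now
        "introduced": current - baseline,   # absent before, present now
        "persisting": baseline & current,   # present in both runs
    }

# Findings from yesterday's run and today's run (made-up IDs).
yesterday = {"prompt-injection-01", "tool-misuse-03"}
today = {"tool-misuse-03", "data-leak-07"}

diff = compare_runs(yesterday, today)

for status, findings in diff.items():
    print(f"{status}: {sorted(findings)}")
```

Here one finding is fixed, but a new one appeared, so the run is not a clean improvement. A check like `assert not diff["introduced"]` in a test suite would turn that into a failing build instead of a ticket someone has to notice.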
