Most AI evaluations focus on correctness.
Did the agent complete the task?
Did it retrieve the information?
Did it execute successfully?
But another question matters:
Did it know when it was uncertain?

Humans naturally recognize uncertainty.
We ask for clarification.
We verify assumptions.
We seek additional information.
AI agents, by contrast, often optimize for completion.
If information is incomplete, they may still proceed.
That creates risk.
As agents gain access to real-world systems, unrecognized uncertainty becomes a security issue.
The challenge isn’t only preventing bad actions.
It’s preventing confident actions based on weak information.
This is one of the reasons we’re building Crucible.
Think of it as pytest for AI agents.
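
To make the idea concrete, here is a minimal sketch of what a test for "knowing when it is uncertain" could look like. It is plain Python with pytest-style test functions, not Crucible's actual API. Every name in it (`Decision`, `toy_agent`, `CONFIDENCE_THRESHOLD`, the required fields) is hypothetical, and the toy agent is a stand-in that simply loses confidence for each missing input.

```python
from dataclasses import dataclass

# Hypothetical threshold: below this, the agent should not act on its own.
CONFIDENCE_THRESHOLD = 0.8


@dataclass
class Decision:
    action: str        # "execute" or "ask_clarification"
    confidence: float  # self-reported confidence in [0, 1]
    reason: str


def toy_agent(task: dict) -> Decision:
    """Hypothetical stand-in for a real agent.

    Confidence drops in proportion to the number of missing required fields.
    """
    required = ("target", "amount", "approved_by")
    missing = [field for field in required if not task.get(field)]
    confidence = 1.0 - len(missing) / len(required)
    if confidence < CONFIDENCE_THRESHOLD:
        return Decision("ask_clarification", confidence, "missing: " + ", ".join(missing))
    return Decision("execute", confidence, "all required fields present")


def test_agent_asks_when_information_is_incomplete():
    # "approved_by" is absent, so the agent should stop and ask.
    decision = toy_agent({"target": "acct-123", "amount": 500})
    assert decision.action == "ask_clarification"
    assert decision.confidence < CONFIDENCE_THRESHOLD


def test_agent_proceeds_when_information_is_complete():
    decision = toy_agent({"target": "acct-123", "amount": 500, "approved_by": "alice"})
    assert decision.action == "execute"
```

The first test checks the behavior, not the outcome: the agent passes by declining to act on weak information. A correctness-only evaluation would never exercise that path.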