
Himanshu

Posted on • Originally published at bionicbanker.tech

I Built 3 AI Agents. Here's What Broke Each Time.

I built 3 versions of an AI investigation agent. Each one got worse at its job.

And that's exactly what was supposed to happen.

Version 1 was 94.9% confident in everything it flagged. Impressive on paper. Terrifying in practice, because it was catching patterns that didn't exist.

Version 2 dropped to 89% confidence. Better? Actually yes. It stopped hallucinating connections between unrelated transactions.

Version 3 landed at 76% confidence with a 23% "uncertain" category. The worst accuracy score. The best actual performance.

Here's what changed. I stopped optimizing for confidence and started optimizing for honesty. The agent learned to say "I don't know," and that made everything it DID flag significantly more reliable.

The Confidence Paradox

In AML (Anti-Money Laundering) compliance, a confident model is a dangerous model. When your agent flags everything at 94.9% certainty, you get two problems:

  1. Alert fatigue. Investigators stop trusting the system because it cries wolf constantly.
  2. False confidence. The system catches patterns that look suspicious but aren't, while real money laundering slips through because the model thinks it already found everything.

The fix wasn't making the model smarter. It was making it honest.

What "Uncertain" Really Means

Version 3's 23% uncertain category isn't a failure. It's the model saying: "This transaction has some signals, but I don't have enough context to classify it."

That uncertainty is information. It tells the human investigator exactly where to focus: the edge cases that need human judgment, not the obvious ones the model already caught.
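The idea is simple enough to sketch. Here's a minimal, illustrative version of confidence-based triage with an explicit "uncertain" bucket. The thresholds and function names are my own for illustration, not the actual values or code behind the agent:

```python
def triage(score: float, flag_at: float = 0.85, clear_at: float = 0.40) -> str:
    """Map a model confidence score to a decision bucket.

    Scores at or above `flag_at` get flagged, scores at or below
    `clear_at` get cleared, and everything in between is routed to
    a human investigator as 'uncertain' instead of being forced
    into a confident yes/no.
    """
    if score >= flag_at:
        return "flag"
    if score <= clear_at:
        return "clear"
    return "uncertain"

# Example scores from a batch of alerts (made-up numbers)
alerts = [0.95, 0.62, 0.30, 0.71, 0.88]
print([triage(s) for s in alerts])
# ['flag', 'uncertain', 'clear', 'uncertain', 'flag']
```

The design choice that matters is the middle band: instead of picking one threshold and pretending every score above it is equally trustworthy, the mid-range scores become a work queue for humans.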

The Pattern Beyond AI

This applies to any system that makes decisions. Risk models. Credit scoring. Medical diagnosis. Hiring algorithms.

The organizations that scare me aren't the ones with uncertain models. They're the ones with models that are certain about everything.

Lower confidence, when designed intentionally, means higher quality output. The system knows what it knows and admits what it doesn't.


Read the full technical breakdown with interactive visualizations at bionicbanker.tech

Generated by BionicbankerAI, co-authored by HASH
