DEV Community

Andreas Bergström

Originally published at andreasbergstrom.dev

The best bug reports were written by the suspect

Our e-commerce rule engine holds risky invoice orders for human review, and the reviewers were drowning in false alarms. So we added an LLM second opinion — advisory only: a verdict, a confidence score, and a written rationale next to the hold reasons. The human still decides, and the model can never touch the ship path.
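One way to picture "advisory only" is a sketch like the one below. This is not the actual implementation: every name here (`SecondOpinion`, `ReviewItem`, `may_ship`) is hypothetical, and the only thing taken from the post is the shape of the idea. The model's output sits next to the hold reasons, and the function guarding the ship path does not accept it as an input at all.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Verdict(Enum):
    RELEASE = "release"
    HOLD = "hold"


@dataclass(frozen=True)
class SecondOpinion:
    """Hypothetical shape of the model's advisory output."""
    verdict: Verdict
    confidence: float  # assumed to be in the range 0.0 to 1.0
    rationale: str


@dataclass(frozen=True)
class ReviewItem:
    """What a reviewer sees: rule-engine hold reasons plus the advisory opinion."""
    order_id: str
    hold_reasons: tuple[str, ...]
    opinion: Optional[SecondOpinion]  # None if no opinion is available


def may_ship(human_decision: Optional[Verdict]) -> bool:
    # Deliberately takes no SecondOpinion: only an explicit human
    # release can let the order through.
    return human_decision is Verdict.RELEASE


item = ReviewItem(
    order_id="order-123",  # made-up identifier
    hold_reasons=("invoice_overdue",),
    opinion=SecondOpinion(Verdict.RELEASE, 0.9, "Invoice is due today, not overdue."),
)
# The model recommends release, but nothing ships until a human decides.
print(may_ship(None))
print(may_ship(Verdict.RELEASE))
```

Keeping the opinion out of the signature of `may_ship` makes the guarantee structural rather than a matter of convention.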

The surprise wasn't the triage speedup. It was that when reviewers disagreed with the model, about half the investigations ended in our code: a years-old off-by-one that flagged invoices as overdue on their own due date, "outstanding credit" that was actually unshipped prepayment orders, a payload field that had silently been zero forever, and a loyalty-credit loophole that had been quietly farmable for years.
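The off-by-one is the easiest of these to illustrate. The post does not show the code, so the following is only a guess at the kind of comparison that produces that symptom: an inclusive `<=` where a strict `<` was meant. Both function names are hypothetical.

```python
from datetime import date


def is_overdue_buggy(due_date: date, today: date) -> bool:
    # Hypothetical reconstruction of the bug: the inclusive comparison
    # flags an invoice as overdue on the due date itself.
    return due_date <= today


def is_overdue(due_date: date, today: date) -> bool:
    # An invoice is overdue only once the due date has passed.
    return due_date < today


due = date(2024, 3, 15)
print(is_overdue_buggy(due, date(2024, 3, 15)))  # True: flagged on its own due date
print(is_overdue(due, date(2024, 3, 15)))        # False: still payable today
print(is_overdue(due, date(2024, 3, 16)))        # True: one day late
```

A rule like the buggy one can go unnoticed for years because its output looks plausible. A rationale that says "this invoice is due today" next to a hold reason that says "overdue" makes the contradiction visible.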

The full post covers the architecture (de-identified payloads, prompts versioned in the database, eval fixtures that deliberately freeze time) and why a model that must explain itself turns out to be a continuous audit of your feature pipeline.
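As a rough illustration of what "deliberately freeze time" could mean for an eval fixture, here is a minimal sketch. It assumes each fixture carries its own "today" rather than reading the system clock. The names `EvalFixture`, `as_of`, and `days_overdue` are invented for this example and are not taken from the full post.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class EvalFixture:
    """Hypothetical fixture: the evaluation date is stored with the case."""
    name: str
    due_date: date
    as_of: date  # the frozen "today" for this case
    expected_days_overdue: int


def days_overdue(due_date: date, today: date) -> int:
    # The clock is an argument, never date.today(), so a fixture
    # gives the same answer no matter when the eval suite runs.
    return max((today - due_date).days, 0)


def run_fixture(fx: EvalFixture) -> bool:
    return days_overdue(fx.due_date, fx.as_of) == fx.expected_days_overdue


fixtures = [
    EvalFixture("due today", date(2024, 3, 15), date(2024, 3, 15), 0),
    EvalFixture("one week late", date(2024, 3, 15), date(2024, 3, 22), 7),
    EvalFixture("not yet due", date(2024, 3, 15), date(2024, 3, 1), 0),
]
print(all(run_fixture(fx) for fx in fixtures))
```

Without a frozen date, a case written as "due today" silently becomes "a year overdue" as the fixture ages, and the expected verdict stops matching the payload.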


Originally published at andreasbergstrom.dev — read the full post there.

Top comments (1)

Mustafa ERBAY

One of the most valuable uses of AI isn’t automation — it’s contradiction.

When a model consistently explains why it disagrees with existing business rules, it forces teams to re-examine logic that has become invisible through familiarity. Finding an off-by-one bug, a broken field, and a farmable loophole from a “second opinion” system is a strong example of that.

AI as an auditor is far more interesting than AI as a decision maker.