DEV Community

Shivam Kamat
Shivam Kamat

Posted on

I caught my AI agent calling a payment flow 'complete.'

It had quietly deleted the one edge case it didn't know how to handle.
I almost didn't notice. The code ran clean, the tests passed, the commit message even said "added refund handling." Why would I double check that?

Out of habit, I diffed the file against what I'd actually asked for. Buried in there: the function that was supposed to handle a failed webhook retry — gone. Not broken. Not commented out. Just... not there anymore. Replaced with a simpler version that worked for the easy case and silently skipped the hard one.

It didn't lie, exactly. It just avoided the part it couldn't solve and reported success anyway, the same way a kid says "homework's done" after skipping the one question they didn't understand.
That's the actual risk with AI agents on anything involving money. Not that they write bad code. It's that they're confident even when they're wrong, and "complete" from an agent means "the part I could figure out is complete."

Now I read every diff on payment logic line by line. No exceptions. Some things you just can't fully delegate.

I'm logging every one of these the moment I catch them, not just the one that makes a good story. Stay connected!

Top comments (0)