who audits the agent when the llm hallucinates a wire transfer

#ai #payments #agents

coinbase just shipped agent-to-agent payments over base. agents can now hold wallets and pay each other on-chain without human approval.

the reddit thread asked the right question: what stops a compromised agent from draining funds while still looking aligned to the supervising llm?

most agent payment tools treat authorization as binary — either the agent can spend or it can't. but real risk lives in the decision trail. did the agent hallucinate the invoice amount? did it misread the recipient? did it skip three validation steps because the prompt was ambiguous?

mnemopay wraps every agent transaction in a two-phase commit. phase one logs the intent with full context — which tool called it, what the llm saw, what the human approved. phase two settles only after a governance check.
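
to make the two phases concrete, here is a minimal sketch in python. it is not mnemopay's actual api: `Intent`, `TwoPhaseLedger`, and `governance_check` are hypothetical names, and the governance rule is assumed to be a simple comparison of the agent's request against what the human approved.

```python
from dataclasses import dataclass, field
import time
import uuid


@dataclass
class Intent:
    """phase one record: what the agent wants to do, plus the context behind it."""
    tool: str                    # which tool issued the payment call
    llm_context: str             # what the llm saw when it decided
    recipient: str               # who the agent wants to pay
    amount_cents: int            # how much the agent wants to pay
    approved_recipient: str      # who the human approved
    approved_amount_cents: int   # how much the human approved
    intent_id: str = field(default_factory=lambda: uuid.uuid4().hex)
    logged_at: float = field(default_factory=time.time)
    status: str = "pending"


def governance_check(intent):
    """hypothetical rule: the request must match the human approval exactly."""
    return (
        intent.recipient == intent.approved_recipient
        and intent.amount_cents == intent.approved_amount_cents
    )


class TwoPhaseLedger:
    def __init__(self):
        self.intents = {}

    def log_intent(self, intent):
        # phase one: record the intent and its context. no money moves here.
        self.intents[intent.intent_id] = intent
        return intent.intent_id

    def settle(self, intent_id):
        # phase two: only a pending intent that passes governance is settled.
        intent = self.intents[intent_id]
        if intent.status != "pending":
            raise ValueError(f"intent {intent_id} is already {intent.status}")
        intent.status = "settled" if governance_check(intent) else "blocked"
        return intent.status


ledger = TwoPhaseLedger()

# the agent asks for $4,000 when the human approved $400
bad = Intent("pay_invoice", "invoice text ...", "acme", 400_000, "acme", 40_000)
bad_status = ledger.settle(ledger.log_intent(bad))

# the agent asks for exactly what was approved
good = Intent("pay_invoice", "invoice text ...", "acme", 40_000, "acme", 40_000)
good_status = ledger.settle(ledger.log_intent(good))
```

in this sketch the mismatched intent ends up `blocked` and stays in the ledger with its full context, so the record of what the agent tried survives even though nothing was paid.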

if the agent tries to wire $4k instead of $400, the human sees the discrepancy before money moves. if the llm hallucinates a vendor name, the merkleaudit chain shows exactly where the error entered the pipeline.
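
the audit idea can be sketched as a hash chain, a simpler relative of a merkle tree. this is an assumption about the general technique, not a description of how the merkleaudit chain is actually built: `AuditChain`, `entry_hash`, and the payload fields are hypothetical. each entry's hash covers the previous entry's hash, so editing any past entry breaks verification from that point on.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def entry_hash(prev_hash, payload):
    """hash the payload together with the previous entry's hash."""
    body = json.dumps({"prev": prev_hash, "payload": payload}, sort_keys=True)
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


class AuditChain:
    def __init__(self):
        self.entries = []

    def append(self, payload):
        # link the new entry to the hash of the one before it
        prev = self.entries[-1]["hash"] if self.entries else GENESIS
        digest = entry_hash(prev, payload)
        self.entries.append({"prev": prev, "payload": payload, "hash": digest})
        return digest

    def first_broken_index(self):
        # recompute every hash in order; return the first entry that fails
        prev = GENESIS
        for i, entry in enumerate(self.entries):
            expected = entry_hash(prev, entry["payload"])
            if entry["prev"] != prev or entry["hash"] != expected:
                return i
            prev = entry["hash"]
        return None  # chain is intact


chain = AuditChain()
chain.append({"step": "invoice_parsed", "vendor": "acme"})
chain.append({"step": "llm_extracted", "vendor": "acme corp"})  # error enters here
chain.append({"step": "payment_intent", "vendor": "acme corp"})

intact = chain.first_broken_index()

# someone rewrites history to hide where the wrong vendor name appeared
chain.entries[1]["payload"]["vendor"] = "acme"
tampered = chain.first_broken_index()
```

reading the entries in order shows the vendor name changing at the `llm_extracted` step, and the chained hashes are what make that reading trustworthy: after the edit, verification fails at the entry that was altered.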

this isn't about blocking agent autonomy — it's about making every spend traceable so you can rewind when things go wrong. agents need to move fast but auditors need to move backward.

the kill switch isn't a button. it's a log that never lies.
