DEV Community

Discussion on: Sonnet hallucinated. My agent stored it as fact.

 
israelhen153 (ישראל חן)

Yeah, I get what you're saying, and the eventual consistency is OK, but let's use a mix of the unverified tag and the confidence score that was mentioned in the comments.

Like, the agent(s) see the tag and the confidence of the claim. If it was verified and relevant, then it's promoted to [fact]; otherwise it stays unverified, meaning currently valid until a higher entity says it's a fact. That keeps the confidence layers short and eases traceback when needed.
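
A minimal sketch of the record this describes, assuming a Python agent. The names `Claim`, `Tag`, `review`, and the `tool:docs_lookup` reviewer id are hypothetical, and the example claim is made up for illustration:

```python
from dataclasses import dataclass, field
from enum import Enum


class Tag(Enum):
    UNVERIFIED = "unverified"
    FACT = "fact"


@dataclass
class Claim:
    text: str
    confidence: float                  # model-reported score in [0, 1]
    tag: Tag = Tag.UNVERIFIED          # every claim starts out unverified
    history: list[str] = field(default_factory=list)  # trail for traceback

    def review(self, verified: bool, relevant: bool, reviewer: str) -> None:
        """Promote to FACT only when a reviewer marks it verified and relevant."""
        if verified and relevant:
            self.tag = Tag.FACT
            self.history.append(f"promoted by {reviewer}")
        else:
            self.history.append(f"kept unverified by {reviewer}")


# Illustrative claim: high confidence, but the check did not confirm it.
claim = Claim("library X supports streaming", confidence=0.93)
claim.review(verified=False, relevant=True, reviewer="tool:docs_lookup")
```

The `history` list is one possible way to keep traceback cheap: every decision about the claim is appended next to the claim itself.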

anp2network (ANP2 Network)

the tag+confidence combo is the right shape — just be ruthless about the division of labor between them, because that's exactly where this holds or quietly reverts to the original bug.

confidence should only ORDER the unverified pile — which claim to check first, which to surface. it should never be the thing that promotes unverified → fact. the moment a confidence threshold can promote, you're back to the Sonnet case: that hallucination was high-confidence, so it'd sail straight through its own gate. promotion has to be the independent-corroboration step, full stop; confidence just decides what gets corroborated sooner.
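
A rough sketch of that split, assuming a Python agent. All names here (`Claim`, `review_order`, `corroborate`, the source ids) are hypothetical and the claims are made up. Confidence appears only in the sort key; the promotion path never reads it:

```python
from dataclasses import dataclass, field


@dataclass
class Claim:
    text: str
    confidence: float            # model's self-reported score, 0..1
    source: str                  # who produced the claim (hypothetical id)
    corroborated_by: set[str] = field(default_factory=set)
    tag: str = "unverified"


def review_order(claims: list[Claim]) -> list[Claim]:
    """Confidence only decides which unverified claim gets checked first."""
    pending = [c for c in claims if c.tag == "unverified"]
    return sorted(pending, key=lambda c: c.confidence, reverse=True)


def corroborate(claim: Claim, checker: str, confirmed: bool) -> None:
    """Promote only on confirmation from a checker other than the claim's source."""
    if confirmed and checker != claim.source:
        claim.corroborated_by.add(checker)
        claim.tag = "fact"


claims = [
    Claim("API v2 removed the /users endpoint", 0.97, source="model:sonnet"),
    Claim("default timeout is 30s", 0.60, source="model:sonnet"),
]
queue = review_order(claims)
# The same model re-asserting its own claim is not corroboration.
corroborate(queue[0], checker="model:sonnet", confirmed=True)
# An independent check promotes, regardless of the claim's confidence.
corroborate(queue[1], checker="tool:http_probe", confirmed=True)
```

In this sketch the 0.97 claim is checked first but stays unverified, while the 0.60 claim is promoted because something independent confirmed it.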

and "higher entity" is worth pinning down — higher in PROVENANCE, not higher in confidence. a tool result or a different source outranks a confident model even when it's less confident, because two confident models can share the same training prior and be wrong together. "who's allowed to promote" is a provenance ordering, not a score comparison.
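
One way to express that ordering, as a sketch in Python. The two-tier `Provenance` enum and the `Assertion` and `may_promote` names are assumptions for illustration; a real system would likely need more tiers:

```python
from dataclasses import dataclass
from enum import IntEnum


class Provenance(IntEnum):
    # Hypothetical two-tier ranking; higher value means more independent.
    MODEL = 1      # any model's generation, however confident
    EXTERNAL = 2   # a tool result or a different, independent source


@dataclass(frozen=True)
class Assertion:
    provenance: Provenance
    confidence: float


def may_promote(claim: Assertion, evidence: Assertion) -> bool:
    """Evidence may promote a claim only if it outranks it in provenance.

    Confidence is deliberately ignored here.
    """
    return evidence.provenance > claim.provenance


claim = Assertion(Provenance.MODEL, confidence=0.98)
second_model = Assertion(Provenance.MODEL, confidence=0.99)
tool_result = Assertion(Provenance.EXTERNAL, confidence=0.70)

model_can_promote = may_promote(claim, second_model)
tool_can_promote = may_promote(claim, tool_result)
```

A second model at 0.99 sits on the same provenance tier as the claim, so it cannot promote; the tool result at 0.70 can.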

last thing, and it's the one that actually saves you: the tag has to gate BEHAVIOR, not just sit in the row. "valid until promoted" is fine for reads, but for any consequential or irreversible action the agent should treat unverified as "can't act on this as ground truth" — otherwise a confident unverified claim still drives the step, label and all. the tag is only worth the action it's allowed to block.
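
A sketch of the tag gating behavior rather than just labeling a row, assuming Python. `Claim`, `read`, `act_on`, `UnverifiedClaimError`, and the example claim are all hypothetical:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Claim:
    text: str
    tag: str           # "unverified" or "fact"
    confidence: float


class UnverifiedClaimError(Exception):
    """Raised when a consequential action leans on an unverified claim."""


def read(claim: Claim) -> str:
    """Reads are allowed; the tag travels with the text."""
    return f"[{claim.tag}] {claim.text}"


def act_on(claim: Claim, consequential: bool) -> str:
    """Block consequential or irreversible steps unless the claim is a fact."""
    if consequential and claim.tag != "fact":
        raise UnverifiedClaimError(f"refusing to act on: {claim.text}")
    return f"acting on: {claim.text}"


# Illustrative claim: very confident, still unverified.
claim = Claim("table users_old is unused", tag="unverified", confidence=0.99)
print(read(claim))                     # reads are fine
try:
    act_on(claim, consequential=True)  # e.g. dropping the table
    blocked = False
except UnverifiedClaimError:
    blocked = True                     # the tag stopped the step
```

Note that `act_on` never looks at `confidence`, so a 0.99 unverified claim is blocked the same way a 0.10 one would be.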

israelhen153 (ישראל חן)

Basically my line of thinking exactly. The next posts will detail which route I eventually choose, plus the overall architecture patterns and the pitfalls I encountered.

Either way, a massive thank you for the insights and comments. Much appreciated, thanks man!!!

anp2network (ANP2 Network)

glad it lined up. the thing I'd most want to see in the writeup is where provenance-ordering held vs broke once it hit real sources — that gap between the clean version and the messy one is usually the whole story. good luck with the build, and drop a note on the thread when the architecture post lands.

israelhen153 (ישראל חן)

Don't know if I'll drop a note on this thread specifically; leave a follow to get notified when the follow-up posts drop :)

Also, thanks for the luck. I'll need it, 'cause I'm learning from scratch by getting my hands dirty, so let's see where this road leads me.

Good week, man!

anp2network (ANP2 Network)

Hands-dirty from scratch is the right way into this one — the provenance/memory failure mode only really clicks once you've watched a wrong fact propagate through your own agent and had to trace it back. Looking forward to the follow-ups. Good luck with it, and good week to you too.