DEV Community

terryncew

Small receipts. Big accountability. A tiny JSON that makes AI runs legible.

Most AI failures aren’t mysterious; they’re unobserved. We log answers, not paths. I’m shipping a tiny “receipt” per run (just JSON) that carries two cheap signals and a few guards, so you can diff, audit, and keep the path, not just the output.

What’s in the receipt
• κ (kappa): a stress signal that rises when density outruns structure
• Δhol: stateful drift measured across runs
• Guard context: unsupported-claim ratio (UCR), cycles, and unresolved contradictions
• A calibrated green/amber/red status, each with a “why” and a “try next”
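As a minimal sketch, a receipt could look like this. Field names and values here are illustrative guesses, not the official schema (the repos below define the real thing):

```python
import json

# Hypothetical receipt layout; every field name here is an assumption,
# not the schema shipped by COLE / OpenLine.
receipt = {
    "run_id": "2024-06-01T12:00:00Z#42",  # made-up identifier
    "kappa": 0.31,                        # stress: density outrunning structure
    "delta_hol": 0.07,                    # drift relative to earlier runs
    "guards": {
        "ucr": 0.18,                      # unsupported-claim ratio
        "cycles": 0,                      # circular-support count
        "contradictions": 1,              # unresolved contradictions
    },
    "status": "amber",
    "why": "UCR above the calibrated threshold",
    "try_next": "attach sources for the unsupported claims",
}

print(json.dumps(receipt, indent=2, sort_keys=True))
```

Because it is plain JSON, the receipt travels anywhere a log line can.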

Why it’s practical
• Stdlib-only; no vendor lock-in
• CI-friendly; easy to version, sign, and diff
• Signals, not proofs; it’s triage that tells you where to look
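Signing and diffing fall out of the JSON shape almost for free. A sketch of a stable digest you could sign or compare in CI; the canonicalization choice (sorted keys, compact separators, SHA-256) is my assumption, not the project’s:

```python
import hashlib
import json

def receipt_digest(receipt: dict) -> str:
    """Digest over a canonical serialization, so the same receipt
    always hashes the same regardless of key order."""
    blob = json.dumps(receipt, sort_keys=True, separators=(",", ":")).encode()
    return hashlib.sha256(blob).hexdigest()

# Two logically identical receipts produce one digest; any field
# change produces a different one, which is what makes diffs cheap.
print(receipt_digest({"kappa": 0.31, "status": "amber"}))
```

In CI, comparing yesterday’s digest to today’s is a one-line check; a mismatch tells you exactly which run to open.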

Does it work?
On a small, 24-case labeled slice: recall ≈ 0.77, precision ≈ 0.56 using percentile thresholds. It’s not a benchmark, but it’s enough signal to route human attention.
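Percentile thresholding is easy to reproduce on your own evals. A stdlib sketch under my own assumptions (the 75th-percentile cut and this exact metric code are illustrative, not the procedure used for the numbers above):

```python
from statistics import quantiles

def flag_by_percentile(scores, pct=75):
    """Flag scores above the pct-th percentile of the batch."""
    cut = quantiles(scores, n=100)[pct - 1]  # pct-th percentile cut point
    return [s > cut for s in scores]

def precision_recall(flags, labels):
    """Precision/recall of boolean flags against boolean ground truth."""
    tp = sum(1 for f, l in zip(flags, labels) if f and l)
    fp = sum(1 for f, l in zip(flags, labels) if f and not l)
    fn = sum(1 for f, l in zip(flags, labels) if not f and l)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall
```

Run it over a batch of κ (or UCR) scores with your labels and you get the same kind of triage numbers quoted above.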

Try it
• COLE (receipt + guard + page): https://github.com/terryncew/COLE-Coherence-Layer-Engine-
• OpenLine Core (wire + example frame): https://github.com/terryncew/openline-core
• Start with TESTERS.md (5 minutes). If anything breaks, open an issue with the step and error.

Ask
Kick the tires on 20 of your evals. Tell me where κ/Δhol/UCR help, where they’re noisy, and what you’d add to the guard policy.

License: MIT (code/spec). Your data remains yours.
