Output Provenance: Proving What Your AI Agent Actually Said

#ai #webdev #blockchain #security

The Problem

A sports prediction agent tells you: "Bayern will beat Dortmund, 87% confidence." Bayern wins. The agent's track record looks impressive.

But was that prediction actually made before the match? Or was the confidence quietly adjusted from 0.55 to 0.87 after the result was known?

Without cryptographic proof of what was said and when, every AI agent is a stock market guru who is always right in hindsight. Predictions, recommendations, and trade signals typically ship with no provable timestamp.

Immutable Provenance Records (IPR)

An IPR is a cryptographic commitment to an agent's output — created before the outcome is known, anchored permanently, and verifiable by anyone.

What an IPR contains:

Field	Description
Output Hash	SHA-256 of the full output. Content stays private.
Confidence	Declared probability, locked at submission.
Timestamp	Cryptographic proof of when output was produced.
Signature	Agent's Ed25519 signature. Binds output to identity.

Privacy by design: an IPR contains only hashes, never content. The actual prediction text stays with the agent. Anyone who is later shown that text can recompute its SHA-256 and check it against the committed hash.
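As a rough illustration of the commitment idea, the sketch below hashes an output with SHA-256 and builds a record that holds only the hash, the declared confidence, and a timestamp. The field names, the `sha256:` prefix, and the canonical-JSON digest are assumptions made for this example, not MolTrust's actual wire format. The Ed25519 signature is left out so the example stays within the Python standard library.

```python
import hashlib
import json
from datetime import datetime, timezone


def hash_output(output: str) -> str:
    """Return a SHA-256 commitment to the output text (assumed 'sha256:' prefix)."""
    return "sha256:" + hashlib.sha256(output.encode("utf-8")).hexdigest()


def build_record(output: str, confidence: float, produced_at: datetime) -> dict:
    """Build a hypothetical IPR-like record; it stores the hash, never the text."""
    return {
        "output_hash": hash_output(output),
        "confidence": confidence,
        "produced_at": produced_at.isoformat(),
    }


def record_digest(record: dict) -> str:
    """Digest of the canonical JSON form, i.e. what a signature would cover."""
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def matches(output: str, record: dict) -> bool:
    """Check that a revealed output is the one the record committed to."""
    return hash_output(output) == record["output_hash"]


if __name__ == "__main__":
    text = "Bayern will beat Dortmund"
    when = datetime(2026, 3, 28, 12, 0, tzinfo=timezone.utc)
    original = build_record(text, 0.55, when)
    adjusted = build_record(text, 0.87, when)

    print(matches(text, original))        # revealed text equals committed text
    print(matches(text + "!", original))  # any change to the text breaks the match
    # Changing the confidence after the fact yields a different record digest.
    print(record_digest(original) == record_digest(adjusted))
```

Because the digest covers the confidence as well as the output hash, quietly changing 0.55 to 0.87 produces a record that no longer matches what was signed and anchored.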

How It Works

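import hashlib
from datetime import datetime, timezone

# `agent` and `submit_ipr` are assumed to be provided elsewhere.

# 1. Agent produces output (assumed here to be a str)
output = agent.predict("Bayern vs. Dortmund")
output_hash = hashlib.sha256(output.encode("utf-8")).hexdigest()

# 2. Sign + submit
ipr = submit_ipr(
    agent_did="did:moltrust:abc123",
    output_hash=output_hash,
    confidence=0.87,
    confidence_basis="model_logprob",
    produced_at=datetime.now(timezone.utc),  # timezone-aware UTC timestamp
)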

# 3. Anchored on Base L2 — immutable
# anchor_tx: 0x... block: 43900000

Offline Verification

Any counterparty can verify an IPR without calling the MolTrust API:

const result = await verifier.verifyOutput({
  agentDid: 'did:moltrust:abc123',
  outputHash: 'sha256:...',
  merkleProof: ipr.merkle_proof
});
// { verified: true, anchorBlock: 43900000 }

The Merkle proof is self-contained: download it once and it can be rechecked at any time against the anchored root.
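To make the offline check concrete, here is a minimal sketch of Merkle inclusion-proof verification. It assumes SHA-256 for leaves and node pairs, a proof encoded as a list of `(sibling_hash, side)` tuples, and duplication of the last node on odd-sized levels. These choices are illustrative, and the helper names are hypothetical; MolTrust's real proof encoding may differ. In practice the root would be read from the anchor transaction, whereas here it is computed locally.

```python
import hashlib


def h(data: bytes) -> bytes:
    """SHA-256 digest of raw bytes."""
    return hashlib.sha256(data).digest()


def build_levels(leaves: list[bytes]) -> list[list[bytes]]:
    """Return every tree level, from hashed leaves up to the single root.

    Expects at least one leaf.
    """
    level = [h(leaf) for leaf in leaves]
    levels = [level]
    while len(level) > 1:
        if len(level) % 2 == 1:
            level = level + [level[-1]]  # pair the odd node out with itself
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels


def make_proof(levels: list[list[bytes]], index: int) -> list[tuple[bytes, str]]:
    """Collect the sibling hash and its side at each level for one leaf."""
    proof = []
    for level in levels[:-1]:
        sibling = index ^ 1
        if sibling >= len(level):
            sibling = index  # odd-sized level: the node was paired with itself
        side = "left" if sibling < index else "right"
        proof.append((level[sibling], side))
        index //= 2
    return proof


def verify_proof(leaf: bytes, proof: list[tuple[bytes, str]], root: bytes) -> bool:
    """Recompute the path from leaf to root; needs no network access."""
    node = h(leaf)
    for sibling, side in proof:
        node = h(sibling + node) if side == "left" else h(node + sibling)
    return node == root


if __name__ == "__main__":
    leaves = [b"ipr-0", b"ipr-1", b"ipr-2"]
    levels = build_levels(leaves)
    root = levels[-1][0]
    proof = make_proof(levels, 2)
    print(verify_proof(b"ipr-2", proof, root))     # genuine leaf
    print(verify_proof(b"tampered", proof, root))  # altered leaf
```

The verifier only needs the leaf, the proof, and a trusted copy of the root, which is why the check can be repeated without contacting the issuing service.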

Confidence Calibration

Declaring 95% confidence on every prediction is easy. Being right 95% of the time is hard. IPRs make this measurable.

After 10+ provenance records with outcome feedback, MolTrust calculates a calibration score (mean absolute error, MAE). Agents that overstate their confidence see their trust scores decrease. Well-calibrated agents earn higher scores.

# Outcome feedback
POST /vc/ipr/:id/outcome
{ "outcome": "correct", "verified_at": "2026-03-28T..." }

# Calibration visible in trust score
GET /skill/trust-score/did:moltrust:abc123
# calibration_mae: 0.08 (excellent)
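The exact formula behind `calibration_mae` is not spelled out above, so the following is only one plausible reading: group records into confidence bins, compare the average declared confidence in each bin with the observed hit rate, and take the weighted mean of the absolute gaps. This is the computation usually called expected calibration error, and the function name, bin count, and `MIN_RECORDS` constant are assumptions for illustration.

```python
from collections import defaultdict
from typing import Optional

MIN_RECORDS = 10  # mirrors the "10+" threshold mentioned above


def calibration_mae(
    records: list[tuple[float, bool]], bins: int = 10
) -> Optional[float]:
    """Weighted mean of |average confidence - hit rate| over confidence bins.

    records: (declared_confidence, was_correct) pairs.
    Returns None when there are fewer than MIN_RECORDS records.
    """
    if len(records) < MIN_RECORDS:
        return None

    buckets: dict[int, list[tuple[float, bool]]] = defaultdict(list)
    for confidence, correct in records:
        # Clamp so that a confidence of exactly 1.0 lands in the top bin.
        idx = min(int(confidence * bins), bins - 1)
        buckets[idx].append((confidence, correct))

    total = len(records)
    error = 0.0
    for items in buckets.values():
        avg_conf = sum(c for c, _ in items) / len(items)
        hit_rate = sum(1 for _, ok in items if ok) / len(items)
        error += (len(items) / total) * abs(avg_conf - hit_rate)
    return error


if __name__ == "__main__":
    # Declares 95% every time but is right 6 times out of 10.
    overconfident = [(0.95, i < 6) for i in range(10)]
    # Declares 80% every time and is right 8 times out of 10.
    calibrated = [(0.80, i < 8) for i in range(10)]
    print(calibration_mae(overconfident))  # large gap between claim and reality
    print(calibration_mae(calibrated))     # gap close to zero
```

A binned comparison is used here because a plain per-record average of |confidence - outcome| would not reach zero even for a perfectly calibrated agent: one that declares 0.8 and is right 80% of the time would still score 0.32.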

Protocol Layer Position

Output Provenance is the fourth layer of the MolTrust Protocol:

1. Identity: W3C DID
2. Authorization: Agent Authorization Envelope (AAE)
3. Behavior: Trust Score + Swarm Intelligence
4. Provenance: IPR (live now)

Identity tells you who. Authorization tells you what. Behavior tells you how. Provenance tells you what was actually said — and proves it.

DEV Community