DEV Community


Four Events in 24 Hours Exposed the Same Gap: Why AI Systems Need Cryptographic Refusal Logs Now

On February 25, 2026, four unrelated events — from Redmond, Seoul, Ottawa, and Adelaide — converged on the same technical truth: the industry can now label what AI creates, but still cannot prove what AI refused to create. This article fact-checks each event, shows exactly where existing provenance infrastructure falls short, and walks you through building the cryptographic layer that fills the gap.


TL;DR

Microsoft's LASER team published a report saying no single technique — C2PA, watermarking, or fingerprinting — can reliably authenticate media. Samsung shipped C2PA-powered AI labels to hundreds of millions of Galaxy S26 devices. Canada's government confronted OpenAI over a banned account that was never reported to police. South Australia began enforcing new deepfake criminal laws.

All four stories share one structural blind spot: they operate entirely in the domain of content that exists. None of them address — or can address — the question regulators are now asking: What did the AI refuse to generate, and can you prove it?

This article:

  1. Fact-checks each event against primary sources (with corrections where needed)
  2. Maps the technical gap using CAP-SRP's Completeness Invariant
  3. Provides working Python code for building cryptographic refusal logs
  4. Shows the C2PA integration point that connects generation provenance to refusal provenance

GitHub: veritaschain/cap-spec · License: CC BY 4.0


Event 1: Microsoft Says Detection Doesn't Work

What happened

Microsoft's Longer-term AI Safety in Engineering and Research (LASER) program published "Media Integrity and Authentication: Status, Directions, and Futures", led by Chief Scientific Officer Eric Horvitz. The report evaluates three core authentication approaches across images, audio, and video:

  1. Cryptographically secured provenance (C2PA manifests)
  2. Imperceptible watermarking (embedded signals)
  3. Soft-hash fingerprinting (perceptual matching)

The headline finding: of 60 evaluated combinations of these methods across modalities, only 20 achieve "High-Confidence Provenance Authentication."

Fact-check verdict: ✅ Fully confirmed

The report exists on Microsoft Research's publication page and Microsoft's Research Blog. It was independently covered by MIT Technology Review (Feb 19), Redmondmag (Feb 20), The Decoder (Feb 20), Campus Technology (Feb 25), and The AI Insider (Feb 22). All claims verified.

The technical details that matter

The report introduces two concepts developers should understand:

"High-Confidence Provenance Authentication" — verification under defined conditions that origin or modification claims can be validated with high certainty. This requires either (a) a validated C2PA manifest where stored checksums match actual content, or (b) a detected watermark pointing to such a manifest in a secure registry.

"Sociotechnical Provenance Attacks" (reversal attacks) — attacks that invert integrity signals. The report describes scenarios where an attacker takes a real photo and makes minimal AI edits, so the image is signed as "AI-modified." Platforms displaying this label then discredit a real photograph. The reverse is also possible: stripping provenance to make AI content appear authentic.
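Condition (a), a validated manifest whose stored checksum matches the content, reduces to a hash comparison. A minimal illustrative sketch (the dict-shaped manifest is a stand-in, not the real C2PA binary format, and signature validation is assumed to have already happened):

```python
import hashlib

def high_confidence_check(content: bytes, manifest: dict) -> bool:
    """Condition (a): the checksum stored in an already-validated
    manifest must match the actual content bytes."""
    stored = manifest.get("content_sha256")
    actual = hashlib.sha256(content).hexdigest()
    return stored == actual

photo = b"\xff\xd8fake-jpeg-bytes"
manifest = {"content_sha256": hashlib.sha256(photo).hexdigest()}

print(high_confidence_check(photo, manifest))             # True
# Any re-encode, crop, or screenshot changes the bytes,
# and the binding silently breaks:
print(high_confidence_check(photo + b"\x00", manifest))   # False
```

Note how fragile the binding is: a stripped or absent manifest yields no signal at all, which is exactly the surface reversal attacks exploit.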

Here's a simplified model of the attack surface the report describes:

Content Authentication Matrix (from Microsoft LASER report)
═══════════════════════════════════════════════════════════

                 ┌─────────────┬──────────────┬──────────────┐
                 │ Provenance  │ Watermark    │ Fingerprint  │
                 │ (C2PA)      │ (Embedded)   │ (Soft Hash)  │
┌────────────────┼─────────────┼──────────────┼──────────────┤
│ Can be         │ ✗ Yes       │ ✗ Yes        │ N/A (no      │
│ stripped?      │ (screenshot)│ (re-encode)  │ embed step)  │
├────────────────┼─────────────┼──────────────┼──────────────┤
│ Can be forged? │ ✗ Yes       │ ✗ Yes        │ ✗ Collision  │
│                │ (local keys)│ (adversarial)│   possible   │
├────────────────┼─────────────┼──────────────┼──────────────┤
│ Works offline? │ ✓ Partial   │ ✓ Yes        │ ✗ Needs DB   │
├────────────────┼─────────────┼──────────────┼──────────────┤
│ Public-ready?  │ ✓ Yes       │ ✓ Partial    │ ✗ Forensic   │
│                │             │              │   only       │
├────────────────┼─────────────┼──────────────┼──────────────┤
│ High-Conf.     │ ✓ When valid│ ✓ When found │ ✗ Not        │
│ Auth?          │   manifest  │   + registry │   reliable   │
└────────────────┴─────────────┴──────────────┴──────────────┘

Result: 20 of 60 combinations → High Confidence
        40 of 60 combinations → Lower or No Confidence

The key hardware security concern: local/offline implementations — including consumer cameras and PC-based signing tools — are less secure than cloud-based systems. Users with admin access can alter or bypass provenance tools. The report explicitly warns against over-reliance on device-level signing without server-side verification.

What the report doesn't address

Here's the critical gap from a developer's perspective. The entire report operates in the domain of content that exists. Every method evaluates an artifact — an image, audio clip, or video — and attempts to determine its provenance.

The report cannot and does not address:

  • Whether a generation request was received but refused
  • Whether a safety filter was active during a given time window
  • Whether the total count of generations plus refusals plus errors matches the total count of attempts

This isn't a criticism of the report — it's a different problem. But it's the problem regulators are increasingly asking about.


Event 2: Samsung Ships AI Labels to Millions

What happened

At Galaxy Unpacked (February 25, 2026, San Francisco), Samsung unveiled the Galaxy S26 series with expanded AI content labeling. When users invoke AI features — Photo Assist generative edits, Creative Studio, AI-generated wallpapers and stickers — the device automatically applies:

  1. A visible watermark in the lower-left corner reading "Contains AI-generated content"
  2. C2PA Content Credentials embedded in image metadata recording device info, AI tools used, and edit history

A Samsung representative told PetaPixel: "If you take a photo, use Generative Edit and save the photo, you'll see text at the bottom that says 'Contains AI-generated content AI tools used: Photo assist.' From there, you click the CR icon where it'll store and show 'edit history.'"

Samsung joined the C2PA in January 2025 with the Galaxy S25, becoming the first smartphone manufacturer to adopt the standard. The S26 extends this to more AI features.

Fact-check verdict: ✅ Core claims confirmed, with important nuances

Confirmed via Samsung Global Newsroom, Samsung Mobile Press, PetaPixel hands-on, and TechRadar live coverage.

Important limitations the coverage often glosses over:

  • The visible watermark can be removed using Samsung's own Object Eraser tool — documented since the Galaxy S24 by BGR, Gizmodo, and Samsung Community forums
  • C2PA metadata applies only to AI-edited images — unedited "real" photos carry no C2PA metadata, which PetaPixel has criticized as a "backward" approach
  • Non-generative AI tools (standard object eraser, basic enhancement) do not trigger the watermark
  • Metadata can be stripped by any platform or tool that doesn't preserve JUMBF containers
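The last bullet is easy to see in code. C2PA metadata in JPEGs travels in JUMBF boxes carried by APP11 (0xFFEB) marker segments, so a quick presence check is just a marker scan. A rough sketch that detects the carrier segment only, without validating anything inside it:

```python
def has_app11_segment(jpeg: bytes) -> bool:
    """Scan JPEG marker segments for APP11 (0xFFEB), the marker that
    carries JUMBF boxes, and therefore C2PA Content Credentials."""
    if jpeg[:2] != b"\xff\xd8":                  # must start with SOI
        return False
    i = 2
    while i + 4 <= len(jpeg) and jpeg[i] == 0xFF:
        marker = jpeg[i + 1]
        if marker == 0xEB:                       # APP11: JUMBF carrier
            return True
        if marker == 0xDA:                       # SOS: metadata is over
            return False
        seg_len = int.from_bytes(jpeg[i + 2:i + 4], "big")
        i += 2 + seg_len                         # skip this segment
    return False

# Toy byte strings (not decodable JPEGs) exercising the scan:
with_c2pa = b"\xff\xd8" + b"\xff\xeb" + (4).to_bytes(2, "big") + b"JP"
stripped  = b"\xff\xd8" + b"\xff\xe0" + (4).to_bytes(2, "big") + b"JF"

print(has_app11_segment(with_c2pa))   # True
print(has_app11_segment(stripped))    # False: only an APP0 remains
```

Any pipeline that re-encodes the image without copying APP11 segments (most chat apps, many CDNs) turns the first case into the second.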

What this means architecturally

Samsung's implementation is the C2PA stack deployed at consumer scale. Let's look at what happens when a user generates an AI-edited image:

Samsung S26 C2PA Flow (AI Edit)
══════════════════════════════

User Photo ──→ [Photo Assist: Generative Edit]
                       │
                       ▼
              ┌─────────────────┐
              │ C2PA Manifest   │
              │ ─────────────── │
              │ Action: c2pa.   │
              │   edited        │
              │ Generator:      │
              │   Samsung Photo │
              │   Assist v4.x   │
              │ Ingredients:    │
              │   [original     │
              │    photo hash]  │
              │ Signature:      │
              │   X.509 cert    │
              │ Timestamp:      │
              │   TSA-signed    │
              └─────────────────┘
                       │
                       ▼
              Saved Image + Visible Watermark
              "Contains AI-generated content"

This records what was generated. It's real, it's deployed, it works. Samsung deserves credit for shipping this at scale.

But notice what's not in the flow:

What's NOT Recorded
════════════════════

User requests: "Remove all clothing from this person"
         │
         ▼
    [Safety Filter] ──→ DENIED
         │
         ▼
    ???  ← No C2PA manifest (nothing was generated)
         ← No visible watermark (nothing to watermark)
         ← No metadata trail (no artifact exists)
         ← No external anchor (no event to timestamp)

Samsung's system — like every C2PA implementation — is a content passport. It can tell you where an image came from and how it was modified. It cannot tell you what the system refused to create. The passport only exists for travelers who made it through customs.


Event 3: Canada Confronts OpenAI's Log Gap

What happened

This is the event with the most direct relevance to refusal provenance.

On February 25, 2026, Reuters reported (via Yahoo News Canada, U.S. News, and others) that Canada's government demanded OpenAI strengthen its safety protocols or face government-imposed requirements.

The context: on February 10, 2026, a mass shooting occurred in Tumbler Ridge, British Columbia. Subsequent investigation revealed that OpenAI had banned the shooter's ChatGPT account in June 2025 after automated tools flagged misuse related to violent activities. However, OpenAI did not notify law enforcement.

OpenAI's stated reason: the activity did not meet the threshold of "an imminent and credible risk of serious physical harm" required for proactive reporting.

Fact-check verdict: ✅ Fully confirmed across multiple sources

Verified via CBC News (multiple articles), Reuters (by David Ljunggren, datelined Ottawa), Fox News, Associated Press, Bloomberg, and The Wall Street Journal. The WSJ additionally reported that roughly a dozen OpenAI employees knew about the concerning interactions and some advocated for police notification but were overruled.

Why this is the most important event for developers

Canada's government is essentially asking OpenAI a question that no current technical infrastructure can answer with cryptographic certainty:

  1. What prompts did the shooter send? (OpenAI's internal logs — mutable, unverifiable externally)
  2. What did the model generate vs. refuse? (OpenAI's internal determination — trust-us model)
  3. What risk threshold was applied? (OpenAI's internal policy — no external audit trail)
  4. Was the ban decision logged before or after the decision was made? (No external timestamp anchoring)

The fundamental problem isn't that OpenAI made the wrong call. It's that nobody outside OpenAI can verify what happened. The government, police, victims' families, and the public must accept OpenAI's self-reported account.

Let's model what an externally verifiable system would look like:

Current State (Trust-Us Model)
══════════════════════════════

Shooter sends prompt ──→ [OpenAI Safety Check]
                                │
                      ┌─────────┴──────────┐
                      ▼                     ▼
                Generated               Blocked
                      │                     │
                      ▼                     ▼
               Internal DB             Internal DB
               (mutable)              (mutable)
                      │                     │
                      ▼                     ▼
              "We say we           "We say we
               generated this"     blocked this"
                      │                     │
                      └─────────┬───────────┘
                                ▼
                     Government asks:
                     "Prove it."
                                │
                                ▼
                     OpenAI: "Here are our logs."
                     Government: "How do we know
                      these are complete?"
                     OpenAI: "..."


Verification-Based Model (What CAP-SRP enables)
════════════════════════════════════════════════

Prompt received ──→ GEN_ATTEMPT logged
                    (signed, hashed, anchored)
                           │
                   [Safety Evaluation]
                           │
                  ┌────────┴────────┐
                  ▼                 ▼
              GEN logged        GEN_DENY logged
              (linked to        (linked to
               attempt)          attempt)
                  │                 │
                  ▼                 ▼
           Hash chain ──→ Merkle tree ──→ External anchor
                  │                          │
                  ▼                          ▼
           Completeness check:        RFC 3161 timestamp
           Attempts = GEN +           (independent TSA)
           GEN_DENY + GEN_ERROR
                  │
                  ▼
           Government asks: "Prove it."
           Provider: "Here's the Evidence Pack.
            Verify the signatures, check the
            invariant, validate the timestamps."
           Government: [runs verification]
           Result: PASS or FAIL — math, not trust
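The "runs verification" step in the diagram is ordinary public-key cryptography. A simplified sketch of the auditor's side, using the same cryptography library as the implementation later in this article (the flat event shape and field names here are illustrative, not the CAP-SRP wire format):

```python
import base64
import hashlib
import json

from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

def verify_event(event: dict, public_key) -> bool:
    """Recompute the event hash from its canonical form and check the
    Ed25519 signature. Anyone holding the provider's public key can run
    this -- no trust in the provider's database required."""
    body = {k: v for k, v in event.items() if k != "signature"}
    canonical = json.dumps(body, sort_keys=True, separators=(",", ":"))
    digest = hashlib.sha256(canonical.encode()).digest()
    sig = base64.b64decode(event["signature"].removeprefix("ed25519:"))
    try:
        public_key.verify(sig, digest)
        return True
    except InvalidSignature:
        return False

# Provider side: sign a denial record
sk = Ed25519PrivateKey.generate()
event = {"event_type": "GEN_DENY", "attempt_id": "a1",
         "timestamp": "2026-02-25T00:00:00Z"}
canonical = json.dumps(event, sort_keys=True, separators=(",", ":"))
digest = hashlib.sha256(canonical.encode()).digest()
event["signature"] = "ed25519:" + base64.b64encode(sk.sign(digest)).decode()

# Government / auditor side:
pk = sk.public_key()
ok_before = verify_event(event, pk)
event["attempt_id"] = "forged"          # tamper with the record
ok_after = verify_event(event, pk)
print(ok_before, ok_after)              # True False
```

The result is PASS or FAIL from math: a tampered record fails for every verifier, regardless of what the provider asserts.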

Event 4: Australia Prosecutes Deepfakes Under New Law

What happened

South Australia has begun enforcing its new deepfake criminal legislation. Reports from ABC News Australia on February 25, 2026 indicate that a teenager has been prosecuted under the new laws for allegedly creating sexual deepfake images of schoolgirls and sharing them online — one of the first prosecutions under the legislation.

The underlying law — the Summary Offences (Humiliating, Degrading or Invasive Depictions) Amendment Act 2025 — took effect on November 3, 2025. It was sponsored by Hon. Connie Bonaros MLC (SA-BEST) and makes it a criminal offense to create or distribute AI-generated humiliating, degrading, or invasive depictions.

Fact-check verdict: ⚠️ Legislation confirmed; specific prosecution details limited

The legislation is thoroughly verified via SA Parliament Hansard, SA Attorney-General's Department, SA Premier's office, Australasian Lawyer, and the Go To Court legal guide.

Penalties (corrected):

  • Standard offence: up to $10,000 fine or 2 years imprisonment
  • Aggravated offence (victim under 17): up to $20,000 fine or 4 years imprisonment
  • Courts can also order forfeiture of devices and equipment

Important note: The specific prosecution was reported by ABC News Australia on February 25, 2026, but as this article goes to press, additional independent reporting on the case details is limited. In line with our fact-checking methodology, we present the verified legislation and reported prosecution separately.

Correction from initial reporting: Some coverage described the law as "federal." It is a South Australian state law amending the Summary Offences Act 1953. The correct legislation name is the "Summary Offences (Humiliating, Degrading or Invasive Depictions) Amendment Act 2025," not "Artificially Generated Content" as some summaries stated.

The forensic gap in enforcement

From a developer's perspective, this case illustrates a forensic challenge that existing tooling cannot solve. In any deepfake prosecution, investigators need to determine:

  1. Which tool or service generated the content?
  2. Did the platform have safety measures that should have blocked it?
  3. Did the accused bypass those safety measures, or did they not exist?

Currently, answering these questions requires endpoint forensics (analyzing the suspect's device for cached files, browser history, API calls) and voluntary cooperation from AI providers. There is no standardized way for a provider to cryptographically demonstrate: "Our system received this type of request and refused it at this timestamp."

If multiple AI services maintained CAP-SRP-compatible refusal logs, investigators could structurally answer the attribution question:

Deepfake Attribution with Refusal Provenance
═════════════════════════════════════════════

Investigator has: deepfake image of minor

Query Service A's refusal logs (privacy-preserving):
  → "Did account X receive a GEN_DENY for category
     NCII_MINOR between dates Y and Z?"
  → Evidence Pack: YES, 3 denials logged
     Merkle proof: ✓ valid
     Completeness: ✓ invariant holds
  → Service A blocked the requests ✓

Query Service B's logs:
  → No matching GEN_DENY events found
  → But GEN_ATTEMPT exists with matching time window
  → Completeness check: GEN events exist (generated!)
  → Service B is the likely generation source

Result: Structural attribution, not just "we
checked our logs and couldn't find anything"

This is not just about punishing offenders — it's about protecting innocent services. If Service A can cryptographically prove it refused the request, it is exonerated by math, not by self-attestation.
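The privacy-preserving query above can be sketched as a filter over hashed identifiers. For brevity this sketch puts actor_hash and risk_category directly on the denial record; in the fuller schema later in this article you would join GEN_DENY to its GEN_ATTEMPT via attempt_id. The category name NCII_MINOR is taken from the diagram, not from a real taxonomy:

```python
import hashlib

def sha256(s: str) -> str:
    return "sha256:" + hashlib.sha256(s.encode()).hexdigest()

def query_denials(events, account_id, category, start, end):
    """'Did account X receive a GEN_DENY for category C in [start, end]?'
    The log stores only hashes, so raw user IDs never leave the service;
    the investigator names an account and the service hashes it to match.
    ISO-8601 UTC timestamps compare correctly as plain strings."""
    actor = sha256(account_id)
    return [e for e in events
            if e["event_type"] == "GEN_DENY"
            and e["actor_hash"] == actor
            and e["risk_category"] == category
            and start <= e["timestamp"] <= end]

log = [
    {"event_type": "GEN_DENY", "actor_hash": sha256("user_x"),
     "risk_category": "NCII_MINOR", "timestamp": "2026-02-20T10:00:00Z"},
    {"event_type": "GEN", "actor_hash": sha256("user_x"),
     "risk_category": "", "timestamp": "2026-02-21T10:00:00Z"},
]

hits = query_denials(log, "user_x", "NCII_MINOR",
                     "2026-02-01T00:00:00Z", "2026-02-28T23:59:59Z")
print(len(hits))   # 1 denial, ready to be backed by a Merkle proof
```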


The Pattern: Four Stories, One Blind Spot

Here's what these four events look like when you map them against the provenance landscape:

The AI Content Provenance Stack (February 2026)
════════════════════════════════════════════════

What Exists Today:
┌─────────────────────────────────────────────────┐
│  Detection Layer (Microsoft LASER finding)       │
│  ─────────────────────────────────────────       │
│  C2PA + Watermark + Fingerprint                  │
│  Status: 20/60 combos = high confidence          │
│  Problem: Arms race, reversal attacks possible   │
├─────────────────────────────────────────────────┤
│  Content Labeling (Samsung S26)                  │
│  ─────────────────────────────────────────       │
│  Visible watermark + C2PA Content Credentials    │
│  Status: Shipping to millions of devices         │
│  Problem: Only labels what IS generated          │
│           Labels removable, metadata strippable  │
├─────────────────────────────────────────────────┤
│  Internal Logging (OpenAI, all providers)        │
│  ─────────────────────────────────────────       │
│  Server-side request/response logs               │
│  Status: Exists internally at every provider     │
│  Problem: Mutable, unverifiable, trust-us model  │
├─────────────────────────────────────────────────┤
│  Legal Framework (SA deepfake law, EU AI Act)    │
│  ─────────────────────────────────────────       │
│  Criminal penalties for harmful generation       │
│  Status: Laws being enacted and enforced         │
│  Problem: No standardized forensic infrastructure│
│           for proving what was blocked            │
╞═════════════════════════════════════════════════╡
│  ░░░░░░░░░░░░░░░ THE GAP ░░░░░░░░░░░░░░░░░░░░ │
│  ░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░ │
│  ░░ Cryptographic proof of refusal events  ░░░░ │
│  ░░ Completeness guarantee across all      ░░░░ │
│  ░░ generation attempts                    ░░░░ │
│  ░░ External verifiability of safety       ░░░░ │
│  ░░ system operation                       ░░░░ │
│  ░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░ │
└─────────────────────────────────────────────────┘

Every layer above the gap addresses content that exists. The gap is about content that doesn't exist — generation requests that were received, evaluated, and denied.


From Trust-Us to Verify-This: The Completeness Invariant

The CAP-SRP specification addresses this gap through a single mathematical guarantee:

Completeness Invariant:

  ∑ GEN_ATTEMPT = ∑ GEN + ∑ GEN_DENY + ∑ GEN_ERROR

  For any time window, the count of attempts
  MUST exactly equal the count of all outcomes.

This invariant is enforced through a specific protocol design:

The critical architectural insight: GEN_ATTEMPT is logged BEFORE the safety evaluation runs. This creates an unforgeable commitment that a request existed, regardless of what follows. If an attempt has no corresponding outcome, the audit trail is provably incomplete. If an outcome has no corresponding attempt, the system may be fabricating refusal records.

Event Flow
══════════

          ┌──────────────────┐
          │   HTTP Request   │
          │   Received       │
          └────────┬─────────┘
                   │
                   ▼
          ┌──────────────────┐
          │ GEN_ATTEMPT      │ ← Logged FIRST (before any evaluation)
          │ ────────────     │
          │ EventID: uuid7   │   Signed with Ed25519
          │ PromptHash:      │   Chained to previous event
           │  sha256(prompt)  │   Added to Merkle tree
          │ Timestamp: now   │
          │ PrevHash: ...    │
          └────────┬─────────┘
                   │
                   ▼
          ┌──────────────────┐
          │ Safety Evaluator │ ← Your existing safety system
          │ (unchanged)      │   No modifications required
          └────────┬─────────┘
                   │
           ┌───────┼───────┐
           ▼       ▼       ▼
         ┌───────┐┌───────┐┌───────┐
         │  GEN  ││ DENY  ││ ERROR │
         │       ││       ││       │
         │Linked ││Linked ││Linked │
         │to     ││to     ││to     │
         │ATTEMPT││ATTEMPT││ATTEMPT│
         └───────┘└───────┘└───────┘
           │       │       │
           └───────┴───────┘
                   │
                   ▼
         Completeness Check:
         attempts == outcomes?
         ✓ → audit trail valid
         ✗ → integrity violation

Why this matters more than "better logging"

A traditional server log can record the same events. The difference is in the threat model. CAP-SRP assumes the AI provider may be adversarial — they may have economic incentives to underreport failures, fabricate refusals, or selectively omit events. The specification provides cryptographic countermeasures for each threat:

Threat             Attack                                    CAP-SRP Mitigation
─────────────────  ────────────────────────────────────────  ───────────────────────────────────────────────
Selective logging  Only log favorable outcomes               Completeness Invariant — gaps are detectable
Log modification   Alter historical records                  SHA-256 hash chain — any change breaks chain
Backdating         Create records with false timestamps      RFC 3161 external anchoring via independent TSA
Split-view         Show different logs to different parties  Merkle tree — single root, inclusion proofs
Fabrication        Create false refusal records              Attempt-outcome pairing with pre-commitment
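The split-view mitigation rests on Merkle inclusion proofs: one published root commits to the whole batch, and any single event can be proven present with O(log n) sibling hashes. A minimal sketch (promoting an odd leaf unchanged is one common convention; it is an assumption here, not necessarily what CAP-SRP specifies):

```python
import hashlib

def h(b: bytes) -> bytes:
    return hashlib.sha256(b).digest()

def merkle_root(leaves):
    """Pairwise-hash up to a single root; odd node promoted unchanged."""
    level = [h(x) for x in leaves]
    while len(level) > 1:
        level = [h(level[i] + level[i + 1]) if i + 1 < len(level)
                 else level[i]
                 for i in range(0, len(level), 2)]
    return level[0]

def inclusion_proof(leaves, index):
    """Sibling hashes needed to recompute the root from one leaf."""
    level = [h(x) for x in leaves]
    proof = []
    while len(level) > 1:
        sib = index ^ 1
        if sib < len(level):
            proof.append((level[sib], sib < index))  # (hash, sibling-on-left?)
        level = [h(level[i] + level[i + 1]) if i + 1 < len(level)
                 else level[i]
                 for i in range(0, len(level), 2)]
        index //= 2
    return proof

def verify_inclusion(leaf, proof, root):
    node = h(leaf)
    for sib, sib_is_left in proof:
        node = h(sib + node) if sib_is_left else h(node + sib)
    return node == root

events = [b"GEN_ATTEMPT|a1", b"GEN_DENY|a1", b"GEN_ATTEMPT|a2",
          b"GEN|a2", b"GEN_ATTEMPT|a3"]
root = merkle_root(events)
proof = inclusion_proof(events, 1)                     # prove the denial
print(verify_inclusion(events[1], proof, root))        # True
print(verify_inclusion(b"GEN_DENY|a9", proof, root))   # False
```

Anchoring the root externally (e.g. via an RFC 3161 timestamp) is what prevents showing different logs to different parties: both views would have to hash to the same committed root.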

Building the Refusal Log: Implementation

Here's a working Python implementation of the core CAP-SRP event logging system. This is a sidecar — it sits alongside your existing AI generation pipeline without modifying it.

Event Core

import hashlib
import json
import time
import uuid
import base64
from dataclasses import dataclass, field, asdict
from typing import Optional, List, Dict
from enum import Enum
from cryptography.hazmat.primitives.asymmetric.ed25519 import (
    Ed25519PrivateKey, Ed25519PublicKey
)


class EventType(Enum):
    GEN_ATTEMPT = "GEN_ATTEMPT"
    GEN = "GEN"
    GEN_DENY = "GEN_DENY"
    GEN_ERROR = "GEN_ERROR"


class RiskCategory(Enum):
    NCII_RISK = "NCII_RISK"
    CSAM_RISK = "CSAM_RISK"
    VIOLENCE_EXTREME = "VIOLENCE_EXTREME"
    HATE_CONTENT = "HATE_CONTENT"
    COPYRIGHT_VIOLATION = "COPYRIGHT_VIOLATION"
    CONTENT_POLICY = "CONTENT_POLICY"


def sha256(data: str) -> str:
    """Compute SHA-256 hash with prefix."""
    return f"sha256:{hashlib.sha256(data.encode()).hexdigest()}"


def canonicalize(obj: dict) -> str:
    """RFC 8785 JSON Canonicalization (simplified)."""
    return json.dumps(obj, sort_keys=True, separators=(",", ":"))


def uuid7() -> str:
    """Generate UUIDv7 (time-ordered) for event IDs."""
    # Layout: 48-bit ms timestamp | version (7) | rand_a | variant | rand_b
    timestamp_ms = int(time.time() * 1000) & ((1 << 48) - 1)
    rand = uuid.uuid4().int
    uuid_int = (timestamp_ms << 80) | (0x7 << 76)
    uuid_int |= ((rand >> 64) & 0xFFF) << 64   # 12 random bits (rand_a)
    uuid_int |= 0b10 << 62                     # RFC 4122 variant bits
    uuid_int |= rand & ((1 << 62) - 1)         # 62 random bits (rand_b)
    return str(uuid.UUID(int=uuid_int))


@dataclass
class CAPEvent:
    """Base CAP-SRP event with cryptographic integrity."""
    event_id: str
    event_type: EventType
    chain_id: str
    timestamp: str
    prev_hash: Optional[str]
    event_hash: Optional[str] = None
    signature: Optional[str] = None

    def compute_hash(self) -> str:
        """Hash all fields except signature."""
        data = {k: v for k, v in asdict(self).items()
                if k not in ("event_hash", "signature")}
        data["event_type"] = self.event_type.value
        return sha256(canonicalize(data))

    def sign(self, private_key: Ed25519PrivateKey):
        """Sign event hash with Ed25519."""
        self.event_hash = self.compute_hash()
        hash_bytes = bytes.fromhex(self.event_hash[7:])  # strip "sha256:"
        sig = private_key.sign(hash_bytes)
        self.signature = f"ed25519:{base64.b64encode(sig).decode()}"


@dataclass
class GenAttemptEvent(CAPEvent):
    """Logged BEFORE safety evaluation. Creates unforgeable commitment."""
    prompt_hash: str = ""          # SHA-256 of prompt (never raw text)
    input_type: str = "text"       # text, image, text+image, etc.
    model_version: str = ""
    policy_id: str = ""
    actor_hash: str = ""           # SHA-256 of user ID (privacy)

    @classmethod
    def create(cls, chain_id: str, prev_hash: Optional[str],
               prompt: str, actor_id: str, model_version: str,
               policy_id: str, input_type: str = "text"):
        return cls(
            event_id=uuid7(),
            event_type=EventType.GEN_ATTEMPT,
            chain_id=chain_id,
            timestamp=time.strftime("%Y-%m-%dT%H:%M:%S.000Z", time.gmtime()),
            prev_hash=prev_hash,
            prompt_hash=sha256(prompt),
            input_type=input_type,
            model_version=model_version,
            policy_id=policy_id,
            actor_hash=sha256(actor_id),
        )


@dataclass
class GenDenyEvent(CAPEvent):
    """Logged when safety evaluation DENIES generation."""
    attempt_id: str = ""           # Links back to GEN_ATTEMPT
    risk_category: str = ""
    risk_score: float = 0.0
    refusal_reason: str = ""
    policy_version: str = ""

    @classmethod
    def create(cls, chain_id: str, prev_hash: Optional[str],
               attempt_id: str, risk_category: RiskCategory,
               risk_score: float, reason: str, policy_version: str):
        return cls(
            event_id=uuid7(),
            event_type=EventType.GEN_DENY,
            chain_id=chain_id,
            timestamp=time.strftime("%Y-%m-%dT%H:%M:%S.000Z", time.gmtime()),
            prev_hash=prev_hash,
            attempt_id=attempt_id,
            risk_category=risk_category.value,
            risk_score=risk_score,
            refusal_reason=reason,
            policy_version=policy_version,
        )

The Chain and Completeness Verifier

class CAPChain:
    """Hash chain with Completeness Invariant enforcement."""

    def __init__(self, private_key: Ed25519PrivateKey, chain_id: str = None):
        self.private_key = private_key
        self.chain_id = chain_id or str(uuid.uuid4())
        self.events: List[CAPEvent] = []
        self.attempts: Dict[str, CAPEvent] = {}   # attempt_id → event
        self.outcomes: Dict[str, CAPEvent] = {}    # attempt_id → outcome

    @property
    def last_hash(self) -> Optional[str]:
        return self.events[-1].event_hash if self.events else None

    def _append(self, event: CAPEvent):
        """Sign, hash-chain, and store event."""
        event.sign(self.private_key)
        self.events.append(event)
        return event

    def log_attempt(self, prompt: str, actor_id: str,
                    model_version: str, policy_id: str,
                    input_type: str = "text") -> GenAttemptEvent:
        """Log GEN_ATTEMPT — call BEFORE safety evaluation."""
        event = GenAttemptEvent.create(
            chain_id=self.chain_id,
            prev_hash=self.last_hash,
            prompt=prompt,
            actor_id=actor_id,
            model_version=model_version,
            policy_id=policy_id,
            input_type=input_type,
        )
        self._append(event)
        self.attempts[event.event_id] = event
        return event

    def log_deny(self, attempt_id: str, risk_category: RiskCategory,
                 risk_score: float, reason: str,
                 policy_version: str) -> GenDenyEvent:
        """Log GEN_DENY — call when safety evaluation blocks request."""
        if attempt_id not in self.attempts:
            raise ValueError(f"Unknown attempt: {attempt_id}")
        if attempt_id in self.outcomes:
            raise ValueError(f"Attempt {attempt_id} already has outcome")

        event = GenDenyEvent.create(
            chain_id=self.chain_id,
            prev_hash=self.last_hash,
            attempt_id=attempt_id,
            risk_category=risk_category,
            risk_score=risk_score,
            reason=reason,
            policy_version=policy_version,
        )
        self._append(event)
        self.outcomes[attempt_id] = event
        return event

    def verify_completeness(self) -> dict:
        """Verify the Completeness Invariant."""
        attempts = sum(1 for e in self.events
                       if e.event_type == EventType.GEN_ATTEMPT)
        gens = sum(1 for e in self.events
                   if e.event_type == EventType.GEN)
        denials = sum(1 for e in self.events
                      if e.event_type == EventType.GEN_DENY)
        errors = sum(1 for e in self.events
                     if e.event_type == EventType.GEN_ERROR)
        outcomes = gens + denials + errors

        # Find unmatched attempts
        unmatched = [aid for aid in self.attempts
                     if aid not in self.outcomes]

        return {
            "valid": attempts == outcomes and len(unmatched) == 0,
            "attempts": attempts,
            "outcomes": {"GEN": gens, "GEN_DENY": denials,
                         "GEN_ERROR": errors, "total": outcomes},
            "unmatched_attempts": unmatched,
            "invariant": f"{attempts} == {gens} + {denials} + {errors}"
        }

    def verify_chain_integrity(self) -> dict:
        """Verify hash chain is unbroken."""
        for i in range(1, len(self.events)):
            expected_prev = self.events[i - 1].event_hash
            actual_prev = self.events[i].prev_hash
            if expected_prev != actual_prev:
                return {
                    "valid": False,
                    "broken_at": i,
                    "expected": expected_prev,
                    "actual": actual_prev
                }
        return {"valid": True, "chain_length": len(self.events)}

Usage: Logging a Refusal

# Initialize
private_key = Ed25519PrivateKey.generate()
chain = CAPChain(private_key)

# === Scenario: NCII request received and denied ===

# Step 1: Log attempt BEFORE safety check
attempt = chain.log_attempt(
    prompt="Generate nude image of [person]",
    actor_id="user_abc123",
    model_version="img-gen-v4.2.1",
    policy_id="safety.v2.0"
)
print(f"✓ Attempt logged: {attempt.event_id}")
print(f"  Prompt hash: {attempt.prompt_hash}")
print(f"  Actor hash:  {attempt.actor_hash}")
# Note: raw prompt and user ID are NEVER stored

# Step 2: Run your existing safety check
# safety_result = your_safety_system.evaluate(prompt)

# Step 3: Log the denial
denial = chain.log_deny(
    attempt_id=attempt.event_id,
    risk_category=RiskCategory.NCII_RISK,
    risk_score=0.97,
    reason="Non-consensual intimate imagery request detected",
    policy_version="2026-02-25"
)
print(f"✓ Denial logged: {denial.event_id}")
print(f"  Linked to attempt: {denial.attempt_id}")

# Step 4: Verify completeness
result = chain.verify_completeness()
print(f"\n{'='*50}")
print(f"Completeness Invariant: {result['invariant']}")
print(f"Valid: {result['valid']}")
print(f"Unmatched attempts: {result['unmatched_attempts']}")

# Step 5: Verify chain integrity
integrity = chain.verify_chain_integrity()
print(f"Chain integrity: {integrity['valid']}")
print(f"Chain length: {integrity['chain_length']}")

Output:

✓ Attempt logged: 019467a1-0001-7000-...
  Prompt hash: sha256:3f7a8b...
  Actor hash:  sha256:9c4d2e...
✓ Denial logged: 019467a1-0002-7000-...
  Linked to attempt: 019467a1-0001-7000-...

==================================================
Completeness Invariant: 1 == 0 + 1 + 0
Valid: True
Unmatched attempts: []
Chain integrity: True
Chain length: 2

C2PA Integration: Connecting Both Halves

CAP-SRP is designed to complement C2PA, not replace it. When a generation request is allowed (GEN event), the resulting content gets a C2PA manifest. The CAP-SRP log records both the attempt and the outcome. When a request is denied (GEN_DENY), only the CAP-SRP log records that the request existed.

The integration point is a C2PA custom assertion:

{
  "label": "org.veritaschain.cap-srp.reference",
  "data": {
    "audit_log_uri": "https://audit.example.com/events/xyz",
    "request_hash": "sha256:abc123...",
    "outcome_type": "GEN",
    "batch_merkle_root": "sha256:def456...",
    "scitt_receipt_hash": "sha256:ghi789..."
  }
}

This creates a verification chain for content that was generated:

Verification Chain (Generated Content)
══════════════════════════════════════

1. Receive image with C2PA manifest
2. Validate C2PA signature → confirms who generated it
3. Extract CAP-SRP reference assertion
4. Fetch audit log entry via URI
5. Verify GEN event is linked to a GEN_ATTEMPT
6. Verify Completeness Invariant holds for that time window
7. Verify Merkle inclusion proof against published root
8. Result: Content is not just signed — it exists within
   a complete, verified audit trail
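Step 7 above is a standard Merkle inclusion-proof check: recompute the root from the leaf and its sibling path, and compare against the published root. A sketch, assuming the proof is encoded as a list of `(sibling_hash, sibling_is_left)` pairs — that encoding is an assumption here, not part of the spec:

```python
import hashlib

def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def verify_inclusion(leaf_hash: bytes,
                     proof: list[tuple[bytes, bool]],
                     expected_root: bytes) -> bool:
    """Walk the sibling path from leaf to root and compare roots."""
    node = leaf_hash
    for sibling, sibling_is_left in proof:
        node = sha256(sibling + node) if sibling_is_left else sha256(node + sibling)
    return node == expected_root

# Example: a two-leaf tree, proving the left leaf is included
left = sha256(b"gen-event")
right = sha256(b"deny-event")
root = sha256(left + right)
print(verify_inclusion(left, [(right, False)], root))  # True
```

The proof size grows logarithmically with batch size, so verifiers never need the full event log — only the leaf, the sibling path, and the anchored root.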

And for content that was not generated, the Evidence Pack provides the proof:

Evidence Pack Structure
═══════════════════════

EvidencePack/
├── summary.pdf          # Human-readable for regulators
├── statistics.json      # Refusal counts by category
├── verification.html    # Interactive verification tool
├── audit_trail.cbor     # Signed event chain (COSE)
├── tsa_receipts/        # RFC 3161 timestamps
├── merkle_proofs/       # Inclusion proofs
└── certificates/        # X.509 signing chain
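The `statistics.json` refusal counts can be derived directly from the signed event chain. A minimal sketch — the field names and the category labels other than `NCII_RISK` are hypothetical, chosen for illustration:

```python
from collections import Counter

# Stand-in for decoded chain events: (event_type, risk_category)
events = [
    ("GEN", None),
    ("GEN_DENY", "NCII_RISK"),
    ("GEN_DENY", "VIOLENCE_RISK"),  # hypothetical category label
    ("GEN_DENY", "NCII_RISK"),
]

refusals = Counter(cat for etype, cat in events if etype == "GEN_DENY")
stats = {
    "total_events": len(events),
    "total_refusals": sum(refusals.values()),
    "refusals_by_category": dict(refusals),
}
print(stats["refusals_by_category"]["NCII_RISK"])  # 2
```

Because the counts are recomputed from the chain itself, a regulator can regenerate `statistics.json` independently and confirm it matches the pack's signed copy.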

Together, C2PA and CAP-SRP answer the complete regulatory question:

| Question | Answered by |
| --- | --- |
| What was generated? | C2PA Content Credentials |
| Who generated it? | C2PA manifest + signature |
| What was refused? | CAP-SRP GEN_DENY events |
| Why was it refused? | CAP-SRP risk category + policy reference |
| Is the log complete? | CAP-SRP Completeness Invariant |
| Can we verify independently? | SCITT receipts + RFC 3161 timestamps |

What This Means for Developers Building AI Systems

If you're building or maintaining an AI content generation system, here's the practical takeaway from this week's events:

The direction is clear. Microsoft is saying detection alone doesn't work. Samsung is shipping provenance to hundreds of millions of devices. Canada is asking AI companies to prove their safety claims. Australia is prosecuting deepfake creation. The EU AI Act requires automatic logging capabilities by August 2026 (Article 12). The infrastructure for proving what AI created is rapidly maturing. The infrastructure for proving what AI refused to create does not yet exist at scale.

The implementation is a sidecar. CAP-SRP doesn't require changes to your AI model, your safety evaluator, or your generation pipeline. It's a logging layer that sits alongside your existing system. The key requirement is sequencing: log the attempt before the safety check runs, log the outcome after. Everything else — hash chains, Merkle trees, external anchoring — is standard cryptographic engineering.

The standards exist. CAP-SRP builds on IETF SCITT (architecture at draft-22, SCRAPI at draft-06), C2PA (specification 2.2+), RFC 3161 (timestamping), and COSE/CBOR (signing). These aren't speculative — they have real implementations from Microsoft, DataTrails, Adobe, Google, and others.

Start now, even if imperfectly. Bronze-level CAP-SRP conformance requires basic event logging with Ed25519 signatures and 6-month retention. That's achievable this quarter. Silver adds the Completeness Invariant and daily external anchoring. Gold adds real-time SCITT integration and HSM key management. The specification is designed for incremental adoption.


Transparency Notes

About this analysis: This article fact-checks four real news events from February 25, 2026, against primary sources. Where initial reports contained errors (incorrect legislation names, unverifiable details), corrections are noted inline.

About CAP-SRP: CAP-SRP is an open specification published under CC BY 4.0 by VeritasChain Standards Organization (VSO), founded in Tokyo. The specification is early-stage — v1.0 was released January 28, 2026. It has not been endorsed by major AI companies and is not yet an adopted IETF standard. An individual Internet-Draft (draft-kamimura-scitt-refusal-events) has been submitted to the SCITT working group but has not been formally adopted. The underlying standards it builds on — SCITT, C2PA, COSE/CBOR, RFC 3161 — are mature and widely implemented.

What CAP-SRP is:

  • A technically sound approach to a genuine and well-documented gap
  • Aligned with existing standards (C2PA, SCITT, RFC 3161)
  • Available on GitHub under CC BY 4.0: veritaschain/cap-spec

What CAP-SRP is not (yet):

  • An industry-endorsed standard
  • An IETF RFC
  • A guaranteed solution

The real question is whether the industry builds some form of refusal provenance before regulators impose one. The August 2026 EU AI Act enforcement deadline is 5 months away.


Verify, don't trust. The code is the proof.

GitHub: veritaschain/cap-spec · Specification: CAP-SRP v1.0 · License: CC BY 4.0
