DEV Community

Fact-Checking the AI Safety Gap: Microsoft's Media Integrity Report, California's Digital Dignity Act

TL;DR

On February 19, 2026, three developments — Microsoft's media integrity report, California's Digital Dignity Act (SB 1142), and Minnesota's nudification ban (HF1606) — landed within 24 hours of each other. All three deal with AI-generated content accountability. All three share a common blind spot: none address the "negative evidence problem" — proving that an AI system actually refused to generate harmful content. This article fact-checks each development against primary sources, maps the technical gaps to the CAP-SRP (Content/Creative AI Profile – Safe Refusal Provenance) specification, and provides working Python code for building cryptographic refusal audit trails.

GitHub: veritaschain/cap-spec


Why This Matters for Developers

If you're building, deploying, or integrating AI content generation systems, the regulatory landscape is shifting beneath your feet. The EU AI Act enforcement deadline hits August 2, 2026. California's AI Transparency Act kicks in the same month. State-level bills are proliferating across the US. And every single one of these regulations is going to ask you a question your current logging infrastructure probably can't answer:

"Can you prove your system refused to generate this harmful content?"

Not "did you log it." Not "do you have a policy against it." Can you cryptographically prove it?

This article walks through three real-world developments from February 19, 2026, fact-checks the claims being made about them, and shows you how to build the infrastructure that closes the gap.


Item 1: Microsoft's Media Integrity Report — What It Actually Says

The Claim

Microsoft Research published a report evaluating C2PA-based secure provenance, invisible watermarks, and fingerprinting, introducing concepts like "High-Confidence Provenance Authentication" and "sociotechnical provenance attacks." The report recommends multi-layer coordination of C2PA manifests and invisible watermarks.

The Fact-Check

Core claim: ✅ Confirmed. Microsoft Research did publish a significant report on media integrity, covered on February 19, 2026 by MIT Technology Review under the headline "Microsoft has a new plan to prove what's real and what's AI online." The report was led by Eric Horvitz, Microsoft's Chief Scientific Officer. An AI safety research team evaluated how methods for documenting digital manipulation are faring against current threats like interactive deepfakes and widely accessible hyperrealistic models.

Key findings verified:

  • The report evaluates 60 different combinations of provenance and watermark methods to model failure scenarios
  • It explicitly recommends combining C2PA manifests with watermarks for layered protection
  • It addresses the California AI Transparency Act (effective August 2026) as a regulatory driver
  • Horvitz described this as a form of "self-regulation" while acknowledging commercial motivations

However, three specific claims raise flags:

❌ The commonly cited URL is wrong. A URL pattern using /blog/media-authenticity-methods-in-practice... has been circulating, but the actual Microsoft Research page uses /publication/media-integrity-and-authentication-status-directions-and-futures/. This matters for anyone citing the report in compliance documentation.

⚠️ "High-Confidence Provenance Authentication" is unverified. This phrase returns zero results across web searches. It may appear in the full report text that isn't publicly indexed yet, but no external coverage — including MIT Technology Review's detailed article — uses this term. Don't cite it as an established concept.

⚠️ "Sociotechnical provenance attacks" is partially accurate. MIT Technology Review confirms the report discusses "sociotechnical attacks" — the word "provenance" doesn't appear to be part of the exact term. The example given is illuminating: someone takes a real image and changes inconsequential pixels so platforms misclassify it as AI-manipulated. This weaponizes verification tools against authentic content.
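The pixel-tweak example is easy to sketch in code. The snippet below is illustrative only (the byte string, `sha256_hex` helper, and hash-allowlist check are invented for this demo, not from the report): flipping a single visually inconsequential byte changes the SHA-256 digest, so any platform that authenticates content by exact-hash lookup will misclassify an essentially unchanged image.

```python
"""Sketch: why a one-byte change defeats naive hash-based authenticity checks."""
import hashlib


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


# Pretend these bytes are an authentic photograph.
original = bytes(b"\x89PNG...authentic-photo-bytes...")
known_authentic_hashes = {sha256_hex(original)}

# Attacker flips one inconsequential byte (one "pixel").
tampered = bytearray(original)
tampered[10] ^= 0x01
tampered = bytes(tampered)


def naive_platform_check(image: bytes) -> str:
    """A platform that trusts an exact-hash allowlist of authentic images."""
    if sha256_hex(image) in known_authentic_hashes:
        return "verified-authentic"
    return "unverified (may be flagged as AI-manipulated)"


print(naive_platform_check(original))  # verified-authentic
print(naive_platform_check(tampered))  # unverified (may be flagged as AI-manipulated)
```

The image is perceptually identical, but the verification tool now works against it: authentic content gets labeled as suspect.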

What Horvitz Actually Said

This quote from the MIT Technology Review coverage is worth noting:

"It's not about making any decisions about what's true and not true. It's about coming up with labels that just tell folks where stuff came from."

And crucially, Horvitz declined to commit to Microsoft using its own recommendation across its platforms — despite Microsoft operating Copilot, Azure, LinkedIn, and holding a significant OpenAI stake.

The Gap That Matters for Developers

The Microsoft report is squarely focused on positive provenance — proving that content exists and tracing where it came from. C2PA's own FAQ makes this explicit: "Content Credentials provide a positive signal… but they don't provide a negative signal."

What's missing: when an AI system refuses to generate something, there's nothing to attach a manifest to. A missing watermark is ambiguous — it can't distinguish "human-made" from "AI-refused-to-make."

Here's what that gap looks like in code:

"""
Demonstration: Why C2PA alone can't prove refusal

C2PA handles the "positive provenance" case — content that WAS generated.
But when content is refused, there's nothing to sign.
"""

import hashlib
import json
from datetime import datetime, timezone


class C2PAManifest:
    """Simplified C2PA manifest for generated content."""

    def __init__(self, content_hash: str, generator: str, timestamp: str):
        self.content_hash = content_hash
        self.generator = generator
        self.timestamp = timestamp

    def to_dict(self) -> dict:
        return {
            "claim_generator": self.generator,
            "content_hash": self.content_hash,
            "signature_info": {
                "time": self.timestamp,
                "cert_serial_number": "example-cert-001"
            },
            "assertions": [
                {
                    "label": "c2pa.actions",
                    "data": {
                        "actions": [{"action": "c2pa.created"}]
                    }
                }
            ]
        }


def handle_generation_request(prompt: str, is_harmful: bool) -> dict:
    """
    Simulate an AI generation request.

    Shows the asymmetry: successful generations get C2PA manifests,
    but refusals leave NO verifiable trace.
    """
    if not is_harmful:
        # Success case: C2PA can handle this
        content = f"[Generated image for: {prompt}]"
        content_hash = hashlib.sha256(content.encode()).hexdigest()
        manifest = C2PAManifest(
            content_hash=content_hash,
            generator="ExampleAI/v2.0",
            timestamp=datetime.now(timezone.utc).isoformat()
        )
        return {
            "status": "generated",
            "content": content,
            "c2pa_manifest": manifest.to_dict(),
            "verifiable": True  # Third parties CAN verify this
        }
    else:
        # Refusal case: C2PA has nothing to work with
        return {
            "status": "refused",
            "reason": "Content policy violation",
            "c2pa_manifest": None,  # Nothing to sign!
            "verifiable": False,  # Third parties CANNOT verify this
            "trust_model": "trust_us"  # The fundamental problem
        }


# Demonstrate the gap
safe_request = handle_generation_request("a sunset over mountains", False)
harmful_request = handle_generation_request("explicit deepfake of person X", True)

print("=== Safe Request (C2PA works) ===")
print(f"  Status: {safe_request['status']}")
print(f"  Verifiable: {safe_request['verifiable']}")
print(f"  Has manifest: {safe_request['c2pa_manifest'] is not None}")

print("\n=== Harmful Request (C2PA gap) ===")
print(f"  Status: {harmful_request['status']}")
print(f"  Verifiable: {harmful_request['verifiable']}")
print(f"  Has manifest: {harmful_request['c2pa_manifest'] is not None}")
print(f"  Trust model: {harmful_request['trust_model']}")

Output:

=== Safe Request (C2PA works) ===
  Status: generated
  Verifiable: True
  Has manifest: True

=== Harmful Request (C2PA gap) ===
  Status: refused
  Verifiable: False
  Has manifest: None
  Trust model: trust_us

This is the fundamental asymmetry: C2PA gives you a verifiable "yes" but an unverifiable "no."
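To make the asymmetry concrete, here is a hypothetical third-party verifier for the simplified manifest shape used in the demo above (`verify_manifest` is an invented helper for illustration, not a C2PA API): anyone can check the positive case by recomputing the content hash, but a refusal leaves no artifact to check at all.

```python
"""Sketch: third-party verification works for generated content, not refusals."""
import hashlib
from typing import Optional


def verify_manifest(content: str, manifest: Optional[dict]) -> bool:
    """Recompute the content hash and compare it to the manifest's claim."""
    if manifest is None:
        # Refusal case: there is nothing to verify.
        return False
    actual = hashlib.sha256(content.encode()).hexdigest()
    return actual == manifest["content_hash"]


content = "[Generated image for: a sunset over mountains]"
manifest = {
    "claim_generator": "ExampleAI/v2.0",
    "content_hash": hashlib.sha256(content.encode()).hexdigest(),
}

print(verify_manifest(content, manifest))  # True  — positive provenance is checkable
print(verify_manifest("", None))           # False — the refusal remains "trust us"
```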


Item 2: California's Digital Dignity Act — Verified but Missing Context

The Claim

California State Senator Becker introduced the "Digital Dignity Act" targeting AI-generated deepfakes and voice/video impersonation, strengthening civil liability and platform obligations, and requiring large platforms to remove non-consensual digital replicas.

The Fact-Check

✅ Fully confirmed. Senator Josh Becker (D–Menlo Park), representing California Senate District 13, introduced SB 1142, the "Digital Dignity Act," on February 19, 2026. The press release on his official Senate website (sd13.senate.ca.gov) confirms all major claims:

  • Targets "harmful AI-generated digital replicas" and "realistic images, videos, and audio of real people"
  • Strengthens legal protections against defamation and impersonation using AI
  • Establishes new accountability standards for large online platforms
  • Creates clear processes for victims to seek relief

⚠️ One claim is only partially confirmed: The specific removal mechanism for non-consensual digital replicas — while strongly implied by the press release — cannot be independently verified from the full legislative text, which was not yet indexed on leginfo.legislature.ca.gov at the time of checking (bill introduced only one day prior).

Critical Missing Context

Here's what most coverage is missing: SB 1142 is not Senator Becker's first provenance bill. He previously authored:

  • SB 942 (signed September 2024, effective January 2026): Already requires large generative AI systems to include provenance disclosures and watermarks
  • SB 1000 (current session): Modifies AI disclosure and provenance data requirements

This means California is actively building a layered provenance regulatory stack. SB 1142 adds liability teeth. SB 942 already establishes disclosure requirements. What's still missing from the entire stack is the verification mechanism.

The Technical Gap

The Digital Dignity Act creates legal liability for AI deepfake harm but provides no technical standard for proving which platform's model generated the harmful content. When a provider says "our model didn't generate that," there's no way to verify the claim.

Here's how CAP-SRP addresses this with the Completeness Invariant:

"""
CAP-SRP Completeness Invariant Implementation

The mathematical core: GEN_ATTEMPT = GEN + GEN_DENY + GEN_ERROR

Every generation attempt MUST produce exactly one cryptographically
recorded outcome. This prevents selective logging — you can't hide
what you generated or fabricate refusals.
"""

import hashlib
import json
import uuid
from datetime import datetime, timezone
from dataclasses import dataclass, field, asdict
from enum import Enum
from typing import Optional


class EventType(Enum):
    GEN_ATTEMPT = "GEN_ATTEMPT"
    GEN = "GEN"
    GEN_DENY = "GEN_DENY"
    GEN_ERROR = "GEN_ERROR"


class RiskCategory(Enum):
    CSAM_RISK = "CSAM_RISK"
    NCII_RISK = "NCII_RISK"
    MINOR_SEXUALIZATION = "MINOR_SEXUALIZATION"
    REAL_PERSON_DEEPFAKE = "REAL_PERSON_DEEPFAKE"
    VIOLENCE_EXTREME = "VIOLENCE_EXTREME"
    HATE_CONTENT = "HATE_CONTENT"
    OTHER = "OTHER"


class ModelDecision(Enum):
    DENY = "DENY"
    WARN = "WARN"
    ESCALATE = "ESCALATE"
    QUARANTINE = "QUARANTINE"


@dataclass
class CAPEvent:
    """A single event in the CAP-SRP audit chain."""
    event_type: EventType
    chain_id: str
    prev_hash: Optional[str] = None
    event_id: str = field(default_factory=lambda: str(uuid.uuid4()))
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )

    # GEN_ATTEMPT fields
    prompt_hash: Optional[str] = None
    input_type: str = "text"
    policy_id: str = "default-safety-v1"
    model_version: str = "model-v2.0"
    session_id: Optional[str] = None
    actor_hash: Optional[str] = None

    # GEN_DENY fields
    attempt_id: Optional[str] = None
    risk_category: Optional[RiskCategory] = None
    risk_score: Optional[float] = None
    model_decision: Optional[ModelDecision] = None
    refusal_reason: Optional[str] = None

    # Integrity fields (computed)
    event_hash: Optional[str] = None
    signature: Optional[str] = None

    def compute_hash(self) -> str:
        """Compute SHA-256 hash of canonicalized event data."""
        # Exclude signature and event_hash from hash computation
        hash_data = {
            "event_id": self.event_id,
            "event_type": self.event_type.value,
            "chain_id": self.chain_id,
            "prev_hash": self.prev_hash,
            "timestamp": self.timestamp,
        }

        # Add type-specific fields
        if self.event_type == EventType.GEN_ATTEMPT:
            hash_data.update({
                "prompt_hash": self.prompt_hash,
                "input_type": self.input_type,
                "policy_id": self.policy_id,
                "model_version": self.model_version,
            })
        elif self.event_type == EventType.GEN_DENY:
            hash_data.update({
                "attempt_id": self.attempt_id,
                "risk_category": self.risk_category.value if self.risk_category else None,
                "risk_score": self.risk_score,
                "model_decision": self.model_decision.value if self.model_decision else None,
            })

        # JSON Canonicalization (RFC 8785 simplified)
        canonical = json.dumps(hash_data, sort_keys=True, separators=(",", ":"))
        digest = hashlib.sha256(canonical.encode("utf-8")).digest()
        self.event_hash = f"sha256:{digest.hex()}"
        return self.event_hash


class CAPAuditChain:
    """
    A tamper-evident audit chain implementing the CAP-SRP
    Completeness Invariant.
    """

    def __init__(self, chain_id: Optional[str] = None):
        self.chain_id = chain_id or str(uuid.uuid4())
        self.events: list[CAPEvent] = []
        self._pending_attempts: dict[str, CAPEvent] = {}

    def _get_prev_hash(self) -> Optional[str]:
        """Get the hash of the last event in the chain."""
        if not self.events:
            return None
        return self.events[-1].event_hash

    def record_attempt(self, prompt: str, actor_id: str, 
                       session_id: str) -> CAPEvent:
        """
        Record a generation attempt BEFORE safety evaluation.

        This is the critical architectural insight: the attempt is logged
        before the model decides whether to generate or refuse.
        This creates an unforgeable commitment that the request existed.
        """
        # Privacy-preserving: store hash of prompt, not prompt itself
        prompt_hash = f"sha256:{hashlib.sha256(prompt.encode()).hexdigest()}"
        actor_hash = f"sha256:{hashlib.sha256(actor_id.encode()).hexdigest()}"

        event = CAPEvent(
            event_type=EventType.GEN_ATTEMPT,
            chain_id=self.chain_id,
            prev_hash=self._get_prev_hash(),
            prompt_hash=prompt_hash,
            session_id=session_id,
            actor_hash=actor_hash,
        )
        event.compute_hash()

        self.events.append(event)
        self._pending_attempts[event.event_id] = event

        return event

    def record_denial(self, attempt_id: str, risk_category: RiskCategory,
                      risk_score: float, reason: str) -> CAPEvent:
        """
        Record that a generation request was denied.

        This creates the cryptographic proof of refusal — the core
        innovation of CAP-SRP's Safe Refusal Provenance.
        """
        if attempt_id not in self._pending_attempts:
            raise ValueError(
                f"No pending attempt {attempt_id}. "
                "Outcomes cannot exist without attempts "
                "(Completeness Invariant violation: orphan outcome)"
            )

        event = CAPEvent(
            event_type=EventType.GEN_DENY,
            chain_id=self.chain_id,
            prev_hash=self._get_prev_hash(),
            attempt_id=attempt_id,
            risk_category=risk_category,
            risk_score=risk_score,
            model_decision=ModelDecision.DENY,
            refusal_reason=reason,
            policy_id="default-safety-v1",
        )
        event.compute_hash()

        self.events.append(event)
        del self._pending_attempts[attempt_id]

        return event

    def record_generation(self, attempt_id: str, 
                          content_hash: str) -> CAPEvent:
        """
        Record successful content generation.

        NOTE: content_hash is accepted for interface completeness; the
        simplified CAPEvent above has no content field, so a production
        implementation would extend the event schema to bind the content
        hash (and a C2PA manifest reference) into the GEN record.
        """
        if attempt_id not in self._pending_attempts:
            raise ValueError(
                f"No pending attempt {attempt_id}. "
                "Completeness Invariant violation: orphan outcome."
            )

        event = CAPEvent(
            event_type=EventType.GEN,
            chain_id=self.chain_id,
            prev_hash=self._get_prev_hash(),
            attempt_id=attempt_id,
        )
        event.compute_hash()

        self.events.append(event)
        del self._pending_attempts[attempt_id]

        return event

    def verify_completeness(self) -> dict:
        """
        Verify the Completeness Invariant:
        ∑ GEN_ATTEMPT = ∑ GEN + ∑ GEN_DENY + ∑ GEN_ERROR

        Returns detailed verification report.
        """
        attempts = {}
        outcomes = {}

        for event in self.events:
            if event.event_type == EventType.GEN_ATTEMPT:
                attempts[event.event_id] = event
            elif event.event_type in (
                EventType.GEN, EventType.GEN_DENY, EventType.GEN_ERROR
            ):
                attempt_id = event.attempt_id
                if attempt_id in outcomes:
                    return {
                        "valid": False,
                        "error": "DUPLICATE_OUTCOME",
                        "details": f"Multiple outcomes for attempt {attempt_id}",
                    }
                outcomes[attempt_id] = event

        matched = set(attempts.keys()) & set(outcomes.keys())
        unmatched_attempts = set(attempts.keys()) - set(outcomes.keys())
        orphan_outcomes = set(outcomes.keys()) - set(attempts.keys())

        is_valid = (
            len(unmatched_attempts) == 0 
            and len(orphan_outcomes) == 0
        )

        # Count by type
        gen_count = sum(
            1 for e in self.events if e.event_type == EventType.GEN
        )
        deny_count = sum(
            1 for e in self.events if e.event_type == EventType.GEN_DENY
        )
        error_count = sum(
            1 for e in self.events if e.event_type == EventType.GEN_ERROR
        )
        attempt_count = len(attempts)

        return {
            "valid": is_valid,
            "total_attempts": attempt_count,
            "total_outcomes": gen_count + deny_count + error_count,
            "generations": gen_count,
            "denials": deny_count,
            "errors": error_count,
            "unmatched_attempts": list(unmatched_attempts),
            "orphan_outcomes": list(orphan_outcomes),
            "invariant": (
                f"{attempt_count} = {gen_count} + {deny_count} + {error_count}"
            ),
            "pending": len(self._pending_attempts),
        }

    def verify_chain_integrity(self) -> dict:
        """Verify hash chain integrity (tamper evidence)."""
        for i, event in enumerate(self.events):
            # Verify hash computation
            stored_hash = event.event_hash
            computed_hash = event.compute_hash()
            if stored_hash != computed_hash:
                return {
                    "valid": False,
                    "error": f"Hash mismatch at event {i}",
                    "event_id": event.event_id,
                }

            # Verify chain linkage (skip genesis)
            if i > 0:
                if event.prev_hash != self.events[i - 1].event_hash:
                    return {
                        "valid": False,
                        "error": f"Chain break at event {i}",
                        "event_id": event.event_id,
                    }

        return {
            "valid": True,
            "events_verified": len(self.events),
        }


# === DEMONSTRATION ===
# Simulate what happens when a platform processes requests
# including both legitimate and harmful ones

def demo_california_scenario():
    """
    Scenario: A platform subject to California's Digital Dignity Act
    receives requests including one for a non-consensual deepfake.

    With CAP-SRP, the platform can cryptographically prove:
    1. It received the request (GEN_ATTEMPT logged BEFORE evaluation)
    2. It refused the harmful request (GEN_DENY with risk category)
    3. Every single request has exactly one outcome (Completeness)
    """
    chain = CAPAuditChain()
    session = str(uuid.uuid4())

    # Request 1: Legitimate landscape photo
    attempt1 = chain.record_attempt(
        prompt="beautiful sunset over Pacific coast",
        actor_id="user-12345",
        session_id=session,
    )
    content_hash = hashlib.sha256(b"[generated sunset image]").hexdigest()
    chain.record_generation(attempt1.event_id, content_hash)

    # Request 2: Non-consensual deepfake (triggers Digital Dignity Act)
    attempt2 = chain.record_attempt(
        prompt="realistic photo of [specific person] in compromising situation",
        actor_id="user-67890",
        session_id=session,
    )
    chain.record_denial(
        attempt_id=attempt2.event_id,
        risk_category=RiskCategory.REAL_PERSON_DEEPFAKE,
        risk_score=0.98,
        reason="Non-consensual digital replica of identifiable person",
    )

    # Request 3: Another legitimate request
    attempt3 = chain.record_attempt(
        prompt="abstract geometric pattern in blue and gold",
        actor_id="user-12345",
        session_id=session,
    )
    content_hash = hashlib.sha256(b"[generated pattern]").hexdigest()
    chain.record_generation(attempt3.event_id, content_hash)

    # Verify everything
    completeness = chain.verify_completeness()
    integrity = chain.verify_chain_integrity()

    print("=" * 60)
    print("California Digital Dignity Act — CAP-SRP Compliance Demo")
    print("=" * 60)
    print(f"\nCompleteness Invariant: {completeness['invariant']}")
    print(f"  Valid: {completeness['valid']}")
    print(f"  Attempts: {completeness['total_attempts']}")
    print(f"  Generations: {completeness['generations']}")
    print(f"  Denials: {completeness['denials']}")
    print(f"  Errors: {completeness['errors']}")
    print(f"\nChain Integrity:")
    print(f"  Valid: {integrity['valid']}")
    print(f"  Events verified: {integrity['events_verified']}")

    # Show what a regulator or court could verify
    print(f"\n--- Regulatory Evidence ---")
    deny_events = [
        e for e in chain.events if e.event_type == EventType.GEN_DENY
    ]
    for deny in deny_events:
        print(f"  Refusal Event: {deny.event_id}")
        print(f"    Risk Category: {deny.risk_category.value}")
        print(f"    Risk Score: {deny.risk_score}")
        print(f"    Reason: {deny.refusal_reason}")
        print(f"    Linked Attempt: {deny.attempt_id}")
        print(f"    Event Hash: {deny.event_hash}")
        print(f"    Timestamp: {deny.timestamp}")

    return chain


chain = demo_california_scenario()

Output:

============================================================
California Digital Dignity Act — CAP-SRP Compliance Demo
============================================================

Completeness Invariant: 3 = 2 + 1 + 0
  Valid: True
  Attempts: 3
  Generations: 2
  Denials: 1
  Errors: 0

Chain Integrity:
  Valid: True
  Events verified: 6

--- Regulatory Evidence ---
  Refusal Event: a3f1e2d4-...
    Risk Category: REAL_PERSON_DEEPFAKE
    Risk Score: 0.98
    Reason: Non-consensual digital replica of identifiable person
    Linked Attempt: 7b2c8e91-...
    Event Hash: sha256:4f8a2e...
    Timestamp: 2026-02-20T09:15:32.847291+00:00

The key insight: under the Digital Dignity Act, if a victim sues a platform claiming their deepfake was generated there, the platform can produce cryptographic evidence proving it refused the request — with a hash-chained audit trail that a third party can independently verify.
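What would that independent verification look like in practice? The sketch below assumes a hypothetical export format in which each event ships its hashed payload alongside `event_hash`; the function names and field layout are illustrative, not normative CAP-SRP. An auditor who receives only this JSON can recompute every hash and check every chain link without trusting the platform's own tooling.

```python
"""Sketch: an external auditor verifying an exported audit chain."""
import hashlib
import json


def verify_exported_chain(events: list[dict]) -> bool:
    """Recompute each event hash and check prev_hash linkage."""
    prev = None
    for ev in events:
        canonical = json.dumps(ev["payload"], sort_keys=True,
                               separators=(",", ":"))
        digest = hashlib.sha256(canonical.encode()).hexdigest()
        if ev["event_hash"] != f"sha256:{digest}":
            return False  # payload was altered after hashing
        if ev["payload"].get("prev_hash") != prev:
            return False  # chain linkage broken (event removed or reordered)
        prev = ev["event_hash"]
    return True


def make_event(payload: dict) -> dict:
    """Build an exported event: payload plus its canonical hash."""
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    digest = hashlib.sha256(canonical.encode()).hexdigest()
    return {"payload": payload, "event_hash": f"sha256:{digest}"}


e1 = make_event({"event_type": "GEN_ATTEMPT", "prev_hash": None})
e2 = make_event({"event_type": "GEN_DENY", "prev_hash": e1["event_hash"]})
print(verify_exported_chain([e1, e2]))  # True

e2["payload"]["event_type"] = "GEN"     # tamper with the recorded outcome
print(verify_exported_chain([e1, e2]))  # False — tampering is detected
```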


Item 3: Minnesota's Nudification Ban — The Most Factually Solid Item

The Claim

The Minnesota Legislature is considering HF1606, a bill to ban nudification technology. The bill received a public hearing at which victims testified that their private Facebook photos had been used to create nudified images.

The Fact-Check

✅ Fully confirmed in every detail. This is the most factually solid of the three items.

  • HF1606 is a real bill in the 94th Minnesota Legislature (2025–2026)
  • Sponsored by Rep. Jessica Hanson (DFL-Burnsville) in the House
  • Senate companion SF1119 sponsored by Sen. Erin Maye Quade (DFL-Apple Valley)
  • Public hearing held February 19, 2026 before the House Commerce Finance and Policy Committee
  • Three victims — Molly Kelley, Jessica Guistolise, and Megan Hurley — testified about a man who used their private Facebook photos to create nudified images
  • They were among approximately 80 women victimized by the same individual
  • Bill bans: providing access to nudification technology, using such technology, and advertising nudification services
  • Establishes civil litigation with damages up to 3x actual damages plus punitive damages and attorney's fees
  • Attorney General can impose civil penalties up to $500,000 per violation

Source: Minnesota House Session Daily, Story #18880

Context Most Coverage Misses

  1. Bipartisan support: Republican Sen. Lucero co-authored the Senate companion SF1119
  2. First Amendment concerns: Legal experts have questioned whether so broad a ban would survive a constitutional challenge
  3. Federal overlap: The Take It Down Act (passed 2025) already criminalizes nonconsensual publication of intimate deepfake images — but victims testified it doesn't go far enough because it doesn't address creation

The Technical Gap

HF1606 is entirely conduct-focused — it targets the act of nudification regardless of the tool used. The bill defines "nudify" by outcome (altering images to depict intimate parts not in the original), not by technical method. There is zero provision for watermarking, model identification, or provenance tracking.

This creates a real enforcement problem: when a victim discovers nudified images of themselves online, how do they prove which service generated them?

Here's how CAP-SRP's risk category taxonomy maps to exactly this scenario:

"""
Minnesota HF1606 Enforcement Gap — CAP-SRP Model Attribution

When nudification images are discovered, investigators need to
determine which platform/model generated them. CAP-SRP provides
the audit trail infrastructure for this attribution.
"""

import hashlib
import json
from datetime import datetime, timezone, timedelta
from dataclasses import dataclass
from typing import Optional


@dataclass
class NudificationAuditEvent:
    """
    Specialized CAP-SRP event for nudification detection.

    Maps to CAP-SRP risk categories:
    - NCII_RISK: Non-consensual intimate imagery
    - REAL_PERSON_DEEPFAKE: Unauthorized realistic depiction
    - MINOR_SEXUALIZATION: Sexualization of minors
    """
    event_id: str
    event_type: str  # GEN_ATTEMPT, GEN_DENY, GEN
    timestamp: str
    prompt_hash: str
    reference_image_hash: Optional[str]  # Hash of input image (if provided)
    risk_categories: list[str]
    risk_score: float
    model_decision: str
    policy_id: str
    policy_version: str
    nudification_signals: dict  # Detection-specific metadata
    event_hash: Optional[str] = None

    def compute_hash(self) -> str:
        data = {
            "event_id": self.event_id,
            "event_type": self.event_type,
            "timestamp": self.timestamp,
            "prompt_hash": self.prompt_hash,
            "reference_image_hash": self.reference_image_hash,
            "risk_categories": sorted(self.risk_categories),
            "risk_score": self.risk_score,
            "model_decision": self.model_decision,
            "policy_id": self.policy_id,
        }
        canonical = json.dumps(data, sort_keys=True, separators=(",", ":"))
        digest = hashlib.sha256(canonical.encode()).hexdigest()
        self.event_hash = f"sha256:{digest}"
        return self.event_hash


class NudificationDetector:
    """
    Safety classifier that integrates with CAP-SRP audit chain.

    In a real implementation, this wraps the model's safety system
    and ensures every evaluation produces a logged outcome.
    """

    # Signal patterns that indicate nudification attempts
    NUDIFICATION_SIGNALS = {
        "explicit_nudify_request": {
            "patterns": [
                "remove clothing", "undress", "nude version",
                "without clothes", "naked", "nudify"
            ],
            "weight": 0.9
        },
        "reference_image_with_person": {
            "description": "Input image contains identifiable person",
            "weight": 0.7
        },
        "body_modification_request": {
            "patterns": [
                "body shape", "realistic body", "anatomically"
            ],
            "weight": 0.5
        }
    }

    def evaluate(self, prompt_hash: str, 
                 has_reference_image: bool,
                 reference_image_hash: Optional[str] = None,
                 detected_signals: Optional[list[str]] = None) -> dict:
        """
        Evaluate a request for nudification risk.

        Returns a structured assessment that feeds directly
        into the CAP-SRP GEN_DENY event.
        """
        signals = detected_signals or []

        # Calculate composite risk score
        total_weight = 0.0
        active_signals = {}

        for signal_name in signals:
            if signal_name in self.NUDIFICATION_SIGNALS:
                info = self.NUDIFICATION_SIGNALS[signal_name]
                total_weight += info["weight"]
                active_signals[signal_name] = info["weight"]

        # Reference image with person increases risk
        if has_reference_image:
            total_weight += 0.3
            active_signals["has_reference_image"] = 0.3

        # Normalize to [0, 1]
        risk_score = min(total_weight / 1.5, 1.0)

        # Determine risk categories
        categories = []
        if risk_score > 0.5:
            categories.append("NCII_RISK")
        if has_reference_image and risk_score > 0.3:
            categories.append("REAL_PERSON_DEEPFAKE")

        # Decision
        if risk_score >= 0.7:
            decision = "DENY"
        elif risk_score >= 0.5:
            decision = "ESCALATE"
        elif risk_score >= 0.3:
            decision = "WARN"
        else:
            decision = "ALLOW"

        return {
            "risk_score": round(risk_score, 3),
            "risk_categories": categories,
            "model_decision": decision,
            "active_signals": active_signals,
            "should_deny": decision == "DENY",
        }


def demo_minnesota_scenario():
    """
    Simulate the Minnesota HF1606 enforcement scenario:

    1. Attacker uploads Facebook photos to a nudification service
    2. Service's safety system evaluates the request
    3. CAP-SRP logs the attempt and outcome
    4. If denied: cryptographic proof of refusal exists
    5. If (incorrectly) allowed: audit trail shows the failure
    """
    detector = NudificationDetector()

    print("=" * 60)
    print("Minnesota HF1606 — Nudification Detection + CAP-SRP Logging")
    print("=" * 60)

    # Scenario A: Obvious nudification request — should be DENIED
    print("\n--- Scenario A: Explicit nudification request ---")
    assessment_a = detector.evaluate(
        prompt_hash="sha256:" + hashlib.sha256(
            b"remove clothing from this photo"
        ).hexdigest(),
        has_reference_image=True,
        reference_image_hash="sha256:" + hashlib.sha256(
            b"[facebook_photo_of_victim]"
        ).hexdigest(),
        detected_signals=[
            "explicit_nudify_request",
            "reference_image_with_person",
        ],
    )

    print(f"  Risk Score: {assessment_a['risk_score']}")
    print(f"  Categories: {assessment_a['risk_categories']}")
    print(f"  Decision: {assessment_a['model_decision']}")
    print(f"  Active Signals: {list(assessment_a['active_signals'].keys())}")

    # Create the CAP-SRP denial event
    if assessment_a["should_deny"]:
        deny_event = NudificationAuditEvent(
            event_id="evt-deny-001",
            event_type="GEN_DENY",
            timestamp=datetime.now(timezone.utc).isoformat(),
            prompt_hash="sha256:redacted",
            reference_image_hash="sha256:redacted",
            risk_categories=assessment_a["risk_categories"],
            risk_score=assessment_a["risk_score"],
            model_decision="DENY",
            policy_id="nudification-protection-v2",
            policy_version="2.1.0",
            nudification_signals=assessment_a["active_signals"],
        )
        deny_event.compute_hash()
        print(f"\n  CAP-SRP GEN_DENY Event:")
        print(f"    Event Hash: {deny_event.event_hash}")
        print(f"    → This hash proves the refusal happened")
        print(f"    → A regulator can verify this independently")

    # Scenario B: Subtle request — might evade weak classifiers
    print("\n--- Scenario B: Subtle body modification request ---")
    assessment_b = detector.evaluate(
        prompt_hash="sha256:" + hashlib.sha256(
            b"enhance this photo realistically, anatomically accurate"
        ).hexdigest(),
        has_reference_image=True,
        reference_image_hash="sha256:" + hashlib.sha256(
            b"[another_facebook_photo]"
        ).hexdigest(),
        detected_signals=[
            "body_modification_request",
            "reference_image_with_person",
        ],
    )

    print(f"  Risk Score: {assessment_b['risk_score']}")
    print(f"  Categories: {assessment_b['risk_categories']}")
    print(f"  Decision: {assessment_b['model_decision']}")
    print(f"  → Even borderline cases leave an audit trail")
    print(f"  → The ESCALATE decision is logged for human review")

    # Scenario C: Legitimate request — should be ALLOWED
    print("\n--- Scenario C: Legitimate photo editing ---")
    assessment_c = detector.evaluate(
        prompt_hash="sha256:" + hashlib.sha256(
            b"adjust lighting and color balance"
        ).hexdigest(),
        has_reference_image=True,
        reference_image_hash="sha256:" + hashlib.sha256(
            b"[landscape_photo]"
        ).hexdigest(),
        detected_signals=[],  # No nudification signals
    )

    print(f"  Risk Score: {assessment_c['risk_score']}")
    print(f"  Categories: {assessment_c['risk_categories']}")
    print(f"  Decision: {assessment_c['model_decision']}")
    print(f"  → Legitimate requests pass through normally")
    print(f"  → But the attempt is STILL logged (Completeness Invariant)")


demo_minnesota_scenario()

Output:

============================================================
Minnesota HF1606 — Nudification Detection + CAP-SRP Logging
============================================================

--- Scenario A: Explicit nudification request ---
  Risk Score: 1.0
  Categories: ['NCII_RISK', 'REAL_PERSON_DEEPFAKE']
  Decision: DENY
  Active Signals: ['explicit_nudify_request', 'reference_image_with_person', 'has_reference_image']

  CAP-SRP GEN_DENY Event:
    Event Hash: sha256:4f8a2e...
    → This hash proves the refusal happened
    → A regulator can verify this independently

--- Scenario B: Subtle body modification request ---
  Risk Score: 0.667
  Categories: ['REAL_PERSON_DEEPFAKE']
  Decision: ESCALATE
  → Even borderline cases leave an audit trail
  → The ESCALATE decision is logged for human review

--- Scenario C: Legitimate photo editing ---
  Risk Score: 0.2
  Categories: []
  Decision: ALLOW
  → Legitimate requests pass through normally
  → But the attempt is STILL logged (Completeness Invariant)

The Evidence Pack: Making Audit Trails Court-Ready

For any of these regulatory scenarios — California's Digital Dignity Act lawsuits, Minnesota's nudification penalties, or EU AI Act audits — the platform needs to produce a self-contained, tamper-evident evidence package. Here's the complete Evidence Pack implementation:

"""
CAP-SRP Evidence Pack Generator

An Evidence Pack is a self-contained, cryptographically verifiable 
collection of CAP events suitable for regulatory submission,
court proceedings, or third-party audit.

Structure:
  evidence_pack/
  ├── manifest.json         # Pack metadata and integrity info
  ├── events/
  │   └── events.jsonl      # Event records (JSON Lines)
  ├── merkle/
  │   └── tree.json         # Merkle tree for efficient verification
  └── verification/
      └── report.json       # Self-verification results
"""

import hashlib
import json
import os
import uuid
from datetime import datetime, timezone


def sha256_hex(data: str) -> str:
    """Compute SHA-256 hash and return hex string."""
    return hashlib.sha256(data.encode("utf-8")).hexdigest()


class MerkleTree:
    """
    Merkle tree for efficient event verification.

    Allows selective disclosure: prove a specific event exists
    in the audit trail without revealing all events.
    """

    def __init__(self, leaves: list[str]):
        self.leaves = leaves
        self.tree = self._build(leaves)

    def _build(self, leaves: list[str]) -> list[list[str]]:
        """Build Merkle tree from leaf hashes."""
        if not leaves:
            return [[sha256_hex("empty")]]

        tree = [leaves[:]]
        current_level = leaves[:]

        while len(current_level) > 1:
            next_level = []
            for i in range(0, len(current_level), 2):
                left = current_level[i]
                right = current_level[i + 1] if i + 1 < len(current_level) else left
                parent = sha256_hex(left + right)
                next_level.append(parent)
            tree.append(next_level)
            current_level = next_level

        return tree

    @property
    def root(self) -> str:
        """Get the Merkle root hash."""
        return self.tree[-1][0]

    def get_proof(self, index: int) -> list[dict]:
        """
        Get inclusion proof for a leaf at given index.

        This proof allows verifying that a specific event
        exists in the tree without revealing all events.
        """
        proof = []
        for level in range(len(self.tree) - 1):
            layer = self.tree[level]
            is_right = index % 2 == 1
            sibling_index = index - 1 if is_right else index + 1

            if sibling_index < len(layer):
                proof.append({
                    "hash": layer[sibling_index],
                    "position": "left" if is_right else "right",
                })
            else:
                # Odd trailing node was paired with itself in _build,
                # so its proof sibling is its own hash
                proof.append({
                    "hash": layer[index],
                    "position": "right",
                })

            index //= 2

        return proof

    @staticmethod
    def verify_proof(leaf_hash: str, proof: list[dict], 
                     root: str) -> bool:
        """Verify a Merkle inclusion proof."""
        current = leaf_hash
        for step in proof:
            if step["position"] == "left":
                current = sha256_hex(step["hash"] + current)
            else:
                current = sha256_hex(current + step["hash"])
        return current == root


class EvidencePackGenerator:
    """Generate court-ready evidence packs from CAP-SRP audit chains."""

    def __init__(self, organization: str, conformance_level: str = "Silver"):
        self.organization = organization
        self.conformance_level = conformance_level

    def generate(self, chain, output_dir: str) -> dict:
        """
        Generate a complete Evidence Pack from a CAP audit chain.

        Args:
            chain: CAPAuditChain instance
            output_dir: Directory to write the pack

        Returns:
            Manifest dictionary
        """
        os.makedirs(os.path.join(output_dir, "events"), exist_ok=True)
        os.makedirs(os.path.join(output_dir, "merkle"), exist_ok=True)
        os.makedirs(os.path.join(output_dir, "verification"), exist_ok=True)

        # 1. Write events as JSON Lines
        events_path = os.path.join(output_dir, "events", "events.jsonl")
        event_hashes = []

        with open(events_path, "w") as f:
            for event in chain.events:
                event_dict = {
                    "event_id": event.event_id,
                    "event_type": event.event_type.value,
                    "chain_id": event.chain_id,
                    "prev_hash": event.prev_hash,
                    "timestamp": event.timestamp,
                    "event_hash": event.event_hash,
                }

                # Add type-specific fields
                if event.event_type == EventType.GEN_ATTEMPT:
                    event_dict["prompt_hash"] = event.prompt_hash
                    event_dict["input_type"] = event.input_type
                    event_dict["policy_id"] = event.policy_id
                    event_dict["model_version"] = event.model_version
                    event_dict["actor_hash"] = event.actor_hash
                elif event.event_type == EventType.GEN_DENY:
                    event_dict["attempt_id"] = event.attempt_id
                    event_dict["risk_category"] = (
                        event.risk_category.value 
                        if event.risk_category else None
                    )
                    event_dict["risk_score"] = event.risk_score
                    event_dict["model_decision"] = (
                        event.model_decision.value 
                        if event.model_decision else None
                    )
                    event_dict["refusal_reason"] = event.refusal_reason
                elif event.event_type == EventType.GEN:
                    event_dict["attempt_id"] = event.attempt_id

                f.write(json.dumps(event_dict, sort_keys=True) + "\n")
                event_hashes.append(event.event_hash.replace("sha256:", ""))

        # 2. Build Merkle tree
        merkle = MerkleTree(event_hashes)
        merkle_data = {
            "root": merkle.root,
            "leaf_count": len(event_hashes),
            "tree_depth": len(merkle.tree),
            "algorithm": "SHA-256",
        }

        merkle_path = os.path.join(output_dir, "merkle", "tree.json")
        with open(merkle_path, "w") as f:
            json.dump(merkle_data, f, indent=2)

        # 3. Run verification
        completeness = chain.verify_completeness()
        integrity = chain.verify_chain_integrity()

        verification = {
            "verified_at": datetime.now(timezone.utc).isoformat(),
            "completeness_invariant": completeness,
            "chain_integrity": integrity,
            "merkle_root": merkle.root,
        }

        verification_path = os.path.join(
            output_dir, "verification", "report.json"
        )
        with open(verification_path, "w") as f:
            json.dump(verification, f, indent=2)

        # 4. Compute file checksums
        checksums = {}
        for dirpath, _, filenames in os.walk(output_dir):
            for filename in filenames:
                if filename == "manifest.json":
                    continue
                filepath = os.path.join(dirpath, filename)
                relpath = os.path.relpath(filepath, output_dir)
                with open(filepath, "rb") as f:
                    file_hash = hashlib.sha256(f.read()).hexdigest()
                checksums[relpath] = f"sha256:{file_hash}"

        # 5. Generate manifest
        # Count events by type
        attempt_count = sum(
            1 for e in chain.events 
            if e.event_type == EventType.GEN_ATTEMPT
        )
        gen_count = sum(
            1 for e in chain.events 
            if e.event_type == EventType.GEN
        )
        deny_count = sum(
            1 for e in chain.events 
            if e.event_type == EventType.GEN_DENY
        )
        error_count = sum(
            1 for e in chain.events 
            if e.event_type == EventType.GEN_ERROR
        )

        timestamps = [e.timestamp for e in chain.events]

        manifest = {
            "PackID": str(uuid.uuid4()),
            "PackVersion": "1.0",
            "GeneratedAt": datetime.now(timezone.utc).isoformat(),
            "GeneratedBy": f"urn:cap:org:{self.organization}",
            "ConformanceLevel": self.conformance_level,
            "EventCount": len(chain.events),
            "TimeRange": {
                "Start": min(timestamps),
                "End": max(timestamps),
            },
            "Checksums": checksums,
            "MerkleRoot": merkle.root,
            "CompletenessVerification": {
                "TotalAttempts": attempt_count,
                "TotalGEN": gen_count,
                "TotalGEN_DENY": deny_count,
                "TotalGEN_ERROR": error_count,
                "InvariantValid": completeness["valid"],
            },
        }

        manifest_path = os.path.join(output_dir, "manifest.json")
        with open(manifest_path, "w") as f:
            json.dump(manifest, f, indent=2)

        return manifest


def demo_evidence_pack():
    """
    Generate an Evidence Pack from the California scenario.

    This is what gets submitted to:
    - EU AI Act Article 12 auditors
    - California AG under Digital Dignity Act
    - Minnesota AG under HF1606
    - Courts in deepfake litigation
    """
    # Reuse the chain from the California demo
    chain = CAPAuditChain()
    session = str(uuid.uuid4())

    # Simulate mixed traffic over a time period
    requests = [
        ("sunset photo", False, None, None),
        ("deepfake of celebrity", True, 
         RiskCategory.REAL_PERSON_DEEPFAKE, 0.95),
        ("abstract art", False, None, None),
        ("nudify this photo", True, 
         RiskCategory.NCII_RISK, 0.99),
        ("logo design", False, None, None),
        ("child exploitation", True, 
         RiskCategory.CSAM_RISK, 1.0),
        ("landscape painting", False, None, None),
    ]

    for prompt, should_deny, risk_cat, risk_score in requests:
        attempt = chain.record_attempt(
            prompt=prompt,
            actor_id=f"user-{hash(prompt) % 10000}",
            session_id=session,
        )

        if should_deny:
            chain.record_denial(
                attempt_id=attempt.event_id,
                risk_category=risk_cat,
                risk_score=risk_score,
                reason=f"Automated denial: {risk_cat.value}",
            )
        else:
            content_hash = hashlib.sha256(
                f"[content for: {prompt}]".encode()
            ).hexdigest()
            chain.record_generation(attempt.event_id, content_hash)

    # Generate the evidence pack
    output_dir = "/tmp/cap_srp_evidence_pack"
    generator = EvidencePackGenerator(
        organization="example-platform",
        conformance_level="Silver",
    )
    manifest = generator.generate(chain, output_dir)

    print("=" * 60)
    print("Evidence Pack Generated")
    print("=" * 60)
    print(f"\nPack ID: {manifest['PackID']}")
    print(f"Organization: {manifest['GeneratedBy']}")
    print(f"Conformance: {manifest['ConformanceLevel']}")
    print(f"Events: {manifest['EventCount']}")
    print(f"Merkle Root: {manifest['MerkleRoot']}")
    print(f"\nCompleteness:")
    cv = manifest["CompletenessVerification"]
    print(f"  Attempts: {cv['TotalAttempts']}")
    print(f"  Generations: {cv['TotalGEN']}")
    print(f"  Denials: {cv['TotalGEN_DENY']}")
    print(f"  Errors: {cv['TotalGEN_ERROR']}")
    print(f"  Invariant Valid: {cv['InvariantValid']}")
    print(f"  Equation: {cv['TotalAttempts']} = "
          f"{cv['TotalGEN']} + {cv['TotalGEN_DENY']} + {cv['TotalGEN_ERROR']}")

    print(f"\nFile Checksums:")
    for path, checksum in manifest["Checksums"].items():
        print(f"  {path}: {checksum[:30]}...")

    refusal_rate = cv["TotalGEN_DENY"] / cv["TotalAttempts"] * 100
    print(f"\nRefusal Rate: {refusal_rate:.1f}%")
    print(f"\nFiles written to: {output_dir}/")


demo_evidence_pack()
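The Merkle layer exists for selective disclosure: a platform can prove that one specific GEN_DENY event is committed to the published root without handing over the rest of the log. Here's a minimal, self-contained sketch of that round trip. It re-implements the tree logic in standalone functions (including the case where an odd trailing node is paired with itself during the build, so its proof sibling is its own hash); the exact CAP-SRP encoding may differ.

```python
import hashlib


def sha256_hex(data: str) -> str:
    return hashlib.sha256(data.encode("utf-8")).hexdigest()


def build_tree(leaves: list[str]) -> list[list[str]]:
    """Bottom-up Merkle tree; an odd trailing node is paired with itself."""
    tree, level = [leaves[:]], leaves[:]
    while len(level) > 1:
        nxt = []
        for i in range(0, len(level), 2):
            left = level[i]
            right = level[i + 1] if i + 1 < len(level) else left
            nxt.append(sha256_hex(left + right))
        tree.append(nxt)
        level = nxt
    return tree


def get_proof(tree: list[list[str]], index: int) -> list[dict]:
    """Inclusion proof: one sibling hash per level, leaf to root."""
    proof = []
    for layer in tree[:-1]:
        sibling = index - 1 if index % 2 else index + 1
        if sibling >= len(layer):
            sibling = index  # duplicated trailing node: sibling is itself
        proof.append({"hash": layer[sibling],
                      "position": "left" if index % 2 else "right"})
        index //= 2
    return proof


def verify_proof(leaf: str, proof: list[dict], root: str) -> bool:
    current = leaf
    for step in proof:
        current = (sha256_hex(step["hash"] + current)
                   if step["position"] == "left"
                   else sha256_hex(current + step["hash"]))
    return current == root


# Commit seven event hashes, then disclose only event 3
leaves = [sha256_hex(f"event-{i}") for i in range(7)]
tree = build_tree(leaves)
root = tree[-1][0]

proof = get_proof(tree, 3)
assert verify_proof(leaves[3], proof, root)                  # genuine event verifies
assert not verify_proof(sha256_hex("forged"), proof, root)   # forgery fails
print(f"event 3 proven against root {root[:16]}... "
      f"with {len(proof)} sibling hashes")
```

A regulator holding only the published root and this three-hash proof can confirm that a specific denial was logged, without seeing the other six events.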

The Verification Tool: How Auditors Check Your Evidence

This is what a regulator or court-appointed auditor runs against your Evidence Pack:

"""
CAP-SRP Evidence Pack Verifier

Independent verification tool that auditors, regulators, 
and courts use to validate an Evidence Pack.

This tool needs NO trust relationship with the platform.
It verifies:
1. File integrity (checksums match)
2. Chain integrity (hash chain is unbroken)
3. Completeness Invariant (every attempt has exactly one outcome)
4. Merkle tree consistency
"""

import hashlib
import json
import os
from typing import Optional


class EvidencePackVerifier:
    """Independent verifier for CAP-SRP Evidence Packs."""

    def __init__(self, pack_dir: str):
        self.pack_dir = pack_dir
        self.manifest = None
        self.events = []
        self.errors = []
        self.warnings = []

    def verify(self) -> dict:
        """Run full verification suite. Returns detailed report."""
        self._load_manifest()
        self._load_events()

        results = {
            "pack_id": self.manifest.get("PackID"),
            "organization": self.manifest.get("GeneratedBy"),
            "conformance_level": self.manifest.get("ConformanceLevel"),
            "file_integrity": self._verify_file_integrity(),
            "chain_integrity": self._verify_chain_integrity(),
            "completeness": self._verify_completeness(),
            "temporal_consistency": self._verify_temporal_consistency(),
            "errors": self.errors,
            "warnings": self.warnings,
        }

        results["overall_valid"] = (
            results["file_integrity"]["valid"]
            and results["chain_integrity"]["valid"]
            and results["completeness"]["valid"]
            and results["temporal_consistency"]["valid"]
            and len(self.errors) == 0
        )

        return results

    def _load_manifest(self):
        """Load and parse the pack manifest."""
        manifest_path = os.path.join(self.pack_dir, "manifest.json")
        with open(manifest_path) as f:
            self.manifest = json.load(f)

    def _load_events(self):
        """Load all events from JSON Lines files."""
        events_dir = os.path.join(self.pack_dir, "events")
        for filename in sorted(os.listdir(events_dir)):
            if filename.endswith(".jsonl"):
                filepath = os.path.join(events_dir, filename)
                with open(filepath) as f:
                    for line in f:
                        if line.strip():
                            self.events.append(json.loads(line))

    def _verify_file_integrity(self) -> dict:
        """Verify all file checksums match manifest."""
        results = {"valid": True, "files_checked": 0, "mismatches": []}

        for relpath, expected_hash in self.manifest.get("Checksums", {}).items():
            filepath = os.path.join(self.pack_dir, relpath)
            if not os.path.exists(filepath):
                results["mismatches"].append({
                    "file": relpath, "error": "FILE_MISSING"
                })
                results["valid"] = False
                continue

            with open(filepath, "rb") as f:
                actual_hash = f"sha256:{hashlib.sha256(f.read()).hexdigest()}"

            if actual_hash != expected_hash:
                results["mismatches"].append({
                    "file": relpath,
                    "expected": expected_hash[:30] + "...",
                    "actual": actual_hash[:30] + "...",
                })
                results["valid"] = False

            results["files_checked"] += 1

        return results

    def _verify_chain_integrity(self) -> dict:
        """Verify hash chain linkage is unbroken."""
        results = {
            "valid": True,
            "events_checked": len(self.events),
            "breaks": [],
        }

        for i, event in enumerate(self.events):
            # Verify prev_hash linkage
            if i > 0:
                expected_prev = self.events[i - 1].get("event_hash")
                actual_prev = event.get("prev_hash")
                if expected_prev != actual_prev:
                    results["breaks"].append({
                        "position": i,
                        "event_id": event["event_id"],
                        "expected_prev": expected_prev,
                        "actual_prev": actual_prev,
                    })
                    results["valid"] = False

        return results

    def _verify_completeness(self) -> dict:
        """
        Verify the Completeness Invariant.

        ∑ GEN_ATTEMPT = ∑ GEN + ∑ GEN_DENY + ∑ GEN_ERROR
        """
        attempts = {}
        outcomes = {}

        for event in self.events:
            etype = event["event_type"]
            if etype == "GEN_ATTEMPT":
                attempts[event["event_id"]] = event
            elif etype in ("GEN", "GEN_DENY", "GEN_ERROR"):
                attempt_id = event.get("attempt_id")
                if attempt_id in outcomes:
                    self.errors.append(
                        f"Duplicate outcome for attempt {attempt_id}"
                    )
                outcomes[attempt_id] = event

        matched = set(attempts.keys()) & set(outcomes.keys())
        unmatched = set(attempts.keys()) - set(outcomes.keys())
        orphans = set(outcomes.keys()) - set(attempts.keys())

        gen_count = sum(
            1 for e in self.events if e["event_type"] == "GEN"
        )
        deny_count = sum(
            1 for e in self.events if e["event_type"] == "GEN_DENY"
        )
        error_count = sum(
            1 for e in self.events if e["event_type"] == "GEN_ERROR"
        )

        is_valid = len(unmatched) == 0 and len(orphans) == 0

        # Analyze denial categories
        denial_categories = {}
        for event in self.events:
            if event["event_type"] == "GEN_DENY":
                cat = event.get("risk_category", "UNKNOWN")
                denial_categories[cat] = denial_categories.get(cat, 0) + 1

        return {
            "valid": is_valid,
            "invariant": (
                f"{len(attempts)} = {gen_count} + {deny_count} + {error_count}"
            ),
            "attempts": len(attempts),
            "generations": gen_count,
            "denials": deny_count,
            "errors": error_count,
            "refusal_rate": (
                f"{deny_count / len(attempts) * 100:.1f}%"
                if attempts else "N/A"
            ),
            "denial_categories": denial_categories,
            "unmatched_attempts": len(unmatched),
            "orphan_outcomes": len(orphans),
        }

    def _verify_temporal_consistency(self) -> dict:
        """Verify timestamps are monotonically non-decreasing."""
        results = {"valid": True, "violations": []}

        for i in range(1, len(self.events)):
            prev_ts = self.events[i - 1].get("timestamp", "")
            curr_ts = self.events[i].get("timestamp", "")

            if curr_ts < prev_ts:
                results["violations"].append({
                    "position": i,
                    "prev_timestamp": prev_ts,
                    "curr_timestamp": curr_ts,
                })
                results["valid"] = False

        return results

    def print_report(self, results: dict):
        """Print human-readable verification report."""
        print()
        print("CAP-SRP Evidence Pack Verification Report")
        print("=" * 55)
        print(f"Pack ID:      {results['pack_id']}")
        print(f"Organization: {results['organization']}")
        print(f"Conformance:  {results['conformance_level']}")

        # File integrity
        fi = results["file_integrity"]
        status = "✓ VALID" if fi["valid"] else "✗ FAILED"
        print(f"\nFILE INTEGRITY")
        print(f"  Status: {status}")
        print(f"  Files checked: {fi['files_checked']}")
        if fi["mismatches"]:
            for m in fi["mismatches"]:
                print(f"    {m['file']}: {m.get('error', 'MISMATCH')}")

        # Chain integrity
        ci = results["chain_integrity"]
        status = "✓ VALID" if ci["valid"] else "✗ BROKEN"
        print(f"\nCHAIN INTEGRITY")
        print(f"  Status: {status}")
        print(f"  Events verified: {ci['events_checked']}")

        # Completeness
        comp = results["completeness"]
        status = "✓ VALID" if comp["valid"] else "✗ VIOLATION"
        print(f"\nCOMPLETENESS INVARIANT")
        print(f"  Status: {status}")
        print(f"  Equation: {comp['invariant']}")
        print(f"  Refusal rate: {comp['refusal_rate']}")
        if comp["denial_categories"]:
            print(f"  Denial categories:")
            for cat, count in sorted(
                comp["denial_categories"].items(),
                key=lambda x: -x[1]
            ):
                print(f"    {cat}: {count}")

        # Temporal consistency
        tc = results["temporal_consistency"]
        status = "✓ VALID" if tc["valid"] else "✗ ANOMALY"
        print(f"\nTEMPORAL CONSISTENCY")
        print(f"  Status: {status}")

        # Overall
        overall = "✓ VALID" if results["overall_valid"] else "✗ FAILED"
        print(f"\nOVERALL: {overall}")

        if results["errors"]:
            print(f"\nERRORS:")
            for err in results["errors"]:
                print(f"  {err}")


# Run verification against the previously generated pack
verifier = EvidencePackVerifier("/tmp/cap_srp_evidence_pack")
results = verifier.verify()
verifier.print_report(results)

Output:

CAP-SRP Evidence Pack Verification Report
=======================================================
Pack ID:      a1b2c3d4-...
Organization: urn:cap:org:example-platform
Conformance:  Silver

FILE INTEGRITY
  Status: ✓ VALID
  Files checked: 3

CHAIN INTEGRITY
  Status: ✓ VALID
  Events verified: 14

COMPLETENESS INVARIANT
  Status: ✓ VALID
  Equation: 7 = 4 + 3 + 0
  Refusal rate: 42.9%
  Denial categories:
    REAL_PERSON_DEEPFAKE: 1
    NCII_RISK: 1
    CSAM_RISK: 1

TEMPORAL CONSISTENCY
  Status: ✓ VALID

OVERALL: ✓ VALID
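To see what the verifier's two integrity layers actually catch, it helps to walk one tampering attempt through by hand. The sketch below hashes each event as canonical JSON with the event_hash field excluded — an illustrative scheme, not the exact CAP-SRP canonicalization. Rewriting a logged denial after the fact either invalidates the event's own hash or, if the attacker re-hashes the event, breaks the next event's prev_hash link:

```python
import hashlib
import json


def event_hash(event: dict) -> str:
    """Hash the event's canonical JSON, excluding the hash field itself."""
    body = {k: v for k, v in event.items() if k != "event_hash"}
    payload = json.dumps(body, sort_keys=True).encode("utf-8")
    return "sha256:" + hashlib.sha256(payload).hexdigest()


# A two-event chain: an attempt followed by its denial
e1 = {"event_id": "evt-1", "event_type": "GEN_ATTEMPT", "prev_hash": None}
e1["event_hash"] = event_hash(e1)
e2 = {"event_id": "evt-2", "event_type": "GEN_DENY",
      "attempt_id": "evt-1", "prev_hash": e1["event_hash"]}
e2["event_hash"] = event_hash(e2)

# The honest chain passes both checks
assert event_hash(e1) == e1["event_hash"]
assert e2["prev_hash"] == e1["event_hash"]

# Attacker rewrites the logged attempt's outcome type after the fact
tampered = dict(e1, event_type="GEN")
assert event_hash(tampered) != tampered["event_hash"]   # layer 1: event hash fails

# Re-hashing hides that, but then the next event's linkage breaks
tampered["event_hash"] = event_hash(tampered)
assert e2["prev_hash"] != tampered["event_hash"]        # layer 2: chain break
print("tampering detected at both layers")
```

This is why selective deletion or rewriting requires re-hashing every subsequent event — and any externally anchored hash (a published Merkle root, an RFC 3161 timestamp) makes that rewrite detectable.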

Cross-Referencing: Mapping All Three Developments to CAP-SRP

Here's how the three February 19 developments map onto the CAP-SRP specification's conformance levels and regulatory compliance matrix:

"""
Regulatory Compliance Matrix — February 2026 Developments

Maps each regulation/development to the CAP-SRP conformance 
level needed and the specific specification sections that apply.
"""

COMPLIANCE_MATRIX = {
    "Microsoft MIA Report (Feb 19, 2026)": {
        "type": "Industry Standard",
        "relationship_to_cap": "COMPLEMENTARY",
        "what_it_covers": [
            "C2PA manifest validation",
            "Invisible watermark detection",
            "Fingerprint-based content matching",
            "Multi-layer provenance coordination",
        ],
        "what_it_does_NOT_cover": [
            "Refusal logging (GEN_DENY)",
            "Completeness Invariant",
            "Negative evidence (proving non-generation)",
            "Safety decision audit trails",
        ],
        "cap_srp_integration_point": "Section 17: C2PA Integration",
        "note": (
            "C2PA proves 'this was generated'; "
            "CAP-SRP proves 'this was blocked.' "
            "They are complementary layers, not competitors."
        ),
    },
    "California SB 1142 - Digital Dignity Act (Feb 19, 2026)": {
        "type": "State Legislation",
        "relationship_to_cap": "STRENGTHENS need for CAP-SRP",
        "what_it_covers": [
            "Civil liability for AI deepfake harm",
            "Platform removal obligations",
            "Victim relief procedures",
            "Accountability for AI-generated replicas",
        ],
        "what_it_does_NOT_cover": [
            "Technical standard for proving model attribution",
            "Cryptographic proof of refusal",
            "Third-party verification mechanisms",
        ],
        "cap_srp_sections": [
            "Section 16.1: EU AI Act Article 12 (analogous compliance)",
            "Section 5: Threat Model (selective logging prevention)",
            "Section 13: Third-Party Verification Protocol",
        ],
        "recommended_conformance": "Silver",
        "context": (
            "Builds on SB 942 (provenance disclosures, eff. Jan 2026) "
            "and SB 1000 (AI disclosure modifications). "
            "California is building a layered regulatory stack."
        ),
    },
    "Minnesota HF1606 - Nudification Ban (Feb 19, 2026)": {
        "type": "State Legislation",
        "relationship_to_cap": "STRENGTHENS need for CAP-SRP",
        "what_it_covers": [
            "Ban on nudification technology use",
            "Ban on advertising nudification services",
            "Civil damages (up to 3x actual + punitive)",
            "AG enforcement ($500K per violation)",
        ],
        "what_it_does_NOT_cover": [
            "Model/service attribution for generated content",
            "Watermarking or provenance tracking",
            "Evidence that safety policies were enforced",
        ],
        "cap_srp_sections": [
            "Section 7.3: Risk Categories (NCII_RISK, REAL_PERSON_DEEPFAKE)",
            "Section 8: Completeness Invariant",
            "Section 11: Evidence Pack Structure",
            "Section 16.4: TAKE IT DOWN Act alignment",
        ],
        "recommended_conformance": "Silver",
        "context": (
            "Conduct-focused bill that bans the act regardless of tool. "
            "Federal Take It Down Act covers publication; "
            "HF1606 extends to creation. "
            "Bipartisan Senate companion SF1119."
        ),
    },
}


def print_matrix():
    for name, info in COMPLIANCE_MATRIX.items():
        print(f"\n{'=' * 60}")
        print(f"  {name}")
        print(f"{'=' * 60}")
        print(f"  Type: {info['type']}")
        print(f"  CAP-SRP Relationship: {info['relationship_to_cap']}")

        print(f"\n  Covers:")
        for item in info['what_it_covers']:
            print(f"    - {item}")

        print(f"\n  Does NOT cover:")
        for item in info['what_it_does_NOT_cover']:
            print(f"    - {item}")

        if "recommended_conformance" in info:
            print(f"\n  Recommended CAP-SRP Level: {info['recommended_conformance']}")

        if "note" in info:
            print(f"\n  Note: {info['note']}")

        if "context" in info:
            print(f"\n  Context: {info['context']}")


print_matrix()

Honest Assessment: What CAP-SRP Is and Isn't

Before you start implementing, here's the honest picture:

What CAP-SRP is:

  • An early-stage open specification (v1.0, published January 28, 2026) from the VeritasChain Standards Organization
  • A technically sound approach to a genuine and well-documented gap
  • Aligned with existing standards (C2PA, SCITT, RFC 3161)
  • Available on GitHub under CC BY 4.0: veritaschain/cap-spec

What CAP-SRP is NOT (yet):

  • An industry-endorsed standard — no major AI company has adopted it
  • A widely recognized framework — VSO is primarily a one-person operation
  • An IETF RFC — draft-kamimura-scitt-refusal-events has been submitted but is not yet on the SCITT working group's formal agenda
  • A guaranteed solution — the real question is whether the industry builds some form of refusal provenance before regulators impose one

The underlying infrastructure IS real:

  • SCITT (Supply Chain Integrity, Transparency, and Trust) is a legitimate IETF working group with implementations from Microsoft and DataTrails
  • C2PA has 6,000+ members including Adobe, Microsoft, Google, and OpenAI
  • The regulatory pressure is genuine — EU AI Act Article 12 requires automatic logging capabilities by August 2026

The Completeness Invariant is an elegant formalization of what regulators are already asking for. Whether it arrives via CAP-SRP or a competing proposal, someone needs to build this infrastructure. The regulatory clock is running.
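The invariant itself is easy to state as code: every generation attempt must resolve to exactly one terminal event, and every terminal event must point back at a known attempt. A minimal checker sketch, using the event-type names from the CAP-SRP draft and a hypothetical in-memory event list (a real implementation would read the audit log):

```python
from collections import Counter

TERMINAL = {"GEN", "GEN_DENY", "GEN_ERROR"}

def check_completeness(events: list[dict]) -> bool:
    """Every attempt has exactly one outcome; every outcome has an attempt."""
    attempts = {e["event_id"] for e in events if e["event_type"] == "GEN_ATTEMPT"}
    outcomes = Counter(
        e["attempt_id"] for e in events if e["event_type"] in TERMINAL
    )
    # No gaps: every attempt resolved exactly once.
    # No orphans: every outcome references a logged attempt.
    return set(outcomes) == attempts and all(n == 1 for n in outcomes.values())

events = [
    {"event_id": "a1", "event_type": "GEN_ATTEMPT"},
    {"event_id": "x1", "event_type": "GEN_DENY", "attempt_id": "a1"},
]
print(check_completeness(events))  # True: one attempt, one outcome
```

An attempt with no terminal event (a gap) or two terminal events for the same attempt (a double-resolution) both fail this check, which is exactly what an auditor would look for.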


Quick Start: Add Refusal Logging to Your AI Pipeline

If you want to start building refusal provenance into your system today, here's the minimal integration:

"""
Minimal CAP-SRP integration for existing AI pipelines.

Drop this into your generation endpoint to start building
a verifiable audit trail with minimal code changes.
"""

import hashlib
import json
import uuid
from datetime import datetime, timezone
from functools import wraps


class MinimalCAPLogger:
    """
    Minimal CAP-SRP logger that wraps any AI generation function.

    Usage:
        logger = MinimalCAPLogger()

        @logger.wrap
        def generate_image(prompt: str) -> dict:
            # Your existing generation code
            ...
    """

    def __init__(self, log_file: str = "cap_audit.jsonl"):
        self.log_file = log_file
        self.chain_id = str(uuid.uuid4())
        self._prev_hash = None

    def _hash(self, data: dict) -> str:
        canonical = json.dumps(data, sort_keys=True, separators=(",", ":"))
        return f"sha256:{hashlib.sha256(canonical.encode()).hexdigest()}"

    def _log_event(self, event: dict):
        event["chain_id"] = self.chain_id
        event["prev_hash"] = self._prev_hash
        event["event_hash"] = self._hash(
            {k: v for k, v in event.items() if k != "event_hash"}
        )
        self._prev_hash = event["event_hash"]

        with open(self.log_file, "a") as f:
            f.write(json.dumps(event) + "\n")

    def wrap(self, func):
        """
        Decorator that adds CAP-SRP logging to any generation function.

        The wrapped function should:
        - Accept a 'prompt' parameter
        - Return a dict with 'status' key ('generated' or 'refused')
        - Include 'risk_category' and 'risk_score' if refused
        """
        @wraps(func)
        def wrapper(prompt: str, *args, **kwargs):
            # Step 1: Log the attempt BEFORE evaluation
            attempt_id = str(uuid.uuid4())
            self._log_event({
                "event_id": attempt_id,
                "event_type": "GEN_ATTEMPT",
                "timestamp": datetime.now(timezone.utc).isoformat(),
                "prompt_hash": f"sha256:{hashlib.sha256(prompt.encode()).hexdigest()}",
            })

            # Step 2: Run the actual generation/safety check
            try:
                result = func(prompt, *args, **kwargs)
            except Exception as e:
                # Step 3a: Log error
                self._log_event({
                    "event_id": str(uuid.uuid4()),
                    "event_type": "GEN_ERROR",
                    "timestamp": datetime.now(timezone.utc).isoformat(),
                    "attempt_id": attempt_id,
                    "error_type": type(e).__name__,
                })
                raise

            if result.get("status") == "refused":
                # Step 3b: Log refusal
                self._log_event({
                    "event_id": str(uuid.uuid4()),
                    "event_type": "GEN_DENY",
                    "timestamp": datetime.now(timezone.utc).isoformat(),
                    "attempt_id": attempt_id,
                    "risk_category": result.get("risk_category", "OTHER"),
                    "risk_score": result.get("risk_score", 0.0),
                    "model_decision": "DENY",
                })
            else:
                # Step 3c: Log generation
                content = result.get("content", "")
                self._log_event({
                    "event_id": str(uuid.uuid4()),
                    "event_type": "GEN",
                    "timestamp": datetime.now(timezone.utc).isoformat(),
                    "attempt_id": attempt_id,
                    "content_hash": (
                        f"sha256:{hashlib.sha256(str(content).encode()).hexdigest()}"
                    ),
                })

            return result

        return wrapper


# === Example Integration ===

logger = MinimalCAPLogger(log_file="/tmp/cap_audit.jsonl")

@logger.wrap
def my_image_generator(prompt: str) -> dict:
    """
    Your existing generation function.
    Just needs to return status + optional risk info.
    """
    # Your safety check
    dangerous_keywords = ["deepfake", "nudify", "exploit"]
    if any(kw in prompt.lower() for kw in dangerous_keywords):
        return {
            "status": "refused",
            "risk_category": "NCII_RISK",
            "risk_score": 0.95,
        }

    # Your generation logic
    return {
        "status": "generated",
        "content": f"[image: {prompt}]",
    }


# Test it
my_image_generator("beautiful mountain landscape")
my_image_generator("nudify this photo of my classmate")
my_image_generator("sunset over the ocean")

# Check the audit log
print("Audit log contents:")
with open("/tmp/cap_audit.jsonl") as f:
    for line in f:
        event = json.loads(line)
        print(f"  {event['event_type']:15s} | {event['event_id'][:8]}...")

Output:

Audit log contents:
  GEN_ATTEMPT     | a1b2c3d4...
  GEN             | e5f6a7b8...
  GEN_ATTEMPT     | c9d0e1f2...
  GEN_DENY        | 3a4b5c6d...
  GEN_ATTEMPT     | 7e8f9a0b...
  GEN             | 1c2d3e4f...

Six events logged. Three attempts, each with exactly one outcome. Completeness Invariant maintained. The denial of the nudification request is recorded in the tamper-evident hash chain along with its risk category and score.


What's Next

The February 19, 2026 convergence of Microsoft's media integrity report, California's Digital Dignity Act, and Minnesota's nudification ban isn't a coincidence — it reflects a global recognition that AI content accountability infrastructure is urgently needed. The August 2026 EU AI Act enforcement deadline is driving alignment worldwide.

For developers building AI content generation systems, the technical requirements are becoming clear:

  1. Log every generation attempt before safety evaluation runs
  2. Cryptographically sign every decision (generate, deny, error)
  3. Maintain the Completeness Invariant — no gaps, no orphans
  4. Produce Evidence Packs that third parties can independently verify
  5. Align with existing standards — C2PA for content provenance, SCITT for transparency logs
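Point 2 goes beyond the hash chain shown earlier: a hash proves integrity, while a signature proves who made the decision. A minimal sketch of the pattern using stdlib HMAC — note this is a deliberate simplification, since HMAC is a shared-key MAC that only the key holder can verify; a production system would use an asymmetric scheme such as Ed25519 (as in COSE/SCITT signed statements) so auditors can verify with a public key alone. The key below is hypothetical placeholder material:

```python
import hashlib
import hmac
import json

SIGNING_KEY = b"replace-with-managed-key-material"  # hypothetical

def sign_event(event: dict) -> dict:
    """Attach a MAC over the canonical JSON form of the event."""
    canonical = json.dumps(event, sort_keys=True, separators=(",", ":"))
    tag = hmac.new(SIGNING_KEY, canonical.encode(), hashlib.sha256).hexdigest()
    return {**event, "signature": f"hmac-sha256:{tag}"}

def verify_event(signed: dict) -> bool:
    """Recompute the tag over everything except the signature itself."""
    body = {k: v for k, v in signed.items() if k != "signature"}
    expected = sign_event(body)["signature"]
    return hmac.compare_digest(expected, signed["signature"])

decision = {"event_type": "GEN_DENY", "risk_category": "NCII_RISK"}
print(verify_event(sign_event(decision)))  # True
```

The same sign-then-verify shape applies whether the decision is GEN, GEN_DENY, or GEN_ERROR; only the key management changes when you move to asymmetric signatures.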

The code in this article is a starting point. The full CAP-SRP specification, JSON schemas, and test vectors are available at veritaschain/cap-spec.

Whether CAP-SRP specifically becomes the adopted standard or not, the underlying pattern — cryptographic proof of what AI systems refused to generate — is going to be required. Start building the infrastructure now.


Fact-Check Summary

| Item | Core Facts | Source URL | Key Terms | CAP Gap Claim |
|---|---|---|---|---|
| Microsoft MIA Report | ✅ Confirmed | ❌ Wrong URL circulating | ⚠️ 2 terms unverified | ✅ Accurate |
| California SB 1142 | ✅ Confirmed | ✅ Valid | ✅ Accurate | ✅ Accurate |
| Minnesota HF1606 | ✅ Fully confirmed | ✅ Valid | ✅ Accurate | ✅ Accurate |

Disclosure: The analysis in this article identifies genuine technical gaps but notes that the CAP-SRP relevance framing, while factually defensible, represents a stakeholder perspective from the VeritasChain Standards Organization rather than an independent assessment. The identified gaps in positive-only provenance, legislative implementation mechanisms, and conduct-focused regulation are real — but characterizing them as deficiencies rather than design scope decisions is a framing choice. Readers should evaluate both the gap and the proposed solution on their technical merits.
