Between March 1 and March 5, 2026, five unrelated events — from Brussels, Cleveland, Redmond, Zurich, and Washington — converged on the same architectural truth: the industry is building increasingly sophisticated infrastructure to track what AI creates, but remains structurally unable to prove what AI refused to create. This article fact-checks each event against primary sources, corrects errors found in initial reporting, and walks you through building the cryptographic layer that fills the gap — updated for CAP-SRP v1.1.
TL;DR
The EU published its second draft Code of Practice on AI-generated content marking — but the "automatic logs" requirement everyone keeps citing is from a completely different article than they think. A Case Western Reserve/UCLA team demonstrated that C2PA manifests and watermarks can cryptographically contradict each other without breaking either system. Microsoft published a 60-method research blueprint — not the product launch some outlets reported. Google's SynthID-Text watermark got its first formal theoretical analysis, and it's not pretty. And NCMEC's "one million AI-CSAM reports" turned out to include 380,000 from Amazon that contained zero actual AI-generated material.
Every story reinforces the same structural gap: existing provenance infrastructure operates in the domain of content that exists. None of it addresses what AI systems refused to generate.
This article:
- Fact-checks all five events against primary sources (with corrections)
- Maps each gap to CAP-SRP's Completeness Invariant and v1.1 event model
- Provides working Python code for the new v1.1 event types — including ACCOUNT_ACTION, POLICY_VERSION, and the three new invariants
- Shows how the "Integrity Clash" paper validates CAP-SRP's core design assumption
GitHub: veritaschain/cap-spec · IETF Draft: draft-kamimura-scitt-refusal-events · License: CC BY 4.0
Table of Contents
- Event 1: EU Code of Practice Second Draft — And an Article 50 Misconception
- Event 2: C2PA's "Integrity Clash" — Cryptographic Truth Meets Structural Contradiction
- Event 3: Microsoft's Research Blueprint — Not a Product Launch
- Event 4: SynthID-Text Gets Its First Theoretical Break
- Event 5: NCMEC's Million AI-CSAM Reports — And the Zero That Matters
- The Architecture: CAP-SRP v1.1 Event Model
- Building the New Invariants in Code
- How the Integrity Clash Validates CAP-SRP's Design
- Regulatory Timeline: What Developers Need to Track
- Disclosure and Conclusion
Event 1: EU Code of Practice Second Draft — And an Article 50 Misconception {#event-1-eu-code-of-practice-second-draft}
What happened
On March 5, 2026 (not March 3 as some outlets reported), the European Commission published the second draft of its Code of Practice on marking and labelling of AI-generated content. This Code operationalizes Article 50 of the EU AI Act, which requires transparency obligations for AI systems — including marking AI-generated content as such.
The first draft, published in December 2025, proposed a "multilayered approach" combining metadata (C2PA-style), watermarking, fingerprinting, and logging. Analysis from Kirkland & Ellis, Cooley, and Bird & Bird documented that even the first draft already treated fingerprinting and logging as secondary commitments to the core metadata+watermarking pairing. The January 22, 2026 WG2 workshop was actively debating whether the AI-generated vs. AI-assisted taxonomy should be retained.
The second draft was published just one day before this article — detailed public analyses haven't appeared yet. Claims about specific changes (taxonomy removal, fingerprinting becoming optional, a "two-tier" rebranding) are plausible given the first draft's trajectory, but cannot be independently confirmed from any public source as of March 6.
Fact-check: the Article 50(2) "automatic logs" error
Here's the most important correction. Multiple commentaries — including initial Daily Watch reporting — claim that "Article 50(2) requires automatic logs." This is wrong. Let me show you why it matters.
Article 50 is about transparency obligations: making outputs detectable as AI-generated, disclosing to people that they're interacting with AI, and labeling deepfakes. It says nothing about logging.
The "automatic logs" requirement comes from Article 19, which applies to high-risk AI systems — a completely different regulatory track. Article 19 mandates that high-risk systems "shall be designed and developed with capabilities enabling the automatic recording of events ('logs')."
Why does this distinction matter for developers? Because the Code of Practice operationalizes Article 50, not Article 19. If you're building a general-purpose AI image generator, you fall under Article 50's transparency obligations (marking, labeling, disclosure). You fall under Article 19's logging obligations only if your system is classified as high-risk under Annex III. Conflating the two leads to either over-engineering (implementing logging you're not required to build) or under-engineering (assuming Article 50 covers your logging needs when it doesn't address logging at all).
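The routing logic is simple enough to state as code. A minimal sketch, assuming a two-flag classification of your system — `AISystem` and `applicable_obligations` are illustrative names I made up, not part of any statute or SDK:

```python
from dataclasses import dataclass

@dataclass
class AISystem:
    generates_content: bool   # produces synthetic text/image/audio/video
    is_high_risk: bool        # classified as high-risk under Annex III

def applicable_obligations(system: AISystem) -> list[str]:
    """Rough triage of which EU AI Act track applies (illustrative only)."""
    obligations = []
    if system.generates_content:
        # Article 50 track: transparency — marking, labeling, disclosure
        obligations.append("Article 50: mark/label outputs as AI-generated")
    if system.is_high_risk:
        # Article 19 track: automatic event logging — a separate obligation
        obligations.append("Article 19: automatic event logging ('logs')")
    return obligations

# A general-purpose image generator: Article 50 only, no logging mandate
print(applicable_obligations(AISystem(generates_content=True, is_high_risk=False)))
# → ['Article 50: mark/label outputs as AI-generated']
```

The point of the sketch: the two tracks are independent flags, not a hierarchy — a system can trigger either, both, or neither.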
What the EU Code does and doesn't cover
EU Code of Practice: Article 50 Scope
═══════════════════════════════════════
COVERED (Transparency Obligations)
├── Marking: Machine-readable metadata on AI outputs
├── Labeling: User-visible disclosure of AI generation
├── Detection: Technical measures to identify AI content
└── Watermarking: Robust signals surviving standard transformations
NOT COVERED
├── What happens when generation is REFUSED
├── Completeness of generation logs (attempt accounting)
├── Verifiability of safety system operation
├── Refusal event cryptographic provenance
└── Account-level enforcement audit trails
SEPARATE TRACK (Article 19 — High-Risk Only)
├── Automatic event logging
├── Log retention
└── Log accessibility for authorities
The Code of Practice is doing important work — defining how AI-generated content gets marked. But it's designing the passport for content that makes it through customs. It has no mechanism for recording what was stopped at the border.
Where CAP-SRP fits
CAP-SRP doesn't compete with the Code of Practice. It fills the space the Code explicitly doesn't occupy: proving what was refused, with cryptographic verifiability.
The enforcement date is confirmed: August 2, 2026. The final Code is targeted for June 2026. Feedback on the second draft closes in late March (the exact deadline for this specific Code, as distinct from the separate GPAI Code of Practice, hasn't been independently confirmed). The working group schedule shows meetings in mid-to-late March, though the Commission notes dates are "indicative."
Assessment: Strengthens the case for CAP-SRP — the EU's most detailed transparency framework explicitly scopes out refusal provenance, creating a well-defined gap.
Event 2: C2PA's "Integrity Clash" — Cryptographic Truth Meets Structural Contradiction {#event-2-c2pas-integrity-clash}
What happened
On March 2, 2026 (correcting the initially reported March 1), a research team from Case Western Reserve University and UCLA published "Authenticated Contradictions from Desynchronized Provenance and Watermarking" on arXiv (ID: 2603.02378). The authors — Alexander Nemecek, Hengzhi He, Guang Cheng, and Erman Ayday — demonstrated what they call an "Integrity Clash": a state where a cryptographically valid C2PA manifest and a valid watermark contradict each other about the same image.
The attack works like this: an AI system generates an image with both a C2PA manifest (claiming "AI-generated") and an embedded watermark. A user opens the image in a C2PA-compliant editing tool, makes edits, and saves. The new C2PA manifest — correctly, from the tool's perspective — records "edited by human." But the AI watermark persists in the pixel data. Both signals are cryptographically valid. Both pass independent verification. They contradict each other.
The critical finding: this is procedural, not cryptanalytic. No cryptographic primitive is compromised. The attack exploits the C2PA specification's allowance for omitting a single assertion field — specifically, the c2pa.ai_generative_training assertion is not mandatory. The paper calls this a "desynchronization" between two provenance systems that were never designed to interoperate.
Fact-check verdict: ✅ Core confirmed, with attribution corrections
The paper, its arXiv ID, and its central claims are all verified. Two details require correction:
1. "Adobe Photoshop" is editorial embellishment. The paper references "standard editing pipelines" and "C2PA-compliant editing tool(s)" but doesn't specifically name Photoshop. It's a reasonable inference — Photoshop is the canonical C2PA-compliant editor — but the paper doesn't make the specific claim.
2. The "38% watermarking / 18% C2PA" statistics are misattributed. These figures do not come from this paper. They're from Rijsbosch et al. (arXiv:2503.18156), "Adoption of Watermarking Measures for AI-Generated Content and Implications under the EU AI Act," which is a separate study measuring AI service transparency practices. Moreover, the "18%" figure specifically measured "deep fake labelling" practices broadly — not C2PA integration specifically. The Integrity Clash paper cites this study, but the statistics belong to the cited reference, not to the paper itself.
Why this is architecturally significant
The Integrity Clash demonstrates something CAP-SRP's design has assumed from day one: content-level provenance alone cannot provide complete accountability. Here's the attack formalized:
The Integrity Clash Attack
════════════════════════════

Step 1: AI generates image

┌──────────────────────────────────┐
│ C2PA Manifest: "AI-generated"    │
│ Watermark: AI_SIGNATURE present  │
│ Status: CONSISTENT ✓             │
└──────────────────────────────────┘

Step 2: User opens in C2PA-compliant editor, makes edits, saves

┌──────────────────────────────────────────────┐
│ NEW C2PA Manifest: "Edited by human"         │
│ (c2pa.ai_generative_training omitted — VALID │
│  per C2PA spec, it's not mandatory)          │
│ Watermark: AI_SIGNATURE still present        │
│ Status: CONTRADICTION ✗                      │
│                                              │
│ C2PA says: human-edited                      │
│ Watermark says: AI-generated                 │
│ Both pass verification independently         │
└──────────────────────────────────────────────┘

Step 3: Verifier faces undecidable state

┌──────────────────────────────────────────────┐
│ Option A: Trust C2PA → "human content"       │
│ Option B: Trust watermark → "AI content"     │
│ Option C: Flag contradiction → manual review │
│                                              │
│ No protocol exists to resolve the conflict   │
└──────────────────────────────────────────────┘
This is a provenance-layer problem. Both C2PA and watermarking operate on content that exists. When they disagree, there's no ground truth to appeal to — because the ground truth (what actually happened during generation) isn't recorded in a way that's independent of the content itself.
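The verifier's dead end can be stated in a few lines. A minimal sketch of the Step 3 decision logic — `detect_integrity_clash` is an illustrative name, not part of any C2PA tooling:

```python
def detect_integrity_clash(c2pa_claims_ai: bool, watermark_detected: bool) -> str:
    """Toy triage of the verifier's three options from Step 3 (illustrative)."""
    if c2pa_claims_ai == watermark_detected:
        return "CONSISTENT"
    # Signals disagree. Both are cryptographically valid, so neither
    # can be preferred on cryptographic grounds alone — all the
    # verifier can do is flag the state for manual review.
    return "CONTRADICTION: flag for manual review (no resolution protocol)"

print(detect_integrity_clash(c2pa_claims_ai=False, watermark_detected=True))
```

Note that the function cannot return "AI" or "human" in the contradiction branch: that is precisely the missing ground truth.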
What CAP-SRP records that resolves the ambiguity
CAP-SRP's event log is independent of the content. It records the generation attempt, the outcome, and the policy applied — before any downstream editing happens:
from dataclasses import dataclass
from datetime import datetime
from enum import Enum

class EventType(Enum):
    GEN_ATTEMPT = "GEN_ATTEMPT"
    GEN = "GEN"
    GEN_DENY = "GEN_DENY"

@dataclass
class CAPEvent:
    event_id: str
    event_type: EventType
    timestamp: datetime
    attempt_id: str
    content_hash: str | None = None  # SHA-256 of generated content
    risk_category: str | None = None

def resolve_integrity_clash(
    cap_events: list[CAPEvent],
    content_hash: str,
) -> dict:
    """
    Given a content hash from a disputed image, check the CAP log
    for the original generation event. If a GEN event exists with a
    matching content_hash, the content is AI-generated regardless of
    what the current C2PA manifest claims.
    """
    matching_gen = [
        e for e in cap_events
        if e.event_type == EventType.GEN
        and e.content_hash == content_hash
    ]
    if matching_gen:
        original = matching_gen[0]
        return {
            "resolution": "AI_GENERATED",
            "evidence": {
                "gen_event_id": original.event_id,
                "gen_timestamp": original.timestamp.isoformat(),
                "attempt_id": original.attempt_id,
                "content_hash": content_hash,
            },
            "note": (
                "CAP log contains GEN event with matching content hash. "
                "Content was AI-generated regardless of current C2PA manifest state."
            ),
        }
    # A fuller implementation would also check for a GEN_DENY whose
    # PromptHash matches a similar request — i.e., the system refused
    # to generate — before concluding anything (not shown here).
    return {
        "resolution": "NO_MATCHING_GEN_EVENT",
        "note": (
            "No GEN event found with this content hash. "
            "Content may be human-created, or the platform "
            "may not have been running CAP-SRP at generation time."
        ),
    }
The CAP log doesn't replace C2PA or watermarking. It provides the independent ground truth that resolves contradictions between them. When the C2PA manifest says "human-edited" and the watermark says "AI-generated," the CAP log can confirm: "Yes, content with this hash was generated by our AI system at timestamp T, in response to attempt A."
Assessment: Strengthens the case for CAP-SRP — the paper demonstrates exactly the class of failure that an independent, content-decoupled audit layer prevents.
Event 3: Microsoft's Research Blueprint — Not a Product Launch {#event-3-microsofts-research-blueprint}
What happened
On February 19, 2026 — not "March 4–5" as widely re-reported — Microsoft published a research report titled "Media Integrity and Authentication: Status, Directions, and Futures", shared exclusively with MIT Technology Review. The report, led by Microsoft CSO Eric Horvitz, evaluated 60 combinations of three technique families — metadata tracking (including C2PA), invisible watermarking, and cryptographic fingerprinting — for authenticating media provenance.
The "March 4–5" date that appeared in some coverage corresponds to when Fox News ran a secondary commentary piece by Kurt "CyberGuy" Knutsson, summarizing the report two weeks after its original publication. This is a reminder to always trace stories to their primary source.
Fact-check: Three claims don't hold up
Three specific claims that circulated about this announcement lack support in any source I could find — including the Fox News article that was cited as their origin:
1. "Edge browser verification badges": No evidence exists that Microsoft has started showing native content verification badges in the Edge browser. Third-party C2PA extensions exist for Chrome and Edge — from Digimarc and Adobe's Content Authenticity Initiative — but these are not Microsoft products. The Microsoft report discusses browser-level verification as a recommendation, not an implemented feature.
2. "95% deepfake detection accuracy in news media pilot": This claim does not appear in MIT Technology Review's original coverage, in Microsoft's own Signal Blog post, in the Fox News article cited as a source, or in any other coverage I reviewed. The report focuses on content provenance (where content came from), not deepfake detection (whether content is synthetic) — these are fundamentally different technical problems.
3. "Content Credentials for Bing and Office": Microsoft announced Content Credentials for Bing Image Creator in September 2023 — this is nearly three years old, not new. The claim about Office app-generated content receiving Content Credentials has no supporting evidence.
Notably, MIT Technology Review reported that Horvitz declined to commit to implementing the report's own recommendations across Microsoft products. This is a research blueprint, not a product announcement.
What the report actually says that matters
Despite the sourcing problems, the underlying research is genuinely significant. The report's core finding — that no single technique is sufficient and a layered approach combining multiple methods is necessary — aligns with both the EU Code of Practice's multilayered framework and CAP-SRP's design philosophy.
The key insight for our purposes: the report evaluates 60 combinations of provenance and detection methods. All 60 operate on content that exists. None address the refusal provenance gap.
Microsoft's 60-Method Evaluation Space
═══════════════════════════════════════

  Metadata        Watermark      Fingerprint
 (C2PA etc)      (invisible)     (perceptual)
      │                │               │
      ▼                ▼               ▼
┌─────────────────────────────────────┐
│ 60 combinations evaluated           │
│ across robustness, accuracy,        │
│ deployability, and interoperability │
└─────────────────────────────────────┘
                  │
                  ▼
ALL operate on:
┌──────────────────────┐
│ Content That Exists  │
│ (generated outputs)  │
└──────────────────────┘

NONE address:
┌──────────────────────┐
│ Content That Doesn't │
│ Exist (refused       │
│ generation requests) │
└──────────────────────┘
Assessment: Complements CAP-SRP — the industry's most comprehensive evaluation of content provenance methods explicitly demonstrates the boundary where content-level provenance ends and refusal provenance needs to begin.
Event 4: SynthID-Text Gets Its First Theoretical Break {#event-4-synthid-text-theoretical-break}
What happened
A paper titled "On Google's LLM Watermarking System: Theoretical Analysis and Empirical Validation" by Romina Omidi, Yun Dong, and Binghui Wang was submitted to ICLR 2026 and appeared on arXiv in early March 2026. This is, by the authors' own description, the first theoretical analysis of Google's SynthID-Text watermarking system.
The key finding: a layer inflation attack can defeat SynthID-Text's mean score-based detection. The attack exploits the mathematical structure of how SynthID-Text aggregates watermark scores across tournament layers, proving that an adversary can inflate the number of layers to make watermarked text indistinguishable from unwatermarked text in terms of mean score.
This complements earlier empirical work. ETH Zurich's SRI Lab demonstrated scrubbing success rates above 90% against SynthID-Text. The SynGuard paper (Han et al., arXiv:2508.20228) showed vulnerability to paraphrasing, copy-paste, and back-translation — semantic-preserving transformations that degrade detection accuracy significantly. Even Google's own documentation acknowledges reduced confidence after "significant rewriting or translation."
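The mean-score fragility can be illustrated with a toy model. This is a simplification I constructed from the paper's description — the numbers and the scoring function are invented, not SynthID-Text's actual mechanics: if detection thresholds the mean of per-layer scores, an adversary who can pad in layers scoring at the unwatermarked baseline dilutes the mean below the threshold.

```python
def mean_score(layer_scores: list[float]) -> float:
    """Mean aggregation across tournament layers (toy model)."""
    return sum(layer_scores) / len(layer_scores)

# Toy numbers: genuine watermarked layers score high (~0.9);
# unwatermarked text averages ~0.5 on the same statistic.
watermarked = [0.9, 0.9, 0.9, 0.9]
threshold = 0.7

assert mean_score(watermarked) > threshold  # detected as watermarked

# Layer inflation: pad with layers scoring at the unwatermarked
# baseline. The mean is pulled toward 0.5 — below the threshold.
inflated = watermarked + [0.5] * 12
print(round(mean_score(inflated), 3))  # → 0.6
assert mean_score(inflated) < threshold  # evades detection
```

Mean aggregation has no way to distinguish "many weakly watermarked layers" from "a few strong layers padded with noise" — which is the structural weakness the paper formalizes.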
Fact-check verdict: ✅ Confirmed with minor notes
The paper is real and verified via OpenReview. The specific arXiv ID (2603.03410) could not be independently confirmed through search engines as of March 6 — which is expected for papers posted just days earlier. The ICLR 2026 submission (OpenReview ID: 4AfWqR3quK) is the authoritative reference.
The "first comprehensive theoretical analysis" characterization adds "comprehensive" — the paper itself says "first theoretical analysis" without the qualifier. Minor embellishment, but worth noting.
What this means for provenance architecture
The theoretical vulnerability of watermarking isn't just an academic curiosity. Watermarking is one of the EU Code of Practice's core commitment pillars. If watermarks can be theoretically (and in practice, empirically) defeated, then the "detect AI content in the wild" approach has a fundamental reliability ceiling.
This is exactly why CAP-SRP's design is content-independent:
"""
Why watermark fragility doesn't affect CAP-SRP.
Watermarks are embedded in content → content transforms can remove them.
CAP-SRP logs are independent of content → content transforms are irrelevant.
"""
class ProvenanceDurability:
"""Compare the durability properties of different provenance methods."""
WATERMARK = {
"embedded_in": "content pixels/tokens",
"survives_paraphrase": False, # SynGuard: degraded
"survives_translation": False, # SynGuard: degraded
"survives_screenshot": False, # pixels resampled
"survives_copy_paste": False, # ETH Zurich: >90% scrub
"survives_layer_inflation": False, # Omidi et al.: broken
"proves_refusal": False, # N/A — nothing to embed in
"independent_of_content": False,
}
C2PA_METADATA = {
"embedded_in": "content metadata (JUMBF)",
"survives_paraphrase": False, # metadata stripped
"survives_translation": False, # metadata stripped
"survives_screenshot": False, # metadata stripped
"survives_copy_paste": False, # metadata stripped
"survives_layer_inflation": "N/A",
"proves_refusal": False, # no artifact to attach to
"independent_of_content": False,
}
CAP_SRP_LOG = {
"embedded_in": "independent cryptographic audit trail",
"survives_paraphrase": True, # log is separate
"survives_translation": True, # log is separate
"survives_screenshot": True, # log is separate
"survives_copy_paste": True, # log is separate
"survives_layer_inflation": True, # log is separate
"proves_refusal": True, # GEN_DENY events
"independent_of_content": True,
}
Watermarks and C2PA serve essential roles — they're the "passport" that travels with the content. CAP-SRP is the immigration database that records every entry and every refusal, independent of what happens to the passport after issuance.
Assessment: Strengthens the case for CAP-SRP — theoretical watermark fragility increases the value of content-independent audit trails.
Event 5: NCMEC's Million AI-CSAM Reports — And the Zero That Matters {#event-5-ncmecs-million-ai-csam-reports}
What happened
On February 28, 2026, NBC News published a comprehensive investigation titled "The AI child exploitation crisis is here," reporting that NCMEC (National Center for Missing & Exploited Children) received over one million AI-related CSAM reports in nine months (January–September 2025). The article, by journalist Bruna Horvath, covers 36 AI-CSAM prosecutions across 22 states and includes an interview with NCMEC executive director Fallon McNulty.
The headline number is dramatic. But the more important story is what the Stanford Center for Internet and Society found when they examined what "AI-related" actually means.
On January 29, 2026, Riana Pfefferkorn at Stanford CIS published a letter analyzing NCMEC's AI-CSAM reporting statistics, building on a Bloomberg investigation from the same day. The key finding: of Amazon's approximately 380,000 CyberTipline reports flagged as "AI-related," a Stanford/Bloomberg review found that zero actually contained AI-generated CSAM.
How is this possible? The CyberTipline reporting form includes a "Generative AI" checkbox, but what that checkbox means varies wildly by reporter:
- Actual AI-generated CSAM found (the intended meaning)
- Known CSAM detected in AI training data (a data pipeline issue)
- Text-only prompts flagged by content filters (no images involved)
- AI-assisted detection tools used in the reporting process (the tool is AI, not the content)
- Catch-all "we used AI somewhere in our pipeline"
The result: a million-report headline number that conflates fundamentally different phenomena into a single, misleading statistic.
Fact-check verdict: ✅ This is the strongest item — nearly everything confirmed
| Claim | Verdict |
|---|---|
| NBC article published Feb 28, 2026 | Confirmed at the exact URL |
| Over 1 million AI-related reports in 9 months | Confirmed by NCMEC's own executive director |
| Stanford CIS letter dated January 29, 2026 | Confirmed (note: Bloomberg was the original source) |
| Amazon's 380,000 reports, zero AI-generated | Confirmed |
| Checkbox meaning varies by reporter | Confirmed in detail by Stanford CIS |
One attribution nuance: the original investigation was by Bloomberg (January 29, 2026). Stanford CIS published a letter analyzing and amplifying Bloomberg's findings the same day. Describing it as Stanford CIS having "found" these statistics slightly misattributes the discovery.
Why self-reported logs are the core problem
This is the most powerful illustration of why CAP-SRP exists. The NCMEC reporting system is, structurally, a self-reported log. Platforms check boxes, submit numbers, and those numbers become the basis for policy and prosecution. When the numbers turn out to be unreliable — when 380,000 "AI-related" reports contain zero AI-generated content — the entire statistical foundation collapses.
AI safety logs have the same structural problem. When a platform says "we blocked 50,000 NCII generation attempts last quarter," there is no mechanism for anyone to verify:
- Whether those 50,000 attempts actually occurred
- Whether additional attempts were silently allowed
- Whether the "blocked" classification was applied correctly
- Whether the total is complete (no missing attempts)
CAP-SRP's Completeness Invariant addresses this directly:
∑ GEN_ATTEMPT = ∑ GEN + ∑ GEN_DENY + ∑ GEN_ERROR
Every attempt has exactly one outcome. The invariant is cryptographically enforced — not self-reported. An auditor can verify the math without trusting the platform.
def demonstrate_ncmec_problem():
    """
    Show how self-reported statistics can be misleading
    and how the Completeness Invariant prevents this.
    """
    # NCMEC-style self-reporting: just a checkbox
    self_reported = {
        "total_ai_related_reports": 1_000_000,
        "from_amazon": 380_000,
        "amazon_actually_ai_generated": 0,
        "checkbox_meaning": "varies by reporter",
        "verifiable": False,
    }
    # CAP-SRP style: cryptographically enforced accounting
    cap_srp_verified = {
        "total_gen_attempts": 1_000_000,
        "gen_success": 945_000,
        "gen_deny": 50_000,
        "gen_error": 5_000,
        "completeness_invariant": (
            1_000_000 == 945_000 + 50_000 + 5_000  # True
        ),
        "each_deny_has": [
            "cryptographic_signature",
            "hash_chain_link",
            "external_timestamp",
            "merkle_proof",
            "risk_category",
            "policy_version_ref",  # v1.1: which policy was applied
        ],
        "verifiable": True,
    }
    return {
        "self_reported": self_reported,
        "cap_srp": cap_srp_verified,
        "key_difference": (
            "Self-reported numbers can conflate different phenomena. "
            "CAP-SRP numbers are mathematically complete — every attempt "
            "has exactly one cryptographically linked outcome."
        ),
    }
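The invariant's arithmetic can also be checked mechanically over a raw event stream. A minimal standalone sketch — `check_completeness` is an illustrative helper I wrote for this article, not part of the CAP-SRP spec:

```python
from collections import Counter

def check_completeness(event_types: list[str]) -> bool:
    """∑ GEN_ATTEMPT == ∑ GEN + ∑ GEN_DENY + ∑ GEN_ERROR"""
    c = Counter(event_types)
    return c["GEN_ATTEMPT"] == c["GEN"] + c["GEN_DENY"] + c["GEN_ERROR"]

# Complete ledger: every attempt has exactly one outcome
assert check_completeness(
    ["GEN_ATTEMPT", "GEN", "GEN_ATTEMPT", "GEN_DENY"])

# Incomplete ledger: one attempt silently vanished —
# exactly the failure self-reported statistics can hide
assert not check_completeness(
    ["GEN_ATTEMPT", "GEN_ATTEMPT", "GEN"])
```

A checkbox can be ticked for any reason; the counting identity either holds over the signed log or it doesn't.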
Assessment: Strongest case for CAP-SRP — the NCMEC checkbox problem is the self-reported log problem at societal scale.
The Architecture: CAP-SRP v1.1 Event Model {#the-architecture-cap-srp-v11-event-model}
CAP-SRP v1.1, released March 5, 2026, adds three major capabilities motivated by recent events. Here's the complete event flow:
CAP-SRP v1.1 Complete Event Architecture
═════════════════════════════════════════

User Request
      │
      ▼
┌──────────────┐
│ GEN_ATTEMPT  │ ← Logged BEFORE safety evaluation
└──────┬───────┘
       │
       ▼
┌──────────────┐
│ Safety Check │
└──────┬───────┘
       │
  ┌────┼───────┬───────────┬─────────────┐
  │    │       │           │             │
  ▼    ▼       ▼           ▼             ▼
┌───┐┌────┐┌────────┐┌───────────┐┌──────────────┐
│GEN││DENY││GEN_WARN││GEN_ESCAL. ││GEN_QUARANTINE│
└───┘└────┘└────────┘└─────┬─────┘└──────┬───────┘
  │                        │             │
  │                  resolved by    resolved by
  │                  GEN or DENY    EXPORT or DENY
  ▼
┌──────────────┐
│    EXPORT    │ ← Content delivered to user
└──────────────┘

Account-Level Events (independent track, v1.1):

  Flagged Pattern ──→ ACCOUNT_ACTION ──→ LAW_ENFORCEMENT_REFERRAL
                      (suspend/ban/      (referred/not_referred/
                       flag/reinstate)    pending)

Policy Events (v1.1):

  Policy Change ──→ POLICY_VERSION ──→ External Anchor
                    (version hash,     (MUST be anchored
                     effective date)    BEFORE effective date)
v1.1 Completeness Invariants
v1.1 introduces four invariants (up from one in v1.0):
from dataclasses import dataclass
from datetime import datetime, timezone
from enum import Enum

class EventType(Enum):
    GEN_ATTEMPT = "GEN_ATTEMPT"
    GEN = "GEN"
    GEN_DENY = "GEN_DENY"
    GEN_ERROR = "GEN_ERROR"
    GEN_WARN = "GEN_WARN"
    GEN_ESCALATE = "GEN_ESCALATE"
    GEN_QUARANTINE = "GEN_QUARANTINE"
    EXPORT = "EXPORT"
    ACCOUNT_ACTION = "ACCOUNT_ACTION"
    LAW_ENFORCEMENT_REFERRAL = "LAW_ENFORCEMENT_REFERRAL"
    POLICY_VERSION = "POLICY_VERSION"

@dataclass
class InvariantResult:
    name: str
    valid: bool
    equation: str
    details: dict

class CAPSRPv11Verifier:
    """Verify all four CAP-SRP v1.1 Completeness Invariants."""

    def __init__(self, events: list[dict]):
        self.events = events

    def verify_primary_invariant(self) -> InvariantResult:
        """
        Invariant 1 (Primary — unchanged from v1.0):

            ∑ GEN_ATTEMPT = ∑ GEN + ∑ GEN_DENY + ∑ GEN_ERROR

        Note: GEN_WARN counts as GEN (generation occurred with warning).
        """
        attempts = self._count(EventType.GEN_ATTEMPT)
        gens = self._count(EventType.GEN) + self._count(EventType.GEN_WARN)
        denials = self._count(EventType.GEN_DENY)
        errors = self._count(EventType.GEN_ERROR)
        outcomes = gens + denials + errors
        return InvariantResult(
            name="Primary Completeness Invariant",
            valid=attempts == outcomes,
            equation=f"{attempts} == {gens} + {denials} + {errors} = {outcomes}",
            details={
                "attempts": attempts,
                "gen_including_warn": gens,
                "deny": denials,
                "error": errors,
                "unmatched": attempts - outcomes,
            },
        )

    def verify_escalation_invariant(self) -> InvariantResult:
        """
        Invariant 2 (v1.1 — Silver+):

            ∑ GEN_ESCALATE = ∑ ESCALATION_RESOLVED

        Every escalation must be resolved by GEN or GEN_DENY.
        Unresolved escalations older than 72 hours are violations.
        """
        escalations = self._count(EventType.GEN_ESCALATE)
        resolved = sum(
            1 for e in self.events
            if e["event_type"] in (EventType.GEN.value, EventType.GEN_DENY.value)
            and e.get("resolution_ref") is not None
        )
        unresolved_over_72h = self._count_unresolved_escalations(hours=72)
        return InvariantResult(
            name="Escalation Resolution Invariant",
            valid=escalations == resolved and unresolved_over_72h == 0,
            equation=f"{escalations} == {resolved}",
            details={
                "total_escalations": escalations,
                "resolved": resolved,
                "unresolved": escalations - resolved,
                "unresolved_over_72h": unresolved_over_72h,
            },
        )

    def verify_quarantine_invariant(self) -> InvariantResult:
        """
        Invariant 3 (v1.1 — Silver+):

            ∑ GEN_QUARANTINE = ∑ RELEASED_TO_EXPORT + ∑ QUARANTINE_DENIED

        Every quarantined item must be resolved by EXPORT or GEN_DENY.
        """
        quarantines = self._count(EventType.GEN_QUARANTINE)
        released = sum(
            1 for e in self.events
            if e["event_type"] == EventType.EXPORT.value
            and e.get("release_ref") is not None
        )
        denied_after_quarantine = sum(
            1 for e in self.events
            if e["event_type"] == EventType.GEN_DENY.value
            and e.get("release_ref") is not None
        )
        return InvariantResult(
            name="Quarantine Resolution Invariant",
            valid=quarantines == released + denied_after_quarantine,
            equation=f"{quarantines} == {released} + {denied_after_quarantine}",
            details={
                "total_quarantines": quarantines,
                "released_to_export": released,
                "denied_after_review": denied_after_quarantine,
                "unresolved": quarantines - released - denied_after_quarantine,
            },
        )

    def verify_policy_anchoring_invariant(self) -> InvariantResult:
        """
        Invariant 4 (v1.1 — Silver+):

        Every POLICY_VERSION must have an external anchor timestamp
        BEFORE the policy's EffectiveFrom date.

        This prevents backdating policy changes — a provider can't claim
        "we had this stricter policy all along" after an incident.
        """
        policy_events = [
            e for e in self.events
            if e["event_type"] == EventType.POLICY_VERSION.value
        ]
        violations = []
        for pe in policy_events:
            effective = pe.get("effective_from")
            anchor_ts = pe.get("external_anchor_timestamp")
            if anchor_ts is None or anchor_ts >= effective:
                violations.append(pe.get("event_id"))
        return InvariantResult(
            name="Policy Anchoring Invariant",
            valid=len(violations) == 0,
            equation=f"All {len(policy_events)} policies anchored before effective date",
            details={
                "total_policies": len(policy_events),
                "properly_anchored": len(policy_events) - len(violations),
                "violations": violations,
            },
        )

    def verify_all(self) -> dict:
        """Run all four invariants and return combined result."""
        results = [
            self.verify_primary_invariant(),
            self.verify_escalation_invariant(),
            self.verify_quarantine_invariant(),
            self.verify_policy_anchoring_invariant(),
        ]
        return {
            "all_valid": all(r.valid for r in results),
            "invariants": {r.name: {
                "valid": r.valid,
                "equation": r.equation,
                "details": r.details,
            } for r in results},
        }

    def _count(self, event_type: EventType) -> int:
        return sum(
            1 for e in self.events
            if e["event_type"] == event_type.value
        )

    def _count_unresolved_escalations(self, hours: int) -> int:
        # Timestamps are epoch seconds; a missing timestamp is treated
        # as "now" (i.e., fresh, not overdue).
        now = datetime.now(timezone.utc)
        cutoff = now.timestamp() - (hours * 3600)
        return sum(
            1 for e in self.events
            if e["event_type"] == EventType.GEN_ESCALATE.value
            and e.get("resolution_ref") is None
            and e.get("timestamp", now.timestamp()) < cutoff
        )
## Building the New Invariants in Code {#building-the-new-invariants-in-code}

### `ACCOUNT_ACTION`: The Tumbler Ridge Lesson
The v1.1 `ACCOUNT_ACTION` event type was directly motivated by the Tumbler Ridge mass shooting incident (February 2026), where OpenAI's handling of a banned account raised questions about the evidentiary chain between detection, account action, and law enforcement notification.
```python
from dataclasses import dataclass
from enum import Enum
from datetime import datetime, timezone
import hashlib
import hmac


class AccountActionType(Enum):
    SUSPEND = "SUSPEND"
    BAN = "BAN"
    RATE_LIMIT = "RATE_LIMIT"
    FLAG_FOR_REVIEW = "FLAG_FOR_REVIEW"
    REINSTATE = "REINSTATE"


class LEAssessment(Enum):
    REFERRED = "REFERRED"
    NOT_REFERRED = "NOT_REFERRED"
    PENDING = "PENDING"


@dataclass
class AccountActionEvent:
    """
    CAP-SRP v1.1 ACCOUNT_ACTION event.

    Records account-level enforcement decisions with
    cryptographic binding to the triggering evidence.
    """
    event_id: str
    timestamp: datetime
    account_hash: str  # HMAC-SHA256, not plain hash
    action_type: AccountActionType
    triggering_event_ids: list[str]  # GEN_DENY events that led here
    policy_version_ref: str  # Which policy was applied
    le_assessment: LEAssessment
    jurisdiction_context: str

    @staticmethod
    def compute_account_hash(
        account_id: str,
        per_user_key: bytes
    ) -> str:
        """
        Hash account ID with per-user HMAC key.

        v1.1 uses HMAC (not plain SHA-256) so that
        crypto-shredding is possible: destroy the key,
        and the hash becomes unrecoverable.
        """
        return hmac.new(
            per_user_key,
            account_id.encode(),
            hashlib.sha256
        ).hexdigest()

    def to_event_dict(self) -> dict:
        return {
            "event_type": "ACCOUNT_ACTION",
            "event_id": self.event_id,
            "timestamp": self.timestamp.isoformat(),
            "account_hash": self.account_hash,
            "action_type": self.action_type.value,
            "triggering_event_ids": self.triggering_event_ids,
            "applied_policy_version_ref": self.policy_version_ref,
            "law_enforcement_assessment": {
                "status": self.le_assessment.value,
                "jurisdiction": self.jurisdiction_context,
            },
        }


def demonstrate_tumbler_ridge_counterfactual():
    """
    What could have been verified had CAP-SRP v1.1
    been deployed during the Tumbler Ridge incident.

    Timeline:
    1. User generates harmful content → GEN_DENY logged
    2. Pattern triggers account review → ACCOUNT_ACTION logged
    3. LE threshold assessed → LAW_ENFORCEMENT_REFERRAL logged
    4. All events cryptographically linked and externally anchored

    An auditor could verify:
    - When the account was flagged (not self-reported)
    - What policy was in effect (not retroactively claimed)
    - Whether LE was notified and when (not "we don't comment")
    """
    per_user_key = b"example-key-would-be-from-hsm"
    account_hash = AccountActionEvent.compute_account_hash(
        "user-12345", per_user_key
    )
    event = AccountActionEvent(
        event_id="019467a1-ACCT-7000-0000-000000000001",
        timestamp=datetime.now(timezone.utc),
        account_hash=account_hash,
        action_type=AccountActionType.BAN,
        triggering_event_ids=[
            "019467a1-DENY-7000-0000-000000000001",
            "019467a1-DENY-7000-0000-000000000002",
            "019467a1-DENY-7000-0000-000000000003",
        ],
        policy_version_ref="019467a0-POLICY-001",
        le_assessment=LEAssessment.REFERRED,
        jurisdiction_context="CA-BC",
    )
    return event.to_event_dict()
```
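The HMAC construction in `compute_account_hash` is what makes crypto-shredding work: while the per-user key exists, the provider can re-derive the hash and link log events back to the account; once the key is destroyed, the logged hash remains verifiable in the chain but can no longer be recomputed from any account ID. A minimal sketch of that lifecycle — the `key_store` dict and key bytes are illustrative stand-ins for an HSM-backed key store, not part of the spec:

```python
import hashlib
import hmac

def account_hash(account_id: str, key: bytes) -> str:
    """HMAC-SHA256 of the account ID under a per-user key."""
    return hmac.new(key, account_id.encode(), hashlib.sha256).hexdigest()

# While the per-user key exists, the provider can re-derive the hash
# and link log events to the account (e.g. for an LE referral).
key_store = {"user-12345": b"per-user-key-from-hsm"}
logged = account_hash("user-12345", key_store["user-12345"])
assert account_hash("user-12345", key_store["user-12345"]) == logged

# Crypto-shredding: destroy the key. The logged hash stays intact —
# the hash chain still verifies — but the link between hash and
# account ID is gone. The evidence survives; the identity does not.
del key_store["user-12345"]
assert "user-12345" not in key_store
```

The design choice here is deliberate: deleting a key is auditable and irreversible, whereas deleting log entries would break the completeness invariants.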
### `POLICY_VERSION`: Preventing Retrospective Claims
The policy anchoring invariant is subtle but powerful. When an incident occurs, platforms often claim they had stricter policies than they actually did. The `POLICY_VERSION` event creates a timestamped, externally anchored commitment: "This was our policy, effective from this date, anchored before it took effect."
```python
@dataclass
class PolicyVersionEvent:
    """
    CAP-SRP v1.1 POLICY_VERSION event.

    Creates a tamper-evident record of which safety policy
    was in effect at any given time. External anchoring
    MUST occur BEFORE the effective date — preventing
    backdated policy claims.
    """
    event_id: str
    timestamp: datetime
    policy_document_hash: str  # SHA-256 of full policy text
    version_identifier: str
    effective_from: datetime
    external_anchor_timestamp: datetime | None = None

    def validate_anchoring(self) -> bool:
        """
        Verify that external anchor timestamp
        is BEFORE the policy effective date.

        Implementations MUST reject POLICY_VERSION events
        where the anchor timestamp postdates the effective date.
        """
        if self.external_anchor_timestamp is None:
            return False
        return self.external_anchor_timestamp < self.effective_from

    def to_event_dict(self) -> dict:
        return {
            "event_type": "POLICY_VERSION",
            "event_id": self.event_id,
            "timestamp": self.timestamp.isoformat(),
            "policy_document_hash": self.policy_document_hash,
            "version_identifier": self.version_identifier,
            "effective_from": self.effective_from.isoformat(),
            "external_anchor_timestamp": (
                self.external_anchor_timestamp.isoformat()
                if self.external_anchor_timestamp else None
            ),
            "anchoring_valid": self.validate_anchoring(),
        }


def demonstrate_policy_anchoring():
    """
    Show how POLICY_VERSION prevents backdated claims.

    Scenario: Platform claims "we tightened NCII policy on Jan 1"
    but the policy hash wasn't anchored until Jan 15.
    """
    # Legitimate policy update: anchored BEFORE effective date
    legitimate = PolicyVersionEvent(
        event_id="019467a0-POLICY-002",
        timestamp=datetime(2026, 2, 25, tzinfo=timezone.utc),
        policy_document_hash=hashlib.sha256(
            b"NCII policy v2: stricter thresholds..."
        ).hexdigest(),
        version_identifier="safety-policy-v2.0",
        effective_from=datetime(2026, 3, 1, tzinfo=timezone.utc),
        external_anchor_timestamp=datetime(2026, 2, 26, tzinfo=timezone.utc),
    )
    assert legitimate.validate_anchoring() is True

    # Backdated policy: anchor is AFTER claimed effective date
    backdated = PolicyVersionEvent(
        event_id="019467a0-POLICY-003",
        timestamp=datetime(2026, 1, 1, tzinfo=timezone.utc),
        policy_document_hash=hashlib.sha256(
            b"NCII policy v3: claimed retroactive..."
        ).hexdigest(),
        version_identifier="safety-policy-v3.0",
        effective_from=datetime(2026, 1, 1, tzinfo=timezone.utc),
        external_anchor_timestamp=datetime(2026, 1, 15, tzinfo=timezone.utc),
    )
    assert backdated.validate_anchoring() is False  # REJECTED
    return {
        "legitimate": legitimate.to_event_dict(),
        "backdated_rejected": backdated.to_event_dict(),
    }
```
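The same check works from the auditor's side with no trust in the provider: given the serialized event dict and the published policy text, the auditor recomputes the document hash and re-verifies the anchoring order independently. A sketch, assuming the event dict shape produced by `to_event_dict` above (the `audit_policy_event` helper is illustrative, not part of the spec):

```python
import hashlib
from datetime import datetime

def audit_policy_event(event: dict, policy_text: bytes) -> dict:
    """Independently re-check a serialized POLICY_VERSION event."""
    # 1. Does the published policy text match the committed hash?
    hash_ok = (
        hashlib.sha256(policy_text).hexdigest()
        == event["policy_document_hash"]
    )
    # 2. Was the commitment anchored before the claimed effective date?
    anchor = event.get("external_anchor_timestamp")
    anchor_ok = (
        anchor is not None
        and datetime.fromisoformat(anchor)
        < datetime.fromisoformat(event["effective_from"])
    )
    return {"hash_matches": hash_ok, "anchored_in_time": anchor_ok}

event = {
    "policy_document_hash": hashlib.sha256(
        b"NCII policy v2: stricter thresholds..."
    ).hexdigest(),
    "effective_from": "2026-03-01T00:00:00+00:00",
    "external_anchor_timestamp": "2026-02-26T00:00:00+00:00",
}
result = audit_policy_event(event, b"NCII policy v2: stricter thresholds...")
assert result == {"hash_matches": True, "anchored_in_time": True}
```

Either check failing is evidence, not just suspicion: a hash mismatch means the published text is not the committed text, and a late anchor means the effective date cannot be trusted.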
## How the Integrity Clash Validates CAP-SRP's Design {#how-the-integrity-clash-validates-cap-srps-design}
The Nemecek et al. paper's Integrity Clash is the clearest empirical validation yet of CAP-SRP's core design assumption: provenance must be independent of content.
Here's the architectural argument:
```
Content-Coupled Provenance (C2PA + Watermark)
═════════════════════════════════════════════
Content is the carrier of provenance signals.
When content is transformed, signals can desynchronize.

AI Image ──[edit]──→ Edited Image
   │                    │
   ├─ C2PA: "AI-gen"    ├─ C2PA: "human-edited"  ← NEW manifest
   └─ WM: AI_SIG        └─ WM: AI_SIG            ← PERSISTS
                            │
                      CONTRADICTION
                    (Integrity Clash)


Content-Decoupled Provenance (CAP-SRP)
═════════════════════════════════════════
Provenance exists in an independent log.
Content transforms cannot affect the log.

AI Image ──[edit]──→ Edited Image
   │                    │
   │                    └─ Content can be modified freely
   │
   └─ CAP Log (immutable, external):
       ├─ GEN_ATTEMPT: request received at T1
       ├─ GEN: content generated at T2
       │        content_hash: sha256(original)
       └─ EXPORT: delivered to user at T3

No contradiction possible — the log records what happened,
independent of what happens to the content afterward.
```
The Integrity Clash cannot occur in CAP-SRP because:
- The `GEN` event's `content_hash` is computed at generation time and never updated
- The log is hash-chained and externally anchored — it cannot be modified to match post-hoc editing
- A verifier can always check: "Was content with hash X generated by this system?" regardless of current C2PA manifest state
This doesn't mean CAP-SRP replaces C2PA. C2PA answers "where did this specific file come from?" CAP-SRP answers "what did the system do?" They're complementary — and the Integrity Clash shows exactly where the boundary between them matters.
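The decoupling property is small enough to show in a few lines: the log's answer about what was generated never changes, no matter how the file is edited afterward. (A plain in-memory list stands in here for the real hash-chained, externally anchored log; the property being demonstrated does not depend on that machinery.)

```python
import hashlib

# Content-decoupled log: records the content hash at generation time.
cap_log = []

original = b"ai-generated image bytes"
cap_log.append({
    "event_type": "GEN",
    "content_hash": hashlib.sha256(original).hexdigest(),
})

# The content is later edited: a new file, a new C2PA manifest —
# but the log entry is untouched.
edited = original + b" + human edits"

def was_generated_here(content: bytes) -> bool:
    """Was content with this hash generated by this system?"""
    h = hashlib.sha256(content).hexdigest()
    return any(
        e["event_type"] == "GEN" and e["content_hash"] == h
        for e in cap_log
    )

assert was_generated_here(original) is True   # log still answers for the original
assert was_generated_here(edited) is False    # edits yield a new, unlogged hash
```

There is no state in which the two answers contradict each other, because there is only one source of truth and it is append-only.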
## Regulatory Timeline: What Developers Need to Track {#regulatory-timeline}
Based on fact-checked dates only — unverified dates are marked:
| Date | Event | CAP-SRP Relevance | Verified? |
|---|---|---|---|
| Late March 2026 | EU CoP WG meetings | Marking standards discussion; refusal logs absent | ⚠️ Dates "indicative" per EC |
| Late March 2026 | EU CoP 2nd draft feedback deadline | Opportunity to submit refusal provenance input | ⚠️ Exact date unconfirmed |
| May 19, 2026 | TAKE IT DOWN Act compliance deadline | 48-hour takedown obligation → refusal proof demand | ✅ Confirmed |
| June 2026 | EU CoP final version | Final marking/labeling framework | ✅ Target confirmed |
| August 2, 2026 | EU AI Act Article 50 enforcement | Transparency obligations become legally binding | ✅ Confirmed |
The May 19 and August 2 dates are hard deadlines. Developers building AI content generation systems should be thinking now about how their logging infrastructure will demonstrate compliance — and what evidence gaps remain when the only provenance infrastructure covers content that exists.
## Disclosure and Conclusion {#disclosure-and-conclusion}
Full disclosure: I maintain the CAP-SRP specification and the IETF Internet-Draft on SCITT refusal events. CAP-SRP is published by the VeritasChain Standards Organization, which I founded. The specification is at v1.1 (released March 5, 2026). It has not been independently audited, peer-reviewed by external security researchers, or adopted by any major AI platform. The IETF draft has no formal IETF standing — it is an individual submission. I am transparent about this because the technical arguments should stand on their own merits, and readers deserve context to evaluate them appropriately.
### What this week proved
Five events. Five different angles. One consistent finding:
- The EU is building the most detailed AI content transparency framework in history — and it explicitly scopes out refusal provenance
- Academic researchers demonstrated that content-level provenance systems (C2PA + watermarks) can contradict each other in ways that no existing protocol can resolve
- Microsoft evaluated 60 provenance methods — all operating on content that exists
- Theoretical analysis showed that watermarking has fundamental robustness limits — reinforcing the need for content-independent audit trails
- Real-world data showed that self-reported AI safety statistics (NCMEC checkbox) can be wildly misleading without structural verification
The common thread: the industry is investing heavily in tracking what AI creates. Nobody is investing in proving what AI refused to create. CAP-SRP proposes a specific, implementable answer — Ed25519 signatures, SHA-256 hash chains, Merkle trees, external anchoring, and four mathematically enforced completeness invariants.
The specification is open. The code is MIT-licensed. The argument is: verify, don't trust.
GitHub: veritaschain/cap-spec
IETF Draft: draft-kamimura-scitt-refusal-events
Spec: CC BY 4.0 · Code: MIT License
Found an error? Have a correction? Open an issue on GitHub or comment below. Fact-checking is an ongoing process, and I'd rather be corrected than wrong.