On March 16, 2026, three Tennessee teenagers filed the first class-action lawsuit against an AI company for generating CSAM. The core allegation: xAI "refused to implement" industry-standard safety measures — and no one can verify whether its system actually refuses dangerous requests. This article fact-checks the lawsuit and three related developments, maps the technical gap using CAP-SRP's Completeness Invariant, and provides updated v1.1 Python code for building the cryptographic refusal logs that would make "trust us" claims independently verifiable.
TL;DR
Three teenagers sued xAI in federal court alleging Grok generated child sexual abuse material from their real photographs. Meta's own Oversight Board declared the company's AI content labeling "neither robust nor comprehensive enough." The EU's Code of Practice feedback deadline is 6 days away. The TAKE IT DOWN Act's platform compliance deadline is 8 weeks away.
All four stories converge on the same blind spot: every AI company's safety claims depend on internal logs that no one outside the company can verify. The "trust us" model is now the subject of federal litigation, internal corporate rebellion, and regulatory enforcement — simultaneously.
This article:
- Fact-checks all four developments against primary sources
- Maps the verification gap using updated v1.1 architecture
- Provides working Python code for the new v1.1 event types (ACCOUNT_ACTION, LAW_ENFORCEMENT_REFERRAL, POLICY_VERSION)
- Shows how the four Completeness Invariants would have changed the evidentiary landscape in the Grok case
GitHub: veritaschain/cap-spec · veritaschain/cap-srp · License: CC BY 4.0 / Apache 2.0
Table of Contents
- Event 1: The Grok CSAM Class Action
- Event 2: Meta's Oversight Board Rebellion
- Event 3: EU Code of Practice — 6 Days Left
- Event 4: TAKE IT DOWN Act — 8 Weeks Left
- The Stack: What Exists vs. What's Missing
- From v1.0 to v1.1: What Changed and Why
- Building v1.1: Account Actions and Enforcement Logging
- The Four Completeness Invariants
- Grok Counterfactual: What the Audit Trail Would Show
- C2PA Integration: Connecting Both Halves of Provenance
- Structural Attribution: Exonerating Innocent Services
- Threat Model: Why "Better Logging" Isn't Enough
- Regulatory Deadline Map
- What This Means for Developers
- Transparency Notes
Event 1: The Grok CSAM Class Action
What happened
On March 16, 2026, three Tennessee high school students filed a class-action complaint against X.AI Corp. in the U.S. District Court for the Northern District of California (Case No. 5:26-cv-02246). The complaint brings thirteen counts, including claims under Masha's Law and the Trafficking Victims Protection Act (TVPA), plus strict liability for design defect.
The complaint alleges xAI knowingly designed, marketed, and profited from Grok's image generation while "refusing to implement the industry-standard CSAM prevention measures used by every other major AI company." A perpetrator used Grok — accessed through a third-party application licensing xAI's technology — to generate sexually explicit, hyperrealistic AI images of the plaintiffs from their real social media photographs. The CSAM was distributed on Discord, Telegram, and Mega.
Fact-check verdict: ✅ Fully confirmed
Independently confirmed by The Hill, The Verge, Mashable, Fortune/AP, FindLaw, and the Lieff Cabraser case page. The docket was confirmed via CourtListener. All claims verified.
The technical details that matter
The lawsuit cites Grok's Safety Instructions (v8):
# Grok Safety Instructions v8 (cited in complaint)
"Do not enforce additional content policies."
"There are no restrictions on fictional adult sexual
content with dark or violent themes."
"Assume good intent... 'teenage' or 'girl' does not
necessarily imply underage"
The complaint argues that these instructions, combined with "Spicy Mode" (launched October 2025), made CSAM prevention structurally impossible — even though the instructions technically prohibited CSAM. The Center for Countering Digital Hate estimated ~3 million sexualized images generated in 11 days, ~23,000 depicting apparent minors.
Why this matters for the verification gap
The lawsuit's theory of liability isn't just "xAI generated harmful content." It's that xAI refused to implement safeguards — and that no one could verify this until after the harm occurred. When Musk claimed in January 2026 that Grok refuses illegal requests, Reuters retested and found an 82% failure rate (45/55 prompts still produced sexualized imagery). AI Forensics found 53% of images still contained individuals in minimal attire.
The evidentiary problem: xAI's safety logs are internal, mutable, and unverifiable without adversarial discovery. There is no cryptographic record of what Grok refused, when, or under which policy version.
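To make that concrete: below is a minimal sketch of the record CAP-SRP would require for each refusal. Field values are illustrative and not drawn from any xAI system; hashing the canonical record and chaining it to its predecessor is what makes after-the-fact edits detectable.

import hashlib
import json

# Illustrative refusal record — the shape, not real xAI data.
refusal_record = {
    "event_type": "GEN_DENY",
    "timestamp": "2025-12-26T03:14:22Z",   # when the refusal happened
    "prompt_hash": "sha256:...",           # what was refused (hash only)
    "policy_version": "v8",                # under which policy version
    "prev_hash": "sha256:...",             # link to the previous event
}

# Hash the canonical form; any later edit changes this value and
# breaks every subsequent link in the chain.
record_hash = hashlib.sha256(
    json.dumps(refusal_record, sort_keys=True).encode()
).hexdigest()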
Event 2: Meta's Oversight Board Rebellion
What happened
On March 10, 2026, Meta's Oversight Board — the semi-independent body Meta itself created in 2020 — published a ruling on an AI-generated video of alleged missile damage in Haifa during the June 2025 Israel-Iran war. The fabricated video accumulated 700,000+ views. Six users reported it. Meta took no action.
The Board overturned Meta's decision and declared the company's AI content labeling "neither robust nor comprehensive enough."
Fact-check verdict: ✅ Fully confirmed
Confirmed by Engadget, SiliconANGLE, Rest of World, WinBuzzer, The Information, WITNESS.
The structural pattern
Meta claims its AI detects ~5,000 scams per day (announced March 18–19 alongside a new AI moderation rollout — confirmed via TechCrunch and CNBC). Those detection logs are entirely internal.
The Oversight Board has the same structural problem as the Grok plaintiffs: it can observe failures (content that slipped through), but it cannot audit the denominator — how many requests were processed, how many were blocked, and whether the blocks were correct. The platform marks its own homework.
Event 3: EU Code of Practice — 6 Days Left
What happened
The European Commission published the second draft of the Code of Practice on Marking and Labelling of AI-Generated Content on March 4, 2026. Feedback closes at end of business on March 30; the final version is expected in early June. Article 50 obligations become enforceable on August 2, 2026.
Fact-check verdict: ✅ Fully confirmed
Confirmed via EC official page, Herbert Smith Freehills, BABL AI, CADE, The Legal Wire.
What the Code covers (and doesn't)
Key changes from first draft:
- Multi-layered marking: C2PA metadata + watermarking (required); fingerprinting/logging (optional)
- Removed the AI-generated vs. AI-assisted distinction
- Task force for a uniform EU icon for AI content identification
- 180+ stakeholders across industry, academia, civil society
What the Code doesn't address: what happens when a provider's safety system blocks a request. There's no content to mark. No watermark to embed. No metadata to sign. The refusal is invisible to the Code's entire framework.
Event 4: TAKE IT DOWN Act — 8 Weeks Left
What happened
The TAKE IT DOWN Act (S.146) was signed May 19, 2025. Platform compliance obligations — including the 48-hour removal window and reasonable-efforts copy removal — become enforceable May 19, 2026 (~56 days from today), with the FTC as the enforcement authority.
Fact-check verdict: ✅ Fully confirmed
Confirmed via Nelson Mullins, Latham & Watkins, UBalt Law Review, RAINN, Cozen O'Connor.
Important distinction: The law itself took effect upon signing (May 19, 2025). What kicks in on May 19, 2026 is the operational compliance infrastructure for platforms — the 48-hour clock, the copy-removal obligation, the FTC enforcement authority.
The Stack: What Exists vs. What's Missing
Here's the current state of AI content provenance infrastructure as of March 2026:
The AI Content Accountability Stack (March 2026)
═════════════════════════════════════════════════
What Exists and Is Working:
┌──────────────────────────────────────────────────┐
│ Content Credentials (C2PA 2.3) │
│ ───────────────────────────────── │
│ Signed metadata proving who generated what │
│ Status: Shipping at scale (Samsung S26, Adobe, │
│ Microsoft Copilot, Bing Image Creator) │
│ Covers: AI-generated content labeling │
│ Gap: No "negative signal" — can't prove │
│ what WASN'T generated │
├──────────────────────────────────────────────────┤
│ EU Code of Practice (2nd draft, feedback by 3/30)│
│ ───────────────────────────────── │
│ Multi-layer marking: metadata + watermarks │
│ Status: 180+ stakeholders, final June 2026 │
│ Covers: Labeling, detection, disclosure │
│ Gap: Refusal events are outside scope │
├──────────────────────────────────────────────────┤
│ Legal Framework (TAKE IT DOWN, EU AI Act, etc.) │
│ ───────────────────────────────── │
│ 48-hour removal, transparency obligations │
│ Status: Multiple deadlines converging in 2026 │
│ Covers: Legal requirements and penalties │
│ Gap: Assumes verification infra that doesn't │
│ exist yet │
├──────────────────────────────────────────────────┤
│ Internal Logging (every AI provider) │
│ ───────────────────────────────── │
│ Server-side request/response logs │
│ Status: Exists at every provider internally │
│ Covers: Operational visibility (internal only) │
│ Gap: Mutable, unverifiable, trust-us model │
╞══════════════════════════════════════════════════╡
│ ░░░░░░░░░░░░░░░░ THE GAP ░░░░░░░░░░░░░░░░░░░░ │
│ ░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░ │
│ ░░ Cryptographic proof of refusal events ░░░░ │
│ ░░ Account enforcement audit trails ░░░░ │
│ ░░ Policy version anchoring ░░░░ │
│ ░░ Completeness guarantee across all ░░░░ │
│ ░░ generation attempts ░░░░ │
│ ░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░ │
└──────────────────────────────────────────────────┘
From v1.0 to v1.1: What Changed and Why
CAP-SRP v1.0 (released January 28, 2026) provided the foundation: GEN_ATTEMPT, GEN, GEN_DENY, GEN_ERROR, and the primary Completeness Invariant.
Two weeks later, the Tumbler Ridge mass shooting (February 10, 2026) exposed a critical gap. OpenAI had banned the shooter's ChatGPT account in June 2025 after detecting violent activity — but never notified law enforcement. The v1.0 specification had no event type for account-level enforcement decisions.
v1.1 addresses this with three new event types and three formalized intermediate states:
CAP-SRP v1.0 → v1.1 Event Model
════════════════════════════════
v1.0 Events (unchanged):
GEN_ATTEMPT ─→ GEN | GEN_DENY | GEN_ERROR
v1.1 Formalized Intermediate States:
GEN_ATTEMPT ─→ GEN_WARN (pass with warning)
GEN_ATTEMPT ─→ GEN_ESCALATE (→ human review → GEN|DENY)
GEN_ATTEMPT ─→ GEN_QUARANTINE (→ hold → EXPORT|DENY)
v1.1 New Event Types:
ACCOUNT_ACTION (suspend/ban/reinstate)
│
└→ LAW_ENFORCEMENT_REFERRAL (referred/not/pending)
POLICY_VERSION (safety policy published)
│
└→ externally anchored BEFORE effective date
v1.1 New Completeness Invariants (total: 4):
1. Primary: ∑ ATTEMPT = ∑ GEN + ∑ DENY + ∑ ERROR
2. Escalation: ∑ ESCALATE = ∑ ESCALATION_RESOLVED
3. Quarantine: ∑ QUARANTINE = ∑ RELEASED + ∑ DENIED
4. Account: ∑ ACCOUNT_ATTEMPT = ∑ COMPLETED + ∑ FAILED
v1.1 New Risk Category:
VIOLENCE_PLANNING (Tumbler Ridge pattern)
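As a sanity check on the model above, here is a minimal transition table and validator. The table is one reading of the diagram, not normative spec text:

# Allowed outcome transitions per the v1.1 event model above.
# This table is a reading of the diagram, not normative spec text.
ALLOWED_TRANSITIONS = {
    "GEN_ATTEMPT": {"GEN", "GEN_WARN", "GEN_DENY", "GEN_ERROR",
                    "GEN_ESCALATE", "GEN_QUARANTINE"},
    "GEN_ESCALATE": {"GEN", "GEN_DENY"},    # human review resolves
    "GEN_QUARANTINE": {"GEN", "GEN_DENY"},  # hold resolves to export/deny
    "ACCOUNT_ACTION": {"LAW_ENFORCEMENT_REFERRAL"},
}

def is_valid_transition(from_event: str, to_event: str) -> bool:
    """Reject outcome events that don't follow a legal predecessor."""
    return to_event in ALLOWED_TRANSITIONS.get(from_event, set())

assert is_valid_transition("GEN_ATTEMPT", "GEN_DENY")
assert not is_valid_transition("GEN_ESCALATE", "GEN_ERROR")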
Building v1.1: Account Actions and Enforcement Logging
Here's the working Python implementation for the v1.1 event types. This extends the v1.0 implementation with account enforcement and policy version logging.
Core Types (v1.1 Extended)
import hashlib
import json
import time
import uuid
from dataclasses import dataclass, field, asdict
from typing import Optional, List, Dict
from enum import Enum
from datetime import datetime, timezone
class EventType(Enum):
"""CAP-SRP v1.1 complete event taxonomy."""
# v1.0 core events
GEN_ATTEMPT = "GEN_ATTEMPT"
GEN = "GEN"
GEN_DENY = "GEN_DENY"
GEN_ERROR = "GEN_ERROR"
# v1.1 intermediate states (formalized)
GEN_WARN = "GEN_WARN"
GEN_ESCALATE = "GEN_ESCALATE"
GEN_QUARANTINE = "GEN_QUARANTINE"
# v1.1 new event types
ACCOUNT_ACTION = "ACCOUNT_ACTION"
LAW_ENFORCEMENT_REFERRAL = "LAW_ENFORCEMENT_REFERRAL"
POLICY_VERSION = "POLICY_VERSION"
class RiskCategory(Enum):
"""Risk taxonomy — v1.1 adds VIOLENCE_PLANNING."""
CSAM_RISK = "CSAM_RISK"
NCII_RISK = "NCII_RISK"
MINOR_SEXUALIZATION = "MINOR_SEXUALIZATION"
REAL_PERSON_DEEPFAKE = "REAL_PERSON_DEEPFAKE"
VIOLENCE_EXTREME = "VIOLENCE_EXTREME"
VIOLENCE_PLANNING = "VIOLENCE_PLANNING" # NEW: Tumbler Ridge
HATE_CONTENT = "HATE_CONTENT"
TERRORIST_CONTENT = "TERRORIST_CONTENT"
SELF_HARM_PROMOTION = "SELF_HARM_PROMOTION"
COPYRIGHT_VIOLATION = "COPYRIGHT_VIOLATION"
COPYRIGHT_STYLE_MIMICRY = "COPYRIGHT_STYLE_MIMICRY"
CONTENT_POLICY = "CONTENT_POLICY"
OTHER = "OTHER"
class AccountActionType(Enum):
"""v1.1: Account-level enforcement decisions."""
SUSPEND = "SUSPEND"
BAN = "BAN"
RATE_LIMIT = "RATE_LIMIT"
REINSTATE = "REINSTATE"
FLAG_FOR_REVIEW = "FLAG_FOR_REVIEW"
class ReferralStatus(Enum):
"""v1.1: Law enforcement referral outcomes."""
REFERRED = "REFERRED"
NOT_REFERRED = "NOT_REFERRED"
PENDING = "PENDING"
class RiskScoreBand(Enum):
"""v1.1: Account risk classification."""
LOW = "LOW"
MEDIUM = "MEDIUM"
HIGH = "HIGH"
CRITICAL = "CRITICAL"
def sha256(data: str) -> str:
"""SHA-256 with standard prefix."""
return f"sha256:{hashlib.sha256(data.encode()).hexdigest()}"
def canonicalize(obj: dict) -> str:
"""RFC 8785 JSON Canonicalization (simplified)."""
return json.dumps(obj, sort_keys=True, separators=(",", ":"))
def uuid7() -> str:
    """Generate UUIDv7 (time-ordered; simplified, variant bits omitted)."""
    timestamp_ms = int(time.time() * 1000)            # 48-bit ms timestamp
    rand_bits = uuid.uuid4().int & ((1 << 62) - 1)    # 62 random bits
    # Layout: [48-bit unix-ms timestamp][4-bit version = 7][random tail]
    uuid_int = (timestamp_ms << 80) | (0x7 << 76) | rand_bits
    return str(uuid.UUID(int=uuid_int & ((1 << 128) - 1)))
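A quick smoke test of these helpers shows how prev_hash chaining works: each event commits to the hash of its predecessor, so editing an earlier event invalidates everything after it.

# Two hash-chained events built with the helpers above.
event_1 = {"event_id": uuid7(), "event_type": "GEN_ATTEMPT",
           "prev_hash": None}
event_1_hash = sha256(canonicalize(event_1))

event_2 = {"event_id": uuid7(), "event_type": "GEN_DENY",
           "prev_hash": event_1_hash}   # commits to event_1
event_2_hash = sha256(canonicalize(event_2))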
ACCOUNT_ACTION Event (v1.1 — Tumbler Ridge Response)
This is the event that v1.0 was missing. When OpenAI banned the Tumbler Ridge shooter's account in June 2025, there was no standardized way to log that decision in a verifiable audit trail.
@dataclass
class AccountActionEvent:
"""
v1.1: Records account-level enforcement decisions.
Motivated by Tumbler Ridge incident (Feb 2026):
OpenAI banned an account in June 2025 but never
notified law enforcement. This event type makes
such decisions externally verifiable.
"""
event_id: str
chain_id: str
prev_hash: Optional[str]
timestamp: str
event_type: str = "ACCOUNT_ACTION"
# Account identification (privacy-preserving)
account_hash: str = "" # HMAC(account_id, per-user key)
action_type: str = "" # SUSPEND | BAN | RATE_LIMIT | etc.
# Decision context
triggering_event_refs: List[str] = field(default_factory=list)
risk_score_band: str = "" # LOW | MEDIUM | HIGH | CRITICAL
risk_categories: List[str] = field(default_factory=list)
decision_mechanism: str = "" # AUTOMATED | HUMAN | HYBRID
# Law enforcement assessment (Gold level)
le_assessment: Optional[Dict] = None
# Policy reference (v1.1 requirement)
applied_policy_version_ref: str = ""
# Cryptographic fields
event_hash: Optional[str] = None
signature: Optional[str] = None
def compute_hash(self) -> str:
data = {k: v for k, v in asdict(self).items()
if k not in ("event_hash", "signature") and v is not None}
return sha256(canonicalize(data))
@classmethod
def create(cls, chain_id: str, prev_hash: Optional[str],
account_id: str, hmac_key: bytes,
action: AccountActionType,
risk_band: RiskScoreBand,
categories: List[RiskCategory],
triggering_refs: List[str],
decision: str = "AUTOMATED",
policy_ref: str = "",
le_assessment: Optional[Dict] = None) -> 'AccountActionEvent':
"""Factory method for creating account action events."""
import hmac as hmac_lib
# Privacy-preserving account hash
account_hash = sha256(
hmac_lib.new(hmac_key, account_id.encode(),
hashlib.sha256).hexdigest()
)
event = cls(
event_id=uuid7(),
chain_id=chain_id,
prev_hash=prev_hash,
timestamp=datetime.now(timezone.utc).isoformat(),
account_hash=account_hash,
action_type=action.value,
triggering_event_refs=triggering_refs,
risk_score_band=risk_band.value,
risk_categories=[c.value for c in categories],
decision_mechanism=decision,
le_assessment=le_assessment,
applied_policy_version_ref=policy_ref,
)
event.event_hash = event.compute_hash()
return event
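A hypothetical usage example — the chain ID, HMAC key, policy reference, and triggering event IDs are all illustrative values:

import os

hmac_key = os.urandom(32)  # per-deployment HMAC key (illustrative)

ban_event = AccountActionEvent.create(
    chain_id="provider-enforcement-chain",
    prev_hash=None,
    account_id="user-12345",
    hmac_key=hmac_key,
    action=AccountActionType.BAN,
    risk_band=RiskScoreBand.CRITICAL,
    categories=[RiskCategory.VIOLENCE_PLANNING],
    triggering_refs=["evt-abc", "evt-def"],
    decision="HYBRID",
    policy_ref="SAFETY-POLICY@2.0",
)
assert ban_event.event_hash == ban_event.compute_hash()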
LAW_ENFORCEMENT_REFERRAL Event
@dataclass
class LawEnforcementReferralEvent:
"""
v1.1: Records whether law enforcement was notified.
The Tumbler Ridge question: "Did the company evaluate
whether to report this account to authorities, and
what was the outcome?"
This event makes the referral decision verifiable —
not just the action taken on the account.
"""
event_id: str
chain_id: str
prev_hash: Optional[str]
timestamp: str
event_type: str = "LAW_ENFORCEMENT_REFERRAL"
# Link to triggering account action
account_action_ref: str = ""
# Referral decision
referral_status: str = "" # REFERRED | NOT_REFERRED | PENDING
assessment_criteria: str = "" # What threshold was applied
jurisdiction: str = "" # ISO 3166-1 alpha-2
# If referred
referral_timestamp: Optional[str] = None
referral_agency_type: Optional[str] = None # e.g., "NCMEC", "FBI"
# If not referred — the critical accountability field
non_referral_rationale: Optional[str] = None
# Cryptographic fields
event_hash: Optional[str] = None
signature: Optional[str] = None
def compute_hash(self) -> str:
data = {k: v for k, v in asdict(self).items()
if k not in ("event_hash", "signature") and v is not None}
return sha256(canonicalize(data))
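Continuing the example, here is the record the Tumbler Ridge timeline was missing — an explicit, hash-chained decision not to refer. The assessment criteria and rationale text are illustrative:

no_referral = LawEnforcementReferralEvent(
    event_id=uuid7(),
    chain_id="provider-enforcement-chain",
    prev_hash=ban_event.event_hash,          # chains off the ban above
    timestamp=datetime.now(timezone.utc).isoformat(),
    account_action_ref=ban_event.event_id,
    referral_status=ReferralStatus.NOT_REFERRED.value,
    assessment_criteria="imminent-harm threshold (illustrative)",
    jurisdiction="US",
    non_referral_rationale="No specific, credible, imminent threat found",
)
no_referral.event_hash = no_referral.compute_hash()
# The non-referral decision is now part of the verifiable chain —
# deleting or rewriting it later breaks every subsequent hash link.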
POLICY_VERSION Event — Preventing Retroactive Threshold Manipulation
@dataclass
class PolicyVersionEvent:
"""
v1.1: Cryptographic proof that a safety policy existed
BEFORE its effective date.
Why this matters: Without this, a provider could:
1. Receive a CSAM generation request
2. Generate the content (no policy blocked it)
3. AFTER the fact, create a "policy" claiming to block it
4. Show regulators the backdated policy as proof of compliance
The Policy Anchoring Invariant prevents this:
anchor_timestamp(policy) <= policy.effective_from
The policy must be externally anchored BEFORE it takes effect.
"""
event_id: str
chain_id: str
prev_hash: Optional[str]
timestamp: str
event_type: str = "POLICY_VERSION"
# Policy identification
policy_id: str = ""
policy_version: str = ""
policy_hash: str = "" # SHA-256 of full policy document
# Temporal fields — the anchoring invariant enforces
# that external_anchor_timestamp <= effective_from
effective_from: str = ""
supersedes: Optional[str] = None # Previous policy version ref
# Scope
applicable_risk_categories: List[str] = field(default_factory=list)
applicable_jurisdictions: List[str] = field(default_factory=list)
# External anchoring reference
external_anchor_ref: Optional[str] = None
# Cryptographic fields
event_hash: Optional[str] = None
signature: Optional[str] = None
def compute_hash(self) -> str:
data = {k: v for k, v in asdict(self).items()
if k not in ("event_hash", "signature") and v is not None}
return sha256(canonicalize(data))
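A usage sketch of the anchoring invariant. In production the anchor timestamp would come from an RFC 3161 TSA receipt; here both timestamps are illustrative literals:

policy = PolicyVersionEvent(
    event_id=uuid7(),
    chain_id="provider-policy-chain",
    prev_hash=None,
    timestamp="2026-01-01T00:00:00+00:00",
    policy_id="SAFETY-POLICY",
    policy_version="2.0",
    policy_hash=sha256("full policy document text"),
    effective_from="2026-01-15T00:00:00+00:00",
    external_anchor_ref="tsa-receipt-001",    # illustrative receipt ID
)
policy.event_hash = policy.compute_hash()

# Invariant 4: the external anchor must predate the effective date.
anchor_timestamp = "2026-01-01T00:05:00+00:00"  # from the TSA receipt
assert anchor_timestamp <= policy.effective_from, "backdated policy"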
The Four Completeness Invariants
v1.1 expands from one mathematical guarantee to four:
The Four Completeness Invariants (CAP-SRP v1.1)
════════════════════════════════════════════════
Invariant 1 — Primary (v1.0, unchanged):
─────────────────────────────────────────
∑ GEN_ATTEMPT = ∑ GEN + ∑ GEN_DENY + ∑ GEN_ERROR
(GEN includes GEN_WARN; DENY includes quarantine-denied)
Every request has exactly one outcome. No exceptions.
Invariant 2 — Escalation Resolution (v1.1, Silver+):
─────────────────────────────────────────────────────
∑ GEN_ESCALATE = ∑ ESCALATION_RESOLVED
Every escalation to human review MUST be resolved.
Unresolved escalations >72h = compliance violation.
Invariant 3 — Quarantine Resolution (v1.1, Silver+):
─────────────────────────────────────────────────────
∑ GEN_QUARANTINE = ∑ RELEASED + ∑ QUARANTINE_DENIED
Every held item MUST be resolved. No permanent limbo.
Invariant 4 — Policy Anchoring (v1.1, all levels):
───────────────────────────────────────────────────
anchor_timestamp(POLICY) ≤ POLICY.effective_from
Every policy MUST be externally timestamped BEFORE
it takes effect. Prevents retroactive policy creation.
Here's the verification implementation:
from collections import defaultdict
from typing import Tuple
def verify_completeness_v1_1(
events: List[dict],
time_window: Tuple[datetime, datetime]
) -> dict:
"""
Verify all four Completeness Invariants.
Returns a dict with pass/fail for each invariant
and details of any violations found.
"""
start, end = time_window
window_events = [
e for e in events
if start <= datetime.fromisoformat(e["timestamp"]) <= end
]
results = {
"invariant_1_primary": {"pass": False, "details": {}},
"invariant_2_escalation": {"pass": False, "details": {}},
"invariant_3_quarantine": {"pass": False, "details": {}},
"invariant_4_policy": {"pass": False, "details": {}},
}
# === Invariant 1: Primary ===
attempts = [e for e in window_events
if e["event_type"] == "GEN_ATTEMPT"]
outcomes = [e for e in window_events
if e["event_type"] in (
"GEN", "GEN_WARN", "GEN_DENY", "GEN_ERROR"
)]
attempt_ids = {e["event_id"] for e in attempts}
outcome_attempt_refs = {e.get("attempt_id") for e in outcomes}
unmatched_attempts = attempt_ids - outcome_attempt_refs
orphan_outcomes = outcome_attempt_refs - attempt_ids
results["invariant_1_primary"] = {
"pass": len(unmatched_attempts) == 0
and len(orphan_outcomes) == 0,
"total_attempts": len(attempts),
"total_outcomes": len(outcomes),
"unmatched_attempts": list(unmatched_attempts),
"orphan_outcomes": list(orphan_outcomes),
}
# === Invariant 2: Escalation Resolution ===
escalations = [e for e in window_events
if e["event_type"] == "GEN_ESCALATE"]
unresolved = [
e for e in escalations
if e.get("resolution_ref") is None
and (datetime.now(timezone.utc) -
datetime.fromisoformat(e["timestamp"])).total_seconds()
> 72 * 3600 # 72-hour threshold
]
results["invariant_2_escalation"] = {
"pass": len(unresolved) == 0,
"total_escalations": len(escalations),
"unresolved_over_72h": len(unresolved),
}
# === Invariant 3: Quarantine Resolution ===
quarantines = [e for e in window_events
if e["event_type"] == "GEN_QUARANTINE"]
unresolved_q = [
q for q in quarantines
if q.get("release_ref") is None
]
results["invariant_3_quarantine"] = {
"pass": len(unresolved_q) == 0,
"total_quarantined": len(quarantines),
"unresolved": len(unresolved_q),
}
# === Invariant 4: Policy Anchoring ===
policies = [e for e in window_events
if e["event_type"] == "POLICY_VERSION"]
backdated = []
for p in policies:
anchor_ts = p.get("external_anchor_ref_timestamp")
effective = p.get("effective_from")
if anchor_ts and effective and anchor_ts > effective:
backdated.append(p["event_id"])
results["invariant_4_policy"] = {
"pass": len(backdated) == 0,
"total_policies": len(policies),
"retroactively_anchored": backdated,
}
return results
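A minimal smoke test — one attempt with one matching denial in a synthetic window — shows the verifier's output shape:

events = [
    {"event_type": "GEN_ATTEMPT", "event_id": "a1",
     "timestamp": "2026-03-01T00:00:00+00:00"},
    {"event_type": "GEN_DENY", "event_id": "d1", "attempt_id": "a1",
     "timestamp": "2026-03-01T00:00:01+00:00"},
]
window = (datetime(2026, 3, 1, tzinfo=timezone.utc),
          datetime(2026, 3, 2, tzinfo=timezone.utc))
report = verify_completeness_v1_1(events, window)
assert report["invariant_1_primary"]["pass"]  # every attempt resolved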
Grok Counterfactual: What the Audit Trail Would Show
Let's model what the Grok case would look like with CAP-SRP v1.1 in place:
# === Simulating the Grok timeline with CAP-SRP v1.1 ===
# October 2025: xAI launches "Spicy Mode"
policy_event = PolicyVersionEvent(
event_id=uuid7(),
chain_id="grok-safety-chain",
prev_hash=None,
timestamp="2025-10-15T00:00:00Z",
policy_id="GROK-SAFETY-POLICY",
policy_version="8.0",
policy_hash=sha256("assume good intent... no restrictions on fictional adult..."),
effective_from="2025-10-15T00:00:00Z",
applicable_risk_categories=["CSAM_RISK", "NCII_RISK"],
)
# This policy would be externally anchored BEFORE Oct 15.
# Regulators can verify: "This was the active policy when
# Spicy Mode launched."
# December 26, 2025: First wave of CSAM generation requests
# With CAP-SRP, EVERY request is logged BEFORE safety eval:
attempt = {
"event_type": "GEN_ATTEMPT",
"event_id": uuid7(),
"prompt_hash": sha256("[CSAM request content]"),
"timestamp": "2025-12-26T03:14:22Z",
"model_id": "grok-2-image-v1",
}
# The safety system would then produce either GEN or GEN_DENY.
# If it produced GEN (allowed the generation):
# → The audit trail shows the request AND the approval.
# → No way to retroactively delete the attempt record.
# → Regulators can count: how many CSAM-category requests
# were approved vs. denied?
# January 9, 2026: Musk claims Grok is fixed
# With CAP-SRP, regulators don't need to trust this claim.
# They query the audit trail:
def assess_grok_fix(events, before_fix, after_fix):
"""
What regulators could verify with CAP-SRP.
"""
pre_fix = [e for e in events
if e["timestamp"] < before_fix
and "CSAM" in str(e.get("risk_categories", []))]
post_fix = [e for e in events
if e["timestamp"] > after_fix
and "CSAM" in str(e.get("risk_categories", []))]
pre_deny_rate = sum(
1 for e in pre_fix if e["event_type"] == "GEN_DENY"
) / max(len(pre_fix), 1)
post_deny_rate = sum(
1 for e in post_fix if e["event_type"] == "GEN_DENY"
) / max(len(post_fix), 1)
return {
"pre_fix_csam_deny_rate": pre_deny_rate,
"post_fix_csam_deny_rate": post_deny_rate,
"fix_effective": post_deny_rate > pre_deny_rate,
# Reuters found 82% failure rate in February.
# With CAP-SRP, this would be mathematically provable.
}
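A hypothetical invocation with a two-event stream (all events and dates are illustrative):

audit_events = [
    {"event_type": "GEN", "risk_categories": ["CSAM_RISK"],
     "timestamp": "2025-12-26T03:14:22+00:00"},   # pre-fix: approved
    {"event_type": "GEN_DENY", "risk_categories": ["CSAM_RISK"],
     "timestamp": "2026-02-01T10:00:00+00:00"},   # post-fix: denied
]
verdict = assess_grok_fix(audit_events,
                          before_fix="2026-01-09T00:00:00+00:00",
                          after_fix="2026-01-10T00:00:00+00:00")
print(verdict["fix_effective"])  # True only if the deny rate rose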
Grok Timeline — Trust-Us vs. Verify-This
═════════════════════════════════════════
Without CAP-SRP (what actually happened):
─────────────────────────────────────────
Oct 2025 │ "Spicy Mode" launched
Dec 2025 │ ~3M sexualized images, ~23K depicting minors
Jan 2026 │ Musk: "Grok refuses illegal requests"
Feb 2026 │ Reuters: 82% failure rate (45/55 prompts)
Mar 2026 │ Class-action lawsuit filed
│
│ Evidentiary process: adversarial discovery,
│ expert testimony, contested interpretation
│ of internal logs the defendant controls.
│ Estimated timeline: 2-4 years.
With CAP-SRP (the counterfactual):
──────────────────────────────────
Oct 2025 │ POLICY_VERSION anchored externally
│ → Regulators can see active policy
Dec 2025 │ GEN_ATTEMPT + GEN (not GEN_DENY) logged
│ → CSAM denial rate: <5% (provable)
Jan 2026 │ New POLICY_VERSION anchored
│ → Regulators verify: is new policy stricter?
Feb 2026 │ GEN_ATTEMPT + outcomes still logged
│ → Post-fix denial rate: still low (provable)
│ → Reuters investigation unnecessary
Mar 2026 │ Evidence Pack submitted to court
│ → Cryptographic proof, not contested logs
│ Estimated verification: hours, not years.
C2PA Integration: Connecting Both Halves of Provenance
C2PA answers "what was generated?" CAP-SRP answers "what was refused?" Together they form a complete provenance chain. Here's how the integration works:
C2PA + CAP-SRP: Complete AI Provenance
════════════════════════════════════════
User Request
│
▼
┌──────────────────┐
│ GEN_ATTEMPT │ ← CAP-SRP logs FIRST
│ (hash-chained) │
└────────┬─────────┘
│
▼
┌──────────────────┐
│ Safety Check │ ← Your existing pipeline
└────────┬─────────┘
│
┌─────────┴─────────┐
│ │
▼ ▼
┌──────────────┐ ┌──────────────┐
│ GEN_DENY │ │ GEN │
│ (CAP-SRP) │ │ (CAP-SRP) │
│ │ │ + │
│ No content │ │ C2PA │
│ → Evidence │ │ Manifest │
│ Pack │ │ (attached) │
└──────────────┘ └──────┬───────┘
│
▼
┌──────────────────┐
│ C2PA Assertion: │
│ cap_srp.ref = │
│ {audit_log_uri, │
│ event_id, │
│ chain_id} │
└──────────────────┘
The critical integration point is a custom C2PA assertion that links the generated content back to its audit trail entry:
def create_c2pa_srp_assertion(gen_event: dict,
audit_log_uri: str) -> dict:
"""
Create a C2PA custom assertion linking generated
content to its CAP-SRP audit trail entry.
This assertion is embedded in the C2PA manifest
alongside standard assertions (c2pa.actions,
c2pa.hash.data, etc.).
"""
return {
"label": "veritaschain.cap-srp.ref",
"data": {
"version": "1.1",
"audit_log_uri": audit_log_uri,
"event_id": gen_event["event_id"],
"chain_id": gen_event["chain_id"],
"attempt_id": gen_event["attempt_id"],
"event_hash": gen_event["event_hash"],
"merkle_root": gen_event.get("merkle_root"),
"verification_endpoint": f"{audit_log_uri}/verify",
}
}
def verify_c2pa_srp_link(c2pa_manifest: dict,
srp_events: List[dict]) -> dict:
"""
Full verification flow:
1. Validate C2PA manifest signature
2. Extract cap-srp.ref assertion
3. Fetch corresponding CAP-SRP event
4. Verify GEN event links to a GEN_ATTEMPT
5. Verify Completeness Invariant holds
6. Verify Merkle inclusion proof
"""
assertion = next(
(a for a in c2pa_manifest.get("assertions", [])
if a["label"] == "veritaschain.cap-srp.ref"),
None
)
if not assertion:
return {"verified": False,
"reason": "No CAP-SRP assertion in manifest"}
ref = assertion["data"]
# Find the GEN event
gen_event = next(
(e for e in srp_events
if e["event_id"] == ref["event_id"]),
None
)
if not gen_event:
return {"verified": False,
"reason": "GEN event not found in audit trail"}
# Find the linked GEN_ATTEMPT
attempt = next(
(e for e in srp_events
if e["event_id"] == gen_event.get("attempt_id")),
None
)
if not attempt:
return {"verified": False,
"reason": "No GEN_ATTEMPT for this generation"}
# Verify hash chain integrity
expected_hash = sha256(canonicalize({
k: v for k, v in gen_event.items()
if k not in ("event_hash", "signature")
}))
if expected_hash != gen_event["event_hash"]:
return {"verified": False,
"reason": "Event hash mismatch — tampered"}
return {
"verified": True,
"content_origin": "verified",
"attempt_logged": True,
"attempt_timestamp": attempt["timestamp"],
"generation_timestamp": gen_event["timestamp"],
"policy_version": gen_event.get(
"applied_policy_version_ref", "unknown"
),
"completeness": "verifiable",
}
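A synthetic end-to-end check. A real manifest would be produced by a C2PA SDK; this dict only mimics the assertion layout, and the log URI is illustrative:

attempt_evt = {"event_id": "att-1", "event_type": "GEN_ATTEMPT",
               "timestamp": "2026-03-20T12:00:00+00:00"}
gen_evt = {"event_id": "gen-1", "event_type": "GEN",
           "attempt_id": "att-1", "chain_id": "c1",
           "timestamp": "2026-03-20T12:00:01+00:00"}
gen_evt["event_hash"] = sha256(canonicalize(gen_evt))

manifest = {"assertions": [
    create_c2pa_srp_assertion(gen_evt, "https://logs.example/api")
]}
result = verify_c2pa_srp_link(manifest, [attempt_evt, gen_evt])
assert result["verified"]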
Together, the verification flow answers the full set of regulatory questions:
| Question | Answered By |
|---|---|
| What was generated? | C2PA Content Credentials |
| Who generated it? | C2PA manifest + signature |
| What was refused? | CAP-SRP GEN_DENY events |
| Why was it refused? | CAP-SRP risk category + policy reference |
| Is the log complete? | CAP-SRP Completeness Invariant (4 checks) |
| Was the policy current? | CAP-SRP POLICY_VERSION anchoring |
| Was the account flagged? | CAP-SRP ACCOUNT_ACTION (v1.1) |
| Was law enforcement notified? | CAP-SRP LAW_ENFORCEMENT_REFERRAL (v1.1) |
| Can we verify independently? | SCITT receipts + RFC 3161 timestamps |
Structural Attribution: Exonerating Innocent Services
Here's a scenario the Grok lawsuit makes concrete. Suppose CSAM images are found online. Two AI services — Service A and Service B — both could theoretically have generated them. Today, both services say "it wasn't us" and point at their internal logs. With CAP-SRP:
Cross-Service Attribution (Structural)
═══════════════════════════════════════
Service A (CAP-SRP enabled):
→ Produces Evidence Pack for time window
→ GEN_ATTEMPT count: 47,231
→ GEN_DENY (CSAM_RISK): 892
→ GEN (matching content hash): 0
→ Completeness Invariant: ✓ PASSES
→ Merkle proof: verifiable
→ Result: Service A is EXONERATED by math
Service B (internal logs only):
→ Says "our logs show we didn't generate it"
→ Logs are internal, mutable, unverifiable
→ No Completeness Invariant
→ No external anchoring
→ Result: Trust-us claim, unverifiable
→ Service B is the likely generation source
Conclusion: Structural attribution without
examining the actual generated content.
This isn't just about punishing offenders — it's about protecting innocent services. If Service A can cryptographically prove it refused the request, it is exonerated by math, not by self-attestation.
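A sketch of that attribution logic, assuming GEN events carry a content_hash field and a report from verify_completeness_v1_1 above:

def attribution_verdict(content_hash: str, events: List[dict],
                        completeness_report: dict) -> str:
    """Exonerate a service only if (a) no GEN event matches the hash
    AND (b) every Completeness Invariant passes — i.e., the log
    provably contains all generation attempts, so absence is proof."""
    matching_gens = [e for e in events
                     if e["event_type"] == "GEN"
                     and e.get("content_hash") == content_hash]
    complete = all(v["pass"] for v in completeness_report.values())
    if matching_gens:
        return "GENERATION_SOURCE"
    if complete:
        return "EXONERATED"       # proven absence, not self-attestation
    return "UNVERIFIABLE"         # gaps in the log: trust-us territory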
Threat Model: Why "Better Logging" Isn't Enough
A traditional server log can record the same events as CAP-SRP. The difference is the threat model: CAP-SRP assumes the AI provider may have economic incentives to misrepresent its safety record. The specification provides cryptographic countermeasures:
| Threat | Attack | CAP-SRP Mitigation |
|---|---|---|
| Selective logging | Only log favorable outcomes | Completeness Invariant — gaps are detectable |
| Log modification | Alter historical records | SHA-256 hash chain — any change breaks chain |
| Backdating | Create records with false timestamps | RFC 3161 external anchoring via independent TSA |
| Split-view | Show different logs to different parties | Merkle tree — single root, inclusion proofs |
| Fabrication | Create false refusal records | Attempt-outcome pairing with pre-commitment |
| Policy manipulation | Retroactively tighten thresholds | Policy Anchoring Invariant (v1.1) |
| Account laundering | Delete evidence of enforcement decisions | ACCOUNT_ACTION hash chain (v1.1) |
The Grok case illustrates several of these threats simultaneously. When Musk claimed the system was fixed, Reuters found an 82% failure rate. Without CAP-SRP, resolving that contradiction requires independent testing by journalists — expensive, adversarial, and only possible for organizations with the resources to run controlled experiments. With CAP-SRP, the contradiction would be mathematically provable from the audit trail itself.
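To see why the hash chain defeats the log-modification threat, here is a minimal chain walker; it assumes each event carries prev_hash and event_hash fields as in the examples above:

def verify_chain(events: List[dict]) -> bool:
    """Walk the chain head-to-tail; every event must commit to its
    predecessor and to its own canonical body."""
    prev = None
    for e in events:
        if e["prev_hash"] != prev:
            return False  # broken link: insertion, deletion, reorder
        body = {k: v for k, v in e.items()
                if k not in ("event_hash", "signature")}
        if sha256(canonicalize(body)) != e["event_hash"]:
            return False  # record altered after it was hashed
        prev = e["event_hash"]
    return True

# Tampering with any historical record breaks every later link:
# events[3]["risk_categories"] = [] → verify_chain(events) == False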
Regulatory Deadline Map
Here's every relevant deadline between now and the end of 2026:
| Deadline | Jurisdiction | Requirement | CAP-SRP Relevance |
|---|---|---|---|
| Mar 30, 2026 | EU | Code of Practice 2nd draft feedback closes | Last input window for refusal logging |
| May 19, 2026 | US | TAKE IT DOWN Act platform compliance | 48-hour SLA needs audit trail |
| Jun 2026 | EU | Code of Practice finalized | Defines compliance benchmark |
| Jun 30, 2026 | Colorado | AI Act effective | 3-year record retention |
| Aug 2, 2026 | EU | AI Act Article 50 transparency | Machine-readable content marking |
| Aug 2, 2026 | California | AI Transparency Act | Provenance disclosures mandatory |
| 2026 Q3 | UK | Ofcom AI enforcement | Grok investigation ongoing |
| Ongoing | India | IT Rules Amendment (Feb 20, 2026) | 2-hour sexual content removal |
What This Means for Developers
If you're building or maintaining an AI content generation system, here's the practical takeaway:
The legal landscape changed this month. The Grok class action is the first lawsuit where "refused to implement safety measures" is a standalone theory of liability in the CSAM context. The question "can you prove your system actually refuses dangerous requests?" is no longer hypothetical — it is being asked in federal court. Today's answer is "trust us," and as of March 16 that answer is being tested under oath.
The implementation is a sidecar. CAP-SRP v1.1 doesn't require changes to your AI model, your safety evaluator, or your generation pipeline. It's a logging layer that sits alongside your existing system. The critical architectural requirement: log the GEN_ATTEMPT before the safety check runs. Everything else — hash chains, Merkle trees, external anchoring — is standard cryptographic engineering.
Start with Bronze, target Silver. Bronze-level conformance requires basic event logging with Ed25519 signatures and 6-month retention. That's achievable in a sprint. Silver adds the Completeness Invariant, daily external anchoring, and the v1.1 intermediate states. Gold adds ACCOUNT_ACTION logging, SCITT integration, HSM key management, and 5-year retention.
Implementation Path
═══════════════════
Sprint 1 (Bronze):
✓ GEN_ATTEMPT → GEN | GEN_DENY | GEN_ERROR
✓ SHA-256 hash chain + Ed25519 signatures
✓ 6-month retention
Effort: ~1 week for a senior developer
Sprint 2 (Silver):
✓ Pre-evaluation logging (ATTEMPT before safety check)
✓ GEN_WARN, GEN_ESCALATE, GEN_QUARANTINE
✓ POLICY_VERSION with external anchoring
✓ Primary Completeness Invariant verification
✓ Daily RFC 3161 timestamping
✓ Evidence Pack generation
Effort: ~2-3 weeks
Sprint 3 (Gold):
✓ ACCOUNT_ACTION + LAW_ENFORCEMENT_REFERRAL
✓ All four Completeness Invariants
✓ Hourly SCITT anchoring
✓ HSM for signing keys
✓ 5-year retention + crypto-shredding (GDPR)
Effort: ~4-6 weeks
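As a starting point for Sprint 1, here is a Bronze-level signing sketch. It uses the third-party cryptography package; key storage, rotation, and retention policy are out of scope:

from cryptography.hazmat.primitives.asymmetric.ed25519 import (
    Ed25519PrivateKey,
)

signing_key = Ed25519PrivateKey.generate()  # in production: HSM-backed

def sign_event(event: dict) -> dict:
    """Compute the canonical event hash, then sign it with Ed25519."""
    event["event_hash"] = sha256(canonicalize(
        {k: v for k, v in event.items()
         if k not in ("event_hash", "signature")}
    ))
    event["signature"] = signing_key.sign(
        event["event_hash"].encode()
    ).hex()
    return event

signed = sign_event({"event_type": "GEN_DENY", "prev_hash": None})
signing_key.public_key().verify(          # raises InvalidSignature
    bytes.fromhex(signed["signature"]),   # if either field is altered
    signed["event_hash"].encode(),
)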
The standards exist. CAP-SRP builds on IETF SCITT (architecture at draft-22), C2PA (specification 2.3), RFC 3161 (timestamping), COSE/CBOR (signing), and RFC 8785 (JSON canonicalization). An IETF Internet-Draft (draft-kamimura-scitt-refusal-events-02) positions this as a SCITT application profile.
Transparency Notes
About this analysis: This article fact-checks four real developments from March 2026 against primary sources. All claims are independently verified with source links provided inline.
About CAP-SRP: CAP-SRP is an open specification published under CC BY 4.0 by VeritasChain Standards Organization (VSO), founded in Tokyo. The specification is at v1.1 (released March 5, 2026). It has not been endorsed by major AI companies and is not yet an adopted IETF standard. The underlying standards it builds on — SCITT, C2PA, COSE/CBOR, RFC 3161 — are mature and widely implemented.
What CAP-SRP is:
- A technically sound approach to a genuine and well-documented gap
- Aligned with existing standards (C2PA, SCITT, RFC 3161)
- Available on GitHub: veritaschain/cap-spec (CC BY 4.0) · veritaschain/cap-srp (Apache 2.0)
What CAP-SRP is not (yet):
- An industry-endorsed standard
- An IETF RFC
- A guaranteed solution
The question is whether the industry builds some form of refusal provenance — whether CAP-SRP, a C2PA extension, an IETF SCITT profile, or something entirely new — before the courts and regulators force the answer. The deadlines are not theoretical anymore. The first lawsuit has landed.
Verify, don't trust. The code is the proof.
GitHub: veritaschain/cap-spec · Specification: CAP-SRP v1.1 · IETF Draft: draft-kamimura-scitt-refusal-events-02 · License: CC BY 4.0