In January 2026, xAI's Grok was generating 6,700+ non-consensual sexual images per hour, including images of minors. Twelve jurisdictions launched investigations. California's Attorney General issued a cease-and-desist order.
But here's the technical problem no one talks about:
When regulators asked "Prove your safeguards worked," xAI couldn't answer, because its refusals were never recorded.
This article shows you how to fix that with cryptographic proofs.
## The Problem: Refusals Are a Black Hole
Every AI system today has the same blind spot:
| Current AI Systems | What We Need |
|---|---|
| Generated content → Logged ✓ | Generated content → Logged ✓ |
| Refused content → No record ❌ | Refused content → Cryptographically proven ✓ |
When your AI blocks a CSAM request, where's the proof? When your content filter catches a deepfake attempt, where's the audit trail?
Nowhere.
The refusals simply vanish. And when a regulator—or a lawsuit—comes knocking, "trust us, we blocked it" isn't going to cut it.
## The Solution: Safe Refusal Provenance (SRP)
SRP is an extension of the Cryptographic Audit Protocol (CAP) that treats refusals as first-class cryptographic events.
The core insight: every generation attempt MUST have a recorded outcome.
```
┌───────────────────┐
│    GEN_ATTEMPT    │ ← Always recorded FIRST
└─────────┬─────────┘
          │
          ▼
┌───────────────────┐
│  Risk Assessment  │
└─────────┬─────────┘
          │
    ┌─────┴─────┐
    │           │
    ▼           ▼
┌────────┐  ┌─────────┐
│  GEN   │  │ GEN_DENY│
│(output)│  │(refusal)│
└────────┘  └─────────┘
```
This closes the selective-disclosure loophole: "You only showed us DENYs—where are the ALLOWs you're hiding?"
## Let's Build It: Core Implementation
Here's the minimal implementation. Full source at github.com/veritaschain/cap-safe-refusal-provenance.
### Event Structure
First, define our event types:
```python
import hashlib
import json
import uuid
from datetime import datetime, timezone
from dataclasses import dataclass, asdict
from typing import Optional, Literal
from enum import Enum
class RiskCategory(Enum):
CSAM_RISK = "CSAM_RISK"
NCII_RISK = "NCII_RISK"
MINOR_SEXUALIZATION = "MINOR_SEXUALIZATION"
REAL_PERSON_DEEPFAKE = "REAL_PERSON_DEEPFAKE"
VIOLENCE_EXTREME = "VIOLENCE_EXTREME"
HATE_CONTENT = "HATE_CONTENT"
@dataclass(kw_only=True)  # kw_only lets subclasses add required fields after these defaults
class SRPEvent:
event_type: Literal["GEN_ATTEMPT", "GEN", "GEN_DENY"]
event_id: str
timestamp: str
previous_hash: str
event_hash: str = ""
signature: str = ""
def compute_hash(self) -> str:
"""Compute SHA-256 hash of event contents."""
data = {k: v for k, v in asdict(self).items()
if k not in ['event_hash', 'signature']}
canonical = json.dumps(data, sort_keys=True, separators=(',', ':'))
        return f"sha256:{hashlib.sha256(canonical.encode()).hexdigest()}"
```
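A quick sanity check helps confirm the canonicalization behaves as intended: the hash ignores the two excluded fields and changes when any hashed field changes. This snippet is illustrative only (the field values are made up), assuming the class definitions above:

```python
# Illustrative check of compute_hash; all values are placeholders.
e = SRPEvent(
    event_type="GEN_ATTEMPT",
    event_id="demo-0001",
    timestamp="2026-01-15T00:00:00+00:00",
    previous_hash="sha256:" + "0" * 64,
)
h = e.compute_hash()

# event_hash and signature are excluded, so setting them changes nothing.
e.event_hash = h
e.signature = "ed25519:placeholder"
assert e.compute_hash() == h

# Any hashed field change produces a different digest.
e.timestamp = "2026-01-15T00:00:01+00:00"
assert e.compute_hash() != h
```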
### The Attempt Event
Every request starts here—before the safety check:
```python
@dataclass(kw_only=True)
class GenAttemptEvent(SRPEvent):
prompt_hash: str # SHA-256 of original prompt (never store raw prompt!)
input_type: str
policy_id: str
model_version: str
session_id: str
def __post_init__(self):
self.event_type = "GEN_ATTEMPT"
def create_attempt(prompt: str, policy_id: str, model_version: str,
previous_hash: str, session_id: str) -> GenAttemptEvent:
"""Create a GEN_ATTEMPT event. Call BEFORE safety check."""
# Hash the prompt - NEVER store the original
prompt_hash = f"sha256:{hashlib.sha256(prompt.encode()).hexdigest()}"
event = GenAttemptEvent(
event_type="GEN_ATTEMPT",
        event_id=str(uuid.uuid7()),  # Time-ordered UUID (uuid7 requires Python 3.14+)
timestamp=datetime.now(timezone.utc).isoformat(),
previous_hash=previous_hash,
prompt_hash=prompt_hash,
input_type="text",
policy_id=policy_id,
model_version=model_version,
session_id=session_id
)
event.event_hash = event.compute_hash()
    return event
```
Critical: the attempt is logged before you know whether the request will be allowed or denied. Selective logging then becomes detectable: any attempt without a matching outcome breaks the completeness invariant verified below.
### The Deny Event
When your safety system catches something:
```python
@dataclass(kw_only=True)
class GenDenyEvent(SRPEvent):
attempt_id: str
risk_category: str
risk_score: float
refusal_reason: str
model_decision: str = "DENY"
def __post_init__(self):
self.event_type = "GEN_DENY"
def create_denial(attempt: GenAttemptEvent, risk_category: RiskCategory,
risk_score: float, reason: str) -> GenDenyEvent:
"""Create a GEN_DENY event linked to its attempt."""
event = GenDenyEvent(
event_type="GEN_DENY",
event_id=str(uuid.uuid7()),
timestamp=datetime.now(timezone.utc).isoformat(),
previous_hash=attempt.event_hash, # Chain to the attempt
attempt_id=attempt.event_id,
risk_category=risk_category.value,
risk_score=risk_score,
refusal_reason=reason
)
event.event_hash = event.compute_hash()
    return event
```
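Here's how the two event types combine into the flow from the diagram. The sketch below is mine, not from the repo: the GEN (allow-side) event class and the `assess_risk`/`generate` callables are hypothetical stand-ins, and a production version would also emit GEN_ERROR when generation raises.

```python
from typing import Callable

@dataclass(kw_only=True)
class GenEvent(SRPEvent):
    """Hypothetical allow-side event; the article only defines ATTEMPT and DENY."""
    attempt_id: str
    output_hash: str

def guarded_generate(
    prompt: str,
    assess_risk: Callable[[str], tuple[Optional[RiskCategory], float]],
    generate: Callable[[str], bytes],
    policy_id: str, model_version: str,
    previous_hash: str, session_id: str,
) -> tuple[SRPEvent, SRPEvent]:
    """Every call returns exactly one (attempt, outcome) pair for the log."""
    # 1. Record the attempt BEFORE any safety decision is made.
    attempt = create_attempt(prompt, policy_id, model_version, previous_hash, session_id)
    # 2. Run the risk assessment.
    category, score = assess_risk(prompt)
    if category is not None:
        # 3a. Blocked: emit a GEN_DENY chained to the attempt.
        outcome: SRPEvent = create_denial(attempt, category, score,
                                          f"{category.value} policy triggered")
    else:
        # 3b. Allowed: generate, then emit a GEN event carrying the output hash.
        output = generate(prompt)
        outcome = GenEvent(
            event_type="GEN",
            event_id=str(uuid.uuid7()),
            timestamp=datetime.now(timezone.utc).isoformat(),
            previous_hash=attempt.event_hash,
            attempt_id=attempt.event_id,
            output_hash=f"sha256:{hashlib.sha256(output).hexdigest()}",
        )
        outcome.event_hash = outcome.compute_hash()
    return attempt, outcome
```

The point of the shape: there is no code path that records an attempt without also recording an outcome, which is exactly what the completeness check below relies on.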
### Adding Cryptographic Signatures
Hash chains prove order and integrity. Signatures prove authenticity.
```python
try:
from nacl.signing import SigningKey, VerifyKey
from nacl.encoding import HexEncoder
NACL_AVAILABLE = True
except ImportError:
NACL_AVAILABLE = False
class SRPSigner:
"""Ed25519 signing for SRP events."""
def __init__(self, private_key: Optional[bytes] = None):
if not NACL_AVAILABLE:
raise ImportError("PyNaCl required: pip install pynacl")
if private_key:
self.signing_key = SigningKey(private_key)
else:
self.signing_key = SigningKey.generate()
self.verify_key = self.signing_key.verify_key
def sign_event(self, event: SRPEvent) -> str:
"""Sign the event hash."""
message = event.event_hash.encode()
signed = self.signing_key.sign(message, encoder=HexEncoder)
return f"ed25519:{signed.signature.decode()}"
def verify_signature(self, event: SRPEvent) -> bool:
"""Verify event signature."""
if not event.signature.startswith("ed25519:"):
return False
sig_hex = event.signature[8:] # Remove "ed25519:" prefix
message = event.event_hash.encode()
try:
self.verify_key.verify(message, bytes.fromhex(sig_hex))
return True
except Exception:
return False
def get_public_key_hex(self) -> str:
"""Get public key for third-party verification."""
        return self.verify_key.encode(encoder=HexEncoder).decode()
```
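The article states the two properties but doesn't show the chain walk itself, so here is a minimal sketch of one (my helper, not repo code): recompute every hash, check every link, and optionally check signatures.

```python
def verify_chain(events: list[SRPEvent], signer: Optional[SRPSigner] = None) -> bool:
    """Minimal integrity walk over an ordered event list."""
    for i, event in enumerate(events):
        # Stored hash must match a fresh recomputation of the event body.
        if event.event_hash != event.compute_hash():
            return False
        # Each event must point at its predecessor's hash (the first points at genesis).
        if i > 0 and event.previous_hash != events[i - 1].event_hash:
            return False
        # Optionally, each event must carry a valid Ed25519 signature.
        if signer is not None and not signer.verify_signature(event):
            return False
    return True
```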
### The Completeness Invariant
Here's the mathematical guarantee that makes SRP hold up under audit:
```python
def verify_completeness(events: list[SRPEvent]) -> tuple[bool, dict]:
    """
    Verify the Completeness Invariant:
        ∑ GEN_ATTEMPT = ∑ GEN + ∑ GEN_DENY + ∑ GEN_ERROR
    Assumes every outcome event (GEN, GEN_DENY, GEN_ERROR) carries an
    attempt_id pointing back to its GEN_ATTEMPT.
    Returns (is_valid, statistics).
    """
attempts = set()
outcomes = set()
stats = {
"total_attempts": 0,
"total_generations": 0,
"total_denials": 0,
"total_errors": 0,
"unmatched_attempts": [],
"orphan_outcomes": []
}
for event in events:
if event.event_type == "GEN_ATTEMPT":
attempts.add(event.event_id)
stats["total_attempts"] += 1
elif event.event_type == "GEN":
outcomes.add(event.attempt_id)
stats["total_generations"] += 1
elif event.event_type == "GEN_DENY":
outcomes.add(event.attempt_id)
stats["total_denials"] += 1
elif event.event_type == "GEN_ERROR":
outcomes.add(event.attempt_id)
stats["total_errors"] += 1
# Find mismatches
stats["unmatched_attempts"] = list(attempts - outcomes)
stats["orphan_outcomes"] = list(outcomes - attempts)
# The invariant
is_valid = (
len(stats["unmatched_attempts"]) == 0 and
len(stats["orphan_outcomes"]) == 0
)
    return is_valid, stats
```
Why this matters:
- If `unmatched_attempts` is non-empty: you're hiding results
- If `orphan_outcomes` is non-empty: you're fabricating refusals
- If both are empty: your audit trail is complete
Regulators love this. It's not "trust us"—it's math.
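To see the invariant catch tampering, drop an outcome from the log and watch verification fail. A small sketch using the helpers above (the prompt and policy strings are placeholders):

```python
genesis = "sha256:" + "0" * 64
attempt = create_attempt("placeholder prompt", "policy.v1", "model-x", genesis, "sess-demo")
denial = create_denial(attempt, RiskCategory.NCII_RISK, 0.91, "policy triggered")

ok, _ = verify_completeness([attempt, denial])
assert ok  # one attempt, one outcome: the invariant holds

ok, stats = verify_completeness([attempt])  # "lose" the denial
assert not ok
print(stats["unmatched_attempts"])  # the orphaned attempt is immediately visible
```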
### Building Evidence Packs
When the auditors come, give them this:
````python
from pathlib import Path
def create_evidence_pack(events: list[SRPEvent],
output_dir: str,
signer: Optional[SRPSigner] = None) -> Path:
"""
Create a complete evidence package for regulatory submission.
Structure:
evidence-pack-{id}/
├── manifest.json
├── events/
│ ├── 0001-gen_attempt.json
│ └── 0002-gen_deny.json
├── chain/
│ └── hash_chain.json
├── statistics/
│ └── refusal_stats.json ← Auditors check this first
└── verification/
├── public_key.txt
└── instructions.md
"""
    pack_id = f"evidence-pack-{datetime.now(timezone.utc).strftime('%Y%m%d-%H%M%S')}"  # UTC, matching event timestamps
pack_path = Path(output_dir) / pack_id
# Create directories
(pack_path / "events").mkdir(parents=True)
(pack_path / "chain").mkdir()
(pack_path / "statistics").mkdir()
(pack_path / "verification").mkdir()
# Write events
for i, event in enumerate(events, 1):
filename = f"{i:04d}-{event.event_type.lower()}.json"
with open(pack_path / "events" / filename, 'w') as f:
json.dump(asdict(event), f, indent=2)
# Write hash chain
chain = [{"event_id": e.event_id, "event_hash": e.event_hash,
"previous_hash": e.previous_hash} for e in events]
with open(pack_path / "chain" / "hash_chain.json", 'w') as f:
json.dump(chain, f, indent=2)
# Write statistics (MOST IMPORTANT FOR AUDITORS)
is_valid, stats = verify_completeness(events)
stats["chain_integrity"] = "VALID" if is_valid else "INVALID"
stats["generated_at"] = datetime.now(timezone.utc).isoformat()
with open(pack_path / "statistics" / "refusal_stats.json", 'w') as f:
json.dump(stats, f, indent=2)
# Write verification instructions
instructions = """# Third-Party Verification Guide
## Quick Verification (2 minutes)
1. Recalculate each EventHash from event data
2. Verify hash chain linkage (each previousHash = prior eventHash)
3. Verify Ed25519 signatures against public key
4. Check completeness invariant in statistics/refusal_stats.json
## Automated Verification
```bash
python -m srp_core --verify {pack_path}
```
## What to Look For
- chain_integrity: VALID
- unmatched_attempts: [] (empty)
- orphan_outcomes: [] (empty)
"""
with open(pack_path / "verification" / "instructions.md", 'w') as f:
f.write(instructions)
# Write public key if signer provided
if signer:
with open(pack_path / "verification" / "public_key.txt", 'w') as f:
f.write(signer.get_public_key_hex())
# Write manifest
manifest = {
"pack_id": pack_id,
"created_at": datetime.now(timezone.utc).isoformat(),
"event_count": len(events),
"chain_integrity": "VALID" if is_valid else "INVALID",
"has_signatures": signer is not None,
"spec_version": "CAP-SRP v0.1"
}
with open(pack_path / "manifest.json", 'w') as f:
json.dump(manifest, f, indent=2)
    return pack_path
````
## Real-World Demo: The Grok Scenario
Let's simulate what proper logging would have looked like:
```python
def demo_grok_scenario():
"""
Simulate the Grok incident with proper SRP logging.
    During the incident described above, Grok lacked verifiable refusal records.
This shows what SHOULD have happened.
"""
signer = SRPSigner()
events = []
previous_hash = "sha256:" + "0" * 64 # Genesis hash
# Scenario 1: CSAM attempt blocked
print("=== Scenario 1: CSAM Request Blocked ===")
attempt1 = create_attempt(
prompt="[REDACTED - harmful content hash only]",
policy_id="cap.safety.child-protection.v1",
model_version="grok-2.0-image",
previous_hash=previous_hash,
session_id="sess-001"
)
attempt1.signature = signer.sign_event(attempt1)
events.append(attempt1)
denial1 = create_denial(
attempt=attempt1,
risk_category=RiskCategory.CSAM_RISK,
risk_score=0.97,
reason="Minor detected in reference context"
)
denial1.signature = signer.sign_event(denial1)
events.append(denial1)
print(f" ATTEMPT logged: {attempt1.event_id[:8]}...")
print(f" DENIAL logged: {denial1.event_id[:8]}...")
print(f" Risk category: {denial1.risk_category}")
print(f" Risk score: {denial1.risk_score}")
print()
# Scenario 2: NCII attempt blocked
print("=== Scenario 2: NCII Request Blocked ===")
attempt2 = create_attempt(
prompt="[REDACTED - harmful content hash only]",
policy_id="cap.safety.ncii-prevention.v1",
model_version="grok-2.0-image",
previous_hash=denial1.event_hash,
session_id="sess-002"
)
attempt2.signature = signer.sign_event(attempt2)
events.append(attempt2)
denial2 = create_denial(
attempt=attempt2,
risk_category=RiskCategory.NCII_RISK,
risk_score=0.94,
reason="Non-consensual intimate imagery detected"
)
denial2.signature = signer.sign_event(denial2)
events.append(denial2)
print(f" ATTEMPT logged: {attempt2.event_id[:8]}...")
print(f" DENIAL logged: {denial2.event_id[:8]}...")
print(f" Risk category: {denial2.risk_category}")
print()
# Verify and create evidence pack
print("=== Verification ===")
is_valid, stats = verify_completeness(events)
print(f" Completeness Invariant: {'✓ VALID' if is_valid else '✗ INVALID'}")
print(f" Total attempts: {stats['total_attempts']}")
print(f" Total denials: {stats['total_denials']}")
print()
# Create evidence pack
pack_path = create_evidence_pack(events, "./output", signer)
print(f"=== Evidence Pack Created ===")
print(f" Location: {pack_path}")
print()
print("This evidence pack can now be submitted to regulators.")
print("Third parties can verify integrity without accessing our systems.")
if __name__ == "__main__":
    demo_grok_scenario()
```
Output:
```
=== Scenario 1: CSAM Request Blocked ===
ATTEMPT logged: 019467a1...
DENIAL logged: 019467a2...
Risk category: CSAM_RISK
Risk score: 0.97
=== Scenario 2: NCII Request Blocked ===
ATTEMPT logged: 019467a3...
DENIAL logged: 019467a4...
Risk category: NCII_RISK
=== Verification ===
Completeness Invariant: ✓ VALID
Total attempts: 2
Total denials: 2
=== Evidence Pack Created ===
Location: ./output/evidence-pack-20260115-142345
This evidence pack can now be submitted to regulators.
Third parties can verify integrity without accessing our systems.
```
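What does this look like from the auditor's side? They never touch the producer's systems; everything needed ships in the pack. The repo presumably provides `srp_core --verify` for this; the sketch below is a hand-rolled equivalent of checks 1, 2, and 4 from instructions.md (check 3, signatures, would additionally need the public key file), assuming the file layout produced by `create_evidence_pack`:

```python
import hashlib
import json
from pathlib import Path

def audit_pack(pack_dir: str) -> bool:
    """Independently re-verify an evidence pack without trusting the producer."""
    pack = Path(pack_dir)
    # 1. Recompute each event hash from the raw event JSON.
    for path in sorted((pack / "events").glob("*.json")):  # 0001-, 0002-, ... in order
        event = json.loads(path.read_text())
        body = {k: v for k, v in event.items() if k not in ("event_hash", "signature")}
        canonical = json.dumps(body, sort_keys=True, separators=(",", ":"))
        if f"sha256:{hashlib.sha256(canonical.encode()).hexdigest()}" != event["event_hash"]:
            return False
    # 2. Walk the chain file: each previous_hash must equal the prior event_hash.
    chain = json.loads((pack / "chain" / "hash_chain.json").read_text())
    for prev, curr in zip(chain, chain[1:]):
        if curr["previous_hash"] != prev["event_hash"]:
            return False
    # 4. The statistics must report a complete, intact trail.
    stats = json.loads((pack / "statistics" / "refusal_stats.json").read_text())
    return (stats["chain_integrity"] == "VALID"
            and not stats["unmatched_attempts"]
            and not stats["orphan_outcomes"])
```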
## Why This Matters for Developers
**If you're building AI products:**
You're going to face audits. The EU AI Act's main obligations take effect in August 2026. The UK Online Safety Act is being enforced now. California has already issued cease-and-desist orders.
"We have safety measures" is not a defense. "Here's cryptographic proof our safety measures worked" is.
**If you're building safety systems:**
Your filters catch harmful content. Great. But can you prove they worked? Can you prove completeness—that nothing slipped through the cracks?
SRP gives you that proof.
**If you're in compliance/legal:**
This is your new best friend. Evidence packs that third parties can verify. Statistics auditors actually trust. Mathematical guarantees instead of promises.
## Quick Start
```bash
# Clone the repo
git clone https://github.com/veritaschain/cap-safe-refusal-provenance.git
cd cap-safe-refusal-provenance
# Install dependencies
pip install -r requirements.txt
# Run the demo
python examples/demo_scenarios.py
# Run tests
python tests/test_srp.py
# Verify an evidence pack
python src/srp_core.py --verify evidence-pack-xxx/
```
## The Bigger Picture
This PoC is part of the Verifiable AI Provenance (VAP) Framework—an open standard for making AI systems auditable.
The philosophy is simple:
"We don't just block harmful generations. We prove that they never happened."
Current AI safety is trust-based: "We have filters. Trust us."
SRP is verification-based: "We have filters. Here's the cryptographic proof they worked."
Verify, Don't Trust.
## Resources
- GitHub Repo: github.com/veritaschain/cap-safe-refusal-provenance
- CAP Specification: github.com/veritaschain/cap-spec
- IETF Draft: datatracker.ietf.org/doc/draft-kamimura-scitt-vcp
- Contact: standards@veritaschain.org
## License
CC BY 4.0 International — Use it, build on it, make AI safer.
"AI needs a Flight Recorder."
What do you think? Have questions about implementing SRP in your AI system? Drop a comment below or open an issue on GitHub.