
Proving What AI Didn't Generate: A Cryptographic Solution to the Grok Crisis

In January 2026, xAI's Grok was generating 6,700+ non-consensual sexual images per hour, including images of minors. Twelve jurisdictions launched investigations. California's Attorney General issued a cease-and-desist order.

But here's the technical problem no one talks about:

When regulators asked "Prove your safeguards worked," xAI couldn't answer, because its refusals were never recorded.

This article shows you how to fix that with cryptographic proofs.


The Problem: Refusals Are a Black Hole

Every AI system today has the same blind spot:

Current AI Systems               What We Need
Generated content → Logged ✓     Generated content → Logged ✓
Refused content → No record      Refused content → Cryptographically proven

When your AI blocks a CSAM request, where's the proof? When your content filter catches a deepfake attempt, where's the audit trail?

Nowhere.

The refusals simply vanish. And when a regulator—or a lawsuit—comes knocking, "trust us, we blocked it" isn't going to cut it.


The Solution: Safe Refusal Provenance (SRP)

SRP is an extension of the Cryptographic Audit Protocol (CAP) that treats refusals as first-class cryptographic events.

The core insight: every generation attempt MUST have a recorded outcome.

┌───────────────────┐
│   GEN_ATTEMPT     │  ← Always recorded FIRST
└─────────┬─────────┘
          │
          ▼
┌───────────────────┐
│  Risk Assessment  │
└─────────┬─────────┘
          │
    ┌─────┴─────┐
    │           │
    ▼           ▼
┌────────┐  ┌─────────┐
│  GEN   │  │ GEN_DENY│
│(output)│  │(refusal)│
└────────┘  └─────────┘

This prevents the attack vector: "You only showed us DENYs—where are the ALLOWs you're hiding?"


Let's Build It: Core Implementation

Here's the minimal implementation. Full source at github.com/veritaschain/cap-safe-refusal-provenance.

Event Structure

First, define our event types:

import hashlib
import json
import uuid
from datetime import datetime, timezone
from dataclasses import dataclass, asdict
from typing import Optional, Literal
from enum import Enum

class RiskCategory(Enum):
    CSAM_RISK = "CSAM_RISK"
    NCII_RISK = "NCII_RISK"
    MINOR_SEXUALIZATION = "MINOR_SEXUALIZATION"
    REAL_PERSON_DEEPFAKE = "REAL_PERSON_DEEPFAKE"
    VIOLENCE_EXTREME = "VIOLENCE_EXTREME"
    HATE_CONTENT = "HATE_CONTENT"

@dataclass(kw_only=True)  # kw_only (Python 3.10+) lets subclasses add required fields
class SRPEvent:
    event_type: Literal["GEN_ATTEMPT", "GEN", "GEN_DENY", "GEN_ERROR"]
    event_id: str
    timestamp: str
    previous_hash: str
    event_hash: str = ""
    signature: str = ""

    def compute_hash(self) -> str:
        """Compute SHA-256 hash of event contents."""
        data = {k: v for k, v in asdict(self).items() 
                if k not in ['event_hash', 'signature']}
        canonical = json.dumps(data, sort_keys=True, separators=(',', ':'))
        return f"sha256:{hashlib.sha256(canonical.encode()).hexdigest()}"
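Quick sanity check: the hash is computed over a canonical JSON form (sorted keys, compact separators), so field order can never change it. A minimal sketch, with made-up field values:

e = SRPEvent(
    event_type="GEN_ATTEMPT",
    event_id="evt-demo-1",  # hypothetical ID, just for illustration
    timestamp="2026-01-15T00:00:00+00:00",
    previous_hash="sha256:" + "0" * 64,
)
print(e.compute_hash())  # deterministic: same fields, same hash, every run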

The Attempt Event

Every request starts here—before the safety check:

@dataclass(kw_only=True)
class GenAttemptEvent(SRPEvent):
    prompt_hash: str  # SHA-256 of original prompt (never store raw prompt!)
    input_type: str
    policy_id: str
    model_version: str
    session_id: str

    def __post_init__(self):
        self.event_type = "GEN_ATTEMPT"

def create_attempt(prompt: str, policy_id: str, model_version: str, 
                   previous_hash: str, session_id: str) -> GenAttemptEvent:
    """Create a GEN_ATTEMPT event. Call BEFORE safety check."""

    # Hash the prompt - NEVER store the original
    prompt_hash = f"sha256:{hashlib.sha256(prompt.encode()).hexdigest()}"

    event = GenAttemptEvent(
        event_type="GEN_ATTEMPT",
        event_id=str(uuid.uuid7()),  # Time-ordered UUID (requires Python 3.14+; use uuid4 on older versions)
        timestamp=datetime.now(timezone.utc).isoformat(),
        previous_hash=previous_hash,
        prompt_hash=prompt_hash,
        input_type="text",
        policy_id=policy_id,
        model_version=model_version,
        session_id=session_id
    )
    event.event_hash = event.compute_hash()
    return event

Critical: The attempt is logged before you know whether it will be allowed or denied. That makes selective logging detectable: any attempt without a matching outcome breaks the completeness invariant shown below.
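In a request handler, that ordering looks roughly like this. This is a sketch: append_to_log and assess_risk are hypothetical placeholders for your persistence layer and safety filter, not part of SRP:

prompt = "user request text"          # incoming request
last_hash = "sha256:" + "0" * 64      # tail of the existing event chain

# 1. Record the attempt BEFORE any safety decision exists.
attempt = create_attempt(prompt, "cap.safety.demo.v1", "model-x",
                         previous_hash=last_hash, session_id="sess-42")
append_to_log(attempt)                # hypothetical: persist to the audit log

# 2. Only now run the safety check; the outcome event chains to the attempt.
risk = assess_risk(prompt)            # hypothetical: your content filter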

The Deny Event

When your safety system catches something:

@dataclass(kw_only=True)
class GenDenyEvent(SRPEvent):
    attempt_id: str
    risk_category: str
    risk_score: float
    refusal_reason: str
    model_decision: str = "DENY"

    def __post_init__(self):
        self.event_type = "GEN_DENY"

def create_denial(attempt: GenAttemptEvent, risk_category: RiskCategory,
                  risk_score: float, reason: str) -> GenDenyEvent:
    """Create a GEN_DENY event linked to its attempt."""

    event = GenDenyEvent(
        event_type="GEN_DENY",
        event_id=str(uuid.uuid7()),
        timestamp=datetime.now(timezone.utc).isoformat(),
        previous_hash=attempt.event_hash,  # Chain to the attempt
        attempt_id=attempt.event_id,
        risk_category=risk_category.value,
        risk_score=risk_score,
        refusal_reason=reason
    )
    event.event_hash = event.compute_hash()
    return event
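The minimal implementation above only shows the deny path, but verify_completeness (below) also counts GEN and GEN_ERROR outcomes. Here's a sketch of what an allow-side event might look like; the output_hash field is an assumption, not part of the CAP-SRP spec:

@dataclass(kw_only=True)
class GenEvent(SRPEvent):
    attempt_id: str
    output_hash: str  # assumed field: SHA-256 of the generated artifact
    model_decision: str = "ALLOW"

    def __post_init__(self):
        self.event_type = "GEN"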

Adding Cryptographic Signatures

Hash chains prove order and integrity. Signatures prove authenticity.

try:
    from nacl.signing import SigningKey, VerifyKey
    from nacl.encoding import HexEncoder
    NACL_AVAILABLE = True
except ImportError:
    NACL_AVAILABLE = False

class SRPSigner:
    """Ed25519 signing for SRP events."""

    def __init__(self, private_key: Optional[bytes] = None):
        if not NACL_AVAILABLE:
            raise ImportError("PyNaCl required: pip install pynacl")

        if private_key:
            self.signing_key = SigningKey(private_key)
        else:
            self.signing_key = SigningKey.generate()

        self.verify_key = self.signing_key.verify_key

    def sign_event(self, event: SRPEvent) -> str:
        """Sign the event hash."""
        message = event.event_hash.encode()
        signed = self.signing_key.sign(message, encoder=HexEncoder)
        return f"ed25519:{signed.signature.decode()}"

    def verify_signature(self, event: SRPEvent) -> bool:
        """Verify event signature."""
        if not event.signature.startswith("ed25519:"):
            return False

        sig_hex = event.signature[8:]  # Remove "ed25519:" prefix
        message = event.event_hash.encode()

        try:
            self.verify_key.verify(message, bytes.fromhex(sig_hex))
            return True
        except Exception:
            return False

    def get_public_key_hex(self) -> str:
        """Get public key for third-party verification."""
        return self.verify_key.encode(encoder=HexEncoder).decode()
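A quick round-trip with the classes above (the policy and session IDs are made up):

signer = SRPSigner()
evt = create_attempt("test prompt", "cap.safety.demo.v1", "model-x",
                     previous_hash="sha256:" + "0" * 64, session_id="sess-demo")
evt.signature = signer.sign_event(evt)
assert signer.verify_signature(evt)   # holder-side check
print(signer.get_public_key_hex())    # publish this for third-party verification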

The Completeness Invariant

Here's the mathematical guarantee that makes SRP audit-proof:

def verify_completeness(events: list[SRPEvent]) -> tuple[bool, dict]:
    """
    Verify the Completeness Invariant:
    ∑ GEN_ATTEMPT = ∑ GEN + ∑ GEN_DENY + ∑ GEN_ERROR

    Returns (is_valid, statistics)
    """
    attempts = set()
    outcomes = set()

    stats = {
        "total_attempts": 0,
        "total_generations": 0,
        "total_denials": 0,
        "total_errors": 0,
        "unmatched_attempts": [],
        "orphan_outcomes": []
    }

    for event in events:
        if event.event_type == "GEN_ATTEMPT":
            attempts.add(event.event_id)
            stats["total_attempts"] += 1
        elif event.event_type == "GEN":
            outcomes.add(event.attempt_id)
            stats["total_generations"] += 1
        elif event.event_type == "GEN_DENY":
            outcomes.add(event.attempt_id)
            stats["total_denials"] += 1
        elif event.event_type == "GEN_ERROR":
            outcomes.add(event.attempt_id)
            stats["total_errors"] += 1

    # Find mismatches
    stats["unmatched_attempts"] = list(attempts - outcomes)
    stats["orphan_outcomes"] = list(outcomes - attempts)

    # The invariant
    is_valid = (
        len(stats["unmatched_attempts"]) == 0 and
        len(stats["orphan_outcomes"]) == 0
    )

    return is_valid, stats

Why this matters:

  • If unmatched_attempts is non-empty: you're hiding results
  • If orphan_outcomes is non-empty: you're fabricating refusals
  • If both are empty: your audit trail is complete

Regulators love this. It's not "trust us"—it's math.
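
To see the first failure mode concretely, here's a sketch that withholds an outcome and watches the invariant break:

# An attempt whose outcome was withheld fails the invariant.
lone_attempt = create_attempt("some prompt", "cap.safety.demo.v1", "model-x",
                              previous_hash="sha256:" + "0" * 64,
                              session_id="sess-demo")
ok, stats = verify_completeness([lone_attempt])
assert not ok
assert stats["unmatched_attempts"] == [lone_attempt.event_id]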


Building Evidence Packs

When the auditors come, give them this:

from pathlib import Path

def create_evidence_pack(events: list[SRPEvent], 
                         output_dir: str,
                         signer: Optional[SRPSigner] = None) -> Path:
    """
    Create a complete evidence package for regulatory submission.

    Structure:
    evidence-pack-{id}/
    ├── manifest.json
    ├── events/
    │   ├── 0001-gen_attempt.json
    │   └── 0002-gen_deny.json
    ├── chain/
    │   └── hash_chain.json
    ├── statistics/
    │   └── refusal_stats.json  ← Auditors check this first
    └── verification/
        ├── public_key.txt
        └── instructions.md
    """

    pack_id = f"evidence-pack-{datetime.now(timezone.utc).strftime('%Y%m%d-%H%M%S')}"
    pack_path = Path(output_dir) / pack_id

    # Create directories
    (pack_path / "events").mkdir(parents=True)
    (pack_path / "chain").mkdir()
    (pack_path / "statistics").mkdir()
    (pack_path / "verification").mkdir()

    # Write events
    for i, event in enumerate(events, 1):
        filename = f"{i:04d}-{event.event_type.lower()}.json"
        with open(pack_path / "events" / filename, 'w') as f:
            json.dump(asdict(event), f, indent=2)

    # Write hash chain
    chain = [{"event_id": e.event_id, "event_hash": e.event_hash, 
              "previous_hash": e.previous_hash} for e in events]
    with open(pack_path / "chain" / "hash_chain.json", 'w') as f:
        json.dump(chain, f, indent=2)

    # Write statistics (MOST IMPORTANT FOR AUDITORS)
    is_valid, stats = verify_completeness(events)
    stats["chain_integrity"] = "VALID" if is_valid else "INVALID"
    stats["generated_at"] = datetime.now(timezone.utc).isoformat()

    with open(pack_path / "statistics" / "refusal_stats.json", 'w') as f:
        json.dump(stats, f, indent=2)

    # Write verification instructions
    instructions = f"""# Third-Party Verification Guide

## Quick Verification (2 minutes)

1. Recalculate each EventHash from event data
2. Verify hash chain linkage (each previousHash = prior eventHash)
3. Verify Ed25519 signatures against public key
4. Check completeness invariant in statistics/refusal_stats.json

## Automated Verification

```bash
python -m srp_core --verify {pack_path}
```

## What to Look For

- chain_integrity: VALID
- unmatched_attempts: [] (empty)
- orphan_outcomes: [] (empty)
"""
    with open(pack_path / "verification" / "instructions.md", 'w') as f:
        f.write(instructions)

    # Write public key if signer provided
    if signer:
        with open(pack_path / "verification" / "public_key.txt", 'w') as f:
            f.write(signer.get_public_key_hex())

    # Write manifest
    manifest = {
        "pack_id": pack_id,
        "created_at": datetime.now(timezone.utc).isoformat(),
        "event_count": len(events),
        "chain_integrity": "VALID" if is_valid else "INVALID",
        "has_signatures": signer is not None,
        "spec_version": "CAP-SRP v0.1"
    }
    with open(pack_path / "manifest.json", 'w') as f:
        json.dump(manifest, f, indent=2)

    return pack_path
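The instructions reference the repo's srp_core --verify command. As a sketch of what its first two manual steps might look like (recompute each hash, check linkage), assuming a linear chain like the one the demo below produces:

def verify_chain(events: list[SRPEvent]) -> bool:
    """Recompute each event hash and check previous_hash linkage."""
    prev = None
    for e in events:
        if e.compute_hash() != e.event_hash:
            return False  # contents no longer match their recorded hash
        if prev is not None and e.previous_hash != prev:
            return False  # linkage broken: possible insertion or deletion
        prev = e.event_hash
    return True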

Real-World Demo: The Grok Scenario

Let's simulate what proper logging would have looked like:

def demo_grok_scenario():
    """
    Simulate the Grok incident with proper SRP logging.

    In January 2026, Grok lacked verifiable refusal records.
    This shows what SHOULD have happened.
    """

    signer = SRPSigner()
    events = []
    previous_hash = "sha256:" + "0" * 64  # Genesis hash

    # Scenario 1: CSAM attempt blocked
    print("=== Scenario 1: CSAM Request Blocked ===")

    attempt1 = create_attempt(
        prompt="[REDACTED - harmful content hash only]",
        policy_id="cap.safety.child-protection.v1",
        model_version="grok-2.0-image",
        previous_hash=previous_hash,
        session_id="sess-001"
    )
    attempt1.signature = signer.sign_event(attempt1)
    events.append(attempt1)

    denial1 = create_denial(
        attempt=attempt1,
        risk_category=RiskCategory.CSAM_RISK,
        risk_score=0.97,
        reason="Minor detected in reference context"
    )
    denial1.signature = signer.sign_event(denial1)
    events.append(denial1)

    print(f"  ATTEMPT logged: {attempt1.event_id[:8]}...")
    print(f"  DENIAL logged:  {denial1.event_id[:8]}...")
    print(f"  Risk category:  {denial1.risk_category}")
    print(f"  Risk score:     {denial1.risk_score}")
    print()

    # Scenario 2: NCII attempt blocked
    print("=== Scenario 2: NCII Request Blocked ===")

    attempt2 = create_attempt(
        prompt="[REDACTED - harmful content hash only]",
        policy_id="cap.safety.ncii-prevention.v1",
        model_version="grok-2.0-image",
        previous_hash=denial1.event_hash,
        session_id="sess-002"
    )
    attempt2.signature = signer.sign_event(attempt2)
    events.append(attempt2)

    denial2 = create_denial(
        attempt=attempt2,
        risk_category=RiskCategory.NCII_RISK,
        risk_score=0.94,
        reason="Non-consensual intimate imagery detected"
    )
    denial2.signature = signer.sign_event(denial2)
    events.append(denial2)

    print(f"  ATTEMPT logged: {attempt2.event_id[:8]}...")
    print(f"  DENIAL logged:  {denial2.event_id[:8]}...")
    print(f"  Risk category:  {denial2.risk_category}")
    print()

    # Verify and create evidence pack
    print("=== Verification ===")
    is_valid, stats = verify_completeness(events)
    print(f"  Completeness Invariant: {'✓ VALID' if is_valid else '✗ INVALID'}")
    print(f"  Total attempts: {stats['total_attempts']}")
    print(f"  Total denials:  {stats['total_denials']}")
    print()

    # Create evidence pack
    pack_path = create_evidence_pack(events, "./output", signer)
    print(f"=== Evidence Pack Created ===")
    print(f"  Location: {pack_path}")
    print()
    print("This evidence pack can now be submitted to regulators.")
    print("Third parties can verify integrity without accessing our systems.")

if __name__ == "__main__":
    demo_grok_scenario()

Output:

=== Scenario 1: CSAM Request Blocked ===
  ATTEMPT logged: 019467a1...
  DENIAL logged:  019467a2...
  Risk category:  CSAM_RISK
  Risk score:     0.97

=== Scenario 2: NCII Request Blocked ===
  ATTEMPT logged: 019467a3...
  DENIAL logged:  019467a4...
  Risk category:  NCII_RISK

=== Verification ===
  Completeness Invariant: ✓ VALID
  Total attempts: 2
  Total denials:  2

=== Evidence Pack Created ===
  Location: ./output/evidence-pack-20260115-142345

This evidence pack can now be submitted to regulators.
Third parties can verify integrity without accessing our systems.

Why This Matters for Developers

If you're building AI products:

You're going to face audits. The EU AI Act's obligations for high-risk systems apply from August 2026. The UK Online Safety Act is being enforced now. California has already issued cease-and-desist orders.

"We have safety measures" is not a defense. "Here's cryptographic proof our safety measures worked" is.

If you're building safety systems:

Your filters catch harmful content. Great. But can you prove they worked? Can you prove completeness—that nothing slipped through the cracks?

SRP gives you that proof.

If you're in compliance/legal:

This is your new best friend. Evidence packs that third parties can verify. Statistics auditors actually trust. Mathematical guarantees instead of promises.


Quick Start

# Clone the repo
git clone https://github.com/veritaschain/cap-safe-refusal-provenance.git
cd cap-safe-refusal-provenance

# Install dependencies
pip install -r requirements.txt

# Run the demo
python examples/demo_scenarios.py

# Run tests
python tests/test_srp.py

# Verify an evidence pack
python src/srp_core.py --verify evidence-pack-xxx/

The Bigger Picture

This PoC is part of the Verifiable AI Provenance (VAP) Framework—an open standard for making AI systems auditable.

The philosophy is simple:

"We don't just block harmful generations. We prove that they never happened."

Current AI safety is trust-based: "We have filters. Trust us."

SRP is verification-based: "We have filters. Here's the cryptographic proof they worked."

Verify, Don't Trust.


Resources

  • Full source and PoC: github.com/veritaschain/cap-safe-refusal-provenance

License

CC BY 4.0 International — Use it, build on it, make AI safer.


"AI needs a Flight Recorder."


What do you think? Have questions about implementing SRP in your AI system? Drop a comment below or open an issue on GitHub.
