
Proving What AI Didn't Generate: A Cryptographic Solution to the Grok Crisis

In January 2026, xAI's Grok was generating 6,700+ non-consensual sexual images per hour, including images of minors. Twelve jurisdictions launched investigations. California's Attorney General issued a cease-and-desist order.

But here's the technical problem no one talks about:

When regulators asked "Prove your safeguards worked," xAI couldn't answer, because its refusals were never recorded.

This article shows you how to fix that with cryptographic proofs.


The Problem: Refusals Are a Black Hole

Every AI system today has the same blind spot:

Current AI Systems               What We Need
Generated content → Logged ✓     Generated content → Logged ✓
Refused content → No record      Refused content → Cryptographically proven

When your AI blocks a CSAM request, where's the proof? When your content filter catches a deepfake attempt, where's the audit trail?

Nowhere.

The refusals simply vanish. And when a regulator—or a lawsuit—comes knocking, "trust us, we blocked it" isn't going to cut it.


The Solution: Safe Refusal Provenance (SRP)

SRP is an extension of the Cryptographic Audit Protocol (CAP) that treats refusals as first-class cryptographic events.

The core insight: every generation attempt MUST have a recorded outcome.

┌───────────────────┐
│   GEN_ATTEMPT     │  ← Always recorded FIRST
└─────────┬─────────┘
          │
          ▼
┌───────────────────┐
│  Risk Assessment  │
└─────────┬─────────┘
          │
    ┌─────┴─────┐
    │           │
    ▼           ▼
┌────────┐  ┌─────────┐
│  GEN   │  │ GEN_DENY│
│(output)│  │(refusal)│
└────────┘  └─────────┘

This prevents the attack vector: "You only showed us DENYs—where are the ALLOWs you're hiding?"


Let's Build It: Core Implementation

Here's the minimal implementation. Full source at github.com/veritaschain/cap-safe-refusal-provenance.

Event Structure

First, define our event types:

import hashlib
import json
import uuid
from datetime import datetime, timezone
from dataclasses import dataclass, asdict
from typing import Optional, Literal
from enum import Enum

class RiskCategory(Enum):
    CSAM_RISK = "CSAM_RISK"
    NCII_RISK = "NCII_RISK"
    MINOR_SEXUALIZATION = "MINOR_SEXUALIZATION"
    REAL_PERSON_DEEPFAKE = "REAL_PERSON_DEEPFAKE"
    VIOLENCE_EXTREME = "VIOLENCE_EXTREME"
    HATE_CONTENT = "HATE_CONTENT"

@dataclass(kw_only=True)  # kw_only (Python 3.10+) lets subclasses add required fields
class SRPEvent:
    event_type: Literal["GEN_ATTEMPT", "GEN", "GEN_DENY", "GEN_ERROR"]
    event_id: str
    timestamp: str
    previous_hash: str
    event_hash: str = ""
    signature: str = ""

    def compute_hash(self) -> str:
        """Compute SHA-256 hash of event contents."""
        data = {k: v for k, v in asdict(self).items() 
                if k not in ['event_hash', 'signature']}
        canonical = json.dumps(data, sort_keys=True, separators=(',', ':'))
        return f"sha256:{hashlib.sha256(canonical.encode()).hexdigest()}"
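Quick sanity check: the hash is computed over a canonical JSON form (sorted keys, compact separators), so field order can never change it. A minimal sketch, with made-up field values:

e = SRPEvent(
    event_type="GEN_ATTEMPT",
    event_id="evt-demo-1",  # hypothetical ID, just for illustration
    timestamp="2026-01-15T00:00:00+00:00",
    previous_hash="sha256:" + "0" * 64,
)
print(e.compute_hash())  # deterministic: same fields, same hash, every run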

The Attempt Event

Every request starts here—before the safety check:

@dataclass(kw_only=True)
class GenAttemptEvent(SRPEvent):
    prompt_hash: str  # SHA-256 of original prompt (never store raw prompt!)
    input_type: str
    policy_id: str
    model_version: str
    session_id: str

    def __post_init__(self):
        self.event_type = "GEN_ATTEMPT"

def create_attempt(prompt: str, policy_id: str, model_version: str, 
                   previous_hash: str, session_id: str) -> GenAttemptEvent:
    """Create a GEN_ATTEMPT event. Call BEFORE safety check."""

    # Hash the prompt - NEVER store the original
    prompt_hash = f"sha256:{hashlib.sha256(prompt.encode()).hexdigest()}"

    event = GenAttemptEvent(
        event_type="GEN_ATTEMPT",
        event_id=str(uuid.uuid7()),  # Time-ordered UUID (requires Python 3.14+; use uuid4 on older versions)
        timestamp=datetime.now(timezone.utc).isoformat(),
        previous_hash=previous_hash,
        prompt_hash=prompt_hash,
        input_type="text",
        policy_id=policy_id,
        model_version=model_version,
        session_id=session_id
    )
    event.event_hash = event.compute_hash()
    return event

Critical: The attempt is logged before you know whether it will be allowed or denied. That makes selective logging detectable: any attempt without a matching outcome breaks the completeness invariant shown below.
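In a request handler, that ordering looks roughly like this. This is a sketch: append_to_log and assess_risk are hypothetical placeholders for your persistence layer and safety filter, not part of SRP:

prompt = "user request text"          # incoming request
last_hash = "sha256:" + "0" * 64      # tail of the existing event chain

# 1. Record the attempt BEFORE any safety decision exists.
attempt = create_attempt(prompt, "cap.safety.demo.v1", "model-x",
                         previous_hash=last_hash, session_id="sess-42")
append_to_log(attempt)                # hypothetical: persist to the audit log

# 2. Only now run the safety check; the outcome event chains to the attempt.
risk = assess_risk(prompt)            # hypothetical: your content filter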

The Deny Event

When your safety system catches something:

@dataclass(kw_only=True)
class GenDenyEvent(SRPEvent):
    attempt_id: str
    risk_category: str
    risk_score: float
    refusal_reason: str
    model_decision: str = "DENY"

    def __post_init__(self):
        self.event_type = "GEN_DENY"

def create_denial(attempt: GenAttemptEvent, risk_category: RiskCategory,
                  risk_score: float, reason: str) -> GenDenyEvent:
    """Create a GEN_DENY event linked to its attempt."""

    event = GenDenyEvent(
        event_type="GEN_DENY",
        event_id=str(uuid.uuid7()),
        timestamp=datetime.now(timezone.utc).isoformat(),
        previous_hash=attempt.event_hash,  # Chain to the attempt
        attempt_id=attempt.event_id,
        risk_category=risk_category.value,
        risk_score=risk_score,
        refusal_reason=reason
    )
    event.event_hash = event.compute_hash()
    return event
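The minimal implementation above only shows the deny path, but verify_completeness (below) also counts GEN and GEN_ERROR outcomes. Here's a sketch of what an allow-side event might look like; the output_hash field is an assumption, not part of the CAP-SRP spec:

@dataclass(kw_only=True)
class GenEvent(SRPEvent):
    attempt_id: str
    output_hash: str  # assumed field: SHA-256 of the generated artifact
    model_decision: str = "ALLOW"

    def __post_init__(self):
        self.event_type = "GEN"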

Adding Cryptographic Signatures

Hash chains prove order and integrity. Signatures prove authenticity.

try:
    from nacl.signing import SigningKey, VerifyKey
    from nacl.encoding import HexEncoder
    NACL_AVAILABLE = True
except ImportError:
    NACL_AVAILABLE = False

class SRPSigner:
    """Ed25519 signing for SRP events."""

    def __init__(self, private_key: Optional[bytes] = None):
        if not NACL_AVAILABLE:
            raise ImportError("PyNaCl required: pip install pynacl")

        if private_key:
            self.signing_key = SigningKey(private_key)
        else:
            self.signing_key = SigningKey.generate()

        self.verify_key = self.signing_key.verify_key

    def sign_event(self, event: SRPEvent) -> str:
        """Sign the event hash."""
        message = event.event_hash.encode()
        signed = self.signing_key.sign(message, encoder=HexEncoder)
        return f"ed25519:{signed.signature.decode()}"

    def verify_signature(self, event: SRPEvent) -> bool:
        """Verify event signature."""
        if not event.signature.startswith("ed25519:"):
            return False

        sig_hex = event.signature[8:]  # Remove "ed25519:" prefix
        message = event.event_hash.encode()

        try:
            self.verify_key.verify(message, bytes.fromhex(sig_hex))
            return True
        except Exception:
            return False

    def get_public_key_hex(self) -> str:
        """Get public key for third-party verification."""
        return self.verify_key.encode(encoder=HexEncoder).decode()
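A quick round-trip with the classes above (the policy and session IDs are made up):

signer = SRPSigner()
evt = create_attempt("test prompt", "cap.safety.demo.v1", "model-x",
                     previous_hash="sha256:" + "0" * 64, session_id="sess-demo")
evt.signature = signer.sign_event(evt)
assert signer.verify_signature(evt)   # holder-side check
print(signer.get_public_key_hex())    # publish this for third-party verification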

The Completeness Invariant

Here's the mathematical guarantee that makes SRP audit-proof:

def verify_completeness(events: list[SRPEvent]) -> tuple[bool, dict]:
    """
    Verify the Completeness Invariant:
    ∑ GEN_ATTEMPT = ∑ GEN + ∑ GEN_DENY + ∑ GEN_ERROR

    Returns (is_valid, statistics)
    """
    attempts = set()
    outcomes = set()

    stats = {
        "total_attempts": 0,
        "total_generations": 0,
        "total_denials": 0,
        "total_errors": 0,
        "unmatched_attempts": [],
        "orphan_outcomes": []
    }

    for event in events:
        if event.event_type == "GEN_ATTEMPT":
            attempts.add(event.event_id)
            stats["total_attempts"] += 1
        elif event.event_type == "GEN":
            outcomes.add(event.attempt_id)
            stats["total_generations"] += 1
        elif event.event_type == "GEN_DENY":
            outcomes.add(event.attempt_id)
            stats["total_denials"] += 1
        elif event.event_type == "GEN_ERROR":
            outcomes.add(event.attempt_id)
            stats["total_errors"] += 1

    # Find mismatches
    stats["unmatched_attempts"] = list(attempts - outcomes)
    stats["orphan_outcomes"] = list(outcomes - attempts)

    # The invariant
    is_valid = (
        len(stats["unmatched_attempts"]) == 0 and
        len(stats["orphan_outcomes"]) == 0
    )

    return is_valid, stats

Why this matters:

  • If unmatched_attempts is non-empty: you're hiding results
  • If orphan_outcomes is non-empty: you're fabricating refusals
  • If both are empty: your audit trail is complete

Regulators love this. It's not "trust us"—it's math.
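
To see the first failure mode concretely, here's a sketch that withholds an outcome and watches the invariant break:

# An attempt whose outcome was withheld fails the invariant.
lone_attempt = create_attempt("some prompt", "cap.safety.demo.v1", "model-x",
                              previous_hash="sha256:" + "0" * 64,
                              session_id="sess-demo")
ok, stats = verify_completeness([lone_attempt])
assert not ok
assert stats["unmatched_attempts"] == [lone_attempt.event_id]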


Building Evidence Packs

When the auditors come, give them this:

from pathlib import Path

def create_evidence_pack(events: list[SRPEvent], 
                         output_dir: str,
                         signer: Optional[SRPSigner] = None) -> Path:
    """
    Create a complete evidence package for regulatory submission.

    Structure:
    evidence-pack-{id}/
    ├── manifest.json
    ├── events/
    │   ├── 0001-gen_attempt.json
    │   └── 0002-gen_deny.json
    ├── chain/
    │   └── hash_chain.json
    ├── statistics/
    │   └── refusal_stats.json  ← Auditors check this first
    └── verification/
        ├── public_key.txt
        └── instructions.md
    """

    pack_id = f"evidence-pack-{datetime.now(timezone.utc).strftime('%Y%m%d-%H%M%S')}"
    pack_path = Path(output_dir) / pack_id

    # Create directories
    (pack_path / "events").mkdir(parents=True)
    (pack_path / "chain").mkdir()
    (pack_path / "statistics").mkdir()
    (pack_path / "verification").mkdir()

    # Write events
    for i, event in enumerate(events, 1):
        filename = f"{i:04d}-{event.event_type.lower()}.json"
        with open(pack_path / "events" / filename, 'w') as f:
            json.dump(asdict(event), f, indent=2)

    # Write hash chain
    chain = [{"event_id": e.event_id, "event_hash": e.event_hash, 
              "previous_hash": e.previous_hash} for e in events]
    with open(pack_path / "chain" / "hash_chain.json", 'w') as f:
        json.dump(chain, f, indent=2)

    # Write statistics (MOST IMPORTANT FOR AUDITORS)
    is_valid, stats = verify_completeness(events)
    stats["chain_integrity"] = "VALID" if is_valid else "INVALID"
    stats["generated_at"] = datetime.now(timezone.utc).isoformat()

    with open(pack_path / "statistics" / "refusal_stats.json", 'w') as f:
        json.dump(stats, f, indent=2)

    # Write verification instructions
    instructions = f"""# Third-Party Verification Guide

## Quick Verification (2 minutes)

1. Recalculate each EventHash from event data
2. Verify hash chain linkage (each previousHash = prior eventHash)
3. Verify Ed25519 signatures against public key
4. Check completeness invariant in statistics/refusal_stats.json

## Automated Verification

```bash
python -m srp_core --verify {pack_path}
```

## What to Look For

- chain_integrity: VALID
- unmatched_attempts: [] (empty)
- orphan_outcomes: [] (empty)
"""
    with open(pack_path / "verification" / "instructions.md", 'w') as f:
        f.write(instructions)

    # Write public key if signer provided
    if signer:
        with open(pack_path / "verification" / "public_key.txt", 'w') as f:
            f.write(signer.get_public_key_hex())

    # Write manifest
    manifest = {
        "pack_id": pack_id,
        "created_at": datetime.now(timezone.utc).isoformat(),
        "event_count": len(events),
        "chain_integrity": "VALID" if is_valid else "INVALID",
        "has_signatures": signer is not None,
        "spec_version": "CAP-SRP v0.1"
    }
    with open(pack_path / "manifest.json", 'w') as f:
        json.dump(manifest, f, indent=2)

    return pack_path
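The instructions reference the repo's srp_core --verify command. As a sketch of what its first two manual steps might look like (recompute each hash, check linkage), assuming a linear chain like the one the demo below produces:

def verify_chain(events: list[SRPEvent]) -> bool:
    """Recompute each event hash and check previous_hash linkage."""
    prev = None
    for e in events:
        if e.compute_hash() != e.event_hash:
            return False  # contents no longer match their recorded hash
        if prev is not None and e.previous_hash != prev:
            return False  # linkage broken: possible insertion or deletion
        prev = e.event_hash
    return True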

Real-World Demo: The Grok Scenario

Let's simulate what proper logging would have looked like:

def demo_grok_scenario():
    """
    Simulate the Grok incident with proper SRP logging.

    In January 2026, Grok lacked verifiable refusal records.
    This shows what SHOULD have happened.
    """

    signer = SRPSigner()
    events = []
    previous_hash = "sha256:" + "0" * 64  # Genesis hash

    # Scenario 1: CSAM attempt blocked
    print("=== Scenario 1: CSAM Request Blocked ===")

    attempt1 = create_attempt(
        prompt="[REDACTED - harmful content hash only]",
        policy_id="cap.safety.child-protection.v1",
        model_version="grok-2.0-image",
        previous_hash=previous_hash,
        session_id="sess-001"
    )
    attempt1.signature = signer.sign_event(attempt1)
    events.append(attempt1)

    denial1 = create_denial(
        attempt=attempt1,
        risk_category=RiskCategory.CSAM_RISK,
        risk_score=0.97,
        reason="Minor detected in reference context"
    )
    denial1.signature = signer.sign_event(denial1)
    events.append(denial1)

    print(f"  ATTEMPT logged: {attempt1.event_id[:8]}...")
    print(f"  DENIAL logged:  {denial1.event_id[:8]}...")
    print(f"  Risk category:  {denial1.risk_category}")
    print(f"  Risk score:     {denial1.risk_score}")
    print()

    # Scenario 2: NCII attempt blocked
    print("=== Scenario 2: NCII Request Blocked ===")

    attempt2 = create_attempt(
        prompt="[REDACTED - harmful content hash only]",
        policy_id="cap.safety.ncii-prevention.v1",
        model_version="grok-2.0-image",
        previous_hash=denial1.event_hash,
        session_id="sess-002"
    )
    attempt2.signature = signer.sign_event(attempt2)
    events.append(attempt2)

    denial2 = create_denial(
        attempt=attempt2,
        risk_category=RiskCategory.NCII_RISK,
        risk_score=0.94,
        reason="Non-consensual intimate imagery detected"
    )
    denial2.signature = signer.sign_event(denial2)
    events.append(denial2)

    print(f"  ATTEMPT logged: {attempt2.event_id[:8]}...")
    print(f"  DENIAL logged:  {denial2.event_id[:8]}...")
    print(f"  Risk category:  {denial2.risk_category}")
    print()

    # Verify and create evidence pack
    print("=== Verification ===")
    is_valid, stats = verify_completeness(events)
    print(f"  Completeness Invariant: {'✓ VALID' if is_valid else '✗ INVALID'}")
    print(f"  Total attempts: {stats['total_attempts']}")
    print(f"  Total denials:  {stats['total_denials']}")
    print()

    # Create evidence pack
    pack_path = create_evidence_pack(events, "./output", signer)
    print(f"=== Evidence Pack Created ===")
    print(f"  Location: {pack_path}")
    print()
    print("This evidence pack can now be submitted to regulators.")
    print("Third parties can verify integrity without accessing our systems.")

if __name__ == "__main__":
    demo_grok_scenario()

Output:

=== Scenario 1: CSAM Request Blocked ===
  ATTEMPT logged: 019467a1...
  DENIAL logged:  019467a2...
  Risk category:  CSAM_RISK
  Risk score:     0.97

=== Scenario 2: NCII Request Blocked ===
  ATTEMPT logged: 019467a3...
  DENIAL logged:  019467a4...
  Risk category:  NCII_RISK

=== Verification ===
  Completeness Invariant: ✓ VALID
  Total attempts: 2
  Total denials:  2

=== Evidence Pack Created ===
  Location: ./output/evidence-pack-20260115-142345

This evidence pack can now be submitted to regulators.
Third parties can verify integrity without accessing our systems.

Why This Matters for Developers

If you're building AI products:

You're going to face audits. The EU AI Act's obligations for high-risk systems apply from August 2026. The UK Online Safety Act is being enforced now. California has already issued cease-and-desist orders.

"We have safety measures" is not a defense. "Here's cryptographic proof our safety measures worked" is.

If you're building safety systems:

Your filters catch harmful content. Great. But can you prove they worked? Can you prove completeness—that nothing slipped through the cracks?

SRP gives you that proof.

If you're in compliance/legal:

This is your new best friend. Evidence packs that third parties can verify. Statistics auditors actually trust. Mathematical guarantees instead of promises.


Quick Start

# Clone the repo
git clone https://github.com/veritaschain/cap-safe-refusal-provenance.git
cd cap-safe-refusal-provenance

# Install dependencies
pip install -r requirements.txt

# Run the demo
python examples/demo_scenarios.py

# Run tests
python tests/test_srp.py

# Verify an evidence pack
python src/srp_core.py --verify evidence-pack-xxx/

The Bigger Picture

This PoC is part of the Verifiable AI Provenance (VAP) Framework—an open standard for making AI systems auditable.

The philosophy is simple:

"We don't just block harmful generations. We prove that they never happened."

Current AI safety is trust-based: "We have filters. Trust us."

SRP is verification-based: "We have filters. Here's the cryptographic proof they worked."

Verify, Don't Trust.


Resources

  • Full source and PoC: github.com/veritaschain/cap-safe-refusal-provenance

License

CC BY 4.0 International — Use it, build on it, make AI safer.


"AI needs a Flight Recorder."


What do you think? Have questions about implementing SRP in your AI system? Drop a comment below or open an issue on GitHub.
