The Problem No One Saw Coming
In late December 2025, something broke. Not in the usual way software breaks—with error messages and stack traces—but in a way that exposed a fundamental gap in how we think about AI accountability.
Grok, the generative AI integrated into X (formerly Twitter), became the center of an international crisis. Users discovered they could use the image editing feature to generate non-consensual intimate images (NCII) of real people. The UK's Internet Watch Foundation reported finding sexual images of children aged 11-13 on dark web forums. By January 2026, Indonesia and Malaysia had blocked access to the service, and the UK's Ofcom launched a formal investigation with the authority to impose fines up to 10% of global revenue.
But here's what really matters for us as developers: xAI couldn't prove their safeguards worked.
They could claim millions of harmful requests were blocked. They could point to their content policies. They could show internal dashboards. But they couldn't provide cryptographic proof that any specific request was actually refused. And neither could anyone else verify it.
This is the "negative proof" problem. And it's not unique to Grok—it's baked into how every major AI system handles content moderation today.
Why Internal Logs Aren't Enough
Let's be honest about what "logging" means in most AI systems:
# Typical AI content moderation logging
def generate_image(prompt: str) -> Image:
    if is_harmful(prompt):
        logger.info(f"Blocked harmful prompt: {hash(prompt)}")
        raise ContentPolicyViolation()
    return model.generate(prompt)
This looks reasonable. But consider what an external auditor or regulator actually sees:
- Logs can be modified. There's no cryptographic chain linking entries.
- Logs can be selectively deleted. Remove the embarrassing ones, keep the good ones.
- Logs can be fabricated. Add entries for refusals that never happened.
- Completeness is unverifiable. How do you prove nothing was omitted?
When Ofcom investigates X, they have to trust the data X provides. There's no independent verification mechanism. The platform marks its own homework.
This isn't a criticism of any specific company—it's a structural problem in how we've designed AI systems. We optimized for "what did we generate?" and completely ignored "what did we refuse to generate?"
Introducing CAP-SRP: Cryptographic Proof of Non-Generation
CAP-SRP (Content/Creative AI Profile – Safe Refusal Provenance) is an open specification that treats non-generation as a first-class, provable event.
The core insight is simple: if you want to prove you refused something, you need to record the refusal with the same rigor you'd record a financial transaction.
The Event Model
Every generation attempt produces exactly one of three outcomes:
GEN_ATTEMPT → GEN       (successful generation)
            → GEN_DENY  (refusal)
            → ERROR     (system failure)
This is enforced through a Completeness Invariant:
∑ GEN_ATTEMPT = ∑ GEN + ∑ GEN_DENY + ∑ ERROR
If this equation doesn't balance, the audit trail is invalid. Period.
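In code, the invariant is nothing more than a count over event types. Here is a minimal sketch on a toy event trail, using the event names above:
from collections import Counter

def completeness_holds(event_types: list[str]) -> bool:
    # GEN_ATTEMPT must equal GEN + GEN_DENY + ERROR
    counts = Counter(event_types)
    return counts["GEN_ATTEMPT"] == (
        counts["GEN"] + counts["GEN_DENY"] + counts["ERROR"]
    )

# Two attempts, each with exactly one outcome: balanced
assert completeness_holds(["GEN_ATTEMPT", "GEN", "GEN_ATTEMPT", "GEN_DENY"])
# An attempt with no recorded outcome: the trail is invalid
assert not completeness_holds(["GEN_ATTEMPT", "GEN_ATTEMPT", "GEN"])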
The Data Structure
Here's what a CAP-SRP refusal event looks like:
{
  "eventId": "01941a2b-3c4d-7e8f-9a0b-1c2d3e4f5a6b",
  "eventType": "GEN_DENY",
  "timestamp": "2026-01-13T14:32:17.847Z",
  "attemptRef": "01941a2b-3c4d-7e8f-9a0b-1c2d3e4f5a6a",
  "promptHash": "sha256:9f86d081884c7d659a2feaa0c55ad015a3bf4f1b2b0b822cd15d6c15b0f00a08",
  "refusalReason": "NCII_DETECTED",
  "confidenceScore": 0.97,
  "modelVersion": "grok-3.1-vision",
  "prevHash": "sha256:e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855",
  "signature": "ed25519:..."
}
Key points:
- promptHash: The harmful prompt itself isn't stored, only its SHA-256 hash. This enables verification without exposing the actual content.
- attemptRef: Links back to the original GEN_ATTEMPT event, enforcing the completeness invariant.
- prevHash: Creates a hash chain linking all events, making insertion or deletion detectable.
- signature: Ed25519 signature over the entire event, proving authenticity.
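For instance, a verifier holding the publisher's Ed25519 public key could check that last field with a library such as PyNaCl. The sketch below assumes the signature bytes have already been decoded and that the signed payload is the sorted-key JSON of the remaining fields; the exact canonicalization is whatever the spec defines:
import json
from nacl.signing import VerifyKey
from nacl.exceptions import BadSignatureError

def signature_is_valid(event: dict, signature: bytes, verify_key: VerifyKey) -> bool:
    # Recompute the canonical bytes that were signed (signature field excluded),
    # then let the library verify the Ed25519 signature over them
    payload = json.dumps(
        {k: v for k, v in event.items() if k != "signature"},
        sort_keys=True,
    ).encode()
    try:
        verify_key.verify(payload, signature)
        return True
    except BadSignatureError:
        return False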
Hash Chain Verification
Events are linked in a tamper-evident chain:
Event[0] → Event[1] → Event[2] → ... → Event[n]
   ↓          ↓          ↓                ↓
 hash₀   ←  hash₁   ←  hash₂   ←  ...  ← hashₙ
Each event includes the hash of the previous event. To verify integrity:
def verify_chain(events: List[Event]) -> bool:
    for i in range(1, len(events)):
        expected_prev = sha256(serialize(events[i-1]))
        if events[i].prev_hash != expected_prev:
            return False
    return True
If anyone modifies, inserts, or deletes an event, the chain breaks. This is the same principle behind blockchain, but without the consensus overhead—you're not trying to achieve distributed agreement, just tamper-evidence.
External Anchoring
For additional assurance, CAP-SRP supports anchoring Merkle roots to external timestamping services:
             Root Hash
            /         \
       Hash₀₁          Hash₂₃
       /    \          /    \
   H(E₀)   H(E₁)   H(E₂)   H(E₃)
By publishing the Merkle root to a public timestamping service (RFC 3161) or a transparency log, you can prove that a specific set of events existed at a specific time—without revealing the events themselves.
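Here is a minimal sketch of building such a root from a batch of event hashes, assuming simple pairwise SHA-256 hashing with the last node duplicated on odd levels; submitting the root to an RFC 3161 TSA or transparency log is not shown:
import hashlib

def merkle_root(leaf_hashes: list[bytes]) -> bytes:
    # Reduce the leaves level by level until a single root remains
    if not leaf_hashes:
        raise ValueError("empty batch")
    level = list(leaf_hashes)
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])  # duplicate the last node on odd levels
        level = [
            hashlib.sha256(level[i] + level[i + 1]).digest()
            for i in range(0, len(level), 2)
        ]
    return level[0]

# Example: a batch of four event hashes
leaves = [hashlib.sha256(f"event-{i}".encode()).digest() for i in range(4)]
print(merkle_root(leaves).hex())  # publish this value, not the events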
Implementation Example
Here's a minimal Python implementation of the core concepts:
import hashlib
import json
import uuid
from datetime import datetime, timezone
from typing import Optional
from dataclasses import dataclass, asdict
from enum import Enum

from nacl.signing import SigningKey, VerifyKey


class EventType(Enum):
    GEN_ATTEMPT = "GEN_ATTEMPT"
    GEN = "GEN"
    GEN_DENY = "GEN_DENY"
    ERROR = "ERROR"


@dataclass
class CAPEvent:
    event_id: str
    event_type: EventType
    timestamp: str
    attempt_ref: Optional[str]
    prompt_hash: Optional[str]
    prev_hash: str
    refusal_reason: Optional[str] = None
    signature: Optional[bytes] = None

    def serialize_for_signing(self) -> bytes:
        # Canonical sorted-key JSON of every field except the signature itself
        data = asdict(self)
        data['event_type'] = self.event_type.value
        del data['signature']
        return json.dumps(data, sort_keys=True).encode()

    def compute_hash(self) -> str:
        return f"sha256:{hashlib.sha256(self.serialize_for_signing()).hexdigest()}"


class CAPLogger:
    def __init__(self, signing_key: SigningKey):
        self.signing_key = signing_key
        self.events: list[CAPEvent] = []
        self.prev_hash = "sha256:" + "0" * 64  # Genesis hash

    def _create_event(
        self,
        event_type: EventType,
        attempt_ref: Optional[str] = None,
        prompt_hash: Optional[str] = None,
        refusal_reason: Optional[str] = None
    ) -> CAPEvent:
        event = CAPEvent(
            event_id=self._generate_uuid7(),
            event_type=event_type,
            timestamp=datetime.now(timezone.utc).isoformat(),
            attempt_ref=attempt_ref,
            prompt_hash=prompt_hash,
            prev_hash=self.prev_hash,
            refusal_reason=refusal_reason
        )
        # Sign the event
        signed = self.signing_key.sign(event.serialize_for_signing())
        event.signature = signed.signature
        # Update the chain head
        self.prev_hash = event.compute_hash()
        self.events.append(event)
        return event

    def log_attempt(self, prompt: str) -> CAPEvent:
        # Only the hash of the prompt is recorded, never the prompt itself
        prompt_hash = f"sha256:{hashlib.sha256(prompt.encode()).hexdigest()}"
        return self._create_event(
            EventType.GEN_ATTEMPT,
            prompt_hash=prompt_hash
        )

    def log_generation(self, attempt_event: CAPEvent) -> CAPEvent:
        return self._create_event(
            EventType.GEN,
            attempt_ref=attempt_event.event_id
        )

    def log_refusal(self, attempt_event: CAPEvent, reason: str) -> CAPEvent:
        return self._create_event(
            EventType.GEN_DENY,
            attempt_ref=attempt_event.event_id,
            refusal_reason=reason
        )

    def verify_completeness(self) -> bool:
        # Completeness invariant: every attempt has exactly one recorded outcome
        attempts = sum(1 for e in self.events if e.event_type == EventType.GEN_ATTEMPT)
        outcomes = sum(1 for e in self.events if e.event_type in
                       (EventType.GEN, EventType.GEN_DENY, EventType.ERROR))
        return attempts == outcomes

    def _generate_uuid7(self) -> str:
        return str(uuid.uuid4())  # Simplified; a real implementation uses UUIDv7


def is_harmful(prompt: str) -> bool:
    return False  # Placeholder for your existing policy classifier


# Usage
signing_key = SigningKey.generate()
logger = CAPLogger(signing_key)

# Log an attempt
prompt = "generate an image of..."
attempt = logger.log_attempt(prompt)

# Log the outcome
if is_harmful(prompt):
    refusal = logger.log_refusal(attempt, "POLICY_VIOLATION")
else:
    generation = logger.log_generation(attempt)

# Verify completeness invariant
assert logger.verify_completeness()
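An auditor holding the corresponding public key can then walk the chain produced above, checking both the hash links and the Ed25519 signatures. This is only a sketch against the CAPLogger above, not the spec's full verification procedure:
from nacl.exceptions import BadSignatureError

def audit(events: list[CAPEvent], verify_key: VerifyKey) -> bool:
    prev_hash = "sha256:" + "0" * 64  # chain must start at the genesis value
    for event in events:
        # Each event must reference the hash of the event before it
        if event.prev_hash != prev_hash:
            return False
        # ...and carry a valid signature over its canonical serialization
        try:
            verify_key.verify(event.serialize_for_signing(), event.signature)
        except BadSignatureError:
            return False
        prev_hash = event.compute_hash()
    return True

assert audit(logger.events, signing_key.verify_key)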
Privacy-Preserving Verification
One of the critical design goals is enabling verification without exposing harmful content. Here's how it works:
Scenario: A regulator wants to verify that a specific harmful prompt was refused.
- The regulator has the original prompt (from a complaint)
- The regulator computes hash = sha256(prompt)
- The regulator queries the CAP-SRP audit trail for events with a matching promptHash
- If a GEN_DENY event exists with that hash, the refusal is verified
- The regulator never has to hand the raw prompt to the platform, and the platform never has to disclose events unrelated to the complaint
def verify_refusal(audit_trail: list[CAPEvent], prompt: str) -> bool:
    target_hash = f"sha256:{hashlib.sha256(prompt.encode()).hexdigest()}"
    for event in audit_trail:
        if (event.event_type == EventType.GEN_DENY and
                event.prompt_hash == target_hash):
            return True
    return False
This is similar to how password verification works—you never store the password, just the hash. But here, we're applying it to content moderation at scale.
Regulatory Alignment
CAP-SRP isn't just a technical exercise—it's designed to meet real regulatory requirements:
EU AI Act Article 12 (Logging)
High-risk AI systems shall technically allow for the automatic recording of events ('logs') over the lifetime of the system.
CAP-SRP provides exactly this: automatic, tamper-evident recording of all generation events, including refusals.
EU Digital Services Act Article 35 (Audits)
Very Large Online Platforms (VLOPs) like X face annual independent audit obligations. CAP-SRP's Evidence Pack format provides auditors with:
- Cryptographically verifiable event chains
- Merkle proofs for selective disclosure
- Machine-readable schemas for automated analysis
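As a rough illustration of the selective-disclosure idea (the Evidence Pack wire format itself is defined by the spec), an auditor who holds only the anchored Merkle root and the hash of one disclosed event can check membership from a path of sibling hashes:
import hashlib

def verify_inclusion(leaf: bytes, path: list[tuple[bytes, str]], root: bytes) -> bool:
    # path: (sibling_hash, "left" or "right") pairs from the leaf up to the root
    node = leaf
    for sibling, side in path:
        if side == "left":
            node = hashlib.sha256(sibling + node).digest()
        else:
            node = hashlib.sha256(node + sibling).digest()
    return node == root
The platform discloses one event and its sibling path; everything else in the batch stays hidden behind the hashes.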
US TAKE IT DOWN Act
Platforms must remove NCII within 48 hours. CAP-SRP provides the evidence structure to prove compliance—not just "we removed it" but "here's the cryptographic proof we never generated it in the first place."
Integration with Existing Standards
CAP-SRP doesn't exist in isolation. It's designed to complement existing transparency standards:
IETF SCITT
The Supply Chain Integrity, Transparency, and Trust (SCITT) architecture provides a general-purpose transparency framework. CAP-SRP is being submitted as a domain-specific profile: draft-kamimura-scitt-refusal-events.
C2PA (Content Credentials)
C2PA answers "Is this content authentic?" CAP-SRP answers "Why did the AI make this decision?" They're complementary:
- C2PA: Passport for content (provenance)
- CAP-SRP: Flight recorder for AI systems (accountability)
A complete transparency stack might look like:
Generated Image
├── C2PA Manifest (content provenance)
│   └── Creator, modifications, AI tool used
└── CAP-SRP Reference (system accountability)
    └── Link to audit trail proving generation was policy-compliant
The Theoretical Foundation
For those interested in the formal aspects, we've published a preprint:
"Proving Non-Generation: Cryptographic Completeness Guarantees for AI Content Moderation Logs"
The paper formalizes the completeness invariant, analyzes attack vectors (selective omission, log forking, timestamp manipulation), and proves security properties under standard cryptographic assumptions.
Key theoretical results:
- Completeness Detection: Any violation of the completeness invariant is detectable in O(n) time.
- Omission Resistance: Selective log omission requires breaking the hash chain, which is computationally infeasible under SHA-256 collision resistance.
- Temporal Ordering: UUIDv7 timestamps combined with external anchoring prevent backdating attacks.
Getting Started
Everything is open source under CC BY 4.0:
- Specification & Schemas: github.com/veritaschain/cap-spec
- Reference Implementation: github.com/veritaschain/cap-safe-refusal-provenance
- Academic Paper: doi.org/10.5281/zenodo.18213616
To integrate CAP-SRP into your AI system:
- Replace your existing logging with the CAP event model
- Implement the hash chain for tamper-evidence
- Add signature generation for authenticity
- (Optional) Set up external anchoring for additional assurance
- Expose an Evidence Pack endpoint for auditors
The specification is intentionally minimal—we're defining the data model and verification procedures, not mandating specific implementations.
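As one hypothetical shape for that last step, a read-only endpoint could serve the signed events plus the current chain head so auditors can re-verify everything offline. FastAPI and the response fields here are illustrative assumptions layered on the CAPLogger from the implementation example, not the spec's Evidence Pack schema:
from fastapi import FastAPI
from nacl.encoding import HexEncoder

app = FastAPI()

@app.get("/evidence-pack")
def evidence_pack():
    # Hand auditors the public key, the chain head, and the signed events
    return {
        "publicKey": signing_key.verify_key.encode(HexEncoder).decode(),
        "headHash": logger.prev_hash,
        "events": [
            {
                "eventId": e.event_id,
                "eventType": e.event_type.value,
                "timestamp": e.timestamp,
                "attemptRef": e.attempt_ref,
                "promptHash": e.prompt_hash,
                "refusalReason": e.refusal_reason,
                "prevHash": e.prev_hash,
                "signature": e.signature.hex(),
            }
            for e in logger.events
        ],
    }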
The Bigger Picture
The Grok crisis wasn't really about Grok. It was about the gap between "trust me" and "verify me" in AI governance.
Right now, when an AI company says "we have safeguards," we have to take their word for it. When they say "we blocked millions of harmful requests," there's no way to verify it. When regulators investigate, they're dependent on data provided by the platform being investigated.
CAP-SRP doesn't solve AI safety. It doesn't prevent misuse. What it does is create the infrastructure for accountability—the ability to prove, cryptographically and to third parties, what your AI system actually did.
As AI systems become more capable and more integrated into critical infrastructure, this kind of verifiable accountability isn't optional. It's table stakes.
The era of "Trust Me" is over. It's time to build systems that can say "Verify Me."
About the Author
This article was written by the VeritasChain Standards Organization (VSO), an independent international standards body developing open specifications for algorithmic accountability. VSO is headquartered in Tokyo and maintains strict vendor neutrality.
- Website: veritaschain.org
- GitHub: github.com/veritaschain
- Contact: info@veritaschain.org
Have questions or feedback? Drop a comment below or open an issue on GitHub. We're actively seeking implementers and contributors.