The Problem No One Saw Coming
In late December 2025, something broke. Not in the usual way software breaks—with error messages and stack traces—but in a way that exposed a fundamental gap in how we think about AI accountability.
Grok, the generative AI integrated into X (formerly Twitter), became the center of an international crisis. Users discovered they could use the image editing feature to generate non-consensual intimate images (NCII) of real people. The UK's Internet Watch Foundation reported finding sexual images of children aged 11-13 on dark web forums. By January 2026, Indonesia and Malaysia had blocked access to the service, and the UK's Ofcom launched a formal investigation with the authority to impose fines up to 10% of global revenue.
But here's what really matters for us as developers: xAI couldn't prove their safeguards worked.
They could claim millions of harmful requests were blocked. They could point to their content policies. They could show internal dashboards. But they couldn't provide cryptographic proof that any specific request was actually refused. And neither could anyone else verify it.
This is the "negative proof" problem. And it's not unique to Grok—it's baked into how every major AI system handles content moderation today.
Why Internal Logs Aren't Enough
Let's be honest about what "logging" means in most AI systems:
# Typical AI content moderation logging
def generate_image(prompt: str) -> Image:
    if is_harmful(prompt):
        logger.info(f"Blocked harmful prompt: {hash(prompt)}")
        raise ContentPolicyViolation()
    return model.generate(prompt)
This looks reasonable. But consider what an external auditor or regulator actually sees:
- Logs can be modified. There's no cryptographic chain linking entries.
- Logs can be selectively deleted. Remove the embarrassing ones, keep the good ones.
- Logs can be fabricated. Add entries for refusals that never happened.
- Completeness is unverifiable. How do you prove nothing was omitted?
When Ofcom investigates X, they have to trust the data X provides. There's no independent verification mechanism. The platform marks its own homework.
This isn't a criticism of any specific company—it's a structural problem in how we've designed AI systems. We optimized for "what did we generate?" and completely ignored "what did we refuse to generate?"
Introducing CAP-SRP: Cryptographic Proof of Non-Generation
CAP-SRP (Content/Creative AI Profile – Safe Refusal Provenance) is an open specification that treats non-generation as a first-class, provable event.
The core insight is simple: if you want to prove you refused something, you need to record the refusal with the same rigor you'd record a financial transaction.
The Event Model
Every generation attempt produces exactly one of three outcomes:
GEN_ATTEMPT → GEN       (successful generation)
            → GEN_DENY  (refusal)
            → ERROR     (system failure)
This is enforced through a Completeness Invariant:
∑ GEN_ATTEMPT = ∑ GEN + ∑ GEN_DENY + ∑ ERROR
If this equation doesn't balance, the audit trail is invalid. Period.
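In code, the invariant is nothing more than a count over event types. Here is a minimal sketch on a toy event trail, using the event names above:
from collections import Counter

def completeness_holds(event_types: list[str]) -> bool:
    # GEN_ATTEMPT must equal GEN + GEN_DENY + ERROR
    counts = Counter(event_types)
    return counts["GEN_ATTEMPT"] == (
        counts["GEN"] + counts["GEN_DENY"] + counts["ERROR"]
    )

# Two attempts, each with exactly one outcome: balanced
assert completeness_holds(["GEN_ATTEMPT", "GEN", "GEN_ATTEMPT", "GEN_DENY"])
# An attempt with no recorded outcome: the trail is invalid
assert not completeness_holds(["GEN_ATTEMPT", "GEN_ATTEMPT", "GEN"])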
The Data Structure
Here's what a CAP-SRP refusal event looks like:
{
  "eventId": "01941a2b-3c4d-7e8f-9a0b-1c2d3e4f5a6b",
  "eventType": "GEN_DENY",
  "timestamp": "2026-01-13T14:32:17.847Z",
  "attemptRef": "01941a2b-3c4d-7e8f-9a0b-1c2d3e4f5a6a",
  "promptHash": "sha256:9f86d081884c7d659a2feaa0c55ad015a3bf4f1b2b0b822cd15d6c15b0f00a08",
  "refusalReason": "NCII_DETECTED",
  "confidenceScore": 0.97,
  "modelVersion": "grok-3.1-vision",
  "prevHash": "sha256:e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855",
  "signature": "ed25519:..."
}
Key points:
- promptHash: The harmful prompt itself isn't stored, only its SHA-256 hash. This enables verification without exposing the actual content.
- attemptRef: Links back to the original GEN_ATTEMPT event, enforcing the completeness invariant.
- prevHash: Creates a hash chain linking all events, making insertion or deletion detectable.
- signature: Ed25519 signature over the entire event, proving authenticity.
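For instance, a verifier holding the publisher's Ed25519 public key could check that last field with a library such as PyNaCl. The sketch below assumes the signature bytes have already been decoded and that the signed payload is the sorted-key JSON of the remaining fields; the exact canonicalization is whatever the spec defines:
import json
from nacl.signing import VerifyKey
from nacl.exceptions import BadSignatureError

def signature_is_valid(event: dict, signature: bytes, verify_key: VerifyKey) -> bool:
    # Recompute the canonical bytes that were signed (signature field excluded),
    # then let the library verify the Ed25519 signature over them
    payload = json.dumps(
        {k: v for k, v in event.items() if k != "signature"},
        sort_keys=True,
    ).encode()
    try:
        verify_key.verify(payload, signature)
        return True
    except BadSignatureError:
        return False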
Hash Chain Verification
Events are linked in a tamper-evident chain:
Event[0] → Event[1] → Event[2] → ... → Event[n]
   ↓          ↓          ↓                ↓
 hash₀   ←  hash₁   ←  hash₂   ←  ...  ← hashₙ
Each event includes the hash of the previous event. To verify integrity:
def verify_chain(events: List[Event]) -> bool:
    for i in range(1, len(events)):
        expected_prev = sha256(serialize(events[i-1]))
        if events[i].prev_hash != expected_prev:
            return False
    return True
If anyone modifies, inserts, or deletes an event, the chain breaks. This is the same principle behind blockchain, but without the consensus overhead—you're not trying to achieve distributed agreement, just tamper-evidence.
External Anchoring
For additional assurance, CAP-SRP supports anchoring Merkle roots to external timestamping services:
             Root Hash
            /         \
       Hash₀₁          Hash₂₃
       /    \          /    \
   H(E₀)   H(E₁)   H(E₂)   H(E₃)
By publishing the Merkle root to a public timestamping service (RFC 3161) or a transparency log, you can prove that a specific set of events existed at a specific time—without revealing the events themselves.
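Here is a minimal sketch of building such a root from a batch of event hashes, assuming simple pairwise SHA-256 hashing with the last node duplicated on odd levels; submitting the root to an RFC 3161 TSA or transparency log is not shown:
import hashlib

def merkle_root(leaf_hashes: list[bytes]) -> bytes:
    # Reduce the leaves level by level until a single root remains
    if not leaf_hashes:
        raise ValueError("empty batch")
    level = list(leaf_hashes)
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])  # duplicate the last node on odd levels
        level = [
            hashlib.sha256(level[i] + level[i + 1]).digest()
            for i in range(0, len(level), 2)
        ]
    return level[0]

# Example: a batch of four event hashes
leaves = [hashlib.sha256(f"event-{i}".encode()).digest() for i in range(4)]
print(merkle_root(leaves).hex())  # publish this value, not the events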
Implementation Example
Here's a minimal Python implementation of the core concepts:
import hashlib
import json
import uuid
from datetime import datetime, timezone
from typing import Optional
from dataclasses import dataclass, asdict
from enum import Enum

from nacl.signing import SigningKey, VerifyKey


class EventType(Enum):
    GEN_ATTEMPT = "GEN_ATTEMPT"
    GEN = "GEN"
    GEN_DENY = "GEN_DENY"
    ERROR = "ERROR"


@dataclass
class CAPEvent:
    event_id: str
    event_type: EventType
    timestamp: str
    attempt_ref: Optional[str]
    prompt_hash: Optional[str]
    prev_hash: str
    refusal_reason: Optional[str] = None
    signature: Optional[bytes] = None

    def serialize_for_signing(self) -> bytes:
        # Canonical sorted-key JSON of every field except the signature itself
        data = asdict(self)
        data['event_type'] = self.event_type.value
        del data['signature']
        return json.dumps(data, sort_keys=True).encode()

    def compute_hash(self) -> str:
        return f"sha256:{hashlib.sha256(self.serialize_for_signing()).hexdigest()}"


class CAPLogger:
    def __init__(self, signing_key: SigningKey):
        self.signing_key = signing_key
        self.events: list[CAPEvent] = []
        self.prev_hash = "sha256:" + "0" * 64  # Genesis hash

    def _create_event(
        self,
        event_type: EventType,
        attempt_ref: Optional[str] = None,
        prompt_hash: Optional[str] = None,
        refusal_reason: Optional[str] = None
    ) -> CAPEvent:
        event = CAPEvent(
            event_id=self._generate_uuid7(),
            event_type=event_type,
            timestamp=datetime.now(timezone.utc).isoformat(),
            attempt_ref=attempt_ref,
            prompt_hash=prompt_hash,
            prev_hash=self.prev_hash,
            refusal_reason=refusal_reason
        )
        # Sign the event
        signed = self.signing_key.sign(event.serialize_for_signing())
        event.signature = signed.signature
        # Update the chain head
        self.prev_hash = event.compute_hash()
        self.events.append(event)
        return event

    def log_attempt(self, prompt: str) -> CAPEvent:
        # Only the hash of the prompt is recorded, never the prompt itself
        prompt_hash = f"sha256:{hashlib.sha256(prompt.encode()).hexdigest()}"
        return self._create_event(
            EventType.GEN_ATTEMPT,
            prompt_hash=prompt_hash
        )

    def log_generation(self, attempt_event: CAPEvent) -> CAPEvent:
        return self._create_event(
            EventType.GEN,
            attempt_ref=attempt_event.event_id
        )

    def log_refusal(self, attempt_event: CAPEvent, reason: str) -> CAPEvent:
        return self._create_event(
            EventType.GEN_DENY,
            attempt_ref=attempt_event.event_id,
            refusal_reason=reason
        )

    def verify_completeness(self) -> bool:
        # Completeness invariant: every attempt has exactly one recorded outcome
        attempts = sum(1 for e in self.events if e.event_type == EventType.GEN_ATTEMPT)
        outcomes = sum(1 for e in self.events if e.event_type in
                       (EventType.GEN, EventType.GEN_DENY, EventType.ERROR))
        return attempts == outcomes

    def _generate_uuid7(self) -> str:
        return str(uuid.uuid4())  # Simplified; a real implementation uses UUIDv7


def is_harmful(prompt: str) -> bool:
    return False  # Placeholder for your existing policy classifier


# Usage
signing_key = SigningKey.generate()
logger = CAPLogger(signing_key)

# Log an attempt
prompt = "generate an image of..."
attempt = logger.log_attempt(prompt)

# Log the outcome
if is_harmful(prompt):
    refusal = logger.log_refusal(attempt, "POLICY_VIOLATION")
else:
    generation = logger.log_generation(attempt)

# Verify completeness invariant
assert logger.verify_completeness()
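An auditor holding the corresponding public key can then walk the chain produced above, checking both the hash links and the Ed25519 signatures. This is only a sketch against the CAPLogger above, not the spec's full verification procedure:
from nacl.exceptions import BadSignatureError

def audit(events: list[CAPEvent], verify_key: VerifyKey) -> bool:
    prev_hash = "sha256:" + "0" * 64  # chain must start at the genesis value
    for event in events:
        # Each event must reference the hash of the event before it
        if event.prev_hash != prev_hash:
            return False
        # ...and carry a valid signature over its canonical serialization
        try:
            verify_key.verify(event.serialize_for_signing(), event.signature)
        except BadSignatureError:
            return False
        prev_hash = event.compute_hash()
    return True

assert audit(logger.events, signing_key.verify_key)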
Privacy-Preserving Verification
One of the critical design goals is enabling verification without exposing harmful content. Here's how it works:
Scenario: A regulator wants to verify that a specific harmful prompt was refused.
- The regulator has the original prompt (from a complaint)
- The regulator computes hash = sha256(prompt)
- The regulator queries the CAP-SRP audit trail for events with a matching promptHash
- If a GEN_DENY event exists with that hash, the refusal is verified
- The regulator never has to hand the raw prompt to the platform, and the platform never has to disclose events unrelated to the complaint
def verify_refusal(audit_trail: list[CAPEvent], prompt: str) -> bool:
    target_hash = f"sha256:{hashlib.sha256(prompt.encode()).hexdigest()}"
    for event in audit_trail:
        if (event.event_type == EventType.GEN_DENY and
                event.prompt_hash == target_hash):
            return True
    return False
This is similar to how password verification works—you never store the password, just the hash. But here, we're applying it to content moderation at scale.
Regulatory Alignment
CAP-SRP isn't just a technical exercise—it's designed to meet real regulatory requirements:
EU AI Act Article 12 (Logging)
High-risk AI systems shall technically allow for the automatic recording of events ('logs') over the lifetime of the system.
CAP-SRP provides exactly this: automatic, tamper-evident recording of all generation events, including refusals.
EU Digital Services Act Article 35 (Audits)
Very Large Online Platforms (VLOPs) like X face annual independent audit obligations. CAP-SRP's Evidence Pack format provides auditors with:
- Cryptographically verifiable event chains
- Merkle proofs for selective disclosure
- Machine-readable schemas for automated analysis
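As a rough illustration of the selective-disclosure idea (the Evidence Pack wire format itself is defined by the spec), an auditor who holds only the anchored Merkle root and the hash of one disclosed event can check membership from a path of sibling hashes:
import hashlib

def verify_inclusion(leaf: bytes, path: list[tuple[bytes, str]], root: bytes) -> bool:
    # path: (sibling_hash, "left" or "right") pairs from the leaf up to the root
    node = leaf
    for sibling, side in path:
        if side == "left":
            node = hashlib.sha256(sibling + node).digest()
        else:
            node = hashlib.sha256(node + sibling).digest()
    return node == root
The platform discloses one event and its sibling path; everything else in the batch stays hidden behind the hashes.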
US TAKE IT DOWN Act
Platforms must remove NCII within 48 hours. CAP-SRP provides the evidence structure to prove compliance—not just "we removed it" but "here's the cryptographic proof we never generated it in the first place."
Integration with Existing Standards
CAP-SRP doesn't exist in isolation. It's designed to complement existing transparency standards:
IETF SCITT
The Supply Chain Integrity, Transparency, and Trust (SCITT) architecture provides a general-purpose transparency framework. CAP-SRP is being submitted as a domain-specific profile: draft-kamimura-scitt-refusal-events.
C2PA (Content Credentials)
C2PA answers "Is this content authentic?" CAP-SRP answers "Why did the AI make this decision?" They're complementary:
- C2PA: Passport for content (provenance)
- CAP-SRP: Flight recorder for AI systems (accountability)
A complete transparency stack might look like:
Generated Image
├── C2PA Manifest (content provenance)
│   └── Creator, modifications, AI tool used
└── CAP-SRP Reference (system accountability)
    └── Link to audit trail proving generation was policy-compliant
The Theoretical Foundation
For those interested in the formal aspects, we've published a preprint:
"Proving Non-Generation: Cryptographic Completeness Guarantees for AI Content Moderation Logs"
The paper formalizes the completeness invariant, analyzes attack vectors (selective omission, log forking, timestamp manipulation), and proves security properties under standard cryptographic assumptions.
Key theoretical results:
- Completeness Detection: Any violation of the completeness invariant is detectable in O(n) time.
- Omission Resistance: Selective log omission requires breaking the hash chain, which is computationally infeasible under SHA-256 collision resistance.
- Temporal Ordering: UUIDv7 timestamps combined with external anchoring prevent backdating attacks.
Getting Started
Everything is open source under CC BY 4.0:
- Specification & Schemas: github.com/veritaschain/cap-spec
- Reference Implementation: github.com/veritaschain/cap-safe-refusal-provenance
- Academic Paper: doi.org/10.5281/zenodo.18213616
To integrate CAP-SRP into your AI system:
- Replace your existing logging with the CAP event model
- Implement the hash chain for tamper-evidence
- Add signature generation for authenticity
- (Optional) Set up external anchoring for additional assurance
- Expose an Evidence Pack endpoint for auditors
The specification is intentionally minimal—we're defining the data model and verification procedures, not mandating specific implementations.
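As one hypothetical shape for that last step, a read-only endpoint could serve the signed events plus the current chain head so auditors can re-verify everything offline. FastAPI and the response fields here are illustrative assumptions layered on the CAPLogger from the implementation example, not the spec's Evidence Pack schema:
from fastapi import FastAPI
from nacl.encoding import HexEncoder

app = FastAPI()

@app.get("/evidence-pack")
def evidence_pack():
    # Hand auditors the public key, the chain head, and the signed events
    return {
        "publicKey": signing_key.verify_key.encode(HexEncoder).decode(),
        "headHash": logger.prev_hash,
        "events": [
            {
                "eventId": e.event_id,
                "eventType": e.event_type.value,
                "timestamp": e.timestamp,
                "attemptRef": e.attempt_ref,
                "promptHash": e.prompt_hash,
                "refusalReason": e.refusal_reason,
                "prevHash": e.prev_hash,
                "signature": e.signature.hex(),
            }
            for e in logger.events
        ],
    }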
The Bigger Picture
The Grok crisis wasn't really about Grok. It was about the gap between "trust me" and "verify me" in AI governance.
Right now, when an AI company says "we have safeguards," we have to take their word for it. When they say "we blocked millions of harmful requests," there's no way to verify it. When regulators investigate, they're dependent on data provided by the platform being investigated.
CAP-SRP doesn't solve AI safety. It doesn't prevent misuse. What it does is create the infrastructure for accountability—the ability to prove, cryptographically and to third parties, what your AI system actually did.
As AI systems become more capable and more integrated into critical infrastructure, this kind of verifiable accountability isn't optional. It's table stakes.
The era of "Trust Me" is over. It's time to build systems that can say "Verify Me."
About the Author
This article was written by the VeritasChain Standards Organization (VSO), an independent international standards body developing open specifications for algorithmic accountability. VSO is headquartered in Tokyo and maintains strict vendor neutrality.
- Website: veritaschain.org
- GitHub: github.com/veritaschain
- Contact: info@veritaschain.org
Have questions or feedback? Drop a comment below or open an issue on GitHub. We're actively seeking implementers and contributors.