The Problem: AI Content Without Receipts
You ship an AI-powered feature. Three months later, your legal team forwards an email:
"Your generated character design infringes our client's copyright. Produce all training data and generation logs within 14 days."
You check your logs. They show... nothing useful. Timestamps, maybe. Model names. But nothing that proves what went in or what came out of your AI pipeline.
This isn't hypothetical. In 2024-2025, we've seen:
- Getty Images v. Stability AI (ongoing)
- Major game studios hit with IP claims over AI-assisted art
- The EU AI Act mandating "logging capabilities" (Article 12)
The question isn't if you'll need provenance records. It's when.
Enter CAP: Content AI Profile
CAP is a domain profile of the Verifiable AI Provenance (VAP) Framework, designed specifically for content creation workflows. Think of it as a flight recorder for your AI pipeline.
```
┌──────────────────────────────────────────────────────┐
│                    VAP Framework                     │
│  ┌─────────┐  ┌─────────┐  ┌─────────┐  ┌─────────┐  │
│  │   VCP   │  │   CAP   │  │   DVP   │  │   MAP   │  │
│  │ Finance │  │ Content │  │  Auto   │  │ Medical │  │
│  └─────────┘  └─────────┘  └─────────┘  └─────────┘  │
└──────────────────────────────────────────────────────┘
```
CAP doesn't block AI usage or judge content. It records what happened so you can prove it later.
Core Concepts: 4 Events, 1 Chain
CAP tracks four event types across any AI content workflow:
| Event | What It Captures |
|---|---|
| INGEST | Asset enters the pipeline (training data, reference images) |
| TRAIN | Model training/fine-tuning occurs |
| GEN | Content generation happens |
| EXPORT | Asset leaves the pipeline (delivery, publication) |
All events are linked via a hash chain, making tampering detectable.
```
INGEST₁ → INGEST₂ → TRAIN₁ → GEN₁ → GEN₂ → EXPORT₁
   │         │         │       │      │       │
   └─────────┴─────────┴───────┴──────┴───────┘
                   Hash Chain
```
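Before diving into the full implementation, here is a minimal, self-contained sketch of the linking idea (illustrative names, not the CAP schema): each event's hash commits to its predecessor's hash, so altering any earlier event changes every hash downstream.

```python
import hashlib
import json

def event_hash(payload: dict, prev_hash: str) -> str:
    # Each event commits to its predecessor's hash, forming the chain
    record = dict(payload, prev_hash=prev_hash)
    canonical = json.dumps(record, sort_keys=True, separators=(',', ':'))
    return hashlib.sha256(canonical.encode()).hexdigest()

genesis = "0" * 64
h1 = event_hash({"event": "INGEST", "asset": "img-001"}, genesis)
h2 = event_hash({"event": "GEN", "asset": "hero-001"}, h1)

# Changing any earlier event changes every hash after it
h1_alt = event_hash({"event": "INGEST", "asset": "img-002"}, genesis)
assert event_hash({"event": "GEN", "asset": "hero-001"}, h1_alt) != h2
```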
Let's Build It: Minimal Implementation
Here's a working Python implementation you can drop into your pipeline today.
Step 1: Define the Event Schema
```python
from dataclasses import dataclass, field, asdict
from datetime import datetime, timezone
from enum import Enum
from typing import List, Optional
import hashlib
import json
import uuid


class EventType(str, Enum):
    INGEST = "INGEST"
    TRAIN = "TRAIN"
    GEN = "GEN"
    EXPORT = "EXPORT"


class RightsBasis(str, Enum):
    OWNED = "OWNED"
    LICENSED = "LICENSED"
    PUBLIC_DOMAIN = "PUBLIC_DOMAIN"
    CREATIVE_COMMONS = "CREATIVE_COMMONS"
    FAIR_USE = "FAIR_USE"
    UNKNOWN = "UNKNOWN"


class ConfidentialityLevel(str, Enum):
    PUBLIC = "PUBLIC"
    INTERNAL = "INTERNAL"
    CONFIDENTIAL = "CONFIDENTIAL"
    SECRET = "SECRET"
    PRE_RELEASE = "PRE_RELEASE"


@dataclass
class CAPEvent:
    event_type: EventType
    timestamp: str = field(default_factory=lambda: datetime.now(timezone.utc).isoformat())
    event_id: str = field(default_factory=lambda: str(uuid.uuid4()))
    chain_id: str = ""
    prev_hash: str = ""
    # Asset identification
    asset_id: Optional[str] = None
    asset_type: Optional[str] = None
    asset_hash: Optional[str] = None
    # Rights and consent
    rights_basis: Optional[RightsBasis] = None
    confidentiality_level: Optional[ConfidentialityLevel] = None
    # Context
    user_id: Optional[str] = None
    role: Optional[str] = None
    model_id: Optional[str] = None
    # Event-specific fields
    input_asset_ids: Optional[List[str]] = None
    output_asset_id: Optional[str] = None
    destination: Optional[str] = None

    def to_canonical_json(self) -> str:
        """Deterministic JSON for hashing: drop None fields, sort keys,
        compact separators (close in spirit to RFC 8785, though not a
        full implementation of it)."""
        data = {k: v for k, v in asdict(self).items() if v is not None}
        return json.dumps(data, sort_keys=True, separators=(',', ':'))

    def compute_hash(self) -> str:
        """SHA-256 hash of the canonical representation"""
        return hashlib.sha256(self.to_canonical_json().encode()).hexdigest()
```
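A quick property check of the canonicalization (standalone, mirroring `to_canonical_json` rather than importing it): field order does not affect the hash, and `None` fields are dropped before hashing, so two logically identical events always hash the same.

```python
import hashlib
import json

def canonical_hash(fields: dict) -> str:
    # Mirrors CAPEvent.to_canonical_json: drop Nones, sort keys, compact output
    data = {k: v for k, v in fields.items() if v is not None}
    canonical = json.dumps(data, sort_keys=True, separators=(',', ':'))
    return hashlib.sha256(canonical.encode()).hexdigest()

a = canonical_hash({"event_type": "INGEST", "asset_id": "img-001", "model_id": None})
b = canonical_hash({"asset_id": "img-001", "event_type": "INGEST"})
assert a == b  # same logical event, same hash
```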
Step 2: Build the Hash Chain
```python
class CAPChain:
    def __init__(self, chain_id: Optional[str] = None):
        self.chain_id = chain_id or str(uuid.uuid4())
        self.events: List[CAPEvent] = []
        self.current_hash = "0" * 64  # Genesis hash

    def append(self, event: CAPEvent) -> CAPEvent:
        """Add event to chain with hash linking"""
        event.chain_id = self.chain_id
        event.prev_hash = self.current_hash
        self.current_hash = event.compute_hash()
        self.events.append(event)
        return event

    def verify(self) -> bool:
        """Verify chain integrity, including the head hash"""
        prev_hash = "0" * 64
        for event in self.events:
            if event.prev_hash != prev_hash:
                return False
            prev_hash = event.compute_hash()
        # Also check the head, so tampering with the last event is caught
        return prev_hash == self.current_hash

    def to_evidence_pack(self) -> dict:
        """Export as an Evidence Pack"""
        return {
            "manifest": {
                "chain_id": self.chain_id,
                "created": datetime.now(timezone.utc).isoformat(),
                "chain_length": len(self.events),
                "head_hash": self.current_hash
            },
            "events": [asdict(e) for e in self.events]
        }
```
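To see tamper detection in action without the full classes, this standalone sketch replays the same append/verify logic with plain dicts: editing a recorded event invalidates every subsequent link.

```python
import hashlib
import json

GENESIS = "0" * 64

def record_hash(event: dict) -> str:
    return hashlib.sha256(
        json.dumps(event, sort_keys=True, separators=(',', ':')).encode()
    ).hexdigest()

def append(chain: list, payload: dict) -> None:
    # Link each new event to the hash of the previous one
    prev = record_hash(chain[-1]) if chain else GENESIS
    chain.append(dict(payload, prev_hash=prev))

def verify(chain: list) -> bool:
    prev = GENESIS
    for event in chain:
        if event["prev_hash"] != prev:
            return False
        prev = record_hash(event)
    return True

chain = []
for stage in ("INGEST", "TRAIN", "GEN", "EXPORT"):
    append(chain, {"event": stage})
assert verify(chain)

chain[1]["event"] = "GEN"  # tamper with a mid-chain event
assert not verify(chain)   # the next event's prev_hash no longer matches
```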
Step 3: Integrate with Your Pipeline
```python
from pathlib import Path

def hash_file(filepath: str) -> str:
    """Compute SHA-256 of file contents"""
    sha256 = hashlib.sha256()
    with open(filepath, 'rb') as f:
        for chunk in iter(lambda: f.read(8192), b''):
            sha256.update(chunk)
    return sha256.hexdigest()

# Initialize chain for a project
chain = CAPChain()

# === INGEST: Log training data intake ===
for image_path in training_images:  # your own list of training image paths
    event = CAPEvent(
        event_type=EventType.INGEST,
        asset_id=f"training-{Path(image_path).stem}",
        asset_type="IMAGE",
        asset_hash=hash_file(image_path),
        rights_basis=RightsBasis.OWNED,
        confidentiality_level=ConfidentialityLevel.INTERNAL,
        user_id="artist-001",
        role="CREATOR"
    )
    chain.append(event)

# === TRAIN: Log fine-tuning ===
train_event = CAPEvent(
    event_type=EventType.TRAIN,
    model_id="sd-xl-lora-v1",
    input_asset_ids=[e.asset_id for e in chain.events
                     if e.event_type == EventType.INGEST],
    user_id="ml-engineer-001",
    role="ENGINEER"
)
chain.append(train_event)

# === GEN: Log generation ===
gen_event = CAPEvent(
    event_type=EventType.GEN,
    model_id="sd-xl-lora-v1",
    output_asset_id="generated-hero-001",
    asset_hash=hash_file("output/hero_001.png"),
    user_id="artist-001",
    role="CREATOR"
)
chain.append(gen_event)

# === EXPORT: Log delivery ===
export_event = CAPEvent(
    event_type=EventType.EXPORT,
    asset_id="generated-hero-001",
    destination="publisher-review",
    user_id="manager-001",
    role="MANAGER"
)
chain.append(export_event)

# Verify and export
assert chain.verify(), "Chain integrity compromised!"
evidence = chain.to_evidence_pack()
```
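Once the chain verifies, you will typically persist the Evidence Pack to durable storage. A minimal sketch (the file name, layout, and digest step are illustrative choices, not mandated by CAP):

```python
import hashlib
import json
import tempfile
from pathlib import Path

# Illustrative pack; in practice this comes from chain.to_evidence_pack()
evidence = {
    "manifest": {
        "chain_id": "demo-chain",
        "chain_length": 4,
        "head_hash": "ab" * 32,
    },
    "events": [],
}

out = Path(tempfile.mkdtemp()) / "evidence_pack.json"
out.write_text(json.dumps(evidence, indent=2, sort_keys=True))

# Record a digest of the pack itself, so the exported file can be
# checked for bit-level integrity later
pack_digest = hashlib.sha256(out.read_bytes()).hexdigest()
reloaded = json.loads(out.read_text())
assert reloaded["manifest"]["head_hash"] == evidence["manifest"]["head_hash"]
```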
The Killer Feature: Negative Proof
Here's what makes CAP different from regular logging:
CAP enables not only proof of use, but also negative proof — the ability to demonstrate that specific assets were NOT ingested, trained on, or referenced.
When someone claims "you trained on my art," you can:
- Export the Evidence Pack for the relevant time period
- Show the complete, hash-chained INGEST log
- Prove the absence of their asset in your pipeline
```python
def prove_non_ingestion(chain: CAPChain, disputed_asset_hash: str) -> dict:
    """Generate negative proof report"""
    ingest_events = [e for e in chain.events if e.event_type == EventType.INGEST]
    all_hashes = {e.asset_hash for e in ingest_events}
    return {
        "disputed_asset_hash": disputed_asset_hash,
        "found_in_chain": disputed_asset_hash in all_hashes,
        "chain_coverage": {
            "start": ingest_events[0].timestamp if ingest_events else None,
            "end": ingest_events[-1].timestamp if ingest_events else None,
            "total_assets": len(ingest_events)
        },
        "chain_integrity": chain.verify(),
        "chain_head_hash": chain.current_hash
    }

# Usage
report = prove_non_ingestion(chain, "abc123...disputed_hash...")
# Returns proof that the disputed asset was never ingested
```
This addresses the "devil's proof" problem: you can demonstrate a negative, to the extent that your logging covered the relevant period completely and the chain verifies.
Real-World Integration Patterns
Pattern 1: Sidecar Architecture
Don't modify your existing pipeline. Run CAP as a sidecar:
```
┌─────────────────────────────────────────────────┐
│                Existing Pipeline                │
│  ┌──────┐    ┌──────┐    ┌──────┐    ┌──────┐   │
│  │Ingest│───▶│Train │───▶│ Gen  │───▶│Export│   │
│  └──┬───┘    └──┬───┘    └──┬───┘    └──┬───┘   │
│     │           │           │           │       │
│     ▼           ▼           ▼           ▼       │
│  ┌───────────────────────────────────────────┐  │
│  │            CAP Sidecar Logger             │  │
│  │     (Listens to events, builds chain)     │  │
│  └───────────────────────────────────────────┘  │
└─────────────────────────────────────────────────┘
```
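The sidecar idea can be sketched with the standard library alone: pipeline stages push events onto a queue, and a background thread links them into a hash chain. This is a toy illustration (names and the in-memory chain are placeholders for a real logger):

```python
import hashlib
import json
import queue
import threading

events = queue.Queue()
chain = []           # list of (event_record, hash) pairs
head = "0" * 64      # genesis hash

def worker():
    """Consume pipeline events and link each one to the previous hash."""
    global head
    while True:
        payload = events.get()
        if payload is None:  # sentinel: pipeline finished
            break
        record = dict(payload, prev_hash=head)
        head = hashlib.sha256(
            json.dumps(record, sort_keys=True, separators=(',', ':')).encode()
        ).hexdigest()
        chain.append((record, head))

t = threading.Thread(target=worker)
t.start()
for stage in ("INGEST", "TRAIN", "GEN", "EXPORT"):
    events.put({"event": stage})   # the pipeline just fires events
events.put(None)                   # tell the worker to stop
t.join()

assert len(chain) == 4 and chain[0][0]["prev_hash"] == "0" * 64
```

Because the pipeline only enqueues events, the chain-building logic stays entirely outside the existing code path.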
Pattern 2: Webhook Integration
```python
import uuid
from flask import Flask, request, jsonify
# CAPChain, CAPEvent, and EventType are defined as above

app = Flask(__name__)
chains = {}  # In production, use persistent storage

@app.route('/cap/event', methods=['POST'])
def log_event():
    data = request.json
    chain_id = data.get('chain_id') or str(uuid.uuid4())
    if chain_id not in chains:
        chains[chain_id] = CAPChain(chain_id)
    event = CAPEvent(
        event_type=EventType(data['event_type']),
        asset_id=data.get('asset_id'),
        asset_hash=data.get('asset_hash'),
        user_id=data.get('user_id'),
        # ... other fields
    )
    chains[chain_id].append(event)
    return jsonify({
        "event_id": event.event_id,
        "chain_id": chain_id,
        "chain_length": len(chains[chain_id].events)
    })

@app.route('/cap/verify/<chain_id>', methods=['GET'])
def verify_chain(chain_id):
    if chain_id not in chains:
        return jsonify({"error": "Chain not found"}), 404
    chain = chains[chain_id]
    return jsonify({
        "chain_id": chain_id,
        "valid": chain.verify(),
        "length": len(chain.events),
        "head_hash": chain.current_hash
    })
```
Pattern 3: ComfyUI / Stable Diffusion Integration
```python
# comfyui_cap_node.py
import hashlib
import uuid
# CAPEvent, EventType, and RightsBasis are defined as above

class CAPLoggerNode:
    @classmethod
    def INPUT_TYPES(cls):
        return {
            "required": {
                "image": ("IMAGE",),
                "event_type": (["INGEST", "GEN", "EXPORT"],),
                "asset_id": ("STRING", {"default": ""}),
            },
            "optional": {
                "rights_basis": (["OWNED", "LICENSED", "UNKNOWN"],),
            }
        }

    RETURN_TYPES = ("IMAGE", "STRING")
    RETURN_NAMES = ("image", "event_id")
    FUNCTION = "log_event"
    CATEGORY = "CAP/Logging"

    def log_event(self, image, event_type, asset_id, rights_basis="UNKNOWN"):
        # Hash the image tensor's raw bytes to identify the asset
        image_bytes = image.cpu().numpy().tobytes()
        asset_hash = hashlib.sha256(image_bytes).hexdigest()
        # Log to CAP chain (via API or direct)
        event = CAPEvent(
            event_type=EventType(event_type),
            asset_id=asset_id or f"comfy-{uuid.uuid4().hex[:8]}",
            asset_hash=asset_hash,
            rights_basis=RightsBasis(rights_basis)
        )
        # ... append to chain
        return (image, event.event_id)
```
A Note on Similarity Scores
You might wonder: "Can CAP detect if output is similar to copyrighted work?"
No, and that's intentional.
CAP does not define similarity thresholds for legality or compliance. Similarity metrics MAY be used as investigative signals, but MUST NOT be treated as determinative evidence without provenance records.
Why? Because:
- High similarity ≠ infringement (independent creation exists)
- Low similarity ≠ clean (style transfer can obscure sources)
- Legal determinations require human judgment
CAP provides the evidence for those judgments. It doesn't make them.
What's Next?
CAP is part of the broader VAP (Verifiable AI Provenance) framework. The specification is open and available:
- CAP Specification: CAP Basic Specification v0.1
- CAP Homepage: veritaschain.org/vap/cap
- GitHub: github.com/veritaschain
- IETF Draft: draft-kamimura-scitt-vcp
The core principle:
"Verify, Don't Trust" — Every AI decision should leave a cryptographically verifiable trail.
TL;DR
- AI content needs audit trails — Legal claims are coming, regulations are here
- CAP tracks 4 events: INGEST → TRAIN → GEN → EXPORT
- Hash chains make tampering detectable
- Negative proof is the killer feature — Prove what you didn't use
- It's a sidecar — No pipeline modifications required
Drop the code above into your pipeline. Start logging. Future-you will thank present-you when that legal email arrives.
Have questions or want to contribute? Find us on GitHub or reach out at developers@veritaschain.org