VeritasChain Standards Organization (VSO)

Posted on Jan 18

Why Content Provenance Needs Deletion Detection: Introducing CPP v1.0

#cryptography #security #opensource #webdev

The Problem Nobody's Talking About

You've probably heard about C2PA (Coalition for Content Provenance and Authenticity). Adobe, Microsoft, BBC, and 300+ organizations are backing it. It's becoming an industry standard.

But there's a problem.

C2PA cannot detect deleted events.

Think about that for a moment. An attacker captures 10 photos, deletes 3 compromising ones, and publishes the remaining 7 with valid C2PA signatures. The verification passes. The audit trail looks complete.

It's not.

This isn't a bug—it's a fundamental architectural limitation. And it's not the only one.

The Self-Attestation Trap

Here's how C2PA works:

Creator signs content → "Trust me" → Verifier accepts

The creator is signing their own claims. There's no independent third-party verification required. A $289/year certificate from any CA, combined with malicious intent, produces "verified" misinformation that looks completely legitimate.

The verification checkmark tells users the signature is cryptographically valid. Users interpret this as "the content is true."

It's not the same thing.

We've essentially created a system for producing cryptographically valid lies.

Introducing CPP: A Different Approach

Today, VeritasChain Standards Organization releases CPP v1.0 (Capture Provenance Profile)—an open specification that addresses these fundamental gaps.

CPP doesn't replace C2PA. It complements it.

Question	C2PA	CPP
"How was this edited?"	✅ Yes	❌ Not the focus
"Was this actually captured?"	⚠️ Partial	✅ Yes
"Are any events missing?"	❌ No	✅ Yes
"Independent timestamp?"	⚠️ Optional	✅ Required

The philosophy shift is fundamental:

"Verify, Don't Trust"

The Completeness Invariant: Mathematical Deletion Detection

This is CPP's core innovation.

Every collection of capture events includes a Completeness Invariant—a mathematical property that makes deletion detectable.

How It Works

When you seal a collection of events, CPP computes:

hash_sum = H(E₁) ⊕ H(E₂) ⊕ H(E₃) ⊕ ... ⊕ H(Eₙ)

Where:

H(Eᵢ) = SHA-256 hash of event i
⊕ = XOR operation

This hash_sum is stored in the SEAL event along with expected_count.

The Magic of XOR

XOR has beautiful properties for this use case:

Commutative and associative: Events can be combined in any order
Self-inverse: A ⊕ A = 0
Identity: A ⊕ 0 = A
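
To make these properties concrete, here is a small sketch that checks them on real SHA-256 digests. The helper names (`h`, `xor`) and the sample inputs are made up for illustration and are not part of the CPP specification.

```python
import hashlib

def h(data: bytes) -> bytes:
    """SHA-256 digest of arbitrary bytes."""
    return hashlib.sha256(data).digest()

def xor(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings."""
    if len(a) != len(b):
        raise ValueError("inputs must have equal length")
    return bytes(x ^ y for x, y in zip(a, b))

ZERO = bytes(32)
a, b, c = h(b"event-1"), h(b"event-2"), h(b"event-3")

commutative = xor(a, b) == xor(b, a)
associative = xor(xor(a, b), c) == xor(a, xor(b, c))
self_inverse = xor(a, a) == ZERO
identity = xor(a, ZERO) == a

# Dropping one term changes the sum unless that term's hash is all zeros
deletion_visible = xor(xor(a, b), c) != xor(a, c)

print(commutative, associative, self_inverse, identity, deletion_visible)
```

The self-inverse property cuts both ways: two identical hashes cancel each other out, which is one reason the SEAL stores expected_count next to hash_sum.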

Attack Scenario: Deletion

Original events: E1, E2, E3, E4
Stored hash_sum: H(E1) ⊕ H(E2) ⊕ H(E3) ⊕ H(E4)
Stored count: 4

Attacker deletes E3, re-links chain: E1 → E2 → E4

Verification:
  Computed hash_sum: H(E1) ⊕ H(E2) ⊕ H(E4)
  Stored hash_sum:   H(E1) ⊕ H(E2) ⊕ H(E3) ⊕ H(E4)

  Result: MISMATCH → COMPLETENESS VIOLATION DETECTED

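Even if the attacker perfectly re-links the hash chain, the Completeness Invariant catches the deletion, provided the SEAL event holding hash_sum and expected_count has not been rewritten as well. Preventing that is the job of the event signature and the external anchor described later in this post.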

Reference Implementation

import hashlib
import json

def compute_event_hash(event: dict) -> bytes:
    """Hash event without signature field."""
    event_copy = {k: v for k, v in event.items() if k != 'signature'}
    canonical = json.dumps(event_copy, sort_keys=True, separators=(',', ':'))
    return hashlib.sha256(canonical.encode()).digest()

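def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings."""
    # zip() silently truncates to the shorter input, so reject mismatches
    if len(a) != len(b):
        raise ValueError("xor_bytes requires inputs of equal length")
    return bytes(x ^ y for x, y in zip(a, b))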

def compute_completeness_invariant(events: list) -> dict:
    """Compute Completeness Invariant for event list."""
    hash_sum = bytes(32)  # Start with zeros

    for event in events:
        event_hash = compute_event_hash(event)
        hash_sum = xor_bytes(hash_sum, event_hash)

    return {
        'expected_count': len(events),
        'hash_sum': f"sha256:{hash_sum.hex()}",
        # default=None avoids a ValueError when the event list is empty
        'first_timestamp': min((e['timestamp'] for e in events), default=None),
        'last_timestamp': max((e['timestamp'] for e in events), default=None)
    }

def verify_completeness(events: list, stored_ci: dict) -> tuple:
    """Verify Completeness Invariant."""
    # Check count
    if len(events) != stored_ci['expected_count']:
        return False, f"Count mismatch: {len(events)} vs {stored_ci['expected_count']}"

    # Compute and compare hash_sum
    computed = compute_completeness_invariant(events)
    if computed['hash_sum'] != stored_ci['hash_sum']:
        return False, "Hash sum mismatch - events may be missing or modified"

    return True, "Completeness verified"
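
Here is a compact, self-contained walk-through of the same idea, so you can see which attacks each check catches. It restates the hashing and XOR logic in shortened form; the event fields and the `check` helper are illustrative assumptions, not spec-defined names.

```python
import hashlib
import json

def event_hash(event: dict) -> bytes:
    """SHA-256 over canonical JSON, excluding the signature field."""
    body = {k: v for k, v in event.items() if k != "signature"}
    canonical = json.dumps(body, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode()).digest()

def hash_sum(events: list) -> str:
    """XOR of all event hashes, formatted like the SEAL field."""
    acc = 0
    for e in events:
        acc ^= int.from_bytes(event_hash(e), "big")
    return "sha256:" + acc.to_bytes(32, "big").hex()

def check(events: list, seal: dict) -> str:
    if len(events) != seal["expected_count"]:
        return "count mismatch"
    if hash_sum(events) != seal["hash_sum"]:
        return "hash_sum mismatch"
    return "ok"

events = [{"event_id": f"e{i}", "sequence_number": i} for i in range(1, 5)]
seal = {"expected_count": len(events), "hash_sum": hash_sum(events)}

intact = check(events, seal)
deleted = check(events[:2] + events[3:], seal)  # E3 removed
forged = {"event_id": "forged", "sequence_number": 4}
swapped = check(events[:3] + [forged], seal)    # E4 replaced
reordered = check(list(reversed(events)), seal)

# A duplicated pair cancels in the XOR, so only the count exposes it
padded = events + [events[0], events[0]]
pair_hash_unchanged = hash_sum(padded) == seal["hash_sum"]
padded_result = check(padded, seal)

print(intact, deleted, swapped, reordered, pair_hash_unchanged, padded_result)
```

Two things stand out. Reordering passes this check, because XOR ignores order; catching it is left to the hash chain. And a duplicated pair leaves hash_sum untouched, so the count comparison is doing real work rather than being a convenience.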

External Timestamps: No More Self-Attestation

CPP requires RFC 3161 TSA (Time Stamp Authority) anchoring for Silver and Gold conformance levels.

Creator signs → TSA countersigns → INDEPENDENT VERIFICATION

The timestamp comes from an external third party, so it shows the Merkle root existed no later than the stamped time. The creator cannot forge or backdate it.

Free TSA Options

You don't need expensive certificates:

FreeTSA.org - Free, public TSA
DigiStamp - Free tier available
Self-hosted - Run your own TSA

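import requests

# Fixed DER framing of an RFC 3161 TimeStampReq (version 1, SHA-256, no nonce):
# the 32-byte digest goes between the prefix and the certReq=TRUE suffix.
_TSQ_PREFIX = bytes.fromhex("30390201013031300d060960864801650304020105000420")
_TSQ_SUFFIX = bytes.fromhex("0101ff")

def build_timestamp_request(digest: bytes) -> bytes:
    """Build a minimal TimeStampReq for a SHA-256 digest (simplified)."""
    if len(digest) != 32:
        raise ValueError("expected a 32-byte SHA-256 digest")
    return _TSQ_PREFIX + digest + _TSQ_SUFFIX

def anchor_to_tsa(merkle_root: bytes, tsa_url="https://freetsa.org/tsr"):
    """Get RFC 3161 timestamp for Merkle root."""
    # Assumption: the 32-byte root is used directly as the message imprint
    ts_request = build_timestamp_request(merkle_root)

    response = requests.post(
        tsa_url,
        data=ts_request,
        headers={'Content-Type': 'application/timestamp-query'},
        timeout=30,
    )
    response.raise_for_status()  # Fail loudly on HTTP errors

    return response.content  # DER-encoded timestamp response; validate before storing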

Privacy by Design

Unlike approaches that require identity disclosure, CPP is privacy-first:

Location: OFF by Default

{
  "location": null  // Default - no location collected
}

Users must explicitly opt-in:

{
  "location": {
    "precision": "CITY",
    "latitude": 35.68,
    "longitude": 139.76,
    "consent": {
      "granted_at": "2026-01-18T10:00:00.000Z",
      "scope": "THIS_COLLECTION"
    }
  }
}
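
As a rough sketch of how a client might enforce this default, the function below returns a null location unless a consent scope is passed in. The function name, the rounding rule, and the `PRECISION_DECIMALS` mapping are assumptions for illustration; only the "CITY" label and the field names come from the example above.

```python
from datetime import datetime, timezone

# Hypothetical mapping from precision label to decimal places kept
PRECISION_DECIMALS = {"CITY": 2}

def build_location(lat, lon, consent_scope=None, precision="CITY", now=None):
    """Return the location field; null unless consent was explicitly given."""
    if consent_scope is None:
        return {"location": None}

    decimals = PRECISION_DECIMALS[precision]
    granted = now or datetime.now(timezone.utc)
    granted_at = granted.isoformat(timespec="milliseconds").replace("+00:00", "Z")

    return {
        "location": {
            "precision": precision,
            # Coarsen coordinates before they ever enter the event
            "latitude": round(lat, decimals),
            "longitude": round(lon, decimals),
            "consent": {"granted_at": granted_at, "scope": consent_scope},
        }
    }

print(build_location(35.6762, 139.7603))
print(build_location(35.6762, 139.7603, consent_scope="THIS_COLLECTION"))
```

Coarsening at capture time, rather than at display time, means the precise coordinates are never written to the chain in the first place.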

Identity Modes

Anonymous: Device key only
Pseudonymous: User-controlled identifier
Identified: Verified identity (opt-in only)

Crypto-Shredding for GDPR

Need to delete content? CPP supports true deletion:

Content encrypted with unique key
DELETE event recorded in chain
Encryption key destroyed
Content becomes computationally infeasible to recover
Merkle proof retained (proves deletion happened)
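
The steps above can be sketched in a few lines. To stay within the standard library, this toy uses a SHAKE-256 keystream as a stand-in for a real authenticated cipher such as AES-GCM; do not use it to protect real data. The `ShreddableStore` class and the DELETE event fields are hypothetical.

```python
import hashlib
import secrets

def keystream_xor(key: bytes, nonce: bytes, data: bytes) -> bytes:
    """Toy stream cipher (stand-in for an AEAD): XOR with a SHAKE-256 keystream."""
    stream = hashlib.shake_256(key + nonce).digest(len(data))
    return bytes(d ^ s for d, s in zip(data, stream))

class ShreddableStore:
    def __init__(self):
        self._keys = {}   # content_id -> per-content key
        self.blobs = {}   # content_id -> (nonce, ciphertext)
        self.chain = []   # simplified event log

    def put(self, content_id: str, content: bytes) -> None:
        key, nonce = secrets.token_bytes(32), secrets.token_bytes(16)
        self._keys[content_id] = key
        self.blobs[content_id] = (nonce, keystream_xor(key, nonce, content))

    def get(self, content_id: str):
        key = self._keys.get(content_id)
        if key is None:
            return None  # Key destroyed: ciphertext can no longer be opened
        nonce, ciphertext = self.blobs[content_id]
        return keystream_xor(key, nonce, ciphertext)

    def shred(self, content_id: str) -> None:
        _, ciphertext = self.blobs[content_id]
        # Record the deletion first, then destroy the key
        self.chain.append({
            "event_type": "DELETE",
            "content_id": content_id,
            "ciphertext_hash": hashlib.sha256(ciphertext).hexdigest(),
        })
        del self._keys[content_id]

store = ShreddableStore()
store.put("photo-1", b"raw image bytes")
before = store.get("photo-1")
store.shred("photo-1")
after = store.get("photo-1")
print(before, after, store.chain[-1]["event_type"])
```

Note that `del` in Python does not wipe memory. A production system would keep keys in a hardware-backed keystore so that destroying them is enforceable.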

The ACE Extension: Zero-Knowledge Biometric Attestation

For high-assurance scenarios, CPP includes the Attested Capture Extension (ACE).

The principle:

"We prove authentication was attempted. We store ZERO biometric data."

What ACE Records

✅ Recorded	❌ NOT Recorded
Auth method (Face ID, etc.)	Facial geometry
Result (SUCCESS/FAILURE)	Fingerprint data
Duration (45ms)	Biometric templates
Device attestation	Raw sensor data

{
  "auth_attempt": {
    "method": "FACE_ID",
    "result": "SUCCESS",
    "duration_ms": 45,
    "timestamp": "2026-01-18T10:29:59.955Z"
  },
  "device_attestation": {
    "type": "APPLE_DEVICE_ATTESTATION",
    "secure_enclave_verified": true
  },
  "privacy_declaration": "NO_BIOMETRIC_DATA_STORED"
}

Three-Layer Architecture

CPP uses a three-layer integrity model:

┌─────────────────────────────────────────────────────────┐
│ Layer 3: External Verifiability                         │
│   RFC 3161 TSA / SCITT / Blockchain                     │
│   → Independent third-party timestamp                   │
├─────────────────────────────────────────────────────────┤
│ Layer 2: Collection Integrity                           │
│   RFC 6962 Merkle Tree + Completeness Invariant         │
│   → Deletion detection                                  │
├─────────────────────────────────────────────────────────┤
│ Layer 1: Event Integrity                                │
│   SHA-256 + Ed25519                                     │
│   → Individual event tamper-evidence                    │
└─────────────────────────────────────────────────────────┘

Layer 1: Event Integrity

Each event is individually signed:

{
  "cpp_version": "1.0",
  "event_id": "01932f5a-7b8c-7def-8abc-123456789012",
  "event_type": "CPP_CAPTURE",
  "timestamp": "2026-01-18T10:30:00.000Z",
  "device_id": "urn:uuid:550e8400-e29b-41d4-a716-446655440000",
  "sequence_number": 1,
  "prev_hash": "sha256:0000000000000000000000000000000000000000000000000000000000000000",
  "payload": {
    "media_hash": "sha256:e3b0c44298fc1c149afbf4c8996fb924...",
    "media_type": "image/heic",
    "capture_device": {
      "manufacturer": "Apple",
      "model": "iPhone 16 Pro"
    }
  },
  "signature": {
    "algorithm": "Ed25519",
    "public_key": "base64:MCowBQYDK2VwAyEA...",
    "value": "base64:xyz789..."
  }
}

Critical: NO EXCLUSION LISTS

Unlike some implementations, CPP signatures cover ALL fields except the signature itself, which cannot sign its own value. Any modification to a covered field invalidates the signature.

Layer 2: Collection Integrity

Events are linked via hash chain and aggregated into an RFC 6962 Merkle tree:

Event[0].prev_hash = "sha256:" + "0" * 64  // Genesis: all-zero hash, matching the Layer 1 example
Event[n].prev_hash = SHA-256(Event[n-1])

Merkle Tree:
        Root
       /    \
    H01      H23
   /   \    /   \
 H(E0) H(E1) H(E2) H(E3)

Plus the Completeness Invariant we discussed.
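
Here is a minimal sketch of both structures. It assumes the genesis prev_hash is the all-zero value shown in the Layer 1 example, that events are hashed as canonical JSON without the signature field, and that the Merkle leaves are the event hashes. The helper names are illustrative; the Merkle function follows the RFC 6962 definition (0x00 prefix for leaves, 0x01 for interior nodes).

```python
import hashlib
import json

GENESIS = "sha256:" + "0" * 64  # Assumed genesis prev_hash

def event_hash(event: dict) -> bytes:
    body = {k: v for k, v in event.items() if k != "signature"}
    canonical = json.dumps(body, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode()).digest()

def link(events: list) -> list:
    """Return copies of the events with sequence_number and prev_hash set."""
    linked, prev = [], GENESIS
    for i, event in enumerate(events, start=1):
        event = {**event, "sequence_number": i, "prev_hash": prev}
        linked.append(event)
        prev = "sha256:" + event_hash(event).hex()
    return linked

def chain_is_valid(events: list) -> bool:
    prev = GENESIS
    for event in events:
        if event["prev_hash"] != prev:
            return False
        prev = "sha256:" + event_hash(event).hex()
    return True

def merkle_root(leaves: list) -> bytes:
    """RFC 6962 Merkle Tree Hash over a list of byte strings."""
    n = len(leaves)
    if n == 0:
        return hashlib.sha256(b"").digest()
    if n == 1:
        return hashlib.sha256(b"\x00" + leaves[0]).digest()
    k = 1
    while k * 2 < n:  # Largest power of two strictly less than n
        k *= 2
    left, right = merkle_root(leaves[:k]), merkle_root(leaves[k:])
    return hashlib.sha256(b"\x01" + left + right).digest()

chain = link([{"event_id": f"e{i}"} for i in range(4)])
root = merkle_root([event_hash(e) for e in chain])
tampered = chain[:2] + chain[3:]  # Drop one event without re-linking

print(chain_is_valid(chain), chain_is_valid(tampered), root.hex())
```

A naive deletion breaks the chain on its own. The Completeness Invariant matters for the harder case where the attacker rewrites prev_hash values after removing an event.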

Layer 3: External Verifiability

The Merkle root is anchored to RFC 3161 TSA:

{
  "external_anchor": {
    "type": "RFC3161",
    "tsa_url": "https://freetsa.org/tsr",
    "timestamp_token": "base64:...",
    "anchored_at": "2026-01-18T18:00:05.000Z"
  }
}

Verification URL: Surviving Metadata Stripping

Here's a reality: 95%+ of images on social media have metadata stripped.

CPP assumes this is the default state, not an exception.

Every CPP-protected capture has a permanent verification URL:

https://verify.veritaschain.org/cpp/CPP-2026-ABC123XYZ

Even if a platform strips all metadata:

User uploads image to verification service
Service computes perceptual hash (PHASH)
PHASH matches against database
Full verification pack retrieved
Verification proceeds normally
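
The lookup flow above can be illustrated with a simple average hash over an 8x8 grayscale grid. This post does not say which perceptual hash the service uses, so treat the algorithm, the distance threshold, and the `lookup` helper as assumptions chosen to keep the sketch dependency-free.

```python
def average_hash(pixels: list) -> int:
    """64-bit average hash of an 8x8 grayscale grid (8 rows of 8 ints)."""
    flat = [p for row in pixels for p in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for p in flat:
        bits = (bits << 1) | (1 if p > mean else 0)
    return bits

def hamming(a: int, b: int) -> int:
    """Number of differing bits between two hashes."""
    return bin(a ^ b).count("1")

def lookup(query_hash: int, index: dict, max_distance: int = 5):
    """Return the verification ID of the closest match, or None."""
    best = min(index.items(), key=lambda kv: hamming(query_hash, kv[1]), default=None)
    if best is not None and hamming(query_hash, best[1]) <= max_distance:
        return best[0]
    return None

# Synthetic images: a gradient, a slightly brightened copy, and its inverse
original = [[r * 32 + c * 4 for c in range(8)] for r in range(8)]
reposted = [[min(255, p + 3) for p in row] for row in original]
unrelated = [[255 - p for p in row] for row in original]

index = {"CPP-2026-ABC123XYZ": average_hash(original)}

print(lookup(average_hash(reposted), index))
print(lookup(average_hash(unrelated), index))
```

A real service would downscale and grayscale the uploaded image first, and would likely use a more robust hash with a nearest-neighbor index instead of a linear scan.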

Conformance Levels

CPP defines three tiers:

Level	TSA Anchor	ACE	Target Audience
Bronze	Optional	Optional	Hobbyists, personal use
Silver	Daily minimum	Optional	Families, prosumers
Gold	Per-capture	Required	Legal evidence, journalism

Choose the level appropriate for your use case.
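
One way to read the table is as a function from your setup to the highest tier it can claim. The policy labels (`none`, `daily`, `per_capture`) and the function name below are hypothetical, and the sketch assumes per-capture anchoring also satisfies the Silver "daily minimum".

```python
# Hypothetical policy labels, ordered from weakest to strongest anchoring
ANCHOR_RANK = {"none": 0, "daily": 1, "per_capture": 2}

def highest_level(anchor_policy: str, ace_enabled: bool) -> str:
    """Highest conformance tier a configuration qualifies for."""
    rank = ANCHOR_RANK[anchor_policy]
    if rank >= ANCHOR_RANK["per_capture"] and ace_enabled:
        return "Gold"    # Per-capture TSA anchor and ACE required
    if rank >= ANCHOR_RANK["daily"]:
        return "Silver"  # At least daily anchoring, ACE optional
    return "Bronze"      # Anchoring and ACE both optional

for policy, ace in [("none", False), ("daily", True), ("per_capture", False), ("per_capture", True)]:
    print(policy, ace, highest_level(policy, ace))
```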

UI Guidelines: "Provenance Available" Not "Verified"

This matters more than you think.

C2PA's use of "Verified" has created confusion. Users see a checkmark and assume the content is true.

CPP explicitly avoids this:

✅ Use	❌ Avoid
"Provenance Available"	"Verified"
"Capture Recorded"	"Authenticated"
ℹ️ Information icon	✓ Checkmark

Required disclosure:

"This provenance data shows when and how this media was captured. It does NOT verify that the content is true or that the source is trustworthy."

C2PA Interoperability

CPP exports to C2PA format:

{
  "c2pa.actions": [
    {
      "action": "c2pa.captured",
      "parameters": {
        "vso.cpp.version": "1.0",
        "vso.cpp.verification_url": "https://verify.veritaschain.org/cpp/...",
        "vso.cpp.merkle_root": "sha256:...",
        "vso.cpp.completeness_valid": true,
        "vso.cpp.external_anchor_type": "RFC3161"
      }
    }
  ]
}

Use both systems together:

Internal: Full CPP chain with deletion detection
External: C2PA manifest for broad compatibility
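
A minimal sketch of the export step, mirroring the JSON above. The function name and arguments are assumptions, and the output is only the actions fragment: producing a signed C2PA manifest requires a C2PA toolchain that is out of scope here.

```python
import json

def to_c2pa_actions(verification_url: str, merkle_root: str,
                    completeness_valid: bool,
                    anchor_type: str = "RFC3161",
                    cpp_version: str = "1.0") -> dict:
    """Build the c2pa.actions fragment that carries CPP fields."""
    return {
        "c2pa.actions": [
            {
                "action": "c2pa.captured",
                "parameters": {
                    "vso.cpp.version": cpp_version,
                    "vso.cpp.verification_url": verification_url,
                    "vso.cpp.merkle_root": merkle_root,
                    "vso.cpp.completeness_valid": completeness_valid,
                    "vso.cpp.external_anchor_type": anchor_type,
                },
            }
        ]
    }

fragment = to_c2pa_actions(
    verification_url="https://verify.veritaschain.org/cpp/CPP-2026-ABC123XYZ",
    merkle_root="sha256:" + "00" * 32,  # Placeholder root for the example
    completeness_valid=True,
)
print(json.dumps(fragment, indent=2))
```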

What CPP Does NOT Guarantee

Let's be clear about limitations:

Not Guaranteed	Why
Content is true	CPP proves capture timing, not factual accuracy
Source is trustworthy	Identity binding only, not reputation
Scene wasn't staged	CPP proves when, not authenticity of scene

Provenance is necessary but not sufficient for trust. It's one layer in a larger system.

Getting Started

1. Read the Specification

📄 CPP-Specification-v1.0.md

2. Explore the Schemas

git clone https://github.com/veritaschain/cpp-spec.git
cd cpp-spec/schemas/cpp

event.json - Base event structure
capture-payload.json - Capture event payload
seal-payload.json - SEAL with Completeness Invariant
verification-pack.json - Complete verification bundle

3. Run the Reference Implementation

cd cpp-spec/tools
python completeness_invariant.py

4. Check Test Vectors

cd cpp-spec/test-vectors/completeness
cat completeness-invariant-tests.json

The Bigger Picture: VAP Framework

CPP is part of the VAP (Verifiable AI Provenance) Framework—a cross-domain approach to AI accountability.

Profile	Domain	Focus
VCP	Finance & Trading	Algorithmic trading audit trails
CAP	Content / Creative	AI content generation
CPP	Consumer / Media	Capture provenance
DVP	Automotive	Autonomous vehicle decisions
MAP	Medical	Clinical AI recommendations

Same principles. Domain-specific implementations.

Why This Matters

We're at an inflection point.

AI-generated content is becoming indistinguishable from captured content. Deepfakes are getting better. Trust in media is declining.

The response shouldn't be "trust us" badges on content. It should be verifiable evidence chains that anyone can independently check.

Provenance isn't about trust. It's about verification.

CPP is one step toward that future—an open standard, freely available, designed to complement existing work while addressing critical gaps.

What's Next?

We're looking for:

Implementers: Build CPP into your applications
Feedback: Open issues on GitHub
Research: Academic analysis welcome
Standards bodies: Let's align approaches

The specification is stable. The conversation is just beginning.

What do you think? Is mathematical deletion detection the missing piece in content authenticity? Drop a comment below.

Tokachi Kamimura

Founder & Technical Director

VeritasChain Standards Organization

LinkedIn | GitHub | standards@veritaschain.org
