Why Content Provenance Needs Deletion Detection: Introducing CPP v1.0

The Problem Nobody's Talking About

You've probably heard about C2PA (Coalition for Content Provenance and Authenticity). Adobe, Microsoft, BBC, and 300+ organizations are backing it. It's becoming an industry standard.

But there's a problem.

C2PA cannot detect deleted events.

Think about that for a moment. An attacker captures 10 photos, deletes 3 compromising ones, and publishes the remaining 7 with valid C2PA signatures. The verification passes. The audit trail looks complete.

It's not.

This isn't a bug—it's a fundamental architectural limitation. And it's not the only one.


The Self-Attestation Trap

Here's how C2PA works:

Creator signs content → "Trust me" → Verifier accepts

The creator is signing their own claims. There's no independent third-party verification required. A $289/year certificate from any CA, combined with malicious intent, produces "verified" misinformation that looks completely legitimate.

The verification checkmark tells users the signature is cryptographically valid. Users interpret this as "the content is true."

It's not the same thing.

We've essentially created a system for producing cryptographically valid lies.


Introducing CPP: A Different Approach

Today, VeritasChain Standards Organization releases CPP v1.0 (Capture Provenance Profile)—an open specification that addresses these fundamental gaps.

CPP doesn't replace C2PA. It complements it.

| Question | C2PA | CPP |
|---|---|---|
| "How was this edited?" | ✅ Yes | ❌ Not the focus |
| "Was this actually captured?" | ⚠️ Partial | ✅ Yes |
| "Are any events missing?" | ❌ No | ✅ Yes |
| "Independent timestamp?" | ⚠️ Optional | ✅ Required |

The philosophy shift is fundamental:

"Verify, Don't Trust"


The Completeness Invariant: Mathematical Deletion Detection

This is CPP's core innovation.

Every collection of capture events includes a Completeness Invariant—a mathematical property that makes deletion detectable.

How It Works

When you seal a collection of events, CPP computes:

hash_sum = H(E₁) ⊕ H(E₂) ⊕ H(E₃) ⊕ ... ⊕ H(Eₙ)

Where:

  • H(Eᵢ) = SHA-256 hash of event i
  • ⊕ = XOR operation

This hash_sum is stored in the SEAL event along with expected_count.

The Magic of XOR

XOR has beautiful properties for this use case:

  • Commutative: Order doesn't matter for the sum
  • Self-inverse: A ⊕ A = 0
  • Identity: A ⊕ 0 = A
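
Here's a minimal sketch that checks those three properties on real SHA-256 digests. The byte strings `b"E1"`, `b"E2"`, `b"E3"` are placeholder events, and the helper names `h` and `xor` are made up for this example. The last check shows a consequence of the self-inverse property: a duplicated pair cancels out of the sum, which is presumably one reason the SEAL stores expected_count alongside hash_sum.

```python
import hashlib

def h(data: bytes) -> bytes:
    """SHA-256 digest of raw bytes (stand-in for H(E_i))."""
    return hashlib.sha256(data).digest()

def xor(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length digests."""
    return bytes(x ^ y for x, y in zip(a, b))

ZERO = bytes(32)
a, b, c = h(b"E1"), h(b"E2"), h(b"E3")

# Commutative (and associative): any evaluation order gives the same sum.
order_independent = xor(xor(a, b), c) == xor(c, xor(b, a))
# Self-inverse: a digest XORed with itself cancels to zero.
self_inverse = xor(a, a) == ZERO
# Identity: XOR with zero leaves the digest unchanged.
identity = xor(a, ZERO) == a
# Consequence of self-inverse: a duplicated pair leaves the sum unchanged,
# so the sum alone would miss it, while a count check would not.
pair_cancels = xor(xor(a, b), xor(c, c)) == xor(a, b)

print(order_independent, self_inverse, identity, pair_cancels)
# prints: True True True True
```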

Attack Scenario: Deletion

Original events: E1, E2, E3, E4
Stored hash_sum: H(E1) ⊕ H(E2) ⊕ H(E3) ⊕ H(E4)
Stored count: 4

Attacker deletes E3, re-links chain: E1 → E2 → E4

Verification:
  Computed hash_sum: H(E1) ⊕ H(E2) ⊕ H(E4)
  Stored hash_sum:   H(E1) ⊕ H(E2) ⊕ H(E3) ⊕ H(E4)

  Result: MISMATCH → COMPLETENESS VIOLATION DETECTED

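Even if the attacker perfectly re-links the hash chain, the Completeness Invariant catches the deletion, provided the attacker cannot also rewrite the hash_sum and expected_count stored in the SEAL event.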

Reference Implementation

import hashlib
import json

def compute_event_hash(event: dict) -> bytes:
    """Hash event without signature field."""
    event_copy = {k: v for k, v in event.items() if k != 'signature'}
    canonical = json.dumps(event_copy, sort_keys=True, separators=(',', ':'))
    return hashlib.sha256(canonical.encode()).digest()

def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two byte arrays."""
    return bytes(x ^ y for x, y in zip(a, b))

def compute_completeness_invariant(events: list) -> dict:
    """Compute Completeness Invariant for event list."""
    hash_sum = bytes(32)  # Start with zeros

    for event in events:
        event_hash = compute_event_hash(event)
        hash_sum = xor_bytes(hash_sum, event_hash)

    return {
        'expected_count': len(events),
        'hash_sum': f"sha256:{hash_sum.hex()}",
        # default=None avoids a ValueError when the event list is empty
        'first_timestamp': min((e['timestamp'] for e in events), default=None),
        'last_timestamp': max((e['timestamp'] for e in events), default=None)
    }

def verify_completeness(events: list, stored_ci: dict) -> tuple:
    """Verify Completeness Invariant."""
    # Check count
    if len(events) != stored_ci['expected_count']:
        return False, f"Count mismatch: {len(events)} vs {stored_ci['expected_count']}"

    # Compute and compare hash_sum
    computed = compute_completeness_invariant(events)
    if computed['hash_sum'] != stored_ci['hash_sum']:
        return False, "Hash sum mismatch - events may be missing or modified"

    return True, "Completeness verified"
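
To see the attack scenario end to end, here is a self-contained sketch that seals four events, deletes E3, re-links the chain, and then runs the checks. It is an illustration under assumptions, not the spec's test harness: the helper names (`link`, `chain_is_valid`, `hash_sum`) are hypothetical, the events carry only a few fields, and the genesis value is assumed to be the all-zero hash shown in the Layer 1 event example.

```python
import hashlib
import json

GENESIS = "sha256:" + "0" * 64  # assumed genesis prev_hash (all zeros)

def event_hash(event: dict) -> bytes:
    """SHA-256 over canonical JSON, excluding the signature field."""
    body = {k: v for k, v in event.items() if k != "signature"}
    canonical = json.dumps(body, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode()).digest()

def link(events: list) -> list:
    """(Re)build sequence numbers and prev_hash links over a list of events."""
    linked, prev = [], GENESIS
    for seq, event in enumerate(events, start=1):
        e = {**event, "sequence_number": seq, "prev_hash": prev}
        linked.append(e)
        prev = "sha256:" + event_hash(e).hex()
    return linked

def chain_is_valid(events: list) -> bool:
    """Check that every prev_hash matches the hash of the preceding event."""
    prev = GENESIS
    for e in events:
        if e["prev_hash"] != prev:
            return False
        prev = "sha256:" + event_hash(e).hex()
    return True

def hash_sum(events: list) -> str:
    """XOR of all event hashes, formatted like the SEAL's hash_sum."""
    acc = bytes(32)
    for e in events:
        acc = bytes(x ^ y for x, y in zip(acc, event_hash(e)))
    return "sha256:" + acc.hex()

# Seal four captures E1..E4.
original = link([
    {"event_id": f"E{i}", "timestamp": f"2026-01-18T10:30:0{i}.000Z"}
    for i in range(1, 5)
])
seal = {"expected_count": len(original), "hash_sum": hash_sum(original)}

# Attacker deletes E3 and re-links the remaining events.
tampered = link([e for e in original if e["event_id"] != "E3"])

chain_ok = chain_is_valid(tampered)                  # re-linked chain looks fine
count_ok = len(tampered) == seal["expected_count"]   # 3 vs 4
sum_ok = hash_sum(tampered) == seal["hash_sum"]      # XOR no longer matches

print(chain_ok, count_ok, sum_ok)
# prints: True False False
```

The re-linked chain passes on its own, which is exactly why the count and hash_sum stored in the SEAL are checked separately.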

External Timestamps: No More Self-Attestation

CPP requires RFC 3161 TSA (Time Stamp Authority) anchoring for Silver and Gold conformance levels.

Creator signs → TSA countersigns → INDEPENDENT VERIFICATION

The timestamp comes from an external third party. The creator cannot forge or backdate it.

Free TSA Options

You don't need expensive certificates:

  • FreeTSA.org - Free, public TSA
  • DigiStamp - Free tier available
  • Self-hosted - Run your own TSA
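
import requests

def anchor_to_tsa(merkle_root: bytes, tsa_url: str = "https://freetsa.org/tsr") -> bytes:
    """Get an RFC 3161 timestamp response for the Merkle root."""
    # build_timestamp_request is a placeholder for a helper that DER-encodes
    # an RFC 3161 TimeStampReq; it is not defined in this snippet.
    ts_request = build_timestamp_request(merkle_root)

    response = requests.post(
        tsa_url,
        data=ts_request,
        headers={'Content-Type': 'application/timestamp-query'},
        timeout=30,  # Don't hang forever on an unresponsive TSA
    )
    response.raise_for_status()  # Fail loudly on HTTP errors

    return response.content  # DER-encoded TimeStampResp containing the token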
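
The snippet above leaves build_timestamp_request undefined. Here is one way it could look: a hand-rolled DER encoding of a minimal RFC 3161 TimeStampReq (version 1, SHA-256 message imprint, optional certReq flag, no nonce or policy). This is a sketch under assumptions, not the CPP reference code: it treats the 32-byte Merkle root as the already-hashed message, and in practice you may prefer an ASN.1 library over fixed byte strings.

```python
# DER encoding of AlgorithmIdentifier { OID 2.16.840.1.101.3.4.2.1 (SHA-256), NULL }
SHA256_ALG_ID = bytes.fromhex("300d06096086480165030402010500")

def build_timestamp_request(merkle_root: bytes, cert_req: bool = True) -> bytes:
    """Build a minimal DER-encoded RFC 3161 TimeStampReq for a SHA-256 digest."""
    if len(merkle_root) != 32:
        raise ValueError("expected a 32-byte SHA-256 digest")

    version = bytes.fromhex("020101")             # INTEGER 1
    hashed_message = b"\x04\x20" + merkle_root    # OCTET STRING, 32 bytes

    # MessageImprint ::= SEQUENCE { hashAlgorithm, hashedMessage }
    imprint_body = SHA256_ALG_ID + hashed_message
    message_imprint = b"\x30" + bytes([len(imprint_body)]) + imprint_body

    # certReq BOOLEAN TRUE asks the TSA to include its certificate.
    cert_flag = bytes.fromhex("0101ff") if cert_req else b""

    # TimeStampReq ::= SEQUENCE { version, messageImprint, certReq }
    # All lengths here are below 128, so single-byte DER lengths suffice.
    body = version + message_imprint + cert_flag
    return b"\x30" + bytes([len(body)]) + body

request = build_timestamp_request(bytes(32))
print(len(request), request[:5].hex())
# prints: 59 3039020101
```

Leaving out the nonce keeps the encoding fixed-length; adding one would protect against replayed responses.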

Privacy by Design

Unlike approaches that require identity disclosure, CPP is privacy-first:

Location: OFF by Default

{
  "location": null  // Default - no location collected
}

Users must explicitly opt in:

{
  "location": {
    "precision": "CITY",
    "latitude": 35.68,
    "longitude": 139.76,
    "consent": {
      "granted_at": "2026-01-18T10:00:00.000Z",
      "scope": "THIS_COLLECTION"
    }
  }
}

Identity Modes

  • Anonymous: Device key only
  • Pseudonymous: User-controlled identifier
  • Identified: Verified identity (opt-in only)

Crypto-Shredding for GDPR

Need to delete content? CPP supports true deletion:

  1. Content encrypted with unique key
  2. DELETE event recorded in chain
  3. Encryption key destroyed
  4. Content mathematically unrecoverable
  5. Merkle proof retained (proves deletion happened)
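
The five steps above can be sketched in a few lines. This is a toy model, not the CPP implementation: `ShreddableStore` and its event shapes are hypothetical, and a one-time pad from `secrets` stands in for a real authenticated cipher such as AES-GCM so the example stays stdlib-only.

```python
import hashlib
import secrets
from typing import Optional

class ShreddableStore:
    """Toy content store illustrating crypto-shredding (not production crypto)."""

    def __init__(self) -> None:
        self._ciphertexts: dict = {}
        self._keys: dict = {}
        self.chain: list = []  # append-only event log (hypothetical shape)

    def put(self, content_id: str, content: bytes) -> None:
        # Step 1: encrypt with a key unique to this content.
        # One-time pad used as a stand-in for a real AEAD cipher.
        key = secrets.token_bytes(len(content))
        self._keys[content_id] = key
        self._ciphertexts[content_id] = bytes(c ^ k for c, k in zip(content, key))
        self.chain.append({
            "event_type": "CPP_CAPTURE",
            "content_id": content_id,
            "media_hash": "sha256:" + hashlib.sha256(content).hexdigest(),
        })

    def get(self, content_id: str) -> Optional[bytes]:
        key = self._keys.get(content_id)
        if key is None:
            return None  # Step 4: without the key, only ciphertext remains
        return bytes(c ^ k for c, k in zip(self._ciphertexts[content_id], key))

    def shred(self, content_id: str) -> None:
        # Step 2: record the deletion in the chain before destroying the key.
        self.chain.append({"event_type": "DELETE", "content_id": content_id})
        # Step 3: destroy the key. `del` only drops the reference; a real
        # system would erase the key inside a keystore or HSM.
        del self._keys[content_id]

store = ShreddableStore()
store.put("photo-1", b"raw image bytes")
before = store.get("photo-1")
store.shred("photo-1")
after = store.get("photo-1")

# Step 5: the event log survives, so the deletion itself stays provable.
print(before, after, [e["event_type"] for e in store.chain])
# prints: b'raw image bytes' None ['CPP_CAPTURE', 'DELETE']
```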

The ACE Extension: Zero-Knowledge Biometric Attestation

For high-assurance scenarios, CPP includes the Attested Capture Extension (ACE).

The principle:

"We prove authentication was attempted. We store ZERO biometric data."

What ACE Records

| ✅ Recorded | ❌ NOT Recorded |
|---|---|
| Auth method (Face ID, etc.) | Facial geometry |
| Result (SUCCESS/FAILURE) | Fingerprint data |
| Duration (45ms) | Biometric templates |
| Device attestation | Raw sensor data |

{
  "auth_attempt": {
    "method": "FACE_ID",
    "result": "SUCCESS",
    "duration_ms": 45,
    "timestamp": "2026-01-18T10:29:59.955Z"
  },
  "device_attestation": {
    "type": "APPLE_DEVICE_ATTESTATION",
    "secure_enclave_verified": true
  },
  "privacy_declaration": "NO_BIOMETRIC_DATA_STORED"
}

Three-Layer Architecture

CPP uses a three-layer integrity model:

┌─────────────────────────────────────────────────────────┐
│ Layer 3: External Verifiability                         │
│   RFC 3161 TSA / SCITT / Blockchain                     │
│   → Independent third-party timestamp                   │
├─────────────────────────────────────────────────────────┤
│ Layer 2: Collection Integrity                           │
│   RFC 6962 Merkle Tree + Completeness Invariant         │
│   → Deletion detection                                  │
├─────────────────────────────────────────────────────────┤
│ Layer 1: Event Integrity                                │
│   SHA-256 + Ed25519                                     │
│   → Individual event tamper-evidence                    │
└─────────────────────────────────────────────────────────┘

Layer 1: Event Integrity

Each event is individually signed:

{
  "cpp_version": "1.0",
  "event_id": "01932f5a-7b8c-7def-8abc-123456789012",
  "event_type": "CPP_CAPTURE",
  "timestamp": "2026-01-18T10:30:00.000Z",
  "device_id": "urn:uuid:550e8400-e29b-41d4-a716-446655440000",
  "sequence_number": 1,
  "prev_hash": "sha256:0000000000000000000000000000000000000000000000000000000000000000",
  "payload": {
    "media_hash": "sha256:e3b0c44298fc1c149afbf4c8996fb924...",
    "media_type": "image/heic",
    "capture_device": {
      "manufacturer": "Apple",
      "model": "iPhone 16 Pro"
    }
  },
  "signature": {
    "algorithm": "Ed25519",
    "public_key": "base64:MCowBQYDK2VwAyEA...",
    "value": "base64:xyz789..."
  }
}

Critical: NO EXCLUSION LISTS

Unlike some implementations, CPP signatures cover ALL fields except the signature itself. Any modification to a signed field invalidates the signature.

Layer 2: Collection Integrity

Events are linked via hash chain and aggregated into an RFC 6962 Merkle tree:

Event[0].prev_hash = "sha256:" + "0" * 64  // Genesis: all-zero hash, as in the Layer 1 example
Event[n].prev_hash = SHA-256(Event[n-1])

Merkle Tree:
        Root
       /    \
    H01      H23
   /   \    /   \
 H(E0) H(E1) H(E2) H(E3)

Plus the Completeness Invariant we discussed.

Layer 3: External Verifiability

The Merkle root is anchored to RFC 3161 TSA:

{
  "external_anchor": {
    "type": "RFC3161",
    "tsa_url": "https://freetsa.org/tsr",
    "timestamp_token": "base64:...",
    "anchored_at": "2026-01-18T18:00:05.000Z"
  }
}

Verification URL: Surviving Metadata Stripping

Here's a reality: 95%+ of images on social media have metadata stripped.

CPP assumes this is the default state, not an exception.

Every CPP-protected capture has a permanent verification URL:

https://verify.veritaschain.org/cpp/CPP-2026-ABC123XYZ

Even if a platform strips all metadata:

  1. User uploads image to verification service
  2. Service computes perceptual hash (PHASH)
  3. PHASH matches against database
  4. Full verification pack retrieved
  5. Verification proceeds normally
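
Steps 2 and 3 can be sketched without any image libraries. Note the swap: this uses a simple average hash over an 8x8 grayscale grid as a stand-in for a DCT-based pHash, and the index layout, `max_distance` threshold, and function names are assumptions for illustration only.

```python
from typing import Optional

def average_hash(pixels: list) -> int:
    """64-bit average hash of an 8x8 grayscale grid (values 0..255)."""
    flat = [p for row in pixels for p in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for p in flat:
        bits = (bits << 1) | int(p > mean)  # 1 if brighter than average
    return bits

def hamming(a: int, b: int) -> int:
    """Number of differing bits between two hashes."""
    return bin(a ^ b).count("1")

def lookup(query: int, index: dict, max_distance: int = 5) -> Optional[str]:
    """Return the closest capture ID within max_distance bits, else None."""
    best_id, best_dist = None, max_distance + 1
    for capture_id, stored in index.items():
        dist = hamming(query, stored)
        if dist < best_dist:
            best_id, best_dist = capture_id, dist
    return best_id

# Original capture: a smooth gradient, already downscaled to 8x8.
original = [[(r * 8 + c) * 4 for c in range(8)] for r in range(8)]
index = {"CPP-2026-ABC123XYZ": average_hash(original)}

# A platform re-encodes the image slightly brighter and strips metadata.
reencoded = [[min(255, p + 3) for p in row] for row in original]
# An unrelated image: a checkerboard.
unrelated = [[255 * ((r + c) % 2) for c in range(8)] for r in range(8)]

print(lookup(average_hash(reencoded), index))
print(lookup(average_hash(unrelated), index))
# prints: CPP-2026-ABC123XYZ
# prints: None
```

A production service would downscale and grayscale the upload first and use an index that supports nearest-neighbor search rather than a linear scan.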

Conformance Levels

CPP defines three tiers:

| Level | TSA Anchor | ACE | Target Audience |
|---|---|---|---|
| Bronze | Optional | Optional | Hobbyists, personal use |
| Silver | Daily minimum | Optional | Families, prosumers |
| Gold | Per-capture | Required | Legal evidence, journalism |

Choose the level appropriate for your use case.
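
As a rough illustration, the table can be turned into a small classifier. This is one reading of the tiers, not normative logic from the spec: it interprets "Daily minimum" as no gap longer than 24 hours between TSA anchors, and the function and parameter names are hypothetical.

```python
from datetime import timedelta
from typing import Optional

def conformance_level(
    longest_anchor_gap: Optional[timedelta],
    anchored_per_capture: bool,
    ace_present: bool,
) -> str:
    """Classify a collection as Bronze, Silver, or Gold (assumed reading)."""
    # Gold: every capture is TSA-anchored and ACE data is present.
    if anchored_per_capture and ace_present:
        return "Gold"
    # Silver: anchored at least daily (per-capture anchoring also qualifies).
    anchored_daily = (
        longest_anchor_gap is not None
        and longest_anchor_gap <= timedelta(days=1)
    )
    if anchored_per_capture or anchored_daily:
        return "Silver"
    # Bronze: TSA anchoring and ACE are both optional.
    return "Bronze"

print(conformance_level(None, False, False))
print(conformance_level(timedelta(hours=20), False, False))
print(conformance_level(None, True, True))
# prints: Bronze
# prints: Silver
# prints: Gold
```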


UI Guidelines: "Provenance Available" Not "Verified"

This matters more than you think.

C2PA's use of "Verified" has created confusion. Users see a checkmark and assume the content is true.

CPP explicitly avoids this:

| ✅ Use | ❌ Avoid |
|---|---|
| "Provenance Available" | "Verified" |
| "Capture Recorded" | "Authenticated" |
| ℹ️ Information icon | ✓ Checkmark |

Required disclosure:

"This provenance data shows when and how this media was captured. It does NOT verify that the content is true or that the source is trustworthy."


C2PA Interoperability

CPP exports to C2PA format:

{
  "c2pa.actions": [
    {
      "action": "c2pa.captured",
      "parameters": {
        "vso.cpp.version": "1.0",
        "vso.cpp.verification_url": "https://verify.veritaschain.org/cpp/...",
        "vso.cpp.merkle_root": "sha256:...",
        "vso.cpp.completeness_valid": true,
        "vso.cpp.external_anchor_type": "RFC3161"
      }
    }
  ]
}
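
A small helper can assemble that structure from a verification result. The function name and argument list below are hypothetical; only the key names come from the example above, and the Merkle root value is a placeholder.

```python
import json

def to_c2pa_actions(
    verification_url: str,
    merkle_root: str,
    completeness_valid: bool,
    anchor_type: str = "RFC3161",
    cpp_version: str = "1.0",
) -> dict:
    """Build the c2pa.actions structure carrying CPP parameters."""
    return {
        "c2pa.actions": [
            {
                "action": "c2pa.captured",
                "parameters": {
                    "vso.cpp.version": cpp_version,
                    "vso.cpp.verification_url": verification_url,
                    "vso.cpp.merkle_root": merkle_root,
                    "vso.cpp.completeness_valid": completeness_valid,
                    "vso.cpp.external_anchor_type": anchor_type,
                },
            }
        ]
    }

manifest = to_c2pa_actions(
    verification_url="https://verify.veritaschain.org/cpp/CPP-2026-ABC123XYZ",
    merkle_root="sha256:" + "ab" * 32,  # placeholder root, not a real value
    completeness_valid=True,
)
print(json.dumps(manifest, indent=2))
```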

Use both systems together:

  • Internal: Full CPP chain with deletion detection
  • External: C2PA manifest for broad compatibility

What CPP Does NOT Guarantee

Let's be clear about limitations:

| Not Guaranteed | Why |
|---|---|
| Content is true | CPP proves capture timing, not factual accuracy |
| Source is trustworthy | Identity binding only, not reputation |
| Scene wasn't staged | CPP proves when, not authenticity of scene |

Provenance is necessary but not sufficient for trust. It's one layer in a larger system.


Getting Started

1. Read the Specification

📄 CPP-Specification-v1.0.md

2. Explore the Schemas

git clone https://github.com/veritaschain/cpp-spec.git
cd cpp-spec/schemas/cpp

  • event.json - Base event structure
  • capture-payload.json - Capture event payload
  • seal-payload.json - SEAL with Completeness Invariant
  • verification-pack.json - Complete verification bundle

3. Run the Reference Implementation

cd cpp-spec/tools
python completeness_invariant.py

4. Check Test Vectors

cd cpp-spec/test-vectors/completeness
cat completeness-invariant-tests.json

The Bigger Picture: VAP Framework

CPP is part of the VAP (Verifiable AI Provenance) Framework—a cross-domain approach to AI accountability.

| Profile | Domain | Focus |
|---|---|---|
| VCP | Finance & Trading | Algorithmic trading audit trails |
| CAP | Content / Creative | AI content generation |
| CPP | Consumer / Media | Capture provenance |
| DVP | Automotive | Autonomous vehicle decisions |
| MAP | Medical | Clinical AI recommendations |

Same principles. Domain-specific implementations.


Why This Matters

We're at an inflection point.

AI-generated content is becoming indistinguishable from captured content. Deepfakes are getting better. Trust in media is declining.

The response shouldn't be "trust us" badges on content. It should be verifiable evidence chains that anyone can independently check.

Provenance isn't about trust. It's about verification.

CPP is one step toward that future—an open standard, freely available, designed to complement existing work while addressing critical gaps.


Links

License: CC BY 4.0 (specification), Apache 2.0 (code)


What's Next?

We're looking for:

  • Implementers: Build CPP into your applications
  • Feedback: Open issues on GitHub
  • Research: Academic analysis welcome
  • Standards bodies: Let's align approaches

The specification is stable. The conversation is just beginning.


What do you think? Is mathematical deletion detection the missing piece in content authenticity? Drop a comment below.


Tokachi Kamimura

Founder & Technical Director

VeritasChain Standards Organization

LinkedIn | GitHub | standards@veritaschain.org
