TL;DR: We built a system that uses LiDAR depth analysis, moiré pattern detection, rolling shutter flicker analysis, and IMU-based human presence verification to detect when someone photographs a screen instead of a real scene — and we bind the result cryptographically with RFC 3161 timestamps. This is how we did it, why it matters for digital evidence, and the cross-platform challenges we solved along the way.
The Analog Hole Problem Nobody Talks About
Every content provenance system — C2PA, Content Credentials, you name it — has a fundamental vulnerability that's rarely discussed in technical circles: the analog hole.
Here's the attack: take a manipulated image, display it on a high-resolution monitor, then photograph that monitor with a "trusted" camera app. The resulting photo carries valid provenance credentials — cryptographic signatures, timestamps, the works — despite containing synthetic or manipulated content. The camera faithfully records what it sees, and what it sees is a screen displaying a lie.
```
┌────────────────────────────────────────────────┐
│   Deepfake / Manipulated Image                 │
│                   ↓                            │
│   Display on 4K Monitor                        │
│                   ↓                            │
│   Photograph with "Trusted" Camera App         │
│                   ↓                            │
│   ✅ Valid C2PA signature                      │
│   ✅ Valid RFC 3161 timestamp                  │
│   ✅ Valid GPS coordinates                     │
│   ❌ Content is NOT a real-world scene         │
└────────────────────────────────────────────────┘
```
This isn't a theoretical concern. In January 2025, researchers demonstrated that Nikon's C2PA implementation could be tricked into signing fake images with valid certificates — forcing Nikon to revoke all C2PA certificates and pause their authentication service. That attack didn't even require screen photography, but the analog hole makes this class of attack trivially easy.
We needed a way to detect this at capture time, not after the fact. And we needed it to work on consumer smartphones, not $3,000+ professional cameras.
This article walks through how we built that system for VeraSnap, our open-standard cryptographic evidence capture app.
Why Existing Approaches Fall Short
Before diving into our implementation, let's survey what's already out there — and why none of it solved our problem.
Sony Camera Authenticity Solution (PDAF-based)
Sony deploys 3D depth detection on their Alpha camera lineup (A1 II, A9 III, A7V). Their system uses Phase Detection AutoFocus (PDAF) pixel data from the imaging sensor to infer depth along a single optical axis. It works — but it requires cameras costing $2,500–$7,000, and the verification service is currently limited to select news organizations.
Key limitation: PDAF is a passive sensing technology. It infers depth from how light falls on split photodiodes during autofocus. It doesn't actively measure distance.
Serelay (Autofocus Focal Length Mapping)
Serelay's patented approach (US11012613B2) samples focal lengths at approximately 9 discrete points using the smartphone's standard autofocus mechanism. An SVM classifier distinguishes flat surfaces from 3D scenes based on whether all focus points converge to the same focal length.
Key limitation: 9 data points. That's it. And autofocus degrades significantly in low light. The patent is US-only, which is interesting for freedom-to-operate analysis.
Truepic (Software-based Image Analysis)
Truepic includes "picture of a picture detection" among their 35+ fraud tests, likely using moiré pattern detection, color distortion analysis, and edge artifact classification. No depth sensors involved.
Key limitation: Pure software analysis is inherently a cat-and-mouse game. As displays improve (higher PPI, better color accuracy, wider viewing angles), software-only detection gets harder.
The Gap
Nobody had built a system that:
- Uses dedicated depth sensors (not autofocus proxies)
- Runs on consumer smartphones (not professional cameras)
- Combines multiple detection modalities (not just one signal)
- Binds results cryptographically to an open standard
- Works cross-platform (iOS and Android)
That's what we set out to build.
Architecture Overview: Defense in Depth
Our screen detection system follows a defense-in-depth philosophy. No single technique is foolproof, so we layer multiple independent detection methods and fuse their results.
```
┌─────────────────────────────────────────────────────────┐
│               VeraSnap Capture Pipeline                 │
│                                                         │
│  ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌─────────┐     │
│  │  LiDAR   │ │  Moiré   │ │ Flicker  │ │   IMU   │     │
│  │  Depth   │ │ Pattern  │ │  Detect  │ │ Tremor  │     │
│  │ Analysis │ │ Analysis │ │ Analysis │ │ Analysis│     │
│  └────┬─────┘ └────┬─────┘ └────┬─────┘ └────┬────┘     │
│       │            │            │            │          │
│       ▼            ▼            ▼            ▼          │
│  ┌──────────────────────────────────────────────────┐   │
│  │          Weighted Score Fusion Engine            │   │
│  │                                                  │   │
│  │   Score = w1×Depth + w2×Moiré + w3×Flicker       │   │
│  │         + w4×(1 - TremorPresent)                 │   │
│  └──────────────────────┬───────────────────────────┘   │
│                         │                               │
│                         ▼                               │
│  ┌──────────────────────────────────────────────────┐   │
│  │  CPP v1.5 Event (SHA-256 → RFC 3161 Timestamp)   │   │
│  └──────────────────────────────────────────────────┘   │
└─────────────────────────────────────────────────────────┘
```
Each tier operates independently and has different strengths:
| Tier | Method | Accuracy | Hardware Needed | Lighting Dependency |
|---|---|---|---|---|
| 1 | LiDAR Depth Analysis | ~97% | LiDAR sensor | None (active IR) |
| 1 | Moiré Pattern (CNN) | 96–99% | Camera only | Moderate |
| 1 | Rolling Shutter Flicker | >95% | Camera only | Low |
| 2 | IMU Tremor Analysis | ~85% | Accelerometer | None |
| 2 | Ambient Light PWM | ~80% | Light sensor | N/A |
Let's dive deep into each one.
Tier 1: LiDAR Depth Uniformity Analysis
This is our flagship detection method — and the one that made VeraSnap the first consumer smartphone app to use dedicated LiDAR for screen detection in evidence capture.
The Core Insight
A real three-dimensional scene produces variable depth data across the frame. Objects exist at different distances — a person at 1.5m, a wall at 3m, furniture at 2m. A flat display produces uniform depth readings — every pixel on that 27" monitor is at essentially the same distance from the camera.
This is well-established in face biometric anti-spoofing (detecting printed photos held up to a face scanner), but nobody had applied it to general scene verification for evidentiary purposes.
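To make the insight concrete, here is a minimal, self-contained sketch (with simulated depth data and an illustrative `std_cap` threshold, not the production values) showing how depth standard deviation alone already separates a flat surface from a real scene:

```python
import random
import statistics

def depth_flatness_score(depths, std_cap=0.5):
    """Map depth standard deviation to a 0-1 flatness score.
    1.0 = perfectly flat (screen-like); 0.0 = highly varied (real scene).
    std_cap is an illustrative cutoff, not the production value."""
    std = statistics.pstdev(depths)
    return 1.0 - min(std / std_cap, 1.0)

random.seed(42)
# Simulated real scene: objects scattered between 0.5 m and 4 m
real_scene = [random.uniform(0.5, 4.0) for _ in range(1000)]
# Simulated monitor: every point ~1.2 m away, plus millimeter-level noise
flat_screen = [1.2 + random.gauss(0, 0.005) for _ in range(1000)]

print(depth_flatness_score(real_scene))   # near 0.0 -> not a screen
print(depth_flatness_score(flat_screen))  # near 1.0 -> screen-like
```

The production algorithm (below) adds plane analysis and edge detection on top of this basic statistic.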
iPhone LiDAR: The Hardware
iPhone Pro models (since iPhone 12 Pro) include a dedicated LiDAR scanner — a dToF (direct Time-of-Flight) sensor that emits infrared laser pulses and measures how long they take to bounce back. Key specs:
- Resolution: 256 × 192 = 49,152 depth points per frame
- Refresh rate: Up to 60 Hz
- Range: 0.2m to 5m
- Accuracy: ±1cm at close range
- Lighting: Works in complete darkness (active IR illumination)
Compare this to Serelay's 9 autofocus sample points. We have over 5,000× more data.
The Algorithm
Our screen detection algorithm computes four indicators from the LiDAR depth map, then combines them with weighted scoring:
```python
def is_likely_screen(analysis: DepthAnalysis) -> tuple[bool, float]:
    """
    Reference implementation — CPP v1.4 Depth Analysis Extension.
    Implementations MAY use different algorithms as long as
    the output format conforms to spec.
    """
    stats = analysis.statistics
    plane = analysis.plane_analysis

    # Criterion 1: Low depth variance → flat surface
    flatness_score = 1.0 - min(stats.std_deviation / 0.5, 1.0)

    # Criterion 2: Dominant plane covers most of frame
    plane_dominance = plane.dominant_plane_ratio

    # Criterion 3: Narrow depth range
    depth_uniformity = 1.0 - min(stats.depth_range / 2.0, 1.0)

    # Criterion 4: Sharp rectangular edges in depth discontinuities
    edge_sharpness = detect_rectangular_edges(analysis)

    # Weighted combination
    score = (
        flatness_score * 0.30 +
        plane_dominance * 0.25 +
        depth_uniformity * 0.25 +
        edge_sharpness * 0.20
    )

    is_screen = score > 0.70
    confidence = abs(score - 0.50) * 2  # 0.0 at boundary, 1.0 at extremes
    return is_screen, confidence
```
Calibration Data
We tested against real-world scenarios to establish thresholds:
| Scene Type | Typical StdDev | Typical PlaneRatio | Expected Verdict |
|---|---|---|---|
| Outdoor landscape | 5.0+ m | <0.20 | ✅ NOT screen |
| Indoor room | 1.0–3.0 m | 0.20–0.40 | ✅ NOT screen |
| Document on desk | 0.3–0.8 m | 0.30–0.50 | ✅ NOT screen |
| Person portrait | 0.5–1.5 m | 0.15–0.30 | ✅ NOT screen |
| Monitor display | <0.05 m | >0.85 | 🚩 LIKELY screen |
| Smartphone screen | <0.02 m | >0.90 | 🚩 LIKELY screen |
| Printed photo (flat) | 0.01–0.05 m | >0.80 | ⚠️ Possible false positive |
The printed photo case is the known limitation — we handle this by reporting confidence levels and recommending human review for high-stakes verification.
Reflectivity Analysis: The Secret Weapon
Beyond depth uniformity, LiDAR provides something no other method can: surface reflectivity data. LCD and OLED screens have characteristic infrared reflectivity patterns that differ from natural surfaces:
- Glass panels produce specular IR reflections at certain angles
- LCD polarizers interact distinctively with IR light
- OLED emitters show different IR return characteristics than printed surfaces
We detect ReflectivityAnomaly as a boolean indicator in our screen detection output. This alone doesn't trigger a screen classification, but combined with the other indicators, it significantly reduces false positives.
iOS Implementation (Swift)
```swift
import ARKit

class DepthAnalyzer {
    func analyzeDepthFrame(_ depthMap: CVPixelBuffer) -> DepthAnalysis {
        let width = CVPixelBufferGetWidth(depthMap)   // 256
        let height = CVPixelBufferGetHeight(depthMap) // 192

        CVPixelBufferLockBaseAddress(depthMap, .readOnly)
        defer { CVPixelBufferUnlockBaseAddress(depthMap, .readOnly) }

        let baseAddress = CVPixelBufferGetBaseAddress(depthMap)!
        let floatBuffer = baseAddress.assumingMemoryBound(to: Float32.self)

        // Collect valid depth values
        var depths: [Float] = []
        for i in 0..<(width * height) {
            let value = floatBuffer[i]
            if value.isFinite && value > 0.0 && value < 10.0 {
                depths.append(value)
            }
        }

        guard !depths.isEmpty else {
            return DepthAnalysis(available: false, reason: .captureFailed)
        }

        // Statistics
        let minDepth = depths.min()!
        let maxDepth = depths.max()!
        let mean = depths.reduce(0, +) / Float(depths.count)
        let variance = depths.map { ($0 - mean) * ($0 - mean) }
            .reduce(0, +) / Float(depths.count)
        let stdDev = sqrt(variance)
        let validRatio = Float(depths.count) / Float(width * height)

        // Plane analysis via RANSAC
        let planeResult = detectDominantPlane(
            depths: depths,
            width: width,
            height: height
        )

        // Screen detection
        let (isScreen, confidence) = computeScreenScore(
            stdDev: stdDev,
            depthRange: maxDepth - minDepth,
            planeRatio: planeResult.ratio,
            edgeSharpness: planeResult.edgeSharpness
        )

        return DepthAnalysis(
            available: true,
            sensorType: .lidar,
            frameTimestamp: Date(),
            resolution: Resolution(width: width, height: height),
            statistics: Statistics(
                minDepth: minDepth,
                maxDepth: maxDepth,
                meanDepth: mean,
                stdDeviation: stdDev,
                depthRange: maxDepth - minDepth,
                validPixelRatio: validRatio
            ),
            planeAnalysis: planeResult.analysis,
            screenDetection: ScreenDetection(
                isLikelyScreen: isScreen,
                confidence: confidence,
                indicators: /* ... */
            ),
            analysisHash: computeSHA256(depthMap)
        )
    }
}
```
The Privacy-Preserving Twist
Here's a design decision we're proud of: the raw depth map is never stored. We compute statistics and the screen detection verdict at capture time, then hash the raw depth data (stored as AnalysisHash). The hash proves the computation was performed on real depth data without preserving any 3D reconstruction of the scene.
This matters for GDPR compliance — depth maps could theoretically contain biometric information (face geometry), and storing them would trigger Article 9 special category protections. By hashing and discarding, we avoid this entirely.
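A minimal sketch of the hash-and-discard pattern (the function and field names mirror the article's schema; the helper itself is hypothetical):

```python
import hashlib
import statistics
import struct

def summarize_and_hash(depths):
    """Compute summary statistics, hash the raw buffer, then let it go.
    Only the stats and the digest are persisted; the depth map is not."""
    raw = struct.pack(f"{len(depths)}f", *depths)  # serialize as float32
    digest = "sha256:" + hashlib.sha256(raw).hexdigest()
    stats = {
        "MeanDepth": round(statistics.mean(depths), 4),
        "StdDeviation": round(statistics.pstdev(depths), 4),
    }
    return stats, digest  # raw buffer goes out of scope, never stored

stats, analysis_hash = summarize_and_hash([1.20, 1.21, 1.19, 1.20])
print(stats, analysis_hash[:18])
```

A verifier holding the digest can later confirm that any claimed depth buffer matches, without the capture device ever having retained the 3D data.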
Tier 1: Moiré Pattern Detection (CNN-Based)
LiDAR is powerful, but it's only available on iPhone Pro models. We needed a technique that works on every device — including budget Android phones with no depth sensor at all.
The Physics of Moiré
When you photograph a screen, two regular grids interact: the camera's sensor pixel grid and the display's pixel grid. This interference creates characteristic moiré patterns — wavy, rainbow-like artifacts that don't exist in natural scenes.
```
Camera Sensor Grid (e.g., 12MP)
|||||||||||||||||||||||||||
|||||||||||||||||||||||||||   ← Interference
|||||||||||||||||||||||||||

Display Pixel Grid (e.g., 401 PPI)
|| || || || || || || || ||
|| || || || || || || || ||

→ Moiré pattern artifacts
```
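The beat-frequency effect is easy to reproduce numerically. This sketch (illustrative frequencies, hand-rolled single-bin DFT instead of a full 2D FFT) multiplies two gratings and shows that the energy lands at the low difference frequency, exactly where natural scenes have little periodic content:

```python
import math

def dft_magnitude(signal, freq):
    """Magnitude of a single DFT bin (freq in cycles per sample).
    A minimal stand-in for a full 2D FFT."""
    n = len(signal)
    re = sum(s * math.cos(2 * math.pi * freq * i) for i, s in enumerate(signal))
    im = sum(s * math.sin(2 * math.pi * freq * i) for i, s in enumerate(signal))
    return math.hypot(re, im) / n

n = 2000
f_display, f_sensor = 0.46, 0.50  # grid frequencies in cycles/sample
# Product of two gratings: cos(a)*cos(b) = 0.5*[cos(a-b) + cos(a+b)],
# so a low "beat" component appears at |f_sensor - f_display|
captured = [math.cos(2 * math.pi * f_display * i) *
            math.cos(2 * math.pi * f_sensor * i) for i in range(n)]

beat = dft_magnitude(captured, abs(f_sensor - f_display))
print(beat)  # strong energy at the 0.04 cycles/sample moiré frequency
```

Detection then reduces to looking for unexpectedly strong, regular peaks in this low-to-mid frequency band, which is what the FFT tier below does.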
The academic literature is rich here. Garcia & de Queiroz (IEEE TIFS 2015) established the fundamental 2D DFT + Difference of Gaussians approach, achieving 92–97% accuracy on standard LCD screens.
Our Three-Tier Approach
We implement three detection methods, ranked by computational cost:
Tier 1 — Frequency Domain FFT (simplest, ~85% accuracy):
```kotlin
// Android (Kotlin) — Simplified
fun detectMoireFFT(image: Bitmap): Float {
    val grayscale = toGrayscale(image)
    val fft2d = computeFFT2D(grayscale)
    val magnitude = computeMagnitudeSpectrum(fft2d)
    // Look for periodic peaks at non-natural frequencies
    // Screens produce regular grid interference patterns
    val peaks = findPeriodicPeaks(
        magnitude,
        minFrequency = 0.1f, // Normalized
        maxFrequency = 0.4f
    )
    if (peaks.isEmpty()) return 0f // No periodic structure found
    return peaks.map { it.amplitude }.sum() / peaks.size
}
```
Tier 2 — MobileNetV2 Classifier (recommended, 96%+ accuracy):
```swift
// iOS (Swift) — Core ML inference
import CoreML
import Vision

func detectMoireCNN(image: CGImage) -> ScreenDetectionResult {
    let model = try! MoireDetectorV2(configuration: .init())
    let request = VNCoreMLRequest(model: try! VNCoreMLModel(for: model.model))
    let handler = VNImageRequestHandler(cgImage: image)
    try! handler.perform([request])
    guard let result = request.results?.first as? VNClassificationObservation else {
        return .unknown
    }
    return ScreenDetectionResult(
        isScreen: result.identifier == "screen",
        confidence: result.confidence,
        modelVersion: "moiredet-v1.2-mobilenetv2"
    )
}
```
| Component | iOS (Swift) | Android (Kotlin) |
|---|---|---|
| ML Runtime | Core ML (.mlmodel) | TensorFlow Lite (.tflite) |
| Model | MobileNetV2 + classifier head | MobileNetV2 + classifier head |
| Input | 224×224 center crop | 224×224 center crop |
| Inference Time | ~15ms on A15+ | ~20ms on Snapdragon 8 Gen 1+ |
| Model Size | ~8–12 MB | ~8–12 MB |
Tier 3 — Wavelet + CNN Cascade (highest accuracy, 99%+):
Wavelet decomposition into LH, HL, HH sub-bands followed by lightweight CNN analysis of each sub-band. Highest accuracy but ~3× the compute cost.
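The wavelet front-end can be sketched with a one-level Haar transform in plain Python (the per-band CNNs are omitted; the band naming follows row-filter-then-column-filter order). A fine stripe pattern, the kind a pixel grid produces, concentrates its energy in a single detail sub-band:

```python
def haar_subbands(img):
    """One-level 2D Haar decomposition into LL, LH, HL, HH sub-bands.
    First letter = horizontal (row) filter, second = vertical (column) filter."""
    h, w = len(img), len(img[0])
    # Filter along rows: pairwise average (low-pass) and difference (high-pass)
    lo = [[(r[2 * j] + r[2 * j + 1]) / 2 for j in range(w // 2)] for r in img]
    hi = [[(r[2 * j] - r[2 * j + 1]) / 2 for j in range(w // 2)] for r in img]

    def cols(m, op):
        # Apply op to vertical pixel pairs (filter along columns)
        return [[op(m[2 * i][j], m[2 * i + 1][j]) for j in range(len(m[0]))]
                for i in range(h // 2)]

    avg = lambda a, b: (a + b) / 2
    dif = lambda a, b: (a - b) / 2
    return {"LL": cols(lo, avg), "LH": cols(lo, dif),
            "HL": cols(hi, avg), "HH": cols(hi, dif)}

# Fine vertical stripes (screen-like horizontal frequency content)
stripes = [[float(j % 2) for j in range(8)] for _ in range(8)]
bands = haar_subbands(stripes)
energy = {k: sum(abs(v) for row in m for v in row) for k, m in bands.items()}
print(energy)  # all stripe energy lands in the HL (horizontal-detail) band
```

In the full cascade, each detail band is fed to a lightweight classifier, which is why this tier costs roughly three times the single-pass CNN.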
The High-PPI Challenge
Modern displays are making moiré detection harder. A 4K display at 458 PPI pushes moiré frequencies beyond the camera's Nyquist limit at distances beyond ~30cm. We handle this by:
- Using multi-scale analysis (checking at multiple resolution levels)
- Detecting sub-pixel rendering patterns (RGB stripe vs. PenTile diamond) via Gabor filter banks at 8 orientations
- Falling back to LiDAR depth when moiré is ambiguous
Training Data
Our model was trained on a diverse dataset covering:
- LCD, OLED, mini-LED, E-Ink displays
- 720p through 8K resolutions
- Various viewing angles (0°–60°) and distances (15cm–2m)
- Multiple ambient lighting conditions
- GAN-based de-moiréing attack samples for adversarial robustness
Tier 1: Rolling Shutter Flicker Detection
CMOS image sensors expose pixels sequentially — the top row is captured a few microseconds before the bottom row. This "rolling shutter" effect creates visible banding when photographing displays that flicker at their refresh rate.
How It Works
```
Display flickering at 60 Hz:
████████████████   ← bright phase
░░░░░░░░░░░░░░░░   ← dark phase (backlight PWM)
████████████████   ← bright phase

Camera rolling shutter captures:
Row 0:   ████████ (bright phase)
Row 100: ████████ (bright phase)
Row 200: ░░░░░░░░ (dark phase — banding!)
Row 300: ████████ (bright phase)
...
```
The algorithm is straightforward:
1. Extract a raw frame (before ISP processing if possible)
2. Compute row-wise brightness: `B[row] = mean(Y_channel[row])`
3. Apply FFT to `B[]` → frequency spectrum
4. Search for peaks at 50, 60, 100, 120, 240 Hz (adjusted for exposure time and regional power frequency)
5. If peak SNR > threshold → screen detected
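The steps above can be simulated end to end. In this sketch the row readout time and PWM frequency are illustrative assumptions, and a hand-rolled single-bin DFT stands in for the FFT:

```python
import math

def dft_power(signal, cycles_per_sample):
    """Power of one DFT bin, evaluated at an arbitrary frequency."""
    n = len(signal)
    re = sum(s * math.cos(2 * math.pi * cycles_per_sample * i)
             for i, s in enumerate(signal))
    im = sum(s * math.sin(2 * math.pi * cycles_per_sample * i)
             for i, s in enumerate(signal))
    return (re * re + im * im) / (n * n)

ROWS = 1080
T_ROW = 1.0 / 30_000   # assumed readout: ~33 us per row (illustrative)
PWM_HZ = 120.0         # backlight PWM of the photographed display

# Rolling shutter: row r integrates light at time r * T_ROW,
# so temporal flicker becomes a spatial brightness wave down the frame
brightness = [0.5 + 0.5 * math.sin(2 * math.pi * PWM_HZ * r * T_ROW)
              for r in range(ROWS)]
mean = sum(brightness) / ROWS
centered = [b - mean for b in brightness]

# Convert each candidate flicker frequency (Hz) to cycles-per-row and scan
candidates = {hz: dft_power(centered, hz * T_ROW)
              for hz in (50, 60, 100, 120, 240)}
best = max(candidates, key=candidates.get)
print(best)  # 120
```

A natural scene produces no such peak in the row-brightness spectrum, which is what makes the per-frame cost of this check so low.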
Cross-Platform Implementation
| Component | iOS (Swift) | Android (Kotlin) |
|---|---|---|
| Frame Access | AVCaptureVideoDataOutput | Camera2 API + ImageReader (YUV_420_888) |
| Analysis | Row-mean brightness → FFT | Row-mean brightness → FFT |
| Target Frequencies | 50Hz (JP East), 60Hz (JP West/US), 100/120/240Hz | Same |
False Positive Mitigation
Fluorescent lights also flicker at 100/120Hz. PWM LED dimming creates similar artifacts. Our solution: never use flicker detection alone. It's always fused with moiré and/or depth data in the ensemble decision.
Processing cost: ~5ms per frame. Negligible.
The Variable Refresh Rate Problem
Modern displays with ProMotion/LTPO technology dynamically switch between 1–120Hz. This makes flicker detection unreliable because the frequency changes between frames. Our detection accuracy drops from >95% to 70–85% on these displays. We compensate by:
- Analyzing multiple consecutive frames (looking for any consistent frequency)
- Weighting flicker lower in the fusion score when no clear peak is found
- Relying more heavily on moiré and depth for ambiguous cases
Tier 2: IMU-Based Human Presence Verification
This is a subtle but clever technique. When a human holds a phone, the accelerometer and gyroscope register characteristic micro-tremors — involuntary hand movements in the 4–12Hz band that are physiologically unavoidable. A phone mounted on a tripod or mechanical arm (which might be used in a sophisticated screen-capture attack) lacks these tremors.
The Signal
Human hand-holding characteristics:
- Tremor band: 4–12 Hz (bandpass filtered)
- PSD ratio: High energy in tremor band vs. total
- Zero-crossing: Characteristic rate
- Jerk profile: Follows "minimum-jerk" trajectories
(biological optimization principle)
Mechanical/tripod characteristics:
- Tremor band: Near-zero energy
- Motion profile: Step functions, not smooth curves
- Jerk profile: Discontinuous
Implementation
```swift
// iOS — Collect accelerometer data during capture
import CoreMotion

let motionManager = CMMotionManager()
motionManager.accelerometerUpdateInterval = 1.0 / 100.0 // 100Hz sampling

var samples: [CMAccelerometerData] = []
motionManager.startAccelerometerUpdates(to: .main) { data, _ in
    guard let data = data else { return }
    samples.append(data)
}

// After capture, analyze the 500ms window around shutter press
func analyzeHumanPresence(_ samples: [CMAccelerometerData]) -> Double {
    let magnitudes = samples.map {
        sqrt($0.acceleration.x * $0.acceleration.x
           + $0.acceleration.y * $0.acceleration.y
           + $0.acceleration.z * $0.acceleration.z)
    }
    // Bandpass filter: 4–12 Hz
    let filtered = bandpassFilter(magnitudes, low: 4.0, high: 12.0, fs: 100.0)
    // Power Spectral Density in tremor band
    let psd = computePSD(filtered)
    let tremorEnergy = psd.filter { $0.frequency >= 4 && $0.frequency <= 12 }
        .map { $0.power }
        .reduce(0, +)
    let totalEnergy = psd.map { $0.power }.reduce(0, +)
    let tremorRatio = tremorEnergy / max(totalEnergy, 1e-10)
    // High ratio → human; low ratio → tripod/mechanical
    return tremorRatio
}
```
Detection accuracy: ~85%. This won't catch a sophisticated attacker who hand-holds their phone while photographing a screen, but it adds another independent signal to the fusion engine.
The Fusion Engine: Combining Everything
Individual detectors have known weaknesses. The power of our system comes from fusing independent signals:
```python
from typing import Optional

# Weighted score combination
def compute_screen_score(
    depth_result: Optional[DepthResult],
    moire_score: float,
    flicker_detected: bool,
    tremor_present: bool,
) -> float:
    weights = {}
    scores = {}

    # LiDAR depth (if available)
    if depth_result and depth_result.available:
        weights['depth'] = 0.35
        scores['depth'] = depth_result.flatness_score

    # Moiré pattern
    weights['moire'] = 0.30
    scores['moire'] = moire_score

    # Flicker
    weights['flicker'] = 0.15
    scores['flicker'] = 1.0 if flicker_detected else 0.0

    # IMU tremor (inverted — no tremor suggests non-human)
    weights['tremor'] = 0.10
    scores['tremor'] = 0.0 if tremor_present else 1.0

    # Normalize weights
    total_weight = sum(weights.values())
    combined = sum(
        scores[k] * weights[k] / total_weight
        for k in scores
    )
    return combined
```
When LiDAR is unavailable (non-Pro iPhones, most Android devices), the weights automatically redistribute across the remaining modalities. The system degrades gracefully rather than failing.
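The redistribution is just weight renormalization. A small self-contained sketch (raw weights taken from the fusion code above; the helper name is hypothetical) makes the effect visible:

```python
def effective_weights(depth_available):
    """Show how fusion weights renormalize when the depth modality is absent.
    Raw weights follow the article's fusion sketch."""
    raw = {"moire": 0.30, "flicker": 0.15, "tremor": 0.10}
    if depth_available:
        raw["depth"] = 0.35
    total = sum(raw.values())
    return {k: round(w / total, 3) for k, w in raw.items()}

print(effective_weights(True))   # depth carries ~0.389 of the decision
print(effective_weights(False))  # moire's share grows to ~0.545
```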
Expected Performance
| Configuration | Accuracy | False Positive Rate |
|---|---|---|
| LiDAR + Moiré + Flicker + IMU | >97% | <2% |
| Moiré + Flicker + IMU (no LiDAR) | >96% | <5% |
| Moiré only (minimum config) | ~96% | ~8% |
The CPP Integration: Cryptographically Binding Results
Screen detection results are meaningless if they can be tampered with after the fact. We bind them into the Content Provenance Protocol (CPP) event chain using the same cryptographic infrastructure as every other capture event.
The JSON Schema
```json
{
  "SensorData": {
    "GPS": { "Latitude": 35.6762, "Longitude": 139.6503, "Accuracy": 5.0 },
    "Accelerometer": [0.012, -0.003, 9.801],
    "Compass": 180.5,
    "DepthAnalysis": {
      "Available": true,
      "SensorType": "LiDAR",
      "FrameTimestamp": "2026-02-14T10:30:00.123Z",
      "Resolution": { "Width": 256, "Height": 192 },
      "Statistics": {
        "MinDepth": 0.45,
        "MaxDepth": 3.82,
        "MeanDepth": 1.23,
        "StdDeviation": 0.87,
        "DepthRange": 3.37,
        "ValidPixelRatio": 0.92
      },
      "PlaneAnalysis": {
        "DominantPlaneRatio": 0.15,
        "DominantPlaneDistance": 1.05,
        "PlaneCount": 3,
        "LargestPlaneArea": 0.12
      },
      "ScreenDetection": {
        "IsLikelyScreen": false,
        "Confidence": 0.95,
        "Indicators": {
          "FlatnessScore": 0.12,
          "DepthUniformity": 0.08,
          "EdgeSharpness": 0.25,
          "ReflectivityAnomaly": false
        }
      },
      "AnalysisHash": "sha256:a1b2c3d4e5f6..."
    }
  }
}
```
The Cryptographic Chain
- Depth analysis runs during capture → JSON generated
- JSON is canonicalized (RFC 8785 JCS)
- SHA-256 hash computed over the entire event (including screen detection)
- Hash signed with device key (Secure Enclave on iOS, StrongBox on Android)
- Hash submitted to RFC 3161 TSA → timestamp token received
- Event inserted into hash chain (previous hash links to this event)
```
Event N-1 ──hash──→ Event N (with ScreenDetection) ──hash──→ Event N+1
                        │
                        ├── SHA-256 hash
                        ├── ES256 signature (Secure Enclave)
                        └── RFC 3161 timestamp token
```
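The chain property alone can be demonstrated in a few lines. This sketch uses `json.dumps(sort_keys=True)` as a stand-in for full RFC 8785 canonicalization, and omits the signature and TSA steps:

```python
import hashlib
import json

def canonical(event):
    """Deterministic serialization; a simplified stand-in for RFC 8785 JCS."""
    return json.dumps(event, sort_keys=True, separators=(",", ":")).encode()

def append_event(chain, payload):
    # Each event commits to the previous event's hash
    prev_hash = chain[-1]["EventHash"] if chain else "0" * 64
    event = {"PreviousHash": prev_hash, "Payload": payload}
    event["EventHash"] = hashlib.sha256(canonical(event)).hexdigest()
    chain.append(event)

chain = []
append_event(chain, {"ScreenDetection": {"IsLikelyScreen": False}})
append_event(chain, {"ScreenDetection": {"IsLikelyScreen": True}})

# Tamper with event 0 after the fact, then re-verify the link in event 1
chain[0]["Payload"]["ScreenDetection"]["IsLikelyScreen"] = True
recomputed = hashlib.sha256(canonical(
    {k: v for k, v in chain[0].items() if k != "EventHash"})).hexdigest()
print(recomputed == chain[1]["PreviousHash"])  # False: the chain is broken
```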
The result: modifying the screen detection verdict after the fact would break the hash chain, invalidate the signature, and conflict with the RFC 3161 timestamp. Three independent cryptographic guarantees.
Cross-Platform Compatibility: The Hard Part
VeraSnap runs on both iOS and Android. Making screen detection work identically across platforms was one of the hardest engineering challenges in the project.
The Sensor Landscape
iOS is relatively homogeneous — LiDAR exists on Pro models, TrueDepth on all models (front camera). Android is a zoo:
| Sensor Type | Platforms | CPP SensorType |
|---|---|---|
| LiDAR (dToF) | iPhone Pro, iPad Pro | LiDAR |
| TrueDepth (structured light) | iPhone front camera | TrueDepth |
| ToF | Samsung Galaxy S20+/S20 Ultra, Huawei P30 Pro, Sony Xperia | ToF |
| Structured Light | Google Pixel 4/4 XL | StructuredLight |
| Stereo (dual camera) | Many dual-camera Android devices | Stereo |
| None | Budget phones, older devices | Unavailable |
CPP v1.4's Platform-Agnostic Approach
The CPP specification defines sensor types abstractly, so iOS-generated proofs verify correctly on Android and vice versa:
```jsonc
// iPhone 15 Pro capture
{ "SensorType": "LiDAR", "Statistics": { "StdDeviation": 0.87 } }

// Samsung Galaxy S22 Ultra capture
{ "SensorType": "ToF", "Statistics": { "StdDeviation": 0.91 } }

// Budget Android capture
{ "SensorType": "Unavailable", "UnavailableReason": "SENSOR_NOT_AVAILABLE" }
```
The verifier doesn't need to know or care about platform-specific implementation details. It sees a standardized JSON schema and applies the same validation logic.
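A verifier-side sketch of that idea (the function and the 0.05 m cutoff are illustrative, the cutoff borrowed from the calibration table; field names follow the article's schema):

```python
def validate_depth_analysis(depth):
    """Platform-agnostic check: identical logic for LiDAR, ToF, or no sensor."""
    if depth.get("SensorType") == "Unavailable":
        return "no-depth-data"  # verdict must come from moire/flicker instead
    std = depth["Statistics"]["StdDeviation"]
    # Calibration table: monitors show StdDeviation < 0.05 m
    return "screen-like" if std < 0.05 else "scene-like"

iphone = {"SensorType": "LiDAR", "Statistics": {"StdDeviation": 0.87}}
samsung = {"SensorType": "ToF", "Statistics": {"StdDeviation": 0.91}}
budget = {"SensorType": "Unavailable", "UnavailableReason": "SENSOR_NOT_AVAILABLE"}

for proof in (iphone, samsung, budget):
    print(proof["SensorType"], "->", validate_depth_analysis(proof))
```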
Android Camera2 API: Depth Data Access
```kotlin
// Android (Kotlin) — Accessing ToF/depth data via Camera2
class DepthCaptureSession(private val cameraManager: CameraManager) {

    fun startDepthCapture(cameraId: String) {
        val characteristics = cameraManager.getCameraCharacteristics(cameraId)

        // Check for depth sensor capability
        val capabilities = characteristics.get(
            CameraCharacteristics.REQUEST_AVAILABLE_CAPABILITIES
        )
        val hasDepth = capabilities?.contains(
            CameraCharacteristics.REQUEST_AVAILABLE_CAPABILITIES_DEPTH_OUTPUT
        ) == true

        if (!hasDepth) {
            // Fall back to software-only detection
            return startSoftwareOnlyDetection()
        }

        // Configure depth ImageReader
        val depthReader = ImageReader.newInstance(
            DEPTH_WIDTH, DEPTH_HEIGHT,
            ImageFormat.DEPTH16, // 16-bit: depth in mm + confidence bits
            2 // maxImages
        )
        depthReader.setOnImageAvailableListener({ reader ->
            val image = reader.acquireLatestImage() ?: return@setOnImageAvailableListener
            val depthMap = processDepth16(image)
            val analysis = analyzeDepth(depthMap)
            image.close()
        }, backgroundHandler)

        // Create capture session with both color and depth outputs
        val surfaces = listOf(colorSurface, depthReader.surface)
        cameraDevice.createCaptureSession(surfaces, sessionCallback, null)
    }

    private fun processDepth16(image: Image): FloatArray {
        val plane = image.planes[0]
        val buffer = plane.buffer.asShortBuffer()
        val depths = FloatArray(buffer.remaining())
        for (i in depths.indices) {
            val raw = buffer.get(i).toInt() and 0xFFFF
            // DEPTH16: lower 13 bits = depth in mm,
            // upper 3 bits = confidence (0 encodes full confidence)
            val depthMm = raw and 0x1FFF
            depths[i] = if (depthMm > 0) {
                depthMm.toFloat() / 1000.0f // Convert to meters
            } else {
                Float.NaN // Invalid measurement
            }
        }
        return depths
    }
}
```
JSON Compatibility: The Silent Killer
A subtle but critical issue: JSON floating-point representation must be identical across platforms or hash verification breaks. We use RFC 8785 JSON Canonicalization Scheme (JCS) to ensure:
- Numbers use the shortest round-trip representation (`1.23`, not `1.230000`)
- Keys are sorted lexicographically
- No trailing commas, no comments
- UTF-8 encoding normalized
```swift
// iOS — JCS canonicalization
func canonicalize(_ json: Any) -> Data {
    // RFC 8785: deterministic JSON serialization
    let options: JSONSerialization.WritingOptions = [.sortedKeys]
    let data = try! JSONSerialization.data(withJSONObject: json, options: options)
    // Additional JCS normalization for numbers...
    return jcsNormalize(data)
}
```

```kotlin
// Android — Must produce identical output
fun canonicalize(json: JSONObject): ByteArray {
    // Same RFC 8785 implementation
    val sorted = sortKeysRecursively(json)
    return jcsSerialize(sorted).toByteArray(Charsets.UTF_8)
}
```
If the iOS implementation serializes 0.95 and Android serializes 0.9500000238418579 (due to Float32 vs Float64 differences), the SHA-256 hash won't match and cross-platform verification fails. We test this extensively in CI.
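One of those CI checks is easy to reproduce. This sketch round-trips a value through a 32-bit float to simulate a platform that stored it as `Float32`, then shows the serialized JSON, and therefore the hash, diverging:

```python
import hashlib
import json
import struct

def round_trip_float32(x):
    """Simulate a platform that stored the value as a 32-bit float."""
    return struct.unpack("f", struct.pack("f", x))[0]

confidence64 = 0.95                      # Double end to end on one platform
confidence32 = round_trip_float32(0.95)  # Float32, widened back to Double

a = json.dumps({"Confidence": confidence64}, sort_keys=True)
b = json.dumps({"Confidence": confidence32}, sort_keys=True)
print(a)  # {"Confidence": 0.95}
print(b)  # different digits after the Float32 round-trip
print(hashlib.sha256(a.encode()).hexdigest()
      == hashlib.sha256(b.encode()).hexdigest())  # False
```

Keeping every field in 64-bit floats (or fixed-precision decimals) on both platforms, before canonicalization, is what makes the hashes agree.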
Known Limitations and Honest Assessment
We follow the CPP philosophy of "Provenance ≠ Truth". Screen detection is probabilistic, not deterministic. Here's what we're transparent about:
False Positives
Flat artwork and printed photos can trigger screen detection. A large flat painting or a glossy photograph on a table shows the same depth uniformity as a screen. We mitigate this with reflectivity analysis (screens have different IR characteristics than printed paper), but it's not perfect.
Recommendation: For high-stakes legal evidence, always include the Confidence value and note that human review is recommended when confidence is below 0.80.
Evasion Attacks
A sophisticated attacker could:
- Display content on a curved screen (defeating depth uniformity)
- Place objects at varying distances in front of the screen
- Use a projector onto an irregular surface
- Apply anti-moiré filters to the display
We don't claim screen detection is foolproof. It raises the bar significantly — from "trivially easy" to "requires specialized equipment and knowledge."
Device Coverage
LiDAR is only on iPhone Pro and iPad Pro. ToF sensors on Android are becoming rarer (Samsung removed them from the S23 line). Software-only detection (moiré + flicker) remains the realistic option for most devices. We handle this gracefully:
```json
{
  "DepthAnalysis": {
    "Available": false,
    "SensorType": "Unavailable",
    "UnavailableReason": "SENSOR_NOT_AVAILABLE"
  },
  "screen_detection": {
    "moire_analysis": {
      "score": 0.05,
      "model_version": "moiredet-v1.2-mobilenetv2",
      "is_screen_capture": false,
      "confidence": 0.95
    },
    "flicker_analysis": {
      "detected": false
    },
    "combined_screen_score": 0.04
  }
}
```
Why This Matters: The EU AI Act Connection
This isn't just a technical exercise. EU AI Act Article 50 mandates that AI-generated content be marked in a machine-readable format by August 2, 2026. The European Commission's draft Code of Practice explicitly calls for a multi-layered approach including:
- Cryptographic metadata (C2PA-style Content Credentials)
- Imperceptible watermarks (frequency-domain embedding)
- Fingerprinting/logging (fallback when metadata is stripped)
But none of these layers address the analog hole. You can watermark an AI-generated image all you want — once it's displayed on a screen and re-photographed, the watermark is destroyed and the new photo gets clean provenance credentials.
Screen detection closes this gap. It's the complement to AI content marking — one system says "this was AI-generated," the other says "this was captured from a real 3D scene."
The penalty for Article 50 non-compliance? Up to €15 million or 3% of global annual turnover. That's a powerful incentive for enterprises to adopt capture-time provenance verification.
Open Standard, Not Walled Garden
Everything described in this article is specified in the Content Provenance Protocol (CPP) v1.4–v1.5, published as an IETF Internet-Draft (draft-vso-cpp-core). The screen detection extension is fully documented with:
- JSON schema definitions
- Reference algorithm implementations
- Calibration data
- Verification procedures
The specification is open, the GitHub repos are public:
- CPP Spec: github.com/veritaschain/cpp-spec
- VAP Framework: github.com/veritaschain/vap-spec
We believe content provenance is infrastructure, not a competitive moat. The more implementations adopt CPP's screen detection schema, the more valuable the ecosystem becomes for everyone.
What's Next
We're actively working on:
- Android Key Attestation integration — proving the detection ran on genuine hardware, not an emulator
- zk-img protocol — zero-knowledge proofs that verify screen detection results without revealing the underlying depth data
- Adversarial training — continuously updating our moiré model with GAN-generated de-moiréing attacks
- C2PA conformance — mapping CPP screen detection results to C2PA assertion types for interoperability
If you're building content provenance tooling and want to integrate screen detection, the CPP spec is your starting point. PRs welcome.
Try It
VeraSnap is live on both platforms:
- iOS: App Store (LiDAR on Pro models, software detection on all)
- Android: Google Play (ToF where available, software detection on all)
Take a photo of your monitor. Then take a photo of your desk. Compare the DepthAnalysis in the proof JSON. The difference is dramatic.
VeraSnap is developed by VeritasChain Co., Ltd. The Content Provenance Protocol is licensed under Creative Commons Attribution 4.0 International (CC BY 4.0).