The Problem: When Cryptographic Proof Isn't Enough
Picture this scenario: An attacker generates a fake image using AI, displays it on a monitor, and photographs the screen with an evidence camera app. The result? A cryptographically signed "proof" that the fake image was legitimately captured. 📱
This is called a screen capture attack, and it's a blind spot in traditional content provenance systems.
At VeraSnap, we've been building a cryptographic evidence camera that proves when and by what device media was captured. But proving what was captured, whether it's a real-world object or just a screen, requires something more: depth sensing.
Sony's Approach: 3D Depth in Professional Cameras
Sony recently launched their Authenticity Camera Solution for news organizations. Their key innovation? Using 3D depth information captured simultaneously with the image to detect if the subject is a real object or a screen display.
The technology is available on professional cameras like the α1 II and α9 III, starting at $6,000+. Great for Reuters, not so great for the average person documenting a car accident for insurance. 📸
Our goal: Bring this same capability to consumer smartphones using the LiDAR sensor in iPhone Pro models.
Why LiDAR Works for Screen Detection
Screens and real-world scenes have fundamentally different depth characteristics:
Real-world scenes:
- Multiple objects at varying distances
- Irregular depth patterns
- Wide depth range (meters)
Screen displays:
- Single flat plane
- Uniform depth across the surface
- Narrow depth range (centimeters)
- Sharp rectangular edges (the bezel)
Here's what the depth data looks like:
| | Real Scene (outdoor landscape) | Screen (monitor display) |
|---|---|---|
| StdDev | 1.4m | 0.02m |
| Depth Range | 4.4m | 0.06m |
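To make those numbers concrete, here's a minimal sketch of how such summary statistics fall out of a depth map. This is illustrative Python with NumPy, using synthetic arrays in place of real LiDAR output:

```python
import numpy as np

def depth_statistics(depth: np.ndarray) -> dict:
    """Summarize a depth map in meters; non-finite or zero pixels are invalid."""
    valid = depth[np.isfinite(depth) & (depth > 0)]
    return {
        "min": float(valid.min()),
        "max": float(valid.max()),
        "mean": float(valid.mean()),
        "std_dev": float(valid.std()),
        "depth_range": float(valid.max() - valid.min()),
        "valid_pixel_ratio": valid.size / depth.size,
    }

rng = np.random.default_rng(0)
# Synthetic stand-ins at LiDAR resolution: varied outdoor scene vs. flat screen
outdoor = rng.uniform(0.5, 4.9, size=(192, 256))       # depths spread over ~4.4m
screen = 0.5 + rng.normal(0.0, 0.02, size=(192, 256))  # nearly planar at ~0.5m

print(depth_statistics(outdoor)["std_dev"])  # ~1.27 (wide spread)
print(depth_statistics(screen)["std_dev"])   # ~0.02 (flat)
```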
The Detection Algorithm
We use four weighted criteria to determine if a subject is likely a screen:
```python
def is_likely_screen(analysis: DepthAnalysis) -> tuple[bool, float]:
    """
    Determine if the subject is likely a digital screen.

    Returns:
        (is_screen: bool, confidence: float)
    """
    stats = analysis.statistics
    plane = analysis.plane_analysis

    # Criterion 1: Low depth variance = flat surface
    flatness_score = 1.0 - min(stats.std_deviation / 0.5, 1.0)

    # Criterion 2: Dominant plane covers most of frame
    plane_dominance = plane.dominant_plane_ratio

    # Criterion 3: Narrow depth range
    depth_uniformity = 1.0 - min(stats.depth_range / 2.0, 1.0)

    # Criterion 4: Sharp rectangular edges (bezel)
    edge_sharpness = detect_rectangular_edges(analysis.raw_depth)

    # Weighted score
    score = (
        flatness_score * 0.30 +
        plane_dominance * 0.25 +
        depth_uniformity * 0.25 +
        edge_sharpness * 0.20
    )

    is_screen = score > 0.70
    confidence = abs(score - 0.50) * 2
    return is_screen, confidence
```
The threshold of 0.70 was calibrated against real-world test scenarios.
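To see how the weights play out, here's a hypothetical run of the scoring arithmetic against the screen example from the table above. The dataclasses and the hard-coded `edge_sharpness` are stand-ins; in the real pipeline they come from the depth analysis and `detect_rectangular_edges`:

```python
from dataclasses import dataclass

@dataclass
class DepthStatistics:
    std_deviation: float
    depth_range: float

@dataclass
class PlaneAnalysis:
    dominant_plane_ratio: float

# A monitor at ~0.5m: flat, one dominant plane, sharp bezel edges
stats = DepthStatistics(std_deviation=0.02, depth_range=0.06)
plane = PlaneAnalysis(dominant_plane_ratio=0.95)
edge_sharpness = 0.85  # stand-in for detect_rectangular_edges(...)

flatness_score = 1.0 - min(stats.std_deviation / 0.5, 1.0)  # 0.96
plane_dominance = plane.dominant_plane_ratio                # 0.95
depth_uniformity = 1.0 - min(stats.depth_range / 2.0, 1.0)  # 0.97
score = (flatness_score * 0.30 + plane_dominance * 0.25 +
         depth_uniformity * 0.25 + edge_sharpness * 0.20)

print(round(score, 2))  # 0.94 -> above the 0.70 threshold, flagged as a screen
```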
Integrating with CPP (Content Provenance Protocol)
VeraSnap implements the Content Provenance Protocol (CPP), an open standard for cryptographic evidence capture. In CPP v1.4, we're adding depth analysis as an optional extension:
```json
{
  "SensorData": {
    "GPS": { "...": "..." },
    "Accelerometer": ["..."],
    "DepthAnalysis": {
      "Available": true,
      "SensorType": "LiDAR",
      "FrameTimestamp": "2026-01-29T10:30:00.123Z",
      "Resolution": {
        "Width": 256,
        "Height": 192
      },
      "Statistics": {
        "MinDepth": 0.45,
        "MaxDepth": 3.82,
        "MeanDepth": 1.23,
        "StdDeviation": 0.87,
        "DepthRange": 3.37,
        "ValidPixelRatio": 0.92
      },
      "PlaneAnalysis": {
        "DominantPlaneRatio": 0.15,
        "PlaneCount": 3
      },
      "ScreenDetection": {
        "IsLikelyScreen": false,
        "Confidence": 0.95,
        "Indicators": {
          "FlatnessScore": 0.12,
          "DepthUniformity": 0.08,
          "EdgeSharpness": 0.25
        }
      },
      "AnalysisHash": "sha256:abc123..."
    }
  }
}
```
Key design decisions:
- Optional extension – Works on LiDAR devices, gracefully degrades on others
- Statistics only – Raw depth map is NOT stored (privacy)
- Hash proof – `AnalysisHash` proves the computation without storing raw data
- 100ms timing – Depth frame must be captured within 100ms of the photo (see the sketch after this list)
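Here's a sketch of how a capture pipeline might enforce the last two decisions, the hash proof and the 100ms window. Field names follow the JSON example above, but the helper itself and its serialization details are assumptions, not part of the normative spec:

```python
import hashlib
from datetime import datetime

def build_depth_extension(raw_depth: bytes, statistics: dict,
                          photo_ts: datetime, depth_ts: datetime) -> dict:
    """Attach summary statistics plus a hash committing to the raw depth map."""
    # 100ms rule: the depth frame must be contemporaneous with the photo
    skew_ms = abs((photo_ts - depth_ts).total_seconds()) * 1000.0
    if skew_ms > 100.0:
        raise ValueError(f"depth frame is {skew_ms:.0f}ms from photo; limit is 100ms")

    # The hash proves what the sensor saw without storing the raw data itself
    digest = hashlib.sha256(raw_depth).hexdigest()
    return {
        "Available": True,
        "SensorType": "LiDAR",
        "FrameTimestamp": depth_ts.isoformat(),
        "Statistics": statistics,
        "AnalysisHash": f"sha256:{digest}",
    }
```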
iOS Implementation Sketch
Here's how you'd capture depth data alongside a photo on iOS:
```swift
import ARKit
import AVFoundation

class DepthCaptureService {
    private var arSession: ARSession?

    func captureDepthFrame() async throws -> DepthAnalysisResult {
        // LiDAR depth is exposed via the .sceneDepth frame semantics
        guard ARWorldTrackingConfiguration.supportsFrameSemantics(.sceneDepth) else {
            return DepthAnalysisResult(
                available: false,
                unavailableReason: .sensorNotAvailable
            )
        }

        // Get current ARFrame with depth
        guard let frame = arSession?.currentFrame,
              let depthMap = frame.sceneDepth?.depthMap else {
            return DepthAnalysisResult(
                available: false,
                unavailableReason: .captureFailed
            )
        }

        // Analyze depth statistics
        let stats = analyzeDepthStatistics(depthMap)
        let planeAnalysis = detectPlanes(depthMap)
        let screenDetection = evaluateScreenLikelihood(stats, planeAnalysis)

        // Hash raw depth for proof (don't store raw data)
        let depthHash = hashDepthMap(depthMap)

        return DepthAnalysisResult(
            available: true,
            sensorType: .lidar,
            frameTimestamp: Date(),
            resolution: Resolution(
                width: CVPixelBufferGetWidth(depthMap),
                height: CVPixelBufferGetHeight(depthMap)
            ),
            statistics: stats,
            planeAnalysis: planeAnalysis,
            screenDetection: screenDetection,
            analysisHash: depthHash
        )
    }

    private func analyzeDepthStatistics(_ depthMap: CVPixelBuffer) -> DepthStatistics {
        // Lock buffer and extract depth values
        CVPixelBufferLockBaseAddress(depthMap, .readOnly)
        defer { CVPixelBufferUnlockBaseAddress(depthMap, .readOnly) }

        let width = CVPixelBufferGetWidth(depthMap)
        let height = CVPixelBufferGetHeight(depthMap)
        let baseAddress = CVPixelBufferGetBaseAddress(depthMap)!
        // Rows may be padded, so index with the buffer's actual stride
        let rowStride = CVPixelBufferGetBytesPerRow(depthMap) / MemoryLayout<Float32>.size
        let buffer = baseAddress.assumingMemoryBound(to: Float32.self)

        var validDepths: [Float] = []
        for y in 0..<height {
            for x in 0..<width {
                let depth = buffer[y * rowStride + x]
                if depth.isFinite && depth > 0 {
                    validDepths.append(depth)
                }
            }
        }

        let minDepth = validDepths.min() ?? 0
        let maxDepth = validDepths.max() ?? 0
        // Guard against an all-invalid frame before dividing by the count
        let meanDepth = validDepths.isEmpty ? 0 : validDepths.reduce(0, +) / Float(validDepths.count)
        let variance = validDepths.isEmpty ? 0 : validDepths.map { pow($0 - meanDepth, 2) }.reduce(0, +) / Float(validDepths.count)
        let stdDev = sqrt(variance)

        return DepthStatistics(
            minDepth: minDepth,
            maxDepth: maxDepth,
            meanDepth: meanDepth,
            stdDeviation: stdDev,
            depthRange: maxDepth - minDepth,
            validPixelRatio: Float(validDepths.count) / Float(width * height)
        )
    }
}
```
Supported Devices
LiDAR sensor (full support):
- iPhone 12/13/14/15/16 Pro & Pro Max
- iPad Pro (2020+)
TrueDepth sensor (front camera only):
- iPhone X and later
No depth sensor:
- Non-Pro iPhones (rear camera)
- Graceful degradation: `DepthAnalysis.Available = false` (see the sketch after this list)
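Even without a sensor, the capture still emits a well-formed extension. A sketch of the fallback payload (the `UnavailableReason` field mirrors the Swift enum in the sketch above and is an assumption, not normative):

```python
def depth_extension_unavailable(reason: str = "sensorNotAvailable") -> dict:
    """Fallback DepthAnalysis payload for devices without a usable depth sensor."""
    return {"Available": False, "UnavailableReason": reason}
```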
Privacy-First Design
We took privacy seriously in this implementation:
- Raw depth maps are NEVER stored – Only statistical summaries
- No biometric extraction – No facial features, no 3D reconstruction
- Local processing only – Depth analysis runs entirely on-device
- Hash-based proof – `AnalysisHash` proves computation integrity
LiDAR data → statistical analysis → screen detection. The resulting statistics and verdict are recorded in the CPP event, while the raw depth map is hashed with SHA-256 into `AnalysisHash` (proof only).

※ Raw depth data is NEVER stored or transmitted
Limitations & False Positives
This isn't a silver bullet. False positives can occur with:
- Printed photos on flat surfaces
- Flat artwork and paintings
- Wall-mounted pictures
That's why we report confidence levels and recommend human review for high-stakes verification. The system provides additional evidence, not definitive proof.
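One way to act on that recommendation is to surface a graded verdict instead of a bare boolean. A hypothetical policy, with purely illustrative thresholds:

```python
def review_verdict(is_screen: bool, confidence: float) -> str:
    """Map detector output to an evidence label; thresholds are illustrative."""
    if confidence < 0.40:
        return "inconclusive: recommend human review"
    if is_screen:
        return "likely screen capture: flag for review"
    return "depth consistent with a real-world scene"
```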
What's Next
We're planning several enhancements:
- Texture analysis – Detect moiré patterns from screen photography
- Temporal analysis – Track depth consistency across video frames
- Android support – ToF sensors on Samsung, Huawei, etc.
- ML models – Train on screen vs. real-world depth signatures
Try It Out
VeraSnap is available on the App Store. The depth analysis feature will be available in an upcoming release as a Pro feature.
Open source specs:
- CPP Specification – Content Provenance Protocol
- VAP Framework – Verifiable AI Provenance
Conclusion
As generative AI makes fake images increasingly convincing, the ability to prove that a photograph captures a real-world scene, not a screen display, becomes essential.
By leveraging LiDAR sensors already in millions of iPhones, we can bring professional-grade authenticity technology to everyone. No $6,000 camera required.
Have questions about implementing depth-based screen detection? Drop a comment below or reach out on the CPP GitHub.
Built by VeritasChain Co., Ltd.