Building Screen Capture Detection with LiDAR: How We're Fighting Fake Evidence in VeraSnap

The Problem: When Cryptographic Proof Isn't Enough

Picture this scenario: An attacker generates a fake image using AI, displays it on a monitor, and photographs the screen with an evidence camera app. The result? A cryptographically signed "proof" that the fake image was legitimately captured. 😱

This is called a screen capture attack, and it's a blind spot in traditional content provenance systems.

At VeraSnap, we've been building a cryptographic evidence camera that proves when and by what device media was captured. But proving what was captured, whether it's a real-world object or just a screen, requires something more: depth sensing.

Sony's Approach: 3D Depth in Professional Cameras

Sony recently launched their Authenticity Camera Solution for news organizations. Their key innovation? Using 3D depth information captured simultaneously with the image to detect if the subject is a real object or a screen display.

The technology is available on professional cameras like the α1 II and α9 III, starting at $6,000+. Great for Reuters, not so great for the average person documenting a car accident for insurance. 📸

Our goal: Bring this same capability to consumer smartphones using the LiDAR sensor in iPhone Pro models.

Why LiDAR Works for Screen Detection

Screens and real-world scenes have fundamentally different depth characteristics:

Real-world scenes:

  • Multiple objects at varying distances
  • Irregular depth patterns
  • Wide depth range (meters)

Screen displays:

  • Single flat plane
  • Uniform depth across the surface
  • Narrow depth range (centimeters)
  • Sharp rectangular edges (the bezel)

Here's what the depth data looks like:

Real Scene (outdoor landscape)     Screen (monitor display)
┌────────────────────────────┐     ┌────────────────────────────┐
│ ░░▓▓████░░▓▓░░████▓▓░░████ │     │ ██████████████████████████ │
│ ░▓▓███▓▓░░▓▓████░░▓▓████░░ │     │ ██████████████████████████ │
│ ▓▓███░░▓▓████░░▓▓████░░▓▓█ │     │ ██████████████████████████ │
│ ███░░▓▓████░░▓▓████░░▓▓███ │     │ ██████████████████████████ │
└────────────────────────────┘     └────────────────────────────┘
StdDev: 1.4m                       StdDev: 0.02m
Depth Range: 4.4m                  Depth Range: 0.06m
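Those summary numbers are all the detection logic really needs. As a rough illustration with made-up sample values (none of this is VeraSnap code), the gap shows up as soon as you compute the spread of a handful of depth readings:

import Foundation

// Illustrative only: summarise the spread of a set of depth readings (in metres).
func depthSpread(_ depths: [Float]) -> (stdDev: Float, range: Float) {
    guard !depths.isEmpty else { return (0, 0) }
    let mean = depths.reduce(0, +) / Float(depths.count)
    let variance = depths.map { ($0 - mean) * ($0 - mean) }.reduce(0, +) / Float(depths.count)
    return (variance.squareRoot(), depths.max()! - depths.min()!)
}

// Hypothetical samples: an outdoor scene vs. a monitor about half a metre away.
let outdoorScene: [Float] = [0.8, 1.6, 2.3, 3.4, 4.1, 5.2]
let monitor: [Float]      = [0.50, 0.51, 0.52, 0.50, 0.51, 0.52]

print(depthSpread(outdoorScene)) // large spread: metres of variation
print(depthSpread(monitor))      // tiny spread: a few centimetres at most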

The Detection Algorithm

We use four weighted criteria to determine if a subject is likely a screen:

def is_likely_screen(analysis: DepthAnalysis) -> tuple[bool, float]:
    """
    Determine if the subject is likely a digital screen.

    Returns:
        (is_screen: bool, confidence: float)
    """
    stats = analysis.statistics
    plane = analysis.plane_analysis

    # Criterion 1: Low depth variance = flat surface
    flatness_score = 1.0 - min(stats.std_deviation / 0.5, 1.0)

    # Criterion 2: Dominant plane covers most of frame
    plane_dominance = plane.dominant_plane_ratio

    # Criterion 3: Narrow depth range
    depth_uniformity = 1.0 - min(stats.depth_range / 2.0, 1.0)

    # Criterion 4: Sharp rectangular edges (bezel)
    edge_sharpness = detect_rectangular_edges(analysis.raw_depth)

    # Weighted score
    score = (
        flatness_score * 0.30 +
        plane_dominance * 0.25 +
        depth_uniformity * 0.25 +
        edge_sharpness * 0.20
    )

    is_screen = score > 0.70
    confidence = abs(score - 0.50) * 2

    return is_screen, confidence

The threshold of 0.70 was calibrated against real-world test scenarios.
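The iOS sketch later in this post calls an evaluateScreenLikelihood helper without showing it. In Swift, the same weighted check could look roughly like this; the trimmed structs, the function name, and the separately supplied edgeSharpness value are stand-ins for illustration, not VeraSnap's actual types:

// Trimmed stand-ins for just the fields this sketch needs.
struct DepthStats { let stdDeviation: Float; let depthRange: Float }
struct PlaneStats { let dominantPlaneRatio: Float }
struct ScreenVerdict { let isLikelyScreen: Bool; let confidence: Float }

// Sketch: same criteria, weights, and 0.70 threshold as the Python version above.
// edgeSharpness is assumed to come from a separate rectangular-edge pass.
func screenLikelihood(stats: DepthStats, plane: PlaneStats, edgeSharpness: Float) -> ScreenVerdict {
    let flatness = 1.0 - min(stats.stdDeviation / 0.5, 1.0)    // low variance = flat surface
    let dominance = plane.dominantPlaneRatio                    // one plane fills the frame
    let uniformity = 1.0 - min(stats.depthRange / 2.0, 1.0)    // narrow depth range

    let score = flatness * 0.30 +
                dominance * 0.25 +
                uniformity * 0.25 +
                edgeSharpness * 0.20

    return ScreenVerdict(isLikelyScreen: score > 0.70,
                         confidence: abs(score - 0.50) * 2)
}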

Integrating with CPP (Content Provenance Protocol)

VeraSnap implements the Content Provenance Protocol (CPP), an open standard for cryptographic evidence capture. In CPP v1.4, we're adding depth analysis as an optional extension:

{
  "SensorData": {
    "GPS": { "...": "..." },
    "Accelerometer": ["..."],
    "DepthAnalysis": {
      "Available": true,
      "SensorType": "LiDAR",
      "FrameTimestamp": "2026-01-29T10:30:00.123Z",
      "Resolution": {
        "Width": 256,
        "Height": 192
      },
      "Statistics": {
        "MinDepth": 0.45,
        "MaxDepth": 3.82,
        "MeanDepth": 1.23,
        "StdDeviation": 0.87,
        "DepthRange": 3.37,
        "ValidPixelRatio": 0.92
      },
      "PlaneAnalysis": {
        "DominantPlaneRatio": 0.15,
        "PlaneCount": 3
      },
      "ScreenDetection": {
        "IsLikelyScreen": false,
        "Confidence": 0.95,
        "Indicators": {
          "FlatnessScore": 0.12,
          "DepthUniformity": 0.08,
          "EdgeSharpness": 0.25
        }
      },
      "AnalysisHash": "sha256:abc123..."
    }
  }
}

Key design decisions:

  • Optional extension: works on LiDAR devices, gracefully degrades on others
  • Statistics only: the raw depth map is NOT stored (privacy)
  • Hash proof: AnalysisHash proves the computation without storing raw data
  • 100ms timing: the depth frame must be captured within 100ms of the photo (see the sketch below)
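The timing rule deserves a sketch of its own, since a stale depth frame would undermine the whole claim. A minimal version, assuming both timestamps have already been reconciled onto the same clock (the function and constant names are illustrative):

import Foundation

// Sketch of the 100ms rule: only attach a depth frame captured close enough
// to the photo. A real pipeline must reconcile the camera and ARKit clocks first.
let maxDepthToPhotoSkew: TimeInterval = 0.100

func depthFrameIsUsable(photoTimestamp: Date, depthTimestamp: Date) -> Bool {
    abs(depthTimestamp.timeIntervalSince(photoTimestamp)) <= maxDepthToPhotoSkew
}

If the check fails, it's safer to record no depth evidence at all than to attach a stale frame.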

iOS Implementation Sketch

Here's how you'd capture depth data alongside a photo on iOS:

import ARKit
import AVFoundation

class DepthCaptureService {
    private var arSession: ARSession?

    func captureDepthFrame() async throws -> DepthAnalysisResult {
        guard ARWorldTrackingConfiguration.supportsSceneReconstruction(.mesh) else {
            return DepthAnalysisResult(
                available: false,
                unavailableReason: .sensorNotAvailable
            )
        }

        // Get current ARFrame with depth
        guard let frame = arSession?.currentFrame,
              let depthMap = frame.sceneDepth?.depthMap else {
            return DepthAnalysisResult(
                available: false,
                unavailableReason: .captureFailed
            )
        }

        // Analyze depth statistics
        let stats = analyzeDepthStatistics(depthMap)
        let planeAnalysis = detectPlanes(depthMap)
        let screenDetection = evaluateScreenLikelihood(stats, planeAnalysis)

        // Hash raw depth for proof (don't store raw data)
        let depthHash = hashDepthMap(depthMap)

        return DepthAnalysisResult(
            available: true,
            sensorType: .lidar,
            frameTimestamp: Date(),
            resolution: Resolution(
                width: CVPixelBufferGetWidth(depthMap),
                height: CVPixelBufferGetHeight(depthMap)
            ),
            statistics: stats,
            planeAnalysis: planeAnalysis,
            screenDetection: screenDetection,
            analysisHash: depthHash
        )
    }

    private func analyzeDepthStatistics(_ depthMap: CVPixelBuffer) -> DepthStatistics {
        // Lock buffer and extract depth values
        CVPixelBufferLockBaseAddress(depthMap, .readOnly)
        defer { CVPixelBufferUnlockBaseAddress(depthMap, .readOnly) }

        let width = CVPixelBufferGetWidth(depthMap)
        let height = CVPixelBufferGetHeight(depthMap)
        let bytesPerRow = CVPixelBufferGetBytesPerRow(depthMap)
        let baseAddress = CVPixelBufferGetBaseAddress(depthMap)!

        var validDepths: [Float] = []

        for y in 0..<height {
            // Respect the buffer's row stride, which can include alignment padding
            let row = baseAddress.advanced(by: y * bytesPerRow).assumingMemoryBound(to: Float32.self)
            for x in 0..<width {
                let depth = row[x]
                if depth.isFinite && depth > 0 {
                    validDepths.append(depth)
                }
            }
        }

        // Guard against an empty sample set (e.g. a fully occluded sensor)
        guard !validDepths.isEmpty else {
            return DepthStatistics(
                minDepth: 0, maxDepth: 0, meanDepth: 0,
                stdDeviation: 0, depthRange: 0, validPixelRatio: 0
            )
        }

        let minDepth = validDepths.min() ?? 0
        let maxDepth = validDepths.max() ?? 0
        let meanDepth = validDepths.reduce(0, +) / Float(validDepths.count)
        let variance = validDepths.map { ($0 - meanDepth) * ($0 - meanDepth) }.reduce(0, +) / Float(validDepths.count)
        let stdDev = variance.squareRoot()

        return DepthStatistics(
            minDepth: minDepth,
            maxDepth: maxDepth,
            meanDepth: meanDepth,
            stdDeviation: stdDev,
            depthRange: maxDepth - minDepth,
            validPixelRatio: Float(validDepths.count) / Float(width * height)
        )
    }
}
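The hashDepthMap helper in the sketch above is left undefined. One way to implement it, assuming the single-plane Float32 buffer that ARKit's sceneDepth provides, is to feed the raw bytes through CryptoKit's SHA-256 before the buffer is discarded, row by row so alignment padding never affects the digest:

import CryptoKit
import CoreVideo
import Foundation

// Sketch: hash the raw depth buffer so the analysis can be proven later
// without ever persisting the depth data itself.
func hashDepthMap(_ depthMap: CVPixelBuffer) -> String {
    CVPixelBufferLockBaseAddress(depthMap, .readOnly)
    defer { CVPixelBufferUnlockBaseAddress(depthMap, .readOnly) }

    guard let base = CVPixelBufferGetBaseAddress(depthMap) else { return "" }

    let width = CVPixelBufferGetWidth(depthMap)
    let height = CVPixelBufferGetHeight(depthMap)
    let bytesPerRow = CVPixelBufferGetBytesPerRow(depthMap)
    let usedBytesPerRow = width * MemoryLayout<Float32>.size

    var hasher = SHA256()
    for row in 0..<height {
        // Hash only the pixels in each row, skipping any alignment padding
        let rowStart = base.advanced(by: row * bytesPerRow)
        hasher.update(data: Data(bytes: rowStart, count: usedBytesPerRow))
    }

    let digest = hasher.finalize()
    return "sha256:" + digest.map { String(format: "%02x", $0) }.joined()
}

The resulting string slots into the AnalysisHash field shown in the CPP example above.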

Supported Devices

LiDAR sensor (full support):

  • iPhone 12/13/14/15/16 Pro & Pro Max
  • iPad Pro (2020+)

TrueDepth sensor (front camera only):

  • iPhone X and later

No depth sensor:

  • Non-Pro iPhones (rear camera)
  • Graceful degradation: DepthAnalysis.Available = false (capability check sketched below)
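A capability check along these lines decides which path a device takes. This sketch relies on ARKit's public support flags; the enum and function are illustrative, not VeraSnap's actual API:

import ARKit

// Illustrative capability tiers, mirroring the device list above.
enum DepthCapability {
    case lidar          // rear LiDAR scanner: full screen-detection support
    case trueDepthOnly  // front TrueDepth camera only
    case none           // no depth sensor: DepthAnalysis.Available = false
}

func detectDepthCapability() -> DepthCapability {
    // The .sceneDepth frame semantics are only supported on LiDAR-equipped devices
    if ARWorldTrackingConfiguration.supportsFrameSemantics(.sceneDepth) {
        return .lidar
    }
    // TrueDepth devices can still provide front-camera depth via face tracking
    if ARFaceTrackingConfiguration.isSupported {
        return .trueDepthOnly
    }
    return .none
}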

Privacy-First Design

We took privacy seriously in this implementation:

  1. Raw depth maps are NEVER stored: only statistical summaries
  2. No biometric extraction: no facial features, no 3D reconstruction
  3. Local processing only: depth analysis runs entirely on-device
  4. Hash-based proof: AnalysisHash proves computation integrity

LiDAR Data → Statistical Analysis → Screen Detection
     │              │                    │
     │              ▼                    ▼
     │         Statistics           Verdict
     │              │                    │
     ▼              └────────┬───────────┘
  SHA-256                    │
     │                       ▼
     ▼               Recorded in
  AnalysisHash         CPP Event
  (proof only)

  ※ Raw depth data is NEVER stored or transmitted

Limitations & False Positives

This isn't a silver bullet. False positives can occur with:

  • Printed photos on flat surfaces
  • Flat artwork and paintings
  • Wall-mounted pictures

That's why we report confidence levels and recommend human review for high-stakes verification. The system provides additional evidence, not definitive proof.

What's Next

We're planning several enhancements:

  • Texture analysis: detect moiré patterns from screen photography
  • Temporal analysis: track depth consistency across video frames
  • Android support: ToF sensors on Samsung, Huawei, etc.
  • ML models: train on screen vs. real-world depth signatures

Try It Out

VeraSnap is available on the App Store. The depth analysis feature will be available in an upcoming release as a Pro feature.

Open source specs:

Conclusion

As generative AI makes fake images increasingly convincing, the ability to prove that a photograph captures a real-world sceneβ€”not a screen displayβ€”becomes essential.

By leveraging LiDAR sensors already in millions of iPhones, we can bring professional-grade authenticity technology to everyone. No $6,000 camera required. 🚀


Have questions about implementing depth-based screen detection? Drop a comment below or reach out on the CPP GitHub.

Built by VeritasChain Co., Ltd.
