Building Screen Capture Detection with LiDAR: How We're Fighting Fake Evidence in VeraSnap

The Problem: When Cryptographic Proof Isn't Enough

Picture this scenario: An attacker generates a fake image using AI, displays it on a monitor, and photographs the screen with an evidence camera app. The result? A cryptographically signed "proof" that the fake image was legitimately captured. 😱

This is called a screen capture attack, and it's a blind spot in traditional content provenance systems.

At VeraSnap, we've been building a cryptographic evidence camera that proves when and by what device media was captured. But proving what was captured, whether it's a real-world object or just a screen, requires something more: depth sensing.

Sony's Approach: 3D Depth in Professional Cameras

Sony recently launched their Authenticity Camera Solution for news organizations. Their key innovation? Using 3D depth information captured simultaneously with the image to detect if the subject is a real object or a screen display.

The technology is available on professional cameras like the α1 II and α9 III, starting at $6,000+. Great for Reuters, not so great for the average person documenting a car accident for insurance. 📸

Our goal: Bring this same capability to consumer smartphones using the LiDAR sensor in iPhone Pro models.

Why LiDAR Works for Screen Detection

Screens and real-world scenes have fundamentally different depth characteristics:

Real-world scenes:

  • Multiple objects at varying distances
  • Irregular depth patterns
  • Wide depth range (meters)

Screen displays:

  • Single flat plane
  • Uniform depth across the surface
  • Narrow depth range (centimeters)
  • Sharp rectangular edges (the bezel)

Here's what the depth data looks like:

Real Scene (outdoor landscape)     Screen (monitor display)
┌────────────────────────────┐     ┌────────────────────────────┐
│ ░░▓▓████░░▓▓░░████▓▓░░████ │     │ ██████████████████████████ │
│ ░▓▓███▓▓░░▓▓████░░▓▓████░░ │     │ ██████████████████████████ │
│ ▓▓███░░▓▓████░░▓▓████░░▓▓█ │     │ ██████████████████████████ │
│ ███░░▓▓████░░▓▓████░░▓▓███ │     │ ██████████████████████████ │
└────────────────────────────┘     └────────────────────────────┘
StdDev: 1.4m                       StdDev: 0.02m
Depth Range: 4.4m                  Depth Range: 0.06m
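Those summary numbers are all the detection logic really needs. As a rough illustration with made-up sample values (none of this is VeraSnap code), the gap shows up as soon as you compute the spread of a handful of depth readings:

import Foundation

// Illustrative only: summarise the spread of a set of depth readings (in metres).
func depthSpread(_ depths: [Float]) -> (stdDev: Float, range: Float) {
    guard !depths.isEmpty else { return (0, 0) }
    let mean = depths.reduce(0, +) / Float(depths.count)
    let variance = depths.map { ($0 - mean) * ($0 - mean) }.reduce(0, +) / Float(depths.count)
    return (variance.squareRoot(), depths.max()! - depths.min()!)
}

// Hypothetical samples: an outdoor scene vs. a monitor about half a metre away.
let outdoorScene: [Float] = [0.8, 1.6, 2.3, 3.4, 4.1, 5.2]
let monitor: [Float]      = [0.50, 0.51, 0.52, 0.50, 0.51, 0.52]

print(depthSpread(outdoorScene)) // large spread: metres of variation
print(depthSpread(monitor))      // tiny spread: a few centimetres at most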

The Detection Algorithm

We use four weighted criteria to determine if a subject is likely a screen:

def is_likely_screen(analysis: DepthAnalysis) -> tuple[bool, float]:
    """
    Determine if the subject is likely a digital screen.

    Returns:
        (is_screen: bool, confidence: float)
    """
    stats = analysis.statistics
    plane = analysis.plane_analysis

    # Criterion 1: Low depth variance = flat surface
    flatness_score = 1.0 - min(stats.std_deviation / 0.5, 1.0)

    # Criterion 2: Dominant plane covers most of frame
    plane_dominance = plane.dominant_plane_ratio

    # Criterion 3: Narrow depth range
    depth_uniformity = 1.0 - min(stats.depth_range / 2.0, 1.0)

    # Criterion 4: Sharp rectangular edges (bezel)
    edge_sharpness = detect_rectangular_edges(analysis.raw_depth)

    # Weighted score
    score = (
        flatness_score * 0.30 +
        plane_dominance * 0.25 +
        depth_uniformity * 0.25 +
        edge_sharpness * 0.20
    )

    is_screen = score > 0.70
    confidence = abs(score - 0.50) * 2

    return is_screen, confidence

The threshold of 0.70 was calibrated against real-world test scenarios.
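The iOS sketch later in this post calls an evaluateScreenLikelihood helper without showing it. In Swift, the same weighted check could look roughly like this; the trimmed structs, the function name, and the separately supplied edgeSharpness value are stand-ins for illustration, not VeraSnap's actual types:

// Trimmed stand-ins for just the fields this sketch needs.
struct DepthStats { let stdDeviation: Float; let depthRange: Float }
struct PlaneStats { let dominantPlaneRatio: Float }
struct ScreenVerdict { let isLikelyScreen: Bool; let confidence: Float }

// Sketch: same criteria, weights, and 0.70 threshold as the Python version above.
// edgeSharpness is assumed to come from a separate rectangular-edge pass.
func screenLikelihood(stats: DepthStats, plane: PlaneStats, edgeSharpness: Float) -> ScreenVerdict {
    let flatness = 1.0 - min(stats.stdDeviation / 0.5, 1.0)    // low variance = flat surface
    let dominance = plane.dominantPlaneRatio                    // one plane fills the frame
    let uniformity = 1.0 - min(stats.depthRange / 2.0, 1.0)    // narrow depth range

    let score = flatness * 0.30 +
                dominance * 0.25 +
                uniformity * 0.25 +
                edgeSharpness * 0.20

    return ScreenVerdict(isLikelyScreen: score > 0.70,
                         confidence: abs(score - 0.50) * 2)
}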

Integrating with CPP (Content Provenance Protocol)

VeraSnap implements the Content Provenance Protocol (CPP), an open standard for cryptographic evidence capture. In CPP v1.4, we're adding depth analysis as an optional extension:

{
  "SensorData": {
    "GPS": { "...": "..." },
    "Accelerometer": ["..."],
    "DepthAnalysis": {
      "Available": true,
      "SensorType": "LiDAR",
      "FrameTimestamp": "2026-01-29T10:30:00.123Z",
      "Resolution": {
        "Width": 256,
        "Height": 192
      },
      "Statistics": {
        "MinDepth": 0.45,
        "MaxDepth": 3.82,
        "MeanDepth": 1.23,
        "StdDeviation": 0.87,
        "DepthRange": 3.37,
        "ValidPixelRatio": 0.92
      },
      "PlaneAnalysis": {
        "DominantPlaneRatio": 0.15,
        "PlaneCount": 3
      },
      "ScreenDetection": {
        "IsLikelyScreen": false,
        "Confidence": 0.95,
        "Indicators": {
          "FlatnessScore": 0.12,
          "DepthUniformity": 0.08,
          "EdgeSharpness": 0.25
        }
      },
      "AnalysisHash": "sha256:abc123..."
    }
  }
}

Key design decisions:

  • Optional extension: works on LiDAR devices, gracefully degrades on others
  • Statistics only: the raw depth map is NOT stored (privacy)
  • Hash proof: AnalysisHash proves the computation without storing raw data
  • 100ms timing: the depth frame must be captured within 100ms of the photo (see the sketch below)
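The timing rule deserves a sketch of its own, since a stale depth frame would undermine the whole claim. A minimal version, assuming both timestamps have already been reconciled onto the same clock (the function and constant names are illustrative):

import Foundation

// Sketch of the 100ms rule: only attach a depth frame captured close enough
// to the photo. A real pipeline must reconcile the camera and ARKit clocks first.
let maxDepthToPhotoSkew: TimeInterval = 0.100

func depthFrameIsUsable(photoTimestamp: Date, depthTimestamp: Date) -> Bool {
    abs(depthTimestamp.timeIntervalSince(photoTimestamp)) <= maxDepthToPhotoSkew
}

If the check fails, it's safer to record no depth evidence at all than to attach a stale frame.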

iOS Implementation Sketch

Here's how you'd capture depth data alongside a photo on iOS:

import ARKit
import AVFoundation

class DepthCaptureService {
    private var arSession: ARSession?

    func captureDepthFrame() async throws -> DepthAnalysisResult {
        guard ARWorldTrackingConfiguration.supportsSceneReconstruction(.mesh) else {
            return DepthAnalysisResult(
                available: false,
                unavailableReason: .sensorNotAvailable
            )
        }

        // Get current ARFrame with depth
        guard let frame = arSession?.currentFrame,
              let depthMap = frame.sceneDepth?.depthMap else {
            return DepthAnalysisResult(
                available: false,
                unavailableReason: .captureFailed
            )
        }

        // Analyze depth statistics
        let stats = analyzeDepthStatistics(depthMap)
        let planeAnalysis = detectPlanes(depthMap)
        let screenDetection = evaluateScreenLikelihood(stats, planeAnalysis)

        // Hash raw depth for proof (don't store raw data)
        let depthHash = hashDepthMap(depthMap)

        return DepthAnalysisResult(
            available: true,
            sensorType: .lidar,
            frameTimestamp: Date(),
            resolution: Resolution(
                width: CVPixelBufferGetWidth(depthMap),
                height: CVPixelBufferGetHeight(depthMap)
            ),
            statistics: stats,
            planeAnalysis: planeAnalysis,
            screenDetection: screenDetection,
            analysisHash: depthHash
        )
    }

    private func analyzeDepthStatistics(_ depthMap: CVPixelBuffer) -> DepthStatistics {
        // Lock buffer and extract depth values
        CVPixelBufferLockBaseAddress(depthMap, .readOnly)
        defer { CVPixelBufferUnlockBaseAddress(depthMap, .readOnly) }

        let width = CVPixelBufferGetWidth(depthMap)
        let height = CVPixelBufferGetHeight(depthMap)
        let bytesPerRow = CVPixelBufferGetBytesPerRow(depthMap)
        let baseAddress = CVPixelBufferGetBaseAddress(depthMap)!

        var validDepths: [Float] = []

        for y in 0..<height {
            // Respect the buffer's row stride, which can include alignment padding
            let row = baseAddress.advanced(by: y * bytesPerRow).assumingMemoryBound(to: Float32.self)
            for x in 0..<width {
                let depth = row[x]
                if depth.isFinite && depth > 0 {
                    validDepths.append(depth)
                }
            }
        }

        // Guard against an empty sample set (e.g. a fully occluded sensor)
        guard !validDepths.isEmpty else {
            return DepthStatistics(
                minDepth: 0, maxDepth: 0, meanDepth: 0,
                stdDeviation: 0, depthRange: 0, validPixelRatio: 0
            )
        }

        let minDepth = validDepths.min() ?? 0
        let maxDepth = validDepths.max() ?? 0
        let meanDepth = validDepths.reduce(0, +) / Float(validDepths.count)
        let variance = validDepths.map { ($0 - meanDepth) * ($0 - meanDepth) }.reduce(0, +) / Float(validDepths.count)
        let stdDev = variance.squareRoot()

        return DepthStatistics(
            minDepth: minDepth,
            maxDepth: maxDepth,
            meanDepth: meanDepth,
            stdDeviation: stdDev,
            depthRange: maxDepth - minDepth,
            validPixelRatio: Float(validDepths.count) / Float(width * height)
        )
    }
}
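The hashDepthMap helper in the sketch above is left undefined. One way to implement it, assuming the single-plane Float32 buffer that ARKit's sceneDepth provides, is to feed the raw bytes through CryptoKit's SHA-256 before the buffer is discarded, row by row so alignment padding never affects the digest:

import CryptoKit
import CoreVideo
import Foundation

// Sketch: hash the raw depth buffer so the analysis can be proven later
// without ever persisting the depth data itself.
func hashDepthMap(_ depthMap: CVPixelBuffer) -> String {
    CVPixelBufferLockBaseAddress(depthMap, .readOnly)
    defer { CVPixelBufferUnlockBaseAddress(depthMap, .readOnly) }

    guard let base = CVPixelBufferGetBaseAddress(depthMap) else { return "" }

    let width = CVPixelBufferGetWidth(depthMap)
    let height = CVPixelBufferGetHeight(depthMap)
    let bytesPerRow = CVPixelBufferGetBytesPerRow(depthMap)
    let usedBytesPerRow = width * MemoryLayout<Float32>.size

    var hasher = SHA256()
    for row in 0..<height {
        // Hash only the pixels in each row, skipping any alignment padding
        let rowStart = base.advanced(by: row * bytesPerRow)
        hasher.update(data: Data(bytes: rowStart, count: usedBytesPerRow))
    }

    let digest = hasher.finalize()
    return "sha256:" + digest.map { String(format: "%02x", $0) }.joined()
}

The resulting string slots into the AnalysisHash field shown in the CPP example above.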

Supported Devices

LiDAR sensor (full support):

  • iPhone 12/13/14/15/16 Pro & Pro Max
  • iPad Pro (2020+)

TrueDepth sensor (front camera only):

  • iPhone X and later

No depth sensor:

  • Non-Pro iPhones (rear camera)
  • Graceful degradation: DepthAnalysis.Available = false (capability check sketched below)
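A capability check along these lines decides which path a device takes. This sketch relies on ARKit's public support flags; the enum and function are illustrative, not VeraSnap's actual API:

import ARKit

// Illustrative capability tiers, mirroring the device list above.
enum DepthCapability {
    case lidar          // rear LiDAR scanner: full screen-detection support
    case trueDepthOnly  // front TrueDepth camera only
    case none           // no depth sensor: DepthAnalysis.Available = false
}

func detectDepthCapability() -> DepthCapability {
    // The .sceneDepth frame semantics are only supported on LiDAR-equipped devices
    if ARWorldTrackingConfiguration.supportsFrameSemantics(.sceneDepth) {
        return .lidar
    }
    // TrueDepth devices can still provide front-camera depth via face tracking
    if ARFaceTrackingConfiguration.isSupported {
        return .trueDepthOnly
    }
    return .none
}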

Privacy-First Design

We took privacy seriously in this implementation:

  1. Raw depth maps are NEVER stored: only statistical summaries
  2. No biometric extraction: no facial features, no 3D reconstruction
  3. Local processing only: depth analysis runs entirely on-device
  4. Hash-based proof: AnalysisHash proves computation integrity

LiDAR Data → Statistical Analysis → Screen Detection
     │              │                    │
     │              ▼                    ▼
     │         Statistics           Verdict
     │              │                    │
     ▼              └────────┬───────────┘
  SHA-256                    │
     │                       ▼
     ▼               Recorded in
  AnalysisHash         CPP Event
  (proof only)

  ※ Raw depth data is NEVER stored or transmitted

Limitations & False Positives

This isn't a silver bullet. False positives can occur with:

  • Printed photos on flat surfaces
  • Flat artwork and paintings
  • Wall-mounted pictures

That's why we report confidence levels and recommend human review for high-stakes verification. The system provides additional evidence, not definitive proof.

What's Next

We're planning several enhancements:

  • Texture analysis: detect moiré patterns from screen photography
  • Temporal analysis: track depth consistency across video frames
  • Android support: ToF sensors on Samsung, Huawei, etc.
  • ML models: train on screen vs. real-world depth signatures

Try It Out

VeraSnap is available on the App Store. The depth analysis feature will be available in an upcoming release as a Pro feature.

Open source specs:

Conclusion

As generative AI makes fake images increasingly convincing, the ability to prove that a photograph captures a real-world sceneβ€”not a screen displayβ€”becomes essential.

By leveraging LiDAR sensors already in millions of iPhones, we can bring professional-grade authenticity technology to everyone. No $6,000 camera required. 🚀


Have questions about implementing depth-based screen detection? Drop a comment below or reach out on the CPP GitHub.

Built by VeritasChain Co., Ltd.
