In Part 1, I built a zero-copy GPU compute pipeline for camera frames. In Part 2, I got Apple Log HDR working with LUT color grading. This post covers what happened next: getting LiDAR depth data into the same GPU pipeline, the format wars that ensued, and submitting my first upstream patch to Dawn (Google's WebGPU implementation).
The Goal
iPhone Pro models have LiDAR sensors that produce real-time 320×180 depth maps at 60fps. I wanted to feed that depth data into the same WebGPU compute pipeline as the camera video — one shader that reads both the camera frame and the depth map, producing a depth-colored overlay blended with the live video.
The API I wanted:
```ts
const { currentFrame } = useGPUFrameProcessor(camera, {
  resources: {
    depth: GPUResource.cameraDepth(),
  },
  pipeline: (frame, { depth }) => {
    'worklet';
    frame.runShader(DEPTH_COLORMAP_WGSL, { inputs: { depth } });
  },
});
```
Declare a dynamic depth resource. Bind it as a shader input. The native side handles AVCaptureDepthDataOutput, frame synchronization, and per-frame texture updates. The user writes a WGSL shader and gets LiDAR data.
Getting there took longer than expected.
The Apple Log Color Space Saga
Before depth, I had to fix something from Part 2: Apple Log wasn't actually Apple Log.
"Why Doesn't It Look Washed Out?"
Apple Log footage should look flat and desaturated — that's the whole point of a log encoding. But our output looked identical to standard sRGB. Same contrast, same saturation. The Apple Log → Rec.709 LUT produced oversaturated garbage because it expected flat input.
I spent hours adjusting the YUV→RGB shader math. Full range vs video range. BT.709 vs BT.2020 matrix. Different normalization constants. Nothing changed the look. Then I added CVPixelBuffer attachment logging:
```
[DawnPipeline] CVPixelBuffer transfer=ITU_R_2100_HLG, matrix=ITU_R_2020, primaries=ITU_R_2020
```
HLG. Not Apple Log. iOS was delivering HLG-encoded frames despite activeColorSpace = .appleLog on the capture device. The camera was converting Apple Log to HLG before handing frames to AVCaptureVideoDataOutput.
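For the record, the range/matrix knobs I was adjusting boil down to this — a minimal BT.709 video-range YUV→RGB sketch in C++ (standard constants; this is illustrative, not code from the pipeline):

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cmath>

// BT.709 video-range YUV -> RGB. In 8-bit terms, luma is coded 16..235 and
// chroma 16..240; "full range" would skip the offsets and use /255 scaling.
std::array<float, 3> yuvToRgb709Video(float y8, float u8, float v8) {
    float y = (y8 - 16.0f) / 219.0f;   // expand video-range luma to 0..1
    float u = (u8 - 128.0f) / 224.0f;  // center and scale chroma
    float v = (v8 - 128.0f) / 224.0f;
    float r = y + 1.5748f * v;         // BT.709 matrix coefficients
    float g = y - 0.1873f * u - 0.4681f * v;
    float b = y + 1.8556f * u;
    auto c = [](float x) { return std::clamp(x, 0.0f, 1.0f); };
    return {c(r), c(g), c(b)};
}
```

Changing any of these knobs alters contrast visibly — which is exactly why "nothing changed the look" pointed away from the shader math and toward the input data itself.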
Spying on Blackmagic Camera
I had a theory but no proof. Then I did something useful: opened Console.app, filtered for Blackmagic Camera's process, and looked at their AVCaptureSession configuration:
```
VC: <SRC:Wide back x422/3840x2160, ColorSpace:3, ...>
```
ColorSpace:3 — that's Apple Log. Our session showed ColorSpace:2 — HLG. Same device, same format, different color space. What were they doing differently?
Two things:
- `automaticallyConfiguresCaptureDeviceForWideColor = false` on the session — this prevents iOS from overriding the color space
- `autoConfig: 0` in their session configuration
Adding one line fixed it:
```swift
session.automaticallyConfiguresCaptureDeviceForWideColor = false
```
Suddenly the footage was flat, washed out, and beautiful. The LUT worked perfectly. I saved a feedback memory: "Disable automaticallyConfiguresCaptureDeviceForWideColor for Apple Log; session overrides color space otherwise."
The Format Wars: 4:2:0 vs 4:2:2
But the color space wasn't the only issue. The camera's native format was x422 (10-bit 4:2:2 YUV) — half chroma subsampling horizontally, full vertically. We were requesting x420 (4:2:0 — half in both dimensions), forcing AVFoundation to convert. That conversion was likely applying the HLG transform.
Dawn's multi-planar support has separate feature flags per format:
- `DawnMultiPlanarFormats` — enables 8-bit NV12 (`R8BG8Biplanar420Unorm`)
- `MultiPlanarFormatP010` — enables 10-bit 4:2:0 (`R10X6BG10X6Biplanar420Unorm`)
- `MultiPlanarFormatP210` — enables 10-bit 4:2:2 (`R10X6BG10X6Biplanar422Unorm`)
We had P010 enabled but not P210. Requesting the native 4:2:2 format required enabling P210 — and adjusting the UV plane dimensions in the shader (half-width, full-height instead of half-both).
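The plane-dimension difference is simple to state in code — a sketch (names are mine, not from the pipeline):

```cpp
#include <cassert>
#include <cstdint>
#include <utility>

// Chroma (UV) plane dimensions for a given luma plane size.
// 4:2:0 halves chroma in both axes; 4:2:2 halves it horizontally only.
enum class Subsampling { k420, k422 };

std::pair<uint32_t, uint32_t> chromaPlaneDims(uint32_t lumaW, uint32_t lumaH,
                                              Subsampling s) {
    uint32_t w = (lumaW + 1) / 2;  // half width in both schemes
    uint32_t h = (s == Subsampling::k420) ? (lumaH + 1) / 2 : lumaH;
    return {w, h};
}
```

For a 3840×2160 luma plane, 4:2:0 gives a 1920×1080 UV plane while 4:2:2 gives 1920×2160 — hardcoding the former is exactly the bug a 4:2:2 shader path has to avoid.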
Lesson I saved as permanent feedback: "Verify ALL required Dawn feature flags when adding new GPU format support; Dawn silently returns zeros for unsupported formats."
Upstream Progress
The multi-planar feature flags (P010, P210) and the getRecorder() API I needed for canvas overlays were both merged into react-native-skia upstream (PR #3753 and PR #3751). The 1×1 throwaway surface hack from Part 2 — creating a dummy offscreen surface just to extract the thread-local Recorder* — is now replaced with a clean ctx.getRecorder() call. Small wins that clean up the codebase.
Adding LiDAR Depth
With Apple Log solid, I moved to depth. The design was straightforward:
- When `GPUResource.cameraDepth()` appears in `resources`, add `AVCaptureDepthDataOutput` to the session
- Use a separate `DepthDelegate` that caches the latest depth frame
- The video `FrameDelegate` grabs the cached depth and passes both to `processFrame`
- Import the depth IOSurface as an `R16Float` texture via Dawn's `SharedTextureMemory`
- Bind it as a dynamic per-frame input in the depth shader
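The "caches the latest depth frame" part is just a mutex-guarded slot that the depth callback overwrites and the video callback reads. A minimal C++ sketch of that pattern (the real code lives in Swift delegates; this class and its names are illustrative):

```cpp
#include <cassert>
#include <mutex>
#include <optional>
#include <utility>

// Latest-value cache: the depth callback overwrites, the video callback reads
// whatever is newest. No queue -- stale depth frames are simply dropped.
template <typename Frame>
class LatestFrameCache {
public:
    void store(Frame f) {
        std::lock_guard<std::mutex> lock(mu_);
        latest_ = std::move(f);
    }
    std::optional<Frame> load() const {
        std::lock_guard<std::mutex> lock(mu_);
        return latest_;  // empty until the first depth frame arrives
    }
private:
    mutable std::mutex mu_;
    std::optional<Frame> latest_;
};
```

Dropping stale frames is the right call here: depth at 60fps is refreshed faster than it can go meaningfully stale, and queueing would only add latency.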
Each step had its own set of problems.
Problem 1: The LiDAR Device
The LiDAR depth camera is a separate AVCaptureDevice — .builtInLiDARDepthCamera. It captures both video and depth, but its video output is 8-bit NV12 (420v) instead of the wide-angle camera's BGRA. The pipeline needed a new code path: NV12→RGB conversion with BT.709 matrix and video range expansion.
But the shader selection was tied to useDepth, not the actual pixel format. When the JS-side depth resource parsing had a timing issue, useDepth was false even though the LiDAR camera was delivering NV12. The BGRA passthrough shader ran on YUV data — bind group mismatch, silent black output.
The fix was adding a lidarYUV flag set in Swift based on the actual camera device, passed through the ObjC++ bridge to C++, independent of whether depth data is also being captured.
Problem 2: Dawn Doesn't Know About Depth
This was the big one. Everything worked: frames arrived, the depth CVPixelBuffer had a valid IOSurface, SharedTextureMemory imported it, CreateTexture with R16Float succeeded, BeginAccess succeeded. But textureLoad in the shader returned all zeros.
Sound familiar? Same pattern as Apple Log without the P010 feature. Dawn silently accepts the import but can't actually read the data.
I fetched Dawn's source code from dawn.googlesource.com and found GetFormatEquivalentToIOSurfaceFormat() in SharedTextureMemoryMTL.mm. It's a switch statement mapping CVPixelFormat codes to Dawn TextureFormats. 17 formats are mapped — every video format you'd expect: BGRA, NV12, P010, P210, etc.
kCVPixelFormatType_DepthFloat16 (hdep) is not in the list.
The depth IOSurface has format code 0x68646570. Dawn doesn't recognize it. It imports the IOSurface handle successfully (it's still a valid IOSurface), creates the texture object, and even succeeds at BeginAccess. But the underlying Metal texture can't be properly configured because Dawn doesn't know the pixel format — so textureLoad returns zeros.
The irony: Dawn already maps kCVPixelFormatType_OneComponent16Half → R16Float. The depth format DepthFloat16 has identical memory layout — single-channel, 16-bit half-float. Only the FourCC tag differs.
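A quick way to make sense of codes like 0x68646570: a FourCC is just four ASCII bytes packed into a 32-bit integer. A small decoder (my own helper, not Dawn's):

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Decode a 32-bit FourCC into its four ASCII characters, high byte first.
std::string fourccToString(uint32_t code) {
    std::string s(4, '\0');
    s[0] = static_cast<char>((code >> 24) & 0xFF);
    s[1] = static_cast<char>((code >> 16) & 0xFF);
    s[2] = static_cast<char>((code >> 8) & 0xFF);
    s[3] = static_cast<char>(code & 0xFF);
    return s;
}
```

Feeding it 0x68646570 yields "hdep" — the depth format tag Dawn doesn't map.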
The CPU Upload Workaround
For now, I bypass SharedTextureMemory entirely for depth:
```cpp
// CPU readback + WriteTexture (115KB at 320x180 — negligible)
CVPixelBufferLockBaseAddress(depthBuffer, kCVPixelBufferLock_ReadOnly);
const void* data = CVPixelBufferGetBaseAddress(depthBuffer);
size_t bpr = CVPixelBufferGetBytesPerRow(depthBuffer);

wgpu::TexelCopyTextureInfo dst{};
dst.texture = depthTexture;  // persistent, created once

wgpu::TexelCopyBufferLayout layout{};
layout.bytesPerRow = (uint32_t)bpr;
layout.rowsPerImage = (uint32_t)depthH;

wgpu::Extent3D extent{(uint32_t)depthW, (uint32_t)depthH, 1};
device.GetQueue().WriteTexture(&dst, data, bpr * depthH, &layout, &extent);
CVPixelBufferUnlockBaseAddress(depthBuffer, kCVPixelBufferLock_ReadOnly);
```
320×180 × 2 bytes = 115KB per frame. At 60fps, that's 6.9MB/sec of CPU→GPU transfer. On a device that does 25GB/sec bandwidth, this is invisible. The texture is persistent (created once, reused every frame), so there's no allocation overhead either.
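The arithmetic, spelled out (assuming tightly packed rows and decimal KB/MB):

```cpp
#include <cassert>
#include <cstdint>

// Per-frame and per-second cost of the CPU depth upload.
constexpr uint32_t kDepthW = 320;
constexpr uint32_t kDepthH = 180;
constexpr uint32_t kBytesPerTexel = 2;  // R16Float: one 16-bit half per texel
constexpr uint32_t kFps = 60;

constexpr uint32_t kBytesPerFrame = kDepthW * kDepthH * kBytesPerTexel;
constexpr uint32_t kBytesPerSec = kBytesPerFrame * kFps;
```

That works out to 115,200 bytes per frame and roughly 6.9MB per second — three orders of magnitude below the memory bandwidth it competes with.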
Zero-copy would be nicer. Which brings us to...
The Dawn Upstream Contribution
Four lines of code. That's all it takes:
```cpp
case kCVPixelFormatType_DepthFloat16:
    return wgpu::TextureFormat::R16Float;
case kCVPixelFormatType_DepthFloat32:
    return wgpu::TextureFormat::R32Float;
case kCVPixelFormatType_DisparityFloat16:
    return wgpu::TextureFormat::R16Float;
case kCVPixelFormatType_DisparityFloat32:
    return wgpu::TextureFormat::R32Float;
```
Added to the switch in GetFormatEquivalentToIOSurfaceFormat(), right before the default case. Same file, same pattern as every other format mapping.
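The shape of the mapping, reduced to a standalone sketch — real Dawn returns `wgpu::TextureFormat` values and switches on CoreVideo constants; here raw FourCC values and strings stand in so it runs anywhere:

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Standalone sketch of Dawn's IOSurface-format -> texture-format switch,
// including the depth/disparity cases the CL adds. Raw FourCC values are
// used in place of the CoreVideo constants.
std::string formatForIOSurfaceFourcc(uint32_t fourcc) {
    switch (fourcc) {
        case 0x42475241: return "BGRA8Unorm";             // 'BGRA'
        case 0x34323076: return "R8BG8Biplanar420Unorm";  // '420v' (NV12)
        case 0x68646570: return "R16Float";               // 'hdep' DepthFloat16
        case 0x66646570: return "R32Float";               // 'fdep' DepthFloat32
        case 0x68646973: return "R16Float";               // 'hdis' DisparityFloat16
        case 0x66646973: return "R32Float";               // 'fdis' DisparityFloat32
        default:         return "Undefined";  // unmapped -> zeros downstream
    }
}
```

The default case is the story of this whole post: an unmapped format doesn't error, it just produces a texture that reads as zeros.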
Dawn uses Chromium's Gerrit for code review instead of GitHub PRs. I filed a bug at issues.chromium.org under Blink>WebGPU, then submitted the CL.
The CL is at dawn-review.googlesource.com/c/dawn/+/297995. Four case statements. If accepted, every iOS app using Dawn can zero-copy import LiDAR depth data.
The Depth Colormap Shader
With depth data flowing (via CPU upload for now), the shader is satisfying:
```wgsl
@group(0) @binding(0) var inputTex: texture_2d<f32>;
@group(0) @binding(1) var outputTex: texture_storage_2d<rgba16float, write>;
@group(0) @binding(3) var depthTex: texture_2d<f32>;
@group(0) @binding(4) var depthSampler: sampler;

fn depthColormap(t: f32) -> vec3f {
  let r = clamp(2.0 * t - 0.5, 0.0, 1.0);
  let g = clamp(1.0 - 2.0 * abs(t - 0.5), 0.0, 1.0)
        + clamp(2.0 * t - 1.0, 0.0, 1.0);
  let b = clamp(1.0 - 2.0 * t, 0.0, 1.0);
  return vec3f(r, g, b);
}

@compute @workgroup_size(16, 16)
fn main(@builtin(global_invocation_id) id: vec3u) {
  let outDims = textureDimensions(outputTex);
  if (id.x >= outDims.x || id.y >= outDims.y) { return; }

  let color = textureLoad(inputTex, vec2i(id.xy), 0).rgb;

  // Rotate UV 90° CW — depth texture is landscape, output is portrait
  let rotU = (f32(id.y) + 0.5) / f32(outDims.y);
  let rotV = 1.0 - (f32(id.x) + 0.5) / f32(outDims.x);
  let depth = textureSampleLevel(depthTex, depthSampler, vec2f(rotU, rotV), 0.0).r;

  let t = clamp(depth / 5.0, 0.0, 1.0); // 0–5m range
  let blended = mix(color, depthColormap(t), 0.6);
  textureStore(outputTex, vec2i(id.xy), vec4f(blended, 1.0));
}
```
The depth texture is 320×180 (landscape), but the output is 1080×1920 (portrait, after the rotation pass). The UV rotation maps portrait coordinates to landscape depth coordinates. textureSampleLevel with a linear sampler gives free bilinear interpolation, upsampling the 320×180 depth map smoothly to the 1080×1920 output.
The normalization divisor (depth / 5.0) sets the colormap range — 5 meters maps to the full blue→green→yellow gradient. LiDAR range is ~5m on iPhone, so this captures the useful range. Closer objects are blue, farther objects yellow.
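Porting depthColormap to C++ makes its endpoints easy to check — t=0 comes out pure blue, t=0.5 green, t=1 yellow (same math as the WGSL above):

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cmath>

// C++ port of the WGSL depthColormap: blue (near) -> green -> yellow (far).
std::array<float, 3> depthColormap(float t) {
    auto c = [](float x) { return std::clamp(x, 0.0f, 1.0f); };
    float r = c(2.0f * t - 0.5f);
    float g = c(1.0f - 2.0f * std::fabs(t - 0.5f)) + c(2.0f * t - 1.0f);
    float b = c(1.0f - 2.0f * t);
    return {r, g, b};  // g stays within [0,1] for t in [0,1]
}
```

The green channel looks like it could exceed 1.0, but for t in [0.5, 1] the two terms are (2 − 2t) and (2t − 1), which sum to exactly 1 — so no extra clamp is needed.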
GPU-Side Portrait Rotation
One thing that helped across the board: moving the landscape→portrait rotation from Skia to the GPU.
Previously, the camera delivered landscape textures (1920×1080), the compute pipeline processed them in landscape, and Skia applied a 90° rotation transform when drawing to the portrait screen. This rotation was expensive — Skia had to do a rotated blit of a 4K F16 texture every frame, costing significant GPU time.
Now the first compute pass (passthrough or YUV→RGB) does the rotation:
```wgsl
// 90° CW: output(x, y) reads from input(y, inH - 1 - x)
let yDims = textureDimensions(yPlaneTex);
let srcCoord = vec2u(id.y, yDims.y - 1u - id.x);
```
The output texture is portrait-sized (1080×1920). Every downstream shader and Skia sees portrait coordinates. The Skia Canvas draws with an identity transform — no rotation, no scaling, just a straight blit. This matters at high resolutions where the rotated blit was the bottleneck.
It also fixed the onFrame canvas coordinate problem: previously, Skia draws happened in landscape coordinates and then got rotated, making positioning unintuitive. Now (0, 0) is top-left of what the user sees.
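The rotation is easy to sanity-check as plain index math — a sketch (the function name is mine):

```cpp
#include <cassert>
#include <cstdint>
#include <utility>

// 90-degree clockwise rotation as index math: a W x H landscape input becomes
// an H x W portrait output, and output(x, y) reads input(y, inH - 1 - x).
std::pair<uint32_t, uint32_t> srcForRotatedCW(uint32_t outX, uint32_t outY,
                                              uint32_t inH) {
    return {outY, inH - 1u - outX};
}
```

For a 1920×1080 input: the portrait top-left output(0, 0) reads input(0, 1079) — the landscape bottom-left, which is exactly where the top-left ends up after rotating 90° clockwise.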
What I Learned
Dawn's format mapping is strict and silent
When Dawn doesn't recognize an IOSurface pixel format, it doesn't error — it imports successfully but reads zeros. We hit this three times: once with Apple Log (missing P010), once with 4:2:2 (missing P210), and once with LiDAR depth (missing DepthFloat16 mapping entirely). The symptoms are always the same: "everything succeeds but the output is black."
Check how other apps configure their sessions
Spying on Blackmagic Camera's AVCaptureSession via Console.app was the breakthrough for Apple Log. Their ColorSpace:3 vs our ColorSpace:2 immediately showed the misconfiguration. System-level logging is an underused debugging tool.
CPU upload is fine for small data
The instinct to zero-copy everything is strong, but 115KB at 60fps is nothing. The CPU upload for depth data is simpler, more debuggable, and just as fast as the zero-copy path would be. Save the optimization for when it matters — the 8MB camera frame, not the 115KB depth map.
Contributing upstream is surprisingly accessible
Dawn's Gerrit workflow is different from GitHub but not harder. CLA + clone + hook + push. For a four-line format mapping, the total effort was: file a bug (5 min), clone the repo (2 min), make the edit (1 min), push (1 min). The hardest part was finding GetFormatEquivalentToIOSurfaceFormat() in the source — once found, the fix was obvious.
The Full Pipeline
Here's what's running on an iPhone 16 Pro:
```
Camera (LiDAR, 1920×1080 NV12 420v)
  + Depth (320×180 DepthFloat16, CPU upload to R16Float)
→ Pass 0: NV12→RGB (BT.709, video range) + 90° CW rotation
→ RGBA16Float ping-pong (1080×1920 portrait)
→ Pass 1: Depth colormap (bilinear upsample, blue→green→yellow blend)
→ RGBA16Float output
→ Skia Graphite MakeImageFromTexture → SkImage → Canvas
```
Or with Apple Log:
```
Camera (Wide, 1920×1080 10-bit 4:2:2 YUV, Apple Log)
→ Pass 0: YUV→RGB (BT.2020, video range, no clamp) + 90° CW rotation
→ RGBA16Float ping-pong (1080×1920 portrait)
→ Pass 1: Apple Log → Rec.709 LUT (65³ 3D texture)
→ RGBA16Float output
→ Skia Graphite MakeImageFromTexture → SkImage → Canvas
```
Same pipeline architecture, different first-pass shader. The user writes WGSL, declares resources, and the native side handles format detection, IOSurface import, YUV conversion, and rotation.
What's Next
- Pipeline/overlay/onFrame split — separating what goes into recordings vs. what's display-only
- Recording — ProRes output with Apple Log preserved, overlays burned in
- Dawn upstream — waiting on the CL review, hopefully zero-copy depth soon
- More depth effects — focus peaking, background blur, depth-based segmentation
This is Part 3 of an ongoing series. Part 1 covers the initial spike. Part 2 covers Apple Log HDR and debugging. Code: react-native-webgpu-camera