I'm building a React Native camera library that runs WebGPU compute shaders on live camera frames. Not a tech demo — a real pipeline that needs to sustain 60fps+ on a phone without melting it.
This is the story of the spike that validated whether it's even possible. I hit 13 distinct build and integration failures along the way. Here's what I tried, what broke, and the architecture that actually works.
The Problem
If you've done real-time camera processing in React Native, you've probably used VisionCamera's frame processors. They work, but there's a fundamental bottleneck: pixel data has to cross the JS bridge. At 1080p BGRA, that's about 8MB per frame. At 30fps, you're copying 240MB/sec between native and JavaScript — and every byte has to be garbage collected on the other side.
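The arithmetic behind that claim is quick to check (a sketch, using the 1080p BGRA numbers above — BGRA is 4 bytes per pixel):

```typescript
// Bridge traffic for raw-pixel frame processing at a given resolution.
function bytesPerFrame(width: number, height: number, bytesPerPixel = 4): number {
  return width * height * bytesPerPixel;
}

// Total bytes copied across the bridge per second at a given frame rate.
function bridgeBytesPerSecond(width: number, height: number, fps: number): number {
  return bytesPerFrame(width, height) * fps;
}

const frame = bytesPerFrame(1920, 1080);             // 8,294,400 bytes, ~8 MB
const perSec = bridgeBytesPerSecond(1920, 1080, 30); // ~240 MB/sec of copies
```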
I wanted something different: GPU compute shaders running directly on camera frames, with the results rendered by Skia — all without the pixels ever leaving the GPU. The stack: WebGPU (via Dawn) for compute, Skia Graphite for rendering, and a native pipeline that keeps everything on the metal.
The first question was whether React Native could even support this. I set out to find out.
The Naive Attempts
Attempt 1: Just Copy the Pixels
The most obvious approach: grab camera frame bytes in the native camera delegate, ship them to JS, create a Skia image.
```typescript
// Don't do this
const pixels = WebGPUCameraModule.getCurrentFramePixels();
const data = Skia.Data.fromBytes(pixels); // 8MB allocation
const image = Skia.Image.MakeImage(info, data, bytesPerRow);
```
This works for about three seconds. Then the app crashes from memory pressure. Creating 8MB of Skia.Data thirty times a second overwhelms the garbage collector. Even with explicit .dispose() on previous images and a 30fps cap, it's too much. The GC can't keep up with the allocation rate.
Lesson: Never copy raw pixel data to JS for real-time video. If you're allocating megabytes per frame in JavaScript, you've already lost.
Attempt 2: WebGPU from JavaScript
Okay, so I need to keep pixels on the GPU. Skia Graphite bundles Dawn (Google's WebGPU implementation) and exposes navigator.gpu to JavaScript. You can create textures, compile shaders, dispatch compute — all from JS. The plan: use device.queue.writeTexture() to upload camera pixels to a GPU texture, run a compute shader, render the result.
I got all the way to a working compute pipeline. Shader compiled, bind groups created, compute dispatched. The Canvas even showed something — a solid green rectangle. The Apple GPU profiler overlay appeared for the first time, confirming actual GPU work. Progress.
Except the green was the shader's output for all-zero input. `writeTexture()` had silently written nothing — the texture stayed all zeros.
I dug into the react-native-skia source. The Dawn JS bindings are partially implemented:
What works from JS:

- `navigator.gpu.requestAdapter()` / `requestDevice()`
- `device.createShaderModule()`, `createComputePipeline()`, `createTexture()`
- `encoder.beginComputePass()` / `dispatchWorkgroups()` / `submit()`
- `copyBufferToBuffer()` — verified with a `[10, 20, 30, 40]` roundtrip

What doesn't work from JS:

- `device.queue.writeTexture()` — implemented in C++ but returns zeros through JSI. Likely an ArrayBuffer bridging issue.
- `device.queue.copyBufferToTexture()` — completely stubbed out (commented code, never implemented)
- `ImportSharedTextureMemory()` — C++ only, not exposed to JS
Every path to getting pixel data into a texture from JavaScript was broken. I could create GPU resources and dispatch compute, but couldn't feed them data.
After changing the shader to output pink instead of green — and seeing pink — I had my confirmation: the compute pipeline worked perfectly. The input was just empty.
Lesson: If you see a solid color from a compute shader, your input texture is probably empty. Verify data made it to the GPU before debugging the shader.
The Architecture That Works
If JS can't write to textures, the native side has to do it. And if I'm already on the native side, I might as well skip the copy entirely.
iOS camera frames come as CVPixelBuffers backed by IOSurface — which is already GPU memory. Dawn's SharedTextureMemory API can import an IOSurface directly as a GPU texture, zero copies. The pipeline:
```
Camera (AVCaptureSession)
  → CVPixelBuffer (backed by IOSurface — already on GPU)
  → Dawn SharedTextureMemory.ImportSharedTextureMemory(ioSurface)
  → wgpu::Texture (input)
  → Compute Shader (Sobel edge detection)
  → wgpu::Texture (output)
  → Skia Graphite MakeImageFromTexture()
  → SkImage (texture-backed, no pixel copy)
  → Skia Canvas
```
Every arrow is either a GPU-side operation or a metadata bind. No pixel copies anywhere.
The native C++ pipeline handles the hot path — IOSurface import, compute dispatch, output texture management. The JS side doesn't poll or copy anything. Instead, it uses Reanimated's useFrameCallback — a worklet that runs on the UI thread every display frame — to grab the latest compute output.
The trick is that worklets run on Reanimated's UI runtime, which has a separate globalThis from the JS runtime. You can't just install a global function and call it from a worklet. Instead, you create a JSI host object on the JS thread and pass it into the worklet closure — Reanimated shares host objects across runtimes:
```typescript
const stream = useSharedValue<CameraStream | null>(null);
const currentFrame = useSharedValue<SkImage | null>(null);

// Create the stream host object after pipeline setup
stream.value = globalThis.__webgpuCamera_createStream();

// useFrameCallback runs on UI thread — stream is shared across runtimes
useFrameCallback(() => {
  'worklet';
  const s = stream.value;
  if (!s) return;
  const img = s.nextImage();
  if (img) {
    currentFrame.value?.dispose();
    currentFrame.value = img;
  }
});
```
__webgpuCamera_createStream() returns a CameraStreamHostObject — a plain jsi::HostObject with a nextImage() method that returns a JsiSkImage. The shared value is passed directly to the Skia Canvas, which auto-redraws when the value changes. No React re-renders, no setState, no JS thread involvement at all.
This follows the exact pattern that react-native-skia uses for video playback (useVideo): a JSI host object with a nextImage() method, called from a UI thread worklet, feeding a shared value that drives the Canvas.
The key insight is what crosses the JS boundary: not pixels, not even texture data — just an opaque handle. The heavy work stays on the GPU, and the render loop stays on the UI thread.
Thread Safety
The native pipeline runs on the camera delegate thread. The UI thread reads the latest output via useFrameCallback. The sk_sp<SkImage> output is shared between them, protected by a mutex.
There's a subtler problem: the CameraStreamHostObject holds a raw pointer to the C++ pipeline. If the user stops the camera and the pipeline gets destroyed, the host object has a dangling pointer. I solve this with a shared liveness flag:
```cpp
// Pipeline constructor
_alive = std::make_shared<std::atomic<bool>>(true);

// Pipeline destructor
_alive->store(false);

// JSI lambda captures the shared_ptr, not the raw pipeline pointer
[pipeline, alive](jsi::Runtime &rt, ...) -> jsi::Value {
  if (!alive->load()) return jsi::Value::null();
  // safe to use pipeline
}
```
The shared_ptr<atomic<bool>> is reference-counted and outlives both the pipeline and the lambda. Cheap to check, safe against destruction races.
The Skia Graphite Setup Gauntlet
Getting the compute pipeline right was the fun part. Getting Skia Graphite to compile was the other part.
@shopify/react-native-skia supports a Graphite backend (SK_GRAPHITE=1) that uses Dawn for GPU rendering. It also bundles Dawn's WebGPU implementation and exposes navigator.gpu. This is what makes the whole architecture possible — one shared GPU context for both Skia rendering and WebGPU compute.
Here's what "add dependency, build" actually looked like.
Header clashes (Issues 1-3)
I started with both react-native-wgpu and @shopify/react-native-skia installed. Immediate header conflict — Skia Graphite bundles its own Dawn, and react-native-wgpu has its own modified JSI headers: `jsi2/EnumMapper.h` file not found.
Fix: Remove react-native-wgpu. Skia Graphite is self-contained.
But now the Dawn headers that Skia's C++ code includes (webgpu/webgpu_cpp.h) were missing. Turns out they're shipped in a separate npm package (react-native-skia-graphite-headers) that you have to discover by reading the podspec source. And it also ships Skia private headers (src/gpu/graphite/ContextOptionsPriv.h) that are needed for compilation.
I wrote an Expo config plugin to copy both dawn/ and skia/ headers into the right location during prebuild.
Version mismatch (Issue 4)
I'd used "*" (latest) versions for the three react-native-skia-graphite-* npm packages. Headers came from one version, iOS binaries from another. Result: `Undefined symbols: SkAndroidCodec::MakeFromData` — an Android symbol referenced in an iOS build. Classic version skew.
I switched to a git submodule approach: own the Skia source, guarantee header/binary consistency.
The size problem (Issues 5-7)
The submodule brought 2.4 GB into the project. EAS Build (Expo's cloud build service) reported 1.1 GB upload.
Breakdown: Android libs (1.2 GB), macOS libs (410 MB), example apps (110 MB), Skia externals (175 MB). None needed for an iOS build.
Created .easignore to strip it down. Immediately broke codegen — React Native needs src/specs/ for Turbo Module bindings, and I'd excluded all of src/. Then broke pod install — the podspec's prepare_command checks if both iOS and macOS xcframeworks exist before skipping the npm download step. Exclude macOS libs → it tries to require.resolve('react-native-skia-graphite-apple-ios') → which I don't have.
Every exclusion had a hidden dependency.
The silent bug (Issue 8)
Ran `bun run install:skia-graphite`. The script reported success: "Dawn/WebGPU headers copied." The headers directory was empty.
The install script downloads a tar.gz and looks for dawn/include at the archive root. The archive actually nests it three levels deep at packages/skia/cpp/dawn/include/. The path mismatch means the copy silently finds zero files and reports success.
Manual fix:

```shell
tar xzf headers.tar.gz --strip-components=3
```
Podspec path resolution (Issues 12-13)
My native C++ pipeline needs headers from Skia's source tree. The podspec computes the path with Ruby's File.expand_path:
```ruby
skia_pkg_root = File.expand_path('../../../../../../...', __dir__)
```
I counted wrong. Six levels of .. instead of five. Off by one directory — landed at the parent of my repo root instead of the repo root itself.
Cost: Two wasted EAS builds. Each takes ~15 minutes and fails near the very end of compilation.
Then, even with the right path, HEADER_SEARCH_PATHS didn't work. The recursive glob /** was outside the quotes:
```ruby
# Broken — shell sees unquoted glob
"\"#{root}/cpp\"/**"

# Working — Xcode recursive search
"\"#{root}/cpp/\"/**"
```
One character difference. Two more builds to discover.
Lesson: Always verify File.expand_path output before committing to a cloud build:
```shell
ruby -e "d = '/path/to/ios'; puts File.expand_path('../../../../../target', d)"
```
Multi-Pass: Chaining Shaders
Once the single-pass pipeline worked, the next question was obvious: can we chain multiple shaders? Run Sobel edge detection first, then feed its output into a color-mapping pass, all in one frame.
The architecture already had the answer. The single-pass pipeline used a ping-pong texture pair (texA and texB) — but only used one bounce. Multi-pass generalizes it: pass 0 reads the camera input and writes to texA, pass 1 reads texA and writes to texB, pass 2 reads texB and writes to texA, and so on. The final output alternates based on the number of passes.
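The ping-pong bookkeeping reduces to parity: even-numbered passes write `texA`, odd-numbered passes write `texB`, and the final output is whichever texture the last pass wrote. A sketch of that index logic (illustrative only, not the pipeline's actual C++):

```typescript
type Tex = "texA" | "texB";

// Which texture does pass `i` read from, and write to?
// Pass 0 reads the camera input; pass i > 0 reads pass i-1's output.
function passTargets(i: number): { read: "camera" | Tex; write: Tex } {
  const write: Tex = i % 2 === 0 ? "texA" : "texB";
  const read = i === 0 ? "camera" : i % 2 === 0 ? "texB" : "texA";
  return { read, write };
}

// The texture presented after an n-pass chain is the last pass's write target.
function finalOutput(passCount: number): Tex {
  return (passCount - 1) % 2 === 0 ? "texA" : "texB";
}
```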
From the JS side, the API is just an array of shaders:
```typescript
const { currentFrame } = useGPUFrameProcessor(camera, (frame) => {
  'worklet';
  frame.runShader(SOBEL_WGSL);       // pass 0: edge detection
  frame.runShader(SOBEL_COLOR_WGSL); // pass 1: colorize edges
});
```
The hook captures the shader chain at setup time using a proxy object — the 'worklet' directive means the function body is declarative, not imperative. The proxy records which shaders were called and in what order, then sends the chain to the native pipeline as a single setup call. No shaders execute during capture.
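That capture step can be sketched as a recording stand-in for `frame`: each `runShader` call is logged in order rather than executed (a minimal illustration; the real hook's internals may differ):

```typescript
// Recorder object: logs shader calls in order, does no GPU work.
function captureShaderChain(
  body: (frame: { runShader: (wgsl: string) => void }) => void
): string[] {
  const chain: string[] = [];
  const recorder = {
    runShader(wgsl: string) {
      chain.push(wgsl); // record the call; nothing executes here
    },
  };
  body(recorder); // run the worklet body once against the recorder
  return chain;   // the chain is then sent to native in one setup call
}
```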
The Multi-Pass Regression
Getting multi-pass to render was harder than getting it to compute. The first build produced a solid red screen — the pipeline was running (GPU profiler confirmed work), but nothing visible.
I'd made two changes simultaneously during the multi-pass rewrite:
1. **Deferred SkImage creation** — Based on a theory that Skia's Graphite recorder was thread-local, I added a dirty flag to defer `MakeImageFromTexture` from the camera thread to the UI thread. The theory was wrong. The working single-pass version had always created the SkImage on the camera thread, and it rendered fine.
2. **RGBA view format override** — The camera provides BGRA textures. I added an explicit RGBA texture view with `viewFormats` on the BGRA input, thinking the compute shader needed RGBA. It didn't — the default BGRA view worked for texture sampling.
The fix was reverting both changes to match the working single-pass pattern. But I'd made both changes at once, so I couldn't tell which one caused the red screen. This violated a basic principle: change one thing at a time when debugging GPU pipelines. GPU errors are silent — no exceptions, no crashes, just wrong pixels. You need to isolate variables.
Lesson: Dawn doesn't throw when you create a texture view with a mismatched format. It just renders garbage. If your output is a solid color after a pipeline change, diff against the last working version and revert one change at a time.
Buffer Readback: Getting Data Out of the GPU
Shaders that produce textures are useful for visual effects. But many real use cases need data — histograms, feature detection, classification scores. This requires reading GPU buffer contents back to JavaScript.
The pipeline supports buffer outputs alongside texture outputs. A shader can declare a storage buffer binding, write to it with atomics, and the pipeline copies the result to a staging buffer for CPU readback:
```typescript
const { currentFrame, buffers } = useGPUFrameProcessor(camera, (frame) => {
  'worklet';
  frame.runShader(HISTOGRAM_WGSL, { output: Uint32Array, count: 256 });
});
```
The WGSL shader computes a 256-bin luminance histogram using atomic operations, while simultaneously passing through the camera frame to the output texture:
```wgsl
@group(0) @binding(0) var inputTex: texture_2d<f32>;
@group(0) @binding(1) var outputTex: texture_storage_2d<rgba8unorm, write>;
@group(0) @binding(2) var<storage, read_write> histogram: array<atomic<u32>, 256>;

@compute @workgroup_size(16, 16)
fn main(@builtin(global_invocation_id) gid: vec3u) {
  let color = textureLoad(inputTex, vec2i(gid.xy), 0);
  textureStore(outputTex, gid.xy, color); // passthrough
  let lum = dot(color.rgb, vec3f(0.2126, 0.7152, 0.0722));
  let bin = min(u32(lum * 256.0), 255u);
  atomicAdd(&histogram[bin], 1u);
}
```
The native side uses double-buffered staging buffers. While frame N's compute shader writes to GPU buffer A, the CPU reads frame N-1's result from staging buffer B. MapAsync fires a callback when the data is ready, and readBuffer() returns the most recently completed map. The data is always 1-2 frames stale — invisible at 120fps.
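The rotation itself is simple modular bookkeeping: the GPU fills one staging slot while the CPU reads the other. A minimal sketch of that logic (the real pipeline does this in C++ around Dawn's `MapAsync`; the class and method names here are illustrative):

```typescript
// Two staging slots: while the GPU fills one, the CPU reads the other.
class DoubleBufferedStaging<T> {
  private slots: (T | null)[] = [null, null];
  private frame = 0;

  // Called when frame N's copy has finished mapping.
  complete(data: T): void {
    this.slots[this.frame % 2] = data;
    this.frame++;
  }

  // Called from readBuffer(): the most recently completed map,
  // which is always 1-2 frames behind the compute work.
  latest(): T | null {
    return this.slots[(this.frame + 1) % 2];
  }
}
```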
Two Ways to Use the Data
Buffer readback creates an interesting design question: where do you consume the data?
The onFrame callback runs inside the pipeline, drawing directly onto the GPU texture. Anything drawn here is burned into the frame — it shows up in recordings, screenshots, and the live preview:
```typescript
const { currentFrame } = useGPUFrameProcessor(camera, {
  pipeline: (frame) => {
    'worklet';
    const hist = frame.runShader(HISTOGRAM_WGSL, { output: Uint32Array, count: 256 });
    return { hist };
  },
  onFrame: (frame, { hist }) => {
    'worklet';
    if (!hist) return;
    // Draw histogram bars on the GPU texture — becomes part of the video
    for (let i = 0; i < 256; i++) {
      const barH = (hist[i] / maxVal) * 200;
      frame.canvas.drawRect(Skia.XYWHRect(x + i * 2, y + 200 - barH, 2, barH), paint);
    }
  },
});
```
But buffers is also returned as a Reanimated shared value. You can consume it in React for UI-only overlays that don't burn into the video:
```tsx
const { currentFrame, buffers } = useGPUFrameProcessor(camera, (frame) => {
  'worklet';
  frame.runShader(HISTOGRAM_WGSL, { output: Uint32Array, count: 256 });
});

// Draw histogram in React — UI overlay only, not in the recorded video
const histPicture = useDerivedValue(() => {
  const hist = buffers.value.__buf_0 as Uint32Array | null;
  if (!hist) return emptyPicture;
  return createPicture((canvas) => { /* draw histogram bars */ });
});

return (
  <Canvas>
    <SkImage image={currentFrame} />
    <Picture picture={histPicture} /> {/* overlay — not recorded */}
  </Canvas>
);
```
Same data, two consumption points. Recording a tutorial with bounding boxes? Use onFrame. Live preview with debug overlays? Use buffers in React. The user gets both options for free.
ML Inference on the GPU: The ONNX Runtime Spike
With compute shaders and buffer readback working, the next frontier is ML model inference. Running a depth estimation or object detection model on the same Dawn GPU device that processes camera frames — no CPU roundtrip, no data copies between GPU contexts.
ONNX Runtime already has a WebGPU execution provider that accepts an external Dawn device. The C++ API takes string-encoded pointers for WGPUDevice, WGPUInstance, and DawnProcTable via ConfigOptions:
```cpp
std::unordered_map<std::string, std::string> options;
options["webgpuDevice"] = std::to_string(reinterpret_cast<size_t>(dawnDevice));
options["webgpuInstance"] = std::to_string(reinterpret_cast<size_t>(dawnInstance));
options["dawnProcTable"] = std::to_string(reinterpret_cast<size_t>(&dawnProcs));
sessionOptions.AppendExecutionProvider("WebGPU", options);
```
This means we can pass Skia Graphite's Dawn device directly to ONNX Runtime — same GPU context, zero-copy tensor input from compute shader output buffers.
The React Native ONNX package (onnxruntime-react-native) uses native C++ via JSI and supports CPU, XNNPACK, CoreML, and NNAPI execution providers. It doesn't support WebGPU — yet. The spike adds "webgpu" to the supported backends and wires the device/instance/proc-table through the JSI layer. The model runs as a C++ execution provider on the same GPU — one GPU context, one set of resources, no synchronization overhead.
The Dawn Build Problem
Building ONNX Runtime with --use_webgpu --use_external_dawn sounds like it should use an existing Dawn installation. It doesn't. "External Dawn" means "Dawn is not bundled in ONNX Runtime's source tree" — it still fetches and compiles Dawn from source during the build. This adds 30+ minutes to the build for something we already have compiled inside Skia Graphite.
The right fix is a --dawn_home flag that points at prebuilt Dawn headers and libraries. That doesn't exist yet — but it would be a nice upstream contribution alongside the React Native WebGPU EP itself.
What I Learned
The spike validated the core question: yes, you can run WebGPU compute shaders on live camera frames in React Native, at real-time frame rates, with zero pixel copies. And then it kept going — multi-pass shader chains, GPU buffer readback, and the beginnings of ML inference on the same device.
The architecture works: IOSurface import → Dawn compute → Skia Graphite rendering. Everything stays on the GPU. JavaScript gets opaque handles, not pixels.
The cost of getting here was disproportionate to the complexity of the actual pipeline code. The C++ compute pipeline is ~400 lines. The build system integration was 13+ issues across multiple days.
What's actually running
- Native C++ compute pipeline imports camera frames via IOSurface (zero-copy)
- Multi-pass WGSL shader chains — edge detection, colorization, histogram — on the shared Dawn device
- GPU buffer readback with double-buffered staging for compute data (histograms, future: bounding boxes)
- Output wraps as SkImage via Skia Graphite's texture interop
- `useGPUFrameProcessor` hook with two consumption paths: `onFrame` for burn-in, `buffers` for UI overlays
- UI thread worklet grabs the latest SkImage via JSI and feeds it to Skia Canvas — no JS thread, no React re-renders
- All on a physical iPhone 16 Pro, via EAS Build
The numbers
From Apple's GPU profiler overlay on an iPhone 16 Pro:
| Metric | Single-pass | Multi-pass (2 shaders) |
|---|---|---|
| GPU frame time | 2.88ms | 4ms |
| Display FPS | 120fps | 120fps |
| Camera FPS | 120fps | 120fps |
| Resolution | 3840x2160 (4K) | 3840x2160 (4K) |
| Sustained FPS | 120fps, 0% drops | 120fps, 0% drops |
| Thermal | nominal | nominal |
Even with two compute passes, the GPU has massive headroom — 4ms of work in an 8.3ms frame budget at 120Hz.
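That headroom claim is just budget arithmetic: at 120 Hz the GPU gets 1000/120 ≈ 8.33 ms per frame, and the two-pass pipeline uses 4 ms of it. A quick check:

```typescript
// Per-frame GPU budget at a given display refresh rate, in milliseconds.
function frameBudgetMs(hz: number): number {
  return 1000 / hz;
}

// Fraction of the frame budget consumed by measured GPU work.
function utilization(gpuMs: number, hz: number): number {
  return gpuMs / frameBudgetMs(hz);
}

const budget = frameBudgetMs(120); // ~8.33 ms per frame at 120Hz
const used = utilization(4, 120);  // ~0.48 — roughly half the budget for two passes
```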
Getting the render loop right took three iterations. The first version used setTimeout — 20fps, even though the GPU overlay showed 40fps. Switching to requestAnimationFrame doubled throughput to 30fps. But each frame still called setState to update the Skia Canvas, which meant a full React re-render per camera frame. At 4K this bottleneck was brutal — 21.7fps even though the GPU overlay showed 41fps.
The final version eliminates JavaScript from the render loop entirely. A CameraStreamHostObject (JSI host object) is created on the JS thread and stored in a Reanimated shared value. useFrameCallback runs on the UI thread, calls stream.nextImage() on the shared host object, and writes the result to another shared value that drives the Skia Canvas. No setState, no React reconciliation, no JS thread involvement at all. This is the same pattern react-native-skia uses for video playback (useVideo), and it's the one that unlocked 120fps.
What's next
- ONNX Runtime WebGPU EP for React Native — run ML models on the same Dawn device, PR back upstream
- `frame.runModel()` API — async side-channel that produces buffer data, not textures
- Android support — `AHardwareBuffer` is the IOSurface equivalent, same `SharedTextureMemory` API
- Recording pipeline — render the composited output to an AVAssetWriter surface
- Upstream contributions — several issues filed, a few more to file
The bigger picture
React Native's GPU story is still early. The pieces exist — Dawn, Skia Graphite, JSI — but they weren't designed to work together for this use case. Half the WebGPU JS bindings are stubbed out. The install tooling has silent bugs. Header management is a puzzle.
But the foundation is real. Dawn's SharedTextureMemory is the key primitive — it lets you import platform GPU resources (IOSurface, AHardwareBuffer) directly as WebGPU textures. Skia Graphite shares the Dawn device, so compute output becomes Skia input without a copy. And JSI lets you hand GPU resource handles to JavaScript without serialization.
The fact that ONNX Runtime already accepts an external Dawn device via string-encoded pointers is a signal that this convergence was anticipated. One shared GPU context for rendering, compute, and ML inference — that's the end state. We're just wiring it together.
If you're building something that needs real-time GPU compute in React Native — camera effects, AR overlays, video processing, on-device ML — this path works. It's just not paved yet.
This is an ongoing project. Follow along or check out the code: react-native-webgpu-camera