<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Deen Jimoh</title>
    <description>The latest articles on DEV Community by Deen Jimoh (@d33n).</description>
    <link>https://dev.to/d33n</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3899171%2F29367e0b-5954-428a-a8ce-95dada205bcb.png</url>
      <title>DEV Community: Deen Jimoh</title>
      <link>https://dev.to/d33n</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/d33n"/>
    <language>en</language>
    <item>
      <title>Real-Time Face Liveness in React Native: Vision Camera, Worklets, and ML Kit</title>
      <dc:creator>Deen Jimoh</dc:creator>
      <pubDate>Sun, 26 Apr 2026 18:00:07 +0000</pubDate>
      <link>https://dev.to/d33n/real-time-face-liveness-in-react-native-vision-camera-worklets-and-ml-kit-2bne</link>
      <guid>https://dev.to/d33n/real-time-face-liveness-in-react-native-vision-camera-worklets-and-ml-kit-2bne</guid>
      <description>&lt;p&gt;If you’ve ever shipped a KYC, onboarding, or account-recovery flow, you’ve run into the liveness problem: how do you prove the face in front of the camera is a real, present human, not a photo, a screen replay, or a video loop?&lt;br&gt;
Solving this in React Native means processing every frame the camera produces, on-device, in milliseconds. The toolchain that makes this practical is a three-piece combination: Vision Camera for the camera pipeline, worklets for off-thread JavaScript, and Google ML Kit for the actual face analysis. This post walks through how those pieces fit together, what each one does, and where the honest limits are.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why this combination exists&lt;/strong&gt;&lt;br&gt;
Liveness is fundamentally a temporal problem. A blink is a sequence of frames where the eye-open signal drops then recovers. A head turn is a sequence where yaw moves through a range. You can’t solve it with a single capture — you need every frame, in order, evaluated against a state machine, without blocking the UI thread that’s rendering the “blink now” prompt. That’s the problem this stack solves.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;React Native Vision Camera&lt;/strong&gt;&lt;br&gt;
Vision Camera is the high-performance camera library by Margelo. It exposes the device camera with low-level control over format, frame rate, pixel format, and orientation, and crucially, it gives you access to the raw frame buffer through a feature called Frame Processors.&lt;/p&gt;

&lt;p&gt;A &lt;strong&gt;&lt;em&gt;frame processor&lt;/em&gt;&lt;/strong&gt; is a function that runs once for every camera frame, on the camera thread. At 30 frames per second, that’s a 33 ms budget per frame. Anything slower drops frames, which is fine occasionally, but a sustained miss means your processor is too heavy.&lt;/p&gt;
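
&lt;p&gt;A minimal sketch of that hook, assuming the current react-native-vision-camera frame processor API (the returned handler is passed to the Camera component’s frameProcessor prop):&lt;/p&gt;

&lt;pre&gt;&lt;code class="language-typescript"&gt;import { useFrameProcessor } from 'react-native-vision-camera'

// Runs on the camera thread for every frame; at 30 fps that is a 33 ms budget.
const frameProcessor = useFrameProcessor((frame) =&gt; {
  'worklet'
  // The frame wraps the native buffer directly; nothing has been copied yet.
  console.log(`frame: ${frame.width}x${frame.height}`)
}, [])
&lt;/code&gt;&lt;/pre&gt;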

&lt;p&gt;&lt;strong&gt;Worklets&lt;/strong&gt;&lt;br&gt;
A worklet is a JavaScript function marked with the ‘worklet’ directive, which a Babel plugin prepares to run on a separate runtime from the main React Native thread. Two reasons this matters for liveness:&lt;br&gt;
• The UI stays responsive. Face detection at 30 fps would otherwise fight your animations and prompt rendering for the same thread.&lt;br&gt;
• Native calls become synchronous and zero-copy. Through JSI, a worklet can call into Swift or Kotlin in well under a millisecond, with no serialisation overhead. The same call across the old React Native bridge would add 10–50 ms per frame, enough to halve your effective frame rate.&lt;/p&gt;

&lt;p&gt;The setup cost is small: install react-native-worklets-core and add its plugin to your Babel config. Skip that step and the worklet directives become inert string literals; the frame processor then fails at runtime with a confusing closure-capture error.&lt;/p&gt;
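
&lt;p&gt;The Babel side is one line; a sketch assuming the plugin path that react-native-worklets-core ships:&lt;/p&gt;

&lt;pre&gt;&lt;code class="language-javascript"&gt;// babel.config.js
module.exports = {
  presets: ['module:@react-native/babel-preset'], // or whatever preset your app already uses
  plugins: ['react-native-worklets-core/plugin'],
}
&lt;/code&gt;&lt;/pre&gt;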

&lt;p&gt;&lt;strong&gt;Google ML Kit&lt;/strong&gt;&lt;br&gt;
ML Kit is Google’s on-device machine learning library, with native Android and iOS bindings for face detection, text recognition, barcode scanning, pose estimation, and more. You don’t call it from JavaScript directly; you install a frame processor plugin (such as react-native-vision-camera-face-detector) that wraps the ML Kit native SDK and exposes a worklet-callable function.&lt;/p&gt;
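
&lt;p&gt;A sketch of what calling it looks like with react-native-vision-camera-face-detector; the hook, option, and field names here follow that plugin’s documentation, but they drift between versions, so treat them as assumptions and check the release you install:&lt;/p&gt;

&lt;pre&gt;&lt;code class="language-typescript"&gt;import { useFrameProcessor } from 'react-native-vision-camera'
import { useFaceDetector } from 'react-native-vision-camera-face-detector'

const { detectFaces } = useFaceDetector({
  performanceMode: 'fast',
  classificationMode: 'all', // required for eye-open and smiling probabilities
  trackingEnabled: true,     // stable face IDs across frames
})

const frameProcessor = useFrameProcessor((frame) =&gt; {
  'worklet'
  const faces = detectFaces(frame) // synchronous JSI call into the ML Kit native SDK
  if (faces.length === 0) return
  // faces[0].leftEyeOpenProbability, faces[0].yawAngle, faces[0].bounds, ...
}, [detectFaces])
&lt;/code&gt;&lt;/pre&gt;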

&lt;p&gt;For face liveness, the signals that matter are:&lt;br&gt;
• Left and right eye-open probabilities&lt;br&gt;
• Smiling probability&lt;br&gt;
• Head pose angles (yaw, pitch, roll)&lt;br&gt;
• Face bounding box and a stable tracking ID across frames&lt;br&gt;
That’s enough to build active liveness challenges (blink, smile, turn left, turn right) and verify them in real time.&lt;/p&gt;
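
&lt;p&gt;As a concrete example of one challenge built on those signals, here is a sketch of blink detection as a pure function: the eye-open probability has to drop below a low threshold and then recover above a high one. The thresholds and names are illustrative, not validated values:&lt;/p&gt;

&lt;pre&gt;&lt;code class="language-typescript"&gt;// Illustrative thresholds only; validate against real capture data before shipping.
const EYES_CLOSED_BELOW = 0.2
const EYES_OPEN_ABOVE = 0.8

type BlinkPhase = 'waiting-for-close' | 'waiting-for-reopen' | 'done'

// Pure function: current phase in, this frame's eye-open probabilities in, next phase out.
export function stepBlink(phase: BlinkPhase, leftEye: number, rightEye: number): BlinkPhase {
  const eyesOpen = Math.min(leftEye, rightEye) // judge by the weaker eye
  switch (phase) {
    case 'waiting-for-close':
      return eyesOpen &amp;lt; EYES_CLOSED_BELOW ? 'waiting-for-reopen' : phase
    case 'waiting-for-reopen':
      return eyesOpen &gt; EYES_OPEN_ABOVE ? 'done' : phase
    default:
      return phase
  }
}
&lt;/code&gt;&lt;/pre&gt;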

&lt;p&gt;&lt;strong&gt;How it fits together&lt;/strong&gt;&lt;br&gt;
The architecture is a vertical pipe. ML Kit produces face attributes per frame in native code. Those attributes cross into the worklet via JSI. The worklet does minimal work — extracts the signals it cares about — and dispatches them to the main JS thread, where a normal React hook owns the challenge state machine, timing, and UI.&lt;/p&gt;
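
&lt;p&gt;A sketch of that hand-off, assuming react-native-worklets-core’s Worklets.createRunOnJS helper and a hypothetical onFaceSignals callback owned by the challenge hook; only the extracted numbers cross threads, never the frame itself:&lt;/p&gt;

&lt;pre&gt;&lt;code class="language-typescript"&gt;import { Worklets } from 'react-native-worklets-core'
import { useFrameProcessor } from 'react-native-vision-camera'

// onFaceSignals is a plain JS function owned by your challenge hook on the main JS thread.
const dispatchSignals = Worklets.createRunOnJS(onFaceSignals)

const frameProcessor = useFrameProcessor((frame) =&gt; {
  'worklet'
  // detectFaces comes from the face-detector plugin hook shown earlier.
  const faces = detectFaces(frame)
  if (faces.length === 0) return
  const f = faces[0]
  // Extract only the numbers the state machine needs, then hand them off.
  dispatchSignals({
    leftEye: f.leftEyeOpenProbability,
    rightEye: f.rightEyeOpenProbability,
    smile: f.smilingProbability,
    yaw: f.yawAngle,
  })
}, [detectFaces, dispatchSignals])
&lt;/code&gt;&lt;/pre&gt;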

&lt;p&gt;&lt;strong&gt;The discipline worth maintaining&lt;/strong&gt;: keep the worklet narrow. Extract signals, nothing else. Put thresholds, sequencing, and timing in plain JavaScript where they’re testable without a camera. Most bugs in homegrown liveness implementations come from putting decision logic inside the worklet, where it becomes harder to test and easier to ship thresholds you’ve never validated.&lt;/p&gt;
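
&lt;p&gt;That separation is what makes the logic testable. Continuing the hypothetical stepBlink sketch from above, the whole challenge can be exercised with a synthetic frame sequence and no camera at all:&lt;/p&gt;

&lt;pre&gt;&lt;code class="language-typescript"&gt;// No camera needed: drive the blink stepper with a synthetic eye-open sequence.
let phase: BlinkPhase = 'waiting-for-close'
for (const eyes of [0.9, 0.6, 0.1, 0.15, 0.7, 0.95]) {
  phase = stepBlink(phase, eyes, eyes)
}
console.log(phase) // 'done'
&lt;/code&gt;&lt;/pre&gt;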

&lt;p&gt;&lt;strong&gt;What this stack does and doesn’t give you&lt;/strong&gt;&lt;br&gt;
This combination supports active liveness — flows where the user performs randomised challenges that are verified in real time. That’s a credible defence against static photos and naive video replay, and it covers a wide band of use cases: internal tooling, low-stakes account recovery, attendance, and presence checks in tutoring or telehealth.&lt;/p&gt;

&lt;p&gt;It does not give you passive liveness, and it will not get you ISO/IEC 30107-3 PAD certification. Detecting print attacks, replay attacks on high-quality screens, deepfake injection, or 3D-printed masks needs specialised models trained on texture and micro-motion, and often hardware signals like depth or near-infrared.&lt;/p&gt;

&lt;p&gt;For regulated KYC, AML, or payments, reach for a certified vendor — AWS Rekognition Face Liveness, FaceTec, iProov, Onfido, Jumio. They exist for exactly this reason, the per-check cost is low, and the build-versus-buy maths almost always favours buy when regulation is on the line.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Closing thought&lt;/strong&gt;&lt;br&gt;
Pull any one of these three pieces out and the architecture stops making sense. You need every frame (Vision Camera), you can’t block the UI thread for 30 Hz inference (worklets), and you need real ML signals for the challenges to be more than theatre (ML Kit).&lt;/p&gt;

&lt;p&gt;Whatever you build with it, write down your threat model first. Active liveness is a real defence against a specific class of attacks. It’s not a substitute for a certified vendor when the stakes warrant one — and being honest about that line, in the architecture and in the marketing, is the difference between a feature that protects users and one that just feels like it does.&lt;/p&gt;

</description>
      <category>machinelearning</category>
      <category>mobile</category>
      <category>reactnative</category>
      <category>security</category>
    </item>
  </channel>
</rss>
