<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Kathan</title>
    <description>The latest articles on DEV Community by Kathan (@kiyo).</description>
    <link>https://dev.to/kiyo</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F1237294%2F3bbe4a05-a015-4693-b7ea-4be46adafac1.png</url>
      <title>DEV Community: Kathan</title>
      <link>https://dev.to/kiyo</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/kiyo"/>
    <language>en</language>
    <item>
      <title>Integrating @mediapipe/tasks-vision for Hand Landmark Detection in React</title>
      <dc:creator>Kathan</dc:creator>
      <pubDate>Wed, 20 Dec 2023 13:38:04 +0000</pubDate>
      <link>https://dev.to/kiyo/integrating-mediapipetasks-vision-for-hand-landmark-detection-in-react-2lbg</link>
      <guid>https://dev.to/kiyo/integrating-mediapipetasks-vision-for-hand-landmark-detection-in-react-2lbg</guid>
      <description>&lt;p&gt;I came across a project where I needed to check if a hand was present or not. During my research, I discovered that @mediapipe/hands are not functioning as they used to, and Google has transitioned to @mediapipe/task-vision, which is utilized for their MediaPipe projects. Using their documentation and making some changes, I developed a tool that detects the presence of hands and displays landmarks on a canvas. For future reference, I thought of creating an example of this, so you can go through it and easily work with any task-vision models in React.&lt;br&gt;
here is how To do this:&lt;br&gt;
(Directly scroll to last if you just want to see the full code.)&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Setting Up the Environment&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 1:&lt;/strong&gt; Install MediaPipe.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;npm i @mediapipe/tasks-vision&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 2:&lt;/strong&gt; Download the HandLandmarker model from &lt;a href="https://developers.google.com/mediapipe/solutions/vision/hand_landmarker#models"&gt;here&lt;/a&gt;.&lt;br&gt;
(Note: For different models, refer to the models section within the MediaPipe documentation.)&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Implementing in React&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Now let's jump to our demo file.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 1:&lt;/strong&gt; Import the Model and FilesetResolver.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import { FilesetResolver, HandLandmarker } from "@mediapipe/tasks-vision";
import hand_landmarker_task from "../models/hand_landmarker.task";
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;FilesetResolver - Locates and loads the set of (WebAssembly) files the vision tasks need.&lt;br&gt;
HandLandmarker - The task used for detecting hands and their landmarks in images or videos.&lt;br&gt;
hand_landmarker_task - The model file we downloaded earlier; the import resolves to its path.&lt;/p&gt;
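&lt;p&gt;One note on that last import: a .task file is a binary asset, so your bundler has to be told how to handle it. Below is a minimal sketch assuming webpack 5 (this rule is my assumption, not part of the original setup); alternatively, you can skip the local import entirely and pass a hosted model URL as modelAssetPath.&lt;/p&gt;

```javascript
// webpack.config.js (sketch, assuming webpack 5):
// emit .task files as asset URLs so
// `import hand_landmarker_task from "../models/hand_landmarker.task"`
// resolves to a path the browser can fetch at runtime.
module.exports = {
  module: {
    rules: [
      {
        test: /\.task$/,
        type: "asset/resource", // copy the file to the output dir and import its URL
      },
    ],
  },
};
```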

&lt;p&gt;&lt;strong&gt;Step 2:&lt;/strong&gt; Initialize hand detection.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;
const initializeHandDetection = async () =&amp;gt; {
    try {
        const vision = await FilesetResolver.forVisionTasks(
            "https://cdn.jsdelivr.net/npm/@mediapipe/tasks-vision@latest/wasm",
        );
        handLandmarker = await HandLandmarker.createFromOptions(vision, {
            baseOptions: { modelAssetPath: hand_landmarker_task },
            numHands: 2,
            runningMode: "video"
        });
        detectHands();
    } catch (error) {
        console.error("Error initializing hand detection:", error);
    }
};
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The function first uses FilesetResolver.forVisionTasks to load the necessary WebAssembly files from a CDN. Next, HandLandmarker.createFromOptions creates a hand landmarker configured for video (runningMode: "video"); for still images, use runningMode: "image" instead.&lt;br&gt;
Once everything is set up, the function calls detectHands() to start the actual hand detection loop.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 3:&lt;/strong&gt; Detect hands using detectForVideo.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;if (videoRef.current &amp;amp;&amp;amp; videoRef.current.readyState &amp;gt;= 2) {
const detections = handLandmarker.detectForVideo(videoRef.current, performance.now());

requestAnimationFrame(detectHands);
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is where the hand detection actually happens. The handLandmarker.detectForVideo function is called with two arguments:&lt;br&gt;
videoRef.current: the current video element.&lt;br&gt;
performance.now(): the current time in milliseconds, used to timestamp the frame and keep the detections in sync with the video.&lt;/p&gt;
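&lt;p&gt;The landmarks that come back are normalized: each x and y is in the 0-1 range relative to the frame, so drawing them means scaling by the canvas size (this is exactly what the drawLandmarks helper in the full code does). A quick sketch with a made-up sample landmark:&lt;/p&gt;

```javascript
// MediaPipe landmark coordinates are normalized to [0, 1]; scale them by the
// canvas dimensions to get pixel positions. `wrist` is a made-up sample point.
const toPixel = (landmark, canvasWidth, canvasHeight) => ({
  x: landmark.x * canvasWidth,
  y: landmark.y * canvasHeight,
});

const wrist = { x: 0.5, y: 0.25, z: 0 };
const p = toPixel(wrist, 600, 480);
// p is { x: 300, y: 120 } on a 600x480 canvas
```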

&lt;p&gt;requestAnimationFrame is a method that tells the browser to perform an animation and requests that the browser calls a specified function (in this case, detectHands) to update an animation before the next repaint. This creates a loop where detectHands is called repeatedly in sync with the browser's refresh rate.&lt;/p&gt;
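&lt;p&gt;One detail worth keeping: requestAnimationFrame returns an ID, and storing it lets the cleanup code stop the loop with cancelAnimationFrame when the component unmounts. Here is a self-contained sketch of that lifecycle, with tiny stand-ins for the two browser APIs so it runs anywhere (the real component uses the browser's own versions):&lt;/p&gt;

```javascript
// Minimal stand-ins for the browser APIs, only so this sketch is runnable
// outside a browser - the real app uses window.requestAnimationFrame.
const pending = new Map();
let nextId = 0;
const requestAnimationFrame = (cb) => { pending.set(++nextId, cb); return nextId; };
const cancelAnimationFrame = (id) => { pending.delete(id); };

let animationFrameId;
const detectHands = () => {
  // ...handLandmarker.detectForVideo(...) and drawing would happen here...
  animationFrameId = requestAnimationFrame(detectHands); // schedule the next frame
};

detectHands();                          // start the loop
cancelAnimationFrame(animationFrameId); // stop it - this is what the useEffect cleanup does
```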

&lt;p&gt;&lt;strong&gt;Step 4:&lt;/strong&gt; Start the webcam.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;const startWebcam = async () =&amp;gt; {
            try {
                const stream = await navigator.mediaDevices.getUserMedia({ video: true });
                videoRef.current.srcObject = stream;
                await initializeHandDetection();
            } catch (error) {
                console.error("Error accessing webcam:", error);
            }
        };

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;videoRef.current.srcObject = stream; Here, the video stream from the webcam is assigned to a video element (referred to by videoRef.current).&lt;/li&gt;
&lt;li&gt;await initializeHandDetection(); After the webcam starts, this line calls the initializeHandDetection function.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Step 5:&lt;/strong&gt; Clean up in useEffect.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;return () =&amp;gt; {
            if (videoRef.current &amp;amp;&amp;amp; videoRef.current.srcObject) {
                videoRef.current.srcObject.getTracks().forEach(track =&amp;gt; track.stop());
            }
            if (handLandmarker) {
                handLandmarker.close();
            }
            if (animationFrameId) {
                cancelAnimationFrame(animationFrameId);
            }
        };
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Inside the useEffect hook, we wire these functions together and return a cleanup function that stops the webcam tracks, closes the hand landmarker, and cancels the pending animation frame. Using useEffect this way makes sure everything starts and stops at the right time, keeping the app fast and reliable.&lt;br&gt;
Here's a summary with the complete code of our 'Demo' component: it uses the webcam to detect whether a hand is present and, alongside, displays a canvas showing all the hand landmarks - live video and graphical landmark representation combined in one component.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;em&gt;Here is the full code of Demo.js:&lt;/em&gt;&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import React, { useEffect, useRef, useState } from "react";
import { FilesetResolver, HandLandmarker } from "@mediapipe/tasks-vision";
import hand_landmarker_task from "../models/hand_landmarker.task";

const Demo = () =&amp;gt; {
    const videoRef = useRef(null);
    const canvasRef = useRef(null);
    const [handPresence, setHandPresence] = useState(null);

    useEffect(() =&amp;gt; {
        let handLandmarker;
        let animationFrameId;

        const initializeHandDetection = async () =&amp;gt; {
            try {
                const vision = await FilesetResolver.forVisionTasks(
                    "https://cdn.jsdelivr.net/npm/@mediapipe/tasks-vision@latest/wasm",
                );
                handLandmarker = await HandLandmarker.createFromOptions(
                    vision, {
                        baseOptions: { modelAssetPath: hand_landmarker_task },
                        numHands: 2,
                        runningMode: "video"
                    }
                );
                detectHands();
            } catch (error) {
                console.error("Error initializing hand detection:", error);
            }
        };

        const drawLandmarks = (landmarksArray) =&amp;gt; {
            const canvas = canvasRef.current;
            const ctx = canvas.getContext('2d');
            ctx.clearRect(0, 0, canvas.width, canvas.height);
            ctx.fillStyle = 'white';

            landmarksArray.forEach(landmarks =&amp;gt; {
                landmarks.forEach(landmark =&amp;gt; {
                    const x = landmark.x * canvas.width;
                    const y = landmark.y * canvas.height;

                    ctx.beginPath();
                    ctx.arc(x, y, 5, 0, 2 * Math.PI); // Draw a circle for each landmark
                    ctx.fill();
                });
            });
        };

        const detectHands = () =&amp;gt; {
            if (videoRef.current &amp;amp;&amp;amp; videoRef.current.readyState &amp;gt;= 2) {
                const detections = handLandmarker.detectForVideo(videoRef.current, performance.now());
                setHandPresence(detections.handednesses.length &amp;gt; 0);

                // Assuming detections.landmarks is an array of landmark objects
                if (detections.landmarks) {
                    drawLandmarks(detections.landmarks);
                }
            }
            animationFrameId = requestAnimationFrame(detectHands);
        };

        const startWebcam = async () =&amp;gt; {
            try {
                const stream = await navigator.mediaDevices.getUserMedia({ video: true });
                videoRef.current.srcObject = stream;
                await initializeHandDetection();
            } catch (error) {
                console.error("Error accessing webcam:", error);
            }
        };

        startWebcam();

        return () =&amp;gt; {
            if (videoRef.current &amp;amp;&amp;amp; videoRef.current.srcObject) {
                videoRef.current.srcObject.getTracks().forEach(track =&amp;gt; track.stop());
            }
            if (handLandmarker) {
                handLandmarker.close();
            }
            if (animationFrameId) {
                cancelAnimationFrame(animationFrameId);
            }
        };
    }, []);

    return (
        &amp;lt;&amp;gt;
        &amp;lt;h1&amp;gt;Is there a Hand? {handPresence ? "Yes" : "No"}&amp;lt;/h1&amp;gt;
        &amp;lt;div style={{ position: "relative" }}&amp;gt;
            &amp;lt;video ref={videoRef} autoPlay playsInline &amp;gt;&amp;lt;/video&amp;gt;
            &amp;lt;canvas ref={canvasRef} width={600} height={480} style={{ backgroundColor: "black", width: "600px", height: "480px" }}&amp;gt;&amp;lt;/canvas&amp;gt;
        &amp;lt;/div&amp;gt;
    &amp;lt;/&amp;gt;
    );
};

export default Demo;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It will show something like this:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--zU0-F5ir--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/myz6mdcayt8lmg7x0psr.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--zU0-F5ir--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/myz6mdcayt8lmg7x0psr.png" alt="Hand landmark showing on canvas" width="800" height="378"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;(Your webcam feed will appear where the purple screen is.)&lt;/p&gt;

&lt;p&gt;Thanks for reading :)&lt;/p&gt;

</description>
      <category>react</category>
      <category>handlandmark</category>
      <category>mediapipe</category>
      <category>computervision</category>
    </item>
  </channel>
</rss>
