DEV Community

Dexter

AI Face, Body, and Hand Pose Detection with Python and MediaPipe

In this tutorial, we will learn how to use Python and MediaPipe to perform real-time face, body, and hand pose detection using a webcam feed. MediaPipe provides pre-trained machine learning models for various tasks like facial landmark detection, hand tracking, and full-body pose estimation.

Prerequisites

Before we begin, ensure you have the following installed:

Python (3.6 or above)
pip package manager

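First, let's install the required dependencies. The leading "!" runs the command from a Jupyter notebook cell; omit it when installing from a terminal: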

!pip install mediapipe opencv-python

Now, let's import the necessary libraries:

import mediapipe as mp
import cv2

1. Get Realtime Webcam Feed

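cap = cv2.VideoCapture(0)  # 0 selects the default camera

while cap.isOpened():
    ret, frame = cap.read()
    if not ret:  # stop if the camera did not deliver a frame
        break

    cv2.imshow('Raw Webcam Feed', frame)

    # Quit when 'q' is pressed while the window has focus
    if cv2.waitKey(10) & 0xFF == ord('q'):
        break

cap.release()
cv2.destroyAllWindows()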

This code captures video from your webcam and displays it in real-time. Press 'q' to quit the window.
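
The quit check in the loop masks the value returned by cv2.waitKey with 0xFF before comparing it to ord('q'). As a minimal sketch of why, the helper below reproduces that comparison in plain Python. The name is_quit_key is hypothetical and not part of OpenCV. The sketch assumes that waitKey returns -1 when no key was pressed and that some platforms may set extra bits above the low byte of the key code:

```python
def is_quit_key(key_code: int, quit_char: str = "q") -> bool:
    """Return True if a cv2.waitKey-style key code matches quit_char.

    is_quit_key is a hypothetical helper for illustration only.
    """
    if key_code < 0:
        # Assumption: -1 means that no key was pressed during the wait.
        return False
    # Keep only the low byte so that extra high bits do not break the match.
    return (key_code & 0xFF) == ord(quit_char)


# A plain 'q' matches, and so does 'q' with an extra high bit set.
plain_q = is_quit_key(ord("q"))
masked_q = is_quit_key(0x100000 | ord("q"))
no_key = is_quit_key(-1)
```

Inside the capture loop, the check could then be written as "if is_quit_key(cv2.waitKey(10)): break".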

2. Make Detections from Feed

Detect Faces

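mp_drawing = mp.solutions.drawing_utils  # drawing helpers used below

cap = cv2.VideoCapture(0)

with mp.solutions.face_detection.FaceDetection(min_detection_confidence=0.5) as face_detection:

    while cap.isOpened():
        ret, frame = cap.read()
        if not ret:  # stop if the camera did not deliver a frame
            break

        # MediaPipe expects RGB input, while OpenCV delivers BGR frames
        image = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
        results = face_detection.process(image)

        # Draw each detection onto the original BGR frame
        if results.detections:
            for detection in results.detections:
                mp_drawing.draw_detection(frame, detection)

        cv2.imshow('Raw Webcam Feed', frame)

        if cv2.waitKey(10) & 0xFF == ord('q'):
            break

cap.release()
cv2.destroyAllWindows()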

This code detects faces in the webcam feed and draws a bounding box and six key points (eyes, nose tip, mouth center, and ears) on each one.
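
Each detection stores its bounding box in coordinates normalized to the frame size, so cropping or measuring a face requires converting them to pixels. The sketch below shows one way to do that conversion. The function name relative_box_to_pixels is hypothetical, and the clamping to the frame edges is an added assumption for boxes that extend past the image:

```python
from typing import Tuple


def relative_box_to_pixels(
    xmin: float,
    ymin: float,
    width: float,
    height: float,
    image_width: int,
    image_height: int,
) -> Tuple[int, int, int, int]:
    """Convert a normalized box to pixel corners (x0, y0, x1, y1).

    relative_box_to_pixels is a hypothetical helper for illustration only.
    """
    x0 = int(xmin * image_width)
    y0 = int(ymin * image_height)
    x1 = int((xmin + width) * image_width)
    y1 = int((ymin + height) * image_height)

    # Assumption: clamp to the frame so the corners are valid pixel indices.
    x0 = max(0, min(x0, image_width - 1))
    y0 = max(0, min(y0, image_height - 1))
    x1 = max(0, min(x1, image_width - 1))
    y1 = max(0, min(y1, image_height - 1))
    return x0, y0, x1, y1


# Example with made-up values: a box covering the middle of a 640x480 frame.
box = relative_box_to_pixels(0.25, 0.5, 0.5, 0.25, 640, 480)
```

In the detection loop, the inputs would come from detection.location_data.relative_bounding_box (its xmin, ymin, width, and height fields) and from frame.shape for the image size.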

Detect Face Landmarks, Hand Poses, and Body Poses

mp_drawing = mp.solutions.drawing_utils  # drawing helpers
mp_holistic = mp.solutions.holistic      # Holistic model and connection sets

cap = cv2.VideoCapture(0)

with mp_holistic.Holistic(min_detection_confidence=0.5, min_tracking_confidence=0.5) as holistic:

    while cap.isOpened():
        ret, frame = cap.read()
        if not ret:  # stop if the camera did not deliver a frame
            break

        # MediaPipe expects RGB input, while OpenCV delivers BGR frames
        image = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
        results = holistic.process(image)

        # Older MediaPipe releases named this set FACE_CONNECTIONS
        if results.face_landmarks:
            mp_drawing.draw_landmarks(frame, results.face_landmarks, mp_holistic.FACEMESH_CONTOURS)

        if results.right_hand_landmarks:
            mp_drawing.draw_landmarks(frame, results.right_hand_landmarks, mp_holistic.HAND_CONNECTIONS)

        if results.left_hand_landmarks:
            mp_drawing.draw_landmarks(frame, results.left_hand_landmarks, mp_holistic.HAND_CONNECTIONS)

        if results.pose_landmarks:
            mp_drawing.draw_landmarks(frame, results.pose_landmarks, mp_holistic.POSE_CONNECTIONS)

        cv2.imshow('Raw Webcam Feed', frame)

        if cv2.waitKey(10) & 0xFF == ord('q'):
            break

cap.release()
cv2.destroyAllWindows()

This code uses the Holistic model to detect face landmarks, both hand poses, and the body pose in the webcam feed, and draws the landmarks and their connections for each.
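
The landmarks can also be used for measurements instead of only being drawn. As a rough sketch, the code below converts normalized landmark coordinates to pixels and computes the angle at a joint from three points, for example shoulder, elbow, and wrist. The names to_pixels and joint_angle are hypothetical, and the coordinates are made-up values rather than real model output:

```python
import math
from typing import Tuple

Point = Tuple[float, float]


def to_pixels(x_norm: float, y_norm: float, image_width: int, image_height: int) -> Point:
    """Scale normalized landmark coordinates to pixel coordinates."""
    return x_norm * image_width, y_norm * image_height


def joint_angle(a: Point, b: Point, c: Point) -> float:
    """Return the angle in degrees at point b between segments b->a and b->c."""
    angle = math.degrees(
        math.atan2(c[1] - b[1], c[0] - b[0]) - math.atan2(a[1] - b[1], a[0] - b[0])
    )
    angle = abs(angle)
    # Report the inner angle, which lies between 0 and 180 degrees.
    return 360.0 - angle if angle > 180.0 else angle


# Made-up normalized coordinates on a 640x480 frame (not real detections).
shoulder = to_pixels(0.5, 0.25, 640, 480)
elbow = to_pixels(0.5, 0.5, 640, 480)
wrist = to_pixels(0.75, 0.5, 640, 480)
elbow_angle = joint_angle(shoulder, elbow, wrist)
```

The conversion to pixels happens first because x and y are normalized by different lengths (frame width and height), which would distort angles on a non-square frame. In the loop above, the points would come from entries such as results.pose_landmarks.landmark[mp_holistic.PoseLandmark.LEFT_ELBOW], using their x and y fields.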

Conclusion

Congratulations! You've learned how to perform real-time face, body, and hand pose detection using Python and MediaPipe. You can further explore these concepts and integrate them into your own projects for various applications like gesture recognition, augmented reality, and more. Feel free to experiment and enhance this tutorial to suit your specific needs. Happy coding!
