twistedtransistor
Google's Hand & Gesture Recognition + my playground

Playground Git repo

Google product page

Why Hand & Gesture

Live view

When I first learned about this project's existence, I knew I had to try it out. If you are a kid of the 80s/90s like me and still haven't received a functioning Power Glove that your parents paid for ~35 years ago... you might be thinking... it's about time!

Were there other attempts over time? Sure, quite a lot, especially since modern VR headsets hit the market. But even though the Power Glove had far superior successors, something was still lacking: the promise of a unified controller and seamless interaction using just the human hand had never been fulfilled.

Why Next.js ViewTransitions

When I think of UIs that could benefit from hand tracking, the first thing that comes to mind is a multi-layered control panel. A user interacting with it should not experience typical HTTP loading screens between pages, but should instead be served a consistent, continuous environment that can keep growing while maintaining reasonable efficiency and low resource consumption.

What is the status

Well... I probably should have ended this simple test right after I added MediaPipe to a Next.js example project from GitHub and wired up the camera controls. But then I started to think about how it would work with this or that kind of slider.
How could I improve my own ease of interaction? How could I scale it to work on every machine I run it on?

This is why it is what it is now: a playground that can be easily adjusted by tweaking tracking confidence or gesture thresholds, and calibrated for every screen and resolution.

The project currently also contains some reusable components: an image slider, a horizontal page slider, articles to scroll through, a music player with a volume rocker...

What do I think

I think it is worth giving it a shot. It most probably will not resonate with everyone, but if you are willing to spend a bit of time getting used to it and tweaking the settings, then I believe it could be an enjoyable experience, and maybe the start of a new idea for a smart TV control app, a PC UI HUD, some gaming experience, or lots and lots more...

What's the future like

My plan is to add more and more elements and finally build THE control room within a Clear Canvas component, from which I will be able to interact with all the created tools using two hands, including both preconfigured and custom-made gestures.

Tech list

  • Next.js
  • React
  • MediaPipe
  • TypeScript
  • Tailwind CSS

Building and Running

This is very simple. Just use the following commands:

pnpm install
pnpm dev

Architecture Overview

The application follows a layered architecture with React Context providers managing state and MediaPipe handling computer vision processing:

┌─────────────────────────────────────────┐
│           Next.js App Router            │
├─────────────────────────────────────────┤
│         Context Providers Layer         │
│  ┌───────────────────────────────────┐  │
│  │ MediaPipeOptionsProvider          │  │
│  │ GestureOptionsProvider            │  │
│  │ CalibrationProvider               │  │
│  │ HandTrackingProvider              │  │
│  └───────────────────────────────────┘  │
├─────────────────────────────────────────┤
│           Hand Tracking Layer           │
│  ┌───────────────────────────────────┐  │
│  │ MediaPipe Hands API               │  │
│  │ Camera Utils                      │  │
│  │ Gesture Recognition               │  │
│  └───────────────────────────────────┘  │
├─────────────────────────────────────────┤
│              UI Components              │
│  ┌───────────────────────────────────┐  │
│  │ GestureCursor                     │  │
│  │ HandTrackingOverlay               │  │
│  │ MediaPipeControls                 │  │
│  └───────────────────────────────────┘  │
└─────────────────────────────────────────┘
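To make the layering concrete, here is a minimal sketch of how these providers might be nested in the root layout (the file path, nesting order, and @/ import alias are assumptions based on the provider descriptions below):

// app/layout.tsx (sketch) - configuration providers wrap the tracking provider
import type { ReactNode } from "react";
import { MediaPipeOptionsProvider } from "@/contexts/MediaPipeOptionsContext";
import { GestureOptionsProvider } from "@/contexts/GestureOptionsContext";
import { CalibrationProvider } from "@/contexts/CalibrationContext";
import { HandTrackingProvider } from "@/components/hand-tracking/HandTrackingProvider";

export default function RootLayout({ children }: { children: ReactNode }) {
  return (
    <html lang="en">
      <body>
        <MediaPipeOptionsProvider>
          <GestureOptionsProvider>
            <CalibrationProvider>
              {/* Consumes the three option/calibration contexts above */}
              <HandTrackingProvider>{children}</HandTrackingProvider>
            </CalibrationProvider>
          </GestureOptionsProvider>
        </MediaPipeOptionsProvider>
      </body>
    </html>
  );
}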

Routes Structure

The application uses Next.js App Router with the following route structure:

Main Routes

  • / (Home Page): Landing page with navigation to all demo routes and MediaPipe controls
  • /blog: Floating cards demo with ViewTransition animations
  • /card: Interactive card gallery with image transitions
  • /scroll-test: Horizontal slider controlled by hand gestures
  • /slider: Music player with a volume rocker to play around with (add any "further.mp3" to public/sfx first)
  • /clear-canvas: Canvas drawing application with gesture controls
  • /calibration: Hand tracking calibration interface

Route Functionality

Each route demonstrates different aspects of gesture control:

  1. Blog Route (/blog): Showcases ViewTransition API with floating card layouts
  2. Card Route (/card): Image gallery with smooth transitions between views
  3. Scroll Test (/scroll-test): Horizontal navigation using pinch-and-drag gestures
  4. Slider Route (/slider): Volume control using hand position mapping
  5. Clear Canvas (/clear-canvas): Drawing interface with gesture-based controls
  6. Calibration (/calibration): Setup interface for hand tracking accuracy

Context Providers & State Management

The application uses a hierarchical context provider system to manage different aspects of the hand tracking system:

1. MediaPipeOptionsProvider

Location: contexts/MediaPipeOptionsContext.tsx

Purpose: Manages MediaPipe hand detection configuration options

Key Features:

  • Persistent storage via localStorage
  • Real-time option updates
  • Default configuration management

Configuration Options:

interface MediaPipeOptions {
  runningMode: "IMAGE" | "VIDEO"; // Processing mode
  maxNumHands: number; // Max hands to detect (1-4)
  minHandDetectionConfidence: number; // Palm detection threshold (0-1)
  minHandPresenceConfidence: number; // Hand presence threshold (0-1)
  minTrackingConfidence: number; // Tracking confidence (0-1)
  modelComplexity: 0 | 1; // Model accuracy vs speed
  staticImageMode: boolean; // Static vs video processing
}
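To make the "persistent storage" and "real-time option updates" bullets concrete, here is a minimal sketch of how such a provider could be built (the storage key, default values, and exposed API are assumptions; the real implementation lives in contexts/MediaPipeOptionsContext.tsx):

"use client";
// Sketch of an options provider with localStorage persistence (values assumed).
import { createContext, useContext, useEffect, useState, type ReactNode } from "react";

const STORAGE_KEY = "mediaPipeOptions"; // assumed key

const defaultOptions: MediaPipeOptions = {
  runningMode: "VIDEO",
  maxNumHands: 2,
  minHandDetectionConfidence: 0.5,
  minHandPresenceConfidence: 0.5,
  minTrackingConfidence: 0.5,
  modelComplexity: 1,
  staticImageMode: false,
};

const OptionsContext = createContext<{
  options: MediaPipeOptions;
  setOptions: (options: MediaPipeOptions) => void;
} | null>(null);

export function MediaPipeOptionsProvider({ children }: { children: ReactNode }) {
  const [options, setOptions] = useState(defaultOptions);

  // Restore persisted options on mount
  useEffect(() => {
    const saved = localStorage.getItem(STORAGE_KEY);
    if (saved) setOptions({ ...defaultOptions, ...JSON.parse(saved) });
  }, []);

  // Persist every change so settings survive reloads
  useEffect(() => {
    localStorage.setItem(STORAGE_KEY, JSON.stringify(options));
  }, [options]);

  return (
    <OptionsContext.Provider value={{ options, setOptions }}>
      {children}
    </OptionsContext.Provider>
  );
}

export const useMediaPipeOptions = () => useContext(OptionsContext)!;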

2. GestureOptionsProvider

Location: contexts/GestureOptionsContext.tsx

Purpose: Manages gesture detection algorithms and thresholds

Key Features:

  • Multiple pinch detection modes
  • Configurable sensitivity thresholds
  • Algorithm switching capabilities

Detection Modes:

  • Original: Simple bone landmark distance calculation
  • Simple: Flesh-compensated distance with palm width normalization (see the sketch after this list)
  • Advanced: Multi-factor detection with confidence scoring
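As a concrete example of the Simple mode, here is a minimal sketch. The landmark indices follow the standard MediaPipe hand model (4 = thumb tip, 8 = index tip, 5 = index MCP, 17 = pinky MCP); the threshold value and exact normalization are assumptions:

// Sketch of the "Simple" mode: pinch distance normalized by palm width.
type Landmark = { x: number; y: number; z: number };

const dist = (a: Landmark, b: Landmark) =>
  Math.hypot(a.x - b.x, a.y - b.y, a.z - b.z);

function isPinchingSimple(landmarks: Landmark[], threshold = 0.35): boolean {
  const pinchDistance = dist(landmarks[4], landmarks[8]); // thumb tip to index tip
  const palmWidth = dist(landmarks[5], landmarks[17]);    // compensates for hand size and camera distance
  return pinchDistance / palmWidth < threshold;
}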

3. CalibrationProvider

Location: contexts/CalibrationContext.tsx

Purpose: Handles coordinate transformation between camera space and screen space

Key Features:

  • Screen resolution awareness
  • Coordinate transformation algorithms
  • Persistent calibration storage
  • Automatic invalidation on resolution changes

Calibration Data Structure:

interface CalibrationData {
  screenWidth: number; // Current screen width
  screenHeight: number; // Current screen height
  cameraAspectRatio: number; // Camera feed aspect ratio
  scaleFactorX: number; // X-axis scaling factor
  scaleFactorY: number; // Y-axis scaling factor
  offsetX: number; // X-axis offset
  offsetY: number; // Y-axis offset
}

4. HandTrackingProvider

Location: components/hand-tracking/HandTrackingProvider.tsx

Purpose: Core MediaPipe integration and hand detection processing

Key Features:

  • Dynamic MediaPipe script loading (sketched after this list)
  • Camera initialization and management
  • Real-time hand landmark processing
  • Error handling and recovery
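Here is a minimal sketch of the dynamic loading and camera setup, assuming the classic @mediapipe/hands and @mediapipe/camera_utils CDN bundles (which attach Hands and Camera to window); option values are illustrative and the error handling/recovery logic is omitted:

// Sketch: load MediaPipe scripts at runtime, then start hand tracking.
function loadScript(src: string): Promise<void> {
  return new Promise((resolve, reject) => {
    const script = document.createElement("script");
    script.src = src;
    script.onload = () => resolve();
    script.onerror = () => reject(new Error(`Failed to load ${src}`));
    document.head.appendChild(script);
  });
}

async function initHandTracking(video: HTMLVideoElement, onResults: (results: any) => void) {
  await loadScript("https://cdn.jsdelivr.net/npm/@mediapipe/hands/hands.js");
  await loadScript("https://cdn.jsdelivr.net/npm/@mediapipe/camera_utils/camera_utils.js");

  // Hands and Camera are attached to window by the CDN bundles
  const hands = new (window as any).Hands({
    locateFile: (file: string) => `https://cdn.jsdelivr.net/npm/@mediapipe/hands/${file}`,
  });
  hands.setOptions({ maxNumHands: 2, minDetectionConfidence: 0.5, minTrackingConfidence: 0.5 });
  hands.onResults(onResults);

  const camera = new (window as any).Camera(video, {
    onFrame: async () => hands.send({ image: video }), // feed every frame to the detector
    width: 640,
    height: 480,
  });
  await camera.start();
}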

Options-to-UI Integration

The system provides a comprehensive UI for configuring all hand tracking parameters:

MediaPipeControls Component

Location: components/MediaPipeControls.tsx

Integration Pattern:

  1. Context Consumption: Uses both MediaPipeOptions and GestureOptions contexts
  2. Real-time Updates: Changes immediately affect hand tracking behavior
  3. Persistent Storage: All settings automatically saved to localStorage
  4. Import/Export: JSON-based configuration sharing

UI Control Types:

  • SliderControl: Numeric range inputs with visual feedback (see the sketch after this list)
  • ToggleControl: Boolean switches with animated states
  • SelectControl: Dropdown menus for enumerated options
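For illustration, a minimal sketch of what the SliderControl could look like (props, markup, and class names are assumptions):

// Sketch of a SliderControl-style component: a labeled range input with live value readout.
interface SliderControlProps {
  label: string;
  value: number;
  min: number;
  max: number;
  step: number;
  onChange: (value: number) => void;
}

export function SliderControl({ label, value, min, max, step, onChange }: SliderControlProps) {
  return (
    <label className="flex items-center gap-2">
      <span className="w-48">{label}</span>
      <input
        type="range"
        min={min}
        max={max}
        step={step}
        value={value}
        onChange={(e) => onChange(Number(e.target.value))}
      />
      <span className="w-12 text-right">{value}</span> {/* visual feedback */}
    </label>
  );
}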

Control Categories:

  1. MediaPipe Core Settings:
    • Running mode selection
    • Hand count limits
    • Confidence thresholds
    • Model complexity
  2. Gesture Detection Settings:
    • Detection algorithm selection
    • Sensitivity thresholds per algorithm
    • Mode-specific parameter tuning
  3. Configuration Management:
    • Settings export/import (see the sketch after this list)
    • Reset to defaults
    • Real-time configuration preview
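A minimal sketch of the JSON export/import round trip (the combined payload shape and field names are assumptions):

// Sketch: bundle both option groups into one JSON payload for sharing.
interface SettingsPayload {
  mediaPipeOptions: MediaPipeOptions;
  gestureOptions: Record<string, unknown>;
}

function exportSettings(payload: SettingsPayload): string {
  return JSON.stringify(payload, null, 2); // pretty-printed for easy sharing
}

function importSettings(json: string): SettingsPayload {
  return JSON.parse(json) as SettingsPayload;
}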

Calibration System

Why Calibration is Necessary

The calibration system addresses fundamental challenges in hand tracking applications:

  1. Coordinate System Mismatch: MediaPipe returns normalized coordinates (0-1) from camera space, but UI interactions require pixel-perfect screen coordinates

  2. Camera-Screen Alignment: The camera's field of view rarely matches the screen's aspect ratio or position perfectly

  3. Perspective Distortion: Hand movements in 3D space need accurate mapping to 2D screen interactions

  4. User Variability: Different users have varying hand sizes, camera positions, and interaction preferences

Calibration Structure

Location: app/calibration/ route

Process Flow:

User Input → Camera Capture → Coordinate Mapping → Validation → Storage

Calibration Steps:

  1. Screen Resolution Detection: Captures current viewport dimensions
  2. Camera Aspect Ratio Calculation: Determines camera feed proportions
  3. Reference Point Mapping: User touches screen corners/edges while system records hand positions
  4. Transform Matrix Generation: Calculates scale factors and offsets for coordinate conversion (sketched after this list)
  5. Validation Testing: User tests accuracy across screen regions
  6. Persistent Storage: Saves calibration data with resolution validation
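Here is a minimal sketch of step 4, assuming two recorded reference samples (top-left and bottom-right corners); the names and the two-point approach are illustrative. Each sample pairs a recorded (already mirrored) hand position in normalized camera space with the known screen-pixel target the user was pointing at:

// Sketch: derive scale factors and offsets from two calibration samples,
// solving screen = scale * hand + offset for each axis.
interface Sample {
  hand: { x: number; y: number };   // normalized, mirrored camera coordinates
  screen: { x: number; y: number }; // known on-screen target in pixels
}

function computeCalibration(topLeft: Sample, bottomRight: Sample) {
  const scaleFactorX =
    (bottomRight.screen.x - topLeft.screen.x) / (bottomRight.hand.x - topLeft.hand.x);
  const scaleFactorY =
    (bottomRight.screen.y - topLeft.screen.y) / (bottomRight.hand.y - topLeft.hand.y);
  return {
    scaleFactorX,
    scaleFactorY,
    offsetX: topLeft.screen.x - scaleFactorX * topLeft.hand.x,
    offsetY: topLeft.screen.y - scaleFactorY * topLeft.hand.y,
  };
}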

Calibration Data Processing

Coordinate Transformation Algorithm:

const applyCalibration = (normalizedX: number, normalizedY: number) => {
  if (!calibrationData) {
    // Fallback: simple mapping
    return {
      x: (1 - normalizedX) * window.innerWidth,
      y: normalizedY * window.innerHeight,
    };
  }

  // Apply calibration transformation
  const handX = 1 - normalizedX; // Mirror horizontally
  const handY = normalizedY;

  return {
    x: calibrationData.scaleFactorX * handX + calibrationData.offsetX,
    y: calibrationData.scaleFactorY * handY + calibrationData.offsetY,
  };
};
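A quick usage sketch: landmark index 8 is the index fingertip in the MediaPipe hand model, and cursorElement is a hypothetical DOM node:

// Fragment: position a cursor element at the calibrated fingertip location.
const tip = landmarks[8]; // index fingertip, normalized camera coordinates
const { x, y } = applyCalibration(tip.x, tip.y);
cursorElement.style.transform = `translate(${x}px, ${y}px)`;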

Calibration Persistence:

  • Stored in localStorage as handTrackingCalibration
  • Automatically invalidated when screen resolution changes (see the sketch after this list)
  • Includes timestamp and validation checksums
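A minimal sketch of the load-with-invalidation path (timestamp and checksum handling is omitted for brevity, and comparing against window.innerWidth/innerHeight is an assumption about how the resolution check works):

// Sketch: load calibration, discarding it if the screen resolution changed.
const CALIBRATION_KEY = "handTrackingCalibration";

function loadCalibration(): CalibrationData | null {
  const raw = localStorage.getItem(CALIBRATION_KEY);
  if (!raw) return null;
  const data: CalibrationData = JSON.parse(raw);
  // Invalidate when the stored resolution no longer matches the current one
  if (data.screenWidth !== window.innerWidth || data.screenHeight !== window.innerHeight) {
    localStorage.removeItem(CALIBRATION_KEY);
    return null;
  }
  return data;
}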

Benefits of Calibration

  1. Accuracy: Precise cursor positioning across the entire screen
  2. Consistency: Reliable gesture recognition regardless of camera setup
  3. User Experience: Reduced frustration from misaligned interactions
  4. Adaptability: Works with various camera positions and screen sizes
  5. Performance: Optimized coordinate calculations reduce processing overhead

Gesture Recognition System

Detection Algorithms

The system implements three distinct gesture detection approaches:

1. Original Mode

  • Method: Direct 3D distance calculation between fingertip landmarks
  • Use Case: High precision applications requiring exact finger positioning
  • Pros: Simple, fast, precise for ideal conditions
  • Cons: Sensitive to hand size variations and camera distance

2. Simple Mode

  • Method: Palm-width normalized distance with flesh compensation
  • Use Case: General-purpose applications with varied users
  • Pros: Accounts for hand size differences, more forgiving
  • Cons: Less precise than original mode

3. Advanced Mode

  • Method: Multi-factor confidence scoring with weighted detection (sketched after this list)
  • Use Case: Professional applications requiring robust detection
  • Factors:
    • Fingertip distance (40% weight)
    • Joint distance (40% weight)
    • Finger curl detection (20% weight)
  • Pros: Most robust, handles edge cases, configurable
  • Cons: Higher computational cost, more complex tuning
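The weighting itself is straightforward. Here is a minimal sketch: the 40/40/20 weights come from the list above, while the individual factor scores and the decision threshold are assumptions:

// Sketch of the Advanced mode's weighted confidence score.
// Each factor score is assumed to be in [0, 1], higher = more pinch-like.
function advancedPinchConfidence(
  fingertipScore: number, // from fingertip distance
  jointScore: number,     // from joint distance
  curlScore: number,      // from finger curl detection
): number {
  return 0.4 * fingertipScore + 0.4 * jointScore + 0.2 * curlScore;
}

const isPinching = (confidence: number, threshold = 0.5) => confidence >= threshold;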
