twistedtransistor
Google's Hand & Gesture Recognition + my playground

Playground Git repo

Google product page

Why Hand & Gesture

Live view

When I first learned about this project's existence, I knew I had to try it out. If you are a kid of the 80s/90s like me and still haven't received a functioning Power Glove that your parents paid for ~35 years ago... you might be thinking... it's about time!

Were there other attempts over time? Sure, quite a lot, especially since modern VR headsets hit the market. But even though the Power Glove had far superior successors, something was still lacking: the promise of a unified controller and seamless interaction using just the human hand had never been fulfilled.

Why Next.js ViewTransitions

When I think of UIs that could benefit from hand tracking, the first thing that comes to mind is a multi-layered control panel. A user interacting with it should not experience typical HTTP loading screens between pages, but should instead be served a consistent, continuous environment that can keep growing while maintaining reasonable efficiency and low resource consumption.

What is the status

Well... I probably should have ended this simple test right after I added MediaPipe to a Next.js example project from GitHub and wired up the camera controls. But then I started to think about how it would work with this or that kind of slider.
How could I improve my own ease of interaction? How could I scale it to work on every machine I run it on?

This is why it is what it is now: a playground that can be easily adjusted by tweaking tracking confidence or gesture thresholds, and calibrated for every screen and resolution.

The project currently also contains some reusable components: an image slider, a horizontal page slider, articles to scroll through, a music player with a volume rocker...

What do I think

I think it is worth giving it a shot. It most probably will not resonate with everyone, but if you are willing to spend a bit of time getting used to it and tweaking the settings, then I believe it could be an enjoyable experience, and maybe the start of a new idea for a smart TV control app, a PC UI HUD, some gaming experience, or lots and lots more...

What's the future like

My plan is to add more and more elements and finally build THE control room within a Clear Canvas component, from which I will be able to interact with all the created tools using two hands, including both preconfigured and custom-made gestures.

Tech list

  • Next.js
  • React
  • MediaPipe
  • TypeScript
  • Tailwind CSS

Building and Running

This is very simple. Just use the following commands:

pnpm install
pnpm dev

Architecture Overview

The application follows a layered architecture with React Context providers managing state and MediaPipe handling computer vision processing:

┌─────────────────────────────────────────┐
│           Next.js App Router            │
├─────────────────────────────────────────┤
│         Context Providers Layer         │
│  ┌───────────────────────────────────┐  │
│  │ MediaPipeOptionsProvider          │  │
│  │ GestureOptionsProvider            │  │
│  │ CalibrationProvider               │  │
│  │ HandTrackingProvider              │  │
│  └───────────────────────────────────┘  │
├─────────────────────────────────────────┤
│           Hand Tracking Layer           │
│  ┌───────────────────────────────────┐  │
│  │ MediaPipe Hands API               │  │
│  │ Camera Utils                      │  │
│  │ Gesture Recognition               │  │
│  └───────────────────────────────────┘  │
├─────────────────────────────────────────┤
│              UI Components              │
│  ┌───────────────────────────────────┐  │
│  │ GestureCursor                     │  │
│  │ HandTrackingOverlay               │  │
│  │ MediaPipeControls                 │  │
│  └───────────────────────────────────┘  │
└─────────────────────────────────────────┘
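To make the layering concrete, here is a minimal sketch of how these providers might be nested in the root layout (the file path, nesting order, and @/ import alias are assumptions based on the provider descriptions below):

// app/layout.tsx (sketch) - configuration providers wrap the tracking provider
import type { ReactNode } from "react";
import { MediaPipeOptionsProvider } from "@/contexts/MediaPipeOptionsContext";
import { GestureOptionsProvider } from "@/contexts/GestureOptionsContext";
import { CalibrationProvider } from "@/contexts/CalibrationContext";
import { HandTrackingProvider } from "@/components/hand-tracking/HandTrackingProvider";

export default function RootLayout({ children }: { children: ReactNode }) {
  return (
    <html lang="en">
      <body>
        <MediaPipeOptionsProvider>
          <GestureOptionsProvider>
            <CalibrationProvider>
              {/* Consumes the three option/calibration contexts above */}
              <HandTrackingProvider>{children}</HandTrackingProvider>
            </CalibrationProvider>
          </GestureOptionsProvider>
        </MediaPipeOptionsProvider>
      </body>
    </html>
  );
}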

Routes Structure

The application uses Next.js App Router with the following route structure:

Main Routes

  • / (Home Page): Landing page with navigation to all demo routes and MediaPipe controls
  • /blog: Floating cards demo with ViewTransition animations
  • /card: Interactive card gallery with image transitions
  • /scroll-test: Horizontal slider controlled by hand gestures
  • /slider: Music player with a volume rocker to play around with (add any "further.mp3" to public/sfx first)
  • /clear-canvas: Canvas drawing application with gesture controls
  • /calibration: Hand tracking calibration interface

Route Functionality

Each route demonstrates different aspects of gesture control:

  1. Blog Route (/blog): Showcases ViewTransition API with floating card layouts
  2. Card Route (/card): Image gallery with smooth transitions between views
  3. Scroll Test (/scroll-test): Horizontal navigation using pinch-and-drag gestures
  4. Slider Route (/slider): Volume control using hand position mapping
  5. Clear Canvas (/clear-canvas): Drawing interface with gesture-based controls
  6. Calibration (/calibration): Setup interface for hand tracking accuracy

Context Providers & State Management

The application uses a hierarchical context provider system to manage different aspects of the hand tracking system:

1. MediaPipeOptionsProvider

Location: contexts/MediaPipeOptionsContext.tsx

Purpose: Manages MediaPipe hand detection configuration options

Key Features:

  • Persistent storage via localStorage
  • Real-time option updates
  • Default configuration management

Configuration Options:

interface MediaPipeOptions {
  runningMode: "IMAGE" | "VIDEO"; // Processing mode
  maxNumHands: number; // Max hands to detect (1-4)
  minHandDetectionConfidence: number; // Palm detection threshold (0-1)
  minHandPresenceConfidence: number; // Hand presence threshold (0-1)
  minTrackingConfidence: number; // Tracking confidence (0-1)
  modelComplexity: 0 | 1; // Model accuracy vs speed
  staticImageMode: boolean; // Static vs video processing
}
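To make the "persistent storage" and "real-time option updates" bullets concrete, here is a minimal sketch of how such a provider could be built (the storage key, default values, and exposed API are assumptions; the real implementation lives in contexts/MediaPipeOptionsContext.tsx):

"use client";
// Sketch of an options provider with localStorage persistence (values assumed).
import { createContext, useContext, useEffect, useState, type ReactNode } from "react";

const STORAGE_KEY = "mediaPipeOptions"; // assumed key

const defaultOptions: MediaPipeOptions = {
  runningMode: "VIDEO",
  maxNumHands: 2,
  minHandDetectionConfidence: 0.5,
  minHandPresenceConfidence: 0.5,
  minTrackingConfidence: 0.5,
  modelComplexity: 1,
  staticImageMode: false,
};

const OptionsContext = createContext<{
  options: MediaPipeOptions;
  setOptions: (options: MediaPipeOptions) => void;
} | null>(null);

export function MediaPipeOptionsProvider({ children }: { children: ReactNode }) {
  const [options, setOptions] = useState(defaultOptions);

  // Restore persisted options on mount
  useEffect(() => {
    const saved = localStorage.getItem(STORAGE_KEY);
    if (saved) setOptions({ ...defaultOptions, ...JSON.parse(saved) });
  }, []);

  // Persist every change so settings survive reloads
  useEffect(() => {
    localStorage.setItem(STORAGE_KEY, JSON.stringify(options));
  }, [options]);

  return (
    <OptionsContext.Provider value={{ options, setOptions }}>
      {children}
    </OptionsContext.Provider>
  );
}

export const useMediaPipeOptions = () => useContext(OptionsContext)!;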

2. GestureOptionsProvider

Location: contexts/GestureOptionsContext.tsx

Purpose: Manages gesture detection algorithms and thresholds

Key Features:

  • Multiple pinch detection modes
  • Configurable sensitivity thresholds
  • Algorithm switching capabilities

Detection Modes:

  • Original: Simple bone landmark distance calculation
  • Simple: Flesh-compensated distance with palm width normalization (see the sketch after this list)
  • Advanced: Multi-factor detection with confidence scoring
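As a concrete example of the Simple mode, here is a minimal sketch. The landmark indices follow the standard MediaPipe hand model (4 = thumb tip, 8 = index tip, 5 = index MCP, 17 = pinky MCP); the threshold value and exact normalization are assumptions:

// Sketch of the "Simple" mode: pinch distance normalized by palm width.
type Landmark = { x: number; y: number; z: number };

const dist = (a: Landmark, b: Landmark) =>
  Math.hypot(a.x - b.x, a.y - b.y, a.z - b.z);

function isPinchingSimple(landmarks: Landmark[], threshold = 0.35): boolean {
  const pinchDistance = dist(landmarks[4], landmarks[8]); // thumb tip to index tip
  const palmWidth = dist(landmarks[5], landmarks[17]);    // compensates for hand size and camera distance
  return pinchDistance / palmWidth < threshold;
}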

3. CalibrationProvider

Location: contexts/CalibrationContext.tsx

Purpose: Handles coordinate transformation between camera space and screen space

Key Features:

  • Screen resolution awareness
  • Coordinate transformation algorithms
  • Persistent calibration storage
  • Automatic invalidation on resolution changes

Calibration Data Structure:

interface CalibrationData {
  screenWidth: number; // Current screen width
  screenHeight: number; // Current screen height
  cameraAspectRatio: number; // Camera feed aspect ratio
  scaleFactorX: number; // X-axis scaling factor
  scaleFactorY: number; // Y-axis scaling factor
  offsetX: number; // X-axis offset
  offsetY: number; // Y-axis offset
}

4. HandTrackingProvider

Location: components/hand-tracking/HandTrackingProvider.tsx

Purpose: Core MediaPipe integration and hand detection processing

Key Features:

  • Dynamic MediaPipe script loading (sketched after this list)
  • Camera initialization and management
  • Real-time hand landmark processing
  • Error handling and recovery
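Here is a minimal sketch of the dynamic loading and camera setup, assuming the classic @mediapipe/hands and @mediapipe/camera_utils CDN bundles (which attach Hands and Camera to window); option values are illustrative and the error handling/recovery logic is omitted:

// Sketch: load MediaPipe scripts at runtime, then start hand tracking.
function loadScript(src: string): Promise<void> {
  return new Promise((resolve, reject) => {
    const script = document.createElement("script");
    script.src = src;
    script.onload = () => resolve();
    script.onerror = () => reject(new Error(`Failed to load ${src}`));
    document.head.appendChild(script);
  });
}

async function initHandTracking(video: HTMLVideoElement, onResults: (results: any) => void) {
  await loadScript("https://cdn.jsdelivr.net/npm/@mediapipe/hands/hands.js");
  await loadScript("https://cdn.jsdelivr.net/npm/@mediapipe/camera_utils/camera_utils.js");

  // Hands and Camera are attached to window by the CDN bundles
  const hands = new (window as any).Hands({
    locateFile: (file: string) => `https://cdn.jsdelivr.net/npm/@mediapipe/hands/${file}`,
  });
  hands.setOptions({ maxNumHands: 2, minDetectionConfidence: 0.5, minTrackingConfidence: 0.5 });
  hands.onResults(onResults);

  const camera = new (window as any).Camera(video, {
    onFrame: async () => hands.send({ image: video }), // feed every frame to the detector
    width: 640,
    height: 480,
  });
  await camera.start();
}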

Options-to-UI Integration

The system provides a comprehensive UI for configuring all hand tracking parameters:

MediaPipeControls Component

Location: components/MediaPipeControls.tsx

Integration Pattern:

  1. Context Consumption: Uses both MediaPipeOptions and GestureOptions contexts
  2. Real-time Updates: Changes immediately affect hand tracking behavior
  3. Persistent Storage: All settings automatically saved to localStorage
  4. Import/Export: JSON-based configuration sharing

UI Control Types:

  • SliderControl: Numeric range inputs with visual feedback (see the sketch after this list)
  • ToggleControl: Boolean switches with animated states
  • SelectControl: Dropdown menus for enumerated options
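For illustration, a minimal sketch of what the SliderControl could look like (props, markup, and class names are assumptions):

// Sketch of a SliderControl-style component: a labeled range input with live value readout.
interface SliderControlProps {
  label: string;
  value: number;
  min: number;
  max: number;
  step: number;
  onChange: (value: number) => void;
}

export function SliderControl({ label, value, min, max, step, onChange }: SliderControlProps) {
  return (
    <label className="flex items-center gap-2">
      <span className="w-48">{label}</span>
      <input
        type="range"
        min={min}
        max={max}
        step={step}
        value={value}
        onChange={(e) => onChange(Number(e.target.value))}
      />
      <span className="w-12 text-right">{value}</span> {/* visual feedback */}
    </label>
  );
}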

Control Categories:

  1. MediaPipe Core Settings:
    • Running mode selection
    • Hand count limits
    • Confidence thresholds
    • Model complexity
  2. Gesture Detection Settings:
    • Detection algorithm selection
    • Sensitivity thresholds per algorithm
    • Mode-specific parameter tuning
  3. Configuration Management:
    • Settings export/import (see the sketch after this list)
    • Reset to defaults
    • Real-time configuration preview
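A minimal sketch of the JSON export/import round trip (the combined payload shape and field names are assumptions):

// Sketch: bundle both option groups into one JSON payload for sharing.
interface SettingsPayload {
  mediaPipeOptions: MediaPipeOptions;
  gestureOptions: Record<string, unknown>;
}

function exportSettings(payload: SettingsPayload): string {
  return JSON.stringify(payload, null, 2); // pretty-printed for easy sharing
}

function importSettings(json: string): SettingsPayload {
  return JSON.parse(json) as SettingsPayload;
}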

Calibration System

Why Calibration is Necessary

The calibration system addresses fundamental challenges in hand tracking applications:

  1. Coordinate System Mismatch: MediaPipe returns normalized coordinates (0-1) from camera space, but UI interactions require pixel-perfect screen coordinates

  2. Camera-Screen Alignment: The camera's field of view rarely matches the screen's aspect ratio or position perfectly

  3. Perspective Distortion: Hand movements in 3D space need accurate mapping to 2D screen interactions

  4. User Variability: Different users have varying hand sizes, camera positions, and interaction preferences

Calibration Structure

Location: app/calibration/ route

Process Flow:

User Input → Camera Capture → Coordinate Mapping → Validation → Storage

Calibration Steps:

  1. Screen Resolution Detection: Captures current viewport dimensions
  2. Camera Aspect Ratio Calculation: Determines camera feed proportions
  3. Reference Point Mapping: User touches screen corners/edges while system records hand positions
  4. Transform Matrix Generation: Calculates scale factors and offsets for coordinate conversion (sketched after this list)
  5. Validation Testing: User tests accuracy across screen regions
  6. Persistent Storage: Saves calibration data with resolution validation
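Here is a minimal sketch of step 4, assuming two recorded reference samples (top-left and bottom-right corners); the names and the two-point approach are illustrative. Each sample pairs a recorded (already mirrored) hand position in normalized camera space with the known screen-pixel target the user was pointing at:

// Sketch: derive scale factors and offsets from two calibration samples,
// solving screen = scale * hand + offset for each axis.
interface Sample {
  hand: { x: number; y: number };   // normalized, mirrored camera coordinates
  screen: { x: number; y: number }; // known on-screen target in pixels
}

function computeCalibration(topLeft: Sample, bottomRight: Sample) {
  const scaleFactorX =
    (bottomRight.screen.x - topLeft.screen.x) / (bottomRight.hand.x - topLeft.hand.x);
  const scaleFactorY =
    (bottomRight.screen.y - topLeft.screen.y) / (bottomRight.hand.y - topLeft.hand.y);
  return {
    scaleFactorX,
    scaleFactorY,
    offsetX: topLeft.screen.x - scaleFactorX * topLeft.hand.x,
    offsetY: topLeft.screen.y - scaleFactorY * topLeft.hand.y,
  };
}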

Calibration Data Processing

Coordinate Transformation Algorithm:

const applyCalibration = (normalizedX: number, normalizedY: number) => {
  if (!calibrationData) {
    // Fallback: simple mapping
    return {
      x: (1 - normalizedX) * window.innerWidth,
      y: normalizedY * window.innerHeight,
    };
  }

  // Apply calibration transformation
  const handX = 1 - normalizedX; // Mirror horizontally
  const handY = normalizedY;

  return {
    x: calibrationData.scaleFactorX * handX + calibrationData.offsetX,
    y: calibrationData.scaleFactorY * handY + calibrationData.offsetY,
  };
};
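A quick usage sketch: landmark index 8 is the index fingertip in the MediaPipe hand model, and cursorElement is a hypothetical DOM node:

// Fragment: position a cursor element at the calibrated fingertip location.
const tip = landmarks[8]; // index fingertip, normalized camera coordinates
const { x, y } = applyCalibration(tip.x, tip.y);
cursorElement.style.transform = `translate(${x}px, ${y}px)`;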

Calibration Persistence:

  • Stored in localStorage as handTrackingCalibration
  • Automatically invalidated when screen resolution changes (see the sketch after this list)
  • Includes timestamp and validation checksums
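A minimal sketch of the load-with-invalidation path (timestamp and checksum handling is omitted for brevity, and comparing against window.innerWidth/innerHeight is an assumption about how the resolution check works):

// Sketch: load calibration, discarding it if the screen resolution changed.
const CALIBRATION_KEY = "handTrackingCalibration";

function loadCalibration(): CalibrationData | null {
  const raw = localStorage.getItem(CALIBRATION_KEY);
  if (!raw) return null;
  const data: CalibrationData = JSON.parse(raw);
  // Invalidate when the stored resolution no longer matches the current one
  if (data.screenWidth !== window.innerWidth || data.screenHeight !== window.innerHeight) {
    localStorage.removeItem(CALIBRATION_KEY);
    return null;
  }
  return data;
}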

Benefits of Calibration

  1. Accuracy: Precise cursor positioning across the entire screen
  2. Consistency: Reliable gesture recognition regardless of camera setup
  3. User Experience: Reduced frustration from misaligned interactions
  4. Adaptability: Works with various camera positions and screen sizes
  5. Performance: Optimized coordinate calculations reduce processing overhead

Gesture Recognition System

Detection Algorithms

The system implements three distinct gesture detection approaches:

1. Original Mode

  • Method: Direct 3D distance calculation between fingertip landmarks
  • Use Case: High precision applications requiring exact finger positioning
  • Pros: Simple, fast, precise for ideal conditions
  • Cons: Sensitive to hand size variations and camera distance

2. Simple Mode

  • Method: Palm-width normalized distance with flesh compensation
  • Use Case: General-purpose applications with varied users
  • Pros: Accounts for hand size differences, more forgiving
  • Cons: Less precise than original mode

3. Advanced Mode

  • Method: Multi-factor confidence scoring with weighted detection (sketched after this list)
  • Use Case: Professional applications requiring robust detection
  • Factors:
    • Fingertip distance (40% weight)
    • Joint distance (40% weight)
    • Finger curl detection (20% weight)
  • Pros: Most robust, handles edge cases, configurable
  • Cons: Higher computational cost, more complex tuning
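The weighting itself is straightforward. Here is a minimal sketch: the 40/40/20 weights come from the list above, while the individual factor scores and the decision threshold are assumptions:

// Sketch of the Advanced mode's weighted confidence score.
// Each factor score is assumed to be in [0, 1], higher = more pinch-like.
function advancedPinchConfidence(
  fingertipScore: number, // from fingertip distance
  jointScore: number,     // from joint distance
  curlScore: number,      // from finger curl detection
): number {
  return 0.4 * fingertipScore + 0.4 * jointScore + 0.2 * curlScore;
}

const isPinching = (confidence: number, threshold = 0.5) => confidence >= threshold;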
