Playground Git repo
Google product page
Why Hand & Gesture
When I first learned of this project's existence, I knew I had to try it out. If you are a kid of the 80s/90s like me and still haven't received a functioning Power Glove your parents paid for ~35 years ago... you might be thinking... it's about time!
Were there other attempts over time? Sure, quite a lot, especially since modern VR headsets arrived on the market. But even though the Power Glove had far superior successors, something was still lacking: the promise of a unified controller and seamless interaction using just the human hand had never been fulfilled.
Why Next.js viewTransitions
When I think of UIs that could benefit from hand tracking solutions, what first comes to my mind is a multi-layered control panel whose user should not experience typical HTTP loading screens between pages, but instead be served a consistent, continuous environment that can keep growing while maintaining reasonable efficiency and low resource consumption.
What is the status
Well... I probably should have ended this simple test right after I added MediaPipe to a Next.js example project from GitHub and wired up camera controls. But then I started to think about how it would work with this or that kind of slider.
How can I improve my own ease of interaction? How can I make it scale to every machine I run it on?
This is why it is what it is now: a playground that can be easily adjusted by tweaking tracking confidence or gesture thresholds, and calibrated for every screen and resolution.
The project currently also contains some components that can be reused, like an image slider, a horizontal page slider, articles to scroll through, and a music player with a volume rocker...
What do I think
I think it is worth giving it a shot. It most probably will not resonate with everyone, but if you are willing to spend a bit of time getting used to it and tweaking the settings, then I believe it could be an enjoyable experience, and maybe even the start of a new idea for a smart TV control app, a PC UI HUD, a gaming experience, or lots and lots more...
What's the future like
My plan is to add more and more elements and finally build THE control room within a Clear Canvas component, from which I will be able to interact with all the created tools using two hands and both preconfigured and custom-made gestures.
Tech list
- Next.js: React framework providing the App Router
- React: UI library for the component layer
- MediaPipe: Google's computer vision framework providing hand tracking
- TypeScript: static typing across the codebase
- Tailwind CSS: utility-first styling
Building and Running
This is very simple. Just use the following commands:
pnpm install
pnpm dev
Architecture Overview
The application follows a layered architecture with React Context providers managing state and MediaPipe handling computer vision processing:
┌─────────────────────────────────────────┐
│           Next.js App Router            │
├─────────────────────────────────────────┤
│         Context Providers Layer         │
│ ┌─────────────────────────────────────┐ │
│ │      MediaPipeOptionsProvider       │ │
│ │       GestureOptionsProvider        │ │
│ │         CalibrationProvider         │ │
│ │        HandTrackingProvider         │ │
│ └─────────────────────────────────────┘ │
├─────────────────────────────────────────┤
│           Hand Tracking Layer           │
│ ┌─────────────────────────────────────┐ │
│ │         MediaPipe Hands API         │ │
│ │            Camera Utils             │ │
│ │         Gesture Recognition        │ │
│ └─────────────────────────────────────┘ │
├─────────────────────────────────────────┤
│              UI Components              │
│ ┌─────────────────────────────────────┐ │
│ │            GestureCursor            │ │
│ │         HandTrackingOverlay         │ │
│ │          MediaPipeControls          │ │
│ └─────────────────────────────────────┘ │
└─────────────────────────────────────────┘
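In code, that provider hierarchy could be expressed roughly like the sketch below; the import paths follow the Location lines later in this post, while the "@/" alias and the exact layout file contents are my assumptions:

// Sketch of app/layout.tsx nesting, assuming the "@/" path alias.
import { MediaPipeOptionsProvider } from "@/contexts/MediaPipeOptionsContext";
import { GestureOptionsProvider } from "@/contexts/GestureOptionsContext";
import { CalibrationProvider } from "@/contexts/CalibrationContext";
import { HandTrackingProvider } from "@/components/hand-tracking/HandTrackingProvider";

export default function RootLayout({ children }: { children: React.ReactNode }) {
  return (
    <html lang="en">
      <body>
        {/* Options and calibration contexts wrap the tracking provider,
            so hand detection can read all of its configuration. */}
        <MediaPipeOptionsProvider>
          <GestureOptionsProvider>
            <CalibrationProvider>
              <HandTrackingProvider>{children}</HandTrackingProvider>
            </CalibrationProvider>
          </GestureOptionsProvider>
        </MediaPipeOptionsProvider>
      </body>
    </html>
  );
}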
Routes Structure
The application uses Next.js App Router with the following route structure:
Main Routes
- / (Home Page): Landing page with navigation to all demo routes and MediaPipe controls
- /blog: Floating cards demo with ViewTransition animations
- /card: Interactive card gallery with image transitions
- /scroll-test: Horizontal slider controlled by hand gestures
- /slider: Play around with the music player (add any "further.mp3" to public/sfx first)
- /clear-canvas: Canvas drawing application with gesture controls
- /calibration: Hand tracking calibration interface
Route Functionality
Each route demonstrates different aspects of gesture control:
- Blog Route (/blog): Showcases the ViewTransition API with floating card layouts
- Card Route (/card): Image gallery with smooth transitions between views
- Scroll Test (/scroll-test): Horizontal navigation using pinch-and-drag gestures
- Slider Route (/slider): Volume control using hand position mapping
- Clear Canvas (/clear-canvas): Drawing interface with gesture-based controls
- Calibration (/calibration): Setup interface for hand tracking accuracy
Context Providers & State Management
The application uses a hierarchical context provider system to manage different aspects of the hand tracking system:
1. MediaPipeOptionsProvider
Location: contexts/MediaPipeOptionsContext.tsx
Purpose: Manages MediaPipe hand detection configuration options
Key Features:
- Persistent storage via localStorage
- Real-time option updates
- Default configuration management
Configuration Options:
interface MediaPipeOptions {
  runningMode: "IMAGE" | "VIDEO";       // Processing mode
  maxNumHands: number;                  // Max hands to detect (1-4)
  minHandDetectionConfidence: number;   // Palm detection threshold (0-1)
  minHandPresenceConfidence: number;    // Hand presence threshold (0-1)
  minTrackingConfidence: number;        // Tracking confidence (0-1)
  modelComplexity: 0 | 1;               // Model accuracy vs speed
  staticImageMode: boolean;             // Static vs video processing
}
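To make "persistent storage via localStorage" concrete, here is a minimal sketch of such a provider; the storage key, defaults, and hook name are assumptions for illustration, not the project's actual code:

"use client"; // Hooks require a client component in the App Router

import { createContext, useContext, useEffect, useState, ReactNode } from "react";

// Hypothetical storage key and defaults.
const STORAGE_KEY = "mediaPipeOptions";

const defaultOptions: MediaPipeOptions = {
  runningMode: "VIDEO",
  maxNumHands: 2,
  minHandDetectionConfidence: 0.5,
  minHandPresenceConfidence: 0.5,
  minTrackingConfidence: 0.5,
  modelComplexity: 1,
  staticImageMode: false,
};

const OptionsContext = createContext<{
  options: MediaPipeOptions;
  setOptions: (next: MediaPipeOptions) => void;
} | null>(null);

export function MediaPipeOptionsProvider({ children }: { children: ReactNode }) {
  // Initialize from localStorage so settings survive reloads.
  const [options, setOptions] = useState<MediaPipeOptions>(() => {
    if (typeof window === "undefined") return defaultOptions;
    const saved = localStorage.getItem(STORAGE_KEY);
    return saved ? { ...defaultOptions, ...JSON.parse(saved) } : defaultOptions;
  });

  // Persist every change immediately, giving real-time updates a durable copy.
  useEffect(() => {
    localStorage.setItem(STORAGE_KEY, JSON.stringify(options));
  }, [options]);

  return (
    <OptionsContext.Provider value={{ options, setOptions }}>
      {children}
    </OptionsContext.Provider>
  );
}

export const useMediaPipeOptions = () => useContext(OptionsContext);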
2. GestureOptionsProvider
Location: contexts/GestureOptionsContext.tsx
Purpose: Manages gesture detection algorithms and thresholds
Key Features:
- Multiple pinch detection modes
- Configurable sensitivity thresholds
- Algorithm switching capabilities
Detection Modes:
- Original: Simple bone landmark distance calculation
- Simple: Flesh-compensated distance with palm width normalization
- Advanced: Multi-factor detection with confidence scoring
3. CalibrationProvider
Location: contexts/CalibrationContext.tsx
Purpose: Handles coordinate transformation between camera space and screen space
Key Features:
- Screen resolution awareness
- Coordinate transformation algorithms
- Persistent calibration storage
- Automatic invalidation on resolution changes
Calibration Data Structure:
interface CalibrationData {
screenWidth: number; // Current screen width
screenHeight: number; // Current screen height
cameraAspectRatio: number; // Camera feed aspect ratio
scaleFactorX: number; // X-axis scaling factor
scaleFactorY: number; // Y-axis scaling factor
offsetX: number; // X-axis offset
offsetY: number; // Y-axis offset
}
4. HandTrackingProvider
Location: components/hand-tracking/HandTrackingProvider.tsx
Purpose: Core MediaPipe integration and hand detection processing
Key Features:
- Dynamic MediaPipe script loading
- Camera initialization and management
- Real-time hand landmark processing
- Error handling and recovery
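The dynamic script loading boils down to appending a script tag and waiting for it to be ready. A rough, generic sketch of the pattern (the loadScript helper is hypothetical, not the project's code):

// Hypothetical helper: load an external script once, resolving when ready.
const loadedScripts = new Map<string, Promise<void>>();

function loadScript(src: string): Promise<void> {
  if (loadedScripts.has(src)) return loadedScripts.get(src)!; // Avoid duplicate tags
  const promise = new Promise<void>((resolve, reject) => {
    const script = document.createElement("script");
    script.src = src;
    script.async = true;
    script.onload = () => resolve();
    script.onerror = () => reject(new Error(`Failed to load ${src}`));
    document.head.appendChild(script);
  });
  loadedScripts.set(src, promise);
  return promise;
}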
Options-to-UI Integration
The system provides a comprehensive UI for configuring all hand tracking parameters:
MediaPipeControls Component
Location: components/MediaPipeControls.tsx
Integration Pattern:
- Context Consumption: Uses both MediaPipeOptions and GestureOptions contexts
- Real-time Updates: Changes immediately affect hand tracking behavior
- Persistent Storage: All settings automatically saved to localStorage
- Import/Export: JSON-based configuration sharing
UI Control Types:
- SliderControl: Numeric range inputs with visual feedback
- ToggleControl: Boolean switches with animated states
- SelectControl: Dropdown menus for enumerated options
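To illustrate, a SliderControl in this spirit could be as small as the following sketch; the props and markup are assumptions:

// Hypothetical SliderControl: numeric range input with visual feedback.
interface SliderControlProps {
  label: string;
  value: number;
  min: number;
  max: number;
  step: number;
  onChange: (value: number) => void;
}

function SliderControl({ label, value, min, max, step, onChange }: SliderControlProps) {
  return (
    <label className="flex items-center gap-2">
      <span className="w-48">{label}</span>
      <input
        type="range"
        min={min}
        max={max}
        step={step}
        value={value}
        onChange={(e) => onChange(Number(e.target.value))}
      />
      {/* Show the current value next to the slider */}
      <span>{value}</span>
    </label>
  );
}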
Control Categories:
- MediaPipe Core Settings:
  - Running mode selection
  - Hand count limits
  - Confidence thresholds
  - Model complexity
- Gesture Detection Settings:
  - Detection algorithm selection
  - Sensitivity thresholds per algorithm
  - Mode-specific parameter tuning
- Configuration Management:
  - Settings export/import (see the sketch below)
  - Reset to defaults
  - Real-time configuration preview
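The JSON-based export/import mentioned above fits in a few lines; the function names, file name, and combined settings shape are assumptions:

// Hypothetical helpers for JSON-based configuration sharing.
function exportSettings(settings: object): void {
  const blob = new Blob([JSON.stringify(settings, null, 2)], {
    type: "application/json",
  });
  const url = URL.createObjectURL(blob);
  const a = document.createElement("a");
  a.href = url;
  a.download = "hand-tracking-settings.json"; // Assumed file name
  a.click();
  URL.revokeObjectURL(url);
}

async function importSettings(file: File): Promise<object> {
  // Caller should validate the parsed shape before applying it.
  return JSON.parse(await file.text());
}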
Calibration System
Why Calibration is Necessary
The calibration system addresses fundamental challenges in hand tracking applications:
- Coordinate System Mismatch: MediaPipe returns normalized coordinates (0-1) from camera space, but UI interactions require pixel-perfect screen coordinates
- Camera-Screen Alignment: The camera's field of view rarely matches the screen's aspect ratio or position perfectly
- Perspective Distortion: Hand movements in 3D space need accurate mapping to 2D screen interactions
- User Variability: Different users have varying hand sizes, camera positions, and interaction preferences
Calibration Structure
Location: app/calibration/ route
Process Flow:
User Input → Camera Capture → Coordinate Mapping → Validation → Storage
Calibration Steps:
- Screen Resolution Detection: Captures current viewport dimensions
- Camera Aspect Ratio Calculation: Determines camera feed proportions
- Reference Point Mapping: User touches screen corners/edges while system records hand positions
- Transform Matrix Generation: Calculates scale factors and offsets for coordinate conversion
- Validation Testing: User tests accuracy across screen regions
- Persistent Storage: Saves calibration data with resolution validation
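The Transform Matrix Generation step above reduces to solving screen = scale * hand + offset per axis. A minimal sketch, assuming just two reference points (a real calibration flow would likely record more points and validate them):

// Sketch: derive scale factors and offsets from two reference samples.
// Each sample pairs a known screen point with the hand position MediaPipe
// reported (already mirrored into screen orientation, normalized 0-1).
interface ReferenceSample {
  screenX: number; screenY: number;  // Known on-screen target (pixels)
  handX: number; handY: number;      // Recorded normalized hand position
}

function computeTransform(topLeft: ReferenceSample, bottomRight: ReferenceSample) {
  const scaleFactorX =
    (bottomRight.screenX - topLeft.screenX) / (bottomRight.handX - topLeft.handX);
  const scaleFactorY =
    (bottomRight.screenY - topLeft.screenY) / (bottomRight.handY - topLeft.handY);
  return {
    scaleFactorX,
    scaleFactorY,
    // Solve screen = scale * hand + offset for the offset terms.
    offsetX: topLeft.screenX - scaleFactorX * topLeft.handX,
    offsetY: topLeft.screenY - scaleFactorY * topLeft.handY,
  };
}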
Calibration Data Processing
Coordinate Transformation Algorithm:
const applyCalibration = (normalizedX: number, normalizedY: number) => {
  if (!calibrationData) {
    // Fallback: simple mapping
    return {
      x: (1 - normalizedX) * window.innerWidth,
      y: normalizedY * window.innerHeight,
    };
  }

  // Apply calibration transformation
  const handX = 1 - normalizedX; // Mirror horizontally
  const handY = normalizedY;
  return {
    x: calibrationData.scaleFactorX * handX + calibrationData.offsetX,
    y: calibrationData.scaleFactorY * handY + calibrationData.offsetY,
  };
};
Calibration Persistence:
- Stored in localStorage as handTrackingCalibration
- Automatically invalidated when screen resolution changes
- Includes timestamp and validation checksums
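A sketch of that load/save logic, assuming invalidation happens at load time (the timestamp is stored, and the checksum details are omitted here):

// Sketch of load/save with resolution-based invalidation.
const CALIBRATION_KEY = "handTrackingCalibration";

function saveCalibration(data: CalibrationData): void {
  localStorage.setItem(
    CALIBRATION_KEY,
    JSON.stringify({ ...data, savedAt: Date.now() })
  );
}

function loadCalibration(): CalibrationData | null {
  const raw = localStorage.getItem(CALIBRATION_KEY);
  if (!raw) return null;
  const stored = JSON.parse(raw) as CalibrationData & { savedAt: number };
  // Invalidate stale data when the screen resolution has changed.
  if (
    stored.screenWidth !== window.innerWidth ||
    stored.screenHeight !== window.innerHeight
  ) {
    localStorage.removeItem(CALIBRATION_KEY);
    return null;
  }
  return stored;
}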
Benefits of Calibration
- Accuracy: Precise cursor positioning across the entire screen
- Consistency: Reliable gesture recognition regardless of camera setup
- User Experience: Reduced frustration from misaligned interactions
- Adaptability: Works with various camera positions and screen sizes
- Performance: Optimized coordinate calculations reduce processing overhead
Gesture Recognition System
Detection Algorithms
The system implements three distinct gesture detection approaches:
1. Original Mode
- Method: Direct 3D distance calculation between fingertip landmarks
- Use Case: High precision applications requiring exact finger positioning
- Pros: Simple, fast, precise for ideal conditions
- Cons: Sensitive to hand size variations and camera distance
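As a sketch, using MediaPipe's landmark indices (4 = thumb tip, 8 = index fingertip); the threshold default is an assumption:

type Landmark = { x: number; y: number; z: number };

// Sketch: straight 3D distance between thumb tip and index fingertip.
function isPinchOriginal(landmarks: Landmark[], threshold = 0.05): boolean {
  const thumbTip = landmarks[4];
  const indexTip = landmarks[8];
  const dist = Math.hypot(
    thumbTip.x - indexTip.x,
    thumbTip.y - indexTip.y,
    thumbTip.z - indexTip.z
  );
  return dist < threshold; // Sensitive to hand size and camera distance
}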
2. Simple Mode
- Method: Palm-width normalized distance with flesh compensation
- Use Case: General-purpose applications with varied users
- Pros: Accounts for hand size differences, more forgiving
- Cons: Less precise than original mode
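A sketch of the normalization idea, reusing the Landmark type from the previous sketch; landmarks 5 and 17 are the index and pinky knuckles, and the compensation constant is a placeholder, not the project's value:

// Sketch: normalize pinch distance by palm width so hand size and camera
// distance roughly cancel out, then subtract a flesh-compensation term
// (fingertips never touch bone-to-bone).
function isPinchSimple(landmarks: Landmark[], threshold = 0.35): boolean {
  const palmWidth = Math.hypot(
    landmarks[5].x - landmarks[17].x,
    landmarks[5].y - landmarks[17].y
  );
  const pinchDist = Math.hypot(
    landmarks[4].x - landmarks[8].x,
    landmarks[4].y - landmarks[8].y
  );
  const FLESH_COMPENSATION = 0.1; // Placeholder constant
  return pinchDist / palmWidth - FLESH_COMPENSATION < threshold;
}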
3. Advanced Mode
- Method: Multi-factor confidence scoring with weighted detection
- Use Case: Professional applications requiring robust detection
- Factors:
  - Fingertip distance (40% weight)
  - Joint distance (40% weight)
  - Finger curl detection (20% weight)
- Pros: Most robust, handles edge cases, configurable
- Cons: Higher computational cost, more complex tuning
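Combining the stated weights, a confidence score in this spirit might look like the sketch below; the three sub-scores stand in for whatever the project actually computes, and the default threshold is an assumption:

// Sketch: combine sub-scores with the 40/40/20 weighting described above.
// Each score is assumed to be normalized to the 0-1 range already.
interface PinchFactors {
  fingertipScore: number; // From fingertip distance
  jointScore: number;     // From joint distance
  curlScore: number;      // From finger curl detection
}

function pinchConfidence(f: PinchFactors): number {
  return 0.4 * f.fingertipScore + 0.4 * f.jointScore + 0.2 * f.curlScore;
}

function isPinchAdvanced(f: PinchFactors, threshold = 0.7): boolean {
  return pinchConfidence(f) >= threshold;
}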