You know that moment when you're demoing gesture controls and the client asks "So we need to install what exactly?" I got tired of explaining native apps and Python dependencies, so I built a computer vision platform that works right in the browser—no installs required.
The Problem
Building gesture-controlled interfaces usually means:
- Installing Python, OpenCV, and a dozen dependencies
- Asking users to trust a native app download
- Wrestling with webcam permissions across different OSes
- Spinning up servers just to process hand movements
- Watching latency kill the "real-time" experience
The Solution: VisionPipe
What if hand tracking just worked in a browser tab?
```javascript
// Detect pinch gesture - thumb tip (4) to index tip (8)
const distance = Math.sqrt(
  Math.pow(thumbTip.x - indexTip.x, 2) +
  Math.pow(thumbTip.y - indexTip.y, 2)
);
const angle = Math.min(distance * 500, 90);
```
That's real code from the branch-opening sample—mapping a pinch gesture to control 3D animations. MediaPipe handles detection, Three.js renders the visuals.
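On the rendering side, that angle can drive a Three.js transform directly. Here's a minimal sketch of one way to wire it up (the `branchPivot` name is illustrative, not the sample's actual object):

```ts
import * as THREE from "three";

// Stand-in for whatever object the branch animation rotates.
const branchPivot = new THREE.Group();

// Apply the pinch-derived angle (0-90 degrees) each frame.
function applyPinch(angleDegrees: number) {
  branchPivot.rotation.z = THREE.MathUtils.degToRad(angleDegrees);
}
```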
How It Works
1. Browser loads MediaPipe - No install, just a CDN import
2. Webcam feeds landmarks - 21 hand points tracked in real time
3. JavaScript maps gestures - Translates finger positions into actions
4. Three.js renders output - Smooth 3D visualization at 60fps
The entire pipeline runs client-side. Your webcam feed never leaves your device.
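VisionPipe wraps this for you, but steps 1 and 2 are easy to see in raw form. A bare-bones sketch using the public @mediapipe/tasks-vision package (available via npm or a CDN bundle; the model URL below is Google's hosted hand landmarker, and a `<video>` element already streaming the webcam is assumed):

```ts
import { FilesetResolver, HandLandmarker } from "@mediapipe/tasks-vision";

// Load the WASM runtime, then the hand landmark model hosted by Google.
const vision = await FilesetResolver.forVisionTasks(
  "https://cdn.jsdelivr.net/npm/@mediapipe/tasks-vision@latest/wasm"
);
const landmarker = await HandLandmarker.createFromOptions(vision, {
  baseOptions: {
    modelAssetPath:
      "https://storage.googleapis.com/mediapipe-models/hand_landmarker/hand_landmarker/float16/1/hand_landmarker.task",
  },
  runningMode: "VIDEO",
  numHands: 1,
});

const video = document.querySelector("video")!;
function track() {
  // Each frame yields up to numHands sets of 21 normalized (x, y, z) points.
  const result = landmarker.detectForVideo(video, performance.now());
  if (result.landmarks.length > 0) {
    const hand = result.landmarks[0];
    // hand[4] is the thumb tip, hand[8] the index tip: plug them into the
    // pinch snippet above.
  }
  requestAnimationFrame(track);
}
requestAnimationFrame(track);
```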
Tech Stack
Built with Next.js 16, React 19, and TypeScript. The platform includes:
- Clerk for authentication
- Stripe for payments
- Drizzle ORM with PostgreSQL
- Fumadocs for documentation
- MediaPipe for hand detection
- Three.js for 3D visualization
Get Started in 30 Seconds
Browser (zero install):
Visit the Playground → Select effect → Grant camera → Done
The Playground is always free and unlimited.
Cloud API:
```bash
curl -X POST https://visionpipe3d.quochuy.dev/api/v1/sessions \
  -H "Authorization: Bearer YOUR_API_KEY"
```
Returns a shareable playground URL with a JWT token valid for 24 hours.
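The full response body isn't spelled out here, but based on that description it presumably carries something like the shape below. Only `playgroundUrl` is confirmed; the other field names are illustrative guesses:

```ts
// Assumed response shape for POST /api/v1/sessions. Only playgroundUrl is
// confirmed; token and expiresAt are illustrative assumptions.
interface SessionResponse {
  playgroundUrl: string; // shareable playground link
  token: string;         // JWT, valid for 24 hours
  expiresAt: string;     // ISO 8601 expiry timestamp (assumed)
}
```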
Features
| Feature | Playground | Cloud API |
|---|---|---|
| Hand Landmark Detection | Yes | Yes |
| Real-time Tracking | Yes | Via sessions |
| Offline Support | Yes (after load) | No |
| Processing Location | Client-side | Server-side |
| Rate Limit | Unlimited | 100 req/min |
| Privacy | Data stays local | Processed & discarded |
Supported formats: JPEG, PNG, WebP, GIF (up to 10MB)
Browser support: Chrome 80+, Firefox 75+, Safari 14+, Edge 80+
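If you embed the playground and want to gate it on the same requirements, a quick capability check covers both the webcam and WebGL. These are standard browser APIs, nothing VisionPipe-specific:

```ts
// Returns true when the browser can run the playground: a webcam API for
// MediaPipe input and a WebGL context for Three.js output.
function canRunPlayground(): boolean {
  const hasCamera =
    typeof navigator.mediaDevices?.getUserMedia === "function";
  const canvas = document.createElement("canvas");
  const hasWebGL = !!(
    canvas.getContext("webgl2") || canvas.getContext("webgl")
  );
  return hasCamera && hasWebGL;
}
```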
Pricing (Cloud API)
| Plan | Credits | Cost | Per Credit |
|---|---|---|---|
| Free | 3 | $0 | - |
| Starter | 1,000 | $10 | $0.010 |
| Growth | 5,000 | $40 | $0.008 |
| Scale | 20,000 | $150 | $0.0075 |
1 credit = 1 API call. Credits never expire. No monthly subscriptions—pay only for what you use.
Enterprise? Custom volume pricing, private endpoints, a 99.9% uptime SLA, and dedicated support are available.
Why This Works
- Zero friction - Open a URL, grant camera, start detecting. No Python, no installs, no "works on my machine"
- Privacy by default - Playground processes locally. Cloud API discards images immediately after processing
- Production ready - TLS 1.3 encryption, SHA-256 key hashing, GDPR compliant
- Works everywhere - Any modern browser with WebGL support. Any device with a webcam
Sample Projects
Branch Opening Demo
An interactive visualization where pinch gestures control branch animations. The sample shows how to:
- Initialize the HandTracker from the VisionPipe SDK
- Calculate distances between landmarks (thumb tip at position 4, index tip at position 8)
- Map physical movements to animation parameters
Check the samples documentation for the full walkthrough.
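For a rough feel of those three steps, here's a sketch. Note that the HandTracker surface shown is inferred from the description above, not the SDK's confirmed API:

```ts
// Hypothetical SDK surface, inferred from the sample description.
interface Landmark { x: number; y: number; z: number }
declare class HandTracker {
  constructor(opts: { video: HTMLVideoElement });
  onFrame(cb: (landmarks: Landmark[]) => void): void;
}

const tracker = new HandTracker({ video: document.querySelector("video")! });
tracker.onFrame((landmarks) => {
  const thumbTip = landmarks[4]; // landmark 4: thumb tip
  const indexTip = landmarks[8]; // landmark 8: index tip
  const distance = Math.hypot(
    thumbTip.x - indexTip.x,
    thumbTip.y - indexTip.y
  );
  // Same mapping as the pinch snippet earlier: distance -> 0-90 degree angle.
  const angle = Math.min(distance * 500, 90);
  // ...feed angle into the branch animation parameters.
});
```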
Try It
Head to the Playground and wave your hand. That's the entire onboarding.
For the Cloud API:
```bash
# 1. Sign in and get your API key from the Cloud Console
# 2. Create a session
curl -X POST https://visionpipe3d.quochuy.dev/api/v1/sessions \
  -H "Authorization: Bearer vp_live_YOUR_KEY"
```
The response includes a playgroundUrl you can embed or share.
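In practice that means session creation can live in a small server-side helper. A minimal sketch, assuming the response shape noted earlier and an API key stored in an environment variable (`VISIONPIPE_API_KEY` is an assumed name):

```ts
// Create a session server-side and hand the playground URL to your frontend.
const res = await fetch("https://visionpipe3d.quochuy.dev/api/v1/sessions", {
  method: "POST",
  headers: { Authorization: `Bearer ${process.env.VISIONPIPE_API_KEY}` },
});
if (!res.ok) throw new Error(`Session creation failed: ${res.status}`);
const { playgroundUrl } = (await res.json()) as { playgroundUrl: string };
console.log("Embed or share:", playgroundUrl);
```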
GitHub: quochuydev/visionpipe3d
Docs: visionpipe3d.quochuy.dev/docs
Playground: visionpipe3d.quochuy.dev/playground
What gesture-based interfaces have you built or wanted to build? I'm curious what use cases people are exploring beyond the typical "air guitar" demos.