You know that moment when you're demoing gesture controls and the client asks "So we need to install what exactly?" I got tired of explaining native apps and Python dependencies, so I built a computer vision platform that works right in the browser—no installs required.
The Problem
Building gesture-controlled interfaces usually means:
- Installing Python, OpenCV, and a dozen dependencies
- Asking users to trust a native app download
- Wrestling with webcam permissions across different OSes
- Spinning up servers just to process hand movements
- Watching latency kill the "real-time" experience
The Solution: VisionPipe
What if hand tracking just worked in a browser tab?
```javascript
// Detect pinch gesture: distance between thumb tip and index tip
const distance = Math.sqrt(
  Math.pow(thumbTip.x - indexTip.x, 2) +
  Math.pow(thumbTip.y - indexTip.y, 2)
);
const angle = Math.min(distance * 500, 90);
```
That's real code: the thumb-to-index distance drives an animation angle, capped at 90 degrees. MediaPipe handles the detection; Three.js renders the visuals.
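The same distance can also be normalized into a pinch "strength" in [0, 1], which is easier to feed into animations than a raw distance. A minimal sketch, assuming landmarks normalized to [0, 1] image coordinates as MediaPipe reports them; the 0.25 "open hand" span is an illustrative calibration constant, not a VisionPipe value:

```javascript
// Map the thumb-to-index distance to a pinch strength in [0, 1].
// Landmarks are assumed normalized to [0, 1]; maxSpan is an
// illustrative calibration constant, not a library value.
function pinchStrength(thumbTip, indexTip, maxSpan = 0.25) {
  const distance = Math.hypot(thumbTip.x - indexTip.x, thumbTip.y - indexTip.y);
  // 1 = fingers touching, 0 = fully apart
  return 1 - Math.min(distance / maxSpan, 1);
}

// Example: fingers nearly touching yields a strength close to 1
const strength = pinchStrength({ x: 0.50, y: 0.50 }, { x: 0.52, y: 0.50 });
```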
How It Works
1. Browser loads MediaPipe - no install, just a CDN import
2. Webcam feeds landmarks - 21 hand points tracked in real time
3. JavaScript maps gestures - finger positions translate into actions
4. Three.js renders output - smooth 3D visualization at 60fps
The entire pipeline runs client-side. Your webcam feed never leaves your device.
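To make the gesture-mapping step concrete, here is a small, dependency-free sketch over the 21-landmark array. The indices (0 = wrist, 8/12/16/20 = fingertips, 5/9/13/17 = knuckles) follow the documented MediaPipe hand model; the open-palm/fist rule itself is an illustrative heuristic, not VisionPipe's actual classifier:

```javascript
// A finger counts as extended when its tip is farther from the
// wrist than its knuckle (a heuristic that works for an upright hand).
function isFingerExtended(landmarks, tip, mcp, wrist = 0) {
  const d = (a, b) => Math.hypot(a.x - b.x, a.y - b.y);
  return d(landmarks[tip], landmarks[wrist]) > d(landmarks[mcp], landmarks[wrist]);
}

// Classify open palm vs fist from fingertip/knuckle pairs
// (MediaPipe indices: tips 8/12/16/20, knuckles 5/9/13/17).
function classifyGesture(landmarks) {
  const fingers = [[8, 5], [12, 9], [16, 13], [20, 17]];
  const extended = fingers.filter(([tip, mcp]) => isFingerExtended(landmarks, tip, mcp)).length;
  return extended >= 3 ? "open-palm" : extended === 0 ? "fist" : "partial";
}
```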
Get Started in 30 Seconds
Browser (zero install):
Visit the Playground → Select effect → Grant camera → Done
Cloud API:
```shell
curl -X POST https://visionpipe3d.quochuy.dev/api/detect \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -F "image=@hand.jpg"
```
New accounts get 100 free API credits—no card required.
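If you're calling the endpoint from JavaScript rather than curl, the same request looks like this with `fetch`. The URL and Bearer header come from the curl example above; the response shape (`{ landmarks: [...] }`) is an assumption for illustration, not documented API output:

```javascript
// Send an image to the detection endpoint as multipart form data,
// mirroring the curl example. The response shape is an assumption.
async function detectHand(imageBlob, apiKey) {
  const form = new FormData();
  form.append("image", imageBlob, "hand.jpg");
  const res = await fetch("https://visionpipe3d.quochuy.dev/api/detect", {
    method: "POST",
    headers: { Authorization: `Bearer ${apiKey}` },
    body: form,
  });
  if (!res.ok) throw new Error(`Detection failed: ${res.status}`);
  return res.json();
}
```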
Features
| Feature | Browser SDK | Cloud API |
|---|---|---|
| Hand Landmark Detection | Yes | Yes |
| Real-time Tracking | Yes | Batch only |
| Offline Support | Yes (after load) | No |
| Processing Location | Client-side | Server-side |
| Latency | ~16ms | Network dependent |
| Privacy | Data stays local | Processed & discarded |
Pricing (Cloud API)
| Plan | Credits | Cost | Per Credit |
|---|---|---|---|
| Free | 100 | $0 | - |
| Starter | 1,000 | $10 | $0.010 |
| Growth | 5,000 | $40 | $0.008 |
| Scale | 20,000 | $150 | $0.0075 |
1 credit = 1 API call. Credits never expire. The Playground is always free.
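A quick arithmetic check on the per-credit column, using the plan numbers from the table above:

```javascript
// Per-credit cost for each paid plan, from the pricing table.
const plans = [
  { name: "Starter", credits: 1000, cost: 10 },
  { name: "Growth", credits: 5000, cost: 40 },
  { name: "Scale", credits: 20000, cost: 150 },
];
const perCredit = plans.map(p => ({ name: p.name, perCredit: p.cost / p.credits }));
```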
Why This Works
- Zero friction - Open a URL, grant camera, start detecting. No Python, no installs, no "works on my machine"
- Privacy by default - Browser SDK processes locally. Cloud API discards images immediately after processing
- Production ready - TLS 1.3 encryption, SHA-256 key hashing, GDPR compliant
- Works everywhere - Chrome, Firefox, Safari 14+, Edge. Any device with a webcam
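Before loading the SDK, it can be worth feature-detecting the camera API. A small sketch using the standard `navigator.mediaDevices` check; passing the navigator object as a parameter is purely for testability, not part of any VisionPipe API:

```javascript
// Returns true when the standard getUserMedia camera API is available.
// Takes the navigator object as a parameter so it can be exercised
// outside a browser.
function supportsWebcam(nav) {
  return !!(nav && nav.mediaDevices && typeof nav.mediaDevices.getUserMedia === "function");
}
```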
Try It
Head to the Playground and wave your hand. That's the entire onboarding.
For the Cloud API:
```shell
# Get your API key from the console
curl -X POST https://visionpipe3d.quochuy.dev/api/detect \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -F "image=@sample.jpg"
```
The 100 free credits are enough to build a proof-of-concept.
GitHub: quochuydev/visionpipe3d
Website: visionpipe3d.quochuy.dev
What gesture-based interfaces have you built or wanted to build? I'm curious what use cases people are exploring beyond the typical "air guitar" demos.