At 3 AM, in a freezing dorm room in China, staring at a terminal full of Python logs, I asked myself a simple question:
> What if the air around me were the interface?
That question led me to build Lì Ào Engine (利奥) — a multimodal Human-Computer Interaction (HCI) system that transforms gestures, voice, and intent into real-time digital actions.
## Philosophy: From Constraint to Freedom
Traditional input devices—mouse and keyboard—are powerful, but limiting. They confine interaction to surfaces.
I wanted to break that boundary.
Lì Ào Engine is built on a simple idea:
> Human intention should be the interface.
## System Architecture (Clean & Scalable)
This is not a prototype-level project. I designed it with modularity and scalability in mind.
### Core Structure
- `src/system` → Core rendering layers (AI, Board, Layers, Media, Remote)
- `src/features` → Independent modules (ai, draw, galasy, gesture, image, move, prediction, remote, upload)
- `src/hooks` → Custom hooks for performance-critical logic
  - `useImageLoader`
  - `useMediaPipe`
  - `usePredictor`
  - `useRealTimeSync`
  - `useTracking`
  - `useVoiceCommand`
- `server/engine` (Python) → Vision processing + AI inference (WIP)
### Design Principles
- Separation of concerns (UI vs AI processing)
- Real-time responsiveness
- Low-latency data flow
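One concrete way to keep the data flow low-latency (a sketch of a common pattern, not necessarily how Lì Ào Engine implements it) is to coalesce gesture samples between network ticks and transmit only the freshest state, so a slow link never accumulates a backlog of stale positions. The class and names below are illustrative placeholders:

```typescript
// A gesture sample produced by the tracking loop (e.g. at 60 Hz).
type GestureSample = { x: number; y: number; t: number };

// Coalescing sender: many samples may arrive between network ticks,
// but for cursor-like state only the most recent one matters.
// The transport (e.g. a WebSocket's `send`) is injected.
class LatestStateSender {
  private pending: GestureSample | null = null;
  private sent = 0;

  constructor(private transport: (s: GestureSample) => void) {}

  // Called at tracking rate: overwrite the pending sample, never queue.
  push(sample: GestureSample): void {
    this.pending = sample;
  }

  // Called at network rate (e.g. 20-30 Hz): flush only the latest sample.
  flush(): void {
    if (this.pending !== null) {
      this.transport(this.pending);
      this.pending = null;
      this.sent++;
    }
  }

  get sentCount(): number {
    return this.sent;
  }
}
```

The key design choice is "last write wins": dropping intermediate samples trades completeness for freshness, which is usually the right trade for pointer-style interaction.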
## The Core Innovation: Omni-Precise V5.2 Engine
Gesture systems suffer from jitter and instability.
I built a custom smoothing and prediction pipeline:
1. **Stable EMA Smoothing**
   - Reduces noise from raw hand-tracking data
   - Produces fluid, natural motion
2. **Velocity-Based Stroke Dynamics**
   - Fast movement → thin lines
   - Slow movement → thicker strokes
   - Mimics real-world drawing behavior
3. **No-Gap Interpolation**
   - Fills missing points between frames
   - Ensures continuous stroke rendering
**Result:** near pixel-perfect air interaction.
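The three stages above can be sketched in TypeScript as follows. This is a minimal illustration of the technique, not the actual Omni-Precise code; the function names, the smoothing factor, and the width-mapping constant are my own placeholder choices:

```typescript
interface Point { x: number; y: number; }

// 1. Stable EMA smoothing: blend each raw sample with the previous
//    smoothed value to suppress per-frame tracking jitter.
function emaSmooth(prev: Point | null, raw: Point, alpha = 0.35): Point {
  if (prev === null) return raw; // first frame: nothing to blend with
  return {
    x: alpha * raw.x + (1 - alpha) * prev.x,
    y: alpha * raw.y + (1 - alpha) * prev.y,
  };
}

// 2. Velocity-based stroke dynamics: fast motion -> thin line,
//    slow motion -> thick line, clamped to a sensible range.
//    The scale factor 4 is an arbitrary tuning constant.
function strokeWidth(velocityPxPerMs: number, minW = 1, maxW = 8): number {
  const w = maxW - velocityPxPerMs * 4;
  return Math.min(maxW, Math.max(minW, w));
}

// 3. No-gap interpolation: insert evenly spaced points between two
//    frames so the rendered stroke has no holes.
function interpolate(a: Point, b: Point, maxGapPx = 2): Point[] {
  const dist = Math.hypot(b.x - a.x, b.y - a.y);
  const steps = Math.ceil(dist / maxGapPx);
  const pts: Point[] = [];
  for (let i = 1; i <= steps; i++) {
    const t = i / steps;
    pts.push({ x: a.x + (b.x - a.x) * t, y: a.y + (b.y - a.y) * t });
  }
  return pts;
}
```

Per frame, the raw landmark is smoothed, a width is derived from the distance moved since the last frame, and the segment between the previous and current smoothed points is densified before rendering.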
## Key Features (Live Beta)
- 🎨 Smart Air Drawing — Draw naturally in mid-air using hand gestures
- 🎤 Voice Commands — Hands-free control and system interaction
- 🗣️ Voice-to-Type — Real-time speech-to-text input
- 🔗 Remote Portal — Gesture-based transfer of images/data between devices

### 🚧 In Active Development
- 🖐️ Air Mouse — AI-powered cursor control using hand tracking (not yet publicly released)
## Open Technical Questions (Looking for Expert Insight)
I’d love feedback from experienced engineers:
1. **Real-Time Sync & Latency**
   - Best practices for minimizing lag in gesture-driven UI?
   - WebSocket vs WebRTC for real-time interaction?
2. **Jitter Reduction Techniques**
   - EMA vs Kalman filter in browser environments?
   - Trade-offs between performance and accuracy?
3. **Scaling the System**
   - How would you evolve this into a production-grade system?
   - Edge computing vs centralized processing?
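To make question 2 concrete, here is a minimal 1D constant-velocity Kalman filter sketch next to a plain EMA. Because the Kalman filter estimates velocity and extrapolates, it lags far less than EMA on steady motion; the noise parameters here are illustrative guesses, not tuned values:

```typescript
// Minimal 1D constant-velocity Kalman filter: state = [position, velocity],
// with position measurements. q = process noise, r = measurement noise.
class Kalman1D {
  private pos = 0;
  private vel = 0;
  private P: number[][] = [[1, 0], [0, 1]]; // state covariance
  constructor(private q = 0.01, private r = 1) {}

  update(z: number, dt = 1): number {
    // Predict: advance position by velocity, propagate covariance
    this.pos += this.vel * dt;
    const [[p00, p01], [p10, p11]] = this.P;
    const P00 = p00 + dt * (p10 + p01) + dt * dt * p11 + this.q;
    const P01 = p01 + dt * p11;
    const P10 = p10 + dt * p11;
    const P11 = p11 + this.q;
    // Update with position measurement z
    const S = P00 + this.r;        // innovation covariance
    const k0 = P00 / S;            // Kalman gain (position)
    const k1 = P10 / S;            // Kalman gain (velocity)
    const y = z - this.pos;        // innovation
    this.pos += k0 * y;
    this.vel += k1 * y;
    this.P = [
      [(1 - k0) * P00, (1 - k0) * P01],
      [P10 - k1 * P00, P11 - k1 * P01],
    ];
    return this.pos;
  }
}

// Plain EMA for comparison: no velocity model, so it lags moving targets.
function emaStep(prev: number, raw: number, alpha = 0.2): number {
  return alpha * raw + (1 - alpha) * prev;
}
```

On a steadily moving target (e.g. a hand sweeping across the frame), EMA settles into a constant lag of roughly `(1 - alpha) / alpha` samples, while the Kalman filter's velocity state absorbs the motion. The cost is more arithmetic per update and two noise parameters to tune.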
## 🌐 Live Demo
- Live demo: https://hci-system.netlify.app
- Source code: https://github.com/sadekul-me/HCI-System
- Portfolio: https://sadekulislam.netlify.app
## 💬 Feedback & Suggestions
I’d really appreciate your honest feedback on this system.
- What do you think about the overall architecture?
- How can I improve the real-time performance and scalability?
- What features or improvements would you suggest next?
I’m actively developing this project and open to learning from experienced engineers.
## ✨ Final Thought
I’m still a student.
But I believe building systems like this is how we push boundaries.
We don’t wait for the future. We build it.
— Sadekul Islam (Lì Ào / 利奥)
Software Engineering Student | HCI Explorer
I'm specifically looking for advice on reducing jitter in browser-based hand tracking. If anyone has experience with Kalman filters vs EMA in React, I'd love to chat!