At 3 AM, in a freezing dorm room in China, staring at a terminal full of Python logs, I asked myself a simple question:
> What if the air around me were the interface?
That question led me to build Lì Ào Engine (利奥) — a multimodal Human-Computer Interaction (HCI) system that transforms gestures, voice, and intent into real-time digital actions.
## Philosophy: From Constraint to Freedom
Traditional input devices—mouse and keyboard—are powerful, but limiting. They confine interaction to surfaces.
I wanted to break that boundary.
Lì Ào Engine is built on a simple idea:
> Human intention should be the interface.
## System Architecture (Clean & Scalable)
This is not a prototype-level project. I designed it with modularity and scalability in mind.
### Core Structure
- `src/system` → Core rendering layers (AI, Board, Layers, Media, Remote)
- `src/features` → Independent modules (ai, draw, galasy, gesture, image, move, prediction, remote, upload)
- `src/hooks` → Custom hooks for performance-critical logic
  - `useImageLoader`
  - `useMediaPipe`
  - `usePredictor`
  - `useRealTimeSync`
  - `useTracking`
  - `useVoiceCommand`
- `server/engine` (Python) → Vision processing + AI inference (WIP)
### Design Principles
- Separation of concerns (UI vs AI processing)
- Real-time responsiveness
- Low-latency data flow
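One concrete way to keep the data flow low-latency (a sketch of a common pattern, not necessarily how Lì Ào Engine implements it) is to coalesce gesture samples between network ticks and transmit only the freshest state, so a slow link never accumulates a backlog of stale positions. The class and names below are illustrative placeholders:

```typescript
// A gesture sample produced by the tracking loop (e.g. at 60 Hz).
type GestureSample = { x: number; y: number; t: number };

// Coalescing sender: many samples may arrive between network ticks,
// but for cursor-like state only the most recent one matters.
// The transport (e.g. a WebSocket's `send`) is injected.
class LatestStateSender {
  private pending: GestureSample | null = null;
  private sent = 0;

  constructor(private transport: (s: GestureSample) => void) {}

  // Called at tracking rate: overwrite the pending sample, never queue.
  push(sample: GestureSample): void {
    this.pending = sample;
  }

  // Called at network rate (e.g. 20-30 Hz): flush only the latest sample.
  flush(): void {
    if (this.pending !== null) {
      this.transport(this.pending);
      this.pending = null;
      this.sent++;
    }
  }

  get sentCount(): number {
    return this.sent;
  }
}
```

The key design choice is "last write wins": dropping intermediate samples trades completeness for freshness, which is usually the right trade for pointer-style interaction.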
## The Core Innovation: Omni-Precise V5.2 Engine
Gesture systems suffer from jitter and instability.
I built a custom smoothing and prediction pipeline:
1. **Stable EMA Smoothing**
   - Reduces noise from raw hand-tracking data
   - Produces fluid, natural motion
2. **Velocity-Based Stroke Dynamics**
   - Fast movement → thin lines
   - Slow movement → thicker strokes
   - Mimics real-world drawing behavior
3. **No-Gap Interpolation**
   - Fills missing points between frames
   - Ensures continuous stroke rendering
**Result:** near pixel-perfect air interaction.
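The three stages above can be sketched in TypeScript as follows. This is a minimal illustration of the technique, not the actual Omni-Precise code; the function names, the smoothing factor, and the width-mapping constant are my own placeholder choices:

```typescript
interface Point { x: number; y: number; }

// 1. Stable EMA smoothing: blend each raw sample with the previous
//    smoothed value to suppress per-frame tracking jitter.
function emaSmooth(prev: Point | null, raw: Point, alpha = 0.35): Point {
  if (prev === null) return raw; // first frame: nothing to blend with
  return {
    x: alpha * raw.x + (1 - alpha) * prev.x,
    y: alpha * raw.y + (1 - alpha) * prev.y,
  };
}

// 2. Velocity-based stroke dynamics: fast motion -> thin line,
//    slow motion -> thick line, clamped to a sensible range.
//    The scale factor 4 is an arbitrary tuning constant.
function strokeWidth(velocityPxPerMs: number, minW = 1, maxW = 8): number {
  const w = maxW - velocityPxPerMs * 4;
  return Math.min(maxW, Math.max(minW, w));
}

// 3. No-gap interpolation: insert evenly spaced points between two
//    frames so the rendered stroke has no holes.
function interpolate(a: Point, b: Point, maxGapPx = 2): Point[] {
  const dist = Math.hypot(b.x - a.x, b.y - a.y);
  const steps = Math.ceil(dist / maxGapPx);
  const pts: Point[] = [];
  for (let i = 1; i <= steps; i++) {
    const t = i / steps;
    pts.push({ x: a.x + (b.x - a.x) * t, y: a.y + (b.y - a.y) * t });
  }
  return pts;
}
```

Per frame, the raw landmark is smoothed, a width is derived from the distance moved since the last frame, and the segment between the previous and current smoothed points is densified before rendering.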
## Key Features (Live Beta)
- 🎨 Smart Air Drawing — Draw naturally in mid-air using hand gestures
- 🎤 Voice Commands — Hands-free control and system interaction
- 🗣️ Voice-to-Type — Real-time speech-to-text input
- 🔗 Remote Portal — Gesture-based transfer of images/data between devices

### 🚧 In Active Development
- 🖐️ Air Mouse — AI-powered cursor control using hand tracking (not yet publicly released)
## Open Technical Questions (Looking for Expert Insight)
I’d love feedback from experienced engineers:
1. **Real-Time Sync & Latency**
   - Best practices for minimizing lag in gesture-driven UI?
   - WebSocket vs WebRTC for real-time interaction?
2. **Jitter Reduction Techniques**
   - EMA vs Kalman filter in browser environments?
   - Trade-offs between performance and accuracy?
3. **Scaling the System**
   - How would you evolve this into a production-grade system?
   - Edge computing vs centralized processing?
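To make question 2 concrete, here is a minimal 1D constant-velocity Kalman filter sketch next to a plain EMA. Because the Kalman filter estimates velocity and extrapolates, it lags far less than EMA on steady motion; the noise parameters here are illustrative guesses, not tuned values:

```typescript
// Minimal 1D constant-velocity Kalman filter: state = [position, velocity],
// with position measurements. q = process noise, r = measurement noise.
class Kalman1D {
  private pos = 0;
  private vel = 0;
  private P: number[][] = [[1, 0], [0, 1]]; // state covariance
  constructor(private q = 0.01, private r = 1) {}

  update(z: number, dt = 1): number {
    // Predict: advance position by velocity, propagate covariance
    this.pos += this.vel * dt;
    const [[p00, p01], [p10, p11]] = this.P;
    const P00 = p00 + dt * (p10 + p01) + dt * dt * p11 + this.q;
    const P01 = p01 + dt * p11;
    const P10 = p10 + dt * p11;
    const P11 = p11 + this.q;
    // Update with position measurement z
    const S = P00 + this.r;        // innovation covariance
    const k0 = P00 / S;            // Kalman gain (position)
    const k1 = P10 / S;            // Kalman gain (velocity)
    const y = z - this.pos;        // innovation
    this.pos += k0 * y;
    this.vel += k1 * y;
    this.P = [
      [(1 - k0) * P00, (1 - k0) * P01],
      [P10 - k1 * P00, P11 - k1 * P01],
    ];
    return this.pos;
  }
}

// Plain EMA for comparison: no velocity model, so it lags moving targets.
function emaStep(prev: number, raw: number, alpha = 0.2): number {
  return alpha * raw + (1 - alpha) * prev;
}
```

On a steadily moving target (e.g. a hand sweeping across the frame), EMA settles into a constant lag of roughly `(1 - alpha) / alpha` samples, while the Kalman filter's velocity state absorbs the motion. The cost is more arithmetic per update and two noise parameters to tune.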
## 🌐 Live Demo
- Live demo: https://hci-system.netlify.app
- Source code: https://github.com/sadekul-me/HCI-System
- Portfolio: https://sadekulislam.netlify.app
## 💬 Feedback & Suggestions
I’d really appreciate your honest feedback on this system.
- What do you think about the overall architecture?
- How can I improve the real-time performance and scalability?
- What features or improvements would you suggest next?
I’m actively developing this project and open to learning from experienced engineers.
## ✨ Final Thought
I’m still a student.
But I believe building systems like this is how we push boundaries.
We don’t wait for the future. We build it.
— Sadekul Islam (Lì Ào / 利奥)
Software Engineering Student | HCI Explorer
I'm specifically looking for advice on reducing jitter in browser-based hand tracking. If anyone has experience with Kalman filters vs EMA in React, I'd love to chat!