Beyond the Mouse: A Real-Time Gesture-Controlled AI System (Lì Ào Engine)

Sadekul Islam

At 3 AM in a freezing dorm room in China, staring at a terminal full of Python logs, I asked myself a simple question:

What if the air around me was the interface?

That question led me to build Lì Ào Engine (利奥), a multimodal Human-Computer Interaction (HCI) system that turns gestures, voice, and intent into real-time digital actions.

Philosophy: From Constraint to Freedom

Traditional input devices, the mouse and keyboard, are powerful but limiting. They confine interaction to physical surfaces.

I wanted to break that boundary.

Lì Ào Engine is built on a simple idea:

Human intention should be the interface.

System Architecture (Clean & Scalable)

This is not a prototype-level project. I designed it with modularity and scalability in mind.

Core Structure


- `src/system` → Core rendering layers (AI, Board, Layers, Media, Remote)
- `src/features` → Independent modules (ai, draw, galasy, gesture, image, move, prediction, remote, upload)
- `src/hooks` → Custom hooks for performance-critical logic
  - useImageLoader
  - useMediaPipe
  - usePredictor
  - useRealTimeSync
  - useTracking
  - useVoiceCommand
- `server/engine` (Python) → Vision processing + AI inference (WIP)
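To illustrate the kind of seam this layout creates between the gesture layer and the feature modules, here is a hypothetical TypeScript sketch of a typed event bus. None of these type or class names come from the actual repo; they are purely illustrative.

```typescript
// Hypothetical sketch: a typed event bus decoupling the gesture layer
// from feature modules. Names are illustrative, not the repo's real API.
type GestureEvent =
  | { kind: "draw"; x: number; y: number; pressure: number }
  | { kind: "voice"; command: string };

type Handler = (e: GestureEvent) => void;

class EventBus {
  private handlers = new Map<GestureEvent["kind"], Handler[]>();

  on(kind: GestureEvent["kind"], h: Handler): void {
    const list = this.handlers.get(kind) ?? [];
    list.push(h);
    this.handlers.set(kind, list);
  }

  emit(e: GestureEvent): void {
    for (const h of this.handlers.get(e.kind) ?? []) h(e);
  }
}

// Usage: the drawing module subscribes without knowing about the tracker.
const bus = new EventBus();
const strokes: string[] = [];
bus.on("draw", (e) => {
  if (e.kind === "draw") strokes.push(`${e.x},${e.y}`);
});
bus.emit({ kind: "draw", x: 10, y: 20, pressure: 0.5 });
```

A boundary like this is what would let the in-browser tracker be swapped for the Python `server/engine` later without touching feature code.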

Design Principles

  • Separation of concerns (UI vs AI processing)
  • Real-time responsiveness
  • Low-latency data flow

The Core Innovation: Omni-Precise V5.2 Engine

Gesture systems suffer from jitter and instability.

I built a custom smoothing and prediction pipeline:

1. Stable EMA Smoothing

  • Reduces noise from raw hand tracking data
  • Produces fluid, natural motion

2. Velocity-Based Stroke Dynamics

  • Fast movement → thin lines
  • Slow movement → thicker strokes
  • Mimics real-world drawing behavior

3. No-Gap Interpolation

  • Fills missing points between frames
  • Ensures continuous stroke rendering

Result: Near pixel-perfect air interaction
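As a rough sketch of what those three stages could look like as pure functions (all constants here are illustrative guesses, not the actual Omni-Precise tuning, which isn't public):

```typescript
// Sketch of the three pipeline stages: EMA smoothing, velocity-based
// stroke width, and no-gap interpolation. Constants are illustrative.
type Pt = { x: number; y: number };

// 1. EMA smoothing: blend each raw sample toward the previous smoothed one.
function emaSmooth(prev: Pt, raw: Pt, alpha = 0.35): Pt {
  return {
    x: prev.x + alpha * (raw.x - prev.x),
    y: prev.y + alpha * (raw.y - prev.y),
  };
}

// 2. Velocity-based stroke width: fast motion -> thin, slow -> thick.
function strokeWidth(velocityPxPerMs: number, min = 1, max = 8): number {
  const t = Math.min(velocityPxPerMs / 2, 1); // assume 2 px/ms caps out
  return max - t * (max - min);
}

// 3. No-gap interpolation: insert points so consecutive samples are
// never farther apart than `step` pixels, keeping strokes continuous.
function interpolate(a: Pt, b: Pt, step = 2): Pt[] {
  const dist = Math.hypot(b.x - a.x, b.y - a.y);
  const n = Math.max(1, Math.ceil(dist / step));
  const pts: Pt[] = [];
  for (let i = 1; i <= n; i++) {
    pts.push({ x: a.x + ((b.x - a.x) * i) / n, y: a.y + ((b.y - a.y) * i) / n });
  }
  return pts;
}
```

Each stage is O(1) or O(gap) per frame, which is what keeps the pipeline viable at camera frame rates.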

Key Features (Live Beta)

  • 🎨 Smart Air Drawing — Draw naturally in mid-air using hand gestures
  • 🎤 Voice Commands — Hands-free control and system interaction
  • 🗣️ Voice-to-Type — Real-time speech-to-text input
  • 🔗 Remote Portal — Gesture-based transfer of images/data between devices

🚧 In Active Development

  • 🖐️ Air Mouse — AI-powered cursor control using hand tracking (not yet publicly released)

Open Technical Questions (Looking for Expert Insight)

I’d love feedback from experienced engineers:

1. Real-Time Sync & Latency

  • Best practices for minimizing lag in gesture-driven UI?
  • WebSocket vs WebRTC for real-time interaction?

2. Jitter Reduction Techniques

  • EMA vs Kalman filter in browser environments
  • Trade-offs between performance and accuracy?
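For anyone weighing in, this is the kind of minimal 1D comparison I have in mind. The noise constants below are placeholders, not values tuned for MediaPipe output:

```typescript
// EMA has a single knob (alpha) and a fixed lag; a Kalman filter adapts
// its gain from a noise model. Constants here are illustrative only.

function ema(prev: number, z: number, alpha = 0.3): number {
  return prev + alpha * (z - prev);
}

// 1D Kalman filter with a random-walk (constant-position) state model.
class Kalman1D {
  x = 0; // state estimate
  p = 1; // estimate variance
  constructor(private q = 0.01, private r = 0.5) {} // process / measurement noise

  update(z: number): number {
    this.p += this.q;                     // predict: uncertainty grows
    const k = this.p / (this.p + this.r); // Kalman gain
    this.x += k * (z - this.x);           // correct toward measurement
    this.p *= 1 - k;                      // uncertainty shrinks
    return this.x;
  }
}
```

Both are O(1) per sample, so either runs comfortably at 60 fps in a browser; the Kalman version adds tuning burden (q, r) but, extended to a constant-velocity state, should lag less during fast strokes.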

3. Scaling the System

  • How would you evolve this into a production-grade system?
  • Edge computing vs centralized processing?

🌐 Live Demo

https://hci-system.netlify.app
https://github.com/sadekul-me/HCI-System
https://sadekulislam.netlify.app

💬 Feedback & Suggestions

I’d really appreciate your honest feedback on this system.

  • What do you think about the overall architecture?
  • How can I improve the real-time performance and scalability?
  • What features or improvements would you suggest next?

I’m actively developing this project and open to learning from experienced engineers.

✨ Final Thought

I’m still a student.
But I believe building systems like this is how we push boundaries.
We don’t wait for the future. We build it.
— Sadekul Islam (Lì Ào / 利奥)
Software Engineering Student | HCI Explorer

Top comments (1)

Sadekul Islam:
I'm specifically looking for advice on reducing jitter in browser-based hand tracking. If anyone has experience with Kalman filters vs EMA in React, I'd love to chat!