Domux: Achieving Sub-150ms Intent Parsing for Edge AI Agents

#ai #smarthome #llm #edgeai

As GitHub Trending shows a shift from "general chat" toward "vertical execution" (RPA, video editing, and more), the critical bottleneck for real-time agents is no longer just reasoning; it is perception and intent parsing.

Today, we're releasing Domux, an experimental lightweight model designed specifically for low-latency command understanding in smart home scenarios.

Why Domux?

Traditional NLU pipelines often chain several microservices, and every hop between them adds latency. For real-time interactions such as voice control or instant automation, we need something faster and lighter.

Domux is built on Gemma-4-E2B-it and fine-tuned with SFT (Supervised Fine-Tuning) and GRPO (Group Relative Policy Optimization). The goal: keep end-to-end response time under 150ms.

Key Features

1. Extreme Low Latency

Optimized for both edge devices and servers, Domux returns structured output within its sub-150ms end-to-end budget, which keeps voice and automation interactions feeling responsive.
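If you want to check the 150ms budget on your own hardware, a small timing harness is enough. The sketch below is an assumption-laden example, not part of Domux: `parse` is a placeholder for whatever call runs Domux inference in your stack (a local runtime, an HTTP client, and so on), and `measure_latency_ms` is a hypothetical helper.

```python
import math
import statistics
import time
from typing import Callable, Dict, List

LATENCY_BUDGET_MS = 150.0  # the end-to-end target stated in this post


def measure_latency_ms(
    parse: Callable[[str], str], commands: List[str], warmup: int = 3
) -> Dict[str, float]:
    """Time `parse` on each command and return p50/p95/max in milliseconds.

    `parse` is a hypothetical placeholder for your Domux inference call;
    it is not a Domux API.
    """
    if not commands:
        raise ValueError("need at least one command")
    # Untimed warm-up runs so caches and lazy initialization don't skew results.
    for cmd in commands[:warmup]:
        parse(cmd)
    samples: List[float] = []
    for cmd in commands:
        start = time.perf_counter()
        parse(cmd)
        samples.append((time.perf_counter() - start) * 1000.0)
    samples.sort()
    # Nearest-rank 95th percentile on the sorted samples.
    p95_index = max(0, math.ceil(0.95 * len(samples)) - 1)
    return {"p50": statistics.median(samples), "p95": samples[p95_index], "max": samples[-1]}
```

Comparing `stats["p95"]` against `LATENCY_BUDGET_MS` is a stricter check than the mean: a voice interaction feels slow if even one request in twenty blows the budget.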

2. High Accuracy & Compliance

98.37% Result Accuracy
100% Format Compliance: Outputs a fixed 7-field pipe-delimited schema (action|device|attribute|value|unit|room|floor), ensuring downstream systems can always parse the result.
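Because the schema is fixed, a downstream consumer can be as simple as a split on `|`. The following is a minimal sketch that assumes one output line per command, with empty fields written as nothing between two pipes. The `Intent` class and `parse_intent` function are hypothetical helpers, not part of Domux, and the example values (the `brightness` attribute in particular) are illustrative.

```python
from dataclasses import dataclass

# Field order as documented: action|device|attribute|value|unit|room|floor
FIELD_NAMES = ("action", "device", "attribute", "value", "unit", "room", "floor")


@dataclass(frozen=True)
class Intent:
    action: str
    device: str
    attribute: str
    value: str
    unit: str
    room: str
    floor: str


def parse_intent(line: str) -> Intent:
    """Split one model output line into the 7 schema fields.

    Empty fields stay as empty strings; raises ValueError if the line
    does not contain exactly 7 fields.
    """
    parts = [p.strip() for p in line.strip().split("|")]
    if len(parts) != len(FIELD_NAMES):
        raise ValueError(f"expected {len(FIELD_NAMES)} fields, got {len(parts)}: {line!r}")
    return Intent(*parts)


# Hypothetical output for "make the desk lamp in Bedroom 1 brighter".
intent = parse_intent("adjustUp|Desk Lamp|brightness|||Bedroom 1|")
print(intent.action, intent.device, intent.room)
```

Checking the field count instead of trusting it makes the consumer fail loudly if a future model version changes the schema. A plain split also assumes no field value ever contains a literal `|`.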

3. Semantic Generalization

Unlike rigid keyword-matchers, Domux handles:

Arbitrary Device Names: No fixed whitelist needed. It understands "Desk Lamp", "Strip Light", or even "Majlis Light" through semantic context.
Fuzzy Commands: Maps vague instructions like "make it brighter" to adjustUp with an empty value field, letting downstream logic handle the magnitude.
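On the downstream side, handling an empty `value` can be a matter of falling back to a default step. The sketch below assumes `adjustDown` as the counterpart of `adjustUp` and uses made-up per-attribute steps and ranges; all of these are hypothetical and not defined by Domux.

```python
from typing import Dict, Tuple

# Hypothetical per-attribute default steps; Domux does not define these.
DEFAULT_STEP: Dict[str, float] = {"brightness": 10.0, "temperature": 1.0}
# Hypothetical valid ranges used to clamp the result.
RANGES: Dict[str, Tuple[float, float]] = {
    "brightness": (0.0, 100.0),
    "temperature": (16.0, 30.0),
}


def resolve_adjustment(action: str, attribute: str, value: str, current: float) -> float:
    """Turn a relative command into an absolute target value.

    An empty `value` means the user gave no magnitude ("make it brighter"),
    so the attribute's default step is used instead.
    """
    # "adjustDown" is an assumed identifier; only "adjustUp" appears in the post.
    if action not in ("adjustUp", "adjustDown"):
        raise ValueError(f"not a relative action: {action!r}")
    step = float(value) if value else DEFAULT_STEP.get(attribute, 1.0)
    target = current + step if action == "adjustUp" else current - step
    low, high = RANGES.get(attribute, (float("-inf"), float("inf")))
    return max(low, min(high, target))


print(resolve_adjustment("adjustUp", "brightness", "", 60.0))  # default step of 10
```

Keeping the magnitude out of the model output means the step size can be tuned per device or per user without retraining.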

Technical Deep Dive

Training Strategy

We combined SFT for baseline understanding with GRPO for reinforcement learning. Custom reward functions were designed to penalize format errors and latency, pushing the model to be both accurate and fast.
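The actual reward plugins live in the repository. What follows is only a minimal sketch of the idea, assuming a binary format check, exact-match accuracy, and output length as a stand-in for latency (a hypothetical proxy: longer outputs need more decoding steps). The budget and weight values are made up. The final function computes the standard GRPO group-relative advantage: each sampled completion's reward is normalized by the mean and standard deviation of the rewards in its group.

```python
import statistics
from typing import List, Sequence

NUM_FIELDS = 7


def format_reward(completion: str) -> float:
    """1.0 if the completion is one line with exactly 7 pipe-separated fields."""
    text = completion.strip()
    if "\n" in text:
        return 0.0
    return 1.0 if len(text.split("|")) == NUM_FIELDS else 0.0


def accuracy_reward(completion: str, reference: str) -> float:
    """1.0 on an exact match with the reference output, else 0.0."""
    return 1.0 if completion.strip() == reference.strip() else 0.0


def length_penalty(completion: str, budget_chars: int = 80, weight: float = 0.01) -> float:
    """Hypothetical latency proxy: penalize characters beyond a budget."""
    overflow = max(0, len(completion) - budget_chars)
    return -weight * overflow


def total_reward(completion: str, reference: str) -> float:
    return format_reward(completion) + accuracy_reward(completion, reference) + length_penalty(completion)


def group_relative_advantages(rewards: Sequence[float], eps: float = 1e-6) -> List[float]:
    """GRPO-style advantage: (r - group mean) / (group std + eps)."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


# One prompt, a group of three sampled completions (hypothetical outputs).
reference = "turnOn|Light||||Bedroom 1|"
group = [reference, "turnOn|Light", "sure, turning on the light"]
advantages = group_relative_advantages([total_reward(c, reference) for c in group])
```

Because advantages are relative within a group, a completion is only pushed up if it beats its siblings; a group where every sample is equally good contributes no gradient signal.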

Supported Capabilities

Domux currently supports:

Devices: Lights, AC, Curtains/Blinds, Scene Modes.
Actions: Turn on/off, Set values, Adjust up/down, Activate/Deactivate scenes, Pause.
Context: Room and Floor awareness (e.g., "Bedroom 1", "Upstairs").
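Before dispatching a parsed command, a consumer might reject anything outside this supported set. The sketch below is hypothetical: the post only confirms the `adjustUp` identifier, so the other action strings are placeholders for the listed capabilities and should be replaced with the exact strings from the Domux repository.

```python
from typing import FrozenSet, Tuple

# HYPOTHETICAL action identifiers: only "adjustUp" is confirmed by the post.
SUPPORTED_ACTIONS: FrozenSet[str] = frozenset({
    "turnOn", "turnOff", "setValue", "adjustUp", "adjustDown",
    "activateScene", "deactivateScene", "pause",
})


def check_supported(line: str) -> Tuple[bool, str]:
    """Return (ok, reason) for one 7-field output line."""
    parts = [p.strip() for p in line.strip().split("|")]
    if len(parts) != 7:
        return False, f"expected 7 fields, got {len(parts)}"
    action, device, _attribute, _value, _unit, room, floor = parts
    if action not in SUPPORTED_ACTIONS:
        return False, f"unsupported action {action!r}"
    if not device:
        return False, "missing device"
    # Empty room/floor fields are treated as "not specified".
    return True, f"dispatch {action} -> {device} (room={room or 'any'}, floor={floor or 'any'})"


print(check_supported("adjustUp|Desk Lamp|brightness|||Bedroom 1|"))
```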

An Open Experiment

This is an early-stage exploration (v0.1.0). We are sharing the code, reward plugins, and datasets to invite the community to test the limits of semantic parsing under aggressive latency budgets.

If you are building smart home agents, voice assistants, or any edge-based control system, Domux might be the lightweight component you need.

👉 Check out the repo: https://github.com/iflytek/domux