DEV Community

Anil Saithana
Anil Saithana

Posted on

Why MQTT Last Will Testament Isn't Enough for Production IoT (And What We Built Instead)

Why MQTT Last Will Testament Isn't Enough for Production IoT (And What We Built Instead)

I spent 7 years building cloud backends — but when I tried connecting real hardware (ESP32s in my home), I hit a wall:

"My device shows 'connected' in AWS IoT Core... but hasn't reported data in 4 hours. Is it hung? Dead? Or just offline?"

Turns out: MQTT's Last Will Testament (LWT) lies to you.

The Lie: "Connected" ≠ Alive

LWT triggers only on TCP disconnect. But real devices fail silently:

  • WiFi drops but TCP socket stays open (NAT timeout = 5+ minutes)
  • Device freezes but doesn't reboot (watchdog failed)
  • Sensor loop crashes but MQTT client still "connected"

Result? Your dashboard shows "✅ Online" while the device hasn't sent data since yesterday.

AWS IoT Core showing connected device with stale data

Our Fix: Application-Level Heartbeats + Stateful ACKs

We built a lightweight Spring Boot backend (hear-beat) that treats telemetry as heartbeat pulses — not just data.

Device → [temp=28°C, ts=1708512000] → Backend
Backend → "ACK @ 1708512000" → Device
Enter fullscreen mode Exit fullscreen mode

1. Offline detection = missed heartbeat window

// DeviceRegistry.java
if (System.currentTimeMillis() - lastHeartbeat > OFFLINE_THRESHOLD) {
    markDeviceOffline(deviceId); // Not TCP disconnect — actual silence
}
Enter fullscreen mode Exit fullscreen mode

2. Command safety via ACK loop

// CommandService.java
sendCommand(deviceId, "REBOOT");
waitForAck(deviceId, timeout=30s); // Did it *execute*? Not just "received"
Enter fullscreen mode Exit fullscreen mode

3. REST control plane + MQTT data plane

  • Mobile apps talk REST (POST /devices/{id}/command)
  • Devices talk MQTT (iot/device/{id}/cmd)
  • Backend bridges both → clean separation

Why This Matters for Real Deployments

Scenario LWT Says Our Heartbeat Says
Device froze (no reboot) ✅ Connected ❌ Offline (no heartbeat in 90s)
WiFi dropped in rural field ✅ Connected (TCP alive) ❌ Offline (no data in 2 mins)
Command sent but device crashed mid-execution ✅ Command delivered ❌ No ACK → retry/fail-safe

This isn't theory. I run this for my home sensors — and it catches failures LWT misses daily.

Try It Yourself

git clone https://github.com/AnilSaithana/hear-beat
cd hear-beat
docker-compose up  # Runs Spring Boot + MQTT broker
Enter fullscreen mode Exit fullscreen mode

ESP32 firmware example included in /firmware folder.


I built this because production IoT fails in the gaps between cloud and hardware.

If you've felt this pain — DM me. I'd love to hear your war stories.

Top comments (0)