<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Anil Saithana</title>
    <description>The latest articles on DEV Community by Anil Saithana (@anil_saithana).</description>
    <link>https://dev.to/anil_saithana</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3783912%2Fb7eb1a25-dab2-426d-9720-7fa5f6c50658.jpeg</url>
      <title>DEV Community: Anil Saithana</title>
      <link>https://dev.to/anil_saithana</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/anil_saithana"/>
    <language>en</language>
    <item>
      <title>Why MQTT Last Will Testament Isn't Enough for Production IoT (And What We Built Instead)</title>
      <dc:creator>Anil Saithana</dc:creator>
      <pubDate>Sun, 22 Feb 2026 06:10:54 +0000</pubDate>
      <link>https://dev.to/anil_saithana/why-mqtt-last-will-testament-isnt-enough-for-production-iot-and-what-we-built-instead-44cj</link>
      <guid>https://dev.to/anil_saithana/why-mqtt-last-will-testament-isnt-enough-for-production-iot-and-what-we-built-instead-44cj</guid>
      <description>&lt;h1&gt;
  
  
  Why MQTT Last Will Testament Isn't Enough for Production IoT (And What We Built Instead)
&lt;/h1&gt;

&lt;p&gt;I spent 7 years building cloud backends — but when I tried connecting real hardware (ESP32s in my home), I hit a wall:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;"My device shows 'connected' in AWS IoT Core... but hasn't reported data in 4 hours. Is it hung? Dead? Or just offline?"&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Turns out: &lt;strong&gt;MQTT's Last Will and Testament (LWT) lies to you.&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  The Lie: "Connected" ≠ Alive
&lt;/h2&gt;

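&lt;p&gt;The broker publishes the LWT only when it sees the connection end &lt;em&gt;ungracefully&lt;/em&gt;: the socket closes without a DISCONNECT, or the keep-alive window (1.5× the keep-alive interval) expires. Real devices fail in ways that this misses or catches late:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;WiFi drops but the TCP socket stays open, so the broker notices nothing until the keep-alive expires (5+ minutes with a long keep-alive)&lt;/li&gt;
&lt;li&gt;The device freezes but doesn't reboot (watchdog failed)&lt;/li&gt;
&lt;li&gt;The sensor loop crashes but the MQTT client is still "connected"&lt;/li&gt;
&lt;/ul&gt;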

&lt;p&gt;Result? Your dashboard shows "✅ Online" while the device hasn't sent data since yesterday.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnpxaarp3cc95oqdix30m.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnpxaarp3cc95oqdix30m.png" alt="AWS IoT Core showing connected device with stale data" width="800" height="533"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Our Fix: Application-Level Heartbeats + Stateful ACKs
&lt;/h2&gt;

&lt;p&gt;We built a lightweight Spring Boot backend (&lt;a href="https://github.com/AnilSaithana/hear-beat" rel="noopener noreferrer"&gt;hear-beat&lt;/a&gt;) that treats &lt;strong&gt;telemetry as heartbeat pulses&lt;/strong&gt;, not just data.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Device → [temp=28°C, ts=1708512000] → Backend
Backend → "ACK @ 1708512000" → Device
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
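
&lt;p&gt;As a rough sketch of that exchange, the backend can echo the telemetry timestamp so the device knows which pulse was acknowledged. The class and method names here are made up for illustration, and the ACK string simply copies the diagram above; the real wire format in the repo may differ.&lt;/p&gt;

```java
// Hypothetical helper: builds and checks the "ACK @ ts" reply from the diagram above.
final class HeartbeatAck {
    private HeartbeatAck() {}

    // Backend side: echo the timestamp carried by the telemetry message.
    static String ackFor(long telemetryTimestamp) {
        return "ACK @ " + telemetryTimestamp;
    }

    // Device side: confirm the ACK refers to the pulse that was actually sent.
    static boolean matches(String ack, long sentTimestamp) {
        return ackFor(sentTimestamp).equals(ack);
    }
}
```

&lt;p&gt;Echoing the timestamp lets the device tell a fresh ACK from a stale one that was delayed in transit.&lt;/p&gt;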



&lt;h3&gt;
  
  
  1. Offline detection = missed heartbeat window
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="c1"&gt;// DeviceRegistry.java&lt;/span&gt;
&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;System&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="na"&gt;currentTimeMillis&lt;/span&gt;&lt;span class="o"&gt;()&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;lastHeartbeat&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="no"&gt;OFFLINE_THRESHOLD&lt;/span&gt;&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="o"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;markDeviceOffline&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;deviceId&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt; &lt;span class="c1"&gt;// Not TCP disconnect — actual silence&lt;/span&gt;
&lt;span class="o"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
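
&lt;p&gt;To make the check above concrete, here is a minimal, self-contained sketch for a single device. &lt;code&gt;HeartbeatWindow&lt;/code&gt; and its methods are assumptions for illustration, not the actual &lt;code&gt;DeviceRegistry&lt;/code&gt; code, and the sketch is not thread-safe. The current time is passed in so the logic can be tested without waiting.&lt;/p&gt;

```java
import java.time.Duration;

// Hypothetical helper: tracks the last heartbeat of one device.
final class HeartbeatWindow {
    private final long thresholdMillis;
    private long lastHeartbeatMillis;

    HeartbeatWindow(Duration threshold, long nowMillis) {
        this.thresholdMillis = threshold.toMillis();
        this.lastHeartbeatMillis = nowMillis;
    }

    // Call whenever a telemetry message arrives from the device.
    void recordHeartbeat(long nowMillis) {
        lastHeartbeatMillis = nowMillis;
    }

    // True once the device has been silent for longer than the threshold.
    boolean isOffline(long nowMillis) {
        return nowMillis - lastHeartbeatMillis > thresholdMillis;
    }
}
```

&lt;p&gt;In a Spring Boot service, a periodic task (for example a method annotated with &lt;code&gt;@Scheduled&lt;/code&gt;) could call &lt;code&gt;isOffline(System.currentTimeMillis())&lt;/code&gt; for each known device.&lt;/p&gt;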



&lt;h3&gt;
  
  
  2. Command safety via ACK loop
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight java"&gt;&lt;code&gt;&lt;span class="c1"&gt;// CommandService.java&lt;/span&gt;
&lt;span class="n"&gt;sendCommand&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;deviceId&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"REBOOT"&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt;
&lt;span class="n"&gt;waitForAck&lt;/span&gt;&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="n"&gt;deviceId&lt;/span&gt;&lt;span class="o"&gt;,&lt;/span&gt; &lt;span class="n"&gt;timeout&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;30&lt;/span&gt;&lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="o"&gt;);&lt;/span&gt; &lt;span class="c1"&gt;// Did it *execute*? Not just "received"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
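
&lt;p&gt;One way to implement the waiting half of that loop is a &lt;code&gt;CountDownLatch&lt;/code&gt; that the MQTT message handler releases when the device reports that it executed the command. This is a sketch under that assumption; &lt;code&gt;PendingCommand&lt;/code&gt; is a hypothetical class, not part of &lt;code&gt;CommandService&lt;/code&gt;.&lt;/p&gt;

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: tracks one in-flight command for one device.
final class PendingCommand {
    private final CountDownLatch acked = new CountDownLatch(1);

    // Called from the MQTT message handler when the device confirms execution.
    void onAck() {
        acked.countDown();
    }

    // Blocks until the ACK arrives or the timeout elapses.
    // Returns true only if the ACK arrived in time.
    boolean awaitAck(long timeout, TimeUnit unit) {
        try {
            return acked.await(timeout, unit);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
            return false;
        }
    }
}
```

&lt;p&gt;A &lt;code&gt;false&lt;/code&gt; result is the signal to retry or fall back to a fail-safe state.&lt;/p&gt;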



&lt;h3&gt;
  
  
  3. REST control plane + MQTT data plane
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Mobile apps talk REST (&lt;code&gt;POST /devices/{id}/command&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;Devices talk MQTT (&lt;code&gt;iot/device/{id}/cmd&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;Backend bridges both → clean separation&lt;/li&gt;
&lt;/ul&gt;
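
&lt;p&gt;The bridge between the two planes boils down to mapping the REST path variable onto the device's command topic. A minimal sketch, assuming the topic layout shown above; &lt;code&gt;CommandTopics&lt;/code&gt; is a hypothetical helper, and the validation rule is my own addition rather than something taken from the repo.&lt;/p&gt;

```java
// Hypothetical helper; only the topic layout iot/device/{id}/cmd comes from the post.
final class CommandTopics {
    private CommandTopics() {}

    // Maps the {id} from POST /devices/{id}/command to the MQTT command topic.
    static String commandTopic(String deviceId) {
        if (deviceId == null || deviceId.isEmpty()) {
            throw new IllegalArgumentException("deviceId must not be empty");
        }
        // Reject the MQTT level separator and wildcards so a REST path
        // variable cannot produce a malformed or unintended topic.
        if (deviceId.contains("/") || deviceId.contains("+") || deviceId.contains("#")) {
            throw new IllegalArgumentException("deviceId contains a reserved MQTT character");
        }
        return "iot/device/" + deviceId + "/cmd";
    }
}
```

&lt;p&gt;For example, &lt;code&gt;CommandTopics.commandTopic("esp32-01")&lt;/code&gt; returns &lt;code&gt;iot/device/esp32-01/cmd&lt;/code&gt;.&lt;/p&gt;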

&lt;h2&gt;
  
  
  Why This Matters for Real Deployments
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Scenario&lt;/th&gt;
&lt;th&gt;LWT Says&lt;/th&gt;
&lt;th&gt;Our Heartbeat Says&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Device froze (no reboot)&lt;/td&gt;
&lt;td&gt;✅ Connected&lt;/td&gt;
&lt;td&gt;❌ Offline (no heartbeat in 90s)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;WiFi dropped in rural field&lt;/td&gt;
&lt;td&gt;✅ Connected (TCP alive)&lt;/td&gt;
&lt;td&gt;❌ Offline (no data in 2 mins)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Command sent but device crashed mid-execution&lt;/td&gt;
&lt;td&gt;✅ Command delivered&lt;/td&gt;
&lt;td&gt;❌ No ACK → retry/fail-safe&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;This isn't theory. I run this for my home sensors, and &lt;strong&gt;every day&lt;/strong&gt; it catches failures that LWT misses.&lt;/p&gt;

&lt;h2&gt;
  
  
  Try It Yourself
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git clone https://github.com/AnilSaithana/hear-beat
&lt;span class="nb"&gt;cd &lt;/span&gt;hear-beat
docker-compose up  &lt;span class="c"&gt;# Runs Spring Boot + MQTT broker&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;An ESP32 firmware example is included in the &lt;code&gt;/firmware&lt;/code&gt; folder.&lt;/p&gt;




&lt;p&gt;&lt;strong&gt;I built this because production IoT fails in the gaps between cloud and hardware.&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
If you've felt this pain — DM me. I'd love to hear your war stories.&lt;/p&gt;

</description>
      <category>iot</category>
      <category>springboot</category>
      <category>mqtt</category>
      <category>embedded</category>
    </item>
  </channel>
</rss>
