<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Supun Sriyananda</title>
    <description>The latest articles on DEV Community by Supun Sriyananda (@ranaweerasupun).</description>
    <link>https://dev.to/ranaweerasupun</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3951298%2Fb6d6c160-d027-48c3-aeb0-91a507d5b6a8.jpeg</url>
      <title>DEV Community: Supun Sriyananda</title>
      <link>https://dev.to/ranaweerasupun</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/ranaweerasupun"/>
    <language>en</language>
    <item>
      <title>Finishing What I Started — From a TODO List to a Published PyPI Package</title>
      <dc:creator>Supun Sriyananda</dc:creator>
      <pubDate>Thu, 28 May 2026 13:03:03 +0000</pubDate>
      <link>https://dev.to/ranaweerasupun/finishing-what-i-started-from-a-todo-list-to-a-published-pypi-package-38bh</link>
      <guid>https://dev.to/ranaweerasupun/finishing-what-i-started-from-a-todo-list-to-a-published-pypi-package-38bh</guid>
      <description>&lt;p&gt;&lt;em&gt;This is a submission for the &lt;a href="https://dev.to/challenges/github-2026-05-21"&gt;GitHub Finish-Up-A-Thon Challenge&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  What I Built
&lt;/h2&gt;

&lt;p&gt;I built &lt;strong&gt;robmqtt&lt;/strong&gt; — a resilient MQTT client for edge IoT devices, now published on PyPI.&lt;/p&gt;

&lt;p&gt;It came out of a problem that cost me real time. I deploy sensor systems on hardware that lives in the field: Raspberry Pi units on 4G cellular, embedded gateways, battery monitoring systems. On those networks, connectivity is never stable. And the standard MQTT client, &lt;code&gt;paho-mqtt&lt;/code&gt;, silently drops messages when the broker is unreachable — no error, no warning, no log entry unless you write one yourself.&lt;/p&gt;

&lt;p&gt;I lost data for a long time before I understood why. And finding the cause was brutal. There are no proper logs unless you explicitly add them, so I spent hours — days — chasing a problem that left no trace. I went through forums, blogs, chat AIs, everywhere, looking for how other people had solved it. What I found was the same thing every time: people asking for a &lt;em&gt;production-resilient&lt;/em&gt; MQTT client, and no real answer. Just scattered advice and code snippets that handled one piece and ignored the rest.&lt;/p&gt;

&lt;p&gt;So I built it myself:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Offline queue&lt;/strong&gt; — when the broker is unreachable, messages are written to SQLite instead of being dropped. They survive process restarts and power cycles.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Inflight tracking&lt;/strong&gt; — messages sent but not yet acknowledged are tracked separately and re-sent on reconnect, closing a gap that even QoS 1 leaves open.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Priority eviction&lt;/strong&gt; — when the queue fills, low-priority telemetry is evicted before critical alerts.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Exponential backoff&lt;/strong&gt; — a fleet of devices reconnecting after an outage won't all hammer the broker at once.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;TLS support&lt;/strong&gt; — for brokers that require encrypted connections.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;And then everything a library needs to actually be used in production, not just by me:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;TLS and mutual TLS&lt;/strong&gt; plus username/password auth, for secured brokers.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Bidirectional messaging&lt;/strong&gt; — &lt;code&gt;subscribe&lt;/code&gt;/&lt;code&gt;unsubscribe&lt;/code&gt; with full MQTT wildcard support, and subscriptions that survive reconnects.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Observability&lt;/strong&gt; — a built-in HTTP health check endpoint (&lt;code&gt;/health&lt;/code&gt; returning healthy / degraded / unhealthy) with Docker &lt;code&gt;HEALTHCHECK&lt;/code&gt; and Kubernetes liveness-probe examples.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Structured logging&lt;/strong&gt; throughout — so the next person doesn't lose days to a problem with no trace, the way I did.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;73 tests&lt;/strong&gt; across the storage, tracking, topic-matching, and client layers.
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;robmqtt
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This matters to me because it's not a toy. It's running on real systems I maintain — battery management monitoring and robotics telemetry — where a gap in the data record has actual consequences.&lt;/p&gt;

&lt;h2&gt;
  
  
  Demo
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;PyPI:&lt;/strong&gt; &lt;a href="https://pypi.org/project/robmqtt/" rel="noopener noreferrer"&gt;https://pypi.org/project/robmqtt/&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;GitHub:&lt;/strong&gt; &lt;a href="https://github.com/ranaweerasupun/resilient-edge-mqtt-client" rel="noopener noreferrer"&gt;https://github.com/ranaweerasupun/resilient-edge-mqtt-client&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The most satisfying way to see it work is to watch the offline queue survive a broker outage. The repo includes a simulation script for exactly this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Terminal 1 — run a simulated device publishing every 5 seconds&lt;/span&gt;
python test_13.py

&lt;span class="c"&gt;# Terminal 2 — kill the broker mid-run&lt;/span&gt;
&lt;span class="nb"&gt;sudo &lt;/span&gt;systemctl stop mosquitto
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The device keeps publishing. Messages start queuing to SQLite instead of erroring. Then:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Bring the broker back&lt;/span&gt;
&lt;span class="nb"&gt;sudo &lt;/span&gt;systemctl start mosquitto
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The queue drains automatically, in priority order. Every message that piled up during the outage is delivered. Nothing is lost.&lt;/p&gt;

&lt;p&gt;Basic usage looks like this — the application code never has to know whether the broker is reachable:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;robmqtt&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;ProductionMQTTClient&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;time&lt;/span&gt;

&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;ProductionMQTTClient&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;client_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;field_device_001&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;broker_host&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;mqtt.yourdomain.com&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;broker_port&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1883&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;max_queue_size&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;5000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;db_path&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;./device.db&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;connect&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;start&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="k"&gt;while&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;publish&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;topic&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;sensors/temperature&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;payload&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dumps&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;read_sensor&lt;/span&gt;&lt;span class="p"&gt;()),&lt;/span&gt;
        &lt;span class="n"&gt;qos&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;priority&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sleep&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;30&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1u9zdfkxtrpi92ce4mpe.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1u9zdfkxtrpi92ce4mpe.png" alt="robmqtt-architechture" width="800" height="441"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvrif4hwjmz99eyy6kgbh.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvrif4hwjmz99eyy6kgbh.png" alt="messages publish" width="622" height="511"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fum0srn1ehrnn3pa0c2r0.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fum0srn1ehrnn3pa0c2r0.png" alt="sqlite queue drained" width="626" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  The Comeback Story
&lt;/h2&gt;

&lt;p&gt;Here's the honest before-and-after.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Before.&lt;/strong&gt; Once I'd finally figured out the code that solved my problem, I used it — and it went into my GitHub repo, where it sat. It worked. It solved the thing that had cost me days. But no one knew it existed.&lt;/p&gt;

&lt;p&gt;And that was the part that nagged at me. I'd keep seeing people online asking for exactly what I'd built — something resilient enough for production — and getting the same non-answers I'd gotten: random advice, half-solutions, snippets. I had the answer sitting in a corner of my repo, and a thought stuck in the corner of my head: &lt;em&gt;it's right here. I need to show this off.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;The things I knew I should do but hadn't:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Package it for PyPI so people could &lt;code&gt;pip install&lt;/code&gt; it instead of cloning the repo&lt;/li&gt;
&lt;li&gt;Add TLS/mTLS support for secure broker connections&lt;/li&gt;
&lt;li&gt;Expose a metrics endpoint for observability&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It was the classic unfinished side project. Functional, but not &lt;em&gt;shipped&lt;/em&gt;. Not something anyone other than me could actually use. The plan was always "publish it once it's polished" — and polishing it was the part that kept not happening.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The hard part.&lt;/strong&gt; I have a full-time job. Finishing this wasn't something I could do in work hours. So it happened on weekends, late at night, and in the gaps — I drew flowcharts and planned the package structure on my commute, in my head, on paper. Progress came in small pieces, squeezed around everything else. There were plenty of stretches where it would have been easier to just leave it in the repo and move on.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;After.&lt;/strong&gt; I pushed it over the line:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Published to PyPI as &lt;code&gt;robmqtt&lt;/code&gt; v1.0.0.&lt;/strong&gt; This was more work than I expected — restructuring the code into a proper installable package, writing the &lt;code&gt;pyproject.toml&lt;/code&gt;, sorting out the module layout, testing the install in a clean environment.&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Added TLS support.&lt;/strong&gt; The client now takes &lt;code&gt;use_tls&lt;/code&gt;, &lt;code&gt;ca_certs&lt;/code&gt;, &lt;code&gt;certfile&lt;/code&gt;, &lt;code&gt;keyfile&lt;/code&gt;, and broker auth parameters, so it works with secured brokers, not just local plaintext ones.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Observability&lt;/strong&gt; Built an HTTP health check endpoint with Docker and Kubernetes probe examples — so the library reports its own health instead of failing silently.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Structured logging&lt;/strong&gt; so the next person doesn't suffer days to a problem with no trace, the way I did!&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Data Integrity&lt;/strong&gt; Fixed three bugs I'm glad I caught before anyone relied on it: binary payloads being corrupted by string conversion, a resend-tracking gap that could silently lose messages on a second disconnect, and a race condition in &lt;code&gt;publish()&lt;/code&gt; that could drop a message between the connection check and the send.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Started building a real project on top of it.&lt;/strong&gt; To put the library through its paces in a full system, I'm building an IoT fleet analytics platform around it — simulated devices publishing through robmqtt, an MQTT-to-InfluxDB bridge, Grafana dashboards, and statistical anomaly detection. It's still in progress, but it's already doing what mattered most: proving the library holds up as the foundation of a real pipeline, not just in isolated tests.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;When it was finally published and I ran the test scripts and watched the offline queue fill and drain exactly the way it was supposed to — messages held through an outage, then delivered, nothing lost — there was nothing more rewarding. The thing that had cost me days of frustration was now one &lt;code&gt;pip install&lt;/code&gt; away for anyone who hits the same wall I did.&lt;/p&gt;

&lt;h2&gt;
  
  
  My Experience with GitHub Copilot
&lt;/h2&gt;

&lt;p&gt;I'll be honest about my workflow, because the reason I relied on Copilot is specific.&lt;/p&gt;

&lt;p&gt;I use a few AI tools when I build things. The general-purpose chat assistants are useful for sketching out an approach or talking through a design — but when it came to actual code, they kept giving me snippets that didn't quite work. That's not surprising: they can't see my project. They don't know my file structure, my variable names, the exact version of a library I'm using, or how the piece they're suggesting fits the rest of the code. So I'd paste in a snippet, hit an error, paste the error back, get a revised snippet, hit another error. Going back and forth with a chat window that can't see my codebase got slow and frustrating.&lt;/p&gt;

&lt;p&gt;Copilot was different because it lives in my editor and can see everything — all my open files, the actual code around the cursor, the real context. That's why it became the tool I leaned on.&lt;/p&gt;

&lt;p&gt;Two ways I used it:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Inline completion, always on.&lt;/strong&gt; As I typed, Copilot autocompleted the repetitive, structural parts — the device profile dictionaries, the JSON payload construction, the boilerplate around &lt;code&gt;publish()&lt;/code&gt; calls with their QoS and priority arguments. Because it could see the patterns already in my file, its suggestions actually fit, instead of being generically plausible code I'd have to rewrite.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Copilot Chat for debugging in context.&lt;/strong&gt; This is where it earned its place. When something broke, I'd ask it directly — &lt;em&gt;"Why am I getting this error, and how can I fix it?"&lt;/em&gt; or &lt;em&gt;"Can you check this code snippet?"&lt;/em&gt; — and because it could read my actual files, the answers were grounded in my real code, not a guess about what my code might look like. That's the difference that made me stop pasting snippets into external chat windows. The tool that can see your codebase gives you answers that apply to your codebase.&lt;/p&gt;

&lt;p&gt;That freed me to spend my real thinking time on the parts that mattered: the sine-wave drift model for realistic sensor data, the priority eviction logic, and the inflight-tracking design that closes the QoS 1 gap. The hard design decisions were mine. Copilot removed the friction around them — the typing, the boilerplate, and the slow debugging loop — so I could stay focused on the actual engineering.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Built by Supun Sriyananda — R&amp;amp;D Engineer working on embedded and IoT systems. robmqtt is open source on &lt;a href="https://github.com/ranaweerasupun/resilient-edge-mqtt-client" rel="noopener noreferrer"&gt;GitHub&lt;/a&gt; and &lt;a href="https://pypi.org/project/robmqtt/" rel="noopener noreferrer"&gt;PyPI&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>devchallenge</category>
      <category>githubchallenge</category>
      <category>python</category>
      <category>iot</category>
    </item>
    <item>
      <title>How to Generate Realistic IoT Sensor Data for Testing Your MQTT Pipeline</title>
      <dc:creator>Supun Sriyananda</dc:creator>
      <pubDate>Thu, 28 May 2026 11:02:48 +0000</pubDate>
      <link>https://dev.to/ranaweerasupun/how-to-generate-realistic-iot-sensor-data-for-testing-your-mqtt-pipeline-1a2h</link>
      <guid>https://dev.to/ranaweerasupun/how-to-generate-realistic-iot-sensor-data-for-testing-your-mqtt-pipeline-1a2h</guid>
      <description>&lt;h2&gt;
  
  
  How to Generate Realistic IoT Sensor Data for Testing Your MQTT Pipeline
&lt;/h2&gt;

&lt;p&gt;&lt;em&gt;This is part 2 of a series on building robmqtt. &lt;a href="https://dev.to/ranaweerasupun/why-your-mqtt-client-is-silently-losing-messages-and-how-i-fixed-it-robmqtt-4n4k"&gt;Part 1&lt;/a&gt; covered why paho-mqtt silently drops messages and the library I built to fix it. This part is about testing — how to exercise an MQTT pipeline without deploying physical hardware.&lt;/em&gt;&lt;/p&gt;




&lt;p&gt;In the last post I wrote about robmqtt, a resilient MQTT client for edge devices. Once I had it working, I hit the obvious next problem: how do I test it properly without setting up a rack of Raspberry Pis?&lt;/p&gt;

&lt;p&gt;I needed data flowing through the pipeline. Lots of it. From many devices. Behaving differently. And ideally surviving a broker outage so I could watch the offline queue do its job.&lt;/p&gt;

&lt;p&gt;So I wrote a device simulator. And in writing it, I learned that generating &lt;em&gt;realistic&lt;/em&gt; fake sensor data is harder than it looks — and that most people get it wrong in the same way.&lt;/p&gt;




&lt;h2&gt;
  
  
  The trap: random data isn't realistic data
&lt;/h2&gt;

&lt;p&gt;The first instinct when simulating a sensor is to reach for &lt;code&gt;random.uniform()&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;cpu&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;random&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;uniform&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;   &lt;span class="c1"&gt;# don't do this
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This produces data that looks nothing like a real sensor. Real sensor readings don't jump randomly across the whole range every second. A CPU sitting at 8% doesn't suddenly read 94% then drop to 3%. Temperature drifts slowly. Signal strength wobbles around a baseline. There's continuity from one reading to the next.&lt;/p&gt;

&lt;p&gt;If you test your pipeline with pure random noise, your charts look like static, your anomaly detection has nothing meaningful to detect, and your dashboards are useless for spotting whether anything actually works.&lt;/p&gt;

&lt;p&gt;I wanted data that &lt;em&gt;looked&lt;/em&gt; like it came from a real device.&lt;/p&gt;




&lt;h2&gt;
  
  
  Realistic drift with a sine wave
&lt;/h2&gt;

&lt;p&gt;The trick I landed on was a slow sine wave with per-device random phase, plus a little Gaussian noise on top:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;_cpu&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;float&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;p&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;profile&lt;/span&gt;
    &lt;span class="n"&gt;drift&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;math&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sin&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;time&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="mi"&gt;300&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_drift_phase&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;noise&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;random&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;gauss&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;cpu_variance&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;value&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;cpu_base&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;drift&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;noise&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;round&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;max&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mf"&gt;1.0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nf"&gt;min&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mf"&gt;99.0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;value&lt;/span&gt;&lt;span class="p"&gt;)),&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Three things are happening here:&lt;/p&gt;

&lt;p&gt;The &lt;strong&gt;sine wave&lt;/strong&gt; (&lt;code&gt;math.sin(time.time() / 300 ...)&lt;/code&gt;) creates a slow, smooth oscillation with a period of about five minutes. This is the gradual drift you see in real systems as load rises and falls through the day.&lt;/p&gt;

&lt;p&gt;The &lt;strong&gt;phase offset&lt;/strong&gt; (&lt;code&gt;self._drift_phase&lt;/code&gt;, a random value set once when the device starts) means every device is at a different point in its cycle. Without it, all your simulated devices would drift up and down in perfect unison, which is a dead giveaway that the data is fake.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_drift_phase&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;random&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;uniform&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;math&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;pi&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;strong&gt;Gaussian noise&lt;/strong&gt; (&lt;code&gt;random.gauss&lt;/code&gt;) adds small reading-to-reading variation on top of the drift. Real sensors are never perfectly smooth — there's always measurement jitter.&lt;/p&gt;

&lt;p&gt;The result is data that drifts, wobbles, and stays within a believable range — and each device has its own personality. When you chart it, it looks like telemetry, not like a random number generator.&lt;/p&gt;




&lt;h2&gt;
  
  
  Device profiles: a camera is not a sensor
&lt;/h2&gt;

&lt;p&gt;A real fleet isn't 15 copies of the same device. A camera runs hot and busy. A simple sensor sips power and idles. A gateway sits in between. If your simulator treats them all identically, your test data doesn't reflect anything real.&lt;/p&gt;

&lt;p&gt;So each device type gets a profile:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;DEVICE_PROFILES&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;sensor&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;cpu_base&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;8&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;cpu_variance&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;6&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;temp_base&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;42.0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;temp_variance&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;5.0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;telemetry_interval&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;failure_rate&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;0.03&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;camera&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;cpu_base&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;65&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;   &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;cpu_variance&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;20&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;temp_base&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;61.0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;temp_variance&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;10.0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;telemetry_interval&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;failure_rate&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mf"&gt;0.02&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="c1"&gt;# gateway, controller ...
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A camera baselines at 65% CPU and 61°C, publishing every 5 seconds. A sensor baselines at 8% CPU and 42°C, publishing every 10 seconds. When this data lands in a dashboard, the device types are visibly different — exactly like a real deployment, where you can often guess a device's role just from its resource profile.&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;failure_rate&lt;/code&gt; field controls how often the device injects an anomaly — a sudden CPU and temperature spike — so there's something for downstream anomaly detection to actually find:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;_is_anomaly&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="n"&gt;payload&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;cpu_percent&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;round&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;random&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;uniform&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;88&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;99&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;payload&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;temperature_c&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;round&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;random&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;uniform&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;78&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;92&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;payload&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;anomaly&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Using robmqtt in the simulator
&lt;/h2&gt;

&lt;p&gt;This is where the simulator doubles as a usage example for the library. Each simulated device is a real robmqtt client:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;robmqtt&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;ProductionMQTTClient&lt;/span&gt;

&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;ProductionMQTTClient&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;client_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;fleet_&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;device_id&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;broker_host&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;broker_host&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;broker_port&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;broker_port&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;max_queue_size&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;500&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;db_path&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;./data/&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;device_id&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;.db&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;min_backoff&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;max_backoff&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;30&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;log_dir&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;./logs/&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;device_id&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;connect&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;start&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Each device publishes three kinds of message, and the QoS and priority differ by importance:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Telemetry — frequent, can tolerate eviction under pressure
&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;publish&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;topic&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;fleet/&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nb"&gt;id&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;/telemetry&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                    &lt;span class="n"&gt;payload&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dumps&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;payload&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;qos&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;priority&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Status — operational health, higher priority
&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;publish&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;topic&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;fleet/&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nb"&gt;id&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;/status&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                    &lt;span class="n"&gt;payload&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dumps&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;payload&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;qos&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;priority&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;8&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Boot/alert events — must not be lost, highest priority, QoS 2
&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;publish&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;topic&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;fleet/&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nb"&gt;id&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;/events&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                    &lt;span class="n"&gt;payload&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dumps&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;payload&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;qos&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;priority&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is the priority system from part 1 in action. If the broker goes down and the offline queue fills, routine telemetry (priority 5) gets evicted before status messages (priority 8), and event messages (priority 10, QoS 2) are protected.&lt;/p&gt;

&lt;p&gt;The status messages even report the client's own internal state, pulled straight from robmqtt:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;stats&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_statistics&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;payload&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;queue_depth&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;     &lt;span class="n"&gt;stats&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;offline_queue_size&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;inflight_count&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;  &lt;span class="n"&gt;stats&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;inflight_count&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;is_connected&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;    &lt;span class="n"&gt;stats&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;is_connected&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;reconnect_count&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;stats&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;reconnect_count&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;So the simulated fleet reports on its own connectivity health — which means you can build a dashboard that shows the offline queue filling and draining in real time.&lt;/p&gt;




&lt;h2&gt;
  
  
  Watching the offline queue work
&lt;/h2&gt;

&lt;p&gt;This is the part I find satisfying. Start a device:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;python device_simulator.py &lt;span class="nt"&gt;--device-id&lt;/span&gt; device_001 &lt;span class="nt"&gt;--device-type&lt;/span&gt; gateway
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight ini"&gt;&lt;code&gt;&lt;span class="nn"&gt;[device_001]&lt;/span&gt; &lt;span class="err"&gt;Started&lt;/span&gt; &lt;span class="err"&gt;—&lt;/span&gt; &lt;span class="py"&gt;type&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;gateway location=warehouse_a&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now kill the broker. The device keeps publishing — but the messages are now being written to SQLite instead of sent:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;sudo &lt;/span&gt;systemctl stop mosquitto
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The device doesn't crash. It doesn't error. It just quietly queues. The &lt;code&gt;queue_depth&lt;/code&gt; in the status payload climbs: 5, 12, 28, 45...&lt;/p&gt;

&lt;p&gt;Bring the broker back:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;sudo &lt;/span&gt;systemctl start mosquitto
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The queue drains automatically. Every reading that piled up during the outage is delivered, in priority order. The &lt;code&gt;queue_depth&lt;/code&gt; falls back to zero. Nothing was lost.&lt;/p&gt;

&lt;p&gt;That's the whole point of the library, demonstrated in a way you can watch happen.&lt;/p&gt;




&lt;h2&gt;
  
  
  Why a simulator is worth building
&lt;/h2&gt;

&lt;p&gt;Even if you have real hardware, a simulator earns its place:&lt;/p&gt;

&lt;p&gt;It lets you test at scale you don't have hardware for. You can run 15 simulated devices on your laptop and see how your pipeline, database, and dashboards behave under fleet-level load.&lt;/p&gt;

&lt;p&gt;It gives you reproducible failure scenarios. Killing a broker on demand is a lot easier than waiting for a real 4G connection to drop in the field.&lt;/p&gt;

&lt;p&gt;It produces clean test data with known properties. You injected the anomalies, so you know exactly what your anomaly detection should catch.&lt;/p&gt;

&lt;p&gt;And — as a bonus — it doubles as living documentation for how to use your client library. The simulator &lt;em&gt;is&lt;/em&gt; the example.&lt;/p&gt;




&lt;h2&gt;
  
  
  What's next
&lt;/h2&gt;

&lt;p&gt;In part 3 I'll cover running this at fleet scale — launching many devices at once and feeding their telemetry through an analytics pipeline into a live dashboard.&lt;/p&gt;

&lt;p&gt;The full simulator code is on GitHub alongside robmqtt:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;PyPI:&lt;/strong&gt; &lt;a href="https://pypi.org/project/robmqtt/" rel="noopener noreferrer"&gt;pypi.org/project/robmqtt&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;GitHub:&lt;/strong&gt; &lt;a href="https://github.com/ranaweerasupun/resilient-edge-mqtt-client" rel="noopener noreferrer"&gt;github.com/ranaweerasupun/resilient-edge-mqtt-client&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;&lt;em&gt;How do you test your IoT pipelines — real hardware, simulators, or something else? I'm curious what others do — let me know in the comments.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>iot</category>
      <category>python</category>
      <category>mqtt</category>
      <category>testing</category>
    </item>
    <item>
      <title>Why Your MQTT Client Is Silently Losing Messages (And How I Fixed It) - robmqtt</title>
      <dc:creator>Supun Sriyananda</dc:creator>
      <pubDate>Tue, 26 May 2026 14:01:19 +0000</pubDate>
      <link>https://dev.to/ranaweerasupun/why-your-mqtt-client-is-silently-losing-messages-and-how-i-fixed-it-robmqtt-4n4k</link>
      <guid>https://dev.to/ranaweerasupun/why-your-mqtt-client-is-silently-losing-messages-and-how-i-fixed-it-robmqtt-4n4k</guid>
      <description>&lt;h2&gt;
  
  
  Why Your MQTT Client Is Silently Losing Messages (And How I Fixed It)
&lt;/h2&gt;

&lt;p&gt;I learned this the hard way.&lt;/p&gt;

&lt;p&gt;I was building a sensor system for a field deployment — Raspberry Pi units publishing temperature and humidity data over 4G cellular to an MQTT broker. The dashboard looked fine. The graphs looked fine. Then one day I compared the raw sensor logs against what actually made it to the broker.&lt;/p&gt;

&lt;p&gt;Thousands of readings. Gone. No errors. No warnings. Just gone.&lt;/p&gt;

&lt;p&gt;The culprit? &lt;code&gt;paho-mqtt&lt;/code&gt;'s default behaviour when the broker is unreachable: it silently drops your message and moves on.&lt;/p&gt;

&lt;p&gt;After losing enough data I wrote a library to fix it. It's now on PyPI as &lt;strong&gt;robmqtt&lt;/strong&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;robmqtt
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;But before I show you how it works, let me show you exactly what the problem is — because it's subtler than most people realise.&lt;/p&gt;




&lt;h2&gt;
  
  
  The Problem with Standard MQTT Clients
&lt;/h2&gt;

&lt;p&gt;When you call &lt;code&gt;client.publish()&lt;/code&gt; in paho-mqtt and the broker is unreachable, one of two things happens:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;The message is silently discarded (QoS 0)&lt;/li&gt;
&lt;li&gt;The message is queued in memory for QoS 1/2 — but that queue is lost on process restart, and there's a second gap that even QoS 1 doesn't close&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;That second gap is the sneaky one. Here's what happens with QoS 1:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fj6saayzafjthwujry6j2.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fj6saayzafjthwujry6j2.png" alt="qos1-silent-failure.png" width="618" height="434"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The message was "sent" from your perspective. It was never confirmed from the broker's perspective. And paho has no mechanism to track this gap across reconnections.&lt;/p&gt;

&lt;p&gt;On a stable data centre network, this almost never matters. On a Raspberry Pi running on 4G cellular in a field cabinet, it happens constantly.&lt;/p&gt;




&lt;h2&gt;
  
  
  What a Resilient Edge Client Actually Needs
&lt;/h2&gt;

&lt;p&gt;After losing enough data, I sat down and wrote out what a proper edge MQTT client needs to do:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Persist offline messages to disk&lt;/strong&gt;&lt;br&gt;
If the broker is unreachable when &lt;code&gt;publish()&lt;/code&gt; is called, the message should be written to disk and replayed later. Not held in memory — memory is lost on restart.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Track in-flight messages separately&lt;/strong&gt;&lt;br&gt;
Messages that have been handed to the broker but not yet ACK'd need to be tracked. On reconnect, they must be re-sent before any queued messages start draining.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. Priority-based eviction&lt;/strong&gt;&lt;br&gt;
When the queue fills up, not all messages are equal. A critical alarm should survive. A routine telemetry reading from 6 hours ago should not block it.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;4. Exponential backoff on reconnect&lt;/strong&gt;&lt;br&gt;
A fleet of 50 devices coming back online after a broker restart should not all hammer the broker at the same second.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;5. Thread-safe storage&lt;/strong&gt;&lt;br&gt;
The MQTT network thread and your application thread are both touching message state. This needs to be safe without forcing the caller to think about locking.&lt;/p&gt;

&lt;p&gt;None of this is exotic. All of it is missing from the standard &lt;code&gt;paho-mqtt&lt;/code&gt; client when used out of the box.&lt;/p&gt;


&lt;h2&gt;
  
  
  How robmqtt Solves It
&lt;/h2&gt;

&lt;p&gt;Here's the architecture:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fj4922wwjiybq1zdzd7vm.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fj4922wwjiybq1zdzd7vm.png" alt="robmqtt-architecture.png" width="800" height="517"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h3&gt;
  
  
  The Offline Queue
&lt;/h3&gt;

&lt;p&gt;When the client detects it's disconnected, &lt;code&gt;publish()&lt;/code&gt; routes to an &lt;code&gt;OfflineQueue&lt;/code&gt; backed by SQLite:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Simplified from offline_queue.py
&lt;/span&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;OfflineQueue&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;enqueue&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;topic&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;payload&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;qos&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;priority&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_lock&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_db&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;execute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
                INSERT INTO queue (topic, payload, qos, priority, timestamp)
                VALUES (?, ?, ?, ?, ?)
            &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;topic&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;payload&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;qos&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;priority&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;time&lt;/span&gt;&lt;span class="p"&gt;()))&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;dequeue_batch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;batch_size&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_lock&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="c1"&gt;# Highest priority first, then oldest first within same priority
&lt;/span&gt;            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_db&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;execute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
                SELECT id, topic, payload, qos FROM queue
                ORDER BY priority DESC, timestamp ASC
                LIMIT ?
            &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;batch_size&lt;/span&gt;&lt;span class="p"&gt;,)).&lt;/span&gt;&lt;span class="nf"&gt;fetchall&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;SQLite gives you durability without a separate process. It survives power cycles. The threading lock means your application thread and the drain thread never step on each other.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Inflight Tracker
&lt;/h3&gt;

&lt;p&gt;This closes the gap QoS 1 leaves open:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Simplified from inflight_tracker.py
&lt;/span&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;InflightTracker&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;track&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;mid&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;topic&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;payload&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;qos&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Call this when you hand a message to paho.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
        &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_lock&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_db&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;execute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
                INSERT OR REPLACE INTO inflight (mid, topic, payload, qos)
                VALUES (?, ?, ?, ?)
            &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;mid&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;topic&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;payload&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;qos&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;acknowledge&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;mid&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Call this in on_publish callback.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
        &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_lock&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_db&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;execute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;DELETE FROM inflight WHERE mid = ?&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;mid&lt;/span&gt;&lt;span class="p"&gt;,))&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;get_all_pending&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Call this on reconnect — re-send everything unacknowledged.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
        &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_lock&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_db&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;execute&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;SELECT topic, payload, qos FROM inflight&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
            &lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;fetchall&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;On reconnect, the client replays all inflight messages first, then starts draining the offline queue. Delivery order is preserved.&lt;/p&gt;

&lt;h3&gt;
  
  
  Priority Eviction
&lt;/h3&gt;

&lt;p&gt;Each message gets a priority from 1 (lowest) to 10 (highest):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Routine telemetry — can be evicted when queue is full
&lt;/span&gt;&lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;publish&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;topic&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;sensors/temperature&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;payload&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;value&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;: 23.5}&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;qos&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;priority&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Critical alert — survives eviction, displaces old telemetry
&lt;/span&gt;&lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;publish&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;topic&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;alerts/critical&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;payload&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;: &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;over_temp&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;, &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;value&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;: 87.2}&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;qos&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;priority&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;9&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;When the queue hits capacity, the lowest-priority messages are evicted first. Your critical alerts are never blocked by a backlog of stale routine data.&lt;/p&gt;




&lt;h2&gt;
  
  
  Using robmqtt
&lt;/h2&gt;

&lt;p&gt;Install:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;robmqtt
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Basic usage — this is everything you need:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;robmqtt&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;ProductionMQTTClient&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;

&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;ProductionMQTTClient&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;client_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;field_device_001&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;broker_host&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;mqtt.yourdomain.com&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;broker_port&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1883&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;max_queue_size&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;5000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;    &lt;span class="c1"&gt;# holds ~5000 messages during outages
&lt;/span&gt;    &lt;span class="n"&gt;min_backoff&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;          &lt;span class="c1"&gt;# start retrying after 2s
&lt;/span&gt;    &lt;span class="n"&gt;max_backoff&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;60&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;         &lt;span class="c1"&gt;# cap retry interval at 60s
&lt;/span&gt;    &lt;span class="n"&gt;db_path&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;./device.db&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  &lt;span class="c1"&gt;# SQLite lives here — survives reboots
&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;connect&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;start&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="c1"&gt;# From here just call publish() — routing is handled internally.
# Connected: sends directly and tracks inflight.
# Disconnected: writes to SQLite, drains automatically on reconnect.
&lt;/span&gt;
&lt;span class="k"&gt;while&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;reading&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;read_sensor&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;publish&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;topic&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;sensors/temperature&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;payload&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dumps&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;reading&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="n"&gt;qos&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;priority&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sleep&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;30&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The application code doesn't need to know whether the broker is reachable. That's the point.&lt;/p&gt;

&lt;p&gt;Check what's happening at runtime:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;stats&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_statistics&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;stats&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="c1"&gt;# {
#   'is_connected': True,
#   'offline_queue_size': 0,
#   'inflight_count': 2,
#   'reconnect_count': 4,
#   ...
# }
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;TLS is supported if your broker requires it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;ProductionMQTTClient&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;client_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;secure_device_001&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;broker_host&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;mqtt.yourdomain.com&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;broker_port&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;8883&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;use_tls&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;ca_certs&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;/etc/ssl/certs/broker-ca.crt&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;username&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;device001&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;password&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;your_password&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Seeing It in Action
&lt;/h2&gt;

&lt;p&gt;The repo includes &lt;code&gt;test_13.py&lt;/code&gt;, a simulation designed specifically to demo the offline behaviour:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Terminal 1 — run the simulation (publishes every 5 seconds)&lt;/span&gt;
python test_13.py

&lt;span class="c"&gt;# Terminal 2 — simulate a network outage&lt;/span&gt;
&lt;span class="nb"&gt;sudo &lt;/span&gt;systemctl stop mosquitto

&lt;span class="c"&gt;# Watch messages queue up in Terminal 1&lt;/span&gt;
&lt;span class="c"&gt;# Queue stats print every 10 readings&lt;/span&gt;

&lt;span class="c"&gt;# Restore connectivity&lt;/span&gt;
&lt;span class="nb"&gt;sudo &lt;/span&gt;systemctl start mosquitto

&lt;span class="c"&gt;# Watch the offline queue drain automatically — zero messages lost&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The queue drain happens in a background daemon thread. Your application code does nothing. It just works.&lt;/p&gt;




&lt;h2&gt;
  
  
  Real-World Context
&lt;/h2&gt;

&lt;p&gt;I've deployed this pattern on:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Battery management systems&lt;/strong&gt; — monitoring cell voltages and temperatures in production energy storage systems. A 10-minute broker outage during a network switch should not cause a gap in the battery health record.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Robotics telemetry&lt;/strong&gt; — ROS2 robots publishing sensor and status data. Process restarts during OTA updates should not lose the last known state.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;MQTT edge gateways&lt;/strong&gt; — aggregating data from downstream sensors over serial or CAN and forwarding to a cloud broker over 4G. The gateway may reconnect dozens of times per day.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In all of these cases, the pattern is the same: treat disconnection as normal, not exceptional. Design the client to buffer, not to fail.&lt;/p&gt;




&lt;h2&gt;
  
  
  Who This Is For
&lt;/h2&gt;

&lt;p&gt;robmqtt is specifically for &lt;strong&gt;edge device deployments&lt;/strong&gt; where:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Network connectivity is unreliable (cellular, Wi-Fi roaming, VPNs)&lt;/li&gt;
&lt;li&gt;Process restarts happen (watchdog resets, power cycles, OTA updates)&lt;/li&gt;
&lt;li&gt;Message loss has real consequences (industrial monitoring, remote sensors, fleet telemetry)&lt;/li&gt;
&lt;li&gt;You don't want to build and maintain this infrastructure yourself&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you're running MQTT on a stable cloud-to-cloud connection, &lt;code&gt;paho-mqtt&lt;/code&gt; alone is probably fine. If you're deploying on Raspberry Pi, industrial gateways, field sensors, or anything running on 4G/LTE — this is for you.&lt;/p&gt;




&lt;h2&gt;
  
  
  What's Next
&lt;/h2&gt;

&lt;p&gt;The Prometheus metrics endpoint is on the roadmap. The structured logging already writes &lt;code&gt;.jsonl&lt;/code&gt; metrics files — exposing them via HTTP is a small step and would make robmqtt slot naturally into standard observability stacks.&lt;/p&gt;

&lt;p&gt;If you try it and hit an issue, open a GitHub issue. If you want a feature, open a discussion.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;PyPI:&lt;/strong&gt; &lt;a href="https://pypi.org/project/robmqtt/" rel="noopener noreferrer"&gt;pypi.org/project/robmqtt&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;GitHub:&lt;/strong&gt; &lt;a href="https://github.com/ranaweerasupun/resilient-edge-mqtt-client" rel="noopener noreferrer"&gt;github.com/ranaweerasupun/resilient-edge-mqtt-client&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;&lt;em&gt;Have you run into MQTT message loss on edge devices? How did you handle it — drop a comment below.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>iot</category>
      <category>python</category>
      <category>mqtt</category>
      <category>raspberrypi</category>
    </item>
  </channel>
</rss>
