<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Bisal  </title>
    <description>The latest articles on DEV Community by Bisal   (@bisal).</description>
    <link>https://dev.to/bisal</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F4000484%2F4567e1d9-51ea-4069-9a41-201bf9a2551b.jpg</url>
      <title>DEV Community: Bisal  </title>
      <link>https://dev.to/bisal</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/bisal"/>
    <language>en</language>
    <item>
      <title>The Silent Crash: Handling Zombie WebSockets in Python IoT Applications</title>
      <dc:creator>Bisal  </dc:creator>
      <pubDate>Wed, 24 Jun 2026 16:16:51 +0000</pubDate>
      <link>https://dev.to/bisal/the-silent-crash-handling-zombie-websockets-in-python-iot-applications-3j2m</link>
      <guid>https://dev.to/bisal/the-silent-crash-handling-zombie-websockets-in-python-iot-applications-3j2m</guid>
      <description>&lt;p&gt;One harsh truth we discover while building backend systems for IoT devices like industrial sensors or electric vehicle (EV) chargers is this: &lt;strong&gt;&lt;em&gt;Hardware lies&lt;/em&gt;&lt;/strong&gt;. Your Python WebSocket server works flawlessly on your local machine, but deploy it to production where physical hardware is shoved into underground concrete parking garages, fighting high-voltage electromagnetic interference, and constantly falling back to legacy 2G/3G networks, and your server will quickly fall victim to "&lt;em&gt;Zombie Connections&lt;/em&gt;."&lt;/p&gt;

&lt;p&gt;I still remember a seminar where a gentleman explained how his team had deployed IoT devices on generators at remote mining sites. They were facing this exact issue: the IoT devices were reporting all green, but the physical generators were completely offline.&lt;/p&gt;

&lt;p&gt;A Zombie Connection happens when an IoT device loses cellular service and drops off the network abruptly, without sending a WebSocket close frame or even a TCP FIN. Because nothing arrives to signal the disconnect, the Python server still considers the connection open.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Naive (and Dangerous) Approach&lt;/strong&gt;&lt;br&gt;
Most Python WebSocket tutorials teach you to handle connections like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;Python&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;asyncio&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;websockets&lt;/span&gt;

&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;handle_connection&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;websocket&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;while&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="c1"&gt;# DANGER: This line blocks forever if the connection drops silently
&lt;/span&gt;        &lt;span class="n"&gt;data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;websocket&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;recv&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; 
        &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;process_sensor_data&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In an IoT application, this code is a ticking time bomb. If 1,000 devices drive into tunnels and lose signal silently, and nothing is pinging them, you now have 1,000 coroutines suspended in your asyncio event loop, waiting for data that will never arrive. Each one also holds an open socket and its buffers, so the server eventually runs out of file descriptors or memory and crashes.&lt;/p&gt;
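
&lt;p&gt;You can reproduce the underlying problem with nothing but the standard library. In the sketch below, which is an illustration rather than production code, a plain asyncio TCP server stands in for the WebSocket layer, and a client that connects and then goes quiet stands in for a device that lost signal. From the server's side, a silent peer and a dead peer look the same: the read never completes, and only a deadline from &lt;em&gt;asyncio.wait_for&lt;/em&gt; frees the coroutine. The helper names &lt;em&gt;read_with_deadline&lt;/em&gt; and &lt;em&gt;demo&lt;/em&gt; and the 0.5-second timeout are hypothetical choices for this example.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import asyncio


async def read_with_deadline(reader, timeout):
    """Return the next chunk from the peer, or None if nothing arrives in time."""
    try:
        return await asyncio.wait_for(reader.read(1024), timeout=timeout)
    except asyncio.TimeoutError:
        # A silent peer and a dead peer are indistinguishable here:
        # neither ever sends another byte.
        return None


async def demo(timeout=0.5):
    result = {}

    async def on_connect(reader, writer):
        # Without the deadline, this await would last as long as the socket stays open.
        result["data"] = await read_with_deadline(reader, timeout)
        writer.close()
        await writer.wait_closed()

    server = await asyncio.start_server(on_connect, "127.0.0.1", 0)
    port = server.sockets[0].getsockname()[1]

    # The "device": connects, then goes quiet without closing its socket.
    _, device_writer = await asyncio.open_connection("127.0.0.1", port)
    await asyncio.sleep(timeout * 2)  # give the server-side deadline time to expire

    device_writer.close()
    await device_writer.wait_closed()
    server.close()
    await server.wait_closed()
    return result.get("data", "handler never finished")


if __name__ == "__main__":
    print(asyncio.run(demo()))  # expected: None, because the read hit its deadline
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;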

&lt;p&gt;&lt;strong&gt;The Fix: Aggressive Heartbeats and Timeouts&lt;/strong&gt;&lt;br&gt;
To build a fault-tolerant ingestion layer, we must stop trusting the socket state and actively interrogate the connection using &lt;em&gt;asyncio.wait_for&lt;/em&gt; and &lt;em&gt;WebSocket Ping/Pong&lt;/em&gt; frames.&lt;br&gt;
Here is a connection handler that never waits indefinitely on a silent device:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;Python&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;asyncio&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;websockets&lt;/span&gt;

&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;process_sensor_data&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="c1"&gt;# Your business logic goes here
&lt;/span&gt;    &lt;span class="k"&gt;pass&lt;/span&gt;

&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;handle_connection&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;websocket&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;while&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="c1"&gt;# 1. Wait for data, but strictly timeout after 30 seconds
&lt;/span&gt;                &lt;span class="n"&gt;data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;asyncio&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;wait_for&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;websocket&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;recv&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; &lt;span class="n"&gt;timeout&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;30.0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
                &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;process_sensor_data&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

            &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="n"&gt;asyncio&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;TimeoutError&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="c1"&gt;# 2. No data received in 30s. The device might be dead, or just idle.
&lt;/span&gt;                &lt;span class="c1"&gt;# We actively send a ping to find out.
&lt;/span&gt;                &lt;span class="n"&gt;pong_waiter&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;websocket&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;ping&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

                &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                    &lt;span class="c1"&gt;# Wait 10 seconds for the hardware to reply to the ping
&lt;/span&gt;                    &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;asyncio&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;wait_for&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;pong_waiter&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;timeout&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;10.0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
                &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="n"&gt;asyncio&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;TimeoutError&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                    &lt;span class="c1"&gt;# 3. No pong received. It's a Zombie. Kill it.
&lt;/span&gt;                    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Zombie connection detected. Freeing server resources.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
                    &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;websocket&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;close&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
                    &lt;span class="k"&gt;break&lt;/span&gt; &lt;span class="c1"&gt;# Exit the loop and destroy the coroutine
&lt;/span&gt;
    &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="n"&gt;websockets&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;exceptions&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ConnectionClosed&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Connection closed cleanly by the hardware.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
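
&lt;p&gt;If you want to check the timeout-then-ping logic without real hardware, the sketch below runs the same decision sequence against two hypothetical in-memory devices. &lt;em&gt;FakeDevice&lt;/em&gt; and the local &lt;em&gt;ConnectionClosed&lt;/em&gt; class are stand-ins invented for this example (they only mimic the &lt;em&gt;recv&lt;/em&gt;, &lt;em&gt;ping&lt;/em&gt; and &lt;em&gt;close&lt;/em&gt; calls used above), and the timeouts are shrunk to fractions of a second so it finishes quickly. One device goes silent and never answers a ping; the other goes idle but keeps answering pings until it hangs up on its own.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import asyncio


class ConnectionClosed(Exception):
    """Hypothetical stand-in for websockets.exceptions.ConnectionClosed."""


class FakeDevice:
    """Hypothetical in-memory device exposing only recv(), ping() and close()."""

    def __init__(self, readings, answers_pings):
        self.queue = asyncio.Queue()
        for reading in readings:
            self.queue.put_nowait(reading)
        self.answers_pings = answers_pings
        self.pings_answered = 0
        self.closed = False

    async def recv(self):
        # Once the queued readings run out, this waits forever, like a silent socket.
        item = await self.queue.get()
        if isinstance(item, Exception):
            raise item
        return item

    async def ping(self):
        pong = asyncio.get_running_loop().create_future()
        if self.answers_pings:
            pong.set_result(None)
            self.pings_answered += 1
            if self.pings_answered == 2:
                # The live device hangs up after its second idle period.
                self.queue.put_nowait(ConnectionClosed())
        # A zombie's pong future is never resolved.
        return pong

    async def close(self):
        self.closed = True


async def handle_connection(websocket, idle_timeout, pong_timeout):
    """Same timeout, ping, close sequence as the handler above, returning a status."""
    received = []
    try:
        while True:
            try:
                data = await asyncio.wait_for(websocket.recv(), timeout=idle_timeout)
                received.append(data)
            except asyncio.TimeoutError:
                pong_waiter = await websocket.ping()
                try:
                    await asyncio.wait_for(pong_waiter, timeout=pong_timeout)
                except asyncio.TimeoutError:
                    await websocket.close()
                    return "zombie", received
    except ConnectionClosed:
        return "closed by device", received


async def demo():
    zombie = FakeDevice(["21.5"], answers_pings=False)
    alive = FakeDevice(["21.5", "21.7"], answers_pings=True)
    results = await asyncio.gather(
        handle_connection(zombie, idle_timeout=0.1, pong_timeout=0.05),
        handle_connection(alive, idle_timeout=0.1, pong_timeout=0.05),
    )
    return results, zombie.closed


if __name__ == "__main__":
    print(asyncio.run(demo()))
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The silent device is reaped after a single unanswered ping, while the idle-but-alive device survives two idle periods and is only released when it hangs up itself.&lt;/p&gt;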



&lt;p&gt;&lt;strong&gt;Why This Architecture Wins&lt;/strong&gt;&lt;br&gt;
By wrapping every receive in a strict timeout and confirming silence with a ping, we make sure no handler waits indefinitely. The server can reap dead connections promptly, release their sockets and memory, and be ready for the hardware to reconnect once it regains cellular service.&lt;/p&gt;
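
&lt;p&gt;It is worth knowing that the &lt;em&gt;websockets&lt;/em&gt; library can run a similar heartbeat for you: &lt;em&gt;websockets.serve()&lt;/em&gt; accepts &lt;em&gt;ping_interval&lt;/em&gt; and &lt;em&gt;ping_timeout&lt;/em&gt; (both 20 seconds by default), and when a keepalive ping goes unanswered the library closes the connection, so a pending receive raises instead of waiting forever. The sketch below relies on that built-in keepalive rather than a hand-written loop; the host, the port and the 30/10-second values (picked to roughly mirror the handler above) are assumptions to adapt.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import asyncio

import websockets


async def process_sensor_data(data):
    # Your business logic goes here
    pass


async def handle_connection(websocket):
    try:
        # The loop ends quietly when the device closes the connection normally.
        async for message in websocket:
            await process_sensor_data(message)
    except websockets.exceptions.ConnectionClosedError:
        # Raised on abnormal closure, including an unanswered keepalive ping.
        print("Connection lost:", websocket.remote_address)


async def main():
    async with websockets.serve(
        handle_connection,
        "0.0.0.0",  # assumption: listen on all interfaces
        8765,  # assumption: example port
        ping_interval=30,  # send a keepalive ping every 30 seconds
        ping_timeout=10,  # close the connection if no pong arrives within 10 seconds
    ):
        await asyncio.Future()  # run until the process is stopped


if __name__ == "__main__":
    asyncio.run(main())
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;If you keep the hand-written loop instead, you may want to pass &lt;em&gt;ping_interval=None&lt;/em&gt; so the library's own keepalive does not send a second stream of pings alongside yours.&lt;/p&gt;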

&lt;p&gt;&lt;em&gt;Remember, there are a lot of zombie "connections" out there :)&lt;/em&gt;&lt;/p&gt;

</description>
      <category>iot</category>
      <category>python</category>
      <category>websockets</category>
      <category>programming</category>
    </item>
  </channel>
</rss>
