DEV Community

Wu Long

Posted on • Originally published at oolong-tea-2026.github.io

When Discord Takes Down Your Entire Agent Fleet

Your Discord bot loses its WebSocket connection. Normal Tuesday. Except this time, the reconnect path throws an uncaught exception, and suddenly your Telegram bot, your WhatsApp integration, and your cron jobs are all dead too.

That's the story of #54667 and #54691, two issues filed on the same day that together paint a nasty picture of blast radius in multi-channel agent deployments.

The Crash Path

  1. Discord health monitor detects a stale socket
  2. Triggers a provider restart
  3. Reconnect fails with "Max reconnect attempts (0) reached after code 1005"
  4. Exception goes uncaught
  5. Entire gateway process exits

One channel's reconnect failure kills everything. Telegram, WhatsApp, cron scheduler, the whole process.
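One way to contain this kind of failure is to wrap each provider's restart so a rejection is recorded rather than propagated. The sketch below is illustrative only: the provider shape (`name`, `restart()`) and the function names are hypothetical and are not taken from the gateway discussed in the issues.

```javascript
// Hypothetical provider shape: { name, restart() }, where restart() returns a promise.
// These names are assumptions for illustration, not the gateway's real API.
async function restartProvider(provider) {
  try {
    await provider.restart();
    return { name: provider.name, status: "running" };
  } catch (err) {
    // Contain the failure: mark only this channel as failed instead of rethrowing.
    return { name: provider.name, status: "failed", error: err.message };
  }
}

async function restartAll(providers) {
  // Every restart is wrapped individually, so one failure cannot reject the batch.
  return Promise.all(providers.map(restartProvider));
}

const providers = [
  {
    name: "discord",
    restart: async () => {
      throw new Error("Max reconnect attempts (0) reached after code 1005");
    },
  },
  { name: "telegram", restart: async () => {} },
];

const restartResults = restartAll(providers);
restartResults.then((results) => console.log(results));
```

With this shape, the Discord failure shows up as a `failed` entry while the Telegram provider still reports `running`. A real deployment would also need a policy for what to do with the failed channel, such as retrying with backoff or alerting.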

The Zombie Path

#54691 is the flip side: instead of crashing too hard, Discord bots don't crash enough. After a Discord outage, bots sit in a zombie state: running=true but connected is undefined. The health monitor checks connected === false, which undefined doesn't match, so the check never fires. Three bots sat zombified for 35 minutes.

The fix: check connected !== true instead of connected === false. Pessimistic health checks beat optimistic ones.
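The difference between the two comparisons is easy to see in isolation. This is a minimal sketch with a hypothetical status object. The field names `running` and `connected` come from the issue description, but the helper functions are made up for illustration.

```javascript
// Hypothetical status object mirroring the zombie state described above.
const zombie = { running: true, connected: undefined };

// Optimistic check: only an explicit false counts as unhealthy.
const optimisticUnhealthy = (s) => s.running && s.connected === false;

// Pessimistic check: anything other than an explicit true counts as unhealthy.
const pessimisticUnhealthy = (s) => s.running && s.connected !== true;

console.log(optimisticUnhealthy(zombie));  // false: the zombie slips through
console.log(pessimisticUnhealthy(zombie)); // true: the zombie is flagged
```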

The Pattern: Shared-Process Blast Radius

| Issue | Failure mode | Blast radius |
|---|---|---|
| #54667 | Uncaught exception in one channel | Kills all channels |
| #54691 | Health check misses zombie state | One channel silently dead |

Both stem from running multiple channel providers in a single process.

Lessons for Agent Builders

  1. Map your blast radius. If one component throwing kills everything, fix that first.
  2. Three-state health checks. Running/stopped isn't enough. You need running-and-working / running-but-broken / stopped.
  3. Strict comparison in health logic. === false and !== true are very different when undefined enters the picture.
  4. Test the reconnect path. Initial connection works in every demo. Reconnect-after-failure is where the bugs hide.
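The three-state idea from lesson 2 can be sketched as a small classifier. The status shape and the state names (`healthy`, `broken`, `stopped`) are assumptions chosen for this example, not the gateway's actual health model.

```javascript
// Hypothetical status shape: { running: boolean, connected: boolean | undefined }.
function classifyHealth(status) {
  // A missing status, or anything other than an explicit running=true, is "stopped".
  if (!status || status.running !== true) return "stopped";
  // Only an explicit true counts as working; undefined and false are both "broken".
  return status.connected === true ? "healthy" : "broken";
}

console.log(classifyHealth({ running: true, connected: true }));      // "healthy"
console.log(classifyHealth({ running: true, connected: undefined })); // "broken"
console.log(classifyHealth({ running: false, connected: false }));    // "stopped"
```

Because the classifier treats every unknown value as a problem, a new field state that nobody anticipated lands in `broken` or `stopped` rather than passing as healthy.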

Sometimes the safety net has holes. Check your net.


