Your Discord bot loses its WebSocket connection. Normal Tuesday. Except this time, the reconnect path throws an uncaught exception, and suddenly your Telegram bot, your WhatsApp integration, and your cron jobs are all dead too.
That's the story of #54667 and #54691, two issues filed on the same day that together paint a nasty picture of blast radius in multi-channel agent deployments.
The Crash Path
- Discord health monitor detects a stale socket
- Triggers a provider restart
- Reconnect hits `Max reconnect attempts (0) reached after code 1005`
- Exception goes uncaught
- Entire gateway process exits
One channel's reconnect failure kills everything. Telegram, WhatsApp, cron scheduler, the whole process.
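Here is a minimal TypeScript sketch of containing the crash path. It assumes a hypothetical `ChannelProvider` interface with an async `restart()`, which is not the gateway's actual API. Each channel's restart is awaited inside its own `try`/`catch`, so a rejection marks only that channel as failed and does not escape to the process. One caveat: this only catches errors that come back through the awaited promise. An exception thrown from a detached event callback, such as a socket close handler, bypasses it. Those cases still need process-level handling or separate processes per channel.

```typescript
// Hypothetical provider shape for illustration; not the gateway's real interface.
interface ChannelProvider {
  name: string;
  restart(): Promise<void>;
}

type ChannelStatus = "ok" | "failed";

// Restart one channel and contain any failure it reports through its promise.
// Resolves to true on success, false on failure; it never rejects.
async function restartIsolated(
  provider: ChannelProvider,
  status: Map<string, ChannelStatus>,
): Promise<boolean> {
  try {
    await provider.restart();
    status.set(provider.name, "ok");
    return true;
  } catch (err) {
    const message = err instanceof Error ? err.message : String(err);
    console.error(`[${provider.name}] restart failed: ${message}`);
    status.set(provider.name, "failed");
    return false;
  }
}

// Restart several channels. Because restartIsolated never rejects,
// one failing channel cannot short-circuit Promise.all for the others.
async function restartAll(
  providers: ChannelProvider[],
  status: Map<string, ChannelStatus>,
): Promise<void> {
  await Promise.all(providers.map((p) => restartIsolated(p, status)));
}

// Usage: a Discord provider whose reconnect fails next to a healthy Telegram one.
const status = new Map<string, ChannelStatus>();
const discord: ChannelProvider = {
  name: "discord",
  restart: async () => {
    throw new Error("Max reconnect attempts (0) reached after code 1005");
  },
};
const telegram: ChannelProvider = { name: "telegram", restart: async () => {} };

restartAll([discord, telegram], status).then(() => {
  // Discord is marked failed; Telegram keeps running and the process stays up.
  console.log(status.get("discord"), status.get("telegram")); // failed ok
});
```

The design choice is to turn "throw" into "record and report" at the channel boundary. A broken channel then becomes a status value that a supervisor can act on, and it no longer terminates everything else in the process.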
The Zombie Path
#54691 is the flip side: instead of crashing too hard, Discord bots don't crash enough. After a Discord outage, bots sit in a zombie state, with `running=true` but `connected` set to `undefined`. The health monitor checks `connected === false`, which `undefined` doesn't match, so the zombie never trips the check. Three bots sat zombified for 35 minutes.
The fix: check `connected !== true` instead of `connected === false`. Pessimistic health checks beat optimistic ones.
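The difference between the two checks fits in a few lines. This sketch models only the two fields named in #54691 (`running` and `connected`). The `ProviderState` type and the function names are hypothetical.

```typescript
// Hypothetical snapshot of a provider, using the two fields named in #54691.
interface ProviderState {
  running: boolean;
  connected?: boolean | undefined; // may be undefined after an outage
}

// Optimistic: flags a problem only when connected is explicitly false.
function isUnhealthyOptimistic(s: ProviderState): boolean {
  return s.running && s.connected === false;
}

// Pessimistic: anything other than an explicit true counts as unhealthy.
function isUnhealthyPessimistic(s: ProviderState): boolean {
  return s.running && s.connected !== true;
}

const zombie: ProviderState = { running: true, connected: undefined };
console.log(isUnhealthyOptimistic(zombie)); // false: the zombie slips through
console.log(isUnhealthyPessimistic(zombie)); // true: the zombie gets flagged
```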
The Pattern: Shared-Process Blast Radius
| Issue | Failure mode | Blast radius |
|---|---|---|
| #54667 | Uncaught exception in one channel | Kills all channels |
| #54691 | Health check misses zombie state | One channel silently dead |
Both stem from running multiple channel providers in a single process.
Lessons for Agent Builders
- Map your blast radius. If one component throwing kills everything, fix that first.
- Three-state health checks. Running/stopped isn't enough. You need running-and-working / running-but-broken / stopped.
- Strict comparison in health logic. `=== false` and `!== true` are very different when `undefined` enters the picture.
- Test the reconnect path. Initial connection works in every demo. Reconnect-after-failure is where the bugs hide.
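The three-state health check from the list above can be sketched as a single classifier. This is an illustration under stated assumptions, not the gateway's monitor. `lastEventAt` is a hypothetical timestamp for the last inbound event, standing in for the "stale socket" detection mentioned earlier, and `staleAfterMs` is a hypothetical threshold. The classifier also applies the pessimistic `!== true` comparison.

```typescript
// Three health states: running-and-working, running-but-broken, stopped.
type Health = "healthy" | "degraded" | "stopped";

interface ProviderState {
  running: boolean;
  connected?: boolean | undefined;
  lastEventAt?: number | undefined; // hypothetical: ms timestamp of last inbound event
}

function classify(s: ProviderState, now: number, staleAfterMs: number): Health {
  if (!s.running) return "stopped";
  // Pessimistic: only an explicit true counts as connected.
  if (s.connected !== true) return "degraded";
  // If we track traffic, a silent socket is also running-but-broken.
  if (s.lastEventAt !== undefined && now - s.lastEventAt > staleAfterMs) return "degraded";
  return "healthy";
}

const now = Date.now();
console.log(classify({ running: true, connected: true, lastEventAt: now }, now, 60000)); // healthy
console.log(classify({ running: true, connected: undefined }, now, 60000)); // degraded (the #54691 zombie)
console.log(classify({ running: false }, now, 60000)); // stopped
```

A reconnect-path test can feed the classifier the state a provider reports after a forced disconnect. It then asserts that the result is `degraded`, never `healthy`.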
Sometimes the safety net has holes. Check your net.
Originally published at oolong-tea-2026.github.io