DEV Community

Prashant Swaroop

What 3 Days of Debugging WebSockets Taught Me

I thought building a chat app would be easy. Then I spent three days debugging phantom WebSocket bugs.

Anyone can whip up a toy chat app. The real pain starts when you want to make it maintainable. These three lessons saved me from drowning in ghost bugs. If you’re building anything real with WebSockets, I hope they save you too.

⚔️ Lesson 1: Draw a Hard Line Between Client→Server and Server→Client Messages

In my first attempt, I shoved everything into a single .on("message") handler. End result: total chaos. Messages firing left and right, no clue who said what, and me drowning in logs.

The fix was stupidly simple:

  • Client → Server: only chats, receipts, typing events.
  • Server → Client: only info, errors, and routing payloads.

Once I separated these flows, the routing logic lived only where it belonged, and the server stopped losing its mind. Debugging went from “WTF is this?” to “oh, that’s exactly where it broke.”
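One way to make that line hard to cross is to give each direction its own type. This is a minimal sketch, not my production code: the type names, the `typing` variant, and the `sendToClient` helper are assumptions made up for illustration.

```typescript
// Hypothetical message shapes, for illustration only.
type ClientToServer =
  | { type: "chat"; to: string; message: string }
  | { type: "ack"; messageId: string }
  | { type: "typing"; to: string };

type ServerToClient =
  | { type: "system"; message: string }
  | { type: "error"; component: string; message: string };

// Runtime allow-list of inbound types. The Record type forces this object
// to list every ClientToServer variant, so the two cannot drift apart.
const INBOUND_TYPES: Record<ClientToServer["type"], true> = {
  chat: true,
  ack: true,
  typing: true,
};

// Returns true when a parsed payload claims a client-originated type.
// It checks the discriminator only; field validation belongs to the schema layer.
function isInboundType(payload: unknown): boolean {
  if (typeof payload !== "object" || payload === null) return false;
  const type = (payload as { type?: unknown }).type;
  return (
    typeof type === "string" &&
    Object.prototype.hasOwnProperty.call(INBOUND_TYPES, type)
  );
}

// Outbound helper: the compiler refuses anything that is not ServerToClient,
// so a "chat" frame cannot go out through this path by accident.
function sendToClient(
  socket: { send(data: string): void },
  msg: ServerToClient,
): void {
  socket.send(JSON.stringify(msg));
}
```

With this split, a client that sends `{ type: "system" }` fails `isInboundType` before any handler runs, and a server-side call like `sendToClient(ws, { type: "chat", ... })` fails at compile time.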

Here’s the mental model that finally clicked for me:

[Diagram: WebSocket message delivery]

    ws.on("message", (data, isBinary) => {

      // Only text frames are supported; ignore binary payloads.
      if (isBinary) {
        logger.info("Received a binary payload in the message handler; not handling it.")
        return;
      }

      const receivedMessage: Envelope | null = parseEnvelope(data)

      if (receivedMessage == null) {
        logger.info("Received a message in an unrecognized format.")
        return;
      }

      // Route on the envelope type: only client → server types are handled here.
      switch (receivedMessage.type) {

        case "chat":
          handleChatMessages(ws, receivedMessage, userToWs, user.id)
          break;

        case "ack":
          handleAck(ws, receivedMessage, userToWs, user.id)
          break;

        default:
          logger.info("Received an unsupported message type.")
          break;
      }

    })

    // As you can see, we only need to handle two types of messages!
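The handler leans on `parseEnvelope`, which isn’t shown. Here is a rough sketch of what such a helper could look like, assuming the payload arrives as a single text frame. `parseEnvelopeWith` and the `SafeParser` interface are hypothetical names; the interface just describes the `safeParse` result shape that Zod schemas should be compatible with.

```typescript
// Minimal structural view of a validator: anything with a safeParse method
// that reports success or failure. A stub with the same shape works too.
interface SafeParser<T> {
  safeParse(input: unknown): { success: true; data: T } | { success: false };
}

// Hypothetical helper: decode the frame, parse JSON, validate against the
// schema, and return null for anything that does not pass.
function parseEnvelopeWith<T>(
  schema: SafeParser<T>,
  data: { toString(): string },
): T | null {
  let json: unknown;
  try {
    json = JSON.parse(data.toString());
  } catch {
    return null; // not valid JSON
  }
  const result = schema.safeParse(json);
  return result.success ? result.data : null;
}
```

Returning `null` instead of throwing keeps the message handler to a single early-return check, which is the pattern the handler above already uses.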

Lesson 2: Don’t Mutate the Payload Schema Mid-Flight

Biggest rookie mistake I made: sneaking extra fields into my payloads because “eh, quick fix.” Guess what? Three days of phantom bugs later, I realized I was the ghost haunting my own system.

Rule of thumb:

  • Define your schema once.
  • Never mutate it in transit.
  • If you need optional fields → declare them as optional in the schema.

Your future self will thank you.

import { z } from "zod";

/**
 * Client → Server: Chat message
 */
export const ChatMessageSchema = z.object({
  type: z.literal("chat"),
  to: z.string(),
  from: z.string(),
  messageId: z.string(),
  message: z.string(),
  mode: z.enum(["offline", "online"]),
  timestamp: z.number(),
  streamId: z.string().optional(), // optional stream ID; must be a string, not an object
});

/**
 * Client → Server: Acknowledgement
 */
export const ChatAckSchema = z.object({
  type: z.literal("ack"),
  to: z.string(),
  from: z.string(),
  messageId: z.string(),
  timestamp: z.number(),
  streamId: z.string().optional(),
  ackType: z.enum(["read", "delivered"]),
});

/**
 * Server → Client: System info
 * Example: "you are connected", "server restarting", etc.
 */
export const SystemInfoSchema = z.object({
  type: z.literal("system"),
  message: z.string(),
});

/**
 * Server → Client: Error info
 * Covers internal / external components.
 */
export const SystemErrorSchema = z.object({
  type: z.literal("error"),
  component: z.string(),
  message: z.string(),
});

/**
 * Envelope: every message in/out must be one of these.
 */
export const EnvelopeSchema = z.union([
  ChatMessageSchema,
  ChatAckSchema,
  SystemInfoSchema,
  SystemErrorSchema,
]);

// ------------ Types ------------
export type ChatMessage = z.infer<typeof ChatMessageSchema>;
export type ChatAck = z.infer<typeof ChatAckSchema>;
export type SystemInfo = z.infer<typeof SystemInfoSchema>;
export type SystemError = z.infer<typeof SystemErrorSchema>;
export type Envelope = z.infer<typeof EnvelopeSchema>;

// As you can see, even my message schema carries an optional Redis stream field (streamId).
// It helps me mark which mode the message was delivered in.
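To catch a field that got sneaked in mid-flight, you can compare a payload’s keys against the ones the schema declares. The sketch below is hand-rolled and hypothetical (`unknownKeys`, `CHAT_ACK_KEYS`, and the `retryCount` field are made up for illustration). In Zod 3, object schemas strip unrecognized keys by default, and calling `.strict()` on the schema turns them into validation errors instead.

```typescript
// Field names copied from ChatAckSchema; the helper itself is hypothetical.
const CHAT_ACK_KEYS = [
  "type", "to", "from", "messageId", "timestamp", "streamId", "ackType",
] as const;

// Returns the keys on `payload` that the schema does not declare.
function unknownKeys(
  payload: Record<string, unknown>,
  allowed: readonly string[],
): string[] {
  return Object.keys(payload).filter((key) => allowed.indexOf(key) === -1);
}

const ack = {
  type: "ack", to: "u2", from: "u1", messageId: "m1", timestamp: 1, ackType: "read",
};
// The kind of "quick fix" field that causes trouble later.
const patched = { ...ack, retryCount: 3 };

unknownKeys(ack, CHAT_ACK_KEYS);     // []
unknownKeys(patched, CHAT_ACK_KEYS); // ["retryCount"]
```

Logging the result of a check like this at the boundary points at the exact field that drifted, instead of leaving you to chase it through the handlers.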

Lesson 3: Log Every Client Exit (and Don’t Let React Gaslight You)

Here’s a cursed one: I was testing my socket server with a React client. Connections kept dying with seemingly random close codes like 1006 (abnormal closure) and 1005 (no status code received). I thought my server was broken. I debugged like a madman for three days straight.

The real culprit? React’s Strict Mode. In development it mounts, unmounts, and remounts components, so my sockets kept connecting and dropping on and off.

Two takeaways:

  1. Log how and when each client exits. It’ll save you hours.
  2. If you’re testing with React → disable Strict Mode or just use a plain JS client.

Once I did that, the ghosts vanished and my server behaved like it should.
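If you want to see why this looks like a flaky server, here is a small simulation. It is not React itself: `runLikeStrictMode` and `FakeSocket` are made-up names that only mimic the development behavior of running an effect, running its cleanup, and running the effect again on mount.

```typescript
// A fake socket that records lifecycle events instead of opening a connection.
class FakeSocket {
  private readonly events: string[];

  constructor(events: string[]) {
    this.events = events;
    this.events.push("open");
  }

  close(): void {
    this.events.push("close");
  }
}

// An effect in the React sense: it runs on mount and returns a cleanup function.
type Effect = () => () => void;

// Mimics Strict Mode in development: run the effect, run its cleanup,
// then run the effect again. Returns the cleanup of the second run.
function runLikeStrictMode(effect: Effect): () => void {
  const firstCleanup = effect();
  firstCleanup();
  return effect();
}

const events: string[] = [];
const connect: Effect = () => {
  const socket = new FakeSocket(events);
  return () => socket.close();
};

const unmount = runLikeStrictMode(connect);
// events is now ["open", "close", "open"]: the server sees a connection
// appear, drop, and reappear before the user has done anything.
unmount();
// events is now ["open", "close", "open", "close"]
```

From the server’s side, that first open/close pair is indistinguishable from a client that connected and bailed, which is why the close logs matter so much.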

    ws.on("close", async (code, reason) => {
      console.log("❌ WS closed:", code, reason.toString());

      // 1. Global presence cleanup
      await terminateUserFromRedis(user.mobileNo)

      // 2. In-memory maps cleanup
      userToWs.delete(user.mobileNo);  // userId → ws map
      wsToUser.delete(ws);             // ws → userId map
      activeConnection.delete(ws);     // ws → liveness flag

      // 3. Logging
      logger.info(`User disconnected: ${user.mobileNo}`);
    });

    // This alone will help you understand why any connection drops!
    // It’s your insurance against connections dropping left and right.
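A bare number in the logs doesn’t tell you much at 2 a.m., so it can help to log what the close code means. A minimal sketch, assuming you only care about a handful of the codes defined in RFC 6455; `describeCloseCode` is a hypothetical helper, not part of any library.

```typescript
// A subset of the close codes defined in RFC 6455, section 7.4.1.
const CLOSE_CODES = new Map<number, string>([
  [1000, "normal closure"],
  [1001, "going away (page navigated away or server shutting down)"],
  [1002, "protocol error"],
  [1003, "unsupported data"],
  [1005, "no status code present in the close frame"],
  [1006, "abnormal closure (connection lost without a close frame)"],
  [1009, "message too big"],
  [1011, "internal server error"],
]);

// Turns a numeric close code into something readable in the logs.
function describeCloseCode(code: number): string {
  return CLOSE_CODES.get(code) ?? "unregistered or application-defined code";
}
```

Inside the close handler, this could be used as `logger.info(`WS closed: ${code} (${describeCloseCode(code)})`)`. Codes 1005 and 1006 are never sent over the wire; they are reported locally when the close frame carried no status or never arrived at all.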

🏁 Closing Thoughts

Toy chat apps are easy. Debuggable chat apps are hard. The sooner you:

  • draw hard lines between client and server messages,
  • respect your schemas, and
  • log religiously,

…the less you’ll fear WebSockets, and the faster you’ll build systems you can actually trust.
