What 3 Days of Debugging WebSockets Taught Me

#javascript #webdev #programming #learning

I thought building a chat app would be easy. Then I spent three days debugging phantom WebSocket bugs.

Anybody can whip up a toy chat app. The real pain starts when you try to make it maintainable. These three lessons saved me from drowning in ghost bugs. If you’re building anything real with WebSockets, I hope they save you too.

⚔️ Lesson 1: Draw a Hard Line Between Client→Server and Server→Client Messages

In my first attempt, I shoved everything into a single .on("message") handler. End result: total chaos. Messages firing left and right, no clue who said what, and me drowning in logs.

The fix was stupidly simple:

Client → Server: only chats, receipts, typing events.

Server → Client: only info, errors, and routing payloads.

Once I separated these flows, the routing logic lived in exactly one place, and the server stopped losing its mind. Debugging went from “WTF is this?” to “oh, that’s exactly where it broke.”

Here’s the mental model that finally clicked for me:

    ws.on("message", (data, isBinary) => {
      // Binary frames are not part of the protocol, so drop them early.
      if (isBinary) {
        logger.info("Received a binary payload in on('message'); not handling it");
        return;
      }

      // parseEnvelope returns null when it cannot parse the payload.
      const receivedMessage: Envelope | null = parseEnvelope(data);
      if (receivedMessage == null) {
        logger.info("Received a message in an unrecognized format");
        return;
      }

      // Only client → server message types are routed here.
      switch (receivedMessage.type) {
        case "chat":
          handleChatMessages(ws, receivedMessage, userToWs, user.id);
          break;
        case "ack":
          handleAck(ws, receivedMessage, userToWs, user.id);
          break;
        default:
          logger.info("Received a message type the server does not accept");
          break;
      }
    });

    // As you can see, the server only has to handle two inbound message types: chat and ack.
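
One way to make this split hard to violate is to encode the direction in the types. The sketch below is dependency-free and uses hypothetical names (`ClientToServer`, `ServerToClient`, `parseInbound`, `serialize`) with trimmed-down message shapes. The `parseEnvelope` used in the handler above isn't shown in this post, so treat this as an illustration of the idea, not as that function.

```typescript
// Hypothetical, trimmed-down shapes; the real messages carry more fields.
type ClientToServer =
  | { type: "chat"; to: string; message: string }
  | { type: "ack"; to: string; messageId: string };

type ServerToClient =
  | { type: "system"; message: string }
  | { type: "error"; component: string; message: string };

// Parse raw text and accept it only if it is a valid client → server message.
function parseInbound(raw: string): ClientToServer | null {
  let value: unknown;
  try {
    value = JSON.parse(raw);
  } catch {
    return null; // not JSON at all
  }
  if (typeof value !== "object" || value === null) return null;

  const { type, to, message, messageId } = value as Record<string, unknown>;
  if (typeof to !== "string") return null;

  if (type === "chat" && typeof message === "string") {
    return { type: "chat", to, message };
  }
  if (type === "ack" && typeof messageId === "string") {
    return { type: "ack", to, messageId };
  }
  // Everything else is rejected, including "system" and "error":
  // clients are never allowed to send server → client types.
  return null;
}

// The only place that turns an outbound payload into wire text.
function serialize(msg: ServerToClient): string {
  return JSON.stringify(msg);
}
```

With this shape, a client that tries to send a `"system"` message gets `null` back from `parseInbound`, and the compiler refuses a call like `serialize({ type: "chat", ... })` on the server side.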

Lesson 2: Don’t mutate the payload schema mid-flight

Biggest rookie mistake I made: sneaking extra fields into my payloads because “eh, quick fix.” Guess what? Three days of phantom bugs later, I realized I was the ghost haunting my own system.

Rule of thumb:

Define your schema once.
Never mutate it in transit.
If you need optional fields → declare them as optional in the schema from the start.

Your future self will thank you.

import { z } from "zod";

/**
 * Client → Server: Chat message
 */
export const ChatMessageSchema = z.object({
  type: z.literal("chat"),
  to: z.string(),
  from: z.string(),
  messageId: z.string(),
  message: z.string(),
  mode: z.enum(["offline", "online"]),
  timestamp: z.number(),
  streamId: z.string().optional(), // optional Redis stream ID; a string, not an object
});

/**
 * Client → Server: Acknowledgement
 */
export const ChatAckSchema = z.object({
  type: z.literal("ack"),
  to: z.string(),
  from: z.string(),
  messageId: z.string(),
  timestamp: z.number(),
  streamId: z.string().optional(),
  ackType: z.enum(["read", "delivered"]),
});

/**
 * Server → Client: System info
 * Example: "you are connected", "server restarting", etc.
 */
export const SystemInfoSchema = z.object({
  type: z.literal("system"),
  message: z.string(),
});

/**
 * Server → Client: Error info
 * Covers internal / external components.
 */
export const SystemErrorSchema = z.object({
  type: z.literal("error"),
  component: z.string(),
  message: z.string(),
});

/**
 * Envelope: every message in/out must be one of these.
 */
export const EnvelopeSchema = z.union([
  ChatMessageSchema,
  ChatAckSchema,
  SystemInfoSchema,
  SystemErrorSchema,
]);

// ------------ Types ------------
export type ChatMessage = z.infer<typeof ChatMessageSchema>;
export type ChatAck = z.infer<typeof ChatAckSchema>;
export type SystemInfo = z.infer<typeof SystemInfoSchema>;
export type SystemError = z.infer<typeof SystemErrorSchema>;
export type Envelope = z.infer<typeof EnvelopeSchema>;

// As you can see, even my message schema carries an optional Redis stream field (streamId).
// It helps me mark which mode a message was delivered in.
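
To make "never mutate it in transit" something the runtime enforces rather than a promise to myself, one option is to freeze a message right after validation and derive a new object whenever an optional field needs to be set. This is a minimal sketch with hypothetical helper names (`sealMessage`, `withStreamId`) and a trimmed-down message shape. It is not part of the schema file above.

```typescript
// Hypothetical, trimmed-down version of the chat message shape.
interface SealedChatMessage {
  readonly type: "chat";
  readonly to: string;
  readonly from: string;
  readonly messageId: string;
  readonly message: string;
  readonly streamId?: string; // optional by design, not bolted on later
}

// Freeze a shallow copy once, right after validation.
// Any later attempt to add or change a field is rejected at runtime.
function sealMessage(msg: SealedChatMessage): Readonly<SealedChatMessage> {
  return Object.freeze({ ...msg });
}

// Need to set the optional field? Build a new frozen object instead of
// editing the one that is already travelling through the system.
function withStreamId(
  msg: SealedChatMessage,
  streamId: string
): Readonly<SealedChatMessage> {
  return Object.freeze({ ...msg, streamId });
}
```

The messages here are flat, so a shallow `Object.freeze` is enough. Nested payloads would need a deep freeze or a copy at each level.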

Lesson 3: Log every client exit (and don’t let React gaslight you)

Here’s a cursed one: I was testing my socket server with a React client. Connections kept dying with seemingly random close codes like 1006 (abnormal closure) and 1005 (no status received). I thought my server was broken. I debugged like a madman for three days straight.

The real culprit? React’s Strict Mode. In development it mounts, unmounts, and remounts components, so my client kept opening a socket, dropping it, and opening another.

Two takeaways:

Log how and when each client exits. It’ll save you hours.
If you’re testing with React → disable Strict Mode or just use a plain JS client.

Once I did that, the ghosts vanished and my server behaved like it should.

    ws.on("close", async (code, reason) => {
      // 1. Log first, so the exit is recorded even if cleanup fails
      console.log("❌ WS closed:", code, reason.toString());
      logger.info(`User disconnected: ${user.mobileNo} (code ${code})`);

      // 2. In-memory maps cleanup (synchronous, so a Redis failure cannot skip it)
      userToWs.delete(user.mobileNo);  // user → ws map
      wsToUser.delete(ws);             // ws → user map
      activeConnection.delete(ws);     // ws → liveness flag

      // 3. Global presence cleanup
      await terminateUserFromRedis(user.mobileNo);
    });

    // This alone will help you understand why any connection drops.
    // It's your insurance against connections dropping left and right.
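
A bare number like 1006 in a log line is easy to misread, so it can help to attach the standard name to each close code. The sketch below maps the common codes from RFC 6455 to their names and builds one log line per exit. The helper name `describeClose` and the log format are my own assumptions for illustration. Note that 1005 and 1006 are never sent on the wire: they are reported locally when the close frame had no status code (1005) or when the connection dropped without a close frame at all (1006).

```typescript
// Close code names as defined in RFC 6455, section 7.4.1.
const CLOSE_CODE_NAMES = new Map<number, string>([
  [1000, "normal closure"],
  [1001, "going away"],
  [1002, "protocol error"],
  [1003, "unsupported data"],
  [1005, "no status received"],
  [1006, "abnormal closure"],
  [1007, "invalid frame payload data"],
  [1008, "policy violation"],
  [1009, "message too big"],
  [1011, "internal error"],
]);

// Build a single log line that records who left, how, and when.
// `userKey` is whatever identifies the user (hypothetical parameter name).
function describeClose(
  userKey: string,
  code: number,
  reason: string,
  at: Date = new Date()
): string {
  const name = CLOSE_CODE_NAMES.get(code) ?? "unknown";
  const detail = reason.length > 0 ? reason : "no reason given";
  return `user=${userKey} code=${code} (${name}) reason="${detail}" at=${at.toISOString()}`;
}
```

Inside the close handler this could be used as `logger.info(describeClose(user.mobileNo, code, reason.toString()))`. A burst of 1006 lines for the same user a few milliseconds apart is a strong hint that the client, not the server, is tearing connections down.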