Manas Shinde

How to Optimize a Real-Time Chat Application: A Deep Dive into Our Production Codebase

Building a real-time chat application is a deceptively challenging engineering task. It combines the complexities of persistent socket networking, heavy client-side cryptography, concurrent asset downloads, and frequent UI re-renders. Without careful engineering, your chat app will quickly become sluggish, drain users’ battery, and overwhelm your database — and once performance debt piles up, it’s incredibly expensive to refactor.

In this post, we’ll walk through four critical optimization techniques we implemented in our production codebase to solve exactly these issues. For each technique, we’ll break down:

  1. The Problem we faced (the real pain, not the sanitised version).
  2. Our Solution (with actual architecture patterns and code snippets).
  3. The Ideal Industry Approach — and where we deliberately chose a simpler path.
  4. An Honest Rating & Assessment — where it shines, where it creaks.



🚀 Optimization 1: Client-Side LRU Memory Cache for E2EE Media

1. The Problem

In an End-to-End Encrypted (E2EE) chat app, files are stored on the server as opaque encrypted blobs. Every time a user scrolls through their chat, opens the file attachment modal, or switches conversations, the client must download those encrypted bytes and then run CPU-heavy cryptographic decryption in JavaScript (often via WebAssembly or the browser’s SubtleCrypto API).

Doing this repeatedly causes severe UI stuttering during scroll, high CPU temperatures on mobile, and massive wasted network data. Users see spinners for assets they literally just decrypted a few seconds earlier.

2. Our Solution

We built a centralized, in-memory Least Recently Used (LRU) Cache Manager in JavaScript.

  • The Cache Mechanism: We created a class (mediaCache.js) that maps a compound cache key (file ID + hash) to decrypted data: { fileData, objectUrl, size, timestamp }. The decrypted ArrayBuffer is stored alongside a blob URL created via URL.createObjectURL() so the UI can render instantly. The URL is revoked when the entry is evicted, preventing memory leaks.
  • LRU Eviction: To stop the tab from crashing under memory pressure, we enforce two limits: a maximum of 100 entries and a total size cap of 250 MB. When either limit is exceeded, the least recently accessed entries are evicted until both limits hold again. The count cap keeps a flood of small thumbnails bounded, and the byte cap stops a few large videos from exhausting the tab’s memory.
  • Global Access: Every file component (FileList, AllFilesModal, etc.) checks this cache before making any API call. On a cache hit, the asset loads in under 1 millisecond — zero network, zero decryption, zero CPU spikes.
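The cache described above can be sketched roughly as follows. This is a minimal illustration, not our actual mediaCache.js: the class name, option names, and the injectable createUrl/revokeUrl helpers are assumptions made so the sketch also runs outside a browser. It also leans on Map insertion order to track recency instead of sorting by timestamp.

```javascript
// Hypothetical sketch of a dual-limit LRU cache; names and defaults are
// illustrative, not the article's actual mediaCache.js.
class MediaCache {
  constructor({ maxEntries = 100, maxBytes = 250 * 1024 * 1024, createUrl, revokeUrl } = {}) {
    this.maxEntries = maxEntries;
    this.maxBytes = maxBytes;
    // URL helpers are injectable so the sketch also runs outside a browser.
    this.createUrl = createUrl ?? ((data) => URL.createObjectURL(new Blob([data])));
    this.revokeUrl = revokeUrl ?? ((url) => URL.revokeObjectURL(url));
    this.entries = new Map(); // insertion order: first key = least recently used
    this.totalBytes = 0;
  }

  get(key) {
    const entry = this.entries.get(key);
    if (!entry) return undefined;
    // Re-insert to mark the entry as most recently used.
    this.entries.delete(key);
    this.entries.set(key, entry);
    return entry;
  }

  set(key, fileData) {
    if (this.entries.has(key)) this.evict(key);
    const size = fileData.byteLength;
    if (size > this.maxBytes) return; // could never fit, so skip caching
    const entry = { fileData, objectUrl: this.createUrl(fileData), size, timestamp: Date.now() };
    this.entries.set(key, entry);
    this.totalBytes += size;
    // Evict from the least recently used end until both limits hold.
    while (this.entries.size > this.maxEntries || this.totalBytes > this.maxBytes) {
      const oldestKey = this.entries.keys().next().value;
      this.evict(oldestKey);
    }
  }

  evict(key) {
    const entry = this.entries.get(key);
    if (!entry) return;
    this.revokeUrl(entry.objectUrl); // release the blob URL to avoid leaks
    this.totalBytes -= entry.size;
    this.entries.delete(key);
  }
}
```

A component would call get(cacheKey) first and only fall back to download + decrypt on a miss, then store the result with set(cacheKey, decryptedBuffer).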

3. The Ideal Way

In non-encrypted apps, you’d simply send Cache-Control: max-age=31536000, immutable from the server and let the browser’s HTTP cache do the work. With E2EE, that only saves the download: the cached bytes are still ciphertext, so every view would pay for decryption again.

A more durable approach, common in offline-capable web apps, is to persist decrypted files inside IndexedDB, combined with a Service Worker that intercepts fetch requests and serves the cached content transparently. This survives page refreshes and allows offline access. However, it also means decrypted material stays on disk longer, raising security concerns that demand careful key rotation and automatic expiration logic.

4. Honest Rating: ⭐️ 8/10 (Highly Professional)

What’s great: In-memory LRU is extremely fast, simple to debug, and automatically purges when the tab closes, which limits the window in which decrypted data could be scraped from memory. The dual eviction strategy (count + total size) bounds both the number of cached objects and the total memory they hold, something many teams overlook.

The uncomfortable truth: Our eviction logic sorts the entire cache to find the oldest entry ([...cache.entries()].sort(...)), which is O(n log n) per eviction but perfectly fine for 100 items. A proper doubly-linked list + hashmap LRU (or a Map kept in access order) would be O(1), but that’s overkill for this scale. More importantly, we didn’t build the IndexedDB + Service Worker layer. The real reason? Time constraints. That’s honest technical debt that we’ve chosen to carry because in-memory caching alone solved 90% of the jank.


[Figure: Client-side cache]


🔄 Optimization 2: Active Request and Event Deduplication

1. The Problem

Chat interfaces often render the same file in multiple places at once — for example, in the main message list and a sidebar “recent files” widget. Both React components mount simultaneously, see the same file ID, and fire independent HTTP requests for the same encrypted blob.

This creates a request storm: multiple identical downloads and decryptions competing for network and CPU. On the socket side, reconnection storms are another beast. After a brief disconnect, the server might re-emit previously delivered messages because the client didn’t acknowledge them in time. Without deduplication, the UI flashes duplicate messages and triggers useless Redux store updates.

2. Our Solution

We solved both problems with an in-flight request registry and a bounded processed-ID Set.

Network Request Deduplication (FileList.js)
We maintain a global Map of active download promises, keyed by cache key. Any component that needs a file first checks this map. If another component has already kicked off the download, we return the same promise, so at most one network request and one decryption per file are in flight at any time.

const activeRequests = new Map();

function fetchAndDecryptFile(cacheKey) {
  // Reuse the in-flight promise if another caller already started this file
  if (activeRequests.has(cacheKey)) {
    return activeRequests.get(cacheKey);
  }

  const promise = (async () => {
    const encryptedBlob = await api.downloadFile(cacheKey);
    const decrypted = await cryptoWorker.decrypt(encryptedBlob);
    mediaCache.set(cacheKey, decrypted);
    return decrypted;
  })().finally(() => {
    // Clean up so future retries aren't blocked
    activeRequests.delete(cacheKey);
  });

  activeRequests.set(cacheKey, promise);
  return promise;
}

The finally handler is critical: if the download or decryption fails, the map entry is removed so a subsequent retry can fire a fresh request. Attaching it with .finally() rather than an inner try/finally means the cleanup always runs after activeRequests.set(), even if api.downloadFile throws synchronously.
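To see the promise sharing in isolation, here is a stripped-down, dependency-free sketch. The names dedupe, inFlight, and fakeDownload are hypothetical, and the stub loader merely stands in for the real download + decrypt pipeline.

```javascript
// Hypothetical, dependency-free version of the in-flight registry.
const inFlight = new Map();

function dedupe(key, loader) {
  if (inFlight.has(key)) return inFlight.get(key);
  // Promise.resolve().then(...) turns synchronous throws into rejections,
  // and .finally() clears the entry whether the load succeeds or fails.
  const promise = Promise.resolve()
    .then(() => loader(key))
    .finally(() => inFlight.delete(key));
  inFlight.set(key, promise);
  return promise;
}

// Stub standing in for download + decrypt; counts how often it really runs.
let downloads = 0;
const fakeDownload = async (key) => {
  downloads += 1;
  await new Promise((resolve) => setTimeout(resolve, 10));
  return `decrypted:${key}`;
};

async function demo() {
  // Two "components" ask for the same file in the same tick.
  const [a, b] = await Promise.all([
    dedupe('file-1', fakeDownload),
    dedupe('file-1', fakeDownload),
  ]);
  return { a, b, downloads, pending: inFlight.size };
}
```

Calling demo() should show both callers receiving the same result while the stub loader runs only once, and the registry is empty again afterwards.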

Socket Message Deduplication (onetooneSocket.js)
We keep a Set of recently processed message IDs. Any incoming packet with an ID already in the Set is silently discarded before it ever touches Redux or React.

const processedIds = new Set();
const MAX_PROCESSED_IDS = 1000;

socket.on('message', (msg) => {
  if (processedIds.has(msg.id)) return;

  processedIds.add(msg.id);

  // Prune the set to keep memory flat
  if (processedIds.size > MAX_PROCESSED_IDS) {
    const it = processedIds.values();
    for (let i = 0; i < Math.floor(MAX_PROCESSED_IDS / 2); i++) {
      processedIds.delete(it.next().value);
    }
  }

  // Dispatch to store, update UI...
});

Capping at 1,000 IDs and occasionally pruning half the Set keeps the memory footprint negligible even during multi-hour sessions.

3. The Ideal Way

Request deduplication is exactly what libraries like React Query (TanStack Query) and SWR provide out of the box — with automatic garbage collection, stale-while-revalidate, and retry logic. We didn’t use them because we needed full control over the decryption pipeline at the time, but they’re the standard today.

For socket deduplication, the robust approach is to store every message in a local database (IndexedDB) keyed by message_id, either as the object store’s key path or through a unique index. The database then rejects duplicates itself and becomes the ultimate deduplication authority, so you don’t need an in-memory set at all. Combined with server-side sequence numbers, this is about as robust as client-side deduplication gets.

4. Honest Rating: ⭐️ 9/10 (Excellent)

What’s great: The promise-sharing pattern is a sophisticated, low-overhead JavaScript idiom that eliminates duplicate network traffic without any external dependency. The bounded Set with batch pruning is clever and prevents the “death by a thousand small allocations” problem that plagues long-lived SPAs.

The uncomfortable truth: The socket deduplication relies entirely on msg.id being globally unique. If two truly different messages somehow collided in ID (extremely unlikely with UUIDv4, but possible), one would be silently dropped. Also, the hand-rolled promise registry lacks the sophisticated cache invalidation and revalidation strategies that React Query gives you for free — so our offline behaviour is still rudimentary.


[Figure: Centralised file store]


🎨 Optimization 3: Lazy Component Rendering & Strict Memoization

1. The Problem

React chat lists can become brutal rendering bottlenecks. Every keystroke in the message input updates parent state, which by default triggers a re-render of the entire message list, file panels, and sidebars.

With hundreds of messages containing images, PDFs, and interactive elements, this results in noticeable typing lag, dropped frames when new messages arrive, and serious battery drain on mobile devices. Worse, many components render in a “minimized” state (a collapsed sidebar, for instance) yet still fully initialize heavy sub-components like PDF.js workers — work the user can’t even see.

2. Our Solution

We applied a multi-layered strategy: strict prop-level memoization, deferred heavy rendering, and computational memoization.

Custom Memoization (FileList.js)
We wrapped the file component in React.memo with a custom equality comparator that checks only the props that actually affect its output:

const areEqual = (prevProps, nextProps) => {
  return (
    prevProps.fileName === nextProps.fileName &&
    prevProps.fileSize === nextProps.fileSize &&
    prevProps.isMinimized === nextProps.isMinimized &&
    prevProps.thumbnailSignature === nextProps.thumbnailSignature
  );
};

export default React.memo(FileList, areEqual);

This prevents React from even entering the component’s render function during typing or unrelated state changes.

Deferred Heavy Rendering (isMinimized)
Inside FileList, we check if the component is minimized. If so, we skip the entire download/decryption/preview pipeline and render only a lightweight SVG extension icon. The heavy lifting is deferred until the user actually expands the view — and by then, the cache from Optimization 1 usually has the data ready.

Computational Memoization
In list modals, expensive search and sort operations are wrapped in useMemo, with their dependencies limited to the inputs that actually matter: the file list, the search query, and the sort order. This ensures we don’t re-sort hundreds of files because a completely unrelated keystroke triggered a re-render.
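Outside React, the contract behind useMemo can be sketched with a single-slot memoizer: recompute only when one of the inputs changes. The helper memoizeLast, the visibleFiles function, and the sample data are all hypothetical. Note that the file list itself is treated as an input here, so new files still show up.

```javascript
// Hypothetical single-slot memoizer: recompute only when an input changes,
// which mirrors what useMemo does with its dependency array.
function memoizeLast(compute) {
  let lastArgs = null;
  let lastResult;
  return (...args) => {
    const same =
      lastArgs !== null &&
      lastArgs.length === args.length &&
      args.every((arg, i) => Object.is(arg, lastArgs[i]));
    if (!same) {
      lastResult = compute(...args);
      lastArgs = args;
    }
    return lastResult;
  };
}

let sortRuns = 0; // instrumentation for the demo only

const visibleFiles = memoizeLast((files, query, sortOrder) => {
  sortRuns += 1;
  const needle = query.trim().toLowerCase();
  const direction = sortOrder === 'desc' ? -1 : 1;
  return files
    .filter((file) => file.fileName.toLowerCase().includes(needle)) // new array, input untouched
    .sort((a, b) => direction * a.fileName.localeCompare(b.fileName));
});

const files = [
  { fileName: 'report.pdf' },
  { fileName: 'avatar.png' },
  { fileName: 'Receipt.pdf' },
];

const first = visibleFiles(files, 'pdf', 'asc');  // computes
const second = visibleFiles(files, 'pdf', 'asc'); // unrelated re-render: cached
const third = visibleFiles(files, 're', 'desc');  // inputs changed: recomputes
```

In a component, the equivalent would be useMemo(() => ..., [files, query, sortOrder]); the sketch just makes the "same inputs, same array reference" behaviour visible.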

3. The Ideal Way

For moderate list sizes, memoization works well. But for truly massive chat histories (thousands of messages), the real gold standard is list virtualization using libraries like react-virtuoso or react-window. Virtualization only renders the 10–20 items currently visible in the viewport, recycling DOM nodes as the user scrolls. This keeps the React tree tiny regardless of list length.
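The core arithmetic behind virtualization is small. The sketch below assumes fixed-height rows, which real chat bubbles rarely have (libraries such as react-virtuoso measure variable heights for you); the function name and parameters are hypothetical.

```javascript
// Hypothetical helper: which rows should be mounted, assuming a fixed row height?
function visibleRange({ scrollTop, viewportHeight, rowHeight, totalRows, overscan = 3 }) {
  if (totalRows === 0) return { start: 0, end: 0 };
  const firstVisible = Math.floor(scrollTop / rowHeight);
  const lastVisible = Math.ceil((scrollTop + viewportHeight) / rowHeight) - 1;
  // Overscan mounts a few extra rows on each side so fast scrolls do not flash blanks.
  const start = Math.max(0, firstVisible - overscan);
  const end = Math.min(totalRows, lastVisible + 1 + overscan); // exclusive
  return { start, end };
}

const range = visibleRange({
  scrollTop: 12000,
  viewportHeight: 600,
  rowHeight: 60,
  totalRows: 2000,
});

// Only this many rows are rendered, however long the history is.
const mounted = range.end - range.start;
```

With these numbers, a 2,000-message history mounts rows 197 to 212 only, i.e. 16 DOM rows instead of 2,000.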

4. Honest Rating: ⭐️ 7/10 (Solid but Brittle)

What’s great: The isMinimized pattern is a huge win — it’s a classic “don’t do work you don’t need” optimization that pays immediate dividends. The custom areEqual comparator is surgical and demonstrably reduces render counts.

The uncomfortable truth: Custom areEqual functions are a maintenance trap. If a new developer adds a prop that affects rendering but forgets to update the comparator, you get subtle staleness bugs that are hard to debug. A safer pattern is to pass only primitive props (so shallow comparison works) and push expensive derivations into selectors. We also still haven’t virtualized the main chat list. For sessions with 2,000+ messages, our DOM node count is objectively too high, and we’re leaning on browser layout optimizations to bail us out. That’s not ideal.


⚡ Optimization 4: Throttled Socket Synchronization & Multi-Device Broadcasting

1. The Problem

When a user reconnects after a dropout (elevator, tunnel, switching networks), the client needs to catch up on missed messages. The naive approach requests all messages since the last known one, and the server pumps them down the socket as fast as possible.

This is a disaster: the Node.js event loop gets blocked serializing and sending hundreds of messages in a tight loop, the TCP socket buffer floods, the database gets slammed with large catch-up queries, and other users on the same server instance experience latency spikes. If the user has multiple tabs or devices open, each one might create its own socket and sync independently, multiplying the chaos.

2. Our Solution

We built a throttled, multi-socket-aware sync architecture.

Multi-Socket Mapping (onetoone.js)
On the server, each user is mapped to a Set of socket IDs, not a single one. When a message needs to be delivered, we emit to all sockets for that user, skipping the originating socket to prevent echo. This gives us seamless multi-tab support without multiple independent sessions.
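A minimal sketch of that mapping might look like the following. The class name UserSockets and its methods are hypothetical, and the io argument is assumed to behave like a Socket.IO server, where io.to(socketId).emit(...) reaches a single socket because each socket joins a room named after its own ID.

```javascript
// Hypothetical registry: userId -> Set of socket IDs. `io` is assumed to
// expose Socket.IO's `io.to(socketId).emit(event, payload)`.
class UserSockets {
  constructor() {
    this.byUser = new Map();
  }

  add(userId, socketId) {
    if (!this.byUser.has(userId)) this.byUser.set(userId, new Set());
    this.byUser.get(userId).add(socketId);
  }

  remove(userId, socketId) {
    const sockets = this.byUser.get(userId);
    if (!sockets) return;
    sockets.delete(socketId);
    if (sockets.size === 0) this.byUser.delete(userId); // avoid leaking empty sets
  }

  // Deliver to every tab/device of the user except the one that sent the event.
  emitToUser(io, userId, event, payload, originSocketId = null) {
    const sockets = this.byUser.get(userId) ?? [];
    for (const socketId of sockets) {
      if (socketId === originSocketId) continue;
      io.to(socketId).emit(event, payload);
    }
  }
}
```

add() would be called from the connection handler and remove() from the disconnect handler, so a user with three open tabs simply has three IDs in their Set.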

Timestamp-Based Catchup
The client stores the server-generated timestamp of the last successfully processed message. On reconnection, it sends this timestamp in the handshake. The server then queries only for messages with created_at > lastSyncTimestamp, not the entire history.

Non-Blocking Throttled Sync (group_chat.js)
The real key: we run the catch-up in an async function that sends messages in batches, explicitly yielding the event loop after every 10 messages:

const syncMissedMessages = async (socket, messages) => {
  for (let i = 0; i < messages.length; i++) {
    socket.emit('message', messages[i]);

    if ((i + 1) % 10 === 0) {
      await new Promise(resolve => setTimeout(resolve, 5));
    }
  }
};

That 5ms pause is tiny, but it’s enough for Node.js to handle other incoming events — other users’ chat messages, typing indicators, presence updates. This simple cooperative multitasking keeps the entire server responsive.

3. The Ideal Way

Our approach is remarkably close to production best practices. Two refinements big-tech stacks often add:

  • Sequence-based syncing: Instead of timestamps (which can suffer from clock skew), each chat channel maintains a monotonically increasing sequence number. The client remembers the last sequence it processed, making catch-up deterministic.
  • Redis pub/sub adapter: In a multi-server deployment, a Redis backplane ensures messages emitted on one Node instance reach sockets on another, with cross-instance deduplication. We run a single server instance, so this hasn’t been necessary yet.
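Sequence-based syncing, which we have not built, could be sketched like this. ChannelLog and applyMessage are hypothetical names, and the in-memory array only stands in for a database table where the per-channel counter would really live.

```javascript
// Hypothetical in-memory channel log; a real server would keep the counter
// and messages in the database so they survive restarts.
class ChannelLog {
  constructor() {
    this.nextSeq = 1;
    this.messages = [];
  }

  append(body) {
    const message = { seq: this.nextSeq++, body };
    this.messages.push(message);
    return message;
  }

  // Everything the client has not seen yet, in order.
  since(lastSeq) {
    return this.messages.filter((message) => message.seq > lastSeq);
  }
}

// Client side: apply only the next expected sequence number.
function applyMessage(state, message) {
  if (message.seq <= state.lastSeq) return 'duplicate'; // already processed
  if (message.seq > state.lastSeq + 1) return 'gap'; // something was missed: resync
  state.lastSeq = message.seq;
  state.bodies.push(message.body);
  return 'applied';
}
```

Because the client can tell "duplicate" from "gap" without comparing clocks, catch-up no longer depends on server time being monotonic.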

4. Honest Rating: ⭐️ 8.5/10 (Very Solid Production Engineering)

What’s great: The setTimeout(resolve, 5) trick is a pragmatic, battle-tested way to prevent event-loop starvation without complex worker-thread architectures. The multi-socket mapping elegantly solves the multi-tab problem, and timestamp-based syncing is simple and effective when server clocks are kept in sync (e.g., via NTP).

The uncomfortable truth: Yielding every 10 messages with a fixed 5ms delay is a heuristic. Under heavy load, those pauses add up (1,000 missed messages = 100 pauses = at least ~500ms of total delay). A more adaptive approach would be to yield with setImmediate() between batches, which resumes as soon as pending I/O callbacks have run instead of waiting out a timer. process.nextTick() is not a substitute: its callbacks run before the event loop continues, so it does not let other I/O through. Also, the batch size (10) and pause duration (5ms) were chosen by feel, not rigorous load testing. In a true enterprise environment, these numbers would be backed by profiling. The lack of sequence-based IDs means we’re one server-clock jump away from missing or duplicating messages during catchup, a risk we’ve mitigated with NTP but haven’t eliminated.
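A setImmediate()-based variant of the sync loop might look like the sketch below. It is an untested-in-production illustration: the yieldToEventLoop helper and the batchSize parameter are assumptions, and the right batch size would still need profiling.

```javascript
// Resolves on the next event-loop iteration, after pending I/O callbacks
// have had a chance to run (Node.js only).
const yieldToEventLoop = () => new Promise((resolve) => setImmediate(resolve));

// Hypothetical variant of syncMissedMessages; batchSize is an assumption to tune.
async function syncMissedMessages(socket, messages, batchSize = 10) {
  for (let i = 0; i < messages.length; i++) {
    socket.emit('message', messages[i]);
    const batchDone = (i + 1) % batchSize === 0;
    const moreToSend = i + 1 < messages.length;
    if (batchDone && moreToSend) {
      await yieldToEventLoop(); // no fixed delay: resume as soon as the loop is free
    }
  }
}
```

Compared with the 5ms timer, an idle server finishes the catch-up almost immediately, while a busy one still gets to process other users’ events between batches.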


[Figure: Missing messages]


📝 Key Takeaways for Your Development Cycle

To summarize, here are the core rules you can apply to your own development cycle:

| Optimization Area | The Core Rule | Dev Cycle Benefit |
| --- | --- | --- |
| Media Assets | Never download or decrypt twice. Build local caching with size + count limits. | Reduces API costs and client CPU load dramatically. |
| Network Requests | Deduplicate in-flight requests at the promise level. | Eliminates redundant database/network traffic and stops request storms. |
| React UI Rendering | Memoize aggressively and skip rendering for minimized/offscreen elements. | Keeps input responsive and frame rate stable. |
| WebSockets | Always sync from an offset and yield the event loop during bulk sends. | Prevents server stalls and UI lock-ups during reconnection. |

The uncomfortable meta-lesson: Most of these optimizations aren’t clever algorithms — they’re just disciplined resource management. We didn’t invent new computer science; we simply refused to waste CPU, network, or memory. The real challenge is cultural: prioritizing this work before it becomes a user complaint. By designing these layers early, you prevent performance debt that becomes brutally expensive to refactor later.
