Our production servers were dying. Not crashing—just slowly, inexorably running out of memory until they became unresponsive and had to be restarted. Every eight hours like clockwork.
The monitoring dashboards showed memory climbing steadily from the moment a server started. No spikes, no sudden jumps, just a relentless upward trend that ended the same way every time: a restart, a brief clean slate, then the same slow death march all over again.
I spent three days with every profiler I could find. Chrome DevTools, heap dumps, memory snapshots, allocation timelines—the full arsenal of debugging tools that are supposed to catch this exact problem. They all showed the same thing: nothing unusual. No obvious leaks, no retained objects, no smoking gun.
The leak was there. The servers proved it every eight hours. But the tools couldn't see it.
When Your Tools Lie to You
Memory profilers work by taking snapshots of your application's heap and showing you what's being retained. They're built on a fundamental assumption: memory leaks are objects that should have been garbage collected but weren't.
This assumption is usually correct. Most memory leaks are caused by forgotten event listeners, circular references, or closures holding onto contexts longer than intended. Profilers are great at catching these.
Our leak wasn't any of those things.
I took heap snapshots every hour. Compared them. Analyzed object retention. Looked for growing arrays or cached data structures. Everything looked normal. Objects were being created and destroyed as expected. The garbage collector was running. There were no obvious references keeping things alive.
Yet memory kept climbing.
The problem with profilers is they show you what's in memory, not what's consuming memory. They can tell you about JavaScript objects on the heap, but they can't always tell you about the memory outside that heap—the memory consumed by native code, WebAssembly, or the browser's internal data structures.
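You can see that boundary directly from inside Node. Here's a minimal sketch using `process.memoryUsage()`: if the resident set size keeps climbing while the heap stays flat, the growth is happening somewhere your profiler can't look.

```js
// Minimal sketch: compare what the JS heap reports against what the OS
// actually charges the process. process.memoryUsage() is built into Node.
const { rss, heapUsed, external, arrayBuffers } = process.memoryUsage();
const mb = (bytes) => (bytes / 1024 / 1024).toFixed(1);

console.log(`JS heap used:       ${mb(heapUsed)} MB`);
console.log(`Resident set (rss): ${mb(rss)} MB`);
console.log(`External (native):  ${mb(external)} MB`);
console.log(`ArrayBuffers:       ${mb(arrayBuffers)} MB`);

// A heap profiler only sees the first number. When rss grows and heapUsed
// doesn't, the leak lives outside the heap.
```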
Our leak was invisible to JavaScript profilers because it wasn't a JavaScript problem.
The Clue in the Pattern
After three days of failed profiling, I stopped looking at what the tools were showing me and started looking at what the servers were actually doing.
Memory climbed linearly. Not exponentially, not in steps, but at a perfectly consistent rate. Roughly 45MB per hour, every hour, regardless of traffic levels.
This was strange. Most memory leaks correlate with usage—more requests mean more leaked objects. Our leak didn't care about usage. It happened whether the server was handling ten requests per minute or a thousand.
Something was running on a timer, allocating memory at a constant rate, and never releasing it.
I started grepping through our codebase for setInterval. Found a few instances—analytics heartbeats, health checks, cache cleanup jobs. Nothing that should leak. All of them properly cleared their intervals on shutdown.
Then I found it. Not in our code—in a third-party analytics library we'd integrated six months ago.
The library was spawning Web Workers to handle event processing in the background. Every minute, it created a new worker, processed queued events, and then... didn't terminate the worker. It just let it sit there, idle, consuming memory.
The library assumed you were running in a browser where page refreshes would clean up workers. It never considered that in a Node.js environment, those workers would accumulate forever.
We had 480 orphaned workers after eight hours. Each one holding onto its own memory space. None of them visible to JavaScript heap profilers because Web Workers maintain separate memory contexts.
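The library isn't named here, so the snippet below is a hypothetical reconstruction of the pattern using Node's `worker_threads` module. The script path and the event queue are stand-ins, not the library's actual code:

```js
// Hypothetical reconstruction of the leak. './process-events.js' is a
// placeholder for the library's worker script.
const { Worker } = require('node:worker_threads');

const queue = [];
const drainQueue = () => queue.splice(0, queue.length);

// The leaky pattern: a fresh worker every minute, never terminated.
setInterval(() => {
  const worker = new Worker('./process-events.js');
  worker.postMessage({ events: drainQueue() });
  // No worker.terminate(), no reuse. In a browser, a page refresh would
  // reclaim this; in a long-running Node process, it idles forever.
}, 60_000);

// A non-leaking variant: release the worker once its batch is done.
setInterval(() => {
  const worker = new Worker('./process-events.js');
  worker.postMessage({ events: drainQueue() });
  worker.once('message', () => worker.terminate());
  worker.once('error', () => worker.terminate());
}, 60_000);
```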
What Profilers Can't Show You
This experience taught me something uncomfortable: the tools you rely on have blind spots, and those blind spots are where the hardest bugs hide.
Profilers are designed to catch the common cases. Forgotten closures, event listeners that weren't removed, data structures that keep growing. They're excellent at finding problems in the code patterns they were designed to detect.
They're terrible at finding everything else.
Memory outside the JavaScript heap. Native modules, WebAssembly, GPU memory, worker threads—all of this consumes memory that JavaScript profilers can't see. If your leak is in native code or in a separate execution context, heap snapshots won't help.
Structural leaks versus object leaks. Profilers look for objects that shouldn't exist. They don't look for architectural patterns that cause memory growth. A perfectly valid cache that grows without bounds isn't a leak in the traditional sense—every object in it is intentional—but it has the same effect.
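Here's an illustrative sketch of that kind of structural leak (not our code): every entry is a live, intentional object, so a heap snapshot flags nothing, yet the map grows with every unique key unless you bound it.

```js
// An "intentional" cache that still behaves like a leak: every entry is a
// valid, referenced object, so a heap profiler reports nothing suspicious.
const userCache = new Map();

function getUser(id, fetchUser) {
  if (!userCache.has(id)) {
    userCache.set(id, fetchUser(id)); // never evicted: grows with every unique id
  }
  return userCache.get(id);
}

// The fix is architectural, not object-level: enforce a bound.
const MAX_ENTRIES = 10_000;
const boundedCache = new Map();

function getUserBounded(id, fetchUser) {
  if (!boundedCache.has(id)) {
    if (boundedCache.size >= MAX_ENTRIES) {
      // Maps iterate in insertion order, so this evicts the oldest entry (FIFO).
      boundedCache.delete(boundedCache.keys().next().value);
    }
    boundedCache.set(id, fetchUser(id));
  }
  return boundedCache.get(id);
}
```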
External resource consumption. File handles, database connections, sockets—these consume system resources that show up as memory pressure but don't appear as objects in your heap. You can leak connections without leaking JavaScript objects.
Time-based patterns. Profilers show you snapshots of state. They're not great at revealing patterns that only emerge over hours or days. A leak that allocates 1KB every minute looks identical to normal memory churn in a snapshot.
The Debugging Mindset That Actually Works
After finding the Web Worker leak, I realized I'd been debugging with the wrong mental model. I was looking for objects that shouldn't exist. I should have been looking for patterns that shouldn't repeat.
Start with behavior, not tools. Before opening a profiler, understand what the memory growth looks like. Is it linear or exponential? Does it correlate with traffic? Does it happen during specific operations? The pattern tells you what kind of leak you're hunting.
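A few lines of sampling get you that picture before any profiler is involved. The sketch below logs resident memory and a rough growth rate on a timer; the five-minute cadence is arbitrary.

```js
// Sample process memory on a timer so the growth pattern becomes visible:
// linear, stepped, or correlated with traffic.
const samples = [];

setInterval(() => {
  const { rss, heapUsed } = process.memoryUsage();
  samples.push({ at: Date.now(), rss });

  const first = samples[0];
  const last = samples[samples.length - 1];
  const hours = (last.at - first.at) / 3_600_000 || 1; // avoid dividing by zero
  const mbPerHour = (last.rss - first.rss) / (1024 * 1024) / hours;

  console.log(
    `rss=${(rss / (1024 * 1024)).toFixed(1)}MB ` +
    `heap=${(heapUsed / (1024 * 1024)).toFixed(1)}MB ` +
    `trend~${mbPerHour.toFixed(1)}MB/h`
  );
}, 5 * 60 * 1000);
```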
Question your assumptions about what memory means. JavaScript heap isn't the only memory that matters. System memory, GPU memory, worker memory—all of it counts. If profilers show a clean heap but system memory is climbing, the leak is somewhere else.
Look for what's created but never destroyed. Not just objects—anything. Timers, workers, connections, file handles, event listeners, cache entries. If something is created on a schedule or in response to events, trace its entire lifecycle. Where is it cleaned up? Are you sure?
Use process-level monitoring, not just application-level profiling. Tools like htop, ps, or platform-specific process monitors show you total memory consumption. When that doesn't match what your JavaScript profiler reports, you've found the boundary of your leak.
Isolate by elimination. Comment out code until the leak stops. It's crude but effective. Start with recent changes, external dependencies, background jobs—anything that runs independently of request handling.
Tools That Fill the Gaps
Once I understood that profilers had blind spots, I started building a toolkit for the problems they couldn't catch.
System-level monitoring showed the truth profilers missed. While heap snapshots claimed everything was fine, top showed memory climbing. That gap—between what JavaScript reported and what the system consumed—was where the leak lived.
Process comparison helped isolate the problem. I spun up a clean server and a leaking server side by side. Compared their resource usage. The leaking server had hundreds more threads. That led me to the Web Workers.
Structured logging revealed patterns over time. I added logs around worker creation and destruction. Watched the logs accumulate. Workers created: 480. Workers destroyed: 0. The pattern was obvious once I was looking for it.
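In spirit, it was nothing more sophisticated than this (a simplified sketch; the real code wraps the library's worker creation, which isn't shown here):

```js
// Lifecycle counters around resource creation and destruction. If "created"
// keeps outpacing "destroyed", you have a leak regardless of what the heap says.
const { Worker } = require('node:worker_threads');

let created = 0;
let destroyed = 0;

function spawnTrackedWorker(script) {
  const worker = new Worker(script);
  created += 1;
  console.log(`[workers] created=${created} destroyed=${destroyed}`);

  worker.once('exit', () => {
    destroyed += 1;
    console.log(`[workers] created=${created} destroyed=${destroyed}`);
  });

  return worker;
}
```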
AI-assisted code review caught what I missed. After finding the leak, I used Claude Sonnet 4.5 to review our integration code for similar patterns. It identified three other places where we were creating resources without explicit cleanup. Not leaks yet, but vulnerabilities waiting to happen.
Cross-model verification reduced blind spots. When debugging complex issues, I'll often analyze the same problem from different angles using different AI models. Gemini 2.5 Pro caught edge cases in our cleanup logic that other models missed. Each one has different strengths in code analysis.
The Lessons That Stuck
Memory leaks aren't always about forgotten objects. Sometimes they're about forgotten patterns—things that should stop but don't, resources that should be limited but aren't, cleanup that should happen but doesn't.
Your tools have opinions about what problems look like. Profilers assume leaks are retained objects. System monitors assume memory usage should correlate with work done. When your bug doesn't match these assumptions, the tools become less useful than basic observation.
The best debugging happens when you stop trusting your tools and start trusting the evidence. Servers were dying every eight hours. That was real. Profilers showed nothing. That was also real. The conflict between these truths was the clue.
Third-party code is where the weird bugs live. We assumed the analytics library worked correctly because it's widely used. It does work correctly—in browsers. We never questioned whether our environment matched its assumptions.
Good logging beats good profiling when the problem is architectural. Profilers show you state. Logs show you behavior over time. For leaks that emerge slowly, behavior is more informative than state.
What You Should Actually Do
Stop assuming your profiler will catch every memory leak. It won't. Build defense in depth:
Monitor system memory, not just heap memory. If they diverge, investigate why. The gap between them is where invisible leaks live.
Add lifecycle logging to anything that allocates resources. Workers, connections, timers, file handles—log when they're created and when they're destroyed. If creation logs outnumber destruction logs, you've found a leak.
Review third-party dependencies for environment assumptions. Libraries written for browsers might not behave correctly in Node. Libraries written for short-lived processes might leak in long-running ones. When using tools that generate or analyze code, verify they're designed for your execution environment.
Build resource cleanup into your shutdown procedures. When a server terminates, log what resources were still open. Open connections, pending timers, active workers—these are leak candidates.
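One way to do that without relying on undocumented internals is to keep your own registry. A sketch, with illustrative resource kinds and signal handling:

```js
// A shutdown audit: register long-lived resources as they're opened and log
// anything still open when the process is asked to stop.
const openResources = new Set();

function track(kind) {
  const entry = { kind, openedAt: new Date() };
  openResources.add(entry);
  return () => openResources.delete(entry); // call this on cleanup
}

process.on('SIGTERM', () => {
  if (openResources.size > 0) {
    console.warn(`[shutdown] ${openResources.size} resources still open:`);
    for (const { kind, openedAt } of openResources) {
      console.warn(`  - ${kind}, opened ${openedAt.toISOString()}`);
    }
  }
  process.exit(0);
});
```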
Test for memory growth in staging with realistic durations. Don't just load test—time test. Run your application for hours or days in a staging environment that mirrors production. Watch memory over time, not just under load.
The Uncomfortable Truth
The hardest bugs aren't caught by the best tools. They're caught by developers who understand that tools have limits and know how to debug beyond those limits.
I spent three days with profilers finding nothing because I trusted them to show me the truth. I found the leak in thirty minutes once I stopped trusting them and started observing the system's actual behavior.
Your profiler is a lens, not the truth. It shows you what it's designed to see. Everything outside that design—Web Workers, native modules, architectural patterns, time-based behaviors—is invisible until you look for it with different tools or, more often, with careful observation and systematic elimination.
The next time you're debugging a memory leak that profilers can't catch, remember: the tools are looking for what they expect to find. Your job is to look for what shouldn't be there, even if the tools can't see it.
Memory leaks don't care what your profiler thinks. They only care what the operating system knows. Start there.
Debugging complex systems? Use Crompt AI to review code patterns across multiple AI models and catch the architectural issues that single-perspective analysis misses.