Abhinav

Debugging Node.js Out-of-Memory Crashes: A Practical, Step-by-Step Story

How we tracked down a subtle memory leak that kept taking our production servers down—and how we fixed it for good.


The OOM That Ruined a Monday Morning

Everything looked normal—until alerts started firing. One by one, our Node.js API instances were crashing with a familiar but dreaded message:

FATAL ERROR: Reached heap limit Allocation failed - JavaScript heap out of memory

The pattern was frustratingly consistent:

  • Servers ran fine for hours
  • Traffic increased
  • Memory climbed steadily
  • Then 💥—a crash

If you’ve ever dealt with Node.js in production, you already know what this smells like: a memory leak.

In this post, I’ll walk through exactly how we diagnosed the problem, what signals mattered most, and the simple fix that stabilized memory under heavy load.


Reading the GC Tea Leaves

Before touching any code, we looked closely at the garbage collector output from V8:

Mark-Compact (reduce) 646.7 (648.5) -> 646.6 (648.2) MB

At first glance, it looks harmless. But the key insight was this:

GC freed almost nothing.

From ~646.7 MB of used heap before the collection to ~646.6 MB after (the numbers in parentheses are the committed heap size). That’s essentially zero.

What that tells us

  • GC is running frequently and expensively
  • Objects are still strongly referenced
  • Memory is not eligible for collection

In short: this is not “GC being slow”—this is leaked or over-allocated memory.
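
As an aside, one way to double-check this interpretation is to force a full GC and see how much heap actually comes back. This is a minimal sketch, and it requires starting Node with the --expose-gc flag:

// Sanity check: run with `node --expose-gc server.js`.
// If heapUsed barely drops after a forced full GC, the memory is still strongly referenced.
const mbUsed = () => Math.round(process.memoryUsage().heapUsed / 1024 / 1024);

const before = mbUsed();
if (typeof global.gc === 'function') {
  global.gc(); // full GC, only available with --expose-gc
  console.log(`[GC check] before: ${before} MB, after: ${mbUsed()} MB`);
} else {
  console.log('Start Node with --expose-gc to use this check');
}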


Preparing the Battlefield

1. Confirm the Heap Limit

First, we verified how much memory Node.js was actually allowed to use:

const v8 = require('v8');

console.log(
  'Heap limit:',
  Math.round(v8.getHeapStatistics().heap_size_limit / 1024 / 1024),
  'MB'
);

This removes guesswork—especially important in containers or cloud runtimes.
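
The same check works as a one-liner against any Node binary or container image, without touching application code:

node -e "const v8 = require('v8'); console.log(Math.round(v8.getHeapStatistics().heap_size_limit / 1024 / 1024), 'MB')"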


2. Turn on GC Tracing

Next, we watched GC behavior in real time:

node --trace-gc server.js

This shows:

  • Scavenge → minor GC (young objects)
  • Mark-Sweep / Mark-Compact → major GC (old generation)

Frequent major GCs with poor cleanup are a huge red flag.
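
If you'd rather consume GC data programmatically than parse log lines, Node's perf_hooks module can also surface GC pauses. A minimal sketch (the shape of entry.detail varies across Node versions, so treat it as informational):

const { PerformanceObserver } = require('perf_hooks');

// Logs every GC pause with its duration; useful for spotting frequent, expensive major GCs.
const obs = new PerformanceObserver((list) => {
  for (const entry of list.getEntries()) {
    console.log(`[GC] ${entry.duration.toFixed(1)} ms`, entry.detail || '');
  }
});

obs.observe({ entryTypes: ['gc'] });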


3. Shrink the Heap (On Purpose)

Instead of waiting hours for production crashes, we forced the issue locally:

node --max-old-space-size=128 server.js

A smaller heap means memory problems surface fast—often in minutes.
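
These flags can be combined, so the constrained heap and the GC trace come from the same run:

node --max-old-space-size=128 --trace-gc server.js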


4. Reproduce with Load

We wrote a simple concurrent load script to mimic real traffic. Under load, memory climbed steadily and never came back down.

At this point, we had a reliable reproduction. Time to hunt the leak.


Template: Load Test Script for Memory Testing

To consistently reproduce the issue locally (instead of waiting for real traffic), we used the following load test template.

This script is intentionally minimal:

  • No external dependencies
  • Configurable concurrency
  • Responses are fully consumed (important for memory accuracy)
  • Designed for GC and heap behavior, not benchmarking

Usage

node load-test.js [concurrent] [endpoint]

# Example
node load-test.js 100 data

Load Test Template Code


/**
 * Load Test Script for Memory Testing
 * Usage: node load-test.js [concurrent] [endpoint]
 * Example: node load-test.js 100 data
 */

const http = require('http');

const CONFIG = {
  hostname: 'localhost',
  port: 3000,
  endpoints: {
    data: {
      path: '/api/data',
      method: 'POST',
      headers: {
        'Content-Type': 'application/json'
      },
      body: JSON.stringify({
        items: ['sample_item'],
        userContext: {
          userId: 'test-user',
          sessionId: 'test-session'
        }
      })
    }
  }
};

const CONCURRENT = parseInt(process.argv[2]) || 50;
const ENDPOINT = process.argv[3] || 'data';

const endpointConfig = CONFIG.endpoints[ENDPOINT];
if (!endpointConfig) {
  console.error(
    `Unknown endpoint: ${ENDPOINT}. Available: ${Object.keys(CONFIG.endpoints).join(', ')}`
  );
  process.exit(1);
}

const makeRequest = (requestId) => {
  return new Promise((resolve) => {
    const startTime = Date.now();

    const options = {
      hostname: CONFIG.hostname,
      port: CONFIG.port,
      path: endpointConfig.path,
      method: endpointConfig.method,
      headers: endpointConfig.headers
    };

    const req = http.request(options, (res) => {
      // Consume response to avoid socket & memory leaks
      res.resume();

      res.on('end', () => {
        resolve({
          requestId,
          status: res.statusCode,
          duration: Date.now() - startTime,
          success: res.statusCode >= 200 && res.statusCode < 300
        });
      });
    });

    req.on('error', (err) => {
      resolve({
        requestId,
        success: false,
        duration: Date.now() - startTime,
        error: err.message
      });
    });

    req.setTimeout(30000, () => {
      req.destroy();
      resolve({
        requestId,
        success: false,
        duration: Date.now() - startTime,
        error: 'Timeout'
      });
    });

    if (endpointConfig.body) {
      req.write(endpointConfig.body);
    }

    req.end();
  });
};

const runLoadTest = async () => {
  console.log('='.repeat(60));
  console.log('MEMORY LOAD TEST');
  console.log('='.repeat(60));
  console.log(`Endpoint: ${endpointConfig.method} ${endpointConfig.path}`);
  console.log(`Concurrent Requests: ${CONCURRENT}`);
  console.log(`Target: ${CONFIG.hostname}:${CONFIG.port}`);
  console.log('='.repeat(60));
  console.log('\nStarting load test...\n');

  const startTime = Date.now();

  const promises = Array.from(
    { length: CONCURRENT },
    (_, i) => makeRequest(i + 1)
  );

  const results = await Promise.all(promises);
  const totalTime = Date.now() - startTime;

  const successful = results.filter(r => r.success);
  const failed = results.filter(r => !r.success);
  const durations = successful.map(r => r.duration);

  const avgDuration = durations.length
    ? Math.round(durations.reduce((a, b) => a + b, 0) / durations.length)
    : 0;

  console.log('='.repeat(60));
  console.log('RESULTS');
  console.log('='.repeat(60));
  console.log(`Total Requests:    ${CONCURRENT}`);
  console.log(`Successful:        ${successful.length}`);
  console.log(`Failed:            ${failed.length}`);
  console.log(`Total Time:        ${totalTime}ms`);
  console.log(`Avg Response Time: ${avgDuration}ms`);
  console.log(`Min Response Time: ${durations.length ? Math.min(...durations) : 0}ms`);
  console.log(`Max Response Time: ${durations.length ? Math.max(...durations) : 0}ms`);
  console.log(`Requests/sec:      ${Math.round(CONCURRENT / (totalTime / 1000))}`);
  console.log('='.repeat(60));

  if (failed.length) {
    console.log('\nFailed requests:');
    failed.slice(0, 5).forEach(r => {
      console.log(`  Request #${r.requestId}: ${r.error}`);
    });
  }

  console.log('\n>>> Check server logs for [MEM] entries <<<\n');
};

runLoadTest().catch(console.error);

Following the Memory Trail

We added lightweight logging around suspicious paths:

const logMemory = (label) => {
  const { heapUsed } = process.memoryUsage();
  console.log(`[MEM] ${label}: ${Math.round(heapUsed / 1024 / 1024)} MB`);
};
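
Here is roughly how such logging can be wired around the suspect path. The route and processData handler below are placeholders for whatever code you're investigating (shown Express-style; adapt to your framework):

// Hypothetical instrumentation around the hot path.
app.post('/api/data', async (req, res) => {
  logMemory('processData START');
  const result = await processData(req.body); // suspect code path
  logMemory('processData END');
  res.json(result);
});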

Under load, the logs told a clear story:

[MEM] processData START: 85 MB
[MEM] processData START: 92 MB
[MEM] processData START: 99 MB
[MEM] processData START: 107 MB

Memory kept climbing—request after request.

Eventually, all roads led to one innocent-looking helper function.


The Real Culprit

const getItemAssets = (itemType) => {
  const assetConfig = {
    item_a: { thumbnail: '...', full: '...' },
    item_b: { thumbnail: '...', full: '...' },
    // many more entries
  };

  return assetConfig[itemType] || { thumbnail: '', full: '' };
};

Why this was disastrous

  • The config object was recreated on every call
  • The function ran multiple times per request
  • Under concurrency, tens of thousands of objects were created per second

Even though GC could collect them, allocation happened faster than cleanup—pushing objects into the old generation and eventually exhausting the heap.


The Fix: One Small Move, Huge Impact

const ASSET_CONFIG = Object.freeze({
  item_a: { thumbnail: '...', full: '...' },
  item_b: { thumbnail: '...', full: '...' },
});

const DEFAULT_ASSET = Object.freeze({ thumbnail: '', full: '' });

const getItemAssets = (itemType) =>
  ASSET_CONFIG[itemType] || DEFAULT_ASSET;

What changed?

  • Objects are created once, not per request
  • Zero new allocations in the hot path
  • Dramatically reduced GC pressure
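
A quick way to verify the new behavior is an identity check: the fixed helper returns the same frozen object on every call, where the old version allocated a fresh one:

// With the hoisted config, repeated calls return the exact same object reference.
console.log(getItemAssets('item_a') === getItemAssets('item_a')); // true
console.log(getItemAssets('unknown') === DEFAULT_ASSET);          // true (shared default, no allocation)

// The original version built a new object literal on every call,
// so the same comparison would have printed false.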

Proving the Fix Worked

We reran the exact same test:

Before

  • Heap climbed relentlessly
  • GC freed almost nothing
  • Process crashed at ~128 MB

After

  • Heap usage oscillated within a tight range
  • Minor GCs cleaned memory efficiently
  • No crashes—even under sustained load

Final Thought

Most Node.js OOM crashes aren’t caused by “huge data” or “bad GC.”
They’re caused by small, repeated allocations in the wrong place.

Once you learn to read GC logs and control allocation rate, memory bugs stop being mysterious—and start being fixable.

Top comments (1)

Abhinav

Adding to this:

  1. Moved the config to the DB; each entry now has a column with its asset
  2. Implemented a cache