Bill Tu

Posted on Apr 8

From Terminal to Flame Graph: Exporting CPU Profiles From Production

#node #javascript #eventloop

loop-detective gives you a text-based diagnostic report in your terminal. It tells you which functions are heavy, what patterns are blocking the event loop, and where the slow I/O is. For most debugging sessions, that's enough.

But sometimes you need more. You need to see the full call tree. You need to zoom into a 200ms window and understand every function that executed. You need a flame graph.

With v1.4.0, loop-detective can now export the raw V8 CPU profile to a .cpuprofile file — the same format Chrome DevTools uses. One flag:

loop-detective 12345 --save-profile ./profile.cpuprofile

This article explains what's in that file, how to use it, and why combining terminal diagnostics with visual profiling gives you the best of both worlds.

What's Inside a .cpuprofile File

When loop-detective profiles a Node.js process, it uses the V8 Profiler via the Chrome DevTools Protocol. The profiler works by statistical sampling: every ~100 microseconds, it records which JavaScript function is currently executing. Over thousands of samples, this builds a statistical picture of where CPU time is spent.

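The raw output is a JSON object. Three fields carry most of the information; startTime and endTime are the capture's start and end timestamps in microseconds: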

{
  "nodes": [
    {
      "id": 1,
      "callFrame": {
        "functionName": "processRequest",
        "url": "/app/server.js",
        "lineNumber": 41,
        "columnNumber": 2
      },
      "hitCount": 342,
      "children": [2, 3, 7]
    }
  ],
  "startTime": 1705312345000000,
  "endTime": 1705312355000000,
  "samples": [1, 1, 2, 3, 3, 3, 1, 7, ...],
  "timeDeltas": [120, 98, 105, 112, ...]
}

nodes: A call tree. Each node is a function with its file path, line number (0-based), the IDs of its child nodes, and a hitCount of how many samples landed directly in it. Together the nodes represent every unique call path the profiler observed.
samples: A time series of node IDs. Each entry is "which function was on top of the stack at this sample tick."
timeDeltas: The time gap (in microseconds) between consecutive samples.

This is the same format Chrome DevTools uses for JavaScript CPU profiles. The .cpuprofile extension is a convention; the file is just JSON.
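To make the relationship between samples and timeDeltas concrete, here is a minimal sketch that totals sampled time per function. The selfTimeByFunction name and the tiny sampleProfile object are made up for illustration. Charging each delta to the sample at the same index is a simplification; profile viewers may attribute intervals slightly differently.

```javascript
// Sum the sampled time per function from a .cpuprofile-shaped object.
// Simplification (an assumption of this sketch): each delta is charged to the
// sample recorded at the same index.
function selfTimeByFunction(profile) {
  const nodesById = new Map(profile.nodes.map((node) => [node.id, node]));
  const totals = new Map(); // "name url:line" -> microseconds

  profile.samples.forEach((nodeId, i) => {
    const node = nodesById.get(nodeId);
    if (!node) return; // skip samples that point at an unknown node
    const { functionName, url, lineNumber } = node.callFrame;
    // lineNumber is 0-based in the profile, so add 1 for display.
    const key = `${functionName || '(anonymous)'} ${url}:${lineNumber + 1}`;
    totals.set(key, (totals.get(key) || 0) + (profile.timeDeltas[i] || 0));
  });

  return [...totals.entries()]
    .map(([fn, micros]) => ({ fn, ms: micros / 1000 }))
    .sort((a, b) => b.ms - a.ms);
}

// Tiny hand-written profile, for illustration only.
const sampleProfile = {
  nodes: [
    {
      id: 1,
      callFrame: { functionName: 'processRequest', url: '/app/server.js', lineNumber: 41, columnNumber: 2 },
      hitCount: 1,
      children: [2],
    },
    {
      id: 2,
      callFrame: { functionName: 'parseBody', url: '/app/server.js', lineNumber: 60, columnNumber: 2 },
      hitCount: 3,
    },
  ],
  startTime: 0,
  endTime: 400,
  samples: [1, 2, 2, 2],
  timeDeltas: [100, 100, 100, 100],
};

const selfTimes = selfTimeByFunction(sampleProfile);
console.log(selfTimes);
```

For a real file, replace sampleProfile with the result of JSON.parse on the saved .cpuprofile contents.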

What loop-detective Already Tells You

The built-in analysis processes this raw data and gives you a focused report:

  Top CPU-Heavy Functions
────────────────────────────────────────────────────────────
   1. heavyComputation
      ██████████████░░░░░░ 6245ms (62.3%)
      /app/server.js:42:1
   2. JSON.parse
      ███░░░░░░░░░░░░░░░░░ 823ms (8.2%)
      (native):1:1

  Diagnosis
────────────────────────────────────────────────────────────
   HIGH  cpu-hog
         Function "heavyComputation" consumed 62.3% of CPU time
         → Consider breaking this into smaller async chunks

This is great for quick diagnosis. You know the top offenders, you know the patterns, you know where to look. But the analysis is lossy — it shows you the top 20 functions and the top 5 call stacks. The raw profile has thousands of nodes and tens of thousands of samples.

When You Need the Full Picture

There are cases where the summary isn't enough:

"The top function is only 15% of CPU, but the app is still slow." The problem might be spread across many functions. A flame graph shows the full distribution at a glance — wide bars are slow, narrow bars are fast.

"I need to see the exact call path." The built-in call stacks show the path to the top 5 heavy functions. But what if the problem is a function that's called from 20 different places? A flame graph shows every call path and how much time each one contributes.

"I need to share this with my team." A .cpuprofile file can be opened by anyone with Chrome. No special tools needed. Drop it in a Slack thread, attach it to a Jira ticket, include it in a post-mortem.

"I want to compare before and after." Save a profile before your fix, save another after. Open both in speedscope and compare side by side.

How to Use the Saved Profile

Chrome DevTools

The most accessible option — everyone has Chrome.

1. Open Chrome (or any Chromium browser)
2. Open DevTools (F12)
3. Go to the Performance tab
4. Click the ↑ upload button (or the "Load profile" option)
5. Select your .cpuprofile file

You get:

Flame chart: A timeline view showing function execution over time. The x-axis is time, the y-axis is call depth. The wider a bar, the longer that call ran.
Bottom-up view: Functions sorted by self time. Shows where CPU time was actually spent (not including children).
Call tree: Top-down view from the entry point. Shows how time flows through your call hierarchy.
Search: Ctrl+F to find specific functions by name.
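The difference between the bottom-up view (self time) and the call tree (total time) can be reproduced from the raw file. The sketch below is only an illustration: inclusiveTimes and treeProfile are hypothetical names, each delta is charged to the sample at the same index, and the recursive walk assumes the call tree is not extremely deep.

```javascript
// Compute self time and total (self + descendants) time for every node.
function inclusiveTimes(profile) {
  const nodesById = new Map(profile.nodes.map((n) => [n.id, n]));

  // Self time: microseconds sampled directly in each node.
  const selfMicros = new Map();
  profile.samples.forEach((id, i) => {
    selfMicros.set(id, (selfMicros.get(id) || 0) + (profile.timeDeltas[i] || 0));
  });

  // Total time: post-order walk, total = self + sum of the children's totals.
  const totalMicros = new Map();
  function visit(id) {
    const node = nodesById.get(id);
    if (!node) return 0;
    let total = selfMicros.get(id) || 0;
    for (const childId of node.children || []) total += visit(childId);
    totalMicros.set(id, total);
    return total;
  }

  // Roots are the nodes that no other node lists as a child.
  const childIds = new Set(profile.nodes.flatMap((n) => n.children || []));
  profile.nodes.filter((n) => !childIds.has(n.id)).forEach((n) => visit(n.id));

  return profile.nodes.map((n) => ({
    name: n.callFrame.functionName || '(anonymous)',
    selfMs: (selfMicros.get(n.id) || 0) / 1000,
    totalMs: (totalMicros.get(n.id) || 0) / 1000,
  }));
}

// Hand-written three-level call tree, for illustration only.
const treeProfile = {
  nodes: [
    { id: 1, callFrame: { functionName: 'main', url: '/app/server.js', lineNumber: 0 }, children: [2] },
    { id: 2, callFrame: { functionName: 'handler', url: '/app/server.js', lineNumber: 10 }, children: [3] },
    { id: 3, callFrame: { functionName: 'parse', url: '/app/server.js', lineNumber: 20 } },
  ],
  samples: [1, 2, 3, 3],
  timeDeltas: [100, 100, 100, 100],
};

const timeRows = inclusiveTimes(treeProfile);
console.table(timeRows);
```

Sorting the rows by selfMs approximates the bottom-up view; sorting by totalMs approximates the call tree's ordering.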

speedscope.app

A dedicated profile viewer that's faster and more interactive than Chrome DevTools for large profiles.

1. Visit speedscope.app
2. Drag and drop the .cpuprofile file
3. Choose your view:
- Time Order: flame chart showing execution over time
- Left Heavy: aggregated flame graph, merged by call stack
- Sandwich: shows callers and callees for any selected function

speedscope handles large profiles (100MB+) smoothly and supports keyboard navigation for zooming and panning.
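The "which callers lead to this function" question that a flame graph or the Sandwich view answers visually can also be approximated in a script. This is a rough sketch, not how any viewer is implemented: callPathsTo and pathProfile are hypothetical names, and the time shown is self time of the target function only (time spent in its callees is not included).

```javascript
// List every call path that ends in `targetName`, with sampled self time per path.
function callPathsTo(profile, targetName) {
  const nodesById = new Map(profile.nodes.map((n) => [n.id, n]));

  // Invert the children lists so each node can be walked back to the root.
  const parentOf = new Map();
  for (const node of profile.nodes) {
    for (const childId of node.children || []) parentOf.set(childId, node.id);
  }

  // Self time per node; each delta is charged to the sample at the same index
  // (a simplification made for this sketch).
  const selfMicros = new Map();
  profile.samples.forEach((id, i) => {
    selfMicros.set(id, (selfMicros.get(id) || 0) + (profile.timeDeltas[i] || 0));
  });

  return profile.nodes
    .filter((n) => n.callFrame.functionName === targetName)
    .map((n) => {
      const names = [];
      for (let id = n.id; id !== undefined; id = parentOf.get(id)) {
        names.unshift(nodesById.get(id).callFrame.functionName || '(anonymous)');
      }
      return { path: names.join(' > '), selfMs: (selfMicros.get(n.id) || 0) / 1000 };
    })
    .sort((a, b) => b.selfMs - a.selfMs);
}

// Hand-written profile: processPayload is reached from two different handlers.
const frame = (functionName) => ({ functionName, url: '/app/server.js', lineNumber: 0 });
const pathProfile = {
  nodes: [
    { id: 1, callFrame: frame('(root)'), children: [2, 3] },
    { id: 2, callFrame: frame('handleUpload'), children: [4] },
    { id: 3, callFrame: frame('handleSync'), children: [5] },
    { id: 4, callFrame: frame('processPayload') },
    { id: 5, callFrame: frame('processPayload') },
  ],
  samples: [4, 4, 4, 5],
  timeDeltas: [100, 100, 100, 100],
};

const payloadPaths = callPathsTo(pathProfile, 'processPayload');
console.table(payloadPaths);
```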

VS Code

VS Code can open .cpuprofile files directly with an extension:

Flame Chart Visualizer (extension ID vscode-js-profile-flame): Interactive flame chart in a VS Code tab

This is convenient when you're already in your editor and want to jump directly from the profile to the source code.

The Combined Workflow

Here's how terminal diagnostics and visual profiling complement each other:

# Step 1: Quick diagnosis — what's the problem?
loop-detective 12345 -d 10

# Output tells you: cpu-hog on processPayload, 54% CPU
# But you want to understand the full call tree

# Step 2: Capture a longer profile with export
loop-detective 12345 -d 60 --save-profile ./investigation.cpuprofile

# Step 3: Open in Chrome DevTools or speedscope
# - Flame graph shows processPayload is called from 3 different routes
# - One route accounts for 80% of the calls
# - Inside processPayload, JSON.parse on line 67 is the real hotspot

# Step 4: Fix the issue, then verify
loop-detective 12345 -d 60 --save-profile ./after-fix.cpuprofile
# Compare the two profiles in speedscope

Step 1 takes 10 seconds and gives you the direction. Steps 2-4 give you the depth. You don't need the flame graph for every debugging session, but when you do, it's there.
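The before/after comparison in Step 4 can also be scripted when you want numbers rather than two side-by-side views. The following is a sketch under a few assumptions: diffProfiles and makeDemoProfile are hypothetical names, functions are grouped by name only (same-named functions in different files are merged), and both profiles are assumed to cover the same duration under similar load.

```javascript
// Sampled self time per function name, in microseconds.
function selfMicrosByName(profile) {
  const nameById = new Map(
    profile.nodes.map((n) => [n.id, n.callFrame.functionName || '(anonymous)'])
  );
  const totals = new Map();
  profile.samples.forEach((id, i) => {
    const name = nameById.get(id);
    if (name === undefined) return;
    totals.set(name, (totals.get(name) || 0) + (profile.timeDeltas[i] || 0));
  });
  return totals;
}

// Per-function change in self time between two profiles.
function diffProfiles(before, after) {
  const a = selfMicrosByName(before);
  const b = selfMicrosByName(after);
  const names = new Set([...a.keys(), ...b.keys()]);
  return [...names]
    .map((name) => {
      const beforeMicros = a.get(name) || 0;
      const afterMicros = b.get(name) || 0;
      return {
        name,
        beforeMs: beforeMicros / 1000,
        afterMs: afterMicros / 1000,
        deltaMs: (afterMicros - beforeMicros) / 1000,
      };
    })
    .sort((x, y) => x.deltaMs - y.deltaMs); // biggest improvement first
}

// Hand-written profiles, for illustration only.
const makeDemoProfile = (samples) => ({
  nodes: [
    { id: 1, callFrame: { functionName: 'processPayload', url: '/app/server.js', lineNumber: 66 } },
    { id: 2, callFrame: { functionName: 'render', url: '/app/server.js', lineNumber: 90 } },
  ],
  samples,
  timeDeltas: samples.map(() => 100),
});

const changes = diffProfiles(makeDemoProfile([1, 1, 1, 2]), makeDemoProfile([1, 2, 2, 2]));
console.table(changes);

// With real files (paths taken from the workflow above):
// const fs = require('node:fs');
// const load = (file) => JSON.parse(fs.readFileSync(file, 'utf8'));
// console.table(diffProfiles(load('./investigation.cpuprofile'), load('./after-fix.cpuprofile')));
```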

Practical Tips

Profile duration matters. A 5-second profile might miss intermittent issues. For flame graph analysis, 30-60 seconds gives you a representative sample. The file size is typically 1-5MB for a 60-second profile.

Name your files. Include context in the filename:

loop-detective 12345 -d 60 --save-profile ./profiles/api-server-$(date +%Y%m%d-%H%M%S).cpuprofile

Watch mode overwrites. In --watch mode, each cycle overwrites the same file. If you need to keep every cycle's profile, use --json output and extract the raw profile data programmatically.
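One way to keep a file per cycle from your own script is to generate a unique name each time a profile arrives. This is only a sketch: timestampedProfilePath is a hypothetical helper, and the commented wiring assumes the 'profile' event of the programmatic API fires once per watch cycle, which this article does not confirm.

```javascript
const path = require('node:path');

// Build a unique, sortable file name such as
// profiles/api-server-20240408-153000-001.cpuprofile
function timestampedProfilePath(dir, label, cycle, now = new Date()) {
  const pad = (n, width = 2) => String(n).padStart(width, '0');
  const stamp =
    `${now.getFullYear()}${pad(now.getMonth() + 1)}${pad(now.getDate())}` +
    `-${pad(now.getHours())}${pad(now.getMinutes())}${pad(now.getSeconds())}`;
  return path.join(dir, `${label}-${stamp}-${pad(cycle, 3)}.cpuprofile`);
}

const examplePath = timestampedProfilePath(
  'profiles',
  'api-server',
  1,
  new Date(2024, 3, 8, 15, 30, 0) // fixed local time so the output is predictable
);
console.log(examplePath);

// Hypothetical wiring (assumes one 'profile' event per watch cycle):
// let cycle = 0;
// detective.on('profile', (analysis, rawProfile) => {
//   const file = timestampedProfilePath('./profiles', 'api-server', ++cycle);
//   fs.writeFileSync(file, JSON.stringify(rawProfile));
// });
```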

Combine with I/O tracking. The CPU profile shows where compute time goes. The slow I/O summary shows where wait time goes. Together they explain the full request lifecycle:

loop-detective 12345 -d 30 --io-threshold 200 --save-profile ./profile.cpuprofile

The profile includes everything. The saved file contains the complete V8 profile — all nodes, all samples, all time deltas. loop-detective's built-in analysis only shows the top 20 functions. The flame graph shows everything, including functions that individually account for <1% of CPU but collectively matter.
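To check whether that long tail matters in a given profile, you can total the share of samples that fall in functions below a cutoff. The sketch below uses the hitCount field on each node; longTailShare and tailProfile are hypothetical names, and the share is relative to all samples, including pseudo-functions such as (idle) if they appear in the profile.

```javascript
// Share of samples spent in functions that each fall below `cutoff` (1% by default).
function longTailShare(profile, cutoff = 0.01) {
  // Merge nodes that refer to the same function (name + url + line).
  const hits = new Map();
  let totalHits = 0;
  for (const node of profile.nodes) {
    const { functionName, url, lineNumber } = node.callFrame;
    const key = `${functionName}|${url}|${lineNumber}`;
    const count = node.hitCount || 0;
    hits.set(key, (hits.get(key) || 0) + count);
    totalHits += count;
  }
  if (totalHits === 0) return { functions: 0, share: 0 };

  let tailHits = 0;
  let tailFunctions = 0;
  for (const count of hits.values()) {
    if (count > 0 && count / totalHits < cutoff) {
      tailHits += count;
      tailFunctions += 1;
    }
  }
  return { functions: tailFunctions, share: tailHits / totalHits };
}

// Hand-written profile: one hot function plus twelve small ones at 0.5% each.
const tailProfile = {
  nodes: [
    { id: 1, callFrame: { functionName: 'hot', url: '/app/server.js', lineNumber: 1 }, hitCount: 940 },
    ...Array.from({ length: 12 }, (_, i) => ({
      id: i + 2,
      callFrame: { functionName: `small${i}`, url: '/app/util.js', lineNumber: i },
      hitCount: 5,
    })),
  ],
};

const tail = longTailShare(tailProfile);
console.log(`${tail.functions} functions below 1% account for ${(tail.share * 100).toFixed(1)}% of samples`);
```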

Under the Hood

The implementation is minimal. The Detective class already captures the raw V8 profile via CDP:

async _captureProfile(duration) {
  await this.inspector.send('Profiler.enable');
  await this.inspector.send('Profiler.start');
  await this._sleep(duration);
  const { profile } = await this.inspector.send('Profiler.stop');
  return profile;
}
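The same three CDP calls can be tried without loop-detective by using Node's built-in inspector module. Note the difference: this sketch profiles the current process through an in-process session, whereas loop-detective attaches to another process. The send and captureOwnProfile names are made up for illustration.

```javascript
const inspector = require('node:inspector');

// Promise wrapper around the callback-based session.post().
function send(session, method, params = {}) {
  return new Promise((resolve, reject) => {
    session.post(method, params, (err, result) => (err ? reject(err) : resolve(result)));
  });
}

// Profile the current process for `durationMs` and return the raw V8 profile.
async function captureOwnProfile(durationMs) {
  const session = new inspector.Session();
  session.connect(); // attaches to this process, not a remote one
  try {
    await send(session, 'Profiler.enable');
    await send(session, 'Profiler.start');
    await new Promise((resolve) => setTimeout(resolve, durationMs));
    const { profile } = await send(session, 'Profiler.stop');
    return profile;
  } finally {
    session.disconnect();
  }
}

const captured = captureOwnProfile(100).then((profile) => {
  console.log(`${profile.nodes.length} nodes captured`);
  return profile;
});
```

Writing JSON.stringify(profile) to a file ending in .cpuprofile gives you something the viewers described above can open.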

The profile object is the raw V8 CPU profile. Previously, it was only passed to the Analyzer. Now it's also emitted alongside the analysis:

this.emit('profile', analysis, rawProfile);

The CLI catches it and writes to disk:

detective.on('profile', (analysis, rawProfile) => {
  reporter.onProfile(analysis);
  if (config.saveProfile && rawProfile) {
    fs.writeFileSync(path.resolve(config.saveProfile), JSON.stringify(rawProfile));
  }
});

No transformation, no filtering. The file is the exact JSON that V8 produced. This means any tool that reads .cpuprofile files will work correctly — Chrome DevTools, speedscope, VS Code extensions, or your own analysis scripts.
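If you write your own analysis scripts, a quick structural check before processing a file can save confusion later. This is a minimal sketch based on the fields described earlier in this article; checkCpuProfile is a hypothetical helper, not part of loop-detective.

```javascript
// Basic structural checks plus a short summary of a .cpuprofile-shaped object.
function checkCpuProfile(profile) {
  const problems = [];
  for (const field of ['nodes', 'samples', 'timeDeltas']) {
    if (!Array.isArray(profile[field])) problems.push(`missing array field: ${field}`);
  }
  if (problems.length > 0) return { ok: false, problems };

  if (profile.samples.length !== profile.timeDeltas.length) {
    problems.push('samples and timeDeltas differ in length');
  }
  const ids = new Set(profile.nodes.map((n) => n.id));
  const unknown = profile.samples.filter((id) => !ids.has(id)).length;
  if (unknown > 0) problems.push(`${unknown} samples reference unknown node ids`);

  // startTime and endTime are in microseconds.
  const durationMs = (profile.endTime - profile.startTime) / 1000;
  const meanIntervalUs = profile.samples.length
    ? profile.timeDeltas.reduce((sum, d) => sum + d, 0) / profile.samples.length
    : 0;

  return {
    ok: problems.length === 0,
    problems,
    durationMs,
    samples: profile.samples.length,
    meanIntervalUs,
  };
}

// Hand-written examples, for illustration only.
const demoNodes = [{ id: 1, callFrame: { functionName: 'main', url: '/app/server.js', lineNumber: 0 } }];
const goodReport = checkCpuProfile({
  nodes: demoNodes, startTime: 0, endTime: 1000, samples: [1, 1], timeDeltas: [500, 500],
});
const badReport = checkCpuProfile({
  nodes: demoNodes, startTime: 0, endTime: 1000, samples: [1, 9], timeDeltas: [500],
});
console.log(goodReport, badReport);
```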

Programmatic API

If you're building tooling on top of loop-detective, you can access the raw profile via the event:

const fs = require('fs');
const { Detective } = require('node-loop-detective');

const detective = new Detective({ pid: 12345, duration: 30000 });

detective.on('profile', (analysis, rawProfile) => {
  // analysis: the processed report (heavy functions, patterns, etc.)
  // rawProfile: the raw V8 CPU profile object

  // Save it
  fs.writeFileSync('profile.cpuprofile', JSON.stringify(rawProfile));

  // Or analyze it yourself
  console.log(`${rawProfile.nodes.length} nodes, ${rawProfile.samples.length} samples`);
});

// Top-level await is not available in CommonJS, so handle the promise explicitly
detective.start().catch((err) => {
  console.error(err);
  process.exitCode = 1;
});

You can also use the Analyzer class standalone if you already have a .cpuprofile file:

const fs = require('fs');
const { Analyzer } = require('node-loop-detective');

// Load a previously saved profile and run the built-in analysis on it
const profile = JSON.parse(fs.readFileSync('profile.cpuprofile', 'utf8'));
const analyzer = new Analyzer({ threshold: 50 });
const result = analyzer.analyzeProfile(profile);

Try It

npm install -g node-loop-detective@1.4.0

# Profile and export
loop-detective <pid> --save-profile ./profile.cpuprofile

# Then open in your browser
# Chrome DevTools → Performance → Load profile
# Or drag into https://www.speedscope.app