DEV Community

Aryan Tiwari

Your Node.js app is slow to start. You just don't know which module to blame.

Last month I was debugging a startup regression at work. Our Node.js service went from ~300ms boot to nearly 900ms overnight. No new features. No infra changes. Just a routine dependency bump.

The usual approach? Comment out requires one by one. Bisect package.json. Stare at --cpu-prof output and pretend to understand V8 internals.

I wanted something simpler: run one command, see which module is eating my startup time, and know if the cost is in the module itself or in everything it drags in.

So I built coldstart — a zero-dependency startup profiler for Node.js that instruments Module._load, reconstructs the dependency tree, and shows you exactly where boot time goes.

Full transparency: I used Claude pretty heavily while building this — for scaffolding the ESM loader hooks, generating the flamegraph HTML template, and iterating on the tree rendering logic. The core idea (patching Module._load with performance.now() bookends) and the architecture were mine, but AI absolutely accelerated the implementation. I think that's just how a lot of solo open source gets built now, and I'd rather be upfront about it.

The problem in 30 seconds

Node.js doesn't tell you why startup is slow. You get one number — total boot time — and zero breakdown.

Meanwhile:

  • A single require('sequelize') can silently add 400ms
  • Transitive dependencies pile up — you require one thing, Node loads 300 modules
  • Synchronous work in module scope (reading files, compiling templates, connecting to DBs) blocks the event loop before your app even starts
  • Cached modules still add edges to the dependency graph, obscuring the real bottlenecks

This matters more than ever. If you're running on Lambda (where cold starts are now billed), on serverless platforms, or in containers that scale from zero — startup time is latency your users feel on the first request.

What coldstart actually does

Run it against any Node app:

npx @yetanotheraryan/coldstart server.js

You get this:

coldstart — 847ms total startup

  ┌─ express          234ms  ████████████░░░░░░░░
  │  ├─ body-parser    89ms  █████░░░░░░░░░░░░░░░
  │  ├─ qs             12ms  █░░░░░░░░░░░░░░░░░░░
  │  └─ path-to-regexp  8ms  ░░░░░░░░░░░░░░░░░░░░
  ├─ sequelize        401ms  █████████████████████  ⚠ slow
  │  ├─ pg            203ms  ███████████░░░░░░░░░
  │  └─ lodash         98ms  █████░░░░░░░░░░░░░░░
  └─ dotenv             4ms  ░░░░░░░░░░░░░░░░░░░░

event loop max 42ms, p99 17ms, mean 4.3ms
modules 312 total, 59 cached
time split 286ms first-party, 503ms node_modules

The tree shows parent → child load relationships with inclusive timing (how long the whole subtree took) and bar charts colored by severity. At a glance you can see: sequelize is the problem, and within sequelize, it's pg and lodash doing the heavy lifting.

How it works under the hood

The core technique is straightforward — coldstart monkey-patches Module._load (the internal function Node calls for every require()):

  1. Before the original _load runs, record performance.now() and the parent module
  2. Let Node do its thing — resolve, compile, execute
  3. After _load returns, record the end time
  4. Store the raw event: { request, resolvedPath, parentPath, startMs, endMs, cached }

For ESM, it uses Node's module.register() loader hooks (available in Node 18.19+) to capture resolve and load events, bridging timing data back to the main tracer through a message channel.

After your app finishes starting up, the tracer takes all those raw events and builds:

  • A tree — the actual parent → child dependency graph as loaded at runtime
  • Inclusive time — total wall-clock time for a module and everything it pulled in
  • Exclusive time — just the module's own initialization cost, minus children
  • Event loop stats — max, mean, p99 blocking during startup using perf_hooks
  • A split — how much time was first-party code vs node_modules

The distinction between inclusive and exclusive is key. A module with high inclusive but low exclusive time is just a gateway — it pulls in heavy children but isn't slow itself. High exclusive time means that specific module is doing expensive work at load time.
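In code, the relationship is simple: exclusive time is inclusive time minus the inclusive time of the direct children. A toy version (the node shape here is hypothetical, not coldstart's internal format; the numbers loosely mirror the sequelize example above):

```javascript
// Each node: { name, startMs, endMs, children }, built from the raw load events.
function annotateTimes(node) {
  node.inclusiveMs = node.endMs - node.startMs;
  let childrenMs = 0;
  for (const child of node.children) {
    annotateTimes(child);
    childrenMs += child.inclusiveMs;
  }
  node.exclusiveMs = node.inclusiveMs - childrenMs;
  return node;
}

const tree = annotateTimes({
  name: 'sequelize', startMs: 0, endMs: 401,
  children: [
    { name: 'pg',     startMs: 10,  endMs: 213, children: [] },
    { name: 'lodash', startMs: 220, endMs: 318, children: [] },
  ],
});
// sequelize is mostly a gateway: inclusive 401ms,
// exclusive 401 - (203 + 98) = 100ms
```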

Three ways to use it

CLI (easiest — profiles any app):

coldstart server.js
coldstart --json server.js          # machine-readable output
coldstart -- node --inspect app.js  # pass node flags through

Programmatic API (embed in your own tooling):

import { monitor, renderTextReport } from '@yetanotheraryan/coldstart'

const done = monitor()
require('./bootstrap')
require('./server')

console.log(renderTextReport(done()))

Preload mode (zero code changes):

node --require @yetanotheraryan/coldstart/register server.js
# or for ESM:
node --import @yetanotheraryan/coldstart/register server.mjs

There's also a renderFlamegraphHtml() export that generates a self-contained HTML flamegraph you can open in a browser — useful for sharing with your team or dropping into a PR description.

What I actually found at work

After running coldstart on our service, the culprit was obvious in under a second: a transitive dependency three levels deep was doing synchronous file I/O at module scope to read a config file. The dependency bump had changed its initialization path.

The fix was a one-line lazy require() that moved the load out of the critical startup path. Boot time went back to ~320ms.

Without the tree view, I'd have been bisecting for an hour.

Why not just use --cpu-prof?

--cpu-prof is great for understanding what code is running, but it doesn't answer "which module load is slow?" or "what dependency chain got us here?" You get a flamegraph of function calls and V8 internals, not a map of your require() tree with timing.

coldstart is deliberately higher-level. It answers "which npm package is making my startup slow?" — not "which V8 builtin is hot."

They're complementary. Use coldstart to find the slow module, then --cpu-prof if you need to understand why that module is slow.

Current status & what's missing

Working today:

  • CommonJS profiling
  • ESM profiling (Node 18.19+)
  • CLI, programmatic API, preload mode
  • Text report, JSON report, HTML flamegraph

Not yet implemented:

  • Dynamic import() tracing
  • Watch mode for iterating on startup optimizations
  • CI integration (fail if startup exceeds a threshold)

It's early. The API is stable enough for everyday use, but I'm still iterating on the output format and weighing a few features based on what people actually need.

Try it

npm install @yetanotheraryan/coldstart

Or just run it once with npx:

npx @yetanotheraryan/coldstart your-app.js

GitHub: github.com/yetanotheraryan/coldstart

If this is useful to you, a star on the repo genuinely helps with discoverability. And if you run it on your app and find something interesting — I'd love to hear about it in the comments. What was your slowest module?


I'm Aryan — I build open source tools for Node.js on the side. You can find my other projects on GitHub.
