Xunxing Mao

Posted on Apr 14

Node.js CPU Spike Analysis: When Requests Hang and Event Loop Starves

#node #performance #eventloop #cpu

This article was originally published on maoxunxing.com. Follow me there for more deep dives on Node.js, AI, and frontend engineering.

The Problem

In production environments, we often encounter a peculiar phenomenon: CPU usage suddenly spikes to 100% while the application appears to be "doing nothing." No active computations, no heavy processing—just hanging requests and a frozen event loop.

This article analyzes two real-world cases from a large-scale operation platform:

RPC batch processing timeout causing CPU spikes
Message queue subscriber CPU anomalies without rate limiting

Case 1: RPC Batch Processing - The Silent Killer

The Scenario

We have a getBatchCompleteModuleDiff method that processes 100 components in batches. Each component triggers 4-5 RPC calls to backend services.

// Before optimization - Serial processing, NO timeout
async getBatchCompleteModuleDiff(componentIds: number[]) {
  const chunks = chunk(componentIds, 5)  // 20 batches for 100 components
  let result = []

  for (const chunkItem of chunks) {  // Serial execution!
    const res = await Promise.all(
      chunkItem.map(id => this.getCompleteModuleDiffInfo(id))
    )
    result.push(res)
  }
  return result
}

Why CPU Spikes When Requests Hang

The root cause is Event Loop Starvation caused by the combination of:

No timeout control - RPC calls could hang indefinitely
Promise accumulation - 20 batches × 5 components × 4 RPC calls = 400+ concurrent Promises
Event loop blocking - All Promises compete for event loop cycles

Here's what happens:

Timeline:
├── Batch 1 starts (5 components × 4 RPC calls = 20 Promises)
├── Batch 1 hangs (backend database exception, no timeout)
├── Batch 2 starts (another 20 Promises)
├── Batch 2 hangs
├── ...
├── Batch 20 starts (another 20 Promises)
├── Event Loop: 400+ pending Promises waiting
└── CPU: 100% (event loop constantly checking Promise states)

The CPU isn't doing useful work—it's spinning on Promise resolution checks.

The Solution

// After optimization - Parallel batches with timeout
async getBatchCompleteModuleDiff(componentIds: number[]) {
  const componentIdsChunks = chunk(componentIds, 10)  // Larger chunks

  // Parallel processing with timeout protection
  const batchPromises = componentIdsChunks.map(async (chunk) => {
    const promises = chunk.map(id => 
      this.withTimeout(
        this.getCompleteModuleDiffInfo(id),
        5000,  // 5s timeout prevents indefinite hanging
        `timeout for id: ${id}`
      )
    )
    return Promise.allSettled(promises)  // Isolate failures
  })

  return Promise.all(batchPromises)
}

private withTimeout<T>(promise: Promise<T>, timeoutMs: number, errorMsg: string): Promise<T> {
  return new Promise((resolve, reject) => {
    const timeoutId = setTimeout(() => reject(new Error(errorMsg)), timeoutMs)
    promise
      .then(result => { clearTimeout(timeoutId); resolve(result) })
      .catch(error => { clearTimeout(timeoutId); reject(error) })
  })
}

Results

Metric	Before	After	Improvement
Response Time	60s+	5-8s	87% ↓
CPU Usage	90-100%	40-50%	50% ↓
Timeout Control	None	5s	Prevents hanging

Case 2: Message Queue Subscriber - The Rate Limiting Problem

The Scenario

Our CheckTaskResultSubscriber processes message queue messages for inspection task results:

@MessageSubscriber({ topic: 'TASK_RESULT' })
export class CheckTaskResultSubscriber implements IMessageSubscriber {
  @Inject()
  checkReportService: CheckReportService

  async subscribe(msg: SubscribeMessage) {
    const messageBody = JSON.parse(msg.body.toString())

    if (this.INSPECTOR_TAGS.includes(messageTag)) {
      await this.handleInspectorResult(messageBody)  // DB operations
    } else if (this.DETECTION_TAGS.includes(messageTag)) {
      await this.handleDetectionResult(messageBody)  // DB operations
    }
  }
}

Why CPU Spikes Without Rate Limiting

Message queue consumers by default pull messages as fast as possible. When:

Message burst occurs (e.g., 1000 tasks complete simultaneously)
Each message triggers DB operations (queries, updates)
No concurrency control - all messages processed concurrently
Connection pool exhaustion - DB connections max out
Event loop saturated - waiting on I/O operations

Message Burst (1000 messages)
    ↓
No Rate Limiting
    ↓
1000 concurrent async operations
    ↓
DB Connection Pool (max 50) exhausted
    ↓
949 operations waiting for connections
    ↓
Event Loop: Constantly polling/waiting
    ↓
CPU: 100% (context switching + polling overhead)

Why This Happens in Node.js

Unlike Java's thread pool model, Node.js uses a single-threaded event loop:

Java: Thread pool limits concurrent execution naturally (e.g., 50 threads = max 50 concurrent operations)
Node.js: No natural limit—can create unlimited Promises that all compete for the event loop

When Promises wait for I/O (DB connections), they don't "pause"—they constantly check if the resource is available, consuming CPU cycles.

The Missing Rate Limiting

Looking at our message queue configuration:

// config.prod.ts
config.messageQueue = {
  enableDefaultProducer: true,
  pub: { /* ... */ },
  // NO sub configuration! Consumer uses default settings
}

The subscriber has no concurrency control at either the:

Message queue level (no prefetch limit)
Application level (no semaphore/bottleneck)

Key Takeaways

Why Requests Hanging Cause CPU Spikes

Factor	Explanation
Promise overhead	Each pending Promise consumes event loop cycles
No timeout	Hanging requests accumulate indefinitely
Resource competition	DB connections, memory, file descriptors exhaust
Polling cost	Event loop constantly checks I/O readiness
Context switching	V8 engine overhead from Promise state transitions

Prevention Strategies

Always set timeouts for external calls (RPC, HTTP, DB)
Implement rate limiting for message consumers
Use Promise.allSettled instead of Promise.all to isolate failures
Monitor event loop lag as an early warning indicator
Add circuit breakers for cascading failure prevention

Monitoring Checklist

// Event loop lag monitoring
const eventLoopLag = require('event-loop-lag')
setInterval(() => {
  const lag = eventLoopLag()
  if (lag > 100) {  // > 100ms indicates problem
    console.warn(`Event loop lag: ${lag}ms`)
  }
}, 1000)

Conclusion

Node.js CPU spikes during request hanging are counterintuitive but explainable:

It's not the hanging itself that consumes CPU
It's the accumulation of waiting Promises competing for event loop attention
Timeout and concurrency control are essential defenses

The key insight: Node.js requires explicit resource management that other languages handle implicitly through thread pools. Without it, "doing nothing" can consume everything.

This article is based on real production incidents from a large-scale operation platform. The optimization reduced P99 latency from 60s to 8s and stabilized CPU usage at 40-50%.

References

The Node.js Event Loop — Node.js Documentation — Official documentation of the Node.js event loop
Don't Block the Event Loop — Node.js Guide — Official guide on avoiding event loop blocking
Understanding the Node.js Event Loop — YouTube (Bert Belder) — Deep dive into event loop internals from a Node.js core contributor

If you're interested in AI-assisted development workflows, check out my
AI Coding Playbook where I
cover tool selection and prompt templates.

I also wrote about building a personal knowledge base using Karpathy's
method which
covers how to organize technical learning into a publishing pipeline.

Felix Mao (毛毛星) | maoxunxing.com | @maoxunxing

DEV Community

Node.js CPU Spike Analysis: When Requests Hang and Event Loop Starves

The Problem

Case 1: RPC Batch Processing - The Silent Killer

The Scenario

Why CPU Spikes When Requests Hang

The Solution

Results

Case 2: Message Queue Subscriber - The Rate Limiting Problem

The Scenario

Why CPU Spikes Without Rate Limiting

Why This Happens in Node.js

The Missing Rate Limiting

Recommended Solutions

1. Add Concurrency Control in Subscriber

2. Use p-limit for Simpler Control

3. Configure Message Queue Consumer Thread Pool

Key Takeaways

Why Requests Hanging Cause CPU Spikes

Prevention Strategies

Monitoring Checklist

Conclusion

References

Top comments (0)