<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Savas Ozturk</title>
    <description>The latest articles on DEV Community by Savas Ozturk (@savas_ozturk).</description>
    <link>https://dev.to/savas_ozturk</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3975550%2F98ae7150-52ae-42a4-8601-ae2ff1b728a8.png</url>
      <title>DEV Community: Savas Ozturk</title>
      <link>https://dev.to/savas_ozturk</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/savas_ozturk"/>
    <language>en</language>
    <item>
      <title>Why is my Node.js app slow? An OpenTelemetry debugging checklist</title>
      <dc:creator>Savas Ozturk</dc:creator>
      <pubDate>Tue, 09 Jun 2026 10:38:47 +0000</pubDate>
      <link>https://dev.to/savas_ozturk/why-is-my-nodejs-app-slow-an-opentelemetry-debugging-checklist-1apn</link>
      <guid>https://dev.to/savas_ozturk/why-is-my-nodejs-app-slow-an-opentelemetry-debugging-checklist-1apn</guid>
      <description>&lt;p&gt;Node.js makes single-threaded asynchronous I/O cheap. It also makes a single bad pattern in one corner of the codebase capable of slowing the whole process. This is the production-debugging checklist I'd actually run, in the order I'd run it, with the OpenTelemetry instrumentation that lets you skip the guesswork.&lt;/p&gt;

&lt;h2&gt;
  
  
  1. The event loop is blocked
&lt;/h2&gt;

&lt;p&gt;The single most common cause of "Node is slow." A CPU-heavy synchronous operation (a regex, a &lt;code&gt;JSON.parse&lt;/code&gt; on a 50MB string, a crypto operation) holds the event loop and every other request waits. Symptoms: latency on &lt;em&gt;unrelated&lt;/em&gt; endpoints spikes simultaneously, and the spike correlates with one particular endpoint's traffic.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;OTel signal:&lt;/strong&gt; &lt;code&gt;nodejs.eventloop.delay.p90&lt;/code&gt; and &lt;code&gt;nodejs.eventloop.delay.p99&lt;/code&gt; via the &lt;code&gt;@opentelemetry/instrumentation-runtime-node&lt;/code&gt; package. Anything above 20ms is suspect; above 100ms is the cause of your incident. Chart it against request latency; the correlation is usually obvious.&lt;/p&gt;
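&lt;p&gt;To see the same number without an exporter, here is a minimal sketch using Node's built-in &lt;code&gt;perf_hooks.monitorEventLoopDelay&lt;/code&gt; histogram. The helper names (&lt;code&gt;classifyDelay&lt;/code&gt;, &lt;code&gt;p99DelayMs&lt;/code&gt;, &lt;code&gt;watchEventLoop&lt;/code&gt;) and the 10-second interval are illustrative assumptions; the 20ms and 100ms thresholds are the ones from the paragraph above.&lt;/p&gt;

```javascript
// Sketch only: sample event-loop delay with Node's built-in histogram.
const { monitorEventLoopDelay } = require('node:perf_hooks');

// Thresholds from the text: above 20 ms is suspect, above 100 ms is an incident.
function classifyDelay(delayMs) {
  if (delayMs > 100) return 'incident';
  if (delayMs > 20) return 'suspect';
  return 'ok';
}

// The histogram reports nanoseconds; convert the p99 to milliseconds.
function p99DelayMs(histogram) {
  return histogram.percentile(99) / 1e6;
}

// Hypothetical wiring: log the p99 and its classification every intervalMs.
function watchEventLoop(intervalMs = 10000, log = console.log) {
  const histogram = monitorEventLoopDelay({ resolution: 10 });
  histogram.enable();
  const timer = setInterval(() => {
    const p99 = p99DelayMs(histogram);
    log(`event loop delay p99=${p99.toFixed(1)}ms (${classifyDelay(p99)})`);
    histogram.reset(); // start a fresh window for the next sample
  }, intervalMs);
  timer.unref(); // do not keep the process alive just for monitoring
  return () => {
    clearInterval(timer);
    histogram.disable();
  };
}
```

&lt;p&gt;Calling &lt;code&gt;watchEventLoop()&lt;/code&gt; once at startup is enough for a quick local check; the function it returns stops the sampling.&lt;/p&gt;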
&lt;h2&gt;
  
  
  2. Garbage collection is pausing the process
&lt;/h2&gt;

&lt;p&gt;Long-lived references that should have been short-lived (caches without size limits, closures capturing request objects, listeners not removed) push the heap up. Eventually V8 runs a major GC and the process pauses for hundreds of milliseconds.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;OTel signal:&lt;/strong&gt; &lt;code&gt;v8js.gc.duration&lt;/code&gt; (histogram) and &lt;code&gt;v8js.memory.heap.used&lt;/code&gt; (gauge) from the runtime instrumentation. A GC duration p99 above 200ms with a growing heap-used line tells the whole story.&lt;/p&gt;
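&lt;p&gt;The usual fix for the "cache without a size limit" case is to cap the cache. A minimal sketch of a least-recently-used cache built on a plain &lt;code&gt;Map&lt;/code&gt; follows; the class name &lt;code&gt;BoundedCache&lt;/code&gt; and the default of 1000 entries are assumptions for illustration, not something the runtime instrumentation provides.&lt;/p&gt;

```javascript
// Sketch only: a size-capped LRU cache so entries cannot pile up on the heap.
class BoundedCache {
  constructor(maxEntries = 1000) {
    this.maxEntries = maxEntries;
    this.map = new Map(); // a Map iterates in insertion order
  }

  get(key) {
    if (!this.map.has(key)) return undefined;
    const value = this.map.get(key);
    // Re-insert so this key becomes the most recently used one.
    this.map.delete(key);
    this.map.set(key, value);
    return value;
  }

  set(key, value) {
    this.map.delete(key);
    this.map.set(key, value);
    if (this.map.size > this.maxEntries) {
      // The first key in iteration order is the least recently used.
      const oldestKey = this.map.keys().next().value;
      this.map.delete(oldestKey);
    }
  }

  get size() {
    return this.map.size;
  }
}
```

&lt;p&gt;With a cap in place, the heap-used line should flatten once the cache is full instead of climbing until the next major GC.&lt;/p&gt;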
&lt;h2&gt;
  
  
  3. A downstream call is the actual slow thing
&lt;/h2&gt;

&lt;p&gt;"My app is slow" usually means "the response is slow." About half the time, your Node service is fine and is waiting on something else: a database query, a downstream microservice, a third-party API.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;OTel signal:&lt;/strong&gt; the trace waterfall. Open a slow trace. The long span is rarely your application code; it's almost always an outbound HTTP or DB span. &lt;code&gt;@opentelemetry/instrumentation-http&lt;/code&gt; and the database-specific instrumentation packages produce these for free.&lt;/p&gt;
&lt;h2&gt;
  
  
  4. Database N+1 queries
&lt;/h2&gt;

&lt;p&gt;An ORM (Sequelize, TypeORM, Prisma) issuing one query per item in a result set is a near-universal Node.js pattern. The endpoint that fetches "100 orders with their line items" issues 1 + 100 queries (one for the orders, then one per order for its line items) instead of a single join.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;OTel signal:&lt;/strong&gt; count of database spans per trace. If a single trace has 50+ spans with the same DB span name, you have a query loop. The &lt;code&gt;db.statement&lt;/code&gt; attribute (&lt;code&gt;db.query.text&lt;/code&gt; in newer semantic conventions; hashed if sensitive) shows the repeated pattern.&lt;/p&gt;
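&lt;p&gt;The span count can be checked mechanically. The sketch below assumes spans have already been exported as plain objects with &lt;code&gt;traceId&lt;/code&gt;, &lt;code&gt;name&lt;/code&gt; and &lt;code&gt;attributes&lt;/code&gt; fields, and that database spans carry a &lt;code&gt;db.system&lt;/code&gt; attribute; both the object shape and the function name &lt;code&gt;findQueryLoops&lt;/code&gt; are hypothetical, so adapt them to whatever your backend returns.&lt;/p&gt;

```javascript
// Sketch only: flag traces where one DB span name repeats suspiciously often.
function findQueryLoops(spans, threshold = 50) {
  const groups = new Map();
  for (const span of spans) {
    const attrs = span.attributes || {};
    if (attrs['db.system'] === undefined) continue; // skip non-database spans
    // Group by trace and span name; JSON keeps the two parts unambiguous.
    const key = JSON.stringify([span.traceId, span.name]);
    const group = groups.get(key) || { traceId: span.traceId, name: span.name, count: 0 };
    group.count += 1;
    groups.set(key, group);
  }
  // Keep only the groups at or above the threshold from the text (50 spans).
  return [...groups.values()].filter((group) => group.count >= threshold);
}
```

&lt;p&gt;Each returned entry names the trace and the repeated span, which is usually enough to find the loop in the ORM code.&lt;/p&gt;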
&lt;h2&gt;
  
  
  5. Blocking sync I/O on the hot path
&lt;/h2&gt;

&lt;p&gt;&lt;code&gt;fs.readFileSync&lt;/code&gt;, &lt;code&gt;crypto.pbkdf2Sync&lt;/code&gt;, &lt;code&gt;JSON.parse&lt;/code&gt; on a huge body. Any synchronous operation in the request path holds the event loop. Often introduced by a contractor or a quick-fix PR that "worked locally."&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;OTel signal:&lt;/strong&gt; CPU-time spans (with manual instrumentation) or just the event-loop-delay correlation from #1. A specific endpoint where event-loop delay spikes on every request is the smoking gun.&lt;/p&gt;
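&lt;p&gt;If you suspect a specific synchronous call, a rough way to confirm it is to time the call directly. The sketch below is not OpenTelemetry manual instrumentation; it is a plain wrapper around &lt;code&gt;performance.now()&lt;/code&gt;, and the name &lt;code&gt;timeSync&lt;/code&gt;, the &lt;code&gt;report&lt;/code&gt; callback and the 20ms default budget are assumptions for illustration.&lt;/p&gt;

```javascript
// Sketch only: measure how long a synchronous call holds the event loop.
const { performance } = require('node:perf_hooks');

function timeSync(label, fn, report = console.warn, budgetMs = 20) {
  const start = performance.now();
  try {
    return fn(); // nothing else runs on the event loop until this returns
  } finally {
    const elapsedMs = performance.now() - start;
    if (elapsedMs > budgetMs) {
      report(`${label} blocked the event loop for ${elapsedMs.toFixed(1)}ms`);
    }
  }
}
```

&lt;p&gt;Wrapping a suspect call, for example &lt;code&gt;timeSync('parse body', () =&gt; JSON.parse(body))&lt;/code&gt;, returns the parsed value as before and warns only when the call exceeds the budget.&lt;/p&gt;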
&lt;h2&gt;
  
  
  6. Connection pool exhaustion
&lt;/h2&gt;

&lt;p&gt;The PostgreSQL client, the Redis client, the HTTP keepalive pool. Each has a max-connection setting that defaults to a small number. Under load, requests queue waiting for a connection, and the wait time looks like database latency from the outside.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;OTel signal:&lt;/strong&gt; custom up-down counter on pool size, or the difference between &lt;code&gt;db.client.connections.usage&lt;/code&gt; and &lt;code&gt;db.client.connections.max&lt;/code&gt;. The OTel database instrumentation libraries are starting to emit these natively; verify with your specific version.&lt;/p&gt;
&lt;h2&gt;
  
  
  7. Logging overhead
&lt;/h2&gt;

&lt;p&gt;A logger emitting at &lt;code&gt;debug&lt;/code&gt; level in production, writing to stdout that's then piped through a sidecar agent, can become non-trivial CPU work. Especially if the logger is doing JSON serialisation of large objects.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;OTel signal:&lt;/strong&gt; log rate metrics (count of log records per second) plus the runtime CPU usage. A sharp logging spike that correlates with latency is the giveaway. Setting the logger level back to &lt;code&gt;info&lt;/code&gt; is usually the fix.&lt;/p&gt;
&lt;h2&gt;
  
  
  8. Async leaks and uncaught rejections
&lt;/h2&gt;

&lt;p&gt;Promises that never resolve, async hooks that accumulate, unhandled promise rejections that log without crashing. Each one leaks resources over time. The process gets slower hour by hour and recovers only on restart.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;OTel signal:&lt;/strong&gt; &lt;code&gt;nodejs.eventloop.utilization&lt;/code&gt; climbing over hours, paired with &lt;code&gt;v8js.memory.heap.used&lt;/code&gt; climbing. If your service runs fine after a deploy and gets progressively slower over the next 12 hours, this is what you're looking at.&lt;/p&gt;
&lt;h2&gt;
  
  
  The minimum instrumentation to get all of this
&lt;/h2&gt;

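&lt;p&gt;Six packages, one config file:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npm &lt;span class="nb"&gt;install&lt;/span&gt; @opentelemetry/sdk-node &lt;span class="se"&gt;\&lt;/span&gt;
              @opentelemetry/auto-instrumentations-node &lt;span class="se"&gt;\&lt;/span&gt;
              @opentelemetry/instrumentation-runtime-node &lt;span class="se"&gt;\&lt;/span&gt;
              @opentelemetry/exporter-trace-otlp-http &lt;span class="se"&gt;\&lt;/span&gt;
              @opentelemetry/exporter-metrics-otlp-http &lt;span class="se"&gt;\&lt;/span&gt;
              @opentelemetry/sdk-metrics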
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;  &lt;span class="c1"&gt;// otel.js (required before your app)&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;NodeSDK&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;require&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;@opentelemetry/sdk-node&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;getNodeAutoInstrumentations&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;require&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;@opentelemetry/auto-instrumentations-node&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;RuntimeNodeInstrumentation&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt;
  &lt;span class="nf"&gt;require&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;@opentelemetry/instrumentation-runtime-node&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;OTLPTraceExporter&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;require&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;@opentelemetry/exporter-trace-otlp-http&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;OTLPMetricExporter&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;require&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;@opentelemetry/exporter-metrics-otlp-http&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="nx"&gt;PeriodicExportingMetricReader&lt;/span&gt; &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;require&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;@opentelemetry/sdk-metrics&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

  &lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;sdk&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;NodeSDK&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
    &lt;span class="na"&gt;traceExporter&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;OTLPTraceExporter&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
      &lt;span class="na"&gt;url&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;https://your-otlp-endpoint/v1/traces&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
      &lt;span class="na"&gt;headers&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;x-api-key&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;process&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;OTEL_API_KEY&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="p"&gt;}),&lt;/span&gt;
    &lt;span class="na"&gt;metricReader&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;PeriodicExportingMetricReader&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
      &lt;span class="na"&gt;exporter&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;OTLPMetricExporter&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
        &lt;span class="na"&gt;url&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;https://your-otlp-endpoint/v1/metrics&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="na"&gt;headers&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt; &lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="s1"&gt;x-api-key&lt;/span&gt;&lt;span class="dl"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;process&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;env&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;OTEL_API_KEY&lt;/span&gt; &lt;span class="p"&gt;},&lt;/span&gt;
      &lt;span class="p"&gt;}),&lt;/span&gt;
    &lt;span class="p"&gt;}),&lt;/span&gt;
    &lt;span class="na"&gt;instrumentations&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
      &lt;span class="nf"&gt;getNodeAutoInstrumentations&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
      &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;RuntimeNodeInstrumentation&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
    &lt;span class="p"&gt;],&lt;/span&gt;
  &lt;span class="p"&gt;});&lt;/span&gt;
  &lt;span class="nx"&gt;sdk&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;start&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then run &lt;code&gt;node -r ./otel.js app.js&lt;/code&gt;. Auto-instrumentation covers HTTP server, HTTP client, all major DB clients, gRPC, and message queues. Runtime instrumentation covers event loop, GC, and heap. Most of the eight signals above then appear without further work; the pool and log-rate metrics from #6 and #7 may still need a custom counter.&lt;/p&gt;

&lt;h2&gt;
  
  
  A debugging order that usually works
&lt;/h2&gt;

&lt;p&gt;If you don't know where to start, this order resolves most incidents in under fifteen minutes:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Open the trace waterfall for a slow request. If the long span is a downstream call, jump to that service. The problem is not Node.&lt;/li&gt;
&lt;li&gt;If the long span is in your service, check &lt;code&gt;nodejs.eventloop.delay.p99&lt;/code&gt; in the same time window. Spiking? You're CPU-bound or blocking; identify the endpoint by correlation.&lt;/li&gt;
&lt;li&gt;If the event loop is fine but the heap is growing, suspect GC pauses. Look at the &lt;code&gt;v8js.gc.duration&lt;/code&gt; p99.&lt;/li&gt;
&lt;li&gt;If event loop and GC look fine, count DB spans per trace. 50+ on one endpoint means N+1.&lt;/li&gt;
&lt;li&gt;If none of the above applies, suspect connection pool exhaustion, logging overhead, or an async leak. Each has a distinct signal pattern, described in #6 to #8.&lt;/li&gt;
&lt;/ol&gt;
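
&lt;p&gt;The order above can be written down as a small decision function, which is handy in a runbook or an alert annotation. This is a sketch: the field names on the &lt;code&gt;signals&lt;/code&gt; object are hypothetical, the 100ms event-loop threshold is borrowed from #1, and the 50-span threshold comes from step 4.&lt;/p&gt;

```javascript
// Sketch only: the triage order as code. All field names are illustrative.
function triage(signals) {
  // Step 1: a long downstream span means the problem is not in this process.
  if (signals.longestSpanIsDownstream) return 'downstream call';
  // Step 2: high event-loop delay points at CPU-bound or blocking code.
  if (signals.eventLoopDelayP99Ms > 100) return 'blocked event loop';
  // Step 3: a healthy loop with a growing heap suggests GC pauses.
  if (signals.heapGrowing) return 'GC pauses';
  // Step 4: many repeated DB spans in one trace means N+1 queries.
  if (signals.maxDbSpansPerTrace >= 50) return 'N+1 queries';
  // Step 5: whatever is left.
  return 'pool exhaustion, logging overhead, or async leak';
}
```

&lt;p&gt;Because the checks run in order, a request that is both waiting on a downstream call and showing a growing heap is reported as a downstream problem first, which matches the list.&lt;/p&gt;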

&lt;h2&gt;
  
  
  Further reading
&lt;/h2&gt;

&lt;p&gt;For the SDK setup walkthrough, see &lt;a href="https://telemtra.io/docs/opentelemetry-nodejs/" rel="noopener noreferrer"&gt;OpenTelemetry Node.js instrumentation&lt;/a&gt;. &lt;a href="https://telemtra.io/blog/opentelemetry-best-practices/" rel="noopener noreferrer"&gt;OpenTelemetry best practices for production&lt;/a&gt; covers the surrounding hygiene, and &lt;a href="https://telemtra.io/blog/debug-500-errors-with-distributed-tracing/" rel="noopener noreferrer"&gt;Debug 500 errors with distributed tracing&lt;/a&gt; is the incident playbook for when the eight signals above point you somewhere.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://telemtra.io/blog/why-is-my-python-app-slow/" rel="noopener noreferrer"&gt;  Originally published on telemtra.io.&lt;/a&gt;&lt;/p&gt;

</description>
      <category>node</category>
      <category>opentelemetry</category>
      <category>performance</category>
      <category>devops</category>
    </item>
  </channel>
</rss>
