DEV Community

Neeraja Khanapure

Something I wish someone had told me five years earlier:

LinkedIn Draft — Insight (2026-04-10)

Distributed tracing: the gap between having it and using it in incidents

Most orgs instrument distributed traces correctly and then debug incidents with grep. The investment in tracing pays off only when your debugging workflow changes — when you start from a trace ID instead of a log query. That's a culture change, not a tooling change.

Current state (most orgs):         Target state:

Incident fires                     Incident fires
     │                                  │
Grep logs ──▶ Guess service        Pull trace ID from alert
     │                                  │
More grep ──▶ Find error           Trace shows full request path
     │                                  │
Escalate ──▶ More engineers        Latency waterfall identifies
     │                             bottleneck in 3 minutes
MTTR: 90 min                       MTTR: 15 min
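
The "pull trace ID from alert" step only works if the alert carries one. Here is a minimal sketch of that step, assuming the alert payload includes a W3C `traceparent` value under a hypothetical `annotations.traceparent` key. The field name and payload shape are my assumptions, not something any particular alerting tool guarantees; only the version-00 `traceparent` layout (version, 32-hex trace ID, 16-hex parent ID, flags) comes from the W3C Trace Context format.

```python
import re
from typing import Optional

# Version-00 W3C traceparent layout: 00-<32 hex trace id>-<16 hex parent id>-<2 hex flags>
_TRACEPARENT = re.compile(r"^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


def trace_id_from_alert(alert: dict) -> Optional[str]:
    """Return the trace ID carried by an alert, or None if there is none.

    Assumption: the alert stores a traceparent string under
    alert["annotations"]["traceparent"] (a hypothetical field name).
    """
    value = alert.get("annotations", {}).get("traceparent", "")
    match = _TRACEPARENT.match(value.strip().lower())
    if match is None:
        return None
    trace_id = match.group(1)
    # An all-zero trace ID is defined as invalid, so treat it as missing.
    if set(trace_id) == {"0"}:
        return None
    return trace_id


alert = {
    "annotations": {
        "traceparent": "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01"
    }
}
print(trace_id_from_alert(alert))  # 4bf92f3577b34da6a3ce929d0e0e4736
```

If this returns `None` for most of your alerts, that is a sign the trace ID never reaches the alert, and the right-hand column above cannot start.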

The non-obvious part:
→ Traces don't reduce MTTR on their own — runbooks that start from trace IDs do. The highest-leverage thing you can do after instrumenting is to rewrite your top 5 incident runbooks to start with 'get the trace ID from the alert, open it in Jaeger/Tempo, find the slowest span.' Engineers follow runbooks under pressure.
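
To make "find the slowest span" concrete, here is a rough sketch of what that runbook step computes. The span dictionaries use a simplified shape I made up for illustration (`span_id`, `parent_id`, `name`, `start_ms`, `end_ms`); this is not the export format of Jaeger or Tempo. It ranks spans by self time (own duration minus time spent in direct children), on the assumption that children of one span do not overlap.

```python
from collections import defaultdict


def slowest_span_by_self_time(spans):
    """Return (span, self_time_ms) for the span with the largest self time."""
    # Sum the duration of each span's direct children.
    child_time = defaultdict(float)
    for span in spans:
        if span["parent_id"] is not None:
            child_time[span["parent_id"]] += span["end_ms"] - span["start_ms"]

    def self_time(span):
        duration = span["end_ms"] - span["start_ms"]
        # Clamp at zero: parallel children can sum to more than the parent.
        return max(0.0, duration - child_time[span["span_id"]])

    slowest = max(spans, key=self_time)
    return slowest, self_time(slowest)


# Illustrative trace with invented span names and timings.
trace = [
    {"span_id": "a", "parent_id": None, "name": "GET /checkout", "start_ms": 0, "end_ms": 900},
    {"span_id": "b", "parent_id": "a", "name": "auth.verify", "start_ms": 10, "end_ms": 60},
    {"span_id": "c", "parent_id": "a", "name": "db.query orders", "start_ms": 70, "end_ms": 850},
    {"span_id": "d", "parent_id": "c", "name": "pool.acquire", "start_ms": 70, "end_ms": 100},
]
span, ms = slowest_span_by_self_time(trace)
print(span["name"], ms)  # db.query orders 750.0
```

Ranking by self time rather than raw duration matters because the root span always has the longest total duration, so sorting by duration alone would just point back at the entry point.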

My rule:
→ Instrument your 3 highest-traffic endpoints first. Then rewrite one runbook to start from a trace ID. Measure incident time-to-hypothesis before and after.
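
One way to measure time-to-hypothesis, sketched under the assumption that each incident record has two ISO 8601 timestamps: when the alert fired and when someone first wrote down a suspected cause. The field names `alert_fired_at` and `first_hypothesis_at` are hypothetical, and the timestamps below are invented purely to show the calculation, not real results.

```python
from datetime import datetime
from statistics import median


def median_time_to_hypothesis(incidents):
    """Median minutes from alert firing to the first recorded hypothesis."""
    minutes = []
    for incident in incidents:
        fired = datetime.fromisoformat(incident["alert_fired_at"])
        hypothesis = datetime.fromisoformat(incident["first_hypothesis_at"])
        minutes.append((hypothesis - fired).total_seconds() / 60)
    return median(minutes)


# Made-up records for illustration only.
before = [
    {"alert_fired_at": "2026-01-05T10:00:00", "first_hypothesis_at": "2026-01-05T10:40:00"},
    {"alert_fired_at": "2026-01-12T14:00:00", "first_hypothesis_at": "2026-01-12T14:55:00"},
    {"alert_fired_at": "2026-01-20T09:00:00", "first_hypothesis_at": "2026-01-20T10:10:00"},
]
after = [
    {"alert_fired_at": "2026-03-02T11:00:00", "first_hypothesis_at": "2026-03-02T11:10:00"},
    {"alert_fired_at": "2026-03-09T16:00:00", "first_hypothesis_at": "2026-03-09T16:12:00"},
    {"alert_fired_at": "2026-03-17T08:00:00", "first_hypothesis_at": "2026-03-17T08:20:00"},
]
print(median_time_to_hypothesis(before))  # 55.0
print(median_time_to_hypothesis(after))   # 12.0
```

The median is used here instead of the mean so that one unusually long incident does not dominate the comparison.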

Worth reading:
▸ OpenTelemetry instrumentation guides — language SDKs (opentelemetry.io/docs)
▸ Grafana Tempo + Loki correlation — trace-to-log workflow without leaving the dashboard

https://neeraja-portfolio-v1.vercel.app/insights/distributed-tracing-the-gap-between-having-it-and-using-it-in-incidents

If you're earlier in your career: bookmark this. It'll make more sense after your first real production incident.

#devops #sre #observability #platformengineering
