Albert Alov
I audited my tool, fixed 44 bugs - and it still didn’t work

252 green tests, zero traces in Jaeger, and the one-line OpenTelemetry mistake that made my observability tool blind.

TL;DR

I shipped an observability tool with 252 green tests — but zero traces ever reached Jaeger. The root cause was an OpenTelemetry config detail that looked harmless (spanProcessors: []) but silently disabled trace export. Manual testing found it in minutes.

Act 1 · Act 2 · Root cause · Fix · Checklist · Links

I shipped v2.2.0 of my observability tool with 143 tests and a green CI run.

Then I did what I thought was the responsible thing: a deep code + DX audit. I found 44 issues, fixed them in a sprint, bumped the version a bunch of times, and ended at v2.4.4 with 252 tests.

I felt great — until I ran the tool like a real user would.

Zero traces were reaching the backend. Not “sometimes.” Not “misconfigured.” Just: never.

252 unit tests. All green. Traces had been broken since day one.

This is how I found out, what the root cause was, and why tests (and code audits) didn’t see it.


Act 1: The audit (I found what I expected)

The audit was useful. It caught real problems — especially the kind that looks “reasonable” in code review and passes unit tests.

Three examples that matter for the story:

1) A privacy feature that leaked PII

I had an auditMasking mode meant to help debug redaction. Great intention, terrible output: it logged the original unmasked text to stdout.

If your logs go to CloudWatch/Datadog (they do), stdout isn’t “local debug.” It’s a data pipeline.

Caption: “Fix: audit mode no longer prints raw input (PII).”
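A minimal sketch of the fixed behavior, in TypeScript: audit mode reports *what* was redacted (pattern name, match count) without ever echoing the raw value. The names `maskPII` and `AuditEntry`, and the email-only pattern, are illustrative assumptions, not the tool's real API.

```typescript
// Naive email matcher, for illustration only.
const EMAIL = /[\w.+-]+@[\w-]+\.[\w.]+/g;

interface AuditEntry {
  pattern: string;
  matches: number; // how many values were redacted -- never the values themselves
}

export function maskPII(text: string, audit: AuditEntry[]): string {
  let matches = 0;
  const masked = text.replace(EMAIL, () => {
    matches++;
    return "[REDACTED]";
  });
  // The audit trail records counts and pattern names only, so it is safe
  // to ship to stdout -> CloudWatch/Datadog.
  if (matches > 0) audit.push({ pattern: "email", matches });
  return masked;
}
```

The key design point: the debug output is derived exclusively from metadata about the redaction, so there is no code path where the original text can reach a log sink.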

2) diag.warn() was invisible by default

I used OpenTelemetry’s diag.warn() for user-facing warnings.

Problem: diag.* emits nothing unless diagnostics are explicitly configured. So warnings existed… but users never saw them. Typo? Missing SDK? Collector down? Silent.

Keep this in mind: “silent failure” becomes the recurring theme of this story.
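For reference, making diag output visible is one line of setup with the real @opentelemetry/api entry points (DiagConsoleLogger, DiagLogLevel) — but without that line, every diag.warn() call is a no-op:

```typescript
import { diag, DiagConsoleLogger, DiagLogLevel } from "@opentelemetry/api";

// The default diag logger is a no-op. Until a logger is registered,
// diag.warn() / diag.error() emit nothing at all.
diag.setLogger(new DiagConsoleLogger(), DiagLogLevel.WARN);

diag.warn("OTLP endpoint not reachable"); // now actually reaches the console
```

If your warnings are meant for end users rather than SDK debugging, consider writing them to stderr directly instead of routing them through diag at all.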

3) npx CLI was completely dead

The CLI entry guard compared a symlink path to a real path, so npx toad-eye ... produced zero output. Entire CLI dead via npx.

Caption: “Fix: npx runs via a symlink — compare real paths or the CLI never executes.”

At this point, the audit felt like a win: 44 issues found, 44 fixed, tests grew from 143 → 252. Ship it.


Act 2: Manual testing (I found what I didn’t expect)

After the audit, I wrote a quick testing guide and ran the tool end-to-end:

1) npx init

2) import into a tiny app

3) run against a real Collector

4) confirm traces show up in Jaeger

This is where everything fell apart fast:

Step 1 — npx toad-eye init: silence

That was the broken npx guard (fixed as above). The tool looked “dead” for the most common installation path.

Step 2 — importing with tsx: ERR_PACKAGE_PATH_NOT_EXPORTED

Package exports were missing the "default" condition. Another “works locally” trap.
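The shape of the fix is an exports map that includes "default" as the catch-all condition (the dist paths below are assumptions about a typical build layout, not the package's actual files):

```json
{
  "exports": {
    ".": {
      "types": "./dist/index.d.ts",
      "import": "./dist/index.js",
      "require": "./dist/index.cjs",
      "default": "./dist/index.js"
    }
  }
}
```

Node throws ERR_PACKAGE_PATH_NOT_EXPORTED when no condition in the map matches the resolver's conditions; "default" matches anything, so loaders like tsx that resolve with non-standard condition sets still get an entry point.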

Step 3 — Jaeger: nothing

No service. No spans. No errors. No warnings (because diag.warn was invisible).

So I did what everyone does: I blamed Docker and infrastructure. I spent an hour tweaking Collector configs, flipping between gRPC and HTTP ports, restarting containers — all while assuming the problem was upstream in Jaeger or the Collector.

But the pipeline wasn’t broken in the middle.

It was broken at the source.


The root cause: I accidentally disabled trace export completely

Here’s the bug:

I passed spanProcessors: [] to OpenTelemetry’s NodeSDK.

That looks harmless. It’s not.

An empty spanProcessors array doesn’t mean “use defaults.”

It means “override defaults with nothing.”

No span processor → nothing exports.

Metrics still worked (separate pipeline), which made the bug even harder to spot. The tool looked “alive” while traces were dead.

Even worse: when instrument: ['ai'] was enabled, spanProcessors became non-empty… but the processor I provided only recorded metrics. I still didn’t include the default BatchSpanProcessor for exporting spans.

Different code path, same result: zero traces.

Important: this wasn’t a flaky config issue. Traces never worked for any user. Ever.


The one-line fix

The fix is almost insulting:

Don’t pass an empty array.

Let the SDK create its default BatchSpanProcessor unless you actually have span processors to set.

Caption: “Fix: don’t override the default BatchSpanProcessor with spanProcessors: [].”
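A sketch of what that looks like with the real @opentelemetry/sdk-node and OTLP exporter packages (the customProcessors variable stands in for the tool's metrics-only processor and is my assumption). The subtle part: if spanProcessors is passed at all, it replaces the default pipeline, so the exporting BatchSpanProcessor has to ride along explicitly.

```typescript
import { NodeSDK } from "@opentelemetry/sdk-node";
import { BatchSpanProcessor, SpanProcessor } from "@opentelemetry/sdk-trace-node";
import { OTLPTraceExporter } from "@opentelemetry/exporter-trace-otlp-http";

// e.g. a metrics-only processor for the instrument: ['ai'] path; empty otherwise
const customProcessors: SpanProcessor[] = [];

// spanProcessors OVERRIDES the default pipeline -- spanProcessors: [] means
// "export nothing", silently. Only pass it when there is something to pass,
// and always include an exporting BatchSpanProcessor alongside custom ones.
const sdk = new NodeSDK(
  customProcessors.length > 0
    ? {
        spanProcessors: [
          ...customProcessors,
          new BatchSpanProcessor(new OTLPTraceExporter()),
        ],
      }
    : { traceExporter: new OTLPTraceExporter() } // SDK adds the BatchSpanProcessor itself
);

sdk.start();
```

Note this also covers the second broken code path: with custom processors present, the exporting processor is appended rather than dropped.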


Act 3: The takeaway (what changed in how I test)

After this, “252 tests” stopped feeling comforting.

Because the real problem wasn’t “insufficient assertions.”

It was: my tests weren’t testing reality.

Why unit tests didn’t catch it

My unit tests mocked the OpenTelemetry SDK.

So they verified:

  • “I call NodeSDK with these options”
  • “I register this instrumentation”
  • “I construct this processor”

But they didn’t verify the one thing an observability tool must do: do traces actually show up in the backend?


The checklist I’m keeping now

Practical, not preachy. If you ship devtools (especially observability), keep one test path that’s real.

Testing (reality checks)

  • Integration smoke test: run a real Collector + Jaeger and assert at least one span shows up
  • Don’t mock away the pipeline: at least one test should export for real
  • When debugging, start at the source (your app), not the destination (Jaeger)
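The smoke test from the first bullet can be sketched as a query against Jaeger's HTTP API after running the instrumented demo app. Everything here is an assumption about a typical local setup: Jaeger all-in-one on localhost:16686, a service name of "toad-eye-smoke", the /api/traces query endpoint, and global fetch (Node 18+).

```typescript
const JAEGER = "http://localhost:16686";
const SERVICE = "toad-eye-smoke";

// Fails loudly if the export pipeline is broken at ANY point:
// app -> SDK -> Collector -> Jaeger.
async function assertSpansArrived(): Promise<void> {
  const res = await fetch(`${JAEGER}/api/traces?service=${SERVICE}&limit=1`);
  if (!res.ok) throw new Error(`Jaeger query failed: HTTP ${res.status}`);
  const body = (await res.json()) as { data: unknown[] };
  if (!Array.isArray(body.data) || body.data.length === 0) {
    throw new Error(`no traces for "${SERVICE}" -- export pipeline is broken`);
  }
}

assertSpansArrived();
```

One test like this, run in CI against docker-compose, would have caught the spanProcessors: [] bug on day one — no amount of mocked unit tests can.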

Design (don’t disable defaults accidentally)

  • Avoid “empty override” configs ([]) unless you truly mean “disable defaults”
  • Treat streaming / special modes as separate first-class paths (parity tests)

UX (make failure loud)

  • Make failures visible by default (don’t rely on invisible diagnostics)
  • Run the “11pm developer” test: typos, missing Docker, empty dashboards — does the tool explain itself?

Results (numbers, no hype)

Before:

  • v2.2.0
  • 143 tests
  • traces: broken since day 1
  • npx CLI: silent/dead

After:

  • v2.4.4
  • 252 tests
  • code audit: 44 issues found, 44 fixed
  • manual testing: 5 critical bugs found, all fixed
  • traces: working
  • npx CLI: working
  • npm publishes: 6 (in one day)
