Ali Amer

Why does performance debugging still suck so much?

Recently I got laid off.

Weirdly, I didn’t feel stressed. I actually felt… excited.

For a long time I’ve wanted to build something meaningful in open source, but work always got in the way. Suddenly I had the time to try.

So I asked myself a question:

Why does performance debugging on Linux still suck?

I remember a moment back in college that stuck with me.

One semester we had to build a project using .NET on Windows.

It was the first time I had to seriously use Windows for development.

Honestly, I never liked .NET very much.

But there was one thing I really liked: Visual Studio.

When you run an application in the debugger, Visual Studio shows you live metrics:

  • CPU usage
  • memory usage
  • allocations
  • performance spikes

And the coolest part was watching what happened when you interacted with the app.

You could click a button in the UI and instantly see:

  • CPU usage spike
  • memory allocations change
  • performance graphs jump

That made performance debugging feel visual and intuitive.

You could literally see:

“I clicked this button… and something expensive just happened.”

Then you start digging:

  • Which function caused the spike?
  • Why did memory jump here?
  • What part of the code is slow?

It turned performance optimization into a kind of investigation.

And I loved that experience.


But there was a problem

Years later, working mostly on Linux systems, I realized something strange.

We have powerful tools:

  • perf
  • bpftrace
  • flamegraphs
  • eBPF

But none of them feel as simple and immediate as that Visual Studio experience.

Most tools are:

  • hard to set up
  • hard to understand
  • designed for experts

So I decided to build one.


The idea

I’m building a tool called Nusku.

The goal is simple:

A fast, modern performance profiler built with Rust + eBPF.

It should:

  • run instantly
  • require almost no setup
  • work on production systems
  • show useful information immediately

No complicated configuration.

No massive dashboards.

Just run it and see where your CPU time is going.


What I’ve built so far

So far Nusku can:

  • Attach to a running process
  • Sample CPU usage using eBPF perf events
  • Capture user-space stack traces
  • Symbolize addresses into real function names
  • Aggregate hot functions in real time
  • Show CPU usage and memory usage in a live terminal view

Example output:

── PID 132612 ── 98 samples ── CPU  99.0% ── RSS 2.0 MiB ── VIRT 3.1 MiB ──
     %     COUNT  FUNCTION                          SOURCE                           ADDRESS
────────────────────────────────────────────────────────────────────────────────────────────────
 27.6%        27  <core::ops::range::Range<T> as …  range.rs:773          0x00005efa8cfd4b95
 11.2%        11  testing::hot_c                    main.rs:7             0x00005efa8cfd4cc7
 10.2%        10  <i32 as core::iter::range::Step…  range.rs:197          0x00005efa8cfd4acd
 10.2%        10  <core::ops::range::Range<T> as …  range.rs:775          0x00005efa8cfd4b9f
  9.2%         9  testing::hot_c                    main.rs:8             0x00005efa8cfd4cf0
  6.1%         6  <core::ops::range::Range<T> as …  range.rs:771          0x00005efa8cfd4b69
  5.1%         5  <i32 as core::iter::range::Step…  range.rs:198          0x00005efa8cfd4b0b
  4.1%         4  core::hint::black_box             hint.rs:482           0x00005efa8cfd4d5a
  4.1%         4  core::iter::range::<impl core::…  range.rs:856          0x00005efa8cfd4b40
  3.1%         3  <core::ops::range::Range<T> as …  range.rs:776          0x00005efa8cfd4bba
  2.0%         2  <core::ops::range::Range<T> as …  range.rs:772          0x00005efa8cfd4b7a
  2.0%         2  <core::ops::range::Range<T> as …  range.rs:780          0x00005efa8cfd4bce
  2.0%         2  testing::hot_c                    main.rs:9             0x00005efa8cfd4d09
  2.0%         2  <i32 as core::iter::range::Step…  range.rs:195          0x00005efa8cfd4ac4
  1.0%         1  core::hint::black_box             hint.rs:483           0x00005efa8cfd4d64

Top frame:
  <core::ops::range::Range<T> as core::iter::range::RangeIteratorImpl>::spec_next
  /home/ali/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/iter/range.rs
  line 773


Everything runs in real time.
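
The % and COUNT columns in the output above come down to counting how often each frame shows up at the top of a sampled stack. For example, 27 of 98 samples is 27.6%. Here is a minimal sketch of that aggregation step, assuming the samples have already been symbolized into strings. The `HotRow` type and `aggregate` function are hypothetical names for illustration, not Nusku's actual code, and the key here is a plain string where the real table is keyed per source line and address.

```rust
use std::collections::HashMap;

/// One row of a hot-function table: a key (for example "function file:line"),
/// how many samples landed on it, and its share of all samples in percent.
/// Hypothetical type for illustration only.
#[derive(Debug, PartialEq)]
pub struct HotRow {
    pub symbol: String,
    pub count: u64,
    pub percent: f64,
}

/// Count how often each key was the top frame of a sample, then sort the
/// rows by count (descending), breaking ties by key so the order is stable.
pub fn aggregate(top_frames: &[&str]) -> Vec<HotRow> {
    let mut counts: HashMap<&str, u64> = HashMap::new();
    for &frame in top_frames {
        *counts.entry(frame).or_insert(0) += 1;
    }

    // With no samples the map is empty, so no division by zero happens.
    let total = top_frames.len() as f64;

    let mut rows: Vec<HotRow> = counts
        .into_iter()
        .map(|(symbol, count)| HotRow {
            symbol: symbol.to_string(),
            count,
            percent: 100.0 * count as f64 / total,
        })
        .collect();

    rows.sort_by(|a, b| b.count.cmp(&a.count).then_with(|| a.symbol.cmp(&b.symbol)));
    rows
}
```

Calling `aggregate(&["hot_c", "spec_next", "hot_c"])` would put `hot_c` first with a count of 2. A live view would rerun this (or update the map incrementally) each time a new batch of samples arrives.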

Under the hood it uses:

  • Rust
  • libbpf-rs
  • eBPF stack sampling
  • Blazesym for symbolization
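
The list above covers the CPU side. The post does not say how the RSS and VIRT figures in the header line are collected. One common approach on Linux is to read `/proc/<pid>/statm`, whose first two fields are the total program size and the resident set size, both counted in pages. The sketch below assumes that approach and a 4 KiB page size. `MemUsage`, `parse_statm`, `read_mem_usage`, and `to_mib` are hypothetical names, not part of Nusku.

```rust
/// Memory figures for one process, in bytes. Hypothetical type for illustration.
#[derive(Debug, PartialEq)]
pub struct MemUsage {
    pub virt_bytes: u64,
    pub rss_bytes: u64,
}

/// Parse the contents of /proc/<pid>/statm.
/// Field 1 is total program size, field 2 is resident set size, both in pages.
/// Returns None if either field is missing or not a number.
pub fn parse_statm(contents: &str, page_size: u64) -> Option<MemUsage> {
    let mut fields = contents.split_whitespace();
    let size_pages: u64 = fields.next()?.parse().ok()?;
    let resident_pages: u64 = fields.next()?.parse().ok()?;
    Some(MemUsage {
        virt_bytes: size_pages * page_size,
        rss_bytes: resident_pages * page_size,
    })
}

/// Read memory usage for a live process.
/// ASSUMPTION: pages are 4096 bytes. A real tool should query the page size
/// from the system instead of hard-coding it.
pub fn read_mem_usage(pid: u32) -> Option<MemUsage> {
    let contents = std::fs::read_to_string(format!("/proc/{}/statm", pid)).ok()?;
    parse_statm(&contents, 4096)
}

/// Convert bytes to MiB for display.
pub fn to_mib(bytes: u64) -> f64 {
    bytes as f64 / (1024.0 * 1024.0)
}
```

Keeping the parsing separate from the file read makes it easy to test without a running process.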

What’s next

Right now the output is pretty basic.

Next steps include:

  • building a proper terminal UI (Ratatui)
  • improving symbol aggregation
  • better stack analysis
  • eventually generating flamegraphs

The long-term vision is a tool that feels as simple as:

  • run with pid:
nusku --pid 1234
  • run binary:
nusku ./my-app
  • run with a command:
nusku -c node app.js

and immediately shows you where your program is spending CPU time.
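
The three invocation styles above map naturally onto a small enum. Here is a rough sketch of how the arguments could be parsed using only the standard library. `Target` and `parse_target` are hypothetical names, and since these invocations are described as a long-term vision, this is an illustration of the idea and not the tool's real CLI code.

```rust
/// What the profiler should attach to or launch.
/// Hypothetical type for illustration only.
#[derive(Debug, PartialEq)]
pub enum Target {
    /// `nusku --pid 1234`: attach to a running process.
    Pid(u32),
    /// `nusku ./my-app`: launch a binary and profile it.
    Binary(String),
    /// `nusku -c node app.js`: launch a command with its arguments.
    Command(Vec<String>),
}

/// Parse the arguments that follow the program name.
pub fn parse_target(args: &[String]) -> Result<Target, String> {
    match args {
        // Exactly two arguments, the first being --pid.
        [flag, pid] if flag.as_str() == "--pid" => pid
            .parse::<u32>()
            .map(Target::Pid)
            .map_err(|e| format!("invalid pid {:?}: {}", pid, e)),
        // -c followed by at least one word: everything after -c is the command.
        [flag, rest @ ..] if flag.as_str() == "-c" && !rest.is_empty() => {
            Ok(Target::Command(rest.to_vec()))
        }
        // A single non-flag argument is treated as a path to a binary.
        [path] if !path.starts_with('-') => Ok(Target::Binary(path.clone())),
        _ => Err(
            "usage: nusku --pid <PID> | nusku <BINARY> | nusku -c <COMMAND>...".to_string(),
        ),
    }
}
```

In a real binary this would be fed with `std::env::args().skip(1).collect::<Vec<String>>()`. A crate such as clap would handle help text and edge cases better; the hand-rolled match just keeps the example dependency-free.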


Follow the project

I’ll be posting updates as I build this.

If you're interested in:

  • Rust
  • eBPF
  • performance debugging

follow along.

GitHub repo:
https://github.com/aliamerj/nusku

And if you have ideas or feedback, I’d love to hear them.
