Ali Amer

Posted on Apr 8

Why does performance debugging still suck so much?

#rust #performance #architecture #devtool

Recently I got laid off.

Weirdly, I didn’t feel stressed. I actually felt… excited.

For a long time I’ve wanted to build something meaningful in open source, but work always got in the way. Suddenly I had the time to try.

So I asked myself a question:

Why does performance debugging on Linux still suck?

I remember a moment back in college that stuck with me.

One semester we had to build a project using .NET on Windows.

It was the first time I had to seriously use Windows for development.

Honestly, I never liked .NET very much.

But there was one thing I really liked: Visual Studio.

When you run an application in the debugger, Visual Studio shows you live metrics:

CPU usage
memory usage
allocations
performance spikes

And the coolest part was watching what happened when you interacted with the app.

You could click a button in the UI and instantly see:

CPU usage spike
memory allocations change
performance graphs jump

That made performance debugging feel visual and intuitive.

You could literally see:

“I clicked this button… and something expensive just happened.”

Then you start digging:

Which function caused the spike?
Why did memory jump here?
What part of the code is slow?

It turned performance optimization into a kind of investigation.

And I loved that experience.

But there was a problem

Years later, working mostly on Linux systems, I realized something strange.

We have powerful tools:

perf
bpftrace
flamegraphs
eBPF

But none of them feel as simple and immediate as that Visual Studio experience.

Most tools are:

hard to set up
hard to understand
designed for experts

So I decided to build one.

The idea

I’m building a tool called Nusku.

The goal is simple:

A fast, modern performance profiler built with Rust + eBPF.

It should:

run instantly
require almost no setup
work on production systems
show useful information immediately

No complicated configuration.

No massive dashboards.

Just run it and see where your CPU time is going.

What I’ve built so far

So far Nusku can:

Attach to a running process
Sample CPU usage using eBPF perf events
Capture user-space stack traces
Symbolize addresses into real function names
Aggregate hot functions in real time
Show CPU usage and memory usage in a live terminal view

Example output:

── PID 132612 ── 98 samples ── CPU  99.0% ── RSS 2.0 MiB ── VIRT 3.1 MiB ──
     %     COUNT  FUNCTION                          SOURCE                           ADDRESS
────────────────────────────────────────────────────────────────────────────────────────────────
 27.6%        27  <core::ops::range::Range<T> as …  range.rs:773          0x00005efa8cfd4b95
 11.2%        11  testing::hot_c                    main.rs:7             0x00005efa8cfd4cc7
 10.2%        10  <i32 as core::iter::range::Step…  range.rs:197          0x00005efa8cfd4acd
 10.2%        10  <core::ops::range::Range<T> as …  range.rs:775          0x00005efa8cfd4b9f
  9.2%         9  testing::hot_c                    main.rs:8             0x00005efa8cfd4cf0
  6.1%         6  <core::ops::range::Range<T> as …  range.rs:771          0x00005efa8cfd4b69
  5.1%         5  <i32 as core::iter::range::Step…  range.rs:198          0x00005efa8cfd4b0b
  4.1%         4  core::hint::black_box             hint.rs:482           0x00005efa8cfd4d5a
  4.1%         4  core::iter::range::<impl core::…  range.rs:856          0x00005efa8cfd4b40
  3.1%         3  <core::ops::range::Range<T> as …  range.rs:776          0x00005efa8cfd4bba
  2.0%         2  <core::ops::range::Range<T> as …  range.rs:772          0x00005efa8cfd4b7a
  2.0%         2  <core::ops::range::Range<T> as …  range.rs:780          0x00005efa8cfd4bce
  2.0%         2  testing::hot_c                    main.rs:9             0x00005efa8cfd4d09
  2.0%         2  <i32 as core::iter::range::Step…  range.rs:195          0x00005efa8cfd4ac4
  1.0%         1  core::hint::black_box             hint.rs:483           0x00005efa8cfd4d64

Top frame:
  <core::ops::range::Range<T> as core::iter::range::RangeIteratorImpl>::spec_next
  /home/ali/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/iter/range.rs
  line 773
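
To make the COUNT and % columns above concrete, here is a minimal sketch of the aggregation step. It is not Nusku's actual code: `HotRow` and `aggregate_hot_functions` are hypothetical names, and the input is assumed to be one already-symbolized top-frame name per sample. It also groups by function name only, whereas the table above appears to group per source line and address.

```rust
use std::collections::HashMap;

/// One row of a hot-function table (hypothetical type, for illustration only).
#[derive(Debug, Clone, PartialEq)]
pub struct HotRow {
    pub function: String,
    pub count: u64,
    pub percent: f64,
}

/// Count how often each symbol shows up as the top frame, then rank by count.
/// `samples` is assumed to hold one symbolized top-frame name per sample.
pub fn aggregate_hot_functions(samples: &[&str]) -> Vec<HotRow> {
    let mut counts: HashMap<&str, u64> = HashMap::new();
    for name in samples {
        *counts.entry(*name).or_insert(0) += 1;
    }

    let total = samples.len() as f64;
    let mut rows: Vec<HotRow> = counts
        .into_iter()
        .map(|(function, count)| HotRow {
            function: function.to_string(),
            count,
            // Share of all samples; guard against an empty sample set.
            percent: if total > 0.0 { count as f64 * 100.0 / total } else { 0.0 },
        })
        .collect();

    // Highest count first; ties broken by name so the order stays stable between refreshes.
    rows.sort_by(|a, b| b.count.cmp(&a.count).then_with(|| a.function.cmp(&b.function)));
    rows
}
```

With 98 samples, a function seen 27 times gets 27 * 100 / 98, which is the 27.6% shown in the first row of the table.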

Everything runs in real time.
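
The RSS and VIRT figures in the header line can be read from procfs on Linux. The sketch below is one possible way to do it, not necessarily how Nusku does it: it parses the contents of `/proc/<pid>/statm`, whose first two fields are the total program size and the resident set size in pages. `MemUsage`, `parse_statm`, and `format_mib` are hypothetical names, and the page size is passed in as a parameter (4096 bytes is common on x86_64, but a real tool should query it from the system).

```rust
/// Resident and virtual memory sizes in bytes (hypothetical type, for illustration only).
#[derive(Debug, PartialEq)]
pub struct MemUsage {
    pub rss_bytes: u64,
    pub virt_bytes: u64,
}

/// Parse the text of `/proc/<pid>/statm`.
/// Field 1 is the total program size and field 2 the resident set size, both in pages.
pub fn parse_statm(contents: &str, page_size: u64) -> Option<MemUsage> {
    let mut fields = contents.split_whitespace();
    let virt_pages: u64 = fields.next()?.parse().ok()?;
    let rss_pages: u64 = fields.next()?.parse().ok()?;
    Some(MemUsage {
        rss_bytes: rss_pages * page_size,
        virt_bytes: virt_pages * page_size,
    })
}

/// Format a byte count as MiB with one decimal place.
pub fn format_mib(bytes: u64) -> String {
    format!("{:.1} MiB", bytes as f64 / (1024.0 * 1024.0))
}
```

In a live view, the caller would read the file with `std::fs::read_to_string(format!("/proc/{pid}/statm"))` on every refresh and pass the text to `parse_statm`.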

Under the hood it uses:

Rust
libbpf-rs
eBPF stack sampling
Blazesym for symbolization

What’s next

Right now the output is pretty basic.

Next steps include:

building a proper terminal UI (Ratatui)
improving symbol aggregation
better stack analysis
eventually generating flamegraphs

The long-term vision is a tool that feels as simple as:

run with pid:

nusku --pid 1234

run binary:

nusku ./my-app

run with a command:

nusku -c node app.js

and immediately shows you where your program is spending CPU time.
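
The three invocations above map naturally onto a small enum. This is only a sketch of how the argument handling could look, using the standard library alone. `Target` and `parse_target` are hypothetical names, and the real CLI may parse its arguments differently (for example with an argument-parsing crate).

```rust
/// What the profiler should attach to; mirrors the three invocations above.
/// (Hypothetical type, for illustration only.)
#[derive(Debug, PartialEq)]
pub enum Target {
    /// `nusku --pid 1234`
    Pid(u32),
    /// `nusku ./my-app`
    Binary(String),
    /// `nusku -c node app.js`
    Command(Vec<String>),
}

/// Parse the arguments that follow the program name.
pub fn parse_target(args: &[String]) -> Result<Target, String> {
    match args {
        // Exactly `--pid <PID>`; the PID must be a valid unsigned integer.
        [flag, pid] if flag == "--pid" => pid
            .parse::<u32>()
            .map(Target::Pid)
            .map_err(|e| format!("invalid pid {pid:?}: {e}")),
        // `-c` followed by at least one word: the command and its arguments.
        [flag, cmd @ ..] if flag == "-c" && !cmd.is_empty() => {
            Ok(Target::Command(cmd.to_vec()))
        }
        // A single non-flag argument is treated as a path to a binary.
        [path] if !path.starts_with('-') => Ok(Target::Binary(path.clone())),
        _ => Err("usage: nusku --pid <PID> | nusku <BINARY> | nusku -c <COMMAND>...".to_string()),
    }
}
```

A `main` function would collect `std::env::args().skip(1)` into a `Vec<String>` and pass it to `parse_target`, printing the usage message on error.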

Follow the project

I’ll be posting updates as I build this.

If you're interested in:

Rust
eBPF
performance debugging

follow along.

GitHub repo:
https://github.com/aliamerj/nusku

And if you have ideas or feedback, I’d love to hear them.
