Wes
Finding Blocking Code in Async Rust Without Changing a Single Line

You know the symptoms. Latency spikes under load. Throughput that should be higher. A Tokio runtime that's doing less work than it should be, and you can't see why. Something is blocking a worker thread, starving the other tasks, and nobody's throwing an error about it.

The standard advice is tokio-console. Add console-subscriber to your dependencies, rebuild, redeploy, reproduce the problem, and look at task poll times. It works well. It also requires code changes, a rebuild, and a redeployment, which means it's not what you reach for when staging is melting and you need answers now.

The other option is perf. Attach to the process, collect stack traces, generate a flamegraph, and interpret a wall of unsymbolized frames. It'll tell you everything that's happening on every thread. The signal-to-noise ratio for "which Tokio worker is blocked and by what" is not great.

There's a gap between those two: a tool that attaches to a running Tokio process, finds the blocking code, and shows you the result, without touching your source.

What Is hud?

hud is an eBPF-based profiler for Tokio applications, built by cong-or. You give it a process name or PID, and it hooks into the Linux scheduler via eBPF tracepoints to detect when Tokio worker threads experience high scheduling latency. When a worker is off-CPU longer than a configurable threshold (default 5ms), hud captures a stack trace, resolves it against DWARF debug symbols, and shows you what was on the stack. No recompile, no instrumentation, no code changes.
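The core check is simple to state: record when a worker thread is switched out, and if the gap before it runs again exceeds the threshold, capture a stack. Here's a minimal sketch of that logic in plain Rust. The names (`OFF_CPU_THRESHOLD_NS`, `check_off_cpu`) are illustrative, not hud's actual identifiers; hud does this inside the kernel with per-thread timestamp maps.

```rust
const OFF_CPU_THRESHOLD_NS: u64 = 5_000_000; // 5 ms, hud's default threshold

/// Returns Some(duration) when a worker was off-CPU longer than the
/// threshold, i.e. when a stack capture should be triggered.
fn check_off_cpu(switched_out_ns: u64, switched_in_ns: u64) -> Option<u64> {
    let off_cpu_ns = switched_in_ns.saturating_sub(switched_out_ns);
    (off_cpu_ns > OFF_CPU_THRESHOLD_NS).then_some(off_cpu_ns)
}

fn main() {
    // 2 ms gap: below threshold, no event
    assert_eq!(check_off_cpu(1_000_000, 3_000_000), None);
    // 12 ms gap: above threshold, capture a stack
    assert_eq!(check_off_cpu(1_000_000, 13_000_000), Some(12_000_000));
    println!("ok");
}
```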

It runs as a real-time TUI or in headless mode with Chrome Trace JSON export. About 147 stars. For the problem it solves, it should have more.

The Snapshot

Project: hud
Stars: ~147 at time of writing
Maintainer: Solo developer, 178 commits and 15 releases in 3 months
Code health: Clean workspace, good module boundaries, well-documented internals
Docs: Five dedicated doc files (architecture, development, exports, troubleshooting, tuning)
Contributor UX: Both PRs merged within minutes. Would contribute again.
Worth using: Yes, if you run Tokio on Linux and have ever wondered "what's blocking?"

Under the Hood

The project is a Rust workspace with three crates. hud-ebpf (~400 lines, #![no_std]) runs inside the kernel: a sched_switch tracepoint for off-CPU detection and a perf_event hook sampling at 99 Hz for stack traces. hud-common (~330 lines) defines the shared types that cross the kernel/userspace boundary. hud (~8,700 lines) is the userspace application: event processing, DWARF symbol resolution, a ratatui TUI, and Chrome Trace export. The whole thing builds with cargo xtask build-ebpf for the eBPF side and a regular cargo build for userspace.
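The reason a tiny shared crate like hud-common exists is layout: events produced in the kernel are read back as raw bytes in userspace, so both sides need an identical, predictable struct layout. Here's a hypothetical sketch of that kind of type; the field names are invented for illustration, not hud-common's actual definitions.

```rust
// A #[repr(C)] plain-old-data event type: the same layout is valid on both
// sides of the eBPF/userspace boundary, which is what makes a raw byte copy
// safe to reinterpret.
#[repr(C)]
#[derive(Clone, Copy, Debug)]
struct BlockingEvent {
    pid: u32,
    tid: u32,
    cpu_id: u32,
    _pad: u32,       // explicit padding keeps the layout predictable
    off_cpu_ns: u64, // how long the worker was off-CPU
    stack_id: i64,   // handle into the eBPF stack-trace map
}

fn main() {
    // No compiler-inserted padding surprises: 4 * 4 + 8 + 8 = 32 bytes.
    assert_eq!(std::mem::size_of::<BlockingEvent>(), 32);
    assert_eq!(std::mem::align_of::<BlockingEvent>(), 8);
    println!("ok");
}
```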

The interesting engineering starts with worker discovery. Tokio worker threads need to be identified before hud can filter events to just the runtime. This turns out to be harder than it sounds. The first problem is /proc's 15-character TASK_COMM_LEN limit, which truncates tokio-runtime-worker-0 to tokio-runtime-w. The second is custom runtimes: if you called thread_name("my-pool"), the default prefixes don't match. hud handles this with a 4-step fallback chain: explicit prefix via --workers, default Tokio prefixes, stack-based classification (sample for 500ms and look for Tokio scheduler frames), and a largest-thread-group heuristic. That last one just picks the biggest group of threads following a {name}-{N} naming pattern.
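The last-resort heuristic is easy to sketch: strip a trailing `-{N}` suffix from each thread name and pick the prefix with the most members. This is an illustrative reimplementation of the idea, not hud's actual code.

```rust
use std::collections::HashMap;

/// If `name` ends in `-{N}` for some number N, return the prefix.
fn group_prefix(name: &str) -> Option<&str> {
    let (prefix, suffix) = name.rsplit_once('-')?;
    (!suffix.is_empty() && suffix.chars().all(|c| c.is_ascii_digit()))
        .then_some(prefix)
}

/// Pick the thread-name prefix with the most `{name}-{N}` members.
fn largest_thread_group<'a>(names: &[&'a str]) -> Option<&'a str> {
    let mut counts: HashMap<&str, usize> = HashMap::new();
    for name in names {
        if let Some(prefix) = group_prefix(name) {
            *counts.entry(prefix).or_insert(0) += 1;
        }
    }
    counts.into_iter().max_by_key(|&(_, n)| n).map(|(p, _)| p)
}

fn main() {
    // A custom runtime named via thread_name("my-pool"): no default Tokio
    // prefix matches, but the biggest {name}-{N} group wins.
    let threads = ["main", "my-pool-0", "my-pool-1", "my-pool-2", "gc-0"];
    assert_eq!(largest_thread_group(&threads), Some("my-pool"));
    println!("ok");
}
```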

Frame classification has its own complexity. Rust statically links dependencies into the main binary, so being "inside the executable" doesn't distinguish your code from tokio's code from serde's code. hud uses a 3-tier classifier: file path patterns first (.cargo/registry/ means third-party, .rustup/toolchains/ means stdlib), then function name prefixes (tokio::, std::, hyper::), then memory range as a last resort. The TUI highlights user code in green and dims everything else.
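The tier ordering is the important part: path evidence beats name evidence beats fallbacks. A minimal sketch of that ordering, with invented identifiers (hud's real classifier falls back to the binary's memory ranges rather than defaulting to user code):

```rust
#[derive(Debug, PartialEq)]
enum FrameKind { User, ThirdParty, Stdlib, Runtime }

fn classify(symbol: &str, file: Option<&str>) -> FrameKind {
    // Tier 1: source file path patterns from DWARF info
    if let Some(path) = file {
        if path.contains(".cargo/registry/") {
            return FrameKind::ThirdParty;
        }
        if path.contains(".rustup/toolchains/") {
            return FrameKind::Stdlib;
        }
    }
    // Tier 2: demangled function name prefixes
    if symbol.starts_with("tokio::") || symbol.starts_with("hyper::") {
        return FrameKind::Runtime;
    }
    if symbol.starts_with("std::") || symbol.starts_with("core::") {
        return FrameKind::Stdlib;
    }
    // Tier 3: last resort (hud consults memory ranges here instead)
    FrameKind::User
}

fn main() {
    assert_eq!(classify("my_app::handler", Some("src/main.rs")), FrameKind::User);
    assert_eq!(
        classify("serde::de::Deserialize", Some("/home/u/.cargo/registry/src/serde/de.rs")),
        FrameKind::ThirdParty
    );
    assert_eq!(classify("tokio::runtime::park", None), FrameKind::Runtime);
    println!("ok");
}
```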

The README is refreshingly honest about limitations. It measures scheduling latency, which is a symptom of blocking, not the blocking itself. It captures the victim's stack, not the blocker's. System CPU pressure can cause false positives. The comparison table with tokio-console and Tokio's built-in detection doesn't oversell hud. It positions it as a triage tool: narrow down the suspects, then dig deeper with instrumentation if needed.

The rough spots are minor. Test coverage is decent for the core modules (classification, worker discovery, hotspot analysis) but thin for the event processing pipeline and TUI rendering. The project is three months old and iterating fast (15 releases), so some gaps are expected. The docs make up for it: five dedicated files covering architecture, development workflow, export format, troubleshooting, and threshold tuning. That's unusual care for a project at this scale.

The Contribution

I submitted two PRs, targeting different layers of the stack.

The first was test coverage for the blocking pool filter. Tokio's spawn_blocking creates threads whose stacks bottom out in the same Inner::run function as actual worker threads, because Tokio bootstraps workers through the blocking pool mechanism. The distinguishing factor is that workers also have scheduler::multi_thread::worker frames higher up the stack. The is_blocking_pool_stack() function filters on this distinction to suppress spawn_blocking noise from the TUI.
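A minimal sketch of the distinction the function draws, based on the description above (the frame strings are simplified demangled names, not hud's exact matching logic):

```rust
/// A stack that contains the blocking pool's `Inner::run` but no
/// multi-thread scheduler worker frame belongs to a spawn_blocking
/// thread, not a genuine runtime worker.
fn is_blocking_pool_stack(frames: &[&str]) -> bool {
    let has_pool_run = frames
        .iter()
        .any(|f| f.contains("blocking::pool::Inner::run"));
    let has_worker = frames
        .iter()
        .any(|f| f.contains("scheduler::multi_thread::worker"));
    has_pool_run && !has_worker
}

fn main() {
    // spawn_blocking thread: pool bootstrap frame only -> suppress
    assert!(is_blocking_pool_stack(&[
        "my_app::load_file",
        "tokio::runtime::blocking::pool::Inner::run",
    ]));
    // genuine worker: bootstrapped through the pool, but running the
    // scheduler higher up the stack -> keep
    assert!(!is_blocking_pool_stack(&[
        "tokio::runtime::scheduler::multi_thread::worker::run",
        "tokio::runtime::blocking::pool::Inner::run",
    ]));
    println!("ok");
}
```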

This function went through four release iterations (v0.4.2 through v0.5.0) in response to a bug report where spawn_blocking tasks were showing up as false positives. The maintainer shipped multiple fix releases in rapid succession. But the function had zero test coverage. I added 9 tests covering the core logic: genuine blocking pool stacks, genuine worker stacks, empty stacks, partial matches, closure wrappers, and two realistic deep-stack scenarios. I bundled in doc fixes where TROUBLESHOOTING.md listed 3 worker discovery steps instead of the actual 4, and where the README said "x86_64 architecture" while every other doc said "x86_64/aarch64."

The second PR was an eBPF fix. The get_cpu_id() function in the kernel-side code always returned 0, with a TODO comment saying "aya-ebpf doesn't expose bpf_get_smp_processor_id directly yet." It does. The helper is re-exported through pub use gen::* in the aya-ebpf helpers module, but it's #[doc(hidden)], so it never shows up in the generated docs. The fix was adding an import and replacing the stub with the real call. Three lines changed. Every exported trace event was silently reporting the wrong CPU core.

Both PRs were merged within minutes. The codebase was easy to navigate: clear module boundaries, descriptive file names, good internal documentation. The eBPF side requires nightly Rust and bpf-linker, which adds setup friction, but the build process is documented and worked on the first try.

The Verdict

hud is for Rust developers running Tokio on Linux who want to understand what's blocking their runtime without adding instrumentation. The workflow is sudo hud my-app and you're looking at results. If you've ever stared at a flamegraph trying to figure out which of those Tokio frames is yours, hud does that filtering for you.

The project is young (three months) and solo-maintained, but the trajectory is strong. The commit history shows a developer who responds to bug reports with same-day fix releases, who writes honest documentation about tradeoffs, and who merges external contributions without friction. The codebase is clean enough that I was reading eBPF kernel code within an hour of cloning the repo. That doesn't happen by accident.

What would push hud further? More metrics in the TUI (per-CPU breakdown, timeline visualization of blocking events), broader async runtime support beyond Tokio, and CI integration for the headless export mode (pipe the JSON through jq for regression detection). The architecture supports all of this. The metric is indirect by design, and the project is honest about that. What it offers in return is zero-friction access to information that would otherwise require a rebuild.

Go Look At This

If you run Tokio on Linux, try hud. Download the pre-built binary, point it at a running process, and see what shows up. If nothing does, your runtime is clean. If something does, you just saved yourself a rebuild.

Star the repo. Here are the tests I added for the blocking pool filter and the eBPF fix for the cpu_id stub. Both small, both merged.

This is Review Bomb #7, a series where I find under-the-radar projects on GitHub, read the code, contribute something, and write it up. If you know a project that deserves more eyeballs, drop it in the comments.
