DTrace

#unix #freebsd #observability #programming

Technical Beauty — Episode 30

In 2001, Bryan Cantrill was debugging a production system at Sun Microsystems. The system was misbehaving. The logs said nothing useful. The metrics showed nothing abnormal. By every observable measure, the system was fine. Except it was not fine. It was doing something wrong, and nobody could see what.

Cantrill had helped build an entirely synthetic system: every instruction, every data structure, every byte was placed there by human beings. And yet the system could not answer the most basic question: what are you doing right now?

That bothered him rather a lot.

The Problem

Debugging production systems has always offered two options, both terrible.

Option one: add logging statements, rebuild, redeploy, wait for the issue to recur. This works in development. In production, it means restarting a database serving ten thousand connections to add a printf. The cure is worse than the disease. And if you guessed wrong about where to log, you restart again.

Option two: attach a debugger. Stop the process, inspect memory, step through code. This works beautifully on a developer's laptop. On a production trading system processing four million transactions per hour, stopping the process is not debugging. It is an outage.

Unix has had syscall tracing since at least the early 1990s. truss on Solaris and FreeBSD. strace on Linux, written by Paul Kranenburg for SunOS in 1991, ported to Linux by Branko Lankester. strace intercepts system calls via ptrace; truss uses ptrace or the /proc interface, depending on the platform. Either way, the traced process is stopped for every single syscall, control switches to the tracer, the call is recorded, and the process resumes. The overhead is brutal. Running strace on a production web server is rather like performing surgery whilst repeatedly switching off the lights.

strace traces the processes you point it at, and nothing else. It cannot follow what happens inside the kernel. It cannot correlate across services. It tells you what happened, but not why. And it makes everything slower in the process of telling you.

The Solution

DTrace does not stop anything.

Bryan Cantrill, Mike Shapiro, and Adam Leventhal designed and built DTrace at Sun Microsystems. The original ideation began in the late 1990s. The implementation was first integrated into Solaris in September 2003. It shipped with Solaris 10 in January 2005. Development took approximately four years of focused engineering.

The tagline: "Concise answers to arbitrary questions about the system."

DTrace works by compiling probe scripts, written in a purpose-built language called D, into safe bytecode that is injected directly into the running kernel. When a probe fires, the bytecode executes in kernel context: no context switch, no process stop, no overhead beyond the probe itself.
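As a rough illustration of what such a probe script looks like, here is a sketch of a single clause. The process name sshd and the aggregation name are arbitrary choices for the example, and it assumes root privileges on a DTrace-capable kernel.

```sh
# A probe is named provider:module:function:name; an empty field matches anything.
# Here: the syscall provider, any module, the read function, the entry probe.
# The predicate between the slashes is evaluated in kernel context when the probe fires.
# The action in braces runs only if the predicate is true.
# arg2 is the third argument to read(2): the number of bytes requested.
# "sshd" is a hypothetical process name chosen for illustration.
dtrace -n 'syscall::read:entry /execname == "sshd"/ { @bytes_requested = quantize(arg2); }'
```

Press Ctrl-C to stop tracing; dtrace then prints the aggregation as a power-of-two histogram.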

When a probe is not enabled, the overhead is zero. Not low. Not negligible. Zero. The original machine instruction runs unmodified. The probe point does not exist until you enable it. This is not an optimisation. It is the architecture.
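One way to get a feel for this, assuming root on a DTrace-capable system, is to list probes rather than enable them. Listing instruments nothing, and the counts will differ from kernel to kernel.

```sh
# List the syscall entry probes without enabling any of them.
dtrace -l -n 'syscall:::entry'

# Count the probes the fbt (function boundary tracing) provider publishes
# on this kernel. The output includes one header line.
dtrace -l -P fbt | wc -l
```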

The Language

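The D language is deliberately constrained. It cannot allocate memory. It cannot loop infinitely, because it has no loops. It cannot take the kernel down by dereferencing an invalid pointer: the fault is caught and reported as an error. It cannot modify kernel state unless destructive actions are explicitly enabled. It cannot crash the system.

These are not limitations. They are the reason DTrace is safe for production. The language was designed so that any valid D program is guaranteed not to harm the system it observes. Safety is not a convention the script author must remember to follow. It is enforced by design: the kernel validates the bytecode before running it, and the in-kernel interpreter catches illegal loads at run time.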
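The constraints shape how scripts are written. A sketch, assuming root and a DTrace-capable kernel: measuring read latency without a loop in sight. State is carried in a thread-local variable, and the summary is built by an aggregation. The names ts and latency_ns are arbitrary.

```sh
# On entry, remember the timestamp in a thread-local variable (self->ts).
# On return, if this thread recorded an entry, add the elapsed nanoseconds
# to a per-process histogram, then clear the variable to release it.
dtrace -n '
syscall::read:entry
{
    self->ts = timestamp;
}

syscall::read:return
/self->ts/
{
    @latency_ns[execname] = quantize(timestamp - self->ts);
    self->ts = 0;
}'
```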

A DTrace one-liner to count system calls by process on a running FreeBSD server:

dtrace -n 'syscall:::entry { @[execname] = count(); }'

No recompilation. No restart. No risk. The answer appears whilst the system continues serving traffic.
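A variation on the same idea, sketched under the same assumptions (root, DTrace available): watch one process for ten seconds and count its system calls by name. The PID 1234 is a placeholder; substitute the process you care about.

```sh
# -p attaches to an existing process and sets the $target macro to its PID.
# The predicate restricts the probe to that process.
# probefunc is the name of the system call that fired.
# tick-10s fires once after ten seconds and ends the session.
dtrace -p 1234 -n '
syscall:::entry
/pid == $target/
{
    @[probefunc] = count();
}

tick-10s
{
    exit(0);
}'
```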

The Proof

In 2006, DTrace won the Wall Street Journal's Technology Innovation Award, Gold. Not for a consumer product. Not for an app. For a kernel instrumentation framework that lets you ask a running operating system what it is doing. One does find that rather encouraging about the state of technology journalism, at least in 2006.

FreeBSD integrated DTrace in 2008. It ships in base. No packages to install, no modules to compile. macOS has included DTrace since Leopard (2007): every Mac ships with a kernel-level dynamic tracing framework that most users will never know exists. illumos, the open-source continuation of OpenSolaris, inherited DTrace from its birthplace.

On Linux, DTrace's influence is unmistakable. eBPF (extended Berkeley Packet Filter) and its higher-level interface bpftrace provide similar capabilities: safe bytecode execution in the kernel, dynamic instrumentation, production-safe tracing. Brendan Gregg, who co-wrote the definitive DTrace book with Jim Mauro, now writes the definitive eBPF material. The ideas travelled. The architecture was validated by imitation.
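The family resemblance is easiest to see side by side. A rough bpftrace counterpart to the DTrace one-liner that counts system calls by process, assuming a Linux kernel with eBPF support, bpftrace installed, and root:

```sh
# Count system calls by process name on Linux.
# comm is bpftrace's equivalent of DTrace's execname.
# @[...] = count() is the same aggregation idiom.
bpftrace -e 'tracepoint:raw_syscalls:sys_enter { @[comm] = count(); }'
```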

The Philosophy

Cantrill articulated the core insight clearly: "We had created an entirely synthetic system, yet we could not ask ourselves what the software was doing. There was a real lack of observability in the system."

Observability is not logging. Logging records what the developer anticipated might go wrong. Observability answers questions the developer never thought to ask. The difference is the difference between a searchlight and the sun.

DTrace does not require you to predict your questions in advance. You do not instrument your code for DTrace. You do not add tracing libraries. You do not configure exporters. The probes exist at every function boundary, every syscall, every I/O operation. You simply ask.
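By way of example, here is a question nobody instrumented for in advance: which kernel code paths are issuing disk I/O right now? A sketch, assuming root and a kernel that provides the io provider:

```sh
# io:::start fires when an I/O request is about to be sent to a device.
# stack() records the kernel stack at that moment.
# The aggregation counts how often each distinct stack was seen.
dtrace -n 'io:::start { @[stack()] = count(); }'
```

Press Ctrl-C to stop; the most frequent stacks are printed last.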

This is what makes DTrace beautiful: it treats the running system as something that should be fully transparent to the operator. Not partially visible through pre-configured dashboards. Not indirectly observable through aggregated metrics. Directly, completely, safely observable. In production. Under load. Right now.

The Point

Twenty-three years after its first integration, DTrace remains the standard against which production tracing tools are measured. Its core design has not changed because it did not need to. Zero overhead when disabled. Safe by construction. Concise answers to arbitrary questions.

Bryan Cantrill is now co-founder and CTO of Oxide Computer Company, building rack-scale computers with the same philosophy: the system should be fully observable, fully debuggable, and fully understood. The principle survived the company that created it. One does find that rather beautiful.

DTrace. Ask the system. It will answer.


By Vivian Voss, System Architect & Software Developer.