"Zero-cost abstractions" is one of Rust's core promises. You can write generic, composable, high-level code and the compiler will produce the same machine code as if you had written the low-level version by hand. Most people accept this claim and move on. It is worth actually understanding the mechanism behind it, because it has real consequences for compile times, binary size, and performance in ways that will eventually surprise you.
The Problem Abstractions Usually Create
In most languages with generics or polymorphism, the runtime pays for flexibility. Java's generics are erased at compile time: type parameters become Object (or their upper bound), which forces boxing of primitives and calls through virtual dispatch. Python functions accept anything but pay for it with dynamic type lookups on every operation. C++ templates do avoid the runtime cost, and the mechanism they use, template instantiation, is essentially the same one Rust uses.
The question is: how do you write one function that works on many types, without inserting a layer of indirection at runtime?
What Monomorphization Is
Monomorphization means the compiler generates a separate, concrete copy of a generic function for each type it is actually called with. You write one function. The compiler writes many.
fn largest<T: PartialOrd>(a: T, b: T) -> T {
    if a > b { a } else { b }
}

fn main() {
    largest(1i32, 2i32);
    largest(1.0f64, 2.0f64);
}
After monomorphization, the compiler has effectively produced this:
fn largest_i32(a: i32, b: i32) -> i32 {
    if a > b { a } else { b }
}

fn largest_f64(a: f64, b: f64) -> f64 {
    if a > b { a } else { b }
}
Each version is fully concrete. The CPU never sees a generic type. There is no branching on type, no vtable lookup, no boxing. The generated assembly for largest_i32 is exactly what you would get if you had written it for i32 from the start.
You can verify this yourself. Run cargo build --release and inspect the output with cargo-show-asm or objdump. Each monomorphized version shows up as a distinct symbol in the binary.
How Traits Fit Into This
Traits are the constraint mechanism. When you write T: PartialOrd, you are telling the compiler what operations are available on T. At the call site, the compiler knows the concrete type, so it knows the exact implementation of PartialOrd to use. It inlines it directly.
This is what makes iterator chains fast:
let sum: i32 = (0..1000)
    .filter(|x| x % 2 == 0)
    .map(|x| x * x)
    .sum();
Each of filter, map, and sum is generic over the iterator type. After monomorphization, the entire chain collapses into a single loop with no intermediate allocation and no function call overhead. The compiler sees through every layer. It knows the concrete type at each step because you started with Range<i32>, and each adapter wraps the previous one in a new concrete type.
Compare this to how you would write the equivalent in a language without monomorphization. You would either materialize an intermediate collection at each step, or you would use some form of dynamic dispatch with its associated overhead.
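As a sanity check, the chain above computes the same thing as the explicit loop the compiler effectively reduces it to; a sketch:

```rust
// What the compiler effectively produces for the chain above:
// one loop over the range, with the filter and map bodies inlined.
fn sum_even_squares() -> i32 {
    let mut sum = 0;
    for x in 0..1000 {
        if x % 2 == 0 {
            sum += x * x;
        }
    }
    sum
}

fn main() {
    let chained: i32 = (0..1000)
        .filter(|x| x % 2 == 0)
        .map(|x| x * x)
        .sum();
    assert_eq!(chained, sum_even_squares());
}
```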
The Actual Cost: Code Bloat and Compile Time
Zero-cost at runtime does not mean zero-cost overall. You pay elsewhere.
Binary size. Every unique combination of generic function and concrete type produces a new copy in the binary. If you call Vec::<String>::new(), Vec::<i32>::new(), and Vec::<MyStruct>::new(), you get three copies of the Vec::new implementation. In a large codebase with many generic types, this adds up. It is a real concern for embedded targets and WebAssembly where binary size matters.
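One common mitigation, used throughout the standard library, is to keep the generic outer function thin and push the real work into a non-generic inner function. The normalize helper below is a made-up example of the pattern, not a real API:

```rust
use std::path::{Path, PathBuf};

// Generic outer function: monomorphized once per caller type,
// but each copy only does the cheap conversion to &Path.
fn normalize<P: AsRef<Path>>(path: P) -> PathBuf {
    // Non-generic inner function: compiled exactly once,
    // no matter how many types `normalize` is called with.
    fn inner(path: &Path) -> PathBuf {
        // components() drops redundant separators and interior `.` segments
        path.components().collect()
    }
    inner(path.as_ref())
}

fn main() {
    // Two instantiations of the thin shell, one copy of `inner`.
    assert_eq!(normalize("a/./b"), PathBuf::from("a/b"));
    assert_eq!(normalize(String::from("a//b")), PathBuf::from("a/b"));
}
```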
Compile time. Monomorphization happens during code generation, which is one of the most expensive phases of Rust compilation. Every unique instantiation of a generic function is work the compiler has to do. This is a significant contributor to Rust's notoriously slow compile times. When you see a crate that takes thirty seconds to compile, a large part of that is the code generation phase expanding generics.
Instruction cache pressure. Multiple copies of similar functions means more machine code in memory. If those functions are all hot paths called in rapid succession, you can end up with instruction cache misses that would not exist with a single shared implementation.
Static Dispatch vs Dynamic Dispatch
Monomorphization is static dispatch. The call target is resolved at compile time. Rust also supports dynamic dispatch through trait objects, and the tradeoff is exactly what you would expect.
// static dispatch: monomorphized, zero runtime overhead
fn process_static<T: Draw>(item: T) {
    item.draw();
}

// dynamic dispatch: single function, vtable lookup at runtime
fn process_dynamic(item: &dyn Draw) {
    item.draw();
}
With &dyn Draw, the compiler generates one function. At runtime, a vtable pointer accompanies the reference, and the call goes through an indirect jump. This costs roughly one extra memory lookup per call, and, often more significantly, it prevents the compiler from inlining the callee. In exchange, you get a single copy of process_dynamic in the binary, and you can put different concrete types into the same collection.
The practical rule: use static dispatch (generics) for performance-critical code where all types are known at compile time. Use dynamic dispatch (dyn Trait) when you need heterogeneous collections, when the concrete type is only known at runtime, or when binary size and compile time matter more than the last bit of call overhead.
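To make the tradeoff concrete, here is a minimal sketch of the heterogeneous-collection case that static dispatch cannot express directly. The Draw trait and shape types are invented for illustration:

```rust
trait Draw {
    fn draw(&self) -> String;
}

struct Circle;
struct Square;

impl Draw for Circle {
    fn draw(&self) -> String { "circle".to_string() }
}
impl Draw for Square {
    fn draw(&self) -> String { "square".to_string() }
}

fn main() {
    // One Vec holding different concrete types: each element carries
    // a vtable pointer, and draw() is resolved through it at runtime.
    let shapes: Vec<Box<dyn Draw>> = vec![Box::new(Circle), Box::new(Square)];
    let drawn: Vec<String> = shapes.iter().map(|s| s.draw()).collect();
    assert_eq!(drawn, vec!["circle", "square"]);
}
```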
Where Monomorphization Gets Interesting: Trait Objects in Generic Contexts
A subtlety that catches people: trait objects themselves can be used as concrete types in generic contexts, which then get monomorphized.
fn process<T: Draw>(items: Vec<T>) { ... }
// This is one monomorphization
process::<Box<dyn Draw>>(vec![...]);
Here T is Box<dyn Draw>. The function is monomorphized for that concrete type. Inside the function, calls through T still go through the vtable because T is a trait object. You have static dispatch to the function, and dynamic dispatch inside it. Understanding this layering matters when you are trying to reason about where the overhead actually lives.
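A runnable sketch of this layering follows. Note one detail the snippet above glosses over: Box<dyn Draw> does not implement Draw automatically, so a blanket impl for boxed values has to be written before the generic call compiles. The Draw trait here is assumed for illustration:

```rust
trait Draw {
    fn draw(&self) -> &'static str;
}

struct Circle;
impl Draw for Circle {
    fn draw(&self) -> &'static str { "circle" }
}

// Box<dyn Draw> does not implement Draw on its own; this blanket
// impl forwards the call, and the forwarding is where dynamic
// dispatch happens (through the vtable behind the box).
impl<T: Draw + ?Sized> Draw for Box<T> {
    fn draw(&self) -> &'static str {
        (**self).draw()
    }
}

// Monomorphized once for T = Box<dyn Draw>: the call *into* process
// is static; the draw() calls inside still go through the vtable.
fn process<T: Draw>(items: Vec<T>) -> Vec<&'static str> {
    items.iter().map(|item| item.draw()).collect()
}

fn main() {
    let boxed: Vec<Box<dyn Draw>> = vec![Box::new(Circle)];
    assert_eq!(process(boxed), vec!["circle"]);
}
```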
The impl Trait Shorthand
impl Trait in argument position is syntactic sugar for a generic parameter:
// these are equivalent, apart from how the caller can name the type
fn process(item: impl Draw) { ... }
fn process<T: Draw>(item: T) { ... }
Both are monomorphized. Each unique concrete type passed to process produces a new instantiation. impl Trait is not dynamic dispatch despite looking less generic at first glance. The one practical difference is that impl Trait gives the caller no type parameter to name, so turbofish syntax like process::<Circle>(...) is unavailable.
impl Trait in return position is different:
fn make_drawable() -> impl Draw {
    Circle::new()
}
This tells the caller "you will get something that implements Draw, but I am not telling you the concrete type." The concrete type is fixed at compile time and the function is not generic over it. The caller cannot name the type but the compiler knows it, so there is no dynamic dispatch. It is a way to hide implementation details while keeping static dispatch.
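A common concrete use is returning an iterator whose exact adapter type would be unpleasant to name; a sketch:

```rust
// The caller sees only "some Iterator over i32". The concrete type
// (a Filter adapter over a Range) is fixed at compile time, so calls
// to next() are still statically dispatched and can be inlined.
fn evens_up_to(limit: i32) -> impl Iterator<Item = i32> {
    (0..limit).filter(|x| x % 2 == 0)
}

fn main() {
    let v: Vec<i32> = evens_up_to(7).collect();
    assert_eq!(v, vec![0, 2, 4, 6]);
}
```

Without impl Trait, the return type here would be Filter<Range<i32>, {closure}>, which cannot be written out at all because closure types are unnameable.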
Inspecting the Output
The best way to build intuition for what monomorphization produces is to look at actual assembly. The cargo-show-asm tool makes this straightforward:
cargo install cargo-show-asm
cargo asm --release your_crate::your_function
Run it on a generic function called with two different types and you will see two separate assembly listings with distinct symbol names. Run it on the iterator chain example and you will see a tight loop with no function calls. The gap between what you wrote and what the CPU executes is much smaller in Rust than in most languages, and this tool makes that concrete rather than theoretical.
The Mental Model to Carry Forward
When you write generic Rust code, think of yourself as writing a template that the compiler will stamp out once per concrete type. Each stamp is as fast as hand-written code for that type. The tradeoff you are making is compilation time and binary size in exchange for runtime performance and the ability to write the logic once.
This is why Rust can match C performance while supporting generics, iterators, and trait-based abstractions. The abstraction cost is paid at compile time, not runtime. Zero-cost does not mean free. It means the cost is moved to a place where it hurts less.
Further Reading
- The Rust Reference: Monomorphization
- Rust Performance Book: Compile Times - practical techniques for managing the compile time cost
- Examining Assembly Output (Godbolt) - paste any snippet and compare generic vs concrete versions side by side
- Jon Gjengset's "Crust of Rust" on dispatch - the best video resource for understanding static vs dynamic dispatch in depth