Simmypeet

Posted on Jan 10

Rust 2.0: A Thought Experiment on Safe Specialization

#rust #theory #computerscience #programming

Specialization is a long-awaited feature in Rust that has never been stabilized. There have been many attempts to land this feature, but each comes with its own set of shortcomings and soundness holes.

In this blog post, I want to share an idea for impl specialization that has been floating around in my mind. We will walk through the trade-offs and try to explain exactly why specialization is such a challenging feature to add to the language.

Why Specialization?

One of the most compelling use cases for specialization is optimization. A concrete example is cloning a Vec<T>.

In general, cloning a Vec<T> works by iterating through each element and calling .clone() on it.

With specialization, we could write an extra, specialized case where T: Copy. When the compiler knows that T implements Copy, the Vec<T> implementation can optimize the cloning process by skipping the loop and using memcpy to bulk-copy the underlying memory buffer.

Primer: What's So Unsound About Specialization

The main specialization feature today hasn't been stabilized for many years
because it has some nuances with lifetimes. Mainly, it can lead to memory-safety
violations like use-after-free. I'll show you an example of such unsoundness
and then I'll explain why min_specialization has come to the rescue and
explores some alternatives.

We'll start of with this trait and two implementations:

#![feature(specialization)]

pub trait ItsABomb { fn boom(&mut self); }

impl<T> ItsABomb for T {
    default fn boom(&mut self) {
        println!("All is safe!");
    }
}

impl<T> ItsABomb for (T, T) {
    fn boom(&mut self) {
        std::mem::swap(&mut self.0, &mut self.1);
    }
}

Here's the example, simple enough. We have a trait ItsABomb with two
implementations.

The first general implementation works for all types T.
The second specialized implementation works for tuples of the form (T, T).

Here's the interesting part: the second implementation sees a tuple of
two identical types so it swaps the two elements in place.

Now, let's see how this can lead to unsoundness:

fn call_a_bomb<U>(bomb: &mut U) {
    bomb.boom();
}

fn create_a_bomb() -> &'static str {
    let local = "I'm a local string".to_string();

    let local_str = local.as_str();
    let static_str = "I'm a static string";

    // you can imagine both are of type (&'local str, &'static str)
    let mut pair = (local_str, static_str);

    // this calls the specialized impl that swaps the two elements
    call_a_bomb(&mut pair);

    // the second element becomes a "I'm a local string" now!
    println!("After boom: {}", pair.1);

    pair.1 // returning a reference to local data with no problem
}

fn main() {
    let leaked = create_a_bomb();

    // use-after-free occurs here!
    println!("Leaked string: {}", leaked);
}

Let's try put it all together and sees what happens here:

After boom: I'm a local string
Leaked string: #�'���+�

As of today (Jan 2025), the code compiles with flying colors and runs without
warnings (only saying that specialization is unsound, which it is :D), but
surely we hit a use-after-free at runtime!

If you have a chance to run this code, try it out yourself!, you'll see that the
it prints garbage for the leaked string.

What's Going On Here?

Let's break down what happened step by step:

create_a_bomb function create a tuple of two references: one to a local string and another to a static string.
You could imagine the type of pair is (&'local str, &'static str).
create_a_bomb calls call_a_bomb, passing a mutable reference to pair.
Inside call_a_bomb, the specialized implementation of ItsABomb is invoked, which swaps the two elements of the tuple.
After the swap, pair.1 now points to the local string.
However, create_a_bomb doesn't know that the swap has occurred. It still sees that pair.1 has a type of &'static str.
create_a_bomb returns pair.1, which is now a reference to the local string

What Should Have Been Done?

Here's the kicker, if you try to add a bound to call_a_bomb like this:

fn call_a_bomb<U: ItsABomb>(bomb: &mut U) {
    bomb.boom();
}

And compile with the exact same code, the compiler will refuse to compile it:

error[E0515]: cannot return value referencing local variable `local`
  --> src/main.rs:38:5
   |
26 |     let local_str = local.as_str();
   |                     ----- `local` is borrowed here
...
38 |     pair.1 // returning a reference to local data with no problem
   |     ^^^^^^ returns a value referencing data owned by the current function

The compiler now correctly rejects the code. Why does a simple bound fix the
issue?

The reason is that by adding the bound U: ItsABomb, we force the compiler to
resolve the impl for U at the call site (inside create_a_bomb), not
inside call_a_bomb.

Inside create_a_bomb, the compiler can see more complete picture of the types
involved, including the lifetimes.

Here's why this prevents the unsoundness:

When create_a_bomb calls call_a_bomb, it sees that there's an obligation to find an impl for ItsABomb for the type (&'local str, &'static str).
The type-checker with complete knowledge of the lifetimes involved, sees that the type (&'local str, &'static str) is a tuple of two different lifetimes.
The type-checker however picks the (T, T) implementation, but with additional constraint.
- Some might confuse how can (T, T) match (&'local str, &'static str) since 'local and 'static are different lifetimes. The answer is that generally when two types are being unified, the compiler is allowed to ignore the differences in lifetimes, but it generates additional constraints related to lifetimes to be solved later.
In this case, type-checker makes two types equal, it coerces (shrinks) the 'static lifetime to 'local, resulting in the type (&'local str, &'local str).
Now, when create_a_bomb tries to return pair.1, the compiler correctly identifies that pair.1 has type &'local str, preventing the use-after-free.

Applying the same logic to our original unsound example, here's why the compiler
didn't catch the issue:

Inside call_a_bomb, when we call bomb.bomb(), the compiler tries to check that "are there any impl for ItsABomb for type U?"
However, at this point, the compiler has no knowledge of what U is but it sees that there is a general implementation impl<T> ItsABomb for T.
The compiler eagerly concludes that, "Yes! There's an implementation for U!" It might not exactly know which one, but it knows that there's at least one.
The compiler type-checks call_a_bomb without any further complains.
This time, when create_a_bomb calls call_a_bomb, the compiler has no obligation to resolve the impl for ItsABomb for type (&'local str, &'static str).
Under incomplete knowledge, type-checker doesn't do anything between 'local and 'static lifetimes, treating them as completely separate.
The type-checker allows create_a_bomb to return pair.1 as &'static str, leading to the use-after-free.
At monomorphization time, when everything is concrete (no more generics), the specialized implementation is preferred, leading to generating the code that swaps the two elements.

The Problem

The root cause of the unsoundness is that the type-checker doesn't see the
complete picture when resolving the impl for a generic type parameter.

As we've pointed out, in the call_a_bomb<U> function, the compiler has no
knowledge of what U is but it sees that there is a general implementation
impl<T> ItsABomb for T so it type-checks call_a_bomb without any further
obligations.

When create_a_bomb calls call_a_bomb, the type-checker sees that this
function can be called with any type U and without any further obligations to
the caller. But in reality, the fact that it solves to the specialized (T, T)
implementation it already introduces additional constraints on the lifetimes
that the type-checker is unaware of.

How Could We Fix This?

At this point, we have shown an example of unsound specialization and explained
one of the possible fixes (adding a bound to defer the resolution to the caller).

The Rust team has acknowledged this unsoundness and this is one of the reasons
why the current specialization feature is not stabilized and advocates for
using min_specialization instead. If we were to use min_specialization, the
above compiler correctly rejects the code but for a different reason.

In the previous example that we added a bound to call_a_bomb, the compiler
rejects the code because it sees that we're about to return a local reference
as a 'static reference.

But under min_specialization, even without adding the bound, the compiler
rejects the code under the reason that the specialized implementation uses
generic parameter T twice, which could potentially introduce constraints that
the type-checker might miss.

In my proposed design for specialization, I'd like to take the idea of "making
the resolution transparent to the caller" (like how we added a bound to
call_a_bomb) and make it a required part of the specialization system.

Specialization with Deferred Binding

We introduce a new rule: The compiler is only allowed to resolve an impl when the types are "Grounded".

The term "grounded" refers to a property of a particular type. A type is said to be grounded if there aren't any generic parameters in it.

Vec<i32> is Grounded (Fixed structure).
Vec<T> is Not Grounded (T can be anything).

This property is important because if a type is not grounded (like Vec<U>), its structure is volatile, it could change into a specialized form in the future.

After applying this new rule, the previous call_a_bomb code example would produce an error:

fn call_a_bomb<U>(bomb: &mut U) {
    // Error! `U` is not grounded here!
    // It would produce error like:
    // `U: ItsABomb` cannot be satisfied
    bomb.boom();
}

Even though there's an implementation impl<T> ItsABomb for T, the compiler
refuses to resolve it because U is not grounded.

So what the solution is similar to what we did before: we add a bound to defer
the resolution to the caller!

fn call_a_bomb<U: ItsABomb>(bomb: &mut U) {
    // Error! `U` is not grounded here!
    // It would produce error like:
    // `U: ItsABomb` cannot be satisfied
    bomb.boom();
}

Conceptually, call_a_bomb is now saying: "Dear caller, please find the impl
for ItsABomb for type U and provide it to me."

Now, when create_a_bomb calls call_a_bomb, the compiler can resolve the
impl for (&'local str, &'static str) because this type is grounded.

Note: Under this rule, lifetime parameters are ignored when determining groundedness.

fn call_a_bomb<U: ItsABomb>(bomb: &mut U) { ... }

fn create_a_bomb() -> &'static str {
    // ...

    let mut pair = (local_str, static_str);

    // the type-checker resolves the `impl` here!
    call_a_bomb(&mut pair);

    // ...
}

When create_a_bomb gets to resolve the impl, it sees that it will use
the specialized (T, T) implementation and generates the additional constraints
on lifetimes (shrinking 'static to 'local), preventing the use-after-free.

The Chain of Delegation

Now we could also extend the chain of obligations further down the stack like so:

pub fn call_a_bomb_deep<U: ItsABomb>(bomb: &mut U) {
    bomb.boom();
}

pub fn call_a_bomb<U: ItsABomb>(bomb: &mut U) {
    call_a_bomb_deep::<U>(bomb);
}

fn create_a_bomb() -> &'static str {
    // ...

    let mut pair = (local_str, static_str);

    // the type-checker resolves the `impl` here!
    call_a_bomb(&mut pair);

    // ...
}

See, whenever the type-checker couldn't solve the impl because the type is not grounded, it forces the caller to provide the implementation instead. And this
chain continues until we reach a grounded type.

The Problem: Delegation Hell

With the Groundedness rule, we achieve sound specialization. However, it creates headaches in generic-heavy codebases.

Consider the standard library type Arc<T>. In Rust today, Arc<T> implements Clone regardless of what T is. Cloning an Arc is just incrementing an atomic counter; it doesn't require anything from T.

// crate: std
pub struct Arc<T> { ... }

// Here, NO bounds on T!
impl<T> Clone for Arc<T> {
    // Just increment atomic counter
}

// crate: app
fn clone_some_arcs<T>(arc: &Arc<T>) {
    let cloned = arc.clone();
}

This is perfectly fine code. However, with the new Groundedness rule, this code would not compile.

Why? Because Arc<T> inside clone_some_arcs<T> is not a grounded type. The compiler refuses to resolve the impl.

Even though we know that Arc's cloning logic is universal and won't change, the compiler is forced to be paranoid. It thinks: "I cannot bind to the standard implementation yet. What if, later down the line, there's a specialized implementation for Arc<i32>?"

Under the rule, we have no choice but to add where Arc<T>: Clone to the function. In a generic codebase, this pollution spreads everywhere:

fn this_is_such_a_pain<T, U>(a: &Arc<T>) {
    // 🤓 Compiler: "Nuh uh! You can't do that ..."
    let a = a.clone();

    // 🤓 Compiler: "Nuh uh! Actually there might be a specialized Option..."
    let b = Option::<U>::default();
}

To fix this, we have to state the obvious in the bounds:

fn this_is_such_a_pain<T, U>(a: &Arc<T>)
where
    Arc<T>: Clone,      // Isn't that obvious?
    Option<U>: Default, // Really?
    SomeInternalStruct<T, U>: Clone, // Now I'm leaking internal details, ugh!
{ ... }

Introduce `final impl`

We need a way to tell the compiler: "Relax, this impl is the FINAL and DEFINITIVE one. There will never be a more specialized version. Pick this one immediately."

Introducing the final impl.

The final impl is a way to mitigate the "paranoid compiler." It has two properties:

The Power: The compiler is allowed to bind this impl eagerly, even if the types are not grounded. This is exactly how Rust behaves today.
The Restriction: A final impl cannot be overridden. No one is allowed to write a more specific implementation that overlaps with it.

This solves the ergonomic issues. By declaring final impl, the compiler knows specialization is impossible, so eager binding is safe.

// crate: std
pub struct Arc<T> { ... }

// impl #1
// This `impl` can't be overridden!
final impl<T> Clone for Arc<T> {
    // ...
}

// crate: app
fn clone_some_arcs<T>(arc: &Arc<T>) {
    // Compiler is safe to bind to `impl #1` eagerly!
    // 1. Compiler sees `Arc<T>` is not grounded.
    // 2. But, it sees `final impl`.
    // 3. It knows this will never change.
    let cloned = arc.clone();
}

This creates two clear camps:

Regular impl: Specializable, but requires Grounded checks (deferred resolution).
final impl: Rigid, but allows eager resolution, improving ergonomics for structural types.

Some Further Crazy Ideas: Ranking

I want to throw out another wild idea to solve the final challenge of trait resolution: Ambiguity.

Consider a trait Log that is implemented for types that are Display, but falls back to Debug if Display isn't available.

// Impl #1: The 'preferred' one
impl<T: Display> Log for T {
    fn log(&self) { println!("{}", self); }
}

// Impl #2: The 'fallback' one
impl<T: Debug> Log for T {
    fn log(&self) { println!("{:?}", self); }
}

If a type implements both Display and Debug, the compiler has to guess. Since neither type is "more specialized" than the other, this is an ambiguity error.

What if we could assign an explicit ranking (where lower is higher priority) to the implementations?

// Rank 1 (High Priority)
#[rank(1)]
impl<T: Display> Log for T { ... }

// Rank 2 (Low Priority)
#[rank(2)]
impl<T: Debug> Log for T { ... }

This would only be allowed for regular impls, acting as a tie-breaker when type specificity is equal.

The revised resolution flow looks something like this:

def pick_regular_impl(available_impls, type, env):
    result_impl = None

    for candidate in available_impls:
        if not candidate.unify_with(type):
            continue

        if not candidate.is_bound_satisfied(env):
            continue

        # First valid match found? Keep it.
        if result_impl is None:
            result_impl = candidate
            continue

        # Compare specificity first
        order = candidate.order(result_impl)

        if order == Order.MoreSpecialized:
             result_impl = candidate

        # If equally specialized, use rank as a tie-breaker
        elif order == Order.Equal:
            if candidate.rank < result_impl.rank:
                 result_impl = candidate
            elif candidate.rank == result_impl.rank:
                raise AmbiguousImplError()

    return result_impl

That's it! Is this sound? I'm not sure 🤣. But it looks promising.

Reflection: How This Fits into Rust Today

All of the ideas discussed here would be breaking changes if applied to the language defaults.

However, the concept might not be that "alien" to the current state of Rust. In Rust today, trait implementation behaves exactly like the final impl proposed in this post (eager, non-overlapping).

We could imagine a path where we add a new keyword:

Keep impl as is: eager binding, not specializable.
Make default impl specializable, requiring groundedness checks.

However, it is still difficult to retrofit this feature into the standard library.

For example, if we were to enable Vec<T> cloning specialization, we would change the current impl to a default impl:

default impl<T: Clone> Clone for Vec<T> {
    // general cloning
}

impl Clone for Vec<u8> {
    // optimized memcpy version
}

Any time we call clone on a generic vector, we would have to add an explicit bound like where Vec<T>: Clone instead of just where T: Clone.

What's the catch?

In Rust today, the following code is valid because there's an implementation impl<T: Clone> Clone for Vec<T>.

fn clone_vec<T>(a: &Vec<T>)
where T: Clone
{
    let cloned = a.clone();
}

Under our new rules, because the implementation is no longer allowed to bind eagerly, the compiler isn't allowed to use the lazy impl for Vec<T> because Vec<T> is not grounded.

To fix this, we would have to rewrite the function signature:

fn clone_vec<T>(a: &Vec<T>)
where Vec<T>: Clone // CHANGED! Bound is on the container, not the content.
{
    let cloned = a.clone();
}

This means that simply enabling specialization for Vec would break every single function in the Rust ecosystem that clones a generic vector!

The Ugly

We've discussed all the good parts of this new specialization system, but what
are the downsides?

I think one of the biggest downsides is that it makes generic programming more
verbose and most importantly, it potentially leaks implementation details.

Taking from the previous clone_vec example, you can imagine that cloning
vector is something internal to the API of a library. However, with this new
specialization system, the library author is forced to expose the fact that
cloning a vector is done internally and forced to write where Vec<T>: Clone in
the public API instead of just where T: Clone.

The final impl does help in some cases, but it doesn't work when the trait
we use is specializable itself.

Bonus: Lifetimes

In this section I want to talk a bit about lifetimes and how I imagine whole system interacts with each other.

Generally, in Rust compiler, the lifetimes are only relevant during type-checking and borrow-checking, after that, the compiler erases all the lifetimes, the compiler sees &'a i32 and &'static i32 as the exact same thing: a pointer to an integer. This means that, up to the monomorphization stage, the compiler doesn't see any lifetimes in the function body.

Therefore, to make sure that during the monomorphization phase the compiler can still correct resolve the correct impl while having none knowledges related to lifetimes. We must ensure that the impl selection can't depend on the lifetime.

First, we treat lifetimes as irrelevant to type specificity and groundedness.

trait Test {}

// I want this impl to be the general case
default impl<'a, T> Test for &'a T {}

// I want this impl to be the specialized case
impl<'a, T> Test for &'static T {}

In this code example, we would treat them as ambiguous implementation. Both &'a T and &'static T are considered to have the same specificity.

Here's an another example:

trait Fizz { fn fizz(self); }

default impl<T: 'static> Fizz for &'static T { ... }

pub fn use_test<'a>(a: &'a i32) {
    // immediately resolves to the above impl with additional
    // `'a: 'static` bound. `'a` isn't considered in groundedness checking
    a.fizz();
}

Second: this system always consider different lifetimes equal.

Consider following this example:

trait Fizz { fn fizz(self); }

// the general impl
default impl<T> Fizz for T { fn fizz(self) { println("general"); }

// the specialized impl
impl<T> Fizz for (T, T) { fn fizz(self) { println("specialized"); }

fn use_fizz_weirdly<'a>(param: (&'a i32, &'static i32)) {
    // what should this be resolved to?
    param.fizz();
}

Under this system, I propose that it should be the specialized implementation even though the 'a and 'static lifetimes are different.

Surprisingly we treat &'a i32 and &'static i32 as equal! but with a twist: it generates further constraints.

In this case, it would consider to shrink the 'static lifetime instead because it's covariant.

In conclusion, these restrictions make the specialization related to lifetimes entirely impossible, however, this make the system sound in presence of lifetimes.

DEV Community

Rust 2.0: A Thought Experiment on Safe Specialization

Why Specialization?

Primer: What's So Unsound About Specialization

What's Going On Here?

What Should Have Been Done?

The Problem

How Could We Fix This?

Specialization with Deferred Binding

The Chain of Delegation

The Problem: Delegation Hell

Introduce `final impl`

Some Further Crazy Ideas: Ranking

Reflection: How This Fits into Rust Today

The Ugly

Bonus: Lifetimes

Top comments (0)

Why Specialization?

Primer: What's So Unsound About Specialization

What's Going On Here?

What Should Have Been Done?

The Problem

How Could We Fix This?

Specialization with Deferred Binding

The Chain of Delegation

The Problem: Delegation Hell

Introduce final impl

Some Further Crazy Ideas: Ranking

Reflection: How This Fits into Rust Today

The Ugly

Bonus: Lifetimes

Introduce `final impl`