Specialization is a long-awaited feature in Rust that has never been stabilized. There have been many attempts to land this feature, but each comes with its own set of shortcomings and soundness holes.
In this blog post, I want to share an idea for impl specialization that has been floating around in my mind. We will walk through the trade-offs and try to explain exactly why specialization is such a challenging feature to add to the language.
Why Specialization?
One of the most compelling use cases for specialization is optimization. A concrete example is cloning a Vec<T>.
In general, cloning a Vec<T> works by iterating through each element and calling .clone() on it.
With specialization, we could write an extra, specialized case where T: Copy. When the compiler knows that T implements Copy, the Vec<T> implementation can optimize the cloning process by skipping the loop and using memcpy to bulk-copy the underlying memory buffer.
Primer: The Current Nature of Trait Resolution
Currently, the Rust trait solver resolves impls eagerly whenever possible. This enables the following code to compile:
pub trait Fizz { fn fizz(); }
// impl #1
impl<T> Fizz for Vec<T> { ... }
pub fn use_fizz<U>() {
<Vec<U> as Fizz>::fizz()
}
Notice that the usage of <Vec<U> as Fizz>::fizz() doesn't cause a compilation error. This is because impl #1 (impl<T> Fizz for Vec<T>) is visible. The type system eagerly binds impl #1 to <Vec<U> as Fizz>.
The solver chooses to go with impl #1 even though, at the declaration of use_fizz, the compiler doesn't even know what U is.
You can imagine that behind the scenes, the solver immediately replaces the call <Vec<U> as Fizz>::fizz() with <impl #1>::fizz().
The code might look this behind the scenes:
pub fn use_fizz<U>() {
// Made up syntax: the solver **commits** to impl #1 immediately
<impl #1>::fizz();
}
However, if we try to call <U as Fizz>::fizz() directly inside use_fizz, we will definitely see a compilation error since there is no matching impl visible.
pub trait Fizz { fn fizz(); }
impl<T> Fizz for Vec<T> { ... }
pub fn use_fizz<U>() {
// Error! No matching impl found for generic U
<U as Fizz>::fizz()
}
The impl<T> Fizz for Vec<T> wouldn't work for U because U might be instantiated with i32, and Vec<T> and i32 would never unify.
The fix is to add a where clause: fn use_fizz<U>() where U: Fizz.
pub fn use_fizz<U>() where U: Fizz {
// Fine now
<U as Fizz>::fizz()
}
Unlike the vector example, here there is no specific impl to bind to. However, the code compiles because the bound where U: Fizz acts as a premise, saying: "We don't know the specific impl yet, but the caller promises to provide one."
Conceptually, the usage of <U as Fizz> gets replaced by a placeholder:
pub fn use_fizz<U>() where U: Fizz {
// Don't worry about the syntax, it's a made up one :D
// It represents a slot for the `impl` that the caller will provide
<where U: Fizz>::fizz();
}
Next, let's see what happens when use_fizz is called by another function.
pub trait Fizz { fn fizz(); }
impl<T> Fizz for Vec<T> { ... }
pub fn use_fizz<U>() where U: Fizz {
<where U: Fizz>::fizz();
}
struct Buzz;
fn main() {
use_fizz::<Buzz>();
}
The call to use_fizz here causes a compilation error because of the where U: Fizz bound.
Remember that <where U: Fizz>::fizz() in use_fizz says, "For now, we'll assume there is an impl; the caller will find it later." Now is the time to find that impl, and the compiler fails to find one for Buzz.
This is why every time you call a function with a trait bound like T: Clone, you have to satisfy it by either finding a concrete impl or adding where T: Clone to your own signature.
The act of adding where T: Clone allows us to postpone the impl resolution, creating a deferral chain:
fn deepest<T>(t: &T) -> T
where T: Clone // defer finding the `impl`
{
<where T: Clone>::clone(t);
}
fn deeper<T>(t: &T) -> T
where T: Clone // again, defer finding the `impl`
{ deepest(t) }
fn deep<T>(t: &T) -> T
where T: Clone // defer it again...
{ deeper(t) }
fn main() {
let vec = vec![0];
// Bingo! There's a `Clone` impl for `Vec<i32>`!
// We no longer defer; we resolve it here.
deeper(&vec);
}
Why Eager impl Binding is Problematic for Specialization
Now let's understand why the current method of "eagerly binding the impl" makes it challenging to add specialization.
Let's say we blindingly allow impl specialization and allow them to overlap. Consider this example:
pub trait Fizz {
fn fizz();
}
// impl #1
// The general rule: Vec of anything
impl<T> Fizz for Vec<T> {
fn fizz() { println!("from general vec"); }
}
// impl #2
// The specialized rule: Vec of boxed items
impl<T> Fizz for Vec<Box<T>> {
fn fizz() { println!("from boxed vec"); }
}
fn use_fizz<U>() {
<Vec<U> as Fizz>::fizz();
}
When the compiler analyzes the body of use_fizz, following the standard logic, it tries its best to find a matching impl immediately.
It considers the two impls:
- Does
impl #1(Vec<T>) work? Yes, it perfectly matchesVec<U>insideuse_fizz. - How about
impl #2(Vec<Box<T>>)? No.Umight bei32, in which caseVec<Box<T>>andVec<i32>would never unify.
So, the compiler eagerly commits to impl #1.
fn use_fizz<U>() {
// The compiler hardcodes this to impl #1
<impl #1>::fizz();
}
Here's the twist: what if we call use_fizz in main with Box<i32>?
fn main() {
// Pass U = Box<i32>
// This creates a <Vec<Box<i32>> as Fizz>::fizz() call inside
use_fizz::<Box<i32>>();
}
Now we hit a dilemma:
- Expectation:
use_fizzshould print "from boxed vec", because we are using a vector of boxes. - Reality:
use_fizzhas already committed toimpl #1at definition time, so it prints "from general vec".
This behavior is called Incoherence. The same types result in different behaviors depending on where they are used. This is the primary reason why current eager binding prevents specialization.
Rust 2.0: What If?
Okay, what if... what if Rust refused to solve for the impl eagerly whenever generic parameters are involved?
In the previous example, use_fizz<U> caused a problem because the compiler tried to lock into a specific impl without knowing what U would be.
- If
Ubecomesi32,impl #1is correct. - If
UbecomesBox<i32>,impl #2is correct.
Binding to any particular impl while the type is still unknown leads to incoherence.
However, if we only solve for the impl when the type is fully known (grounded), we can ensure that the type will not change structure in the future.
Specialization with Deferred Binding
We introduce a new rule: The compiler is only allowed to resolve an impl when the types are "Grounded".
The term "grounded" refers to a property of a particular type. A type is said to be grounded if there aren't any generic parameters in it.
-
Vec<i32>is Grounded (Fixed structure). -
Vec<T>is Not Grounded (Tcan be anything).
This property is important because if a type is not grounded (like Vec<U>), its structure is volatile, it could change into a specialized form in the future.
After applying this new rule, the previous unsound code example would produce an error:
fn use_fizz<U>() {
// ERROR: Refuse to find `impl` for `Vec<U>` at this moment.
// Even though `impl #1` matches, a better one might exist later.
<Vec<U> as Fizz>::fizz();
}
So what do we do? Easy! We add a trait bound where Vec<U>: Fizz to use_fizz, deferring the resolution to the caller.
fn use_fizz<U>() where Vec<U>: Fizz {
// Now we use whatever `impl` the caller supplies
<where Vec<U>: Fizz>::fizz();
}
Conceptually, use_fizz is now saying: "Dear caller, please find the impl for Vec<U> for me."
Now, what happens when main calls use_fizz?
fn main() {
// We supply U = Box<i32>
use_fizz::<Box<i32>>();
}
Here is the climax:
-
maincallsuse_fizz. -
use_fizzrequiresmainto find theimplforVec<Box<i32>>. -
mainsees thatVec<Box<i32>>is a grounded type. -
mainlooks for the best match and findsimpl #2(Vec<Box<T>>). -
mainprovidesimpl #2touse_fizz.
Hooray! With the help of deferred resolution, specialization works correctly!
The Chain of Delegation
In the previous example, it was a simple main -> use_fizz call. What if there is an intermediate call?
fn intermediate<V>() {
// Error! `use_fizz` asks for the `impl` for `Vec<V>`
// but `Vec<V>` is not grounded here either!
use_fizz::<V>();
}
fn main() {
intermediate::<Box<i32>>();
}
According to the rule, intermediate<V> cannot resolve the impl because Vec<V> isn't grounded. The solution is, again, adding a where Vec<V>: Fizz bound to intermediate<V> to further defer the resolution down the stack.
fn intermediate<V>() where Vec<V>: Fizz // Defer the resolution!
{
use_fizz::<V>();
}
Conceptually, intermediate<V> promises use_fizz<V> that "The impl for Vec<V> will be provided to you, but it will come from whoever calls me."
The Problem: Delegation Hell
With the Groundedness rule, we achieve sound specialization. However, it creates headaches in generic-heavy codebases.
Consider the standard library type Arc<T>. In Rust today, Arc<T> implements Clone regardless of what T is. Cloning an Arc is just incrementing an atomic counter; it doesn't require anything from T.
// crate: std
pub struct Arc<T> { ... }
// Here, NO bounds on T!
impl<T> Clone for Arc<T> {
// Just increment atomic counter
}
// crate: app
fn clone_some_arcs<T>(arc: &Arc<T>) {
let cloned = arc.clone();
}
In Rust today, this is perfectly fine code. However, with the new Groundedness rule, this code would not compile.
Why? Because Arc<T> inside clone_some_arcs<T> is not a grounded type. The compiler refuses to resolve the impl.
Even though we know that Arc's cloning logic is universal and won't change, the compiler is forced to be paranoid. It thinks: "I cannot bind to the standard implementation yet. What if, later down the line, there's a specialized implementation for Arc<i32>?"
Under the rule, we have no choice but to add where Arc<T>: Clone to the function. In a generic codebase, this pollution spreads everywhere:
fn this_is_such_a_pain<T, U>(a: &Arc<T>) {
// 🤓 Compiler: "Nuh uh! You can't do that ..."
let a = a.clone();
// 🤓 Compiler: "Nuh uh! Actually there might be a specialized Option..."
let b = Option::<U>::default();
}
To fix this, we have to state the obvious in the bounds:
fn this_is_such_a_pain<T, U>(a: &Arc<T>)
where
Arc<T>: Clone, // Isn't that obvious?
Option<U>: Default, // Really?
SomeInternalStruct<T, U>: Clone, // Now I'm leaking internal details, ugh!
{ ... }
Introduce final impl
We need a way to tell the compiler: "Relax, this impl is the FINAL and DEFINITIVE one. There will never be a more specialized version. Pick this one immediately."
Introducing the final impl.
The final impl is a way to mitigate the "paranoid compiler." It has two properties:
- The Power: The compiler is allowed to bind this
impleagerly, even if the types are not grounded. This is exactly how Rust behaves today. - The Restriction: A
final implcannot be overridden. No one is allowed to write a more specific implementation that overlaps with it.
This solves the ergonomic issues. By declaring final impl, the compiler knows specialization is impossible, so eager binding is safe.
// crate: std
pub struct Arc<T> { ... }
// impl #1
// This `impl` can't be overridden!
final impl<T> Clone for Arc<T> {
// ...
}
// crate: app
fn clone_some_arcs<T>(arc: &Arc<T>) {
// Compiler is safe to bind to `impl #1` eagerly!
// 1. Compiler sees `Arc<T>` is not grounded.
// 2. But, it sees `final impl`.
// 3. It knows this will never change.
let cloned = arc.clone();
}
This creates two clear camps:
- Regular
impl: Specializable, but requires Grounded checks (deferred resolution). -
final impl: Rigid, but allows eager resolution, improving ergonomics for structural types.
Some Further Crazy Ideas: Ranking
I want to throw out another wild idea to solve the final challenge of trait resolution: Ambiguity.
Consider a trait Log that is implemented for types that are Display, but falls back to Debug if Display isn't available.
// Impl #1: The 'preferred' one
impl<T: Display> Log for T {
fn log(&self) { println!("{}", self); }
}
// Impl #2: The 'fallback' one
impl<T: Debug> Log for T {
fn log(&self) { println!("{:?}", self); }
}
If a type implements both Display and Debug, the compiler has to guess. Since neither type is "more specialized" than the other, this is an ambiguity error.
What if we could assign an explicit ranking (where lower is higher priority) to the implementations?
// Rank 1 (High Priority)
#[rank(1)]
impl<T: Display> Log for T { ... }
// Rank 2 (Low Priority)
#[rank(2)]
impl<T: Debug> Log for T { ... }
This would only be allowed for regular impls, acting as a tie-breaker when type specificity is equal.
The revised resolution flow looks something like this:
def pick_regular_impl(available_impls, type, env):
result_impl = None
for candidate in available_impls:
if not candidate.unify_with(type):
continue
if not candidate.is_bound_satisfied(env):
continue
# First valid match found? Keep it.
if result_impl is None:
result_impl = candidate
continue
# Compare specificity first
order = candidate.order(result_impl)
if order == Order.MoreSpecialized:
result_impl = candidate
# If equally specialized, use rank as a tie-breaker
elif order == Order.Equal:
if candidate.rank < result_impl.rank:
result_impl = candidate
elif candidate.rank == result_impl.rank:
raise AmbiguousImplError()
return result_impl
That's it! Is this sound? I'm not sure 🤣. But it looks promising.
Reflection: How This Fits into Rust Today
All of the ideas discussed here would be breaking changes if applied to the language defaults.
However, the concept might not be that "alien" to the current state of Rust. In Rust today, trait implementation behaves exactly like the final impl proposed in this post (eager, non-overlapping).
We could imagine a path where we add a new keyword:
- Keep
implas is: eager binding, not specializable. - Add
lazy impl: deferred binding until groundedness is confirmed, allows specialization.
However, it is still difficult to retrofit this feature into the standard library.
For example, if we were to enable Vec<T> cloning specialization, we would change the current impl to a lazy impl:
lazy impl<T: Clone> Clone for Vec<T> {
// general cloning
}
lazy impl Clone for Vec<u8> {
// optimized memcpy version
}
Any time we call clone on a generic vector, we would have to add an explicit bound like where Vec<T>: Clone instead of just where T: Clone.
What's the catch?
In Rust today, the following code is valid because there's an implementation impl<T: Clone> Clone for Vec<T>.
fn clone_vec<T>(a: &Vec<T>)
where T: Clone
{
let cloned = a.clone();
}
Under our new rules, because the implementation is no longer allowed to bind eagerly, the compiler isn't allowed to use the lazy impl for Vec<T> because Vec<T> is not grounded.
To fix this, we would have to rewrite the function signature:
fn clone_vec<T>(a: &Vec<T>)
where Vec<T>: Clone // CHANGED! Bound is on the container, not the content.
{
let cloned = a.clone();
}
This means that simply enabling specialization for Vec would break every single function in the Rust ecosystem that clones a generic vector!
Bonus: Lifetimes
In this section I want to talk a bit about lifetimes and how I imagine whole system interacts with each other.
Generally, in Rust compiler, the lifetimes are only relevant during borrow-checking, after borrow-checking, the compiler erases all the lifetimes, the compiler sees &'a i32 and &'static i32 as the exact same thing: a pointer to an integer. This means that, up to the monomorphization stage, the compiler doesn't see any lifetimes in the function body.
Therefore, to make sure that during the monomorphization phase the compiler can still correct resolve the correct impl while having none knowledges related to lifetimes. We must ensure that the impl selection can't depend on the lifetime.
First, we treat lifetimes as irrelevant to type specificity and groundedness.
trait Test {}
// I want this impl to be the general case
lazy impl<'a, T> Test for &'a T {}
// I want this impl to be the specialized case
lazy impl<'a, T> Test for &'static T {}
In this code example, we would treat them as ambiguous implementation. Both &'a T and &'static T are considered to have the same specificity.
Here's an another example:
trait Fizz { fn fizz(self); }
lazy impl<T: 'static> Fizz for &'static T { ... }
pub fn use_test<'a>(a: &'a i32) {
// immediately resolves to the above impl with additional
// `'a: 'static` bound. `'a` isn't considered in groundedness checking
a.fizz();
}
Second: this system always consider different lifetimes equal.
Consider following this example:
trait Fizz { fn fizz(self); }
// the general impl
lazy impl<T> Fizz for T { fn fizz(self) { println("general"); }
// the specialized impl
lazy impl<T> Fizz for (T, T) { fn fizz(self) { println("specialized"); }
fn use_fizz_weirdly<'a>(param: (&'a i32, &'static i32)) {
// what should this be resolved to?
param.fizz();
}
Under this system, I propose that it should be the specialized implementation even though the 'a and 'static lifetimes are different.
Surprisingly we treat &'a i32 and &'static i32 as equal! but with a twist: it generates further constraints.
In this case, it would consider to shrink the 'static lifetime instead because it's covariant.
In conclusion, these restrictions make the specialization related to lifetimes entirely impossible, however, this make the system sound in presence of lifetimes.
Top comments (0)