jmaargh

Posted on Oct 16, 2023

An alternative Any type?

#rust #programming

Rust's Any type is pretty cool. You can use it to do runtime type reflection, or downcasting, or dynamic typing, or other fun things. However, there are a couple of slightly annoying things about it:

TypeId is currently 128 bits. This is because it's some hash of the concrete type, so needs to be long enough to reasonably avoid hash collisions.
Getting TypeId from &dyn Any requires two dereferences: first you follow the vtable pointer to find the pointer to Any::type_id(), then you call that function.

In the vast majority of cases this is totally fine (which is why the excellent libs team implemented it this way). You're unlikely to be bottlenecked on either of these. But neither is ideal: u128 operations can be pretty slow on older or embedded chips and nobody likes more indirections than are necessary.
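For reference, here is a minimal sketch of the kind of runtime checks that Any and TypeId give you today. The helper names (holds_u32, as_u32, type_id_bits) are made up for illustration; only Any, TypeId, type_id and downcast_ref come from core.

```rust
use core::any::{Any, TypeId};
use core::mem::size_of;

/// Returns true if the value behind the trait object is a `u32`.
pub fn holds_u32(value: &dyn Any) -> bool {
    // `type_id` is a dynamic call: the function pointer is loaded from the
    // vtable and then called.
    value.type_id() == TypeId::of::<u32>()
}

/// Downcasts to `u32` if the dynamic type matches, otherwise `None`.
pub fn as_u32(value: &dyn Any) -> Option<u32> {
    value.downcast_ref::<u32>().copied()
}

/// Width of `TypeId` in bits for whichever compiler builds this code.
pub fn type_id_bits() -> usize {
    size_of::<TypeId>() * 8
}
```

On a recent compiler type_id_bits() should report the 128 bits mentioned above, though that width is an implementation detail rather than a promise.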

It occurs to me that both can be circumvented, if you're willing to give up one thing: stability of TypeId values. That is, if you don't need to assume that TypeIds are the same between different binaries. This seems to be a fairly small thing to give up in most cases. How often are people serialising TypeIds? Doing so is already a bad idea as they're not guaranteed to be stable between Rust compiler releases.

The idea is to simply store the type ID directly in the vtable and have the compiler guarantee that, in the context of the current build, the ID is unique. No second indirection, no IDs longer than necessary.

Doing this "properly" would require some compiler hacking. But I did come up with a way it can be hacked around: I call it PointerAny and TypePointer. The trick is to use a pointer to a method of the PointerAny trait as the type ID itself.

Let me explain. First, we define the trait

pub trait PointerAny: 'static {
    fn type_ptr(&self) -> TypePointer;
}

This is exactly like core::any::Any, no surprises here.

We also need a TypePointer instead of TypeId. This will hold the address of a function (as discussed above), so let's do that:

#[derive(PartialEq)]
pub struct TypePointer(usize);

For the sake of simplicity I'll just use a usize here. Really you'd want NonZeroUsize or something.
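As a rough sketch of what the NonZeroUsize variant could look like: the name NonZeroTypePointer and its new constructor are hypothetical, and the rest of the post keeps using the plain usize version.

```rust
use core::num::NonZeroUsize;

/// Hypothetical non-zero flavour of `TypePointer`.
/// `repr(transparent)` keeps the layout identical to `NonZeroUsize`.
#[repr(transparent)]
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
pub struct NonZeroTypePointer(NonZeroUsize);

impl NonZeroTypePointer {
    /// Wraps a raw address, returning `None` for zero.
    /// A real function address is never zero, so this should not fail there.
    pub fn new(address: usize) -> Option<Self> {
        NonZeroUsize::new(address).map(Self)
    }

    /// The wrapped address as a plain integer.
    pub fn get(self) -> usize {
        self.0.get()
    }
}
```

The payoff is that the zero bit pattern is free to represent None, so in practice Option<NonZeroTypePointer> takes no more space than a usize.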

Getting this TypePointer statically is easy, we just take the address of the function, which is the same address that's stored in the vtable:

impl TypePointer {
    fn of<T: PointerAny + ?Sized>() -> Self {
        Self(<T as PointerAny>::type_ptr as _)
    }
}

But this isn't enough to be useful yet. We need a way of getting TypePointer from a &dyn PointerAny. In principle, I feel like there should be a good way of getting the compiler to tell us the address we're looking for. After all, the compiler knows how to call this function, so it therefore knows how to find its address. Unfortunately I don't know how to get the compiler to tell us that address, so instead I'm leaning on some very ugly unsafe code:

impl TypePointer {
    fn from(object: &dyn PointerAny) -> Self {
        let pointer = unsafe {
            // Unpack into an array rather than a tuple: tuple field order is unspecified.
            let [_data, vtable]: [*const usize; 2] = core::mem::transmute(object);
            // vtable consists of:
            // - drop pointer
            // - size
            // - alignment
            // - method pointers
            // In that order. So this gets us pointing to the first method.
            let method_pointer = vtable.add(3);
            // We want the pointer for this first method
            *method_pointer
        };
        Self(pointer)
    }
}

This requires a little explanation. A wide pointer like &dyn PointerAny consists of a pointer to the type's data, followed by a pointer to the vtable. That's what the transmute call is unpacking here.

Rust, unfortunately for us, doesn't guarantee any particular layout for vtables. However, from what I can gather the current implementation is as outlined in the comment. First there's a function pointer to the drop implementation, then there are usizes for both the size of the type and its alignment, then there are pointers to each method. Since we only have one method on PointerAny, that pointer should be at an offset of three usizes from the base pointer. Which is what we take.
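One way to gain a little confidence in that assumed layout is to read the size and alignment slots and compare them with what the compiler reports for the concrete type. The sketch below does that with a stand-in trait (Probe and vtable_size_align are hypothetical names); like the code above it relies on unspecified layout, so treat it as a sanity check for the compiler you happen to be using and nothing more.

```rust
use core::mem::transmute;

/// Hypothetical stand-in trait with a single method.
pub trait Probe {
    fn probe(&self);
}

impl Probe for u64 {
    fn probe(&self) {}
}

/// Reads the slots assumed to hold the size and alignment
/// (vtable offsets 1 and 2). Relies on unspecified layout.
pub fn vtable_size_align(object: &dyn Probe) -> (usize, usize) {
    unsafe {
        // ASSUMPTION: a `&dyn Trait` is laid out as [data pointer, vtable pointer].
        let [_data, vtable]: [*const usize; 2] = transmute(object);
        // ASSUMPTION: slot 0 is drop, slot 1 is size, slot 2 is alignment.
        (*vtable.add(1), *vtable.add(2))
    }
}
```

If vtable_size_align(&0u64) agrees with size_of::<u64>() and align_of::<u64>(), the first method pointer is plausibly sitting in slot 3 as assumed.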

Now you may have noticed that we haven't actually implemented PointerAny yet. That's because we don't ever actually want to call the PointerAny::type_ptr method: we just want the compiler to give it a unique address per-type. Therefore, its implementation is the least important part of this puzzle (but still essential, as we need the compiler to actually generate it and its address). So we can just implement it in the obvious way:

impl<T: 'static + ?Sized> PointerAny for T {
    /// Be careful! If you have a `&dyn PointerAny`, then prefer calling
    /// `TypePointer::from` over this to avoid the extra indirection.
    fn type_ptr(&self) -> TypePointer {
        TypePointer::of::<T>()
    }
}

Note, if you call this function from a &dyn PointerAny then you lose the benefit of avoiding the indirection: prefer calling TypePointer::from or TypePointer::of directly.

It's also interesting that PointerAny::type_ptr is far nicer than TypePointer::from, despite doing the same thing, because at this point we already know the concrete type so can just get the function pointer directly.

And that's it! We can now dynamically type-check just as with core::any::Any!

pub fn is_same_type(first: &dyn PointerAny, second: &dyn PointerAny) -> bool {
    TypePointer::from(first) == TypePointer::from(second)
}

pub fn is_type<T: PointerAny>(object: &dyn PointerAny) -> bool {
    TypePointer::from(object) == TypePointer::of::<T>()
}
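Pulling the pieces above into one place, a self-contained sketch might look like the following. It deviates from the snippets above in two small ways, both assumptions rather than requirements: the wide pointer is unpacked into a two-element array, and type_ptr is marked #[inline(never)] in the hope of discouraging the compiler from emitting more than one copy of it. The count_of helper is hypothetical, added only to show a use over a mixed collection.

```rust
use core::mem::transmute;

#[derive(PartialEq)]
pub struct TypePointer(usize);

pub trait PointerAny: 'static {
    fn type_ptr(&self) -> TypePointer;
}

impl<T: 'static + ?Sized> PointerAny for T {
    // ASSUMPTION: `inline(never)` nudges the compiler towards a single copy
    // of this function per type. It is not a guarantee of a unique address.
    #[inline(never)]
    fn type_ptr(&self) -> TypePointer {
        TypePointer::of::<T>()
    }
}

impl TypePointer {
    /// Address of `T`'s `type_ptr` implementation, known statically.
    pub fn of<T: PointerAny + ?Sized>() -> Self {
        Self(<T as PointerAny>::type_ptr as usize)
    }

    /// Reads the first method slot of the vtable. Relies on unspecified layout.
    pub fn from(object: &dyn PointerAny) -> Self {
        unsafe {
            // ASSUMPTION: wide pointer is [data, vtable]; methods start at slot 3.
            let [_data, vtable]: [*const usize; 2] = transmute(object);
            Self(*vtable.add(3))
        }
    }
}

pub fn is_type<T: PointerAny>(object: &dyn PointerAny) -> bool {
    TypePointer::from(object) == TypePointer::of::<T>()
}

/// Hypothetical helper: counts the boxed values whose dynamic type is `T`.
pub fn count_of<T: PointerAny>(items: &[Box<dyn PointerAny>]) -> usize {
    let mut count = 0;
    for item in items {
        // Deref through the Box explicitly. Passing `item` itself would
        // coerce the Box (which is also `'static`) into a trait object.
        let object: &dyn PointerAny = &**item;
        if is_type::<T>(object) {
            count += 1;
        }
    }
    count
}
```

One thing to watch for: because PointerAny is implemented for every 'static type, a Box<dyn PointerAny> is itself a PointerAny, so it is easy to accidentally ask about the box rather than its contents.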

Full code on playground.

So we've successfully addressed the two "shortcomings" discussed above:

Our new TypePointer is only a usize, which is ideal for almost every architecture.
We only do one pointer dereference in TypePointer::from.
We could also have made TypePointer non-zero (had we used NonZeroUsize), which would allow niche optimisations for Option etc.

On top of that we still have:

TypePointer::of is still a compile-time constant (no indirection)
In principle this could all be done in a compile-time const fn-compatible way (though you'd want to be really careful about the const fn use of pointers - perhaps this isn't possible yet).

So what are the tradeoffs? What have we lost?

Stability of TypePointer values: if you recompile your program, even with the same compiler, these may change. Don't ever serialise these TypePointers: they're just pointers after all.
Stability of implementation. I had to write some very ugly unsafe code to get this to work, because I couldn't find a stable way to get the compiler to tell me the address of a vtable method from a wide pointer. In principle this needn't be so ugly, but I just could not find a way of doing it without assuming the structure of the vtable.
Correctness? The current implementation assumes that the compiler will generate exactly one version of PointerAny::type_ptr for any given type (when needed). That is, there is a one-to-one correspondence between addresses of PointerAny::type_ptr and types themselves. I'm not 100% sure this is a guarantee, but I've assumed it's true. It's known that Rust can generate multiple vtables for the same types - otherwise we could just use the vtable address itself and have zero indirections - but I've assumed that the pointers contained are stable.

It's also interesting that we could have implemented TypePointer over core::any::Any rather than defining a new Any type. The only assumptions we need are that (a) the trait is implemented for every 'static type, (b) there are unique addresses for at least one method per type, and (c) we know how to find that address from a wide pointer.

I'd love to hear what people think of this. There are probably some things here that are wrong (well, even more wrong than the TypePointer::from implementation), so let me know!

Discuss on reddit
