DEV Community 👩‍💻👨‍💻

Andrey Vasnetsov

Rust dynamic dispatch optimization

Programming in JS: Let's just check every element until we find what the user asked for.
Programming in Rust: Hmm, do I really need this extra byte in my struct?

So, I spent some time trying to understand how much dynamic dispatch actually costs me.

I have a service which should choose between different trait implementations depending on its input. That means there are two options:

struct BackendB<'a, T: MyProcessor> {
    data: &'a mut T
    // But we will struggle instantiating this
}

or

struct BackendA<'a> {
    data: Box<&'a mut dyn MyProcessor>
}
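(For completeness, here is a minimal sketch of the trait and implementation the snippets assume. Only the `inc` method is required by the benchmark calls; the counter body is my guess based on the `data: 0` initializer below.)

```rust
// Minimal trait matching the calls made in the benchmarks.
trait MyProcessor {
    fn inc(&mut self);
}

// The concrete implementation used in the benchmark entry points;
// a simple counter, judging by the `data: 0` initializer.
struct ImplMyProcessor {
    data: usize,
}

impl MyProcessor for ImplMyProcessor {
    fn inc(&mut self) {
        self.data += 1;
    }
}
```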

The backend should internally call some MyProcessor methods, so I will be sad if these calls are slow. To the benchmarks!

impl<'a> BackendA<'a> {
    #[inline(never)] // This is very important, or the
    // compiler will work too well and replace the calls with constants
    fn inc(&mut self, how_much: usize) {
        // This simulates internal calls
        for _i in 0..how_much {
            self.data.inc(); // vtable resolve should happen here
        }
    }
}

impl<'a, T: MyProcessor> BackendB<'a, T> {
    #[inline(never)]
    fn inc(&mut self, how_much: usize) {
        for _i in 0..how_much {
            self.data.inc();
        }
    }
}
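A side note on that "extra byte": switching to `dyn` changes the reference itself, not just the calls. `&mut dyn MyProcessor` is a fat pointer carrying both a data pointer and a vtable pointer, so it is twice the size of a plain reference. A quick sketch to check (the `u64` stand-in type is arbitrary):

```rust
use std::mem::size_of;

trait MyProcessor {
    fn inc(&mut self);
}

// A plain reference is one machine word; a trait-object reference
// also carries a vtable pointer, making it a two-word fat pointer.
fn pointer_sizes() -> (usize, usize) {
    (
        size_of::<&mut u64>(),             // thin pointer
        size_of::<&mut dyn MyProcessor>(), // fat pointer
    )
}
```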

And here are the benchmark entry points:

use criterion::{black_box, Criterion};

fn criterion_benchmark(c: &mut Criterion) {
  c.bench_function("Static dispatch", |b| b.iter(|| { 
    let mut d = ImplMyProcessor {data: 0};
    let mut b = BackendB::<ImplMyProcessor> {data: &mut d};
    b.inc(black_box(500_000));
  }));

  c.bench_function("Dynamic dispatch", |b| b.iter(|| { 
    let mut d = ImplMyProcessor {data: 0};
    let mut b = BackendA {data: Box::new(&mut d)};
    b.inc(black_box(500_000));
  }));
}

But it gives almost the same timings:

Static dispatch     time:   [1.1774 ms 1.1916 ms 1.2068 ms]
Dynamic dispatch    time:   [1.2390 ms 1.2625 ms 1.2908 ms]

Ok, let's check what happens if the vtable is actually involved:

fn criterion_benchmark(c: &mut Criterion) {
  c.bench_function("Static dispatch", |b| b.iter(|| { 
    let mut d = ImplMyProcessor {data: 0};
    let mut b = BackendB::<ImplMyProcessor> {data: &mut d};
    for _i in 0..500_000 {
        b.inc(black_box(1));
    }
  }));

  c.bench_function("Dynamic dispatch", |b| b.iter(|| { 
    let mut d = ImplMyProcessor {data: 0};
    let mut b = BackendA {data: Box::new(&mut d)};
    for _i in 0..500_000 {
        b.inc(black_box(1));
    }
  }));
}

Now we see the difference:

Static dispatch     time:   [1.6077 ms 1.6248 ms 1.6431 ms]
Dynamic dispatch    time:   [2.6543 ms 2.7012 ms 2.7553 ms]

So the test is valid: we can actually see the regression caused by dynamic dispatch in Rust. But the good news is that it happens only once per function call (the optimizer does its job!). In my case that means once per API call, so I can stop worrying.
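As a sketch of that takeaway (names here are illustrative, not my actual service): as long as the dynamically dispatched object enters through a single API entry point and the real work happens in a loop inside, the vtable load is loop-invariant and the dispatch overhead is paid effectively once per call, not once per iteration:

```rust
trait MyProcessor {
    fn inc(&mut self);
}

struct ImplMyProcessor {
    data: usize,
}

impl MyProcessor for ImplMyProcessor {
    fn inc(&mut self) {
        self.data += 1;
    }
}

// One dynamically dispatched entry point per API call. The vtable
// pointer load inside the loop is loop-invariant, so the compiler can
// hoist it; in practice the dispatch cost shows up once per api_call
// rather than once per iteration, which is what the benchmarks above
// measured.
fn api_call(p: &mut dyn MyProcessor, how_much: usize) {
    for _ in 0..how_much {
        p.inc();
    }
}
```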
