loading...
Cover image for Using threads on Rust (part 3)

Using threads on Rust (part 3)

dandyvica profile image Dan DiVica ・3 min read

If you followed my previous articles on using Rust native threads, you know how to parallelize a function applied to a vector of elements.
To illustrate this, a good example is the factorial computation: the product of the first n integers. This kind of computation is a perfect target to be run
on several threads: the whole process can be chopped into different pieces, and the whole product is merely the product of partial products.

To be meaningful, the factorial of such a number should be large enough and beyond u128 capabilities. As Rust doesn't get a built-in BigInteger class as in Java, I used the num crate which provides the BigUint struct. Beware this is probably not the most optimal one (compared to GMP for example).

Obviously, using a vector to calculate the factorial of an integer number is not the most efficient way. This is done only to illustrate my example.

Just import the num crate to use the BigUint type:

extern crate num;
use num::bigint::BigUint;

First, get the command line arguments:

// get arguments
let upper_bound = match args[1].parse::<u32>() {
    Ok(n) => n,
    Err(e) => panic!("error {} converting {} to an integer !", e, &args[1]),
};
let nb_threads = match args[2].parse::<u32>() {
    Ok(n) => n,
    Err(e) => panic!("error {} converting {} to an integer !", e, &args[2]),
};

To get the factorial, we first need to populate a vector of the first n BigUint:

// fill-in vector
let v: Vec<BigUint> = (1..=upper_bound).map(|i| BigUint::from(i)).collect();

Now, the mono-threaded computation is very easy, using the Product trait:

// get time for the mono-threaded product
let mut start = Instant::now();
let mono_fact = v.iter().product::<BigUint>();
let duration_mono = start.elapsed();

Computing the partial sums and the final product is the same:

// get time for multi-threaded computation
for num_thread in 2..=nb_threads {
    start = Instant::now();
    let partial_fact = v.parallel_task(num_thread as usize, prod_fn);
    let multi_fact = partial_fact.iter().product::<BigUint>();
    let duration_multi = start.elapsed();

    // validity check: check if products are equal
    assert_eq!(mono_fact, multi_fact);

    println!(
        "n={}, #threads={}, mono_threaded={:?}, {}_threaded={:?}, ratio={:.6}",
        upper_bound,
        num_thread,
        duration_mono,
        num_thread,
        duration_multi,
        duration_multi.as_nanos() as f64 / duration_mono.as_nanos() as f64
    );
}

The assert_eq! line is just here to make sure computations are equal. This is a sane safeguard against errors !

The prod_fn function is the same as in my previous articles:

// product of elements
fn prod_fn<'a, T: Product<&'a T>>(chunk: &'a [T]) -> T {
    chunk.into_iter().product::<T>()
}

Now it's possible to compare elapsed time for mono-threaded and multi-threaded computations. To be meaningful, I just created a 16-cores Amazon AWS instance, compiled the whole code in release mode, and ran it on my instance. I ran the process for n=20k, 50k, 75k, 100k, 150k and 200k.

Following are the results:

The optimal time is found between 6 and 8 cores. Several factors can explain this outcome:

  • the AWS instance processor is an Intel(R) Xeon(R) Platinum 8124M CPU @ 3.00GHz processor with 16 cores, but with the hyperthreading feature, not real cores. It seems the CPU has 16 cores but physically it's only 8 physical cores, and probably the optimal figure for computation
  • CPU cache mechanism
  • the more threads, the more the final product of products is taking time

Hope you appreciate !

Photo by Gregory Culmer on Unsplash

Posted on by:

dandyvica profile

Dan DiVica

@dandyvica

Senior software engineer & sysadmin, technical architect, Linux expert. Always willing to learn new stuff.

Discussion

markdown guide