Igor Petruk

Posted on Nov 18, 2021

Async, refactoring and fewer bugs: Rust block expressions to the rescue

#rust #codequality #programming

One of the neat features of Rust I'd like to talk about is block expressions. This subtle feature does not get the attention it deserves, as everyone is focused on more prominent language features. Block expressions help seal off unnecessary variables, clean up scope, and bring several other advantages. I'd like to go over a few examples.

First of all, a little intro.

let a = func1();
let b = func2(a);
let c = func3(a, b);

func4(&c);
func5(&c, 10);

Here the code could be divided into two parts. The first declares several variables that exist only to produce c; the second then uses c. This pattern is not fully artificial; it can be found in many relatively long functions.

In Rust, blocks delimited by {...} are expressions: a block evaluates to the value of its final expression (the one without a trailing semicolon). So there is a way to rewrite this code using block expressions.

let c = {
  let a = func1();
  let b = func2(a);
  func3(a, b)
};

func4(&c);
func5(&c, 10);

This code has some subtle differences from the first example. Blocks limit the scope of variables. a and b are internal to the block, so they are not visible in the outer scope and they are dropped at the block's closing brace. As simple as that. Using block expressions is a matter of code style that can be applied to suitable code.
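To make the drop timing visible, here is a minimal sketch using only the standard library. The Noisy type and the run function are hypothetical helpers invented for this illustration; Noisy simply records its name in a shared log when it is dropped.

```rust
use std::cell::RefCell;

// Hypothetical helper type: records its name in a shared log when dropped.
struct Noisy<'a> {
    name: &'static str,
    log: &'a RefCell<Vec<&'static str>>,
}

impl Drop for Noisy<'_> {
    fn drop(&mut self) {
        self.log.borrow_mut().push(self.name);
    }
}

// Returns the value of the block together with the recorded order of events.
fn run() -> (usize, Vec<&'static str>) {
    let log = RefCell::new(Vec::new());

    let c = {
        let a = Noisy { name: "a", log: &log };
        let b = Noisy { name: "b", log: &log };
        // The final expression (no trailing semicolon) is the block's value.
        a.name.len() + b.name.len()
        // `b` and then `a` are dropped here, at the closing brace.
    };

    log.borrow_mut().push("after block");
    (c, log.into_inner())
}
```

Calling run() should show both drops recorded before the "after block" entry, with b dropped before a because locals are dropped in reverse order of declaration.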

When applied, blocks bring some advantages that are not immediately obvious. Let's take a look.

Refactoring

Block expressions lay good groundwork for future refactoring. When a block expression is used, the compiler guarantees that its internal variables are not used anywhere else in the outer function. This makes the code in the block ready to be easily turned into a standalone function.

// Make b
let a = func1();
let b = func2(a); // Well, let's imagine it is a lot of code to get here.
// Use b
func3(b);

Yay, let’s move that to a function.

fn compute_b() -> u32 {
   let a = func1();
   func2(a)
}

let b = compute_b();
func3(b);
...

// 100 lines below:
println!("Btw, important to know, a={}", a); // Compilation error, uff!

Easy to fix probably, but it makes refactoring unpleasant, it does not satisfyingly click. Was it really important to use that a far below? Maybe yes, but often it does not matter and this code is a result of having scope hygiene as an afterthought. Block expressions help us limit the scope to just the right amount.
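As a small sketch of that "click", here is a made-up function written first with a block and then with the block extracted. All names here (total_with_block, sum_fields, total_with_function) are hypothetical and chosen only for illustration. The body of the block moves into the new function unchanged, and the only thing it used from outside, input, becomes a parameter.

```rust
// Before: the intermediate values live only inside the block.
fn total_with_block(input: &str) -> u32 {
    let total = {
        let fields = input.split(',');
        let numbers = fields.filter_map(|f| f.trim().parse::<u32>().ok());
        numbers.sum::<u32>()
    };
    total * 2
}

// After: the block body is moved into a function as-is.
fn sum_fields(input: &str) -> u32 {
    let fields = input.split(',');
    let numbers = fields.filter_map(|f| f.trim().parse::<u32>().ok());
    numbers.sum::<u32>()
}

fn total_with_function(input: &str) -> u32 {
    let total = sum_fields(input);
    total * 2
}
```

Since nothing outside the block could see fields or numbers, no later line can break when the block is lifted out.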

No boilerplate variables in the top scope

Let's up the game and see a Tokio example.

    let a = String::from("Hello World");

    let a_clone = a.clone();  //  I feel pain each time seeing this.
    let u = tokio::spawn(async move { a_clone.to_uppercase() });
    let l = tokio::spawn(async move { a.to_lowercase() });

    println!("upper={:?}, lower={:?}", u.await, l.await);

The a_clone variable is ugly, but we need it. The two async blocks each need to own their own copy of the String (an Arc would still have to be cloned for each task), so a_clone is moved into the first block and the original a ends up in the second. Let’s attempt a block expression style:

    let a = String::from("Hello World");

    let u = {
        let a = a.clone();
        tokio::spawn(async move { a.to_uppercase() })
    };
    let l = tokio::spawn(async move { a.to_lowercase() });

    println!("upper={:?}, lower={:?}", u.await, l.await);

This does not look simpler than what we had before at first glance, but this code has a few benefits. a can remain a and does not need a new name. The outer scope remains clean, so you can easily distinguish top-level variables by the indentation of their let and keep boilerplate variables at the second level of indentation.
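The same shape works outside of Tokio. Below is a minimal sketch that swaps tokio::spawn for std::thread::spawn so it runs with the standard library alone; the function name upper_and_lower is an assumption made for this example.

```rust
use std::thread;

// Hypothetical helper: computes both variants on separate threads.
fn upper_and_lower(text: &str) -> (String, String) {
    let a = String::from(text);

    let u = {
        // The shadowed copy is visible only inside this block.
        let a = a.clone();
        thread::spawn(move || a.to_uppercase())
    };
    // The original `a` is still available here and is moved into the closure.
    let l = thread::spawn(move || a.to_lowercase());

    (u.join().unwrap(), l.join().unwrap())
}
```

For example, upper_and_lower("Hello World") should give back the upper-case and lower-case strings as a pair, with no a_clone name left in the outer scope.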

No unnecessary mut variables in the top scope

Here is another example, this time with PathBuf. PathBuf::push takes &mut self, so it only works on a mutable binding.

    let mut sub_dir = dir.ok_or_else(|| format_err!("Cannot get dir"))?;
    sub_dir.push("sub");

sub_dir remains mut for the rest of the scope and we don't like that in Rust, do we?

    let sub_dir = {
        let mut d = dir.ok_or_else(|| format_err!("Cannot get dir"))?;
        d.push("sub");
        d
    };

The mutability of the variable is confined inside the initialization block.
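Here is a self-contained sketch of the same idea. It assumes dir is an Option<PathBuf> and uses a plain String as the error type instead of format_err!, so it needs no external crate; the function name sub_dir_of is hypothetical.

```rust
use std::path::PathBuf;

// Hypothetical helper: appends "sub" to an optional directory.
fn sub_dir_of(dir: Option<PathBuf>) -> Result<PathBuf, String> {
    let sub_dir = {
        // `?` inside the block still returns early from the whole function.
        let mut d = dir.ok_or_else(|| String::from("Cannot get dir"))?;
        d.push("sub");
        d
    };
    // `sub_dir` is immutable from here on; calling `sub_dir.push(..)` would not compile.
    Ok(sub_dir)
}
```

Note that the ? operator keeps working inside the block: on None the function returns the error and the rest of the block is never run.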

Fewer bugs

Now let’s use some Tokio channels.

use std::error::Error;

#[tokio::main]
async fn main() -> Result<(), Box<dyn Error>> {
    let (tx, mut rx) = tokio::sync::mpsc::channel(3);

    for sender in 0..2 {
        let tx = tx.clone();
        tokio::spawn(async move {
            for i in 0..5 {
                tx.send(i).await.unwrap();
                println!("Sent {} from sender {}", i, sender);
            }
        });
    }

    while let Some(x) = rx.recv().await {
        println!("Received {}", x);
    }

    Ok(())
}

Let’s check the output.

Sent 0 from sender 0
Sent 0 from sender 1
Sent 1 from sender 0
Received 0
Received 0
Received 1
Sent 2 from sender 0
Sent 3 from sender 0
Sent 1 from sender 1
Received 2
Received 3
Received 1
Sent 2 from sender 1
Sent 3 from sender 1
Sent 4 from sender 0
Received 2
Received 4
Received 3
Sent 4 from sender 1
Received 4

Looks correct… Nope, I’ve tricked you here. The text output is correct but the program does not exit! Can you spot the issue?

tx is cloned in the loop, so each spawned task has its own channel Sender. The problem is that the original tx stays alive until the end of the main function, while rx.recv() only returns None once every Sender has been dropped.

Indeed, drop fixes the issue and the program successfully terminates.

    drop(tx);
    while let Some(x) = rx.recv().await {

Yuck, this is like calling free() from C, otherwise it leaks. In my Rust. The Earl of Lemongrab screams “Unacceptable!”.

Since this article is about block expressions (a.k.a. “a hammer”), every problem is a nail. Let’s try. Thankfully block expressions are about things not leaking in scope further than needed.

#[tokio::main]
async fn main() -> Result<(), Box<dyn Error>> {
    let mut rx = {
        let (tx, rx) = tokio::sync::mpsc::channel(3);
        for sender in 0..2 {
            let tx = tx.clone();
            tokio::spawn(async move {
                for i in 0..5 {
                    tx.send(i).await.unwrap();
                    println!("Sent {} from sender {}", i, sender);
                }
            });
        }
        rx
    };

    while let Some(x) = rx.recv().await {
        println!("Received {}", x);
    }

    Ok(())
}

It works and terminates! It does not need a drop call. Wait, but we have been promised that refactoring is easy with block expressions, let’s try that.

use std::error::Error;
use tokio::sync::mpsc::Receiver;

async fn spawn_senders() -> Receiver<u32> {
    let (tx, rx) = tokio::sync::mpsc::channel(3);
    for sender in 0..2 {
        let tx = tx.clone();
        tokio::spawn(async move {
            for i in 0..5 {
                tx.send(i).await.unwrap();
                println!("Sent {} from sender {}", i, sender);
            }
        });
    }
    rx
}

#[tokio::main]
async fn main() -> Result<(), Box<dyn Error>> {
    let mut rx = spawn_senders().await;
    while let Some(x) = rx.recv().await {
        println!("Received {}", x);
    }
    Ok(())
}

Yes, it was. The block content is unchanged. We prepared for potential refactoring ahead of time and avoided a leak.
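The same leak and the same fix can be reproduced without Tokio. The sketch below swaps in std::sync::mpsc and std::thread so that it runs with the standard library alone; the function name collect_all is an assumption for this example. Iterating over a std Receiver ends only once every sender has been dropped, so the block plays the same role as above.

```rust
use std::sync::mpsc;
use std::thread;

// Hypothetical helper: spawns two senders and collects everything they send.
fn collect_all() -> Vec<u32> {
    let rx = {
        // A bounded channel with capacity 3, mirroring the Tokio example.
        let (tx, rx) = mpsc::sync_channel(3);
        for _sender in 0..2 {
            let tx = tx.clone();
            thread::spawn(move || {
                for i in 0..5u32 {
                    tx.send(i).unwrap();
                }
            });
        }
        rx
        // The original `tx` is dropped here, at the closing brace.
    };

    // This iteration ends once every sender clone has been dropped.
    let mut received: Vec<u32> = rx.iter().collect();
    received.sort();
    received
}
```

If tx were declared in the outer scope instead, rx.iter() would never finish, for the same reason the Tokio version above never exits.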

Performance

Longevity of objects can impact performance. I will show the most prominent example: lock guards.

Let’s say we need to process data from two RwLocks.

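use std::error::Error;
use std::sync::Arc;
use std::time::{Duration, Instant};
use tokio::sync::RwLock;

async fn slowly_process(a: i32, b: i32) -> i32 {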
    tokio::time::sleep(Duration::from_millis(1000)).await;
    a + b
}

async fn process_data_from_two_locks(a: Arc<RwLock<i32>>, b: Arc<RwLock<i32>>) -> i32 {
    let a = a.read().await;
    let b = b.read().await;
    slowly_process(*a, *b).await
}

#[tokio::main]
async fn main() -> Result<(), Box<dyn Error>> {
    let a = Arc::new(RwLock::new(1));
    let b = Arc::new(RwLock::new(2));

    let writer = {
        let a = a.clone();
        tokio::spawn(async move {
            let start = Instant::now();
            tokio::time::sleep(Duration::from_millis(100)).await;
            // A bit late to be first to the lock...
            let mut a = a.write().await;
            *a = 10;
            println!(
                "Writing took 100ms! Wait... It took: {:?}",
                Instant::now() - start
            );
        })
    };

    let result = process_data_from_two_locks(a, b).await;
    println!("Result: {}", result);
    writer.await.unwrap();
    Ok(())
}

We run our program and it prints

Result: 3
Writing took 100ms! Wait... It took: 1.000505585s

Our writer started a bit late, the processing got to the locks first, and the read locks were evidently held for a full second. The cause is that slowly_process() runs with both read locks held. The read guards release their locks only when they are dropped, and here that happens at the end of the function, when the guards go out of scope.

async fn process_data_from_two_locks(a: Arc<RwLock<i32>>, b: Arc<RwLock<i32>>) -> i32 {
    let a = a.read().await;
    let b = b.read().await;
    slowly_process(*a, *b).await
}

This is a relatively well known pitfall with scope guarded locks, whether it is defer from Go or std::lock_guard from C++. If scope is used to lock and unlock the data, that scope must be minimal.

I am not going to say “Let’s fix it with Rust block expressions”. Instead I will say “If we used block expressions from the beginning, this would not have happened”. Or simply “I told you so”.

async fn process_data_from_two_locks(a: Arc<RwLock<i32>>, b: Arc<RwLock<i32>>) -> i32 {
    let a = { *a.read().await };
    let b = { *b.read().await };
    slowly_process(a, b).await
}

As a result:

Writing took 100ms! Wait... It took: 101.377406ms

This example takes a shortcut. It was smooth because i32 is a Copy type. Read locks in general only allow you to borrow the data inside while you hold the lock. To release the lock earlier, you need to copy or clone the data you need out of the block. For example:

let field = { a.read().await.field.clone() };

The trade-off is yours to consider.
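To check that the guard really is gone after such a block, here is a minimal sketch that uses std::sync::RwLock rather than Tokio's lock, so it runs with the standard library alone. The Config type and both function names are hypothetical. try_write is used as a probe: it fails while a read guard is alive and succeeds once the guard has been released.

```rust
use std::sync::RwLock;

// Hypothetical data type guarded by the lock.
struct Config {
    field: String,
}

// Clones the field out of the lock, then checks whether a writer could get in.
fn read_field_then_write(lock: &RwLock<Config>) -> (String, bool) {
    let field = { lock.read().unwrap().field.clone() };
    // The read guard was a temporary and is gone once the statement above ends.
    let could_write = lock.try_write().is_ok();
    (field, could_write)
}

// Keeps the guard in a named variable, so it is still held during the probe.
fn guard_blocks_writer(lock: &RwLock<Config>) -> bool {
    let guard = lock.read().unwrap();
    let could_write = lock.try_write().is_ok();
    drop(guard);
    could_write
}
```

With a lock holding Config { field: String::from("value") }, the first function should report that a write is possible and the second that it is not. The price is the clone of field, which is the trade-off mentioned above.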

Conclusion

I’ve shown benefits that such a shy Rust feature as block expressions can bring to your code. It should help you to keep your scope clean and can positively impact your programs at runtime.

It goes without saying that every tool must be used sparingly. The cost of block expressions is the depth of indentation, and if overused they can make your programs unreadable. Let’s apply our best judgment.

I hope this was helpful. This is my first shot at writing articles on dev.to. I hope to keep this up.

Thanks,
Igor.

Top comments (3)

Caleb Sander • Nov 28 '21

Nice discussion! I would point out that you can also stop a variable from being mut by shadowing its binding:

// `values` is mutable so that we can add to it
let mut values = vec![];
for i in 1..10 {
  values.extend(vec![i; i]);
}
let values = values;
// `values` is no longer mutable

I agree that using a block seems more elegant, but like using drop(), this avoids a level of indentation.

Zihan Liu • Nov 27 '21

Will let a = *a.read().await; do?

Igor Petruk • Nov 27 '21

I'd say so, yes. A block expression probably helps more if you need to do a few operations on the locked value while it is still borrowed, or if you would like to combine multiple locks but keep them held no longer than necessary.
