Thread safety and Learning in Rust

#rust #backend #programming

Programming interviews can be a great learning experience for interviewers as well as interviewees. In fact, I recently had the opportunity to learn something new about Rust while interviewing an engineering candidate.

Background About our Interview Process

At OneSignal, we don't believe in handing someone a problem and watching them flounder with zero autocomplete and zero docs. Engineers who apply to join our team are allowed to use their own environments, access documentation, and treat the interviewer as a resource. If a candidate can solve a problem with some gentle assistance from the interviewer or by asking a few questions, then they're much closer to the real development experience. Senior engineers on our teams frequently act as resources for other developers, and if a candidate can get to a solution by treating the interviewer like a colleague then they're halfway home. Candidates often learn something new from the interviewer, and we've had great feedback on our process.

The Learning

The candidate was working on a problem involving concurrency and needed to share the same std::sync::mpsc::Sender with multiple threads concurrently. They wrote code analogous to the following (extremely simplified) example:

use std::sync::{RwLock, Arc, mpsc::channel};

pub fn main() {
    let (tx, rx) = channel();

    let x = Arc::new(RwLock::new(tx));

    std::thread::spawn(move || {
       x.read().unwrap().send(4u8).unwrap(); 
    });

    dbg!(rx.recv().unwrap());
}

Code that attempts to share a Sender<u8> across threads using an RwLock

Now you may initially be thinking that it's just wrong to put a channel sender inside of an RwLock — and you'd be right. Both of these types are useful for cross-thread synchronization, and it's odd to use both of them together. Each Sender is intended to be owned by a single thread, and if you need to coordinate between multiple threads you should be calling Sender::clone(). However, I urge you to recall that this was taking place during an interview, and concurrency was not the only thing going on in this program. The candidate had made other architectural decisions that led them here, and this small example has removed all of that context. Rust is a complex language with many types of synchronization primitives available, and during the time constraints of an interview, it's not unreasonable for someone to combine them in unusual ways.

The Rust compiler had a much stricter and less egalitarian view on this combination of types, however. When we tried to compile it, we were greeted with this error:

error[E0277]: `Sender<u8>` cannot be shared between threads safely
   --> src/main.rs:8:5
    |
8 | std::thread::spawn(move || {
    | ^^^^^^^^^^^^^^^^^^ `Sender<u8>` cannot be shared between threads safely
    |
    = help: the trait `Sync` is not implemented for `Sender<u8>`
    = note: required because of the requirements on the impl of `Sync` for `RwLock<Sender<u8>>`
    = note: required because of the requirements on the impl of `Send` for `Arc<RwLock<Sender<u8>>>`
    = note: required because it appears within the type `[closure@src/main.rs:8:24: 10:6]`

Compiler error that is generated by the above program

This confused both of us a lot. I know that Sender<T> can be sent across threads safely — that's the whole point of it! We tried a few things (using .write() on the RwLock instead of .read()) but we couldn't get it to compile. Eventually, I wondered if the code would work if we used the simpler Mutex instead of an RwLock. I had the candidate try this instead, and the program works. Here's what that program looks like:

use std::sync::{Mutex, Arc, mpsc::channel};

pub fn main() {
    let (tx, rx) = channel();

    let x = Arc::new(Mutex::new(tx));

    std::thread::spawn(move || {
       x.lock().unwrap().send(4u8).unwrap(); 
    });

    dbg!(rx.recv().unwrap());
}

Program which uses Mutex<Sender<u8>> instead of RwLock<Sender<u8>>

Running this program produces the output that we expected from the first one:

[src/main.rs:12] rx.recv().unwrap() = 4

It's worth pointing out that if you find yourself putting a Sender inside some type of Mutex or locking data structure, then you may want to reconsider your program's structure. But why does the RwLock version fail, and the Mutex version work?

To answer this question, we need to consider the meaning of the Send and Sync traits, the Sender, RwLock and Mutex types, and look closely at the implementations of Send and Sync for all of the types in question. Let's start with RwLock and Mutex.

RwLock vs. Mutex

Let's look at the differences between the two synchronization types that were used in this example, RwLock and Mutex. They are conceptually very similar — both allow you to guard a resource to ensure that it is accessed from multiple threads in a safe way. They have one important difference, however. Mutex holds a lock for both reads and writes, whereas RwLock treats reads and writes differently, allowing for multiple read locks to be taken in parallel but requiring exclusive access for write locks. The way that RwLock enforces single writer, multiple reader rules is the same as the single &mut, multiple & reference rules for single-threaded code. Here's a visual of how the same sequence of reads & writes occurs with a Mutex and an RwLock.

Thread safety and Learning in Rust — Visual on read vs write scheduling with `Mutex` and `RwLock`

As the diagram shows, when you have a read-heavy workload it is preferable to use an RwLock to avoid blocking reader threads when it's not required. The same sequence of events takes much longer with a Mutex than it does with an RwLock.

We thought that we could get away with using an RwLock around our Sender because the method that we needed to call (Sender::send(&self, value: T)) requires an immutable (read) reference to the underlying Sender rather than a mutable &mut self reference. This implies that we could use RwLock and RwLock::read() to guard calls to Sender::send(). This would allow us to spend much less time waiting on synchronization than if we used a Mutex around the Sender.

Send

The Send trait is the simpler of the two traits to explain conceptually. A type is Send if it is safe to transfer it across thread boundaries. This means that you can construct an instance of the type on one thread, transfer ownership to another thread, and Drop it on that other thread. Most types in Rust are Send, but there are a few exceptions. The most common non-Send types are Rc and all raw pointer types. The raw pointer types are not Send because Rust cannot make any guarantees about the synchronization mechanisms that are in the underlying types, or when the pointers will be accessed. Rc is not Send because it uses a non-atomic reference counter which is not safe to send between threads.

Sync

Sync is a bit trickier to explain. A type T is Sync if and only if &T is Send. This means that it must be safe to share immutable references of T between threads. A simpler way to think about this is to imagine if it would be safe to have multiple immutable references of type T being read from in parallel from multiple threads. The most common non-Sync types are UnsafeCell, Cell, and RefCell. These types all rely on unsynchronized mutation, so they are inherently not thread-safe.

Sender

The std::sync::mpsc::Sender type is the producer-half of a multi-producer, single-consumer channel. It's intended to be used by calling Sender::clone and using a single owned Sender per-thread. This pattern permits Sender to be non-Sync because each individual instance of Sender does not need to synchronize with itself across multiple threads. It only needs to be Send so that instances can be moved from thread to thread with ownership staying consistent.

Send + Sync in our programs

Let's look at the Send & Sync implementations for all of the types that are in our example programs. Our programs used Arc<Mutex<Sender<u8>>> (works) and Arc<RwLock<Sender<u8>>> (does not work).

impl<T: ?Sized + Send> Send for RwLock<T>
impl<T: ?Sized + Send + Sync> Sync for RwLock<T>

impl<T: ?Sized + Send> Send for Mutex<T>
impl<T: ?Sized + Send> Sync for Mutex<T>

impl<T: Send> Send for Sender<T>
impl<T> !Sync for Sender<T>

impl<T: ?Sized + Sync + Send> Send for Arc<T>
impl<T: ?Sized + Sync + Send> Sync for Arc<T>

The !Sync syntax indicates that Sender<T> is explicitly not Sync. The ?Sized syntax is not relevant to this discussion.

There are a few important things to note here:

Sender<T> is not Sync. It's not safe to have multiple immutable references to a Sender live across multiple threads at the same time. Considering that Sender::send(&self) requires only an immutable reference, there must be some kind of interior mutability going on.

Mutex<T> is both Send and Sync if T inside of it is Send. If something is inside of a Mutex, it will never have multiple immutable references live at the same time since Mutex does not allow multiple locks (read or write) to be taken at the same time. Because of this, Mutex<T> effectively bypasses the restrictions of Sync.

RwLock<T> is Send where T is Send, but it is only Sync if T is both Send + Sync. Since most types are Send + Sync, this is a non-issue. It only causes problems because the Sender<T> within it is not Sync. Recall from our explanation that RwLock allows multiple read locks to be open in parallel.

Arc<T> is only Send and Sync if the underlying T is both Send + Sync, meaning that you cannot send an Arc<T> across thread boundaries where T: !Sync.

These things all cascade to cause the compiler error that we saw in the first program. Sender<T> is not Sync meaning that RwLock<Sender<T>> is not Sync, meaning that Arc<RwLock<Sender<T>>> is neither Send nor Sync. That means that this type, despite containing three different supposedly thread safe synchronization types is not thread safe at all.

So what?

I think that this compiler error and the resulting exploration of the lock types and Send & Sync was a fantastic demonstration of the power of Rust's type system. Coming from other systems programming languages, these ideas are often much more nebulous. A type is often labeled (only via documentation) as "thread-safe" or "not thread-safe." In Rust, the Send and Sync traits allow us to express much more granular ideas about thread safety, and communicate them much more clearly. A Sender<T> is "thread safe" in that it can be sent between threads, but it cannot be shared between threads. That's what Send and !Sync tells us. There are so many layers of safety to multithreaded programming, and those layers exist in other systems languages, but in Rust we have the compiler to help us through them instead of relying on our own memory of how they work.

In an interview setting, would the un-safety of Arc<RwLock<Sender<u8>>> have caused program crashes? Maybe, maybe not. It would have caused issues under a production workload for sure, and I'm glad that we have Rust to point these things out to us for those times.

I really enjoyed going on this dive into thread safety in Rust, and it was all due to collaborating with an interview candidate on solving an interesting problem. If you like solving interesting problems, check out our jobs board and join our team!