Marc Cámara

Posted on Jan 16

A Practical Guide to Rust Smart Pointers

#rust #smartpointers

Smart pointers are one of Rust's most powerful features for managing memory and ownership. Rust has a lot of smart pointers, so keeping track of all of them is a daunting task.

In this article, I'll walk through the most common smart pointers you'll encounter in Rust, with practical examples of when to reach for each one.

Rc

Rc is one of the most commonly used smart pointers in Rust. Use it when ownership is genuinely shared between multiple owners and lifetimes become complex. Rc keeps a count of the owners of the data: every time an Rc is cloned, the count is incremented, and every time a clone is dropped, it is decremented. When the count reaches zero, the data is dropped and its memory is freed. Rc does not implement the Send or Sync traits, so it cannot be sent to or shared with other threads. Arc fixes that.
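To make the counting concrete, here is a minimal sketch using only the standard library. The function name count_owners and the string value are made up for this illustration.

```rust
use std::rc::Rc;

/// Clones an Rc `extra_owners` times and reports the strong count
/// while the clones are alive and again after they are dropped.
fn count_owners(extra_owners: usize) -> (usize, usize) {
    let shared = Rc::new(String::from("shared config"));

    // Rc::clone copies the pointer and bumps the count; the String itself is not copied.
    let clones: Vec<Rc<String>> = (0..extra_owners).map(|_| Rc::clone(&shared)).collect();
    let while_alive = Rc::strong_count(&shared);

    // Dropping the clones decrements the count once per clone.
    drop(clones);
    let after_drop = Rc::strong_count(&shared);

    (while_alive, after_drop)
}

fn main() {
    let (while_alive, after_drop) = count_owners(2);
    println!("owners while clones live: {while_alive}, after drop: {after_drop}");
}
```

The String is only freed when the last Rc pointing at it goes out of scope.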

Arc

Arc is probably the most popular smart pointer across all Rust projects. It is very similar to Rc: use it when you need multiple owners of the same data in different places of your app and there is no way to know at compile time when that data can be freed. The difference is that Arc updates its reference count atomically, so it implements the Send and Sync traits (as long as the contained type does too) and can be sent to other threads.
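A small sketch of sharing read-only data across threads with Arc, using only the standard library. The function name sum_in_threads and the vector contents are assumptions for the example.

```rust
use std::sync::Arc;
use std::thread;

/// Spawns `workers` threads that each read the same shared vector
/// and returns the sum of what every thread computed.
fn sum_in_threads(workers: usize) -> i32 {
    let data = Arc::new(vec![1, 2, 3]);

    let handles: Vec<_> = (0..workers)
        .map(|_| {
            // Each thread gets its own handle; the count is updated atomically.
            let data = Arc::clone(&data);
            thread::spawn(move || data.iter().sum::<i32>())
        })
        .collect();

    // Wait for every thread and add up their results.
    handles.into_iter().map(|h| h.join().unwrap()).sum()
}

fn main() {
    println!("total: {}", sum_in_threads(4));
}
```

Replacing Arc with Rc here would not compile, because Rc cannot cross the thread::spawn boundary.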

Box

Box is simpler than it might seem at first. It allocates data on the heap instead of the stack, which is useful in a few specific scenarios, especially when you have multiple implementations of a trait and the concrete type is only known at runtime. Like in this example:

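// Minimal trait and implementors, added here so the example compiles on its own
trait Console { fn play(&self); }
struct Nintendo;
struct Sony;
impl Console for Nintendo { fn play(&self) { println!("Playing on Nintendo"); } }
impl Console for Sony { fn play(&self) { println!("Playing on Sony"); } }

fn main() {
    // Each element can be a different concrete type behind the same trait object
    let consoles: Vec<Box<dyn Console>> = vec![
        Box::new(Nintendo),
        Box::new(Sony),
    ];

    for console in &consoles {
        console.play();
    }
}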

Box also works well when a struct contains a field of its very same type:

struct Node {
    value: i32,
    next: Option<Box<Node>>, // Box breaks the infinite size chain
} 

fn main() {
    let node2 = Node { value: 2, next: None };
    let node1 = Node { value: 1, next: Some(Box::new(node2)) };
}

RwLock

RwLock is great too! It gives you thread-safe mutable state that can be shared across the app and read from multiple places at the same time, because read locks are not exclusive. The only downside is that a write lock is exclusive: if you request one while any read or write guard is still alive, the request blocks until those guards are dropped (and new readers have to wait while a writer holds the lock), but that's the price to pay for data consistency.
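The read and write behaviour can be sketched with the standard library alone. The function name read_then_write and the vector contents are made up for this illustration, and the RwLock is wrapped in an Arc so several threads can own it.

```rust
use std::sync::{Arc, RwLock};
use std::thread;

/// Lets three threads read the shared vector, then appends to it under a write lock.
/// Returns the summed lengths seen by the readers and the final contents.
fn read_then_write() -> (usize, Vec<i32>) {
    let shared = Arc::new(RwLock::new(vec![1, 2, 3]));

    let readers: Vec<_> = (0..3)
        .map(|_| {
            let shared = Arc::clone(&shared);
            thread::spawn(move || {
                // Read guards are shared: several threads may hold one at once.
                let guard = shared.read().unwrap();
                guard.len()
            })
        })
        .collect();
    let total_len: usize = readers.into_iter().map(|h| h.join().unwrap()).sum();

    {
        // The write guard is exclusive; it is released at the end of this block.
        let mut guard = shared.write().unwrap();
        guard.push(4);
    }

    let snapshot = shared.read().unwrap().clone();
    (total_len, snapshot)
}

fn main() {
    println!("{:?}", read_then_write());
}
```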

Mutex

Mutex is similar to RwLock but a bit simpler. The difference is that Mutex locks the data whether it is being written or read. This means that while a Mutex guard is alive, no other thread can read or write the data until the guard is dropped. Use Mutex when your data needs exclusive access for both reads and writes, or when you want simpler semantics without worrying about read/write lock distinctions. In my particular case, I usually find myself using RwLock most of the time.
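As a rough illustration, here is the classic shared counter, written with the standard library only. The function name count_with_threads is an assumption for the example.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

/// Increments one shared counter from several threads and returns the final value.
fn count_with_threads(threads: usize, increments: usize) -> usize {
    let counter = Arc::new(Mutex::new(0usize));

    let handles: Vec<_> = (0..threads)
        .map(|_| {
            let counter = Arc::clone(&counter);
            thread::spawn(move || {
                for _ in 0..increments {
                    // lock() blocks until no other thread holds the guard.
                    let mut guard = counter.lock().unwrap();
                    *guard += 1;
                    // The guard is dropped here, which releases the lock.
                }
            })
        })
        .collect();

    for handle in handles {
        handle.join().unwrap();
    }

    let total = *counter.lock().unwrap();
    total
}

fn main() {
    println!("final count: {}", count_with_threads(4, 1000));
}
```

Because every increment happens under the lock, no update is lost regardless of how the threads interleave.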

Cell

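Cell is a mutable container for single-threaded contexts. It offers interior mutability: you can change the value inside through a shared reference, and because Cell is not Sync there is no thread safety to worry about. The trade-offs are that it never hands out references to its contents, so reading with get only works for Copy types (other types have to be moved in and out with replace or take), and that it makes mutation less visible.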
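A minimal sketch of interior mutability with Cell. The Page type and the function names are hypothetical and exist only for this example. Note that get needs a Copy type, while replace moves values in and out and therefore also works for a String.

```rust
use std::cell::Cell;

/// Hypothetical page type, used only for this illustration.
struct Page {
    title: String,
    views: Cell<u32>,
}

impl Page {
    /// Takes &self, yet still bumps the counter thanks to Cell.
    fn title(&self) -> &str {
        self.views.set(self.views.get() + 1);
        &self.title
    }
}

/// Reads the title `reads` times and returns the recorded view count.
fn views_after(reads: u32) -> u32 {
    let page = Page { title: String::from("Home"), views: Cell::new(0) };
    for _ in 0..reads {
        let _ = page.title();
    }
    page.views.get()
}

/// Swaps a non-Copy value; returns the old and the new contents.
fn replace_label() -> (String, String) {
    let label = Cell::new(String::from("draft"));
    let old = label.replace(String::from("published"));
    (old, label.into_inner())
}

fn main() {
    println!("views: {}", views_after(3));
    println!("{:?}", replace_label());
}
```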

RefCell

RefCell is pretty similar to Cell, but it hands out references to the contained value, so the value does not need to implement Copy. It can be dangerous: if you call borrow_mut while any other borrow (mutable or immutable) is still alive, or borrow while a mutable borrow is alive, your app will panic. This is one of the few places where Rust enforces the borrowing rules at runtime instead of at compile time.
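The runtime check can be observed without panicking by using try_borrow_mut, which returns an error instead. This is only a sketch, and the function name push_and_check is made up for the example.

```rust
use std::cell::RefCell;

/// Mutates a vector through a RefCell, then shows that a mutable borrow
/// is refused while an immutable one is still alive.
fn push_and_check() -> (Vec<i32>, bool) {
    let log = RefCell::new(vec![1, 2]);

    {
        let mut writer = log.borrow_mut();
        writer.push(3);
    } // the mutable borrow ends here

    let reader = log.borrow();
    // borrow_mut() would panic at this point; try_borrow_mut() reports the conflict as Err.
    let conflict = log.try_borrow_mut().is_err();

    (reader.clone(), conflict)
}

fn main() {
    let (contents, conflict) = push_and_check();
    println!("contents: {contents:?}, conflicting borrow rejected: {conflict}");
}
```

Keeping borrows in short scopes, as the inner block does, is the usual way to avoid these panics.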

LazyLock

LazyLock is pretty popular for singleton-like global initialization in a multi-threaded context. It implements the Sync trait (for values that are themselves thread-safe), so it can be used in statics, which makes it perfect for database connections, HTTP clients, etc. shared across threads. It is lazily initialized, meaning that the initialization code only runs the first time the value is accessed. The only downside is that the initialization code must be synchronous. Lazy smart pointers need their initialization logic upfront: the closure is supplied when the value is declared, so for a static it must be known at compile time.
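Here is a dependency-free sketch of the lazy behaviour (LazyLock needs Rust 1.80 or newer). The statics GREETING and INIT_CALLS and the function name are hypothetical; the atomic counter is there only to show that the closure runs once.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::LazyLock;

// Counts how many times the initializer below actually runs.
static INIT_CALLS: AtomicUsize = AtomicUsize::new(0);

// Hypothetical global; nothing is computed until the first access.
static GREETING: LazyLock<String> = LazyLock::new(|| {
    INIT_CALLS.fetch_add(1, Ordering::SeqCst);
    format!("Hello from {}", "LazyLock")
});

/// Accesses the global twice and reports both lengths plus the initializer call count.
fn greeting_len_twice() -> (usize, usize, usize) {
    let first = GREETING.len(); // the first access runs the closure
    let second = GREETING.len(); // later accesses reuse the stored value
    (first, second, INIT_CALLS.load(Ordering::SeqCst))
}

fn main() {
    println!("{:?}", greeting_len_twice());
}
```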

LazyCell

LazyCell is similar to LazyLock but without locking and without implementing the Sync trait, so it is only good for single-threaded contexts. Like LazyLock, it is lazily initialized. I haven't found a use case for it in my projects yet, as I always lean towards LazyLock or OnceLock.
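One place where LazyCell can fit is a local value that is expensive to compute and may not be needed at all. The sketch below assumes Rust 1.80 or newer; the function name lazy_square and the run counter are made up for the illustration.

```rust
use std::cell::{Cell, LazyCell};

/// Squares `input` lazily and reports: closure runs before first use,
/// the sum of two reads, and closure runs afterwards.
fn lazy_square(input: u64) -> (u32, u64, u32) {
    let runs = Cell::new(0u32);

    // Nothing is computed here; the closure is only stored.
    let squared = LazyCell::new(|| {
        runs.set(runs.get() + 1);
        input * input
    });

    let before = runs.get();
    // Dereferencing twice still runs the closure only once.
    let value = *squared + *squared;

    (before, value, runs.get())
}

fn main() {
    println!("{:?}", lazy_square(4));
}
```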

OnceLock

OnceLock fixes the LazyLock async problem: it does not require an initialization closure upfront and lets you set the value later, even after performing an async operation. Once smart pointers can be initialized with different logic across your app depending on the context, but only once: the first call to the set function stores the value, and any later call returns an error and leaves the stored value untouched.
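The "first write wins" rule can be sketched with the standard library alone. The static APP_NAME, the function first_write_wins and the string values are hypothetical names for this example.

```rust
use std::sync::OnceLock;

// Hypothetical global that starts empty and is filled in during startup.
static APP_NAME: OnceLock<String> = OnceLock::new();

/// Tries to store every candidate in a fresh OnceLock and returns the one that stuck.
fn first_write_wins(candidates: &[&str]) -> Option<String> {
    let slot: OnceLock<String> = OnceLock::new();
    for name in candidates {
        // set returns Err(value) once the slot is initialized, so later writes are ignored.
        let _ = slot.set(name.to_string());
    }
    slot.get().cloned()
}

fn main() {
    // The value can come from anything computed at runtime, not only from a closure.
    let computed = format!("billing-{}", 2);
    if APP_NAME.set(computed).is_err() {
        println!("APP_NAME was already set");
    }
    println!("app name: {:?}", APP_NAME.get());
    println!("{:?}", first_write_wins(&["billing", "reports"]));
}
```

Reading with get returns an Option, so callers have to handle the case where nothing has been set yet.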

OnceCell

OnceCell is the other Once smart pointer. Like LazyCell, it is not thread-safe, but it can be initialized with different logic across your app.
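A common single-threaded use is caching a value inside a struct the first time it is requested. This is only a sketch: the Report type and the function cached_total are hypothetical.

```rust
use std::cell::OnceCell;

/// Hypothetical report type that caches an expensive total.
struct Report {
    rows: Vec<u32>,
    total: OnceCell<u32>,
}

impl Report {
    fn total(&self) -> u32 {
        // The closure only runs if the cell is still empty.
        *self.total.get_or_init(|| self.rows.iter().sum())
    }
}

/// Returns: was the cell empty at first, the computed total,
/// and whether a later set was rejected.
fn cached_total(rows: Vec<u32>) -> (bool, u32, bool) {
    let report = Report { rows, total: OnceCell::new() };
    let empty_before = report.total.get().is_none();
    let total = report.total();
    // The cell is already filled, so this set fails and the value stays as it was.
    let second_set_rejected = report.total.set(0).is_err();
    (empty_before, total, second_set_rejected)
}

fn main() {
    println!("{:?}", cached_total(vec![1, 2, 3]));
}
```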

Lazy vs Once, a quick comparison

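// Lazy vs Once
// Assumes the tokio and sqlx crates plus a stripe crate exposing Client::new (such as async-stripe)
use std::env;
use std::sync::{LazyLock, OnceLock};

use sqlx::PgPool;

// Lazy
pub static STRIPE_CLIENT: LazyLock<stripe::Client> = LazyLock::new(|| {
    // If this closure panics, the LazyLock is poisoned and every later access panics too
    let api_key = env::var("KEY").unwrap();
    stripe::Client::new(api_key)
});

// Once
// Can be set from anywhere, even async contexts
static DB_POOL: OnceLock<PgPool> = OnceLock::new();

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // DATABASE_URL is a placeholder name for wherever the connection string lives
    let postgres_url = env::var("DATABASE_URL")?;
    let pool = PgPool::connect(&postgres_url).await?;
    // set returns Err if the pool was already initialized; ignored here
    let _ = DB_POOL.set(pool);
    Ok(())
}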

Cow

Cow (clone on write) wraps data that is either borrowed or owned. Use it when a shared reference to existing data is enough most of the time, but sometimes you have to create a new value and keep ownership of it. Rc::make_mut and Arc::make_mut also have clone-on-write semantics, but on top of a reference count. Cow is useful when you want to avoid cloning data that you only need to read. An example is better than a hundred words:

use std::borrow::Cow;

fn process_string(input: Cow<'_, str>) -> Cow<'_, str> {
    if input.contains("ERROR") {
        // Modification needed: replace allocates a new String, returned as owned data
        Cow::Owned(input.replace("ERROR", "WARNING"))
    } else {
        // No modification needed - just return the input as it came, with no clones
        input
    }
}

Pin

Pin guarantees that the data it points to will not be moved to a different memory location. It works well for storing futures that you don't want to await at the moment they are created. This matters because some futures are self-referential: moving such a future after it has been polled would invalidate its internal pointers and result in undefined behavior. Use Pin in a situation like the following:

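use std::future::Future;
use std::pin::Pin;

use futures::future; // join_all comes from the futures crate

#[tokio::main]
async fn main() {
    // Every async block has its own anonymous type, so each one is boxed and pinned
    // behind the same trait object type to fit in a single Vec
    let futures: Vec<Pin<Box<dyn Future<Output = &'static str>>>> = vec![
        Box::pin(async {"Hello world!"}),
        Box::pin(async {"Bye world!"}),
    ];

    // Now we can call await on all the pinned futures
    let results = future::join_all(futures).await;

    //...
}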

Weak

Weak is not as popular as the reference-counted pointers it complements (Rc and Arc). It holds a weak reference that is counted separately and does not keep the data alive. Once all strong references are dropped, the data is dropped too, and calling upgrade on any remaining Weak pointer returns None. It is useful when you want to read data that may or may not still be there, and for breaking reference cycles, for example a child node pointing back at its parent.
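A short sketch of what happens to a Weak pointer before and after its data goes away, using the Rc flavour from the standard library. The function name observe and the value 42 are made up for the example.

```rust
use std::rc::{Rc, Weak};

/// Returns what a Weak pointer sees while the owner is alive and after it is
/// dropped, plus the weak count recorded while the owner was alive.
fn observe() -> (Option<i32>, Option<i32>, usize) {
    let owner = Rc::new(42);

    // downgrade creates a Weak pointer without increasing the strong count.
    let observer: Weak<i32> = Rc::downgrade(&owner);
    let weak_count = Rc::weak_count(&owner);

    // upgrade returns Some(Rc) only while at least one strong owner exists.
    let while_alive = observer.upgrade().map(|rc| *rc);

    drop(owner); // last strong reference gone, the value is dropped
    let after_drop = observer.upgrade().map(|rc| *rc);

    (while_alive, after_drop, weak_count)
}

fn main() {
    println!("{:?}", observe());
}
```

The same pattern is available for Arc through std::sync::Weak.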

Wrapping up

And that's it! That was a quick tour of one of the most important concepts in Rust: smart pointers. I find them incredibly useful for managing memory and ensuring data integrity in concurrent environments, but they can be dangerous if you don't choose the right one for the job.

Personally, I tend to lean towards RwLock, Rc and Arc for most cases. However, all of them can be useful in different scenarios, so it's good to know their strengths and weaknesses.

DEV Community