Imaculate

Posted on Oct 3, 2020

That's so Rusty!: Smart pointers

#rust #safety #pointers

If you've been following this series, the post about Ownership probably gave you the impression that Rust is a no-brainer and C++ should never be used in production. Smart pointers may change your mind. In modern terms, Smart pointers are pointers which are a little (um ...) extra. They are essentially memory addresses that manage the object they point to and automatically clean up when the object is no longer in use. This eliminates lots of bugs caused by memory mismanagement and makes programming a lot less tedious. While C++ smart pointers provide a safe alternative to raw pointers, Rust smart pointers extend the language capabilities while maintaining safety.

Smart pointers are dereferenced like regular pointers but exhibit varying behavior on assignment and deallocation. As a result, there are different kinds of smart pointers. In this post we'll explore how they are used to implement variations of Linked Lists namely:
1) A singly linked list
2) A shared linked list
3) A doubly linked list

1. Simple linked list

A linked list is a linear collection of nodes in which each node points to the next. In a singly linked list, each Node has its own data and a pointer to the next Node, the last Node points to NULL indicating end of list.

A. Rust

In Rust a Singly Linked List Node could be defined as below:

struct Node {
    value: i32,
    next: Node,
}

But it won't compile for various reasons. First, since next can be NULL, next should be an Option<Node>, the NULL equivalent in Rust. Moreover, a Rust struct must have known size at compile time. If Option<Node> is a field of Node, the size of Node can be as long as the length of list, potentially infinite. To workaround that, pointers are used since they have finite size, they are afterall just addresses. The most straight forward smart pointer is the Box (Box<T>). It allocates data on the heap and when it goes out of scope both the pointer and data it points to are dropped. On assignment, Boxes follow Rust ownership rules; on assignment, ownership of both data and pointer is moved. Changing the type of next to Box<Option<T>> precisely captures the essence of a node.

Below example shows how two nodes are singly linked to a list.

struct Node {
    value: i32,
    next: Box<Option<Node>>,
}

fn main() {
    let a = Node {
        value: 5,
        next: Box::new(None),
    };
    let b = Node {
        value: 10,
        next: Box::new(Some(a)),
    };
    println!("b is {:?}", b);
    // println!("a is {:?}", a);
}

It runs successfully but uncommenting last print statement leads to compile error since a was moved when it was assigned to b.next.

B. C++

The C++ equivalent to Boxes are unique pointers. As the name implies, unique pointers own objects explicitly, when deallocation conditions are met it deletes the managed object irrespective of other pointers pointing to it. For this reason, there should be only one unique pointer managing an object. To assign the object to another unique pointer, it has to be moved; ownership is transferred and the previous pointer is nullified. Sound familiar? It should, because Rust ownership system behaves similarly. C++ unique pointers have similar benefits but they don't provide compile-time guarantee of memory safety; dereferencing a nullified pointer will error-out at runtime. Below is an example of how a linked list node can be implemented with unique pointers.

#include <iostream>
#include <memory>
#include <utility>
using namespace std;

struct Node
{
    int value;
    unique_ptr<Node> next;
};

void printNode(unique_ptr<Node> n)
{
    if (n != nullptr)
    {
        cout << "value: " << n->value << ", ";
        printNode(move(n->next));
    }
    cout << '\n';
}

int main()
{
    Node a{5, nullptr};
    unique_ptr<Node> upA(&a);
    Node b{10, move(upA)};
    unique_ptr<Node> upB(&b);
    printNode(move(upB));
    // printNode(move(upA));
}

printNode() has been implemented since C++ doesn't generate toString() implementation like Rust. Unique pointer upA is moved to be assigned to next of node b, the pointers also have to be moved to be passed to functions. Uncommenting the last print statement results in segmentation fault since the upA is null.

2. Shared linked list

A shared linked list is one in which two or more lists share one or more segments. The diagram below shows an example where segment C-D is shared by two lists that start at A and B respectively.

A. Rust

To support shared lists, nodes have to be able to have multiple owners. Can we use Boxes for this type of lists?

#[derive(Debug)]
struct Node {
    value: i32,
    next: Box<Option<Node>>,
}

fn main() {
    let a = Node {
        value: 5,
        next: Box::new(None),
    };
    let b = Node {
        value: 10,
        next: Box::new(Some(a)),
    };
    println!("b is {:?}", b);
    let c = Node {
        value: 20,
        next: Box::new(Some(a)),
    };
}

The compiler disagrees because boxes can only have one owner.

To support multiple ownership Rust has Reference Counted smart pointers, shortened as Rc<T>.Rc<T> pointers are shared by cloning which creates a copy that points to the same data and increments the reference count. This count is decremented when the pointers become invalid.

To make the node shareable, the type is next is changed to Rc<Option<Node>> in place of Box<Option<Node>>. This change justifies defining another struct, SharedNode to distingush from the simple Node. The node in a is shared by b and c by cloning its smartpointer. This time around, the compiler is satisfied.

#[derive(Debug)]
struct SharedNode {
    value: i32,
    next: Rc<Option<SharedNode>>,
}

use std::rc::Rc;

fn main() {
    let a = Rc::new(Some(SharedNode {
        value: 5,
        next: Rc::new(None),
    }));
    let b = SharedNode {
        value: 10,
        next: Rc::clone(&a),
    };
    let c = SharedNode {
        value: 20,
        next: Rc::clone(&a),
    };
    println!("a is {:?}", a);
    println!("b is {:?}", b);
    println!("c is {:?}", c);
}

Reference counts

It is also possible to trace how the reference count is updated using the function Rc::strong_count(). In the example below, reference count of SharedNode managed by a increases on cloning it to link to nodes b and c. The count decreases when c goes out of scope.

let a = Rc::new(Some(SharedNode {
    value: 5,
    next: Rc::new(None),
}));
println!("Rc count of a after creating a = {}", Rc::strong_count(&a));
let b = SharedNode {
    value: 10,
    next: Rc::clone(&a),
};
println!("Rc count of a after creating b = {}", Rc::strong_count(&a));
{
    let c = SharedNode {
        value: 20,
        next: Rc::clone(&a),
    };
    println!("Rc count of a after creating c = {}", Rc::strong_count(&a));
}
println!(
    "Rc count of a after c goes out of scope = {}",
    Rc::strong_count(&a)
);

Mutation

We established in mutables post that Rust doesn't prefer default mutability partly because multiple mutable references lead to data races and race conditions. It is crucial that smart pointers are mutable, otherwise they would have limited functionality. To bridge the gap, Rust offers RefCell<T>, another type of smart pointer pointer which offers interior mutability: a way to mutate immutable references by deferring borrowing rules enforcement to runtime. Interior mutability is useful but since references are analyzed at runtime vs compile time, it can allow unsafe code to blow-up at runtime and also cause performance regression.

The example below demonstrates how Rc<T> and Box<T> types can be mutated. RefCell<T> has borrow_mut() function which returns a mutable smart pointer RefMut<T>, the pointer can be dereferenced (with * operator) and mutated. Borrowing rules still apply therefore if more than one RefCell<T> is used in same scope, the program will panic at runtime.

#[derive(Debug)]
struct Node {
    value: i32,
    next: Box<RefCell<Option<Node>>>,
}

#[derive(Debug)]
struct SharedNode {
    value: i32,
    next: Rc<RefCell<Option<SharedNode>>>,
}
use crate::List::{Cons, Nil};
use std::cell::RefCell;
use std::rc::Rc;

fn main() {
    println!("Mutating node");
    let node_a = Node {
        value: 5,
        next: Box::new(RefCell::new(None)),
    };
    let a = Box::new(RefCell::new(Some(node_a)));
    let b = Node { value: 10, next: a };
    println!("Before mutation b is {:?}", b);

    if let Some(ref mut x) = *b.next.borrow_mut() {
        (*x).value += 10;
    }
    println!("After mutation b is {:?}", b);

    println!("Mutating shared node ...");
    let node_a = SharedNode {
        value: 5,
        next: Rc::new(RefCell::new(None)),
    };
    let a = Rc::new(RefCell::new(Some(node_a)));

    let b = SharedNode {
        value: 10,
        next: Rc::clone(&a),
    };
    let c = SharedNode {
        value: 20,
        next: Rc::clone(&a),
    };
    println!("Before mutation a is {:?}", a);
    println!("Before mutation b is {:?}", b);
    println!("Before mutation c is {:?}", c);

    if let Some(ref mut x) = *a.borrow_mut() {
        (*x).value += 10;
    }

    println!("After mutation a = {:?}", a);
    println!("After mutation b = {:?}", b);
    println!("After mutation c = {:?}", c);
}

B. C++

The C++ equivalent to Rc<T> is the shared pointer. It exhibits similar reference counting behaviour and has the advantage of easier mutation. The snippet below shows how it is used to create a shared linked list.

#include <iostream>
#include <memory>
#include <utility>
using namespace std;

struct SharedNode
{
    int value;
    shared_ptr<SharedNode> next;
};

void printSharedNode(shared_ptr<SharedNode> n)
{
    if (n != nullptr)
    {
        cout << "value: " << n->value << " -> ";
        printSharedNode(n->next);
    }
    cout << '\n';
}

int main()
{
    SharedNode node_a{5, nullptr};
    shared_ptr<SharedNode> a(&node_a);
    cout << "Reference count of a: " << a.use_count() << endl;
    SharedNode node_b{10, a};
    shared_ptr<SharedNode> b(&node_b);
    cout << "Reference count of a, after linking to b: " << a.use_count() << endl;
    SharedNode node_c{20, a};
    shared_ptr<SharedNode> c(&node_c);
    cout << "Reference count of a, after linking to c: " << a.use_count() << endl;

    // mutation
    a->value = 2;

    printSharedNode(a);
    printSharedNode(b);
    printSharedNode(c);

    a.reset();
    cout << "Reference count of a on reset: " << a.use_count() << endl;
    // cout << " a is " << a->value << endl;
}

The output looks like:

Although shared pointers are easier to work with, they are not immune to C++ safety concerns. Uncommenting the last print statement above results in seg-fault at runtime.

3. Doubly linked list

In a doubly linked list, each node has pointer to the next as well previous node in the list. Therefore, a doubly linked Node has prev field that is of the same type as next.

A. Rust

Using the pointers we've used so far we can create doubly linked node named DoubleNode. Setting and updating prev and next fields requires interior mutability hence the need for RefCell<T>. To allow the DoubleNode to be owned by next and previous node we'll use Rc<T>. The prev and next fields of border nodes are nullable we'll use Option<DoubleNode>. Therefore, the type of prev and next fields becomes Rc<RefCell<Option<DoubleNode>>>.

For simplicity, we create a list of two nodes node_a and node_b and their corresponding pointers a and b. node_b is created with a clone of a as its next element and using interior mutability, the prev element of node_a is made to point to node_b. This is done in the snippet below which prints node information and reference count before and after linking the nodes.

use std::cell::RefCell;
use std::rc::Rc;

#[derive(Debug)]
struct DoubleNode {
    value: i32,
    next: Rc<RefCell<Option<DoubleNode>>>,
    prev: Rc<RefCell<Option<DoubleNode>>>,
}

fn main() {
    let node_a = DoubleNode {
        value: 5,
        next: Rc::new(RefCell::new(None)),
        prev: Rc::new(RefCell::new(None)),
    };
    let a = Rc::new(RefCell::new(Some(node_a)));
    let node_b = DoubleNode {
        value: 10,
        next: Rc::clone(&a),
        prev: Rc::new(RefCell::new(None)),
    };
    let b = Rc::new(RefCell::new(Some(node_b)));

    println!(
        "Before linking a is {:?}, rc count is {}",
        a,
        Rc::strong_count(&a)
    );
    println!(
        "Before linking b is {:?}, rc count is {}",
        b,
        Rc::strong_count(&b)
    );

    if let Some(ref mut x) = *a.borrow_mut() {
        (*x).prev = Rc::clone(&b);
    }

    println!("After linking a rc count is {}", Rc::strong_count(&a));
    //println!("After linking a is {:?}", a);
    println!("After linking b rc count is {}", Rc::strong_count(&b));
    //println!("After linking b is {:?}", b);
}

It compiles and runs just fine but the output gets interesting when the last two commented print statements are uncommented.

Printing any node goes infinitely in cycles, resulting in stack overflow. This is because to extract the string representation of a node, we have expand all its fields. To print node_a, we print its fields: value (5), next (None) and prev(node_b), prev points to a DoubleNode therefore we print it in a similar fashion: value (10), next (node_a) and prev(None), next points to DoubleNode so we expand on it, returning us to printing node_a again, the cycle continues infinitely. This is an example reference cycle which has manifested in stack overflow.

Another consequence of reference cycles is memory leak which happens when memory is not released. On successfully running the snippet above, it is observed that both pointers a and b have a reference count of 2. At the end of main, Rust will attempt to drop b, this leaves 1 reference to node_b, the prev pointer of node_a. This reference count will remain 1 indefinitely, preventing the node_b from being dropped. As a result, both nodes will not be dropped, leading to memory leak. Since the program above exits shortly, the OS cleans the memory. Memory leaks are more severe in long running programs like servers. This is one of the few bugs that can slip past the Rust compiler.

Does this mean it is impossible to implement doubly linked lists in Rust? No. It is possible with another kind of pointer called weak pointer. A weak pointer is a smart pointer that holds a non-owning reference to an object that is managed by a shared pointer.
Denoted as Weak<T>, weak pointers are similar to Rc<T> in that they can share ownership, but they don't weigh against deallocation. The example below shows how they solve our Doubly Linked List mystery.

use std::cell::RefCell;
use std::rc::{Rc, Weak};

#[derive(Debug)]
struct DoubleNode {
    value: i32,
    next: Rc<RefCell<Option<DoubleNode>>>,
    prev: Weak<RefCell<Option<DoubleNode>>>,
}

fn main() {
    let node_a = DoubleNode {
        value: 5,
        next: Rc::new(RefCell::new(None)),
        prev: Weak::new(),
    };
    let a = Rc::new(RefCell::new(Some(node_a)));
    let node_b = DoubleNode {
        value: 10,
        next: Rc::clone(&a),
        prev: Weak::new(),
    };
    let b = Rc::new(RefCell::new(Some(node_b)));

    println!(
        "Before cycle a is {:?}, rc count is {}",
        a,
        Rc::strong_count(&a)
    );
    println!(
        "Before cycle b is {:?}, rc count is {}",
        b,
        Rc::strong_count(&b)
    );

    if let Some(ref mut x) = *a.borrow_mut() {
        (*x).prev = Rc::downgrade(&b);
    }

    println!(
        "After cycle a rc count is {}, weak count is {}",
        Rc::strong_count(&a),
        Rc::weak_count(&a)
    );
    println!("After cycle a is {:?}", a);
    println!(
        "After cycle b rc count is {}, weak count is {}",
        Rc::strong_count(&b),
        Rc::weak_count(&b)
    );
    println!("After cycle b is {:?}", b);
}

The absence of stack overflow on printing the node is a sign reference cycles have been removed.

This has been achieved by making the prev pointer a weak pointer. A weak pointer is made by downgrading a shared pointer instead of cloning it and doesn't affect the effective reference count.

By tracing reference counts we can see how reference cycles are prevented. After doubly linking the nodes, a has strong count of 2 and b has strong count of 1 and weak count of 1. At the end of main, Rust will attempt to drop b, leaving node_b with weak count of 1. Since weak pointers have no bearing on deallocation, the node is dropped. After node_b is dropped, its link to a is removed, reducing node_a's strong count to 1. When a goes out of scope, node_a's strong count becomes 0, allowing it to be dropped. Essentially, reference cycles have been solved by reducing the importance of some references. This is also evident in the output where weak pointers are not expanded but simply annotated as (Weak).

B. C++

The equivalent of Rust weak pointers in C++ are well, weak pointers. They are used in a similar way to prevent reference cycles. They can be used to implement doubly linked list as shown below.

#include <iostream>
#include <memory>
#include <utility>
using namespace std;

struct DoubleNode
{
    int value;
    shared_ptr<DoubleNode> next;
    weak_ptr<DoubleNode> prev;
};


void printDoubleNode(shared_ptr<DoubleNode> n)
{
    if (n != nullptr)
    {
        cout << "value: " << n->value << ", prev: (Weak) ->";
        printDoubleNode(n->next);
    }
    cout << '\n';
}

int main()
{
    DoubleNode node_a{5, nullptr, weak_ptr<DoubleNode>()};
    shared_ptr<DoubleNode> a(&node_a);
    DoubleNode node_b{10, a, weak_ptr<DoubleNode>()};
    shared_ptr<DoubleNode> b(&node_b);
    cout << "Before linking, rc count of a: " << a.use_count() << endl;
    cout << "Before linking, rc count of b: " << b.use_count() << endl;
    printDoubleNode(a);
    printDoubleNode(b);

    a->prev = b;
    cout << "After linking, rc count of a: " << a.use_count() << endl;
    cout << "After linking, rc count of b: " << b.use_count() << endl;
    printDoubleNode(a);
    printDoubleNode(b);
}

The output confirms that weak pointers do not affect reference counts.

Apart from syntactical differences, it looks like Rust smart pointers are very similar to C++. They were designed to address similar problems. While Rust smart pointers maintain compile time guarantees (except for reference cycles), C++ smart pointers are easier to manipulate and reference count operations are thread safe. Which ones do you prefer?

Top comments (3)

Pengfei Gao • May 8 '22

This article is really helpful. I have a little question. In the C++ version of Simple linked list, the Node a and Node b defined in main are in the stack I think. After the call to printNode, they may be freed. However, when the main procedure ends, I think Node a and Node b will be freed again, leading to double free. Is that correct? Thanks a lot!