Arunanshu Biswas

Posted on Nov 4, 2021 • Edited on Nov 28, 2021 • Originally published at arunanshub.hashnode.dev

Self Referential Structs in Rust (Part 1)

#rust #programming #tutorial

TL;DR

Use Pin<Box<T>> for the held value (here, Me).
Mark Me as unpinnable by setting a field as std::marker::PhantomPinned.

Eg, _pinned: std::marker::PhantomPinned.
Use Rc<RefCell<H>> for holder, (here, Holder) and store Me as raw
pointers.

Eg, me: BTreeSet<*const Me>.
Store Holder within Me using Rc::clone(...).
Store Me within Holder by converting it into a raw pointer. It is completely upto you if you want to store a mutable raw pointer or a const raw pointer.
Implement Drop trait such that it removes all references to Me from Holder once it is dropped.

Click here to jump to solution.

Introduction

There are mainly two ways to represent self-referential struct in Rust.

Reference within the struct:

struct RefWithinMe {
    value: String,

    pointer_to_value: &'? str,
}

These can easily be solved using ouroboros.

#[macro_use]
extern crate ouroboros;

#[self_referential]
struct RefWithinMe {
    value: String,

    #[borrows(value)]
    pointer_to_value: &'this str,
}

We won't be discussing about these.

Reference outside the struct (A referential cycle):

use std::collections::BTreeSet;

#[derive(Ord, PartialEq, Eq, PartialOrd)]
struct Me<'a> {
    /// The my_holder that I hold.
    my_holder: &'a Holder<'a>,
}

/// The my_holder that holds `Me`.
#[derive(Ord, PartialEq, Eq, PartialOrd)]
struct Holder<'a> {
    set_of_me: BTreeSet<&'a Me<'a>>,
}

impl Me<'a> {
    fn new(my_holder: &'a mut Holder<'a>) -> Self {
        let this = Self {
            my_holder,
        };
        // Insert myself here.
        my_holder.set_of_me.insert(&this);

        // but the value is being moved 🤔
        this
    }
}

We are going to discuss about this.

Problem Description

Given that you have two structs, a holder (Holder, say) and a struct to hold something (Me, say), how will you establish a circular reference?

Experimentation

Try to run the above code and see what happens.

You get an error where Rust complaints about:

error[E0502]: cannot borrow `my_holder.set_of_me` as mutable because it is also borrowed as immutable
   |
15 | impl<'a> Me<'a> {
   |      -- lifetime `'a` defined here
...
18 |            my_holder,
   |            ---------
   |            |
   |            immutable borrow occurs here
   |            requires that `*my_holder` is borrowed for `'a`
...
21 |        my_holder.set_of_me.insert(&this);
   |        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ mutable borrow occurs here

error[E0597]: `this` does not live long enough
   |
15 | impl<'a> Me<'a> {
   |      -- lifetime `'a` defined here
...
21 |        my_holder.set_of_me.insert(&this);
   |        ---------------------------^^^^^-
   |        |                          |
   |        |                          borrowed value does not live long enough
   |        argument requires that `this` is borrowed for `'a`
...
25 |    }
   |    - `this` dropped here while still borrowed

error[E0505]: cannot move out of `this` because it is borrowed
   |
15 | impl<'a> Me<'a> {
   |      -- lifetime `'a` defined here
...
21 |        my_holder.set_of_me.insert(&this);
   |        ---------------------------------
   |        |                          |
   |        |                          borrow of `this` occurs here
   |        argument requires that `this` is borrowed for `'a`
...
24 |        this
   |        ^^^^ move out of `this` occurs here

error: aborting due to 3 previous errors

Aaaand, we hit a lifetime error!

I'll assume that you have a basic knowledge about Rust's lifetimes. If not, don't worry! I'll try to keep this article as simple as possible. But if you're interested, You can read my post on Rust Lifetimes.

Understanding the Error

So, here we see that Rust complaints us about this variable not "living long enough".

Pay attention to the message, "borrowed value does not live long enough". Rust wants us to understand that if this is moved, then my_holder.set_of_me will have a reference to a nonexistent object. And it's never a good idea to dereference a reference (or, a pointer) to something that does not exist. *SEGFAULT intensifies 😖*

In short, Rust prevents the creation of dangling pointer.

But, this prevents us from creating a self-referential struct. Worry not, because we can pull up some tricks to make the Rust compiler "think" that there are no lifetime violations. However, it requires you to go down to the unsafe level.

But first, let us talk about moves and why they are a big deal.

Understanding Moves in Rust

A "move" in a function happens when you return a value:

fn foo() -> String {
    let value = "ABC".to_owned();
    value  // move occurs
}

Now, for ordinary objects, "moving" is fine. But since objects (and their references) created within the function have the lifetime of the function itself, Rust disallows it (thankfully!).

That also means, we need to find a way to tell Rust that "our move will keep the reference so please don't drop it!".

Raw Pointers of Rust

Let's jump right to the point. Raw pointers are your typical C-like pointers that are not bound to Rust's borrow checker.

But Rust's pointer suffer from a fatal flaw. They are not safe to use when two variables are swapped.

Let me demonstrate it. In fact, the following is adapted from The Rust Language Tutorial for Async. Follow this link to know more.

#[derive(Debug)]
struct Test {
    value: String,
    pointer_to_value: *const String,
}

impl Test {
    fn new(txt: &str) -> Self {
        let mut this = Test {
            value: String::from(txt),
            // In `C`, you'd call this initialize to `NULL`.
            pointer_to_value: std::ptr::null(),
        };
        this.pointer_to_value = &this.value;
        this
    }

    fn get_value(&self) -> &str {
        &self.value
    }

    fn get_pointer_to_value(&self) -> &String {
        unsafe { &*(self.pointer_to_value) }
    }
}

Here we have a simple struct called Test. pointer_to_value points to value. Two methods (get_value and get_pointer_to_value) have been provided as getters.

Now, let's say we create two instances of Test and use our getters to inspect the contents.

fn main() {
    let mut test1 = Test::new("test1");
    let mut test2 = Test::new("test2");

    println!(
        "test1: value: {}, ptr to value: {}",
        test1.get_value(),
        test1.get_pointer_to_value()
    );
    println!(
        "test2: value: {}, ptr to value: {}",
        test2.get_value(),
        test2.get_pointer_to_value()
    );
}

Executing the code gives:

test1: value: test1, ptr to value: test1
test2: value: test2, ptr to value: test2

Nothing out of the ordinary right? You get the contents of value and pointer_to_value and print it. But the real danger happens when we swap the variables.

fn main() {
    // ...
    println!("Swapping...");
    std::mem::swap(&mut test1, &mut test2);

    println!(
        "test1: value: {}, ptr to value: {}",
        test1.get_value(),
        test1.get_pointer_to_value()
    );
    println!(
        "test2: value: {}, ptr to value: {}",
        test2.get_value(),
        test2.get_pointer_to_value()
    );
}

You might expect this to print:

Swapping...
test1: value: test2, ptr to value: test2
test2: value: test1, ptr to value: test1

But to our surprise, this prints:

Swapping...
test1: value: test2, ptr to value: test1
test2: value: test1, ptr to value: test2

Unfolding the Mystery of Raw Pointers

So why did that happen? The answer is quite simple. "A pointer follows its pointee".

When we swapped the variables, the contents got swapped too. But that does not mean the pointers will change their pointee. This image demonstrates the point:

But, this is not good as this might potentially lead to situations like "double free" and "incorrect access".

We need a way to tell Rust that our variables should not be moved. Or in other words, we need something that will encapsulate our objects and make them immovable.

`Pin<T>` the objects

Meet Pin<T>! They "pin" objects to its location in memory².

Now, you can either pin to the stack or pin to the heap. For this case, pinning to the heap is the simplest and most viable option.

In Rust, we know that we can put a variable to the heap by using Box<T>, Rc<T> or Arc<T>. For this case, Box<T> is sufficient.

To our convenience, Box<T> provides a method called Box::pin(...). We can use this to pin our object to encapsulate and prevent it from incorrect moves.

So how do we use Pin<T>? Let's modify our code to find out.

use std::pin::Pin;

#[derive(Debug)]
struct Test {
    // same as before
}

impl Test {
    fn new(txt: &str) -> Pin<Box<Self>> {  // return Pin<Box<Self>>
        let mut this = Box::pin(Test {
            value: String::from(txt),
            // In `C`, you'd call this initialize to `NULL`.
            pointer_to_value: std::ptr::null(),
        });
        this.as_mut().pointer_to_value = &this.value;
        this
    }

    fn get_value(self: Pin<&Self>) -> &str {  // modify `self` parameter
        &self.get_ref().value
    }

    // modify `self` parameter
    fn get_pointer_to_value(self: Pin<&Self>) -> &String {
        unsafe { &*(self.pointer_to_value) }
    }
}

Note that whenever you use Pin<T>, the self argument is modified to accept a Pin<T> type only.

Running the main function as before:

fn main() {
    let mut test1 = Test::new("test1");
    let mut test2 = Test::new("test2");

    println!(
        "test1: value: {}, ptr to value: {}",
        test1.as_ref().get_value(),
        test1.as_ref().get_pointer_to_value(),
    );
    println!(
        "test2: value: {}, ptr to value: {}",
        test2.as_ref().get_value(),
        test2.as_ref().get_pointer_to_value(),
    );

    println!("Swapping...");
    std::mem::swap(&mut test1, &mut test2);

    println!(
        "test1: value: {}, ptr to value: {}",
        test1.as_ref().get_value(),
        test1.as_ref().get_pointer_to_value()
    );
    println!(
        "test2: value: {}, ptr to value: {}",
        test2.as_ref().get_value(),
        test2.as_ref().get_pointer_to_value()
    );
}

This prints:

test1: value: test1, ptr to value: test1
test2: value: test2, ptr to value: test2
Swapping...
test1: value: test2, ptr to value: test2
test2: value: test1, ptr to value: test1

As expected!

Note that we can still get a mutable reference to our enclosed Test by using as_mut().get_mut(). This cannot be good!

`PhantomPinned` to the Rescue

By using std::marker::PhantomPinned, we tell Rust that our object cannot be unpinned (Rust calls it !Unpin. Read as "Not Unpinnable").

Now, any calls to get_mut() will fail.

So, how do you use it?

modify your struct to hold that extra field.

use std::{
+   marker::PhantomPinned,
    pin::Pin,
};

#[derive(Debug)]
struct Test {
    value: String,
    pointer_to_value: *const String,
+   _pinned: PhantomPinned,
}

Initialize PhantomPinned in new()

impl Test {
    fn new(txt: &str) -> Pin<Box<Self>> {
        let mut this = Box::pin(Test {
            value: String::from(txt),
            // In `C`, you'd call this initialize to `NULL`.
            pointer_to_value: std::ptr::null(),
+           _pinned: PhantomPinned,
        });
        this.as_mut().pointer_to_value = &this.value;
        this
    }

// ...
}

Try to run it, and you'll face with yet another complaint from Rust! This time, it will say: "cannot assign to data in a dereference of Pin<&mut Test>". So, what does that mean?

It means: in the presence of PhantomPinned, you cannot modify, or mutate through a dereference. This is a side effect of PhantomPinned.

To overcome this problem, we have to "force our way" into the enclosed object. That means, we will use unsafe.

impl Test {
    fn new(txt: &str) -> Pin<Box<Self>> {
        let mut this = Box::pin(Test {
            value: String::from(txt),
            // In `C`, you'd call this initialize to `NULL`.
            pointer_to_value: std::ptr::null(),
            _pinned: PhantomPinned,
        });
+       unsafe {
+           this.as_mut().get_unchecked_mut().pointer_to_value = &this.value;
+       }
        this
    }
// ...
}

Aaaand that's it! We have constructed a self-referential struct using Pin.

But why did we go through all this? This doesn't even solve our original problem?!

We went through this in order to understand the mechanism of Rust's Pin and its associated intricacies.

Now that we have understood how pinning works, let us go back to our original problem.

Solving the Original Problem

Let us wrap Me with Pin<T>.

use std::{
   marker::PhantomPinned,
   pin::Pin,
}

#[derive(Debug)]
struct Me<'a> {
    my_holder: &'a Holder,
    _pinned: PhantomPinned,
}

impl Me {
    pub fn new(my_holder: &'a mut Holder) -> Pin<Box<Self>> {
        let this = Box::pin(Self {
            my_holder,
            _pinned: PhantomPinned,
        });

        let this_ptr: *mut _ = unsafe { val.as_mut().get_unchecked_mut() };
        my_holder.set_of_me.insert(this_ptr);

        val
    }
}

But we have a small problem with this scheme. But first, let us check what Holder looks like:

#[derive(Debug)]
struct Holder {
    set_of_me: BTreeSet<*mut Me>,
}

Holder holds several instances of Me. Or in other words, Me inserts itself in Holder, and it can insert at any time at any place. But if we keep using references for Holder (ie, &'a mut Holder), we'll soon realize that Rust does not allow us to go beyond what Borrow Checker permits.

We need to find a way to share references without letting Rust know that it is a reference. And it should be destroyed once its job is done.

`Rc<T>` and `RefCell<T>`

Rc<T> is the reference counted version of Box<T>, and RefCell<T> allows you to mutate a seemingly immutable object. It is useful mostly in these contexts. So how do we use it?

First, we create a type alias for shorthand!

type RcCell<T> = Rc<RefCell<T>>;

Next, we modify Holder to return a RcCell<Holder>.

use std::{
   cell::RefCell,
   rc::Rc,
}

#[derive(Debug)]
struct Holder {
    set_of_me: BTreeSet<*mut Me>,
}

impl Holder {
    fn new() -> RcCell<Self> {
        Rc::new(RefCell::new(Self {
            set_of_me: Default::default(),
        }))
    }
}

Let's discuss a bit about RefCell<T>.

Why `RefCell<T>`?

Let's see what happens when we try to modify anything in an Rc<T>:

fn main() {
    let mut x = Rc::new(String::from("EEE"));
    x.push('e');
}

The last line will complain: "cannot borrow data in an Rc as mutable". That means, Rc<T> acts as an immutable enclosure around the object.

Now let's see what happens when we use RefCell<T>:

fn main() {
    let mut x = Rc::new(RefCell::new(String::from("EEE")));
    x.borrow_mut().push('e');
}

RefCell<T> adds a new method called borrow_mut() that allows you to receive a mutable reference to the underlying object. And the added benefit of using RefCell<T> is that it strictly follows Rust's borrowing rules. This ensures nothing funny or obscure happens with the object.

Putting the Pieces Together

Let us tell Me to accept a RcCell<Holder>.

#[derive(Debug)]
struct Me {
+   my_holder: RcCell<Holder>,
    _pinned: PhantomPinned,
}

impl Me {
    pub fn new(
+       my_holder: RcCell<Holder>,
    ) -> Pin<Box<Self>> {
        let mut this = Box::pin(Self {
            my_holder,
            _pinned: PhantomPinned,
        });

        let this_ptr: *mut _ = unsafe { this.as_mut().get_unchecked_mut() };
+       this.my_holder.borrow_mut().set_of_me.insert(this_ptr);

        this
    }
}

But before you jump and rejoice, think about what happens in the end? What happens to the references in Holder's set_of_me after an instance of Me has been destroyed?

Right! They are still there as DANGLING POINTERS!

We need a way to remove Me from Holder's set_of_me when it will be destroyed. Fortunately, we have something called Drop trait in Rust.

impl Drop for Me {
    fn drop(&mut self) {
        let this = &(self as *mut _);
        self.my_holder.borrow_mut().set_of_me.remove(this);
    }
}

And that's it.

Why not `Weak<T>` for storing `Me`?

Now you could ask, why go through all the pain of raw pointers and Pins when I could've used Weak<T>.

Yes you could've, but Weak<T> does not implement Ord, PartialOrd, PartialEq etc. This will limit usability to a great degree.

Similarly, you cannot use Rc<T> for storing Me because Rc<T> will not help you in resolving the circular reference and will create a memory leak.

Solution

Here is the entire solution. Run it and experiment with it to understand how it works.

use std::{
    cell::RefCell, collections::BTreeSet, marker::PhantomPinned, pin::Pin,
    rc::Rc,
};
type RcCell<T> = Rc<RefCell<T>>;

#[derive(Debug)]
struct Holder {
    set_of_me: BTreeSet<*mut Me>,
}

impl Holder {
    fn new() -> RcCell<Self> {
        Rc::new(RefCell::new(Self {
            // this is initially empty
            set_of_me: Default::default(),
        }))
    }

    /// Mutate every value of `Me`.
    /// Note how a pinned value is reconstructed.
    fn mutate_value_of_me(&self, val: i32) {
        self.set_of_me.iter().for_each(|a| {
            let a = unsafe { Pin::new_unchecked(&mut **a) };
            a.mutate_me(val);
        })
    }
}

#[derive(Debug)]
struct Me {
    name: String,
    mutate_by_holder: i32,
    my_holder: RcCell<Holder>,
    _pinned: PhantomPinned,
}

impl Me {
    /// Accept a name
    pub fn new(
        holder: RcCell<Holder>,
        name: impl Into<String>,
    ) -> Pin<Box<Self>> {
        let mut this = Box::pin(Self {
            name: name.into(),
            mutate_by_holder: 0,
            my_holder: holder,
            _pinned: PhantomPinned,
        });

        let this_ptr: *mut _ = unsafe { this.as_mut().get_unchecked_mut() };
        this.my_holder.borrow_mut().set_of_me.insert(this_ptr);

        this
    }

    /// Allows you to mutate a value within me.
    /// Run this from `Holder` to see what happens.
    fn mutate_me(self: Pin<&mut Self>, val: i32) {
        let this = unsafe { self.get_unchecked_mut() };
        this.mutate_by_holder += val;
    }
}

impl Drop for Me {
    fn drop(&mut self) {
        println!("Dropping {:#?}", self);
        let this = &(self as *mut _);
        self.my_holder.borrow_mut().set_of_me.remove(this);
    }
}

/// A test function to play with `Holder`s.
fn make_ref_of_holder(holder: RcCell<Holder>) {
    let holder = Rc::clone(&holder);
    println!("Making a ref of {:?}", holder);
    println!("No. of refs = {}", Rc::strong_count(&holder));
}

pub fn main() {
    let holder = Holder::new();

    // be explicit about `Rc`'s cloning
    let a = Me::new(Rc::clone(&holder), "a");
    let b = Me::new(Rc::clone(&holder), "b");

    holder.borrow().mutate_value_of_me(455);
    make_ref_of_holder(Rc::clone(&holder));
}

If you look at the code carefully and the code that was shown before, you'd notice that we are suffering from one big issue: Management of duplicate objects.

Duplicate Objects

To see what the problem is, add the following line to the code given in the solution.

// ...
pub fn main() {
    let holder = Holder::new();

    // be explicit about `Rc`'s cloning
    let a = Me::new(Rc::clone(&holder), "a");
    let b = Me::new(Rc::clone(&holder), "b");
+   let c = Me::new(Rc::clone(&holder), "b");

    holder.borrow().mutate_value_of_me(455);
    make_ref_of_holder(Rc::clone(&holder));
}

We might think, "Oh since we are storing our Me objects in a set, c will not be added."

But notice that the definition of set_of_me is BTreeSet<*mut Me>. Which means, the set stores unique pointers to Me, not unique Mes itself! And that's why c gets stored in holder.

So what's the solution?

One way to solve this issue is to create a wrapper type over *mut Me that will allow dereferencing, and thus comparison of objects.

We will discuss about that in Part 2.

Conclusion

I hope this article helps you in understanding Rust's Self Referential Structs a bit better.

Questions? Comments? Concerns? Please put them down below and I'd be happy to help you.

Cover Image Source: Manjaro's /usr/share/backgrounds folder 😃

Image taken from Rust Language's Async Tutorial Book ↩
Taken from https://doc.rust-lang.org/std/pin/index.html ↩

DEV Community

Self Referential Structs in Rust (Part 1)

TL;DR

Introduction

Problem Description

Experimentation

Understanding the Error

Understanding Moves in Rust

Raw Pointers of Rust

Unfolding the Mystery of Raw Pointers

`Pin<T>` the objects

`PhantomPinned` to the Rescue

Solving the Original Problem

`Rc<T>` and `RefCell<T>`

Why `RefCell<T>`?

Putting the Pieces Together

Why not `Weak<T>` for storing `Me`?

Solution

Duplicate Objects

Conclusion

Top comments (0)

Read next

2558. Take Gifts From the Richest Pile

Closures in JS🤯

7 Essential Accessibility Testing Tools for Web Developers: Ensure Inclusive Design

Day 1: Getting Started with SQL - Basics | Beginners' Guide : Mastering

TL;DR

Introduction

Problem Description

Experimentation

Understanding the Error

Understanding Moves in Rust

Raw Pointers of Rust

Unfolding the Mystery of Raw Pointers

Pin<T> the objects

PhantomPinned to the Rescue

Solving the Original Problem

Rc<T> and RefCell<T>

Why RefCell<T>?

Putting the Pieces Together

Why not Weak<T> for storing Me?

Solution

Duplicate Objects

Conclusion

Read next

2558. Take Gifts From the Richest Pile

Closures in JS🤯

7 Essential Accessibility Testing Tools for Web Developers: Ensure Inclusive Design

Day 1: Getting Started with SQL - Basics | Beginners' Guide : Mastering

`Pin<T>` the objects

`PhantomPinned` to the Rescue

`Rc<T>` and `RefCell<T>`

Why `RefCell<T>`?

Why not `Weak<T>` for storing `Me`?