Federico Moretti

Posted on Sep 26 • Edited on Sep 30

The Rust Journey of a JavaScript Developer • Day 4 (5/5)

#rust #programming #beginners #learning

Finally, the last chapter of this journey about ownership in Rust. It’ll be a recap of what we learned, so nothing new here, but I’m still following the book: next time we’ll talk about structuring data, and it’ll be a lot shorter. Anyway, I’ve already seen other interesting chapters that will last longer…

What I really love about Rust is that its developers tend to document everything. Do you want to understand how it deals with CLIs? Of course, there’s a book for it. And they explain not only Rust-related things, but also general programming concepts you may come across when learning other languages.

✌🏻 Ownership Recap

I didn’t study computer science at university. I studied communication and sociology. I took three or four exams on the subject (in Italy we tend to have lots of them), and I also studied it in high school, but I have a background in the humanities. That’s why spending time on these topics matters to me.

It’s a chance to better understand concepts like stack, heap, pointers, and so on: I’ve already taken three CS50 exams at Harvard, and I’ve been working in the field for over a decade, so they weren’t unfamiliar to me, but looking at them from Rust’s point of view helps me get to know them better.

I’m writing in English for the same reason: to improve my knowledge of it. This morning I attended a two-hour meeting in English, but that doesn’t mean I don’t still have a lot to learn. OK, now let’s go back to Rust. The Rust Programming Language, which I strongly recommend reading, offers a comparison here.

Ownership vs. Garbage Collection

In all these years, I have never seen anyone worry much about this issue. You know why? We’re used to having virtually unlimited resources: memory allocation is a thing a JavaScript developer often forgets, so you’ll end up with a browser that consumes most of your RAM, and maybe you won’t even understand why.

Of course, JavaScript is the worst example, since it doesn’t work like Python or C. Most of the time you don’t need to free memory up manually. But we still have best practices that help us use fewer resources, and maybe I’ll talk about them. Now let’s focus on Python, which the book uses for comparison.

The book proposes creating a Document class that stores an array of words and includes two methods, used either to add new words or to get the existing list of words. We’ll see how Python handles garbage collection versus how Rust uses ownership, building an equivalent structure.

from typing import List

class Document:
    def __init__(self, words: List[str]):
        self.words = words

    def add_word(self, word: str):
        self.words.append(word)

    def get_words(self) -> List[str]:
        return self.words

Then, we can create a new document from a non-empty array, build a second document from the first one’s words, and add a second word to it. We’ll have a variable and two different instances of the Document class: it looks like three separate arrays, but all three names actually point to the same one. It doesn’t make sense to me, but let’s pretend it’s logically correct.

words = ["hello"]
d = Document(words)

d2 = Document(d.get_words())
d2.add_word("world")

These Python lines set a words variable pointing to an array containing the word hello, create a document d that holds the same array, pass that array on to a second document d2, and finally add the word world through d2. Written this way, it wouldn’t have been possible in Rust.

We have two kinds of problems here, at least from Rust’s perspective. First, having three pointers to the same array makes it difficult to predict, just by reading the source, when the data will be garbage collected: that can happen only after all three variables have gone out of scope.

Second, d now also contains the two words hello and world, since it points to the same words array that was changed by adding the second word through d2. This is something I didn’t figure out at first, but it makes sense, because we changed the original array. What if we didn’t want to change d as well?

type Document = Vec<String>;

fn new_document(words: Vec<String>) -> Document {
    words
}

fn add_word(this: &mut Document, word: String) {
    this.push(word);
}

fn get_words(this: &Document) -> &[String] {
    this.as_slice()
}

Above is an example of what we can do in Rust. The book doesn’t use a struct, since we haven’t covered it yet, so it uses a type alias instead. Here we immediately know that new_document consumes the ownership of words, and that the vector will be deallocated when the Document that owns it goes out of scope.

The only way to change the array is through a mutable reference to Document, which is what add_word takes as a parameter; with a plain reference it would be immutable. And since each document owns its own vector, changing one document doesn’t change another one too, like the Python version did.

Finally, get_words returns an immutable reference to the strings in Document. This is definitely not a 1:1 translation from Python, but we can see that things work differently. We won’t have three pointers that can all change the same array, because Rust doesn’t allow us to do so.

fn main() {
    let words = vec!["hello".to_string()];
    let d = new_document(words);

    let words_copy = get_words(&d).to_vec();
    let mut d2 = new_document(words_copy);
    add_word(&mut d2, "world".to_string());

    assert!(!get_words(&d).contains(&"world".into()));
}

We’re doing the same thing we did in Python, ish: again, words is an array with one element, the string hello. Then, we store the result of calling new_document in a variable called d, but doing so means words itself is no longer usable in a second call to new_document in Rust.

So, to have a d2 variable, we must create a copy of the words in advance: new_document will once again consume the ownership of words_copy, but we can still call add_word on d2 to change its array. It’s easier to understand than you think, trust me, even if it doesn’t seem so now.

Using assert is a sort of TDD feature: it just ensures that d doesn’t contain the word world after the execution, something that we didn’t have in Python. Of course, I’m not talking about assert, which Python has too, but about the ability to change d2 without affecting d.
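
To make the move of words more concrete, here is a minimal sketch that reuses new_document from above. The helper build_two_documents is a hypothetical name of mine, not something from the book: it clones words before the first call, so the original vector can still be moved into the second one.

```rust
type Document = Vec<String>;

fn new_document(words: Vec<String>) -> Document {
    words
}

// Hypothetical helper (not from the book): builds two independent documents.
fn build_two_documents() -> (Document, Document) {
    let words = vec!["hello".to_string()];

    // Cloning creates a second heap allocation, so `words` stays usable.
    let d = new_document(words.clone());

    // This call moves `words`: it cannot be used after this line.
    let mut d2 = new_document(words);
    // println!("{words:?}"); // would fail to compile: `words` was moved

    d2.push("world".to_string());
    (d, d2)
}

fn main() {
    let (d, d2) = build_two_documents();
    println!("{d:?} {d2:?}");
}
```

Uncommenting the println! inside the helper should make rustc complain about a borrow of a moved value, which is exactly the situation described above.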

The Concepts of Ownership

It’s time to quickly recap all the concepts of ownership in Rust, which is good, because I used the same index as the book, so I can link my previous articles in the same order. I don’t deny that I really need it, since over the weeks I’ve surely forgotten something.

Ownership at Runtime

We have seen that Rust allocates local variables in stack frames when a function is called and deallocates them at the end of the call: this is also interesting in JavaScript, since global variables should be avoided in order to better manage memory allocation.

Local variables can hold either data of the different types I’ve already talked about or pointers. Pointers can be created through boxes, which own data on the heap, or through references, which don’t own data: at this point I had to go back to the previous chapters to review some concepts.

fn main() {
    let mut a_num = 0;
    inner(&mut a_num);
}

fn inner(x: &mut i32) {
    let another_num = 1;
    let a_stack_ref = &another_num;

    let a_box = Box::new(2);  
    let a_box_stack_ref = &a_box;
    let a_box_heap_ref = &*a_box;

    *x += 5;
}

Here the book proposes three questions. If you don’t want to read my answers before giving yours, stop here for a moment. The questions are:

Why does a_box_stack_ref point to the stack, while a_box_heap_ref points to the heap?
Why is the value 2 no longer on the heap at L2?
Why does a_num have the value 5 at L2?

L2 is a label given in the book, so I invite you to read the source to understand what it means. Spoiler: it means after the execution of inner.

As I mentioned, I had to review what I studied to give my answers. As for the first question, a_box_stack_ref is a reference to a_box itself, and a_box is a pointer that lives on the stack, while a_box_heap_ref is a reference to the dereferenced box, so it points to the value that the same pointer owns on the heap.

Then 2, the value a_box_heap_ref points to, is no longer on the heap at L2, because the inner function call has ended: a_box, which owned it, went out of scope, so Rust deallocated it. As I already mentioned, L2 means after the execution of inner called in main. So what about a_num?

Since a_num is a mutable variable whose original value is 0, passing a mutable reference to it to the inner function changed its value to 5. Notice that the inner function doesn’t return this value: it just increased the original value of a_num through the reference, adding 5 to 0.
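
The last two answers can be checked with a small sketch. The function names add_five and read_boxed are my own, purely for illustration: the first changes its caller’s variable through a mutable reference, the second copies a value out of a box before the box is dropped at the end of the function.

```rust
// Hypothetical helper: mutates the caller's variable through `&mut`.
fn add_five(x: &mut i32) {
    *x += 5;
}

// Hypothetical helper: the box owns 2 on the heap until the function returns.
fn read_boxed() -> i32 {
    let a_box = Box::new(2);
    let a_box_heap_ref: &i32 = &*a_box; // points to the value on the heap
    let copied = *a_box_heap_ref; // i32 is Copy, so this copies the value
    copied
    // `a_box` goes out of scope here and its heap memory is freed
}

fn main() {
    let mut a_num = 0;
    add_five(&mut a_num);
    println!("{a_num} {}", read_boxed());
}
```

Nothing is returned from add_five, yet a_num changes in main, because the function worked directly on the caller’s variable.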

Do you remember the slice type? This is pretty recent, since I covered it in the last article about my journey, so you should remember it better than the previous concepts. Anyway, it’s always a good idea to review it: I’m talking about a reference to a contiguous sequence of elements in a collection.

fn main() {
    let s = String::from("abcdefg");
    let s_slice = &s[2..5];
}

This function results in an s variable on the stack that holds a String pointer, which owns the value abcdefg on the heap, and an s_slice variable that is a slice of it: a reference that doesn’t own anything, but points to a part of the original value on the heap, in the range from index 2 up to (but not including) index 5, that is cde. Indexes start from 0 in Rust just like they do in JavaScript.
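
A minimal sketch to verify the range, assuming an ASCII string (string ranges in Rust are byte offsets). The helper middle is a hypothetical name I made up for the example.

```rust
// Hypothetical helper: borrows bytes 2..5 of the input without copying them.
fn middle(s: &str) -> &str {
    &s[2..5]
}

fn main() {
    let s = String::from("abcdefg");
    let s_slice = middle(&s);
    // `s` still owns the whole string; the slice only borrows a part of it.
    println!("{s_slice} is part of {s}");
}
```

Since the slice only borrows, s can still be printed in full after taking it.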

Ownership at Compile-Time

By default, Rust prevents runtime ownership issues by evaluating permissions at compile-time. It checks them every time a variable is used: if we don’t specify mut, for example, the write permission is missing and our variable cannot be mutated.

fn main() {
    let n = 0;
    n += 1; // this prevents the function from being compiled
}
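
For comparison, here is a sketch of a version that should compile. The only change is the mut keyword; the increment function is a hypothetical wrapper I added to make the result easy to check.

```rust
// Hypothetical wrapper around the snippet above, with `mut` added.
fn increment() -> i32 {
    let mut n = 0; // `mut` grants the write permission
    n += 1;
    n
}

fn main() {
    println!("{}", increment());
}
```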

Moreover, moving or borrowing a variable changes its permissions. If you move a variable of a non-copyable type, it loses its read and own permissions, so it can’t be used anymore. Moving a variable means, for example, passing it by value as an argument to a function.

fn main() {
    let s = String::from("hello world");
    consume_a_string(s);
    println!("{s}"); // fails to compile: s was moved into consume_a_string
}

fn consume_a_string(_s: String) {
    […]
}
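
One possible way out, sketched under my own assumptions: pass a clone, so the original keeps its permissions. The body of consume_a_string is left empty here because the original elides it, and clone_then_consume is a hypothetical name.

```rust
fn consume_a_string(_s: String) {
    // Takes ownership of its argument, which is dropped when the call ends.
}

// Hypothetical helper: only the clone is moved, so `s` can still be used.
fn clone_then_consume() -> String {
    let s = String::from("hello world");
    consume_a_string(s.clone());
    println!("{s}");
    s
}

fn main() {
    clone_then_consume();
}
```

Cloning allocates a second string on the heap, so it’s a trade-off rather than a free fix.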

The same goes for borrowing, which means creating a reference to a variable. An immutable reference to a mutable variable can be printed, but it can’t be used to mutate it, and the referenced variable gets its permissions back once the reference ends its lifetime. So, the next example is 100% legit, because s can be printed after s_ref.

let mut s = String::from("Hello");
let s_ref = &s;
println!("{s_ref}");
println!("{s}");

On the other hand, since you can’t mutate through an immutable reference, the following example will fail to compile. I’m not sharing the console output here, but trust me when I say that rustc error messages are the best I’ve ever seen: they help a lot, especially if you read the book.

let mut s = String::from("Hello");
let s_ref = &s;
s_ref.push_str(" world");
println!("{s}");
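
Here is a sketch of how the same lines could be made to compile, assuming that the intent was to append to s through the reference: the reference becomes a mutable one. The hello_world wrapper is a hypothetical name of mine.

```rust
// Hypothetical wrapper: same steps as above, but with a mutable reference.
fn hello_world() -> String {
    let mut s = String::from("Hello");
    let s_ref = &mut s; // a mutable reference carries the write permission
    s_ref.push_str(" world");
    // `s_ref` is not used after this point, so `s` is usable again.
    println!("{s}");
    s
}

fn main() {
    hello_world();
}
```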

It’s also possible to combine these conditions in different ways to obtain other types of errors, but rustc will always print useful messages: I don’t know if you’ve ever tried to compile the examples I took from the book, but I did. Not because I was skeptical about the results, but because I wanted to see what would happen.

Connecting Ownership Between Compile-Time and Runtime

Sometimes it’s just a matter of execution order. We talked about the lifetime of references: whenever a reference is no longer used, the variable it refers to gets its permissions back. As written, the code below doesn’t compile, but if you swap lines 3 and 4, it will compile with no errors or warnings.

let mut v = vec![1, 2, 3];
let n = &v[0];
v.push(4);
println!("{n}");

That’s because you can’t use n after v has been mutated, since push may reallocate the vector and leave n pointing to freed memory, but you can mutate v after printing n: in this second case the reference has already ended its lifetime, while in the first the referenced variable hadn’t got its permissions back yet. That’s how Rust avoids undefined behavior.
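
For reference, this is a sketch of the swapped version. The read_then_push wrapper and the first variable are hypothetical additions of mine, there only to make the result checkable.

```rust
// Hypothetical wrapper: the same four lines with the last two swapped.
fn read_then_push() -> (i32, Vec<i32>) {
    let mut v = vec![1, 2, 3];
    let n = &v[0];
    let first = *n; // copy the value while the reference is still valid
    println!("{n}"); // last use of `n`: the borrow of `v` ends here
    v.push(4); // `v` has its write permission back
    (first, v)
}

fn main() {
    let (first, v) = read_then_push();
    println!("{first} {v:?}");
}
```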

Although it took me about five weeks to complete, this is only the beginning of ownership in Rust. It will be revisited with the introduction of new elements of this fantastic language: it allowed me to review concepts I had studied in the past, and we are still in the early chapters.

The Rust Programming Language is dual-licensed under Apache 2.0 and MIT. Only some chapters’ names and examples are taken from it as-is. Please let me know if this violates any terms.