Bruno Oliveira

Posted on Dec 27, 2019

Learning Rust - Understanding vectors

#rust

Introduction

Mutating data structures is a core activity of any non-trivial program. Understanding how a new language handles its built-in data structures can be complicated, since each language introduces its own mechanisms and quirks.
We'll focus on Rust vectors and see what we can learn from them that can be generalized or applied to other data structures.

Introducing Rust vectors - Vec

A vector in Rust is defined as:

A contiguous growable array type, written Vec<T> but pronounced 'vector'

A growable array like this is one of the main data structures in most languages: it is versatile and flexible, supports indexing and iteration, and offers many more useful operations.

In Rust, there are several ways to initialize a vector.

Vector initialization

1. Using new()

To initialize a vector via the new() associated function, we use the double colon (path) operator:

let mut vec = Vec::new();

This call constructs a new, empty Vec<T>.

The vector will not allocate until elements are pushed onto it.

To add elements, we call the push method on the newly created instance:

vec.push(1);

This adds the element 1 to vec and allocates memory for it in the process. The element's type is inferred as i32, Rust's default integer type.
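Here is a minimal sketch of these two steps as a complete program. The function name build_with_push is hypothetical, and the explicit Vec<i32> annotation is our addition. The assertion inside checks the documented behavior that a vector created with Vec::new() has not allocated yet:

```rust
/// Builds a vector the way the text describes: start empty, then push.
/// (`build_with_push` is an illustrative name, not a std API.)
fn build_with_push() -> Vec<i32> {
    // Vec::new() creates an empty vector; no heap memory is allocated yet.
    let mut vec: Vec<i32> = Vec::new();
    assert_eq!(vec.capacity(), 0);

    // The first push allocates room for at least one element.
    vec.push(1);
    vec
}

fn main() {
    let vec = build_with_push();
    println!("{:?} len = {}, capacity = {}", vec, vec.len(), vec.capacity());
}
```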

2. Using the vec! macro

As initializing an empty vector and simply pushing elements into it can be tedious and even error-prone, Rust provides a macro to make initialization from a known, small set of values much more convenient:

let v = vec![1, 2, 3];

This initializes a vector with 3 integer elements.

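There is also an alternative syntax for the vec! macro that initializes a vector by repeating a given value a given number of times:

let vec = vec![0; 5];

When the two operands are separated by ;, the first is the value to store and the second is the number of copies, i.e. the length of the vector. The value's type must implement Clone. Here, vec holds five zeros. Its capacity is at least 5, but this form does not let you choose the capacity separately from the length.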

This may be more efficient than performing allocation and initialization in separate steps, especially when initializing a vector of zeros (a common operation in the science and engineering domains).
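The following small sketch contrasts the two forms of vec!. The function name macro_examples is only illustrative. Note that the number after ; sets the length, not merely the capacity:

```rust
/// Returns one vector built with each form of the vec! macro.
/// (`macro_examples` is an illustrative name.)
fn macro_examples() -> (Vec<i32>, Vec<i32>) {
    // List form: one element per comma-separated value.
    let listed = vec![1, 2, 3];
    // Repeat form: vec![value; n] clones `value` n times, so len() == n.
    let zeros = vec![0; 5];
    (listed, zeros)
}

fn main() {
    let (listed, zeros) = macro_examples();
    println!("{:?} {:?}", listed, zeros); // prints [1, 2, 3] [0, 0, 0, 0, 0]
    println!("zeros: len = {}, capacity = {}", zeros.len(), zeros.capacity());
}
```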

3. With a specified capacity

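let mut vec: Vec<i32> = Vec::with_capacity(10);

Here, we state that the vector should be able to hold at least 10 elements before it needs to reallocate memory. If this line stands alone, the Vec<i32> annotation is required, because the compiler cannot otherwise infer the element type. A later push would also let it infer the type. See below for the difference between capacity and length.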
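The sketch below uses a hypothetical helper, fill_preallocated. It pushes exactly as many elements as were reserved and checks that the vector's buffer address never changes, i.e. that no reallocation happened. This relies on the documented guarantee that push does not reallocate while the reported capacity is sufficient:

```rust
/// Pushes `n` items into a vector created with Vec::with_capacity(n) and
/// reports whether the buffer moved (i.e. whether a reallocation happened).
/// (`fill_preallocated` is an illustrative name.)
fn fill_preallocated(n: usize) -> (Vec<usize>, bool) {
    let mut vec: Vec<usize> = Vec::with_capacity(n);
    let start = vec.as_ptr();
    for i in 0..n {
        vec.push(i);
    }
    // Compare the buffer address before and after the pushes.
    let moved = vec.as_ptr() != start;
    (vec, moved)
}

fn main() {
    let (vec, moved) = fill_preallocated(10);
    println!("len = {}, capacity = {}, reallocated = {}", vec.len(), vec.capacity(), moved);
}
```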

4. Reading data from an external source into a vector

Another way to initialize a vector is through an external data source. For example, we might want to read the contents of a file into a vector.

A basic way to do it is as follows:

use std::fs;

// `filename` is assumed to hold the path to the input file
let contents = fs::read_to_string(filename)
    .expect("Something went wrong reading the file");
let vec: Vec<&str> = contents.split(",").collect();
println!("{:?}", vec);

In this example, let us assume that the file has the following contents:

1,2,3,4

When we execute our program with the above input, we get the following output:

["1", "2", "3", "4\n"]

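So essentially, we can read the contents of a file and build a vector from it by calling collect() on the iterator returned by split(). Each element is a &str borrowed from contents. The file's trailing newline ends up in the last element ("4\n"); calling trim() on contents before splitting would remove it.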
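If we want numbers rather than string slices, one possible approach is to trim the input and parse each piece. In this sketch, a string literal stands in for the file contents and parse_numbers is a hypothetical helper name. It panics on malformed input, which a real program would probably handle more gracefully:

```rust
/// Parses comma-separated integers such as "1,2,3,4\n" into a Vec<i32>.
/// `input` stands in for the String returned by fs::read_to_string.
/// (`parse_numbers` is an illustrative name.)
fn parse_numbers(input: &str) -> Vec<i32> {
    input
        .trim() // drop the trailing newline that produced "4\n"
        .split(',')
        .map(|s| s.trim().parse::<i32>().expect("not an integer"))
        .collect()
}

fn main() {
    let numbers = parse_numbers("1,2,3,4\n");
    println!("{:?}", numbers); // prints [1, 2, 3, 4]
}
```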

Vector indexing and mutability

Once a vector has been created and initialized, Rust offers several ways to access its contents. I will discuss two basic ones.

1. Using index notation

Let's consider the following example:

let my_vector = vec![1, 2, 3, 4, 5];

Using this vector, let's perform some basic operations on it:

Vector size and capacity:

my_vector.len() returns the number of elements in the vector, as a usize.

Alongside its length, a vector also has a capacity, which tells us how many elements it can hold without reallocating.

let mut my_vector: Vec<i32> = Vec::with_capacity(10);
println!("Capacity is {:?}", my_vector.capacity()); //(1)
println!("Size is {:?}", my_vector.len());  //(2)
my_vector.push(1); //(3)
println!("Capacity is {:?}", my_vector.capacity()); //(4)
println!("Size is {:?}", my_vector.len()); //(5)

In lines (1) and (2), the vector is simply queried for some of its properties after being created with a specified capacity via the Vec::with_capacity() associated function.
Thus, line (1) will print Capacity is 10 and line (2) will print Size is 0. Strictly speaking, the documentation only guarantees a capacity of at least 10.

This happens because Vec::with_capacity() reserves memory up front but does not create any elements. The length stays 0 until we add elements, and the capacity tells us how many elements we can add before the vector may need to reallocate. Note that capacity is not an upper bound on the number of elements the vector can contain. It only sets the initial amount of reserved memory. This can make some operations more efficient by avoiding repeated reallocations when the final size is known in advance.

On line (3), we add an element to the vector via push, so the vector now has one element while its capacity stays the same.

So lines (4) and (5) will print Capacity is 10 and Size is 1, respectively.
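To see that the capacity is a starting point rather than a limit, this sketch pushes eleven elements into a vector created with Vec::with_capacity(10). The name push_past_capacity is illustrative:

```rust
/// Pushes one element more than the requested capacity to show that
/// with_capacity sets a starting size, not an upper bound.
/// (`push_past_capacity` is an illustrative name.)
fn push_past_capacity() -> Vec<i32> {
    let mut vec: Vec<i32> = Vec::with_capacity(10);
    assert_eq!(vec.len(), 0);
    assert!(vec.capacity() >= 10);

    for i in 0..11 {
        // If the capacity is exactly 10, the 11th push forces a reallocation.
        vec.push(i);
    }
    vec
}

fn main() {
    let vec = push_past_capacity();
    println!("len = {}, capacity = {}", vec.len(), vec.capacity());
}
```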

To access a specific element using index notation, we can simply do:

let mut my_vector = vec![1, 2, 3, 42, 5];
println!("The fourth element is {:?}", my_vector[3]);

As in many other languages, indexing starts at 0.

To reassign an element using index notation, it's as simple as doing:

my_vector[3] = 0;

Pretty straightforward.
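The following minimal sketch puts reading and reassignment together. The name reassign_fourth is hypothetical. It also uses get(), the bounds-checked accessor that returns an Option instead of panicking when the index is out of range:

```rust
/// Reads and then overwrites the fourth element with index notation.
/// (`reassign_fourth` is an illustrative name.)
fn reassign_fourth() -> Vec<i32> {
    let mut my_vector = vec![1, 2, 3, 42, 5];
    // Indexing starts at 0, so [3] is the fourth element.
    assert_eq!(my_vector[3], 42);
    my_vector[3] = 0;
    // my_vector[10] would panic at runtime; get() returns None instead.
    assert_eq!(my_vector.get(10), None);
    my_vector
}

fn main() {
    println!("{:?}", reassign_fourth()); // prints [1, 2, 3, 0, 5]
}
```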

2. Indexing a vector with pointer arithmetic

Rust is very powerful, and there are lots of high-level abstractions at our disposal. Still, one of its greatest strengths is that it can also go as low-level as needed.
If you are familiar with C, you will know that pointers and arrays have an intimate relationship: holding a pointer to the first element of an array allows one to traverse and work with the whole array.

This is possible because the elements of a vector are stored contiguously in memory. Advancing a pointer by exactly the size of one element, in bytes, takes us to the memory address of the next element.
That size depends on the data type stored in the vector. Rust's pointer method add(n) performs this scaling for us, offsetting the pointer by n elements, i.e. n * size_of::<T>() bytes.
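A quick way to observe this layout is to compare the addresses of two adjacent elements. In this sketch, element_stride is a hypothetical helper. Converting references to raw pointers and then to usize is allowed in safe code, since nothing is dereferenced:

```rust
use std::mem::size_of;

/// Returns the distance in bytes between the addresses of v[0] and v[1].
/// (`element_stride` is an illustrative name.)
fn element_stride<T>(v: &[T]) -> usize {
    let first = &v[0] as *const T as usize;
    let second = &v[1] as *const T as usize;
    second - first
}

fn main() {
    let ints: Vec<i32> = vec![1, 2, 3, 42, 5];
    let longs: Vec<u64> = vec![1, 2, 3];
    // Adjacent elements are exactly size_of::<T>() bytes apart.
    assert_eq!(element_stride(&ints), size_of::<i32>());
    println!("i32 stride: {} bytes", element_stride(&ints)); // prints 4
    println!("u64 stride: {} bytes", element_stride(&longs)); // prints 8
}
```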

Let's access the fourth element of a vector using pointer arithmetic:

let mut vector = vec![1, 2, 3, 42, 5];
let pointer = vector.as_ptr();
unsafe {
    println!("The fourth element is {:?}", *pointer.add(3));
}

The call to as_ptr() returns a *const i32, an immutable raw pointer. We can use it to read from the vector, as we do in the println! line, but not to write to it.

So, if we try to set the fourth element to 4 instead of 42 via pointer arithmetic, we might be tempted to write something like:

*pointer.add(3) = 4;

and the compiler will emit an error:

error[E0594]: cannot assign to data in a `*const` pointer
  --> aoc2.rs:15:2
   |
15 |     *pointer.add(3) = 4;
   |     ^^^^^^^^^^^^^^^^^^^^ cannot assign

error: aborting due to previous error

This is quite interesting:

Even though the vector itself is mutable, as_ptr() gives us an immutable pointer to its contents. We therefore cannot change the vector through that pointer: the "pointer view" of the vector is, in itself, immutable.

The converse does not hold. Rust will not give us a mutable reference, or a mutable pointer, to a vector that was not declared mut. Together, these rules provide a high degree of control over how data in our code is managed and manipulated.

If we want to assign to a specific element through a pointer, we need a mutable pointer. We can obtain one from a vector declared with mut by calling let pointer = vector.as_mut_ptr();. The write itself must still happen inside an unsafe block.
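Here is a minimal sketch of that write, assuming the same five-element vector as before. The name set_fourth_via_pointer is illustrative:

```rust
/// Overwrites the fourth element through a mutable raw pointer.
/// (`set_fourth_via_pointer` is an illustrative name.)
fn set_fourth_via_pointer() -> Vec<i32> {
    let mut vector = vec![1, 2, 3, 42, 5];
    // as_mut_ptr() takes &mut self, so `vector` must be declared `mut`.
    let pointer: *mut i32 = vector.as_mut_ptr();
    unsafe {
        // Sound here only because index 3 is within the vector's length (5).
        *pointer.add(3) = 4;
    }
    vector
}

fn main() {
    println!("{:?}", set_fourth_via_pointer()); // prints [1, 2, 3, 4, 5]
}
```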

Unsafe operations

To wrap up the article, there are still a few words to say about Rust's opinionated choice of steering the programmer down the safe road.

When we attempted to access a specific element using pointer arithmetic, we had to wrap the access in an unsafe { ... } block.

This is because Rust aims to be a safe language, but one that gets out of the way when you, as a programmer, choose to go down the unsafe road. It does, however, force you to be very explicit about it.

Pointer arithmetic is inherently dangerous. It is much easier to cause memory corruption or perform invalid accesses with raw pointers than with the standard index notation, which is bounds-checked. In other words, pointer arithmetic is error-prone and thus unsafe, and Rust ensures that anybody writing or reading the code is aware of that.
This is a great way to encourage programmers to write safer code.
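To make the boundary of unsafe concrete, the sketch below reads the same element twice. The name fourth_two_ways is hypothetical. Taking a raw pointer with as_ptr() is allowed in safe code; only offsetting it with add() and dereferencing it require an unsafe block:

```rust
/// Reads the fourth element with safe indexing and through a raw pointer.
/// (`fourth_two_ways` is an illustrative name.)
fn fourth_two_ways(vector: &[i32]) -> (i32, i32) {
    // Safe: bounds-checked, so an invalid index panics instead of reading bad memory.
    let by_index = vector[3];

    // Creating the raw pointer is safe; nothing is read yet.
    let pointer = vector.as_ptr();
    // Unsafe: the compiler cannot verify that offset 3 is in bounds.
    // (The indexing above already checked it, so this read is sound.)
    let by_pointer = unsafe { *pointer.add(3) };

    (by_index, by_pointer)
}

fn main() {
    let vector = vec![1, 2, 3, 42, 5];
    let (a, b) = fourth_two_ways(&vector);
    println!("by index: {}, by pointer: {}", a, b); // prints 42 and 42
}
```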

Conclusion

We learned the basics of vectors in Rust. We covered how to initialize them in various ways, how to read and reassign elements, and how mutable and immutable raw pointers to their contents behave. We also saw how opinionated Rust is about safety.

Stay tuned for more!
