DEV Community

Kostiantyn Shyrolapov
Kostiantyn Shyrolapov

Posted on • Edited on

Implementing Vec in Rust

In Rust, Vec<T> or vector type is a dynamically-sized, growable array that allows you to store and manipulate a collection of elements of the same type. In this article, we will explore how to implement a simplified version of Vec<T>.

Array

To understand how vector works and start implementing it, we need to understand what an array is, and how it is represented in memory.

Array is a linear data structure that stores a collection of similar data types at contiguous locations in a computer's memory.

Since array is contiguous (no space between elements it holds), we can move between its elements using index, if we know the size of its type.
Array of u8

In the example above, we have an array of 8-bit unsigned integers. ptr is a pointer that points to start address of the array. By adding size of array type multiplied by index to ptr, we get address that we can dereference to read or write to value of arr[index].

Important properties of array:

  • Pointer: reading/writing to memory, keeps track of where array starts.
  • Capacity: amount of items that array has allocated memory for (ready-to-use memory).
  • Length: amount of items that array contains (used memory), can never be greater than capacity.

capacity > length
In this picture, capacity is 8, but length is 5.
Note that capacity in arrays is fixed and is determined at compile time, which may not suit your needs when you need to grow collection at runtime... and this is where vector comes in!

Vector

Vector grows its capacity by reallocating itself when it's full. There is a grow_amortized function in RawVec, which describes how vector grows - it doubles every time length reaches capacity (after capacity of 4).

We will implement push and get operations, as well as Drop trait, to be able to see results without memory leaks. With that covered, let's start coding!

Constructor

use std::ptr::NonNull;

pub struct MyVec<T> {
    ptr: NonNull<T>,
    len: usize,
    capacity: usize,
}

impl<T> MyVec<T> {
    pub fn new() -> Self {
        Self {
            ptr: NonNull::dangling(),
            len: 0,
            capacity: 0,
        }
    }

    pub fn capacity(&self) -> usize {
        self.capacity
    }

    pub fn len(&self) -> usize {
        self.len
    }
}
Enter fullscreen mode Exit fullscreen mode

Here we create MyVec::new() that initializes our vector with dangling pointer and 0 for length and capacity, and getters for these two.

Now let's implement push method!

Push

Vec::new().push(1) will add 1 to the end of the vector, we know that. push is also what determines if vector needs to grow when user decides to add element to it.

When capacity is 0 and push is called, let's grow it to 4 and write the element in memory.

Implementation

What I like to do first is to define code paths, and todo macro is perfect for that.

First thing that we do that is essential before getting into manual memory management - we need to check if size of type that is being pushed is not 0 (i.e empty tuple). In Rust, compiler optimizes zero-sized types, so Vec<()> never allocates memory. In our case, we will simply panic on assert.

use std::mem::size_of;

// ...
impl<T> MyVec<T> {
    // ...
    pub fn push(&mut self, item: T) {
        assert_ne!(size_of::<T>(), 0, "No zero-sized types!");

        if self.capacity == 0 {
            todo!();
        } else if self.len < self.capacity {
            todo!();
        } else {
            todo!();
        }
    }
}
Enter fullscreen mode Exit fullscreen mode

Case: capacity is 0

Welcome to unsafe Rust!

To allocate memory ourselves we will need to use unsafe keyword, which is a sign to Rust compiler that between those brackets we don't want it to enforce its memory safety guarantees. It doesn't mean it's not safe... if we are careful enough.

use std::alloc::{alloc, Layout};
Enter fullscreen mode Exit fullscreen mode

We'll use Rust's allocator, which needs Layout as its argument. Layout consists of size and alignment, where size is the total number of bytes to allocate and alignment is the number of bytes between start of each element.

Let's say we have an array of u32 - this means that every 32 bits or 4 bytes computer encounters new value - alingment of 4 bytes. For u8 alignment would be 8 bits or 1 byte.

In our case, we can take a shortcut and use Layout::array, which will give us exactly what we need.

let new_capacity = 4;
let layout = Layout::array::<T>(new_capacity);
Enter fullscreen mode Exit fullscreen mode

Now we can get our pointer as mutable so we can write to it. Then we try to convert it to NonNull, and if it's null - panic.

let ptr = unsafe { alloc(layout) } as *mut T;
let ptr = NonNull::new(ptr)
    .expect("Pointer is null!");
Enter fullscreen mode Exit fullscreen mode

At this point we have allocated space for 4 elements and first element is ready to be written to memory. Now let's connect pieces above, write to memory and update fields for MyVec.

if self.capacity == 0 {
    let new_capacity = 4;
    let layout = Layout::array::<T>(new_capacity)
        .expect("LayoutError");

    let ptr = unsafe { alloc(layout) } as *mut T;
    let ptr = NonNull::new(ptr)
        .expect("Pointer is null!");

    unsafe { 
        ptr.as_ptr().write(item); 
    }

    self.ptr = ptr;
    self.capacity = 4;
    self.len = 1;
}
Enter fullscreen mode Exit fullscreen mode

Case: length < capacity

Let's say we pushed an item to vector, now we have a length of 1 and capacity of 4, pointer is pointing at the start of array, everything is great.
In this case, all we need to do is convert NonNull pointer to raw pointer, add length (distance from start of array) and write to that location. Note that add takes into account size_of::<T>() already, so we can just pass an amount of elements to it. Then we update length.

// ...
else if self.len < self.capacity {
    unsafe {
        self.ptr.as_ptr().add(self.len).write(item);
    }

    self.len += 1;
}
// ...
Enter fullscreen mode Exit fullscreen mode

Case: vector needs to grow

This should be familiar from initial allocation block, except we are reallocating memory and trying to follow safety guidelines. realloc takes an old pointer and new size of memory to be allocated. After that we write item to the end of vector.

else {
    let new_capacity = self.capacity.checked_mul(2)
        .expect("Capacity overflow");
    let new_size = size_of::<T>() * new_capacity;
    let layout = Layout::array::<T>(new_capacity)
        .expect("LayoutError");

    let ptr = unsafe { realloc(self.ptr.as_ptr() as *mut u8, layout, new_size) } as *mut T;
    let ptr = NonNull::new(ptr)
        .expect("Pointer is null!");

    unsafe {
        ptr.as_ptr().add(self.len).write(item);
    }

    self.ptr = ptr;
    self.capacity = new_capacity;
    self.len += 1;
}
Enter fullscreen mode Exit fullscreen mode

Get

Typically, you access array values with [index], and when there's no value at this index - it results in error. It's not like we run into this a lot, because we've probably learnt from our mistakes and now we put little guard checks here and there. But still there are cases like expecting an argument from command line, where you want to specifically handle absence of each argument in place. This is were get shines I think. get returns and Option(Some(&value) or None) that you can match, unwrap_or, etc.

In this block we take index, if its out of bounds - return None, else return Some with newly created immutable reference from dereferenced pointer. Basically we're making our mutable reference immutable.

impl<T> MyVec<T> {
    //...
    pub fn get(&self, index: usize) -> Option<&T> {
        if index >= self.len {
            return None;
        }

        Some(unsafe { &*self.ptr.as_ptr().add(index) })
    }
}
Enter fullscreen mode Exit fullscreen mode

Drop

We're almost there! But before finishing our basic implementation of Vec, let's not forget about freeing memory we've allocated. Allocated memory that is never deallocated leads to what is known as a memory leak, which may lead to system running out of available memory, reducing performance, responsiveness.

To prevent this, we can use Drop trait that will execute when value goes out of scope / its lifetime ends. In drop function we can manually describe its behavior:

  1. Properly drop a block of memory (slice) where MyVec elements are stored.
  2. Deallocate all reserved memory (from capacity).
use std::ptr::drop_in_place;
use std::slice::from_raw_parts_mut;

//...
impl<T> Drop for MyVec<T> {
    fn drop(&mut self) {
        unsafe {
            drop_in_place(from_raw_parts_mut(self.ptr.as_ptr(), self.len));
            let layout = Layout::array::<T>(self.capacity)
                .expect("LayoutError");
            dealloc(self.ptr.as_ptr() as *mut u8, layout)
        }
    }
}
Enter fullscreen mode Exit fullscreen mode

Finale

Try running this block. If it prints nothing, we won!

fn main() {
    let mut vec: MyVec<usize> = MyVec::new();
    vec.push(5);
    vec.push(8);
    vec.push(2);
    vec.push(1);
    vec.push(9);

    assert_eq!(vec.len(), 5);
    assert_eq!(vec.capacity(), 8);
    assert_eq!(vec.get(2), Some(&2));
    assert_eq!(vec.get(4), Some(&9));
    assert_eq!(vec.get(5), None);
}
Enter fullscreen mode Exit fullscreen mode

...You won!


Thank you for reading this, I hope it was of value, even if just a little bit. Please tell me in any case, I would love to hear your feedback.

Good luck and happy coding!

Top comments (1)

Collapse
 
xbinj profile image
xBINj

Good job.