In Rust, Vec<T>
or vector type is a dynamically-sized, growable array that allows you to store and manipulate a collection of elements of the same type. In this article, we will explore how to implement a simplified version of Vec<T>
.
Array
To understand how vector works and start implementing it, we need to understand what an array is, and how it is represented in memory.
Array is a linear data structure that stores a collection of similar data types at contiguous locations in a computer's memory.
Since array is contiguous (no space between elements it holds), we can move between its elements using index, if we know the size of its type.
In the example above, we have an array of 8-bit unsigned integers. ptr
is a pointer that points to start address of the array. By adding size of array type multiplied by index to ptr
, we get address that we can dereference to read or write to value of arr[index]
.
Important properties of array:
- Pointer: reading/writing to memory, keeps track of where array starts.
- Capacity: amount of items that array has allocated memory for (ready-to-use memory).
- Length: amount of items that array contains (used memory), can never be greater than capacity.
In this picture, capacity is 8, but length is 5.
Note that capacity in arrays is fixed and is determined at compile time, which may not suit your needs when you need to grow collection at runtime... and this is where vector comes in!
Vector
Vector grows its capacity by reallocating itself when it's full. There is a grow_amortized
function in RawVec, which describes how vector grows - it doubles every time length reaches capacity (after capacity of 4).
We will implement push
and get
operations, as well as Drop
trait, to be able to see results without memory leaks. With that covered, let's start coding!
Constructor
use std::ptr::NonNull;
pub struct MyVec<T> {
ptr: NonNull<T>,
len: usize,
capacity: usize,
}
impl<T> MyVec<T> {
pub fn new() -> Self {
Self {
ptr: NonNull::dangling(),
len: 0,
capacity: 0,
}
}
pub fn capacity(&self) -> usize {
self.capacity
}
pub fn len(&self) -> usize {
self.len
}
}
Here we create MyVec::new()
that initializes our vector with dangling pointer and 0 for length and capacity, and getters for these two.
Now let's implement push
method!
Push
Vec::new().push(1)
will add 1 to the end of the vector, we know that. push
is also what determines if vector needs to grow when user decides to add element to it.
When capacity is 0 and push
is called, let's grow it to 4 and write the element in memory.
Implementation
What I like to do first is to define code paths, and todo
macro is perfect for that.
First thing that we do that is essential before getting into manual memory management - we need to check if size of type that is being pushed is not 0 (i.e empty tuple). In Rust, compiler optimizes zero-sized types, so Vec<()>
never allocates memory. In our case, we will simply panic on assert
.
use std::mem::size_of;
// ...
impl<T> MyVec<T> {
// ...
pub fn push(&mut self, item: T) {
assert_ne!(size_of::<T>(), 0, "No zero-sized types!");
if self.capacity == 0 {
todo!();
} else if self.len < self.capacity {
todo!();
} else {
todo!();
}
}
}
Case: capacity is 0
Welcome to unsafe Rust!
To allocate memory ourselves we will need to use unsafe
keyword, which is a sign to Rust compiler that between those brackets we don't want it to enforce its memory safety guarantees. It doesn't mean it's not safe... if we are careful enough.
use std::alloc::{alloc, Layout};
We'll use Rust's allocator, which needs Layout
as its argument. Layout
consists of size and alignment, where size is the total number of bytes to allocate and alignment is the number of bytes between start of each element.
Let's say we have an array of u32
- this means that every 32 bits or 4 bytes computer encounters new value - alingment of 4 bytes. For u8
alignment would be 8 bits or 1 byte.
In our case, we can take a shortcut and use Layout::array
, which will give us exactly what we need.
let new_capacity = 4;
let layout = Layout::array::<T>(new_capacity);
Now we can get our pointer as mutable so we can write to it. Then we try to convert it to NonNull
, and if it's null - panic.
let ptr = unsafe { alloc(layout) } as *mut T;
let ptr = NonNull::new(ptr)
.expect("Pointer is null!");
At this point we have allocated space for 4 elements and first element is ready to be written to memory. Now let's connect pieces above, write to memory and update fields for MyVec
.
if self.capacity == 0 {
let new_capacity = 4;
let layout = Layout::array::<T>(new_capacity)
.expect("LayoutError");
let ptr = unsafe { alloc(layout) } as *mut T;
let ptr = NonNull::new(ptr)
.expect("Pointer is null!");
unsafe {
ptr.as_ptr().write(item);
}
self.ptr = ptr;
self.capacity = 4;
self.len = 1;
}
Case: length < capacity
Let's say we pushed an item to vector, now we have a length of 1 and capacity of 4, pointer is pointing at the start of array, everything is great.
In this case, all we need to do is convert NonNull
pointer to raw pointer, add length (distance from start of array) and write to that location. Note that add
takes into account size_of::<T>()
already, so we can just pass an amount of elements to it. Then we update length.
// ...
else if self.len < self.capacity {
unsafe {
self.ptr.as_ptr().add(self.len).write(item);
}
self.len += 1;
}
// ...
Case: vector needs to grow
This should be familiar from initial allocation block, except we are reallocating memory and trying to follow safety guidelines. realloc
takes an old pointer and new size of memory to be allocated. After that we write item to the end of vector.
else {
let new_capacity = self.capacity.checked_mul(2)
.expect("Capacity overflow");
let new_size = size_of::<T>() * new_capacity;
let layout = Layout::array::<T>(new_capacity)
.expect("LayoutError");
let ptr = unsafe { realloc(self.ptr.as_ptr() as *mut u8, layout, new_size) } as *mut T;
let ptr = NonNull::new(ptr)
.expect("Pointer is null!");
unsafe {
ptr.as_ptr().add(self.len).write(item);
}
self.ptr = ptr;
self.capacity = new_capacity;
self.len += 1;
}
Get
Typically, you access array values with [index]
, and when there's no value at this index - it results in error. It's not like we run into this a lot, because we've probably learnt from our mistakes and now we put little guard checks here and there. But still there are cases like expecting an argument from command line, where you want to specifically handle absence of each argument in place. This is were get
shines I think. get
returns and Option
(Some(&value)
or None
) that you can match
, unwrap_or
, etc.
In this block we take index, if its out of bounds - return None
, else return Some
with newly created immutable reference from dereferenced pointer. Basically we're making our mutable reference immutable.
impl<T> MyVec<T> {
//...
pub fn get(&self, index: usize) -> Option<&T> {
if index >= self.len {
return None;
}
Some(unsafe { &*self.ptr.as_ptr().add(index) })
}
}
Drop
We're almost there! But before finishing our basic implementation of Vec
, let's not forget about freeing memory we've allocated. Allocated memory that is never deallocated leads to what is known as a memory leak, which may lead to system running out of available memory, reducing performance, responsiveness.
To prevent this, we can use Drop
trait that will execute when value goes out of scope / its lifetime ends. In drop
function we can manually describe its behavior:
- Properly drop a block of memory (slice) where
MyVec
elements are stored. - Deallocate all reserved memory (from capacity).
use std::ptr::drop_in_place;
use std::slice::from_raw_parts_mut;
//...
impl<T> Drop for MyVec<T> {
fn drop(&mut self) {
unsafe {
drop_in_place(from_raw_parts_mut(self.ptr.as_ptr(), self.len));
let layout = Layout::array::<T>(self.capacity)
.expect("LayoutError");
dealloc(self.ptr.as_ptr() as *mut u8, layout)
}
}
}
Finale
Try running this block. If it prints nothing, we won!
fn main() {
let mut vec: MyVec<usize> = MyVec::new();
vec.push(5);
vec.push(8);
vec.push(2);
vec.push(1);
vec.push(9);
assert_eq!(vec.len(), 5);
assert_eq!(vec.capacity(), 8);
assert_eq!(vec.get(2), Some(&2));
assert_eq!(vec.get(4), Some(&9));
assert_eq!(vec.get(5), None);
}
...You won!
Thank you for reading this, I hope it was of value, even if just a little bit. Please tell me in any case, I would love to hear your feedback.
Good luck and happy coding!
Top comments (1)
Good job.