By Austin Aigbe
Introduction
The Rust programming language is said to be memory-safe, but how does it achieve this, and why should we even care about memory safety? It turns out that memory safety issues are the major cause of security vulnerabilities in modern software systems (including desktop and mobile applications). To address this, there is a general consensus, which I personally share, that a memory-safe systems programming language like Rust is required. This is one of the reasons big tech companies have decided to invest in Rust and re-engineer some of their key products and services with it.
How Does Rust Achieve Memory Safety?
Memory safety is achieved through three key concepts: ownership (a language feature the compiler uses to allocate and free memory based on the scope of a variable binding), borrow checking, and lifetimes. All of these analyses are done at compile time.
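To make borrow checking and lifetimes a little more concrete, here is a minimal sketch. The function name `longest` and the sample strings are my own illustration and are not part of the hello.rs program used later.

```rust
// Hypothetical helper for illustration: the lifetime 'a tells the compiler
// that the returned reference must not outlive either of the two inputs.
fn longest<'a>(a: &'a str, b: &'a str) -> &'a str {
    if a.len() >= b.len() { a } else { b }
}

fn main() {
    let language = String::from("Rust");
    let result;
    {
        let greeting = String::from("Hello");
        // Both borrows are valid here, so this call compiles.
        // The returned &str is copied into an owned String.
        result = longest(&language, &greeting).to_string();
        // Storing the returned &str itself in `result` would be rejected,
        // because `greeting` is dropped at the end of this block.
    }
    println!("{result}");
}
```

The borrow checker accepts this version only because `result` owns its data; nothing it holds points back into `greeting` after the inner block ends.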
For simplicity, I will focus on the ownership concept and show how the compiler guarantees memory safety at compile time, using the scope of a variable binding to determine when to allocate and deallocate memory on the stack.
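As a rough illustration of ownership, the sketch below moves a value between bindings. The helper `consume` is a hypothetical name I am using only for this example.

```rust
// Hypothetical helper: takes ownership of its argument.
fn consume(text: String) -> usize {
    text.len()
    // `text` goes out of scope here, so its memory is released.
}

fn main() {
    let message = String::from("Hello, Rust"); // `message` owns the string
    let moved = message; // ownership moves to `moved`
    // println!("{message}"); // would not compile: `message` was moved
    let length = consume(moved); // ownership moves into `consume`
    println!("{length}");
}
```

Uncommenting the `println!("{message}")` line makes the compiler reject the program, which is the compile-time guarantee this article is about.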
Before we proceed, let's review how our code is analyzed by the Rust compiler (rustc) for memory safety.
Code Compilation With Memory Management
In simple terms, memory is a storage space (e.g. RAM) on your computer where the instructions to be executed by the computer's CPU, along with the data they operate on, are stored. These instructions come from the lines of Rust code you have written and compiled with rustc (the Rust compiler) or cargo into a machine-executable file (e.g. the .exe file format on Windows and ELF on Linux). The executable file tells the operating system how to load your Rust program into memory for execution by the CPU.
Now, let's compile a very simple Rust program with rustc and examine how memory management (ownership) works in Rust.
// hello.rs
fn main() {
    let rust_edition = 2021;
    let message = "Hello, Rust";
    println!("{message} {rust_edition}!");
}
Compiling the above code with rustc .\hello.rs on Windows produces an executable file, hello.exe (about 148 KB in size), in the same directory as the .rs file.
Next, let's see how hello.exe was produced by rustc and how the memory-safety checks were done.
The Rust compiler, rustc, performed several analyses on the hello.rs file before it produced the hello.exe executable file. I will give a high-level overview of this process so we understand how memory safety is achieved by the compiler and at what phase of the compilation process it is done.
Figure 1: A simplified view of the rustc compilation process
Phase 1: hello.rs was translated to basic tokens using the rustc_lexer crate and then to Rust tokens using the rustc_parse crate. Tokens are easier for the compiler to work with than the text format of your .rs file.
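To give a feel for what tokenizing means, here is a toy lexer. It is only a sketch under my own assumptions: the `Token` enum and `tokenize` function are hypothetical, and the real rustc_lexer types and rules are different and far more complete.

```rust
// Toy token kinds for illustration only; rustc_lexer's real types differ.
#[derive(Debug, PartialEq)]
enum Token {
    Ident(String),
    Number(String),
    Symbol(char),
}

// Splits source text into identifiers, numbers, and single-character symbols.
fn tokenize(source: &str) -> Vec<Token> {
    let mut tokens = Vec::new();
    let mut chars = source.chars().peekable();
    while let Some(&c) = chars.peek() {
        if c.is_whitespace() {
            chars.next(); // skip whitespace
        } else if c.is_ascii_alphabetic() || c == '_' {
            let mut ident = String::new();
            while let Some(&d) = chars.peek() {
                if d.is_ascii_alphanumeric() || d == '_' {
                    ident.push(d);
                    chars.next();
                } else {
                    break;
                }
            }
            tokens.push(Token::Ident(ident));
        } else if c.is_ascii_digit() {
            let mut number = String::new();
            while let Some(&d) = chars.peek() {
                if d.is_ascii_digit() {
                    number.push(d);
                    chars.next();
                } else {
                    break;
                }
            }
            tokens.push(Token::Number(number));
        } else {
            tokens.push(Token::Symbol(c)); // anything else is one symbol
            chars.next();
        }
    }
    tokens
}

fn main() {
    for token in tokenize("let rust_edition = 2021;") {
        println!("{token:?}");
    }
}
```

Running it on the first statement of hello.rs yields five tokens: two identifiers, the `=` symbol, a number, and the `;` symbol.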
Phase 2: Tokens were translated to the AST (Abstract Syntax Tree) format. Syntax analysis is done here. Use cargo inspect --unpretty=ast-tree .\hello.rs to examine the AST output.
Figure 2: AST of hello.rs viewed with cargo-inspect
Phase 3: The println! macro was refined (or desugared) to a std::io::_print call and core::fmt::Arguments function calls. The data types of our expressions were inferred and checked. We can say that type safety is guaranteed at this phase of the code compilation. The HIR (High-Level Intermediate Representation) is the output of this stage. You can use cargo inspect --unpretty=hir .\hello.rs to view the HIR representation of your Rust code.
Figure 3: println! macro desugared in the HIR.
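The desugaring can be approximated in ordinary Rust. Since std::io::_print is an internal function, the sketch below uses a hypothetical stand-in named `print_sketch` that writes the formatted arguments to stdout; `render` is also a name I made up so the result can be inspected as a String.

```rust
use std::fmt::Arguments;
use std::io::Write;

// Hypothetical stand-in for the internal std::io::_print function.
fn print_sketch(args: Arguments<'_>) {
    std::io::stdout()
        .write_fmt(args)
        .expect("failed to write to stdout");
}

// Hypothetical helper that renders the same arguments into a String.
fn render(message: &str, rust_edition: i32) -> String {
    std::fmt::format(format_args!("{message} {rust_edition}!"))
}

fn main() {
    let rust_edition = 2021;
    let message = "Hello, Rust";
    // Roughly what println!("{message} {rust_edition}!") boils down to:
    // build a core::fmt::Arguments value, then hand it to a print function.
    print_sketch(format_args!("{message} {rust_edition}!\n"));
    println!("{}", render(message, rust_edition));
}
```

The format_args! macro is what produces the core::fmt::Arguments value; the types of `message` and `rust_edition` are checked at this point, before any code is generated.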
Phase 4: The HIR was converted to MIR (Mid-level Intermediate Representation). Ownership analysis, borrow checking, and optimizations are done here. The MIR shows the scope of each variable and helps the compiler know at what point a variable binding (or ownership) goes out of scope and when to deallocate memory from the stack. How the compiler tracks the scope of each variable is indicated in the screenshot below. The assembly code generated by LLVM (in phase 6) is the machine representation of the MIR after the ownership analysis, borrow checking, and optimizations have been done. In the next phase, the assembly (.S) file is examined to see how the variables are allocated on the stack when they are in scope and how they are deallocated when they go out of scope. It is in phase 4 that memory safety is guaranteed by the compiler.
Figure 4: MIR representation of hello.rs
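The scope tracking described for phase 4 can be observed from ordinary Rust code. The sketch below uses a hypothetical `Tracked` type whose Drop implementation records the moment a binding goes out of scope. In MIR, these points are marked with statements such as StorageLive and StorageDead.

```rust
use std::cell::RefCell;

// Hypothetical type that records when it is dropped.
struct Tracked<'a> {
    name: &'static str,
    log: &'a RefCell<Vec<String>>,
}

impl Drop for Tracked<'_> {
    fn drop(&mut self) {
        self.log.borrow_mut().push(format!("drop {}", self.name));
    }
}

// Creates two nested scopes and returns the order of events.
fn run() -> Vec<String> {
    let log = RefCell::new(Vec::new());
    {
        let _outer = Tracked { name: "outer", log: &log };
        {
            let _inner = Tracked { name: "inner", log: &log };
            log.borrow_mut().push("inner scope ends".to_string());
        } // `_inner` goes out of scope here
        log.borrow_mut().push("outer scope ends".to_string());
    } // `_outer` goes out of scope here
    log.into_inner()
}

fn main() {
    for line in run() {
        println!("{line}");
    }
}
```

Each value is dropped exactly where its scope closes, inner before outer, without any explicit free call in the source.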
Phases 5 and 6: This is the code generation phase. LLVM was used to generate the final executable file, hello.exe, from the optimized and memory-safe MIR representation. Further optimizations can also be done by LLVM.
Let's briefly examine the assembly code (.S file) generated by LLVM. You can use rustc --emit asm .\hello.rs to generate the file. To keep things simple, I will only examine how the ownership memory-safety feature is realized through the allocation and deallocation of memory on the stack.
Figure 5: A simplified memory layout of the stack from the perspective of the compiler-generated main function.
The assembly code for the main function is shown above (in Figure 5). The entry point is not our hello::main function. As we will see later, Rust has a runtime (std::rt::lang_start_internal). This runtime handles a lot of complexity for us that we don't need to worry about when writing our Rust code.
- Line 516: The compiler-generated main function is the entry point for our program.
- Line 518: 40 bytes of memory are allocated on the stack. rsp is the 64-bit stack pointer register for x86_64. It always points to the top of the stack.
- Line 523: _ZN5hello4main17h0767239aa2b5c6caE is the mangled symbol for our hello.rs main() function. The address of our hello::main function is stored in %rcx and then passed to the runtime function as a reference in line 103.
- Line 524: This line calls the Rust runtime function std::rt::lang_start (defined in line 94) and, subsequently, the internal runtime function std::rt::lang_start_internal (defined in line 104). Rust has a runtime that executes our hello.rs main() function.
Figure 6: Rust runtime function executes our hello::main function
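The idea of passing the address of hello::main to a runtime function can be sketched in plain Rust. Everything here is an assumption made for illustration: `lang_start_sketch` and `hello_main` are hypothetical names, and the real std::rt::lang_start has a different signature and does considerably more work.

```rust
// Hypothetical stand-in for the runtime entry point; the real
// std::rt::lang_start has a different signature and does more setup.
fn lang_start_sketch(user_main: fn()) -> i32 {
    // Runtime setup would happen here.
    user_main(); // call the function whose address was passed in
    // Runtime cleanup would happen here.
    0 // exit code handed back to the operating system
}

// Stands in for hello::main.
fn hello_main() {
    println!("Hello, Rust 2021!");
}

fn main() {
    // Mirrors the assembly: the address of the user's main function
    // is handed to the runtime, which then calls it.
    let exit_code = lang_start_sketch(hello_main);
    println!("exit code: {exit_code}");
}
```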
Figure 7: Execution of hello::main (part 1)
Figure 8: Execution of hello::main (part 2)
Figures 7 and 8 show how sufficient memory (200 bytes) was first allocated on the stack by the compiler before space in it was given to the two local variables, rust_edition (an i32) and message (a &str). Before returning from the hello::main function, the compiler-generated code deallocates the memory on the stack and frees up resources.
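To see how much of that stack space the two local variables themselves need, their sizes can be queried with std::mem::size_of. This is a small sketch; the rest of the frame is presumably used for things such as the formatting arguments built for println!.

```rust
use std::mem::size_of;

fn main() {
    // An i32 always occupies 4 bytes.
    println!("i32:  {} bytes", size_of::<i32>());
    // A &str is a pointer plus a length, so it takes two machine words
    // (16 bytes on a 64-bit target such as x86_64).
    println!("&str: {} bytes", size_of::<&str>());
}
```

Both sizes are known at compile time, which is what allows the compiler to reserve the whole stack frame with a single adjustment of rsp.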
Conclusion
We have seen how Rust guarantees memory safety at compile time by examining the compilation phases and how memory is allocated and deallocated on the stack in assembly code. A very simple Rust program (hello.rs) was used to examine the output of each compilation phase and the memory management of the stack in assembly code.
We also learnt that the Rust compiler generates a main function for us as the entry point of our program. This main function uses the Rust runtime to execute the main function of our Rust program.
In summary, Rust tries to guarantee memory safety at compile time, and it does a pretty good job of ensuring this.