SomeB1oody

Posted on May 1

[Rust Guide] 10.7. Input and Output Lifetimes and the 3 Rules



10.7.1 A Deeper Understanding of Lifetimes

1. The Way Lifetime Parameters Are Specified Depends on What the Function Does

Take the code from the previous article as an example:

fn longest<'a>(x: &'a str, y: &'a str) -> &'a str {  
    if x.len() > y.len() {  
        x  
    } else {  
        y  
    }  
}

The signature is written this way because the compiler cannot tell whether the function will return x or y. If I change the function so that it always returns x, then y no longer needs an explicit lifetime:

fn longest<'a>(x: &'a str, y: &str) -> &'a str {  
    x
}

So this signature places no constraint on y's lifetime: y is unrelated to both x and the return value.
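One way to confirm this is to let y go out of scope before the result is used. The sketch below assumes the x-only version of longest shown above; the unused parameter is renamed to _y only to silence the unused-variable warning, and the strings are illustrative:

```rust
// Only x is tied to the return value through 'a; y has its own,
// unrelated lifetime. (_y is underscored because it is never used.)
fn longest<'a>(x: &'a str, _y: &str) -> &'a str {
    x
}

fn main() {
    let x = String::from("long-lived");
    let result;
    {
        let y = String::from("short-lived");
        result = longest(x.as_str(), y.as_str());
    } // y is dropped here
    // Still valid: result borrows only from x, which is still in scope.
    println!("{}", result);
}
```

With the original signature, where y also carries 'a, this same main would be rejected, because result would then be allowed to live only as long as y.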

2. When a Function Returns a Reference, the Lifetime Parameter of the Return Type Must Match One of the Input Lifetimes

If the returned reference does not refer to one of the parameters, it must refer to a value created inside the function. That would be a dangling reference: the value goes out of scope when the function ends, so the returned reference would point to memory that has already been freed.

Take this example:

fn longest<'a>(x: &'a str, y: &str) -> &'a str {  
    let result = String::from("Something");
    result.as_str()
}

In this function, a String named result is created, and then its as_str method is called to return a string slice (&str), which is just a reference into result. This causes an error:

error[E0515]: cannot return value referencing local variable `result`
  --> src/main.rs:13:5
   |
13 |     result.as_str()
   |     ------^^^^^^^^^
   |     |
   |     returns a value referencing data owned by the current function
   |     `result` is borrowed here

The error message says that a value referencing the local variable result cannot be returned, because that value references data owned by the current function. This is the same reason given above: when result goes out of scope at the end of the function, it is cleaned up, and the returned reference would dangle.

What if I want to return a value created inside the function? Then I do not return a reference; I return the value directly:

fn longest(x: &str, y: &str) -> String {  
    let result = String::from("Something");
    result
}

This transfers ownership of the value to the caller, which then becomes responsible for cleaning up that memory. This version also needs no explicit lifetime, because the return value has nothing to do with the parameters, and only references have lifetimes to check.

From this example, you can see that lifetime syntax is fundamentally used to relate the lifetimes of a function’s different parameters and return values. Once those relationships are established, Rust has enough information to support operations that preserve memory safety and to reject operations that could lead to dangling pointers or other violations of memory safety.
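To see how the relationship declared in a signature is enforced at the call site, here is a small sketch adapted from the Rust Book's longest example (the strings are illustrative). Because x, y, and the return value all share 'a, the compiler treats the returned reference as valid only for the shorter of the two borrows:

```rust
// Same signature as the longest function at the start of this section:
// x, y, and the return value all share the lifetime 'a.
fn longest<'a>(x: &'a str, y: &'a str) -> &'a str {
    if x.len() > y.len() {
        x
    } else {
        y
    }
}

fn main() {
    let string1 = String::from("long string is long");
    {
        let string2 = String::from("xyz");
        let result = longest(string1.as_str(), string2.as_str());
        // Fine: result is used while both string1 and string2 are alive.
        println!("The longest string is {}", result);
    }
    // Declaring `result` outside the inner block and printing it here
    // would not compile: 'a is limited to the shorter of the two borrows,
    // and string2 is dropped at the end of the inner block.
}
```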

10.7.2 Lifetime Annotations in Structs

In earlier articles, the structs we defined only held owned types, such as i32 and String. Struct fields can also be references, but every reference field needs a lifetime annotation.

Take this example:

struct ImportantExcerpt<'a> {
    part: &'a str,
}

fn main() {
    let novel = String::from("Call me Ishmael. Some years ago...");
    let first_sentence = novel.split('.').next().unwrap();
    let i = ImportantExcerpt {
        part: first_sentence,
    };
}

ImportantExcerpt has only one field, part, and its type is a string slice, which is a reference type. Because it is a reference type, a lifetime annotation is required.

The syntax is the same as for generic type parameters: declare the lifetime parameter inside angle brackets after the struct name (here 'a), then use it in the field's type. This annotation means that an instance of ImportantExcerpt cannot outlive the reference held in its part field: the data part refers to must stay valid for as long as the instance exists, otherwise the instance would contain a dangling reference.

Look at main: it first creates a String named novel, then uses split and next to extract the first sentence (next returns an Option, and unwrap extracts the value from it; unwrap was introduced in 9.2. Result Enum and Recoverable Errors Pt.1). This sentence has type &str, a reference into novel. Then main creates an instance i of ImportantExcerpt and uses that reference as the value of its part field.

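This is valid because first_sentence is created on line 7 and i on line 8, and both stay in scope until line 11. More importantly, novel, the String that first_sentence borrows from, is created on line 6 and also stays in scope until line 11, so the data that part refers to remains valid for the whole lifetime of i.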
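To watch the borrow checker enforce this, the sketch below repeats the example with #[derive(Debug)] added (not part of the original) so that i can be printed. The commented-out drop(novel) line would free the String that part borrows from while i is still in use; uncommenting it turns the program into a compile error:

```rust
// Same struct as above, with Debug derived so an instance can be printed.
#[derive(Debug)]
struct ImportantExcerpt<'a> {
    part: &'a str,
}

fn main() {
    let novel = String::from("Call me Ishmael. Some years ago...");
    let first_sentence = novel.split('.').next().unwrap();
    let i = ImportantExcerpt {
        part: first_sentence,
    };
    // drop(novel); // would not compile: i (through part) still borrows novel
    println!("{:?}", i); // prints: ImportantExcerpt { part: "Call me Ishmael" }
}
```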

10.7.3 Lifetime Elision

Every reference has a lifetime, and functions or structs that use references need lifetime parameters.

Then why does this code, taken from 4.5. Slice, compile without any lifetime annotations?

fn main() {
    let s = String::from("Hello world");
    let word = first_word(&s);
    println!("{}", word);
}
fn first_word(s:&str) -> &str {
    let bytes = s.as_bytes();
    for (i, &item) in bytes.iter().enumerate() {
        if item == b' ' {
            return &s[..i];
        } 
    }
    &s[..]
}

The reason this function compiles without lifetime annotations has historical roots: in early versions of Rust (before 1.0), this code would not compile, because every reference was required to have an explicit lifetime. The function signature would have had to look like this:

fn first_word<'a>(s: &'a str) -> &'a str {

Later, the Rust team found that in certain situations Rust programmers kept writing the same lifetime annotations over and over, and those situations were predictable. They had clear patterns, so the Rust team encoded those patterns directly into the compiler, allowing the borrow checker to infer lifetimes automatically in those cases without explicit annotations from the programmer.

This history matters because more such predictable patterns may be discovered and added to the compiler, so there may be even fewer lifetime annotations to write in the future. Thank goodness.

The patterns built into Rust’s reference analysis are called the lifetime elision rules. They are not rules programmers have to follow; they are a set of special cases the compiler checks for. If your code fits one of these cases, you don't need to write the lifetimes explicitly.

However, lifetime elision does not provide complete inference. If the lifetimes of references are still ambiguous after the rules have been applied, the compiler does not guess; it reports an error. The fix is to add lifetime annotations manually to spell out how the references relate.

10.7.4 Input and Output Lifetimes

If a lifetime appears in a function or method parameter, it is called an input lifetime.

If it appears in a function or method return value, it is called an output lifetime.

10.7.5 The Three Rules of Lifetime Elision

The compiler uses three rules to determine lifetimes when they are not explicitly annotated:

Rule 1 is used for input lifetimes
Rules 2 and 3 are used for output lifetimes
If the compiler still cannot determine the lifetime after applying all three rules, it reports an error
These three rules apply not only to function or method definitions, but also to impl blocks

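Rule 1: Each parameter that is a reference gets its own lifetime parameter. A function with one reference parameter gets one lifetime parameter, as in fn foo<'a>(x: &'a i32); a function with two reference parameters gets two separate ones, as in fn foo<'a, 'b>(x: &'a i32, y: &'b i32); and so on.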

Rule 2: If there is exactly one input lifetime parameter, that lifetime is assigned to all output lifetime parameters. In other words, if there is only one input lifetime, that lifetime is the lifetime of every possible return value of the function.
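Note that Rules 1 and 2 count lifetimes, not parameters: a parameter that is not a reference contributes no lifetime. In the sketch below, prefix is a hypothetical helper with two parameters but only one input lifetime, so Rule 2 still applies and no annotation is needed. It slices by bytes, so it assumes ASCII input:

```rust
// Hypothetical helper: returns the first n bytes of s.
// n is a usize, not a reference, so it has no lifetime. Rule 1 gives s one
// lifetime, and Rule 2 assigns it to the return value, so this elided
// signature means: fn prefix<'a>(s: &'a str, n: usize) -> &'a str
// Assumes ASCII input: slicing in the middle of a multi-byte character panics.
fn prefix(s: &str, n: usize) -> &str {
    &s[..n.min(s.len())]
}

fn main() {
    let text = String::from("Hello world");
    let p = prefix(&text, 5);
    println!("{}", p); // prints "Hello"
}
```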

Rule 3: If there are multiple input lifetime parameters, but one of them is &self or &mut self (that is, the function is a method), then the lifetime of self is assigned to all output lifetime parameters.
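Rule 3 is what lets most methods skip annotations. The sketch below adds a method to the ImportantExcerpt struct from earlier; the method name and message follow the Rust Book's version of this example and are illustrative. The impl block declares 'a after impl because the struct has a lifetime parameter:

```rust
struct ImportantExcerpt<'a> {
    part: &'a str,
}

impl<'a> ImportantExcerpt<'a> {
    // Rule 1 gives &self and announcement separate lifetimes, so Rule 2
    // does not apply. Because one input is &self, Rule 3 assigns the
    // lifetime of &self to the returned &str.
    fn announce_and_return_part(&self, announcement: &str) -> &str {
        println!("Attention please: {}", announcement);
        self.part
    }
}

fn main() {
    let novel = String::from("Call me Ishmael. Some years ago...");
    let excerpt = ImportantExcerpt {
        part: novel.split('.').next().unwrap(),
    };
    let part = excerpt.announce_and_return_part("here is the excerpt");
    println!("{}", part);
}
```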

1. Successful Example

Now that the rules are clear, let’s look at an example:

fn first_word(s:&str) -> &str {
    //...
}

Put yourself in the compiler’s place and think about how to use the three rules to find the omitted lifetime in this function signature.

First, apply Rule 1—each reference parameter gets its own lifetime. There is only one parameter here, so there is only one lifetime. At this point, the compiler infers:

fn first_word<'a>(s:&'a str) -> &str {
    //...
}

Because there is only one input lifetime, Rule 2 also applies here—if there is exactly one input lifetime parameter, that lifetime is assigned to all output lifetime parameters. So the input lifetime is assigned to the output lifetime. At this point, the compiler infers:

fn first_word<'a>(s:&'a str) -> &'a str {
    //...
}

Because there is only one input lifetime, and this function is not a method, Rule 3 does not apply.

Now every reference in the function has a lifetime, so the compiler can continue analyzing the code without the programmer manually annotating the lifetimes in the function signature.
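As a sanity check, the sketch below puts the elided first_word from 4.5. Slice next to a fully annotated wrapper, first_word_explicit (a hypothetical name used only for this comparison). The wrapper type-checks because, after Rules 1 and 2, the elided signature of first_word already means fn first_word<'a>(s: &'a str) -> &'a str:

```rust
// Elided form: Rules 1 and 2 let the compiler infer the lifetimes.
fn first_word(s: &str) -> &str {
    let bytes = s.as_bytes();
    for (i, &item) in bytes.iter().enumerate() {
        if item == b' ' {
            return &s[..i];
        }
    }
    &s[..]
}

// The fully annotated signature the compiler arrives at.
// Returning first_word's result as &'a str works because its output
// is already tied to the lifetime of its input.
fn first_word_explicit<'a>(s: &'a str) -> &'a str {
    first_word(s)
}

fn main() {
    let s = String::from("Hello world");
    assert_eq!(first_word(&s), first_word_explicit(&s));
    println!("{}", first_word_explicit(&s)); // prints "Hello"
}
```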

2. Failure Example

Look at the second example:

fn longest(x:&str, y:&str) -> &str {
    //...
}

This signature has two reference parameters, and the return type is also a reference. Let's apply the three rules:

First, apply Rule 1—each reference parameter gets its own lifetime. There are two parameters here, so there are two lifetimes:

fn longest<'a, 'b>(x:&'a str, y:&'b str) -> &str {
    //...
}

Because there are two input lifetimes, Rule 2 does not apply.

Because this function is not a method, Rule 3 does not apply.

After applying all three rules, the return value’s lifetime is still undetermined, so the compiler reports an error. In other words, you must declare the lifetime explicitly.
