[Rust Guide] 12.3. Refactoring Pt.1 - Improving Modularity

#rust #programming #learning

12.3.0 Before We Begin

In Chapter 12, we will build a real project: a command-line program. The program is a simplified grep (Global Regular Expression Print), a tool that searches for a pattern and prints the matches. Its job here is to search for the specified text in the specified file.

This project has several steps:

Receive command-line arguments
Read files
Refactor: improve modules and error handling (this article)
Use TDD (test-driven development) to develop library functionality
Use environment variables
Write error messages to standard error instead of standard output

12.3.1 Why Refactor

The purpose of refactoring is to improve modularity and error handling.

Here is all the code written up to the previous article:

use std::env;  
use std::fs;  

fn main() {  
    let args: Vec<String> = env::args().collect();
    let query = &args[1];  
    let filename = &args[2];  

    println!("search for {}", query);  
    println!("In file {}", filename);  

    let contents = fs::read_to_string(filename)  
        .expect("Something went wrong while reading the file");
    println!("With text:\n{}", contents);  
}

This code has four problems:

The main function is doing too much. It handles both command-line parsing and file reading. A guiding principle of program design is that each function should have only one responsibility, so main should be split up.
The variables query and filename store program configuration, while contents stores file contents. As code and variables accumulate, it becomes harder to track what each variable actually means. The configuration values should be grouped in a struct.
File-reading errors are handled with expect, which panics with the same fixed message no matter what went wrong. That is not ideal, because a read can fail for different reasons: the file might not exist, or the program might lack permission to open it. The message "Something went wrong while reading the file" alone does not tell the user which of these happened.
Error handling is scattered, and some failures surface as messages from Rust itself. For example, the arguments are read by indexing into args, so running the program with too few arguments panics with "index out of bounds", which does not explain what the user did wrong. It is better to centralize error handling so future maintainers only need to consider one place when changing the logic, and so the error messages shown to users are understandable.
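
To make the third problem concrete, here is a minimal sketch of how a failed read can be told apart by cause. The helper name describe_read_error and the message wording are assumptions made for this illustration; they are not part of the project code.

```rust
use std::fs;
use std::io::ErrorKind;

// Hypothetical helper: returns None when the read succeeds, or a message
// that names the likely cause when it fails.
fn describe_read_error(path: &str) -> Option<String> {
    match fs::read_to_string(path) {
        Ok(_) => None,
        Err(e) => Some(match e.kind() {
            ErrorKind::NotFound => format!("{}: file not found", path),
            ErrorKind::PermissionDenied => format!("{}: permission denied", path),
            // Any other cause: fall back to the error's own description.
            _ => format!("{}: could not read file ({})", path, e),
        }),
    }
}

fn main() {
    // The path is made up and is assumed not to exist.
    if let Some(msg) = describe_read_error("definitely/missing/dir/file.txt") {
        eprintln!("{}", msg);
    }
}
```

Unlike a single expect message, this lets the program say which kind of failure occurred.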

12.3.2 A Guiding Principle for Separating Concerns in Binary Programs

Many Rust binary projects run into the same organizational problem: they put too much functionality and too many responsibilities into main. The Rust community has a guiding principle for separating concerns in binary programs:

Split the program into main.rs and lib.rs, and put business logic in lib.rs
If the logic is small, keeping it in main.rs is fine
As the logic becomes more complex, extract it from main.rs into lib.rs

After this split, the responsibilities that should remain in main in this example are:

Call the command-line parsing logic using the argument values
Perform other configuration
Call the run function in lib.rs
Handle any problems that run may return
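
The four responsibilities above can be sketched roughly as follows. This is only an outline under assumptions: the signature of run, the usage message, and the error text are invented for illustration, and Config and run are kept in one file here so the sketch compiles on its own, whereas in the real project they would live in lib.rs.

```rust
use std::env;
use std::error::Error;
use std::fs;
use std::process;

// In the real project these two items would be in src/lib.rs.
pub struct Config {
    pub query: String,
    pub filename: String,
}

// Hypothetical signature: run reports failure through Result instead of
// panicking, so main can decide how to present the problem.
pub fn run(config: &Config) -> Result<(), Box<dyn Error>> {
    let contents = fs::read_to_string(&config.filename)?;
    println!("Searching for {} in:\n{}", config.query, contents);
    Ok(())
}

fn main() {
    // 1. Hand the argument values to the parsing logic.
    let args: Vec<String> = env::args().collect();
    if args.len() < 3 {
        eprintln!("usage: <program> <query> <filename>");
        return;
    }

    // 2. Perform other configuration (here: just build the Config).
    let config = Config {
        query: args[1].clone(),
        filename: args[2].clone(),
    };

    // 3. Call run, and 4. handle any problem it returns.
    if let Err(e) = run(&config) {
        eprintln!("Application error: {}", e);
        process::exit(1);
    }
}
```

With this shape, main stays small enough to check by reading, and the logic in run can be tested without starting the program.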

12.3.3 Separating Logic

Take another look at the code:

use std::env;  
use std::fs;  

fn main() {  
    let args: Vec<String> = env::args().collect();
    let query = &args[1];  
    let filename = &args[2];  

    println!("search for {}", query);  
    println!("In file {}", filename);  

    let contents = fs::read_to_string(filename)  
        .expect("Something went wrong while reading the file");
    println!("With text:\n{}", contents);  
}

First, extract the command-line argument handling:

fn parse_config(args: &[String]) -> (&str, &str) {  
    let query = &args[1];  
    let filename = &args[2];  
    (query, filename)  
}

&[String] is a slice of String values; passing &args from main works because a reference to a Vec<String> coerces to &[String]
There is no need to print query and filename here, so that part is removed
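
As a small illustration of why the parameter is a slice, the sketch below calls parse_config with both a vector and a fixed-size array. The argument values ("minigrep", "the", "poem.txt") are made-up sample data standing in for what env::args() would produce.

```rust
fn parse_config(args: &[String]) -> (&str, &str) {
    let query = &args[1];
    let filename = &args[2];
    (query, filename)
}

fn main() {
    // Stand-in for env::args().collect(); element 0 is the program name.
    let args: Vec<String> = vec![
        String::from("minigrep"),
        String::from("the"),
        String::from("poem.txt"),
    ];

    // &Vec<String> coerces to &[String] at the call site.
    let (query, filename) = parse_config(&args);
    println!("query = {}, filename = {}", query, filename);

    // A reference to an array of String coerces to the same slice type.
    let array = [
        String::from("minigrep"),
        String::from("needle"),
        String::from("notes.txt"),
    ];
    let (query, filename) = parse_config(&array);
    println!("query = {}, filename = {}", query, filename);
}
```

Taking &[String] rather than &Vec<String> keeps the function usable with any contiguous sequence of String values.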

Then change main to call parse_config:

fn main() {  
    let args: Vec<String> = env::args().collect();
    let (query, filename) = parse_config(&args);  

    let contents = fs::read_to_string(filename)  
        .expect("Something went wrong while reading the file");  
    println!("With text:\n{}", contents);  
}

12.3.4 Using a Struct

parse_config returns query and filename together as a tuple, and then main splits those two tuple values back into two variables. This back-and-forth splitting and combining shows that the abstraction in the program is not ideal.

query and filename are both part of the configuration and are related to each other, so putting them in a tuple does not express that relationship well enough. A struct is a better fit:

struct Config {  
    query: String,  
    filename: String,  
}  

fn main() {  
    let args: Vec<String> = env::args().collect();
    let config = parse_config(&args);  

    let contents = fs::read_to_string(config.filename)  
        .expect("Something went wrong while reading the file");  
    println!("With text:\n{}", contents);  
}  

fn parse_config(args: &[String]) -> Config {  
    let query = args[1].clone();  
    let filename = args[2].clone();  
    Config {  
        query,  
        filename,  
    }  
}

In parse_config, pay attention to ownership: the parameter args has type &[String], a borrowed slice, so parse_config does not own the String values and cannot move them out of args. Config, however, holds owned String fields rather than references, so we call clone on each element to get an independent, owned copy.

Cloning uses more time and memory than storing references directly, but it saves us from dealing with lifetimes and makes the code more direct and simpler. In some scenarios, giving up a bit of performance in exchange for simplicity is well worth considering.
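
For comparison, here is a sketch of what the reference-storing alternative might look like. The names BorrowedConfig and parse_config_borrowed are hypothetical and are not used anywhere else in this series; the sample arguments are made up as well.

```rust
// Hypothetical borrowed variant: the fields are string slices that point
// into the argument vector, so the struct needs a lifetime parameter.
struct BorrowedConfig<'a> {
    query: &'a str,
    filename: &'a str,
}

// The returned struct borrows from args and cannot outlive it.
fn parse_config_borrowed(args: &[String]) -> BorrowedConfig<'_> {
    BorrowedConfig {
        query: &args[1],
        filename: &args[2],
    }
}

fn main() {
    let args = vec![
        String::from("minigrep"),
        String::from("the"),
        String::from("poem.txt"),
    ];
    let config = parse_config_borrowed(&args);
    println!("query = {}, filename = {}", config.query, config.filename);
    // args must stay alive for as long as config is used.
}
```

No strings are copied here, but every function that stores or returns this struct has to carry the lifetime along, which is the complexity the cloning version avoids.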

Of course, converting the references with String::from also works; like clone, it copies each string into a new owned String:

fn parse_config(args: &[String]) -> Config {  
    let query = &args[1];  
    let filename = &args[2];  
    Config {  
        query: String::from(query),  
        filename: String::from(filename),  
    }  
}

There are other valid ways to write this code too, but here I will use the cloning approach.
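
As one example of such alternatives, the sketch below builds the same Config with to_owned and to_string and checks that the result equals the clone version. The function names and the derive attributes are additions made only for this comparison.

```rust
// Debug and PartialEq are derived only so the two results can be compared.
#[derive(Debug, PartialEq)]
struct Config {
    query: String,
    filename: String,
}

fn parse_config_clone(args: &[String]) -> Config {
    Config {
        query: args[1].clone(),
        filename: args[2].clone(),
    }
}

// Hypothetical variant: to_owned and to_string also produce owned copies.
fn parse_config_to_owned(args: &[String]) -> Config {
    Config {
        query: args[1].to_owned(),
        filename: args[2].to_string(),
    }
}

fn main() {
    let args = vec![
        String::from("minigrep"),
        String::from("the"),
        String::from("poem.txt"),
    ];
    assert_eq!(parse_config_clone(&args), parse_config_to_owned(&args));
    println!("both variants build the same Config");
}
```

All of these allocate a new String per field, so the choice between them is mostly a matter of style.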

12.3.5 Turning a Function Into a Struct Method

Since parse_config creates a Config instance, it is effectively a constructor. A constructor can be written like this:

impl Config {  
    fn new(args: &[String]) -> Config {  
        let query = args[1].clone();  
        let filename = args[2].clone();  
        Config {  
            query,  
            filename,  
        }  
    }  
}

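Just move the function into an impl Config block (for details on methods, see 5.3. Methods on Structs). I also renamed parse_config to new, because I am treating it as a constructor, and constructors are conventionally named new. Strictly speaking, new is an associated function rather than a method, since it takes no self parameter, which is why it is called as Config::new.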

After this change, main also needs to be updated:

let config = Config::new(&args);
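
To see the call in isolation, here is a small self-contained sketch that feeds Config::new a hand-built argument vector instead of the real command line. The sample values are made up for the illustration.

```rust
struct Config {
    query: String,
    filename: String,
}

impl Config {
    // Associated function: no self parameter, so it is called on the type.
    fn new(args: &[String]) -> Config {
        let query = args[1].clone();
        let filename = args[2].clone();
        Config { query, filename }
    }
}

fn main() {
    // Stand-in for env::args().collect(); element 0 is the program name.
    let args = vec![
        String::from("minigrep"),
        String::from("the"),
        String::from("poem.txt"),
    ];
    let config = Config::new(&args);
    println!("query = {}, filename = {}", config.query, config.filename);
}
```

Note that new still indexes args directly, so a vector with fewer than three elements would panic here just as in the original main.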

12.3.6 The Full Code

Here is all the code written up to and including this article:

use std::env;  
use std::fs;  

struct Config {  
    query: String,  
    filename: String,  
}  

fn main() {  
    let args: Vec<String> = env::args().collect();
    let config = Config::new(&args);  

    let contents = fs::read_to_string(config.filename)  
        .expect("Something went wrong while reading the file");  
    println!("With text:\n{}", contents);  
}  

impl Config {  
    fn new(args: &[String]) -> Config {  
        let query = args[1].clone();  
        let filename = args[2].clone();  
        Config {  
            query,  
            filename,  
        }  
    }  
}