DEV Community

CLI autocompletion algorithm in Rust

yujiri8 profile image Ryan Westlund ・2 min read

I started learning Rust a couple of days ago, so I figured I was ready to try something significant in it. The natural pick: the autocomplete algorithm I'd already implemented in Python, Go, and Haskell, comparing the brevity and performance of the solutions.

Here's what I came up with:

use std::io;
use std::fs;
use std::time;
use serde::Deserialize;
use serde_json;

#[derive(Deserialize, Debug)]
struct Case {
    typed: String,
    possibilities: Vec<String>,
}

fn main() -> Result<(), std::io::Error> {
    let cases : Vec<Case> = serde_json::from_reader(io::BufReader::new(fs::File::open("data.json")?))?;
    let before = time::Instant::now();
    for case in cases {
        autocomplete(&case.typed, &case.possibilities);
    }
    let time = before.elapsed();
    println!("{:?}", time);
    Ok(())
}

fn autocomplete(typed: &str, possibilities: &Vec<String>) -> String {
    let matches: Vec<&String> = possibilities.iter().filter(|p| p.starts_with(typed)).collect();
    // If there are no matches, proceed to the next rule.
    if matches.len() == 0 { return autocomplete_contiguous(typed, possibilities) }
    // Scan the common prefix one char at a time.
    for i in typed.len().. {
        let mut c = None;
        for m in &matches {
            if m.len() <= i || c != None && m.as_bytes()[i] != c.unwrap() {
                return m[..i].to_string()
            }
            if c == None { c = Some(m.as_bytes()[i]) }
        }
    }
    typed.to_string()
}

fn autocomplete_contiguous(typed: &str, possibilities: &Vec<String>) -> String {
    let matches: Vec<&String> = possibilities.iter().filter(|p| p.contains(typed)).collect();
    if matches.len() == 1 { return matches[0].to_string() }
    if matches.len() > 1 { return typed.to_string() }
    autocomplete_in_order(typed, possibilities)
}

fn autocomplete_in_order(typed: &str, possibilities: &Vec<String>) -> String {
    let mut matching: Option<&String> = None;
    for s in possibilities {
        // See if we can find each character of typed in this possibility after where we found the last one.
        let mut lastmatch = 0;
        for (i, c) in typed.char_indices() {
            match s[lastmatch..].find(c) {
                None => break,
                Some(index) => {
                    lastmatch += index + 1;
                    // If we just found the last char.
                    if i + 1 == typed.len() {
                        // There was already a match. It's ambiguous.
                        if matching != None { return typed.to_string() }
                        matching = Some(s);
                    }
                }
            }
        }
    }
    match matching {
        None => typed.to_string(),
        Some(m) => m.to_string()
    }
}


Discounting comments, blank lines, and the main function (since the comparison is only supposed to cover the algorithm itself), it comes to 42 lines, which is really good considering that 12 of those lines are just closing braces - if I don't count those at all, it's the same line count as Python. It isn't getting there with lots of long lines, either. That's an impressive showing for a systems language, especially for code written by a newbie; I'm sure a Rust expert could improve it further.

As for performance, a release build with Rust 1.46.0 ran the million test cases in 1 second. To be honest, I'm a little disappointed in that - I expected it to beat Go (which ran in half a second) since Rust doesn't have a runtime or garbage collection. But then again, it probably could if it were written better.

The memory usage is amazing, though. Rust allocated only 600 MB. My Go implementation had 700 MB resident in physical memory, with another 700 MB included in the process size as shown by top (Rust was 600 MB by both metrics).

Discussion (9)

evanjs profile image
Evan Stoll

No clue if it would have any impact, but I tend to try opt-level=z and lto=thin when I want lazy (potential) performance gains.
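For reference, those settings live in the release profile of Cargo.toml. This is a sketch of what the comment describes; note that opt-level = "z" optimizes for size, so it may help or hurt runtime:

```toml
[profile.release]
opt-level = "z"  # optimize for binary size; the release default is 3 (speed)
lto = "thin"     # thin link-time optimization, cheaper than full "fat" LTO
```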

maan2003 profile image
Maan2003 • Edited

A match looks nicer, at least to me, and it also reduces the number of checks (the [i] bounds checks and the .unwrap() checks):

if m.len() <= i || c != None && m.as_bytes()[i] != c.unwrap() {
match (m.as_bytes().get(i), c) {
    (None, _) => return m[..i].to_string(),
    (Some(ch), Some(c)) if *ch != c => return m[..i].to_string(),
    (Some(ch), None) => c = Some(*ch),
    _ => (),
}
jlgerber profile image
jlgerber • Edited

On the performance front, it looks like you are creating intermediate Strings that could be left as &str with a bit of lifetime-annotation work. It seems like Case should be the owner of the String data in your code; none of your autocomplete_* functions should have to return String...
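A sketch of what that change could look like for the first function, assuming the callers can accept a borrowed slice (the fallback to the next rule is elided here and simply returns typed):

```rust
// Sketch: the returned &str borrows from `possibilities` (or from `typed`),
// so the match path allocates nothing instead of building a new String.
fn autocomplete<'a>(typed: &'a str, possibilities: &'a [String]) -> &'a str {
    let matches: Vec<&String> =
        possibilities.iter().filter(|p| p.starts_with(typed)).collect();
    if matches.is_empty() {
        return typed; // the real code falls through to autocomplete_contiguous
    }
    for i in typed.len().. {
        let mut c = None;
        for m in &matches {
            if m.len() <= i || c.is_some() && m.as_bytes()[i] != c.unwrap() {
                return &m[..i]; // borrowed slice, no allocation
            }
            if c.is_none() { c = Some(m.as_bytes()[i]) }
        }
    }
    typed
}
```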

yujiri8 profile image
Ryan Westlund Author

Ah, you're right!

I made those optimizations. They only reduced the runtime by about 0.05s... but they led me to notice something else. I don't know exactly why, but if I replace for case in cases with for case in &cases, the runtime goes all the way down to 0.62s! I wonder if moving cases into the loop means Rust frees all that memory before taking the final time, so the time to free the memory was being included in Rust's total but not Go's.
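The move-versus-borrow difference can be sketched in isolation (hypothetical timed_scan helper, not the article's code): iterating by value would drop every String inside the timed region, while borrowing leaves the Vec alive and defers all that deallocation until after the timer is read.

```rust
use std::time::Instant;

// Sum the lengths without consuming the Vec. With `for case in cases`
// (a move), each String is freed at the end of its iteration, so the
// deallocation cost lands inside the timed region; borrowing with
// `for case in &cases` (or a slice, as here) defers it.
fn timed_scan(cases: &[String]) -> usize {
    let before = Instant::now();
    let mut total = 0;
    for case in cases {
        total += case.len();
    }
    println!("{:?}", before.elapsed());
    total // the caller still owns `cases`; nothing was freed here
}
```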

Part of the remaining gap with Go is the filters. It must be more expensive to allocate the lambda. If I refactor a filter/collect statement into an explicit loop, it saves a few more centiseconds. I don't think that would be a fair comparison though, since in a real application I wouldn't sacrifice that much readability for such a small gain.
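For reference, the explicit-loop form of one of those filter/collect statements might look like this (a sketch with a hypothetical prefix_matches helper name; the result is the same as the iterator chain, just more verbose):

```rust
// Explicit-loop equivalent of
// possibilities.iter().filter(|p| p.starts_with(typed)).collect()
fn prefix_matches<'a>(typed: &str, possibilities: &'a [String]) -> Vec<&'a String> {
    let mut matches = Vec::new();
    for p in possibilities {
        if p.starts_with(typed) {
            matches.push(p);
        }
    }
    matches
}
```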

maan2003 profile image
Maan2003

Replace &Vec<String> with &[String] in the autocomplete_* fns to avoid a pointer indirection.
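Sketching that on one of the article's functions (the in-order fallback is elided and simply returns typed): &Vec<String> forces the caller to have a Vec and adds a level of indirection, while &[String] also accepts arrays and slices.

```rust
// Same body as the article's function, but taking a slice.
fn autocomplete_contiguous(typed: &str, possibilities: &[String]) -> String {
    let matches: Vec<&String> =
        possibilities.iter().filter(|p| p.contains(typed)).collect();
    if matches.len() == 1 { return matches[0].to_string() }
    if matches.len() > 1 { return typed.to_string() }
    typed.to_string() // real code falls through to autocomplete_in_order
}
```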

maan2003 profile image
Maan2003 • Edited

It would also be interesting to see how easy it is to parallelize the code in Rust and Go (and maybe Haskell).
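One dependency-free way to sketch that in Rust is the standard library's scoped threads (Rust 1.63+); a crate like rayon would make it a one-line par_iter change instead. This hypothetical run_parallel helper uses a stand-in workload (summing string lengths) rather than the actual autocomplete calls:

```rust
use std::thread;

// Split the cases into one chunk per worker and process them on scoped
// threads, which may borrow `cases` because they are joined before the
// scope returns.
fn run_parallel(cases: &[String], workers: usize) -> usize {
    let chunk = (cases.len() + workers - 1) / workers.max(1);
    thread::scope(|s| {
        let handles: Vec<_> = cases
            .chunks(chunk.max(1))
            .map(|part| s.spawn(move || part.iter().map(|c| c.len()).sum::<usize>()))
            .collect();
        // Join every worker and combine the per-chunk results.
        handles.into_iter().map(|h| h.join().unwrap()).sum()
    })
}
```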

xanonid profile image
xanonid • Edited

Since you have not explicitly mentioned it: please make sure to run your test with release optimizations enabled to see the full performance. See e.g. stackoverflow.com/a/29822963/1182783

yujiri8 profile image
Ryan Westlund Author

I did mention it: "A release build built with..."

xanonid profile image
xanonid

Sorry that I missed it.