DEV Community 👩‍💻👨‍💻

Jeancarlo Barrios
Jeancarlo Barrios

Posted on

Finding a Needle With RUST

In my journey to learn RUST, I was looking at problems to solve. While on that journey, I thought, why not find "the needle in a haystack." It turns out this is literally the name of a classic string matching problem. Which has applications in multiple fields like Pattern Recognition, DNA Matching(Bioinformatics), and Data Processing.

The problem is stated as follows: given two strings a text and a pattern, determine whether the pattern appears in the text.

This problem is on Exercism, but they take it a step further and compare if a pattern is a sublist or Superlist(the inverse). We kill two birds with one stone and solve it for their API

Tests

Let's start by writing some tests(I mean copying Exercism tests)...

use string_pattern::{sublist, Comparison};

#[test]
fn empty_equals_empty() {
    let v: &[u32] = &[];

    assert_eq!(Comparison::Equal, sublist(v, v));
}

#[test]
#[ignore]
fn test_sublist_at_start() {
    assert_eq!(Comparison::Sublist, sublist(&[1, 2, 3], &[1, 2, 3, 4, 5]));
}

#[test]
#[ignore]
fn sublist_in_middle() {
    assert_eq!(Comparison::Sublist, sublist(&[4, 3, 2], &[5, 4, 3, 2, 1]));
}

#[test]
#[ignore]
fn sublist_at_end() {
    assert_eq!(Comparison::Sublist, sublist(&[3, 4, 5], &[1, 2, 3, 4, 5]));
}

#[test]
#[ignore]
fn partially_matching_sublist_at_start() {
    assert_eq!(Comparison::Sublist, sublist(&[1, 1, 2], &[1, 1, 1, 2]));
}


Enter fullscreen mode Exit fullscreen mode

Implementation

First, let us try to solve it in a naive way. A naive implementation can be achieved by Iterating in T (text) and checking if P (pattern) exists inside it. Rust has a fantastic tool for slices called Window.

#[derive(Debug, PartialEq)]
pub enum Comparison {
    Equal,
    Sublist,
    Superlist,
    Unequal,
}
pub fn sublist<T: PartialEq>(pattern: &[T], search_universe: &[T]) -> Comparison {

    if pattern == search_universe {
        return Comparison::Equal;

    } else if pattern.len() < search_universe.len() && is_sublist(pattern, search_universe) {
        return Comparison::Sublist;

    } else if search_universe.len() < pattern.len() && is_sublist(search_universe, pattern) {
        return Comparison::Superlist;
    }
    Comparison::Unequal
}
fn is_sublist<T: PartialEq>(pattern: &[T], search_universe: &[T]) -> bool {
    pattern.is_empty()
        || search_universe
            .windows(pattern.len())
            .any(|w| w == pattern)
}
Enter fullscreen mode Exit fullscreen mode

Let's make sure it runs

 cargo test -- --ignored

    Finished test [unoptimized + debuginfo] target(s) in 0.26s
     Running unittests (target/debug/deps/sublist-f0465d8bc8b028e7)

running 0 tests

test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s

     Running tests/sublist.rs (target/debug/deps/sublist-238cc73787cf647b)

running 17 tests
test sublist_at_end ... ok
test partially_matching_sublist_at_start ... ok
test recurring_values_unequal ... ok
test recurring_values_sublist ... ok
test superlist_at_end ... ok
test sublist_in_middle ... ok
test superlist_at_start ... ok
test superlist_in_middle ... ok
test test_1_is_not_2 ... ok
test test_anything_is_a_superlist_of_empty ... ok
test test_empty_is_a_sublist_of_anything ... ok
test test_compare_larger_equal_lists ... ok
test test_sublist_at_start ... ok
test superlist_early_in_huge_list ... ok
test sublist_early_in_huge_list ... ok
test huge_sublist_not_in_huge_list ... ok

test result: ok. 16 passed; 0 failed; 0 ignored; 0 measured; 1 filtered out; finished in 0.18s

   Doc-tests sublist

running 0 tests

test result: ok. 0 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s
Enter fullscreen mode Exit fullscreen mode

Nice!!! Now there is not much going, but let's talk about it anyways.

So the important part of this code is the is_sublist section.

fn is_sublist<T: PartialEq>(pattern: &[T], search_universe: &[T]) -> bool {
    pattern.is_empty()
        || search_universe
            .windows(pattern.len())
            .any(|w| w == pattern)
}

Enter fullscreen mode Exit fullscreen mode

NOTE: Then rust has this fantastic tool called window. An iterator over overlapping subslices of length size n.

First, we check if the pattern is empty because the set empty is a pattern of any set.

Window creates a iterator with a window the size of the pattern.
We use a closure with the function any() to check if the iteration window the pattern.

Finally, if you notice, rust also has an excellent generic system. We implemented this not just for string but for any type T that has the trait PartialEq(comparable).

Cool, right, we have a working implementation to find a string pattern;. However, if we look closely, this implementation is not the best. It is O(n*m). Maybe we can achieve a linear time if we try a bit harder.

To land the point home, I could not even run a benchmark on it. The function was too slow for Criterion when I added the following test.

#[test]
fn huge_sublist_not_in_huge_list_2() {
    let v1: Vec<u64> = vec![0; 1_000_000];
    let mut v2: Vec<u64> = vec![0; 500_000];
    v2.push(1);
assert_eq!(Comparison::Unequal, sublist(&v1, &v2));
}
Enter fullscreen mode Exit fullscreen mode

It takes longer than a minute to run the test(339.50s).

test huge_sublist_not_in_huge_list_2 has been running for over 60 seconds
test huge_sublist_not_in_huge_list_2 ... ok

test result: ok. 2 passed; 0 failed; 16 ignored; 0 measured; 0 filtered out; finished in 339.50s
Enter fullscreen mode Exit fullscreen mode

In conclusion, Rust has terrific tools that make it quite ergonomic to implement solutions. Nevertheless, just because it is in Rust does not mean that our implementation is fast. In Part 2, we will implement the KPM algorithm to achieve a linear time.

Top comments (0)

🌚 Life is too short to browse without dark mode