DEV Community

Mainak Bhattacharjee

From Regex Rampage to Lazy Bliss: My rjq Performance Adventure

Hey there, fellow Rustaceans 🦀!

I've been building a JSON filter tool called rjq, inspired by the awesome jq. But things took a turn for the worse when I hit a performance wall during lexing. The culprit? Compiling regular expressions in a hot loop. It turns out regexes are like hungry hippos – they chomp up performance if you're not careful!
Here's the story of how I tamed the regex beast and saved my program from a slow, sluggish fate:

The Regex Rampage 🦖:

At first, I naively compiled the regex patterns inside the lexing loop. This meant every iteration created a brand new regex object. Think of it like baking a whole new pizza for every bite – inefficient, right? This constant recompilation was a major bottleneck: roughly 80% of the execution time went into it.
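To make the cost concrete, here is a minimal sketch of the anti-pattern next to the obvious fix. It is not rjq's code and uses only the standard library: `compile_pattern` is a hypothetical stand-in for `Regex::new`, and the `COMPILATIONS` counter simply records how often that "compilation" runs.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// Counts how many times the (pretend) compilation runs.
static COMPILATIONS: AtomicUsize = AtomicUsize::new(0);

// Hypothetical stand-in for `Regex::new`: building the matcher is the costly step.
fn compile_pattern() -> impl Fn(&str) -> bool {
    COMPILATIONS.fetch_add(1, Ordering::SeqCst);
    |s: &str| s.starts_with(|c: char| c.is_ascii_digit())
}

// Anti-pattern: the matcher is rebuilt on every iteration.
fn count_numeric_naive(inputs: &[&str]) -> usize {
    let mut hits = 0;
    for s in inputs {
        let is_number = compile_pattern(); // "recompiled" each time around
        if is_number(*s) {
            hits += 1;
        }
    }
    hits
}

// Fix: build the matcher once, before the loop, and reuse it.
fn count_numeric_hoisted(inputs: &[&str]) -> usize {
    let is_number = compile_pattern();
    let mut hits = 0;
    for s in inputs {
        if is_number(*s) {
            hits += 1;
        }
    }
    hits
}

fn main() {
    let inputs = ["42", "abc", "3.14", "x1"];

    let naive_hits = count_numeric_naive(&inputs);
    let naive_compiles = COMPILATIONS.load(Ordering::SeqCst);

    let hoisted_hits = count_numeric_hoisted(&inputs);
    let hoisted_compiles = COMPILATIONS.load(Ordering::SeqCst) - naive_compiles;

    println!("naive:   {naive_hits} hits, {naive_compiles} compilations");
    println!("hoisted: {hoisted_hits} hits, {hoisted_compiles} compilation(s)");
}
```

Both functions return the same answer; the only difference is how many times the expensive step runs. With a real regex, that difference is where the time goes.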

The LazyLock Solution 🧙‍♂️:

Thankfully, the Rust gods (and some helpful folks on the r/Rust subreddit) pointed me towards lazy_static and a technique called lazy initialization. Using the standard library's LazyLock, I could compile each regex only once and store it in a thread-safe static. Now it's like having a box of pizza with a fresh slice ready whenever you need one – much more efficient!

The Lazy Bliss ✹:

The impact was phenomenal! Performance soared, and my lexing code became as smooth as butter. No more regex rampage, just happy filtering.
Want to See the Code?
Curious about the details? Head over to my GitHub repo for rjq: https://github.com/mainak55512/rjq

Lessons Learned 📚:

  • Regex compilation can be expensive, so keep it out of hot loops!
  • Embrace lazy initialization for performance gains.
  • There's always a better way to do things in Rust (and life!)

So, the next time you encounter a performance bottleneck, remember – there might be a lazy solution waiting to be discovered!

P.S. If you have any other tips or tricks for optimizing JSON filtering in Rust, leave a comment below!

But wait, there's more!

Let's dive deeper into the technical aspects of this adventure.
Understanding lazy_static and LazyLock

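  • lazy_static: A crate whose lazy_static! macro declares static variables that are initialized only once, on first access, even in a multi-threaded environment.
  • LazyLock: A standard-library type (std::sync::LazyLock, stable since Rust 1.80) that does the same job without an external crate or macro. It runs its initializer on first access and guarantees that the initialization is thread-safe.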

Here's a simplified example of how I used LazyLock to optimize the regex compilation in rjq:

Outside the hot loop:

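use regex::Regex;
use std::sync::LazyLock;

// Compiled once, on first access. The fractional part is optional, so a
// single digit like `5` matches too (`^\d+\.?\d+` needs at least two digits).
static MATCH_NUMBER: LazyLock<Regex> = LazyLock::new(|| Regex::new(r"^\d+(?:\.\d+)?").unwrap());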

...and so on

Inside the hot loop:

    // A single `find` both tests for a match and returns it, so the
    // input is only scanned once per token.
    if let Some(m) = MATCH_NUMBER.find(&source_string[cursor..]) {
        let val = m.as_str();
        // The pattern is anchored with `^`, so the match starts at `cursor`.
        cursor += val.len();
        token_array.push_back(token(TokenType::NUMBER, val.to_string()));
    } else if ... so on

As you can see, MATCH_NUMBER is declared as a LazyLock static. The regex is compiled only once, the first time MATCH_NUMBER is accessed, and LazyLock guarantees that this initialization is thread-safe. Every later access reuses the same compiled Regex.
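If you want to convince yourself of the "only once, even across threads" part, a small standard-library sketch like the one below can help. The names are hypothetical and not from rjq: `KEYWORDS` stands in for a compiled regex, and `INIT_CALLS` counts how many times the initializer closure actually runs.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::LazyLock;
use std::thread;

// Counts how many times the LazyLock initializer runs.
static INIT_CALLS: AtomicUsize = AtomicUsize::new(0);

// Hypothetical stand-in for an expensive value such as a compiled regex.
static KEYWORDS: LazyLock<Vec<&'static str>> = LazyLock::new(|| {
    INIT_CALLS.fetch_add(1, Ordering::SeqCst);
    vec!["select", "from", "where"]
});

// The first call (from any thread) triggers initialization; later calls reuse it.
fn is_keyword(word: &str) -> bool {
    KEYWORDS.iter().any(|k| *k == word)
}

fn main() {
    // Several threads race to use the static at the same time.
    let handles: Vec<_> = (0..8)
        .map(|_| thread::spawn(|| is_keyword("from")))
        .collect();

    for handle in handles {
        assert!(handle.join().unwrap());
    }

    println!(
        "initializer ran {} time(s)",
        INIT_CALLS.load(Ordering::SeqCst)
    );
}
```

However many threads touch `KEYWORDS`, the closure passed to `LazyLock::new` runs a single time; threads that arrive during initialization wait for it to finish and then share the result.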

Additional Performance Tips

  • Profiling: Use tools like perf or cargo-flamegraph to identify other performance bottlenecks in your code.
  • Data Structures: Choose appropriate data structures for your use case. For example, consider using HashMap for efficient lookups.
  • Algorithms: Optimize algorithms to reduce computational complexity.
  • Memory Management: Be mindful of memory allocations and deallocations.
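The Memory Management tip applies directly to a lexer. As a rough sketch (the `Token` type and `lex` function here are hypothetical, not rjq's actual types, and splitting on whitespace is a deliberate simplification), tokens can borrow slices of the input instead of allocating a new String for each one:

```rust
// Hypothetical token type that borrows from the source text
// instead of owning a freshly allocated String per token.
#[derive(Debug, PartialEq)]
enum Token<'a> {
    Number(&'a str),
    Ident(&'a str),
}

// Simplified lexer: splits on whitespace and classifies each word.
fn lex(source: &str) -> Vec<Token<'_>> {
    // Every token needs at least one byte plus a separator, so this is an
    // upper bound on the token count and the Vec never has to regrow.
    let mut tokens = Vec::with_capacity(source.len() / 2 + 1);

    for word in source.split_whitespace() {
        if word.chars().all(|c| c.is_ascii_digit()) {
            tokens.push(Token::Number(word)); // no allocation, just a slice
        } else {
            tokens.push(Token::Ident(word));
        }
    }
    tokens
}

fn main() {
    let tokens = lex("age 42 name");
    println!("{tokens:?}");
}
```

The trade-off is that the tokens cannot outlive the source string, which is usually fine for a lexer that hands its output straight to a parser.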

By following these tips and leveraging techniques like lazy initialization, you can significantly improve the performance of your Rust applications.

Happy coding 🎉!
