DEV Community

Calin Baenen


Making a regex engine in Rust from scratch.

So I'm starting a new project in Rust called LexRs that will be a Rust remake of my original (misnamed) ParseJS library.

There was this one feature I wanted in the original library that I couldn't figure out how to make.
To get around this, I decided that for the Rust version I'd try my hand at writing my own regular expression engine, modeled on JavaScript's.

What I have so far.

I have chars.rs which contains REChar and REString, the former being a representation of a character in a regex.
I have pattern.rs which contains information for patterns, such as PatternType and PatternSize.

PatternType

The following are the pattern types; my engine will only support these three (for now):

enum PatternType {
    NoneOf(REString),   // Same as JS  /[^xyz]/.
    AnyOf(REString),    // Same as JS  /[xyz]/.
    Char(REChar)        // Same as JS  /x/.
}

These are equivalent to [^xyz], [xyz], and x respectively.
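
A minimal sketch of how a PatternType could test a single character. The definitions of REChar and REString below are assumptions (a wrapper around `char` and a wrapper around `Vec<REChar>`); the real chars.rs may look different, and the `new`, `contains`, and `matches` helpers are hypothetical names used only for illustration:

```rust
// Hypothetical stand-ins for the types in chars.rs; the real ones may differ.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct REChar(char);

#[derive(Debug, Clone, PartialEq, Eq)]
struct REString(Vec<REChar>);

impl REString {
    // Build an REString from a plain &str (helper assumed for this sketch).
    fn new(s: &str) -> Self {
        REString(s.chars().map(REChar).collect())
    }

    // True if any REChar in the string wraps `c`.
    fn contains(&self, c: char) -> bool {
        self.0.iter().any(|rc| rc.0 == c)
    }
}

enum PatternType {
    NoneOf(REString), // Same as JS  /[^xyz]/.
    AnyOf(REString),  // Same as JS  /[xyz]/.
    Char(REChar),     // Same as JS  /x/.
}

impl PatternType {
    // Returns true if the single character `c` satisfies this pattern type.
    fn matches(&self, c: char) -> bool {
        match self {
            PatternType::NoneOf(set) => !set.contains(c),
            PatternType::AnyOf(set) => set.contains(c),
            PatternType::Char(rc) => rc.0 == c,
        }
    }
}
```

With these definitions, `PatternType::AnyOf(REString::new("xyz")).matches('x')` is true, and the same set wrapped in `NoneOf` rejects `'x'`.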

PatternSize

The following are the sizes a pattern can be; only four distinct sizes will be supported:

enum PatternSize {
    OoM,        // Short for "One or More".
                // Same as JS  /x+/.
    ZoM,        // Short for "Zero or More".
                // Same as JS  /x*/.
    ZoO,        // Short for "Zero or One".
                // Same as JS  /x?/.
    N(usize),   // Represents exactly N repetitions.
                // Same as JS  /x{N}/ where N is an integer > 0.
}

These are equivalent to /x+/, /x*/, /x?/, and /x{N}/ respectively.
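
One way to make the sizes usable by a matcher is to turn each one into a minimum and maximum repetition count. This is only a sketch: the `bounds` method is a hypothetical helper, and `None` is used here to mean "no upper limit":

```rust
enum PatternSize {
    OoM,      // One or More.
    ZoM,      // Zero or More.
    ZoO,      // Zero or One.
    N(usize), // Exactly N repetitions.
}

impl PatternSize {
    // Returns (minimum repetitions, maximum repetitions).
    // A maximum of None means the repetition is unbounded.
    fn bounds(&self) -> (usize, Option<usize>) {
        match self {
            PatternSize::OoM => (1, None),
            PatternSize::ZoM => (0, None),
            PatternSize::ZoO => (0, Some(1)),
            PatternSize::N(n) => (*n, Some(*n)),
        }
    }
}
```

A matcher can then treat all four sizes the same way: consume at least the minimum and never more than the maximum.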

Flags

The following are the flags that will be supported by the regex:

struct Flags {
    case_insensitive: bool,   // Same as JS  /x/i.
    multiline: bool,          // Same as JS  /x/m.
    global: bool,             // Same as JS  /x/g.
}

These are equivalent to the i, m, and g flags respectively.
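
As a sketch of how the flags might be read from a JS-style suffix such as the "gi" in /x/gi: the `parse` function below is a hypothetical helper, and rejecting unknown or repeated flags is an assumption about the desired behavior, not something decided for the engine:

```rust
#[derive(Debug, Default, PartialEq, Eq)]
struct Flags {
    case_insensitive: bool, // Same as JS  /x/i.
    multiline: bool,        // Same as JS  /x/m.
    global: bool,           // Same as JS  /x/g.
}

impl Flags {
    // Hypothetical helper: parse a flag suffix such as "gi".
    // Returns None for any flag other than i, m, g, or for a repeated flag.
    fn parse(s: &str) -> Option<Flags> {
        let mut flags = Flags::default();
        for c in s.chars() {
            // Pick the field this flag character controls.
            let slot = match c {
                'i' => &mut flags.case_insensitive,
                'm' => &mut flags.multiline,
                'g' => &mut flags.global,
                _ => return None,
            };
            if *slot {
                return None; // The same flag appeared twice.
            }
            *slot = true;
        }
        Some(flags)
    }
}
```

An empty suffix yields a `Flags` with every field set to `false`, which matches a regex written without flags.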

Before the analyzer.

Before I work on the lexer for the regular expressions, I think I'll work on the matching system first, by matching a string against multiple Patterns.
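
Matching a string against multiple Patterns could look roughly like the sketch below. Everything beyond the two enums is an assumption: the `Pattern` struct pairing a type with a size, the stand-in definitions of REChar and REString, and the `accepts`, `bounds`, and `match_prefix` helpers are all hypothetical. The sketch matches only at the start of the input, ignores the flags, and uses simple greedy matching with backtracking:

```rust
// Hypothetical stand-ins for the types in chars.rs; the real ones may differ.
struct REChar(char);
struct REString(Vec<REChar>);

enum PatternType {
    NoneOf(REString),
    AnyOf(REString),
    Char(REChar),
}

enum PatternSize {
    OoM,
    ZoM,
    ZoO,
    N(usize),
}

// Assumed pairing of a type with a size; not defined in the post.
struct Pattern {
    kind: PatternType,
    size: PatternSize,
}

impl Pattern {
    // True if this pattern's type accepts the single character `c`.
    fn accepts(&self, c: char) -> bool {
        match &self.kind {
            PatternType::NoneOf(set) => set.0.iter().all(|rc| rc.0 != c),
            PatternType::AnyOf(set) => set.0.iter().any(|rc| rc.0 == c),
            PatternType::Char(rc) => rc.0 == c,
        }
    }

    // (minimum, maximum) repetitions; usize::MAX stands for "unbounded".
    fn bounds(&self) -> (usize, usize) {
        match &self.size {
            PatternSize::OoM => (1, usize::MAX),
            PatternSize::ZoM => (0, usize::MAX),
            PatternSize::ZoO => (0, 1),
            PatternSize::N(n) => (*n, *n),
        }
    }
}

// Tries to match every pattern, in order, at the start of `input`.
// Returns the number of characters consumed, or None if the match fails.
fn match_prefix(patterns: &[Pattern], input: &[char]) -> Option<usize> {
    let (first, rest) = match patterns.split_first() {
        Some(pair) => pair,
        None => return Some(0), // No patterns left: trivially matched.
    };
    let (min, max) = first.bounds();
    // Longest run of leading characters this pattern accepts, capped at `max`.
    let avail = input
        .iter()
        .take(max)
        .take_while(|&&c| first.accepts(c))
        .count();
    if avail < min {
        return None;
    }
    // Greedy: try the longest repetition first, then back off one at a time.
    (min..=avail)
        .rev()
        .find_map(|n| match_prefix(rest, &input[n..]).map(|used| n + used))
}
```

The backtracking step matters for sequences like /x*x/: the first pattern initially takes every x, then gives one back so the second pattern can still match.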

How it's going so far.

I think things are going well and I am really excited to do this.
I look forward to seeing the technology I make work in action!

Þanks for reading!
Cheers!
