Vladislav Kashin
The Anatomy of a 500ns Parser: Porting libphonenumber to Rust

It started with a freelance project. I was writing a backend service in Rust and needed to validate international phone numbers. Like any Rust developer, I headed to crates.io and pulled the most popular library for the job.

Then, I opened their GitHub issues.

What I saw was a graveyard of long-standing, unhandled bugs. "National part of the number is truncated in some cases" (Open). "Numbers starting with the same sequence as country prefix are parsed incorrectly" (Open). "00-prefixed international numbers don't parse" (Open).

Porting Google's massive C++ libphonenumber library is an incredibly complex task, and I deeply respect the authors of that crate for undertaking it. But I couldn't ship my client's project with those bugs. So I decided to do something slightly crazy: I was going to write a bug-to-bug compatible port of the C++ libphonenumber, built from the ground up for maximum performance.

Here is the story of how making the library progressively "dumber" about regular expressions made it blazingly fast.

Phase 1: The Initial Prototype and the Protobuf Trap

The first challenge was simply navigating Google's C++ codebase. It's not easy when the language isn't your daily driver. I didn't want to blindly copy 5,000-line files, so I started breaking the logic down into idiomatic Rust modules.

My first strategic mistake was how I handled regular expressions. Google's library uses RE2 and relies heavily on exact bounds checking like FullMatch and FindAndConsume. I naïvely assumed I could just replace these with standard regex crate calls, checking if match.start == 0 && end == strlen.
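That flawed "exact match" check can be sketched std-only, with a plain substring search standing in for the regex crate (the helper names here are hypothetical, not the library's API):

```rust
// In the real code the (start, end) pair came from a regex crate match;
// a literal substring search stands in so this sketch has no dependencies.
fn find_first(haystack: &str, needle: &str) -> Option<(usize, usize)> {
    haystack.find(needle).map(|s| (s, s + needle.len()))
}

/// The naive assumption: a match is "full" iff the first match found
/// happens to span the whole input.
fn naive_full_match(haystack: &str, needle: &str) -> bool {
    match find_first(haystack, needle) {
        Some((start, end)) => start == 0 && end == haystack.len(),
        None => false,
    }
}

fn main() {
    assert!(naive_full_match("48123", "48123"));
    assert!(!naive_full_match("phone 48123", "48123"));
}
```

The subtle bug is that an unanchored engine reports *some* leftmost match, which is not necessarily the one that could have covered the whole string; `^(?:...)$` anchoring forces the engine itself to consider only whole-string matches.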

Then came the metadata. Phone number validation relies on a massive set of rules for every country on Earth. I initially thought I could hand-write the Rust structs for this data. I was wrong. The C++ metadata generation script outputs a raw, binary protobuf array. To read it, I had to use protobuf-gen (though I am currently planning a migration to prost or something even more custom).

Another early hurdle was string normalization. The library needs to convert any Unicode character in the "Decimal Number" (Nd) category into its standard ASCII digit. I didn't want to drag in a massive ICU library just for this. My solution was to write a separate lightweight crate: dec_from_char. In its early days, it was just a giant match statement generated via a macro reading UnicodeProps.txt. Since there aren't that many Nd characters, this kept the binary size small while maintaining acceptable performance.

With that in place, I had a working prototype, and the basic test suite passed. So, I took a long break from the project.
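The early dec_from_char approach amounts to one big range match from Nd codepoints to digit values. A hand-written sketch (the real generated match covered every Nd range; only a few are shown here):

```rust
// Map a Unicode decimal digit (category Nd) to its numeric value.
// The generated macro output enumerated all Nd ranges; these are samples.
fn dec_from_char(c: char) -> Option<u8> {
    match c {
        '0'..='9' => Some(c as u8 - b'0'),                          // ASCII
        '\u{0660}'..='\u{0669}' => Some((c as u32 - 0x0660) as u8), // Arabic-Indic
        '\u{06F0}'..='\u{06F9}' => Some((c as u32 - 0x06F0) as u8), // Extended Arabic-Indic
        '\u{0966}'..='\u{096F}' => Some((c as u32 - 0x0966) as u8), // Devanagari
        _ => None,
    }
}

fn main() {
    assert_eq!(dec_from_char('٧'), Some(7)); // Arabic-Indic seven, U+0667
    assert_eq!(dec_from_char('4'), Some(4));
    assert_eq!(dec_from_char('x'), None);
}
```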

Phase 2: The Hiatus and the Warning Sign

During my time away from the codebase, a helpful contributor opened a Pull Request. They had directly ported the C++ RegexBasedMatcher to Rust, complete with a global RegexCache to handle the exact anchored matching that Google's library used. It was perfectly accurate to the upstream logic. But when I ran the benchmarks locally, I gasped.

Formatting Comparison/rlibphonenumber: format(National)
                        time:   [3.5683 ms 3.5991 ms 3.6337 ms]
                        change: [+22240% +22556% +22875%] (p = 0.00 < 0.05)
                        Performance has regressed.

Performance had regressed by over 22,000%. Formatting a single number suddenly took milliseconds. I politely declined the PR, thinking: "Exact C++ internal compatibility isn't worth destroying performance." I didn't realize at the time that this PR was a glaring warning sign of a much deeper flaw in my own code. I brushed it off.

Phase 3: The Return and the Differential Fuzzer

Months later, I returned to the library. Before publishing a v1.0, I wanted absolute proof that my library behaved identically to Google's. Writing manual edge cases was impossible, so I turned to differential fuzzing. I linked the original C++ library via cxx and fed both implementations millions of random strings to compare their outputs.
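The shape of that harness is simple: run both implementations on the same input and panic on any divergence. A std-only sketch, with hypothetical `reference` and `port` stand-ins where the real fuzz target called the C++ library through cxx:

```rust
// Hypothetical stand-ins for the two implementations under comparison.
fn reference(input: &str) -> bool {
    input.chars().all(|c| c.is_ascii_digit())
}

fn port(input: &str) -> bool {
    input.bytes().all(|b| b.is_ascii_digit())
}

// The core of differential fuzzing: any disagreement is a bug in one side.
fn differential_check(input: &str) {
    let (a, b) = (reference(input), port(input));
    assert_eq!(a, b, "divergence on input {input:?}: reference={a}, port={b}");
}

fn main() {
    // A fuzzer generates millions of random inputs; a fixed corpus
    // keeps this sketch deterministic.
    for case in ["+48123456789", "CD(+48X666666644", "٣٣٣", ""] {
        differential_check(case);
    }
}
```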

The fuzzer crashed almost immediately. A string like CD(+48X666666644 was being marked as invalid by my Rust code, but valid by C++.

Then it clicked. The contributor from months ago wasn't just translating C++ for the sake of it. My naïve start == 0 && end == strlen boundary checks were fundamentally broken for complex edge cases. Because of how the internal regex patterns were structured, my simple boundary checks were rejecting strings that Google's anchored matching accepted. I needed the exactness of C++'s anchored regexes. But I absolutely refused to accept the 22,000% performance penalty of a RegexCache allocating and compiling strings at runtime.

Phase 4: Fixing Regex the Fast Way

How do you anchor thousands of regexes without allocating strings at runtime or locking a global cache? You move the work to build time. I modified my Java build script (which parses the XML metadata) to pre-wrap the patterns in ^(?:...)$. But initializing three different variations of a regex at runtime would bloat memory. Instead, I created a structure called RegexTriplets:
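The wrapping itself is trivial, which is exactly why it belongs in the generator rather than on every hot-path call. A sketch of the transformation (the real project does this in the Java metadata script; the function name here is illustrative):

```rust
// Wrap a pattern once, at generation time, so the anchored variant never
// has to be assembled with string allocation at runtime.
fn anchor_full(pattern: &str) -> String {
    // (?:...) is non-capturing, so capture-group numbering inside the
    // original pattern is preserved.
    format!("^(?:{pattern})$")
}

fn main() {
    assert_eq!(anchor_full(r"\d{2}"), r"^(?:\d{2})$");
}
```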

use std::sync::OnceLock;

#[derive(Debug, Clone)]
pub struct RegexTriplets {
    /// The raw pattern string; the internal logic sometimes slices it.
    pub pattern_base: Option<String>,
    /// Each variant is compiled lazily, only if it is ever requested.
    pub original: OnceLock<Result<Option<Regex>, crate::regexp::Error>>,
    pub anchor_start: OnceLock<Result<Option<Regex>, crate::regexp::Error>>,
    pub anchor_full: OnceLock<Result<Option<Regex>, crate::regexp::Error>>,
}

This struct sits on the stack and weighs only 84 bytes. We keep the raw string via pattern_base (since the internal logic sometimes needs to slice it) and lazily initialize the actual Regex objects only if a specific exact-match variation is requested. By leveraging Rust's string slicing ([..]), which is basically free, I achieved the exactness of the C++ implementation with minimal runtime overhead.
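The lazy-initialization mechanics can be shown std-only by swapping the compiled Regex for a plain String (so the sketch has no external dependencies; the struct and method names are illustrative):

```rust
use std::sync::OnceLock;

// Stand-in for one slot of the triplet: a base pattern plus a lazily
// built anchored variant. The real struct stores a compiled Regex here.
struct LazyPattern {
    base: &'static str,
    anchored: OnceLock<String>,
}

impl LazyPattern {
    fn anchored(&self) -> &str {
        // Built at most once, thread-safely, and only on first request.
        self.anchored.get_or_init(|| format!("^(?:{})$", self.base))
    }
}

fn main() {
    let p = LazyPattern { base: r"\d+", anchored: OnceLock::new() };
    assert_eq!(p.anchored(), r"^(?:\d+)$");
    assert_eq!(p.anchored(), r"^(?:\d+)$"); // second call reuses the cached value
}
```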

Phase 5: WASM, Regex-lite, and Custom Unicode Tries

One of my primary goals was WebAssembly support. I wanted a live web preview without paying for Rust backend hosting. However, my initial WASM builds were megabytes in size. The main offender was the regex crate; compiling DFA state machines into WASM takes a massive amount of space. I switched to regex-lite, which dropped the binary size to a beautiful ~500kB.

But there was a catch: regex-lite does not support Unicode categories like \p{L} (Letters) or \p{N} (Numbers). Dropping full Unicode support was not an option for an international phone library. To fix this, I completely rewrote my dec_from_char crate into a new code generator. Instead of a giant match statement, my build.rs now pre-computes a Trie-like lookup table for the necessary Unicode properties. Using fixed chunk sizes, it generates arrays that grant O(1) lookups for any character up to 0x10FFFF.
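The shape of such a two-stage table is easier to see on a toy example. Here the first-level array maps a codepoint's high bits to a block index, and the low bits select the entry inside that block; the sizes and the "is ASCII digit" property are invented purely for illustration:

```rust
const SHIFT: u32 = 4; // 16-codepoint chunks
const MASK: u32 = (1 << SHIFT) - 1;

// First level: which block handles each chunk. Chunk 3 covers
// U+0030..U+003F, where the ASCII digits live.
const INDICES: [u8; 8] = [0, 0, 0, 1, 0, 0, 0, 0];

// Second level: blocks of per-codepoint answers. Identical chunks all
// point at block 0, which is what keeps the real tables compact.
const BLOCKS: [[bool; 16]; 2] = [
    [false; 16],
    [true, true, true, true, true, true, true, true, true, true,
     false, false, false, false, false, false],
];

fn is_ascii_digit_via_table(c: char) -> bool {
    let cp = c as u32;
    if cp >= (INDICES.len() as u32) << SHIFT {
        return false; // outside this toy table's range
    }
    let block = INDICES[(cp >> SHIFT) as usize] as usize;
    BLOCKS[block][(cp & MASK) as usize]
}

fn main() {
    assert!(is_ascii_digit_via_table('7'));  // U+0037: chunk 3, offset 7
    assert!(!is_ascii_digit_via_table('A')); // U+0041: chunk 4 -> block 0
    assert!(!is_ascii_digit_via_table(' ')); // U+0020: chunk 2 -> block 0
}
```

The generated tables in the library apply the same idea over the full 0x10FFFF range with bounds covered at generation time, which is what makes the `get_unchecked` accesses in the real code sound.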

impl Category {
    #[inline(always)]
    pub fn from_char(c: char) -> ::std::option::Option<Self> {
        let cp = c as u32;
        if cp > MAX_CODEPOINT { return None; }

        let index_idx = (cp >> SHIFT) as usize;

        // SAFETY: Arrays are generated to cover up to 0x10FFFF
        unsafe {
            let block_idx = *CATEGORY_INDICES.get_unchecked(index_idx) as usize;
            let offset = (cp & MASK) as usize;
            let final_pos = (block_idx << SHIFT) + offset;
            *CATEGORY_BLOCKS.get_unchecked(final_pos)
        }
    }
}

This allowed me to rip out regular expressions entirely from critical hot-paths. Stripping unwanted trailing characters went from a slow regex match inside a loop:

// The old, slow regex way
pub(crate) fn trim_unwanted_end_chars<'a>(&self, phone_number: &'a str) -> &'a str {
    // ... loop with regex.full_match() ...
}

To a single, native Rust iterator method leveraging the generated tables:

// The new, instant way
pub(crate) fn trim_unwanted_end_chars<'a>(&self, phone_number: &'a str) -> &'a str {
    phone_number.trim_end_matches(|c| {
        c != '#' && uniprops_without_nl::uniprops::Category::from_char(c).is_some()
    })
}

Phase 6: Zero-Allocation Formatting

At this point, parsing was incredibly fast, but formatting still required a heap allocation when appending leading zeroes to national numbers. To achieve true zero-allocation formatting, I wrote a custom integer-to-string formatter called zeroes_itoa by cloning and modifying a popular integer-formatting crate.

pub fn format(&mut self, mut n: u64, leading_zero_count: usize) -> Cow<'_, str> {
    let mut curr = self.bytes.len();
    let buf_ptr = self.bytes.as_mut_ptr() as *mut u8;
    let lut_ptr = DEC_DIGITS_LUT.as_ptr();

    // ...
    let final_len = self.bytes.len() - curr;
    let bytes = unsafe { slice::from_raw_parts(buf_ptr.add(curr), final_len) };
    unsafe { str::from_utf8_unchecked(bytes).into() }
}

Unless the required padding exceeds the 64-byte stack buffer (which is essentially impossible for a phone number), this returns a Cow::Borrowed(&str), entirely avoiding the system allocator.
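A safe, std-only sketch of the same idea (the real zeroes_itoa uses raw pointers and a digit LUT as shown above; the buffer size and names here are illustrative, not the crate's actual API):

```rust
use std::borrow::Cow;

// Caller-owned stack buffer; digits are written right-to-left into it.
struct ZeroesBuf {
    bytes: [u8; 64],
}

impl ZeroesBuf {
    fn new() -> Self {
        ZeroesBuf { bytes: [0; 64] }
    }

    // Format `n` with `leading_zero_count` extra zeroes, borrowing from
    // the stack buffer whenever the result fits.
    fn format(&mut self, mut n: u64, leading_zero_count: usize) -> Cow<'_, str> {
        let mut curr = self.bytes.len();
        loop {
            curr -= 1;
            self.bytes[curr] = b'0' + (n % 10) as u8;
            n /= 10;
            if n == 0 {
                break;
            }
        }
        if leading_zero_count > curr {
            // Padding would overflow the buffer: fall back to the heap
            // (essentially unreachable for real phone numbers).
            let mut s = "0".repeat(leading_zero_count);
            s.push_str(std::str::from_utf8(&self.bytes[curr..]).unwrap());
            return Cow::Owned(s);
        }
        for b in &mut self.bytes[curr - leading_zero_count..curr] {
            *b = b'0';
        }
        curr -= leading_zero_count;
        // Every byte written is an ASCII digit, so this is valid UTF-8.
        Cow::Borrowed(std::str::from_utf8(&self.bytes[curr..]).unwrap())
    }
}

fn main() {
    let mut buf = ZeroesBuf::new();
    assert_eq!(buf.format(123456789, 2), "00123456789");
    assert!(matches!(buf.format(7, 0), Cow::Borrowed("7")));
}
```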

The Final Results

Through differential fuzzing, build-time regex compilation, custom Unicode Tries, and stack-allocated integer formatting, rlibphonenumber is now entirely stable, bug-for-bug compatible with Google's upstream, and incredibly fast.

Here is the final performance comparison against Google's upstream C++ implementation:

| Operation | C++ (libphonenumber + RE2) | Rust (rlibphonenumber) | Speedup |
| --- | --- | --- | --- |
| Parsing | 2279 ns | 506 ns | ~4.5x |
| Format (E.164) | 63 ns | 36 ns | ~1.7x |
| Format (International) | 2028 ns | 447 ns | ~4.5x |
| Format (National) | 2484 ns | 578 ns | ~4.3x |

We hit ~500 nanoseconds for a full parse, and just over 30 nanoseconds for E.164 formatting.

If you are dealing with high-throughput backend services, you can find the repository here: github.com/vloldik/rlibphonenumber
