Your Java Regex Can Be Weaponized (And How To Stop It)

#java #cybersecurity #security

Most developers don't realize their input validation is a denial of service vulnerability waiting to happen. Let me show you what I mean.

Take this innocent looking regex that validates email addresses:

String regex = "^([a-zA-Z0-9]+)+@[a-zA-Z0-9]+\\.[a-zA-Z]{2,}$";
Pattern.compile(regex).matcher(input).matches();

Looks fine right? Now feed it this input:

aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa!

Your CPU just pegged at 100% and stayed there. This is called ReDoS (Regular Expression Denial of Service) and it happens because Java's regex engine uses backtracking. Certain patterns cause exponential time complexity when matching fails in specific ways.

Attackers know about this. They send crafted inputs to your validation endpoints and watch your servers melt.

The Backtracking Problem

Java's java.util.regex uses an NFA (Nondeterministic Finite Automaton) with backtracking. When a match fails partway through, the engine backtracks and tries alternative paths. With nested quantifiers like ([a-zA-Z0-9]+)+ the number of paths explodes exponentially.

The input aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa! has 33 characters. The regex engine will try somewhere around 2^33 combinations before giving up. That's 8 billion operations for one validation call.

The Fix

There's a different kind of regex engine called RE2 that Google built. It uses a DFA (Deterministic Finite Automaton) which guarantees linear time matching. No matter how evil the pattern or input, it finishes in O(n) time where n is the input length.

I've been working on a Java validation library called Rules that bundles a patched fork of RE2J (the Java port of RE2). The fork includes fixes for some vulnerabilities found in the original implementation.

Using it looks like this:

Rule<String> email = StringRules.matches("^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\\.[a-zA-Z]{2,}$");

ValidationResult result = email.validate(userInput);
if (result.isValid()) {
    // safe to use
}

Same pattern, but now it can't be weaponized. The underlying RE2J engine simply won't allow catastrophic backtracking because it doesn't backtrack at all.

What Else Can Go Wrong

While we're on the topic of input validation attacks, regex isn't the only vector:

HashDoS : Attackers can make keys that all hash to the same bucket, turning your O(1) HashMap lookups into O(n). The library includes SecureHashMap which uses SipHash-2-4 with random keys to prevent this.

Timing Attacks : Comparing secrets with equals() leaks information through timing differences. Character by character comparison bails early on mismatch so attackers can guess passwords one character at a time. The library has constant time comparison functions.

Stack Overflow via Recursion : Self referential data structures can blow your stack during validation. The library tracks depth and detects cycles.

Quick Example

Here's validating a user registration with multiple rules composed together:

Rule<String> username = Rules.all(
    StringRules.notBlank(),
    StringRules.lengthBetween(3, 20),
    StringRules.matches("^[a-zA-Z0-9_]+$")
);

Rule<String> password = Rules.all(
    StringRules.minLength(12),
    StringRules.matches("[A-Z]"),
    StringRules.matches("[a-z]"),
    StringRules.matches("[0-9]")
);

username.validate("user_dev").isValid();  // true
password.validate("SecurePass123").isValid();  // true

Getting It

Maven:

<dependency>
    <groupId>io.github.xoifaii</groupId>
    <artifactId>rules</artifactId>
    <version>1.0.0</version>
</dependency>

Or clone from GitHub if you want to poke around. Full API docs are in the wiki.

Zero dependencies. The RE2J fork is bundled in.

If you've got a public facing Java app doing input validation with regex, it's worth checking whether your patterns are vulnerable. Tools like recheck can analyze patterns for ReDoS, very useful.

Or just use an engine that makes the problem impossible in the first place.