DEV Community

Mirko Dimartino

Stop Writing Raw Regex in Java

Let's be honest: writing Regular Expressions in Java is often a painful experience.

You start with a simple rule, but before you know it, you are staring at an unreadable string of backslashes, brackets, and cryptic symbols. Worse still, the compiler won't catch a syntax error: to javac the pattern is just a string, so you only find out at runtime, when Pattern.compile throws a PatternSyntaxException. And even a syntactically valid pattern can leave your application open to a Catastrophic Backtracking (ReDoS) attack.
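To make the runtime-only failure concrete, here is a minimal sketch in plain java.util.regex (no Sift involved). The class and method names are made up for illustration.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

class RawRegexPitfall {

    // Returns true if the JDK regex engine accepts the pattern string.
    static boolean compiles(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            // Only reachable at runtime: javac sees the regex as an ordinary String.
            return false;
        }
    }

    public static void main(String[] args) {
        // Unclosed character class: accepted by javac, rejected when executed.
        System.out.println(compiles("[a-z"));   // false
        // Well-formed pattern.
        System.out.println(compiles("[a-z]+")); // true
    }
}
```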

I got tired of this, so I built Sift.


What is Sift?

Sift is a fluent, state-machine-driven Domain Specific Language (DSL) for building Type-Safe Regular Expressions in Java.

Instead of guessing the syntax, Sift uses your IDE's auto-completion to guide you. It enforces strict structural guarantees at compile time.
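The general idea behind a type-state builder can be sketched in a few lines. The toy below is hypothetical and is not Sift's API or implementation: MiniRegex, NeedsCharClass, Complete, and every method name are invented for illustration. Each step returns a different type, so the IDE only offers the calls that are valid next, and javac rejects anything else.

```java
// Hypothetical mini-builder, NOT Sift's API: illustrates the type-state idea only.
final class MiniRegex {
    private MiniRegex() {}

    // State 1: a quantifier was chosen, so a character class must come next.
    static final class NeedsCharClass {
        private final String prefix;
        private final String quantifier;

        private NeedsCharClass(String prefix, String quantifier) {
            this.prefix = prefix;
            this.quantifier = quantifier;
        }

        Complete digits()  { return new Complete(prefix + "[0-9]" + quantifier); }
        Complete letters() { return new Complete(prefix + "[a-zA-Z]" + quantifier); }
    }

    // State 2: a complete fragment that can be extended or rendered to a String.
    static final class Complete {
        private final String regex;

        private Complete(String regex) { this.regex = regex; }

        NeedsCharClass thenExactly(int n) { return new NeedsCharClass(regex, "{" + n + "}"); }
        String build() { return regex; }
    }

    // Entry point: starts in the "needs a character class" state.
    static NeedsCharClass exactly(int n) { return new NeedsCharClass("", "{" + n + "}"); }

    public static void main(String[] args) {
        String regex = MiniRegex.exactly(2).letters().thenExactly(4).digits().build();
        System.out.println(regex); // [a-zA-Z]{2}[0-9]{4}
        // MiniRegex.exactly(2).build(); // would not compile: NeedsCharClass has no build()
    }
}
```

A dangling quantifier is therefore a compile error in this toy, rather than a malformed string discovered at runtime.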


The Problem: Raw Regex

Imagine trying to validate an international username while making sure the pattern can't be driven into catastrophic backtracking. You might write something like this:

^[\p{Lu}][\p{L}\p{Nd}_]{3,15}+[0-9]?$

It works, but it's hard to read, impossible to maintain, and a nightmare for junior developers to review.
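For reference, this is roughly how that raw pattern ends up in Java code. It is a sketch with an invented class name and invented sample inputs. Note that every backslash has to be doubled inside the string literal, which makes the pattern even harder to read.

```java
import java.util.regex.Pattern;

class RawUsernameCheck {
    // The raw pattern from above, with backslashes doubled for a Java string literal.
    static final Pattern USERNAME =
        Pattern.compile("^[\\p{Lu}][\\p{L}\\p{Nd}_]{3,15}+[0-9]?$");

    static boolean isValid(String candidate) {
        return USERNAME.matcher(candidate).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValid("\u00C9lodie42")); // true: starts with an uppercase letter
        System.out.println(isValid("elodie42"));      // false: lowercase first character
        System.out.println(isValid("Ab"));            // false: fewer than 3 characters after the first
    }
}
```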


The Solution: Sift's Fluent API

With Sift, you write self-documenting code that produces the exact same regex as a plain string for Java's native Pattern, so matching carries no extra runtime overhead:

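// Entry points: one anchored at the start of the input, one unanchored.
var start = Sift.fromStart();
var anywhere = Sift.fromAnywhere();
// Corresponds to ^[\p{Lu}]
var oneUnicodeUppercaseLetter = start.uppercaseLettersUnicode();
// Corresponds to [\p{L}\p{Nd}_]{3,15}+ (possessive quantifier)
var otherUnicodeChar = anywhere.between(3, 15).wordCharactersUnicode().withoutBacktracking();
// Corresponds to [0-9]?
var optionalDigits = anywhere.optional().digits();

// Compose the blocks, anchor the end of the input, and render the regex string.
String regex = oneUnicodeUppercaseLetter
    .followedBy(otherUnicodeChar, optionalDigits)
    .andNothingElse()
    .shake();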

Modularity: The "LEGO Brick" Approach

Complex patterns usually require long, monolithic strings. Sift introduces modularity: you can build unanchored intermediate blocks and compose them into strict boundaries later.

Sift also supports Lazy Validation for Backreferences. You can define a Named Capture Group in one block, reference it in another disconnected block, and Sift will safely merge and validate them when you finally call .shake().
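Sift's own API for this is not shown here, so the sketch below uses plain java.util.regex to illustrate the underlying problem that deferred validation addresses. All names are invented for illustration. A fragment containing a backreference cannot be compiled on its own, because the named group it refers to lives in another fragment; it only becomes valid once everything is assembled.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

class ComposedBackreference {
    // Fragment 1: captures an opening tag name under the group name "tag".
    static final String OPEN_TAG = "<(?<tag>[a-z]+)>";
    // Fragment 2: any content, matched lazily.
    static final String BODY = ".*?";
    // Fragment 3: refers back to "tag", which is defined in a different fragment.
    static final String CLOSE_TAG = "</\\k<tag>>";

    // Anchors are added only when the unanchored fragments are assembled.
    static final Pattern ELEMENT =
        Pattern.compile("^" + OPEN_TAG + BODY + CLOSE_TAG + "$");

    static boolean isElement(String s) {
        return ELEMENT.matcher(s).find();
    }

    // True if the fragment is a valid pattern when compiled in isolation.
    static boolean compilesAlone(String fragment) {
        try {
            Pattern.compile(fragment);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isElement("<b>bold</b>"));  // true
        System.out.println(isElement("<b>bold</i>"));  // false: closing tag differs
        System.out.println(compilesAlone(CLOSE_TAG));  // false: group "tag" is unknown here
    }
}
```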


Built for Security

Java's built-in regex engine (java.util.regex) is a backtracking engine, so patterns with nested or overlapping quantifiers can be vulnerable to ReDoS if you aren't careful. Sift exposes possessive (.withoutBacktracking()) and lazy (.asFewAsPossible()) modifiers directly through the Type-State machine, making secure patterns the path of least resistance.
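In raw regex terms, those two modifiers correspond to the possessive (+ suffix) and lazy (? suffix) quantifier forms. The sketch below uses plain java.util.regex, not Sift, with invented names and inputs, to show how each mode changes what the engine is allowed to do.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuantifierModes {

    // Returns the first substring matched by regex, or null if there is none.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // Greedy: takes as much as possible, then backtracks to fit the final '>'.
        System.out.println(firstMatch("<.+>", "<a><b>"));  // <a><b>
        // Lazy: takes as little as possible.
        System.out.println(firstMatch("<.+?>", "<a><b>")); // <a>
        // Greedy a* gives one 'a' back so the trailing 'a' can match.
        System.out.println(firstMatch("a*a", "aaa"));      // aaa
        // Possessive a*+ never gives characters back, so the trailing 'a' cannot match.
        System.out.println(firstMatch("a*+a", "aaa"));     // null
    }
}
```

Because a possessive quantifier never revisits what it has consumed, the engine has no alternative paths to explore on a failed match, which is what removes the backtracking blow-up.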


Check out the Cookbook 🧑‍🍳

I recently wrote a comprehensive COOKBOOK.md demonstrating real-world use cases, such as parsing TSV logs, validating UUIDs/IPs, and extracting HTML tags.

👉 Sift on GitHub


I would love to hear your feedback on the DSL design and the state-machine architecture. What do you think about fluent builders for Regex? Let me know in the comments!
