A colleague pushes a fix for a validation bug. The regex looks right. Tests pass. Two weeks later, a user reports that their perfectly valid email address is being rejected — because the fix introduced an unbalanced bracket that the compiler never complained about.
This is the nature of raw regex in Java. The compiler has no opinion. The mistake lives quietly in a string literal until runtime hands you the bill.
I built Sift to make this class of bug unrepresentable.
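To make the failure mode concrete, here is a small sketch using only the JDK's java.util.regex. The class and method names are mine, chosen for illustration. It shows two flavors of the problem: a broken pattern that only blows up when the line runs, and a stray bracket that is accepted and quietly changes what the pattern matches.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

class RawRegexPitfalls {

    /** Returns true if the JDK accepts the pattern; the check only happens at runtime. */
    static boolean compiles(String regex) {
        try {
            Pattern.compile(regex);
            return true;
        } catch (PatternSyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // An unclosed '[' is a plain string to javac; it is rejected only when this runs.
        System.out.println(compiles("[a-z"));                    // false

        // A stray ']' is accepted and silently treated as a literal character.
        System.out.println(compiles("[a-z]+]"));                 // true
        System.out.println(Pattern.matches("[a-z]+]", "abc"));   // false
        System.out.println(Pattern.matches("[a-z]+]", "abc]"));  // true
    }
}
```

The second case is the nasty one: nothing throws, and a previously valid input just stops matching.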
What Sift actually does
Sift is a fluent DSL for building regular expressions in Java. Its core idea is simple: instead of writing a string, you traverse a type-state machine. Each method returns only the next valid state — so wrong transitions don't exist as methods, and the compiler rejects incomplete or structurally invalid patterns before your code ever runs.
The before/after speaks for itself:
// Before — what does this even do?
Pattern p = Pattern.compile("^[\\p{Lu}][\\p{L}\\p{Nd}_]{3,15}+[0-9]?$");
// After — your IDE guides every step
String regex = Sift.fromStart()
.exactly(1).upperCaseLettersUnicode() // Must start with an uppercase letter
.then()
.between(3, 15).wordCharactersUnicode().withoutBacktracking() // ReDoS-safe
.then()
.optional().digits() // May end with a digit
.andNothingElse()
.shake();
// Result: ^[\p{Lu}][\p{L}\p{Nd}_]{3,15}+[0-9]?$
Same output, and nothing extra at match time: shake() hands back a plain regex string for the standard engine. And if you try to skip the quantifier step and call .digits() directly, it simply doesn't compile.
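Why a skipped step cannot compile is easier to see in miniature. The sketch below is not Sift's source code; MiniSift and its three step classes are hypothetical names, and the builder supports only a handful of operations. The point is the shape: each state is its own class, and each class exposes only the methods that are legal next.

```java
// Hypothetical sketch of a type-state builder; not Sift's implementation.
final class MiniSift {
    private MiniSift() {}

    /** Entry point: the first thing you can do is choose a quantifier. */
    static QuantifierStep fromStart() {
        return new QuantifierStep("^");
    }

    /** State 1: a quantifier must be chosen before a character class. */
    static final class QuantifierStep {
        private final String prefix;
        QuantifierStep(String prefix) { this.prefix = prefix; }

        TypeStep exactly(int n) { return new TypeStep(prefix, "{" + n + "}"); }
        TypeStep optional()     { return new TypeStep(prefix, "?"); }
        // There is no digits() here, so fromStart().digits() is a compile error.
    }

    /** State 2: a character class must follow the quantifier. */
    static final class TypeStep {
        private final String prefix;
        private final String quantifier;
        TypeStep(String prefix, String quantifier) {
            this.prefix = prefix;
            this.quantifier = quantifier;
        }

        ConnectorStep digits()  { return new ConnectorStep(prefix + "[0-9]" + quantifier); }
        ConnectorStep letters() { return new ConnectorStep(prefix + "[a-zA-Z]" + quantifier); }
    }

    /** State 3: either continue with then() or seal the pattern. */
    static final class ConnectorStep {
        private final String regex;
        ConnectorStep(String regex) { this.regex = regex; }

        QuantifierStep then()   { return new QuantifierStep(regex); }
        String andNothingElse() { return regex + "$"; }
    }

    public static void main(String[] args) {
        String regex = MiniSift.fromStart()
                .exactly(2).letters()
                .then()
                .optional().digits()
                .andNothingElse();
        System.out.println(regex); // ^[a-zA-Z]{2}[0-9]?$
    }
}
```

Because QuantifierStep has no digits() method, the invalid chain never reaches the regex engine; javac reports it as an unresolved method.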
The LEGO brick approach
The real power emerges when you start composing patterns from named building blocks.
Every Sift.fromAnywhere() call returns a SiftPattern<Fragment>: an unanchored, reusable piece that can be embedded anywhere without carrying unwanted ^ or $ anchors. Patterns built with fromStart() or sealed with andNothingElse() become SiftPattern<Root> and cannot be embedded; attempting it is a compile-time error.
// Define named building blocks
SiftPattern<Fragment> year = Sift.fromAnywhere().exactly(4).digits();
SiftPattern<Fragment> month = Sift.fromAnywhere().exactly(2).digits();
SiftPattern<Fragment> day = Sift.fromAnywhere().exactly(2).digits();
SiftPattern<Fragment> dash = Sift.fromAnywhere().character('-');
// Compose into a reusable date block
SiftPattern<Fragment> date = year.followedBy(dash, month, dash, day);
// Embed inside a larger log parser
String logRegex = Sift.fromStart()
.of(date)
.followedBy('\t')
.then().oneOrMore().upperCaseLetters() // log level: INFO, WARN, ERROR
.followedBy('\t')
.then().oneOrMore().anyCharacter()
.andNothingElse()
.shake();
// Result: ^[0-9]{4}-[0-9]{2}-[0-9]{2}\t[A-Z]+\t.+$
The date fragment is independently testable, readable by name, and reusable across your codebase without copy-paste.
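The Fragment/Root split can be modeled with a phantom type parameter, meaning a type argument that exists only for the compiler. Here is a minimal sketch of that technique. Everything in it (PhantomKinds, Pat, fragment, sequence, anchored) is a hypothetical name I made up for the example, and I am not claiming Sift is implemented this way.

```java
// Hypothetical sketch of phantom types; names are invented for illustration.
final class PhantomKinds {
    private PhantomKinds() {}

    interface Fragment {}   // marker: unanchored, safe to embed
    interface Root {}       // marker: anchored, must not be embedded

    /** K is never used at runtime; it only tags the pattern for the compiler. */
    static final class Pat<K> {
        final String regex;
        private Pat(String regex) { this.regex = regex; }
    }

    static Pat<Fragment> fragment(String regex) {
        return new Pat<>(regex);
    }

    /** Accepts fragments only, so passing a Pat<Root> is a type error. */
    @SafeVarargs
    static Pat<Fragment> sequence(Pat<Fragment>... parts) {
        StringBuilder sb = new StringBuilder();
        for (Pat<Fragment> p : parts) {
            sb.append(p.regex);
        }
        return new Pat<>(sb.toString());
    }

    /** Anchoring flips the tag from Fragment to Root. */
    static Pat<Root> anchored(Pat<Fragment> body) {
        return new Pat<>("^" + body.regex + "$");
    }

    public static void main(String[] args) {
        Pat<Fragment> year = fragment("[0-9]{4}");
        Pat<Fragment> twoDigits = fragment("[0-9]{2}");
        Pat<Fragment> dash = fragment("-");

        Pat<Root> date = anchored(sequence(year, dash, twoDigits, dash, twoDigits));
        System.out.println(date.regex); // ^[0-9]{4}-[0-9]{2}-[0-9]{2}$

        // sequence(date, dash);  // does not compile: Pat<Root> is not Pat<Fragment>
    }
}
```

The marker interfaces have no methods and no instances. Their only job is to make "anchored pattern embedded inside another pattern" a type mismatch.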
Patterns are extraction tools, not just validators
This is the part that usually surprises people. In v5.6, every SiftPattern ships a complete extraction API — no Matcher boilerplate required.
Named group extraction
NamedCapture yearGroup = SiftPatterns.capture("year", Sift.fromAnywhere().exactly(4).digits());
NamedCapture monthGroup = SiftPatterns.capture("month", Sift.fromAnywhere().exactly(2).digits());
NamedCapture dayGroup = SiftPatterns.capture("day", Sift.fromAnywhere().exactly(2).digits());
SiftPattern<?> datePattern = Sift.fromStart()
.namedCapture(yearGroup)
.followedBy('-')
.then().namedCapture(monthGroup)
.followedBy('-')
.then().namedCapture(dayGroup)
.andNothingElse();
Map<String, String> fields = datePattern.extractGroups("2026-03-13");
// → { "year": "2026", "month": "03", "day": "13" }
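For comparison, this is roughly the Matcher boilerplate that a one-line extractGroups call replaces, written against the plain JDK. The regex is my assumption of what a date pattern with three named captures looks like, not output copied from Sift. The helper takes the group names explicitly because the JDK only exposes them from Java 20 onward.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NamedGroupBoilerplate {
    // Assumption: a hand-written equivalent of the date pattern, not Sift's actual output.
    static final Pattern DATE =
            Pattern.compile("^(?<year>[0-9]{4})-(?<month>[0-9]{2})-(?<day>[0-9]{2})$");

    /** Collects the listed named groups from the first match; empty map if nothing matches. */
    static Map<String, String> extractGroups(Pattern pattern, String input, String... names) {
        Map<String, String> out = new LinkedHashMap<>();
        Matcher m = pattern.matcher(input);
        if (m.find()) {
            for (String name : names) {
                out.put(name, m.group(name));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(extractGroups(DATE, "2026-03-13", "year", "month", "day"));
        // prints {year=2026, month=03, day=13}
    }
}
```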
Extracting all matches across a text
List<String> numbers = Sift.fromAnywhere()
.oneOrMore().digits()
.extractAll("Order: 3 items at 25 and 40 euros");
// → ["3", "25", "40"]
Extracting all named groups across multiple matches
List<Map<String, String>> allMatches = invoicePattern.extractAllGroups(largeDocument);
// → [{"id": "INV-001", "amount": "250"}, {"id": "INV-002", "amount": "80"}, ...]
Lazy streaming for large inputs
Sift.fromAnywhere().oneOrMore().lettersUnicode()
.streamMatches(largeText)
.filter(word -> word.length() > 5)
.forEach(System.out::println);
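If you are curious what "lazy" can mean here, the sketch below builds a Java 8 compatible match stream on top of a plain Matcher. It is one possible way to do it under my own assumptions (the LazyMatches class and the \p{L}+ regex are illustrative), not a description of Sift's internals. The matcher advances only when the stream asks for the next element.

```java
import java.util.Spliterator;
import java.util.Spliterators;
import java.util.function.Consumer;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

class LazyMatches {

    /** Streams matches one at a time; find() runs only when the stream pulls an element. */
    static Stream<String> streamMatches(Pattern pattern, CharSequence input) {
        final Matcher m = pattern.matcher(input);
        Spliterator<String> matches = new Spliterators.AbstractSpliterator<String>(
                Long.MAX_VALUE, Spliterator.ORDERED | Spliterator.NONNULL) {
            @Override
            public boolean tryAdvance(Consumer<? super String> action) {
                if (!m.find()) {
                    return false;          // no more matches: end of stream
                }
                action.accept(m.group());  // hand over exactly one match
                return true;
            }
        };
        return StreamSupport.stream(matches, false); // sequential: Matcher is not thread-safe
    }

    public static void main(String[] args) {
        Pattern words = Pattern.compile("\\p{L}+");
        streamMatches(words, "short words and considerably longer ones")
                .filter(w -> w.length() > 5)
                .forEach(System.out::println);
        // prints:
        // considerably
        // longer
    }
}
```

On Java 9 and later, Matcher.results() gives you a similar stream of MatchResult objects out of the box.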
The full API — all null-safe:
| Method | Returns | Description |
|---|---|---|
| containsMatchIn(input) | boolean | Is there at least one match? |
| matchesEntire(input) | boolean | Does the entire string match? |
| extractFirst(input) | Optional<String> | First match, or empty |
| extractAll(input) | List<String> | All matches |
| extractGroups(input) | Map<String, String> | Named groups from first match |
| extractAllGroups(input) | List<Map<String, String>> | Named groups from all matches |
| replaceFirst(input, replacement) | String | Replace first match |
| replaceAll(input, replacement) | String | Replace all matches |
| splitBy(input) | List<String> | Split around matches |
| streamMatches(input) | Stream<String> | Lazy stream of all matches |
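The difference between the first two rows trips people up most often, so here is how I would expect those descriptions to map onto the JDK: "at least one match" is Matcher.find(), "entire string" is Matcher.matches(). This is a sketch under that assumption, with hypothetical helper names that mirror the table; it is not Sift's code and it skips the null handling.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

class MatchSemantics {
    static final Pattern DIGITS = Pattern.compile("[0-9]+");

    /** True if the pattern occurs anywhere in the input. */
    static boolean containsMatchIn(String input) {
        return DIGITS.matcher(input).find();
    }

    /** True only if the whole input, start to end, is one match. */
    static boolean matchesEntire(String input) {
        return DIGITS.matcher(input).matches();
    }

    /** Splits the input around every match. */
    static List<String> splitBy(String input) {
        return Arrays.asList(DIGITS.split(input));
    }

    public static void main(String[] args) {
        System.out.println(containsMatchIn("abc123")); // true
        System.out.println(matchesEntire("abc123"));   // false
        System.out.println(matchesEntire("123"));      // true
        System.out.println(splitBy("a1b22c"));         // [a, b, c]
    }
}
```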
ReDoS mitigation built in
The withoutBacktracking() you saw earlier generates a possessive quantifier, such as the {3,15}+ in the first example (\w++ is the same idea applied to \w+). Two more tools change how the engine backtracks:
// Atomic group — locks a sub-pattern once matched
SiftPattern<Fragment> safe = Sift.fromAnywhere()
.oneOrMore().digits()
.preventBacktracking(); // wraps in (?>...)
// Lazy quantifier: matches as few characters as possible (this changes what is matched; on its own it does not rule out backtracking)
Sift.fromAnywhere()
.oneOrMore().anyCharacter().asFewAsPossible(); // generates .+?
Secure patterns become the path of least resistance — you don't have to remember whether it's *+ or *?.
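The three quantifier styles are easy to mix up, so here is a JDK-only demonstration of how they behave. The patterns are hand-written and deliberately tiny (the class name is mine); they show standard java.util.regex behavior rather than anything Sift generates.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BacktrackingDemo {

    /** Returns the first match of regex in input, or null if there is none. */
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // Greedy: a+ grabs "aaa", then gives one 'a' back so the trailing 'a' can match.
        System.out.println(Pattern.matches("a+a", "aaa"));      // true

        // Possessive: a++ grabs "aaa" and never gives anything back.
        System.out.println(Pattern.matches("a++a", "aaa"));     // false

        // Atomic group: same effect, expressed as a group.
        System.out.println(Pattern.matches("(?>a+)a", "aaa"));  // false

        // Greedy vs lazy: the difference is how much is matched, not whether backtracking exists.
        System.out.println(firstMatch("<.+>", "<a><b>"));       // <a><b>
        System.out.println(firstMatch("<.+?>", "<a><b>"));      // <a>
    }
}
```

The possessive and atomic forms are the ones that take retry paths away from the engine, which is why they help against catastrophic backtracking. The lazy form only flips the order in which alternatives are tried.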
Jakarta Validation — no more duplicated regex
If you use Bean Validation, you've probably written the same @Pattern across multiple DTOs and then forgotten to sync them when the rule changed. Sift solves this with @SiftMatch:
// Define the rule once
public class PromoCodeRule implements SiftRegexProvider {
@Override
public String getRegex() {
return Sift.fromStart()
.atLeast(4).letters()
.then()
.exactly(3).digits()
.andNothingElse()
.shake();
}
}
// Reuse it everywhere — compiled once at bootstrap, zero overhead per request
public record ApplyPromoRequest(
@SiftMatch(
value = PromoCodeRule.class,
flags = { SiftMatchFlag.CASE_INSENSITIVE },
message = "Invalid promo code format"
)
String promoCode
) {}
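To sanity-check a rule like this outside the validation framework, you can approximate it with the plain JDK. The regex below is my guess at what "at least 4 letters, then exactly 3 digits" looks like in raw form; the string Sift actually emits may differ, and PromoCodeCheck is a hypothetical helper. The pattern is compiled once into a static field, which is the same idea as compiling at bootstrap.

```java
import java.util.regex.Pattern;

class PromoCodeCheck {
    // Assumption: hand-written stand-in for the rule, not Sift's actual output.
    // Compiled once when the class loads, then reused for every call.
    private static final Pattern PROMO =
            Pattern.compile("^[a-zA-Z]{4,}[0-9]{3}$", Pattern.CASE_INSENSITIVE);

    /** Null is rejected here; a real Bean Validation constraint may treat null differently. */
    static boolean isValid(String code) {
        return code != null && PROMO.matcher(code).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValid("SAVE123"));   // true
        System.out.println(isValid("save123"));   // true
        System.out.println(isValid("abc123"));    // false: only three letters
        System.out.println(isValid("save2024"));  // false: four digits
    }
}
```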
Ready-made patterns — SiftCatalog
For common formats, SiftCatalog provides production-ready, ReDoS-safe patterns. All are Fragment-typed — they compose cleanly with your own chains.
// Standalone validation
boolean valid = SiftCatalog.email().matchesEntire("user@example.com");
// Embedded in a larger pattern
String regex = Sift.fromStart()
.of(SiftCatalog.uuid())
.followedBy('/')
.then().of(SiftCatalog.isoDate())
.andNothingElse()
.shake();
Available: uuid(), ipv4(), macAddress(), email(), webUrl(), isoDate().
Getting started
Gradle:
implementation 'com.mirkoddd:sift-core:<latest>'
// Optional: Jakarta Validation integration
implementation 'com.mirkoddd:sift-annotations:<latest>'
Maven:
<dependency>
<groupId>com.mirkoddd</groupId>
<artifactId>sift-core</artifactId>
<version>latest</version> <!-- placeholder: substitute the current release number -->
</dependency>
Java 8 bytecode. Zero runtime dependencies. Tested on JVM 8, 11, 17, and 21.
👉 GitHub — mirkoddd/Sift
📖 Sift Cookbook — real-world recipes: UUID validation, TSV log parsing, lookarounds, conditional patterns, nested structures, and more.
The compiler is the best test suite you have. Sift puts it to work on your regex too.