DEV Community

Thomas Couderc
I built a Kotlin DSL because I couldn't read my own regex anymore

The git blame moment

You know that moment.

You open a file. You see a regex. Eighty characters of pure line noise.
You mutter "who wrote this garbage?" and run git blame.

It was you. Three months ago.

[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}

That's the readable kind. Multiply by five on real legacy code, then sprinkle in a few nested groups, a couple of lookaheads, and a comment that says // don't touch this.

This article is about kexpresso, a small Kotlin library I built to stop hating my past self. It's open source, published on Maven Central, and built for Kotlin Multiplatform.

The reveal

The same pattern, anchored to the whole input, written with kexpresso:

val email = kexpresso {
    startOfText()
    email()
    endOfText()
}

The DSL reads top to bottom like English. The compiler catches typos that strings never would. And it compiles to a standard kotlin.text.Regex at construction time, so matching runs on the same engine as a hand-written Regex: I measured 0% overhead at match time.
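To make the "compiles to a Regex at construction time" point concrete, here is a minimal sketch of how a builder-style DSL can assemble a pattern string once and hand back a plain kotlin.text.Regex. PatternBuilder, pattern, literal, and digits are hypothetical names for illustration; this is not kexpresso's implementation.

```kotlin
// Hypothetical mini-builder for illustration only (not kexpresso's code).
class PatternBuilder {
    private val parts = StringBuilder()

    fun startOfText() { parts.append('^') }
    fun endOfText() { parts.append('$') }

    // Escape the text so regex metacharacters are matched literally.
    fun literal(text: String) { parts.append(Regex.escape(text)) }

    // Exactly `count` decimal digits.
    fun digits(count: Int) { parts.append("\\d{").append(count).append("}") }

    // The pattern string is compiled exactly once, here.
    fun build(): Regex = Regex(parts.toString())
}

// Lambda-with-receiver entry point: calls inside the block resolve on the builder.
fun pattern(block: PatternBuilder.() -> Unit): Regex =
    PatternBuilder().apply(block).build()

val code: Regex = pattern {
    startOfText()
    digits(3)
    literal("-")
    digits(2)
    endOfText()
}
```

Because build() returns an ordinary Regex, everything after construction (matches, find, replace) is the standard library doing its usual work, which is why a DSL of this shape need not add cost at match time.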

But the DSL is just the entry point. The features that earned the library its place in my own projects are the ones that come after you have a pattern.

describe() — your past self's apology

val pattern = Kexpresso.from("[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\\.[a-zA-Z]{2,}")
println(pattern.describe())
// → "one or more characters from [a-zA-Z0-9._%+-], then '@', then one or
//    more characters from [a-zA-Z0-9.-], then '.', then 2 or more letters"

Kexpresso.from(regex) reverse-engineers a raw regex into the DSL by parsing it into the same AST the builder produces. describe() walks that AST and returns a plain explanation.
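The "walk the AST and return a plain explanation" idea can be sketched in a few lines. The node types below (Literal, CharClass, Repeat, SeqNode) and the wording of the output are assumptions made for this example; kexpresso's real AST and phrasing may differ.

```kotlin
// Hypothetical AST for illustration; not kexpresso's actual node types.
sealed interface Node
data class Literal(val text: String) : Node
data class CharClass(val spec: String) : Node
data class Repeat(val inner: Node, val min: Int, val max: Int?) : Node // max == null means unbounded
data class SeqNode(val items: List<Node>) : Node

// Recursively turn each node into a phrase and join sequence items with "then".
fun describe(node: Node): String = when (node) {
    is Literal -> "'${node.text}'"
    is CharClass -> "characters from [${node.spec}]"
    is Repeat -> {
        val count = when {
            node.max == null -> "${node.min} or more"
            node.max == node.min -> "exactly ${node.min}"
            else -> "${node.min} to ${node.max}"
        }
        "$count of (${describe(node.inner)})"
    }
    is SeqNode -> node.items.joinToString(", then ") { describe(it) }
}
```

The exhaustive when over a sealed hierarchy is the useful design point here: adding a new node type becomes a compile error until the describer handles it.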

Drop it into your tooling, your code reviews, your logs.

analyze() — catch ReDoS before prod does

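Regular expressions can be catastrophically slow. On a backtracking engine, a pattern like (a+)+b against "aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaac" can hang for seconds, sometimes minutes, because the number of ways to split the run of a's between the inner and outer repeat roughly doubles with every extra character. This is the ReDoS family of vulnerabilities, and it has taken down major sites (Cloudflare in 2019, Stack Overflow in 2016, plenty of others).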

val pattern = kexpresso {
    capture { oneOrMore { char('a') } }.oneOrMore()
}
println(pattern.analyze())
// → vulnerable: nested quantifier `(a+)+` exhibits exponential backtracking on input "a…ac"

The analyzer walks the AST looking for known dangerous shapes: nested quantifiers, ambiguous alternations, overlapping repeats. It runs in milliseconds and you can wire it into your tests or a pre-commit hook.
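As a rough illustration of one such check, the sketch below flags an unbounded repeat nested inside another unbounded repeat. The node types (Lit, Group, Many) and the function name are hypothetical, and the heuristic is deliberately conservative: it reports shapes like (a+b)+ that are not actually exponential, and it says nothing about alternations or overlapping repeats.

```kotlin
// Hypothetical AST for illustration; not kexpresso's actual node types.
sealed interface Rx
data class Lit(val text: String) : Rx
data class Group(val items: List<Rx>) : Rx
data class Many(val inner: Rx) : Rx // an unbounded repeat such as + or *

// True when an unbounded repeat appears anywhere inside another unbounded repeat.
fun hasNestedUnboundedRepeat(node: Rx, insideRepeat: Boolean = false): Boolean = when (node) {
    is Lit -> false
    is Group -> node.items.any { hasNestedUnboundedRepeat(it, insideRepeat) }
    is Many -> insideRepeat || hasNestedUnboundedRepeat(node.inner, insideRepeat = true)
}

// (a+)+b, the shape discussed above.
val risky: Rx = Group(listOf(Many(Group(listOf(Many(Lit("a"))))), Lit("b")))

// a+b, a single repeat with no nesting.
val safe: Rx = Group(listOf(Many(Lit("a")), Lit("b")))
```

A check like this is a pure tree walk with no matching involved, so asserting it in a unit test for every pattern in a codebase costs almost nothing.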

examples(n) — actual test data, not your tired imagination

Testing a regex by hand is the worst. You always pick the same five inputs.

val pattern = kexpresso {
    digit().exactly(3)
    char('-')
    digit().exactly(2)
}
println(pattern.examples(5, seed = 42))
// → ["847-19", "302-55", "613-88", "104-72", "975-33"]

examples(count, seed) walks the AST and generates strings that satisfy matches(). Output is deterministic per seed, so your tests stay reproducible. Matches are guaranteed for Sequence, Literal, primitive tokens, quantifiers, groups, and alternations. Lookarounds, backreferences, and Raw nodes are best-effort: you still get strings, just without a match guarantee.
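Seeded generation from an AST can be sketched as follows. The node types (Text, Digits, Concat, OneOf) and this standalone examples function are assumptions for illustration, not kexpresso's API; the only real library call is kotlin.random.Random(seed), which yields the same sequence for the same seed.

```kotlin
import kotlin.random.Random

// Hypothetical AST for illustration; not kexpresso's actual node types.
sealed interface Gen
data class Text(val value: String) : Gen
data class Digits(val count: Int) : Gen
data class Concat(val parts: List<Gen>) : Gen
data class OneOf(val options: List<Gen>) : Gen

// Produce one string that the node would match, drawing every choice from rng.
fun generate(node: Gen, rng: Random): String = when (node) {
    is Text -> node.value
    is Digits -> (1..node.count).joinToString("") { rng.nextInt(10).toString() }
    is Concat -> node.parts.joinToString("") { generate(it, rng) }
    is OneOf -> generate(node.options[rng.nextInt(node.options.size)], rng)
}

// A fresh Random per call means the same seed always yields the same list.
fun examples(node: Gen, count: Int, seed: Int): List<String> {
    val rng = Random(seed)
    return List(count) { generate(node, rng) }
}

// Three digits, a dash, two digits.
val shape: Gen = Concat(listOf(Digits(3), Text("-"), Digits(2)))
```

Each node type here only ever emits text it would itself accept, which is what makes a match guarantee possible for these shapes. Lookarounds and backreferences constrain text outside their own position, so a generator of this form cannot promise the same for them.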

Domain helpers, because nobody enjoys writing IPv6

val webhook = kexpresso {
    startOfText()
    url()
    endOfText()
}

val log = kexpresso {
    ipv4()
    whitespace()
    capture("status") { digit().exactly(3) }
}

Sixteen helpers shipped: email(), url(), ipv4(), ipv6(), uuid(), macAddress(), base64(), jwt(), iso8601Date(), and friends. All composable inside the DSL.
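For comparison, here is a rough hand-written equivalent of the log pattern using only kotlin.text.Regex. It assumes ipv4() expands to a simple dotted quad and whitespace() to a single \s; kexpresso's actual expansion may be stricter (for example, range-checking each octet). The names logLine and statusOf are made up for this example.

```kotlin
// Hand-written approximation of: ipv4(), whitespace(), capture("status") { digit().exactly(3) }
// Group 1 is the address; group 2 is the named "status" group.
val logLine = Regex("""(\d{1,3}(?:\.\d{1,3}){3})\s(?<status>\d{3})""")

// Returns the status code from the first match, or null when the line does not match.
// The group is read by index so the sketch does not depend on named-group lookup support.
fun statusOf(line: String): Int? =
    logLine.find(line)?.groupValues?.get(2)?.toInt()
```

Even at this size the raw form needs a comment to say which group is which, and that is the bookkeeping the helpers and capture("status") take over.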

Kotlin Multiplatform: write once, regex everywhere

The full DSL lives in commonMain. Published targets:

Target                          | Status
JVM                             | published
JS (IR, Node.js)                | published
Wasm (wasmJs, Node.js)          | published
Native (Linux, Windows)         | published
Native (macOS, iOS)             | published

The portable test suite passes identically on every target. JVM-only regex constructs (\A, \z, atomic groups, possessive quantifiers) remain JVM-only and stay tested in jvmTest.

Install

// Gradle Kotlin DSL
dependencies {
    implementation("io.github.elzinko:kexpresso:0.8.0")
}

It's on Maven Central — no token, no extra repository configuration.
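For a Multiplatform build, the dependency would most likely go into the commonMain source set instead. The fragment below is a sketch: it assumes the coordinates above resolve for every target you declare, and that your Kotlin Gradle plugin is recent enough to support the commonMain.dependencies shorthand (1.9.20 or later).

```kotlin
// build.gradle.kts (fragment): assumes the Kotlin Multiplatform plugin is already applied
kotlin {
    jvm() // declare whichever published targets you actually need

    sourceSets {
        commonMain.dependencies {
            // Same coordinates as the single-platform snippet above.
            implementation("io.github.elzinko:kexpresso:0.8.0")
        }
    }
}
```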

The honest 0.x disclaimer

This is 0.8. I'm aiming for 1.0 once the API has soaked with external users. If you try it and the API rubs you the wrong way, please open an issue — that's exactly the signal I need before committing to SemVer stability.

The roadmap, the contribution guide, and the publishing setup all live under docs/. Pull requests welcome.

Links


If kexpresso saves you time, a ⭐ on the repo means the world. ☕
