DEV Community

Akash Singh

I built a regex tester that tells you what your pattern actually does

TL;DR: Regex is powerful and unreadable. I built a tool that takes any regex pattern and breaks it into plain English, piece by piece. Below is how the explanation engine works, plus the three patterns every dev should actually understand.


I wrote this regex last year:

^(?=.*[A-Z])(?=.*[0-9])(?=.*[!@#$%^&*])[A-Za-z0-9!@#$%^&*]{8,}$

Four months later I came back to that codebase. Read the pattern. Had absolutely no memory of what it does or why I wrote it that way.

And I wrote it.

This is the universal regex experience. You understand it for about 20 minutes after writing it, then it becomes ancient hieroglyphics permanently.

So I built a regex tester that doesn't just tell you "match" or "no match" — it explains every piece of the pattern in actual human words. You paste a regex, it tells you what each part does.

Here's how.

How the explanation engine works

Regex syntax boils down to maybe 15-20 token types: anchors, character classes, quantifiers, groups, lookaheads, and so on. The engine walks the pattern character by character and classifies each segment.

The core idea is a tokenizer that knows regex grammar:

function tokenize(pattern) {
  const tokens = [];
  let i = 0;

  while (i < pattern.length) {
    if (pattern[i] === "^") {
      tokens.push({ type: "anchor", value: "^", explain: "Start of string" });
      i++;
    } else if (pattern[i] === "$") {
      tokens.push({ type: "anchor", value: "$", explain: "End of string" });
      i++;
    } else if (pattern[i] === "[") {
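      // indexOf returns -1 for an unterminated class; fall back to the end of
      // the pattern so `i` always moves forward. An escaped "\]" is not handled here.
      const close = pattern.indexOf("]", i);
      const end = close === -1 ? pattern.length - 1 : close;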
      const charClass = pattern.slice(i, end + 1);
      tokens.push({
        type: "charClass",
        value: charClass,
        explain: describeCharClass(charClass),
      });
      i = end + 1;
    } else if (
      pattern[i] === "(" &&
      pattern[i + 1] === "?" &&
      pattern[i + 2] === "="
    ) {
      const end = findGroupEnd(pattern, i);
      const lookahead = pattern.slice(i, end + 1);
      tokens.push({
        type: "lookahead",
        value: lookahead,
        explain: `Must contain: ${describeLookahead(lookahead)}`,
      });
      i = end + 1;
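    } else {
      // ... quantifiers, groups, escaped chars, etc.
      // Fallback: treat anything unrecognized as a literal so the loop advances.
      tokens.push({
        type: "literal",
        value: pattern[i],
        explain: `Literal "${pattern[i]}" character`,
      });
      i++;
    }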
  }
  return tokens;
}
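The tokenizer calls findGroupEnd without showing it. Here is a minimal sketch of what a helper like that could look like. This is not the tool's actual code: the return convention (index of the closing parenthesis, or -1) and the choice to skip escaped characters and character classes are assumptions.

```javascript
// Hypothetical helper: returns the index of the ")" that closes the group
// opening at `start`, or -1 if the group is never closed.
function findGroupEnd(pattern, start) {
  let depth = 0;
  let inClass = false; // inside [...], parentheses are plain literals

  for (let i = start; i < pattern.length; i++) {
    const ch = pattern[i];
    if (ch === "\\") {
      i++; // skip the escaped character, e.g. \( or \]
    } else if (inClass) {
      if (ch === "]") inClass = false;
    } else if (ch === "[") {
      inClass = true;
    } else if (ch === "(") {
      depth++;
    } else if (ch === ")") {
      depth--;
      if (depth === 0) return i;
    }
  }
  return -1;
}

const pattern = "^(?=.*[A-Z])(?=.*[0-9])[A-Za-z0-9]{8,}$";
const end = findGroupEnd(pattern, 1);
console.log(pattern.slice(1, end + 1)); // (?=.*[A-Z])
```

Counting depth instead of searching for the first ")" is what keeps nested groups and escaped parentheses from ending the lookahead early.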

The describeCharClass function is where it gets interesting. [A-Z] becomes "any uppercase letter." [0-9] becomes "any digit." [A-Za-z0-9] becomes "any letter or digit." It knows common ranges and translates them.
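To make that concrete, here is a rough sketch of how a describeCharClass could be written. The KNOWN_RANGES table and the wording for negated classes are assumptions for illustration, and this version does not understand escapes such as \d or \s inside a class.

```javascript
// Hypothetical lookup of well-known ranges; the real tool may know more.
const KNOWN_RANGES = {
  "A-Z": "uppercase letter",
  "a-z": "lowercase letter",
  "0-9": "digit",
};

function describeCharClass(charClass) {
  let body = charClass.slice(1, -1); // drop the surrounding [ ]
  const negated = body.startsWith("^");
  if (negated) body = body.slice(1);

  const parts = [];
  let i = 0;
  while (i < body.length) {
    const range = body.slice(i, i + 3);
    if (KNOWN_RANGES[range]) {
      parts.push(KNOWN_RANGES[range]);
      i += 3;
    } else {
      parts.push(`"${body[i]}"`); // anything else is listed as a literal
      i++;
    }
  }

  // Collapse the two letter ranges into the shorter "letter".
  const up = parts.indexOf("uppercase letter");
  const low = parts.indexOf("lowercase letter");
  if (up !== -1 && low !== -1) {
    parts.splice(Math.max(up, low), 1);
    parts[Math.min(up, low)] = "letter";
  }

  const list = parts.join(" or ");
  return negated ? `any character except ${list}` : `any ${list}`;
}

console.log(describeCharClass("[A-Z]"));       // any uppercase letter
console.log(describeCharClass("[0-9]"));       // any digit
console.log(describeCharClass("[A-Za-z0-9]")); // any letter or digit
```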

The quantifiers are simpler: {8,} becomes "8 or more times," * becomes "zero or more," + becomes "one or more."
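A sketch of that mapping, assuming a hypothetical describeQuantifier helper that receives the quantifier text on its own. The phrases for ?, {n} and {n,m} are my guesses at reasonable wording, not output copied from the tool.

```javascript
// Hypothetical helper: turns a quantifier token into a short phrase.
function describeQuantifier(q) {
  if (q === "*") return "zero or more";
  if (q === "+") return "one or more";
  if (q === "?") return "optional";

  // {n}, {n,} and {n,m}
  const m = /^\{(\d+)(,(\d*))?\}$/.exec(q);
  if (!m) return null; // not a quantifier
  const [, min, hasComma, max] = m;
  if (!hasComma) return `exactly ${min} times`;
  if (max === "") return `${min} or more times`;
  return `between ${min} and ${max} times`;
}

console.log(describeQuantifier("{8,}")); // 8 or more times
console.log(describeQuantifier("*"));    // zero or more
console.log(describeQuantifier("+"));    // one or more
```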

Chain the descriptions together and you get human-readable output.
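One way the chaining step might look, as a sketch. The token list is written by hand for illustration, and the "quantifier" type plus the rule of folding a quantifier into the token before it are assumptions, not confirmed details of the tool.

```javascript
// Token objects shaped like the ones tokenize() pushes (hand-written here).
const tokens = [
  { type: "anchor", value: "^", explain: "Start of string" },
  { type: "charClass", value: "[A-Z]", explain: "any uppercase letter" },
  { type: "quantifier", value: "{8,}", explain: "8 or more times" },
  { type: "anchor", value: "$", explain: "End of string" },
];

function renderExplanation(tokens) {
  const lines = [];
  for (const t of tokens) {
    if (t.type === "quantifier" && lines.length > 0) {
      // Fold a quantifier into the token it applies to.
      const prev = lines[lines.length - 1];
      prev.value += t.value;
      prev.explain += `, ${t.explain}`;
    } else {
      lines.push({ value: t.value, explain: t.explain });
    }
  }
  return lines.map((l) => `${l.value} → ${l.explain}`).join("\n");
}

console.log(renderExplanation(tokens));
// ^ → Start of string
// [A-Z]{8,} → any uppercase letter, 8 or more times
// $ → End of string
```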

Let's break down 3 common patterns

Pattern 1: Email (simplified)

^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$

The explanation engine outputs:

  • ^ → Start of string
  • [a-zA-Z0-9._%+-]+ → One or more: letter, digit, dot, underscore, percent, plus, or hyphen
  • @ → Literal "@" character
  • [a-zA-Z0-9.-]+ → One or more: letter, digit, dot, or hyphen
  • \. → Literal dot
  • [a-zA-Z]{2,} → Two or more letters
  • $ → End of string

In plain English: "A string that starts with alphanumeric/special chars, followed by @, followed by a domain name, followed by a dot and at least two letters."

This won't match every valid email, and it accepts a few invalid ones: the actual email RFC is a nightmare that would need a regex the size of a paragraph. But for form validation? It handles the vast majority of real-world inputs.
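A quick way to see where the simplified pattern draws the line is to run it against a handful of inputs. The sample addresses below are made up for illustration.

```javascript
const EMAIL = /^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$/;

const samples = [
  "dev@example.com",                 // true: plain address
  "first.last+tag@mail.example.org", // true: dots, plus tag, subdomain
  "no-at-sign.example.com",          // false: missing @
  "user@localhost",                  // false: no dot followed by a TLD
  "user@example.c",                  // false: TLD shorter than two letters
  "a..b@example.com",                // true, even though consecutive dots
                                     // are not valid in an unquoted local part
];

for (const s of samples) {
  console.log(EMAIL.test(s), s);
}
```

The last sample shows the trade-off: the character class only checks which characters appear, not how they are arranged.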

Pattern 2: Phone number (US)

^\(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}$

Explanation engine output:

  • ^ → Start of string
  • \(? → Optional opening parenthesis
  • \d{3} → Exactly 3 digits
  • \)? → Optional closing parenthesis
  • [-.\s]? → Optional separator (dash, dot, or space)
  • \d{3} → Exactly 3 digits
  • [-.\s]? → Optional separator
  • \d{4} → Exactly 4 digits
  • $ → End of string

Matches: (555) 123-4567, 555.123.4567, 555-123-4567, 5551234567

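The key insight: ? after each separator and parenthesis makes them optional. That one character handles four different phone formats. One caveat: the two parentheses are optional independently of each other, so unbalanced input like (555 123-4567 also matches.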
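Here is the pattern run against the four formats listed above, plus a couple of inputs with only one parenthesis. STRICT_PHONE is my own variant that uses alternation to require balanced parentheses. It is a suggestion, not part of the original pattern.

```javascript
const PHONE = /^\(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}$/;

// Assumption: a stricter variant where the area code is either fully
// parenthesized or not parenthesized at all.
const STRICT_PHONE = /^(?:\(\d{3}\)|\d{3})[-.\s]?\d{3}[-.\s]?\d{4}$/;

const samples = [
  "(555) 123-4567",
  "555.123.4567",
  "555-123-4567",
  "5551234567",
  "(555 123-4567", // unbalanced: PHONE accepts, STRICT_PHONE rejects
  "555) 123-4567", // unbalanced: PHONE accepts, STRICT_PHONE rejects
  "555-1234",      // too few digits: both reject
];

for (const s of samples) {
  console.log(s, PHONE.test(s), STRICT_PHONE.test(s));
}
```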

Pattern 3: URL

https?:\/\/[\w.-]+\.[a-zA-Z]{2,}(\/[\w./-]*)*

Explanation engine output:

  • https? → "http" followed by optional "s"
  • :\/\/ → Literal "://"
  • [\w.-]+ → One or more: word character, dot, or hyphen (the domain)
  • \. → Literal dot
  • [a-zA-Z]{2,} → Two or more letters (TLD)
  • (\/[\w./-]*)* → Zero or more path segments

The s? at the start is doing the heavy lifting — it matches both http:// and https:// without needing to write the whole thing twice.
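Because this pattern has no ^ or $ anchors, it is better suited to pulling URLs out of longer text than to validating a whole string. A short sketch with made-up inputs (the name URL_RE is mine):

```javascript
const URL_RE = /https?:\/\/[\w.-]+\.[a-zA-Z]{2,}(\/[\w./-]*)*/;

// Returns the matched URL text, or null when nothing matches.
function firstUrl(text) {
  const m = text.match(URL_RE);
  return m ? m[0] : null;
}

console.log(firstUrl("See https://example.com/docs/page.html for details"));
// https://example.com/docs/page.html

console.log(firstUrl("http://example.com/search?q=regex"));
// http://example.com/search   (the match stops at "?")

console.log(firstUrl("ftp://example.com")); // null: only http and https
console.log(firstUrl("https://localhost")); // null: no dot plus TLD
```

The second result is worth noting: query strings are outside the path character class, so the match ends before them.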

The part that surprised me: cron patterns

While building the regex explainer, I realized the same concept works for cron expressions. 0 9 * * 1-5 is equally unreadable for most people.

So I built a cron expression builder that works the same way — paste a cron string, get a human explanation back. "At 9:00 AM, Monday through Friday."

nexuslabs.website/tools/cron-builder — same idea, different syntax.
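To show how the same idea carries over, here is a deliberately narrow sketch of a cron explainer. It is not the cron builder's code: explainCron is a hypothetical function that only understands a fixed minute and hour, * for day-of-month and month, and a day-of-week that is *, a single day (0-6), or a range.

```javascript
const DAYS = ["Sunday", "Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday"];

// Hypothetical and limited: supports "M H * * DOW" only.
function explainCron(expr) {
  const fields = expr.trim().split(/\s+/);
  if (fields.length !== 5) throw new Error("expected 5 fields");
  const [minute, hour, dayOfMonth, month, dayOfWeek] = fields;
  if (!/^\d+$/.test(minute) || !/^\d+$/.test(hour) || dayOfMonth !== "*" || month !== "*") {
    throw new Error("unsupported expression in this sketch");
  }

  // Convert the 24-hour field to a 12-hour clock.
  const h = Number(hour);
  const suffix = h < 12 ? "AM" : "PM";
  const h12 = h % 12 === 0 ? 12 : h % 12;
  const time = `${h12}:${minute.padStart(2, "0")} ${suffix}`;

  let days;
  if (dayOfWeek === "*") {
    days = "every day";
  } else if (/^[0-6]$/.test(dayOfWeek)) {
    days = `every ${DAYS[Number(dayOfWeek)]}`;
  } else {
    const m = /^([0-6])-([0-6])$/.exec(dayOfWeek);
    if (!m) throw new Error("unsupported day-of-week field in this sketch");
    days = `${DAYS[Number(m[1])]} through ${DAYS[Number(m[2])]}`;
  }
  return `At ${time}, ${days}`;
}

console.log(explainCron("0 9 * * 1-5")); // At 9:00 AM, Monday through Friday
```

Each field is classified and translated on its own, then the phrases are joined, which is the same tokenize-then-describe shape as the regex engine.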

Why regex explanations matter more than regex skills

I used to think the goal was to get good enough at regex that I could read patterns fluently. After years of writing them, my honest conclusion: nobody reads regex fluently. People who claim they do are either lying or writing only basic patterns.

The practical skill isn't writing complex regex from memory. It's being able to break down any pattern you encounter — in legacy code, in Stack Overflow answers, in config files — and understand what it's trying to match.

That's the gap the tool fills. Not "learn regex" but "understand THIS regex right now."

Try it: nexuslabs.website/tools/regex-tester

Paste any pattern. It breaks it down piece by piece. Runs in your browser — no server, no signup.


What regex pattern took you the longest to debug? I once spent 45 minutes on a pattern that didn't work because of a single backslash. The kind of debugging where the fix is one character but finding it takes your entire afternoon.
