Michael Lip

Posted on • Originally published at zovo.one

Stop Writing Regex by Hand: Generate Patterns From Examples

Regular expressions are a write-only language. The joke persists because it's true. A regex that took 20 minutes to write takes 40 minutes to understand six months later. The cognitive overhead of regex syntax (backtracking behavior, greedy vs. lazy quantifiers, lookaheads, character class shorthand) creates a gap between "I know what pattern I want" and "I can express it correctly."

The problem with manual regex

Consider matching a US phone number. The "simple" pattern is \d{3}-\d{3}-\d{4}. But real phone numbers come in formats like (555) 123-4567, 555.123.4567, 555 123 4567, +1-555-123-4567, and 5551234567.

A regex that handles all formats:

^(\+1[-.\s]?)?\(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}$

This took me several iterations to write: handling each optional separator, the optional country code, and the optional parentheses around the area code. And I still haven't accounted for extensions, international formats, or the fact that area codes starting with 0 or 1 are invalid in the North American Numbering Plan, so the pattern accepts them anyway.
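A small harness makes it easy to check the pattern against the formats listed above and to see where it is too permissive. The sketch below uses Python's re module; the names PHONE and matches and the sample lists are illustrative choices, and the two false positives (an area code starting with 0, and an unbalanced parenthesis) are examples I added to show the limits.

```python
import re

# The pattern from the article, unchanged.
PHONE = re.compile(r"^(\+1[-.\s]?)?\(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}$")

# Formats listed in the article; all of these should match.
should_match = [
    "555-123-4567",
    "(555) 123-4567",
    "555.123.4567",
    "555 123 4567",
    "+1-555-123-4567",
    "5551234567",
]

# Inputs that are not valid phone numbers and are correctly rejected.
rejected = ["555-1234", "abc-def-ghij"]

# Inputs that are not valid either, but that the pattern accepts:
# an area code starting with 0, and an opening parenthesis never closed.
false_positives = ["023-456-7890", "(555-123-4567"]


def matches(text):
    """Return True if the whole string fits the phone pattern."""
    return PHONE.match(text) is not None


for sample in should_match + rejected + false_positives:
    print(f"{sample!r:20} -> {matches(sample)}")
```

The unbalanced parenthesis slips through because \(? and \)? are optional independently of each other; fixing that requires an alternation between the parenthesized and bare area code.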

The example-based approach

Instead of writing the pattern, provide examples of what should match and what shouldn't:

Match: "555-123-4567", "(555) 123-4567", "5551234567"
Don't match: "555-1234", "abc-def-ghij", "555-123-45678"

From these examples, a generator can infer the pattern. It identifies the common structure (three digits, separator, three digits, separator, four digits) and the optional elements (parentheses, varying separators).
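To make the idea concrete, here is a deliberately naive sketch of example-based inference in Python. It is not how any particular generator works: the function names (char_class, shape, infer_pattern) are hypothetical, and the strategy is simply to generalize each positive example into a "shape" (runs of digits, runs of letters, literal separators), join the distinct shapes with alternation, and use the negative examples as a sanity check.

```python
import re
from itertools import groupby


def char_class(ch):
    # Map one character to the regex token that generalizes it.
    if "0" <= ch <= "9":
        return r"\d"
    if ch.isascii() and ch.isalpha():
        return "[A-Za-z]"
    return re.escape(ch)  # separators and punctuation stay literal


def shape(example):
    # Collapse each run of identical tokens into token{n}.
    parts = []
    for token, run in groupby(example, key=char_class):
        n = len(list(run))
        parts.append(token if n == 1 else f"{token}{{{n}}}")
    return "".join(parts)


def infer_pattern(positives, negatives=()):
    # One alternative per distinct shape seen in the positive examples.
    shapes = sorted({shape(p) for p in positives})
    pattern = "^(?:" + "|".join(shapes) + ")$"
    compiled = re.compile(pattern)
    # Negative examples guard against a generalization that is too loose.
    leaked = [n for n in negatives if compiled.match(n)]
    if leaked:
        raise ValueError(f"pattern too permissive, also matches: {leaked}")
    return pattern


pattern = infer_pattern(
    ["555-123-4567", "(555) 123-4567", "5551234567"],
    negatives=["555-1234", "abc-def-ghij", "555-123-45678"],
)
print(pattern)
```

This toy version only produces one alternative per shape. A real generator would go further and merge the alternatives into a shared structure with optional parts, which is where the optional parentheses and varying separators described above come from.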

This approach is more accessible to people who aren't regex experts, and it often produces cleaner patterns because the generator can optimize for simplicity.

Common patterns everyone needs

These come up in almost every web application:

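Email: The often-cited standards-compliant regex (written for RFC 822, the predecessor of RFC 5322) is 6,000+ characters long. No one uses it. The practical pattern /^[^\s@]+@[^\s@]+\.[^\s@]+$/ accepts nearly all real-world addresses and rejects the obvious mistakes: whitespace, a missing or repeated @, or a domain with no dot.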

URL: https?:\/\/(www\.)?[-a-zA-Z0-9@:%._\+~#=]{1,256}\.[a-zA-Z0-9()]{1,6}\b([-a-zA-Z0-9()@:%_\+.~#?&//=]*) covers most HTTP/HTTPS URLs.

IPv4: ^((25[0-5]|(2[0-4]|1\d|[1-9]|)\d)\.?\b){4}$ validates the range 0-255 for each octet.

Date (MM/DD/YYYY): ^(0[1-9]|1[0-2])\/(0[1-9]|[12]\d|3[01])\/\d{4}$ accepts only zero-padded dates and checks the month and day ranges independently, so it still accepts impossible dates such as 02/30.
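The patterns above can be exercised with a few lines of Python. This is a sketch using the standard re and datetime modules; the constant names and the is_real_date helper are hypothetical, and the helper shows one way to close the calendar gap in the date pattern by letting datetime.strptime do the day-of-month check.

```python
import re
from datetime import datetime

# Patterns copied from the list above.
EMAIL = re.compile(r"^[^\s@]+@[^\s@]+\.[^\s@]+$")
URL = re.compile(
    r"https?:\/\/(www\.)?[-a-zA-Z0-9@:%._\+~#=]{1,256}\.[a-zA-Z0-9()]{1,6}\b"
    r"([-a-zA-Z0-9()@:%_\+.~#?&//=]*)"
)
IPV4 = re.compile(r"^((25[0-5]|(2[0-4]|1\d|[1-9]|)\d)\.?\b){4}$")
DATE = re.compile(r"^(0[1-9]|1[0-2])\/(0[1-9]|[12]\d|3[01])\/\d{4}$")


def is_real_date(text):
    # The regex checks the shape; strptime checks the calendar.
    if not DATE.match(text):
        return False
    try:
        datetime.strptime(text, "%m/%d/%Y")
    except ValueError:
        return False
    return True


print(bool(EMAIL.match("user@example.com")))           # well-formed address
print(bool(EMAIL.match("user@example")))               # no dot in the domain
print(bool(URL.match("https://example.com/path?q=1")))
print(bool(IPV4.match("192.168.1.1")))
print(bool(IPV4.match("256.1.1.1")))                   # octet out of range
print(bool(DATE.match("02/30/2024")))                  # shape is fine
print(is_real_date("02/30/2024"))                      # calendar check fails
```

Pairing a loose regex with a real parser, as is_real_date does, is usually simpler than trying to encode month lengths and leap years in the pattern itself.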

Regex performance pitfalls

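Catastrophic backtracking is the most dangerous regex bug. The pattern (a+)+$ applied to the string "aaaaaaaaaaaaaaaaaaaaab" causes exponential backtracking in a backtracking engine. Each 'a' can either extend the current run of the inner a+ or start a new iteration of the outer (a+)+, and because the trailing 'b' makes every attempt fail at $, the engine tries all of those splits: on the order of 2^n match paths for n a's.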

A regex generator that understands these pitfalls can produce equivalent patterns that avoid catastrophic backtracking by using atomic groups or possessive quantifiers where supported.
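As a rough illustration in Python: atomic groups and possessive quantifiers are only available in Python's re module from version 3.11, so this sketch uses a simpler fix instead, removing the nested quantifier altogether. The names RISKY, SAFE, and time_search are hypothetical, and n is kept small on purpose so the risky pattern finishes quickly.

```python
import re
import time

RISKY = re.compile(r"(a+)+$")  # nested quantifier from the article
SAFE = re.compile(r"a+$")      # accepts the same strings, no nesting

# Both patterns agree on small inputs, matching and non-matching alike.
samples = ["a", "aaa", "ba", "baaa", "ab", "aab", "", "b"]
agree = all(bool(RISKY.search(s)) == bool(SAFE.search(s)) for s in samples)
print("patterns agree on samples:", agree)


def time_search(pattern, text):
    # Wall-clock time for a single search; good enough to see the trend.
    start = time.perf_counter()
    pattern.search(text)
    return time.perf_counter() - start


# Keep n small: on inputs ending in a non-matching character, the risky
# pattern's running time grows roughly exponentially with n.
for n in (10, 14, 18):
    text = "a" * n + "b"
    risky = time_search(RISKY, text)
    safe = time_search(SAFE, text)
    print(f"n={n:2d}  risky={risky:.4f}s  safe={safe:.6f}s")

# The rewritten pattern copes with a much longer adversarial input.
long_result = SAFE.search("a" * 2_000 + "b")
print("long input matched:", long_result is not None)
```

Where the nesting cannot be removed so easily, an atomic group such as (?>a+)+$ or a possessive quantifier is the usual alternative in engines that support them.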

I built a regex generator at zovo.one/free-tools/regex-generator that creates patterns from example strings, includes a library of common patterns, and tests your regex against sample data in real time. It bridges the gap between knowing what you want to match and expressing it in regex syntax.

I'm Michael Lip. I build free developer tools at zovo.one. 500+ tools, all private, all free.
