Every backend developer ends up needing the same regex patterns over and over: email validation, URL extraction, number parsing, ID format checks, log parsing. Most rewrite them from scratch each time and end up with subtle inconsistencies between projects. Keeping a personal snippets file of regex you have actually tested and used is one of those small productivity wins that compounds across years.
This is a curated list of ten regex patterns that show up in nearly every backend codebase, with the dialect notes and gotchas that matter. Use these as starting points; tune them to your specific inputs before shipping.
1. Pragmatic Email Validation
^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}$
Accepts the vast majority of real email addresses, rejects obviously malformed input, and stays simple enough to avoid catastrophic backtracking. A regex that tries to implement the full RFC 5322 address grammar runs to thousands of characters, and almost nobody needs one. The HTML specification's definition of a valid address for input type="email" takes a similarly pragmatic approach.
Trade-off: rejects RFC-valid quoted local parts ("foo bar"@example.com), which essentially no real users ever type. Worth it.
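A minimal Python sketch of applying this pattern; is_plausible_email is a hypothetical helper name. One Python-specific detail matters here: $ also matches just before a trailing newline, so re.match would accept "jane@example.com\n". re.fullmatch requires the entire string to match.

```python
import re

# Pattern 1, unchanged.
EMAIL_RE = re.compile(r"^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}$")


def is_plausible_email(value: str) -> bool:
    """Format check only: says nothing about whether the mailbox exists."""
    # fullmatch instead of match: in Python, `$` also matches right before a
    # trailing "\n", so EMAIL_RE.match("jane@example.com\n") would succeed.
    return EMAIL_RE.fullmatch(value) is not None


print(is_plausible_email("jane.doe+tag@example.co.uk"))  # True
print(is_plausible_email('"foo bar"@example.com'))       # False (quoted local part)
print(is_plausible_email("jane@example.com\n"))          # False
```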
2. URL Validation (HTTP/HTTPS Only)
^https?:\/\/([\w-]+\.)+[\w-]+(:\d+)?(\/[\w\-._~:\/?#[\]@!$&'()*+,;=%]*)?$
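Matches HTTP and HTTPS URLs with an optional port and path. The path character class includes the RFC 3986 unreserved and reserved characters that URLs are allowed to contain, plus % for percent-encoding. The host must contain at least one dot, so http://localhost:3000 is rejected, and a query string must follow a path: https://example.com/?q=1 matches, https://example.com?q=1 does not.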
Gotcha: this validates format only, not reachability. A URL that matches this pattern can still 404 or DNS-fail. For real production validation, use the language's URL parser (URL in JavaScript, urllib.parse in Python).
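One way to layer the two checks, sketched in Python: the regex acts as a cheap first filter and urllib.parse.urlsplit as the second. is_http_url is a hypothetical name. The URL with port 99999 passes the regex, because :\d+ accepts any run of digits, but urlsplit raises ValueError when its port attribute is read.

```python
import re
from urllib.parse import urlsplit

# Pattern 2, unchanged.
URL_RE = re.compile(
    r"^https?:\/\/([\w-]+\.)+[\w-]+(:\d+)?(\/[\w\-._~:\/?#[\]@!$&'()*+,;=%]*)?$"
)


def is_http_url(value: str) -> bool:
    # Step 1: cheap format check.
    if URL_RE.fullmatch(value) is None:
        return False
    # Step 2: let the standard-library parser catch what the regex cannot.
    try:
        parts = urlsplit(value)
        _ = parts.port  # raises ValueError if the port is outside 0-65535
    except ValueError:
        return False
    return parts.scheme in ("http", "https") and parts.hostname is not None


print(is_http_url("https://example.com/search?q=regex"))  # True
print(is_http_url("https://example.com:99999/"))          # False: port out of range
print(is_http_url("ftp://example.com/file"))              # False: not HTTP(S)
```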
3. UUID v4
^[0-9a-f]{8}-[0-9a-f]{4}-4[0-9a-f]{3}-[89ab][0-9a-f]{3}-[0-9a-f]{12}$
Validates the UUID v4 format specifically. The 4 at the start of the third group is the version digit, and the [89ab] at the start of the fourth group encodes the variant; both are fixed for v4 UUIDs. A generic 8-4-4-4-12 hex pattern would also accept v1, v3, and v5 UUIDs.
For case-insensitive matching of uppercase UUIDs, add the i flag or expand the character class to [0-9a-fA-F].
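A Python sketch using the i-flag equivalent, re.IGNORECASE; is_uuid_v4 is a hypothetical helper. The standard uuid module is used only to generate test values and to confirm that its version field agrees with the regex.

```python
import re
import uuid

# Pattern 3 with re.IGNORECASE so uppercase UUIDs also pass.
UUID_V4_RE = re.compile(
    r"^[0-9a-f]{8}-[0-9a-f]{4}-4[0-9a-f]{3}-[89ab][0-9a-f]{3}-[0-9a-f]{12}$",
    re.IGNORECASE,
)


def is_uuid_v4(value: str) -> bool:
    return UUID_V4_RE.fullmatch(value) is not None


v4 = str(uuid.uuid4())
v1 = str(uuid.uuid1())
print(is_uuid_v4(v4), is_uuid_v4(v4.upper()))  # True True
print(is_uuid_v4(v1))                          # False: version digit is 1
# The standard library reads the same version field:
print(uuid.UUID(v4).version, uuid.UUID(v1).version)  # 4 1
```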
4. ISO 8601 Date
^\d{4}-\d{2}-\d{2}(T\d{2}:\d{2}:\d{2}(\.\d+)?(Z|[+-]\d{2}:?\d{2})?)?$
Matches ISO 8601 dates and datetimes, including optional fractional seconds and timezone offsets. Both Z (UTC) and +05:00-style offsets are accepted.
Gotcha: this is format validation only. A pattern-valid date like 2026-13-32 is still nonsensical. After regex passes, parse with a real date library to catch semantic errors.
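A sketch of the two-step check in Python, assuming only the calendar date needs semantic validation. is_valid_iso_date is a hypothetical name; date.fromisoformat is the standard-library parser and rejects impossible dates such as 2026-13-32 or 2026-02-30. This sketch does not range-check the time of day; on Python 3.11 and later, datetime.fromisoformat can parse the full string, including a trailing Z.

```python
import re
from datetime import date

# Pattern 4. re.ASCII restricts \d to 0-9; without it, Python's \d
# matches any Unicode decimal digit.
ISO_8601_RE = re.compile(
    r"^\d{4}-\d{2}-\d{2}(T\d{2}:\d{2}:\d{2}(\.\d+)?(Z|[+-]\d{2}:?\d{2})?)?$",
    re.ASCII,
)


def is_valid_iso_date(value: str) -> bool:
    if ISO_8601_RE.fullmatch(value) is None:
        return False  # wrong shape
    try:
        date.fromisoformat(value[:10])  # semantic check of the YYYY-MM-DD part
    except ValueError:
        return False  # e.g. month 13, day 32, February 30
    return True


print(is_valid_iso_date("2026-03-14T09:30:00.123Z"))  # True
print(is_valid_iso_date("2024-02-29"))                # True (leap year)
print(is_valid_iso_date("2026-13-32"))                # False
print(is_valid_iso_date("2026-02-30"))                # False
```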
5. US Phone Number (Permissive)
^(\+1[-.\s]?)?\(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}$
Matches US phone numbers in common formats: 1234567890, 123-456-7890, (123) 456-7890, +1-123-456-7890. The optional country code and varied separators reflect what users actually type.
For international phone validation, this pattern is wrong. Use a library like libphonenumber, which handles every country's quirks. A single hand-written regex cannot realistically encode every country's numbering rules.
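When US-only input is acceptable, a common follow-up is to store one canonical form. A Python sketch, assuming +1 followed by ten digits (E.164 style) is the storage format you want; normalize_us_phone is a hypothetical helper. The pattern is deliberately loose: it accepts unbalanced parentheses such as (123-456-7890, and it does not check numbering-plan rules such as area codes not starting with 0 or 1.

```python
import re
from typing import Optional

# Pattern 5; re.ASCII keeps \d and \s to their ASCII meanings.
US_PHONE_RE = re.compile(
    r"^(\+1[-.\s]?)?\(?\d{3}\)?[-.\s]?\d{3}[-.\s]?\d{4}$", re.ASCII
)


def normalize_us_phone(value: str) -> Optional[str]:
    if US_PHONE_RE.fullmatch(value) is None:
        return None
    digits = re.sub(r"\D", "", value)  # drop +, parentheses, dots, dashes, spaces
    # Keep the last ten digits, which drops a leading country code 1 if present.
    return "+1" + digits[-10:]


for raw in ["1234567890", "123-456-7890", "(123) 456-7890", "+1-123-456-7890"]:
    print(normalize_us_phone(raw))  # +11234567890 each time
print(normalize_us_phone("12-3456-7890"))  # None
```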
6. Strong Password Format
^(?=.*[a-z])(?=.*[A-Z])(?=.*\d)(?=.*[!@#$%^&*]).{8,}$
Requires at least one lowercase letter, one uppercase letter, one digit, one symbol from the set !@#$%^&*, and 8+ characters total. Uses positive lookaheads to enforce each requirement independently.
Modern security guidance from groups like NIST and OWASP has shifted away from composition rules in favor of length and breach-list checks, but composition checks remain common in production. Apply this for systems that still require them; do not assume it represents current best practice.
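One design alternative, sketched in Python: check each rule separately so a form can tell users which requirement they missed, instead of getting a single yes/no from the combined lookahead pattern. RULES and missing_requirements are hypothetical names; the rules mirror pattern 6, including its limited symbol set.

```python
import re
from typing import List

# One entry per lookahead in pattern 6.
RULES = [
    ("a lowercase letter", re.compile(r"[a-z]")),
    ("an uppercase letter", re.compile(r"[A-Z]")),
    ("a digit", re.compile(r"\d")),
    ("one of !@#$%^&*", re.compile(r"[!@#$%^&*]")),
]
# The combined pattern, for a single pass/fail answer.
PASSWORD_RE = re.compile(r"^(?=.*[a-z])(?=.*[A-Z])(?=.*\d)(?=.*[!@#$%^&*]).{8,}$")


def missing_requirements(password: str) -> List[str]:
    missing = [label for label, rx in RULES if rx.search(password) is None]
    if len(password) < 8:
        missing.append("at least 8 characters")
    return missing


print(missing_requirements("password"))   # ['an uppercase letter', 'a digit', 'one of !@#$%^&*']
print(missing_requirements("Passw0rd!"))  # []
print(PASSWORD_RE.fullmatch("Passw0rd!") is not None)  # True
```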
7. Hex Color Code
^#?([A-Fa-f0-9]{6}|[A-Fa-f0-9]{3})$
Accepts both 3-digit and 6-digit hex colors, with or without the leading #. Useful for color picker validation, CSS parsing, design tool input.
For modern CSS, this regex does not cover the 4- and 8-digit hex forms with an alpha channel (#RGBA, #RRGGBBAA), rgb(), rgba(), hsl(), or named colors. Hex remains the most common format for stored color values, so this pattern handles most cases.
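For storage it often helps to normalize matches to one canonical form. A Python sketch, assuming lowercase six-digit #rrggbb is the form you want; normalize_hex is a hypothetical name.

```python
import re
from typing import Optional

# Pattern 7, unchanged.
HEX_COLOR_RE = re.compile(r"^#?([A-Fa-f0-9]{6}|[A-Fa-f0-9]{3})$")


def normalize_hex(value: str) -> Optional[str]:
    match = HEX_COLOR_RE.fullmatch(value)
    if match is None:
        return None
    digits = match.group(1).lower()
    if len(digits) == 3:
        # Shorthand: each digit is doubled, so "fa0" means "ffaa00".
        digits = "".join(ch * 2 for ch in digits)
    return "#" + digits


print(normalize_hex("#FA0"))       # #ffaa00
print(normalize_hex("1e90ff"))     # #1e90ff
print(normalize_hex("#1e90ff80"))  # None: the 8-digit alpha form is not covered
```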
8. Whitespace Collapse
\s+
Combined with the g flag and .trim(), this collapses runs of whitespace into single spaces and strips leading and trailing whitespace. The most-used regex in any string-normalization pipeline.
text.replace(/\s+/g, ' ').trim()
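Gotcha: what \s matches depends on the dialect. In JavaScript, and in Python 3 for str patterns, \s is Unicode-aware and already matches non-breaking, em, and ideographic spaces. In Java it is ASCII-only unless you enable Pattern.UNICODE_CHARACTER_SET. None of these treat zero-width characters such as U+200B as whitespace, so those need explicit handling.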
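Because \s varies by dialect, it is worth checking what your engine actually matches. A short Python check: with str patterns, Python 3's \s is Unicode-aware by default, and the re.ASCII flag restricts it to ASCII whitespace.

```python
import re

samples = {
    "no-break space U+00A0": "a\u00a0b",
    "em space U+2003": "a\u2003b",
    "ideographic space U+3000": "a\u3000b",
    "zero-width space U+200B": "a\u200bb",
}

for label, text in samples.items():
    unicode_aware = re.sub(r"\s+", " ", text) == "a b"
    ascii_only = re.sub(r"\s+", " ", text, flags=re.ASCII) == "a b"
    print(f"{label}: default={unicode_aware} ascii={ascii_only}")
# The first three print default=True ascii=False;
# the zero-width space prints False for both.
```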
9. Extract Markdown Link
\[([^\]]+)\]\(([^)]+)\)
Captures the link text (group 1) and URL (group 2) from standard markdown links. Useful for processing markdown content programmatically, building link analyzers, or migrating content between formats.
It breaks on URLs that contain a literal closing parenthesis, because [^)]+ stops at the first ). It also folds an optional link title into the URL ([text](https://example.com "Title") puts https://example.com "Title" in group 2), and it matches the bracketed part of image syntax (![alt](src)). For full markdown parsing, a real markdown parser is the right tool. This pattern works for the standard cases.
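A small Python sketch: because the pattern has two capture groups, re.findall returns one (text, url) tuple per link. The example URLs are placeholders; the second call reproduces the closing-parenthesis limitation.

```python
import re

# Pattern 9, unchanged.
MD_LINK_RE = re.compile(r"\[([^\]]+)\]\(([^)]+)\)")

text = "Read [the docs](https://example.com/docs) or [RFC 3986](https://www.rfc-editor.org/rfc/rfc3986)."
for label, url in MD_LINK_RE.findall(text):
    print(label, "->", url)
# the docs -> https://example.com/docs
# RFC 3986 -> https://www.rfc-editor.org/rfc/rfc3986

# The known failure: the URL is cut off at the first ")".
print(MD_LINK_RE.findall("[Foo](https://example.com/wiki/Foo_(bar))"))
# [('Foo', 'https://example.com/wiki/Foo_(bar')]
```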
10. Pull Numbers From Text
-?\d+(\.\d+)?
Matches integers and decimals with an optional leading minus sign. Useful for log parsing, financial data extraction, or any context where mixed text and numbers need separating.
For numbers with thousand separators:
-?\d{1,3}(,\d{3})*(\.\d+)?
For numbers in scientific notation:
-?\d+(\.\d+)?([eE][+-]?\d+)?
Pick the variant that matches your input format. The most common bug is using the basic pattern on data that includes thousand separators and getting split values: 1,234.56 comes back as 1 and 234.56.
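A Python sketch comparing the three variants on one hypothetical log line. One Python gotcha: when a pattern has exactly one capture group, re.findall returns that group instead of the whole match, so re.findall(r"-?\d+(\.\d+)?", "3.5 and -2") gives ['.5', '']. The sketch rewrites each group as non-capturing (?:...), which matches the same text but makes findall return whole numbers.

```python
import re

# The three patterns above, with (...) changed to non-capturing (?:...).
BASIC = re.compile(r"-?\d+(?:\.\d+)?")
THOUSANDS = re.compile(r"-?\d{1,3}(?:,\d{3})*(?:\.\d+)?")
SCIENTIFIC = re.compile(r"-?\d+(?:\.\d+)?(?:[eE][+-]?\d+)?")

line = "total=1,234.56 delta=-7 rate=6.02e23"

print(BASIC.findall(line))       # ['1', '234.56', '-7', '6.02', '23']
print(THOUSANDS.findall(line))   # ['1,234.56', '-7', '6.02', '23']
print(SCIENTIFIC.findall(line))  # ['1', '234.56', '-7', '6.02e23']

# The thousands variant also splits long numbers written without separators.
print(THOUSANDS.findall("12345"))  # ['123', '45']

# Strip separators before converting.
print(float("1,234.56".replace(",", "")))  # 1234.56
```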
Bonus: Trim and Normalize in One Pass
A useful idiom that builds on pattern 8: trim leading and trailing whitespace and collapse internal whitespace in a single expression. In JavaScript:
text.trim().replace(/\s+/g, ' ')
In Python:
re.sub(r"\s+", " ", text.strip())
This shows up constantly in user-input normalization paths. Worth memorizing rather than rewriting each time.
A related pattern: stripping zero-width characters that sneak in via copy-paste from word processors and PDFs. Add [\u200B-\u200D\uFEFF] to your normalization regex to remove zero-width spaces, non-joiners, joiners, and BOMs. These cause real bugs in downstream systems that compare strings byte-for-byte, and they are invisible in most editors.
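Putting the bonus idiom and the zero-width cleanup together, a minimal Python sketch; normalize_input is a hypothetical name. It removes zero-width characters first, so a zero-width space sitting between two spaces cannot leave a double space behind. One caution: U+200C and U+200D carry meaning in emoji sequences and in some scripts such as Persian and many Indic scripts, so drop them from the class if that kind of text passes through this path.

```python
import re

# U+200B-U+200D: zero-width space, non-joiner, joiner; U+FEFF: BOM.
ZERO_WIDTH_RE = re.compile(r"[\u200B-\u200D\uFEFF]")
WHITESPACE_RE = re.compile(r"\s+")


def normalize_input(text: str) -> str:
    text = ZERO_WIDTH_RE.sub("", text)           # remove invisible characters first
    return WHITESPACE_RE.sub(" ", text).strip()  # then collapse and trim


print(repr(normalize_input("\ufeff  Hello\u200b\u00a0 world \n")))  # 'Hello world'
print(repr(normalize_input("a \u200b b")))                          # 'a b'
```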
How to Maintain These Patterns
A snippets file is only useful if you maintain it. A few practices that keep it useful over time:
- One file per language, or one file with language-tagged sections. JavaScript regex literals and Python raw strings use different syntax, but the patterns inside them are largely the same.
- Comments on every pattern explaining what it accepts, what it rejects, and what it does not try to do.
- Linked test cases somewhere accessible (a gist, a repo, a tests file in your dotfiles).
- A changelog of when patterns were updated and why. The change history is valuable when you wonder why a pattern looks the way it does.
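One lightweight way to keep patterns and their test cases together, sketched in Python. The SNIPPETS layout and check_snippets helper are hypothetical, not a standard format: each entry records the pattern, inputs it must accept, and inputs it must reject, so one run flags any edit that changes behavior.

```python
import re

SNIPPETS = {
    "hex_color": {
        "pattern": r"^#?([A-Fa-f0-9]{6}|[A-Fa-f0-9]{3})$",
        "accept": ["#fff", "1E90FF"],
        "reject": ["#ffff", "#ggg", "fff;"],
    },
    "uuid_v4": {
        "pattern": r"^[0-9a-f]{8}-[0-9a-f]{4}-4[0-9a-f]{3}-[89ab][0-9a-f]{3}-[0-9a-f]{12}$",
        "accept": ["f47ac10b-58cc-4372-a567-0e02b2c3d479"],
        # A v1 version digit, and uppercase (this pattern has no i flag).
        "reject": [
            "f47ac10b-58cc-1372-a567-0e02b2c3d479",
            "F47AC10B-58CC-4372-A567-0E02B2C3D479",
        ],
    },
}


def check_snippets(snippets: dict) -> list:
    """Return a list of failure messages; an empty list means every case passed."""
    failures = []
    for name, spec in snippets.items():
        rx = re.compile(spec["pattern"])
        for sample in spec["accept"]:
            if rx.fullmatch(sample) is None:
                failures.append(f"{name}: should accept {sample!r}")
        for sample in spec["reject"]:
            if rx.fullmatch(sample) is not None:
                failures.append(f"{name}: should reject {sample!r}")
    return failures


print(check_snippets(SNIPPETS) or "all snippets pass")
```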
Pair the snippets file with regex testing tools like regex101.com for testing modifications before saving them back. The combination of a tested snippets library and an interactive tester turns regex from a guessing game into a quick lookup.
When These Patterns Are Not Enough
These ten cover the most common needs but not every need. Cases where you need to reach beyond simple regex:
- Internationalized inputs. Unicode-aware regex is harder than ASCII regex. Use Unicode property escapes (\p{L} for any letter) where the dialect supports them: JavaScript with the u flag and Java do; Python's built-in re does not, though the third-party regex module does.
- Recursive structures. Nested HTML, balanced brackets, nested function calls. Standard regex cannot parse these correctly. Use a real parser.
- Context-sensitive validation. "Valid if this other field equals X" requires more than regex. Use a schema validator.
- Performance-sensitive paths. Compile patterns once and reuse them: re.compile() in Python, Pattern.compile() in Java. In JavaScript, the literal form is automatically cached, but new RegExp(...) inside a loop is not.
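A Python sketch touching two of the points above: Unicode letters and compiling once. Python's built-in re has no \p{L}; a common stand-in is [^\W\d_] (word characters minus digits and underscore), which is close to \p{L} but not identical, since it also admits some numeric characters that are not decimal digits, such as superscript two (²). The pattern is compiled once at module level and reused for every line; Python's re also caches recently used patterns internally, so the explicit compile mainly buys clarity and skips the cache lookup. letter_runs is a hypothetical helper.

```python
import re
from typing import Iterable, List

# Compiled once at import time, then reused for every line.
# [^\W\d_] approximates \p{L}: Unicode word characters minus digits and "_".
LETTER_RUN_RE = re.compile(r"[^\W\d_]+")


def letter_runs(lines: Iterable[str]) -> List[str]:
    found = []
    for line in lines:
        found.extend(LETTER_RUN_RE.findall(line))
    return found


print(letter_runs(["caf\u00e9 42", "na\u00efve_x", "\u65e5\u672c\u8a9e 2026"]))
# ['café', 'naïve', 'x', '日本語']
```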
For production data validation work at 137Foundry, these patterns are the starting point of a layered approach: cheap regex format checks at the boundary, then more expensive validation layers that handle the cases regex cannot.
For more on the validation patterns that go around these regex snippets in production systems, see the full article Regex Code Snippets: Patterns for Common Validation and Parsing Problems. The 137Foundry data integration service covers the architectural side of validation in production pipelines, and the services hub describes related integration and automation work.