Forem

Thesius Code
Thesius Code

Posted on • Originally published at datanest-stores.pages.dev

Regex Cheatsheet & Patterns: Regex Cheatsheet & Pattern Library

Regex Cheatsheet & Pattern Library

Regular expressions don't have to be write-only code. This pack provides a complete regex syntax reference plus a library of 100+ tested, commented patterns organized by use case — input validation, log parsing, data extraction, URL manipulation, and text transformation. Each pattern includes the regex, an explanation of what each part does, test cases that pass and fail, and language-specific usage notes for Python, JavaScript, and Go.

What's Included

  • Syntax Reference — Anchors, quantifiers, character classes, groups, lookaheads, lookbehinds
  • Input Validation — Email, phone, URL, IP address, credit card, password strength, UUID
  • Log Parsing — Apache/Nginx logs, syslog, JSON log lines, stack traces, timestamps
  • Data Extraction — HTML tags, CSV fields, key-value pairs, version numbers, file paths
  • Text Transformation — Case conversion, whitespace normalization, comment stripping, slug generation
  • Language-Specific Guides — Python re, JavaScript RegExp, Go regexp, PCRE differences
  • Performance Tips — Catastrophic backtracking, atomic groups, possessive quantifiers
  • Interactive Test Cases — Every pattern includes strings that should match and strings that should not

Preview / Sample Content

Regex Syntax — Quick Reference

ANCHORS
^       Start of string (or line in multiline mode)
$       End of string (or line in multiline mode)
\b      Word boundary
\B      Not a word boundary

QUANTIFIERS
*       0 or more (greedy)
+       1 or more (greedy)
?       0 or 1 (greedy)
{3}     Exactly 3
{3,}    3 or more
{3,7}   Between 3 and 7
*?      0 or more (lazy / non-greedy)
+?      1 or more (lazy / non-greedy)

CHARACTER CLASSES
.       Any character (except newline)
\d      Digit [0-9]
\D      Not a digit
\w      Word character [a-zA-Z0-9_]
\W      Not a word character
\s      Whitespace [ \t\n\r\f]
\S      Not whitespace
[abc]   Character set (a, b, or c)
[^abc]  Negated set (not a, b, or c)
[a-z]   Range

GROUPS & REFERENCES
(abc)       Capturing group
(?:abc)     Non-capturing group
(?P<name>)  Named group (Python)
(?<name>)   Named group (JS/PCRE)
\1          Back-reference to group 1
(?=abc)     Positive lookahead
(?!abc)     Negative lookahead
(?<=abc)    Positive lookbehind
(?<!abc)    Negative lookbehind
Enter fullscreen mode Exit fullscreen mode

Validation Patterns — The Ones You'll Use Most

import re

# Email (RFC 5322 simplified — covers 99.9% of real addresses)
EMAIL = r'^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$'
# ✅ user@example.com, first.last+tag@company.co.uk
# ❌ @example.com, user@.com, user@com

# URL (http/https with optional path, query, fragment)
URL = r'^https?://[a-zA-Z0-9][-a-zA-Z0-9]*(\.[a-zA-Z0-9][-a-zA-Z0-9]*)+(:\d+)?(/[-a-zA-Z0-9._~:/?#\[\]@!$&\'()*+,;=%]*)?$'
# ✅ https://example.com, http://sub.example.com:8080/path?q=1
# ❌ ftp://example.com, example.com, http://

# IPv4 Address
IPV4 = r'^(?:(?:25[0-5]|2[0-4]\d|1\d{2}|[1-9]?\d)\.){3}(?:25[0-5]|2[0-4]\d|1\d{2}|[1-9]?\d)$'
# ✅ 192.168.1.1, 10.0.0.255, 0.0.0.0
# ❌ 256.1.1.1, 192.168.1, 1.2.3.4.5

# UUID v4
UUID_V4 = r'^[0-9a-f]{8}-[0-9a-f]{4}-4[0-9a-f]{3}-[89ab][0-9a-f]{3}-[0-9a-f]{12}$'
# ✅ 550e8400-e29b-41d4-a716-446655440000
# ❌ 550e8400-e29b-51d4-a716-446655440000 (version 5, not 4)

# Strong password (8+ chars, upper, lower, digit, special)
STRONG_PASS = r'^(?=.*[a-z])(?=.*[A-Z])(?=.*\d)(?=.*[!@#$%^&*])[A-Za-z\d!@#$%^&*]{8,}$'
# ✅ MyP@ss1rd, Str0ng!Pass
# ❌ password, Pass1234, SHORT!1a

# Semantic version (major.minor.patch with optional pre-release)
SEMVER = r'^(0|[1-9]\d*)\.(0|[1-9]\d*)\.(0|[1-9]\d*)(?:-([\da-zA-Z-]+(?:\.[\da-zA-Z-]+)*))?$'
# ✅ 1.0.0, 2.1.3, 1.0.0-alpha.1
# ❌ 1.0, v1.0.0, 01.0.0
Enter fullscreen mode Exit fullscreen mode

Log Parsing Patterns

# Apache/Nginx combined log format
APACHE_LOG = r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] "(?P<method>\S+) (?P<path>\S+) \S+" (?P<status>\d{3}) (?P<size>\d+|-)'
# Extracts: ip, timestamp, method, path, status code, response size

# ISO 8601 timestamp
ISO_TIMESTAMP = r'\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(?:\.\d+)?(?:Z|[+-]\d{2}:?\d{2})'
# ✅ 2024-01-15T14:30:00Z, 2024-01-15T14:30:00.123+05:30

# Python/Java stack trace (first line)
STACK_TRACE = r'^(?:Traceback|Exception|Error|Caused by:|\s+at\s+|File\s+")'

# Key-value pairs from log messages
KV_PAIRS = r'(\w+)=("(?:[^"\\]|\\.)*"|\S+)'
# Matches: user="John Doe" action=login status=200 duration=1.23s
Enter fullscreen mode Exit fullscreen mode

Quick Reference Table

Pattern Regex Matches
Digits only ^\d+$ 123, 0, 99999
Hex color ^#[0-9a-fA-F]{6}$ #FF5733, #000000
Date (YYYY-MM-DD) ^\d{4}-\d{2}-\d{2}$ 2024-01-15
Time (HH:MM:SS) ^\d{2}:\d{2}:\d{2}$ 14:30:00
Slug ^[a-z0-9]+(-[a-z0-9]+)*$ my-page-slug
File extension \.([a-zA-Z0-9]+)$ .py, .tar.gz (captures gz)
HTML tag <([a-zA-Z][a-zA-Z0-9]*)\b[^>]*> <div class="x">
Whitespace trim `^\s+\ \s+$`
Duplicate words \b(\w+)\s+\1\b the the, is is
Non-ASCII [^\x00-\x7F] é, ñ,

Comparison: Regex Engines

Feature Python re JavaScript Go regexp PCRE/PCRE2
Lookahead Yes Yes No Yes
Lookbehind Fixed-width Yes (ES2018+) No Variable-width
Named Groups (?P<n>) (?<n>) (?P<n>) Both syntaxes
Unicode \p{L} (regex module) \p{L} with /u \p{L} native \p{L}
Atomic Groups No No No (?>...)
Verbose Mode re.VERBOSE No No (?x)
Recursion No No No (?R)

Usage Tips

  1. Start with the validation patterns — email, URL, and IP patterns are battle-tested and handle edge cases.
  2. Use named groups ((?P<name>...) in Python) — they make regex self-documenting and results easier to access.
  3. Always use raw strings in Python (r'pattern') to avoid backslash escaping issues.
  4. Test with the included test cases — every pattern lists strings that should and should not match.
  5. Read the performance section before using regex in loops — catastrophic backtracking can freeze your program.

This is 1 of 11 resources in the Cheatsheet Reference Pro toolkit. Get the complete [Regex Cheatsheet & Patterns] with all files, templates, and documentation for $9.

Get the Full Kit →

Or grab the entire Cheatsheet Reference Pro bundle (11 products) for $79 — save 30%.

Get the Complete Bundle →


Related Articles

Top comments (0)