Regular expressions are one of those tools that sit quietly in every developer's toolkit until the moment you need to validate an input, parse a log file, or extract data from a messy string. Then they become indispensable.
This cheat sheet covers everything from basic syntax to advanced patterns you can copy and paste directly into your projects. If you want to test any of these patterns interactively, I built a free online regex tester that highlights matches in real time.
Basic Metacharacters
Before diving into complex patterns, here is a quick refresher on the fundamental building blocks.
| Symbol | Meaning | Example | Matches |
|---|---|---|---|
| . | Any character except newline | a.c | abc, a9c, a-c |
| \d | Any digit (0-9) | \d{3} | 123, 007 |
| \D | Any non-digit | \D+ | hello, --- |
| \w | Word character (a-z, A-Z, 0-9, _) | \w+ | hello_world |
| \W | Non-word character | \W | @, #, space |
| \s | Whitespace (space, tab, newline) | \s+ | tabs, spaces |
| \S | Non-whitespace | \S+ | hello |
| \ | Escape special character | \. | literal . |
Character Classes
Character classes let you define a custom set of characters to match against.
[abc] # matches a, b, or c
[a-z] # matches any lowercase letter
[A-Z0-9] # matches uppercase letter or digit
[^abc] # matches anything EXCEPT a, b, or c
[a-zA-Z] # matches any letter regardless of case
You can combine ranges freely. A practical example -- matching a hex color code:
#[0-9a-fA-F]{6}\b
This matches strings like #FF5733 or #1a2b3c while the \b word boundary prevents partial matches.
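As a quick check of that claim, here is a minimal Python sketch. The pattern is the one above; the names HEX_COLOR and find_hex_colors and the sample string are made up for illustration.

```python
import re

# Pattern copied from the text; the constant name is illustrative.
HEX_COLOR = re.compile(r'#[0-9a-fA-F]{6}\b')

def find_hex_colors(text):
    """Return every six-digit hex color found in text."""
    return HEX_COLOR.findall(text)

sample = "primary: #FF5733; accent: #1a2b3c; too long: #1234567; too short: #abc"
print(find_hex_colors(sample))  # ['#FF5733', '#1a2b3c']
```

The seven-digit value is skipped because no word boundary exists between its sixth and seventh digit, and the three-digit shorthand is skipped because the pattern demands exactly six.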
Quantifiers
Quantifiers control how many times a preceding element must appear.
| Quantifier | Meaning | Example | Matches |
|---|---|---|---|
| * | 0 or more | ab*c | ac, abc, abbc |
| + | 1 or more | ab+c | abc, abbc (not ac) |
| ? | 0 or 1 | colou?r | color, colour |
| {n} | Exactly n | \d{4} | 2026 |
| {n,} | n or more | \d{2,} | 12, 123, 9999 |
| {n,m} | Between n and m | \d{2,4} | 12, 123, 1234 |
Greedy vs. Lazy Matching
By default, quantifiers are greedy -- they match as much as possible. Append ? to make them lazy.
# Greedy (default)
<.+> applied to "<b>bold</b>" matches "<b>bold</b>"
# Lazy
<.+?> applied to "<b>bold</b>" matches "<b>" then "</b>"
This distinction matters enormously when parsing HTML or any content with delimiters. Greedy matching is one of the most common sources of regex bugs.
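The two behaviours are easy to confirm in Python. This sketch reuses the sample string from above and adds a third variant, a negated character class, which is a common alternative to lazy matching rather than something the examples above prescribe.

```python
import re

html = "<b>bold</b>"

# Greedy: .+ runs to the end of the string, then backs up to the last '>'.
greedy = re.findall(r'<.+>', html)

# Lazy: .+? stops at the first '>' it can reach.
lazy = re.findall(r'<.+?>', html)

# Alternative (an addition): a negated class can never run past a '>'.
negated = re.findall(r'<[^>]+>', html)

print(greedy)   # ['<b>bold</b>']
print(lazy)     # ['<b>', '</b>']
print(negated)  # ['<b>', '</b>']
```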
Anchors and Boundaries
Anchors do not match characters; they match positions within the string.
| Anchor | Meaning | Example |
|---|---|---|
| ^ | Start of string (or line with m flag) | ^Hello |
| $ | End of string (or line with m flag) | world$ |
| \b | Word boundary | \bcat\b matches "cat" but not "catch" |
| \B | Non-word boundary | \Bcat matches the "cat" in "bobcat" but not in "cat" or "catch" |
Anchors are essential for validation. Without them, a pattern like \d{5} would match the first five digits inside 123456789. Adding anchors makes it strict:
^\d{5}$ # matches exactly a 5-digit string
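A short Python sketch of the difference, with illustrative function names. One Python-specific detail worth knowing: $ also matches just before a single trailing newline, so re.fullmatch is the stricter choice when the whole string must be consumed.

```python
import re

UNANCHORED = re.compile(r'\d{5}')
ANCHORED = re.compile(r'^\d{5}$')

def is_five_digits(value):
    """True when value passes the anchored pattern."""
    return ANCHORED.search(value) is not None

def is_exactly_five_digits(value):
    """Stricter check: fullmatch must consume the entire string."""
    return re.fullmatch(r'\d{5}', value) is not None

print(UNANCHORED.search("123456789").group())  # 12345
print(is_five_digits("123456789"))             # False
print(is_five_digits("12345\n"))               # True: $ tolerates one trailing newline
print(is_exactly_five_digits("12345\n"))       # False
```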
Groups and Capturing
Parentheses create groups for extraction or applying quantifiers to a sequence.
# Capturing group
(foo)bar # captures "foo", matches "foobar"
# Non-capturing group
(?:foo)bar # groups without capturing, matches "foobar"
# Named group
(?<year>\d{4}) # captures into a group named "year"
# Backreference
\b(\w+)\s+\1\b # matches repeated words like "the the"; the \b anchors keep it from firing on the "is is" inside "this is"
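Here is how named groups and backreferences might look in Python. Note that Python spells named groups (?P<name>...) rather than (?<name>...). The sample strings and the repeated_words helper are illustrative, and the backreference pattern uses \b on both sides so that a word ending and the next word starting with the same letters do not count as a repeat.

```python
import re

# Named groups: read captures by name instead of by position.
m = re.search(r'(?P<year>\d{4})-(?P<month>\d{2})', "released 2026-02")
year, month = m.group('year'), m.group('month')
print(year, month)  # 2026 02

# Backreference: \1 must repeat exactly what group 1 captured.
REPEATED = re.compile(r'\b(\w+)\s+\1\b')

def repeated_words(text):
    """Return each word that appears twice in a row."""
    return [match.group(1) for match in REPEATED.finditer(text)]

print(repeated_words("Paris in the the spring, this is fine"))  # ['the']
```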
Alternation
The pipe | operator works like a logical OR inside groups:
(cat|dog|bird) # matches "cat", "dog", or "bird"
(?:jpg|png|gif) # same but without capturing
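One detail that trips people up: alternation has the lowest precedence of any regex operator, so without a group the | splits the entire pattern, anchors included. The sketch below shows this in Python; the file-extension check and all names are illustrative.

```python
import re

# Grouped alternation used as a file-extension check.
IMAGE = re.compile(r'\.(?:jpg|png|gif)$', re.IGNORECASE)

def is_image(filename):
    return IMAGE.search(filename) is not None

# Ungrouped: reads as (^cat) OR (dog$).
LOOSE = re.compile(r'^cat|dog$')
# Grouped: the anchors apply to both alternatives.
STRICT = re.compile(r'^(?:cat|dog)$')

print(is_image("photo.PNG"))                    # True
print(LOOSE.search("catalog") is not None)      # True
print(STRICT.search("catalog") is not None)     # False
```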
Lookahead and Lookbehind
Lookarounds assert that a pattern exists ahead of or behind the current position without consuming characters. They are zero-width assertions.
| Type | Syntax | Meaning |
|---|---|---|
| Positive lookahead | (?=...) | Followed by ... |
| Negative lookahead | (?!...) | NOT followed by ... |
| Positive lookbehind | (?<=...) | Preceded by ... |
| Negative lookbehind | (?<!...) | NOT preceded by ... |
# Match "foo" only if followed by "bar"
foo(?=bar) # matches "foo" in "foobar", not in "foobaz"
# Match a number NOT preceded by a dollar sign
(?<![$\d])\d+ # matches "42" in "test 42" but nothing in "$42"; the \d in the lookbehind stops it from matching the "2" of "$42"
# Password validation: at least one digit ahead
(?=.*\d) # used as part of a larger pattern
A classic use case is password strength validation. This pattern requires at least 8 characters, one uppercase, one lowercase, one digit, and one special character from the set @$!%*?&. It also rejects any character outside letters, digits, and that set:
^(?=.*[a-z])(?=.*[A-Z])(?=.*\d)(?=.*[@$!%*?&])[A-Za-z\d@$!%*?&]{8,}$
You can drop this into the regex tester to experiment with different password strings and see which ones pass.
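Because each lookahead is independent, the same rules can be checked one at a time to report what is missing. The sketch below is one hypothetical way to do that in Python: PASSWORD is the pattern above, while RULES and failed_rules are names invented for this example.

```python
import re

PASSWORD = re.compile(
    r'^(?=.*[a-z])(?=.*[A-Z])(?=.*\d)(?=.*[@$!%*?&])[A-Za-z\d@$!%*?&]{8,}$'
)

# Hypothetical breakdown of the same pattern, one piece per requirement.
RULES = {
    "lowercase letter": r'(?=.*[a-z])',
    "uppercase letter": r'(?=.*[A-Z])',
    "digit": r'(?=.*\d)',
    "special character": r'(?=.*[@$!%*?&])',
    "allowed characters, length 8+": r'[A-Za-z\d@$!%*?&]{8,}$',
}

def failed_rules(password):
    """Return the names of the rules that password does not satisfy."""
    # re.match anchors each rule at the start of the string, like ^ does.
    return [name for name, rule in RULES.items()
            if re.match(rule, password) is None]

print(failed_rules("Passw0rd!"))  # []
print(failed_rules("password"))   # ['uppercase letter', 'digit', 'special character']
```

A password such as Passw0rd# fails twice here, since # is neither in the special set nor among the allowed characters.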
Flags / Modifiers
Flags change how the engine interprets your pattern.
| Flag | Name | Effect |
|---|---|---|
| g | Global | Find all matches, not just the first |
| i | Case-insensitive | a matches A |
| m | Multiline | ^ and $ match line boundaries |
| s | Dotall | . matches newline characters |
| u | Unicode | Enables full Unicode matching |
| x | Extended | Allows whitespace and comments in pattern |
In JavaScript:
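const text = 'Hello, hello, HELLO'; // sample input
const pattern = /hello/gi; // global, case-insensitive
const matches = [...text.matchAll(pattern)]; // matchAll returns an iterator, so spread it into an array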
In Python:
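import re
text = 'Hello, hello, HELLO'  # sample input
matches = re.findall(r'hello', text, re.IGNORECASE | re.MULTILINE)  # findall returns every match, so Python needs no g flag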
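To see what the m and s flags actually change, here is a small Python sketch. The log text is invented for the example; re.MULTILINE and re.DOTALL are Python's names for the m and s flags.

```python
import re

log = "ERROR disk full\ninfo all good\nerror retrying"

# Without MULTILINE, ^ only matches at the very start of the string.
first_only = re.findall(r'^error', log, re.IGNORECASE)

# With MULTILINE, ^ also matches right after each newline.
every_line = re.findall(r'^error', log, re.IGNORECASE | re.MULTILINE)

# DOTALL lets . match the newline between the first two lines.
crosses_newline = re.search(r'full.info', log, re.DOTALL) is not None
stays_on_line = re.search(r'full.info', log) is not None

print(first_only)       # ['ERROR']
print(every_line)       # ['ERROR', 'error']
print(crosses_newline)  # True
print(stays_on_line)    # False
```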
Common Patterns You Will Actually Use
Here are battle-tested patterns for everyday validation tasks. Each one is ready to copy into your codebase.
Email Address
^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$
Handles most real-world email formats. For RFC 5322 full compliance you would need a much longer pattern, but this covers the vast majority of valid addresses.
URL
^https?:\/\/(www\.)?[-a-zA-Z0-9@:%._\+~#=]{1,256}\.[a-zA-Z0-9()]{1,6}\b([-a-zA-Z0-9()@:%_\+.~#?&//=]*)$
Matches both HTTP and HTTPS URLs with optional www prefix, path, and query parameters.
IPv4 Address
^((25[0-5]|2[0-4]\d|[01]?\d\d?)\.){3}(25[0-5]|2[0-4]\d|[01]?\d\d?)$
Validates each octet is between 0 and 255. Rejects strings like 999.999.999.999.
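A minimal Python sketch for trying the pattern against a few inputs. The function name is illustrative. One behaviour to be aware of: the [01]?\d\d? branch accepts leading zeros, so 01.02.03.004 passes.

```python
import re

IPV4 = re.compile(
    r'^((25[0-5]|2[0-4]\d|[01]?\d\d?)\.){3}(25[0-5]|2[0-4]\d|[01]?\d\d?)$'
)

def is_ipv4(value):
    return IPV4.match(value) is not None

print(is_ipv4("192.168.1.1"))      # True
print(is_ipv4("999.999.999.999"))  # False
print(is_ipv4("256.1.1.1"))        # False
print(is_ipv4("01.02.03.004"))     # True: leading zeros are accepted
```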
IPv6 Address
^([0-9a-fA-F]{1,4}:){7}[0-9a-fA-F]{1,4}$
Matches the full expanded form. Compressed forms with :: require a more involved pattern.
Phone Number (International)
^\+?[1-9]\d{1,14}$
Follows the E.164 international format. Allows an optional leading + and 2 to 15 digits.
US Phone Number (Formatted)
^\(?(\d{3})\)?[-.\s]?(\d{3})[-.\s]?(\d{4})$
Matches (555) 123-4567, 555-123-4567, 555.123.4567, and 5551234567.
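The three capture groups make it easy to normalize whatever format the user typed. A sketch in Python, with a hypothetical normalize_us_phone helper; note that the pattern does not require the parentheses to be balanced, so an input like (555-123-4567 also passes.

```python
import re

US_PHONE = re.compile(r'^\(?(\d{3})\)?[-.\s]?(\d{3})[-.\s]?(\d{4})$')

def normalize_us_phone(value):
    """Return the ten digits with separators stripped, or None if no match."""
    m = US_PHONE.match(value)
    return "".join(m.groups()) if m else None

for raw in ["(555) 123-4567", "555-123-4567", "555.123.4567", "5551234567"]:
    print(normalize_us_phone(raw))  # 5551234567 each time
```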
Date (YYYY-MM-DD)
^\d{4}-(0[1-9]|1[0-2])-(0[1-9]|[12]\d|3[01])$
Validates ISO 8601 date format with basic month and day range checks.
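The range checks know nothing about month lengths, so 2026-02-31 still passes. One way to close that gap, sketched in Python, is to let the regex check the shape and the standard datetime module check the calendar. The capture group around the year is an addition to the pattern above, and is_real_date is an illustrative name.

```python
import re
from datetime import date

# Same pattern as above, with the year wrapped in a capture group.
ISO_DATE = re.compile(r'^(\d{4})-(0[1-9]|1[0-2])-(0[1-9]|[12]\d|3[01])$')

def is_real_date(value):
    """Shape check via regex, calendar check via datetime.date."""
    m = ISO_DATE.match(value)
    if m is None:
        return False
    year, month, day = map(int, m.groups())
    try:
        date(year, month, day)
    except ValueError:
        return False
    return True

print(ISO_DATE.match("2026-02-31") is not None)  # True: shape is fine
print(is_real_date("2026-02-31"))                # False: no such day
print(is_real_date("2024-02-29"))                # True: leap year
```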
Time (HH:MM, 24-hour)
^([01]\d|2[0-3]):([0-5]\d)$
Matches 00:00 through 23:59.
UUID v4
^[0-9a-f]{8}-[0-9a-f]{4}-4[0-9a-f]{3}-[89ab][0-9a-f]{3}-[0-9a-f]{12}$
Validates the structure including the version-4 indicator and the variant bits.
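As written, the pattern accepts lowercase hex only. The Python sketch below checks it against a value from the standard uuid module and shows the case-insensitive variant; the fixed sample UUIDs are made up for the example.

```python
import re
import uuid

UUID_V4 = re.compile(
    r'^[0-9a-f]{8}-[0-9a-f]{4}-4[0-9a-f]{3}-[89ab][0-9a-f]{3}-[0-9a-f]{12}$'
)
# Same pattern, but also accepting uppercase hex digits.
UUID_V4_ANY_CASE = re.compile(UUID_V4.pattern, re.IGNORECASE)

generated = str(uuid.uuid4())  # uuid4() renders as lowercase hex
sample_v4 = "3f2b8c1e-9d4a-4e6b-a1c7-5d9e8f0a2b3c"
sample_v1_shape = "3f2b8c1e-9d4a-1e6b-a1c7-5d9e8f0a2b3c"  # version digit is 1

print(UUID_V4.match(generated) is not None)                  # True
print(UUID_V4.match(sample_v4.upper()) is not None)          # False
print(UUID_V4_ANY_CASE.match(sample_v4.upper()) is not None) # True
print(UUID_V4.match(sample_v1_shape) is not None)            # False
```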
Slug (URL-friendly string)
^[a-z0-9]+(?:-[a-z0-9]+)*$
Matches my-blog-post but rejects My Blog Post or --double-dash.
HTML Tag
<\/?[\w\s]*>|<.+[\W]>
Basic tag matching. For real HTML parsing, always prefer a proper DOM parser, but this works for quick searches.
Language-Specific Tips
JavaScript
// Named groups (ES2018+)
const datePattern = /(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/;
const match = '2026-02-27'.match(datePattern);
console.log(match.groups.year); // "2026"
// String.matchAll for multiple matches
const emails = [...text.matchAll(/[\w.+-]+@[\w-]+\.[\w.]+/g)];
Python
import re
# Compile for reuse
email_re = re.compile(r'^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$')
# Named groups
m = re.match(r'(?P<area>\d{3})-(?P<rest>\d{4})', '555-1234')
print(m.group('area')) # "555"
# re.VERBOSE for readable patterns
phone_re = re.compile(r'''
^\+? # optional plus
[1-9]\d{0,2} # country code
[-.\s]? # separator
\d{3,14}$ # number
''', re.VERBOSE)
Go
package main

import (
	"fmt"
	"regexp"
)

func main() {
	re := regexp.MustCompile(`^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$`)
	fmt.Println(re.MatchString("dev@example.com")) // true
}
Performance Considerations
Regex engines can be fast or catastrophically slow depending on the pattern. Keep these principles in mind:
- Avoid catastrophic backtracking. Patterns like (a+)+$ can cause exponential time on non-matching inputs. Use atomic groups or possessive quantifiers where your engine supports them.
- Be specific. A narrow class such as [a-zA-Z0-9]+ is usually faster than .* because it gives the engine fewer ways to match and therefore less to backtrack over.
- Anchor your patterns. Adding ^ and $ prevents the engine from attempting matches at every position in the string.
- Compile once, use many times. In Python use re.compile(). In Go use regexp.MustCompile(). Avoid recompiling the same pattern inside loops.
- Test with edge cases. An empty string, a very long string, and strings with special characters will expose most regex bugs. Use a regex tester to validate patterns before committing them to production code.
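When atomic groups are not available, a nested quantifier can often be flattened instead. The Python sketch below compiles both forms once at module level and confirms they accept the same strings. The flat rewrite is an illustration, not the only fix, and the inputs are kept tiny on purpose so the nested form never gets a chance to blow up.

```python
import re

# Compiled once at import time and reused on every call.
NESTED = re.compile(r'^(a+)+$')  # the risky shape named above
FLAT = re.compile(r'^a+$')       # accepts the same strings with one quantifier

def accepts(pattern, value):
    return pattern.match(value) is not None

SAMPLES = ["a", "aaaa", "", "aaab", "b"]  # short inputs only
agreement = all(accepts(NESTED, s) == accepts(FLAT, s) for s in SAMPLES)
print(agreement)  # True
```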
Quick Reference Card
. any character \d digit \s whitespace
^ start of string \D non-digit \S non-whitespace
$ end of string \w word char \b word boundary
* 0 or more + 1 or more ? 0 or 1
{n} exactly n {n,m} n to m {n,} n or more
[abc] char class [^abc] negated (x|y) alternation
(...) capture group (?:...) non-capture
(?=...) positive lookahead (?!...) negative lookahead
(?<=...) positive lookbehind (?<!...) negative lookbehind
Wrapping Up
Regex is a skill that rewards practice. The patterns above cover the scenarios that come up most frequently in day-to-day development. Bookmark this page or save it somewhere accessible -- the next time you need to validate an email or parse a date, you will not have to start from scratch.
For a deeper dive with interactive examples, check out the full Regex Cheat Sheet on our blog. And whenever you are building or debugging a pattern, the Regex Tester can save you a lot of trial-and-error time.
Happy matching.