Regular expressions solve text problems that string operations can't handle cleanly. Here's a practical guide with real examples rather than theory.
The basics
A regex pattern is a sequence of characters that defines a search pattern. Some characters have special meaning:
| Character | Meaning |
|---|---|
| `.` | Any character (except newline by default) |
| `*` | Zero or more of the preceding |
| `+` | One or more of the preceding |
| `?` | Zero or one of the preceding |
| `^` | Start of string (or line with m flag) |
| `$` | End of string (or line with m flag) |
| `\d` | Any digit (0-9) |
| `\w` | Word character (letter, digit, underscore) |
| `\s` | Whitespace |
| `\D` | Non-digit |
| `\W` | Non-word character |
| `[abc]` | Any of a, b, c |
| `[a-z]` | Any lowercase letter |
| `[^abc]` | Not a, b, or c |
| `(abc)` | Capturing group |
| `(?:abc)` | Non-capturing group |
| `a\|b` | Either a or b |
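To see how these pieces combine, here is a small sketch that runs a few patterns against sample inputs. The `samples` array and its test strings are made up for illustration; each pattern uses only the metacharacters listed in the table.

```javascript
// Each entry: [pattern, input, expected result of pattern.test(input)]
const samples = [
  [/^\d+$/, '12345', true],          // one or more digits, whole string
  [/^\d+$/, '123a5', false],         // 'a' is not a digit
  [/^colou?r$/, 'color', true],      // the 'u' is optional
  [/^colou?r$/, 'colour', true],
  [/^[a-z]+$/, 'hello', true],       // lowercase letters only
  [/^[a-z]+$/, 'Hello', false],      // 'H' is outside a-z
  [/^(?:cat|dog)s?$/, 'dogs', true], // alternation inside a non-capturing group
  [/^[^abc]$/, 'd', true],           // any single character except a, b, c
];

for (const [pattern, input, expected] of samples) {
  const status = pattern.test(input) === expected ? 'ok' : 'MISMATCH';
  console.log(pattern.source, JSON.stringify(input), status);
}
```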
Practical examples
Validate an email address
const emailRegex = /^[^\s@]+@[^\s@]+\.[^\s@]+$/;
emailRegex.test('user@example.com'); // true
emailRegex.test('not-an-email'); // false
Note: This is a simple, practical check. A fully RFC 5321-compliant email regex is hundreds of characters long and overkill for most uses. The above catches the most common invalid formats.
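To make the limits of this check concrete, the sketch below runs it against a few edge cases. The helper name `isPlausibleEmail` is hypothetical, and trimming the input first is an assumption about how you want to treat stray whitespace.

```javascript
const simpleEmail = /^[^\s@]+@[^\s@]+\.[^\s@]+$/;

// Hypothetical helper: trim surrounding whitespace, then apply the simple check.
function isPlausibleEmail(input) {
  return simpleEmail.test(input.trim());
}

console.log(isPlausibleEmail('  user@example.com ')); // true: whitespace is trimmed first
console.log(isPlausibleEmail('user@example'));          // false: no dot after the @
console.log(isPlausibleEmail('user name@example.com')); // false: contains a space
console.log(isPlausibleEmail('user@@example.com'));     // false: two @ signs
console.log(isPlausibleEmail('user@example..com'));     // true: the pattern does not reject doubled dots
```

The last case shows why this counts as a plausibility check only. Sending a confirmation message is the usual way to verify that an address really exists.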
Extract URLs from text
const urlRegex = /https?:\/\/[^\s<>"]+/g;
const text = 'Visit https://example.com or http://other.org for more.';
const urls = text.match(urlRegex);
// → ['https://example.com', 'http://other.org']
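One thing to watch with this pattern: a URL at the end of a sentence picks up the trailing punctuation. A minimal sketch of one way to handle that follows. The helper `extractUrls` and the set of trimmed characters are assumptions, not part of the original pattern.

```javascript
const urlPattern = /https?:\/\/[^\s<>"]+/g;

// Hypothetical helper: extract URLs, then drop trailing punctuation that
// usually belongs to the surrounding sentence rather than to the link.
function extractUrls(text) {
  const found = text.match(urlPattern) ?? []; // match() returns null when nothing matches
  return found.map((url) => url.replace(/[.,;:!?)]+$/, ''));
}

console.log(extractUrls('Docs live at https://example.com/docs. See also (http://other.org).'));
// → ['https://example.com/docs', 'http://other.org']
console.log(extractUrls('no links here'));
// → []
```

The trade-off is that a URL which legitimately ends in `)` or `.` loses that character, so adjust the trimmed set to fit your data.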
Validate a phone number (flexible)
// Matches formats: 555-555-5555, (555) 555-5555, 555 555 5555, +1 555 555 5555
// The optional leading group accepts a country code such as +1
const phoneRegex = /^(?:\+[0-9]{1,3}[-\s.]?)?[(]?[0-9]{3}[)]?[-\s.]?[0-9]{3}[-\s.]?[0-9]{4,6}$/;
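An alternative to one large pattern is to normalize first and validate second. The sketch below strips everything except digits and then checks the length. The helper `normalizePhone` is hypothetical, and it assumes 10-digit national numbers with an optional leading country code of 1.

```javascript
// Hypothetical helper: keep only the digits, then check the length.
// Assumes 10-digit national numbers, optionally prefixed with country code 1.
function normalizePhone(input) {
  const digits = input.replace(/\D/g, ''); // drop every non-digit character
  if (digits.length === 10) return digits;
  if (digits.length === 11 && digits.startsWith('1')) return digits.slice(1);
  return null; // not a shape this sketch recognizes
}

console.log(normalizePhone('(555) 555-5555'));  // '5555555555'
console.log(normalizePhone('+1 555 555 5555')); // '5555555555'
console.log(normalizePhone('555-1234'));        // null
```

This keeps the regex trivial and gives you one canonical form to store, at the cost of accepting odd separators that a stricter pattern would reject.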
Replace all occurrences
// Using replace with /g flag
const result = 'hello world hello'.replace(/hello/g, 'hi');
// → 'hi world hi'
// With capture groups
const date = '2026-05-30';
const formatted = date.replace(/(\d{4})-(\d{2})-(\d{2})/, '$2/$3/$1');
// → '05/30/2026'
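When the replacement needs logic rather than simple reordering, `replace` also accepts a function that receives the full match followed by each capture group. A small sketch, where `monthNames` and `prettyDate` are illustrative names and the input is assumed to contain a valid month:

```javascript
const monthNames = ['Jan', 'Feb', 'Mar', 'Apr', 'May', 'Jun',
                    'Jul', 'Aug', 'Sep', 'Oct', 'Nov', 'Dec'];

// Hypothetical helper: rewrite YYYY-MM-DD as "Mon D, YYYY".
// Assumes the month is between 01 and 12.
function prettyDate(iso) {
  return iso.replace(/(\d{4})-(\d{2})-(\d{2})/, (whole, year, month, day) => {
    // The callback receives the full match, then each capture group in order.
    return `${monthNames[Number(month) - 1]} ${Number(day)}, ${year}`;
  });
}

console.log(prettyDate('2026-05-30'));          // 'May 30, 2026'
console.log(prettyDate('Due on 2026-01-05.')); // 'Due on Jan 5, 2026.'
```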
Split on multiple delimiters
'a,b;c d'.split(/[,;\s]+/);
// → ['a', 'b', 'c', 'd']
Extract named capture groups
const pattern = /(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/;
const match = '2026-05-30'.match(pattern);
const { year, month, day } = match.groups;
// year='2026', month='05', day='30'
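Note that `match` returns `null` when the pattern does not match, so destructuring `match.groups` directly throws on bad input. A defensive sketch follows. The helper `parseIsoDate` is hypothetical, and the `^` and `$` anchors are added here on the assumption that the whole string should be a date.

```javascript
const isoDate = /^(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})$/;

// Hypothetical helper: return the parsed parts, or null if the input
// is not in YYYY-MM-DD form.
function parseIsoDate(text) {
  const m = isoDate.exec(text);
  if (m === null) return null; // guard before touching m.groups
  const { year, month, day } = m.groups;
  return { year: Number(year), month: Number(month), day: Number(day) };
}

console.log(parseIsoDate('2026-05-30')); // { year: 2026, month: 5, day: 30 }
console.log(parseIsoDate('30/05/2026')); // null
```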
Strip HTML tags
'<p>Hello <strong>world</strong></p>'.replace(/<[^>]*>/g, '');
// → 'Hello world'
Note: This is fine for simple, trusted input. It breaks on attribute values that contain >, on comments, and on script or style blocks, so use a proper HTML parser for anything more complex.
Validate hex color codes
const hexColor = /^#([0-9A-Fa-f]{3}|[0-9A-Fa-f]{6})$/;
hexColor.test('#fff'); // true
hexColor.test('#2f855a'); // true
hexColor.test('#GGGGGG'); // false (G is not hex)
hexColor.test('2f855a'); // false (missing #)
Extract markdown links
const mdLinkRegex = /\[([^\]]+)\]\(([^)]+)\)/g;
const markdown = 'See [example](https://example.com) and [doc](docs.html).';
for (const match of markdown.matchAll(mdLinkRegex)) {
console.log(`Text: ${match[1]}, URL: ${match[2]}`);
}
// Text: example, URL: https://example.com
// Text: doc, URL: docs.html
Python regex
Python uses the re module:
import re
# Match (only at start of string)
re.match(r'\d+', '123abc') # match object
re.match(r'\d+', 'abc123') # None
# Search (anywhere in string)
re.search(r'\d+', 'abc123') # matches '123'
# Find all
re.findall(r'\d+', 'abc 123 def 456')
# → ['123', '456']
# Substitute
re.sub(r'\s+', '-', 'hello world foo')
# → 'hello-world-foo'
# With flags
re.findall(r'python', 'Python is PYTHON', re.IGNORECASE)
# → ['Python', 'PYTHON']
# Named groups
pattern = re.compile(r'(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})')
m = pattern.match('2026-05-30')
m.group('year') # '2026'
m.groupdict() # {'year': '2026', 'month': '05', 'day': '30'}
# Split
re.split(r'[,;\s]+', 'a,b;c d')
# → ['a', 'b', 'c', 'd']
Flags
| Flag | JavaScript | Python | Effect |
|---|---|---|---|
| Case insensitive | `/pattern/i` | `re.IGNORECASE` | `A` matches `a` |
| Global (find all) | `/pattern/g` | (default in `findall`) | Don't stop at first match |
| Multiline | `/pattern/m` | `re.MULTILINE` | `^` and `$` match line boundaries |
| Dotall | `/pattern/s` | `re.DOTALL` | `.` matches newlines |
| Extended | – | `re.VERBOSE` | Allows whitespace/comments in pattern |
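A short JavaScript sketch of the four flags in action. The sample string `logText` is made up for illustration, and the Python-only verbose mode is not shown.

```javascript
const logText = 'Error: disk full\nerror: retrying\nok';

// i + g: case-insensitive, and return every match instead of the first
const allErrors = logText.match(/error/gi);   // ['Error', 'error']

// Without g, match() stops at the first match
const firstError = logText.match(/error/i)[0]; // 'Error'

// m: ^ matches at the start of every line, not only the start of the string
const lineStarts = logText.match(/^\w+/gm);   // ['Error', 'error', 'ok']

// s: the dot also matches the newline between 'full' and 'error'
const withDotAll = /full.error/s.test(logText);   // true
const withoutDotAll = /full.error/.test(logText); // false

console.log(allErrors, firstError, lineStarts, withDotAll, withoutDotAll);
```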
Lookahead and lookbehind
// Positive lookahead: match 'price' only if followed by '$'
/price(?=\$)/.test('price$100') // true
/price(?=\$)/.test('price100') // false
// Negative lookahead: match 'price' NOT followed by '$'
/price(?!\$)/.test('price100') // true
// Positive lookbehind: match digits preceded by '$'
/(?<=\$)\d+/.exec('$100') // matches '100'
// Negative lookbehind
/(?<!\$)\d+/.exec('100') // matches '100' (no $ before)
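One subtlety with negative lookbehind: the engine is free to start the match one character later, so the pattern above still finds digits inside `$100`. The sketch below shows the effect and one possible fix, which is an illustrative choice rather than the only option. It assumes an engine with lookbehind support (ES2018).

```javascript
// The naive pattern skips the '1' (it follows '$') but then matches '00',
// because '0' is preceded by '1', not by '$'.
const naive = /(?<!\$)\d+/;
console.log(naive.exec('$100')[0]); // '00'

// Also forbid a preceding digit, so a match cannot start mid-number.
const strict = /(?<![$\d])\d+/;
console.log(strict.exec('$100'));               // null
console.log(strict.exec('qty 100')[0]);         // '100'
console.log(strict.exec('$5 for 12 items')[0]); // '12'
```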
Common gotchas
Greedy vs lazy:
'<a><b><c>'.match(/<.+>/)[0] // '<a><b><c>' (greedy — matches as much as possible)
'<a><b><c>'.match(/<.+?>/)[0] // '<a>' (lazy — matches as little as possible)
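A common alternative to the lazy quantifier is a negated character class, which states directly what may appear between the delimiters. A brief sketch comparing the two, with made-up sample strings:

```javascript
const tagText = '<a><b><c>';

// Both approaches find each tag separately when used with the g flag
const lazyTags = tagText.match(/<.+?>/g);      // ['<a>', '<b>', '<c>']
const negatedTags = tagText.match(/<[^>]+>/g); // ['<a>', '<b>', '<c>']

// They differ when the tag spans a line break: the dot does not match
// a newline by default, but [^>] does.
const multiline = '<a\nhref="x">';
const lazyResult = multiline.match(/<.+?>/);       // null
const negatedResult = multiline.match(/<[^>]+>/);  // matches the whole tag

console.log(lazyTags, negatedTags, lazyResult, negatedResult[0]);
```

The negated class also avoids backtracking on long inputs, since the engine never has to reconsider how far the quantifier should extend.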
Escaping special characters:
// To match a literal '.', escape it
'1.2.3'.split('.') // ['1', '2', '3'] — string split, works
'1.2.3'.split(/\./) // ['1', '2', '3'] — regex, escaped dot
'1.2.3'.split(/./) // ['', '', '', '', '', ''] — unescaped, matches any char
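The same issue appears when a pattern is built from a variable with `new RegExp`: any special characters in the input are interpreted as regex syntax. A minimal sketch of the usual workaround follows. The helper names `escapeRegExp` and `countOccurrences` are illustrative, not built-ins.

```javascript
// Hypothetical helper: backslash-escape every regex metacharacter.
// '$&' in the replacement string inserts the matched character itself.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Hypothetical helper: count literal occurrences of needle in haystack.
// Assumes needle is a non-empty string.
function countOccurrences(haystack, needle) {
  const pattern = new RegExp(escapeRegExp(needle), 'g');
  return (haystack.match(pattern) ?? []).length;
}

console.log(escapeRegExp('1.2'));                      // '1\.2'
console.log(countOccurrences('1.2.3 and 1x2', '1.2')); // 1
// Without escaping, the dot matches any character, so '1x2' counts too:
console.log('1.2.3 and 1x2'.match(new RegExp('1.2', 'g')).length); // 2
```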
The regex101.com shortcut: For complex patterns, paste into regex101.com — it shows match groups, flags, and explanations inline, and lets you test against multiple strings.
Regex has a reputation for being hard to read, but for the common cases (validation, extraction, replacement), simple patterns cover most of the work. Start with the simplest pattern that works and add complexity only when the simpler version fails.