Summary
Regex is search on steroids. This post demonstrates how to create powerful searches by example.
Examples start basic and build up to more complex expressions. They are designed for devs who search via their IDE.
We'll use this text throughout (demo):
Alabama (AL) Birmingham (Dec 14, 1819) 4,903,185
Hawaii* (HI) Honolulu (Aug 21, 1959) 1,415,872
Michigan (MI) Detroit (Jan 26, 1837) 9,986,857
North Dakota (ND) Fargo (Nov 2, 1889) 762,062
Wyoming* (WY) Cheyenne (Jul 10, 1890) 578,759
(state, id, largest city, founding date, population)
How to use this guide
- Browse quickly by skimming the picture/title. There are many examples.
- If you're a beginner, start at the top
- Find more advanced/interesting examples further down
- Examples next to each other are related
- Experiment by testing the examples yourself
- Find a cheat sheet at the bottom
Setup
Follow along here or via your IDE:
- Open a modern IDE (I use VSCode)
- Paste the example
- Open search (ctrl + f or cmd + f)
- Enable regex (usually a .* icon)
Series
This is the first part of a short series. Later guides will cover real case studies and more useful concepts like regex replace.
Follow the newsletter or hit me up on Twitter for more.
Basic matches
Letters [a-zA-Z]
- [a-z] lowercase letters
- [A-Z] uppercase letters
- Casing only matters when the 'match case' (Aa) option is enabled in VSCode
- 102 matches found because it's matching a-z characters
Words [a-zA-Z]+
- [a-zA-Z] letters
- + repeats the match for consecutive characters
- 21 matches found because it's matching a-z words
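The two counts above can be checked outside the editor. This is a minimal sketch in Python, assuming the five demo lines are stored in a hypothetical DEMO constant (the caption line is left out); Python's re module stands in for the VSCode search box, so small engine differences are possible.

```python
import re

# Hypothetical constant holding the five demo lines from the top of the post.
DEMO = """\
Alabama (AL) Birmingham (Dec 14, 1819) 4,903,185
Hawaii* (HI) Honolulu (Aug 21, 1959) 1,415,872
Michigan (MI) Detroit (Jan 26, 1837) 9,986,857
North Dakota (ND) Fargo (Nov 2, 1889) 762,062
Wyoming* (WY) Cheyenne (Jul 10, 1890) 578,759
"""

letters = re.findall(r"[a-zA-Z]", DEMO)  # one match per single letter
words = re.findall(r"[a-zA-Z]+", DEMO)   # one match per run of letters

print(len(letters))  # 102
print(len(words))    # 21
print(words[:4])     # ['Alabama', 'AL', 'Birmingham', 'Dec']
```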
Specific words (Jan|Jul|Dec)
- (Jan|Jul|Dec) matches Jan, Jul, or Dec specifically
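To see the alternation at work, here is a small Python sketch. The DEMO constant is a hypothetical name for the demo text, and the search is case-sensitive here, unlike VSCode's default.

```python
import re

# Hypothetical constant holding the five demo lines from the top of the post.
DEMO = """\
Alabama (AL) Birmingham (Dec 14, 1819) 4,903,185
Hawaii* (HI) Honolulu (Aug 21, 1959) 1,415,872
Michigan (MI) Detroit (Jan 26, 1837) 9,986,857
North Dakota (ND) Fargo (Nov 2, 1889) 762,062
Wyoming* (WY) Cheyenne (Jul 10, 1890) 578,759
"""

# With a single group, re.findall returns what the group captured,
# which in this pattern is the whole match.
months = re.findall(r"(Jan|Jul|Dec)", DEMO)

print(months)  # ['Dec', 'Jan', 'Jul']
```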
2 numbers [0-9]{2}
- [0-9] numbers
- {2} match twice
- Note long numbers contain multiple matches
4 numbers [0-9]{4}
- [0-9] numbers
- {4} match 4 times
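The sketch below, in Python with a hypothetical DEMO constant for the demo text, shows how a fixed repeat count splits long numbers, and why {4} only picks up the years here.

```python
import re

# Hypothetical constant holding the five demo lines from the top of the post.
DEMO = """\
Alabama (AL) Birmingham (Dec 14, 1819) 4,903,185
Hawaii* (HI) Honolulu (Aug 21, 1959) 1,415,872
Michigan (MI) Detroit (Jan 26, 1837) 9,986,857
North Dakota (ND) Fargo (Nov 2, 1889) 762,062
Wyoming* (WY) Cheyenne (Jul 10, 1890) 578,759
"""

# A 4-digit year contains two separate 2-digit matches.
pairs_in_year = re.findall(r"[0-9]{2}", "1819")

# The commas break the populations into runs of at most 3 digits,
# so only the years are long enough for {4}.
years = re.findall(r"[0-9]{4}", DEMO)

print(pairs_in_year)  # ['18', '19']
print(years)          # ['1819', '1959', '1837', '1889', '1890']
```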
2-3 letters [a-z]{2,3}
- [a-z] letters
- {2,3} match between 2 and 3 times (inclusive)
- Note long words contain multiple matches
6+ letters [a-z]{6,}
- [a-z] letters
- {6,} match 6 or more times (inclusive)
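A rough Python equivalent of the two range examples. DEMO is a hypothetical constant for the demo text, and re.IGNORECASE is assumed to stand in for VSCode's search with 'match case' turned off.

```python
import re

# Hypothetical constant holding the five demo lines from the top of the post.
DEMO = """\
Alabama (AL) Birmingham (Dec 14, 1819) 4,903,185
Hawaii* (HI) Honolulu (Aug 21, 1959) 1,415,872
Michigan (MI) Detroit (Jan 26, 1837) 9,986,857
North Dakota (ND) Fargo (Nov 2, 1889) 762,062
Wyoming* (WY) Cheyenne (Jul 10, 1890) 578,759
"""

# {2,3} is greedy, so a long word is cut into chunks of 3 where possible;
# the final lone "a" is too short to match.
chunks = re.findall(r"[a-z]{2,3}", "Alabama", re.IGNORECASE)

# {6,} has no upper bound, so each long word is a single match.
long_words = re.findall(r"[a-z]{6,}", DEMO, re.IGNORECASE)

print(chunks)      # ['Ala', 'bam']
print(long_words)  # ['Alabama', 'Birmingham', 'Hawaii', 'Honolulu', 'Michigan',
                   #  'Detroit', 'Dakota', 'Wyoming', 'Cheyenne']
```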
3 letters/numbers \w{3}
- \w letters, numbers, and underscore (see special chars)
- {3} match 3 times
- Note long words contain multiple matches
3 whole letters/numbers \b\w{3}\b
- \w{3} match 3 letters and numbers
- \b word boundaries (see special chars)
- Note long words don't contain multiple matches
3 whole letter words \b[a-z]{3}\b
- [a-z]{3} match 3 letters
- \b word boundaries
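The effect of \b is easiest to see side by side. This is a minimal Python sketch, assuming a hypothetical DEMO constant for the demo text and re.IGNORECASE in place of VSCode's case-insensitive default.

```python
import re

# Hypothetical constant holding the five demo lines from the top of the post.
DEMO = """\
Alabama (AL) Birmingham (Dec 14, 1819) 4,903,185
Hawaii* (HI) Honolulu (Aug 21, 1959) 1,415,872
Michigan (MI) Detroit (Jan 26, 1837) 9,986,857
North Dakota (ND) Fargo (Nov 2, 1889) 762,062
Wyoming* (WY) Cheyenne (Jul 10, 1890) 578,759
"""

# Without boundaries, long words and numbers are cut into 3-character pieces.
pieces = re.findall(r"\w{3}", "Birmingham 1819")

# With boundaries, only tokens that are exactly 3 characters long match.
whole_tokens = re.findall(r"\b\w{3}\b", DEMO)
whole_words = re.findall(r"\b[a-z]{3}\b", DEMO, re.IGNORECASE)

print(pieces)            # ['Bir', 'min', 'gha', '181']
print(whole_tokens[:3])  # ['Dec', '903', '185']
print(whole_words)       # ['Dec', 'Aug', 'Jan', 'Nov', 'Jul']
```

The comma-separated population groups such as 903 count as whole tokens because a comma is a word boundary.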
Two words [a-zA-Z]+\s[a-zA-Z]+
- Looks scarier than it is. The form is word space word
- [a-zA-Z]+ word
- \s space (see special chars)
One or two words [a-zA-Z]+(\s[a-zA-Z]+)?
- Looks scarier than it is. The form is word (space word)?
- [a-zA-Z]+ word
- \s space
- ( ... )? optional
- Note North Dakota is considered one match now
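Here is how the two patterns compare on the demo text, sketched in Python with a hypothetical DEMO constant. re.finditer is used instead of re.findall so the optional group does not change what is returned.

```python
import re

# Hypothetical constant holding the five demo lines from the top of the post.
DEMO = """\
Alabama (AL) Birmingham (Dec 14, 1819) 4,903,185
Hawaii* (HI) Honolulu (Aug 21, 1959) 1,415,872
Michigan (MI) Detroit (Jan 26, 1837) 9,986,857
North Dakota (ND) Fargo (Nov 2, 1889) 762,062
Wyoming* (WY) Cheyenne (Jul 10, 1890) 578,759
"""

# group(0) is always the full match, regardless of capture groups.
two_words = [m.group(0) for m in re.finditer(r"[a-zA-Z]+\s[a-zA-Z]+", DEMO)]
one_or_two = [m.group(0) for m in re.finditer(r"[a-zA-Z]+(\s[a-zA-Z]+)?", DEMO)]

print(two_words)                     # ['North Dakota']
print(len(one_or_two))               # 20
print("North Dakota" in one_or_two)  # True
```

The count drops from 21 words to 20 matches because North and Dakota are now reported together.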
Wildcards
Everything in brackets (greedy) \(.*\)
- \( and \) match brackets (see special chars)
- .* greedy wildcard
- This greedy wildcard will match up to the last ) bracket
Everything in brackets (non-greedy) \(.*?\)
- \( and \) match brackets
- .*? non-greedy wildcard
- This non-greedy wildcard will match up to the first ) bracket
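The difference between the two wildcards shows up clearly on a single demo line. A minimal Python sketch follows; the variable names are illustrative only.

```python
import re

line = "Alabama (AL) Birmingham (Dec 14, 1819) 4,903,185"

# Greedy: .* runs to the end of the line, then backs up to the last ")".
greedy = re.findall(r"\(.*\)", line)

# Non-greedy: .*? stops at the first ")" it can reach.
lazy = re.findall(r"\(.*?\)", line)

print(greedy)  # ['(AL) Birmingham (Dec 14, 1819)']
print(lazy)    # ['(AL)', '(Dec 14, 1819)']
```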
Lines with the * character ^.*\*.*$
- ^ and $ match the start/end of the line (optional)
- .* wildcard
- \* the star * character (see special chars)
Lines without the * character ^[^\*]+$
- ^ and $ match the start/end of the line
- [^ ... ] matches anything not in the brackets
  - \* the star * character
  - [^\*] matches anything not a * character
- + repeats the match for consecutive characters
All lines with the e character ^.*[e].*$
- ^ and $ match the start/end of the line
- .* wildcard
- [e] the letter e
All lines without the e character ^[^e]+$
- ^ and $ match the start/end of the line
- [^ ... ] matches anything not in the brackets
  - [^e] matches anything not an e character
- + repeats the match for consecutive characters
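The four line filters can be tried together. This Python sketch makes two assumptions: DEMO is a hypothetical constant for the demo text, and each line is tested on its own to mimic the editor's line-by-line search. In Python a negated set like [^e] also matches newlines, so running it over the whole text at once could span several lines. The states helper is hypothetical and only shortens the output.

```python
import re

# Hypothetical constant holding the five demo lines from the top of the post.
DEMO = """\
Alabama (AL) Birmingham (Dec 14, 1819) 4,903,185
Hawaii* (HI) Honolulu (Aug 21, 1959) 1,415,872
Michigan (MI) Detroit (Jan 26, 1837) 9,986,857
North Dakota (ND) Fargo (Nov 2, 1889) 762,062
Wyoming* (WY) Cheyenne (Jul 10, 1890) 578,759
"""

def states(pattern):
    """Return the state column of every demo line that matches the pattern."""
    return [line.split(" (")[0]
            for line in DEMO.splitlines()
            if re.search(pattern, line)]

with_star = states(r"^.*\*.*$")
without_star = states(r"^[^\*]+$")
with_e = states(r"^.*[e].*$")
without_e = states(r"^[^e]+$")

print(with_star)     # ['Hawaii*', 'Wyoming*']
print(without_star)  # ['Alabama', 'Michigan', 'North Dakota']
print(with_e)        # ['Alabama', 'Michigan', 'Wyoming*']
print(without_e)     # ['Hawaii*', 'North Dakota']
```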
Brackets starting with certain words \((Jan|Jul|Dec).*\)
- \( and \) match brackets
- (Jan|Jul|Dec) matches Jan, Jul, or Dec words
- .* wildcard
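A quick Python check of this pattern, with DEMO as a hypothetical constant for the demo text. re.finditer with group(0) returns the full bracketed match and not just the month captured by the inner group.

```python
import re

# Hypothetical constant holding the five demo lines from the top of the post.
DEMO = """\
Alabama (AL) Birmingham (Dec 14, 1819) 4,903,185
Hawaii* (HI) Honolulu (Aug 21, 1959) 1,415,872
Michigan (MI) Detroit (Jan 26, 1837) 9,986,857
North Dakota (ND) Fargo (Nov 2, 1889) 762,062
Wyoming* (WY) Cheyenne (Jul 10, 1890) 578,759
"""

pattern = re.compile(r"\((Jan|Jul|Dec).*\)")
dates = [m.group(0) for m in pattern.finditer(DEMO)]

print(dates)  # ['(Dec 14, 1819)', '(Jan 26, 1837)', '(Jul 10, 1890)']
```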
Mixed matches
The short date in brackets [a-z]{3}\s+[0-9]+
- [a-z]{3} 3 letters exactly
- \s+ one or more spaces
- [0-9]+ one or more numbers
The date in brackets [a-z]{3}\s+[0-9]+,\s[0-9]+
- Looks scarier than it is. The form is word number, number
- [a-z]{3} 3 letters exactly
- \s+ one or more spaces
- , comma
- [0-9]+ one or more numbers
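Both date patterns can be run against the demo text. The Python sketch below assumes a hypothetical DEMO constant and uses re.IGNORECASE to mirror VSCode's case-insensitive default, since the month names start with a capital.

```python
import re

# Hypothetical constant holding the five demo lines from the top of the post.
DEMO = """\
Alabama (AL) Birmingham (Dec 14, 1819) 4,903,185
Hawaii* (HI) Honolulu (Aug 21, 1959) 1,415,872
Michigan (MI) Detroit (Jan 26, 1837) 9,986,857
North Dakota (ND) Fargo (Nov 2, 1889) 762,062
Wyoming* (WY) Cheyenne (Jul 10, 1890) 578,759
"""

short_dates = re.findall(r"[a-z]{3}\s+[0-9]+", DEMO, re.IGNORECASE)
full_dates = re.findall(r"[a-z]{3}\s+[0-9]+,\s[0-9]+", DEMO, re.IGNORECASE)

print(short_dates)  # ['Dec 14', 'Aug 21', 'Jan 26', 'Nov 2', 'Jul 10']
print(full_dates)   # ['Dec 14, 1819', 'Aug 21, 1959', 'Jan 26, 1837',
                    #  'Nov 2, 1889', 'Jul 10, 1890']
```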
Words with m (in the middle) [a-z]+[m][a-z]+
- [a-z]+ one or more letters
- [m] the letter m
- Note this doesn't match Michigan because m is at the start of the word
Words with m (anywhere) ([a-z]+)?[m]([a-z]+)?
- Looks scarier than it is. The form is (word)? m (word)?
- ( ... )? optional
  - [a-z]+ a word
  - ([a-z]+)? an optional word
- [m] the letter m
- Note m can be anywhere in the word so Michigan is matched now
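To compare the two m patterns, here is a small Python sketch. DEMO is a hypothetical constant for the demo text, and re.IGNORECASE is assumed in place of VSCode's case-insensitive default, which is why the capital M in Michigan counts.

```python
import re

# Hypothetical constant holding the five demo lines from the top of the post.
DEMO = """\
Alabama (AL) Birmingham (Dec 14, 1819) 4,903,185
Hawaii* (HI) Honolulu (Aug 21, 1959) 1,415,872
Michigan (MI) Detroit (Jan 26, 1837) 9,986,857
North Dakota (ND) Fargo (Nov 2, 1889) 762,062
Wyoming* (WY) Cheyenne (Jul 10, 1890) 578,759
"""

# m must have at least one letter on each side.
middle = re.findall(r"[a-z]+[m][a-z]+", DEMO, re.IGNORECASE)

# Both sides are optional; finditer + group(0) returns the whole word
# instead of the contents of the two capture groups.
anywhere = [m.group(0)
            for m in re.finditer(r"([a-z]+)?[m]([a-z]+)?", DEMO, re.IGNORECASE)]

print(middle)    # ['Alabama', 'Birmingham', 'Wyoming']
print(anywhere)  # ['Alabama', 'Birmingham', 'Michigan', 'MI', 'Wyoming']
```

The state id MI is picked up as well, because it also contains an m once case is ignored.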
Exclusive matches
Match expressions but exclude them from the result. Officially known as 'lookarounds' (lookbehind and lookahead).
Word in brackets (inclusive) \([a-z]+\)
- Note the word is matched with the brackets
- \( and \) match brackets
- [a-z]+ a word
Word in brackets (exclusive) (?<=\()[a-z]+(?=\))
- Note the word is matched without the brackets
- [a-z]+ a word
- (?<= ... ) starts a match but excludes it from the result
  - \( the bracket ( character
  - (?<=\() matches from bracket ( without including it
- (?= ... ) ends a match but excludes it from the result
  - \) the bracket ) character
  - (?=\)) matches up to bracket ) without including it
Everything in brackets (exclusive) (?<=\().*?(?=\))
- (?<=\() matches from bracket ( without including it
- .*? non-greedy wildcard
- (?=\)) matches up to bracket ) without including it
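The inclusive and exclusive variants are compared below in a Python sketch. DEMO is a hypothetical constant for the demo text, and re.IGNORECASE stands in for VSCode's case-insensitive default.

```python
import re

# Hypothetical constant holding the five demo lines from the top of the post.
DEMO = """\
Alabama (AL) Birmingham (Dec 14, 1819) 4,903,185
Hawaii* (HI) Honolulu (Aug 21, 1959) 1,415,872
Michigan (MI) Detroit (Jan 26, 1837) 9,986,857
North Dakota (ND) Fargo (Nov 2, 1889) 762,062
Wyoming* (WY) Cheyenne (Jul 10, 1890) 578,759
"""

# The brackets are part of the match.
inclusive = re.findall(r"\([a-z]+\)", DEMO, re.IGNORECASE)

# The brackets must be present but are left out of the match.
exclusive = re.findall(r"(?<=\()[a-z]+(?=\))", DEMO, re.IGNORECASE)

# Same idea with a non-greedy wildcard, tried on the first demo line only.
first_line = DEMO.splitlines()[0]
everything = re.findall(r"(?<=\().*?(?=\))", first_line)

print(inclusive)   # ['(AL)', '(HI)', '(MI)', '(ND)', '(WY)']
print(exclusive)   # ['AL', 'HI', 'MI', 'ND', 'WY']
print(everything)  # ['AL', 'Dec 14, 1819']
```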
Everything in brackets on lines with * (exclusive) (?<=\*.*\().*?(?=\))
- (?<= ... ) starts a match but excludes it
  - \* the star * character
  - .* wildcard
  - \( the bracket ( character
  - (?<=\*.*\() wildcard from * to ( without including them
- .*? non-greedy wildcard
- (?=\)) matches up to ) without including it
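This pattern relies on a lookbehind of variable length (the .* inside it), which not every regex engine accepts. Python's built-in re module is one that rejects it, so the sketch below uses a workaround and does not run the pattern itself: it finds the star with plain string code, then applies the simpler exclusive pattern to the rest of the line. DEMO and the helper name are hypothetical.

```python
import re

# Hypothetical constant holding the five demo lines from the top of the post.
DEMO = """\
Alabama (AL) Birmingham (Dec 14, 1819) 4,903,185
Hawaii* (HI) Honolulu (Aug 21, 1959) 1,415,872
Michigan (MI) Detroit (Jan 26, 1837) 9,986,857
North Dakota (ND) Fargo (Nov 2, 1889) 762,062
Wyoming* (WY) Cheyenne (Jul 10, 1890) 578,759
"""

try:
    re.compile(r"(?<=\*.*\().*?(?=\))")
    supported = True
except re.error:
    # Python's re module requires fixed-width lookbehinds.
    supported = False

def bracket_contents_after_star(text):
    """Collect bracket contents that appear after a * on the same line."""
    results = []
    for line in text.splitlines():
        star = line.find("*")
        if star == -1:
            continue  # no star on this line
        results.extend(re.findall(r"(?<=\().*?(?=\))", line[star + 1:]))
    return results

starred = bracket_contents_after_star(DEMO)

print(supported)  # False
print(starred)    # ['HI', 'Aug 21, 1959', 'WY', 'Jul 10, 1890']
```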
Everything up to * (exclusive) ^.*(?=\*)
- ^ start of a line
- .* wildcard
- (?=\*) matches up to * without including it
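A short Python sketch of the same lookahead. DEMO is a hypothetical constant for the demo text, and re.MULTILINE is needed so that ^ matches at the start of every line and not only at the start of the text.

```python
import re

# Hypothetical constant holding the five demo lines from the top of the post.
DEMO = """\
Alabama (AL) Birmingham (Dec 14, 1819) 4,903,185
Hawaii* (HI) Honolulu (Aug 21, 1959) 1,415,872
Michigan (MI) Detroit (Jan 26, 1837) 9,986,857
North Dakota (ND) Fargo (Nov 2, 1889) 762,062
Wyoming* (WY) Cheyenne (Jul 10, 1890) 578,759
"""

before_star = re.findall(r"^.*(?=\*)", DEMO, re.MULTILINE)

print(before_star)  # ['Hawaii', 'Wyoming']
```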
Cheat sheet
. ^ $ * + ? ( ) [ { \ | reserved characters
- Escape with \
- (abc) matches abc (in a regex group)
- \(abc\) matches (abc) (with brackets)
[a-zA-Z] letters (case-sensitive)
[0-9] or \d match numbers
[a-c1-3#] matches characters a b c 1 2 3 #
.* greedy wildcard. .*? non-greedy wildcard.
^ start of line. $ end of line.
\s any whitespace. \t tab. \n new line.
\w letters, numbers, and underscore. \W anything else.
\b word boundary. \B not a word boundary.
+ repeat match 1 or more times
{3} repeat match exactly 3 times
{1,3} repeat match 1, 2, or 3 times
{3,} repeat match 3+ times
[^ ... ] match all but given characters
(?<= ... ) start match with given characters and exclude them (look behind)
(?= ... ) end match with given characters and exclude them (look ahead)
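A few of the cheat sheet entries can be tried out in Python. This is a minimal sketch with illustrative variable names; re.escape is Python's own helper for escaping reserved characters and is not something VSCode's search provides.

```python
import re

# Escaping: re.escape adds the backslashes for reserved characters.
escaped = re.escape("(abc)")

# Unescaped brackets form a group; escaped brackets match literally.
group_match = re.findall(r"(abc)", "x(abc)y")
literal_match = re.findall(r"\(abc\)", "x(abc)y")

# A character set mixing ranges and a single character.
in_set = re.findall(r"[a-c1-3#]", "a d 2 9 #")

print(escaped)        # \(abc\)
print(group_match)    # ['abc']
print(literal_match)  # ['(abc)']
print(in_set)         # ['a', '2', '#']
```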