Powerful regex for the practical dev

Scotty (@kanga_bru) ・6 min read

Summary

Regex is search on steroids. This post demonstrates how to create powerful searches by example.

Examples start basic and build up to more complex expressions. They are designed for devs who search via their IDE.

We'll use this text throughout:

Alabama       (AL)  Birmingham  (Dec 14, 1819)  4,903,185
Hawaii*       (HI)  Honolulu    (Aug 21, 1959)  1,415,872
Michigan      (MI)  Detroit     (Jan 26, 1837)  9,986,857
North Dakota  (ND)  Fargo       (Nov  2, 1889)    762,062
Wyoming*      (WY)  Cheyenne    (Jul 10, 1890)    578,759

(state, id, largest city, founding date, population)


How to use this guide

  • Browse quickly by skimming the titles. There are many examples.
  • If you're a beginner, start at the top
  • Find more advanced/interesting examples further down
  • Examples next to each other are related
  • Experiment by testing the examples yourself
  • Find a cheat sheet at the bottom

Setup

Follow along in your IDE:

  • Open a modern IDE (I use VSCode)
  • Paste the example
  • Open search (ctrl + f or cmd + f)
  • Enable regex (usually a .* icon)

Series

This is the first part of a short series. Later guides will cover real case studies and more useful concepts like regex replace.

Follow the newsletter or hit me up on Twitter for more.


Basic matches

Letters [a-zA-Z]

  • [a-z] lowercase letters
  • [A-Z] uppercase letters
  • Casing only matters when the 'match case' (Aa) option is enabled in VSCode
  • 102 matches found because every single letter is its own match

Words [a-zA-Z]+

  • [a-zA-Z] letters
  • + repeats the match for consecutive characters
  • 21 matches found because each run of consecutive letters (a word) is one match
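
If you'd rather check the two counts above in a script than in the IDE, here is a minimal Python sketch. The names `TEXT` and `find` are my own for illustration, and `re.IGNORECASE` is an assumption standing in for VSCode's 'match case' option being switched off.

```python
import re

# The demo text from the top of this post.
TEXT = """\
Alabama       (AL)  Birmingham  (Dec 14, 1819)  4,903,185
Hawaii*       (HI)  Honolulu    (Aug 21, 1959)  1,415,872
Michigan      (MI)  Detroit     (Jan 26, 1837)  9,986,857
North Dakota  (ND)  Fargo       (Nov  2, 1889)    762,062
Wyoming*      (WY)  Cheyenne    (Jul 10, 1890)    578,759
"""


def find(pattern: str, text: str = TEXT) -> list[str]:
    """Return every match of `pattern` in `text` as a list of strings."""
    # re.MULTILINE lets ^ and $ match at every line, like an editor search.
    flags = re.IGNORECASE | re.MULTILINE
    return [m.group(0) for m in re.finditer(pattern, text, flags)]


letters = find(r"[a-z]")      # one match per letter
words = find(r"[a-zA-Z]+")    # one match per run of letters

print(len(letters), len(words))
```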

Specific words (Jan|Jul|Dec)

  • (Jan|Jul|Dec) matches Jan, Jul, or Dec specifically
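
As a rough illustration in Python (my choice of language, not something the IDE requires), the alternation finds a month on one demo line and nothing on another:

```python
import re

michigan = "Michigan      (MI)  Detroit     (Jan 26, 1837)  9,986,857"
hawaii = "Hawaii*       (HI)  Honolulu    (Aug 21, 1959)  1,415,872"

# (Jan|Jul|Dec) tries each alternative in turn at every position.
found = re.search(r"(Jan|Jul|Dec)", michigan)
print(found.group(0))

# Aug is not one of the alternatives, so this line has no match.
missing = re.search(r"(Jan|Jul|Dec)", hawaii)
print(missing)
```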

2 numbers [0-9]{2}

  • [0-9] digits
  • {2} match exactly 2 in a row
  • Note long numbers contain multiple matches

4 numbers [0-9]{4}

  • [0-9] digits
  • {4} match exactly 4 in a row

2-3 letters [a-z]{2,3}

  • [a-z] letters
  • {2,3} match 2 or 3 times (inclusive)
  • Note long words contain multiple matches

6+ letters [a-z]{6,}

  • [a-z] letters
  • {6,} match 6 or more times (inclusive)
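
The four repeat counts above ({2}, {4}, {2,3} and {6,}) can be compared side by side. This is a small Python sketch on short strings that I picked for illustration, rather than on the full demo text:

```python
import re

# {2}: exactly two digits per match, so a long number yields several matches.
pairs = re.findall(r"[0-9]{2}", "1819")

# {4}: exactly four digits per match.
years = re.findall(r"[0-9]{4}", "(Dec 14, 1819)")

# {2,3}: two or three letters, taking three whenever three are available.
chunks = re.findall(r"[a-z]{2,3}", "cheyenne")

# {6,}: six or more letters, so shorter words are skipped.
long_words = re.findall(r"[a-z]{6,}", "north dakota fargo")

print(pairs, years, chunks, long_words)
```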

3 letters/numbers \w{3}

  • \w letters, numbers, and the underscore _ (see special chars)
  • {3} match 3 times
  • Note long words contain multiple matches

3 whole letters/numbers \b\w{3}\b

  • \w{3} match 3 letters, numbers, or underscores
  • \b word boundary (see special chars)
  • Note long words don't contain multiple matches

3 whole letter words \b[a-z]{3}\b

  • [a-z]{3} match 3 letters
  • \b word boundaries
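
To see what the \b boundaries change, here is a short Python sketch run against the first demo line. `re.IGNORECASE` is my stand-in for leaving 'match case' off in the IDE.

```python
import re

line = "Alabama       (AL)  Birmingham  (Dec 14, 1819)  4,903,185"

# Without boundaries, \w{3} slices a long word into three-character chunks.
sliced = re.findall(r"\w{3}", "Birmingham")

# With \b on both sides, only standalone three-character runs match.
whole = re.findall(r"\b\w{3}\b", line)

# Restricting to letters leaves just the month abbreviation.
whole_letters = re.findall(r"\b[a-z]{3}\b", line, re.IGNORECASE)

print(sliced, whole, whole_letters)
```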

Two words [a-zA-Z]+\s[a-zA-Z]+

  • Looks scarier than it is. The form is word space word
  • [a-zA-Z]+ word
  • \s a whitespace character such as a space (see special chars)

One or two words [a-zA-Z]+(\s[a-zA-Z]+)?

  • Looks scarier than it is. The form is word (space word)?
  • [a-zA-Z]+ word
  • \s a whitespace character such as a space
  • ( ... )? optional
  • Note North Dakota is considered one match now
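
A quick Python sketch of the difference the optional group makes, using the North Dakota line. I've swapped the plain group for a non-capturing one, `(?: ... )?`, purely because Python's `re.findall` would otherwise return only the group's contents; the matching behaviour is the same.

```python
import re

line = "North Dakota  (ND)  Fargo       (Nov  2, 1889)    762,062"

# Plain word match: North and Dakota are separate matches.
words = re.findall(r"[a-zA-Z]+", line)

# Word, optionally followed by one whitespace character and another word.
one_or_two = re.findall(r"[a-zA-Z]+(?:\s[a-zA-Z]+)?", line)

print(words)
print(one_or_two)
```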

Wildcards

Everything in brackets (greedy) \(.*\)

  • \( and \) match brackets (see special chars)
  • .* greedy wildcard
  • This greedy wildcard will match up to the last ) bracket on the line

Everything in brackets (non-greedy) \(.*?\)

  • \( and \) match brackets
  • .*? non-greedy wildcard
  • This non-greedy wildcard will match up to the first ) bracket it finds
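
Greedy versus non-greedy is easiest to see on a single line. A minimal Python sketch, using the first demo line:

```python
import re

line = "Alabama       (AL)  Birmingham  (Dec 14, 1819)  4,903,185"

# Greedy: .* grabs as much as it can, so the match runs to the last ).
greedy = re.findall(r"\(.*\)", line)

# Non-greedy: .*? grabs as little as it can, so each bracket pair is separate.
lazy = re.findall(r"\(.*?\)", line)

print(greedy)
print(lazy)
```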

Lines with the * character ^.*\*.*$


Lines without the * character ^[^\*]+$

  • ^ and $ match the start/end of the line
  • [^ ... ] matches anything not in the brackets
    • \* the star * character (the \ is optional inside [ ])
    • [^\*] matches anything not a * character
  • + repeats the match for consecutive characters
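
Here is a rough Python sketch of the two line filters above. It tests one line at a time, which is my assumption about how the IDE search treats lines; in Python, `[^\*]` would also match a newline if the whole text were searched in one go.

```python
import re

TEXT = """\
Alabama       (AL)  Birmingham  (Dec 14, 1819)  4,903,185
Hawaii*       (HI)  Honolulu    (Aug 21, 1959)  1,415,872
Michigan      (MI)  Detroit     (Jan 26, 1837)  9,986,857
North Dakota  (ND)  Fargo       (Nov  2, 1889)    762,062
Wyoming*      (WY)  Cheyenne    (Jul 10, 1890)    578,759
"""

lines = TEXT.splitlines()

# Lines that contain a * somewhere.
with_star = [line for line in lines if re.search(r"^.*\*.*$", line)]

# Lines made up entirely of characters that are not *.
without_star = [line for line in lines if re.search(r"^[^\*]+$", line)]

# Print just the first word of each line to keep the output short.
print([line.split()[0] for line in with_star])
print([line.split()[0] for line in without_star])
```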

All lines with the e character ^.*[e].*$

  • ^ and $ match the start/end of the line
  • .* wildcard
  • [e] the letter e

All lines without the e character ^[^e]+$

  • ^ and $ match the start/end of the line
  • [^ ... ] matches anything not in the brackets
    • [^e] matches anything not an e character
  • + repeats the match for consecutive characters

Brackets starting with certain words \((Jan|Jul|Dec).*\)

  • \( and \) match brackets
  • (Jan|Jul|Dec) matches Jan, Jul, or Dec words
  • .* wildcard

Mixed matches

The short date in brackets [a-z]{3}\s+[0-9]+

  • [a-z]{3} 3 letters exactly
  • \s+ one or more spaces
  • [0-9]+ one or more numbers

The date in brackets [a-z]{3}\s+[0-9]+,\s[0-9]+

  • Looks scarier than it is. The form is word number, number
  • [a-z]{3} 3 letters exactly
  • \s+ one or more spaces
  • , comma
  • [0-9]+ one or more numbers
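
The full date pattern can be tried in Python like this. `re.IGNORECASE` is my stand-in for leaving 'match case' off, since [a-z] alone would not match the capital letter at the start of each month.

```python
import re

TEXT = """\
Alabama       (AL)  Birmingham  (Dec 14, 1819)  4,903,185
Hawaii*       (HI)  Honolulu    (Aug 21, 1959)  1,415,872
Michigan      (MI)  Detroit     (Jan 26, 1837)  9,986,857
North Dakota  (ND)  Fargo       (Nov  2, 1889)    762,062
Wyoming*      (WY)  Cheyenne    (Jul 10, 1890)    578,759
"""

# word, spaces, number, comma, space, number
pattern = re.compile(r"[a-z]{3}\s+[0-9]+,\s[0-9]+", re.IGNORECASE)
dates = pattern.findall(TEXT)

# \s+ is what lets "Nov  2" match despite its two spaces.
for date in dates:
    print(date)
```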

Words with m (in the middle) [a-z]+[m][a-z]+

  • [a-z]+ one or more letters
  • [m] the letter m
  • Note this doesn't match Michigan because m is at the start of the word

Words with m (anywhere) ([a-z]+)?[m]([a-z]+)?

  • Looks scarier than it is. The form is (word)? m (word)?
  • ( ... )? optional
    • [a-z]+ a word
    • ([a-z]+)? an optional word
  • [m] the letter m
  • Note m can be anywhere in the word so Michigan is matched now
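
Both m patterns side by side, as a Python sketch. Two assumptions of mine: `re.IGNORECASE` mimics 'match case' being off, and the optional groups are written as non-capturing `(?: ... )?` so that `re.findall` returns whole matches.

```python
import re

TEXT = """\
Alabama       (AL)  Birmingham  (Dec 14, 1819)  4,903,185
Hawaii*       (HI)  Honolulu    (Aug 21, 1959)  1,415,872
Michigan      (MI)  Detroit     (Jan 26, 1837)  9,986,857
North Dakota  (ND)  Fargo       (Nov  2, 1889)    762,062
Wyoming*      (WY)  Cheyenne    (Jul 10, 1890)    578,759
"""

# At least one letter is required on each side of the m.
middle = re.findall(r"[a-z]+[m][a-z]+", TEXT, re.IGNORECASE)

# Both sides are optional, so m may start or end the word.
anywhere = re.findall(r"(?:[a-z]+)?[m](?:[a-z]+)?", TEXT, re.IGNORECASE)

print(middle)
print(anywhere)
```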

Exclusive matches

Match expressions but exclude them from the result. These are officially known as lookarounds (lookbehind and lookahead).

Word in brackets (inclusive) \([a-z]+\)

  • Note the word is matched with the brackets
  • \( and \) match brackets
  • [a-z]+ a word

Word in brackets (exclusive) (?<=\()[a-z]+(?=\))

  • Note the word is matched without the brackets
  • [a-z]+ a word
  • (?<= ... ) starts a match but excludes it from the result
    • \( the bracket ( character
    • (?<=\() matches from bracket ( without including it
  • (?= ... ) ends a match but excludes it from the result
    • \) the bracket ) character
    • (?=\)) matches up to bracket ) without including it

Everything in brackets (exclusive) (?<=\().*?(?=\))

  • (?<=\() matches from bracket ( without including it
  • .*? non-greedy wildcard
  • (?=\)) matches up to bracket ) without including it
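
The inclusive and exclusive versions can be compared directly. A minimal Python sketch on the Hawaii line, with `re.IGNORECASE` again standing in for 'match case' being off:

```python
import re

line = "Hawaii*       (HI)  Honolulu    (Aug 21, 1959)  1,415,872"

# Inclusive: the brackets are part of the result.
inclusive = re.findall(r"\([a-z]+\)", line, re.IGNORECASE)

# Exclusive: the lookbehind and lookahead check for the brackets
# but leave them out of the result.
exclusive = re.findall(r"(?<=\()[a-z]+(?=\))", line, re.IGNORECASE)

# Swapping [a-z]+ for .*? picks up the date as well.
everything = re.findall(r"(?<=\().*?(?=\))", line)

print(inclusive, exclusive, everything)
```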

Everything in brackets on lines with * (exclusive)

(?<=\*.*\().*?(?=\))

  • (?<= ... ) starts a match but excludes it
    • \* the star * character
    • .* wildcard
    • \( the bracket ( character
    • (?<=\*.*\() wildcard from * to ( without including them
  • .*? non-greedy wildcard
  • (?=\)) matches up to ) without including it
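
One caveat if you try this outside the IDE: not every regex engine allows a wildcard inside a lookbehind. Python's built-in `re` module, for example, only accepts fixed-width lookbehinds and rejects `(?<=\*.*\()`. The sketch below is a workaround of my own, not a translation of the pattern: it splits each line at the first * and then applies the fixed-width version to what follows.

```python
import re

TEXT = """\
Alabama       (AL)  Birmingham  (Dec 14, 1819)  4,903,185
Hawaii*       (HI)  Honolulu    (Aug 21, 1959)  1,415,872
Michigan      (MI)  Detroit     (Jan 26, 1837)  9,986,857
North Dakota  (ND)  Fargo       (Nov  2, 1889)    762,062
Wyoming*      (WY)  Cheyenne    (Jul 10, 1890)    578,759
"""

# Fixed-width lookbehind, which Python's re module accepts.
inside_brackets = re.compile(r"(?<=\().*?(?=\))")

results = []
for line in TEXT.splitlines():
    # partition splits at the first *; `star` is empty if there is none.
    _, star, tail = line.partition("*")
    if star:
        # Only brackets that come after the * are considered.
        results.extend(inside_brackets.findall(tail))

print(results)
```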

Everything up to * (exclusive) ^.*(?=\*)

  • ^ start of a line
  • .* wildcard
  • (?=\*) matches up to * without including it
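
In Python this one needs `re.MULTILINE` so that ^ matches at the start of every line rather than only at the start of the text. A short sketch:

```python
import re

TEXT = """\
Alabama       (AL)  Birmingham  (Dec 14, 1819)  4,903,185
Hawaii*       (HI)  Honolulu    (Aug 21, 1959)  1,415,872
Michigan      (MI)  Detroit     (Jan 26, 1837)  9,986,857
North Dakota  (ND)  Fargo       (Nov  2, 1889)    762,062
Wyoming*      (WY)  Cheyenne    (Jul 10, 1890)    578,759
"""

# Everything from the start of a line up to, but not including, the *.
before_star = re.findall(r"^.*(?=\*)", TEXT, re.MULTILINE)

print(before_star)
```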

Cheat sheet

. ^ $ * + ? ( ) [ { \ | reserved characters

  • Escape with \
    • (abc) matches abc (in a regex group)
    • \(abc\) matches (abc) (with brackets)
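
The escaping rule above, as a tiny Python sketch. `re.escape` is a standard library helper that adds the backslashes for you when you want to search for literal text.

```python
import re

text = "x(abc)y"

# Unescaped brackets form a group, so only abc is matched.
grouped = re.findall(r"(abc)", text)

# Escaped brackets are matched literally.
literal = re.findall(r"\(abc\)", text)

# re.escape turns literal text into a safe pattern.
escaped = re.escape("(abc)")

print(grouped, literal, escaped)
```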

[a-zA-Z] letters (case-sensitive)

[0-9] or \d match digits

[a-c1-3#] matches characters a b c 1 2 3 #

.* greedy wildcard. .*? non-greedy wildcard.

^ start of line. $ end of line.

\s whitespace (space, tab, new line). \t tab. \n new line.

\w letters, numbers, and underscore. \W anything else.

\b word boundary. \B not a word boundary.

+ repeat match 1 or more times

{3} repeat match exactly thrice

{1,3} repeat match 1, 2, or 3 times

{3,} repeat match 3+ times

[^ ... ] match all but given characters

(?<= ... ) start match with given characters and exclude them (look behind)

(?= ... ) end match with given characters and exclude them (look ahead)

Scotty

@kanga_bru

Bringing the 'hack' to indie hacker. I'm building 12 startups this year and blogging about it.

Discussion


@kanga_bru , thanks a lot! Very useful! Checked all of these. Works like a charm! Small typo in "3 whole letter words".

 

My pleasure! Thanks for the heads up, that typo's now fixed 👍

Also the follow up article is coming this weekend so get keen.