DEV Community

Cover image for Super ultimate guide to Regex in 2021 (how to  use in JavaScript)
devbookmark
devbookmark

Posted on • Originally published at devbookmark.com

Super ultimate guide to Regex in 2021 (how to use in JavaScript)

Regular expressions play a vital role in every high-level programming language and so in JavaScript. Let's know them all in detail...


A Regular Expression (RegEx) is a sequence of characters that defines a search pattern. It helps you to "match" part of the text (string) by given rule.

// Let's get our hands dirty with an examples: 

const regex = /[A-Z]\w+/g; // (We ommited ' ')
// regex holds a regular expression which will match words starting with a capital letter. 

const str = `Rahul, Taylor and Susanne are coders who live in India`; 

// When we apply the regex to str, it returns all matches in a simple array! 
// console.log( str.match(regex) )
//["Rahul", "Taylor", "Susanne", "India"]

Enter fullscreen mode Exit fullscreen mode

You can do the same operation by using plain JavaScript, but regex can save you hundreds of lines & you can use it with almost any language (and even CLI tools)

The Core and Some Basics

When you write a RegEx, it always starts with / and ends with /. Your write the code in between the two slashes. The simplest example - to match the word 'apple' use /apple/ RegEx. This, however, won't match 'APPLE' or 'aPpLe', because RegEx is case sensitive.

To disable case sensitivity in RegEX, use what is called an i flag /apple/i now it will match 'apple', 'APPLE' & 'aPpLe'. To match both 'apple' and 'nut' use apple|nut/ RegEx. Simple, ugh?

How to use in JavaScript

Let's learn the most basic methods in JS for working with RegEx'es

  • str.match(regex) : Returns an array with all the matches it has found. Actually. there's a little catch here😉. If you try doing this: "apple apple" .match(/apple/) you would expect to get ['apple', 'apple'] nut that's not the case. In reality it returns just ['apple']. To get a full array with multiple matches, you should add g flag.

  • str.test(str) : regex is a variable assigned to your RegEx. str is the string you test with the RegEx. The method returns true if it finds any matches or false.

  // Let's hang with them
  let regex = /code|easy/i; 
  const str = 'this code is EaSy super easy'; 
  regex.test(str) // true; means we have a match😍

  str.match(regex) // ["code", index: 5, input..]

  // Oops! We forgot adding the g flag
  regex = /code|easy/ig;

  str.match(regex) // ["code", "EaSy", "easy"]
  // ALRIGHT!!
Enter fullscreen mode Exit fullscreen mode

Concept of Wildcard Period

We learned how to statically match a word, let's say 'hug' (/hug/). But what if we want to match 'huh', 'hug', 'hum' at the same time? Wildcard period! That's the answer. /hu./ This will match all 3 letters long words starting with 'hu'.

Match single character with multiple possibilities

A lot of times you want something in-between. Instead of targeting every character by using . you might want to target only a, b, c, d, e characters. That's when the nest 'tricks' come in handy.

// CHARACTER CLASSES allow you to define a group of characters you wish to match. You put the chards in [] "car cat cam cal car".match(/ca[rt]/g); 
// returns: ['car', 'cat', 'car']

// match "bag", "big", "bug", but nit "bog" "big bag has a bug bog".match(/b[aiu]g/g); 
// ["big", "bag", "bug"]

// MAKE CHARACTER CLASSES SHORTER by using [X-Y] which will match fro letter x to letter y. Example: [a-zA-Z] will match all capital and not capital letters from a to z

"abcdefghijklmnopqr".match(/[d-j]/g); 
// ["d", "e", "f", "g", "h", "i", "j"]

//same as: 
"abcdefghijklmnopqr".match(/defghij/g); 
// ["d", "e", "f", "g", "h", "i", "j"]

// Use it with a number too: 
"1234567890".match(/4-9/g); 
//["4", "5", "6", "7, "8", "9"]
Enter fullscreen mode Exit fullscreen mode

Reverse the character classes

a-z will match all letters from a to z. To match all symbols, EXCEPT the letters from a to z, use [^a-z]. The ^ operator reverses the behaviours when used in[ ].

Matching characters that occur more than one times

// With +
let regex = /a+/g; 
"abc".match(regex) //["a"]
"aabc".match(regex) //["aa"]
"aabac".match(regex) //["aa", "a"]
"bbc".match(regex) //null

//without +
regex = /a/g; 
"abc".match(regex) //["a"]
"aabc".match(regex) //["aa"]
"aabac".match(regex) //["aa", "a"]
"bbc".match(regex) //null

Enter fullscreen mode Exit fullscreen mode

Search for patterns from the beginning of the end of the string

To search a character exactly at the beginning of a string using ^

let regex = /^K/; 

regex.test("__K_K_") // false - K is not exactly at the beginning!
regex.test("K___K___") // true 

//To search for a character at the end of string use $ like so

regex = /K$/; 

regex.test("__K__K_") // false - K has to be at the end

regex.test("__K") // true
Enter fullscreen mode Exit fullscreen mode

Optional character

let regex = /colou?r/; // makes 'u' capital

let american = "color"; 
let british = "colour"; 

regex.test(american); // true
regex.test(british); // true
regex.test("cologr"); // false
Enter fullscreen mode Exit fullscreen mode

Let's take this to advance level

Common shorthands

  • Instead of [A-Za=z0-9]

Use -> \w

  • Instead of [^A-Za-z0-9]

Use -> \W

  • Instead of [0-9]

\d

  • Instead of ^ 0-9

Use -> \D

Specify the upper and lower limit of matches

What if you want to match a sequence of characters that repeats X times, for example - match exactly a sequence of 5 letters 'a'? Here we go a{5} This would match only 'aaaaa' but not 'aa' or 'aaaaaaa'.

Let's see...

let str = "ama baalo maaaaamal aaaaaa"; 
console.log( str.match(/a{5}/g ) ); 
//prints ["aaaaa". "aaaaa"]

//to match 'm' letter followed by 5 x 'a'
console.log( str.match( /ma{5}/ ) ); 
// prints ["maaaaa", indes: 10, ...]
//which means we have a match at index 10

// to match empty space followed by 4 x 'a'
console.log( str.match(/\sa{4}/ ) ); 
// prints [" aaaa", index: 19, ...]
// match at index 19
Enter fullscreen mode Exit fullscreen mode

You saw how to match an exact number of repeating characters a{5} matches "aaaaa". But what if you want to match not exactly 5, but in a more flexible manner - from 1 to 3 repeating characters? Here we go a{1,3} which will match "a" , "aa", "aaa", but not "aaaa".

We can go even further - by omitting the first or the second parameter a{3} will not match "a", "aa", but will match "aaa", "aaaa" or higher.

## Match characters t#hat occur multiple times

Above we have briefly covered this topic, now is the moment to go deep.

  • To match one or more characters, use after the target character.
let str = "ama balo maaaaamal"; 
console.log( str.match( /a+/g ) ); 
// ["a", "a", "aa", "aaaaa", "a"]

console.log( str.match( /a/g ) ); 
// ["a", "a", "a", "a", "a", "a", "a", "a", "a", "a"]
Enter fullscreen mode Exit fullscreen mode
  • To match zero or more characters, ue after the target character
let str = "aaa"; 
console.log( str.match( /a*/g ) ); 
// ["aaa", ""]

consolle.log( str.match( /a/g ) ); 
// ["a", "a", "a"]
Enter fullscreen mode Exit fullscreen mode
  • To match zero or one character, use after the target character
let str = "aaa"; 
console.log( str.match( /a?/g ) ); 
// ["a", "a", "a", ""]
Enter fullscreen mode Exit fullscreen mode

Positive and Negative lookahead

This is considered one of the abstract topics in regex, but I will try to cover 80/100 of what you need to know.

  • a(?=g) - Positive lookahead Matches all "a" that is followed by "g", without making the "g" part of the match.
  • a(?!g) - Negative lookahead Matches all "a" that are NOT followed by "g", without making "g" part of the match.

But it can be even more flexible. See this example -> (?=regex) ?!regex

On the place of regex, you can put any valid regex expression. Let's hang with this...

let str = "IsFunBaloonIsLearningRegExIsLean"; 

console.log (str.match( /Is(?=Learning)/ ) ); 
//["Is", index: 11, ...]
//Matches the 2nd "Is", right before "Learning"

console.log( str.match( /Is(?=Lean)/ ) ); 
//["Is", index: 26, ...]
//Match the 3rd "Is", right before "Lean"

console.log( str.match( /Is(?=L)/g ) ); 
// ["Is", "Is"]
//Matches all "Is" which are followed by "L"

console.log( str.match(/Is(?!L)/ ) ); 
//["Is", index:0, ...]
// Matches all "Is" which aren't followed by "L"
Enter fullscreen mode Exit fullscreen mode

What if you want the opposite - check the character before, not after the target character? You use a LookBehind ;P

Reusing patterns with capture groups

We all know the DRY programming principle - Don't Repeat Yourself. Capture groups help us to do exactly this.

/(bam+)\w\1/g  same as 
/(bamm+)\w(bamm+)/g same as
/bamm+\wbamm+/g
Enter fullscreen mode Exit fullscreen mode
/(\w+)\s(\1\1\1)\2/g same as
/(\w+)\s\1\1\1\1\1\1/g

Enter fullscreen mode Exit fullscreen mode
/(\w+)\s\1\1\1/g  same as
/\w+\s\w+\w+\w+/g
Enter fullscreen mode Exit fullscreen mode

Now let's learn how to unleash this potential regex power and fuel it all to your JavaScript skills!

Creating RegEx in JavaScript

let regex = /a[0-9]b+/

//if you want to pass flags (like i and g)
let regex = /a[0-9]b+/ig
Enter fullscreen mode Exit fullscreen mode

-> Compiles when script is loaded

  • Using the RegEx constructor function
  let regex - new RegExp('a[0-9]b+')

  //if you want to pass flags (like i and g)
  let regex = new RegExp('a[0-9]b+', 'ig')
Enter fullscreen mode Exit fullscreen mode

-> Compiled on runtime


FLAGS

In JavaScript we have 6 flags which affect the match:

  • i - Makes the match case-insensitive. No difference between 'C' and 'c'
  • g - Without this flag, only the first match will be returned
  • m - Multiline more; only affects the behavior of ^ and $
  • s - Dotall mode; allows wildcard period . to match newline character \n
  • u - Enabled full Unicode support
  • y - Sticky mode. Enabled searching at a specific position

LET'S SEE JS METHODS THAT USE RegEx IN SOME FORM OR ANOTHER

  • str.match(regexp) - Finds all matches of regexp in the string str and returns an array of those matches
  • regexp.exec(str) - Similar to the match method but it's meant to be used in a loop when the regexp is stored in global variable but not passed directly
// Difference between the two methods

let re = /bla/g; 
let str = "bla and yea bla yeh"; 

re.exec(str)
// -> ["bla", index: 0, ...]
re.exec(str)
// -> ["bla", index: 13, ...]
re.exec(str)
// -> null
re.exec(str)
// -> ["bla", index: 0, ...]                
// STARTS AGAIN

//USAGE WITH A LOOP
let match, str = "bla and yeah bla yeh ble"; 
while (mathc = re.exec(str)) {
    console.log(match); 
}
// ["bla", index: 0, input: ...]
// ["bla", index: 13, input: ...]

// on the other side, match works pretty simple
str.match(re)
// ["bla", "bla"]
Enter fullscreen mode Exit fullscreen mode
  • str.matchAll(regexp) - A new JS feature and improvement on the match method. 3 Differences:
    • Returns an iterable object with matches instead of an array.
    • Each match is in the same format as str.match without the 'g' flag.
    • If there are no matches it returns empty iterable object rather than null if you used to match.

Always add g flag when using this one!

let regexp = /bla/g; 
let str = 'bla and yeah bla yeh'; 
const matches = str.matchAll(regexp); 
for (let match of matches) {
    console.log(match)
}
// ["bla", index: 0, ...]
// ["bla", index: 13, ...]
Enter fullscreen mode Exit fullscreen mode
  • regexp.test(str) - Looks for at least one match of regexp in str. If found, returns true. Otherwise false.

  • str.search(regexp) - Returns the index of the first available match. If no match is found returns -1.

  • str.match(separator) - Instead of passing a simple string to separator like ' ', we can also pass regex for more precise split/

  • str.replace(from, to) - from is what to match. It can be a string or regex. The first match will be replaced with the string you have passed to the to argument. Instead of a string, you can pass a function too, but this is outside of the scope of this tutorial.

  • str.repalceAll(from,to) - Same as replace, except instead of replacing only the first match it will replace all matches with the provided to. Example:

  let str = "stuffed str living fforever pff"
  let regex = /f+/; //match one or more 'f'

  let repl = str.replace(regex, '*'); 
  //repl is "stu*ed str living fforeverpff"
  let repl = str.replaceAll(regex, '*'); 
  // repl is "stu*ed str living *orever p*"
  // NOTE: If you add g flag to replace it works like replaceAll
Enter fullscreen mode Exit fullscreen mode

A bit tough and lengthy. Hope you liked it! Use the comments for sharing your views and questions.

🔐Thanks For Reading | Happy Coding 📘

Top comments (4)

Collapse
 
jonrandy profile image
Jon Randy 🎖️ • Edited

Your 'wildcard period' section is quite poorly written - /hu./ will match any string of 3 characters starting with hu - including hu followed by a space, asterisk, exclamation mark, 😊 etc. You haven't explained that . matches any character

Collapse
 
dekarguy profile image
James Terry

your section with the + explanation doesn't have a good description and has errors in the comments

this section:

//without +
regex = /a/g; 
"abc".match(regex) //["a"]
"aabc".match(regex) //["aa"]
"aabac".match(regex) //["aa", "a"]
"bbc".match(regex) //null
Enter fullscreen mode Exit fullscreen mode

should be

regex = /a/g; 
"abc".match(regex) //["a"]
"aabc".match(regex) //["a", "a"]
"aabac".match(regex) //["a", "a", "a"]
"bbc".match(regex) //null
Enter fullscreen mode Exit fullscreen mode
Collapse
 
ghaerdi profile image
Gil Rudolf Härdi • Edited

Let's see something interesting but weird:

I have a function that filter a food list and returns a fruit list.
The expected result is: ["apple", "banana", "watermelon"] but the next code don't push watermelon in the array.

const foodList = ["apple", "soap", "banana", "watermelon", "pizza"];

function getFruits(foodList) {
  const fruitRGX = /apple|banana|watermelon/g;
  const fruitList = [];

  for (const food of foodList) {
    if (!fruitRGX.test(food)) continue;

    fruitList.push(food);
  }

  return fruitList;
}

const fruits = getFruits(foodList);
const expect = ["apple", "banana", "watermelon"];

console.log({ fruits, expect });
// output:
// {
//  fruits: [ "apple", "banana" ],
//  expect: [ "apple", "banana", "watermelon" ]
// }
Enter fullscreen mode Exit fullscreen mode

If I remove the g flag in the fruitRGX or I move the constant declaration inside the for loop then fruits is equal to expect.

Can someone explain why is happening?

Collapse
 
tim012432 profile image
Timo

Very helpful thanks