Regular expressions play a vital role in almost every high-level programming language, and JavaScript is no exception. Let's get to know them in detail...
A Regular Expression (RegEx) is a sequence of characters that defines a search pattern. It helps you "match" part of a text (string) by a given rule.
// Let's get our hands dirty with an example:
const regex = /[A-Z]\w+/g; // (We omitted ' ')
// regex holds a regular expression which will match words starting with a capital letter.
const str = `Rahul, Taylor and Susanne are coders who live in India`;
// When we apply the regex to str, it returns all matches in a simple array!
console.log( str.match(regex) );
// ["Rahul", "Taylor", "Susanne", "India"]
You can do the same operation in plain JavaScript, but regex can save you hundreds of lines, and you can use it with almost any language (and even CLI tools).
The Core and Some Basics
When you write a RegEx literal, it always starts with / and ends with /. You write the pattern in between the two slashes. The simplest example: to match the word 'apple', use the /apple/ RegEx. This, however, won't match 'APPLE' or 'aPpLe', because RegEx is case sensitive.
To disable case sensitivity in RegEx, use what is called the i flag: /apple/i will match 'apple', 'APPLE' & 'aPpLe'. To match either 'apple' or 'nut', use the /apple|nut/ RegEx. Simple, huh?
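As a quick check of the rules above, here is a minimal sketch. The variable names are only illustrative and are not part of the original examples:

```javascript
// Case sensitivity and alternation in action
const plain = /apple/;         // case sensitive by default
const insensitive = /apple/i;  // i flag: ignore case
const either = /apple|nut/;    // matches 'apple' OR 'nut'

console.log(plain.test('APPLE'));       // false
console.log(insensitive.test('aPpLe')); // true
console.log(either.test('a nut'));      // true
console.log(either.test('a pear'));     // false
```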
How to use in JavaScript
Let's learn the most basic methods in JS for working with RegExes.
str.match(regex) : Returns an array with all the matches it has found. Actually, there's a little catch here😉. If you try doing this:
"apple apple".match(/apple/)
you would expect to get ['apple', 'apple'], but that's not the case. In reality it returns just ['apple'] (plus extra properties such as index and input describing that first match). To get a full array with multiple matches, you should add the g flag.
regex.test(str) : regex is a variable assigned to your RegEx. str is the string you test with the RegEx. The method returns true if it finds any match, or false otherwise.
// Let's hang with them
let regex = /code|easy/i;
const str = 'this code is EaSy super easy';
regex.test(str) // true; means we have a match😍
str.match(regex) // ["code", index: 5, input..]
// Oops! We forgot adding the g flag
regex = /code|easy/ig;
str.match(regex) // ["code", "EaSy", "easy"]
// ALRIGHT!!
Concept of Wildcard Period
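We learned how to statically match a word, let's say 'hug' (/hug/). But what if we want to match 'huh', 'hug', 'hum' at the same time? Wildcard period! That's the answer: /hu./
This will match any 3-character sequence starting with 'hu'. The period stands for any single character (except a line break), so besides letters it also matches 'hu' followed by a space, a digit or a punctuation mark.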
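To make the "any character" part concrete, here is a small sketch. The sample string and variable names are made up for illustration:

```javascript
// The wildcard period matches letters, punctuation and digits alike
const text = 'huh hug hum hu! hu7';
const wildcard = /hu./g;
console.log(text.match(wildcard)); // ["huh", "hug", "hum", "hu!", "hu7"]

// By default the period does not match a line break
console.log(/hu./.test('hu\n')); // false
```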
Match single character with multiple possibilities
A lot of times you want something in-between. Instead of targeting every character by using . you might want to target only the a, b, c, d, e characters. That's when the next 'tricks' come in handy.
// CHARACTER CLASSES allow you to define a group of characters you wish to match. You put the chars in []
"car cat cam cal car".match(/ca[rt]/g);
// returns: ['car', 'cat', 'car']
// match "bag", "big", "bug", but not "bog"
"big bag has a bug bog".match(/b[aiu]g/g);
// ["big", "bag", "bug"]
// MAKE CHARACTER CLASSES SHORTER by using [X-Y], which will match from letter X to letter Y. Example: [a-zA-Z] will match all uppercase and lowercase letters from a to z
"abcdefghijklmnopqr".match(/[d-j]/g);
// ["d", "e", "f", "g", "h", "i", "j"]
//same as:
"abcdefghijklmnopqr".match(/[defghij]/g);
// ["d", "e", "f", "g", "h", "i", "j"]
// Use it with numbers too:
"1234567890".match(/[4-9]/g);
//["4", "5", "6", "7", "8", "9"]
Reverse the character classes
[a-z] will match all letters from a to z. To match all symbols EXCEPT the letters from a to z, use [^a-z]. The ^ operator reverses the behaviour when it is the first character inside [ ].
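A short sketch of a character class next to its reversed version. The sample string is an assumption chosen for illustration:

```javascript
// Same string, matched with a class and with its negation
const mixed = 'a1b2-c3';
console.log(mixed.match(/[a-z]/g));  // ["a", "b", "c"]
console.log(mixed.match(/[^a-z]/g)); // ["1", "2", "-", "3"]
```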
Matching characters that occur more than once
// With +
let regex = /a+/g;
"abc".match(regex) //["a"]
"aabc".match(regex) //["aa"]
"aabac".match(regex) //["aa", "a"]
"bbc".match(regex) //null
//without +
regex = /a/g;
"abc".match(regex) //["a"]
"aabc".match(regex) //["a", "a"]
"aabac".match(regex) //["a", "a", "a"]
"bbc".match(regex) //null
Search for patterns at the beginning or the end of the string
To search for a character exactly at the beginning of a string, use ^
let regex = /^K/;
regex.test("__K_K_") // false - K is not exactly at the beginning!
regex.test("K___K___") // true
//To search for a character at the end of a string, use $ like so
regex = /K$/;
regex.test("__K__K_") // false - K has to be at the end
regex.test("__K") // true
Optional character
let regex = /colou?r/; // ? makes the 'u' optional
let american = "color";
let british = "colour";
regex.test(american); // true
regex.test(british); // true
regex.test("cologr"); // false
Let's take this to an advanced level
Common shorthands
- Instead of [A-Za-z0-9_] Use -> \w
- Instead of [^A-Za-z0-9_] Use -> \W
- Instead of [0-9] Use -> \d
- Instead of [^0-9] Use -> \D
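Here is a small sketch of the four shorthands applied to one string. The sample text is hypothetical; note that \w also covers the underscore:

```javascript
// Shorthand classes on a single sample string
const sample = 'Order_42: 3 items!';
console.log(sample.match(/\w+/g)); // ["Order_42", "3", "items"]
console.log(sample.match(/\d+/g)); // ["42", "3"]
console.log(sample.match(/\W/g));  // [":", " ", " ", "!"]
console.log(sample.match(/\D+/g)); // ["Order_", ": ", " items!"]
```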
Specify the upper and lower limit of matches
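What if you want to match a sequence of characters that repeats X times, for example exactly a sequence of 5 letters 'a'? Here we go: a{5}
This matches a run of exactly five 'a' characters: 'aaaaa'. It won't match 'aa', and inside a longer run such as 'aaaaaaa' it matches only the first five characters.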
Let's see...
let str = "ama baalo maaaaamal aaaaaa";
console.log( str.match(/a{5}/g ) );
//prints ["aaaaa", "aaaaa"]
//to match 'm' letter followed by 5 x 'a'
console.log( str.match( /ma{5}/ ) );
// prints ["maaaaa", index: 10, ...]
//which means we have a match at index 10
// to match empty space followed by 4 x 'a'
console.log( str.match(/\sa{4}/ ) );
// prints [" aaaa", index: 19, ...]
// match at index 19
You saw how to match an exact number of repeating characters: a{5} matches "aaaaa". But what if you want to match not exactly 5, but in a more flexible manner, say from 1 to 3 repeating characters? Here we go: a{1,3}, which will match "a", "aa" and "aaa", but never more than three characters in one match.
We can go even further by omitting the second parameter: a{3,} will not match "a" or "aa", but will match "aaa", "aaaa" or longer runs. (Omitting the first parameter is not supported in JavaScript; write a{0,3} instead.)
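A minimal sketch of the range quantifiers. It uses the ^ and $ anchors from earlier so that the whole string has to fit the limits; the variable names are only illustrative:

```javascript
// Anchored patterns: the whole string must satisfy the quantifier
const upToThree = /^a{1,3}$/;
const threeOrMore = /^a{3,}$/;

console.log(upToThree.test('a'));      // true
console.log(upToThree.test('aaa'));    // true
console.log(upToThree.test('aaaa'));   // false
console.log(threeOrMore.test('aa'));   // false
console.log(threeOrMore.test('aaaa')); // true

// Without anchors, a{1,3} still finds matches inside a longer run
console.log('aaaa'.match(/a{1,3}/g)); // ["aaa", "a"]
```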
Match characters that occur multiple times
We briefly covered this topic above; now is the moment to go deeper.
- To match one or more characters, use + after the target character.
let str = "ama baalo maaaaamal";
console.log( str.match( /a+/g ) );
// ["a", "a", "aa", "aaaaa", "a"]
console.log( str.match( /a/g ) );
// ["a", "a", "a", "a", "a", "a", "a", "a", "a", "a"]
- To match zero or more characters, use * after the target character
let str = "aaa";
console.log( str.match( /a*/g ) );
// ["aaa", ""]
console.log( str.match( /a/g ) );
// ["a", "a", "a"]
- To match zero or one character, use ? after the target character
let str = "aaa";
console.log( str.match( /a?/g ) );
// ["a", "a", "a", ""]
Positive and Negative lookahead
This is considered one of the more abstract topics in regex, but I will try to cover 80% of what you need to know.
- a(?=g) - Positive lookahead. Matches every "a" that is followed by "g", without making the "g" part of the match.
- a(?!g) - Negative lookahead. Matches every "a" that is NOT followed by "g", without making the "g" part of the match.
But it can be even more flexible. The general forms are (?=regex) and (?!regex).
In place of regex, you can put any valid regular expression. Let's hang with this...
let str = "IsFunBaloonIsLearningRegExIsLean";
console.log (str.match( /Is(?=Learning)/ ) );
//["Is", index: 11, ...]
//Matches the 2nd "Is", right before "Learning"
console.log( str.match( /Is(?=Lean)/ ) );
//["Is", index: 26, ...]
//Match the 3rd "Is", right before "Lean"
console.log( str.match( /Is(?=L)/g ) );
// ["Is", "Is"]
//Matches all "Is" which are followed by "L"
console.log( str.match(/Is(?!L)/ ) );
//["Is", index:0, ...]
// Matches the first "Is" which isn't followed by "L" (there is no g flag here)
What if you want the opposite - check the character before, not after the target character? You use a LookBehind ;P
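Lookbehind is written (?<=regex) for the positive form and (?<!regex) for the negative form. It was added to the language in ES2018, so this sketch assumes a reasonably modern engine; the sample string and variable names are made up for illustration:

```javascript
const prices = 'cost: $42, qty: 7, tax: $3';

// Positive lookbehind: digits that come right after a "$"
const dollarAmounts = prices.match(/(?<=\$)\d+/g);
console.log(dollarAmounts); // ["42", "3"]

// Negative lookbehind: digits NOT preceded by "$" or by another digit
// (excluding digits keeps us from matching the "2" inside "$42")
const plainNumbers = prices.match(/(?<![$\d])\d+/g);
console.log(plainNumbers); // ["7"]
```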
Reusing patterns with capture groups
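We all know the DRY programming principle - Don't Repeat Yourself. Capture groups help us do exactly this: a group in parentheses remembers the text it matched, and \1, \2, ... refer back to the first, second, ... group.
/(bam+)\w\1/g saves you from writing the group twice, as in
/(bam+)\w(bam+)/g or
/bam+\wbam+/g
/(\w+)\s(\1\1\1)\2/g same as
/(\w+)\s\1\1\1\1\1\1/g
/(\w+)\s\1\1\1/g is the short form of the idea behind
/\w+\s\w+\w+\w+/g
One caveat: a backreference matches the exact text the group captured, not the pattern again. So /(bam+)\w\1/ matches "bamm_bamm" but not "bamm_bam", while /(bam+)\w(bam+)/ matches both.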
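To see what a backreference actually does, here is a small sketch that detects a doubled word. The strings and variable names are hypothetical:

```javascript
// \1 must repeat the exact text captured by the first group
const repeated = /(\w+)\s\1/;
console.log(repeated.test('hello hello')); // true
console.log(repeated.test('hello world')); // false

// Writing the pattern twice accepts two different words
console.log(/(\w+)\s(\w+)/.test('hello world')); // true

// The captured text is available in the match result
const doubled = 'say bam bam now'.match(repeated);
console.log(doubled[0]); // "bam bam" (the full match)
console.log(doubled[1]); // "bam" (the first capture group)
```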
Now let's learn how to unleash this potential regex power and fuel it all to your JavaScript skills!
Creating RegEx in JavaScript
- Using a regex literal
let regex = /a[0-9]b+/
//if you want to pass flags (like i and g)
regex = /a[0-9]b+/ig
-> Compiled when the script is loaded
- Using the RegExp constructor function
regex = new RegExp('a[0-9]b+')
//if you want to pass flags (like i and g)
regex = new RegExp('a[0-9]b+', 'ig')
-> Compiled at runtime
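The constructor form is mainly useful when the pattern is only known at runtime. A minimal sketch, assuming a hypothetical keyword variable; note that backslashes have to be doubled inside a string:

```javascript
// A literal and a constructor call that describe the same pattern
const digitsLiteral = /\d+/;
const digitsFromString = new RegExp('\\d+'); // "\\d" in a string is \d in the pattern
console.log(digitsLiteral.source === digitsFromString.source); // true

// Building a pattern from a value that is only known at runtime
const keyword = 'bla'; // hypothetical input
const dynamic = new RegExp(keyword + '+', 'gi'); // same as /bla+/gi
console.log('Bla blaaa ble'.match(dynamic)); // ["Bla", "blaaa"]
```

If the runtime value may contain characters that are special in regex (such as . or *), it would need escaping first, which this sketch does not handle.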
FLAGS
The 6 most commonly used flags that affect the match in JavaScript are:
- i - Makes the match case-insensitive. No difference between 'C' and 'c'
- g - Global search. Without this flag, only the first match will be returned
- m - Multiline mode; only affects the behavior of ^ and $
- s - Dotall mode; allows the wildcard period . to match the newline character \n
- u - Enables full Unicode support
- y - Sticky mode. Enables searching at a specific position
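The less familiar flags are easier to grasp side by side. This is a sketch with made-up sample strings, comparing each pattern with and without its flag:

```javascript
const lines = 'first\nsecond';

// m: ^ and $ also match at the start and end of each line
console.log(/^second$/.test(lines));  // false
console.log(/^second$/m.test(lines)); // true

// s: the period also matches the newline character
console.log(/first.second/.test(lines));  // false
console.log(/first.second/s.test(lines)); // true

// u: a symbol outside the basic plane counts as one character
console.log(/^.$/.test('😊'));  // false
console.log(/^.$/u.test('😊')); // true

// y: the match must start exactly at lastIndex
const sticky = /bla/y;
sticky.lastIndex = 4;
const atFour = sticky.test('bla bla'); // true, "bla" starts at index 4
sticky.lastIndex = 1;
const atOne = sticky.test('bla bla');  // false, no "bla" starting at index 1
console.log(atFour, atOne);
```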
LET'S SEE JS METHODS THAT USE RegEx IN SOME FORM OR ANOTHER
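- str.match(regexp) - Finds all matches of regexp in the string str and returns an array of those matches (with the g flag; without it, only the first match is returned)
- regexp.exec(str) - Similar to the match method, but it's meant to be used in a loop when the regexp has the g flag and is stored in a variable rather than passed directly as a literal. Each call continues from where the previous match ended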
// Difference between the two methods
let re = /bla/g;
let str = "bla and yeah bla yeh";
re.exec(str)
// -> ["bla", index: 0, ...]
re.exec(str)
// -> ["bla", index: 13, ...]
re.exec(str)
// -> null
re.exec(str)
// -> ["bla", index: 0, ...]
// STARTS AGAIN
//USAGE WITH A LOOP
let match;
str = "bla and yeah bla yeh ble";
re.lastIndex = 0; // the regex remembers its position, so start over from the beginning
while (match = re.exec(str)) {
  console.log(match);
}
// ["bla", index: 0, input: ...]
// ["bla", index: 13, input: ...]
// on the other hand, match works pretty simply
str.match(re)
// ["bla", "bla"]
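The position memory that exec shows above lives in the regex's lastIndex property, and test uses it too when the regex has the g flag. Here is a small sketch (the variable names are hypothetical) of how reusing one global regex across different strings can skip a match:

```javascript
const fruit = /apple|banana/g;

const first = fruit.test('apple');   // true, and lastIndex moves to 5
const indexAfterFirst = fruit.lastIndex;
const second = fruit.test('banana'); // false: the search starts at index 5 of 'banana'
const indexAfterSecond = fruit.lastIndex; // a failed search resets lastIndex to 0
console.log(first, indexAfterFirst, second, indexAfterSecond); // true 5 false 0

// Without the g flag, test keeps no position between calls
const fruitNoG = /apple|banana/;
const foods = ['apple', 'banana', 'bread'];
const fruits = foods.filter((food) => fruitNoG.test(food));
console.log(fruits); // ["apple", "banana"]
```

Dropping the g flag, or resetting lastIndex to 0 before each call, avoids the surprise.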
- str.matchAll(regexp) - A new JS feature and improvement on the match method. 3 differences:
  - Returns an iterable object with matches instead of an array.
  - Each match is in the same format as str.match without the 'g' flag.
  - If there are no matches, it returns an empty iterable object rather than null, which is what match returns.
Always add the g flag when using this one! Without it, matchAll throws a TypeError.
let regexp = /bla/g;
let str = 'bla and yeah bla yeh';
const matches = str.matchAll(regexp);
for (let match of matches) {
console.log(match)
}
// ["bla", index: 0, ...]
// ["bla", index: 13, ...]
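Because each matchAll result has the full single-match format, capture groups and the index are available for every match. A sketch with a made-up input string and hypothetical names:

```javascript
const settings = 'x=10, y=25';
const pair = /(\w)=(\d+)/g;

// Array.from accepts the iterable plus a mapping function
const found = Array.from(settings.matchAll(pair), (m) => ({
  key: m[1],           // first capture group
  value: Number(m[2]), // second capture group
  index: m.index,      // where the match starts
}));
console.log(found);
// [{ key: "x", value: 10, index: 0 }, { key: "y", value: 25, index: 6 }]

// No matches: an empty iterable instead of null
const noneCount = [...'nothing here'.matchAll(pair)].length;
console.log(noneCount);                   // 0
console.log('nothing here'.match(pair));  // null
```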
- regexp.test(str) - Looks for at least one match of regexp in str. If found, returns true. Otherwise false.
- str.search(regexp) - Returns the index of the first available match. If no match is found, returns -1.
- str.split(separator) - Instead of passing a simple string to separator like ' ', we can also pass a regex for a more precise split.
- str.replace(from, to) - from is what to match. It can be a string or a regex. The first match will be replaced with the string you have passed to the to argument. Instead of a string, you can pass a function too, but this is outside of the scope of this tutorial.
- str.replaceAll(from, to) - Same as replace, except instead of replacing only the first match it will replace all matches with the provided to. When from is a regex, it must have the g flag. Example:
let str = "stuffed str living fforever pff"
let regex = /f+/; //match one or more 'f'
let repl = str.replace(regex, '*');
//repl is "stu*ed str living fforever pff"
repl = str.replaceAll(/f+/g, '*'); // replaceAll throws a TypeError if the regex lacks the g flag
// repl is "stu*ed str living *orever p*"
// NOTE: If you add g flag to replace it works like replaceAll
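The search and split methods from the list above have no example yet, so here is a minimal sketch. The input string and names are assumptions made for illustration:

```javascript
const messy = 'apple, banana;cherry  ,  kiwi';

// split: separator is a comma or semicolon with optional spaces around it
const parts = messy.split(/\s*[,;]\s*/);
console.log(parts); // ["apple", "banana", "cherry", "kiwi"]

// search: index of the first match, or -1 when there is none
const firstSeparator = messy.search(/[,;]/);
const firstDigit = messy.search(/\d/);
console.log(firstSeparator); // 5
console.log(firstDigit);     // -1
```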
A bit tough and lengthy. Hope you liked it! Use the comments for sharing your views and questions.
🔐Thanks For Reading | Happy Coding 📘
Top comments (4)
Your 'wildcard period' section is quite poorly written - /hu./ will match any string of 3 characters starting with hu - including hu followed by a space, asterisk, exclamation mark, 😊 etc. You haven't explained that . matches any character.
Your section with the + explanation doesn't have a good description and has errors in the comments
this section:
should be
Let's see something interesting but weird:
I have a function that filters a food list and returns a fruit list.
The expected result is:
["apple", "banana", "watermelon"]
but the next code doesn't push watermelon in the array.
If I remove the g flag in the fruitRGX or I move the constant declaration inside the for loop then fruits is equal to expect.
Can someone explain why this is happening?
Very helpful thanks