Megan

Posted on Aug 5, 2020

Regular Expressions Crash Course

Hi all! I came across a regular expression problem in a HackerRank challenge and was completely stumped. I didn't think regular expressions were a common interview question, but I did realize how useful they could be. So I thought I'd post a little crash course on regular expressions in hopes of helping others.

Okay okay, so first off... what are regular expressions?

"Regular expressions or RegEx is a sequence of characters that define a search pattern, mainly for use in pattern matching with strings, or string matching, i.e. “find and replace”-like operations."
GeeksForGeeks via Wikipedia

Basically, you can create a combination of characters (because regex uses a lot of symbols) to match exactly to a string you would be looking for.

A good example of what regex is commonly used for is pattern matching in emails. That happened to be a part of the question I received as well.

It makes sense that if you're going over hundreds, thousands, or hundreds of thousands of emails, you couldn't feasibly look through each one individually. Enter... regular expressions!

Now given this is just a crash course, I will only go over a few basic examples so this definitely isn't all there is to know about regular expressions. But you will be able to use this information to build up your regex knowledge!

I think it is easiest to learn at first if you use a nice regular expression tester like Regex 101 or Regexr. You can test your pattern very easily with these websites.

If you mosey on over to one of those sites you will notice that regular expressions have a fairly common setup across the board.

The pattern will begin and end with a forward slash, followed by a short string of flags, for example:

/ (put your regular expressions in here!) /g

The string at the end denotes the flags that control how your regex gets interpreted:

Letter(s)	What it stands for	What it does
g	global	searches through the entire string, not just for the first match it finds
i	case insensitive	will search for the text whether it is capitalized or not
m	multiline	makes ^ and $ match at the start and end of each line, not just of the whole string
s	single line / dotall	makes . match newline characters too

There are more flags than this available (and sometimes denoted by different letters), but for our purposes I didn't include them as they are more advanced than what we will go over. For now we will stick with g so we know that our entire input will be evaluated.
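To see the flags in action, here is a minimal sketch in JavaScript. That language is an assumption on my part, chosen because its regex literals use the same /pattern/flags shape shown above. The sample strings are made up for illustration.

```javascript
const text = "cow, Cow, cow";

// No flags: match() stops at the first match.
const first = text.match(/cow/);

// g: collect every match in the string.
const allLower = text.match(/cow/g);

// g and i combined: every match, ignoring case.
const allAnyCase = text.match(/cow/gi);

const lines = "one\ntwo";

// m: ^ and $ match at the start and end of each line.
const perLine = lines.match(/^\w+$/gm);

// s: the dot also matches the newline between the two words.
const dotWithoutS = /one.two/.test(lines);
const dotWithS = /one.two/s.test(lines);

console.log(first[0]);     // cow
console.log(allLower);     // [ 'cow', 'cow' ]
console.log(allAnyCase);   // [ 'cow', 'Cow', 'cow' ]
console.log(perLine);      // [ 'one', 'two' ]
console.log(dotWithoutS);  // false
console.log(dotWithS);     // true
```

Other regex flavors may spell some of these flags differently, so it is worth checking the documentation for the language you end up using.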

Alright, now that you have somewhere to put your regex and know how it's structured... let's get to creating!

Letters and Numbers

Longhand

If you want to denote a range of characters or numbers, you can do so using these:

characters	what they do
[a-z]	lowercase range from a-z
[A-Z]	uppercase range from A-Z
[0-9]	range of digits from 0-9

So for instance, if our input text to test over is:

"How now brown cow"

Our regular expression:

/[a-z]/g

would find all of the letters except for the capital H.
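If you'd like to try this outside of a regex tester, here is a small sketch in JavaScript (my choice of language for illustration, since its regex literals look the same as the ones above). It runs the ranges against the same sample text.

```javascript
const text = "How now brown cow";

// Every lowercase letter, one match per character.
const lowercase = text.match(/[a-z]/g);

// Every uppercase letter.
const uppercase = text.match(/[A-Z]/g);

// There are no digits in the text, so match() returns null.
const digits = text.match(/[0-9]/g);

console.log(lowercase.join(""));  // ownowbrowncow
console.log(uppercase);           // [ 'H' ]
console.log(digits);              // null
```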

Shorthand

characters	equivalent	what they do
\w	[0-9a-zA-Z_]	includes all letters (upper and lowercase), digits, and the underscore
\d	[0-9]	range of digits from 0-9
\s	-	white space (tabs, regular space, newline)

These are also very straightforward, but if we continue with our previous text example, our regular expression could be:

/\w/g

and it would find all of the letters, including the capital H this time.
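Here is a quick sketch of the shorthand classes in JavaScript (again, just the language I'm assuming for illustration). The "Room 42" and "snake_case" strings are made-up inputs to show the digit and underscore cases.

```javascript
const text = "How now brown cow";

// \w matches letters of either case, digits, and the underscore.
const wordChars = text.match(/\w/g);

// \s matches the three spaces between the words.
const spaces = text.match(/\s/g);

// \d matches digits only.
const digits = "Room 42".match(/\d/g);

// The underscore counts as a \w character too.
const withUnderscore = "snake_case".match(/\w/g);

console.log(wordChars.join(""));     // Hownowbrowncow
console.log(spaces.length);          // 3
console.log(digits);                 // [ '4', '2' ]
console.log(withUnderscore.length);  // 10
```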

NOT Containing

These are similar to the above and you can see that if you capitalize the letter in these shorthand versions, it changes the meaning to NOT including certain characters in the string:

characters	equivalent	what they do
\W	[^0-9a-zA-Z_]	anything that is NOT a letter, digit, or underscore
\D	[^0-9]	anything that is NOT a digit from 0-9
\S	-	anything that is NOT white space (tabs, regular space, newline)

Continuing with our example from above, using the regular expression:

/\W/g

it would not return any matches for letters or digits, but it would return matches for the spaces in between the words.
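The negated classes can be sketched the same way in JavaScript (my assumed language for these examples). The "a1b2" string is a made-up input, and the last line shows that a caret inside square brackets negates a range, which is the longhand version of the capitalized shorthand.

```javascript
const text = "How now brown cow";

// \W matches everything that is not a letter, digit, or underscore.
const nonWord = text.match(/\W/g);

// \S matches everything that is not white space.
const nonSpace = text.match(/\S/g);

// \D and [^0-9] both match everything that is not a digit.
const nonDigitShort = "a1b2".match(/\D/g);
const nonDigitLong = "a1b2".match(/[^0-9]/g);

console.log(nonWord);          // [ ' ', ' ', ' ' ]
console.log(nonSpace.length);  // 14
console.log(nonDigitShort);    // [ 'a', 'b' ]
console.log(nonDigitLong);     // [ 'a', 'b' ]
```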

Modifiers

Now that we've taken care of the characters, what if we wanted them in a specific order, or a specific number of them? Here is where modifiers come into play:

characters	what they do
.	matches any single character except a newline
|	alternation: matches either the pattern on its left or the one on its right
\.	literal period if needed (escaping)
*	0 or more of the preceding character
+	1 or more of the preceding character
?	only 0 or 1 of the preceding character (optional)
{m,n}	between m and n (m and n being integers) of the preceding character, written with no space after the comma

There are also two more important special characters to be aware of: the ^ denotes the beginning of the string being matched and the $ denotes the end of it. Keep these in mind for the examples below!
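A few one-liners may make the modifiers and the two anchors easier to picture. This is a sketch in JavaScript (my assumed language for illustration), and all of the patterns and test strings are made up for the example.

```javascript
// ? makes the preceding character optional.
const optionalU = /^colou?r$/;

// * allows zero or more of the preceding character, + needs at least one.
const zeroOrMoreB = /^ab*c$/;
const oneOrMoreB = /^ab+c$/;

// {2,4} allows between two and four of the preceding item (here, digits).
const twoToFourDigits = /^\d{2,4}$/;

// | matches either alternative.
const catOrDog = /cat|dog/;

// . matches any character, while \. matches only a literal period.
const anyMiddle = /^a.c$/;
const dotMiddle = /^a\.c$/;

console.log(optionalU.test("color"), optionalU.test("colour"));    // true true
console.log(zeroOrMoreB.test("ac"), oneOrMoreB.test("ac"));        // true false
console.log(twoToFourDigits.test("123"));                          // true
console.log(twoToFourDigits.test("12345"));                        // false
console.log(catOrDog.test("hotdog"), catOrDog.test("bird"));       // true false
console.log(anyMiddle.test("abc"), dotMiddle.test("abc"));         // true false
console.log(dotMiddle.test("a.c"));                                // true

// ^ pins the match to the start of the string, $ to the end.
console.log(/^cow/.test("brown cow"));  // false
console.log(/cow$/.test("brown cow"));  // true
```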

Examples

Of course we need some helpful examples!

IP address validator

/^\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}$/

This one is great, very straightforward and uses several of the shortcuts we just learned! As I'm sure you are familiar with IP addresses, this example checks that the whole string is four groups of one to three digits separated by periods. It only checks the shape, though, not that each group is 255 or less.
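Here is how that pattern behaves on a few inputs, sketched in JavaScript (my assumed language for illustration). Since the regex only looks at digits and periods, something like 999.999.999.999 still passes, so the isValidIPv4 function below is a hypothetical helper of my own that adds a numeric range check on top. It is one possible approach, not the only one.

```javascript
// The pattern from the article: four groups of 1-3 digits separated by periods.
const ipShape = /^\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}$/;

// Hypothetical helper: shape check first, then make sure every group is 0-255.
function isValidIPv4(candidate) {
  if (!ipShape.test(candidate)) {
    return false;
  }
  return candidate.split(".").every((part) => Number(part) <= 255);
}

console.log(ipShape.test("192.168.0.1"));      // true
console.log(ipShape.test("192.168.0"));        // false (only three groups)
console.log(ipShape.test("999.999.999.999"));  // true  (shape is fine)
console.log(isValidIPv4("999.999.999.999"));   // false (999 is out of range)
console.log(isValidIPv4("192.168.0.1"));       // true
```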

Email address validator

/^\S+@\S+\.\S+$/

I wanted to be sure to include this one as regular expressions are often used for email address validation. This is a very simple example of an email address check, but as you can tell it checks that there aren't any spaces and that the string includes at least one @ with at least one period somewhere after it. Because \S also matches @ and periods, it does not limit the string to exactly one of each.
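To get a feel for what this pattern accepts and rejects, here is a short sketch in JavaScript (my assumed language for illustration). The addresses are made-up test inputs.

```javascript
// The pattern from the article: non-space characters, an @, more
// non-space characters, a period, and more non-space characters.
const emailShape = /^\S+@\S+\.\S+$/;

console.log(emailShape.test("megan@example.com"));        // true
console.log(emailShape.test("megan smith@example.com"));  // false (contains a space)
console.log(emailShape.test("megan@example"));            // false (no period after the @)
console.log(emailShape.test("a@b@c.d.e"));                // true  (extra @ and periods still pass)
```

The last line is worth noting: a loose pattern like this is handy as a quick sanity check, but it will let through plenty of strings that are not real email addresses.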

Resources

As usual, I'd like to include some extra resources I found very helpful:

Thanks so much for taking the time to read this! I hope that it could be helpful for you. And as always please feel free to leave comments or corrections. Have an awesome week!

Top comments (1)

aryaziai • Aug 5 '20 • Edited

This has been one of the elephant in the room concepts for me that I never wanted to address. After reading this, I opened up IRB and began testing it out.

value = "Hey world 12345"
value.scan(/[0-3a-z]/).join
=> "eyworld123"

I'll stick to the long way of typing it out until it becomes second nature. I appreciate the detailed explanation.