DEV Community

Raheem


⌛Saving time with regular expressions - learn the ways of the regex!⏱

A regular expression, often abbreviated as regex, is a sequence of characters that defines a search pattern, which can be used to find occurrences of that pattern in a string. With regular expressions you can search strings for phone numbers, dates, ISBN codes and much more. Regex syntax is incredibly powerful and lets you write patterns for almost anything you might want to find.
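For example, here is a minimal sketch of searching a sentence for dates. It assumes the dates are written as YYYY-MM-DD, and the sentence is made up for illustration. The pattern only checks the shape of a date, so it would also accept something like 2021-13-45.

```python
import re

# Assumption: dates are written as YYYY-MM-DD.
# "\d" matches one digit and "{n}" repeats the previous token exactly n times.
DATE_PATTERN = re.compile(r"\d{4}-\d{2}-\d{2}")

text = "The draft was written on 2021-03-14 and published on 2021-04-02."
dates = DATE_PATTERN.findall(text)
print(dates)  # ['2021-03-14', '2021-04-02']
```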

Regular expressions save time because, instead of writing each rule as code, you write a small, compact search pattern that captures all the rules for the sequence you are trying to match. Let's say you want a regex to match valid email addresses.

Let's say these are the rules that an email address has to comply with to be valid:

  • No spaces
  • Only one "@" sign
  • Only one ".", placed after the "@" sign
  • Must have text before and after the "@"
  • Must have text after the "."

Something like this: raheem@dev.to

I know that the real rules for a valid email address are much more complicated (see Wikipedia's "Email address examples"), but for the sake of this post, let's follow the rules above.

Searching text for valid email addresses without using a regular expression could look like this:

def findValidEmailAddresses(text):

    validEmailAddresses = []

    # Split by whitespace, so there is no need to check for spaces later
    words = text.split()

    for word in words:

        # Check that there is exactly one "@" symbol
        if word.count("@") != 1:
            continue

        # Must have text before and after the "@" symbol
        if word.find("@") == 0 or word.find("@") == (len(word) - 1):
            continue

        # Exactly one "." symbol, and it must come after the "@" symbol
        if word.count(".") != 1 or word.find(".") < word.find("@"):
            continue

        # "." symbol cannot be directly after the "@" symbol
        if word.find("@") + 1 == word.find("."):
            continue

        # Check for text after the "." symbol ("." cannot be the last character)
        if word.find(".") == (len(word) - 1):
            continue

        # All tests passed, this word is a valid email address
        validEmailAddresses.append(word)

    # Return the email addresses
    return validEmailAddresses

print(findValidEmailAddresses("Thanks for reading my post dude. Contact me at plzdont@nooooo.com"))
# Prints ['plzdont@nooooo.com']

This is certainly a way, but let's solve the same problem with a regular expression instead.

import re

emailRegex = re.compile(r"\w+@\w+\.\w+")
print(emailRegex.findall("Thanks for reading my post dude. Contact me at plzdont@nooooo.com"))
# Prints ['plzdont@nooooo.com']

WOW! It gets me every time how slick regular expressions are! Let's break down the regex.

r"\w+@\w+\.\w+"
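  • "\w" is a character class that matches any word character: a letter, a digit or an underscore.
  • "+" is a quantifier that matches one or more of the preceding character (or character class).
  • The "@" and the "." match themselves literally; they need to be part of the string for it to match.
  • The "\" before the "." is there because "." on its own is a wildcard character (any character except a line break). The "\" "escapes" the ".", making it a literal ".", which is what we want here instead of the wildcard.
  • The "r" in front of the string makes it a Python raw string, so the backslashes are handed to the regex engine as-is instead of being treated as string escape sequences.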
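A quick way to see these pieces in action is to try them on a tiny made-up string: the unescaped "." matches any character, the escaped "\." only matches a literal dot, and "\w" covers letters, digits and the underscore but not "-" or ".".

```python
import re

sample = "abc a.c a-c"

# Unescaped ".": the wildcard matches any character except a newline.
wildcard = re.findall(r"a.c", sample)
print(wildcard)  # ['abc', 'a.c', 'a-c']

# Escaped "\.": only a literal dot matches.
literal = re.findall(r"a\.c", sample)
print(literal)  # ['a.c']

# "\w+" grabs runs of letters, digits and underscores; "-" and "." split them.
words = re.findall(r"\w+", "plz_dont-2 nooooo.com")
print(words)  # ['plz_dont', '2', 'nooooo', 'com']
```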

In English, the regular expression "\w+@\w+\.\w+" means:

  • One or more of any word character,
  • followed by an "@" symbol,
  • followed by one or more of any word character,
  • followed by a "." symbol,
  • followed by one or more of any word character.

Sweet, right? 🍬
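One caveat: findall searches for the pattern anywhere in the text, so from "a@b.c.d" it returns the partial match "a@b.c", even though that string breaks the "only one ." rule. If you want to check a single candidate against the rules listed above, one option is re.fullmatch, which only succeeds when the entire string fits the pattern. The sketch below also swaps "\w" for the negated character class "[^\s@.]" (anything except whitespace, "@" or "."), which is closer to those rules. The function name is_valid_email is hypothetical, and this still only enforces the post's simplified rules, not real-world email validation.

```python
import re

# Encodes the post's simplified rules:
#   [^\s@.]+  one or more characters that are not whitespace, "@" or "."
#   @         exactly one "@"
#   \.        exactly one literal "." after the "@", with text on both sides
SIMPLE_EMAIL = re.compile(r"[^\s@.]+@[^\s@.]+\.[^\s@.]+")

def is_valid_email(candidate):
    # fullmatch only succeeds if the WHOLE string fits the pattern,
    # so "a@b.c.d" is rejected instead of yielding the substring "a@b.c".
    return SIMPLE_EMAIL.fullmatch(candidate) is not None

print(is_valid_email("raheem@dev.to"))  # True
print(is_valid_email("a@b.c.d"))        # False
print(is_valid_email("raheem@.to"))     # False

# For comparison: the findall-based regex pulls out a partial match.
print(re.findall(r"\w+@\w+\.\w+", "a@b.c.d"))  # ['a@b.c']
```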

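Regex syntax is simple to get started with, yet expressive enough to describe the vast majority of text patterns you will want to search for (though not literally everything; arbitrarily nested brackets, for example, are beyond a true regular expression). Also, most programming languages support regular expressions, although details of the syntax differ slightly between flavors. Learn the syntax once and use it to search for character sequences almost anywhere!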

Here are some resources to help you reach regex enlightenment:

Thanks for reading my post; I hope you liked it! This is my first post on dev.to and, seriously, I was joking about "plzdont@nooooo.com". If you've got any feedback for me, please comment! What do you think about this humorous style? Should I adopt a more formal tone? More code examples? Not enough emojis? Say it all, I really appreciate it!

That's it for this one!

Top comments (1)

Raheem

I am (and was) aware of this. The wording could have been better, maybe "Let's say these are the rules that an email address has to comply with to be a valid one." I just wanted some simple rules to write a regex for, to show the ease and speed that regular expression syntax gives you. This was just an example.