DEV Community

Noor Sheikh


An Overview of Regex Expressions

In this post, I am going to give a quick overview of regular expressions. It is based on what I learned in one of my recent MS courses.

Here is a definition of a regular expression from the internet:

A regular expression (regex or regexp for short) is a special text string for describing a search pattern. You can think of regular expressions as wildcards on steroids. You are probably familiar with wildcard notations such as *.txt to find all text files in a file manager. The regex equivalent is ^.*\.txt$.
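To make the wildcard comparison concrete, here is a minimal Python sketch. It assumes the wildcard is *.txt and the regex is ^.*\.txt$; the file names are made up for illustration, and Python is only my choice of language here.

```python
import re
from fnmatch import fnmatchcase

# Hypothetical file names, only for illustration.
filenames = ["notes.txt", "report.txt", "notes.txt.bak", "image.png"]

# Wildcard (glob) matching, the way a file manager would do it.
by_wildcard = [name for name in filenames if fnmatchcase(name, "*.txt")]

# The regex equivalent: any characters, then a literal ".txt" at the end.
txt_regex = re.compile(r"^.*\.txt$")
by_regex = [name for name in filenames if txt_regex.match(name)]

print(by_wildcard)
print(by_regex)
```

Both lists should contain the same two .txt files, which is the point of the comparison.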

Let's start with an example regex:

^[A-Za-z]+[._-]?[A-Za-z0-9]*[@][A-Za-z0-9]{2,}\.[a-z]{2,6}$

Any guess what the above regex represents? If your guess is an email address, then you are right. The regex describes a pattern for an email address. It does not cover every valid email address, but let's use it as an example here.

Below are some email addresses that match the above regex:

firstlast@domain.com
first.last@domain.net
first_last@domain.us
first-last12@longdomain.online
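If you would like to check the matches yourself, here is a small Python sketch using the standard re module. The pattern is the one from this post; the last two sample strings are my own hypothetical non-matching inputs.

```python
import re

# The email pattern from this post, unchanged.
EMAIL_PATTERN = re.compile(
    r"^[A-Za-z]+[._-]?[A-Za-z0-9]*[@][A-Za-z0-9]{2,}\.[a-z]{2,6}$"
)

samples = [
    "firstlast@domain.com",
    "first.last@domain.net",
    "first_last@domain.us",
    "first-last12@longdomain.online",
    "1first@domain.com",       # hypothetical: starts with a digit
    "first..last@domain.com",  # hypothetical: two periods in a row
]

# Map each sample to True (match) or False (no match).
email_results = {s: EMAIL_PATTERN.match(s) is not None for s in samples}

for sample, ok in email_results.items():
    print(f"{sample}: {'match' if ok else 'no match'}")
```

The four addresses from the list should match, and the two hypothetical ones should not.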


Let's break down the above regex expression and compare it with the result.

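First of all, this regex begins with a ^ (caret) and ends with a $ (dollar sign). These two signs are anchors: ^ matches the start of the input and $ matches the end, so the whole input has to fit the pattern and not just a part of it. Not every regex needs them, but they are very common in validation patterns like this one.

Now, let's extract the first portion of the email before the @ sign, which we can call the username.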
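To see what the caret and dollar sign do in practice, here is a short Python sketch. The sample text and the four-digit pattern are hypothetical and only there to show the effect of the two signs.

```python
import re

# Hypothetical text that contains a four-digit run in the middle.
text = "call 234 123-4567 now"

# Without ^ and $, the pattern may match anywhere inside the text.
unanchored = re.search(r"[0-9]{4}", text)

# With ^ and $, the whole text must consist of exactly four digits.
anchored_on_sentence = re.search(r"^[0-9]{4}$", text)
anchored_on_digits = re.search(r"^[0-9]{4}$", "4567")

print(unanchored.group())          # 4567
print(anchored_on_sentence)        # None
print(anchored_on_digits.group())  # 4567
```

The same four digits are found inside the sentence only when the pattern is left open at both ends.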

[A-Za-z]+[._-]?[A-Za-z0-9]*

Let's break it further into three parts.

  1. [A-Za-z]+ this pattern represents one or more (+) letters from a to z in either upper or lower case (A-Za-z). Examples are firstlast and first from the emails above.
  2. [._-]? this part of the pattern represents an optional (?) special character: a . period, an _ underscore or a - dash (hyphen), as seen in the example emails above.
  3. [A-Za-z0-9]* finally, this part of the pattern represents zero or more (*) upper or lower case letters (A-Za-z) or digits from 0 to 9 (0-9) that follow the optional special character.
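The three parts can be made visible with named groups. This is only a sketch: the group names letters, separator and rest are my own labels, and I have added ^ and $ so the username is checked on its own.

```python
import re

# Same three parts as above, wrapped in named groups (names are hypothetical).
USERNAME = re.compile(
    r"^(?P<letters>[A-Za-z]+)(?P<separator>[._-]?)(?P<rest>[A-Za-z0-9]*)$"
)

def split_username(name):
    """Return the three parts as a dict, or None when the name does not fit."""
    m = USERNAME.match(name)
    return m.groupdict() if m else None

print(split_username("first-last12"))
print(split_username("firstlast"))
print(split_username("first..last"))
```

For first-last12 the parts are first, - and last12. For firstlast the first part takes everything, so the other two are empty strings.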

[@] denotes the at sign in the email. The square brackets are optional here; a plain @ matches the same thing.

Finally, the last portion of an email is the domain name of the provider, and it is denoted in the regex as below.

[A-Za-z0-9]{2,}\.[a-z]{2,6}

Let's break it further into three parts:

  1. [A-Za-z0-9]{2,} this part of the pattern represents 2 or more ({2,}) upper or lower case letters (A-Za-z) or digits from 0 to 9 (0-9). An example is domain in domain.com from the above list of emails.
  2. \. this part represents the period used in the domain name part of the email. Note: \ is used for escaping, because an unescaped . matches any character.
  3. [a-z]{2,6} this part represents 2 to 6 ({2,6}) lower case letters from a to z (a-z) in the last portion of the email after the period sign.
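Here is a quick Python sketch that checks the domain part on its own. I have added ^ and $ around it, and every sample other than domain.com and longdomain.online is a hypothetical input of mine.

```python
import re

# The domain part of the email pattern, anchored so it is tested on its own.
DOMAIN = re.compile(r"^[A-Za-z0-9]{2,}\.[a-z]{2,6}$")

candidates = [
    "domain.com",
    "longdomain.online",
    "d.com",            # only one character before the period
    "domain.c",         # only one letter after the period
    "domain.COM",       # upper case letters after the period
    "mail.domain.com",  # a second period (subdomain)
]

domain_results = {c: DOMAIN.match(c) is not None for c in candidates}

for candidate, ok in domain_results.items():
    print(f"{candidate}: {'match' if ok else 'no match'}")
```

Only the first two should match. The last sample shows one way the pattern is stricter than real email addresses: it has no room for subdomains.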

Explanation of regex characters:

^: matches the start of the input.
Example Usage: ^.*$ (it matches any run of characters from start to end, such as ABCabc123!#@$#%$#%)

$: matches the end of the input.
Example Usage: ^.*$ (it matches any run of characters from start to end, such as ABCabc123!#@$#%$#%)

\: escapes the next character so that it is matched literally.
Example usage: [a-z]+\.[a-z]+ (the escaped period matches a literal dot between the letters, as in abc.def)

+: indicates one or more repetitions of the preceding element.
Example usage: [a-z]+ (it matches one or more lowercase letters, such as abcdef)

*: indicates zero or more repetitions of the preceding element.
Example usage: [a-z]+[0-9]* (the digits at the end of the text are optional; abcdef123 and abcdef are both valid results)

?: indicates zero or one occurrence of the preceding element.
Example Usage: 0?[1-9] (it makes the zero optional at the beginning of a single digit; 01 and 1 are both valid results)
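The three quantifiers +, * and ? can be tried out with a few lines of Python. This is a minimal sketch; the helper name fits is hypothetical, and it uses re.fullmatch so the whole text has to fit the pattern.

```python
import re

def fits(pattern, text):
    """Return True when the whole text fits the pattern."""
    return re.fullmatch(pattern, text) is not None

# + needs at least one lowercase letter.
print(fits(r"[a-z]+", "abcdef"))           # True
print(fits(r"[a-z]+", ""))                 # False

# * allows the trailing digits to be missing.
print(fits(r"[a-z]+[0-9]*", "abcdef123"))  # True
print(fits(r"[a-z]+[0-9]*", "abcdef"))     # True

# ? allows at most one leading zero.
print(fits(r"0?[1-9]", "01"))              # True
print(fits(r"0?[1-9]", "1"))               # True
print(fits(r"0?[1-9]", "001"))             # False
```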

|: indicates or/alternative in a pattern.
Example Usage: (cat|dog) (the valid result of the expression is either cat or dog)

[]: matches any one of the characters listed inside the brackets.
Example Usage: ca[tr] (the valid result of the expression is ca followed by one of the characters inside the brackets: car or cat)

(): indicates the grouping of values in a pattern.
Example Usage: (1|2|3) (the valid result of the expression is one of the values inside the group, separated by the pipe sign: 1 or 2 or 3)
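Here is a short Python sketch for the alternative, bracket and group examples. The inputs cow and cab are hypothetical non-matching words that I added for contrast.

```python
import re

# | picks one of the alternatives inside the group.
print(re.fullmatch(r"(cat|dog)", "dog") is not None)  # True
print(re.fullmatch(r"(cat|dog)", "cow") is not None)  # False

# [] accepts exactly one of the listed characters.
print(re.fullmatch(r"ca[tr]", "car") is not None)     # True
print(re.fullmatch(r"ca[tr]", "cab") is not None)     # False

# () also captures what was matched, so it can be read back afterwards.
digit_match = re.fullmatch(r"(1|2|3)", "2")
print(digit_match.group(1))                           # 2
```

The last lines show a side effect of grouping in Python: the text matched by the first pair of parentheses is available as group(1).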

A-Z: inside brackets, indicates upper case letters from A to Z.
Example Usage: ^[A-Z][a-z]+ (it requires a capitalized first letter, as in the first names John, Mark etc.)

a-z: inside brackets, indicates lower case letters from a to z.
Example Usage: ^[A-Z][a-z]+ (John, Mark etc.)

0-9: inside brackets, indicates digits from 0 to 9.
Example Usage: ^[2-9][0-9]{2} [1-9][0-9]{2}-[0-9]{4} (it matches 340 597-1234)

A literal space or \s: indicates white space in a pattern.
Example Usage: [A-Z][a-z]+\s[A-Z][a-z]+ (it matches the space between first name and last name, as in Steve Jobs)
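The ranges and \s can be combined into a small name check. The sketch below assumes a + after each lower case range and adds ^ and $, so that a full name like Steve Jobs fits from start to end; the other inputs are hypothetical.

```python
import re

# Capitalized first name, one white space character, capitalized last name.
NAME = re.compile(r"^[A-Z][a-z]+\s[A-Z][a-z]+$")

candidates = ["Steve Jobs", "steve jobs", "SteveJobs", "Steve\tJobs"]

name_results = {c: NAME.match(c) is not None for c in candidates}

for candidate, ok in name_results.items():
    # repr() makes the tab character visible in the output.
    print(f"{candidate!r}: {'match' if ok else 'no match'}")
```

Note that the version with a tab also matches, because \s stands for any white space character and not only for a plain space.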

Bonus

US Phone Number

Pattern

^[2-9][0-9]{2}\s[1-9][0-9]{2}-[0-9]{4}$

Explanation

^[2-9][0-9]{2}: Start the expression, add initial digit between 2 and 9 followed by two additional digits between 0 and 9.
\s: Add white space.
[1-9][0-9]{2}: After white space, add a digit between 1 and 9 followed by two additional digits between 0 and 9.
-[0-9]{4}$: Add a hyphen followed by four additional digits between 0 and 9 and mark the end of the expression.

Matching Phone Number

234 123-4567
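A quick way to try the phone pattern is the Python sketch below. The pattern is the one from this post; the function name is_us_phone and all inputs other than 234 123-4567 are hypothetical.

```python
import re

# The US phone number pattern from this post, unchanged.
PHONE = re.compile(r"^[2-9][0-9]{2}\s[1-9][0-9]{2}-[0-9]{4}$")

def is_us_phone(text):
    """Return True when the text fits the phone pattern."""
    return PHONE.match(text) is not None

print(is_us_phone("234 123-4567"))  # True
print(is_us_phone("134 123-4567"))  # False, first digit must be 2 to 9
print(is_us_phone("234 023-4567"))  # False, fourth digit must be 1 to 9
print(is_us_phone("234-123-4567"))  # False, a hyphen is not white space
```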

Social Security Number

Pattern

^[0-9]{3}-[0-9]{2}-[0-9]{4}$

Explanation

^[0-9]{3}-: Start the expression and add three digits between 0 and 9 followed by a hyphen.
[0-9]{2}-: Add two digits between 0 and 9 followed by a hyphen.
[0-9]{4}$: Add four digits between 0 and 9 and mark the end of the expression.

Matching SSN

000-00-0000
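The SSN pattern can be tried the same way. In this sketch the function name has_ssn_shape is hypothetical, and so are the inputs other than 000-00-0000.

```python
import re

# The SSN pattern from this post, unchanged.
SSN = re.compile(r"^[0-9]{3}-[0-9]{2}-[0-9]{4}$")

def has_ssn_shape(text):
    """Return True when the text has the 3-2-4 digit layout."""
    return SSN.match(text) is not None

print(has_ssn_shape("000-00-0000"))  # True
print(has_ssn_shape("123-45-6789"))  # True
print(has_ssn_shape("123456789"))    # False, hyphens are required
print(has_ssn_shape("123-45-678"))   # False, last part needs four digits
```

The pattern only checks the layout of digits and hyphens, which is why 000-00-0000 counts as a match.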

Street Address

Pattern

^[1-9][0-9]{3}\s[A-Z][a-z]+\s[A-Z]([a-z]\.|[a-z]+)$

Explanation

^[1-9][0-9]{3}: Start the expression, add a digit between 1 and 9 followed by 3 more digits between 0 and 9.
\s: Add white space.
[A-Z][a-z]+: Add an upper case letter followed by one or more lower case letters.
\s: Add white space.
[A-Z]([a-z]\.|[a-z]+)$: Add an uppercase letter followed by either one lower case letter and a period, or one or more lower case letters, and mark the end of the expression. The two alternatives are grouped in parentheses so that | applies only to the ending and not to the whole pattern.

Matching Street Address

1234 Sample Street
1234 Sample St.
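Here is a Python sketch for the street address. It writes the ending as a grouped alternative, ([a-z]\.|[a-z]+), so that ^ and $ apply to the whole address; treat that grouping as my assumption about the intended behaviour. The function name is_street_address and the non-matching inputs are hypothetical.

```python
import re

# Street address pattern with the two endings grouped in parentheses.
ADDRESS = re.compile(r"^[1-9][0-9]{3}\s[A-Z][a-z]+\s[A-Z]([a-z]\.|[a-z]+)$")

def is_street_address(text):
    """Return True when the text fits the street address pattern."""
    return ADDRESS.match(text) is not None

print(is_street_address("1234 Sample Street"))  # True
print(is_street_address("1234 Sample St."))     # True
print(is_street_address("123 Sample Street"))   # False, needs four digits
print(is_street_address("1234 sample Street"))  # False, name must be capitalized
print(is_street_address("street"))              # False, not a full address
```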
