In this post, I am going to give a quick overview of regular expressions (regex). The overview is based on what I learned in one of my recent MS courses.
Here is a definition of a regular expression from the internet:
A regular expression (regex or regexp for short) is a special text string for describing a search pattern. You can think of regular expressions as wildcards on steroids. You are probably familiar with wildcard notations such as *.txt to find all text files in a file manager. The regex equivalent is ^.*\.txt$.
Let's start with an example regex:
^[A-Za-z]+[._-]?[A-Za-z0-9]*[@][A-Za-z0-9]{2,}\.[a-z]{2,6}$
Any guess what the above regex represents? If your guess is an email address, you are right. The regex describes a pattern for an email address. It does not cover every valid email address, but it works well as an example here.
Below are email addresses that match the above regex:
firstlast@domain.com
first.last@domain.net
first_last@domain.us
first-last12@longdomain.online
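To check the claim that these addresses match, here is a minimal sketch in Python using the standard `re` module. The helper name `is_email` and the two extra non-matching samples are my own additions for illustration, not part of the original pattern.

```python
import re

# The email pattern from this post, compiled once for reuse.
EMAIL_RE = re.compile(
    r"^[A-Za-z]+[._-]?[A-Za-z0-9]*[@][A-Za-z0-9]{2,}\.[a-z]{2,6}$"
)

def is_email(text: str) -> bool:
    """Return True when the text fits the email pattern (hypothetical helper)."""
    return EMAIL_RE.match(text) is not None

samples = [
    "firstlast@domain.com",
    "first.last@domain.net",
    "first_last@domain.us",
    "first-last12@longdomain.online",
    "1first@domain.com",   # starts with a digit, so it should not match
    "first@domain",        # no ".tld" part, so it should not match
]

for sample in samples:
    print(f"{sample!r:40} {is_email(sample)}")
```

The four addresses from the list print True and the last two print False.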
Let's break down the above regex and compare it with the matching addresses.
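First of all, this regex begins with a ^ caret and ends with a $ dollar sign. These two signs are anchors: ^ matches the start of the text and $ matches the end of it, so the whole text has to fit the pattern, not just a part of it. A regex does not have to use them, but validation patterns like this one usually do.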
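To see what the caret and dollar sign change in practice, here is a small Python sketch. The three-digit pattern and the sample text are made up for illustration.

```python
import re

unanchored = r"[0-9]{3}"     # three digits anywhere in the text
anchored = r"^[0-9]{3}$"     # three digits and nothing else

text = "abc12345"

# Without anchors, search() is happy with a match inside the longer text.
print(re.search(unanchored, text).group())   # 123

# With anchors, the whole text must be exactly three digits.
print(re.search(anchored, text))             # None
print(re.search(anchored, "123").group())    # 123
```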
Now, let's extract the first portion of the email before the @ sign, which we can call the username.
[A-Za-z]+[._-]?[A-Za-z0-9]*
Let's break it down further into three parts:
- [A-Za-z]+ : this part of the pattern represents one or more (+) letters from a to z in either case (A-Za-z). Examples are firstlast and first from the above emails.
- [._-]? : this part of the pattern represents an optional (?) special character: a period (.), an underscore (_) or a dash/hyphen (-), as seen in the example emails above.
- [A-Za-z0-9]* : finally, this part of the pattern represents zero or more (*) characters that are upper or lower case letters (A-Za-z) or digits from 0 to 9 (0-9). They come after the optional special character from ._-.
[@] denotes the at sign in the email.
Finally, the last portion of an email is the domain name of the provider, and it is denoted in regex as below.
[A-Za-z0-9]{2,}\.[a-z]{2,6}
Let's break it down further into three parts:
- [A-Za-z0-9]{2,} : this part of the pattern represents 2 or more ({2,}) characters that are upper or lower case letters from a to z (A-Za-z) or digits from 0 to 9 (0-9). An example is the domain part of domain.com from the above list of emails.
- \. : this part represents the period used in the domain name part of the email. Note: \ is used for escaping, because an unescaped period matches any character.
- [a-z]{2,6} : this part represents 2 to 6 ({2,6}) lower case letters from a to z (a-z) in the last portion of the email, after the period.
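The breakdown above can also be mirrored in code. The sketch below rebuilds the same email pattern from its parts in Python and uses named groups to pull the pieces out of a match. The constant names, the group names and the `split_email` helper are my own choices for illustration. I also write a plain @ instead of [@], which matches the same character.

```python
import re

# The building blocks discussed above.
USERNAME = r"[A-Za-z]+[._-]?[A-Za-z0-9]*"
DOMAIN = r"[A-Za-z0-9]{2,}"
TLD = r"[a-z]{2,6}"

# Same pattern as before, with a named group around each part.
EMAIL_RE = re.compile(
    rf"^(?P<user>{USERNAME})@(?P<domain>{DOMAIN})\.(?P<tld>{TLD})$"
)

def split_email(text: str):
    """Return the parts of a matching email as a dict, or None (hypothetical helper)."""
    match = EMAIL_RE.match(text)
    return match.groupdict() if match else None

print(split_email("first-last12@longdomain.online"))
# {'user': 'first-last12', 'domain': 'longdomain', 'tld': 'online'}
print(split_email("not an email"))
# None
```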
Explanation of regex characters:
^: matches the start of the text (or line).
Example Usage: ^.*$ (it matches a whole line made of any characters, e.g. ABCabc123!#@$#%$#%)
$: matches the end of the text (or line).
Example Usage: ^.*$ (it matches a whole line made of any characters, e.g. ABCabc123!#@$#%$#%)
\: escapes the next character so that it is matched literally.
Example Usage: [a-z]\.[a-z] (the escaped period matches a literal dot between two letters, as in the c.d of abc.def)
+: indicates one or more repetitions of the preceding element.
Example Usage: [a-z]+ (it matches one or more lowercase letters, e.g. abcdef)
*: indicates zero or more repetitions of the preceding element.
Example Usage: [a-z]+[0-9]* (the digits at the end of the text are optional, so abcdef123 and abcdef are both valid results)
?: indicates zero or one occurrence of the preceding element.
Example Usage: 0?[1-9] (it makes the zero optional at the beginning of a single-digit number, so 01 and 1 are both valid results)
|: indicates or/alternative in a pattern.
Example Usage: (cat|dog) (the valid result of expression is either cat or dog)
[]: matches any one of the characters listed inside the brackets.
Example Usage: ca[tr] (the valid result of the expression is ca followed by one of the values inside the brackets, car or cat)
(): indicates the grouping of values in a pattern.
Example Usage: (1|2|3) (the valid result of the expression is one of the values inside the group separated by the pipe sign, 1 or 2 or 3)
A-Z: inside brackets, indicates upper case letters from A to Z.
Example Usage: ^[A-Z][a-z]+ (it matches a capitalized first name such as John or Mark)
a-z: inside brackets, indicates lower case letters from a to z.
Example Usage: ^[A-Z][a-z]+ (John, Mark, etc.)
0-9: inside brackets, indicates digits from 0 to 9.
Example Usage: ^[2-9][0-9]{2} [1-9][0-9]{2}-[0-9]{4} (it matches 340 597-1234)
A literal space or \s: indicates white space in a pattern. Note that \s also matches tabs and line breaks.
Example Usage: [A-Z][a-z]+\s[A-Z][a-z]+ (it matches the space between the first name and last name in Steve Jobs)
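A quick way to get a feel for these characters is to run the example patterns against some sample text. The sketch below does that in Python with `re.fullmatch`, which only succeeds when the whole text fits the pattern. The `fits` helper and the non-matching samples are my own additions, and the name pattern is written with + after [a-z] so that it covers a full name.

```python
import re

def fits(pattern: str, text: str) -> bool:
    """Return True when the whole text fits the pattern (hypothetical helper)."""
    return re.fullmatch(pattern, text) is not None

# (pattern, sample text, expected result)
examples = [
    (r"[a-z]+", "abcdef", True),             # + : one or more
    (r"[a-z]+[0-9]*", "abcdef123", True),    # * : zero or more digits
    (r"[a-z]+[0-9]*", "abcdef", True),
    (r"0?[1-9]", "01", True),                # ? : optional leading zero
    (r"0?[1-9]", "1", True),
    (r"0?[1-9]", "10", False),
    (r"(cat|dog)", "dog", True),             # | : alternative
    (r"ca[tr]", "car", True),                # [] : one of the listed characters
    (r"ca[tr]", "cab", False),
    (r"[a-z]\.[a-z]", "c.d", True),          # \. : literal period
    (r"[a-z]\.[a-z]", "cxd", False),
    (r"[A-Z][a-z]+\s[A-Z][a-z]+", "Steve Jobs", True),
]

for pattern, text, expected in examples:
    result = fits(pattern, text)
    print(f"{pattern:28} {text!r:14} {result}")
    assert result == expected
```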
Bonus
US Phone Number
Pattern
^[2-9][0-9]{2}\s[1-9][0-9]{2}-[0-9]{4}$
Explanation
^[2-9][0-9]{2}: Start the expression, add an initial digit between 2 and 9 followed by two additional digits between 0 and 9.
\s: Represents a white space.
[1-9][0-9]{2}: After the white space, add a digit between 1 and 9 followed by two additional digits between 0 and 9.
-[0-9]{4}$: Add a hyphen followed by four additional digits between 0 and 9 and mark the end of the expression.
Matching Phone Number
234 123-4567
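As a rough sketch of how this pattern could be used in Python, the version below adds parentheses around the three digit blocks so they can be read back after a match. The grouping and the `parse_phone` helper are my own additions. The pattern itself is unchanged otherwise.

```python
import re

# Same phone pattern, with a capturing group around each block of digits.
PHONE_RE = re.compile(r"^([2-9][0-9]{2})\s([1-9][0-9]{2})-([0-9]{4})$")

def parse_phone(text: str):
    """Return (area, prefix, line) for a matching number, or None (hypothetical helper)."""
    match = PHONE_RE.match(text)
    return match.groups() if match else None

print(parse_phone("234 123-4567"))   # ('234', '123', '4567')
print(parse_phone("134 123-4567"))   # None, the first digit must be 2 to 9
print(parse_phone("234-123-4567"))   # None, a white space is expected after the first block
```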
Social Security Number
Pattern
^[0-9]{3}-[0-9]{2}-[0-9]{4}$
Explanation
^[0-9]{3}-: Start the expression and add three digits between 0 and 9 followed by a hyphen.
[0-9]{2}-: Add two digits between 0 and 9 followed by a hyphen.
[0-9]{4}$: Add four digits between 0 and 9 and mark the end of the expression.
Matching SSN
000-00-0000
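Here is a minimal Python sketch of the SSN pattern in use. Keep in mind that it only checks the format (three digits, two digits, four digits, separated by hyphens), not whether the number was ever issued. The `is_ssn_format` helper name and the failing samples are my own.

```python
import re

SSN_RE = re.compile(r"^[0-9]{3}-[0-9]{2}-[0-9]{4}$")

def is_ssn_format(text: str) -> bool:
    """Return True when the text has the NNN-NN-NNNN shape (hypothetical helper)."""
    return SSN_RE.match(text) is not None

print(is_ssn_format("000-00-0000"))   # True
print(is_ssn_format("123-45-6789"))   # True
print(is_ssn_format("123456789"))     # False, the hyphens are missing
print(is_ssn_format("123-45-678"))    # False, the last block is too short
```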
Street Address
Pattern
^[1-9][0-9]{3}\s[A-Z][a-z]+\s[A-Z]([a-z]\.|[a-z]+)$
Explanation
^[1-9][0-9]{3}: Start the expression, add a digit between 1 and 9 followed by 3 more digits between 0 and 9.
\s: Add white space.
[A-Z][a-z]+: Add an upper case letter followed by one or more lower case letters.
\s: Add white space.
[A-Z]([a-z]\.|[a-z]+)$: Add an uppercase letter followed by either one lower case letter and a period, or one or more lower case letters, and mark the end of the expression. The parentheses keep the | alternation from splitting the whole pattern in two.
Matching Street Address
1234 Sample Street
1234 Sample St.
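Below is a short Python sketch for the street address case. It assumes the last part of the pattern is written as a grouped alternation, [A-Z]([a-z]\.|[a-z]+), so that the | only applies to the ending of the street type. The `is_street_address` helper and the failing samples are my own additions.

```python
import re

# Assumption: the "St." / "Street" alternatives are grouped in parentheses.
STREET_RE = re.compile(
    r"^[1-9][0-9]{3}\s[A-Z][a-z]+\s[A-Z]([a-z]\.|[a-z]+)$"
)

def is_street_address(text: str) -> bool:
    """Return True when the text fits the street address pattern (hypothetical helper)."""
    return STREET_RE.match(text) is not None

print(is_street_address("1234 Sample Street"))   # True
print(is_street_address("1234 Sample St."))      # True
print(is_street_address("123 Sample Street"))    # False, the number needs four digits
print(is_street_address("1234 sample street"))   # False, the words must be capitalized
```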