In this post, I give a quick overview of regular expressions (regex). The review is based on what I learned in one of my recent MS courses.
Here is a definition of a regular expression from the internet:
A regular expression (regex or regexp for short) is a special text string for describing a search pattern. You can think of regular expressions as wildcards on steroids. You are probably familiar with wildcard notations such as *.txt to find all text files in a file manager. The regex equivalent is ^.*\.txt$.
Let's start with an example regex:
^[A-Za-z]+[._-]?[A-Za-z0-9]*[@][A-Za-z0-9]{2,}\.[a-z]{2,6}$
Any guess what the regex above represents? If you guessed an email address, you are right. It describes a common shape for an email address. It does not cover every valid email address, but it works well as an example here.
The following email addresses match the regex above:
firstlast@domain.com
first.last@domain.net
first_last@domain.us
first-last12@longdomain.online
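To check the list above for yourself, here is a minimal Python sketch that uses the standard re module. The rejected sample strings and the names EMAIL_PATTERN, samples, and results are my own additions for illustration, not part of the original pattern.

```python
import re

# The email pattern from this post, written as a raw string so the
# backslash reaches the regex engine unchanged.
EMAIL_PATTERN = re.compile(
    r"^[A-Za-z]+[._-]?[A-Za-z0-9]*[@][A-Za-z0-9]{2,}\.[a-z]{2,6}$"
)

# The four sample addresses, plus three strings that should be rejected.
samples = [
    "firstlast@domain.com",
    "first.last@domain.net",
    "first_last@domain.us",
    "first-last12@longdomain.online",
    "1first@domain.com",   # starts with a digit
    "first@d.com",         # domain name shorter than two characters
    "first@domain.c",      # last part shorter than two characters
]

results = {s: EMAIL_PATTERN.match(s) is not None for s in samples}

for address, ok in results.items():
    print(f"{address:32} {'match' if ok else 'no match'}")
```

The first four addresses should print match and the last three no match.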
Let's break down the regex and compare it with the matching addresses.
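First of all, the regex begins with a ^ (caret) and ends with a $ (dollar sign). These two anchors match the start and the end of the input, so the whole string, not just a part of it, has to fit the pattern. Not every regex needs them, but they are what you want when validating a complete value.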
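To see what ^ and $ change in practice, here is a small Python sketch. The three-digit pattern and the sample text are made up for this illustration.

```python
import re

digits = r"[0-9]{3}"       # three digits, anywhere in the text
anchored = r"^[0-9]{3}$"   # the whole string must be exactly three digits

text = "abc123def"

unanchored_hit = re.search(digits, text)   # finds "123" inside the text
anchored_hit = re.search(anchored, text)   # None: extra characters around it
exact_hit = re.search(anchored, "123")     # the whole string fits

print(unanchored_hit.group())
print(anchored_hit)
print(exact_hit.group())
```

Without the anchors the pattern is happy to find a match somewhere inside a longer string, which is usually not what you want when validating input.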
Now, let's extract the first portion of the email, the part before the @ sign, which can also be called the username.
[A-Za-z]+[._-]?[A-Za-z0-9]*
Let's break it further into three parts.
- [A-Za-z]+ : one or more (+) letters from a to z, upper or lower case (A-Za-z). Examples are firstlast and first from the emails above.
- [._-]? : an optional (?) special character: a . (period), _ (underscore), or - (hyphen), as seen in the example emails above.
- [A-Za-z0-9]* : zero or more (*) upper or lower case letters (A-Za-z) or digits from 0 to 9 (0-9), following the optional special character.
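One way to watch the three parts at work is to wrap each of them in a capture group and look at what it picked up. This is only a sketch: the parentheses, the added ^ and $ anchors, and the helper name split_username are my own additions.

```python
import re

# The username pattern with each of its three parts in a capture group.
# The groups are added here only to inspect the parts.
USERNAME = re.compile(r"^([A-Za-z]+)([._-]?)([A-Za-z0-9]*)$")

def split_username(name):
    """Return (letters, separator, rest), or None if the name does not match."""
    m = USERNAME.match(name)
    return m.groups() if m else None

for name in ["firstlast", "first.last", "first_last", "first-last12", "first..last"]:
    print(name, "->", split_username(name))
```

For firstlast the first group takes everything and the other two stay empty, while first..last is rejected because only one special character is allowed.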
[@] denotes the at sign in the email.
Finally, the last portion of an email is the domain name of the provider, and it is denoted as below in the regex.
[A-Za-z0-9]{2,}\.[a-z]{2,6}
Let's break it further into three parts:
- [A-Za-z0-9]{2,} : two or more ({2,}) upper or lower case letters (A-Za-z) or digits from 0 to 9 (0-9). An example is domain in domain.com from the list of emails above.
- \. : the period used in the domain name part of the email. Note: \ is used for escaping, because an unescaped . matches any character.
- [a-z]{2,6} : two to six ({2,6}) lower case letters from a to z (a-z), the last portion of the email after the period.
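The domain part can be tried on its own in the same way. In this sketch the named groups name and tld, the added anchors, and the helper split_domain are assumptions of mine for illustration.

```python
import re

# The domain pattern with its two main parts in named groups.
DOMAIN = re.compile(r"^(?P<name>[A-Za-z0-9]{2,})\.(?P<tld>[a-z]{2,6})$")

def split_domain(domain):
    """Return (name, tld), or None if the domain does not match."""
    m = DOMAIN.match(domain)
    return (m.group("name"), m.group("tld")) if m else None

for domain in ["domain.com", "longdomain.online", "domain.COM", "mail.domain.com"]:
    print(domain, "->", split_domain(domain))
```

The last two samples are rejected: the final part only allows lower case letters, and the pattern has room for just one period, so subdomains do not fit. That is one of the reasons this is not a fully valid email pattern.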
Explanation of regex characters:
^ : matches the start of the input.
Example usage: ^.$ (matches a string made of exactly one character, such as A, b, 1, or #)
$ : matches the end of the input.
Example usage: ^.$ (matches a string made of exactly one character, such as A, b, 1, or #)
\ : escapes the character that follows it, so that character is matched literally.
Example usage: [a-z]\.[a-z] (the escaped period matches a literal period between two letters, as in c.d inside abc.def)
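Here is a short Python sketch of why the backslash matters. The sample strings a.b and a1b are made up for the comparison.

```python
import re

escaped = re.compile(r"[a-z]\.[a-z]")    # a letter, a literal period, a letter
unescaped = re.compile(r"[a-z].[a-z]")   # a letter, ANY character, a letter

escaped_dot = escaped.fullmatch("a.b") is not None
escaped_digit = escaped.fullmatch("a1b") is not None
unescaped_digit = unescaped.fullmatch("a1b") is not None

print(escaped_dot)       # True
print(escaped_digit)     # False
print(unescaped_digit)   # True
```

Forgetting the backslash makes the pattern quietly accept a1b as well, because an unescaped period stands for any character.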
+ : matches one or more of the preceding element.
Example usage: [a-z]+ (matches one or more lowercase letters, such as abcdef)
* : matches zero or more of the preceding element.
Example usage: [a-z]+[0-9]* (the digits at the end of the text are optional, so abcdef123 and abcdef both match)
? : matches zero or one of the preceding element.
Example usage: 0?[1-9] (makes the zero at the beginning of a single-digit number optional, so 01 and 1 both match)
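The three quantifiers +, * and ? can be compared side by side with a few test strings. This is a sketch only: the helper whole_match and the extra non-matching strings are my own.

```python
import re

def whole_match(pattern, text):
    """True when the entire text fits the pattern."""
    return re.fullmatch(pattern, text) is not None

cases = [
    (r"[a-z]+", "abcdef"),           # + : one or more letters
    (r"[a-z]+", ""),                 # + needs at least one letter
    (r"[a-z]+[0-9]*", "abcdef123"),  # * : digits present
    (r"[a-z]+[0-9]*", "abcdef"),     # * : digits absent is fine too
    (r"0?[1-9]", "01"),              # ? : optional leading zero present
    (r"0?[1-9]", "1"),               # ? : optional leading zero absent
    (r"0?[1-9]", "001"),             # ? allows at most one zero
]

for pattern, text in cases:
    print(f"{pattern:16} {text!r:14} {whole_match(pattern, text)}")
```

Only the empty string and 001 should come back as False.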
| : indicates alternation (or) in a pattern.
Example usage: (cat|dog) (matches either cat or dog)
[] : defines a character class, which matches any one of the characters inside the brackets.
Example usage: ca[tr] (matches ca followed by one of the characters inside the brackets: car or cat)
() : groups part of a pattern.
Example usage: (1|2|3) (matches one of the values inside the group separated by the pipe sign: 1, 2, or 3)
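Alternation, character classes, and groups are easy to try with re.findall, which returns every match in a text. The sample sentences below are made up for this sketch.

```python
import re

# | inside a group: either cat or dog
animals = re.findall(r"(cat|dog)", "cat, dog, cow")

# character class: ca followed by t or r
words = re.findall(r"ca[tr]", "cat car cab can")

# a group also captures what it matched
two = re.fullmatch(r"(1|2|3)", "2")
four = re.fullmatch(r"(1|2|3)", "4")

print(animals)        # ['cat', 'dog']
print(words)          # ['cat', 'car']
print(two.group(1))   # 2
print(four)           # None
```

Note that a character class picks between single characters, while a group with | can pick between whole words.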
A-Z : inside brackets, indicates upper case letters from A to Z.
Example usage: ^[A-Z][a-z]+ (requires a capitalized first letter, as in first names such as John or Mark)
a-z : inside brackets, indicates lower case letters from a to z.
Example usage: ^[A-Z][a-z]+ (the lower case letters after the capital in John, Mark, etc.)
0-9 : inside brackets, indicates digits from 0 to 9.
Example usage: ^[2-9][0-9]{2} [1-9][0-9]{2}-[0-9]{4} (matches 340 597-1234)
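A space or \s : indicates white space in a pattern. A literal space matches only a space, while \s also matches tabs and line breaks.
Example usage: [A-Z][a-z]+\s[A-Z][a-z]+ (matches the space between a first name and a last name, as in Steve Jobs)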
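A quick Python sketch of \s with a first name and a last name. The pattern here uses [a-z]+ so that names of any length fit, and the ^ and $ anchors, the name FULL_NAME, and the helper is_full_name are my own additions.

```python
import re

# [a-z]+ lets each name be any length; \s matches one whitespace character.
FULL_NAME = re.compile(r"^[A-Z][a-z]+\s[A-Z][a-z]+$")

def is_full_name(text):
    """True when text looks like 'First Last' with one whitespace between."""
    return FULL_NAME.match(text) is not None

print(is_full_name("Steve Jobs"))    # True
print(is_full_name("Steve\tJobs"))   # True: \s also matches a tab
print(is_full_name("SteveJobs"))     # False: no whitespace
print(is_full_name("Steve  Jobs"))   # False: \s matches only one character
```

If only a plain space should be accepted, a literal space in the pattern is the stricter choice.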
Bonus
US Phone Number
Pattern
^[2-9][0-9]{2}\s[1-9][0-9]{2}-[0-9]{4}$
Explanation
^[2-9][0-9]{2} : start of the input, then one digit between 2 and 9 followed by two digits between 0 and 9.
\s : one white space character.
[1-9][0-9]{2} : after the white space, a digit between 1 and 9 followed by two digits between 0 and 9.
-[0-9]{4}$ : a hyphen followed by four digits between 0 and 9, then the end of the input.
Matching Phone Number
234 123-4567
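The phone pattern can be tried in Python like this. The helper is_us_phone and the rejected samples are assumptions of mine, added only to show where the pattern says no.

```python
import re

PHONE = re.compile(r"^[2-9][0-9]{2}\s[1-9][0-9]{2}-[0-9]{4}$")

def is_us_phone(text):
    """True when text has the shape 'NXX NXX-XXXX' described above."""
    return PHONE.match(text) is not None

print(is_us_phone("234 123-4567"))   # True
print(is_us_phone("134 123-4567"))   # False: first digit must be 2 to 9
print(is_us_phone("234 023-4567"))   # False: second block cannot start with 0
print(is_us_phone("234-123-4567"))   # False: a white space is expected, not a hyphen
```

The pattern accepts exactly one way of writing the number, so other common formats such as (234) 123-4567 would need their own pattern.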
Social Security Number
Pattern
^[0-9]{3}-[0-9]{2}-[0-9]{4}$
Explanation
^[0-9]{3}- : start of the input, then three digits between 0 and 9 followed by a hyphen.
[0-9]{2}- : two digits between 0 and 9 followed by a hyphen.
[0-9]{4}$ : four digits between 0 and 9, then the end of the input.
Matching SSN
000-00-0000
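One Python detail is worth knowing when validating with $: by default it also matches just before a newline at the very end of the string. The sketch below shows the difference between re.match and re.fullmatch with the SSN pattern. The variable names are my own.

```python
import re

SSN = r"^[0-9]{3}-[0-9]{2}-[0-9]{4}$"

plain = re.match(SSN, "000-00-0000")
no_hyphens = re.match(SSN, "000000000")

# $ tolerates one trailing newline, fullmatch does not
with_newline_match = re.match(SSN, "000-00-0000\n")
with_newline_full = re.fullmatch(SSN, "000-00-0000\n")

print(plain is not None)                # True
print(no_hyphens is not None)           # False
print(with_newline_match is not None)   # True
print(with_newline_full is not None)    # False
```

The pattern checks only the shape of the number, not whether such a number was ever issued.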
Street Address
Pattern
^[1-9][0-9]{3}\s[A-Z][a-z]+\s[A-Z]([a-z]\.|[a-z]+)$
Explanation
^[1-9][0-9]{3} : start of the input, then a digit between 1 and 9 followed by three more digits between 0 and 9.
\s : one white space character.
[A-Z][a-z]+ : an upper case letter followed by one or more lower case letters.
\s : one white space character.
[A-Z]([a-z]\.|[a-z]+)$ : an upper case letter followed by either one lower case letter and a period, or one or more lower case letters, then the end of the input.
Matching Street Address
1234 Sample Street
1234 Sample St.
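Here is a Python sketch for the street address. The pattern in this sketch follows the written explanation and puts the two possible endings inside parentheses, so that the | applies only to them. The helper is_street_address and the rejected samples are my own assumptions.

```python
import re

# The two endings are grouped: one letter plus a period, or one or more letters.
STREET = re.compile(r"^[1-9][0-9]{3}\s[A-Z][a-z]+\s[A-Z]([a-z]\.|[a-z]+)$")

def is_street_address(text):
    """True when text looks like '1234 Name Street' or '1234 Name St.'."""
    return STREET.match(text) is not None

print(is_street_address("1234 Sample Street"))   # True
print(is_street_address("1234 Sample St."))      # True
print(is_street_address("0234 Sample Street"))   # False: first digit must be 1 to 9
print(is_street_address("123 Sample Street"))    # False: four digits are required
print(is_street_address("1234 Sample"))          # False: the street type is missing
```

Without the parentheses, the | would split the entire pattern into two alternatives instead of just the ending, which is an easy mistake to make with alternation.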