DEV Community

Ugbem Job
Ugbem Job

Posted on

Day 11: Regular Expressions (Regex)

What I learnt

  • Regular Expressions
  • Regex Syntax
  • Regex functions/methods
  • Regex Flags
  • Regular Expression Character

Regular Expressions

Regular Expressions (Regex) are used in text data to match patterns. They are used to find specific patterns in strings or collections of strings. They may also be used to search for (find), change, and modify text data. They can also be utilized to validate user input like email addresses and phone numbers.

We need to import the re module to use regex in python.

Regex Syntax

Before we get into the syntax, we must understand that regex is a language. It has its own syntax and rules.
Regex is found in python and other programming languages like Java, C, JavaScript, PHP, etc. So if you have used regex in any of these languages, you will find that the syntax is the same.

Below are some of the most common syntax that makes up regex:

  • . - any character except newline
  • ^ - start of string
  • $ - end of string
  • * - 0 or more occurrences of the preceding character
  • + - 1 or more occurrences of the preceding character
  • ? - 0 or 1 occurrences of the preceding character
  • {3} - exactly 3 occurrences of the preceding character
  • {3,4} - 3 to 4 occurrences of the preceding character
  • {3,} - 3 or more occurrences of the preceding character
  • [a-z] - any lowercase letter
  • [A-Z] - any uppercase letter
  • [0-9] - any digit
  • [0-9a-zA-Z] - any alphanumeric character (combination of the 3 syntaxes above)
  • [aeiou] - any vowel
  • [^aeiou] - any non-vowel (opposite of [aeiou])
  • [0-9]{3} - any 3 digit number
  • [0-9]{3,4} - any 3 or 4 digit number
  • [a-zA-Z]{8} - any 8 letter word
  • [a-zA-Z]{8,} - any word with 8 or more letters
  • \w - any alphanumeric character
  • \w+ - any alphanumeric character followed by 1 or more alphanumeric characters
  • \W - any non-alphanumeric character (opposite of \w)
  • \s - any whitespace character
  • \S - any non-whitespace character (opposite of \s)
  • \d - any digit
  • \D - any non-digit (opposite of \d)
  • \b - any word boundary
  • \B - any non-word boundary

Regex functions/methods

  • re.compile() - returns a regex pattern object, which can be used to match all occurences of a pattern
  • re.match() - returns a match object if the text matches the pattern
  • re.search() - returns a match object if there is a match anywhere in the text
  • re.findall() - returns a list of strings containing all matches
  • re.split() - returns a list of strings where the string has been split at each match
  • re.sub() - replaces one or many matches with a string

Regex Flags

  • re.I - ignore case (uppercase and lowercase)
  • re.M - multi-line (newline)
  • re.S - dot matches all (newline)
  • re.U - unicode (Python 3 only)
  • re.X - verbose (ignore whitespace and comments)

Example

import re

"""
The search() method returns a match object if there is a match anywhere in the string. If there is more than one match, only the first occurrence of the match will be returned.
"""

string = 'The quick brown fox jumps over the lazy dog.'
search = re.search(r'fox', string, re.I) # r - raw string

print(search) # <re.Match object; span=(16, 19), match='fox'>

print(search.span()) # (16, 19) # returns a tuple containing the start-, and end positions of the match.

print(search.start()) # 16 # returns the start position of the match.

print(search.end()) # 19 # returns the end position of the match.

print(search.group()) # fox # returns the part of the string where there was a match. This is more useful when there's multiple searches.

print(re.search('Fox', string)) # None # returns None if no match was found.

"""
The compile() method returns a regex pattern object, which can be used to match all occurrences of a pattern.
"""

pattern = re.compile('fox')

print(pattern.search(string)) # <re.Match object; span=(16, 19), match='fox'>

"""
The findall() method returns a list of strings containing all matches.

"""

print(pattern.findall(string))

"""
The match() method returns a match object if the text matches the pattern. If there is more than one match, only the first occurrence of the match will be returned.

"""

print(pattern.match(string)) # None # returns None if no match was found.

Enter fullscreen mode Exit fullscreen mode

To read more about regex, visit the W3schools docs here.

Also, you can practice regex on Regex101 as it will help you understand regex more.

Regular Expression Character

  • . - Any character (except newline character)
  • \d - Digit (0-9)
  • \D - Not a digit (0-9)
  • \w - Word character (a-z, A-Z, 0-9, _) - any alphanumeric character

Example


string = 'Hey how are you!'
pattern = re.compile(r"([a-zA-Z]).([a])") # r - raw string # . - any character except newline # () - capturing group # [a-zA-Z] - any lowercase letter or uppercase letter  # [a] - any vowel

search_re = pattern.search(string)
print(search_re.group())

# More Examples
# 1. Validate email address using regex

pattern = re.compile(
r"([a-zA-Z0-9.-]+)@([a-zA-Z-]+)\.([a-zA-Z]{2,5})")
email = 'thisisjustatestemail@gmail.com'
valid_email = pattern.search(email)
print(valid_email.group())



# 2. Validate phone number using regex

pattern = re.compile(r"([0-9]{3})-([0-9]{3})-([0-9]{4})")
phone_number = '234-904-112-0034'
valid_phone_number = pattern.search(phone_number)
print(valid_phone_number.group())


# 3. Validate password using regex (password must be at least 8 characters long)

password = 'Passw@#12!ord'
pattern = re.compile(r"[A-Za-z0-9@#$%&!]{8,}")

"""
- fullmatch() - returns a match object if the string matches the pattern
"""

try:
valid_password = pattern.fullmatch(password)
print(valid_password.group())
except AttributeError:
print('Invalid password. Password must be at least 8 characters long')


# 4. Validate URL using regex

url = 'https://dev.to'
pattern = re.compile(r"https?://(www\.)?(\w+)(\.\w+)") # ? - optional # \ - escape character # \w - any alphanumeric character # \. - any period character
valid_url = pattern.fullmatch(url)
valid_url2 = pattern.search(url)
print(valid_url.group())
print(valid_url2.group())
Enter fullscreen mode Exit fullscreen mode

Conclusion

In this article, we have learned about some regular expressions (regex) and how to use it in Python.
We have also learned about the different regex functions/methods and flags. We have also learned about the different regex characters. We have also learned how to validate email addresses, phone numbers, passwords, and URLs using regex as examples.

I hope you found this article helpful and if you have any questions, please leave a comment below.
I'd love to hear from you, and please share this post with your friends and follow me on my journey here on Dev.to for more articles like this.

Thank you for taking the time to read this.

References

If you found this article helpful, please leave a comment and share with your friends.

Top comments (0)