Regular expressions, often referred to as regex or regexp, are powerful tools for pattern matching and text manipulation. Python provides a built-in module called re that allows you to work with regular expressions.
To start using regular expressions in Python, you need to import the re module:
import re
Now let’s dive into the details of using regular expressions in Python:
Matching Patterns: The re module provides several functions for matching patterns:
re.search(pattern, string): Scans the string for the first location where the pattern matches and returns a match object, or None if there is no match.
re.match(pattern, string): Checks whether the pattern matches at the beginning of the string and returns a match object, or None otherwise.
re.findall(pattern, string): Returns a list of all non-overlapping matches of the pattern in the string.
re.finditer(pattern, string): Returns an iterator yielding match objects for all matches.
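The difference between these four functions is easiest to see side by side. The following is a minimal sketch; the sample string and the variable names are made up for illustration, and the pattern is just the literal text "at".

```python
import re

text = "cat, bat, and rat"

# re.search scans the whole string and stops at the first match.
first = re.search("at", text)
print(first.group(), first.start())   # at 1

# re.match only succeeds when the pattern matches at position 0.
at_start = re.match("at", text)
print(at_start)                        # None
from_start = re.match("cat", text)
print(from_start.group())              # cat

# re.findall returns the matched substrings as a list of strings.
all_hits = re.findall("at", text)
print(all_hits)                        # ['at', 'at', 'at']

# re.finditer yields match objects, so positions are available as well.
positions = [m.start() for m in re.finditer("at", text)]
print(positions)                       # [1, 6, 15]
```

Since re.search() and re.match() return None when nothing matches, it is usually safest to check the result before calling methods on it.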
Basic Patterns:
Literal characters: You can search for literal characters by including them directly in the pattern.
Metacharacters: These are special characters that have special meanings in regular expressions, such as ., *, +, ?, ^, $, \, [ ], |, (, and ).
Character classes: You can define custom character classes using square brackets [ ]. For example, [a-z] matches any lowercase letter.
Predefined character classes: There are several shorthand character classes like \d (digits), \w (word characters), \s (whitespace characters), and their negations \D, \W, \S, respectively.
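As a rough illustration of custom and predefined character classes, the sketch below runs a few of them against one sample string. The string and the variable names are assumptions chosen for the example.

```python
import re

sample = "Room 42, Bay_7!"

# [a-z] matches one lowercase letter at a time.
lowercase = re.findall(r"[a-z]", sample)
print(lowercase)   # ['o', 'o', 'm', 'a', 'y']

# A custom class can list individual characters, here the vowels.
vowels = re.findall(r"[aeiou]", sample)
print(vowels)      # ['o', 'o', 'a']

# \d matches a digit, \s a whitespace character.
digits = re.findall(r"\d", sample)
print(digits)      # ['4', '2', '7']
spaces = re.findall(r"\s", sample)
print(spaces)      # [' ', ' ']

# \W is the negation of \w: anything that is not a letter, digit, or underscore.
non_word = re.findall(r"\W", sample)
print(non_word)    # [' ', ',', ' ', '!']
```

Note that the underscore in "Bay_7" does not show up in the \W result, because \w treats it as a word character.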
Quantifiers:
*: Matches zero or more occurrences of the preceding element.
+: Matches one or more occurrences of the preceding element.
?: Matches zero or one occurrence of the preceding element.
{n}: Matches exactly n occurrences of the preceding element.
{n,}: Matches at least n occurrences of the preceding element.
{n,m}: Matches at least n and at most m occurrences of the preceding element.
Anchors:
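^: Matches the beginning of the string, and also the beginning of each line when the re.MULTILINE flag is set.
$: Matches the end of the string (or just before a trailing newline), and also the end of each line when the re.MULTILINE flag is set.
Groups and Capturing: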
( ): Creates a group. Groups can be used for capturing substrings and applying quantifiers.
\number: Backreference to a captured group by its number.
Special Sequences:
\b: Matches a word boundary.
\B: Matches a non-word boundary.
\d: Matches any decimal digit.
\s: Matches any whitespace character.
\w: Matches any alphanumeric character and underscore.
Flags: The re module supports optional flags that modify the behavior of the regular expression matching. Some common flags include:
re.IGNORECASE or re.I: Performs case-insensitive matching.
re.MULTILINE or re.M: Allows the ^ and $ anchors to match at the beginning and end of each line.
re.DOTALL or re.S: Allows the . metacharacter to match any character, including a newline.
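The flags above, and their effect on the ^ and $ anchors, can be sketched as follows. The three-line sample text and the variable names are invented for this example; flags are combined with the | operator.

```python
import re

text = "Error: disk full\nerror: retry\nok"

# No flags: ^ anchors only at the very start, and matching is case-sensitive.
plain = re.findall(r"^error", text)
print(plain)         # []

# re.IGNORECASE lets "error" match "Error" at the start of the string.
ignore_case = re.findall(r"^error", text, re.IGNORECASE)
print(ignore_case)   # ['Error']

# re.MULTILINE lets ^ match at the start of every line.
multi = re.findall(r"^error", text, re.MULTILINE)
print(multi)         # ['error']

# Flags can be combined with the bitwise OR operator.
both = re.findall(r"^error", text, re.I | re.M)
print(both)          # ['Error', 'error']

# $ with re.MULTILINE matches at the end of every line.
line_ends = re.findall(r"\w+$", text, re.MULTILINE)
print(line_ends)     # ['full', 'retry', 'ok']

# By default . does not match a newline; re.DOTALL changes that.
no_dotall = re.search(r"full.error", text)
with_dotall = re.search(r"full.error", text, re.DOTALL)
print(no_dotall)               # None
print(with_dotall is not None) # True
```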
Here’s a simple example that demonstrates some of the concepts mentioned above:
import re

pattern = r"gr.y"
string = "The gray cat is sitting on the green mat."

# Find the first match anywhere in the string.
match = re.search(pattern, string)
if match:
    print("Match found:", match.group())
else:
    print("No match found.")

# Collect every non-overlapping match as a list.
matches = re.findall(pattern, string)
print(matches)
Output:
Match found: gray
['gray']
Explanation:
The regular expression pattern gr.y matches any four-character sequence starting with "gr" and ending with "y", where the dot . represents any single character (except a newline, unless re.DOTALL is set).
The re.search() function is used to find the first occurrence of the pattern in the string.
Since the string contains the word "gray," which matches the pattern, the re.search() function returns a match object.
The match.group() method retrieves the matched substring.
Therefore, the output is "Match found: gray".
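Besides group(), a match object also records where the match was found. The sketch below reuses the pattern and string from the example and adds one hypothetical string, "green", to show the no-match case.

```python
import re

string = "The gray cat is sitting on the green mat."
match = re.search(r"gr.y", string)

if match is not None:
    print(match.group())   # gray
    print(match.start())   # 4
    print(match.end())     # 8
    print(match.span())    # (4, 8)
    # Slicing the original string with the span gives the same text.
    print(string[match.start():match.end()])   # gray

# When nothing matches, re.search returns None instead of a match object.
no_match = re.search(r"gr.y", "green")
print(no_match)            # None
```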
Note: If you want to find all occurrences of the pattern in the string, you can use the re.findall() function. In this case, the output would be:
['gray']
This returns a list containing all the non-overlapping matches of the pattern in the string.
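The quantifiers, groups, backreferences, and word boundaries described earlier in this article can be tried out in the same way. This is only a small sketch; all sample strings and variable names are made up for illustration.

```python
import re

# ? makes the preceding element optional.
optional = re.findall(r"colou?r", "color and colour")
print(optional)       # ['color', 'colour']

# + requires at least one "o", * allows zero.
one_or_more = re.findall(r"go+gle", "ggle gogle google")
zero_or_more = re.findall(r"go*gle", "ggle gogle google")
print(one_or_more)    # ['gogle', 'google']
print(zero_or_more)   # ['ggle', 'gogle', 'google']

# {2,3} with \b on both sides: whole numbers of two or three digits only.
numbers = re.findall(r"\b\d{2,3}\b", "7 42 123 4567")
print(numbers)        # ['42', '123']

# Parentheses capture parts of the match; groups are numbered from 1.
date = re.search(r"(\d{4})-(\d{2})-(\d{2})", "Released on 2024-05-01")
print(date.group(1))  # 2024
print(date.groups())  # ('2024', '05', '01')

# \1 refers back to whatever group 1 captured, here a repeated word.
repeated = re.search(r"\b(\w+) \1\b", "this is is a test")
print(repeated.group())   # is is
```

One thing to keep in mind: when a pattern contains groups, re.findall() returns the captured groups rather than the full matches.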