What is Regex?
Regex, short for regular expressions, is a way to determine whether a string, or part of a string, fits a certain pattern. It is powerful and concise, but the syntax is not very intuitive. In this article, I will give a brief rundown of how to use regex in Python, along with a basic dictionary of the symbols used.
Startup
In order to use regex in Python, you first need to import the module with import re. Once you've done this, you're good to go!
Search
The first function that we'll go over is search. This function takes two parameters: the pattern to look for, and the string to look for it in. The full syntax is <match> = re.search(<pattern>, <string>). If this function finds a match, it returns a Match object representing the first substring that fits the pattern; otherwise it returns None. You can get the matching substring with <match>.group(0), and the start and end positions of the substring as a tuple with <match>.span(). For example, re.search("ra", "abracadabra").span() will return (2, 4). If you use groupings in your pattern (see below), you can use <match>.group(<n>), where n is the number of the group you want to access.
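As a small sketch of the behaviour described above, the snippet below runs the same "ra" search and also shows the None case. The variable names and the "xyz" pattern are illustrative only and are not part of the re module.

```python
import re

# Look for the first occurrence of "ra" in the string.
match = re.search("ra", "abracadabra")

# search returns None when nothing matches, so check before using the result.
if match is not None:
    print(match.group(0))  # ra
    print(match.span())    # (2, 4)

# A pattern that does not occur anywhere in the string gives None.
missing = re.search("xyz", "abracadabra")
print(missing)  # None
```

Checking for None before calling .group() or .span() avoids an AttributeError when the pattern is not found.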
Find All
Findall will give you all the substrings that match the pattern. re.findall(<pattern>, <string>) will return a list of all the non-overlapping substrings in the string that match the pattern. For example, re.findall("a.{1,2}a", "abracadabra") will return ["abra", "ada"]. Note that it doesn't find "aca" nor the second "abra", since they overlap with the substrings already found.
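The example above can be tried directly, as in this minimal sketch. The second half uses re.finditer, which is not covered in this article; it is included here only as one possible way to get the positions of each match as well.

```python
import re

text = "abracadabra"

# All non-overlapping matches, returned as plain strings.
found = re.findall("a.{1,2}a", text)
print(found)  # ['abra', 'ada']

# re.finditer yields Match objects instead, so spans are available too.
spans = [m.span() for m in re.finditer("a.{1,2}a", text)]
print(spans)  # [(0, 4), (5, 8)]
```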
Split
Split is used when you want to break up your original string. re.split(<pattern>, <string>) will return a list of substrings, split wherever the pattern matches in the string. For example, re.split("ra", "abracadabra") will return ["ab", "cadab", ""]. The empty string at the end is there because the original string ends with a match.
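Here is a short sketch of split in action. The whitespace example and the list comprehension that drops empty strings are illustrative additions, not something the article prescribes.

```python
import re

# Split wherever "ra" occurs; a match at the very end leaves an empty string.
parts = re.split("ra", "abracadabra")
print(parts)  # ['ab', 'cadab', '']

# One simple way to drop empty pieces if they are not wanted.
non_empty = [p for p in parts if p]
print(non_empty)  # ['ab', 'cadab']

# Splitting on runs of whitespace (spaces and tabs here).
words = re.split(r"\s+", "one  two\tthree")
print(words)  # ['one', 'two', 'three']
```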
Substitute
Sub is used when you want to replace parts of your original string with something else. re.sub(<pattern>, <replacement>, <string>) will replace all the substrings in string that match the pattern with replacement. For example, re.sub("ra", "lo", "abracadabra") will return "ablocadablo".
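The replacement above looks like this in practice. The second call uses the optional count argument of re.sub, which is not discussed in this article, to limit how many replacements are made.

```python
import re

# Replace every occurrence of "ra" with "lo".
result = re.sub("ra", "lo", "abracadabra")
print(result)  # ablocadablo

# With count=1, only the first match is replaced.
first_only = re.sub("ra", "lo", "abracadabra", count=1)
print(first_only)  # ablocadabra
```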
Special Characters
There are many special characters that can be used in the pattern in order to make your searches more powerful. Here is a list of some of the more widely used ones:
- ^: Start of the string
- $: End of the string
- []: Will match any single character inside the square brackets. Ranges can be given with -, and a leading ^ will negate the set, matching anything except what's inside. Examples:
  - [abc] will match "a", "b", or "c"
  - [^abc] will match any character except "a", "b", and "c"
  - [4-7a-f] will match any digit between 4 and 7, and any lowercase letter between a and f (inclusive). The lower end of a range must come first: a reversed range such as f-a is an error
- .: Wildcard. This will match any character except a newline
- \d: Any digit. For ASCII text, the same as [0-9]
- \D: Any non-digit. For ASCII text, the same as [^0-9]
- \s: Any whitespace character, such as spaces, tabs, and newlines
- \S: Any non-whitespace character
- \w: Any "word" character: numbers, letters, and _ (underscore)
- \W: Any non-"word" character
- *: Zero or more repetitions of the expression before. For example, a*b will match "b", "ab", "aab", "aaab", etc.
- +: One or more repetitions of the expression before. For example, a+b will match "ab", "aab", "aaab", etc.
- ?: Zero or one occurrence of the expression before. For example, a?b will match "b" and "ab"
- {}: Used to match a specific number of repetitions:
  - {n}: will match exactly n repetitions:
    - a{3}b will match "aaab"
  - {n,}: will match n or more repetitions:
    - a{3,}b will match "aaab", "aaaab", etc.
  - {n,m}: will match between n and m repetitions, inclusive:
    - a{1,3}b will match "ab", "aab", and "aaab"
- \: Will escape the next character, allowing you to search for special characters. For example, \*\? will search for "*?"
- |: "Or" function: will match the expression on either side. For example, a|b will match "a" and "b"
- (): Will "group" the expression inside the parentheses, either for capturing with the functions above, or for use with the repetition or | symbols. If you don't want to capture, use (?:...) instead
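To see a few of these symbols working together, here is a minimal sketch. The sample strings and variable names are made up for illustration, and the use of re.fullmatch (which requires the whole string to fit the pattern) and of raw strings (the r"..." prefix, so backslashes reach the regex engine untouched) are choices made for this example rather than anything required by the list above.

```python
import re

# Character sets and shorthand classes.
letters = re.findall("[abc]", "cabbage")
print(letters)  # ['c', 'a', 'b', 'b', 'a']
numbers = re.findall(r"\d+", "room 42, floor 7")
print(numbers)  # ['42', '7']

# Anchors: ^ ties the pattern to the start, $ to the end.
starts = bool(re.search("^abra", "abracadabra"))
not_start = bool(re.search("^cad", "abracadabra"))
ends = bool(re.search("bra$", "abracadabra"))
print(starts, not_start, ends)  # True False True

# Quantifiers: a{1,3}b accepts one to three a's followed by b.
checks = {s: bool(re.fullmatch("a{1,3}b", s)) for s in ["b", "ab", "aaab", "aaaab"]}
print(checks)  # {'b': False, 'ab': True, 'aaab': True, 'aaaab': False}

# Escaping: \* and \? match the literal characters * and ?.
escaped = re.search(r"\*\?", "what*?").group(0)
print(escaped)  # *?

# Alternation: match either word.
pets = re.findall("cat|dog", "cat, dog, bird")
print(pets)  # ['cat', 'dog']

# Capturing groups are numbered from 1; group(0) is the whole match.
m = re.search(r"(\w+)@(\w+)\.com", "contact: alice@example.com")
print(m.group(0), m.group(1), m.group(2))  # alice@example.com alice example

# (?:...) groups without capturing, so only (c) is reported as a group.
n = re.search("(?:ab)+(c)", "ababc")
print(n.group(0), n.groups())  # ababc ('c',)
```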
Sources:
https://docs.python.org/3/library/re.html
https://www.w3schools.com/python/python_regex.asp
Useful Links:
https://regex101.com/
https://www.rexegg.com/regex-quickstart.html
https://xkcd.com/1313/
https://alf.nu/RegexGolf