DEV Community

Erlan Akbaraliev

Regular expressions in Python

Start by importing Python's built-in re module:

import re

word = 'word_123 !'
match = re.fullmatch(r'.*', word)
print(bool(match))
# result: True

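. matches any single character except a newline, and * repeats the preceding element zero or more times, so .* matches any string that contains no newline (including the empty string).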
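To see the newline caveat in action, here is a small sketch. The two-line sample string is made up for illustration; re.DOTALL is the standard flag that lets . match newlines too.

```python
import re

# Sample strings; the two-line one is made up for this illustration.
one_line = 'word_123 !'
two_lines = 'first line\nsecond line'

# '.' does not match '\n', so '.*' cannot cover a string containing a newline.
matches_one_line = bool(re.fullmatch(r'.*', one_line))
matches_two_lines = bool(re.fullmatch(r'.*', two_lines))

# re.DOTALL makes '.' match newlines as well.
matches_with_dotall = bool(re.fullmatch(r'.*', two_lines, flags=re.DOTALL))

# '*' permits zero repetitions, so even the empty string matches.
matches_empty = bool(re.fullmatch(r'.*', ''))

print(matches_one_line, matches_two_lines, matches_with_dotall, matches_empty)
# True False True True
```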


  1. Meta characters and special sequences
  2. Functions of module re

1. Meta characters and special sequences

Meta characters

| Char | Description | Example | Matches |
| --- | --- | --- | --- |
| `.` | Matches any single character (except newline). | `a.b` | acb, a1b, a#b |
| `^` | Matches the start of the string. | `^Start` | Start of the sentence |
| `$` | Matches the end of the string. | `End$` | This is the End |
| `*` | Matches zero or more occurrences of the preceding element. | `ab*c` | ac, abc, abbbc |
| `+` | Matches one or more occurrences of the preceding element. | `ab+c` | abc, abbbc, but not ac |
| `?` | Matches zero or one occurrence of the preceding element (makes it optional). Placed after another quantifier, it makes that quantifier lazy. | `colou?r` | color, colour |
| `{m,n}` | Matches at least m and at most n occurrences of the preceding element. | `a{2,3}` | aa, aaa |
| `()` | Groups elements together and creates a capturing group. | `(abc)+` | abc, abcabc |
| `[]` | Defines a character set, matching any single character contained within the brackets. | `[aeiou]` | a, e, i, o, or u |
| `\` | Escapes the following character so it is treated literally, or starts a special sequence (e.g., \d). | `\$100` | $100 |
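The rows above can be checked with a short script. This is only a sketch: the patterns come from the table, while the non-matching strings are my own picks for contrast.

```python
import re

# Each entry: (pattern, text that should match, text that should not).
# Patterns are taken from the table; the non-matching texts are illustrative.
cases = [
    (r'a.b', 'a#b', 'ab'),
    (r'ab*c', 'ac', 'adc'),
    (r'ab+c', 'abbbc', 'ac'),
    (r'colou?r', 'color', 'colouur'),
    (r'a{2,3}', 'aaa', 'a'),
    (r'(abc)+', 'abcabc', 'abcab'),
    (r'[aeiou]', 'e', 'x'),
    (r'\$100', '$100', '100'),
]

results = []
for pattern, good, bad in cases:
    matched_good = re.fullmatch(pattern, good) is not None
    matched_bad = re.fullmatch(pattern, bad) is not None
    results.append((pattern, matched_good, matched_bad))
    print(pattern, matched_good, matched_bad)

# ^ and $ tie the pattern to the start and end of the string.
starts = re.match(r'^Start', 'Start of the sentence') is not None
ends = re.findall(r'End$', 'This is the End') == ['End']
print(starts, ends)
```

Every row should print True for the matching text and False for the other one.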

Special sequences

| Sequence | Meaning | Example | Matches |
| --- | --- | --- | --- |
| `\d` | Matches any digit (0-9). | `\d\d` | 12, 05, 99 |
| `\D` | Matches any non-digit character. | `\D+` | Hello, ?!@, (a space) |
| `\w` | Matches any word character (letters, digits, and underscore, but not symbols like !, ?, &). | `\w+` | user_name, File1, _test |
| `\W` | Matches any non-word character (punctuation, space, symbol). | `\W` | !, @, #, (a space) |
| `\s` | Matches any whitespace character (space, tab, newline, etc.). | `\s` | a space, a tab (\t), a newline (\n) |
| `\S` | Matches any non-whitespace character. | `\S+` | word!, 123, (text) |
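Here is a quick sketch of the special sequences applied to a single string. The sample text is made up for this example.

```python
import re

# Hypothetical sample text, made up for this example.
text = 'user_name = File1, tel: 555 0199!'

digits = re.findall(r'\d+', text)      # runs of digits
words = re.findall(r'\w+', text)       # runs of letters, digits, underscores
non_space = re.findall(r'\S+', text)   # runs of non-whitespace characters
spaces = re.findall(r'\s', text)       # individual whitespace characters
non_word = re.findall(r'\W', 'a!b c')  # individual non-word characters

print(digits)       # ['1', '555', '0199']
print(words)        # ['user_name', 'File1', 'tel', '555', '0199']
print(non_space)    # ['user_name', '=', 'File1,', 'tel:', '555', '0199!']
print(len(spaces))  # 5
print(non_word)     # ['!', ' ']
```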

2. Functions of module re

re.fullmatch - checks for a match against the entire string
re.match - checks for a match only at the beginning (start) of the string

import re

word = '123_word !'
match = re.fullmatch(r'\d+', word)
print(match)
# result: None
# The entire word doesn't match the pattern
re.match succeeds on the same string, because only the beginning has to match the pattern:
import re

word = '123_word !'
match = re.match(r'\d+', word)
print(match)
# result: <re.Match object; span=(0, 3), match='123'>
# The characters at indices 0-2 (span 0 to 3, end exclusive) at the start of the string match the pattern

word2 = 'word_123 !'
match = re.match(r'\d+', word2)
print(match)
# result: None
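One way the difference might be put to use: re.fullmatch for validating a whole input, re.match for pulling a prefix off the front. The helper names is_all_digits and leading_digits are hypothetical, chosen just for this sketch.

```python
import re

def is_all_digits(text):
    """Return True only if the whole string consists of digits (hypothetical helper)."""
    return re.fullmatch(r'\d+', text) is not None

def leading_digits(text):
    """Return the digits at the start of the string, or None (hypothetical helper)."""
    m = re.match(r'\d+', text)
    # A Match object is truthy; group() returns the matched text.
    return m.group() if m else None

print(is_all_digits('123'))          # True
print(is_all_digits('123_word !'))   # False
print(leading_digits('123_word !'))  # 123
print(leading_digits('word_123 !'))  # None
```

Both functions return None when nothing matches, so always check the result before calling group().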

re.findall() - finds and returns all matching occurrences in a list

import re

word = 'word_123 !'
match = re.findall(r'\d+', word)
print(match)
# result: ['123']
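When the string holds several matches, re.findall returns all of them as strings, and an empty list when there are none. A small sketch, with a sample sentence picked for illustration:

```python
import re

# Sample sentence picked for illustration.
text = 'On 12th Jan 2016, at 11:02 AM'

numbers = re.findall(r'\d+', text)
print(numbers)     # ['12', '2016', '11', '02']

# The results are strings; convert them if you need numbers.
as_ints = [int(n) for n in numbers]
print(as_ints)     # [12, 2016, 11, 2]

# No match gives an empty list, not None.
none_found = re.findall(r'\d+', 'no digits here')
print(none_found)  # []
```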

re.split() - splits a string wherever the pattern matches

import re

word = 'Words, words , Words'
match = re.split(r'\W+', word)
print(match)
# result: ['Words', 'words', 'Words']
# \W matches any non-word character

word2 = 'On 12th Jan 2016, at 11:02 AM'
match = re.split(r'\d+', word2)
print(match)
# result: ['On ', 'th Jan ', ', at ', ':', ' AM']
# split the string by numbers
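A few re.split behaviours are worth knowing beyond the basics. The sketch below uses the standard maxsplit parameter and a capturing group; the input strings are only illustrative.

```python
import re

# maxsplit limits the number of splits; the rest of the string stays intact.
first_only = re.split(r'\W+', 'Words, words , Words', maxsplit=1)
print(first_only)       # ['Words', 'words , Words']

# A separator at the very end leaves an empty string in the result.
trailing = re.split(r'\W+', 'Words, words!')
print(trailing)         # ['Words', 'words', '']

# A capturing group in the pattern keeps the separators in the list.
with_separators = re.split(r'(\d+)', 'On 12th Jan 2016')
print(with_separators)  # ['On ', '12', 'th Jan ', '2016', '']
```

If the empty strings get in the way, filter them out with a list comprehension such as [part for part in trailing if part].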
