Regular expressions in Python

#beginners #python #tutorial

Import Python's built-in re module:
import re

import re

word = 'word_123 !'
match = re.fullmatch(r'.*', word)
print(bool(match))
# result: True

.* matches any character except a newline, zero or more times, so the whole string matches.
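The dot skips newlines unless told otherwise. The following is a small sketch of that behavior; the sample string and the variable names are made up for illustration, and re.DOTALL is the standard flag that lets the dot match a newline as well.

```python
import re

text = 'first line\nsecond line'

# By default '.' does not match '\n', so '.*' cannot cover the whole string.
no_flag = re.fullmatch(r'.*', text)
print(bool(no_flag))
# result: False

# With re.DOTALL, '.' also matches the newline character.
with_flag = re.fullmatch(r'.*', text, flags=re.DOTALL)
print(bool(with_flag))
# result: True
```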

Meta characters and special sequences
Functions of module re

1. Meta characters and special sequences

Meta characters

Character	Description	Example	Matches
`.`	Matches any single character (except newline).	`a.b`	`acb`, `a1b`, `a#b`
`^`	Matches the start of the string.	`^Start`	`Start of the sentence`
`$`	Matches the end of the string.	`End$`	`This is the End`
`*`	Matches zero or more occurrences of the preceding element.	`ab*c`	`ac`, `abc`, `abbbc`
`+`	Matches one or more occurrences of the preceding element.	`ab+c`	`abc`, `abbbc`, but not `ac`
`?`	Matches zero or one occurrence of the preceding element, making it optional. Placed after another quantifier (as in `*?`), it makes that quantifier lazy.	`colou?r`	`color`, `colour`
`{m,n}`	Matches at least m and at most n occurrences of the preceding element.	`a{2,3}`	`aa`, `aaa`
`()`	Groups elements together and creates a capturing group.	`(abc)+`	`abc`, `abcabc`
`[]`	Defines a character set, matching any single character contained within the brackets.	`[aeiou]`	`a`, `e`, `i`, `o`, or `u`
`\`	Escapes the following character, treating it literally or giving special meaning (e.g., `\d`).	`\$100`	`$100`
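A few rows of the table can be checked directly. This is only a sketch: is_full_match is a hypothetical helper written for this example, not part of the re module.

```python
import re

def is_full_match(pattern, text):
    """Hypothetical helper: True if the whole text matches the pattern."""
    return re.fullmatch(pattern, text) is not None

# '?' makes the preceding 'u' optional, so both spellings match.
print(is_full_match(r'colou?r', 'color'), is_full_match(r'colou?r', 'colour'))
# result: True True

# '+' needs at least one 'b'; '*' also accepts zero.
print(is_full_match(r'ab+c', 'ac'), is_full_match(r'ab*c', 'ac'))
# result: False True

# {2,3} allows two or three 'a' characters, not four.
print(is_full_match(r'a{2,3}', 'aaa'), is_full_match(r'a{2,3}', 'aaaa'))
# result: True False

# A backslash turns '$' into a literal dollar sign.
print(is_full_match(r'\$100', '$100'))
# result: True
```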

Special sequences

Sequence	Meaning	Example	Matches
`\d`	Matches any digit (0-9).	`\d\d`	`12`, `05`, `99`
`\D`	Matches any non-digit character.	`\D+`	`Hello`, `?!@`, (a space)
`\w`	Matches any word character: a letter, a digit, or an underscore. Symbols such as `!`, `?`, `,` and `&` are not word characters.	`\w+`	`user_name`, `File1`, `_test`
`\W`	Matches any non-word character (punctuation, space, symbol).	`\W`	`!`, `@`, `#`, (a space)
`\s`	Matches any whitespace character (space, tab, newline, etc.).	`\s`	a space (` `), a tab (`\t`), a newline (`\n`)
`\S`	Matches any non-whitespace character.	`\S+`	`word!`, `123`, `(text)`
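To see the sequences side by side, here is a short sketch that runs three of them over the same string. The sample text is invented for illustration, and re.findall (described in section 2) is used only because it lists every match.

```python
import re

text = 'user_1 paid $25 on day 7!'

# \d+ picks out runs of digits.
digits = re.findall(r'\d+', text)
print(digits)
# result: ['1', '25', '7']

# \w+ picks out runs of letters, digits and underscores.
words = re.findall(r'\w+', text)
print(words)
# result: ['user_1', 'paid', '25', 'on', 'day', '7']

# \S+ picks out everything between whitespace, symbols included.
chunks = re.findall(r'\S+', text)
print(chunks)
# result: ['user_1', 'paid', '$25', 'on', 'day', '7!']
```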

2. Functions of module re

re.fullmatch - checks for a match against the entire string
re.match - checks for a match only at the beginning (start) of the string

import re

word = '123_word !'
match = re.fullmatch(r'\d+', word)
print(match)
# result: None
# The entire word doesn't match the pattern

import re

word = '123_word !'
match = re.match(r'\d+', word)
print(match)
# result: <re.Match object; span=(0, 3), match='123'>
# The first three characters (span 0 to 3) at the start of the string match the pattern

word2 = 'word_123 !'
match = re.match(r'\d+', word2)
print(match)
# result: None
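The printed match object carries the matched text and its position. A minimal sketch of reading them with the group() and span() methods, reusing the string from the example above:

```python
import re

word = '123_word !'
match = re.match(r'\d+', word)

# A failed match returns None, so check before calling methods on it.
if match is not None:
    print(match.group())
    # result: 123
    print(match.span())
    # result: (0, 3)
```

Checking for None first avoids an AttributeError when the pattern does not match.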

re.findall - finds all non-overlapping matches and returns them as a list of strings

import re

word = 'word_123 !'
match = re.findall(r'\d+', word)
print(match)
# result: ['123']
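With several matches in the string, each one becomes its own list item, and no match at all gives an empty list rather than None. A short sketch with invented sample strings:

```python
import re

text = 'On 12th Jan 2016, at 11:02 AM'
numbers = re.findall(r'\d+', text)
print(numbers)
# result: ['12', '2016', '11', '02']

# No digits in the string: the result is an empty list.
nothing = re.findall(r'\d+', 'no digits here')
print(nothing)
# result: []
```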

re.split - splits a string wherever the pattern matches

import re

word = 'Words, words , Words'
match = re.split(r'\W+', word)
print(match)
# result: ['Words', 'words', 'Words']
# \W matches any non-word character

word2 = 'On 12th Jan 2016, at 11:02 AM'
match = re.split(r'\d+', word2)
print(match)
# result: ['On ', 'th Jan ', ', at ', ':', ' AM']
# the string is split at every run of digits
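Two details of re.split are easy to miss: a separator at the very end of the string leaves an empty string in the result, and the maxsplit argument limits how many splits are made. A small sketch with invented sample strings:

```python
import re

# The trailing '!' is a separator, so an empty string follows it.
parts = re.split(r'\W+', 'Words, words!')
print(parts)
# result: ['Words', 'words', '']

# Drop empty strings if they are not wanted.
cleaned = [part for part in parts if part]
print(cleaned)
# result: ['Words', 'words']

# maxsplit=1 stops after the first separator.
first_split = re.split(r'\W+', 'Words, words , Words', maxsplit=1)
print(first_split)
# result: ['Words', 'words , Words']
```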
