The 5 Regex Patterns I Use in 90% of My Projects

#python #beginners #regex #webdev

Regex is hard. But you only need 5 patterns.

After 3 years of building scrapers and data pipelines, I use the same 5 regex patterns in almost every project.

1. Extract Email Addresses

import re

text = 'Contact us at hello@company.com or support@company.co.uk'
# Local part, "@", domain name, then a dot followed by the rest of the domain
emails = re.findall(r'[\w.+-]+@[\w-]+\.[\w.]+', text)
# ['hello@company.com', 'support@company.co.uk']
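One edge case worth knowing: a pattern whose domain part ends in `[\w.]+` also swallows a sentence-ending period, so `hello@company.com.` comes back with the dot attached. Here is a minimal sketch of a tighter variant that also de-duplicates results. The `extract_emails` helper and the lower-casing step are my own assumptions for illustration, not a full email validator.

```python
import re

# Assumption: each domain label is made of letters, digits, "_" or "-",
# and labels are separated by single dots.
EMAIL_RE = re.compile(r'[\w.+-]+@[\w-]+(?:\.[\w-]+)+')

def extract_emails(text):
    """Return unique addresses in first-seen order, lower-cased."""
    seen = {}
    for match in EMAIL_RE.findall(text):
        # dict keys keep insertion order, which gives ordered de-duplication
        seen.setdefault(match.lower(), None)
    return list(seen)

sample = 'Write to Hello@company.com. Or support@company.co.uk, or hello@company.com again.'
print(extract_emails(sample))
# ['hello@company.com', 'support@company.co.uk']
```

Because every dot in the domain must be followed by at least one more character, the trailing period stays out of the match.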

2. Extract URLs

text = 'Visit https://example.com/page?q=test or http://api.site.io/v2'
# Scheme, host, then an optional path/query built from common URL characters
urls = re.findall(r'https?://[\w.-]+(?:/[\w./?=&%-]*)?', text)
# ['https://example.com/page?q=test', 'http://api.site.io/v2']
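Two things usually come next in a scraper: trimming punctuation that got glued to the end of a URL, and splitting the URL into parts. A rough sketch, assuming it is acceptable to strip trailing `.,;:!?` (that choice and the `extract_urls` name are mine); the splitting uses the standard library's `urllib.parse`.

```python
import re
from urllib.parse import urlsplit, parse_qs

URL_RE = re.compile(r'https?://[\w.-]+(?:/[\w./?=&%-]*)?')

def extract_urls(text):
    """Find URLs and drop punctuation that belongs to the sentence, not the URL."""
    # Assumption: a URL never legitimately ends in one of these characters.
    return [match.rstrip('.,;:!?') for match in URL_RE.findall(text)]

urls = extract_urls('Docs: https://example.com/page?q=test. API: http://api.site.io/v2, thanks')
print(urls)
# ['https://example.com/page?q=test', 'http://api.site.io/v2']

# Once extracted, let urllib.parse do the structural work instead of more regex.
parts = urlsplit(urls[0])
print(parts.netloc)           # example.com
print(parse_qs(parts.query))  # {'q': ['test']}
```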

3. Extract Numbers (including decimals and negatives)

text = 'Price: $29.99, discount: -5.50, items: 3'
# Optional minus sign, one or more digits, then an optional decimal part
numbers = re.findall(r'-?\d+\.?\d*', text)
# ['29.99', '-5.50', '3']
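`re.findall` gives you strings, so the next step is usually a conversion to `float`. Below is a small sketch that also accepts thousands separators such as `1,234.50`. The comma handling assumes US-style formatting, and `extract_numbers` is a hypothetical helper name.

```python
import re

# First alternative: digits grouped by commas (1,234 or 1,234.50).
# Second alternative: plain integers and decimals.
# (?:\.\d+)? requires digits after the dot, so a sentence-ending
# period is not captured as part of the number.
NUMBER_RE = re.compile(r'-?\d{1,3}(?:,\d{3})+(?:\.\d+)?|-?\d+(?:\.\d+)?')

def extract_numbers(text):
    """Return every number in the text as a float."""
    return [float(match.replace(',', '')) for match in NUMBER_RE.findall(text)]

values = extract_numbers('Total: $1,234.50, discount: -5.50, items: 3.')
print(values)
# [1234.5, -5.5, 3.0]
```

One caveat: a hyphen directly before a digit is always read as a minus sign, so a range like `3-5` comes back as `3.0` and `-5.0`.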

4. Clean Whitespace (multiple spaces, tabs, newlines → single space)

text = 'Too   many\n\n  spaces    here\t\ttabs'
# \s+ matches any run of spaces, tabs, or newlines; strip() trims both ends
clean = re.sub(r'\s+', ' ', text).strip()
# 'Too many spaces here tabs'
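Collapsing everything with `\s+` also flattens paragraph breaks. When the text has paragraphs worth keeping, a variant like this one may fit better. It is a sketch under my own assumption that a blank line marks a paragraph boundary, and `normalize_whitespace` is a made-up name.

```python
import re

def normalize_whitespace(text):
    """Collapse spaces and tabs but keep at most one blank line between paragraphs."""
    # Runs of spaces and tabs become a single space.
    text = re.sub(r'[ \t]+', ' ', text)
    # Remove the single space that may be left on either side of a newline.
    text = re.sub(r' ?\n ?', '\n', text)
    # Three or more newlines become exactly one blank line.
    text = re.sub(r'\n{3,}', '\n\n', text)
    return text.strip()

messy = 'First   line\t here \n\n\n\n  Second    paragraph\n'
print(repr(normalize_whitespace(messy)))
# 'First line here\n\nSecond paragraph'
```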

5. Extract Content Between Tags/Delimiters

html = '<title>My Page Title</title>'
# re.search returns None when nothing matches, so .group(1) raises AttributeError on a miss
title = re.search(r'<title>(.*?)</title>', html).group(1)
# 'My Page Title'

# Also works for:
json_str = '{"key": "value"}'
value = re.search(r'"key":\s*"(.*?)"', json_str).group(1)
# 'value'
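Two things bite people with this pattern: `re.search` returns `None` when there is no match, and `.` does not match newlines unless you pass `re.DOTALL`. A minimal sketch of a wrapper that handles both; `extract_between` and its `default` parameter are hypothetical names I picked for the example.

```python
import re

def extract_between(pattern, text, default=None):
    """Return the first capture group, or `default` when the pattern is not found."""
    # re.DOTALL lets "." match newlines, so content spanning several lines is found.
    match = re.search(pattern, text, flags=re.DOTALL)
    return match.group(1).strip() if match else default

html = '<head>\n<title>\n  My Page Title\n</title>\n</head>'

title = extract_between(r'<title>(.*?)</title>', html)
print(title)
# My Page Title

heading = extract_between(r'<h1>(.*?)</h1>', html, default='')
print(repr(heading))
# ''
```

This is fine for quick one-off extraction. For nested or messy markup, or for real JSON, a proper parser such as `json.loads` is the safer choice.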

Cheat sheet

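Pattern	Matches
`\d+`	One or more digits
`\w+`	One or more word characters (letters, digits, _)
`\s+`	One or more whitespace characters (spaces, tabs, newlines)
`.*?`	Any characters, as few as possible (non-greedy); skips newlines unless re.DOTALL is set
`[^\s]+`	One or more non-whitespace characters (same as `\S+`)
`(?:...)`	Non-capturing group
`(?P<name>...)`	Named group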
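The last two rows of the cheat sheet are easier to grasp with an example. Here is a sketch that parses a log line using named groups, with a non-capturing group wrapping an optional part. The log format and the `parse_line` helper are invented for illustration.

```python
import re

LOG_RE = re.compile(
    r'(?P<date>\d{4}-\d{2}-\d{2})\s+'
    # (?:...) groups the optional "[source]" tag and its trailing spaces
    # without creating an extra capture group of its own.
    r'(?:\[(?P<source>\w+)\]\s+)?'
    r'(?P<level>[^\s]+)\s+'
    r'(?P<message>.*)'
)

def parse_line(line):
    """Return the named fields as a dict, or None if the line does not fit."""
    match = LOG_RE.match(line)
    return match.groupdict() if match else None

print(parse_line('2024-01-15  [scraper]  ERROR  Connection timed out'))
# {'date': '2024-01-15', 'source': 'scraper', 'level': 'ERROR', 'message': 'Connection timed out'}

print(parse_line('2024-01-16 INFO Done'))
# {'date': '2024-01-16', 'source': None, 'level': 'INFO', 'message': 'Done'}
```

Named groups make the result self-describing, and a group that did not take part in the match shows up as `None` rather than raising an error.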

What regex pattern do you use most?

I've built 77 web scrapers, and regex is my daily bread. Follow for more practical patterns.
