DEV Community

Alex Spinov

The 5 Regex Patterns I Use in 90% of My Projects

Regex is hard. But you only need 5 patterns.

After 3 years of building scrapers and data pipelines, I use the same 5 regex patterns in almost every project.


1. Extract Email Addresses

import re
text = 'Contact us at hello@company.com or support@company.co.uk'
emails = re.findall(r'[\w.+-]+@[\w-]+\.[\w.]+', text)
# ['hello@company.com', 'support@company.co.uk']
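One caveat with the pattern above: its last character class includes a dot, so a sentence-ending period right after an address gets captured too. Here is a minimal sketch of a slightly stricter variant, assuming you want the domain to end in a letters-only TLD and duplicates dropped. The names `EMAIL_RE` and `extract_emails` are hypothetical, and this is not a full email validator.

```python
import re

# Assumption: the domain ends in a letters-only TLD of 2+ characters,
# so a trailing '.' or ',' from the sentence is left out of the match.
EMAIL_RE = re.compile(r'[\w.+-]+@[\w-]+(?:\.[\w-]+)*\.[A-Za-z]{2,}')

def extract_emails(text):
    """Return email-like strings, de-duplicated case-insensitively, in order of first appearance."""
    seen = {}
    for match in EMAIL_RE.findall(text):
        seen.setdefault(match.lower(), match)
    return list(seen.values())

sample = 'Write to hello@company.com. Or Support@company.co.uk, or hello@company.com again.'
emails = extract_emails(sample)
# ['hello@company.com', 'Support@company.co.uk']
```

For scraping this is usually good enough; if you need to know whether an address is actually deliverable, a regex alone cannot tell you.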

2. Extract URLs

text = 'Visit https://example.com/page?q=test or http://api.site.io/v2'
urls = re.findall(r'https?://[\w.-]+(?:/[\w./?=&%-]*)?', text)
# ['https://example.com/page?q=test', 'http://api.site.io/v2']
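The URL pattern allows `.` and `?` in both the host and the path, so punctuation that ends a sentence can ride along with the match. A rough sketch of one way to handle that, assuming it is acceptable to strip a trailing `.` or `?`, and then split the result with the standard library's `urllib.parse`. The helper name `extract_urls` is made up for this example.

```python
import re
from urllib.parse import urlsplit, parse_qs

URL_RE = re.compile(r'https?://[\w.-]+(?:/[\w./?=&%-]*)?')

def extract_urls(text):
    """Find URLs and drop a trailing '.' or '?' picked up from the sentence."""
    return [raw.rstrip('.?') for raw in URL_RE.findall(text)]

text = 'Docs: https://example.com/page?q=test. API: http://api.site.io/v2?'
urls = extract_urls(text)
# ['https://example.com/page?q=test', 'http://api.site.io/v2']

# Once you have a clean URL, let urllib do the structural parsing.
parts = urlsplit(urls[0])
params = parse_qs(parts.query)
# parts.netloc == 'example.com', parts.path == '/page', params == {'q': ['test']}
```

The regex finds candidates in free text; `urlsplit` and `parse_qs` handle the pieces, which tends to be less fragile than adding more capture groups.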

3. Extract Numbers (including decimals and negatives)

text = 'Price: $29.99, discount: -5.50, items: 3'
numbers = re.findall(r'-?\d+\.?\d*', text)
# ['29.99', '-5.50', '3']
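Two edge cases are worth knowing about with the number pattern: it keeps a trailing dot ('3.' from 'Total: 3.'), and it reads the hyphen in a range like '10-20' as a minus sign. Below is a sketch of a stricter variant, assuming a minus only counts when it does not directly follow a word character and a decimal point must be followed by digits. `NUMBER_RE` and `extract_numbers` are illustrative names.

```python
import re

# Assumptions: '-' is a sign only when not preceded by a word character,
# and a '.' belongs to the number only when digits follow it.
NUMBER_RE = re.compile(r'(?:(?<!\w)-)?\d+(?:\.\d+)?')

def extract_numbers(text):
    """Return all numbers found in text as floats."""
    return [float(n) for n in NUMBER_RE.findall(text)]

text = 'Total: 3. Price: $29.99, discount: -5.50, pages 10-20'

loose = re.findall(r'-?\d+\.?\d*', text)
# ['3.', '29.99', '-5.50', '10', '-20']

strict = NUMBER_RE.findall(text)
# ['3', '29.99', '-5.50', '10', '20']

values = extract_numbers(text)
# [3.0, 29.99, -5.5, 10.0, 20.0]
```

Neither version handles thousands separators or scientific notation, so check what your data actually looks like first.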

4. Clean Whitespace (multiple spaces, tabs, newlines → single space)

text = 'Too   many\n\n  spaces    here\t\ttabs'
clean = re.sub(r'\s+', ' ', text).strip()
# 'Too many spaces here tabs'
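Collapsing everything to a single space also flattens paragraph breaks. If you need to keep them, here is a small sketch that collapses spaces and tabs but preserves one blank line between paragraphs. The function name `normalize_whitespace` and the "one blank line" rule are assumptions for this example.

```python
import re

def normalize_whitespace(text):
    """Collapse spaces/tabs, trim line edges, keep at most one blank line."""
    text = re.sub(r'[ \t]+', ' ', text)      # runs of spaces/tabs -> one space
    text = re.sub(r' ?\n ?', '\n', text)     # drop spaces around newlines
    text = re.sub(r'\n{3,}', '\n\n', text)   # 3+ newlines -> one blank line
    return text.strip()

messy = 'First   line\t\there \n\n\n\n  Second    paragraph  '
clean = normalize_whitespace(messy)
# 'First line here\n\nSecond paragraph'
```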

5. Extract Content Between Tags/Delimiters

html = '<title>My Page Title</title>'
title = re.search(r'<title>(.*?)</title>', html).group(1)
# 'My Page Title'

# Also works for:
json_str = '{"key": "value"}'
value = re.search(r'"key":\s*"(.*?)"', json_str).group(1)
# 'value'
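Note that `re.search(...)` returns `None` when nothing matches, so calling `.group(1)` directly raises `AttributeError` on pages without the tag. Also, `.` does not match newlines unless you pass `re.DOTALL`. A minimal sketch of a wrapper that covers both cases follows; `between` is a hypothetical helper, not something from the `re` module.

```python
import json
import re

def between(text, start, end, default=None):
    """Return the shortest text between two literal delimiters, or default."""
    pattern = re.escape(start) + r'(.*?)' + re.escape(end)
    match = re.search(pattern, text, flags=re.DOTALL)  # DOTALL: '.' spans newlines
    return match.group(1) if match else default

title = between('<title>My\nPage</title>', '<title>', '</title>')
# 'My\nPage'

missing = between('<p>no title here</p>', '<title>', '</title>', default='')
# ''

# For well-formed JSON, the json module is the safer choice.
value = json.loads('{"key": "value"}')['key']
# 'value'
```

The regex approach is handy for quick extraction from messy text; for complete, valid HTML or JSON documents a real parser is usually more reliable.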

Cheat sheet

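Pattern          Matches
\d+              One or more digits
\w+              One or more word characters (letters, digits, _)
\s+              One or more whitespace characters
.*?              Any characters except newline, as few as possible (non-greedy)
[^\s]+           One or more non-whitespace characters (same as \S+)
(?:...)          Non-capturing group
(?P<name>...)    Named group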
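The last two rows are easier to see in action. Here is a short sketch that combines a non-capturing group with named groups; the pattern and the sample URL are made up for illustration.

```python
import re

# (?:www\.)? groups the optional prefix without capturing it;
# (?P<scheme>...) and (?P<host>...) capture by name.
HOST_RE = re.compile(r'(?P<scheme>https?)://(?:www\.)?(?P<host>[^\s/]+)')

match = HOST_RE.search('Visit https://www.example.com/page today')
scheme = match.group('scheme')
host = match.group('host')
# scheme == 'https', host == 'example.com'

fields = match.groupdict()
# {'scheme': 'https', 'host': 'example.com'}
```

Named groups cost a few extra characters but make the calling code read much better than `group(1)` and `group(2)`.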

What regex pattern do you use most?

I have built 77 web scrapers, so regex is my daily bread. Follow for more practical patterns.


