DEV Community

Peyton Green


150+ regex patterns for Python developers: stop rebuilding the same wheel

Every Python developer I know has a folder somewhere called "regex_snippets" or "useful_patterns" or "patterns_to_remember.txt". Mine grew to 47 files over six years. Half of them were wrong. Three of them were the same URL pattern, each slightly different.

The problem with regex isn't that it's hard to write. It's that it's hard to remember. The syntax is compact enough that you forget it between uses. You Google the same log parser every two months. You copy the same email validator from the same Stack Overflow answer you copied from last year.

I finally sat down, went through two years of production code, and pulled out every regex pattern I'd written more than twice. I added the patterns I keep Googling and ended up with 150+. Then I organized them by category, wrote plain-English explanations, documented the edge cases, and packaged the result as something I can actually keep open.

That's the Regex Master Pack.


What's in it

8 cheatsheets covering:

  • Core syntax (characters, quantifiers, anchors, groups, flags — the complete reference on one page)
  • Lookaheads and lookbehinds — the stuff that trips everyone up
  • Common gotchas (greedy vs lazy, backtracking, catastrophic patterns, Unicode pitfalls)
  • Language differences (Python, JavaScript, Go, Java, Rust, PCRE)
  • Performance guide (writing fast regex, when NOT to use regex)
  • Regex in CLI tools (grep, sed, awk, ripgrep, perl one-liners)
  • Regex in editors (VS Code, Vim, JetBrains, Sublime find-and-replace)
  • Testing and debugging (how to test regex, debug backtracking, workflows)
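One gotcha from the gotchas sheet, shown in two lines — greedy quantifiers grab as much as possible, lazy ones as little as possible:

```python
import re

html = '<b>bold</b> and <i>italic</i>'
# Greedy: .+ runs to the LAST '>', swallowing everything in between
print(re.findall(r'<.+>', html))   # ['<b>bold</b> and <i>italic</i>']
# Lazy: .+? stops at the FIRST '>', matching each tag separately
print(re.findall(r'<.+?>', html))  # ['<b>', '</b>', '<i>', '</i>']
```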

150+ patterns in 10 categories:

| Category | Count |
| --- | --- |
| Email, URL, web | 18 |
| Dates and times | 16 |
| Numbers and currency | 15 |
| Phone and address | 14 |
| Passwords and auth | 10 |
| Code parsing | 20 |
| Log parsing | 18 |
| Data extraction | 15 |
| Text processing | 14 |
| DevOps and infra | 15 |

Every pattern includes the regex, a plain-English explanation of each token, example test strings (matches and non-matches), Python usage, and documented edge cases.

5 interactive Python scripts (stdlib only, no pip installs):

  • regex_tester.py — interactive REPL, paste a pattern and test strings, see matches highlighted with group details
  • regex_explainer.py — feed in any regex, get a plain-English breakdown of every token
  • log_parser_generator.py — describe your log format, get a working regex + Python parser
  • pattern_search.py — search the entire pattern library by keyword or category
  • regex_quiz.py — 50 progressive challenges to build muscle memory

A few examples

Log parsing: Apache/Nginx combined format

This is the one I rebuild from memory every time I need it. Now I don't:

import re

APACHE_COMBINED = r'^(\S+)\s+\S+\s+(\S+)\s+\[([^\]]+)\]\s+"(\S+)\s+(\S+)\s+(\S+)"\s+(\d{3})\s+(\d+|-)\s+"([^"]*)"\s+"([^"]*)"'

log_line = '192.168.1.1 - frank [10/Oct/2023:13:55:36 -0700] "GET /api/users HTTP/1.1" 200 1234 "https://example.com" "Mozilla/5.0"'

m = re.match(APACHE_COMBINED, log_line)
if m:
    ip, user, timestamp, method, path, proto, status, bytes_, referer, ua = m.groups()
    print(f"{method} {path} → {status}")
    # GET /api/users → 200

The pattern captures IP, authenticated user, timestamp, HTTP method, path, protocol, status code, bytes transferred, referer, and user agent — a named-groups version is also in the pack.
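The named-group variant in the pack may differ in its exact group names, but the shape is the same pattern with `(?P<name>…)` wrappers, read back via `groupdict()`:

```python
import re

APACHE_NAMED = (
    r'^(?P<ip>\S+)\s+\S+\s+(?P<user>\S+)\s+\[(?P<ts>[^\]]+)\]\s+'
    r'"(?P<method>\S+)\s+(?P<path>\S+)\s+(?P<proto>\S+)"\s+'
    r'(?P<status>\d{3})\s+(?P<size>\d+|-)\s+"(?P<referer>[^"]*)"\s+"(?P<ua>[^"]*)"'
)

line = '192.168.1.1 - frank [10/Oct/2023:13:55:36 -0700] "GET /api/users HTTP/1.1" 200 1234 "https://example.com" "Mozilla/5.0"'

m = re.match(APACHE_NAMED, line)
if m:
    d = m.groupdict()
    print(d['method'], d['path'], d['status'])  # GET /api/users 200
```

Field access by name survives pattern edits that would silently shift positional group indices.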

Code parsing: Python function signatures

Useful for static analysis, documentation generators, or any tooling that needs to understand Python code structure:

import re

PYTHON_FUNC = r'^\s*(?:async\s+)?def\s+(\w+)\s*\(([^)]*)\)\s*(?:->\s*([^:]+))?\s*:'

examples = [
    'def simple(x, y):',
    'async def fetch_data(url: str, timeout: int = 30) -> dict:',
    '    def nested(self) -> None:',
]

for line in examples:
    m = re.search(PYTHON_FUNC, line, re.MULTILINE)
    if m:
        name, params, return_type = m.group(1), m.group(2), m.group(3)
        print(f"fn={name}, params={params!r}, returns={return_type!r}")

Handles sync and async defs, optional return-type annotations, and indented (nested/method) definitions.
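Worth noting: this regex only sees single-line signatures. For multi-line defs, or defaults containing parentheses, the stdlib `ast` module is the robust fallback — a quick sketch:

```python
import ast

src = """
async def fetch_data(
    url: str,
    timeout: int = 30,
) -> dict:
    pass
"""

# ast parses real Python syntax, so line breaks and nested parens are non-issues
for node in ast.walk(ast.parse(src)):
    if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
        params = [a.arg for a in node.args.args]
        print(node.name, params)  # fetch_data ['url', 'timeout']
```

The trade-off: `ast` needs syntactically valid source, while the regex works line-by-line on anything, including incomplete snippets.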

Data extraction: YAML frontmatter

Every static site generator, Jekyll template, and Hugo theme uses this. Here's a pattern that actually handles the edge cases:

YAML_FRONTMATTER = r'^---\s*\n(.*?)\n---\s*\n'

content = """---
title: My Article
tags: python, regex
published: true
---

Article body starts here.
"""

m = re.search(YAML_FRONTMATTER, content, re.DOTALL)
if m:
    frontmatter = m.group(1)
    # Parse the captured YAML string with yaml.safe_load()

The re.DOTALL flag is required — without it, . won't match newlines and the pattern fails on multi-line frontmatter.
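An extended variant (my own sketch, not necessarily the pack's) captures the body in the same pass with a second group:

```python
import re

FRONTMATTER_AND_BODY = r'^---\s*\n(.*?)\n---\s*\n(.*)\Z'

content = "---\ntitle: My Article\n---\n\nArticle body starts here.\n"

m = re.match(FRONTMATTER_AND_BODY, content, re.DOTALL)
if m:
    frontmatter, body = m.group(1), m.group(2)
    print(frontmatter)       # title: My Article
    print(body.strip())      # Article body starts here.
```

`\Z` anchors the second group to the end of the string, so the body capture can't stop early at a stray `---` divider inside the article.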

DevOps: semantic versioning

SEMVER = r'^(?P<major>0|[1-9]\d*)\.(?P<minor>0|[1-9]\d*)\.(?P<patch>0|[1-9]\d*)(?:-(?P<prerelease>[a-zA-Z0-9.-]+))?(?:\+(?P<buildmeta>[a-zA-Z0-9.-]+))?$'

versions = ['1.2.3', '2.0.0-alpha.1', '1.0.0+build.42', '1.0.0-beta+exp.sha.5114f85']

for v in versions:
    m = re.match(SEMVER, v)
    if m:
        print(f"{v} → major={m.group('major')}, pre={m.group('prerelease')}")

Named groups make the captures readable. The pack includes variants for loose semver (accepts v1.2.3 prefix) and range matching (for tools like npm/pip that use >=, ~=, ^).
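The loose variant is roughly the strict pattern with an optional prefix. A hedged sketch (the pack's exact variant may differ — `parse` is my own illustrative helper):

```python
import re

# Strict core with an optional leading 'v' (git tags, GitHub releases)
LOOSE_SEMVER = r'^v?(?P<major>0|[1-9]\d*)\.(?P<minor>0|[1-9]\d*)\.(?P<patch>0|[1-9]\d*)$'

def parse(version):
    """Return (major, minor, patch) as ints, or None if not semver."""
    m = re.match(LOOSE_SEMVER, version)
    if not m:
        return None
    return tuple(int(m.group(g)) for g in ('major', 'minor', 'patch'))

print(parse('v1.2.3'))   # (1, 2, 3)
print(parse('1.10.0'))   # (1, 10, 0)
print(parse('1.2'))      # None
```

Converting to int tuples matters for comparisons: `(1, 10, 0) > (1, 9, 9)` is true, while the strings `'1.10.0' > '1.9.9'` compare the wrong way.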


The part that actually saves time: regex_explainer.py

The interactive tools are the piece I use most. regex_explainer.py takes any regex and breaks it down:

$ python scripts/regex_explainer.py '(?:https?://)?(?:www\.)?([^/\s]+)'

Pattern: (?:https?://)?(?:www\.)?([^/\s]+)

  (?:         Non-capturing group start
    https?    Literal 'http', then 's' is optional (? = zero or one)
    ://       Literal '://'
  )?          End non-capturing group, entire group is optional
  (?:         Non-capturing group start
    www\.     Literal 'www' + escaped dot (. in regex matches anything; \. matches only a literal dot)
  )?          End non-capturing group, optional
  (           Capturing group 1 start
    [^/\s]+   One or more characters that are NOT '/' and NOT whitespace
  )           Capturing group 1 end

This is the tool I wish I had when I was learning regex. It's also useful for auditing patterns someone else wrote.
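Python also ships a rough built-in cousin: compiling with re.DEBUG makes the engine print its parse tree to stdout. The output format varies across Python versions and is far less readable than a plain-English breakdown, but it's there with zero setup:

```python
import re

# re.DEBUG dumps the compiled parse tree (SUBPATTERN, MAX_REPEAT, LITERAL, ...)
# to stdout as a side effect of compilation
re.compile(r'(?:https?://)?(www\.)?', re.DEBUG)
```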


pattern_search.py — the reference you'll actually use

$ python scripts/pattern_search.py "log"

Found 18 results in category 'log-parsing':
  [01] Apache/Nginx Combined Log
  [02] Apache/Nginx Common Log
  [03] Syslog RFC 3164
  [04] Syslog RFC 5424
  ...

$ python scripts/pattern_search.py --category "code"

Found 20 results in category 'code-parsing':
  [01] Python Function Definition
  [02] JavaScript/TypeScript Function
  [03] Python Class Definition
  ...

Who this is for

If you write Python and regularly need to parse logs, validate input, extract structured data from text, or build any tooling that touches text at all — this is the reference that replaces the pile of Stack Overflow bookmarks.

Backend developers parsing logs and validating API inputs. DevOps engineers writing grep/sed pipelines. Data engineers cleaning text data. Anyone who uses regex weekly but reaches for Google every time.


The pack

Regex Master Pack on Gumroad → $19, one-time purchase.

150+ patterns, 8 cheatsheets, 5 Python scripts. Markdown format (works in any editor, terminal, or note-taking app). Patterns also in JSON for programmatic access. Python 3.8+, stdlib only.
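If the JSON follows the obvious shape — a list of objects with name/category/pattern fields, which is my assumption here, so check the actual file for the real field names — programmatic access is a few lines:

```python
import json
import re

# Hypothetical schema for illustration; the pack's patterns.json may differ
raw = '''[
  {"name": "Semver", "category": "devops", "pattern": "^\\\\d+\\\\.\\\\d+\\\\.\\\\d+$"}
]'''

by_category = {}
for entry in json.loads(raw):
    by_category.setdefault(entry['category'], []).append(re.compile(entry['pattern']))

print(by_category['devops'][0].match('1.2.3') is not None)  # True
```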

30-day refund if it's not useful.


See also: Python Automation Cookbook — 25 production-ready Python scripts for the automation tasks you keep rebuilding.
