DEV Community

Peyton Green


150+ regex patterns for Python developers: stop rebuilding the same wheel

Every Python developer I know has a folder somewhere called "regex_snippets" or "useful_patterns" or "patterns_to_remember.txt". Mine grew to 47 files over six years. Half of them were wrong. Three of them were the same URL pattern, each slightly different.

The problem with regex isn't that it's hard to write. It's that it's hard to remember. The syntax is compact enough that you forget it between uses. You Google the same log parser every two months. You copy the same email validator from the same Stack Overflow answer you copied from last year.

I finally sat down, went through two years of production code, and pulled out every regex pattern I'd written more than twice. I added the patterns I keep Googling and ended up with 150+. Then I organized them by category, wrote plain-English explanations, documented the edge cases, and packaged the result as something I can actually keep open.

That's the Regex Master Pack.


What's in it

8 cheatsheets covering:

  • Core syntax (characters, quantifiers, anchors, groups, flags — the complete reference on one page)
  • Lookaheads and lookbehinds — the stuff that trips everyone up
  • Common gotchas (greedy vs lazy, backtracking, catastrophic patterns, Unicode pitfalls)
  • Language differences (Python, JavaScript, Go, Java, Rust, PCRE)
  • Performance guide (writing fast regex, when NOT to use regex)
  • Regex in CLI tools (grep, sed, awk, ripgrep, perl one-liners)
  • Regex in editors (VS Code, Vim, JetBrains, Sublime find-and-replace)
  • Testing and debugging (how to test regex, debug backtracking, workflows)
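One gotcha from the gotchas sheet, shown in two lines — greedy quantifiers grab as much as possible, lazy ones as little as possible:

```python
import re

html = '<b>bold</b> and <i>italic</i>'
# Greedy: .+ runs to the LAST '>', swallowing everything in between
print(re.findall(r'<.+>', html))   # ['<b>bold</b> and <i>italic</i>']
# Lazy: .+? stops at the FIRST '>', matching each tag separately
print(re.findall(r'<.+?>', html))  # ['<b>', '</b>', '<i>', '</i>']
```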

150+ patterns in 10 categories:

| Category | Count |
| --- | --- |
| Email, URL, web | 18 |
| Dates and times | 16 |
| Numbers and currency | 15 |
| Phone and address | 14 |
| Passwords and auth | 10 |
| Code parsing | 20 |
| Log parsing | 18 |
| Data extraction | 15 |
| Text processing | 14 |
| DevOps and infra | 15 |

Every pattern includes the regex, a plain-English explanation of each token, example test strings (matches and non-matches), Python usage, and documented edge cases.

5 interactive Python scripts (stdlib only, no pip installs):

  • regex_tester.py — interactive REPL, paste a pattern and test strings, see matches highlighted with group details
  • regex_explainer.py — feed in any regex, get a plain-English breakdown of every token
  • log_parser_generator.py — describe your log format, get a working regex + Python parser
  • pattern_search.py — search the entire pattern library by keyword or category
  • regex_quiz.py — 50 progressive challenges to build muscle memory

A few examples

Log parsing: Apache/Nginx combined format

This is the one I rebuild from memory every time I need it. Now I don't:

import re

APACHE_COMBINED = r'^(\S+)\s+\S+\s+(\S+)\s+\[([^\]]+)\]\s+"(\S+)\s+(\S+)\s+(\S+)"\s+(\d{3})\s+(\d+|-)\s+"([^"]*)"\s+"([^"]*)"'

log_line = '192.168.1.1 - frank [10/Oct/2023:13:55:36 -0700] "GET /api/users HTTP/1.1" 200 1234 "https://example.com" "Mozilla/5.0"'

m = re.match(APACHE_COMBINED, log_line)
if m:
    ip, user, timestamp, method, path, proto, status, bytes_, referer, ua = m.groups()
    print(f"{method} {path} → {status}")
    # GET /api/users → 200

The pattern captures IP, authenticated user, timestamp, HTTP method, path, protocol, status code, bytes transferred, referer, and user agent — a named-groups version is also in the pack.
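The named-group variant in the pack may differ in its exact group names, but the shape is the same pattern with `(?P<name>…)` wrappers, read back via `groupdict()`:

```python
import re

APACHE_NAMED = (
    r'^(?P<ip>\S+)\s+\S+\s+(?P<user>\S+)\s+\[(?P<ts>[^\]]+)\]\s+'
    r'"(?P<method>\S+)\s+(?P<path>\S+)\s+(?P<proto>\S+)"\s+'
    r'(?P<status>\d{3})\s+(?P<size>\d+|-)\s+"(?P<referer>[^"]*)"\s+"(?P<ua>[^"]*)"'
)

line = '192.168.1.1 - frank [10/Oct/2023:13:55:36 -0700] "GET /api/users HTTP/1.1" 200 1234 "https://example.com" "Mozilla/5.0"'

m = re.match(APACHE_NAMED, line)
if m:
    d = m.groupdict()
    print(d['method'], d['path'], d['status'])  # GET /api/users 200
```

Field access by name survives pattern edits that would silently shift positional group indices.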

Code parsing: Python function signatures

Useful for static analysis, documentation generators, or any tooling that needs to understand Python code structure:

import re

PYTHON_FUNC = r'^\s*(?:async\s+)?def\s+(\w+)\s*\(([^)]*)\)\s*(?:->\s*([^:]+))?\s*:'

examples = [
    'def simple(x, y):',
    'async def fetch_data(url: str, timeout: int = 30) -> dict:',
    '    def nested(self) -> None:',
]

for line in examples:
    m = re.search(PYTHON_FUNC, line, re.MULTILINE)
    if m:
        name, params, return_type = m.group(1), m.group(2), m.group(3)
        print(f"fn={name}, params={params!r}, returns={return_type!r}")

Handles sync and async defs, optional return-type annotations, and indented (nested/method) definitions.
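Worth noting: this regex only sees single-line signatures. For multi-line defs, or defaults containing parentheses, the stdlib `ast` module is the robust fallback — a quick sketch:

```python
import ast

src = """
async def fetch_data(
    url: str,
    timeout: int = 30,
) -> dict:
    pass
"""

# ast parses real Python syntax, so line breaks and nested parens are non-issues
for node in ast.walk(ast.parse(src)):
    if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
        params = [a.arg for a in node.args.args]
        print(node.name, params)  # fetch_data ['url', 'timeout']
```

The trade-off: `ast` needs syntactically valid source, while the regex works line-by-line on anything, including incomplete snippets.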

Data extraction: YAML frontmatter

Every static site generator, Jekyll template, and Hugo theme uses this. Here's a pattern that actually handles the edge cases:

YAML_FRONTMATTER = r'^---\s*\n(.*?)\n---\s*\n'

content = """---
title: My Article
tags: python, regex
published: true
---

Article body starts here.
"""

m = re.search(YAML_FRONTMATTER, content, re.DOTALL)
if m:
    frontmatter = m.group(1)
    # Parse the captured YAML string with yaml.safe_load()

The re.DOTALL flag is required — without it, . won't match newlines and the pattern fails on multi-line frontmatter.
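An extended variant (my own sketch, not necessarily the pack's) captures the body in the same pass with a second group:

```python
import re

FRONTMATTER_AND_BODY = r'^---\s*\n(.*?)\n---\s*\n(.*)\Z'

content = "---\ntitle: My Article\n---\n\nArticle body starts here.\n"

m = re.match(FRONTMATTER_AND_BODY, content, re.DOTALL)
if m:
    frontmatter, body = m.group(1), m.group(2)
    print(frontmatter)       # title: My Article
    print(body.strip())      # Article body starts here.
```

`\Z` anchors the second group to the end of the string, so the body capture can't stop early at a stray `---` divider inside the article.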

DevOps: semantic versioning

SEMVER = r'^(?P<major>0|[1-9]\d*)\.(?P<minor>0|[1-9]\d*)\.(?P<patch>0|[1-9]\d*)(?:-(?P<prerelease>[a-zA-Z0-9.-]+))?(?:\+(?P<buildmeta>[a-zA-Z0-9.-]+))?$'

versions = ['1.2.3', '2.0.0-alpha.1', '1.0.0+build.42', '1.0.0-beta+exp.sha.5114f85']

for v in versions:
    m = re.match(SEMVER, v)
    if m:
        print(f"{v} → major={m.group('major')}, pre={m.group('prerelease')}")

Named groups make the captures readable. The pack includes variants for loose semver (accepts v1.2.3 prefix) and range matching (for tools like npm/pip that use >=, ~=, ^).
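The loose variant is roughly the strict pattern with an optional prefix. A hedged sketch (the pack's exact variant may differ — `parse` is my own illustrative helper):

```python
import re

# Strict core with an optional leading 'v' (git tags, GitHub releases)
LOOSE_SEMVER = r'^v?(?P<major>0|[1-9]\d*)\.(?P<minor>0|[1-9]\d*)\.(?P<patch>0|[1-9]\d*)$'

def parse(version):
    """Return (major, minor, patch) as ints, or None if not semver."""
    m = re.match(LOOSE_SEMVER, version)
    if not m:
        return None
    return tuple(int(m.group(g)) for g in ('major', 'minor', 'patch'))

print(parse('v1.2.3'))   # (1, 2, 3)
print(parse('1.10.0'))   # (1, 10, 0)
print(parse('1.2'))      # None
```

Converting to int tuples matters for comparisons: `(1, 10, 0) > (1, 9, 9)` is true, while the strings `'1.10.0' > '1.9.9'` compare the wrong way.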


The part that actually saves time: regex_explainer.py

The interactive tools are the piece I use most. regex_explainer.py takes any regex and breaks it down:

$ python scripts/regex_explainer.py '(?:https?://)?(?:www\.)?([^/\s]+)'

Pattern: (?:https?://)?(?:www\.)?([^/\s]+)

  (?:         Non-capturing group start
    https?    Literal 'http', then 's' is optional (? = zero or one)
    ://       Literal '://'
  )?          End non-capturing group, entire group is optional
  (?:         Non-capturing group start
    www\.     Literal 'www' + escaped dot (. in regex matches anything; \. matches only a literal dot)
  )?          End non-capturing group, optional
  (           Capturing group 1 start
    [^/\s]+   One or more characters that are NOT '/' and NOT whitespace
  )           Capturing group 1 end

This is the tool I wish I had when I was learning regex. It's also useful for auditing patterns someone else wrote.
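Python also ships a rough built-in cousin: compiling with re.DEBUG makes the engine print its parse tree to stdout. The output format varies across Python versions and is far less readable than a plain-English breakdown, but it's there with zero setup:

```python
import re

# re.DEBUG dumps the compiled parse tree (SUBPATTERN, MAX_REPEAT, LITERAL, ...)
# to stdout as a side effect of compilation
re.compile(r'(?:https?://)?(www\.)?', re.DEBUG)
```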


pattern_search.py — the reference you'll actually use

$ python scripts/pattern_search.py "log"

Found 18 results in category 'log-parsing':
  [01] Apache/Nginx Combined Log
  [02] Apache/Nginx Common Log
  [03] Syslog RFC 3164
  [04] Syslog RFC 5424
  ...

$ python scripts/pattern_search.py --category "code"

Found 20 results in category 'code-parsing':
  [01] Python Function Definition
  [02] JavaScript/TypeScript Function
  [03] Python Class Definition
  ...

Who this is for

If you write Python and regularly need to parse logs, validate input, extract structured data from text, or build any tooling that touches text at all — this is the reference that replaces the pile of Stack Overflow bookmarks.

Backend developers parsing logs and validating API inputs. DevOps engineers writing grep/sed pipelines. Data engineers cleaning text data. Anyone who uses regex weekly but reaches for Google every time.


The pack

Regex Master Pack on Gumroad → $19, one-time purchase.

150+ patterns, 8 cheatsheets, 5 Python scripts. Markdown format (works in any editor, terminal, or note-taking app). Patterns also in JSON for programmatic access. Python 3.8+, stdlib only.
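If the JSON follows the obvious shape — a list of objects with name/category/pattern fields, which is my assumption here, so check the actual file for the real field names — programmatic access is a few lines:

```python
import json
import re

# Hypothetical schema for illustration; the pack's patterns.json may differ
raw = '''[
  {"name": "Semver", "category": "devops", "pattern": "^\\\\d+\\\\.\\\\d+\\\\.\\\\d+$"}
]'''

by_category = {}
for entry in json.loads(raw):
    by_category.setdefault(entry['category'], []).append(re.compile(entry['pattern']))

print(by_category['devops'][0].match('1.2.3') is not None)  # True
```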

30-day refund if it's not useful.


See also: Python Automation Cookbook — 25 production-ready Python scripts for the automation tasks you keep rebuilding.
