Snappy Tools

Posted on May 30

How to Use Regex for Text Processing: Practical Examples in JavaScript and Python

#beginners #javascript #python #webdev

Regular expressions solve text problems that string operations can't handle cleanly. Here's a practical guide with real examples rather than theory.

The basics

A regex pattern is a sequence of characters that defines a search pattern. Some characters have special meaning:

Character	Meaning
`.`	Any character (except newline by default)
`*`	Zero or more of the preceding
`+`	One or more of the preceding
`?`	Zero or one of the preceding
`^`	Start of string (or line with `m` flag)
`$`	End of string (or line with `m` flag)
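`\d`	Any digit (0-9; Python 3 `str` patterns also match other Unicode digits unless `re.ASCII` is set)
`\w`	Word character (letter, digit, underscore; ASCII-only in JavaScript, Unicode-aware in Python 3)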
`\s`	Whitespace
`\D`	Non-digit
`\W`	Non-word character
`[abc]`	Any of a, b, c
`[a-z]`	Any lowercase letter
`[^abc]`	Not a, b, or c
`(abc)`	Capturing group
`(?:abc)`	Non-capturing group
`a|b`	Either a or b (alternation)
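To make the table concrete, here is a small Python sketch that exercises several rows at once. The sample strings are made up for illustration, and `re.fullmatch` is used so that each pattern has to cover the whole input.

```python
import re

# Each pattern exercises one or two rows of the table above.
assert re.fullmatch(r"colou?r", "color")         # ? makes the 'u' optional
assert re.fullmatch(r"\d+", "2026")              # + needs at least one digit
assert re.fullmatch(r"[a-z]*", "")               # * also accepts zero characters
assert re.fullmatch(r"[^abc]", "d")              # negated class: anything but a, b, c
assert re.fullmatch(r"cat|dog", "dog")           # | picks either alternative
assert re.search(r"^\w+\s\w+$", "hello world")   # anchors, word chars, whitespace

# A capturing group gets a number; a non-capturing group does not.
m = re.match(r"(?:Mr|Ms)\. (\w+)", "Ms. Lovelace")
print(m.group(1))  # Lovelace
```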

Practical examples

Validate an email address

const emailRegex = /^[^\s@]+@[^\s@]+\.[^\s@]+$/;
emailRegex.test('user@example.com');  // true
emailRegex.test('not-an-email');      // false

Note: This is a simple, practical check. A fully RFC 5321-compliant email regex is hundreds of characters long and overkill for most uses. The above catches the most common invalid formats.
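The same check can be sketched in Python. The helper name `looks_like_email` is hypothetical. One difference worth knowing: Python's `$` also matches just before a trailing newline, so this sketch uses `re.fullmatch` instead of `^...$` to reject input such as `'user@example.com\n'`.

```python
import re

# Same character classes as the JavaScript pattern; fullmatch anchors both
# ends, so the explicit ^ and $ are not needed.
EMAIL_RE = re.compile(r"[^\s@]+@[^\s@]+\.[^\s@]+")

def looks_like_email(value: str) -> bool:
    """Cheap sanity check, not full RFC validation."""
    return EMAIL_RE.fullmatch(value) is not None

print(looks_like_email("user@example.com"))    # True
print(looks_like_email("not-an-email"))        # False
print(looks_like_email("a@b"))                 # False (no dot after the @)
print(looks_like_email("user@example.com\n"))  # False (trailing newline)
```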

Extract URLs from text

const urlRegex = /https?:\/\/[^\s<>"]+/g;
const text = 'Visit https://example.com or http://other.org for more.';
const urls = text.match(urlRegex);
// → ['https://example.com', 'http://other.org']
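A Python version of the same extraction might look like the sketch below. The pattern is greedy up to the next whitespace, so a URL at the end of a sentence picks up the trailing period or comma. The `rstrip` call is one possible way to trim that, under the assumption that your URLs never legitimately end in those characters. `extract_urls` is a hypothetical helper name.

```python
import re

URL_RE = re.compile(r'https?://[^\s<>"]+')

def extract_urls(text: str) -> list[str]:
    # rstrip drops sentence punctuation that the character class lets through
    return [url.rstrip(".,;:!?)") for url in URL_RE.findall(text)]

print(extract_urls("Visit https://example.com, or http://other.org."))
# ['https://example.com', 'http://other.org']
```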

Validate a phone number (flexible)

// Matches formats: 555-555-5555, (555) 555-5555, 555 555 5555, +1 555 555 5555
// The optional leading group accepts a country code such as +1
const phoneRegex = /^(?:\+?\d{1,3}[-\s.]?)?\(?\d{3}\)?[-\s.]?\d{3}[-\s.]?\d{4,6}$/;
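It is worth running a pattern like this against every format the comment promises, because a missing optional group is easy to overlook. The Python sketch below does that. It assumes an optional country code of one to three digits, which is a choice made for this example, and `is_phone` is a hypothetical helper name.

```python
import re

# Assumption: an optional country code of 1 to 3 digits, with or without '+'.
PHONE_RE = re.compile(
    r"(?:\+?\d{1,3}[-\s.]?)?"   # optional country code, e.g. '+1 '
    r"\(?\d{3}\)?[-\s.]?"       # area code, optionally in parentheses
    r"\d{3}[-\s.]?\d{4,6}"      # local number
)

def is_phone(value: str) -> bool:
    return PHONE_RE.fullmatch(value) is not None

for sample in ["555-555-5555", "(555) 555-5555", "555 555 5555",
               "+1 555 555 5555", "555-5555"]:
    print(f"{sample!r}: {is_phone(sample)}")
# Only the last sample, which has no area code, prints False
```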

Replace all occurrences

// Using replace with /g flag
const result = 'hello world hello'.replace(/hello/g, 'hi');
// → 'hi world hi'

// With capture groups
const date = '2026-05-30';
const formatted = date.replace(/(\d{4})-(\d{2})-(\d{2})/, '$2/$3/$1');
// → '05/30/2026'
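In Python the equivalent call is `re.sub`, where `\g<n>` in the replacement refers to the n-th capture group. The replacement can also be a function, which helps when the new text has to be computed from the match. The sketch below shows both forms. `double` is a made-up helper for illustration.

```python
import re

date = "2026-05-30"

# \g<n> refers to the n-th capture group in the replacement string
formatted = re.sub(r"(\d{4})-(\d{2})-(\d{2})", r"\g<2>/\g<3>/\g<1>", date)
print(formatted)  # 05/30/2026

# A function replacement receives the match object and returns the new text
def double(match: re.Match) -> str:
    return str(int(match.group()) * 2)

doubled = re.sub(r"\d+", double, "3 apples and 10 pears")
print(doubled)  # 6 apples and 20 pears
```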

Split on multiple delimiters

'a,b;c d'.split(/[,;\s]+/);
// → ['a', 'b', 'c', 'd']

Extract named capture groups

const pattern = /(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/;
const match = '2026-05-30'.match(pattern);
const { year, month, day } = match.groups;
// year='2026', month='05', day='30'

Strip HTML tags

'<p>Hello <strong>world</strong></p>'.replace(/<[^>]*>/g, '');
// → 'Hello world'

Note: This is fine for simple cases, but for complex HTML (nested tags, attribute values with >), use a proper HTML parser.
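As a rough illustration of the parser route, here is a sketch using `html.parser` from Python's standard library. The `TextExtractor` class and `strip_tags` function are hypothetical names. The approach simply collects text nodes, so it also keeps the contents of `<script>` and `<style>` elements unless you filter them out.

```python
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collects text nodes and ignores tags and attributes."""

    def __init__(self) -> None:
        super().__init__()
        self.parts: list[str] = []

    def handle_data(self, data: str) -> None:
        self.parts.append(data)

def strip_tags(html: str) -> str:
    parser = TextExtractor()
    parser.feed(html)
    parser.close()  # flush any text still buffered by the parser
    return "".join(parser.parts)

print(strip_tags("<p>Hello <strong>world</strong></p>"))
# Hello world

# An attribute value containing '>' does not confuse the parser;
# the regex above would leave ' b">link' behind here.
print(strip_tags('<a title="a > b">link</a>'))
# link
```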

Validate hex color codes

const hexColor = /^#([0-9A-Fa-f]{3}|[0-9A-Fa-f]{6})$/;
hexColor.test('#fff');      // true
hexColor.test('#2f855a');   // true
hexColor.test('#GGGGGG');   // false (G is not hex)
hexColor.test('2f855a');    // false (missing #)
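The capture group in this pattern is useful beyond validation. As one possible follow-up step, the Python sketch below validates a color and then expands the three-digit shorthand to six digits. `normalize_hex` is a hypothetical helper, and returning `None` for invalid input is just one design choice.

```python
import re
from typing import Optional

HEX_COLOR_RE = re.compile(r"#([0-9A-Fa-f]{3}|[0-9A-Fa-f]{6})")

def normalize_hex(color: str) -> Optional[str]:
    """Return the color as lowercase #rrggbb, or None if it is not valid."""
    m = HEX_COLOR_RE.fullmatch(color)
    if m is None:
        return None
    digits = m.group(1).lower()
    if len(digits) == 3:
        # CSS shorthand: each digit is doubled, so #fff means #ffffff
        digits = "".join(ch * 2 for ch in digits)
    return "#" + digits

print(normalize_hex("#fff"))     # #ffffff
print(normalize_hex("#2F855A"))  # #2f855a
print(normalize_hex("#GGGGGG"))  # None
print(normalize_hex("2f855a"))   # None
```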

Extract markdown links

const mdLinkRegex = /\[([^\]]+)\]\(([^)]+)\)/g;
const markdown = 'See [example](https://example.com) and [doc](docs.html).';
for (const match of markdown.matchAll(mdLinkRegex)) {
  console.log(`Text: ${match[1]}, URL: ${match[2]}`);
}
// Text: example, URL: https://example.com
// Text: doc, URL: docs.html
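The Python counterpart of `matchAll` is `re.finditer`, which yields one match object per hit. A minimal sketch with the same pattern and sample text:

```python
import re

MD_LINK_RE = re.compile(r"\[([^\]]+)\]\(([^)]+)\)")
markdown = "See [example](https://example.com) and [doc](docs.html)."

# group(1) is the link text, group(2) is the target
links = [(m.group(1), m.group(2)) for m in MD_LINK_RE.finditer(markdown)]
print(links)
# [('example', 'https://example.com'), ('doc', 'docs.html')]
```

Like the JavaScript version, this stops at the first `)`, so a URL that itself contains parentheses would be cut short.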

Python regex

Python uses the re module:

import re

# Match (only at start of string)
re.match(r'\d+', '123abc')  # match object
re.match(r'\d+', 'abc123')  # None

# Search (anywhere in string)
re.search(r'\d+', 'abc123')  # matches '123'

# Find all
re.findall(r'\d+', 'abc 123 def 456')
# → ['123', '456']

# Substitute
re.sub(r'\s+', '-', 'hello world foo')
# → 'hello-world-foo'

# With flags
re.findall(r'python', 'Python is PYTHON', re.IGNORECASE)
# → ['Python', 'PYTHON']

# Named groups
pattern = re.compile(r'(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})')
m = pattern.match('2026-05-30')
m.group('year')  # '2026'
m.groupdict()    # {'year': '2026', 'month': '05', 'day': '30'}

# Split
re.split(r'[,;\s]+', 'a,b;c d')
# → ['a', 'b', 'c', 'd']

Flags

Flag	JavaScript	Python	Effect
Case insensitive	`/pattern/i`	`re.IGNORECASE`	`A` matches `a`
Global (find all)	`/pattern/g`	(not needed; `re.findall` and `re.finditer` return every match)	Don't stop at first match
Multiline	`/pattern/m`	`re.MULTILINE`	`^` and `$` match line boundaries
Dotall	`/pattern/s`	`re.DOTALL`	`.` matches newlines
Extended	–	`re.VERBOSE`	Allows whitespace/comments in pattern
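The three flags that change how a pattern is read are easier to see in action. This Python sketch uses a made-up log string and shows `re.VERBOSE`, `re.MULTILINE`, and `re.DOTALL` in turn. Flags can be combined with `|`, for example `re.MULTILINE | re.IGNORECASE`.

```python
import re

# re.VERBOSE: whitespace and # comments inside the pattern are ignored
DATE_RE = re.compile(
    r"""
    (?P<year>\d{4})  -   # four-digit year
    (?P<month>\d{2}) -   # two-digit month
    (?P<day>\d{2})       # two-digit day
    """,
    re.VERBOSE,
)
print(DATE_RE.search("released 2026-05-30").groupdict())
# {'year': '2026', 'month': '05', 'day': '30'}

log = "INFO start\nERROR disk full\nINFO done"

# re.MULTILINE: ^ and $ match at the start and end of every line
errors = re.findall(r"^ERROR (.+)$", log, re.MULTILINE)
print(errors)  # ['disk full']

# re.DOTALL: . also matches the newline, so one match can span lines
print(re.search(r"start.*done", log, re.DOTALL) is not None)  # True
print(re.search(r"start.*done", log) is not None)             # False
```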

Lookahead and lookbehind

// Positive lookahead: match 'price' only if followed by '$'
/price(?=\$)/.test('price$100')  // true
/price(?=\$)/.test('price100')   // false

// Negative lookahead: match 'price' NOT followed by '$'
/price(?!\$)/.test('price100')   // true

// Positive lookbehind: match digits preceded by '$'
/(?<=\$)\d+/.exec('$100')  // matches '100'

// Negative lookbehind
/(?<!\$)\d+/.exec('100')   // matches '100' (no $ before)
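The same four assertions work in Python, with one restriction: the `re` module requires lookbehind patterns to have a fixed width. The sketch below also shows a subtle trap with negative lookbehind. Without a word boundary, the engine is free to start the match one digit later. The sample string is made up.

```python
import re

print(bool(re.search(r"price(?=\$)", "price$100")))  # True
print(bool(re.search(r"price(?!\$)", "price$100")))  # False

text = "cost $100, qty 25"

# Positive lookbehind: digits directly after a '$'
print(re.findall(r"(?<=\$)\d+", text))     # ['100']

# Negative lookbehind alone skips the '1' but still matches the '00' after it
print(re.findall(r"(?<!\$)\d+", text))     # ['00', '25']

# Adding \b forces the match to start at the beginning of a number
print(re.findall(r"(?<!\$)\b\d+", text))   # ['25']
```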

Common gotchas

Greedy vs lazy:

'<a><b><c>'.match(/<.+>/)[0]   // '<a><b><c>' (greedy — matches as much as possible)
'<a><b><c>'.match(/<.+?>/)[0]  // '<a>'       (lazy — matches as little as possible)
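Python's quantifiers behave the same way. The sketch below repeats both matches and adds a third option, a negated character class, which is often a clearer way to say "up to the next `>`" than a lazy quantifier.

```python
import re

s = "<a><b><c>"

print(re.search(r"<.+>", s).group())   # <a><b><c>  (greedy)
print(re.search(r"<.+?>", s).group())  # <a>        (lazy)

# A negated class cannot run past the closing '>', so no backtracking is needed
tags = re.findall(r"<[^>]+>", s)
print(tags)  # ['<a>', '<b>', '<c>']
```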

Escaping special characters:

// To match a literal '.', escape it
'1.2.3'.split('.')   // ['1', '2', '3'] — string split, works
'1.2.3'.split(/\./)  // ['1', '2', '3'] — regex, escaped dot
'1.2.3'.split(/./)   // ['', '', '', '', '', ''] — unescaped, matches any char
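When the text to match comes from a variable, such as user input, escaping by hand is error-prone. Python's `re.escape` does it for you. JavaScript has historically had no built-in equivalent, so the sketch below is Python only. `count_literal` is a hypothetical helper name.

```python
import re

def count_literal(needle: str, haystack: str) -> int:
    """Count occurrences of needle, treating every character literally."""
    return len(re.findall(re.escape(needle), haystack))

print(re.escape("1.5"))                        # 1\.5
print(count_literal("1.5", "1.5 115 1x5"))     # 1
print(len(re.findall("1.5", "1.5 115 1x5")))   # 3 (unescaped dot matches any char)
```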

The regex101.com shortcut: For complex patterns, paste into regex101.com — it shows match groups, flags, and explanations inline, and lets you test against multiple strings.

Regex has a reputation for being hard to read, but in the common cases (validation, extraction, replacement) simple patterns cover most of the work. Start with the simplest pattern that works, and add complexity only when it fails on real input.
