DEV Community

Snappy Tools

How to Use Regex for Text Processing: Practical Examples in JavaScript and Python

Regular expressions solve text problems that string operations can't handle cleanly. Here's a practical guide with real examples rather than theory.

The basics

A regex pattern is a sequence of characters that defines a search pattern. Some characters have special meaning:

Character Meaning
. Any character (except newline by default)
* Zero or more of the preceding
+ One or more of the preceding
? Zero or one of the preceding
^ Start of string (or line with m flag)
$ End of string (or line with m flag)
\d Any digit (0-9)
\w Word character (letter, digit, underscore)
\s Whitespace
\D Non-digit
\W Non-word character
[abc] Any of a, b, c
[a-z] Any lowercase letter
[^abc] Not a, b, or c
(abc) Capturing group
(?:abc) Non-capturing group
a|b Either a or b (alternation)
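To see a few of these building blocks in action, here is a small Python sketch. The sample strings and the `samples` and `results` names are made up for illustration; only `re.findall` comes from the standard library.

```python
import re

# Each pattern exercises one or two entries from the table above.
# The sample texts are illustrative, not from any real data set.
samples = {
    r"\d+": "order 66 shipped",          # one or more digits
    r"colou?r": "color and colour",      # optional 'u'
    r"[^aeiou\s]+": "rhythm and blues",  # runs of non-vowel, non-space characters
    r"^(?:cat|dog)s?$": "dogs",          # alternation inside a non-capturing group
}

results = {pattern: re.findall(pattern, text) for pattern, text in samples.items()}
for pattern, found in results.items():
    print(f"{pattern!r:22} -> {found}")
```

Because the last pattern uses a non-capturing group, `findall` returns the whole match rather than just the group contents.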

Practical examples

Validate an email address

const emailRegex = /^[^\s@]+@[^\s@]+\.[^\s@]+$/;
emailRegex.test('user@example.com');  // true
emailRegex.test('not-an-email');      // false

Note: This is a simple, practical check. A fully RFC 5321-compliant email regex is hundreds of characters long and overkill for most uses. The above catches the most common invalid formats.
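The same check translates to Python roughly as follows. This is a sketch: the `EMAIL_RE` and `looks_like_email` names are hypothetical, and `re.fullmatch` stands in for the `^` and `$` anchors.

```python
import re

# Same pattern as the JavaScript example; fullmatch requires the whole
# string to match, so explicit ^ and $ anchors are not needed.
EMAIL_RE = re.compile(r"[^\s@]+@[^\s@]+\.[^\s@]+")

def looks_like_email(value: str) -> bool:
    """Return True if value has the rough shape local@domain.tld."""
    return EMAIL_RE.fullmatch(value) is not None

print(looks_like_email("user@example.com"))  # True
print(looks_like_email("not-an-email"))      # False
print(looks_like_email("a@b"))               # False (no dot after the @)
print(looks_like_email("a@b..c"))            # True (a known gap: consecutive dots pass)
```

The last line shows the trade-off: a short pattern accepts some addresses that are not deliverable, so it works best as a first filter before a confirmation email.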

Extract URLs from text

const urlRegex = /https?:\/\/[^\s<>"]+/g;
const text = 'Visit https://example.com or http://other.org for more.';
const urls = text.match(urlRegex);
// → ['https://example.com', 'http://other.org']
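In running prose, a URL is often followed by a comma or period that the pattern happily swallows. One possible way to handle that in Python is to trim trailing punctuation after matching. The `extract_urls` helper and the set of trimmed characters are assumptions for this sketch, not a general rule.

```python
import re

URL_RE = re.compile(r'https?://[^\s<>"]+')

def extract_urls(text: str) -> list:
    """Find http(s) URLs and trim punctuation that usually belongs to the sentence."""
    # Assumption: these trailing characters are sentence punctuation, not part of the URL.
    return [url.rstrip(".,;:!?)") for url in URL_RE.findall(text)]

text = "Docs live at https://example.com/docs, mirror at http://other.org."
urls = extract_urls(text)
print(urls)  # ['https://example.com/docs', 'http://other.org']
```

Trimming `)` will break the rare URL that legitimately ends in a parenthesis, so adjust the character set to your input.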

Validate a phone number (flexible)

// Matches formats: 555-555-5555, (555) 555-5555, 555 555 5555, +1 555 555 5555
// The optional leading group handles a country code such as "+1 ".
const phoneRegex = /^(?:\+[0-9]{1,3}[-\s.]?)?[(]?[0-9]{3}[)]?[-\s.]?[0-9]{3}[-\s.]?[0-9]{4,6}$/;
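A quick way to gain confidence in a pattern like this is to run it against a list of strings you expect to pass and fail. Here is a Python sketch of that idea. The `PHONE_RE` pattern is my own variant (optional `+` country code, then a 3-3-4 digit layout) and the candidate list is illustrative.

```python
import re

# Assumption: optional "+<1-3 digit country code>", then 3 digits, 3 digits, 4 digits,
# each optionally separated by a dash, whitespace, or dot.
PHONE_RE = re.compile(
    r"(?:\+[0-9]{1,3}[-\s.]?)?\(?[0-9]{3}\)?[-\s.]?[0-9]{3}[-\s.]?[0-9]{4}"
)

candidates = [
    "555-555-5555",
    "(555) 555-5555",
    "555 555 5555",
    "+1 555 555 5555",
    "555-5555",             # too short
    "phone: 555-555-5555",  # extra text around the number
]

results = {c: PHONE_RE.fullmatch(c) is not None for c in candidates}
for candidate, ok in results.items():
    print(f"{candidate!r:24} {ok}")
```

The first four candidates pass and the last two fail. Note that the pattern does not check that parentheses are balanced, so "(555 555-5555" also passes.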

Replace all occurrences

// Using replace with /g flag
const result = 'hello world hello'.replace(/hello/g, 'hi');
// → 'hi world hi'

// With capture groups
const date = '2026-05-30';
const formatted = date.replace(/(\d{4})-(\d{2})-(\d{2})/, '$2/$3/$1');
// → '05/30/2026'
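For comparison, here is roughly how the same replacements look in Python with `re.sub`. Backreferences are written `\1` rather than `$1`, and the replacement can also be a function. The `double` helper and the sample strings are made up for this sketch.

```python
import re

# Reorder a date with numbered backreferences (\1, \2, \3 in Python, not $1).
iso = "2026-05-30"
us_style = re.sub(r"(\d{4})-(\d{2})-(\d{2})", r"\2/\3/\1", iso)
print(us_style)  # 05/30/2026

# Pass a function when the replacement depends on the matched text.
def double(match):
    return str(int(match.group()) * 2)

doubled = re.sub(r"\d+", double, "3 apples and 12 pears")
print(doubled)  # 6 apples and 24 pears

# re.sub replaces every match by default; count limits it.
first_only = re.sub(r"hello", "hi", "hello world hello", count=1)
print(first_only)  # hi world hello
```

Python replaces all matches unless told otherwise, which is the reverse of JavaScript, where `replace` with a regex stops at the first match unless you add the `g` flag.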

Split on multiple delimiters

'a,b;c d'.split(/[,;\s]+/);
// → ['a', 'b', 'c', 'd']

Extract named capture groups

const pattern = /(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/;
const match = '2026-05-30'.match(pattern);
const { year, month, day } = match.groups;
// year='2026', month='05', day='30'

Strip HTML tags

'<p>Hello <strong>world</strong></p>'.replace(/<[^>]*>/g, '');
// → 'Hello world'

Note: This is fine for simple cases, but for complex HTML (nested tags, attribute values with >), use a proper HTML parser.
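As a sketch of the parser route, Python's standard library ships `html.parser`. The `TextExtractor` class and `strip_tags` function below are hypothetical names for this example; they collect text nodes and ignore everything else.

```python
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collect only the text nodes of a document, ignoring tags and attributes."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        # Called for text between tags.
        self.parts.append(data)

def strip_tags(html: str) -> str:
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return "".join(parser.parts)

simple = strip_tags("<p>Hello <strong>world</strong></p>")
tricky = strip_tags('<a title="a > b" href="#">link</a>')
print(simple)  # Hello world
print(tricky)  # link
```

The second call is the case the regex gets wrong: the `>` inside the quoted attribute ends the regex match early, while the parser treats it as part of the attribute value.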

Validate hex color codes

const hexColor = /^#([0-9A-Fa-f]{3}|[0-9A-Fa-f]{6})$/;
hexColor.test('#fff');      // true
hexColor.test('#2f855a');   // true
hexColor.test('#GGGGGG');   // false (G is not hex)
hexColor.test('2f855a');    // false (missing #)
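Validation is often followed by conversion. The capture group in the pattern makes that easy, as this Python sketch shows. The `hex_to_rgb` helper is a hypothetical name and returns `None` for invalid input as a design choice for the example.

```python
import re

HEX_COLOR_RE = re.compile(r"#([0-9A-Fa-f]{3}|[0-9A-Fa-f]{6})")

def hex_to_rgb(value):
    """Return an (r, g, b) tuple, or None if value is not #RGB or #RRGGBB."""
    match = HEX_COLOR_RE.fullmatch(value)
    if match is None:
        return None
    digits = match.group(1)
    if len(digits) == 3:
        digits = "".join(ch * 2 for ch in digits)  # 'fff' -> 'ffffff'
    return tuple(int(digits[i:i + 2], 16) for i in (0, 2, 4))

print(hex_to_rgb("#fff"))     # (255, 255, 255)
print(hex_to_rgb("#2f855a"))  # (47, 133, 90)
print(hex_to_rgb("#GGGGGG"))  # None
```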

Extract markdown links

const mdLinkRegex = /\[([^\]]+)\]\(([^)]+)\)/g;
const markdown = 'See [example](https://example.com) and [doc](docs.html).';
for (const match of markdown.matchAll(mdLinkRegex)) {
  console.log(`Text: ${match[1]}, URL: ${match[2]}`);
}
// Text: example, URL: https://example.com
// Text: doc, URL: docs.html
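The Python counterpart of `matchAll` is `re.finditer`. In this sketch I have added named groups (`text` and `url`, my own choice of names) so the code reads without index numbers.

```python
import re

# Same pattern as above, with named groups added for readability.
MD_LINK_RE = re.compile(r"\[(?P<text>[^\]]+)\]\((?P<url>[^)]+)\)")

markdown = "See [example](https://example.com) and [doc](docs.html)."
links = [(m["text"], m["url"]) for m in MD_LINK_RE.finditer(markdown)]
print(links)
# [('example', 'https://example.com'), ('doc', 'docs.html')]
```

The pattern also matches the bracket part of image syntax such as `![alt](pic.png)`, so filter those out if you only want links.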

Python regex

Python uses the re module:

import re

# Match (only at start of string)
re.match(r'\d+', '123abc')  # match object
re.match(r'\d+', 'abc123')  # None

# Search (anywhere in string)
re.search(r'\d+', 'abc123')  # matches '123'

# Find all
re.findall(r'\d+', 'abc 123 def 456')
# → ['123', '456']

# Substitute
re.sub(r'\s+', '-', 'hello world foo')
# → 'hello-world-foo'

# With flags
re.findall(r'python', 'Python is PYTHON', re.IGNORECASE)
# → ['Python', 'PYTHON']

# Named groups
pattern = re.compile(r'(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})')
m = pattern.match('2026-05-30')
m.group('year')  # '2026'
m.groupdict()    # {'year': '2026', 'month': '05', 'day': '30'}

# Split
re.split(r'[,;\s]+', 'a,b;c d')
# → ['a', 'b', 'c', 'd']
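One detail worth spelling out: `re.match` only anchors the start of the string, so it is easy to misuse for validation. A small sketch of the difference, along with `finditer` on a precompiled pattern (the `DIGITS` name is just for illustration):

```python
import re

value = "123abc"
starts_with_digits = bool(re.match(r"\d+", value))   # True: only the start has to match
all_digits = bool(re.fullmatch(r"\d+", value))       # False: the whole string must match
print(starts_with_digits, all_digits)

# Precompiling is handy when the same pattern runs many times,
# and finditer gives match objects (with positions) instead of plain strings.
DIGITS = re.compile(r"\d+")
spans = [m.span() for m in DIGITS.finditer("abc 123 def 456")]
print(spans)  # [(4, 7), (12, 15)]
```

For validation, `re.fullmatch` is usually what you want; `re.match` suits prefix checks.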

Flags

Flag JavaScript Python Effect
Case insensitive /pattern/i re.IGNORECASE A matches a
Global (find all) /pattern/g (default in findall) Don't stop at first match
Multiline /pattern/m re.MULTILINE ^ and $ match line boundaries
Dotall /pattern/s re.DOTALL . matches newlines
Extended (none) re.VERBOSE Allows whitespace/comments in pattern
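Two of these flags are easier to appreciate with an example. The sketch below uses `re.VERBOSE` to document a date pattern inline and combines `re.MULTILINE` with `re.IGNORECASE` to pick lines out of a made-up log string.

```python
import re

# re.VERBOSE ignores unescaped whitespace and allows # comments inside the pattern.
DATE_RE = re.compile(
    r"""
    (?P<year>\d{4})  -   # four-digit year
    (?P<month>\d{2}) -   # two-digit month
    (?P<day>\d{2})       # two-digit day
    """,
    re.VERBOSE,
)
parts = DATE_RE.search("released 2026-05-30").groupdict()
print(parts)  # {'year': '2026', 'month': '05', 'day': '30'}

# re.MULTILINE makes ^ and $ match at every line boundary, not only the string ends.
log = "INFO start\nERROR disk full\nINFO done\nerror retry"
errors = re.findall(r"^error\b.*$", log, re.MULTILINE | re.IGNORECASE)
print(errors)  # ['ERROR disk full', 'error retry']
```

Flags combine with `|`, which plays the same role as writing several letters after the closing slash in JavaScript.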

Lookahead and lookbehind

// Positive lookahead: match 'price' only if followed by '$'
/price(?=\$)/.test('price$100')  // true
/price(?=\$)/.test('price100')   // false

// Negative lookahead: match 'price' NOT followed by '$'
/price(?!\$)/.test('price100')   // true

// Positive lookbehind: match digits preceded by '$'
/(?<=\$)\d+/.exec('$100')  // matches '100'

// Negative lookbehind
/(?<!\$)\d+/.exec('100')   // matches '100' (no $ before)
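Lookarounds are most useful when you want to match a position rather than text. Here is a Python sketch with two uses: inserting thousands separators and pulling out amounts without the currency sign. The function name and sample strings are made up, and Python's `re` requires lookbehinds to have a fixed width, which both patterns satisfy.

```python
import re

def add_thousands_separators(digits: str) -> str:
    """Insert commas where a digit precedes and groups of three digits follow to the end."""
    # Both assertions are zero-width, so the substitution inserts without consuming digits.
    return re.sub(r"(?<=\d)(?=(?:\d{3})+$)", ",", digits)

print(add_thousands_separators("1234567"))  # 1,234,567
print(add_thousands_separators("999"))      # 999

# Lookbehind keeps the '$' out of the match itself.
prices = re.findall(r"(?<=\$)\d+(?:\.\d{2})?", "cost: $100, tax: $8.25, qty: 3")
print(prices)  # ['100', '8.25']
```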

Common gotchas

Greedy vs lazy:

'<a><b><c>'.match(/<.+>/)[0]   // '<a><b><c>' (greedy — matches as much as possible)
'<a><b><c>'.match(/<.+?>/)[0]  // '<a>'       (lazy — matches as little as possible)
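The same behavior shows up in Python. This sketch also includes a third option, a negated character class, which is often a clearer way to say "up to the next `>`" than a lazy quantifier.

```python
import re

text = "<a><b><c>"

greedy = re.findall(r"<.+>", text)      # .+ runs to the last '>'
lazy = re.findall(r"<.+?>", text)       # .+? stops at the first '>'
negated = re.findall(r"<[^>]+>", text)  # [^>] can never cross a '>'

print(greedy)   # ['<a><b><c>']
print(lazy)     # ['<a>', '<b>', '<c>']
print(negated)  # ['<a>', '<b>', '<c>']
```

The lazy and negated versions agree here. The negated class states the intent directly in the pattern, so it tends to be easier to reason about on longer inputs.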

Escaping special characters:

// To match a literal '.', escape it
'1.2.3'.split('.')   // ['1', '2', '3'] — string split, works
'1.2.3'.split(/\./)  // ['1', '2', '3'] — regex, escaped dot
'1.2.3'.split(/./)   // ['', '', '', '', '', ''] — unescaped, matches any char
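Escaping by hand works for patterns you write yourself. When the text comes from a user or a file, Python's `re.escape` does it for you. The `highlight` helper below is a hypothetical example that wraps case-insensitive matches of a literal term in brackets.

```python
import re

def highlight(term: str, text: str) -> str:
    """Wrap each case-insensitive occurrence of the literal term in [brackets]."""
    pattern = re.escape(term)  # 'a.b' becomes 'a\\.b', so the dot is literal
    return re.sub(pattern, lambda m: f"[{m.group()}]", text, flags=re.IGNORECASE)

escaped = highlight("a.b", "A.B and axb")
print(escaped)  # [A.B] and axb

# Without escaping, the dot matches any character and 'axb' is caught too.
unescaped = re.sub("a.b", lambda m: f"[{m.group()}]", "A.B and axb", flags=re.IGNORECASE)
print(unescaped)  # [A.B] and [axb]
```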

The regex101.com shortcut: For complex patterns, paste them into regex101.com. It shows match groups, flags, and explanations inline, and lets you test against multiple strings.


Regex has a reputation for being hard to read, but for the common cases (validation, extraction, replacement), simple patterns cover most of the work. Start with the simplest pattern that works and add complexity only when the simpler version fails.
