Regular Expressions: The Guide I Always Wanted (2026)

#javascript #beginners #programming #tutorial

Regex looks like gibberish until it clicks — then it's a superpower. Here's the guide that makes regex actually make sense.

The Mental Model

Think of regex as a pattern-matching machine:

Input string:  "hello@example.com"
Pattern:       \w+@\w+\.\w+

The engine scans left-to-right, trying to match your pattern
at each position. When the full pattern matches → SUCCESS!

Key insight: Regex isn't about "finding text" — it's about
defining the SHAPE of what you're looking for.
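
To make the scanning idea concrete, here is a minimal sketch. The sample string and variable names are made up for illustration; the pattern is the one from the diagram above. It shows that the engine fails at the early positions and only succeeds once it reaches a spot where the whole shape fits.

```javascript
// Illustrative input: the address is buried inside a longer string on purpose.
const input = 'contact: hello@example.com today';
const emailShape = /\w+@\w+\.\w+/;

// Without the g flag, match() returns the first match plus metadata.
const found = input.match(emailShape);

console.log(found[0]);    // "hello@example.com"
console.log(found.index); // 9 (every attempt inside "contact: " failed first)
```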

Character Classes: What Are We Matching?

// Literal characters (match exactly)
/hello/           // Matches "hello" in "say hello there"

// Dot (wildcard) — matches ANY single character except newline
/h.t/             // "hat", "hot", "hit", but NOT "h" or "ht"

// Character classes — match ONE character from a set:
/[aeiou]/         // Any vowel
/[a-z]/           // Any lowercase letter
/[A-Z]/           // Any uppercase letter
/[0-9]/           // Any digit (same as /\d/)
/[a-zA-Z0-9]/     // Any alphanumeric (like /\w/, minus the underscore)
/[^0-9]/          // NOT a digit (negation with ^ inside [])
/[^aeiou]/        // Not a vowel

// Shorthand classes (use these!):
/\d/    // Digit: [0-9]
/\D/    // Non-digit: [^0-9]
/\w/    // Word char: [a-zA-Z0-9_]
/\W/    // Non-word char: [^\w]
/\s/    // Whitespace: [ \t\r\n\f\v] plus Unicode spaces such as \u00A0
/\S/    // Non-whitespace: [^\s]

// Examples:
/\d{3}-\d{4}/     // Phone-like: "555-1234"
/[A-Z][a-z]+/      // Capitalized word: "Hello", "World"
/#[0-9a-fA-F]{6}/  // Hex color code: "#ff5500"
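
A quick way to get a feel for these classes is to run them against a few strings. The checks below are a small sketch with made-up sample inputs, not an exhaustive test.

```javascript
// Hex color: "#" followed by exactly six hex digits somewhere in the string.
const hexColor = /#[0-9a-fA-F]{6}/;
const fullHex = hexColor.test('color: #ff5500;');
const shortHex = hexColor.test('color: #ff55;');
console.log(fullHex, shortHex); // true false

// \w includes the underscore, while [a-zA-Z0-9] does not.
const wordClass = /^\w+$/.test('snake_case');
const alnumClass = /^[a-zA-Z0-9]+$/.test('snake_case');
console.log(wordClass, alnumClass); // true false

// A negated class matches any single character NOT in the set,
// so replacing every non-digit with '' keeps only the digits.
const digitsOnly = 'a1b2c3'.replace(/[^0-9]/g, '');
console.log(digitsOnly); // "123"
```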

Quantifiers: How Many Times?

// Quantifiers apply to the PRECEDING element:
/a*/     // Zero or more a's ("" is valid match!)
/a+/     // One or more a's (at least one)
/a?/     // Zero or one a (optional)
/a{3}/   // Exactly 3 a's ("aaa")
/a{2,4}/ // 2 to 4 a's ("aa", "aaa", "aaaa")
/a{2,}/  // 2 or more a's (no upper limit)

// ⚠️ Greedy vs Lazy (CRITICAL concept!)
// GREEDY (default): Match as MUCH as possible
const html = '<div>first</div><div>second</div>';
html.match(/<div>.*<\/div>/);
// Matches: '<div>first</div><div>second</div>' (greedy eats everything!)

// LAZY: Match as LITTLE as possible (add ? after quantifier)
html.match(/<div>.*?<\/div>/);
// Matches: '<div>first</div>' (lazy stops at first opportunity)

// Real-world example: Find each quoted phrase
const str = 'He said "hello" and she said "world"';
str.match(/"(.*?)"/g);  // ["\"hello\"", "\"world\""]
// With the g flag, match() returns whole matches (quotes included) and drops the capture groups
// Without lazy: would match from first " to last "
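
If you want only the text between the quotes, one option is matchAll(), which keeps the capture groups for every hit. This is a sketch using the same sentence; the variable names are mine.

```javascript
const quoteSample = 'He said "hello" and she said "world"';

// matchAll() requires the g flag and yields one match object per hit,
// so capture group 1 (the text between the quotes) stays accessible.
const lazyContents = [...quoteSample.matchAll(/"(.*?)"/g)].map((m) => m[1]);
console.log(lazyContents); // ["hello", "world"]

// The greedy version runs from the first quote all the way to the last one.
const greedyContents = [...quoteSample.matchAll(/"(.*)"/g)].map((m) => m[1]);
console.log(greedyContents); // ['hello" and she said "world']
```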

Anchors & Boundaries

// Anchors don't match characters — they match POSITIONS:
/^Hello/     // "Hello" at START of string only
/world$/     // "world" at END of string only
/^Hello world$/  // Exact full-string match

// Word boundaries (\b) — position between word char and non-word char:
/\bcat\b/    // Matches "cat" but NOT "catalog" or "scattered"
// Useful for whole-word search!

// Common patterns using anchors:
/^[a-z0-9._%+-]+@[a-z0-9.-]+\.[a-z]{2,}$/i  // Email validation
/^https?:\/\/[^\s]+$/i                        // URL detection
/^\d{4}-\d{2}-\d{2}$/                         // Date YYYY-MM-DD
/^[+-]?\d+(\.\d+)?$/                           // Integer or decimal number
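
The difference anchors make is easiest to see side by side. A small sketch with illustrative inputs follows; note that the date pattern only checks the shape, not whether the date exists.

```javascript
const isoDateShape = /^\d{4}-\d{2}-\d{2}$/;

const exact = isoDateShape.test('2024-06-11');               // whole string fits
const embedded = isoDateShape.test('due 2024-06-11');        // extra text before the date
const unanchored = /\d{4}-\d{2}-\d{2}/.test('due 2024-06-11'); // no anchors: partial match is enough
const impossible = isoDateShape.test('2024-99-99');          // right shape, not a real date
console.log(exact, embedded, unanchored, impossible); // true false true true

// \b keeps "cat" from matching inside longer words.
const wholeWords = 'cat catalog scattered cat'.match(/\bcat\b/g);
console.log(wholeWords.length); // 2
```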

Groups & Capturing

// Capturing groups (...) — extract parts of the match:
const date = '2024-06-11';
date.match(/(\d{4})-(\d{2})-(\d{2})/);
// Full match: "2024-06-11"
// Group 1 (year):  "2024"
// Group 2 (month): "06"
// Group 3 (day):   "11"

// Named capture groups (much more readable!):
const result = 'user@domain.com'.match(/(?<name>\w+)@(?<domain>\w+)\.(?<tld>\w+)/);
result.groups.name;     // "user"
result.groups.domain;   // "domain"
result.groups.tld;      // "com"

// Non-capturing group (?:...) — group without capturing:
/(?:https?:\/\/)?(www\.\w+\.\w+)/
// First group doesn't capture, second does

// Backreferences — refer to earlier captured group:
/\b(\w+) \1\b/        // Matches "word word" (repeated word!); \b stops "the then" from matching
/<(\w+)>.*?<\/\1>/    // Matches <b>...</b> but NOT <b></i>

// Lookahead assertions (match based on what FOLLOWS):
/\d+(?= dollars)/      // Match numbers only if followed by " dollars"
// "I have 100 dollars" → matches "100"
// "I have 100 euros"   → no match

// Negative lookahead:
/\d+(?!\d| dollars)/   // Match numbers NOT followed by " dollars"
// "I have 100 euros"   → matches "100"
// "I have 100 dollars" → no match
// (The \d alternative matters: plain /\d+(?! dollars)/ backtracks and matches "10")

// Lookbehind (what PRECEDES):
/(?<=\$)\d+/           // Match digits preceded by $
// "Price: $99" → matches "99"
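/(?<![$\d])\d+/        // Match digits NOT preceded by $ (or by another digit)
// "Price: $99, qty 5" → matches "5" only
// (Plain /(?<!\$)\d+/ would also match the second "9", because it follows "9", not "$")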
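
Here is a short sketch that puts named groups and lookbehind to work. The inputs reuse the strings from this section, and the variable names are only illustrative. The negative lookbehind also excludes a preceding digit so the engine cannot start a match in the middle of a number.

```javascript
// Named groups can be referenced in the replacement string as $<name>.
const isoDate = '2024-06-11';
const dayFirst = isoDate.replace(
  /(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/,
  '$<day>/$<month>/$<year>'
);
console.log(dayFirst); // "11/06/2024"

const priceLine = 'Price: $99, qty 5';

// Positive lookbehind: digits that directly follow a dollar sign.
const prices = priceLine.match(/(?<=\$)\d+/g);
console.log(prices); // ["99"]

// Negative lookbehind: digits preceded by neither "$" nor another digit.
const quantities = priceLine.match(/(?<![$\d])\d+/g);
console.log(quantities); // ["5"]
```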

Practical Examples You'll Actually Use

// === Validation ===
function validateEmail(email) {
  return /^[^\s@]+@[^\s@]+\.[^\s@]+$/.test(email);
  // Simple but practical. Don't try RFC-compliant email regex!
}

function validatePassword(password) {
  return /^(?=.*[a-z])(?=.*[A-Z])(?=.*\d)(?=.*[!@#$%^&*]).{12,}$/.test(password);
}

function validateURL(url) {
  try { new URL(url); return true; }
  catch { return false; } // URL constructor > regex for URLs!
}

// === Extraction & Transformation ===
// Extract all hashtags:
'Check out #JavaScript and #WebDev'.match(/#\w+/g);
// ["#JavaScript", "#WebDev"]

// Format phone number:
'5551234567'.replace(/(\d{3})(\d{3})(\d{4})/, '($1) $2-$3');
// "(555) 123-4567"

// camelCase to snake_case (a leading capital would produce a leading underscore):
'myVariableName'.replace(/[A-Z]/g, '_$&').toLowerCase();
// "my_variable_name"

// Truncate words to max length (8 chars):
'this is a very long sentence that needs shortening'.replace(/\b(\w{8})\w+/g, '$1');
// "this is a very long sentence that needs shorteni"

// Remove duplicate lines:
text.split('\n').filter((line, i, arr) => arr.indexOf(line) === i).join('\n');

// Find unquoted words (complex!):
// Match quoted strings first so they get consumed, then capture bare words:
/'[^']*'|"[^"]*"|(\b\w{3,}\b)/g  // Keep only the matches where group 1 is set

// === Search & Replace in Code ===
// Add console.log before each function line (for debugging):
code.replace(/^(function\s+\w+)/gm, 'console.log("$1"); $1');

// Convert require() to import:
code.replace(/const\s+(\w+)\s*=\s*require\(['"]([^'"]+)['"]\);?/g, 'import $1 from "$2";');
// [^'"]+ stops at the closing quote, and ;? consumes the old semicolon so it isn't doubled

// Remove console.log statements (before deploy):
code.replace(/^\s*console\.(log|debug|info)\(.*\);\s*$/gm, '');
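
The "match what you want to skip, capture what you want to keep" trick from the unquoted-words pattern above can be sketched like this. The sample text and variable names are assumptions for illustration.

```javascript
const snippet = `say 'quoted words' and "more quoted" but keep these`;

// The first two alternatives consume whole quoted strings without capturing;
// only the third alternative fills capture group 1.
const quotedOrWord = /'[^']*'|"[^"]*"|(\b\w{3,}\b)/g;

const unquotedWords = [...snippet.matchAll(quotedOrWord)]
  .filter((m) => m[1] !== undefined) // drop the matches that were quoted strings
  .map((m) => m[1]);

console.log(unquotedWords); // ["say", "and", "but", "keep", "these"]
```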

Testing & Debugging Regex

// Test in browser console or Node.js:
const pattern = /your-regex-here/;
pattern.test('test string');   // true/false
'test string'.match(pattern);  // Array of matches or null
'test string'.replace(pattern, 'replacement'); // replaceAll() throws a TypeError unless the regex has the g flag

// Debugging tips:
// 1. Start simple, build up piece by piece
// 2. Use regex101.com or regexr.com for visual debugging
// 3. Use .source to see the actual pattern string:
console.log(pattern.source);

// 4. Break complex patterns into pieces:
const WORD = /[a-zA-Z]+/;
const SPACE = /\s+/;
const sentencePattern = new RegExp(`^${WORD.source}(${SPACE.source}${WORD.source})*$`);

// 5. Common gotchas:
// - In JS strings, backslashes must be escaped: new RegExp('\\d+') not new RegExp('\d+')
// - . doesn't match newline by default; use the s (dotAll) flag or [\s\S] if needed
// - Always use ^ and $ anchors for full-string validation
// - test() returns true on partial match; use anchors for exact matching
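
The gotchas above are easy to verify in a console. The following is a sketch with illustrative inputs; the last part also shows one way to compose pieces via .source, using a non-capturing group (my choice, not a requirement).

```javascript
// 1. In a string passed to new RegExp, "\d" collapses to a plain "d".
const unescapedSource = new RegExp('\d+').source;
const escapedSource = new RegExp('\\d+').source;
console.log(unescapedSource, escapedSource); // d+ \d+

// 2. The dot skips line breaks unless the s (dotAll) flag is set.
const twoLines = 'first\nsecond';
const plainDot = /first.second/.test(twoLines);
const dotAll = /first.second/s.test(twoLines);
const anyChar = /first[\s\S]second/.test(twoLines);
console.log(plainDot, dotAll, anyChar); // false true true

// 3. Without anchors, test() accepts a partial match.
const partial = /\d{4}/.test('abc12345');
const anchored = /^\d{4}$/.test('abc12345');
console.log(partial, anchored); // true false

// 4. Small named pieces combined through .source.
const wordPart = /[a-zA-Z]+/;
const gapPart = /\s+/;
const sentence = new RegExp(`^${wordPart.source}(?:${gapPart.source}${wordPart.source})*$`);
const wordsOnly = sentence.test('hello big world');
const withDigits = sentence.test('hello 42');
console.log(wordsOnly, withDigits); // true false
```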

What's your favorite regex trick? What regex problem has been haunting you?

Follow @armorbreak for more practical developer guides.
