Regular Expressions: The Guide I Always Wanted (2026)

#javascript #beginners #programming #tutorial

Regex is everywhere — in your editor, in your code, in your data pipeline. Stop fearing it and start using it like a pro.

The Mental Model

Think of regex as a pattern-matching language:

"Find text that matches this pattern"
→ Pattern = rules describing what you want
→ Match = specific text that fits the rules

Core concept: Position matters
  The cat sat on the mat
  ^                     $
  ^ = start of string (anchor), $ = end of string (anchor)
  Anchors match a position between characters, not a character.

Every regex has two modes:
1. Literal matching: "cat" finds "cat" exactly
2. Pattern matching: "c.t" finds "cat", "cot", "cut" (. = any char)
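
A minimal sketch of the two modes and of anchors, using `RegExp.prototype.test`. The sample strings and variable names are made up for illustration.

```javascript
// Literal matching: the pattern is the exact text to find.
const literal = /cat/;
console.log(literal.test("The cat sat on the mat")); // true
console.log(literal.test("The cot sat on the mat")); // false

// Pattern matching: "." stands for any single character (except a newline).
const pattern = /c.t/;
const words = ["cat", "cot", "cut", "ct"];
const matched = words.filter((w) => pattern.test(w));
console.log(matched); // ["cat", "cot", "cut"]

// Anchors match positions, not characters.
console.log(/^The/.test("The cat sat on the mat")); // true
console.log(/mat$/.test("The cat sat on the mat")); // true
console.log(/^cat/.test("The cat sat on the mat")); // false ("cat" is not at the start)
```

"ct" fails because `.` must consume exactly one character; it is not optional.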

The Essential Syntax

// === Character Classes ===
.       // Any single character except newline
\d      // Digit [0-9]
\D      // Non-digit [^0-9]
\w      // Word character [a-zA-Z0-9_]
\W      // Non-word character
\s      // Whitespace [ \t\r\n\f\v]
\S      // Non-whitespace

// Custom classes:
[abc]   // a, b, or c
[a-c]   // Same as above, written as a range
[^abc]  // NOT a, b, or c (negation)
[a-z]   // Any lowercase letter
[A-Z0-9] // Uppercase letter or digit

// Predefined (shorthand):
[0-9]  == \d
[a-zA-Z0-9_] == \w
[ \t\r\n\f\v] == \s  // for ASCII; \s also matches Unicode spaces such as \u00A0

// === Anchors (position, not characters!) ===
^       // Start of string (or of each line with the m flag)
$       // End of string (or of each line with the m flag)
\b      // Word boundary (between \w and \W)
\B      // Non-word boundary

// Examples:
/^Hello/     // Matches "Hello world" but not "Say Hello"
/end$/      // Matches "the end" but not "ending"
/\bcat\b/   // Matches "cat" but not "catalog" or "scatter"

// === Quantifiers (how many times) ===
*       // Zero or more times (greedy: as many as possible)
+       // One or more times
?       // Zero or one time (optional)
{3}     // Exactly 3 times
{2,4}   // Between 2 and 4 times
{2,}    // 2 or more times
{0,3}   // Up to 3 times (JavaScript has no {,3} shorthand; it would match the literal text "{,3}")

// ⚠️ Greedy vs Lazy:
// * + {n,m} are GREEDY (match as much as possible)
// Add ? to make them LAZY (match as little as possible)

"a<b>bold</b> and <b>italic</b>".match(/<b>.+<\/b>/)
// → "<b>bold</b> and <b>italic</b>" (greedy: goes to LAST </b>)

"a<b>bold</b> and <b>italic</b>".match(/<b>.+?<\/b>/)
// → "<b>bold</b>" (lazy: stops at FIRST </b>)

// === Groups and Alternation ===
(abc)   // Capturing group (remembers what matched)
(?:abc) // Non-capturing group (doesn't remember)
a|b|c   // a OR b OR c (alternation)

// Capturing groups let you extract parts of the match:
const match = "user@domain.com".match(/^(\w+)@(\w+\.\w+)$/);
if (match) {
  console.log(match[1]); // "user"
  console.log(match[2]); // "domain.com"
}

// Named capture groups (more readable!):
const emailMatch = "alice@example.com".match(
  /^(?<name>\w+)@(?<domain>\w+\.(?<tld>\w+))$/
);
if (emailMatch) {
  console.log(emailMatch.groups.name);    // "alice"
  console.log(emailMatch.groups.domain);  // "example.com"
  console.log(emailMatch.groups.tld);     // "com"
}

// Backreferences (refer to earlier captured group):
/(\w+)\s+\1/  // Matches "hello hello" but not "hello world"
// \1 refers to whatever the first group captured
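
The pieces above combine naturally. The following is a small sketch, with made-up sample strings, of a lazy quantifier used with `matchAll()` and of a named backreference (`\k<name>`), which is the named-group counterpart of `\1`.

```javascript
// Lazy quantifier + g flag: collect every <b>...</b> span separately.
const html = "a<b>bold</b> and <b>italic</b>";
const boldParts = [...html.matchAll(/<b>(.+?)<\/b>/g)].map((m) => m[1]);
console.log(boldParts); // ["bold", "italic"]

// Named backreference: \k<quote> must repeat whatever (?<quote>...) captured,
// so the closing quote has to be the same character as the opening one.
const quoted = /(?<quote>["'])(?<body>.*?)\k<quote>/;
const hit = `say "it's fine" now`.match(quoted);
console.log(hit.groups.body);             // it's fine
console.log(quoted.test(`"mismatched'`)); // false
```

The apostrophe inside the double-quoted text does not end the match, because the backreference only accepts the quote character that opened it.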

Practical Examples You'll Use Every Day

// === Email validation (practical, not RFC-compliant) ===
const emailRegex = /^[^\s@]+@[^\s@]+\.[^\s@]+$/;
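// Explanation: one or more chars that are neither whitespace nor @, then @,
// then the same again, a literal dot, and the same again.
// Deliberately loose: it catches obvious typos (missing @, missing dot, stray spaces),
// not every invalid address.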

// === URL extraction ===
const urlRegex = /https?:\/\/(www\.)?[-a-zA-Z0-9@:%._\+~#=]{1,256}\.[a-zA-Z()]{2,6}\b([-a-zA-Z0-9()@:%_\+.~#?&\/=]*)/;
const text = "Visit https://example.com/path?q=1 for more info";
text.match(urlRegex)[0]; // "https://example.com/path?q=1"

// === Phone number normalization ===
const phoneRegex = /^\+?(\d{1,3})?[-.\s]?\(?(\d{3})\)?[-.\s]?(\d{3})[-.\s]?(\d{4})$/;
"+1 (555) 123-4567".replace(phoneRegex, "+$1$2$3$4"); // "+15551234567"

// === Password strength check ===
function checkPasswordStrength(password) {
  const checks = {
    length: password.length >= 8,
    lowercase: /[a-z]/.test(password),
    uppercase: /[A-Z]/.test(password),
    digit: /\d/.test(password),
    special: /[!@#$%^&*(),.?":{}|<>]/.test(password),
  };

  // Count how many of the individual checks passed
  const score = Object.values(checks).filter(Boolean).length;
  if (score <= 2) return 'weak';
  if (score <= 4) return 'medium';
  return 'strong';
}

// === HTML tag stripping ===
const htmlRegex = /<[^>]*>/g;
"<p>Hello <b>world</b></p>".replace(htmlRegex, ''); // "Hello world"

// === Finding duplicate words ===
const duplicateRegex = /\b(\w+)\s+\1\b/gi;
"This is a test test of the regex".replace(duplicateRegex, '$1');
// "This is a test of the regex"

// === CSV parsing (simple cases) ===
const csvLine = '"Smith, John",25,"New York, NY",developer';
const csvRegex = /,(?=(?:(?:[^"]*"){2})*[^"]*$)/;
csvLine.split(csvRegex);
// ['"Smith, John"', '25', '"New York, NY"', 'developer']

// === Date format conversion ===
const dateStr = "05/31/2026";
const dateRegex = /^(\d{2})\/(\d{2})\/(\d{4})$/;
const [, month, day, year] = dateStr.match(dateRegex);
console.log(`${year}-${month}-${day}`); // "2026-05-31"

// === Log level extraction ===
const logRegex = /^\[(DEBUG|INFO|WARN|ERROR)\]\[(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})\] (.+)/;
const logLine = "[INFO][2026-06-04T10:30:00] User logged in successfully";
const [, level, timestamp, message] = logLine.match(logRegex);

// === Number extraction from text ===
const numRegex = /-?\d+(?:\.\d+)?/g;
"The price is $42.99 and discount is 15%".match(numRegex);
// ["42.99", "15"]
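
Several snippets above destructure the result of `match()` directly, which throws if the input does not match. One way to guard against that is sketched below with named groups. The helper names `parseLogLine` and `toIsoDate` are hypothetical, and the log format is assumed to be the one shown in the log-level example.

```javascript
// Hypothetical helper: parse "[LEVEL][timestamp] message" lines safely.
const LOG_LINE = /^\[(?<level>DEBUG|INFO|WARN|ERROR)\]\[(?<timestamp>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})\] (?<message>.+)$/;

function parseLogLine(line) {
  const m = LOG_LINE.exec(line);
  // exec() returns null when nothing matches, so guard before reading groups.
  if (m === null) return null;
  const { level, timestamp, message } = m.groups;
  return { level, timestamp, message };
}

// Hypothetical helper: convert MM/DD/YYYY to YYYY-MM-DD, or return null.
function toIsoDate(usDate) {
  const m = /^(?<month>\d{2})\/(?<day>\d{2})\/(?<year>\d{4})$/.exec(usDate);
  return m ? `${m.groups.year}-${m.groups.month}-${m.groups.day}` : null;
}

console.log(parseLogLine("[INFO][2026-06-04T10:30:00] User logged in successfully"));
console.log(parseLogLine("not a log line")); // null
console.log(toIsoDate("05/31/2026"));        // 2026-05-31
console.log(toIsoDate("2026-05-31"));        // null
```

Returning `null` instead of throwing keeps the decision about bad input with the caller. Note that `toIsoDate` only checks the shape of the string, not whether the date exists.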

Regex Methods in JavaScript

const str = "Hello World! Hello Universe!";
const pattern = /hello/gi; // g = global, i = case-insensitive

// test() — Does it match? (true/false)
pattern.test(str); // true

// exec() — Find match with details (returns null or match object)
let match;
pattern.lastIndex = 0; // test() above advanced lastIndex (g flag), so rewind first
while ((match = pattern.exec(str)) !== null) {
  console.log(`Found "${match[0]}" at index ${match.index}`);
}
// Found "Hello" at index 0
// Found "Hello" at index 13

// match() — Find all matches (with g flag returns array of strings)
str.match(pattern); // ["Hello", "Hello"]
// Without g flag: returns first match with groups info

// matchAll() — All matches with groups (ES2020!)
for (const m of str.matchAll(/hello (\w+)/gi)) {
  console.log(m[0], m[1]); // Full match, then captured group
}
// "Hello World" "World"
// "Hello Universe" "Universe"
// (\w+ stops before "!", so the punctuation is not part of the match)

// replace() — Replace matches
str.replace(/hello/gi, 'hi'); // "hi World! hi Universe!"
str.replace(/hello/gi, (match, offset) => `[${match.toUpperCase()}]`);
// "[HELLO] World! [HELLO] Universe!"

// replaceAll() — With replacement function (ES2021)
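'2026-06-04'.replaceAll('-', '/'); // "2026/06/04"
'a-b-c'.replaceAll(/-/g, (m, offset) => `[${offset}]`); // "a[1]b[3]c"
// A regex passed to replaceAll() must have the g flag, otherwise it throws a TypeError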

// split() — Split by regex
"one,two;three,four".split(/[,;]/); // ["one", "two", "three", "four"]

// search() — Find index of first match
str.search(/world/i); // 6 (index where "world" starts)
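
One detail worth seeing in isolation: a regex with the `g` flag is stateful. `test()` and `exec()` read and update its `lastIndex` property, so calling them repeatedly on the same object gives different answers. A short sketch (the variable names are illustrative):

```javascript
const input = "Hello World! Hello Universe!";
const globalPattern = /hello/gi;

// Each successful test() leaves lastIndex just past the match it found.
const first = globalPattern.test(input);   // finds "Hello" at index 0
const afterFirst = globalPattern.lastIndex;
const second = globalPattern.test(input);  // finds "Hello" at index 13
const third = globalPattern.test(input);   // nothing left, so lastIndex resets to 0
console.log(first, afterFirst, second, third); // true 5 true false

// matchAll() works on a copy of the regex, so it does not disturb lastIndex.
globalPattern.lastIndex = 0;
const positions = [...input.matchAll(globalPattern)].map((m) => m.index);
console.log(positions);               // [0, 13]
console.log(globalPattern.lastIndex); // 0
```

If you only need a yes/no answer, a regex without the `g` flag avoids the issue, since `test()` then always starts from the beginning.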

Common Gotchas & How to Avoid Them

// ❌ Forgot global flag (only replaces first occurrence!)
"aaa".replace("a", "b"); // "baa" (only first!)
"aaa".replace(/a/g, "b"); // "bbb" (all!)

// ❌ Dot doesn't match newlines by default!
/multi.*string/.test("multi\nline\nstring"); // false!
// Fix: Use [\s\S] instead of . or enable dotAll mode with the s flag:
/multi[\s\S]*string/.test("multi\nline\nstring"); // true!
/multi.*string/s.test("multi\nline\nstring");     // true!

// ❌ Quantifiers are greedy (causes unexpected matches)
"<div>one</div><p>keep</p><div>two</div>".replace(/<div>.*<\/div>/g, '');
// "" (removes EVERYTHING from the first <div> to the last </div>!)
// Fix: Make it lazy with ?
"<div>one</div><p>keep</p><div>two</div>".replace(/<div>.*?<\/div>/g, '');
// "<p>keep</p>"

// ❌ Not escaping special characters
"price: $100 (USD)".replace(/\$(\d+)/, "$1 USD"); // "price: 100 USD (USD)"
// Unescaped, $ is the end-of-string anchor, so /$(\d+)/ can never match.

// Characters that must be escaped to match them literally: \ ^ $ . | * + ? ( ) [ ] { }
// (and / inside a regex literal)
const escaped = str.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');

// ❌ Using regex for complex parsing (HTML, JSON, etc.)
// Don't parse HTML with regex! Use a proper parser.
// Don't parse JSON with regex! Use JSON.parse.
// Regex is for PATTERN MATCHING, not parsing structured formats.

// ✅ Performance tip: Be specific!
/^[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,}$/i  // Good (specific patterns)
/.+@.+\..+/                               // Bad (matches too much, slow on long strings)

// ✅ Compile regex once if reusing (in loops):
const EMAIL_REGEX = /^[^\s@]+@[^\s@]+\.[^\s@]+$/;
function validateEmail(email) {
  return EMAIL_REGEX.test(email); // Compiled once, reused
}
// vs (bad): new RegExp('...') inside loop (recompiles every iteration)
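
The escaping one-liner and the "compile once" tip fit together when a pattern comes from user input. The sketch below assumes you want to count literal occurrences of a search term; `escapeRegExp` and `makeLiteralCounter` are hypothetical helper names, not built-ins.

```javascript
// Escape every character that has special meaning in a regex.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: build the regex once, then reuse it for every line.
function makeLiteralCounter(needle) {
  const re = new RegExp(escapeRegExp(needle), "g");
  // matchAll() works on a copy of re, so repeated calls do not interfere.
  return (line) => [...line.matchAll(re)].length;
}

const countPrice = makeLiteralCounter("$1.00");
console.log(escapeRegExp("$1.00"));         // \$1\.00
console.log(countPrice("$1.00 or $1.00?")); // 2
console.log(countPrice("$1x00"));           // 0
```

Without the escaping step, `$` would act as an anchor and `.` as a wildcard, so the term would not be matched as literal text.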