Regular Expressions: The Guide I Always Wanted (2026)

#javascript #beginners #programming #tutorial

Regex isn't magic. It's a mini-language for pattern matching, and once you learn the basics, you'll use it everywhere.

The Mental Model

A regex is a pattern that matches (or doesn't match) text.

Think of it as: "Find all strings that look like THIS"

Components:
→ Literals: exact characters to match (a, b, 1, @)
→ Character classes: WHAT can match ([a-z], \d, \w)
→ Quantifiers: HOW MANY times (+, *, ?, {3})
→ Anchors: WHERE in the string (^, $, \b)
→ Groups: capture parts of the match ((...))
→ Alternation: OR logic (|)

The secret to reading regex: left to right, character by character.
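
To make the component list concrete, here is a small sketch that uses all six kinds in one pattern. The `orderLine` pattern and the "AB-1234 shipped" format are made up for illustration; they are not from any real system.

```javascript
// Hypothetical pattern for a line such as "AB-1234 shipped".
const orderLine = /^([A-Z]{2})-(\d{4}) (shipped|pending)$/;
//   ^ and $            anchors: the pattern must span the whole string
//   [A-Z] and \d       character classes: one uppercase letter, one digit
//   {2} and {4}        quantifiers: exactly 2 letters, exactly 4 digits
//   - and the space    literals: matched exactly as written
//   ( ... )            groups: capture the prefix, the number, the status
//   shipped|pending    alternation: either word is accepted

const parts = "AB-1234 shipped".match(orderLine);
console.log(parts[1], parts[2], parts[3]);        // AB 1234 shipped
console.log(orderLine.test("ab-1234 shipped"));   // false (lowercase prefix)
```

Reading it left to right gives: start of string, two capitals, a hyphen, four digits, a space, one of two words, end of string.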

Character Classes

// Exact match
/hello/                    // Matches "hello" in "say hello there"
/Hello/                    // Case sensitive! Won't match "hello"

// Dot (matches any single character EXCEPT newline)
/h.t/                     // "hat", "hot", "hit", "h3t"... but NOT "ht"

// Character classes [ ] — match ONE from the set
/[aeiou]/                 // Any vowel
/[a-z]/                   // Any lowercase letter
/[A-Z]/                   // Any uppercase letter
/[a-zA-Z0-9]/            // Alphanumeric
/[^0-9]/                 // NOT a digit (^ inside [] = negation)

// Shorthand classes
/\d/                      // Digit = [0-9]
/\D/                      // Non-digit = [^0-9]
/\w/                      // Word char = [a-zA-Z0-9_]
/\W/                      // Non-word char
/\s/                      // Whitespace (space, tab, newline)
/\S/                      // Non-whitespace

// Examples
/\d{5}/                   // 5 digits in a row (ZIP code); add ^...$ or \b to reject longer runs
/[A-Z]\w+/                // Capital letter + word chars (PascalCase identifier)
/[^ \t]+/                // One or more non-space/tab chars (a word)
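
A quick way to get a feel for the shorthand classes is to run single characters through them. The sketch below uses a hypothetical helper, `classify`, and adds one thing the list above does not cover: `\w` is ASCII-only in JavaScript, so accented letters need a Unicode property escape such as `\p{L}` with the `u` flag.

```javascript
// Hypothetical helper: report which classes a single character belongs to.
function classify(ch) {
  return {
    digit: /\d/.test(ch),
    word: /\w/.test(ch),
    space: /\s/.test(ch),
    letter: /\p{L}/u.test(ch), // Unicode "letter" property, requires the u flag
  };
}

for (const ch of ["7", "x", "_", "\t", "é"]) {
  console.log(JSON.stringify(ch), classify(ch));
}
// "_" is a word character but not a letter.
// "é" is a letter but NOT a word character, because \w means [a-zA-Z0-9_].
```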

Quantifiers

// ? — Zero or one (optional)
/colou?r/                // Matches "color" AND "colour"
/https?/                  // Matches "http" AND "https"

// * — Zero or more (greedy: matches as many as possible)
/a*c/                     // "c", "ac", "aac", "aaaac"...

// + — One or more
/\d+/                     // One or more digits: "123", "007"
/\S+@\S+\.\S+/           // Basic email pattern (simplified!)

// {n} — Exactly n times
/\d{4}/                   // Exactly 4 digits (year)
/[A-Z]{2}/               // Exactly 2 uppercase letters (country code)

// {n,m} — Between n and m times
/\d{1,3}/                // 1-3 digits (IP address octet)
/\w{3,16}/               // Username: 3-16 word characters

// {n,} — n or more times
/\d{2,}/                  // 2+ digits

// ⚠️ Greedy vs Lazy (CRITICAL!)
/<.+>/                    // GREEDY: "<div>hi</div>" → matches ENTIRE string
/<.+?>/                   // LAZY: "<div>hi</div>" → matches "<div>" only
// Add ? after any quantifier to make it lazy (match minimum)
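
The greedy versus lazy difference is easiest to see by running both against the same input. This is a minimal sketch using the `<div>hi</div>` string from above; the negated character class at the end is an alternative many people reach for, offered here as an option rather than a rule.

```javascript
const html = "<div>hi</div>";

const greedy = html.match(/<.+>/)[0];     // "<div>hi</div>" (runs to the LAST ">")
const lazy = html.match(/<.+?>/)[0];      // "<div>"         (stops at the FIRST ">")
const allTags = html.match(/<.+?>/g);     // ["<div>", "</div>"]

// Alternative: say what may NOT appear inside the tag instead of using a lazy quantifier.
const negated = html.match(/<[^>]+>/g);   // ["<div>", "</div>"]

console.log(greedy, lazy, allTags, negated);
```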

Anchors & Boundaries

/^Hello/                  // "Hello" at START of string only
/world$/                  // "world" at END of string only
/^Hello world$/           // Exact full-string match

// Word boundary (\b) — position between word and non-word
/\bcat\b/                // Matches "cat" but NOT "catalog" or "scatter"
/\b\w+\b/                // Match whole words only

// Multiline mode (JavaScript 'm' flag)
/^pattern/m               // With 'm' flag, ^/$ match start/end of each LINE
/^$/m                     // Empty lines (useful for removing blank lines)
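
Here is a small sketch of what the `m` flag changes, using a made-up `notes` string. One assumption worth flagging: `/^$/m` only finds the empty lines; to actually delete them, the sketch uses `/^\s*\n/gm`, which also consumes the line break.

```javascript
const notes = "first\n\nsecond\n\n\nthird";

const onlyFirst = notes.match(/^\w+/g);         // ["first"] (^ = start of string only)
const lineStarts = notes.match(/^\w+/gm);       // ["first", "second", "third"]
const blankCount = notes.match(/^$/gm).length;  // 3 empty lines found

// Remove blank lines: match from a line start through its line break.
const compact = notes.replace(/^\s*\n/gm, "");  // "first\nsecond\nthird"

console.log(onlyFirst, lineStarts, blankCount, JSON.stringify(compact));
```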

// Lookahead (match IF followed by...)
/foo(?=bar)/              // "foo" only if followed by "bar" ("foobar" ✓, "food" ✗)
/foo(?!bar)/              // "foo" only if NOT followed by "bar" ("food" ✓, "foobar" ✗)

// Lookbehind (match IF preceded by...)
/(?<=\$)\d+/             // Digits only if preceded by "$" ($100 ✓, 100 ✗)
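/(?<!\$)\d+/             // Digits NOT directly preceded by "$" (100 ✓; in "$100" it skips the "1" but still matches "00")
/(?<![$\d])\d+/          // Whole numbers with no "$" in front (100 ✓, $100 ✗)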
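
Negative lookbehind has a subtlety worth running once: the engine is free to start the match one character later, so `(?<!\$)\d+` can still pick up the tail of a priced number. The sketch below uses a made-up `receipt` string to show the effect and one way to avoid it.

```javascript
const receipt = "Paid $100 for 3 items, 20 left";

const priced = receipt.match(/(?<=\$)\d+/g);       // ["100"]
const naive = receipt.match(/(?<!\$)\d+/g);        // ["00", "3", "20"]  <- note the stray "00"
const unpriced = receipt.match(/(?<![$\d])\d+/g);  // ["3", "20"]

console.log(priced, naive, unpriced);
```

Adding `\d` to the lookbehind forces the match to begin at the first digit of a number, so a number that starts right after "$" is rejected as a whole.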

Groups & Capturing

// Capturing groups (...) — extract matched parts
/(\d{4})-(\d{2})-(\d{2})/    // Date pattern with 3 groups
const str = "Date: 2026-05-30";
const match = str.match(/(\d{4})-(\d{2})-(\d{2})/);
if (match) {
  match[0];   // "2026-05-30" (full match)
  match[1];   // "2026" (group 1: year)
  match[2];   // "05" (group 2: month)
  match[3];   // "30" (group 3: day)
}

// Named groups (much more readable!)
/(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/
const named = str.match(/(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/); // new name: `match` is already declared above
if (named) {
  const { year, month, day } = named.groups; // year "2026", month "05", day "30"
}

// Non-capturing group (?:...) — group without capturing
/(?:https?:\/\/)?(?:www\.)?example\.com/
// Groups exist for grouping only, not extracted

// Backreferences — refer to earlier group in same pattern
/(\w+) \1/               // Repeated word: "the the" ✓, "the cat" ✗
/"([^"]*)"/              // Quoted string, extract content without quotes
/([A-Z])\w* \1\w*/      // Words starting with same letter: "Big Bad" ✓

// Replacement with backreferences
"John Smith".replace(/(\w+) (\w+)/, "$2, $1"); // → "Smith, John"
// Clean up phone number: "(555) 123-4567".replace(/\D/g, "") → "5551234567"
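
Named groups and backreferences also work in replacement strings and in global searches. A minimal sketch, with a hypothetical `toUsDate` helper: `$<name>` in the replacement refers to a named group, and `\1` inside the pattern finds accidentally doubled words.

```javascript
const isoDate = /(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/;

// Hypothetical helper: reorder the first YYYY-MM-DD date to MM/DD/YYYY.
function toUsDate(input) {
  return input.replace(isoDate, "$<month>/$<day>/$<year>");
}
console.log(toUsDate("Date: 2026-05-30")); // Date: 05/30/2026

// Backreference with /g: find every doubled word.
const doubled = "the the cat sat sat".match(/\b(\w+) \1\b/g);
console.log(doubled); // ["the the", "sat sat"]
```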

Practical Patterns You'll Actually Use

// Email (practical, not RFC-compliant — that's impossible)
const emailRegex = /^[^\s@]+@[^\s@]+\.[^\s@]+$/;
emailRegex.test("user@example.com");     // true
emailRegex.test("invalid@email");         // false

// URL (basic)
const urlRegex = /^https?:\/\/[^\s<>"]+$/i;
urlRegex.test("https://example.com/path?q=1"); // true

// Username (3-20 alphanumeric + underscore)
const usernameRegex = /^[a-zA-Z0-9_]{3,20}$/;

// Password (at least 8 chars, mixed case, number, special)
const pwdRegex = /^(?=.*[a-z])(?=.*[A-Z])(?=.*\d)(?=.*[@$!%*?&])[A-Za-z\d@$!%*?&]{8,}$/;

// IPv4 address
const ipRegex = /^((25[0-5]|2[0-4]\d|[01]?\d\d?)\.){3}(25[0-5]|2[0-4]\d|[01]?\d\d?)$/;

// Hex color code
const hexRegex = /^#?([a-fA-F0-9]{6}|[a-fA-F0-9]{3})$/;

// Date formats
const dateRegex = /^\d{4}-(0[1-9]|1[0-2])-(0[1-9]|[12]\d|3[01])$/; // YYYY-MM-DD

// Slug (URL-friendly string)
const slugRegex = /^[a-z0-9]+(?:-[a-z0-9]+)*$/;
slugRegex.test("my-blog-post-2026"); // true
slugRegex.test("My_Blog_Post");       // false

// Credit card (Luhn algorithm needed too, this is format only)
const ccRegex = /\b(?:4[0-9]{12}(?:[0-9]{3})?|5[1-5][0-9]{14}|3[47][0-9]{13}|6(?:011|5[0-9]{2})[0-9]{12})\b/;

// Time (24h format)
const timeRegex = /^([01]\d|2[0-3]):([0-5]\d)(:[0-5]\d)?$/;

// Extract hashtags from text
const hashtagRegex = /#\w+/g;
"Check out #coding and #webdev #2026".match(hashtagRegex);
// → ["#coding", "#webdev", "#2026"]

// Remove HTML tags (basic)
const stripHtml = /<[^>]*>/g;
"<p>Hello <b>world</b></p>".replace(stripHtml, ""); // → "Hello world"

// Trim whitespace (alternative to .trim())
const trimRegex = /^\s+|\s+$/g;
"  hello  ".replace(trimRegex, ""); // → "hello"

// CamelCase to snake_case
"camelCaseString".replace(/[A-Z]/g, '_$&').toLowerCase();
// → "camel_case_string"

// Snake_case to camelCase
"snake_case_string".replace(/_([a-z])/g, (_, c) => c.toUpperCase());
// → "snakeCaseString"

// Format number with commas
"1000000".replace(/\B(?=(\d{3})+(?!\d))/g, ","); // → "1,000,000"
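
The credit card pattern above only checks the format, and the comment notes that a Luhn check is needed too. Here is one way that check might look. `passesLuhn` and `looksLikeCard` are hypothetical names, `ccRegex` is copied from above, and "4111 1111 1111 1111" is a commonly used test number, not a real card.

```javascript
const ccRegex = /\b(?:4[0-9]{12}(?:[0-9]{3})?|5[1-5][0-9]{14}|3[47][0-9]{13}|6(?:011|5[0-9]{2})[0-9]{12})\b/;

// Luhn checksum: from the right, double every second digit, subtract 9 from
// results above 9, and require the total to be divisible by 10.
function passesLuhn(cardNumber) {
  const digits = cardNumber.replace(/\D/g, ""); // keep digits only
  if (digits.length === 0) return false;
  let sum = 0;
  for (let i = 0; i < digits.length; i++) {
    let d = Number(digits[digits.length - 1 - i]);
    if (i % 2 === 1) {
      d *= 2;
      if (d > 9) d -= 9;
    }
    sum += d;
  }
  return sum % 10 === 0;
}

// Format check first, checksum second.
function looksLikeCard(input) {
  const digits = input.replace(/\D/g, "");
  return ccRegex.test(digits) && passesLuhn(digits);
}

console.log(looksLikeCard("4111 1111 1111 1111")); // true
console.log(looksLikeCard("4111 1111 1111 1112")); // false (checksum fails)
```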

Regex Methods in JavaScript

const text = "Hello World! Hello Universe!";
const pattern = /hello/gi; // g = global, i = case-insensitive, m = multiline

// test() — does it match? (boolean)
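pattern.test(text);         // true
pattern.lastIndex = 0;      // Reset: with /g, test() left lastIndex at 5, and exec() below would resume there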

// exec() — find match with details (one at a time, even with /g)
let match;
while ((match = pattern.exec(text)) !== null) {
  console.log(match[0]);     // Matched text
  console.log(match.index);  // Position where it started
}

// String.match() — all matches (with /g returns array of strings)
text.match(pattern);         // ["Hello", "Hello"]

// String.matchAll() — ALL matches with groups (requires /g!)
for (const m of text.matchAll(/(\w+)!/g)) {
  console.log(m[0]);   // Full match: "World!", then "Universe!"
  console.log(m[1]);   // Group 1: "World", then "Universe"
}

// String.replace() — replace matches
text.replace(/hello/gi, "hi");     // "hi World! hi Universe!"

// Replace with function (powerful!)
"price: 100, tax: 50".replace(/\d+/g, (num) => `$${(Number(num) * 1.1).toFixed(2)}`);
// → "price: $110.00, tax: $55.00"

// String.replaceAll() — simpler than /g for fixed strings
"a b c d e".replaceAll(" ", "-"); // "a-b-c-d-e"

// String.split() — split by regex
"a, b,  c,   d".split(/\s*,\s*/); // ["a", "b", "c", "d"]
// Trims spaces around commas!

// String.search() — find position of first match
text.search(/world/i);        // 6 (index of "World")

// Flags:
// g — global (find all matches)
// i — case insensitive
// m — multiline (^ and $ match per line)
// s — dotAll (dot matches newlines too)
// u — Unicode support (emoji, etc.)
// y — sticky (matches only from lastIndex)
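
One behavior of the `g` flag that trips people up: a global regex object is stateful. `test()` and `exec()` resume from `lastIndex`, so calling `test()` repeatedly on the same object can flip between true and false. A short sketch, with illustrative variable names:

```javascript
const greeting = "Hello World! Hello Universe!";
const globalRe = /hello/gi;

const first = globalRe.test(greeting);    // true  (matched at 0)
const afterFirst = globalRe.lastIndex;    // 5
const second = globalRe.test(greeting);   // true  (matched at 13)
const afterSecond = globalRe.lastIndex;   // 18
const third = globalRe.test(greeting);    // false (nothing after 18; lastIndex resets to 0)

// matchAll() works on a copy of the regex, so it avoids the shared state.
const found = [...greeting.matchAll(/hello/gi)].map((m) => ({ text: m[0], index: m.index }));

console.log(first, second, third, found);
```

If all you need is a yes/no answer, leaving off the `g` flag sidesteps the issue.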

Debugging Regex

// When your regex doesn't work:

// 1. Break it into pieces
// Test each part separately
/\d{4}-\d{2}-\d{2}/
// Does \d{4} work? Yes → try \d{4}-
// Does that work? Yes → keep building up

// 2. Use an online tool (regex101.com, regexr.com)
// Visual breakdown, explanation, test cases

// 3. Common gotchas:
// Forgot to escape: . * + ? ^ $ | \ ( ) [ ] { } /
// → Escape literal text with RegExp.escape() where your runtime supports it (or a small helper), and double the backslashes inside string patterns

// In JavaScript strings, backslashes need escaping:
new RegExp("\\d{4}-\\d{2}-\\d{2}")  // NOT "\d{4}": the string drops the backslash, leaving "d{4}" (four literal d's)

// Better: use regex literal when possible: /\d{4}-\d{2}-\d{2}/

// 4. Greedy quantifier eating too much
/<div>.*<\/div>/   // Might match across multiple divs!
/<div>.*?<\/div>/  // Lazy: stops at first </div>

// 5. Forgetting /g flag
"hello hello".replace(/l/, "L");  // Only first: "heLlo hello"
"hello hello".replace(/l/g, "L"); // All: "heLLo heLLo"

// 6. ^ and $ behavior
// Without /m: matches start/end of entire string
// With /m: matches start/end of each LINE
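
For gotcha 3, this is roughly what an escaping helper can look like when you build a pattern from user input. `escapeRegExp` is a hypothetical name for a widely used one-liner; it is a sketch, not a replacement for a built-in if your runtime provides one.

```javascript
// Prefix every regex metacharacter with a backslash so it matches literally.
function escapeRegExp(literal) {
  return literal.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

const userInput = "1+1=2";

// Unescaped, "+" means "one or more 1s", so the literal text does not match itself.
const unescapedHit = new RegExp(userInput).test("1+1=2");             // false
const escapedHit = new RegExp(escapeRegExp(userInput)).test("1+1=2"); // true

console.log(escapeRegExp("(really?)")); // \(really\?\)
console.log(unescapedHit, escapedHit);
```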

What's the most useful regex trick you know? What regex nightmare have you survived?

Follow @armorbreak for more practical developer guides.
