I used to treat regex as a black box. Copy a pattern from Stack Overflow, paste it in, pray it works. That approach holds up until the day it doesn't, and you're staring at a production bug caused by a pattern you never understood. So I forced myself to learn it properly, and the honest truth is that five patterns cover the vast majority of what I need day to day.
Here's each one broken down character by character.
1. Email validation (the practical version)
^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$
Let's walk through it.
^ anchors to the start of the string. Without this, the pattern could match an email embedded inside other text. [a-zA-Z0-9._%+-]+ matches one or more characters that are letters, digits, dots, underscores, percent signs, plus signs, or hyphens. This is the local part before the @.
@ matches the literal @ symbol. [a-zA-Z0-9.-]+ matches the domain name. \. matches a literal dot (the backslash escapes it because a bare dot matches any character). [a-zA-Z]{2,} matches the top-level domain with at least two letters. $ anchors to the end of the string.
I need to be upfront: this pattern does not comply with RFC 5322. The full email spec allows quoted strings, IP address domains, and characters that would surprise you. The pattern is also permissive in places: it accepts consecutive dots, as in a..b@example.com. But it accepts the vast majority of real email addresses in the wild, and the edge cases it misses (like "quoted string"@example.com) are rare enough that patterns like this one are common in production codebases.
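Here's a minimal sketch of the pattern in use, assuming JavaScript. The helper name looksLikeEmail is mine, made up for illustration; treat it as a starting point, not a complete validator.

```javascript
// The pattern from this section, written as a JavaScript regex literal.
const EMAIL_PATTERN = /^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$/;

// Hypothetical helper: true when the whole string has the shape of an email address.
function looksLikeEmail(input) {
  return EMAIL_PATTERN.test(input);
}

console.log(looksLikeEmail("dev+tag@example.co.uk"));       // true
console.log(looksLikeEmail("contact: dev@example.com"));    // false: the anchors reject surrounding text
console.log(looksLikeEmail('"quoted string"@example.com')); // false: legal per the RFC, rejected here
console.log(looksLikeEmail("a..b@example.com"));            // true: the pattern is permissive about dots
```

The last two lines show both directions of the trade-off: a few legal addresses are rejected, and a few malformed ones get through.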
2. URL matching
https?:\/\/[^\s/$.?#].[^\s]*
http matches the literal text. s? makes the s optional, matching both http and https. :\/\/ matches the :// separator — the forward slashes need escaping in some regex flavors (JavaScript regex literals require it, but new RegExp() doesn't).
[^\s/$.?#] matches any character that is not whitespace, a slash, dollar sign, dot, question mark, or hash. This ensures the domain starts with a valid character. . matches any single character except a line break, so at least two characters must follow the ://. [^\s]* matches zero or more non-whitespace characters, greedily consuming the rest of the URL.
This is deliberately loose. For extracting URLs from plain text, it works well. A strict validator would need to handle ports, fragments, and encoded characters separately.
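As a rough sketch of the extraction use case, assuming JavaScript: the same pattern is written once as a regex literal and once through the RegExp constructor, to show the escaping difference described above. The helper name extractUrls and the sample text are made up for illustration.

```javascript
// The same pattern two ways. The literal needs \/ because a bare / would end the literal.
// The constructor takes a string, so slashes are plain but every backslash is doubled.
const URL_LITERAL = /https?:\/\/[^\s/$.?#].[^\s]*/g;
const URL_FROM_STRING = new RegExp("https?://[^\\s/$.?#].[^\\s]*", "g");

// Hypothetical helper: collect every URL-looking substring from free text.
// matchAll requires the g flag and returns one match object per hit.
function extractUrls(text, pattern = URL_LITERAL) {
  return Array.from(text.matchAll(pattern), (match) => match[0]);
}

const sample = "Docs at https://example.com/guide?x=1 and http://test.org/a#b today";
console.log(extractUrls(sample));
// ["https://example.com/guide?x=1", "http://test.org/a#b"]
console.log(extractUrls(sample, URL_FROM_STRING).length); // 2
```

Because [^\s]* stops only at whitespace, a URL at the end of a sentence will drag the closing period along with it. If that matters for your input, trim trailing punctuation in code after matching.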
3. IPv4 addresses
\b(\d{1,3}\.){3}\d{1,3}\b
\b is a word boundary anchor. It prevents matching partial numbers embedded in longer strings (like serial numbers).
(\d{1,3}\.){3} is a group that matches one to three digits followed by a dot, repeated exactly three times. That covers the first three octets: 192.168.1.
\d{1,3} matches the final octet. The closing \b anchors the end.
The catch: this matches 999.999.999.999, which isn't a valid IP address. Each octet must be 0-255. Validating that range with pure regex is possible but ugly:
\b((25[0-5]|2[0-4]\d|[01]?\d\d?)\.){3}(25[0-5]|2[0-4]\d|[01]?\d\d?)\b
That pattern uses alternation to handle the three cases: 250-255, 200-249, and 0-199. It works, but this is a case where using regex for matching and code for validation is the cleaner approach. Capture the groups, parse them as integers, check the range.
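Here's one way the "regex for matching, code for validation" split could look, assuming JavaScript. Two things differ from the pattern above, and both are my assumptions: the sketch uses ^ and $ instead of \b because it validates a whole string, and it spells out four capture groups because a repeated group like (...){3} only keeps its last iteration. The name isValidIpv4 is hypothetical.

```javascript
// Loose shape check with one capture group per octet.
const IPV4_SHAPE = /^(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})$/;

// Hypothetical helper: the regex checks the shape, plain code checks the range.
function isValidIpv4(input) {
  const match = IPV4_SHAPE.exec(input);
  if (match === null) return false;
  // match[0] is the full match; match[1] through match[4] are the octets.
  // \d never matches a minus sign, so only the upper bound needs checking.
  return match.slice(1).every((octet) => Number(octet) <= 255);
}

console.log(isValidIpv4("192.168.1.1"));     // true
console.log(isValidIpv4("999.999.999.999")); // false: right shape, octets out of range
console.log(isValidIpv4("10.0.0"));          // false: only three octets
```

The range rule lives in one readable comparison instead of three alternation branches, which makes it easier to change later.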
4. Date matching (YYYY-MM-DD)
\d{4}-(0[1-9]|1[0-2])-(0[1-9]|[12]\d|3[01])
\d{4} matches a four-digit year. -(0[1-9]|1[0-2]) matches a hyphen followed by a month: either 01-09 or 10-12. The alternation with the pipe character handles both cases. -(0[1-9]|[12]\d|3[01]) matches the day: 01-09, 10-29, or 30-31.
This checks the format and rejects obviously wrong months (13, 00) and days (32, 00). It does not validate whether February 30th is a real date, or whether a given month has 30 or 31 days. It is also unanchored, so it will find a date inside a longer string; add ^ and $ when the whole input must be a date. Again, regex for format, code for semantics.
YYYY-MM-DD is the ISO 8601 calendar date format, and it is very common in APIs and databases, so this pattern gets a lot of use in log parsing and data validation pipelines.
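A minimal sketch of the "code for semantics" half, assuming JavaScript. I've anchored the pattern with ^ and $ here because the sketch validates a whole field, and the helper name isRealDate is made up for illustration. It relies on the fact that JavaScript's Date rolls impossible days over into the next month.

```javascript
// Anchored version of the pattern; the groups capture year, month, and day.
const DATE_SHAPE = /^(\d{4})-(0[1-9]|1[0-2])-(0[1-9]|[12]\d|3[01])$/;

// Hypothetical helper: the regex checks the format, Date checks the calendar.
function isRealDate(input) {
  const match = DATE_SHAPE.exec(input);
  if (match === null) return false;
  const [year, month, day] = match.slice(1).map(Number);
  // setUTCFullYear normalizes overflow (Feb 30 becomes Mar 1), so a real
  // date is one whose month and day survive the round trip unchanged.
  const candidate = new Date(0);
  candidate.setUTCFullYear(year, month - 1, day);
  return candidate.getUTCMonth() === month - 1 && candidate.getUTCDate() === day;
}

console.log(isRealDate("2024-02-29")); // true: 2024 is a leap year
console.log(isRealDate("2023-02-29")); // false: 2023 is not
console.log(isRealDate("2024-04-31")); // false: April has 30 days
console.log(isRealDate("2024-13-01")); // false: rejected by the regex itself
```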
5. Password strength (minimum requirements)
^(?=.*[A-Z])(?=.*[a-z])(?=.*\d)(?=.*[!@#$%^&*]).{8,}$
This one uses lookaheads, which are the most conceptually difficult part of regex. A lookahead (?=...) asserts that what follows matches the pattern inside, but it doesn't consume characters. It's a check, not a match.
(?=.*[A-Z]) asserts that somewhere in the string, there's at least one uppercase letter. (?=.*[a-z]) asserts at least one lowercase letter. (?=.*\d) asserts at least one digit. (?=.*[!@#$%^&*]) asserts at least one special character from the specified set.
.{8,} then matches the actual string, requiring at least 8 characters total. The ^ and $ anchors ensure the entire string is evaluated.
Each lookahead independently scans the whole string. They all must pass before the engine moves on to .{8,}. This is why lookaheads are powerful for validation -- they let you assert multiple conditions without caring about order.
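To see the order-independence in practice, here's a small sketch, assuming JavaScript. The second half is my own addition, not part of the pattern: it runs the same four conditions plus the length check as separate small regexes, which is one way to tell a user which rule failed. The names PASSWORD_RULES and missingRequirements are hypothetical.

```javascript
// The pattern from this section as a regex literal.
const PASSWORD_PATTERN = /^(?=.*[A-Z])(?=.*[a-z])(?=.*\d)(?=.*[!@#$%^&*]).{8,}$/;

console.log(PASSWORD_PATTERN.test("Passw0rd!"));  // true
console.log(PASSWORD_PATTERN.test("!dr0wssaP"));  // true: same characters, reversed order
console.log(PASSWORD_PATTERN.test("password1!")); // false: no uppercase letter
console.log(PASSWORD_PATTERN.test("Pw0!"));       // false: every lookahead passes, but it's too short

// Hypothetical alternative: one small regex per rule, so failures can be reported by name.
const PASSWORD_RULES = [
  ["an uppercase letter", /[A-Z]/],
  ["a lowercase letter", /[a-z]/],
  ["a digit", /\d/],
  ["a special character", /[!@#$%^&*]/],
  ["at least 8 characters", /^.{8,}$/],
];

// Returns the labels of every rule the password fails.
function missingRequirements(password) {
  return PASSWORD_RULES
    .filter(([, rule]) => !rule.test(password))
    .map(([label]) => label);
}

console.log(missingRequirements("password"));
// ["an uppercase letter", "a digit", "a special character"]
```

The single pattern is compact and answers yes or no. The rule list is longer but gives you an error message for free, so which one fits depends on what you need to show the user.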
When NOT to use regex
Regex is terrible for parsing nested structures. You cannot reliably parse HTML with regular expressions because arbitrarily nested tags need at least a context-free grammar, while regular expressions in the formal sense only recognize regular languages. If you find yourself writing a regex for nested parentheses or balanced brackets, stop. Use a proper parser: DOMParser for HTML, JSON.parse for JSON, or a library like acorn for JavaScript source code.
Regex is also overkill when string.includes(), startsWith(), or endsWith() does the job.
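For a sense of what "stop and use code" looks like for balanced brackets, here's a short sketch, assuming JavaScript. The helper isBalanced is made up for illustration. It uses a stack, which is exactly the memory a regular expression lacks. It also ignores brackets inside string literals, so it's a demonstration of the idea, not a parser.

```javascript
// Hypothetical helper: check bracket balance with a stack instead of a regex.
function isBalanced(text) {
  const closerFor = new Map([["(", ")"], ["[", "]"], ["{", "}"]]);
  const closers = new Set(closerFor.values());
  const expected = []; // stack of closing brackets we still need to see

  for (const char of text) {
    if (closerFor.has(char)) {
      expected.push(closerFor.get(char));
    } else if (closers.has(char)) {
      // A closer must match the most recently opened bracket.
      if (expected.pop() !== char) return false;
    }
  }
  // Anything left on the stack was opened but never closed.
  return expected.length === 0;
}

console.log(isBalanced("f(a[0], {b: (c)})")); // true
console.log(isBalanced("f(a[0)]"));           // false: closed in the wrong order
console.log("report.csv".endsWith(".csv"));   // true, and no regex needed
```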
Testing patterns quickly
The feedback loop matters. Writing regex in a code editor, running the program, checking the output, and iterating is painfully slow. I keep a regex tester bookmarked for this reason — I test patterns live against sample inputs before putting them in code. I built one at zovo.one/free-tools/regex-tester that highlights matches in real time and breaks down the pattern into its components.
Memorize these five patterns. More importantly, understand why each character is there. Once you can read regex instead of just copying it, you stop being afraid of it, and you start writing patterns that actually work in edge cases instead of just in your test file.
I'm Michael Lip. I build free developer tools at zovo.one. 350+ tools, all private, all free.