Email Validation: Why Your Regex Is Wrong and What to Do Instead

#javascript #webdev #beginners #programming

The email validation regex that circulates in Stack Overflow answers and blog posts rejects perfectly valid email addresses and accepts some invalid ones. The address syntax defined by RFC 5321 and RFC 5322 is far more permissive than most developers realize, and trying to capture it in a single regex is a losing battle.

Here is what actually works.

Valid email addresses that your regex probably rejects

All of the following are syntactically valid under RFC 5322:

user+tag@example.com (plus addressing, used by Gmail for filtering)
"user name"@example.com (quoted local part with spaces)
user@[192.168.1.1] (IP address literal)
very.unusual."@".unusual.com@example.com (quoted string inside a dotted local part; permitted only by the obsolete syntax that RFC 5322 still requires parsers to accept)
x@example.com (single character local part)
user@subdomain.subdomain.example.com (multiple subdomains)

Most validation regexes reject several of these. The plus sign causes failures in roughly 20% of email validation implementations, which breaks Gmail's plus addressing feature.
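
To make the problem concrete, the sketch below runs a restrictive pattern against some of the addresses listed above. The `restrictivePattern` regex is a hypothetical example written for this illustration, in the style of patterns that get copied between projects; it is not taken from any particular library.

```javascript
// Hypothetical restrictive pattern (an assumption for illustration):
// letters, digits, dot, underscore, and hyphen in the local part, 2-4 letter TLD.
const restrictivePattern = /^[a-zA-Z0-9._-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,4}$/;

const validAddresses = [
  'user+tag@example.com',
  '"user name"@example.com',
  'user@[192.168.1.1]',
  'x@example.com',
  'user@subdomain.subdomain.example.com',
];

// Collect the valid addresses that the pattern wrongly rejects.
const wronglyRejected = validAddresses.filter(
  (address) => !restrictivePattern.test(address)
);

console.log(wronglyRejected);
```

With this pattern, the plus-addressed, quoted, and IP-literal addresses are all rejected, while the two plain addresses pass.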

The regex that tries to be complete

Regexes that try to cover the full grammar can run to over 6,000 characters. Even the much shorter pattern below, often presented as the "official" RFC 5322 regex, is practically unusable in production code: it is hard to maintain, hard to test, and confusing to anyone who reads it.

(?:[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*|"(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21\x23-\x5b\x5d-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])*")@(?:(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?|\[(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?|[a-z0-9-]*[a-z0-9]:(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21-\x5a\x53-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])+)\])

Nobody should be using this in production.

The pragmatic approach

For most applications, a three-step validation is sufficient and far more reliable:

Step 1: Basic format check (catches typos, not RFC compliance):

function basicEmailCheck(email) {
  return /^[^\s@]+@[^\s@]+\.[^\s@]+$/.test(email);
}

This checks for: at least one character before @, at least one character between @ and the last dot, and at least one character after the last dot. No spaces. It is intentionally permissive.
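
A few sample inputs show what this permissive check does and does not catch. The sample addresses are made up for illustration.

```javascript
function basicEmailCheck(email) {
  return /^[^\s@]+@[^\s@]+\.[^\s@]+$/.test(email);
}

const samples = [
  'user+tag@example.com',    // plus addressing
  'user@example',            // no dot after the @
  'user @example.com',       // stray space
  'a@b@example.com',         // two @ signs
  '"user name"@example.com', // RFC-valid quoted local part
];

// Map each sample to the result of the basic check.
const results = samples.map((sample) => [sample, basicEmailCheck(sample)]);
console.log(results);
```

Only the first sample passes. Note the trade-off in the last one: a quoted local part containing a space is valid per the RFC, but this check rejects it because it forbids whitespace everywhere. For most sign-up forms that is an acceptable price for catching typos.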

Step 2: Structural validation with explicit checks (not one giant regex):

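function isValidEmail(email) {
  if (typeof email !== 'string' || email.length > 254) return false;  // RFC 5321 limit
  // Split on the last @ so an @ inside a quoted local part is not mistaken for the separator
  const at = email.lastIndexOf('@');
  if (at < 1 || at === email.length - 1) return false;
  const local = email.slice(0, at);
  const domain = email.slice(at + 1);
  if (local.length > 64) return false;  // RFC 5321 limit
  if (domain.length > 253) return false;
  // An unquoted local part must not contain whitespace or another @
  if (!local.startsWith('"') && /[\s@]/.test(local)) return false;
  // Check domain structure: at least two labels, each 1-63 letters, digits,
  // or hyphens, with no leading or trailing hyphen (IP literals are not accepted)
  const labels = domain.split('.');
  if (labels.length < 2) return false;
  return labels.every((label) => /^[a-zA-Z0-9]([a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?$/.test(label));
}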

Step 3: Verification by actually sending an email. This is the only way to know if an email address is deliverable. A confirmation email with a link or code proves both validity and ownership.
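
The confirmation step could be sketched roughly as follows, using Node's built-in crypto module. The function names, the in-memory `pendingConfirmations` map, and the 24-hour expiry are all assumptions made for this illustration; sending the email itself is left out.

```javascript
const { randomBytes, createHash } = require('node:crypto');

// In-memory store for illustration only; a real application would use a database.
const pendingConfirmations = new Map();

const TOKEN_TTL_MS = 24 * 60 * 60 * 1000; // assumed 24-hour validity window

// Store only a hash so a leaked store does not reveal usable tokens.
function hashToken(token) {
  return createHash('sha256').update(token).digest('hex');
}

// Creates a single-use token for the address and returns it so the caller
// can embed it in the confirmation link it emails out.
function createConfirmationToken(email, now = Date.now()) {
  const token = randomBytes(32).toString('hex');
  pendingConfirmations.set(hashToken(token), {
    email,
    expiresAt: now + TOKEN_TTL_MS,
  });
  return token;
}

// Returns the confirmed address, or null if the token is unknown or expired.
function confirmEmail(token, now = Date.now()) {
  const key = hashToken(token);
  const entry = pendingConfirmations.get(key);
  if (!entry) return null;
  pendingConfirmations.delete(key); // a token can be used only once
  return now <= entry.expiresAt ? entry.email : null;
}
```

A user who can present the token must have received the message, which is what demonstrates both deliverability and ownership.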

MX record checking

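An intermediate step between syntax validation and sending a confirmation email is checking that the domain has MX records (mail exchange servers). A domain without MX records is unlikely to receive email, although SMTP falls back to the domain's A or AAAA record when no MX record exists, so a missing MX record is a strong warning sign rather than proof.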

This check catches many typos in the domain portion: "gmail.con" does not resolve at all, so the lookup flags a typo that syntax validation would miss. It cannot catch typos that happen to land on a real domain that accepts mail.
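
In Node.js, a lookup along these lines could be sketched with the built-in `dns` module. The function names are hypothetical. The sketch also covers a domain that has no MX records but does have address records, which RFC 5321 treats as an implicit mail host, and it treats every DNS error (including timeouts) as "no records", which is a simplification.

```javascript
const dns = require('node:dns').promises;

// Pure decision step: given the lookup results, can the domain plausibly receive mail?
function canReceiveMail(mxRecords, hasAddressRecord) {
  if (mxRecords.length > 0) {
    // A single MX pointing at the root is a "null MX" (RFC 7505): the domain
    // explicitly accepts no mail. How the root target is spelled in the result
    // is an assumption here, so both '' and '.' are checked.
    const isNullMx =
      mxRecords.length === 1 &&
      (mxRecords[0].exchange === '' || mxRecords[0].exchange === '.');
    return !isNullMx;
  }
  // No MX records: SMTP falls back to the domain's address records.
  return hasAddressRecord;
}

// Runs a lookup and treats DNS errors such as ENOTFOUND or ENODATA as "no records".
async function lookupOrEmpty(lookup) {
  try {
    return await lookup();
  } catch {
    return [];
  }
}

async function domainAcceptsMail(domain) {
  const mx = await lookupOrEmpty(() => dns.resolveMx(domain));
  if (mx.length > 0) return canReceiveMail(mx, false);
  const a = await lookupOrEmpty(() => dns.resolve4(domain));
  const aaaa = await lookupOrEmpty(() => dns.resolve6(domain));
  return canReceiveMail([], a.length + aaaa.length > 0);
}
```

Calling `domainAcceptsMail('example.com')` returns a promise that resolves to a boolean. Because DNS failures can be transient, a negative result is probably better used to prompt the user ("did you mean...?") than to reject the address outright.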

Common validation mistakes

Rejecting plus signs (breaks Gmail plus addressing)
Requiring TLDs from a hardcoded list (new TLDs are created regularly)
Rejecting long TLDs (.photography, .construction exist)
Case-sensitive comparison (the local part is technically case-sensitive per RFC, but in practice all major providers treat it as case-insensitive)
Not trimming whitespace before validation
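
The last two points in the list above can be handled with a small normalization step before validation and comparison. This is a minimal sketch; `normalizeEmail` and `comparisonKey` are hypothetical names, and lowercasing the local part for comparison is a pragmatic choice rather than something the RFC guarantees.

```javascript
// Trims surrounding whitespace and lowercases the domain, which is
// case-insensitive. The local part is left exactly as the user typed it.
function normalizeEmail(input) {
  const trimmed = input.trim();
  const at = trimmed.lastIndexOf('@');
  if (at === -1) return trimmed; // not an address; leave it for validation to reject
  return trimmed.slice(0, at) + '@' + trimmed.slice(at + 1).toLowerCase();
}

// Key for duplicate detection: lowercases the whole address, on the assumption
// that the mail provider treats the local part as case-insensitive.
function comparisonKey(input) {
  return normalizeEmail(input).toLowerCase();
}

console.log(normalizeEmail('  User+Tag@Example.COM \n'));
console.log(comparisonKey('USER@example.com') === comparisonKey('user@Example.com'));
```

Storing the normalized address for delivery and the comparison key for uniqueness checks keeps the user's original casing intact while still preventing duplicate accounts.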

The tool

For quick email validation with format checking, MX record verification, and common typo detection, I built an email validator that goes beyond regex to provide practical validation results.

I'm Michael Lip. I build free developer tools at zovo.one. 500+ tools, all private, all free.