APIVerve

Posted on Mar 4 • Edited on Mar 16 • Originally published at blog.apiverve.com

Email Validation Best Practices: Beyond Simple Regex

#emailvalidation #formvalidation #userexperience #dataquality

A customer types their perfectly valid email address into your signup form. Your validation rejects it. They don't contact support—they just leave. You never know it happened.

This is more common than most teams realize. Overly strict email validation quietly kills conversions while giving the false comfort that your data is "clean." Meanwhile, the truly fake addresses—typos, disposable inboxes, nonexistent domains—sail right through a basic regex check.

Getting validation right means catching the junk without blocking the real people. That balance is harder than it sounds.

The Surprisingly Permissive Email Spec

The email address specification (defined in RFC 5321 and RFC 5322) is more permissive than most people realize. Valid email addresses can include:

Dots in various positions - Both first.last@example.com and f.i.r.s.t@example.com are valid. In an unquoted local part (the part before the @), dots can appear anywhere except at the start, at the end, or twice in a row.

Plus signs for sub-addressing - user+newsletter@gmail.com is valid and widely used. Gmail and other providers use the plus sign to create aliases that deliver to the main address. Many users employ this for filtering and tracking which services share their email.

Apostrophes and other special characters - o'brien@company.ie is perfectly valid. Irish names, French names, and others frequently include apostrophes. Hyphens, underscores, and various other characters are also permitted.

Numeric local parts - 12345@example.com is valid. Some organizations use numeric identifiers as email addresses.

Long top-level domains - Modern TLDs go far beyond .com and .org. Addresses like user@company.photography or contact@brand.engineering are valid and increasingly common.

International characters - RFC 6531 extends email to internationalized addresses with non-ASCII characters, for mail servers that support it. 用户@例え.jp is a valid email address format.

IP address domains - Technically, user@[192.168.1.1] is valid, though rarely used in practice.

Quoted local parts - "john doe"@example.com with spaces inside quotes is valid per the specification.

Every restriction you add to email validation potentially rejects someone's real, working email address. That's worth sitting with for a moment.

Why Simple Regex Fails

The internet is full of email validation regex patterns, ranging from simple to absurdly complex. Most of them cause problems.

Simple patterns reject valid addresses. A pattern that only allows alphanumeric characters, dots, and @ symbols will reject plus signs, apostrophes, and other valid characters. Users with these addresses can't sign up.

Complex patterns are unmaintainable. The regex needed to fully match the email specification is hundreds of characters long and virtually impossible to debug. Even then, it only validates format—not whether the address actually works.

All regex only validates format. Whether simple or complex, regex can only answer "does this string match a pattern?" It cannot answer "does this email address receive mail?" which is usually the question that actually matters.

Format validation catches obviously malformed input—missing @ symbols, empty strings, addresses that are clearly not emails. For anything beyond that, format validation alone is insufficient.
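To make the contrast concrete, here is a minimal sketch that compares a strict pattern of the kind often copied from old tutorials with a deliberately permissive check. Both STRICT_PATTERN and looksLikeEmail are illustrative names, not part of any library, and the permissive check is only meant to catch obviously malformed input.

```javascript
// A strict pattern of the kind often copied between projects (illustrative only).
const STRICT_PATTERN = /^[a-zA-Z0-9.]+@[a-zA-Z0-9]+\.[a-zA-Z]{2,4}$/;

// Permissive check: rejects only input that is obviously not an email address.
function looksLikeEmail(input) {
  if (typeof input !== 'string') return false;
  const value = input.trim();
  const at = value.lastIndexOf('@'); // the domain follows the last @
  if (at < 1 || at === value.length - 1) return false; // need text on both sides
  const domain = value.slice(at + 1);
  if (/\s/.test(domain)) return false;
  // Accept an address literal such as [192.168.1.1] or a dotted domain name.
  if (domain.startsWith('[') && domain.endsWith(']')) return domain.length > 2;
  return domain.includes('.') && !domain.startsWith('.') && !domain.endsWith('.');
}

const samples = [
  'user+newsletter@gmail.com',
  "o'brien@company.ie",
  'user@company.photography',
  'not-an-email',
];

for (const address of samples) {
  console.log(address, {
    strict: STRICT_PATTERN.test(address),
    permissive: looksLikeEmail(address),
  });
}
```

The strict pattern rejects the first three samples even though all of them are valid addresses. The permissive check accepts them and still rejects the string with no @ symbol.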

Deliverability vs. Format Validity

Understanding the difference between format validity and deliverability is crucial for email validation.

Format validity asks: Does this string conform to the email address specification? This is what regex checks.

Deliverability asks: If I send an email to this address, will it arrive? This is what usually matters.

An email address can be perfectly formatted and completely undeliverable:

Typos in the domain - user@gmial.com passes format validation. Gmail doesn't own gmial.com. The email will never arrive.

Non-existent domains - user@thisdoesnotexist12345.com looks like an email address. But if the domain doesn't exist or has no mail servers configured, no email can be delivered.

Non-existent mailboxes - randomstring8472@gmail.com has correct format and a valid domain. But if no one has registered that Gmail account, emails bounce.

Disabled or full mailboxes - The address once worked but the account was closed, or the mailbox is full and rejecting new messages.

Spam traps - Some email addresses exist specifically to catch spammers. Sending to them damages your reputation.

Format validation catches only a small share of email problems, maybe 5%. Deliverability validation catches most of the rest.

A proper email validation API returns comprehensive results:

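const email = 'user@example.com';
// Encode the address: an unescaped + in a query string is commonly decoded as a space
const url = `https://api.apiverve.com/v1/emailvalidator?email=${encodeURIComponent(email)}`;
const response = await fetch(url, { headers: { 'x-api-key': 'YOUR_API_KEY' } });
if (!response.ok) {
  throw new Error(`Validation request failed with status ${response.status}`);
}
const { data } = await response.json();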

// Check deliverability, not just format
if (data.isValid && data.isMxValid && data.isSmtpValid) {
  // Email is likely deliverable
}

// data also includes:
// - isFreeEmail: true for Gmail, Yahoo, etc.
// - isCompanyEmail: true for business domains
// - hasTypo: true if domain looks like a typo (gmial.com)

This tells you not just whether the format is correct, but whether the domain has mail servers, whether those servers accept connections, and whether it's a business or free email provider.

Checking Email Deliverability

Real email validation goes beyond format checking to verify deliverability through multiple steps:

DNS lookup - Does the domain exist? Every email domain must have DNS records. If the domain doesn't resolve, no email can be delivered.

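MX record check - Does the domain have mail servers configured? The MX (Mail Exchanger) records specify which servers handle email for a domain. No MX records usually means no email capability, although RFC 5321 tells senders to fall back to the domain's address (A/AAAA) record when no MX record exists.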

SMTP verification - Can you connect to the mail server? Does it accept mail for this address? Some mail servers will tell you whether a specific mailbox exists. Others refuse to answer (to prevent address enumeration attacks).

Reputation assessment - Is this domain associated with spam or fraud? Are deliverability rates historically low?

These checks require network requests and can't be done with client-side regex. They typically take 1-3 seconds per address, which is acceptable for form submission but too slow for real-time validation on every keystroke.
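If you want to see what the DNS and MX steps look like without a third-party service, here is a rough server-side sketch using Node's built-in dns module. The function name findMailHosts and the injectable resolveMx parameter are my own choices for illustration. The sketch does not attempt SMTP verification or reputation checks, and it ignores the fallback to address records.

```javascript
const dns = require('node:dns').promises;

// Returns the domain's mail hosts sorted by MX priority (lowest number first),
// or an empty array when the lookup finds no domain or no MX records.
// `resolveMx` is injectable so the function can be exercised without a network.
async function findMailHosts(domain, resolveMx = (name) => dns.resolveMx(name)) {
  try {
    const records = await resolveMx(domain);
    return records
      .slice()
      .sort((a, b) => a.priority - b.priority)
      .map((record) => record.exchange);
  } catch (err) {
    // ENOTFOUND: the domain does not resolve. ENODATA: it resolves but has no MX.
    if (err.code === 'ENOTFOUND' || err.code === 'ENODATA') return [];
    // Timeouts and server failures are not proof of a bad address, so rethrow.
    throw err;
  }
}

// Example call (needs network access):
// findMailHosts('example.com').then((hosts) => console.log(hosts));
```

Rethrowing on timeouts is deliberate: a slow DNS server should lead to a "could not verify" warning, not a rejection.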

Disposable Email Detection

Disposable email services provide temporary addresses that expire after minutes or hours. Popular services include Guerrilla Mail, 10 Minute Mail, Temp Mail, Mailinator, and hundreds of others.

People use disposable emails for various reasons:

Avoiding spam - Signing up for services that might sell their email or send unwanted messages.

One-time access - Downloading content or accessing gated material without providing a real address.

Testing - Developers testing email flows without cluttering their real inbox.

Abuse - Creating multiple accounts to exploit free trials, accumulate referral bonuses, evade bans, or engage in fraud.

Whether to block disposable emails depends on your use case:

Trial signups - If you're offering a free trial and want a relationship with the user, blocking disposables makes sense. Users who won't provide a real email are unlikely to convert.

Newsletter signups - Blocking might cost you some legitimate subscribers who are just cautious about spam.

Account creation - For platforms where account value builds over time, requiring a permanent email address is reasonable.

One-time downloads - Blocking disposables might be unnecessary friction. The transaction is complete; you may not need ongoing communication.

Disposable email detection requires maintaining an updated database of disposable domains. New services appear constantly, so static lists quickly become outdated. An API backed by a continuously maintained list is more likely to stay current.
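The lookup itself is simple; the hard part is keeping the list fresh. A minimal sketch, assuming you already have a set of known disposable domains (the three entries below are a tiny illustrative sample, and isDisposable is a hypothetical helper name):

```javascript
// Illustrative sample only; a real list has thousands of entries and changes often.
const DISPOSABLE_DOMAINS = new Set([
  'mailinator.com',
  'guerrillamail.com',
  '10minutemail.com',
]);

function isDisposable(email, blocklist = DISPOSABLE_DOMAINS) {
  const at = email.lastIndexOf('@');
  if (at === -1) return false;
  const labels = email.slice(at + 1).toLowerCase().split('.');
  // Check the full domain and each parent domain, so a subdomain such as
  // inbox.mailinator.com matches the mailinator.com entry.
  for (let i = 0; i < labels.length - 1; i++) {
    if (blocklist.has(labels.slice(i).join('.'))) return true;
  }
  return false;
}

console.log(isDisposable('someone@mailinator.com'));
console.log(isDisposable('someone@gmail.com'));
```

Passing the blocklist as a parameter makes it easy to swap the hard-coded set for one loaded from a regularly updated source.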

Handling Common Typos

Beyond validation, detecting and suggesting corrections for common typos improves user experience and data quality.

The most frequently mistyped email domains include:

gmial.com instead of gmail.com
gmal.com instead of gmail.com
gnail.com instead of gmail.com
hotmial.com instead of hotmail.com
yaho.com instead of yahoo.com
outlok.com instead of outlook.com
.con instead of .com

When you detect a likely typo, showing "Did you mean gmail.com?" converts a future bounce into a successful signup. The user appreciates the help, and you get a working email address.

Typo detection should suggest, not auto-correct. Users might have legitimate addresses at unusual domains. Let them confirm the correction rather than silently changing what they typed.
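Here is one way the suggestion step could look, using only the misspellings listed above. The names DOMAIN_TYPOS and suggestCorrection are hypothetical, and a production version would likely use a larger table or an edit-distance comparison against popular domains.

```javascript
// Misspellings from the list above, mapped to the likely intended domain.
const DOMAIN_TYPOS = new Map([
  ['gmial.com', 'gmail.com'],
  ['gmal.com', 'gmail.com'],
  ['gnail.com', 'gmail.com'],
  ['hotmial.com', 'hotmail.com'],
  ['yaho.com', 'yahoo.com'],
  ['outlok.com', 'outlook.com'],
]);

// Returns a suggested address, or null when there is nothing to suggest.
// The caller displays the suggestion; this function never changes the input.
function suggestCorrection(email) {
  const at = email.lastIndexOf('@');
  if (at < 1) return null;
  const local = email.slice(0, at);
  const original = email.slice(at + 1).toLowerCase();

  let domain = original;
  if (domain.endsWith('.con')) domain = domain.slice(0, -4) + '.com';
  domain = DOMAIN_TYPOS.get(domain) ?? domain;

  return domain === original ? null : `${local}@${domain}`;
}

console.log(suggestCorrection('user@gmial.com'));
console.log(suggestCorrection('user@example.org'));
```

Returning null for unknown domains keeps the function from second-guessing legitimate addresses at unusual domains.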

Validation Timing and User Experience

When and how you validate affects user experience as much as what you validate.

Don't validate on every keystroke. Red error messages appearing while the user is still typing are distracting and frustrating. Wait until they've finished—on blur (when they click away from the field) or on form submission.

Validate asynchronously when possible. Deep validation takes time. Start the validation when the user finishes typing the email, so results are ready by the time they submit the form.

Provide specific, actionable feedback. "Invalid email" tells users nothing. "Please include an @ symbol" identifies the problem. "Did you mean gmail.com?" solves it.

Distinguish between definite problems and warnings. A missing @ symbol is definitely wrong. A new domain you can't verify might be fine—warn but allow submission.

Remember that you might be wrong. If your validation rejects o'brien@company.ie, you're the one with the bug, not the user. Build in escape hatches for edge cases.
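The "validate on blur, reuse on submit" idea can be sketched as a small helper. Everything here is an assumption for illustration: createEmailFieldValidator is a made-up name, and validate stands in for whatever async check you use (for example, a call to your own backend).

```javascript
// `validate` is a hypothetical async function that resolves to a result object.
function createEmailFieldValidator(validate) {
  let lastEmail = null;
  let pending = null;

  function start(email) {
    const value = email.trim();
    if (value !== lastEmail) {
      lastEmail = value;
      pending = Promise.resolve().then(() => validate(value));
      // Forget failed attempts so the next blur or submit retries.
      pending.catch(() => {
        if (lastEmail === value) lastEmail = null;
      });
    }
    return pending;
  }

  return {
    // Call from the field's blur handler: starts validation in the background.
    onBlur(email) {
      start(email);
    },
    // Call from the submit handler: reuses the in-flight or finished result.
    onSubmit(email) {
      return start(email);
    },
  };
}

// Browser wiring might look like this (not executed here):
// const field = createEmailFieldValidator(checkEmailOnServer);
// input.addEventListener('blur', () => field.onBlur(input.value));
// form.addEventListener('submit', async (event) => {
//   event.preventDefault();
//   const result = await field.onSubmit(input.value);
// });
```

Because the result is cached per address, a user who blurs the field and then submits triggers only one validation request.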

Different Validation for Different Contexts

Not all email collection points need the same validation rigor.

Account signup requires thorough validation. You need to send activation emails, password resets, and account notifications. An invalid email breaks the entire user experience. Full deliverability checking is justified.

Checkout and transactions also warrant careful validation. Order confirmations, shipping notifications, and receipts need to reach the customer. The transaction's value justifies the validation overhead.

Newsletter subscription can use lighter validation. A bad email just means one undelivered newsletter. The cost of false rejection (losing a subscriber) may exceed the cost of false acceptance (one bounce).

Contact forms often need only basic validation. Honestly, full deliverability checking here is overkill. You're going to read and respond manually anyway. As long as the address looks plausible, you can handle problems individually.

Profile updates should validate the new address thoroughly before replacing the old one. Consider requiring confirmation of both addresses—sending a verification to the new one and a notification to the old one.
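One way to keep these decisions in a single place is a small policy table. This is only a sketch: the context keys, the level names, and the mapping from level to individual checks are my assumptions, not a standard.

```javascript
// Hypothetical mapping from collection point to validation rigor.
const LEVEL_BY_CONTEXT = new Map([
  ['signup', 'full'],
  ['checkout', 'full'],
  ['newsletter', 'light'],
  ['contactForm', 'basic'],
  ['profileUpdate', 'full'],
]);

// Translates a context into the checks to run. Unknown contexts get the
// strictest treatment, which is the safer default.
function checksFor(context) {
  const level = LEVEL_BY_CONTEXT.get(context) ?? 'full';
  return {
    format: true,                 // always cheap enough to run
    mxLookup: level !== 'basic',  // assumed to be part of "light" validation
    smtpProbe: level === 'full',  // slowest check, reserved for "full"
  };
}

console.log(checksFor('newsletter'));
console.log(checksFor('contactForm'));
```

Centralizing the policy means a change of mind about, say, newsletter signups is a one-line edit rather than a hunt through form handlers.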

Role-Based and Group Addresses

Some email addresses are technically valid but serve different purposes than individual mailboxes:

Role addresses like info@, support@, admin@, sales@, webmaster@ go to teams or rotate among staff. They're valid for business communication but may not be ideal for individual user accounts.

Group addresses deliver to multiple recipients. Perfectly valid, but again not individual accounts.

Auto-responders reply automatically to incoming messages. Sending transactional emails to these addresses generates noise.

Whether these matter depends on context. For a B2B service, role addresses are normal and expected. For a consumer app expecting individual users, they might warrant gentle discouragement (not hard blocking).
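Detecting role addresses is mostly a matter of inspecting the local part. A minimal sketch, using only the role names mentioned above (ROLE_LOCAL_PARTS and isRoleAddress are hypothetical names, and the list is far from complete):

```javascript
// Role names from the examples above; extend the list to fit your audience.
const ROLE_LOCAL_PARTS = new Set(['info', 'support', 'admin', 'sales', 'webmaster']);

function isRoleAddress(email) {
  const at = email.lastIndexOf('@');
  if (at < 1) return false;
  // Drop any +tag so support+billing@example.com still counts as a role address.
  const local = email.slice(0, at).toLowerCase().split('+')[0];
  return ROLE_LOCAL_PARTS.has(local);
}

console.log(isRoleAddress('support@example.com'));
console.log(isRoleAddress('jane.doe@example.com'));
```

Treat the result as a signal for a gentle prompt or for segmentation, in line with the advice above, not as grounds for rejecting the address.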

Free vs. Business Email Providers

Email validation can distinguish between free providers (Gmail, Yahoo, Outlook, etc.) and business domains (company-specific addresses).

Free email addresses are perfectly legitimate for consumers. They're also slightly higher risk for fraud, since creating new accounts is easy.

Business email addresses suggest professional context and are slightly harder to create in bulk. For B2B applications, business emails may indicate more serious prospects.

This distinction is useful for segmentation and prioritization, not for blocking. Plenty of legitimate business users use personal email addresses, especially for initial inquiries or small businesses.

Handling Validation Failures Gracefully

When validation fails, how you communicate matters.

Be specific about the problem. "We couldn't verify this email address exists. Please check for typos." is better than "Invalid email."

Offer suggestions when possible. "Did you mean @gmail.com?" helps users fix the actual problem.

Allow override with acknowledgment. "We couldn't verify this address. If you're sure it's correct, click Continue." respects user agency while flagging potential issues.

Don't lecture or blame. "You entered an invalid email" sounds accusatory. "Please check your email address" is neutral.

Make errors visible but not alarming. Red text is traditional for errors, but a subtle color change is often sufficient. Save the bold red for critical problems.
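These guidelines translate fairly directly into a function that turns a validation result into user-facing feedback. The sketch below reuses the field names from the API example earlier in this article (isValid, isMxValid, isSmtpValid, hasTypo). The suggestion field and the shape of the returned object are assumptions for illustration.

```javascript
// `result.suggestion` is a hypothetical field holding a corrected address.
function buildFeedback(result) {
  if (!result.isValid) {
    // Definite problem: neutral wording, no override.
    return {
      severity: 'error',
      allowOverride: false,
      message: 'Please check your email address. It should look like name@example.com.',
    };
  }
  if (result.hasTypo && result.suggestion) {
    // Likely typo: offer the fix, let the user decide.
    return {
      severity: 'warning',
      allowOverride: true,
      message: `Did you mean ${result.suggestion}?`,
    };
  }
  if (!result.isMxValid || !result.isSmtpValid) {
    // Could not verify: flag it, but respect the user's judgment.
    return {
      severity: 'warning',
      allowOverride: true,
      message: "We couldn't verify this address. If you're sure it's correct, click Continue.",
    };
  }
  return { severity: 'ok', allowOverride: true, message: '' };
}

console.log(buildFeedback({ isValid: true, isMxValid: false, isSmtpValid: false, hasTypo: false }));
```

Only a definite format problem blocks submission; everything else is a warning the user can override.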

The Business Case for Good Validation

Poor email validation has measurable costs:

Bounced emails damage sender reputation. High bounce rates lead to deliverability problems where even valid emails land in spam folders.

Invalid data pollutes your database. Fake addresses skew metrics and waste resources on campaigns that can't reach anyone.

Customers leave when signup fails. A user whose valid email gets rejected will often abandon the form rather than contact support.

Support burden increases when users can't sign up or receive emails. These tickets could be prevented by better validation.

Good validation pays for itself in cleaner data, better deliverability, fewer support tickets, and more successful user signups.

Putting It All Together

Effective email validation is layered:

First layer: Basic format check. Is there an @ symbol? Is there content on both sides? Does the domain have a dot? This catches obvious mistakes instantly with no external calls.

Second layer: Deliverability verification. Does the domain exist? Does it have mail servers? Can you reach them? This catches typos, fake domains, and non-functional addresses.

Third layer: Quality assessment. Is it disposable? Is it a role address? Is it from a free provider? This provides context for how to treat the address.

Fourth layer: User experience. Can you suggest corrections? Can you explain problems clearly? Can you allow edge cases while flagging concerns?

Each layer builds on the previous. Skip a layer, and you miss a category of problems. Implement all layers thoughtfully, and you get clean data without frustrating legitimate users.
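As a rough illustration of how the layers might fit together, here is a sketch of a pipeline in which each network-bound or data-dependent check is passed in as a function. All names (validateEmail, checkDeliverability, isDisposable, suggest) and the result shape are hypothetical. The format check is intentionally minimal.

```javascript
// Each dependency is injected so the pipeline can be tested without a network.
async function validateEmail(email, { checkDeliverability, isDisposable, suggest }) {
  const value = email.trim();
  const at = value.lastIndexOf('@');

  // Layer 1: basic format. An @ with text on both sides and a dot in the domain.
  if (at < 1 || !value.slice(at + 1).includes('.')) {
    return { status: 'rejected', reason: 'format' };
  }

  // Layer 4 input: a suggested correction travels with whatever verdict follows.
  const suggestion = suggest(value);

  // Layer 2: deliverability. Failures are warnings because checks can be wrong.
  const deliverable = await checkDeliverability(value);
  if (!deliverable) {
    return { status: 'warning', reason: 'undeliverable', suggestion };
  }

  // Layer 3: quality signals that give context without proving anything.
  if (isDisposable(value)) {
    return { status: 'warning', reason: 'disposable', suggestion };
  }

  return { status: 'ok', suggestion };
}
```

Only the format layer rejects outright. The later layers return warnings, which leaves the final decision to the calling form and to the user.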

Validate email addresses comprehensively with the Email Validator API. Detect disposable addresses with the Disposable Email Checker API. Build signup flows that capture real email addresses without rejecting real customers.

