You have a list of addresses. Maybe they came from a signup form, a partner CRM, an enrichment run, or last quarter's webinar. You want to know which ones are real before you put them on a campaign and torch your sender reputation.
You search "email verification" and find a hundred services with landing pages claiming 99% accuracy. You install the obvious package, run it on your list, and 95% come back "valid." You send. A quarter of them bounce, and the major inbox providers start flagging your domain.
What happened? Some combination of: syntax-valid addresses that don't exist, mail servers that lie, catch-all domains, greylisting, and anti-probe behavior. "This is a real mailbox" is much harder to prove than it looks. Here's how the protocol actually works, where it breaks, and what a serious verifier has to do about it.
Syntax checks filter the obvious garbage and nothing else
Run a regex against an address and you'll catch the obvious malformed strings. Use a real RFC 5322 parser and you'll catch a few more (john..doe@example.com, leading whitespace, addresses with control characters).
import re
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")
EMAIL_RE.match("definitelynotreal@gmail.com") # matches
definitelynotreal@gmail.com passes every syntax check ever written. It does not exist. Syntax validation tells you whether a string could be an address; it tells you nothing about whether it is one.
You still want this as a first pass, because there's no point burning network calls on a string that isn't even shaped like an email address. But anyone who ships a regex as their email verifier is solving a different problem than they think they are.
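For a sense of what a slightly stricter first pass might look like, here is a minimal sketch that layers a few structural checks on top of the split at `@`. The function name `looks_like_email` and the exact rule set are assumptions for illustration. It is not a full RFC 5322 parser (quoted local parts are rejected outright), and it still says nothing about whether the mailbox exists.

```python
import re
import unicodedata

# One DNS label: letters, digits, hyphens, no leading/trailing hyphen, max 63 chars.
LABEL_RE = re.compile(r"^[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?$")

def looks_like_email(raw: str) -> bool:
    """Cheap structural pre-check before any network call."""
    # Whitespace and control characters are rejected, not silently stripped.
    if any(ch.isspace() or unicodedata.category(ch) == "Cc" for ch in raw):
        return False
    local, sep, domain = raw.rpartition("@")
    if not sep or not local or not domain or "@" in local:
        return False
    # Size limits from RFC 5321: 64 octets for the local part, 255 for the domain.
    if len(local) > 64 or len(domain) > 255:
        return False
    # An unquoted local part cannot start or end with a dot, or contain "..".
    if local.startswith(".") or local.endswith(".") or ".." in local:
        return False
    labels = domain.split(".")
    # Require at least "name.tld" and well-formed labels.
    return len(labels) >= 2 and all(LABEL_RE.match(label) for label in labels)
```

Note that `looks_like_email("definitelynotreal@gmail.com")` still returns `True`, which is the whole point of this section.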
DNS proves the domain accepts mail (and nothing else)
dig +short MX example.com
# 0 mail.example.com.
If a domain has no MX record (and no fallback A record per RFC 5321), it doesn't accept mail at all. Marking the address invalid is correct. This catches typo-domains, expired domains, and domains that were never set up for email.
But gmail.com has MX records. So does every Fortune 500. So does every catch-all spam trap. MX-exists tells you the domain is in the mail business, not that the address you care about exists on it.
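The MX-then-A fallback can be sketched in a few lines. Python's standard library has no MX lookup, so this version takes both lookups as injected callables; `lookup_mx` and `lookup_addr` are hypothetical hooks that in practice might wrap a DNS library, and that wiring is assumed rather than shown. The null-MX case (a single `.` target, RFC 7505) is treated as "accepts no mail".

```python
from typing import Callable, List, Tuple

MxLookup = Callable[[str], List[Tuple[int, str]]]  # domain -> (preference, host) pairs
AddrLookup = Callable[[str], List[str]]            # domain -> A/AAAA addresses

def mail_hosts(domain: str, lookup_mx: MxLookup, lookup_addr: AddrLookup) -> List[str]:
    """Return hosts to try for inbound mail, best first. Empty list: no mail route."""
    records = lookup_mx(domain)
    if records:
        # Lower preference value means higher priority.
        hosts = [host.rstrip(".") for _, host in sorted(records)]
        # A lone "." target is a null MX: the domain explicitly accepts no mail.
        return [h for h in hosts if h]
    # No MX at all: fall back to the domain's own address records.
    if lookup_addr(domain):
        return [domain]
    return []
```

An empty result is the only case where marking the address invalid at this stage is justified; anything else just means "keep going".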
Talking SMTP to the server
This is where every "real" verifier lives. You connect to the destination MX server, walk through the SMTP handshake, and stop one step short of actually sending the email:
$ openssl s_client -starttls smtp -connect gmail-smtp-in.l.google.com:25 -crlf
220 mx.google.com ESMTP ready
EHLO verifier.example.com
250-mx.google.com at your service
...
MAIL FROM:<probe@verifier.example.com>
250 2.1.0 OK
RCPT TO:<linus@gmail.com>
550 5.1.1 The email account that you tried to reach does not exist
QUIT
The signal is in the RCPT TO response:
- 250 → server says it would accept mail for this address
- 550/551 → mailbox doesn't exist
- 4xx → temporary failure, try again later
- nothing, eventually a timeout → ¯\_(ツ)_/¯
A naive verifier writes this loop in twenty lines and ships. It is wrong roughly as often as it is right, for reasons that have nothing to do with the code.
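For reference, the naive loop looks roughly like the sketch below, using Python's `smtplib`. The function names, the 30-second timeout, and the choice to treat every failure as "no answer" are assumptions for illustration. This is the version that is easy to write, not one you should trust.

```python
import smtplib
from typing import Optional

def classify_rcpt(code: Optional[int]) -> str:
    """Map an RCPT TO reply code to the rough meanings listed above."""
    if code is None:
        return "unknown"        # no usable reply (timeout, disconnect, refused)
    if code == 250:
        return "accepted"       # server says it would take mail for this address
    if code in (550, 551):
        return "rejected"       # server says the mailbox doesn't exist
    if 400 <= code <= 499:
        return "temporary"      # try again later
    return "unknown"

def probe_rcpt(mx_host: str, address: str, helo: str, mail_from: str,
               timeout: float = 30.0) -> Optional[int]:
    """Walk the handshake up to RCPT TO and return its reply code, or None."""
    try:
        # The context manager sends QUIT on exit, so no message is ever sent.
        with smtplib.SMTP(mx_host, 25, local_hostname=helo, timeout=timeout) as smtp:
            smtp.ehlo()
            code, _ = smtp.mail(mail_from)
            if code != 250:
                return None     # rejected before RCPT TO; no signal about the mailbox
            code, _ = smtp.rcpt(address)
            return code
    except (OSError, smtplib.SMTPException):
        return None
```

Everything in the next section is about why the code returned by `probe_rcpt` often doesn't mean what `classify_rcpt` says it means.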
Why the SMTP handshake lies
Modern mail servers know that automated probers exist, and they don't make life easy for them.
Catch-all domains. Many companies configure their inbound to accept any address at their domain and route unmatched ones to a default mailbox or a black hole. Probe xq8z29zz@somecompany.com and you get 250 OK. Probe ceo@somecompany.com and you get 250 OK. They both look identical from the outside; one of them is the CEO and the other is gibberish. If the domain is catch-all, RCPT TO is meaningless.
You detect this by probing a deliberately fake address first, something like definitely-does-not-exist-12345@domain.com, and inferring catch-all from a positive response. Any verifier that doesn't do this catch-all check is silently classifying random nonsense as valid mail on every catch-all domain it sees.
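A minimal sketch of that check, assuming a `probe` callable that takes an address and returns the RCPT TO reply code (or `None` on timeout). The function names and bucket labels are illustrative, and the fake local part is randomized here rather than fixed so it is unlikely to collide with a real mailbox.

```python
import secrets
from typing import Callable, Optional

Probe = Callable[[str], Optional[int]]  # address -> RCPT TO code, None on timeout

def is_catch_all(domain: str, probe: Probe) -> Optional[bool]:
    """True if a random mailbox is accepted, False if rejected, None if unclear."""
    fake = f"{secrets.token_hex(12)}@{domain}"
    code = probe(fake)
    if code == 250:
        return True
    if code in (550, 551):
        return False
    return None

def check_address(address: str, probe: Probe) -> str:
    """Probe the fake address first, then the real one only if that is informative."""
    domain = address.rpartition("@")[2]
    if is_catch_all(domain, probe) is not False:
        return "risky"          # catch-all or inconclusive: RCPT TO tells us nothing
    code = probe(address)
    if code == 250:
        return "deliverable"
    if code in (550, 551):
        return "undeliverable"
    return "risky"
```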
Greylisting. A receiver returns 451 try again later on first contact from an unknown sender. Legit MTAs queue and retry minutes or hours later. Probers usually don't. Naive verifiers mark these as failed; the addresses are fine.
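Handling greylisting mostly means retrying 4xx replies on a schedule measured in minutes and hours. The sketch below assumes the same kind of `probe` callable (address in, reply code out); the delay values are made up for illustration, and a real system would reschedule the job in a queue instead of sleeping in-process.

```python
import time
from typing import Callable, Optional, Sequence

def probe_with_retries(
    probe: Callable[[str], Optional[int]],
    address: str,
    delays: Sequence[float] = (15 * 60, 60 * 60, 4 * 60 * 60),  # assumed schedule, seconds
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[int]:
    """Retry while the server answers 4xx; return the last reply code seen."""
    code = probe(address)
    for delay in delays:
        # Only temporary failures are worth retrying.
        if code is None or not 400 <= code <= 499:
            break
        sleep(delay)
        code = probe(address)
    return code
```

If the final code is still 4xx after the last retry, the honest answer is "unknown", not "invalid".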
Anti-probe behavior on the big providers. Gmail, Outlook, Office 365, Proofpoint, Mimecast, and several other large inbound systems either always return 250 regardless of the mailbox, or always return 4xx to anything that smells like a verifier. A handshake against gmail.com does not tell you whether the address exists on Gmail; it tells you that Gmail received a connection. Any verifier that reports a clean "valid" on a Gmail address from a single RCPT TO probe is making it up. Serious tools maintain a list of these providers and fall back to other signals when they hit one.
Outbound IP reputation. Even when the receiver is willing to give you a real answer, it'll only do it if your sending IP doesn't look hostile. If you've been hammering a domain — or, more likely, if the IP block you happen to be on has been hammering it — you'll be tarpitted, throttled, or refused at HELO. Running verification from a residential IP, an EC2 box without rDNS, or a VPN basically doesn't work.
Tarpitting. Some servers respond to RCPT TO slowly on purpose — 30 seconds per probe, deliberately — to make automated verification economically unviable. Your verifier needs to handle long timeouts on some domains without falling over on the rest.
Some servers only check on DATA. They accept any RCPT TO and only bounce after you've sent the body. Verification-without-sending is impossible against those servers; the most you can do is flag them.
Port 25 is often blocked outbound. Most cloud providers and residential ISPs block outbound 25 to limit spam. A naive verifier from your laptop or your default EC2 instance will silently fail the SMTP step on every domain. Real verifiers connect from infrastructure with port 25 open. Ports 587 and 465 are not a fallback: they are submission ports for authenticated clients, and a destination MX generally won't answer recipient probes on them.
What "good" verification looks like
A serious verifier does, roughly:
- Syntax-check the address.
- Suggest typo corrections for common domains (gmial.com → gmail.com).
- Look up MX (and A as fallback).
- Check disposable-domain lists (Mailinator, 10MinuteMail, Guerrilla Mail, and a few hundred others).
- Identify the receiver: Gmail, Outlook/Office 365, Yahoo, ProtonMail, Proofpoint, Mimecast, an in-house Postfix, etc. The provider determines which signals are trustworthy.
- For receivers known to give honest RCPT TO responses, probe from a warmed-up IP with valid rDNS, sensible HELO, and conservative pacing per destination domain.
- Detect catch-all by probing a fake address first.
- Honor 4xx by retrying with backoff over hours, not seconds.
- For receivers known to lie, fall back to historical signals: has this address shown up in our previous deliveries? Has it bounced before?
- Classify each result honestly. "Valid / invalid" is the wrong vocabulary; you need at least three buckets — deliverable, undeliverable, and risky (catch-all, greylisting timeout, large-provider unknown). Sending into the risky bucket is a business decision, not a technical one.
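The final classification step from the list above might be sketched like this. The `Signals` fields and the precedence of the rules are assumptions about how you would collect and weigh the earlier checks. The three bucket names come straight from the list.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Signals:
    syntax_ok: bool
    has_mail_route: bool          # MX record, or A fallback
    disposable: bool
    provider_trusted: bool        # receiver known to answer RCPT TO honestly
    catch_all: Optional[bool]     # None if the catch-all probe was inconclusive
    rcpt_code: Optional[int]      # None on timeout or if never probed

def bucket(s: Signals) -> str:
    """Collapse the collected signals into deliverable / undeliverable / risky."""
    # Hard failures: nothing downstream can rescue these.
    if not s.syntax_ok or not s.has_mail_route:
        return "undeliverable"
    # Disposable domains and receivers that won't answer honestly are unknowns.
    if s.disposable or not s.provider_trusted:
        return "risky"
    # An explicit rejection from an honest receiver is the one clean negative.
    if s.rcpt_code in (550, 551):
        return "undeliverable"
    # Catch-all (or an inconclusive check) makes a 250 meaningless.
    if s.catch_all is not False:
        return "risky"
    return "deliverable" if s.rcpt_code == 250 else "risky"
```

Only one path through this function ends in "deliverable", and it requires every earlier signal to line up.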
The infrastructure question is harder than the protocol question. You need a pool of warmed sending IPs, a monitoring system for IPs starting to bounce, per-receiver rate limiters, a queue that respects greylist retry intervals, and a database of which providers behave how. Get any of that wrong and the answers you get are noise.
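Of those pieces, the per-receiver rate limiter is the easiest to sketch. The class below enforces a minimum gap between probes to the same domain; the name `DomainPacer`, the single-threaded design, and the injected clock are assumptions for illustration. A production limiter would also need to be shared across workers and sending IPs.

```python
import time
from typing import Callable, Dict

class DomainPacer:
    """Enforce a minimum interval between probes to the same domain (single-threaded)."""

    def __init__(self, min_interval: float,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._next_allowed: Dict[str, float] = {}

    def wait(self, domain: str) -> float:
        """Block until a probe to `domain` is allowed; return the seconds waited."""
        now = self._clock()
        ready_at = self._next_allowed.get(domain, now)
        delay = max(0.0, ready_at - now)
        if delay:
            self._sleep(delay)
        # The next probe to this domain may start one interval after this one.
        self._next_allowed[domain] = max(now, ready_at) + self.min_interval
        return delay
```

Call `pacer.wait(domain)` immediately before each probe; other domains are unaffected by one domain's backlog.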
Things people get burned by
- Treating "catch-all" as "valid." Catch-all means I don't know whether this mailbox exists, not yes, it does. Sending into a catch-all domain blindly is one of the cleanest ways to end up flagged for spam, because spam traps love catch-alls.
- Trusting role-based addresses. info@, support@, sales@, noreply@ are almost always deliverable. They're almost never the right address for cold outreach, and many ESPs treat marketing mail to role addresses as a strong spam signal.
- Running checks at send time. Don't verify in your signup form's request handler. The verification call can take seconds, sometimes tens of seconds. Verify async, or against a cached lookup, never inline.
- Bulk-probing without pacing. A loop that fires 1,000 RCPT TO calls at the same Google Workspace tenant will get you locked out of that tenant for the rest of the day. Per-domain pacing is mandatory.
- Ignoring the score. A binary "valid: true" hides a lot. An address that passes syntax + MX + disposable but where the SMTP step had to be assumed (problematic provider, timeout, IP-blocked) is a different animal from one that got a clean 250 from a non-catch-all server. Anything that doesn't expose its confidence is hiding bad news.
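Flagging role-based addresses is a purely local check and cheap to add. The sketch below uses only the four prefixes named above; the set name, the function name, and the handling of `+tag` suffixes are assumptions, and a real list would be longer.

```python
# Local parts that usually belong to a team or a system, not a person.
ROLE_LOCAL_PARTS = {"info", "support", "sales", "noreply"}

def is_role_address(address: str) -> bool:
    """True if the local part (ignoring case and any +tag) is a known role name."""
    local = address.rpartition("@")[0].lower()
    # Treat "sales+eu@..." the same as "sales@...".
    local = local.split("+", 1)[0]
    return local in ROLE_LOCAL_PARTS
```

A role address can be perfectly deliverable, so this belongs in a separate flag and not in the deliverability verdict.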
The shortcut
This is what /api/v1/email_verifications does at PeopleDB. The protocol piece, the receiver classification, the catch-all detection, the IP reputation, the disposable-domain list, and the typo suggestions all run on the server side; you get one HTTP call:
curl "https://peopledb.co/api/v1/email_verifications?email_address=somebody@example.com" \
  -H "Authorization: Bearer $PEOPLEDB_TOKEN"
{
  "email": "somebody@example.com",
  "valid": true,
  "classification": "valid",
  "score": 95,
  "checks": {
    "syntax": true,
    "mx_record": true,
    "smtp_deliverable": true,
    "disposable": false,
    "typo_suggestion": null,
    "accepts_any_email": false
  },
  "warnings": [],
  "errors": []
}
The classification has three states: valid, risky, invalid. Risky is the honest one — catch-all domains, large providers that won't tell you the truth over SMTP, and disposable addresses all surface there. The score is 0–100 so you can pick your own threshold per use case (a stricter cutoff for cold outreach than for password reset).
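Picking a threshold per use case might look like the sketch below. It reads only the `classification` and `score` fields shown in the response above; the cutoff values and the `should_send` helper are assumptions for illustration, not recommendations from the API.

```python
import json

# Assumed cutoffs; tune them against your own bounce data.
THRESHOLDS = {"cold_outreach": 90, "password_reset": 50}

def should_send(result: dict, use_case: str) -> bool:
    """Decide whether to mail an address, given a parsed verification response."""
    if result["classification"] == "invalid":
        return False
    return result["score"] >= THRESHOLDS[use_case]

sample = json.loads(
    '{"email": "somebody@example.com", "classification": "valid", "score": 95}'
)
print(should_send(sample, "cold_outreach"))  # True
```

With these cutoffs, a risky address scoring 60 would be skipped for cold outreach but still get its password reset email.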
If you've got a list, run it through this before you put it on a sender. SMTP is a forty-year-old protocol layered with anti-abuse, and the difference between a verifier that knows that and one that doesn't is the difference between a clean send and a deliverability incident.