I built an email verification API from scratch

#node #javascript #webdev #opensource

Most email verification services are a black box. You send them an address, they send back a result, and you have absolutely no idea what happened in between — or what they did with the data.

I wanted to understand what "real" email verification actually looks like under the hood, so I built one from scratch in Node.js. No paid third-party APIs. No external dependencies beyond standard DNS and TCP. Open source so anyone can read exactly what it does with their data.

Here's how it works.

The Pipeline
Every address goes through up to 7 checks in sequence. The pipeline is fail-fast — if an early check fails definitively, later ones are skipped.
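To make "fail-fast" concrete, here is a minimal sketch of how such a runner could look. The `runPipeline` name and the `{ ok, definitive, reason }` result shape are assumptions made for illustration, not taken from the repo.

```javascript
// Hypothetical fail-fast runner. Each check is { name, run } and `run`
// resolves to { ok, definitive, reason } (an assumed shape).
async function runPipeline(email, checks) {
  const results = {};
  for (const { name, run } of checks) {
    const outcome = await run(email, results);
    results[name] = outcome;
    // A definitive failure makes the remaining checks pointless, so stop here.
    if (!outcome.ok && outcome.definitive) {
      return { email, valid: false, reason: outcome.reason, results };
    }
  }
  // Non-definitive failures (e.g. an unreachable SMTP server) do not abort the run.
  return { email, valid: true, reason: null, results };
}
```

With this shape, a syntax failure returns before any DNS or SMTP work is attempted, while an inconclusive check simply records its outcome and lets the next one run.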

Syntax Validation
Not just a basic regex. Full RFC 5322 compliance: checks local part length, quoted strings, valid special characters, domain format, and TLD presence.

// src/services/syntaxChecker.js
const RFC5322 = /^[a-zA-Z0-9.!#$%&'*+/=?^_`{|}~-]+@[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?(?:\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)*$/;

If this fails, we stop immediately — no point doing a DNS lookup on not_an_email.
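The length and TLD checks mentioned above sit outside what a single regex expresses comfortably. A rough sketch of how they might be layered around it follows; `precheckSyntax` and its reason strings are hypothetical, and the limits used are the commonly cited ones (64 characters for the local part, 254 for the whole address).

```javascript
// Commonly cited limits: local part <= 64 characters, full address <= 254.
const MAX_LOCAL_LENGTH = 64;
const MAX_ADDRESS_LENGTH = 254;

// Hypothetical pre-check; the name and reason strings are illustrative only.
function precheckSyntax(email) {
  if (typeof email !== 'string' || email.length > MAX_ADDRESS_LENGTH) {
    return { ok: false, reason: 'invalid_length' };
  }
  const at = email.lastIndexOf('@');
  if (at < 1 || at === email.length - 1) {
    return { ok: false, reason: 'missing_local_or_domain' };
  }
  const local = email.slice(0, at);
  const domain = email.slice(at + 1);
  if (local.length > MAX_LOCAL_LENGTH) {
    return { ok: false, reason: 'local_too_long' };
  }
  // Dot placement rules apply to unquoted local parts only.
  const quoted = local.startsWith('"') && local.endsWith('"');
  if (!quoted && (local.startsWith('.') || local.endsWith('.') || local.includes('..'))) {
    return { ok: false, reason: 'bad_dots' };
  }
  // Require at least "name.tld" with no empty labels.
  const labels = domain.split('.');
  if (labels.length < 2 || labels.some((label) => label.length === 0)) {
    return { ok: false, reason: 'missing_tld' };
  }
  return { ok: true, reason: null };
}
```

An address would then need to pass both this pre-check and the regex before the pipeline moves on to DNS.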

MX Record Lookup
Checks whether the domain actually has mail servers configured. This catches things that syntax validation never would:

user@gmail.con — syntactically valid, no MX records
user@thisdomaindoesnotexist.xyz — looks fine, undeliverable
Defunct company domains that still resolve but stopped accepting mail

Results are cached in memory for 10 minutes (configurable via MX_CACHE_TTL_MS) to avoid hammering DNS on repeated lookups for the same domain.

// Serve from the in-memory cache while the entry is still fresh.
const cached = cache.get(domain);
if (cached && cached.expiresAt > Date.now()) {
  return cached.result;
}
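For context, the cache check above could sit inside a lookup function along these lines. This is a sketch under assumptions: `lookupMx`, its `{ found, host }` result, and the injectable `resolver` and `now` parameters are hypothetical; only `dns.promises.resolveMx` and the `MX_CACHE_TTL_MS` setting come from Node.js and the text respectively.

```javascript
const dns = require('node:dns').promises;

// 10 minutes unless overridden through the environment.
const MX_CACHE_TTL_MS = Number(process.env.MX_CACHE_TTL_MS) || 10 * 60 * 1000;
const cache = new Map();

// `resolver` and `now` are injectable so the function can run without network access.
async function lookupMx(domain, resolver = (d) => dns.resolveMx(d), now = Date.now) {
  const cached = cache.get(domain);
  if (cached && cached.expiresAt > now()) {
    return cached.result;
  }

  let result;
  try {
    const records = await resolver(domain);
    // Lowest priority value is the preferred mail server.
    records.sort((a, b) => a.priority - b.priority);
    result = { found: records.length > 0, host: records[0]?.exchange ?? null };
  } catch (err) {
    // ENOTFOUND / ENODATA: the domain does not exist or has no MX records.
    if (err.code !== 'ENOTFOUND' && err.code !== 'ENODATA') throw err;
    result = { found: false, host: null };
  }

  cache.set(domain, { result, expiresAt: now() + MX_CACHE_TTL_MS });
  return result;
}
```

Caching negative results as well means a burst of signups with the same mistyped domain costs one DNS query rather than one per request.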

Disposable Domain Detection
Checked against a blocklist of 5,361 known throwaway providers, including Mailinator, TempMail, Guerrilla Mail, and thousands of others. The list is auto-generated via a script and can be refreshed with npm run download-blocklist.

Role-Based Address Detection
35 patterns that indicate a shared inbox rather than a real person:

const ROLE_ADDRESSES = new Set([
  'admin', 'noreply', 'no-reply', 'support', 'info',
  'help', 'contact', 'sales', 'billing', 'abuse',
  // ... 25 more
]);

Useful for signup flows and lead generation — sales@company.com is rarely someone's personal inbox.
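Both list-based checks (disposable domain and role-based address) reduce to a set lookup on one half of the address. A small sketch, assuming tiny sample lists in place of the real ones; `splitAddress`, `isDisposable`, and `isRoleBased` are hypothetical names, and ignoring a "+tag" suffix is a design choice of this sketch rather than something the project documents.

```javascript
// Tiny illustrative samples; the real lists are much longer.
const DISPOSABLE_DOMAINS = new Set(['mailinator.com', 'guerrillamail.com']);
const ROLE_ADDRESSES = new Set(['admin', 'noreply', 'no-reply', 'support', 'info']);

function splitAddress(email) {
  const at = email.lastIndexOf('@');
  return {
    local: email.slice(0, at).toLowerCase(),
    domain: email.slice(at + 1).toLowerCase(),
  };
}

function isDisposable(email) {
  return DISPOSABLE_DOMAINS.has(splitAddress(email).domain);
}

function isRoleBased(email) {
  // Assumption: drop a "+tag" suffix so support+billing@... still matches "support".
  const base = splitAddress(email).local.split('+')[0];
  return ROLE_ADDRESSES.has(base);
}
```

Using a `Set` keeps each lookup constant-time no matter how large the blocklist grows.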

Typo Detection
Levenshtein distance comparison against 30 major providers. Catches the typos that users actually make:

gmial.com → gmail.com
hotmial.com → hotmail.com
outloook.com → outlook.com

The distance threshold is 2: close enough to catch typos, far enough to avoid false positives.
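The comparison can be sketched with a textbook Levenshtein implementation. The four-entry `PROVIDERS` list below is a stand-in for the 30 providers mentioned above, and `suggestDomain` is a hypothetical name.

```javascript
// Stand-in for the full provider list.
const PROVIDERS = ['gmail.com', 'hotmail.com', 'outlook.com', 'yahoo.com'];
const MAX_DISTANCE = 2;

// Classic dynamic-programming edit distance, keeping only two rows in memory.
function levenshtein(a, b) {
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(
        prev[j] + 1,        // deletion
        curr[j - 1] + 1,    // insertion
        prev[j - 1] + cost  // substitution (or match)
      );
    }
    prev = curr;
  }
  return prev[b.length];
}

// Returns the closest provider within MAX_DISTANCE, or null if none qualifies.
function suggestDomain(domain) {
  const d = domain.toLowerCase();
  if (PROVIDERS.includes(d)) return null; // exact match, nothing to suggest
  let best = null;
  let bestDistance = MAX_DISTANCE + 1;
  for (const provider of PROVIDERS) {
    const distance = levenshtein(d, provider);
    if (distance < bestDistance) {
      best = provider;
      bestDistance = distance;
    }
  }
  return best;
}
```

Note that plain Levenshtein counts a swapped pair of letters (gmial → gmail) as two edits, which is one reason a threshold of 1 would miss the most common typos.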

Catch-All Detection
Some domains are configured to accept every incoming address regardless of whether the mailbox exists. anything@thatdomain.com gets through.

Detection works by probing a randomly generated address (e.g. _verify_abc123_nonexistent@domain.com). If the server accepts it, the domain is catch-all.
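A sketch of that idea, assuming a separate `probeMailbox` function that resolves to 'exists', 'not_found', or 'unknown' for a given address. Both function names and the three verdict strings are hypothetical.

```javascript
const crypto = require('node:crypto');

// Build an address that is extremely unlikely to exist on any real server.
function buildProbeAddress(domain) {
  const token = crypto.randomBytes(8).toString('hex');
  return `_verify_${token}_nonexistent@${domain}`;
}

// `probeMailbox` is an assumed async function: address -> 'exists' | 'not_found' | 'unknown'.
async function detectCatchAll(domain, probeMailbox) {
  const verdict = await probeMailbox(buildProbeAddress(domain));
  if (verdict === 'unknown') return null; // server unreachable, cannot tell
  return verdict === 'exists';            // accepted a random address => catch-all
}
```

Generating a fresh random token per probe avoids a server learning and special-casing a fixed test address.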

SMTP Mailbox Probe
The most interesting part: the verifier opens a raw TCP connection to port 25 of the MX host and performs the minimum possible handshake:

← 220 (server greeting)
→ EHLO verify.local
← 250 OK
→ MAIL FROM:<verify@verify.local>
← 250 OK
→ RCPT TO:<user@example.com>
← 250 OK (exists) or 550 (doesn't exist)
→ QUIT

No message is ever sent. No DATA command. The connection is closed immediately after RCPT TO.

Response codes mapped to results:

250, 251 → mailbox exists
550, 551, 552, 553, 554 → mailbox does not exist
421, 450, 451, 452 → temporary / unknown

Honest caveat: Most cloud hosting providers block outbound port 25. Railway (where this is deployed) is no exception, so this check typically returns "unknown". The address isn't marked invalid; it takes a -15 point penalty instead of -50. The checks that don't depend on port 25 still run fully and provide strong signal.
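The probe and the response-code mapping described above can be sketched with Node's built-in `net` module. This is an illustration under assumptions, not the project's code: `classifyRcptCode`, `probeMailbox`, the verdict strings, and the 5-second default timeout are all hypothetical.

```javascript
const net = require('node:net');

// Map the reply to RCPT TO onto a verdict, following the table above.
function classifyRcptCode(code) {
  if (code === 250 || code === 251) return 'exists';
  if (code >= 550 && code <= 554) return 'not_found';
  return 'unknown'; // 421, 450, 451, 452 and anything unexpected
}

function probeMailbox(mxHost, address, { timeoutMs = 5000, port = 25 } = {}) {
  return new Promise((resolve) => {
    const socket = net.createConnection({ host: mxHost, port });
    const commands = [
      'EHLO verify.local',
      'MAIL FROM:<verify@verify.local>',
      `RCPT TO:<${address}>`,
    ];
    let step = 0; // number of commands sent so far
    let buffer = '';
    let settled = false;

    const finish = (verdict) => {
      if (settled) return;
      settled = true;
      if (!socket.destroyed) socket.end('QUIT\r\n'); // never sends DATA
      resolve(verdict);
    };

    socket.setEncoding('utf8');
    socket.setTimeout(timeoutMs, () => { finish('unknown'); socket.destroy(); });
    socket.on('error', () => finish('unknown'));
    socket.on('close', () => finish('unknown'));
    socket.on('data', (chunk) => {
      if (settled) return;
      buffer += chunk;
      let newline;
      while ((newline = buffer.indexOf('\n')) !== -1) {
        const line = buffer.slice(0, newline).trimEnd();
        buffer = buffer.slice(newline + 1);
        // "250-..." is a continuation line; only "250 ..." (or bare "250") ends a reply.
        if (!/^\d{3}( |$)/.test(line)) continue;
        const code = Number(line.slice(0, 3));
        if (step === commands.length) return finish(classifyRcptCode(code));
        if (code >= 400) return finish('unknown'); // greeting, EHLO or MAIL FROM refused
        socket.write(commands[step++] + '\r\n');
      }
    });
  });
}
```

Every failure path (blocked port, timeout, refused greeting) resolves to 'unknown' rather than rejecting, which matches the idea that an unreachable server is a reason for a smaller penalty, not for marking the address invalid.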

The Scoring System
Every result includes a 0–100 deliverability score:

| Condition | Effect |
| --- | --- |
| Base score | 100 |
| Disposable domain | -40 |
| Mailbox not found | -50 |
| Mailbox unknown | -15 |
| Catch-all domain | -10 |
| Role-based address | -10 |
| No MX / invalid syntax | score → 0 |
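Applied to the `checks` object from the API response, the scoring rules could look like the sketch below. `scoreResult` is a hypothetical name, and two details are assumptions: that `mailbox_exists` is `true`, `false`, or the string "unknown", and that the result is clamped so it never drops below 0.

```javascript
// Hypothetical scorer over the `checks` object returned by the API.
function scoreResult(checks) {
  // Invalid syntax or no MX records zero the score outright.
  if (!checks.syntax || !checks.mx_found) return 0;

  let score = 100;
  if (checks.disposable) score -= 40;
  if (checks.mailbox_exists === false) score -= 50;
  else if (checks.mailbox_exists === 'unknown') score -= 15;
  if (checks.catch_all) score -= 10;
  if (checks.role_based) score -= 10;

  // Assumption: clamp so stacked penalties stay inside the 0-100 range.
  return Math.max(0, score);
}
```

For example, an otherwise clean address whose mailbox could not be probed scores 100 - 15 = 85.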

The Privacy Question
First thing someone asked when I posted about this: "How do we know you're not harvesting addresses and selling them to spammers?"

The honest answer: you shouldn't just trust me. The code is open source for exactly this reason. The pipeline is stateless — addresses are never written to disk, a database, or any external service. Read src/services/verifyEmail.js and verify it yourself.

A Docker image for self-hosting is on the roadmap, which would remove the trust question entirely.

The API
Single address:

GET /api/v1/verify?email=user@example.com

Bulk (up to 50):

POST /api/v1/verify/bulk
{ "emails": ["user@example.com", "test@gmail.com"] }

Response:

{
  "email": "user@gmail.com",
  "valid": true,
  "score": 85,
  "reason": "smtp_timeout",
  "suggestion": null,
  "checks": {
    "syntax": true,
    "mx_found": true,
    "mx_host": "gmail-smtp-in.l.google.com",
    "disposable": false,
    "role_based": false,
    "catch_all": false,
    "smtp_connectable": false,
    "mailbox_exists": "unknown"
  },
  "processing_time_ms": 538
}
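Calling the two endpoints from Node might look like the sketch below. The `BASE_URL` host is a placeholder, the helper names are hypothetical, and the shape of the bulk response is not described above, so the sketch returns each batch's parsed body as-is. It relies on the global `fetch` available in Node 18 and later.

```javascript
const BASE_URL = 'https://verifier.example.com'; // placeholder host, not the real deployment
const BULK_LIMIT = 50;

// Encode the address so characters such as "+" survive the query string.
function singleVerifyUrl(email) {
  return `${BASE_URL}/api/v1/verify?email=${encodeURIComponent(email)}`;
}

// Split a long list into batches that respect the bulk limit.
function chunk(items, size = BULK_LIMIT) {
  const out = [];
  for (let i = 0; i < items.length; i += size) {
    out.push(items.slice(i, i + size));
  }
  return out;
}

async function verifyBulk(emails) {
  const responses = [];
  for (const batch of chunk(emails)) {
    const res = await fetch(`${BASE_URL}/api/v1/verify/bulk`, {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ emails: batch }),
    });
    if (!res.ok) throw new Error(`Bulk verify failed with HTTP ${res.status}`);
    responses.push(await res.json()); // response shape is not assumed here
  }
  return responses;
}
```

Encoding matters more than it looks: an unencoded "+" in a query string is read as a space by many servers, which would silently verify the wrong address.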