Vivek Singh

Posted on May 9

How I built SMTP email verification at scale for findmemail.io

#architecture #backend #saas #showdev

When I started building findmemail.io, I made one architectural decision early: never return an email we haven't SMTP-verified at request time. No pattern guessing, no domain-only validation. This post is about how that constraint shaped the system.

Why SMTP verification at request time

Most B2B email finders do verification asynchronously, in batches. The result is a database where some entries are verified, some are stale, some never were. Users don't know which is which.

This kills cold email deliverability. A 5% bounce rate flags your sender domain. A 30% bounce rate (common with mixed-quality data) burns it.

The constraint at findmemail.io: every email returned to the user has been SMTP-probed in the last 7 days. If it's older or the probe failed, we don't return it.

The SMTP probe

For each candidate email, we do:

MX lookup — find the recipient's mail server
TCP connect — open port 25 to the MX
HELO/EHLO — introduce ourselves
MAIL FROM — declare a throwaway sender
RCPT TO — ask the server if it'll accept the recipient
QUIT — close cleanly, never DATA, never deliver
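Here is a minimal sketch of that sequence using Python's standard smtplib. It is an illustration, not the production code. The function name probe_rcpt, the sender and HELO names, and the smtp_factory parameter (there so the function can be exercised without a network) are all assumptions. MX resolution is left out because the standard library has no MX lookup, so mx_host is assumed to be resolved already.

```python
import smtplib


def probe_rcpt(mx_host, address, sender="probe@example.org",
               helo_name="probe.example.org", timeout=10.0,
               smtp_factory=smtplib.SMTP):
    """Run EHLO/HELO, MAIL FROM, RCPT TO, QUIT and return the last reply code.

    DATA is never issued, so no message is delivered.
    """
    # Connecting happens in the constructor when a host is given.
    conn = smtp_factory(mx_host, 25, local_hostname=helo_name, timeout=timeout)
    try:
        conn.ehlo_or_helo_if_needed()
        code, _ = conn.mail(sender)
        if code != 250:
            # The server refused the sender, so RCPT TO would be meaningless.
            return code
        code, _ = conn.rcpt(address)
        return code
    finally:
        try:
            conn.quit()
        except smtplib.SMTPServerDisconnected:
            # Some servers drop the connection before QUIT is answered.
            pass
```

A caller would pass the best-preference MX host for the recipient's domain and then classify the returned code.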

The key step is RCPT TO. The server returns:

250 — accept (email exists, deliverable)
550 — reject (mailbox doesn't exist)
421/451 — temporary failure (greylist or rate limit)

We classify each result, and only addresses that got a 250 are returned to the user.
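The classification can be sketched as a small mapping from reply code to verdict. The Verdict names are hypothetical. Treating every 4xx code as temporary, rather than only 421 and 451, is an assumption of this sketch. Anything that is not 250, 550, or 4xx is left as UNKNOWN instead of being guessed at.

```python
from enum import Enum


class Verdict(Enum):
    ACCEPTED = "accepted"    # 250: the server took the recipient
    REJECTED = "rejected"    # 550: mailbox does not exist
    TEMPORARY = "temporary"  # 4xx: greylist or rate limit, retry later
    UNKNOWN = "unknown"      # anything else, e.g. 252 or a policy block


def classify(code):
    """Map an SMTP reply code from RCPT TO to a verdict."""
    if code == 250:
        return Verdict.ACCEPTED
    if code == 550:
        return Verdict.REJECTED
    if 400 <= code < 500:
        return Verdict.TEMPORARY
    return Verdict.UNKNOWN


def is_returnable(code):
    """Only accepted addresses are candidates for the API response."""
    return classify(code) is Verdict.ACCEPTED
```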

Anti-anti-bot: how mail servers fight you

This sounds straightforward but isn't, because mail servers actively fight verification probes. Common defenses:

1. Catch-all domains. The server returns 250 for any address, so literally.anything@catchall-domain.com "accepts" and a 250 tells you nothing about whether the mailbox exists.

Detection: probe a known-bad address (asdf-not-a-real-user-12345@<domain>). If that also returns 250, the domain is catch-all. We tag it and don't return individual emails for the domain.
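A sketch of that check, assuming a probe callable that takes an address and returns the RCPT TO reply code (hypothetical, injected so the logic can be tested offline). It uses a random local part rather than a fixed string, which is a choice made for this sketch so a server cannot special-case the bogus address.

```python
import secrets


def is_catch_all(domain, probe):
    """Return True if the domain accepts an address that cannot plausibly exist.

    probe: callable(address) -> SMTP reply code for RCPT TO.
    """
    bogus = f"no-such-user-{secrets.token_hex(8)}@{domain}"
    return probe(bogus) == 250
```

Running this once per domain and storing the tag avoids spending a second probe on every lookup.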

2. Greylisting. Server returns 451 the first time, expects a retry. Real mail servers retry; one-shot probes don't.

Mitigation: retry the probe with exponential backoff up to 3 times across 1 hour. Track which domains require this.
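A rough sketch of the retry loop. The 5/10/20 minute schedule is an assumption chosen so that three retries fit inside an hour, and the sleep parameter is injected for testing. In a real system the retries would more likely be scheduled through a job queue than by blocking a worker.

```python
import time


def backoff_delays(retries=3, base=300.0, factor=2.0):
    """Delays in seconds before each retry: 300, 600, 1200 by default."""
    return [base * factor ** i for i in range(retries)]


def probe_with_retry(probe, address, sleep=time.sleep, delays=None):
    """Probe once, then retry while the server keeps answering with a 4xx code.

    probe: callable(address) -> SMTP reply code (hypothetical).
    Returns the last reply code seen.
    """
    delays = backoff_delays() if delays is None else delays
    code = probe(address)
    for delay in delays:
        if not 400 <= code < 500:
            break  # definitive answer, stop retrying
        sleep(delay)
        code = probe(address)
    return code
```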

3. Rate limiting per IP. Send too many RCPT TO requests, you get throttled or blacklisted.

Mitigation: rotate sender IPs, never probe the same domain more than 1x/min, distribute load.
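The once-per-minute rule can be sketched as a per-domain throttle. This is an in-memory, single-process illustration with hypothetical names. A distributed prober would need shared state, and IP rotation is not shown.

```python
import time


class DomainThrottle:
    """Allow at most one probe per recipient domain per min_interval seconds."""

    def __init__(self, min_interval=60.0, clock=time.monotonic):
        self.min_interval = min_interval
        self.clock = clock
        self._last = {}  # domain -> time of the last allowed probe

    def try_acquire(self, domain):
        """Return True and record the probe if the domain is not cooling down."""
        now = self.clock()
        last = self._last.get(domain)
        if last is not None and now - last < self.min_interval:
            return False
        self._last[domain] = now
        return True
```

A worker that gets False would put the address back on the queue rather than wait.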

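4. Anti-spoofing checks. The server can evaluate SPF for your MAIL FROM domain, and often reverse DNS for the connecting IP, before it answers RCPT TO. DKIM cannot be checked at this stage, because it signs message content that only arrives after DATA, which a probe never sends.

Mitigation: maintain a real, warmed-up sending domain with proper SPF/DKIM/DMARC, so that both the probe-time SPF check and the domain's general reputation hold up. Use a different domain than your product domain.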

5. Honeypot / tarpit. Server responds slowly to waste your time. Or returns 250 for every address to pollute your dataset.

Detection: time the response. >5 sec = suspicious. Cross-check against 2-3 addresses on the domain that previously got a 250: if they keep "accepting" but bounce when mail is actually sent to them, flag the domain.
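The timing half of that check is simple to sketch. The probe callable and the function name are hypothetical, and the 5 second threshold is the one mentioned above.

```python
import time


def timed_probe(probe, address, slow_after=5.0, clock=time.monotonic):
    """Run a probe and report (reply code, elapsed seconds, looks_like_tarpit).

    probe: callable(address) -> SMTP reply code.
    """
    start = clock()
    code = probe(address)
    elapsed = clock() - start
    return code, elapsed, elapsed > slow_after
```

A single slow reply can also come from an overloaded server, so the flag is better treated as one signal per domain than as a verdict.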

The cache strategy

Probing on every request would be slow and rude (you'd hammer the same MX). So:

First-time probe → cache result for 7 days
Re-probe on cache miss or once the cached entry has expired
Re-probe sooner if the user reports a bounce (feedback loop)
Background re-validation for top 1000 most-queried emails daily
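A minimal in-memory sketch of the first three rules: a 7-day TTL, a re-probe on miss or expiry, and invalidation when a user reports a bounce. Class and function names are hypothetical. A production cache would live in a shared store, and the daily background re-validation job is not shown.

```python
import time

SEVEN_DAYS = 7 * 24 * 3600


class ProbeCache:
    """Maps address -> (reply code, time stored), with a fixed TTL."""

    def __init__(self, ttl=SEVEN_DAYS, clock=time.time):
        self.ttl = ttl
        self.clock = clock
        self._entries = {}

    def get(self, address):
        """Return the cached code, or None if missing or older than the TTL."""
        entry = self._entries.get(address)
        if entry is None:
            return None
        code, stored_at = entry
        if self.clock() - stored_at > self.ttl:
            del self._entries[address]
            return None
        return code

    def put(self, address, code):
        self._entries[address] = (code, self.clock())

    def invalidate(self, address):
        """Call this when a user reports a bounce, to force a fresh probe."""
        self._entries.pop(address, None)


def lookup(address, cache, probe):
    """Serve from cache when fresh, otherwise probe and store the result."""
    cached = cache.get(address)
    if cached is not None:
        return cached
    code = probe(address)
    cache.put(address, code)
    return code
```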

This keeps p50 response time under 800ms while maintaining freshness.

Failure modes I had to learn the hard way

Major email providers (Google, Microsoft, AOL) often refuse to give a useful answer to RCPT TO probes. They return 252 for everything, a code RFC 5321 defines as "cannot VRFY user, but will accept message and attempt delivery", so it says nothing about whether the mailbox exists. So many spammers used SMTP probing to harvest valid addresses that the providers stopped telling you.

For these domains, we fall back to:

Pattern matching against historical sender data
LinkedIn-based name verification
Domain-pattern enrichment (if other emails at the company verified, the same pattern likely works)

We label these as "deliverable, lower confidence" in the API response. The user sees the difference.
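To illustrate the domain-pattern idea, here is a small sketch that infers the most common local-part pattern from addresses that already passed a probe and applies it to a new name. The pattern set, the function names, and the input shape are all hypothetical, and names are assumed to be non-empty ASCII. Any address produced this way would still carry the lower-confidence label.

```python
from collections import Counter

# Hypothetical pattern set: name -> builder for the local part.
PATTERNS = {
    "first.last": lambda f, l: f"{f}.{l}",
    "firstlast": lambda f, l: f"{f}{l}",
    "flast": lambda f, l: f"{f[0]}{l}",
    "first": lambda f, l: f,
}


def infer_pattern(verified):
    """Return the most common pattern among (first, last, email) tuples, or None."""
    counts = Counter()
    for first, last, email in verified:
        local = email.split("@", 1)[0].lower()
        for name, build in PATTERNS.items():
            if build(first.lower(), last.lower()) == local:
                counts[name] += 1
                break
    return counts.most_common(1)[0][0] if counts else None


def guess(first, last, domain, pattern):
    """Build a candidate address from a name and an inferred pattern."""
    local = PATTERNS[pattern](first.lower(), last.lower())
    return f"{local}@{domain}"
```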

What this enables

Because every email we return has passed an SMTP probe within the last 7 days, our customers get a bounce rate <2%. Compare that to the roughly 5-10% that industry reports attribute to Apollo and ZoomInfo.

This was the design constraint that shaped everything else about findmemail.io — the database is smaller than Apollo (32k+ companies vs millions) but the per-email quality is much higher.

Try it

Free tier on findmemail.io is 50 credits, no card required. The API returns deliverability confidence per email so you can decide which to send.

If you're shipping a B2B email finder yourself, feel free to ask architecture questions in the comments.

DEV Community