NexGenData

Posted on Jul 2 • Originally published at thenextgennexus.com

Email List Verification and Deliverability: The Complete Technical Guide


Why Email Verification Matters More Than You Think

A 2% bounce rate might not sound bad until you realize that mailbox providers (Gmail, Outlook, Yahoo) use bounce rates as a key signal for sender reputation. As a rule of thumb, crossing roughly 5% risks getting your entire domain flagged, and crossing 10% can land you in spam for everyone, including legitimate subscribers.

Email verification isn't just about cleaning your list — it's about protecting your sender reputation, which directly impacts revenue. Companies with poor sender reputation see 20-40% lower open rates across all campaigns.

The Email Verification Stack

A complete verification pipeline checks emails at multiple levels, from simple format validation to sophisticated deliverability prediction.

Level 1: Syntax Validation

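The simplest check: does the email address follow the RFC 5322 format? This catches typos, missing @ symbols, invalid characters, and malformed domains. Sounds basic, but 3-5% of user-submitted emails fail syntax validation. The function below checks a practical subset of the RFC 5322 grammar rather than the full specification, which also permits rarely used forms such as quoted local parts.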


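    import re

    # Pragmatic pattern: a common subset of RFC 5322, not the full grammar.
    EMAIL_PATTERN = re.compile(r'[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}')

    def validate_email_syntax(email):
        """Check an address against a practical subset of RFC 5322.

        Returns an (is_valid, reason) tuple.
        """
        # fullmatch, unlike match with '$', also rejects a trailing newline.
        if not EMAIL_PATTERN.fullmatch(email):
            return False, "Invalid format"

        local, domain = email.rsplit('@', 1)
        if len(local) > 64:
            return False, "Local part too long"
        if len(domain) > 253:
            return False, "Domain too long"
        if '..' in email:
            return False, "Consecutive dots"
        if local.startswith('.') or local.endswith('.'):
            return False, "Local part starts or ends with a dot"

        return True, "Valid syntax"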

Level 2: DNS and MX Record Verification

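Check that the domain exists and has MX (Mail Exchange) records configured. A domain that doesn't resolve at all can't receive email. A domain that resolves but has no MX records is a strong warning sign rather than absolute proof: RFC 5321 lets a sending server fall back to the domain's address (A/AAAA) record, so treat a missing MX as high risk. This catches expired domains, typo domains (gmal.com instead of gmail.com), and fake domains.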

Use the dns.resolver module from the third-party dnspython package to query MX records. Cache results, ideally for no longer than each record's TTL: MX records rarely change, and DNS lookups add latency at scale.
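
As a minimal sketch of the lookup-plus-cache idea, the snippet below queries MX records with dnspython and keeps results in an in-memory dict. The names `get_mx_hosts`, `_dns_mx_query`, and `CACHE_TTL`, and the one-hour cache lifetime, are assumptions made for illustration; a production version would more likely honour the TTL returned by DNS.

```python
import time

_CACHE = {}        # domain -> (expires_at, mx_hosts)
CACHE_TTL = 3600   # seconds; an assumed fixed lifetime, not the DNS record TTL

def _dns_mx_query(domain):
    """Return MX hosts sorted by preference. Requires dnspython (pip install dnspython)."""
    import dns.resolver  # third-party; imported lazily so the cache logic works without it
    try:
        answers = dns.resolver.resolve(domain, "MX")
    except (dns.resolver.NXDOMAIN, dns.resolver.NoAnswer):
        return []  # domain does not exist, or exists without MX records
    records = sorted(answers, key=lambda r: r.preference)
    hosts = [str(r.exchange).rstrip(".") for r in records]
    # A "null MX" record has an empty exchange, meaning the domain accepts no mail.
    return [h for h in hosts if h]

def get_mx_hosts(domain, query=_dns_mx_query, now=time.monotonic):
    """Return cached MX hosts for a domain, querying DNS only on a cache miss."""
    domain = domain.lower()
    cached = _CACHE.get(domain)
    if cached and cached[0] > now():
        return cached[1]
    hosts = query(domain)
    _CACHE[domain] = (now() + CACHE_TTL, hosts)
    return hosts
```

Timeouts and other resolver errors are deliberately left to propagate, so a transient DNS failure is not cached as "no MX records".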

Level 3: SMTP Verification

The most reliable check: connect to the mail server and ask if the address exists. The SMTP RCPT TO command triggers a response indicating whether the mailbox is valid. This catches non-existent mailboxes, disabled accounts, and full inboxes.

Important caveat: many mail servers implement catch-all policies (accepting all addresses regardless of existence) or greylisting (temporarily rejecting unknown senders). Your verification logic needs to handle both cases.
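
The exchange described above can be sketched with Python's standard smtplib. This is an illustration under assumptions, not a hardened verifier: the function names, the HELO name, and the MAIL FROM address are placeholders, and the mapping from reply codes to verdicts is a deliberately coarse one.

```python
import smtplib

def classify_rcpt_code(code):
    """Map an SMTP reply code for RCPT TO onto a coarse verdict."""
    if 200 <= code < 300:
        return "deliverable"      # note: a catch-all domain also answers this way
    if 400 <= code < 500:
        return "unknown"          # temporary failure, e.g. greylisting; retry later
    if 500 <= code < 600:
        return "undeliverable"    # permanent failure, e.g. 550 no such mailbox
    return "unknown"

def smtp_probe(address, mx_host, helo_name="verifier.example.com",
               mail_from="probe@verifier.example.com", timeout=10):
    """Ask mx_host whether it would accept mail for address, without sending a message."""
    try:
        with smtplib.SMTP(mx_host, 25, timeout=timeout) as smtp:
            smtp.helo(helo_name)
            code, _ = smtp.mail(mail_from)
            if code >= 400:
                return "unknown"  # the server rejected our sender, not the recipient
            code, _ = smtp.rcpt(address)
    except OSError:
        # Covers timeouts, refused connections, and smtplib.SMTPException.
        return "unknown"
    return classify_rcpt_code(code)
```

Treating every 4xx reply as "unknown" rather than "undeliverable" is what keeps greylisting from producing false negatives; those addresses can be retried after a delay.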

Level 4: Catch-All Detection

Catch-all domains accept email for any address, even non-existent ones. This makes SMTP verification useless for these domains — every address returns "valid." Detect catch-all domains by testing a random, definitely-fake address. If the server accepts it, the domain is catch-all.

For catch-all domains, you can't definitively verify individual addresses. Flag them separately and apply additional heuristics: common name patterns (firstname.lastname), role addresses (info@, sales@), and historical engagement data.
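
A minimal sketch of the random-address test follows. It assumes you already have some SMTP probing function; here that is the hypothetical `probe` argument, which takes an address and returns "deliverable", "undeliverable", or "unknown".

```python
import secrets

def is_catch_all(domain, probe):
    """Return True if the domain accepts a random address that should not exist.

    probe is a caller-supplied function: probe(address) -> "deliverable",
    "undeliverable", or "unknown".
    """
    # 24 random hex characters make a collision with a real mailbox very unlikely.
    fake_local = "verify-" + secrets.token_hex(12)
    return probe(f"{fake_local}@{domain}") == "deliverable"
```

Running this once per domain rather than once per address keeps the extra SMTP traffic small.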

Level 5: Disposable Email Detection

Disposable email services (Guerrilla Mail, Temp Mail, 10MinuteMail) provide throwaway addresses. These are useless for marketing — the addresses expire within hours. Maintain a blocklist of known disposable email domains (2000+ and growing) and reject them at signup.
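
A blocklist check might look like the sketch below. The three domains are only a tiny illustrative sample, and `is_disposable` is a name chosen here; in practice the set would be loaded from a maintained list.

```python
# Tiny illustrative sample; a real blocklist holds thousands of domains.
DISPOSABLE_DOMAINS = {"guerrillamail.com", "temp-mail.org", "10minutemail.com"}

def is_disposable(email, blocklist=DISPOSABLE_DOMAINS):
    """Return True if the address's domain, or any parent domain, is blocklisted."""
    domain = email.rsplit("@", 1)[-1].lower().rstrip(".")
    labels = domain.split(".")
    # Check "a.b.com", then "b.com", so subdomains of a listed domain are caught too.
    return any(".".join(labels[i:]) in blocklist for i in range(len(labels) - 1))
```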

Level 6: Risk Scoring

Combine all signals into a deliverability risk score: syntax check (pass/fail), MX records (valid/invalid/missing), SMTP response (deliverable/undeliverable/unknown), catch-all status (yes/no), disposable domain (yes/no), role address (info@ or admin@, which tend to have lower engagement), and free email provider (gmail.com, yahoo.com) versus corporate domain.

Score each email from 0-100 and set thresholds for your use case. For marketing campaigns, reject anything below 70. For transactional email, you can go lower since the user explicitly provided their address.
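
One way to combine the signals is a simple penalty table, sketched below. Every weight here is an assumption picked for illustration, as are the function name and the dictionary keys; tune them against your own bounce data. Higher scores mean safer addresses, matching the "reject anything below 70" rule above.

```python
# Assumed penalties (points deducted from 100); not values from any standard.
PENALTIES = {
    "mx_missing": 40,
    "smtp_unknown": 20,
    "catch_all": 20,
    "disposable": 50,
    "role": 10,
    "free_provider": 5,
}

def risk_score(signals):
    """Score an address from 0 (certain failure) to 100 (no risk signals).

    signals keys: syntax_ok (bool), mx ("valid"/"missing"/"invalid"),
    smtp ("deliverable"/"undeliverable"/"unknown"), and the booleans
    catch_all, disposable, role, free_provider.
    """
    # Hard failures: nothing else can rescue these addresses.
    if (not signals.get("syntax_ok", False)
            or signals.get("mx") == "invalid"
            or signals.get("smtp") == "undeliverable"):
        return 0
    score = 100
    if signals.get("mx") == "missing":
        score -= PENALTIES["mx_missing"]
    if signals.get("smtp") == "unknown":
        score -= PENALTIES["smtp_unknown"]
    for flag in ("catch_all", "disposable", "role", "free_provider"):
        if signals.get(flag, False):
            score -= PENALTIES[flag]
    return max(score, 0)
```

With these weights, a role address on a catch-all domain lands exactly on the marketing threshold of 70.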

Verification at Scale

Verifying a list of 100K emails requires careful architecture. SMTP verification is the bottleneck — each check requires a TCP connection to the mail server, and aggressive verification triggers rate limiting or IP blocks.

Best practices for scale: use connection pooling (reuse SMTP connections for same-domain emails), implement exponential backoff on rate limits, rotate source IPs for SMTP connections, process domains in batches (all emails for gmail.com together), and set reasonable timeouts (5-10 seconds per SMTP check).
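
Two of these practices, batching by domain and exponential backoff, can be sketched in a few lines. The helper names, the "rate_limited" status string, and the default delays are assumptions for illustration; `check` stands in for whatever SMTP verification function you use.

```python
import time
from collections import defaultdict

def group_by_domain(emails):
    """Group addresses by lowercased domain so each mail server is handled in one batch."""
    groups = defaultdict(list)
    for email in emails:
        groups[email.rsplit("@", 1)[-1].lower()].append(email)
    return dict(groups)

def backoff_delays(base=1.0, factor=2.0, max_delay=60.0, retries=5):
    """Yield exponentially growing delays in seconds, capped at max_delay."""
    delay = base
    for _ in range(retries):
        yield min(delay, max_delay)
        delay *= factor

def check_with_backoff(check, address, sleep=time.sleep, **backoff):
    """Call check(address); while it reports "rate_limited", wait and retry."""
    result = check(address)
    for delay in backoff_delays(**backoff):
        if result != "rate_limited":
            break
        sleep(delay)
        result = check(address)
    return result
```

If every retry is rate limited, the function returns "rate_limited" so the caller can requeue the address instead of hammering the server.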

Our Email Validator actor on Apify handles all of this automatically — it runs all 6 verification levels, processes lists in parallel with proper rate limiting, and outputs a scored CSV ready for import into your email platform.

Maintaining List Hygiene

Verification isn't a one-time event. Email addresses decay at 2-3% per month — people change jobs, abandon accounts, and switch providers. Set up a recurring verification schedule: verify new signups in real-time (API integration at the form level), re-verify your full list quarterly, remove hard bounces immediately after every campaign, and suppress addresses that haven't engaged in 6+ months.
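
The suppression rules in that schedule could be expressed roughly as follows. The record layout (`email`, `hard_bounced`, `last_engaged`) is hypothetical, and 180 days is used as an approximation of six months; for subscribers who have never engaged, this sketch assumes `last_engaged` holds their signup date.

```python
from datetime import datetime, timedelta

def addresses_to_suppress(subscribers, now, inactive_days=180):
    """Return emails that hard-bounced or have not engaged within inactive_days.

    subscribers: iterable of dicts with 'email' (str), 'hard_bounced' (bool),
    and 'last_engaged' (datetime).
    """
    cutoff = now - timedelta(days=inactive_days)
    return [
        sub["email"]
        for sub in subscribers
        if sub["hard_bounced"] or sub["last_engaged"] < cutoff
    ]
```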

API vs DIY: Cost Comparison

Commercial email verification APIs (ZeroBounce, NeverBounce, BriteVerify) charge $0.003-0.01 per verification. For a 100K list, that's $300-1000 per verification pass. Building your own pipeline costs more upfront (development time) but runs at roughly $0.001 per verification in infrastructure costs.
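
To make the trade-off concrete, the sketch below computes the cost of one pass and a rough break-even point. The per-check prices come from the figures above; the upfront development cost is a purely hypothetical input you would supply yourself.

```python
import math

def cost_per_pass(list_size, price_per_check):
    """Cost in dollars of verifying the whole list once."""
    return list_size * price_per_check

def breakeven_passes(upfront_cost, list_size, api_price, diy_price):
    """Number of full-list passes before a DIY pipeline pays for itself.

    Returns None when DIY is not cheaper per check.
    """
    saving = cost_per_pass(list_size, api_price) - cost_per_pass(list_size, diy_price)
    if saving <= 0:
        return None
    return math.ceil(upfront_cost / saving)
```

For a 100K list at $0.003 per check versus $0.001 for DIY, each pass saves about $200, so the answer depends almost entirely on how much development time you budget upfront and how often you re-verify.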

The middle ground: use a pre-built actor like our Email Validator that handles the infrastructure complexity at Apify's pay-per-use pricing. You get the cost efficiency of DIY without the maintenance burden.

Tools Referenced

Email Validator — 6-level email verification with risk scoring
Lead Gen AI Agent — Automated lead discovery + email enrichment
Google Maps Scraper — Business email extraction from Google Maps
Full nexgendata toolkit — 50+ data collection actors

Building a lead generation pipeline? Our Local Business Leads Data Pack on Gumroad includes pre-verified email lists for 50+ industries, updated weekly.

About the Author

The Next Gen Nexus covers AI agents, automation, and web data — practical guides for developers, analysts, and businesses working with data at scale.
