Need to find email addresses on a website? Here is a clean approach using regex patterns and DNS validation.
The Approach
Most email extraction tools just use a simple regex. But for production use, you need:
- Multiple regex patterns to catch different email formats
- DNS MX record validation to verify the domain can actually receive mail
- Filtering out false positives (image filenames, CSS classes)
- Deduplication
The Regex
const EMAIL_REGEX = /[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}/g;
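This catches most addresses you will find on real pages, though not every form the RFCs allow (quoted local parts, for instance). The raw results still need cleaning: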
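function cleanEmails(rawEmails) {
  // Normalize first, so duplicates that differ only in case or whitespace collapse
  const normalized = rawEmails.map(e => e.trim().toLowerCase());
  return [...new Set(normalized)]
    .filter(e => !e.endsWith(".png"))
    .filter(e => !e.endsWith(".jpg"))
    .filter(e => !e.includes("example.com"));
}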
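To see why the cleanup matters, here is a minimal sketch that runs the pattern over a made-up HTML snippet. The markup and the acme.test addresses are invented for illustration, and the cleanup is condensed into one chain rather than calling cleanEmails:

```javascript
// Same pattern as above, redeclared so this snippet runs on its own.
const EMAIL_REGEX = /[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}/g;

// Hypothetical sample markup, invented for illustration.
const sampleHtml = `
  <a href="mailto:Sales@Acme.test">Sales@Acme.test</a>
  <p>Contact sales@acme.test or support@acme.test</p>
  <img src="/img/logo@2x.png">
`;

// match() with a /g regex returns every match, or null when there are none.
const raw = sampleHtml.match(EMAIL_REGEX) ?? [];

// Normalize, deduplicate, then drop image filenames that look like emails.
const cleaned = [...new Set(raw.map(e => e.trim().toLowerCase()))]
  .filter(e => !/\.(png|jpe?g)$/.test(e));

console.log(raw);     // 5 raw matches, including "logo@2x.png"
console.log(cleaned); // [ 'sales@acme.test', 'support@acme.test' ]
```

The retina image name logo@2x.png matches the pattern, which is exactly the kind of false positive the filters are there for.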
MX Record Validation
import dns from "dns/promises";

async function validateDomain(domain) {
  try {
    const mx = await dns.resolveMx(domain);
    return mx.length > 0;
  } catch {
    return false;
  }
}
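Many addresses on a page share a domain, so it can help to look up each domain only once. Here is a rough sketch of that idea. validateEmails and the injected resolveMx parameter are names I made up for this example; in real use you would pass something like (d) => dns.resolveMx(d) from "dns/promises". The stub resolver below keeps the demo offline.

```javascript
// Validate each distinct domain once; results are cached per domain.
// `resolveMx` is injected so the function can run without network access.
async function validateEmails(emails, resolveMx) {
  const cache = new Map(); // domain -> Promise<boolean>

  const check = (domain) => {
    if (!cache.has(domain)) {
      // A failed lookup (e.g. ENOTFOUND) is treated as "no MX".
      cache.set(domain, resolveMx(domain).then(mx => mx.length > 0, () => false));
    }
    return cache.get(domain);
  };

  return Promise.all(
    emails.map(async (email) => {
      const domain = email.slice(email.lastIndexOf("@") + 1);
      return { email, domain, hasMx: await check(domain) };
    })
  );
}

// Stub resolver with hypothetical data, standing in for dns.resolveMx.
const fakeResolveMx = async (domain) => {
  if (domain === "acme.test") return [{ exchange: "mx.acme.test", priority: 10 }];
  throw new Error("ENOTFOUND");
};

validateEmails(["a@acme.test", "b@acme.test", "c@nope.test"], fakeResolveMx)
  .then(results => console.log(results));
```

Keep in mind that a missing MX record is a strong signal, not proof: under RFC 5321 a sending server falls back to the domain's address record when no MX exists.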
Provider Detection
function detectProvider(mxRecords) {
  if (mxRecords.some(r => r.exchange.includes("google"))) return "Google Workspace";
  if (mxRecords.some(r => r.exchange.includes("outlook"))) return "Microsoft 365";
  if (mxRecords.some(r => r.exchange.includes("protonmail"))) return "ProtonMail";
  return null;
}
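Note that detectProvider takes the MX record array itself, in the { exchange, priority } shape that dns.resolveMx returns. Below is a table-driven variant you could try, with hand-written sample records. The PROVIDERS table and the lowercase step are my additions, and the hostnames are illustrative samples, not live DNS results.

```javascript
// Substring -> provider name; the substrings match the checks shown above.
const PROVIDERS = [
  ["google", "Google Workspace"],
  ["outlook", "Microsoft 365"],
  ["protonmail", "ProtonMail"],
];

function detectProvider(mxRecords) {
  for (const [needle, name] of PROVIDERS) {
    // Lowercase first: DNS names are case-insensitive.
    if (mxRecords.some(r => r.exchange.toLowerCase().includes(needle))) return name;
  }
  return null;
}

// Hand-written sample records (illustrative only).
const googleMx = [{ exchange: "ASPMX.L.GOOGLE.COM", priority: 1 }];
const microsoftMx = [{ exchange: "acme-test.mail.protection.outlook.com", priority: 0 }];
const selfHostedMx = [{ exchange: "mx.acme.test", priority: 10 }];

console.log(detectProvider(googleMx));     // "Google Workspace"
console.log(detectProvider(microsoftMx));  // "Microsoft 365"
console.log(detectProvider(selfHostedMx)); // null
```

Substring matching is a heuristic: it will miss providers that are not in the table and cannot see through a mail gateway that sits in front of the real provider.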
Complete Workflow
- Fetch the webpage HTML
- Extract all email-like strings with regex
- Clean and deduplicate
- Validate each domain via MX records
- Detect email provider
- Return structured results
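The steps above can be wired together roughly like this. It is a sketch under a few assumptions: findEmails and its fetchHtml / resolveMx options are names invented for this example, the result shape is my own choice, and the dry run at the bottom uses stubbed data instead of a real page or real DNS.

```javascript
const EMAIL_REGEX = /[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}/g;

function detectProvider(mxRecords) {
  if (mxRecords.some(r => r.exchange.includes("google"))) return "Google Workspace";
  if (mxRecords.some(r => r.exchange.includes("outlook"))) return "Microsoft 365";
  if (mxRecords.some(r => r.exchange.includes("protonmail"))) return "ProtonMail";
  return null;
}

// fetchHtml(url) -> Promise<string>, resolveMx(domain) -> Promise<[{exchange, priority}]>
async function findEmails(url, { fetchHtml, resolveMx }) {
  // 1. Fetch the webpage HTML
  const html = await fetchHtml(url);
  // 2. Extract all email-like strings
  const raw = html.match(EMAIL_REGEX) ?? [];
  // 3. Clean and deduplicate
  const emails = [...new Set(raw.map(e => e.trim().toLowerCase()))]
    .filter(e => !/\.(png|jpe?g)$/.test(e) && !e.includes("example.com"));

  const results = [];
  for (const email of emails) {
    const domain = email.split("@")[1];
    // 4. Validate the domain via MX records (a failed lookup means no MX)
    let mx = [];
    try {
      mx = await resolveMx(domain);
    } catch {
      mx = [];
    }
    // 5-6. Detect the provider and collect a structured result
    results.push({ email, domain, valid: mx.length > 0, provider: detectProvider(mx) });
  }
  return results;
}

// Dry run with stubs (hypothetical page and DNS data).
const stubs = {
  fetchHtml: async () =>
    '<a href="mailto:Hi@Acme.test">mail</a> <img src="logo@2x.png"> bob@nomx.test',
  resolveMx: async (domain) => {
    if (domain === "acme.test") return [{ exchange: "aspmx.l.google.com", priority: 1 }];
    throw new Error("ENOTFOUND");
  },
};

findEmails("https://acme.test/contact", stubs).then(r => console.log(r));

// Real wiring on Node 18+ might look like this (not run here):
//   import dns from "dns/promises";
//   findEmails(url, {
//     fetchHtml: async (u) => (await fetch(u)).text(),
//     resolveMx: (d) => dns.resolveMx(d),
//   });
```

Passing the fetcher and resolver in as options keeps the pipeline testable offline, and it lets you swap in a cached resolver later without touching the extraction logic.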
Free Tools
I built two free tools that do this automatically:
- Email Extractor — find emails on any webpage
- Email Validator — verify addresses with MX records
Both on the Apify Store.
More tools: 60+ web scrapers, 15 MCP servers for AI, free market research.