📝 Follow-up to "I Built a Google Maps Email Scraper That Finds 74% More Emails Than the Competition" (Apr 16, 2026). Two weeks later, I shipped inline email validation (MX/SPF/DMARC) — and it turns out half of the emails the scraper had been finding wouldn't have delivered. Here's what I added and what I learned.
TL;DR
I built a Google Maps scraper on Apify that validates every email it finds inline (MX records + SPF + DMARC + catch-all detection). On a 168-record Austin dentists run, only 42 of the 89 emails found were graded "high deliverability" — meaning roughly half of the scraped emails would have hurt my domain reputation if I'd sent cold email to them unchecked.
Bigger surprise: in EU markets, the email hit rate jumps from ~5% to 85% when you add localized contact-page paths (kapcsolat, impressum, contacto, contatti, contactez-nous, kontakty). Most scrapers don't.
Code patterns + actor link below.
The problem
50+ Google Maps scrapers exist on Apify alone. Most do the same pipeline:
Search Google Maps → harvest place URLs → visit each website → regex-grep emails → return JSON
Output looks fine. But ~50% of those emails bounce or land in spam when you actually send to them. Why?
- Typos in the website itself (info@compant.com)
- Dead domains (MX lookup returns NXDOMAIN)
- Catch-all servers (accept any RCPT TO, then bounce silently)
- No SPF/DMARC at the receiver — your sender reputation gets clobbered
And in non-English markets, the scraper often returns no email at all because the contact page is at /kapcsolat or /impressum, not /contact.
Two fixes:
- Inline email validation (MX/SPF/DMARC + catch-all)
- Multilingual contact-page crawl
Inline email validation, in code
Five-layer probe per email:
```javascript
const dns = require('dns/promises');

async function validateEmail(email) {
  const [, domain] = email.split('@');
  const result = {
    mxRecords: 0, hasSpf: false, hasDmarc: false,
    smtpValid: null, isCatchAll: null, deliverability: 'unknown',
  };

  // 1. MX records — does the domain accept mail?
  try {
    const mx = await dns.resolveMx(domain);
    result.mxRecords = mx.length;
  } catch { result.mxRecords = 0; }
  if (result.mxRecords === 0) {
    result.deliverability = 'low';
    return result;
  }

  // 2-3. SPF + DMARC TXT lookups
  try {
    const txt = await dns.resolveTxt(domain);
    result.hasSpf = txt.some(arr => arr.join('').toLowerCase().startsWith('v=spf1'));
  } catch {}
  try {
    const dmarcTxt = await dns.resolveTxt(`_dmarc.${domain}`);
    result.hasDmarc = dmarcTxt.some(arr => arr.join('').toLowerCase().startsWith('v=dmarc1'));
  } catch {}

  // 4. SMTP RCPT TO probe (optional, often blocked by Gmail/Outlook)
  // ... skipped for brevity, see full code

  // 5. Roll up to grade
  if (result.mxRecords > 0 && result.hasSpf && result.hasDmarc) {
    result.deliverability = 'high';
  } else if (result.mxRecords > 0) {
    result.deliverability = 'medium';
  }
  return result;
}
```
Per-domain DNS probing takes ~50ms. For 100 emails, you spend ~5 seconds total. Caching by domain makes this cheaper across batches.
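Since all five probes key off the domain rather than the mailbox, the cache can be as simple as memoizing the in-flight promise per domain. A minimal sketch — `validateCached` and `domainCache` are my illustrative names, not the actor's:

```javascript
// Memoize validation per domain. Storing the promise (not the resolved
// value) means concurrent lookups for the same domain trigger only one
// round of DNS probes.
const domainCache = new Map();

async function validateCached(email, validate) {
  const domain = email.split('@')[1].toLowerCase();
  if (!domainCache.has(domain)) {
    domainCache.set(domain, validate(email));
  }
  return domainCache.get(domain);
}
```

With this in place, a batch of 100 emails spread over 30 domains only pays for 30 DNS round-trips.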
Compare this to paid validators:
- ZeroBounce: $0.007/email
- NeverBounce: $0.008/email
- Bouncer: $0.004/email
- Kickbox: $0.01/email
For 1,000 leads, that's $4-$10 you don't need to spend if validation is built into the scraper.
Multilingual contact-page crawl, in code
The actual content of the URL frontier:
```javascript
const CONTACT_PATHS = [
  // English
  '/contact', '/contact-us', '/about', '/about-us',
  // Hungarian
  '/kapcsolat', '/elerhetoseg',
  // German (Impressum is legally required)
  '/kontakt', '/impressum', '/ansprechpartner',
  // Spanish
  '/contacto', '/contactar', '/contactenos',
  // Italian
  '/contatti', '/contattaci',
  // French
  '/contactez-nous', '/nous-contacter',
  // Polish
  '/kontakt-z-nami', '/kontakty',
  // Czech / Slovak ('/kontakt' already listed under German)
  '/kontaktujte-nas',
  // Portuguese, BR + PT ('/contacto' already listed under Spanish)
  '/contato', '/contatos',
  // Dutch
  '/over-ons',
];
```
```javascript
async function crawlForEmails(baseUrl) {
  const emails = new Set();
  const re = /\b[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,}\b/gi;
  for (const path of CONTACT_PATHS) {
    try {
      const html = await fetch(new URL(path, baseUrl)).then(r => r.text());
      (html.match(re) || []).forEach(e => emails.add(e.toLowerCase()));
    } catch {}
  }
  return [...emails];
}
```
In a 20-record Berlin Mitte sample (Zahnarzt Berlin Mitte), this single change moved the email hit rate from 5% to 85%. Why? Because the German Impressum page is legally required to disclose owner name + email, so it's almost always present and almost always contains a real human's email.
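One caveat on the raw-HTML regex: it misses addresses that are percent-encoded inside `mailto:` links (`mailto:info%40praxis.de`) and happily matches noise like `image@2x.png`. Parsing `mailto:` hrefs explicitly covers the first case cleanly — `extractMailtos` is my name for this helper, not part of the actor's published code:

```javascript
// Pull addresses out of mailto: links, decoding percent-encoding and
// stripping any ?subject=... query suffix.
function extractMailtos(html) {
  const emails = new Set();
  const re = /href=["']mailto:([^"'?]+)/gi;
  let m;
  while ((m = re.exec(html)) !== null) {
    emails.add(decodeURIComponent(m[1]).trim().toLowerCase());
  }
  return [...emails];
}
```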
Real run stats
I ran the actor with geoGridTiles: 3 (a 3×3 viewport grid over Austin, TX) and maxResults: 150. Cross-tile dedup filtered 43 duplicates. Final dataset: 168 unique leads.
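The actor's tiling code isn't published, but splitting a bounding box into an n×n grid of viewports is straightforward; this sketch (my `gridTiles` helper and `bbox` shape, purely illustrative) shows the idea behind `geoGridTiles: 3`:

```javascript
// Split a bounding box into an n x n grid of smaller viewports,
// row-major. bbox: { south, west, north, east } in degrees.
function gridTiles(bbox, n) {
  const tiles = [];
  const latStep = (bbox.north - bbox.south) / n;
  const lngStep = (bbox.east - bbox.west) / n;
  for (let row = 0; row < n; row++) {
    for (let col = 0; col < n; col++) {
      tiles.push({
        south: bbox.south + row * latStep,
        west: bbox.west + col * lngStep,
        north: bbox.south + (row + 1) * latStep,
        east: bbox.west + (col + 1) * lngStep,
      });
    }
  }
  return tiles;
}
```

Each tile gets its own Maps search, which is why cross-tile dedup is needed: a business near a tile border shows up in two searches.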
| Metric | Count | Rate |
|---|---|---|
| With email | 89 / 168 | 53% |
| With phone | 168 / 168 | 100% |
| With website | 162 / 168 | 96% |
| Email graded "high" | 42 / 89 | 47% of emails |
| Lead readiness "hot" | 139 / 168 | 83% |
| Modern websites | 117 / 168 | 70% |
| Total cost | — | $0.84 |
| Runtime | — | 22 min |
EU markets gave even better numbers because of the Impressum law:
| Market | Sample | Email Hit Rate | High Deliverability |
|---|---|---|---|
| Austin TX (US) | 168 | 53% | 25% |
| Manhattan (US) | 25 | 64% | 32% |
| Shoreditch (UK) | 20 | 70% | 40% |
| Berlin Mitte (DE) | 20 | 85% | 65% |
Three patterns I'd reuse on any scraper
Preflight budget check. Estimate runtime BEFORE the run starts. If the estimate exceeds the configured timeout, refuse to start (zero events charged) and tell the user exactly which knob to lower. Users hate guessing whether a config will fit; once preflight shipped, my actor's timeout rate dropped from 20% to ~0%.
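The check itself is pure arithmetic over the input knobs. A sketch — the per-tile and per-lead constants below are made-up placeholders, not the actor's real timings:

```javascript
// Refuse to start a run whose estimated duration exceeds the timeout.
// SECS_PER_TILE / SECS_PER_LEAD are illustrative constants; calibrate
// them from your own run logs.
function preflight({ maxResults, geoGridTiles, timeoutSecs }) {
  const SECS_PER_TILE = 30;  // assumed: one Maps search per grid tile
  const SECS_PER_LEAD = 5;   // assumed: website crawl + email validation
  const estimate = geoGridTiles * geoGridTiles * SECS_PER_TILE
                 + maxResults * SECS_PER_LEAD;
  if (estimate > timeoutSecs) {
    throw new Error(
      `Estimated ${estimate}s exceeds timeout ${timeoutSecs}s; ` +
      `lower maxResults or geoGridTiles before starting.`
    );
  }
  return estimate;
}
```

The key property: the error message names the exact knob to turn, so the user never has to binary-search their own config.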
Pay-per-result instead of CU-based. Apify lets you bill per "event" (PAY_PER_EVENT). I switched to $0.005 per delivered lead + $0.00005 per run start. Failed/timed-out runs cost $0. Customers love the predictability — they can budget exactly.
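The billing flow boils down to "deliver first, then charge." In this sketch, `chargeEvent` and `pushData` are stand-ins for the platform calls (Apify's SDK exposes `Actor.charge` for PAY_PER_EVENT actors; check the current docs for its exact signature), and `deliverAndBill` is my name:

```javascript
// Bill only after a lead has actually been written to the dataset, so
// a crash or timeout mid-run never charges for undelivered results.
async function deliverAndBill(leads, pushData, chargeEvent) {
  let billed = 0;
  for (const lead of leads) {
    await pushData(lead);                 // deliver first
    await chargeEvent('lead-delivered');  // then fire the billable event
    billed++;
  }
  return billed;
}
```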
Delta mode. Pass the previous run's dataset ID; skip already-seen placeId AND cid BEFORE any billable event fires. Weekly recurring scrape costs the same as a one-off — you only pay for genuinely new businesses.
```javascript
// Skip-before-bill check
if (knownPlaceIds.has(item.placeId) || knownCids.has(item.cid)) {
  continue; // no enrich, no email validation, no billing event
}
```
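Building the skip sets from the previous run's items can be sketched like this — how you fetch the old items is platform-specific (on Apify, open the previous dataset by ID and page through its items), and `buildSkipSets` is my illustrative name:

```javascript
// Collect place IDs and CIDs from a previous run's dataset items so the
// current run can skip them before any billable work happens.
function buildSkipSets(previousItems) {
  const knownPlaceIds = new Set();
  const knownCids = new Set();
  for (const item of previousItems) {
    if (item.placeId) knownPlaceIds.add(item.placeId);
    if (item.cid) knownCids.add(item.cid);
  }
  return { knownPlaceIds, knownCids };
}
```

Checking both IDs matters because Google occasionally rotates one but not the other for the same business.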
Try it / fork it
The actor is on Apify Store: Google Maps Email Extractor with Built-in Email Validation
Free tier gives ~100 leads to test on your specific market. Drop a query like dentist Austin Texas or Zahnarzt Berlin Mitte and see what comes out.
Source code: the actor's source is closed on Apify, but the patterns above are MIT-licensed in this article — feel free to copy them into your own scraper. The biggest leverage is the multilingual contact-page list; the validation code is straightforward DNS plumbing.
If you've built something similar, or you have a market where my localization paths break, drop a comment. I'm tracking failure cases on the actor's Issues tab.
Top comments (1)
Curious — for anyone running cold-outreach pipelines, what's your DNS-only validation false-negative rate vs paid services like ZeroBounce/NeverBounce?
In my own testing, ~30% of emails graded "high deliverability" still bounced when sent, mostly because the receiver MX accepts but the inbox is dormant or the catch-all flag was misread.
Anyone getting better numbers with a different scoring scheme?