i built a scraping pipeline that extracts verified email addresses from marketing agency websites across 54 countries. here's the exact process, the code patterns, and what i learned scraping at scale.
why agency emails?
i sell cold outreach services to marketing agencies. to pitch them, i need their email addresses. buying lists is expensive and often outdated. scraping them myself means fresher data and zero cost.
the tools
```
beautifulsoup4   # HTML parsing
requests         # HTTP requests
re               # email regex extraction (stdlib)
json             # batch file management (stdlib)
smtplib          # sending (later, stdlib)
```
step 1: find agencies via web search
for each target city, i search for "digital marketing agency {city} {country} email contact". the first 2-3 pages of results usually contain 8-15 agency websites.
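the query-building part of that step is trivial to script. a minimal sketch, assuming targets come in as (city, country) pairs — the function names are mine, not from the pipeline:

```python
def build_query(city, country):
    """Build the search string used to find agencies in one city."""
    return f'digital marketing agency {city} {country} email contact'

def build_queries(targets):
    """Expand a list of (city, country) pairs into search queries."""
    return [build_query(city, country) for city, country in targets]
```

feeding each query to a search API (or a results scraper) and collecting the first 2-3 pages of URLs gives the agency list for step 2.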
step 2: extract emails from websites
for each agency URL, i scrape their contact page, about page, and homepage. the email extraction regex:
```python
import re

def extract_emails(html_text):
    pattern = r'[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}'
    raw = re.findall(pattern, html_text)
    # filter out image filenames and spam traps
    filtered = [e for e in raw if not any(
        ext in e.lower() for ext in
        ['.png', '.jpg', '.avif', '.svg', 'sentry', 'schema', 'cloudflare']
    )]
    return list(set(filtered))
```
the key insight: always check /contact, /contact-us, and /about pages. many agencies hide their email on the contact page only.
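step 2 end-to-end looks roughly like this. the candidate page paths follow the advice above; the helper names and timeout are my assumptions, and extract_emails is repeated so the snippet runs standalone:

```python
import re
import requests

EMAIL_RE = re.compile(r'[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}')
JUNK = ('.png', '.jpg', '.avif', '.svg', 'sentry', 'schema', 'cloudflare')

def extract_emails(html_text):
    """Same regex + junk filter as above, inlined for a runnable sketch."""
    raw = EMAIL_RE.findall(html_text)
    return list({e for e in raw if not any(j in e.lower() for j in JUNK)})

def candidate_pages(base_url):
    """Homepage plus the pages most likely to list an email."""
    base = base_url.rstrip('/')
    return [base, base + '/contact', base + '/contact-us', base + '/about']

def scrape_agency(base_url, timeout=10):
    """Collect every email found across an agency's key pages."""
    found = set()
    for url in candidate_pages(base_url):
        try:
            resp = requests.get(url, timeout=timeout)
            if resp.ok:
                found.update(extract_emails(resp.text))
        except requests.RequestException:
            continue  # dead path: try the next candidate page
    return sorted(found)
```

agencies with a contact form but no visible address will come back empty here — per the pitfalls below, skip those and move on.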
step 3: scan their site for personalization data
before emailing, i run my SEO analyzer against their domain. this gives me specific issues to reference in the pitch — missing alt text, no meta descriptions, slow page load. real problems, not generic "your SEO could be better."
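the analyzer itself isn't shown in the post, but the checks it mentions (missing alt text, no meta descriptions) can be sketched with beautifulsoup — treat this as an illustration, not the actual tool:

```python
from bs4 import BeautifulSoup

def quick_seo_issues(html_text):
    """Return a list of concrete, citable problems for the pitch."""
    soup = BeautifulSoup(html_text, 'html.parser')
    issues = []
    if not soup.find('meta', attrs={'name': 'description'}):
        issues.append('no meta description')
    missing_alt = [img for img in soup.find_all('img') if not img.get('alt')]
    if missing_alt:
        issues.append(f'{len(missing_alt)} image(s) missing alt text')
    if not soup.find('title'):
        issues.append('missing <title> tag')
    return issues
```

each string that comes back is a specific, verifiable claim you can drop into the email body.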
step 4: generate personalized emails
each email template includes:
- their domain name in the subject line
- specific SEO findings from the scan
- a clear value proposition
- link to a free sample or landing page
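wiring those four elements together is plain string templating. a sketch, assuming scan results arrive as a list of issue strings — the wording is illustrative, not the actual copy:

```python
TEMPLATE = """Subject: quick SEO wins for {domain}

hi -- i ran a quick scan of {domain} and found a few fixable issues:
{issues}

we fix exactly these problems for agencies. free sample here: {link}
"""

def render_email(domain, issues, link):
    """Fill the template: domain in the subject, findings as bullets."""
    bullet_list = '\n'.join(f'- {i}' for i in issues)
    return TEMPLATE.format(domain=domain, issues=bullet_list, link=link)
```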
the results
- 798 verified agency emails across 54 countries
- additional 222 dentist practices and 554 multi-niche prospects
- total pipeline: 1,204 emails ready to send
common scraping pitfalls
- image filenames match email regex — always filter .png, .jpg, .avif
- contact forms without visible emails — skip these, move to next agency
- cloudflare/sentry emails in page source — filter by domain
- rate limiting — add 2-3 second delays between requests
- broken SSL certificates — use `verify=False` with caution
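the last two pitfalls fold naturally into one fetch helper: a randomized 2-3 second delay between requests, and an explicit fallback to `verify=False` only when the certificate is broken. helper names are my own:

```python
import random
import time
import requests

def next_delay(low=2.0, high=3.0):
    """Randomized pause so request timing doesn't look robotic."""
    return random.uniform(low, high)

def polite_get(url, timeout=10):
    """GET with a delay; retry unverified only on a broken certificate."""
    time.sleep(next_delay())
    try:
        return requests.get(url, timeout=timeout)
    except requests.exceptions.SSLError:
        # broken cert: last-resort unverified retry, use with caution
        return requests.get(url, timeout=timeout, verify=False)
```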
the data product
i packaged the agency contacts into a downloadable CSV:
- free 50-agency sample — verify the data quality yourself
- SEO chrome extension — scan any site for SEO issues ($9)
- full outreach service — managed cold email campaigns
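the CSV packaging itself is a few lines with the stdlib csv module. the column names here are assumptions, not the product's actual schema:

```python
import csv

FIELDS = ['agency', 'domain', 'email', 'country']

def write_contacts(path, contacts):
    """Write a list of dicts (keyed by FIELDS) as a downloadable CSV."""
    with open(path, 'w', newline='') as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(contacts)
```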
the scraping code runs on a basic linux server with cron jobs. total infrastructure cost: $0/month (using existing server).
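a cron setup like that is a single crontab line. the schedule and script path below are placeholders, not the actual layout:

```shell
# illustrative crontab entry: run the scraper nightly at 02:15
15 2 * * * /usr/bin/python3 /opt/scraper/run_pipeline.py >> /var/log/scraper.log 2>&1
```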
if you're building outreach tools or scraping at scale, the hardest part isn't the code — it's maintaining data quality as you scale past hundreds of entries.