Valentina Skakun for HasData

How to Build an Email Scraper (with Code + Free Tool)

This tutorial is for those who want to build their own email scraper. At the end, I'll show an improved version: a Streamlit app that anyone can use, with the source code available on GitHub.

Step 1: Get the Page Source Code

Before scraping emails, you need the raw HTML. You’ve got two ways to get it: either use a web scraping API or do it yourself with requests or headless browser tools like Selenium or Playwright.

Option A: Use a Scraping API

For sites that load content dynamically or block scrapers, APIs like HasData make life easier. They handle proxies, captchas, and JavaScript.
First, import the libraries, set variables (HasData API key, URL), and the request headers:

import requests
import json

api_key = "YOUR-API-KEY"
url= "https://example.com"

headers = {
    'Content-Type': 'application/json',
    'x-api-key': api_key
}

Then set up the request body with the target URL and extractEmails: true. This tells the API to return both the page content and a list of emails.

payload = json.dumps({
    "url": url,
    "proxyType": "datacenter",
    "proxyCountry": "US",
    "jsRendering": True,
    "extractEmails": True,
})

Now, make the request:

response = requests.post("https://api.hasdata.com/scrape/web", headers=headers, data=payload)

If you’re sticking with this method, jump to the next part – handling the API response.

Option B: Use Python’s requests Library

For static pages or quick tests, or if you just don’t want to mess with APIs, use requests to fetch the page:

import requests
import re

found_emails = set()
url = "https://example.com"
response = requests.get(url, timeout=10)

Next comes extracting the emails; we'll get to that in the following step.

Step 2: Extract Email Addresses

Once you have the HTML, there are a couple of ways to extract emails.

Option A: Using the API’s Built-in Email Extractor

After the HasData API request:

response = requests.post("https://api.hasdata.com/scrape/web", headers=headers, data=payload)

Parse the response and extract the email data:

data = response.json()
emails = data.get("emails", [])

Print the results:

print(url, " | ", emails)

It’s not much use to save emails from just one site, but we’ll handle saving after wrapping this in a loop.
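
For a site with a visible contact address, the printed line would look roughly like this (illustrative values, not real API output):

https://example.com  |  ['info@example.com']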

Option B: Use Regex to Extract Emails from HTML

If you’re going the hard way, use regex to extract emails matching a pattern:

if response.status_code == 200:
    email_pattern = r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Z|a-z]{2,}"
    emails = re.findall(email_pattern, response.text)
    for email in emails:
        found_emails.add((url, email))
else:
    print(f"[{response.status_code}] {url}")

I used a set here to avoid storing duplicates, since the same email is often scattered around the page.
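
Here's a minimal, self-contained sketch of that deduplication (the sample HTML string is made up for illustration):

import re

email_pattern = r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"
sample_html = 'Contact: <a href="mailto:info@example.com">info@example.com</a> or info@example.com'

found_emails = set()
for email in re.findall(email_pattern, sample_html):
    found_emails.add(("https://example.com", email))

# Three matches on the page, but the set keeps a single (url, email) pair
print(found_emails)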

Step 3: Loop Through Multiple URLs

To scrape more than one site, load a list of URLs from a file or a variable:

with open("urls.txt", "r", encoding="utf-8") as file:
    urls = [line.strip() for line in file if line.strip()]
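
urls.txt is just a plain text file with one URL per line, for example:

https://example.com
https://example.org/contact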

Then loop through them, scraping each and collecting emails:

for url in urls:

Don't forget a variable, declared before the loop, to store all the found emails:

results = []

Add new pairs of URL and emails:

        results.append({
            "url": url,
            "emails": emails
        })

The looping logic stays the same whether you use the API or scrape without it; only the fetching and extraction inside the loop change.
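
Put together, the loop might look like this for the API path (a sketch of what we've built so far; the regex path just swaps the request and extraction lines):

results = []

for url in urls:
    payload = json.dumps({
        "url": url,
        "proxyType": "datacenter",
        "proxyCountry": "US",
        "jsRendering": True,
        "extractEmails": True,
    })
    response = requests.post("https://api.hasdata.com/scrape/web", headers=headers, data=payload)
    data = response.json()
    emails = data.get("emails", [])

    results.append({
        "url": url,
        "emails": emails
    })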

Step 4: Save the Results

Finally, save the collected emails. The easiest way is to write them to JSON:

with open("results.json", "w", encoding="utf-8") as json_file:
    json.dump(results, json_file, ensure_ascii=False, indent=2)

Here's how to save to CSV as well (add import csv at the top):

with open("results.csv", "w", newline="", encoding="utf-8") as csv_file:
    writer = csv.writer(csv_file)
    writer.writerow(["url", "email"])  
    for result in results:
        for email in result["emails"]:
            writer.writerow([result["url"], email])

Other formats aren’t worth the trouble.

TL;DR

If you got lost, skipped stuff, or just want the code, this part is for you.

Email Scraper with HasData’s API

Here’s the full scraper code:

import requests
import json
import csv

api_key = "YOUR-API-KEY"

headers = {
    'Content-Type': 'application/json',
    'x-api-key': api_key
}

results = []

with open("urls.txt", "r", encoding="utf-8") as file:
    urls = [line.strip() for line in file if line.strip()]

for url in urls:
    payload = json.dumps({
        "url": url,
        "proxyType": "datacenter",
        "proxyCountry": "US",
        "jsRendering": True,
        "extractEmails": True,
    })

    try:
        response = requests.post("https://api.hasdata.com/scrape/web", headers=headers, data=payload)
        response.raise_for_status()
        data = response.json()
        emails = data.get("emails", [])

        results.append({
            "url": url,
            "emails": emails
        })

    except Exception as e:
        # Log the error and keep going with the next URL
        print(f"[error] {url}: {e}")
        results.append({
            "url": url,
            "emails": []
        })

with open("results.json", "w", encoding="utf-8") as json_file:
    json.dump(results, json_file, ensure_ascii=False, indent=2)

with open("results.csv", "w", newline="", encoding="utf-8") as csv_file:
    writer = csv.writer(csv_file)
    writer.writerow(["url", "email"])  
    for result in results:
        for email in result["emails"]:
            writer.writerow([result["url"], email])

I also added a try/except block to catch errors, so a single failed URL doesn't stop the whole run.

Email Scraper with Regex

If you’re anti-API and into the hardcore way, here’s the code:

import requests
import re
import csv


found_emails = set()
output_file = "found_emails.csv"
file_path = "urls.txt"


with open(file_path, "r", encoding="utf-8") as file:
    websites = [line.strip() for line in file if line.strip()]


email_pattern = r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Z|a-z]{2,}"
for website in websites:
    response = requests.get(website, timeout=10)
    if response.status_code == 200:
        emails = re.findall(email_pattern, response.text)
        for email in emails:
            found_emails.add((website, email))
    else:
        print(f"[{response.status_code}] {website}")


with open(output_file, "w", newline="", encoding="utf-8") as csv_file:
    writer = csv.writer(csv_file)
    writer.writerow(["Website", "Email"])
    for website, email in found_emails:
        writer.writerow([website, email])


print(f"Saved {len(found_emails)} emails to {output_file}")

But be ready: this won't work on every site. To improve it, swap requests for Selenium or another tool that mimics real user actions.
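
For example, here's a minimal sketch of the fetching step with Selenium instead of requests (assumes the selenium package is installed; Selenium 4.6+ downloads the browser driver for you):

import re
from selenium import webdriver
from selenium.webdriver.chrome.options import Options

email_pattern = r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"

options = Options()
options.add_argument("--headless=new")  # run Chrome without a visible window
driver = webdriver.Chrome(options=options)

try:
    driver.get("https://example.com")
    # page_source holds the HTML after JavaScript has run
    emails = set(re.findall(email_pattern, driver.page_source))
    print(emails)
finally:
    driver.quit()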

Bonus: Ready-to-Use Email Scraper Tool

Want to skip coding? Or just see how the tutorial code can be leveled up?
Try the free Email Scraper Tool built with Streamlit + HasData API. It works with Google Search, Maps, and raw URLs, and exports to CSV/JSON.
That’s all. Now go scrape what you need. Just don’t overdo it.
