Mate Technologies

Build Your Own Email Scraper with Python – Step by Step 🐍

Ever wanted to find public email addresses online automatically? In this tutorial, we’ll build EmailScout – Public Contact Finder, a Python tool that searches Google, scrapes pages, and exports results. We’ll break it down so even beginners can follow!

GitHub repo for this project: https://github.com/rogers-cyber/python-tiny-tools/tree/main/EmailScout_Public_Contact_Finder

Step 1: Setup & Install Dependencies

We’ll use a few libraries:

tkinter for the GUI (ships with Python)

ttkbootstrap for modern styling

requests for HTTP requests

BeautifulSoup (from the beautifulsoup4 package) for parsing HTML

re for regex email matching (also in the standard library)

Install the extra packages with pip:

pip install ttkbootstrap beautifulsoup4 requests
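
If you want to confirm everything installed correctly, you can check the imports from the command line (just a sanity check, not part of the project):

python -c "import ttkbootstrap, bs4, requests; print('All set')"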

Step 2: Import Libraries

Start your script by importing everything you’ll need:

import tkinter as tk
from tkinter import messagebox, filedialog
import ttkbootstrap as tb
from ttkbootstrap.scrolled import ScrolledText
import threading
import time
import json
import csv
import requests
import re
import os
import sys
from collections import defaultdict
from bs4 import BeautifulSoup

Explanation:
These imports give us GUI tools (tkinter and ttkbootstrap), threading for running tasks in the background, requests and BeautifulSoup for fetching and parsing pages, and csv/json for exporting results.

Step 3: Setup Basic Variables & Regex

We need a regex pattern to detect emails and a place to store results:

HEADERS = {"User-Agent": "Mozilla/5.0"}
SEARCH_URL = "https://www.google.com/search"

emails_found = set()
sources = defaultdict(list)
stop_event = threading.Event()
scrape_completed = False

# Regex pattern for emails
EMAIL_REGEX = re.compile(r"[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}")

Explanation:

emails_found stores unique emails.

sources keeps track of where each email was found.

EMAIL_REGEX matches common email formats (quick demo below).

stop_event allows us to stop the scraper mid-run.
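
To see the pattern in action, here’s a quick standalone check (the sample text is made up):

print(EMAIL_REGEX.findall("Contact alice@example.com or bob.smith@mail.co.uk today."))
# ['alice@example.com', 'bob.smith@mail.co.uk']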

Step 4: Create the GUI Window

We use ttkbootstrap to make a styled window:

app = tb.Window("EmailScout – Public Contact Finder", themename="superhero", size=(1300, 680))
app.grid_columnconfigure(0, weight=1)
app.grid_rowconfigure(1, weight=1)

Explanation:

This sets up a resizable window with a “superhero” theme.

grid_columnconfigure and grid_rowconfigure make the layout flexible.

Step 5: Input Section – Enter Keywords

Users need to input search queries:

input_card = tb.Labelframe(app, text="Search Keywords", padding=15)
input_card.grid(row=0, column=0, sticky="nsew", padx=10, pady=10)

tb.Label(input_card, text="One search per line (e.g. 'AI developer contact email')").pack(anchor="w")
keywords_input = ScrolledText(input_card, height=7)
keywords_input.pack(fill="both", expand=True)

Explanation:

A Labelframe organizes the input area.

ScrolledText allows multi-line input with scrollbars.

Step 6: Output Section – Live Results

We want users to see emails as they are found:

output_card = tb.Labelframe(app, text="Live Results", padding=15)
output_card.grid(row=1, column=0, sticky="nsew", padx=10, pady=10)

log = ScrolledText(output_card)
log.pack(fill="both", expand=True)
log.text.config(state="disabled")

Explanation:

This is a read-only scrollable text area.

We’ll append new emails to this as they are scraped.

Step 7: Footer Buttons

Add buttons for Start, Stop, and Export:

footer = tb.Frame(app)
footer.grid(row=2, column=0, sticky="ew", padx=10, pady=5)

start_btn = tb.Button(footer, text="Start", bootstyle="success", width=18)
start_btn.pack(side="left", padx=5)

stop_btn = tb.Button(footer, text="Stop", bootstyle="danger", width=15)
stop_btn.pack(side="left", padx=5)
stop_btn.config(state="disabled")

export_txt = tb.Button(footer, text="Export TXT", width=15)
export_txt.pack(side="left", padx=5)

export_csv = tb.Button(footer, text="Export CSV", width=15)
export_csv.pack(side="left", padx=5)

export_json = tb.Button(footer, text="Export JSON", width=15)
export_json.pack(side="left", padx=5)

Explanation:

Start begins scraping.

Stop interrupts a run in progress.

The Export buttons save results as TXT, CSV, or JSON.

(None of the buttons do anything yet; we’ll attach their callbacks in Step 12, once the functions exist.)

Step 8: Logging Helper

We need a simple function to log emails in real-time:

def log_line(t):
    log.text.config(state="normal")
    log.text.insert("end", t + "\n")
    log.text.see("end")
    log.text.config(state="disabled")
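
One caveat: Tkinter widgets aren’t guaranteed to be thread-safe, and Step 10 will call log_line from a background thread. If you run into crashes, a more defensive variant (my addition, not part of the original project) schedules the update on the main loop with app.after:

def log_line_safe(t):
    # app.after(0, ...) runs the callback on Tkinter's main loop,
    # so the widget is never touched from the worker thread
    app.after(0, lambda: log_line(t))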

Step 9: Google Search & Scraper Functions

Here’s the core scraping logic:

def google_search(query):
    params = {"q": query, "num": 5}
    r = requests.get(SEARCH_URL, params=params, headers=HEADERS, timeout=10)
    soup = BeautifulSoup(r.text, "html.parser")
    return [a["href"] for a in soup.select("a") if a.get("href", "").startswith("http")]

def scrape_page(url):
    try:
        r = requests.get(url, headers=HEADERS, timeout=10)
        return set(EMAIL_REGEX.findall(r.text))
    except requests.RequestException:
        return set()

Explanation:

google_search finds links from Google.

scrape_page downloads the page and finds emails using regex.
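
You can try these two functions on their own before wiring up the GUI. Keep in mind that Google may serve a CAPTCHA page to automated clients, in which case google_search returns few or no useful links (the query here is only an example):

urls = google_search("open source maintainer contact email")
print(f"Found {len(urls)} links")
if urls:
    print(scrape_page(urls[0]))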

Step 10: Running the Scraper in Threads

We don’t want the GUI to freeze, so we use threads:

def run_scraper(queries):
    global scrape_completed

    for q in queries:
        if stop_event.is_set(): return
        log_line(f"🔍 Searching: {q}")

        urls = google_search(q)
        for url in urls:
            if stop_event.is_set(): return

            emails = scrape_page(url)
            for e in emails:
                if e not in emails_found:
                    emails_found.add(e)
                    sources[e].append(url)
                    log_line(e)

            time.sleep(0.6)

    scrape_completed = True
    messagebox.showinfo("Done", f"Found {len(emails_found)} public emails.")

Explanation:

Loops through the queries and the URLs each one returns.

Adds new emails to the set, records the source URL, and logs them.

time.sleep(0.6) adds a short pause between requests so we don’t hammer servers.

This runs on a worker thread but updates the GUI directly, so the thread-safety caveat from Step 8 applies here too.

Step 11: Start & Stop Buttons

Connect the buttons to actions:

def start_scraping():
    global scrape_completed
    scrape_completed = False
    stop_event.clear()
    emails_found.clear()
    sources.clear()

    queries = [q.strip() for q in keywords_input.get("1.0", "end").splitlines() if q.strip()]
    if not queries:
        messagebox.showerror("Input Error", "Please enter at least one search query.")
        return

    log.text.config(state="normal")
    log.text.delete("1.0", "end")
    log.text.config(state="disabled")

    stop_btn.config(state="normal")
    start_btn.config(state="disabled")

    threading.Thread(target=run_scraper, args=(queries,), daemon=True).start()

def stop_scraping():
    stop_event.set()
    log_line("⛔ Stopped by user")
    stop_btn.config(state="disabled")
    start_btn.config(state="normal")
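
One small gap: when run_scraper finishes on its own, the Start button stays disabled. A thin wrapper (my addition; the full repo may handle this differently) restores the buttons, and start_scraping would launch it instead of run_scraper:

def run_scraper_and_reset(queries):
    try:
        run_scraper(queries)
    finally:
        # Restore the buttons whether the run finished or was stopped
        stop_btn.config(state="disabled")
        start_btn.config(state="normal")

# In start_scraping, swap the thread target:
# threading.Thread(target=run_scraper_and_reset, args=(queries,), daemon=True).start()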

Step 12: Exporting Results

Allow users to save emails:

def export_file(fmt):
    if not emails_found or not scrape_completed:
        messagebox.showerror("Export Error", "Nothing to export.")
        return

    path = filedialog.asksaveasfilename(defaultextension=f".{fmt}")
    if not path: return

    if fmt == "txt":
        with open(path, "w") as f:
            for e in sorted(emails_found):
                f.write(e + "\n")

    elif fmt == "csv":
        with open(path, "w", newline="") as f:
            w = csv.writer(f)
            w.writerow(["email", "source"])
            for e, s in sources.items():
                w.writerow([e, ", ".join(s)])

    elif fmt == "json":
        with open(path, "w") as f:
            json.dump(sources, f, indent=2)

    messagebox.showinfo("Exported", "File saved successfully.")
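
The callbacks above are defined but never attached to the Step 7 buttons. Before starting the app, wire them up (a minimal sketch; the full project may pass command= when creating the buttons instead):

start_btn.config(command=start_scraping)
stop_btn.config(command=stop_scraping)
export_txt.config(command=lambda: export_file("txt"))
export_csv.config(command=lambda: export_file("csv"))
export_json.config(command=lambda: export_file("json"))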

Step 13: Run the App

Finally, start the Tkinter main loop:

app.mainloop()

🎉 Congratulations!

You’ve built a fully functional public email finder with Python.
You can now:

Enter search queries

Scrape public emails from websites

Export results in TXT, CSV, or JSON

GitHub repo link for the full project:
https://github.com/rogers-cyber/python-tiny-tools/tree/main/EmailScout_Public_Contact_Finder

#Python #WebScraping #Automation #Tkinter #BeginnerPython #EmailScraper #DevTutorial
