A surprising amount of lead generation is just this: you have a list of company websites, and you need an email, a phone number, or a LinkedIn page for each one. No API hands that over for free, but the sites themselves publish it. Here is the whole recipe, no browser required.
The shape of the problem
For each domain you need to answer three questions: is the site alive, which pages would hold contact info, and what is on them. That is one homepage fetch, a little link analysis, and a few more fetches. Plain fetch handles all of it.
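As a rough sketch of the first question (is the site alive?), a single fetch with a timeout is enough. The function name `fetchHomepage`, the 10 second default, and the injectable `fetchImpl` parameter are assumptions made for illustration, not part of the original recipe. It relies on the global `fetch` available in Node 18 and later.

```javascript
// Sketch: fetch a site's homepage and report whether it is alive.
// fetchHomepage, timeoutMs and fetchImpl are illustrative names.
async function fetchHomepage(domain, { timeoutMs = 10000, fetchImpl = fetch } = {}) {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), timeoutMs);
  try {
    const res = await fetchImpl(`https://${domain}/`, {
      redirect: 'follow',
      signal: controller.signal,
    });
    if (!res.ok) return { alive: false, status: res.status };
    const html = await res.text();
    // res.url is the final URL after redirects; use it as the base
    // when resolving the page's links later.
    return { alive: true, status: res.status, baseUrl: res.url, html };
  } catch (err) {
    // DNS failure, TLS error, timeout, connection reset, and so on.
    return { alive: false, error: String(err) };
  } finally {
    clearTimeout(timer);
  }
}
```

Returning a plain result object instead of throwing keeps the per-domain loop simple: a dead site becomes one row with `alive: false` rather than an exception to handle.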
Finding the contact pages
Do not guess paths. Fetch the homepage, then rank its own links:
// Assumes $ is a cheerio-style handle on the parsed homepage HTML.
const CONTACT_RE = /contact|about|impressum|kontakt|support|team/i;
const links = [...$('a[href]')]
  .map((el) => $(el).attr('href'))
  .filter((href) => CONTACT_RE.test(href));
Keep same-host links only, deduplicate them by pathname, and sort so that contact beats about, which beats team. Three or four extra pages per site is plenty.
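The filtering and ranking step could look something like the sketch below. It takes the raw hrefs plus the homepage URL and uses only the built-in `URL` class. The name `rankContactLinks` and the exact `PRIORITY` order are assumptions: the text only fixes contact before about before team, so where kontakt, impressum, and support land is a guess.

```javascript
// Assumed priority order; only contact > about > team comes from the text.
const PRIORITY = ['contact', 'kontakt', 'impressum', 'about', 'support', 'team'];

function rankContactLinks(hrefs, baseUrl, limit = 4) {
  const base = new URL(baseUrl);
  const seen = new Map(); // normalized pathname -> { url, rank }
  for (const href of hrefs) {
    let url;
    try {
      url = new URL(href, base); // resolves relative hrefs against the homepage
    } catch {
      continue; // malformed href
    }
    if (!/^https?:$/.test(url.protocol)) continue; // skip mailto:, tel:, javascript:
    if (url.host !== base.host) continue; // same-host links only
    const path = url.pathname.toLowerCase();
    const rank = PRIORITY.findIndex((word) => path.includes(word));
    if (rank === -1) continue;
    // Treat /contact and /contact/ as the same page.
    const key = path.replace(/\/+$/, '') || '/';
    if (!seen.has(key)) seen.set(key, { url: url.origin + url.pathname, rank });
  }
  return [...seen.values()]
    .sort((a, b) => a.rank - b.rank)
    .slice(0, limit)
    .map((entry) => entry.url);
}
```

One caveat with the strict host comparison: a site that redirects from example.com to www.example.com will only match if `baseUrl` is the final URL after redirects, not the domain from the input list.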
Extraction, precision first
Emails come from two sources: mailto: links (high precision) and a pattern match over the HTML (high recall). The recall side needs filtering or you will ship garbage:
- Asset filenames match email patterns: logo@2x.png
- Package versions do too: core-js@3.32.1
- Error trackers embed DSN keys that look exactly like emails: c3ab85...@o1069899.ingest.sentry.io
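Here is one way the two sources and the three filters above might fit together. This is a sketch under assumptions, not the packaged implementation: the regexes, the list of asset extensions, and the names `isPlausibleEmail` and `extractEmails` are all illustrative, and it scans the raw HTML string rather than a parsed DOM.

```javascript
// Illustrative patterns; tune them against your own data.
const EMAIL_RE = /[a-z0-9._%+-]+@[a-z0-9.-]+\.[a-z]{2,}/gi;
const MAILTO_RE = /mailto:([^"'?\s>]+)/gi;
const ASSET_EXT_RE = /\.(png|jpe?g|gif|svg|webp|ico|css|js)$/i;

function isPlausibleEmail(candidate) {
  const [local, domain] = candidate.toLowerCase().split('@');
  if (!local || !domain) return false;
  if (ASSET_EXT_RE.test(domain)) return false;        // logo@2x.png
  if (!/\.[a-z]{2,}$/.test(domain)) return false;     // core-js@3.32.1
  if (/(^|\.)sentry\.io$/.test(domain)) return false; // Sentry DSN keys
  return true;
}

function extractEmails(html) {
  const emails = new Set();
  // High precision: addresses inside mailto: links.
  for (const m of html.matchAll(MAILTO_RE)) {
    const addr = m[1].toLowerCase();
    if (isPlausibleEmail(addr)) emails.add(addr);
  }
  // High recall: pattern match over the whole document, then filter.
  for (const m of html.matchAll(EMAIL_RE)) {
    const addr = m[0].toLowerCase();
    if (isPlausibleEmail(addr)) emails.add(addr);
  }
  return [...emails];
}
```

Running mailto: candidates through the same filter costs nothing and keeps the two sources deduplicated in one set, with the higher-precision addresses first.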
Phone numbers are worse. A pattern like 00\d{9,} will match random digit runs in inline JSON on almost every modern site. What survived testing: trust tel: links completely, and only accept on-page text that is explicitly formatted, meaning a + international prefix, a (555) 123-4567 style area code, or dot/dash separators.
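A minimal sketch of that rule follows. The specific patterns are assumptions chosen to cover the three formats named above, and `extractPhones` and `normalizePhone` are hypothetical helper names. Script and style blocks are stripped before the text patterns run, so inline JSON never gets a chance to match.

```javascript
const TEL_HREF_RE = /href=["']tel:([^"']+)["']/gi;

// One alternation so overlapping formats are matched only once.
// Order: +international, (555) 123-4567, then dot/dash separated.
const FORMATTED_PHONE_RE = new RegExp(
  [
    '\\+\\d{1,3}[\\s.-]?\\(?\\d{1,4}\\)?(?:[\\s.-]?\\d{2,4}){2,4}',
    '\\(\\d{3}\\)\\s?\\d{3}[.-]\\d{4}',
    '\\b\\d{3}[.-]\\d{3}[.-]\\d{4}\\b',
  ].join('|'),
  'g'
);

// Reduce to digits, keeping a leading + so duplicates collapse.
function normalizePhone(raw) {
  const digits = raw.replace(/\D/g, '');
  return raw.trim().startsWith('+') ? `+${digits}` : digits;
}

function extractPhones(html) {
  const phones = new Set();
  // tel: links are trusted as-is.
  for (const m of html.matchAll(TEL_HREF_RE)) phones.add(normalizePhone(m[1]));
  // Visible text only: drop scripts, styles, then remaining tags.
  const text = html
    .replace(/<(script|style)\b[^>]*>[\s\S]*?<\/\1>/gi, ' ')
    .replace(/<[^>]+>/g, ' ');
  for (const m of text.matchAll(FORMATTED_PHONE_RE)) phones.add(normalizePhone(m[0]));
  return [...phones];
}
```

Regex-based tag stripping is crude and is only meant to show the idea; with a parsed DOM already at hand, reading the text of the body with scripts removed would be the sturdier choice.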
Social profiles are anchor hrefs pointing at the networks' hostnames, minus the plumbing: share widgets (/sharer, /intent), login pages, and embed iframes all live on the same hosts as real profiles.
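To make the "minus the plumbing" part concrete, here is a sketch that keeps the first real profile per network from a list of absolute hrefs. The host list and the plumbing path prefixes are illustrative assumptions and far from complete; `extractSocialProfiles` and `networkFor` are hypothetical names.

```javascript
// Illustrative host list; extend as needed.
const SOCIAL_HOSTS = new Map([
  ['linkedin.com', 'linkedin'],
  ['facebook.com', 'facebook'],
  ['twitter.com', 'twitter'],
  ['x.com', 'twitter'],
  ['instagram.com', 'instagram'],
  ['youtube.com', 'youtube'],
]);

// Paths that live on the same hosts but are not profiles (assumed list).
const PLUMBING_RE =
  /^\/(sharer|sharing|shareArticle|share|intent|login|signup|embed|plugins|dialog)(\/|\.|$)/i;

function networkFor(hostname) {
  const host = hostname.toLowerCase();
  for (const [suffix, name] of SOCIAL_HOSTS) {
    // Accept the bare host and any subdomain (www., m., de., ...).
    if (host === suffix || host.endsWith(`.${suffix}`)) return name;
  }
  return null;
}

function extractSocialProfiles(hrefs) {
  const profiles = {};
  for (const href of hrefs) {
    let url;
    try {
      url = new URL(href); // absolute URLs only; relative hrefs are skipped
    } catch {
      continue;
    }
    const network = networkFor(url.hostname);
    if (!network) continue;
    if (url.pathname === '/' || PLUMBING_RE.test(url.pathname)) continue;
    // Keep the first profile per network, without query string or hash.
    if (!profiles[network]) profiles[network] = url.origin + url.pathname;
  }
  return profiles;
}
```

Embed iframes are excluded simply by feeding this function anchor hrefs only; the `embed` prefix in the path filter is a second line of defense for embed URLs that appear as links.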
Picking the best email
A site often exposes several addresses. Rank them: an inbox on the site's own domain beats a third-party one, and role inboxes like contact@, info@, and sales@ beat personal ones for cold outreach. Return the whole list, but give the buyer a bestEmail field so the common case needs no thought.
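The ranking can be expressed as a small scoring function. In this sketch the weights are an assumption: own-domain is worth more than role, so a personal address on the company domain still outranks info@ on a free mail provider. The text does not say which of the two rules wins, so adjust to taste. `scoreEmail` and `rankEmails` are hypothetical names.

```javascript
// Role inboxes named in the text.
const ROLE_LOCALS = ['contact', 'info', 'sales'];

function scoreEmail(email, siteDomain) {
  const [local, domain] = email.toLowerCase().split('@');
  const site = siteDomain.toLowerCase().replace(/^www\./, '');
  const ownDomain = domain === site || domain.endsWith(`.${site}`);
  const isRole = ROLE_LOCALS.includes(local);
  // Assumed weighting: own domain (2) outweighs role inbox (1).
  return (ownDomain ? 2 : 0) + (isRole ? 1 : 0);
}

function rankEmails(emails, siteDomain) {
  const ranked = [...emails].sort(
    (a, b) => scoreEmail(b, siteDomain) - scoreEmail(a, siteDomain)
  );
  // Whole list for completeness, bestEmail for the common case.
  return { bestEmail: ranked[0] ?? null, emails: ranked };
}
```

For example, given jane.doe@example.com, exampleco@gmail.com, and info@example.com for the site example.com, `bestEmail` comes out as info@example.com.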
What this costs
Nothing, basically. Twelve sites with up to five pages each ran in 21 seconds for well under a cent of compute. No browser, no proxy, no key. The data is on public pages that companies want you to find; the work is fetching politely and filtering carefully.
If you would rather not maintain the filters yourself, I packaged this as a pay-per-use actor: Website Contact Scraper: Emails, Phones & Social Profiles. It returns one row per site, and you are only charged when it actually finds a contact. The first 10 contact rows of every run are free.