Ava Torres

Posted on Mar 26

How I Built a Lead Generation Pipeline Using YellowPages Data

#leadgeneration #sales #automation #webscraping

If you're doing local lead generation -- for an agency, a SaaS startup, or your own outreach -- you need business contact data. The expensive way is paying for a tool like ZoomInfo, Apollo, or Lusha, at anywhere from $200-500/month to $15,000+/year. The scrappy way is pulling it from public business directories.

YellowPages.com has over 20 million business listings across every US city and category. Each listing includes the business name, phone number, address, website, hours, and sometimes an email address. It's one of the most comprehensive sources of small and medium-sized business data available.

The catch: YellowPages uses Cloudflare protection that blocks simple HTTP scrapers. Scrapers built on the usual Python stack (requests or Scrapy to fetch pages, BeautifulSoup to parse them) get blocked within a few pages. You need browser automation with anti-detection to extract data reliably.

What you get from a YellowPages listing

Each business listing contains:

Business name and category
Phone number (primary + sometimes mobile)
Full address with city, state, ZIP
Website URL
Email address (when listed -- roughly 20% of businesses)
Business hours
Years in business
Rating and review count

For lead generation, the combination of phone + email + website + category + location is exactly what you need to build targeted outreach lists.
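If you want to work with these records in code, a minimal sketch is a Python dataclass with one field per item above. The field names are illustrative assumptions, not the actor's actual output schema, and `is_outreach_ready` is a hypothetical helper that checks the phone + email + website + category + location combination.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Listing:
    """One YellowPages business record (field names are illustrative, not the actor's schema)."""
    name: str
    category: str
    phone: Optional[str] = None
    address: Optional[str] = None
    city: Optional[str] = None
    state: Optional[str] = None
    zip_code: Optional[str] = None
    website: Optional[str] = None
    email: Optional[str] = None
    hours: Optional[str] = None
    years_in_business: Optional[int] = None
    rating: Optional[float] = None
    review_count: int = 0

    def is_outreach_ready(self) -> bool:
        """True when phone, email, website, category, and location are all present."""
        return all([self.phone, self.email, self.website,
                    self.category, self.city, self.state])
```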

Why most scrapers fail

YellowPages has progressively tightened its anti-bot protection:

Cloudflare challenge -- every request is checked for browser fingerprint, TLS signature, and JavaScript execution. Python's requests library fails immediately.
Rate limiting -- too many requests from the same IP in a short window triggers blocks.
Dynamic rendering -- email addresses and some phone numbers are loaded via JavaScript after the initial page load. Simple HTML parsing misses them entirely.
Page reuse detection -- reusing the same browser page for multiple listings causes data extraction to silently fail (emails drop from ~20% to ~3%).

These aren't theoretical problems. I built and broke several versions of a YP scraper before finding a reliable approach.
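Because several of these failures are silent, it helps to check every fetch and every batch. A rough sketch: the challenge markers below are assumptions based on commonly seen Cloudflare interstitials, not a documented list, the "no JSON-LD means not fully rendered" rule is a heuristic, and the 10% email floor is a hypothetical threshold chosen to sit between the ~20% healthy rate and the ~3% failure rate described above.

```python
from typing import Optional, Sequence

# Strings that commonly appear on Cloudflare challenge pages. These are
# heuristic assumptions, not a documented or exhaustive list.
CHALLENGE_MARKERS = ("just a moment...", "cf-chl", "challenge-platform")


def classify_response(status: int, html: str) -> str:
    """Label one fetch as 'rate_limited', 'challenge', 'empty', or 'ok'."""
    if status == 429:
        return "rate_limited"
    body = html.lower()
    if status in (403, 503) or any(marker in body for marker in CHALLENGE_MARKERS):
        return "challenge"
    if "application/ld+json" not in body:
        # Page loaded but carries no structured data: likely not fully rendered.
        return "empty"
    return "ok"


def email_rate_looks_healthy(emails: Sequence[Optional[str]], floor: float = 0.10) -> bool:
    """Catch silent failures: a healthy batch yields emails for roughly 20% of listings."""
    if not emails:
        return False
    rate = sum(1 for e in emails if e) / len(emails)
    return rate >= floor
```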

What actually works

The working approach uses:

Headless Chrome with stealth modifications (spoofed navigator properties, WebGL fingerprint, plugin arrays)
Fresh browser page per listing -- critical for email extraction reliability
Structured data extraction from JSON-LD embedded in the page (more reliable than CSS selectors)
Controlled concurrency -- enough parallelism to be fast, not enough to trigger blocks

I packaged this into a YellowPages Scraper on Apify that handles all of this automatically. You provide a search term (e.g., "plumbers") and location (e.g., "Chicago, IL"), and it returns structured data for every matching business.
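If you want to see the moving parts, the extraction and concurrency ideas can be sketched in plain Python. This is a minimal sketch, not the actor's code: `fetch_html` is a hypothetical stand-in for whatever opens a fresh stealth browser page, loads one listing URL, returns the rendered HTML, and closes the page. The parser reads schema.org JSON-LD (`name`, `telephone`, `email`, `url`, and the `PostalAddress` fields), and an `asyncio.Semaphore` caps how many page loads are in flight; the default cap of 4 is an arbitrary starting point, not a tested safe limit.

```python
import asyncio
import json
from html.parser import HTMLParser
from typing import Awaitable, Callable, List, Optional


class JsonLdParser(HTMLParser):
    """Collects the raw text of every <script type="application/ld+json"> tag."""

    def __init__(self) -> None:
        super().__init__()
        self._in_jsonld = False
        self.blocks: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self.blocks.append("")

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_jsonld = False

    def handle_data(self, data):
        if self._in_jsonld:
            self.blocks[-1] += data


def extract_business(html: str) -> Optional[dict]:
    """Return contact fields from the first JSON-LD object with a name and telephone."""
    parser = JsonLdParser()
    parser.feed(html)
    parser.close()
    for raw in parser.blocks:
        try:
            data = json.loads(raw)
        except json.JSONDecodeError:
            continue  # malformed block: skip it rather than crash the whole run
        if isinstance(data, dict):
            items = data.get("@graph", [data])
        elif isinstance(data, list):
            items = data
        else:
            continue
        for item in items:
            if not (isinstance(item, dict) and item.get("name") and item.get("telephone")):
                continue
            addr = item.get("address")
            addr = addr if isinstance(addr, dict) else {}
            email = item.get("email")
            if isinstance(email, str) and email.startswith("mailto:"):
                email = email[len("mailto:"):]
            return {
                "name": item["name"],
                "phone": item["telephone"],
                "email": email,
                "website": item.get("url"),
                "street": addr.get("streetAddress"),
                "city": addr.get("addressLocality"),
                "state": addr.get("addressRegion"),
                "zip": addr.get("postalCode"),
            }
    return None


# Hypothetical stand-in: open a fresh browser page, load the URL, return the
# rendered HTML, then close the page. The browser/stealth code is not shown.
FetchFn = Callable[[str], Awaitable[str]]


async def scrape_all(urls: List[str], fetch_html: FetchFn,
                     max_concurrency: int = 4) -> List[Optional[dict]]:
    """Fetch and parse listings with at most max_concurrency fetches in flight."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def one(url: str) -> Optional[dict]:
        async with semaphore:  # bounds parallel page loads, not parsing
            html = await fetch_html(url)
        return extract_business(html)

    return list(await asyncio.gather(*(one(u) for u in urls)))
```

Parsing JSON-LD keys (`telephone`, `addressLocality`) instead of CSS classes means a visual redesign of the page is less likely to break extraction.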

Building a lead gen pipeline

Raw data isn't a pipeline. Here's how to turn YellowPages data into actual outreach:

Step 1: Define your ICP

Before you scrape anything, know exactly who you're targeting. "Plumbers in Chicago" is 2,000+ results. "Plumbers in Chicago with a website but no email listed" is a much smaller, more actionable list -- these businesses are online but hard to reach by email, which suggests they could use help with their digital presence.
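Writing the ICP down as code keeps it explicit and reusable across runs. A minimal sketch, where the `ICP` class and the `category`, `city`, `website`, and `email` keys are illustrative assumptions about your exported records:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ICP:
    """Hypothetical ideal-customer-profile filter, e.g. 'website but no listed email'."""
    category: str
    city: str
    require_website: bool = True
    exclude_listed_email: bool = True

    def matches(self, listing: dict) -> bool:
        # Case-insensitive match on category and city; missing values never match.
        if (listing.get("category") or "").lower() != self.category.lower():
            return False
        if (listing.get("city") or "").lower() != self.city.lower():
            return False
        if self.require_website and not listing.get("website"):
            return False
        if self.exclude_listed_email and listing.get("email"):
            return False
        return True
```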

Step 2: Extract and filter

Run the scraper with your category and location. Export to CSV. Filter by:

Has website (indicates some digital maturity)
Has email (for email outreach)
Years in business > 3 (established enough to have budget)
Has reviews (a sign of an active customer base)

Step 3: Enrich

YellowPages gives you the basics. For B2B outreach, you might want to enrich with:

LinkedIn company page (search by business name + city)
Estimated revenue band (SBA size standards, which cap revenue or headcount by NAICS code, give a rough upper bound)
Technology stack (via their website)
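For the technology stack, a crude but useful first pass is to fetch the homepage and look for well-known asset paths. This is a heuristic sketch with a deliberately tiny, illustrative signature table (e.g. `wp-content` for WordPress, `cdn.shopify.com` for Shopify); dedicated fingerprinting tools use much larger rule sets and will catch far more.

```python
from typing import List
from urllib.request import Request, urlopen

# Illustrative substring signatures only; real fingerprinting tools use far
# larger and more precise rule sets.
SIGNATURES = {
    "wp-content": "WordPress",
    "cdn.shopify.com": "Shopify",
    "static.wixstatic.com": "Wix",
    "squarespace.com": "Squarespace",
    "googletagmanager.com": "Google tag (GTM/gtag.js)",
    "connect.facebook.net": "Meta (Facebook) scripts",
}


def detect_stack(html: str) -> List[str]:
    """Return the sorted technologies whose signature appears in the page source."""
    body = html.lower()
    return sorted({tech for sig, tech in SIGNATURES.items() if sig in body})


def fetch_homepage(url: str, timeout: float = 10.0) -> str:
    """Download a business homepage as text (no JavaScript is executed)."""
    req = Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urlopen(req, timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")
```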

Step 4: Outreach

With a clean, filtered list, you can run personalized outreach. The key differentiator is specificity -- "I noticed your plumbing business in Lincoln Park has great reviews but your website doesn't show up for 'emergency plumber Chicago'" will land far better than a generic cold email.
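One way to keep that specificity at scale is to fill a template only when every personalization field is present, and skip the lead otherwise. In this sketch, `trade`, `area`, and `keyword` are hypothetical fields you'd add during enrichment (for instance, `keyword` from your own search check); the function only fills text, it does not verify rankings.

```python
from typing import Dict, Optional

# Hypothetical template; every {field} in it must appear in REQUIRED.
TEMPLATE = (
    "Hi {name} team,\n\n"
    "I noticed your {trade} business in {area} has {review_count} reviews, "
    "but your website doesn't show up for '{keyword}'."
)
REQUIRED = ("name", "trade", "area", "review_count", "keyword")


def personalize(lead: Dict, template: str = TEMPLATE) -> Optional[str]:
    """Fill the template, or return None if any personalization field is empty."""
    if any(not lead.get(field) for field in REQUIRED):
        return None  # skip the lead rather than send a half-generic email
    return template.format(**{field: lead[field] for field in REQUIRED})
```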

Scale considerations

YellowPages has listings for every US city and hundreds of business categories. A few numbers:

"Restaurants" in New York City: 8,000+ listings
"Dentists" in Los Angeles: 3,000+ listings
"HVAC" nationwide: 50,000+ listings

The Apify actor handles pagination automatically. For large extractions, use the maxResults parameter to control volume and cost.

Cost comparison

Source	Cost	Notes
ZoomInfo	$15,000+/year	High, but expensive
Apollo	$200-500/month	Good for tech companies
Lusha	$300+/month	Good for direct contacts
YellowPages scraper	~$0.005/result	Local business focus

For local business lead generation specifically, the cost difference can be 10-50x, depending on how many records you pull each month.
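The per-result model scales with volume while a subscription doesn't, so the multiple depends on how much you pull. A quick back-of-the-envelope sketch using the ~$0.005/result figure and the $200-500/month range from the table (both approximate):

```python
PER_RESULT_USD = 0.005  # approximate per-result price from the table above


def scraper_cost(results: int, per_result: float = PER_RESULT_USD) -> float:
    """Pay-per-result cost in USD for a given number of listings."""
    return results * per_result


def savings_multiple(results_per_month: int, subscription_usd_per_month: float) -> float:
    """How many times cheaper the per-result scraper is than a flat monthly subscription."""
    cost = scraper_cost(results_per_month)
    if cost <= 0:
        raise ValueError("results_per_month must be positive")
    return subscription_usd_per_month / cost


if __name__ == "__main__":
    for n in (1_000, 4_000, 20_000):
        print(f"{n:>6} results: ${scraper_cost(n):,.2f}, "
              f"{savings_multiple(n, 200):.0f}x-{savings_multiple(n, 500):.0f}x "
              f"cheaper than $200-500/month")
```

At 1,000 results a month that's $5, or 40-100x cheaper than a $200-500 subscription; at 20,000 results it's $100, and the gap narrows to 2-5x.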

Getting started

Open the YellowPages Scraper on Apify
Enter a search term and location
Set max results (start with 50 to test)
Export to CSV or connect via API

The actor runs on Apify's infrastructure, so you don't need to set up proxies, manage browsers, or handle Cloudflare challenges yourself.
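If you go the API route, a sketch using Apify's official Python client (`apify-client`) looks roughly like this. It's written against the dict-returning `ActorClient.call()` of the 1.x client, so check the client docs for your version. The actor ID is a placeholder, and the `searchTerm` and `location` input keys are guesses; check the actor's input schema before running. `maxResults` is the parameter mentioned above.

```python
import csv
import os
from typing import Dict, Iterable, List


def build_run_input(search: str, location: str, max_results: int = 50) -> Dict:
    # "searchTerm" and "location" are guessed key names; check the actor's
    # input schema. "maxResults" is the parameter named in the article.
    return {"searchTerm": search, "location": location, "maxResults": max_results}


def run_actor(actor_id: str, run_input: Dict, token: str) -> List[Dict]:
    """Start the actor, wait for it to finish, and return its dataset items."""
    from apify_client import ApifyClient  # pip install apify-client

    client = ApifyClient(token)
    run = client.actor(actor_id).call(run_input=run_input)
    if run is None:
        raise RuntimeError("actor run did not return run details")
    return list(client.dataset(run["defaultDatasetId"]).iterate_items())


def write_csv(items: Iterable[Dict], path: str) -> int:
    """Write items to CSV using the union of their keys as columns; return the row count."""
    rows = list(items)
    fieldnames = sorted({key for row in rows for key in row})
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
    return len(rows)


if __name__ == "__main__" and os.environ.get("APIFY_TOKEN"):
    ACTOR_ID = "YOUR_ACTOR_ID"  # placeholder: copy the real ID from the actor's page
    items = run_actor(ACTOR_ID, build_run_input("plumbers", "Chicago, IL"),
                      os.environ["APIFY_TOKEN"])
    print(write_csv(items, "plumbers_chicago.csv"), "rows written")
```

Keeping `max_results` at 50 for the first run mirrors the "start with 50 to test" advice and keeps the cost of a misconfigured input small.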

DEV Community