If you're doing local lead generation -- for an agency, a SaaS startup, or your own outreach -- you need business contact data. The expensive way is a subscription to a tool like ZoomInfo, Apollo, or Lusha, which runs from a few hundred dollars a month to five figures a year. The scrappy way is pulling it from public business directories.
YellowPages.com has over 20 million business listings across every US city and category. Each listing includes the business name, phone number, address, website, hours, and sometimes an email address. It's one of the most comprehensive sources of small/medium business data that exists.
The catch: YellowPages uses Cloudflare protection that blocks simple HTTP scrapers. The usual Python stack (requests or Scrapy for fetching, BeautifulSoup for parsing) gets blocked within a few pages, because the block happens at the request level, before there is any HTML to parse. You need browser automation with anti-detection to extract data reliably.
What you get from a YellowPages listing
Each business listing contains:
- Business name and category
- Phone number (primary + sometimes mobile)
- Full address with city, state, ZIP
- Website URL
- Email address (when listed -- roughly 20% of businesses)
- Business hours
- Years in business
- Rating and review count
For lead generation, the combination of phone + email + website + category + location is exactly what you need to build targeted outreach lists.
Why most scrapers fail
YellowPages has progressively tightened its anti-bot protection:
- Cloudflare challenge -- every request is checked for browser fingerprint, TLS signature, and JavaScript execution. Python's requests library fails immediately.
- Rate limiting -- too many requests from the same IP in a short window triggers blocks.
- Dynamic rendering -- email addresses and some phone numbers are loaded via JavaScript after the initial page load. Simple HTML parsing misses them entirely.
- Page reuse detection -- reusing the same browser page for multiple listings causes data extraction to silently fail (emails drop from ~20% to ~3%).
These aren't theoretical problems. I built and broke several versions of a YP scraper before finding a reliable approach.
What actually works
The working approach uses:
- Headless Chrome with stealth modifications (spoofed navigator properties, WebGL fingerprint, plugin arrays)
- Fresh browser page per listing -- critical for email extraction reliability
- Structured data extraction from JSON-LD embedded in the page (more reliable than CSS selectors)
- Controlled concurrency -- enough parallelism to be fast, not enough to trigger blocks
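To illustrate the JSON-LD point, here is a minimal sketch that pulls business fields out of a page's `application/ld+json` script tags using only the Python standard library. It assumes the rendered HTML has already been fetched by the browser, and that the listing exposes schema.org-style properties (`name`, `telephone`, `email`, `url`, `address`). The exact fields YellowPages emits are an assumption here, and `extract_business` and `SAMPLE_HTML` are illustrative names, not part of the actor.

```python
import json
from html.parser import HTMLParser


class JsonLdParser(HTMLParser):
    """Collects the decoded contents of <script type="application/ld+json"> tags."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._chunks = []
        self.items = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._chunks = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._chunks.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            try:
                parsed = json.loads("".join(self._chunks))
            except json.JSONDecodeError:
                return  # skip malformed blocks instead of failing the whole page
            # A block may hold a single object or a list of objects.
            self.items.extend(parsed if isinstance(parsed, list) else [parsed])


def extract_business(html):
    """Return a flat dict for the first JSON-LD object with a name, else None."""
    parser = JsonLdParser()
    parser.feed(html)
    for item in parser.items:
        if not isinstance(item, dict) or "name" not in item:
            continue
        address = item.get("address")
        if not isinstance(address, dict):
            address = {}
        return {
            "name": item.get("name"),
            "phone": item.get("telephone"),
            "email": item.get("email"),
            "website": item.get("url"),
            "street": address.get("streetAddress"),
            "city": address.get("addressLocality"),
            "state": address.get("addressRegion"),
            "zip": address.get("postalCode"),
        }
    return None


# Made-up sample markup, shaped like a schema.org LocalBusiness block.
SAMPLE_HTML = """<html><head><script type="application/ld+json">
{"@type": "LocalBusiness", "name": "Acme Plumbing", "telephone": "(312) 555-0100",
 "address": {"@type": "PostalAddress", "addressLocality": "Chicago", "addressRegion": "IL"}}
</script></head><body></body></html>"""

business = extract_business(SAMPLE_HTML)
```

Missing fields come back as `None` rather than raising, which matters when only a minority of listings carry an email.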
I packaged this into a YellowPages Scraper on Apify that handles all of this automatically. You provide a search term (e.g., "plumbers") and location (e.g., "Chicago, IL"), and it returns structured data for every matching business.
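If you are rolling your own instead, the controlled-concurrency idea can be sketched with an `asyncio.Semaphore`. This is an illustration under assumptions, not the actor's code: `fetch_listing` is a hypothetical placeholder for "open a fresh browser page, load the listing, extract, close the page", and the concurrency limit and delay values are arbitrary starting points rather than measured thresholds.

```python
import asyncio
import random


async def fetch_listing(url):
    # Hypothetical placeholder: a real version would open a fresh browser page,
    # load the URL, extract the data, and close the page again.
    await asyncio.sleep(0.01)
    return {"url": url}


async def scrape_all(urls, max_concurrency=3, min_delay=0.5, max_delay=1.5,
                     fetch=fetch_listing):
    """Fetch every URL with at most max_concurrency requests in flight."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def worker(url):
        async with semaphore:
            # A jittered pause keeps requests from arriving in a regular rhythm.
            await asyncio.sleep(random.uniform(min_delay, max_delay))
            return await fetch(url)

    # gather() returns results in the same order as the input URLs.
    return await asyncio.gather(*(worker(u) for u in urls))


if __name__ == "__main__":
    # Placeholder URLs on a reserved example domain.
    urls = [f"https://example.com/listing/{i}" for i in range(5)]
    print(asyncio.run(scrape_all(urls, max_concurrency=2)))
```

Raising `max_concurrency` speeds things up until blocks start appearing, so it is worth increasing it gradually rather than guessing high.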
Building a lead gen pipeline
Raw data isn't a pipeline. Here's how to turn YellowPages data into actual outreach:
Step 1: Define your ICP
Before you scrape anything, know exactly who you're targeting. "Plumbers in Chicago" returns 2,000+ results. "Plumbers in Chicago with a website but no email listed" is a much smaller, more actionable list: these businesses have an online presence but are harder to reach, which suggests they could benefit from help with their digital presence.
Step 2: Extract and filter
Run the scraper with your category and location. Export to CSV. Filter by:
- Has website (indicates some digital maturity)
- Has email (for email outreach)
- Years in business > 3 (established enough to have budget)
- Has reviews (actively engaged with customers)
Step 3: Enrich
YellowPages gives you the basics. For B2B outreach, you might want to enrich with:
- LinkedIn company page (search by business name + city)
- Revenue band (SBA size standards by NAICS code give a small-business ceiling, not an actual revenue figure)
- Technology stack (via their website)
Step 4: Outreach
With a clean, filtered list, you can run personalized outreach. The key differentiator is specificity -- "I noticed your plumbing business in Lincoln Park has great reviews but your website doesn't show up for 'emergency plumber Chicago'" is 10x more effective than a generic cold email.
Scale considerations
YellowPages has listings for every US city and hundreds of business categories. A few numbers:
- "Restaurants" in New York City: 8,000+ listings
- "Dentists" in Los Angeles: 3,000+ listings
- "HVAC" nationwide: 50,000+ listings
The Apify actor handles pagination automatically. For large extractions, use the maxResults parameter to control volume and cost.
Cost comparison
| Source | Cost | Data quality |
|---|---|---|
| ZoomInfo | $15,000+/year | High, but expensive |
| Apollo | $200-500/month | Good for tech companies |
| Lusha | $300+/month | Good for direct contacts |
| YellowPages scraper | ~$0.005/result | Local business focus |
For local business lead generation specifically, the cost difference can be 10-50x, depending on how many results you pull each month.
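As a rough check on that multiple, here is the arithmetic as a short sketch. It uses the ~$0.005/result figure and the $200-500/month subscription range from the table. The monthly volumes are illustrative assumptions, and the function names are made up for this example.

```python
def scrape_cost(results, price_per_result=0.005):
    """Pay-per-result cost in dollars for a given number of results."""
    return results * price_per_result


def cost_ratio(subscription_per_month, results_per_month, price_per_result=0.005):
    """How many times more the subscription costs than scraping that volume."""
    return subscription_per_month / scrape_cost(results_per_month, price_per_result)


if __name__ == "__main__":
    # Illustrative monthly volumes; the subscription prices come from the table.
    for results in (2_000, 4_000):
        for subscription in (200, 500):
            ratio = cost_ratio(subscription, results)
            print(f"{results} results vs ${subscription}/month: {ratio:.0f}x")
```

At 2,000 results a month the scrape costs about $10, so a $200-500 subscription is roughly 20-50x more. At 4,000 results it is about $20, or roughly 10-25x. The gap narrows as volume grows, so the multiple only holds for modest monthly pulls.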
Getting started
- Go to the YellowPages Scraper
- Enter a search term and location
- Set max results (start with 50 to test)
- Export to CSV or connect via API
The actor runs on Apify's infrastructure, so you don't need to set up proxies, manage browsers, or handle Cloudflare challenges yourself.