DEV Community

apify forge

Posted on • Originally published at apifyforge.com

Stop Manual Prospecting: How a 3-Actor Pipeline Finds and Scores B2B Leads

Most SDRs I've talked to spend 2-3 hours a day on manual prospecting. Open LinkedIn. Find a company. Google their website. Hunt for emails on the contact page. Copy-paste into a spreadsheet. Repeat 50 times. Maybe score them by gut feel. A 2024 Gartner report on sales development found that SDR teams spend 21% of their working time on manual data entry and research alone. That's a full day per week, gone.

I built the B2B Lead Gen Suite because I got tired of watching people run three separate tools in sequence when the whole thing could be one pipeline. You give it company URLs. It gives you back scored, graded, enriched leads with emails, phone numbers, contact names, email patterns, tech stack signals, and a 0-100 quality score. One input, one output, three actors chained together under the hood.

Here's how it works, when to use it, and what it actually costs compared to doing this manually or paying for Clay.

What is the B2B Lead Gen Suite?

The B2B Lead Gen Suite is an Apify actor that chains three specialized sub-actors into a single automated pipeline: Website Contact Scraper extracts emails, phones, and named contacts; Email Pattern Finder detects the company's email format and generates addresses; B2B Lead Qualifier scores each lead 0-100 on reachability, legitimacy, and web presence.

That's the snippet version. Here's the longer story.

I already had the three pieces built as standalone actors on ApifyForge. The Website Contact Scraper was pulling 11,000+ runs with a 99.8% success rate. The Email Pattern Finder was detecting patterns like first.last@company.com across domains. The B2B Lead Qualifier was grading leads A through F based on five scoring dimensions. People were already using all three together — just manually, one after the other, copying datasets between runs.

The suite eliminates the copying. It runs all three in sequence, passes data between them automatically, merges everything into a single enriched lead record, and sorts by score so your best prospects are at the top.

How does the 3-actor pipeline work?

The pipeline runs in three sequential steps. Step 1 crawls each company website to extract raw contact data. Step 2 analyzes discovered emails to detect the company's naming pattern and generate additional addresses. Step 3 re-crawls each site to score leads on five quality dimensions and assign a letter grade.

Each step feeds into the next. The contact scraper's output becomes the pattern finder's input. Both outputs feed into the qualifier. The orchestrator merges all three datasets by domain, deduplicates emails and phone numbers, and produces one combined record per company.
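
The merge-and-dedup step can be sketched as follows. This is a hypothetical illustration of the logic, not the orchestrator's actual code; the record shapes and field names are assumptions based on the output format shown later in this post.

```python
def merge_by_domain(contact_rows, pattern_rows, qualifier_rows):
    """Combine per-actor records keyed by domain, deduplicating contacts."""
    merged = {}
    for row in contact_rows + pattern_rows + qualifier_rows:
        rec = merged.setdefault(
            row["domain"], {"domain": row["domain"], "emails": [], "phones": []}
        )
        # Deduplicate emails case-insensitively, preserving discovery order
        for email in row.get("emails", []):
            if email.lower() not in (e.lower() for e in rec["emails"]):
                rec["emails"].append(email)
        for phone in row.get("phones", []):
            if phone not in rec["phones"]:
                rec["phones"].append(phone)
        # Later pipeline steps contribute extra fields (pattern, score, grade)
        for key, value in row.items():
            if key not in ("domain", "emails", "phones"):
                rec[key] = value
    # Sort best prospects first, as the suite does
    return sorted(merged.values(), key=lambda r: r.get("score", 0), reverse=True)
```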

Step 1: Website Contact Scraper crawls up to 20 pages per domain (default is 5) looking for emails, phone numbers, named contacts with job titles, and social media links. It uses Apify's CheerioCrawler — fast HTTP parsing, no browser overhead. According to Apify's 2025 platform benchmarks, CheerioCrawler processes pages 5-10x faster than browser-based approaches while using a fraction of the compute.

Step 2: Email Pattern Finder takes the scraped emails and contact names, detects the dominant pattern (first.last@, firstinitiallast@, first@, etc.), and generates email addresses for any named contacts who didn't have a discoverable email. It also searches GitHub commits for additional email evidence. According to research by Return Path (now Validity), the first.last@ pattern alone accounts for roughly 32% of corporate email formats globally — but there are over a dozen common variations, and guessing wrong means bounced emails and damaged sender reputation.
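
Pattern-based generation boils down to filling a template with a contact's name parts. Here is a minimal sketch; the pattern vocabulary (`first.last`, `flast`, etc.) is an assumption about common formats, not the actor's exact internal naming.

```python
def generate_email(name, domain, pattern):
    """Generate an address for a named contact from a detected pattern."""
    parts = name.lower().split()
    first, last = parts[0], parts[-1]
    formats = {
        "first.last": f"{first}.{last}",
        "flast": f"{first[0]}{last}",
        "first": first,
        "firstlast": f"{first}{last}",
    }
    local = formats.get(pattern)
    return f"{local}@{domain}" if local else None
```

This is also why a wrong pattern guess is costly: every generated address inherits the mistake, and each bounce chips away at sender reputation.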

Step 3: B2B Lead Qualifier makes a second pass over each domain to gather business signals. It scores leads across five dimensions:

| Dimension | What it measures |
| --- | --- |
| Contact Reachability | How many emails, phones, and named contacts are accessible |
| Business Legitimacy | Company address, registration signals, terms/privacy pages |
| Online Presence | Social media profiles, review site presence |
| Website Quality | SSL, load speed, CMS detection, mobile-friendliness |
| Team Transparency | Named team members, leadership pages, org structure signals |

Each lead gets a 0-100 score and an A-F letter grade. The output is sorted highest score first.
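
The score-to-grade mapping can be sketched from the bands this post uses elsewhere (70+ is B or better, 50-69 is C, below 50 is D/F, and the sample record below maps 82 to an A). The exact A and D cut-offs are assumptions:

```python
def grade(score):
    """Map a 0-100 lead score to a letter grade (cut-offs are assumed)."""
    if score >= 80:
        return "A"
    if score >= 70:
        return "B"
    if score >= 50:
        return "C"
    if score >= 35:
        return "D"
    return "F"
```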

What data comes back in each lead record?

Every enriched lead record includes the domain, URL, all discovered emails and phone numbers, named contacts with titles, social links, detected email pattern with confidence score, generated email addresses, lead score, grade, score breakdown across five dimensions, qualification signals, tech stack signals, business address, CMS platform, and pipeline metadata.

That's a lot of fields. Let me show what one actually looks like in practice. Say you run it against apify.com:

{
  "domain": "apify.com",
  "url": "https://apify.com",
  "emails": ["info@apify.com", "support@apify.com"],
  "phones": [],
  "contacts": [
    { "name": "Jan Curn", "title": "CEO" }
  ],
  "socialLinks": {
    "linkedin": "https://linkedin.com/company/apify"
  },
  "emailPattern": "first@apify.com",
  "emailPatternConfidence": 0.85,
  "generatedEmails": [
    { "name": "Jan Curn", "email": "jan@apify.com" }
  ],
  "score": 82,
  "grade": "A",
  "scoreBreakdown": {
    "contactReachability": 18,
    "businessLegitimacy": 20,
    "onlinePresence": 16,
    "websiteQuality": 18,
    "teamTransparency": 10
  },
  "signals": [
    { "signal": "ssl-valid", "category": "websiteQuality", "points": 5, "detail": "Valid SSL certificate" },
    { "signal": "has-linkedin", "category": "onlinePresence", "points": 8, "detail": "LinkedIn company page found" }
  ],
  "techSignals": ["react", "next.js", "vercel"],
  "cmsDetected": "next.js",
  "pipelineSteps": ["contact-scraper", "email-pattern-finder", "lead-qualifier"],
  "processedAt": "2026-03-24T14:30:00.000Z"
}

You get contacts, patterns, scores, and signals in one record. Import it into your CRM, filter by grade, and start outreach. No spreadsheet gymnastics.

How much does automated lead prospecting cost?

The B2B Lead Gen Suite uses Apify's pay-per-event pricing at $0.50 per enriched lead. A batch of 100 companies costs $50. There are no monthly subscriptions, seat licenses, or minimum commitments. Apify's free tier includes $5 of monthly platform credits for testing.

Here's the real comparison that matters — what you're actually paying across different approaches:

| Approach | Cost for 100 leads/month | Annual cost | Data freshness |
| --- | --- | --- | --- |
| B2B Lead Gen Suite | $50 | $600 | Live (scraped at runtime) |
| Running 3 actors manually | ~$40 | $480 | Live |
| Clay | $149-$720/mo | $1,788-$8,640 | Database (days-weeks old) |
| Apollo.io | $49-$119/mo | $588-$1,428 | Database (days-weeks old) |
| ZoomInfo | $15,000+/yr | $15,000+ | Database (quarterly refresh) |
| Manual research (SDR @ $60K/yr) | $250+ in labor | $3,000+ | Live but slow |

The manual research line comes from real numbers. If an SDR makes $60K/year and spends roughly 20% of their time on prospecting research (that Gartner 21% figure again), that's $12,000/year in labor just to find contacts. For 100 leads a month, that works out to roughly $250 in labor time. The suite does it for $50 and takes minutes instead of hours.
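
Here is the labor math worked out. The ~5 minutes per researched lead is my assumption to make the $250 figure concrete; swap in your own team's numbers:

```python
# Assumed inputs: a $60K salary over a standard working year, and ~5 minutes
# of research per lead.
ANNUAL_SALARY = 60_000
WORK_HOURS_PER_YEAR = 2_000          # ~50 weeks x 40 hours
MINUTES_PER_LEAD = 5

hourly_rate = ANNUAL_SALARY / WORK_HOURS_PER_YEAR          # $30/hour
labor_cost_per_lead = hourly_rate * MINUTES_PER_LEAD / 60  # $2.50/lead
monthly_labor_100_leads = labor_cost_per_lead * 100        # $250/month
```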

The $0.50 per lead covers all three pipeline steps. You can also skip steps you don't need — set skipEmailPatternFinder or skipLeadQualifier to true and you'll only pay for the steps that run. ApifyForge has a cost calculator that estimates your monthly spend based on volume.

Why not just use Hunter.io or Apollo?

Because they're searching a database, not the actual website.

Hunter.io, Apollo, and ZoomInfo maintain pre-scraped databases of email addresses and company information. When you query them, you're searching an index that might be days, weeks, or months old. A 2024 study by Validity found that B2B contact databases decay at roughly 2-3% per month — meaning 25-30% of records in an annual database are stale or wrong.

The B2B Lead Gen Suite scrapes the live website every time. If a company updated their team page yesterday, you get yesterday's data. If they changed their contact email this morning, you get this morning's data. There's no index to go stale.

The other difference is the scoring. Hunter.io and Apollo give you email confidence scores, but they don't tell you whether the company itself is worth pursuing. The Lead Qualifier step in this pipeline analyzes the company's web presence, tech stack, team transparency, and business legitimacy. An "A" grade company with a real address, named leadership team, active social profiles, and modern tech stack is a fundamentally different prospect than an "F" with a parked domain and no contact info. That signal matters more than knowing whether the email is deliverable.

For teams already paying for Apollo or ZoomInfo, this isn't necessarily a replacement. It's a supplement. Use your database tools for broad searches, then run the suite on your shortlist to get fresh data and quality scores before outreach.

How do you set up the B2B Lead Gen Suite?

Setting it up takes about two minutes. There's no code to write unless you want to call it programmatically.

Go to the B2B Lead Gen Suite on Apify. Paste your company URLs into the input — one per line, bare domains like stripe.com or full URLs like https://stripe.com both work. The actor normalizes everything.
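
Conceptually, that normalization reduces every input form to a clean domain. A minimal sketch (not the actor's actual implementation):

```python
from urllib.parse import urlparse

def normalize_domain(value):
    """Reduce a bare domain or full URL to a clean lowercase domain."""
    value = value.strip().lower()
    if "://" not in value:
        value = "https://" + value   # bare domains get a scheme so urlparse works
    host = urlparse(value).netloc
    return host.removeprefix("www.")
```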

The key settings:

  • maxPagesPerDomain (default 5) — how many pages to crawl during contact scraping. 5 covers homepage, contact, about, and team pages for most sites. Bump to 10-15 for large corporate sites.
  • maxQualifierPagesPerDomain (default 5) — pages to crawl during the qualification step. Same logic.
  • minScore (default 0) — filter out leads below this score. Set to 50 to only get C-grade or better. Set to 70 for B-grade and above.
  • skipEmailPatternFinder — skip pattern detection if you only need raw contacts and scores.
  • skipLeadQualifier — skip scoring if you just want contacts and email patterns.

For programmatic access:

from apify_client import ApifyClient

client = ApifyClient("YOUR_API_TOKEN")

run = client.actor("ryanclinton/b2b-lead-gen-suite").call(run_input={
    "urls": [
        "stripe.com",
        "notion.so",
        "linear.app",
        "vercel.com",
        "planetscale.com"
    ],
    "maxPagesPerDomain": 5,
    "minScore": 50
})

for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    if item.get("type") == "summary":
        print(f"Pipeline complete: {item['totalLeads']} leads")
        continue
    print(f"{item['domain']}: Grade {item['grade']} (Score: {item['score']})")
    print(f"  Emails: {', '.join(item.get('emails', []))}")
    print(f"  Pattern: {item.get('emailPattern', 'none detected')}")

Five URLs, one API call, three pipeline steps, scored leads sorted in your terminal. The whole run typically finishes in 5-10 minutes depending on how many pages each site has.

What can you do with scored leads?

The score and grade aren't just vanity metrics. They change how you prioritize outreach.

I've seen users split their output into three tiers. A/B leads (score 70+) go directly to personalized email sequences. C leads (50-69) get added to nurture campaigns. D/F leads (below 50) get dropped or flagged for manual review. HubSpot's 2025 Sales Trends Report found that companies using lead scoring see 77% higher lead gen ROI compared to those without any scoring system.
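
That three-tier split is a one-pass filter over the output dataset. A sketch using the score bands above (`leads` is a list of enriched records as shown earlier):

```python
def split_tiers(leads):
    """Bucket enriched lead records into outreach tiers by score."""
    tiers = {"sequence": [], "nurture": [], "review": []}
    for lead in leads:
        score = lead.get("score", 0)
        if score >= 70:
            tiers["sequence"].append(lead)   # A/B: personalized email sequences
        elif score >= 50:
            tiers["nurture"].append(lead)    # C: nurture campaigns
        else:
            tiers["review"].append(lead)     # D/F: drop or review manually
    return tiers
```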

The score breakdown tells you why a lead scored the way it did. If a company scores 85 overall but has a 5/20 on team transparency, you know they don't list their team publicly — which might mean a smaller startup, or a company that's intentionally opaque. That context shapes your approach.

Some specific workflows I've seen ApifyForge users run:

Agency client prospecting. A marketing agency scrapes 500 local businesses from Google Maps Lead Enricher, feeds those URLs into the suite, filters by grade B+, and pitches the top 50. The score breakdown tells them which companies have weak online presence (their service) versus which already have it together (harder sell).

Competitor customer mining. You have a competitor's customer list (from case studies, testimonials, G2 reviews). Run those URLs through the suite. The tech signals tell you what stack they're on. The contact data gives you decision-maker emails. The score tells you which ones are worth pursuing. ApifyForge's lead generation comparison covers how this fits into broader competitive intelligence.

Inbound lead qualification. Someone fills out your demo form. Before the SDR picks up the phone, run the company domain through the suite. In 2 minutes you know their team size signals, tech stack, online presence, and whether they even have a real business behind the website. A McKinsey 2024 analysis found that companies qualifying inbound leads with automated data enrichment close 23% faster than those relying on manual research.

What are the limitations?

I'd rather you know the boundaries before you hit them in production.

JavaScript-rendered sites. The contact scraper uses CheerioCrawler (static HTML parsing). React/Angular/Vue apps that load contact info via client-side JavaScript won't have that content captured. According to W3Techs' 2025 survey, about 20% of business sites are fully client-rendered. For those, you'd need to run the individual sub-actors separately with a browser-based crawler, or use the Pro version of the contact scraper. ApifyForge has a contact scraper comparison that breaks down the differences.

Pipeline step failures are graceful, not fatal. If the Email Pattern Finder fails on a domain, the suite continues without pattern data. If the Lead Qualifier fails, you still get contacts and patterns but no scores. The pipelineSteps field in each record tells you exactly which steps completed. This is by design — I'd rather return partial data than nothing.
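
Checking for partial records is a one-liner against that field. For example, to flag domains where the qualifier step didn't complete (and scores are therefore missing):

```python
def missing_scores(records):
    """Return domains whose records lack the lead-qualifier pipeline step."""
    return [
        r["domain"]
        for r in records
        if "lead-qualifier" not in r.get("pipelineSteps", [])
    ]
```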

1,000 domains per run is the practical ceiling. The orchestrator calls sub-actors sequentially and caps dataset reads at 1,000 items. For bigger batches, split into multiple runs.

No CRM push built in. The output is a dataset you download as JSON, CSV, or Excel. If you want automatic Salesforce/HubSpot sync, use Apify's webhook integrations or their Zapier connector.

PPE spending limits are respected. If you set a spending cap on your Apify account, the suite stops mid-output when the cap is hit and tells you how many leads were processed versus skipped. No surprise bills.

How does this compare to building your own pipeline?

You could absolutely wire up the three sub-actors yourself. Call Website Contact Scraper, parse its dataset, transform the output into Email Pattern Finder input, call that, parse again, transform into Lead Qualifier input, call that, then merge all three datasets by domain. I know because people were doing exactly this before I built the suite.

The suite saves you from writing and maintaining that glue code. It handles URL normalization, error recovery (if a sub-actor fails, the pipeline continues), result merging with email/phone deduplication across all three data sources, score-based sorting, and the minScore filter. The transforms are non-trivial — the contact-scraper-to-pattern-finder transform matches emails to contact names, passes discovered names for email generation, and skips the website search since the scraper already covered it.

If you need custom logic between steps — say, filtering domains by industry before qualifying them — then running the actors individually gives you more control. ApifyForge's learn guide on PPE pricing explains how pay-per-event works across chained actors.

But if you just want "URLs in, scored leads out," the suite is the path of least resistance.

Is web scraping contacts legal?

This comes up in every conversation about lead gen tools, so let me address it directly.

Scraping publicly accessible business contact information from company websites is broadly considered legal in the US and EU, provided you're collecting data the company intentionally published for public consumption. The 2022 US Ninth Circuit ruling in hiQ v. LinkedIn confirmed that scraping publicly available data doesn't violate the Computer Fraud and Abuse Act. In the EU, GDPR allows processing of business contact data under the "legitimate interest" basis (Article 6(1)(f)), though you need to be able to justify your purpose.

That said, I'm not a lawyer, and you should consult one if you're scraping at scale for commercial outreach. Some ground rules: only scrape data the company published intentionally (contact pages, team pages, footer info). Don't scrape behind logins. Respect robots.txt. And if you're sending cold email with scraped addresses, comply with CAN-SPAM (US), GDPR (EU), or CASL (Canada) depending on your recipients' location.

ApifyForge's personal data exposure report can help you understand what data is publicly visible about individuals — useful for both prospecting and privacy compliance.

Who should use the B2B Lead Gen Suite?

SDRs who prospect more than 20 companies a week. Growth marketers who need enriched lead lists for campaigns. Agency owners who build prospect databases for clients. RevOps teams enriching CRM records. Recruiters mapping out teams at target companies.

Basically, anyone who's currently doing the manual version of "find company website, hunt for emails, look up who works there, decide if they're worth contacting." If that takes you more than an hour a week, the suite pays for itself on the first run.

The actor is live at apify.com/ryanclinton/b2b-lead-gen-suite. Run it on 5 domains and see if the output matches what you'd spend 30 minutes finding manually. That's the pitch.


Last updated: March 2026

Built and maintained by ApifyForge — 300+ Apify actors and 93 MCP intelligence servers for web scraping, lead generation, and compliance screening.
