NexGenData

Posted on Jul 2 • Originally published at thenextgennexus.com

How to Build an Automated Lead Generation Pipeline with AI

#webscraping #marketing #ai #automation

Every sales team needs leads. Every marketing agency needs prospects. Every recruiter needs contacts. But most lead generation tools charge $200+/month for stale data from last year.

What if you could build a pipeline that pulls fresh, verified leads on demand — for a fraction of the cost? In this tutorial, I’ll show you how to build an automated lead generation pipeline using three AI-powered tools that work together: a Google Maps scraper for discovery, an AI agent for enrichment, and an email validator for verification.


The Three-Stage Pipeline

Our pipeline follows a simple flow: Discover → Enrich → Validate. Each stage has a dedicated tool, and the whole thing runs on Apify’s cloud infrastructure so you don’t need to manage servers.

Stage 1: Discover Leads with Google Maps

The Google Maps Lead Generator extracts business profiles from Google Maps. Search for any category and location to get structured data: business name, address, phone, website, email, rating, and review count.


    import requests

    APIFY_TOKEN = "your_apify_token"

    # Search for dentists in Austin, TX
    run_input = {
        "searchTerms": ["dentists in Austin TX"],
        "maxResults": 100,
        "includeEmails": True
    }

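    # Start the actor run. The JSON body is passed to the actor as its input.
    response = requests.post(
        "https://api.apify.com/v2/acts/nexgendata~google-maps-scraper/runs",
        headers={"Authorization": f"Bearer {APIFY_TOKEN}"},
        json=run_input,
        # Apify caps waitForFinish at 60 seconds; a longer run returns while
        # still RUNNING and has to be polled until it finishes.
        params={"waitForFinish": 60}
    )
    response.raise_for_status()  # Fail fast on a bad token or actor name

    run_data = response.json()["data"]
    dataset_id = run_data["defaultDatasetId"]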

    # Get results
    items = requests.get(
        f"https://api.apify.com/v2/datasets/{dataset_id}/items",
        headers={"Authorization": f"Bearer {APIFY_TOKEN}"}
    ).json()

    print(f"Found {len(items)} businesses")
    for item in items[:3]:
        print(f"  {item.get('name')} | {item.get('phone')} | {item.get('email')}")

This gives you raw business data — names, addresses, phone numbers, websites, and in many cases email addresses. But raw data isn’t enough. You need qualified leads.
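
The Stage 1 request relies on `waitForFinish` to hold the connection until the run completes. Apify caps that parameter at 60 seconds, so a larger scrape can come back while the run is still in progress and the dataset is only partly filled. Here is a minimal polling sketch for that case. The names `wait_for_run` and `apify_status_fetcher` are hypothetical helpers written for this post, not part of any Apify client; the run-status endpoint and the status strings are taken from Apify's public API.

```python
import time
from typing import Callable

# Terminal run states reported by the Apify API.
TERMINAL_STATES = {"SUCCEEDED", "FAILED", "TIMED-OUT", "ABORTED"}


def wait_for_run(fetch_status: Callable[[], str],
                 poll_interval: float = 5.0,
                 max_wait: float = 600.0) -> str:
    """Poll fetch_status() until the run reaches a terminal state.

    fetch_status is any zero-argument callable returning the run's status
    string. Raises TimeoutError if max_wait seconds pass first.
    """
    deadline = time.monotonic() + max_wait
    while True:
        status = fetch_status()
        if status in TERMINAL_STATES:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Run still {status} after {max_wait} seconds")
        time.sleep(poll_interval)


def apify_status_fetcher(run_id: str, token: str) -> Callable[[], str]:
    """Build a fetch_status callable for one run (assumes `requests` is installed)."""
    import requests  # Imported lazily so wait_for_run itself needs only the stdlib

    def fetch() -> str:
        resp = requests.get(
            f"https://api.apify.com/v2/actor-runs/{run_id}",
            headers={"Authorization": f"Bearer {token}"},
            timeout=30,
        )
        resp.raise_for_status()
        return resp.json()["data"]["status"]

    return fetch
```

In the Stage 1 script, this would slot in right before the dataset download: `status = wait_for_run(apify_status_fetcher(run_data["id"], APIFY_TOKEN))`, followed by a check that `status == "SUCCEEDED"`.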

Stage 2: Enrich with AI

The Lead Generation AI Agent takes your raw Google Maps data and enriches it. It identifies decision-makers, qualifies businesses based on your criteria, and adds context that helps your outreach convert.


    # Enrich the leads we found
    enrich_input = {
        "leads": items,  # Pass Google Maps results
        "qualificationCriteria": {
            "minRating": 3.5,
            "minReviews": 10,
            "hasWebsite": True
        }
    }

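    # Apify caps waitForFinish at 60 seconds; poll the run if it takes longer.
    enrich_response = requests.post(
        "https://api.apify.com/v2/acts/nexgendata~lead-gen-ai-agent/runs",
        headers={"Authorization": f"Bearer {APIFY_TOKEN}"},
        json=enrich_input,
        params={"waitForFinish": 60}
    )
    enrich_response.raise_for_status()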

    enriched_data = enrich_response.json()["data"]
    enriched_dataset = enriched_data["defaultDatasetId"]

    enriched_leads = requests.get(
        f"https://api.apify.com/v2/datasets/{enriched_dataset}/items",
        headers={"Authorization": f"Bearer {APIFY_TOKEN}"}
    ).json()

    print(f"Qualified leads: {len(enriched_leads)}")
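
To make the qualification criteria concrete, here is a rough local sketch of the same three checks. It is not the agent's actual logic, which isn't documented here. The field names `rating`, `reviews`, and `website` are assumptions based on the fields listed in Stage 1, and `qualifies` is a hypothetical helper.

```python
from typing import Any


def qualifies(lead: dict[str, Any], criteria: dict[str, Any]) -> bool:
    """Return True when a lead passes every criterion that is present."""
    # Missing or null values count as 0 so they fail any positive threshold.
    rating = lead.get("rating") or 0
    reviews = lead.get("reviews") or 0
    if rating < criteria.get("minRating", 0):
        return False
    if reviews < criteria.get("minReviews", 0):
        return False
    if criteria.get("hasWebsite") and not lead.get("website"):
        return False
    return True


criteria = {"minRating": 3.5, "minReviews": 10, "hasWebsite": True}
sample_leads = [
    {"name": "A", "rating": 4.6, "reviews": 120, "website": "https://a.example"},
    {"name": "B", "rating": 3.1, "reviews": 40, "website": "https://b.example"},
    {"name": "C", "rating": 4.9, "reviews": 3, "website": "https://c.example"},
    {"name": "D", "rating": 4.2, "reviews": 55, "website": None},
]
qualified = [lead for lead in sample_leads if qualifies(lead, criteria)]
print([lead["name"] for lead in qualified])  # ['A']
```

A pre-filter like this can also trim the list before it goes to the agent, so obviously unqualified businesses never reach the enrichment step.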

Stage 3: Validate Emails

Before you send a single email, validate every address. The Email Validator checks MX records and SMTP connectivity, and it flags disposable addresses. Removing bad addresses before you send protects your sender reputation and helps keep your deliverability rate above 95%.


    # Extract emails from enriched leads
    emails = [lead.get("email") for lead in enriched_leads if lead.get("email")]

    # Validate all emails
    validate_input = {
        "emails": emails
    }

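    # Apify caps waitForFinish at 60 seconds; poll the run if it takes longer.
    validate_response = requests.post(
        "https://api.apify.com/v2/acts/nexgendata~email-validator/runs",
        headers={"Authorization": f"Bearer {APIFY_TOKEN}"},
        json=validate_input,
        params={"waitForFinish": 60}
    )
    validate_response.raise_for_status()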

    validated = validate_response.json()["data"]
    validated_dataset = validated["defaultDatasetId"]

    results = requests.get(
        f"https://api.apify.com/v2/datasets/{validated_dataset}/items",
        headers={"Authorization": f"Bearer {APIFY_TOKEN}"}
    ).json()

    valid_emails = [r for r in results if r.get("isValid")]
    print(f"Valid emails: {len(valid_emails)} / {len(emails)}")
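
Scraped address lists tend to contain duplicates and obviously malformed entries, and there is little point sending those to the validator. Below is a small pre-cleaning sketch. The `clean_emails` helper and its deliberately loose regex are assumptions for illustration; it is a sanity filter, not an RFC 5322 parser, and it does not replace the MX and SMTP checks.

```python
import re

# Deliberately loose pattern: one "@", no whitespace, and a dot in the domain.
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def clean_emails(raw_emails):
    """Strip, lowercase, and de-duplicate addresses, dropping malformed ones.

    First-seen order is preserved.
    """
    seen = set()
    cleaned = []
    for raw in raw_emails:
        if not raw:
            continue  # Skip None and empty strings
        email = raw.strip().lower()
        if EMAIL_PATTERN.match(email) and email not in seen:
            seen.add(email)
            cleaned.append(email)
    return cleaned


print(clean_emails([" Info@Example.com", "info@example.com", "not-an-email", None, "a@b.co"]))
# ['info@example.com', 'a@b.co']
```

If you normalize addresses this way, apply the same normalization to each lead's `email` field before matching validation results back to leads.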

Putting It All Together

Here’s the complete pipeline as a single script. Run it for any business category in any city:


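    import requests
    import csv
    import time

    APIFY_TOKEN = "your_apify_token"
    BASE_URL = "https://api.apify.com/v2"
    HEADERS = {"Authorization": f"Bearer {APIFY_TOKEN}"}
    TERMINAL_STATES = ("SUCCEEDED", "FAILED", "TIMED-OUT", "ABORTED")

    def run_actor(actor_name, run_input, timeout=180):
        """Start an actor, wait up to `timeout` seconds, and return its dataset items."""
        deadline = time.monotonic() + timeout
        # Apify caps waitForFinish at 60 seconds, so poll the run afterwards.
        resp = requests.post(
            f"{BASE_URL}/acts/nexgendata~{actor_name}/runs",
            headers=HEADERS,
            json=run_input,
            params={"waitForFinish": min(timeout, 60)}
        )
        resp.raise_for_status()
        run = resp.json()["data"]
        while run["status"] not in TERMINAL_STATES and time.monotonic() < deadline:
            time.sleep(5)
            poll = requests.get(f"{BASE_URL}/actor-runs/{run['id']}", headers=HEADERS)
            poll.raise_for_status()
            run = poll.json()["data"]
        if run["status"] != "SUCCEEDED":
            raise RuntimeError(f"{actor_name} did not succeed (status: {run['status']})")
        items = requests.get(
            f"{BASE_URL}/datasets/{run['defaultDatasetId']}/items",
            headers=HEADERS
        )
        items.raise_for_status()
        return items.json()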

    # Step 1: Discover
    print("Stage 1: Discovering leads...")
    leads = run_actor("google-maps-scraper", {
        "searchTerms": ["plumbers in Chicago IL"],
        "maxResults": 200,
        "includeEmails": True
    })
    print(f"  Found {len(leads)} businesses")

    # Step 2: Enrich
    print("Stage 2: Enriching leads...")
    enriched = run_actor("lead-gen-ai-agent", {
        "leads": leads,
        "qualificationCriteria": {"minRating": 3.5, "minReviews": 5}
    })
    print(f"  Qualified: {len(enriched)} leads")

    # Step 3: Validate
    emails = [lead.get("email") for lead in enriched if lead.get("email")]
    print(f"Stage 3: Validating {len(emails)} emails...")
    validated = run_actor("email-validator", {"emails": emails})
    valid = [v for v in validated if v.get("isValid")]
    print(f"  Valid: {len(valid)} emails")

    # Export to CSV: keep only leads whose email passed validation
    valid_addresses = {v.get("email") for v in valid}  # Built once for fast lookups
    fields = ["name", "email", "phone", "address", "rating", "reviews"]
    with open("leads.csv", "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fields)
        writer.writeheader()
        for lead in enriched:
            if lead.get("email") in valid_addresses:
                writer.writerow({field: lead.get(field) for field in fields})

    print("Exported to leads.csv")

Cost Comparison

Traditional lead gen tools charge per seat, per month, for data that might be months old:

ZoomInfo: $14,995+/year
Apollo.io: $49-119/month per seat
Lusha: $29-51/month per seat

This pipeline? Pay per lead, on demand. No seats, no contracts, no stale data. A typical run of 200 leads costs roughly $2-5 on Apify’s platform. At that rate, 1,000 leads cost roughly $10-25.
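
The arithmetic behind that estimate can be sketched as follows. It assumes cost scales linearly with lead count from the $2-5 per 200 leads figure above, which real runs may not follow exactly. `cost_range` is a hypothetical helper.

```python
def cost_range(leads: int, low_per_200: float = 2.0, high_per_200: float = 5.0):
    """Estimate a (low, high) cost in dollars, assuming linear scaling."""
    batches = leads / 200
    return (batches * low_per_200, batches * high_per_200)


low, high = cost_range(1000)
print(f"${low:.2f} to ${high:.2f}")  # $10.00 to $25.00
```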

AI Agent Integration

Want to connect this pipeline directly to Claude, GPT, or any AI agent? Use our Google Maps MCP Server — it exposes all three tools as MCP endpoints your agent can call directly.

📦 Lead Gen Toolkit — $39

Get all three actors as a bundle with sample datasets, Python scripts, and setup guides. Everything you need to build your own lead generation pipeline.

Get the Toolkit →

Next Steps

Start with a small test: pick one business category, one city, and run 50 leads through the pipeline. Once you see the quality, scale up. The pipeline handles thousands of leads per run — the same code works whether you need 50 or 5,000.
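
When you do scale up, one cautious approach is to send work in batches rather than one very large input, so a single failed run costs you only part of the job. Here is a minimal sketch. The `chunked` helper is hypothetical, and whether any of the actors actually limits input size is not something this post confirms.

```python
from typing import Iterator, Sequence, TypeVar

T = TypeVar("T")


def chunked(items: Sequence[T], size: int) -> Iterator[Sequence[T]]:
    """Yield consecutive slices of `items`, each at most `size` long."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


batches = list(chunked(list(range(5)), 2))
print(batches)  # [[0, 1], [2, 3], [4]]
```

With the complete script, that could look like `for batch in chunked(emails, 500): validated.extend(run_actor("email-validator", {"emails": batch}))`, starting from an empty `validated` list. The batch size of 500 is arbitrary.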

Need help setting up? Check out our other data tools and tutorials.

About the Author

The Next Gen Nexus covers AI agents, automation, and web data — practical guides for developers, analysts, and businesses working with data at scale.
