DEV Community

NexGenData

Posted on • Originally published at thenextgennexus.com

How to Build a Google Maps Lead Scraper in Python (Step-by-Step)

Lead generation is the lifeblood of any sales-driven business. Whether you're running a local service business, real estate firm, or B2B agency, finding qualified leads quickly and efficiently can make or break your revenue. Google Maps contains millions of business listings with contact information, reviews, hours, and location data, but manually collecting this data is tedious and error-prone.

In this guide, I'll show you how to build a working Google Maps lead scraper in Python that automatically extracts business names, phone numbers, websites, addresses, opening hours, and review ratings. This is the exact approach I use in the Local Leads Pack, which has helped hundreds of agencies and entrepreneurs compile targeted prospect lists in hours instead of days.

Why Google Maps Data Matters for Lead Generation

Google Maps is one of the most valuable data sources for B2B and local business prospecting. Here's why:

  • Up-to-date business data: Owners have a strong incentive to keep their Google Maps listings current because customers often see them first.
  • Contact information: Phone numbers and websites are directly available, eliminating the need for secondary lookups.
  • Review signals: Rating and review counts tell you about business health and customer satisfaction.
  • Operational data: Hours, photos, and services reveal exactly what businesses do and how they operate.
  • Competitor insights: See who's ranking for specific keywords and how they're positioned.

Instead of cold outreach to random businesses, Google Maps data lets you target high-intent prospects: recently active businesses, highly rated competitors, newly opened locations, and businesses in your geographic focus area.
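To make "highly rated" and "reachable" concrete, a minimal filter over scraped leads might look like the sketch below. It assumes each lead dict already has numeric rating_value and review_count fields (hypothetical names; the scraper in this guide stores the raw rating label as text), and the default thresholds are arbitrary starting points rather than recommendations backed by data.

```python
from typing import Iterable, List


def filter_high_intent(
    leads: Iterable[dict],
    min_rating: float = 4.0,
    min_reviews: int = 20,
    require_phone: bool = True,
) -> List[dict]:
    """Keep leads that look established and reachable, most-reviewed first."""
    selected = []
    for lead in leads:
        rating = lead.get("rating_value")        # hypothetical numeric field, e.g. 4.6
        reviews = lead.get("review_count") or 0  # hypothetical numeric field, e.g. 120
        if rating is None or rating < min_rating:
            continue  # unrated or below the rating threshold
        if reviews < min_reviews:
            continue  # too few reviews to trust the rating
        if require_phone and not lead.get("phone"):
            continue  # nothing to call
        selected.append(lead)
    return sorted(selected, key=lambda lead: lead.get("review_count") or 0, reverse=True)
```

Lowering min_reviews (or inverting that check) is one way to surface newly opened locations instead of established ones.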

The Challenge: Scraping Google Maps at Scale

Before diving into the code, let's be clear about the technical challenges:

  • JavaScript rendering: Google Maps loads results dynamically via JavaScript, so simple HTTP requests won't work.
  • Rate limiting: Aggressive scraping gets blocked quickly. You need delays and rotating proxies.
  • Data structure complexity: Business data is scattered across multiple DOM elements and requires careful parsing.
  • Geolocation sensitivity: Results vary by location, requiring proper lat/long coordinates and viewport handling.
  • Legal considerations: Always respect Google's ToS and use data ethically for legitimate business purposes.
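For the geolocation point, one option is to pin the search to a map position. The sketch below assumes the /@lat,lng,zoomz path segment that Google Maps shows in the browser's address bar; that format is not a documented, stable API, so treat build_search_url (a hypothetical helper) as something to verify against the live site.

```python
from typing import Optional
from urllib.parse import quote_plus


def build_search_url(query: str, location: str,
                     lat: Optional[float] = None,
                     lng: Optional[float] = None,
                     zoom: int = 14) -> str:
    """Build a Maps search URL, optionally centered on a coordinate (hypothetical helper)."""
    # quote_plus turns spaces into '+' and escapes commas, e.g. "Los Angeles, CA"
    text = quote_plus(f"{query} {location}".strip())
    url = f"https://www.google.com/maps/search/{text}"
    if lat is not None and lng is not None:
        # Assumed format: /@<lat>,<lng>,<zoom>z as seen in the Maps address bar
        url += f"/@{lat:.6f},{lng:.6f},{zoom}z"
    return url
```

For example, build_search_url("plumbers", "New York", 40.7128, -74.006) produces a URL ending in /@40.712800,-74.006000,14z.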

The solution is to drive a real browser with an automation tool (Selenium or Playwright, optionally running in headless mode), combined with error handling, data validation, and rate limiting.

Building the Scraper: Architecture Overview

Here's the architecture we'll build:

  • Search module: Opens the Google Maps search and scrolls the results feed to load more listings (Maps loads results as you scroll rather than in numbered pages).
  • Parser module: Extracts business data from DOM elements and text.
  • Validator module: Cleans and validates extracted data (phone numbers and website URLs).
  • Storage module: Saves results to CSV, JSON, or database.
  • Rate limiter: Implements backoff strategies to avoid blocking.
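The rate-limiter module can start as a small class that enforces a minimum, randomly jittered gap between actions. This is a sketch: RateLimiter, its defaults, and the injectable clock and sleep hooks (which let it be tested without real waiting) are illustrative choices, not part of Selenium.

```python
import random
import time
from typing import Callable, Optional


class RateLimiter:
    """Enforce a minimum, jittered gap between consecutive actions."""

    def __init__(self, min_interval: float = 1.5, jitter: float = 1.0,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep,
                 rng: Optional[random.Random] = None):
        self.min_interval = min_interval
        self.jitter = jitter
        self.clock = clock
        self.sleep = sleep
        self.rng = rng or random.Random()
        self._last: Optional[float] = None  # time of the previous action

    def wait(self) -> float:
        """Block until the next action is allowed; return the seconds slept."""
        slept = 0.0
        if self._last is not None:
            # Each gap is min_interval plus a random extra in [0, jitter]
            target_gap = self.min_interval + self.rng.uniform(0, self.jitter)
            remaining = target_gap - (self.clock() - self._last)
            if remaining > 0:
                self.sleep(remaining)
                slept = remaining
        self._last = self.clock()
        return slept
```

Calling limiter.wait() before each page load or click keeps the timing from being perfectly regular, which a fixed time.sleep() cannot do.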

Step 1: Set Up Your Python Environment

First, install the required dependencies:


    pip install selenium webdriver-manager pandas requests lxml beautifulsoup4

We're using Selenium for browser automation because Google Maps requires JavaScript execution. WebDriver Manager automatically downloads the correct ChromeDriver version.

Step 2: Initialize the Browser and Navigate to Google Maps


    from urllib.parse import quote_plus
    import time

    import pandas as pd
    from selenium import webdriver
    from selenium.webdriver.chrome.service import Service
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait
    from webdriver_manager.chrome import ChromeDriverManager

    class GoogleMapsLeadScraper:
        def __init__(self, search_query, location, max_results=100):
            self.search_query = search_query
            self.location = location
            self.max_results = max_results
            self.results = []
            self.driver = None

        def initialize_driver(self):
            """Set up Chrome WebDriver with options for scraping."""
            chrome_options = webdriver.ChromeOptions()
            # Uncomment to run Chrome without a visible window
            # chrome_options.add_argument('--headless=new')
            chrome_options.add_argument('--disable-blink-features=AutomationControlled')
            chrome_options.add_argument('user-agent=Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36')
            chrome_options.add_argument('--disable-gpu')
            chrome_options.add_argument('--no-sandbox')

            # webdriver-manager fetches a ChromeDriver matching the installed Chrome
            service = Service(ChromeDriverManager().install())
            self.driver = webdriver.Chrome(service=service, options=chrome_options)

        def navigate_to_search(self):
            """Navigate to Google Maps and perform search."""
            # URL-encode the query so spaces and commas (e.g. "Los Angeles, CA") are safe in the path
            query = quote_plus(f"{self.search_query} {self.location}")
            search_url = f"https://www.google.com/maps/search/{query}"
            self.driver.get(search_url)
            time.sleep(3)  # Give the results feed time to render

    # Example usage
    scraper = GoogleMapsLeadScraper(
        search_query="plumbers",
        location="New York",
        max_results=100
    )
    scraper.initialize_driver()
    scraper.navigate_to_search()

Step 3: Extract Business Listings

Once the search results load, we need to extract each business listing. Google Maps renders results in a scrollable sidebar; each result is clickable and contains the business name and snippet information.


    from selenium.common.exceptions import NoSuchElementException, StaleElementReferenceException

    # Method of GoogleMapsLeadScraper
    def extract_listings(self):
        """Extract business names and result elements from the search feed."""
        wait = WebDriverWait(self.driver, 10)

        # The scrollable results sidebar is the single element with role="feed"
        feed = wait.until(
            EC.presence_of_element_located((By.XPATH, '//div[@role="feed"]'))
        )

        listings = []
        seen_names = set()  # O(1) duplicate check instead of rebuilding a list each time
        previous_height = 0

        while len(listings) < self.max_results:
            # "Nv2PK THOPZb" are obfuscated class names; expect to update them when Google changes its markup
            result_divs = self.driver.find_elements(
                By.XPATH,
                '//div[@data-index]//div[@class="Nv2PK THOPZb"]'
            )

            for div in result_divs:
                if len(listings) >= self.max_results:
                    break
                try:
                    business_name = div.find_element(By.XPATH, './/div[@role="button"]//div').text
                except (NoSuchElementException, StaleElementReferenceException):
                    continue  # card without a name, or removed from the DOM while reading
                if business_name and business_name not in seen_names:
                    seen_names.add(business_name)
                    listings.append({'name': business_name, 'element': div})

            # Scroll the feed to trigger loading of the next batch
            self.driver.execute_script('arguments[0].scrollTop = arguments[0].scrollHeight', feed)
            time.sleep(2)

            # Stop once scrolling no longer adds content
            new_height = self.driver.execute_script('return arguments[0].scrollHeight', feed)
            if new_height == previous_height:
                break
            previous_height = new_height

        self.results = listings[:self.max_results]
        return self.results
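The loop above stops the first time the feed height doesn't change, which can end early if the next batch of results is slow to arrive. A more tolerant stop condition, sketched here as a hypothetical helper, only gives up after several consecutive rounds without growth. The browser calls are passed in as functions so the loop can be tested without Selenium.

```python
import time
from typing import Callable


def scroll_until_stable(get_height: Callable[[], int],
                        scroll: Callable[[], None],
                        pause: float = 2.0,
                        max_stalls: int = 3,
                        max_rounds: int = 100,
                        sleep: Callable[[float], None] = time.sleep) -> int:
    """Scroll until the height stops growing for max_stalls rounds in a row."""
    last_height = get_height()
    stalls = 0
    for _ in range(max_rounds):  # hard cap so the loop always terminates
        scroll()
        sleep(pause)  # give the next batch time to load
        height = get_height()
        if height > last_height:
            last_height = height
            stalls = 0  # new content arrived; reset the stall counter
        else:
            stalls += 1
            if stalls >= max_stalls:
                break
    return last_height
```

With Selenium, get_height could be lambda: self.driver.execute_script('return arguments[0].scrollHeight', feed), and scroll the matching scrollTop assignment used in extract_listings.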

Step 4: Click Into Each Listing and Extract Details

Now, for each business, we click the listing to open its detail panel and extract the phone number, website, address, rating, and opening hours.


    from selenium.common.exceptions import NoSuchElementException

    # Method of GoogleMapsLeadScraper
    def extract_business_details(self, business_element):
        """Extract detailed information from a business listing."""
        # Click the business to open the detail panel. Errors here (e.g. a stale element
        # if the feed re-rendered) propagate so the caller can log, skip, or retry.
        business_element.click()
        time.sleep(2)  # Wait for the detail panel to load

        def first_match(xpath, attribute=None):
            """Return the first matching element's text (or attribute), or '' if absent."""
            try:
                element = self.driver.find_element(By.XPATH, xpath)
            except NoSuchElementException:
                return ''
            value = element.get_attribute(attribute) if attribute else element.text
            return value or ''

        # XPath needs contains() for substring matches; CSS-style [@attr*="..."] is not valid XPath
        return {
            'name': first_match('//h1[contains(@class, "fontHeadingLarge")]'),
            # Raw aria-label text such as "4.5 stars"
            'rating': first_match('//div[contains(@aria-label, "star")]', 'aria-label'),
            'phone': first_match('//button//span[contains(text(), "+") or contains(text(), "(")]'),
            'website': first_match(
                '//a[@data-url][contains(@aria-label, "Website") or contains(@aria-label, "website")]',
                'href',
            ),
            'address': first_match('//button//div[contains(@class, "fontBodyMedium")]'),
            'hours': first_match('//div[contains(text(), "Open") or contains(text(), "Closed")]'),
        }
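The rating field above is stored as raw label text. To sort or filter by rating, it helps to convert labels to numbers. This sketch assumes English-language labels shaped like "4.5 stars" and "1,234 reviews"; those formats are an assumption about Google's current markup, and labels in other UI languages (for example, with a decimal comma) may be misparsed. Both helper names are hypothetical.

```python
import re
from typing import Optional

_DECIMAL = re.compile(r"\d+(?:\.\d+)?")  # e.g. "4.5"
_INTEGER = re.compile(r"\d[\d,]*")       # e.g. "1,234"


def parse_rating(label: Optional[str]) -> Optional[float]:
    """Return the first number in a label like '4.5 stars' if it is a valid 0-5 rating."""
    match = _DECIMAL.search(label or "")
    if not match:
        return None
    value = float(match.group())
    return value if 0 <= value <= 5 else None


def parse_review_count(label: Optional[str]) -> Optional[int]:
    """Return the first integer in a label like '1,234 reviews' or '(87)'."""
    match = _INTEGER.search(label or "")
    return int(match.group().replace(",", "")) if match else None
```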

Step 5: Complete the Scraper with Data Validation and Export


    import json
    import re
    import time

    import pandas as pd

    def validate_phone(phone):
        """Strip formatting; keep the number only if it has at least 10 digits."""
        if not phone:
            return ''
        cleaned = re.sub(r'\D', '', phone)  # remove everything except digits
        return cleaned if len(cleaned) >= 10 else ''

    def validate_url(url):
        """Make sure the URL has an http(s) scheme."""
        if not url:
            return ''
        return url if url.startswith(('http://', 'https://')) else f'https://{url}'

    # The three methods below belong inside GoogleMapsLeadScraper.
    def scrape_all_businesses(self):
        """Main scraping loop: collect listings, then open each one for details."""
        self.initialize_driver()
        business_data = []
        try:
            self.navigate_to_search()
            self.extract_listings()

            for idx, listing in enumerate(self.results, start=1):
                print(f"Processing {idx}/{len(self.results)}...")
                try:
                    details = self.extract_business_details(listing['element'])
                except Exception as e:
                    print(f"Error processing business {idx}: {e}")
                    continue
                if not details:
                    continue  # nothing was extracted for this listing

                # Validate data
                details['phone'] = validate_phone(details.get('phone', ''))
                details['website'] = validate_url(details.get('website', ''))
                business_data.append(details)

                # Rate limiting: pause between detail-panel clicks
                time.sleep(1.5)
        finally:
            # Always close the browser, even if scraping fails partway through
            self.driver.quit()

        return business_data

    def export_to_csv(self, data, filename='google_maps_leads.csv'):
        """Export collected data to CSV."""
        df = pd.DataFrame(data)
        df.to_csv(filename, index=False, encoding='utf-8')
        print(f"Exported {len(df)} leads to {filename}")
        return filename

    def export_to_json(self, data, filename='google_maps_leads.json'):
        """Export collected data to JSON."""
        with open(filename, 'w', encoding='utf-8') as f:
            json.dump(data, f, indent=2, ensure_ascii=False)
        print(f"Exported {len(data)} leads to {filename}")
        return filename

    # Full execution
    if __name__ == "__main__":
        scraper = GoogleMapsLeadScraper(
            search_query="electrical contractors",
            location="Los Angeles, CA",
            max_results=50
        )

        leads = scraper.scrape_all_businesses()

        # Export in both formats
        scraper.export_to_csv(leads)
        scraper.export_to_json(leads)

        print(f"\nScraped {len(leads)} leads successfully!")
        for lead in leads[:5]:
            # .get() avoids a KeyError when a field could not be extracted
            print(f"\n{lead.get('name', '')}")
            print(f"  Phone: {lead.get('phone', '')}")
            print(f"  Website: {lead.get('website', '')}")
            print(f"  Rating: {lead.get('rating', '')}")
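Overlapping searches (for example "plumbers" and "plumbing services" in the same city) tend to return the same businesses more than once. A simple dedupe pass could key each lead on its phone digits, falling back to the website host and then the normalized name. Both helpers are hypothetical, and comparing only the last 10 digits assumes North American phone numbers.

```python
from typing import List
from urllib.parse import urlparse


def lead_key(lead: dict) -> tuple:
    """Match key: phone digits first, then website host, then normalized name."""
    digits = "".join(ch for ch in (lead.get("phone") or "") if ch.isdigit())
    if digits:
        # Assumes North American numbers: the last 10 digits ignore a leading "+1"
        return ("phone", digits[-10:])
    host = urlparse(lead.get("website") or "").netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    if host:
        return ("host", host)
    return ("name", " ".join((lead.get("name") or "").lower().split()))


def dedupe_leads(leads: List[dict]) -> List[dict]:
    """Keep the first occurrence of each lead, preserving order."""
    seen = set()
    unique = []
    for lead in leads:
        key = lead_key(lead)
        if key not in seen:
            seen.add(key)
            unique.append(lead)
    return unique
```

Keying on the website host alone can merge separate branches of a chain that share one site, which is why the phone number is checked first.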

Advanced Tips: Production-Ready Improvements

To run this scraper at scale without getting blocked, implement these best practices:

1. Implement Proxy Rotation


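    import random

    from selenium import webdriver

    # Placeholder proxy endpoints; replace them with your provider's addresses
    proxies = [
        'http://proxy1.com:8080',
        'http://proxy2.com:8080',
        'http://proxy3.com:8080',
    ]

    # Inside initialize_driver(), before creating webdriver.Chrome(...).
    # The proxy is fixed for the life of the browser session, so rotating
    # means starting a new driver with a different choice.
    chrome_options = webdriver.ChromeOptions()
    proxy = random.choice(proxies)
    chrome_options.add_argument(f'--proxy-server={proxy}')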
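Picking at random can hand out the same proxy several times in a row and keeps using proxies that have already been blocked. A small round-robin pool that skips proxies marked as bad is one alternative. ProxyPool is a hypothetical helper; the only Chrome-specific part is the same --proxy-server argument used in the snippet above.

```python
import itertools
from typing import Iterable, Optional, Set


class ProxyPool:
    """Round-robin proxy rotation that skips proxies marked as bad (hypothetical helper)."""

    def __init__(self, proxies: Iterable[str]):
        self._proxies = list(proxies)
        self._cycle = itertools.cycle(self._proxies)
        self._bad: Set[str] = set()

    def get(self) -> Optional[str]:
        """Return the next healthy proxy, or None if every proxy is marked bad."""
        for _ in range(len(self._proxies)):  # at most one full lap
            proxy = next(self._cycle)
            if proxy not in self._bad:
                return proxy
        return None

    def mark_bad(self, proxy: str) -> None:
        """Stop handing out a proxy, e.g. after it gets blocked or times out."""
        self._bad.add(proxy)

    def chrome_arg(self) -> Optional[str]:
        """Build the Chrome --proxy-server argument for the next healthy proxy."""
        proxy = self.get()
        return f"--proxy-server={proxy}" if proxy else None
```

Because the proxy is fixed when the browser starts, each new driver session would call pool.chrome_arg() and pass the result to chrome_options.add_argument().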

2. Add Exponential Backoff for Rate Limiting


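    import time
    from functools import wraps

    def retry_with_backoff(max_retries=3, backoff_factor=2):
        """Retry a function that raises, waiting backoff_factor ** attempt seconds between tries."""
        def decorator(func):
            @wraps(func)
            def wrapper(*args, **kwargs):
                for attempt in range(max_retries):
                    try:
                        return func(*args, **kwargs)
                    except Exception:
                        if attempt == max_retries - 1:
                            raise  # out of retries: surface the last error
                        wait_time = backoff_factor ** attempt  # 1s, 2s, 4s, ...
                        print(f"Retry in {wait_time}s...")
                        time.sleep(wait_time)
            return wrapper
        return decorator

    # Only helps if the decorated method raises; a version that catches every
    # exception internally and returns {} will never be retried.
    @retry_with_backoff(max_retries=3)
    def extract_business_details(self, element):
        # ... extraction logic ...
        pass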
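The decorator above always waits exactly 1 s and then 2 s before giving up. A common refinement is "full jitter": cap the exponential delay and pick a random wait between zero and that cap, so several workers don't retry in lockstep. The sketch only computes the delays; backoff_delays is a hypothetical helper you could call from retry_with_backoff in place of backoff_factor ** attempt.

```python
import random
from typing import List, Optional


def backoff_delays(max_retries: int = 3, base: float = 1.0, factor: float = 2.0,
                   cap: float = 30.0,
                   rng: Optional[random.Random] = None) -> List[float]:
    """Full-jitter backoff: delay n is uniform in [0, min(cap, base * factor**n)]."""
    rng = rng or random.Random()
    return [rng.uniform(0, min(cap, base * factor ** attempt))
            for attempt in range(max_retries)]
```

Inside the wrapper, delays = backoff_delays(max_retries) could be computed once and time.sleep(delays[attempt]) used for each retry.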

3. Add Database Storage


    import sqlite3

    def save_to_database(self, data, db_name='leads.db'):
        """Save leads to SQLite database."""
        conn = sqlite3.connect(db_name)
        c = conn.cursor()

        c.execute('''CREATE TABLE IF NOT EXISTS leads
                     (id INTEGER PRIMARY KEY, name TEXT, phone TEXT, 
                      website TEXT, address TEXT, rating TEXT, hours TEXT)''')

        for lead in data:
            c.execute('''INSERT INTO leads (name, phone, website, address, rating, hours)
                         VALUES (?, ?, ?, ?, ?, ?)''',
                      (lead.get('name'), lead.get('phone'), lead.get('website'),
                       lead.get('address'), lead.get('rating'), lead.get('hours')))

        conn.commit()
        conn.close()
        print(f"Saved {len(data)} leads to {db_name}")
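Running the scraper repeatedly inserts the same businesses again, because the table above has no uniqueness constraint. One option, sketched with Python's built-in sqlite3 module, is a UNIQUE (name, address) constraint plus INSERT OR IGNORE. The schema and the save_leads name are illustrative; missing values are stored as empty strings because SQLite treats NULLs as distinct in UNIQUE constraints.

```python
import sqlite3
from typing import Iterable

COLUMNS = ("name", "phone", "website", "address", "rating", "hours")

SCHEMA = """
CREATE TABLE IF NOT EXISTS leads (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    phone TEXT,
    website TEXT,
    address TEXT,
    rating TEXT,
    hours TEXT,
    UNIQUE (name, address)
)
"""


def save_leads(conn: sqlite3.Connection, leads: Iterable[dict]) -> int:
    """Insert leads, skipping any (name, address) pair already stored; return rows added."""
    conn.execute(SCHEMA)
    # Use '' instead of NULL so the UNIQUE constraint can match missing addresses
    rows = [tuple(lead.get(col) or "" for col in COLUMNS)
            for lead in leads if lead.get("name")]
    placeholders = ", ".join("?" for _ in COLUMNS)
    before = conn.total_changes
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            f"INSERT OR IGNORE INTO leads ({', '.join(COLUMNS)}) VALUES ({placeholders})",
            rows,
        )
    return conn.total_changes - before
```

If leads.db was already created by save_to_database, its table has no UNIQUE constraint and CREATE TABLE IF NOT EXISTS will not add one, so use a fresh database file or migrate the table.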

Real-World Use Cases

  • Local service businesses: Plumbers, electricians, HVAC contractors scraping competitors in their service area.
  • Real estate agents: Collecting contractor and service provider leads for referral partnerships.
  • B2B sales teams: Building prospect lists for specific industries (restaurants, retail, healthcare).
  • Market research: Analyzing competitor density, pricing, and reviews across geographies.
  • Lead validation: Enriching existing CRM data with current phone numbers and websites from Google Maps.
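For the lead-validation case, a minimal sketch could match CRM records to scraped leads by normalized business name and overwrite phone and website with the freshly scraped values. The field names and enrich_crm are hypothetical, and exact name matching is a simplification: real CRM data usually needs fuzzier matching (for example, on address as well) to avoid merging different businesses that share a name.

```python
from typing import List, Optional, Sequence


def normalize_name(name: Optional[str]) -> str:
    """Lowercase and collapse whitespace so 'ACME  plumbing' matches 'Acme Plumbing'."""
    return " ".join((name or "").lower().split())


def enrich_crm(crm_records: List[dict], scraped: List[dict],
               fields: Sequence[str] = ("phone", "website")) -> List[dict]:
    """Return copies of CRM records updated with scraped values, plus a changed_fields list."""
    latest = {normalize_name(lead.get("name")): lead for lead in scraped if lead.get("name")}
    enriched = []
    for record in crm_records:
        updated = dict(record)  # never mutate the caller's records
        match = latest.get(normalize_name(record.get("name")))
        changed = []
        if match:
            for field in fields:
                new_value = match.get(field)
                if new_value and new_value != updated.get(field):
                    updated[field] = new_value
                    changed.append(field)
        updated["changed_fields"] = changed
        enriched.append(updated)
    return enriched
```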

Save Time with the Local Leads Pack

Building and maintaining a Google Maps scraper requires continuous updates as Google changes its DOM structure. Instead of managing this yourself, the Local Leads Pack ($29) provides:

  • Pre-built, production-ready scraper (updated monthly)
  • Support for 50+ business categories
  • Automatic proxy rotation and rate limiting
  • CSV export with dedupe and validation
  • Video tutorials and API documentation

Get the Local Leads Pack →

🔗 Google Maps MCP Server

Connect your AI agents directly to live Google Maps data. Use with Claude, GPT, or any AI assistant.

View MCP Server →


About the Author

The Next Gen Nexus covers AI agents, automation, and web data: practical guides for developers, analysts, and businesses working with data at scale.
