Vhub Systems
Google Maps Places API Costs $32/1000 Results. The Scraper Approach Costs $2.

Scraping Google Maps: When APIs Get Pricey and Playwright Saves the Day

As a senior Python developer who's built over 35 web scrapers, I've tangled with Google Maps more times than I care to count. It's a goldmine for location data—business names, addresses, ratings, you name it—but Google's ecosystem is a fortress. Their Places API is powerful, but the costs can spiral out of control for large-scale scraping. That's when you turn to browser automation tools like Playwright to scrape directly from the web interface. In this tutorial, I'll walk you through the realities of scraping Google Maps, including a breakdown of API costs, Python code for handling dynamic loading via viewport scrolling, and a JSON schema for structured output. I'll be honest: this isn't foolproof. Google aggressively fights scrapers with CAPTCHAs, rate limits, and UI changes, so expect breakage and maintenance headaches.

We'll focus on scraping search results for places, like querying "coffee shops in New York" and extracting details. This approach mimics human browsing to load more results without hitting API paywalls. But a disclaimer upfront: scraping Google violates their terms of service, and it can lead to IP bans or legal issues. Use proxies, respect robots.txt (even if it's ignored here), and consider ethical implications. If your needs are small-scale, stick to the API. Let's dive in.

The Cost Math Behind Google's Places API: Why Scraping Becomes Tempting

Google's Places API is the official way to get location data. It offers endpoints like Text Search for querying places and Place Details for fetching specifics. But it's billed per request, and costs add up fast.

Here's the pricing as of my last check (always verify on Google's Cloud Console, as it changes):

  • Text Search: $32 per 1,000 requests (up to 20 results per request).
  • Place Details: $17 per 1,000 requests (basic fields) or $20+ for contact/atmosphere details.
  • Free tier: $200 in monthly credit, which covers ~6,250 Text Search requests if that's all you use.

Let's do some math for a real scenario. Suppose you want to scrape 10,000 coffee shops across a city. A single Text Search might return 20 results, so you'd need about 500 requests to cover them (10,000 / 20). That's $16 at $32/1,000. But wait—you often need pagination via next_page_token, which counts as additional requests. And for each place, you might call Place Details to get phone numbers or hours, adding another 10,000 requests at $17/1,000 = $170.

Total? Around $186 for one run, not including overages or premium fields. Scale to 100,000 places monthly, and you're looking at $1,860+ before taxes. If your project involves ongoing monitoring (e.g., tracking rating changes), multiply by 12 for yearly costs—over $22,000. That's enterprise-level budgeting.
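As a sanity check, the arithmetic above fits in a few lines (the prices are the ones quoted here — always verify current rates on Google's pricing page before budgeting):

```python
import math

# Prices quoted above, in USD per 1,000 requests (verify current rates).
TEXT_SEARCH_PER_1K = 32.0
PLACE_DETAILS_PER_1K = 17.0
RESULTS_PER_SEARCH = 20  # max results per Text Search request

def estimate_monthly_cost(num_places: int) -> float:
    """Rough API cost for fetching num_places via Text Search + Place Details."""
    search_requests = math.ceil(num_places / RESULTS_PER_SEARCH)
    search_cost = search_requests * TEXT_SEARCH_PER_1K / 1000
    details_cost = num_places * PLACE_DETAILS_PER_1K / 1000
    return search_cost + details_cost

print(estimate_monthly_cost(10_000))   # the 10,000-place scenario above
print(estimate_monthly_cost(100_000))  # scaled to 100,000 places
```

Note this is the optimistic floor: `next_page_token` pagination and premium Place Details fields only push the number up.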

The hard part? API limits: 100 requests per second, and quotas can cap you at 100,000 daily without negotiation. Plus, results are filtered and not always comprehensive—Google might omit places for business reasons. Scraping the web interface bypasses this, giving you raw, unfiltered data for free (minus infrastructure costs like proxies). But it's brittle: one UI tweak from Google, and your scraper breaks. I've lost days debugging after a Maps redesign.

If your volume is under 1,000 queries monthly, the API is a no-brainer for reliability. Beyond that, scraping might be worth the risk—thresholds vary, but we'll revisit this at the end.

Setting Up Playwright for Google Maps Scraping

To scrape Google Maps, we'll use Playwright, a Python library for browser automation. It drives Chromium (as well as Firefox and WebKit), and headless Chromium is a good fit for JavaScript-heavy sites like Maps. Install it with pip install playwright and run playwright install to download the browsers.

Our goal: Navigate to Google Maps, search for a term, scroll the results pane to load more dynamically, and extract data. Maps lazy-loads results as you scroll, so we need to simulate viewport scrolling.

First code example: Basic setup to launch a browser, search, and grab initial results. This uses async mode for efficiency.

import asyncio
from playwright.async_api import async_playwright

async def scrape_google_maps(query):
    async with async_playwright() as p:
        browser = await p.chromium.launch(headless=True)
        page = await browser.new_page()

        # Navigate to Google Maps
        await page.goto("https://www.google.com/maps")

        # Handle cookie consent if it appears (selectors can change)
        try:
            await page.click('button[aria-label="Accept all"]', timeout=5000)
        except Exception:
            pass  # consent dialog didn't appear

        # Search for the query
        await page.fill('input[aria-label="Search Google Maps"]', query)
        await page.press('input[aria-label="Search Google Maps"]', 'Enter')

        # Wait for results to load
        await page.wait_for_selector('div[role="feed"]', timeout=10000)

        # Extract initial place data (simplified; we'll expand later)
        places = await page.query_selector_all('div[role="article"]')
        for place in places:
            name = await place.query_selector('h3')  # Adjust selector as needed
            if name:
                print(await name.inner_text())

        await browser.close()

# Run it
asyncio.run(scrape_google_maps("coffee shops in New York"))

This gets you started, printing names from the first viewport. But it's incomplete—Maps loads ~20-60 results initially, and we need more. Selectors are fragile; Google uses obfuscated classes like hfpxzc for elements, which change often. The hard part here is reliability: CAPTCHAs pop up after a few runs, so integrate proxies (e.g., via browser = await p.chromium.launch(proxy={'server': 'http://yourproxy:port'})). Also, headless mode can trigger detection; switch to non-headless for testing but it's slower.

Handling Dynamic Loading: Playwright Viewport Scrolling

The real challenge with Google Maps is infinite scrolling in the results pane. Results load as you scroll down, fetching via XHR requests. To mimic this, we scroll the viewport programmatically until no more results appear.

Second code example: Extend the first with scrolling logic. We target the scrollable div (div[role="feed"]), scroll it in increments, and check for new content. This can load 100+ results per search, depending on the query.

import asyncio
from playwright.async_api import async_playwright
import time

async def scrape_google_maps_with_scroll(query, max_results=100):
    async with async_playwright() as p:
        browser = await p.chromium.launch(headless=True)
        page = await browser.new_page()

        await page.goto("https://www.google.com/maps")
        try:
            await page.click('button[aria-label="Accept all"]', timeout=5000)
        except Exception:
            pass  # consent dialog didn't appear

        await page.fill('input[aria-label="Search Google Maps"]', query)
        await page.press('input[aria-label="Search Google Maps"]', 'Enter')
        await page.wait_for_selector('div[role="feed"]', timeout=10000)

        # Find the scrollable feed
        feed = await page.query_selector('div[role="feed"]')
        if not feed:
            print("No results feed found")
            await browser.close()
            return []

        results = []
        last_height = 0
        while len(results) < max_results:
            # Scroll down by a fixed amount (e.g., 1000 pixels)
            await page.evaluate('''
                (feed) => {
                    feed.scrollBy(0, 1000);
                }
            ''', feed)

            # Wait for new content to load
            await asyncio.sleep(2)  # non-blocking pause; tune per network, or prefer wait_for_selector
            current_height = await page.evaluate('(feed) => feed.scrollHeight', feed)

            # Break if no more scrolling (end of results)
            if current_height == last_height:
                break
            last_height = current_height

            # Extract places (simplified parsing)
            places = await page.query_selector_all('div[role="article"]')
            for place in places:
                # Parse details here; add to results if not already
                pass  # Implement extraction logic

        await browser.close()
        return results

# Run it
results = asyncio.run(scrape_google_maps_with_scroll("coffee shops in New York"))
print(f"Loaded {len(results)} results")

This scrolls in 1000-pixel chunks, pausing to let JS load new items. The honest hard part: Timing is tricky—too fast, and you miss loads; too slow, and it's inefficient. Google's anti-bot measures might throttle or block you mid-scroll. Extraction isn't shown fully here (to keep it concise), but you'd use await place.query_selector for elements like name (span.fontHeadlineSmall), address (span.fontBodyMedium), etc. Parse carefully—hours might be in a nested div, and reviews in attributes.
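Since the per-card parsing is the fragile bit, I like to keep it in plain Python functions that take the card's text (from inner_text() or the aria-label), so it's easy to unit-test without a browser. A sketch, assuming rating text shaped like "4.6 stars 1,234 Reviews" — the exact wording is an assumption, so verify it against the live DOM:

```python
import re
from typing import Optional

def parse_rating_text(text: str) -> tuple[Optional[float], Optional[int]]:
    """Pull a star rating and review count out of a result card's text.

    Assumes text like "4.6 stars 1,234 Reviews"; returns (None, None)
    when the patterns don't match (e.g. a place with no reviews yet).
    """
    rating = None
    reviews = None
    m = re.search(r"(\d\.\d)\s*star", text, re.IGNORECASE)
    if m:
        rating = float(m.group(1))
    m = re.search(r"([\d,]+)\s*review", text, re.IGNORECASE)
    if m:
        reviews = int(m.group(1).replace(",", ""))
    return rating, reviews

print(parse_rating_text("4.6 stars 1,234 Reviews"))
```

Keeping extraction pure like this means a Maps redesign only forces you to update regexes and selectors, not the scrolling logic.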

For robustness, monitor scroll height deltas and prefer explicit waits (wait_for_selector or wait_for_load_state) over fixed sleeps. If you hit a wall (literally, the "You've reached the end" message), detect it via text content.
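That end-of-list check can live in a tiny helper too — the sentinel wording below is an assumption, so confirm it against the live UI (and note it varies by Google's interface language):

```python
# Sentinel strings Maps shows at the bottom of the feed (assumed wording;
# verify in the live UI, and add localized variants if needed).
END_MARKERS = ("you've reached the end of the list",)

def reached_end(feed_text: str) -> bool:
    """True if the feed's text contains an end-of-results marker."""
    lowered = feed_text.lower()
    return any(marker in lowered for marker in END_MARKERS)
```

In the scroll loop you'd call this on the feed's inner_text() alongside the scroll-height comparison, and break on either signal.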

Structuring Your Output: A JSON Schema for Place Data

To make your scraped data usable, output it in a consistent JSON format. Here's a simple JSON schema for the key fields we mentioned. Use libraries like jsonschema to validate.

{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "type": "object",
  "properties": {
    "name": {"type": "string"},
    "address": {"type": "string"},
    "phone": {"type": "string"},
    "rating": {"type": "number"},
    "reviews_count": {"type": "integer"},
    "website": {"type": "string"},
    "category": {"type": "string"},
    "hours": {
      "type": "array",
      "items": {
        "type": "object",
        "properties": {
          "day": {"type": "string"},
          "open": {"type": "string"},
          "close": {"type": "string"}
        },
        "required": ["day", "open", "close"]
      }
    }
  },
  "required": ["name", "address"]
}

In your code, build a list of dicts matching this schema. For example, parse hours as an array of day-time objects. Missing fields? Set to null—real data is messy, and not every place has a phone or website.
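A minimal sketch of that normalization step — the field names match the schema above, but the validation here is a hand-rolled required-fields check, not a full JSON Schema validator like jsonschema:

```python
REQUIRED = ["name", "address"]
OPTIONAL = ["phone", "rating", "reviews_count", "website", "category", "hours"]

def normalize_place(raw: dict) -> dict:
    """Shape a raw parse into the schema above; absent optionals become None."""
    missing = [field for field in REQUIRED if not raw.get(field)]
    if missing:
        raise ValueError(f"missing required field(s): {missing}")
    record = {field: raw[field] for field in REQUIRED}
    for field in OPTIONAL:
        record[field] = raw.get(field)  # serializes to null in JSON
    return record

place = normalize_place({"name": "Blue Bottle", "address": "1 Main St", "rating": 4.6})
print(place["phone"])  # None -- not every place lists one
```

Raising on missing required fields (rather than silently skipping) makes selector breakage visible in your logs the day Google changes the UI.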

The hard part with schemas: Google Maps data isn't uniform. Ratings might be absent for new places, and hours can be seasonal. Your parser must handle exceptions without crashing.

Scaling and Alternatives

For production, wrap this in a loop for multiple queries, rotate proxies, and store in a database. But maintenance is a grind—I've rebuilt Maps scrapers thrice due to updates. If you want a turnkey solution without the hassle, check out this Apify actor I've used: https://apify.com/lanky_quantifier/google-maps-scraper (it's clocked 42 runs and handles much of the scrolling/anti-bot logic).
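The multi-query loop itself is mostly plumbing; here's a sketch of the rotation logic, with placeholder proxy endpoints (swap in your own provider's URLs):

```python
from itertools import cycle

def plan_runs(queries, proxies):
    """Pair each query with the next proxy in rotation."""
    return [(query, next(proxies)) for query in queries]

queries = ["coffee shops in New York", "coffee shops in Boston"]
proxies = cycle([
    "http://proxy1.example:8080",  # placeholder proxy endpoints
    "http://proxy2.example:8080",
])

for query, proxy in plan_runs(queries, proxies):
    # In the real scraper, pass this to Playwright:
    #   p.chromium.launch(proxy={"server": proxy})
    print(query, "via", proxy)
```

Round-robin is the simplest policy; in practice you'd also retire proxies that start returning CAPTCHAs.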

In summary, scraping Google Maps with Playwright is a cost-effective API alternative for high-volume needs, but it's fraught with instability. Weigh the trade-offs carefully.

What's your threshold for choosing the API vs. scraping on Google products?

