Goscrapy
Efficiently Extracting Business Data from Google Maps


Hi everyone,

Extracting reliable business data from Google Maps is a common requirement for market research and local lead generation. The technical implementation, however, is tricky, not so much because of anti-bot measures, but because of how Google structures its responses.

Instead of clean HTML or standard JSON, Google serves data in massive, obfuscated arrays that change based on your geographic center. While many developers default to heavy headless browsers to solve this, there is a much more efficient, "browserless" approach that relies on understanding the underlying API logic.

In this guide, we'll walk through how to solve these challenges with GoScrapy, step by step.

Let's get started.

💡 Tip: For this you will need to set up GoScrapy. If you haven't checked out our last article on how to do that, you should; it's very simple and will get you up and running in minutes.


Step 1: Defining the Data Blueprint

The first step is to define a model for the scraped data. Google Maps provides a wealth of information, but without a clear schema, your extraction logic will quickly become unmanageable. In GoScrapy, data is modeled using a record:

The Benefit:
Defining a typed structure upfront ensures that your data collection is consistent and that downstream processing (like CSV or database exports) remains stable even when dealing with varied business listings.

// google_maps_scraper/record.go

type Record struct {
    Query              string           `json:"query" csv:"query"`
    QueryLoc           location         `json:"-" csv:"-"`
    Status             string           `json:"status" csv:"status"`
    OpenHours          OpeningHours     `json:"opening_hours" csv:"-"`
    OpenHoursStr       string           `json:"opening_hours_str" csv:"opening_hours_str"`
    Phone              string           `json:"phone" csv:"phone"`
    WebResultsUrl      string           `json:"webresults_url" csv:"webresults_url"`
    ShortDescription   string           `json:"short_description" csv:"short_description"`
    Description        string           `json:"description" csv:"description"`
    TimeZone           string           `json:"timezone" csv:"timezone"`
    Categories         []string         `json:"category" csv:"category"`
    Title              string           `json:"title" csv:"title"`
    Website            string           `json:"website" csv:"website"`
    Review             float64          `json:"review" csv:"review"`
    ReviewDistribution string2Uint64Map `json:"review_distribution" csv:"review_distribution"`
    ReviewsUrl         string           `json:"reviews_url" csv:"reviews_url"`
    ReviewsCount       uint64           `json:"reviews_count" csv:"reviews_count"`
    Street             string           `json:"street" csv:"street"`
    City               string           `json:"city" csv:"city"`
    ZipCode            string           `json:"zipcode" csv:"zipcode"`
    State              string           `json:"state" csv:"state"`
    StateCode          string           `json:"state_code" csv:"state_code"`
    Country            string           `json:"country" csv:"country"`
    CountryCode        string           `json:"country_code" csv:"country_code"`
    Details            Details          `json:"details" csv:"details"`
    ReservationUrls    []string         `json:"reservation_urls" csv:"reservation_urls"`
    OrderUrls          []string         `json:"order_urls" csv:"order_urls"`
    OrderPlatforms     []string         `json:"order_platforms" csv:"order_platforms"`
    GoogleReserveUrl   string           `json:"google_reserve_url" csv:"google_reserve_url"`
}
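The `string2Uint64Map` field above hints at why a custom type helps: a map of review stars to counts has no natural CSV cell form. Here is a minimal sketch of such a type; the `MarshalCSV` method name follows the convention of gocsv-style encoders, and the exact hook GoScrapy's export pipeline uses may differ:

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// string2Uint64Map flattens a star-rating histogram (e.g. "5" -> 120)
// into a single CSV cell like "4=30;5=120".
type string2Uint64Map map[string]uint64

func (m string2Uint64Map) MarshalCSV() (string, error) {
	parts := make([]string, 0, len(m))
	for k, v := range m {
		parts = append(parts, fmt.Sprintf("%s=%d", k, v))
	}
	sort.Strings(parts) // deterministic output regardless of map iteration order
	return strings.Join(parts, ";"), nil
}

func main() {
	m := string2Uint64Map{"5": 120, "4": 30}
	s, _ := m.MarshalCSV()
	fmt.Println(s)
}
```

Sorting the parts matters: Go map iteration order is random, so without it the same record would serialize differently on every run.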

Step 2: Overcoming the "Spatial Anchor" Requirement

Most developers start by searching for a keyword, only to find the results are inconsistent. This is because the Search API requires a spatial "anchor" (Latitude and Longitude) to provide high-density results.

The Technique:
You can resolve any text query (e.g., "Gyms in Austin, TX") by first hitting a geocoding endpoint. Once you have the coordinates, you can center your search exactly where the density is highest, ensuring you don't miss data points. For jobs where we don't have lat/lng in advance, we grab the location for the query in the parseGeocoding step.

// google_maps_scraper/spider.go

func (s *Spider) StartRequest(ctx context.Context, job *Job) {
    if job.loc == nil {
        // Resolve coordinates to ensure a high-density search area
        req := prepareRequest(s.Request(ctx), generateGeocodingUrl(job.query), *job)
        s.Parse(req, s.parseGeocoding)
        return
    }

    // If we already have coordinates, the search results will be far more accurate
    req := prepareRequest(s.Request(ctx), generateSearchUrl("https://www.google.com", job), *job)
    s.Parse(req, s.parseMapListing)
}


func (s *Spider) parseGeocoding(ctx context.Context, resp core.IResponseReader) {
    job, ok := getJob(resp)
    if !ok {
        return
    }

    lat, lng, _, ok := extractGeocoding(resp.Bytes())
    if !ok {
        return
    }

    job.setLocation(lat, lng)
    s.Logger().Infof("Location found for %s: %f, %f", job.query, lat, lng)
    // check utils.go for details on prepareRequest
    req := prepareRequest(s.Request(ctx), generateSearchUrl("https://www.google.com", job), job)
    s.Parse(req, s.parseMapListing)
}
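The `job` value threaded through these callbacks carries the query together with its resolved coordinates and pagination state. A plausible shape for it, reconstructed from the fields used above (`query`, `loc`, `cursor`, `setLocation`); the real definition lives in the repo:

```go
package main

import "fmt"

// location anchors a search to a lat/lng pair.
type location struct {
	lat, lng float64
}

// Job tracks one query's state across asynchronous requests:
// a nil loc means "geocode first", cursor is the pagination offset.
type Job struct {
	query  string
	loc    *location
	cursor int
}

// setLocation records the resolved coordinates so StartRequest can
// skip the geocoding round-trip on subsequent pages.
func (j *Job) setLocation(lat, lng float64) {
	j.loc = &location{lat: lat, lng: lng}
}

func main() {
	j := Job{query: "Gyms in Austin, TX"}
	fmt.Println(j.loc == nil) // no coordinates yet -> geocode first
	j.setLocation(30.2672, -97.7431)
	fmt.Println(j.loc == nil) // coordinates resolved -> go straight to search
}
```

Keeping this state on the job (and passing it via request metadata) is what lets many queries run concurrently without global variables.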

Step 3: Handling Large Datasets via Cursor Offsets

Google Maps doesn't use simple "Next Page" links or page numbers. Instead, it relies on a cursor-based offset system embedded within a pb query parameter.

The Technique:
To crawl thousands of results, you need to track this offset state across your requests. Using GoScrapy's Meta context, we pass this "tracking job" between asynchronous workers without the risk of data races or state corruption that global variables would introduce.

// google_maps_scraper/spider.go

func (s *Spider) parseMapListing(ctx context.Context, resp core.IResponseReader) {
    job, ok := getJob(resp)
    if !ok {
        return
    }

    if resp.StatusCode() != 200 {
        return
    }
    // Extract the records surgically from the raw response
    records := extractMapResults(resp.Bytes())
    s.Logger().Infof("Found %d records for [%s] (At cursor %d)", len(records), job.query, job.cursor)
    if len(records) == 0 {
        return
    }

    // update cursor for the next page
    job.SetCursor(job.cursor + 20)
    for i := range records {
        // Yield each record to the CSV/JSON export pipeline, whichever is configured.
        // Indexing avoids taking the address of the loop variable.
        s.Yield(&records[i])
    }

    req := prepareRequest(s.Request(ctx), generateSearchUrl("https://www.google.com", job), job)
    s.Parse(req, s.parseMapListing)
}
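The pagination logic above boils down to simple cursor arithmetic: advance by the page size while results keep coming, stop on an empty page. Isolated as a plain function for clarity (the page size of 20 comes from the `job.cursor + 20` step above; `nextCursor` is an illustrative helper, not part of GoScrapy):

```go
package main

import "fmt"

const pageSize = 20 // Google Maps listings arrive in batches of 20

// nextCursor returns the offset for the following page and whether
// crawling should continue.
func nextCursor(cur, found int) (int, bool) {
	if found == 0 {
		return cur, false // empty page: results are exhausted
	}
	return cur + pageSize, true
}

func main() {
	cur, more := 0, true
	pages := []int{20, 20, 7, 0} // records found in each successive response
	for i := 0; more && i < len(pages); i++ {
		cur, more = nextCursor(cur, pages[i])
	}
	fmt.Println(cur, more)
}
```

Note that, like the spider above, this keeps requesting after a short page (7 records) and only stops on an empty one; Google sometimes returns partial batches mid-stream.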

Step 4: "Surgical" Extraction from Deeply Nested Arrays

Google's internal JSON is not served in standard "key-value" pairs. It is a deeply nested array structure designed for internal engine processing, not external consumption.

The Benefit:
Instead of trying to map the entire massive response into memory, you can use GJSON to perform "surgical" extractions, reaching directly into specific array indices (like index 11 for the title).

// google_maps_scraper/utils.go
// check repo examples for full code.

func extractMapResults(data []byte) []Record {
    // After stripping the non-JSON prefix, dive directly into the results array
    records := gjson.GetBytes(data, "0.1.#.14")
    // This lets us extract deep values with zero-allocation speed.
    // The per-field mapping is omitted here; pasting it all would create clutter,
    // so refer to the full code in the attached repo.
}
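To see why index-based access works, here is a stripped-down stdlib version of the same idea: treat each listing as a positional array and read a known index. The layout (title at index 11) follows the article's example but is an assumption about Google's current format, which can change without notice; the real code uses GJSON paths like `0.1.#.14` for the same effect.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// titleFromEntry reads the business title from a positional listing array.
// Index 11 is an assumed location, for illustration only.
func titleFromEntry(raw []byte) (string, bool) {
	var entry []any
	if err := json.Unmarshal(raw, &entry); err != nil || len(entry) <= 11 {
		return "", false
	}
	s, ok := entry[11].(string)
	return s, ok
}

func main() {
	// A fake listing: 11 placeholder slots, then the title at index 11.
	raw := []byte(`[null,null,null,null,null,null,null,null,null,null,null,"Acme Gym"]`)
	title, ok := titleFromEntry(raw)
	fmt.Println(title, ok)
}
```

The stdlib version allocates the whole array into `[]any`; GJSON avoids that by walking the raw bytes, which is where the speed advantage comes from.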

Conclusion: Performance in Production

By moving away from browser engines and toward a raw HTTP approach, you can achieve significantly higher throughput with minimal resources.

A GoScrapy-driven scraper compiles into a standalone 13MB binary, making it incredibly efficient to deploy in cloud environments compared to multi-hundred-megabyte headless browser images. This efficiency translates directly into lower infrastructure costs and faster data turnaround.

If you’re looking to scale your data extraction projects, you can find the full source code for this example and more in the Google Maps scraper repository.


Note: This scraper was created using GoScrapy for educational purposes only, to showcase the library's capabilities. I am not liable for any misuse of this scraper.
