agenthustler
Google Shopping Scraping: Extract Product Listings, Prices and Reviews

In the competitive world of e-commerce, pricing intelligence can make or break your business. Google Shopping aggregates product listings from thousands of merchants, making it one of the richest sources of product data on the web. Whether you're monitoring competitor prices, building a price comparison tool, or conducting market research, extracting data from Google Shopping gives you a massive competitive advantage.

In this comprehensive guide, we'll walk through the structure of Google Shopping, how to extract product listings, prices, reviews, merchant data, and Product Listing Ads (PLAs) — and how to do it all at scale using Apify.

Understanding Google Shopping's Data Structure

Google Shopping (shopping.google.com) is a product search engine that displays items from merchants who list through Google Merchant Center. The data is organized into several key layers:

Product Search Results

When you search for a product on Google Shopping, you get a results page containing:

  • Product cards — Each card shows a thumbnail, title, price, store name, rating, and shipping info
  • Filter sidebar — Brand, price range, condition (new/used), seller ratings
  • Sponsored listings (PLAs) — Paid product ads that appear at the top
  • Organic listings — Non-paid results sorted by relevance

Product Detail Pages

Clicking into a product reveals:

  • Full product description and specifications
  • Price comparisons across multiple merchants
  • Aggregated review scores and individual reviews
  • Shipping costs and delivery estimates per seller
  • Product images and sometimes video
  • Related/similar products

Merchant Data

Each listing links back to a merchant with:

  • Store name and URL
  • Seller rating (1-5 stars)
  • Number of reviews
  • Return policy indicators
  • Shipping speed and cost

Understanding this structure is essential because it determines what data you can extract and how you should structure your scraping pipeline.
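As a concrete target for a scraping pipeline, here is one way to model a single scraped record combining the three layers above. The field names are our own convention for illustration, not anything Google defines:

```javascript
// Example shape for one scraped record, combining the three data layers.
// Field names are our own convention, not a Google schema.
function buildProductRecord(card, offers = [], reviewSummary = null) {
    return {
        // Layer 1: search result card
        title: card.title,
        price: card.price,
        merchant: card.merchant,
        rating: card.rating,
        // Layer 2: per-merchant offers from the product detail page
        offers,
        // Layer 3: aggregated review data
        reviewSummary,
        scrapedAt: new Date().toISOString()
    };
}

const record = buildProductRecord(
    { title: 'Acme Headphones', price: '$49.99', merchant: 'Acme Store', rating: '4.5' },
    [{ merchant: 'Acme Store', totalPrice: '$54.98' }]
);
```

Deciding on a record shape up front makes it easier to merge data extracted at different stages of the crawl.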

Why Scrape Google Shopping?

Before diving into the technical implementation, let's understand the use cases:

Price Monitoring: Track how competitors price products across hundreds or thousands of SKUs. Get alerts when prices drop or rise significantly.

Market Research: Understand which products are trending, what price ranges dominate a category, and which merchants are most active.

Product Catalog Enrichment: Pull product descriptions, images, and specifications to enrich your own product database.

Review Aggregation: Collect review data to understand consumer sentiment, common complaints, and product strengths across categories.

Advertising Intelligence: Monitor Product Listing Ads to understand competitors' ad strategies, which keywords they target, and how they position their products.
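For the price-monitoring use case above, the alerting logic can start as a simple percentage-change check between two scrape runs. The 10% default threshold here is illustrative; tune it per category:

```javascript
// Flag significant price moves between two scrape runs.
// The 10% default threshold is illustrative — tune it per category.
function detectPriceChange(previous, current, thresholdPct = 10) {
    if (!previous || !current) return null;
    const pct = ((current - previous) / previous) * 100;
    return {
        changedPct: Math.round(pct * 100) / 100,
        alert: Math.abs(pct) >= thresholdPct,
        direction: pct > 0 ? 'up' : pct < 0 ? 'down' : 'flat'
    };
}
```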

Setting Up Your Google Shopping Scraper

Method 1: Direct HTTP Requests with Cheerio

The simplest approach starts with making requests to Google Shopping search URLs:

const Apify = require('apify');
const cheerio = require('cheerio');

Apify.main(async () => {
    const input = await Apify.getInput();
    const { searchQueries, maxResults = 100 } = input;

    const requestList = await Apify.openRequestList('google-shopping',
        searchQueries.map(query => ({
            url: `https://www.google.com/search?q=${encodeURIComponent(query)}&tbm=shop`,
            userData: { query }
        }))
    );

    const dataset = await Apify.openDataset();

    const crawler = new Apify.CheerioCrawler({
        requestList,
        maxRequestsPerCrawl: maxResults, // caps pages crawled, not products returned
        handlePageFunction: async ({ request, $ }) => {
            const products = [];

            // Parse product cards from search results.
            // NOTE: Google's obfuscated class names (.sh-dgr__grid-result,
            // .a8Pemb, etc.) change frequently — verify selectors before each run.
            $('.sh-dgr__grid-result').each((index, element) => {
                const $el = $(element);

                const product = {
                    title: $el.find('h3').text().trim(),
                    price: $el.find('.a8Pemb').text().trim(),
                    merchant: $el.find('.aULzUe').text().trim(),
                    rating: $el.find('.Rsc7Yb').text().trim(),
                    reviewCount: $el.find('.NzUzee span').text().trim(),
                    imageUrl: $el.find('img').attr('src'),
                    productUrl: $el.find('a').attr('href'),
                    searchQuery: request.userData.query,
                    scrapedAt: new Date().toISOString()
                };

                products.push(product);
            });

            await dataset.pushData(products);
            console.log(`Extracted ${products.length} products for: ${request.userData.query}`);
        },
        proxyConfiguration: await Apify.createProxyConfiguration({
            groups: ['GOOGLE_SERP']
        })
    });

    await crawler.run();
});

Method 2: Using Playwright for Dynamic Content

Google Shopping relies heavily on client-side JavaScript rendering. For more reliable extraction, use a browser-based approach:

const Apify = require('apify');

Apify.main(async () => {
    const input = await Apify.getInput();
    const { searchQueries, maxPages = 3, extractReviews = false } = input;

    const requestList = await Apify.openRequestList('google-shopping-pw',
        searchQueries.map(query => ({
            url: `https://shopping.google.com/search?q=${encodeURIComponent(query)}`,
            userData: { query, page: 1 }
        }))
    );

    // A request queue lets us enqueue pagination URLs discovered at runtime
    const requestQueue = await Apify.openRequestQueue();

    const crawler = new Apify.PlaywrightCrawler({
        requestList,
        requestQueue,
        launchContext: {
            launchOptions: { headless: true }
        },
        handlePageFunction: async ({ request, page }) => {
            // Wait for product grid to load
            await page.waitForSelector('[data-sh-gr]', { timeout: 15000 });

            // Scroll down to load more results
            for (let i = 0; i < 5; i++) {
                await page.evaluate(() => window.scrollBy(0, 800));
                await page.waitForTimeout(1000);
            }

            // Extract product data
            const products = await page.evaluate(() => {
                const items = [];
                document.querySelectorAll('.sh-dgr__content').forEach(el => {
                    items.push({
                        title: el.querySelector('h3')?.textContent?.trim() || '',
                        price: el.querySelector('[data-sh-or="price"]')?.textContent?.trim() || '',
                        merchant: el.querySelector('.E5ocAb')?.textContent?.trim() || '',
                        rating: el.querySelector('.Rsc7Yb')?.textContent?.trim() || '',
                        shipping: el.querySelector('.vEjMR')?.textContent?.trim() || '',
                        thumbnail: el.querySelector('img')?.src || '',
                        link: el.querySelector('a')?.href || ''
                    });
                });
                return items;
            });

            // Add metadata
            const enrichedProducts = products.map(p => ({
                ...p,
                searchQuery: request.userData.query,
                pageNumber: request.userData.page,
                scrapedAt: new Date().toISOString()
            }));

            await Apify.pushData(enrichedProducts);

            // Handle pagination by enqueueing the next page
            if (request.userData.page < maxPages) {
                const nextButton = await page.$('[aria-label="Next page"]');
                if (nextButton) {
                    const nextUrl = await nextButton.evaluate(el => el.href);
                    await requestQueue.addRequest({
                        url: nextUrl,
                        userData: {
                            query: request.userData.query,
                            page: request.userData.page + 1
                        }
                    });
                }
            }
        },
        proxyConfiguration: await Apify.createProxyConfiguration({
            groups: ['GOOGLE_SERP']
        })
    });

    await crawler.run();
});

Extracting Price Comparisons Across Merchants

One of Google Shopping's most valuable features is price comparison. When you click on a product, you see prices from multiple merchants. Here's how to extract that data:

async function extractPriceComparisons(page, productUrl) {
    await page.goto(productUrl, { waitUntil: 'networkidle' });

    // Wait for merchant comparison table
    await page.waitForSelector('.sh-osd__offer-row', { timeout: 10000 });

    const offers = await page.evaluate(() => {
        const results = [];
        document.querySelectorAll('.sh-osd__offer-row').forEach(row => {
            results.push({
                merchant: row.querySelector('.kPMwsc')?.textContent?.trim(),
                merchantRating: row.querySelector('.uYNZm')?.textContent?.trim(),
                totalPrice: row.querySelector('.drzWO')?.textContent?.trim(),
                basePrice: row.querySelector('.g9WBQb')?.textContent?.trim(),
                shipping: row.querySelector('.SuRhMd')?.textContent?.trim(),
                tax: row.querySelector('.NAPsLd')?.textContent?.trim(),
                availability: row.querySelector('.mEfnBd')?.textContent?.trim(),
                merchantUrl: row.querySelector('a.b5ycib')?.href
            });
        });
        return results;
    });

    return offers;
}

This gives you a complete picture of pricing across all sellers for a specific product, including base price, shipping, tax, and total cost.
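Once `extractPriceComparisons` returns its offer list, picking the cheapest all-in option is a small post-processing step. The price parsing below assumes US-style `$1,234.56` strings; adapt it for other locales:

```javascript
// Parse a US-style "$1,234.56" price string into a number
function parseMoney(str) {
    const match = (str || '').replace(/,/g, '').match(/\d+(\.\d+)?/);
    return match ? parseFloat(match[0]) : null;
}

// Pick the offer with the lowest total price from the extracted offer rows
function findBestOffer(offers) {
    return offers
        .map(o => ({ ...o, total: parseMoney(o.totalPrice) }))
        .filter(o => o.total !== null)
        .sort((a, b) => a.total - b.total)[0] || null;
}
```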

Extracting Review Data

Reviews on Google Shopping are aggregated from multiple sources. Here's how to capture them:

async function extractReviews(page, productUrl) {
    await page.goto(productUrl);

    // Click on the reviews tab
    const reviewTab = await page.$('[data-tab="reviews"]');
    if (reviewTab) {
        await reviewTab.click();
        await page.waitForTimeout(2000);
    }

    const reviewData = await page.evaluate(() => {
        // Aggregate review summary
        const summary = {
            averageRating: document.querySelector('.uYNZm')?.textContent?.trim(),
            totalReviews: document.querySelector('.HiT7Id')?.textContent?.trim(),
            ratingDistribution: {}
        };

        // Rating breakdown (5-star, 4-star, etc.)
        document.querySelectorAll('.JuVwI').forEach(bar => {
            const stars = bar.querySelector('.hTj2Xb')?.textContent?.trim();
            const count = bar.querySelector('.z1HJj')?.textContent?.trim();
            if (stars && count) {
                summary.ratingDistribution[stars] = count;
            }
        });

        // Individual reviews
        const reviews = [];
        document.querySelectorAll('.z6XoBf').forEach(review => {
            reviews.push({
                author: review.querySelector('.sPPcBf')?.textContent?.trim(),
                rating: review.querySelector('.UzThIf')?.getAttribute('aria-label'),
                date: review.querySelector('.ff3bE')?.textContent?.trim(),
                text: review.querySelector('.g1lvWe')?.textContent?.trim(),
                source: review.querySelector('.sPPcBf .HiT7Id')?.textContent?.trim(),
                helpful: review.querySelector('.XlPrCd')?.textContent?.trim()
            });
        });

        return { summary, reviews };
    });

    return reviewData;
}
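Note that the per-review rating above comes back as an `aria-label` string rather than a number. A small parser converts it; the exact label wording varies by locale, so treat the regex as an assumption to verify against live pages:

```javascript
// Convert rating aria-labels like "Rated 4.5 out of 5 stars" to numbers.
// Google's exact wording varies by locale — verify against live pages.
function parseRatingLabel(label) {
    const match = (label || '').match(/(\d+(?:\.\d+)?)\s*out of\s*(\d+)/i);
    if (!match) return null;
    return { rating: parseFloat(match[1]), scale: parseInt(match[2], 10) };
}
```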

Extracting Product Listing Ads (PLAs)

Product Listing Ads are the sponsored results at the top of Google Shopping. They're valuable for competitive intelligence:

async function extractPLAs(page, searchQuery) {
    const url = `https://www.google.com/search?q=${encodeURIComponent(searchQuery)}&tbm=shop`;
    await page.goto(url, { waitUntil: 'networkidle' });

    const plas = await page.evaluate(() => {
        const ads = [];
        // PLAs appear in the commercial unit at top
        document.querySelectorAll('.mnr-c.pla-unit').forEach(ad => {
            ads.push({
                isSponsored: true,
                title: ad.querySelector('.pymv4e')?.textContent?.trim(),
                price: ad.querySelector('.e10twf')?.textContent?.trim(),
                merchant: ad.querySelector('.LbUacb')?.textContent?.trim(),
                rating: ad.querySelector('.Rsc7Yb')?.textContent?.trim(),
                specialOffer: ad.querySelector('.VRpiue')?.textContent?.trim(),
                imageUrl: ad.querySelector('img')?.src,
                destinationUrl: ad.querySelector('a')?.href
            });
        });
        return ads;
    });

    return plas.map(pla => ({
        ...pla,
        searchQuery,
        extractedAt: new Date().toISOString()
    }));
}
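A common way to use the extracted PLA data is measuring share of voice: what fraction of sponsored slots each merchant occupies across your tracked queries. Given the `plas` array from `extractPLAs`, that's a simple aggregation:

```javascript
// Compute each merchant's share of sponsored (PLA) slots
// across the results of one or more extractPLAs calls.
function shareOfVoice(plas) {
    const counts = {};
    for (const pla of plas) {
        const m = pla.merchant || 'unknown';
        counts[m] = (counts[m] || 0) + 1;
    }
    const total = plas.length || 1;
    return Object.fromEntries(
        Object.entries(counts).map(([m, c]) => [m, +(c / total).toFixed(3)])
    );
}
```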

Building a Complete Price Intelligence Pipeline

Now let's put it all together into a production-ready Apify actor that handles search, price comparison, and review extraction:

const Apify = require('apify');

Apify.main(async () => {
    const input = await Apify.getInput();
    const {
        queries = [],
        maxProductsPerQuery = 50,
        extractPriceComparisons = true,
        extractProductReviews = false,
        outputFormat = 'json'
    } = input;

    const proxyConfig = await Apify.createProxyConfiguration({
        groups: ['GOOGLE_SERP']
    });

    // Minimal label-based router — the v1 SDK has no built-in router,
    // so we dispatch on request.userData.label ourselves
    const routes = {};
    const router = async (ctx) => routes[ctx.request.userData.label](ctx);
    router.addHandler = (label, handler) => { routes[label] = handler; };

    const requestQueue = await Apify.openRequestQueue();

    // Handle search result pages
    router.addHandler('SEARCH', async ({ request, page }) => {
        await page.waitForSelector('.sh-dgr__grid-result', { timeout: 15000 });

        const productLinks = await page.evaluate(() => {
            return Array.from(document.querySelectorAll('.sh-dgr__grid-result a'))
                .map(a => a.href)
                .filter(href => href.includes('/shopping/product/'));
        });

        // Queue product detail pages
        const detailRequests = productLinks
            .slice(0, maxProductsPerQuery)
            .map(url => ({
                url,
                userData: {
                    label: 'PRODUCT_DETAIL',
                    query: request.userData.query
                }
            }));

        for (const req of detailRequests) {
            await requestQueue.addRequest(req);
        }
    });

    // Handle product detail pages
    router.addHandler('PRODUCT_DETAIL', async ({ request, page }) => {
        await page.waitForSelector('.sh-pr__product-results', { timeout: 10000 });

        const productData = await page.evaluate(() => ({
            title: document.querySelector('.BvQan')?.textContent?.trim(),
            description: document.querySelector('.sh-ds__trunc-txt')?.textContent?.trim(),
            averagePrice: document.querySelector('.MLYgAb')?.textContent?.trim(),
            priceRange: document.querySelector('.qYzNBe')?.textContent?.trim(),
            mainImage: document.querySelector('.sh-div__image img')?.src,
            specifications: Array.from(document.querySelectorAll('.sh-pr__spec-value'))
                .map(el => el.textContent?.trim())
        }));

        // Extract price comparisons if enabled
        let priceComparisons = [];
        if (extractPriceComparisons) {
            priceComparisons = await page.evaluate(() => {
                return Array.from(document.querySelectorAll('.sh-osd__offer-row'))
                    .map(row => ({
                        merchant: row.querySelector('.kPMwsc')?.textContent?.trim(),
                        price: row.querySelector('.drzWO')?.textContent?.trim(),
                        shipping: row.querySelector('.SuRhMd')?.textContent?.trim()
                    }));
            });
        }

        await Apify.pushData({
            ...productData,
            priceComparisons,
            searchQuery: request.userData.query,
            url: request.url,
            scrapedAt: new Date().toISOString()
        });
    });

    const crawler = new Apify.PlaywrightCrawler({
        requestQueue,
        handlePageFunction: router,
        proxyConfiguration: proxyConfig,
        maxConcurrency: 5,
        handlePageFunctionTimeoutSecs: 60
    });

    // Seed the queue with the initial search requests
    for (const query of queries) {
        await requestQueue.addRequest({
            url: `https://shopping.google.com/search?q=${encodeURIComponent(query)}`,
            userData: { label: 'SEARCH', query }
        });
    }
    await crawler.run();
});

Using Pre-Built Apify Actors for Google Shopping

If you want to skip building from scratch, Apify Store has ready-made actors for Google Shopping scraping. These handle proxy rotation, CAPTCHA solving, and data parsing out of the box.

To use a Google Shopping actor from the Apify Store:

const Apify = require('apify');

const client = Apify.newClient({ token: 'YOUR_APIFY_TOKEN' });

(async () => {
    // Run a Google Shopping scraper actor — the actor ID and input
    // fields depend on the specific actor you pick from the Store
    const run = await client.actor('apify/google-shopping-scraper').call({
        queries: ['wireless headphones', 'bluetooth speaker'],
        maxResults: 100,
        countryCode: 'us',
        languageCode: 'en',
        includeReviews: true
    });

    // Fetch results
    const { items } = await client.dataset(run.defaultDatasetId).listItems();
    console.log(`Extracted ${items.length} products`);
})();

This approach saves significant development time and handles edge cases like rate limiting, IP rotation, and dynamic page rendering that would take weeks to build from scratch.

Handling Common Challenges

Rate Limiting and CAPTCHAs

Google aggressively rate-limits automated requests. Key strategies:

  • Use residential proxies: Datacenter IPs get blocked quickly. Apify's proxy pool includes residential IPs that are much harder to detect.
  • Implement delays: Add random delays between requests (2-5 seconds).
  • Rotate user agents: Change browser fingerprints between requests.
  • Use session management: Maintain cookies and session state to appear more human-like.
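The delay and user-agent points above can be sketched as two small helpers you call before each request. The user-agent strings below are illustrative examples, not a production pool:

```javascript
// Random delay within a range — e.g. the 2–5 s suggested above
function randomBetween(minMs, maxMs) {
    return minMs + Math.floor(Math.random() * (maxMs - minMs + 1));
}

const sleep = (ms) => new Promise(resolve => setTimeout(resolve, ms));

// Rotate user agents — these strings are illustrative examples only
const USER_AGENTS = [
    'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36',
    'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36'
];

function pickUserAgent() {
    return USER_AGENTS[Math.floor(Math.random() * USER_AGENTS.length)];
}

// Usage before each request:
// await sleep(randomBetween(2000, 5000));
// headers['User-Agent'] = pickUserAgent();
```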

Dynamic Content Loading

Google Shopping uses infinite scroll and lazy loading. Make sure you:

  • Scroll the page programmatically before extracting data
  • Wait for specific selectors rather than using fixed timeouts
  • Handle "Load more" buttons when present
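Rather than the fixed five scroll iterations used in Method 2, you can count visible result cards after each scroll and stop once the count stabilizes. The stop decision itself is framework-agnostic:

```javascript
// Decide when an infinite-scroll loop should stop: after `maxIdle`
// consecutive rounds where the item count did not grow.
function makeScrollTracker(maxIdle = 2) {
    let lastCount = 0;
    let idleRounds = 0;
    return function shouldStop(currentCount) {
        idleRounds = currentCount > lastCount ? 0 : idleRounds + 1;
        lastCount = currentCount;
        return idleRounds >= maxIdle;
    };
}

// In a Playwright scroll loop you would call it after each scroll, e.g.:
// const shouldStop = makeScrollTracker();
// const count = await page.$$eval('.sh-dgr__content', els => els.length);
// if (shouldStop(count)) break;
```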

Data Normalization

Prices come in different formats ($19.99, $19.99 - $24.99, "From $19.99"). Build a normalization layer:

function normalizePrice(priceStr) {
    if (!priceStr) return null;
    // Keep the dashes so ranges like "$19.99 - $24.99" still split below
    const cleaned = priceStr.replace(/[^0-9.,\-–]/g, '');
    const parts = cleaned.split(/[-–]/).filter(Boolean);
    return {
        min: parseFloat(parts[0]?.replace(/,/g, '')) || null,
        max: parts[1] ? parseFloat(parts[1].replace(/,/g, '')) : null,
        raw: priceStr
    };
}

Best Practices for Production Scraping

  1. Respect robots.txt: Always check and respect the site's crawling policies
  2. Cache aggressively: Don't re-scrape data that hasn't changed
  3. Monitor your success rate: Track how many requests succeed vs fail
  4. Use webhooks: Set up notifications for when scraping jobs complete
  5. Export data in multiple formats: JSON, CSV, and Excel for different stakeholders
  6. Schedule regular runs: Set up recurring scraping jobs for ongoing price monitoring
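Point 3, monitoring your success rate, can start as a simple in-process counter you log at the end of each run:

```javascript
// Track request outcomes so you can alert when the failure rate climbs
function createRunStats() {
    const stats = { succeeded: 0, failed: 0 };
    return {
        ok: () => stats.succeeded++,
        fail: () => stats.failed++,
        summary() {
            const total = stats.succeeded + stats.failed;
            return {
                ...stats,
                total,
                successRate: total ? +(stats.succeeded / total).toFixed(3) : null
            };
        }
    };
}

const stats = createRunStats();
stats.ok(); stats.ok(); stats.fail();
```

Call `stats.ok()` / `stats.fail()` from your page handler and error handler, then push `stats.summary()` to a log or dashboard when the crawl finishes.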

Conclusion

Google Shopping scraping opens up powerful possibilities for price intelligence, competitive analysis, and market research. Whether you build custom scrapers or use pre-built Apify actors, the key is building a reliable pipeline that handles Google's anti-scraping measures while extracting clean, structured data.

The combination of Apify's infrastructure (proxies, browser automation, scheduling) with the techniques covered in this guide gives you everything you need to build a production-grade Google Shopping data pipeline. Start with a focused use case — like monitoring prices for 100 SKUs in your niche — and scale from there as you validate the business value of the data.


Ready to start scraping Google Shopping? Check out the Apify Store for ready-to-use scrapers, or build your own using the Apify SDK.
