Artur Chukhrai for SerpApi

Originally published at serpapi.com

How to Scrape Home Depot Product Data with SerpApi

Intro

In this blog post, we'll go through the process of extracting product data from The Home Depot using The Home Depot Product API and the Python programming language.

To extract The Home Depot Product results, you need to pass the product_id parameter, which identifies a specific product. You can get this parameter from search results. Have a look at the Integrate The Home Depot Search Page Results Data with SerpApi and Python blog post, in which I described in detail how to extract all the needed data.

You can look at the complete code in the online IDE (Replit).

If you prefer video format, we have a dedicated video that shows how to do that: The Home Depot Product API - SerpApi.

What will be scraped

[Image: The Home Depot product data that will be scraped]

Why use an API?

There are a few reasons to use an API, and ours in particular:

  • No need to create a parser from scratch and maintain it.
  • No need to bypass blocks from the target site yourself, such as solving CAPTCHAs or working around IP blocks.
  • No need to pay for proxies and CAPTCHA solvers.
  • No need to use browser automation.

SerpApi handles everything on the backend, with response times under ~2.5 seconds per request (~1.2 seconds with Ludicrous Speed) and without browser automation, which makes requests much faster. Response times and success rates are shown on the SerpApi Status page.

[Image: SerpApi Status page]

Full Code

This code retrieves all the data for each of the 24 products on the first page of search results:

from serpapi import GoogleSearch
import json

params = {
    'api_key': '...',           # https://serpapi.com/manage-api-key
    'engine': 'home_depot',     # SerpApi search engine 
    'q': 'coffee maker',        # query
}

search = GoogleSearch(params)   # where data extraction happens on the SerpApi backend
results = search.get_dict()     # JSON -> Python dict

product_ids = [result['product_id'] for result in results['products']]

home_depot_products = []

for product_id in product_ids:
    product_params = {
        'api_key': '...',                   # https://serpapi.com/manage-api-key
        'engine': 'home_depot_product',     # SerpApi search engine 
        'product_id': product_id,           # HomeDepot ID of a product
    }

    product_search = GoogleSearch(product_params)
    product_results = product_search.get_dict()

    home_depot_products.append(product_results['product_results'])

print(json.dumps(home_depot_products, indent=2, ensure_ascii=False))

Preparation

Install library:

pip install google-search-results

google-search-results is a SerpApi API package.

Code Explanation

Import libraries:

from serpapi import GoogleSearch
import json

| Library | Purpose |
|---|---|
| GoogleSearch | to scrape and parse results using the SerpApi web scraping library. |
| json | to convert the extracted data to a JSON string. |

At the beginning of the code, you need to make the request in order to get search results. Then the product_id will be extracted from them.

The parameters are defined for generating the URL. If you want to pass other parameters to the URL, you can do so using the params dictionary:

params = {
    'api_key': '...',           # https://serpapi.com/manage-api-key
    'engine': 'home_depot',     # SerpApi search engine 
    'q': 'coffee maker',        # query
}
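Hard-coding the key is fine for a quick test, but a common alternative is to read it from an environment variable. Here is a minimal sketch of that idea. The variable name SERPAPI_API_KEY and the build_search_params helper are assumptions made for illustration; they are not part of the SerpApi library:

```python
import os


def build_search_params(query, api_key=None):
    """Build the params dict for the 'home_depot' engine.

    SERPAPI_API_KEY is an assumed environment variable name;
    the SerpApi library does not read it on its own.
    """
    key = api_key or os.environ.get('SERPAPI_API_KEY')
    if not key:
        raise ValueError('No API key: pass api_key or set SERPAPI_API_KEY')
    return {
        'api_key': key,        # https://serpapi.com/manage-api-key
        'engine': 'home_depot',
        'q': query,
    }
```

The returned dictionary has the same shape as params above, so it can be passed to GoogleSearch in the same way.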

Then, we create a search object where the data is retrieved from the SerpApi backend. In the results dictionary we get data from JSON:

search = GoogleSearch(params)   # data extraction on the SerpApi backend
results = search.get_dict()     # JSON -> Python dict

At this point, the first 24 search results from the first page are stored in the results dictionary. If you are interested in all search results with pagination, check out the Using The Home Depot Product API from SerpApi blog post.

The product_ids list stores the product_id values extracted from each search result. They will be needed later:

product_ids = [result['product_id'] for result in results['products']]
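The list comprehension above raises a KeyError if the response has no 'products' key or if an entry lacks a product_id. A slightly more defensive variant might look like the sketch below. The extract_product_ids helper and the sample data are hypothetical, written by hand to mimic the shape of a search response:

```python
def extract_product_ids(results):
    """Return product IDs from a search response dict, skipping entries without one."""
    products = results.get('products', [])
    return [p['product_id'] for p in products if p.get('product_id')]


# Hand-written sample that mimics a search response (not real API output)
sample_results = {
    'products': [
        {'product_id': '206667220', 'title': 'Sample A'},
        {'title': 'Sample B without an ID'},
    ]
}

print(extract_product_ids(sample_results))   # ['206667220']
print(extract_product_ids({}))               # []
```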

Declare the home_depot_products list, to which the extracted data will be added:

home_depot_products = []

Next, you need to access each product page separately by iterating the product_ids list:

for product_id in product_ids:
    # data extraction will be here

These parameters are defined for generating the URL about the product. If you want to pass other parameters to the URL, you can do so using the product_params dictionary:

product_params = {
    'api_key': '...',                   # https://serpapi.com/manage-api-key
    'engine': 'home_depot_product',     # SerpApi search engine 
    'product_id': product_id,           # HomeDepot ID of a product
}

| Parameters | Explanation |
|---|---|
| api_key | Parameter defines the SerpApi private key to use. You can find it under your account -> API key. |
| engine | Set parameter to home_depot_product to use The Home Depot Product API engine. |
| product_id | The Home Depot identifier of a product. |

📌Note: You can also add other API Parameters.

Then, we create a product_search object where the data is retrieved from the SerpApi backend. In the product_results dictionary we get a new package of the data in JSON format:

product_search = GoogleSearch(product_params)
product_results = product_search.get_dict()
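A request can fail, for example because of an invalid key or an unknown product_id. The sketch below assumes that a failed request comes back as a dictionary with an 'error' key instead of 'product_results'; treat that as an assumption to verify against the responses you actually get. The get_product_or_none helper is hypothetical:

```python
def get_product_or_none(response):
    """Return the 'product_results' dict, or None if the response
    reports an error or does not contain product data."""
    if 'error' in response:
        print(f"Request failed: {response['error']}")
        return None
    return response.get('product_results')


# Hand-written samples (not real API output)
ok_response = {'product_results': {'product_id': '206667220'}}
failed_response = {'error': 'sample error message'}

print(get_product_or_none(ok_response))      # {'product_id': '206667220'}
print(get_product_or_none(failed_response))  # None (after printing the error)
```

Inside the loop, a None result can simply be skipped so that one failed product does not stop the whole run.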

Adding data about the current product to the home_depot_products list:

home_depot_products.append(product_results['product_results'])
# title = product_results['product_results']['title']
# description = product_results['product_results']['description']
# rating = product_results['product_results']['rating']
# reviews = product_results['product_results']['reviews']
# price = product_results['product_results']['price']

📌Note: In the comments above, I showed how to extract specific fields from the current product.
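In the sample output of this post, rating and reviews arrive as strings while price is a number. If you need numeric values, a small helper can pick the fields and convert them. This is a sketch only: the summarize_product name is hypothetical, and the sample dictionary reuses values from the Output section:

```python
def summarize_product(product):
    """Pick a few fields and convert numeric strings to numbers.

    Missing or malformed fields become None instead of raising an exception.
    """
    def to_number(value, cast):
        try:
            return cast(value)
        except (TypeError, ValueError):
            return None

    return {
        'title': product.get('title'),
        'price': product.get('price'),
        'rating': to_number(product.get('rating'), float),
        'reviews': to_number(product.get('reviews'), int),
    }


# Values copied from the Output section of this post
sample_product = {
    'title': '12-Cup Programmable Stainless Steel Drip Coffee Maker with Thermal Carafe',
    'rating': '3.1793',
    'reviews': '569',
    'price': 69.73,
}

summary = summarize_product(sample_product)
print(summary['rating'], summary['reviews'], summary['price'])  # 3.1793 569 69.73
```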

After all the data is retrieved, it is output in JSON format:

print(json.dumps(home_depot_products, indent=2, ensure_ascii=False))
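If you would rather keep the data than print it, the same json options can be used to write a file. A minimal sketch, where the save_products helper and the file name are illustrative choices:

```python
import json


def save_products(products, path):
    """Write the list of product dicts to a UTF-8 JSON file."""
    with open(path, 'w', encoding='utf-8') as f:
        json.dump(products, f, indent=2, ensure_ascii=False)


# Hand-written sample instead of real API output
save_products([{'product_id': '206667220', 'price': 69.73}], 'home_depot_products.json')
```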

Output

[
  {
    "product_id": "206667220",
    "title": "12-Cup Programmable Stainless Steel Drip Coffee Maker with Thermal Carafe",
    "description": "Get your fix throughout the day with the BLACK+DECKER CM2035B 12-Cup Thermal Coffeemaker. The stainless steel thermal carafe is vacuum-sealed to ensure your coffee stays at the optimal drinking temperature for hours and the Perfect Pour spout does away with spills and drips. The easy-to-use digital controls include a setting for batches of 1-4 cups BLACK+DECKER and the BLACK+DECKER logo are trademarks of The Black and Decker Corporation and are used under license. Cup equals approximately 5 oz. (varies by brewing technique).",
    "link": "https://www.homedepot.com/p/BLACK-DECKER-12-Cup-Programmable-Stainless-Steel-Drip-Coffee-Maker-with-Thermal-Carafe-CM2035B/206667220",
    "upc": "050875812123",
    "model_number": "CM2035B",
    "favorite": 103,
    "rating": "3.1793",
    "reviews": "569",
    "price": 69.73,
    "highlights": [
      "Electric drip-type coffee maker for creating delectable coffee",
      "Serves up to 12 cups with ease",
      "Includes an Evenstream shower head for maximum flavor extraction",
      "Made from stainless steel for high longevity and durability",
      "Provides a flavorful pot of hot coffee"
    ],
    "brand": {
      "name": "BLACK+DECKER",
      "link": "https://www.homedepot.com/b/Appliances-Small-Kitchen-Appliances-Coffee-Makers-Drip-Coffee-Makers/BLACK-DECKER/N-5yc1vZ2fkp8ffZe7c"
    },
    "images": [
      [
        "https://images.thdstatic.com/productImages/22b7e43f-06ea-497b-9d9e-c2c4d23dbd42/svn/black-with-stainless-steel-black-decker-drip-coffee-makers-cm2035b-64_65.jpg",
        "https://images.thdstatic.com/productImages/22b7e43f-06ea-497b-9d9e-c2c4d23dbd42/svn/black-with-stainless-steel-black-decker-drip-coffee-makers-cm2035b-64_100.jpg",
        "https://images.thdstatic.com/productImages/22b7e43f-06ea-497b-9d9e-c2c4d23dbd42/svn/black-with-stainless-steel-black-decker-drip-coffee-makers-cm2035b-64_145.jpg",
        "https://images.thdstatic.com/productImages/22b7e43f-06ea-497b-9d9e-c2c4d23dbd42/svn/black-with-stainless-steel-black-decker-drip-coffee-makers-cm2035b-64_300.jpg",
        "https://images.thdstatic.com/productImages/22b7e43f-06ea-497b-9d9e-c2c4d23dbd42/svn/black-with-stainless-steel-black-decker-drip-coffee-makers-cm2035b-64_400.jpg",
        "https://images.thdstatic.com/productImages/22b7e43f-06ea-497b-9d9e-c2c4d23dbd42/svn/black-with-stainless-steel-black-decker-drip-coffee-makers-cm2035b-64_600.jpg",
        "https://images.thdstatic.com/productImages/22b7e43f-06ea-497b-9d9e-c2c4d23dbd42/svn/black-with-stainless-steel-black-decker-drip-coffee-makers-cm2035b-64_1000.jpg"
      ],
      ... other images
    ],
    "bullets": [
      "12-cup thermal carafe - the large capacity carafe is double-walled and vacuum-sealed to keep your coffee at optimal drinking temperature for hours",
      "Customizable brewing options - drink your favorite coffee every morning using features like the brew strength selector and the option for small-batch (1-4 cup) brewing that maintains all the flavor of a full brew",
      "even stream showerhead - the Evenstream Showerhead dispenses water evenly over the packed coffee, extracting maximum flavor and wasting less",
      "No-drip perfect pour spout - don't put up with annoying spills, the carafe spout is designed to prevent spills and drips while pouring",
      "Wide-mouth carafe opening-the carafe is designed with a wide opening for fast, easy cleanup with a damp towel",
      "<a href=https://www.homedepot.com/c/electronics_recycling_programs style=color:#F96302; target=_blank>Click here for more information on Electronic Recycling Programs</a>"
    ],
    "info_and_guides": [
      {
        "title": "Warranty",
        "link": "https://images.thdstatic.com/catalog/pdfImages/de/deb8e49b-76c5-4dd2-82e0-6863ebb8408c.pdf"
      },
      {
        "title": "Use and Care Manual",
        "link": "https://images.thdstatic.com/catalog/pdfImages/68/681b462d-4233-4d43-ab88-ce5dc12ff487.pdf"
      }
    ],
    "specifications": [
      {
        "key": "Details",
        "value": [
          {
            "name": "Appliance Type",
            "value": "Coffee Maker"
          },
          ... other results
        ]
      },
      {
        "key": "Warranty / Certifications",
        "value": [
          {
            "name": "Certifications and Listings",
            "value": "ETL Listed"
          },
          ... other results
        ]
      },
      {
        "key": "Dimensions",
        "value": [
          {
            "name": "Product Depth (in.)",
            "value": "8.58"
          },
          ... other results
        ]
      }
    ]
  },
  ... other products
]
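The specifications field in the output above is a list of groups, each with a key and a list of name/value pairs. For lookups it can be handy to flatten it into nested dictionaries. The sketch below assumes the structure shown in the output; the flatten_specifications helper is hypothetical, and the sample reuses values from the output:

```python
def flatten_specifications(specifications):
    """Convert [{'key': group, 'value': [{'name': n, 'value': v}, ...]}, ...]
    into {group: {n: v, ...}, ...}."""
    flat = {}
    for group in specifications:
        entries = group.get('value', [])
        flat[group.get('key')] = {e['name']: e['value'] for e in entries}
    return flat


# Values copied from the output above
sample_specs = [
    {'key': 'Details', 'value': [{'name': 'Appliance Type', 'value': 'Coffee Maker'}]},
    {'key': 'Dimensions', 'value': [{'name': 'Product Depth (in.)', 'value': '8.58'}]},
]

flat = flatten_specifications(sample_specs)
print(flat['Dimensions']['Product Depth (in.)'])  # 8.58
```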

📌Note: Head to the playground for a live and interactive demo.

Join us on Twitter | YouTube

Add a Feature Request💫 or a Bug🐞