What will be scraped
Why use an API?
- No need to create a parser from scratch and maintain it.
- Bypass blocks from Google: no need to solve CAPTCHAs or work around IP blocks.
- No need to pay for proxies and CAPTCHA solvers.
- No need to use browser automation.
SerpApi handles everything on the backend, with fast response times of under ~2.5 seconds per request (~1.2 seconds with Ludicrous Speed). Because no browser automation is involved, requests complete much faster. Response times and status rates are shown on the SerpApi Status page.
Full Code
If you don't need an explanation, have a look at the full code example in the online IDE.
from serpapi import GoogleSearch
import os, json
params = {
    # https://docs.python.org/3/library/os.html#os.getenv
    'api_key': os.getenv('API_KEY'),            # your SerpApi API key
    'engine': 'google_product',                 # SerpApi search engine
    'product_id': '14019378181107046593',       # product id
    'offers': True,                             # more offers, could also be set to '1', which is the same as True
    'location': 'Dallas, Texas, United States', # location
    'filter': 'flocal:1',                       # local sellers
    'hl': 'en',                                 # language
    'gl': 'us'                                  # country of the search, US -> USA
}
search = GoogleSearch(params) # where data extraction happens on the backend
results = search.get_dict() # JSON -> Python dict
data = {}
data['product_results'] = results['product_results']
data['local_sellers'] = results['sellers_results']['online_sellers']
data['related_items'] = results['related_products']['different_brand']
print(json.dumps(data, indent=2, ensure_ascii=False))
Preparation
Install library:
pip install google-search-results
google-search-results is a SerpApi API package.
Code Explanation
Import libraries:
from serpapi import GoogleSearch
import os, json
Library | Purpose |
---|---|
GoogleSearch | to scrape and parse Google results using the SerpApi web scraping library. |
os | to return the value of an environment variable (the SerpApi API key). |
json | to convert the extracted data to a JSON object. |
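Since os.getenv returns None when the variable is not set, it can help to fail early with a clear message instead of sending a request without a key. A minimal sketch, assuming the key is stored in the API_KEY environment variable as in this post; require_api_key is a hypothetical helper name, not part of the SerpApi library:

```python
import os


def require_api_key(env=os.environ, name='API_KEY'):
    """Return the API key from the environment or raise a clear error.

    `env` is any mapping; it defaults to the real process environment.
    """
    key = env.get(name)
    if not key:
        raise RuntimeError(f'Environment variable {name} is not set')
    return key
```

The result could then be passed as the 'api_key' value in the params dictionary.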
At the beginning of the code, the parameters used to generate the request URL are defined. To pass other parameters, add them to the params dictionary:
params = {
    # https://docs.python.org/3/library/os.html#os.getenv
    'api_key': os.getenv('API_KEY'),            # your SerpApi API key
    'engine': 'google_product',                 # SerpApi search engine
    'product_id': '14019378181107046593',       # product id
    'offers': True,                             # more offers
    'location': 'Dallas, Texas, United States', # location
    'filter': 'flocal:1',                       # local sellers
    'hl': 'en',                                 # language
    'gl': 'us'                                  # country of the search, US -> USA
}
Parameters | Explanation |
---|---|
api_key | Defines the SerpApi private key to use. |
engine | Set to google_product to use the Google Product API engine. |
product_id | Defines the product to get results for. Normally found in shopping results for supported products (e.g., https://www.google.com/shopping/product/{product_id} ). |
offers | Fetches offers results. Replaces the former sellers=online results. It can be set to 1 or true. |
location | Defines where you want the search to originate from. If several locations match the requested location, the most popular one is picked. Head to the /locations.json API if you need more precise control. The location and uule parameters can't be used together. |
filter | Defines filters and sorting for offers results. The flocal:1 filter can be used to turn on the Nearby filter. |
hl | Defines the language to use for the Google search. It's a two-letter language code (e.g., en for English, es for Spanish, or fr for French). Head to the Google languages page for a full list of supported Google languages. |
gl | Defines the country to use for the Google search. It's a two-letter country code (e.g., us for the United States, uk for United Kingdom, or fr for France). Head to the Google countries page for a full list of supported Google countries. |
📌Note: After recent changes, Google no longer provides local results, and the sellers parameter was deprecated. As an alternative, filter=flocal:1 can be used to turn on the Nearby filter.
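To make the optional parameters easier to toggle, the dictionary can be assembled by a small function. This is only a sketch: build_params and its arguments are hypothetical names introduced for illustration, while the keys and values are the ones described in the table above.

```python
def build_params(api_key, product_id, local_only=False, location=None):
    """Assemble request parameters for the google_product engine."""
    params = {
        'api_key': api_key,
        'engine': 'google_product',
        'product_id': product_id,
        'offers': True,   # ask for offers results
        'hl': 'en',       # language
        'gl': 'us',       # country
    }
    if location:
        # Only send a location when one was requested.
        params['location'] = location
    if local_only:
        # flocal:1 switches on the Nearby filter.
        params['filter'] = 'flocal:1'
    return params
```

For example, build_params(key, '14019378181107046593', local_only=True, location='Dallas, Texas, United States') would produce the same dictionary used in this post.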
Then, we create a search object, where the data is retrieved from the SerpApi backend. The results dictionary holds the data from the JSON response:
search = GoogleSearch(params) # where data extraction happens on the SerpApi backend
results = search.get_dict() # JSON -> Python dict
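A request can fail, for example because of an invalid key or an unsupported product. The sketch below assumes that a failed response carries a top-level 'error' key with a message; treat that key name as an assumption and check it against the response you actually receive. ensure_ok is a hypothetical helper name.

```python
def ensure_ok(results):
    """Raise if the response dict reports an error, otherwise return it."""
    # Assumption: failures are reported under a top-level 'error' key.
    error = results.get('error')
    if error:
        raise RuntimeError(f'SerpApi request failed: {error}')
    return results
```

It could wrap the call shown above, as in results = ensure_ok(search.get_dict()).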
Declare the data dictionary, where the extracted data will be added:
data = {}
Retrieving the data is simple: just access the corresponding key:
data['product_results'] = results['product_results']
data['local_sellers'] = results['sellers_results']['online_sellers']
data['related_items'] = results['related_products']['different_brand']
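Direct indexing raises a KeyError when a section is absent from the response, which may happen if a product has no offers or no related products. A defensive variant of the three lines above could use dict.get with defaults; extract is a hypothetical helper name, and the key names are the ones used in this post.

```python
def extract(results):
    """Collect the three sections, falling back to empty values."""
    sellers = results.get('sellers_results') or {}
    related = results.get('related_products') or {}
    return {
        'product_results': results.get('product_results', {}),
        'local_sellers': sellers.get('online_sellers', []),
        'related_items': related.get('different_brand', []),
    }
```

With this version, an empty response yields empty containers instead of an exception.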
After the data is retrieved, it is output in JSON format:
print(json.dumps(data, indent=2, ensure_ascii=False))
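The ensure_ascii=False argument matters when titles or seller names contain non-ASCII characters. The short comparison below uses a made-up title purely for illustration:

```python
import json

# Illustrative value only, not taken from a real response.
data = {'title': 'Café mouse pad'}

escaped = json.dumps(data)                       # default: ensure_ascii=True
readable = json.dumps(data, ensure_ascii=False)  # keep characters as-is

print(escaped)   # {"title": "Caf\u00e9 mouse pad"}
print(readable)  # {"title": "Café mouse pad"}
```

Both strings decode to the same data; the second is simply easier to read in a terminal or a log.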
You can open the playground or check the output below to see which keys are available in this JSON structure for the data you need.
Output
{
"product_results": {
"product_id": 14019378181107046593,
"title": "SteelSeries Aerox 3 2022 Edition Wired Gaming Mouse, Onyx",
"reviews": 68,
"rating": 4.5
},
"local_sellers": [
{
"position": 1,
"name": "Best Buy",
"link": "https://www.google.com/url?q=https://www.bestbuy.com/site/steelseries-aerox-3-2022-edition-lightweight-wired-optical-gaming-mouse-onyx/6485231.p%3FskuId%3D6485231%26ref%3D212%26loc%3D1%26extStoreId%3D1412&sa=U&ved=0ahUKEwjJ__yp3MD7AhWWhIkEHdp8D6IQ2ykIJA&usg=AOvVaw0iSz1DvCsnPaEyJ8Llivf7",
"base_price": "$34.99",
"additional_price": {
"shipping": "See website"
},
"total_price": "$34.99"
},
... other sellers
],
"related_items": [
{
"title": "SteelSeries Aerox 5 Wired ...",
"link": "https://www.bestbuy.com/site/steelseries-aerox-5-lightweight-wired-optical-gaming-mouse-with-9-programmble-buttons-black/6501454.p?skuId=6501454&ref=212&loc=1&extStoreId=1027&sa=X&ved=0ahUKEwjJ__yp3MD7AhWWhIkEHdp8D6IQrhIIfA",
"price": "$47.99"
},
... other items
]
}
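Once the data has this shape, it can be processed like any other Python structure. As a sketch, the helpers below pick the seller with the lowest total_price, assuming prices are strings such as "$34.99" as in the output above. parse_price and cheapest_seller are hypothetical names, and the second seller in the sample is invented for the example.

```python
from decimal import Decimal, InvalidOperation


def parse_price(text):
    """Convert a string like '$34.99' to Decimal, or None if not numeric."""
    cleaned = (text or '').replace('$', '').replace(',', '').strip()
    try:
        return Decimal(cleaned)
    except InvalidOperation:
        return None


def cheapest_seller(sellers):
    """Return the seller dict with the lowest parsable total_price."""
    priced = [(parse_price(s.get('total_price')), s) for s in sellers]
    priced = [(price, s) for price, s in priced if price is not None]
    if not priced:
        return None
    return min(priced, key=lambda pair: pair[0])[1]


sellers = [
    {'name': 'Best Buy', 'total_price': '$34.99'},
    {'name': 'Example Store', 'total_price': '$31.50'},  # invented entry
    {'name': 'No Price Shop', 'total_price': 'See website'},  # skipped
]
print(cheapest_seller(sellers)['name'])  # Example Store
```

Decimal is used instead of float so that prices compare exactly; entries without a numeric price are ignored.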
Links
Add a Feature Request💫 or a Bug🐞