What will be scraped
Note: In this blog post, I will show you how to scrape Apple App Store search and get exactly the same results as on an Apple iMac, because the search results on a Mac are completely different from the results on a PC. The screenshots below show the difference:
- Mac results:
- PC results:
Why use an API?
There are a couple of reasons to use an API, ours in particular:
- No need to create a parser from scratch and maintain it.
- No need to bypass blocks yourself, such as solving CAPTCHAs or working around IP blocks.
- No need to pay for proxies and CAPTCHA solvers.
- No need to use browser automation.
SerpApi handles everything on the backend, with response times under ~2.6 seconds per request (~0.6 seconds with Ludicrous speed). It does this without browser automation, which makes it much faster. Response times and status rates are shown on the SerpApi Status page.
Full Code
If you don't need an explanation, have a look at the full code example in the online IDE.
from serpapi import GoogleSearch
import json
params = {
'api_key': '...', # https://serpapi.com/manage-api-key
'engine': 'apple_app_store', # SerpApi search engine
'term': 'image viewer', # search query
'device': 'desktop', # device to get the results
'country': 'us', # country for the search
'lang': 'en-us', # language for the search
'disallow_explicit': False, # disallowing explicit apps
'num': 20, # number of items per page
'page': 0, # pagination
# 'property': 'developer' # developer of an app
}
app_store_results = []
while True:
    search = GoogleSearch(params) # data extraction on the SerpApi backend
    new_page_results = search.get_dict() # JSON -> Python dict

    app_store_results.extend(new_page_results['organic_results'])

    if 'next' in new_page_results.get('serpapi_pagination', {}):
        params['page'] += 1
    else:
        break

print(json.dumps(app_store_results, indent=2, ensure_ascii=False))
Preparation
Install library:
pip install google-search-results
google-search-results is a SerpApi API package.
Code Explanation
Import libraries:
from serpapi import GoogleSearch
import json
Library | Purpose |
---|---|
GoogleSearch | to scrape and parse results using the SerpApi web scraping library. Despite its name, the class is used here with the apple_app_store engine. |
json | to convert extracted data to a JSON-formatted string. |
The parameters are defined for generating the URL. If you want to pass other parameters to the URL, you can do so using the params dictionary:
params = {
'api_key': '...', # https://serpapi.com/manage-api-key
'engine': 'apple_app_store', # SerpApi search engine
'term': 'image viewer', # search query
'device': 'desktop', # device to get the results
'country': 'us', # country for the search
'lang': 'en-us', # language for the search
'disallow_explicit': False, # disallowing explicit apps
'num': 20, # number of items per page
'page': 0, # pagination
# 'property': 'developer' # developer of an app
}
Parameters | Explanation |
---|---|
api_key | Defines the SerpApi private key to use. You can find it under your account -> API key. |
engine | Set to apple_app_store to use the App Store API engine. |
term | Defines the query you want to search. You can use any search term that you would use in a regular App Store search. |
device | Defines the device to use to get the results. It can be set to desktop to use the Mac App Store, tablet to use the iPad App Store, or mobile (default) to use the iPhone App Store. |
country | Defines the country to use for the search. It's a two-letter country code. Head to the Apple Regions for a full list of supported Apple Regions. |
lang | Defines the language to use for the search. It's a language code such as en-us. Head to the Apple Languages for a full list of supported Apple Languages. |
disallow_explicit | Defines the filter for disallowing explicit apps. It defaults to false. |
num | Defines the number of results you want to get per page. It defaults to 10. The maximum number of results per page is 200. |
page | Used to get the items on a specific page (e.g., 0 (default) is the first page of results, 1 is the 2nd page of results, 2 is the 3rd page of results, etc.). |
property | Allows searching by a property of an app. developer searches the developer name of an app (e.g., property=developer and term=Coffee returns apps with "Coffee" in their developer's name, such as Coffee Inc.). |
Note: You can also add other API Parameters.
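The constraints listed in the table can be checked before a request is sent. The sketch below is only an illustration: build_params, ALLOWED_DEVICES, and MAX_RESULTS_PER_PAGE are hypothetical names that are not part of the SerpApi library, and the limits are the ones stated in the table above.

```python
# Hypothetical helper, not part of the SerpApi package.
ALLOWED_DEVICES = {"desktop", "tablet", "mobile"}
MAX_RESULTS_PER_PAGE = 200  # upper bound stated in the parameter table


def build_params(api_key, term, device="mobile", country="us",
                 lang="en-us", num=10, page=0):
    """Return a params dict for the apple_app_store engine after basic checks."""
    if device not in ALLOWED_DEVICES:
        raise ValueError(f"device must be one of {sorted(ALLOWED_DEVICES)}, got {device!r}")
    if not 1 <= num <= MAX_RESULTS_PER_PAGE:
        raise ValueError(f"num must be between 1 and {MAX_RESULTS_PER_PAGE}, got {num}")
    if page < 0:
        raise ValueError("page must be 0 or greater")
    return {
        "api_key": api_key,
        "engine": "apple_app_store",
        "term": term,
        "device": device,
        "country": country,
        "lang": lang,
        "num": num,
        "page": page,
    }


params = build_params("...", "image viewer", device="desktop", num=20)
print(params["device"], params["num"])
```

Failing early on a bad device or num value avoids spending a request on a call that cannot return what you expect.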
Define the app_store_results list to which the retrieved data will be added:
app_store_results = []
A while loop is used to extract data from all pages:
while True:
    # data extraction will be here
Then we create a search object, which retrieves the data from the SerpApi backend. Calling get_dict() on it returns the JSON response as a Python dictionary, stored in new_page_results:
search = GoogleSearch(params) # data extraction on the SerpApi backend
new_page_results = search.get_dict() # JSON -> Python dict
Add the new data from this page to the app_store_results list:
app_store_results.extend(new_page_results['organic_results'])
# title = new_page_results['organic_results'][0]['title']
# version = new_page_results['organic_results'][0]['version']
# description = new_page_results['organic_results'][0]['description']
Note: In the comments above, I showed how to extract specific fields. You may have noticed new_page_results['organic_results'][0]. The 0 is the index of an app, which means that we are extracting data from the first app. new_page_results['organic_results'][1] is the second app, and so on.
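Instead of indexing one result at a time, the same fields can be pulled from every result in a loop. This is a minimal sketch, assuming each result is a dict shaped like the one in the Output section. summarize is a hypothetical helper, and the sample list is a trimmed stand-in for real API data.

```python
def summarize(results):
    """Pick a few fields from each result; missing keys become None."""
    return [
        {
            "title": app.get("title"),
            "version": app.get("version"),
            "price_type": app.get("price", {}).get("type"),
        }
        for app in results
    ]


# Trimmed stand-in for new_page_results['organic_results']
sample = [
    {"title": "Pixea", "version": "2.1", "price": {"type": "Free"}},
    {"title": "App without price"},  # made-up record with missing fields
]

rows = summarize(sample)
for row in rows:
    print(row)
```

Using .get() instead of square brackets keeps the loop from raising KeyError when a field is absent from a particular result.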
After data is retrieved from the current page, a check is made to see if the next page exists. If there is a next key in the serpapi_pagination dictionary, the page parameter is incremented by 1. Otherwise, the loop stops.
if 'next' in new_page_results.get('serpapi_pagination', {}):
    params['page'] += 1
else:
    break
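The pagination check can be tried out without spending API credits by swapping the real request for a stand-in. The sketch below makes a few assumptions: collect_all_pages, fetch_page, and max_pages are hypothetical names, and fake_pages only imitates the two keys the loop reads. With the real library, fetch_page could be something like lambda p: GoogleSearch(p).get_dict().

```python
def collect_all_pages(fetch_page, params, max_pages=50):
    """Collect organic_results from consecutive pages.

    fetch_page is any callable that takes the params dict and
    returns the response as a Python dict.
    """
    params = dict(params)  # copy so the caller's dict is not mutated
    results = []
    for _ in range(max_pages):  # hard cap as a safety net
        response = fetch_page(params)
        results.extend(response.get("organic_results", []))
        if "next" not in response.get("serpapi_pagination", {}):
            break  # no next page, stop
        params["page"] += 1
    return results


# Stand-in for the API: two pages, the second one has no "next" key.
fake_pages = [
    {"organic_results": [{"title": "A"}, {"title": "B"}],
     "serpapi_pagination": {"next": "..."}},
    {"organic_results": [{"title": "C"}]},
]

apps = collect_all_pages(lambda p: fake_pages[p["page"]], {"page": 0})
print([app["title"] for app in apps])  # ['A', 'B', 'C']
```

The max_pages cap is a design choice for this sketch: it bounds the number of requests if the stop condition is never met.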
After all the data is retrieved, it is output in JSON format:
print(json.dumps(app_store_results, indent=2, ensure_ascii=False))
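The ensure_ascii=False argument matters when titles or descriptions contain non-ASCII characters. Here is a small comparison using a made-up record, not real API data:

```python
import json

apps = [{"title": "Café Viewer", "price": {"type": "Free"}}]  # made-up record

escaped = json.dumps(apps)                        # non-ASCII is written as \uXXXX escapes
readable = json.dumps(apps, ensure_ascii=False)   # non-ASCII characters are kept as-is

print(escaped)
print(readable)
```

Both strings decode to the same data with json.loads. The second one is simply easier to read in a terminal or a file.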
Output
[
{
"position": 1,
"id": 1507782672,
"title": "Pixea",
"bundle_id": "imagetasks.Pixea",
"version": "2.1",
"vpp_license": true,
"age_rating": "4+",
"release_note": "- New \"Fixed Size and Position\" zoom mode - Fixed a bug causing crash when browsing ZIP-files - Bug fixes and improvements",
"seller_link": "https://www.imagetasks.com",
"minimum_os_version": "10.12",
"description": "Pixea is an image viewer for macOS with a nice minimal modern user interface. Pixea works great with JPEG, HEIC, PSD, RAW, WEBP, PNG, GIF, and many other formats. Provides basic image processing, including flip and rotate, shows a color histogram, EXIF, and other information. Supports keyboard shortcuts and trackpad gestures. Shows images inside archives, without extracting them. Supported formats: JPEG, HEIC, GIF, PNG, TIFF, Photoshop (PSD), BMP, Fax images, macOS and Windows icons, Radiance images, Google's WebP. RAW formats: Leica DNG and RAW, Sony ARW, Olympus ORF, Minolta MRW, Nikon NEF, Fuji RAF, Canon CR2 and CRW, Hasselblad 3FR. Sketch files (preview only). ZIP-archives. Export formats: JPEG, JPEG-2000, PNG, TIFF, BMP. Found a bug? Have a suggestion? Please, send it to support@imagetasks.com Follow us on Twitter @imagetasks!",
"link": "https://apps.apple.com/us/app/pixea/id1507782672?mt=12&uo=4",
"serpapi_product_link": "https://serpapi.com/search.json?country=us&engine=apple_product&product_id=1507782672&type=app",
"serpapi_reviews_link": "https://serpapi.com/search.json?country=us&engine=apple_reviews&page=1&product_id=1507782672",
"release_date": "2020-04-20 07:00:00 UTC",
"price": {
"type": "Free"
},
"rating": [
{
"type": "All Times",
"rating": 0.0,
"count": 0
}
],
"genres": [
{
"name": "Photo & Video",
"id": 6008,
"primary": true
},
{
"name": "Graphics & Design",
"id": 6027,
"primary": false
}
],
"developer": {
"name": "ImageTasks Inc",
"id": 450316587,
"link": "https://apps.apple.com/us/developer/id450316587"
},
"size_in_bytes": 7113871,
"supported_languages": [
"EN"
],
"screenshots": {
"general": [
{
"link": "https://is4-ssl.mzstatic.com/image/thumb/Purple113/v4/e0/21/86/e021868d-b43b-0a78-8d4a-e4e0097a1d01/0131f1c2-3227-46bf-8328-7b147d2b1ea2_Pixea-1.jpg/800x500bb.jpg",
"size": "800x500"
},
{
"link": "https://is4-ssl.mzstatic.com/image/thumb/Purple113/v4/55/3c/98/553c982d-de30-58b5-3b5a-d6b3b2b6c810/a0424c4d-4346-40e6-8cde-bc79ce690040_Pixea-2.jpg/800x500bb.jpg",
"size": "800x500"
},
{
"link": "https://is3-ssl.mzstatic.com/image/thumb/Purple123/v4/77/d7/d8/77d7d8c1-4b4c-ba4b-4dde-94bdc59dfb71/6e66509c-5886-45e9-9e96-25154a22fd53_Pixea-3.jpg/800x500bb.jpg",
"size": "800x500"
},
{
"link": "https://is3-ssl.mzstatic.com/image/thumb/PurpleSource113/v4/44/79/91/447991e0-518f-48b3-bb7e-c7121eb57ba4/79be2791-5b93-4c4d-b4d1-38a3599c2b2d_Pixea-4.jpg/800x500bb.jpg",
"size": "800x500"
}
]
},
"logos": [
{
"size": "60x60",
"link": "https://is3-ssl.mzstatic.com/image/thumb/Purple113/v4/8a/a7/f7/8aa7f75f-620b-74d5-7958-35aa5d851582/AppIcon-0-0-85-220-0-0-0-0-4-0-0-0-2x-sRGB-0-0-0-0-0.png/60x60bb.png"
},
{
"size": "512x512",
"link": "https://is3-ssl.mzstatic.com/image/thumb/Purple113/v4/8a/a7/f7/8aa7f75f-620b-74d5-7958-35aa5d851582/AppIcon-0-0-85-220-0-0-0-0-4-0-0-0-2x-sRGB-0-0-0-0-0.png/512x512bb.png"
},
{
"size": "100x100",
"link": "https://is3-ssl.mzstatic.com/image/thumb/Purple113/v4/8a/a7/f7/8aa7f75f-620b-74d5-7958-35aa5d851582/AppIcon-0-0-85-220-0-0-0-0-4-0-0-0-2x-sRGB-0-0-0-0-0.png/100x100bb.png"
}
]
},
... other results
]
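Some fields in this output are nested, so a little post-processing helps. The sketch below assumes the structure shown above; primary_genre and size_in_megabytes are hypothetical helpers, and app is a trimmed copy of the first result.

```python
def primary_genre(app):
    """Return the name of the genre flagged as primary, or None."""
    for genre in app.get("genres", []):
        if genre.get("primary"):
            return genre.get("name")
    return None


def size_in_megabytes(app):
    """Convert size_in_bytes to megabytes (1 MB = 1,000,000 bytes)."""
    size = app.get("size_in_bytes")
    return None if size is None else round(size / 1_000_000, 1)


# Trimmed copy of the first result from the output above
app = {
    "title": "Pixea",
    "genres": [
        {"name": "Photo & Video", "id": 6008, "primary": True},
        {"name": "Graphics & Design", "id": 6027, "primary": False},
    ],
    "size_in_bytes": 7113871,
}

print(app["title"], primary_genre(app), size_in_megabytes(app))
```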
Links
- Code in the online IDE
- Apple App Store Search Scraper API
- Apple App Store Search Scraper API Playground
Add a Feature Request or a Bug