Artur Chukhrai for SerpApi

Posted on Feb 5, 2023 • Edited on Feb 6, 2023 • Originally published at serpapi.com

Scraping Apple App Store Product Info And Reviews with Python

#webscraping #tutorial #python #programming

What will be scraped
Why use an API?
Full Code
Preparation
Code Explanation
Output
Links

What will be scraped

Why use an API?

There are a few reasons to use an API, ours in particular:

No need to create a parser from scratch and maintain it.
No need to bypass blocks from the target website, such as CAPTCHAs or IP blocks.
No need to pay for proxies and CAPTCHA solvers.
No need to use browser automation.

SerpApi handles everything on the backend, with response times of roughly 2.5 seconds or less per request (around 1.2 seconds with Ludicrous speed). It works without browser automation, which makes it much faster. Response times and status rates are shown on the SerpApi Status page:

Apple Product:

Apple Reviews:

Head to the Apple Product Page playground and Apple App Store Reviews playground for a live and interactive demo.

Full Code

If you don't need an explanation, have a look at the full code example in the online IDE.

from serpapi import GoogleSearch
import json


def get_product_info(product_id):
    params = {
        'api_key': '...',               # https://serpapi.com/manage-api-key
        'engine': 'apple_product',      # SerpApi search engine 
        'product_id': product_id,       # ID of a product
        'type': 'app',                  # type of Apple Product
        'country': 'us',                # country for the search
    }

    search = GoogleSearch(params)       # data extraction on the SerpApi backend
    product_info = search.get_dict()    # JSON -> Python dict

    del product_info['search_metadata']
    del product_info['search_parameters']
    del product_info['search_information']

    return product_info


def get_product_reviews(product_id):
    params = {
        'api_key': '...',               # https://serpapi.com/manage-api-key
        'engine': 'apple_reviews',      # SerpApi search engine 
        'product_id': product_id,       # ID of a product
        'country': 'us',                # country for the search
        'sort': 'mostrecent',           # sorting reviews
        'page': 1,                      # pagination
    }

    product_reviews = []

    while True:
        search = GoogleSearch(params)
        new_page_results = search.get_dict()

        product_reviews.extend(new_page_results['reviews'])

        if 'next' in new_page_results.get('serpapi_pagination', {}):
            params['page'] += 1
        else:
            break

    return product_reviews


def main():
    product_id = 1507782672

    app_store_results = {
        'product_info': get_product_info(product_id),
        'product_reviews': get_product_reviews(product_id)
    }

    print(json.dumps(app_store_results, indent=2, ensure_ascii=False))


if __name__ == '__main__':
    main()

Preparation

Install the library:

pip install google-search-results

google-search-results is SerpApi's Python package. It provides the serpapi module that is imported in the code.
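The examples in this post write the API key directly into the params dictionary as '...'. One way to keep the key out of the source file is to read it from an environment variable. Here is a minimal sketch, assuming a variable named SERPAPI_API_KEY. Both that name and the load_api_key helper are made up for illustration and are not required by the library:

```python
import os


def load_api_key(env=None, var_name="SERPAPI_API_KEY"):
    """Return the API key stored in an environment variable.

    SERPAPI_API_KEY is a hypothetical variable name chosen for this sketch.
    """
    env = os.environ if env is None else env
    key = env.get(var_name, "").strip()
    if not key:
        raise RuntimeError(f"Set the {var_name} environment variable first")
    return key
```

The returned value can then be used in place of '...' for the api_key parameter.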

Code Explanation

Import libraries:

from serpapi import GoogleSearch
import json

Library	Purpose
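`GoogleSearch`	to send requests to SerpApi and get parsed results. Despite its name, the class works with the engine passed in the `engine` parameter, including the Apple engines used here.
`json`	to convert extracted data to a JSON string.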

Top-level code environment

At the beginning of the main function, the product_id variable is created to store the ID of the desired product:

product_id = 1507782672

Next, the app_store_results dictionary is created and filled with the data returned by the get_product_info(product_id) and get_product_reviews(product_id) functions. These functions are explained under the corresponding headings below.

app_store_results = {
    'product_info': get_product_info(product_id),
    'product_reviews': get_product_reviews(product_id)
}

After all the data is retrieved, it is output in JSON format:

print(json.dumps(app_store_results, indent=2, ensure_ascii=False))
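The ensure_ascii=False argument matters because the results contain non-ASCII characters, such as the © sign in the copyright field. The small sketch below uses a shortened sample dictionary (not a real response) to compare the default behaviour with the one used in this post:

```python
import json

# Shortened sample data for illustration, not a full API response.
sample = {"title": "Pixea", "copyright": "Copyright © 2020-2023 ImageTasks Inc."}

escaped = json.dumps(sample)                        # default: ensure_ascii=True
readable = json.dumps(sample, ensure_ascii=False)   # keeps "©" as-is

print(escaped)    # the © sign is written as the escape sequence \u00a9
print(readable)   # the © sign is written unchanged
```

Both strings decode back to the same dictionary, so the choice only affects how readable the printed output is.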

This code follows the common convention of using the if __name__ == '__main__' construct:

def main():
    product_id = 1507782672

    app_store_results = {
        'product_info': get_product_info(product_id),
        'product_reviews': get_product_reviews(product_id)
    }

    print(json.dumps(app_store_results, indent=2, ensure_ascii=False))


if __name__ == '__main__':
    main()

The main() function is called only if the user runs this file directly. If the file is imported into another module, the condition is false and main() is not called.

You can watch the video Python Tutorial: if __name__ == '__main__' for more details.
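To see the effect of the __name__ check without creating two files, the sketch below executes the same small piece of source code twice with different __name__ values. The run_with_name helper and the "app_store" module name are made up for this demonstration:

```python
MODULE_SOURCE = '''
calls = []

def main():
    calls.append("main ran")

if __name__ == "__main__":
    main()
'''


def run_with_name(module_name):
    """Execute MODULE_SOURCE as if its __name__ were module_name."""
    namespace = {"__name__": module_name}
    exec(MODULE_SOURCE, namespace)
    return namespace["calls"]


print(run_with_name("__main__"))    # ['main ran'] -> behaves like a direct run
print(run_with_name("app_store"))   # []           -> behaves like an import
```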

Get product information

The function takes a specific product_id and returns a dictionary with all the information about that product.

At the beginning of the function, the params dictionary is defined for generating the URL:

params = {
    'api_key': '...',               # https://serpapi.com/manage-api-key
    'engine': 'apple_product',      # SerpApi search engine 
    'product_id': product_id,       # ID of a product
    'type': 'app',                  # type of Apple Product
    'country': 'us',                # country for the search
}

Parameters	Explanation
`api_key`	Parameter defines the SerpApi private key to use. You can find it under your account -> API key.
`engine`	Set parameter to `apple_product` to use the Apple Product engine.
`product_id`	Parameter defines the ID of the product whose product page you want to get.
`type`	Parameter defines the type of Apple Product to get the product page of. It defaults to `app`.
`country`	Parameter defines the country to use for the search. It's a two-letter country code. Head to the Apple Regions for a full list of supported Apple Regions.

📌Note: You can also add other API Parameters.
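Since country is expected to be a two-letter code, it can be convenient to normalize user input before building params. The sketch below uses a hypothetical normalize_country helper. It only checks the shape of the value, not whether the region is actually supported by Apple:

```python
def normalize_country(code):
    """Lowercase a country code and check that it has two ASCII letters."""
    code = code.strip().lower()
    if len(code) != 2 or not (code.isascii() and code.isalpha()):
        raise ValueError(f"country must be a two-letter code, got {code!r}")
    return code


params = {
    "api_key": "...",                         # https://serpapi.com/manage-api-key
    "engine": "apple_product",
    "product_id": 1507782672,
    "type": "app",
    "country": normalize_country(" US "),     # stored as "us"
}

print(params["country"])
```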

Then we create a search object, which retrieves the data from the SerpApi backend. The get_dict() method converts the JSON response into the product_info dictionary:

search = GoogleSearch(params)       # data extraction on the SerpApi backend
product_info = search.get_dict()    # JSON -> Python dict

The product_info dictionary contains information not only about the product, but also about the request. Request information is not needed, so we remove the corresponding keys using the del statement:

del product_info['search_metadata']
del product_info['search_parameters']
del product_info['search_information']
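Keep in mind that del raises a KeyError when the key is missing. If you would rather not depend on all three keys being present in every response (an assumption worth guarding against, for example when a request fails), a non-mutating alternative could look like this. The strip_request_info helper and the sample dictionary are made up for this sketch:

```python
REQUEST_KEYS = ("search_metadata", "search_parameters", "search_information")


def strip_request_info(response):
    """Return a copy of the response without the request-related keys."""
    return {key: value for key, value in response.items() if key not in REQUEST_KEYS}


# Shortened sample data: one request-related key is missing on purpose.
sample = {
    "search_metadata": {"status": "Success"},
    "search_parameters": {"engine": "apple_product"},
    "title": "Pixea",
}

print(strip_request_info(sample))   # {'title': 'Pixea'}
```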

At the end of the function, the product_info dictionary with the extracted data is returned:

return product_info

The complete function to get product information would look like this:

def get_product_info(product_id):
    params = {
        'api_key': '...',               # https://serpapi.com/manage-api-key
        'engine': 'apple_product',      # SerpApi search engine 
        'product_id': product_id,       # ID of a product
        'type': 'app',                  # type of Apple Product
        'country': 'us',                # country for the search
    }

    search = GoogleSearch(params)       # data extraction on the SerpApi backend
    product_info = search.get_dict()    # JSON -> Python dict

    del product_info['search_metadata']
    del product_info['search_parameters']
    del product_info['search_information']

    return product_info

Get product reviews

The function takes a specific product_id and returns a list with all the reviews about that product.

At the beginning of the function, the params dictionary is defined for generating the URL:

params = {
    'api_key': '...',               # https://serpapi.com/manage-api-key
    'engine': 'apple_reviews',      # SerpApi search engine 
    'product_id': product_id,       # ID of a product
    'country': 'us',                # country for the search
    'sort': 'mostrecent',           # sorting reviews
    'page': 1,                      # pagination
}

Parameters	Explanation
`api_key`	Parameter defines the SerpApi private key to use. You can find it under your account -> API key.
`engine`	Set parameter to `apple_reviews` to use the Apple Reviews engine.
`product_id`	Parameter defines the ID of a product you want to get the reviews for.
`country`	Parameter defines the country to use for the search. It's a two-letter country code. Head to the Apple Regions for a full list of supported Apple Regions.
`sort`	Parameter is used for sorting reviews. It can be set to `mostrecent` or `mosthelpful`.
`page`	Parameter is used to get the items on a specific page. (e.g., `1` (default) is the first page of results, `2` is the 2nd page of results, `3` is the 3rd page of results, etc.).

📌Note: You can also add other API Parameters.
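The table above lists two accepted values for sort and says that page numbering starts at 1. A small builder function can check both before any request is sent. This is only a sketch: build_reviews_params is a hypothetical helper, not part of the SerpApi library:

```python
VALID_SORTS = ("mostrecent", "mosthelpful")


def build_reviews_params(api_key, product_id, country="us", sort="mostrecent", page=1):
    """Build the params dictionary for the apple_reviews engine."""
    if sort not in VALID_SORTS:
        raise ValueError(f"sort must be one of {VALID_SORTS}, got {sort!r}")
    if page < 1:
        raise ValueError(f"page must be 1 or greater, got {page}")

    return {
        "api_key": api_key,
        "engine": "apple_reviews",
        "product_id": product_id,
        "country": country,
        "sort": sort,
        "page": page,
    }


params = build_reviews_params("...", 1507782672, sort="mosthelpful")
print(params["sort"])   # mosthelpful
```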

Define the product_reviews list to which the retrieved reviews will be added:

product_reviews = []

A while loop is used to extract reviews from all pages:

while True:
    # data extraction will be here

Then we create a search object, which retrieves the data from the SerpApi backend. The get_dict() method converts the JSON response into the new_page_results dictionary:

search = GoogleSearch(params)
new_page_results = search.get_dict()

Add the new data from this page to the product_reviews list:

product_reviews.extend(new_page_results['reviews'])

# first_review = new_page_results['reviews'][0]
# title = first_review['title']
# text = first_review['text']
# rating = first_review['rating']
# review_date = first_review['review_date']
# author_name = first_review['author']['name']
# author_link = first_review['author']['link']

📌Note: The comments above show how to extract specific fields. The [0] in new_page_results['reviews'][0] is the index of a review, which means that the data is extracted from the first review. The new_page_results['reviews'][1] is the second review, and so on.
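The same fields can be collected for every review instead of one review at a time. Below is a minimal sketch with a hypothetical summarize_review helper. It uses .get() so that a review without one of the fields produces None instead of raising a KeyError:

```python
def summarize_review(review):
    """Flatten one review dictionary into the fields shown above."""
    author = review.get("author", {})
    return {
        "title": review.get("title"),
        "text": review.get("text"),
        "rating": review.get("rating"),
        "review_date": review.get("review_date"),
        "author_name": author.get("name"),
        "author_link": author.get("link"),
    }


# One review in the shape shown in the Output section of this post.
reviews = [
    {
        "position": 1,
        "id": "9446406432",
        "title": "Stop begging for reviews",
        "text": "Stop begging for reviews",
        "rating": 1,
        "review_date": "2022-12-28 21:42:28 UTC",
        "author": {
            "name": "stalfos_knight",
            "link": "https://itunes.apple.com/us/reviews/id41752602",
        },
    }
]

summaries = [summarize_review(review) for review in reviews]
print(summaries[0]["author_name"])   # stalfos_knight
```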

After data is retrieved from the current page, a check is made to see if the next page exists. If the serpapi_pagination dictionary contains the next key, the page parameter is incremented by 1. Otherwise, the loop stops:

if 'next' in new_page_results.get('serpapi_pagination', {}):
    params['page'] += 1
else:
    break
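A popular app can have many pages of reviews, and every page is a separate request. The sketch below shows the same pagination check with an upper limit on the number of pages. The collect_reviews function, its fetch_page argument and the max_pages limit are assumptions made for this example. The fake pages stand in for real responses so the logic can be tried without an API key:

```python
def collect_reviews(fetch_page, max_pages=10):
    """Collect reviews page by page, stopping at the last page or at max_pages.

    fetch_page is a hypothetical callable that takes a page number and
    returns the response dictionary for that page.
    """
    reviews = []
    for page in range(1, max_pages + 1):
        results = fetch_page(page)
        reviews.extend(results.get("reviews", []))
        if "next" not in results.get("serpapi_pagination", {}):
            break
    return reviews


# Stand-in for real requests: two fake pages, the second one has no "next".
FAKE_PAGES = {
    1: {"reviews": [{"id": "a"}, {"id": "b"}], "serpapi_pagination": {"next": "page-2"}},
    2: {"reviews": [{"id": "c"}]},
}

all_reviews = collect_reviews(lambda page: FAKE_PAGES[page])
print(len(all_reviews))   # 3
```

With the real library, fetch_page could be something like lambda page: GoogleSearch({**params, 'page': page}).get_dict().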

At the end of the function, the product_reviews list with the extracted data is returned:

return product_reviews

The complete function to get product reviews would look like this:

def get_product_reviews(product_id):
    params = {
        'api_key': '...',               # https://serpapi.com/manage-api-key
        'engine': 'apple_reviews',      # SerpApi search engine 
        'product_id': product_id,       # ID of a product
        'country': 'us',                # country for the search
        'sort': 'mostrecent',           # sorting reviews
        'page': 1,                      # pagination
    }

    product_reviews = []

    while True:
        search = GoogleSearch(params)
        new_page_results = search.get_dict()

        product_reviews.extend(new_page_results['reviews'])

        if 'next' in new_page_results.get('serpapi_pagination', {}):
            params['page'] += 1
        else:
            break

    return product_reviews

Output

{
  "product_info": {
    "title": "Pixea",
    "snippet": "The invisible image viewer",
    "id": "1507782672",
    "age_rating": "4+",
    "developer": {
      "name": "ImageTasks Inc",
      "link": "https://apps.apple.com/us/developer/imagetasks-inc/id450316587"
    },
    "rating": 4.6,
    "rating_count": "620 Ratings",
    "price": "Free",
    "in_app_purchases": "Offers In-App Purchases",
    "logo": "https://is3-ssl.mzstatic.com/image/thumb/Purple118/v4/f6/93/b6/f693b68f-9b14-3689-7521-c19a83fb0d88/AppIcon-1x_U007emarketing-85-220-6.png/320x0w.webp",
    "mac_screenshots": [
      "https://is4-ssl.mzstatic.com/image/thumb/Purple113/v4/e0/21/86/e021868d-b43b-0a78-8d4a-e4e0097a1d01/0131f1c2-3227-46bf-8328-7b147d2b1ea2_Pixea-1.jpg/643x0w.webp",
      "https://is4-ssl.mzstatic.com/image/thumb/Purple113/v4/55/3c/98/553c982d-de30-58b5-3b5a-d6b3b2b6c810/a0424c4d-4346-40e6-8cde-bc79ce690040_Pixea-2.jpg/643x0w.webp",
      "https://is3-ssl.mzstatic.com/image/thumb/Purple123/v4/77/d7/d8/77d7d8c1-4b4c-ba4b-4dde-94bdc59dfb71/6e66509c-5886-45e9-9e96-25154a22fd53_Pixea-3.jpg/643x0w.webp",
      "https://is3-ssl.mzstatic.com/image/thumb/PurpleSource113/v4/44/79/91/447991e0-518f-48b3-bb7e-c7121eb57ba4/79be2791-5b93-4c4d-b4d1-38a3599c2b2d_Pixea-4.jpg/643x0w.webp"
    ],
    "description": "Pixea is an image viewer for macOS with a nice minimal modern user interface. Pixea works great with JPEG, HEIC, PSD, RAW, WEBP, PNG, GIF, and many other formats. Provides basic image processing, including flip and rotate, shows a color histogram, EXIF, and other information. Supports keyboard shortcuts and trackpad gestures. Shows images inside archives, without extracting them.Supported formats:JPEG, HEIC, GIF, PNG, TIFF, Photoshop (PSD), BMP, Fax images, macOS and Windows icons, Radiance images, Google's WebP. RAW formats: Leica DNG and RAW, Sony ARW, Olympus ORF, Minolta MRW, Nikon NEF, Fuji RAF, Canon CR2 and CRW, Hasselblad 3FR. Sketch files (preview only). ZIP-archives.Export formats:JPEG, JPEG-2000, PNG, TIFF, BMP.Found a bug? Have a suggestion? Please, send it to support@imagetasks.comFollow us on Twitter @imagetasks!",
    "version_history": [
      {
        "release_version": "2.1",
        "release_notes": "- New \"Fixed Size and Position\" zoom mode- Fixed a bug causing crash when browsing ZIP-files- Bug fixes and improvements",
        "release_date": "2023-01-03"
      },
      ... other versions
    ],
    "ratings_and_reviews": {
      "rating_percentage": {
        "5_star": "76%",
        "4_star": "13%",
        "3_star": "4%",
        "2_star": "2%",
        "1_star": "4%"
      },
      "review_examples": [
        {
          "rating": "5 out of 5",
          "username": "MyrtleBlink182",
          "review_date": "01/18/2022",
          "review_title": "Full-Screen Perfection",
          "review_text": "This photo-viewer is by far the best in the biz. I thoroughly enjoy viewing photos with it. I tried a couple of others out, but this one is exactly what I was looking for. There is no dead space or any extra design baggage when viewing photos. Pixea knocks it out of the park keeping the design minimalistic while ensuring the functionality is through the roof"
        },
        ... other reviews examples
      ]
    },
    "privacy": {
      "description": "The developer, ImageTasks Inc, indicated that the app’s privacy practices may include handling of data as described below. For more information, see the developer’s privacy policy.",
      "privacy_policy_link": "https://www.imagetasks.com/Pixea-policy.txt",
      "cards": [
        {
          "title": "Data Not Collected",
          "description": "The developer does not collect any data from this app."
        }
      ],
      "sidenote": "Privacy practices may vary, for example, based on the features you use or your age. Learn More",
      "learn_more_link": "https://apps.apple.com/story/id1538632801"
    },
    "information": {
      "seller": "ImageTasks Inc",
      "price": "Free",
      "size": "7.1 MB",
      "categories": [
        "Photo & Video"
      ],
      "compatibility": [
        {
          "device": "Mac",
          "requirement": "Requires macOS 10.12 or later."
        }
      ],
      "supported_languages": [
        "English"
      ],
      "age_rating": {
        "rating": "4+"
      },
      "copyright": "Copyright © 2020-2023 ImageTasks Inc. All rights reserved.",
      "in_app_purchases": [
        {
          "name": "Upgrade to Pixea Plus",
          "price": "$3.99"
        }
      ],
      "developer_website": "https://www.imagetasks.com",
      "app_support_link": "https://www.imagetasks.com/pixea",
      "privacy_policy_link": "https://www.imagetasks.com/Pixea-policy.txt"
    },
    "more_by_this_developer": {
      "apps": [
        {
          "logo": "https://is3-ssl.mzstatic.com/image/thumb/Purple118/v4/f6/93/b6/f693b68f-9b14-3689-7521-c19a83fb0d88/AppIcon-1x_U007emarketing-85-220-6.png/320x0w.webp",
          "link": "https://apps.apple.com/us/app/istatistica/id1126874522",
          "serpapi_link": "https://serpapi.com/search.json?country=us&engine=apple_product&product_id=1507782672&type=app",
          "name": "iStatistica",
          "category": "Utilities"
        },
        ... other apps
      ],
      "result_type": "Full",
      "see_all_link": "https://apps.apple.com/us/app/id1507782672#see-all/developer-other-apps"
    }
  },
  "product_reviews": [
    {
      "position": 1,
      "id": "9446406432",
      "title": "Stop begging for reviews",
      "text": "Stop begging for reviews",
      "rating": 1,
      "review_date": "2022-12-28 21:42:28 UTC",
      "author": {
        "name": "stalfos_knight",
        "link": "https://itunes.apple.com/us/reviews/id41752602"
      }
    },
    ... other reviews
  ]
}
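Once the data is in a Python dictionary, it can be processed further. As a rough sketch, the two hypothetical helpers below turn the rating_percentage strings into numbers and count how many of the retrieved reviews have each rating. The sample values are taken from the output above, except for the short reviews list, which is made up:

```python
from collections import Counter


def percentages_to_numbers(rating_percentage):
    """Convert values like "76%" into floats like 76.0."""
    return {key: float(value.rstrip("%")) for key, value in rating_percentage.items()}


def rating_distribution(reviews):
    """Count how many reviews there are for each rating value."""
    return Counter(review["rating"] for review in reviews if "rating" in review)


rating_percentage = {"5_star": "76%", "4_star": "13%", "3_star": "4%", "2_star": "2%", "1_star": "4%"}
reviews = [{"rating": 1}, {"rating": 5}, {"rating": 5}]   # made-up sample

print(percentages_to_numbers(rating_percentage)["5_star"])   # 76.0
print(rating_distribution(reviews)[5])                       # 2
```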

Links

Join us on Twitter | YouTube

Add a Feature Request💫 or a Bug🐞
