What will be scraped
Full code
If you don't need an explanation, have a look at the full code example in the online IDE.
import dotenv from "dotenv";
dotenv.config();
import { getJson } from "serpapi";

const getSearchParams = (searchType) => {
  const isProduct = searchType === "product";
  const reviewsLimit = 10; // hardcoded limit for demonstration purpose
  const engine = isProduct ? "apple_product" : "apple_reviews"; // search engine
  const params = {
    api_key: process.env.API_KEY, // your API key from serpapi.com
    product_id: "1507782672", // Parameter defines the ID of a product you want to get the reviews for
    country: "us", // Parameter defines the country to use for the search
    type: isProduct ? "app" : undefined, // Parameter defines the type of Apple Product to get the product page of
    page: isProduct ? undefined : 1, // Parameter is used to get the items on a specific page
    sort: isProduct ? undefined : "mostrecent", // Parameter is used for sorting reviews
  };
  return { engine, params, reviewsLimit };
};

const getProductInfo = async () => {
  const { engine, params } = getSearchParams("product");
  const json = await getJson(engine, params);
  delete json.search_metadata;
  delete json.search_parameters;
  delete json.search_information;
  return json;
};

const getReviews = async () => {
  const reviews = [];
  const { engine, params, reviewsLimit } = getSearchParams();
  while (true) {
    const json = await getJson(engine, params);
    if (json.reviews) {
      reviews.push(...json.reviews);
      params.page += 1;
    } else break;
    if (reviews.length >= reviewsLimit) break;
  }
  return reviews;
};

const getResults = async () => {
  return { productInfo: await getProductInfo(), reviews: await getReviews() };
};

getResults().then((result) => console.dir(result, { depth: null }));
Why use Apple Product Page Scraper and Apple App Store Reviews Scraper APIs from SerpApi?
Using an API generally solves all or most of the problems you might encounter while creating your own parser or crawler. From a web scraping perspective, our API can help with the most painful problems:
- Bypass blocks from supported search engines by solving CAPTCHAs or getting around IP blocks.
- No need to create a parser from scratch and maintain it.
- No need to pay for proxies and CAPTCHA solvers.
- No need to use browser automation when you have to extract large amounts of data faster.
Head to the Apple Product Page playground and Apple App Store Reviews playground for a live and interactive demo.
Preparation
First, we need to create a Node.js* project and add the npm packages serpapi and dotenv.
To do this, in the directory with our project, open the command line and enter:
$ npm init -y
And then:
$ npm i serpapi dotenv
*If you don't have Node.js installed, you can download it from nodejs.org and follow the installation documentation.
The serpapi package is used to scrape and parse search engine results using SerpApi. Get search results from Google, Bing, Baidu, Yandex, Yahoo, Home Depot, eBay, and more.
The dotenv package is a zero-dependency module that loads environment variables from a .env file into process.env.
Next, we need to add a top-level "type" field with a value of "module" (that is, "type": "module") to our package.json file to allow using ES6 modules in Node.js.
With that, the Node.js environment for our project is set up, and we can move on to the step-by-step code explanation.
Code explanation
First, we need to import dotenv from the dotenv library and call its config() method, then import getJson from the serpapi library:
import dotenv from "dotenv";
dotenv.config();
import { getJson } from "serpapi";
- config() will read your .env file, parse the contents, assign it to process.env, and return an Object with a parsed key containing the loaded content or an error key if it failed.
- getJson() allows you to get a JSON response based on search parameters.
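Since the code reads the key from process.env.API_KEY, the .env file is expected to contain a line such as API_KEY=<your key>. As a minimal sketch, the check below fails early when that variable is missing. requireEnv is a hypothetical helper, not part of dotenv or serpapi, and the sample uses a plain object in place of process.env so that it runs without a .env file:

```javascript
// Hypothetical helper (not part of dotenv or serpapi): read a required
// environment variable and fail early with a readable message.
const requireEnv = (name, env = process.env) => {
  const value = env[name];
  if (typeof value !== "string" || value.trim() === "") {
    throw new Error(`Missing environment variable ${name}. Add ${name}=<value> to your .env file.`);
  }
  return value.trim();
};

// Usage with a plain object standing in for process.env:
const fakeEnv = { API_KEY: " demo_key " };
console.log(requireEnv("API_KEY", fakeEnv)); // prints demo_key
```

In the real script you could call requireEnv("API_KEY") right after dotenv.config() and pass the result as api_key.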
Next, we write the getSearchParams function to build the necessary search parameters for two different APIs. In this function, we define and set the isProduct constant depending on the searchType argument.
Next, we define and return different search parameters for the Product Page API and the Reviews API: the search engine; how many reviews we want to receive (the reviewsLimit constant); and the search parameters for making a request:
const getSearchParams = (searchType) => {
  const isProduct = searchType === "product";
  const reviewsLimit = 10; // hardcoded limit for demonstration purpose
  const engine = isProduct ? "apple_product" : "apple_reviews"; // search engine
  const params = {
    api_key: process.env.API_KEY, // your API key from serpapi.com
    product_id: "1507782672", // Parameter defines the ID of a product you want to get the reviews for
    country: "us", // Parameter defines the country to use for the search
    type: isProduct ? "app" : undefined, // Parameter defines the type of Apple Product to get the product page of
    page: isProduct ? undefined : 1, // Parameter is used to get the items on a specific page
    sort: isProduct ? undefined : "mostrecent", // Parameter is used for sorting reviews
  };
  return { engine, params, reviewsLimit };
};
When we run this function, we receive different search parameters for the Product Page API and for the Reviews API.
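To make the difference concrete, here is a small sketch that prints which parameters are actually set for each search type. It repeats getSearchParams with a placeholder key passed as an argument ("demo_key" is an assumption, used so the snippet runs without a .env file), and dropUndefined is a hypothetical helper used only for display:

```javascript
const getSearchParams = (searchType, apiKey = "demo_key") => {
  const isProduct = searchType === "product";
  const reviewsLimit = 10;
  const engine = isProduct ? "apple_product" : "apple_reviews";
  const params = {
    api_key: apiKey, // placeholder, the tutorial reads process.env.API_KEY
    product_id: "1507782672",
    country: "us",
    type: isProduct ? "app" : undefined,
    page: isProduct ? undefined : 1,
    sort: isProduct ? undefined : "mostrecent",
  };
  return { engine, params, reviewsLimit };
};

// Hypothetical helper: drop keys whose value is undefined, for display only.
const dropUndefined = (obj) =>
  Object.fromEntries(Object.entries(obj).filter(([, value]) => value !== undefined));

const product = getSearchParams("product");
console.log(product.engine); // prints apple_product
console.log(Object.keys(dropUndefined(product.params)).join(", "));
// prints api_key, product_id, country, type

const reviewSearch = getSearchParams();
console.log(reviewSearch.engine); // prints apple_reviews
console.log(Object.keys(dropUndefined(reviewSearch.params)).join(", "));
// prints api_key, product_id, country, page, sort
```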
You can use the following search parameters:
Common params:
- api_key parameter defines the SerpApi private key to use.
- product_id parameter defines the ID of a product you want to get the reviews for. You can get the ID of a product from our Web scraping Apple App Store Search with Nodejs blog post. You can also get it from the URL of the app. For example, the product_id of "https://apps.apple.com/us/app/the-great-coffee-app/id534220544" is the long numerical value that comes after "id": 534220544.
- country parameter defines the country to use for the search. It's a two-letter country code (e.g., us (default) for the United States, uk for United Kingdom, or fr for France). Head to the Apple Regions for a full list of supported Apple Regions.
- no_cache parameter will force SerpApi to fetch the App Store Search results even if a cached version is already present. A cache is served only if the query and all parameters are exactly the same. Cache expires after 1h. Cached searches are free, and are not counted towards your searches per month. It can be set to false (default) to allow results from the cache, or true to disallow results from the cache. no_cache and async parameters should not be used together.
- async parameter defines the way you want to submit your search to SerpApi. It can be set to false (default) to open an HTTP connection and keep it open until you get your search results, or true to just submit your search to SerpApi and retrieve the results later. In this case, you'll need to use our Searches Archive API to retrieve your results. async and no_cache parameters should not be used together. async should not be used on accounts with Ludicrous Speed enabled.
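Getting the product_id from an app URL, as described in the list above, can be sketched like this. extractProductId is a hypothetical helper introduced for illustration. It only looks at the URL path, so it would also match other App Store URLs that end in "id" plus digits, such as developer pages:

```javascript
// Hypothetical helper: pull the numeric value that follows "id" in an App Store URL path.
// Returns null when the path has no such segment; throws a TypeError on an invalid URL.
const extractProductId = (url) => {
  const match = new URL(url).pathname.match(/\/id(\d+)(?:\/|$)/);
  return match ? match[1] : null;
};

console.log(extractProductId("https://apps.apple.com/us/app/the-great-coffee-app/id534220544"));
// prints 534220544
```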
Product Page params:
- type parameter defines the type of Apple Product to get the product page of. It defaults to app.
Reviews params:
- page parameter is used to get the items on a specific page (e.g., 1 (default) is the first page of results, 2 is the 2nd page of results, 3 is the 3rd page of results, etc.).
- sort parameter is used for sorting reviews. It can be set to mostrecent (Most recent, default) or mosthelpful (Most helpful).
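The rules in these lists can be checked before a request is sent. The sketch below is a hypothetical validator, not something the serpapi package provides. It assumes that product_id is required and that page must be a positive integer; the sort values and the no_cache/async rule come from the lists above:

```javascript
// Hypothetical validator based on the parameter rules listed above.
const ALLOWED_SORT = ["mostrecent", "mosthelpful"];

const validateReviewParams = (params) => {
  const errors = [];
  // Assumption: a request without product_id cannot succeed.
  if (!params.product_id) errors.push("product_id is required");
  if (params.no_cache === true && params.async === true) {
    errors.push("no_cache and async should not be used together");
  }
  if (params.sort !== undefined && !ALLOWED_SORT.includes(params.sort)) {
    errors.push(`sort must be one of: ${ALLOWED_SORT.join(", ")}`);
  }
  // Assumption: page numbering starts at 1, as in the examples above.
  if (params.page !== undefined && !(Number.isInteger(params.page) && params.page >= 1)) {
    errors.push("page must be a positive integer");
  }
  return errors;
};

console.log(validateReviewParams({ product_id: "1507782672", sort: "mostrecent", page: 1 }));
// prints []
console.log(validateReviewParams({ product_id: "1507782672", no_cache: true, async: true }).length);
// prints 1
```

Returning a list of messages instead of throwing lets the caller decide whether to abort or just log the problems.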
Next, we declare the function getProductInfo that gets all product information from the page and returns it. In this function we receive and destructure engine and params from the getSearchParams function called with the "product" argument. Next, we get json with the results, delete unnecessary keys, and return it:
const getProductInfo = async () => {
  const { engine, params } = getSearchParams("product");
  const json = await getJson(engine, params);
  delete json.search_metadata;
  delete json.search_parameters;
  delete json.search_information;
  return json;
};
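If you would rather not mutate the response object with delete, the same cleanup can be written as a copy. This is an alternative sketch, not the approach used in this post: stripMetadata is a hypothetical helper, and the sample object is invented and only mimics the shape of a response:

```javascript
// Keys the tutorial removes from the response before returning it.
const METADATA_KEYS = ["search_metadata", "search_parameters", "search_information"];

// Hypothetical helper: return a copy of the response without the metadata keys.
const stripMetadata = (json) =>
  Object.fromEntries(Object.entries(json).filter(([key]) => !METADATA_KEYS.includes(key)));

// Invented sample response, shaped like the fields shown in this post.
const sample = {
  search_metadata: { status: "Success" },
  search_parameters: { engine: "apple_product" },
  title: "Pixea",
  rating: 4.6,
};

console.log(stripMetadata(sample)); // prints { title: 'Pixea', rating: 4.6 }
console.log("search_metadata" in sample); // prints true, the original is untouched
```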
Next, we declare the function getReviews that gets review results from all pages (using pagination) and returns them:
const getReviews = async () => {
...
};
In this function we declare an empty reviews array, then receive and destructure engine, params and reviewsLimit from the getSearchParams function called without arguments. Then, using a while loop, we get json with the results, add the reviews from each page, and set the next page index (in params.page).
If there are no more results on the page, or if the number of received reviews reaches or exceeds reviewsLimit, we stop the loop (using break) and return an array with the results:
  const reviews = [];
  const { engine, params, reviewsLimit } = getSearchParams();
  while (true) {
    const json = await getJson(engine, params);
    if (json.reviews) {
      reviews.push(...json.reviews);
      params.page += 1;
    } else break;
    if (reviews.length >= reviewsLimit) break;
  }
  return reviews;
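Because whole pages are added at once, the loop above can return more reviews than reviewsLimit. The sketch below shows one possible variation that trims the result to the exact limit and caps the number of pages. It is not the code from this post: collectReviews, the maxPages cap and fakeFetchPage are assumptions made for illustration, and the page fetcher is passed in so the example runs without an API key:

```javascript
// Sketch of the pagination loop with the page fetcher passed in as an argument.
// fetchPage is a hypothetical stand-in for (page) => getJson(engine, { ...params, page }).
const collectReviews = async (fetchPage, reviewsLimit, maxPages = 50) => {
  const reviews = [];
  for (let page = 1; page <= maxPages; page += 1) {
    const json = await fetchPage(page);
    // Stop when a page has no reviews key or an empty list.
    if (!Array.isArray(json.reviews) || json.reviews.length === 0) break;
    reviews.push(...json.reviews);
    if (reviews.length >= reviewsLimit) break;
  }
  // Trim the overshoot from the last page so the limit is exact.
  return reviews.slice(0, reviewsLimit);
};

// Fake fetcher: three pages of four reviews each, then no more results.
const fakeFetchPage = async (page) =>
  page <= 3
    ? { reviews: Array.from({ length: 4 }, (_, i) => ({ id: `${page}-${i + 1}` })) }
    : {};

collectReviews(fakeFetchPage, 10).then((reviews) => {
  console.log(reviews.length); // prints 10
  console.log(reviews[9].id); // prints 3-2
});
```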
And finally, we declare and run the getResults function, in which we make an object with the results from the getProductInfo and getReviews functions. Then we print all the received information in the console with the console.dir method, which allows you to use an object with the necessary parameters to change default output options:
const getResults = async () => {
  return { productInfo: await getProductInfo(), reviews: await getReviews() };
};

getResults().then((result) => console.dir(result, { depth: null }));
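The two requests do not depend on each other, so they could also be started at the same time. The following is a possible variation rather than the code used in this post: it awaits both calls with Promise.all and adds a catch handler. The two functions are replaced with stubs here (an assumption) so the sketch runs without network access:

```javascript
// Stubs standing in for the getProductInfo and getReviews functions above,
// so the sketch runs without an API key or network access.
const getProductInfo = async () => ({ title: "Pixea" });
const getReviews = async () => [{ position: 1, rating: 3 }];

const getResults = async () => {
  // Start both requests at once and wait for both to finish.
  const [productInfo, reviews] = await Promise.all([getProductInfo(), getReviews()]);
  return { productInfo, reviews };
};

getResults()
  .then((result) => console.dir(result, { depth: null }))
  .catch((error) => {
    // A failed request ends up here instead of as an unhandled rejection.
    console.error("Request failed:", error.message);
    process.exitCode = 1;
  });
```

Note that with Promise.all a failure in either request rejects the whole result, so this fits cases where you need both parts or nothing.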
Output
{
"productInfo":{
"title":"Pixea",
"snippet":"The invisible image viewer",
"id":"1507782672",
"age_rating":"4+",
"developer":{
"name":"ImageTasks Inc",
"link":"https://apps.apple.com/us/developer/imagetasks-inc/id450316587"
},
"rating":4.6,
"rating_count":"594 Ratings",
"price":"Free",
"logo":"https://is3-ssl.mzstatic.com/image/thumb/Purple118/v4/f6/93/b6/f693b68f-9b14-3689-7521-c19a83fb0d88/AppIcon-1x_U007emarketing-85-220-6.png/320x0w.webp",
"mac_screenshots":[
"https://is3-ssl.mzstatic.com/image/thumb/PurpleSource124/v4/b1/8c/fb/b18cfb80-cb5c-d67d-2edc-ee1f6666e012/35b8d5a7-b493-4a80-bdbd-3e9d564601dd_Pixea-1.jpg/643x0w.webp",
"https://is1-ssl.mzstatic.com/image/thumb/PurpleSource124/v4/96/08/83/9608834d-3d2b-5c0b-570c-f022407ff5cc/1836573e-1b6a-421c-b654-6ae2f915d755_Pixea-2.jpg/643x0w.webp",
"https://is1-ssl.mzstatic.com/image/thumb/PurpleSource124/v4/58/fd/db/58fddb5d-9480-2536-8679-92d6b067d285/98e22b63-1575-4ee6-b08d-343b9e0474ea_Pixea-3.jpg/643x0w.webp",
"https://is2-ssl.mzstatic.com/image/thumb/PurpleSource124/v4/c3/f3/f3/c3f3f3b5-deb0-4b58-4afc-79073373b7b9/28f51f38-bc59-4a61-a5a1-bff553838267_Pixea-4.jpg/643x0w.webp"
],
"description":"Pixea is an image viewer for macOS with a nice minimal modern user interface. Pixea works great with JPEG, HEIC, PSD, RAW, WEBP, PNG, GIF, and many other formats. Provides basic image processing, including flip and rotate, shows a color histogram, EXIF, and other information. Supports keyboard shortcuts and trackpad gestures. Shows images inside archives, without extracting them.Supported formats:JPEG, HEIC, GIF, PNG, TIFF, Photoshop (PSD), BMP, Fax images, macOS and Windows icons, Radiance images, Google's WebP. RAW formats: Leica DNG and RAW, Sony ARW, Olympus ORF, Minolta MRW, Nikon NEF, Fuji RAF, Canon CR2 and CRW, Hasselblad 3FR. Sketch files (preview only). ZIP-archives.Export formats:JPEG, JPEG-2000, PNG, TIFF, BMP.Found a bug? Have a suggestion? Please, send it to support@imagetasks.comFollow us on Twitter @imagetasks!",
"version_history":[
{
"release_version":"1.4",
"release_notes":"- New icon- macOS Big Sur support- Universal Binary- Bug fixes and improvements",
"release_date":"2020-11-09"
},
... and other versions
],
"ratings_and_reviews":{
"rating_percentage":{
"5_star":"76%",
"4_star":"14%",
"3_star":"4%",
"2_star":"2%",
"1_star":"3%"
},
"review_examples":[
{
"rating":"5 out of 5",
"username":"MyrtleBlink182",
"review_date":"01/18/2022",
"review_title":"Full-Screen Perfection",
"review_text":"This photo-viewer is by far the best in the biz. I thoroughly enjoy viewing photos with it. I tried a couple of others out, but this one is exactly what I was looking for. There is no dead space or any extra design baggage when viewing photos. Pixea knocks it out of the park keeping the design minimalistic while ensuring the functionality is through the roof"
},
... and other reviews examples
]
},
"privacy":{
"description":"The developer, ImageTasks Inc, indicated that the app’s privacy practices may include handling of data as described below. For more information, see the developer’s privacy policy.",
"privacy_policy_link":"https://www.imagetasks.com/Pixea-policy.txt",
"cards":[
{
"title":"Data Not Collected",
"description":"The developer does not collect any data from this app."
}
],
"sidenote":"Privacy practices may vary, for example, based on the features you use or your age. Learn More",
"learn_more_link":"https://apps.apple.com/story/id1538632801"
},
"information":{
"seller":"ImageTasks Inc",
"price":"Free",
"size":"5.8 MB",
"categories":[
"Photo & Video"
],
"compatibility":[
{
"device":"Mac",
"requirement":"Requires macOS 10.12 or later."
}
],
"supported_languages":[
"English"
],
"age_rating":{
"rating":"4+"
},
"copyright":"Copyright © 2020 Andrey Tsarkov. All rights reserved.",
"developer_website":"https://www.imagetasks.com",
"app_support_link":"https://www.imagetasks.com/pixea",
"privacy_policy_link":"https://www.imagetasks.com/Pixea-policy.txt"
},
"more_by_this_developer":{
"apps":[
{
"logo":"https://is3-ssl.mzstatic.com/image/thumb/Purple118/v4/f6/93/b6/f693b68f-9b14-3689-7521-c19a83fb0d88/AppIcon-1x_U007emarketing-85-220-6.png/320x0w.webp",
"link":"https://apps.apple.com/us/app/istatistica/id1126874522",
"serpapi_link":"https://serpapi.com/search.json?country=us&engine=apple_product&product_id=1507782672&type=app",
"name":"iStatistica",
"category":"Utilities"
},
... and other apps
],
"result_type":"Full",
"see_all_link":"https://apps.apple.com/us/app/id1507782672#see-all/developer-other-apps"
}
},
"reviews":[
{
"position":1,
"id":"9332275235",
"title":"Doesn't respect aspect ratios",
"text":"Seemingly no way to maintain the aspect ratio of an image. It always wants to fill the photo to the window size, no matter what sizing options you pick. How useless is that?",
"rating":3,
"review_date":"2022-11-26 13:29:43 UTC",
"author":{
"name":"soren121",
"link":"https://itunes.apple.com/us/reviews/id33706024"
}
},
... and other reviews
]
}
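Once you have this output, the reviews array is easy to post-process. As a minimal sketch, the hypothetical summarizeReviews helper below computes the average rating and lists the titles of low-rated reviews, using the rating and title fields shown above. In the sample data only the first title comes from the output; the other two entries are invented:

```javascript
// Hypothetical post-processing of the "reviews" array from the output above.
const summarizeReviews = (reviews) => {
  const ratings = reviews.map((review) => review.rating).filter((r) => typeof r === "number");
  const total = ratings.reduce((sum, r) => sum + r, 0);
  return {
    count: ratings.length,
    averageRating: ratings.length ? Number((total / ratings.length).toFixed(2)) : null,
    // Titles of reviews rated 2 stars or lower.
    lowRated: reviews.filter((review) => review.rating <= 2).map((review) => review.title),
  };
};

const sampleReviews = [
  { title: "Doesn't respect aspect ratios", rating: 3 },
  { title: "Invented example A", rating: 5 }, // invented entry
  { title: "Invented example B", rating: 1 }, // invented entry
];

const summary = summarizeReviews(sampleReviews);
console.log(summary.count); // prints 3
console.log(summary.averageRating); // prints 3
console.log(summary.lowRated.join(", ")); // prints Invented example B
```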
Links
- Code in the online IDE
- Apple Product Page Scraper API documentation
- Apple Product Page playground
- Apple App Store Reviews Scraper API documentation
- Apple App Store Reviews playground
If you want other functionality added to this blog post or if you want to see some projects made with SerpApi, write me a message.
Add a Feature Request💫 or a Bug🐞