This post was originally published on the Crawlbase Blog.
How to Scrape Apple App Store Data?
Our first step is to create an account with Crawlbase, which will enable us to utilize the Crawling API and serve as our platform for reliably fetching data from the App Store.
Creating a Crawlbase account
- Sign up for a Crawlbase account and log in.
- Once registered, you’ll receive 1,000 free requests. Add your billing details before using any of the free credits to get an extra 9,000 requests.
- Go to your Account Docs and save your Normal Request token for this blog’s purpose.
Setting up the Environment
Next, ensure that Node.js is installed on your device, as it is the backbone of our scraping script, providing a fast JavaScript runtime and access to essential libraries.
Installing Node on Windows:
- Go to the official Node.js website and download the Long-Term Support (LTS) version for Windows.
- Launch the installer and follow the prompts. Leave the default options selected.
- Verify installation by opening a new Command Prompt and running the following commands:
node -v
npm -v
For macOS:
- Go to [https://nodejs.org](https://nodejs.org/) and download the macOS installer (LTS).
- Follow the installation wizard.
- Open the Terminal and confirm the installation:
node -v
npm -v
For Linux (Ubuntu/Debian):
- Open your terminal to add the NodeSource repository and install Node.js:
curl -fsSL https://deb.nodesource.com/setup_lts.x | sudo -E bash -
sudo apt-get install -y nodejs
- Verify your installation:
node -v
npm -v
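If you would rather have the script itself confirm the runtime version, a minimal sketch along these lines could run before anything else. The minimum major version (18) and the helper names are assumptions made for this example, not requirements stated in this post.

```javascript
// Assumed minimum major version for this sketch; adjust to your needs.
const MIN_NODE_MAJOR = 18;

// Parse the major version from a string such as "20.11.1" or "v20.11.1".
function getNodeMajor(version = process.versions.node) {
  return Number.parseInt(version.replace(/^v/, '').split('.')[0], 10);
}

// True when the given version meets the assumed minimum.
function isNodeSupported(version = process.versions.node, minMajor = MIN_NODE_MAJOR) {
  return getNodeMajor(version) >= minMajor;
}

if (!isNodeSupported()) {
  console.warn(`Node ${process.versions.node} is older than the assumed minimum (${MIN_NODE_MAJOR}).`);
}
```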
Fetch Script
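Grab the script below and save it with a .js extension; any IDE or coding environment you like will work. Once you've saved it, double-check that the necessary dependencies are installed in your Node.js setup (here, the crawlbase package). Because the script uses import syntax, Node.js needs to treat the file as an ES module; setting "type": "module" in package.json or using the .mjs extension is the most reliable way to do that. After that, you should be all set.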
import { CrawlingAPI } from 'crawlbase';

const CRAWLBASE_NORMAL_TOKEN = '<Normal requests token>';
const URL = 'https://apps.apple.com/us/app/google-authenticator/id388497605';

async function crawlAppStore() {
  const api = new CrawlingAPI({ token: CRAWLBASE_NORMAL_TOKEN });
  const options = {
    userAgent: 'Mozilla/5.0 (Windows NT 6.2; rv:20.0) Gecko/20121202 Firefox/30.0',
  };
  const response = await api.get(URL, options);
  if (response.statusCode !== 200) {
    throw new Error(`Request failed with status code: ${response.statusCode}`);
  }
  return response.body;
}

// Call the function and print the raw HTML it returns.
const html = await crawlAppStore();
console.log(html);
IMPORTANT: Make sure to replace <Normal requests token> with your actual Crawlbase normal request token before running the script.
This script shows how to use Crawlbase’s Crawling API to retrieve HTML content from the Apple App Store without getting blocked. Note that the response hasn’t been scraped yet. We still need to remove unnecessary elements, clean the data, and produce a parsed, structured response.
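Before writing any parsing code, it can help to confirm that the response really is the page you expected. The sketch below is one possible sanity check on the raw HTML string; the function names are made up for this example, and the regular expression is only a rough check, not a replacement for a real HTML parser.

```javascript
// Return the text of the first <title> element, or null if none is found.
function extractPageTitle(html) {
  const match = /<title[^>]*>([\s\S]*?)<\/title>/i.exec(html);
  return match ? match[1].replace(/\s+/g, ' ').trim() : null;
}

// Summarize a fetched page: its size in characters and its <title> text.
function summarizeHtml(html) {
  return { length: html.length, title: extractPageTitle(html) };
}

// Usage, with `html` being the string returned by crawlAppStore():
// console.log(summarizeHtml(html));
```

If the title is null or the page is only a few hundred characters long, the request probably did not return the app page, and the selectors in the next sections will not match.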
Locating specific CSS selectors
Now that you understand how to send a simple API request using Node.js, let’s locate the data we need from our target URL so we can later write code to clean and parse it.
The first thing you’ll notice is the main section at the top. It’s usually where we’ll find the most important details and is typically well-structured, making it an ideal target for scraping.
Go ahead and open your target URL and locate each selector. For example, let’s search for the title:
Take note of the .app-header__title and do the same for subtitle, seller, category, stars, rating, and price. Once that's done, this section is complete.
The process is pretty much the same for the rest of the page. Here’s another example: if you want to include the customer average rating in the Ratings and Reviews section, right-click on the data and select Inspect:
You know the gist. It should now be a piece of cake for you to locate the remaining data you need.
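Selectors collected by hand are easy to mistype, and the page markup can change over time. As a rough sketch, the helper below takes a loaded Cheerio instance and a map of field names to selectors, then reports which fields match nothing. The helper name and the SELECTORS map are illustrative; fill in the selectors you actually collected.

```javascript
// Field names mapped to the CSS selectors noted while inspecting the page.
// Only two entries are shown; extend the map with your own selectors.
const SELECTORS = {
  title: '.app-header__title',
  subtitle: '.app-header__subtitle',
};

// Return the field names whose selector matches no element.
// `$` is expected to be the function returned by cheerio.load(html).
function findMissingSelectors($, selectors) {
  return Object.entries(selectors)
    .filter(([, selector]) => $(selector).length === 0)
    .map(([field]) => field);
}

// Usage, once Cheerio is installed and `html` has been fetched:
// const $ = cheerio.load(html);
// console.log(findMissingSelectors($, SELECTORS));
```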
Parsing the HTML in Node.js
Now that you’re an expert in extracting the CSS selectors, it is time to build the code to parse the HTML. This is where Cheerio comes in. It is a lightweight and powerful library that enables us to select relevant data from the HTML source code within Node.js.
Start by creating your project folder, then run the following inside it:
npm init -y
npm install crawlbase lodash casenator cheerio
Import the Required Libraries
Then in your .js file, import the required libraries for this project, including Cheerio:
import _ from 'lodash';
import { CrawlingAPI } from 'crawlbase';
import { toCamelCase } from 'casenator';
import * as cheerio from 'cheerio';
Don’t forget to set up the Crawling API as well as the target website:
const CRAWLBASE_NORMAL_TOKEN = '<Normal requests token>';
const URL = 'https://apps.apple.com/us/app/google-authenticator/id388497605';
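If you prefer not to paste the token into the source file, one option is to read it from an environment variable. This is only a sketch: the variable name CRAWLBASE_TOKEN and the helper name are assumptions for this example, not something Crawlbase prescribes.

```javascript
// Read the token from the environment and fail early if it is missing
// or still set to a placeholder such as '<Normal requests token>'.
function readCrawlbaseToken(env = process.env) {
  const token = (env.CRAWLBASE_TOKEN || '').trim();
  if (!token || token.startsWith('<')) {
    throw new Error('Set CRAWLBASE_TOKEN to your Crawlbase normal request token.');
  }
  return token;
}

// Usage:
// const CRAWLBASE_NORMAL_TOKEN = readCrawlbaseToken();
```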
Functions for Scraping Apple Store Data
This is where we’ll use the CSS selectors we’ve collected earlier. Let’s write the part of the code that pulls the bits of information from the App Store page.
function scrapePrimaryAppDetails($) {
  let title = $('.app-header__title').text().trim();
  // Remove the badge text, if any, from the title.
  const titleBadge = $('.badge--product-title').text().trim();
  title = title.replace(titleBadge, '').trim();
  const subtitle = $('.app-header__subtitle').text().trim();
  const seller = $('.app-header__identity').text().trim();
  // Take the text after the standalone word "in"; a plain split('in') would
  // also cut category names that contain those letters, such as "Finance".
  const categoryText = $('.product-header__list__item a.inline-list__item').text().trim();
  const categoryMatch = /\bin\s+([\s\S]+)$/.exec(categoryText);
  const category = categoryMatch ? categoryMatch[1].trim() : null;
  const stars = $('.we-star-rating').attr('aria-label');
  // Optional chaining avoids a TypeError when the text has no "•" separator.
  const rating = $('.we-rating-count').text().split('•')[1]?.trim() ?? null;
  const price = $('.app-header__list__item--price').text().trim();
  return { title, subtitle, seller, category, stars, rating, price };
}
Just like that, it will extract the title, subtitle, seller, category, star rating, overall ratings, and price.
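The category and rating fields depend on the exact shape of the text on the page, so it can be useful to isolate that string handling in small functions you can test without fetching anything. The sketch below assumes the category text looks like "#1 in Finance" and the rating text like "4.9 • 512K Ratings"; both sample strings and both helper names are made up for illustration.

```javascript
// Return the text after the standalone word "in", or null if it is absent.
function parseCategory(text) {
  const match = /\bin\s+([\s\S]+)$/.exec(text.trim());
  return match ? match[1].trim() : null;
}

// Return the part after the "•" separator, or null if there is no separator.
function parseRatingCount(text) {
  const parts = text.split('•');
  return parts.length > 1 ? parts[1].trim() : null;
}

console.log(parseCategory('#1 in Finance'));
console.log(parseRatingCount('4.9 • 512K Ratings'));
```

Returning null instead of throwing keeps one unexpected string from stopping the whole scrape.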
From this point, you can add more functions for each section of the page. You can add the Preview Image and Description, as well as user reviews, etc.
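For the preview image, one approach is to read the srcset attribute of a source element and keep the first URL it lists. The helper below is a sketch of just that string step, with a hypothetical name and example.com URLs as placeholders; it assumes the URLs themselves contain no commas.

```javascript
// Return the first URL in a srcset string such as
// "https://example.com/a.webp 300w, https://example.com/b.webp 600w".
function firstUrlFromSrcset(srcset) {
  if (!srcset) return null;
  // Candidates are comma-separated; each is "<url> <descriptor>".
  const firstCandidate = srcset.split(',')[0].trim();
  const url = firstCandidate.split(/\s+/)[0];
  return url || null;
}

// Usage with Cheerio:
// const imageUrl = firstUrlFromSrcset($('source').first().attr('srcset'));
```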
Combine Everything in One Function
Once the scraper is complete, we need to combine everything in one function and print the result:
function scrapeAppStore(html) {
  const $ = cheerio.load(html);
  return {
    primaryAppDetails: scrapePrimaryAppDetails($),
    appPreviewAndDescription: scrapeAppPreviewAndDescription($),
    ratingsAndReviews: { reviews: scrapeRatingsAndReviews($) },
    informationSection: scrapeInformationSection($),
    relatedAppsAndRecommendations: scrapeRelatedAppsAndRecommendations($),
  };
}
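If any one section scraper throws, the combined function above fails as a whole. A possible variation, sketched below with a hypothetical runSections helper, runs each section inside its own try/catch so the remaining sections still come back. This is one design option, not how the original script behaves.

```javascript
// Run each section scraper independently.
// `sections` maps an output key to a function that takes `$`.
function runSections($, sections) {
  const data = {};
  const errors = {};
  for (const [name, scrape] of Object.entries(sections)) {
    try {
      data[name] = scrape($);
    } catch (error) {
      // Record the failure and keep going with the other sections.
      data[name] = null;
      errors[name] = error.message;
    }
  }
  return { data, errors };
}

// Usage:
// const { data, errors } = runSections($, {
//   primaryAppDetails: scrapePrimaryAppDetails,
//   informationSection: scrapeInformationSection,
// });
```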
Complete Code to Scrape Apple App Store Data
import _ from 'lodash';
import { CrawlingAPI } from 'crawlbase';
import { toCamelCase } from 'casenator';
import * as cheerio from 'cheerio';

const CRAWLBASE_NORMAL_TOKEN = '<Normal requests token>';
const URL = 'https://apps.apple.com/us/app/google-authenticator/id388497605';

async function crawlAppStore() {
  const api = new CrawlingAPI({ token: CRAWLBASE_NORMAL_TOKEN });
  const options = {
    userAgent: 'Mozilla/5.0 (Windows NT 6.2; rv:20.0) Gecko/20121202 Firefox/30.0',
  };
  const response = await api.get(URL, options);
  if (response.statusCode !== 200) {
    throw new Error(`Request failed with status code: ${response.statusCode}`);
  }
  return response.body;
}

function scrapePrimaryAppDetails($) {
  let title = $('.app-header__title').text().trim();
  // Remove the badge text, if any, from the title.
  const titleBadge = $('.badge--product-title').text().trim();
  title = title.replace(titleBadge, '').trim();
  const subtitle = $('.app-header__subtitle').text().trim();
  const seller = $('.app-header__identity').text().trim();
  // Take the text after the standalone word "in"; a plain split('in') would
  // also cut category names that contain those letters, such as "Finance".
  const categoryText = $('.product-header__list__item a.inline-list__item').text().trim();
  const categoryMatch = /\bin\s+([\s\S]+)$/.exec(categoryText);
  const category = categoryMatch ? categoryMatch[1].trim() : null;
  const stars = $('.we-star-rating').attr('aria-label');
  // Optional chaining avoids a TypeError when the text has no "•" separator.
  const rating = $('.we-rating-count').text().split('•')[1]?.trim() ?? null;
  const price = $('.app-header__list__item--price').text().trim();
  return { title, subtitle, seller, category, stars, rating, price };
}

function scrapeAppPreviewAndDescription($) {
  const sources = $('source').toArray();
  const imageUrl =
    sources
      .map((element) => $(element).attr('srcset'))
      .filter((srcset) => srcset)
      .map((srcset) => srcset.split(',')[0].trim().split(' ')[0])
      .find((url) => url) || null;
  let appDescription = $('.section__description').text().trim();
  appDescription = appDescription.replace(/^Description\s*/, '');
  return { imageUrl, appDescription };
}

function scrapeRatingsAndReviews($) {
  const reviews = [];
  $('.we-customer-review').each((index, element) => {
    const stars = $(element).find('.we-star-rating').attr('aria-label');
    const reviewerName = $(element).find('.we-customer-review__user').text().trim();
    const reviewTitle = $(element).find('.we-customer-review__title').text().trim();
    const fullReviewText = $(element).find('.we-customer-review__body').text().trim();
    const reviewDate = $(element).find('.we-customer-review__date').attr('datetime');
    reviews.push({ stars, reviewerName, reviewTitle, fullReviewText, reviewDate });
  });
  return reviews;
}

function scrapeInformationSection($) {
  const information = {};
  $('dl.information-list dt').each((index, element) => {
    const key = $(element).text().trim();
    const value = $(element).next('dd').text().trim();
    if (key && value) {
      const camelKey = toCamelCase(key);
      if (camelKey === 'languages') {
        information[camelKey] = _.uniq(value.split(',').map((item) => item.trim())).sort();
      } else if (camelKey === 'compatibility') {
        information[camelKey] = _.uniq(
          value
            .split('\n')
            .map((item) => item.trim())
            .filter((item) => item),
        ).sort();
      } else {
        information[camelKey] = value;
      }
    }
  });
  return information;
}

function scrapeRelatedAppsAndRecommendations($) {
  function extractAppsFromSection(headlineText) {
    const results = [];
    $('h2.section__headline').each((index, element) => {
      const currentHeadlineText = $(element).text().trim();
      if (currentHeadlineText === headlineText) {
        const parent = $(element).parent();
        const nextSibling = parent.next();
        nextSibling.find('a.we-lockup--in-app-shelf').each((appIndex, appElement) => {
          const appTitle = $(appElement).find('.we-lockup__title').text().trim();
          const appUrl = $(appElement).attr('href');
          if (appTitle && appUrl) {
            results.push({
              title: appTitle,
              url: appUrl,
            });
          }
        });
      }
    });
    return results;
  }
  return {
    developerApps: extractAppsFromSection('More By This Developer'),
    relatedApps: extractAppsFromSection('You Might Also Like'),
  };
}

function scrapeAppStore(html) {
  const $ = cheerio.load(html);
  const data = {
    primaryAppDetails: {
      ...scrapePrimaryAppDetails($),
    },
    appPreviewAndDescription: {
      ...scrapeAppPreviewAndDescription($),
    },
    ratingsAndReviews: {
      reviews: scrapeRatingsAndReviews($),
    },
    informationSection: {
      ...scrapeInformationSection($),
    },
    relatedAppsAndRecommendations: {
      ...scrapeRelatedAppsAndRecommendations($),
    },
  };
  return data;
}

const html = await crawlAppStore();
const data = scrapeAppStore(html);
console.log(JSON.stringify(data, null, 2));
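And when you run your script (this assumes you have added a crawl entry under scripts in package.json that runs your .js file with Node.js; npm init -y does not create one):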
npm run crawl
You’ll see the output in this structure:
This organized structure provides a solid foundation for further analysis, reporting, or visualization, regardless of your end goal.
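As a small example of that further analysis, the sketch below averages the star values found in the scraped reviews. It assumes each review's stars field is an aria-label that starts with a number, such as "5 out of 5"; that label format and both helper names are assumptions for this example, so check them against your own output first.

```javascript
// Pull the first number out of a label like "5 out of 5"; null if none.
function parseStars(label) {
  const match = /(\d+(?:\.\d+)?)/.exec(label || '');
  return match ? Number(match[1]) : null;
}

// Average the star values of all reviews that have a parsable label.
function averageReviewStars(reviews) {
  const values = reviews
    .map((review) => parseStars(review.stars))
    .filter((value) => value !== null);
  if (values.length === 0) return null;
  return values.reduce((sum, value) => sum + value, 0) / values.length;
}

// Usage with the scraper output:
// console.log(averageReviewStars(data.ratingsAndReviews.reviews));
```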
Check out the complete code in our GitHub repository for this blog.
Scrape Apple Store Data with Crawlbase
Scraping the Apple App Store can provide valuable insights into how apps are presented, user responses, and the performance of competitors. With Crawlbase and a solid HTML parser like Cheerio, you can automate the extraction of Apple data and turn it into something actionable.
Whether you're tracking reviews, comparing prices, or just exploring the app ecosystem, this setup can save you time and effort while delivering the data you need.
Start your next scraping project now with Crawlbase’s Smart Proxy and Crawling API to avoid getting blocked!