Minexa.ai

Scraping app store listings at scale with the Minexa API

App store pages are structured consistently. Every listing has a title, a rating, a review count, a description, a category, a developer name, and often a price or in-app purchase flag. For app analytics platforms, that consistency is exactly what makes them worth scraping at scale.

The challenge is volume. Tracking thousands of apps across multiple store pages, refreshing data regularly, and keeping fields aligned across runs is not something you want to rebuild every time a page shifts. This is where the Minexa API fits in.

How the workflow starts: train once in the extension

Before you write a single API call, you train a scraper using the Minexa Chrome extension. Browse to an app listing detail page, let Minexa detect the structure automatically, confirm the fields, and save the scraper. That scraper gets a stable scraper_id you will reuse in every API call going forward.

This is the core developer pattern with Minexa: visual setup in the browser, programmatic execution via API.

Once trained, the scraper works on any app listing page that shares the same structure. You do not retrain for every app. One scraper, thousands of pages.

Making the API call

The endpoint for extracting data is https://api.minexa.ai/data. You send a POST request with your scraper_id, the list of URLs you want to process, and the columns you want back.

Here is a Python example for detail-mode extraction across a batch of app listing pages:

import requests

API_KEY = "your_api_key_here"

app_urls = [
    "https://appstorehub.com/app/4821",
    "https://appstorehub.com/app/4822",
    "https://appstorehub.com/app/4823"
]

payload = {
    "scraper_id": 6374,
    "urls": app_urls,
    "columns": [
        "app_name",
        "developer",
        "rating",
        "review_count",
        "category",
        "description",
        "price"
    ]
}

headers = {
    "Authorization": f"Bearer {API_KEY}",
    "Content-Type": "application/json"
}

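response = requests.post(
    "https://api.minexa.ai/data",
    json=payload,
    headers=headers,
    timeout=60,  # fail instead of hanging indefinitely on a stalled connection
)
response.raise_for_status()  # surface HTTP errors before parsing the body

print(response.json())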

If you are not sure which column names are available, you can use "columns": "top_40" instead of listing them manually. Minexa will return the top 40 ranked data points it found on the page.
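
If you switch between the two column modes often, a small helper keeps the request body in one place. This is a minimal sketch: build_payload is a hypothetical name, not part of any Minexa SDK, and it only assembles the three fields described above (scraper_id, urls, columns).

```python
from typing import Sequence, Union


def build_payload(
    scraper_id: int,
    urls: Sequence[str],
    columns: Union[str, Sequence[str]] = "top_40",
) -> dict:
    """Assemble a request body with either named columns or a shortcut string."""
    if not urls:
        raise ValueError("urls must contain at least one URL")
    if isinstance(columns, str):
        # A plain string such as "top_40" is passed through unchanged.
        selected = columns
    else:
        selected = list(columns)
        if not selected:
            raise ValueError("columns must not be an empty list")
    return {"scraper_id": scraper_id, "urls": list(urls), "columns": selected}


# Explicit column names, as in the example above.
explicit = build_payload(
    6374, ["https://appstorehub.com/app/4821"], ["app_name", "rating"]
)

# Discovery mode: let Minexa return its top-ranked data points.
discovery = build_payload(6374, ["https://appstorehub.com/app/4821"])
```

A reasonable pattern is to run discovery mode once on a single URL, pick the column names you need from the result, and use the explicit list for recurring runs.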

Handling paginated responses

When you submit a large batch of URLs, the API returns results in pages. Each response includes a next_token field. If it is present, there are more results to fetch. Here is a checkpoint-based loop to collect everything:

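import json

# Reuses requests, payload, and headers from the previous example.
all_results = []
next_token = None

while True:
    # Pass the token from the previous response to request the next page.
    if next_token:
        payload["next_token"] = next_token

    response = requests.post(
        "https://api.minexa.ai/data",
        json=payload,
        headers=headers,
        timeout=60,
    )
    response.raise_for_status()  # stop on HTTP errors instead of parsing an error body
    page = response.json()

    all_results.extend(page.get("data", []))
    next_token = page.get("next_token")

    # Persist everything collected so far after each page.
    with open("checkpoint.json", "w") as f:
        json.dump(all_results, f)

    if not next_token:
        break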

print(f"Total records collected: {len(all_results)}")

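Saving a checkpoint after each iteration means a network interruption does not cost you the records already collected: they stay on disk in checkpoint.json. The loop as written does not read the checkpoint back, so picking up mid-batch would also require storing the last next_token.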
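
If you want a run to pick up where it stopped, one option is to store the token alongside the records. The sketch below is an illustration, not Minexa's recommended approach. It assumes a next_token from an interrupted run is still accepted when you retry, which you should verify against the API docs. collect_all and fetch_page are hypothetical names: fetch_page stands for any function that takes a token (or None for the first page) and returns the parsed JSON response.

```python
import json
import os
from typing import Callable, Optional


def collect_all(
    fetch_page: Callable[[Optional[str]], dict],
    checkpoint_path: str = "checkpoint.json",
) -> list:
    """Fetch every page, saving records and the pending token after each one."""
    records, token = [], None

    # Resume from an earlier run if a checkpoint file is present.
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as f:
            state = json.load(f)
        if state["done"]:
            return state["records"]
        records, token = state["records"], state["next_token"]

    while True:
        page = fetch_page(token)
        records.extend(page.get("data", []))
        token = page.get("next_token")

        # Write to a temporary file, then swap it in, so a crash mid-write
        # cannot leave a truncated checkpoint behind.
        tmp_path = checkpoint_path + ".tmp"
        with open(tmp_path, "w") as f:
            json.dump(
                {"records": records, "next_token": token, "done": not token}, f
            )
        os.replace(tmp_path, checkpoint_path)

        if not token:
            return records
```

In practice fetch_page would wrap the requests.post call shown earlier, adding next_token to the payload when a token is given. Delete the checkpoint file before starting a fresh batch, otherwise the function returns the previous results.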

Credit consumption on app store pages

App store listing pages often load ratings, reviews, and media assets dynamically via JavaScript. Pages with heavy dynamic content or anti-bot protection may consume more than one credit per page. Plan your batch sizes accordingly.
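
To make that planning concrete, here is a rough budgeting sketch. The credits-per-page figure is an assumption you supply, for example from a small pilot batch. Minexa's actual pricing is not stated in this article, and both function names are hypothetical.

```python
import math


def estimate_credits(num_pages: int, credits_per_page: float) -> int:
    """Upper estimate of credits for a batch at an assumed average cost per page."""
    if num_pages < 0 or credits_per_page <= 0:
        raise ValueError("num_pages must be >= 0 and credits_per_page must be > 0")
    return math.ceil(num_pages * credits_per_page)


def max_batch_size(credit_budget: int, credits_per_page: float) -> int:
    """Largest number of pages that fits in a budget at the assumed cost."""
    if credit_budget < 0 or credits_per_page <= 0:
        raise ValueError("credit_budget must be >= 0 and credits_per_page must be > 0")
    return math.floor(credit_budget / credits_per_page)


# Hypothetical numbers: 5,000 listings at an assumed 2 credits per page,
# and a 3,000-credit allowance for a single run.
needed = estimate_credits(5000, 2)
batch = max_batch_size(3000, 2)
```

With these illustrative inputs, needed is 10000 and batch is 1500, so the full list would take several runs under that allowance.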

Why not just use an LLM for this?

App listing pages contain multiple similar numeric fields: aggregate rating, rating count, number of reviews per version, and sometimes a separate score for the current version. An LLM reading the raw HTML has to decide which number maps to which field. It does not always get this right, and it does not always signal when it is uncertain.

Minexa binds each column to a specific position in the DOM. The same field returns the same value every run, regardless of what else is on the page. If a value is missing, the output is null, not a guess.
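
Because missing values arrive as null, you can monitor them directly. The sketch below assumes each record is a dictionary keyed by column name, with JSON null decoded to Python None. The exact response shape is not shown in this article, so treat that as an assumption; missing_rates is a hypothetical helper.

```python
from collections import Counter
from typing import Dict, List, Sequence


def missing_rates(records: List[dict], columns: Sequence[str]) -> Dict[str, float]:
    """Share of records in which each column is absent or None."""
    missing = Counter()
    for record in records:
        for column in columns:
            if record.get(column) is None:
                missing[column] += 1
    total = len(records)
    return {c: (missing[c] / total if total else 0.0) for c in columns}


# Illustrative records, not real API output.
sample = [
    {"app_name": "Alpha", "rating": 4.5, "price": None},
    {"app_name": "Beta", "rating": None, "price": None},
]
rates = missing_rates(sample, ["app_name", "rating", "price"])
```

A sudden jump in the missing rate for one column across runs is a useful signal that the page layout changed and the scraper may need retraining.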

For an analytics platform ingesting data from thousands of apps on a recurring basis, that determinism matters more than flexibility.

Get started: Read the full API docs and train your first app listing scraper in the extension before writing any code.

Scheduling your runs

The API itself does not manage scheduling. If you need to refresh app data daily or weekly, set up a cron job on your end and pass the updated URL list to the API on each run. This gives you full control over timing, batching, and retry logic within your own infrastructure.
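
The batching and retry logic mentioned above might look like the following in a script that cron invokes. This is a sketch under stated assumptions: chunked and run_with_retry are hypothetical helpers, submit stands for whatever function posts one batch of URLs to the API, and the retry count and delays are arbitrary defaults.

```python
import time
from typing import Callable, Iterator, List, Sequence


def chunked(urls: Sequence[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of at most `size` URLs."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(urls), size):
        yield list(urls[start:start + size])


def run_with_retry(
    submit: Callable[[List[str]], object],
    batch: List[str],
    attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
):
    """Call submit(batch), retrying with exponential backoff on failure."""
    for attempt in range(1, attempts + 1):
        try:
            return submit(batch)
        except Exception:
            # A real job would likely catch a narrower error type here.
            if attempt == attempts:
                raise
            sleep(base_delay * 2 ** (attempt - 1))


# Example crontab entry (hypothetical paths) for a daily 03:00 run:
# 0 3 * * * /usr/bin/python3 /opt/jobs/refresh_apps.py
```

The script would then loop over chunked(all_urls, batch_size) and pass each batch through run_with_retry. Injecting sleep as a parameter keeps the backoff testable without real waiting.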

For smaller, fixed lists of app URLs, the Chrome extension's built-in scheduling is simpler to configure. For larger or dynamic URL sets, the cron-plus-API approach scales better.

What app analytics platforms actually get out of this

With a trained scraper and a few dozen lines of Python, an analytics platform can maintain a structured, refreshable dataset of app listings covering name, developer, category, rating, review volume, pricing model, and description text. That data feeds directly into trend analysis, competitive benchmarking, category ranking models, and review sentiment pipelines.

The extraction setup is done once. After that, running it on a new batch of URLs takes no more engineering effort than the first API call: you change the URL list and rerun the same script.

Explore the Minexa API docs to see the full request schema and response format before you build.
