If you have ever screenshotted a competitor's Shopify store to track their prices, this will save you a lot of clicking. Nearly every Shopify storefront exposes its entire catalog as JSON, on the store's own domain, with no key and no login:
GET https://gymshark.com/products.json?limit=250&page=1
That is not an internal API you are sneaking into. It is the documented storefront endpoint Shopify themes themselves consume, and it works on almost every store, custom domain included.
What comes back
One products array, each entry carrying more than most people expect:
{
  "id": 7523118678235,
  "title": "Blush Seamless Shorts",
  "handle": "blush-seamless-shorts",
  "vendor": "Gymshark",
  "product_type": "Shorts",
  "tags": ["new-arrivals", "seamless"],
  "published_at": "2026-06-28T09:00:00-04:00",
  "variants": [
    {
      "title": "XS",
      "sku": "GS-BSS-XS",
      "price": "40.00",
      "compare_at_price": null,
      "available": true,
      "grams": 180
    }
  ],
  "images": [{ "src": "https://cdn.shopify.com/..." }],
  "options": [{ "name": "Size", "values": ["XS", "S", "M"] }]
}
The fields worth noticing:
- variants[].available is live in-stock status per size and color. Diff it daily and you know what is selling out.
- variants[].compare_at_price reveals markdowns: when it is set and higher than price, the product is on sale.
- published_at / created_at tell you when a product launched. Poll the first page sorted as-is and new launches surface immediately.
- variants[].sku lets you join catalogs across stores when brands resell through multiple storefronts.
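As a rough illustration of how those fields can be read, here is a minimal Python sketch that turns one product entry into per-variant stock and markdown flags. The function name variant_signals and the output keys are my own choices for the example, not part of Shopify's response; it assumes prices arrive as decimal strings like "40.00", as in the sample above.

```python
from decimal import Decimal


def variant_signals(product):
    """Return one row per variant with in-stock and on-sale flags.

    `product` is a single entry from the "products" array.
    The row keys below are illustrative, not a Shopify schema.
    """
    rows = []
    for variant in product.get("variants", []):
        price = Decimal(variant["price"])
        raw_compare = variant.get("compare_at_price")
        # compare_at_price may be null (or empty) when there is no markdown.
        compare = Decimal(raw_compare) if raw_compare else None
        on_sale = compare is not None and compare > price
        if on_sale:
            pct = ((compare - price) / compare * 100).quantize(Decimal("0.1"))
            discount_pct = float(pct)
        else:
            discount_pct = 0.0
        rows.append({
            "handle": product.get("handle"),
            "sku": variant.get("sku"),
            "in_stock": bool(variant.get("available")),
            "on_sale": on_sale,
            "discount_pct": discount_pct,
        })
    return rows
```

Parsing prices with Decimal rather than float keeps comparisons exact, which matters once you start diffing prices between runs.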
The three gotchas
- Pagination is ?limit=250&page=N and stops when a page comes back short. No cursors, no tokens.
- Per-collection views live at /collections/{handle}/products.json, useful when you only care about one category.
- Some stores disable it. A minority return 404 or an empty array. Detect that on page 1 and move on; there is no bypass worth doing, and with millions of stores the next target is a request away.
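The pagination and the disabled-endpoint check can be sketched in a few lines of standard-library Python. This is a sketch under assumptions, not a hardened client: fetch_page, walk_catalog, the User-Agent string, and the max_pages safety cap are all names and values I made up for the example. A 404 is treated as "endpoint disabled"; any other HTTP error is re-raised.

```python
import json
import urllib.error
import urllib.request

PAGE_SIZE = 250  # the limit used in the request shown earlier


def fetch_page(base_url, page, limit=PAGE_SIZE):
    """Fetch one page of products. Returns None if the endpoint answers 404."""
    url = f"{base_url.rstrip('/')}/products.json?limit={limit}&page={page}"
    # Hypothetical User-Agent, chosen only for this sketch.
    request = urllib.request.Request(
        url, headers={"User-Agent": "catalog-monitor-sketch/0.1"}
    )
    try:
        with urllib.request.urlopen(request, timeout=30) as response:
            return json.load(response).get("products", [])
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return None  # store has the endpoint disabled
        raise


def walk_catalog(base_url, fetch=fetch_page, limit=PAGE_SIZE, max_pages=200):
    """Collect products page by page until a page comes back short or empty."""
    products = []
    for page in range(1, max_pages + 1):
        batch = fetch(base_url, page, limit)
        if not batch:  # None (disabled) or [] (nothing left)
            break
        products.extend(batch)
        if len(batch) < limit:  # short page: this was the last one
            break
    return products
```

Because the URL is built by appending /products.json to base_url, passing something like https://example-store.com/collections/shorts (a placeholder domain and handle) scopes the same walk to one collection. An empty result from walk_catalog on the first page is the signal to skip the store.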
tags is also messier than it looks: some themes emit an array, some a comma-joined string, and plenty of stores stuff machine metadata in there. Normalize before you filter on it.
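One way to normalize, sketched in Python: accept either shape, trim and lowercase each tag, drop blanks and duplicates, and optionally discard tags with prefixes you consider machine metadata. The function name normalize_tags and the drop_prefixes parameter are hypothetical, and which prefixes count as metadata varies by store, so none are assumed by default.

```python
def normalize_tags(raw, drop_prefixes=()):
    """Return a clean, de-duplicated list of lowercase tags.

    `raw` may be a list, a comma-joined string, or None.
    `drop_prefixes` is an optional tuple of prefixes to filter out.
    """
    if raw is None:
        return []
    if isinstance(raw, str):
        raw = raw.split(",")
    prefixes = tuple(drop_prefixes)
    seen = set()
    cleaned = []
    for tag in raw:
        value = str(tag).strip().lower()
        if not value or value in seen:
            continue
        if prefixes and value.startswith(prefixes):
            continue
        seen.add(value)
        cleaned.append(value)
    return cleaned
```

For example, normalize_tags("New-Arrivals, seamless") and normalize_tags(["new-arrivals", "seamless"]) give the same list, so downstream filters only have to handle one shape.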
Why this beats browser scraping
A 50-product catalog pull is one HTTP request and finishes in single-digit seconds. The same data through a rendered storefront is dozens of page loads through whatever bot protection the store's CDN runs. When the structured data is served willingly, rendering the human version of the page is pure overhead.
I packaged this as an Apify actor this week: Shopify Store Products Scraper takes store domains, walks the pagination, normalizes tags/prices/availability, and returns one row per product or per variant, with an option to scope to a collection. The first 2 rows of every run are free, and stores with the endpoint disabled cost nothing.
The scheduled-run pattern is where it earns its keep: point it at your competitors daily, diff against yesterday's dataset, and you have a price, stockout, and launch monitor for a few cents a day.
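The diff step itself is small. Here is a minimal Python sketch, assuming each snapshot is a list of per-variant rows with sku, price, and available keys (the row shape and the function names index_by_sku and diff_snapshots are assumptions for this example, not the actor's actual output format). Rows without a SKU are skipped because there is nothing stable to join on.

```python
from decimal import Decimal


def index_by_sku(rows):
    """Map sku -> row, ignoring rows that have no SKU."""
    return {row["sku"]: row for row in rows if row.get("sku")}


def diff_snapshots(yesterday, today):
    """Compare two per-variant snapshots and report what changed."""
    old = index_by_sku(yesterday)
    new = index_by_sku(today)

    launches = sorted(new.keys() - old.keys())
    price_changes = []
    stockouts = []
    for sku in sorted(new.keys() & old.keys()):
        old_price = Decimal(old[sku]["price"])
        new_price = Decimal(new[sku]["price"])
        if old_price != new_price:
            price_changes.append((sku, str(old_price), str(new_price)))
        # A stockout is a transition from available to unavailable.
        if old[sku]["available"] and not new[sku]["available"]:
            stockouts.append(sku)

    return {
        "launches": launches,
        "price_changes": price_changes,
        "stockouts": stockouts,
    }
```

Comparing prices as Decimal values means "40.0" and "40.00" are not reported as a change.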