▶️ 3-minute video walkthrough: input → run → dataset → API call.
TL;DR — Step-by-step walkthrough: paste a list of Shopify URLs, get back products + installed apps + reviews in JSON. No headless browser, ~$0.005 per store, runs in batch. We'll cover the input schema, three real use-cases (ICP qualification, cold outbound personalization, app market-share research), and the cost math. Live on Apify Store: Shopify Apps Spy + Product Scraper.
What you'll have at the end of this tutorial
A working pipeline that turns this:
```text
https://allbirds.com
https://gymshark.com
https://manukora.com
```
…into a CSV like this:
```csv
store_domain,product_title,price,available,email_app,reviews_app,subs_app,product_url
allbirds.com,Wool Runner,110,true,Klaviyo,Yotpo,,https://allbirds.com/products/...
gymshark.com,Vital Seamless Bra,40,true,Klaviyo,Stamped,ReCharge,https://gymshark.com/...
manukora.com,UMF 20+ Honey,109,true,Postscript,Judge.me,Skio,https://manukora.com/...
```
Total time end-to-end: about 3 minutes for 50 stores.
Step 1 — Sign up to Apify (free, no card needed)
Go to apify.com and create an account. You get $5 of free credit on signup, which covers about 1,500 store scans at the standard tier. No credit card required.
Step 2 — Open the actor
Navigate to Shopify Apps Spy + Product Scraper on Apify Store.
Click "Try for free" → the actor opens in your console with a default input ready to run.
Step 3 — Configure the input
The input has 9 fields. The 3 you care about for your first run:
```json
{
  "store_urls": [
    "https://allbirds.com",
    "https://gymshark.com"
  ],
  "extract_level": "standard",
  "max_products_per_store": 250
}
```
store_urls: list of Shopify store URLs. Works with any custom domain, *.myshopify.com URLs, or store homepages. Cap is 100 stores per single run.
extract_level: choose what to pull.
| Level | Outputs | Cost per store (avg) |
|---|---|---|
| basic | products only | $0.001 |
| standard | products + apps installed | $0.005 |
| full | standard + reviews from the detected reviews app | $0.30 |
| pro | full + revenue estimation (placeholder, J4) | TBD |
For 95% of use cases, standard is the sweet spot: you get the apps stack, which is the actual signal.
max_products_per_store: the maximum number of products scraped per store, which avoids runaway costs on 50,000-product mega-stores. Default: 250.
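A single run accepts at most 100 store_urls, so a longer list has to go in as several inputs. Here is a minimal sketch that assumes you start one run per input object. chunk and buildInputs are illustrative helper names, not part of the actor, and only the three fields above are set.

```javascript
// Illustrative helpers (not part of the actor): split a long URL list
// into per-run inputs, respecting the 100-stores-per-run cap.
const MAX_STORES_PER_RUN = 100;

// Split an array into consecutive slices of at most `size` items.
function chunk(items, size) {
  const out = [];
  for (let i = 0; i < items.length; i += size) {
    out.push(items.slice(i, i + size));
  }
  return out;
}

// Build one actor input per batch, reusing the same extraction settings.
function buildInputs(storeUrls, extractLevel = 'standard', maxProducts = 250) {
  return chunk(storeUrls, MAX_STORES_PER_RUN).map((batch) => ({
    store_urls: batch,
    extract_level: extractLevel,
    max_products_per_store: maxProducts,
  }));
}
```

For a 1,200-store prospect list, buildInputs returns 12 inputs of 100 URLs each.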
Step 4 — Run it
Click Save & Start → the actor boots, scrapes, and dumps the output to the default dataset (top right of your console).
With a 2-store input, the run finishes in about 5 seconds. The dataset view auto-refreshes, so you'll see products appear in real time.
Step 5 — Export the dataset
Top-right of the dataset view, click Export → choose CSV / JSON / Excel / RSS.
Or if you prefer the API:
```bash
curl "https://api.apify.com/v2/acts/kazkn~shopify-scraper-apps-spy/runs/last/dataset/items?format=csv&token=YOUR_TOKEN" \
  -o shopify-data.csv
```
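The same request also works from Node.js. Here is a minimal sketch, assuming Node 18+ (for the global fetch) and your API token in a variable. It asks for format=json instead of CSV. It also sends the token in an Authorization: Bearer header rather than the query string, which keeps the token out of URLs and logs. lastRunItemsUrl and fetchLastRunItems are illustrative names.

```javascript
// Illustrative helpers for pulling the last run's dataset as JSON.
// Assumes Node 18+ (global fetch); the token is passed in by the caller.
const ACTOR_ID = 'kazkn~shopify-scraper-apps-spy';

// Build the "last run -> default dataset -> items" URL used by the curl example.
function lastRunItemsUrl(actorId, format = 'json') {
  const url = new URL(
    `https://api.apify.com/v2/acts/${actorId}/runs/last/dataset/items`
  );
  url.searchParams.set('format', format);
  return url.toString();
}

// Download the items; throws on any non-2xx response.
async function fetchLastRunItems(token, actorId = ACTOR_ID) {
  const res = await fetch(lastRunItemsUrl(actorId), {
    headers: { Authorization: `Bearer ${token}` },
  });
  if (!res.ok) {
    throw new Error(`Apify API returned ${res.status}`);
  }
  return res.json(); // array of records, one per product
}
```

Inside an async function: const items = await fetchLastRunItems(process.env.APIFY_TOKEN);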
What's in the output
One record per product (or per variant if include_variants: true). Each record carries the store-level apps stack:
```json
{
  "store_domain": "allbirds.com",
  "store_meta": {
    "name": "Allbirds",
    "currency": "USD"
  },
  "product_title": "Wool Runner",
  "product_handle": "mens-wool-runners",
  "vendor": "Allbirds",
  "product_type": "Sneakers",
  "tags": ["bestseller", "wool"],
  "price": 110,
  "compare_at_price": 0,
  "available": true,
  "main_image": "https://cdn.shopify.com/...",
  "apps_detected": {
    "email": ["Klaviyo"],
    "reviews": ["Yotpo"],
    "subscriptions": [],
    "popups": ["Klaviyo Forms"],
    "search": ["Searchanise"],
    "loyalty": ["Smile.io"]
  },
  "product_url": "https://allbirds.com/products/mens-wool-runners",
  "scraped_at": "2026-05-02T14:30:00Z"
}
```
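To get from these records to the CSV at the top of this tutorial, pull the first detected app in each category into its own column. Here is a minimal sketch. COLUMNS, toRow and toCsv are illustrative names, and categories with several apps keep only the first one.

```javascript
// Illustrative: flatten output records into the CSV columns shown earlier.
const COLUMNS = [
  'store_domain', 'product_title', 'price', 'available',
  'email_app', 'reviews_app', 'subs_app', 'product_url',
];

// Quote a value if it contains a comma, quote, or newline (RFC 4180 style).
function csvCell(value) {
  const s = value === undefined || value === null ? '' : String(value);
  return /[",\n]/.test(s) ? `"${s.replace(/"/g, '""')}"` : s;
}

function toRow(record) {
  const apps = record.apps_detected || {};
  // First detected app in a category, or '' when none was detected.
  const first = (list) => (Array.isArray(list) && list.length ? list[0] : '');
  const row = {
    store_domain: record.store_domain,
    product_title: record.product_title,
    price: record.price,
    available: record.available,
    email_app: first(apps.email),
    reviews_app: first(apps.reviews),
    subs_app: first(apps.subscriptions),
    product_url: record.product_url,
  };
  return COLUMNS.map((c) => csvCell(row[c])).join(',');
}

function toCsv(records) {
  return [COLUMNS.join(','), ...records.map(toRow)].join('\n');
}
```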
If you set extract_level: "full", reviews come in a separate named dataset called reviews, with one row per review and a product_handle foreign key to join back.
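The join itself is a lookup on product_handle. Handles are only unique within one store, so this sketch assumes each review row also carries store_domain and keys on both. The rating field is likewise an assumed name. attachReviewStats is illustrative: it adds a review count and an average rating to each product row.

```javascript
// Illustrative join of review rows onto product rows.
// ASSUMPTION: review rows carry store_domain and a numeric `rating`
// alongside the documented product_handle foreign key.
const keyOf = (r) => `${r.store_domain}/${r.product_handle}`;

function attachReviewStats(products, reviews) {
  // Aggregate review count and rating sum per (store, handle).
  const stats = new Map();
  for (const rev of reviews) {
    const k = keyOf(rev);
    const s = stats.get(k) || { count: 0, ratingSum: 0 };
    s.count += 1;
    s.ratingSum += Number(rev.rating) || 0;
    stats.set(k, s);
  }
  // Return new product objects with review_count and avg_rating added.
  return products.map((p) => {
    const s = stats.get(keyOf(p));
    return {
      ...p,
      review_count: s ? s.count : 0,
      avg_rating: s ? s.ratingSum / s.count : null,
    };
  });
}
```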
Real use-case 1 — ICP qualification for B2B outbound
Hypothesis: "Shopify stores running Klaviyo plus a paid reviews app are a good ICP fit for our retention SaaS, because they already spend money on retention tooling."
Workflow:
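```javascript
// 1. Scrape your prospect list.
const input = {
  store_urls: prospectList, // ~1,200 stores in total; split into runs of at most 100 (per-run cap)
  extract_level: 'standard',
  max_products_per_store: 50,
};

// 2. After the runs finish, filter the exported dataset (one row per product).
//    tier1 holds one row per matching product, so a store can appear many times.
const tier1 = dataset.filter((r) =>
  r.apps_detected.email.includes('Klaviyo') &&
  (r.apps_detected.reviews.includes('Yotpo') ||
    r.apps_detected.reviews.includes('Judge.me Premium'))
);
```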
Cost: 1,200 stores × $0.005 = $6 total. Time: ~25 minutes.
For comparison, the cheapest SaaS that does this filter is $199/month with monthly export caps.
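Because the dataset has one row per product, the filtered result lists the same store once for every matching product. Here is a minimal sketch that applies the same tier-1 rule and collapses the result to one entry per store_domain. uniqueStores and tier1Domains are illustrative names.

```javascript
// Illustrative: collapse per-product rows to one row per store.
// Keeps the first row seen for each store_domain.
function uniqueStores(rows) {
  const byDomain = new Map();
  for (const r of rows) {
    if (!byDomain.has(r.store_domain)) byDomain.set(r.store_domain, r);
  }
  return [...byDomain.values()];
}

// Same tier-1 rule as the filter above, returning store domains only.
function tier1Domains(dataset) {
  const isTier1 = (r) =>
    r.apps_detected.email.includes('Klaviyo') &&
    (r.apps_detected.reviews.includes('Yotpo') ||
      r.apps_detected.reviews.includes('Judge.me Premium'));
  return uniqueStores(dataset.filter(isTier1)).map((r) => r.store_domain);
}
```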
Real use-case 2 — Cold outbound personalization
Open the email with the actual stack the prospect runs. In a real test on 200 accounts, this moved the reply rate from ~4% to ~11%.
Pre-call mail merge field:
```javascript
// Build a personalized opener from the detected apps stack.
const opener = (record) => {
  const reviewsApp = record.apps_detected.reviews[0]; // undefined if none detected
  const emailApp = record.apps_detected.email[0];

  if (reviewsApp === 'Judge.me' && emailApp === 'Klaviyo') {
    return `Saw you're on Judge.me Free + Klaviyo, same combo we saw at [REFERENCE_BRAND] before they...`;
  }
  if (reviewsApp === 'Yotpo' && emailApp === 'Klaviyo') {
    return `Noticed you're running Yotpo + Klaviyo; the data integration there is usually the bottleneck...`;
  }
  // ... 10-20 more conditions
  return `Quick question about how you're handling [GENERIC_PROBLEM]...`;
};
```
The conditional opener is what unlocks the reply rate. Generic openers stay around 3-5%.
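Once the list grows to 10-20 conditions, a lookup table keyed on the reviews/email app pair is easier to maintain than a chain of ifs. Here is a minimal sketch. OPENERS, FALLBACK and openerFor are illustrative names, and the two entries mirror the openers above.

```javascript
// Illustrative: table-driven version of the opener logic.
// Keys are "<reviews app>|<email app>"; add one entry per combination.
const OPENERS = new Map([
  ['Judge.me|Klaviyo',
    "Saw you're on Judge.me Free + Klaviyo, same combo we saw at [REFERENCE_BRAND] before they..."],
  ['Yotpo|Klaviyo',
    "Noticed you're running Yotpo + Klaviyo; the data integration there is usually the bottleneck..."],
]);

const FALLBACK = "Quick question about how you're handling [GENERIC_PROBLEM]...";

function openerFor(record) {
  // Use '' when a category has no detected app so the key stays well-formed.
  const reviewsApp = record.apps_detected.reviews[0] ?? '';
  const emailApp = record.apps_detected.email[0] ?? '';
  return OPENERS.get(`${reviewsApp}|${emailApp}`) ?? FALLBACK;
}
```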
Real use-case 3 — App market-share research
Scrape 5,000 stores in your vertical once a month and aggregate the apps_detected fields. You'll have a monthly market-share dataset for any app category.
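```javascript
// Aggregate email-app share over ~5,000 stores.
// The dataset has one row per product, so count each store only once.
const seenStores = new Set();
const emailShare = {};
for (const r of dataset) {
  if (seenStores.has(r.store_domain)) continue;
  seenStores.add(r.store_domain);
  for (const app of r.apps_detected.email) {
    emailShare[app] = (emailShare[app] || 0) + 1;
  }
}
// emailShare = { Klaviyo: 2400, Mailchimp: 1100, Omnisend: 800, ... }
```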
This is the kind of data BuiltWith charges $295/month for. With the actor plus a short aggregation loop, you get it for $25 per refresh.
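Raw counts are easier to compare across monthly refreshes when you express them as a share of stores scanned. Here is a minimal sketch; toShares is an illustrative name. A store can run several email apps or none, so the percentages need not sum to 100.

```javascript
// Illustrative: convert per-app store counts into percentages, largest first.
// totalStores is the number of distinct stores scanned in the refresh.
function toShares(counts, totalStores) {
  return Object.entries(counts)
    .map(([app, n]) => ({ app, stores: n, pct: (100 * n) / totalStores }))
    .sort((a, b) => b.stores - a.stores);
}
```

With the example counts above and 5,000 stores, Klaviyo comes out at 48%.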
Cost math
The pricing is pay-per-event: you pay for what the run produces, not for compute time.
- store_analyzed: $0.003 per store
- product_extracted: $0.0005 per product
- apps_detected: $0.001 per store (standard tier and above)
- review_extracted: $0.0003 per review
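To budget a batch before running it, you can plug the per-event prices into a small function. Here is a minimal sketch. The rates are copied from the list above, so confirm current values on the actor's pricing tab. estimateCost is an illustrative name, reviews are counted per store, and amounts are kept in micro-dollars so the sums stay exact. It assumes store_analyzed is charged at every level and apps_detected only at standard and above.

```javascript
// Illustrative cost estimator using the per-event prices listed above,
// expressed in micro-dollars (1 USD = 1,000,000) to keep sums exact.
const RATES_MICRO_USD = {
  store_analyzed: 3000,   // $0.003 per store
  product_extracted: 500, // $0.0005 per product
  apps_detected: 1000,    // $0.001 per store, standard and above (assumed)
  review_extracted: 300,  // $0.0003 per review
};

function estimateCost({ stores, productsPerStore, reviewsPerStore = 0, level = 'standard' }) {
  const withApps = level !== 'basic';
  const micro =
    stores * RATES_MICRO_USD.store_analyzed +
    stores * productsPerStore * RATES_MICRO_USD.product_extracted +
    (withApps ? stores * RATES_MICRO_USD.apps_detected : 0) +
    stores * reviewsPerStore * RATES_MICRO_USD.review_extracted;
  return micro / 1e6; // USD
}
```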
Examples:
| Run | Stores | Avg products/store | Avg reviews | Total |
|---|---|---|---|---|
| Small batch (Standard) | 50 | 100 | – | $0.45 |
| Medium ICP scan (Standard) | 1,000 | 100 | – | $9 |
| Full reviews pull (Full) | 100 | 500 | 50 | $30 |
| Monthly market research (Standard) | 5,000 | 100 | – | $45 |
The free $5 Apify credit covers ~1,500 stores at standard. You'd need to run several batches before paying anything.
Common gotchas
1. The store doesn't expose /products.json.
Rare, but some custom themes disable it. The actor logs a warning and falls back to scraping the sitemap.xml. Always check the run log for 404 warnings.
2. Detection misses one app.
About 70-80% of installed apps are detectable from the storefront HTML. Backend-only apps (accounting, inventory, shipping) don't load scripts — they're invisible to any scanner. If you spot a missing detector for a frontend-loading app, ping me on the actor's GitHub and I'll add it (~15-min job).
3. Rate-limit warnings on big batches.
At the default concurrency (5 simultaneous requests), you shouldn't hit any rate limits. If you raise max_concurrent_requests to 20 and start getting HTTP 429s, the actor backs off automatically with jitter.
4. Reviews on extract_level: "full" can blow the budget.
A 500-product store with 100 reviews each = 50,000 review rows = $15 alone. Use max_reviews_per_product: 20 to keep costs predictable.
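Here is the arithmetic behind that $15 and what the cap does to it. This minimal sketch uses the $0.0003 per-review price from the cost list. reviewCostUsd is an illustrative name, and it assumes every product has the same number of reviews.

```javascript
// Illustrative: review spend for one store, with an optional per-product cap
// (max_reviews_per_product). Price in micro-dollars: $0.0003 = 300.
const REVIEW_MICRO_USD = 300;

function reviewCostUsd(products, reviewsPerProduct, maxReviewsPerProduct = Infinity) {
  const perProduct = Math.min(reviewsPerProduct, maxReviewsPerProduct);
  return (products * perProduct * REVIEW_MICRO_USD) / 1e6;
}
```

For the 500-product store above, that is $15 uncapped and $3 with max_reviews_per_product: 20.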
FAQ
Is scraping /products.json allowed?
Shopify exposes /products.json publicly on every store by default. The actor never authenticates, never bypasses access controls, and respects rate limits. For commercial use of scraped data, consult a lawyer in your jurisdiction.
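Since the endpoint is public, you can check whether a particular store serves it before adding it to a batch. The first gotcha above covers stores that don't. Here is a minimal sketch, assuming Node 18+. exposesProductsJson is an illustrative name, and limit=1 keeps the response small. You can swap the fetch function for a stub when testing offline.

```javascript
// Illustrative pre-flight check: does a store serve /products.json?
// fetchFn defaults to the global fetch (Node 18+); pass a stub to test offline.
async function exposesProductsJson(storeUrl, fetchFn = fetch) {
  const url = new URL('/products.json?limit=1', storeUrl);
  try {
    const res = await fetchFn(url.toString());
    if (!res.ok) return false; // e.g. 404 when the endpoint is disabled
    const body = await res.json();
    return Array.isArray(body.products);
  } catch {
    return false; // network error or non-JSON response
  }
}
```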
Can I get one record per variant instead of per product?
Yes — set include_variants: true in the input and the dataset returns one row per SKU with size/color/price/availability normalized.
Does this work on *.myshopify.com URLs?
Yes. The actor canonicalizes URLs internally — https://yourstore.myshopify.com, https://yourstore.com, and https://www.yourstore.com all route to the same scrape.
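If you assemble the URL list from several sources, cleaning it first keeps obvious duplicates out of the input and under the 100-store cap. Here is a minimal sketch; canonicalHost and dedupeStoreUrls are illustrative names. It only lowercases the host and strips the scheme, a leading www. and any path. It cannot map a *.myshopify.com address to the store's custom domain, which requires a network lookup, so that case is left to the actor.

```javascript
// Illustrative input cleanup: reduce URLs to a lowercase host without "www.".
// Does NOT resolve *.myshopify.com addresses to custom domains.
function canonicalHost(input) {
  const withScheme = /^https?:\/\//i.test(input) ? input : `https://${input}`;
  return new URL(withScheme).hostname.toLowerCase().replace(/^www\./, '');
}

// Keep the first occurrence of each host, re-emitted as https://<host>.
function dedupeStoreUrls(urls) {
  const seen = new Set();
  const out = [];
  for (const u of urls) {
    const host = canonicalHost(u.trim());
    if (!seen.has(host)) {
      seen.add(host);
      out.push(`https://${host}`);
    }
  }
  return out;
}
```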
How do I integrate this into a Make.com or Zapier workflow?
Apify has a native Zapier integration: add Apify as an action step, choose "Run Actor", and paste the input JSON. Make.com works the same way via the Apify HTTP module.
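A generic HTTP step needs only one request: POST the input JSON to Apify's run-sync-get-dataset-items endpoint and read the dataset items from the response. Here is a minimal sketch, assuming Node 18+; buildSyncRunRequest and runActorSync are illustrative names. Synchronous runs suit small batches, because the API waits only a limited time (a few minutes) for the run to finish. For larger batches, start the run asynchronously and fetch the dataset afterwards.

```javascript
// Illustrative: start the actor and wait for its dataset items in one request.
// Assumes Node 18+ (global fetch); the token goes in the Authorization header.
function buildSyncRunRequest(input, token, actorId = 'kazkn~shopify-scraper-apps-spy') {
  return {
    url: `https://api.apify.com/v2/acts/${actorId}/run-sync-get-dataset-items`,
    init: {
      method: 'POST',
      headers: {
        'Content-Type': 'application/json',
        Authorization: `Bearer ${token}`,
      },
      body: JSON.stringify(input),
    },
  };
}

async function runActorSync(input, token) {
  const { url, init } = buildSyncRunRequest(input, token);
  const res = await fetch(url, init);
  if (!res.ok) throw new Error(`Apify API returned ${res.status}`);
  return res.json(); // dataset items, one per product
}
```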
Can I run this on a schedule?
Apify supports cron-style scheduling. Click Schedule in the actor view, set the cadence (e.g., every Monday 8am), and the actor runs automatically with the same input.
Wrap up
To recap the pipeline:
- Sign up on Apify and get $5 of free credit.
- Open the actor.
- Paste your store URLs and choose the standard extract level.
- Run it. Expect about 25 minutes for 1,000 stores.
- Export the dataset as CSV / JSON / RSS / Excel.
Total cost for 1,000 stores: about $9. The cheapest SaaS alternative I tested for the same volume was $199/month.
If a detector is missing, ping me — each is a 15-minute add. If you find a use-case I haven't documented, I'll add it here. The actor is on Apify Store: kazkn/shopify-scraper-apps-spy.
Was this useful? ❤️ a reaction or drop a comment with the use-case you're trying to solve — I read every reply and add detector + endpoint coverage based on what people actually need.
Follow @boo_n for the next tutorials in this series: scraping reviews at scale, building a Shopify ICP dataset for cold outreach, and turning the actor into an MCP tool for Claude / Cursor.
Tags: shopify, ecommerce, api, tutorial, javascript, webdev
