Boon

How to scrape any Shopify store's apps + product catalog in one API call (full tutorial)

Watch the 3-minute walkthrough

▶️ 3-minute video walkthrough: input → run → dataset → API call.

TL;DR — Step-by-step walkthrough: paste a list of Shopify URLs, get back products + installed apps + reviews in JSON. No headless browser, ~$0.005 per store, runs in batch. We'll cover the input schema, three real use-cases (ICP qualification, cold outbound personalization, app market-share research), and the cost math. Live on Apify Store: Shopify Apps Spy + Product Scraper.


What you'll have at the end of this tutorial

A working pipeline that turns this:

https://allbirds.com
https://gymshark.com
https://manukora.com

…into a CSV like this:

store_domain,product_title,price,available,email_app,reviews_app,subs_app,product_url
allbirds.com,Wool Runner,110,true,Klaviyo,Yotpo,,https://allbirds.com/products/...
gymshark.com,Vital Seamless Bra,40,true,Klaviyo,Stamped,ReCharge,https://gymshark.com/...
manukora.com,UMF 20+ Honey,109,true,Postscript,Judge.me,Skio,https://manukora.com/...

Total time end-to-end: about 3 minutes for 50 stores.


Step 1 — Sign up to Apify (free, no card needed)

Go to apify.com and create an account. You get $5 of free credit on signup, which covers about 1,500 store scans at the standard tier. No credit card required.


Step 2 — Open the actor

Navigate to Shopify Apps Spy + Product Scraper on Apify Store.

Click "Try for free" → the actor opens in your console with a default input ready to run.


Step 3 — Configure the input

The input has 9 fields. The 3 you care about for your first run:

{
  "store_urls": [
    "https://allbirds.com",
    "https://gymshark.com"
  ],
  "extract_level": "standard",
  "max_products_per_store": 250
}

store_urls: list of Shopify store URLs. Works with any custom domain, *.myshopify.com URLs, or store homepages. Cap is 100 stores per single run.

extract_level: choose what to pull.

Level | Outputs | Cost per store (avg)
basic | products only | $0.001
standard | products + apps installed | $0.005
full | + reviews from detected app | $0.30
pro | + revenue estimation (placeholder, J4) | TBD

For 95% of use-cases, standard is the sweet spot: you get the apps stack, which is the actual signal.

max_products_per_store: cap to avoid runaway costs on 50,000-product mega-stores. Default 250.
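
The constraints described above (at most 100 stores per run, four extract levels, a positive product cap) can be checked locally before a run is started. This is a minimal sketch: `validateInput` is a hypothetical helper, not part of the actor, and the actor may apply its own validation on top.

```javascript
// Hypothetical pre-flight check; not part of the actor itself.
const EXTRACT_LEVELS = ['basic', 'standard', 'full', 'pro'];
const MAX_STORES_PER_RUN = 100; // cap stated for store_urls

function validateInput(input) {
  const errors = [];
  const urls = input.store_urls;
  if (!Array.isArray(urls) || urls.length === 0) {
    errors.push('store_urls must be a non-empty array');
  } else if (urls.length > MAX_STORES_PER_RUN) {
    errors.push(`store_urls has ${urls.length} entries; the cap is ${MAX_STORES_PER_RUN} per run`);
  }
  if (!EXTRACT_LEVELS.includes(input.extract_level)) {
    errors.push(`extract_level must be one of: ${EXTRACT_LEVELS.join(', ')}`);
  }
  const maxProducts = input.max_products_per_store;
  if (maxProducts !== undefined && !(Number.isInteger(maxProducts) && maxProducts > 0)) {
    errors.push('max_products_per_store must be a positive integer');
  }
  return errors; // empty array means the input looks fine
}

const sampleInput = {
  store_urls: ['https://allbirds.com', 'https://gymshark.com'],
  extract_level: 'standard',
  max_products_per_store: 250,
};
console.log(validateInput(sampleInput)); // []
```

Returning a list of messages instead of throwing on the first problem makes it easier to fix a whole input file in one pass.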


Step 4 — Run it

Click Save & Start → the actor boots, scrapes, and dumps the output to the default dataset (top right of your console).

For a 2-store input, the run finishes in about 5 seconds. The dataset view auto-refreshes, so you'll see products appear in real time.
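
The same run can also be started without the console. The sketch below assumes Apify's v2 REST endpoint for running an actor (`POST /v2/acts/{actorId}/runs`, with the input as the JSON body), a Bearer token, and Node 18+ for the global `fetch`. `buildRunRequest` and `startRun` are hypothetical helper names, and the actor ID is the one used in this post's curl example.

```javascript
// Actor ID as it appears in the API URL used elsewhere in this post.
const ACTOR_ID = 'kazkn~shopify-scraper-apps-spy';

// Builds the HTTP request without sending it, so it can be inspected or tested.
function buildRunRequest(actorId, token, input) {
  return {
    url: `https://api.apify.com/v2/acts/${encodeURIComponent(actorId)}/runs`,
    init: {
      method: 'POST',
      headers: {
        'Content-Type': 'application/json',
        Authorization: `Bearer ${token}`,
      },
      body: JSON.stringify(input),
    },
  };
}

// Sends the request and returns the run object from the response.
async function startRun(token, input) {
  const { url, init } = buildRunRequest(ACTOR_ID, token, input);
  const res = await fetch(url, init);
  if (!res.ok) throw new Error(`Apify API returned HTTP ${res.status}`);
  const { data } = await res.json();
  return data; // includes the run id and defaultDatasetId
}

// Usage (assumes the token is stored in an APIFY_TOKEN environment variable):
// startRun(process.env.APIFY_TOKEN, {
//   store_urls: ['https://allbirds.com', 'https://gymshark.com'],
//   extract_level: 'standard',
//   max_products_per_store: 250,
// }).then((run) => console.log(run.id));
```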


Step 5 — Export the dataset

Top-right of the dataset view, click Export → choose CSV / JSON / Excel / RSS.

Or if you prefer the API:

curl "https://api.apify.com/v2/acts/kazkn~shopify-scraper-apps-spy/runs/last/dataset/items?format=csv&token=YOUR_TOKEN" \
  -o shopify-data.csv
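
For scripts, the curl call above translates to a few lines of JavaScript. This is a sketch assuming Node 18+ (global `fetch` and `URL`); `datasetItemsUrl` and `fetchItems` are hypothetical helper names, and the endpoint is the same one the curl example hits.

```javascript
// Builds the "last run" dataset URL used by the curl example.
function datasetItemsUrl(actorId, token, format = 'json') {
  const url = new URL(`https://api.apify.com/v2/acts/${actorId}/runs/last/dataset/items`);
  url.searchParams.set('format', format);
  url.searchParams.set('token', token);
  return url.toString();
}

// Downloads the items of the last run as parsed JSON (an array of records).
async function fetchItems(actorId, token) {
  const res = await fetch(datasetItemsUrl(actorId, token, 'json'));
  if (!res.ok) throw new Error(`Apify API returned HTTP ${res.status}`);
  return res.json();
}

// Usage (assumes the token is stored in an APIFY_TOKEN environment variable):
// fetchItems('kazkn~shopify-scraper-apps-spy', process.env.APIFY_TOKEN)
//   .then((items) => console.log(items.length));
```

Asking for `format=json` instead of `csv` keeps the nested `apps_detected` object intact, which is what the filtering examples in this post rely on.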

What's in the output

One record per product (or per variant if include_variants: true). Each record carries the store-level apps stack:

{
  "store_domain": "allbirds.com",
  "store_meta": {
    "name": "Allbirds",
    "currency": "USD"
  },
  "product_title": "Wool Runner",
  "product_handle": "mens-wool-runners",
  "vendor": "Allbirds",
  "product_type": "Sneakers",
  "tags": ["bestseller", "wool"],
  "price": 110,
  "compare_at_price": 0,
  "available": true,
  "main_image": "https://cdn.shopify.com/...",
  "apps_detected": {
    "email": ["Klaviyo"],
    "reviews": ["Yotpo"],
    "subscriptions": [],
    "popups": ["Klaviyo Forms"],
    "search": ["Searchanise"],
    "loyalty": ["Smile.io"]
  },
  "product_url": "https://allbirds.com/products/mens-wool-runners",
  "scraped_at": "2026-05-02T14:30:00Z"
}
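
The CSV shown at the top of this post uses flat columns (`email_app`, `reviews_app`, `subs_app`), while the JSON record nests apps under `apps_detected`. One way to get from one to the other is sketched below. It assumes the first detected app in each category is the one that belongs in the column; `toCsvRow` and `csvEscape` are hypothetical helpers, not the actor's own export logic.

```javascript
const COLUMNS = [
  'store_domain', 'product_title', 'price', 'available',
  'email_app', 'reviews_app', 'subs_app', 'product_url',
];

// Quotes a value only when it contains a comma, quote, or newline.
function csvEscape(value) {
  const s = value === undefined || value === null ? '' : String(value);
  return /[",\n]/.test(s) ? `"${s.replace(/"/g, '""')}"` : s;
}

// Flattens one product record into a CSV line matching COLUMNS.
function toCsvRow(record) {
  const apps = record.apps_detected || {};
  const first = (list) => (Array.isArray(list) && list.length > 0 ? list[0] : '');
  const flat = {
    store_domain: record.store_domain,
    product_title: record.product_title,
    price: record.price,
    available: record.available,
    email_app: first(apps.email),
    reviews_app: first(apps.reviews),
    subs_app: first(apps.subscriptions),
    product_url: record.product_url,
  };
  return COLUMNS.map((column) => csvEscape(flat[column])).join(',');
}

const sampleRecord = {
  store_domain: 'allbirds.com',
  product_title: 'Wool Runner',
  price: 110,
  available: true,
  apps_detected: { email: ['Klaviyo'], reviews: ['Yotpo'], subscriptions: [] },
  product_url: 'https://allbirds.com/products/mens-wool-runners',
};
console.log(toCsvRow(sampleRecord));
// allbirds.com,Wool Runner,110,true,Klaviyo,Yotpo,,https://allbirds.com/products/mens-wool-runners
```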

If you set extract_level: "full", reviews come in a separate named dataset called reviews, with one row per review and a product_handle foreign key to join back.
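
Joining the two datasets is a group-by on `product_handle`. A minimal sketch follows; `attachReviews` is a hypothetical helper, and the `rating` field on review rows is an assumption made for illustration, since only `product_handle` is documented here.

```javascript
// Groups review rows by product_handle and attaches them to each product.
function attachReviews(products, reviews) {
  const byHandle = new Map();
  for (const review of reviews) {
    const list = byHandle.get(review.product_handle) || [];
    list.push(review);
    byHandle.set(review.product_handle, list);
  }
  return products.map((product) => ({
    ...product,
    reviews: byHandle.get(product.product_handle) || [],
  }));
}

const products = [
  { product_handle: 'mens-wool-runners', product_title: 'Wool Runner' },
  { product_handle: 'mens-tree-dashers', product_title: 'Tree Dasher' },
];
const reviews = [
  { product_handle: 'mens-wool-runners', rating: 5 }, // rating is a hypothetical field
  { product_handle: 'mens-wool-runners', rating: 4 },
];
const joined = attachReviews(products, reviews);
console.log(joined.map((p) => p.reviews.length)); // [ 2, 0 ]
```

Product handles are only unique within a store, so when one run covers several stores it is safer to build the key from the store and the handle together, provided the review rows carry a store identifier.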


Real use-case 1 — ICP qualification for B2B outbound

Hypothesis: "Shopify stores running Klaviyo + a paid reviews app are good ICP for our retention SaaS — they spend money on retention tooling."

Workflow:

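// 1. Scrape your prospect list (~1,200 stores).
//    store_urls is capped at 100 per run, so split the list into batches.
const BATCH_SIZE = 100;
const inputs = [];
for (let i = 0; i < prospectList.length; i += BATCH_SIZE) {
  inputs.push({
    store_urls: prospectList.slice(i, i + BATCH_SIZE),
    extract_level: 'standard',
    max_products_per_store: 50,
  });
}

// 2. After the runs, filter the combined dataset (an array of records)
const tier1 = dataset.filter(r =>
  r.apps_detected.email.includes('Klaviyo') &&
  (r.apps_detected.reviews.includes('Yotpo') ||
   r.apps_detected.reviews.includes('Judge.me Premium'))
);

// 3. Records are per product, so collapse the matches to unique stores
const tier1Stores = [...new Set(tier1.map(r => r.store_domain))];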

Cost: 1,200 stores × $0.005 = $6 total. Time: ~25 minutes.

For comparison, the cheapest SaaS I tested that does this filter costs $199/month and has monthly export caps.


Real use-case 2 — Cold outbound personalization

Open the email with the actual stack the prospect runs. In a real test on 200 accounts, this moved the reply rate from ~4% to ~11%.

Pre-call mail merge field:

const opener = (record) => {
  // First detected app per category (undefined when none was detected).
  const reviewsApp = record.apps_detected.reviews[0];
  const emailApp = record.apps_detected.email[0];

  // Strings are joined with + so the merge field carries no stray
  // newlines or indentation from the source file.
  if (reviewsApp === 'Judge.me' && emailApp === 'Klaviyo') {
    return "Saw you're on Judge.me Free + Klaviyo, the same combo we " +
      'saw at [REFERENCE_BRAND] before they...';
  }
  if (reviewsApp === 'Yotpo' && emailApp === 'Klaviyo') {
    return "Noticed you're running Yotpo + Klaviyo. The data " +
      'integration there is usually the bottleneck...';
  }
  // ... 10-20 more conditions
  return "Quick question about how you're handling [GENERIC_PROBLEM]...";
};

The conditional opener is what unlocks the reply rate. Generic openers stay around 3-5%.
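
Once the list grows to 10-20 conditions, a lookup table can be easier to maintain than a chain of `if` statements. The sketch below is one possible layout, not a required one: `OPENERS`, `FALLBACK_OPENER`, and `pickOpener` are hypothetical names, and the copy is reused from the function above.

```javascript
// Key format: "<reviews app>|<email app>", using the first detected app of each.
const OPENERS = new Map([
  ['Judge.me|Klaviyo',
    "Saw you're on Judge.me Free + Klaviyo, the same combo we saw at [REFERENCE_BRAND] before they..."],
  ['Yotpo|Klaviyo',
    "Noticed you're running Yotpo + Klaviyo. The data integration there is usually the bottleneck..."],
]);

const FALLBACK_OPENER = "Quick question about how you're handling [GENERIC_PROBLEM]...";

function pickOpener(record) {
  const apps = record.apps_detected || {};
  const reviewsApp = (apps.reviews || [])[0];
  const emailApp = (apps.email || [])[0];
  // Unknown or missing combinations fall through to the generic opener.
  return OPENERS.get(`${reviewsApp}|${emailApp}`) || FALLBACK_OPENER;
}

console.log(pickOpener({ apps_detected: { reviews: ['Yotpo'], email: ['Klaviyo'] } }));
```

Adding a new combination then means adding one entry to the map, and stores with no detected apps still get the generic opener instead of an error.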


Real use-case 3 — App market-share research

Scrape 5,000 stores in your vertical once a month (in batches, since a single run is capped at 100 stores). Aggregate the apps_detected fields. You'll have a market-share dataset for any app category, refreshed monthly.

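// Aggregate email apps share over 5,000 stores.
// Records are per product, so count each store only once.
const emailShare = {};
const seenStores = new Set();
for (const r of dataset) {
  if (seenStores.has(r.store_domain)) continue;
  seenStores.add(r.store_domain);
  for (const app of r.apps_detected.email) {
    emailShare[app] = (emailShare[app] || 0) + 1;
  }
}
// emailShare = { Klaviyo: 2400, Mailchimp: 1100, Omnisend: 800, ... }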

This is the kind of data BuiltWith charges $295/month for. With the actor + a one-line aggregation, you have it for $25 per refresh.
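
Raw counts become a market-share table once they are divided by the number of stores scanned. A small sketch, using the example counts from the aggregation above; `toShare` is a hypothetical helper.

```javascript
// Converts { app: storeCount } into rows sorted by store count,
// with share expressed as a percentage rounded to one decimal.
function toShare(counts, totalStores) {
  return Object.entries(counts)
    .map(([app, stores]) => ({
      app,
      stores,
      sharePct: Math.round((stores * 1000) / totalStores) / 10,
    }))
    .sort((a, b) => b.stores - a.stores);
}

const shareRows = toShare({ Mailchimp: 1100, Klaviyo: 2400, Omnisend: 800 }, 5000);
for (const row of shareRows) {
  console.log(`${row.app}: ${row.sharePct}%`);
}
// Klaviyo: 48%
// Mailchimp: 22%
// Omnisend: 16%
```

A store can run more than one app in a category, so the percentages are shares of stores and may add up to more than 100%.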


Cost math

The pricing is pay-per-event — you pay only for the rows you get, not for compute time:

  • store_analyzed — $0.003 per store
  • product_extracted — $0.0005 per product
  • apps_detected — $0.001 per store at standard+
  • review_extracted — $0.0003 per review

Examples:

Run | Stores | Products avg | Reviews avg | Total
Small batch (Standard) | 50 | 100 | - | $0.45
Medium ICP scan (Standard) | 1,000 | 100 | - | $9
Full reviews pull (Full) | 100 | 500 | 50 | $30
Monthly market research (Standard) | 5,000 | 100 | - | $45

The free $5 Apify credit covers ~1,500 stores at standard. You'd need to run several batches before paying anything.


Common gotchas

1. The store doesn't expose /products.json.
Rare, but some custom themes disable it. The actor logs a warning and falls back to scraping the sitemap.xml. Always check the run log for 404 warnings.

2. Detection misses one app.
About 70-80% of installed apps are detectable from the storefront HTML. Backend-only apps (accounting, inventory, shipping) don't load scripts — they're invisible to any scanner. If you spot a missing detector for a frontend-loading app, ping me on the actor's GitHub and I'll add it (~15-min job).

3. Rate-limit warnings on big batches.
At the default concurrency (5 simultaneous requests), you should not hit any limits. If you raise max_concurrent_requests to 20 and hit 429s, the actor backs off automatically with jitter.

4. Reviews on extract_level: "full" blow the budget.
A 500-product store with 100 reviews each = 50,000 review rows = $15 alone. Use max_reviews_per_product: 20 to keep costs predictable.
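
The review arithmetic in gotcha 4 is easy to script before committing to a full run. The sketch below uses the `review_extracted` price from the list above and assumes `max_reviews_per_product` caps the billed rows per product; `reviewCost` is a hypothetical helper.

```javascript
const REVIEW_PRICE_USD = 0.0003; // review_extracted, per review

// Estimates billed review rows and their cost for one store.
function reviewCost(products, reviewsPerProduct, maxReviewsPerProduct = Infinity) {
  const rows = products * Math.min(reviewsPerProduct, maxReviewsPerProduct);
  const usd = Math.round(rows * REVIEW_PRICE_USD * 100) / 100; // rounded to cents
  return { rows, usd };
}

console.log(reviewCost(500, 100));     // { rows: 50000, usd: 15 }
console.log(reviewCost(500, 100, 20)); // { rows: 10000, usd: 3 }
```

With the cap at 20, the same 500-product store drops from 50,000 billed review rows to 10,000.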


FAQ

Is scraping /products.json allowed?

Shopify exposes /products.json publicly on every store by default. The actor never authenticates, never bypasses access controls, and respects rate limits. For commercial use of scraped data, consult a lawyer in your jurisdiction.

Can I get one record per variant instead of per product?

Yes — set include_variants: true in the input and the dataset returns one row per SKU with size/color/price/availability normalized.

Does this work on *.myshopify.com URLs?

Yes. The actor canonicalizes URLs internally — https://yourstore.myshopify.com, https://yourstore.com, and https://www.yourstore.com all route to the same scrape.
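
It can still be worth de-duplicating your list before submitting it, so the 100-store cap is not spent on repeats. The sketch below is not the actor's canonicalization logic: it only normalizes scheme, case, `www.`, and paths, and it cannot tell that a `*.myshopify.com` host and a custom domain belong to the same store. `normalizeStoreUrl` and `dedupeStores` are hypothetical helpers.

```javascript
// Reduces a store URL to "https://<host>" without www. or any path.
function normalizeStoreUrl(raw) {
  const trimmed = raw.trim();
  const withScheme = /^https?:\/\//i.test(trimmed) ? trimmed : `https://${trimmed}`;
  const host = new URL(withScheme).hostname.toLowerCase().replace(/^www\./, '');
  return `https://${host}`;
}

// Keeps the first occurrence of each normalized store URL.
function dedupeStores(urls) {
  return [...new Set(urls.map(normalizeStoreUrl))];
}

console.log(dedupeStores([
  'https://yourstore.com',
  'https://www.yourstore.com/',
  'YourStore.com/products/some-item',
  'https://yourstore.myshopify.com',
]));
// [ 'https://yourstore.com', 'https://yourstore.myshopify.com' ]
```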

How do I integrate this into a Make.com or Zapier workflow?

Apify has a native Zapier integration: search for "Apify" in Zapier, choose the "Run Actor" action, and paste the input JSON. Make.com works the same way via the Apify HTTP module.

Can I run this on a schedule?

Apify supports cron-style scheduling. Click Schedule in the actor view, set the cadence (e.g., every Monday 8am), and the actor runs automatically with the same input.


Wrap up

To recap the pipeline:

  1. Sign up on Apify, get $5 free credit.
  2. Open the actor.
  3. Paste your store URLs, choose standard extract level.
  4. Run it. Wait 25 minutes for 1,000 stores.
  5. Export the CSV / JSON / RSS / Excel.

Total cost for 1,000 stores: about $9. The cheapest SaaS alternative I tested for the same volume was $199/month.

If a detector is missing, ping me — each is a 15-minute add. If you find a use-case I haven't documented, I'll add it here. The actor is on Apify Store: kazkn/shopify-scraper-apps-spy.


Was this useful? ❤️ a reaction or drop a comment with the use-case you're trying to solve — I read every reply and add detector + endpoint coverage based on what people actually need.

Follow @boo_n for the next tutorials in this series: scraping reviews at scale, building a Shopify ICP dataset for cold outreach, and turning the actor into an MCP tool for Claude / Cursor.


Tags: shopify, ecommerce, api, tutorial, javascript, webdev
