A 6-week experiment in turning competitor research into a $0.005-per-store API.
▶️ 2-minute video walkthrough of the actor in action — input, run, dataset, API call.
There is a quiet rule in B2B sales nobody puts on a slide: the cost of qualifying a bad lead is roughly equal to the cost of closing a good one.
Six weeks ago I was trying to validate the ideal customer profile (ICP) for a B2B side-project targeting Shopify operators. The hypothesis was tight: "Stores running Klaviyo plus a paid reviews app spend money on retention tooling, so they will pay for ours."
To test it I needed two columns next to each store name: email provider and reviews provider. Maybe a third for subscriptions. From those three I could segment 1,200 stores into a tier-one list of about 200, and avoid wasting outreach on the rest.
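The segmentation itself is a one-line filter once those columns exist. A minimal sketch, assuming a hypothetical row shape (`store`, `emailProvider`, `reviewsProvider`, `subscriptionProvider`) that is not taken from any real export, and assuming that the presence of a reviews provider stands in for "paid reviews app", since the plan level is not part of these columns:

```javascript
// Hypothetical rows: one object per store, null when no provider was found.
const stores = [
  { store: 'a.example', emailProvider: 'klaviyo', reviewsProvider: 'yotpo', subscriptionProvider: null },
  { store: 'b.example', emailProvider: 'mailchimp', reviewsProvider: 'judge.me', subscriptionProvider: null },
  { store: 'c.example', emailProvider: 'klaviyo', reviewsProvider: null, subscriptionProvider: 'recharge' },
];

// Tier one = Klaviyo plus any reviews provider (the hypothesis under test).
// Assumption: free vs. paid plan is not visible here, so presence is the proxy.
function isTierOne(row) {
  return row.emailProvider === 'klaviyo' && row.reviewsProvider !== null;
}

const tierOne = stores.filter(isTierOne).map((row) => row.store);
console.log(tierOne); // [ 'a.example' ]
```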
The data exists. It is in the page source of every Shopify store. Apps inject <script> tags from their own CDN — cdn.judge.me, cdn.yotpo.com, loox.io/widget, klaviyo.com/onsite. Any store using Klaviyo loads a Klaviyo script. The information was right there.
But three minutes of View Source per store, times 1,200 stores, is 60 hours. I do not have 60 hours.
So I did the thing I had been avoiding for months: I wrote the scraper.
What I expected to find
I expected the scraping itself to be the hard part. It was not.
I expected proxies, retries, and rate-limit roulette. None of it materialized: /products.json is publicly served on nearly every Shopify store, and the homepage HTML is, well, a homepage. I hit no bot challenges and no CAPTCHAs. A polite concurrency limit of 5 simultaneous requests was enough to scan 1,000 stores in about 25 minutes without anyone noticing.
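The "polite concurrency limit" is the only moving part worth showing. Here is a rough sketch of a limiter, not the actor's actual code. The names `mapWithConcurrency` and `fetchCatalog` are made up for illustration, and the worker assumes a runtime with a global `fetch` (Node 18 or later):

```javascript
// Run `worker` over `items` with at most `limit` calls in flight at once.
async function mapWithConcurrency(items, limit, worker) {
  const results = new Array(items.length);
  let next = 0;

  // Each runner claims the next unprocessed index until the list is exhausted.
  async function runner() {
    while (next < items.length) {
      const index = next++;
      results[index] = await worker(items[index], index);
    }
  }

  const runners = Array.from({ length: Math.min(limit, items.length) }, runner);
  await Promise.all(runners);
  return results; // same order as `items`
}

// Hypothetical worker: hit one store's catalog endpoint and record the outcome
// instead of throwing, so a single dead store does not abort the batch.
async function fetchCatalog(storeUrl) {
  try {
    const res = await fetch(new URL('/products.json', storeUrl));
    return { storeUrl, ok: res.ok, status: res.status };
  } catch (err) {
    return { storeUrl, ok: false, error: String(err) };
  }
}
```

A batch would then be `await mapWithConcurrency(storeUrls, 5, fetchCatalog)`, with 5 being the limit mentioned above.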
What I did not expect was how much the app stack tells you about the operator.
A Shopify store on Judge.me Free is a different company than a Shopify store on Yotpo Premium. Same revenue band, same vertical, same product type — totally different stage, budget, and pain points.
- Judge.me Free → indie operator, doing under $30k/month, allergic to monthly subscriptions, will not buy your $99/month tool unless you frame it as ROI within 30 days.
- Yotpo Premium → seven-figure DTC brand, has a marketing team, will compare you against 4 competitors in a vendor matrix before signing, and will negotiate.
You can replicate this exercise across every app category:
- Klaviyo Free → still building list, every dollar matters.
- Klaviyo paid (>$100/mo) → mature email program, ready for sophisticated tooling.
- Postscript → SMS-first DTC, modern stack, probably trying everything.
- Mailchimp → legacy stack, conservative, harder to displace.
- ReCharge → subscription-driven economics, focus on retention.
- Smile.io → loyalty-conscious, willing to invest in retention tooling.
Six weeks ago I would have called this overthinking. Today I run my outbound off it. Reply rate moved from 4% to 11% on a 200-account test, simply by changing the opener line to acknowledge the actual stack the prospect was running.
The build, in three observations
Observation 1 — Most "Shopify scrapers" are not.
Every existing tool I tested fell into one of two camps. Either it scraped products only (no app detection), or it detected apps but only one lookup at a time through a Chrome extension. Nothing did both, in batch, at a low per-store cost.
The closest matches were paid SaaS dashboards (Storeleads, Charm, BuiltWith) at $99 to $499 per month, with monthly export caps. For a one-off list of 1,200 stores I could not justify a $1,200 yearly subscription that I would forget to use.
Observation 2 — App detection is a 10-line regex problem, not an ML problem.
Every Shopify app I needed to detect ships through one of three patterns: a <script src="cdn.[appname].com/..."> tag in the homepage HTML, a <meta name="generator"> tag with the app name, or an inline _q.push(...) queue call. Match on any of those three, OR them together, return a boolean.
The whole detection module — covering Klaviyo, Yotpo, Judge.me, Loox, Stamped, Reviews.io, ReCharge, Bold, Skio, Privy, Justuno, Mailchimp, Postscript, Attentive, Smile.io, Searchanise, Boost, and 8 others — is about 600 lines of JavaScript including snapshot tests. New detectors are 15-minute additions.
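To make the "match on any of three patterns" idea concrete, here is a minimal sketch. It is not the actor's detection module. The four script hosts are the ones named earlier in this post; `example-app` and its meta-generator and `_q.push(` patterns are hypothetical placeholders that only show how the three pattern types are OR-ed together, and the helper names are my own:

```javascript
// Build a regex matching a <script> tag whose src contains the given host/path.
function scriptFrom(hostPath) {
  const escaped = hostPath.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
  return new RegExp(`<script\\b[^>]*\\bsrc=["'][^"']*${escaped}`, 'i');
}

// One entry per app; an app counts as detected if ANY of its patterns matches.
const DETECTORS = [
  { app: 'klaviyo', category: 'email', patterns: [scriptFrom('klaviyo.com/onsite')] },
  { app: 'judge.me', category: 'reviews', patterns: [scriptFrom('cdn.judge.me')] },
  { app: 'yotpo', category: 'reviews', patterns: [scriptFrom('cdn.yotpo.com')] },
  { app: 'loox', category: 'reviews', patterns: [scriptFrom('loox.io/widget')] },
  {
    // Hypothetical app, included only to show the other two pattern types.
    app: 'example-app',
    category: 'popups',
    patterns: [
      scriptFrom('cdn.example-app.com'),
      /<meta\s+name=["']generator["'][^>]*content=["'][^"']*example-app/i,
      /\b_q\.push\(/,
    ],
  },
];

function detectApps(html) {
  return DETECTORS
    .filter((d) => d.patterns.some((re) => re.test(html)))
    .map((d) => ({ app: d.app, category: d.category }));
}

const sampleHtml =
  '<script async src="https://static.klaviyo.com/onsite/js/klaviyo.js"></script>' +
  '<script src="//cdn.judge.me/widget.js"></script>' +
  '<p>We also looked at cdn.yotpo.com.</p>';

console.log(detectApps(sampleHtml).map((a) => a.app)); // [ 'klaviyo', 'judge.me' ]
```

Anchoring the host match inside a `<script src=...>` tag is a design choice: a store that merely mentions a vendor's domain in body copy, like the Yotpo sentence in the sample, is not counted.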
Observation 3 — Pay-per-event pricing changes the unit economics.
Apify Store lets developers price by the row. A 500-product store with full app detection plus reviews costs about $0.30 to scan. At the standard $0.005 per store, a thousand-store batch costs about $5.
That number matters. At $5 per batch, refreshing my ICP list weekly is a coffee. At $300 per batch (which is what specialized SaaS would charge for the same volume), I would refresh quarterly and miss every interesting signal in between.
The cheaper the unit, the higher the refresh frequency. The higher the refresh frequency, the better the signal quality. This is true for every kind of competitive intelligence work, and it's the reason I shipped the actor as a public tool instead of keeping it private.
What surprised me
Three things I did not expect, in order of how badly I underestimated them:
1. /products.json is more honest than the storefront.
Shopify's catalog endpoint exposes products that are no longer linked from the theme but are still published: out-of-stock items, B2B-only SKUs, retired collections that nobody bothered to fully delete. For research, this is gold. You see what the merchant sells today and, often, what they sold last quarter.
2. Reviews-app detection turned out to be the strongest lead signal.
More predictive than email provider, more predictive than vertical, more predictive than location. A store paying for reviews is a store that has scaled past the early stage and is now optimizing for retention and social proof. That's where my offer lands.
3. People want this packaged as an MCP tool for Claude.
Two of the first three external users asked. I had not planned for it. The pattern is clear though — once you can pipe Shopify-store data into Claude or Cursor and ask "qualify these 200 stores for my ICP", you stop opening spreadsheets. I am building it next.
The actor, if you want to use it
I shipped the scraper on Apify Store as Shopify Apps Spy + Product Scraper.
What it does in one call:
- Pulls the full product catalog for a list of Shopify URLs (titles, prices, variants, images, vendor, tags).
- Detects installed apps across email/SMS, reviews, subscriptions, popups, search, loyalty.
- Pulls reviews when a reviews app is detected, by routing to that app's public reviews API.
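The catalog pull in the first bullet can be sketched roughly as below. This is an illustration and not the actor's source. It assumes the storefront's /products.json accepts `limit` and `page` query parameters (250 per page is the commonly used maximum) and returns an object with a `products` array. The function name and the injectable `fetchImpl` option are my own, added so the loop can be exercised without a network:

```javascript
// Page through a store's public catalog until a short or empty page comes back.
async function fetchAllProducts(storeUrl, { limit = 250, maxPages = 100, fetchImpl = fetch } = {}) {
  const products = [];
  for (let page = 1; page <= maxPages; page++) {
    const url = new URL('/products.json', storeUrl);
    url.searchParams.set('limit', String(limit));
    url.searchParams.set('page', String(page));

    const res = await fetchImpl(url);
    if (!res.ok) throw new Error(`HTTP ${res.status} for ${url}`);

    const body = await res.json();
    const batch = Array.isArray(body.products) ? body.products : [];
    if (batch.length === 0) break;   // nothing left
    products.push(...batch);
    if (batch.length < limit) break; // short page means this was the last one
  }
  return products;
}
```

The `maxPages` cap is a safety valve so a misbehaving store cannot keep the loop running forever.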
What it costs:
- $0.005 per store for the standard tier (products + apps).
- $0.30 for a 500-product store with full reviews.
- Apify gives a $5 free credit on signup, which covers about 1,000 stores at the standard tier.
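For budgeting, the standard-tier arithmetic is simple enough to write down. This sketch takes the listed $0.005 per store and the $5 signup credit at face value. It leaves out reviews pulls and any platform-side fees or discounts, and the constant and function names are mine:

```javascript
// Prices in thousandths of a dollar (mills) so the arithmetic stays in integers.
const PRICE_PER_STORE_MILLS = 5; // $0.005 per store, standard tier
const FREE_CREDIT_MILLS = 5000;  // $5 signup credit

function batchCostUsd(storeCount) {
  return (storeCount * PRICE_PER_STORE_MILLS) / 1000;
}

function storesCoveredByCredit() {
  return Math.floor(FREE_CREDIT_MILLS / PRICE_PER_STORE_MILLS);
}

console.log(batchCostUsd(1000));      // 5
console.log(batchCostUsd(1200));      // 6
console.log(storesCoveredByCredit()); // 1000
console.log(batchCostUsd(1000) * 52); // 260 (one year of weekly 1,000-store refreshes)
```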
What it doesn't do:
- Historical data. If you need "who started using Klaviyo in Q1 2024," you want BuiltWith.
- Cross-platform. Shopify only. WooCommerce/Magento are different problems.
- Filtering by revenue band. Storeleads does that better.
If you're an indie founder, agency analyst, or sales rep doing 100-2,000 stores per month and you need raw exports, it should fit. If your volume is much larger or much smaller, the SaaS competitors are probably the right call.
The takeaway, if you skim
Three things I would do differently if I were starting over:
- Build the qualification tool before you start prospecting, not after. The 60-hour manual baseline is what kills the experiment. Every B2B founder I have asked has the same story.
- Treat tech-stack data as ICP data, not technographic trivia. The app a store runs is downstream of their stage, budget, and team size. Use it that way.
- Refresh weekly, not quarterly. Cheap refresh frequency beats expensive depth nine times out of ten in early-stage outbound.
The scraper is on Apify Store, and the free $5 credit covers your first batch. If a detector is missing, ping me. Each one is a 15-minute add.
FAQ
How do I detect what apps a Shopify store is using?
Apps inject identifiable scripts into the storefront HTML — cdn.judge.me, cdn.yotpo.com, klaviyo.com/onsite, etc. Either inspect the page source manually (3 minutes per store) or use Shopify Apps Spy + Product Scraper to detect 150+ apps in batch at $0.005 per store.
Is scraping Shopify legal?
Yes, for publicly accessible product data. Shopify exposes /products.json on every storefront and the homepage HTML is public. No login, no API key, no proxy needed for most stores. You're reading what the merchant chose to publish.
How long does it take to scan 1,000 Shopify stores?
About 25 minutes at a polite concurrency of 5 simultaneous requests. The bottleneck is /products.json response size, not rate limits — Shopify storefronts handle this volume without complaint.
What's a realistic cost for B2B lead qualification across 1,200 Shopify stores?
Around $6 of compute on Apify's pay-per-event pricing: 1,200 stores at $0.005 each for product and app detection. Full reviews cost more, about $0.30 for a 500-product store. The $5 free Apify credit covers your first ~1,000 stores.
Which Shopify apps are the strongest signal of B2B SaaS fit?
Reviews providers (Yotpo Premium, Okendo, Stamped Pro) signal a seven-figure DTC brand. Klaviyo paid plans signal a mature email program. Postscript or Attentive signals an SMS-first, modern stack. Smile.io signals retention-conscious operators ready to invest in tooling.
Tags: shopify, b2b, lead-generation, ecommerce, indiehackers
