There are over 4.8 million active Shopify stores worldwide. If you're doing competitor research, building a price comparison tool, tracking product trends, or sourcing for dropshipping, Shopify stores are a goldmine of structured product data.
The best part? Shopify has a little-known public endpoint that makes scraping significantly easier than most ecommerce platforms.
The Shopify products.json Endpoint
Every Shopify store exposes a public JSON API at /products.json. No authentication needed:
https://store-name.myshopify.com/products.json
https://custom-domain.com/products.json
This returns up to 250 products per page with full details: titles, descriptions, prices, variants, images, inventory availability, and more.
Basic Shopify Scraper
Here's a straightforward Python scraper that pulls every product from a Shopify store:
import requests
import time
import csv
def scrape_shopify_store(domain: str) -> list[dict]:
    """Scrape every product variant from a Shopify store.

    Pages through the store's public /products.json endpoint (250
    products per page) and flattens each product into one dict per
    variant.

    Args:
        domain: Store domain without scheme, e.g. "allbirds.com".

    Returns:
        A list of flat dicts, one per product variant.
    """
    products: list[dict] = []
    page = 1
    while True:
        url = f"https://{domain}/products.json?limit=250&page={page}"
        response = requests.get(url, timeout=30, headers={
            "User-Agent": "Mozilla/5.0 (compatible; ProductResearch/1.0)"
        })
        # Shopify signals rate limiting with the non-standard 430 status;
        # back off and retry the same page.
        if response.status_code == 430:
            print(f"Rate limited on page {page}, waiting 30s...")
            time.sleep(30)
            continue
        if response.status_code != 200:
            print(f"Got status {response.status_code} on page {page}")
            break
        data = response.json()
        batch = data.get("products", [])
        if not batch:
            break  # empty page means we've walked past the last product
        for product in batch:
            # Bug fix: `product.get("images", [{}])[0]` crashed with
            # IndexError when a product had an *empty* images list (the
            # .get default only applies when the key is missing).
            images = product.get("images") or [{}]
            for variant in product.get("variants", []):
                products.append({
                    "title": product["title"],
                    "product_type": product.get("product_type", ""),
                    "vendor": product.get("vendor", ""),
                    "variant": variant.get("title", "Default"),
                    "price": variant.get("price"),
                    "compare_at_price": variant.get("compare_at_price"),
                    "sku": variant.get("sku"),
                    "available": variant.get("available", False),
                    "created_at": product.get("created_at"),
                    "updated_at": product.get("updated_at"),
                    "tags": ", ".join(product.get("tags", [])),
                    "image": images[0].get("src", ""),
                    "url": f"https://{domain}/products/{product['handle']}",
                })
        print(f"Page {page}: {len(batch)} products")
        page += 1
        time.sleep(1)  # polite per-page delay to stay under rate limits
    return products
# Scrape a store
# Example run: pull every variant from a live store (requires network
# access) and report the flattened variant count.
products = scrape_shopify_store("allbirds.com")
print(f"Total variants: {len(products)}")
Exporting to CSV
def export_to_csv(products: list[dict], filename: str = "products.csv"):
    """Export scraped products to a CSV file.

    Column order comes from the keys of the first product dict; all
    dicts are assumed to share the same keys (as produced by
    scrape_shopify_store).

    Args:
        products: Flat product-variant dicts.
        filename: Output path (default "products.csv").
    """
    if not products:
        return  # nothing to write; don't create an empty file
    with open(filename, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=products[0].keys())
        writer.writeheader()
        writer.writerows(products)
    # Bug fix: the message previously printed the literal "(unknown)"
    # instead of the actual output filename.
    print(f"Exported {len(products)} products to {filename}")
export_to_csv(products)
Scraping Multiple Stores for Competitive Analysis
For comparing competitors, batch scrape and save each store separately:
import json
from pathlib import Path
# Example Shopify storefronts to benchmark against each other.
COMPETITORS = [
    "allbirds.com",
    "gymshark.com",
    "fashionnova.com",
    "colourpop.com",
    "kyliecosmetics.com",
]
def scrape_competitors(domains: list[str], output_dir: str = "data"):
    """Scrape several Shopify stores, saving each store's variants as JSON.

    Each store is written to <output_dir>/<domain with dots underscored>.json.
    A failure on one store is logged and skipped so the rest of the batch
    still runs.
    """
    out_root = Path(output_dir)
    out_root.mkdir(exist_ok=True)
    for store in domains:
        print(f"\nScraping {store}...")
        try:
            variants = scrape_shopify_store(store)
            dest = out_root / f"{store.replace('.', '_')}.json"
            with open(dest, "w") as fh:
                json.dump(variants, fh, indent=2)
            print(f" Saved {len(variants)} variants to {dest}")
        except Exception as exc:
            print(f" Failed: {exc}")
        time.sleep(5)  # Be nice between stores
scrape_competitors(COMPETITORS)
Detecting if a Site Runs Shopify
Not sure if a website is on Shopify? Check programmatically:
def is_shopify(domain: str) -> bool:
    """Detect whether a website runs on Shopify.

    Primary check: the public /products.json endpoint returns JSON with
    a "products" key. Fallback: look for Shopify-specific response
    headers on the homepage.

    Args:
        domain: Domain without scheme, e.g. "allbirds.com".

    Returns:
        True if the site appears to run on Shopify, else False.
    """
    try:
        resp = requests.get(
            f"https://{domain}/products.json?limit=1",
            timeout=10,
            allow_redirects=True,
        )
        if resp.status_code == 200:
            data = resp.json()
            return "products" in data
    except Exception:
        pass  # endpoint blocked or non-JSON body; fall through to headers
    # Fallback: check for Shopify headers
    try:
        # Bug fix: requests.head() defaults to allow_redirects=False, so a
        # bare-domain -> www (or http -> https) redirect previously returned
        # the redirect's headers and missed the Shopify markers. Follow
        # redirects here, consistent with the GET above.
        resp = requests.head(f"https://{domain}", timeout=10, allow_redirects=True)
        return "shopify" in resp.headers.get("x-shopify-stage", "").lower() \
            or "shopify" in resp.headers.get("server", "").lower()
    except Exception:
        return False
# Test some domains
# Spot-check detection against known-Shopify and non-Shopify storefronts
# (requires network access).
for domain in ["allbirds.com", "nike.com", "gymshark.com"]:
    print(f"{domain}: {'Shopify' if is_shopify(domain) else 'Not Shopify'}")
Handling Common Challenges
Rate Limiting (HTTP 430)
Shopify returns status 430 when you hit their rate limit. Solutions:
- Add delays — 1-2 seconds between pages, 5+ seconds between stores
- Use ScraperAPI — rotates IPs automatically so you spread requests across many addresses
- Use ThorData residential proxies — a pool of real residential IPs makes your requests look like normal browser traffic
def scrape_with_proxy(domain: str, proxy_url: str) -> list[dict]:
    """Scrape with rotating proxy to avoid rate limits.

    Pages through /products.json via the given proxy. Note this returns
    the raw Shopify product dicts (not flattened per variant).

    Args:
        domain: Store domain without scheme.
        proxy_url: Proxy endpoint used for both http and https traffic.
    """
    proxies = {
        "http": proxy_url,
        "https": proxy_url,
    }
    results: list[dict] = []
    page = 1
    while True:
        endpoint = f"https://{domain}/products.json?limit=250&page={page}"
        resp = requests.get(endpoint, proxies=proxies, timeout=30)
        if resp.status_code != 200:
            break
        page_products = resp.json().get("products", [])
        if not page_products:
            break  # ran past the last page
        results.extend(page_products)
        page += 1
        time.sleep(1)
    return results
Stores That Block products.json
Some merchants disable the public endpoint. In that case:
- Scrape the collection pages' HTML instead (/collections/all)
- Use ScraperAPI with render=true to execute JavaScript
- Check out pre-built Shopify scrapers on Apify that handle edge cases like custom themes and pagination
Getting Collection and Category Data
def scrape_collections(domain: str) -> list[dict]:
    """Get all product collections/categories.

    Fetches the store's public /collections.json endpoint and returns a
    summary dict per collection; returns [] on any non-200 response.
    """
    resp = requests.get(f"https://{domain}/collections.json", timeout=30)
    if resp.status_code != 200:
        return []
    summaries = []
    for coll in resp.json().get("collections", []):
        summaries.append({
            "title": coll["title"],
            "handle": coll["handle"],
            "products_count": coll.get("products_count"),
            "url": f"https://{domain}/collections/{coll['handle']}",
        })
    return summaries
Building a Price Monitor
The real power is in running this daily to track price changes:
import sqlite3
from datetime import datetime
def init_db(db_path: str = "prices.db"):
    """Open (or create) the SQLite price-history database.

    Ensures the price_history table exists and returns the open
    connection.
    """
    schema = """
    CREATE TABLE IF NOT EXISTS price_history (
    domain TEXT,
    sku TEXT,
    title TEXT,
    price REAL,
    compare_at_price REAL,
    available BOOLEAN,
    scraped_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
    )
    """
    connection = sqlite3.connect(db_path)
    connection.execute(schema)
    return connection
def record_prices(conn, domain: str, products: list[dict]):
    """Insert one price_history row per scraped variant and commit.

    Args:
        conn: An open sqlite3 connection with the price_history table.
        domain: Store domain the products came from.
        products: Flat dicts as produced by scrape_shopify_store().
    """
    rows = [
        (
            domain,
            p["sku"],
            p["title"],
            # Bug fix: float(None) raised TypeError when a variant had no
            # price; store NULL instead, mirroring the compare_at_price
            # handling below.
            float(p["price"]) if p["price"] is not None else None,
            float(p["compare_at_price"]) if p["compare_at_price"] else None,
            p["available"],
        )
        for p in products
    ]
    # executemany batches the inserts in one call instead of a Python-level
    # loop of single-row execute()s.
    conn.executemany(
        "INSERT INTO price_history (domain, sku, title, price, compare_at_price, available) VALUES (?, ?, ?, ?, ?, ?)",
        rows,
    )
    conn.commit()
# Run daily via cron
# One monitoring pass: open/create the database, scrape the competitor's
# current catalog, and append a timestamped price snapshot per variant.
conn = init_db()
products = scrape_shopify_store("competitor.com")
record_prices(conn, "competitor.com", products)
Set this up as a cron job and you have a full competitor price tracker running for free.
Legal Notes
The products.json endpoint serves publicly available data that Shopify intentionally exposes. That said:
- Don't hammer stores with high-frequency requests
- Don't scrape and republish product descriptions (copyright)
- Don't use scraped data for fake reviews or misleading comparisons
- Respect rate limits — the 430 status exists for a reason
Conclusion
Shopify's public JSON API makes it one of the easiest ecommerce platforms to scrape. For stores that block the endpoint, tools like ScraperAPI and ThorData proxies handle the heavy lifting. And if you'd rather skip the code, Apify has cloud-based Shopify scrapers ready to go.
Happy scraping — let me know in the comments what you're building!
Top comments (0)