DEV Community

Scrapfly for Scrapfly

Posted on • Originally published at scrapfly.io on

How to Scrape StockX e-commerce Data with Python

How to Scrape StockX e-commerce Data with Python

How to Scrape StockX e-commerce Data with Python

StockX is an online marketplace for buying and selling authentic sneakers, streetwear, watches, and designer handbags. The most interesting part about StockX is that it treats apparel items as a commodity and tracks their value over time. This makes StockX a prime target for web scraping as tracking market movement and product data is a great way to build a data-driven business.

In this web scraping tutorial, we'll be taking a look at how to scrape StockX using Python. We'll be scraping StockX's hidden web data which is an incredibly easy way to scrape e-commerce websites with just a few lines of code.

We'll start with a quick Python environment setup and tool overview, then scrape some products and product search pages. Let's dive in!

<!--kg-card-end: markdown--><!--kg-card-begin: markdown-->

Why scrape StockX?

Just like most e-commerce targets StockX public data provides an important overview of the market. So, scraping StockX is a great way to get a competitive advantage through data-driven decision-making.

Additionally, since StockX treats its products as commodities by scraping the data we can perform various data analysis tasks to keep on top with market trends outbidding and outmaneuvering our competitors.

<!--kg-card-end: markdown--><!--kg-card-begin: markdown-->

Dataset Preview

Since we'll be using hidden web data scraping we'll be extracting the whole product datasets which contain fields like:

  • Product data like descriptions, images, sizes, ids, traits and everything visible on the page.
  • Product market performance like sale numbers, prices, asks and bids. <!--kg-card-end: markdown--><!--kg-card-begin: markdown-->#Example Product Dataset
{
    "id": "7cfe0c22-7e77-4e54-89ca-c03007ecbfd1",
    "listingType": "STANDARD",
    "deleted": false,
    "merchandising": {
        "title": "StockX Verified Sneakers",
        "subtitle": "We Verify Every Item. Every Time.",
        "image": {
            "alt": null,
            "url": "https: //images-cs.stockx.com/v3/assets/blt818b0c67cf450811/bltc3258254704231c0/62a8faa88f6a4950536d049f/Merchandising_Modules_EN_-_Image_02.jpg"
        },
        "body": "",
        "trackingEvent": "06-08-22 Verified Authentic Sneakers",
        "link": {
            "title": "StockX Verified Sneakers",
            "url": "https://stockx.com/about/verification/",
            "urlType": "EXTERNAL"
        }
    },
    "productCategory": "sneakers",
    "urlKey": "nike-air-max-90-se-running-club",
    "market": {
        "bidAskData": {
            "lowestAsk": 114,
            "numberOfAsks": 130,
            "highestBid": 128,
            "numberOfBids": 79
        },
        "statistics": {
            "lastSale": {
                "amount": 204,
                "changePercentage": 0.211539,
                "changeValue": 36,
                "sameFees": false
            }
        },
        "salesInformation": {
            "lastSale": 204,
            "salesLast72Hours": 3
        }
    },
    "variants": [
        {
            "id": "1f07d59f-988e-48ac-8c44-24efa4543118",
            "market": {
                "bidAskData": {
                    "lowestAsk": 237,
                    "numberOfAsks": 1,
                    "highestBid": 70,
                    "numberOfBids": 2
                },
                "statistics": {
                    "lastSale": {
                        "amount": 278,
                        "changePercentage": 0.456539,
                        "changeValue": 88,
                        "sameFees": false
                    }
                },
                "salesInformation": {
                    "lastSale": 278,
                    "salesLast72Hours": 0
                }
            },
            "hidden": false,
            "traits": {
                "size": "6"
            },
            "gtins": [
                {
                    "type": "UPC",
                    "identifier": "195242087411"
                }
            ],
            "sizeChart": {
                "baseSize": "6",
                "baseType": "us m",
                "displayOptions": [
                    {
                        "size": "US M 6",
                        "type": "us m"
                    },
                    {
                        "size": "UK 5.5",
                        "type": "uk"
                    },
                    {
                        "size": "JP 24 (US M 6)",
                        "type": "jp"
                    },
                    {
                        "size": "KR 240 (US M 6)",
                        "type": "kr"
                    },
                    {
                        "size": "EU 38.5",
                        "type": "eu"
                    },
                    {
                        "size": "US W 7.5",
                        "type": "us w"
                    }
                ]
            },
            "group": null
        },
        {
            "id": "80701fe2-0488-4c57-9d38-9ac15988b87b",
            "market": {
                "bidAskData": {
                    "lowestAsk": 173,
                    "numberOfAsks": 4,
                    "highestBid": 57,
                    "numberOfBids": 2
                },
                "statistics": {
                    "lastSale": {
                        "amount": 189,
                        "changePercentage": 0.35,
                        "changeValue": 49,
                        "sameFees": false
                    }
                },
                "salesInformation": {
                    "lastSale": 189,
                    "salesLast72Hours": 0
                }
            },
            "hidden": false,
            "traits": {
                "size": "6.5"
            },
            "gtins": [
                {
                    "type": "UPC",
                    "identifier": "195242087428"
                }
            ],
            "sizeChart": {
                "baseSize": "6.5",
                "baseType": "us m",
                "displayOptions": [
                    {
                        "size": "US M 6.5",
                        "type": "us m"
                    },
                    {
                        "size": "UK 6 (EU 39)",
                        "type": "uk"
                    },
                    {
                        "size": "JP 24.5",
                        "type": "jp"
                    },
                    {
                        "size": "KR 245",
                        "type": "kr"
                    },
                    {
                        "size": "EU 39",
                        "type": "eu"
                    },
                    {
                        "size": "US W 8",
                        "type": "us w"
                    }
                ]
            },
            "group": null
        },
        {
            "id": "72f405a3-2788-470d-9258-bad6cd56ad7e",
            "market": {
                "bidAskData": {
                    "lowestAsk": 164,
                    "numberOfAsks": 3,
                    "highestBid": 20,
                    "numberOfBids": 1
                },
                "statistics": {
                    "lastSale": {
                        "amount": 204,
                        "changePercentage": 0.505296,
                        "changeValue": 69,
                        "sameFees": false
                    }
                },
                "salesInformation": {
                    "lastSale": 204,
                    "salesLast72Hours": 0
                }
            },
            "hidden": false,
            "traits": {
                "size": "7"
            },
            "gtins": [
                {
                    "type": "UPC",
                    "identifier": "195242087435"
                }
            ],
            "sizeChart": {
                "baseSize": "7",
                "baseType": "us m",
                "displayOptions": [
                    {
                        "size": "US M 7",
                        "type": "us m"
                    },
                    {
                        "size": "UK 6 (EU 40)",
                        "type": "uk"
                    },
                    {
                        "size": "JP 25",
                        "type": "jp"
                    },
                    {
                        "size": "KR 250",
                        "type": "kr"
                    },
                    {
                        "size": "EU 40",
                        "type": "eu"
                    },
                    {
                        "size": "US W 8.5",
                        "type": "us w"
                    }
                ]
            },
            "group": null
        },
        {
            "id": "7fa2dc2e-de81-4c3b-95c2-14fd020c5377",
            "market": {
                "bidAskData": {
                    "lowestAsk": 126,
                    "numberOfAsks": 8,
                    "highestBid": 40,
                    "numberOfBids": 3
                },
                "statistics": {
                    "lastSale": {
                        "amount": 120,
                        "changePercentage": 0.00175,
                        "changeValue": 1,
                        "sameFees": true
                    }
                },
                "salesInformation": {
                    "lastSale": 120,
                    "salesLast72Hours": 0
                }
            },
            "hidden": false,
            "traits": {
                "size": "7.5"
            },
            "gtins": [
                {
                    "type": "UPC",
                    "identifier": "195242087442"
                }
            ],
            "sizeChart": {
                "baseSize": "7.5",
                "baseType": "us m",
                "displayOptions": [
                    {
                        "size": "US M 7.5",
                        "type": "us m"
                    },
                    {
                        "size": "UK 6.5",
                        "type": "uk"
                    },
                    {
                        "size": "JP 25.5",
                        "type": "jp"
                    },
                    {
                        "size": "KR 255",
                        "type": "kr"
                    },
                    {
                        "size": "EU 40.5",
                        "type": "eu"
                    },
                    {
                        "size": "US W 9",
                        "type": "us w"
                    }
                ]
            },
            "group": null
        },
        {
            "id": "bed7428d-31f9-46b3-9a5b-18e5e4fbe6d1",
            "market": {
                "bidAskData": {
                    "lowestAsk": 117,
                    "numberOfAsks": 10,
                    "highestBid": 63,
                    "numberOfBids": 10
                },
                "statistics": {
                    "lastSale": {
                        "amount": 168,
                        "changePercentage": 0.084122,
                        "changeValue": 14,
                        "sameFees": false
                    }
                },
                "salesInformation": {
                    "lastSale": 168,
                    "salesLast72Hours": 0
                }
            },
            "hidden": false,
            "traits": {
                "size": "8"
            },
            "gtins": [
                {
                    "type": "UPC",
                    "identifier": "195242087459"
                }
            ],
            "sizeChart": {
                "baseSize": "8",
                "baseType": "us m",
                "displayOptions": [
                    {
                        "size": "US M 8",
                        "type": "us m"
                    },
                    {
                        "size": "UK 7",
                        "type": "uk"
                    },
                    {
                        "size": "JP 26",
                        "type": "jp"
                    },
                    {
                        "size": "KR 260",
                        "type": "kr"
                    },
                    {
                        "size": "EU 41",
                        "type": "eu"
                    },
                    {
                        "size": "US W 9.5",
                        "type": "us w"
                    }
                ]
            },
            "group": null
        },
        {
            "id": "674e5c93-a8f6-41f7-a9a0-beabb7788c5a",
            "market": {
                "bidAskData": {
                    "lowestAsk": 120,
                    "numberOfAsks": 6,
                    "highestBid": 79,
                    "numberOfBids": 12
                },
                "statistics": {
                    "lastSale": {
                        "amount": 118,
                        "changePercentage": -0.077334,
                        "changeValue": -9,
                        "sameFees": false
                    }
                },
                "salesInformation": {
                    "lastSale": 118,
                    "salesLast72Hours": 0
                }
            },
            "hidden": false,
            "traits": {
                "size": "8.5"
            },
            "gtins": [
                {
                    "type": "UPC",
                    "identifier": "195242087466"
                },
                {
                    "type": "EAN-13",
                    "identifier": "2460002040899"
                }
            ],
            "sizeChart": {
                "baseSize": "8.5",
                "baseType": "us m",
                "displayOptions": [
                    {
                        "size": "US M 8.5",
                        "type": "us m"
                    },
                    {
                        "size": "UK 7.5",
                        "type": "uk"
                    },
                    {
                        "size": "JP 26.5",
                        "type": "jp"
                    },
                    {
                        "size": "KR 265",
                        "type": "kr"
                    },
                    {
                        "size": "EU 42",
                        "type": "eu"
                    },
                    {
                        "size": "US W 10",
                        "type": "us w"
                    }
                ]
            },
            "group": null
        },
        {
            "id": "fe862238-749b-47ff-8913-b8f299beb9c4",
            "market": {
                "bidAskData": {
                    "lowestAsk": 114,
                    "numberOfAsks": 15,
                    "highestBid": 79,
                    "numberOfBids": 4
                },
                "statistics": {
                    "lastSale": {
                        "amount": 160,
                        "changePercentage": 0.873659,
                        "changeValue": 75,
                        "sameFees": false
                    }
                },
                "salesInformation": {
                    "lastSale": 160,
                    "salesLast72Hours": 0
                }
            },
            "hidden": false,
            "traits": {
                "size": "9"
            },
            "gtins": [
                {
                    "type": "UPC",
                    "identifier": "195242087473"
                }
            ],
            "sizeChart": {
                "baseSize": "9",
                "baseType": "us m",
                "displayOptions": [
                    {
                        "size": "US M 9",
                        "type": "us m"
                    },
                    {
                        "size": "UK 8",
                        "type": "uk"
                    },
                    {
                        "size": "JP 27",
                        "type": "jp"
                    },
                    {
                        "size": "KR 270",
                        "type": "kr"
                    },
                    {
                        "size": "EU 42.5",
                        "type": "eu"
                    },
                    {
                        "size": "US W 10.5",
                        "type": "us w"
                    }
                ]
            },
            "group": null
        },
        {
            "id": "9f02d4df-bd2f-4c60-a79b-df087d597bb4",
            "market": {
                "bidAskData": {
                    "lowestAsk": 119,
                    "numberOfAsks": 13,
                    "highestBid": 78,
                    "numberOfBids": 6
                },
                "statistics": {
                    "lastSale": {
                        "amount": 168,
                        "changePercentage": 0.430113,
                        "changeValue": 51,
                        "sameFees": false
                    }
                },
                "salesInformation": {
                    "lastSale": 168,
                    "salesLast72Hours": 0
                }
            },
            "hidden": false,
            "traits": {
                "size": "9.5"
            },
            "gtins": [
                {
                    "type": "UPC",
                    "identifier": "195242087480"
                }
            ],
            "sizeChart": {
                "baseSize": "9.5",
                "baseType": "us m",
                "displayOptions": [
                    {
                        "size": "US M 9.5",
                        "type": "us m"
                    },
                    {
                        "size": "UK 8.5",
                        "type": "uk"
                    },
                    {
                        "size": "JP 27.5",
                        "type": "jp"
                    },
                    {
                        "size": "KR 275",
                        "type": "kr"
                    },
                    {
                        "size": "EU 43",
                        "type": "eu"
                    },
                    {
                        "size": "US W 11",
                        "type": "us w"
                    }
                ]
            },
            "group": null
        },
        {
            "id": "79baa849-a2b7-487c-8468-6fa3314028ca",
            "market": {
                "bidAskData": {
                    "lowestAsk": 117,
                    "numberOfAsks": 11,
                    "highestBid": 92,
                    "numberOfBids": 6
                },
                "statistics": {
                    "lastSale": {
                        "amount": 156,
                        "changePercentage": -0.035684,
                        "changeValue": -5,
                        "sameFees": false
                    }
                },
                "salesInformation": {
                    "lastSale": 156,
                    "salesLast72Hours": 1
                }
            },
            "hidden": false,
            "traits": {
                "size": "10"
            },
            "gtins": [
                {
                    "type": "UPC",
                    "identifier": "195242087497"
                },
                {
                    "type": "EAN-13",
                    "identifier": "2460002040929"
                }
            ],
            "sizeChart": {
                "baseSize": "10",
                "baseType": "us m",
                "displayOptions": [
                    {
                        "size": "US M 10",
                        "type": "us m"
                    },
                    {
                        "size": "UK 9",
                        "type": "uk"
                    },
                    {
                        "size": "JP 28",
                        "type": "jp"
                    },
                    {
                        "size": "KR 280",
                        "type": "kr"
                    },
                    {
                        "size": "EU 44",
                        "type": "eu"
                    },
                    {
                        "size": "US W 11.5",
                        "type": "us w"
                    }
                ]
            },
            "group": null
        },
        {
            "id": "dbdbfc90-07fa-4574-be5c-8a53502fedd5",
            "market": {
                "bidAskData": {
                    "lowestAsk": 125,
                    "numberOfAsks": 12,
                    "highestBid": 45,
                    "numberOfBids": 3
                },
                "statistics": {
                    "lastSale": {
                        "amount": 124,
                        "changePercentage": 0.24,
                        "changeValue": 24,
                        "sameFees": true
                    }
                },
                "salesInformation": {
                    "lastSale": 124,
                    "salesLast72Hours": 0
                }
            },
            "hidden": false,
            "traits": {
                "size": "10.5"
            },
            "gtins": [
                {
                    "type": "UPC",
                    "identifier": "195242087503"
                }
            ],
            "sizeChart": {
                "baseSize": "10.5",
                "baseType": "us m",
                "displayOptions": [
                    {
                        "size": "US M 10.5",
                        "type": "us m"
                    },
                    {
                        "size": "UK 9.5",
                        "type": "uk"
                    },
                    {
                        "size": "JP 28.5",
                        "type": "jp"
                    },
                    {
                        "size": "KR 285",
                        "type": "kr"
                    },
                    {
                        "size": "EU 44.5",
                        "type": "eu"
                    },
                    {
                        "size": "US W 12",
                        "type": "us w"
                    }
                ]
            },
            "group": null
        },
        {
            "id": "6f9daafc-3309-42a0-8cdb-4ffb0a60a8ba",
            "market": {
                "bidAskData": {
                    "lowestAsk": 149,
                    "numberOfAsks": 7,
                    "highestBid": 90,
                    "numberOfBids": 5
                },
                "statistics": {
                    "lastSale": {
                        "amount": 218,
                        "changePercentage": 0.439591,
                        "changeValue": 67,
                        "sameFees": false
                    }
                },
                "salesInformation": {
                    "lastSale": 218,
                    "salesLast72Hours": 0
                }
            },
            "hidden": false,
            "traits": {
                "size": "11"
            },
            "gtins": [
                {
                    "type": "UPC",
                    "identifier": "195242087510"
                }
            ],
            "sizeChart": {
                "baseSize": "11",
                "baseType": "us m",
                "displayOptions": [
                    {
                        "size": "US M 11",
                        "type": "us m"
                    },
                    {
                        "size": "UK 10",
                        "type": "uk"
                    },
                    {
                        "size": "JP 29",
                        "type": "jp"
                    },
                    {
                        "size": "KR 290",
                        "type": "kr"
                    },
                    {
                        "size": "EU 45",
                        "type": "eu"
                    },
                    {
                        "size": "US W 12.5",
                        "type": "us w"
                    }
                ]
            },
            "group": null
        },
        {
            "id": "40f04b53-43aa-4dd2-983b-64f1e2f14642",
            "market": {
                "bidAskData": {
                    "lowestAsk": 146,
                    "numberOfAsks": 9,
                    "highestBid": 99,
                    "numberOfBids": 5
                },
                "statistics": {
                    "lastSale": {
                        "amount": 163,
                        "changePercentage": 0,
                        "changeValue": 0,
                        "sameFees": true
                    }
                },
                "salesInformation": {
                    "lastSale": 163,
                    "salesLast72Hours": 0
                }
            },
            "hidden": false,
            "traits": {
                "size": "11.5"
            },
            "gtins": [
                {
                    "type": "EAN-13",
                    "identifier": "2460002040950"
                },
                {
                    "type": "UPC",
                    "identifier": "195242087527"
                }
            ],
            "sizeChart": {
                "baseSize": "11.5",
                "baseType": "us m",
                "displayOptions": [
                    {
                        "size": "US M 11.5",
                        "type": "us m"
                    },
                    {
                        "size": "UK 10.5",
                        "type": "uk"
                    },
                    {
                        "size": "JP 29.5",
                        "type": "jp"
                    },
                    {
                        "size": "KR 295",
                        "type": "kr"
                    },
                    {
                        "size": "EU 45.5",
                        "type": "eu"
                    },
                    {
                        "size": "US W 13",
                        "type": "us w"
                    }
                ]
            },
            "group": null
        },
        {
            "id": "c9a34e69-96c5-4880-aa14-93d2d7cd1b11",
            "market": {
                "bidAskData": {
                    "lowestAsk": 169,
                    "numberOfAsks": 12,
                    "highestBid": 51,
                    "numberOfBids": 3
                },
                "statistics": {
                    "lastSale": {
                        "amount": 190,
                        "changePercentage": 0.292037,
                        "changeValue": 43,
                        "sameFees": false
                    }
                },
                "salesInformation": {
                    "lastSale": 190,
                    "salesLast72Hours": 0
                }
            },
            "hidden": false,
            "traits": {
                "size": "12"
            },
            "gtins": [
                {
                    "type": "UPC",
                    "identifier": "195242087534"
                },
                {
                    "type": "EAN-13",
                    "identifier": "2000216738184"
                }
            ],
            "sizeChart": {
                "baseSize": "12",
                "baseType": "us m",
                "displayOptions": [
                    {
                        "size": "US M 12",
                        "type": "us m"
                    },
                    {
                        "size": "UK 11",
                        "type": "uk"
                    },
                    {
                        "size": "JP 30",
                        "type": "jp"
                    },
                    {
                        "size": "KR 300",
                        "type": "kr"
                    },
                    {
                        "size": "EU 46",
                        "type": "eu"
                    },
                    {
                        "size": "US W 13.5",
                        "type": "us w"
                    }
                ]
            },
            "group": null
        },
        {
            "id": "2a080f4a-8d04-434a-9c20-46f015256bfa",
            "market": {
                "bidAskData": {
                    "lowestAsk": 214,
                    "numberOfAsks": 4,
                    "highestBid": 128,
                    "numberOfBids": 5
                },
                "statistics": {
                    "lastSale": {
                        "amount": 187,
                        "changePercentage": 0.680219,
                        "changeValue": 76,
                        "sameFees": false
                    }
                },
                "salesInformation": {
                    "lastSale": 187,
                    "salesLast72Hours": 0
                }
            },
            "hidden": false,
            "traits": {
                "size": "12.5"
            },
            "gtins": [
                {
                    "type": "UPC",
                    "identifier": "195242087541"
                }
            ],
            "sizeChart": {
                "baseSize": "12.5",
                "baseType": "us m",
                "displayOptions": [
                    {
                        "size": "US M 12.5",
                        "type": "us m"
                    },
                    {
                        "size": "UK 11.5",
                        "type": "uk"
                    },
                    {
                        "size": "JP 30.5",
                        "type": "jp"
                    },
                    {
                        "size": "KR 305",
                        "type": "kr"
                    },
                    {
                        "size": "EU 47",
                        "type": "eu"
                    },
                    {
                        "size": "US W 14",
                        "type": "us w"
                    }
                ]
            },
            "group": null
        },
        {
            "id": "6286188c-47ff-4ac5-bad6-898ea136fdeb",
            "market": {
                "bidAskData": {
                    "lowestAsk": 164,
                    "numberOfAsks": 10,
                    "highestBid": 95,
                    "numberOfBids": 6
                },
                "statistics": {
                    "lastSale": {
                        "amount": 160,
                        "changePercentage": 0.006289,
                        "changeValue": 1,
                        "sameFees": true
                    }
                },
                "salesInformation": {
                    "lastSale": 160,
                    "salesLast72Hours": 0
                }
            },
            "hidden": false,
            "traits": {
                "size": "13"
            },
            "gtins": [
                {
                    "type": "UPC",
                    "identifier": "195242087558"
                }
            ],
            "sizeChart": {
                "baseSize": "13",
                "baseType": "us m",
                "displayOptions": [
                    {
                        "size": "US M 13",
                        "type": "us m"
                    },
                    {
                        "size": "UK 12",
                        "type": "uk"
                    },
                    {
                        "size": "JP 31",

Enter fullscreen mode Exit fullscreen mode

For parsing these json datasets to something smaller see our JMESPath and JSONPath tool introductions.

<!--kg-card-end: markdown--><!--kg-card-begin: markdown-->

Setup

In this web scraping tutorial, we'll be using Python with three popular libraries:

  • httpx - HTTP client library which we'll use to retrieve StockX's web pages.
  • parsel - HTML parsing library which we'll use to find <script> elements containing hidden web data.
  • nested_lookup - Allows to extract nested values from a dictionary by key. Since StockX's datasets are huge and nested this library will help us find the product data quickly and reliably.

These packages can be easily installed via the pip install command:

$ pip install httpx parsel nested_lookup

Enter fullscreen mode Exit fullscreen mode

Alternatively, feel free to swap httpx out with any other HTTP client package such as requests as we'll only need basic HTTP functions which are almost interchangeable in every library. As for, parsel, another great alternative is the beautifulsoup package.

Next, let's start by taking a look at how to scrape StockX's single product data.

<!--kg-card-end: markdown--><!--kg-card-begin: markdown-->

Scraping Product Data

To scrape single product data we'll be using hidden web data technique.

StockX is powered by React and Next.js technologies so we'll be looking for hidden data in the <script> elements. In particular, hidden web data is usually available in one of these two places:

<script id=" __NEXT_DATA__" type="application/json">{...}</script>
<!-- or -->
<script data-name="query">window. __REACT_QUERY_STATE__ = {...};</script>

Enter fullscreen mode Exit fullscreen mode

To parse this HTML for these hidden datasets we can use XPath or CSS Selectors:

import json
from parsel import Selector

def parse_hidden_Data(html: str) -> dict:
    """extract nextjs cache from page"""
    selector = Selector(html)
    data = selector.css("script# __NEXT_DATA__ ::text").get()
    if not data:
        data = selector.css("script[data-name=query]::text").get()
        data = data.split("=", 1)[-1].strip().strip(';')
    data = json.loads(data)
    return data

Enter fullscreen mode Exit fullscreen mode

Here we're building a parsel.Selector and looking up <script> elements based on CSS selectors.

Next, let's add HTTP capabilities to complete our product scraper and let's take it for a spin:

Python

Scrapfly

import asyncio
import json
import httpx

from nested_lookup import nested_lookup
from parsel import Selector

# create HTTPX client with headers that resemble a web browser
client = httpx.AsyncClient(
    http2=True,
    follow_redirects=True,
    headers={
        "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.114 Safari/537.36",
        "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9",
        "Accept-Encoding": "gzip, deflate, br",
        "Accept-Language": "en-US,en;q=0.9",
        "Cache-Control": "no-cache",
    },
)

def parse_nextjs(html: str) -> dict:
    """extract nextjs cache from page"""
    selector = Selector(html)
    data = selector.css("script# __NEXT_DATA__ ::text").get()
    if not data:
        data = selector.css("script[data-name=query]::text").get()
        data = data.split("=", 1)[-1].strip().strip(";")
    data = json.loads(data)
    return data

async def scrape_product(url: str) -> dict:
    """scrape a single stockx product page for product data"""
    response = await client.get(url)
    assert response.status_code == 200
    data = parse_nextjs(response.text)
    # extract all products datasets from page cache
    products = nested_lookup("product", data)
    # find the current product dataset
    try:
        product = next(p for p in products if p.get("urlKey") in str(response.url))
    except StopIteration:
        raise ValueError("Could not find product dataset in page cache", response)
    return product

# example use:
url = "https://stockx.com/nike-air-max-90-se-running-club"
print(asyncio.run(scrape_product(url)))


import asyncio
import json

from nested_lookup import nested_lookup
from scrapfly import ScrapeApiResponse, ScrapeConfig, ScrapflyClient

scrapfly = ScrapflyClient(key="YOUR SCRAPFLY API KEY", max_concurrency=10)

def parse_nextjs(result: ScrapeApiResponse) -> dict:
    """extract nextjs cache from page"""
    data = result.selector.css("script# __NEXT_DATA__ ::text").get()
    if not data:
        data = result.selector.css("script[data-name=query]::text").get()
        data = data.split("=", 1)[-1].strip().strip(";")
    data = json.loads(data)
    return data

async def scrape_product(url: str) -> dict:
    """scrape a single stockx product page for product data"""
    result = await scrapfly.async_scrape(
        ScrapeConfig(
            url=url,
            country="US",
            asp=True,
        )
    )
    data = parse_nextjs(result)
    # extract all products datasets from page cache
    products = nested_lookup("product", data)
    # find the current product dataset
    try:
        product = next(p for p in products if p.get("urlKey") in result.context["url"])
    except StopIteration:
        raise ValueError("Could not find product dataset in page cache", result.context)
    return product

# example use:
url = "https://stockx.com/nike-air-max-90-se-running-club"
print(asyncio.run(scrape_product(url)))

Enter fullscreen mode Exit fullscreen mode

Above, in just a few lines of code, we've scraped the entire product's dataset available on StockX's website.

Now that we can scrape a single item, let's take a look at how to find products on StockX to scrape all data or just select categories.

<!--kg-card-end: markdown--><!--kg-card-begin: markdown-->

Scraping Search

To discover products we have two choices: sitemaps and search.

Sitemaps are ideal for discovering all products and they can usually be found by inspecting /robots.txt URL. For example, StockX's robots.txt indicates this:

Sitemap: https://stockx.com/sitemap/sitemap-index.xml
Sitemap: https://stockx.com/it-it/sitemap/sitemap-index.xml
Sitemap: https://stockx.com/de-de/sitemap/sitemap-index.xml
Sitemap: https://stockx.com/fr-fr/sitemap/sitemap-index.xml
Sitemap: https://stockx.com/ja-jp/sitemap/sitemap-index.xml
Sitemap: https://stockx.com/zh-cn/sitemap/sitemap-index.xml
Sitemap: https://stockx.com/en-gb/sitemap/sitemap-index.xml
Sitemap: https://stockx.com/ko-kr/sitemap/sitemap-index.xml
Sitemap: https://stockx.com/es-es/sitemap/sitemap-index.xml
Sitemap: https://stockx.com/es-mx/sitemap/sitemap-index.xml
Sitemap: https://stockx.com/es-us/sitemap/sitemap-index.xml
Sitemap: https://stockx.com/zh-tw/sitemap/sitemap-index.xml
Sitemap: https://stockx.com/fr-ca/sitemap/sitemap-index.xml

Enter fullscreen mode Exit fullscreen mode

So we could scrape /sitemap/sitemap-index.xml where every product URL is located. However, if we want to narrow down our scope and scrape specific items then we can scrape StockX's search pages. Let's take a look how can we do that.

To start, we can see that StockX's search is capable of searching by product category and query:

How to Scrape StockX e-commerce Data with Python

Each of these search pages can be further refined and sorted which results in a unique URL.

For this example, let's take the top sold apparel items that match the query "indigo":

How to Scrape StockX e-commerce Data with Python

Which takes us to the final url stockx.com/search/apparel?s=indigo

To scrape this we'll be using the same hidden web data approach as before. The hidden data is located in the same place so we can reuse our parse_hidden_data() function though this time around it only contains product preview data rather than the whole datasets.

Python

Scrapfly

import asyncio
import json
import math
from typing import Dict, List
import httpx

from nested_lookup import nested_lookup
from parsel import Selector

# create HTTPX client with headers that resemble a web browser
client = httpx.AsyncClient(
    http2=True,
    follow_redirects=True,
    limits=httpx.Limits(max_connections=3), # keep this low to avoid being blocked
    headers={
        "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.114 Safari/537.36",
        "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9",
        "Accept-Encoding": "gzip, deflate, br",
        "Accept-Language": "en-US,en;q=0.9",
        "Cache-Control": "no-cache",
    },
)

# From previous chapter:
def parse_nextjs(html: str) -> dict:
    """extract nextjs cache from page"""
    selector = Selector(html)
    data = selector.css("script# __NEXT_DATA__ ::text").get()
    if not data:
        data = selector.css("script[data-name=query]::text").get()
        data = data.split("=", 1)[-1].strip().strip(";")
    data = json.loads(data)
    return data

async def scrape_search(url: str, max_pages: int = 25) -> List[Dict]:
    """Scrape StockX search"""
    print(f"scraping first search page: {url}")
    first_page = await client.get(url)
    assert first_page.status_code == 200, "scrape was blocked" # this should be retried, handled etc.

    # parse first page for product search data and total amount of pages:
    data = parse_nextjs(first_page.text)
    _first_page_results = nested_lookup("results", data)[0]
    _paging_info = _first_page_results["pageInfo"]
    total_pages = _paging_info["pageCount"] or math.ceil(_paging_info["total"] / _paging_info["limit"]) # note: pageCount can be missing but we can calculate it ourselves
    if max_pages < total_pages:
        total_pages = max_pages

    product_previews = [edge["node"] for edge in _first_page_results["edges"]]

    # then scrape other pages concurrently:
    print(f" scraping remaining {total_pages - 1} search pages")
    _other_pages = [ # create GET task for each page url
        asyncio.create_task(client.get(f"{first_page.url}&page={page}"))
        for page in range(2, total_pages + 1)
    ]
    for response in asyncio.as_completed(_other_pages): # run all tasks concurrently
        response = await response
        data = parse_nextjs(response.text)
        _page_results = nested_lookup("results", data)[0]
        product_previews.extend([edge["node"] for edge in _page_results["edges"]])
    return product_previews

# example run
result = asyncio.run(scrape_search("https://stockx.com/search/sneakers/top-selling?s=indigo", max_pages=2))
print(json.dumps(result, indent=2))


import asyncio
import json
import math
from typing import Dict, List

from nested_lookup import nested_lookup
from scrapfly import ScrapeApiResponse, ScrapeConfig, ScrapflyClient

scrapfly = ScrapflyClient(key="YOUR SCRAPFLY KEY", max_concurrency=10)

def parse_nextjs(result: ScrapeApiResponse) -> dict:
    """extract nextjs cache from page"""
    data = result.selector.css("script# __NEXT_DATA__ ::text").get()
    if not data:
        data = result.selector.css("script[data-name=query]::text").get()
        data = data.split("=", 1)[-1].strip().strip(";")
    data = json.loads(data)
    return data

async def scrape_search(url: str, max_pages: int = 25) -> List[Dict]:
    """Scrape StockX search"""
    print(f"scraping first search page: {url}")
    first_page = await scrapfly.async_scrape(
        ScrapeConfig(
            url=url,
            country="US",
            asp=True,
        )
    )
    # parse first page for product search data and total amount of pages:
    data = parse_nextjs(first_page)
    _first_page_results = nested_lookup("results", data)[0]
    _paging_info = _first_page_results["pageInfo"]
    total_pages = _paging_info["pageCount"] or math.ceil(_paging_info["total"] / _paging_info["limit"])
    if max_pages < total_pages:
        total_pages = max_pages

    product_previews = [edge["node"] for edge in _first_page_results["edges"]]

    # then scrape other pages concurrently:
    print(f" scraping remaining {total_pages - 1} search pages")
    _other_pages = [
        ScrapeConfig(
            url=f"{first_page.context['url']}&page={page}",
            country="US",
            asp=True,
        )
        for page in range(2, total_pages + 1)
    ]
    async for result in scrapfly.concurrent_scrape(_other_pages):
        data = parse_nextjs(result)
        _page_results = nested_lookup("results", data)[0]
        product_previews.extend([edge["node"] for edge in _page_results["edges"]])
    return product_previews

# example run
result = asyncio.run(scrape_search("https://stockx.com/search/sneakers/top-selling?s=indigo"))
print(json.dumps(result, indent=2))

Enter fullscreen mode Exit fullscreen mode

While this product preview data offers a lot of data we might want to scrape the entire product dataset using the product scraper we wrote in the previous chapter. See the urlKey field for the full product URL.

<!--kg-card-end: markdown--><!--kg-card-begin: markdown-->

Bypass Blocking with Scrapfly

StockX is a popular website and it's not uncommon for them to block scraping attempts. To scale up our scraper and bypass blocking we can use Scrapfly's web scraping API which fortifies scrapers against blocking and much more!

How to Scrape StockX e-commerce Data with Python
Scrapfly service does the heavy lifting for you!

Scrapfly API is a middleware service that sits between your scraper and the target website. It handles all the heavy lifting of scraping and proxies so you can focus on building your scraper. To add Scrapfly offers a bunch of other convenient features like:

Using Python-SDK we can easily integrate Scrapfly into our Python scrapers:

from scrapfly import ScrapflyClient, ScrapeConfig
scrapfly = ScrapflyClient(key="YOUR SCRAPFLY KEY")

result = scrapfly.scrape(ScrapeConfig(
    "https://stockx.com/search/apparel/top-selling?s=indigo",
    # anti scraping protection bypass
    asp=True, 
    # proxy country selection
    country="US",
    # we can enable features like:
    # cloud headless browser use
    render_js=True,  
    # screenshot taking
    screenshots={"all": "fullpage"},
))

# full result data
print(result.content) # html body
print(result.selector.css("h1")) # CSS selector and XPath parser built-in

Enter fullscreen mode Exit fullscreen mode

For more on scraping StockX using Scrapfly SDK see Full Scraper Code section.

<!--kg-card-end: markdown--><!--kg-card-begin: markdown-->

FAQ

To wrap up this scrape guide, let's take a look at frequently asked questions regarding scraping of StockX:

Is it legal to scrape StockX.com?

Yes, it is legal to scrape StockX.com. StockX e-commerce data is publically available and as long as the scraper doesn't inflict damages to the website it's perfectly legal to scrape.

Can StockX.com be crawled?

Yes, StockX.com can be crawled. Crawling is an alternative web scraping approach where the scraper is capable of discovering pages on its own. StockX offers sitemaps and recommended product areas that can be used to develop crawling logic. For more see our Crawling With Python introduction.

<!--kg-card-end: markdown--><!--kg-card-begin: markdown-->

Summary

In this guide, we've learned how to scrape StockX.com using Python and a few community packages.

For this, we've used the hidden web data scraping technique where instead of traditional HTML parsing we retrieve the product HTML pages and extracted Javascript cache data.

With just a few lines of Python code, we've extracted the entire product dataset from StockX.com.

We've also taken a look at how to discover StockX product pages using sitemaps or search pages.

To scale up our scraper we've also taken a look at Scrapfly API which fortifies scrapers against blocking and much more - try it out for free!

<!--kg-card-end: markdown--><!--kg-card-begin: markdown-->
Get Your FREE API KeyDiscover ScrapFly
<!--kg-card-end: markdown--><!--kg-card-begin: markdown-->

Full Scraper Code

Here's the full StockX scraper code using Python and Scrapfly Python SDK:

💙 This code should only be used as a reference. To scrape data from StockX at scale you'll need to adjust it to your preferences and environment

import asyncio
import math
import json
from pathlib import Path
from typing import Dict, List
from nested_lookup import nested_lookup
from scrapfly import ScrapeConfig, ScrapflyClient, ScrapeApiResponse

scrapfly = ScrapflyClient(key="YOUR SCRAPFLY KEY", max_concurrency=10)

def parse_nextjs(result: ScrapeApiResponse) -> dict:
    """extract nextjs cache from page"""
    data = result.selector.css("script# __NEXT_DATA__ ::text").get()
    if not data:
        data = result.selector.css("script[data-name=query]::text").get()
        data = data.split("=", 1)[-1].strip().strip(";")
    data = json.loads(data)
    return data

async def scrape_product(url: str) -> dict:
    """scrape a single stockx product page for product data"""
    result = await scrapfly.async_scrape(
        ScrapeConfig(
            url=url,
            country="US",
            asp=True,
        )
    )
    data = parse_nextjs(result)
    # extract all products datasets from page cache
    products = nested_lookup("product", data)
    # find the current product dataset
    try:
        product = next(p for p in products if p.get("urlKey") in result.context["url"])
    except StopIteration:
        raise ValueError("Could not find product dataset in page cache", result.context)
    return product

async def scrape_search(url: str, max_pages: int = 25) -> List[Dict]:
    """Scrape StockX search"""
    print(f"scraping first search page: {url}")
    first_page = await scrapfly.async_scrape(
        ScrapeConfig(
            url=url,
            country="US",
            asp=True,
        )
    )
    # parse first page for product search data and total amount of pages:
    data = parse_nextjs(first_page)
    _first_page_results = nested_lookup("results", data)[0]
    _paging_info = _first_page_results["pageInfo"]
    total_pages = _paging_info["pageCount"] or math.ceil(_paging_info["total"] / _paging_info["limit"])
    if max_pages < total_pages:
        total_pages = max_pages

    product_previews = [edge["node"] for edge in _first_page_results["edges"]]

    # then scrape other pages concurrently:
    print(f" scraping remaining {total_pages - 1} search pages")
    _other_pages = [
        ScrapeConfig(
            url=f"{first_page.context['url']}&page={page}",
            country="US",
            asp=True,
        )
        for page in range(2, total_pages + 1)
    ]
    async for result in scrapfly.concurrent_scrape(_other_pages):
        data = parse_nextjs(result)
        _page_results = nested_lookup("results", data)[0]
        product_previews.extend([edge["node"] for edge in _page_results["edges"]])
    return product_previews

async def example_run():
    """
    this example run will scrape example product and 2 pages of search results and 
    save them to ./results/product.json and ./results/search.json respectively
    """
    out_dir = Path( __file__ ).parent / "results"
    out_dir.mkdir(exist_ok=True)

    product = await scrape_product("https://stockx.com/nike-x-stussy-bucket-hat-black")
    out_dir.joinpath("product.json").write_text(json.dumps(product, indent=2, ensure_ascii=False))

    search = await scrape_search("https://stockx.com/search/sneakers/top-selling?s=indigo", max_pages=2)
    out_dir.joinpath("search.json").write_text(json.dumps(search, indent=2, ensure_ascii=False))

if __name__ == " __main__":
    asyncio.run(example_run())

Enter fullscreen mode Exit fullscreen mode

<!--kg-card-end: markdown--><!--kg-card-begin: html-->{<br> &quot;@context&quot;: &quot;<a href="https://schema.org">https://schema.org</a>&quot;,<br> &quot;@type&quot;: &quot;FAQPage&quot;,<br> &quot;mainEntity&quot;: [<br> {<br> &quot;@type&quot;: &quot;Question&quot;,<br> &quot;name&quot;: &quot;Is it legal to scrape StockX.com?&quot;,<br> &quot;acceptedAnswer&quot;: {<br> &quot;@type&quot;: &quot;Answer&quot;,<br> &quot;text&quot;: &quot;<p>Yes, it is legal to scrape StockX.com. StockX e-commerce data is publically available and as long as the scraper doesn&#39;t inflict damages to the website it&#39;s perfectly legal to scrape.</p>&quot;<br> }<br> },<br> {<br> &quot;@type&quot;: &quot;Question&quot;,<br> &quot;name&quot;: &quot;Can StockX.com be crawled?&quot;,<br> &quot;acceptedAnswer&quot;: {<br> &quot;@type&quot;: &quot;Answer&quot;,<br> &quot;text&quot;: &quot;<p>Yes, StockX.com can be crawled. Crawling is an alternative web scraping approach where the scraper is capable of discovering pages on its own. StockX offers sitemaps and recommended product areas that can be used to develop crawling logic. For more see our <a class=\"text-reference\" href=\"https://scrapfly.io/blog/crawling-with-python/\">Crawling With Python</a> introduction.</p>&quot;<br> }<br> }<br> ]<br> }<!--kg-card-end: html-->

Top comments (0)