Crawlbase

Posted on Jun 9 • Originally published at crawlbase.com

How to Use Web Scraping for Price Intelligence

#pythonintelligence


Price intelligence is a broad, complex task, and doing it by hand does not scale. In this post, we’ll show you how to automate it by building a pricing scraper with Python and Crawlbase that makes the work faster, easier, and more reliable.

Selecting Target Websites (Amazon and eBay)

Before you start writing code, it’s important to plan ahead: decide which websites you want to target, what data points you need to extract, and how you plan to use that data. Many developers naturally begin with popular sites like Amazon and eBay because they offer rich, diverse datasets that are valuable for applications like price intelligence.

For this coding exercise, we'll focus on the following SERP URLs as our targets:
Amazon
eBay

Set Up Your Workspace

At this point, you’ll need to choose your preferred programming language. For this example, we’ll use Python, as it’s widely considered one of the easiest languages to get started with. That said, you're free to use any other language; follow along and apply the same logic provided here.

Set Up Your Coding Environment

Install Python 3 on your computer.
Create a root directory for the project in your filesystem.
Under the root directory, create a file named requirements.txt, and add the following entries:

requests
tabulate
price_parser
pandas

Then run:

python -m pip install -r requirements.txt
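
Before moving on, it can help to confirm that the install succeeded. The following is a minimal sketch that only checks whether each package can be located in the current environment; the helper name find_missing_modules is our own and not part of any library.

```python
import importlib.util

# Import names for the packages listed in requirements.txt.
REQUIRED_MODULES = ["requests", "tabulate", "price_parser", "pandas"]


def find_missing_modules(module_names: list[str]) -> list[str]:
    """Return the names that cannot be located in the current environment."""
    return [name for name in module_names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = find_missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages are installed.")
```

If anything is reported as missing, re-run the pip command above with the same Python interpreter you use to run the scripts.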

Obtain Your API Credentials
1. Create an account at Crawlbase and log in
2. After registration, you will receive 1,000 free requests
3. Go to your account dashboard and copy your Crawling API Normal requests token
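
The scripts in this post paste the token directly into the code for simplicity. If you would rather keep it out of your source files, one option is to read it from an environment variable. This is only a sketch: the variable name CRAWLBASE_TOKEN and the helper load_crawlbase_token are assumptions of ours, not something Crawlbase requires.

```python
import os


def load_crawlbase_token(env_var: str = "CRAWLBASE_TOKEN") -> str:
    """Read the token from an environment variable (the variable name is our assumption)."""
    token = os.environ.get(env_var, "").strip()
    if not token:
        raise RuntimeError(
            f"Set the {env_var} environment variable to your Normal requests token."
        )
    return token
```

You could then write API_TOKEN = load_crawlbase_token() in place of the hardcoded placeholder used in the scripts.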

Sending Requests Using Crawlbase

Let’s start by building the scraper for the Amazon Search Engine Results Page. In the root directory of your project, create a file named amazon_serp_scraper.py, and follow along.

Add the import statements to your script.

import requests
import urllib.parse
from requests.exceptions import RequestException

Add a function that searches for products on Amazon using a given query, with optional parameters for the country code and top-level domain (such as com or co.uk), and returns a list of product dictionaries.

def get_products_from_amazon(query: str, country: str = None, top_level_domain: str = 'com') -> list[dict]:

Set up your API token and the Crawlbase endpoint used to send the request.

API_TOKEN = "<Normal requests token>"
API_ENDPOINT = "https://api.crawlbase.com/"

Construct the parameters for the GET request by encoding the search query for the Amazon URL and specifying the Amazon SERP scraper along with an optional country parameter.

params = {
    "token": API_TOKEN,
    "url": f"https://www.amazon.{top_level_domain}/s?k={urllib.parse.quote_plus(query)}",
    "scraper": "amazon-serp",
    "country": country
}
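
To see what these parameters turn into, here is a small standard-library sketch that builds the same target URL and query string without sending anything. The helper names are ours. Note that requests leaves out any parameter whose value is None, which is why passing country=None is harmless; the sketch filters those out explicitly to mimic that behavior.

```python
from typing import Optional
from urllib.parse import quote_plus, urlencode


def build_amazon_search_url(query: str, top_level_domain: str = "com") -> str:
    """Build the Amazon search URL, encoding spaces in the query as '+'."""
    return f"https://www.amazon.{top_level_domain}/s?k={quote_plus(query)}"


def build_query_string(token: str, target_url: str, scraper: str,
                       country: Optional[str] = None) -> str:
    """Encode the API parameters, dropping any that are unset."""
    params = {"token": token, "url": target_url, "scraper": scraper, "country": country}
    # Drop None values so that no empty "country" parameter is sent.
    params = {key: value for key, value in params.items() if value is not None}
    # urlencode percent-encodes the target URL so it survives as a single parameter.
    return urlencode(params)


if __name__ == "__main__":
    url = build_amazon_search_url("Apple iPhone 15 Pro Max 256GB", "co.uk")
    print(url)
    print(build_query_string("<Normal requests token>", url, "amazon-serp"))
```

The target URL ends up encoded twice: once for the search query itself, and once more when it is passed as the url parameter of the API request.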

Send the request to the Crawling API, then parse and return the product data.

response = requests.get(API_ENDPOINT, params=params)
response.raise_for_status()

result = response.json()
return result['body']['products']
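
The return statement above assumes the response always contains body and products. If you want the scraper to degrade gracefully when that is not the case, a more defensive lookup could look like the sketch below. The helper name extract_products is ours, and the idea that a response may lack these keys is an assumption rather than documented behavior.

```python
def extract_products(result: dict) -> list[dict]:
    """Return result['body']['products'], or an empty list if the shape differs."""
    body = result.get("body") if isinstance(result, dict) else None
    products = body.get("products") if isinstance(body, dict) else None
    return products if isinstance(products, list) else []
```

With this helper in place, the last line of the function would become return extract_products(result).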

To execute the program, add the following snippet at the end of the script:

if __name__ == "__main__":

    import json

    products = get_products_from_amazon("Apple iPhone 15 Pro Max 256GB", country="US", top_level_domain="co.uk")
    pretty_json = json.dumps(products, indent=2)
    print(pretty_json)

You can modify the values in the function call to suit your needs. For example, replace "Apple iPhone 15 Pro Max 256GB" with any other product name, change "US" to a different country code, or update "co.uk" to another Amazon domain such as "com" or "de".

Complete code example:

import requests
import urllib.parse
from requests.exceptions import RequestException

def get_products_from_amazon(query: str, country: str = None, top_level_domain: str = 'com') -> list[dict]:
    API_TOKEN = "<Normal requests token>"
    API_ENDPOINT = "https://api.crawlbase.com/"

    params = {
        "token": API_TOKEN,
        "url": f"https://www.amazon.{top_level_domain}/s?k={urllib.parse.quote_plus(query)}",
        "scraper": "amazon-serp",
        "country": country
    }

    response = requests.get(API_ENDPOINT, params=params)
    response.raise_for_status()

    result = response.json()

    return result['body']['products']

if __name__ == "__main__":

    import json

    products = get_products_from_amazon("Apple iPhone 15 Pro Max 256GB", country="US", top_level_domain="co.uk")
    pretty_json = json.dumps(products, indent=2)
    print(pretty_json)

Run the script from the terminal using the following command:

python amazon_serp_scraper.py

Once successful, you will see the following raw data output:

[
  {
    "name": "Apple iPhone 16 Pro Max 256GB: 5G Mobile phone with Apple Intelligence - Desert Titanium + Silicone Case with MagSafe - Black",
    "price": "\u00a31,138.00",
    "rawPrice": 1138.0,
    "currency": "\u00a3",
    "offer": "",
    "customerReview": "4.4 out of 5 stars",
    "customerReviewCount": "367",
    "shippingMessage": "",
    "asin": "B0DGTJ6Y1S",
    "image": "https://m.media-amazon.com/images/I/61EpQCNARNL._AC_UY218_.jpg",
    "url": "https://www.amazon.co.uk/Apple-iPhone-Pro-Max-256GB/dp/B0DGTJ6Y1S/ref=sr_1_1?dib=eyJ2IjoiMSJ9.ZChvrJybm7TlnZ2-2tQPcAJhiEM1rKrU7CkKwkwiDWAnDvRZyGd490ktAc-ukHTMrhCNjGZN-mUv_pB9jpM_b-kakh857EvjHVDsbaPlnqdaWdgP8h8JlYlqZZnnW7Y8aaJ_8IdO3jTMYnwEljkT641W-0jpmOwTktsR4YGToL3KgkE6J14jT_5xU3EZNFkl_L1IYUL72a1mwtfhDapB17WcNOKS6lxZeGSha2Sw1BA.ZYtrbxfYI-d4vIcsxiU9hG5ahBmcc5rtSNtz9VB-nc0&dib_tag=se&keywords=Apple+iPhone+15+Pro+Max+256GB&qid=1748790249&sr=8-1",
    "isPrime": false,
    "sponsoredAd": false,
    "couponInfo": "",
    "badgesInfo": [],
    "boughtInfo": ""
  },
  // Note: some results have been omitted for brevity.
  {
    "name": "Apple iPhone 11 Pro Max, 256GB, Midnight Green (Renewed)",
    "price": "\u00a3264.00",
    "rawPrice": 264.0,
    "currency": "\u00a3",
    "offer": "",
    "customerReview": "4.1 out of 5 stars",
    "customerReviewCount": "258",
    "shippingMessage": "",
    "asin": "B082BGTHP6",
    "image": "https://m.media-amazon.com/images/I/71g5LVVdbaL._AC_UY218_.jpg",
    "url": "https://www.amazon.co.uk/Apple-iPhone-256GB-Midnight-Renewed/dp/B082BGTHP6/ref=sr_1_16?dib=eyJ2IjoiMSJ9.ZChvrJybm7TlnZ2-2tQPcAJhiEM1rKrU7CkKwkwiDWAnDvRZyGd490ktAc-ukHTMrhCNjGZN-mUv_pB9jpM_b-kakh857EvjHVDsbaPlnqdaWdgP8h8JlYlqZZnnW7Y8aaJ_8IdO3jTMYnwEljkT641W-0jpmOwTktsR4YGToL3KgkE6J14jT_5xU3EZNFkl_L1IYUL72a1mwtfhDapB17WcNOKS6lxZeGSha2Sw1BA.ZYtrbxfYI-d4vIcsxiU9hG5ahBmcc5rtSNtz9VB-nc0&dib_tag=se&keywords=Apple+iPhone+15+Pro+Max+256GB&qid=1748790249&sr=8-16",
    "isPrime": false,
    "sponsoredAd": false,
    "couponInfo": "",
    "badgesInfo": [],
    "boughtInfo": ""
  }
]
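
Notice that customerReview and customerReviewCount arrive as strings in the output above. If you plan to sort or filter on them, you may want numeric values. Here is a rough sketch, assuming the rating always follows the "4.4 out of 5 stars" pattern shown above; the helper names are ours.

```python
import re
from typing import Optional


def parse_rating(text: Optional[str]) -> Optional[float]:
    """Extract the leading number from a string like '4.4 out of 5 stars'."""
    match = re.match(r"\s*(\d+(?:\.\d+)?)\s+out of\s+5", text or "")
    return float(match.group(1)) if match else None


def parse_count(text: Optional[str]) -> Optional[int]:
    """Convert a count such as '367' or '1,024' to an int; return None if empty."""
    digits = re.sub(r"[^\d]", "", text or "")
    return int(digits) if digits else None
```

Both helpers return None for empty or unexpected input instead of raising, so listings without reviews do not interrupt a batch run.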

Next, create a new file named ebay_serp_scraper.py for the eBay integration, and add the following code using the same approach we applied in the Amazon scraper.

import requests
import urllib.parse
from requests.exceptions import RequestException

def get_products_from_ebay(query: str, country: str = None, top_level_domain: str = 'com') -> list[dict]:
    API_TOKEN = "<Normal requests token>"
    API_ENDPOINT = "https://api.crawlbase.com/"

    params = {
        "token": API_TOKEN,
        "url": f"https://www.ebay.{top_level_domain}/sch/i.html?_nkw={urllib.parse.quote_plus(query)}",
        "scraper": "ebay-serp",
        "country": country
    }

    response = requests.get(API_ENDPOINT, params=params)
    response.raise_for_status()

    result = response.json()

    return result['body']['products']

if __name__ == "__main__":

    import json

    products = get_products_from_ebay("Apple iPhone 15 Pro Max 256GB", country="US", top_level_domain="co.uk")
    pretty_json = json.dumps(products, indent=2)
    print(pretty_json)

Run the script with python ebay_serp_scraper.py in your terminal, and you should see raw output like the following:

[
  {
    "title": "New listingiPhone 15 Pro Max - 256gb Blue Titanium",
    "subTitles": ["Pre-owned"],
    "price": {
      "current": {
        "from": "\u00a3513.93",
        "to": "\u00a3513.93"
      },
      "trendingPrice": null,
      "previousPrice": ""
    },
    "soldDate": "",
    "endDate": "",
    "bidsCount": 0,
    "hotness": "",
    "additionalHotness": [],
    "customerReviews": {
      "review": "",
      "count": 0,
      "link": ""
    },
    "shippingMessage": "Postage not specified",
    "image": "https://i.ebayimg.com/images/g/cQkAAOSww3toOdUj/s-l500.webp",
    "url": "https://www.ebay.co.uk/itm/146617243528?_skw=Apple+iPhone+15+Pro+Max+256GB&itmmeta=01JWP0SV257V8N99DP0GB2B2FK&hash=item2223119788:g:cQkAAOSww3toOdUj&itmprp=enc%3AAQAKAAAA4FkggFvd1GGDu0w3yXCmi1e4U9NgN8cbPDciCpZxd4y9M7arS1PIV503raP8NGkFrAGVGTcTJ93CtRBBBR%2BVdNPFCvqlvQFqv0p54lMQYaGfzNm2BDkKB8pOYhDKGV44h10dcGqP8Txe9bOa2%2BxUU4c03zlQjb5BBp5tZ6gfqvtXuJtaf6xW09Np944o3hzvtrVoM%2Fv9BcdTnUznCeLvNxWNQB9IG4%2BYEFyqvvcZce%2FbSdKA4hmdFerIaeAjWlWEpu7%2F1ZZV9ElPz9yibQZ6YcvruBvpIS3YYlvAypHn6yp8%7Ctkp%3ABk9SR5yx58DlZQ",
    "location": "United Kingdom",
    "time": {
      "timeLeft": "6d 23h left",
      "timeEnd": "(Sun, 08:00)"
    },
    "listingDate": "",
    "topRatedSeller": false,
    "sponsoredAd": false,
    "sellerInfo": "davidawilliams1 (2,001) 100%"
  },
  // Note: some results have been omitted for brevity.
  {
    "title": "Great Condition - Apple iPhone 15 Pro Max 256GB Blue Titanium 98% Battery -1047-",
    "subTitles": ["Pre-owned"],
    "price": {
      "current": {
        "from": "\u00a3709.00",
        "to": "\u00a3709.00"
      },
      "trendingPrice": null,
      "previousPrice": ""
    },
    "soldDate": "",
    "endDate": "",
    "bidsCount": 0,
    "hotness": "20 watchers",
    "additionalHotness": [],
    "customerReviews": {
      "review": "",
      "count": 19,
      "link": "https://www.ebay.co.uk/p/24062761146?iid=236096139018#UserReviews"
    },
    "shippingMessage": "Postage not specified",
    "image": "https://i.ebayimg.com/images/g/xIEAAeSw-NhoILdt/s-l140.webp",
    "url": "https://www.ebay.co.uk/itm/236096139018?_skw=Apple+iPhone+15+Pro+Max+256GB&epid=24062761146&itmmeta=01JWP0SV29B7VZRTR5336M14WJ&hash=item36f86d2f0a:g:xIEAAeSw-NhoILdt&itmprp=enc%3AAQAKAAABAFkggFvd1GGDu0w3yXCmi1d4wiok8NH9ED6XupF4lEB05soO%2BX9YmeNcvnhAT2xl8%2BMR4mTSpPbdLrOdQTzvZweJCy9AbPRRp%2F3fEqBcyd%2B5KAg80QbLwBoSqH0%2BeYox7n1qscvQrlMrB71D%2FNgjlJfHccWLZ%2FGngbtsV5ccazqhwfxQdUKd0i%2BgDc6vXrrD%2F5SrwBFP5B7By2Vao286t0uJFGHFt28Hit8Si6T2mAYp5VdmEJN4YPv0xAlEwjmiPZ94WWHtYHGzBfZgdg9pRZHngDYjfm0iVEkcLab9c3G5xbhlMKPjRvjgC8DcoRUnHD%2FJSfIkRTLH0PzFBsjmyTw%3D%7Ctkp%3ABFBMpLHnwOVl",
    "location": "United Kingdom",
    "time": {
      "timeLeft": "",
      "timeEnd": ""
    },
    "listingDate": "",
    "topRatedSeller": false,
    "sponsoredAd": true,
    "sellerInfo": "2mselectronics (728) 100%"
  }
]
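
Unlike Amazon, eBay reports the current price as a from/to pair. In the two listings above both values are identical, but they may differ for listings with a price range. The sketch below reads both ends as numbers using only the standard library. It assumes prices use "." for decimals and "," for thousands, as in the samples above, and the helper names are ours.

```python
import re
from typing import Optional


def to_amount(text: Optional[str]) -> Optional[float]:
    """Strip the currency symbol and thousands separators from a string like '£1,138.00'."""
    if not text:
        return None
    match = re.search(r"\d[\d,]*(?:\.\d+)?", text)
    return float(match.group(0).replace(",", "")) if match else None


def ebay_price_range(item: dict) -> tuple[Optional[float], Optional[float]]:
    """Return the (from, to) prices of an eBay listing as floats."""
    current = item.get("price", {}).get("current", {})
    return to_amount(current.get("from")), to_amount(current.get("to"))


if __name__ == "__main__":
    listing = {"price": {"current": {"from": "\u00a3513.93", "to": "\u00a3513.93"}}}
    print(ebay_price_range(listing))
```

Later in this post we use the price_parser package for the same job, which handles more currency formats than this simple regular expression.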

Extracting and Structuring Price Data

The API output is clean and informative, but not all of it is relevant to price intelligence. In this section, we'll standardize and parse the data so that only the data points we need remain.

Let’s create a new file named structured_consolidated_data.py and insert the following code:

from amazon_serp_scraper import get_products_from_amazon
from ebay_serp_scraper import get_products_from_ebay

def get_structured_consolidated_data(query: str, country: str = None, top_level_domain: str = 'com') -> list[dict]:
    products = []

    amazon_data = get_products_from_amazon(query, country=country, top_level_domain=top_level_domain)
    products.extend([{"name": item['name'], "price": item['price'], "url": item['url'], "source": "Amazon"} for item in amazon_data])

    ebay_data = get_products_from_ebay(query, country=country, top_level_domain=top_level_domain)
    products.extend([{"name": item['title'], "price": item['price']['current']['to'].strip(), "url": item['url'], "source": "Ebay"} for item in ebay_data])

    return products

This script pulls the relevant product data from Amazon and eBay using the Crawlbase data scrapers. It extracts the key details (product name, price, URL, and source) from each search result and combines them into a single, easy-to-use list.

Here is the complete code:

from amazon_serp_scraper import get_products_from_amazon
from ebay_serp_scraper import get_products_from_ebay

def get_structured_consolidated_data(query: str, country: str = None, top_level_domain: str = 'com') -> list[dict]:
    products = []

    amazon_data = get_products_from_amazon(query, country=country, top_level_domain=top_level_domain)
    products.extend([{"name": item['name'], "price": item['price'], "url": item['url'], "source": "Amazon"} for item in amazon_data])

    ebay_data = get_products_from_ebay(query, country=country, top_level_domain=top_level_domain)
    products.extend([{"name": item['title'], "price": item['price']['current']['to'].strip(), "url": item['url'], "source": "Ebay"} for item in ebay_data])

    return products

if __name__ == "__main__":

    import json

    products = get_structured_consolidated_data("Apple iPhone 15 Pro Max 256GB", country="US", top_level_domain="co.uk")

    pretty_json = json.dumps(products, indent=2)
    print(pretty_json)

Run the script from the terminal using the following command:

python structured_consolidated_data.py

Once you get a successful response, the consolidated output is displayed in a structured format. Each entry contains only the fields we selected, plus a source field that identifies the marketplace it came from.

[
  {
    "name": "Apple iPhone 16 Pro Max 256GB: 5G Mobile phone with Apple Intelligence - Desert Titanium + Silicone Case with MagSafe - Black",
    "price": "\u00a31,138.00",
    "url": "https://www.amazon.co.uk/Apple-iPhone-Pro-Max-256GB/dp/B0DGTJ6Y1S/ref=sr_1_1?dib=eyJ2IjoiMSJ9.ZChvrJybm7TlnZ2-2tQPcAJhiEM1rKrU7CkKwkwiDWAnDvRZyGd490ktAc-ukHTMrhCNjGZN-mUv_pB9jpM_b-kakh857EvjHVDsbaPlnqdaWdgP8h8JlYlqZZnnW7Y8aaJ_8IdO3jTMYnwEljkT641W-0jpmOwTktsR4YGToL3KgkE6J14jT_5xU3EZNFkl_L1IYUL72a1mwtfhDapB17WcNOKS6lxZeGSha2Sw1BA.ZYtrbxfYI-d4vIcsxiU9hG5ahBmcc5rtSNtz9VB-nc0&dib_tag=se&keywords=Apple+iPhone+15+Pro+Max+256GB&qid=1748791075&sr=8-1",
    "source": "Amazon"
  },
  // Note: some results have been omitted for brevity.
  {
    "name": "Apple iPhone 15 Pro Max (256 GB) - White Titanium - Good Condition",
    "price": "\u00a3679.99",
    "url": "https://www.ebay.co.uk/itm/205475768012?_skw=Apple+iPhone+15+Pro+Max+256GB&itmmeta=01JWP14FHS44GRVT9KYP8AFKEW&hash=item2fd74f66cc:g:QkcAAOSwLKRoHd86&itmprp=enc%3AAQAKAAABAFkggFvd1GGDu0w3yXCmi1enej%2BIHaZBwUjnCYNkoIrYanJLRykGLG546KgFE4C%2BH%2FGVT3ptDyAFH87uYJ2y6Ih4qylSr70KgmTvf7QzxWJJb8UIuLl9GWlI4h4QLVbnS26iLFU08zLSz8kbcbyI5kILO9IRzzTpKec0Cxb4G8ujEojvnrdM8G3oP5ud4QwSccYRK7L8PnDvS7qECHgMXmshCmZh749EOMqeDYRFSCqmPYQ6etMUr0y38Wag%2BT%2BLOIkx8XxR3fTC4FbbMPGGUDdNpG1jLJ3e%2F6X9tZuQuDp4lprfyjTKWD564verk%2FxhORgHzaHvDhmmEE121dibsOU%3D%7Ctkp%3ABFBMgvmRweVl",
    "source": "Ebay"
  }
]
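
The url values above carry long tracking query strings. If you only need a link back to the listing, you could trim them. The sketch below drops everything after the path, on the assumption that the path alone is enough to reach the listing; verify this for the sites you target. The helper name strip_query is ours.

```python
from urllib.parse import urlsplit, urlunsplit


def strip_query(url: str) -> str:
    """Return the URL without its query string and fragment."""
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))


if __name__ == "__main__":
    print(strip_query("https://www.ebay.co.uk/itm/146617243528?_skw=Apple+iPhone+15+Pro+Max+256GB"))
```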

Make Use of the Scraped Data

Now that we've successfully parsed the relevant data for our needs, we can begin applying it to basic pricing intelligence. For example, we can write a script to calculate and show the average price of the "Apple iPhone 15 Pro Max 256GB" listed on Amazon and compare it to the average price on eBay.

Create a new file named market_average_price.py and insert the following code:

import pandas as pd
from price_parser import Price
from structured_consolidated_data import get_structured_consolidated_data

products = get_structured_consolidated_data("Apple iPhone 15 Pro Max 256GB", country="US", top_level_domain="co.uk")

# Keep only the listings that have a non-empty price string
products_with_prices = [product for product in products if product["price"] and product["price"].strip()]

# Convert price strings such as "£1,138.00" to floats, skipping any that cannot be parsed
sanitized_products = []
for product in products_with_prices:
    amount = Price.fromstring(product["price"]).amount
    if amount is not None:
        sanitized_products.append(product | {"price": float(amount)})

df = pd.DataFrame(sanitized_products)

# Keep only the rows whose name mentions "iPhone"
iphone_mask = df['name'].str.contains('iPhone', case=False, na=False)
iphone_df = df[iphone_mask]

# Group by source and calculate the average price and listing count
avg_prices = iphone_df.groupby('source')['price'].agg(['mean', 'count']).round(2)
avg_prices.columns = ['Average Price (£)', 'Number of Products']

print("\n\nAverage iPhone prices by source:")
print(avg_prices)

# Calculate the difference between the two marketplaces
amazon_avg = avg_prices.loc['Amazon', 'Average Price (£)']
ebay_avg = avg_prices.loc['Ebay', 'Average Price (£)']
difference = amazon_avg - ebay_avg

print("\n\nPrice comparison:")
print(f"Amazon average: £{amazon_avg:.2f}")
print(f"eBay average: £{ebay_avg:.2f}")
print(f"Difference: £{difference:.2f} (Amazon is {'higher' if difference > 0 else 'lower'} than eBay)\n\n")
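
If you want to see what the grouping step does without calling the API, here is a small standard-library sketch of the same calculation. The sample list contains only the five listings printed earlier in this post, with prices already converted to numbers, so the averages it produces are illustrative and will not match a full run. The function name average_price_by_source is ours.

```python
from collections import defaultdict
from statistics import mean

# Only the listings shown in the sample outputs earlier in this post.
sample_products = [
    {"source": "Amazon", "price": 1138.00},
    {"source": "Amazon", "price": 264.00},
    {"source": "Ebay", "price": 513.93},
    {"source": "Ebay", "price": 709.00},
    {"source": "Ebay", "price": 679.99},
]


def average_price_by_source(products: list[dict]) -> dict[str, float]:
    """Group prices by their source and return the mean of each group."""
    grouped = defaultdict(list)
    for product in products:
        grouped[product["source"]].append(product["price"])
    return {source: round(mean(prices), 2) for source, prices in grouped.items()}


if __name__ == "__main__":
    averages = average_price_by_source(sample_products)
    for source, average in averages.items():
        print(f"{source}: £{average:.2f}")
    print(f"Difference: £{averages['Amazon'] - averages['Ebay']:.2f}")
```

The pandas version above does the same thing with groupby and agg, and also reports how many listings went into each average.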