eBay's marketplace generates incredible pricing data — especially "sold listings," which reveal what items actually sell for versus what sellers are asking. This data powers reselling arbitrage, market research, and pricing models.
## Why Mine eBay Data?
- Arbitrage opportunities: Find items selling for less than their market value
- Price history: Track how item values change over time
- Seller analysis: Identify top sellers and their strategies
- Market sizing: Understand demand for specific product categories
## Building an eBay Price Analyzer
```python
import requests
from bs4 import BeautifulSoup
import pandas as pd
import re
import time

class EbayMiner:
    BASE_URL = "https://www.ebay.com/sch/i.html"

    def __init__(self):
        self.session = requests.Session()
        self.session.headers.update({
            'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7)',
            'Accept': 'text/html,application/xhtml+xml',
        })

    def search_sold_listings(self, query, pages=3):
        """Search eBay sold/completed listings."""
        results = []
        for page in range(1, pages + 1):
            params = {
                '_nkw': query,
                'LH_Sold': '1',      # Sold items only
                'LH_Complete': '1',  # Completed listings
                '_pgn': page,
                '_ipg': 60,          # Items per page
            }
            resp = self.session.get(self.BASE_URL, params=params)
            soup = BeautifulSoup(resp.text, 'html.parser')

            items = soup.select('.s-item')
            for item in items:
                title_el = item.select_one('.s-item__title')
                price_el = item.select_one('.s-item__price')
                date_el = item.select_one('.s-item__ended-date')
                link_el = item.select_one('.s-item__link')

                if not title_el or not price_el:
                    continue

                price_text = price_el.get_text(strip=True)
                price = self._parse_price(price_text)
                results.append({
                    'title': title_el.get_text(strip=True),
                    'price': price,
                    'price_raw': price_text,
                    'sold_date': date_el.get_text(strip=True) if date_el else '',
                    'url': link_el['href'] if link_el else '',
                })
            time.sleep(2)  # Be polite between page requests
        return results

    def _parse_price(self, price_text):
        """Extract numeric price from text like '$29.99'."""
        match = re.search(r'\$(\d+[,\d]*\.?\d*)', price_text)
        if match:
            return float(match.group(1).replace(',', ''))
        return 0.0
```
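Before pointing this at live pages, it's worth sanity-checking the price regex against the formats eBay actually emits — including "X to Y" price ranges and listings with no dollar amount. Here's the same regex as `_parse_price`, shown standalone:

```python
import re

def parse_price(price_text):
    """Same regex as EbayMiner._parse_price, extracted for quick testing."""
    match = re.search(r'\$(\d+[,\d]*\.?\d*)', price_text)
    if match:
        return float(match.group(1).replace(',', ''))
    return 0.0

print(parse_price("$29.99"))            # 29.99
print(parse_price("$1,249.00"))         # 1249.0
print(parse_price("$15.00 to $25.00"))  # 15.0 — ranges keep only the first price
print(parse_price("Free shipping"))     # 0.0 — no dollar amount found
```

Note the range case: sold-listing ranges collapse to the low end, which can bias your averages downward. Filter on `price_raw` containing "to" if that matters for your analysis.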
## Finding Arbitrage Opportunities
The real power is comparing sold prices against current active listings. As a starting point, the function below establishes the sold-price baseline and flags listings that closed well below it:
```python
def find_arbitrage(miner, query, min_margin=0.2):
    """Find sold listings priced well below the typical sold price."""
    # Get sold listing prices
    sold = miner.search_sold_listings(query, pages=3)
    sold_df = pd.DataFrame(sold)
    median_sold_price = sold_df['price'].median()

    print(f"Median sold price for '{query}': ${median_sold_price:.2f}")
    print(f"Sample size: {len(sold_df)} sold listings")

    # Price distribution
    print(f"25th percentile: ${sold_df['price'].quantile(0.25):.2f}")
    print(f"75th percentile: ${sold_df['price'].quantile(0.75):.2f}")

    # Items that sold below the median (potential buyer opportunities)
    deals = sold_df[sold_df['price'] < median_sold_price * (1 - min_margin)]
    print(f"\nBelow-market deals found: {len(deals)}")
    return deals

# Example usage
miner = EbayMiner()
deals = find_arbitrage(miner, "Nintendo Switch OLED", min_margin=0.15)
```
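The median is already robust, but "box only" junk listings and multi-item bundles can still distort the quartiles. A common fix is an IQR filter before computing the baseline — a minimal sketch, with `filter_outliers` being a hypothetical helper, not part of the miner above:

```python
import pandas as pd

def filter_outliers(prices, k=1.5):
    """Drop prices outside [Q1 - k*IQR, Q3 + k*IQR] before computing a baseline."""
    s = pd.Series(prices, dtype=float)
    q1, q3 = s.quantile(0.25), s.quantile(0.75)
    iqr = q3 - q1
    return s[(s >= q1 - k * iqr) & (s <= q3 + k * iqr)]

# A $5 "box only" listing and a $900 bundle shouldn't move the baseline
prices = [5.0, 180.0, 185.0, 190.0, 195.0, 200.0, 205.0, 900.0]
clean = filter_outliers(prices)
print(clean.median())  # 192.5 — both extremes dropped
```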
## Seller Analysis
Understanding top sellers reveals market dynamics:
```python
def analyze_sellers(miner, query):
    """Analyze listing distribution for a product category."""
    sold = miner.search_sold_listings(query, pages=5)
    df = pd.DataFrame(sold)

    # Group by listing title (true per-seller stats would require
    # fetching each listing page for the seller name)
    seller_stats = df.groupby('title').agg(
        avg_price=('price', 'mean'),
        total_sales=('price', 'count'),
        revenue=('price', 'sum')
    ).sort_values('revenue', ascending=False)

    print("Top selling items:")
    print(seller_stats.head(10).to_string())
    return seller_stats
```
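Once you have the aggregated frame, a revenue-share calculation tells you how concentrated the category is — a few dominant listings means thin arbitrage room. A sketch using a toy frame in the same shape as `analyze_sellers` returns (the numbers here are made up for illustration):

```python
import pandas as pd

# Toy aggregate shaped like the output of analyze_sellers() (hypothetical values)
seller_stats = pd.DataFrame(
    {'avg_price': [320.0, 310.0, 150.0],
     'total_sales': [10, 5, 2],
     'revenue': [3200.0, 1550.0, 300.0]},
    index=['Switch OLED White', 'Switch OLED Neon', 'Switch OLED Box Only'],
)

# Revenue share per listing title: how concentrated is the category?
share = seller_stats['revenue'] / seller_stats['revenue'].sum()
print(share.round(3).to_dict())
```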
## Scaling with Cloud Solutions
For production-grade eBay data mining across thousands of queries, the eBay Scraper on Apify handles pagination, anti-bot measures, and data structuring automatically. Feed it a list of search terms and get clean JSON output.
## Handling eBay's Anti-Scraping Measures
eBay rotates page structures and uses detection mechanisms. Use a proxy rotation service like ScraperAPI to handle IP rotation and request headers automatically:
```python
def scrape_with_proxy(url):
    """Route a request through a proxy service."""
    SCRAPER_API_KEY = "your_key_here"
    # Pass the target URL via params so it gets URL-encoded —
    # interpolating it into the query string raw breaks URLs with & or ?
    response = requests.get(
        'http://api.scraperapi.com',
        params={'api_key': SCRAPER_API_KEY, 'url': url},
        timeout=60,
    )
    return response
```
## Building a Price Tracking Dashboard
```python
import json
from datetime import datetime

def track_prices(miner, queries, output_file='ebay_prices.json'):
    """Track prices over time for multiple product queries."""
    timestamp = datetime.now().isoformat()
    tracking_data = {'timestamp': timestamp, 'products': {}}

    for query in queries:
        sold = miner.search_sold_listings(query, pages=2)
        prices = [item['price'] for item in sold if item['price'] > 0]
        series = pd.Series(prices, dtype=float)
        tracking_data['products'][query] = {
            'median_price': series.median() if prices else 0,
            'mean_price': series.mean() if prices else 0,
            'sample_size': len(prices),
            'min_price': min(prices) if prices else 0,
            'max_price': max(prices) if prices else 0,
        }

    # Append one snapshot per line (JSON Lines format)
    with open(output_file, 'a') as f:
        f.write(json.dumps(tracking_data) + '\n')

    print(f"Tracked {len(queries)} products at {timestamp}")
    return tracking_data

# Run daily
queries = ["iPhone 15 Pro", "PS5 Console", "RTX 4090"]
track_prices(miner, queries)
```
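Because each run appends one JSON object per line, reading the history back is a one-liner per snapshot. A minimal sketch of the read side, demonstrated on two synthetic snapshots (the file name and prices are made up for the demo):

```python
import json

def load_price_history(path):
    """Read the JSON Lines file written by track_prices into a list of snapshots."""
    with open(path) as f:
        return [json.loads(line) for line in f if line.strip()]

# Demo: write two synthetic snapshots, then compute the week-over-week move
with open('ebay_prices_demo.json', 'w') as f:
    f.write(json.dumps({'timestamp': '2024-01-01T00:00:00',
                        'products': {'PS5 Console': {'median_price': 430.0}}}) + '\n')
    f.write(json.dumps({'timestamp': '2024-01-08T00:00:00',
                        'products': {'PS5 Console': {'median_price': 415.0}}}) + '\n')

history = load_price_history('ebay_prices_demo.json')
first, last = history[0], history[-1]
change = (last['products']['PS5 Console']['median_price']
          - first['products']['PS5 Console']['median_price'])
print(f"PS5 Console median moved ${change:+.2f} week over week")
```

From here, a pandas DataFrame over the snapshots plus any plotting library gets you the actual dashboard.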
## Conclusion
eBay data mining is uniquely valuable because sold listings provide ground-truth pricing data. Build your pipeline to collect sold prices, identify arbitrage windows, and track market trends. Start small with the Python code above, then scale to cloud-based solutions for continuous monitoring across product categories.