DEV Community

agenthustler

eBay Data Mining: Prices, Sellers, and Sold Listings with Python

eBay's marketplace generates incredible pricing data — especially "sold listings," which reveal what items actually sell for versus what sellers are asking. This data powers reselling arbitrage, market research, and pricing models.

Why Mine eBay Data?

  • Arbitrage opportunities: Find items selling for less than their market value
  • Price history: Track how item values change over time
  • Seller analysis: Identify top sellers and their strategies
  • Market sizing: Understand demand for specific product categories

Building an eBay Price Analyzer

import requests
from bs4 import BeautifulSoup
import pandas as pd
import re
import time

class EbayMiner:
    BASE_URL = "https://www.ebay.com/sch/i.html"

    def __init__(self):
        self.session = requests.Session()
        self.session.headers.update({
            'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7)',
            'Accept': 'text/html,application/xhtml+xml',
        })

    def search_sold_listings(self, query, pages=3):
        """Search eBay sold/completed listings."""
        results = []

        for page in range(1, pages + 1):
            params = {
                '_nkw': query,
                'LH_Sold': '1',       # Sold items only
                'LH_Complete': '1',   # Completed listings
                '_pgn': page,
                '_ipg': 60,           # Items per page
            }

            resp = self.session.get(self.BASE_URL, params=params, timeout=30)
            resp.raise_for_status()
            soup = BeautifulSoup(resp.text, 'html.parser')

            for item in soup.select('.s-item'):
                title_el = item.select_one('.s-item__title')
                price_el = item.select_one('.s-item__price')
                date_el = item.select_one('.s-item__ended-date')
                link_el = item.select_one('.s-item__link')

                if not title_el or not price_el:
                    continue

                title = title_el.get_text(strip=True)
                # Skip eBay's injected "Shop on eBay" placeholder card
                if title == 'Shop on eBay':
                    continue

                price_text = price_el.get_text(strip=True)
                price = self._parse_price(price_text)

                results.append({
                    'title': title,
                    'price': price,
                    'price_raw': price_text,
                    'sold_date': date_el.get_text(strip=True) if date_el else '',
                    'url': link_el['href'] if link_el else '',
                })

            time.sleep(2)

        return results

    def _parse_price(self, price_text):
        """Extract a numeric price from text like '$29.99'.

        Range prices like '$24.99 to $39.99' yield the first (lowest) value.
        """
        match = re.search(r'\$(\d+[,\d]*\.?\d*)', price_text)
        if match:
            return float(match.group(1).replace(',', ''))
        return 0.0
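One wrinkle worth handling: sold-listing prices sometimes appear as ranges (e.g. "$24.99 to $39.99" for multi-variation listings), which `_parse_price` silently collapses to the low bound. A small standalone helper can surface both bounds — a sketch, assuming eBay's "$X to $Y" formatting:

```python
import re

def parse_price_range(price_text):
    """Return (low, high) from strings like '$24.99' or '$24.99 to $39.99'.

    Single prices yield low == high; unparseable text yields (0.0, 0.0).
    """
    nums = [float(m.replace(',', ''))
            for m in re.findall(r'\$(\d+(?:,\d{3})*(?:\.\d+)?)', price_text)]
    if not nums:
        return (0.0, 0.0)
    return (min(nums), max(nums))
```

Storing both bounds lets you exclude wide-range listings from median calculations, where they would otherwise skew the stats.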

Finding Arbitrage Opportunities

The real power comes from comparing sold prices against current active asking prices. As a first pass, the function below establishes the sold-price distribution and flags listings that closed well below the median:

def find_arbitrage(miner, query, min_margin=0.2):
    """Flag sold listings that closed well below the typical sold price."""
    # Get sold listing prices
    sold = miner.search_sold_listings(query, pages=3)
    sold_df = pd.DataFrame(sold)
    if sold_df.empty:
        print(f"No sold listings found for '{query}'")
        return sold_df

    median_sold = sold_df['price'].median()
    print(f"Median sold price for '{query}': ${median_sold:.2f}")
    print(f"Sample size: {len(sold_df)} sold listings")

    # Price distribution
    print(f"25th percentile: ${sold_df['price'].quantile(0.25):.2f}")
    print(f"75th percentile: ${sold_df['price'].quantile(0.75):.2f}")

    # Listings that closed at least min_margin below the median
    deals = sold_df[sold_df['price'] < median_sold * (1 - min_margin)]
    print(f"\nBelow-market deals found: {len(deals)}")
    return deals

# Example usage
miner = EbayMiner()
deals = find_arbitrage(miner, "Nintendo Switch OLED", min_margin=0.15)
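To close the loop, you would run the same search without the `LH_Sold`/`LH_Complete` flags to pull active listings, then rank each asking price against the sold-price median. The ranking itself is a pure calculation — a sketch, assuming you've already collected the active prices:

```python
def margin_vs_sold(active_prices, median_sold):
    """Return (price, expected_margin) pairs for active listings priced
    below the median sold price, sorted by margin descending.

    expected_margin is the fraction gained by reselling at the median.
    """
    deals = []
    for price in active_prices:
        if price <= 0 or price >= median_sold:
            continue
        margin = (median_sold - price) / median_sold
        deals.append((price, round(margin, 3)))
    return sorted(deals, key=lambda d: d[1], reverse=True)
```

Remember that real margin must also absorb shipping, eBay's final value fee, and payment processing, so set the threshold well above your break-even point.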

Seller Analysis

Aggregating sold listings shows which items dominate a category:

def analyze_sellers(miner, query):
    """Aggregate sold listings by title to find top-grossing items."""
    sold = miner.search_sold_listings(query, pages=5)
    df = pd.DataFrame(sold)

    # Search result pages don't expose seller names, so group by listing
    # title as a proxy; true per-seller stats require fetching each listing page
    title_stats = df.groupby('title').agg(
        avg_price=('price', 'mean'),
        total_sales=('price', 'count'),
        revenue=('price', 'sum')
    ).sort_values('revenue', ascending=False)

    print("Top-grossing listings:")
    print(title_stats.head(10).to_string())
    return title_stats
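When seller names do appear on search pages, they typically show up as a text blob like "techdeals_99 (4,521) 98.7%" (the exact selector and format are assumptions — eBay changes its markup frequently). A hedged parser for that shape:

```python
import re

def parse_seller_info(text):
    """Parse 'name (1,234) 99.1%' into (name, feedback_count, positive_pct).

    Returns (text, 0, 0.0) when the pattern doesn't match, so callers
    can fall back gracefully on unexpected markup.
    """
    m = re.match(r'(.+?)\s*\(([\d,]+)\)\s*([\d.]+)%', text.strip())
    if not m:
        return (text.strip(), 0, 0.0)
    name, count, pct = m.groups()
    return (name, int(count.replace(',', '')), float(pct))
```

With seller names extracted, the `groupby` above can pivot from titles to sellers directly.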

Scaling with Cloud Solutions

For production-grade eBay data mining across thousands of queries, the eBay Scraper on Apify handles pagination, anti-bot measures, and data structuring automatically. Feed it a list of search terms and get clean JSON output.

Handling eBay's Anti-Scraping Measures

eBay rotates page structures and uses detection mechanisms. Use a proxy rotation service like ScraperAPI to handle IP rotation and request headers automatically:

from urllib.parse import quote

def scrape_with_proxy(url):
    """Route a request through ScraperAPI's proxy endpoint."""
    SCRAPER_API_KEY = "your_key_here"
    # URL-encode the target so its own query string isn't swallowed
    proxy_url = f"http://api.scraperapi.com?api_key={SCRAPER_API_KEY}&url={quote(url, safe='')}"
    return requests.get(proxy_url, timeout=60)

Building a Price Tracking Dashboard

import json
from datetime import datetime

def track_prices(miner, queries, output_file='ebay_prices.json'):
    """Track prices over time for multiple product queries."""
    timestamp = datetime.now().isoformat()
    tracking_data = {'timestamp': timestamp, 'products': {}}

    for query in queries:
        sold = miner.search_sold_listings(query, pages=2)
        prices = [item['price'] for item in sold if item['price'] > 0]
        series = pd.Series(prices, dtype=float)

        tracking_data['products'][query] = {
            'median_price': series.median() if prices else 0,
            'mean_price': series.mean() if prices else 0,
            'sample_size': len(prices),
            'min_price': min(prices) if prices else 0,
            'max_price': max(prices) if prices else 0,
        }

    with open(output_file, 'a') as f:
        f.write(json.dumps(tracking_data) + '\n')

    print(f"Tracked {len(queries)} products at {timestamp}")
    return tracking_data

# Run daily
queries = ["iPhone 15 Pro", "PS5 Console", "RTX 4090"]
track_prices(miner, queries)
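Since `track_prices` appends one JSON object per line, the history file is JSON Lines. To chart trends, flatten it into a tidy DataFrame with one row per (timestamp, product) pair — a sketch matching the schema above:

```python
import json
import pandas as pd

def load_price_history(path='ebay_prices.json'):
    """Flatten the JSON-lines tracking file into a tidy DataFrame."""
    rows = []
    with open(path) as f:
        for line in f:
            snap = json.loads(line)
            for product, stats in snap['products'].items():
                rows.append({'timestamp': snap['timestamp'],
                             'product': product,
                             **stats})
    return pd.DataFrame(rows)
```

From there, `df.pivot(index='timestamp', columns='product', values='median_price').plot()` gives a quick trend chart per product.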

Conclusion

eBay data mining is uniquely valuable because sold listings provide ground-truth pricing data. Build your pipeline to collect sold prices, identify arbitrage windows, and track market trends. Start small with the Python code above, then scale to cloud-based solutions for continuous monitoring across product categories.
