Noorsimar Singh
Scraping eBay's 1.2 Billion Listings: A Scrapy Tale

TL;DR: Built a production-ready eBay scraper using Scrapy framework with ScrapeOps proxy integration. Handles search results, product details, price extraction, and exports clean CSV/JSON data while bypassing anti-bot measures.


Last quarter, I needed to track laptop prices across eBay for a friend's research project. What started as a "quick scraping script" turned into a full-featured eBay scraper that I'm now sharing as an open-source project.

The challenge? eBay has sophisticated anti-bot protection, dynamic pricing, and multiple page layouts. The solution? A robust Scrapy spider with proper proxy rotation and intelligent data extraction.

Why eBay Scraping Matters for Developers

E-commerce data extraction is everywhere in modern development. Whether you're building price comparison tools, market research platforms, or inventory management systems, you'll eventually need to scrape product data.

eBay presents unique challenges:

  • Multiple anti-detection systems
  • Complex auction vs. fixed-price listings
  • Global marketplace variations
  • Real-time price changes

Getting this right teaches you patterns that apply to almost any web scraping project.

The Architecture: Two Spiders, Clean Data

I built two specialized spiders instead of one monolithic scraper.
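They sit in a standard Scrapy project layout (a sketch; only the two spider filenames are from the actual repo):

ebay_scraper/
├── scrapy.cfg
└── ebay_scraper/
    ├── items.py          # item definitions (EbaySearchItem, etc.)
    ├── pipelines.py      # data cleaning pipeline
    ├── settings.py       # proxy and export configuration
    └── spiders/
        ├── ebay_search.py
        └── ebay_product.py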

1. Search Spider (ebay_search.py)

Extracts product listings from search results pages:

def extract_search_item_data(self, container, meta):
    """Extract data from a single search result item"""
    item = EbaySearchItem()

    # Listing ID lives in the item URL (.../itm/<id>); this selector is assumed
    item['product_id'] = container.css('.s-item__link::attr(href)').re_first(r'/itm/(\d+)')

    # Skip placeholder items (common eBay issue): real IDs are 9+ digit numbers
    if not item['product_id'] or not item['product_id'].isdigit() or len(item['product_id']) < 9:
        return None

    # Price extraction with fallbacks, scoped to this result's container
    price_element = container.css('.s-item__price .notranslate::text').get()
    if not price_element:
        price_element = container.css('.s-item__price::text').get()
    item['current_price'] = price_element.strip() if price_element else ''

    return item
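For reference, EbaySearchItem is a standard Scrapy Item. A minimal sketch of how it could be declared (product_id and current_price come from the extraction code above; the other fields are assumptions):

import scrapy

class EbaySearchItem(scrapy.Item):
    product_id = scrapy.Field()
    current_price = scrapy.Field()
    title = scrapy.Field()   # assumed field
    url = scrapy.Field()     # assumed field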

2. Product Spider (ebay_product.py)

Extracts detailed information from individual product pages:

def extract_product_data(self, response):
    """Extract detailed product data from eBay product page"""
    item = EbayProductItem()  # product-page Item (class name assumed)

    # Updated selectors for the new eBay layout, with a fallback
    price_element = response.css('.x-price-primary .ux-textspans::text').get()
    if not price_element:
        price_element = response.css('.x-price-approx__price .ux-textspans::text').get()

    # Handle multiple currencies (AU $575.00, $199.99, etc.)
    item['current_price'] = price_element.strip() if price_element else ''

    return item

The key insight? eBay frequently updates their CSS selectors, so I implemented cascading fallbacks that check multiple possible selectors.
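That cascade pattern generalizes into a small helper. Here's a sketch (mine, not the repo's), using only selectors already shown in this post:

def first_css(selector, *queries):
    """Return the first non-empty text match across several CSS queries."""
    for query in queries:
        value = selector.css(query).get()
        if value and value.strip():
            return value.strip()
    return ''

# Try the current eBay layout first, then fall back to older ones
price = first_css(response,
                  '.x-price-primary .ux-textspans::text',
                  '.x-price-approx__price .ux-textspans::text',
                  '.s-item__price::text')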

Handling Anti-Bot Protection with ScrapeOps

Here's where most DIY scrapers fail. eBay's security is no joke. I integrated ScrapeOps proxy rotation early in the project, and honestly, it saved me weeks of debugging.

# settings.py
SCRAPEOPS_API_KEY = 'YOUR_API_KEY'
SCRAPEOPS_PROXY_ENABLED = True   # route every request through the proxy pool
SCRAPEOPS_RENDER_JS = True       # render JavaScript-heavy pages server-side
SCRAPEOPS_PROXY_SETTINGS = {'bypass': 'generic_level_2'}  # anti-bot bypass level

DOWNLOADER_MIDDLEWARES = {
    'scrapeops_scrapy_proxy_sdk.scrapeops_scrapy_proxy_sdk.ScrapeOpsScrapyProxySdk': 725,
}

The bypass: 'generic_level_2' setting was crucial for handling eBay's JavaScript challenges. Without it, I was getting blocked after just a few requests.

Clean Data Export Pipeline

Raw scraped data is messy. I built validation and cleaning pipelines that output production-ready CSV and JSON:

# pipelines.py
class EbayDataCleaningPipeline:
    def process_item(self, item, spider):
        # Normalize prices across different currencies
        if item.get('current_price'):
            item['current_price'] = self.clean_price(item['current_price'])

        # Standardize conditions
        if item.get('condition'):
            item['condition'] = self.normalize_condition(item['condition'])

        return item
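clean_price itself isn't shown above. Here's a guess at what such a normalizer might do for inputs like 'AU $575.00' and '$199.99' (the regex and the 'US' default are my assumptions, not the repo's logic):

import re

def clean_price(raw_price):
    """Normalize 'AU $575.00' / '$199.99' into '<CURRENCY> <amount>' (sketch)."""
    match = re.search(r'([A-Z]{0,3})\s*\$\s*([\d,]+(?:\.\d+)?)', raw_price)
    if not match:
        return raw_price.strip()  # leave unrecognized formats untouched
    currency = match.group(1) or 'US'  # assume USD when no prefix is present
    amount = float(match.group(2).replace(',', ''))
    return f'{currency} {amount:.2f}'

print(clean_price('AU $575.00'))  # AU 575.00
print(clean_price('$199.99'))     # US 199.99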

The result? Clean CSV files with consistent formatting, ready for analysis or database import.
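Scrapy's built-in feed exports do the actual file writing once the pipeline is registered. A minimal settings.py sketch (the module path is assumed from a standard project layout):

# settings.py
ITEM_PIPELINES = {
    'ebay_scraper.pipelines.EbayDataCleaningPipeline': 300,  # path assumed
}

FEEDS = {
    'output/results.csv': {'format': 'csv'},
    'output/results.json': {'format': 'json', 'indent': 2},
}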

Running the Scraper

Basic usage is straightforward:

# Search for products
scrapy crawl ebay_search -a query="laptop" -a max_results=50

# Get detailed product info  
scrapy crawl ebay_product -a item_ids="123456789,987654321"

# Export to specific format
scrapy crawl ebay_search -o results.json -a query="vintage camera"
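Those -a flags arrive as constructor keyword arguments on the spider. A sketch of how ebay_search might accept them (the signature and search URL are my assumptions, though _nkw is eBay's standard search query parameter):

import scrapy

class EbaySearchSpider(scrapy.Spider):
    name = 'ebay_search'

    def __init__(self, query='laptop', max_results=50, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.query = query
        self.max_results = int(max_results)  # -a values always arrive as strings

    def start_requests(self):
        yield scrapy.Request(
            f'https://www.ebay.com/sch/i.html?_nkw={self.query}',
            callback=self.parse,
        )

    def parse(self, response):
        ...  # per-result extraction (see extract_search_item_data above)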

The scraper automatically handles (see the settings sketch after this list):

  • Rate limiting and delays
  • User agent rotation
  • Failed request retries
  • Data validation and cleaning
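
Most of that comes from stock Scrapy settings plus the ScrapeOps middleware. An illustrative (not repo-exact) configuration:

# settings.py
DOWNLOAD_DELAY = 1.0           # base delay between requests
AUTOTHROTTLE_ENABLED = True    # adapt delays to observed response times
RETRY_ENABLED = True
RETRY_TIMES = 3                # retry failed requests up to three times
RETRY_HTTP_CODES = [500, 502, 503, 504, 408, 429]
# User-agent rotation is handled upstream by the ScrapeOps proxy in this setup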

Next Steps and Enhancements

Want to take this further? The complete GitHub repository includes advanced features like:

  • Multi-currency price normalization
  • Seller reputation analysis
  • Historical price tracking setup
  • Global eBay site support (UK, DE, CA, AU)

For deeper insights into eBay's anti-bot measures and scraping strategies, I recommend checking out the eBay Website Analyzer and their comprehensive scraping guide.

The ScrapeOps Integration Experience

Full transparency: I grabbed a free ScrapeOps API key early in this project, and it eliminated the proxy headaches entirely. Their free tier covers 1,000 requests monthly, which is perfect for development and testing.

The JavaScript rendering feature was particularly valuable for eBay's dynamic content. Instead of wrestling with Selenium or Playwright, I got reliable results with a simple configuration change.

Wrapping Up

Building a production-ready web scraper taught me that the hard part isn't extracting data—it's handling all the edge cases, anti-bot measures, and data quality issues that come up in real-world usage.

This eBay scraper handles those challenges with proven patterns you can apply to other e-commerce sites. The code is open source, well-documented, and battle-tested.

Found this useful? Star the GitHub repo and let me know what you build with it. I'm always curious to see how other developers adapt these patterns for their own projects.

What's your biggest web scraping challenge? Drop a comment below—I might cover it in a future post.
