TL;DR: Built a production-ready eBay scraper using Scrapy framework with ScrapeOps proxy integration. Handles search results, product details, price extraction, and exports clean CSV/JSON data while bypassing anti-bot measures.
Last quarter, I needed to track laptop prices across eBay for a friend's research project. What started as a "quick scraping script" turned into a full-featured eBay scraper that I'm now sharing as an open-source project.
The challenge? eBay has sophisticated anti-bot protection, dynamic pricing, and multiple page layouts. The solution? A robust Scrapy spider with proper proxy rotation and intelligent data extraction.
Why eBay Scraping Matters for Developers
E-commerce data extraction is everywhere in modern development. Whether you're building price comparison tools, market research platforms, or inventory management systems, you'll eventually need to scrape product data.
eBay presents unique challenges:
- Multiple anti-detection systems
- Complex auction vs. fixed-price listings
- Global marketplace variations
- Real-time price changes
Getting this right teaches you patterns that apply to almost any web scraping project.
The Architecture: Two Spiders, Clean Data
I built two specialized spiders instead of one monolithic scraper:
1. Search Spider (ebay_search.py)
Extracts product listings from search results pages:
```python
def extract_search_item_data(self, container, meta):
    """Extract data from a single search result item."""
    item = EbaySearchItem()

    # Derive the product ID from the listing URL
    url = container.css('.s-item__link::attr(href)').get() or ''
    item['product_id'] = url.split('/itm/')[-1].split('?')[0]

    # Skip placeholder items (common eBay issue: injected dummy listings)
    if not item['product_id'] or not item['product_id'].isdigit() or len(item['product_id']) < 9:
        return None

    # Price extraction with fallbacks
    price_element = container.css('.s-item__price .notranslate::text').get()
    if not price_element:
        price_element = container.css('.s-item__price::text').get()
    item['current_price'] = price_element.strip() if price_element else ''
    return item
```
2. Product Spider (ebay_product.py)
Extracts detailed information from individual product pages:
```python
def extract_product_data(self, response):
    """Extract detailed product data from an eBay product page."""
    item = EbayProductItem()

    # Updated selectors for the new eBay layout
    price_element = response.css('.x-price-primary .ux-textspans::text').get()
    if not price_element:
        price_element = response.css('.x-price-approx__price .ux-textspans::text').get()

    # Handle multiple currencies (AU $575.00, $199.99, etc.)
    item['current_price'] = price_element.strip() if price_element else ''
    return item
```
The key insight? eBay frequently updates their CSS selectors, so I implemented cascading fallbacks that check multiple possible selectors.
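A small generic helper makes that cascading-fallback pattern explicit. This is a sketch of the idea rather than code from the repo; it works with anything exposing Scrapy's `.css(sel).get()` interface, whether a full Response or an item container Selector:

```python
def first_css(selector_source, *selectors):
    """Try CSS selectors in priority order; return the first non-empty match.

    `selector_source` is any object with Scrapy's `.css(sel).get()` interface,
    e.g. a Response or an individual listing container Selector.
    """
    for sel in selectors:
        value = selector_source.css(sel).get()
        if value and value.strip():
            return value.strip()
    return ''
```

With this, the price extraction collapses to `first_css(container, '.s-item__price .notranslate::text', '.s-item__price::text')`, and reacting to an eBay layout change becomes a one-line diff: append the new selector to the call.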
Handling Anti-Bot Protection with ScrapeOps
Here's where most DIY scrapers fail. eBay's security is no joke. I integrated ScrapeOps proxy rotation early in the project, and honestly, it saved me weeks of debugging.
```python
# settings.py
SCRAPEOPS_API_KEY = 'YOUR_API_KEY'
SCRAPEOPS_PROXY_ENABLED = True
SCRAPEOPS_RENDER_JS = True
SCRAPEOPS_PROXY_SETTINGS = {'bypass': 'generic_level_2'}

DOWNLOADER_MIDDLEWARES = {
    'scrapeops_scrapy_proxy_sdk.scrapeops_scrapy_proxy_sdk.ScrapeOpsScrapyProxySdk': 725,
}
```
The `bypass: 'generic_level_2'` setting was crucial for handling eBay's JavaScript challenges. Without it, I was getting blocked after just a few requests.
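If you want to sanity-check the proxy before wiring it into Scrapy, the same options can be passed as query parameters to the ScrapeOps proxy endpoint. A minimal sketch; the parameter names mirror the settings above, but confirm them against the ScrapeOps docs for your plan:

```python
from urllib.parse import urlencode

SCRAPEOPS_API_KEY = 'YOUR_API_KEY'

def scrapeops_proxy_url(target_url: str) -> str:
    """Build a ScrapeOps proxy request URL with JS rendering and the
    generic_level_2 bypass enabled, matching the Scrapy settings above."""
    params = {
        'api_key': SCRAPEOPS_API_KEY,
        'url': target_url,
        'render_js': 'true',
        'bypass': 'generic_level_2',
    }
    return 'https://proxy.scrapeops.io/v1/?' + urlencode(params)
```

Fetching `scrapeops_proxy_url('https://www.ebay.com/sch/i.html?_nkw=laptop')` with `requests` is a quick way to verify your key and bypass level work before debugging the spider itself.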
Clean Data Export Pipeline
Raw scraped data is messy. I built validation and cleaning pipelines that output production-ready CSV and JSON:
```python
# pipelines.py
class EbayDataCleaningPipeline:
    def process_item(self, item, spider):
        # Normalize prices across different currencies
        if item.get('current_price'):
            item['current_price'] = self.clean_price(item['current_price'])
        # Standardize conditions
        if item.get('condition'):
            item['condition'] = self.normalize_condition(item['condition'])
        return item
```
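The repo's `clean_price` does the heavy lifting here. As a hypothetical sketch of what such a normalizer can look like, the currency-prefix table and range handling below are my assumptions, not the project's exact logic:

```python
import re

# Order matters: longer, more specific prefixes must be checked first
CURRENCY_PREFIXES = {
    'AU $': 'AUD',
    'C $': 'CAD',
    'US $': 'USD',
    '£': 'GBP',
    'EUR': 'EUR',
    '$': 'USD',
}

def clean_price(raw: str):
    """Parse an eBay price string into (currency_code, amount).

    Handles prefixed currencies like 'AU $575.00' and price ranges like
    '$199.99 to $299.99' (keeps the lower bound). Returns (None, None)
    when no numeric price is present.
    """
    raw = raw.strip()
    currency = None
    for prefix, code in CURRENCY_PREFIXES.items():
        if raw.startswith(prefix):
            currency = code
            break
    match = re.search(r'(\d[\d,]*\.?\d*)', raw)
    if not match:
        return None, None
    amount = float(match.group(1).replace(',', ''))
    return currency, amount
```

Keeping the currency table in one place means supporting a new eBay regional site is mostly a matter of adding one prefix entry.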
The result? Clean CSV files with consistent formatting, ready for analysis or database import.
Running the Scraper
Basic usage is straightforward:
```shell
# Search for products
scrapy crawl ebay_search -a query="laptop" -a max_results=50

# Get detailed product info
scrapy crawl ebay_product -a item_ids="123456789,987654321"

# Export to a specific format
scrapy crawl ebay_search -o results.json -a query="vintage camera"
```
The scraper automatically handles:
- Rate limiting and delays
- User agent rotation
- Failed request retries
- Data validation and cleaning
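Most of that list maps onto stock Scrapy settings rather than custom code. A sketch of what such a configuration can look like; the specific values are illustrative, not the project's:

```python
# settings.py — politeness and resilience knobs Scrapy ships with

# Rate limiting and delays: AutoThrottle adapts delays to server latency
AUTOTHROTTLE_ENABLED = True
AUTOTHROTTLE_START_DELAY = 1.0
AUTOTHROTTLE_MAX_DELAY = 10.0
AUTOTHROTTLE_TARGET_CONCURRENCY = 2.0
DOWNLOAD_DELAY = 0.5
CONCURRENT_REQUESTS_PER_DOMAIN = 4

# Failed request retries (built into Scrapy's RetryMiddleware)
RETRY_ENABLED = True
RETRY_TIMES = 3
RETRY_HTTP_CODES = [429, 500, 502, 503, 504]
```

User agent rotation is handled by the ScrapeOps proxy layer in this setup, so there's no need for a separate user-agent middleware.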
Next Steps and Enhancements
Want to take this further? The complete GitHub repository includes advanced features like:
- Multi-currency price normalization
- Seller reputation analysis
- Historical price tracking setup
- Global eBay site support (UK, DE, CA, AU)
For deeper insights into eBay's anti-bot measures and scraping strategies, I recommend checking out the eBay Website Analyzer and their comprehensive scraping guide.
The ScrapeOps Integration Experience
Full transparency: I grabbed a free ScrapeOps API key early in this project, and it eliminated the proxy headaches entirely. Their free tier covers 1,000 requests monthly, which is perfect for development and testing.
The JavaScript rendering feature was particularly valuable for eBay's dynamic content. Instead of wrestling with Selenium or Playwright, I got reliable results with a simple configuration change.
Wrapping Up
Building a production-ready web scraper taught me that the hard part isn't extracting data—it's handling all the edge cases, anti-bot measures, and data quality issues that come up in real-world usage.
This eBay scraper handles those challenges with proven patterns you can apply to other e-commerce sites. The code is open source, well-documented, and battle-tested.
Found this useful? Star the GitHub repo and let me know what you build with it. I'm always curious to see how other developers adapt these patterns for their own projects.
What's your biggest web scraping challenge? Drop a comment below—I might cover it in a future post.