Etsy is one of the largest marketplaces for handmade, vintage, and unique goods — with over 90 million active buyers and 7+ million sellers. Whether you're doing market research, price monitoring, or building a product comparison tool, extracting Etsy product data programmatically can give you a significant edge.
In this guide, I'll walk you through everything you need to know about scraping Etsy product data at scale: the site's structure, the data fields available, how to handle rate limiting, and how to use cloud-based tools to extract thousands of listings efficiently.
Why Scrape Etsy?
There are several legitimate reasons to extract Etsy data:
- Market research: Understand pricing trends, popular categories, and emerging niches
- Competitor analysis: Track competitor listings, pricing, reviews, and ranking
- Price monitoring: Watch price fluctuations across similar products
- Catalog building: Aggregate product data for comparison platforms
- SEO research: Analyze how top sellers optimize their titles and tags
Before diving in, remember to respect Etsy's Terms of Service and robots.txt. Use scraped data responsibly and avoid overloading their servers.
Understanding Etsy's Site Structure
Etsy's website follows a predictable URL pattern that makes scraping relatively straightforward once you understand the layout.
Search Pages
https://www.etsy.com/search?q=handmade+candles&page=1
Search pages contain a grid of product cards, each with:
- Product thumbnail image
- Title
- Price (including sale prices)
- Shop name
- Star rating and review count
- Free shipping badge
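Because search URLs follow this fixed `q`/`page` pattern, it helps to generate them programmatically before crawling. A minimal sketch (other filters like price or location would just be additional query parameters):

```javascript
// Build Etsy search URLs for a query across a range of pages,
// following the q/page parameter pattern shown above.
function buildSearchUrls(query, maxPages = 3) {
  const urls = [];
  for (let page = 1; page <= maxPages; page++) {
    urls.push(
      `https://www.etsy.com/search?q=${encodeURIComponent(query)}&page=${page}`
    );
  }
  return urls;
}
```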
Product Detail Pages
https://www.etsy.com/listing/123456789/product-title-here
Product pages contain the richest data:
- Full title and description
- All product images (usually 5-10)
- Current price and original price (if on sale)
- Variation options (size, color, material)
- Quantity available
- Shop information
- Reviews and ratings
- Shipping details
- Tags and categories
Shop Pages
https://www.etsy.com/shop/ShopNameHere
Shop pages give you seller-level data including total sales, star ratings, location, and their full product inventory.
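Since shop URLs follow this fixed pattern, you can derive the shop name from any shop link you've already collected, without an extra request. A small sketch:

```javascript
// Extract the shop name from an Etsy shop URL of the form
// https://www.etsy.com/shop/<ShopName>. Returns null if the URL
// doesn't match that pattern.
function shopNameFromUrl(url) {
  const match = new URL(url).pathname.match(/^\/shop\/([^/?#]+)/);
  return match ? match[1] : null;
}
```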
Key Data Fields You Can Extract
Here's a comprehensive breakdown of the data points available on Etsy product listings:
{
"url": "https://www.etsy.com/listing/123456789/...",
"title": "Handmade Soy Candle - Lavender Scent - 8oz",
"price": 24.99,
"originalPrice": 29.99,
"currency": "USD",
"discount": "17% off",
"images": [
"https://i.etsystatic.com/image1.jpg",
"https://i.etsystatic.com/image2.jpg"
],
"description": "Our hand-poured soy candles are made with...",
"shop": {
"name": "CandleCraftCo",
"url": "https://www.etsy.com/shop/CandleCraftCo",
"rating": 4.9,
"totalSales": 12543,
"location": "Portland, Oregon"
},
"rating": 4.8,
"reviewCount": 342,
"variations": [
{ "name": "Scent", "options": ["Lavender", "Vanilla", "Rose"] },
{ "name": "Size", "options": ["4oz", "8oz", "16oz"] }
],
"shipping": {
"freeShipping": true,
"estimatedDelivery": "3-5 business days"
},
"tags": ["soy candle", "handmade candle", "lavender candle"],
"quantityAvailable": 28,
"favoriteCount": 1893
}
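Note that the `discount` field above can be derived from the two prices rather than scraped. A sketch of that calculation (the rounding convention is an assumption; Etsy's own label may differ slightly):

```javascript
// Derive a "17% off" style label from current and original prices.
// Returns null when there's no sale (no original price, or no reduction).
function discountLabel(price, originalPrice) {
  if (!originalPrice || originalPrice <= price) return null;
  const pct = Math.round((1 - price / originalPrice) * 100);
  return `${pct}% off`;
}
```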
Handling Etsy's Anti-Scraping Measures
Etsy employs several techniques to prevent automated data extraction. Understanding these is critical to building a reliable scraper.
1. Rate Limiting
Etsy will throttle or block IPs that make too many requests in a short period. Typical thresholds are around 20-30 requests per minute from a single IP address. Rapid sequential requests without delays will trigger blocks quickly.
Solution: Implement random delays between requests and rotate IP addresses.
const delay = (ms) => new Promise(resolve => setTimeout(resolve, ms));
async function scrapeWithDelay(urls) {
const results = [];
for (const url of urls) {
const data = await scrapePage(url);
results.push(data);
// Random delay between 2-5 seconds
await delay(2000 + Math.random() * 3000);
}
return results;
}
2. Browser Fingerprinting
Etsy checks for signs of automated browsing. Simple HTTP requests with default user-agent headers will be detected and blocked immediately.
Solution: Use a real browser environment like Puppeteer or Playwright that renders JavaScript and mimics real browser behavior:
const { chromium } = require('playwright');
async function scrapeEtsyProduct(url) {
const browser = await chromium.launch({ headless: true });
const context = await browser.newContext({
userAgent: 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36',
viewport: { width: 1920, height: 1080 }
});
const page = await context.newPage();
await page.goto(url, { waitUntil: 'networkidle' });
const data = await page.evaluate(() => {
return {
title: document.querySelector('h1')?.textContent?.trim(),
price: document.querySelector('[data-buy-box-listing-overlay] p')
?.textContent?.trim(),
description: document.querySelector('[data-id="description-text"]')
?.textContent?.trim(),
images: Array.from(
document.querySelectorAll('img[data-listing-card-listing-image]')
).map(img => img.src),
};
});
await browser.close();
return data;
}
3. CAPTCHAs
When Etsy detects unusual patterns in your traffic, it may present CAPTCHA challenges that block automated access.
Solution: Use residential proxies and implement human-like browsing patterns. Cloud-based scraping platforms handle CAPTCHA solving automatically with their proxy infrastructure.
4. Dynamic Content Loading
Many elements on Etsy pages load asynchronously via JavaScript. Simple HTTP-based scrapers using libraries like axios or node-fetch will miss this dynamically loaded content entirely.
Solution: Use browser-based scraping that waits for JavaScript to finish rendering:
// Wait for specific elements to load before extracting
await page.waitForSelector('[data-buy-box-listing-overlay]', {
timeout: 10000
});
// Or wait for all network activity to settle
await page.waitForLoadState('networkidle');
Building a Basic Etsy Scraper
Here's a more complete example that extracts product data from Etsy search results across multiple pages:
const { chromium } = require('playwright');
async function scrapeEtsySearch(query, maxPages = 3) {
const browser = await chromium.launch({ headless: true });
const page = await browser.newPage();
const allProducts = [];
for (let pageNum = 1; pageNum <= maxPages; pageNum++) {
const url = `https://www.etsy.com/search?q=${
encodeURIComponent(query)
}&page=${pageNum}`;
console.log(`Scraping page ${pageNum}: ${url}`);
await page.goto(url, { waitUntil: 'networkidle' });
const products = await page.evaluate(() => {
const cards = document.querySelectorAll(
'[data-search-results] .wt-grid__item-xs-6'
);
return Array.from(cards).map(card => {
const titleEl = card.querySelector('h3');
const priceEl = card.querySelector('.currency-value');
const linkEl = card.querySelector('a[href*="/listing/"]');
const shopEl = card.querySelector('.wt-text-caption');
const ratingEl = card.querySelector('[aria-label*="star"]');
return {
title: titleEl?.textContent?.trim(),
price: priceEl?.textContent?.trim(),
url: linkEl?.href,
shop: shopEl?.textContent?.trim(),
rating: ratingEl?.getAttribute('aria-label'),
};
}).filter(p => p.title);
});
allProducts.push(...products);
console.log(`Found ${products.length} products on page ${pageNum}`);
// Respectful delay between pages
await new Promise(r =>
setTimeout(r, 3000 + Math.random() * 2000)
);
}
await browser.close();
return allProducts;
}
// Usage
scrapeEtsySearch('vintage jewelry').then(products => {
console.log(`Total products: ${products.length}`);
console.log(JSON.stringify(products, null, 2));
});
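The `rating` field captured above is the raw aria-label string (something like "4.5 out of 5 stars"); you'll usually want it as a number. A hedged parser sketch, assuming that label format:

```javascript
// Parse a numeric rating out of an aria-label like "4.5 out of 5 stars".
// The exact label wording is an assumption; returns null if no number found.
function parseRating(label) {
  if (!label) return null;
  const match = label.match(/(\d+(?:\.\d+)?)/);
  return match ? parseFloat(match[1]) : null;
}
```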
Extracting Pricing Data
Etsy pricing has several nuances that you need to handle carefully. Products can have sale prices, variation-based pricing, and quantity discounts:
function extractPricing(page) {
return page.evaluate(() => {
const priceSection = document.querySelector(
'[data-buy-box-listing-overlay]'
);
// Current price
const currentPrice = priceSection
?.querySelector('.wt-text-title-03')
?.textContent?.trim();
// Original price (if on sale)
const originalPrice = priceSection
?.querySelector('.wt-text-strikethrough')
?.textContent?.trim();
// Variation-based pricing (size, color, etc.)
const variations = Array.from(
document.querySelectorAll(
'select[id^="variation-selector"] option'
)
).map(opt => ({
label: opt.textContent.trim(),
priceModifier: opt.dataset.priceModifier || null
}));
// Quantity discounts
const quantityDiscounts = Array.from(
document.querySelectorAll('[data-quantity-discounts] tr')
).map(row => ({
quantity: row.cells[0]?.textContent?.trim(),
pricePerItem: row.cells[1]?.textContent?.trim(),
discount: row.cells[2]?.textContent?.trim()
}));
return {
currentPrice,
originalPrice,
onSale: !!originalPrice,
variations,
quantityDiscounts
};
});
}
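The prices extracted above are display strings ("$24.99", "CA$1,234.56"); before storing or comparing them, you'll want a numeric value. A minimal normalizer sketch (currency detection is left out here):

```javascript
// Convert a display price string like "$24.99" or "CA$1,234.56"
// into a number. Returns null when no numeric part is present.
function parsePriceString(text) {
  if (!text) return null;
  const match = text.replace(/,/g, '').match(/(\d+(?:\.\d+)?)/);
  return match ? parseFloat(match[1]) : null;
}
```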
Extracting JSON-LD Structured Data
One powerful technique that many scrapers overlook: Etsy embeds structured data in JSON-LD format directly in the page HTML. This is far more reliable than CSS selectors, which can break whenever Etsy updates its frontend:
One powerful technique that many scrapers overlook: Etsy embeds structured data in JSON-LD format directly in the page HTML. This is far more reliable than CSS selectors, which can break whenever Etsy updates its frontend:
async function extractStructuredData(page) {
const jsonLd = await page.evaluate(() => {
const scripts = document.querySelectorAll(
'script[type="application/ld+json"]'
);
return Array.from(scripts).map(s => {
try { return JSON.parse(s.textContent); }
catch { return null; }
}).filter(Boolean);
});
const product = jsonLd.find(d => d['@type'] === 'Product');
if (product) {
return {
name: product.name,
description: product.description,
image: product.image,
price: product.offers?.price,
currency: product.offers?.priceCurrency,
availability: product.offers?.availability,
rating: product.aggregateRating?.ratingValue,
reviewCount: product.aggregateRating?.reviewCount,
brand: product.brand?.name,
sku: product.sku
};
}
return null;
}
This technique works because search engines rely on structured data to display rich results, so Etsy has a strong incentive to keep it accurate and up to date. It's more stable than scraping HTML elements directly.
Scaling Up with Apify
Building and maintaining your own scraping infrastructure is time-consuming and expensive. You need to handle proxy rotation, browser lifecycle management, retry logic, session management, and data storage. This is where cloud-based platforms like Apify save enormous amounts of development time.
The Apify Store has ready-made Etsy scrapers that handle all the infrastructure complexity for you. These actors run in the cloud, manage proxies automatically, and output clean structured data you can export as JSON, CSV, or push directly to a database or webhook.
Using an Apify Etsy Actor
Here's how you'd use an Etsy scraper from the Apify platform via the JavaScript client:
const { ApifyClient } = require('apify-client');
const client = new ApifyClient({
token: 'YOUR_API_TOKEN',
});
async function scrapeEtsy() {
const run = await client.actor('ACTOR_ID').call({
searchUrls: [
'https://www.etsy.com/search?q=handmade+candles',
'https://www.etsy.com/search?q=vintage+jewelry'
],
maxItems: 500,
proxyConfiguration: {
useApifyProxy: true,
apifyProxyGroups: ['RESIDENTIAL']
}
});
// Fetch results from the dataset
const { items } = await client.dataset(
run.defaultDatasetId
).listItems();
console.log(`Extracted ${items.length} products`);
return items;
}
Benefits of Using a Cloud Scraping Platform
- Built-in proxy rotation: Residential and datacenter proxies are included and managed automatically
- Automatic retries: Failed requests are retried with exponential backoff
- Browser management: Chromium instances are managed, recycled, and scaled
- Scheduling: Run scrapers on a cron schedule (hourly, daily, weekly)
- Storage: Results are stored in datasets, downloadable as JSON or CSV
- Monitoring: Track run status, get notifications on failures via webhooks
- API access: Integrate scraping results into your existing data pipelines via REST API
- Pay-per-result: Many actors charge per result extracted, so you only pay for data you get
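If you run your own scraper instead, the retry behavior from the list above is straightforward to replicate. A sketch of retries with exponential backoff (the option names here are illustrative, not from any particular library):

```javascript
// Retry an async function with exponential backoff: the delay doubles
// after each failure (baseDelayMs, 2x, 4x, ...). Rethrows the last
// error once retries are exhausted.
async function withRetry(fn, { retries = 3, baseDelayMs = 1000 } = {}) {
  let lastError;
  for (let attempt = 0; attempt <= retries; attempt++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
      if (attempt < retries) {
        const delayMs = baseDelayMs * 2 ** attempt;
        await new Promise(resolve => setTimeout(resolve, delayMs));
      }
    }
  }
  throw lastError;
}
```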
Integrating with Your Data Pipeline
Apify integrates with many downstream tools out of the box:
// Set up a webhook to get notified when scraping completes
const run = await client.actor('ACTOR_ID').call(input, {
webhooks: [{
eventTypes: ['ACTOR.RUN.SUCCEEDED'],
requestUrl: 'https://your-api.com/webhook/etsy-data'
}]
});
// Or poll for results via the API
const dataset = client.dataset(run.defaultDatasetId);
const { items } = await dataset.listItems({
format: 'json',
limit: 1000
});
// Push to your database
for (const item of items) {
await db.products.upsert({
etsyId: item.listingId,
title: item.title,
price: item.price,
lastUpdated: new Date()
});
}
Data Export and Storage
Once you've scraped the data, you need to store it effectively for analysis:
const fs = require('fs');
const { stringify } = require('csv-stringify/sync');
function exportToCSV(products, filename) {
const csv = stringify(products, {
header: true,
columns: [
'title', 'price', 'originalPrice',
'shop', 'rating', 'reviewCount', 'url'
]
});
fs.writeFileSync(filename, csv);
console.log(`Exported ${products.length} products to ${filename}`);
}
function exportToJSON(products, filename) {
fs.writeFileSync(
filename,
JSON.stringify(products, null, 2)
);
console.log(`Exported ${products.length} products to ${filename}`);
}
Best Practices for Etsy Scraping
Here are the key guidelines to follow for reliable, responsible Etsy scraping:
Respect rate limits: Add random delays of 2-5 seconds between requests. Never hammer the server with rapid-fire requests.
Rotate proxies: Use residential proxies for significantly better success rates compared to datacenter IPs.
Handle variations: Etsy products frequently have multiple variants (size, color, material) with different prices. Make sure your scraper captures all variations.
Monitor for layout changes: Etsy updates its HTML structure periodically. Build resilient selectors and consider JSON-LD as a more stable alternative.
Cache results: Don't re-scrape data you already have. Implement a cache layer with appropriate TTLs.
Use structured data: As shown above, Etsy embeds JSON-LD data that's more reliable than HTML parsing.
Handle pagination correctly: Etsy search results can go up to 250 pages (roughly 16,000 listings per query). Plan your scraping strategy around this limit.
Validate your data: Always check that extracted fields contain expected data types and formats before storing.
Log everything: Keep detailed logs of your scraping runs so you can diagnose issues when selectors break or rate limits change.
Start small: Test your scraper on 10-20 pages before scaling to thousands. This saves proxy costs and avoids getting your IPs banned during development.
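The validation step from the list above can be as simple as a predicate run before every insert. A sketch with assumed field requirements (adjust these to your actual schema):

```javascript
// Check that a scraped product record has the fields and types we
// expect before storing it. The required fields here are an assumption.
function isValidProduct(product) {
  return Boolean(
    product &&
    typeof product.title === 'string' && product.title.trim().length > 0 &&
    typeof product.price === 'number' && product.price > 0 &&
    typeof product.url === 'string' && product.url.includes('/listing/')
  );
}
```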
Conclusion
Scraping Etsy product data at scale requires a combination of the right tools, respectful rate limiting, and robust error handling. While building your own scraper with Playwright is a great way to learn and gives you full control, cloud platforms like Apify with their ready-made actors and built-in infrastructure can save you weeks of development and ongoing maintenance time.
Whether you're tracking prices, doing market research, or building a product comparison tool, the techniques in this guide should give you a solid foundation. Start small, test your scraper thoroughly, and scale up gradually as you validate your approach.
Happy scraping!
Looking for ready-made scraping solutions? Check out the Apify Store for pre-built actors that handle Etsy and hundreds of other websites out of the box.