If you're sourcing wholesale products from Yiwugo.com, you'll quickly realize that manually downloading product images is tedious and time-consuming. Each product listing has multiple high-resolution images, and copying them one by one doesn't scale.
In this tutorial, I'll show you how to automate product image extraction from Yiwugo.com and prepare them for your e-commerce store.
Why Extract Images from Yiwugo?
Yiwugo.com (义乌购) is China's largest wholesale marketplace, with millions of products. When you're building an e-commerce store or dropshipping business, you need:
- High-quality product images for your listings
- Multiple angles of each product
- Batch processing to handle hundreds of products
- CDN optimization for fast loading
Manual downloading doesn't work at scale. Automation does.
What You'll Learn
- How to scrape product image URLs from Yiwugo
- How to batch download images efficiently
- How to optimize images for web (compression, resizing)
- How to integrate with CDN services (optional)
Prerequisites
- Basic Python knowledge
- An Apify account (free tier works)
- Node.js installed (for the scraper)
Step 1: Get Product Image URLs
First, we need to extract image URLs from Yiwugo product pages. The easiest way is to use the Yiwugo Scraper on Apify Store.
Using the Scraper
// Run via Apify API (Node.js)
const { ApifyClient } = require('apify-client');

const client = new ApifyClient({
    token: 'YOUR_APIFY_TOKEN',
});

const input = {
    startUrls: [
        { url: 'https://www.yiwugo.com/search?keyword=backpack' }
    ],
    maxItems: 50,
};

// await needs an async context in CommonJS, so wrap the calls
(async () => {
    const run = await client.actor('jungle_intertwining/yiwugo-scraper').call(input);
    const { items } = await client.dataset(run.defaultDatasetId).listItems();
    console.log(`Scraped ${items.length} products`);
})();
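If you prefer to stay in Python for the whole pipeline, the Apify Python client exposes the same actor. Here's a minimal sketch, assuming the same actor ID and input shape as the Node.js example above (install with pip install apify-client):

```python
def scrape_yiwugo(token: str, keyword: str, max_items: int = 50) -> list:
    """Run the Yiwugo scraper actor and return the scraped dataset items."""
    # Imported here so the function is only a hard dependency when called
    from apify_client import ApifyClient  # pip install apify-client

    client = ApifyClient(token)
    run_input = {
        'startUrls': [{'url': f'https://www.yiwugo.com/search?keyword={keyword}'}],
        'maxItems': max_items,
    }
    # Start the actor run and wait for it to finish
    run = client.actor('jungle_intertwining/yiwugo-scraper').call(run_input=run_input)
    # Fetch the results from the run's default dataset
    return client.dataset(run['defaultDatasetId']).list_items().items

# items = scrape_yiwugo('YOUR_APIFY_TOKEN', 'backpack')
```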
Sample Output
Each product includes an images array:
{
"title": "Fashion Backpack",
"price": "¥45.00",
"images": [
"https://cbu01.alicdn.com/img/ibank/O13_1234567890.jpg",
"https://cbu01.alicdn.com/img/ibank/O1CN01def456_0987654321.jpg",
"https://cbu01.alicdn.com/img/ibank/O1CN01ghi789_1122334455.jpg"
],
"url": "https://www.yiwugo.com/item/12345.html"
}
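The scraped items don't include an explicit id field, but the downloader in the next step expects one. A small helper can derive it from the product URL. This is a sketch based on the /item/12345.html pattern in the sample output above:

```python
import re

def to_download_jobs(items):
    """Normalize scraped items into {'id', 'images'} dicts for batch downloading."""
    products = []
    for item in items:
        # Pull the numeric ID out of URLs like https://www.yiwugo.com/item/12345.html
        match = re.search(r'/item/(\d+)', item.get('url', ''))
        product_id = match.group(1) if match else 'unknown'
        products.append({'id': product_id, 'images': item.get('images', [])})
    return products

jobs = to_download_jobs([{
    'title': 'Fashion Backpack',
    'url': 'https://www.yiwugo.com/item/12345.html',
    'images': ['https://cbu01.alicdn.com/img/ibank/O1CN01abc123.jpg'],
}])
print(jobs[0]['id'])  # → 12345
```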
Step 2: Batch Download Images
Now let's download all images efficiently using Python:
import os
import requests
from concurrent.futures import ThreadPoolExecutor
from urllib.parse import urlparse
import hashlib
def download_image(url, product_id, index):
    """Download a single image with error handling"""
    try:
        response = requests.get(url, timeout=10)
        response.raise_for_status()
        # Generate filename from URL hash (avoid duplicates)
        url_hash = hashlib.md5(url.encode()).hexdigest()[:8]
        ext = os.path.splitext(urlparse(url).path)[1] or '.jpg'
        filename = f"{product_id}_{index}_{url_hash}{ext}"
        filepath = os.path.join('images', filename)
        os.makedirs('images', exist_ok=True)
        with open(filepath, 'wb') as f:
            f.write(response.content)
        print(f"✓ Downloaded: {filename}")
        return filepath
    except Exception as e:
        print(f"✗ Failed {url}: {e}")
        return None
def batch_download(products, max_workers=10):
    """Download all images from multiple products in parallel"""
    tasks = []
    for product in products:
        product_id = product.get('id', 'unknown')
        images = product.get('images', [])
        for idx, img_url in enumerate(images):
            tasks.append((img_url, product_id, idx))
    print(f"Downloading {len(tasks)} images from {len(products)} products...")
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        results = executor.map(lambda t: download_image(*t), tasks)
    downloaded = [r for r in results if r]
    print(f"\n✓ Downloaded {len(downloaded)}/{len(tasks)} images")
    return downloaded
# Example usage
products = [
{
'id': '12345',
'images': [
'https://cbu01.alicdn.com/img/ibank/O1CN01abc123.jpg',
'https://cbu01.alicdn.com/img/ibank/O1CN01def456.jpg'
]
},
# ... more products
]
batch_download(products)
Key features:
- Parallel downloads (10 concurrent threads)
- Error handling (skips failed downloads)
- Duplicate prevention (URL hash in filename)
- Progress tracking
Step 3: Optimize Images for Web
Raw images from Yiwugo are often large (1-3 MB each). Let's compress and resize them:
from PIL import Image
import os
def optimize_image(filepath, max_width=800, quality=85):
    """Compress and resize image for web"""
    try:
        img = Image.open(filepath)
        # Convert RGBA to RGB if needed
        if img.mode == 'RGBA':
            img = img.convert('RGB')
        # Resize if too large
        if img.width > max_width:
            ratio = max_width / img.width
            new_height = int(img.height * ratio)
            img = img.resize((max_width, new_height), Image.LANCZOS)
        # Save with compression
        optimized_path = filepath.replace('images/', 'images/optimized_')
        img.save(optimized_path, 'JPEG', quality=quality, optimize=True)
        original_size = os.path.getsize(filepath) / 1024
        optimized_size = os.path.getsize(optimized_path) / 1024
        saved = ((original_size - optimized_size) / original_size) * 100
        print(f"✓ Optimized: {os.path.basename(filepath)} "
              f"({original_size:.1f}KB → {optimized_size:.1f}KB, -{saved:.1f}%)")
        return optimized_path
    except Exception as e:
        print(f"✗ Failed to optimize {filepath}: {e}")
        return None

# Optimize all downloaded images
image_files = [f for f in os.listdir('images') if f.endswith(('.jpg', '.png'))]
for img_file in image_files:
    optimize_image(os.path.join('images', img_file))
Typical results:
- Original: 1.2 MB → Optimized: 180 KB (85% reduction)
- Page load time: 3s → 0.5s
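If your store targets modern browsers, converting to WebP usually shrinks files further than JPEG at comparable visual quality. Here's a minimal sketch using Pillow; quality=80 is an assumption you should tune for your catalog:

```python
from PIL import Image

def to_webp(filepath, quality=80):
    """Convert an image to WebP, saved alongside the original."""
    img = Image.open(filepath)
    # WebP in RGB mode; flatten any alpha channel first
    if img.mode == 'RGBA':
        img = img.convert('RGB')
    webp_path = filepath.rsplit('.', 1)[0] + '.webp'
    img.save(webp_path, 'WEBP', quality=quality)
    return webp_path
```

Keep the JPEG versions around as a fallback for clients that don't accept WebP.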
Step 4: Upload to CDN (Optional)
For production e-commerce stores, serve images from a CDN:
Using Cloudflare Images
import requests
def upload_to_cloudflare(filepath, account_id, api_token):
    """Upload image to Cloudflare Images"""
    url = f"https://api.cloudflare.com/client/v4/accounts/{account_id}/images/v1"
    headers = {
        'Authorization': f'Bearer {api_token}'
    }
    with open(filepath, 'rb') as f:
        files = {'file': f}
        response = requests.post(url, headers=headers, files=files)
    if response.status_code == 200:
        data = response.json()
        cdn_url = data['result']['variants'][0]
        print(f"✓ Uploaded: {cdn_url}")
        return cdn_url
    else:
        print(f"✗ Upload failed: {response.text}")
        return None
Using AWS S3
import boto3
def upload_to_s3(filepath, bucket_name, s3_key):
    """Upload image to AWS S3"""
    s3 = boto3.client('s3')
    with open(filepath, 'rb') as f:
        s3.upload_fileobj(
            f,
            bucket_name,
            s3_key,
            ExtraArgs={'ContentType': 'image/jpeg', 'ACL': 'public-read'}
        )
    cdn_url = f"https://{bucket_name}.s3.amazonaws.com/{s3_key}"
    print(f"✓ Uploaded: {cdn_url}")
    return cdn_url
Complete Pipeline Script
Here's the full pipeline:
import os
import requests
from PIL import Image
from concurrent.futures import ThreadPoolExecutor
import hashlib
def extract_and_optimize_images(products, output_dir='images'):
    """Complete pipeline: download → optimize → return CDN-ready URLs"""
    os.makedirs(output_dir, exist_ok=True)
    os.makedirs(f"{output_dir}/optimized", exist_ok=True)
    results = []
    for product in products:
        product_id = product.get('id', 'unknown')
        images = product.get('images', [])
        product_images = []
        for idx, img_url in enumerate(images):
            # Download
            try:
                response = requests.get(img_url, timeout=10)
                response.raise_for_status()
                url_hash = hashlib.md5(img_url.encode()).hexdigest()[:8]
                filename = f"{product_id}_{idx}_{url_hash}.jpg"
                filepath = os.path.join(output_dir, filename)
                with open(filepath, 'wb') as f:
                    f.write(response.content)
                # Optimize
                img = Image.open(filepath)
                if img.mode == 'RGBA':
                    img = img.convert('RGB')
                if img.width > 800:
                    ratio = 800 / img.width
                    new_height = int(img.height * ratio)
                    img = img.resize((800, new_height), Image.LANCZOS)
                optimized_path = os.path.join(f"{output_dir}/optimized", filename)
                img.save(optimized_path, 'JPEG', quality=85, optimize=True)
                product_images.append({
                    'original': filepath,
                    'optimized': optimized_path,
                    'url': img_url
                })
                print(f"✓ Processed: {filename}")
            except Exception as e:
                print(f"✗ Failed {img_url}: {e}")
        results.append({
            'product_id': product_id,
            'images': product_images
        })
    return results
# Example usage
products = [
{
'id': '12345',
'title': 'Fashion Backpack',
'images': [
'https://cbu01.alicdn.com/img/ibank/O1CN01abc123.jpg',
'https://cbu01.alicdn.com/img/ibank/O1CN01def456.jpg'
]
}
]
results = extract_and_optimize_images(products)
print(f"\n✓ Processed {len(results)} products")
Real-World Use Cases
1. Dropshipping Store Setup
- Scrape 500 products from Yiwugo
- Download and optimize all images
- Upload to Shopify/WooCommerce
- Time saved: 40 hours → 2 hours
2. Price Comparison Website
- Extract images from multiple suppliers
- Standardize image sizes (800x800)
- Serve from CDN for fast loading
- Result: 3x faster page load
3. Product Catalog Generation
- Batch download 10,000+ product images
- Auto-generate thumbnails (200x200)
- Create image galleries
- Storage saved: 15 GB → 2 GB (optimized)
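The thumbnail step from the catalog use case above can be sketched with Pillow's thumbnail helper, which downscales in place while preserving aspect ratio (200x200 is the bounding box from the list, not an exact output size):

```python
import os
from PIL import Image

def make_thumbnail(filepath, size=(200, 200)):
    """Create a size-bounded thumbnail next to the original image."""
    img = Image.open(filepath)
    if img.mode == 'RGBA':
        img = img.convert('RGB')
    # thumbnail() shrinks in place and keeps the aspect ratio,
    # so the result fits inside `size` rather than matching it exactly
    img.thumbnail(size, Image.LANCZOS)
    base, ext = os.path.splitext(filepath)
    thumb_path = f"{base}_thumb{ext}"
    img.save(thumb_path, 'JPEG', quality=85)
    return thumb_path
```

For example, an 800x600 source produces a 200x150 thumbnail.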
Best Practices
- Respect rate limits - Don't hammer Yiwugo's servers (use delays)
- Handle errors gracefully - Some images may be deleted or moved
- Check image licenses - Ensure you have rights to use the images
- Optimize for mobile - Use responsive images (srcset)
- Cache aggressively - Set long cache headers on CDN
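The rate-limit advice above can be implemented with a small throttling decorator. This is a minimal sketch: the 0.5-second interval is an arbitrary starting point, and for production you'd likely add jitter and exponential backoff on errors:

```python
import time

def throttled(min_interval=0.5):
    """Decorator that enforces a minimum delay between calls."""
    def decorator(func):
        last_call = [0.0]  # mutable cell so the wrapper can update it
        def wrapper(*args, **kwargs):
            elapsed = time.monotonic() - last_call[0]
            if elapsed < min_interval:
                time.sleep(min_interval - elapsed)
            last_call[0] = time.monotonic()
            return func(*args, **kwargs)
        return wrapper
    return decorator

@throttled(min_interval=0.5)
def polite_get(url):
    """A rate-limited replacement for requests.get in the download loop."""
    import requests
    return requests.get(url, timeout=10)
```

Note that throttling only helps if you also reduce max_workers in the downloader; ten parallel threads each sleeping 0.5 s still make twenty requests per second.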
Troubleshooting
Images won't download
- Check if URL is still valid (Yiwugo sometimes removes old images)
- Verify your IP isn't blocked (use proxies if needed)
- Increase timeout (some images are large)
Optimization fails
- Install Pillow: pip install Pillow
- Check image format (some WebP images need conversion)
- Ensure enough disk space
CDN upload errors
- Verify API credentials
- Check file size limits (Cloudflare: 10 MB max)
- Ensure correct content-type headers
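A quick pre-flight size check avoids burning API calls on uploads that are doomed to fail. A sketch, using the 10 MB Cloudflare limit mentioned above:

```python
import os

MAX_CLOUDFLARE_BYTES = 10 * 1024 * 1024  # 10 MB Cloudflare Images limit

def fits_cloudflare(filepath):
    """Return True if the file is under Cloudflare Images' size limit."""
    return os.path.getsize(filepath) <= MAX_CLOUDFLARE_BYTES
```

Run this before upload_to_cloudflare and route oversized files back through the optimizer first.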
Next Steps
- Automate the pipeline - Run daily to sync new products
- Add watermarks - Protect your curated image library
- Generate variants - Create thumbnails, zoom views, etc.
- Track performance - Monitor CDN bandwidth and costs
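The watermarking idea from the list above can be sketched with Pillow's ImageDraw. The text position, opacity, and default font here are all assumptions; a production version would typically composite a logo image instead:

```python
from PIL import Image, ImageDraw

def add_watermark(filepath, text='yourstore.com'):
    """Stamp semi-transparent text near the bottom-right corner."""
    img = Image.open(filepath).convert('RGBA')
    # Draw on a transparent overlay so the alpha blends with the photo
    overlay = Image.new('RGBA', img.size, (0, 0, 0, 0))
    draw = ImageDraw.Draw(overlay)
    x, y = img.width - 150, img.height - 30  # rough bottom-right margin
    draw.text((x, y), text, fill=(255, 255, 255, 128))
    watermarked = Image.alpha_composite(img, overlay).convert('RGB')
    out_path = filepath.rsplit('.', 1)[0] + '_wm.jpg'
    watermarked.save(out_path, 'JPEG', quality=85)
    return out_path
```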
Try It Yourself
Get started with the Yiwugo Scraper on Apify Store. The free tier includes 100 scraping runs per month.
GitHub Example: yiwugo-scraper-example
Questions? Drop a comment below or check out these related articles:
- How to Build a Wholesale Product Price Comparison Tool with Yiwugo Data
- How to Monitor Yiwugo Product Prices Automatically
- Automating Supplier Discovery: A Python Script for Yiwugo.com
- The Complete Guide to China Wholesale Data Scraping
📦 Also check out: DHgate Scraper — Extract DHgate product data for dropshipping research.
- Made-in-China Scraper — Extract B2B product data, supplier info, and MOQ from Made-in-China.com