Automating Supplier Discovery: A Python Script for Yiwugo.com
Finding reliable wholesale suppliers on Yiwugo.com manually is tedious. You search, scroll, compare prices, check minimum order quantities — and repeat for every product category. What if you could automate the entire process?
In this tutorial, I'll walk you through building a Python script that automatically discovers suppliers on Yiwugo.com based on your criteria, ranks them by relevance, and exports a clean report you can act on.
What We're Building
A command-line tool that:
- Searches Yiwugo.com for products by keyword
- Extracts supplier details (name, location, product count, price range)
- Scores and ranks suppliers based on configurable criteria
- Exports results to CSV for easy analysis
Prerequisites
- Python 3.8+
- An Apify account (free tier works)
- The Yiwugo Scraper actor: apify.com/jungle_intertwining/yiwugo-scraper
Install the required packages:
pip install apify-client pandas
Step 1: Set Up the Scraper Client
First, create a wrapper that calls the Yiwugo Scraper on Apify:
# supplier_discovery.py
import os
from apify_client import ApifyClient
import pandas as pd
from collections import defaultdict

APIFY_TOKEN = os.environ.get("APIFY_TOKEN", "your-token-here")
ACTOR_ID = "jungle_intertwining/yiwugo-scraper"

client = ApifyClient(APIFY_TOKEN)

def search_products(keyword: str, max_items: int = 50) -> list[dict]:
    """Search Yiwugo.com for products matching a keyword."""
    run = client.actor(ACTOR_ID).call(
        run_input={
            "keyword": keyword,
            "maxItems": max_items,
        }
    )
    dataset = client.dataset(run["defaultDatasetId"])
    return list(dataset.iterate_items())
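Each item the scraper returns is a plain dict. The pipeline below relies on fields like `shopName`, `title`, and `price`; the sample values here are made up for illustration, and it's worth filtering out items that are missing the fields you need before profiling:

```python
# Hypothetical sample item, mirroring the fields this tutorial relies on.
sample_item = {
    "title": "LED Strip Light 5m RGB",
    "price": "¥12.50-45.00",
    "shopName": "义乌市明辉照明",
    "area": "浙江 义乌市",
    "minOrderQuantity": "100",
}

REQUIRED_FIELDS = ("title", "price", "shopName")

def validate_items(items: list[dict]) -> list[dict]:
    """Keep only items that carry the fields the pipeline needs."""
    return [it for it in items if all(it.get(f) for f in REQUIRED_FIELDS)]

# An item with an empty title is dropped; the complete sample survives.
valid = validate_items([sample_item, {"title": ""}])
print(len(valid))  # 1
```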
Step 2: Extract Supplier Profiles
Each product listing on Yiwugo includes supplier information. We aggregate this to build supplier profiles:
def build_supplier_profiles(products: list[dict]) -> list[dict]:
    """Aggregate product data into supplier profiles."""
    suppliers = defaultdict(lambda: {
        "name": "",
        "location": "",
        "products": [],
        "price_min": float("inf"),
        "price_max": 0,
        "total_listings": 0,
    })
    for product in products:
        supplier_name = product.get("shopName", "Unknown")
        if not supplier_name or supplier_name == "Unknown":
            continue
        s = suppliers[supplier_name]
        s["name"] = supplier_name
        s["location"] = product.get("area", "N/A")
        s["total_listings"] += 1
        s["products"].append({
            "title": product.get("title", ""),
            "price": product.get("price", ""),
            "moq": product.get("minOrderQuantity", ""),
        })
        # Track price range
        try:
            price = float(str(product.get("price", "0")).replace("¥", "").split("-")[0].strip())
            s["price_min"] = min(s["price_min"], price)
            s["price_max"] = max(s["price_max"], price)
        except (ValueError, IndexError):
            pass
    # Clean up infinity values for suppliers with no parseable prices
    result = []
    for data in suppliers.values():
        if data["price_min"] == float("inf"):
            data["price_min"] = 0
        result.append(data)
    return result
Step 3: Score and Rank Suppliers
Not all suppliers are equal. We score them based on factors that matter for wholesale sourcing:
def score_suppliers(
    suppliers: list[dict],
    weights: dict | None = None,
) -> list[dict]:
    """Score suppliers based on configurable criteria.

    Default weights:
    - listing_count: More products listed = more established
    - price_competitiveness: Lower average price = better deal
    - location_bonus: Yiwu-based suppliers get a bonus (closer to market)
    """
    if weights is None:
        weights = {
            "listing_count": 0.4,
            "price_competitiveness": 0.35,
            "location_bonus": 0.25,
        }
    if not suppliers:
        return []
    max_listings = max(s["total_listings"] for s in suppliers)
    # Baseline for price competitiveness: the highest average price seen
    avg_prices = [(sup["price_min"] + sup["price_max"]) / 2 for sup in suppliers]
    positive_prices = [p for p in avg_prices if p > 0]
    max_price = max(positive_prices) if positive_prices else 0
    for s in suppliers:
        score = 0
        # Listing count score (more = better), normalized to 0-100
        if max_listings > 0:
            listing_score = (s["total_listings"] / max_listings) * 100
            score += listing_score * weights["listing_count"]
        # Price competitiveness (lower = better): inverse scoring
        avg_price = (s["price_min"] + s["price_max"]) / 2 if s["price_max"] > 0 else 0
        if avg_price > 0 and max_price > 0:
            price_score = ((max_price - avg_price) / max_price) * 100
            score += price_score * weights["price_competitiveness"]
        # Location bonus (Yiwu-based suppliers)
        location = s.get("location", "").lower()
        if "义乌" in location or "yiwu" in location:
            score += 100 * weights["location_bonus"]
        s["score"] = round(score, 1)
    return sorted(suppliers, key=lambda x: x["score"], reverse=True)
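To get a feel for how the weights interact, here is the same scoring math restated on two toy profiles (supplier names made up, not a call into the function above). A Yiwu-based shop with fewer listings but lower prices can still outrank a larger competitor:

```python
# Toy profiles illustrating the weight trade-offs.
suppliers = [
    {"name": "Yiwu Shop A", "total_listings": 4, "price_min": 3.0, "price_max": 9.0, "location": "yiwu"},
    {"name": "Shop B", "total_listings": 10, "price_min": 8.0, "price_max": 20.0, "location": "hangzhou"},
]
weights = {"listing_count": 0.4, "price_competitiveness": 0.35, "location_bonus": 0.25}

max_listings = max(s["total_listings"] for s in suppliers)
avg_prices = [(s["price_min"] + s["price_max"]) / 2 for s in suppliers]
max_price = max(avg_prices)

for s, avg in zip(suppliers, avg_prices):
    # Listing count, normalized against the biggest shop
    score = (s["total_listings"] / max_listings) * 100 * weights["listing_count"]
    # Inverse price score: the cheapest shop scores highest
    score += ((max_price - avg) / max_price) * 100 * weights["price_competitiveness"]
    # Flat bonus for Yiwu-based suppliers
    if "yiwu" in s["location"]:
        score += 100 * weights["location_bonus"]
    s["score"] = round(score, 1)

for s in sorted(suppliers, key=lambda x: x["score"], reverse=True):
    print(s["name"], s["score"])
# Yiwu Shop A 61.0
# Shop B 40.0
```

Shop B maxes out the listing score (40 points) but gets nothing for price or location, while Shop A's price edge and Yiwu bonus carry it to first place.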
Step 4: Export Results
Generate a clean CSV report:
def export_to_csv(suppliers: list[dict], filename: str = "suppliers.csv"):
    """Export ranked suppliers to CSV."""
    rows = []
    for rank, s in enumerate(suppliers, 1):
        rows.append({
            "Rank": rank,
            "Supplier": s["name"],
            "Location": s["location"],
            "Score": s["score"],
            "Listings Found": s["total_listings"],
            "Price Range (¥)": f"{s['price_min']:.2f} - {s['price_max']:.2f}",
            "Sample Products": " | ".join(
                p["title"][:40] for p in s["products"][:3]
            ),
        })
    df = pd.DataFrame(rows)
    # utf-8-sig writes a BOM so Excel renders Chinese supplier names correctly
    df.to_csv(filename, index=False, encoding="utf-8-sig")
    print(f"Exported {len(rows)} suppliers to {filename}")
    return df
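Once the report exists, you can load it back with pandas to slice it further. A quick sketch (column names match the export step; the rows here are made-up stand-ins for a real export, and the filename is hypothetical):

```python
import pandas as pd

# Stand-in for a real export from export_to_csv().
df = pd.DataFrame([
    {"Rank": 1, "Supplier": "Shop A", "Location": "浙江 义乌市", "Score": 82.3, "Listings Found": 8},
    {"Rank": 2, "Supplier": "Shop B", "Location": "浙江 金华市", "Score": 45.0, "Listings Found": 2},
])
df.to_csv("suppliers_demo.csv", index=False, encoding="utf-8-sig")

# Round-trip with the same encoding, then shortlist by score and listing count.
loaded = pd.read_csv("suppliers_demo.csv", encoding="utf-8-sig")
shortlist = loaded[(loaded["Score"] >= 60) & (loaded["Listings Found"] >= 3)]
print(shortlist["Supplier"].tolist())  # ['Shop A']
```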
Step 5: Put It All Together
def discover_suppliers(
    keyword: str,
    max_items: int = 50,
    top_n: int = 20,
    output_file: str = "suppliers.csv",
):
    """Full pipeline: search → profile → score → export."""
    print(f"Searching Yiwugo.com for '{keyword}'...")
    products = search_products(keyword, max_items)
    print(f"Found {len(products)} product listings")

    print("Building supplier profiles...")
    suppliers = build_supplier_profiles(products)
    print(f"Identified {len(suppliers)} unique suppliers")

    print("Scoring and ranking...")
    ranked = score_suppliers(suppliers)
    top = ranked[:top_n]
    df = export_to_csv(top, output_file)

    # Print summary
    print(f"\nTop {len(top)} Suppliers for '{keyword}':")
    print("-" * 60)
    for rank, s in enumerate(top[:5], 1):
        print(f"  #{rank} {s['name']} (Score: {s['score']})")
        print(f"     📍 {s['location']} | {s['total_listings']} listings | ¥{s['price_min']:.0f}-{s['price_max']:.0f}")
    return df

if __name__ == "__main__":
    import sys
    keyword = sys.argv[1] if len(sys.argv) > 1 else "LED lights"
    discover_suppliers(keyword, max_items=100, top_n=20)
Running the Script
export APIFY_TOKEN="your-apify-token"
python supplier_discovery.py "LED lights"
Output:
Searching Yiwugo.com for 'LED lights'...
Found 87 product listings
Building supplier profiles...
Identified 34 unique suppliers
Scoring and ranking...
Exported 20 suppliers to suppliers.csv
Top 5 Suppliers for 'LED lights':
------------------------------------------------------------
  #1 义乌市明辉照明 (Score: 82.3)
     📍 浙江 义乌市 | 8 listings | ¥3-45
  #2 金华市光源电子 (Score: 71.5)
     📍 浙江 金华市 | 5 listings | ¥5-38
...
Extending the Script
A few ideas to take this further:
- Multi-keyword search: Loop through a list of product categories and merge results
- Supplier comparison: Search the same keyword periodically and track which suppliers appear consistently (they're likely more established)
- Price alerts: Run the script on a schedule and flag suppliers whose prices drop below a threshold
- Integration with CRM: Push top suppliers directly into your sourcing pipeline
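The first idea, multi-keyword search, mostly comes down to merging product lists before profiling. A minimal sketch (the results dict below stubs out what per-keyword `search_products` calls would return, so this runs without network access):

```python
def merge_keyword_results(results_by_keyword: dict[str, list[dict]]) -> list[dict]:
    """Flatten per-keyword product lists into one, tagging each product
    with the keyword it came from. Duplicate listings (same shop + title)
    are kept only once, under the first keyword that found them."""
    seen = set()
    merged = []
    for keyword, products in results_by_keyword.items():
        for p in products:
            key = (p.get("shopName"), p.get("title"))
            if key in seen:
                continue
            seen.add(key)
            merged.append({**p, "keyword": keyword})
    return merged

# Stubbed results standing in for real search_products() calls.
results = {
    "LED lights": [{"shopName": "Shop A", "title": "LED strip"}],
    "LED strip": [
        {"shopName": "Shop A", "title": "LED strip"},  # duplicate listing
        {"shopName": "Shop B", "title": "LED panel"},
    ],
}
merged = merge_keyword_results(results)
print(len(merged))  # 2 unique listings
```

The merged list can then flow straight into `build_supplier_profiles`, so suppliers appearing across multiple categories accumulate listings in one profile.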
The Scraper Behind This
This tutorial uses the Yiwugo Scraper on Apify Store. It handles all the complexity of navigating Yiwugo.com's Chinese interface, pagination, and anti-scraping measures — so you can focus on the business logic.
Key features:
- Keyword and category search
- Configurable result limits
- Structured JSON output with 20+ fields per product
- Runs on Apify's cloud infrastructure (no local setup needed)
👉 Try it free: apify.com/jungle_intertwining/yiwugo-scraper
📚 Related: If you're scraping Chinese e-commerce platforms beyond Yiwugo, check out Scraping Chinese E-commerce Sites: Challenges and Solutions for a deep dive into anti-bot systems, encoding issues, and proven workarounds.
Building tools for wholesale sourcing automation. Follow me for more tutorials on e-commerce data extraction.
📦 Also check out:
- DHgate Scraper — Extract DHgate product data for dropshipping research
- Made-in-China Scraper — Extract B2B product data, supplier info, and MOQ from Made-in-China.com