DEV Community

wfgsss

Automating Supplier Discovery: A Python Script for Yiwugo.com

Finding reliable wholesale suppliers on Yiwugo.com manually is tedious. You search, scroll, compare prices, check minimum order quantities — and repeat for every product category. What if you could automate the entire process?

In this tutorial, I'll walk you through building a Python script that automatically discovers suppliers on Yiwugo.com based on your criteria, ranks them by relevance, and exports a clean report you can act on.

What We're Building

A command-line tool that:

  1. Searches Yiwugo.com for products by keyword
  2. Extracts supplier details (name, location, product count, price range)
  3. Scores and ranks suppliers based on configurable criteria
  4. Exports results to CSV for easy analysis

Prerequisites

You need Python 3.10 or later (the type hints use the dict | None syntax) and an Apify API token. Install the required packages:

pip install apify-client pandas

Step 1: Set Up the Scraper Client

First, create a wrapper that calls the Yiwugo Scraper on Apify:

# supplier_discovery.py
import os
import json
from apify_client import ApifyClient
import pandas as pd
from collections import defaultdict

APIFY_TOKEN = os.environ.get("APIFY_TOKEN", "your-token-here")
ACTOR_ID = "jungle_intertwining/yiwugo-scraper"

client = ApifyClient(APIFY_TOKEN)


def search_products(keyword: str, max_items: int = 50) -> list[dict]:
    """Search Yiwugo.com for products matching a keyword."""
    run = client.actor(ACTOR_ID).call(
        run_input={
            "keyword": keyword,
            "maxItems": max_items,
        }
    )
    dataset = client.dataset(run["defaultDatasetId"])
    return list(dataset.iterate_items())
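The client setup above falls back to a placeholder token, so a missing token only surfaces once the Actor call fails. A minimal sketch of failing earlier with a clearer message follows. The helper name require_token is hypothetical and is not part of apify-client.

```python
import os


def require_token(env=None, var_name="APIFY_TOKEN"):
    """Return the API token from the environment or raise a clear error.

    require_token is a hypothetical helper for this tutorial, not an Apify API.
    """
    env = os.environ if env is None else env
    token = (env.get(var_name) or "").strip()
    # Treat the tutorial's placeholder value the same as a missing token.
    if not token or token == "your-token-here":
        raise RuntimeError(
            f"Set the {var_name} environment variable before running the script."
        )
    return token
```

With this in place, the client could be created as ApifyClient(require_token()) instead of relying on the placeholder default.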

Step 2: Extract Supplier Profiles

Each product listing on Yiwugo includes supplier information. We aggregate this to build supplier profiles:

def build_supplier_profiles(products: list[dict]) -> list[dict]:
    """Aggregate product data into supplier profiles."""
    suppliers = defaultdict(lambda: {
        "name": "",
        "location": "",
        "products": [],
        "price_min": float("inf"),
        "price_max": 0,
        "total_listings": 0,
    })

    for product in products:
        supplier_name = product.get("shopName", "Unknown")
        if not supplier_name or supplier_name == "Unknown":
            continue

        s = suppliers[supplier_name]
        s["name"] = supplier_name
        s["location"] = product.get("area", "N/A")
        s["total_listings"] += 1
        s["products"].append({
            "title": product.get("title", ""),
            "price": product.get("price", ""),
            "moq": product.get("minOrderQuantity", ""),
        })

        # Track price range
        try:
            price = float(str(product.get("price", "0")).replace("¥", "").split("-")[0].strip())
            s["price_min"] = min(s["price_min"], price)
            s["price_max"] = max(s["price_max"], price)
        except (ValueError, IndexError):
            pass

    # Clean up infinity values
    result = []
    for data in suppliers.values():
        if data["price_min"] == float("inf"):
            data["price_min"] = 0
        result.append(data)

    return result
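The inline price parsing above keeps the lower bound of a range such as "¥3.50-4.20". Pulled out into its own function, it becomes easier to test. This is only a sketch: parse_price is a hypothetical name, and the formats it handles (a ¥ prefix, thousands separators, a hyphenated range) are assumptions about what the scraper returns.

```python
def parse_price(raw):
    """Return the lower bound of a price string as a float, or None if unparseable."""
    if raw is None:
        return None
    # Strip the currency symbol and thousands separators (assumed formats).
    text = str(raw).replace("¥", "").replace(",", "").strip()
    if not text:
        return None
    # For a range like "3.50-4.20", keep only the lower bound.
    lower = text.split("-")[0].strip()
    try:
        return float(lower)
    except ValueError:
        return None
```

Returning None for a missing price, rather than defaulting to 0 as the loop above does, would keep listings without a price from dragging a supplier's price_min down to zero.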

Step 3: Score and Rank Suppliers

Not all suppliers are equal. We score them based on factors that matter for wholesale sourcing:

def score_suppliers(
    suppliers: list[dict],
    weights: dict | None = None,
) -> list[dict]:
    """Score suppliers based on configurable criteria.

    Default weights:
    - listing_count: More products listed = more established
    - price_competitiveness: Lower average price = better deal
    - location_bonus: Yiwu-based suppliers get a bonus (closer to market)
    """
    if weights is None:
        weights = {
            "listing_count": 0.4,
            "price_competitiveness": 0.35,
            "location_bonus": 0.25,
        }

    if not suppliers:
        return []

    # Normalize listing counts (0-100 scale)
    max_listings = max(s["total_listings"] for s in suppliers)

    for s in suppliers:
        score = 0

        # Listing count score (more = better)
        if max_listings > 0:
            listing_score = (s["total_listings"] / max_listings) * 100
            score += listing_score * weights["listing_count"]

        # Price competitiveness (lower = better)
        avg_price = (s["price_min"] + s["price_max"]) / 2 if s["price_max"] > 0 else 0
        if avg_price > 0:
            # Inverse scoring: lower price = higher score
            price_scores = []
            for sup in suppliers:
                p = (sup["price_min"] + sup["price_max"]) / 2
                if p > 0:
                    price_scores.append(p)
            if price_scores:
                max_price = max(price_scores)
                price_score = ((max_price - avg_price) / max_price) * 100 if max_price > 0 else 50
                score += price_score * weights["price_competitiveness"]

        # Location bonus (Yiwu-based suppliers)
        location = s.get("location", "").lower()
        if "义乌" in location or "yiwu" in location:
            score += 100 * weights["location_bonus"]

        s["score"] = round(score, 1)

    return sorted(suppliers, key=lambda x: x["score"], reverse=True)
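To see how the three components combine, here is a small worked example that recomputes a single supplier's score with the default weights. The function weighted_score is a hypothetical restatement of the arithmetic in score_suppliers, and the input numbers are made up for illustration.

```python
def weighted_score(listings, max_listings, avg_price, max_price, is_yiwu, weights):
    """Recompute one supplier's score from three components on a 0-100 scale."""
    listing_score = listings / max_listings * 100 if max_listings > 0 else 0.0
    # Cheaper than the most expensive supplier means a higher price score.
    if avg_price > 0 and max_price > 0:
        price_score = (max_price - avg_price) / max_price * 100
    else:
        price_score = 0.0
    location_score = 100.0 if is_yiwu else 0.0
    total = (
        listing_score * weights["listing_count"]
        + price_score * weights["price_competitiveness"]
        + location_score * weights["location_bonus"]
    )
    return round(total, 1)


weights = {"listing_count": 0.4, "price_competitiveness": 0.35, "location_bonus": 0.25}
# 8 of 8 listings                     -> 100 * 0.40 = 40.0
# average price 24, maximum price 30  ->  20 * 0.35 =  7.0
# Yiwu-based                          -> 100 * 0.25 = 25.0
example = weighted_score(8, 8, 24.0, 30.0, True, weights)
print(example)  # 72.0
```

Note that the most expensive supplier in a result set always receives a price score of 0, so with few suppliers the price component can swing rankings noticeably.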

Step 4: Export Results

Generate a clean CSV report:

def export_to_csv(suppliers: list[dict], filename: str = "suppliers.csv"):
    """Export ranked suppliers to CSV."""
    rows = []
    for rank, s in enumerate(suppliers, 1):
        rows.append({
            "Rank": rank,
            "Supplier": s["name"],
            "Location": s["location"],
            "Score": s["score"],
            "Listings Found": s["total_listings"],
            "Price Range (¥)": f"{s['price_min']:.2f} - {s['price_max']:.2f}",
            "Sample Products": " | ".join(
                p["title"][:40] for p in s["products"][:3]
            ),
        })

    df = pd.DataFrame(rows)
    df.to_csv(filename, index=False, encoding="utf-8-sig")
    print(f"Exported {len(rows)} suppliers to {filename}")
    return df
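Because the report is written with the utf-8-sig encoding (which prepends a byte order mark so spreadsheet tools detect UTF-8), it should be read back with the same encoding, otherwise the first column name picks up a stray "\ufeff" prefix. The sketch below reads a report with the standard csv module and keeps only the higher-scoring rows. The function load_report and the sample rows are hypothetical.

```python
import csv
import io


def load_report(handle, min_score=0.0):
    """Read rows from an exported report, keeping those at or above min_score."""
    reader = csv.DictReader(handle)
    return [row for row in reader if float(row["Score"]) >= min_score]


# In practice: open("suppliers.csv", newline="", encoding="utf-8-sig")
# The in-memory sample below uses made-up rows so the sketch runs on its own.
sample = io.StringIO(
    "Rank,Supplier,Location,Score\n"
    "1,Shop A,Yiwu,82.3\n"
    "2,Shop B,Jinhua,71.5\n"
)
strong = load_report(sample, min_score=75)
```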

Step 5: Put It All Together

def discover_suppliers(
    keyword: str,
    max_items: int = 50,
    top_n: int = 20,
    output_file: str = "suppliers.csv",
):
    """Full pipeline: search → profile → score → export."""
    print(f"Searching Yiwugo.com for '{keyword}'...")
    products = search_products(keyword, max_items)
    print(f"Found {len(products)} product listings")

    print("Building supplier profiles...")
    suppliers = build_supplier_profiles(products)
    print(f"Identified {len(suppliers)} unique suppliers")

    print("Scoring and ranking...")
    ranked = score_suppliers(suppliers)

    top = ranked[:top_n]
    df = export_to_csv(top, output_file)

    # Print a summary of the first five ranked suppliers
    preview = top[:5]
    print(f"\nTop {len(preview)} Suppliers for '{keyword}':")
    print("-" * 60)
    for rank, s in enumerate(preview, 1):
        print(f"  #{rank} {s['name']} (Score: {s['score']})")
        print(f"     📍 {s['location']} | {s['total_listings']} listings | ¥{s['price_min']:.0f}-{s['price_max']:.0f}")

    return df


if __name__ == "__main__":
    import sys

    keyword = sys.argv[1] if len(sys.argv) > 1 else "LED lights"
    discover_suppliers(keyword, max_items=100, top_n=20)
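The entry point above reads only the keyword from sys.argv and hard-codes max_items and top_n. If you want those configurable from the command line, argparse from the standard library is one option. This is a sketch: the flag names --max-items, --top-n, and --output are my own choices, not part of the original script.

```python
import argparse


def build_parser():
    """Build a CLI parser mirroring the parameters of discover_suppliers."""
    parser = argparse.ArgumentParser(
        description="Discover and rank Yiwugo.com suppliers."
    )
    parser.add_argument("keyword", nargs="?", default="LED lights",
                        help="search keyword")
    parser.add_argument("--max-items", type=int, default=100,
                        help="maximum product listings to fetch")
    parser.add_argument("--top-n", type=int, default=20,
                        help="number of suppliers to export")
    parser.add_argument("--output", default="suppliers.csv",
                        help="CSV file to write")
    return parser


# Parse an explicit argument list here; pass no list to read sys.argv instead.
args = build_parser().parse_args(["phone cases", "--top-n", "10"])
# discover_suppliers(args.keyword, args.max_items, args.top_n, args.output)
```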

Running the Script

export APIFY_TOKEN="your-apify-token"
python supplier_discovery.py "LED lights"

Output:

Searching Yiwugo.com for 'LED lights'...
Found 87 product listings
Building supplier profiles...
Identified 34 unique suppliers
Scoring and ranking...
Exported 20 suppliers to suppliers.csv

Top 5 Suppliers for 'LED lights':
------------------------------------------------------------
  #1 义乌市明辉照明 (Score: 82.3)
     📍 浙江 义乌市 | 8 listings | ¥3-45
  #2 金华市光源电子 (Score: 71.5)
     📍 浙江 金华市 | 5 listings | ¥5-38
  ...

Extending the Script

A few ideas to take this further:

  • Multi-keyword search: Loop through a list of product categories and merge results
  • Supplier comparison: Search the same keyword periodically and track which suppliers appear consistently (they're likely more established)
  • Price alerts: Run the script on a schedule and flag suppliers whose prices drop below a threshold
  • Integration with CRM: Push top suppliers directly into your sourcing pipeline
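As a starting point for the multi-keyword idea, the sketch below runs a search function once per keyword and merges the results without duplicates. The names search_many and fake_search are hypothetical, and treating the (shopName, title) pair as a unique key is an assumption about the scraper output.

```python
def search_many(keywords, search_fn, max_items=50):
    """Run search_fn for each keyword and merge the results without duplicates."""
    seen = set()
    merged = []
    for keyword in keywords:
        for product in search_fn(keyword, max_items):
            # Assumed unique key: the same shop listing the same title.
            key = (product.get("shopName"), product.get("title"))
            if key in seen:
                continue
            seen.add(key)
            merged.append(product)
    return merged


def fake_search(keyword, max_items):
    """Stand-in for search_products so the sketch runs without network access."""
    catalog = {
        "LED lights": [{"shopName": "Shop A", "title": "LED strip"}],
        "LED strips": [
            {"shopName": "Shop A", "title": "LED strip"},
            {"shopName": "Shop B", "title": "LED strip"},
        ],
    }
    return catalog.get(keyword, [])[:max_items]


products = search_many(["LED lights", "LED strips"], fake_search)
```

In the real script you would pass search_products from Step 1 as search_fn and feed the merged list into build_supplier_profiles.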

The Scraper Behind This

This tutorial uses the Yiwugo Scraper on Apify Store. It handles all the complexity of navigating Yiwugo.com's Chinese interface, pagination, and anti-scraping measures — so you can focus on the business logic.

Key features:

  • Keyword and category search
  • Configurable result limits
  • Structured JSON output with 20+ fields per product
  • Runs on Apify's cloud infrastructure (no local setup needed)

👉 Try it free: apify.com/jungle_intertwining/yiwugo-scraper

📚 Related: If you're scraping Chinese e-commerce platforms beyond Yiwugo, check out Scraping Chinese E-commerce Sites: Challenges and Solutions for a deep dive into anti-bot systems, encoding issues, and proven workarounds.


Building tools for wholesale sourcing automation. Follow me for more tutorials on e-commerce data extraction.

📦 Also check out: DHgate Scraper — Extract DHgate product data for dropshipping research.
