wfgsss

Posted on Feb 15

How to Scrape Made-in-China.com for B2B Product Data

#webscraping #ecommerce #javascript #b2b

If you're sourcing products from China for your business, Made-in-China.com is one of the biggest B2B platforms out there — over 21 million products from verified Chinese suppliers. But manually browsing through thousands of listings to compare prices, MOQs, and suppliers? That's a full-time job nobody wants.

In this guide, I'll show you how to build a scraper that extracts structured product data from Made-in-China.com, ready for analysis or integration into your sourcing workflow.

What Data Can You Extract?

Each product listing on Made-in-China.com contains rich B2B data:

Product name and detail URL
Price (usually a range like US$16.00-25.00)
Minimum Order Quantity (MOQ) — critical for B2B buyers
Supplier name and profile URL
Supplier location (province, China)
Business type (Trading Company, Manufacturer, etc.)
Product images
Audited supplier status (SGS/TÜV verified)

This is significantly more B2B-focused data than what you'd get from consumer platforms like DHgate or AliExpress.

How Made-in-China.com Works (Technical Overview)

Good news: Made-in-China.com uses server-side rendering (SSR). Unlike SPAs that require a headless browser, search result pages return complete HTML that you can parse directly with an HTTP client.

The URL structure is straightforward:

# First page of results
https://www.made-in-china.com/products-search/hot-china-products/Bluetooth_Speaker.html

# Page 2+
https://www.made-in-china.com/products-search/find-china-products/0b0nolimit/Bluetooth_Speaker-2.html

Keywords use underscores instead of spaces. Page 1 uses a different URL pattern than subsequent pages — a common pattern on older B2B sites.

Anti-Scraping Measures

Made-in-China.com uses FCaptcha (not Cloudflare) to protect certain pages:

✅ Search result pages — accessible via direct HTTP requests
❌ Product detail pages — protected by captcha
❌ Supplier profile pages — protected by captcha

For most B2B sourcing use cases, search result data is sufficient. You get prices, MOQs, and supplier info without needing to hit detail pages.

Building the Scraper

Here's a working scraper using Crawlee (CheerioCrawler) that extracts product data from search results:

Project Setup

mkdir made-in-china-scraper && cd made-in-china-scraper
npm init -y
npm install apify crawlee cheerio

Core Extraction Logic

The key is knowing which CSS selectors to target. Here's the extraction function:

function extractProducts($, keyword) {
    const products = [];

    $('.prod-list .prod-item, .product-list .product-item, .J-offer-wrapper')
      .each((i, el) => {
        const $el = $(el);

        // Product name & URL
        const $nameLink = $el.find('h2.product-name a, .product-name a').first();
        const productName = ($nameLink.attr('title') || $nameLink.text() || '').trim();
        const productUrl = $nameLink.attr('href') || '';
        if (!productName) return;

        // Price (handles range format like US$16.00-25.00)
        const $price = $el.find('.price-new .price, .price-new strong');
        let price = '';
        if ($price.length) {
            const spans = $price.find('span');
            if (spans.length >= 2) {
                price = `US$${$(spans[0]).text().trim()}-${$(spans[1]).text().trim()}`;
            } else {
                price = $price.text().trim();
            }
        }

        // Minimum Order Quantity
        const $moq = $el.find('.moq-new');
        let moq = '';
        if ($moq.length) {
            const moqSpan = $moq.find('.attribute span').first();
            moq = (moqSpan.text() || $moq.text() || '')
                  .replace(/\(MOQ\)/i, '').trim();
        }

        // Supplier info
        const $company = $el.find('.company-name .compnay-name').first();
        const supplierName = ($company.attr('title') || $company.text() || '').trim();
        const supplierUrl = $company.attr('href') || '';
        const supplierLocation = $el.find('.company-address-info').text().trim();
        const businessType = $el.find('.business-type-info').text().trim();

        // Product image
        const $img = $el.find('img[data-original], img.lazy').first();
        let imageUrl = $img.attr('data-original') || $img.attr('src') || '';
        if (imageUrl && !imageUrl.startsWith('http')) {
            imageUrl = `https:${imageUrl}`;
        }

        // Audited supplier badge
        const auditedSupplier = $el.find('[data-title*="Audited"], .as-info').length > 0;

        products.push({
            productName, productUrl, price, moq,
            supplierName, supplierUrl, supplierLocation,
            businessType, auditedSupplier, imageUrl,
            searchKeyword: keyword,
        });
    });

    return products;
}

Full Scraper with Pagination

const { Actor } = require('apify');
const { CheerioCrawler } = require('crawlee');

const SEARCH_URL = 'https://www.made-in-china.com/products-search/hot-china-products';
const SEARCH_PAGE_URL = 'https://www.made-in-china.com/products-search/find-china-products/0b0nolimit';

function buildSearchUrl(keyword, page) {
    const slug = keyword.replace(/\s+/g, '_');
    if (page === 1) return `${SEARCH_URL}/${slug}.html`;
    return `${SEARCH_PAGE_URL}/${slug}-${page}.html`;
}

Actor.main(async () => {
    const input = await Actor.getInput();
    const { searchKeywords = ['bluetooth speaker'], maxPages = 3 } = input || {};

    const dataset = await Actor.openDataset();
    const proxyConfiguration = await Actor.createProxyConfiguration({
        groups: ['RESIDENTIAL'],
    }).catch(() => Actor.createProxyConfiguration().catch(() => undefined));

    const requests = [];
    for (const keyword of searchKeywords) {
        for (let page = 1; page <= maxPages; page++) {
            requests.push({
                url: buildSearchUrl(keyword, page),
                userData: { keyword, page },
            });
        }
    }

    let totalScraped = 0;

    const crawler = new CheerioCrawler({
        proxyConfiguration,
        maxConcurrency: 1,
        maxRequestRetries: 3,
        requestHandlerTimeoutSecs: 60,
        preNavigationHooks: [
            (ctx) => {
                ctx.request.headers = {
                    'Accept': 'text/html,application/xhtml+xml',
                    'Accept-Language': 'en-US,en;q=0.9',
                    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36',
                };
            },
        ],
        async requestHandler({ request, body, $ }) {
            const { keyword, page } = request.userData;
            const html = typeof body === 'string' ? body : body.toString();

            if (html.includes('fcaptcha') || html.includes('captcha.vemic.com')) {
                console.log(`Captcha on page ${page} for "${keyword}", skipping`);
                return;
            }

            const products = extractProducts($, keyword);
            if (products.length === 0) return;

            const enriched = products.map((p, idx) => ({
                ...p,
                pageNum: page,
                position: (page - 1) * 20 + idx + 1,
            }));

            await dataset.pushData(enriched);
            totalScraped += products.length;
            console.log(`"${keyword}" p${page}: ${products.length} products (total: ${totalScraped})`);
        },
    });

    await crawler.run(requests);
    console.log(`Done! ${totalScraped} products scraped.`);
});

Sample Output

{
  "productName": "2025 Hopestar P49 Bluetooth 5.3 Speaker Portable Wireless Subwoofer",
  "productUrl": "https://colpoint.en.made-in-china.com/product/...",
  "price": "US$16.00-25.00",
  "moq": "2 Pieces",
  "supplierName": "Colpoint Technology Limited",
  "supplierUrl": "https://colpoint.en.made-in-china.com",
  "supplierLocation": "Guangdong, China",
  "businessType": "Trading Company",
  "auditedSupplier": true,
  "imageUrl": "https://image.made-in-china.com/43f34j00...",
  "searchKeyword": "bluetooth speaker",
  "pageNum": 1,
  "position": 1
}
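
Since price and MOQ come back as strings, you'll probably want numbers before doing any comparison. Here's a minimal sketch of how that could look. The helpers parsePriceRange and parseMoq are hypothetical names of my own, not part of the scraper above, and they assume the formats shown in the sample ("US$16.00-25.00" and "2 Pieces"). Listings in other formats may need extra handling.

```javascript
// Hypothetical post-processing helpers (not part of the original scraper).
// Assumption: prices look like "US$16.00-25.00" or "US$16.00".
function parsePriceRange(price) {
    // Drop thousands separators, then collect every decimal number in the string.
    const numbers = (String(price).replace(/,/g, '').match(/\d+(?:\.\d+)?/g) || [])
        .map(Number);
    if (numbers.length === 0) return { min: null, max: null };
    // A single price yields min === max.
    return { min: numbers[0], max: numbers[numbers.length - 1] };
}

// Assumption: MOQ strings look like "2 Pieces" or "1,000 Pieces".
function parseMoq(moq) {
    const match = String(moq).trim().match(/^(\d[\d,]*(?:\.\d+)?)\s*(.*)$/);
    if (!match) return { quantity: null, unit: '' };
    return {
        quantity: Number(match[1].replace(/,/g, '')),
        unit: match[2].trim(),
    };
}

console.log(parsePriceRange('US$16.00-25.00')); // { min: 16, max: 25 }
console.log(parseMoq('2 Pieces'));              // { quantity: 2, unit: 'Pieces' }
```

Returning null for anything that doesn't parse keeps bad rows easy to filter out later instead of silently turning them into zeros.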

Tips for Reliable Scraping

Keep concurrency low. Made-in-China.com triggers captchas based on request frequency. maxConcurrency: 1 with natural delays keeps you under the radar.

Use residential proxies. Datacenter IPs get flagged faster. Apify's residential proxy group works well here.

Handle captcha gracefully. Don't retry captcha pages aggressively — log and move on. You'll still get data from most pages.

Stick to search results. Product detail pages are heavily protected. For B2B sourcing, search result data (price, MOQ, supplier, location) covers 80% of what buyers need.
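
To make the first and third tips concrete, here's a rough sketch of two small helpers. The captcha markers are the same two strings the scraper checks for. Everything else is an assumption of mine: the names isCaptchaPage, jitteredDelayMs, and sleep are hypothetical, the 2-5 second window is just a starting point rather than a tested threshold, and the check is case-insensitive, which is slightly looser than the version in the scraper.

```javascript
// Markers taken from the captcha check in the scraper's request handler.
const CAPTCHA_MARKERS = ['fcaptcha', 'captcha.vemic.com'];

// Returns true when the HTML looks like a captcha page rather than results.
// Assumption: matching case-insensitively is acceptable here.
function isCaptchaPage(html) {
    const text = String(html).toLowerCase();
    return CAPTCHA_MARKERS.some((marker) => text.includes(marker));
}

// Hypothetical helper: picks a random wait between minMs and maxMs (inclusive).
// The random source is injectable so the function can be tested deterministically.
function jitteredDelayMs(minMs, maxMs, random = Math.random) {
    return Math.floor(minMs + random() * (maxMs - minMs + 1));
}

// Promise-based pause for use with await.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Usage inside a request handler (sketch):
//   if (isCaptchaPage(html)) {
//       console.log('Captcha hit, skipping this page');
//       return; // log and move on, no aggressive retry
//   }
//   await sleep(jitteredDelayMs(2000, 5000)); // assumed delay window
```

Randomizing the pause avoids a perfectly regular request rhythm, which is the kind of "natural delay" the first tip is getting at.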

Use It Without Coding

If you'd rather skip the setup, I've published this as a ready-to-use tool on Apify Store:

👉 Made-in-China.com Scraper on Apify Store

Just enter your keywords, set the number of pages, and run. Results export to JSON, CSV, or Excel.

The China Wholesale Scraping Toolkit

This is the third tool in a series covering major Chinese wholesale platforms:

Yiwugo Scraper — Small commodity wholesale market data
DHgate Scraper — Cross-border e-commerce / dropshipping data
Made-in-China.com Scraper — B2B industrial products and bulk sourcing data

Together, they cover the full spectrum of China wholesale sourcing — from small MOQ consumer goods to large-scale industrial procurement.

DEV Community