If your brand sells on JD.com (China's #2 e-commerce platform, ~600M annual active users) — or competes against one that does — there's a gray-market problem you can't see without one specific field in JD's data.
That field is isJdSelfRun. It tells you whether a given product listing is fulfilled by JD itself (their warehouses, their warranty, their return logistics) or by a third-party merchant on JD's marketplace. Combined with the seller's sellerType (flagship / franchise / specialty / self-run), it's the single cleanest signal for detecting unauthorized resellers on Chinese e-commerce — and almost no generic scraper surfaces it.
This post walks through:
- Why JD's hybrid retail model creates the gray-market detection opportunity
- The exact field signatures (`isJdSelfRun`, `sellerType`) and what they mean
- Three concrete workflows: brand authorization audit, competitive pricing, gray-market detection
- A 50-line Python integration with the Apify Actor I built around this
- Honest cost math at indie scale and at hedge-fund scale
If you don't want to read the whole thing: the Actor is at zhorex/jd-scraper, and pricing is $0.008 per product detail + $0.02 per seller store record (pay-per-event, no subscription).
The hybrid retail model that creates the signal
JD.com is structurally different from Tmall and Pinduoduo. Tmall is a marketplace — every SKU is sold by a third-party merchant; Alibaba just runs the platform. JD operates a hybrid: a meaningful chunk of its catalog is sold and shipped by JD itself (JD Logistics, JD Plus warranty, JD's own returns), with the rest fulfilled by marketplace merchants.
That hybrid creates an information asymmetry buyers can exploit. When a consumer searches a brand's SKU on JD, they see all listings — but the trust signal comes from whether it's JD-self-run or a third-party. For a brand monitoring team, the question becomes: of the third-party listings of my SKU, which are authorized resellers and which are gray-market?
The data answers it in two fields:
```json
{
  "productId": "100009082476",
  "sellerName": "Apple产品京东自营旗舰店",
  "isJdSelfRun": true,
  "sellerId": "1000003566",
  ...
}
```
isJdSelfRun: true means JD is the seller. The other listings — those with isJdSelfRun: false — are where the gray-market questions live, and where you need the seller's type to decide.
The seller type enum
A separate scrape against the seller store endpoint resolves to one of four values:
```json
{
  "sellerId": "1000003566",
  "sellerType": "flagship_store",
  "serviceScore": 4.9,
  "logisticsScore": 4.8,
  "descriptionAccuracyScore": 4.9,
  ...
}
```
- `flagship_store` (官方旗舰店) — the brand's own JD store. There should be exactly one per brand. If you see multiple, you have a counterfeit-or-impersonator problem.
- `franchise_store` (品牌专营店) — an authorized franchise of the brand. Brands typically maintain a list of these.
- `specialty_store` (专卖店) — a third party that specializes in selling the brand. Often authorized via a distribution agreement; sometimes gray-market.
- `jd_self_run` (京东自营) — JD's direct retail. Always legitimate.
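The four enum values reduce to a small triage table. Here's a minimal sketch of that mapping as code — `AUTHORIZED_FRANCHISES` is a hypothetical allowlist your brand team would maintain, not something the Actor returns:

```python
# Hypothetical allowlist of franchise seller IDs your brand has authorized.
AUTHORIZED_FRANCHISES = {"1000003566"}

def triage(seller_id: str, seller_type: str) -> str:
    """Map a seller record to an audit action based on the sellerType enum."""
    if seller_type in ("jd_self_run", "flagship_store"):
        return "ok"        # JD direct retail or the brand's own store
    if seller_type == "franchise_store":
        return "ok" if seller_id in AUTHORIZED_FRANCHISES else "audit"
    if seller_type == "specialty_store":
        return "audit"     # authorized-or-gray: always verify
    return "unknown"       # defensive default in case new enum values appear
```

(This treats every `specialty_store` as needing review, which matches the post's framing: they're the listings where gray-market risk concentrates.)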
The canonical gray-market signature: a flagship_store listing alongside three specialty_store listings priced 20-40% lower on the same SKU. Those specialty stores are usually moving inventory acquired outside the authorized channel (parallel imports, diverted product, refurbished-as-new). They're flagged the moment your monitoring sees the price gap.
Three workflows the data unlocks
Workflow 1 — Brand authorization audit
Submit your SKU IDs. Get back a record per listing with sellerType resolved. Filter to entries where isJdSelfRun: false AND sellerType is not in your authorized list. That's your unauthorized reseller list, refreshed on whatever cadence you want.
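The filter described above is a one-liner. A sketch, using the field names from the sample records earlier — `authorized_seller_ids` is your own allowlist, an input I'm assuming you maintain:

```python
def unauthorized_listings(listings: list[dict], authorized_seller_ids: set[str]) -> list[dict]:
    """Return listings that are neither JD self-run nor on the brand's allowlist."""
    return [
        l for l in listings
        if not l["isJdSelfRun"] and l["sellerId"] not in authorized_seller_ids
    ]
```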
A small brand watching 50 SKUs across 200 listings (4 sellers per SKU on average) costs about $4.40 per refresh: 200 seller records × $0.02 = $4.00, plus 50 product details × $0.008 = $0.40 ($4.00 if you skip product detail and only check sellers).
Workflow 2 — Competitive pricing intelligence
The product detail mode returns a realtimePrice field that is fetched fresh at scrape time, not parsed from cached HTML. JD runs flash discounts that move prices within hours; cached scrapers miss them entirely.
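One way to catch those intraday moves is to diff each run's `realtimePrice` against a snapshot of the previous run. A sketch, assuming you persist the snapshot yourself (a plain dict here; a JSON file or small DB in practice):

```python
def price_moves(current: list[dict], last_run: dict[str, float],
                threshold: float = 0.05) -> list[tuple]:
    """Flag SKUs whose realtimePrice moved more than `threshold` since the last run.

    Mutates `last_run` in place so it's ready for the next refresh.
    """
    moves = []
    for item in current:
        pid, price = item["productId"], float(item["realtimePrice"])
        prev = last_run.get(pid)
        if prev and abs(price - prev) / prev >= threshold:
            moves.append((pid, prev, price))
        last_run[pid] = price  # update snapshot for the next run
    return moves
```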
Tracking 200 competitor SKUs hourly = 200 × 24 × 30 = 144,000 detail records per month, $1,152 in raw event cost. At hedge-fund-grade refresh rates this is real money, but it's the right order of magnitude for the buyer cohort that already pays $3K-15K/month for alt-data feeds.
Tracking 200 SKUs daily (more realistic for a brand team) = 6,000 records × $0.008 = $48/month. Cheap enough to run as a cron.
Workflow 3 — Gray-market detection at scale
The canonical pattern in code:
```python
for product in products:
    # Gray-market risk lives in the third-party listings, so drop JD's own
    listings = [l for l in product["allListings"] if not l["isJdSelfRun"]]
    cheap_specialty = [
        l for l in listings
        if l["sellerType"] == "specialty_store"
        and l["price"] < product["msrp"] * 0.80
    ]
    if len(cheap_specialty) >= 3 and any(l["sellerType"] == "flagship_store" for l in listings):
        alert(product, cheap_specialty)  # your alerting hook: Slack, email, dashboard
```
That's the brand-monitoring signal: a real flagship store coexisting with three or more sub-MSRP specialty stores on the same SKU. Brand teams pay agencies five-figure annual contracts to surface exactly this kind of alert; running it yourself on this data feed costs cents per check.
The 50-line Python integration
Here's the working integration end-to-end. Replace YOUR_TOKEN with your Apify API token:
```python
from apify_client import ApifyClient

client = ApifyClient("YOUR_TOKEN")

# Scrape product details for your SKU list
run = client.actor("zhorex/jd-scraper").call(run_input={
    "mode": "product_detail",
    "productUrls": [
        "https://item.jd.com/100009082476.html",
        "https://item.jd.com/100012345678.html",
    ],
})

unauthorized_sellers = set()
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    if not item["isJdSelfRun"]:
        unauthorized_sellers.add(item["sellerId"])
        print(
            f"Non-self-run listing: {item['productTitle']}\n"
            f"  Seller: {item['sellerName']} (id {item['sellerId']})\n"
            f"  Price: {item['realtimePrice']}\n"
        )

# Now drill into those sellers to classify them
if unauthorized_sellers:
    seller_run = client.actor("zhorex/jd-scraper").call(run_input={
        "mode": "seller_store",
        "sellerUrls": [f"https://mall.jd.com/index-{sid}.html" for sid in unauthorized_sellers],
    })
    for seller in client.dataset(seller_run["defaultDatasetId"]).iterate_items():
        flag = "⚠️ AUDIT" if seller["sellerType"] == "specialty_store" else "✅"
        print(f"{flag} {seller['sellerName']} → {seller['sellerType']} (service: {seller['serviceScore']})")
```
That's the whole audit. Two API calls, classified output, ready to feed into Slack alerts / spreadsheet exports / BI dashboards.
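For the Slack leg specifically, you don't need an SDK — an incoming webhook takes a plain JSON POST. A minimal stdlib sketch; the webhook URL is a placeholder you'd create in your own Slack workspace:

```python
import json
import urllib.request

def slack_payload(text: str) -> bytes:
    """Build the JSON body for a Slack incoming-webhook message."""
    return json.dumps({"text": text}).encode("utf-8")

def post_to_slack(text: str, webhook_url: str) -> None:
    """Send one plain-text message to a Slack incoming webhook."""
    req = urllib.request.Request(
        webhook_url,
        data=slack_payload(text),
        headers={"Content-Type": "application/json"},
    )
    urllib.request.urlopen(req)

# e.g. post_to_slack("⚠️ AUDIT: 3 specialty stores under 80% of MSRP on SKU 100009082476",
#                    "https://hooks.slack.com/services/YOUR/WEBHOOK/URL")
```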
Honest pricing — what does this cost in production?
| Workflow | Volume / month | Cost / month |
|---|---|---|
| Brand watchlist — 50 SKUs daily | 1,500 product details | ~$12 |
| Brand authorization audit | 500 sellers, monthly | ~$10 |
| Competitive pricing — 200 SKUs daily | 6,000 product details | ~$48 |
| Competitive pricing — 200 SKUs hourly | 144,000 product details | ~$1,152 |
| Gray-market sweep — 200 SKUs + 50 sellers | 200 details + 50 sellers | ~$2.60 |
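Every row in that table comes from the same two unit prices quoted at the top of the post, so your own volume is a one-line estimate. A sketch (prices hard-coded from this post; check the Actor page for current rates):

```python
PRICE_PRODUCT_DETAIL = 0.008  # USD per product-detail event (per this post)
PRICE_SELLER_STORE = 0.02     # USD per seller-store event (per this post)

def monthly_cost(details_per_day: int = 0, sellers_per_month: int = 0,
                 days: int = 30) -> float:
    """Estimate monthly event cost in USD for a given refresh volume."""
    return round(
        details_per_day * days * PRICE_PRODUCT_DETAIL
        + sellers_per_month * PRICE_SELLER_STORE,
        2,
    )

# 200 SKUs daily:  monthly_cost(details_per_day=200)  -> 48.0
# 200 SKUs hourly: monthly_cost(details_per_day=4800) -> 1152.0
```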
Indie brand teams typically run the daily/monthly workflows ($10-60/month). Hedge-fund alt-data and agency-scale customers run hourly or 15-minute refreshes (low four figures monthly). Both work on the same Actor with the same event-priced billing.
What this Actor doesn't do
Two honesty disclosures:
- No search discovery. You bring the SKU list. Discovery requires a different scraping pattern that doesn't survive shared residential proxy pools the way product detail and seller store do.
- No review scraping. Same reason — JD's WAF gates the review API at the IP-reputation level on shared pools. If you need review sentiment, the Apify Store has other scrapers, or contact me for a premium-proxy integration.
The README on the Actor page documents this in a "Known limitations" section. If your workflow needs either, this Actor isn't the right tool.
Why I built this
I run a portfolio of six Chinese-platform scrapers on Apify Store (zhorex). Five of them cover sentiment and content: Weibo for trending, RedNote (Xiaohongshu) for lifestyle, Bilibili for video, Douban for long-form reviews, Xueqiu for stock-cashtag discussion. The JD scraper extends the suite into commerce — the missing layer for buyers who already use the social ones for brand monitoring.
The six together are a stack. A consumer-electronics brand can track sentiment on Weibo, video reviews on Bilibili, lifestyle unboxings on RedNote, and gray-market resellers on JD — all on the same vendor, same billing, same API surface.
Try it
The Actor is live: zhorex/jd-scraper. Pay-per-event billing — no subscription, no setup fee. Run a small evaluation batch (the Apify Free plan includes monthly platform credit you can apply to the run) to confirm output quality on your SKU list before scaling up.
The rest of the Chinese Digital Intelligence Suite:
- Weibo Scraper — pair with JD to catch when a SKU trends socially before stock-outs hit
- RedNote Scraper — Chinese lifestyle unboxings; useful for fashion, beauty, baby, home brands
- Bilibili Scraper — video reviews; especially valuable for tech and consumer electronics SKUs
- Xueqiu Scraper — Chinese retail-investor sentiment; pair if you trade JD stock (NASDAQ:JD) alongside operational metrics
- Douban Scraper — long-form film / book / music reviews; less relevant for commerce but useful for IP / entertainment teams
If you ship a brand-monitoring workflow on top of any of these, drop a comment with what you're tracking. If this saved you the time of building an integration from scratch, a heart on the post or a follow keeps these writeups coming.