Sami

Posted on May 20

Synthesio charges $36K+/year for Chinese platform coverage. I built one for $0.045/mention.

#python #webscraping #china #analytics

Synthesio sells Chinese platform coverage for $36K+/year. Brandwatch and Meltwater sit in roughly the same $24K-80K/year band. I built an Apify Actor that does the equivalent core job — Weibo, RedNote, Bilibili, Douban, Xueqiu — for $0.045 per deduplicated mention, billed pay-as-you-go.

If you've ever tried to DIY this, you know the math. Five Chinese platforms mean five different parsers, five different rate-limit dances, five different schema-drift surprises every couple of weeks, and zero deduplication when a KOL (key opinion leader, i.e. an influencer) reposts the same content across all of them. By the time you've normalized author identity, follower counts, and timestamps into a usable cross-platform record, you've built a small distributed system that breaks every other Tuesday.

The pitch for zhorex/chinese-brand-monitor is simple: one API call, one normalized schema, one pay-per-event (PPE) charge per canonical mention. You pass a brand keyword (Chinese or English) and get back deduplicated records with sentiment scores and reach signals across all five platforms. You don't write per-platform code. You don't run five cron jobs. You don't pay an enterprise floor.

This post walks through six concrete workflows with runnable Python — brand health, crisis monitoring, KOL discovery, hedge fund alt-data, AI training corpora, and a cross-tool finance signal — so you can decide if this fits your stack.

What it does

The Actor takes a single brand keyword (or a list of keywords) and returns deduplicated, sentiment-scored mentions from five Chinese platforms:

Weibo — China's largest microblog; broad consumer chatter
RedNote / Xiaohongshu (小红书) — lifestyle and product discovery; heavy DTC signal
Bilibili — long-form video community; strong Gen-Z signal
Douban — long-form reviews, especially media and lifestyle
Xueqiu (雪球) — retail investor chatter, cashtag-tracked stock sentiment

The Actor handles:

Single keyword input — Chinese 护肤 or English Estée Lauder both work
Normalized cross-platform schema — same fields on every record, no per-platform parsing in your downstream code
Lexicon-based Chinese sentiment scoring per mention (polarity + score)
Cross-platform deduplication — when the same KOL reposts identical content on Weibo and RedNote, you get one canonical record with crossPlatformReposts listing the other appearances
Author identity normalization with follower count for reach-weighted analysis
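To make the deduplication idea concrete, here is a minimal sketch of how cross-platform reposts could be collapsed into one canonical record. It is an illustration only, not the Actor's actual algorithm: the normalization rule, the hashing choice, and the helper names `normalize_text` and `collapse_reposts` are assumptions, and the sample rows and URLs are made up. Only the `crossPlatformReposts` field name comes from the output schema described in this post.

```python
import hashlib
import re
import unicodedata


def normalize_text(text: str) -> str:
    """Hypothetical normalization: NFKC-fold, lowercase, drop whitespace and punctuation."""
    text = unicodedata.normalize("NFKC", text).lower()
    return re.sub(r"[\W_]+", "", text)


def collapse_reposts(mentions: list[dict]) -> list[dict]:
    """Keep the earliest post per (author name, normalized content); attach the rest as reposts."""
    canonical: dict[str, dict] = {}
    # ISO 8601 UTC timestamps in the same format sort correctly as plain strings.
    for m in sorted(mentions, key=lambda m: m["publishedAt"]):
        key_src = m["authorName"] + "|" + normalize_text(m["content"])
        key = hashlib.sha256(key_src.encode("utf-8")).hexdigest()
        if key not in canonical:
            canonical[key] = {**m, "crossPlatformReposts": []}
        else:
            canonical[key]["crossPlatformReposts"].append(
                {"platform": m["platform"], "url": m["url"], "publishedAt": m["publishedAt"]}
            )
    return list(canonical.values())


# Made-up sample rows: same author, same text apart from spacing and punctuation.
raw = [
    {"platform": "weibo", "authorName": "小琳护肤日记", "content": "小棕瓶 用了三个月！",
     "url": "https://weibo.com/example", "publishedAt": "2026-05-18T15:02:00Z"},
    {"platform": "rednote", "authorName": "小琳护肤日记", "content": "小棕瓶用了三个月!",
     "url": "https://www.xiaohongshu.com/example", "publishedAt": "2026-05-18T14:23:11Z"},
]
deduped = collapse_reposts(raw)
print(len(deduped), deduped[0]["platform"], len(deduped[0]["crossPlatformReposts"]))
```

In this toy version the earlier RedNote post becomes the canonical record and the Weibo copy is attached as a repost. Real reposts are rarely byte-identical, so a production matcher would likely need fuzzier comparison than an exact hash.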

Engineering choices worth knowing: a browser-grade HTTP client, polite rate limiting, session warming, and a public-data scope that respects each platform's accessible surface. The point is that you don't have to think about any of that — you call the Actor, you get records.

Six concrete workflows

a) Brand health dashboard (~$135/mo)

Daily 8am cron, single brand, 7-day rolling lookback. Push to Looker, Metabase, or a Notion database. Compare this to a $4K/mo Synthesio seat for the same functional coverage.

from apify_client import ApifyClient
import pandas as pd

client = ApifyClient("YOUR_APIFY_TOKEN")

run_input = {
    "brandKeyword": "Estée Lauder",
    "platforms": ["weibo", "rednote", "bilibili", "douban", "xueqiu"],
    "lookbackDays": 7,
    "maxMentionsPerPlatform": 600,
    "sentimentAnalysis": True,
    "deduplication": True,
}

run = client.actor("zhorex/chinese-brand-monitor").call(run_input=run_input)
items = list(client.dataset(run["defaultDatasetId"]).iterate_items())

df = pd.DataFrame(items)
df["polarity"] = df["sentiment"].apply(lambda s: s["polarity"])
summary = (
    df.groupby(["platform", "polarity"])
      .agg(mentions=("mentionId", "count"),
           reach=("authorFollowerCount", "sum"))
      .reset_index()
)
print(summary)

The grouped DataFrame is what you push to your BI tool. ~3,000 deduplicated mentions/month at this cadence lands around $135 in PPE charges.
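The per-platform counts above treat every mention equally. As one possible extension, here's a sketch of a reach-weighted net sentiment score built from the same fields (`sentiment.polarity` and `authorFollowerCount`). The function name, the use of follower count as a proxy for reach, and the `"neutral"` polarity value in the sample are illustrative assumptions, not something the Actor computes or documents.

```python
def reach_weighted_net_sentiment(mentions: list[dict]) -> float:
    """Return (positive reach - negative reach) / total reach, a value in [-1, 1]."""
    signs = {"positive": 1, "negative": -1}  # any other polarity contributes 0
    total = sum(m["authorFollowerCount"] for m in mentions)
    if total == 0:
        return 0.0
    signed = sum(
        signs.get(m["sentiment"]["polarity"], 0) * m["authorFollowerCount"]
        for m in mentions
    )
    return signed / total


# Made-up sample: one large positive account, two smaller ones.
sample = [
    {"authorFollowerCount": 180_000, "sentiment": {"polarity": "positive"}},
    {"authorFollowerCount": 15_000, "sentiment": {"polarity": "negative"}},
    {"authorFollowerCount": 5_000, "sentiment": {"polarity": "neutral"}},
]
score = reach_weighted_net_sentiment(sample)
print(round(score, 3))  # 0.825
```

A single number like this is easy to chart day over day, but it is dominated by the largest accounts, so it is best read alongside the raw counts rather than instead of them.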

b) Crisis monitoring (~$270/mo)

Hourly cron, 1-day lookback, filter for negative polarity from accounts above 10K followers. Slack webhook fires on match. This is the workflow that justifies the spend during a product recall, a CEO quote going viral, or a competitor smear campaign.

import requests
from apify_client import ApifyClient

client = ApifyClient("YOUR_APIFY_TOKEN")
SLACK_WEBHOOK = "https://hooks.slack.com/services/XXX/YYY/ZZZ"

run = client.actor("zhorex/chinese-brand-monitor").call(run_input={
    "brandKeyword": "Estée Lauder",
    "platforms": ["weibo", "rednote", "bilibili", "douban", "xueqiu"],
    "lookbackDays": 1,
    "maxMentionsPerPlatform": 200,
    "sentimentAnalysis": True,
})

for m in client.dataset(run["defaultDatasetId"]).iterate_items():
    if m["sentiment"]["polarity"] == "negative" and m["authorFollowerCount"] >= 10000:
        requests.post(SLACK_WEBHOOK, json={
            "text": (
                f"[{m['platform']}] {m['authorName']} "
                f"({m['authorFollowerCount']:,} followers) — "
                f"sentiment {m['sentiment']['score']:.2f}\n"
                f"{m['contentSnippet']}\n{m['url']}"
            )
        }, timeout=10)  # fail fast instead of letting a stuck webhook hang the cron job

Hourly runs (24 × 30 = 720 per month) come to roughly 6,000 deduplicated mentions/month if the brand has steady chatter, or about $270/mo. Cheap insurance for a comms team.
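One thing to watch with an hourly cron and a 1-day lookback: the same mention can appear in up to 24 consecutive runs, so a naive loop would re-post it to Slack each time. Here is a minimal sketch of a guard, assuming `mentionId` is stable across runs (this post does not state that explicitly) and using a hypothetical local JSON file as the state store:

```python
import json
import tempfile
from pathlib import Path


def load_seen(path: Path) -> set[str]:
    """Load previously alerted mention IDs; empty set on the first run."""
    if path.exists():
        return set(json.loads(path.read_text(encoding="utf-8")))
    return set()


def filter_new_alerts(mentions: list[dict], path: Path,
                      min_followers: int = 10_000) -> list[dict]:
    """Return negative, high-reach mentions not alerted before, and record their IDs."""
    seen = load_seen(path)
    fresh = [
        m for m in mentions
        if m["sentiment"]["polarity"] == "negative"
        and m["authorFollowerCount"] >= min_followers
        and m["mentionId"] not in seen
    ]
    seen.update(m["mentionId"] for m in fresh)
    path.write_text(json.dumps(sorted(seen)), encoding="utf-8")
    return fresh


# Two simulated hourly runs that both return the same (made-up) mention.
state = Path(tempfile.mkdtemp()) / "alerted_ids.json"
hour_1 = [{"mentionId": "weibo_abc", "authorFollowerCount": 52_000,
           "sentiment": {"polarity": "negative"}}]
first = filter_new_alerts(hour_1, state)
second = filter_new_alerts(hour_1, state)
print(len(first), len(second))  # 1 0
```

In the cron job you would pass the dataset items through `filter_new_alerts` and post only what it returns. A file works for a single machine; anything running in more than one place would want a shared store instead.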

c) KOL identification (~$90/mo)

Weekly category-keyword run. Skincare = 护肤, sneakers = 球鞋, supplements = 保健品. Filter verified authors above 50K followers, sort by engagement.

from apify_client import ApifyClient
import pandas as pd

client = ApifyClient("YOUR_APIFY_TOKEN")

run = client.actor("zhorex/chinese-brand-monitor").call(run_input={
    "brandKeyword": "护肤",
    "platforms": ["weibo", "rednote", "bilibili", "douban"],
    "lookbackDays": 14,
    "maxMentionsPerPlatform": 500,
})

df = pd.DataFrame(list(client.dataset(run["defaultDatasetId"]).iterate_items()))
df["engagement"] = df["engagementMetrics"].apply(
    lambda e: e.get("likes", 0) + e.get("comments", 0) * 3 + e.get("shares", 0) * 5
)

candidates = (
    df[(df["authorVerified"]) & (df["authorFollowerCount"] >= 50000)]
      .sort_values("engagement", ascending=False)
      .drop_duplicates("authorId")
      .head(50)
)
candidates[
    ["authorName", "platform", "authorFollowerCount", "engagement", "url"]
].to_csv("kol_candidates.csv", index=False)

Weekly cadence on 1-2 category keywords comes to roughly 2,000 mentions/month, or about $90/mo. The output is a ranked candidate list your social team can contact directly.
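The engagement score in the snippet above is a simple weighted sum: likes count once, comments three times, shares five times. Those weights are this post's heuristic, not a platform standard. Here is a standalone version, checked against the `engagementMetrics` values from the sample record in the schema section, plus a hypothetical follower-normalized rate (`engagement_rate`, not part of the Actor's output) for comparing accounts of different sizes:

```python
def engagement_score(metrics: dict) -> int:
    """Weighted sum used in the KOL workflow: likes + 3 * comments + 5 * shares."""
    return (
        metrics.get("likes", 0)
        + metrics.get("comments", 0) * 3
        + metrics.get("shares", 0) * 5
    )


def engagement_rate(metrics: dict, followers: int) -> float:
    """Hypothetical helper: weighted engagement per follower (0.0 if no followers)."""
    return engagement_score(metrics) / followers if followers else 0.0


sample_metrics = {"likes": 2104, "comments": 187, "shares": 56, "views": 18430}
score = engagement_score(sample_metrics)
rate = engagement_rate(sample_metrics, followers=184_230)
print(score)          # 2945  (2104 + 561 + 280)
print(f"{rate:.4f}")  # 0.0160
```

Sorting by the raw score favors large accounts; sorting by the rate surfaces smaller creators whose audiences respond more strongly. Which one fits depends on whether the campaign is after reach or conversion.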

d) Hedge fund alt-data (~$990/mo)

Daily run across 20 portfolio tickers on Xueqiu + Weibo + RedNote. Build a sentiment-velocity feature: today's mention count against the 7-day daily average, paired with the polarity shift between the two windows. The code pulls a 1-day window and a 7-day window per ticker and joins them to compute both.

from apify_client import ApifyClient
import pandas as pd

client = ApifyClient("YOUR_APIFY_TOKEN")

tickers = [
    "BABA", "PDD", "JD", "BIDU", "NIO", "XPEV", "LI", "MEITUAN",
    "TENCENT", "BYD", "LKNCY", "TME", "BILI", "VIPS", "TAL", "YMM",
    "DIDI", "ZH", "NTES", "FUTU",
]

def pull(lookback_days):
    rows = []
    for ticker in tickers:
        run = client.actor("zhorex/chinese-brand-monitor").call(run_input={
            "brandKeyword": ticker,
            "platforms": ["xueqiu", "weibo", "rednote"],
            "lookbackDays": lookback_days,
            "sentimentAnalysis": True,
        })
        for item in client.dataset(run["defaultDatasetId"]).iterate_items():
            rows.append(item)
    return pd.DataFrame(rows)

today = pull(1)
week = pull(7)

def agg(df):
    df = df.copy()
    df["score"] = df["sentiment"].apply(lambda s: s["score"])
    return df.groupby("brandKeyword").agg(
        count=("mentionId", "count"),
        avg_polarity=("score", "mean"),
    )

today_agg = agg(today)
week_agg = agg(week)
features = today_agg.join(week_agg, lsuffix="_1d", rsuffix="_7d")
features["velocity"] = features["count_1d"] - (features["count_7d"] / 7)
features["polarity_shift"] = features["avg_polarity_1d"] - features["avg_polarity_7d"]
print(features.sort_values("velocity", ascending=False))

20 tickers, run daily across 3 platforms, come to roughly 22K mentions/month, or about $990/mo. Compare to a single Bloomberg terminal at ~$28K/year for one analyst.
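The code above calls the Actor twice per ticker (a 1-day pull and a 7-day pull), and the 1-day window is a subset of the 7-day one. If each run is billed per mention, that overlap may be charged twice. A possible alternative, sketched here with the standard library only, is to pull 7 days once and slice the last 24 hours locally using `publishedAt`. The helper name `window_features` and the sample rows are made up for illustration.

```python
from datetime import datetime, timedelta, timezone
from statistics import mean


def parse_ts(ts: str) -> datetime:
    """Parse the schema's ISO 8601 'Z' timestamps into timezone-aware datetimes."""
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def window_features(mentions: list[dict], now: datetime) -> dict:
    """Compute velocity and polarity shift for one ticker from a single 7-day pull."""
    week = [m for m in mentions if parse_ts(m["publishedAt"]) >= now - timedelta(days=7)]
    day = [m for m in week if parse_ts(m["publishedAt"]) >= now - timedelta(days=1)]
    if not week:
        return {"velocity": 0.0, "polarity_shift": 0.0}
    avg_week = mean(m["sentiment"]["score"] for m in week)
    # With no mentions in the last day, report a shift of 0 by convention.
    avg_day = mean(m["sentiment"]["score"] for m in day) if day else avg_week
    return {
        # Same definitions as the pandas version: last day's count vs. the 7-day daily average.
        "velocity": len(day) - len(week) / 7,
        "polarity_shift": avg_day - avg_week,
    }


now = datetime(2026, 5, 20, 8, 0, tzinfo=timezone.utc)
stamps_scores = [
    ("2026-05-20T02:00:00Z", 0.8), ("2026-05-19T10:00:00Z", 0.6),  # last 24 hours
    ("2026-05-17T09:00:00Z", 0.1), ("2026-05-15T12:00:00Z", 0.1),
    ("2026-05-14T09:00:00Z", 0.1), ("2026-05-13T09:00:00Z", 0.2),
    ("2026-05-13T10:00:00Z", 0.2),
    ("2026-05-10T00:00:00Z", 0.9),  # older than 7 days, ignored
]
rows = [{"publishedAt": ts, "sentiment": {"score": s}} for ts, s in stamps_scores]
features = window_features(rows, now)
print({k: round(v, 3) for k, v in features.items()})
# {'velocity': 1.0, 'polarity_shift': 0.4}
```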

e) AI training corpus (~$2,250 one-shot)

50 brand keywords × 1,000 mentions each = 50K Chinese-language records for SFT or RLHF corpora. Every record carries a lexicon-derived sentiment polarity and score, the author's follower count, and the platform. Compare to $15-50K academic licensing fees for comparable annotated Chinese sentiment corpora.

from apify_client import ApifyClient
import json

client = ApifyClient("YOUR_APIFY_TOKEN")

brands = [
    "华为", "小米", "比亚迪", "蔚来", "理想汽车", "拼多多", "美团",
    "完美日记", "花西子", "钟薛高", "元气森林", "瑞幸咖啡", "海底捞",
    # ... 50 total
]

with open("china_sft_corpus.jsonl", "w", encoding="utf-8") as f:
    for brand in brands:
        run = client.actor("zhorex/chinese-brand-monitor").call(run_input={
            "brandKeyword": brand,
            "platforms": ["weibo", "rednote", "bilibili", "douban", "xueqiu"],
            "lookbackDays": 30,
            "maxMentionsPerPlatform": 200,
            "sentimentAnalysis": True,
        })
        for m in client.dataset(run["defaultDatasetId"]).iterate_items():
            f.write(json.dumps({
                "text": m["content"],
                "label": m["sentiment"]["polarity"],
                "score": m["sentiment"]["score"],
                "platform": m["platform"],
                "brand": m["brandKeyword"],
            }, ensure_ascii=False) + "\n")

50K records × $0.045 = $2,250. One-shot. No annotator contracts, no FTE-month spent labeling.
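Because the labels come from the Actor's lexicon-based scorer rather than human annotators, it's worth inspecting the corpus before training on it. Here is a minimal sketch, assuming the JSONL layout written by the loop above (`text`, `label`, `platform` keys): it counts labels per platform and skips exact duplicate texts. The function name `audit_corpus` and the demo rows are illustrative.

```python
import json
import tempfile
from collections import Counter
from pathlib import Path


def audit_corpus(path: Path) -> tuple[Counter, int]:
    """Return (label counts per platform, number of exact duplicate texts skipped)."""
    counts: Counter = Counter()
    seen_texts: set[str] = set()
    duplicates = 0
    with path.open(encoding="utf-8") as f:
        for line in f:
            if not line.strip():
                continue
            row = json.loads(line)
            if row["text"] in seen_texts:
                duplicates += 1
                continue
            seen_texts.add(row["text"])
            counts[(row["platform"], row["label"])] += 1
    return counts, duplicates


# Demo with a tiny made-up file in the same shape as china_sft_corpus.jsonl.
demo = Path(tempfile.mkdtemp()) / "demo.jsonl"
rows = [
    {"text": "很好用", "label": "positive", "platform": "rednote"},
    {"text": "很好用", "label": "positive", "platform": "weibo"},
    {"text": "太贵了", "label": "negative", "platform": "weibo"},
]
demo.write_text("\n".join(json.dumps(r, ensure_ascii=False) for r in rows), encoding="utf-8")
counts, duplicates = audit_corpus(demo)
print(dict(counts), duplicates)
```

A heavily skewed label distribution or a high duplicate count is a sign to rebalance or spot-check a sample by hand before fine-tuning.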

f) Cross-tool finance signal: Xueqiu sentiment × TradingView price

Pair the Chinese Brand Monitor with the TradingView Scraper for a sentiment-vs-price divergence signal. When Xueqiu retail sentiment turns sharply positive while the price stays flat or drifts down, you have a setup worth a closer look.

from apify_client import ApifyClient
import pandas as pd

client = ApifyClient("YOUR_APIFY_TOKEN")

sent_run = client.actor("zhorex/chinese-brand-monitor").call(run_input={
    "brandKeyword": "BABA",
    "platforms": ["xueqiu"],
    "lookbackDays": 7,
    "sentimentAnalysis": True,
})
sent = pd.DataFrame(list(client.dataset(sent_run["defaultDatasetId"]).iterate_items()))
sent_score_7d = sent["sentiment"].apply(lambda s: s["score"]).mean()

price_run = client.actor("zhorex/tradingview-scraper").call(run_input={
    "mode": "technical_analysis",
    "symbols": ["NYSE:BABA"],
    "includeIndicators": True,
})
price = next(iter(client.dataset(price_run["defaultDatasetId"]).iterate_items()))
# perfWeek is treated here as a percentage (e.g. 2.5 for +2.5%); convert it to a fraction.
perf_week_return = (price.get("perfWeek") or 0) / 100

# Positive Xueqiu sentiment minus weekly price return: large positive = retail
# is loud-bullish but the tape hasn't caught up yet.
divergence = sent_score_7d - perf_week_return
print({
    "ticker": "BABA",
    "xueqiu_sentiment_7d": round(sent_score_7d, 3),
    "tradingview_perfWeek_pct": price.get("perfWeek"),
    "divergence": round(divergence, 3),
})

A positive divergence row is "sentiment positive, price not yet moved." That's the setup quants pay alt-data brokers tens of thousands a year to surface.

Normalized output schema

Every record across every platform has this shape:

{
  "mentionId": "rednote_8b3c2f91a4",
  "platform": "rednote",
  "brandKeyword": "Estée Lauder",
  "brandMatchType": "exact",
  "content": "雅诗兰黛小棕瓶用了三个月，肌肤紧致很多...",
  "contentSnippet": "雅诗兰黛小棕瓶用了三个月...",
  "language": "zh-CN",
  "authorId": "rednote_user_4429871",
  "authorName": "小琳护肤日记",
  "authorFollowerCount": 184230,
  "authorVerified": true,
  "publishedAt": "2026-05-18T14:23:11Z",
  "engagementMetrics": {
    "likes": 2104,
    "comments": 187,
    "shares": 56,
    "views": 18430
  },
  "url": "https://www.xiaohongshu.com/explore/...",
  "mediaUrls": ["https://sns-img-...jpg"],
  "sentiment": { "polarity": "positive", "score": 0.78, "method": "lexicon" },
  "crossPlatformReposts": [
    {
      "platform": "weibo",
      "url": "https://weibo.com/...",
      "publishedAt": "2026-05-18T15:02:00Z"
    }
  ],
  "scrapedAt": "2026-05-20T08:00:01Z"
}

Your downstream code stays platform-agnostic. Pandas, BigQuery, Snowflake, ClickHouse — pick your warehouse and the records load directly.
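As a sketch of what platform-agnostic downstream code can look like, the dataclass below pulls a handful of fields out of a record and parses the timestamp. The class name `Mention` and the choice of fields are assumptions for illustration; the field names themselves come from the schema above.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Mention:
    mention_id: str
    platform: str
    author_followers: int
    polarity: str
    score: float
    published_at: datetime
    repost_count: int

    @classmethod
    def from_record(cls, rec: dict) -> "Mention":
        """Build a Mention from one dataset item; raises KeyError if a required field is missing."""
        return cls(
            mention_id=rec["mentionId"],
            platform=rec["platform"],
            author_followers=int(rec["authorFollowerCount"]),
            polarity=rec["sentiment"]["polarity"],
            score=float(rec["sentiment"]["score"]),
            # fromisoformat() only accepts a trailing "Z" on Python 3.11+, so normalize it.
            published_at=datetime.fromisoformat(rec["publishedAt"].replace("Z", "+00:00")),
            repost_count=len(rec.get("crossPlatformReposts", [])),
        )


# A trimmed copy of the sample record from the schema.
record = {
    "mentionId": "rednote_8b3c2f91a4",
    "platform": "rednote",
    "authorFollowerCount": 184230,
    "publishedAt": "2026-05-18T14:23:11Z",
    "sentiment": {"polarity": "positive", "score": 0.78, "method": "lexicon"},
    "crossPlatformReposts": [{"platform": "weibo", "url": "https://weibo.com/...",
                              "publishedAt": "2026-05-18T15:02:00Z"}],
}
m = Mention.from_record(record)
print(m.platform, m.published_at.isoformat(), m.repost_count)
# rednote 2026-05-18T14:23:11+00:00 1
```

Failing loudly on a missing field is deliberate here: if the schema ever drifts, you find out at load time rather than in a dashboard.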

Pricing

$0.045 per canonical mention, billed only after deduplication. If a KOL reposts the same content across Weibo + RedNote + Bilibili, that's one billable mention with the reposts attached, not three.

Use case	Volume	Monthly cost	Enterprise alternative
Single brand, daily, 7-day lookback	~3K/mo	~$135	$4K/mo Synthesio seat
5-brand agency, daily, sentiment + dedup	~15K/mo	~$675	$24K-80K/year
20-ticker hedge fund	~22K/mo	~$990	$28K/year Bloomberg seat
AI training corpus one-shot	50K	~$2,250	$15K-50K academic license
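The figures in the table are straight multiplication: deduplicated mentions × $0.045. Here is a small helper to rerun the arithmetic for your own volumes. It uses integer math in thousandths of a dollar to avoid floating-point drift, and the function names are hypothetical.

```python
PRICE_PER_MENTION_MILLS = 45  # $0.045 expressed in thousandths of a dollar


def estimate_cost_usd(mentions: int) -> float:
    """Cost in USD for a number of canonical (deduplicated) mentions."""
    return mentions * PRICE_PER_MENTION_MILLS / 1000


def mentions_for_budget(budget_usd: float) -> int:
    """How many whole mentions a given budget covers."""
    return int(round(budget_usd * 1000)) // PRICE_PER_MENTION_MILLS


for label, volume in [
    ("Single brand, daily", 3_000),
    ("5-brand agency", 15_000),
    ("20-ticker hedge fund", 22_000),
    ("AI training corpus (one-shot)", 50_000),
]:
    print(f"{label}: ${estimate_cost_usd(volume):,.2f}")

print(mentions_for_budget(5))  # 111
```

The volumes are the post's estimates; actual spend depends on how much chatter a keyword generates and on the `maxMentionsPerPlatform` cap you set.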

What this Actor does NOT do

Honest scoping matters more than pitch volume:

Not real-time push streaming. Cron-based polling, 5-minute minimum interval. If you need sub-second push, this isn't it.
Not a historical archive. Maximum 30-day lookback. For multi-year backfill, you need a different tool.
Not authentication-walled content. No Zhihu authenticated answers, no private WeChat groups, no closed Weibo Super Topic posts.
Not a CRM or BI tool. This is the data layer. You bring the dashboard.

If those constraints are dealbreakers for your use case, save the credit and don't run it.
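If the 30-day lookback is the only blocker, one workaround is to build your own archive going forward: run on a schedule and upsert each batch into a local store keyed by `mentionId`. This also absorbs the overlap between rolling-lookback runs. Below is a minimal sketch with SQLite, assuming `mentionId` is stable across runs (not confirmed in this post); the table layout and function names are illustrative.

```python
import json
import sqlite3


def open_archive(path: str = ":memory:") -> sqlite3.Connection:
    """Create (if needed) a one-table archive keyed by mentionId."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS mentions (
               mention_id   TEXT PRIMARY KEY,
               platform     TEXT NOT NULL,
               published_at TEXT NOT NULL,
               raw_json     TEXT NOT NULL
           )"""
    )
    return conn


def upsert_mentions(conn: sqlite3.Connection, items: list[dict]) -> int:
    """Insert or refresh records; returns how many rows the archive holds afterwards."""
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "INSERT OR REPLACE INTO mentions VALUES (?, ?, ?, ?)",
            [
                (m["mentionId"], m["platform"], m["publishedAt"],
                 json.dumps(m, ensure_ascii=False))
                for m in items
            ],
        )
    return conn.execute("SELECT COUNT(*) FROM mentions").fetchone()[0]


# Two simulated daily runs with overlapping (made-up) records.
conn = open_archive()
monday = [{"mentionId": "weibo_1", "platform": "weibo", "publishedAt": "2026-05-18T09:00:00Z"}]
tuesday = monday + [{"mentionId": "douban_2", "platform": "douban",
                     "publishedAt": "2026-05-19T11:30:00Z"}]
upsert_mentions(conn, monday)
total = upsert_mentions(conn, tuesday)
print(total)  # 2, not 3: the overlapping record is stored once
```

This does not recover anything older than the first run, so it only helps if you start collecting before you need the history.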

The broader China stack

The main Actor here is zhorex/chinese-brand-monitor, but the rest of the stack exists for cases when you need single-platform depth or a different angle:

For deeper single-platform RedNote dives — full creator profiles, comment threads, hashtag networks — reach for the standalone RedNote/Xiaohongshu Scraper.
For Weibo-only bulk pulls — historical hashtag sweeps, single-account timelines, Super Topic posts — the Weibo Scraper is the dedicated tool.
For Bilibili-only deep pulls — video metadata, danmaku, UP主 channel coverage — use the Bilibili Scraper.
For finance-only sentiment with cashtag granularity and reply trees, the Xueqiu Scraper goes deeper than the brand-monitor surface.
For long-form review extraction, especially books, films, and lifestyle, the Douban Scraper handles the review-thread structure.
For the cross-tool finance workflow above, the TradingView Scraper provides the price half of the sentiment-vs-price divergence signal.
If you're tracking brand mentions, you usually also want competitor pricing — the JD Scraper covers the e-commerce price side of the China stack.

Try it

$5 of Apify free credits cover roughly 100 mentions — enough to run a single brand for a week and see whether the output shape fits your downstream code. Start here: zhorex/chinese-brand-monitor.

If you build something on top of this — a Looker dashboard, a Slack bot, a Streamlit explorer, a sentiment ETF screen — drop a comment, or open an Issue on the Actor page. Schema customization, missing platforms, follower-bracket additions, new sentiment lexicons — those are the kinds of changes that get prioritized when users ask for them.

DEV Community