<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Sami</title>
    <description>The latest articles on DEV Community by Sami (@sami_8858131362756585e4f4).</description>
    <link>https://dev.to/sami_8858131362756585e4f4</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3877584%2F63d2c24c-ec4e-457f-8a71-2b79bb969554.png</url>
      <title>DEV Community: Sami</title>
      <link>https://dev.to/sami_8858131362756585e4f4</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/sami_8858131362756585e4f4"/>
    <language>en</language>
    <item>
      <title>Synthesio charges $36K+/year for Chinese platform coverage. I built one for $0.045/mention.</title>
      <dc:creator>Sami</dc:creator>
      <pubDate>Wed, 20 May 2026 01:46:55 +0000</pubDate>
      <link>https://dev.to/sami_8858131362756585e4f4/synthesio-charges-36kyear-for-chinese-platform-coverage-i-built-one-for-0045mention-4d1l</link>
      <guid>https://dev.to/sami_8858131362756585e4f4/synthesio-charges-36kyear-for-chinese-platform-coverage-i-built-one-for-0045mention-4d1l</guid>
      <description>&lt;p&gt;Synthesio sells Chinese platform coverage for $36K+/year. Brandwatch and Meltwater sit in roughly the same $24K-80K/year band. I built an Apify Actor that does the equivalent core job — Weibo, RedNote, Bilibili, Douban, Xueqiu — for $0.045 per deduplicated mention, billed pay-as-you-go.&lt;/p&gt;

&lt;p&gt;If you've ever tried to DIY this, you know the math. Five Chinese platforms means five different parsers, five different rate-limit dances, five different schema-drift surprises every couple of weeks, and zero deduplication when a KOL reposts the same content across all of them. By the time you've normalized author identity, follower counts, and timestamps into a usable cross-platform record, you've built a small distributed system that breaks every other Tuesday.&lt;/p&gt;

&lt;p&gt;The pitch for &lt;code&gt;zhorex/chinese-brand-monitor&lt;/code&gt; is simple: one API call, one normalized schema, one pay-per-event (PPE) charge per canonical mention. You pass a brand keyword (Chinese or English) and get back deduplicated records with sentiment scores and reach signals across all five platforms. You don't write per-platform code. You don't run five cron jobs. You don't pay an enterprise floor.&lt;/p&gt;

&lt;p&gt;This post walks through six concrete workflows with runnable Python — brand health, crisis monitoring, KOL discovery, hedge fund alt-data, AI training corpora, and a cross-tool finance signal — so you can decide if this fits your stack.&lt;/p&gt;

&lt;h2&gt;What it does&lt;/h2&gt;

&lt;p&gt;The Actor takes a single brand keyword (or a list of keywords) and returns deduplicated, sentiment-scored mentions from five Chinese platforms:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Weibo&lt;/strong&gt; — China's largest microblog; broad consumer chatter&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;RedNote / Xiaohongshu (小红书)&lt;/strong&gt; — lifestyle and product discovery; heavy DTC signal&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Bilibili&lt;/strong&gt; — long-form video community; strong Gen-Z signal&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Douban&lt;/strong&gt; — long-form reviews, especially media and lifestyle&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Xueqiu (雪球)&lt;/strong&gt; — retail investor chatter, cashtag-tracked stock sentiment&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The Actor handles:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Single keyword input — Chinese &lt;code&gt;护肤&lt;/code&gt; or English &lt;code&gt;Estée Lauder&lt;/code&gt; both work&lt;/li&gt;
&lt;li&gt;Normalized cross-platform schema — same fields on every record, no per-platform parsing in your downstream code&lt;/li&gt;
&lt;li&gt;Lexicon-based Chinese sentiment scoring per mention (polarity + score)&lt;/li&gt;
&lt;li&gt;Cross-platform deduplication — when the same KOL reposts identical content on Weibo and RedNote, you get one canonical record with &lt;code&gt;crossPlatformReposts&lt;/code&gt; listing the other appearances&lt;/li&gt;
&lt;li&gt;Author identity normalization with follower count for reach-weighted analysis&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Engineering choices worth knowing: a browser-grade HTTP client, polite rate limiting, session warming, and a public-data scope that respects each platform's accessible surface. The point is that you don't have to think about any of that — you call the Actor, you get records.&lt;/p&gt;
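&lt;p&gt;For orientation, here is a rough sketch of what one normalized record might look like. The field names are the ones used in the snippets in this post; the values are invented, and the shape of &lt;code&gt;crossPlatformReposts&lt;/code&gt; (a list of objects with a &lt;code&gt;platform&lt;/code&gt; key) and the score range are assumptions, so check a real dataset item before relying on them. &lt;code&gt;platforms_seen&lt;/code&gt; is a hypothetical helper, not part of the Actor.&lt;/p&gt;

```python
from typing import Any

# Hypothetical example record. Field names come from the snippets in this post;
# every value here is made up for illustration.
example_mention: dict[str, Any] = {
    "mentionId": "weibo-0001",
    "platform": "weibo",
    "authorId": "u-123",
    "authorName": "example_kol",
    "authorVerified": True,
    "authorFollowerCount": 120_000,
    "contentSnippet": "Sample text mentioning the brand",
    "url": "https://example.com/post/0001",
    "sentiment": {"polarity": "negative", "score": -0.62},  # score range is assumed
    "engagementMetrics": {"likes": 340, "comments": 25, "shares": 12},
    # Assumed shape: one entry per other platform where the same content appeared.
    "crossPlatformReposts": [
        {"platform": "rednote", "url": "https://example.com/post/0002"},
    ],
}


def platforms_seen(mention: dict[str, Any]) -> list[str]:
    """Return every platform a canonical mention appeared on, original first."""
    reposts = mention.get("crossPlatformReposts") or []
    return [mention["platform"]] + [r["platform"] for r in reposts]


print(platforms_seen(example_mention))
```

&lt;p&gt;Because the record is already canonical, downstream code can count it once and still see where else the content showed up.&lt;/p&gt;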

&lt;h2&gt;Six concrete workflows&lt;/h2&gt;

&lt;h3&gt;a) Brand health dashboard (~$135/mo)&lt;/h3&gt;

&lt;p&gt;Daily 8am cron, single brand, 7-day rolling lookback. Push to Looker, Metabase, or a Notion database. Compare this to a $4K/mo Synthesio seat for the same functional coverage.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;apify_client&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;ApifyClient&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;pandas&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;

&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;ApifyClient&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;YOUR_APIFY_TOKEN&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;run_input&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;brandKeyword&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Estée Lauder&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;platforms&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;weibo&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;rednote&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;bilibili&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;douban&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;xueqiu&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;lookbackDays&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;7&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;maxMentionsPerPlatform&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;600&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;sentimentAnalysis&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;deduplication&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="n"&gt;run&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;actor&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;zhorex/chinese-brand-monitor&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;call&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;run_input&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;run_input&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;items&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;list&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dataset&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;run&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;defaultDatasetId&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]).&lt;/span&gt;&lt;span class="nf"&gt;iterate_items&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;

&lt;span class="n"&gt;df&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;DataFrame&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;items&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;polarity&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;sentiment&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;apply&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;lambda&lt;/span&gt; &lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;polarity&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;span class="n"&gt;summary&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;groupby&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;platform&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;polarity&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
      &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;agg&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;mentions&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;mentionId&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;count&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
           &lt;span class="n"&gt;reach&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;authorFollowerCount&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;sum&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
      &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;reset_index&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;summary&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The grouped DataFrame is what you push to your BI tool. Roughly 3,000 deduplicated mentions/month at this cadence comes to about $135 in PPE charges (3,000 × $0.045).&lt;/p&gt;
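&lt;p&gt;The budget math is simple enough to script. A minimal sketch, assuming the flat $0.045 per deduplicated mention quoted in this post and nothing else on the bill; &lt;code&gt;estimate_monthly_cost&lt;/code&gt; is a hypothetical helper, and &lt;code&gt;Decimal&lt;/code&gt; is used so the cents come out exact.&lt;/p&gt;

```python
from decimal import Decimal

# Per-mention price quoted in this post, in USD.
PRICE_PER_MENTION = Decimal("0.045")


def estimate_monthly_cost(mentions_per_month: int) -> Decimal:
    """Estimated monthly charge in USD for a number of deduplicated mentions."""
    total = PRICE_PER_MENTION * mentions_per_month
    return total.quantize(Decimal("0.01"))  # round to whole cents


print(estimate_monthly_cost(3000))  # 135.00
```

&lt;p&gt;Swap in your own expected volume to size a workflow before scheduling it.&lt;/p&gt;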

&lt;h3&gt;b) Crisis monitoring (~$270/mo)&lt;/h3&gt;

&lt;p&gt;Hourly cron, 1-day lookback, filter for negative polarity from accounts with at least 10K followers. A Slack webhook fires on each match. This is the workflow that justifies the spend during a product recall, a CEO quote going viral, or a competitor smear campaign.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;apify_client&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;ApifyClient&lt;/span&gt;

&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;ApifyClient&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;YOUR_APIFY_TOKEN&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;SLACK_WEBHOOK&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://hooks.slack.com/services/XXX/YYY/ZZZ&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

&lt;span class="n"&gt;run&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;actor&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;zhorex/chinese-brand-monitor&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;call&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;run_input&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;brandKeyword&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Estée Lauder&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;platforms&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;weibo&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;rednote&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;bilibili&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;douban&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;xueqiu&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;lookbackDays&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;maxMentionsPerPlatform&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;200&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;sentimentAnalysis&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;})&lt;/span&gt;

&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;m&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dataset&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;run&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;defaultDatasetId&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]).&lt;/span&gt;&lt;span class="nf"&gt;iterate_items&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;m&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;sentiment&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;polarity&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;negative&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="ow"&gt;and&lt;/span&gt; &lt;span class="n"&gt;m&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;authorFollowerCount&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="mi"&gt;10000&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;requests&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;post&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;SLACK_WEBHOOK&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
                &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;[&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;m&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;platform&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;] &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;m&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;authorName&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
                &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;(&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;m&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;authorFollowerCount&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; followers) — &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
                &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;sentiment &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;m&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;sentiment&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;score&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
                &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;m&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;contentSnippet&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;m&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;url&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
            &lt;span class="p"&gt;)&lt;/span&gt;
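        &lt;span class="p"&gt;},&lt;/span&gt; &lt;span class="n"&gt;timeout&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;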
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;An hourly schedule means 24 × 30 = 720 runs per month. If the brand has steady chatter, that yields roughly 6,000 deduplicated mentions/month, or about $270/mo at $0.045 each. Cheap insurance for a comms team.&lt;/p&gt;
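&lt;p&gt;One thing to watch with an hourly schedule and a 1-day lookback: the same mention can plausibly come back in many consecutive runs, and the loop above would alert on it each time. It also assumes every record carries &lt;code&gt;sentiment&lt;/code&gt; and a numeric &lt;code&gt;authorFollowerCount&lt;/code&gt;. Below is a defensive sketch of the filter that tolerates missing fields and skips mentions it has already alerted on. &lt;code&gt;new_crisis_mentions&lt;/code&gt; and &lt;code&gt;seen_ids&lt;/code&gt; are hypothetical names, and in a real cron job the set would need to be persisted between runs (a file or a small table), which is not shown here.&lt;/p&gt;

```python
from typing import Any, Iterable, Iterator


def new_crisis_mentions(
    mentions: Iterable[dict[str, Any]],
    seen_ids: set[str],
    min_followers: int = 10_000,
) -> Iterator[dict[str, Any]]:
    """Yield negative, high-reach mentions that have not been alerted on yet."""
    for m in mentions:
        sentiment = m.get("sentiment") or {}             # tolerate a missing block
        followers = m.get("authorFollowerCount") or 0    # treat None as zero reach
        mention_id = m.get("mentionId")

        if sentiment.get("polarity") != "negative":
            continue
        if not followers >= min_followers:
            continue
        if mention_id is None or mention_id in seen_ids:
            continue  # no stable ID, or already alerted in an earlier run

        seen_ids.add(mention_id)
        yield m


# Usage with made-up records: only "a" is negative, high-reach, and new.
seen: set[str] = set()
batch = [
    {"mentionId": "a", "sentiment": {"polarity": "negative"}, "authorFollowerCount": 25_000},
    {"mentionId": "b", "sentiment": {"polarity": "positive"}, "authorFollowerCount": 90_000},
    {"mentionId": "c", "sentiment": {"polarity": "negative"}, "authorFollowerCount": None},
]
first_run = [m["mentionId"] for m in new_crisis_mentions(batch, seen)]
second_run = [m["mentionId"] for m in new_crisis_mentions(batch, seen)]
print(first_run, second_run)
```

&lt;p&gt;In the alerting loop you would iterate over &lt;code&gt;new_crisis_mentions(...)&lt;/code&gt; instead of the raw dataset items and post to Slack for each result.&lt;/p&gt;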

&lt;h3&gt;c) KOL identification (~$90/mo)&lt;/h3&gt;

&lt;p&gt;Weekly category-keyword run. Skincare = &lt;code&gt;护肤&lt;/code&gt;, sneakers = &lt;code&gt;球鞋&lt;/code&gt;, supplements = &lt;code&gt;保健品&lt;/code&gt;. Keep verified authors with at least 50K followers, then sort by engagement.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;apify_client&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;ApifyClient&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;pandas&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;

&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;ApifyClient&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;YOUR_APIFY_TOKEN&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;run&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;actor&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;zhorex/chinese-brand-monitor&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;call&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;run_input&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;brandKeyword&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;护肤&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;platforms&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;weibo&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;rednote&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;bilibili&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;douban&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;lookbackDays&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;14&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;maxMentionsPerPlatform&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;500&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;})&lt;/span&gt;

&lt;span class="n"&gt;df&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;DataFrame&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;list&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dataset&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;run&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;defaultDatasetId&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]).&lt;/span&gt;&lt;span class="nf"&gt;iterate_items&lt;/span&gt;&lt;span class="p"&gt;()))&lt;/span&gt;
&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;engagement&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;engagementMetrics&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;apply&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="k"&gt;lambda&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;likes&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;comments&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;e&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;shares&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;candidates&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[(&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;authorVerified&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;authorFollowerCount&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="mi"&gt;50000&lt;/span&gt;&lt;span class="p"&gt;)]&lt;/span&gt;
      &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sort_values&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;engagement&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ascending&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
      &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;drop_duplicates&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;authorId&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
      &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;head&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;50&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;candidates&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;authorName&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;platform&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;authorFollowerCount&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;engagement&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;url&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;to_csv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;kol_candidates.csv&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;index&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A weekly cadence on one or two category keywords comes to roughly 2,000 mentions/month, or about $90/mo. The output is a ranked candidate list your social team can contact directly.&lt;/p&gt;
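&lt;p&gt;The engagement score in the snippet above weights comments 3× and shares 5× relative to likes. If some records arrive without &lt;code&gt;engagementMetrics&lt;/code&gt;, pandas fills the gap with &lt;code&gt;NaN&lt;/code&gt; and the lambda's &lt;code&gt;.get&lt;/code&gt; call fails. A small standalone version that tolerates that case might look like this; &lt;code&gt;engagement_score&lt;/code&gt; is a hypothetical helper using the same weights.&lt;/p&gt;

```python
from typing import Any


def engagement_score(metrics: Any) -> int:
    """Weighted engagement: likes + 3 * comments + 5 * shares.

    Anything that is not a dict (None, NaN from pandas) scores zero.
    """
    if not isinstance(metrics, dict):
        return 0
    likes = metrics.get("likes") or 0
    comments = metrics.get("comments") or 0
    shares = metrics.get("shares") or 0
    return likes + 3 * comments + 5 * shares


# 100 likes + 3 * 10 comments + 5 * 2 shares
print(engagement_score({"likes": 100, "comments": 10, "shares": 2}))  # 140
```

&lt;p&gt;With pandas you could pass it straight to &lt;code&gt;apply&lt;/code&gt; in place of the lambda.&lt;/p&gt;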

&lt;h3&gt;d) Hedge fund alt-data (~$990/mo)&lt;/h3&gt;

&lt;p&gt;Daily run across 20 portfolio tickers on Xueqiu + Weibo + RedNote. Build a sentiment-velocity feature: 7-day mention-count delta paired with polarity shift. Join two consecutive runs to compute the velocity.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;apify_client&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;ApifyClient&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;pandas&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;

&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;ApifyClient&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;YOUR_APIFY_TOKEN&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;tickers&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;BABA&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;PDD&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;JD&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;BIDU&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;NIO&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;XPEV&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;LI&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;MEITUAN&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;TENCENT&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;BYD&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;LKNCY&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;TME&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;BILI&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;VIPS&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;TAL&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;YMM&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;DIDI&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ZH&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;NTES&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;FUTU&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;pull&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;lookback_days&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;rows&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;ticker&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;tickers&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;run&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;actor&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;zhorex/chinese-brand-monitor&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;call&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;run_input&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;brandKeyword&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;ticker&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;platforms&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;xueqiu&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;weibo&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;rednote&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;lookbackDays&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;lookback_days&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;sentimentAnalysis&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;})&lt;/span&gt;
        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;item&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dataset&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;run&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;defaultDatasetId&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]).&lt;/span&gt;&lt;span class="nf"&gt;iterate_items&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
            &lt;span class="n"&gt;rows&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;item&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;DataFrame&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;rows&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;today&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;pull&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;week&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;pull&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;7&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;agg&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;df&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;copy&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;score&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;sentiment&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;apply&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;lambda&lt;/span&gt; &lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;score&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;df&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;groupby&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;brandKeyword&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;agg&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;count&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;mentionId&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;count&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="n"&gt;avg_polarity&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;score&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;mean&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;today_agg&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;agg&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;today&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;week_agg&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;agg&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;week&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;features&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;today_agg&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;week_agg&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;lsuffix&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;_1d&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;rsuffix&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;_7d&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;features&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;velocity&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;features&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;count_1d&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;features&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;count_7d&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="mi"&gt;7&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;features&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;polarity_shift&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;features&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;avg_polarity_1d&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;features&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;avg_polarity_7d&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;features&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sort_values&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;velocity&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ascending&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
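&lt;p&gt;To sanity-check the velocity arithmetic without spending credits, the same calculation can be run on a few hand-written records. This is a minimal, dependency-free sketch rather than the Actor's own logic: the helper names (&lt;code&gt;summarize&lt;/code&gt;, &lt;code&gt;velocity_features&lt;/code&gt;) and every sample value are invented for illustration, and only the field names are taken from the snippet above. It assumes a record may arrive with a missing sentiment and skips it instead of raising.&lt;/p&gt;

```python
from collections import defaultdict
from statistics import mean


def summarize(mentions):
    """Per-keyword mention count and mean sentiment score (None if no scores)."""
    counts = defaultdict(int)
    scores = defaultdict(list)
    for m in mentions:
        key = m["brandKeyword"]
        counts[key] += 1
        sentiment = m.get("sentiment")
        # Assumption: sentiment can be absent; such records count as mentions
        # but do not contribute to the polarity average.
        if isinstance(sentiment, dict) and sentiment.get("score") is not None:
            scores[key].append(sentiment["score"])
    return {
        key: {
            "count": counts[key],
            "avg_polarity": mean(scores[key]) if scores[key] else None,
        }
        for key in counts
    }


def velocity_features(today, week, window_days=7):
    """Today's count minus the window's daily average, plus the polarity shift."""
    empty = {"count": 0, "avg_polarity": None}
    day, wk = summarize(today), summarize(week)
    features = {}
    # Union of keys: a ticker seen in only one window is still reported.
    for key in sorted(set(day) | set(wk)):
        d, w = day.get(key, empty), wk.get(key, empty)
        shift = None
        if d["avg_polarity"] is not None and w["avg_polarity"] is not None:
            shift = d["avg_polarity"] - w["avg_polarity"]
        features[key] = {
            "velocity": d["count"] - w["count"] / window_days,
            "polarity_shift": shift,
        }
    return features


# Invented toy data in the same shape as the Actor records used above.
today = [
    {"brandKeyword": "BABA", "mentionId": "a1", "sentiment": {"score": 0.6}},
    {"brandKeyword": "BABA", "mentionId": "a2", "sentiment": {"score": 0.2}},
    {"brandKeyword": "NIO", "mentionId": "n1", "sentiment": None},
]
week = (
    [{"brandKeyword": "BABA", "mentionId": f"a{i}", "sentiment": {"score": 0.1}} for i in range(7)]
    + [{"brandKeyword": "NIO", "mentionId": f"n{i}", "sentiment": {"score": -0.2}} for i in range(14)]
    + [{"brandKeyword": "PDD", "mentionId": "p1", "sentiment": {"score": 0.3}}]
)

features = velocity_features(today, week)
for ticker, row in features.items():
    print(ticker, row)
```

&lt;p&gt;A ticker that was active during the week but silent today (PDD in the toy data) gets a negative velocity instead of dropping out of the result.&lt;/p&gt;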



&lt;p&gt;20 tickers × daily × 3 platforms comes to about 22K mentions/month, roughly $990/mo at $0.045 per mention. Compare to a single Bloomberg terminal at ~$28K/year for one analyst.&lt;/p&gt;

&lt;h3&gt;
  
  
  e) AI training corpus (~$2,250 one-shot)
&lt;/h3&gt;

&lt;p&gt;50 brand keywords × 1,000 mentions each (5 platforms × 200 mentions per platform) = 50K Chinese-language labeled records for SFT or RLHF corpora. Every record has an explicit sentiment polarity, author follower bracket, and platform. Compare to $15-50K academic licensing fees for comparable annotated Chinese sentiment corpora.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;apify_client&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;ApifyClient&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;

&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;ApifyClient&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;YOUR_APIFY_TOKEN&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;brands&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;华为&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;小米&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;比亚迪&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;蔚来&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;理想汽车&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;拼多多&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;美团&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;完美日记&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;花西子&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;钟薛高&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;元气森林&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;瑞幸咖啡&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;海底捞&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="c1"&gt;# ... 50 total
&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="nf"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;china_sft_corpus.jsonl&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;w&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;encoding&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;utf-8&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;brand&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;brands&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;run&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;actor&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;zhorex/chinese-brand-monitor&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;call&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;run_input&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;brandKeyword&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;brand&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;platforms&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;weibo&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;rednote&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;bilibili&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;douban&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;xueqiu&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;lookbackDays&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;30&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;maxMentionsPerPlatform&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;200&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;sentimentAnalysis&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;})&lt;/span&gt;
        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;m&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dataset&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;run&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;defaultDatasetId&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]).&lt;/span&gt;&lt;span class="nf"&gt;iterate_items&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
            &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;write&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dumps&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;m&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;label&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;m&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;sentiment&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;polarity&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;score&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;m&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;sentiment&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;score&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;platform&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;m&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;platform&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;brand&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;m&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;brandKeyword&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
            &lt;span class="p"&gt;},&lt;/span&gt; &lt;span class="n"&gt;ensure_ascii&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
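&lt;p&gt;Before training on the export, it is worth checking how the labels are distributed, since a heavily skewed corpus usually needs rebalancing. The sketch below is one way to do that. The helper &lt;code&gt;label_distribution&lt;/code&gt; is hypothetical, the sample rows are invented, and the label values (&lt;code&gt;positive&lt;/code&gt;, &lt;code&gt;negative&lt;/code&gt;) are an assumption about what the polarity field contains.&lt;/p&gt;

```python
import io
import json
from collections import Counter


def label_distribution(lines):
    """Count sentiment labels in an iterable of JSONL lines, skipping blanks."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        # Records without a label are tallied separately instead of raising.
        counts[record.get("label", "missing")] += 1
    return counts


# Invented rows in the same shape the export loop writes; the label values
# are assumed, not confirmed Actor output.
sample_rows = [
    {"text": "续航很满意", "label": "positive", "score": 0.8, "platform": "weibo", "brand": "蔚来"},
    {"text": "客服太慢了", "label": "negative", "score": -0.6, "platform": "rednote", "brand": "小米"},
    {"text": "性价比不错", "label": "positive", "score": 0.5, "platform": "bilibili", "brand": "小米"},
]
sample = io.StringIO(
    "".join(json.dumps(row, ensure_ascii=False) + "\n" for row in sample_rows)
)

dist = label_distribution(sample)
total = sum(dist.values())
for label, n in dist.most_common():
    print(f"{label}: {n} ({n / total:.0%})")
```

&lt;p&gt;On the real export, pass the open file instead of the in-memory sample: &lt;code&gt;label_distribution(open("china_sft_corpus.jsonl", encoding="utf-8"))&lt;/code&gt;.&lt;/p&gt;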



&lt;p&gt;50K records × $0.045 = $2,250. One-shot. No annotator contracts, no FTE-month spent labeling.&lt;/p&gt;
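&lt;p&gt;The cost figures quoted for the scenarios above all come from the same multiplication, so they are easy to re-derive for your own volumes. A minimal sketch, assuming the only charge is the $0.045 per deduplicated mention quoted in this article (the helper name &lt;code&gt;cost&lt;/code&gt; and the scenario labels are mine). &lt;code&gt;Decimal&lt;/code&gt; is used so the totals are exact rather than subject to float rounding.&lt;/p&gt;

```python
from decimal import Decimal

# Per deduplicated mention, as quoted in this article.
PRICE_PER_MENTION = Decimal("0.045")


def cost(mentions):
    """Pay-as-you-go cost in USD for a number of deduplicated mentions."""
    return PRICE_PER_MENTION * mentions


# Mention volumes taken from the scenarios above.
scenarios = {
    "c) weekly category keywords, per month": 2_000,
    "d) hedge fund alt-data, per month": 22_000,
    "e) training corpus, one-shot": 50 * 1_000,
}

for name, mentions in scenarios.items():
    print(f"{name}: {mentions:,} mentions = ${cost(mentions):,.2f}")
```

&lt;p&gt;Swap in your own mention counts to budget a different cadence or keyword list.&lt;/p&gt;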

&lt;h3&gt;
  
  
  f) Cross-tool finance signal: Xueqiu sentiment × TradingView price
&lt;/h3&gt;

&lt;p&gt;Pair the Chinese Brand Monitor with &lt;a href="https://apify.com/zhorex/tradingview-scraper" rel="noopener noreferrer"&gt;the TradingView Scraper&lt;/a&gt; for a sentiment-vs-price divergence signal. When Xueqiu retail sentiment turns sharply positive while the price stays flat or drifts down, you have a setup worth a closer look.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;apify_client&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;ApifyClient&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;pandas&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;

&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;ApifyClient&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;YOUR_APIFY_TOKEN&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;sent_run&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;actor&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;zhorex/chinese-brand-monitor&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;call&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;run_input&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;brandKeyword&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;BABA&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;platforms&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;xueqiu&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;lookbackDays&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;7&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;sentimentAnalysis&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;})&lt;/span&gt;
&lt;span class="n"&gt;sent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pd&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;DataFrame&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;list&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dataset&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;sent_run&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;defaultDatasetId&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]).&lt;/span&gt;&lt;span class="nf"&gt;iterate_items&lt;/span&gt;&lt;span class="p"&gt;()))&lt;/span&gt;
&lt;span class="n"&gt;sent_score_7d&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;sent&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;sentiment&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;apply&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;lambda&lt;/span&gt; &lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;score&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]).&lt;/span&gt;&lt;span class="nf"&gt;mean&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="n"&gt;price_run&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;actor&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;zhorex/tradingview-scraper&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;call&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;run_input&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;mode&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;technical_analysis&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;symbols&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;NYSE:BABA&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;includeIndicators&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;})&lt;/span&gt;
&lt;span class="n"&gt;price&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;next&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;iter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dataset&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;price_run&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;defaultDatasetId&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]).&lt;/span&gt;&lt;span class="nf"&gt;iterate_items&lt;/span&gt;&lt;span class="p"&gt;()))&lt;/span&gt;
&lt;span class="n"&gt;perf_week_pct&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;price&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;perfWeek&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="mi"&gt;100&lt;/span&gt;

&lt;span class="c1"&gt;# Positive Xueqiu sentiment minus weekly price return: large positive = retail
# is loud-bullish but the tape hasn't caught up yet.
&lt;/span&gt;&lt;span class="n"&gt;divergence&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;sent_score_7d&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;perf_week_pct&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ticker&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;BABA&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;xueqiu_sentiment_7d&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;round&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;sent_score_7d&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tradingview_perfWeek_pct&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;price&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;perfWeek&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;divergence&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;round&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;divergence&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
&lt;span class="p"&gt;})&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A positive divergence means Xueqiu sentiment is running ahead of the weekly price return: "sentiment positive, price not yet moved." That's the setup quants pay alt-data brokers tens of thousands a year to surface.&lt;/p&gt;

&lt;h2&gt;
  
  
  Normalized output schema
&lt;/h2&gt;

&lt;p&gt;Every record across every platform has this shape:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"mentionId"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"rednote_8b3c2f91a4"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"platform"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"rednote"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"brandKeyword"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Estée Lauder"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"brandMatchType"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"exact"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"content"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"雅诗兰黛小棕瓶用了三个月，肌肤紧致很多..."&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"contentSnippet"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"雅诗兰黛小棕瓶用了三个月..."&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"language"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"zh-CN"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"authorId"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"rednote_user_4429871"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"authorName"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"小琳护肤日记"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"authorFollowerCount"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;184230&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"authorVerified"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"publishedAt"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"2026-05-18T14:23:11Z"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"engagementMetrics"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"likes"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;2104&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"comments"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;187&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"shares"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;56&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"views"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;18430&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"url"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"https://www.xiaohongshu.com/explore/..."&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"mediaUrls"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"https://sns-img-...jpg"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"sentiment"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"polarity"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"positive"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"score"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;0.78&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"method"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"lexicon"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"crossPlatformReposts"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"platform"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"weibo"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"url"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"https://weibo.com/..."&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"publishedAt"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"2026-05-18T15:02:00Z"&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"scrapedAt"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"2026-05-20T08:00:01Z"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Your downstream code stays platform-agnostic. Pandas, BigQuery, Snowflake, ClickHouse: pick your DataFrame library or warehouse and the records load without platform-specific branching.&lt;/p&gt;
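&lt;p&gt;As a rough illustration of loading that shape, here is a minimal stdlib-only sketch that flattens one record into a single-level row. The field names come from the sample record above; the column naming (&lt;code&gt;engagementMetrics_likes&lt;/code&gt;, &lt;code&gt;mediaUrls_count&lt;/code&gt;) and the choice to reduce lists to counts are assumptions for the example, not something the Actor does for you.&lt;/p&gt;

```python
import csv
import io


def flatten_mention(record):
    """Flatten one mention into a single-level dict for a DataFrame or warehouse table."""
    row = {}
    for key, value in record.items():
        if isinstance(value, dict):
            # Nested objects (engagementMetrics, sentiment) become prefixed columns.
            for sub_key, sub_value in value.items():
                row[f"{key}_{sub_key}"] = sub_value
        elif isinstance(value, list):
            # Lists (mediaUrls, crossPlatformReposts) are reduced to a count here.
            row[f"{key}_count"] = len(value)
        else:
            row[key] = value
    return row


# Trimmed copy of the sample record shown above.
sample = {
    "mentionId": "rednote_8b3c2f91a4",
    "platform": "rednote",
    "authorFollowerCount": 184230,
    "engagementMetrics": {"likes": 2104, "comments": 187, "shares": 56, "views": 18430},
    "sentiment": {"polarity": "positive", "score": 0.78, "method": "lexicon"},
    "crossPlatformReposts": [{"platform": "weibo"}],
}

row = flatten_mention(sample)

# Write the flat row as CSV, a format every warehouse listed above can ingest.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=list(row))
writer.writeheader()
writer.writerow(row)
print(buffer.getvalue())
```

&lt;p&gt;If you need the repost URLs rather than a count, a separate child table keyed on &lt;code&gt;mentionId&lt;/code&gt; is the usual alternative.&lt;/p&gt;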

&lt;h2&gt;
  
  
  Pricing
&lt;/h2&gt;

&lt;p&gt;$0.045 per canonical mention, billed only after deduplication. If a KOL reposts the same content across Weibo + RedNote + Bilibili, that's one billable mention with the reposts attached, not three.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Use case&lt;/th&gt;
&lt;th&gt;Volume&lt;/th&gt;
&lt;th&gt;Monthly cost&lt;/th&gt;
&lt;th&gt;Enterprise alternative&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Single brand, daily, 7-day lookback&lt;/td&gt;
&lt;td&gt;~3K/mo&lt;/td&gt;
&lt;td&gt;~$135&lt;/td&gt;
&lt;td&gt;$4K/mo Synthesio seat&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;5-brand agency, daily, sentiment + dedup&lt;/td&gt;
&lt;td&gt;~15K/mo&lt;/td&gt;
&lt;td&gt;~$675&lt;/td&gt;
&lt;td&gt;$24K-80K/year&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;20-ticker hedge fund&lt;/td&gt;
&lt;td&gt;~22K/mo&lt;/td&gt;
&lt;td&gt;~$990&lt;/td&gt;
&lt;td&gt;$28K/year Bloomberg seat&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;AI training corpus one-shot&lt;/td&gt;
&lt;td&gt;50K&lt;/td&gt;
&lt;td&gt;~$2,250 (one-time)&lt;/td&gt;
&lt;td&gt;$15K-50K academic license&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
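&lt;p&gt;The cost column is just volume times the per-mention price. A small sketch to reproduce it, assuming the only charge is the $0.045 per canonical mention quoted above (the function name is made up for the example):&lt;/p&gt;

```python
PRICE_MILLS_PER_MENTION = 45  # $0.045 expressed in tenths of a cent to avoid float drift


def mention_cost_usd(canonical_mentions):
    """Cost in USD for a number of deduplicated (canonical) mentions."""
    return canonical_mentions * PRICE_MILLS_PER_MENTION / 1000


# Volumes taken from the pricing table above.
for label, volume in [
    ("single brand", 3_000),
    ("5-brand agency", 15_000),
    ("20-ticker fund", 22_000),
    ("one-shot corpus", 50_000),
]:
    print(f"{label}: ${mention_cost_usd(volume):,.2f}")
```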

&lt;h2&gt;
  
  
  What this Actor does NOT do
&lt;/h2&gt;

&lt;p&gt;Honest scoping matters more than pitch volume:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Not real-time push streaming.&lt;/strong&gt; Cron-based polling, 5-minute minimum interval. If you need sub-second push, this isn't it.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Not a historical archive.&lt;/strong&gt; Maximum 30-day lookback. For multi-year backfill, you need a different tool.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Not authentication-walled content.&lt;/strong&gt; No Zhihu authenticated answers, no private WeChat groups, no closed Weibo Super Topic posts.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Not a CRM or BI tool.&lt;/strong&gt; This is the data layer. You bring the dashboard.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If those constraints are dealbreakers for your use case, save the credit and don't run it.&lt;/p&gt;

&lt;h2&gt;
  
  
  The broader China stack
&lt;/h2&gt;

&lt;p&gt;The main Actor here is &lt;a href="https://apify.com/zhorex/chinese-brand-monitor" rel="noopener noreferrer"&gt;zhorex/chinese-brand-monitor&lt;/a&gt;, but the rest of the stack exists for cases when you need single-platform depth or a different angle:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;For deeper single-platform RedNote dives — full creator profiles, comment threads, hashtag networks — reach for &lt;a href="https://apify.com/zhorex/rednote-xiaohongshu-scraper" rel="noopener noreferrer"&gt;the standalone RedNote/Xiaohongshu Scraper&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;For Weibo-only bulk pulls — historical hashtag sweeps, single-account timelines, Super Topic posts — &lt;a href="https://apify.com/zhorex/weibo-scraper" rel="noopener noreferrer"&gt;the Weibo Scraper&lt;/a&gt; is the dedicated tool.&lt;/li&gt;
&lt;li&gt;For Bilibili-only deep pulls — video metadata, danmaku, UP主 channel coverage — use &lt;a href="https://apify.com/zhorex/bilibili-scraper" rel="noopener noreferrer"&gt;the Bilibili Scraper&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;For finance-only sentiment with cashtag granularity and reply trees, &lt;a href="https://apify.com/zhorex/xueqiu-scraper" rel="noopener noreferrer"&gt;the Xueqiu Scraper&lt;/a&gt; goes deeper than the brand-monitor surface.&lt;/li&gt;
&lt;li&gt;For long-form review extraction, especially books, films, and lifestyle, &lt;a href="https://apify.com/zhorex/douban-scraper" rel="noopener noreferrer"&gt;the Douban Scraper&lt;/a&gt; handles the review-thread structure.&lt;/li&gt;
&lt;li&gt;For the cross-tool finance workflow above, &lt;a href="https://apify.com/zhorex/tradingview-scraper" rel="noopener noreferrer"&gt;the TradingView Scraper&lt;/a&gt; provides the price half of the sentiment-vs-price divergence signal.&lt;/li&gt;
&lt;li&gt;If you're tracking brand mentions, you usually also want competitor pricing — &lt;a href="https://apify.com/zhorex/jd-scraper" rel="noopener noreferrer"&gt;the JD Scraper&lt;/a&gt; covers the e-commerce price side of the China stack.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Try it
&lt;/h2&gt;

&lt;p&gt;$5 of Apify free credits cover roughly 110 mentions ($5 ÷ $0.045), enough for a small single-brand test run to see whether the output shape fits your downstream code. Start here: &lt;a href="https://apify.com/zhorex/chinese-brand-monitor" rel="noopener noreferrer"&gt;zhorex/chinese-brand-monitor&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;If you build something on top of this — a Looker dashboard, a Slack bot, a Streamlit explorer, a sentiment ETF screen — drop a comment, or open an Issue on the Actor page. Schema customization, missing platforms, follower-bracket additions, new sentiment lexicons — those are the kinds of changes that get prioritized when users ask for them.&lt;/p&gt;

</description>
      <category>python</category>
      <category>webscraping</category>
      <category>china</category>
      <category>analytics</category>
    </item>
    <item>
      <title>Pinnacle odds for $0.01 a snapshot: the OddsJam / Odds API replacement sharp bettors are using in 2026</title>
      <dc:creator>Sami</dc:creator>
      <pubDate>Tue, 19 May 2026 13:54:32 +0000</pubDate>
      <link>https://dev.to/sami_8858131362756585e4f4/pinnacle-odds-for-001-a-snapshot-the-oddsjam-odds-api-replacement-sharp-bettors-are-using-in-3kgl</link>
      <guid>https://dev.to/sami_8858131362756585e4f4/pinnacle-odds-for-001-a-snapshot-the-oddsjam-odds-api-replacement-sharp-bettors-are-using-in-3kgl</guid>
      <description>&lt;p&gt;If you bet sharp lines, the only book that genuinely matters for fair-value is Pinnacle. Every EV model, every CLV report, every "did I beat the close?" check eventually compresses down to one question: what was Pinnacle showing on this market at T-1?&lt;/p&gt;

&lt;p&gt;For years the standard way to get that feed was The Odds API ($249/mo for 15M credits) or OddsJam Gold ($249/mo, $499+ for Pro). For a tipster shop polling 100 fixtures a day that math is tolerable. For a solo bettor running CLV on 20 fixtures it's overspend. For a specials trader it's worse — OddsJam gates futures and yes/no markets behind their highest tier and The Odds API doesn't surface most of them at all.&lt;/p&gt;

&lt;p&gt;There's now an Apify Actor that does the same job pay-per-snapshot:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;zhorex/sports-odds-aggregator&lt;/strong&gt; — Pinnacle h2h + spreads + totals + 5,000+ specials per sport, from $0.01 a snapshot. Datacenter-proxy friendly. No login, no monthly minimum.&lt;/p&gt;

&lt;p&gt;This post is the playbook: four recipes that show exactly how to run it, what each costs, and where the savings show up vs. the SaaS-incumbent pricing.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Quick note on naming: the Actor's title still references Bet365 because Bet365 is the second-book slot, but Bet365's public mobile-web path is under repair as of May 2026. Pinnacle is shipping today. Once Bet365 returns, the cross-book best-price flag (&lt;code&gt;isBestPriceAcrossBooks&lt;/code&gt;) and fuzzy event-matching activate automatically, with no input change on your side.&lt;/p&gt;
&lt;/blockquote&gt;




&lt;h2&gt;
  
  
  Pricing in plain English
&lt;/h2&gt;

&lt;p&gt;Four event types, billed pay-per-event (PPE):&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Event&lt;/th&gt;
&lt;th&gt;Price&lt;/th&gt;
&lt;th&gt;When it fires&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;odds-snapshot-pre-match&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;$0.01 / snapshot&lt;/td&gt;
&lt;td&gt;One market-outcome from a scheduled (not in-play) fixture&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;odds-snapshot-live&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;$0.02 / snapshot&lt;/td&gt;
&lt;td&gt;One market-outcome from a live (in-play) match&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;odds-snapshot-player-prop&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;$0.04 / snapshot&lt;/td&gt;
&lt;td&gt;One special / future / yes-no / team prop / exact-totals row&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;scheduled-run&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;$0.05 / run&lt;/td&gt;
&lt;td&gt;Once per cron tick — often fully offset by the dedup window&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;A typical pre-match fixture with &lt;code&gt;["h2h", "spreads", "totals"]&lt;/code&gt; emits ~7 snapshots (3 h2h outcomes + 2 spreads + 2 totals). Add &lt;code&gt;"specials"&lt;/code&gt; and you get an extra 30–80 rows per fixture — yes/no markets, exact totals, first-team-to-score, winning margin per scoreline, team props.&lt;/p&gt;
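&lt;p&gt;To sanity-check a bill before scheduling anything, the four event prices can be summed directly. This is a minimal sketch, assuming the counts you pass in are already net of deduplication; the example volumes are the ones quoted for the specials recipe later in this post.&lt;/p&gt;

```python
# Prices in cents per event, taken from the pricing table above.
PRICE_CENTS = {
    "odds-snapshot-pre-match": 1,
    "odds-snapshot-live": 2,
    "odds-snapshot-player-prop": 4,
    "scheduled-run": 5,
}


def monthly_cost_usd(event_counts):
    """Sum billed events (already net of deduplication) into a USD figure."""
    total_cents = sum(PRICE_CENTS[name] * count for name, count in event_counts.items())
    return total_cents / 100


# Specials recipe: ~6K specials rows and 180 scheduled runs per month.
specials_sweep = {"odds-snapshot-player-prop": 6_000, "scheduled-run": 180}
print(monthly_cost_usd(specials_sweep))  # 249.0
```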

&lt;p&gt;The bit that turns this from "interesting" to "actually cheap": the &lt;code&gt;deduplicationWindowSeconds&lt;/code&gt; setting suppresses snapshots when the line hasn't moved. On stable mid-week Premier League pre-match polls you are typically billed for 5–15% of "naïve" volume. A 60-second cron on a stable line costs almost nothing in snapshot charges; the per-tick &lt;code&gt;scheduled-run&lt;/code&gt; event in the table above is billed separately.&lt;/p&gt;
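&lt;p&gt;The Actor's exact suppression rule isn't spelled out here, so treat the following as a sketch of one plausible reading rather than its implementation: a poll is dropped when the price is unchanged and the last emitted snapshot for that market is still inside the window. The tuple layout and the function name are assumptions for illustration.&lt;/p&gt;

```python
def billable_snapshots(polls, window_seconds):
    """Return the polls that would be emitted under the assumed dedup rule.

    polls: list of (timestamp_seconds, market_key, price) tuples in time order.
    Assumed rule: a poll is suppressed if the price for that market is unchanged
    and the last emitted snapshot is younger than window_seconds.
    """
    last_emitted = {}  # market_key -> (timestamp, price)
    emitted = []
    for timestamp, market_key, price in polls:
        previous = last_emitted.get(market_key)
        if previous is not None:
            prev_time, prev_price = previous
            unchanged = price == prev_price
            inside_window = window_seconds > timestamp - prev_time
            if unchanged and inside_window:
                continue  # line has not moved: nothing new to bill
        last_emitted[market_key] = (timestamp, price)
        emitted.append((timestamp, market_key, price))
    return emitted


# Four one-minute polls of the same outcome; the price only moves on the last one.
polls = [
    (0, "h2h:home", 1.91),
    (60, "h2h:home", 1.91),
    (120, "h2h:home", 1.91),
    (180, "h2h:home", 1.87),
]
print(len(billable_snapshots(polls, window_seconds=300)))  # 2
print(len(billable_snapshots(polls, window_seconds=0)))    # 4
```

&lt;p&gt;Under this reading, a window of 0 disables suppression entirely, which is why a live firehose configuration bills every poll.&lt;/p&gt;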




&lt;h2&gt;
  
  
  Recipe 1 — Pinnacle closing-line value (CLV) tracker
&lt;/h2&gt;

&lt;p&gt;The recipe that pays for the Actor in its first weekend.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Setup:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;mode: "pre_match_only"&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;Trigger at T-30 minutes and T-1 minute per fixture&lt;/li&gt;
&lt;li&gt;Bet your soft book at T-30, log Pinnacle's T-1 close, compute CLV per ticket&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Pinnacle's closing line is the canonical sharp benchmark. If you're consistently beating Pinnacle's close, your edge is real. If you aren't, you can stop pretending — CLV is the ground truth of whether you're a winning bettor or a noise-trader.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Cost for 80 fixtures/week&lt;/strong&gt; (h2h+spreads+totals, ~7 snapshots × 2 polls each): &lt;strong&gt;~$65/month&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;The Odds API doesn't expose "Pinnacle close at T-1" as a first-class field, so you're paying $249/mo for the feed and still rolling your own snapshot scheduler. Here the snapshot scheduler (Apify cron) and the snapshot itself together come in at ~25% of the price.&lt;/p&gt;
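&lt;p&gt;For the "compute CLV per ticket" step, a common definition is the price you took divided by the closing price, minus one. A minimal sketch using raw decimal odds; it does not strip Pinnacle's margin from the close, which a stricter tracker would do first, and the ticket numbers are illustrative.&lt;/p&gt;

```python
def clv_percent(bet_odds, closing_odds):
    """Closing-line value in percent, using decimal odds.

    Positive means the price you took was better than the close.
    """
    return (bet_odds / closing_odds - 1) * 100


# Illustrative ticket: took 2.10 at the soft book, Pinnacle closed at 2.00.
print(round(clv_percent(2.10, 2.00), 2))  # 5.0
```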




&lt;h2&gt;
  
  
  Recipe 2 — EV-model live edge harvester
&lt;/h2&gt;

&lt;p&gt;The model-on-top use case. If you have a fair-value model and you harvest the moments where &lt;code&gt;book_price × your_fair_value &amp;gt; 1.03&lt;/code&gt; (decimal odds times your model's win probability, i.e. an expected edge above 3%), you want a polling firehose during in-play, not an hourly dump.&lt;/p&gt;
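&lt;p&gt;The threshold check itself is one multiplication. A minimal sketch, assuming &lt;code&gt;book_price&lt;/code&gt; is a decimal price and your model outputs a win probability; both function names are made up for the example.&lt;/p&gt;

```python
def expected_value_multiple(decimal_odds, fair_probability):
    """Expected return per unit staked: 1.0 is break-even."""
    return decimal_odds * fair_probability


def is_harvestable(decimal_odds, fair_probability, threshold=1.03):
    """True when the price clears the edge threshold from the text (3% by default)."""
    return expected_value_multiple(decimal_odds, fair_probability) > threshold


print(is_harvestable(2.20, 0.50))  # True: 2.20 * 0.50 = 1.10
print(is_harvestable(1.95, 0.50))  # False: 1.95 * 0.50 = 0.975
```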

&lt;p&gt;&lt;strong&gt;Setup:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"books"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"pinnacle"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"sports"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"basketball"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"tennis"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"soccer"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"marketTypes"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"h2h"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"spreads"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"totals"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"mode"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"live_only"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"deduplicationWindowSeconds"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Schedule:&lt;/strong&gt; 60-second cron during target match windows.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Volume:&lt;/strong&gt; ~50K live snapshots/month × $0.02 = $1,000, plus about $80 of &lt;code&gt;scheduled-run&lt;/code&gt; orchestration ≈ &lt;strong&gt;$1,080 / month&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;That looks pricey until you put it next to OddsJam Pro at $499+/mo for a SaaS API you don't control and that throttles by tier. The trade is: you pay more per request, but you pay only for what you actually consume, you set the cadence, and once you set a non-zero &lt;code&gt;deduplicationWindowSeconds&lt;/code&gt; a stable line costs you nothing in snapshot charges.&lt;/p&gt;

&lt;p&gt;The other thing the SaaS won't sell you: every snapshot includes &lt;code&gt;isLive&lt;/code&gt;, &lt;code&gt;matchClock&lt;/code&gt;, and &lt;code&gt;matchScore&lt;/code&gt;. Your model doesn't have to join against a separate scoreboard feed during a live NBA fourth quarter.&lt;/p&gt;




&lt;h2&gt;
  
  
  Recipe 3 — Specials sniper (the OddsJam gating trick)
&lt;/h2&gt;

&lt;p&gt;For value bettors and exact-totals modellers. This is the recipe where the pricing gap gets embarrassing.&lt;/p&gt;

&lt;p&gt;Pinnacle's &lt;code&gt;withSpecials=true&lt;/code&gt; matchups call returns &lt;strong&gt;~5,000 markets per major sport&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;First Team To Score (3-way)&lt;/li&gt;
&lt;li&gt;Win to Nil 1st Half (yes/no)&lt;/li&gt;
&lt;li&gt;Exact Total Goals 1st Half (multi-way)&lt;/li&gt;
&lt;li&gt;Winning Margin per scoreline&lt;/li&gt;
&lt;li&gt;A long tail of team props and player-related markets&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These are the markets soft books are slowest to sharpen up on — which is where the actual edge lives. OddsJam gates futures and props behind their highest tier. The Odds API doesn't surface most of them.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Setup:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"books"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"pinnacle"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"sports"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"soccer"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"marketTypes"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"specials"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"mode"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"pre_match_only"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"deduplicationWindowSeconds"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;60&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Schedule:&lt;/strong&gt; 4-hour cron during the season.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Volume:&lt;/strong&gt; ~6K specials snapshots × $0.04 + 180 runs × $0.05 ≈ &lt;strong&gt;$250 / month&lt;/strong&gt; for the segment that powers the largest EV pockets in retail sports betting.&lt;/p&gt;

&lt;p&gt;A pattern that works: filter the dataset to &lt;code&gt;marketType == "specials" &amp;amp;&amp;amp; impliedProbability &amp;lt; 0.10&lt;/code&gt;. Pinnacle longshots priced above 10.0 in decimal odds (implied probability under 10%) with sharp money backing are where the soft-book mispricings concentrate.&lt;/p&gt;
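&lt;p&gt;In Python that filter might look like the sketch below. It assumes each dataset item is a dict carrying the &lt;code&gt;marketType&lt;/code&gt; and &lt;code&gt;impliedProbability&lt;/code&gt; fields named above; the sample rows are invented for the example.&lt;/p&gt;

```python
def longshot_specials(items, max_implied_probability=0.10):
    """Keep specials rows whose implied probability is below the cutoff."""
    selected = []
    for item in items:
        probability = item.get("impliedProbability")
        # Skip non-specials rows and rows with no probability attached.
        if item.get("marketType") != "specials" or probability is None:
            continue
        if max_implied_probability > probability:
            selected.append(item)
    return selected


rows = [
    {"marketType": "specials", "impliedProbability": 0.08},
    {"marketType": "specials", "impliedProbability": 0.35},
    {"marketType": "h2h", "impliedProbability": 0.05},
]
print(longshot_specials(rows))
```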




&lt;h2&gt;
  
  
  Recipe 4 — Tipster Discord auto-poster
&lt;/h2&gt;

&lt;p&gt;The cheapest one and the easiest to sell to a small operation.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Setup:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;sports: ["soccer"]&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;leagueFilter: ["UEFA", "EPL", "La Liga"]&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;mode: "pre_match_only"&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;Every 6 hours, webhook → Discord&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Cost:&lt;/strong&gt; ~$30/month for a daily top-10 spreads + totals digest piped straight into the channel. If you currently screenshot OddsJam into Discord by hand, this is the upgrade.&lt;/p&gt;
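&lt;p&gt;The webhook-to-Discord hop can be a few lines of stdlib Python. This is a sketch, not the Actor's built-in behavior: the row keys (&lt;code&gt;event&lt;/code&gt;, &lt;code&gt;market&lt;/code&gt;, &lt;code&gt;price&lt;/code&gt;) are hypothetical placeholders for whatever your dataset rows contain, and the webhook URL is yours to supply. Discord webhooks accept a JSON body with a &lt;code&gt;content&lt;/code&gt; string.&lt;/p&gt;

```python
import json
import urllib.request


def build_digest_payload(rows, limit=10):
    """Format the first `limit` rows as a Discord message body."""
    # The keys used here are hypothetical; map them to your real dataset fields.
    lines = [f"{row['event']} | {row['market']} | {row['price']}" for row in rows[:limit]]
    return {"content": "\n".join(lines)}


def post_to_discord(webhook_url, payload):
    """POST the payload to a Discord webhook URL as JSON and return the HTTP status."""
    request = urllib.request.Request(
        webhook_url,
        data=json.dumps(payload).encode("utf-8"),
        # An explicit User-Agent avoids the default urllib one being rejected.
        headers={"Content-Type": "application/json", "User-Agent": "odds-digest/0.1"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return response.status


sample_rows = [{"event": "Team A v Team B", "market": "totals", "price": 1.95}]
print(build_digest_payload(sample_rows))
# Uncomment with a real webhook URL to send:
# post_to_discord("YOUR_DISCORD_WEBHOOK_URL", build_digest_payload(sample_rows))
```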




&lt;h2&gt;
  
  
  The pay-per-event math in one table
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Workflow&lt;/th&gt;
&lt;th&gt;Volume&lt;/th&gt;
&lt;th&gt;Monthly cost&lt;/th&gt;
&lt;th&gt;Replaces&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Casual bettor: daily 9am pre-match dump, 30 fixtures&lt;/td&gt;
&lt;td&gt;~900 snapshots&lt;/td&gt;
&lt;td&gt;~$11&lt;/td&gt;
&lt;td&gt;$59/mo Odds API tier&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;CLV tracker: T-30 + T-1, 80 fixtures/wk&lt;/td&gt;
&lt;td&gt;~3.2K snapshots&lt;/td&gt;
&lt;td&gt;~$65&lt;/td&gt;
&lt;td&gt;$249/mo OddsJam Gold&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Tipster shop: 100 fixtures × 7 outcomes, hourly&lt;/td&gt;
&lt;td&gt;~21K snapshots&lt;/td&gt;
&lt;td&gt;~$245&lt;/td&gt;
&lt;td&gt;$249/mo OddsJam Gold (parity)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Specials trader: soccer sweep every 4 hours&lt;/td&gt;
&lt;td&gt;~6K snapshots&lt;/td&gt;
&lt;td&gt;~$250&lt;/td&gt;
&lt;td&gt;Highest-tier gate (not available below)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;EV model: live NBA + tennis + soccer, 60s cron&lt;/td&gt;
&lt;td&gt;~50K live snapshots&lt;/td&gt;
&lt;td&gt;~$1,080&lt;/td&gt;
&lt;td&gt;OddsJam Pro ($499+/mo); you control cadence&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Flat-rate SaaS wins only once you cross ~150K snapshots/month of stable workload. Below that — which is most solo sharps, most tipster operations, and every specials trader — PPE is just cheaper, and the cost curve is linear in actual usage rather than tier-jumpy.&lt;/p&gt;
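&lt;p&gt;One way to reason about where that crossover sits, offered as a back-of-envelope sketch and not the calculation behind the figure above: take a flat fee, divide by the per-snapshot price to get billed snapshots, then scale up by the share of naive polls that survive deduplication. The inputs below ($249/mo, $0.01 pre-match snapshots, 15% billed) are assumptions drawn from numbers quoted earlier in this post, and &lt;code&gt;scheduled-run&lt;/code&gt; charges are ignored.&lt;/p&gt;

```python
def break_even_naive_snapshots(flat_fee_usd, price_cents_per_snapshot, billed_percent):
    """Naive (pre-dedup) monthly snapshot volume at which PPE spend equals a flat fee."""
    # Billed snapshots that the flat fee would buy at the per-snapshot price.
    billed_snapshots = flat_fee_usd * 100 / price_cents_per_snapshot
    # Scale up: only billed_percent of naive polls are actually charged.
    return billed_snapshots * 100 / billed_percent


# $249/mo flat fee, $0.01 pre-match snapshots, 15% of naive volume billed.
print(round(break_even_naive_snapshots(249, 1, 15)))  # 166000
```

&lt;p&gt;Live or specials pricing, or a dedup window of 0, moves the crossover much lower, so rerun the numbers with your own mix.&lt;/p&gt;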

&lt;p&gt;The other PPE advantage that quietly compounds: there's no annual contract. Off-season for a sport? Cron stops, billing stops. You don't pay for unused capacity while a league is on its summer break.&lt;/p&gt;




&lt;h2&gt;
  
  
  Three steps to a running cron
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Step 1 — Pick your sports and markets.&lt;/strong&gt;&lt;br&gt;
Sport defaults are &lt;code&gt;["soccer", "tennis"]&lt;/code&gt;, the two highest-liquidity sharp markets year-round. For CLV add &lt;code&gt;"spreads", "totals"&lt;/code&gt; to &lt;code&gt;marketTypes&lt;/code&gt;. For specials sniping add &lt;code&gt;"specials"&lt;/code&gt;. The full sport list is 11 deep (soccer, tennis, basketball, MMA, baseball, hockey, esports, AFL, NFL/college, golf, rugby).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 2 — Run once with default input&lt;/strong&gt; and verify Pinnacle returns data for your sport+league pick. Output lands in your Apify dataset.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Step 3 — Save as Task → Schedules → New Schedule&lt;/strong&gt; with the cron string you want:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight conf"&gt;&lt;code&gt;*/&lt;span class="m"&gt;5&lt;/span&gt; * * * *    &lt;span class="n"&gt;pre&lt;/span&gt;-&lt;span class="n"&gt;match&lt;/span&gt; &lt;span class="n"&gt;every&lt;/span&gt; &lt;span class="m"&gt;5&lt;/span&gt; &lt;span class="n"&gt;minutes&lt;/span&gt;
* * * * *      &lt;span class="n"&gt;live&lt;/span&gt; &lt;span class="n"&gt;every&lt;/span&gt; &lt;span class="n"&gt;minute&lt;/span&gt; &lt;span class="n"&gt;during&lt;/span&gt; &lt;span class="n"&gt;target&lt;/span&gt; &lt;span class="n"&gt;windows&lt;/span&gt;
&lt;span class="m"&gt;0&lt;/span&gt; &lt;span class="m"&gt;9&lt;/span&gt; * * *      &lt;span class="n"&gt;daily&lt;/span&gt; &lt;span class="m"&gt;9&lt;/span&gt;&lt;span class="n"&gt;am&lt;/span&gt; &lt;span class="n"&gt;pre&lt;/span&gt;-&lt;span class="n"&gt;match&lt;/span&gt; &lt;span class="n"&gt;dump&lt;/span&gt;
&lt;span class="m"&gt;0&lt;/span&gt; */&lt;span class="m"&gt;6&lt;/span&gt; * * *    &lt;span class="n"&gt;every&lt;/span&gt; &lt;span class="m"&gt;6&lt;/span&gt; &lt;span class="n"&gt;hours&lt;/span&gt;
&lt;span class="m"&gt;0&lt;/span&gt; &lt;span class="m"&gt;10&lt;/span&gt; * * &lt;span class="m"&gt;6&lt;/span&gt;     &lt;span class="n"&gt;Saturday&lt;/span&gt; &lt;span class="n"&gt;morning&lt;/span&gt; &lt;span class="n"&gt;weekly&lt;/span&gt; &lt;span class="n"&gt;audit&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Attach a webhook to the schedule and ship the dataset into your EV pipeline, Discord/Slack bot, Sheets workbook, or wherever your model lives.&lt;/p&gt;




&lt;h2&gt;
  
  
  Python in 12 lines
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;apify_client&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;ApifyClient&lt;/span&gt;

&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;ApifyClient&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;YOUR_APIFY_TOKEN&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;run&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;actor&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;zhorex/sports-odds-aggregator&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;call&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;run_input&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;books&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;pinnacle&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;sports&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;soccer&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tennis&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;marketTypes&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;h2h&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;spreads&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;totals&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;specials&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;mode&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;pre_match_only&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;maxEventsPerSport&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;50&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;deduplicationWindowSeconds&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;30&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;})&lt;/span&gt;

&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;snapshot&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dataset&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;run&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;defaultDatasetId&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]).&lt;/span&gt;&lt;span class="nf"&gt;iterate_items&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;snapshot&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;marketType&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;specials&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="ow"&gt;and&lt;/span&gt; &lt;span class="n"&gt;snapshot&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;impliedProbability&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="mf"&gt;0.10&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="nf"&gt;evaluate_for_bet&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;snapshot&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's the whole integration; &lt;code&gt;evaluate_for_bet&lt;/code&gt; stands in for your own model hook. Every snapshot arrives in a flat per-market-outcome shape, and every row carries &lt;code&gt;priceAmerican&lt;/code&gt;, &lt;code&gt;priceFractional&lt;/code&gt;, &lt;code&gt;price&lt;/code&gt; (decimal), &lt;code&gt;impliedProbability&lt;/code&gt;, and &lt;code&gt;isBestPriceAcrossBooks&lt;/code&gt;, so your model doesn't have to do format gymnastics or join against a separate American-odds conversion table.&lt;/p&gt;
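&lt;p&gt;If you want to sanity-check those price fields against each other, the relationships are simple arithmetic. Below is a minimal sketch using the standard odds-conversion formulas, not the Actor's own code. The function names are mine, and the rounding to whole American prices is an assumption:&lt;/p&gt;

```python
from fractions import Fraction


def implied_probability(decimal_odds: float) -> float:
    """Raw implied probability (bookmaker margin included) from decimal odds."""
    if not decimal_odds > 1.0:
        raise ValueError("decimal odds must be greater than 1.0")
    return 1.0 / decimal_odds


def decimal_to_american(decimal_odds: float) -> int:
    """Decimal odds to the nearest whole American price (rounding is an assumption)."""
    if not decimal_odds > 1.0:
        raise ValueError("decimal odds must be greater than 1.0")
    if decimal_odds >= 2.0:
        # Underdog or even money: profit on a 100 stake.
        return round((decimal_odds - 1.0) * 100)
    # Favourite: stake needed to win 100, shown as a negative number.
    return round(-100.0 / (decimal_odds - 1.0))


def american_to_fractional(american: int) -> Fraction:
    """American price to fractional odds (profit / stake), reduced to lowest terms."""
    if american == 0:
        raise ValueError("0 is not a valid American price")
    if american > 0:
        return Fraction(american, 100)
    return Fraction(100, -american)


if __name__ == "__main__":
    print(decimal_to_american(1.91))
    print(round(implied_probability(1.91), 5))
    print(american_to_fractional(-110))
```

&lt;p&gt;Applied to the sample row in the next section, a decimal price of 1.91 maps to -110 and a raw implied probability of about 0.52356, and -110 maps to 10/11. The implied probability here still includes the bookmaker's margin.&lt;/p&gt;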




&lt;h2&gt;
  
  
  What a snapshot looks like
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"snapshotId"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"a1b2c3d4e5f6789012345678"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"book"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"pinnacle"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"sport"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"soccer"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"league"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Premier League"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"homeTeam"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Manchester City"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"awayTeam"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Liverpool"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"commenceTime"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"2026-05-22T19:00:00Z"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"isLive"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"marketType"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"h2h"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"outcomeKey"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"home"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"outcomeLabel"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Home"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"price"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;1.91&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"priceAmerican"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;-110&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"priceFractional"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"10/11"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"impliedProbability"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;0.52356&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"isBestPriceAcrossBooks"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"scrapedAt"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"2026-05-18T14:32:00Z"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;snapshotId&lt;/code&gt; is a stable sha1 derived from book+event+market+outcome+timestamp, so it makes a clean primary key if you're persisting into Postgres / DuckDB.&lt;/p&gt;
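&lt;p&gt;If you build derived rows in your own pipeline and want the same kind of deterministic key, something like the following works. This is a sketch only: the field order, the &lt;code&gt;|&lt;/code&gt; separator, the &lt;code&gt;event_key&lt;/code&gt; argument, and the truncation to 24 hex characters (chosen to match the length of the sample above) are all my assumptions, not the Actor's actual recipe:&lt;/p&gt;

```python
import hashlib


def snapshot_key(book: str, event_key: str, market_type: str,
                 outcome_key: str, timestamp: str, length: int = 24) -> str:
    """Deterministic sha1-based key: identical inputs always give the same key.

    The separator, field order, and truncation length are illustrative
    assumptions, not the Actor's implementation.
    """
    parts = [book, event_key, market_type, outcome_key, timestamp]
    payload = "|".join(parts).encode("utf-8")
    return hashlib.sha1(payload).hexdigest()[:length]


if __name__ == "__main__":
    # Hypothetical event key built from the sample snapshot's teams and kickoff.
    event = "Manchester City vs Liverpool @ 2026-05-22T19:00:00Z"
    print(snapshot_key("pinnacle", event, "h2h", "home", "2026-05-18T14:32:00Z"))
```

&lt;p&gt;Because the key is a pure function of its inputs, re-inserting the same row is a natural fit for an upsert on the primary key.&lt;/p&gt;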




&lt;h2&gt;
  
  
  For high-volume operations
&lt;/h2&gt;

&lt;p&gt;If your monthly burn is past 50K snapshots and you need a dedicated polling cadence, custom market types (Asian handicap quarter-lines, derivative props, fancy bets), or a schema SLA for a downstream production pipeline, the Actor page has an "Enterprise inquiry" pointer. Webhook integrations, dedicated proxy pools, and custom dataset views ship in roughly a week. Operations with sustained seven-figure betting volume can ask about a dedicated instance.&lt;/p&gt;

&lt;p&gt;For everyone else, the default Apify Proxy works on Pinnacle's guest API: Pinnacle's public surface tolerates datacenter IPs by design, which is why it's on the supported-books list to begin with. If your plan includes datacenter proxies, set &lt;code&gt;apifyProxyGroups: ["DATACENTER"]&lt;/code&gt; and your proxy cost drops to roughly 5% of what a residential-default scraper pays.&lt;/p&gt;




&lt;h2&gt;
  
  
  Things worth knowing before you run it
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Not US bookmakers.&lt;/strong&gt; DraftKings / FanDuel / BetMGM / Caesars / ESPN BET are geo-gated behind Akamai and need a US residential proxy, which kills the per-snapshot economics. Other Apify Actors target those; this one stays out.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Personal-analysis use only.&lt;/strong&gt; Pinnacle's TOS forbids commercial redistribution of raw odds. The architecture is per-buyer-execution — you run it in your own Apify account against your own polling cadence. Don't resell the feed.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Not a streaming WebSocket feed.&lt;/strong&gt; Poll-based, fastest meaningful cadence ~60s.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Bet365 returns when Bet365 returns.&lt;/strong&gt; Cross-book best-price flag and fuzzy event-matching are already in the codebase; the day a second book ships, arb infra activates without an input change.&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Where to start
&lt;/h2&gt;

&lt;p&gt;If you currently pay The Odds API or OddsJam Gold $249/mo for the Pinnacle column, the cheapest experiment is:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Spin up the Actor with the default input.&lt;/li&gt;
&lt;li&gt;Run it on five of your usual fixtures.&lt;/li&gt;
&lt;li&gt;Compare the snapshots against whatever your incumbent feed gave you for the same fixtures.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The break-even comes faster than you'd expect — most workflows under 150K snapshots/month earn back the SaaS subscription inside the first month, and the dedup window keeps marginal cost near zero on stable lines.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Actor link:&lt;/strong&gt; &lt;a href="https://apify.com/zhorex/sports-odds-aggregator" rel="noopener noreferrer"&gt;apify.com/zhorex/sports-odds-aggregator&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;If the Actor saves you a month of OddsJam Gold, the single highest-leverage thing you can do back is a 30-second review on the Actor page — it directly funds the next defensive patch when the books shift their schemas.&lt;/p&gt;

&lt;p&gt;Roadmap is public: Smarkets adapter (v0.4) reactivates cross-book arb infra, Pinnacle alternate-lines / period markets (v0.5) opens half/quarter handicap decomposition, Betfair Exchange BYO-credentials (v0.6), WebSocket mode (v0.7), automatic arb finder (v0.8).&lt;/p&gt;

</description>
      <category>apify</category>
      <category>pinnacle</category>
      <category>sportsbetting</category>
      <category>scraping</category>
    </item>
    <item>
      <title>$0.005 per Weibo post — the Chinese social data layer Western teams keep skipping</title>
      <dc:creator>Sami</dc:creator>
      <pubDate>Sun, 17 May 2026 16:54:46 +0000</pubDate>
      <link>https://dev.to/sami_8858131362756585e4f4/0005-per-weibo-post-the-chinese-social-data-layer-western-teams-keep-skipping-699</link>
      <guid>https://dev.to/sami_8858131362756585e4f4/0005-per-weibo-post-the-chinese-social-data-layer-western-teams-keep-skipping-699</guid>
      <description>&lt;p&gt;I shipped a Weibo scraper on Apify eight months ago. Fifteen customers pay for it now, another thirty-four use the free tier, and in the last sixteen days they pulled 136,400 posts through it. I built it because every Western social-listening tool I evaluated (Synthesio, Brandwatch, Meltwater) quoted four to five figures a year for China coverage that was thinner than what you get from one tuned Apify run.&lt;/p&gt;

&lt;p&gt;The whole pitch is one number: &lt;strong&gt;$0.005 per post.&lt;/strong&gt; Pay only for items you actually take. The Apify free plan covers your first ~1,000 mentions before you spend a cent.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://apify.com/zhorex/weibo-scraper" rel="noopener noreferrer"&gt;Weibo Scraper on Apify Store →&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  What's actually in the box
&lt;/h2&gt;

&lt;p&gt;Four modes. All return normalized JSON. No Weibo login. No API key from Weibo. No VPN.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;hot_search&lt;/code&gt; — the live hot-topics list, i.e. what 580M+ monthly active users are looking at right now. The single most-watched signal in Chinese social.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;search&lt;/code&gt; — keyword search across public posts. Brand names, ticker symbols, product launches, Chinese or English.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;post_comments&lt;/code&gt; — every public comment on a given post. Sentiment grenades and viral crises live here.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;user_posts&lt;/code&gt; — full posting history of any public account. KOL vetting, executive watch, competitor monitoring.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Output is flat JSON — post text, author handle, timestamp, repost / comment / like counts, media URLs. Push it straight into a warehouse, a Pandas DataFrame, or a Slack alert with a 30-line script.&lt;/p&gt;

&lt;h2&gt;
  
  
  What people actually pay for
&lt;/h2&gt;

&lt;p&gt;I see what runs every day on this actor. The patterns paying customers settle into:&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Equity / sentiment signal on China-listed names — ~$25/day, ~$750/month
&lt;/h3&gt;

&lt;p&gt;A small fund or research desk covering BABA, NIO, PDD, BILI, JD, BEKE, LI, XPEV, KWEB constituents, or any China-exposed Western name. Scheduled &lt;code&gt;search&lt;/code&gt; over 30-50 tickers and brand names, ~5,000 posts a day, fed into a sentiment model. Sentiment shifts on Weibo lead the Hong Kong open by hours. Dedicated enterprise social-listening contracts that even attempt China coverage start near $30K/year, and most don't index Weibo deeply.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Brand monitoring for Western brands in China — ~$15/day, ~$450/month
&lt;/h3&gt;

&lt;p&gt;A consumer brand with China exposure (Apple, Tesla, Nike, Starbucks, LVMH, Lululemon, any DTC brand on Tmall) needs ~3,000 mentions/day on brand and product-line keywords. Comments mode catches crisis posts before they trend. Synthesio / Brandwatch / Talkwalker contracts that include China typically run $30K-$100K/year. The same daily mention stream here runs about $5,400/year.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. KOL / influencer due diligence — ~$1 per KOL
&lt;/h3&gt;

&lt;p&gt;Before you wire 50,000-200,000 RMB to a Weibo influencer for a sponsorship, run &lt;code&gt;user_posts&lt;/code&gt; against the handle. Look at posting cadence, real engagement (not vanity follower counts), brand affinity history, controversy flags. One avoided bad deal pays for years of usage.&lt;/p&gt;

&lt;h3&gt;
  
  
  4. AI / LLM training data — ~1M posts = $5,000
&lt;/h3&gt;

&lt;p&gt;Real-world, conversational, dialect-rich Mandarin from public posts. Filtered Weibo subsets sell on data marketplaces for $20K-$50K and ship stale by months. Pull fresh data on the topics and time windows you care about, own the pipeline, and the per-post cost is a small fraction of either marketplace data or annotator-collected datasets.&lt;/p&gt;

&lt;h3&gt;
  
  
  5. M&amp;amp;A and pre-deal diligence on Chinese targets — $200-$500 one-off
&lt;/h3&gt;

&lt;p&gt;A pre-LOI sentiment pull on a Chinese target: employee chatter, customer complaints, founder reputation, Glassdoor-equivalent venting. Boutique diligence firms bill $25K-$75K for the equivalent exercise. As a banker or consultant, even a "$500 in cost, $30K invoice" framing is a 60x markup the client is happy to pay for.&lt;/p&gt;

&lt;h3&gt;
  
  
  6. Crisis monitoring / hourly brand watch — ~$50/month
&lt;/h3&gt;

&lt;p&gt;Schedule a six-times-a-day run on brand keywords. &lt;code&gt;hot_search&lt;/code&gt; catches a viral crisis the moment it crosses into the public consciousness — typically a 4-12 hour head start on Western media coverage. For a brand worth eight figures, that gap is the difference between "managed" and "case study."&lt;/p&gt;

&lt;h3&gt;
  
  
  7. Macro / consumer-trend reading from the hot list — ~$5/day
&lt;/h3&gt;

&lt;p&gt;The hot search list is the cheapest macro signal in Chinese markets. Tariff reactions, regulatory rumblings, viral consumer products, celebrity scandals that wreck brand deals — all surface here first. Hedge fund quants, geopolitical analysts, and morning-brief writers all bake this in.&lt;/p&gt;

&lt;h2&gt;
  
  
  The number that matters: $0.005 per item
&lt;/h2&gt;

&lt;p&gt;You pay per item returned. No subscription, no surprise overage.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;1,000 mentions: &lt;strong&gt;$5&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;10,000 mentions: &lt;strong&gt;$50&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;100,000 mentions: &lt;strong&gt;$500&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The Apify free plan gives you ~$5/month in platform credit, which covers your first ~1,000 mentions on this actor. You validate the data fits your use case before you spend a cent.&lt;/p&gt;
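&lt;p&gt;To budget a workload before you run it, the arithmetic is a one-liner. Here is a small sketch; the $0.005 price and the ~$5 free credit come from this post, while the function names and the 30-day month are my own simplifications:&lt;/p&gt;

```python
from decimal import Decimal

PRICE_PER_ITEM = Decimal("0.005")  # USD per returned item, as quoted in this post
FREE_CREDIT = Decimal("5")         # approximate monthly free-plan credit in USD


def run_cost(items: int) -> Decimal:
    """Cost in USD of a run that returns `items` items."""
    return PRICE_PER_ITEM * items


def monthly_cost(items_per_day: int, days: int = 30) -> Decimal:
    """Cost in USD of a recurring workload, assuming a flat 30-day month."""
    return run_cost(items_per_day * days)


def items_covered(credit: Decimal = FREE_CREDIT) -> int:
    """How many items a given credit balance pays for."""
    return int(credit / PRICE_PER_ITEM)


if __name__ == "__main__":
    for volume in (1_000, 10_000, 100_000):
        print(volume, "items:", run_cost(volume), "USD")
    print("5,000 posts/day:", monthly_cost(5_000), "USD/month")
    print("free credit covers", items_covered(), "items")
```

&lt;p&gt;Plugging in the use cases above, 5,000 posts a day works out to $750 a month and 3,000 a day to $450, which matches the figures quoted for the equity and brand-monitoring workloads.&lt;/p&gt;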

&lt;p&gt;&lt;a href="https://apify.com/zhorex/weibo-scraper" rel="noopener noreferrer"&gt;Start on the free plan →&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Quick start
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;apify_client&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;ApifyClient&lt;/span&gt;

&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;ApifyClient&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;&amp;lt;YOUR_APIFY_TOKEN&amp;gt;&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Pull 1,000 posts mentioning Tesla. $5 flat.
&lt;/span&gt;&lt;span class="n"&gt;run&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;actor&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;zhorex/weibo-scraper&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;call&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;run_input&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;mode&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;search&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;searchQuery&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;特斯拉&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;maxResults&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;1000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;})&lt;/span&gt;

&lt;span class="c1"&gt;# Stream the results
&lt;/span&gt;&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;post&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dataset&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;run&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;defaultDatasetId&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]).&lt;/span&gt;&lt;span class="nf"&gt;iterate_items&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;post&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;createdAt&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;post&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;repostsCount&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;post&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;][:&lt;/span&gt;&lt;span class="mi"&gt;120&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Same shape works for &lt;code&gt;hot_search&lt;/code&gt;, &lt;code&gt;post_comments&lt;/code&gt;, &lt;code&gt;user_posts&lt;/code&gt;. Swap the &lt;code&gt;mode&lt;/code&gt; and the input keys to whatever the run takes. The exact input schema lives on the actor page.&lt;/p&gt;
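&lt;p&gt;Once the items are flowing, most alerting logic is a filter and a sort. A hedged sketch follows: it relies only on the &lt;code&gt;createdAt&lt;/code&gt;, &lt;code&gt;repostsCount&lt;/code&gt;, and &lt;code&gt;text&lt;/code&gt; fields used in the snippet above, the helper names and the 1,000-repost threshold are hypothetical, and the sample records are made up for illustration:&lt;/p&gt;

```python
from typing import Iterable, List


def viral_posts(posts: Iterable[dict], min_reposts: int = 1000,
                limit: int = 10) -> List[dict]:
    """Keep posts with at least `min_reposts` reposts, most-reposted first."""
    def reposts(post: dict) -> int:
        # Treat a missing or null count as zero.
        return int(post.get("repostsCount") or 0)

    hits = [post for post in posts if reposts(post) >= min_reposts]
    hits.sort(key=reposts, reverse=True)
    return hits[:limit]


def alert_line(post: dict, width: int = 120) -> str:
    """One-line summary suitable for a chat alert or a log."""
    text = (post.get("text") or "")[:width]
    return f'{post.get("createdAt", "?")} | {post.get("repostsCount", 0)} reposts | {text}'


if __name__ == "__main__":
    # Made-up records; in practice pass client.dataset(...).iterate_items().
    sample = [
        {"createdAt": "2026-05-17T08:00:00Z", "repostsCount": 50, "text": "quiet post"},
        {"createdAt": "2026-05-17T09:00:00Z", "repostsCount": 2500, "text": "big post"},
        {"createdAt": "2026-05-17T10:00:00Z", "repostsCount": 1200, "text": "medium post"},
    ]
    for post in viral_posts(sample):
        print(alert_line(post))
```

&lt;p&gt;Both helpers take plain dicts, so the same code works on a live dataset iterator or on a JSON export.&lt;/p&gt;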

&lt;h2&gt;
  
  
  Where the real money lives: recurring runs
&lt;/h2&gt;

&lt;p&gt;One-shot pulls are fine for a diligence assignment. The customers who actually extract serious value from this are the ones running it on a schedule. Apify Schedules takes a cron expression and a saved input — the actor runs forever, the dataset accumulates, and you download it as JSON, CSV, or Excel.&lt;/p&gt;

&lt;p&gt;The math gets compelling fast. Below is what my heaviest recurring customers actually run:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Use case&lt;/th&gt;
&lt;th&gt;Cron expression&lt;/th&gt;
&lt;th&gt;Approx. monthly cost&lt;/th&gt;
&lt;th&gt;What it replaces&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Morning hot-search dump for the daily brief&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;0 9 * * 1-5&lt;/code&gt; (Asia/Shanghai)&lt;/td&gt;
&lt;td&gt;~$15&lt;/td&gt;
&lt;td&gt;A junior analyst's 30-min daily task&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Brand mentions, every two hours&lt;/td&gt;
&lt;td&gt;&lt;code&gt;0 */2 * * *&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;~$450&lt;/td&gt;
&lt;td&gt;$30K/yr Brandwatch contract for China only&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Equity tickers, hourly (the highest-ROI cron on this list)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;0 * * * *&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;~$750&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;A $120K/yr China sentiment analyst, half-replicated&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Crisis watch, every 30 minutes&lt;/td&gt;
&lt;td&gt;&lt;code&gt;*/30 * * * *&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;~$1,500&lt;/td&gt;
&lt;td&gt;A 24/7 PR monitoring agency contract&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Overnight KOL sweep on 200 handles&lt;/td&gt;
&lt;td&gt;&lt;code&gt;0 2 * * *&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;~$60&lt;/td&gt;
&lt;td&gt;$5K/mo influencer-vetting subscription&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
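&lt;p&gt;To translate a cron cadence and a monthly budget into a per-run item cap (for example, a value for &lt;code&gt;maxResults&lt;/code&gt;), you can count how often the schedule fires. The sketch below is my own simplified helper, not an Apify feature: it reads only the minute and hour fields, ignores day-of-month, month, and day-of-week restrictions, and assumes a 30-day month at $0.005 per item:&lt;/p&gt;

```python
def expand_field(field: str, low: int, high: int) -> set:
    """Expand one cron field; supports *, */n, a, a-b, a-b/n and comma lists."""
    values = set()
    for part in field.split(","):
        step = 1
        if "/" in part:
            part, step_text = part.split("/")
            step = int(step_text)
        if part == "*":
            start, end = low, high
        elif "-" in part:
            start_text, end_text = part.split("-")
            start, end = int(start_text), int(end_text)
        else:
            start = end = int(part)
        values.update(range(start, end + 1, step))
    return values


def runs_per_day(cron: str) -> int:
    """Fires per day on a day the schedule is active (date fields are ignored)."""
    minute, hour, _day_of_month, _month, _day_of_week = cron.split()
    return len(expand_field(minute, 0, 59)) * len(expand_field(hour, 0, 23))


def items_per_run(monthly_budget_usd: int, cron: str, days: int = 30) -> int:
    """Largest whole per-run item count that stays within the monthly budget."""
    items_per_dollar = 200  # 1 / 0.005, the per-item price quoted in this post
    total_runs = runs_per_day(cron) * days
    return (monthly_budget_usd * items_per_dollar) // total_runs


if __name__ == "__main__":
    for budget, cron in ((450, "0 */2 * * *"), (750, "0 * * * *"), (1500, "*/30 * * * *")):
        print(cron, "fires", runs_per_day(cron), "times/day, cap", items_per_run(budget, cron))
```

&lt;p&gt;For the every-two-hours brand row, a $450 budget spread over 12 runs a day leaves room for 250 items per run.&lt;/p&gt;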

&lt;p&gt;&lt;strong&gt;Set the cron once, walk away, get paid in compounded insight.&lt;/strong&gt; The customers running hourly equity cron jobs have been doing it for months without touching the config — the actor runs, the data lands, the alpha shows up in their dashboards. That's the only mode of use that actually justifies the time you invested learning the schema.&lt;/p&gt;

&lt;p&gt;If you take one thing from this post: &lt;strong&gt;don't run it manually twice — wire the second run into a cron.&lt;/strong&gt; The actor was built for that, the pricing was designed for that, and that's where every customer who renewed went.&lt;/p&gt;

&lt;h2&gt;
  
  
  Reliability and what happens when things break
&lt;/h2&gt;

&lt;p&gt;You pay per item. If the actor returns nothing on a run, you pay nothing. If it returns 327 items, you pay for 327. That alignment is the whole reason I picked per-event pricing instead of a monthly subscription — my incentive to keep the thing working is exactly your incentive that it works.&lt;/p&gt;

&lt;p&gt;I monitor the actor daily. When something upstream changes, I ship a fix within hours, not weeks. The Apify Store rating and issue history on the actor page are public.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why I priced it like this
&lt;/h2&gt;

&lt;p&gt;I shipped this eight months ago. By month two it was profitable. The most recent sixteen-day window: fifteen paying customers, 136,400 items returned, $697 revenue, $675 profit, 96.79% margin. The margin isn't there because the work is trivial — it's there because per-event pricing means I only earn when the data is actually delivered.&lt;/p&gt;

&lt;p&gt;If you're evaluating Chinese social tools and the lowest quote you can get is $20K+, run a 1,000-mention probe through this actor first. You'll know inside ten minutes whether the data covers your use case. Worst case, you spend $5. Then wire it into a cron and forget about it.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://apify.com/zhorex/weibo-scraper" rel="noopener noreferrer"&gt;Try it on the free plan →&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Other Chinese-platform actors I run
&lt;/h2&gt;

&lt;p&gt;Weibo is the macro signal layer for China. These cover the rest of the surface area:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://apify.com/zhorex/rednote-xiaohongshu-scraper" rel="noopener noreferrer"&gt;Xiaohongshu / RED scraper&lt;/a&gt; — lifestyle, beauty, female-skewing audience. The #1 platform for DTC brand launches in China.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://apify.com/zhorex/bilibili-scraper" rel="noopener noreferrer"&gt;Bilibili scraper&lt;/a&gt; — long-form video, Gen Z, gaming / anime / tech vertical signal.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://apify.com/zhorex/douban-scraper" rel="noopener noreferrer"&gt;Douban scraper&lt;/a&gt; — books, films, music, niche communities. The most "honest" review platform in China.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://apify.com/zhorex/xueqiu-scraper" rel="noopener noreferrer"&gt;Xueqiu scraper&lt;/a&gt; — retail-trader-heavy financial discussion. Equity-desk supplement to Weibo.&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://apify.com/zhorex/chinese-brand-monitor" rel="noopener noreferrer"&gt;Chinese Brand Monitor&lt;/a&gt; — composite brand signal across the platforms above.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;All use the same pricing model. Pay per item. Schedule freely.&lt;/p&gt;

&lt;h2&gt;
  
  
  Compliance posture
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Only &lt;strong&gt;public&lt;/strong&gt; Weibo posts. No private accounts, no DMs, no content behind a login wall.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;No login bypass.&lt;/strong&gt; The actor does not log into Weibo on your behalf, and does not need an account to function.&lt;/li&gt;
&lt;li&gt;Optional cookies are &lt;strong&gt;user-supplied&lt;/strong&gt; and only raise your personal rate limit. They are never required for the actor to work.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If your use case requires private data, this actor isn't it — and frankly nothing on the Apify Store will be.&lt;/p&gt;

&lt;p&gt;If you actually run something interesting with it, leave a comment or open an issue on the actor page — I read all of them.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://apify.com/zhorex/weibo-scraper" rel="noopener noreferrer"&gt;apify.com/zhorex/weibo-scraper&lt;/a&gt;&lt;/p&gt;

</description>
      <category>webscraping</category>
      <category>apify</category>
      <category>python</category>
      <category>china</category>
    </item>
    <item>
      <title>Track brand mentions across China's top 5 social platforms in one API call — $0.045 per mention</title>
      <dc:creator>Sami</dc:creator>
      <pubDate>Sat, 16 May 2026 10:36:57 +0000</pubDate>
      <link>https://dev.to/sami_8858131362756585e4f4/i-built-5-single-platform-scrapers-the-one-that-sells-fastest-is-the-aggregator-that-wraps-them-2pli</link>
      <guid>https://dev.to/sami_8858131362756585e4f4/i-built-5-single-platform-scrapers-the-one-that-sells-fastest-is-the-aggregator-that-wraps-them-2pli</guid>
      <description>&lt;p&gt;If your brand competes for Chinese consumers and you're not actively monitoring conversations on Weibo, RedNote, Bilibili, Douban, and Xueqiu, you're flying blind in the world's second-largest consumer market.&lt;/p&gt;

&lt;p&gt;The problem is that the "enterprise" way to do this (Synthesio, Brandwatch, Talkwalker) starts at &lt;strong&gt;$50,000 per year&lt;/strong&gt; for Chinese platform coverage, with annual contracts, locked-in seats, and a sales cycle measured in weeks. So most mid-market teams just… don't. They monitor English-language Twitter for their global brand, see a dip in APAC revenue a quarter later, and have no leading signal explaining why.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://apify.com/zhorex/chinese-brand-monitor" rel="noopener noreferrer"&gt;&lt;strong&gt;Chinese Brand Monitor&lt;/strong&gt;&lt;/a&gt; launches today on the &lt;a href="https://apify.com" rel="noopener noreferrer"&gt;Apify Store&lt;/a&gt; to fix that. One API call, five platforms, normalized output, &lt;strong&gt;$0.045 per mention. No subscription. No annual contract. No minimum spend. Run it once, run it daily, run it hourly — you only pay for the mentions you actually pull.&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  The 5 platforms in one call
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Platform&lt;/th&gt;
&lt;th&gt;What it captures&lt;/th&gt;
&lt;th&gt;Why it matters for brand monitoring&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Weibo (微博)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Public microblog mentions, KOL posts, hot search trending&lt;/td&gt;
&lt;td&gt;China's Twitter. 580M+ users. Where consumer crises break first and where KOL endorsements reach hundreds of millions in hours.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;RedNote / Xiaohongshu (小红书)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Lifestyle and consumer brand notes, first-person product reviews&lt;/td&gt;
&lt;td&gt;300M+ users. The single highest-trust channel for Chinese consumer purchase decisions in beauty, skincare, fashion, food, travel.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Bilibili (B站)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Video titles, descriptions, and creator mentions&lt;/td&gt;
&lt;td&gt;China's YouTube. 300M+ users. Where Gen Z consumer brand affinity is built and where unboxing / review culture lives.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Douban (豆瓣)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Movie / book / music subject mentions, brand tie-ins, soundtracks, branded titles&lt;/td&gt;
&lt;td&gt;200M+ users. Long-form opinion-rich content — the densest source of detailed consumer attitude data outside Zhihu.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Xueqiu (雪球)&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Stock cashtag and corporate mentions for listed brands&lt;/td&gt;
&lt;td&gt;20M+ users, financial-grade signal. Critical if your brand is publicly listed (NYSE:BABA, NASDAQ:JD, HK:00700, A-share tickers) — finance KOLs move retail sentiment fast.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;These five platforms cover the full spectrum: broad public opinion (Weibo), high-trust consumer reviews (RedNote), Gen Z video sentiment (Bilibili), long-form opinion (Douban), and investor sentiment (Xueqiu). For most consumer brands, monitoring any 3 of these on an hourly or daily schedule is a leading indicator that beats your CRM dashboards by 2-6 weeks.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why each mention is worth $0.045
&lt;/h2&gt;

&lt;p&gt;It's tempting to look at $0.045 and compare it to "free" Twitter mentions. That's the wrong comparison. The right comparison is: &lt;strong&gt;what would it cost you to get one Chinese consumer mention, normalized, sentiment-tagged, and cross-platform-deduplicated, any other way?&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Synthesio / Brandwatch / Talkwalker enterprise seat&lt;/strong&gt;: $50K+/year minimum. Cost per mention at typical 100K mentions/year volume: $0.50. &lt;strong&gt;11× the cost of this Actor&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Hire a Chinese-speaking VA to manually check 5 platforms&lt;/strong&gt;: ~$15/hour, ~30 mentions/hour effective. Cost per mention: $0.50. Same 11× cost, plus 24-48 hour latency.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Build it yourself&lt;/strong&gt;: 5 separate scrapers, 5 different output schemas to parse, dedup logic, sentiment classifier, ongoing maintenance every time a platform changes their frontend. Conservatively 60-100 engineering hours upfront and 10-20 hours/month ongoing. At $150/hr loaded engineer cost, that's $9K-$15K to build + $1.5K-$3K/month to maintain. &lt;strong&gt;Break-even vs this Actor: never, unless you're pulling &amp;gt;100K mentions/month.&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;
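&lt;p&gt;The arithmetic behind those three bullets can be checked with a few lines of Python. This is a minimal sketch that uses only the figures quoted above; the helper names are illustrative, and the break-even helper assumes the upfront build cost is ignored, so it gives a lower bound rather than a full comparison.&lt;/p&gt;

```python
# Per-mention cost of each option, using only the figures quoted in the list above.
ACTOR_PRICE = 0.045  # USD per deduplicated mention


def cost_per_mention(total_cost_usd, mentions):
    """Average cost of one mention for a given spend and volume."""
    return total_cost_usd / mentions


def diy_break_even_mentions(monthly_maintenance_usd):
    """Monthly volume at which Actor fees equal DIY maintenance alone.

    Assumption: the upfront build cost is ignored, so this is a lower bound.
    """
    return monthly_maintenance_usd / ACTOR_PRICE


enterprise = cost_per_mention(50_000, 100_000)  # $50K/year at 100K mentions/year
manual_va = cost_per_mention(15, 30)            # $15/hour at 30 mentions/hour

print(f"enterprise seat: ${enterprise:.2f}/mention ({enterprise / ACTOR_PRICE:.1f}x the Actor)")
print(f"manual VA:       ${manual_va:.2f}/mention ({manual_va / ACTOR_PRICE:.1f}x the Actor)")
for maintenance in (1_500, 3_000):
    volume = diy_break_even_mentions(maintenance)
    print(f"DIY maintenance ${maintenance}/mo matches Actor fees at ~{volume:,.0f} mentions/mo")
```

&lt;p&gt;Amortizing the $9K-$15K build cost on top of maintenance pushes the DIY break-even volume higher still.&lt;/p&gt;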

&lt;p&gt;Every mention you pull at $0.045 buys you: the raw text, the author identity (handle, follower count, verified flag), the engagement metrics (likes, comments, shares), the timestamp, the URL, media URLs, language detection, lexicon-based sentiment scoring, and — if dedup is enabled — a &lt;code&gt;crossPlatformReposts&lt;/code&gt; array showing exactly which other platforms amplified the same content. That's a record your competitive intelligence analyst would gladly take and immediately put into a deck.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The more you run it, the more leverage you get from each run.&lt;/strong&gt; A single weekly snapshot tells you nothing about velocity. A daily run shows you trends. An hourly run during a crisis or product launch shows you the inflection point in real time — which is when one mention is worth $4.50 to your PR team, not $0.045.&lt;/p&gt;

&lt;h2&gt;
  
  
  8 concrete use cases (run it like this)
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. Daily brand health dashboard
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Workflow&lt;/strong&gt;: Cron daily at 8am local time, single brand keyword, 7-day lookback, sentiment + dedup enabled. Push the canonical mentions to your BI tool (Looker / Metabase / Hex / Sigma) for a stacked-by-platform sentiment chart, follower-weighted reach total, and top-10 highest-engagement mention list. Run for 30 days, you have a baseline. Run for 90 days, you have a leading-indicator dashboard your CMO will check daily.&lt;br&gt;
&lt;strong&gt;Volume&lt;/strong&gt;: 100 mentions/day × 30 days = 3,000/mo. &lt;strong&gt;Cost: ~$135/mo per brand.&lt;/strong&gt;&lt;br&gt;
&lt;strong&gt;Compare to&lt;/strong&gt;: $4,000/mo for a Synthesio seat covering the same platforms.&lt;/p&gt;
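&lt;p&gt;As a rough illustration of the dashboard inputs, the sketch below groups mentions by platform and computes a reach total plus a follower-weighted sentiment score. It assumes &lt;code&gt;sentiment.score&lt;/code&gt; is signed (negative values for negative mentions), which the sample output does not confirm; the function names are hypothetical and not part of the Actor.&lt;/p&gt;

```python
from collections import defaultdict


def platform_summary(mentions):
    """Per-platform mention count, follower reach, and follower-weighted sentiment.

    Assumption: sentiment["score"] is signed, with negative values for negative mentions.
    """
    buckets = defaultdict(lambda: {"mentions": 0, "reach": 0, "weighted": 0.0})
    for m in mentions:
        bucket = buckets[m["platform"]]
        followers = m.get("authorFollowerCount") or 0
        bucket["mentions"] += 1
        bucket["reach"] += followers
        bucket["weighted"] += followers * m["sentiment"]["score"]
    summary = {}
    for platform, b in buckets.items():
        avg = b["weighted"] / b["reach"] if b["reach"] else 0.0
        summary[platform] = {
            "mentions": b["mentions"],
            "reach": b["reach"],
            "weightedSentiment": round(avg, 3),
        }
    return summary


def top_by_engagement(mentions, n=10):
    """Highest-liked mentions first, capped at n records."""
    return sorted(mentions, key=lambda m: m["engagementMetrics"]["likes"], reverse=True)[:n]
```

&lt;p&gt;Feed both functions the dataset items from each daily run and chart the results in whichever BI tool you already use.&lt;/p&gt;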
&lt;h3&gt;
  
  
  2. Crisis monitoring (hourly polling with Slack alerts)
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Workflow&lt;/strong&gt;: Cron every hour, 1-day lookback, sentiment-enabled, filter for &lt;code&gt;sentiment.polarity == "negative"&lt;/code&gt; AND &lt;code&gt;authorFollowerCount &amp;gt; 10000&lt;/code&gt;. Pipe matching records to a Slack webhook that pings #pr-alerts. When an account with more than 10,000 followers posts a negative mention, your PR team knows within about 60 minutes, versus 24-72 hours via Google Alerts or "someone forwarded it to me."&lt;br&gt;
&lt;strong&gt;Volume&lt;/strong&gt;: Quiet hours return 0-5 mentions; budgeting ~200 mentions/day to absorb spikes gives 6,000/mo. &lt;strong&gt;Cost: ~$270/mo per brand.&lt;/strong&gt; A single prevented PR crisis pays for the entire year.&lt;/p&gt;
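&lt;p&gt;A minimal sketch of that filter and alert step, assuming the record fields shown in the output schema. The function names and the message wording are illustrative; the webhook call relies on Slack incoming webhooks accepting a JSON body with a &lt;code&gt;text&lt;/code&gt; field, and the webhook URL is something you supply.&lt;/p&gt;

```python
import json
import urllib.request


def is_alert(mention, min_followers=10_000):
    """True for negative mentions from accounts above the follower threshold."""
    followers = mention.get("authorFollowerCount") or 0
    return mention["sentiment"]["polarity"] == "negative" and followers > min_followers


def slack_payload(mention):
    """Build the JSON body for a Slack incoming webhook."""
    text = (
        f"Negative mention on {mention['platform']} by {mention['authorName']} "
        f"({mention['authorFollowerCount']:,} followers): {mention['url']}"
    )
    return {"text": text}


def post_to_slack(webhook_url, payload):
    """POST the payload to the webhook and return the HTTP status code."""
    request = urllib.request.Request(
        webhook_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return response.status


def alerts_for(mentions):
    """Payloads for every mention in a run that crosses the alert threshold."""
    return [slack_payload(m) for m in mentions if is_alert(m)]
```

&lt;p&gt;Call &lt;code&gt;post_to_slack&lt;/code&gt; once per payload returned by &lt;code&gt;alerts_for&lt;/code&gt; after each hourly run.&lt;/p&gt;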
&lt;h3&gt;
  
  
  3. Pre-launch competitor intelligence (one-off pull)
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Workflow&lt;/strong&gt;: Before launching a new SKU in China, pull 30 days of mentions on each competitor brand keyword across all 5 platforms. Look at: which platforms each competitor over-indexes on, which KOLs are talking about them, what sentiment dominates, what product attributes get the most positive vs negative mentions. Run this once a quarter on 5 competitors and you have the best competitive intel deck in the room.&lt;br&gt;
&lt;strong&gt;Volume&lt;/strong&gt;: 5 competitors × 500 mentions each = 2,500 mentions per pull. &lt;strong&gt;Cost: ~$112 per quarterly pull.&lt;/strong&gt;&lt;/p&gt;
&lt;h3&gt;
  
  
  4. KOL identification and vetting
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Workflow&lt;/strong&gt;: Weekly run on your category keyword (e.g. "护肤" for skincare, "球鞋" for sneakers). Pull 500 mentions/week. Filter the output for &lt;code&gt;authorVerified == true&lt;/code&gt; AND &lt;code&gt;authorFollowerCount &amp;gt; 50000&lt;/code&gt;. Sort by &lt;code&gt;engagementMetrics.likes&lt;/code&gt; descending. Top 20 results = your candidate KOL list for the week, scored by actual cultural reach not by paid impressions. Compare against your influencer agency's recommendations.&lt;br&gt;
&lt;strong&gt;Volume&lt;/strong&gt;: 500 mentions/week × 4 weeks = 2,000 mentions/mo. &lt;strong&gt;Cost: ~$90/mo per category.&lt;/strong&gt;&lt;/p&gt;
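&lt;p&gt;The filter-and-sort step might look like the sketch below. It assumes the field names from the output schema; keeping only the best-performing post per author is an extra choice made here so one prolific account cannot fill the shortlist, not something the Actor does for you.&lt;/p&gt;

```python
def kol_shortlist(mentions, min_followers=50_000, top_n=20):
    """Verified, high-follower authors ranked by likes, one entry per author."""
    candidates = [
        m for m in mentions
        if m.get("authorVerified") is True
        and (m.get("authorFollowerCount") or 0) > min_followers
    ]
    candidates.sort(key=lambda m: m["engagementMetrics"]["likes"], reverse=True)

    seen_authors = set()
    shortlist = []
    for m in candidates:
        if m["authorName"] in seen_authors:
            continue  # keep only the highest-liked post for each author
        seen_authors.add(m["authorName"])
        shortlist.append(m)
    return shortlist[:top_n]
```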
&lt;h3&gt;
  
  
  5. China-watcher hedge fund alt-data signal
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Workflow&lt;/strong&gt;: Daily run on each portfolio ticker that has Chinese consumer exposure (BABA, JD, PDD, BIDU, NIO, BYD, ANTA, Yum China, POP MART, etc.). Pull mentions from Xueqiu (financial sentiment) + Weibo (consumer sentiment) + RedNote (brand affinity for consumer brands). Build a sentiment-velocity feature: 7-day mention count delta + sentiment polarity shift. Backtest against earnings surprises and brand event days — Chinese consumer sentiment leads Western analyst consensus by 2-6 weeks for most consumer-facing names.&lt;br&gt;
&lt;strong&gt;Volume&lt;/strong&gt;: 20 tickers × 50 mentions/day = 1,000/day × 22 trading days = 22,000/mo. &lt;strong&gt;Cost: ~$990/mo.&lt;/strong&gt; Compare to: a single Bloomberg China consumer alt-data feed subscription, $80K-$200K/year minimum.&lt;/p&gt;
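&lt;p&gt;One possible way to compute the sentiment-velocity feature is sketched below. It assumes you have already aggregated each ticker's mentions into one &lt;code&gt;(mention_count, mean_sentiment_score)&lt;/code&gt; pair per day with a signed score; that input shape and the function name are illustrative choices, not part of the Actor's output.&lt;/p&gt;

```python
def sentiment_velocity(daily, window=7):
    """Compare the latest window of days against the window before it.

    daily: list of (mention_count, mean_sentiment_score) tuples, oldest first.
    Returns the change in mention count and the shift in mention-weighted sentiment.
    """
    if 2 * window > len(daily):
        raise ValueError("need at least two full windows of daily data")

    recent = daily[-window:]
    prior = daily[-2 * window:-window]

    def total(days):
        return sum(count for count, _ in days)

    def weighted_mean(days):
        n = total(days)
        return sum(count * score for count, score in days) / n if n else 0.0

    return {
        "mentionDelta": total(recent) - total(prior),
        "polarityShift": round(weighted_mean(recent) - weighted_mean(prior), 4),
    }
```

&lt;p&gt;A rising &lt;code&gt;mentionDelta&lt;/code&gt; paired with a negative &lt;code&gt;polarityShift&lt;/code&gt; is the kind of combination you would then backtest against event days.&lt;/p&gt;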
&lt;h3&gt;
  
  
  6. AI / LLM training data corpus
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Workflow&lt;/strong&gt;: One-time bulk pull on a diverse set of brand keywords across consumer categories. 50 brands × 1,000 mentions each = 50K labeled Chinese-language consumer text records with explicit sentiment labels. Drop into your SFT or RLHF pipeline for Chinese-language consumer-domain fine-tuning. This is the densest source of brand-grounded labeled Chinese text outside of paid academic corpora.&lt;br&gt;
&lt;strong&gt;Volume&lt;/strong&gt;: 50,000 mentions one-time. &lt;strong&gt;Cost: ~$2,250 one-time.&lt;/strong&gt; Compare to: licensing a comparable academic corpus from Trinity College Dublin or Tsinghua, $15K-$50K per corpus, single-use license, 6-month delivery.&lt;/p&gt;
&lt;h3&gt;
  
  
  7. Cross-platform virality discovery (run it weekly, look at the dedup array)
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Workflow&lt;/strong&gt;: Weekly run, dedup ENABLED, scan the output for canonical records where &lt;code&gt;crossPlatformReposts.length &amp;gt;= 2&lt;/code&gt;. Those are mentions that spread across multiple platforms within 24 hours — the closest thing to a "viral" signal you can extract from raw mention data. Use it to identify breakout moments before they hit mainstream Chinese media.&lt;br&gt;
&lt;strong&gt;Volume&lt;/strong&gt;: 500 mentions/week × 4 weeks = 2,000/mo. &lt;strong&gt;Cost: ~$90/mo per brand.&lt;/strong&gt; Most viral moments cost $50K-$200K in PR services to capitalize on; this is how you find them 48-72 hours earlier than the agency.&lt;/p&gt;
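&lt;p&gt;Scanning the dedup array takes only a few lines. This sketch assumes &lt;code&gt;crossPlatformReposts&lt;/code&gt; is a list of objects with a &lt;code&gt;platform&lt;/code&gt; key, as in the sample output; the function names are illustrative.&lt;/p&gt;

```python
def viral_mentions(mentions, min_reposts=2):
    """Canonical records amplified on at least min_reposts other platforms."""
    hits = [
        m for m in mentions
        if len(m.get("crossPlatformReposts") or []) >= min_reposts
    ]
    return sorted(hits, key=lambda m: len(m["crossPlatformReposts"]), reverse=True)


def spread_platforms(mention):
    """Origin platform followed by every platform the content was reposted on."""
    reposts = mention.get("crossPlatformReposts") or []
    return [mention["platform"]] + [r["platform"] for r in reposts]
```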
&lt;h3&gt;
  
  
  8. Multi-brand portfolio monitoring (agency workflow)
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Workflow&lt;/strong&gt;: One Actor run per client brand, scheduled daily via Apify Schedules. 10 clients × 500 mentions/brand/day = 5,000 mentions/day. Each client gets their own dataset and dashboard. The agency bills $2K-$5K/client/month for "China monitoring," delivers a custom dashboard, and the underlying data cost is ~$675/client/month. That leaves a gross margin of roughly 66% at the $2K fee and 86% at the $5K fee.&lt;br&gt;
&lt;strong&gt;Volume&lt;/strong&gt;: 5,000 mentions/day × 30 days = 150,000/mo. &lt;strong&gt;Cost: ~$6,750/mo for 10 brands.&lt;/strong&gt; Revenue at $3K/client × 10 = $30K/mo. &lt;strong&gt;Gross margin: 77.5%.&lt;/strong&gt;&lt;/p&gt;
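&lt;p&gt;To rerun the agency math with your own client count and fee, a small helper is enough. This sketch uses the $0.045 price and the volumes quoted above; the function name is illustrative, and it counts data cost only, not staff time or BI tooling.&lt;/p&gt;

```python
PRICE_PER_MENTION = 0.045  # USD per canonical mention


def agency_margin(clients, mentions_per_client_per_day, fee_per_client, days=30):
    """Monthly data cost, revenue, and gross margin (data cost only)."""
    cost = clients * mentions_per_client_per_day * days * PRICE_PER_MENTION
    revenue = clients * fee_per_client
    return {
        "dataCost": round(cost, 2),
        "revenue": revenue,
        "grossMargin": round((revenue - cost) / revenue, 4),
    }


for fee in (2_000, 3_000, 5_000):
    result = agency_margin(clients=10, mentions_per_client_per_day=500, fee_per_client=fee)
    print(f"fee ${fee}/client: cost ${result['dataCost']:,.0f}, margin {result['grossMargin']:.1%}")
```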
&lt;h2&gt;
  
  
  What the output looks like
&lt;/h2&gt;

&lt;p&gt;Every mention is normalized to the same schema regardless of platform. Here's a real Weibo mention from a test run on the Chinese sportswear brand 李宁 (Li-Ning):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"mentionId"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"weibo_4923475823745"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"platform"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"weibo"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"brandKeyword"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"李宁"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"brandMatchType"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"exact"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"content"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"李宁的新款跑鞋质量真不错，比之前的耐克舒服多了！"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"language"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"zh-CN"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"authorName"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"运动达人"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"authorFollowerCount"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;12500&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"authorVerified"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"publishedAt"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"2026-05-15T14:32:11+00:00"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"engagementMetrics"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"likes"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;234&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"comments"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;18&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"shares"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;42&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"url"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"https://weibo.com/1234567890/4923475823745"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"sentiment"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"polarity"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"positive"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"score"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;0.72&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"crossPlatformReposts"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"platform"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"rednote"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"url"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"https://www.xiaohongshu.com/explore/abc123"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Three fields buyers tell me they care about most:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;sentiment&lt;/code&gt;&lt;/strong&gt; — lexicon-based Chinese sentiment scoring on every mention. Polarity (positive / neutral / negative) plus a numeric score. Disable it if you have your own pipeline; enabled by default.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;crossPlatformReposts&lt;/code&gt;&lt;/strong&gt; — the same viral post often appears across Weibo and RedNote within hours. The aggregator detects this with SimHash similarity and merges duplicates into the canonical record, with the repost paths preserved. &lt;strong&gt;You don't pay twice for the same mention&lt;/strong&gt;, and you get a free virality signal.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;authorFollowerCount&lt;/code&gt; / &lt;code&gt;authorVerified&lt;/code&gt;&lt;/strong&gt; — the difference between a 200-follower throwaway account and a 1.2M-follower verified KOL is the difference between "ignore this" and "alert the C-suite." Follower-weighting your dashboard is the first thing every serious buyer does with the data.&lt;/li&gt;
&lt;/ul&gt;
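&lt;p&gt;For readers unfamiliar with SimHash, here is a generic sketch of the technique. It is not the Actor's implementation: the character-bigram shingles, the MD5-derived 64-bit hashes, and the Hamming-distance threshold of 3 are all assumptions chosen for illustration.&lt;/p&gt;

```python
import hashlib

BITS = 64


def shingles(text, k=2):
    """Character k-grams; suits Chinese text, which has no word spaces."""
    chars = [c for c in text if not c.isspace()]
    if k >= len(chars):
        return ["".join(chars)]
    return ["".join(chars[i:i + k]) for i in range(len(chars) - k + 1)]


def simhash(text):
    """64-bit fingerprint: each shingle votes on every bit position."""
    votes = [0] * BITS
    for shingle in shingles(text):
        digest = hashlib.md5(shingle.encode("utf-8")).digest()
        value = int.from_bytes(digest[:8], "big")
        for i in range(BITS):
            votes[i] += 1 if (value >> i) % 2 else -1
    return sum(2 ** i for i in range(BITS) if votes[i] > 0)


def hamming(a, b):
    """Number of bit positions where two fingerprints differ."""
    return bin(a ^ b).count("1")


def is_near_duplicate(text_a, text_b, max_distance=3):
    """Treat two texts as reposts when their fingerprints nearly match."""
    return max_distance >= hamming(simhash(text_a), simhash(text_b))
```

&lt;p&gt;Near-identical posts produce fingerprints that differ in only a few bits, so a lightly edited repost still maps onto its canonical record.&lt;/p&gt;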

&lt;h2&gt;
  
  
  Quick start
&lt;/h2&gt;

&lt;p&gt;The input is brutally simple. This is a complete config:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"brandKeyword"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"李宁"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"platforms"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"weibo"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"bilibili"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"rednote"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"douban"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"xueqiu"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"maxMentionsPerPlatform"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;200&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"lookbackDays"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;7&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"sentimentAnalysis"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"deduplication"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;One brand keyword, one call, get back a normalized stream of mentions across all five platforms. The &lt;code&gt;lookbackDays&lt;/code&gt; filter applies per platform so you only get fresh content; &lt;code&gt;deduplication&lt;/code&gt; collapses cross-platform reposts; &lt;code&gt;sentimentAnalysis&lt;/code&gt; tags every record.&lt;/p&gt;
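&lt;p&gt;If you would rather call the Actor from code than from the Apify console, the sketch below uses the &lt;code&gt;apify-client&lt;/code&gt; Python package. It assumes the package is installed and that your token lives in an &lt;code&gt;APIFY_TOKEN&lt;/code&gt; environment variable; the helper names are illustrative, and the input keys mirror the config above.&lt;/p&gt;

```python
import os

ACTOR_ID = "zhorex/chinese-brand-monitor"
ALL_PLATFORMS = ["weibo", "bilibili", "rednote", "douban", "xueqiu"]


def build_input(brand_keyword, platforms=None, max_per_platform=200, lookback_days=7):
    """Assemble the Actor input using the keys from the quick-start config."""
    return {
        "brandKeyword": brand_keyword,
        "platforms": platforms or list(ALL_PLATFORMS),
        "maxMentionsPerPlatform": max_per_platform,
        "lookbackDays": lookback_days,
        "sentimentAnalysis": True,
        "deduplication": True,
    }


def fetch_mentions(client, run_input):
    """Run the Actor, wait for it to finish, and return its dataset items."""
    run = client.actor(ACTOR_ID).call(run_input=run_input)
    return list(client.dataset(run["defaultDatasetId"]).iterate_items())


def main():
    # Imported here so the helpers above can be used without the package installed.
    from apify_client import ApifyClient

    client = ApifyClient(os.environ["APIFY_TOKEN"])
    mentions = fetch_mentions(client, build_input("李宁"))
    print(f"pulled {len(mentions)} canonical mentions")
```

&lt;p&gt;Call &lt;code&gt;main()&lt;/code&gt; to trigger a run; note that this executes the Actor and is billed per mention like any other run.&lt;/p&gt;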

&lt;p&gt;&lt;strong&gt;To run it on a schedule&lt;/strong&gt; (which is where the real value compounds): use &lt;a href="https://docs.apify.com/platform/schedules" rel="noopener noreferrer"&gt;Apify Schedules&lt;/a&gt; and set a cron expression. &lt;code&gt;0 8 * * *&lt;/code&gt; for daily 8am runs, &lt;code&gt;0 * * * *&lt;/code&gt; for hourly, &lt;code&gt;*/15 * * * *&lt;/code&gt; for every 15 minutes during a launch or crisis. Each run hits the same Actor with your saved input config, pushes to the same dataset, and bills only on the new canonical mentions. &lt;strong&gt;The Actor is built to be run thousands of times — that's how you go from "snapshot" to "monitoring system."&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;If you have cookies for any of the platforms (logged-in browser session) you can pass them under &lt;code&gt;cookieStrings&lt;/code&gt; to unlock higher recall and rate limits. Cookies are optional — the actor degrades gracefully without them.&lt;/p&gt;

&lt;p&gt;For deeper single-platform scraping (full comment trees, infinite scroll, profile enrichment), use the dedicated single-platform actors directly:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://apify.com/zhorex/weibo-scraper" rel="noopener noreferrer"&gt;Weibo Scraper&lt;/a&gt; — posts, hot search, comments, profiles&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://apify.com/zhorex/rednote-xiaohongshu-scraper" rel="noopener noreferrer"&gt;RedNote (Xiaohongshu) Scraper&lt;/a&gt; — notes, comments, profiles, video&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://apify.com/zhorex/douban-scraper" rel="noopener noreferrer"&gt;Douban Scraper&lt;/a&gt; — long-form reviews and group discussions&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://apify.com/zhorex/xueqiu-scraper" rel="noopener noreferrer"&gt;Xueqiu Scraper&lt;/a&gt; — ticker-tagged posts, KOL tracking&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://apify.com/zhorex/jd-scraper" rel="noopener noreferrer"&gt;JD.com Scraper&lt;/a&gt; — product detail extraction&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The aggregator is for &lt;strong&gt;recurring&lt;/strong&gt; cross-platform brand monitoring with normalized output. The single-platform scrapers are for &lt;strong&gt;one-off&lt;/strong&gt; deep extraction inside one platform.&lt;/p&gt;

&lt;h2&gt;
  
  
  What this is NOT
&lt;/h2&gt;

&lt;p&gt;Being honest about scope is more useful than vague promises:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Not real-time streaming.&lt;/strong&gt; Poll-based — 5-15 minute effective refresh is realistic. If you need millisecond latency, this isn't it.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Not WeChat coverage.&lt;/strong&gt; WeChat has no public scraping interface; trying is a fast way to get accounts banned.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Not Douyin (TikTok China).&lt;/strong&gt; Out of scope for v0.1 — under evaluation for the roadmap.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Not a Synthesio replacement at the largest enterprise scale.&lt;/strong&gt; Synthesio also covers TV, podcasts, news, and provides a managed-service layer. This Actor is the data layer; bring your own BI / dashboard / alerting stack. Most teams who pick this over Synthesio are choosing it because they already own their BI stack and just need the raw normalized feed at a sane price.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Compliance posture
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Only public mentions — no private accounts, no DMs, no follower lists.&lt;/li&gt;
&lt;li&gt;No login bypass; cookies are user-supplied for higher rate limits only, and they're stored as a secret in the Apify input schema (encrypted at rest).&lt;/li&gt;
&lt;li&gt;Reviewer / commenter nicknames are partially redacted by the source platforms; this Actor passes through what the platforms display. No additional PII enrichment.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Try it free, then scale it
&lt;/h2&gt;

&lt;p&gt;The Apify free plan includes monthly platform credit that covers a meaningful first batch of mentions — enough to validate the data quality on your own brand keyword before any commitment.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;a href="https://apify.com/zhorex/chinese-brand-monitor" rel="noopener noreferrer"&gt;Try Chinese Brand Monitor on Apify →&lt;/a&gt;&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The flow most teams follow:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Week 1&lt;/strong&gt;: Run it once manually on your main brand keyword. Verify the output quality on a brand you know well — every mention should be one you'd recognize. (~$2-5 spend.)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Week 2&lt;/strong&gt;: Wire it into a daily Apify Schedule. Stream the dataset to your BI tool. Watch one week of trend data. (~$25-50 spend.)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Week 3&lt;/strong&gt;: Add a second brand keyword (competitor, partner, or category term). Add a Slack webhook for negative-sentiment alerts above a follower threshold. (~$50-100 spend.)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Month 2 onward&lt;/strong&gt;: Production. Daily monitoring on your core brand portfolio, hourly during launches and crises, monthly competitive intel pulls. (Typical mid-market team: $200-1,500/mo.)&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Pricing is transparent at every step: &lt;strong&gt;$0.045 per canonical mention&lt;/strong&gt;, billed only on the deduplicated records. No subscription, no minimum spend, no annual contract. Pause it for a month, scale it up 10x next week, switch brand keywords mid-run — it all just works.&lt;/p&gt;

&lt;p&gt;The teams getting the most out of this are running it on a schedule, daily or hourly, across multiple brand keywords, piping the normalized output into their existing BI / Slack / dashboard stack. Each run pays for itself the first time it surfaces a mention you would have missed.&lt;/p&gt;

&lt;p&gt;Open an issue on the &lt;a href="https://apify.com/zhorex/chinese-brand-monitor/issues" rel="noopener noreferrer"&gt;Actor page&lt;/a&gt; if you hit any edge case. Typical turnaround on fixes is 48 hours.&lt;/p&gt;

</description>
      <category>webscraping</category>
      <category>apify</category>
      <category>china</category>
      <category>marketing</category>
    </item>
    <item>
      <title>JD.com's isJdSelfRun Flag Is the Best Gray-Market Detection Signal in Chinese E-Commerce (Python Scraper Inside)</title>
      <dc:creator>Sami</dc:creator>
      <pubDate>Fri, 15 May 2026 22:08:41 +0000</pubDate>
      <link>https://dev.to/sami_8858131362756585e4f4/jdcoms-isjdselfrun-flag-is-the-best-gray-market-detection-signal-in-chinese-e-commerce-python-3ib3</link>
      <guid>https://dev.to/sami_8858131362756585e4f4/jdcoms-isjdselfrun-flag-is-the-best-gray-market-detection-signal-in-chinese-e-commerce-python-3ib3</guid>
      <description>&lt;p&gt;If your brand sells on JD.com (China's #2 e-commerce platform, ~600M annual active users) — or competes against one that does — there's a gray-market problem you can't see without one specific field in JD's data.&lt;/p&gt;

&lt;p&gt;That field is &lt;code&gt;isJdSelfRun&lt;/code&gt;. It tells you whether a given product listing is fulfilled by JD itself (their warehouses, their warranty, their return logistics) or by a third-party merchant on JD's marketplace. Combined with the seller's &lt;code&gt;sellerType&lt;/code&gt; (flagship / franchise / specialty / self-run), it's the single cleanest signal for detecting unauthorized resellers on Chinese e-commerce — and almost no generic scraper surfaces it.&lt;/p&gt;

&lt;p&gt;This post walks through:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Why JD's hybrid retail model creates the gray-market detection opportunity&lt;/li&gt;
&lt;li&gt;The exact field signatures (&lt;code&gt;isJdSelfRun&lt;/code&gt;, &lt;code&gt;sellerType&lt;/code&gt;) and what they mean&lt;/li&gt;
&lt;li&gt;Three concrete workflows: brand authorization audit, competitive pricing, gray-market detection&lt;/li&gt;
&lt;li&gt;A 50-line Python integration with the Apify Actor I built around this&lt;/li&gt;
&lt;li&gt;Honest cost math at indie scale and at hedge-fund scale&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you don't want to read the whole thing: the Actor is at &lt;strong&gt;&lt;a href="https://apify.com/zhorex/jd-scraper" rel="noopener noreferrer"&gt;zhorex/jd-scraper&lt;/a&gt;&lt;/strong&gt;, and pricing is &lt;strong&gt;$0.008 per product detail + $0.02 per seller store record&lt;/strong&gt; (pay-per-event, no subscription).&lt;/p&gt;

&lt;h2&gt;
  
  
  The hybrid retail model that creates the signal
&lt;/h2&gt;

&lt;p&gt;JD.com is structurally different from Tmall and Pinduoduo. Tmall is a marketplace — every SKU is sold by a third-party merchant; Alibaba just runs the platform. JD operates a hybrid: a meaningful chunk of its catalog is sold and shipped by JD itself (JD Logistics, JD Plus warranty, JD's own returns), with the rest fulfilled by marketplace merchants.&lt;/p&gt;

&lt;p&gt;That hybrid creates an information asymmetry buyers can exploit. When a consumer searches a brand's SKU on JD, they see all listings, but the &lt;strong&gt;trust signal&lt;/strong&gt; comes from whether a listing is JD self-run or third-party. For a brand monitoring team, the question becomes: of the third-party listings of &lt;em&gt;my&lt;/em&gt; SKU, which are authorized resellers and which are gray-market?&lt;/p&gt;

&lt;p&gt;The data answers it in two fields:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"productId"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"100009082476"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"sellerName"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Apple产品京东自营旗舰店"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"isJdSelfRun"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"sellerId"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"1000003566"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="err"&gt;...&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;isJdSelfRun: true&lt;/code&gt; means JD is the seller. The other listings — those with &lt;code&gt;isJdSelfRun: false&lt;/code&gt; — are where the gray-market questions live, and where you need the seller's type to decide.&lt;/p&gt;
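&lt;p&gt;As a first pass, you can split a batch of product records on that flag and collect the third-party seller IDs that need a follow-up lookup. This is a minimal sketch assuming the field names shown above; the function names are illustrative.&lt;/p&gt;

```python
def split_listings(listings):
    """Separate JD self-run listings from third-party marketplace listings."""
    self_run = [item for item in listings if item.get("isJdSelfRun") is True]
    third_party = [item for item in listings if item.get("isJdSelfRun") is not True]
    return self_run, third_party


def seller_ids_to_review(listings):
    """Unique third-party seller IDs, sorted, for a seller-type lookup."""
    _, third_party = split_listings(listings)
    return sorted({item["sellerId"] for item in third_party})
```

&lt;p&gt;Records with a missing flag are treated as third-party here, which errs on the side of reviewing too many sellers rather than too few.&lt;/p&gt;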

&lt;h2&gt;
  
  
  The seller type enum
&lt;/h2&gt;

&lt;p&gt;A separate scrape against the seller store endpoint resolves &lt;code&gt;sellerType&lt;/code&gt; to one of four values:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"sellerId"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"1000003566"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"sellerType"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"flagship_store"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"serviceScore"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;4.9&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"logisticsScore"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;4.8&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"descriptionAccuracyScore"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;4.9&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="err"&gt;...&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;flagship_store&lt;/code&gt;&lt;/strong&gt; (官方旗舰店) — the brand's own JD store. There should be exactly one per brand. If you see multiple, you have a counterfeit-or-impersonator problem.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;franchise_store&lt;/code&gt;&lt;/strong&gt; (品牌专营店) — authorized franchise of the brand. Brands typically maintain a list of these.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;specialty_store&lt;/code&gt;&lt;/strong&gt; (专卖店) — third-party that specializes in selling the brand. Often authorized via distribution agreement; sometimes gray-market.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;jd_self_run&lt;/code&gt;&lt;/strong&gt; (京东自营) — JD's direct retail. Always legitimate.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The canonical gray-market signature: a &lt;code&gt;flagship_store&lt;/code&gt; listing alongside three &lt;code&gt;specialty_store&lt;/code&gt; listings priced 20-40% lower on the same SKU. Those specialty stores are usually moving inventory acquired outside the authorized channel (parallel imports, diverted product, refurbished-as-new). They're flagged the moment your monitoring sees the price gap.&lt;/p&gt;
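&lt;p&gt;The "exactly one flagship per brand" rule is easy to check mechanically. Here is a minimal sketch, assuming you have already collected seller records for a single brand's SKUs into a list of dicts with the &lt;code&gt;sellerId&lt;/code&gt;, &lt;code&gt;sellerName&lt;/code&gt;, and &lt;code&gt;sellerType&lt;/code&gt; fields. The function name and the sample records are illustrative and not part of the Actor:&lt;/p&gt;

```python
def find_flagship_conflicts(seller_records):
    """Return the distinct flagship_store sellers if there is more than one.

    Assumes every record belongs to the same brand. An empty dict means
    zero or one flagship store was seen, which is the healthy case.
    """
    flagships = {}
    for record in seller_records:
        if record["sellerType"] == "flagship_store":
            # Key on sellerId so one store seen on several SKUs counts once.
            flagships[record["sellerId"]] = record["sellerName"]
    return flagships if len(flagships) >= 2 else {}


# Hypothetical sample data, for illustration only.
sample = [
    {"sellerId": "1000003566", "sellerName": "Brand flagship", "sellerType": "flagship_store"},
    {"sellerId": "1000003566", "sellerName": "Brand flagship", "sellerType": "flagship_store"},
    {"sellerId": "2000000001", "sellerName": "Lookalike flagship", "sellerType": "flagship_store"},
    {"sellerId": "3000000002", "sellerName": "Some reseller", "sellerType": "specialty_store"},
]
conflicts = find_flagship_conflicts(sample)
print(conflicts)
```

&lt;p&gt;Any non-empty result is worth a manual look before you escalate, since this sketch cannot tell a genuine second flagship from an impersonator.&lt;/p&gt;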

&lt;h2&gt;
  
  
  Three workflows the data unlocks
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Workflow 1 — Brand authorization audit
&lt;/h3&gt;

&lt;p&gt;Submit your SKU IDs. Get back a record per listing with &lt;code&gt;sellerType&lt;/code&gt; resolved. Filter to entries where &lt;code&gt;isJdSelfRun&lt;/code&gt; is &lt;code&gt;false&lt;/code&gt; and the &lt;code&gt;sellerId&lt;/code&gt; is not on your authorized-reseller list. That's your unauthorized reseller list, refreshed on whatever cadence you want.&lt;/p&gt;
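&lt;p&gt;The audit filter can be sketched in a few lines. This assumes you keep your own allow-list keyed on &lt;code&gt;sellerId&lt;/code&gt; (the &lt;code&gt;AUTHORIZED_SELLER_IDS&lt;/code&gt; set below is hypothetical and not something the Actor provides) and that each listing record carries the &lt;code&gt;isJdSelfRun&lt;/code&gt; and &lt;code&gt;sellerId&lt;/code&gt; fields from the JSON sample earlier:&lt;/p&gt;

```python
# Hypothetical allow-list maintained by the brand team, keyed on sellerId.
AUTHORIZED_SELLER_IDS = {"1000003566", "1000101010"}


def unauthorized_listings(listings, authorized_ids):
    """Keep third-party listings whose seller is not on the allow-list."""
    return [
        listing
        for listing in listings
        if not listing["isJdSelfRun"] and listing["sellerId"] not in authorized_ids
    ]


# Illustrative records using the field names from the JSON sample.
listings = [
    {"productId": "100009082476", "sellerId": "1000003566", "isJdSelfRun": True},
    {"productId": "100009082476", "sellerId": "1000101010", "isJdSelfRun": False},
    {"productId": "100009082476", "sellerId": "2000202020", "isJdSelfRun": False},
]
flagged = unauthorized_listings(listings, AUTHORIZED_SELLER_IDS)
print([listing["sellerId"] for listing in flagged])
```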

&lt;p&gt;A small brand watching 50 SKUs across 200 listings (an average of 4 sellers per SKU) costs about $4.40 per refresh: 200 seller records × $0.02 + 50 product details × $0.008 = $4.40 ($4.00 if you skip product detail and only check sellers).&lt;/p&gt;

&lt;h3&gt;
  
  
  Workflow 2 — Competitive pricing intelligence
&lt;/h3&gt;

&lt;p&gt;The product detail mode returns a &lt;code&gt;realtimePrice&lt;/code&gt; field that is fetched fresh at scrape time, not parsed from cached HTML. JD runs flash discounts that move prices within hours; cached scrapers miss them entirely.&lt;/p&gt;

&lt;p&gt;Tracking 200 competitor SKUs hourly = 200 × 24 × 30 = 144,000 detail records per month, $1,152 in raw event cost. At hedge-fund-grade refresh rates this is real money, but it's the right order of magnitude for the buyer cohort that already pays $3K-15K/month for alt-data feeds.&lt;/p&gt;

&lt;p&gt;Tracking 200 SKUs &lt;em&gt;daily&lt;/em&gt; (more realistic for a brand team) = 6,000 records × $0.008 = $48/month. Cheap enough to run as a cron.&lt;/p&gt;
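&lt;p&gt;To act on those refreshes you need to compare consecutive snapshots. Here is a rough sketch, assuming you store each run as a mapping from &lt;code&gt;productId&lt;/code&gt; to &lt;code&gt;realtimePrice&lt;/code&gt;. Whether the Actor returns the price as a string or a number is not something I'm asserting here, so the sketch parses both through &lt;code&gt;Decimal&lt;/code&gt;. The function name and sample values are made up for illustration:&lt;/p&gt;

```python
from decimal import Decimal


def price_changes(previous, current, min_change=Decimal("0")):
    """Compare two snapshots keyed by productId and report moved prices.

    Prices are parsed as Decimal to avoid float rounding noise.
    Returns a list of (productId, old_price, new_price) tuples.
    """
    changes = []
    for product_id, new_raw in current.items():
        if product_id not in previous:
            continue  # first sighting, nothing to compare against
        old = Decimal(str(previous[product_id]))
        new = Decimal(str(new_raw))
        if old != new and abs(new - old) >= min_change:
            changes.append((product_id, old, new))
    return changes


# Hypothetical snapshots built from the realtimePrice field of two runs.
yesterday = {"100009082476": "5999.00", "100012345678": "299.00"}
today = {"100009082476": "5499.00", "100012345678": "299.00", "100099999999": "49.90"}
for product_id, old, new in price_changes(yesterday, today):
    print(f"{product_id}: {old} to {new}")
```

&lt;p&gt;Raising &lt;code&gt;min_change&lt;/code&gt; is a simple way to ignore small price jitter and only surface flash-discount-sized moves.&lt;/p&gt;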

&lt;h3&gt;
  
  
  Workflow 3 — Gray-market detection at scale
&lt;/h3&gt;

&lt;p&gt;The canonical pattern in code:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;product&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;products&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;listings&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;l&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;l&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;product&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;allListings&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;l&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;isJdSelfRun&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]]&lt;/span&gt;
    &lt;span class="n"&gt;cheap_specialty&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="n"&gt;l&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;l&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;listings&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;l&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;sellerType&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;specialty_store&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="ow"&gt;and&lt;/span&gt; &lt;span class="n"&gt;l&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;price&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="n"&gt;product&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;msrp&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mf"&gt;0.80&lt;/span&gt;
    &lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;cheap_specialty&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt; &lt;span class="ow"&gt;and&lt;/span&gt; &lt;span class="nf"&gt;any&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;l&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;sellerType&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;flagship_store&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;l&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;listings&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="nf"&gt;alert&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;product&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;cheap_specialty&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's the brand-monitoring signal: a real flagship store coexisting with three or more sub-MSRP specialty stores on the same SKU. Brand teams pay agencies five-figure annual contracts to surface exactly this kind of alert; running it yourself on this data feed costs cents per check.&lt;/p&gt;

&lt;h2&gt;
  
  
  The 50-line Python integration
&lt;/h2&gt;

&lt;p&gt;Here's the working integration end-to-end. Replace &lt;code&gt;YOUR_TOKEN&lt;/code&gt; with your Apify API token:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;apify_client&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;ApifyClient&lt;/span&gt;

&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;ApifyClient&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;YOUR_TOKEN&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Scrape product details for your SKU list
&lt;/span&gt;&lt;span class="n"&gt;run&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;actor&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;zhorex/jd-scraper&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;call&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;run_input&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;mode&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;product_detail&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;productUrls&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://item.jd.com/100009082476.html&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://item.jd.com/100012345678.html&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;],&lt;/span&gt;
&lt;span class="p"&gt;})&lt;/span&gt;

&lt;span class="n"&gt;unauthorized_sellers&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;set&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;item&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dataset&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;run&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;defaultDatasetId&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]).&lt;/span&gt;&lt;span class="nf"&gt;iterate_items&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;item&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;isJdSelfRun&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
        &lt;span class="n"&gt;unauthorized_sellers&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;item&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;sellerId&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Non-self-run listing: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;item&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;productTitle&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
            &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;  Seller: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;item&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;sellerName&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; (id &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;item&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;sellerId&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;)&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
            &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;  Price: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;item&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;realtimePrice&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Now drill into those sellers to classify them
&lt;/span&gt;&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;unauthorized_sellers&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;seller_run&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;actor&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;zhorex/jd-scraper&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;call&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;run_input&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;mode&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;seller_store&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;sellerUrls&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://mall.jd.com/index-&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;sid&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;.html&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;sid&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;unauthorized_sellers&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="p"&gt;})&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;seller&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dataset&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;seller_run&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;defaultDatasetId&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]).&lt;/span&gt;&lt;span class="nf"&gt;iterate_items&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
        &lt;span class="n"&gt;flag&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;⚠️ AUDIT&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;seller&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;sellerType&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;specialty_store&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;✅&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;flag&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;seller&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;sellerName&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; → &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;seller&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;sellerType&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; (service: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;seller&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;serviceScore&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;)&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's the whole audit. Two API calls, classified output, ready to feed into Slack alerts / spreadsheet exports / BI dashboards.&lt;/p&gt;
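&lt;p&gt;For the spreadsheet route, the standard library is enough. The sketch below assumes seller records shaped like the JSON sample earlier and keeps only four audit columns. The column selection and the function name are my own choices, not something the Actor prescribes:&lt;/p&gt;

```python
import csv
import io

# Columns kept in the export; any other keys on a record are ignored.
FIELDS = ["sellerId", "sellerName", "sellerType", "serviceScore"]


def sellers_to_csv(seller_records):
    """Serialize seller records to CSV text, keeping only the audit columns."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, extrasaction="ignore")
    writer.writeheader()
    for record in seller_records:
        writer.writerow(record)
    return buffer.getvalue()


# Illustrative record; in practice pass the items iterated from the seller run.
rows = [
    {"sellerId": "2000202020", "sellerName": "Example store",
     "sellerType": "specialty_store", "serviceScore": 4.7, "logisticsScore": 4.6},
]
csv_text = sellers_to_csv(rows)
print(csv_text)
```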

&lt;h2&gt;
  
  
  Honest pricing — what does this cost in production?
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Workflow&lt;/th&gt;
&lt;th&gt;Volume / month&lt;/th&gt;
&lt;th&gt;Cost / month&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Brand watchlist — 50 SKUs daily&lt;/td&gt;
&lt;td&gt;1,500 product details&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;~$12&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Brand authorization audit&lt;/td&gt;
&lt;td&gt;500 sellers, monthly&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;~$10&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Competitive pricing — 200 SKUs daily&lt;/td&gt;
&lt;td&gt;6,000 product details&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;~$48&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Competitive pricing — 200 SKUs hourly&lt;/td&gt;
&lt;td&gt;144,000 product details&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;~$1,152&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Gray-market sweep — 200 SKUs + 50 sellers&lt;/td&gt;
&lt;td&gt;200 details + 50 sellers&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;~$2.60&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Indie brand teams typically run the daily/monthly workflows ($10-60/month). Hedge-fund alt-data and agency-scale customers run hourly or 15-minute refreshes (low four figures monthly). Both work on the same Actor with the same event-priced billing.&lt;/p&gt;
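&lt;p&gt;If you want to budget your own volumes, the table reduces to one formula. This sketch uses the two per-event prices quoted in this post ($0.008 per product detail, $0.02 per seller record). Check the Actor page for current pricing before relying on the numbers, and treat the function name as illustrative:&lt;/p&gt;

```python
from decimal import Decimal

# Per-event prices as quoted in this post; verify against the Actor page.
PRICE_PER_PRODUCT_DETAIL = Decimal("0.008")
PRICE_PER_SELLER_RECORD = Decimal("0.02")


def monthly_cost(product_details=0, seller_records=0):
    """Return the event cost in USD for a month's volume."""
    return (product_details * PRICE_PER_PRODUCT_DETAIL
            + seller_records * PRICE_PER_SELLER_RECORD)


# The rows of the pricing table, expressed as monthly event counts.
scenarios = {
    "Brand watchlist, 50 SKUs daily": monthly_cost(product_details=50 * 30),
    "Brand authorization audit, 500 sellers": monthly_cost(seller_records=500),
    "Competitive pricing, 200 SKUs daily": monthly_cost(product_details=200 * 30),
    "Competitive pricing, 200 SKUs hourly": monthly_cost(product_details=200 * 24 * 30),
    "Gray-market sweep, 200 SKUs + 50 sellers": monthly_cost(200, 50),
}
for name, cost in scenarios.items():
    print(f"{name}: ${cost:,.2f}")
```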

&lt;h2&gt;
  
  
  What this Actor doesn't do
&lt;/h2&gt;

&lt;p&gt;Two honesty disclosures:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;No search discovery.&lt;/strong&gt; You bring the SKU list. Discovery requires a different scraping pattern that doesn't survive shared residential proxy pools the way product detail and seller store do.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;No review scraping.&lt;/strong&gt; Same reason — JD's WAF gates the review API at the IP-reputation level on shared pools. If you need review sentiment, the Apify Store has other scrapers, or contact me for a premium-proxy integration.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The README on the Actor page documents this in a "Known limitations" section. If your workflow needs either, this Actor isn't the right tool.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why I built this
&lt;/h2&gt;

&lt;p&gt;I run a portfolio of six Chinese-platform scrapers on Apify Store (&lt;a href="https://apify.com/zhorex" rel="noopener noreferrer"&gt;zhorex&lt;/a&gt;). Five of them cover sentiment and content: Weibo for trending, RedNote (Xiaohongshu) for lifestyle, Bilibili for video, Douban for long-form reviews, Xueqiu for stock-cashtag discussion. The JD scraper extends the suite into commerce — the missing layer for buyers who already use the social ones for brand monitoring.&lt;/p&gt;

&lt;p&gt;The six together are a stack. A consumer-electronics brand can track sentiment on Weibo, video reviews on Bilibili, lifestyle unboxings on RedNote, &lt;em&gt;and&lt;/em&gt; gray-market resellers on JD — all on the same vendor, same billing, same API surface.&lt;/p&gt;

&lt;h2&gt;
  
  
  Try it
&lt;/h2&gt;

&lt;p&gt;The Actor is live: &lt;strong&gt;&lt;a href="https://apify.com/zhorex/jd-scraper" rel="noopener noreferrer"&gt;zhorex/jd-scraper&lt;/a&gt;&lt;/strong&gt;. Pay-per-event billing — no subscription, no setup fee. Run a small evaluation batch (the Apify Free plan includes monthly platform credit you can apply to the run) to confirm output quality on your SKU list before scaling up.&lt;/p&gt;

&lt;p&gt;The rest of the Chinese Digital Intelligence Suite:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://apify.com/zhorex/weibo-scraper" rel="noopener noreferrer"&gt;Weibo Scraper&lt;/a&gt;&lt;/strong&gt; — pair with JD to catch when a SKU trends socially before stock-outs hit&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://apify.com/zhorex/rednote-scraper" rel="noopener noreferrer"&gt;RedNote Scraper&lt;/a&gt;&lt;/strong&gt; — Chinese lifestyle unboxings; useful for fashion, beauty, baby, home brands&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://apify.com/zhorex/bilibili-scraper" rel="noopener noreferrer"&gt;Bilibili Scraper&lt;/a&gt;&lt;/strong&gt; — video reviews; especially valuable for tech and consumer electronics SKUs&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://apify.com/zhorex/xueqiu-scraper" rel="noopener noreferrer"&gt;Xueqiu Scraper&lt;/a&gt;&lt;/strong&gt; — Chinese retail-investor sentiment; pair if you trade JD stock (NASDAQ:JD) alongside operational metrics&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;a href="https://apify.com/zhorex/douban-scraper" rel="noopener noreferrer"&gt;Douban Scraper&lt;/a&gt;&lt;/strong&gt; — long-form film / book / music reviews; less relevant for commerce but useful for IP / entertainment teams&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you ship a brand-monitoring workflow on top of any of these, drop a comment with what you're tracking. If this saved you the time of building an integration from scratch, a heart on the post or a follow keeps these writeups coming.&lt;/p&gt;

</description>
      <category>python</category>
      <category>webscraping</category>
      <category>china</category>
      <category>ecommerce</category>
    </item>
    <item>
      <title>Scraping Chinese Social Platforms for LLM Training Data: A Practical Multi-Source Pipeline (Python, 2026)</title>
      <dc:creator>Sami</dc:creator>
      <pubDate>Tue, 12 May 2026 20:02:05 +0000</pubDate>
      <link>https://dev.to/sami_8858131362756585e4f4/scraping-chinese-social-platforms-for-llm-training-data-a-practical-multi-source-pipeline-python-584</link>
      <guid>https://dev.to/sami_8858131362756585e4f4/scraping-chinese-social-platforms-for-llm-training-data-a-practical-multi-source-pipeline-python-584</guid>
      <description>&lt;p&gt;If you're training Chinese-language models — or multilingual models that need real Chinese coverage, not just translated English — the data problem is the bottleneck. Common Crawl gives you the open web. HuggingFace gives you the curated stuff. But the linguistic patterns that matter most for cultural alignment — slang, memes, code-mixed English-Chinese, regional variations, real-time discourse — those live in places Common Crawl barely touches.&lt;/p&gt;

&lt;p&gt;Three platforms that matter most for Chinese training corpora in 2026:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Weibo&lt;/strong&gt; (微博) — 580M+ MAU, microblogging, real-time discourse, similar role to X/Twitter&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Bilibili&lt;/strong&gt; (哔哩哔哩) — 300M+ MAU, video platform, comments + danmaku give you code-mixed natural language at volume&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Xiaohongshu / RedNote&lt;/strong&gt; (小红书) — 300M+ MAU, lifestyle posts with longer-form content, female-skewed register&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This post walks through how to build a multi-source pipeline that pulls clean structured data from all three, normalizes it across platforms, and ships it into your training datasets. With code, schema, and economics.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;A note on legal posture&lt;/strong&gt;: this entire pipeline accesses only &lt;strong&gt;publicly visible data&lt;/strong&gt; — no auth bypass, no captcha solving, no scraping behind login. That matches the standard most AI training teams operate under in 2026, post-NYT-vs-OpenAI. Always consult your legal team for your specific use case and jurisdiction.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why these three (and not, say, Douyin or Zhihu)
&lt;/h2&gt;

&lt;p&gt;Each platform contributes a different linguistic register:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Weibo posts&lt;/strong&gt; are short, high-frequency, conversational. Best for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Everyday Mandarin patterns&lt;/li&gt;
&lt;li&gt;Trending slang and memes (热搜 reflects what's actually viral &lt;em&gt;right now&lt;/em&gt;)&lt;/li&gt;
&lt;li&gt;Public sentiment on news and policy&lt;/li&gt;
&lt;li&gt;Brand-mention contexts&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Bilibili comments and danmaku&lt;/strong&gt; are unique:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Heavy code-mixing English ↔ Chinese (gaming, tech, anime communities)&lt;/li&gt;
&lt;li&gt;Real-time chat-style language&lt;/li&gt;
&lt;li&gt;Subculture vocabulary (gaming, fandom, two-dimensional culture / 二次元)&lt;/li&gt;
&lt;li&gt;Longer thread discussions on long-form videos&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;RedNote posts&lt;/strong&gt; lean longer and more curated:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Beauty / lifestyle / travel / food vocabulary&lt;/li&gt;
&lt;li&gt;Product-attribute language (skincare ingredients, fashion descriptors)&lt;/li&gt;
&lt;li&gt;Female-skewed register and topics&lt;/li&gt;
&lt;li&gt;Aspirational / descriptive framing&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Douyin (Chinese TikTok) and Kuaishou are predominantly video, so text data is sparse. Zhihu (Q&amp;amp;A) is great for long-form text, but each answer is dominated by a single author's voice. The triad above gives you the best balance of volume, diversity, and accessibility.&lt;/p&gt;

&lt;h2&gt;
  
  
  Pipeline architecture
&lt;/h2&gt;

&lt;p&gt;The cleanest architecture for an AI training data pipeline:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[Weibo Scraper]    →
[Bilibili Scraper] →  [Normalize]  →  [Dedup + Filter]  →  [JSONL]
[RedNote Scraper]  →
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Each scraper outputs platform-native JSON. A normalization layer flattens it to a common schema. Deduplicating on a text hash, then filtering by minimum length and detected language, leaves clean data ready to write out in your training format.&lt;/p&gt;
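&lt;p&gt;As a rough illustration of the dedup and filter stage, here is a standard-library sketch. It assumes records have already been normalized to dicts with &lt;code&gt;platform&lt;/code&gt; and &lt;code&gt;text&lt;/code&gt; keys, which is a hypothetical schema for this example. It hashes an NFKC-folded copy of the text so full-width and half-width punctuation variants collapse to one entry, and it leaves language detection out entirely:&lt;/p&gt;

```python
import hashlib
import json
import unicodedata


def clean_records(records, min_chars=10):
    """Yield records whose text is long enough and not seen before."""
    seen = set()
    for record in records:
        text = record["text"].strip()
        if min_chars > len(text):
            continue  # too short to be useful training text
        # Hash a folded copy only; the original text is emitted unchanged.
        key = unicodedata.normalize("NFKC", text).casefold()
        digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
        if digest in seen:
            continue
        seen.add(digest)
        yield record


def to_jsonl(records):
    """Serialize records as JSON Lines, keeping Chinese characters unescaped."""
    return "\n".join(json.dumps(r, ensure_ascii=False) for r in records)


# Hypothetical normalized records; the second differs only in punctuation width.
records = [
    {"platform": "weibo", "text": "今天的热搜话题真的很有意思，大家怎么看？"},
    {"platform": "bilibili", "text": "今天的热搜话题真的很有意思，大家怎么看?"},
    {"platform": "rednote", "text": "太短了"},
]
cleaned = list(clean_records(records))
print(to_jsonl(cleaned))
```

&lt;p&gt;Exact-hash dedup only catches verbatim or near-verbatim reposts. Fuzzy matching is a separate decision that depends on how aggressive you want the corpus filtering to be.&lt;/p&gt;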

&lt;p&gt;Below: I use Apify-hosted scrapers for the extraction layer (they handle anti-bot, rate limiting, and schema stability so you don't have to). The normalization + dedup is your code — straight Python.&lt;/p&gt;

&lt;h2&gt;
  
  
  Step 1 — Pulling from Weibo
&lt;/h2&gt;

&lt;p&gt;For training data, the high-value combination is:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Hot search topics&lt;/strong&gt; (real-time trending — what people are talking about right now)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Posts under those topics&lt;/strong&gt; (organic conversation about real issues)
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;apify_client&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;ApifyClient&lt;/span&gt;

&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;ApifyClient&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;YOUR_APIFY_TOKEN&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;collect_weibo_corpus&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;target_topics&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;50&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;posts_per_topic&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="c1"&gt;# 1a. Pull current trending topics
&lt;/span&gt;    &lt;span class="n"&gt;topics_run&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;actor&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;zhorex/weibo-scraper&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;call&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;run_input&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;mode&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;hot_search&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;maxResults&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;target_topics&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;})&lt;/span&gt;
    &lt;span class="n"&gt;topics&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;list&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dataset&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;topics_run&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;defaultDatasetId&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]).&lt;/span&gt;&lt;span class="nf"&gt;iterate_items&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;

    &lt;span class="c1"&gt;# 1b. For each topic, pull underlying posts
&lt;/span&gt;    &lt;span class="n"&gt;corpus&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;topic&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;topics&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;posts_run&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;actor&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;zhorex/weibo-scraper&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;call&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;run_input&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;mode&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;search&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;searchQuery&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;topic&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;title&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;maxResults&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;posts_per_topic&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;})&lt;/span&gt;
        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;post&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dataset&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;posts_run&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;defaultDatasetId&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]).&lt;/span&gt;&lt;span class="nf"&gt;iterate_items&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
            &lt;span class="n"&gt;corpus&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;platform&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;weibo&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;topic&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;topic&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;title&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;category&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;topic&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;category&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;post&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;""&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;author&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;post&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;authorName&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;engagement&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;post&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;attitudesCount&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt;
                               &lt;span class="n"&gt;post&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;commentsCount&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt;
                               &lt;span class="n"&gt;post&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;repostsCount&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)),&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;post_url&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;post&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;postUrl&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;scraped_at&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;post&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;scrapedAt&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
            &lt;span class="p"&gt;})&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;corpus&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Volume math&lt;/strong&gt;: 50 topics × 100 posts = 5,000 items per snapshot. At $0.005/item that's $25 per pull. Run daily for a year ≈ $9,125.&lt;/p&gt;
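&lt;p&gt;To re-run the volume math with other settings, a minimal sketch follows. The function name &lt;code&gt;estimate_cost&lt;/code&gt; and its parameters are illustrative, not part of any Apify API; the defaults are the figures quoted above.&lt;/p&gt;

```python
def estimate_cost(topics=50, posts_per_topic=100, price_per_item=0.005, runs=365):
    """Return (items_per_run, cost_per_run, total_cost) for a scraping schedule."""
    items_per_run = topics * posts_per_topic
    cost_per_run = items_per_run * price_per_item
    total_cost = cost_per_run * runs
    # Round to cents so floating-point noise does not leak into the report.
    return items_per_run, round(cost_per_run, 2), round(total_cost, 2)


items, per_run, per_year = estimate_cost()
print(items, per_run, per_year)
```

&lt;p&gt;Halving &lt;code&gt;posts_per_topic&lt;/code&gt; or switching to a weekly schedule (&lt;code&gt;runs=52&lt;/code&gt;) scales the total linearly, so it is worth checking before committing to a daily pull.&lt;/p&gt;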

&lt;h2&gt;
  
  
  Step 2 — Pulling from Bilibili
&lt;/h2&gt;

&lt;p&gt;Bilibili gives you something the others don't: &lt;strong&gt;comments on long-form videos&lt;/strong&gt;. That's where heavy code-mixing happens (tech tutorials, gaming streams, study-with-me content, drama analysis). For training data, comments are higher-value than video metadata.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;collect_bilibili_comments&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;category&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;knowledge&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                               &lt;span class="n"&gt;videos&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;50&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                               &lt;span class="n"&gt;comments_per&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="c1"&gt;# Get popular videos in the category
&lt;/span&gt;    &lt;span class="n"&gt;popular_run&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;actor&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;zhorex/bilibili-scraper&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;call&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;run_input&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;mode&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;popular&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;category&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;category&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;maxResults&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;videos&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;})&lt;/span&gt;
    &lt;span class="n"&gt;items&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;list&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dataset&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;popular_run&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;defaultDatasetId&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]).&lt;/span&gt;&lt;span class="nf"&gt;iterate_items&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
    &lt;span class="n"&gt;bvids&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;v&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;bvid&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;v&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;items&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;v&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;bvid&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)]&lt;/span&gt;

    &lt;span class="c1"&gt;# Pull comments on each
&lt;/span&gt;    &lt;span class="n"&gt;corpus&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;bvid&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;bvids&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;comments_run&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;actor&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;zhorex/bilibili-scraper&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;call&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;run_input&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;mode&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;video_comments&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;videoUrls&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://www.bilibili.com/video/&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;bvid&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;maxComments&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;comments_per&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;sortComments&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;hot&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;})&lt;/span&gt;
        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;c&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dataset&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;comments_run&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;defaultDatasetId&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]).&lt;/span&gt;&lt;span class="nf"&gt;iterate_items&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;comment&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="k"&gt;continue&lt;/span&gt;
            &lt;span class="n"&gt;corpus&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;platform&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;bilibili&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;category&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;category&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;""&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;author&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;authorName&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;engagement&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;likeCount&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;video_bvid&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;bvid&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;scraped_at&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;scrapedAt&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
            &lt;span class="p"&gt;})&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;corpus&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Note: Bilibili throttles comment depth on cloud IPs. Without residential proxies, expect roughly the top 3 comments per video, so treat &lt;code&gt;comments_per&lt;/code&gt; as an upper bound rather than a guarantee. For training data you need diversity more than completeness, so a top-N sample per video is fine, and cheaper.&lt;/p&gt;

&lt;p&gt;Categories worth pulling for diverse coverage: &lt;code&gt;knowledge&lt;/code&gt;, &lt;code&gt;tech&lt;/code&gt;, &lt;code&gt;game&lt;/code&gt;, &lt;code&gt;life&lt;/code&gt;, &lt;code&gt;food&lt;/code&gt;, &lt;code&gt;fashion&lt;/code&gt;, &lt;code&gt;cars&lt;/code&gt;, &lt;code&gt;entertainment&lt;/code&gt;.&lt;/p&gt;
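&lt;p&gt;One way to sweep those categories is sketched below. The collector is passed in as an argument so the example runs without an Apify token; in practice you would pass &lt;code&gt;collect_bilibili_comments&lt;/code&gt; from above. The helper name &lt;code&gt;collect_across_categories&lt;/code&gt;, the &lt;code&gt;videos=10&lt;/code&gt; default, and the stand-in collector are assumptions made for illustration.&lt;/p&gt;

```python
CATEGORIES = [
    "knowledge", "tech", "game", "life",
    "food", "fashion", "cars", "entertainment",
]


def collect_across_categories(collect_fn, categories=CATEGORIES, videos=10, comments_per=100):
    """Call collect_fn once per category; return (combined records, per-category counts)."""
    combined = []
    counts = {}
    for category in categories:
        records = collect_fn(category=category, videos=videos, comments_per=comments_per)
        counts[category] = len(records)
        combined.extend(records)
    return combined, counts


def fake_collector(category, videos, comments_per):
    """Stand-in for collect_bilibili_comments: returns two dummy records per category."""
    return [
        {"platform": "bilibili", "category": category, "text": "示例评论 " + str(i)}
        for i in range(2)
    ]


corpus, counts = collect_across_categories(fake_collector)
print(len(corpus), counts)
```

&lt;p&gt;The per-category counts make it easy to spot a category that came back nearly empty, which is the likely symptom of the throttling described above.&lt;/p&gt;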

&lt;h2&gt;
  
  
  Step 3 — Pulling from RedNote
&lt;/h2&gt;

&lt;p&gt;RedNote gives you longer, more curated content, which is good for training models on aspirational and descriptive Chinese. The seed-query approach lets you control the topical distribution, which matters for avoiding bias toward whatever happens to be trending on the day you scrape.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;collect_rednote_corpus&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;seed_queries&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;posts_per_query&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;50&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;corpus&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;query&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;seed_queries&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;run&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;actor&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;zhorex/rednote-xiaohongshu-scraper&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;call&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;run_input&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;mode&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;search&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;searchQuery&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;maxResults&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;posts_per_query&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;})&lt;/span&gt;
        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;post&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dataset&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;run&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;defaultDatasetId&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]).&lt;/span&gt;&lt;span class="nf"&gt;iterate_items&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
            &lt;span class="n"&gt;corpus&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;platform&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;rednote&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;topic&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;post&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;title&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;""&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;author&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;post&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;author&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="p"&gt;{}).&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;nickname&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;engagement&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;post&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;likes&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;post_url&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;post&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;postUrl&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;scraped_at&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;post&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;scrapedAt&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
            &lt;span class="p"&gt;})&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;corpus&lt;/span&gt;

&lt;span class="c1"&gt;# Diverse seed queries spread coverage across topics
&lt;/span&gt;&lt;span class="n"&gt;seeds&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;护肤心得&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;      &lt;span class="c1"&gt;# skincare experience
&lt;/span&gt;    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;穿搭&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;          &lt;span class="c1"&gt;# outfits
&lt;/span&gt;    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;美食推荐&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;      &lt;span class="c1"&gt;# food recommendations
&lt;/span&gt;    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;旅行攻略&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;      &lt;span class="c1"&gt;# travel guides
&lt;/span&gt;    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;健身打卡&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;      &lt;span class="c1"&gt;# fitness check-in
&lt;/span&gt;    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;读书笔记&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;      &lt;span class="c1"&gt;# reading notes
&lt;/span&gt;    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;育儿日记&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;      &lt;span class="c1"&gt;# parenting diary
&lt;/span&gt;    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;职场感悟&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;      &lt;span class="c1"&gt;# work reflections
&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="n"&gt;rednote_data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;collect_rednote_corpus&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;seeds&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;posts_per_query&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The search-mode code above stores only the post title as &lt;code&gt;text&lt;/code&gt;. For the full body of a post, switch to &lt;code&gt;mode: post_details&lt;/code&gt; and pass the URLs of the posts you want to deep-dive on.&lt;/p&gt;
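&lt;p&gt;Choosing which posts deserve a deep-dive can be done locally before spending on a second run. The sketch below ranks records by &lt;code&gt;engagement&lt;/code&gt; and returns distinct &lt;code&gt;post_url&lt;/code&gt; values, using only the fields built in the collection code. The helper name &lt;code&gt;pick_deep_dive_urls&lt;/code&gt; and the sample records are hypothetical, and the sketch stops short of calling the Actor because the input field that &lt;code&gt;post_details&lt;/code&gt; expects for URLs is not shown here.&lt;/p&gt;

```python
def pick_deep_dive_urls(corpus, top_n=20):
    """Return up to top_n distinct post URLs, highest engagement first."""
    ranked = sorted(corpus, key=lambda r: r.get("engagement") or 0, reverse=True)
    urls = []
    for record in ranked:
        url = record.get("post_url")
        # Skip records without a URL and URLs already selected.
        if url and url not in urls:
            urls.append(url)
    return urls[:top_n]


# Made-up records in the shape produced by collect_rednote_corpus.
sample = [
    {"post_url": "https://example.com/a", "engagement": 12},
    {"post_url": "https://example.com/b", "engagement": 340},
    {"post_url": None, "engagement": 900},
    {"post_url": "https://example.com/b", "engagement": 5},
    {"post_url": "https://example.com/c", "engagement": None},
]
print(pick_deep_dive_urls(sample, top_n=2))
```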

&lt;h2&gt;
  
  
  Step 4 — Normalization and dedup
&lt;/h2&gt;

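&lt;p&gt;All three scrapers produce platform-specific schemas. The per-step code above already maps them to a shared core (Bilibili records carry &lt;code&gt;category&lt;/code&gt; and &lt;code&gt;video_bvid&lt;/code&gt; instead of &lt;code&gt;topic&lt;/code&gt;, so treat &lt;code&gt;topic&lt;/code&gt; as optional):&lt;br&gt;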
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;platform&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;weibo&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;bilibili&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;rednote&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;topic&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;author&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;engagement&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;scraped_at&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;ISO8601&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That common shape is enough to write straight into a JSONL training file. For higher quality, layer in filtering:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;hashlib&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;filter_corpus&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;corpus&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;min_chars&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;max_chars&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;5000&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;seen&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;set&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;out&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;item&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;corpus&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;text&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;item&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;text&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="sh"&gt;""&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;strip&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;min_chars&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;=&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;=&lt;/span&gt; &lt;span class="n"&gt;max_chars&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
            &lt;span class="k"&gt;continue&lt;/span&gt;
        &lt;span class="n"&gt;h&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;hashlib&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;md5&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;encode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;utf-8&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)).&lt;/span&gt;&lt;span class="nf"&gt;hexdigest&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;h&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;seen&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;continue&lt;/span&gt;
        &lt;span class="n"&gt;seen&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;h&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;out&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;item&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;out&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
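&lt;p&gt;Once filtered, the records can be written out as JSONL. This is a minimal sketch, assuming each record is a plain dict in the common shape above; &lt;code&gt;write_jsonl&lt;/code&gt; and the output path are illustrative names, not part of any Actor.&lt;/p&gt;

```python
import json
from pathlib import Path

def write_jsonl(records, path) -> int:
    """Write one JSON object per line and return how many records were written."""
    count = 0
    with Path(path).open("w", encoding="utf-8") as fh:
        for rec in records:
            # ensure_ascii=False keeps Chinese text readable instead of \uXXXX escapes
            fh.write(json.dumps(rec, ensure_ascii=False) + "\n")
            count += 1
    return count
```

&lt;p&gt;Typical use would be &lt;code&gt;write_jsonl(filter_corpus(corpus), "corpus.jsonl")&lt;/code&gt;.&lt;/p&gt;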



&lt;p&gt;For pretraining-grade quality, also add fastText / &lt;code&gt;langdetect&lt;/code&gt; to filter non-Chinese content, and a profanity / PII pass appropriate to your training context.&lt;/p&gt;
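&lt;p&gt;If you want a dependency-free first pass before wiring in fastText or &lt;code&gt;langdetect&lt;/code&gt;, a character-ratio heuristic is one option. The sketch below is a stand-in, not a real language detector: it only counts characters in the basic CJK Unified Ideographs block, so Japanese kanji also pass, and the 0.5 threshold is an arbitrary assumption you should tune. The function names are hypothetical.&lt;/p&gt;

```python
import re

# Basic CJK Unified Ideographs block only; extension blocks are deliberately ignored.
HAN = re.compile(r"[\u4e00-\u9fff]")

def han_ratio(text: str) -> float:
    """Fraction of non-whitespace characters that are Han characters."""
    visible = "".join(text.split())
    if not visible:
        return 0.0
    return len(HAN.findall(visible)) / len(visible)

def keep_mostly_chinese(corpus, min_ratio: float = 0.5):
    """Keep items whose "text" field is at least min_ratio Han characters."""
    return [item for item in corpus if han_ratio(item.get("text") or "") >= min_ratio]
```

&lt;p&gt;Run it after &lt;code&gt;filter_corpus&lt;/code&gt; so the ratio is only computed on deduplicated text.&lt;/p&gt;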

&lt;h2&gt;
  
  
  Economics at training-corpus scale
&lt;/h2&gt;

&lt;p&gt;A reasonable Chinese-language pretraining contribution might be 10M items across platforms:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Platform&lt;/th&gt;
&lt;th&gt;Items&lt;/th&gt;
&lt;th&gt;Cost @ $0.005/result&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Weibo&lt;/td&gt;
&lt;td&gt;5M&lt;/td&gt;
&lt;td&gt;$25,000&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Bilibili&lt;/td&gt;
&lt;td&gt;3M&lt;/td&gt;
&lt;td&gt;$15,000&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;RedNote&lt;/td&gt;
&lt;td&gt;2M&lt;/td&gt;
&lt;td&gt;$10,000&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Total&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;10M items&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;$50,000&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
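&lt;p&gt;The table is plain multiplication at the flat $0.005/result rate. To re-run it for a different platform mix, something like the sketch below works; the item counts are the ones from the table, and &lt;code&gt;corpus_cost&lt;/code&gt; is a hypothetical helper name.&lt;/p&gt;

```python
from decimal import Decimal

PRICE_PER_RESULT = Decimal("0.005")  # flat per-result price quoted in this article

def corpus_cost(items_by_platform: dict) -> dict:
    """Return the cost per platform plus a "total" entry, in USD."""
    costs = {name: n * PRICE_PER_RESULT for name, n in items_by_platform.items()}
    costs["total"] = sum(costs.values(), Decimal("0"))
    return costs

plan = {"weibo": 5_000_000, "bilibili": 3_000_000, "rednote": 2_000_000}
for name, usd in corpus_cost(plan).items():
    print(f"{name:10s} ${usd:,.0f}")
```

&lt;p&gt;&lt;code&gt;Decimal&lt;/code&gt; is used instead of floats so the totals stay exact at any volume.&lt;/p&gt;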

&lt;p&gt;Apify's free tier ($5/month credit) covers roughly 1,000 results in total at $0.005/result, shared across whichever Actors you run. That is enough for prototyping.&lt;/p&gt;

&lt;p&gt;For comparison, hiring 2 senior engineers to build and maintain DIY Chinese-platform extraction for 6 months: $150K-300K — and you don't even get the data, just the tooling.&lt;/p&gt;

&lt;p&gt;For 100M+ items (real pretraining scale), volume pricing or a custom enterprise contract makes sense. See enterprise section below.&lt;/p&gt;

&lt;h2&gt;
  
  
  When to build vs buy
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Build it yourself if&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You're scraping 100M+ items per month and have a dedicated team&lt;/li&gt;
&lt;li&gt;You need real-time streaming below 1-second latency (this pipeline is batch)&lt;/li&gt;
&lt;li&gt;Your legal team requires you to own the entire data path&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Use the hosted scrapers if&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You're under 50M items per month per platform&lt;/li&gt;
&lt;li&gt;You want time-to-data measured in hours, not months&lt;/li&gt;
&lt;li&gt;You don't want to maintain three platform-specific scrapers as APIs evolve&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  The actors
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://apify.com/zhorex/weibo-scraper" rel="noopener noreferrer"&gt;&lt;strong&gt;Weibo Scraper&lt;/strong&gt;&lt;/a&gt; — &lt;code&gt;hot_search&lt;/code&gt;, &lt;code&gt;search&lt;/code&gt;, &lt;code&gt;post_comments&lt;/code&gt;, &lt;code&gt;user_posts&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://apify.com/zhorex/bilibili-scraper" rel="noopener noreferrer"&gt;&lt;strong&gt;Bilibili Scraper&lt;/strong&gt;&lt;/a&gt; — &lt;code&gt;search&lt;/code&gt;, &lt;code&gt;popular&lt;/code&gt;, &lt;code&gt;video_detail&lt;/code&gt;, &lt;code&gt;video_comments&lt;/code&gt;, &lt;code&gt;user_videos&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://apify.com/zhorex/rednote-xiaohongshu-scraper" rel="noopener noreferrer"&gt;&lt;strong&gt;RedNote (Xiaohongshu) Scraper&lt;/strong&gt;&lt;/a&gt; — six modes covering posts, profiles, comments, video&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;All three are priced at $0.005/result. They are pure HTTP: no browser, and no proxy required for moderate volumes.&lt;/p&gt;

&lt;h2&gt;
  
  
  Enterprise / training-scale
&lt;/h2&gt;

&lt;p&gt;If you're building actual training corpora (not prototyping), DM me on any actor page or open an Issue with subject &lt;strong&gt;"Training data inquiry"&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Custom output schemas matched to your training pipeline (Parquet / Arrow / your dialect of JSONL)&lt;/li&gt;
&lt;li&gt;Volume pricing above 1M items/month per platform&lt;/li&gt;
&lt;li&gt;Dedicated proxy infrastructure for sustained throughput&lt;/li&gt;
&lt;li&gt;Schema stability SLA so your training runs don't break mid-epoch&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Issues typically get a response within 48 hours.&lt;/p&gt;

&lt;h2&gt;
  
  
  FAQ
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Is this legal?&lt;/strong&gt; Each Actor accesses only publicly visible data — no auth, no captcha bypass, no login walls. The same data any anonymous browser user can see. Standard ToS-compliant scraping posture as of 2026. Consult your legal team for jurisdiction-specific guidance.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What about rate limits?&lt;/strong&gt; The hosted Actors handle rate-limit responses with exponential backoff. For 1M+ items/day per platform, talk to me about dedicated infrastructure.&lt;/p&gt;
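&lt;p&gt;If your own orchestration code also needs to retry failed calls, the same idea can be sketched client-side. This is a generic illustration of capped exponential backoff with jitter, not the Actors' internal code; &lt;code&gt;backoff_delays&lt;/code&gt;, &lt;code&gt;call_with_backoff&lt;/code&gt;, and their defaults are hypothetical.&lt;/p&gt;

```python
import random
import time

def backoff_delays(retries: int, base: float = 1.0, cap: float = 60.0):
    """Yield capped exponential delays: base, 2*base, 4*base, ... never above cap."""
    for attempt in range(retries):
        yield min(cap, base * (2 ** attempt))

def call_with_backoff(fn, retries: int = 5, base: float = 1.0, cap: float = 60.0, sleep=time.sleep):
    """Call fn(); on failure wait and retry, making at most retries + 1 attempts."""
    for delay in backoff_delays(retries, base, cap):
        try:
            return fn()
        except Exception:  # narrow this to your client's rate-limit error
            # Jitter spreads out retries from concurrent workers
            sleep(delay * random.uniform(0.5, 1.0))
    return fn()  # final attempt; any error propagates to the caller
```

&lt;p&gt;You would wrap a single Actor call, for example &lt;code&gt;call_with_backoff(lambda: vet_one(url))&lt;/code&gt; where &lt;code&gt;vet_one&lt;/code&gt; is whatever function starts your run.&lt;/p&gt;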

&lt;p&gt;&lt;strong&gt;Can I get historical data?&lt;/strong&gt; The Actors return what's currently public. For longitudinal datasets, schedule them via Apify Schedules at the cadence you need (hourly / daily / weekly) and version-control your dataset snapshots.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Do you offer streaming / real-time?&lt;/strong&gt; Not currently. The Actors are pull-based. If you need streaming, that's a custom integration.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Other platforms?&lt;/strong&gt; I also maintain a &lt;a href="https://apify.com/zhorex/rednote-shop-scraper" rel="noopener noreferrer"&gt;RedNote Shop Scraper&lt;/a&gt; for Xiaohongshu e-commerce listings — useful if your model needs to reason about products, pricing, or commerce vocabulary.&lt;/p&gt;




&lt;h2&gt;
  
  
  Other relevant work
&lt;/h2&gt;

&lt;p&gt;If you're building Chinese intelligence at scale, the full suite:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://apify.com/zhorex/rednote-xiaohongshu-scraper" rel="noopener noreferrer"&gt;RedNote Scraper&lt;/a&gt; — lifestyle social&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://apify.com/zhorex/rednote-shop-scraper" rel="noopener noreferrer"&gt;RedNote Shop Scraper&lt;/a&gt; — Xiaohongshu e-commerce (product metadata, pricing, vendor info)&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://apify.com/zhorex/weibo-scraper" rel="noopener noreferrer"&gt;Weibo Scraper&lt;/a&gt; — microblogging, hot search, sentiment&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://apify.com/zhorex/bilibili-scraper" rel="noopener noreferrer"&gt;Bilibili Scraper&lt;/a&gt; — video creator analytics&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If this saved you a quarter of dev time, a 30-second review on any of the Actor pages helps a lot. ⭐&lt;/p&gt;

&lt;p&gt;Found a bug or have a feature request? Open an Issue — I usually ship fixes within 48 hours.&lt;/p&gt;

</description>
      <category>python</category>
      <category>webscraping</category>
      <category>china</category>
      <category>ai</category>
    </item>
    <item>
      <title>Influencer Vetting at Scale on Xiaohongshu (RedNote): A Practical Python Guide for Brand Teams 2026</title>
      <dc:creator>Sami</dc:creator>
      <pubDate>Mon, 11 May 2026 16:14:10 +0000</pubDate>
      <link>https://dev.to/sami_8858131362756585e4f4/influencer-vetting-at-scale-on-xiaohongshu-rednote-a-practical-python-guide-for-brand-teams-2026-1946</link>
      <guid>https://dev.to/sami_8858131362756585e4f4/influencer-vetting-at-scale-on-xiaohongshu-rednote-a-practical-python-guide-for-brand-teams-2026-1946</guid>
      <description>&lt;p&gt;Xiaohongshu (小红书), known internationally as RedNote or Little Red Book, has become the single most consequential platform for influencer marketing in China. With 300M+ monthly active users, skewed female and Gen Z, it's where beauty, fashion, lifestyle, and travel brands first place a campaign before going wider. After the TikTok-uncertainty migrations of 2024–2025, RedNote also became the de facto fallback for many Western creators.&lt;/p&gt;

&lt;p&gt;If you're a brand team, agency, or media buyer working in or with China, you need a way to vet RedNote influencers at scale. Manual scrolling doesn't cut it past five creators. Here's the structured approach.&lt;/p&gt;

&lt;h2&gt;
  
  
  What "vetting at scale" actually means
&lt;/h2&gt;

&lt;p&gt;For a single influencer partnership, you typically want answers to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Reach&lt;/strong&gt;: how many followers, but more importantly how many people actually see their content (median impressions / followers)?&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Engagement quality&lt;/strong&gt;: average likes/comments/saves per post — and the &lt;em&gt;distribution&lt;/em&gt;. A creator with 100K followers and 50 posts averaging 500 likes is very different from one with 100K and a few viral 10K posts.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Niche fit&lt;/strong&gt;: do their tags and topics align with your category?&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Audience hints&lt;/strong&gt;: from bio, location, profile signals — who follows them?&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Authenticity&lt;/strong&gt;: posting cadence, sponsored-content ratio, content reuse from other platforms.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The data lives in two places on RedNote: &lt;strong&gt;profile metadata&lt;/strong&gt; (followers, bio, location, verified status, total likes received) and &lt;strong&gt;recent posts&lt;/strong&gt; (titles, like counts, content type, publish dates).&lt;/p&gt;

&lt;h2&gt;
  
  
  What you can extract per creator
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"userId"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"5d7439b40000000001009f54"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"nickname"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"BeautyBlogger123"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"avatar"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"https://sns-avatar-qc.xhscdn.com/..."&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"redId"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"100123456"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"description"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Skincare reviews, K-beauty translations. Seoul-based."&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"followers"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;184500&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"following"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;320&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"totalLikes"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1240000&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"gender"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"location"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Seoul, South Korea"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"isVerified"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"tags"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"Beauty Blogger"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"K-Beauty"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Skincare"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Per recent post:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"postId"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"64be395b0000000010030b56"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"type"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"video"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"title"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Morning skincare routine for dry skin"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"likes"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;15234&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"scrapedAt"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"2026-05-06T12:00:00Z"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Combined, you can compute:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Median likes / post&lt;/strong&gt; (use median, not mean — viral outliers skew means)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Engagement rate&lt;/strong&gt; = median likes / followers (RedNote benchmark for healthy: 2–5%, viral creators 8%+)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Post frequency&lt;/strong&gt; (posts per week — burnout warning if dropping)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Content-type ratio&lt;/strong&gt; (video vs image — videos get higher reach in 2026)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Niche concentration&lt;/strong&gt; (% of recent posts matching your category keywords)&lt;/li&gt;
&lt;/ul&gt;
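&lt;p&gt;Niche concentration is the one metric in this list that the vetting code in the next section does not compute. A minimal sketch, assuming each post dict carries a &lt;code&gt;title&lt;/code&gt; field as in the sample above and that plain substring matching is good enough for your category; &lt;code&gt;niche_concentration&lt;/code&gt; is a hypothetical helper name.&lt;/p&gt;

```python
def niche_concentration(posts, keywords) -> float:
    """Share (0 to 1) of posts whose title contains at least one keyword, case-insensitive."""
    if not posts:
        return 0.0
    lowered = [k.lower() for k in keywords]
    hits = sum(
        1
        for p in posts
        if any(k in (p.get("title") or "").lower() for k in lowered)
    )
    return hits / len(posts)
```

&lt;p&gt;Substring matching works for Chinese keywords too, since it needs no word segmentation, but it will miss synonyms and over-match very short keywords.&lt;/p&gt;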

&lt;h2&gt;
  
  
  Practical Python: building a vetting batch
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;apify_client&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;ApifyClient&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;statistics&lt;/span&gt;

&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;ApifyClient&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;YOUR_APIFY_TOKEN&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;vet_creator&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;profile_url&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="c1"&gt;# Profile data
&lt;/span&gt;    &lt;span class="n"&gt;profile_run&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;actor&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;zhorex/rednote-xiaohongshu-scraper&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;call&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;run_input&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;mode&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;profile&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;userUrl&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;profile_url&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;})&lt;/span&gt;
    &lt;span class="n"&gt;profile&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;next&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dataset&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;profile_run&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;defaultDatasetId&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]).&lt;/span&gt;&lt;span class="nf"&gt;iterate_items&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;

    &lt;span class="c1"&gt;# Recent posts
&lt;/span&gt;    &lt;span class="n"&gt;posts_run&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;actor&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;zhorex/rednote-xiaohongshu-scraper&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;call&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;run_input&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;mode&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user_posts&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;userUrl&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;profile_url&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;maxResults&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;50&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;})&lt;/span&gt;
    &lt;span class="n"&gt;posts&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;list&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dataset&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;posts_run&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;defaultDatasetId&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]).&lt;/span&gt;&lt;span class="nf"&gt;iterate_items&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;

    &lt;span class="c1"&gt;# Keep posts with 0 likes (dropping them inflates the median); skip only missing counts
&lt;/span&gt;    &lt;span class="n"&gt;likes&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;likes&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;p&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;posts&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;likes&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="ow"&gt;is&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="n"&gt;median_likes&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;statistics&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;median&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;likes&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;likes&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;
    &lt;span class="n"&gt;er&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;median_likes&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="n"&gt;profile&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;followers&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;100&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;profile&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;followers&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;nickname&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;profile&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;nickname&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;followers&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;profile&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;followers&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;verified&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;profile&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;isVerified&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tags&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;profile&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tags&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;post_count_recent&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;posts&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;median_likes&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;median_likes&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;engagement_rate_pct&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;round&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;er&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;video_ratio&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nf"&gt;sum&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;p&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;posts&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;video&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="nf"&gt;max&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;posts&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="c1"&gt;# Vet a list
&lt;/span&gt;&lt;span class="n"&gt;candidates&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://www.xiaohongshu.com/user/profile/USER_ID_1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://www.xiaohongshu.com/user/profile/USER_ID_2&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="n"&gt;results&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nf"&gt;vet_creator&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;u&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;u&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;candidates&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sort&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="k"&gt;lambda&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;engagement_rate_pct&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;reverse&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;results&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;nickname&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="mi"&gt;25&lt;/span&gt;&lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;  followers=&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;followers&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="mi"&gt;7&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;  ER=&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;engagement_rate_pct&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;%&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That gives you a list of candidates ranked by genuine engagement rather than vanity follower counts, ready to paste into a spreadsheet.&lt;/p&gt;

&lt;h2&gt;
  
  
  Red flags to look for
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Engagement rate &amp;lt; 0.5%&lt;/strong&gt; with &amp;gt; 100K followers → likely bought or stale&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Only image posts when category is video-heavy&lt;/strong&gt; → low reach in 2026's algorithm&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Posts in the last 30 days &amp;lt; 4&lt;/strong&gt; → low retainer reliability&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;No tags or generic tags&lt;/strong&gt; → low discoverability inside RedNote search&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Mean likes 50x or more above the median&lt;/strong&gt; → performance is driven by a single viral post (risky); a mean within 10x of the median is relatively consistent (good)&lt;/li&gt;
&lt;/ul&gt;
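
&lt;p&gt;The numeric thresholds above can be sketched as a small checker. This is a minimal illustration, not part of the Actor: &lt;code&gt;red_flags&lt;/code&gt; and the &lt;code&gt;mean_likes&lt;/code&gt; and &lt;code&gt;posts_last_30d&lt;/code&gt; keys are hypothetical names, while &lt;code&gt;followers&lt;/code&gt;, &lt;code&gt;engagement_rate_pct&lt;/code&gt;, and &lt;code&gt;median_likes&lt;/code&gt; follow the result dict built earlier. The sample values are made up.&lt;/p&gt;

```python
# Thresholds taken from the checklist above.
MIN_ENGAGEMENT_PCT = 0.5
LARGE_AUDIENCE = 100_000
MIN_POSTS_30D = 4
MAX_MEAN_TO_MEDIAN = 50


def red_flags(creator: dict) -> list[str]:
    """Return the checklist red flags that apply to one vetted creator."""
    flags = []
    followers = creator.get("followers", 0)
    er = creator.get("engagement_rate_pct", 0.0)
    # Large audience with almost no engagement: likely bought or stale.
    if followers > LARGE_AUDIENCE and MIN_ENGAGEMENT_PCT > er:
        flags.append("low engagement for audience size")
    # Hypothetical key: number of posts published in the last 30 days.
    if MIN_POSTS_30D > creator.get("posts_last_30d", 0):
        flags.append("posts too rarely")
    # A mean far above the median means one viral post carries the average.
    median = max(creator.get("median_likes", 0), 1)
    if creator.get("mean_likes", 0) / median >= MAX_MEAN_TO_MEDIAN:
        flags.append("single-viral-driven")
    return flags


# Made-up creator for illustration.
sample = {
    "followers": 250_000,
    "engagement_rate_pct": 0.3,
    "posts_last_30d": 6,
    "median_likes": 40,
    "mean_likes": 2_400,
}
print(red_flags(sample))
```

&lt;p&gt;The tag and video-versus-image checks are left out because they depend on the category you are vetting for.&lt;/p&gt;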

&lt;h2&gt;
  
  
  When to skip the DIY approach
&lt;/h2&gt;

&lt;p&gt;RedNote's public surface requires sustained anti-bot engineering to extract reliably at scale. The schema also evolves regularly, so a scraper that worked last month can quietly start returning empty arrays this month without raising any error you'd notice in your pipeline.&lt;/p&gt;

&lt;p&gt;That's why I maintain the &lt;a href="https://apify.com/zhorex/rednote-xiaohongshu-scraper" rel="noopener noreferrer"&gt;&lt;strong&gt;RedNote Scraper on Apify&lt;/strong&gt;&lt;/a&gt; — six modes (search, user posts, profiles, post details, comments, video) with consistent output schemas across them. The infrastructure work (session handling, rate limiting, schema stability) is already done.&lt;/p&gt;

&lt;p&gt;Pricing is pay-per-event: &lt;strong&gt;$0.005 per result&lt;/strong&gt;. A typical influencer batch (50 candidates, profile + 50 posts each) costs about &lt;strong&gt;$13&lt;/strong&gt;. The Apify free tier ($5 monthly) covers ~1,000 items.&lt;/p&gt;
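
&lt;p&gt;The arithmetic behind those figures, as a quick sketch. It assumes every profile and every post is billed as one result; the function names are hypothetical and only the $0.005 price comes from the text above.&lt;/p&gt;

```python
# $0.005 per result, held as tenths of a cent so the math stays exact.
PRICE_MILLS_PER_RESULT = 5


def batch_cost_usd(candidates: int, posts_each: int) -> float:
    """Cost of one vetting batch: 1 profile result plus posts_each post results per candidate."""
    results = candidates * (1 + posts_each)
    return results * PRICE_MILLS_PER_RESULT / 1000


def items_covered(credit_usd: int) -> int:
    """How many results a given credit buys at this price."""
    return credit_usd * 1000 // PRICE_MILLS_PER_RESULT


print(batch_cost_usd(50, 50))  # 2,550 results -> 12.75, rounded to $13 above
print(items_covered(5))        # 1000
```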

&lt;h2&gt;
  
  
  FAQ
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Is the data from open profiles only?&lt;/strong&gt; Yes — public-facing profile and post data. Private/locked accounts are not accessible. Same content any anonymous browser user can see.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Does it work on &lt;code&gt;xhslink.com&lt;/code&gt; short links?&lt;/strong&gt; Yes, those resolve automatically.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Can I get post-level comments?&lt;/strong&gt; Use &lt;code&gt;mode: comments&lt;/code&gt; on individual post URLs.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What about RedNote vs Xiaohongshu vs Little Red Book?&lt;/strong&gt; All the same platform — the Actor handles all three name conventions and URL formats.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Is scraping Xiaohongshu legal?&lt;/strong&gt; This Actor accesses publicly visible content only. No authentication is bypassed. Always consult your local laws.&lt;/p&gt;




&lt;h2&gt;
  
  
  Full Chinese intelligence stack
&lt;/h2&gt;

&lt;p&gt;Brand teams running campaigns across Chinese platforms typically pair this with:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://apify.com/zhorex/rednote-xiaohongshu-scraper" rel="noopener noreferrer"&gt;RedNote Scraper&lt;/a&gt; — &lt;em&gt;(this one, social side)&lt;/em&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://apify.com/zhorex/rednote-shop-scraper" rel="noopener noreferrer"&gt;RedNote Shop Scraper&lt;/a&gt; — Xiaohongshu e-commerce (products, vendors, prices)&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://apify.com/zhorex/weibo-scraper" rel="noopener noreferrer"&gt;Weibo Scraper&lt;/a&gt; — microblogging, brand mentions, hot search&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://apify.com/zhorex/bilibili-scraper" rel="noopener noreferrer"&gt;Bilibili Scraper&lt;/a&gt; — video creator analytics&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Vetting more than 100 creators per month?&lt;/strong&gt; I offer custom output schemas, dedicated proxy pools, SLA support, and volume discounts. DM me on Apify or open an Issue titled "Enterprise inquiry".&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Bug?&lt;/strong&gt; Issues are typically fixed within 48 hours.&lt;/p&gt;

&lt;p&gt;If this saved you time, a 30-second review on the Apify Store helps a lot. ⭐&lt;/p&gt;

</description>
      <category>webscraping</category>
      <category>python</category>
      <category>china</category>
      <category>marketing</category>
    </item>
    <item>
      <title>Weibo's Hot Search Is the Best Real-Time Feed of Chinese Public Sentiment in 2026</title>
      <dc:creator>Sami</dc:creator>
      <pubDate>Fri, 08 May 2026 17:33:19 +0000</pubDate>
      <link>https://dev.to/sami_8858131362756585e4f4/weibos-hot-search-is-the-best-real-time-feed-of-chinese-public-sentiment-in-2026-2cep</link>
      <guid>https://dev.to/sami_8858131362756585e4f4/weibos-hot-search-is-the-best-real-time-feed-of-chinese-public-sentiment-in-2026-2cep</guid>
      <description>&lt;p&gt;Weibo's "hot search" (热搜) is the closest thing China has to a real-time barometer of public attention. It updates every few minutes, ranks topics by an opaque heat score, and is where every news cycle, celebrity scandal, and viral product launch lands first. For brands, agencies, and researchers covering China, this feed is gold — and unlike most of Weibo, it's accessible without a single cookie.&lt;/p&gt;

&lt;p&gt;This post is for anyone building a brand-monitoring, sentiment-tracking, or trend-discovery pipeline aimed at China.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why hot search matters
&lt;/h2&gt;

&lt;p&gt;Weibo (微博) is China's microblogging giant, with 580M+ monthly active users. The hot search ranking is driven by Weibo's own engagement signals: a topic earns a spot when search volume, post creation, and engagement spike together within a short window.&lt;/p&gt;

&lt;p&gt;That makes hot search a &lt;strong&gt;leading indicator&lt;/strong&gt; for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;PR crises&lt;/strong&gt;: a brand mention can reach the top 50 within minutes of a video going viral&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Product launches&lt;/strong&gt;: launches by Apple, Tesla, Xiaomi, etc. typically hit the top 20 within an hour&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cultural shifts&lt;/strong&gt;: holiday spikes, generational slang, viral memes&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Geopolitics&lt;/strong&gt;: state-affiliated topics surface predictably; their ranking velocity tells a story&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;If you're tracking China for any of these use cases, polling hot search every 5–15 minutes gives you sub-news-cycle response time.&lt;/p&gt;

&lt;h2&gt;
  
  
  What you actually get
&lt;/h2&gt;

&lt;p&gt;Each hot search row exposes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;rank&lt;/strong&gt; (1–50)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;title&lt;/strong&gt; (the search term itself, in Chinese)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;hotValue&lt;/strong&gt; — an integer that approximates topical heat&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;category&lt;/strong&gt; (科技 = tech, 娱乐 = entertainment, 时尚 = fashion, etc.)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;labelName&lt;/strong&gt; (Weibo's heat label for the topic: &lt;code&gt;热&lt;/code&gt; = hot, &lt;code&gt;新&lt;/code&gt; = new, &lt;code&gt;沸&lt;/code&gt; = boiling, &lt;code&gt;爆&lt;/code&gt; = exploding)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;isHot&lt;/strong&gt; flag&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;url&lt;/strong&gt; to the search results page on weibo.com&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Sample row:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"rank"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"title"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"人工智能最新突破"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"category"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"科技"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"hotValue"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;2847562&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"labelName"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"热"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"isHot"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"url"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"https://s.weibo.com/weibo?q=%23..."&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  A minimal Python pipeline
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;time&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;datetime&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;datetime&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;apify_client&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;ApifyClient&lt;/span&gt;

&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;ApifyClient&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;YOUR_APIFY_TOKEN&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;snapshot_hot_search&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="n"&gt;run&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;actor&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;zhorex/weibo-scraper&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;call&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;run_input&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;mode&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;hot_search&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;maxResults&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;50&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;})&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;list&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dataset&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;run&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;defaultDatasetId&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]).&lt;/span&gt;&lt;span class="nf"&gt;iterate_items&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;

&lt;span class="c1"&gt;# Poll every 10 minutes; print a topic when it first appears or its rank changes
&lt;/span&gt;&lt;span class="n"&gt;seen&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{}&lt;/span&gt;
&lt;span class="k"&gt;while&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;snap&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;snapshot_hot_search&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;ts&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;datetime&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;utcnow&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;isoformat&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;row&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;snap&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;title&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;row&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;title&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="n"&gt;prev&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;seen&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;title&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;prev&lt;/span&gt; &lt;span class="ow"&gt;is&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="n"&gt;prev&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;rank&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="n"&gt;row&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;rank&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
            &lt;span class="c1"&gt;# Keep the original first_seen when only the rank changed
&lt;/span&gt;            &lt;span class="n"&gt;first_seen&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;prev&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;first_seen&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;prev&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="n"&gt;ts&lt;/span&gt;
            &lt;span class="n"&gt;seen&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;title&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;rank&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;row&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;rank&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;first_seen&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;first_seen&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
            &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;[&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;ts&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;] rank=&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;row&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;rank&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;  &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;title&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;  heat=&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;row&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;hotValue&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;time&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sleep&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;600&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;With one small loop you have a hot-search change monitor. Match the titles against your brand keywords and it becomes a brand-mention monitor.&lt;/p&gt;

&lt;h2&gt;
  
  
  Common patterns I see customers run
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;1. Brand watch.&lt;/strong&gt; Match new hot-search titles against a list of brand keywords. Trigger alerts when a brand name enters top 50.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Velocity tracking.&lt;/strong&gt; Compute the rank-change velocity per topic. Topics that jump from rank 40 → 5 in under 30 minutes are early-warning signals for going viral.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. Category drift.&lt;/strong&gt; Track which categories dominate hot search hour-by-hour. Useful for media planning and ad targeting timing.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;4. Cross-platform correlation.&lt;/strong&gt; Pair Weibo hot search with Bilibili trending and RedNote search to detect cross-platform memes early. Topics tend to show up across the platforms within 1–6 hours of each other.&lt;/p&gt;
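
&lt;p&gt;The first three patterns can be sketched as small functions over snapshot rows. This is a minimal illustration under stated assumptions: the function names, the keyword list, and the two sample snapshots are made up, and only the &lt;code&gt;rank&lt;/code&gt;, &lt;code&gt;title&lt;/code&gt;, and &lt;code&gt;category&lt;/code&gt; fields come from the row format shown earlier.&lt;/p&gt;

```python
from collections import Counter


def brand_hits(rows: list[dict], keywords: list[str]) -> list[dict]:
    """Pattern 1: rows whose title contains any watched keyword."""
    lowered = [k.lower() for k in keywords]
    return [r for r in rows if any(k in r["title"].lower() for k in lowered)]


def rank_jumps(earlier: list[dict], latest: list[dict], min_jump: int = 20) -> list[tuple[str, int]]:
    """Pattern 2: (title, places gained) for topics that climbed at least min_jump ranks."""
    old_rank = {r["title"]: r["rank"] for r in earlier}
    gains = [(r["title"], old_rank[r["title"]] - r["rank"]) for r in latest if r["title"] in old_rank]
    return sorted((g for g in gains if g[1] >= min_jump), key=lambda g: g[1], reverse=True)


def category_share(rows: list[dict]) -> dict[str, float]:
    """Pattern 3: fraction of hot-search slots held by each category."""
    counts = Counter(r.get("category") or "unknown" for r in rows)
    total = sum(counts.values())
    return {cat: n / total for cat, n in counts.most_common()}


# Made-up snapshots taken 30 minutes apart, for illustration only.
earlier = [
    {"rank": 40, "title": "小米新车发布", "category": "科技"},
    {"rank": 12, "title": "周末天气", "category": "社会"},
]
latest = [
    {"rank": 5, "title": "小米新车发布", "category": "科技"},
    {"rank": 10, "title": "周末天气", "category": "社会"},
    {"rank": 8, "title": "新剧开播", "category": "娱乐"},
]

print(brand_hits(latest, ["小米", "Tesla"]))
print(rank_jumps(earlier, latest))  # [('小米新车发布', 35)]
print(category_share(latest))
```

&lt;p&gt;In a real pipeline the two snapshots would come from consecutive polls, and the &lt;code&gt;min_jump&lt;/code&gt; threshold would need tuning against your own alert volume.&lt;/p&gt;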

&lt;h2&gt;
  
  
  Going deeper: posts and comments
&lt;/h2&gt;

&lt;p&gt;Hot search gives you topics. To go deeper into actual conversation, pivot from a hot title to its underlying posts:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# After identifying a hot topic, search posts about it
&lt;/span&gt;&lt;span class="n"&gt;posts_run&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;actor&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;zhorex/weibo-scraper&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;call&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;run_input&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;mode&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;search&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;searchQuery&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;人工智能最新突破&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;maxResults&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;})&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That returns post-level data: text, author, like/repost/comment counts, embedded images, and post URLs. Pair with &lt;code&gt;mode: post_comments&lt;/code&gt; to harvest reactions.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why a hosted scraper, not raw scraping
&lt;/h2&gt;

&lt;p&gt;Weibo's public web endpoints work without login for most read paths, but they require a visitor session token (Sina Visitor System) and exponential backoff on throttling responses. A naive &lt;code&gt;requests&lt;/code&gt; script will either get throttled within 100 calls or silently receive empty arrays.&lt;/p&gt;
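
&lt;p&gt;For readers going the DIY route, exponential backoff with empty-result detection looks roughly like this. It is a generic sketch, not the Actor's implementation: &lt;code&gt;fetch&lt;/code&gt; is a hypothetical callable standing in for your own request code, and the visitor-token bootstrap is not shown.&lt;/p&gt;

```python
import time


def fetch_with_backoff(fetch, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call fetch() until it returns a non-empty list, doubling the wait each time.

    fetch is a hypothetical zero-argument callable that returns a list of rows,
    and returns an empty list or raises when the request was throttled.
    """
    for attempt in range(max_attempts):
        try:
            rows = fetch()
            if rows:  # an empty array counts as a soft failure, not success
                return rows
        except Exception as exc:
            print(f"attempt {attempt + 1} failed: {exc}")
        if attempt != max_attempts - 1:
            sleep(base_delay * 2 ** attempt)  # 1s, 2s, 4s, ...
    raise RuntimeError(f"no data after {max_attempts} attempts")


# Demo with a fake fetch that returns two empty responses, then data.
responses = iter([[], [], [{"rank": 1}]])
delays = []
rows = fetch_with_backoff(lambda: next(responses), sleep=delays.append)
print(rows, delays)  # [{'rank': 1}] [1.0, 2.0]
```

&lt;p&gt;Treating an empty array as a failure is the important part: it turns the silent empty-result case into an error your pipeline can see.&lt;/p&gt;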

&lt;p&gt;The &lt;a href="https://apify.com/zhorex/weibo-scraper" rel="noopener noreferrer"&gt;&lt;strong&gt;Weibo Scraper on Apify&lt;/strong&gt;&lt;/a&gt; handles session bootstrap, throttling, retries, and consistent schema across modes (&lt;code&gt;hot_search&lt;/code&gt;, &lt;code&gt;post_comments&lt;/code&gt;, &lt;code&gt;search&lt;/code&gt;, &lt;code&gt;user_posts&lt;/code&gt;). Pure HTTP — no browser, no proxy required.&lt;/p&gt;

&lt;p&gt;Pricing is pay-per-event: &lt;strong&gt;$0.005 per item&lt;/strong&gt;. 1,000 items = $5. The free Apify tier covers 1,000 items/month.&lt;/p&gt;

&lt;h2&gt;
  
  
  FAQ
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Is hot search censored?&lt;/strong&gt; Some topics are rate-limited or removed by Weibo's moderation. The labelName field hints at moderation state. You'll see topics appear and disappear.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Can I get historical hot search?&lt;/strong&gt; Not via Weibo directly — they don't expose archives. You build your own archive by snapshotting at intervals.&lt;/p&gt;
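
&lt;p&gt;One simple way to build that archive is to append every snapshot to a JSON Lines file. The sketch below is an assumption about how you might store it, not something the Actor does for you: &lt;code&gt;append_snapshot&lt;/code&gt;, &lt;code&gt;load_archive&lt;/code&gt;, the &lt;code&gt;captured_at&lt;/code&gt; key, and the sample rows are all hypothetical.&lt;/p&gt;

```python
import json
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def append_snapshot(rows: list[dict], path: Path) -> int:
    """Append one hot-search snapshot to a JSON Lines archive; returns rows written."""
    captured_at = datetime.now(timezone.utc).isoformat()
    with path.open("a", encoding="utf-8") as f:
        for row in rows:
            record = {"captured_at": captured_at, **row}
            # ensure_ascii=False keeps Chinese titles readable in the file
            f.write(json.dumps(record, ensure_ascii=False) + "\n")
    return len(rows)


def load_archive(path: Path) -> list[dict]:
    """Read every archived row back, oldest first."""
    with path.open(encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


# Demo with made-up rows and a temporary file.
archive = Path(tempfile.mkdtemp()) / "hot_search.jsonl"
append_snapshot([{"rank": 1, "title": "示例话题", "hotValue": 123456}], archive)
append_snapshot([{"rank": 2, "title": "示例话题", "hotValue": 98765}], archive)
print(len(load_archive(archive)))  # 2
```

&lt;p&gt;In the polling loop shown earlier, you would pass each result of &lt;code&gt;snapshot_hot_search()&lt;/code&gt; to &lt;code&gt;append_snapshot&lt;/code&gt; before sleeping.&lt;/p&gt;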

&lt;p&gt;&lt;strong&gt;What about session tokens?&lt;/strong&gt; They expire periodically. Hosted scrapers refresh them automatically; if you DIY, plan to request a fresh visitor token whenever the old one expires.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Is scraping Weibo legal?&lt;/strong&gt; This accesses publicly visible data. No authentication is bypassed. Always check your local laws and Weibo's ToS.&lt;/p&gt;




&lt;h2&gt;
  
  
  Building a Chinese intelligence stack?
&lt;/h2&gt;

&lt;p&gt;I maintain the full suite for production pipelines:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://apify.com/zhorex/weibo-scraper" rel="noopener noreferrer"&gt;Weibo Scraper&lt;/a&gt; — &lt;em&gt;(this one)&lt;/em&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://apify.com/zhorex/bilibili-scraper" rel="noopener noreferrer"&gt;Bilibili Scraper&lt;/a&gt; — China's YouTube, 300M MAU&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://apify.com/zhorex/rednote-xiaohongshu-scraper" rel="noopener noreferrer"&gt;RedNote (Xiaohongshu) Scraper&lt;/a&gt; — lifestyle social&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://apify.com/zhorex/rednote-shop-scraper" rel="noopener noreferrer"&gt;RedNote Shop Scraper&lt;/a&gt; — Xiaohongshu e-commerce&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Running 50K+ items per month?&lt;/strong&gt; I offer custom output schemas, dedicated proxy pools, SLA, and volume pricing. DM me on Apify or open an Issue titled "Enterprise inquiry".&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Found a bug?&lt;/strong&gt; Open an Issue and I usually ship fixes within 48 hours.&lt;/p&gt;

&lt;p&gt;A 30-second review on the Apify Store helps other users find this tool. ⭐&lt;/p&gt;

</description>
      <category>python</category>
      <category>webscraping</category>
      <category>china</category>
      <category>marketing</category>
    </item>
    <item>
      <title>Building a Xiaohongshu (RedNote) E-commerce Scraper for RedShop Product Data</title>
      <dc:creator>Sami</dc:creator>
      <pubDate>Wed, 06 May 2026 01:56:00 +0000</pubDate>
      <link>https://dev.to/sami_8858131362756585e4f4/building-a-xiaohongshu-rednote-e-commerce-scraper-for-redshop-product-data-2g7d</link>
      <guid>https://dev.to/sami_8858131362756585e4f4/building-a-xiaohongshu-rednote-e-commerce-scraper-for-redshop-product-data-2g7d</guid>
      <description>&lt;p&gt;When Xiaohongshu (RedNote / Little Red Book / 小红书) launched RedShop — its US-facing e-commerce platform — in April 2026, I noticed every existing scraper on Apify only covered the social side: posts, profiles, comments, videos. None of them touched product listings, vendor catalogs, or pricing data.&lt;/p&gt;

&lt;p&gt;So I built one.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why a dedicated shop scraper?
&lt;/h2&gt;

&lt;p&gt;Xiaohongshu is unusual among Chinese platforms because product listings live in a separate URL space from social posts. The all-in-one social scrapers handle the &lt;code&gt;/explore/&lt;/code&gt; posts surface. RedShop products live behind &lt;code&gt;/goods-detail/&lt;/code&gt; with a completely different structure.&lt;/p&gt;

&lt;p&gt;Trying to extract product data from a "social" scraper means hacky workarounds. A dedicated commerce-focused tool gives you:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Structured product fields (price, sold count, SKU variants, vendor metadata)&lt;/li&gt;
&lt;li&gt;Native support for vendor/store browsing&lt;/li&gt;
&lt;li&gt;Cross-border vs domestic flagging&lt;/li&gt;
&lt;li&gt;Cleaner pricing model: charge per product, not per "result"&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  What it extracts
&lt;/h2&gt;

&lt;p&gt;For each product:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;itemId, title, productUrl&lt;/li&gt;
&lt;li&gt;salePrice, originalPrice, discountPct, currency (CNY for domestic, USD for cross-border)&lt;/li&gt;
&lt;li&gt;soldCount, wantCount (popularity signals)&lt;/li&gt;
&lt;li&gt;cover, images&lt;/li&gt;
&lt;li&gt;vendor (sellerId, name, rating)&lt;/li&gt;
&lt;li&gt;category path&lt;/li&gt;
&lt;li&gt;skus (variants with prices and stock)&lt;/li&gt;
&lt;li&gt;crossBorder flag and shippingOrigin&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Three modes
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Mode&lt;/th&gt;
&lt;th&gt;What it does&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;product_search&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Search products by keyword, sort by price/sales, filter by price range&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;vendor_products&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Full catalog from a specific seller&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;product_detail&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Deep dive on specific product URLs (full SKU breakdown)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  Real-world use cases
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;DTC brands&lt;/strong&gt;: monitor your own listings and competitor pricing on China's #1 social commerce platform&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Dropshippers and resellers&lt;/strong&gt;: discover trending Chinese products before they hit Amazon or Etsy&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cross-border arbitrage&lt;/strong&gt;: identify SKUs popular in China that haven't reached Western markets yet&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Investment analysts&lt;/strong&gt;: track e-commerce activity for Chinese consumer brands&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Sourcing agents&lt;/strong&gt;: scout Chinese products at scale for clients in cosmetics, fashion, or home goods&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Combined with the &lt;a href="https://apify.com/zhorex/rednote-xiaohongshu-scraper" rel="noopener noreferrer"&gt;RedNote All-in-One Scraper&lt;/a&gt; (social side), you can map products to the influencers tagging them — extremely valuable for influencer-product correlation studies.&lt;/p&gt;

&lt;h2&gt;
  
  
  How to use it
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;apify_client&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;ApifyClient&lt;/span&gt;

&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;ApifyClient&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;YOUR_APIFY_TOKEN&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;run&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;actor&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;zhorex/rednote-shop-scraper&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;call&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;run_input&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;mode&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;product_search&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;searchQuery&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;skincare&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;maxResults&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;sortBy&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;sales&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;minPrice&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;50&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;maxPrice&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;500&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;})&lt;/span&gt;

&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;item&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dataset&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;run&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;defaultDatasetId&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]).&lt;/span&gt;&lt;span class="nf"&gt;iterate_items&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;item&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;title&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; — ¥&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;item&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;salePrice&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; (sold &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;item&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;soldCount&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;)&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Output sample
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"itemId"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"642a1b3c0000000023019f7e"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"title"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Skincare Set - Hydrating Toner + Serum + Moisturizer"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"salePrice"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;199.00&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"originalPrice"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;299.00&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"discountPct"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;33.4&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"currency"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"CNY"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"soldCount"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;12500&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"wantCount"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;5400&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"vendor"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"sellerId"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"..."&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"BeautyBrand Official"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"rating"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;4.8&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"category"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Beauty / Skincare / Sets"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"skus"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[{&lt;/span&gt;&lt;span class="nl"&gt;"spec"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Normal Skin"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"price"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;199.00&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"stock"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1200&lt;/span&gt;&lt;span class="p"&gt;}],&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"crossBorder"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"shippingOrigin"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"China"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Pricing
&lt;/h2&gt;

&lt;p&gt;Pay-per-event:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;$0.0075&lt;/strong&gt; per product scraped&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;$0.025&lt;/strong&gt; per vendor info record&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Typical costs:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Search 100 products: ~$0.75&lt;/li&gt;
&lt;li&gt;Full vendor catalog (200 products): ~$1.53&lt;/li&gt;
&lt;li&gt;5 competitor vendors with 100 products each: ~$3.88&lt;/li&gt;
&lt;/ul&gt;
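
&lt;p&gt;Those typical costs follow directly from the two per-event prices. A minimal sketch of the arithmetic, where &lt;code&gt;estimate_cost&lt;/code&gt; is a hypothetical helper (not part of the actor) and the assumption is one vendor info record per vendor:&lt;/p&gt;

```python
from decimal import Decimal

PRICE_PER_PRODUCT = Decimal("0.0075")  # USD per product scraped
PRICE_PER_VENDOR = Decimal("0.025")    # USD per vendor info record


def estimate_cost(products: int, vendors: int = 0) -> Decimal:
    """Estimate the pay-per-event cost of a run in USD."""
    return products * PRICE_PER_PRODUCT + vendors * PRICE_PER_VENDOR


print(estimate_cost(100))     # search 100 products
print(estimate_cost(200, 1))  # one vendor's 200-product catalog
print(estimate_cost(500, 5))  # 5 vendors with 100 products each
```

&lt;p&gt;&lt;code&gt;Decimal&lt;/code&gt; keeps the fractions of a cent exact, which plain floats would not.&lt;/p&gt;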

&lt;h2&gt;
  
  
  FAQ
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Does it work for cross-border products?&lt;/strong&gt;&lt;br&gt;
Yes — products are explicitly flagged in the output (&lt;code&gt;crossBorder: true/false&lt;/code&gt;) so you can filter domestic vs international listings.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Can I track price changes over time?&lt;/strong&gt;&lt;br&gt;
Schedule the actor to run daily or weekly via Apify Schedules. Each run writes to its own dataset, so comparing runs over time gives you a price history for any product or vendor.&lt;/p&gt;
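
&lt;p&gt;One way to turn scheduled runs into a price history is to diff the items of two runs by &lt;code&gt;itemId&lt;/code&gt;. A rough sketch, assuming each run's items have already been loaded into a list (for example with &lt;code&gt;iterate_items()&lt;/code&gt;) and carry the &lt;code&gt;itemId&lt;/code&gt; and &lt;code&gt;salePrice&lt;/code&gt; fields from the output sample. &lt;code&gt;price_changes&lt;/code&gt; is a hypothetical helper, not something the actor ships:&lt;/p&gt;

```python
def price_changes(previous: list[dict], current: list[dict]) -> list[dict]:
    """Return one record per product whose salePrice differs between two runs."""
    old_prices = {item["itemId"]: item["salePrice"] for item in previous}
    changes = []
    for item in current:
        old = old_prices.get(item["itemId"])
        # Skip products that are new in this run or whose price is unchanged.
        if old is None or old == item["salePrice"]:
            continue
        changes.append({
            "itemId": item["itemId"],
            "oldPrice": old,
            "newPrice": item["salePrice"],
        })
    return changes


# Illustrative data only, not real scraper output.
yesterday = [{"itemId": "a1", "salePrice": 199.0}, {"itemId": "b2", "salePrice": 89.0}]
today = [{"itemId": "a1", "salePrice": 179.0}, {"itemId": "b2", "salePrice": 89.0}]
print(price_changes(yesterday, today))
```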

&lt;p&gt;&lt;strong&gt;Does it need a proxy?&lt;/strong&gt;&lt;br&gt;
Residential proxies are recommended for reliable results. The default config uses Apify's residential pool.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Is there an official Xiaohongshu shop API?&lt;/strong&gt;&lt;br&gt;
No — Xiaohongshu doesn't offer a commerce API for international developers. This actor is the practical alternative.&lt;/p&gt;

&lt;h2&gt;
  
  
  Try it
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://apify.com/zhorex/rednote-shop-scraper" rel="noopener noreferrer"&gt;RedNote Shop Scraper on Apify&lt;/a&gt; — works with Apify's free plan ($5/month credits cover hundreds of products at no cost).&lt;/p&gt;

&lt;p&gt;If you build something useful with it, drop a comment — always interested in seeing how people use commerce data.&lt;/p&gt;

</description>
      <category>webscraping</category>
      <category>python</category>
      <category>ecommerce</category>
      <category>china</category>
    </item>
    <item>
      <title>Google Ads can spend up to 2x your daily budget. I built a Chrome extension that catches it before it happens.</title>
      <dc:creator>Sami</dc:creator>
      <pubDate>Thu, 30 Apr 2026 15:17:28 +0000</pubDate>
      <link>https://dev.to/sami_8858131362756585e4f4/google-ads-can-spend-up-to-2x-your-daily-budget-i-built-a-chrome-extension-that-catches-it-before-j0</link>
      <guid>https://dev.to/sami_8858131362756585e4f4/google-ads-can-spend-up-to-2x-your-daily-budget-i-built-a-chrome-extension-that-catches-it-before-j0</guid>
      <description>&lt;p&gt;If you've ever opened Google Ads and noticed your campaign spent way more than the daily budget you set, you're not imagining it. Google's documentation explicitly says they may spend up to &lt;strong&gt;twice your daily budget&lt;/strong&gt; on any given day, evening it out across the month. That's not a bug — it's how their pacing engine has always worked.&lt;/p&gt;

&lt;p&gt;What changed in March 2026: Google now aggressively targets &lt;strong&gt;100% of your monthly limit&lt;/strong&gt; — which is 30.4× your daily budget. Even with ad scheduling. So if your campaigns only run 22 days a month (weekdays only, for example), Google can push up to &lt;strong&gt;38% more spend per active day&lt;/strong&gt; than you'd expect from your daily budget setting.&lt;/p&gt;
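
&lt;p&gt;The 38% figure falls out of spreading a 30.4-day monthly limit over fewer active days. A quick worked check in Python, using a hypothetical $100 daily budget:&lt;/p&gt;

```python
AVERAGE_DAYS_PER_MONTH = 30.4  # monthly limit = daily budget x 30.4

daily_budget = 100.0  # hypothetical daily budget in USD
active_days = 22      # e.g. a weekdays-only ad schedule

monthly_limit = daily_budget * AVERAGE_DAYS_PER_MONTH
spend_per_active_day = monthly_limit / active_days
extra_pct = (spend_per_active_day / daily_budget - 1) * 100

print(f"Monthly limit: ${monthly_limit:,.2f}")
print(f"Spend per active day: ${spend_per_active_day:,.2f}")
print(f"Extra vs daily budget: {extra_pct:.0f}%")
```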

&lt;p&gt;Most PPC managers don't notice until the damage is done. The Campaigns tab in Google Ads doesn't tell you whether you're on pace or headed for overspend. You'd need a spreadsheet, a calendar, and a calculator open in another window — or a SaaS tool that costs $49 to $749 per month.&lt;/p&gt;

&lt;p&gt;I got tired of the spreadsheet route. So I built a Chrome extension that does it inside Google Ads, in real time, for free up to 3 campaigns. I'm walking through the build because the technical approach is interesting and the pricing math vs SaaS tools is genuinely lopsided.&lt;/p&gt;

&lt;h2&gt;
  
  
  What budget pacing actually requires
&lt;/h2&gt;

&lt;p&gt;The math is simple. For each campaign:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;expected_spend_to_date = daily_budget × 30.4 × (days_elapsed_in_month / total_days_in_month)
pacing_ratio = actual_spend_to_date / expected_spend_to_date

# pacing_ratio &amp;lt; 1.10 → on pace
# pacing_ratio 1.10–1.20 → slight overspend
# pacing_ratio &amp;gt; 1.20 → overspend risk
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's the whole core logic. SaaS tools wrap this in dashboards, alerts, multi-account aggregation, and reporting. But the underlying calculation is six lines of code.&lt;/p&gt;
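
&lt;p&gt;As a rough Python sketch of that core logic (the extension itself runs in the browser, so this is an illustration rather than AdPacer's code): it assumes the monthly target is the daily budget times 30.4, that expected spend is the month-to-date share of that target, and that today counts as an elapsed day. The function names are hypothetical.&lt;/p&gt;

```python
import calendar
from datetime import date

AVERAGE_DAYS_PER_MONTH = 30.4  # monthly target = daily budget x 30.4


def pacing_ratio(daily_budget: float, spend_to_date: float, today: date) -> float:
    """Ratio of actual month-to-date spend to the spend expected by today."""
    days_in_month = calendar.monthrange(today.year, today.month)[1]
    monthly_target = daily_budget * AVERAGE_DAYS_PER_MONTH
    # Assumption: today counts as a fully elapsed day.
    expected_to_date = monthly_target * today.day / days_in_month
    return spend_to_date / expected_to_date


def pacing_status(ratio: float) -> str:
    """Map a pacing ratio onto the three bands used above."""
    if ratio > 1.20:
        return "overspend risk"
    if ratio >= 1.10:
        return "slight overspend"
    return "on pace"


ratio = pacing_ratio(daily_budget=100.0, spend_to_date=1900.0, today=date(2026, 4, 15))
print(f"{ratio:.2f} -> {pacing_status(ratio)}")
```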

&lt;p&gt;The reason SaaS tools charge $49+/month isn't the math — it's the data plumbing. They connect to the Google Ads API (OAuth, refresh tokens, quota management), run server-side jobs to pull your accounts on a schedule, store results in a database, render charts. Real infrastructure cost.&lt;/p&gt;

&lt;p&gt;But here's the thing: &lt;strong&gt;your campaign data is already visible on your Google Ads screen&lt;/strong&gt;. Names, budgets, costs, statuses — the information is sitting in the DOM right there. If you're already looking at Google Ads, why does anyone need to call an API to tell you what you're already looking at?&lt;/p&gt;

&lt;h2&gt;
  
  
  The Chrome extension approach
&lt;/h2&gt;

&lt;p&gt;I built AdPacer as a Manifest V3 Chrome extension that reads the campaign data from the Google Ads page DOM and overlays three pacing indicators directly on the interface you're already using. Architecture:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Content script&lt;/strong&gt; runs on &lt;code&gt;ads.google.com&lt;/code&gt; URLs&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;MutationObserver&lt;/strong&gt; detects when the campaigns table renders or updates (Google Ads is a heavy SPA so this matters)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;DOM parsing&lt;/strong&gt; extracts campaign name, daily budget, current spend per row&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Pacing math&lt;/strong&gt; runs locally on the extracted values&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;DOM injection&lt;/strong&gt; adds the colored pacing bars and projected-spend badges next to each campaign row&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Notifications API&lt;/strong&gt; for the periodic overspend checks&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Zero API calls. Zero authentication flows. Zero backend. Zero data leaves the user's browser. Everything runs in the page's content-script context.&lt;/p&gt;

&lt;p&gt;The privacy implication is meaningful: AdPacer cannot exfiltrate your Google Ads data even if it wanted to. There's no network request to anywhere. SaaS tools, however privacy-conscious their privacy policies are, send your campaign data to their servers as a fundamental part of how they work.&lt;/p&gt;

&lt;h2&gt;
  
  
  What you actually see in Google Ads after install
&lt;/h2&gt;

&lt;p&gt;Three additions to the standard Campaigns tab:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Pacing bars&lt;/strong&gt; — a color-coded bar next to each campaign:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Green:&lt;/strong&gt; on pace, within 10% of expected spend&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Yellow:&lt;/strong&gt; ahead of pace, 10–20% over expected&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Red:&lt;/strong&gt; overspend risk, 20%+ over expected&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;2. Projected end-of-month spend&lt;/strong&gt; — a badge showing what your monthly spend will be if you continue at the current daily run rate. Updates as the page data updates. No spreadsheet required.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. Browser notifications&lt;/strong&gt; — when any campaign crosses your threshold (configurable from 10% to 25%). Checks automatically every 30 minutes. Catch problems early instead of at month-end reconciliation.&lt;/p&gt;
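
&lt;p&gt;The projected end-of-month badge is a straight-line extrapolation of the month-to-date run rate. A minimal Python sketch of that idea, assuming the projection is spend so far divided by days elapsed, times the days in the month. The function name is hypothetical and this is illustrative, not the extension's actual code:&lt;/p&gt;

```python
import calendar
from datetime import date


def projected_month_spend(spend_to_date: float, today: date) -> float:
    """Extrapolate month-to-date spend to a full month at the current run rate."""
    days_in_month = calendar.monthrange(today.year, today.month)[1]
    daily_run_rate = spend_to_date / today.day
    return daily_run_rate * days_in_month


# Hypothetical campaign: $1,900 spent by April 15.
print(f"${projected_month_spend(1900.0, date(2026, 4, 15)):,.2f}")
```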

&lt;p&gt;That's it. Install, open Google Ads, see your pacing.&lt;/p&gt;

&lt;h2&gt;
  
  
  Pricing — and why this is structured the way it is
&lt;/h2&gt;

&lt;p&gt;I deliberately wanted to make this accessible to freelancers and small teams, not enterprise-priced.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Tier&lt;/th&gt;
&lt;th&gt;Price&lt;/th&gt;
&lt;th&gt;Limit&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Free&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;$0&lt;/td&gt;
&lt;td&gt;Up to 3 campaigns. All core features. No credit card, no trial expiration.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Pro&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;$14/mo&lt;/td&gt;
&lt;td&gt;Unlimited campaigns, custom thresholds, priority support.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Agency&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;$29/mo&lt;/td&gt;
&lt;td&gt;Multi-account support, PDF pacing reports, team sharing.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The free tier covers freelancers and small e-commerce teams running up to three campaigns, for example one brand, one generic, and one shopping campaign. Pro is for in-house PPC managers running 5-50 campaigns. Agency is for teams managing multiple client accounts.&lt;/p&gt;

&lt;p&gt;For comparison with the SaaS landscape:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Tool&lt;/th&gt;
&lt;th&gt;Lowest tier&lt;/th&gt;
&lt;th&gt;Notes&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;TrueClicks&lt;/td&gt;
&lt;td&gt;$49/mo&lt;/td&gt;
&lt;td&gt;Broader PPC management&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Optmyzr&lt;/td&gt;
&lt;td&gt;$129/mo&lt;/td&gt;
&lt;td&gt;Optimization suite&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;WordStream&lt;/td&gt;
&lt;td&gt;$299/mo+&lt;/td&gt;
&lt;td&gt;Enterprise tier&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;AdPacer&lt;/td&gt;
&lt;td&gt;$0–$14/mo&lt;/td&gt;
&lt;td&gt;Pacing only&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;If you need full PPC management — bid optimization, A/B testing, audience suggestions, the whole stack — the SaaS tools are doing a lot more than pacing. But if all you actually need is "tell me when a campaign is going to overspend," paying $49-299/month for that single feature is overkill.&lt;/p&gt;

&lt;h2&gt;
  
  
  Who I built this for
&lt;/h2&gt;

&lt;p&gt;PPC managers running Google Ads daily who want instant budget visibility without context-switching to another tool. Freelancers managing 1-5 client accounts where SaaS pricing eats too much of the margin. Agency teams who need quick pacing checks across multiple campaigns. E-commerce advertisers watching ROAS and budget efficiency in real time.&lt;/p&gt;

&lt;p&gt;If you're an enterprise team running 200+ campaigns with complex bid strategies, this isn't for you — you probably already have an Optmyzr-class tool. If you're somewhere between "spreadsheet" and "expensive SaaS," this fills the gap.&lt;/p&gt;

&lt;h2&gt;
  
  
  What it doesn't do (yet)
&lt;/h2&gt;

&lt;p&gt;Being honest about scope:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;No Microsoft Ads / Bing Ads support&lt;/strong&gt; yet (Google Ads only)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;No Meta / TikTok Ads&lt;/strong&gt; (different DOMs, different challenges, would be a separate extension)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;No historical pacing trends&lt;/strong&gt; beyond current month&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;No bid suggestions or campaign optimization&lt;/strong&gt; (that's a different problem space)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Pacing is a single, focused use case. The extension does that one thing well rather than trying to be a half-decent everything tool.&lt;/p&gt;

&lt;h2&gt;
  
  
  Install link
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://chromewebstore.google.com/detail/adpacer-%E2%80%94-budget-pacing-f/mfgliiabejphemhkhlnapbebmkfhfjfm" rel="noopener noreferrer"&gt;AdPacer on the Chrome Web Store&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Free tier covers up to 3 campaigns with no credit card and no trial expiration — install and see if it solves your problem before paying anything.&lt;/p&gt;

&lt;p&gt;If you're a PPC manager and the spending pattern Google introduced in March 2026 has been causing you headaches, this is the lowest-friction way to catch overspend before it happens. If you're a developer reading this for the technical approach: yes, the entire thing runs client-side via DOM parsing — no API key, no backend, no data leaves the browser.&lt;/p&gt;

&lt;p&gt;Happy to answer questions about either side.&lt;/p&gt;

</description>
      <category>chrome</category>
      <category>marketing</category>
      <category>productivity</category>
      <category>javascript</category>
    </item>
    <item>
      <title>How to scrape Weibo (微博) data with Python in 2026 — the Sina Visitor System and how to handle it</title>
      <dc:creator>Sami</dc:creator>
      <pubDate>Thu, 30 Apr 2026 14:58:19 +0000</pubDate>
      <link>https://dev.to/sami_8858131362756585e4f4/how-to-scrape-weibo-wei-bo-data-with-python-in-2026-the-sina-visitor-system-and-how-to-handle-it-1j6g</link>
      <guid>https://dev.to/sami_8858131362756585e4f4/how-to-scrape-weibo-wei-bo-data-with-python-in-2026-the-sina-visitor-system-and-how-to-handle-it-1j6g</guid>
      <description>&lt;p&gt;Weibo is China's Twitter — the platform where Chinese public opinion forms, brand crises break first, and government statements land. 580M+ monthly active users, mostly mainstream demographics. If you're doing China market intelligence, brand monitoring, or PR analytics, Weibo is one of the platforms you can't skip.&lt;/p&gt;

&lt;p&gt;The challenge: Weibo's developer API requires a Chinese business license, has severe rate limits, and exposes very limited data. For Western teams, web scraping is the practical option. The interesting twist is Weibo's Sina Visitor System — an auth flow that makes anonymous access possible for some endpoints but not others. Understanding which is which matters for what you can actually scrape.&lt;/p&gt;

&lt;p&gt;This article covers the technical landscape (with real Python code) and points to a hosted scraper if you'd rather skip the maintenance.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Weibo serves
&lt;/h2&gt;

&lt;p&gt;A Weibo post is structured similarly to a tweet but with longer character limits and more structured engagement signals:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Post text&lt;/strong&gt; (140 to 2,000 characters depending on user level)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Repost chain&lt;/strong&gt; — Weibo's quote-tweet equivalent, central to virality tracking&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Engagement metrics&lt;/strong&gt; — &lt;code&gt;attitudes_count&lt;/code&gt; (likes), &lt;code&gt;comments_count&lt;/code&gt;, &lt;code&gt;reposts_count&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Hashtags and mentions&lt;/strong&gt;&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Geolocation&lt;/strong&gt; if disclosed by user&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Author profile&lt;/strong&gt; — follower count, verification status, verified reason (e.g., "新浪科技 official Weibo")&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Media&lt;/strong&gt; — images, videos&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A Weibo user profile gives you:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;User ID (numeric)&lt;/li&gt;
&lt;li&gt;Screen name (display name)&lt;/li&gt;
&lt;li&gt;Description / bio&lt;/li&gt;
&lt;li&gt;Follower and friend (accounts followed) counts&lt;/li&gt;
&lt;li&gt;Statuses count (total posts)&lt;/li&gt;
&lt;li&gt;Verification status with reason text — this is gold for identifying official accounts vs personal vs corporate&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For monitoring use cases, the metric that matters most depends on your goal:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Crisis monitoring&lt;/strong&gt;: track &lt;code&gt;comments_count&lt;/code&gt; and repost velocity. A spike in either signals viral attention.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Brand presence&lt;/strong&gt;: track post frequency from verified accounts in your category.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;KOL identification&lt;/strong&gt;: filter by &lt;code&gt;verified=true&lt;/code&gt; + follower count above a threshold.&lt;/li&gt;
&lt;/ul&gt;
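
&lt;p&gt;The KOL filter from the last bullet fits in a few lines. This sketch assumes each profile is a dict with a boolean &lt;code&gt;verified&lt;/code&gt; field and a numeric &lt;code&gt;followers_count&lt;/code&gt; field. Those key names and the 100,000 threshold are assumptions for illustration, so adjust them to whatever your scraper returns:&lt;/p&gt;

```python
def find_kols(profiles: list[dict], min_followers: int = 100_000) -> list[dict]:
    """Keep verified profiles at or above a follower threshold, largest first."""
    kols = [
        p for p in profiles
        if p.get("verified") and p.get("followers_count", 0) >= min_followers
    ]
    return sorted(kols, key=lambda p: p["followers_count"], reverse=True)


# Illustrative profiles only, not real accounts.
profiles = [
    {"screen_name": "brand_official", "verified": True, "followers_count": 2_500_000},
    {"screen_name": "small_verified", "verified": True, "followers_count": 8_000},
    {"screen_name": "big_unverified", "verified": False, "followers_count": 900_000},
]
print([p["screen_name"] for p in find_kols(profiles)])
```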

&lt;h2&gt;
  
  
  The Sina Visitor System
&lt;/h2&gt;

&lt;p&gt;This is the key technical concept for scraping Weibo without a Chinese business license.&lt;/p&gt;

&lt;p&gt;When you visit Weibo without logging in, Sina automatically issues you a "visitor cookie" via what they call the Sina Visitor System (SVS). This cookie lets you access limited public data — specifically:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Hot search / trending topics&lt;/strong&gt;: full access&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Post comments&lt;/strong&gt;: full access for any public post&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Post viewing&lt;/strong&gt;: limited&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For these endpoints, scraping is straightforward — get a visitor cookie, hit the AJAX endpoint, parse JSON.&lt;/p&gt;

&lt;p&gt;What the visitor cookie does NOT give you:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Search by keyword&lt;/strong&gt; (returns hot timeline as a fallback instead of true search results)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;User posts beyond profile basics&lt;/strong&gt; (you get the profile, not the user's post history)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For those, you need a real logged-in cookie — specifically the &lt;code&gt;SUB&lt;/code&gt; cookie value from a logged-in browser session. We'll get to that.&lt;/p&gt;

&lt;h2&gt;
  
  
  Approach 1: Build it yourself
&lt;/h2&gt;

&lt;p&gt;The Sina Visitor System flow looks roughly like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;

&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;httpx&lt;/span&gt;

&lt;span class="c1"&gt;# Step 1: Hit the visitor system to get a tid (temporary ID)
&lt;/span&gt;&lt;span class="n"&gt;visitor_url&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://passport.weibo.com/visitor/genvisitor&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="n"&gt;resp&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;httpx&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;post&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;visitor_url&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;cb&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gen_callback&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;fp&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;os&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;:&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;1&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;,&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;browser&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;:&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Chrome&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;,&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;fonts&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;:&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;undefined&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;,&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;screenInfo&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;:&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;1920*1080*24&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;,&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;plugins&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;:&lt;/span&gt;&lt;span class="sh"&gt;""&lt;/span&gt;&lt;span class="s"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;})&lt;/span&gt;
&lt;span class="c1"&gt;# The response is JSONP-wrapped JSON: strip the callback wrapper, parse, extract tid.
# Assumes the payload nests it as {"data": {"tid": ...}}.
&lt;/span&gt;&lt;span class="n"&gt;raw&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;resp&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;
&lt;span class="n"&gt;tid&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;loads&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;raw&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;raw&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;index&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt; &lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;raw&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;rindex&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;)&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)])[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;data&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tid&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="c1"&gt;# Step 2: Use tid to get the SUB visitor cookie
&lt;/span&gt;&lt;span class="n"&gt;incarnate_url&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://passport.weibo.com/visitor/visitor&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="n"&gt;resp&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;httpx&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;incarnate_url&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;params&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;a&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;incarnate&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;t&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;tid&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;w&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;2&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;c&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;100&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;})&lt;/span&gt;
&lt;span class="c1"&gt;# The response sets cookies; read SUB and SUBP from resp.cookies.
&lt;/span&gt;&lt;span class="n"&gt;sub&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;resp&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;cookies&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;SUB&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="n"&gt;subp&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;resp&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;cookies&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;SUBP&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="c1"&gt;# Step 3: Use those cookies to call AJAX endpoints
&lt;/span&gt;&lt;span class="n"&gt;hot_search_url&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://weibo.com/ajax/side/hotSearch&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="n"&gt;resp&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;httpx&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;hot_search_url&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;cookies&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;SUB&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;sub&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;SUBP&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;subp&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;
&lt;span class="n"&gt;data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;resp&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;json&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="c1"&gt;# data["data"]["realtime"] is the hot search list
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's the rough shape. In practice you'll handle:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Rate limit responses (HTTP 418, 429) with exponential backoff&lt;/li&gt;
&lt;li&gt;Cookie expiration (visitor cookies last hours, not days)&lt;/li&gt;
&lt;li&gt;AJAX endpoint changes (Weibo periodically reshuffles paths)&lt;/li&gt;
&lt;li&gt;Anti-scraping fingerprint checks (less aggressive than RedNote, but still present)&lt;/li&gt;
&lt;/ul&gt;
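&lt;p&gt;The first bullet can be sketched as a small retry helper. This is a minimal sketch under stated assumptions, not the hosted actor's implementation: &lt;code&gt;fetch&lt;/code&gt; is a hypothetical zero-argument callable returning &lt;code&gt;(status_code, body)&lt;/code&gt; (for example, a thin wrapper around &lt;code&gt;httpx.get&lt;/code&gt;), and the retry count, base delay, cap, and jitter are placeholder values to tune against real responses.&lt;/p&gt;

```python
import random
import time
from typing import Callable

RETRYABLE_STATUSES = {418, 429}  # the rate-limit codes listed above


def backoff_delays(max_retries: int = 5, base: float = 1.0, cap: float = 60.0) -> list[float]:
    """Exponential delays in seconds: base, 2*base, 4*base, ... capped at `cap`."""
    return [min(cap, base * 2 ** attempt) for attempt in range(max_retries)]


def fetch_with_backoff(
    fetch: Callable[[], tuple[int, str]],  # hypothetical: returns (status_code, body)
    max_retries: int = 5,
    base: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> tuple[int, str]:
    """Call `fetch` until it stops returning a rate-limit status or retries run out."""
    for delay in backoff_delays(max_retries, base):
        status, body = fetch()
        if status not in RETRYABLE_STATUSES:
            return status, body
        # Add jitter so parallel workers do not all retry at the same moment.
        sleep(delay + random.uniform(0, delay / 2))
    # One last attempt; the caller sees whatever status it returns.
    return fetch()
```

&lt;p&gt;Passing &lt;code&gt;sleep&lt;/code&gt; as a parameter keeps the helper testable without real waiting.&lt;/p&gt;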

&lt;p&gt;For the keyword-search and user-posts endpoints, you'll need a real &lt;code&gt;SUB&lt;/code&gt; cookie from a logged-in account:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Get SUB from your browser DevTools → Application → Cookies → weibo.com
# Look for the cookie named "SUB"
&lt;/span&gt;&lt;span class="n"&gt;sub_cookie&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;SUB=_2A25Fxxxxxx...&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

&lt;span class="n"&gt;resp&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;httpx&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://weibo.com/ajax/side/searchAll&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;params&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;q&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;人工智能&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
    &lt;span class="n"&gt;cookies&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;SUB&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;sub_cookie&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;split&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]},&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Logged-in &lt;code&gt;SUB&lt;/code&gt; cookies typically last several days before expiring, depending on Weibo's session policies, so plan to refresh them regularly.&lt;/p&gt;
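&lt;p&gt;If you copy more than one cookie out of DevTools, splitting on the first &lt;code&gt;=&lt;/code&gt; by hand gets tedious. A minimal sketch of a parser for a &lt;code&gt;name=value; name=value&lt;/code&gt; string follows. The helper name &lt;code&gt;parse_cookie_string&lt;/code&gt; is hypothetical and the cookie values are placeholders.&lt;/p&gt;

```python
def parse_cookie_string(cookie_string: str) -> dict[str, str]:
    """Turn 'SUB=abc; SUBP=def' into {'SUB': 'abc', 'SUBP': 'def'}."""
    cookies: dict[str, str] = {}
    for part in cookie_string.split(";"):
        # partition() splits on the first "=" only, so values may contain "=".
        name, sep, value = part.strip().partition("=")
        if sep and name:
            cookies[name] = value
    return cookies


# Placeholder values; the resulting dict can be passed as cookies= to httpx.get.
print(parse_cookie_string("SUB=_2A25placeholder; SUBP=0033placeholder=="))
```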

&lt;h3&gt;
  
  
  DIY cost breakdown
&lt;/h3&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Cost&lt;/th&gt;
&lt;th&gt;Estimate&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Initial setup (visitor system, hot search, comments)&lt;/td&gt;
&lt;td&gt;4-8 hours&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;User session cookie management&lt;/td&gt;
&lt;td&gt;1-2 hours/week&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Maintenance when Weibo changes endpoints&lt;/td&gt;
&lt;td&gt;2-4 hours, every 2-3 months&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;No proxy needed for most endpoints&lt;/td&gt;
&lt;td&gt;$0&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Weibo is genuinely the easiest of the major Chinese platforms to scrape if you stay within visitor-system endpoints. RedNote and Bilibili both have more complex auth.&lt;/p&gt;

&lt;h2&gt;
  
  
  Approach 2: Use a hosted scraper
&lt;/h2&gt;

&lt;p&gt;If you don't want to maintain visitor-system handling and cookie management, the &lt;a href="https://apify.com/zhorex/weibo-scraper" rel="noopener noreferrer"&gt;&lt;code&gt;zhorex/weibo-scraper&lt;/code&gt;&lt;/a&gt; Apify actor handles it.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;apify_client&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;ApifyClient&lt;/span&gt;

&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;ApifyClient&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;YOUR_APIFY_API_TOKEN&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Hot search (no cookie needed)
&lt;/span&gt;&lt;span class="n"&gt;run&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;actor&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;zhorex/weibo-scraper&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;call&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;run_input&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;mode&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;hot_search&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;maxResults&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;50&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;})&lt;/span&gt;

&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;topic&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dataset&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;run&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;defaultDatasetId&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]).&lt;/span&gt;&lt;span class="nf"&gt;iterate_items&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;#&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;topic&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;rank&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;topic&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;title&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; (heat: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;topic&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;hotValue&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;)&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Output format:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"rank"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"title"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"人工智能最新突破"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"category"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"科技"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"hotValue"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;2847562&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"labelName"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"热"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"isHot"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;true&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"url"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"https://s.weibo.com/weibo?q=..."&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"scrapedAt"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"2026-04-25T12:00:00Z"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For brand monitoring, search mode is what you want. Note the tradeoff: without a logged-in cookie, search falls back to the hot timeline instead of returning keyword-matched results:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Without cookie: returns hot timeline as fallback
&lt;/span&gt;&lt;span class="n"&gt;run&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;actor&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;zhorex/weibo-scraper&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;call&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;run_input&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;mode&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;search&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;searchQuery&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;CeraVe&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;maxResults&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;50&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;})&lt;/span&gt;

&lt;span class="c1"&gt;# With cookie: returns true keyword-matched results
&lt;/span&gt;&lt;span class="n"&gt;run&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;actor&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;zhorex/weibo-scraper&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;call&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;run_input&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;mode&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;search&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;searchQuery&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;CeraVe&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;maxResults&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;50&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;cookieString&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;SUB=your_logged_in_cookie&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;})&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The hosted actor handles the visitor system, exponential backoff, and rate limit recovery internally. Pricing: $5 per 1,000 results.&lt;/p&gt;

&lt;p&gt;Honest stats on the actor right now: 4 paying users, 11 free-tier users, 92.5% success rate, 3,768 result extractions to date. Average issue response time when something breaks: a few hours or less.&lt;/p&gt;

&lt;h2&gt;
  
  
  When DIY vs hosted
&lt;/h2&gt;

&lt;p&gt;DIY makes sense when:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You're processing &amp;gt; 1M posts/month (per-result cost adds up)&lt;/li&gt;
&lt;li&gt;You have ops capacity to refresh &lt;code&gt;SUB&lt;/code&gt; cookies regularly&lt;/li&gt;
&lt;li&gt;You need to scrape behind login at scale&lt;/li&gt;
&lt;li&gt;You have specific endpoints not covered by hosted scrapers&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Hosted makes sense when:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You don't have a dedicated scraper engineer&lt;/li&gt;
&lt;li&gt;Volume is moderate (&amp;lt; 500k posts/month)&lt;/li&gt;
&lt;li&gt;You want the visitor-system handling to be someone else's problem&lt;/li&gt;
&lt;li&gt;You're prototyping and want to validate the use case before committing&lt;/li&gt;
&lt;/ul&gt;
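&lt;p&gt;As a rough sanity check on those volume thresholds, here is the hosted price quoted above turned into monthly spend. The only figure taken from this post is $5 per 1,000 results. The DIY budget passed to &lt;code&gt;break_even_results&lt;/code&gt; is a hypothetical input that you would estimate from your own engineer time and infrastructure.&lt;/p&gt;

```python
PRICE_PER_1000_RESULTS = 5.00  # USD, the hosted price quoted above


def hosted_monthly_cost(results_per_month: int) -> float:
    """Hosted spend in USD for a given monthly result volume."""
    return results_per_month / 1000 * PRICE_PER_1000_RESULTS


def break_even_results(diy_monthly_cost_usd: float) -> int:
    """Monthly volume at which hosted spend equals a given (hypothetical) DIY budget."""
    return int(diy_monthly_cost_usd / PRICE_PER_1000_RESULTS * 1000)


for volume in (50_000, 500_000, 1_000_000):
    print(f"{volume:,} results/month costs ${hosted_monthly_cost(volume):,.0f} hosted")
```

&lt;p&gt;At 1M results per month the hosted bill is $5,000, which is roughly where a part-time scraper engineer starts to look cheaper.&lt;/p&gt;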

&lt;h2&gt;
  
  
  What you do with the data downstream
&lt;/h2&gt;

&lt;p&gt;Sentiment analysis on Chinese text is the obvious next layer. Off-the-shelf Chinese BERT models work reasonably well for Weibo's discourse style. Weibo posts tend to be more formal than RedNote slang, so general-purpose Chinese sentiment models are more accurate on them (typically 75-85% on neutral/positive/negative classification).&lt;/p&gt;

&lt;p&gt;For brand crisis detection, the signal you usually want is &lt;em&gt;velocity&lt;/em&gt;&lt;/p&gt;

</description>
      <category>python</category>
      <category>webscraping</category>
      <category>china</category>
      <category>datascience</category>
    </item>
    <item>
      <title>Coin-per-view: the Bilibili metric that beats subscriber count for creator vetting</title>
      <dc:creator>Sami</dc:creator>
      <pubDate>Wed, 29 Apr 2026 17:48:32 +0000</pubDate>
      <link>https://dev.to/sami_8858131362756585e4f4/coin-per-view-the-bilibili-metric-that-beats-subscriber-count-for-creator-vetting-477j</link>
      <guid>https://dev.to/sami_8858131362756585e4f4/coin-per-view-the-bilibili-metric-that-beats-subscriber-count-for-creator-vetting-477j</guid>
      <description>&lt;p&gt;If you've ever sponsored a YouTube creator and been disappointed by the ROI, you've already lived through what subscriber count actually measures: not engagement, not influence, not purchase intent. Just historical clicks on a follow button. Many of those followers stopped opening videos two years ago. Some are inactive accounts. Some followed for a single piece of content that has nothing to do with your brand.&lt;/p&gt;

&lt;p&gt;This is universally true on creator platforms, but it's especially true on Bilibili — China's YouTube. With 300M+ monthly active users skewed Gen Z and millennials, Bilibili is where Chinese creator marketing happens. And Bilibili exposes three engagement signals that YouTube doesn't, which together let you cut through the noise of follower counts and identify creators whose audiences actually engage.&lt;/p&gt;

&lt;p&gt;The single most useful one is &lt;strong&gt;coin-per-view ratio&lt;/strong&gt;. This post explains what it is, why it matters, what threshold to use, and how to compute it for any Chinese creator in a few lines of code.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why follower count is a lying signal
&lt;/h2&gt;

&lt;p&gt;Three reasons follower counts mislead in creator marketing:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Followers are a lagging indicator.&lt;/strong&gt; Someone followed a creator in 2023 because they liked one video. That doesn't tell you whether they still watch in 2026, whether they engage, or whether they trust the creator's recommendations enough to buy.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Followers are gameable.&lt;/strong&gt; Not everyone games them, but enough creators do that you can't trust raw counts without other signals. Bot followers, follow-for-follow campaigns, paid follower services. China specifically has a robust market for these.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. The follower-to-engagement ratio varies wildly.&lt;/strong&gt; A creator with 100k followers and 1M average views per video has fundamentally different audience economics than another creator with 100k followers and 5k average views per video. Both have the same "follower count" — the engagement quality is the actual signal.&lt;/p&gt;
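&lt;p&gt;To put numbers on that comparison, here is a minimal sketch using the two example creators from the paragraph above. The function name is illustrative, not part of any library.&lt;/p&gt;

```python
def views_per_follower(avg_views: int, followers: int) -> float:
    """Average views per video divided by follower count."""
    if followers == 0:
        raise ValueError("followers must be non-zero")
    return avg_views / followers


creator_a = views_per_follower(1_000_000, 100_000)  # 100k followers, 1M average views
creator_b = views_per_follower(5_000, 100_000)      # 100k followers, 5k average views

print(f"Creator A: {creator_a:.2f} views per follower")
print(f"Creator B: {creator_b:.2f} views per follower")
print(f"Gap: {creator_a / creator_b:.0f}x")
```

&lt;p&gt;Same follower count, a 200x difference in how often the audience actually shows up.&lt;/p&gt;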

&lt;p&gt;This is why every serious creator marketing tool talks about "engagement rate" — which on YouTube is usually computed as (likes + comments) / views. It's better than raw follower count, but on Bilibili you can do meaningfully better.&lt;/p&gt;

&lt;h2&gt;
  
  
  The three Bilibili-native metrics
&lt;/h2&gt;

&lt;p&gt;Bilibili was designed by anime fans for anime fans, and the engagement system reflects values around quality and creator support that YouTube's flat "like" button never captured. Three metrics that come back from any Bilibili video scrape:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Danmaku (弹幕)&lt;/strong&gt; — real-time scrolling comments overlaid on the video as users watch. Think livestream chat, but for pre-recorded video. The danmaku count tells you how many people were engaged enough mid-watch to type something. It's a leading indicator of viewing time and attention.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Favorites (收藏)&lt;/strong&gt; — roughly equivalent to "save for later", or saving a video to a playlist on YouTube. Strong long-term value signal: high favorites relative to views means people return to this video. Tutorials, references, and definitive content score high here.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Coins (投币)&lt;/strong&gt; — Bilibili's tipping system. This is the interesting one. Each user gets a small daily allocation of coins (typically 5 per day for active users), and they can "throw" them at videos they want to support. Because coins are scarce by design (you only have a few to spend on any given day), coin counts are a strong signal of genuine appreciation.&lt;/p&gt;

&lt;p&gt;A user gives a coin to a video they love. They give a coin to a creator they want to keep making content. They don't give a coin to a video they passively watched and forgot. The cost is real (relative to the user's daily allocation), so the signal is real.&lt;/p&gt;

&lt;h2&gt;
  
  
  Coin-per-view ratio: the single best signal
&lt;/h2&gt;

&lt;p&gt;If I had to pick one metric to evaluate a Bilibili creator, it would be &lt;strong&gt;median coin-per-view ratio across their last 20-30 videos&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;The math is simple:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;coin_per_view&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;coin_count&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="n"&gt;view_count&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;100&lt;/span&gt;  &lt;span class="c1"&gt;# express as percentage
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;What I've found from looking at hundreds of Bilibili creators across categories:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Coin/View %&lt;/th&gt;
&lt;th&gt;Audience quality&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;&amp;lt; 0.5%&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Passive viewers. Casual scrolling traffic, not engaged.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;0.5% – 1%&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Average. Normal Bilibili content, decent audience.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;1% – 2%&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Strong. Genuinely engaged audience. Worth sponsoring.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;&amp;gt; 2%&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Exceptional. Users actively spending limited resources on this content.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Above 2% is rare. It typically indicates either: (a) genuinely high-quality educational/tutorial content that people return to, (b) a creator with a deeply loyal niche audience, or (c) content that struck a strong emotional/cultural nerve.&lt;/p&gt;

&lt;p&gt;For creator vetting, my heuristic is: &lt;strong&gt;if median coin-per-view is below 1%, the audience is more passive than the follower count suggests; sponsorship ROI will probably disappoint.&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  How to compute this for any creator
&lt;/h2&gt;

&lt;p&gt;The data you need: a creator's recent videos with their view and coin counts. Bilibili exposes this through their public API — no auth required. You can use the open-source &lt;code&gt;bilibili-api&lt;/code&gt; Python library, or call their &lt;code&gt;/x/space/wbi/arc/search&lt;/code&gt; endpoint directly.&lt;/p&gt;

&lt;p&gt;If you'd rather skip the API integration entirely, I built a hosted scraper on Apify Store: &lt;a href="https://apify.com/zhorex/bilibili-scraper" rel="noopener noreferrer"&gt;&lt;code&gt;zhorex/bilibili-scraper&lt;/code&gt;&lt;/a&gt;. $5 per 1,000 results, free tier covers ~1,000 results.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;apify_client&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;ApifyClient&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;statistics&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;median&lt;/span&gt;

&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;ApifyClient&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;YOUR_APIFY_API_TOKEN&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Get a creator's last 30 videos
# user_id (mid) is the number in their profile URL: space.bilibili.com/{mid}
&lt;/span&gt;&lt;span class="n"&gt;run&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;actor&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;zhorex/bilibili-scraper&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;call&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;run_input&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;mode&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user_videos&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;userIds&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;546195&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;   &lt;span class="c1"&gt;# 老番茄 (a well-known Bilibili gamer)
&lt;/span&gt;    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;maxResults&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;30&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;})&lt;/span&gt;

&lt;span class="n"&gt;videos&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;item&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dataset&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;run&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;defaultDatasetId&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]).&lt;/span&gt;&lt;span class="nf"&gt;iterate_items&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;item&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;video&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;videos&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;item&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Compute coin-per-view ratio per video
&lt;/span&gt;&lt;span class="n"&gt;ratios&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;v&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;videos&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;views&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;v&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;viewCount&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;views&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="mi"&gt;1000&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;  &lt;span class="c1"&gt;# skip videos with too few views to be meaningful
&lt;/span&gt;        &lt;span class="k"&gt;continue&lt;/span&gt;
    &lt;span class="n"&gt;coins&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;v&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;coinCount&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;ratios&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;coins&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="n"&gt;views&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;100&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

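&lt;span class="c1"&gt;# median() and max() raise on an empty list, so stop if no video passed the view filter
&lt;/span&gt;&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;ratios&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="k"&gt;raise&lt;/span&gt; &lt;span class="nc"&gt;SystemExit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;No videos with at least 1,000 views to analyze&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Videos analyzed: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ratios&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;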
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Median coin-per-view: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nf"&gt;median&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ratios&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;%&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Best video coin-per-view: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nf"&gt;max&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ratios&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;:&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;%&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Categorize
&lt;/span&gt;&lt;span class="n"&gt;median_ratio&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;median&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ratios&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;median_ratio&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="mf"&gt;2.0&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;label&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;EXCEPTIONAL&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="k"&gt;elif&lt;/span&gt; &lt;span class="n"&gt;median_ratio&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="mf"&gt;1.0&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;label&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;STRONG&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="k"&gt;elif&lt;/span&gt; &lt;span class="n"&gt;median_ratio&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="mf"&gt;0.5&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;label&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;AVERAGE&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;label&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;PASSIVE&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Audience quality: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;label&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Run this against any Bilibili creator's user ID and you have a concrete answer about audience engagement quality.&lt;/p&gt;

&lt;h2&gt;
  
  
  A workflow for vetting creators at scale
&lt;/h2&gt;

&lt;p&gt;If you're building a creator marketing program for the Chinese market, this is the workflow that has worked for the teams I've seen using it:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Gather candidates.&lt;/strong&gt; From competitor sponsorship lists, from category trending, or from agency recommendations. Aim for 30-50 candidates per round.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Pull their recent video portfolios.&lt;/strong&gt; Use &lt;code&gt;user_videos&lt;/code&gt; mode to get the last 20-30 videos per creator.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Compute aggregate metrics.&lt;/strong&gt; For each creator: median coin-per-view, median favorite-per-view, median danmaku-per-view, and view consistency (standard deviation of view counts).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Filter on quality threshold.&lt;/strong&gt; Drop anyone with median coin-per-view below 1%. This usually cuts the candidate list by 40-60%.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Manual review of the survivors.&lt;/strong&gt; Watch a sample of their videos. Check for content fit. Evaluate sponsorship history (do their sponsored posts feel native or forced?).&lt;/li&gt;
&lt;li&gt;&lt;strong&gt;Negotiate from the qualified shortlist.&lt;/strong&gt;&lt;/li&gt;
&lt;/ol&gt;
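
&lt;p&gt;Steps 3 and 4 can be sketched in a few lines of Python. This is a minimal sketch, not the scraper's actual output format: the field names (&lt;code&gt;view&lt;/code&gt;, &lt;code&gt;coin&lt;/code&gt;, &lt;code&gt;favorite&lt;/code&gt;, &lt;code&gt;danmaku&lt;/code&gt;) and the &lt;code&gt;creator_metrics&lt;/code&gt; and &lt;code&gt;shortlist&lt;/code&gt; helpers are assumptions for illustration, so map them to whatever your dataset actually returns.&lt;/p&gt;

```python
from statistics import median, pstdev

# Hypothetical field names; adjust to the scraper's real output schema.
def creator_metrics(videos):
    """Aggregate per-video stats into creator-level medians (in percent)."""
    watched = [v for v in videos if v["view"] > 0]  # skip zero-view rows
    if not watched:
        return None

    def pct(field):
        # Median of the per-video ratio, expressed as a percentage.
        return median([100 * v[field] / v["view"] for v in watched])

    return {
        "videos": len(watched),
        "coin_per_view": pct("coin"),
        "favorite_per_view": pct("favorite"),
        "danmaku_per_view": pct("danmaku"),
        # Population standard deviation of raw view counts.
        "view_stdev": pstdev([v["view"] for v in watched]),
    }


def shortlist(portfolios, min_coin_pct=1.0):
    """Keep creators whose median coin-per-view meets the threshold."""
    kept = {}
    for creator, videos in portfolios.items():
        metrics = creator_metrics(videos)
        if metrics is not None and metrics["coin_per_view"] >= min_coin_pct:
            kept[creator] = metrics
    return kept
```

&lt;p&gt;The 1% default mirrors the threshold in step 4. Here &lt;code&gt;portfolios&lt;/code&gt; is assumed to be a dict mapping each creator ID to that creator's list of video records.&lt;/p&gt;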

&lt;p&gt;Total cost using a hosted scraper: roughly $5-10 in scraping fees for a 50-creator vetting round. Compared to agency rates for the same work ($500-2,000 per round), the math is obvious once you do it.&lt;/p&gt;
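
&lt;p&gt;The estimate follows from simple arithmetic. Here is a rough sketch, assuming one billed result per video and the $5 per 1,000 results pricing quoted for these scrapers; the &lt;code&gt;scraping_cost&lt;/code&gt; helper is hypothetical:&lt;/p&gt;

```python
def scraping_cost(creators, videos_per_creator, usd_per_1000=5.0):
    """Estimated cost in USD, assuming one billed result per video."""
    results = creators * videos_per_creator
    return results * usd_per_1000 / 1000


low = scraping_cost(50, 20)   # 1,000 results
high = scraping_cost(50, 30)  # 1,500 results
print(f"${low:.2f} to ${high:.2f} per 50-creator round")
```

&lt;p&gt;For 20-30 videos per creator this comes to $5.00 to $7.50, which sits inside the estimate above.&lt;/p&gt;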

&lt;h2&gt;
  
  
  Cross-platform creator vetting
&lt;/h2&gt;

&lt;p&gt;Bilibili is not the whole story. If you're vetting creators for a comprehensive China presence:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Bilibili&lt;/strong&gt; for video content (gaming, tech, anime, education)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;RedNote (Xiaohongshu)&lt;/strong&gt; for product-discovery content (beauty, fashion, lifestyle, food)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Weibo&lt;/strong&gt; for public discourse and broad-reach campaigns&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Each platform has different engagement signals. Bilibili has coins; RedNote has saves (a similarly scarce intent-to-buy signal); Weibo has reposts and a verified-account hierarchy. A creator who is strong on one platform isn't necessarily strong on the others.&lt;/p&gt;

&lt;p&gt;I maintain scrapers for all three on the Apify Store under the &lt;a href="https://apify.com/zhorex" rel="noopener noreferrer"&gt;zhorex profile&lt;/a&gt;, with consistent output schemas across the suite. They share the same pricing model ($5 per 1,000 results) and the same Apify infrastructure. If you're doing cross-platform creator analytics, the consistency saves integration time.&lt;/p&gt;

&lt;h2&gt;
  
  
  When this approach fails
&lt;/h2&gt;

&lt;p&gt;Two cases where coin-per-view ratio is a misleading signal:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;1. Brand-new creators with very few videos.&lt;/strong&gt; If a creator has uploaded 3 videos and one went viral with high coins, the ratio reflects that one video more than the audience. Wait until the creator has 15-20 videos before computing the median.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Livestream-focused creators.&lt;/strong&gt; Bilibili lets creators upload archived livestreams. Coin economics are different in a livestream context (gifts replace coins), so livestream-heavy creators need a different analysis.&lt;/p&gt;
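
&lt;p&gt;For the first case, one way to enforce the rule is to refuse to score small portfolios at all. This is a minimal sketch, assuming the per-video ratios are already computed as percentages; &lt;code&gt;MIN_VIDEOS&lt;/code&gt; and &lt;code&gt;safe_median_ratio&lt;/code&gt; are illustrative names, not part of any library:&lt;/p&gt;

```python
from statistics import median

MIN_VIDEOS = 15  # lower bound of the 15-20 video guideline


def safe_median_ratio(ratios, min_videos=MIN_VIDEOS):
    """Return the median ratio, or None when the sample is too small."""
    if len(ratios) >= min_videos:
        return median(ratios)
    return None


print(safe_median_ratio([0.4, 0.6, 9.5]))  # None: only 3 videos
```

&lt;p&gt;Returning &lt;code&gt;None&lt;/code&gt; instead of a number keeps under-sampled creators out of the threshold filter until more videos are available.&lt;/p&gt;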

&lt;p&gt;For everyone else, coin-per-view ratio is the single best signal I've found for vetting Bilibili creator quality at scale.&lt;/p&gt;

&lt;h2&gt;
  
  
  What this won't tell you
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Whether the audience is geographically right for your campaign (need follower demographics, which require auth)&lt;/li&gt;
&lt;li&gt;Whether the creator has done sponsorships before that flopped (need to scrape their content for promo patterns)&lt;/li&gt;
&lt;li&gt;Whether their audience overlaps with your target customer profile (need cross-reference with other platforms)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Treat coin-per-view as the engagement-quality filter. Everything else still requires manual review or additional data sources.&lt;/p&gt;




&lt;p&gt;If you're working on creator marketing for the Chinese market and want to compare notes on what works, drop a comment. I write about Chinese platform analytics (Bilibili, RedNote, Weibo) and the build-vs-buy tradeoffs around them.&lt;/p&gt;

&lt;p&gt;Hosted Bilibili scraper: &lt;a href="https://apify.com/zhorex/bilibili-scraper" rel="noopener noreferrer"&gt;apify.com/zhorex/bilibili-scraper&lt;/a&gt;&lt;br&gt;
Other Chinese platform scrapers in the same suite: &lt;a href="https://apify.com/zhorex/rednote-xiaohongshu-scraper" rel="noopener noreferrer"&gt;RedNote&lt;/a&gt; for product-discovery content, &lt;a href="https://apify.com/zhorex/weibo-scraper" rel="noopener noreferrer"&gt;Weibo&lt;/a&gt; for public discourse.&lt;/p&gt;

</description>
      <category>china</category>
      <category>marketing</category>
      <category>datascience</category>
      <category>analytics</category>
    </item>
  </channel>
</rss>
