If you've ever tried to scrape RedNote (Xiaohongshu), you know the pain. Request signing that rotates monthly, TLS fingerprinting that blocks requests immediately, residential proxies required, the whole tour. Weibo isn't quite as bad but you still need the Sina Visitor System dance to even hit a public endpoint.
Bilibili is the outlier. No API key. No browser. No proxy. No request signing rotation worth worrying about. Pure HTTP. Runs in 256MB RAM.
If you only need to monitor one Chinese platform — say you're tracking a gaming brand launch, or doing creator research, or analyzing tech trends — Bilibili is where you should start. This post walks through how to scrape it from scratch in Python, what data you actually get, and when to switch from DIY to a hosted scraper.
## Why Bilibili is unusually scrape-friendly
Bilibili (哔哩哔哩) is China's YouTube — 300M+ monthly active users, skewed Gen Z and millennials, dominant in anime, gaming, tech, and educational content. From a scraping perspective, three things make it different from RedNote and Weibo:
1. Internal HTTP APIs are mostly stable. Bilibili exposes JSON endpoints for search, video metadata, user info, popular/trending, and comments. Most don't require auth for public content. The endpoints change rarely (months between meaningful updates) compared to RedNote where signing rotates monthly.
2. No TLS fingerprinting. Plain `httpx` or `requests` works. You don't need `curl_cffi` or any Chrome impersonation library to get past anti-bot.
3. Generous rate limits from non-Chinese IPs. I've sustained 1-2 requests per second from a single datacenter IP without getting throttled. RedNote would have banned the IP within minutes at that rate.
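If you want to stay deliberately under that 1-2 req/s ceiling rather than discover the limit the hard way, a minimal sleep-based pacer is enough. This is my own sketch (class name and defaults are mine, nothing Bilibili-specific):

```python
import time

class Throttle:
    """Cap request rate at `rps` requests/second with simple sleep-based pacing."""

    def __init__(self, rps: float = 1.0):
        self.min_interval = 1.0 / rps
        self._last = 0.0

    def wait(self) -> None:
        # Sleep just long enough to keep successive calls min_interval apart
        delay = self.min_interval - (time.monotonic() - self._last)
        if delay > 0:
            time.sleep(delay)
        self._last = time.monotonic()
```

Call `throttle.wait()` before each HTTP request; it's single-threaded by design, which is all a monitoring workload needs.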
There's one caveat I'll cover at the end: comments scraping is throttled from datacenter IPs. Everything else works fine.
## A 30-line Bilibili scraper
Here's the smallest useful Bilibili scraper I'd actually run in production:
```python
import httpx

HEADERS = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36",
    "Referer": "https://www.bilibili.com/",
}

def get_video_detail(bvid: str) -> dict:
    """Fetch full metadata for a Bilibili video by BVID."""
    url = "https://api.bilibili.com/x/web-interface/view"
    response = httpx.get(url, params={"bvid": bvid}, headers=HEADERS, timeout=30)
    response.raise_for_status()
    return response.json().get("data", {})

def get_popular(category_rid: int = 0, page_size: int = 20) -> list:
    """Fetch trending videos. category_rid=0 means all categories."""
    url = "https://api.bilibili.com/x/web-interface/popular"
    params = {"ps": page_size}
    if category_rid:
        params["rid"] = category_rid
    response = httpx.get(url, params=params, headers=HEADERS, timeout=30)
    response.raise_for_status()
    return response.json().get("data", {}).get("list", [])

# Use it
trending = get_popular(page_size=10)
for video in trending:
    stat = video.get("stat", {})
    print(f"{video.get('title', '')[:60]}")
    print(f"  Views: {stat.get('view'):,} | Coins: {stat.get('coin'):,} | "
          f"Favorites: {stat.get('favorite'):,} | Danmaku: {stat.get('danmaku'):,}")
```
That's it. No auth, no proxy, no signing. You can paste this into a Python REPL and have working data in 5 seconds.
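"Production" does deserve one addition: retries. Bilibili occasionally returns transient errors, and a stdlib-only backoff decorator keeps the scraper functions above clean. The decorator name and defaults are mine, not anything Bilibili-specific:

```python
import time
from functools import wraps

def with_retries(retries: int = 3, backoff: float = 1.0, transient=(Exception,)):
    """Retry a flaky callable with exponential backoff.
    Pass e.g. transient=(httpx.HTTPError,) to retry only network errors."""
    def decorate(fn):
        @wraps(fn)
        def wrapper(*args, **kwargs):
            for attempt in range(retries):
                try:
                    return fn(*args, **kwargs)
                except transient:
                    if attempt == retries - 1:
                        raise  # out of attempts, surface the error
                    time.sleep(backoff * 2 ** attempt)
        return wrapper
    return decorate
```

Stack it on `get_video_detail` with `@with_retries(transient=(httpx.HTTPError,))`. It's also worth checking the `code` field in Bilibili's JSON envelope (0 means success) before trusting `data`.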
## The five things you can scrape
Bilibili's web client exposes endpoints for five modes. Each is useful for different use cases:
| Mode | Endpoint | Use case |
|---|---|---|
| Search | `/x/web-interface/wbi/search/type` | Find videos by keyword (Chinese or English); filterable by sort order, duration, date range |
| Video detail | `/x/web-interface/view` | Full metadata for a specific video, including all engagement metrics |
| Comments | `/x/v2/reply/main` | Comments on a video (caveats below) |
| User videos | `/x/space/wbi/arc/search` | All recent uploads from a creator |
| Popular | `/x/web-interface/popular` | Trending feed, optionally filtered by category |
The search endpoint requires a small WBI signing scheme — slightly more involved than the others. The open-source `bilibili-api` library on GitHub handles it if you don't want to reverse-engineer the signing yourself. For everything else, plain HTTP works.
## What data you actually get back
Each video from Bilibili comes with three categories of engagement metrics:
Standard metrics (similar to YouTube):
- View count
- Like count
- Share count
- Reply count (comments)
Bilibili-specific metrics (these don't exist on YouTube):
- Danmaku count (弹幕) — the count of real-time scrolling comments overlaid on the video as users watch
- Coin count (投币) — Bilibili's tipping system. Users get a few coins per day and "throw" them at videos. Coins are scarce by design.
- Favorite count (收藏) — equivalent to "save for later"
These extra metrics are non-trivial. A video with 1M views, 50k coins, 30k favorites is meaningfully different from 1M views, 1k coins, 5k favorites — even with identical view counts. The first has an audience that's actively engaged enough to spend their daily coin allocation; the second has passive viewers.
If you're doing creator analytics or content strategy, the engagement quality signals coins/favorites give you are worth the integration effort.
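One simple way to operationalize that quality signal is to normalize the Bilibili-specific counters by views. The field names follow the `stat` block returned by the video-detail endpoint; the ratio idea and function name are mine:

```python
def engagement_quality(stat: dict) -> dict:
    """Per-view engagement ratios from a video's `stat` block.
    Higher coin_rate means viewers spent scarce daily coins on it."""
    views = max(stat.get("view", 0), 1)  # guard against division by zero
    return {
        "coin_rate": stat.get("coin", 0) / views,
        "favorite_rate": stat.get("favorite", 0) / views,
        "danmaku_rate": stat.get("danmaku", 0) / views,
    }

# The two 1M-view examples from above:
active = engagement_quality({"view": 1_000_000, "coin": 50_000, "favorite": 30_000, "danmaku": 20_000})
passive = engagement_quality({"view": 1_000_000, "coin": 1_000, "favorite": 5_000, "danmaku": 2_000})
```

Sorting a candidate creator list by `coin_rate` instead of raw views surfaces a very different (and usually more interesting) top ten.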
## The comments caveat
There's one Bilibili-side restriction worth knowing: /x/v2/reply/main is throttled when called from datacenter IPs (AWS, GCP, Azure, etc.). You'll get the top ~3 pinned comments per video and then nothing. Full pagination requires either:
- Authenticated session cookies, or
- Residential IPs
If you need full comments at scale, this is the one part where you'll run into infrastructure cost. Other modes are unaffected.
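If you do go the authenticated route, the pagination shape is worth sketching. The parameter names (`type`, `oid`, `next`) and the `cursor.is_end` response field follow community documentation of `/x/v2/reply/main`; the `fetch` callable is injected so you can plug in an `httpx` call carrying a SESSDATA cookie (or a stub), but verify the details against a live response:

```python
def fetch_comment_pages(oid: int, fetch, max_pages: int = 5) -> list:
    """Walk the cursor-paginated reply endpoint.
    `fetch(params) -> dict` should GET /x/v2/reply/main and return its `data`."""
    replies, cursor = [], 0
    for _ in range(max_pages):
        data = fetch({"type": 1, "oid": oid, "next": cursor})
        replies.extend(data.get("replies") or [])
        cur = data.get("cursor") or {}
        if cur.get("is_end"):
            break
        cursor = cur.get("next", cursor + 1)
    return replies
```

Injecting the HTTP call also means the pagination logic is testable without burning residential-proxy bandwidth.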
## Performance you can expect
Real numbers from running this on Apify's serverless infrastructure (256MB RAM, no proxy, no auth):
| Mode | Input | Duration | Throughput |
|---|---|---|---|
| Search | max=50 | ~7-8 seconds | ~7 items/sec |
| Popular | max=40 | ~5 seconds | ~8 items/sec |
| Video detail | 10 BVIDs | ~5 seconds | ~2 items/sec (with tag enrichment) |
| User videos | 3 users, max=15 | ~4 seconds | ~4 items/sec |
Fast enough for monitoring use cases. For heavy bulk extraction (millions of videos) you'd want to parallelize across multiple workers — easy because there's no IP throttling on most endpoints.
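Fan-out is trivial with a thread pool since the calls are I/O-bound. Here `fetch` would be something like the `get_video_detail` function from earlier; the worker count is a guess to tune against whatever pacing you've decided on:

```python
from concurrent.futures import ThreadPoolExecutor

def fetch_many(bvids: list, fetch, workers: int = 8) -> list:
    """Fetch video details concurrently, preserving input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() yields results in the same order as the input list
        return list(pool.map(fetch, bvids))
```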
## DIY vs hosted
DIY is genuinely viable for Bilibili. The maintenance burden is low — endpoints don't change much, there's no signing rotation eating your time. If you only need Bilibili and you're already comfortable with Python HTTP, build it yourself.
But if you're already monitoring multiple Chinese platforms (RedNote + Weibo + Bilibili is the typical full-suite use case), a hosted scraper that handles all three with consistent output schema is worth the convenience. I built and maintain one on Apify Store: zhorex/bilibili-scraper. Honest current state:
- 16 users
- 9 monthly active
- 100% success rate over 2,635 result extractions
- Average issue response: 4.4 hours
- $5 per 1,000 results (Apify free tier covers ~1,000/month at no cost)
It supports all five modes above with consistent output JSON, handles WBI signing for search internally, and is paired with companion scrapers for the other Chinese platforms — RedNote (Xiaohongshu) and Weibo. Same pricing model across the suite.
Using it from Python:
```python
from apify_client import ApifyClient

client = ApifyClient("YOUR_APIFY_API_TOKEN")

run = client.actor("zhorex/bilibili-scraper").call(run_input={
    "mode": "search",
    "searchQuery": "AI tutorial",
    "sortOrder": "click",        # most-viewed first
    "durationFilter": "medium",  # 10-30 min videos
    "maxResults": 50,
})

for video in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(f"{video['title']} — {video['viewCount']:,} views, {video['coinCount']:,} coins")
```
## When to use Bilibili specifically
Even if you're already covering YouTube, Twitter, etc., Bilibili captures things those platforms don't:
- Chinese gaming and esports content — game launches, walkthroughs, and esports event reactions live here
- Tutorial and educational content — Knowledge category is huge; replaces YouTube tutorials for Chinese audiences
- Anime and otaku culture — central hub for the Chinese anime community
- Creator economy in China — the coin/favorite metrics give better creator quality signals than subscriber counts (more on that in a follow-up post)
If your use case touches Chinese-market gaming, anime, tech, or education, Bilibili is irreplaceable.
## FAQ

### Is scraping Bilibili legal?
Public-data scraping legality varies by jurisdiction. Bilibili's ToS prohibits automated access. The scraping approach treats public web pages as accessible (the same content any logged-out browser visitor can see). Consult legal counsel for your specific use case. Not legal advice.
### Do I need a Chinese IP?
For most endpoints, no. Bilibili is globally accessible. The exception is comments scraping (covered above). Some licensed video content (anime, dramas) may be geo-restricted but metadata and engagement metrics are accessible from any IP.
### What's a BVID?
Bilibili's video ID format. Looks like BV1YXDfBUETP. Replaced the older numeric aid format. URLs use BVIDs: bilibili.com/video/BV1YXDfBUETP.
### How does this compare to YouTube Data API?
YouTube offers an official, quota-limited Data API to registered developers. Bilibili has no equivalent for international developers, so for Chinese-market analytics, scraping is the only practical option. The upside: more granular engagement signals (coins, favorites, danmaku) than YouTube exposes.
### Why did bilibili-api (the Python lib) work for me a year ago and break now?
Probably the WBI signing scheme rotated. Check the library's commit history; if there's been a recent fix, update. If the project looks abandoned, consider switching libraries or going hosted.
If you're working on Chinese-market analytics, brand monitoring, or creator research and want to compare notes — drop a comment. I write about the build-vs-buy tradeoffs for Chinese platform scraping (RedNote, Weibo, Bilibili).
Hosted actor: apify.com/zhorex/bilibili-scraper