DEV Community

Sami

How to scrape RedNote (Xiaohongshu) with Python in 2026 — the auth/signing problem and how to handle it

RedNote (Xiaohongshu, 小红书, sometimes "Little Red Book" or just XHS) is the platform a lot of Western teams realized they needed to monitor in 2024-2025, when the TikTok regulatory mess in the US sent millions of users — and brand attention — toward Chinese platforms. It's now China's #1 lifestyle and product-discovery network, with 300M+ monthly active users and a search-driven discovery model that makes it different from every other Chinese social platform.

The problem: there's no official public API. Western teams who try to monitor it usually end up either (a) paying enterprise vendors $20-50k/year for limited China coverage, or (b) trying to scrape it themselves and discovering that RedNote has one of the more aggressive anti-scraping stacks in Chinese social.

This article walks through the actual technical challenges and shows you both DIY and hosted approaches with real Python code. I've shipped a hosted RedNote scraper on Apify that I'll mention later — but the goal here is for you to understand the problem space well enough to make an informed build-vs-buy decision, not to sell you anything.

What RedNote actually serves

Before we go technical: what data does RedNote expose, and what's actually useful?

A RedNote post is structured roughly like this:

  • Title (often very short, sometimes empty)
  • Body text — long-form description with product mentions, hashtags, location tags
  • Image carousel — 1-9 images. Critical: a non-trivial portion of product info lives in image text overlays, not in the body
  • Engagement metrics — likes, saves, comments, shares
  • Author profile — username, avatar, follower/following counts, bio, verification, location
  • Tags / categories — hashtags and platform-assigned categories

For most monitoring use cases, the metric that matters more than likes is saves. Saves on RedNote are the closest equivalent to "I want to buy this later" — they correlate with purchase intent. Likes on RedNote are casual engagement, similar to Twitter likes.
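One way to put that into practice when ranking scraped posts: weight saves above likes. A minimal sketch (the field names and the 3x weight are illustrative choices of mine, not anything RedNote defines):

```python
def intent_score(post: dict) -> int:
    """Rank posts by purchase-intent signal: saves weigh more than likes.
    The 3x weight on saves is an illustrative heuristic, not a platform constant."""
    return post.get("saves", 0) * 3 + post.get("likes", 0)

posts = [
    {"title": "viral meme", "likes": 5000, "saves": 120},
    {"title": "product review", "likes": 1500, "saves": 2400},
]
# The product review outranks the meme despite far fewer likes
ranked = sorted(posts, key=intent_score, reverse=True)
```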

Profile data is structured similarly:

  • User ID, RedNote ID (red ID), nickname, avatar
  • Bio / description
  • Follower / following counts
  • Location, gender, profile tags
  • Total likes received across all posts
  • Verification status

The technical challenges (why this is harder than scraping Twitter)

If you've scraped Western social platforms, your default toolkit is probably httpx or requests plus maybe a residential proxy. RedNote is going to break each of those defaults.

Challenge 1: TLS fingerprinting

RedNote uses TLS fingerprinting (specifically JA3/JA4) to identify and block requests that don't come from real browsers. The requests library has a Python-specific TLS fingerprint that RedNote's bot-detection layer recognizes immediately.

The standard fix is to use curl_cffi, which lets you spoof a Chrome or Safari TLS fingerprint:

from curl_cffi import requests as curl_requests

# Spoof Chrome 120's TLS fingerprint
response = curl_requests.get(
    "https://www.xiaohongshu.com/explore",
    impersonate="chrome120"
)

This alone gets you past the first layer of detection.

Challenge 2: Request signing

RedNote signs every API request with a value called x-s (sometimes seen as xs) plus other parameters like x-t and x-s-common. These are computed client-side from a JavaScript function in their web app.

The signing function changes roughly monthly. When it changes, every scraper using the old signing logic breaks until someone reverse-engineers the new function.

Here's roughly what you need to do:

# Pseudo-code — actual signing logic is more complex

import time
import json
import hashlib

def generate_signing_headers(url_path: str, body: dict) -> dict:
    """
    The actual logic is reverse-engineered from RedNote's web client JS.
    This requires reading their obfuscated bundle and reproducing it in Python.
    """
    timestamp = str(int(time.time() * 1000))
    body_str = json.dumps(body, separators=(',', ':')) if body else ""

    # Real implementation involves:
    # - Specific input concatenation order
    # - Custom hashing scheme (not standard HMAC)
    # - Several "magic constants" that change when RedNote rotates
    # - Sometimes a captcha-derived token

    raw = f"url={url_path}&data={body_str}&t={timestamp}"
    x_s = hashlib.md5(raw.encode()).hexdigest()  # This is NOT the actual algorithm

    return {
        "x-s": x_s,
        "x-t": timestamp,
        # x-s-common is computed separately
    }

The actual signing algorithm is more complex than what I've shown. There are open-source libraries that have reverse-engineered it (xhs-api and similar on GitHub) — they get you most of the way there, but expect to patch them when RedNote rotates.

Challenge 3: IP-level rate limiting and datacenter blocking

RedNote blocks requests from datacenter IPs (AWS, GCP, Azure, DigitalOcean, etc.) within minutes. You need residential proxies, ideally with Chinese geolocation or at least Asia-Pacific.

Even with residential IPs, there's a per-IP rate limit. Realistic throughput is around 10-20 requests per minute per IP before you start getting 412/418 errors and eventually IP bans.
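In practice that means every request loop needs backoff and proxy rotation baked in. A stdlib-only sketch (the fetch callable, proxy pool, and exact status codes are placeholders for whatever your client actually sees):

```python
import random
import time

def fetch_with_backoff(fetch, url, proxies, max_retries=5, base_delay=2.0):
    """Retry on throttle responses (412/418/429), rotating to a different
    residential proxy and backing off exponentially with jitter.
    `fetch(url, proxy)` is your own HTTP call (e.g. via curl_cffi) that
    returns an object with a .status_code attribute."""
    delay = base_delay
    for attempt in range(max_retries):
        proxy = random.choice(proxies)
        resp = fetch(url, proxy)
        if resp.status_code not in (412, 418, 429):
            return resp
        # Throttled: wait with jitter, then retry through another IP
        time.sleep(delay + random.uniform(0, delay))
        delay *= 2
    raise RuntimeError(f"still throttled after {max_retries} attempts: {url}")
```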

Challenge 4: SPA / dynamic rendering for some endpoints

Search and the explore feed are loaded via AJAX after initial page load, but a few endpoints (some user pages, certain post types) only render their data in the Vue.js application state. You either need to extract data from the inline <script> tag (look for window.__INITIAL_STATE__) or render with Playwright.
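Extracting that state from the raw HTML is usually a regex plus json.loads. A sketch, assuming the object is assigned in a single inline script; the undefined-to-null substitution is a common community workaround for these pages, and it can misfire if a string value happens to contain the word "undefined":

```python
import json
import re

def extract_initial_state(html: str) -> dict:
    """Pull window.__INITIAL_STATE__ out of an inline <script> tag.
    Assumes the state is assigned as one object literal ending the script."""
    m = re.search(r"window\.__INITIAL_STATE__\s*=\s*(\{.*?\})\s*</script>",
                  html, re.S)
    if not m:
        raise ValueError("no __INITIAL_STATE__ found; page may need JS rendering")
    # Some builds embed bare `undefined` in the state, which isn't valid JSON
    return json.loads(m.group(1).replace("undefined", "null"))
```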

Challenge 5: Login walls on certain features

True keyword-filtered search requires login. Without login, you get the explore feed (trending/recommended for your keyword), which is useful but not the same. This is a structural product limitation, not a scraping limitation: you can scrape it the same way logged-in users see it, you just need to either provide cookies or accept the explore-feed fallback.

Approach 1: Build it yourself

If you have ops capacity to maintain it (someone who can read JavaScript and reverse-engineer signing functions monthly), DIY is feasible. Here's a minimal example using curl_cffi plus an open-source signing library.

First install:

pip install curl_cffi xhs

Note: xhs is one of several open-source libraries on GitHub that wrap RedNote's API. Check their commit history before depending on one — the abandoned ones break monthly.

from xhs import XhsClient
from xhs.exceptions import DataFetchError

# You need to provide your own cookies and signing function URL
# The 'sign' function comes from the library's reverse-engineered JS
def sign(uri, data=None, a1="", web_session=""):
    # Implementation provided by the library
    # When RedNote rotates, you'll need to update this
    pass

client = XhsClient(
    cookie="abRequestId=...; webBuild=...; xsecappid=xhs-pc-web; ...",
    sign=sign
)

try:
    # Get a user's posts
    user_id = "5cfbc3f10000000018023ebb"
    posts = client.get_user_notes(user_id)

    for post in posts.get("notes", []):
        print(f"Title: {post.get('display_title')}")
        print(f"Likes: {post.get('interact_info', {}).get('liked_count')}")
        print("---")

except DataFetchError as e:
    print(f"RedNote rejected the request: {e}")
    # Common causes:
    # - Signing function out of date (update from upstream)
    # - Cookie expired (re-login)
    # - IP throttled (rotate proxy)

The cookie you need comes from logging into RedNote in a browser and copying the relevant cookies from DevTools. The cookies expire — typically a few days — so you'll need to refresh them periodically.
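If you script that refresh, converting the copied header string into the dict shape most Python HTTP clients accept is pure stdlib:

```python
from http.cookies import SimpleCookie

def cookie_str_to_dict(cookie_str: str) -> dict:
    """Parse a 'name=value; name2=value2' string copied from DevTools
    into a plain dict of cookie names to values."""
    jar = SimpleCookie()
    jar.load(cookie_str)
    return {name: morsel.value for name, morsel in jar.items()}
```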

Here's the honest cost breakdown for DIY:

  • Initial setup (researching libraries, getting first scrape working): 8-16 hours
  • Residential proxy (Bright Data, Oxylabs, etc.): $50-200/month for moderate volume
  • Per-incident maintenance when RedNote rotates: 4-8 hours, 1-2x/month
  • Ongoing cookie refresh and error handling: 1-2 hours/week

If you have a developer whose time is worth $50-100/hour, DIY is around $400-1000/month all-in for moderate scraping volumes.

Approach 2: Use a hosted scraper

The build-vs-buy math changes if you don't have someone on the team who can read reverse-engineered JavaScript and patch signing logic. Hosted Apify Actors handle that for you.

Several developers (including me) maintain RedNote scrapers on Apify; mine is apify.com/zhorex/rednote-xiaohongshu-scraper.

Using it from Python:

from apify_client import ApifyClient

client = ApifyClient("YOUR_APIFY_API_TOKEN")

# Search
run = client.actor("zhorex/rednote-xiaohongshu-scraper").call(run_input={
    "mode": "search",
    "searchQuery": "skincare routine",
    "maxResults": 50,
    "filterByMinLikes": 100  # Only return posts with 100+ likes
})

# Iterate over results
for post in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(f"Title: {post['title']}")
    print(f"Likes: {post['likes']}")
    print(f"Author: {post['author']['nickname']}")
    print(f"URL: {post['postUrl']}")
    print("---")

The output JSON is mostly flat, with a single nested author object:

{
  "mode": "search",
  "postId": "69d269310000000023017e07",
  "postUrl": "https://www.xiaohongshu.com/explore/69d269310000000023017e07",
  "type": "normal",
  "title": "Morning skincare routine for dry skin",
  "images": ["https://sns-webpic-qc.xhscdn.com/..."],
  "likes": 15234,
  "author": {
    "userId": "575d32285e87e733f0162c0a",
    "nickname": "BeautyQueen",
    "avatar": "https://sns-avatar-qc.xhscdn.com/..."
  },
  "scrapedAt": "2026-04-25T21:14:30Z"
}

This is one option among several on Apify Store. EasyApi has the most users by volume; OrbitData Labs has a different all-in-one approach. Pricing is roughly the same across them ($5/1000 ± $1). Differences are in:

  • Output schema (some return RedNote's raw nested API response, some flatten it)
  • Update frequency (some are abandoned and break for weeks at a time)
  • Mode coverage (some only do search; others handle profiles, comments, videos, etc.)
  • Issue response time

If you're evaluating, run a free-tier test on 2-3 of them with the same input and compare what you get back. The free tier costs you nothing.

When does each approach make sense?

DIY (build it yourself):

  • You have a Chinese-language ops team and can monitor breakages
  • You're processing > 1M posts/month (the per-result cost of hosted starts to add up)
  • You need to scrape behind login (which means you need cookies from logged-in accounts you control)
  • You have specific data needs that no hosted scraper covers

Hosted Apify actor:

  • You don't have a dedicated scraper engineer
  • Volume is variable or moderate (< 500k posts/month)
  • You want to outsource the cat-and-mouse with RedNote's anti-bot updates
  • You're prototyping and want to validate the approach before committing

The middle ground that often makes sense: use a hosted actor for production data flow, build a thin DIY layer for any specific endpoints the hosted version doesn't cover. The hosted scraper handles the maintenance burden on the parts that break most often (search, profile, posts), and you keep custom DIY logic for the edges.

What you do with the data downstream

Scraping is one third of the problem. The other two:

Sentiment analysis on Chinese text. Off-the-shelf Chinese BERT models (like bert-base-chinese from Huggingface) are a starting point but accuracy varies wildly by domain. RedNote slang, in particular, doesn't appear in the training data of general Chinese sentiment models — fine-tuning on RedNote-specific labeled samples gets you significant accuracy lift if accuracy matters.

Image text extraction. A non-trivial portion of product mentions on RedNote live in image text overlays (Chinese users frequently put product names visible in images, not in the post body). PaddleOCR is the open-source standard for Chinese OCR. Slow (~30 seconds per image) but reliable. Adds significant cost to processing pipelines but you'll miss a measurable percentage of product mentions without it.
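Structurally, the OCR step is an embarrassingly parallel map over image URLs. A sketch that keeps the backend pluggable (ocr_fn is a stand-in for your PaddleOCR wrapper; nothing here is PaddleOCR's actual API):

```python
from concurrent.futures import ThreadPoolExecutor

def extract_image_text(posts, ocr_fn, max_workers=4):
    """Attach OCR'd overlay text to each scraped post.
    `ocr_fn(image_url) -> str` is whatever OCR backend you wire in
    (PaddleOCR in practice); it's injected here so the pipeline stays
    testable. OCR dominates pipeline cost, hence the worker pool."""
    def process(post):
        texts = [ocr_fn(url) for url in post.get("images", [])]
        return {**post, "image_text": " ".join(t for t in texts if t)}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(process, posts))
```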

Both of these are downstream of scraping — solve scraping first, then layer.

FAQ

Is scraping RedNote legal?

Public-data scraping legality varies by jurisdiction. RedNote's ToS prohibits automated access (as do most platform ToS). The Apify approach (and most public-scraping infrastructure) treats public web pages as accessible, the same way Google's crawler would. You should consult legal counsel for your specific use case. Not legal advice.

How fast can I scrape RedNote?

Realistic sustained throughput per IP is around 10-20 requests per minute before triggering rate limits. With residential proxy rotation and proper backoff, you can scale this horizontally — Apify's actor handles this internally. For DIY, plan for roughly one request every 3-6 seconds per IP as a conservative number.

Do I need a Chinese IP?

Not strictly required, but residential IPs (Asian residential preferred) have notably higher success rates than US/European residential. Datacenter IPs are blocked outright.

What's xsec_token and why does it matter?

When users share posts via the RedNote app or copy URLs, those URLs include an xsec_token query parameter that authenticates the link request. Some scrapers don't handle URLs with xsec_token correctly and return errors. If you're scraping URLs collected from real users, make sure your tooling supports this.
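If you normalize user-shared URLs before queueing them, keep the token attached. A stdlib sketch (the output field names are my own):

```python
from urllib.parse import parse_qs, urlparse

def parse_shared_url(url: str) -> dict:
    """Split a user-shared RedNote URL into its note ID and xsec_token
    (None when the token is absent), so the token can be passed along."""
    parsed = urlparse(url)
    note_id = parsed.path.rstrip("/").split("/")[-1]
    token = parse_qs(parsed.query).get("xsec_token", [None])[0]
    return {"note_id": note_id, "xsec_token": token}
```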

Can I scrape video files from RedNote posts?

Yes. Video URLs are returned in the post metadata. Direct download from those URLs works without authentication for public videos.

How often does the request signing change?

Roughly monthly, sometimes more frequently around major RedNote app updates. If you're doing DIY, plan to dedicate 4-8 hours per rotation to update your signing function, or rely on an actively maintained library that pushes updates fast.

What's the difference between RedNote and Xiaohongshu?

They're the same platform. "Xiaohongshu" (小红书) is the Chinese name and means "Little Red Book". "RedNote" is the English brand they pushed during the 2024-2025 TikTok migration to be more accessible to global users. Same app, same data, same API endpoints.


If you're working on China market intelligence, brand monitoring, or competitive research and want to compare notes — drop me a comment. I write about Chinese platform scraping (Weibo, Bilibili, RedNote) and the build-vs-buy trade-offs around them.

The actor I mentioned: apify.com/zhorex/rednote-xiaohongshu-scraper. Free tier covers ~1,000 results, which is enough to validate against your specific use case before committing.
