DEV Community

Gary Lee


How to Scrape RedNote (Xiaohongshu) Without Coding

If you've tried to pull data from RedNote — the English name for Xiaohongshu (小红书) — you already know it's one of the harder social platforms to scrape. There's no public API, the mobile and web apps are heavily obfuscated, and most "tutorials" stop at a curl command that breaks within a week.

This post covers why RedNote is hard to scrape, the three realistic ways to do it, and a no-code path if you don't want to maintain a scraper yourself.

Why RedNote is harder than TikTok or Instagram

A few things make Xiaohongshu a pain compared to other platforms:

  1. Signed request headers. Every API call to edith.xiaohongshu.com needs valid x-s, x-t, and x-s-common headers. These are generated by an obfuscated JS function (window._webmsxyw) that changes periodically. Replay a captured header and you get a 461 / sign-error within minutes.
  2. Aggressive anti-bot. Hit the same endpoint a few times from a datacenter IP and you'll get a sliding-captcha or a silent empty response. Residential proxies + pacing are basically mandatory.
  3. No official API. Unlike YouTube or (historically) Twitter, there's no developer program. Everything is reverse-engineered from the web/app.
  4. Fast-moving frontend. The note detail payload structure changes, fields get renamed, and noteId ↔ xsec_token coupling means you often can't fetch a note without a fresh token from the feed it appeared in.

So the real problem isn't writing the first request; it's keeping it working.
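To make the pacing point concrete, here is a minimal sketch of a retry loop that backs off with jitter and refreshes the signature when it sees a sign error. It is an illustration under assumptions, not a working RedNote client: `fetch` and `refresh_signature` are hypothetical callables you would supply yourself (for example, a function that sends one signed request and returns the status code and body), and the delay values are placeholders.

```python
import random
import time
from typing import Callable, Optional, Tuple

SIGN_ERROR_STATUS = 461  # status the article associates with a stale or invalid signature


def backoff_delay(attempt: int, base: float = 2.0, cap: float = 60.0) -> float:
    """Exponential backoff with full jitter: uniform in [0, min(cap, base * 2**attempt)]."""
    return random.uniform(0.0, min(cap, base * (2 ** attempt)))


def fetch_with_pacing(
    fetch: Callable[[], Tuple[int, str]],       # hypothetical: performs one signed request
    refresh_signature: Callable[[], None],      # hypothetical: regenerates x-s / x-t headers
    max_attempts: int = 4,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[str]:
    """Call fetch() until it returns 200 with a non-empty body, or attempts run out."""
    for attempt in range(max_attempts):
        status, body = fetch()
        if status == 200 and body:
            return body
        if status == SIGN_ERROR_STATUS:
            # Replaying the old headers will not help; get fresh ones first.
            refresh_signature()
        if attempt < max_attempts - 1:
            # Also covers the silent empty response case: wait, then try again.
            sleep(backoff_delay(attempt))
    return None
```

Passing `sleep` as a parameter keeps the loop testable without real waiting. Proxy rotation and captcha handling are deliberately left out of this sketch.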

Option 1 — Roll your own (most control, most maintenance)

The DIY stack usually looks like:

  • A headless browser (Playwright) to log in and grab the signing context, or a reverse-engineered JS signer ported to Python/Node.
  • A residential proxy pool with rotation.
  • Retry + captcha-handling logic.
  • A parser that survives field renames.

This works, and gives you full control. The catch: you're now maintaining an anti-bot arms race. Most teams I've seen spend more time fixing the signer after a Xiaohongshu update than using the data. Fine if scraping is your product, overkill if you just need the data.
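One way to approach the "parser that survives field renames" item is an alias table: each output field lists every key path it has been seen under, and the parser takes the first one that resolves. The sketch below assumes made-up field names (`display_title`, `interact_info.liked_count`, and so on are illustrative, not a documented Xiaohongshu schema).

```python
from typing import Any, Dict, Optional, Tuple

# Hypothetical aliases: each output field lists the key paths it has been seen under.
FIELD_ALIASES: Dict[str, Tuple[str, ...]] = {
    "note_id": ("note_id", "noteId", "id"),
    "title": ("title", "display_title"),
    "likes": ("interact_info.liked_count", "likes", "like_count"),
}


def get_path(data: Dict[str, Any], path: str) -> Optional[Any]:
    """Walk a dotted path through nested dicts; return None if any step is missing."""
    current: Any = data
    for key in path.split("."):
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current


def parse_note(raw: Dict[str, Any]) -> Dict[str, Any]:
    """Return a stable record, taking the first alias that resolves for each field."""
    record: Dict[str, Any] = {}
    for field, paths in FIELD_ALIASES.items():
        record[field] = next(
            (v for v in (get_path(raw, p) for p in paths) if v is not None), None
        )
    return record
```

When a rename lands, the fix is one new entry in `FIELD_ALIASES` rather than a change to the parsing logic. A field that resolves to `None` across a whole batch is a useful signal that the payload shape has moved again.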

Option 2 — Generic scraping platforms (Apify, Bright Data)

Marketplaces like Apify have community "actors" for Xiaohongshu, and Bright Data sells a managed dataset/scraper. This offloads the maintenance.

Trade-offs:

  • Cost. Bright Data in particular gets expensive fast at volume.
  • Coverage gaps. Community actors break when Xiaohongshu updates and the fix depends on whoever maintains that actor.
  • RedNote specifically is thin. Most actors are TikTok/Instagram-first; Xiaohongshu support tends to lag.

Option 3 — A managed API (no code)

If you just want clean JSON without running browsers or babysitting a signer, a managed scraping API is the no-code path. You send a profile URL or note ID, you get structured data back. Someone else eats the anti-bot maintenance.

Things to check before picking one:

  • Does it actually cover RedNote/Xiaohongshu? Many "social scraping APIs" advertise TikTok + Instagram and quietly omit Xiaohongshu. Test the endpoint you actually need.
  • Profiles, posts, and comments? Comments are where most competitor/audience analysis happens, and they're the first thing cheap APIs drop.
  • Output format. You want flat, predictable JSON — not a raw HTML dump you have to parse again.
  • Pricing model. Per-request beats per-compute-second for predictable cost.

We build SpiderHubs partly to fill the RedNote gap: one API across TikTok, Instagram, YouTube, Douyin and Xiaohongshu, returning profiles, posts and comments as clean JSON, positioned as an affordable Apify / Bright Data alternative. (Disclosure: I work on it.) But the checklist above applies to whatever you pick.
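If you do end up comparing providers from a script, the coverage checks in the list above can be automated against one sample response. This is a rough sketch: the paths in `REQUIRED_PATHS` are hypothetical and should be replaced with whatever fields the provider actually documents.

```python
from typing import Any, Dict, List

# Hypothetical field paths; replace with whatever the provider documents.
REQUIRED_PATHS: List[str] = [
    "profile.user_id",
    "posts.0.note_id",
    "posts.0.comments.0.text",
]


def resolve(data: Any, path: str) -> Any:
    """Follow a dotted path through dicts and lists; raise KeyError if it does not resolve."""
    current = data
    for part in path.split("."):
        if isinstance(current, list) and part.isdigit() and int(part) < len(current):
            current = current[int(part)]
        elif isinstance(current, dict) and part in current:
            current = current[part]
        else:
            raise KeyError(path)
    return current


def missing_paths(sample: Dict[str, Any], required: List[str] = REQUIRED_PATHS) -> List[str]:
    """Return the required paths that are absent from a sample response."""
    missing = []
    for path in required:
        try:
            resolve(sample, path)
        except KeyError:
            missing.append(path)
    return missing
```

A response with posts but an empty comments list would report `posts.0.comments.0.text` as missing, which is exactly the "comments get dropped first" case the checklist warns about.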

A no-code workflow if you just need the data once

You don't always need an API. If it's a one-off pull:

  1. Find the creator/topic feed you care about.
  2. Use a managed scraper or no-code monitoring tool to pull the latest posts + engagement into a sheet/JSON.
  3. Set it to re-run daily if you're tracking competitors over time. The daily delta is usually what you actually want, not a one-time dump.

That last point is the real reason most people scrape Xiaohongshu: tracking competitors and trending content over time, not a single snapshot. Whatever route you pick, design for the recurring pull, not the first request.
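For anyone who exports the daily pulls as JSON, the "daily delta" is just a per-note difference between two snapshots. A minimal sketch, assuming each snapshot maps a note ID to its engagement counts (the `likes` and `comments` keys are illustrative):

```python
from typing import Dict

# note_id -> {"likes": ..., "comments": ...}; the metric names are illustrative.
Snapshot = Dict[str, Dict[str, int]]


def daily_delta(yesterday: Snapshot, today: Snapshot) -> Snapshot:
    """Per-note change in each metric; notes that are new today are measured against zero."""
    delta: Snapshot = {}
    for note_id, metrics in today.items():
        previous = yesterday.get(note_id, {})
        delta[note_id] = {
            name: value - previous.get(name, 0) for name, value in metrics.items()
        }
    return delta
```

Notes that disappear between snapshots are ignored here. Whether to flag them as deleted is a design choice this sketch leaves open.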

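SpiderHubs | Automated viral-content data monitoring SaaS for Xiaohongshu, Douyin, and TikTok

SpiderHubs is a social media data monitoring SaaS for content creators, brand marketers, and data analysts. It automatically crawls top creators and competitor content every day across major platforms including Xiaohongshu, Douyin, TikTok, YouTube, Instagram, and X/Twitter, and supports bulk export of original videos, watermark-free assets, captions, and comments, with zero account risk.

spiderhubs.com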

TL;DR

  • RedNote is hard because of signed headers (x-s/x-t), aggressive anti-bot, and no official API.
  • DIY = full control + permanent maintenance.
  • Apify/Bright Data = less maintenance, but cost + thin Xiaohongshu coverage.
  • Managed API = no code; just verify it actually covers Xiaohongshu (profiles + posts + comments) and returns clean JSON.
  • Whatever you choose, build for the daily recurring pull, not the one-time request.

What's your current setup for Xiaohongshu data — DIY signer, Apify, or something else? Curious what's holding up best after their recent updates.
