SIÁN Agency

Posted on • Originally published at apify.com

I Stopped Writing TikTok Scrapers. Five Lines of Python Replaced Them.

If your TikTok scraper still uses Playwright + custom selectors, this post will annoy you. Good. Read it anyway.

I burned three weekends last quarter on a "minimal" TikTok scraper. Selector-first, headless, the works. Worked beautifully for nine days. Then TikTok shipped a layout change at 2am UTC and my fixtures became fiction.

The honest answer most devs avoid: for known platforms with stable APIs around them, you should not be writing the scraper. You should be calling someone's actor.

Stop owning the layer that breaks

Three things break a TikTok scraper, and none of them are about your code:

  1. Layout drift. Selectors are a liability the second TikTok touches the DOM.
  2. Auth + rate-limit games. Cloudflare, fingerprinting, the whole party.
  3. Audio extraction + transcription. Even if you got the video, now you need Whisper, ffmpeg, a queue, and a dead body to bury when it OOMs.

You're not getting paid to maintain that. You're getting paid to ship the thing on top of it.

What replaced 800 lines of Python for me

```python
from apify_client import ApifyClient

client = ApifyClient("YOUR_APIFY_TOKEN")
run = client.actor("sian.agency/best-tiktok-ai-transcript-extractor").call(
    run_input={"bulkUrls": ["https://www.tiktok.com/@user/video/7565659068153531669"]}
)
print(list(client.dataset(run["defaultDatasetId"]).iterate_items()))
```

That's the whole thing. Five lines. The actor's input schema has exactly two fields you need to know about:

  • tiktokUrl (string) — single video. Pass any URL format. Short links from vm.tiktok.com get resolved. Mobile share URLs work.
  • bulkUrls (array) — paste 5, 50, or 500. Bulk edit, file upload, line-separated, comma-separated. It doesn't care.

That's the entire input surface. Two keys. No proxy config, no captcha settings, no "headless or headful" debate.
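If you're scripting this, normalizing pasted input down to those two keys is a tiny helper. A sketch — the helper name is mine; the two fields come from the actor's schema described above:

```python
def build_run_input(raw: str) -> dict:
    """Turn pasted text (newline- or comma-separated URLs) into the
    actor's two-key input: tiktokUrl for one video, bulkUrls for many."""
    urls = [u.strip() for u in raw.replace(",", "\n").splitlines() if u.strip()]
    if len(urls) == 1:
        return {"tiktokUrl": urls[0]}
    return {"bulkUrls": urls}
```

Paste a messy comma-and-newline blob in, get a valid `run_input` out.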

What you get back

Per video, you get the AI transcript (99%+ accuracy claimed by the actor — empirically I see ~98% on English, lower on heavy slang) plus 45 metadata fields: views, likes, shares, creator stats, hashtags, music ID, location, content categories. The transcript ships with detected language and segment timing, so you can search inside videos like text.
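Here's what segment timing buys you: keyword search across videos, with timestamps. A sketch — field names like `transcriptSegments`, `start`, and `url` are my assumptions for illustration; check a real run's dataset for the exact keys:

```python
def find_keyword(items, keyword):
    """Scan dataset items for a keyword inside transcript segments.

    Returns (video_url, segment_start_seconds, segment_text) tuples.
    Segment/item field names are assumed, not guaranteed by the actor.
    """
    hits = []
    for item in items:
        for seg in item.get("transcriptSegments", []):
            if keyword.lower() in seg["text"].lower():
                hits.append((item["url"], seg["start"], seg["text"]))
    return hits
```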

I rewrote a competitor-monitoring pipeline last month using this. Old stack: Playwright cluster + Whisper container + Redis + a cron + a Slack channel where I apologized weekly. New stack: a 60-line Python script and the actor. Same dataset, less surface area, no apologies.

The objection I keep getting

"Why pay per run when I can self-host?"

Because your time isn't free, and you don't actually self-host — you self-rebuild every two weeks when something shifts. The actor charges per validated result. You only pay for the runs that gave you usable data. That's a different cost model than "compute hours your worker spent crashing."

If your volume is genuinely huge, sure, build it. But "huge" is an engineering decision, not a default.

Try it on your own URL

The free tier handles 5 videos per run, 8s delay between them. If you want to see the dataset shape for your own use case, drop a TikTok URL in and watch it run: TikTok AI Transcript Extractor on Apify.

Bulk mode is paid — unlimited per run, no delays, no per-video charges. Use it when you're past the experiment phase.


Disagree? Drop the snippet you're using to scrape TikTok in the comments. I'll tell you which line is going to break first. Be specific — "I use Puppeteer" is not a snippet.


Written by **Nova Chen**, Automation Dev Advocate at SIÁN Agency. Find more from Nova on dev.to. For custom scraping or automation work, hire SIÁN Agency.
