<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: SIÁN Agency</title>
    <description>The latest articles on DEV Community by SIÁN Agency (@sian-agency).</description>
    <link>https://dev.to/sian-agency</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3854792%2Fcb57fd08-1d47-4084-97aa-8c4879d72af0.png</url>
      <title>DEV Community: SIÁN Agency</title>
      <link>https://dev.to/sian-agency</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/sian-agency"/>
    <language>en</language>
    <item>
      <title>I Stopped Writing TikTok Scrapers. Five Lines of Python Replaced Them.</title>
      <dc:creator>SIÁN Agency</dc:creator>
      <pubDate>Mon, 27 Apr 2026 13:34:57 +0000</pubDate>
      <link>https://dev.to/sian-agency/i-stopped-writing-tiktok-scrapers-five-lines-of-python-replaced-them-5824</link>
      <guid>https://dev.to/sian-agency/i-stopped-writing-tiktok-scrapers-five-lines-of-python-replaced-them-5824</guid>
      <description>&lt;p&gt;If your TikTok scraper still uses Playwright + custom selectors, this post will annoy you. Good. Read it anyway.&lt;/p&gt;

&lt;p&gt;I burned three weekends last quarter on a "minimal" TikTok scraper. Selector-first, headless, the works. Worked beautifully for nine days. Then TikTok shipped a layout change at 2am UTC and my fixtures became fiction.&lt;/p&gt;

&lt;p&gt;The honest answer most devs avoid: &lt;strong&gt;for known platforms with stable APIs around them, you should not be writing the scraper.&lt;/strong&gt; You should be calling someone's actor.&lt;/p&gt;

&lt;h2&gt;Stop owning the layer that breaks&lt;/h2&gt;

&lt;p&gt;Three things break a TikTok scraper, and none of them are about your code:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Layout drift.&lt;/strong&gt; Selectors are a liability the second TikTok touches the DOM.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Auth + rate-limit games.&lt;/strong&gt; Cloudflare, fingerprinting, the whole party.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Audio extraction + transcription.&lt;/strong&gt; Even if you got the video, now you need Whisper, ffmpeg, a queue, and a dead body to bury when it OOMs.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;You're not getting paid to maintain that. You're getting paid to ship the thing on top of it.&lt;/p&gt;

&lt;h2&gt;What replaced 800 lines of Python for me&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;apify_client&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;ApifyClient&lt;/span&gt;

&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;ApifyClient&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;YOUR_APIFY_TOKEN&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;run&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;actor&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;sian.agency/best-tiktok-ai-transcript-extractor&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;call&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;run_input&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;bulkUrls&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://www.tiktok.com/@user/video/7565659068153531669&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]}&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;list&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dataset&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;run&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;defaultDatasetId&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]).&lt;/span&gt;&lt;span class="nf"&gt;iterate_items&lt;/span&gt;&lt;span class="p"&gt;()))&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That's the whole thing. Five lines. The actor's input schema has exactly two fields you need to know about:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;tiktokUrl&lt;/code&gt; (string) — single video. Pass any URL format. Short links from &lt;code&gt;vm.tiktok.com&lt;/code&gt; get resolved. Mobile share URLs work.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;bulkUrls&lt;/code&gt; (array) — paste 5, 50, or 500. Bulk edit, file upload, line-separated, comma-separated. It doesn't care.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That's the entire input surface. Two keys. No proxy config, no captcha settings, no "headless or headful" debate.&lt;/p&gt;
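&lt;p&gt;If you're feeding the actor from your own list of URLs, a tiny helper can pick the right key for you. This is a rough sketch, not part of the actor: &lt;code&gt;build_run_input&lt;/code&gt; is a name I made up, and the only things it takes from the schema are the two field names above.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;def build_run_input(urls):
    """Return an actor input dict: tiktokUrl for one URL, bulkUrls for several.

    Hypothetical helper for illustration; only the two key names come from
    the actor's input schema as described in this post.
    """
    # Strip whitespace, drop blanks, and de-duplicate while keeping order.
    cleaned = list(dict.fromkeys(u.strip() for u in urls if u.strip()))
    if not cleaned:
        raise ValueError("at least one TikTok URL is required")
    if len(cleaned) == 1:
        return {"tiktokUrl": cleaned[0]}
    return {"bulkUrls": cleaned}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Pass the result as &lt;code&gt;run_input&lt;/code&gt; to the &lt;code&gt;.call(...)&lt;/code&gt; in the snippet above and the rest stays the same.&lt;/p&gt;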

&lt;h2&gt;What you get back&lt;/h2&gt;

&lt;p&gt;Per video, you get the AI transcript (99%+ accuracy claimed by the actor — empirically I see ~98% on English, lower on heavy slang) plus 45 metadata fields: views, likes, shares, creator stats, hashtags, music ID, location, content categories. The transcript ships with detected language and segment timing, so you can search inside videos like text.&lt;/p&gt;
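&lt;p&gt;Here's roughly what "search inside videos like text" can look like once the items are in memory. Treat it as a sketch under assumptions: the keys &lt;code&gt;url&lt;/code&gt;, &lt;code&gt;segments&lt;/code&gt;, &lt;code&gt;start&lt;/code&gt; and &lt;code&gt;text&lt;/code&gt; are my guesses at the dataset shape, not the actor's documented field names, and the sample item is hand-written. Check a real run and rename accordingly.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;def find_mentions(items, keyword):
    """Yield (video_url, start_seconds, text) for segments containing keyword."""
    needle = keyword.casefold()
    for item in items:
        # ASSUMPTION: "url", "segments", "start" and "text" are placeholder
        # field names; adjust them to whatever the real dataset items use.
        for seg in item.get("segments", []):
            text = seg.get("text", "")
            if needle in text.casefold():
                yield item.get("url"), seg.get("start"), text


# Hand-written sample item for illustration only, not real actor output.
sample = [
    {
        "url": "https://www.tiktok.com/@user/video/1",
        "segments": [
            {"start": 0.0, "text": "Welcome back"},
            {"start": 4.2, "text": "Today we test the new Pricing page"},
        ],
    },
]

hits = list(find_mentions(sample, "pricing"))
print(hits)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The match is case-insensitive on purpose, since spoken-word transcripts rarely agree with your keyword list on capitalization.&lt;/p&gt;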

&lt;p&gt;I rewrote a competitor-monitoring pipeline last month using this. Old stack: Playwright cluster + Whisper container + Redis + a cron + a Slack channel where I apologized weekly. New stack: a 60-line Python script and the actor. Same dataset, less surface area, no apologies.&lt;/p&gt;

&lt;h2&gt;The objection I keep getting&lt;/h2&gt;

&lt;blockquote&gt;
&lt;p&gt;"Why pay per run when I can self-host?"&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Because your time isn't free, and you don't actually self-host — you self-rebuild every two weeks when something shifts. The actor charges per validated result. You only pay for the runs that gave you usable data. That's a different cost model than "compute hours your worker spent crashing."&lt;/p&gt;

&lt;p&gt;If your volume is genuinely huge, sure, build it. But "huge" is an engineering decision, not a default.&lt;/p&gt;

&lt;h2&gt;Try it on your own URL&lt;/h2&gt;

&lt;p&gt;The free tier handles 5 videos per run, 8s delay between them. If you want to see the dataset shape for your own use case, drop a TikTok URL in and watch it run: &lt;a href="https://apify.com/sian.agency/best-tiktok-ai-transcript-extractor?utm_source=devto&amp;amp;utm_medium=blog&amp;amp;utm_campaign=nova&amp;amp;utm_content=tiktok-transcripts-5-lines-python" rel="noopener noreferrer"&gt;TikTok AI Transcript Extractor on Apify&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Bulk mode is paid — unlimited per run, no delays, no per-video charges. Use it when you're past the experiment phase.&lt;/p&gt;
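&lt;p&gt;To get a feel for what the free-tier limits mean for a given list, here's a back-of-the-envelope sketch. The 5-per-run and 8-second numbers are the ones quoted above. The helper name &lt;code&gt;plan_free_tier_runs&lt;/code&gt; is hypothetical, and I'm assuming the delay applies only between consecutive videos inside a run.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;FREE_TIER_BATCH = 5    # videos per run on the free tier, as stated in this post
FREE_TIER_DELAY_S = 8  # seconds between videos on the free tier, as stated in this post


def plan_free_tier_runs(urls):
    """Split urls into free-tier sized runs and estimate the built-in delay."""
    batches = [
        urls[i:i + FREE_TIER_BATCH]
        for i in range(0, len(urls), FREE_TIER_BATCH)
    ]
    # ASSUMPTION: a run of n videos waits (n - 1) * 8 seconds in total,
    # on top of whatever the actual processing takes.
    delay_s = sum((len(b) - 1) * FREE_TIER_DELAY_S for b in batches)
    return batches, delay_s


# Placeholder URLs, only used to show the arithmetic.
urls = [f"https://www.tiktok.com/@user/video/{n}" for n in range(12)]
batches, delay_s = plan_free_tier_runs(urls)
print(len(batches), "runs,", delay_s, "seconds of built-in delay")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;If that delay figure starts to hurt, that's your signal the experiment phase is over.&lt;/p&gt;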




&lt;p&gt;&lt;strong&gt;Disagree?&lt;/strong&gt; Drop the snippet you're using to scrape TikTok in the comments. I'll tell you which line is going to break first. Be specific — "I use Puppeteer" is not a snippet.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Written by &lt;strong&gt;Nova Chen&lt;/strong&gt;, Automation Dev Advocate at SIÁN Agency. Find more from Nova on &lt;a href="https://dev.to/sian-agency"&gt;dev.to&lt;/a&gt;. For custom scraping or automation work, &lt;a href="https://sian.agency?utm_source=devto&amp;amp;utm_medium=blog&amp;amp;utm_campaign=nova&amp;amp;utm_content=tiktok-transcripts-5-lines-python" rel="noopener noreferrer"&gt;hire SIÁN Agency&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

</description>
    </item>
  </channel>
</rss>
