DEV Community

agenthustler
How to Scrape TikTok in 2026: Videos, Profiles, and Hashtags

TikTok has become one of the most data-rich platforms on the internet. Whether you're tracking trending hashtags, analyzing creator performance, or building a content aggregation tool, scraping TikTok data is a common need in 2026.

But TikTok doesn't make it easy. In this guide, I'll walk you through the challenges and show you working Python code to extract videos, profiles, and hashtag data.

Why Scraping TikTok Is Hard

TikTok's frontend is heavily JavaScript-rendered. A simple requests.get() returns a mostly empty HTML shell. The actual content loads dynamically via their internal API calls.

On top of that, TikTok employs aggressive anti-bot measures:

  • Browser fingerprinting — they check canvas, WebGL, and navigator properties
  • Rate limiting — rapid requests from the same IP get blocked fast
  • CAPTCHAs — automated access triggers interactive challenges
  • Signed API requests — internal API calls require dynamic signatures that change frequently

Approach 1: Using Playwright for Browser-Based Scraping

The most reliable way to scrape TikTok is with a real browser. Here's a working example using Playwright:

import asyncio
from playwright.async_api import async_playwright
import json

async def scrape_tiktok_profile(username: str):
    async with async_playwright() as p:
        browser = await p.chromium.launch(headless=True)
        context = await browser.new_context(
            viewport={"width": 1920, "height": 1080},
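            # Full Chrome-style user agent; a truncated string is itself a bot signal
            user_agent="Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36"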
        )
        page = await context.new_page()

        # Navigate to user profile
        await page.goto(f"https://www.tiktok.com/@{username}", wait_until="networkidle")
        await page.wait_for_timeout(3000)

        # Extract profile data from the page's embedded JSON
        profile_data = await page.evaluate("""
            () => {
                const scripts = document.querySelectorAll("script#__UNIVERSAL_DATA_FOR_REHYDRATION__");
                if (scripts.length > 0) {
                    const data = JSON.parse(scripts[0].textContent);
                    const userInfo = data?.__DEFAULT_SCOPE__?.["webapp.user-detail"]?.userInfo;
                    if (userInfo) {
                        return {
                            nickname: userInfo.user?.nickname,
                            followers: userInfo.stats?.followerCount,
                            following: userInfo.stats?.followingCount,
                            likes: userInfo.stats?.heartCount,
                            videos: userInfo.stats?.videoCount,
                            bio: userInfo.user?.signature
                        };
                    }
                }
                return null;
            }
        """)

        await browser.close()
        return profile_data

result = asyncio.run(scrape_tiktok_profile("khaby.lame"))
print(json.dumps(result, indent=2))

This extracts the hydration data that TikTok embeds in the page — no need to parse the rendered DOM.
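
The same hydration blob can also be parsed outside the browser, for example from rendered HTML you saved to disk. Here's a minimal sketch using only the standard library. It assumes the script id and JSON keys are the same ones used in the Playwright example above (TikTok can change them at any time), and `parse_profile_from_html` is a hypothetical helper name, not part of any library.

```python
import json
import re
from typing import Optional

# Matches the <script id="__UNIVERSAL_DATA_FOR_REHYDRATION__" ...> tag and captures its body.
_HYDRATION_RE = re.compile(
    r'<script[^>]*\bid="__UNIVERSAL_DATA_FOR_REHYDRATION__"[^>]*>(.*?)</script>',
    re.DOTALL,
)

def parse_profile_from_html(html: str) -> Optional[dict]:
    """Return profile fields from the embedded hydration JSON, or None if missing."""
    match = _HYDRATION_RE.search(html)
    if match is None:
        return None
    try:
        data = json.loads(match.group(1))
    except json.JSONDecodeError:
        return None
    if not isinstance(data, dict):
        return None

    # Same path the Playwright example walks with optional chaining
    user_info = (
        data.get("__DEFAULT_SCOPE__", {})
        .get("webapp.user-detail", {})
        .get("userInfo")
    )
    if not user_info:
        return None

    user = user_info.get("user", {})
    stats = user_info.get("stats", {})
    return {
        "nickname": user.get("nickname"),
        "followers": stats.get("followerCount"),
        "following": stats.get("followingCount"),
        "likes": stats.get("heartCount"),
        "videos": stats.get("videoCount"),
        "bio": user.get("signature"),
    }
```

Returning None instead of raising keeps the caller simple: a missing script tag usually means you were served a challenge page or the page layout changed.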

Approach 2: Intercepting TikTok's Internal API

For bulk data collection, intercepting network requests is more efficient:

import asyncio
from playwright.async_api import async_playwright
import json

async def scrape_hashtag_videos(hashtag: str, max_videos: int = 30):
    videos = []

    async with async_playwright() as p:
        browser = await p.chromium.launch(headless=True)
        context = await browser.new_context()
        page = await context.new_page()

        # Intercept API responses
        async def handle_response(response):
            if "/api/challenge/item_list" in response.url:
                try:
                    data = await response.json()
                    for item in data.get("itemList", []):
                        videos.append({
                            "id": item["id"],
                            "description": item.get("desc", ""),
                            "author": item["author"]["uniqueId"],
                            "likes": item["stats"]["diggCount"],
                            "comments": item["stats"]["commentCount"],
                            "shares": item["stats"]["shareCount"],
                            "plays": item["stats"]["playCount"],
                            "created": item["createTime"]
                        })
                except Exception:
                    pass

        page.on("response", handle_response)
        await page.goto(f"https://www.tiktok.com/tag/{hashtag}", wait_until="networkidle")

        # Scroll to load more
        for _ in range(3):
            await page.evaluate("window.scrollTo(0, document.body.scrollHeight)")
            await page.wait_for_timeout(2000)

        await browser.close()

    return videos[:max_videos]

results = asyncio.run(scrape_hashtag_videos("python", max_videos=20))
for v in results:
    # Single quotes inside the f-string keep this valid on Python versions before 3.12
    print(f"{v['author']}: {v['plays']} plays - {v['description'][:50]}")
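
Before analyzing the intercepted records, it helps to clean them up. The sketch below drops duplicate ids (in case the same item shows up in more than one response), adds a readable timestamp, and computes a simple engagement rate. It assumes the dict keys produced by the scraper above and that `created` is a Unix timestamp in seconds. `dedupe_and_enrich` and the `engagement_rate` formula are my own illustrative choices, not anything TikTok defines.

```python
from datetime import datetime, timezone

def dedupe_and_enrich(videos: list[dict]) -> list[dict]:
    """Drop repeated video ids and add derived fields to each record."""
    seen = set()
    cleaned = []
    for video in videos:
        if video["id"] in seen:
            continue
        seen.add(video["id"])

        plays = int(video.get("plays") or 0)
        interactions = (
            int(video.get("likes") or 0)
            + int(video.get("comments") or 0)
            + int(video.get("shares") or 0)
        )

        enriched = dict(video)  # copy so the input list is left untouched
        # Assumption: "created" is seconds since the Unix epoch
        enriched["created_iso"] = datetime.fromtimestamp(
            int(video["created"]), tz=timezone.utc
        ).isoformat()
        # Interactions per play; 0.0 when there are no plays to divide by
        enriched["engagement_rate"] = interactions / plays if plays else 0.0
        cleaned.append(enriched)
    return cleaned
```

You would call it on the scraper's return value, e.g. `dedupe_and_enrich(results)`, before writing anything to disk.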

The Proxy Problem

Running these scripts from a single IP will get you blocked within minutes. TikTok's anti-bot system tracks request patterns aggressively.

You need residential proxies — IPs that look like real home internet connections. I recommend ThorData for this. Their residential proxy pool works well with TikTok because the IPs rotate automatically and come from real ISP ranges.

Here's how to add proxy support to the Playwright script:

browser = await p.chromium.launch(
    headless=True,
    proxy={
        "server": "http://proxy.thordata.com:9090",
        "username": "your-username",
        "password": "your-password"
    }
)
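
Hardcoding proxy credentials in the script makes them easy to leak. One option is to read them from environment variables and build the dict that Playwright's `proxy` argument expects. This is only a sketch: the variable names `PROXY_SERVER`, `PROXY_USERNAME`, and `PROXY_PASSWORD` and the helper `proxy_settings_from_env` are assumptions I made up for illustration.

```python
import os

def proxy_settings_from_env(env=None) -> dict:
    """Build a Playwright-style proxy dict from environment variables."""
    env = os.environ if env is None else env

    server = env.get("PROXY_SERVER")
    if not server:
        raise RuntimeError("PROXY_SERVER is not set")

    settings = {"server": server}
    # Credentials are optional; only include the keys that are actually set
    if env.get("PROXY_USERNAME"):
        settings["username"] = env["PROXY_USERNAME"]
    if env.get("PROXY_PASSWORD"):
        settings["password"] = env["PROXY_PASSWORD"]
    return settings
```

The result can be passed straight through as `p.chromium.launch(headless=True, proxy=proxy_settings_from_env())`.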

For a simpler approach, ScraperAPI handles the proxy rotation and browser rendering for you. Just send your URL through their endpoint and get back rendered HTML:

import requests

url = "https://www.tiktok.com/@khaby.lame"
response = requests.get(
    "https://api.scraperapi.com",
    params={
        "api_key": "YOUR_KEY",
        "url": url,
        "render": "true"
    }
)
html = response.text

Downloading TikTok Videos

Once you have video metadata, downloading the file itself means capturing the CDN URL that the player requests. One way is to watch for a video/mp4 response while the page loads:

import asyncio

import httpx
from playwright.async_api import async_playwright

async def download_tiktok_video(video_url: str, output_path: str):
    async with async_playwright() as p:
        browser = await p.chromium.launch(headless=True)
        page = await browser.new_page()

        video_src = None

        # Record the URL of any response served as MP4
        async def handle_response(response):
            nonlocal video_src
            content_type = response.headers.get("content-type", "")
            if "video/mp4" in content_type:
                video_src = response.url

        page.on("response", handle_response)
        await page.goto(video_url, wait_until="networkidle")
        await page.wait_for_timeout(3000)

        if video_src:
            # Note: the CDN may reject requests that lack the browser's cookies or Referer
            async with httpx.AsyncClient() as client:
                resp = await client.get(video_src)
                resp.raise_for_status()  # fail loudly instead of saving an error page
                with open(output_path, "wb") as f:
                    f.write(resp.content)
                print(f"Downloaded to {output_path}")
        else:
            print("No video/mp4 response captured")

        await browser.close()

asyncio.run(download_tiktok_video(
    "https://www.tiktok.com/@user/video/123456",
    "video.mp4"
))

The Easy Way: Pre-Built TikTok Scraper

Building and maintaining a TikTok scraper is a constant battle against their anti-bot updates. If you need reliable, production-grade scraping, I'd recommend using the TikTok Scraper on Apify. It handles all the browser rendering, proxy rotation, and anti-detection out of the box.

You can run it via the Apify API:

from apify_client import ApifyClient

client = ApifyClient("YOUR_APIFY_TOKEN")

run = client.actor("cryptosignals/tiktok-scraper").call(
    run_input={
        "profiles": ["khaby.lame", "charlidamelio"],
        "hashtags": ["python", "coding"],
        "maxVideos": 50
    }
)

for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item)

It handles CAPTCHAs, rotates proxies automatically, and returns clean structured data. Pay-per-use pricing means you only pay for what you scrape.

Rate Limiting Best Practices

Whether you build your own scraper or use a tool, respect these limits:

  1. Add random delays between requests (2-5 seconds minimum)
  2. Rotate user agents on every request
  3. Use residential proxies — datacenter IPs get flagged instantly
  4. Don't scrape logged-in pages — TikTok monitors authenticated sessions more closely
  5. Cache aggressively — don't re-scrape data you already have
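
Points 1 and 5 are easy to wire into your own scraper. Here's a rough sketch using only the standard library: a randomized delay and a small file-based cache with an expiry. The names `polite_sleep` and `JsonFileCache`, the one-hour default TTL, and the on-disk layout are all my own assumptions, so adjust them to your setup.

```python
import hashlib
import json
import random
import time
from pathlib import Path

def polite_sleep(min_s: float = 2.0, max_s: float = 5.0, sleep=time.sleep) -> float:
    """Pause for a random interval between min_s and max_s seconds."""
    delay = random.uniform(min_s, max_s)
    sleep(delay)  # injectable so tests do not have to wait
    return delay

class JsonFileCache:
    """Store JSON-serializable results on disk, keyed by URL or any string."""

    def __init__(self, directory, ttl_seconds: float = 3600.0):
        self.directory = Path(directory)
        self.directory.mkdir(parents=True, exist_ok=True)
        self.ttl_seconds = ttl_seconds

    def _path(self, key: str) -> Path:
        # Hash the key so URLs map to safe file names
        digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
        return self.directory / f"{digest}.json"

    def get(self, key: str):
        """Return the cached value, or None if missing or older than the TTL."""
        path = self._path(key)
        if not path.exists():
            return None
        if time.time() - path.stat().st_mtime > self.ttl_seconds:
            return None
        return json.loads(path.read_text(encoding="utf-8"))

    def set(self, key: str, value) -> None:
        self._path(key).write_text(json.dumps(value), encoding="utf-8")
```

A typical loop checks `cache.get(url)` first and only calls the scraper, followed by `polite_sleep()`, on a miss.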

Conclusion

Scraping TikTok in 2026 requires a browser-based approach with solid proxy infrastructure. The Playwright examples above will get you started, but for production workloads, consider using residential proxies from ThorData or a managed scraping service like ScraperAPI.

If you want to skip the infrastructure headaches entirely, the TikTok Scraper on Apify handles everything end-to-end.

Happy scraping!
