How to Scrape TikTok: Videos, Profiles, and Trending Content

#python #webdev #tutorial #programming

TikTok's rapid growth makes it a prime target for data analysis. This guide covers practical approaches to collecting TikTok data for research and analytics.

The Challenge with TikTok

TikTok has aggressive anti-scraping measures:

Heavy JavaScript rendering
Device fingerprinting
Encrypted API parameters
Frequent anti-bot updates

Approach 1: Web Endpoint Data Extraction

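TikTok's web app embeds page data as JSON inside a script tag with the id __UNIVERSAL_DATA_FOR_REHYDRATION__, which you can extract from the raw HTML: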

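import json
import re

import requests


class TikTokScraper:
    BASE_URL = "https://www.tiktok.com"
    # Matches the JSON payload inside the hydration script tag.
    # re.DOTALL lets the payload span multiple lines.
    DATA_PATTERN = re.compile(
        r'<script id="__UNIVERSAL_DATA_FOR_REHYDRATION__"[^>]*>(.*?)</script>',
        re.DOTALL,
    )

    def __init__(self):
        self.session = requests.Session()
        self.session.headers.update({
            "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) "
                          "AppleWebKit/537.36 Chrome/120.0.0.0 Safari/537.36",
            "Referer": "https://www.tiktok.com/",
        })

    def get_user_info(self, username, timeout=10):
        """Return the embedded page JSON for a profile, or None if unavailable."""
        url = f"{self.BASE_URL}/@{username}"
        # A timeout keeps a stalled connection from hanging the scraper.
        response = self.session.get(url, timeout=timeout)
        if not response.ok:
            return None
        match = self.DATA_PATTERN.search(response.text)
        if not match:
            return None
        try:
            return json.loads(match.group(1))
        except json.JSONDecodeError:
            # The tag was present but did not contain valid JSON.
            return None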
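
The dictionary returned by get_user_info is deeply nested, and its layout is undocumented and can change without notice. Below is a minimal sketch of a defensive lookup. The helper name `dig` is hypothetical, and the key path (`__DEFAULT_SCOPE__`, then `webapp.user-detail`, then `userInfo`) is an assumption for illustration, not a guaranteed schema.

```python
def dig(data, *keys, default=None):
    """Walk nested dicts/lists by key or index, returning `default` on any miss."""
    current = data
    for key in keys:
        if isinstance(current, dict) and key in current:
            current = current[key]
        elif (
            isinstance(current, list)
            and isinstance(key, int)
            and -len(current) <= key < len(current)
        ):
            current = current[key]
        else:
            return default
    return current


# Hypothetical payload shaped like the assumed layout; real pages may differ.
sample_payload = {
    "__DEFAULT_SCOPE__": {
        "webapp.user-detail": {
            "userInfo": {
                "user": {"uniqueId": "example_user"},
                "stats": {"followerCount": 1200},
            }
        }
    }
}

# Assumed key path to the profile block.
USER_INFO_PATH = ("__DEFAULT_SCOPE__", "webapp.user-detail", "userInfo")

followers = dig(sample_payload, *USER_INFO_PATH, "stats", "followerCount")
missing = dig(sample_payload, *USER_INFO_PATH, "stats", "heartCount", default=0)
print(followers, missing)
```

Returning a default instead of raising KeyError means a layout change shows up as missing values in your data rather than a crash mid-run.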

Approach 2: Playwright for Dynamic Content

When the embedded JSON is missing or incomplete in the raw HTML, load the page in a real browser instead. Playwright drives a headless browser, runs the page's JavaScript, and returns the rendered HTML for parsing.
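
As a rough sketch of the browser-based approach, the snippet below loads a page in headless Chromium with Playwright's synchronous API and then reuses the script-tag pattern from Approach 1 on the rendered HTML. The function names and the 30-second timeout are illustrative assumptions, and waiting for `networkidle` is only one possible readiness signal. TikTok may still block or challenge an automated browser.

```python
import json
import re

# Same script-tag pattern used in Approach 1.
DATA_PATTERN = re.compile(
    r'<script id="__UNIVERSAL_DATA_FOR_REHYDRATION__"[^>]*>(.*?)</script>',
    re.DOTALL,
)


def fetch_rendered_html(url, timeout_ms=30_000):
    """Load `url` in headless Chromium and return the HTML after rendering."""
    # Imported here so the parsing helper below works without Playwright installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        try:
            page = browser.new_page()
            # Wait until network activity settles before reading the DOM.
            page.goto(url, wait_until="networkidle", timeout=timeout_ms)
            return page.content()
        finally:
            browser.close()


def extract_embedded_json(html):
    """Return the embedded page JSON from rendered HTML, or None if absent."""
    match = DATA_PATTERN.search(html)
    if not match:
        return None
    try:
        return json.loads(match.group(1))
    except json.JSONDecodeError:
        return None
```

Running it requires `pip install playwright` followed by `playwright install chromium`. A call would then look like `extract_embedded_json(fetch_rendered_html("https://www.tiktok.com/@some_user"))`, where the username is a placeholder.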

Scraping Trending Content

Trending pages sit behind the same JavaScript rendering and anti-bot measures as profiles, so render them in a browser first and then pull video links out of the resulting HTML.
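
Assuming the rendered HTML of a trending or discovery page is already in hand (for example from a browser-automation step), one simple way to collect video links is to match TikTok's `/@username/video/<id>` URL pattern. This is a sketch only: `extract_video_records` is a hypothetical helper, and it leaves the description empty because a bare link carries no caption. The records use the same `url` and `description` keys as the CSV writer in the storage section.

```python
import re

# Video pages follow the pattern /@<username>/video/<numeric id>.
VIDEO_PATH = re.compile(r"/@([\w.]+)/video/(\d+)")


def extract_video_records(html):
    """Return one {"url", "description"} dict per unique video link, in page order."""
    seen = set()
    records = []
    for username, video_id in VIDEO_PATH.findall(html):
        # Normalize relative and absolute links to one canonical URL.
        url = f"https://www.tiktok.com/@{username}/video/{video_id}"
        if url not in seen:
            seen.add(url)
            # Captions are not recoverable from the link alone; left empty here.
            records.append({"url": url, "description": ""})
    return records


sample_html = (
    '<a href="/@alice/video/123"></a>'
    '<a href="https://www.tiktok.com/@bob.b/video/456"></a>'
    '<a href="/@alice/video/123"></a>'
)
records = extract_video_records(sample_html)
```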

Data Storage and Analysis

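import csv
import re
from collections import Counter
from datetime import datetime


def save_tiktok_data(videos, filename="tiktok_data.csv"):
    """Write video records to a CSV file, stamping each row with the scrape time."""
    fieldnames = ["url", "description", "scraped_at"]
    with open(filename, "w", newline="", encoding="utf-8") as f:
        # extrasaction="ignore" drops keys that are not CSV columns instead of raising.
        writer = csv.DictWriter(f, fieldnames=fieldnames, extrasaction="ignore")
        writer.writeheader()
        for video in videos:
            # Build a new row so the caller's dict is not mutated.
            row = {**video, "scraped_at": datetime.now().isoformat()}
            writer.writerow(row)
    print(f"Saved {len(videos)} videos to {filename}")


def analyze_content_themes(videos):
    """Return the 20 most common hashtags across all video descriptions."""
    all_hashtags = []
    for video in videos:
        # Treat a missing or None description as empty text.
        desc = video.get("description") or ""
        all_hashtags.extend(re.findall(r"#(\w+)", desc))
    return Counter(all_hashtags).most_common(20)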

Handling Anti-Bot Measures

Scraping TikTok at any volume generally requires proxy rotation, since repeated requests from a single IP tend to get blocked. ScraperAPI provides JavaScript rendering with automatic proxy rotation. For residential IPs, ThorData is a solid choice.
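
If you manage your own proxy list rather than using a hosted service, rotation can be as simple as cycling through the list on each request. The sketch below is not any provider's API: `ProxyRotator` and `fetch_with_rotation` are hypothetical names, the proxy URLs are placeholders, and `session` is assumed to be a `requests.Session`.

```python
import itertools


class ProxyRotator:
    """Cycle through a fixed list of proxy URLs, one per request."""

    def __init__(self, proxy_urls):
        if not proxy_urls:
            raise ValueError("at least one proxy URL is required")
        self._cycle = itertools.cycle(proxy_urls)

    def next_proxies(self):
        """Return a mapping in the shape accepted by requests' `proxies` argument."""
        proxy = next(self._cycle)
        return {"http": proxy, "https": proxy}


def fetch_with_rotation(session, url, rotator, timeout=10):
    """Send one GET through the next proxy; `session` is a requests.Session."""
    return session.get(url, proxies=rotator.next_proxies(), timeout=timeout)


# Placeholder endpoints; substitute the URLs issued by your proxy provider.
rotator = ProxyRotator([
    "http://user:pass@proxy1.example.com:8000",
    "http://user:pass@proxy2.example.com:8000",
])
```

Round-robin rotation spreads requests evenly but does not retire proxies that have been blocked. A production setup would likely track failures per proxy as well.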

Monitor your scraper's health with ScrapeOps. TikTok changes its defenses frequently, so you'll want an alert as soon as the scraper breaks.
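
Independently of any monitoring service, a scraper can keep a small local record of recent outcomes and flag itself when failures pile up. The sketch below is a generic sliding-window check, not the ScrapeOps API. The class name, the window size, and the 50% threshold are all illustrative assumptions.

```python
from collections import deque


class ScrapeHealth:
    """Track recent scrape outcomes and flag when too many of them fail."""

    def __init__(self, window=20, max_failure_rate=0.5):
        # Only the most recent `window` outcomes are kept.
        self.results = deque(maxlen=window)  # True = success, False = failure
        self.max_failure_rate = max_failure_rate

    def record(self, success):
        """Store the outcome of one scrape attempt."""
        self.results.append(bool(success))

    def failure_rate(self):
        """Fraction of recorded attempts that failed (0.0 when empty)."""
        if not self.results:
            return 0.0
        return self.results.count(False) / len(self.results)

    def is_unhealthy(self):
        """True once the window is full and failures exceed the threshold."""
        window_full = len(self.results) == self.results.maxlen
        return window_full and self.failure_rate() > self.max_failure_rate
```

A typical use is to call `health.record(result is not None)` after each scrape and send an alert, by whatever channel you prefer, when `health.is_unhealthy()` becomes true.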

Ethical Considerations

Only scrape publicly available content
Never scrape private accounts or DMs
Respect rate limits
Don't use scraped data for harassment
Consider using TikTok's official Research API if you qualify
Comply with GDPR and CCPA
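
For the rate-limit point above, a simple safeguard is to enforce a minimum delay between consecutive requests. The sketch below shows one way to do that. `Throttle` is a hypothetical helper, and the two-second interval in the usage line is an arbitrary placeholder, not a limit published by TikTok.

```python
import time


class Throttle:
    """Enforce a minimum delay between consecutive requests."""

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        # Clock and sleep are injectable so the class can be tested without waiting.
        self._clock = clock
        self._sleep = sleep
        self._last = None

    def wait(self):
        """Block until at least `min_interval` seconds have passed since the last call."""
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now


# Placeholder interval: at most one request every two seconds.
throttle = Throttle(min_interval=2.0)
```

Calling `throttle.wait()` immediately before each request keeps the request rate bounded no matter how fast the rest of the loop runs.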

Conclusion

TikTok scraping is technically challenging but possible with the right tools. Use browser automation for reliability, rotate proxies for sustainability, and always respect both the platform and its users' privacy.