TikTok has become one of the most data-rich platforms on the internet. Whether you're tracking trending hashtags, analyzing creator performance, or building a content aggregation tool, scraping TikTok data is a common need in 2026.
But TikTok doesn't make it easy. In this guide, I'll walk you through the challenges and show you working Python code to extract videos, profiles, and hashtag data.
Why Scraping TikTok Is Hard
TikTok's frontend is heavily JavaScript-rendered. A simple requests.get() returns a mostly empty HTML shell. The actual content loads dynamically via their internal API calls.
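One rough way to see the "empty shell" effect for yourself is to measure how much of a response is human-visible text rather than script or markup. The sketch below uses only the standard library. The names `VisibleTextCounter` and `visible_text_ratio` are my own, and the ratio is just a heuristic, not anything TikTok defines.

```python
from html.parser import HTMLParser


class VisibleTextCounter(HTMLParser):
    """Counts characters of text that sit outside <script> and <style> tags."""

    def __init__(self) -> None:
        super().__init__()
        self._skip_depth = 0  # greater than 0 while inside <script>/<style>
        self.visible_chars = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.visible_chars += len(data.strip())


def visible_text_ratio(html: str) -> float:
    """Return the share of the document that is visible text (0.0 to 1.0)."""
    if not html:
        return 0.0
    counter = VisibleTextCounter()
    counter.feed(html)
    counter.close()
    return counter.visible_chars / len(html)
```

Feeding it the text of a plain HTTP response and getting a ratio close to zero suggests the page depends on JavaScript to render its content.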
On top of that, TikTok employs aggressive anti-bot measures:
- Browser fingerprinting — they check canvas, WebGL, and navigator properties
- Rate limiting — rapid requests from the same IP get blocked fast
- CAPTCHAs — automated access triggers interactive challenges
- Signed API requests — internal API calls require dynamic signatures that change frequently
Approach 1: Using Playwright for Browser-Based Scraping
The most reliable way to scrape TikTok is with a real browser. Here's a working example using Playwright:
import asyncio
import json

from playwright.async_api import async_playwright


async def scrape_tiktok_profile(username: str):
    async with async_playwright() as p:
        browser = await p.chromium.launch(headless=True)
        context = await browser.new_context(
            viewport={"width": 1920, "height": 1080},
            user_agent="Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36"
        )
        page = await context.new_page()

        # Navigate to user profile
        await page.goto(f"https://www.tiktok.com/@{username}", wait_until="networkidle")
        await page.wait_for_timeout(3000)

        # Extract profile data from the page's embedded JSON
        profile_data = await page.evaluate("""
            () => {
                const scripts = document.querySelectorAll("script#__UNIVERSAL_DATA_FOR_REHYDRATION__");
                if (scripts.length > 0) {
                    const data = JSON.parse(scripts[0].textContent);
                    const userInfo = data?.__DEFAULT_SCOPE__?.["webapp.user-detail"]?.userInfo;
                    if (userInfo) {
                        return {
                            nickname: userInfo.user?.nickname,
                            followers: userInfo.stats?.followerCount,
                            following: userInfo.stats?.followingCount,
                            likes: userInfo.stats?.heartCount,
                            videos: userInfo.stats?.videoCount,
                            bio: userInfo.user?.signature
                        };
                    }
                }
                return null;
            }
        """)

        await browser.close()
        return profile_data


result = asyncio.run(scrape_tiktok_profile("khaby.lame"))
print(json.dumps(result, indent=2))
This extracts the hydration data that TikTok embeds in the page — no need to parse the rendered DOM.
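If you would rather do the parsing in Python (for example on HTML you saved earlier, or on the string returned by Playwright's `page.content()`), the same lookup can be sketched with the standard library. This assumes the script tag ID and JSON keys are exactly the ones used in the example above. TikTok can change them at any time, and `extract_profile` is a name I made up for illustration.

```python
import json
import re
from typing import Any, Optional

# Matches the hydration <script> tag used in the Playwright example.
_HYDRATION_RE = re.compile(
    r'<script[^>]*\bid="__UNIVERSAL_DATA_FOR_REHYDRATION__"[^>]*>(.*?)</script>',
    re.DOTALL,
)


def extract_profile(html: str) -> Optional[dict[str, Any]]:
    """Pull profile fields out of the embedded hydration JSON, or return None."""
    match = _HYDRATION_RE.search(html)
    if match is None:
        return None
    try:
        data = json.loads(match.group(1))
    except json.JSONDecodeError:
        return None
    if not isinstance(data, dict):
        return None

    # Walk the nested structure defensively; any missing level yields None.
    user_info = (
        data.get("__DEFAULT_SCOPE__", {})
        .get("webapp.user-detail", {})
        .get("userInfo")
    )
    if not user_info:
        return None

    user = user_info.get("user", {})
    stats = user_info.get("stats", {})
    return {
        "nickname": user.get("nickname"),
        "followers": stats.get("followerCount"),
        "following": stats.get("followingCount"),
        "likes": stats.get("heartCount"),
        "videos": stats.get("videoCount"),
        "bio": user.get("signature"),
    }
```

Returning `None` instead of raising makes it easy to tell "page layout changed or request was blocked" apart from a crash in your own code.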
Approach 2: Intercepting TikTok's Internal API
For bulk data collection, intercepting network requests is more efficient:
import asyncio
import json

from playwright.async_api import async_playwright


async def scrape_hashtag_videos(hashtag: str, max_videos: int = 30):
    videos = []
    async with async_playwright() as p:
        browser = await p.chromium.launch(headless=True)
        context = await browser.new_context()
        page = await context.new_page()

        # Intercept API responses
        async def handle_response(response):
            if "/api/challenge/item_list" in response.url:
                try:
                    data = await response.json()
                    for item in data.get("itemList", []):
                        videos.append({
                            "id": item["id"],
                            "description": item.get("desc", ""),
                            "author": item["author"]["uniqueId"],
                            "likes": item["stats"]["diggCount"],
                            "comments": item["stats"]["commentCount"],
                            "shares": item["stats"]["shareCount"],
                            "plays": item["stats"]["playCount"],
                            "created": item["createTime"]
                        })
                except Exception:
                    # Skip responses that are not JSON or lack the expected fields
                    pass

        page.on("response", handle_response)
        await page.goto(f"https://www.tiktok.com/tag/{hashtag}", wait_until="networkidle")

        # Scroll to load more
        for _ in range(3):
            await page.evaluate("window.scrollTo(0, document.body.scrollHeight)")
            await page.wait_for_timeout(2000)

        await browser.close()
    return videos[:max_videos]


results = asyncio.run(scrape_hashtag_videos("python", max_videos=20))
for v in results:
    # Single quotes inside the f-string keep this valid on Python versions before 3.12
    print(f"{v['author']}: {v['plays']} plays - {v['description'][:50]}")
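The response handler indexes nested keys directly, so one item with an unexpected shape causes the rest of that response to be skipped, and nothing stops the same video from being recorded twice if it shows up in more than one response. Here is a small sketch of a more forgiving version that normalizes each item and de-duplicates by ID. The field names are the ones from the example above. `normalize_item` and `merge_items` are hypothetical helpers, not part of any library.

```python
from typing import Any, Iterable, Optional


def normalize_item(item: dict[str, Any]) -> Optional[dict[str, Any]]:
    """Convert one raw API item into a flat record, or None if it has no ID."""
    video_id = item.get("id")
    if not video_id:
        return None
    author = item.get("author") or {}
    stats = item.get("stats") or {}
    return {
        "id": str(video_id),
        "description": item.get("desc", ""),
        "author": author.get("uniqueId"),
        "likes": stats.get("diggCount", 0),
        "comments": stats.get("commentCount", 0),
        "shares": stats.get("shareCount", 0),
        "plays": stats.get("playCount", 0),
        "created": item.get("createTime"),
    }


def merge_items(seen: dict[str, dict[str, Any]], items: Iterable[dict[str, Any]]) -> int:
    """Add new items to `seen` (keyed by video ID) and return how many were added."""
    added = 0
    for item in items:
        record = normalize_item(item)
        if record is None or record["id"] in seen:
            continue
        seen[record["id"]] = record
        added += 1
    return added
```

Inside the handler you would call `merge_items(seen, data.get("itemList", []))` and return `list(seen.values())[:max_videos]` at the end.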
The Proxy Problem
Running these scripts repeatedly from a single IP will usually get you blocked quickly, in my experience within minutes. TikTok's anti-bot system tracks request patterns aggressively.
You need residential proxies — IPs that look like real home internet connections. I recommend ThorData for this. Their residential proxy pool works well with TikTok because the IPs rotate automatically and come from real ISP ranges.
Here's how to add proxy support to the Playwright script:
browser = await p.chromium.launch(
    headless=True,
    proxy={
        "server": "http://proxy.thordata.com:9090",
        "username": "your-username",
        "password": "your-password"
    }
)
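Rather than hard-coding credentials in the script, you may prefer to read them from environment variables. This is a minimal sketch: the variable names `PROXY_SERVER`, `PROXY_USERNAME`, and `PROXY_PASSWORD` are my own choice, not something Playwright or any proxy provider prescribes, and `proxy_from_env` is a hypothetical helper.

```python
import os
from typing import Mapping, Optional


def proxy_from_env(env: Mapping[str, str] = os.environ) -> Optional[dict[str, str]]:
    """Build a Playwright-style proxy dict from environment variables.

    Returns None when PROXY_SERVER is unset, so the browser launches without a proxy.
    """
    server = env.get("PROXY_SERVER")
    if not server:
        return None
    proxy = {"server": server}
    username = env.get("PROXY_USERNAME")
    password = env.get("PROXY_PASSWORD")
    if username:
        proxy["username"] = username
    if password:
        proxy["password"] = password
    return proxy
```

The result can be passed straight through as `p.chromium.launch(headless=True, proxy=proxy_from_env())`, since `proxy=None` is the default.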
For a simpler approach, ScraperAPI handles the proxy rotation and browser rendering for you. Just send your URL through their endpoint and get back rendered HTML:
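import requests

url = "https://www.tiktok.com/@khaby.lame"
response = requests.get(
    "https://api.scraperapi.com",
    params={
        "api_key": "YOUR_KEY",
        "url": url,
        "render": "true"
    },
    timeout=60,  # rendered requests can be slow; without a timeout the call may hang
)
response.raise_for_status()
html = response.text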
Downloading TikTok Videos
Once you have video metadata, downloading the actual video files requires extracting the video URL from TikTok's CDN:
import asyncio

import httpx
from playwright.async_api import async_playwright


async def download_tiktok_video(video_url: str, output_path: str):
    async with async_playwright() as p:
        browser = await p.chromium.launch(headless=True)
        page = await browser.new_page()
        video_src = None

        # Remember the URL of any response served as MP4
        async def handle_response(response):
            nonlocal video_src
            content_type = response.headers.get("content-type", "")
            if "video/mp4" in content_type:
                video_src = response.url

        page.on("response", handle_response)
        await page.goto(video_url, wait_until="networkidle")
        await page.wait_for_timeout(3000)

        if video_src:
            async with httpx.AsyncClient() as client:
                resp = await client.get(video_src)
                resp.raise_for_status()
            with open(output_path, "wb") as f:
                f.write(resp.content)
            print(f"Downloaded to {output_path}")
        await browser.close()


asyncio.run(download_tiktok_video(
    "https://www.tiktok.com/@user/video/123456",
    "video.mp4"
))
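When downloading more than one video, a fixed name like `video.mp4` gets overwritten on every call. One option is to derive the file name from the video URL itself. The sketch below assumes URLs follow the `https://www.tiktok.com/@user/video/123456` shape used above. `output_path_for` and the `downloads` directory are hypothetical names.

```python
import re
from pathlib import Path

# Captures the username and numeric video ID from a TikTok video URL.
_VIDEO_URL_RE = re.compile(r"tiktok\.com/@(?P<user>[\w.]+)/video/(?P<id>\d+)")


def output_path_for(video_url: str, directory: str = "downloads") -> Path:
    """Return a per-video output path such as downloads/user_123456.mp4."""
    match = _VIDEO_URL_RE.search(video_url)
    if match is None:
        raise ValueError(f"Not a recognized TikTok video URL: {video_url}")
    return Path(directory) / f"{match['user']}_{match['id']}.mp4"
```

Remember to create the directory first, for example with `Path("downloads").mkdir(exist_ok=True)`, before passing the path to the download function.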
The Easy Way: Pre-Built TikTok Scraper
Building and maintaining a TikTok scraper is a constant battle against their anti-bot updates. If you need reliable, production-grade scraping, I'd recommend using the TikTok Scraper on Apify. It handles all the browser rendering, proxy rotation, and anti-detection out of the box.
You can run it via the Apify API:
from apify_client import ApifyClient

client = ApifyClient("YOUR_APIFY_TOKEN")
run = client.actor("cryptosignals/tiktok-scraper").call(
    run_input={
        "profiles": ["khaby.lame", "charlidamelio"],
        "hashtags": ["python", "coding"],
        "maxVideos": 50
    }
)

for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item)
It handles CAPTCHAs, rotates proxies automatically, and returns clean structured data. Pay-per-use pricing means you only pay for what you scrape.
Rate Limiting Best Practices
Whether you build your own scraper or use a tool, respect these limits:
- Add random delays between requests (2-5 seconds minimum)
- Rotate user agents on every request
- Use residential proxies — datacenter IPs get flagged instantly
- Don't scrape logged-in pages — TikTok monitors authenticated sessions more closely
- Cache aggressively — don't re-scrape data you already have
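Two of the points above, random delays and caching, are easy to sketch with the standard library. The 2 to 5 second range comes from the list. The names `jittered_delay` and `TTLCache` are mine, and the in-memory cache is only a starting point (a real scraper would probably persist results to disk or a database).

```python
import random
import time
from typing import Any, Callable, Optional


def jittered_delay(min_seconds: float = 2.0, max_seconds: float = 5.0) -> float:
    """Pick a random pause length so requests don't arrive at a fixed rhythm."""
    return random.uniform(min_seconds, max_seconds)


class TTLCache:
    """A tiny in-memory cache whose entries expire after ttl_seconds."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        self._store: dict[str, tuple[float, Any]] = {}

    def get(self, key: str) -> Optional[Any]:
        entry = self._store.get(key)
        if entry is None:
            return None
        stored_at, value = entry
        if self._clock() - stored_at > self._ttl:
            # Entry is stale: drop it so the caller re-fetches.
            del self._store[key]
            return None
        return value

    def set(self, key: str, value: Any) -> None:
        self._store[key] = (self._clock(), value)
```

In an async scraper you would check the cache before each request and otherwise call `await asyncio.sleep(jittered_delay())` before hitting the site.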
Conclusion
Scraping TikTok in 2026 requires a browser-based approach with solid proxy infrastructure. The Playwright examples above will get you started, but for production workloads, consider using residential proxies from ThorData or a managed scraping service like ScraperAPI.
If you want to skip the infrastructure headaches entirely, the TikTok Scraper on Apify handles everything end-to-end.
Happy scraping!