TikTok is the dominant short-form video platform, and its data is a goldmine for trend analysis, influencer research, and content strategy. But the landscape has changed significantly. Here's what you can actually collect in 2026 and how to do it.
What Data Is Publicly Available?
You CAN collect:
- Public video metadata (like, view, share, and comment counts)
- Hashtag trends and usage volume
- User profile info (bio, follower counts, post counts)
- Sound/music usage across videos
- Comment text on public videos
You CANNOT (and shouldn't) collect:
- Private account data
- DM/message content
- Data behind login walls without consent
- Data for purposes violating TikTok's terms
Collecting Hashtag Trends
import json
import time

import requests
from bs4 import BeautifulSoup


class TikTokCollector:
    """Collect public TikTok data for trend analysis."""

    def __init__(self):
        self.session = requests.Session()
        self.session.headers.update({
            'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64)',
            'Accept': 'text/html,application/xhtml+xml',
        })

    def get_hashtag_data(self, hashtag):
        """Get public metadata for a hashtag, or None if it cannot be read."""
        url = f"https://www.tiktok.com/tag/{hashtag}"
        # A timeout keeps a stalled connection from hanging the whole run
        resp = self.session.get(url, timeout=30)
        if resp.status_code != 200:
            return None
        # TikTok embeds JSON data in the page
        soup = BeautifulSoup(resp.text, 'html.parser')
        script_tag = soup.find('script', id='__UNIVERSAL_DATA_FOR_REHYDRATION__')
        if script_tag and script_tag.string:
            try:
                data = json.loads(script_tag.string)
            except json.JSONDecodeError:
                return None
            return self._extract_hashtag_info(data)
        return None

    def _extract_hashtag_info(self, data):
        """Parse hashtag metadata from page data."""
        try:
            default_scope = data.get('__DEFAULT_SCOPE__', {})
            challenge_info = default_scope.get('webapp.challenge-detail', {})
            challenge = challenge_info.get('challengeInfo', {})
            return {
                'name': challenge.get('challenge', {}).get('title', ''),
                'views': challenge.get('stats', {}).get('viewCount', 0),
                'video_count': challenge.get('stats', {}).get('videoCount', 0),
                'description': challenge.get('challenge', {}).get('desc', ''),
            }
        # AttributeError covers a non-dict value appearing where a dict is expected
        except (AttributeError, KeyError, TypeError):
            return None
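The parsing step can be exercised offline before any request is sent. The following is a minimal sketch, assuming the page still embeds its JSON in a script tag with the id used above. The helper names `parse_embedded_json` and `extract_hashtag_info` and the synthetic `SAMPLE_HTML` page are hypothetical, and the key layout mirrors the collector code rather than any documented TikTok schema. It uses a regular expression instead of BeautifulSoup only to stay dependency-free.

```python
import json
import re

SCRIPT_ID = "__UNIVERSAL_DATA_FOR_REHYDRATION__"
# Capture the body of the script tag carrying the embedded JSON payload
SCRIPT_RE = re.compile(
    r'<script[^>]*\bid="' + re.escape(SCRIPT_ID) + r'"[^>]*>(.*?)</script>',
    re.DOTALL,
)


def parse_embedded_json(html):
    """Return the embedded payload as a dict, or None if absent or malformed."""
    match = SCRIPT_RE.search(html)
    if not match:
        return None
    try:
        return json.loads(match.group(1))
    except json.JSONDecodeError:
        return None


def extract_hashtag_info(data):
    """Pull hashtag fields out of the payload, using the key layout assumed above."""
    scope = data.get('__DEFAULT_SCOPE__', {})
    info = scope.get('webapp.challenge-detail', {}).get('challengeInfo', {})
    challenge = info.get('challenge', {})
    stats = info.get('stats', {})
    return {
        'name': challenge.get('title', ''),
        'views': stats.get('viewCount', 0),
        'video_count': stats.get('videoCount', 0),
        'description': challenge.get('desc', ''),
    }


# Synthetic page (made-up numbers) standing in for a saved hashtag page
SAMPLE_PAYLOAD = {
    '__DEFAULT_SCOPE__': {
        'webapp.challenge-detail': {
            'challengeInfo': {
                'challenge': {'title': 'python', 'desc': 'Python tips'},
                'stats': {'viewCount': 1000, 'videoCount': 25},
            }
        }
    }
}
SAMPLE_HTML = (
    '<html><body><script id="' + SCRIPT_ID + '" type="application/json">'
    + json.dumps(SAMPLE_PAYLOAD)
    + '</script></body></html>'
)

info = extract_hashtag_info(parse_embedded_json(SAMPLE_HTML))
print(info)
```

Running the parser against saved pages like this makes it easier to notice when the embedded structure changes, without sending extra traffic to TikTok.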
Tracking Trending Hashtags
def track_trending_content(collector, hashtags):
    """Monitor multiple hashtags for trend analysis."""
    results = []
    for tag in hashtags:
        data = collector.get_hashtag_data(tag)
        if data:
            results.append(data)
            print(f"#{tag}: {data['views']:,} views, {data['video_count']:,} videos")
        time.sleep(3)  # Be respectful with requests
    return results


# Track trending topics
collector = TikTokCollector()
trending = track_trending_content(collector, [
    'python', 'coding', 'techreview', 'startup', 'aitools'
])
Building a Trend Detection Pipeline
The real value comes from tracking trends over time:
import time
from datetime import datetime

import pandas as pd

COLUMNS = ['timestamp', 'hashtag', 'views', 'video_count']


class TrendTracker:
    def __init__(self, db_path='tiktok_trends.csv'):
        self.db_path = db_path

    def record_snapshot(self, hashtags, collector):
        """Take a snapshot of hashtag metrics and append it to the CSV."""
        timestamp = datetime.now().isoformat()
        records = []
        for tag in hashtags:
            data = collector.get_hashtag_data(tag)
            if data:
                records.append({
                    'timestamp': timestamp,
                    'hashtag': data['name'],
                    'views': data['views'],
                    'video_count': data['video_count'],
                })
            time.sleep(2)
        # Fixing the column order keeps appended rows aligned with detect_spikes
        df = pd.DataFrame(records, columns=COLUMNS)
        df.to_csv(self.db_path, mode='a', header=False, index=False)
        return df

    def detect_spikes(self, hashtag, threshold=1.5):
        """Detect unusual growth in a hashtag.

        growth_rate is the fractional change between consecutive snapshots,
        so the default threshold of 1.5 means views grew by more than 150%.
        """
        df = pd.read_csv(self.db_path, names=COLUMNS)
        tag_data = df[df.hashtag == hashtag].copy()
        tag_data['views_diff'] = tag_data['views'].diff()
        tag_data['growth_rate'] = tag_data['views_diff'] / tag_data['views'].shift(1)
        spikes = tag_data[tag_data['growth_rate'] > threshold]
        return spikes
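To make the spike threshold concrete, here is a small worked example of the same growth-rate calculation in plain Python, with no CSV or pandas involved. It is a sketch under stated assumptions: the function names `growth_rates` and `find_spikes` and the snapshot values are made up for illustration. A growth rate of 1.5 means views rose by 150% between two snapshots, that is, to 2.5 times the previous value.

```python
def growth_rates(views):
    """Return the fractional change between consecutive values.

    The first entry is None (no previous snapshot), as is any entry
    whose previous value is zero.
    """
    rates = [None]
    for prev, curr in zip(views, views[1:]):
        rates.append((curr - prev) / prev if prev else None)
    return rates


def find_spikes(snapshots, threshold=1.5):
    """Return (timestamp, growth_rate) pairs whose growth exceeds threshold.

    snapshots is a list of (timestamp, views) tuples in chronological order.
    """
    views = [v for _, v in snapshots]
    return [
        (ts, rate)
        for (ts, _), rate in zip(snapshots, growth_rates(views))
        if rate is not None and rate > threshold
    ]


# Hypothetical daily view counts for one hashtag
snapshots = [
    ('2026-01-01', 1000),
    ('2026-01-02', 1200),  # +20%
    ('2026-01-03', 3600),  # +200%, above the 1.5 threshold
    ('2026-01-04', 4000),  # about +11%
]

spikes = find_spikes(snapshots)
print(spikes)  # [('2026-01-03', 2.0)]
```

Because the rate compares each snapshot with the one before it, the snapshot interval matters: a threshold tuned for daily snapshots will rarely fire on hourly ones.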
Scaling TikTok Data Collection
DIY scraping of TikTok is notoriously difficult — they use sophisticated bot detection, dynamic rendering, and frequently change their page structure. For reliable, ongoing collection, the TikTok Scraper on Apify handles all of this complexity and outputs structured data ready for analysis.
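If you do keep a small DIY collector running, it helps to tell "the page structure changed" apart from "this hashtag has no data". Below is a minimal sketch of one way to do that, assuming the nested key layout used in the parsing code earlier in this article. The helper name `first_missing_key` and the `EXPECTED_PATH` constant are hypothetical.

```python
def first_missing_key(data, path):
    """Walk path through nested dicts.

    Returns the first key that is not found, or None if the whole path exists.
    """
    node = data
    for key in path:
        if not isinstance(node, dict) or key not in node:
            return key
        node = node[key]
    return None


# Key path the hashtag parser relies on (an assumption, not a documented schema)
EXPECTED_PATH = (
    '__DEFAULT_SCOPE__',
    'webapp.challenge-detail',
    'challengeInfo',
    'stats',
)

# A payload where the second-level key has been renamed or removed
changed_payload = {'__DEFAULT_SCOPE__': {'some-other-key': {}}}

missing = first_missing_key(changed_payload, EXPECTED_PATH)
if missing is not None:
    print(f"Page structure may have changed: missing key {missing!r}")
```

Logging the missing key whenever extraction fails gives an early signal that the parser needs updating, rather than silently recording zeros.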
Using Proxies for TikTok
TikTok aggressively blocks datacenter IPs. Residential proxies from services like ScrapeOps are essential for any serious data collection:
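from urllib.parse import quote


def configure_proxy(session):
    """Return the ScrapeOps proxy prefix; append a URL-encoded target URL to it."""
    SCRAPEOPS_KEY = "your_key"
    proxy_url = f"https://proxy.scrapeops.io/v1/?api_key={SCRAPEOPS_KEY}&url="
    return proxy_url


# Use proxy for requests: the target URL must be percent-encoded
proxy_base = configure_proxy(collector.session)
target = quote("https://www.tiktok.com/tag/python", safe="")
resp = collector.session.get(proxy_base + target, timeout=30)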
Legal Considerations
TikTok data collection in 2026 exists in a gray area:
- Public data is generally fair game for research and analysis
- Rate limiting is mandatory — don't hammer their servers
- Personal data requires compliance with GDPR/CCPA
- Commercial use should be reviewed by legal counsel
- Don't circumvent authentication — stick to public endpoints
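The rate-limiting point can be enforced in code rather than with scattered `time.sleep` calls. Here is a minimal sketch of a limiter that guarantees a minimum gap between requests. The class name `MinIntervalLimiter` is hypothetical, and the 3-second interval in the usage comment is only an example value, not a limit published by TikTok.

```python
import time


class MinIntervalLimiter:
    """Enforce a minimum number of seconds between consecutive calls."""

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        # clock and sleep are injectable so the limiter can be tested without waiting
        self._clock = clock
        self._sleep = sleep
        self._last = None

    def wait(self):
        """Sleep until min_interval has passed since the previous call.

        Returns the number of seconds slept (0.0 if no wait was needed).
        """
        delay = 0.0
        if self._last is not None:
            elapsed = self._clock() - self._last
            delay = max(0.0, self.min_interval - elapsed)
            if delay > 0:
                self._sleep(delay)
        self._last = self._clock()
        return delay


# Usage sketch: call limiter.wait() before every request, for example
#   limiter = MinIntervalLimiter(3.0)
#   for tag in hashtags:
#       limiter.wait()
#       data = collector.get_hashtag_data(tag)
```

Sharing a single limiter across every function that sends requests keeps the overall request rate bounded even as the pipeline grows.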
Conclusion
TikTok data collection in 2026 is possible but requires careful handling. Focus on public metadata — hashtag trends, view counts, and content patterns. This data drives real business decisions: content calendars, influencer identification, and market trend detection. Start with the code examples above for small-scale analysis, then use managed scraping services for production pipelines.