YouTube hosts over 800 million videos. Whether you're building a competitor analysis tool, tracking trends, or collecting training data — extracting YouTube data programmatically is a common need.
In this guide, I'll walk you through the practical ways to scrape YouTube in 2026: what data you can get, Python code examples, and when to use the official API vs web scraping.
## What YouTube Data Can You Extract?
You can collect:
- Video metadata: title, description, view count, likes, upload date, duration, tags
- Channel info: subscriber count, total videos, channel description, creation date
- Comments: text, author, likes, reply count, timestamps
- Search results: videos matching keywords, filters by date/relevance/views
- Playlists: video lists, playlist metadata
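Whichever method you use, it pays to settle on one target schema up front so records from the API and from scrapers land in the same shape. A minimal sketch — the field names here are my own convention, not anything YouTube defines:

```python
from dataclasses import dataclass, field

@dataclass
class VideoRecord:
    """One scraped video, normalized across API and scraper sources."""
    video_id: str
    title: str
    channel: str
    views: int = 0
    likes: int = 0
    duration_seconds: int = 0
    tags: list = field(default_factory=list)
```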
## Method 1: YouTube Data API v3

The official API is the cleanest option for structured data.

### Setup
```python
import requests

API_KEY = 'YOUR_YOUTUBE_API_KEY'
BASE_URL = 'https://www.googleapis.com/youtube/v3'

def get_video_details(video_id: str) -> dict:
    """Fetch video metadata via the YouTube Data API."""
    params = {
        'part': 'snippet,statistics,contentDetails',
        'id': video_id,
        'key': API_KEY,
    }
    response = requests.get(f'{BASE_URL}/videos', params=params)
    response.raise_for_status()
    items = response.json().get('items', [])
    if not items:
        return {}
    item = items[0]
    return {
        'title': item['snippet']['title'],
        'description': item['snippet']['description'],
        'views': int(item['statistics'].get('viewCount', 0)),
        'likes': int(item['statistics'].get('likeCount', 0)),
        'comments': int(item['statistics'].get('commentCount', 0)),
        'duration': item['contentDetails']['duration'],
        'published_at': item['snippet']['publishedAt'],
        'channel': item['snippet']['channelTitle'],
        'tags': item['snippet'].get('tags', []),
    }

video = get_video_details('dQw4w9WgXcQ')
print(f"{video['title']} — {video['views']:,} views")
```
### Fetching Comments
```python
def get_comments(video_id: str, max_results: int = 100) -> list:
    """Fetch top-level comments for a video, paginating as needed."""
    comments = []
    params = {
        'part': 'snippet',
        'videoId': video_id,
        'maxResults': min(max_results, 100),
        'order': 'relevance',
        'key': API_KEY,
    }
    while len(comments) < max_results:
        response = requests.get(f'{BASE_URL}/commentThreads', params=params)
        response.raise_for_status()
        data = response.json()
        for item in data.get('items', []):
            snippet = item['snippet']['topLevelComment']['snippet']
            comments.append({
                'author': snippet['authorDisplayName'],
                'text': snippet['textDisplay'],
                'likes': snippet['likeCount'],
                'published_at': snippet['publishedAt'],
            })
        next_page = data.get('nextPageToken')
        if not next_page:
            break
        params['pageToken'] = next_page
    return comments[:max_results]
```
### API Limitations

The free tier gives you 10,000 quota units per day. A search request costs 100 units, while most list requests (videos, channels, commentThreads) cost 1 unit each. For large-scale collection, you'll hit the quota fast.
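To see how quickly the quota disappears, here's a quick back-of-envelope helper. The unit costs below reflect the documented defaults at the time of writing — verify them against Google's quota calculator before budgeting around them:

```python
# Approximate quota costs per request type (check Google's quota
# calculator for current values -- these are documented defaults).
UNIT_COSTS = {'search': 100, 'videos': 1, 'commentThreads': 1, 'channels': 1}
DAILY_QUOTA = 10_000  # free-tier daily allowance

def max_daily_calls(endpoint: str, quota: int = DAILY_QUOTA) -> int:
    """How many calls to one endpoint fit inside the daily quota."""
    return quota // UNIT_COSTS[endpoint]

# Only 100 searches per day, but thousands of metadata lookups.
print(max_daily_calls('search'))  # 100
print(max_daily_calls('videos'))  # 10000
```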
## Method 2: Web Scraping with Python

When the API quota isn't enough, or you want fields the API doesn't hand you in one call (like a channel's follower count attached directly to each video), web scraping fills the gap.

### Basic Approach with yt-dlp

`yt-dlp` is the most reliable tool for extracting YouTube metadata without API keys:
```python
import subprocess
import json

def scrape_video_metadata(url: str) -> dict:
    """Extract video metadata using yt-dlp."""
    cmd = [
        'yt-dlp',
        '--dump-json',
        '--no-download',
        '--no-warnings',
        url,
    ]
    result = subprocess.run(cmd, capture_output=True, text=True)
    if result.returncode != 0:
        raise RuntimeError(f'yt-dlp failed: {result.stderr}')
    data = json.loads(result.stdout)
    return {
        'title': data.get('title'),
        'description': data.get('description'),
        'views': data.get('view_count'),
        'likes': data.get('like_count'),
        'duration': data.get('duration'),
        'upload_date': data.get('upload_date'),
        'channel': data.get('channel'),
        'channel_subscribers': data.get('channel_follower_count'),
        'tags': data.get('tags', []),
        'categories': data.get('categories', []),
        'thumbnail': data.get('thumbnail'),
    }

video = scrape_video_metadata('https://www.youtube.com/watch?v=dQw4w9WgXcQ')
print(json.dumps(video, indent=2))
```
### Scraping Channel Videos
```python
def scrape_channel_videos(channel_url: str, max_videos: int = 50) -> list:
    """Get metadata for a channel's most recent uploads."""
    cmd = [
        'yt-dlp',
        '--dump-json',
        '--no-download',
        '--flat-playlist',
        '--playlist-end', str(max_videos),
        f'{channel_url}/videos',
    ]
    result = subprocess.run(cmd, capture_output=True, text=True)
    if result.returncode != 0:
        raise RuntimeError(f'yt-dlp failed: {result.stderr}')
    videos = []
    for line in result.stdout.strip().split('\n'):
        if line:
            data = json.loads(line)
            videos.append({
                'id': data.get('id'),
                'title': data.get('title'),
                'views': data.get('view_count'),
                'duration': data.get('duration'),
                'url': data.get('url'),
            })
    return videos
```
## Method 3: Using a Managed Scraper
For production workloads where you need reliability and scale, a managed scraping solution saves you from maintaining infrastructure.
YouTube Scraper on Apify handles the heavy lifting — it extracts video metadata, channel info, comments, and search results with built-in proxy rotation and retry logic. You just provide the URLs and get structured JSON back.
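As a rough sketch, a run might look like this with the `apify-client` package (`pip install apify-client`). The actor ID and input field names below are assumptions for illustration — check the scraper's input schema on Apify for the real ones:

```python
def build_run_input(urls: list, max_results: int = 50) -> dict:
    """Build an input payload for a YouTube scraper actor run.
    NOTE: field names are assumed -- consult the actor's input schema."""
    return {
        'startUrls': [{'url': u} for u in urls],
        'maxResults': max_results,
    }

def run_youtube_scraper(token: str, urls: list) -> list:
    """Start an actor run and collect its dataset items (requires network).
    The actor ID below is an assumption -- substitute the one you use."""
    from apify_client import ApifyClient
    client = ApifyClient(token)
    run = client.actor('streamers/youtube-scraper').call(
        run_input=build_run_input(urls)
    )
    return list(client.dataset(run['defaultDatasetId']).iterate_items())
```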
## Handling Anti-Scraping Measures

YouTube actively blocks automated requests. Here's what works in 2026:

### Proxy Rotation

Residential proxies are essential at any real volume:
```python
import requests

# Using ThorData residential proxies
# Sign up: https://affiliate.thordata.com/0a0x4nzu7tvv
proxies = {
    'http': 'http://user:pass@proxy.thordata.com:9090',
    'https': 'http://user:pass@proxy.thordata.com:9090',
}

response = requests.get(
    'https://www.youtube.com/watch?v=dQw4w9WgXcQ',
    proxies=proxies,
    headers={'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36'},
)
```
Alternatively, ScraperAPI handles proxy rotation and CAPTCHA solving automatically:
```python
import requests

SCRAPERAPI_KEY = 'YOUR_KEY'
url = 'https://www.youtube.com/watch?v=dQw4w9WgXcQ'

# Pass the target URL via params so it gets properly URL-encoded
response = requests.get(
    'http://api.scraperapi.com',
    params={'api_key': SCRAPERAPI_KEY, 'url': url},
)
html = response.text
```
### Rate Limiting

```python
import time
import random

def polite_request(url: str, session: requests.Session) -> requests.Response:
    """Make a request with a random delay to avoid detection."""
    time.sleep(random.uniform(2, 5))
    return session.get(url)
```
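A fixed random delay helps, but once you start seeing 429s or timeouts, exponential backoff with jitter recovers more gracefully. A minimal sketch — the base and cap values are arbitrary starting points, not tuned constants:

```python
import random

def backoff_delay(attempt: int, base: float = 1.0, cap: float = 60.0) -> float:
    """Exponential backoff with full jitter: a random delay in
    [0, min(cap, base * 2^attempt)] seconds."""
    return random.uniform(0, min(cap, base * 2 ** attempt))

# attempt 0 -> up to 1s, attempt 3 -> up to 8s, attempt 10+ -> capped at 60s
```

Sleep for `backoff_delay(attempt)` between retries, and give up after a fixed number of attempts rather than retrying forever.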
## Storing the Data

For any serious scraping project, dump results to a structured format:
```python
import csv
import json

def save_to_csv(videos: list, filename: str = 'youtube_data.csv'):
    """Save scraped video data to CSV."""
    if not videos:
        return
    keys = videos[0].keys()
    with open(filename, 'w', newline='', encoding='utf-8') as f:
        writer = csv.DictWriter(f, fieldnames=keys)
        writer.writeheader()
        writer.writerows(videos)

def save_to_json(videos: list, filename: str = 'youtube_data.json'):
    """Save scraped video data to JSON."""
    with open(filename, 'w', encoding='utf-8') as f:
        json.dump(videos, f, indent=2, ensure_ascii=False)
```
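If you collect the same channels across multiple runs, deduplicate before saving. This sketch assumes each record carries an `id` key, as in the yt-dlp examples above:

```python
def dedupe_videos(videos: list) -> list:
    """Keep the first occurrence of each video ID, preserving order."""
    seen = set()
    unique = []
    for v in videos:
        vid = v.get('id')
        if vid not in seen:
            seen.add(vid)
            unique.append(v)
    return unique
```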
## YouTube API vs Scraping: When to Use What
| Factor | YouTube API | Web Scraping |
|---|---|---|
| Quota | 10K units/day | Unlimited (with proxies) |
| Data quality | Structured JSON | Requires parsing |
| Setup | API key required | No auth needed |
| Cost | Free (within quota) | Proxy costs |
| Reliability | High | Breaks with site changes |
| Best for | Small-medium projects | Large-scale collection |
## Legal Considerations

YouTube's Terms of Service restrict automated access, and the YouTube API has its own terms of service. For web scraping:

- Respect `robots.txt`
- Don't overload servers
- Use data responsibly
- Check local laws (GDPR and CCPA apply to personal data in comments)
## Wrapping Up

The best approach depends on your scale:

- Under 10K requests/day: use the YouTube Data API
- Moderate scale: use `yt-dlp` with rate limiting
- Production scale: use a managed YouTube scraper with built-in proxy rotation
- Custom needs: build your own scraper with residential proxies

Pick the method that matches your volume and reliability needs, and always be respectful of the platform's resources.