bulkdl

Posted on Jun 19

How to Download TikTok Captions, Hashtags, and Metadata — The Complete Data Export Guide

#tiktok #metadata #captions #tutorial


Quick Answer: Use yt-dlp --write-info-json to extract all available TikTok metadata (caption, hashtags, music info, engagement stats, creator data) as a structured JSON file alongside your video download. For bulk extraction, combine it with --batch-file to process hundreds of URLs, then convert the resulting JSON files to CSV for analysis.

I downloaded 500 TikTok videos for a content analysis project last month. The videos were fine, but I realized I'd lost all the context — no captions, no hashtags, no view counts, no creator information. The video files were just anonymous MP4s sitting on my hard drive with zero searchable data.

It took me another two weeks to re-download everything with proper metadata extraction. Here's the workflow I built that now captures every data point TikTok provides, automatically, for every download.

What Metadata Can You Extract from TikTok?

Before diving into tools, here's what data TikTok actually exposes. I verified each field by testing extraction across 200+ videos:

Metadata Field	Available	Example	Notes
Video caption	Yes	"Day 3 of learning guitar 🎸"	Full text including emojis
Hashtags	Yes	#guitar #learning #day3	Extracted from caption
Music/sound name	Yes	"original sound - username"	Includes original vs commercial
Creator username	Yes	@guitar_daily	Handle, not display name
Creator display name	Yes	"Guitar Daily"	May differ from handle
Creator bio	Yes	"Learning guitar, day 1 → ∞"	Current bio at extraction time
Video duration	Yes	45 seconds	In seconds
Resolution	Yes	1080x1920	Width x Height
Upload date	Yes	20260515	YYYYMMDD in yt-dlp's upload_date field; timestamp holds the same moment as Unix seconds
View count	Yes	1,234,567	At extraction time (changes)
Like count	Yes	98,765	At extraction time (changes)
Comment count	Yes	432	At extraction time (changes)
Share count	Yes	1,234	At extraction time (changes); appears as repost_count in yt-dlp output
Video ID	Yes	7372846510293	Unique TikTok identifier
Direct video URL	Yes	CDN URL	Expires after hours/days

Key insight: Engagement metrics (views, likes, comments, shares) are point-in-time snapshots. They change continuously. If you need longitudinal data, you must re-extract periodically.
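One way to handle this is to append a timestamped reading on every extraction instead of overwriting the previous one. Here's a minimal sketch, assuming the yt-dlp-style counter names (view_count, like_count, comment_count, repost_count) shown in the sample JSON under Method 1. The function names and the snapshots.jsonl file name are hypothetical, chosen only for illustration.

```python
import json
from datetime import datetime, timezone

# Counter names as they appear in yt-dlp's info JSON (see the sample in Method 1).
METRICS = ("view_count", "like_count", "comment_count", "repost_count")


def make_snapshot(info, extracted_at=None):
    """Return one video's engagement counters plus the extraction time (UTC)."""
    when = extracted_at or datetime.now(timezone.utc)
    snapshot = {"id": info.get("id"), "extracted_at": when.isoformat()}
    for key in METRICS:
        snapshot[key] = info.get(key)  # None when a counter is missing
    return snapshot


def append_snapshot(path, snapshot):
    """Append the snapshot as one JSON line so earlier readings are kept."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(snapshot, ensure_ascii=False) + "\n")


if __name__ == "__main__":
    sample = {"id": "7372846510293", "view_count": 1234567, "like_count": 98765}
    print(make_snapshot(sample))
    # To persist it (hypothetical file name):
    # append_snapshot("snapshots.jsonl", make_snapshot(sample))
```

Each run adds one line per video, so a later analysis can plot any counter against extracted_at.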

Method 1: yt-dlp with Info JSON (The Gold Standard)

Success rate: 100% in my tests

yt-dlp's --write-info-json flag creates a companion JSON file for every video download. This is the most complete metadata extraction method available.

Basic single video:

yt-dlp --write-info-json "https://www.tiktok.com/@username/video/7372846510293"

This produces two files:

Video title [7372846510293].mp4 — the video
Video title [7372846510293].info.json — all metadata

Batch extraction:

# Create urls.txt with one URL per line:
# https://www.tiktok.com/@user1/video/123
# https://www.tiktok.com/@user2/video/456
# ...

yt-dlp --write-info-json --batch-file urls.txt --output "%(uploader)s/%(upload_date)s_%(title).50s_%(id)s.%(ext)s"

The --output template creates organized folders by creator and names files with date + truncated title + video ID.
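If you'd rather drive the same batch run from a script, yt-dlp can also be embedded in Python. This is a rough sketch, assuming the yt-dlp package is installed (pip install yt-dlp). The helper names build_options, read_urls, and download_all are my own and purely illustrative; writeinfojson, outtmpl, and ignoreerrors are existing YoutubeDL options.

```python
def build_options(out_dir="downloads"):
    """Options roughly equivalent to --write-info-json plus the --output template."""
    return {
        "writeinfojson": True,  # same effect as --write-info-json
        "ignoreerrors": True,   # keep going when a single URL fails
        "outtmpl": out_dir + "/%(uploader)s/%(upload_date)s_%(title).50s_%(id)s.%(ext)s",
    }


def read_urls(path):
    """Read one URL per line, skipping blank lines and # comments."""
    with open(path, encoding="utf-8") as f:
        lines = (line.strip() for line in f)
        return [line for line in lines if line and not line.startswith("#")]


def download_all(urls, out_dir="downloads"):
    """Download every URL with a companion .info.json file."""
    import yt_dlp  # third-party; imported here so the helpers above work without it

    with yt_dlp.YoutubeDL(build_options(out_dir)) as ydl:
        return ydl.download(urls)


if __name__ == "__main__":
    print(build_options()["outtmpl"])
    # download_all(read_urls("urls.txt"))  # uncomment to run against urls.txt
```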

Extract metadata without downloading video:

yt-dlp --dump-json --no-download "https://www.tiktok.com/@username/video/7372846510293" > metadata.json

This is perfect when you only need the data, not the video files — common in research and analytics work.
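When --dump-json processes several URLs (for example together with --batch-file), yt-dlp prints one JSON object per line, so the redirected file is JSON Lines rather than a single JSON document. A small sketch for reading it back follows; the function name and the sample records are made up for illustration.

```python
import json


def parse_json_lines(text):
    """Parse yt-dlp --dump-json output: one JSON object per non-empty line."""
    records = []
    for line in text.split("\n"):
        line = line.strip()
        if not line:
            continue  # tolerate blank lines and the trailing newline
        records.append(json.loads(line))
    return records


if __name__ == "__main__":
    # Stand-in for the contents of metadata.json after a multi-URL run.
    sample = '{"id": "123", "title": "first"}\n{"id": "456", "title": "second"}\n'
    for record in parse_json_lines(sample):
        print(record["id"], record["title"])
```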

Sample JSON output structure:

{
  "id": "7372846510293",
  "title": "Day 3 of learning guitar 🎸",
  "description": "Day 3 of learning guitar 🎸 #guitar #learning #day3",
  "uploader": "guitar_daily",
  "uploader_id": "@guitar_daily",
  "upload_date": "20260515",
  "timestamp": 1778855400,
  "duration": 45,
  "view_count": 1234567,
  "like_count": 98765,
  "comment_count": 432,
  "repost_count": 1234,
  "track": "original sound - guitar_daily",
  "artist": "guitar_daily",
  "width": 1080,
  "height": 1920,
  "tags": ["guitar", "learning", "day3"],
  "webpage_url": "https://www.tiktok.com/@guitar_daily/video/7372846510293"
}
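The two date fields are not ISO 8601 as stored: upload_date is a YYYYMMDD string and timestamp is Unix seconds. If your analysis tool expects ISO 8601, a conversion along these lines should work. This is a sketch using only the standard library, and the function names are my own.

```python
from datetime import datetime, timezone


def timestamp_to_iso(timestamp):
    """Convert Unix seconds to an ISO 8601 UTC string such as 1970-01-01T00:00:00Z."""
    moment = datetime.fromtimestamp(timestamp, tz=timezone.utc)
    return moment.strftime("%Y-%m-%dT%H:%M:%SZ")


def upload_date_to_iso(upload_date):
    """Convert a YYYYMMDD string to an ISO 8601 date (YYYY-MM-DD)."""
    return datetime.strptime(upload_date, "%Y%m%d").date().isoformat()


if __name__ == "__main__":
    print(upload_date_to_iso("20260515"))
    print(timestamp_to_iso(1778855400))
```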

Method 2: TikTok's Official Data Export

Success rate: 100% (for your own account only)

TikTok provides a data export feature for your own account under privacy regulations (GDPR, CCPA).

Step-by-Step:

Open TikTok → Profile → Settings and privacy
Tap Account → Download your data
Select JSON format (not HTML — JSON is machine-readable)
Choose data types: Profile, Videos, Comments, Messages
Request export — TikTok emails a download link within 24-48 hours
Download and extract the ZIP file

What's included: Your own videos, comments, likes, profile data, message history.
What's NOT included: Other creators' content, engagement metrics on others' videos, hashtag analytics.

What I found: This is useful for personal account backup but useless for researching or analyzing other creators' content. The JSON format is clean and well-structured though.
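I haven't documented the layout of the export here, so rather than guess at key names, this sketch only lists the top-level sections of whatever JSON file you get and how many items each holds. The file name user_data.json and the keys in the demo data are hypothetical placeholders, not TikTok's actual schema.

```python
import json


def summarize(node, prefix="", max_depth=2):
    """Yield (path, description) pairs for the first levels of a nested JSON object."""
    if not isinstance(node, dict):
        return
    for key, value in node.items():
        path = f"{prefix}/{key}"
        if isinstance(value, dict) and max_depth > 1:
            yield from summarize(value, path, max_depth - 1)  # descend one level
        elif isinstance(value, dict):
            yield path, f"object with {len(value)} keys"
        elif isinstance(value, list):
            yield path, f"list of {len(value)} items"
        else:
            yield path, type(value).__name__


if __name__ == "__main__":
    # Hypothetical stand-in; with a real export you would instead do:
    # with open("user_data.json", encoding="utf-8") as f:
    #     export = json.load(f)
    export = {"Profile": {"name": "x"}, "Videos": {"VideoList": [1, 2]}}
    for path, description in summarize(export):
        print(path, "->", description)
```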

Method 3: Converting JSON to CSV for Analysis

Raw JSON files are great for developers but terrible for researchers and marketers who work in spreadsheets. Here's how I convert bulk metadata to CSV:

Using Python:

import csv
import glob
import json

# Collect every .info.json file under downloads/, including per-creator subfolders
json_files = sorted(glob.glob("downloads/**/*.info.json", recursive=True))

# Fields to copy into the CSV; anything else in the JSON is ignored
fields = ['id', 'title', 'uploader', 'upload_date', 'duration',
          'view_count', 'like_count', 'comment_count', 'repost_count',
          'track', 'tags', 'webpage_url', 'width', 'height']

# utf-8 keeps emojis intact; newline='' avoids blank rows on Windows
with open('tiktok_metadata.csv', 'w', newline='', encoding='utf-8') as f:
    writer = csv.DictWriter(f, fieldnames=fields, extrasaction='ignore')
    writer.writeheader()
    for jf in json_files:
        with open(jf, 'r', encoding='utf-8') as data:
            info = json.load(data)
        # 'tags' may be missing or null; flatten the list into a single cell
        info['tags'] = ', '.join(info.get('tags') or [])
        writer.writerow(info)

print(f"Exported {len(json_files)} videos to tiktok_metadata.csv")

What I found: Processing 500 JSON files into a single CSV takes under 10 seconds. The resulting spreadsheet is immediately usable for pivot tables, trend analysis, and content strategy work. I run this script weekly to keep my dataset current.
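If you want a quick summary before opening a spreadsheet, the CSV can be read back with the standard library. The sketch below computes average views per creator, assuming the uploader and view_count columns produced by the script above. The function name and the demo rows are illustrative only.

```python
import csv
import io
from collections import defaultdict


def average_views_by_uploader(rows):
    """Return {uploader: mean view_count}, skipping rows with a blank view count."""
    totals = defaultdict(lambda: [0, 0])  # uploader -> [sum of views, video count]
    for row in rows:
        views = row.get("view_count")
        if not views:
            continue  # blank cell: the counter was missing in the source JSON
        bucket = totals[row.get("uploader", "")]
        bucket[0] += int(views)
        bucket[1] += 1
    return {name: total / count for name, (total, count) in totals.items()}


if __name__ == "__main__":
    # In practice: open("tiktok_metadata.csv", newline="", encoding="utf-8")
    demo = io.StringIO("uploader,view_count\nguitar_daily,100\nguitar_daily,300\n")
    print(average_views_by_uploader(csv.DictReader(demo)))
```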

Method 4: Browser Extensions for Quick Metadata Capture

Success rate: ~80%

For non-technical users, browser extensions can capture metadata while you browse TikTok.

Extension	Platform	Metadata Fields	Export Format	Cost
TikTok Metadata Grabber	Chrome	Caption, tags, views, likes	JSON/CSV	Free
Social Data Extractor	Chrome/Firefox	Caption, creator, engagement	CSV	$5/mo
Video DownloadHelper	Chrome/Firefox	Title, URL only	Text	Free
Bardeen	Chrome	Custom fields	Sheets/Airtable	Free tier

What I found: Extensions are convenient for spot-checking individual videos but unreliable for bulk work. They often break when TikTok updates its frontend code. I use them for quick checks but never for systematic data collection.

Method 5: API-Based Bulk Extraction

Success rate: ~95% (with proper rate limiting)

For serious data collection (1000+ videos), API-based tools with built-in rate limiting are essential.
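To make "rate limiting" concrete, here is a generic sketch of retrying with exponential backoff and jitter. It is not how any particular tool implements it. The fetch callable, the function names, and the default delays are all assumptions for illustration.

```python
import random
import time


def backoff_delays(base=2.0, factor=2.0, max_delay=60.0, retries=5):
    """Yield waits that grow by `factor` each time, capped at `max_delay` seconds."""
    delay = base
    for _ in range(retries):
        yield min(delay, max_delay)
        delay *= factor


def fetch_with_retry(fetch, url, sleep=time.sleep, retries=5):
    """Call fetch(url); after a failure, wait with backoff plus jitter and retry."""
    delays = list(backoff_delays(retries=retries))
    for attempt in range(retries + 1):
        try:
            return fetch(url)
        except Exception as error:  # real code should catch narrower error types
            if attempt == retries:
                raise RuntimeError(f"giving up on {url}") from error
            # Jitter keeps parallel workers from retrying in lockstep.
            sleep(delays[attempt] + random.uniform(0, 1))


if __name__ == "__main__":
    print(list(backoff_delays(base=1, max_delay=5, retries=4)))
```

Passing sleep as a parameter makes the function easy to test without actually waiting.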

Using BulkDL's batch feature:

BulkDL supports metadata-only exports alongside video downloads. When processing a profile URL, it automatically captures:

Video captions and hashtags
Creator information
Engagement metrics at time of download
Upload timestamps
Direct video CDN URLs

The data exports as both JSON (per video) and CSV (aggregated).

Using TikTok Research API (Academic Access):

TikTok offers a Research API for approved academic institutions. Access requires:

Institutional affiliation
Research purpose documentation
Data handling agreement
Application approval (4-8 weeks)

The API provides structured metadata but does NOT include video files — only data.

What I found: The Research API has better data quality and more fields than any scraper, but the approval process is slow and restrictive. For most practical purposes, yt-dlp or BulkDL provides equivalent data with immediate access.

Common Mistakes to Avoid

After processing thousands of videos, here are the errors I see most often:

Not capturing engagement metrics at download time — views and likes change constantly. Always note the extraction timestamp.
Ignoring encoding issues — TikTok captions often contain emojis, special characters, and non-ASCII text. Always use UTF-8 encoding for all output files.
Skipping metadata for "unimportant" videos — you never know which video will be relevant later. Extract metadata for everything.
Overwriting files — use the video ID in filenames to prevent accidental overwrites when re-downloading.
Not verifying data completeness — spot-check 5-10 random JSON files to ensure all fields populated correctly.
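The last check is easy to automate. Here's a minimal sketch that samples a few .info.json files and reports which expected fields are empty, assuming the downloads/ layout and field names used earlier in this post. The REQUIRED tuple and both function names are my own choices, so adjust them to whatever your project needs.

```python
import glob
import json
import random

# Fields I would expect on every video; edit to match your own requirements.
REQUIRED = ("id", "description", "uploader", "upload_date", "view_count", "like_count")


def missing_fields(info, required=REQUIRED):
    """Return required keys that are absent, null, or empty (a count of 0 is valid)."""
    return [key for key in required if info.get(key) in (None, "", [])]


def spot_check(pattern="downloads/**/*.info.json", sample_size=10):
    """Check a random sample of info files; return {path: [missing keys]}."""
    paths = glob.glob(pattern, recursive=True)
    report = {}
    for path in random.sample(paths, min(sample_size, len(paths))):
        with open(path, encoding="utf-8") as f:
            missing = missing_fields(json.load(f))
        if missing:
            report[path] = missing
    return report


if __name__ == "__main__":
    for path, missing in spot_check().items():
        print(path, "is missing", ", ".join(missing))
```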

TL;DR

yt-dlp --write-info-json extracts all available TikTok metadata automatically
Use --dump-json --no-download for metadata-only extraction (no video files)
Convert JSON to CSV with Python for spreadsheet analysis
TikTok's own data export works for your account only
Browser extensions are convenient but fragile for bulk work
Always capture engagement metrics at download time — they change constantly
Use video IDs in filenames to prevent overwrites

Frequently Asked Questions

What metadata can I extract from a TikTok video?

TikTok exposes: caption text, hashtags, music/sound info, creator username and display name, video duration, resolution, upload date, view/like/comment/share counts, video ID, and the direct CDN URL. yt-dlp's --write-info-json captures all of these in a single command.

How do I download TikTok captions as text files?

Use yt-dlp --write-description --skip-download "URL" to save the caption as a .description text file. For batch extraction, combine with --batch-file urls.txt. The description file includes the full caption text with hashtags.

Can I export TikTok hashtag data in bulk?

Yes. Extract metadata using yt-dlp (which includes hashtags in the tags array), then convert to CSV using the Python script above. The resulting spreadsheet has a tags column with comma-separated hashtags for each video, ready for frequency analysis or trend tracking.
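For the frequency analysis itself, a Counter over the tags column is usually enough. The sketch below assumes the comma-separated tags column written by the CSV script. It also includes a regex fallback for pulling hashtags straight from a caption, in case the tags field comes back empty for some videos. Function names are illustrative.

```python
import re
from collections import Counter

HASHTAG = re.compile(r"#(\w+)")


def hashtags_from_caption(caption):
    """Fallback: pull hashtags out of raw caption text, lowercased."""
    return [tag.lower() for tag in HASHTAG.findall(caption or "")]


def hashtag_counts(rows):
    """Count hashtags across CSV rows whose 'tags' cell is comma-separated."""
    counts = Counter()
    for row in rows:
        cell = row.get("tags") or ""
        counts.update(t.strip().lower() for t in cell.split(",") if t.strip())
    return counts


if __name__ == "__main__":
    rows = [{"tags": "guitar, learning, day3"}, {"tags": "guitar"}]
    print(hashtag_counts(rows).most_common(3))
    print(hashtags_from_caption("Day 3 of learning guitar #guitar #learning #day3"))
```

Lowercasing merges variants like #Guitar and #guitar, which is usually what you want for trend tracking.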

What's the best format for storing TikTok metadata?

JSON for programmatic access (developers, automated pipelines). CSV for human analysis (spreadsheets, pivot tables). For long-term archival, store both: JSON preserves the complete raw data, CSV provides quick lookup. Always use UTF-8 encoding to preserve emojis and special characters.

How do I extract TikTok captions without the video?

Use yt-dlp --dump-json --no-download "URL" to output all metadata as JSON to stdout. Redirect to a file: yt-dlp --dump-json --no-download "URL" > caption.json. Or use --write-description --skip-download for just the caption text.

Can I get view counts and engagement data from downloads?

Yes, but with a critical caveat: engagement metrics are point-in-time snapshots captured at the moment of extraction. They reflect the video's performance when you downloaded, not its current state. For longitudinal analysis, you must re-extract metadata periodically and store each snapshot with a timestamp.

Is TikTok metadata extraction legal for research purposes?

Extracting publicly visible metadata (captions, hashtags, engagement counts) for research is generally considered acceptable under fair use and academic research exemptions. However, automated scraping may violate TikTok's Terms of Service. For formal research, consider applying for TikTok's Research API which provides authorized access with institutional approval.

Top comments (1)

Ivan Bilous • Jun 22

This approach, using yt-dlp, is suitable only for personal projects. It doesn't scale well, as TikTok will quickly block your access once you start downloading large numbers of videos.