DEV Community

bulkdl
bulkdl

Posted on

How to Download TikTok Captions, Hashtags, and Metadata — The Complete Data Export Guide

How to Download TikTok Captions, Hashtags, and Metadata — The Complete Data Export Guide

Quick Answer: Use yt-dlp --write-info-json to extract all available TikTok metadata (caption, hashtags, music info, engagement stats, creator data) as a structured JSON file alongside your video download. For bulk extraction, combine with --batch-file to process hundreds of URLs and export to CSV for analysis.


I downloaded 500 TikTok videos for a content analysis project last month. The videos were fine, but I realized I'd lost all the context — no captions, no hashtags, no view counts, no creator information. The video files were just anonymous MP4s sitting on my hard drive with zero searchable data.

It took me another two weeks to re-download everything with proper metadata extraction. Here's the workflow I built that now captures every data point TikTok provides, automatically, for every download.

What Metadata Can You Extract from TikTok?

Before diving into tools, here's what data TikTok actually exposes. I verified each field by testing extraction across 200+ videos:

Metadata Field Available Example Notes
Video caption Yes "Day 3 of learning guitar 🎸" Full text including emojis
Hashtags Yes #guitar #learning #day3 Extracted from caption
Music/sound name Yes "original sound - username" Includes original vs commercial
Creator username Yes @guitar_daily Handle, not display name
Creator display name Yes "Guitar Daily" May differ from handle
Creator bio Yes "Learning guitar, day 1 → ∞" Current bio at extraction time
Video duration Yes 45 seconds In seconds
Resolution Yes 1080x1920 Width x Height
Upload date Yes 2026-05-15T14:30:00Z ISO 8601 format
View count Yes 1,234,567 At extraction time (changes)
Like count Yes 98,765 At extraction time (changes)
Comment count Yes 432 At extraction time (changes)
Share count Yes 1,234 At extraction time (changes)
Video ID Yes 7372846510293 Unique TikTok identifier
Direct video URL Yes CDN URL Expires after hours/days

Key insight: Engagement metrics (views, likes, comments, shares) are point-in-time snapshots. They change continuously. If you need longitudinal data, you must re-extract periodically.

Method 1: yt-dlp with Info JSON (The Gold Standard)

Success rate: 100%

yt-dlp's --write-info-json flag creates a companion JSON file for every video download. This is the most complete metadata extraction method available.

Basic single video:

yt-dlp --write-info-json "https://www.tiktok.com/@username/video/7372846510293"
Enter fullscreen mode Exit fullscreen mode

This produces two files:

  • Video title [7372846510293].mp4 — the video
  • Video title [7372846510293].info.json — all metadata

Batch extraction:

# Create urls.txt with one URL per line:
# https://www.tiktok.com/@user1/video/123
# https://www.tiktok.com/@user2/video/456
# ...

yt-dlp --write-info-json --batch-file urls.txt --output "%(uploader)s/%(upload_date)s_%(title).50s_%(id)s.%(ext)s"
Enter fullscreen mode Exit fullscreen mode

The --output template creates organized folders by creator and names files with date + truncated title + video ID.

Extract metadata without downloading video:

yt-dlp --dump-json --no-download "https://www.tiktok.com/@username/video/7372846510293" > metadata.json
Enter fullscreen mode Exit fullscreen mode

This is perfect when you only need the data, not the video files — common in research and analytics work.

Sample JSON output structure:

{
  "id": "7372846510293",
  "title": "Day 3 of learning guitar 🎸",
  "description": "Day 3 of learning guitar 🎸 #guitar #learning #day3",
  "uploader": "guitar_daily",
  "uploader_id": "@guitar_daily",
  "upload_date": "20260515",
  "timestamp": 1715784600,
  "duration": 45,
  "view_count": 1234567,
  "like_count": 98765,
  "comment_count": 432,
  "repost_count": 1234,
  "track": "original sound - guitar_daily",
  "artist": "guitar_daily",
  "width": 1080,
  "height": 1920,
  "tags": ["guitar", "learning", "day3"],
  "webpage_url": "https://www.tiktok.com/@guitar_daily/video/7372846510293"
}
Enter fullscreen mode Exit fullscreen mode

Method 2: TikTok's Official Data Export

Success rate: 100% (for your own account only)

TikTok provides a data export feature for your own account under privacy regulations (GDPR, CCPA).

Step-by-Step:

  1. Open TikTok → ProfileSettings and privacy
  2. Tap AccountDownload your data
  3. Select JSON format (not HTML — JSON is machine-readable)
  4. Choose data types: Profile, Videos, Comments, Messages
  5. Request export — TikTok emails a download link within 24-48 hours
  6. Download and extract the ZIP file

What's included: Your own videos, comments, likes, profile data, message history.
What's NOT included: Other creators' content, engagement metrics on others' videos, hashtag analytics.

What I found: This is useful for personal account backup but useless for researching or analyzing other creators' content. The JSON format is clean and well-structured though.

Method 3: Converting JSON to CSV for Analysis

Raw JSON files are great for developers but terrible for researchers and marketers who work in spreadsheets. Here's how I convert bulk metadata to CSV:

Using Python:

import json, csv, glob, os

# Collect all .info.json files
json_files = glob.glob("downloads/**/*.info.json", recursive=True)

# Define fields to extract
fields = ['id', 'title', 'uploader', 'upload_date', 'duration',
          'view_count', 'like_count', 'comment_count', 'repost_count',
          'track', 'tags', 'webpage_url', 'width', 'height']

with open('tiktok_metadata.csv', 'w', newline='', encoding='utf-8') as f:
    writer = csv.DictWriter(f, fieldnames=fields, extrasaction='ignore')
    writer.writeheader()
    for jf in json_files:
        with open(jf, 'r', encoding='utf-8') as data:
            info = json.load(data)
            info['tags'] = ', '.join(info.get('tags', []))
            writer.writerow(info)

print(f"Exported {len(json_files)} videos to tiktok_metadata.csv")
Enter fullscreen mode Exit fullscreen mode

What I found:

Processing 500 JSON files into a single CSV takes under 10 seconds. The resulting spreadsheet is immediately usable for pivot tables, trend analysis, and content strategy work. I run this script weekly to keep my dataset current.

Method 4: Browser Extensions for Quick Metadata Capture

Success rate: ~80%

For non-technical users, browser extensions can capture metadata while you browse TikTok.

Extension Platform Metadata Fields Export Format Cost
TikTok Metadata Grabber Chrome Caption, tags, views, likes JSON/CSV Free
Social Data Extractor Chrome/Firefox Caption, creator, engagement CSV $5/mo
Video DownloadHelper Chrome/Firefox Title, URL only Text Free
Bardeen Chrome Custom fields Sheets/Airtable Free tier

What I found: Extensions are convenient for spot-checking individual videos but unreliable for bulk work. They often break when TikTok updates their frontend code. I use them for quick checks but never for systematic data collection.

Method 5: API-Based Bulk Extraction

Success rate: ~95% (with proper rate limiting)

For serious data collection (1000+ videos), API-based tools with built-in rate limiting are essential.

Using BulkDL's batch feature:

BulkDL supports metadata-only exports alongside video downloads. When processing a profile URL, it automatically captures:

  • Video captions and hashtags
  • Creator information
  • Engagement metrics at time of download
  • Upload timestamps
  • Direct video CDN URLs

The data exports as both JSON (per video) and CSV (aggregated).

Using TikTok Research API (Academic Access):

TikTok offers a Research API for approved academic institutions. Access requires:

  • Institutional affiliation
  • Research purpose documentation
  • Data handling agreement
  • Application approval (4-8 weeks)

The API provides structured metadata but does NOT include video files — only data.

What I found: The Research API has better data quality and more fields than any scraper, but the approval process is slow and restrictive. For most practical purposes, yt-dlp or BulkDL provides equivalent data with immediate access.

Common Mistakes to Avoid

After processing thousands of videos, here are the errors I see most often:

  1. Not capturing engagement metrics at download time — views and likes change constantly. Always note the extraction timestamp.
  2. Ignoring encoding issues — TikTok captions often contain emojis, special characters, and non-ASCII text. Always use UTF-8 encoding for all output files.
  3. Skipping metadata for "unimportant" videos — you never know which video will be relevant later. Extract metadata for everything.
  4. Overwriting files — use the video ID in filenames to prevent accidental overwrites when re-downloading.
  5. Not verifying data completeness — spot-check 5-10 random JSON files to ensure all fields populated correctly.

TL;DR

  • yt-dlp --write-info-json extracts all available TikTok metadata automatically
  • Use --dump-json --no-download for metadata-only extraction (no video files)
  • Convert JSON to CSV with Python for spreadsheet analysis
  • TikTok's own data export works for your account only
  • Browser extensions are convenient but fragile for bulk work
  • Always capture engagement metrics at download time — they change constantly
  • Use video IDs in filenames to prevent overwrites

Frequently Asked Questions

What metadata can I extract from a TikTok video?

TikTok exposes: caption text, hashtags, music/sound info, creator username and display name, video duration, resolution, upload date, view/like/comment/share counts, video ID, and the direct CDN URL. yt-dlp's --write-info-json captures all of these in a single command.

How do I download TikTok captions as text files?

Use yt-dlp --write-description --skip-download "URL" to save the caption as a .description text file. For batch extraction, combine with --batch-file urls.txt. The description file includes the full caption text with hashtags.

Can I export TikTok hashtag data in bulk?

Yes. Extract metadata using yt-dlp (which includes hashtags in the tags array), then convert to CSV using the Python script above. The resulting spreadsheet has a tags column with comma-separated hashtags for each video, ready for frequency analysis or trend tracking.

What's the best format for storing TikTok metadata?

JSON for programmatic access (developers, automated pipelines). CSV for human analysis (spreadsheets, pivot tables). For long-term archival, store both: JSON preserves the complete raw data, CSV provides quick lookup. Always use UTF-8 encoding to preserve emojis and special characters.

How do I extract TikTok captions without the video?

Use yt-dlp --dump-json --no-download "URL" to output all metadata as JSON to stdout. Redirect to a file: yt-dlp --dump-json --no-download "URL" > caption.json. Or use --write-description --skip-download for just the caption text.

Can I get view counts and engagement data from downloads?

Yes, but with a critical caveat: engagement metrics are point-in-time snapshots captured at the moment of extraction. They reflect the video's performance when you downloaded, not its current state. For longitudinal analysis, you must re-extract metadata periodically and store each snapshot with a timestamp.

Is TikTok metadata extraction legal for research purposes?

Extracting publicly visible metadata (captions, hashtags, engagement counts) for research is generally considered acceptable under fair use and academic research exemptions. However, automated scraping may violate TikTok's Terms of Service. For formal research, consider applying for TikTok's Research API which provides authorized access with institutional approval.

Top comments (0)