I spend a lot of time researching topics before writing about them. YouTube is one of my best sources — experts share detailed knowledge in videos that often isn't available in written form.
The problem: watching videos is slow. I read 3x faster than people speak. So I built a Python script that automates my entire research workflow.
The Goal
Given a topic, I want to:
- Find the top YouTube videos about it
- Extract their transcripts
- Generate a research summary combining all sources
- Output key points, quotes, and an article outline
Prerequisites
pip install youtube-transcript-api google-api-python-client openai
You'll need:
- A YouTube Data API key (free from Google Cloud Console)
- An OpenAI API key
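Neither key should be hardcoded. I load both from environment variables; a minimal sketch, assuming the names YOUTUBE_API_KEY and OPENAI_API_KEY (the OpenAI client picks up the latter automatically):

import os

# Assumed environment variable names; use whatever secrets setup you prefer.
API_KEY = os.environ["YOUTUBE_API_KEY"]  # used below as developerKey for the YouTube client
# OpenAI() in Step 3 reads OPENAI_API_KEY from the environment on its own.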
Step 1: Find Relevant Videos
from googleapiclient.discovery import build

def search_youtube(query, max_results=5):
    youtube = build('youtube', 'v3', developerKey=API_KEY)
    request = youtube.search().list(
        part='snippet',
        q=query,
        type='video',
        maxResults=max_results,
        order='relevance',
        videoDuration='medium',  # 4-20 minutes
        relevanceLanguage='en'
    )
    response = request.execute()

    videos = []
    for item in response['items']:
        videos.append({
            'id': item['id']['videoId'],
            'title': item['snippet']['title'],
            'channel': item['snippet']['channelTitle'],
        })
    return videos
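Worth a quick sanity check before wiring in the rest; the topic string here is just an example:

# Preview what the search returns
for v in search_youtube("content repurposing strategies", max_results=3):
    print(f"{v['title']} by {v['channel']} (https://www.youtube.com/watch?v={v['id']})")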
Step 2: Extract Transcripts
from youtube_transcript_api import YouTubeTranscriptApi
def get_transcript(video_id):
    try:
        transcript = YouTubeTranscriptApi.get_transcript(video_id)
        return ' '.join([entry['text'] for entry in transcript])
    except Exception as e:
        print(f"Could not get transcript for {video_id}: {e}")
        return None
For production use, I'd recommend ScripTube (scriptube.me), which handles edge cases and formatting automatically. But for a personal script, the library works.
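Since the search is already restricted to English, you can also ask the library for an English track explicitly. A sketch, assuming the same classic get_transcript API used above, which in the versions I've used accepts a languages preference list:

def get_transcript_en(video_id):
    # Prefer an English caption track; fall back to None like get_transcript above
    try:
        transcript = YouTubeTranscriptApi.get_transcript(video_id, languages=['en', 'en-US'])
        return ' '.join(entry['text'] for entry in transcript)
    except Exception as e:
        print(f"Could not get an English transcript for {video_id}: {e}")
        return None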
Step 3: AI-Powered Research Summary
from openai import OpenAI
client = OpenAI()
def generate_research_summary(topic, transcripts):
    sources = ""
    for t in transcripts:
        sources += f"\n\n--- Source: {t['title']} by {t['channel']} ---\n"
        sources += t['text'][:3000]  # Truncate for token limits

    response = client.chat.completions.create(
        model="gpt-4",
        messages=[{
            "role": "system",
            "content": "You are a research assistant. Synthesize information from multiple sources."
        }, {
            "role": "user",
            "content": f"""Topic: {topic}
Here are transcripts from {len(transcripts)} YouTube videos on this topic:
{sources}
Please provide:
1. A comprehensive summary of the key points across all sources
2. Where the sources agree
3. Where they disagree or offer different perspectives
4. 5-8 notable quotes (with attribution)
5. A suggested article outline for a blog post on this topic
6. Questions that weren't answered by any source"""
        }]
    )
    return response.choices[0].message.content
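The flat 3,000-character cut is blunt: with several sources it can waste the budget on short transcripts and clip long ones mid-sentence. A rough alternative sketch, splitting an overall word budget evenly across sources (the 12,000-word figure is an assumption to tune against your model's context window, and word counts are only a crude proxy for tokens):

def build_sources(transcripts, total_word_budget=12000):
    # Spread the budget evenly instead of a fixed per-source character cut
    per_source = total_word_budget // max(len(transcripts), 1)
    sources = ""
    for t in transcripts:
        excerpt = ' '.join(t['text'].split()[:per_source])
        sources += f"\n\n--- Source: {t['title']} by {t['channel']} ---\n{excerpt}"
    return sources

Swapping it in just means replacing the sources loop above with sources = build_sources(transcripts).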
Step 4: Putting It Together
def research_topic(topic):
    print(f"Researching: {topic}")
    print("=" * 50)

    # Find videos
    print("Searching YouTube...")
    videos = search_youtube(topic, max_results=5)
    print(f"Found {len(videos)} videos")

    # Extract transcripts
    print("Extracting transcripts...")
    transcripts = []
    for video in videos:
        text = get_transcript(video['id'])
        if text:
            transcripts.append({
                'title': video['title'],
                'channel': video['channel'],
                'text': text
            })
            print(f"  ✓ {video['title']}")
        else:
            print(f"  ✗ {video['title']} (no transcript)")

    # Generate summary
    print("\nGenerating research summary...")
    summary = generate_research_summary(topic, transcripts)

    # Save output
    filename = f"research_{topic.replace(' ', '_')}.md"
    with open(filename, 'w') as f:
        f.write(f"# Research: {topic}\n\n")
        f.write(f"## Sources\n")
        for t in transcripts:
            f.write(f"- {t['title']} by {t['channel']}\n")
        f.write(f"\n## Summary\n\n{summary}")

    print(f"\nSaved to {filename}")
    return summary

# Run it
research_topic("content repurposing strategies")
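The topic is hardcoded above. To research anything from the command line instead, replace that last call with a small wrapper (assuming the script is saved as, say, research.py):

import sys

# Hypothetical CLI entry point: python research.py "content repurposing strategies"
if __name__ == "__main__":
    topic = " ".join(sys.argv[1:]) or "content repurposing strategies"
    research_topic(topic)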
Results
Running this on "content repurposing strategies" gave me:
- 5 video transcripts totaling ~25,000 words of expert knowledge
- A synthesized summary highlighting where experts agree and disagree
- 7 quotable insights with attribution
- A complete article outline
Total time: about 2 minutes (mostly API calls).
Writing the blog post from this research: about 45 minutes.
Extending the Script
Ideas for improvement:
- Add caching to avoid re-extracting transcripts you've already processed (see the sketch after this list)
- Filter videos by minimum view count for quality
- Export to Notion or Google Docs instead of markdown
- Build a simple web UI with Streamlit
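For the first idea, a minimal file-based cache keyed by video ID is enough. A sketch using only the standard library (the cache directory name is arbitrary):

import json
from pathlib import Path

CACHE_DIR = Path("transcript_cache")
CACHE_DIR.mkdir(exist_ok=True)

def get_transcript_cached(video_id):
    # Reuse a previously saved transcript instead of hitting the API again
    cache_file = CACHE_DIR / f"{video_id}.json"
    if cache_file.exists():
        return json.loads(cache_file.read_text())
    text = get_transcript(video_id)
    if text:
        cache_file.write_text(json.dumps(text))
    return text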
The core insight: YouTube contains more expert knowledge than any library, and it's all accessible via transcripts. Automating the extraction turns YouTube from a time sink into a research superpower.