TL;DR — SubDownload turns any YouTube video into searchable text: transcripts, AI summaries, a personal knowledge base, plus an MCP server, REST API, and Agent Skills so Claude / Cursor / ChatGPT / 40+ MCP clients can read YouTube directly. Try it — free, no signup needed for the basic tool.
The shape problem with YouTube
I run my dev work alongside Claude, Cursor, and a handful of agent loops. Every input source I rely on — code, docs, papers, RSS, my own notes — pipes into an agent. Except YouTube.
I watch a lot of long-form content for learning: founder interviews, conference talks, infra deep-dives, podcasts. The best stuff often lives in 90-minute videos. And every minute of that content is locked behind playback. My agent can't see any of it.
The standard advice is "watch less, be selective." Wrong frame. I want to consume more of this content — just not by sitting through every minute in real time while I can only do that one thing.
The actual problem isn't quantity. It's that videos exist in only one shape: playback. They're not searchable, not summarizable on demand, not addressable by an agent.
What SubDownload does
It changes the shape:
- Transcripts for any YouTube video, including ones without official captions (AI ASR fallback covers Loom recordings, conference talks, niche uploads — the half my workflow used to lose)
- AI summary in seconds — read the gist of a 90-minute talk in 60 seconds, decide whether it gets a real watch
- Personal knowledge base — every video you process lands in a searchable library you own, with tags, favorites, cross-video query
-
MCP server at
api.subdownload.com/mcp— plug into Claude Desktop, Cursor, ChatGPT, or any of the 40+ MCP-capable clients - REST API + full OpenAPI 3.1 if you want to roll your own scripts
-
Agent Skills:
npx @subdown/skill@latestfor one-command install
Why MCP first
I shipped MCP before a Chrome extension or a polished SaaS UI because it's the cleanest way for an agent to read videos on demand. The agent says "search this channel, fetch this transcript, summarize this part" — done. No copy-paste, no wrapper UI per host, no "open YouTube in a tab and switch back."
OAuth 2.1 with Dynamic Client Registration (RFC 7591) is supported, so adding the server to Claude Desktop is one config block plus a one-time browser auth — then it stays connected.
What this looks like in practice
A few uses I lean on weekly:
- Find a half-remembered phrase from an interview I watched last month. I ask Claude in 30 seconds and it lands me on the exact clip.
- Triage a 90-minute talk. Claude reads the transcript, gives me a timestamped outline, I decide which 10 minutes to actually watch.
- Cross-video synthesis. "What do these five founder interviews say about pricing?" — agent reads multiple transcripts, returns a comparison.
- Caption-less videos. Loom screencasts, brand-new uploads, niche channels — AI transcription fills the gap automatically.
Try it
Free to try, no signup needed for the basic tool: subdownload.com
For MCP with Claude Desktop / Cursor / ChatGPT, sign in once for an API key and follow your client's MCP setup docs. (Intentionally not pasting a JSON config — MCP client formats keep evolving and any snippet I share will go stale within weeks.)
What I want to hear
- If you're already running an agent loop, what video → text gap have you hit?
- Anyone running a homemade MCP server for a video host I haven't covered? Always curious about edge cases.
- If something breaks, ping me — I'm the only one shipping this.
Top comments (0)