Hey 👋 Sharing something I've been working on lately, in case it's useful to anyone doing similar stuff.
Quick context: I use Claude a lot for digging through YouTube (conference talks, tutorials, the occasional 2-hour podcast). And the workflow was always the same annoying dance: open the video, click "Show transcript," copy the wall of text, paste it into Claude, ask my question. Every. Single. Time. And you lose the timestamps, so when Claude says "they mention X around the middle," you can't easily jump back and check.
At some point it bugged me enough that I built a little thing to fix it. It's an MCP server, basically a small program that hands Claude (or Cursor / Windsurf) a set of YouTube tools so it can just… do the fetching itself. I called it Scribefy. (Heads up: it's my own project, so take the plug with the appropriate grain of salt, but the how-it-works bits apply to any YouTube MCP server.)
What it actually does
Once it's plugged in, the assistant gets four tools:
- search_videos: search YouTube
- get_video_metadata: title, channel, length, whether it even has captions
- get_related_videos: the "up next" list
- extract_transcript: the full transcript, with timestamps
So now I just paste a URL and say "summarize this with timestamps," or even "find me 3 videos on X and tell me where they disagree," and it figures out the rest. No copy-paste.
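If you're curious what "it figures out the rest" looks like on the wire: when the assistant picks a tool, the MCP client sends the server a JSON-RPC tools/call request. Here's a rough sketch of that message for extract_transcript. The envelope follows the MCP spec, but the "url" argument name is my assumption for illustration; the real argument names come from the tool schema the server advertises.

```python
import json


def build_tool_call(request_id, tool_name, arguments):
    """Build an MCP tools/call request as a JSON-RPC 2.0 message."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }


# "url" is a hypothetical argument name; check the server's tool schema.
request = build_tool_call(
    1,
    "extract_transcript",
    {"url": "https://www.youtube.com/watch?v=VIDEO_ID"},
)
print(json.dumps(request, indent=2))
```

You never write this by hand; the assistant's MCP client builds it for you. It's just handy to know when you're reading server logs.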
Who it's actually handy for
It started as a "scratch my own itch" thing, but the use cases turned out way broader than I expected. A few that keep coming up:
- Students: drop in a 90-minute lecture and get the 10 things you actually need for the exam, with timestamps to the parts worth rewatching. Beats scrubbing the progress bar hunting for that one slide you half-remember.
- Researchers: treat a stack of talks/interviews like a mini literature review ("what do these five videos agree and disagree on?") and get quotes you can cite back to the exact second.
- Content creators: see what's already been said on a topic before you film, or turn one of your own videos into show notes, a blog post, and a thread in one pass. (The repurpose-one-video-into-five-formats move.)
- Beginner traders: the forex/trading corner of YouTube is endless hour-long analysis videos. Instead of rewatching, you can pull out the actual setup, rules, and levels a video lays out and ask follow-up questions. (Not financial advice, obviously, just a much faster way to digest the educational stuff.)
Different people, same core trick: let the AI read the video so you don't have to sit through all of it.
If you want to poke at it
It's an npx package, same config in Claude Desktop, Cursor, and Windsurf:
{
  "mcpServers": {
    "scribefy": {
      "command": "npx",
      "args": ["-y", "scribefy-mcp"],
      "env": { "SCRIBEFY_API_KEY": "sk_live_…" }
    }
  }
}
(Claude Desktop: Settings > Developer > Edit Config. Cursor: ~/.cursor/mcp.json. Windsurf: ~/.codeium/windsurf/mcp_config.json, then Refresh.) Restart the app and the tools show up. You get an API key plus a couple of free credits to mess around with.
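If you already have other MCP servers configured and don't want to hand-edit JSON, something like this works too. It's a minimal sketch, not part of Scribefy: add_mcp_server is a helper name I made up, and the API key is a placeholder. It adds one entry under "mcpServers" and leaves everything else in the file alone.

```python
import json
from pathlib import Path


def add_mcp_server(config_path, name, server):
    """Add or update one entry under "mcpServers", keeping the rest of the file."""
    path = Path(config_path).expanduser()
    text = path.read_text(encoding="utf-8") if path.exists() else ""
    config = json.loads(text) if text.strip() else {}
    config.setdefault("mcpServers", {})[name] = server
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config


SCRIBEFY = {
    "command": "npx",
    "args": ["-y", "scribefy-mcp"],
    "env": {"SCRIBEFY_API_KEY": "YOUR_KEY_HERE"},  # placeholder, not a real key
}

# Example (Cursor's config path from above):
# add_mcp_server("~/.cursor/mcp.json", "scribefy", SCRIBEFY)
```

Back up your config first if it has anything you care about in it; the script rewrites the whole file with fresh indentation.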
The part I actually think is neat
It's not really about one transcript. It's that the assistant can research a whole topic on its own: search for candidates, glance at the metadata to throw out the 3-hour ones, then only pull transcripts for the handful worth reading. Search and metadata are free, so poking around a topic stays cheap and you only "spend" on the videos you actually use.
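The filtering step in that loop is simple enough to sketch. This isn't Scribefy's code, just the logic the assistant effectively runs in its head: the VideoMeta fields and the sample videos below are made up for illustration, and the real metadata tool may name things differently.

```python
from dataclasses import dataclass


@dataclass
class VideoMeta:
    # Hypothetical fields; the real metadata tool may use other names.
    video_id: str
    title: str
    duration_seconds: int
    has_captions: bool


def pick_videos_to_transcribe(candidates, max_minutes=60, limit=3):
    """Keep captioned videos under the length cap, in search order, up to limit."""
    keep = [
        v for v in candidates
        if v.has_captions and v.duration_seconds <= max_minutes * 60
    ]
    return keep[:limit]


# Made-up search results, just to exercise the filter.
candidates = [
    VideoMeta("a1", "Intro talk", 25 * 60, True),
    VideoMeta("b2", "Three-hour livestream", 180 * 60, True),
    VideoMeta("c3", "Panel without captions", 40 * 60, False),
    VideoMeta("d4", "Deep dive", 55 * 60, True),
]
selected = pick_videos_to_transcribe(candidates)
print([v.video_id for v in selected])  # ['a1', 'd4']
```

Only the videos that survive the filter would go on to extract_transcript, which is the one step that costs credits.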
The timestamps also matter more than I expected: Claude can go "at 12:40 they argue…" and I can click straight there to check, instead of trusting a vibe-summary of a 40-minute video.
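Turning a quoted timestamp into a clickable link is a two-liner, by the way. A small sketch, assuming timestamps come back as MM:SS or HH:MM:SS strings (that format is my assumption here) and using YouTube's t= URL parameter to start playback at a given second. VIDEO_ID is a placeholder.

```python
def timestamp_to_seconds(stamp):
    """Convert "MM:SS" or "HH:MM:SS" to a number of seconds."""
    seconds = 0
    for part in stamp.split(":"):
        seconds = seconds * 60 + int(part)
    return seconds


def jump_link(video_id, stamp):
    """Build a YouTube watch URL that starts playback at the given timestamp."""
    return f"https://www.youtube.com/watch?v={video_id}&t={timestamp_to_seconds(stamp)}s"


print(jump_link("VIDEO_ID", "12:40"))
# https://www.youtube.com/watch?v=VIDEO_ID&t=760s
```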
The honest bits (because I hate posts that skip these)
- It's YouTube-only, and the video needs captions (auto-generated counts). It does not do its own speech-to-text, so a caption-less video is a no-go.
- There are free open-source YouTube MCP servers too. If you're cost-sensitive and don't mind the occasional breakage when YouTube changes something under the hood, those are a totally legit route. Mine's a hosted/paid thing that trades money for reliability (runs through a residential proxy so it doesn't get bot-blocked) plus the extra search/metadata tools.
So: not magic, not for everyone. But if YouTube is part of how you learn or work, having the AI do the watching-and-pulling has been a genuinely nice upgrade.
Where I'm at
It's pretty fresh: I just got it listed in the official MCP registry and I'm slowly telling people. Honestly I'm still figuring out what folks actually want it to do, so if you try it (or you've built something similar), I'd love to hear what tools you'd want your agent to have for YouTube.
- Repo: https://github.com/MKirovBG/scribefy-mcp
- Setup notes for each assistant (incl. a ChatGPT route): https://scribefy.app/guides/youtube-transcripts-for-ai-assistants
Anyway, that's the thing. Back to building. 🛠️