
Muhammad Zaid


I Built an Open-Source YouTube Scraper for Python, No API Key Needed

Every time I needed YouTube data for a project (search results, channel videos, transcripts), I ended up installing three different libraries, none of which played well together or supported async.

So I built tubescrape: a single Python package that handles search, channels, transcripts, and playlists by talking directly to YouTube's internal InnerTube API.

No API key. No OAuth. No quotas. Just pip install tubescrape.

What It Does

from tubescrape import YouTube

with YouTube() as yt:
    # Search with filters
    results = yt.search('python tutorial', max_results=5, type='video', duration='long')

    # Browse a channel
    videos = yt.get_channel_videos('@lexfridman', max_results=10)
    shorts = yt.get_channel_shorts('@mkbhd')

    # Get transcript and save as subtitles
    transcript = yt.get_transcript('dQw4w9WgXcQ')
    transcript.save('subtitles.srt')

    # Translate transcript
    transcript = yt.get_transcript('dQw4w9WgXcQ', translate_to='es')

    # Scrape playlist
    playlist = yt.get_playlist('PLrAXtmErZgOeiKm4sgNOknGvNjby9efdf')

Every result is a frozen dataclass with .to_dict() for instant JSON serialization.
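
To make the frozen-dataclass pattern concrete, here is a minimal sketch of how such a result type can work. VideoResult and its fields are hypothetical names chosen for illustration; tubescrape's actual classes and field names may differ.

```python
import json
from dataclasses import FrozenInstanceError, asdict, dataclass


@dataclass(frozen=True)
class VideoResult:
    """Hypothetical result type; tubescrape's real field names may differ."""
    video_id: str
    title: str
    duration_seconds: int

    def to_dict(self) -> dict:
        # asdict() recursively converts dataclass fields into plain dicts
        return asdict(self)


video = VideoResult(video_id="abc123", title="Example video", duration_seconds=212)
payload = json.dumps(video.to_dict())  # plain dicts serialize directly to JSON

try:
    video.title = "changed"  # frozen dataclasses reject attribute assignment
except FrozenInstanceError:
    mutated = False
else:
    mutated = True
```

Freezing the dataclass means a result cannot be modified by accident after it is returned, and instances can be used as dictionary keys or set members as long as their fields are hashable.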

Three Interfaces

1. Python SDK

Import the library, call methods, get structured data back. Every result object serializes to a dictionary with .to_dict(), so you can dump it to JSON, store it in a database, or pass it to an API.

import json
from tubescrape import YouTube

# Use the context manager, as in the first example, so resources are released on exit
with YouTube() as yt:
    search = yt.search('python programming', max_results=10)

print(json.dumps(search.to_dict(), indent=4))

2. CLI

pip install "tubescrape[cli]"
tubescrape search "python" -n 5
tubescrape transcript dQw4w9WgXcQ --format srt --save output.srt
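
For readers unfamiliar with the srt format requested above, the sketch below shows how timed segments map onto SubRip cues. It is not tubescrape's implementation: the (start, end, text) tuple shape and both function names are assumptions made for illustration.

```python
def srt_timestamp(seconds: float) -> str:
    """Format seconds as the HH:MM:SS,mmm timestamp used by SRT files."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, millis = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def to_srt(segments: list[tuple[float, float, str]]) -> str:
    """Render hypothetical (start, end, text) tuples as numbered SRT cues."""
    cues = []
    for index, (start, end, text) in enumerate(segments, start=1):
        # Each cue: a 1-based index, a time range line, then the subtitle text
        cues.append(f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text}\n")
    # Cues are separated by a blank line
    return "\n".join(cues)


srt_text = to_srt([(0.0, 2.5, "Hello"), (2.5, 5.04, "world")])
```

Each cue carries its own start and end time, which is why a transcript with per-segment timing converts to subtitles without extra alignment work.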

3. REST API

pip install "tubescrape[api]"
tubescrape serve --port 8000
# Swagger docs at http://localhost:8000/docs

Async Support

Every method has an async variant (asearch is the async counterpart of search, for example), so the library fits into FastAPI apps, Discord bots, or any other asyncio application. Use asyncio.gather to run multiple requests concurrently:

import asyncio
from tubescrape import YouTube

async def main():
    async with YouTube() as yt:
        r1, r2, r3 = await asyncio.gather(
            yt.asearch('python'),
            yt.asearch('javascript'),
            yt.asearch('rust'),
        )

asyncio.run(main())
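
When the list of queries grows, it can help to cap how many requests are in flight at once. The sketch below shows one way to do that with asyncio.Semaphore. fake_search is a stand-in coroutine so the example runs without network access; in real code it would be replaced by a call such as yt.asearch(query).

```python
import asyncio


async def fake_search(query: str) -> str:
    """Stand-in for a network call such as yt.asearch(query)."""
    await asyncio.sleep(0.01)
    return f"results for {query}"


async def search_all(queries: list[str], limit: int = 2) -> list[str]:
    semaphore = asyncio.Semaphore(limit)  # at most `limit` searches in flight

    async def bounded(query: str) -> str:
        async with semaphore:
            return await fake_search(query)

    # gather() returns results in the same order as the awaitables passed in
    return await asyncio.gather(*(bounded(q) for q in queries))


results = asyncio.run(search_all(["python", "javascript", "rust"]))
```

Because gather preserves input order, results[0] always corresponds to the first query, regardless of which request finishes first.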

How It Works

tubescrape uses YouTube's InnerTube API, the same internal API that the YouTube website and mobile apps use. No HTML scraping, no Selenium, no headless browsers. Just structured HTTP requests and JSON responses.

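This approach tends to be more reliable than HTML scraping because the structure of the JSON responses changes less often than the page's DOM. InnerTube is still an undocumented internal API, though, so it can change without notice.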
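
To give a rough idea of what a "structured HTTP request" to InnerTube looks like, here is a sketch that builds a search request without sending it. The endpoint path, client name, and client version are assumptions based on how the web client is commonly observed to behave, not values taken from tubescrape. The sketch also uses the standard library's urllib instead of httpx so that it runs with no extra dependencies.

```python
import json
import urllib.request

# Assumed values: InnerTube is undocumented, so the endpoint path, client name,
# and client version below are illustrative and may not match what tubescrape sends.
INNERTUBE_SEARCH_URL = "https://www.youtube.com/youtubei/v1/search"
CLIENT_CONTEXT = {"client": {"clientName": "WEB", "clientVersion": "2.20240101.00.00"}}


def build_search_request(query: str) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request for a search query."""
    body = json.dumps({"context": CLIENT_CONTEXT, "query": query}).encode("utf-8")
    return urllib.request.Request(
        INNERTUBE_SEARCH_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_search_request("python tutorial")
# Sending it would look like: json.load(urllib.request.urlopen(request))
```

The response to a request like this is nested JSON, so the remaining work is walking that structure and mapping it onto result objects, which is where parser tests earn their keep.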

The Stack

The only core dependency is httpx (an async-capable HTTP client). The CLI adds click and rich as optional dependencies, and the REST API adds FastAPI and uvicorn. The package is built with hatchling, has 146 tests covering unit, integration, and parser logic, and CI runs on GitHub Actions across Ubuntu, Windows, and macOS with Python 3.10 through 3.13.

Links

GitHub: github.com/zaidkx37/tubescrape
PyPI: pypi.org/project/tubescrape
Docs: Full Usage Guide

MIT licensed. Feedback and contributions welcome.

If it's useful to you, a star on GitHub would mean a lot.
