Solve Computer Science

Posted on Jun 19, 2025

YouTube channel mirror on Jekyll - part 2

#youtube #python #ai #chatgpt

🧩 The problem

Continuing from the previous post, we'll now turn the shell script into a Python one. The goal is to pull feeds from YouTube directly from Python, so the first step is to translate the original command-line options into Python API calls. This lets the script hand yt-dlp only the videos it needs instead of re-scanning the whole channel every time, which might result in IP throttling or temporary bans.

yt-dlp is also capable of working with text files in two ways:

to know which files have already been downloaded by saving their ids
to store the options instead of manually passing them via the shell or API
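
To make the first feature concrete, here is a minimal sketch of the idea behind a download archive: keep one id per line in a text file and skip anything already listed. It is a plain-Python illustration, not yt-dlp's implementation; the function names and the file name archive.txt are made up for this example.

```python
import tempfile
from pathlib import Path


def load_seen(archive: Path) -> set[str]:
    """Return the ids already recorded in the archive file (empty if missing)."""
    if not archive.exists():
        return set()
    lines = archive.read_text(encoding='utf-8').splitlines()
    return {line.strip() for line in lines if line.strip()}


def mark_seen(archive: Path, video_id: str) -> None:
    """Append one id to the archive file."""
    with archive.open('a', encoding='utf-8') as f:
        f.write(video_id + '\n')


def filter_new(archive: Path, candidate_ids: list[str]) -> list[str]:
    """Keep only the ids not yet in the archive, preserving order."""
    seen = load_seen(archive)
    return [vid for vid in candidate_ids if vid not in seen]


if __name__ == '__main__':
    with tempfile.TemporaryDirectory() as tmp:
        archive = Path(tmp, 'archive.txt')   # hypothetical file name
        mark_seen(archive, 'abc123')         # pretend this video was downloaded
        print(filter_new(archive, ['abc123', 'def456']))
```

In yt-dlp itself this role is played by the --download-archive option.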

We're not going to use these two features today.

⚠️ Warning

⚠️⚠️ Before continuing, please only mirror content you have permission to... ⚠️⚠️

✅ The solution

🤖 Vibe coding

I'm lazy, so I asked GPT o4-mini (via Duck.ai) how to do it. I trimmed some of the output to make it easier to read.

original shell command

yt-dlp "${URL}" \
    --verbose \
    --fixup detect_or_warn \
    --prefer-ffmpeg \
    --sub-langs "en,it" \
    --write-subs \
    --embed-subs \
    --write-auto-sub \
    --prefer-free-formats \
    --no-call-home \
    --no-overwrites \
    --recode webm \
    --add-metadata \
    --write-thumbnail \
    --convert-thumbnails png \
    --exec "echo \"%(title)s\" > ${DST_DIR}/%(id)s/title.txt" \
    --exec "echo \"%(description)s\" > ${DST_DIR}/%(id)s/description.txt" \
    --output "${DST_DIR}/%(id)s/%(id)s.%(ext)s"

Starting from this command, and setting URL='my youtube channel url', please transform the shell command to the Python calls

o4-mini

Here’s how you can do the same download in Python using yt-dlp’s API. You define essentially the same options in a dict, register a small progress hook to emit your “exec” behavior, and then call .download().

import os
from yt_dlp import YoutubeDL

# ========== configuration ==========

URL     = 'https://www.youtube.com/c/YourChannelNameHere'
DST_DIR = '/path/to/your/download/folder'

# Make sure the DST_DIR exists
os.makedirs(DST_DIR, exist_ok=True)

# ========== progress hook to write title & description ==========
def write_meta_hook(d):
    """
    This hook is called multiple times; we care about when download
    of a single video is finished so we can write title/description.
    """
    if d['status'] == 'finished':
        info = d.get('info_dict', {})
        video_id    = info.get('id')
        title       = info.get('title', '')
        description = info.get('description', '')

        # Create a per-video subfolder, mirror --output structure
        vid_dir = os.path.join(DST_DIR, video_id)
        os.makedirs(vid_dir, exist_ok=True)

        # write out title.txt and description.txt
        with open(os.path.join(vid_dir, 'title.txt'), 'w', encoding='utf-8') as f:
            f.write(title)
        with open(os.path.join(vid_dir, 'description.txt'), 'w', encoding='utf-8') as f:
            f.write(description)

# ========== yt-dlp options ==========
ydl_opts = {
    # general
    'verbose':             True,
    'no_overwrites':       True,
    'nocallhome':          True,
    'add_metadata':        True,
    'fixup':               'detect_or_warn',
    'prefer_ffmpeg':       True,
    'recode_video':        'webm',

    # subtitles
    'sub_langs':           ['en', 'it'],
    'writesubtitles':      True,   # --write-subs
    'writeautomaticsub':   True,   # --write-auto-sub
    'embedsubtitles':      True,

    # formats
    'prefer_free_formats': True,

    # thumbnails
    'writethumbnail':      True,
    'convert_thumbnails':  'png',

    # output template
    'outtmpl': os.path.join(DST_DIR, '%(id)s', '%(id)s.%(ext)s'),

    # post hooks
    'progress_hooks': [write_meta_hook],
}

# ========== run the download ==========
with YoutubeDL(ydl_opts) as ydl:
    ydl.download([URL])

🔎 Clues

I then double-checked whether all the options in the generated code actually exist. Surprise! They do not.

By opening the yt-dlp source code, specifically the YoutubeDL class, you'll see all the supported API options.
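
A rough way to automate that check is to look for each key in the class docstring. The sketch below is only a heuristic under that assumption (a key counts as documented if a line starts with `key:`); the helper name `undocumented_keys` and the sample docstring are made up so the example runs without yt-dlp installed.

```python
import re


def undocumented_keys(opts: dict, doc: str) -> list[str]:
    """Return the keys of `opts` that never appear as `key:` at the start of a line in `doc`."""
    missing = []
    for key in opts:
        pattern = rf'^\s*{re.escape(key)}:'
        if not re.search(pattern, doc, flags=re.MULTILINE):
            missing.append(key)
    return missing


# Stand-in docstring excerpt (illustrative only, not copied from yt-dlp).
SAMPLE_DOC = """
    verbose:           Print additional info to stdout.
    writesubtitles:    Write the video subtitles to a file
"""

if __name__ == '__main__':
    candidate = {'verbose': True, 'writesubtitles': True, 'embedsubtitles': True}
    print(undocumented_keys(candidate, SAMPLE_DOC))

    # With yt-dlp installed, the real docstring can be checked instead.
    try:
        from yt_dlp import YoutubeDL
        print(undocumented_keys(candidate, YoutubeDL.__doc__ or ''))
    except ImportError:
        pass
```

This is only a first pass: a match in the docstring does not prove the option behaves like its CLI counterpart, so reading the source is still needed.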

🆘 Trying to help the bot help me

Me

i'll tell you the missing options from the yt-dlp code. you need to fix them:

'recode_video', 'embedsubtitles', 'convert_thumbnails',

o4-mini

The three fixed option names are:

recode_video: your --recode webm remux into WebM

embed_subtitles: your --embed-subs

convert_thumbnails: your --convert-thumbnails png

It turns out these fixes are wrong too: two of the names are unchanged, and the renamed third one doesn't exist either.

😌 Script that saved the day

The yt-dlp repository contains a handy utility called cli_to_api.py, which translates shell options into their Python equivalents. One of the maintainers suggests using that script for this use case. Here's how to run it:

clone the repository

  git clone https://github.com/yt-dlp/yt-dlp

install locally

  cd yt-dlp
  python3 -m venv .venv
  . .venv/bin/activate
  pip install .

run the script

  cd devscripts
  python -m cli_to_api "--verbose"

This is the output for the --verbose option:

The arguments passed translate to:

[debug] Override config: ['--verbose']
{'verbose': True}

Combining these with the CLI defaults gives:

[debug] Override config: ['--verbose']
{'extract_flat': 'discard_in_playlist',
 'fragment_retries': 10,
 'ignoreerrors': 'only_download',
 'postprocessors': [{'key': 'FFmpegConcat',
                     'only_multi_video': True,
                     'when': 'playlist'}],
 'retries': 10,
 'verbose': True}

You can of course pass multiple options, each one between quotes:

python -m cli_to_api "--output" "DST_DIR/%(id)s/%(id)s.%(ext)s"

Which results in:

The arguments passed translate to:

{'outtmpl': {'default': 'DST_DIR/%(id)s/%(id)s.%(ext)s'}}

Combining these with the CLI defaults gives:

{'extract_flat': 'discard_in_playlist',
 'fragment_retries': 10,
 'ignoreerrors': 'only_download',
 'outtmpl': {'default': 'DST_DIR/%(id)s/%(id)s.%(ext)s'},
 'postprocessors': [{'key': 'FFmpegConcat',
                     'only_multi_video': True,
                     'when': 'playlist'}],
 'retries': 10}
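
To translate several flags in one go without fighting shell quoting, the invocation can also be assembled in Python. This is a small sketch, assuming it is run from the devscripts directory of the cloned repository; the helper name `cli_to_api_command` is made up, and only a subset of the original flags is listed.

```python
import sys

# A few flags from the original shell command (DST_DIR is a placeholder, as above).
FLAGS = [
    '--sub-langs', 'en,it',
    '--write-subs',
    '--embed-subs',
    '--recode', 'webm',
    '--convert-thumbnails', 'png',
    '--output', 'DST_DIR/%(id)s/%(id)s.%(ext)s',
]


def cli_to_api_command(flags: list[str]) -> list[str]:
    """Build the argv for `python -m cli_to_api <flags...>`."""
    return [sys.executable, '-m', 'cli_to_api', *flags]


if __name__ == '__main__':
    cmd = cli_to_api_command(FLAGS)
    print(cmd[1:])
    # To actually run it from inside devscripts:
    # import subprocess; subprocess.run(cmd, check=True)
```

Passing the arguments as a list means each flag and value reaches the script unchanged, with no extra quoting needed.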

🎯 Result

The write_meta_hook provided by GPT seems correct enough. However, to keep things simple I decided to translate the original shell exec options as directly as possible. I also prefer pathlib to os.path for managing paths.
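
As a small illustration of both choices, the sketch below builds a per-video path with pathlib and turns it into a shell command string that writes one template field to a file. The helper name `heredoc_cmd` is made up for this example, and the `%(...)s` placeholders are left for yt-dlp to fill in.

```python
import posixpath
from pathlib import PurePosixPath

DST_DIR = '/srv/http/videos'


def heredoc_cmd(filename: str, field: str) -> str:
    """Shell command that writes one template field into DST_DIR/<id>/<filename>."""
    target = PurePosixPath(DST_DIR, '%(id)s', filename)
    # The quoted 'EOF' delimiter keeps the shell from expanding $ or backticks in the text.
    return f"cat > {target} << 'EOF'\n%({field})s\nEOF"


if __name__ == '__main__':
    # pathlib and os.path build the same string; only the spelling differs.
    via_pathlib = str(PurePosixPath(DST_DIR, '%(id)s', 'title.txt'))
    via_os_path = posixpath.join(DST_DIR, '%(id)s', 'title.txt')
    assert via_pathlib == via_os_path
    print(heredoc_cmd('title.txt', 'title'))
```

One caveat of the heredoc approach: a title or description containing a line that is exactly EOF would end the heredoc early.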

Here's the complete result:

import pathlib
import sys
from yt_dlp import YoutubeDL

DST_DIR = '/srv/http/videos'

ydl_opts: dict = {
    'verbose':             True,
    'overwrites':          False,  # API form of --no-overwrites
    'call_home':           False,
    'add_metadata':        True,
    'fixup':               'detect_or_warn',
    'prefer_ffmpeg':       True,
    'subtitleslangs':      ['en', 'it'],
    'writesubtitles':      True,
    'writeautomaticsub':   True,
    'prefer_free_formats': True,
    'writethumbnail':      True,
    'final_ext':           'webm',
    'outtmpl': {
        'default': str(pathlib.Path(DST_DIR,'%(id)s','%(id)s.%(ext)s'))
    },
    'postprocessors': [
        {
            # --convert-thumbnails png
            'format': 'png',
            'key': 'FFmpegThumbnailsConvertor',
            'when': 'before_dl'
        },
        {
            # --recode webm
            'key': 'FFmpegVideoConvertor',
            'preferedformat': 'webm'
        },
        {
            'exec_cmd': ["cat > " + str(pathlib.Path(DST_DIR, '%(id)s', 'title.txt')) + " << 'EOF'\n"
                         '%(title)s\n'
                         'EOF'],
            'key': 'Exec',
            'when': 'after_move'
        },
        {
            'exec_cmd': ["cat > " + str(pathlib.Path(DST_DIR, '%(id)s', 'description.txt')) + " << 'EOF'\n"
                         '%(description)s\n'
                         'EOF'],
            'key': 'Exec',
            'when': 'after_move'
        },
    ],
}

if __name__ == '__main__':
    # RSS feed code here: TODO
    # This will replace the single URL passed to the `download` method.

    url: str = sys.argv[1]

    with YoutubeDL(ydl_opts) as ydl:
        ydl.download([url])

Assuming it is saved as mirror_yt.py, the script works the same way as the shell version:

python -m mirror_yt "video or channel URL"

🎉 Conclusion

As you have seen, this is yet more evidence that vibe coding does not always work.

Next time we'll integrate the YouTube RSS feeds into the script, as I did in the first post of this series.

You can comment here and check my YouTube channel.

Top comments (2)

Daniel Martin • Dec 13 '25 • Edited

This was a really interesting read—thanks for taking the time to document the whole journey instead of just dropping the final script.

I’ve run into the same issue when moving from yt-dlp CLI to the Python API. On the surface it looks like you can just translate flags 1:1, but once you dig into YoutubeDL options, things get messy fast. The cli_to_api.py tip is gold—honestly surprised more people don’t mention it when working with yt-dlp programmatically. I only found it after breaking things a few times myself.

Your takeaway about vibe coding really resonates. GPT is great for scaffolding and ideas, but when you’re dealing with tools that have a lot of implicit behavior (like yt-dlp postprocessors), you still have to read the source or at least validate every option. I’ve learned to treat AI-generated config dicts as “educated guesses” rather than solutions.

This also ties into how a lot of users approach YouTube access in general. Many people rely on modified apps for convenience—such as ad-free playback, background play, or offline downloads—without realizing how fragile those setups can be. For example, solutions like this version often depend on patched clients or unofficial endpoints that can break overnight after a backend or app update. That’s very similar to what happens when yt-dlp options are used without fully understanding their side effects.

The key difference is control. With yt-dlp, you own the logic and can inspect, debug, or adapt it as YouTube changes things. With mod-based apps, failures are usually silent, and users are left waiting for someone else to fix them.

Also nice call on switching to pathlib. It makes these kinds of scripts much easier to reason about, especially once you start expanding things (RSS feeds, scheduling, etc.). That level of transparency and maintainability is exactly what most mod-based solutions lack.

Curious to see how you handle the RSS integration next—are you planning to dedupe entries manually, or rely on yt-dlp’s internal mechanisms once you wire that in?

Amelia Smith • Jan 25 • Edited

This was a really good read — especially the part where you showed why the initial GPT-generated options were misleading instead of just saying “AI got it wrong.” The cli_to_api.py tip is gold. I’ve used yt-dlp for years but somehow never dug into that script, and it honestly clears up so much confusion when moving from shell flags to API usage.

I’ve run into the same issue with options that look correct (embed_subtitles, convert_thumbnails, etc.) but silently do nothing because the API keys are different or handled via postprocessors. Your approach of translating the original CLI behavior verbatim feels way safer than trying to re-invent the config in Python.

Also agree on pathlib — once you switch, going back to os.path feels painful 😅
The Exec postprocessor trick for writing title/description is clever too. I usually dump metadata via JSON and parse it later, but your method keeps the mirror structure really clean for static site generators like Jekyll.
