How the “YouTube to MP4” App Works
A technical deep dive into the pipeline, components, and packaging
- High-level overview
The YouTube to mp4.exe app is a self-contained Windows program that takes a YouTube URL and produces an .mp4 file on disk. Behind the scenes it’s a Python application wrapped into a standalone .exe. The main building blocks are:
1) GUI layer: A resizable desktop interface built with customtkinter (a themed version of Tkinter).
2) Download core: yt_dlp, an advanced YouTube/media downloader library.
3) Transcode / merge layer: ffmpeg, a command-line multimedia tool used to stitch audio and video into one MP4.
4) Packaging layer: A PyInstaller / auto-py-to-exe bundle that ships Python, yt_dlp, and ffmpeg in one file so it runs on a Windows machine without needing Python installed.
The flow looks like this:
[User pastes URL + picks quality/bitrate in GUI]
|
v
[yt_dlp probes YouTube]
|
v
[App picks the right video stream + best matching audio]
|
v
[Download video-only + audio-only]
|
v
[ffmpeg merges them]
|
v
[Final MP4 saved]
At very low resolutions (like 360p), YouTube often provides “progressive” MP4s that already contain both audio and video in one file. In that case the pipeline is shorter: no merge step is needed. At higher qualities (720p, 1080p, 1440p, 2160p/4K), YouTube usually splits audio and video into separate tracks, so you have to download both and mux them together.
The app automates all of that so the user just clicks Download.
- The GUI layer (CustomTkinter)
Tech used
- tkinter is Python’s standard library GUI toolkit.
- customtkinter is an enhanced theming layer on top of tkinter that gives you modern-looking widgets, dark mode styling, sliders, switches, etc.
From inspecting the bundled executable, we can see customtkinter modules (customtkinter.windows.widgets.ctk_button, ctk_slider, ctk_label, etc.) embedded in the binary. That tells us the interface isn’t just bare Tkinter — it’s using CustomTkinter widgets for cleaner layout and nicer visuals.
What the interface does
The GUI typically provides:
- A text field to paste the YouTube URL.
- Controls to choose output quality (e.g. 360p → 4K).
- A bitrate / quality slider for video.
- Toggle-style options (for example: “video+audio MP4” vs “audio-only MP3/M4A”, or similar optional features).
- A start/download button.
- A progress area or status label to show what’s happening.
There was also work done so the window is resizable, because long labels / switches on the far right were getting cut off in earlier fixed-size versions. You can see evidence of that in an internal string:
YouTube_yt_dlp_GUI_v2_resizable
So “v2_resizable” is basically UX polish:
- The window can stretch horizontally.
- Text on the far-right no longer gets clipped.
- Sliders and toggle switches stay visible.
Event flow
When you click Download:
1) The GUI grabs all the current settings (URL, chosen resolution/bitrate, etc.).
2) It calls into the download function (a Python function in your app).
3) While download/transcode is happening, the GUI can update the status text (“Downloading video…”, “Merging audio…”, etc.) instead of freezing silently.
That last part usually uses either:
- after() callbacks in Tkinter to poll progress, or
- threading so the UI doesn’t lock while yt_dlp and ffmpeg are running.
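The two techniques are often combined: a worker thread does the slow work and pushes status text onto a queue, while after() drains that queue on the GUI thread. The sketch below is an assumption about how this could be wired, not the app's actual code; run_in_background and poll_messages are hypothetical names, and widget stands for any Tk or CustomTkinter object that has an after() method.

```python
import queue
import threading


def run_in_background(job, messages):
    """Run job(report) on a worker thread; report(text) queues a status line."""
    def worker():
        try:
            job(messages.put)            # the job calls report("Downloading video...")
            messages.put("Done!")
        except Exception as exc:         # surface failures instead of dying silently
            messages.put(f"Error: {exc}")

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    return thread


def poll_messages(widget, messages, show, interval_ms=100):
    """Drain queued messages on the GUI thread, then reschedule via after()."""
    try:
        while True:
            show(messages.get_nowait())
    except queue.Empty:
        pass
    # Tk widgets are only touched here, on the GUI thread, never in the worker.
    widget.after(interval_ms, poll_messages, widget, messages, show, interval_ms)
```

In a real window, show would typically be something like a function that sets the status label's text, and poll_messages would be called once right after the window is created.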
- Format discovery (getting the available qualities)
When you paste a YouTube link, the app doesn’t just blindly download. Step one is “probe the video.”
Under the hood this is yt_dlp.YoutubeDL().extract_info(url, download=False).
That call:
- Connects to YouTube.
- Collects metadata (title, duration, channel, etc.).
- Enumerates all available formats: every resolution, container type, codec, bitrate, whether it has audio, etc.
The result is basically a big Python dict. A simplified version of what one entry in formats might look like:
{
    "format_id": "248",
    "ext": "webm",
    "vcodec": "vp9",
    "acodec": "none",
    "height": 1080,
    "fps": 30,
    "tbr": 2500.12,  # total bitrate in kbps (video only here, since acodec is none)
    "filesize": 12345678,
    "url": "https://r3---sn-abc123.googlevideo.com/videoplayback?..."
}
Some formats are audio-only (acodec is set, vcodec is none).
Some are video-only (vcodec is set, acodec is none).
Some (usually ≤480p) are “progressive,” meaning they include both audio and video in one file.
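The three categories can be told apart purely from the vcodec and acodec fields. The helper below is a minimal sketch of that check; classify_format is a hypothetical name, not something yt_dlp provides, and it assumes a missing codec is reported as the string "none" or left out of the dict.

```python
def classify_format(fmt):
    """Label one yt_dlp format entry by which streams it carries."""
    has_video = fmt.get("vcodec") not in (None, "none")
    has_audio = fmt.get("acodec") not in (None, "none")
    if has_video and has_audio:
        return "progressive"
    if has_video:
        return "video-only"
    if has_audio:
        return "audio-only"
    return "other"  # neither stream, e.g. storyboard images
```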
Your GUI uses this data to populate:
- The resolution dropdown / slider.
- The bitrate slider (because tbr gives an approximate bitrate).
- Potentially which formats are even allowed. For example, if the user chooses 4K but the video only goes up to 1080p, the app can warn them or silently fall back.
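As a rough illustration of how the dropdown could be filled and how a too-high request could fall back, here is a small sketch. Both function names are hypothetical, and clamp_height assumes the list of heights is not empty.

```python
def available_heights(formats):
    """Sorted, de-duplicated heights of all formats that carry video."""
    return sorted({
        f["height"] for f in formats
        if f.get("vcodec") not in (None, "none") and f.get("height")
    })


def clamp_height(requested, heights):
    """Highest available height not above the request, else the lowest one."""
    candidates = [h for h in heights if h <= requested]
    return max(candidates) if candidates else min(heights)
```

With heights of 360, 720 and 1080 available, a 4K request (2160) would be clamped to 1080.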
- Choosing the right streams
Once the user picks a target quality, the app applies selection logic. In English:
1) If a progressive MP4 exists at the requested quality (in practice, only at low resolutions):
- Grab that single file. Done.
2) Else (adaptive streaming path):
- Find the best video-only stream that matches the requested resolution or bitrate.
- Find the best audio-only stream (often m4a or opus).
- Download both separately.
- Merge them into final MP4.
This two-track path is required for HD and above because YouTube serves HD/FullHD/4K as DASH/HLS adaptive segments: video and audio are delivered separately.
Some versions of your app include a manual bitrate slider for video. That slider influences which yt_dlp format is chosen. Instead of always taking “bestvideo”, it can pick the stream with a total bitrate closest to the slider value. That’s handy if you want smaller file sizes instead of always forcing the top-bitrate variant.
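The selection rules and the bitrate slider can be sketched as one pure function over the formats list. This is an illustration under assumptions, not the app's real logic: pick_streams is a hypothetical name, and it takes the single-file path only when no video-only stream within the height limit is taller than the best progressive MP4.

```python
def pick_streams(formats, max_height, target_kbps=None):
    """Return (video, audio) format dicts; audio is None for a progressive pick."""
    def has(fmt, key):
        return fmt.get(key) not in (None, "none")

    def height(fmt):
        return fmt.get("height") or 0

    def kbps(fmt):
        return fmt.get("tbr") or 0

    fitting = [f for f in formats if has(f, "vcodec") and height(f) <= max_height]
    progressive = [f for f in fitting if has(f, "acodec") and f.get("ext") == "mp4"]
    video_only = [f for f in fitting if not has(f, "acodec")]
    audio_only = [f for f in formats if has(f, "acodec") and not has(f, "vcodec")]

    best_prog = max(progressive, key=lambda f: (height(f), kbps(f)), default=None)
    top = max((height(f) for f in video_only), default=0)

    # Single-file path: nothing adaptive is taller than the progressive file,
    # or there is no audio track to pair an adaptive video with.
    if best_prog and (height(best_prog) >= top or not audio_only):
        return best_prog, None
    if not video_only or not audio_only:
        raise ValueError("no usable streams at or below the requested height")

    tallest = [f for f in video_only if height(f) == top]
    if target_kbps is None:
        video = max(tallest, key=kbps)                                  # like "bestvideo"
    else:
        video = min(tallest, key=lambda f: abs(kbps(f) - target_kbps))  # closest to slider
    audio = max(audio_only, key=lambda f: f.get("abr") or kbps(f))
    return video, audio
```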
- Downloading the streams
yt_dlp can either:
- Be asked to download the formats directly to disk, or
- Be asked just to give you the direct media URLs, and then you download them yourself.
Most YouTube download tools do something like:
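import yt_dlp

ydl_opts = {
    "outtmpl": "C:/path/%(title)s.%(ext)s",
    # Best video up to 1080p plus best audio, else best single file up to 1080p.
    "format": "bestvideo[height<=1080]+bestaudio/best[height<=1080]",
    "merge_output_format": "mp4",
    # Raw string, so \t and \f are not read as escape sequences.
    "ffmpeg_location": r"path\to\ffmpeg.exe",
}
# url holds the YouTube link the user pasted.
with yt_dlp.YoutubeDL(ydl_opts) as ydl:
    ydl.download([url])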
That format string is powerful. It literally tells yt_dlp:
- Try to grab best video up to 1080p plus best audio.
- If that fails, fall back to best progressive format up to 1080p.
- Output as MP4 in the end (strictly, this part comes from merge_output_format rather than the format string).
In your app, this logic is effectively wrapped behind GUI choices instead of requiring the user to know the syntax. The GUI converts “1080p” + “target bitrate 4 Mbps” + “MP4 output” into actual yt_dlp options.
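A sketch of that translation step, assuming the GUI hands over a maximum height, an optional bitrate limit in kbps, and an output folder. build_ydl_opts and its parameters are hypothetical. The bitrate value is applied here as an upper bound through a tbr filter, which is a simpler rule than picking the closest match.

```python
import os


def build_ydl_opts(max_height, out_dir, max_kbps=None, ffmpeg_path=None):
    """Translate GUI choices into a yt_dlp options dict."""
    video = f"bestvideo[height<={max_height}]"
    if max_kbps is not None:
        video += f"[tbr<={max_kbps}]"      # cap total bitrate when the slider is set
    opts = {
        # Adaptive pair first, then the best single file as a fallback.
        "format": f"{video}+bestaudio/best[height<={max_height}]",
        "merge_output_format": "mp4",
        "outtmpl": os.path.join(out_dir, "%(title)s.%(ext)s"),
    }
    if ffmpeg_path:
        opts["ffmpeg_location"] = ffmpeg_path
    return opts
```

The returned dict can be passed straight to yt_dlp.YoutubeDL(...).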
While downloading, yt_dlp also calls progress hooks (and, for the merge step, postprocessor hooks). The app can register callbacks that receive events like:
- “downloading: 34.5%”
- “postprocessing: merging formats”
That’s how the GUI can update a label or progress bar live.
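A progress hook is just a function that receives a dict per event. The sketch below turns such a dict into a status line; describe_progress is a hypothetical name, and the keys it reads (status, downloaded_bytes, total_bytes, total_bytes_estimate) are the ones yt_dlp documents for progress hooks.

```python
def describe_progress(event):
    """Turn a yt_dlp progress-hook dict into a short status string."""
    status = event.get("status")
    if status == "downloading":
        total = event.get("total_bytes") or event.get("total_bytes_estimate")
        done = event.get("downloaded_bytes") or 0
        if total:
            return f"Downloading: {100 * done / total:.1f}%"
        return f"Downloading: {done} bytes"   # total size not known yet
    if status == "finished":
        return "Download finished, post-processing..."
    if status == "error":
        return "Download failed"
    return ""
```

It would be registered through the progress_hooks option, for example with a small wrapper that passes the resulting string on to the GUI. Hooks run on whichever thread called download(), so the string should be handed to the GUI thread rather than written to a widget directly.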
- Merging audio + video into MP4
When the video-only and audio-only files are finished, we end up with something like:
- temp_video.webm (VP9) or temp_video.mp4 (H.264), depending on the stream chosen
- temp_audio.m4a (AAC)
To turn those into one playable .mp4, the app uses ffmpeg. ffmpeg is a command-line tool that can mux streams together, convert codecs, change containers, etc.
A typical merge command (conceptually) looks like:
ffmpeg -i temp_video.webm -i temp_audio.m4a ^
-c:v copy -c:a aac ^
"FinalVideo.mp4"
Key points:
- -c:v copy means “don’t re-encode the video, just copy the stream.” That preserves quality and speeds things up.
- -c:a aac re-encodes the audio to AAC. When the source is already AAC (as with an .m4a file), -c:a copy avoids that re-encode; re-encoding is only needed when the source audio (for example Opus) is not wanted in the MP4.
- Output is a normal MP4 that basically every player can open.
In many GUI-driven downloaders, you never see this step because it’s done automatically. yt_dlp can even call ffmpeg for you as a “postprocessor,” so you don’t always have to shell out manually. The executable includes yt_dlp.postprocessor.ffmpeg, so we know that ffmpeg postprocessing is built-in.
In plain terms: the tool quietly runs ffmpeg behind the scenes to deliver a final MP4 that “just works.”
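If the merge were done by hand rather than through yt_dlp's postprocessor, it could be driven from Python roughly as follows. This is a sketch with hypothetical function names; it assumes both input streams already use codecs the MP4 container accepts, so both are copied rather than re-encoded.

```python
import subprocess


def build_merge_command(video_path, audio_path, output_path, ffmpeg="ffmpeg"):
    """Argument list for muxing one video and one audio file without re-encoding."""
    return [
        ffmpeg, "-y",                        # -y: overwrite the output if it exists
        "-i", video_path,
        "-i", audio_path,
        "-map", "0:v:0", "-map", "1:a:0",    # video from input 0, audio from input 1
        "-c:v", "copy", "-c:a", "copy",
        output_path,
    ]


def merge(video_path, audio_path, output_path, ffmpeg="ffmpeg"):
    """Run ffmpeg and raise CalledProcessError if it exits with an error."""
    command = build_merge_command(video_path, audio_path, output_path, ffmpeg)
    subprocess.run(command, check=True)
```

Passing the arguments as a list avoids shell quoting problems with titles that contain spaces.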
- Handling YouTube quirks (consent, age gates, etc.)
YouTube sometimes walls off higher-quality or certain tracks behind extra checks (cookie consent, age verification, region restrictions, etc.).
That’s why:
- You might see low-res formats (like 360p) download fine,
- but 1080p / 4K fails with messages about consent or “sign in to confirm your age.”
The app tries to solve the “normal” path (public videos, no login). For restricted videos, two things can happen:
1) yt_dlp throws an error saying it can’t fetch the high-res formats.
2) The merge step never triggers because the high-res video-only stream never came down.
There’s been discussion around importing cookies from Chrome / Opera GX so yt_dlp can pretend to be your logged-in browser. That’s a common workaround: passing a cookies file lets yt_dlp access formats your browser is allowed to see. The fact that browsers like Opera GX and Chrome were mentioned means the tool is being pushed in that direction, even if it’s not fully automated.
In other words:
- Low resolutions = easier, single file, no special access.
- High resolutions / HDR / age-restricted videos = may require browser cookies or fail if you’re “unauthenticated.”
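For reference, yt_dlp accepts cookies either from an exported cookies file (the cookiefile option) or directly from an installed browser (the cookiesfrombrowser option, a tuple whose first element is the browser name). The helper below is a hypothetical sketch of how a GUI setting could be mapped onto those options; whether the app does this is not confirmed by the source.

```python
def with_cookies(opts, cookie_file=None, browser=None):
    """Return a copy of opts with one cookie source added, if any was given."""
    opts = dict(opts)                             # leave the caller's dict untouched
    if cookie_file:
        opts["cookiefile"] = cookie_file          # Netscape-format cookies.txt
    elif browser:
        opts["cookiesfrombrowser"] = (browser,)   # e.g. ("chrome",)
    return opts
```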
- Output management
After the merge, the app:
- Builds a sane filename (usually based on the YouTube title, cleaned so it’s a valid Windows filename).
- Saves the final MP4 to a chosen folder (often Downloads or the working directory).
- (Optionally) deletes the temporary temp_video / temp_audio files once the final MP4 is confirmed to exist.
Sanitizing the filename matters because YouTube titles can contain characters Windows doesn’t like, such as : or ?.
A typical sanitiser does something like:
import re

def clean_name(title):
    # Replace characters Windows forbids in filenames: < > : " / \ | ? *
    return re.sub(r'[<>:"/\\|?*]', '_', title)
That prevents Windows Explorer from complaining.
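The cleanup rule from the list above (delete the temporary files only once the final MP4 is confirmed to exist) could look like the sketch below. cleanup_temp_files is a hypothetical name, and treating an empty output file as a failed merge is an added assumption.

```python
import os


def cleanup_temp_files(final_path, temp_paths):
    """Delete temp files only after the final MP4 exists and is non-empty."""
    if not (os.path.isfile(final_path) and os.path.getsize(final_path) > 0):
        return []                      # keep the parts so the merge can be retried
    removed = []
    for path in temp_paths:
        if os.path.isfile(path):
            os.remove(path)
            removed.append(path)
    return removed
```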
- Packaging into a standalone .exe
Python scripts normally require:
- A Python interpreter,
- Your .py files,
- Third-party modules (yt_dlp, customtkinter, etc.),
- And ffmpeg.exe somewhere on disk.
That’s annoying to ship to non-technical users. So the project is bundled into a self-contained Windows executable.
This was likely done with auto-py-to-exe, which is a wrapper around PyInstaller that gives you a point-and-click interface to build .exes.
Here’s roughly what happens in packaging:
1) Your Python source + libraries get analyzed.
2) All the bytecode, plus resource files, plus ffmpeg.exe, is stuffed into a single archive.
3) PyInstaller’s bootloader (a small C program) is placed in front of that archive.
When you run Youtube to mp4.exe, the bootloader:
- Creates a temporary extraction folder (often a _MEIxxxxx folder in %TEMP%).
- Unpacks Python, your code, yt_dlp, ffmpeg, and all needed DLLs there.
- Executes your main script as if you just ran python main.py.
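Because the files are unpacked to a temporary folder, code inside a one-file bundle has to look there for ffmpeg.exe. PyInstaller exposes that folder as sys._MEIPASS at run time. The helper below is a common idiom shown as a sketch; bundled_path is a hypothetical name, and the fallback to the current directory when running from source is an assumption about the project layout.

```python
import os
import sys


def bundled_path(relative_name):
    """Resolve a bundled file both inside a PyInstaller bundle and from source."""
    # sys._MEIPASS only exists when running from a PyInstaller bundle.
    base = getattr(sys, "_MEIPASS", os.path.abspath("."))
    return os.path.join(base, relative_name)
```

The result of bundled_path("ffmpeg.exe") is the kind of value that would be passed as ffmpeg_location.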
Because of this, the .exe you provided is fairly large — on the order of tens of megabytes. It’s not “just the script,” it’s Python runtime + libraries + ffmpeg + everything else.
Advantages of this approach:
- The end user does not need Python installed.
- The UI and download logic behave consistently across systems.
- You can add an icon and a nice filename so it feels like a normal Windows app.
- Error handling and UX polish
There are a few user-experience choices baked into the current design:
- Resizable window: solves clipped widgets (like the far-right toggle text).
- Quality slider / dropdown: instead of dumping a giant advanced list of formats (248, 251, bestvideo+bestaudio/best etc.), the tool shows “720p,” “1080p,” “4K,” etc.
- Bitrate slider: lets you choose file size vs quality without needing to know internal tbr values.
- Progress messages: things like “Downloading video…”, “Merging audio…”, “Done!”. This gives feedback during long downloads.
- Graceful fallback: if 4K fails because YouTube blocks access, the tool can either tell you why or drop to a working format (like 360p progressive) instead of silently crashing.
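One way to express the graceful fallback is to try the available heights from highest to lowest and remember why each attempt failed. This is a sketch, not the app's confirmed behaviour; download_with_fallback is a hypothetical name and download stands for whatever function actually performs one attempt at a given height.

```python
def download_with_fallback(download, heights):
    """Try each height from highest to lowest; return (height_used, errors)."""
    errors = []
    for height in sorted(heights, reverse=True):
        try:
            download(height)
            return height, errors
        except Exception as exc:   # with yt_dlp this is usually yt_dlp.utils.DownloadError
            errors.append(f"{height}p failed: {exc}")
    raise RuntimeError("all qualities failed: " + "; ".join(errors))
```

The collected errors let the GUI explain why the result is lower quality than requested instead of switching silently.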
- Putting it all together
Here’s the full lifecycle of one download request:
1) User input
- You paste a YouTube URL into the GUI.
- You pick a resolution/bitrate with the sliders and toggles.
2) Probe formats
- The app calls yt_dlp in “info only” mode to list available formats and metadata.
3) Pick strategy
- If there’s a progressive MP4 at the target quality: choose that.
- Otherwise, choose the best video-only stream that matches your quality/bitrate, and the best audio-only stream.
4) Download
- The app downloads those streams.
- The GUI updates status as it goes.
5) Mux / Merge
- ffmpeg (bundled inside the .exe) merges video+audio into one MP4.
- No manual ffmpeg knowledge needed: it runs invisibly in the background.
6) Cleanup and save
- The final MP4 is renamed using a cleaned-up video title.
- Temporary chunks get deleted.
- The GUI reports success.
7) You watch the video
- You now have a normal .mp4 that will open in VLC, Windows Media Player, etc.
- Why this design works well
- No external setup: Everything is in one EXE, so a non-technical user can double-click it on Windows and immediately use it.
- GUI instead of command line: People don’t have to memorize yt-dlp or ffmpeg arguments.
- Quality control: The app exposes “quality,” “resolution,” and “bitrate” as sliders / dropdowns instead of scary codec jargon.
- Automatic merge: The tool handles adaptive streaming (audio/video split) automatically, so HD and 4K downloads become a single click.
In short, the app is a friendly wrapper around a pretty advanced chain:
scrape → choose formats → download streams → transcode/mux → save MP4,
all orchestrated through a Python GUI and shipped as a Windows-native .exe.
Here is the repo! https://github.com/Coolythecoder/Youtube-to-mp4