Setting Up Mirrors for YouTube Livestreams

Many people, myself included, like to have lo-fi music streams playing in the background while coding or working. The subtle rhythms of the genre give a perfect blend of ambience and presence while working on other tasks.

Recently, though, I've found YouTube taking up too much of my already pitiful bandwidth. Chrome DevTools revealed that YouTube was transferring 6.4MB of data per minute (which admittedly isn't a lot, but working from home on my phone's cellular network has made it very noticeable).

Why not use youtube-dl?

At first glance, youtube-dl looks like the perfect solution to the problem: simply select the audio-only format of the stream and chuck it into a compatible player like VLC or MPV.

Unfortunately YouTube livestreams, unlike normal videos, don't have separate audio tracks. A quick youtube-dl --list-formats (-F for short) reveals this; every format listed carries both a video codec (avc1) and an audio codec (mp4a):

chowder@blade:~$ youtube-dl -F https://www.youtube.com/watch?v=5qap5aO4i9A
format code  extension  resolution note
91           mp4        256x144     197k , avc1.42c00b, 30.0fps, mp4a.40.5
92           mp4        426x240     338k , avc1.4d4015, 30.0fps, mp4a.40.5
93           mp4        640x360     829k , avc1.4d401e, 30.0fps, mp4a.40.2
94           mp4        854x480    1380k , avc1.4d401f, 30.0fps, mp4a.40.2
95           mp4        1280x720   2593k , avc1.4d401f, 30.0fps, mp4a.40.2
96           mp4        1920x1080  4715k , avc1.640028, 30.0fps, mp4a.40.2 (best)

Even at the lowest resolution, each 5-second segment of the stream is approximately 80KB, with the audio accounting for only about 30KB of that.

chowder@blade:~$ ls -lh | grep seg.ts
-rw-r--r-- 1 chowder chowder  80K Jun  4 17:32 seg.ts
chowder@blade:~$ ffmpeg -i seg.ts -vn -acodec copy audio-only.ts
...
video:0kB audio:29kB

Clearly there are still savings to be made if we can somehow obtain only the audio part of a livestream.
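To put the numbers so far side by side, here is a quick back-of-the-envelope calculation of hourly data usage. It is only a rough sketch: it assumes 1MB = 1000KB, and the helper name mb_per_hour is made up for this example.

```shell
# Hypothetical helper: convert "KB transferred every N seconds" into MB per hour.
# Assumes 1 MB = 1000 KB.
mb_per_hour() {
    awk -v kb="$1" -v secs="$2" 'BEGIN { printf "%.1f\n", kb * 3600 / secs / 1000 }'
}

echo "YouTube in the browser (6.4MB per minute): $(mb_per_hour 6400 60) MB/hour"
echo "Lowest-quality segments (80KB per 5s):     $(mb_per_hour 80 5) MB/hour"
echo "Audio only (29KB per 5s):                  $(mb_per_hour 29 5) MB/hour"
```

That works out to roughly 384MB, 58MB and 21MB per hour respectively, so stripping the video cuts even the lowest-quality stream to a little over a third of its size.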

Setting up a mirror

The idea that came to me was to set up a mirror of the YouTube livestream that strips the video from the stream segments, passing through only the audio component.

(This would obviously have to be hosted from outside my home network to avoid the bandwidth costs I'm trying to shave in the first place.)

FFmpeg to the rescue

Instead of writing an HLS playlist parser and then mapping the download URL of each segment to a URL on my mirror, I realised FFmpeg already provides all of this functionality: it can ingest an HLS playlist URL as input and output a local HLS playlist.

To do this, I first retrieved the YouTube HLS playlist URL with youtube-dl:

HLS_PLAYLIST_URL=$(youtube-dl -f worst -g "$YOUTUBE_LIVESTREAM_URL")

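The -f worst parameter simply instructs youtube-dl to use the lowest-quality playlist offered. The video is discarded anyway, and the audio is similar across playlists (though the listing above does show formats 91 and 92 using mp4a.40.5, i.e. HE-AAC, rather than the mp4a.40.2 AAC-LC of the higher formats).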
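If the stream is offline or youtube-dl fails, the command above leaves you with an empty variable. A slightly more defensive version might look like the sketch below. The function name get_hls_url and the YTDL variable are my own additions for illustration; only the -f worst -g flags come from the command above.

```shell
# Hypothetical wrapper around the youtube-dl call above.
# YTDL lets you point at a different binary; it defaults to youtube-dl.
YTDL="${YTDL:-youtube-dl}"

get_hls_url() {
    # Capture the playlist URL; bail out if the command itself fails.
    url=$("$YTDL" -f worst -g "$1") || return 1
    # Treat an empty result as a failure too.
    [ -n "$url" ] || return 1
    printf '%s\n' "$url"
}

# Example usage:
#   HLS_PLAYLIST_URL=$(get_hls_url "$YOUTUBE_LIVESTREAM_URL") || echo "could not resolve stream" >&2
```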

Then I passed the URL obtained to FFmpeg to build a local HLS playlist:

ffmpeg -i "$HLS_PLAYLIST_URL" \
    -c:a copy \
    -ac 2 \
    -vn \
    -f hls \
    -hls_time 5 \
    -hls_flags delete_segments \
    -hls_list_size 4 \
    stream.m3u8

Breaking down the parameters used:

-c:a copy - Use stream copy mode for the audio, which stops ffmpeg from re-encoding the audio stream
-ac 2 - Use 2 audio channels (likely redundant here, since stream copy passes the audio through unchanged)
-vn - No video output
-f hls - Output an HLS playlist
-hls_time 5 - The target length of each stream segment in seconds
-hls_flags delete_segments - Deletes any segment files no longer referenced by the playlist file; think of it as "garbage collection" for the segment files
-hls_list_size 4 - Keep at most 4 segments listed in the playlist at any time
stream.m3u8 - The name of the local HLS playlist file generated

Running the above command generated a stream.m3u8 file in the current directory, and a series of stream<n>.ts files each corresponding to a segment of the stream. The stream.m3u8 file was also periodically replaced as the livestream progressed.
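To check what the mirror is currently offering, you can inspect the playlist directly. The sketch below relies only on the HLS playlist format itself: lines starting with # are tags, every other non-blank line is a segment URI, and each #EXTINF tag carries a segment duration. The function names are made up for this example.

```shell
# List the segment files a playlist currently references
# (every non-blank line that doesn't start with '#' is a segment URI).
list_segments() {
    grep -v '^#' "$1" | grep -v '^[[:space:]]*$'
}

# Sum the #EXTINF durations to see how many seconds the playlist covers.
window_seconds() {
    awk -F'[:,]' '/^#EXTINF:/ { total += $2 } END { printf "%.0f\n", total }' "$1"
}

# Example usage:
#   list_segments stream.m3u8
#   window_seconds stream.m3u8
```

With -hls_time 5 and -hls_list_size 4, the window should come out at around 20 seconds, give or take however far the actual segment lengths drift from the 5-second target.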

Hosting the playlist

I logged into an EC2 instance that I already had, and wrapped the commands above in a simple shell script:

#!/bin/sh
set -e

cd /var/www/html/hls-mirror

youtube_url="https://www.youtube.com/watch?v=dQw4w9WgXcQ"

hls_url=$(youtube-dl -f worst -g "$youtube_url")

exec ffmpeg -i "$hls_url" \
    -c:a copy \
    -ac 2 \
    -vn \
    -f hls \
    -hls_time 5 \
    -hls_flags delete_segments \
    -hls_list_size 4 \
    -hide_banner \
    -loglevel error \
    stream.m3u8

I then set up the script as a daemon with systemd:

[Unit]
Description=HLS Mirror

[Service]
ExecStart=/usr/local/bin/run-hls-mirror
Restart=always
RestartSec=3

[Install]
WantedBy=multi-user.target
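Installing the unit might look something like the sketch below. It assumes the unit file was saved as /etc/systemd/system/hls-mirror.service (a file name I picked for illustration) and that the script at /usr/local/bin/run-hls-mirror is executable. The enable_mirror function and the SYSTEMCTL variable are likewise hypothetical conveniences.

```shell
# Assumption: the unit above was saved as /etc/systemd/system/hls-mirror.service
# (the file name is hypothetical; pick whatever suits you).
# Set SYSTEMCTL=echo to preview the commands instead of running them.
SYSTEMCTL="${SYSTEMCTL:-sudo systemctl}"

enable_mirror() {
    unit="${1:-hls-mirror.service}"
    # 1. Make systemd pick up the new unit file.
    # 2. Start the service now and on every boot.
    # 3. Show whether it is actually running.
    $SYSTEMCTL daemon-reload &&
    $SYSTEMCTL enable --now "$unit" &&
    $SYSTEMCTL --no-pager status "$unit"
}

# Example usage:
#   enable_mirror
```

If the service keeps restarting, journalctl -u hls-mirror.service should show the ffmpeg errors, since the script only logs at the error level.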

The playlist and segment files were then served through Apache, though you might find this list of static HTTP server one-liners useful for similar purposes.

That's pretty much it - at this point I was able to open a network stream to the URL of the stream.m3u8 file in VLC, and start enjoying my low-footprint audio livestream.
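Before pointing a player at the mirror, it can be worth checking that the web server is really returning a playlist rather than an error page. The sketch below relies on the fact that every HLS playlist begins with the #EXTM3U tag. The function name is_hls_playlist and the MIRROR_URL placeholder are assumptions of mine, standing in for wherever your stream.m3u8 ends up being served.

```shell
# Succeeds if the data on stdin looks like an HLS playlist,
# i.e. its first line is the #EXTM3U tag.
is_hls_playlist() {
    # Strip a trailing carriage return in case of Windows-style line endings.
    first_line=$(head -n 1 | tr -d '\r')
    [ "$first_line" = "#EXTM3U" ]
}

# Example usage (MIRROR_URL is a placeholder for your own stream.m3u8 URL):
#   curl -fsS "$MIRROR_URL" | is_hls_playlist && mpv --no-video "$MIRROR_URL"
```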

Setting up a frontend player

I also decided to set up a web-based frontend so that I could easily access the livestream without needing to download another app for it.

Video.js seemed like a good choice for this sort of thing, and so I hand-rolled some HTML, CSS and JS to put together a simple site hosted on GitHub Pages.

Finally, I can continue having lo-fi music in the background of my Zoom calls without getting cut off... 😃

Can you do this with AWS Lambda?

Requiring an entire VPS for this purpose may be a little overkill for some. In my case I had already been running an instance for other reasons, so the entire project came at "zero" additional cost, but it got me wondering whether the hosting cost could be reduced by going serverless.

The first thing I considered was to define a set of Lambda functions that would:

Parse the playlist file from YouTube
Generate and store a mapping for each segment file
Return a playlist file with the segment URLs mapped to your own
On request, fetch the original segment files, and return them with the video components stripped
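The playlist-rewriting steps above could be sketched roughly as follows. This is an illustration rather than something I deployed: the /segment/<n> URL scheme, the tab-separated mapping file and the function name rewrite_playlist are all invented for the example.

```shell
# Hypothetical sketch of the playlist-rewriting step.
# Reads an upstream playlist on stdin, writes the rewritten playlist to stdout,
# and records "index<TAB>original URL" pairs in the mapping file given as $2.
rewrite_playlist() {
    awk -v base="$1" -v map="$2" '
        # Tags and blank lines pass through untouched.
        /^#/ || /^[[:space:]]*$/ { print; next }
        {
            n++
            # Remember where segment n came from.
            printf "%d\t%s\n", n, $0 > map
            # Point the player at the mirror instead.
            printf "%s/segment/%d\n", base, n
        }
    '
}

# Example usage:
#   curl -fsS "$HLS_PLAYLIST_URL" | rewrite_playlist "https://mirror.example" segments.tsv
```

A real implementation would need indices that stay stable across playlist refreshes (keyed on the media sequence number, for instance), which this sketch glosses over.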

Unfortunately the HLS API for YouTube is set up in such a way that you can only stream it from a single source IP address, while each invocation of a Lambda may come from a different IP address.

While there are multiple ways to set up static IP addresses for Lambdas, all of them require setting up a VPC with a NAT Gateway, which will cost you at minimum $30/month per availability zone.

In the end, with EC2 instances being as cheap as they are at the low end (especially reserved instances), I didn't see a compelling reason to optimise costs any further.