Pratyush Singh

Posted on Jan 30, 2018

The Taste of Media Streaming with Flask

#flask #python #streaming

Photo by Austin Neill on Unsplash (Edited)

AnyAudio is a hobby project that I started with my friends in the second year of my engineering degree. As of now, it consists of an API server (which also serves media 😅), a React-based PWA (at least that’s what I am trying to make it) and an Android app. There is a GitHub organization dedicated to the different components of the project.

In this blog post, I will discuss the path which our streaming API went through to become good enough for daily use.

But where do you get ’em media from?

“Where can you find every single version of every song you have ever listened to?”

Simple, YouTube!

YouTube hosts every song that ever existed and will continue to do so. Not just that, it also has various formats and bit-rates for audio that it hosts. Not to mention the accuracy of search results that it presents.

Keeping these things in mind, we decided to use YouTube as the source of data for everything — search results, playlists, media, autocomplete and what not.

Most of this data was scraped from YouTube, but to fetch the audio URL of a video we used youtube-dl (a command-line tool to download videos from YouTube and many other websites).

Flow of Requests for Streaming

To stream an audio file, a client needs to follow this flow of requests:

Get the details for the audio it wants to play. As of now, there are two ways to do this — using search and from predefined playlists.
Once the client has details of the audio, it requests the stream_url from the server (when needed).
The server then responds with a URL where the client can make the request to get the actual audio. This is the URL it can embed in the <audio../> tag or ExoPlayer source to directly play the song. It is again served by the AnyAudio server.
The server acts as a middleman and redirects the data from YouTube to the client.

The first question that may come to your mind is, “Why does the client need to make an extra request to fetch the actual stream URL?”

This is because finding out the actual media URL is a time-consuming process. youtube-dl takes a fair amount of time to resolve the resource location of a YouTube video. Fetching it in advance for every item in the search results or playlists would make those requests slow and hurt the UX. So, we decided to generate it on demand, when the client asks for it.

This is how we got the stream API working. But in order to make it usable, we had to make a lot of improvements.

Bruh, do you even seek?

The stream API provided a final URL. But when this URL was used as a source, the resulting player was not seekable. This behaviour was not acceptable for a media playing website.

After looking for a solution on the Internet (like this Stack Overflow question), I found out that partial content support is necessary for creating media elements that are seekable.

For clients to know that the server supports partial content, the server needs to add the following header to the response:

"Accept-Ranges": "bytes"

This tells the client that the server is capable of serving partial content, specified as a range of bytes.

To add this header to responses, one can use the following decorator:

@app.after_request
def after_request(response):
    response.headers.add('Accept-Ranges', 'bytes')
    return response

After this, clients know that the server supports partial content, and they start sending a request header with the byte range they want. It looks something like this:

Range: bytes=0-1023
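For illustration, here is a minimal sketch of how a server could interpret that header. `parse_range` is a hypothetical helper (it is not part of AnyAudio or Flask), and it assumes a single range in one of the three common forms: `bytes=0-1023`, `bytes=1024-` (open-ended) and `bytes=-500` (the last 500 bytes). Both ends of a range are inclusive.

```python
import re

# Hypothetical helper for illustration; handles a single byte range only.
_RANGE_RE = re.compile(r"^bytes=(\d*)-(\d*)$")


def parse_range(header, size):
    """Return inclusive (start, end) positions, or None if the header is
    malformed or cannot be satisfied for a resource of `size` bytes."""
    match = _RANGE_RE.match(header.strip())
    if not match or size <= 0:
        return None
    first, last = match.groups()
    if not first and not last:
        return None
    if not first:
        # Suffix form "bytes=-N": the last N bytes of the resource
        n = int(last)
        if n == 0:
            return None
        return max(size - n, 0), size - 1
    start = int(first)
    if start >= size:
        return None
    # An open-ended or oversized end is clamped to the last byte
    end = min(int(last), size - 1) if last else size - 1
    if end < start:
        return None
    return start, end
```

A `None` result would typically map to a 416 (Range Not Satisfiable) response or to serving the whole file, depending on how strict the server wants to be.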

So we now had to serve only this part of the media file when asked. In Flask, this can be done as follows:

r = requests.get(url)
range_header = request.headers.get('Range', None)
if range_header:  # Client has requested partial content
  size = int(r.headers.get('content-length'))  # Actual size of song

  # Look up the range; the end byte is optional and inclusive
  m = re.search(r'(\d+)-(\d*)', range_header)
  g = m.groups()
  byte1, byte2 = 0, None
  if g[0]:
    byte1 = int(g[0])
  if g[1]:
    byte2 = min(int(g[1]), size - 1)  # Never read past the last byte
  length = size - byte1
  if byte2 is not None:
    length = byte2 + 1 - byte1
  data = r.content[byte1: byte1 + length]  # Trim the data from server

  # Prepare response
  rv = Response(data, 206, mimetype=mime, direct_passthrough=True)
  rv.headers.add('Content-Range', 'bytes {0}-{1}/{2}'.format(byte1, byte1 + length - 1, size))
  return rv

# No partial content, handle normally

After this, the seekbar was functional and the user could go to any position on a playing audio.
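One detail that is easy to get wrong here is that the end of a byte range is inclusive, while the end of a Python slice is exclusive. The sketch below shows the bookkeeping on an in-memory buffer, without Flask. `partial_response` is a hypothetical helper, and it assumes `start` and `end` have already been validated against the size of the data.

```python
def partial_response(data, start, end):
    """Build (status, headers, body) for the inclusive byte range
    start..end of `data`. Hypothetical helper for illustration."""
    # Range ends are inclusive, slice ends are exclusive, hence the +1
    body = data[start:end + 1]
    headers = {
        "Accept-Ranges": "bytes",
        "Content-Range": "bytes {0}-{1}/{2}".format(start, end, len(data)),
        "Content-Length": str(len(body)),
    }
    return 206, headers, body
```

For a 2048-byte buffer and the range 0-1023, this yields a 1024-byte body with `Content-Range: bytes 0-1023/2048`, so the length implied by the header matches the number of bytes sent.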

The problem with long songs

Streaming audio on AnyAudio was now a better experience than before. But there is a problem here. For every media request made to the server, the first thing done is fetching the media from YouTube servers (see line 1 in the code above). And after this, a fixed part of the media is sent as a response to the client.

The lengthy songs present on YouTube, such as jukeboxes and music compilations, have a large file size just for the audio. And considering the fact that there is a backend request made by the media player for every seek, listening to long audio would take forever.

The solution to this is quite simple — ask just for what you need.

Ask just for what you need

Earlier, every media request made to the server resulted in downloading the whole audio from YouTube before serving any of it. This problem was solved by:

Limiting the size of media served by the server with each stream request.
Requesting partial content from YouTube.

The size limit restricts the amount of data requested from YouTube for each request, so response times become much more predictable than before. Here is a piece of code that explains the process:

range_header = request.headers.get('Range', None)
if range_header:
    from_bytes, until_bytes = range_header.replace('bytes=', '').split('-')
    if not until_bytes:  # No end byte was given, so set it to start + 3MB
        until_bytes = int(from_bytes) + 1024 * 1024 * 3  # 1MB * 3 = 3MB

    # Get only what is required from YouTube
    headers = {'Range': 'bytes=%s-%s' % (from_bytes, until_bytes)}
    r = requests.get(url, headers=headers)
    data = r.content

    # Generate response
    rv = Response(data, 206, mimetype=mime, direct_passthrough=True)
    rv.headers.add('Content-Range', r.headers.get('Content-Range'))
    return rv

By sending the Range header with the request made to YouTube, the server downloads only the requested slice of the audio before serving it to the client, instead of the whole file.
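The header translation can also be looked at in isolation. The sketch below is a variation on the snippet above, not the exact AnyAudio logic: `upstream_range` is a hypothetical helper, it caps explicit end bytes as well as open-ended ones, and it subtracts one from the window because range ends are inclusive. It assumes a single range with an explicit start, so suffix ranges such as `bytes=-500` are not handled.

```python
WINDOW = 3 * 1024 * 1024  # 3MB, the per-request limit used in this post


def upstream_range(range_header, window=WINDOW):
    """Translate the client's Range header into the one sent upstream,
    never asking for more than `window` bytes. Hypothetical helper."""
    first, _, last = range_header.replace("bytes=", "", 1).partition("-")
    start = int(first)
    # Both ends are inclusive, so a window of N bytes ends at start + N - 1
    capped_end = start + window - 1
    end = min(int(last), capped_end) if last else capped_end
    return "bytes={0}-{1}".format(start, end)
```

Serving less than the client asked for is still a valid 206 response, since the client learns which bytes it actually received from the `Content-Range` header and requests the rest when it needs it.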

One more step forward — Connecting Streams

With this change, streaming was a lot better than before. But this was still not the optimal performance that one expects from a media streaming service.

The client doesn’t get a response from the server until the server has downloaded the partial data from YouTube and prepared the response. This can be improved by streaming the content from YouTube directly to the user. In simple words, by connecting the data stream from YouTube to the connection where the client is expecting a reply.

To enable streaming in requests, we can simply add the stream=True parameter while making the request. Combining this with Flask’s streaming abilities, we can start serving data to the client right away, without waiting for the request to YouTube to complete. Here is a rough sample of how this would work:


def generate_data_from_response(resp, chunk=2048):
    for data_chunk in resp.iter_content(chunk_size=chunk):
        yield data_chunk


def serve_partial(url, range_header, mime, size=3145728):
    from_bytes, until_bytes = range_header.replace('bytes=', '').split('-')
    if not until_bytes:
        until_bytes = int(from_bytes) + size  # Default size is 3MB

    # Make request to YouTube
    headers = {'Range': 'bytes=%s-%s' % (from_bytes, until_bytes)}
    r = requests.get(url, headers=headers, stream=True)

    # Build response
    rv = Response(generate_data_from_response(r), 206, mimetype=mime,
                  direct_passthrough=True)
    rv.headers.add('Content-Range', r.headers.get('Content-Range'))
    rv.headers.add('Content-Length', r.headers['Content-Length'])
    return rv

This resulted in a Time to First Byte (TTFB) value of around 400ms on a dual-core server with average client speed. Though this is nowhere near the TTFB of requests made to YouTube directly from the same client (around 15ms), it is an achievement for a hobby project with a cheap server.
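To see why chunked streaming helps TTFB, here is a small stdlib-only sketch of the same generator pattern. An `io.BytesIO` object stands in for the upstream response body, and `iter_chunks` is a hypothetical stand-in for `resp.iter_content`; neither is part of the AnyAudio code.

```python
import io


def iter_chunks(stream, chunk_size=2048):
    """Yield successive chunks read from a file-like object until it
    is exhausted. Hypothetical stand-in for resp.iter_content."""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        yield chunk


upstream = io.BytesIO(b"x" * 5000)  # stand-in for the upstream body
chunks = iter_chunks(upstream)
first = next(chunks)  # ready after reading only the first 2048 bytes
```

The first chunk can be handed to the client while the remaining bytes are still unread, which is what a generator-backed response relies on.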

What are we planning next?

As of now, it is very difficult to invest time in AnyAudio. But we are really looking forward to introducing API v3 to the server which allows more straightforward requests and playlist support. This would mean that users would be able to listen to complete YouTube playlists on the web app.

We are also planning to improve the suggestions made on the web app. As of now, they are based only on the song that is currently playing. They could be a lot better if they took into account the user’s listening habits, search queries, how often a song shows up in the user’s suggestions and so on.

Integrating this would require the users to create an account on the platform, which we do not wish to include as we just want the platform to be as hassle-free as possible. So for this, we are planning just to use the browser’s local storage. Additional features would include saving songs and listening to them whenever the user wants. Offline listening is also something that we are looking to integrate once it turns out to be a good PWA.

Parting words

I have learned a lot from this project and hope to learn a lot more. We would love it if you dropped by and starred our GitHub repositories. And if you are a developer who loves to fix some dirty code and introduce awesome test cases, feel free to get in touch with us on our Gitter channel.
