DEV Community

Andreas Offenhaeuser

How to create a streaming HTTP interface in Python?

Hey folks,

lately I've been playing around a lot with artificial intelligence and Python. What I want to build now is a web interface for a Python-based ML model (Keras etc.). For full flexibility I want the app to a) take a stream of input data and b) return a stream of output data. Each chunk in the input stream would be run through the ML model and the corresponding output streamed back as a chunk.

However, I could not find any resources online that explain how to SEND a stream response from a production-grade Python web framework. Does anyone have experience sending an HTTP stream response in Python?

Currently I'm focusing on the combination of Falcon + Gunicorn to create a web app, as Flask doesn't seem to be a production-grade framework. Although my requirements are not production-grade, I would love to figure out how to do this at such a level.

/andreas

Top comments (6)

rhymes • Edited

Hi Andreas,

Do you have special requirements?

With HTTP 1.1 you can stream data using chunked transfer encoding, which means you can send the header Transfer-Encoding: chunked and the data you want. You can see an example on MDN.
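The wire format behind chunked transfer encoding is simple. As a rough sketch (the `encode_chunk` helper below is my own illustration, not a library API):

```python
# Chunked transfer encoding wire format (HTTP/1.1):
# each chunk is "<size in hex>\r\n<data>\r\n", and a zero-length
# chunk ("0\r\n\r\n") terminates the body.
def encode_chunk(data: bytes) -> bytes:
    return f"{len(data):X}\r\n".encode() + data + b"\r\n"

body = encode_chunk(b"Hello") + encode_chunk(b", world!") + b"0\r\n\r\n"
print(body)
```

The client can start consuming chunks as they arrive, without knowing the total Content-Length up front.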

You can also stream with other protocols: HTTP 2, websockets, grpc and so on.

So if you control both the client and the server you can choose how to stream your data.

Falcon unfortunately doesn't support that header.

The actual reason is that WSGI itself (the interface underneath pretty much all Python servers) does not support Transfer-Encoding:

However, because WSGI servers and applications do not communicate via HTTP, what RFC 2616 calls "hop-by-hop" headers do not apply to WSGI internal communications. WSGI applications must not generate any "hop-by-hop" headers [4], attempt to use HTTP features that would require them to generate such headers, or rely on the content of any incoming "hop-by-hop" headers in the environ dictionary. WSGI servers must handle any supported inbound "hop-by-hop" headers on their own, such as by decoding any inbound Transfer-Encoding, including chunked encoding if applicable.

This is because WSGI servers act as middlewares, and passing hop-by-hop headers through them would break that pattern.
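That said, a WSGI application can still stream a body by returning an iterable; the server writes each item out as it is produced. A minimal sketch (the names are mine, for illustration only):

```python
def application(environ, start_response):
    """A bare WSGI app that streams its body as three chunks."""
    start_response('200 OK', [('Content-Type', 'text/plain')])

    def body():
        for i in range(3):
            # Each yielded bytestring is handed to the server as soon
            # as it is produced, instead of buffering the whole body.
            yield f"chunk {i}\n".encode()

    return body()
```

How that iterable reaches the client (buffered, chunked, or connection-close delimited) is then up to the server, not the application.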

Flask does not support it either, but they worked around it using generators. Here's my example:

import time
from datetime import datetime
from flask import Flask, Response

app = Flask(__name__)


@app.route('/time')
def doyouhavethetime():
    def generate():
        # Each yielded string is flushed to the client as it is produced
        while True:
            yield "{}\n".format(datetime.now().isoformat())
            time.sleep(1)
    # Passing a generator to Response makes Flask stream the body
    return Response(generate(), mimetype='text/plain')

and the client:

[Screenshot: flask stream output on the client]

As you can see it's not using "Transfer-Encoding", just iterating on the generator and sending data.

Another option you have is to use aiohttp which is not based on WSGI and works well with chunked streaming.

You can find an example in this article though there's a bug on line 8 of his example, the rest works :-)

Replace:

interval = int(request.GET.get('interval', 1))

with

interval = int(request.query.get('interval', 1))

The headers sent by the server:

< HTTP/1.1 200 OK
< Content-Type: text/plain
< Transfer-Encoding: chunked
< Server: Python/3.6 aiohttp/3.0.9

As you can see it supports chunked streaming.
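Putting the pieces together, a self-contained sketch of that aiohttp approach might look like this (the route, port, and chunk count are my own choices, not from the article):

```python
import asyncio
from datetime import datetime

from aiohttp import web


async def handle_time(request):
    # Query-string parameters live in request.query (not request.GET)
    interval = int(request.query.get('interval', 1))
    resp = web.StreamResponse(headers={'Content-Type': 'text/plain'})
    resp.enable_chunked_encoding()  # sends Transfer-Encoding: chunked
    await resp.prepare(request)     # writes status line and headers
    for _ in range(5):
        await resp.write(f"{datetime.now().isoformat()}\n".encode())
        await asyncio.sleep(interval)
    await resp.write_eof()
    return resp


app = web.Application()
app.add_routes([web.get('/time', handle_time)])

if __name__ == '__main__':
    web.run_app(app, port=8080)
```

Because the handler is a coroutine, the event loop can serve other clients while one stream sleeps between chunks.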

Andreas Offenhaeuser

Wow thanks for this amazing response! I will take a deeper look at aiohttp. But wrapping my code into a generator would also be a valid fallback.

Maybe I am just too spoiled by the way Node.js handles streams, so I have a hard time understanding why things are so difficult in the Python world :)

rhymes

If you're used to Node.js I'm sure you'll fit right in with aiohttp, being async and all.

About the generator trick: I haven't tried it with Gunicorn and multiple processes. I suspect it's going to hurt performance, because each process might only be able to serve one request at a time (the one generating the stream).

Can't wait to read your article on the solution ;)

Alex Miasoiedov • Edited

It's gonna work well if you don't need to handle 10+ clients simultaneously

rhymes

Hi Alex,

do you mean Flask's solution? If so, probably even fewer. If you mean aiohttp's, I'm curious to know if you tested it.

I've never used aiohttp so I don't know much.

Alex Miasoiedov

Use SSE with aiohttp; you can use websockets or Transfer-Encoding: chunked as well. Don't use any framework on top of WSGI unless you are building a server for one or a few concurrent clients.
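As a sketch of the SSE flavour (the handler name and the `sse_frame` helper are mine, not a library API):

```python
from aiohttp import web


def sse_frame(data: str) -> bytes:
    # A minimal SSE event: one "data:" line followed by a blank line
    return f"data: {data}\n\n".encode()


async def events(request):
    resp = web.StreamResponse(headers={
        'Content-Type': 'text/event-stream',
        'Cache-Control': 'no-cache',
    })
    await resp.prepare(request)
    for i in range(3):
        await resp.write(sse_frame(f"event {i}"))
    await resp.write_eof()
    return resp


app = web.Application()
app.add_routes([web.get('/events', events)])
```

On the browser side an `EventSource('/events')` would then receive each frame as a `message` event, with reconnection handled for free.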