Jethro Larson

Streaming ChatGPT API responses with Python and JavaScript


It took me a while to figure out how to get a Python Flask server and a web client to support streaming OpenAI completions, so I figured I'd share.



from flask import Flask, stream_template, request, Response
import openai
from dotenv import load_dotenv
import os

load_dotenv()
# put these values in a .env file next to this file
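# example .env contents (placeholder values, not real credentials):
#   OPENAI_ORG=org-xxxxxxxxxxxx
#   OPENAI_API_KEY=sk-xxxxxxxxxxxx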
openai.organization = os.environ.get("OPENAI_ORG")
openai.api_key = os.environ.get('OPENAI_API_KEY')
def send_messages(messages):
    return openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=messages,
        stream=True
    )

app = Flask(__name__)

@app.route('/chat', methods=['GET', 'POST'])
def chat():
    if request.method == 'POST':
        messages = request.json['messages']
        def event_stream():
            for line in send_messages(messages=messages):
                print(line)  # debug: log each raw chunk from the API
                # each streamed chunk carries a partial message in choices[0].delta
                text = line.choices[0].delta.get('content', '')
                if text:
                    yield text

        return Response(event_stream(), mimetype='text/event-stream')
    else:
        # GET request: serve the page (chat.html must live in the templates/ folder)
        return stream_template('./chat.html')

if __name__ == '__main__':
    app.run()



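The code above uses the pre-1.0 openai library (openai.ChatCompletion). If you're on the newer openai>=1.0 SDK, the streaming call and chunk shape are a little different; here's a rough, untested sketch of the equivalent pieces (everything else in the Flask app stays the same):



# assumes openai>=1.0; the client reads OPENAI_API_KEY from the environment
from openai import OpenAI

client = OpenAI()

def send_messages(messages):
    return client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=messages,
        stream=True,
    )

# in event_stream(), deltas are attributes and content may be None:
#     text = chunk.choices[0].delta.content or ''
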

chat.html (place it in a templates/ folder next to the server script):



<!DOCTYPE html>
<html>
  <head>
    <title>Chat</title>
  </head>
  <body>
    <h1>Chat</h1>
    <form id="chat-form">
      <label for="message">Message:</label>
      <input type="text" id="message" name="message">
      <button type="submit">Send</button>
    </form>
    <div id="chat-log"></div>
    <script src="{{ url_for('static', filename='chat.js') }}">
    </script>
  </body>
</html>



You can't use EventSource for this if you want to use the POST method, so this example uses the fetch API instead.
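For reference, if you did want to use EventSource, the Flask side would need a GET route that wraps each chunk in SSE framing; here's a rough, hypothetical sketch (the /chat-sse route name is made up) that reuses send_messages from the server code above:



# Hypothetical GET route for EventSource clients (EventSource only supports GET).
# Each SSE frame must be "data: ...\n\n"; chunks containing newlines would need
# to be split across multiple data: lines.
@app.route('/chat-sse')
def chat_sse():
    message = request.args.get('message', '')

    def sse_stream():
        for line in send_messages(messages=[{"role": "user", "content": message}]):
            text = line.choices[0].delta.get('content', '')
            if text:
                yield f"data: {text}\n\n"

    return Response(sse_stream(), mimetype='text/event-stream')
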

chat.js (served from the static/ folder):



const form = document.querySelector("#chat-form");
const chatlog = document.querySelector("#chat-log");

form.addEventListener("submit", async (event) => {
  event.preventDefault();

  // Get the user's message from the form
  const message = form.elements.message.value;

  // Send a request to the Flask server with the user's message
  const response = await fetch("/chat", {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
    },
    body: JSON.stringify({ messages: [{ role: "user", content: message }] }),
  });

  // Create a new TextDecoder to decode the streamed response text
  const decoder = new TextDecoder();

  // Get a reader for the response body stream
  const reader = response.body.getReader();
  let chunks = "";

  // Read the response stream as chunks and append them to the chat log
  while (true) {
    const { done, value } = await reader.read();
    if (done) break;
    chunks += decoder.decode(value, { stream: true }); // stream: true keeps multi-byte characters intact across chunks
    chatlog.innerHTML = chunks;
  }
});



Obviously this is not an optimal chat user experience, but it'll get you started.


Top comments (3)

mercm8 • Edited

I tried this and it worked great running on localhost, but when I tried deploying it to my makeshift webserver (rpi / nginx) it stopped streaming and waited for the response stream to finish before the message appeared. Any idea why?

edit: I needed to add an 'X-Accel-Buffering': 'no' header to the response, changing the code to
response = Response(event_stream(), mimetype='text/event-stream')
response.headers['X-Accel-Buffering'] = 'no'
return response

Brayden Moore

Exactly what I was looking for. Also dig the way you write code. E.g. one app route with an if/else rather than one for POSTs and another just to display the template. Nice.

Jethro Larson

Just going for the most concise article :)
