Deon Pillsbury

Create a streaming AI assistant with ChatGPT, FastAPI, WebSockets and React βœ¨πŸ€–πŸš€

A Generative Pre-trained Transformer (GPT) is a type of Large Language Model (LLM), and these models are the hot topic in the technology world this year, with many companies scrambling to add them to their products. Creating and training these large models can be very complex, time-consuming, and expensive, so you may think this technology is out of reach. But companies like OpenAI have done a ton of work to create useful models and set up platforms exposing APIs to use them. If you have ever used an API where you send some data in, it does some magic behind the scenes, and you get some data back in a response, then you can integrate this cutting-edge technology into your application. Let’s take a look at how we can set up a full-stack web app that lets us ask questions of OpenAI and stream the response.

⭐️ The complete source code referenced in this guide is available on GitHub
https://github.com/dpills/ai-assistant

In order to use the OpenAI API you will need to sign up for an account, generate an API key, and add it to a .env file in your new project folder.

πŸ“Β .env

OPENAI_API_KEY=sk-YWUedpcl1xiGvGiD4xTwT3TlbkFJx9Sgnt8s0QYNxxxxxxxx

Install the API dependencies along with the openai Python library.

πŸ“Β pyproject.toml

[tool.poetry]
name = "ai-assistant"
version = "0.1.0"
description = ""
authors = ["dpills"]
readme = "README.md"

[tool.poetry.dependencies]
python = "^3.11"
openai = "^1.2.3"
python-dotenv = "^1.0.0"
fastapi = "^0.104.1"
uvicorn = { extras = ["standard"], version = "^0.24.0.post1" }
websockets = "^12.0"

[build-system]
requires = ["poetry-core"]
build-backend = "poetry.core.masonry.api"
$ poetry install
...
  β€’ Installing fastapi (0.104.1)
  β€’ Installing openai (1.2.3)
...
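If you are not using Poetry, installing the same dependencies with plain pip should work just as well (an equivalent command, not from the original setup):

$ pip install openai python-dotenv fastapi "uvicorn[standard]" websockets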

Create a Python file and import the OpenAI library, which uses the OPENAI_API_KEY from the environment variables to authenticate. In the request options, set stream to True and use an asynchronous generator to stream the response chunks as they are returned. We are using the GPT-3.5 Turbo model, which is available on the free trial, but you can swap this out for a newer model such as GPT-4 if you have access to it.

πŸ“Β main.py

from typing import AsyncGenerator

from dotenv import load_dotenv
from openai import AsyncOpenAI

load_dotenv()

client = AsyncOpenAI()

async def get_ai_response(message: str) -> AsyncGenerator[str, None]:
    """
    OpenAI Response
    """
    response = await client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[
            {
                "role": "system",
                "content": (
                    "You are a helpful assistant, skilled in explaining "
                    "complex concepts in simple terms."
                ),
            },
            {
                "role": "user",
                "content": message,
            },
        ],
        stream=True,
    )

    all_content = ""
    async for chunk in response:
        content = chunk.choices[0].delta.content
        if content:
            all_content += content
            # Yield the full accumulated text so far, so the client can
            # simply replace what it has rendered on each message
            yield all_content

That is the basic setup for adding ChatGPT to your application, which is pretty simple! πŸ˜ƒΒ 
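If you want to sanity-check the generator before wiring up the API, a small standalone script does the trick (a hypothetical demo.py saved next to main.py; note that each yield is the full accumulated text so far, not just the new delta):

import asyncio

from main import get_ai_response

async def demo() -> None:
    final_text = ""
    async for text in get_ai_response("Explain WebSockets in one sentence"):
        final_text = text  # each yield is the accumulated response so far
    print(final_text)

asyncio.run(demo())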

Now let's add a WebSocket endpoint with the FastAPI WebSocket support so that we have a persistent, bi-directional connection over which we can stream the response from ChatGPT to our web app in real time.

πŸ“Β main.py

from typing import AsyncGenerator, NoReturn

import uvicorn
from dotenv import load_dotenv
from fastapi import FastAPI, WebSocket
from openai import AsyncOpenAI

load_dotenv()

app = FastAPI()
client = AsyncOpenAI()

...

@app.websocket("/ws")
async def websocket_endpoint(websocket: WebSocket) -> NoReturn:
    """
    Websocket for AI responses
    """
    await websocket.accept()
    while True:
        message = await websocket.receive_text()
        async for text in get_ai_response(message):
            await websocket.send_text(text)

if __name__ == "__main__":
    uvicorn.run(
        "main:app",
        host="0.0.0.0",
        port=8000,
        log_level="debug",
        reload=True,
    )
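One thing to be aware of: when the browser tab closes, receive_text() raises a WebSocketDisconnect, which bubbles out of the loop above as an error in the logs. A minimal sketch of a more forgiving version of the same endpoint (assuming you simply want the handler to end quietly on disconnect):

from fastapi import WebSocket, WebSocketDisconnect

@app.websocket("/ws")
async def websocket_endpoint(websocket: WebSocket) -> None:
    """
    Websocket for AI responses
    """
    await websocket.accept()
    try:
        while True:
            message = await websocket.receive_text()
            async for text in get_ai_response(message):
                await websocket.send_text(text)
    except WebSocketDisconnect:
        # The client went away; let the handler return and close cleanly
        pass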

Finally, for this simple example, we can have the API serve our web app by returning an index.html file on the root route.

πŸ“Β main.py

from typing import AsyncGenerator, NoReturn

import uvicorn
from dotenv import load_dotenv
from fastapi import FastAPI, WebSocket
from fastapi.responses import HTMLResponse
from openai import AsyncOpenAI

load_dotenv()

app = FastAPI()
client = AsyncOpenAI()

with open("index.html") as f:
    html = f.read()

async def get_ai_response(message: str) -> AsyncGenerator[str, None]:
    """
    OpenAI Response
    """
    response = await client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[
            {
                "role": "system",
                "content": (
                    "You are a helpful assistant, skilled in explaining "
                    "complex concepts in simple terms."
                ),
            },
            {
                "role": "user",
                "content": message,
            },
        ],
        stream=True,
    )

    all_content = ""
    async for chunk in response:
        content = chunk.choices[0].delta.content
        if content:
            all_content += content
            # Yield the full accumulated text so far, so the client can
            # simply replace what it has rendered on each message
            yield all_content

@app.get("/")
async def web_app() -> HTMLResponse:
    """
    Web App
    """
    return HTMLResponse(html)

@app.websocket("/ws")
async def websocket_endpoint(websocket: WebSocket) -> NoReturn:
    """
    Websocket for AI responses
    """
    await websocket.accept()
    while True:
        message = await websocket.receive_text()
        async for text in get_ai_response(message):
            await websocket.send_text(text)

if __name__ == "__main__":
    uvicorn.run(
        "main:app",
        host="0.0.0.0",
        port=8000,
        log_level="debug",
        reload=True,
    )

Now create the index.html file using the CDN development builds of React, transpiled in the browser with the standalone Babel build, along with the Material UI component library. The app connects to the WebSocket, submits new questions to it, and renders the markdown response as it streams back from the API.

πŸ“Β index.html

<!DOCTYPE html>
<html lang="en">

<head>
    <meta charset="utf-8" />
    <title>AI Assistant πŸ€“</title>
    <meta name="viewport" content="initial-scale=1, width=device-width" />
    <script src="https://unpkg.com/react@latest/umd/react.development.js" crossorigin="anonymous"></script>
    <script src="https://unpkg.com/react-dom@latest/umd/react-dom.development.js"></script>
    <script src="https://unpkg.com/@mui/material@latest/umd/material-ui.development.js"
        crossorigin="anonymous"></script>
    <script src="https://unpkg.com/@babel/standalone@latest/babel.min.js" crossorigin="anonymous"></script>
    <script src="https://cdn.jsdelivr.net/npm/marked/marked.min.js"></script>
    <link rel="preconnect" href="https://fonts.googleapis.com" />
    <link rel="preconnect" href="https://fonts.gstatic.com" crossorigin />
    <link rel="stylesheet"
        href="https://fonts.googleapis.com/css2?family=Roboto:wght@300;400;500;600;700&display=swap" />
    <link rel="stylesheet" href="https://fonts.googleapis.com/icon?family=Material+Icons" />
</head>

<body>
    <div id="root"></div>
    <script type="text/babel">
        const {
            colors,
            CssBaseline,
            ThemeProvider,
            Typography,
            TextField,
            Container,
            createTheme,
            Box,
            Skeleton,
        } = MaterialUI;

        const theme = createTheme({
            palette: {
                mode: 'dark'
            },
        });
        const WS = new WebSocket("ws://localhost:8000/ws");

        function App() {
            const [response, setResponse] = React.useState("");
            const [question, setQuestion] = React.useState("");
            const [loading, setLoading] = React.useState(false);

            React.useEffect(() => {
                WS.onmessage = (event) => {
                    setLoading(false);
                    setResponse(marked.parse(event.data));
                };
            }, []);

            return (
                <Container maxWidth="lg">
                    <Box sx={{ my: 4 }}>
                        <Typography variant="h4" component="h1" gutterBottom>
                            AI Assistant πŸ€“
                        </Typography>
                        <TextField
                            id="outlined-basic"
                            label="Ask me Anything"
                            variant="outlined"
                            style={{ width: '100%' }}
                            value={question}
                            disabled={loading}
                            onChange={e => {
                                setQuestion(e.target.value)
                            }}
                            onKeyUp={e => {
                                setLoading(false)
                                if (e.key === "Enter") {
                                    setResponse('')
                                    setLoading(true)
                                    WS.send(question);
                                }
                            }}
                        />
                    </Box>
                    {!response && loading && (<>
                        <Skeleton />
                        <Skeleton animation="wave" />
                        <Skeleton animation={false} /></>)}
                    {response && <Typography dangerouslySetInnerHTML={{ __html: response }} />}
                </Container>
            );
        }

        ReactDOM.createRoot(document.getElementById('root')).render(
            <ThemeProvider theme={theme}>
                <CssBaseline />
                <App />
            </ThemeProvider>,
        );
    </script>
</body>

</html>

Now we are ready to test it out! Run the python script which will start the API and serve the Web App.

$ python3 main.py
INFO:     Uvicorn running on http://0.0.0.0:8000 (Press CTRL+C to quit)
DEBUG:    = connection is CONNECTING
DEBUG:    < GET /ws HTTP/1.1
DEBUG:    < user-agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/119.0.0.0 Safari/537.36
DEBUG:    < upgrade: websocket
DEBUG:    < origin: http://localhost:8000
DEBUG:    < sec-websocket-version: 13
DEBUG:    < cookie: ajs_anonymous_id=57154c91-3ec5-4308-beaa-25a353f3ce66
DEBUG:    < sec-websocket-key: BQOnRY2biub2alMCN7ax+w==
DEBUG:    < sec-websocket-extensions: permessage-deflate; client_max_window_bits
INFO:     ('127.0.0.1', 54308) - "WebSocket /ws" [accepted]
DEBUG:    > date: Sun, 12 Nov 2023 19:00:20 GMT
DEBUG:    > server: uvicorn
INFO:     connection open
DEBUG:    = connection is OPEN

Navigate to http://localhost:8000, type in a question and hit enter.
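You can also poke the WebSocket endpoint straight from a terminal, for example with the websocat CLI if you happen to have it installed; each line you type is sent as a message and the streamed chunks print back as they arrive:

$ websocat ws://localhost:8000/ws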

Our question is sent to the API through the WebSocket, then on to OpenAI, which generates a response that is streamed back through the WebSocket and rendered in the web app! 🎉

The full response takes ChatGPT around 30 seconds to generate, which is a long time to wait in any application. But because we stream the response over the WebSocket as it is produced, the app feels very responsive: the text renders on screen faster than the average person can read it, which makes for a much better user experience. 🙂
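If you want to put numbers on that, a quick hypothetical script (run from the project folder, reusing get_ai_response from main.py) can measure the time to the first chunk versus the total generation time:

import asyncio
import time

from main import get_ai_response

async def measure() -> None:
    start = time.monotonic()
    first_chunk_at = None
    async for _ in get_ai_response("Explain transformers in simple terms"):
        if first_chunk_at is None:
            # Record how long the user would wait before seeing any text
            first_chunk_at = time.monotonic() - start
    total = time.monotonic() - start
    if first_chunk_at is not None:
        print(f"First chunk: {first_chunk_at:.1f}s, full response: {total:.1f}s")

asyncio.run(measure())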

I hope this shows you how easy it is to add AI enhancements to any application and gives you some basic building blocks to create a great experience for your users! 😊

Top comments (2)

Prayson Wilfred Daniel

Bravo 👏. I have not yet worked with WebSockets nor served HTML from FastAPI. The last time I used sockets was with Flask, streaming OpenCV video with face recognition modelling.

I will bookmark this for my reference. Thank you for your awesome labour.

You rock 🀟🏿

Jeckun

Awesome, this solved a big problem for me. I had been asking ChatGPT about this streaming for a day until I saw your article.