
Cloud Native Workflow for *Private* MPT-30B AI Apps

In this article, we will guide you through the process of developing your own private AI application πŸ€–, leveraging the capabilities of Kubernetes.

Unlike many other tutorials, we will NOT rely on OpenAI APIs. Instead, we will use a private AI instance with the Apache-2.0-licensed model MPT-30B, which keeps all πŸ”’ sensitive data πŸ”’ confidential within your Kubernetes cluster. No data goes to a third-party cloud πŸ™…β€β™‚οΈ 🌩️!

To set up the development environment on Kubernetes, we will use DevSpace. This environment includes a file-sync pipeline for your AI application, as well as the backend AI API (a RESTful API service designed to replace the OpenAI API) for the AI app.

Let's kick-start the process by deploying the necessary services to Kubernetes with the command devspace deploy. DevSpace will handle the deployment of the initial structure of our application, along with its dependencies, including ialacol. For more detailed explanations, please refer to the inline comments in the code snippet below:

# This is the configuration file for DevSpace
# 
# devspace use namespace private-ai # suggested: use a dedicated namespace instead of the default namespace
# devspace deploy # deploy the skeleton of the app and the dependencies (ialacol)
# devspace dev # start syncing files to the container
# devspace purge # to clean up
version: v2beta1
deployments:
  # This is the manifest for our private AI app deployment
  # The app will be in "sleep mode" after `devspace deploy`, and will start when we
  # begin syncing files to the container with `devspace dev`
  private-ai-app:
    helm:
      chart:
        # We are deploying the so-called Component Chart: https://devspace.sh/component-chart/docs
        name: component-chart
        repo: https://charts.devspace.sh
      values:
        containers:
          - image: ghcr.io/loft-sh/devspace-containers/python:3-alpine
            command:
            - "sleep"
            args:
            - "99999"
        service:
          ports:
          - port: 8000
        labels:
          app.kubernetes.io/name: private-ai-app
  ialacol:
    helm:
      # the backend for the AI app, we are using ialacol https://github.com/chenhunghan/ialacol/
      chart:
        name: ialacol
        repo: https://chenhunghan.github.io/ialacol
      # overriding values.yaml of ialacol helm chart
      values:
        replicas: 1
        deployment:
          image: quay.io/chenhunghan/ialacol:latest
          env:
            # We are using MPT-30B, one of the most capable openly licensed models at the time of writing
            # If you want to start with something small but mighty, try orca-mini
            # DEFAULT_MODEL_HG_REPO_ID: TheBloke/orca_mini_3B-GGML
            # DEFAULT_MODEL_FILE: orca-mini-3b.ggmlv3.q4_0.bin
            # MPT-30B
            DEFAULT_MODEL_HG_REPO_ID: TheBloke/mpt-30B-GGML
            DEFAULT_MODEL_FILE: mpt-30b.ggmlv0.q4_1.bin
            DEFAULT_MODEL_META: ""
        # Request more resource if needed
        resources:
          {}
        # pvc for storing the cache
        cache:
          persistence:
            size: 5Gi
            accessModes:
              - ReadWriteOnce
            storageClass: ~
        cacheMountPath: /app/cache
        # pvc for storing the models
        model:
          persistence:
            size: 20Gi
            accessModes:
              - ReadWriteOnce
            storageClass: ~
        modelMountPath: /app/models
        service:
          type: ClusterIP
          port: 8000
          annotations: {}
        # You might want to use the following to select a node with more CPU and memory
        # for MPT-30B, we need at least 32GB of memory
        nodeSelector: {}
        tolerations: []
        affinity: {}

Let's wait a few seconds for the pods to become green. I am using Lens to watch the cluster; it's awesome, by the way.
Waiting for pending pods
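
If you prefer the CLI over Lens, you can watch the rollout with kubectl (assuming the private-ai namespace suggested in the config comments above):

kubectl get pods --namespace private-ai --watch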

When all pods are green, we are ready for the next step.
Pods are ready
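
Optionally, we can smoke-test the backend before writing any code. The sketch below assumes the private-ai namespace and the ialacol service name and port from the config above; ialacol exposes an OpenAI-compatible /v1/completions route, which is exactly what our app will call through the openai library. Note that the first request can take a while, because the model file is downloaded and loaded on demand.

kubectl port-forward service/ialacol 8000:8000 --namespace private-ai
# in another terminal
curl -X POST -H 'Content-Type: application/json' \
  -d '{ "model": "mpt-30b.ggmlv0.q4_1.bin", "prompt": "Hello!", "max_tokens": 16 }' \
  http://localhost:8000/v1/completions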

The private AI app we are developing is a simple web server with an endpoint POST /prompt. When a client sends a request with a prompt in the request body to POST /prompt, the endpoint's controller will forward the prompt to the backend AI API, retrieve the response, and send it back to the client.

To begin, let's install the necessary dependencies on our local machine:

python3 -m venv .venv
source .venv/bin/activate
pip install fastapi uvicorn
pip install "openai<1" # We are not calling the OpenAI cloud, but the openai client library (the pre-1.0 interface used below) simplifies things because our backend (ialacol) exposes an OpenAI-compatible RESTful interface.
pip freeze > requirements.txt

and create a main.py file.

from fastapi import FastAPI
import openai
from pydantic import BaseModel

class Body(BaseModel):
    prompt: str

app = FastAPI()

@app.post("/prompt")
async def completions(
    body: Body
):
    prompt = body.prompt
    # Add more logic here; for example, you can add context to the prompt
    # using retrieval-based context augmentation
    response = openai.Completion.create(
        prompt=prompt,
        model="mpt-30b.ggmlv0.q4_1.bin",
        temperature=0.5
    )
    completion = response.choices[0].text

    return completion

The implementation of our app's endpoint POST /prompt is straightforward. It acts as a proxy, forwarding the request to the backend. You can extend it further with additional functionality, such as retrieval-based context augmentation of the provided prompt, as sketched below.
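
For illustration, here is a minimal sketch of what such an extension to main.py could look like. The retrieve_context helper and its toy document store are hypothetical placeholders for a real retrieval backend (e.g. a vector store running inside your cluster), not part of the starter code:

def retrieve_context(prompt: str) -> str:
    # Hypothetical retrieval step: look up documents relevant to the prompt.
    # In a real app this could query a vector store inside your cluster,
    # so sensitive data never leaves Kubernetes.
    documents = {
        "kubernetes": "Kubernetes is an open-source container orchestration system.",
    }
    matches = [text for keyword, text in documents.items() if keyword in prompt.lower()]
    return "\n".join(matches)

@app.post("/prompt-with-context")
async def completions_with_context(body: Body):
    # Prepend the retrieved context so the model can ground its answer
    context = retrieve_context(body.prompt)
    augmented_prompt = f"Context:\n{context}\n\nQuestion: {body.prompt}\nAnswer:"
    response = openai.Completion.create(
        prompt=augmented_prompt,
        model="mpt-30b.ggmlv0.q4_1.bin",
        temperature=0.5,
    )
    return response.choices[0].text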

With the core functionality of the app in place, let's synchronize the source files to the cluster by running the command devspace dev. This command performs the following actions:

  • It instructs DevSpace to sync the files in the root folder to the /app folder of the remote pod.
  • Whenever changes are made to the requirements.txt file, it triggers a pip install within the pod.
  • Additionally, it forwards port 8000, allowing us to access the app at http://localhost:8000.
dev:
  private-ai-app:
    # Use the label selector to select the pod for swapping out the container
    labelSelector:
      app.kubernetes.io/name: private-ai-app
    # use the namespace we assigned with `devspace use namespace`
    namespace: ${DEVSPACE_NAMESPACE}
    devImage: ghcr.io/loft-sh/devspace-containers/python:3-alpine
    workingDir: /app
    command: ["uvicorn"]
    args: ["main:app", "--reload", "--host", "0.0.0.0", "--port", "8000"]
    # expose the port 8000 to the host
    ports:
    - port: "8000:8000"
    # Add env for the pod if needed
    env:
    # This will tell openai python library to use the ialacol service instead of the OpenAI cloud
    - name: OPENAI_API_BASE
      value: "http://ialacol.${DEVSPACE_NAMESPACE}.svc.cluster.local:8000/v1"
    # You don't need to have an OpenAI API key, but OpenAI python library will complain without it
    - name: OPENAI_API_KEY
      value: "sk-xxx"
    sync:
      - path: ./:/app
        excludePaths:
        - requirements.txt
        printLogs: true
        uploadExcludeFile: ./.dockerignore
        downloadExcludeFile: ./.gitignore
      - path: ./requirements.txt:/app/requirements.txt
        # start the container after uploading requirements.txt and installing the dependencies
        startContainer: true
        file: true
        printLogs: true
        onUpload:
          exec:
          - command: |-
              pip install -r requirements.txt
            onChange: ["requirements.txt"]
    logs:
      enabled: true
      lastLines: 200

Wait for the file sync to complete (you should see some logs in the terminal), then test our app:

curl -X POST -H 'Content-Type: application/json' -d '{ "prompt": "Hello!" }' http://localhost:8000/prompt
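The endpoint returns the completion as a JSON-encoded string (the exact text will vary from run to run). If you'd rather exercise the endpoint from Python, here is a minimal client sketch using the requests library (an extra dependency, not part of the starter requirements):

import requests

# Call our private AI app through the forwarded port; the prompt never
# leaves the cluster except over this local connection.
response = requests.post(
    "http://localhost:8000/prompt",
    json={"prompt": "Hello!"},
)
print(response.json())  # the completion text produced by MPT-30B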

That's it, enjoy building your first private AI app πŸ₯³!

The source code for this article is available at private-ai-app-starter-python.
