Ajeet Singh Raina

Posted on Nov 6, 2023 • Edited on Jan 16, 2024 • Originally published at collabnix.com

How to Containerise a Large Language Model (LLM) App with Serge and Docker

#llm #docker #chatgpt #containers

Large language models (LLMs) are deep learning models trained on massive datasets of text and code, hence "large". They are built on the transformer architecture and can perform a variety of natural language processing (NLP) tasks, such as recognizing, translating, predicting, and generating text or other content.

What are the benefits of using LLMs?

LLMs offer a number of benefits over traditional NLP techniques, including:

They can handle more complex tasks, such as machine translation and question answering.
They are often more accurate than traditional techniques on tasks such as translation and summarization.
They can be used to generate more creative and informative text.
They can be adapted to new tasks more easily than traditional techniques.

What are the challenges of using LLMs?

LLMs also have some challenges, including:

They require a lot of data to train.
They can be computationally expensive to train and deploy.
They can be biased, reflecting the biases in the data they are trained on.
They can be used to generate harmful or offensive content.

How are LLMs being used today?

LLMs are being used in a variety of ways today, including:

Chatbots: LLMs can be used to create chatbots that can have natural conversations with humans.
Question-answering systems: LLMs can be used to build question-answering systems that can answer questions posed in natural language.
Natural language generation systems: LLMs can be used to build natural language generation systems that can generate text, translate languages, and write different kinds of creative content.
Code generation systems: LLMs can be used to build code generation systems that can generate code from natural language descriptions.
Data analysis systems: LLMs can be used to build data analysis systems that can extract insights from data.

Introducing Serge

Serge is an open-source chat platform for LLMs that makes it easy to self-host and experiment with LLMs locally. It is fully dockerized, so you can easily containerize your LLM app and deploy it to any environment.
This blog post walks you through containerizing an LLM app with Serge.

Prerequisites

To follow this tutorial, you will need the following:

A computer with Docker installed
The Serge source code
A pre-trained LLM

Note

Make sure you have enough disk space and available RAM to run the models. 7B requires about 4.5 GB of free RAM, 13B requires about 12 GB free, and 30B requires about 20 GB free.
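
The memory figures in the note can be turned into a quick pre-flight check. The sketch below is a minimal illustration in Python and is not part of Serge: the function names fits_in_ram and largest_model_for are hypothetical, and the thresholds are simply the approximate values quoted above.

```python
# Approximate free-RAM requirements in GB, taken from the note above.
REQUIRED_FREE_RAM_GB = {"7B": 4.5, "13B": 12.0, "30B": 20.0}


def fits_in_ram(model_size: str, free_ram_gb: float) -> bool:
    """Return True if free_ram_gb meets the quoted requirement for model_size."""
    try:
        required = REQUIRED_FREE_RAM_GB[model_size.upper()]
    except KeyError:
        raise ValueError(f"unknown model size: {model_size!r}") from None
    return free_ram_gb >= required


def largest_model_for(free_ram_gb: float):
    """Return the largest listed model size that fits, or None if none does."""
    candidates = [
        size for size, need in REQUIRED_FREE_RAM_GB.items() if free_ram_gb >= need
    ]
    if not candidates:
        return None
    # Pick the candidate with the highest RAM requirement, i.e. the largest model.
    return max(candidates, key=REQUIRED_FREE_RAM_GB.get)


if __name__ == "__main__":
    for free in (4.0, 8.0, 16.0, 32.0):
        print(f"{free:>5.1f} GB free -> {largest_model_for(free)}")
```

The table is kept as plain data so the figures can be adjusted if a different model or quantization has different needs.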

Step 1: Create a new directory for your app

First, create a new directory for your app.

mkdir my-app
cd my-app

Step 2: Clone the Serge repository

Next, clone the Serge repository into your app directory.

git clone https://github.com/serge-chat/serge.git

Step 3: Create a Dockerfile

Now, you need to create a Dockerfile for your app. The Dockerfile is a text file that tells Docker how to build your app image.

In your app directory, create a new file called Dockerfile.

nano Dockerfile

Paste the following code into the Dockerfile:

FROM ghcr.io/serge-chat/serge:latest

COPY my-model.pkl /app/

CMD ["python", "app.py"]

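This Dockerfile tells Docker to use the latest version of the Serge image as the base image. It then copies the pre-trained LLM into the /app directory and runs the app.py script when a container is started from the image. Here my-model.pkl and app.py are placeholder names; substitute your own model file and entry-point script.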

Step 4: Build the Docker image

Once you have created the Dockerfile, you can build the Docker image for your app.

docker build -t my-app .

This will create a Docker image called my-app.

Step 5: Run the Docker image

Finally, you can run the Docker image for your app.

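docker run -it -p 8008:8008 my-app

This will start a containerized instance of your LLM app. The -p flag publishes the container's port 8008, the port Serge listens on, to the host, so you can then connect to the app in a web browser at http://localhost:8008/.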
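
Before opening the browser, it can help to confirm that something is actually listening on the host port. The following is a minimal sketch using only Python's standard library. It assumes the container's web port is published on the host as 8008 (the port used in the Compose example later in this post), and the helper names port_is_open and wait_for_port are illustrative, not part of Serge or Docker.

```python
import socket
import time


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


def wait_for_port(host: str, port: int, attempts: int = 10,
                  delay: float = 1.0, timeout: float = 2.0) -> bool:
    """Poll host:port up to `attempts` times, sleeping `delay` seconds between tries."""
    for attempt in range(attempts):
        if port_is_open(host, port, timeout=timeout):
            return True
        if attempt < attempts - 1:
            time.sleep(delay)
    return False
```

Calling wait_for_port("localhost", 8008) returns True once the port accepts connections. Note that this only checks that the TCP port is reachable, not that the application behind it has finished loading.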

Step 6: Using Docker Compose

services:
  serge:
    image: ghcr.io/serge-chat/serge:latest
    container_name: serge
    restart: unless-stopped
    ports:
      - 8008:8008
    volumes:
      - weights:/usr/src/app/weights
      - datadb:/data/db/

volumes:
  weights:
  datadb:

Save this as docker-compose.yml and start it with docker compose up -d. Then visit http://localhost:8008/. You can find the API documentation at http://localhost:8008/api/doc.

Kubernetes example

You can deploy Serge using the manifests below, which contain the resource kinds required to run it on a Kubernetes cluster.

Save this deployment manifest as manifest.yaml for your setup:

---
apiVersion: v1
kind: Service
metadata:
  labels:
    app: serge
  name: serge
  namespace: serge-ai
spec:
  ports:
    - name: "8008"
      port: 8008
      targetPort: 8008
    - name: "9124"
      port: 9124
      targetPort: 9124
  selector:
    app: serge
status:
  loadBalancer: {}
---
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: serge
  name: serge
  namespace: serge-ai
spec:
  replicas: 1
  selector:
    matchLabels:
      app: serge
  template:
    metadata:
      labels:
        app: serge
    spec:
      containers:
        - image: ghcr.io/serge-chat/serge:latest
          name: serge
          ports:
            - containerPort: 8008
            - containerPort: 9124
          resources:
            requests:
              cpu: 5000m
              memory: 5120Mi
            limits:
              cpu: 8000m
              memory: 8192Mi
          volumeMounts:
            - mountPath: /data/db
              name: datadb
            - mountPath: /usr/src/app/weights
              name: weights
      restartPolicy: Always
      volumes:
        - name: datadb
          persistentVolumeClaim:
            claimName: datadb
        - name: weights
          persistentVolumeClaim:
            claimName: weights
status: {}
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  labels:
    app: serge
  name: weights
  namespace: serge-ai
spec:
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 64Gi
status: {}
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  labels:
    app: serge
  name: datadb
  namespace: serge-ai
spec:
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 16Gi
status: {}
---
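
The resource requests and limits in the Deployment above use Kubernetes quantity suffixes: "m" for millicores of CPU and "Mi" for mebibytes of memory. The sketch below converts the values from the manifest into cores and GiB so they are easier to compare with the RAM note earlier in this post. It is an illustrative helper, not part of Kubernetes or Serge. The function names are hypothetical, and it handles only the suffixes shown here rather than the full quantity grammar.

```python
# Binary suffix multipliers for the subset of Kubernetes memory notation handled here.
_BINARY_SUFFIXES = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30}


def cpu_cores(quantity: str) -> float:
    """Convert a CPU quantity such as '5000m' (millicores) or '2' into cores."""
    if quantity.endswith("m"):
        return int(quantity[:-1]) / 1000
    return float(quantity)


def memory_bytes(quantity: str) -> int:
    """Convert a memory quantity such as '5120Mi' into bytes."""
    for suffix, factor in _BINARY_SUFFIXES.items():
        if quantity.endswith(suffix):
            return int(quantity[: -len(suffix)]) * factor
    return int(quantity)  # no suffix: the value is already in bytes


def memory_gib(quantity: str) -> float:
    """Convert a memory quantity into GiB."""
    return memory_bytes(quantity) / 2**30


if __name__ == "__main__":
    # Values copied from the Deployment manifest above.
    for label, cpu, mem in (("requests", "5000m", "5120Mi"),
                            ("limits", "8000m", "8192Mi")):
        print(f"{label}: {cpu_cores(cpu)} cores, {memory_gib(mem)} GiB")
```

Read this way, the manifest requests 5 cores and 5 GiB and caps the pod at 8 cores and 8 GiB. Going by the figures in the RAM note, that leaves room for a 7B model but is below the roughly 12 GB quoted for 13B, so the memory limit would likely need raising for larger models.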

You can now deploy Serge with the following commands:

$ kubectl create ns serge-ai
$ kubectl apply -f manifest.yaml

You can add the supported Alpaca models using the following commands after gathering the Pod ID:

$ kubectl get pod -n serge-ai
NAME                     READY   STATUS    RESTARTS   AGE
serge-58959fb6b7-px76v   1/1     Running   0          8m42s

$ kubectl exec -it serge-58959fb6b7-px76v -n serge-ai -- python3 /usr/src/app/api/utils/download.py tokenizer 7B
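
Because the pod name changes with every rollout, the lookup step can be scripted. The sketch below parses the tabular output of kubectl get pod (using the sample shown above) and builds the matching exec command as an argument list. It is a hedged illustration: running_pods and download_command are hypothetical helper names, the script does not invoke kubectl itself, and it assumes the default column layout shown above. The "--" separator between the kubectl options and the command is the form kubectl recommends.

```python
from typing import List

# Sample output copied from the `kubectl get pod -n serge-ai` call above.
SAMPLE_OUTPUT = """\
NAME                     READY   STATUS    RESTARTS   AGE
serge-58959fb6b7-px76v   1/1     Running   0          8m42s
"""


def running_pods(kubectl_output: str) -> List[str]:
    """Return the names of pods whose STATUS column reads 'Running'."""
    lines = kubectl_output.strip().splitlines()
    if not lines:
        return []
    header = lines[0].split()
    name_col = header.index("NAME")
    status_col = header.index("STATUS")
    pods = []
    for line in lines[1:]:
        fields = line.split()
        if len(fields) > max(name_col, status_col) and fields[status_col] == "Running":
            pods.append(fields[name_col])
    return pods


def download_command(pod: str, model: str = "7B",
                     namespace: str = "serge-ai") -> List[str]:
    """Build the kubectl exec argument list for the download script shown above."""
    return [
        "kubectl", "exec", "-it", pod, "-n", namespace, "--",
        "python3", "/usr/src/app/api/utils/download.py", "tokenizer", model,
    ]


if __name__ == "__main__":
    for pod in running_pods(SAMPLE_OUTPUT):
        print(" ".join(download_command(pod)))
```

The resulting list can be passed to subprocess.run on a machine where kubectl is configured for the cluster.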

If you have an IngressClass on your cluster, it is possible to use Serge behind an ingress. Below is an example with an Nginx IngressClass:

---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: serge-ingress
  namespace: serge-ai
  annotations:
    nginx.ingress.kubernetes.io/configuration-snippet: |
      proxy_set_header Upgrade $http_upgrade;
      proxy_set_header Connection upgrade;
      proxy_set_header Accept-Encoding gzip;
    nginx.org/websocket-services: serge
    nginx.ingress.kubernetes.io/cors-allow-methods: "PUT, GET, POST, OPTIONS, DELETE"
spec:
  ingressClassName: nginx
  tls:
    - hosts:
        - MY-DOMAIN.COM  # EDIT HERE
      secretName: serge-tls
  rules:
    - host: MY-DOMAIN.COM  # EDIT HERE
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: serge
                port:
                  number: 8008
---

If you have Cert-Manager installed, you can make a TLS certificate with the following YAML file:

---
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: serge-tls
  namespace: serge-ai
spec:
  secretName: serge-tls
  issuerRef:
    name: acme-issuer
    kind: ClusterIssuer
  dnsNames:
  - 'MY-DOMAIN.COM'  # EDIT HERE
  privateKey:
    algorithm: RSA
    encoding: PKCS1
    size: 4096
---

Conclusion

This blog post has shown you how to containerize a large language model app with Serge. By following these steps, you can easily deploy your LLM app to any environment.