Dhiraj Patra

LLM Deployment Pipeline with Azure and Kubeflow

Deploying a model, especially an LLM-based application, to Azure by hand can be a daunting task. Kubeflow lets us automate the deployment pipeline.

This post walks through an example of an end-to-end machine learning deployment pipeline using Kubeflow on Azure. It covers setting up a Kubeflow pipeline, training a model, and deploying the model.

Prerequisites:

  1. Azure Account: You need an Azure account.

  2. Azure Kubernetes Service (AKS): You need a Kubernetes cluster. You can create an AKS cluster via the Azure portal or CLI.

  3. Kubeflow: You need Kubeflow installed on your AKS cluster. Follow the Kubeflow on Azure documentation to set this up.

Step 1: Setting Up the Environment

First, ensure you have the Azure CLI and kubectl installed and configured.


# Install Azure CLI

curl -sL https://aka.ms/InstallAzureCLIDeb | sudo bash



# Install kubectl

az aks install-cli



# Log in to Azure

az login



# Set the subscription (if you have multiple subscriptions)

az account set --subscription "<your-subscription-id>"



# Get credentials for your AKS cluster

az aks get-credentials --resource-group <resource-group-name> --name <aks-cluster-name>
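Before moving on, it can help to confirm that both command-line tools are actually on the PATH. The following is a minimal sketch using only the Python standard library; the `missing_tools` helper name is hypothetical and not part of any Azure or Kubernetes SDK.

```python
import shutil


def missing_tools(tools=("az", "kubectl")):
    """Return the names in `tools` that cannot be found on the PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


missing = missing_tools()
if missing:
    print("Missing tools:", ", ".join(missing))
else:
    print("Azure CLI and kubectl are both available.")
```

This only checks that the executables exist; it does not verify that `az login` has been run or that the kubeconfig points at the right cluster.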

Step 2: Deploying Kubeflow on AKS

Follow the official Kubeflow deployment guide for Azure AKS:

Deploy Kubeflow on Azure AKS

Step 3: Creating a Kubeflow Pipeline

We'll create a simple pipeline that trains and deploys a machine learning model.

Pipeline Definition

Create a file pipeline.py:


import kfp
from kfp import dsl
from kfp.components import create_component_from_func


def train_model() -> str:
    # Imports live inside the function because KFP runs its body in a separate container
    from sklearn.datasets import load_iris
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split
    import joblib

    # Load the Iris dataset and hold out 20% of it for evaluation
    iris = load_iris()
    X_train, X_test, y_train, y_test = train_test_split(iris.data, iris.target, test_size=0.2)

    clf = LogisticRegression()
    clf.fit(X_train, y_train)

    accuracy = clf.score(X_test, y_test)
    print(f"Model accuracy: {accuracy}")

    # This path is local to the training container; copy the file to shared
    # storage if a later step needs to read it
    model_path = "/model.pkl"
    joblib.dump(clf, model_path)

    return model_path


# python:3.8-slim ships without scikit-learn or joblib, so install them when the step starts
train_model_op = create_component_from_func(
    train_model,
    base_image='python:3.8-slim',
    packages_to_install=['scikit-learn', 'joblib'],
)


@dsl.pipeline(
    name='Iris Training Pipeline',
    description='A pipeline to train and deploy an Iris classification model.'
)
def iris_pipeline():
    train_task = train_model_op()


if __name__ == '__main__':
    kfp.compiler.Compiler().compile(iris_pipeline, 'iris_pipeline.yaml')

Step 4: Deploying the Pipeline

Upload the pipeline to your Kubeflow instance.


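# Shell: install the KFP v1 SDK (create_component_from_func was removed in kfp 2.x)
pip install "kfp<2"

# Python: connect to Kubeflow Pipelines and upload the compiled pipeline.
# Pass host=... to kfp.Client() when running outside the cluster.
import kfp

kfp_client = kfp.Client()
kfp_client.upload_pipeline(pipeline_package_path='iris_pipeline.yaml', pipeline_name='Iris Training Pipeline')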

Step 5: Running the Pipeline

Once the pipeline is uploaded, you can run it via the Kubeflow dashboard or programmatically.


# Run the pipeline

experiment = kfp_client.create_experiment('Iris Experiment')

run = kfp_client.run_pipeline(experiment.id, 'iris_pipeline_run', 'iris_pipeline.yaml')

Step 6: Deploying the Model

Assuming the trained model is saved in a storage bucket, you can create a deployment pipeline to deploy the model to Azure Kubernetes Service (AKS).

Model Deployment Component

Create a file deploy.py:


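import kfp
from kfp import dsl
from kfp.components import create_component_from_func


def deploy_model(model_path: str):
    # Imported inside the function because KFP runs its body in a separate container
    from kubernetes import client, config

    # The step runs in a pod, so authenticate with the pod's service account;
    # that account needs RBAC permission to create Deployments in this namespace
    config.load_incluster_config()

    # Define deployment specs.
    # model_path is not used yet: the serving image is assumed to bundle the model.
    deployment = client.V1Deployment(
        api_version="apps/v1",
        kind="Deployment",
        metadata=client.V1ObjectMeta(name="iris-model-deployment"),
        spec=client.V1DeploymentSpec(
            replicas=1,
            selector=client.V1LabelSelector(match_labels={'app': 'iris-model'}),
            template=client.V1PodTemplateSpec(
                metadata=client.V1ObjectMeta(labels={'app': 'iris-model'}),
                spec=client.V1PodSpec(containers=[client.V1Container(
                    name="iris-model",
                    image="mydockerhub/iris-model:latest",
                    ports=[client.V1ContainerPort(container_port=80)]
                )])
            )
        )
    )

    # Create deployment
    apps_v1 = client.AppsV1Api()
    apps_v1.create_namespaced_deployment(namespace="default", body=deployment)


# The kubernetes client is not part of python:3.8-slim, so install it when the step starts
deploy_model_op = create_component_from_func(
    deploy_model,
    base_image='python:3.8-slim',
    packages_to_install=['kubernetes'],
)


@dsl.pipeline(
    name='Iris Deployment Pipeline',
    description='A pipeline to deploy an Iris classification model.'
)
def iris_deploy_pipeline(model_path: str):
    deploy_task = deploy_model_op(model_path)


if __name__ == '__main__':
    kfp.compiler.Compiler().compile(iris_deploy_pipeline, 'iris_deploy_pipeline.yaml')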
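The `V1Deployment` object above is serialized to an ordinary `apps/v1` Deployment manifest. The sketch below builds the same manifest as a plain dictionary with only the standard library, which makes one rule easy to see: the selector's `matchLabels` must match the pod template's labels, or the API server rejects the Deployment. The `build_deployment` helper is hypothetical, and the image name is the placeholder used above.

```python
import json


def build_deployment(name, image, labels, port=80, replicas=1):
    """Return an apps/v1 Deployment manifest as a plain dict."""
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name},
        "spec": {
            "replicas": replicas,
            # Selector and pod template are built from the same labels so they cannot drift apart
            "selector": {"matchLabels": dict(labels)},
            "template": {
                "metadata": {"labels": dict(labels)},
                "spec": {
                    "containers": [
                        {
                            "name": labels["app"],
                            "image": image,
                            "ports": [{"containerPort": port}],
                        }
                    ]
                },
            },
        },
    }


manifest = build_deployment(
    name="iris-model-deployment",
    image="mydockerhub/iris-model:latest",
    labels={"app": "iris-model"},
)
print(json.dumps(manifest, indent=2))
```

Saved to a file, this JSON can be applied with `kubectl apply -f`, which accepts JSON as well as YAML.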

Step 7: Running the Deployment Pipeline

Upload and run the deployment pipeline.


# Upload the deployment pipeline

kfp_client.upload_pipeline(pipeline_package_path='iris_deploy_pipeline.yaml', pipeline_name='Iris Deployment Pipeline')



# Run the deployment pipeline

experiment = kfp_client.create_experiment('Iris Deployment Experiment')

run = kfp_client.run_pipeline(experiment.id, 'iris_deploy_pipeline_run', 'iris_deploy_pipeline.yaml', params={'model_path': '<path-to-your-model>'})

Conclusion

This end-to-end example demonstrates setting up a Kubeflow pipeline on Azure, training a model, and deploying it to AKS. Customize the model_path, Docker image, and other specifics as needed for your actual use case.

Deploying a Large Language Model (LLM) involves a few additional steps compared to a general machine learning model. Here’s how you can set up an end-to-end deployment pipeline for an LLM using Kubeflow on Azure, similar to the previous example.

Prerequisites

Ensure you have the necessary tools and environment set up as mentioned in the previous steps, including an Azure account, AKS cluster, and Kubeflow.

Step 1: Setting Up the Environment

Use the same steps as before to install Azure CLI, kubectl, and configure your environment.

Step 2: Deploying Kubeflow on AKS

Follow the official Kubeflow deployment guide for Azure AKS:

Deploy Kubeflow on Azure AKS

Step 3: Creating a Kubeflow Pipeline for LLM

Let's create a pipeline that fine-tunes a Hugging Face LLM and deploys it.

Pipeline Definition

Create a file llm_pipeline.py:


import kfp
from kfp import dsl
from kfp.components import create_component_from_func


def train_llm() -> str:
    # Imports live inside the function because KFP runs its body in a separate container
    from transformers import (
        AutoModelForCausalLM,
        AutoTokenizer,
        DataCollatorForLanguageModeling,
        Trainer,
        TrainingArguments,
    )
    from datasets import load_dataset

    # Load dataset
    dataset = load_dataset("wikitext", "wikitext-2-raw-v1")

    # Load model and tokenizer
    model_name = "gpt2"
    model = AutoModelForCausalLM.from_pretrained(model_name)
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    # GPT-2 has no padding token; reuse the end-of-sequence token so padding works
    tokenizer.pad_token = tokenizer.eos_token

    def tokenize_function(examples):
        return tokenizer(examples["text"], padding="max_length", truncation=True)

    tokenized_datasets = dataset.map(tokenize_function, batched=True)
    tokenized_datasets = tokenized_datasets.remove_columns(["text"])
    tokenized_datasets.set_format("torch")

    # Causal LM training needs labels; with mlm=False this collator copies input_ids into labels
    data_collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False)

    # Define training arguments
    # (newer transformers releases name the evaluation_strategy argument eval_strategy)
    training_args = TrainingArguments(
        output_dir="./results",
        evaluation_strategy="epoch",
        learning_rate=2e-5,
        per_device_train_batch_size=8,
        per_device_eval_batch_size=8,
        num_train_epochs=3,
        weight_decay=0.01,
    )

    # Create Trainer
    trainer = Trainer(
        model=model,
        args=training_args,
        train_dataset=tokenized_datasets["train"],
        eval_dataset=tokenized_datasets["validation"],
        data_collator=data_collator,
    )

    # Train model
    trainer.train()

    # Save model (this path is local to the training container)
    model_path = "/model"
    model.save_pretrained(model_path)
    tokenizer.save_pretrained(model_path)

    return model_path


# python:3.8-slim ships without these libraries, so install them when the step starts;
# accelerate is listed because recent transformers releases need it for Trainer
train_llm_op = create_component_from_func(
    train_llm,
    base_image='python:3.8-slim',
    packages_to_install=['transformers', 'datasets', 'torch', 'accelerate'],
)


@dsl.pipeline(
    name='LLM Training Pipeline',
    description='A pipeline to train and deploy a Large Language Model.'
)
def llm_pipeline():
    train_task = train_llm_op()


if __name__ == '__main__':
    kfp.compiler.Compiler().compile(llm_pipeline, 'llm_pipeline.yaml')
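For causal language modelling, the `Trainer` needs a `labels` field next to `input_ids` in order to compute a loss; Hugging Face's `DataCollatorForLanguageModeling(tokenizer, mlm=False)` is the usual way to supply it. As a rough pure-Python illustration of what that collator does (not the library's implementation), the labels are a copy of the input IDs with padding positions set to -100, the value the loss ignores. The token IDs below are made up for the example.

```python
IGNORE_INDEX = -100  # label value that PyTorch's cross-entropy loss skips by default


def make_causal_lm_labels(input_ids, pad_token_id):
    """Copy input_ids into labels, masking padding so it does not contribute to the loss."""
    return [IGNORE_INDEX if tok == pad_token_id else tok for tok in input_ids]


# Hypothetical token IDs; 0 stands in for the padding token
input_ids = [15, 27, 42, 0, 0]
labels = make_causal_lm_labels(input_ids, pad_token_id=0)
print(labels)  # [15, 27, 42, -100, -100]
```

The model shifts the labels by one position internally, so no manual shifting is needed when preparing the batch.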

Step 4: Deploying the Pipeline

Upload the pipeline to your Kubeflow instance.


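# Shell: install the KFP v1 SDK (create_component_from_func was removed in kfp 2.x)
pip install "kfp<2"

# Python: connect to Kubeflow Pipelines and upload the compiled pipeline.
# Pass host=... to kfp.Client() when running outside the cluster.
import kfp

kfp_client = kfp.Client()
kfp_client.upload_pipeline(pipeline_package_path='llm_pipeline.yaml', pipeline_name='LLM Training Pipeline')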

Step 5: Running the Pipeline

Once the pipeline is uploaded, run it via the Kubeflow dashboard or programmatically.


# Run the pipeline

experiment = kfp_client.create_experiment('LLM Experiment')

run = kfp_client.run_pipeline(experiment.id, 'llm_pipeline_run', 'llm_pipeline.yaml')

Step 6: Deploying the Model

Create a deployment pipeline to deploy the LLM to Azure Kubernetes Service (AKS).

Model Deployment Component

Create a file deploy_llm.py:


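import kfp
from kfp import dsl
from kfp.components import create_component_from_func


def deploy_llm(model_path: str):
    # Imported inside the function because KFP runs its body in a separate container
    from kubernetes import client, config

    # The step runs in a pod, so authenticate with the pod's service account;
    # that account needs RBAC permission to create Deployments in this namespace
    config.load_incluster_config()

    # Define deployment specs. The claim "model-pvc" must already exist and hold the model files.
    # model_path is not used here: the container reads the model from the /model mount.
    deployment = client.V1Deployment(
        api_version="apps/v1",
        kind="Deployment",
        metadata=client.V1ObjectMeta(name="llm-deployment"),
        spec=client.V1DeploymentSpec(
            replicas=1,
            selector=client.V1LabelSelector(match_labels={'app': 'llm'}),
            template=client.V1PodTemplateSpec(
                metadata=client.V1ObjectMeta(labels={'app': 'llm'}),
                spec=client.V1PodSpec(
                    containers=[client.V1Container(
                        name="llm",
                        image="mydockerhub/llm:latest",
                        ports=[client.V1ContainerPort(container_port=80)],
                        volume_mounts=[client.V1VolumeMount(mount_path="/model", name="model-volume")]
                    )],
                    volumes=[client.V1Volume(
                        name="model-volume",
                        persistent_volume_claim=client.V1PersistentVolumeClaimVolumeSource(claim_name="model-pvc")
                    )]
                )
            )
        )
    )

    # Create deployment
    apps_v1 = client.AppsV1Api()
    apps_v1.create_namespaced_deployment(namespace="default", body=deployment)


# The kubernetes client is not part of python:3.8-slim, so install it when the step starts
deploy_llm_op = create_component_from_func(
    deploy_llm,
    base_image='python:3.8-slim',
    packages_to_install=['kubernetes'],
)


@dsl.pipeline(
    name='LLM Deployment Pipeline',
    description='A pipeline to deploy a Large Language Model.'
)
def llm_deploy_pipeline(model_path: str):
    deploy_task = deploy_llm_op(model_path)


if __name__ == '__main__':
    kfp.compiler.Compiler().compile(llm_deploy_pipeline, 'llm_deploy_pipeline.yaml')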
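The Deployment above mounts a claim named `model-pvc`, which must already exist in the same namespace before the pod can start. A minimal sketch of such a claim as a plain dictionary follows. The `build_model_pvc` helper is hypothetical, the 10Gi size and the `ReadWriteOnce` access mode are assumptions for illustration, and the cluster's default storage class is used because none is specified.

```python
import json


def build_model_pvc(name="model-pvc", size="10Gi", access_mode="ReadWriteOnce"):
    """Return a v1 PersistentVolumeClaim manifest as a plain dict."""
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": name},
        "spec": {
            "accessModes": [access_mode],
            # Requested capacity; size it to fit the saved model and tokenizer files
            "resources": {"requests": {"storage": size}},
        },
    }


pvc = build_model_pvc()
print(json.dumps(pvc, indent=2))
```

The training step would also need to write the model into this volume for the serving container to find it under `/model`.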

Step 7: Running the Deployment Pipeline

Upload and run the deployment pipeline.


# Upload the deployment pipeline

kfp_client.upload_pipeline(pipeline_package_path='llm_deploy_pipeline.yaml', pipeline_name='LLM Deployment Pipeline')



# Run the deployment pipeline

experiment = kfp_client.create_experiment('LLM Deployment Experiment')

run = kfp_client.run_pipeline(experiment.id, 'llm_deploy_pipeline_run', 'llm_deploy_pipeline.yaml', params={'model_path': '<path-to-your-model>'})

Conclusion

This example demonstrates how to create a Kubeflow pipeline for training and deploying a Large Language Model (LLM) on Azure Kubernetes Service (AKS). Adjust the model_path, Docker image, and other specifics as needed for your actual use case. The steps involve setting up the pipeline, running the training, and deploying the trained model, all within the Kubeflow framework.

To deploy containerized LLMs with Kubeflow on Azure, you'll need to follow these steps:

  1. Containerize Your LLM: Create a Docker image of your LLM application.

  2. Push the Docker Image to a Container Registry: Push the Docker image to Azure Container Registry (ACR) or Docker Hub.

  3. Create a Kubeflow Pipeline for Deployment: Define a Kubeflow pipeline to deploy your LLM application using the Docker image.

  4. Run the Deployment Pipeline: Execute the pipeline to deploy your LLM application on AKS.

Step 1: Containerize Your LLM

Create a Dockerfile for your LLM application.

Example Dockerfile


# Use an official Python runtime as a parent image

FROM python:3.11-slim



# Set the working directory in the container

WORKDIR /app



# Copy the current directory contents into the container at /app

COPY . /app



# Install any needed packages specified in requirements.txt

RUN pip install --no-cache-dir -r requirements.txt



# Document that the app listens on port 80 (EXPOSE alone does not publish the port)

EXPOSE 80



# Define environment variable

ENV NAME=World



# Run app.py when the container launches

CMD ["python", "app.py"]

Example app.py


from flask import Flask, request, jsonify

from transformers import AutoModelForCausalLM, AutoTokenizer



app = Flask(__name__)



model_name = "gpt2"

model = AutoModelForCausalLM.from_pretrained(model_name)

tokenizer = AutoTokenizer.from_pretrained(model_name)



@app.route('/predict', methods=['POST'])

def predict():

    data = request.json

    inputs = tokenizer.encode(data['text'], return_tensors='pt')

    outputs = model.generate(inputs)

    response = tokenizer.decode(outputs[0], skip_special_tokens=True)

    return jsonify({'response': response})



if __name__ == '__main__':

    app.run(host='0.0.0.0', port=80)
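To exercise the endpoint, a client only needs to POST JSON with a `text` field and read the `response` field back. The sketch below uses only the standard library; the `build_predict_request` and `predict` helpers are hypothetical, and `http://localhost:80` assumes the container is running locally with port 80 published.

```python
import json
import urllib.request


def build_predict_request(base_url, text):
    """Build a POST request for the /predict route defined in app.py."""
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/predict",
        data=body,
        # Flask's request.json only parses the body when the content type is JSON
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict(base_url, text, timeout=60):
    """Send the request and return the generated text from the 'response' field."""
    request = build_predict_request(base_url, text)
    with urllib.request.urlopen(request, timeout=timeout) as reply:
        return json.loads(reply.read().decode("utf-8"))["response"]


# Building the request does not touch the network
request = build_predict_request("http://localhost:80", "Kubeflow is")
print(request.get_method(), request.full_url)  # POST http://localhost:80/predict
```

With the Flask app running, `predict("http://localhost:80", "Kubeflow is")` would return the model's continuation of the prompt.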

Build and Push Docker Image


# Build the Docker image

docker build -t mydockerhub/llm:latest .



# Push the Docker image to Docker Hub or ACR

docker push mydockerhub/llm:latest

Step 2: Push Docker Image to Azure Container Registry

If you prefer to use ACR:


# Log in to Azure

az login



# Create an ACR if you don't have one

az acr create --resource-group <your-resource-group> --name <your-registry-name> --sku Basic



# Log in to the ACR

az acr login --name <your-registry-name>



# Tag the Docker image with the ACR login server name

docker tag mydockerhub/llm:latest <your-registry-name>.azurecr.io/llm:latest



# Push the Docker image to ACR

docker push <your-registry-name>.azurecr.io/llm:latest
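The tag step above follows a fixed naming pattern: `<registry>.azurecr.io/<repository>:<tag>`. A small helper can build that reference so the same string is used for `docker tag`, `docker push`, and the pipeline's `image` parameter. This is only a sketch; `acr_image_ref` is a hypothetical helper and `myregistry` is a placeholder registry name.

```python
def acr_image_ref(registry_name, repository, tag="latest"):
    """Return the fully qualified image reference for an Azure Container Registry."""
    if not registry_name or not repository:
        raise ValueError("registry_name and repository are required")
    # The registry name is lowercased so the reference matches the lowercase login server
    return f"{registry_name.lower()}.azurecr.io/{repository}:{tag}"


print(acr_image_ref("myregistry", "llm"))  # myregistry.azurecr.io/llm:latest
```

Passing an explicit version tag instead of `latest` makes it easier to tell which build a given deployment is running.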

Step 3: Create a Kubeflow Pipeline for Deployment

Create a deployment pipeline to deploy the containerized LLM.

Deployment Component

Create a file deploy_llm.py:


import kfp
from kfp import dsl
from kfp.components import create_component_from_func


def deploy_llm(image: str):
    # Imported inside the function because KFP runs its body in a separate container
    from kubernetes import client, config

    # The step runs in a pod, so authenticate with the pod's service account;
    # that account needs RBAC permission to create Deployments and Services
    config.load_incluster_config()

    # Deployment that runs one replica of the containerized LLM
    deployment = client.V1Deployment(
        api_version="apps/v1",
        kind="Deployment",
        metadata=client.V1ObjectMeta(name="llm-deployment"),
        spec=client.V1DeploymentSpec(
            replicas=1,
            selector=client.V1LabelSelector(match_labels={'app': 'llm'}),
            template=client.V1PodTemplateSpec(
                metadata=client.V1ObjectMeta(labels={'app': 'llm'}),
                spec=client.V1PodSpec(containers=[client.V1Container(
                    name="llm",
                    image=image,
                    ports=[client.V1ContainerPort(container_port=80)]
                )])
            )
        )
    )

    # Service that routes port 80 to the pods labelled app=llm
    service = client.V1Service(
        api_version="v1",
        kind="Service",
        metadata=client.V1ObjectMeta(name="llm-service"),
        spec=client.V1ServiceSpec(
            selector={'app': 'llm'},
            ports=[client.V1ServicePort(protocol="TCP", port=80, target_port=80)]
        )
    )

    apps_v1 = client.AppsV1Api()
    core_v1 = client.CoreV1Api()

    apps_v1.create_namespaced_deployment(namespace="default", body=deployment)
    core_v1.create_namespaced_service(namespace="default", body=service)


# The kubernetes client is not part of python:3.8-slim, so install it when the step starts
deploy_llm_op = create_component_from_func(
    deploy_llm,
    base_image='python:3.8-slim',
    packages_to_install=['kubernetes'],
)


@dsl.pipeline(
    name='LLM Deployment Pipeline',
    description='A pipeline to deploy a containerized LLM.'
)
def llm_deploy_pipeline(image: str):
    deploy_task = deploy_llm_op(image=image)


if __name__ == '__main__':
    kfp.compiler.Compiler().compile(llm_deploy_pipeline, 'llm_deploy_pipeline.yaml')
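Because the Service above sets no `type`, Kubernetes creates a ClusterIP Service, which is reachable only from inside the cluster under a DNS name of the form `<service>.<namespace>.svc.cluster.local`. The sketch below builds the in-cluster URL for the `/predict` route. `service_url` is a hypothetical helper, and `cluster.local` is the default cluster domain, which some clusters change.

```python
def service_url(service, namespace="default", port=80, path="/predict",
                cluster_domain="cluster.local"):
    """Return the in-cluster HTTP URL for a Kubernetes Service."""
    host = f"{service}.{namespace}.svc.{cluster_domain}"
    return f"http://{host}:{port}{path}"


print(service_url("llm-service"))
# http://llm-service.default.svc.cluster.local:80/predict
```

To reach the model from outside the cluster, the Service would need a different type such as `LoadBalancer`, or an Ingress in front of it.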

Step 4: Run the Deployment Pipeline

Upload and run the deployment pipeline.


# Upload the deployment pipeline

kfp_client = kfp.Client()

kfp_client.upload_pipeline(pipeline_package_path='llm_deploy_pipeline.yaml', pipeline_name='LLM Deployment Pipeline')



# Run the deployment pipeline

experiment = kfp_client.create_experiment('LLM Deployment Experiment')

run = kfp_client.run_pipeline(

    experiment.id, 

    'llm_deploy_pipeline_run', 

    'llm_deploy_pipeline.yaml', 

    params={'image': '<your-registry-name>.azurecr.io/llm:latest'}

)

Conclusion

By following these steps, you can deploy a containerized LLM using Kubeflow on Azure. This process involves containerizing your LLM application, pushing the Docker image to a container registry, creating a deployment pipeline in Kubeflow, and running the pipeline to deploy your LLM application on Azure Kubernetes Service (AKS). Adjust the specifics as needed for your actual use case.

You can get more help here. You can also find many machine learning and LLM notebooks, including a few for Kubeflow, here.
