Vivek0712

Posted on Feb 14, 2025

Predict NYC Taxi Fares | End-to-end Azure MLOps with GitHub Actions

Implementation: https://github.com/Vivek0712/mlops-v2-gha-demo

AI workload operations revolve around curating data and consuming it. Operational practices help you maintain the quality, reliability, security, ethics, and other standards you prioritize for your AI/ML workloads. In this blog, you’ll learn how to set up an end-to-end MLOps pipeline on Azure using GitHub Actions to predict NYC taxi fares with a linear regression model. The project draws on established operational methodologies: DevOps, DataOps, MLOps, and GenAIOps.

Introduction to AI Workloads and Operational Methodologies
Understanding GitHub Actions
Why Azure Machine Learning with GitHub Actions?
Prerequisites
Set Up Authentication with Azure and GitHub Actions
- Create a Service Principal
- Assign Permissions to the Service Principal
Set Up the GitHub Repo
Deploy the Machine Learning Project Infrastructure with GitHub Actions
Next Steps
References

Introduction to AI Workloads and Operational Methodologies

AI workloads can be broken down into three main categories:

Application Development
Data Handling
AI Model Management

Different operational methodologies come into play across these categories:

DevOps focuses on managing the application development lifecycle with automated CI/CD pipelines and monitoring.
DataOps extends DevOps to handle data extraction, transformation, and loading (ETL/ELT) processes. It monitors data flow, data cleansing, and anomaly detection.
MLOps operationalizes machine learning workflows, handling model training, validation, and deployment pipelines.
GenAIOps, a specialized form of MLOps, targets generative AI solutions, focusing on model discovery, refining pretrained models, and handling unstructured or semi-structured data.

These methodologies often overlap. For instance, in discriminative AI, DataOps plays a larger role, while in generative AI, DevOps might be more heavily utilized due to more complex application development pipelines.

Understanding GitHub Actions

GitHub Actions is a CI/CD platform that can automate software development workflows. It allows you to:

Build and test every pull request or commit.
Deploy merged pull requests to production.
Automate tasks on other repository events (e.g., add labels to issues on creation).

GitHub Actions provides Linux, Windows, and macOS runners, or you can use self-hosted runners. This flexibility ensures that you can tailor your workflows to your specific infrastructure needs.

Why Azure Machine Learning with GitHub Actions?

Azure Machine Learning integrates seamlessly with GitHub Actions to automate various stages of the machine learning lifecycle, such as:

Deploying infrastructure for ML (compute, storage, networking).
Performing data preparation (ETL/ELT).
Training ML models at scale.
Deploying trained models as real-time endpoints or batch endpoints.
Monitoring deployed models to track performance and detect anomalies.

In this blog, we’ll build an end-to-end MLOps pipeline to train a linear regression model to predict NYC taxi fares. We’ll be using the recommended Azure architecture for MLOps and the Azure MLOps (v2) solution accelerator to simplify the process.
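To make the modeling goal concrete, here is a minimal sketch of what a linear regression fare model does, written in plain Python. It is not the training code from the accelerator repo: the trip data is invented for illustration, it uses a single feature (trip distance), and the function names are hypothetical.

```python
# Hypothetical toy data: (trip distance in miles, fare in dollars).
# These numbers are invented for illustration, not real NYC taxi records.
trips = [(1.0, 5.5), (2.0, 8.0), (3.0, 10.5), (4.0, 13.0), (5.0, 15.5)]


def fit_line(points):
    """Ordinary least squares with one feature: fare = intercept + slope * distance."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in points)
    variance = sum((x - mean_x) ** 2 for x, _ in points)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict_fare(distance, slope, intercept):
    """Apply the fitted line to a new trip distance."""
    return intercept + slope * distance


slope, intercept = fit_line(trips)
print(f"fare = {intercept:.2f} + {slope:.2f} * distance")
print(f"predicted fare for 6 miles: {predict_fare(6.0, slope, intercept):.2f}")
```

A real fare model would use more features (pickup time, passenger count, and so on) and a library implementation, but the idea of fitting coefficients to historical trips is the same.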

Prerequisites

Azure Subscription
- If you don’t have one, you can create a free Azure account.
Azure Machine Learning Workspace
- Create a workspace.

Git installed locally.
A GitHub repository or a GitHub organization where you can create or fork repositories.
Basic understanding of Azure Machine Learning concepts like workspaces, compute, and endpoints.

Set Up Authentication with Azure and GitHub Actions

Before setting up an MLOps project using Azure Machine Learning, you need to configure secure authentication between GitHub Actions and Azure.

Create a Service Principal

Go to Azure Portal > Microsoft Entra ID (formerly Azure Active Directory) > App registrations.
Select New Registration.

Configure:
- Name: Vivek-azureML-GHdemo (or any name you prefer)
- Supported account types: Accounts in any organizational directory (Any Microsoft Entra directory - Multitenant)
After creating the app registration, go to Certificates & Secrets and:

Select New client secret.
Copy the secret value and store it in a safe place (this value will be used in GitHub secrets).

Copy the Application (client) ID and Directory (tenant) ID from the app registration’s Overview page, and your Subscription ID from the Subscriptions page, for the next steps.

Important: Keep your secret value secure. You will not be able to view it again once you leave this page.

Assign Permissions to the Service Principal

Navigate to your Azure Subscription.
Go to Access Control (IAM).

Select +Add and choose Add role assignment.
Assign the Contributor role to your new Service Principal (Vivek-azureML-GHdemo) so that it has the necessary permissions to create and manage resources.

Set Up the GitHub Repo

We’ll use the MLOps v2 Demo Template Repo as our starting point:

Fork the repository
- Go to https://github.com/Azure/mlops-v2-gha-demo/fork to create a fork in your own GitHub org.

Configure secrets in your forked repository:

From your GitHub repo, go to Settings > Secrets and variables > Actions.
Select New repository secret.
Set the following secrets using the corresponding values from your Azure Service Principal:
- AZURE_CREDENTIALS (Paste a JSON object built from your Service Principal values, in the format shown below)
- ARM_CLIENT_ID
- ARM_CLIENT_SECRET
- ARM_SUBSCRIPTION_ID
- ARM_TENANT_ID

Here’s an example of what your AZURE_CREDENTIALS secret might look like (JSON snippet):

   {
     "clientId": "<YOUR_SERVICE_PRINCIPAL_APP_ID>",
     "clientSecret": "<YOUR_SERVICE_PRINCIPAL_SECRET>",
     "subscriptionId": "<YOUR_AZURE_SUBSCRIPTION_ID>",
     "tenantId": "<YOUR_AZURE_TENANT_ID>",
     "activeDirectoryEndpointUrl": "https://login.microsoftonline.com",
     "resourceManagerEndpointUrl": "https://management.azure.com/",
     "activeDirectoryGraphResourceId": "https://graph.windows.net/",
     "sqlManagementEndpointUrl": "https://management.core.windows.net:8443/",
     "galleryEndpointUrl": "https://gallery.azure.com/",
     "managementEndpointUrl": "https://management.core.windows.net/"
   }
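If you created the Service Principal in the portal, you have the individual values rather than a ready-made JSON document. A small script like the following could assemble the JSON for you. This is only a sketch: the helper name is hypothetical, and it assumes the four identity fields from the example above are the ones you need to fill in. The endpoint URLs from the example can be added to the dictionary in the same way.

```python
import json

# The four identity fields from the example secret above.
IDENTITY_KEYS = ("clientId", "clientSecret", "subscriptionId", "tenantId")


def build_azure_credentials(client_id, client_secret, subscription_id, tenant_id):
    """Return a JSON string to paste into the AZURE_CREDENTIALS secret.

    Hypothetical helper for illustration; not part of the accelerator repo.
    """
    values = (client_id, client_secret, subscription_id, tenant_id)
    credentials = dict(zip(IDENTITY_KEYS, values))
    # Fail early if any value was left empty.
    missing = [key for key, value in credentials.items() if not value]
    if missing:
        raise ValueError(f"missing values for: {', '.join(missing)}")
    return json.dumps(credentials, indent=2)


# Placeholders only; never commit real secrets to the repository.
print(build_azure_credentials("<APP_ID>", "<SECRET>", "<SUBSCRIPTION_ID>", "<TENANT_ID>"))
```

Run it locally with your real values, paste the output into the secret, and avoid saving the output to a file inside the repo.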

Deploy the Machine Learning Project Infrastructure with GitHub Actions

Once your secrets are in place, you can use the Azure MLOps (v2) solution accelerator code in the forked repository to deploy the required Azure Machine Learning resources (e.g., workspace, compute, and data stores).

In your forked repo, go to Actions.
Find the workflow that corresponds to infrastructure deployment. Its file is usually named something like .github/workflows/infra-deploy.yml.
Manually run the workflow or wait for the next push/merge to trigger it automatically.
Monitor the workflow’s progress on the Actions tab.
Once completed, verify the resources in the Azure portal:
- Azure Machine Learning Workspace
- Any associated resource group and compute resources
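The manual run in the steps above can also be started from a script through the GitHub REST API's workflow dispatch endpoint. The sketch below only builds the request and assumes a few things that you should check against your fork: that the workflow declares a workflow_dispatch trigger, that its file is really called infra-deploy.yml (a placeholder taken from the step above), and that you have a GitHub token with permission to run workflows. The owner name and token are placeholders.

```python
import json
import urllib.request


def build_dispatch_request(owner, repo, workflow_file, token, ref="main"):
    """Build (but do not send) a workflow_dispatch request for one workflow file."""
    url = (
        f"https://api.github.com/repos/{owner}/{repo}"
        f"/actions/workflows/{workflow_file}/dispatches"
    )
    # The API expects the branch or tag to run against in the "ref" field.
    body = json.dumps({"ref": ref}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Accept": "application/vnd.github+json",
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


# Placeholder values; replace them with your fork's details.
request = build_dispatch_request(
    "your-org", "mlops-v2-gha-demo", "infra-deploy.yml", "YOUR_GITHUB_TOKEN"
)
print(request.get_method(), request.full_url)
# To actually trigger the run: urllib.request.urlopen(request)
```

Keeping the send step separate makes it easy to inspect the URL and payload before anything is triggered.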

Note: If you need more specific instructions (for example, customizing the workflow), refer to the official documentation:

Deploy machine learning project infrastructure with GitHub Actions

Next Steps

After your infrastructure is deployed, you can proceed to:

Configure Data Preparation
- Use your new Azure Machine Learning workspace to set up data ingestion and data versioning (e.g., using Azure Blob Storage or Azure Data Lake).
Train and Evaluate the Model
- Commit your training scripts (e.g., a linear regression training script) to the GitHub repo.
- Update your GitHub Actions workflow to automatically trigger model training and evaluation upon code updates.
Model Deployment
- Deploy the trained model to an Azure Container Instance or Azure Kubernetes Service for real-time inference.
Monitor Your Model
- Set up alerts and performance monitoring for your deployed endpoints.
- Integrate logs, metrics, and triggers for data drift detection or model re-training.
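As a rough illustration of what a data drift trigger might check, the sketch below compares the mean of a recent batch of a feature against the training data, measured in training standard deviations. This is not Azure Machine Learning's monitoring feature. The function names, the sample distances, and the threshold of 2.0 are all assumptions chosen for the example.

```python
from statistics import mean, stdev


def mean_shift(reference, recent):
    """Shift of the recent mean, in units of the reference standard deviation."""
    return abs(mean(recent) - mean(reference)) / stdev(reference)


def needs_retraining(reference, recent, threshold=2.0):
    """Flag drift when the mean has moved more than `threshold` deviations."""
    return mean_shift(reference, recent) > threshold


# Hypothetical trip distances (miles) seen at training time and in production.
training_distances = [1.0, 2.0, 3.0, 4.0, 5.0]
similar_batch = [2.0, 3.0, 4.0]
shifted_batch = [9.0, 10.0, 11.0]

print(needs_retraining(training_distances, similar_batch))
print(needs_retraining(training_distances, shifted_batch))
```

A check like this could run on a schedule and start the training workflow when it reports drift. Production setups usually compare full distributions rather than just the mean.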

For a full walkthrough of these steps, follow the official documentation:

Set up an MLOps project with GitHub Actions

References

Congratulations! You’ve laid the groundwork for an end-to-end MLOps pipeline using Azure Machine Learning and GitHub Actions to predict NYC taxi fares. Once you complete the next steps above, you can continuously integrate and deploy changes to your ML models, which helps keep your AI solutions reliable and scalable.

Happy MLOps!
