Getting started with Azure Machine Learning

In this second part of the Azure Machine Learning series, we will cover the following topics.

Understanding Machine Learning Workflows

Let's start with a quick recap of the introductory blog on Machine Learning. A typical machine learning workflow moves from data ingestion and preparation, through model training and evaluation, to deployment and monitoring.

Challenges in MLOps

  • Logically creating and maintaining resources
  • Keeping track of ML experiments and each run of an experiment; creating, reusing, and deleting environments with their dependencies
  • Choosing, provisioning, and reusing local or cloud-based compute
  • Maintaining different versions of a model
  • Reusing existing ML workflows
  • Deploying and maintaining ML models

Tools to perform Machine Learning in Azure

  • Azure Portal
    • Azure Machine Learning Studio (UI + code)
    • Azure Machine Learning Designer (drag-and-drop UI with minimal coding)
  • Azure CLI
  • Azure Machine Learning Python SDK
    • Using Azure ML Studio
    • Visual Studio Code + AzureML Extensions
    • Other IDEs

Architecture of Azure Machine Learning

(Diagram: the architecture of Azure Machine Learning.)

Before we get started...

Pre-requisites

  • Basic Python programming
  • Understanding of Machine Learning Workflows

Setup

  • An Azure account with an active subscription
  • Create a Machine Learning resource in the Azure portal
  • Provide a name for the workspace and the container registry
  • Launch the Machine Learning Studio
  • Create a compute resource (see the SDK sketch after this setup list)
    1. On the left side, under Manage, select Compute.
    2. Select +New to create a new compute instance.
    3. Keep all the defaults on the first page, select Next.
    4. Supply a name and select Create.
    5. In about two minutes, you'll see the State of the compute instance change from Creating to Running. It's now ready to go.
  • Create Dataset

You can create datasets from datastores, public URLs, and Azure Open Datasets.

  • Launch a Notebook instance

You can follow this doc to create the necessary Azure ML resources.
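If you prefer code over the studio UI, the portal steps above can also be done with the Python SDK. Below is a minimal sketch of creating a compute instance; the instance name and VM size are my own assumptions, and it presumes you have already downloaded the workspace's config.json (creating a workspace from the SDK is covered in the next section).

from azureml.core import Workspace
from azureml.core.compute import ComputeInstance
from azureml.core.compute_target import ComputeTargetException

ws = Workspace.from_config()  # reads the config.json downloaded from the portal

instance_name = "my-ci-instance"  # hypothetical name; must be unique in the region
try:
    instance = ComputeInstance(workspace=ws, name=instance_name)
    print("Found existing compute instance")
except ComputeTargetException:
    # The VM size is an assumption; pick one available in your region
    compute_config = ComputeInstance.provisioning_configuration(vm_size="STANDARD_DS3_V2")
    instance = ComputeInstance.create(ws, instance_name, compute_config)
    instance.wait_for_completion(show_output=True)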

Creating Azure ML Resources

Workspace

An Azure ML Workspace is an Azure resource that organizes and coordinates the actions of many other Azure resources to assist in executing and sharing machine learning workflows. In particular, an Azure ML Workspace coordinates storage, databases, and compute resources, providing added functionality for machine learning experimentation, deployment, inference, and the monitoring of deployed models.

You can create or access a workspace by

  • Using the constructor
  • Using a config.json file
from azureml.core import Workspace

# Fill in your own values (placeholders)
subscription_id = "<your-subscription-id>"
resource_group = "<your-resource-group>"
workspace_name = "<your-workspace-name>"
workspace_region = "<your-region>"  # e.g. "eastus"

try:
    ws = Workspace(subscription_id=subscription_id,
                   resource_group=resource_group,
                   workspace_name=workspace_name)
    ws.write_config()
    ws = Workspace.from_config()
    print("Workspace configuration succeeded. Skip the workspace creation steps below")
except Exception:
    print("Workspace not accessible. Create the workspace")
    ws = Workspace.create(name=workspace_name,
                          subscription_id=subscription_id,
                          resource_group=resource_group,
                          create_resource_group=True,
                          location=workspace_region)

# Fetch and display the workspace
ws = Workspace.from_config()
# Display the details
# ws.get_details()
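For reference, write_config() saves the connection details to .azureml/config.json by default, so later sessions can reconnect without hard-coding IDs. A minimal sketch (the path shown is the SDK's default location):

from azureml.core import Workspace

# config.json holds subscription_id, resource_group and workspace_name
ws = Workspace.from_config(path=".azureml/config.json")
print(ws.name, ws.location, ws.resource_group, sep="\n")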

Compute

All ML experiments require compute to execute. To create or access a compute resource, use the ComputeTarget class.

from azureml.core.compute import ComputeTarget, AmlCompute
from azureml.core.compute_target import ComputeTargetException

# Choose a name for your CPU cluster
cpu_cluster_name = "cpu-cluster"

# Verify that the cluster does not already exist
try:
    cpu_cluster = ComputeTarget(workspace=ws, name=cpu_cluster_name)
    print("Found existing cpu-cluster")
except ComputeTargetException:
    print("Creating new cpu-cluster")
    # Specify the configuration for the new cluster
    compute_config = AmlCompute.provisioning_configuration(vm_size="STANDARD_D2_V2",
                                                           min_nodes=0,
                                                           max_nodes=4)
    # Create the cluster with the specified name and configuration
    cpu_cluster = ComputeTarget.create(ws, cpu_cluster_name, compute_config)
    # Wait for provisioning to complete, showing the output log
    cpu_cluster.wait_for_completion(show_output=True)
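Because min_nodes=0, the cluster scales down to zero nodes when idle, so you only pay while jobs are running. If you are unsure which vm_size values your region offers, the SDK can list them. A small sketch, assuming the ws workspace object from above (the exact dictionary keys can vary by SDK version):

from azureml.core.compute import AmlCompute

# List a few of the VM sizes supported for AmlCompute in the workspace's region
for vm in AmlCompute.supported_vmsizes(workspace=ws)[:5]:
    print(vm["name"], "-", vm["vCPUs"], "vCPUs")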

Experiment

In Azure Machine Learning, an experiment is a named process, usually the running of a script or a pipeline, that can generate metrics and outputs and be tracked in the Azure Machine Learning workspace.

Create Experiment
An experiment can be run multiple times with different data, code, or settings; Azure Machine Learning tracks each run, enabling you to view the run history and compare results.

The Experiment Run Context

When you submit an experiment, you use its run context to initialize and end the experiment run that is tracked in Azure Machine Learning. You can log and monitor every run in the experiment; a logging sketch follows the snippet below.

from azureml.core import Experiment

# Create an Azure ML experiment in your workspace
experiment = Experiment(workspace=ws, name='my-experiment')

# Start logging data from the experiment
run = experiment.start_logging()

# All your experiment code goes here!!!
print("Hello ML World!!")

# Complete the experiment run
run.complete()
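The run object is what makes the tracking useful: you can log individual values and lists against a run and read them back later. A minimal sketch under the same assumptions as above (the metric names and values are made up):

from azureml.core import Experiment

experiment = Experiment(workspace=ws, name='my-experiment')
run = experiment.start_logging()

# Log a single named metric and a list of values (names/values are hypothetical)
run.log('accuracy', 0.87)
run.log_list('losses', [0.9, 0.6, 0.4])

run.complete()

# Read the metrics back from the tracked run
print(run.get_metrics())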

Data

Any machine learning problem involves working with data. It involves:

  • Importing the data from the data source
  • Registering and maintaining the dataset in a datastore
  • Versioning the dataset

You can learn more about Datasets here

# Check and list the datasets attached to our workspace
from azureml.core import Dataset

print("\nData Stores:")
# Get the default datastore
default_ds = ws.get_default_datastore()

# Enumerate all datastores, indicating which is the default
for ds_name in ws.datastores:
    print(ds_name, "- Default =", ds_name == default_ds.name)

print("\nDatasets:")
for dataset_name in list(ws.datasets.keys()):
    dataset = Dataset.get_by_name(ws, dataset_name)
    print("\t", dataset.name, 'version', dataset.version)

# Using the data (here: the last dataset listed above)
tab_data_set = Dataset.get_by_name(ws, dataset_name)

# Take the first 20 rows and convert them to a pandas DataFrame
tab_data_set.take(20).to_pandas_dataframe()

# Upload your own data
# default_ds.upload_files(files=['./data/diabetes.csv'],  # Upload the diabetes csv files in /data
#                         target_path='diabetes-data/',   # Put it in a folder path in the datastore
#                         overwrite=True,                 # Replace existing files of the same name
#                         show_progress=True)

# Register the dataset with the workspace
try:
    tab_data_set = tab_data_set.register(workspace=ws,
                                         name='diabetes dataset',
                                         description='diabetes data',
                                         tags={'format': 'CSV'},
                                         create_new_version=True)
    print('Datasets registered')
except Exception as ex:
    print(ex)

print("Datasets:")
for dataset_name in list(ws.datasets.keys()):
    dataset = Dataset.get_by_name(ws, dataset_name)
    print("\t", dataset.name, 'version', dataset.version)
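To close the loop on the commented-out upload above: once files land in a datastore, you can build a tabular dataset straight from their path and then register it as shown earlier. A minimal sketch, reusing the hypothetical diabetes-data/ path from the upload comment:

from azureml.core import Dataset

# Build a tabular dataset from CSV files in the default datastore
# (the 'diabetes-data/' path matches the hypothetical upload above)
csv_paths = [(default_ds, 'diabetes-data/*.csv')]
tab_data_set = Dataset.Tabular.from_delimited_files(path=csv_paths)

# Preview a few rows before registering
print(tab_data_set.take(5).to_pandas_dataframe())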

Now we have all the necessary resources to train, deploy, and monitor ML models.

Stay tuned for the next blog in this series, where we will do exactly that.
