Darian Vance

Posted on • Originally published at wp.me

Solved: Automating Azure VM Shutdown at Night to Save Costs (Python SDK)

🚀 Executive Summary

TL;DR: To significantly reduce Azure compute costs, this guide outlines how to automate the shutdown of non-production Azure Virtual Machines at night. It leverages the Azure Python SDK to deallocate VMs, ensuring you only pay for compute resources when actively in use.

🎯 Key Takeaways

  • Secure programmatic access to Azure VMs is achieved by creating an Azure Service Principal with the ‘Virtual Machine Contributor’ role, scoped to the subscription or specific resource groups.
  • The Azure Python SDK, specifically azure-identity for DefaultAzureCredential and azure-mgmt-compute for ComputeManagementClient, is used to authenticate and perform VM deallocation operations.
  • Scheduling the Python script with cron (or a similar scheduler) requires careful management of environment variables (AZURE_CLIENT_ID, AZURE_CLIENT_SECRET, AZURE_TENANT_ID, AZURE_SUBSCRIPTION_ID) to ensure the script can authenticate successfully.

Automating Azure VM Shutdown at Night to Save Costs (Python SDK)

Welcome to TechResolve, where we empower you with practical solutions for cloud optimization. Managing cloud infrastructure efficiently is key to controlling operational costs. One common scenario where expenses can quickly escalate is leaving Azure Virtual Machines (VMs) running 24/7, especially for development, testing, or non-production environments that are only utilized during business hours.

This tutorial provides a step-by-step guide to automating the shutdown of your Azure VMs at night using the Azure Python SDK. Deallocating VMs during off-peak hours can significantly reduce your Azure compute costs, because compute charges stop while a VM is deallocated (storage for its disks is still billed). We’ll walk through setting up the necessary Azure components, developing a Python script, and scheduling its execution, so that your cloud expenditure aligns with your actual usage.

Prerequisites

Before we begin, ensure you have the following in place:

  • An active Azure subscription.
  • Azure CLI installed and configured on your local machine or a cloud shell. This is used for setting up the Service Principal.
  • Python 3.x installed on the machine where you intend to run the automation script.
  • Basic familiarity with Python programming and Azure resource management concepts.
  • Access to a Linux-based system (or WSL) for scheduling with cron, or an alternative scheduling service for Windows/Azure.

Step-by-Step Guide

Step 1: Create an Azure Service Principal and Assign Permissions

To allow our Python script to interact with your Azure resources securely and without human intervention, we’ll create a Service Principal in Microsoft Entra ID (formerly Azure Active Directory). This acts as an identity for your application.

Execute the following Azure CLI command. Replace [YOUR_SUBSCRIPTION_ID] with your actual Azure subscription ID. Choose a unique name for your Service Principal (e.g., http://AzureVMSHutdownSP).

az ad sp create-for-rbac --name "http://AzureVMSHutdownSP" --role "Virtual Machine Contributor" --scopes "/subscriptions/[YOUR_SUBSCRIPTION_ID]"

This command creates a Service Principal and assigns it the “Virtual Machine Contributor” role at the subscription level. This role provides the permissions needed to deallocate (shut down) VMs. The output contains three important credentials: appId (client ID), password (client secret), and tenant (tenant ID).

Make sure to securely save the appId, password, and tenant from the command’s output. We will use these as environment variables in a later step.
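
As a convenience, the mapping from the command’s output to the environment variables used later can be sketched in a few lines of Python. This is a minimal sketch, not part of the original script: the helper name sp_output_to_exports is hypothetical, and it assumes you pass in the JSON text printed by the CLI, containing the appId, password, and tenant keys mentioned above. The sample values are made up.

```python
import json
import shlex

# Keys printed by `az ad sp create-for-rbac` (as described above), mapped to
# the environment variable names used later in this guide.
OUTPUT_KEY_TO_ENV = {
    "appId": "AZURE_CLIENT_ID",
    "password": "AZURE_CLIENT_SECRET",
    "tenant": "AZURE_TENANT_ID",
}


def sp_output_to_exports(cli_json, subscription_id):
    """Turn the CLI's JSON output into a list of shell `export` lines.

    Hypothetical helper for illustration; values are shell-quoted so that
    secrets containing spaces or symbols survive being pasted into a shell.
    """
    data = json.loads(cli_json)
    missing = [key for key in OUTPUT_KEY_TO_ENV if key not in data]
    if missing:
        raise KeyError("Missing keys in CLI output: " + ", ".join(missing))
    lines = [
        f"export {env_name}={shlex.quote(data[key])}"
        for key, env_name in OUTPUT_KEY_TO_ENV.items()
    ]
    lines.append(f"export AZURE_SUBSCRIPTION_ID={shlex.quote(subscription_id)}")
    return lines


# Made-up sample values, for illustration only.
sample = '{"appId": "app-123", "password": "s3cret", "tenant": "tenant-456"}'
for line in sp_output_to_exports(sample, "sub-789"):
    print(line)
```

Treat the resulting lines like the secret itself: avoid committing them to version control or printing them into shared logs.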

Step 2: Install Python SDK and Develop the Shutdown Script

Next, we’ll set up our Python environment and write the script that identifies and shuts down your Azure VMs. First, install the required Azure SDK packages:

python3 -m pip install azure-identity azure-mgmt-compute

Now, create a Python file, for example, azure_vm_shutdown_script.py, and add the following code. Remember to replace [YOUR_SUBSCRIPTION_ID], [YOUR_RESOURCE_GROUP_NAME_1], and [YOUR_RESOURCE_GROUP_NAME_2] with your specific values.

import os
import logging
from azure.identity import DefaultAzureCredential
from azure.mgmt.compute import ComputeManagementClient

# Configure logging for better visibility
logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s')

# Azure subscription ID (can also be an environment variable or passed explicitly)
# It's recommended to set this as an environment variable for production use.
subscription_id = os.environ.get("AZURE_SUBSCRIPTION_ID", "[YOUR_SUBSCRIPTION_ID]")
if not subscription_id or subscription_id == "[YOUR_SUBSCRIPTION_ID]":
    logging.error("AZURE_SUBSCRIPTION_ID not found or not updated. Please set the environment variable or provide it in the script.")
    raise SystemExit(1)

logging.info(f"Attempting to authenticate for subscription: {subscription_id}")

try:
    # Build the credential and the compute client.
    # DefaultAzureCredential picks up the Service Principal from environment
    # variables, or falls back to Azure CLI/IDE configurations.
    # No token is requested here: authentication happens on the first API call,
    # so credential problems surface later, when VMs are listed.
    credential = DefaultAzureCredential()
    compute_client = ComputeManagementClient(credential, subscription_id)
    logging.info("Azure compute client created.")
except Exception as e:
    logging.error(f"Failed to create the Azure compute client: {e}")
    raise SystemExit(1)

def shutdown_vms_in_resource_group(resource_group_name):
    """
    Shuts down (deallocates) all running VMs within a specified resource group.
    """
    logging.info(f"Checking VMs in resource group: {resource_group_name}")
    try:
        vms = compute_client.virtual_machines.list(resource_group_name)
        for vm in vms:
            # The list() call does not return each VM's power state. To check it,
            # fetch the instance view with
            # compute_client.virtual_machines.get(resource_group_name, vm.name, expand='instanceView')
            # and inspect vm_details.instance_view.statuses.
            # For simplicity, this script skips that check: begin_deallocate is
            # generally safe to call on a VM that is already deallocated.
            # For more robust filtering (e.g., by tag), you could use list_all()
            # and filter by resource group and other criteria.

            logging.info(f"Processing VM: {vm.name} (ID: {vm.id})")

            # Deallocate the VM. This stops the VM and releases its compute resources.
            # We use begin_deallocate and .wait() to ensure the operation completes.
            logging.info(f"Initiating deallocation for VM: '{vm.name}'.")
            compute_client.virtual_machines.begin_deallocate(resource_group_name, vm.name).wait()
            logging.info(f"VM '{vm.name}' deallocated successfully.")

    except Exception as e:
        logging.error(f"Error processing resource group '{resource_group_name}': {e}")

# List of resource groups to manage. You can expand this list as needed.
# VMs in these resource groups will be targeted for shutdown.
resource_groups_to_manage = ["[YOUR_RESOURCE_GROUP_NAME_1]", "[YOUR_RESOURCE_GROUP_NAME_2]"]

# Stop early if the list is empty or any "[YOUR_...]" placeholder was left in place.
if not resource_groups_to_manage or any(rg.startswith("[YOUR_") for rg in resource_groups_to_manage):
    logging.warning("No valid resource groups specified for management or placeholders still present. Exiting.")
    raise SystemExit(0)

logging.info(f"Starting VM shutdown process for specified resource groups: {', '.join(resource_groups_to_manage)}")
for rg in resource_groups_to_manage:
    shutdown_vms_in_resource_group(rg)

logging.info("Azure VM shutdown script finished.")

Logic Explanation:

  • logging: Configured to provide informative messages about the script’s progress.
  • DefaultAzureCredential: This class from azure.identity is powerful. It attempts to authenticate using a chain of methods, including environment variables (which we’ll set next), Azure CLI, managed identities, etc. This makes the script flexible for different environments.
  • ComputeManagementClient: This is the client for interacting with Azure compute resources (like Virtual Machines).
  • shutdown_vms_in_resource_group function:
    • It lists all VMs within a specified resource group.
    • For each VM, it calls compute_client.virtual_machines.begin_deallocate().wait(). The deallocate operation stops the VM and releases its associated hardware, which is crucial for cost savings. .wait() ensures the operation completes before moving on.
    • Error handling is included to catch issues during VM processing.
  • resource_groups_to_manage: A list where you define which resource groups contain the VMs you want to shut down.
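
Because the script waits on each VM in turn, and a single failure inside the loop ends processing for the whole resource group, a variation worth considering is to start every deallocation first and only then wait on the results. The sketch below is one possible way to do that, not the script’s actual behaviour: deallocate_all is a hypothetical helper, and it assumes compute_client is a ComputeManagementClient created as in the script above.

```python
import logging


def deallocate_all(compute_client, resource_group_name):
    """Start deallocation for every VM in the group, then wait for each one.

    Returns a dict mapping VM name to True (succeeded) or False (failed).
    Hypothetical helper; `compute_client` is assumed to be a
    ComputeManagementClient like the one built in the main script.
    """
    pollers = {}
    results = {}

    # Phase 1: kick off every long-running operation without blocking on it.
    for vm in compute_client.virtual_machines.list(resource_group_name):
        try:
            pollers[vm.name] = compute_client.virtual_machines.begin_deallocate(
                resource_group_name, vm.name
            )
        except Exception as exc:  # one bad VM should not stop the others
            logging.error("Could not start deallocation for %s: %s", vm.name, exc)
            results[vm.name] = False

    # Phase 2: block until each operation finishes.
    for name, poller in pollers.items():
        try:
            poller.result()  # raises if the operation failed
            results[name] = True
        except Exception as exc:
            logging.error("Deallocation failed for %s: %s", name, exc)
            results[name] = False

    return results
```

With this shape, the total run time is roughly that of the slowest VM rather than the sum of all of them, and the returned dictionary makes it easy to log or alert on individual failures.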

Step 3: Configure Environment Variables

For security and flexibility, it’s best practice to pass your Service Principal credentials and subscription ID to the Python script via environment variables. The DefaultAzureCredential will automatically pick these up.

Set the following environment variables in your shell. Replace the bracketed placeholders with the values obtained from Step 1 and your subscription ID:

export AZURE_CLIENT_ID="[YOUR_APP_ID]"
export AZURE_CLIENT_SECRET="[YOUR_PASSWORD]"
export AZURE_TENANT_ID="[YOUR_TENANT_ID]"
export AZURE_SUBSCRIPTION_ID="[YOUR_SUBSCRIPTION_ID]"

You can add these lines to your user’s ~/.bashrc or ~/.profile file to make them persistent across sessions. After adding, run source ~/.bashrc (or . ~/.bashrc) to apply them immediately.

Alternatively, for more organized environments, you could use a config.env file and a tool like python-dotenv, but for simplicity, direct environment variables are effective.
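
Since a missing variable only shows up as an authentication error at run time, a small pre-flight check can make the failure easier to diagnose. The following is a minimal sketch under the assumption that the four variables above are the only ones needed; missing_azure_vars is a hypothetical helper name, and the "[YOUR_" test simply catches the placeholders used in this guide.

```python
import os

REQUIRED_VARS = (
    "AZURE_CLIENT_ID",
    "AZURE_CLIENT_SECRET",
    "AZURE_TENANT_ID",
    "AZURE_SUBSCRIPTION_ID",
)


def missing_azure_vars(env=None):
    """Return the required variable names that are unset, empty, or placeholders.

    `env` defaults to os.environ; passing a plain dict makes the check easy to test.
    """
    env = os.environ if env is None else env
    missing = []
    for name in REQUIRED_VARS:
        value = env.get(name, "").strip()
        if not value or value.startswith("[YOUR_"):
            missing.append(name)
    return missing


# Example with a partially configured environment (made-up values).
example = {"AZURE_CLIENT_ID": "abc", "AZURE_TENANT_ID": "[YOUR_TENANT_ID]"}
print(missing_azure_vars(example))
# ['AZURE_CLIENT_SECRET', 'AZURE_TENANT_ID', 'AZURE_SUBSCRIPTION_ID']
```

Calling missing_azure_vars() with no argument at the top of the shutdown script, and exiting with a clear message if the list is non-empty, is one way to use it.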

Step 4: Schedule the Script with Cron

To automate the execution of your Python script, we’ll use cron, a time-based job scheduler in Unix-like operating systems.

Open your cron editor:

crontab -e

Add the following line to schedule your script. This example runs the script every night at 1:00 AM in the cron host’s local time zone, which is UTC on many cloud servers. Make sure to use the full path to your Python script (e.g., /home/user/azure_vm_shutdown_script.py).

0 1 * * * python3 /home/user/azure_vm_shutdown_script.py

Cron Syntax Breakdown:

  • 0: Minute (0-59)
  • 1: Hour (0-23, where 0 is midnight)
  • *: Day of month (1-31)
  • *: Month (1-12)
  • *: Day of week (0-7, where 0 or 7 is Sunday)

This entry tells cron to execute python3 /home/user/azure_vm_shutdown_script.py at 1:00 AM every day.
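
To make the five fields concrete, here is a small Python sketch that checks whether a cron expression would fire at a given moment. It is a deliberately simplified illustration, not a replacement for cron: the function name cron_matches is hypothetical, only * and plain integers are supported (no ranges, lists, or steps), and the datetime is assumed to be in the cron host’s local time.

```python
from datetime import datetime


def _field_matches(field, value):
    """Match one cron field; only '*' and plain integers are supported here."""
    return field == "*" or int(field) == value


def cron_matches(expression, moment):
    """Return True if a five-field cron expression would fire at `moment`."""
    minute, hour, day_of_month, month, day_of_week = expression.split()

    # cron numbers Sunday as 0 (or 7); isoweekday() gives Monday=1 .. Sunday=7.
    cron_weekday = moment.isoweekday() % 7
    dow_ok = day_of_week == "*" or int(day_of_week) % 7 == cron_weekday
    dom_ok = _field_matches(day_of_month, moment.day)

    # When both day fields are restricted, cron fires if either one matches.
    if day_of_month != "*" and day_of_week != "*":
        day_ok = dom_ok or dow_ok
    else:
        day_ok = dom_ok and dow_ok

    return (
        _field_matches(minute, moment.minute)
        and _field_matches(hour, moment.hour)
        and _field_matches(month, moment.month)
        and day_ok
    )


print(cron_matches("0 1 * * *", datetime(2024, 1, 15, 1, 0)))   # True: 01:00
print(cron_matches("0 1 * * *", datetime(2024, 1, 15, 13, 0)))  # False: 13:00
print(cron_matches("0 1 * * 7", datetime(2024, 1, 14, 1, 0)))   # True: a Sunday
```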

Remember that cron jobs run with a limited environment. Ensure that all necessary environment variables (AZURE_CLIENT_ID, etc.) are set in the cron environment or sourced from a script that sets them before running your Python script. A common approach is to create a small wrapper script:

#!/bin/bash
# script_wrapper
export AZURE_CLIENT_ID="[YOUR_APP_ID]"
export AZURE_CLIENT_SECRET="[YOUR_PASSWORD]"
export AZURE_TENANT_ID="[YOUR_TENANT_ID]"
export AZURE_SUBSCRIPTION_ID="[YOUR_SUBSCRIPTION_ID]"
python3 /home/user/azure_vm_shutdown_script.py

Then, make the wrapper executable and, because it contains the client secret, readable only by its owner (chmod 700 script_wrapper). Finally, schedule the wrapper in cron: 0 1 * * * /home/user/script_wrapper.

Common Pitfalls

  • Insufficient Permissions: The Service Principal must have at least the “Virtual Machine Contributor” role assigned at the correct scope (subscription or resource group) to deallocate VMs. If the script fails with permission errors, verify the assigned role and scope in the Azure Portal.
  • Incorrect Environment Variables: If the AZURE_CLIENT_ID, AZURE_CLIENT_SECRET, AZURE_TENANT_ID, or AZURE_SUBSCRIPTION_ID are not set correctly or are not accessible by the cron job, the script will fail to authenticate. Double-check your environment variable setup, especially for cron jobs where the shell environment can differ.
  • VM Power State: While the deallocate command is idempotent, understanding the VM’s power state is essential for debugging. If a VM is already stopped (deallocated), the script might log that it’s attempting to deallocate, but no actual action occurs. For more precise control, you might fetch the VM’s instance view to confirm its current power state before attempting deallocation.
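
The power-state check mentioned in the last pitfall can be sketched as a pair of small functions. This is an illustrative sketch rather than part of the original script: the helper names are hypothetical, and the statuses are assumed to carry codes of the form "PowerState/running" or "PowerState/deallocated", as found in a VM’s instance view. The sample data is hand-made so the example runs without Azure access.

```python
from types import SimpleNamespace

POWER_STATE_PREFIX = "PowerState/"


def power_state(statuses):
    """Extract the power state (e.g. 'running') from instance-view statuses.

    Each status is expected to expose a `code` attribute such as
    'PowerState/running' or 'ProvisioningState/succeeded'.
    Returns None when no power-state entry is present.
    """
    for status in statuses or []:
        code = getattr(status, "code", None) or ""
        if code.startswith(POWER_STATE_PREFIX):
            return code[len(POWER_STATE_PREFIX):]
    return None


def needs_deallocation(statuses):
    """True unless the VM is already deallocated or in the process of deallocating."""
    return power_state(statuses) not in ("deallocated", "deallocating")


# In the shutdown script, the statuses would come from something like:
#   view = compute_client.virtual_machines.instance_view(resource_group_name, vm.name)
#   if needs_deallocation(view.statuses): ...
sample = [
    SimpleNamespace(code="ProvisioningState/succeeded"),
    SimpleNamespace(code="PowerState/running"),
]
print(power_state(sample))         # running
print(needs_deallocation(sample))  # True
```

Note that a VM shut down from inside the guest OS reports "stopped" rather than "deallocated" and still holds its compute allocation, so this sketch treats it as needing deallocation. An unknown state also defaults to attempting deallocation, which matches the script’s current behaviour.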

Conclusion

By following this guide, you have successfully implemented an automated solution for shutting down your Azure VMs at night using the Python SDK and scheduled with cron. This strategy is highly effective for reducing cloud expenditures by ensuring your compute resources are only active when needed.

This foundational script can be extended further: you could implement more sophisticated filtering (e.g., based on VM tags for opt-in/opt-out), dynamically discover resource groups, or even combine it with a morning startup script. Embrace automation to optimize your cloud environment and keep your costs in check.
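
As a starting point for the tag-based filtering idea, an opt-in check might look like the sketch below. The tag name auto-shutdown and the value true are hypothetical conventions chosen for illustration, not anything Azure defines; the function assumes it is given a VM’s tag dictionary, which may be None when the VM has no tags.

```python
def is_opted_in(tags, tag_name="auto-shutdown", enabled_value="true"):
    """Return True when a VM's tags opt it in to nightly shutdown.

    `tags` is the VM's tag dictionary (or None for an untagged VM).
    The tag name and value are hypothetical; use whatever convention suits you.
    """
    if not tags:
        return False
    # Compare tag names and values case-insensitively, ignoring stray spaces.
    normalized = {str(k).lower(): str(v).strip().lower() for k, v in tags.items()}
    return normalized.get(tag_name.lower()) == enabled_value.lower()


print(is_opted_in({"Auto-Shutdown": "True", "env": "dev"}))  # True
print(is_opted_in({"env": "prod"}))                          # False
print(is_opted_in(None))                                     # False
```

Inside the loop over VMs, a guard such as "if not is_opted_in(vm.tags): continue" would leave untagged machines running, which is the safer default for an opt-in scheme.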



👉 Read the original article on TechResolve.blog


