The list of all current and planned Azure regions can be reviewed on the publicly available Azure Global Infrastructure website [1]. However, it is not a flat list that could be skimmed or reused within a document; the page requires interaction by selecting the corresponding regions and countries.
Fortunately, there are Python and Pandas, which can be used to extract the various tables from the website and output the data as a convenient list: on the console, into a file, or for further processing.
Below, I will go through some lines of code with which this can be done. Be aware, however, that this approach has a significant drawback: if the wording on the website or the structure of the tables changes, the script will stop working and require modification.
I'm pretty sure that small tasks like this will soon be handled within a few seconds by AI companions, such as the new Bing - but doing it manually is still fun, certainly provides an opportunity to learn, and you know where the information comes from and how it was processed.
Happy reading. :-)
What are Azure Regions?
Azure operates in multiple datacenters around the world. These datacenters are grouped into geographic regions, giving [Azure customers] flexibility in choosing where to build [their] applications.
— Azure Documentation - What are Azure regions? [2]
How are new Regions announced?
New regions, as well as other Azure updates, are typically announced on the Azure Updates website. [3]
It can even be consumed as an RSS Feed.
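Since RSS is plain XML, the feed can be parsed with nothing but the Python standard library. The snippet below works on a minimal, made-up RSS fragment in the shape such feeds typically use; the actual Azure Updates feed contains more fields, so treat this as a sketch:

```python
import xml.etree.ElementTree as ET

# A minimal, hypothetical RSS 2.0 fragment; the real Azure Updates feed
# follows the same general shape but carries more metadata per item.
sample_rss = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Azure updates</title>
    <item><title>Example: New region announced</title></item>
    <item><title>Example: Service now generally available</title></item>
  </channel>
</rss>"""

root = ET.fromstring(sample_rss)
for item in root.iter('item'):
    print(item.findtext('title'))
```

In practice, the feed content would first be fetched over HTTP (for example with `requests`) and the response body would then be passed to `ET.fromstring()`.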
Using Python and Pandas to extract the various regions and their status
Every now and then I need a list of all current and planned Azure regions. Since navigating the Azure Global Infrastructure website [1] is a very manual task, I thought about retrieving the data using Python. Here are the steps I took.
First of all, a few libraries are required:
- `requests` for sending HTTP requests and retrieving data from the web [4]
- `pandas` to filter and present the data [5]
- `lxml`, since the `read_html()` function of Pandas uses this library by default (it does not need to be imported explicitly) [6]
The import statements then look like this (`search` is required later in the script to find certain keywords within the data):

```python
import requests
import pandas as pd
from re import search
```
As a next step, the data from the Azure website can be retrieved, and Pandas' `read_html()` function can be used to read the various tables into dataframes.
```python
url = 'https://azure.microsoft.com/en-us/explore/global-infrastructure/geographies/'
html = requests.get(url).content
df_list = pd.read_html(html)
```
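For readers unfamiliar with `read_html()`: it scans the HTML for `<table>` elements and returns one dataframe per table found, inferring the header from `<th>` cells. A tiny, self-contained example with a made-up table shows the behavior:

```python
from io import StringIO
import pandas as pd

# A hypothetical two-column table, just to demonstrate read_html()
html = """
<table>
  <tr><th>Region</th><th>Location</th></tr>
  <tr><td>East Asia</td><td>Hong Kong</td></tr>
</table>
"""

tables = pd.read_html(StringIO(html))  # returns a list of dataframes
print(len(tables))                     # 1
print(tables[0].columns.tolist())      # ['Region', 'Location']
```

Wrapping the string in `StringIO` avoids the deprecation warning newer Pandas versions emit when a literal HTML string is passed directly.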
Looking at the first slice of the dataframes shows that some further polishing is needed and that there is a lot of information that is not necessarily of interest for a basic list.
```python
for df in df_list[:1]:
    print(df)
```

Output:
```text
                                             Regions  \
0                                           Location
1                                        Year opened
2                        Availability Zones presence
3                                         Compliance
4                                     Data residency
5                                  Disaster Recovery
6                                 Products by region
7                                       Available to

                                 East Asia Start free  \
0                                           Hong Kong
1                                                2010
2                              Available with 3 zones
3   Global Compliance CIS Benchmark, CSA STAR Att...
4   Stored at rest in the Asia Pacific region Lea...
5    Cross-region options: Azure Site Recovery Re...
6                         See products in this region
7                          All customers and partners

                            Southeast Asia Start free
0                                           Singapore
1                                                2010
2                              Available with 3 zones
3   Global Compliance CIS Benchmark, CSA STAR Att...
4   Stored at rest in the Asia Pacific region Lea...
5    Cross-region options: Azure Site Recovery Re...
6                         See products in this region
7                          All customers and partners
```
From the above output, it is already possible to derive some intermediate conclusions:

- The dataframe whose columns start with `Regions` contains the original table headers and might not be of interest.
- In the other dataframes, the header row contains the region and the first row contains the location.
This can be confirmed when looking at the header rows only:
```python
for df in df_list[:1]:
    for dc in list(df):
        print(dc)
```

Output:

```text
Regions
East Asia Start free
Southeast Asia Start free
```
One could check whether a column header contains `Regions` and, if so, simply ignore it. If it contains `Coming soon`, the `state` variable is set to `planned`; otherwise the region is considered `active`.

The location is contained in the first row, which is why it can be derived from `df[dc][0]`. The region fields then need some clean-up, removing strings that are not of interest, like `Start free` or `Get started`.
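The classification and clean-up logic can be isolated into a small helper, here under the hypothetical name `classify()`, mirroring the logic described above (`str.removesuffix()` requires Python 3.9 or newer):

```python
from re import search

def classify(header):
    """Hypothetical helper: derive (region, state) from a table header cell."""
    state = 'planned' if search('Coming soon', header) else 'active'
    region = header
    for suffix in (' Start free', ' Get started', ' Coming soon'):
        region = region.removesuffix(suffix)
    return region, state

print(classify('East Asia Start free'))      # ('East Asia', 'active')
print(classify('Taiwan North Coming soon'))  # ('Taiwan North', 'planned')
```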
```python
regions_list = []
locations_list = []

for df in df_list:
    for dc in list(df):
        if search('Regions', dc):
            pass
        else:
            if search('Coming soon', dc):
                state = 'planned'
            else:
                state = 'active'
            az_location = df[dc][0]
            region = dc.removesuffix(' Start free')
            region = region.removesuffix(' Get started')
            region = region.removesuffix(' Coming soon')
```
Since there are duplicates in the list of dataframes (which I assume come from the fact that the website offers a selection of nearby datacenters based on the currently selected region), it may be required to skip some of them. A list containing all regions that have already been processed is useful for this.
Then, a dictionary can be created that holds the corresponding values and is finally appended to a list.
```python
            if region in regions_list:
                pass
            else:
                regions_list.append(region)
                locations_list.append({
                    'az_display_name': region,
                    'az_short_name': region.replace(' ', '').lower(),
                    'az_location': az_location,
                    'az_state': state
                })
```
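As a side note, a `set` is the more idiomatic structure for this kind of already-seen bookkeeping, since membership tests are O(1). A sketch of the same dedup logic on made-up header cells:

```python
# Hypothetical header cells, including a duplicate
headers = ['East Asia Start free', 'East Asia Start free', 'Taiwan North Coming soon']

seen = set()
records = []
for dc in headers:
    region = dc
    for suffix in (' Start free', ' Get started', ' Coming soon'):
        region = region.removesuffix(suffix)
    if region not in seen:
        seen.add(region)
        records.append({'az_display_name': region})

print([r['az_display_name'] for r in records])  # ['East Asia', 'Taiwan North']
```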
As a last step, the region details can be put into one (or separate) dataframes. Since the list of regions is relatively large, Pandas needs to be configured to display all rows when the output goes primarily to the console.
```python
pd.set_option('display.max_rows', None)
pd.set_option('display.max_columns', None)
```
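When the wider limits are only needed for a single print, `pandas.option_context` is a handy alternative, since it restores the defaults automatically after the block:

```python
import pandas as pd

df = pd.DataFrame({'region': [f'region-{i}' for i in range(100)]})

# Inside the block all rows are rendered; outside, the defaults apply again
with pd.option_context('display.max_rows', None):
    full_repr = repr(df)

# A middle row that the default, truncated repr would hide
print('region-50' in full_repr)  # True
```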
Once this is done, the dataframes can be printed to the console.
```python
df = pd.DataFrame(locations_list)
df_planned_regions = df[df['az_state'] == 'planned']
df_active_regions = df[df['az_state'] == 'active']

# Print only active regions
print(df_active_regions)

# Print only planned regions
print('\n')  # Newline
print(df_planned_regions)

# Print all regions, regardless of their status
print('\n')  # Newline
print(df)
```
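Rows can be filtered with standard boolean indexing on the `az_state` column; on a toy dataframe with two made-up records it behaves like this:

```python
import pandas as pd

# Two hypothetical records, mirroring the structure built above
df = pd.DataFrame([
    {'az_display_name': 'East Asia', 'az_state': 'active'},
    {'az_display_name': 'Taiwan North', 'az_state': 'planned'},
])

planned = df[df['az_state'] == 'planned']
print(planned['az_display_name'].tolist())  # ['Taiwan North']
```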
The list of planned regions would look something like this:
```text
         az_display_name       az_short_name      az_location az_state
13     Indonesia Central    indonesiacentral          Jakarta  planned
17         Malaysia West        malaysiawest     Kuala Lumpur  planned
18     New Zealand North     newzealandnorth         Auckland  planned
19  Saudi Arabia Central  saudiarabiacentral     Saudi Arabia  planned
20          Taiwan North         taiwannorth           Taipei  planned
21          Austria East         austriaeast           Vienna  planned
23          Denmark East         denmarkeast       Copenhagen  planned
29        Greece Central       greececentral           Athens  planned
30           Italy North          italynorth            Milan  planned
32        Poland Central       polandcentral           Warsaw  planned
33         Spain Central        spaincentral           Madrid  planned
41         Chile Central        chilecentral         Santiago  planned
42        Mexico Central       mexicocentral  Querétaro State  planned
46             East US 3             eastus3          Georgia  planned
60   US Sec West Central    ussecwestcentral      Undisclosed  planned
62        Israel Central       israelcentral           Israel  planned
```
Through the `to_markdown()` function (which requires the `tabulate` package to be installed), we can even create the output in Markdown format right away.
```python
print(df_planned_regions.to_markdown())
```
```text
|    | az_display_name      | az_short_name      | az_location     | az_state   |
|---:|:---------------------|:-------------------|:----------------|:-----------|
| 13 | Indonesia Central    | indonesiacentral   | Jakarta         | planned    |
| 17 | Malaysia West        | malaysiawest       | Kuala Lumpur    | planned    |
| 18 | New Zealand North    | newzealandnorth    | Auckland        | planned    |
| 19 | Saudi Arabia Central | saudiarabiacentral | Saudi Arabia    | planned    |
| 20 | Taiwan North         | taiwannorth        | Taipei          | planned    |
| 21 | Austria East         | austriaeast        | Vienna          | planned    |
| 23 | Denmark East         | denmarkeast        | Copenhagen      | planned    |
| 29 | Greece Central       | greececentral      | Athens          | planned    |
| 30 | Italy North          | italynorth         | Milan           | planned    |
| 32 | Poland Central       | polandcentral      | Warsaw          | planned    |
| 33 | Spain Central        | spaincentral       | Madrid          | planned    |
| 41 | Chile Central        | chilecentral       | Santiago        | planned    |
| 42 | Mexico Central       | mexicocentral      | Querétaro State | planned    |
| 46 | East US 3            | eastus3            | Georgia         | planned    |
| 60 | US Sec West Central  | ussecwestcentral   | Undisclosed     | planned    |
| 62 | Israel Central       | israelcentral      | Israel          | planned    |
```
The table renders nicely in markdown.
|    | az_display_name      | az_short_name      | az_location     | az_state |
|---:|:---------------------|:-------------------|:----------------|:---------|
| 13 | Indonesia Central    | indonesiacentral   | Jakarta         | planned  |
| 17 | Malaysia West        | malaysiawest       | Kuala Lumpur    | planned  |
| 18 | New Zealand North    | newzealandnorth    | Auckland        | planned  |
| 19 | Saudi Arabia Central | saudiarabiacentral | Saudi Arabia    | planned  |
| 20 | Taiwan North         | taiwannorth        | Taipei          | planned  |
| 21 | Austria East         | austriaeast        | Vienna          | planned  |
| 23 | Denmark East         | denmarkeast        | Copenhagen      | planned  |
| 29 | Greece Central       | greececentral      | Athens          | planned  |
| 30 | Italy North          | italynorth         | Milan           | planned  |
| 32 | Poland Central       | polandcentral      | Warsaw          | planned  |
| 33 | Spain Central        | spaincentral       | Madrid          | planned  |
| 41 | Chile Central        | chilecentral       | Santiago        | planned  |
| 42 | Mexico Central       | mexicocentral      | Querétaro State | planned  |
| 46 | East US 3            | eastus3            | Georgia         | planned  |
| 60 | US Sec West Central  | ussecwestcentral   | Undisclosed     | planned  |
| 62 | Israel Central       | israelcentral      | Israel          | planned  |
The complete script is below. It may not be the most elegant solution, but it shows how tables from websites can be extracted using Python and Pandas.
```python
import requests
import pandas as pd
from re import search


def list_azure_regions():
    url = 'https://azure.microsoft.com/en-us/explore/global-infrastructure/geographies/'
    html = requests.get(url).content
    df_list = pd.read_html(html)

    regions_list = []
    locations_list = []

    for df in df_list:
        for dc in list(df):
            if search('Regions', dc):
                pass
            else:
                if search('Coming soon', dc):
                    state = 'planned'
                else:
                    state = 'active'
                az_location = df[dc][0]
                region = dc.removesuffix(' Start free')
                region = region.removesuffix(' Get started')
                region = region.removesuffix(' Coming soon')
                if region in regions_list:
                    pass
                else:
                    regions_list.append(region)
                    locations_list.append({
                        'az_display_name': region,
                        'az_short_name': region.replace(' ', '').lower(),
                        'az_location': az_location,
                        'az_state': state
                    })

    return locations_list


if __name__ == '__main__':
    azure_regions = list_azure_regions()

    pd.set_option('display.max_rows', None)
    pd.set_option('display.max_columns', None)

    df = pd.DataFrame(azure_regions)
    df_planned_regions = df[df['az_state'] == 'planned']
    df_active_regions = df[df['az_state'] == 'active']

    # Print only active regions
    print(df_active_regions)

    # Print only planned regions
    print('\n')  # Newline
    print(df_planned_regions)

    # Print all regions, regardless of their status
    print('\n')  # Newline
    print(df)
```
References
| # | Title | URL | Accessed-On |
|---|-------|-----|-------------|
| 1 | Azure geographies | https://azure.microsoft.com/en-us/explore/global-infrastructure/geographies/#overview | 2023-02-16 |
| 2 | What are Azure regions? | https://learn.microsoft.com/en-us/azure/virtual-machines/regions#what-are-azure-regions | 2023-02-16 |
| 3 | Azure Updates | https://azure.microsoft.com/en-us/updates/ | 2023-02-16 |
| 4 | requests 2.28.2 | https://pypi.org/project/requests/ | 2023-02-16 |
| 5 | pandas 1.5.3 | https://pypi.org/project/pandas/ | 2023-02-16 |
| 6 | lxml 4.9.2 | https://pypi.org/project/lxml/ | 2023-02-16 |