The list of all current and planned Azure regions can be reviewed on the publicly available Azure Global Infrastructure website [1]. However, it is not a flat list that could be skimmed or reused within a document; the page requires interaction by selecting the corresponding regions and countries.
Fortunately, there are Python and Pandas, which can be used to extract the various tables from the website and output the data as a convenient list: on the console, into a file, or for further processing.
Below, I will go through some lines of code with which this can be done. Be aware, however, that this approach has a significant drawback: if the wording on the website or the structure of the tables changes, the script will stop working and require modification.
I'm pretty sure that small tasks like this will soon be handled within a few seconds by AI companions, such as the new Bing - but doing it manually is still fun, certainly provides an opportunity to learn, and you know where the information comes from and how it was processed.
Happy reading. :-)
What are Azure Regions?
Azure operates in multiple datacenters around the world. These datacenters are grouped into geographic regions, giving [Azure customers] flexibility in choosing where to build [their] applications.
— Azure Documentation - What are Azure regions? [2]
How are new Regions announced?
New regions, as well as other Azure updates, are typically announced on the Azure Updates website. [3]
It can even be consumed as an RSS Feed.
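Since RSS is plain XML, the feed can be parsed with nothing but the Python standard library. The snippet below works on a minimal, made-up RSS fragment in the shape such feeds typically use; the actual Azure Updates feed contains more fields, so treat this as a sketch:

```python
import xml.etree.ElementTree as ET

# A minimal, hypothetical RSS 2.0 fragment; the real Azure Updates feed
# follows the same general shape but carries more metadata per item.
sample_rss = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Azure updates</title>
    <item><title>Example: New region announced</title></item>
    <item><title>Example: Service now generally available</title></item>
  </channel>
</rss>"""

root = ET.fromstring(sample_rss)
for item in root.iter('item'):
    print(item.findtext('title'))
```

In practice, the feed content would first be fetched over HTTP (for example with `requests`) and the response body would then be passed to `ET.fromstring()`.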
Using Python and Pandas to extract the various regions and their status
Every now and then I need a list of all current and planned Azure regions. Since navigating the Azure Global Infrastructure website [1] is a very manual task, I thought about retrieving the data using Python. Here are the steps I took.
First of all, a few libraries are required:
- `requests` for sending HTTP requests and retrieving data from the web [4]
- `pandas` to filter and present the data [5]
- `lxml`, since the `read_html()` function of Pandas uses this library by default (it does not need to be imported explicitly) [6]
The import statements then look like this (`search` is required later in the script to find certain keywords within the data):

```python
import requests
import pandas as pd
from re import search
```
As a next step, the data from the Azure website can be retrieved, and Pandas' `read_html()` function can be used to read the various tables into dataframes.
```python
url = 'https://azure.microsoft.com/en-us/explore/global-infrastructure/geographies/'
html = requests.get(url).content
df_list = pd.read_html(html)
```
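For readers unfamiliar with `read_html()`: it scans the HTML for `<table>` elements and returns one dataframe per table found, inferring the header from `<th>` cells. A tiny, self-contained example with a made-up table shows the behavior:

```python
from io import StringIO
import pandas as pd

# A hypothetical two-column table, just to demonstrate read_html()
html = """
<table>
  <tr><th>Region</th><th>Location</th></tr>
  <tr><td>East Asia</td><td>Hong Kong</td></tr>
</table>
"""

tables = pd.read_html(StringIO(html))  # returns a list of dataframes
print(len(tables))                     # 1
print(tables[0].columns.tolist())      # ['Region', 'Location']
```

Wrapping the string in `StringIO` avoids the deprecation warning newer Pandas versions emit when a literal HTML string is passed directly.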
Looking at the first slice of the dataframes shows that some further polishing is needed and that there is a lot of information that is not necessarily of interest for a basic list.
```python
for df in df_list[:1]:
    print(df)
```

Output:
```text
                                             Regions  \
0                                           Location
1                                        Year opened
2                        Availability Zones presence
3                                         Compliance
4                                     Data residency
5                                  Disaster Recovery
6                                 Products by region
7                                       Available to

                                 East Asia Start free  \
0                                           Hong Kong
1                                                2010
2                              Available with 3 zones
3   Global Compliance CIS Benchmark, CSA STAR Att...
4   Stored at rest in the Asia Pacific region Lea...
5    Cross-region options: Azure Site Recovery Re...
6                         See products in this region
7                          All customers and partners

                            Southeast Asia Start free
0                                           Singapore
1                                                2010
2                              Available with 3 zones
3   Global Compliance CIS Benchmark, CSA STAR Att...
4   Stored at rest in the Asia Pacific region Lea...
5    Cross-region options: Azure Site Recovery Re...
6                         See products in this region
7                          All customers and partners
```
From the above output, it is already possible to derive some intermediate conclusions:

- The dataframe whose columns start with `Regions` contains the original table headers and might not be of interest.
- In the other dataframes, the header row contains the region and the first row contains the location.
This can be confirmed when looking at the header rows only:
```python
for df in df_list[:1]:
    for dc in list(df):
        print(dc)
```

Output:

```text
Regions
East Asia Start free
Southeast Asia Start free
```
One could check whether a column header contains `Regions` and, if so, simply ignore it. If it contains `Coming soon`, the `state` variable is set to `planned`; otherwise the region is considered `active`.

The location is contained in the first row, which is why it can be derived from `df[dc][0]`. The region fields then need some clean-up, removing strings that are not of interest, like `Start free` or `Get started`.
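The classification and clean-up logic can be isolated into a small helper, here under the hypothetical name `classify()`, mirroring the logic described above (`str.removesuffix()` requires Python 3.9 or newer):

```python
from re import search

def classify(header):
    """Hypothetical helper: derive (region, state) from a table header cell."""
    state = 'planned' if search('Coming soon', header) else 'active'
    region = header
    for suffix in (' Start free', ' Get started', ' Coming soon'):
        region = region.removesuffix(suffix)
    return region, state

print(classify('East Asia Start free'))      # ('East Asia', 'active')
print(classify('Taiwan North Coming soon'))  # ('Taiwan North', 'planned')
```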
```python
regions_list = []
locations_list = []

for df in df_list:
    for dc in list(df):
        if search('Regions', dc):
            pass
        else:
            if search('Coming soon', dc):
                state = 'planned'
            else:
                state = 'active'
            az_location = df[dc][0]
            region = dc.removesuffix(' Start free')
            region = region.removesuffix(' Get started')
            region = region.removesuffix(' Coming soon')
```
Since there are duplicates in the list of dataframes (which I assume come from the fact that the website offers a selection of nearby datacenters based on the currently selected region), it may be required to skip some of them. A list containing all regions that have already been processed is useful for this.
Then, a dictionary can be created that holds the corresponding values and is finally appended to a list.
```python
            if region in regions_list:
                pass
            else:
                regions_list.append(region)
                locations_list.append({
                    'az_display_name': region,
                    'az_short_name': region.replace(' ', '').lower(),
                    'az_location': az_location,
                    'az_state': state
                })
```
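As a side note, a `set` is the more idiomatic structure for this kind of already-seen bookkeeping, since membership tests are O(1). A sketch of the same dedup logic on made-up header cells:

```python
# Hypothetical header cells, including a duplicate
headers = ['East Asia Start free', 'East Asia Start free', 'Taiwan North Coming soon']

seen = set()
records = []
for dc in headers:
    region = dc
    for suffix in (' Start free', ' Get started', ' Coming soon'):
        region = region.removesuffix(suffix)
    if region not in seen:
        seen.add(region)
        records.append({'az_display_name': region})

print([r['az_display_name'] for r in records])  # ['East Asia', 'Taiwan North']
```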
As a last step, the region details can be put into one (or separate) dataframes. Since the list of regions is relatively large, Pandas needs to be configured to display all rows when the output goes primarily to the console.
```python
pd.set_option('display.max_rows', None)
pd.set_option('display.max_columns', None)
```
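When the wider limits are only needed for a single print, `pandas.option_context` is a handy alternative, since it restores the defaults automatically after the block:

```python
import pandas as pd

df = pd.DataFrame({'region': [f'region-{i}' for i in range(100)]})

# Inside the block all rows are rendered; outside, the defaults apply again
with pd.option_context('display.max_rows', None):
    full_repr = repr(df)

# A middle row that the default, truncated repr would hide
print('region-50' in full_repr)  # True
```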
Once this is done, the dataframes can be printed to the console.
```python
df = pd.DataFrame(locations_list)
df_planned_regions = df[df['az_state'] == 'planned']
df_active_regions = df[df['az_state'] == 'active']

# Print only active regions
print(df_active_regions)

# Print only planned regions
print('\n')  # Newline
print(df_planned_regions)

# Print all regions, regardless of their status
print('\n')  # Newline
print(df)
```
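Rows can be filtered with standard boolean indexing on the `az_state` column; on a toy dataframe with two made-up records it behaves like this:

```python
import pandas as pd

# Two hypothetical records, mirroring the structure built above
df = pd.DataFrame([
    {'az_display_name': 'East Asia', 'az_state': 'active'},
    {'az_display_name': 'Taiwan North', 'az_state': 'planned'},
])

planned = df[df['az_state'] == 'planned']
print(planned['az_display_name'].tolist())  # ['Taiwan North']
```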
The list of planned regions would look something like this:
```text
         az_display_name       az_short_name      az_location az_state
13     Indonesia Central    indonesiacentral          Jakarta  planned
17         Malaysia West        malaysiawest     Kuala Lumpur  planned
18     New Zealand North     newzealandnorth         Auckland  planned
19  Saudi Arabia Central  saudiarabiacentral     Saudi Arabia  planned
20          Taiwan North         taiwannorth           Taipei  planned
21          Austria East         austriaeast           Vienna  planned
23          Denmark East         denmarkeast       Copenhagen  planned
29        Greece Central       greececentral           Athens  planned
30           Italy North          italynorth            Milan  planned
32        Poland Central       polandcentral           Warsaw  planned
33         Spain Central        spaincentral           Madrid  planned
41         Chile Central        chilecentral         Santiago  planned
42        Mexico Central       mexicocentral  Querétaro State  planned
46             East US 3             eastus3          Georgia  planned
60   US Sec West Central    ussecwestcentral      Undisclosed  planned
62        Israel Central       israelcentral           Israel  planned
```
Through the `to_markdown()` function (which requires the `tabulate` package to be installed), we can even create the output in Markdown format right away.
```python
print(df_planned_regions.to_markdown())
```
```text
|    | az_display_name      | az_short_name      | az_location     | az_state   |
|---:|:---------------------|:-------------------|:----------------|:-----------|
| 13 | Indonesia Central    | indonesiacentral   | Jakarta         | planned    |
| 17 | Malaysia West        | malaysiawest       | Kuala Lumpur    | planned    |
| 18 | New Zealand North    | newzealandnorth    | Auckland        | planned    |
| 19 | Saudi Arabia Central | saudiarabiacentral | Saudi Arabia    | planned    |
| 20 | Taiwan North         | taiwannorth        | Taipei          | planned    |
| 21 | Austria East         | austriaeast        | Vienna          | planned    |
| 23 | Denmark East         | denmarkeast        | Copenhagen      | planned    |
| 29 | Greece Central       | greececentral      | Athens          | planned    |
| 30 | Italy North          | italynorth         | Milan           | planned    |
| 32 | Poland Central       | polandcentral      | Warsaw          | planned    |
| 33 | Spain Central        | spaincentral       | Madrid          | planned    |
| 41 | Chile Central        | chilecentral       | Santiago        | planned    |
| 42 | Mexico Central       | mexicocentral      | Querétaro State | planned    |
| 46 | East US 3            | eastus3            | Georgia         | planned    |
| 60 | US Sec West Central  | ussecwestcentral   | Undisclosed     | planned    |
| 62 | Israel Central       | israelcentral      | Israel          | planned    |
```
The table renders nicely in markdown.
|    | az_display_name      | az_short_name      | az_location     | az_state |
|---:|:---------------------|:-------------------|:----------------|:---------|
| 13 | Indonesia Central    | indonesiacentral   | Jakarta         | planned  |
| 17 | Malaysia West        | malaysiawest       | Kuala Lumpur    | planned  |
| 18 | New Zealand North    | newzealandnorth    | Auckland        | planned  |
| 19 | Saudi Arabia Central | saudiarabiacentral | Saudi Arabia    | planned  |
| 20 | Taiwan North         | taiwannorth        | Taipei          | planned  |
| 21 | Austria East         | austriaeast        | Vienna          | planned  |
| 23 | Denmark East         | denmarkeast        | Copenhagen      | planned  |
| 29 | Greece Central       | greececentral      | Athens          | planned  |
| 30 | Italy North          | italynorth         | Milan           | planned  |
| 32 | Poland Central       | polandcentral      | Warsaw          | planned  |
| 33 | Spain Central        | spaincentral       | Madrid          | planned  |
| 41 | Chile Central        | chilecentral       | Santiago        | planned  |
| 42 | Mexico Central       | mexicocentral      | Querétaro State | planned  |
| 46 | East US 3            | eastus3            | Georgia         | planned  |
| 60 | US Sec West Central  | ussecwestcentral   | Undisclosed     | planned  |
| 62 | Israel Central       | israelcentral      | Israel          | planned  |
The complete script is below. It may not be the most elegant solution, but it shows how tables from websites can be extracted using Python and Pandas.
```python
import requests
import pandas as pd
from re import search


def list_azure_regions():
    url = 'https://azure.microsoft.com/en-us/explore/global-infrastructure/geographies/'
    html = requests.get(url).content
    df_list = pd.read_html(html)

    regions_list = []
    locations_list = []

    for df in df_list:
        for dc in list(df):
            if search('Regions', dc):
                pass
            else:
                if search('Coming soon', dc):
                    state = 'planned'
                else:
                    state = 'active'
                az_location = df[dc][0]
                region = dc.removesuffix(' Start free')
                region = region.removesuffix(' Get started')
                region = region.removesuffix(' Coming soon')
                if region in regions_list:
                    pass
                else:
                    regions_list.append(region)
                    locations_list.append({
                        'az_display_name': region,
                        'az_short_name': region.replace(' ', '').lower(),
                        'az_location': az_location,
                        'az_state': state
                    })

    return locations_list


if __name__ == '__main__':
    azure_regions = list_azure_regions()

    pd.set_option('display.max_rows', None)
    pd.set_option('display.max_columns', None)

    df = pd.DataFrame(azure_regions)
    df_planned_regions = df[df['az_state'] == 'planned']
    df_active_regions = df[df['az_state'] == 'active']

    # Print only active regions
    print(df_active_regions)

    # Print only planned regions
    print('\n')  # Newline
    print(df_planned_regions)

    # Print all regions, regardless of their status
    print('\n')  # Newline
    print(df)
```
References
| # | Title | URL | Accessed-On |
|---|-------|-----|-------------|
| 1 | Azure geographies | https://azure.microsoft.com/en-us/explore/global-infrastructure/geographies/#overview | 2023-02-16 |
| 2 | What are Azure regions? | https://learn.microsoft.com/en-us/azure/virtual-machines/regions#what-are-azure-regions | 2023-02-16 |
| 3 | Azure Updates | https://azure.microsoft.com/en-us/updates/ | 2023-02-16 |
| 4 | requests 2.28.2 | https://pypi.org/project/requests/ | 2023-02-16 |
| 5 | pandas 1.5.3 | https://pypi.org/project/pandas/ | 2023-02-16 |
| 6 | lxml 4.9.2 | https://pypi.org/project/lxml/ | 2023-02-16 |