axurcio

Posted on May 28, 2022 • Originally published at insight-services-apac.github.io on May 5, 2022

How to align your Azure environment using the right tiering strategy

#azure #tiering #strategy

Microsoft Azure is one of the major cloud providers, and many enterprises and small-scale customers have migrated to the Azure cloud as part of their digital transformation journeys. Because Azure offers a broad set of services that can be aligned with an organisation’s existing principles and methodologies, it’s important to settle that alignment at the foundation stage so that each workload is placed in the right tier.

Every workload hosted in an organisation has its own story about how it should be provisioned. Factors such as availability, resiliency, fault tolerance and disaster recovery, together with the Recovery Point Objective (RPO) and Recovery Time Objective (RTO), determine what the workload needs. When deployed, each workload must meet criteria based on its criticality and business impact without compromising on cost, which is why placing it in the right tier matters.

Most legacy workloads, and a few of the newer ones, depend on running on virtual machines (VMs) because of factors such as supportability, frameworks and operating systems. Hence, the majority of workloads end up running as IaaS before organisations invest in modernising them.

These criteria focus mainly on Infrastructure as a Service (IaaS) and, to some extent, can be applied to Platform as a Service (PaaS) services too.

Microsoft Azure provides the following services to host these workloads based on the needs of the organisation/applications.

Virtual Machines(VMs)

Azure Virtual Machines (VM) is one of several types of on-demand, scalable computing resources that Azure offers. An Azure VM gives you the flexibility of virtualization without having to buy and maintain the physical hardware that runs it.
The SLA for a single-instance virtual machine varies with the managed disk SKU used for its operating system and data disks.

Managed Disks

Azure managed disks are block-level storage volumes that are managed by Azure and used with Azure Virtual Machines. Managed disks are like a physical disk in an on-premises server but virtualized.
Azure managed disks offer two storage redundancy options: locally redundant storage (LRS), which is supported in all Azure regions and for all disk types such as HDD, SSD and Ultra Disk, and zone-redundant storage (ZRS), which is limited to specific regions and, at the time of writing, supports only premium disks.
Managed disks do not have a financially backed SLA of their own. Their availability is based on the SLA of the underlying storage and of the virtual machine to which they are attached, but they are designed for 99.999% availability.

Disk SKU	Operating System Disk	Data Disk	Single-Instance VM SLA
Premium SSD or Ultra Disk	Yes	Yes	99.9%
Standard SSD	Yes	Yes	99.5%
Standard HDD	Yes	Yes	95%
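
To make the table easier to apply, here is a minimal Python sketch that encodes it as a lookup. The dictionary keys and the function name are hypothetical, and the rule that a VM mixing disk SKUs gets the SLA of its weakest disk is an assumption that should be verified against the current SLA terms.

```python
# Single-instance VM SLA (in percent) by managed disk SKU, from the table above.
# The key names are illustrative, not Azure API identifiers.
DISK_SKU_SLA = {
    "premium_ssd": 99.9,
    "ultra_disk": 99.9,
    "standard_ssd": 99.5,
    "standard_hdd": 95.0,
}


def single_instance_vm_sla(disk_skus):
    """Return the SLA (in percent) for a VM, given the SKUs of all its disks.

    Assumption: the weakest disk attached to the VM determines its SLA.
    """
    if not disk_skus:
        raise ValueError("a VM needs at least an operating system disk")
    return min(DISK_SKU_SLA[sku] for sku in disk_skus)


if __name__ == "__main__":
    # OS disk on Premium SSD, data disk on Standard HDD.
    print(single_instance_vm_sla(["premium_ssd", "standard_hdd"]))
```

Under that assumption, a single Standard HDD data disk is enough to pull an otherwise premium VM down to the lowest SLA, which is worth checking before a workload is assigned to a tier.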

Azure Backup(AB)

The Azure Backup service provides simple, secure, and cost-effective solutions to back up your data and recover it from the Microsoft Azure cloud.
Azure Backup supports multiple services such as virtual machines, Managed Disks, Azure File Shares and many others.
It guarantees at least 99.9% availability of the backup and restore functionality of the Azure Backup service.
It supports multiple types of replication to keep your storage/data highly available: locally redundant storage (LRS), which creates three copies of the data in a storage unit in a datacenter; geo-redundant storage (GRS), which replicates your data to a secondary region; and zone-redundant storage (ZRS), which replicates your data across availability zones, guaranteeing data residency and resiliency in the same region.

Azure Site Recovery(ASR)

Site Recovery helps ensure business continuity by keeping business apps and workloads running during outages. Site Recovery replicates workloads running on virtual machines (VMs) from a primary site to a secondary location or within a region between zones.
For each Protected Instance configured for Azure-to-Azure Failover, it guarantees a two-hour Recovery Time Objective.

Availability Sets

Spreading virtual machines across an availability set provides redundancy against hardware, network and storage failures within the same datacenter. If a disaster occurs at the datacenter level, the service will not be available.
Unlike Availability Zones, availability sets are supported in all Azure regions.
An availability set needs two or more virtual machines to deliver its benefits, and the instances need not be identical copies of the application.
Active Directory Domain Services (on-premises AD) is a good example, where multiple instances run in the same region.
For all virtual machines that have two or more instances deployed in the same Availability Set or in the same Dedicated Host Group, Microsoft guarantees Virtual Machine Connectivity to at least one instance at least 99.95% of the time.

Availability Zones

Spreading virtual machines across availability zones provides redundancy against power, cooling and networking failures across datacenters (each zone is made up of one or more datacenters). If a disaster occurs at the zone level, the service will be provided by the other active zones.
Unlike availability sets, not all Azure regions support zones. It’s important to finalise a location before provisioning the applications.
Availability Zones are meant for applications that require disaster recovery capabilities within the region, or that run multiple instances across zones to provide high availability.
Multiple instances of a web server, one running in each zone, are a good example of providing availability to the application. For all virtual machines that have two or more instances deployed across two or more Availability Zones in the same Azure region, Microsoft guarantees Virtual Machine Connectivity to at least one instance at least 99.99% of the time.

Tiering Categories

Below are the tiering category definitions, based on tags. The categories can be extended further as requirements dictate, but they are a good starting point for tiering your services based on the organisation’s needs. It is recommended to tag each workload and its dependencies for easier tracking.

Tier Category	Tag Name	Tag Value	Service Availability
High Availability + Backup	Tier	0	~ Zero RPO and RTO
Disaster Recovery + Backup	Tier	1	< 5mins of RPO and < 2hrs of RTO
Backup	Tier	2	< 24hrs of RPO and < 2days of RTO
No (Disaster Recovery + Backup)	Tier	3	Based on demand
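
As an illustration of how the table could drive tagging, the Python sketch below picks the least demanding tier that still meets a workload's RPO and RTO requirements and builds the matching tag. The function names and the selection rule are assumptions made for this example; only the tag name, the tag values and the thresholds come from the table.

```python
from datetime import timedelta

# (tier, RPO the tier can deliver, RTO the tier can deliver), from the table above.
# Tier 0 is the fallback for anything stricter; tier 3 carries no commitment.
TIER_LIMITS = [
    (2, timedelta(hours=24), timedelta(days=2)),
    (1, timedelta(minutes=5), timedelta(hours=2)),
]


def choose_tier(required_rpo=None, required_rto=None):
    """Return the highest-numbered tier that satisfies the requirements.

    Passing no requirements at all means the workload can be redeployed
    on demand, which maps to tier 3.
    """
    if required_rpo is None and required_rto is None:
        return 3
    rpo = required_rpo if required_rpo is not None else timedelta.max
    rto = required_rto if required_rto is not None else timedelta.max
    for tier, tier_rpo, tier_rto in TIER_LIMITS:
        if tier_rpo <= rpo and tier_rto <= rto:
            return tier
    return 0  # near-zero RPO/RTO needs high availability


def tier_tag(tier):
    """Build the resource tag for a tier: tag name 'Tier', tag value 0-3."""
    return {"Tier": str(tier)}


if __name__ == "__main__":
    tier = choose_tier(timedelta(minutes=5), timedelta(hours=2))
    print(tier_tag(tier))
```

A workload that tolerates losing a day of data but must be back within an hour would land in tier 0 under this rule, because the stricter of the two requirements decides the tier.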

Tier Category 0

This tier is meant for services that are critical, require high availability and provide the foundation for the services in later tiers. There are multiple ways to achieve this in Azure, such as using an availability set or availability zones, or hosting multiple instances of the service across regions.

Possibilities:
- Virtual Machines - can be hosted using an availability set or availability zones, with one or more virtual machines in each region
- Managed Disks - recommended to use LRS, or ZRS where possible, within each region
- Azure Backup - recommended to use LRS storage, or ZRS where possible; GRS is not required as a secondary instance will already be running in another region
- Azure Site Recovery - not recommended as the service itself is running multiple instances across regions or zones

Tier Category 1

This tier is meant for services that are critical, or categorised as production, and have challenging RPO and RTO requirements. This is the scenario where workloads are not hosted in high-availability mode and instead rely on disaster recovery mechanisms to recover from a human-made or natural disaster.

Possibilities:
- Virtual Machines - can be hosted using an availability set or availability zones, with one or more virtual machines in a single region
- Managed Disks - recommended to use LRS, or ZRS where possible, within one region
- Azure Backup - recommended to use LRS, or ZRS where possible, if backup data is not required in the secondary region; use GRS-based storage for all instances if it is
- Azure Site Recovery - recommended to enable for all applicable instances, with a disaster recovery plan based on the application criteria

Tier Category 2

This tier is meant for services categorised as staging, testing or development instances and considered non-critical. This is the scenario where workloads do not need to be hosted or replicated in the same or another region in case a disaster occurs.

Possibilities:
- Virtual Machines - can be hosted using an availability set or availability zones, with one or more virtual machines in a single region
- Managed Disks - recommended to use LRS, or ZRS where possible, within one region
- Azure Backup - recommended to use LRS storage, or ZRS where possible
- Azure Site Recovery - not recommended to enable replication as these are non-critical workloads

Tier Category 3

This tier is meant for services categorised as Proof of Concept (PoC), short-term testing or development workloads. This is the scenario where workloads do not need to be backed up or replicated, because a loss of availability has no impact and they can be redeployed if required.

Possibilities:
- Virtual Machines - can be hosted using an availability set or availability zones, with one or more virtual machines in a single region
- Managed Disks - recommended to use LRS, or ZRS where possible, within one region
- Azure Backup - not recommended; no backup replication type is needed
- Azure Site Recovery - not recommended to enable replication as these are non-critical workloads
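
The four tier categories above can be condensed into a small lookup that flags workloads drifting from their tier. This is a rough Python sketch; the `TierPlan` class, the `deviations` function and the message wording are hypothetical, and the recommendations are simply transcribed from the lists above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TierPlan:
    high_availability: bool   # run multiple instances across zones or regions
    site_recovery: bool       # enable Azure Site Recovery replication
    backup_redundancy: tuple  # acceptable backup storage types; empty = no backup


# Recommendations per tier, transcribed from the tier categories above.
TIER_PLANS = {
    0: TierPlan(True, False, ("LRS", "ZRS")),
    1: TierPlan(False, True, ("LRS", "ZRS", "GRS")),
    2: TierPlan(False, False, ("LRS", "ZRS")),
    3: TierPlan(False, False, ()),
}


def deviations(tier, site_recovery_enabled, backup_redundancy=None):
    """List where a workload's setup differs from its tier's recommendations.

    backup_redundancy is "LRS", "ZRS", "GRS", or None when no backup exists.
    """
    plan = TIER_PLANS[tier]
    issues = []
    if site_recovery_enabled and not plan.site_recovery:
        issues.append("Site Recovery is not recommended for this tier")
    if plan.site_recovery and not site_recovery_enabled:
        issues.append("Site Recovery should be enabled for this tier")
    if backup_redundancy is None:
        if plan.backup_redundancy:
            issues.append("a backup is recommended for this tier")
    elif not plan.backup_redundancy:
        issues.append("a backup is not recommended for this tier")
    elif backup_redundancy not in plan.backup_redundancy:
        issues.append(
            f"{backup_redundancy} backup storage is not recommended for this tier"
        )
    return issues


if __name__ == "__main__":
    # A PoC workload (tier 3) with replication and backup enabled pays for both.
    for issue in deviations(3, site_recovery_enabled=True, backup_redundancy="LRS"):
        print(issue)
```

A check like this is mainly useful for cost reviews, since it surfaces lower-tier workloads that carry replication or backup they do not need.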

Addendum

Availability %	Downtime per year	Downtime per quarter	Downtime per month	Downtime per week	Downtime per day (24 hours)
99% (“two nines”)	3.65 days	21.9 hours	7.31 hours	1.68 hours	14.40 minutes
99.5% (“two and a half nines”)	1.83 days	10.95 hours	3.65 hours	50.40 minutes	7.20 minutes
99.9% (“three nines”)	8.77 hours	2.19 hours	43.83 minutes	10.08 minutes	1.44 minutes
99.95% (“three and a half nines”)	4.38 hours	65.7 minutes	21.92 minutes	5.04 minutes	43.20 seconds
99.99% (“four nines”)	52.60 minutes	13.15 minutes	4.38 minutes	1.01 minutes	8.64 seconds
99.999% (“five nines”)	5.26 minutes	1.31 minutes	26.30 seconds	6.05 seconds	864.00 milliseconds
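
The figures in this table follow from one formula: allowed downtime = period length × (100 − availability) / 100. The Python sketch below reproduces it for any SLA percentage. It assumes an average year of 365.25 days, so some results may differ from the table in the last digit; the function and dictionary names are illustrative.

```python
from datetime import timedelta

# Period lengths; the 365.25-day average year is an assumption of this sketch.
PERIODS = {
    "year": timedelta(days=365.25),
    "quarter": timedelta(days=365.25 / 4),
    "month": timedelta(days=365.25 / 12),
    "week": timedelta(days=7),
    "day": timedelta(days=1),
}


def allowed_downtime(availability_percent, period):
    """Return the downtime an availability target permits within one period."""
    if not 0 <= availability_percent <= 100:
        raise ValueError("availability must be between 0 and 100 percent")
    return PERIODS[period] * ((100 - availability_percent) / 100)


if __name__ == "__main__":
    # Compare the two multi-instance VM SLAs mentioned in this article.
    for sla in (99.95, 99.99):
        minutes = allowed_downtime(sla, "month").total_seconds() / 60
        print(f"{sla}% allows about {minutes:.2f} minutes of downtime per month")
```

Running the same function over the SLA of each service a workload depends on gives a quick way to see whether a tier's RTO is realistic.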

Conclusion

It is important to have a tiering strategy in place before migrating or deploying an application. A tiering strategy doesn’t just help in aligning services to the right tier; it also plays a vital role in delivering each service according to its criticality. Choosing the tier based on factors such as redundancy and replication options also helps in optimising the cost of the overall solution. Hope it helps!
