High Availability in Azure: Storage Redundancies

#azure #highavailability

Azure Storage Account

In Azure, the following entities are backed by Azure storage accounts: blobs, file shares, queues, NoSQL table storages, Data Lake Storage (gen2) and unmanaged disks. In this blog post, we'll go over the various redundancy options available for these storage accounts. We'll compare & contrast them based on the following parameters:

Replication latency: How soon before all replicas are in full-sync?
Disaster scenarios: Are you looking at partial data loss or fully unrecoverable data? How easy (or difficult) is it to get back on track once things have hit rock bottom?
SLAs: How many 9s?

Hopefully this blog post will serve as a cheat-sheet and help you choose the right Azure storage redundancy options for your use cases.

LRS (locally-redundant storage)

With LRS, your data is replicated thrice across multiple fault domains & update domains within a single storage scale unit (all within a single datacenter). Note that all three replicas are addressed by a single endpoint (i.e. you can't target individual replicas for read/write operations).

Replication latency: No replication latency, data is synchronously written to all three replicas on every write request.

Disaster scenarios:

disaster type	service interruption?	data loss?	recovery possible?
hardware failure in physical rack/node	NO	NO¹	N/A
datacenter disaster	YES	YES	NO²
availability zone disaster	"	"	"
regional disaster	"	"	"
geographic disaster	"	"	"
worldwide disaster	"	"	"

Since the replicas are spread across multiple fault domains.
Assuming all three replicas within the storage scale unit are affected, your data is permanently lost & unrecoverable.

SLAs:

object storage >= 99.999999999% (11 nines)
read requests (hot tier) >= 99.9% (3 nines)
read requests (cool tier) >= 99% (2 nines)
write requests (hot tier) >= 99.9% (3 nines)
write requests (cool tier) >= 99% (2 nines)

ZRS (zone-redundant storage)

With ZRS, your data is replicated across three availability zones within the same region (please note that currently not all regions support availability zones). As in the earlier case with LRS, all three replicas are addressed by a single endpoint.

Replication latency: Very low latency, data is synchronously written to all three replicas on every write request.

Disaster scenarios:

disaster type	service interruption?	data loss?	recovery possible?
hardware failure in physical rack/node	NO	NO¹	N/A
datacenter disaster	"	"	"
availability zone disaster	YES²	NO	N/A
regional disaster	YES	YES	NO³
geographic disaster	"	"	"
worldwide disaster	"	"	"

Only one replica will be affected, since the replicas are spread across different availability zones.
Temporary service interruption until Azure finishes DNS updates (not entirely sure how long these updates take, the official docs do not mention this). To mitigate this, best to use transient fault-handling patterns (retries with back-offs and circuit breakers) for all reads/writes on the storage account. More details can be found here and here.
Assuming all three replicas across the availability zones are affected, your data is permanently lost & unrecoverable.

SLAs:

object storage >= 99.9999999999% (12 nines)
read requests (hot tier) >= 99.9% (3 nines)
read requests (cool tier) >= 99% (2 nines)
write requests (hot tier) >= 99.9% (3 nines)
write requests (cool tier) >= 99% (2 nines)

GRS (geo-redundant storage)

With GRS, your data is replicated across two paired-regions (within the same Azure geography) in a primary region + secondary region setup. This ensures that one regional replica will be available in the event of a regional disaster.

The primary region & the secondary regions are addressed by separate endpoints. The secondary endpoint is generally inaccessible. However in case of a fail-over, the secondary is promoted to primary and read + write access is enabled for this endpoint. Fail-overs are automatically initiated by Azure in the event of a regional disaster. Azure is also introducing user-initiated fail-overs, which is currently in preview mode as of the time of writing this post.

Note: Both GRS (geo-redundant storage) and RA-GRS (read-access geo-redundant storage) are misnomers. They don't create redundant copies across Azure geographies, only across paired-regions within the same Azure geography.

Replication latency: Your data is first replicated synchronously within the primary region via LRS. The data is then replicated asynchronously to the secondary region (eventually consistent). Within the secondary region, it is replicated synchronously using LRS. The official SLA for Azure storage does not make any guarantees about the time needed for geo-replication.

Disaster scenarios:

disaster type	service interruption?	data loss?	recovery possible?
hardware failure in physical rack/node	NO	NO	N/A
datacenter disaster	YES¹	POSSIBLE²	YES³
availability zone disaster	"	"	"
regional disaster	"	"	"
geographic disaster	YES	YES	NO
worldwide disaster	"	"	"

Within the primary & secondary regions itself, the data is replicated via LRS. In the event of a datacenter disaster in the primary region, it is possible that all replicas within the storage scale unit are affected and the primary endpoint will now be both inaccessible & unrecoverable. Although the secondary region has replica data, its endpoint will be inaccessible until a fail-over is initiated (the data will be inaccessible until the fail-over is complete).
With GRS, the replication from primary to secondary regions is asynchronous. In the event of the primary being destroyed before it has completely replicated the data to secondary, the secondary will have a stale copy and un-replicated writes will be permanently lost.
Only when a fail-over has completed, the secondary endpoint becomes the new primary, accessible for read + write operations, with LRS replication.

SLAs:

object storage >= 99.99999999999999% (16 nines)
read requests (hot tier) >= 99.9% (3 nines)
read requests (cool tier) >= 99% (2 nines)
write requests (hot tier) >= 99.9% (3 nines)
write requests (cool tier) >= 99% (2 nines)

RA-GRS (read-access geo-redundant storage)

Same as GRS, but you always have read-only access to the secondary replica.

Replication latency: Same as GRS.

Disaster scenarios:

disaster type	service interruption?	data loss?	recovery possible?
hardware failure in physical rack/node	NO	NO	N/A
datacenter disaster	YES¹	POSSIBLE²	YES³
availability zone disaster	"	"	"
regional disaster	"	"	"
geographic disaster	YES	YES	NO
worldwide disaster	"	"	"

In the event of primary replica being destroyed, the secondary region will still have read-only access, even without a failover being initiated (unlike GRS where the secondary is inaccessible until a fail-over has been completed).
In the event of the primary being destroyed before it has completely replicated the data to secondary, the secondary will have a stale copy and un-replicated writes will be permanently lost (same as GRS).
Prior to fail-over, the secondary will have read-access. After fail-over, the secondary becomes the new primary, with read + write access and LRS replication.

SLAs: