Eyal Estrin for AWS Community Builders

Posted on Apr 22 • Originally published at eyal-estrin.Medium

Comparison of Cloud Storage Services

#aws #devops #cloud #design

When designing workloads in the cloud, it is rare to have a workload without persistent storage for storing and retrieving data.

In this blog post, we will review the most common cloud storage services and the use cases for choosing each type of cloud storage.

Object storage

Object storage is perhaps the most commonly used cloud-native storage service.

It is used for a variety of use cases, from simple storage or archiving of logs and snapshots to more sophisticated ones, such as storage for data lakes or AI/ML workloads.

Object storage is used by many cloud-native applications, from Kubernetes-based workloads using a CSI driver (such as Amazon EKS, Azure AKS, and Google GKE) to Serverless / Function-as-a-Service workloads (such as AWS Lambda and Azure Functions).

As a cloud-native service, object storage is accessed through a REST API over HTTP or HTTPS.
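To make this concrete, on AWS every S3 object is addressable by an HTTPS URL. The sketch below builds such a URL, assuming S3's virtual-hosted-style addressing (`https://<bucket>.s3.<region>.amazonaws.com/<key>`); the function name and the bucket and key names are hypothetical. It only builds the address: requests against a private bucket must also be signed (SigV4), which this sketch does not do.

```python
from urllib.parse import quote


def s3_object_url(bucket: str, region: str, key: str) -> str:
    """Build an Amazon S3 virtual-hosted-style HTTPS URL for an object.

    Format: https://<bucket>.s3.<region>.amazonaws.com/<key>
    The key is percent-encoded, but "/" is kept so prefixes stay readable.
    Hypothetical helper for illustration; it does not sign or send a request.
    """
    return f"https://{bucket}.s3.{region}.amazonaws.com/{quote(key, safe='/')}"


# Hypothetical bucket and key names, for illustration only.
print(s3_object_url("example-logs-bucket", "us-east-1", "app/2024/04/22 run.log"))
# https://example-logs-bucket.s3.us-east-1.amazonaws.com/app/2024/04/22%20run.log
```

Azure Blob Storage and Google Cloud Storage follow the same idea with their own hostnames, so an application only needs an HTTP client and credentials to reach its data.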

Unstructured data is stored in object storage services as objects, in a flat namespace inside containers that most cloud providers call buckets.
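Because the namespace is flat, "folders" in a bucket are only a naming convention: a key such as `logs/2024/app.log` is a single object name. The pure-Python sketch below emulates how a folder-like view can be derived from flat keys using a prefix and a delimiter, similar in spirit to S3's `ListObjectsV2` with `Prefix` and `Delimiter`. The function name and the keys are hypothetical, and no cloud API is called.

```python
def list_with_delimiter(keys, prefix="", delimiter="/"):
    """Emulate a 'folder' listing over a flat set of object keys.

    Returns (objects, common_prefixes): keys directly under `prefix`, and the
    distinct 'sub-folder' prefixes found below it. Pure Python; no cloud calls.
    """
    objects, common_prefixes = [], set()
    for key in sorted(keys):
        if not key.startswith(prefix):
            continue
        rest = key[len(prefix):]
        idx = rest.find(delimiter)
        if idx == -1:
            objects.append(key)  # no delimiter after the prefix: an object at this level
        else:
            common_prefixes.add(prefix + rest[: idx + len(delimiter)])  # a "folder"
    return objects, sorted(common_prefixes)


# The bucket holds only flat keys; the "/" characters are just part of each name.
bucket_keys = [
    "logs/2024/app.log",
    "logs/2024/db.log",
    "logs/readme.txt",
    "snapshots/vm1.img",
]

print(list_with_delimiter(bucket_keys, prefix="logs/"))
# (['logs/readme.txt'], ['logs/2024/'])
```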

Data is automatically replicated between availability zones in the same region (unless we choose otherwise), and if needed, buckets can be replicated between regions (using the cross-region replication capability).

To support different data access patterns, each of the hyperscale cloud providers offers its customers different storage classes (or storage tiers), ranging from real-time and near-real-time access to archive storage, along with the ability to configure rules that move data between storage classes (known as lifecycle policies).
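As an illustration of a lifecycle policy, the dictionary below follows the shape of an S3 lifecycle configuration (the `LifecycleConfiguration` argument that boto3's `put_bucket_lifecycle_configuration` accepts). The rule ID, prefix, and day thresholds are hypothetical. The helper function is our own local approximation of which storage class an object would be in at a given age; it is not how S3 evaluates rules internally and ignores details such as transition timing and minimum object sizes.

```python
from typing import Optional

# Shape follows an S3 lifecycle configuration; values are hypothetical examples.
lifecycle_configuration = {
    "Rules": [
        {
            "ID": "archive-old-logs",
            "Filter": {"Prefix": "logs/"},
            "Status": "Enabled",
            "Transitions": [
                {"Days": 30, "StorageClass": "STANDARD_IA"},
                {"Days": 90, "StorageClass": "GLACIER"},
            ],
            "Expiration": {"Days": 365},
        }
    ]
}


def storage_class_for(key: str, age_days: int, config: dict) -> Optional[str]:
    """Approximate the storage class of an object of a given age (in days).

    Returns None once a matching rule has expired the object.
    Local illustration only; no S3 API is called.
    """
    current = "STANDARD"  # new objects start in the default class
    for rule in config["Rules"]:
        prefix = rule.get("Filter", {}).get("Prefix", "")
        if rule["Status"] != "Enabled" or not key.startswith(prefix):
            continue
        expiration = rule.get("Expiration", {}).get("Days")
        if expiration is not None and age_days >= expiration:
            return None
        # Apply transitions in order of age threshold; the last one reached wins.
        for transition in sorted(rule.get("Transitions", []), key=lambda t: t["Days"]):
            if age_days >= transition["Days"]:
                current = transition["StorageClass"]
    return current


for age in (10, 45, 120, 400):
    print(age, storage_class_for("logs/app.log", age, lifecycle_configuration))
```

With a boto3 S3 client, the same dictionary could be passed as `LifecycleConfiguration=lifecycle_configuration` to `put_bucket_lifecycle_configuration`, together with the (hypothetical) bucket name; Azure and GCP express equivalent rules in their own policy formats.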

As of 2023, all hyperscale cloud providers enforce data encryption at rest in all newly created buckets.

Comparison between Object storage alternatives:

As you can see in the comparison table above, most features are available in all hyperscale cloud providers, but there are still some differences between them:

AWS – Offers a lower-cost storage class called S3 One Zone-IA for scenarios where data is accessed less frequently and data availability and resiliency are not highly critical, such as secondary backups. AWS also offers a storage class called S3 Express One Zone for workloads that require single-digit millisecond data access and can tolerate lower data availability and resiliency, such as AI/ML training and Amazon Athena analytics.
Azure – Most Azure storage services (Blob, Files, Queues, page blobs, and Tables) require the creation of an Azure storage account, a unique namespace for Azure storage data objects that is accessible over HTTP/HTTPS. Azure also offers Premium block blob storage for high-performance workloads, such as AI/ML and IoT.
GCP – Google Cloud Storage is not limited to a single region; buckets can be provisioned in dual-regions and even multi-regions, with data replicated between them automatically.

Block storage

Block storage provides the disk volumes attached to various compute services, from VMs and managed databases to Kubernetes worker nodes, and it can also be mounted inside containers.

Block storage can be used as the storage for transactional databases, data warehousing, and workloads with high volumes of reads and writes.

Block storage is not limited to traditional workloads deployed on virtual machines; volumes can also be mounted as persistent volumes for container-based workloads (such as Amazon ECS) and for Kubernetes-based workloads using a CSI driver (such as Amazon EKS, Azure AKS, and Google GKE).

Block storage volumes are usually limited to a single availability zone within a region and must be attached to a VM in that same AZ.
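The zonal constraint can be expressed as a simple planning rule. The sketch below is a hypothetical model (the `Volume` and `Instance` classes, the function name, and the IDs are illustrative, not a cloud SDK): a zonal volume attaches directly only when its AZ matches the VM's AZ; otherwise, a common approach (for example with EBS) is to snapshot the volume and create a new volume from that snapshot in the target AZ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Volume:
    """Hypothetical model of a zonal block storage volume."""
    volume_id: str
    availability_zone: str


@dataclass(frozen=True)
class Instance:
    """Hypothetical model of a VM running in one availability zone."""
    instance_id: str
    availability_zone: str


def plan_attachment(volume: Volume, instance: Instance) -> str:
    """Describe how to make a zonal volume's data available to a VM."""
    if volume.availability_zone == instance.availability_zone:
        return f"attach {volume.volume_id} to {instance.instance_id}"
    # Different AZ: the volume cannot be attached directly.
    return (
        f"snapshot {volume.volume_id}, create a new volume in "
        f"{instance.availability_zone}, then attach it to {instance.instance_id}"
    )


vol = Volume("vol-example", "us-east-1a")
print(plan_attachment(vol, Instance("vm-a", "us-east-1a")))
print(plan_attachment(vol, Instance("vm-b", "us-east-1b")))
```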

Comparison between Block storage alternatives:

As you can see in the comparison table above, most features are available in all hyperscale cloud providers, but there are still some differences between them:

AWS – Offers a feature called Amazon Data Lifecycle Manager, which automates the creation, retention, and deletion of EBS snapshots.
Azure – Offers the ability to manage data replication of persistent disks within the same region (Locally redundant storage / LRS and Zone-redundant storage / ZRS) and between primary and secondary regions (Geo-redundant storage / GRS, Geo-zone-redundant storage / GZRS, and Read-access geo-redundant storage / RA-GRS).
GCP – Offers the ability to replicate persistent disks across two zones in the same region (Regional Persistent Disk). GCP also offers the ability to pre-purchase capacity, throughput, and IOPS to be provisioned as needed (Hyperdisk Storage Pools).
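Snapshot lifecycle tools such as Amazon Data Lifecycle Manager are configured as policies inside the service, so we do not reproduce their API here. The sketch below only illustrates the idea behind a count-based retention rule: keep the N most recent snapshots and mark the rest for deletion. The function name, snapshot IDs, and dates are hypothetical.

```python
from datetime import datetime, timedelta


def snapshots_to_delete(snapshots, keep: int):
    """Count-based retention: keep the `keep` newest snapshots, return the rest.

    `snapshots` is a list of (snapshot_id, created_at) tuples.
    Returns the IDs to delete, newest first.
    """
    if keep < 0:
        raise ValueError("keep must be non-negative")
    newest_first = sorted(snapshots, key=lambda s: s[1], reverse=True)
    return [snapshot_id for snapshot_id, _ in newest_first[keep:]]


# Hypothetical daily snapshots of one volume (snap-0 is the oldest).
start = datetime(2024, 4, 1)
daily = [(f"snap-{i}", start + timedelta(days=i)) for i in range(5)]
print(snapshots_to_delete(daily, keep=3))
# ['snap-1', 'snap-0']
```

Age-based retention (for example, delete snapshots older than 30 days) would follow the same pattern, filtering on `created_at` instead of position.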

File storage

File storage services are the equivalent of traditional Network Attached Storage (NAS).

All major hyperscale cloud providers offer managed file storage services, allowing customers to share files between multiple Windows (CIFS/SMB) and Linux (NFS) virtual machines.

File storage is not limited to traditional workloads sharing files between multiple virtual machines; file shares can also be mounted as persistent volumes for container-based workloads (such as Amazon ECS, Azure Container Apps, and Google Cloud Run), for Kubernetes-based workloads using a CSI driver (such as Amazon EKS, Azure AKS, and Google GKE), and for Serverless / Function-as-a-Service workloads (such as AWS Lambda and Azure Functions).

In addition to NFS and CIFS/SMB file storage services, the major cloud providers also offer a managed NetApp file system (for customers who want the benefits of NetApp storage) and a managed Lustre file system (for HPC workloads or workloads that require extremely high throughput).

Comparison between NFS File storage alternatives:

As you can see in the comparison table above, most features are available in all hyperscale cloud providers, but there are still some differences between them:

AWS – Offers a lower-cost option called EFS One Zone file systems, for scenarios where data is accessed less frequently and data availability and resiliency are not highly critical. By default, data in One Zone file systems is automatically backed up using AWS Backup.
Azure – Offers additional security protection mechanisms, such as malware scanning and sensitive data threat detection, as part of a service called Microsoft Defender for Storage.
GCP – Offers an enterprise-grade tier called Enterprise tier for critical applications, such as SAP or GKE workloads, with regional high availability and data replication.

Comparison between CIFS/SMB File storage alternatives:

Comparison between managed NetApp File storage alternatives:

Comparison between File storage for HPC workloads alternatives:

Summary

Persistent storage is required by almost any workload, including cloud-native applications.

In this blog post, we have reviewed the various managed storage options offered by the hyperscale cloud providers.

As a best practice, it is crucial to understand the application's requirements when selecting the right storage option.

About the Author

Eyal Estrin is a cloud and information security architect, the author of the books Cloud Security Handbook and Security for Cloud Native Applications, with more than 20 years in the IT industry.

You can connect with him on Twitter.

Opinions are his own and not the views of his employer.

👇Help to support my authoring👇

☕Buy me a coffee☕
