hridyesh bisht for AWS Community Builders

Introduction to Cloud Storage

I spent a couple of weeks reading about cloud storage: the types of cloud storage, block storage (AWS EBS), object storage (AWS S3), the different object storage options, file storage (AWS EFS), and disaster recovery using cloud storage. I will start by explaining what cloud storage means and how it works.

What do we mean by Cloud Storage?

Cloud storage stores data on the Internet through a cloud computing provider who manages and operates data storage as a service. It’s delivered on demand with just-in-time capacity and costs, and eliminates buying and managing your own data storage infrastructure.

How Does Cloud Storage Work?

Cloud storage is purchased from a third-party cloud vendor who owns and operates data storage capacity and delivers it over the Internet in a pay-as-you-go model.

Applications access cloud storage through traditional storage protocols or directly via an API.

Credits: https://www.cbldatarecovery.com/blog/images/83.jpg

Benefits of Cloud Storage:

  1. Cost of Ownership: With cloud storage, there is no hardware to purchase and no storage to provision. You can add or remove capacity on demand, quickly change performance and retention characteristics, and pay only for the storage you actually use.
  2. Time for Deployment: Cloud storage allows IT to quickly deliver the exact amount of storage needed, right when it's needed.

Types of Cloud Storage

There are three types of cloud data storage: object storage, file storage, and block storage. Each offers its own advantages and has its own use cases:

Credits: https://www.emc.com/content/dam/uwaem/production-design-assets/en/sdsaemmodule/images/ecs-object-storage-parking-your-data-storage-models-compared.gif

A. Block Storage: It breaks up data into blocks and then stores those blocks as separate pieces, each with a unique identifier.

  1. Block storage also decouples data from user environments, allowing that data to be spread across multiple environments.
  2. Block storage volumes are provisioned with each virtual server and offer the ultra-low latency required for high-performance workloads.
  3. Developers favor block storage for computing situations where they require fast, efficient, and reliable data transportation.

Amazon Elastic Block Store (Amazon EBS) provides block-level storage volumes for use with EC2 instances. EBS volumes attached to an instance are exposed as storage volumes that persist independently of the life of the instance.

  1. An EBS volume can only be attached to a single EC2 instance at a time; however, multiple EBS volumes can be attached to a single instance.
  2. Every write to an EBS volume is replicated multiple times within the same availability zone of your region, which means an EBS volume is only available in a single availability zone.
  3. Mainly used for workloads that need fast input/output operations per second (IOPS).
  4. Provides an option to back up data through snapshots, which are incremental (see the sketch below).
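
As a rough illustration, here is a minimal boto3 sketch of creating a snapshot; the volume ID is a placeholder, not a real resource:

```python
import boto3

ec2 = boto3.client("ec2")  # region and credentials come from your AWS config

# Snapshots are incremental: only blocks changed since the last snapshot
# are stored. The volume ID below is a placeholder.
snapshot = ec2.create_snapshot(
    VolumeId="vol-0123456789abcdef0",
    Description="Nightly backup of data volume",
)
print(snapshot["SnapshotId"])
```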

Different EBS volume types:

  1. Solid state drives (SSD):
    1. Suited for workloads with smaller blocks, such as boot volumes for EC2 instances.
  2. Hard disk drives (HDD):
    1. Suited for workloads that require higher throughput and larger blocks of data.

Credits: https://miro.medium.com/max/880/1*5S0qCBfI8Lc58eKARxFAyQ.png

How to create an EBS Volume?

  1. During the creation of a new instance, attaching the volume at the time of launch.
  2. From within the EC2 dashboard of the AWS Management Console, as a standalone volume ready to be attached to an instance when required (see the sketch below).
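
A minimal boto3 sketch of the standalone approach, assuming a placeholder region, availability zone, and instance ID; note a volume must be created in the same availability zone as the instance it will attach to:

```python
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")  # placeholder region

# Create a standalone 20 GiB gp3 volume in a specific availability zone.
volume = ec2.create_volume(
    AvailabilityZone="us-east-1a",  # must match the target instance's AZ
    Size=20,
    VolumeType="gp3",
)
ec2.get_waiter("volume_available").wait(VolumeIds=[volume["VolumeId"]])

# Attach it to an existing instance (the instance ID is a placeholder).
ec2.attach_volume(
    VolumeId=volume["VolumeId"],
    InstanceId="i-0123456789abcdef0",
    Device="/dev/sdf",
)
```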

Credits: https://bogotobogo.com/DevOps/AWS/images/EBS_Backed_Image_Creation/EBS_Backed_AMI_Creation.png

Do not use EBS for temporary storage or for multi-instance storage access, as a volume can be accessed by only one instance at a time.

For more information on Block Storage:

  1. https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/AmazonEBS.html
  2. https://aws.amazon.com/ebs/?ebs-whats-new.sort-by=item.additionalFields.postDateTime&ebs-whats-new.sort-order=desc
  3. https://in.pcmag.com/storage/42372/ssd-vs-hdd-whats-the-difference

B. Object Storage: Object storage holds data that does not conform to, or cannot easily be organized into, a traditional relational database with rows and columns.

Object stores are ideal for building modern applications from scratch that require scale and flexibility, and can also be used to import existing data stores for analytics, backup, or archive.

AWS Simple Storage Service (S3) is a fully managed, object-based storage service that is highly available, highly durable, very cost-effective, and widely accessible.

S3 is a regional service, so when uploading data you, as the customer, are required to specify the region in which that data will be placed.

Credits: https://image.slidesharecdn.com/commonworkloadsontheawscloud-150327005120-conversion-gate01/95/common-workloads-on-the-aws-cloud-12-638.jpg?cb=1427690175

How to store objects in S3:

  1. Define and create a bucket (a bucket is a container for your data).
  2. The bucket name must be completely unique, not just within the region you specify but globally against all other S3 buckets that exist; because of the flat address space, you simply can't have a duplicate name.
  3. Once you have created your bucket, you can begin to upload your data into it. Any object uploaded to your bucket is given a unique object key to identify it.
  4. If required, you can create folders within the bucket to categorize your objects for easier data management (see the sketch after this list).
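
A minimal boto3 sketch of these steps; the bucket name and region are made-up examples, and the bucket name would have to be globally unique in practice:

```python
import boto3

s3 = boto3.client("s3", region_name="eu-west-1")  # placeholder region

# Bucket names share a global namespace, so this one must not already exist.
s3.create_bucket(
    Bucket="my-example-bucket-2021",
    CreateBucketConfiguration={"LocationConstraint": "eu-west-1"},
)

# Upload an object; a "/" in the object key mimics a folder hierarchy.
s3.upload_file("report.csv", "my-example-bucket-2021", "backups/2021/report.csv")
```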

Credits: https://d1.awsstatic.com/re19/Westeros/Diagram_S3_Access_Points.fa88c474dc1073aede962aaf3a6af2d6b02be933.png

Different types of AWS S3 storage classes:

  1. S3 Standard:
    1. It is ideal when you need high throughput and low latency, with the ability to access your data frequently.
    2. S3 Standard offers 11 nines of durability across multiple availability zones. It offers a 99.99% availability SLA.
  2. S3 Intelligent Tiering:
    1. Ideal when the frequency of access to an object is unknown. Depending on the access patterns of objects in the Intelligent Tiering class, S3 will move your objects between two different tiers:
      1. frequent access
      2. infrequent access
    2. S3 Intelligent Tiering also offers 11 nines of durability across multiple availability zones. It offers a 99.9% availability SLA.
  3. S3 Standard Infrequent Access:
    1. It is designed for data that does not need to be accessed as frequently as data within the Standard tier, yet still offers high throughput and low latency access, much like S3 Standard does.
    2. It carries the same 11 nines of durability across multiple AZs, copying your objects to multiple availability zones within a single region to protect against AZ outages. It offers a 99.99% availability SLA.
  4. S3 One Zone Infrequent Access:
    1. An infrequent access storage class designed for objects that are unlikely to be accessed frequently. It carries the same throughput and low latency, but objects are copied multiple times to different storage locations within the same availability zone instead of across multiple availability zones.
    2. One Zone IA offers the lowest level of availability, currently 99.5%, because your data is being stored in a single availability zone.
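
To make the classes concrete, here is a hedged boto3 sketch of placing an object directly into a non-default storage class at upload time (the bucket, key, and file name are placeholders):

```python
import boto3

s3 = boto3.client("s3")

# Choose the storage class per object at upload time. Other valid values
# include "INTELLIGENT_TIERING" and "ONEZONE_IA".
with open("archive-2021.gz", "rb") as body:
    s3.put_object(
        Bucket="my-example-bucket-2021",
        Key="logs/archive-2021.gz",
        Body=body,
        StorageClass="STANDARD_IA",
    )
```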

Credits: https://www.softnas.com/wp/wp-content/uploads/2019/09/amazon-s3-aws-storage-classes.png

AWS Simple Storage Service Glacier (S3 Glacier): The fundamental difference with the Amazon Glacier storage classes is that they store the same amount of data at a fraction of the cost of the S3 storage classes, because they don't provide instant access to your data.

When retrieving your data, it can take up to several hours to gain access to it, depending on certain criteria. The data structure within Glacier is centered around vaults and archives. A Glacier vault simply acts as a container for Glacier archives. These vaults are regional.

The Glacier dashboard within the AWS Management Console allows you to create your vaults, set data retrieval policies, and configure event notifications. Moving data into S3 Glacier for the first time is effectively a two-step process:

  1. Create your vault as the container for your archives; this can be completed using the Glacier console.
  2. Move your data into the Glacier vault using the available APIs or SDKs, as sketched below.
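
A minimal boto3 sketch of both steps, assuming a placeholder vault name and local file:

```python
import boto3

glacier = boto3.client("glacier", region_name="us-east-1")  # placeholder region

# Step 1: create the vault, the container for your archives.
glacier.create_vault(vaultName="my-backup-vault")

# Step 2: upload an archive into the vault via the SDK.
with open("backup.tar.gz", "rb") as body:
    archive = glacier.upload_archive(
        vaultName="my-backup-vault",
        archiveDescription="Monthly backup",
        body=body,
    )
print(archive["archiveId"])  # keep this ID; it is needed to retrieve the archive
```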

The default Standard storage class within S3 Glacier:

  1. It is a highly secure, low-cost, and durable storage solution, encrypting data both in transit and at rest.
  2. The durability matches that of the other S3 storage classes, being 11 nines across multiple availability zones, and the availability of S3 Glacier is 99.9%.
  3. It offers a variety of retrieval options depending on how urgently you need the data back, each at a different price point. These are Expedited, Standard, and Bulk:
    1. Expedited: This is used when you have an urgent requirement to retrieve your data but the request has to be less than 250 megabytes.
    2. Standard: This can be used to retrieve any of your archives no matter their size but your data will be available in three to five hours.
    3. Bulk: This option is used to retrieve petabytes of data at a time, however, this typically takes between five and twelve hours to complete.
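
A hedged boto3 sketch of requesting an archive retrieval with a chosen tier; the vault name and archive ID are placeholders:

```python
import boto3

glacier = boto3.client("glacier")

# Start an asynchronous retrieval job; Tier selects the speed/price point.
job = glacier.initiate_job(
    vaultName="my-backup-vault",
    jobParameters={
        "Type": "archive-retrieval",
        "ArchiveId": "EXAMPLE-ARCHIVE-ID",  # placeholder
        "Tier": "Standard",  # "Expedited" or "Bulk" are the other options
    },
)
print(job["jobId"])  # poll this job and download the output once it completes
```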

Credits: https://mk0digitalcloud3kwjy.kinstacdn.com/wp-content/uploads/2018/10/Glacier-Retrieval-Options-1024x405.jpg

S3 Glacier Deep Archive: An ideal storage class for circumstances that require specific data retention regulations and compliance, with minimal access. The durability and availability match those of S3 Glacier: 11 nines of durability across multiple AZs with 99.9% availability.

Deep Archive, however, does not offer multiple retrieval options. Instead, AWS states that the retrieval of the data will be within 12 hours or less.
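
For objects kept in the Deep Archive storage class through S3 itself, a temporary restore has to be requested before the object becomes readable. A hedged sketch, with a placeholder bucket and key:

```python
import boto3

s3 = boto3.client("s3")

# Ask S3 to restore a Deep Archive object for 7 days of temporary access.
s3.restore_object(
    Bucket="my-example-bucket-2021",
    Key="compliance/records-2015.tar",
    RestoreRequest={
        "Days": 7,
        "GlacierJobParameters": {"Tier": "Standard"},
    },
)
```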

Credits: https://blocksandfiles.com/wp-content/uploads/2020/11/AWS-archive-tier-tiering-Nov-2020.jpg

It is essentially used for data archiving and long-term data retention, and is commonly referred to as the cold storage service within AWS.

For more information on Object Storage:

  1. https://aws.amazon.com/what-is-cloud-object-storage/
  2. https://aws.amazon.com/s3/
  3. https://docs.aws.amazon.com/AmazonS3/latest/userguide/UsingObjects.html

C. File Storage: File storage is a hierarchical storage methodology used to organize and store data on a computer hard drive or on a network-attached storage (NAS) device.

  1. In file storage, data is stored in files, the files are organized in folders, and the folders are organized under a hierarchy of directories and subdirectories.
  2. It is ideal for use cases like large content repositories, development environments, media stores, or user home directories.

Amazon Elastic File System (EFS) is considered file-level storage and is also optimized for low-latency access, but unlike EBS it supports access by multiple EC2 instances.

  1. It appears to users like a file manager interface and uses standard file system semantics, such as locking files, renaming files, updating them, and using a hierarchical structure.
  2. EFS provides simple, scalable file storage for use with Amazon EC2 instances. EC2 instances can be configured to access an EFS file system through configured mount targets.

Some features of the service:

  1. It's a fully managed file system for multiple EC2 instances, allowing it to serve as a common data source across potentially thousands of EC2 instances.
  2. It uses standard operating system APIs, so any application that is designed to work with standard operating system APIs will work with EFS.
  3. It's replicated across availability zones in a region, meaning it's highly available.
  4. It provides low latency and can support thousands of concurrent connections.
  5. The throughput and IOPS scale dynamically as required.

Credits: https://d1.awsstatic.com/r2018/b/EFS/product-page-diagram-Amazon-EFS-Launch_How-It-Works.cf947858f0ef3557b9fc14077bdf3f65b3f9ff43.png

Q. How do we use EFS?

  1. Create the file system; you can then access it by creating mount targets within your Virtual Private Cloud (VPC).
  2. Once the mount targets have been created, the NFS file system can be accessed from any other machine.
  3. When the file system has been mounted on one or more instances, you can read and write data to it. That means users within the region have access to a common data source.
  4. EFS can also be used for on-premises solutions; Amazon provides an option called AWS Direct Connect.
  5. An EFS file system can be mounted on an on-premises server, and data can then be migrated to the AWS cloud, hosted on an EFS file system (see the sketch below).
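
A hedged boto3 sketch of creating a file system and one mount target; the subnet and security group IDs are placeholders from your own VPC:

```python
import boto3

efs = boto3.client("efs", region_name="us-east-1")  # placeholder region

# Create the file system.
fs = efs.create_file_system(
    CreationToken="demo-efs",  # idempotency token; any unique string
    PerformanceMode="generalPurpose",
)

# Create a mount target in one subnet of your VPC (IDs are placeholders);
# instances then mount the file system over NFS via the target's DNS name.
efs.create_mount_target(
    FileSystemId=fs["FileSystemId"],
    SubnetId="subnet-0123456789abcdef0",
    SecurityGroups=["sg-0123456789abcdef0"],
)
```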

Credits: https://image.slidesharecdn.com/5-170425003714/95/choosing-the-right-cloud-storage-for-media-and-entertainment-workloads-april-2017-aws-online-tech-talks-27-638.jpg?cb=1493080723

AWS Cloud Storage for Disaster Recovery

Cloud storage services can be a considerably cheaper backup solution than your own on-premises solution. The speed with which you can launch an environment within AWS to replicate your on-premises solution, with easy access to production backup data, is of significant value to many organizations.

Considerations when planning an AWS DR storage solution:

Q1. How will you get your data in and out of AWS?

The method you choose to move your data from on-premises into the cloud can vary depending on your own infrastructure and circumstances.

  1. You can use a Direct Connect connection to AWS to move data in and out of the environment.
  2. If you don't have a Direct Connect link between your data center and AWS, then you may have a hardware or software VPN connection, which could also be used.
  3. If you have neither of these connectivity options, you can use your data center's own internet connection to connect and transfer the data to AWS.
  4. The AWS Storage Gateway service acts as a gateway between your data center and your AWS environment.

Depending on how much data you need to move or copy to AWS, these lines of connectivity may not have the bandwidth required to cope with the amount of data being transferred. In this instance, there are physical disk appliances available:

  1. AWS Snowball: AWS will ship an appliance, either 50 terabytes or 80 terabytes in size, to your data center, where you can copy your data to it before it is shipped back to AWS for uploading onto S3.
  2. AWS Snowmobile: an exabyte-scale data transfer service, where you can transfer up to 100 petabytes per Snowmobile, which is a 45-foot long shipping container pulled by a semi-trailer truck.

Credits: https://i.ytimg.com/vi/qSoZ56x-Ta0/maxresdefault.jpg

Q2. How quickly do you need your data back?

Some storage services offer immediate access to your data, such as Amazon S3, while others may require several hours to retrieve, such as Amazon Glacier Standard Retrieval.

Q3. How to secure the data?

When working with sensitive information, you must ensure that you have a means of encryption both in-transit and when at rest.

To check how AWS storage services stack up against this governance, AWS provides a service called AWS Artifact, which allows customers to view and access AWS compliance reports.

Some of these security features, which can help you maintain a level of data protection, are:

  1. IAM Policies
  2. Bucket Policies
  3. Access Control Lists
  4. Lifecycle Policies
  5. Multi-Factor Authentication (MFA) Delete, which ensures that a user has to enter a six-digit MFA code to delete an object, preventing accidental deletion due to human error.
  6. Enabling versioning on an S3 bucket, which ensures you can recover from misuse of an object or accidental deletion and revert to an older version of the same data object (see the sketch after this list).
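
A hedged boto3 sketch of two of these controls, enabling versioning and adding a lifecycle rule that transitions old backups to Glacier; the bucket name and prefix are placeholders:

```python
import boto3

s3 = boto3.client("s3")
bucket = "my-example-bucket-2021"  # placeholder bucket name

# Enable versioning so deleted or overwritten objects can be recovered.
s3.put_bucket_versioning(
    Bucket=bucket,
    VersioningConfiguration={"Status": "Enabled"},
)

# Lifecycle policy: move objects under "backups/" to Glacier after 90 days.
s3.put_bucket_lifecycle_configuration(
    Bucket=bucket,
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "archive-old-backups",
                "Status": "Enabled",
                "Filter": {"Prefix": "backups/"},
                "Transitions": [{"Days": 90, "StorageClass": "GLACIER"}],
            }
        ]
    },
)
```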

Credits: https://image.slidesharecdn.com/aws-security-fundamentals-0923-1beef3da-0459-4147-a5a4-fb6541bd6c0e-976015146-190918173203/95/aws-cloud-security-fundamentals-24-638.jpg?cb=1568827942

S3 as a data backup solution

Amazon S3 is a highly available and durable service, with huge capacity for scaling, along with numerous security features to maintain a tightly secure environment.

This makes S3 an ideal storage solution for static content, and therefore perfect as a backup solution. Amazon S3 provides three different classes of storage, each designed to provide a different level of service and benefit:

  1. S3 Standard
  2. S3 Infrequent Access
  3. S3 Glacier

As a general rule, if your data retrieval will take longer than a week using your existing connection method, then you should consider using AWS Snowball.

You can use AWS Storage Gateway for on-premises data backup. It allows integration between your on-premises storage and AWS. This connectivity allows you to scale your storage requirements both securely and cost-efficiently.

Storage Gateway offers different configurations and options, allowing you to fit the service to your needs. It offers file, volume, and tape gateway configurations which you can use to help with your DR and data backup solutions.

Credits: https://d1.awsstatic.com/Image_Thumbs/webinar-slides-thumb/April%202017%20AWS%20SGW%20Deep%20Dive%20thumb.0ede2ac94927e2efefa5665daf0583f8c8358b0a.png

For more information on Disaster Recovery:

  1. https://docs.aws.amazon.com/wellarchitected/latest/reliability-pillar/plan-for-disaster-recovery-dr.html
  2. https://aws.amazon.com/blogs/database/implementing-a-disaster-recovery-strategy-with-amazon-rds/
  3. https://docs.aws.amazon.com/prescriptive-guidance/latest/backup-recovery/on-premises-to-aws.html

For more information, https://awseducate.instructure.com/courses/197/pages/aws-cloud-computing-fundamentals?module_item_id=9215

I will be spending the next couple of weeks focusing on AWS databases. Let me know where I could improve!
