DEV Community

med_karim_amimi for AWS MENA Community


Load, Store, and Protect Linux-Based NFS Workloads in AWS

This article outlines a strategy for migrating Linux-based Network File System (NFS) workloads from on-premises environments to the AWS cloud.

Introduction:

Network File Systems (NFS) are widely used by on-premises applications to share the same content across multiple servers at the same time. This document describes best practices for migrating Linux file-based applications to the cloud smoothly, efficiently and securely.
AWS offers managed services that were designed to work together for this kind of migration. The process described here relies on AWS DataSync, Amazon EFS and AWS Backup.

Service Overview:

AWS DataSync:

DataSync is mainly used to move large amounts of data from on-premises storage to AWS. It reads from your NAS or file system over NFS or SMB and can write to Amazon S3, Amazon EFS and Amazon FSx for Windows File Server. Data transfer with AWS DataSync can be secure, fast and cost-effective compared with similar open source tools.
In this document, we focus on using DataSync with an on-premises NFS server and Amazon EFS.

Amazon Elastic File System:

EFS is a managed NFS file system that can be mounted on many EC2 instances across multiple Availability Zones (AZs) and provides highly available, scalable and shareable storage. It is a POSIX file system that scales automatically, with no capacity planning needed.

AWS Backup:

AWS Backup is a fully managed AWS service that enables you to centralize and automate data protection. In this document, it is used to back up the EFS file system that we are moving data to.

Migration Guide:

Prerequisites:

  1. An active AWS account.
  2. Required permissions to create new resources for each of the mentioned services.
  3. A virtualization system that is capable of running the DataSync agent: VMware ESXi, Microsoft Hyper-V Hypervisor or Linux Kernel-based Virtual Machine (KVM).
  4. A web browser to activate the DataSync agent.

Setup and Configuration:

Elastic File System:
An EFS file system can be created using the AWS Console, CLI or SDK. As discussed earlier, no setup, provisioning or capacity management is needed. The level of availability depends on the storage class chosen (the classes differ in redundancy and access frequency), so it is up to the customer to define the application's needs and criticality. Clients (usually EC2 instances) access the file system through mount targets, which are created separately. Each set of EC2 instances in an AZ needs a mount target created in that same AZ. As with EC2 instances, access to EFS is controlled by security groups. The file system is reached by its DNS name, and DNS resolution of mount targets across AZs is managed automatically.
Availability, durability and file system lifecycle, among many other options, can be set while creating the file system. Encryption at rest is recommended for data protection but can only be enabled when the file system is created, whereas encryption in transit can be enabled later, when the file system is mounted.
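
As a rough illustration of the mount target and DNS points above, here is a minimal Python sketch that plans one mount target per AZ and builds the mount command for a client. All identifiers (file system ID, subnets, security group) and the helper names are hypothetical, and the mount options are the ones commonly recommended for mounting EFS over NFSv4.1; treat it as a sketch, not a definitive setup.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MountTargetRequest:
    """Parameters for one EFS mount target (one per Availability Zone)."""
    file_system_id: str
    subnet_id: str
    security_groups: tuple


def plan_mount_targets(file_system_id, subnet_by_az, security_group_id):
    """Return one mount target request per Availability Zone, sorted by AZ name."""
    return [
        MountTargetRequest(file_system_id, subnet_id, (security_group_id,))
        for _az, subnet_id in sorted(subnet_by_az.items())
    ]


def efs_dns_name(file_system_id, region):
    """Build the regional DNS name that clients use to reach the file system."""
    return f"{file_system_id}.efs.{region}.amazonaws.com"


def mount_command(file_system_id, region, mount_point="/mnt/efs"):
    """Build an NFSv4.1 mount command for the file system root."""
    options = "nfsvers=4.1,rsize=1048576,wsize=1048576,hard,timeo=600,retrans=2,noresvport"
    return f"sudo mount -t nfs4 -o {options} {efs_dns_name(file_system_id, region)}:/ {mount_point}"


# Hypothetical identifiers, for illustration only.
targets = plan_mount_targets(
    "fs-0123456789abcdef0",
    {"us-east-1a": "subnet-aaaa1111", "us-east-1b": "subnet-bbbb2222"},
    "sg-0abc1234",
)
for target in targets:
    print(target)
print(mount_command("fs-0123456789abcdef0", "us-east-1"))
```

Each planned request corresponds to one mount target creation call in the SDK or CLI; the same security group must allow NFS traffic from the clients.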
EFS offers two performance-related settings:

  • Performance modes: General Purpose (the default) and Max I/O. Max I/O supports a much higher number of file system operations per second, at the cost of slightly higher latency for each operation.
  • Throughput modes: In Bursting mode, throughput scales linearly with the file system size, and a new file system starts with a burst credit balance of 2.1 TB. Bursts can reach up to 100 MB/s for write-only operations, which are the main activity during the migration. Provisioned mode offers a fixed throughput that is set when creating the file system and can be adjusted later based on customer needs.

For the purposes of this document, we recommend the General Purpose performance mode, with Provisioned throughput mode if your transfer rate needs to exceed 100 MB/s.
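
To make this recommendation concrete, the sketch below estimates how long a migration takes at a given sustained write rate and derives creation settings from the required rate. The function names and the example data size are hypothetical. The dictionary keys follow the parameter names of the EFS CreateFileSystem API, and MB/s is treated as approximately equal to MiB/s for simplicity.

```python
def transfer_days(data_tb, rate_mb_per_s):
    """Days needed to write data_tb terabytes at a sustained rate (decimal units)."""
    seconds = data_tb * 1_000_000 / rate_mb_per_s
    return seconds / 86_400


def file_system_settings(required_mb_per_s, burst_ceiling_mb_per_s=100):
    """Build EFS creation settings following the recommendation above."""
    settings = {
        "PerformanceMode": "generalPurpose",
        "Encrypted": True,  # encryption at rest can only be chosen at creation
        "ThroughputMode": "bursting",
    }
    if required_mb_per_s > burst_ceiling_mb_per_s:
        # Assumption: MB/s is close enough to MiB/s for sizing purposes.
        settings["ThroughputMode"] = "provisioned"
        settings["ProvisionedThroughputInMibps"] = float(required_mb_per_s)
    return settings


# Hypothetical sizing: 100 TB of data at two different sustained rates.
for rate in (100, 500):
    print(f"{rate} MB/s -> {transfer_days(100, rate):.1f} days", file_system_settings(rate))
```

At the 100 MB/s burst ceiling, a data set of that size would take more than a week to write, which is the main reason to consider provisioned throughput for the migration window and to lower it afterwards.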
AWS provides four EFS storage classes:

  • EFS Standard and EFS One Zone: the default classes, used for frequently accessed files.
  • EFS Standard-Infrequent Access (Standard-IA) and EFS One Zone-Infrequent Access (One Zone-IA): used for files that are not accessed on a daily basis.

Choosing IA storage reduces storage costs but increases latency. EFS lifecycle management can move files between classes automatically, based on a policy you define. From an operational point of view, EFS serves its clients transparently regardless of the storage class.
In this article, we recommend enabling lifecycle management for cost-effectiveness.
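
A lifecycle policy is a small piece of configuration. The sketch below builds one in the shape expected by the EFS PutLifecycleConfiguration API; the helper name is hypothetical, and the set of allowed periods is deliberately limited to values I am sure the API accepts (it may accept others).

```python
# Subset of transition periods accepted by EFS lifecycle management.
ALLOWED_IA_TRANSITION_DAYS = {7, 14, 30, 60, 90}


def lifecycle_policies(days_before_ia=30):
    """Build a policy that moves files to IA after days_before_ia days without access."""
    if days_before_ia not in ALLOWED_IA_TRANSITION_DAYS:
        raise ValueError(f"unsupported transition period: {days_before_ia} days")
    return [{"TransitionToIA": f"AFTER_{days_before_ia}_DAYS"}]


print(lifecycle_policies(30))
```

The returned list is what would be passed as the LifecyclePolicies parameter, together with the file system ID.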

AWS DataSync:
AWS DataSync is a fully managed AWS service for data migration and data synchronization between on-premises and cloud environments. This document assumes that an AWS Direct Connect link is in place for a better transfer rate (up to 10 Gbps per DataSync agent). DataSync encrypts data in transit and performs data integrity checks. It also processes data in flight to make more efficient use of bandwidth.
The steps needed to perform the migration are:

  1. Download and activate the DataSync agent as a virtual machine in your on-premises environment.
  2. Create a ‘task’, which is the complete definition of a data transfer between two locations: here, an on-premises Network File System (NFS) server and Amazon EFS.
  3. Set the appropriate DataSync configuration options:
     • Enable NFS traffic (port 2049) in your on-premises firewall and in your AWS VPC.
     • Configure DataSync to share at least one security group with the EFS mount targets.
     • Configure VPC endpoints and open the additional on-premises firewall ports they require if you want to avoid sending data over the public internet.
     • Make sure that the default metadata settings are turned on to preserve file ownership, permissions and timestamps.
     • Disable the automatic data verification step only if you are sure that all activity against your on-premises file system is stopped before and during the data transfer. Turning this option off reduces the transfer time.
     • Set a bandwidth limit for the task to prevent the DataSync process from consuming too many network resources.
     • Deploy and allocate as many as four DataSync agents to your task. This option speeds up the data transfer by overcoming the network limits of a single agent.

Finally, run the DataSync task either on demand or periodically via the DataSync API.
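
The configuration options listed above map onto the Options structure of a DataSync task. The following sketch builds that structure and checks the agent limit; the helper names and the agent ARNs are hypothetical, while the keys and values (VerifyMode, BytesPerSecond, Uid, Gid, PosixPermissions, Mtime, Atime, AgentArns) follow the DataSync API. It only builds dictionaries and makes no AWS call.

```python
MAX_AGENTS_PER_TASK = 4  # limit stated in the list above
BYTES_PER_MIB = 1024 * 1024


def on_prem_config(agent_arns):
    """Build the agent section of the NFS source location (one to four agents)."""
    if not 1 <= len(agent_arns) <= MAX_AGENTS_PER_TASK:
        raise ValueError("a task needs between 1 and 4 DataSync agents")
    return {"AgentArns": list(agent_arns)}


def task_options(source_is_frozen, bandwidth_limit_mib_per_s=None):
    """Build task options that preserve metadata and optionally cap bandwidth."""
    options = {
        # Preserve ownership, permissions and timestamps.
        "Uid": "INT_VALUE",
        "Gid": "INT_VALUE",
        "PosixPermissions": "PRESERVE",
        "Mtime": "PRESERVE",
        "Atime": "BEST_EFFORT",
        # Skip verification only when nothing writes to the source.
        "VerifyMode": "NONE" if source_is_frozen else "POINT_IN_TIME_CONSISTENT",
        # -1 means no bandwidth limit.
        "BytesPerSecond": -1,
    }
    if bandwidth_limit_mib_per_s is not None:
        options["BytesPerSecond"] = int(bandwidth_limit_mib_per_s * BYTES_PER_MIB)
    return options


# Hypothetical agent ARNs, for illustration only.
agents = on_prem_config([
    "arn:aws:datasync:us-east-1:111122223333:agent/agent-aaaa",
    "arn:aws:datasync:us-east-1:111122223333:agent/agent-bbbb",
])
print(agents)
print(task_options(source_is_frozen=True, bandwidth_limit_mib_per_s=200))
```

The two dictionaries would be passed as OnPremConfig when creating the NFS location and as Options when creating the task.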

AWS Backup:
AWS Backup offers an extra layer of data protection. Automatic backups with AWS Backup can be turned on or off when the EFS file system is created, or later, depending on the EFS storage class.

AWS Backup provides automated backup schedules, retention management and lifecycle management. It protects the data from unintended events. Backups are incremental, which means that only the changes made to the file system since the previous backup are recorded.
The backup plan is the policy expression that defines when and how you want to back up your AWS resources. AWS provides predefined backup plans, but customers can add their own.

You can optionally add rules that define when backups move to cold storage and when they are purged. Moving data to ‘cold’ storage may cost you a few extra hours during the restore process. After you create your backup plan, you choose which of your EFS file systems it governs. Two options are available to select the target file systems: a tag-based mechanism, in which AWS Backup includes every file system carrying a particular tag, or an explicit selection by file system ID.

Backups are encrypted with a KMS key for security purposes. Encryption is handled by the backup vaults created in AWS Backup.
The restore process can be triggered from the AWS console, and a backup can be restored to an existing file system or to a newly created one.
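
A backup plan and a tag-based selection can be sketched as plain data. The example below follows the structure used by the AWS Backup CreateBackupPlan and CreateBackupSelection APIs; the plan name, vault name, role ARN, tag and helper names are all hypothetical. It also enforces the AWS Backup rule that a backup moved to cold storage must stay there for at least 90 days before deletion.

```python
MIN_DAYS_IN_COLD_STORAGE = 90


def backup_plan(plan_name, vault_name, cold_after_days, delete_after_days,
                schedule="cron(0 5 ? * * *)"):
    """Build a daily backup plan with a cold storage transition and a purge date."""
    if delete_after_days < cold_after_days + MIN_DAYS_IN_COLD_STORAGE:
        raise ValueError("retention must be at least 90 days longer than the cold transition")
    return {
        "BackupPlanName": plan_name,
        "Rules": [{
            "RuleName": "daily",
            "TargetBackupVaultName": vault_name,
            "ScheduleExpression": schedule,  # every day at 05:00 UTC
            "Lifecycle": {
                "MoveToColdStorageAfterDays": cold_after_days,
                "DeleteAfterDays": delete_after_days,
            },
        }],
    }


def tag_selection(selection_name, iam_role_arn, tag_key, tag_value):
    """Select every resource that carries the given tag."""
    return {
        "SelectionName": selection_name,
        "IamRoleArn": iam_role_arn,
        "ListOfTags": [{
            "ConditionType": "STRINGEQUALS",
            "ConditionKey": tag_key,
            "ConditionValue": tag_value,
        }],
    }


# Hypothetical names, for illustration only.
plan = backup_plan("efs-daily", "efs-vault", cold_after_days=30, delete_after_days=365)
selection = tag_selection(
    "efs-by-tag",
    "arn:aws:iam::111122223333:role/service-role/AWSBackupDefaultServiceRole",
    "backup", "daily",
)
print(plan)
print(selection)
```

Selecting file systems by ID instead of by tag would replace ListOfTags with a Resources list of file system ARNs.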

Operations and Monitoring:

After setting up all the services needed for the migration, we need to monitor the performance of each one using the metrics provided by the service itself or by the centralized monitoring service, Amazon CloudWatch.

AWS DataSync:
We need to make sure that DataSync can reach both the data source and the data target at all times, and to get status information about each task, which helps in troubleshooting any issue that may occur. Information about every task phase is also available to report progress.
The DataSync API can provide operational health information to external monitoring systems. You can also use an Amazon CloudWatch log group to centralize logs in AWS.
DataSync is also well integrated with Amazon CloudWatch: it sends events and metrics that are useful for setting alarms on state changes of an agent, location, task or task execution. You can also get real-time statistics on the volume of data transferred over time.

Amazon EFS:
The metrics provided by Amazon CloudWatch help decide which EFS performance mode to use. You can start with the General Purpose mode and continuously monitor the PercentIOLimit metric. If the metric consistently stays around 100%, you may want to move to the Max I/O mode; because the performance mode cannot be changed after a file system is created, this means creating a new Max I/O file system and migrating the data to it. Alternatively, you can split the workload across multiple General Purpose file systems.
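
One way to turn "consistently around 100%" into a check is sketched below. The threshold and the fraction of samples are arbitrary assumptions, not AWS guidance, and the sample values are made up; in practice they would be PercentIOLimit datapoints read from the AWS/EFS namespace in CloudWatch.

```python
def should_consider_max_io(percent_io_limit_samples, threshold=95.0, min_fraction=0.9):
    """Return True when most samples sit at or above the threshold.

    threshold and min_fraction are illustrative assumptions.
    """
    samples = list(percent_io_limit_samples)
    if not samples:
        return False
    high = sum(1 for value in samples if value >= threshold)
    return high / len(samples) >= min_fraction


# Made-up datapoints, for illustration only.
steady_load = [97.0, 99.5, 100.0, 98.2, 96.4, 100.0, 99.1, 97.7, 98.8, 99.9]
spiky_load = [20.0, 35.0, 100.0, 15.0, 40.0, 22.0, 18.0, 99.0, 30.0, 25.0]
print(should_consider_max_io(steady_load))
print(should_consider_max_io(spiky_load))
```

Occasional spikes to 100% do not trigger the check; only a sustained plateau does, which matches the intent of the paragraph above.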
AWS proposes two techniques to drive additional throughput while not reaching the I/O limit:

  1. Use parallel I/O as much as possible to take advantage of the distributed design of EFS. A GitHub tutorial is available for more details regarding this technique.
  2. Determine whether your throughput is being rate-limited based on the amount of data stored. The CloudWatch throughput utilization metric is useful for this purpose, as is the ratio of the metered throughput you are driving on the file system (derived from the MeteredIOBytes metric) to the PermittedThroughput metric. If the ratio is equal to one, the file system is consuming all available throughput. You can see the same behavior on a small file system in Bursting Throughput mode that has no burst credit left. If this is the case and you have deterministic application throughput requirements, consider the Provisioned Throughput mode.

All discussed metrics are available on Amazon CloudWatch dashboards.
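
The throughput ratio described above can be computed from two CloudWatch datapoints. This sketch assumes the Sum statistic of MeteredIOBytes over a period (in bytes) and the PermittedThroughput value for the same period (in bytes per second); the function name and the sample numbers are hypothetical.

```python
def throughput_utilization(metered_io_bytes_sum, period_seconds, permitted_bytes_per_s):
    """Ratio of metered throughput to permitted throughput (1.0 means fully used)."""
    if period_seconds <= 0 or permitted_bytes_per_s <= 0:
        raise ValueError("period and permitted throughput must be positive")
    metered_bytes_per_s = metered_io_bytes_sum / period_seconds
    return metered_bytes_per_s / permitted_bytes_per_s


# Made-up datapoints: a 60-second period with 100 MiB/s permitted.
permitted = 100 * 1024 * 1024
print(throughput_utilization(3_145_728_000, 60, permitted))  # half of the permitted rate
print(throughput_utilization(6_291_456_000, 60, permitted))  # all of the permitted rate
```

A ratio that stays close to 1.0 across many periods is the signal to look at Provisioned Throughput.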

Pricing and Cost Considerations:

AWS DataSync:

DataSync is priced at $0.0125 per GB transferred ($12.80 per TB at 1,024 GB per TB). This price is globally applicable.

Amazon EFS:

EFS pricing depends on the file system storage class, Infrequent Access I/O requests and provisioned throughput. This document uses us-east-1 prices:

Storage Class            Price (per GB-month)
EFS Standard storage     $0.30
Standard IA storage      $0.025
EFS One Zone storage     $0.16
EFS One Zone IA          $0.0133

If you choose the Infrequent Access storage classes, an extra fee applies for reading data from, or moving data to, the IA storage class. The fee is $0.01 per GB transferred.
You also pay for provisioned throughput at a rate of $6 per MB/s per month. You pay only for the throughput you provision above what your file system's storage class and size already provide.

AWS Backup:

AWS Backup costs $0.05/GB-month for warm storage and $0.01/GB-month for cold storage.
Restores cost $0.02/GB from warm storage and $0.03/GB from cold storage.

Migration Cost Example:

The following examples are based on simplifying assumptions in line with industry estimates:
Assumptions:

  • Migrated Data: 100TB
  • Storage period: 1 year
  • IA stored data: 80% of overall data (never read after being moved to IA EFS)
  • Backup policy: Daily
  • Availability: Multi-AZ

Service                          Cost
AWS DataSync                     $0.0125/GB * 1000 GB/TB * 100 TB = $1,250
Amazon EFS Standard Storage      $0.30/GB-month * 1000 GB/TB * 100 TB * 20% frequently accessed * 12 months/year = $72,000
Amazon EFS Standard IA I/O       $0.01/GB * 1000 GB/TB * 100 TB * 80% infrequently accessed = $800
Amazon EFS Standard IA Storage   $0.025/GB-month * 1000 GB/TB * 100 TB * 80% infrequently accessed * 12 months/year = $24,000
AWS Backup                       $0.05/GB-month * 1000 GB/TB * 100 TB * 12 months/year = $60,000
Total Cost                       $158,050

Assumptions:

  • Migrated Data: 100TB
  • Storage period: 1 year
  • IA stored data: 80% of overall data (never read after being moved to IA EFS)
  • Backup policy: Daily
  • Availability: One AZ

Service                          Cost
AWS DataSync                     $0.0125/GB * 1000 GB/TB * 100 TB = $1,250
Amazon EFS One Zone Storage      $0.16/GB-month * 1000 GB/TB * 100 TB * 20% frequently accessed * 12 months/year = $38,400
Amazon EFS One Zone IA I/O       $0.01/GB * 1000 GB/TB * 100 TB * 80% infrequently accessed = $800
Amazon EFS One Zone IA Storage   $0.0133/GB-month * 1000 GB/TB * 100 TB * 80% infrequently accessed * 12 months/year = $12,768
AWS Backup                       $0.05/GB-month * 1000 GB/TB * 100 TB * 12 months/year = $60,000
Total Cost                       $113,218
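
The arithmetic in both scenarios can be rechecked, or rerun with your own volumes, using a short script. This is a sketch that uses only the prices and assumptions stated in this article (1,000 GB per TB, warm backup storage only, IA data moved once and never read back); the function and item names are hypothetical.

```python
from decimal import Decimal

GB_PER_TB = 1000  # the article's examples use decimal terabytes


def scenario_cost(data_tb, ia_fraction, hot_price, ia_price, months=12):
    """Return (line items, total) in USD for one year of the migration example."""
    gb = Decimal(data_tb) * GB_PER_TB
    ia = Decimal(ia_fraction)
    items = {
        "AWS DataSync": gb * Decimal("0.0125"),
        "EFS frequently accessed storage": gb * (1 - ia) * Decimal(hot_price) * months,
        "EFS IA transition I/O": gb * ia * Decimal("0.01"),
        "EFS IA storage": gb * ia * Decimal(ia_price) * months,
        "AWS Backup (warm)": gb * Decimal("0.05") * months,
    }
    return items, sum(items.values())


multi_az_items, multi_az_total = scenario_cost(100, "0.8", "0.30", "0.025")
one_zone_items, one_zone_total = scenario_cost(100, "0.8", "0.16", "0.0133")

for name, (items, total) in {"Multi-AZ": (multi_az_items, multi_az_total),
                             "One Zone": (one_zone_items, one_zone_total)}.items():
    print(name)
    for label, amount in items.items():
        print(f"  {label}: ${amount:,.2f}")
    print(f"  Total: ${total:,.2f}")
```

Decimal arithmetic is used so that the per-GB prices add up exactly, without floating-point rounding.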

Conclusion:

This document explains how to set up, tune and monitor the migration of your applications to the AWS cloud. The solution is based on AWS DataSync for migrating data, Amazon Elastic File System for storing it and AWS Backup for protecting it. The detailed examples above show approximately how much it costs to migrate 100 TB of data from your on-premises data center to the AWS cloud and keep it there for a year.
