DEV Community

Troy

Sizing AWS EFS bursting-mode to your application

AWS Elastic File System (EFS) is a great tool for shared storage in Auto Scaling group situations. There are two throughput modes to choose from for your file system, Bursting Throughput and Provisioned Throughput. With Bursting Throughput mode, throughput on Amazon EFS scales as the size of your file system in the standard storage class grows. EFS performance is well documented in the AWS EFS performance documentation, so we won't go too in-depth here.

One caveat with Bursting Throughput that we'll discuss in this post is the bursting limitation for small file systems.

| File System Size (GiB) | Baseline Aggregate Throughput (MiB/s) | Burst Aggregate Throughput (MiB/s) | Maximum Burst Duration (Min/Day) | % of Time File System Can Burst (Per Day) |
| --- | --- | --- | --- | --- |
| 10 | 0.5 | 100 | 7.2 | 0.5% |
| 256 | 12.5 | 100 | 180 | 12.5% |
| 512 | 25.0 | 100 | 360 | 25.0% |
| 1024 | 50.0 | 100 | 720 | 50.0% |

*(Table values from the AWS EFS performance documentation.)*

As you can see in the table above, the lower the File System Size (GiB), the lower the Baseline Aggregate Throughput (MiB/s).
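The relationship is linear: in bursting mode, EFS grants baseline throughput at 50 KiB/s per GiB of data stored in the standard class. A quick sanity check against the 256 GiB row (variable names here are just for illustration):

```shell
# Baseline throughput in bursting mode: 50 KiB/s per GiB stored,
# so a 256 GiB file system gets 256 * 50 / 1024 = 12.5 MiB/s.
size_gib=256
baseline_mibs=$(awk -v s="$size_gib" 'BEGIN { printf "%.1f", s * 50 / 1024 }')
echo "${size_gib} GiB -> ${baseline_mibs} MiB/s baseline"
```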

To ensure you have a proper initial baseline aggregate throughput, you'll need to grow the file system by writing dummy data with the tool dd. Once you have an EFS file system created and mounted (in this example, it's mounted at /efs), use the following command to pad it to roughly 256GB:

cd /efs
sudo nohup dd if=/dev/urandom of=256gb.img bs=1024k count=256000 status=progress &

The 256GB value can be changed by modifying the `count=` argument, but bear in mind the table above and the allowed % of time for bursting.
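With `bs=1024k` (1 MiB blocks), `count=` is simply the target size in MiB, so it can be derived from a target in GiB. A small sketch (the filename and target value are arbitrary choices, not from the original command):

```shell
# Derive dd's count for a given target size, using 1 MiB blocks.
target_gib=256
count=$(( target_gib * 1024 ))   # 1024 one-MiB blocks per GiB
echo "dd if=/dev/urandom of=pad-${target_gib}gib.img bs=1024k count=${count}"
```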

Deployed applications, such as a Jenkins share hosted on an EFS mount, will benefit from this. The Jenkins /workspace path, for example, requires many burstable writes depending on the job/pipeline/project count.
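For reference, this is a typical way to mount an EFS file system over NFSv4.1 with the mount options AWS recommends; the file system ID and region below are placeholders for your own values:

```shell
# Mount an EFS file system at /efs using AWS's recommended NFS options.
# fs-12345678 and us-east-1 are placeholders -- substitute your own.
sudo mkdir -p /efs
sudo mount -t nfs4 -o nfsvers=4.1,rsize=1048576,wsize=1048576,hard,timeo=600,retrans=2,noresvport \
  fs-12345678.efs.us-east-1.amazonaws.com:/ /efs
```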

Top comments (5)

Thomas H Jones II

Problem with EFS and bursting is, when you resort to the dd (and related) trick(s) in order to achieve acceptable performance, it can blow out the storage-cost savings you might otherwise achieve by having chosen a "pay-as-you-go" storage-class. It can be more cost-effective to set up an ASG-protected Gluster or even Ceph cluster (when you want to provide a shared, hierarchical-filesystem storage-interface to applications).

At lower data-set sizes, S3 can provide a more-performant shared-storage solution vice EFS. Obvious trade-off is your data-sharing nodes need to coordinate their accesses to S3. Fortunately, GitLab and Artifactory both support this kind of direct, shared use of S3 (or other, API-compatible object-stores). I think there's a couple plugins to similarly-enable Jenkins (though, I turned over our Jenkins clusters a number of months ago, so no longer have current need to pursue it).

Troy

Yes, good point about the loss of storage-cost savings when block-writing out the storage. The obvious alternative to bursting is Provisioned Throughput mode, which costs about $6.00/mo per MBps provisioned.

A more realistic cost estimate: 20GB of data stored in EFS standard storage with 5MBps of provisioned throughput would cost about $30.00/mo. This would still cost less than rolling your own on EC2 and ensuring distribution in an ASG.
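The arithmetic behind that estimate, assuming the ~$6.00 per MBps-month provisioned-throughput rate quoted above (check AWS's current price list for your region):

```shell
# Provisioned Throughput cost sketch: ~$6.00 per MBps-month (assumed rate).
rate_per_mbps=6       # USD per MBps provisioned, per month
provisioned_mbps=5
monthly_cost=$(( provisioned_mbps * rate_per_mbps ))
echo "${provisioned_mbps} MBps provisioned -> \$${monthly_cost}/mo"
```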

Thomas H Jones II

Also, thanks for both the original post and engaging on my comment. So many times, it seems like the people I converse with haven't done much in the way of meaningful exercises when it comes to performance, hosting-costs or (especially) life-cycle evaluations. Even most of the cloud-oriented people I come in contact with seem to only be consumers of container services rather than delving into the joys of cloud or cross-cloud enablement.

Troy

Likewise! Thanks for the input and expertise. 👍

Thomas H Jones II

Part of what drove us to EFS alternatives was that the automation I was writing was meant to cover deploying a DevOps tool-chain solution (GitLab, Jenkins, etc.) into both regular commercial AWS regions and regions where EFS wasn't even available (nor the more recent managed-Lustre offering).

An artifact of that was seeing markedly improved responsiveness (particularly in GitLab). When back-testing in a commercial region, we had to pre-allocate a significantly larger chunk of EFS to get similar performance to a small Gluster cluster.

Never had the time to do a full "cost vs. responsiveness" test. Would have been interesting, but, until the GovCloud region(s) support EFS, would mostly have been an academic, rather than practical, effort.