<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Zesty_tech</title>
    <description>The latest articles on DEV Community by Zesty_tech (@zesty_tech).</description>
    <link>https://dev.to/zesty_tech</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F1057959%2F95fa05e2-fc6f-4ac4-861c-22c5b7537167.png</url>
      <title>DEV Community: Zesty_tech</title>
      <link>https://dev.to/zesty_tech</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/zesty_tech"/>
    <language>en</language>
    <item>
      <title>Improving Storage Efficiency for Solr &amp; Elasticsearch</title>
      <dc:creator>Zesty_tech</dc:creator>
      <pubDate>Wed, 29 Nov 2023 10:07:43 +0000</pubDate>
      <link>https://dev.to/zesty_tech/improving-storage-efficiency-for-solr-elasticsearch-15ng</link>
      <guid>https://dev.to/zesty_tech/improving-storage-efficiency-for-solr-elasticsearch-15ng</guid>
      <description>&lt;p&gt;&lt;em&gt;By: Omer Hamerman, Principal DevOps Engineer at Zesty&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;As a software engineer, I saw heaps of data become mountains. Storage became cheaper, and organizations started saving more and more data. But that data isn’t worth much if you can’t browse or analyze it. &lt;/p&gt;

&lt;p&gt;When searching for solutions to this problem, I found that search engines like Solr and Elasticsearch helped me leverage this data via inverted indexes and machine learning algorithms. I managed e-commerce stores, and while their products came with a nice list of predefined fields (e.g., size, color, fabric), the product descriptions were unstructured data that still carried valuable information for customers. With full-text search, I could improve the user experience without manually forcing the descriptions into an artificial structure. &lt;/p&gt;

&lt;p&gt;But, when I started using search engines, I often struggled with resource requirement estimation. I learned that even &lt;a href="https://lucidworks.com/post/estimating-memory-and-storage-for-lucenesolr/"&gt;the creators of these systems recommend iterative approaches&lt;/a&gt; where I have to reevaluate my needs frequently. Over the years, I embraced this approach and found ways to get my resources under control.&lt;/p&gt;

&lt;p&gt;Storage is one of these resources. I’ll explain some of the methods I found to enhance the performance and lower my Solr and Elasticsearch deployment costs.&lt;/p&gt;

&lt;h2&gt;
  
  
  How Do Search Engines Work?
&lt;/h2&gt;

&lt;p&gt;A search engine like Solr or Elasticsearch is a document database that helps users find specific documents. It creates different indexes for each document field. This means every document field can have one or more associated indexes, enabling different types of searches.&lt;/p&gt;

&lt;p&gt;For example, an inverted index lets users search for documents containing specific words. The index works by creating a lookup table of all the words used. Each word is linked to a list of the documents containing it. When a user searches for a word, the engine doesn’t have to check each document; it simply looks the word up in the index and returns the associated documents.&lt;/p&gt;
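&lt;p&gt;The lookup table described here can be sketched in a few lines of Python. This is a simplified illustration only; real engines like Lucene also store term frequencies, positions, and compressed postings lists:&lt;/p&gt;

```python
from collections import defaultdict

# Hypothetical product descriptions, keyed by document ID
docs = {
    1: "red wool scarf",
    2: "blue cotton scarf",
    3: "red cotton shirt",
}

# Build the index: each word maps to the set of documents containing it
index = defaultdict(set)
for doc_id, text in docs.items():
    for word in text.split():
        index[word].add(doc_id)

# A search is now a lookup, not a scan over every document
print(sorted(index["scarf"]))  # [1, 2]
print(sorted(index["red"]))    # [1, 3]
```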

&lt;p&gt;Unlike relational databases, which try to eliminate data duplicates to improve consistency and storage footprints, a search engine is optimized for data searchability and access. This means that a data set saved in a relational database is usually much smaller than when stored in a search engine, which has ramifications when it comes to storage costs.&lt;/p&gt;

&lt;h2&gt;
  
  
  Which Storage Factors Can Impact Search Performance and Costs?
&lt;/h2&gt;

&lt;p&gt;When it comes to search engines, I’ve found that different storage factors influence the performance and costs of my searches.&lt;/p&gt;

&lt;h2&gt;
  
  
  Disk I/O
&lt;/h2&gt;

&lt;p&gt;Using disks with low I/O specs has a negative impact on search performance. A slow disk can drastically prolong the response times of a search engine, especially when the workload includes huge indices or complex queries.&lt;/p&gt;

&lt;p&gt;If you’re running an e-commerce site, you probably know slow responses are correlated with bounce rates, meaning saving on disk I/O can result in the indirect cost of losing customers. To keep your customers happy, you want to keep response times low. But the only way to do this while using slow disks is excessive caching in memory, which, in turn, raises costs again—and often to the degree that your savings on I/O evaporate.&lt;/p&gt;

&lt;h2&gt;
  
  
  Storage Capacity
&lt;/h2&gt;

&lt;p&gt;Another issue I encountered was insufficient storage capacity. Search engines work by building indexes for the data they ingest. One approach for this is to create a lookup table that uses each word in the data as a key and a list of documents including that word as a value. Most words are used in many documents, so these indexes become large quickly. &lt;/p&gt;

&lt;p&gt;Insufficient storage capacity can limit the size of these indexes and, in turn, their performance. But more storage can raise your monthly bill.&lt;/p&gt;

&lt;p&gt;Take the e-commerce example. Some of these sites have thousands of products in dozens of categories, and each product has a description that needs to be searchable. Since descriptions for products of the same category are likely to share many words, each index entry for a word can get really big because the word appears in many places. Storage limits will cap the size of the index, allowing only a subset of words or products to be searchable.&lt;/p&gt;

&lt;p&gt;Then there’s the question of how long the data needs to be retained and how far back the backups should go. If I choose retention times that are too short, I might save money but reindex too often, which hurts performance. If retention times are too long, I can use my indexes for a longer period, but, again, I’ll increase costs. The same is true for backups. More storage for backups can lower the risk if things go wrong; but while cheaper than live storage for an active search engine, storage for backups isn’t free.&lt;/p&gt;

&lt;h2&gt;
  
  
  Improving Storage Performance and Cost Efficiency
&lt;/h2&gt;

&lt;p&gt;I used several different optimization methods to help me get the most out of my search engines. Make sure to integrate each method into a recurring process to reevaluate each requirement with up-to-date production data.&lt;/p&gt;

&lt;h2&gt;
  
  
  Mapping Document Fields
&lt;/h2&gt;

&lt;p&gt;Mapping is the process of defining how a search engine should index each document field. Usually, this process is done automatically when I save a new document, but creating mappings manually is a good idea to improve performance and save costs.&lt;/p&gt;

&lt;p&gt;The default mapping algorithm generates two mappings for text fields—one for a full-text search index and one for a keyword search index. &lt;/p&gt;

&lt;p&gt;Full-text search is good for fuzzy searches in continuous text because I might want to search for words that are similar to “scarf” and expect to find documents that may only include “scarves.” &lt;/p&gt;

&lt;p&gt;Keyword mapping is less flexible, but if I have fields like clothing sizes and know the store only sells five sizes, I don’t need that fuzziness. Depending on the use case, having two indexes for each text field wastes storage and slows the system down.&lt;/p&gt;

&lt;p&gt;I also disable the mapping of specific fields entirely, preventing the search engine from creating an index. This way, the search engine doesn’t need to index the entire data set, lowering the bill accordingly.&lt;/p&gt;
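&lt;p&gt;As a sketch, an explicit Elasticsearch mapping along these lines (the index and field names are hypothetical) keeps full-text indexing for the description, uses a plain keyword type for sizes instead of the default text-plus-keyword pair, and disables indexing for a field that is only ever displayed:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;PUT /products
{
  "mappings": {
    "properties": {
      "description":    { "type": "text" },
      "size":           { "type": "keyword" },
      "internal_notes": { "type": "text", "index": false }
    }
  }
}
&lt;/code&gt;&lt;/pre&gt;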

&lt;p&gt;I always check what fields my documents have and make sure to choose the best mapping for each.&lt;/p&gt;

&lt;h2&gt;
  
  
  Defining Data Retention, Replication, and Backup Policies
&lt;/h2&gt;

&lt;p&gt;There are a few best practices I follow to optimize storage performance and cost efficiency. &lt;/p&gt;

&lt;p&gt;First, I define data retention policies based on business requirements and consider implementing data lifecycle management strategies to optimize storage costs. Moving older data into slower storage can save money while keeping the data around if it’s needed in the future. But the cheapest storage is no storage at all, so I check to know which data to retain. I want to keep fast indexes for popular products, but might want to save a bit of money on the more niche inventory.&lt;/p&gt;
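&lt;p&gt;In Elasticsearch, retention like this can be expressed as an index lifecycle management (ILM) policy. A minimal sketch, with placeholder ages and actions to adjust to your own requirements:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;PUT _ilm/policy/catalog_retention
{
  "policy": {
    "phases": {
      "hot":    { "actions": {} },
      "warm":   { "min_age": "30d",  "actions": { "shrink": { "number_of_shards": 1 } } },
      "delete": { "min_age": "180d", "actions": { "delete": {} } }
    }
  }
}
&lt;/code&gt;&lt;/pre&gt;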

&lt;p&gt;Replication is important for reliability and performance. After all, I choose a search engine because it lets my users search for data faster. So replicating it close to users can lower latency, which in the case of my e-commerce stores, had a positive impact on revenue. Again, don’t go overboard here; subsecond responses might seem nice on paper but aren’t a requirement for all interactions on my website.&lt;/p&gt;
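&lt;p&gt;Within a single cluster, the replica count is a one-line index setting in Elasticsearch, so it’s easy to give popular indexes more copies than niche ones (the index name is hypothetical):&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;PUT /products/_settings
{
  "index": { "number_of_replicas": 2 }
}
&lt;/code&gt;&lt;/pre&gt;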

&lt;p&gt;Finally, I assess my backup requirements. Replicas can reduce the risk of needing a backup, but can’t eliminate it completely. I make sure I have a backup to restore past states if my data is destroyed but also keep it reasonable.&lt;/p&gt;

&lt;h2&gt;
  
  
  Automating Recurring Storage Estimations
&lt;/h2&gt;

&lt;p&gt;To get the most out of my search engine, I ensure my storage system can handle the required read-and-write operations efficiently. This means choosing the right storage type and provisioning it with the correct size. On the other hand, I don’t want to go overboard with my resources. Storage that isn’t used still costs money, so I want to provide as much as necessary, but no more. Usually, this requires a manual resource estimation process, but tools like &lt;a href="https://zesty.co/zesty-disk/"&gt;Zesty Disk&lt;/a&gt; automate this chore.&lt;/p&gt;

&lt;p&gt;Zesty Disk is a block storage auto-scaler that automatically expands and shrinks block storage. In fashion e-commerce stores, where products change each season, the indexes grow and shrink frequently, and with them, the storage requirements too. Zesty Disk will add volumes to my filesystem so I always have exactly what I need, plus a buffer for new data. And if I remove indexes or documents from the search engine, Zesty Disk will remove volumes, recouping the cost of capacity that’s no longer needed.&lt;/p&gt;

&lt;p&gt;This behavior perfectly aligns with the need to re-estimate resource requirements regularly. I might not know how much storage a search engine will need in the future, but when resizing disks in short time frames, I can be as close as possible to the optimal space needed by using Zesty Disk.&lt;/p&gt;

&lt;h2&gt;
  
  
  Summary
&lt;/h2&gt;

&lt;p&gt;Search engines like Solr and Elasticsearch let users query data in a flexible way, which is crucial in times of ever-growing mountains of information. But resource allocation becomes an issue that requires an operator to reevaluate requirements continuously. Each season, I had to check how our inventory changed to ensure the resources could handle it.&lt;/p&gt;

&lt;p&gt;Automatic scaling solutions made my life much easier. They monitor the load that resources currently handle and decide how to scale up and down without constant manual intervention by an operator. It’s even better if such a solution can leverage performance optimizations like burst capacity by provisioning smaller storage volumes. That way, I’m not only saving money but also ensuring performance is never lacking.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;This article was originally published on the &lt;a href="https://zesty.co/blogs/"&gt;Zesty Blog&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>elasticsearch</category>
      <category>solr</category>
      <category>machinelearning</category>
    </item>
    <item>
      <title>Optimizing Resource Utilization: the Benefits and Challenges of Bin Packing in Kubernetes</title>
      <dc:creator>Zesty_tech</dc:creator>
      <pubDate>Tue, 28 Nov 2023 08:39:41 +0000</pubDate>
      <link>https://dev.to/zesty_tech/optimizing-resource-utilization-the-benefits-and-challenges-of-bin-packing-in-kubernetes-1h3a</link>
      <guid>https://dev.to/zesty_tech/optimizing-resource-utilization-the-benefits-and-challenges-of-bin-packing-in-kubernetes-1h3a</guid>
      <description>&lt;p&gt;&lt;em&gt;By: Omer Hamerman, Principal DevOps Engineer at Zesty&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Key Takeaways
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Challenges in bin packing include balancing density versus workload isolation and distribution, as well as the risks of overpacking a node, which can lead to resource contention and performance degradation.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Kubernetes provides scheduling strategies such as resource requests and limits, pod affinity and anti-affinity rules, and pod topology spread constraints.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Examples of effective bin packing in Kubernetes include stateless applications, database instances, batch processing, and machine learning workloads, where resource utilization and performance can be optimized through strategic placement of containers.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Best practices for bin packing include careful planning and testing, right-sizing nodes and containers, and continuous monitoring and adjustment.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Implementing bin packing in Kubernetes can also have a positive environmental impact by reducing energy consumption and lowering greenhouse gas emissions.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Given Kubernetes' status as the de facto standard for container orchestration, organizations are continually seeking ways to &lt;a href="https://zesty.co/"&gt;optimize resource utilization&lt;/a&gt; in their clusters. One such technique is bin packing: the efficient allocation of resources within a cluster to minimize the number of nodes required for running a workload. Bin packing lets organizations save costs by reducing the number of nodes necessary to support their applications.&lt;/p&gt;

&lt;p&gt;The concept of bin packing in Kubernetes involves strategically placing containers, or "bins," within nodes to maximize resource utilization while minimizing wasted resources. When done effectively, bin packing can lead to more efficient use of hardware resources and lower infrastructure costs. This is particularly important in cloud environments where infra spend makes up a significant portion of IT expenses.&lt;/p&gt;

&lt;p&gt;In this article, we will explore the intricacies of bin packing in Kubernetes, discuss the challenges and trade-offs associated with this approach, and provide examples and best practices for implementing bin packing in your organization.&lt;/p&gt;

&lt;h2&gt;
  
  
  Challenges of Bin Packing in Kubernetes
&lt;/h2&gt;

&lt;p&gt;While bin packing in Kubernetes offers significant benefits in terms of resource utilization and cost savings, it also presents some challenges that need to be addressed.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Density vs. Workload Isolation and Distribution&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;One of the main issues when implementing bin packing is finding a balance between maximizing resource density and maintaining workload isolation while ensuring the distribution of workloads across systems and availability zones (AZs) for resilience against hardware failures. Packing containers tightly onto nodes can lead to better resource utilization, but it can also increase the risk of contention for shared resources, such as CPU and memory.&lt;/p&gt;

&lt;p&gt;This can result in performance degradation and potentially affect the stability of the entire cluster. Moreover, excessive bin packing can contradict the concept of distribution, presenting dangers to the system's ability to sustain hardware failures. Therefore, it is essential to apply bin packing strategies judiciously and only when the use case makes sense, taking into account both resource optimization and system resilience.&lt;/p&gt;

&lt;p&gt;To further understand the implications of this trade-off, it's worth considering the impact of increasing density on the fault tolerance of your cluster. When containers are packed tightly onto a smaller number of nodes, the failure of a single node can have a more significant impact on the overall health and availability of your applications. This raises the question: how can you strike a balance between cost savings and ensuring your workloads are resilient to potential failures?&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Risks of Over Centralizing Applications in the Node&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The risk of excessively bin-packing applications onto a node is that it works against the best practice of a distributed deployment. It’s the classic risk-management mistake of putting all your eggs in one basket: if the node dies, a bigger chunk of your deployment goes down with it. So, on the one hand, you want to be as distributed as possible for the sake of resiliency. On the other hand, you want to keep your costs under control, and bin packing is a good solution for this. The magic is in finding the sweet spot in this balance of considerations.&lt;/p&gt;

&lt;p&gt;These issues become more pronounced when multiple containers vie for the limited resources, like memory or CPU, available on a single node, resulting in resource starvation and suboptimal application performance. Additionally, scaling the system in bursts rather than gradually can cause unwanted failures, further exacerbating these challenges. To manage these inconsistencies, it helps to set resource limits as policy, ensuring a reliable supply of resources to applications.&lt;/p&gt;

&lt;p&gt;Another aspect to consider when overpacking a node is the potential effect on maintenance and updates. With more containers running on a single node, the impact of maintenance tasks or software updates is magnified, possibly leading to longer periods of downtime or reduced performance for your applications. A critical question to consider is how to manage updates and maintenance without negatively affecting workload performance when using bin packing.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Scheduling Strategies to Address the Challenges&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Kubernetes provides several scheduling strategies to help remediate issues related to bin packing:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/"&gt;Resource requests and limits&lt;/a&gt; let you configure the Kubernetes scheduler to consider the available resources on each node when making scheduling decisions. This enables you to place containers on nodes with the appropriate amount of resources.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="https://kubernetes.io/docs/concepts/scheduling-eviction/assign-pod-node/#affinity-and-anti-affinity"&gt;Pod affinity and anti-affinity rules&lt;/a&gt; allow you to specify which nodes a pod should or should not be placed on based on the presence of other pods. This can help ensure that workloads are spread evenly across the cluster or grouped together on certain nodes based on specific requirements. For example, data-critical systems, such as those handling essential customer data for production functionality, need to be distributed as much as possible to enhance reliability and performance. This approach can reduce the risk of single points of failure and promote better overall system resilience.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a href="https://kubernetes.io/docs/concepts/scheduling-eviction/topology-spread-constraints/"&gt;Pod topology spread constraints&lt;/a&gt; enable you to control how pods are distributed across nodes, considering factors such as zone or region. By using these, you can ensure that workloads are evenly distributed, minimizing the risk of overloading a single node and improving overall cluster resilience.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
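&lt;p&gt;All three strategies can appear in a single pod spec. A hedged sketch, with illustrative names and values rather than recommendations:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;apiVersion: v1
kind: Pod
metadata:
  name: web
  labels:
    app: web
spec:
  containers:
  - name: web
    image: nginx:1.25
    resources:
      requests:          # what the scheduler reserves
        cpu: 250m
        memory: 256Mi
      limits:            # hard ceiling at run time
        cpu: 500m
        memory: 512Mi
  affinity:
    podAntiAffinity:     # prefer not to co-locate replicas
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100
        podAffinityTerm:
          labelSelector:
            matchLabels:
              app: web
          topologyKey: kubernetes.io/hostname
  topologySpreadConstraints:   # keep replicas balanced across zones
  - maxSkew: 1
    topologyKey: topology.kubernetes.io/zone
    whenUnsatisfiable: ScheduleAnyway
    labelSelector:
      matchLabels:
        app: web
&lt;/code&gt;&lt;/pre&gt;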

&lt;p&gt;By carefully considering and implementing these scheduling strategies, you can effectively address the challenges of bin packing in Kubernetes while maintaining optimal resource utilization and performance.&lt;/p&gt;

&lt;h2&gt;
  
  
  Examples of Bin Packing in Kubernetes
&lt;/h2&gt;

&lt;p&gt;There are various examples of how Kubernetes can effectively implement bin packing for different types of workloads, from stateless web applications to database instances and beyond. We'll explore some of them below.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Stateless Applications&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Kubernetes can pack multiple instances of stateless applications into a single node while ensuring that each instance has sufficient resources. By using resource requests and limits, you can guide the Kubernetes scheduler to allocate the appropriate amount of CPU and memory for each instance. As long as the instances have enough resources, they stay up and running, ensuring high availability for stateless applications such as web or client-facing apps.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Database Instances&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;When dealing with databases, Kubernetes can effectively pack individual instances of different stateful applications into nodes to maximize throughput and minimize latency. By leveraging pod affinity rules, you can ensure that database instances are placed on nodes with the necessary volumes and proximity to other components, such as cache servers or application servers. This can help optimize resource usage while maintaining high performance and low latency for database operations.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Batch Processing and Machine Learning Workloads&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Bin packing can also be beneficial for batch processing and machine learning workloads. Kubernetes can use pod topology spread constraints to ensure these workloads are evenly distributed across nodes, preventing resource contention and maintaining optimal performance.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Large Clusters with Many Nodes&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;In cases where a service needs to be distributed across a large number of nodes (e.g., 2,000 nodes), resource optimization remains a priority. While spreading these services out is essential for fault tolerance, bin packing should still be considered for the remaining services to increase the utilization of all nodes.&lt;/p&gt;

&lt;p&gt;Kubernetes can manage this through topology spread configurations such as &lt;a href="https://kubernetes.io/docs/concepts/scheduling-eviction/topology-spread-constraints/"&gt;PodTopologySpreadArgs&lt;/a&gt;, which let cluster administrators set default spread constraints for the whole cluster. Cluster admins and cloud providers should ensure nodes are provisioned accordingly to balance the spread-out services and the bin-packed services.&lt;/p&gt;
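&lt;p&gt;Bin packing itself can also be encouraged cluster-wide by telling the scheduler’s NodeResourcesFit plugin to score nodes with the MostAllocated strategy. A sketch of such a scheduler configuration (the profile name is illustrative):&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;apiVersion: kubescheduler.config.k8s.io/v1
kind: KubeSchedulerConfiguration
profiles:
- schedulerName: bin-packing-scheduler
  pluginConfig:
  - name: NodeResourcesFit
    args:
      scoringStrategy:
        type: MostAllocated   # favor already-busy nodes
        resources:
        - name: cpu
          weight: 1
        - name: memory
          weight: 1
&lt;/code&gt;&lt;/pre&gt;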

&lt;p&gt;By understanding and applying these examples in your Kubernetes environment, you can leverage bin packing to optimize resource utilization and improve the overall efficiency of your cluster.&lt;/p&gt;

&lt;h2&gt;
  
  
  Cost Efficiency Benefits of Bin Packing in Kubernetes
&lt;/h2&gt;

&lt;p&gt;By efficiently allocating resources within a cluster and minimizing the number of nodes necessary to support workloads, bin packing can help reduce your infrastructure costs. This is achieved by consolidating multiple containers onto fewer nodes, which reduces the need for additional hardware or cloud-based resources. As a result, organizations can save on hardware, energy, and maintenance.&lt;/p&gt;

&lt;p&gt;In cloud environments, where infrastructure costs are a significant portion of IT expenses, the cost savings from bin packing can be particularly impactful. Cloud providers typically charge customers based on the number and size of nodes used, so optimizing resource utilization through bin packing can directly translate to reduced cloud infrastructure bills.&lt;/p&gt;

&lt;h2&gt;
  
  
  Best Practices for Bin Packing in Kubernetes
&lt;/h2&gt;

&lt;p&gt;To fully harness the benefits of bin packing in Kubernetes, it's essential to follow best practices to ensure optimal resource utilization while preventing performance problems. We highlight three below.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Careful Planning and Testing&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Before implementing bin packing in your Kubernetes environment, it's crucial to carefully plan and test the placement of containers within nodes. This may involve analyzing the resource requirements of your workloads, determining the appropriate level of density, and testing the performance and stability of your cluster under various scenarios. Additionally, setting hard limits for memory is essential, as memory is a non-compressible resource and should be allocated carefully to avoid affecting surrounding applications. It is also important to account for potential memory leaks, ensuring that one leak does not cause chaos within the entire system.&lt;/p&gt;

&lt;p&gt;By taking the time to plan and test, you can avoid potential pitfalls associated with bin packing, such as resource contention and performance degradation.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Right Sizing Nodes and Containers&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Properly sizing nodes and containers is a key aspect of optimizing resource utilization in your Kubernetes environment. To achieve this, first assess the resource requirements of your applications, taking into account CPU, memory, and storage demands. This information helps in determining the most suitable node sizes and container resource limits to minimize waste and maximize efficiency. Sizing matters because containers that are too large relative to the node leave no room for anything else: if you’re running a very large container that takes up 75% of every node, for example, it essentially forces 25% waste regardless of how many bin packing rules you set. The resources a container requests and the resources a machine offers are therefore critical factors to consider together when optimizing your Kubernetes environment.&lt;/p&gt;
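&lt;p&gt;The arithmetic behind this trade-off is easy to simulate. A minimal first-fit-decreasing sketch (the millicore values are made up) shows how request sizes determine the node count:&lt;/p&gt;

```python
def pack_first_fit_decreasing(requests, node_capacity):
    """Place each request into the first node with room,
    opening a new node when none fits."""
    nodes = []
    for size in sorted(requests, reverse=True):
        for node in nodes:
            if node_capacity - sum(node) >= size:
                node.append(size)
                break
        else:  # no existing node had room
            nodes.append([size])
    return nodes

# Hypothetical pod CPU requests (millicores) on 4000m nodes
pods = [3000, 1500, 1200, 800, 700, 500, 300]
nodes = pack_first_fit_decreasing(pods, 4000)
print(len(nodes))  # 3
print(nodes)       # [[3000, 800], [1500, 1200, 700, 500], [300]]
```

&lt;p&gt;Note how the single 3000m request immediately consumes 75% of a node on its own; only small leftovers can share it, while right-sized requests pack far more tightly.&lt;/p&gt;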

&lt;p&gt;&lt;strong&gt;Monitoring and Adjusting Over Time&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Continuous monitoring and adjustment are essential for maintaining optimal resource utilization in your Kubernetes clusters. As workloads and requirements evolve, you may need to reassess your bin packing strategy to ensure it remains effective.&lt;/p&gt;

&lt;p&gt;Regular monitoring can help you identify issues early on, such as resource contention or underutilized nodes, allowing you to make adjustments before a problem escalates.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Utilizing Kubernetes Features for Bin Packing&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Resource quotas allow you to limit the amount of resources a namespace can consume, ensuring that no single workload monopolizes the available resources in your cluster.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Resource requests and limits for your pods, already noted above, let you guide the Kubernetes scheduler to place containers on nodes with the appropriate amount of resources. This helps ensure workloads are allocated efficiently and resource contention is minimized.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
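&lt;p&gt;A namespace quota is a small manifest. A sketch with illustrative names and numbers:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;apiVersion: v1
kind: ResourceQuota
metadata:
  name: team-quota
  namespace: team-a
spec:
  hard:
    requests.cpu: "8"        # total CPU the namespace may request
    requests.memory: 16Gi
    limits.cpu: "12"
    limits.memory: 24Gi
&lt;/code&gt;&lt;/pre&gt;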

&lt;p&gt;One more aspect to consider is the environmental impact of your infrastructure. By optimizing resource utilization through bin packing, you can potentially reduce your organization's carbon footprint. Running fewer nodes means consuming less energy and generating less heat, which can contribute to lower greenhouse gas emissions and a smaller environmental impact. This raises an important question: How can businesses balance their goals for cost efficiency and performance with their social responsibility to reduce their environmental footprint?&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Bin packing in Kubernetes plays a crucial role in optimizing resource utilization and reducing infrastructure costs. But it's also important to achieve the right balance between efficiency and performance when optimizing Kubernetes resources.&lt;/p&gt;

&lt;p&gt;By strategically allocating resources within a cluster, organizations can minimize the number of nodes required to run workloads, ultimately resulting in lower spend and more efficient infrastructure management.&lt;/p&gt;

&lt;p&gt;However, as discussed, there are some performance-related challenges and trade-offs associated with bin packing, as well as best practices for effectively employing bin packing in your Kubernetes environment. By understanding and leveraging these techniques, you can maximize resource utilization in your cluster, save on infrastructure costs, and improve overall efficiency.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;This article was originally published on &lt;a href="https://www.infoq.com/articles/kubernetes-bin-packing/"&gt;InfoQ&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>kubernetes</category>
      <category>devops</category>
      <category>productivity</category>
    </item>
    <item>
      <title>Automatically Resize Amazon EBS Capacity with Zesty Disk for Cost Efficiency and Consistent Performance</title>
      <dc:creator>Zesty_tech</dc:creator>
      <pubDate>Mon, 27 Nov 2023 10:03:18 +0000</pubDate>
      <link>https://dev.to/zesty_tech/automatically-resize-amazon-ebs-capacity-with-zesty-disk-for-cost-efficiency-and-consistent-performance-377e</link>
      <guid>https://dev.to/zesty_tech/automatically-resize-amazon-ebs-capacity-with-zesty-disk-for-cost-efficiency-and-consistent-performance-377e</guid>
      <description>&lt;p&gt;&lt;em&gt;by Aviram Levy, Adinah Brown, Aaron Curtis, Anudeep Burugula, and Siva Sadhu | on 20 NOV 2023&lt;/em&gt; &lt;/p&gt;

&lt;p&gt;Cloud computing has helped organizations adapt their infrastructure to changing business needs. One of the key drivers for companies moving to the cloud is the ability to scale up and down infrastructure capacity as needed.&lt;/p&gt;

&lt;p&gt;By dynamically adjusting infrastructure capacity based on usage fluctuations, organizations gain both operational and cost benefits. Aligning resources with actual needs reduces the operational load required to ensure there’s sufficient capacity and mitigates the risk of application failure.&lt;/p&gt;

&lt;p&gt;There are also improvements to cost efficiency. Businesses no longer need to provision resources in excess of what they need. Instead, they pay only for the resources they require, minimizing waste and enhancing financial efficiency.&lt;/p&gt;

&lt;p&gt;In this post, we will share how &lt;a href="https://zesty.co/zesty-disk/"&gt;Zesty Disk&lt;/a&gt; helps achieve greater cost efficiency and consistent performance when managing Amazon Elastic Block Store (Amazon EBS) volumes. The cloud-native solution optimizes storage at run time without adding any complexity or being exposed to any data on the disk.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://zesty.co/"&gt;Zesty&lt;/a&gt; was built with the vision of helping Amazon Web Services (AWS) customers make their cloud infrastructure more dynamic and, with it, more adaptable to changing business needs. Zesty is an &lt;a href="https://partners.amazonaws.com/partners/0010L00001w0gJyQAI/Zesty"&gt;AWS Specialization Partner&lt;/a&gt; and &lt;a href="https://aws.amazon.com/marketplace/seller-profile?id=87c6edf2-bb6d-404e-9fa5-38143934f082"&gt;AWS Marketplace Seller&lt;/a&gt; with the Cloud Operations Competency.&lt;/p&gt;

&lt;h2&gt;Modifying EBS Volumes Using Elastic Volumes&lt;/h2&gt;

&lt;p&gt;Amazon EBS Elastic Volumes allow you to increase volume size, change volume type, or adjust the performance of your EBS volumes. If your instance supports Elastic Volumes, the changes can be done without detaching the volume or restarting the instance. You can continue to use your application while the change takes effect.&lt;/p&gt;

&lt;p&gt;Before modifying a volume that contains valuable data, it’s a best practice to &lt;a href="https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ebs-creating-snapshot.html"&gt;create a snapshot&lt;/a&gt; of the volume in case you need to roll back your changes. Once you have the snapshot of the volume, you can &lt;a href="https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/requesting-ebs-volume-modifications.html"&gt;request the volume modification&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;If the size of the volume was modified, you must &lt;a href="https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/recognize-expanded-volume-linux.html"&gt;extend the volume’s file system&lt;/a&gt; to take advantage of the increased storage capacity.&lt;/p&gt;
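&lt;p&gt;The snapshot, modify, and extend steps above can be sketched with the AWS CLI. The volume ID, target size, and device names below are placeholders, and the filesystem commands assume an ext4 partition; treat this as a sketch rather than a drop-in script:&lt;/p&gt;

```shell
# Sketch of the snapshot -> modify -> extend flow described above.
# Placeholder volume ID, size, and device; DRY_RUN=1 (the default)
# only prints each command instead of executing it.
DRY_RUN="${DRY_RUN:-1}"
run() { if [ "$DRY_RUN" = "1" ]; then echo "+ $*"; else "$@"; fi; }

VOLUME_ID="vol-0123456789abcdef0"   # placeholder
NEW_SIZE_GB=200                     # placeholder target size

# 1. Snapshot first, so the change can be rolled back if needed
run aws ec2 create-snapshot --volume-id "$VOLUME_ID" --description "pre-resize backup"

# 2. Request the Elastic Volumes modification (no detach or restart required)
run aws ec2 modify-volume --volume-id "$VOLUME_ID" --size "$NEW_SIZE_GB"

# 3. After the modification takes effect, extend the filesystem
#    (ext4 on a partition; XFS would use xfs_growfs instead)
run sudo growpart /dev/xvda 1
run sudo resize2fs /dev/xvda1
```

&lt;p&gt;With DRY_RUN unset to 0, the same script issues the real calls, but only after you substitute your own volume ID and device.&lt;/p&gt;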

&lt;p&gt;There are a few approaches to automate the volume resizing process using AWS Step Functions and AWS Systems Manager which require engineering effort to set up the cloud infrastructure, catalog the existing EBS drives, use monitoring agents on Amazon Elastic Compute Cloud (Amazon EC2) instances, and so on. This generates noticeable overhead on implementation and maintenance.&lt;/p&gt;

&lt;p&gt;Another key consideration is that you can only increase volume size. You can’t decrease the EBS volume size, but if a smaller volume is preferred you can create a smaller volume and then migrate your data to it using an application-level tool such as rsync.&lt;/p&gt;

&lt;h2&gt;How Zesty Disk Delivers Greater Cost Efficiency&lt;/h2&gt;

&lt;p&gt;With its unique proprietary algorithm, Zesty Disk makes it possible to automatically expand and shrink block storage without any risk to application stability. The block storage autoscaler shrinks and expands volumes at run time, effectively right-sizing storage, with value to be gained for application stability and cost reduction.&lt;/p&gt;

&lt;p&gt;The solution’s artificial intelligence (AI) algorithm responds to changing application demand, ensuring users pay only for the storage they need.&lt;/p&gt;

&lt;p&gt;Let’s dive deeper into some of the benefits Zesty Disk delivers for AWS customers using Amazon EBS.&lt;/p&gt;

&lt;p&gt;Zesty Disk brings about cost optimization by dynamically adjusting storage capacity based on data requirements. By continuously monitoring usage metrics and instance metadata, Zesty Disk effectively determines when to shrink or expand storage sizes to match the data volume. This flexible approach ensures resources are allocated optimally, minimizing unnecessary costs.&lt;/p&gt;

&lt;p&gt;When there is less data, Zesty Disk automatically shrinks the storage capacity to avoid overprovisioning. By decoupling large filesystem volumes into smaller volumes, it can remove unused or underutilized storage, freeing up resources and reducing costs. This proactive approach to right-sizing storage ensures efficient resource allocation without compromising application performance.&lt;/p&gt;

&lt;p&gt;Conversely, when there’s an increase in data Zesty Disk seamlessly expands the storage capacity. It can add smaller disks to accommodate the growing data volume, and this elasticity allows businesses to scale their storage infrastructure on-demand, avoiding overprovisioning and unnecessary expenses.&lt;/p&gt;

&lt;p&gt;By dynamically adjusting storage capacity based on data fluctuations, Zesty Disk delivers cost optimization by ensuring businesses only pay for the storage they require and use, saving valuable resources and enhancing overall financial performance.&lt;/p&gt;

&lt;h2&gt;Eliminating Downtime&lt;/h2&gt;

&lt;p&gt;By decoupling large filesystem volumes into smaller volumes, Zesty Disk can make real-time adjustments to support the application’s availability and ensure it doesn’t run into downtime due to insufficient disk space.&lt;/p&gt;

&lt;p&gt;When a change in storage capacity is required, Zesty Disk leverages autoscaling technology to dynamically resize the storage volume. The ML algorithm analyzes usage metrics and instance metadata and accurately predicts the storage needs, adjusting the volume accordingly. This ensures the application has the optimal storage capacity to perform efficiently.&lt;/p&gt;

&lt;p&gt;The process of adjusting the storage volume is executed smoothly. Zesty Disk serializes the filesystems on the disk, replacing a large volume with multiple smaller volumes. If additional storage is needed, Zesty Disk can add smaller disks seamlessly; when capacity needs to be reduced, it can evict a disk by redistributing its data to other volumes before removing it.&lt;/p&gt;

&lt;p&gt;By dynamically adjusting the storage volume without causing downtime, Zesty Disk provides uninterrupted application availability. Businesses can avoid costly disruptions and maintain continuous operations while efficiently managing their storage resources.&lt;/p&gt;

&lt;h2&gt;Improving IOPS Performance&lt;/h2&gt;

&lt;p&gt;The decoupling of the filesystem into multiple smaller volumes, each with its own dedicated allocation of input/output operations per second (IOPS), delivers a significant boost in performance and responsiveness, along with an enhanced user experience.&lt;/p&gt;

&lt;p&gt;Traditionally, a single large volume would have a limited number of IOPS allocated to it, but Zesty Disk breaks this limitation by splitting the volume into multiple smaller volumes. With each smaller volume having its own allocation of IOPS, the total IOPS capacity increases proportionately. This means the application can benefit from a higher level of concurrent I/O operations, resulting in improved performance and responsiveness.&lt;/p&gt;
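&lt;p&gt;A quick back-of-the-envelope illustration: gp3 volumes each come with a 3,000 IOPS baseline, so splitting one volume into several raises the aggregate baseline proportionally (actual throughput is still bounded by the instance’s own EBS performance limits):&lt;/p&gt;

```shell
# Illustrative arithmetic only: each gp3 volume starts with a 3,000 IOPS
# baseline, so four smaller volumes have four times the baseline of one
# (subject to the instance's own EBS performance limits).
BASELINE_IOPS=3000
NUM_VOLUMES=4
AGGREGATE_IOPS=$((BASELINE_IOPS * NUM_VOLUMES))
echo "1 x gp3 volume:  ${BASELINE_IOPS} baseline IOPS"
echo "${NUM_VOLUMES} x gp3 volumes: ${AGGREGATE_IOPS} baseline IOPS"
```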

&lt;p&gt;Moreover, this approach allows Zesty Disk to optimize throughput performance. By distributing the workload across multiple smaller volumes, the overall throughput capacity is increased, enabling faster data transfer rates and more efficient data processing.&lt;/p&gt;

&lt;p&gt;By decoupling large filesystem volumes into multiple smaller volumes with dedicated IOPS allocations, Zesty Disk harnesses the full potential of the underlying storage infrastructure, delivering improvements in both IOPS and throughput performance. This translates to a superior user experience, ensuring applications can operate at peak performance levels while efficiently utilizing available resources.&lt;/p&gt;

&lt;h2&gt;Zesty Disk Under the Hood&lt;/h2&gt;

&lt;p&gt;The solution works by creating a virtual disk composed of several small storage volumes. It utilizes native AWS block storage devices, allowing users to maintain their existing tools, procedures, and service level agreements (SLAs). Users retain ownership and exclusive access to their data.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--WO-lIgK1--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/nu8mcdysz2cz2n744hnn.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--WO-lIgK1--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/nu8mcdysz2cz2n744hnn.png" alt="Image description" width="800" height="676"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Figure 1 – Deployment architecture of Zesty Disk&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Step-by-step instructions to install Zesty Disk on an Amazon EC2 instance are available on the Zesty website.&lt;/p&gt;

&lt;p&gt;Zesty Disk continuously monitors usage metrics, including capacity, IOPS, and read/write throughput, as well as instance and disk metadata.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--fjj-7fk6--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/hnrcnl54qz8nvht6dmlm.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--fjj-7fk6--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/hnrcnl54qz8nvht6dmlm.png" alt="Image description" width="800" height="391"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Figure 2 – Dashboard showing changing disk capacity and average read/write IOPS&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;This data is securely sent to Zesty’s backend for processing. An AI model analyzes the metrics to generate a behavioral profile of the instance volume, predicting usage patterns and fluctuations.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--91J3hhQT--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/lx370av6rvkjnvqy9qmg.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--91J3hhQT--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/lx370av6rvkjnvqy9qmg.png" alt="Image description" width="800" height="408"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Figure 3 – Zesty Disk’s operation in the client’s cloud environment&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;When a capacity change is required, Zesty’s backend issues API commands to the cloud provider, triggering the appropriate action. An update request is then sent to the Zesty Disk handler on the instance to adjust the capacity accordingly.&lt;/p&gt;

&lt;p&gt;The filesystems on the disk are serialized, replacing a large volume with multiple smaller volumes. Disk eviction can occur by transferring data from a smaller volume to other disks before removing it.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--5PYvkqZ6--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/7rvfzlhhpd9lpiutt9y6.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--5PYvkqZ6--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/7rvfzlhhpd9lpiutt9y6.png" alt="Image description" width="800" height="351"&gt;&lt;/a&gt;&lt;br&gt;
&lt;em&gt;Figure 4 – Defragmentation of the disk enables increase/decrease in filesystem size&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;Customer Success Story: Securonix&lt;/h2&gt;

&lt;p&gt;As a security information and event management (SIEM) solution that detects advanced threats using innovative machine learning algorithms, Securonix collects massive amounts of data in real time.&lt;/p&gt;

&lt;p&gt;This high volume of data meant Securonix had to frequently allocate terabytes of disk storage per Amazon EC2 instance. Until the ingested data produced by its analytics engine consumed all of that capacity, Securonix was paying for block storage it wasn’t using.&lt;/p&gt;

&lt;p&gt;Securonix uses Amazon Elastic MapReduce (Amazon EMR) to run HBase (the Hadoop database) and Spark, and adopted Zesty Disk (ZD) to ensure storage persistence during periods of high data usage, when demand on the disks backing the database spikes.&lt;/p&gt;

&lt;p&gt;A custom Amazon Machine Image (AMI) was developed that uses the ZD filesystem (Amazon Linux 2 with ZD). Fragmenting the storage into small volumes is what enables the elasticity for the volume to grow as data is ingested and to shrink as data is removed when a snapshot is taken. Throughout this process, all native tools, procedures, and SLAs remain unchanged, and Securonix remained the owner of its data, with exclusive control over it.&lt;/p&gt;

&lt;p&gt;After running its EMR clusters with EBS volumes fully managed by Zesty Disk, Securonix was pleased with the seamless service availability even in the case of large data ingestion peaks.&lt;/p&gt;

&lt;p&gt;Operationally, Zesty Disk avoids the hassle of reallocating storage across instances and eliminates on-call developer tasks related to maintaining EBS. With a net capacity utilization of 40% on its previously provisioned storage, and with provisioned capacity now shrinking to match demand, Securonix saves 66% of its earlier EBS cost.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;"Prior to implementing Zesty Disk, we were seeing an average storage utilization of less than 35%. This was difficult to optimize due to the potential to run out of storage and cause a production outage,” says Derrick Harcey, Chief Architect at Securonix. “Once we implemented Zesty Disk, we are able to maintain more than 75% EBS storage utilization to significantly reduce storage costs. In addition to reducing storage costs, the performance of our EBS volumes has increased significantly due to the inherent parallelism introduced with virtual disks with multiple underlying EBS volumes."&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;Conclusion&lt;/h2&gt;

&lt;p&gt;Zesty Disk offers a unique solution to maximize the value derived from Amazon EBS. With the ability to decouple large filesystem volumes into smaller volumes, organizations are able to experience improved performance and cost efficiency, achieving more with their existing storage resources.&lt;/p&gt;

&lt;p&gt;The solution’s auto-scaling technology eliminates the need for manual adjustments, streamlining operations and reducing the cognitive load for developers. Overall, Zesty Disk provides a powerful tool for organizations to enhance their storage infrastructure, ensure cost-effectiveness and operational efficiency, and increase the value derived from their EBS investments.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;"Spearheading this initiative was made easy by the amazing team that we worked with at Securonix,” says Uri Naiman, Sales Engineering Team Lead at Zesty. “We were able to show significant cost savings, whilst ensuring there was no negative impact to either performance or to their application."&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Learn more about Zesty Disk and request a demo. You can also explore Zesty products in AWS Marketplace.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;This article was originally published on &lt;a href="https://aws.amazon.com/blogs/apn/automatically-resize-amazon-ebs-capacity-with-zesty-disk-for-cost-efficiency-and-consistent-performance/"&gt;AWS Partner Network (APN) Blog&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>aws</category>
      <category>cloud</category>
      <category>performance</category>
    </item>
    <item>
      <title>How to Shrink EBS Volumes</title>
      <dc:creator>Zesty_tech</dc:creator>
      <pubDate>Sun, 26 Nov 2023 11:52:28 +0000</pubDate>
      <link>https://dev.to/zesty_tech/how-to-shrink-ebs-volumes-27p0</link>
      <guid>https://dev.to/zesty_tech/how-to-shrink-ebs-volumes-27p0</guid>
      <description>&lt;p&gt;&lt;em&gt;This article was originally published on &lt;a href="https://www.thestack.technology/how-to-shrink-ebs-volumes/"&gt;The Stack&lt;br&gt;
&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;"Unfortunately, the size of an existing EBS volume cannot be decreased. Instead, it is possible to create a smaller volume and move the data using tools such as rsync, at the cost of pausing the system’s write operations during the migration..."&lt;/p&gt;

&lt;p&gt;Developers frequently need to run applications on EC2 instances, a common approach with legacy or functionally complex apps in particular. The instances use Amazon EBS (Elastic Block Store) as their permanent file systems. Since your applications will fail if your disks become full, some overprovisioning of EBS space is necessary.&lt;/p&gt;

&lt;p&gt;It often happens that an EBS volume needs to be adjusted. For example, I once had to add a 4 TB volume per production instance to collect advanced logs about alleged bugs, writes Uri Naiman, Sales Engineering Team Lead at Zesty. But after I finished debugging, I no longer needed the volumes.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://zesty.co/commitment-manager/"&gt;Running out of disk space is a serious risk&lt;/a&gt; that must be mitigated in order to prevent downtime and data loss. Determining the optimal size of the EBS volume can be extremely difficult. In some cases, I’ve needed to expand an EBS volume due to new application features requiring more disk space for debugging or for dealing with higher traffic, which can create more logs and temporal data than was initially anticipated.&lt;/p&gt;

&lt;p&gt;Another common scenario is your application or database storage requiring more space over time. The safe bet is to provision well in excess of what you’re actually going to use, to cover all possible causes of data peaks; however, this will increase your cloud bill substantially.&lt;/p&gt;

&lt;p&gt;When a software product is being developed, the focus is on delivery, while the exact requirements and costs are still unknown. In almost every company I’ve ever worked at, once the product was operational, we discovered large expenses in our cloud bill. This was often due to our EBS volume cost.&lt;/p&gt;

&lt;p&gt;In addition to the hefty cloud bill, developers are often forced to dedicate valuable hours of the day to managing volumes manually. But this manual, repetitive task is a perfect opportunity for AI and machine learning to take over.&lt;/p&gt;

&lt;p&gt;In this article, I’ll demonstrate how to shrink an EBS volume in order to lower your cloud bill. First, I’ll explain how to do this manually, a far more labor-intensive task than you might expect. Then, I’ll explain how this can be orchestrated automatically without downtime.&lt;/p&gt;

&lt;h2&gt;EBS Pricing Considerations&lt;/h2&gt;

&lt;p&gt;The cost of a single terabyte general-purpose disk (gp3) runs at least $80 per month. That means a dozen instances with such storage would set you back $1,000 per month. That’s not even taking into account any backups you’d need to make (priced at roughly 50% of gp3 storage cost). So this scenario is hardly pragmatic.&lt;/p&gt;
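&lt;p&gt;To make the arithmetic explicit, here is the calculation behind those figures, assuming gp3’s list price of $0.08 per GB-month (prices vary by region):&lt;/p&gt;

```shell
# Monthly gp3 cost, computed in cents to stay in integer arithmetic.
PRICE_CENTS_PER_GB=8           # $0.08 per GB-month
SIZE_GB=1024                   # one 1 TB volume
INSTANCES=12                   # "a dozen instances"

PER_VOLUME_CENTS=$((SIZE_GB * PRICE_CENTS_PER_GB))
TOTAL_CENTS=$((PER_VOLUME_CENTS * INSTANCES))
echo "One 1 TB gp3 volume: \$$((PER_VOLUME_CENTS / 100)) per month"
echo "A dozen such volumes: \$$((TOTAL_CENTS / 100)) per month"
```

&lt;p&gt;This reproduces the ballpark figures quoted above: a bit over $80 per volume and close to $1,000 for a dozen instances, before backups.&lt;/p&gt;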

&lt;p&gt;In order to save costs and be able to adjust continuously based on demand, you need to provide sufficient EBS storage without reserving too much. While in theory you could handle EBS sizes for just a small number of instances manually, the process is still tedious, requiring multiple steps. And once your system grows beyond a few instances, this simply becomes unmanageable without automation.&lt;/p&gt;

&lt;h2&gt;Expanding a Volume&lt;/h2&gt;

&lt;p&gt;Expanding an EBS volume in AWS involves submitting a request, where you can set the new size and tweak the performance parameters. But this approach has its limitations. For one, it requires a cooldown period of several hours between consecutive modifications. While a reboot isn’t usually necessary, there have been times when I’ve needed to restart the instance for the changes or performance adjustments to take effect. There are also a few cases where changes to a desired volume configuration may not even be possible.&lt;/p&gt;

&lt;p&gt;While the expansion of the volume can be automated, the customer must implement the required automation themselves, which can incur additional development and maintenance costs.&lt;/p&gt;
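&lt;p&gt;A minimal sketch of what such self-built automation might look like: request the resize, poll the modification state, then grow the filesystem. The volume ID and device are placeholders, and DRY_RUN=1 (the default) stubs out the AWS calls so the control flow can be read without an account:&lt;/p&gt;

```shell
# Request a resize, wait for the modification to leave the 'modifying'
# state, then extend the filesystem. IDs/devices are placeholders.
DRY_RUN="${DRY_RUN:-1}"
VOLUME_ID="vol-0123456789abcdef0"   # placeholder

modification_state() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "completed"                # stubbed response for the sketch
  else
    aws ec2 describe-volumes-modifications --volume-ids "$VOLUME_ID" \
      --query "VolumesModifications[0].ModificationState" --output text
  fi
}

[ "$DRY_RUN" = "1" ] || aws ec2 modify-volume --volume-id "$VOLUME_ID" --size 100

while [ "$(modification_state)" = "modifying" ]; do
  sleep 30                          # resize requests take a while to apply
done
echo "modification state: $(modification_state)"

[ "$DRY_RUN" = "1" ] || sudo resize2fs /dev/xvdf   # placeholder device
```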

&lt;h2&gt;Shrinking a Volume&lt;/h2&gt;

&lt;p&gt;When peaks in demand subside, shrinking EBS volumes to fit the reduced app demand would enable greater cost efficiency. Unfortunately, the size of an existing EBS volume cannot be decreased. Instead, it is possible to create a smaller volume and move the data using tools such as rsync, at the cost of pausing the system’s write operations during the migration.&lt;/p&gt;

&lt;p&gt;Let’s take a look at how it’s done. We’ll consider an instance with two EBS volumes: one (root) for the system and another mounted to /data, for the application data:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;$ df -h
Filesystem      Size  Used Avail Use% Mounted on
/dev/xvda1       16G  1.6G   15G  10% /
…
/dev/xvdb       7.8G   92K  7.4G   1% /data
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;The respective EBS volumes can be viewed using the AWS CLI command &lt;code&gt;aws ec2 describe-volumes&lt;/code&gt;, which prints out instance and volume IDs. Suppose we determined that /data was too big and wanted to replace it with a smaller volume.&lt;/p&gt;
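&lt;p&gt;If you only need the volume ID, attachment device, and size, the lookup can be narrowed with a --query filter. The snippet below uses a DRY_RUN guard (on by default) that echoes the command instead of calling AWS:&lt;/p&gt;

```shell
# Narrowed volume listing; DRY_RUN=1 (the default) echoes the command.
DRY_RUN="${DRY_RUN:-1}"
run() { if [ "$DRY_RUN" = "1" ]; then echo "+ $*"; else "$@"; fi; }

run aws ec2 describe-volumes \
  --query "Volumes[].{ID:VolumeId,Device:Attachments[0].Device,SizeGiB:Size}" \
  --output table
```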

&lt;p&gt;Using AWS CLI, we can create a new small volume of 5 GB:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;aws ec2 create-volume --availability-zone eu-west-1a --size 5 --volume-type gp3&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;The volume should be attached to the instance, after which it becomes visible in the system as /dev/xvdc:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;aws ec2 attach-volume --volume-id "" --instance-id "" --device "/dev/xvdc"&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Next, in order to continue, we need to log in to the instance and perform the following actions (here, we’re using commands that work on Amazon Linux).&lt;/p&gt;

&lt;p&gt;Create a filesystem on the new block device, for instance, of type ext3:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;sudo mkfs -t ext3 /dev/xvdc&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Create the directory /newdata and mount the new block device on it:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;sudo mkdir /newdata
sudo mount /dev/xvdc /newdata&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Update /etc/fstab so the new device is reattached on reboot (note that depending on your particular setup and filesystem of choice, this line might look different):&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;echo "UUID=$(lsblk -no UUID /dev/xvdc) /data ext3 defaults 0 0" | sudo tee -a /etc/fstab&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Now that both /data and /newdata are mounted, we can migrate the content using rsync.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;sudo rsync -aHAXxSP /data/ /newdata&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;By using these flags and specifying the appropriate source and destination paths (note the trailing slash on the source, so the contents of /data land directly in /newdata), the rsync command will synchronize the files and directories between the source and destination, preserving attributes, hard links, ACLs, and extended attributes.&lt;/p&gt;

&lt;p&gt;We must also ensure that no further writes to /data occur, to guarantee consistency. To achieve that, we’ll most likely have to shut down our application during the migration, leading to system downtime.&lt;/p&gt;
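&lt;p&gt;If a full shutdown is undesirable, one alternative, assuming nothing on the instance holds files open for writing on /data, is to remount the source read-only for the final sync. The DRY_RUN guard (on by default) prints the commands instead of executing them:&lt;/p&gt;

```shell
# Remount read-only, then run one last consistent sync pass.
# DRY_RUN=1 (the default) prints the commands instead of executing them.
DRY_RUN="${DRY_RUN:-1}"
run() { if [ "$DRY_RUN" = "1" ]; then echo "+ $*"; else "$@"; fi; }

# Fails if any process still has files open for writing on /data
run sudo mount -o remount,ro /data
# Final pass; nothing can change under us now
run sudo rsync -aHAXxSP /data/ /newdata
```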

&lt;p&gt;Next, unmount /data and erase its line in /etc/fstab. We also need to remount the /newdata filesystem as /data, adjust /etc/fstab accordingly, reboot the instance to verify the correct mount point, and conduct any necessary checks:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;$ df -h
Filesystem      Size  Used Avail Use% Mounted on
/dev/xvda1       16G  1.6G   15G  10% /
…
/dev/xvdc       4.9G  140K  4.6G   1% /data
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Migration is now complete, and we can destroy the decommissioned larger EBS volume:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;aws ec2 detach-volume --volume-id 
aws ec2 delete-volume --volume-id &lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;As I’ve demonstrated, shrinking poses many challenges. You need to deal with the AWS infrastructure and reliably migrate the data between volumes. Most likely, the application should be terminated or brought into a read-only mode to avoid inconsistent migration, which often leads to downtime. If there are any human or coding errors, there is a risk that data may be corrupted.&lt;/p&gt;

&lt;h2&gt;Automatically Scale EBS Volumes to Demand with Machine Learning&lt;/h2&gt;

&lt;p&gt;Machine learning-based technology that automatically resizes EBS volumes in response to demand can greatly reduce overspending. It minimizes costs and eliminates the need to continuously monitor or tweak capacity as demand fluctuates.&lt;/p&gt;

&lt;p&gt;Automated solutions for managing disk volumes work 24/7, so they can instantly respond to unexpected changes in demand; they save significantly more than manual management; and they don’t require any downtime when decreasing storage size. Plus, they provide the intangible value of alleviating the burden and stress of having insufficient capacity should a data peak suddenly occur.&lt;/p&gt;
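&lt;p&gt;As a toy illustration of the kind of rule such automation might apply (the 80%/30% thresholds here are invented for the example, not taken from any product):&lt;/p&gt;

```shell
# Toy decision rule: expand when a volume runs hot, shrink when it idles.
# The 80%/30% thresholds are invented for illustration.
decide() {
  used_pct=$1
  if [ "$used_pct" -ge 80 ]; then echo "expand"
  elif [ "$used_pct" -le 30 ]; then echo "shrink"
  else echo "keep"
  fi
}

for pct in 92 25 55; do
  echo "utilization ${pct}% -> $(decide "$pct")"
done
```

&lt;p&gt;A real system would of course feed this from utilization metrics over time rather than a single sample, and would add hysteresis to avoid flapping.&lt;/p&gt;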

&lt;p&gt;Not only can intelligent automated solutions free up valuable time for cloud operations teams, they also eliminate human error by making data-driven decisions based on disk utilization metrics, ensuring precise cost optimization.&lt;/p&gt;

&lt;p&gt;The FinOps industry has certainly embraced automated solutions to save time and money on compute, but storage seems to largely get ignored. Companies can, of course, continue to have their DevOps teams manually shrink EBS volumes. But why prefer that over automated management, which more consistently scales provisioned storage to actual consumption, ensuring both consistently sufficient capacity and excellent storage cost savings?&lt;/p&gt;

</description>
      <category>devops</category>
      <category>aws</category>
      <category>database</category>
    </item>
    <item>
      <title>Understanding and Leveraging Kubernetes Controllers</title>
      <dc:creator>Zesty_tech</dc:creator>
      <pubDate>Fri, 24 Nov 2023 08:48:18 +0000</pubDate>
      <link>https://dev.to/zesty_tech/understanding-and-leveraging-kubernetes-controllers-4ojf</link>
      <guid>https://dev.to/zesty_tech/understanding-and-leveraging-kubernetes-controllers-4ojf</guid>
      <description>&lt;p&gt;&lt;em&gt;This article was originally published on &lt;a href="https://cloudnativenow.com/features/understanding-and-leveraging-kubernetes-controllers/"&gt;Cloud Native Now&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;As more businesses shift toward microservices, Kubernetes is turning into a go-to tool for handling the nitty-gritty of today’s IT world, and its API plays a pivotal role. Think of the K8s API as the control hub, making the &lt;a href="https://zesty.co/"&gt;management of the Kubernetes cluster&lt;/a&gt; a breeze and letting users spell out how they want their apps and infrastructure to look and act.&lt;/p&gt;

&lt;p&gt;Kubernetes controllers are the unsung heroes, constantly working to ensure the system’s actual state matches the user’s needs. Here, the operator pattern is a real game changer, showing how flexible and user-focused Kubernetes can be.&lt;/p&gt;

&lt;p&gt;The operator pattern was created to meet the varied needs of businesses and devs and enables Kubernetes to do even more, including managing custom resources it wasn’t originally built for. This means Kubernetes can be customized to handle all sorts of systems, proving it’s ready to adapt to whatever the tech world throws at it. Kubernetes really stands out when you consider the cool features and extensions the operator pattern brings to the table, offering solid answers to today’s tech hurdles.&lt;/p&gt;

&lt;h2&gt;Kubernetes-Native Controllers&lt;/h2&gt;

&lt;p&gt;By operating on the principle of desired state management, Kubernetes allows users to dictate their system’s configuration through a centralized control plane, which serves as the decision-making and monitoring hub. At the heart of this control plane are the Kubernetes-native controllers, purpose-built to manage specific resources within the ecosystem. These controllers continuously monitor their respective resources and ensure the system’s current state aligns seamlessly with the user-defined desired state; controllers automatically make necessary adjustments to maintain this balance.&lt;/p&gt;

&lt;p&gt;For instance, consider the deployment controller. When you deploy an application in Kubernetes using a Deployment, this controller jumps into action. It ensures that the specified number of replicas of your application is maintained. If a pod crashes or becomes unresponsive, the deployment controller will recognize the discrepancy and initiate the creation of a new pod to maintain the desired state.&lt;/p&gt;

&lt;p&gt;Similarly, the ReplicaSet controller maintains the correct number of pod replicas. It’s closely related to the deployment controller but operates at a slightly lower level, focusing specifically on pod replicas without the additional features that deployments offer.&lt;/p&gt;

&lt;p&gt;These are just two examples, but Kubernetes boasts a plethora of native controllers, each tailored for specific tasks, like managing services, volumes or network policies. Together, they contribute to Kubernetes’ reliability and resilience, ensuring that your applications and infrastructure run smoothly and consistently.&lt;/p&gt;

&lt;h2&gt;Custom Controller Use Case: Tracking New Volumes&lt;/h2&gt;

&lt;p&gt;In my journey through the dynamic world of Kubernetes, I’ve found myself in situations where the built-in controllers couldn’t meet specific needs that cropped up. That’s when I realized the true power of custom controllers.&lt;/p&gt;

&lt;p&gt;I worked in a large-scale organization where we were constantly deploying and scaling storage volumes. It became evident that we needed an efficient system to keep track of these deployments. I imagined how great it would be if I could receive a Slack notification every time a new storage volume was deployed. In addition, the volumes would be annotated automatically for monitoring systems without human intervention. While Kubernetes doesn’t offer these features natively, I figured out that a custom controller could be the perfect solution to bridge this gap.&lt;/p&gt;

&lt;p&gt;So, I mapped out a workflow for the controller to handle this scenario, which looked something like this:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;A new storage volume gets deployed in the Kubernetes cluster.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The custom controller, which I designed to keep an eye on storage volumes, spots this new deployment.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The controller reacts to this by triggering a predefined action—in this case, shooting off a notification to a Slack channel and annotating volumes for monitoring.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;My team receives the Slack notification, giving us a heads-up about the new volume deployment.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Armed with this info, we can quickly gauge whether the new storage is vital for a particular application or if we need to make some tweaks.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
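&lt;p&gt;As an illustration only (the original controller is not shown in this article), the workflow above could be roughed out in shell with kubectl and a Slack incoming webhook. The webhook URL and PVC name are placeholders, and DRY_RUN=1 (the default) prints commands and simulates a single event:&lt;/p&gt;

```shell
# Shell approximation of the workflow: watch for new PVCs, notify Slack,
# annotate for monitoring. Webhook URL and PVC name are placeholders.
DRY_RUN="${DRY_RUN:-1}"
run() { if [ "$DRY_RUN" = "1" ]; then echo "+ $*"; else "$@"; fi; }

SLACK_WEBHOOK="https://hooks.slack.com/services/T000/B000/XXXX"  # placeholder

notify_and_annotate() {
  pvc=$1
  # Step 3 of the workflow: Slack notification plus a monitoring annotation
  run curl -s -X POST -d "{\"text\":\"New volume deployed: $pvc\"}" "$SLACK_WEBHOOK"
  run kubectl annotate pvc "$pvc" monitoring/tracked=true --overwrite
}

if [ "$DRY_RUN" = "1" ]; then
  notify_and_annotate demo-pvc      # simulated event
else
  # Steps 1-2: watch for newly created PersistentVolumeClaims
  kubectl get pvc --watch-only -o name |
  while read -r name; do
    notify_and_annotate "${name#persistentvolumeclaim/}"
  done
fi
```

&lt;p&gt;A production controller would instead run inside the cluster with informers and proper reconciliation, which is exactly what the kubebuilder tooling discussed below scaffolds.&lt;/p&gt;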

&lt;p&gt;This hands-on experience highlighted the versatility and adaptability of custom controllers. I was able to tailor Kubernetes to my specific needs, ensuring seamless integration with the other tools and platforms I relied on while maintaining the system in its desired state. It turned out to be a practical solution and helped to streamline operations and keep everything running smoothly.&lt;/p&gt;

&lt;h2&gt;Kubernetes-Native Way: kubebuilder&lt;/h2&gt;

&lt;p&gt;Kubernetes boasts a rich ecosystem that not only allows for the creation of custom controllers but also offers tools to facilitate this process. A standout tool in this realm is kubebuilder, a scaffolding framework designed to construct Kubernetes APIs and controllers. This tool greatly simplifies the task of integrating custom resources and logic into Kubernetes.&lt;/p&gt;

&lt;p&gt;The preference for kubebuilder over custom scripts stems from several of its advantages: &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;It provides a structured project layout, streamlining the development and maintenance of controllers and custom resources. &lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;It autogenerates much of the repetitive code essential for setting up controllers and APIs.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;It integrates seamlessly with Kustomize for configuration customization and is backed by thorough documentation to guide developers through its functionalities.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
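&lt;p&gt;As a rough sketch (domain, repo, group, and kind names here are hypothetical, and kubebuilder plus a Go toolchain must be installed), scaffolding a controller project takes only a few commands:&lt;/p&gt;

```shell
# Scaffold a new project and a custom resource with its controller.
kubebuilder init --domain example.com --repo example.com/volume-watcher
kubebuilder create api --group storage --version v1 --kind VolumeWatch \
  --resource --controller

# Regenerate the CRD manifests after editing the API types.
make manifests
```

&lt;p&gt;From there, the reconcile logic goes into the generated controller file, and the rest of the boilerplate is handled for you.&lt;/p&gt;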

&lt;h2&gt;
  
  
  Walkthrough: Creating a Controller With kubebuilder
&lt;/h2&gt;

&lt;p&gt;To get going on creating a controller with kubebuilder, I highly recommend The Kubebuilder Book. It provides a comprehensive walkthrough of creating a controller, covering all the steps and components. Even for the relative experts out there, this guide is worth looking into to further sharpen your skills.&lt;/p&gt;

&lt;h2&gt;
  
  
  Advantages of Controllers
&lt;/h2&gt;

&lt;p&gt;Kubernetes controllers, both inherent and custom-made, serve as the foundational pillars of the Kubernetes ecosystem. Acting as silent custodians, they ensure the cluster’s current state consistently mirrors the user’s desired specifications. These controllers offer a multitude of benefits:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;High availability:&lt;/strong&gt; Controllers are integral to Kubernetes’ promise of high availability. For instance, in the context of tracking newly created volumes, having a controller that notifies the team immediately ensures that any issues can be addressed promptly, maintaining the high availability of the volumes. This continuous reconciliation keeps applications robust against failures.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Versatility:&lt;/strong&gt; Controllers in Kubernetes are designed to cater to diverse needs. Leveraging them in tracking volume creations showcases their versatility in adapting to different operational needs, including batch jobs, stateful services or daemon processes. This allows Kubernetes to manage varied workloads effectively.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Appropriate permissions:&lt;/strong&gt; By prioritizing security, controllers operate on a least-privilege principle. They possess only the essential permissions needed for their tasks, reducing potential security threats and limiting the impact of any compromised component.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Resource optimization:&lt;/strong&gt; Beyond state maintenance, controllers emphasize efficiency. In the scenario of tracking volumes, a controller aids resource optimization by providing real-time updates, facilitating immediate actions to optimize resources based on the current state and supporting cost efficiency.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Extensibility:&lt;/strong&gt; Kubernetes’ flexibility is evident in its support for custom controllers, allowing users to address unique needs beyond the capabilities of native controllers. This adaptability ensures Kubernetes stays relevant to changing business needs. In the volume-tracking example, a custom controller extended Kubernetes to integrate seamlessly with tools like Slack, enhancing operational efficiency and responsiveness.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Controllers aren’t just a component of Kubernetes; they’re its lifeblood, ensuring that applications remain available, resilient and efficient. They are essentially the Ops engineers’ method of introducing automation to K8s in an elegant and resilient way and extending their capabilities. In most scenarios, controllers prove to be the optimal way to interact with clusters, outshining scripts and manual interventions. Controllers’ automated, continuous monitoring and action loops mean the system can stay in its desired state without constant human oversight.&lt;/p&gt;

&lt;p&gt;It’s also worth delving deeper into how controllers aid in extending the system. The term “operator” refers to a set of controllers packaged with custom resource definitions (CRDs), which define new, application-specific resource types. I’ve touched upon this concept briefly in this article. Still, it’s fundamental to understand that operators allow for creating custom, application-specific controllers, thereby enhancing the extensibility of Kubernetes.&lt;/p&gt;

&lt;p&gt;So, as you navigate your Kubernetes journey, remember the pivotal role of controllers and consider crafting your own. With the extensibility features of Kubernetes—especially through the use of operators—you stand to gain even more from the cloud-native infrastructure and ecosystem.&lt;/p&gt;

</description>
      <category>kubernetes</category>
      <category>beginners</category>
      <category>devops</category>
    </item>
    <item>
      <title>How to Master Databases to Get More from Your Kubernetes Clusters</title>
      <dc:creator>Zesty_tech</dc:creator>
      <pubDate>Thu, 13 Apr 2023 21:35:43 +0000</pubDate>
      <link>https://dev.to/zesty_tech/how-to-master-databases-to-get-more-from-your-kubernetes-clusters-31p4</link>
      <guid>https://dev.to/zesty_tech/how-to-master-databases-to-get-more-from-your-kubernetes-clusters-31p4</guid>
      <description>&lt;p&gt;Kubernetes is the de facto container orchestration system to run scalable and reliable microservices. Due to its modern, cloud-native approach, Kubernetes is one of the key technologies that brought about significant improvements in software development. Because of this, more and more applications have migrated to Kubernetes, including but not limited to databases. &lt;/p&gt;

&lt;p&gt;In a traditional database, you store, manage, and serve the data required for your applications, customers, and analyses. Users, frontends, and other applications can connect to the database to query data. With the cloud-native approach, applications have become smaller and more numerous, leading to changes in how databases are used. &lt;/p&gt;

&lt;p&gt;This blog will focus on running databases in Kubernetes and discuss the related advantages, limitations, and best practices. &lt;/p&gt;

&lt;h2&gt;
  
  
  Running Databases on Kubernetes
&lt;/h2&gt;

&lt;p&gt;Kubernetes and containers were initially intended for stateless applications where no data is generated and saved for the future. With the technological development in data center infrastructure and storage, Kubernetes and containers are now also used for stateful applications such as message queues and databases. &lt;/p&gt;

&lt;p&gt;There are two mainstream options for running databases on Kubernetes: StatefulSets and sidecar containers.&lt;/p&gt;

&lt;h3&gt;
  
  
  StatefulSets
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://kubernetes.io/docs/concepts/workloads/controllers/statefulset/"&gt;StatefulSets&lt;/a&gt; are the native Kubernetes resources to manage stateful applications. They manage &lt;a href="https://kubernetes.io/docs/concepts/workloads/pods/"&gt;pods&lt;/a&gt; by assigning persistent identities for rescheduling and storage assignments, ensuring that pods always get the same unique ID and volume attachment when scheduled to another node. This sticky characteristic makes it possible to run databases on Kubernetes reliably and with the ability to scale.&lt;/p&gt;

&lt;p&gt;Still, although Kubernetes does its best to run databases, deploying and scaling them is not straightforward. When you check the additional resources to deploy a &lt;a href="https://github.com/bitnami/charts/tree/main/bitnami/mysql/templates"&gt;MySQL Helm chart&lt;/a&gt;, you will see configmaps, services, roles, role bindings, secrets, network policies, and service accounts. Management of these Kubernetes resources with updates, backups, and restores can quickly become overwhelming. &lt;/p&gt;

&lt;p&gt;There are three vital resources for deploying databases on Kubernetes:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://kubernetes.io/docs/concepts/configuration/configmap/"&gt;ConfigMaps&lt;/a&gt; store the application configuration; you can use them as files, environment variables, or command-line arguments in pods. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://kubernetes.io/docs/concepts/configuration/secret/"&gt;Secrets&lt;/a&gt; are the Kubernetes resources for storing sensitive data like passwords, tokens, or keys. These do have one critical drawback: They are stored unencrypted in the underlying data store (etc). Check out this &lt;a href="https://kubernetes.io/docs/concepts/configuration/secret/#information-security-for-secrets"&gt;link&lt;/a&gt; to learn how to secure secrets in Kubernetes installations, such as using an external secrets manager. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://kubernetes.io/docs/concepts/services-networking/service/"&gt;Kubernetes Service&lt;/a&gt; resources allow other applications running in the Kubernetes cluster to connect and use the database instances. &lt;/p&gt;

&lt;p&gt;Luckily, you do not need to create all these resources by yourself. There are widely used Helm charts to deploy popular database installations—including MySQL, PostgreSQL, or MariaDB—as StatefulSets with a wide range of configuration options. &lt;/p&gt;

&lt;h3&gt;
  
  
  Sidecar Containers
&lt;/h3&gt;

&lt;p&gt;Here, Kubernetes pods encapsulate multiple containers and run them together as a single unit. This encapsulation and microservice architecture support creating small applications that focus on doing one thing—and doing it well. This makes the sidecar pattern a popular approach: the main container holds the business logic, while additional sidecar containers perform other tasks, such as log collection, metric publishing, or data caching. &lt;/p&gt;

&lt;p&gt;Running sidecars next to the main application comes with a huge benefit: low (even close to zero) latency. On the other hand, since each application instance gets its own sidecar rather than a centralized database, data consistency between instances is weak. This is why cache databases such as &lt;a href="https://redis.io/"&gt;Redis&lt;/a&gt; are a natural fit for sidecar containers: because Redis is an in-memory cache, you can add a Redis container to each instance of your application and deploy it to Kubernetes.&lt;/p&gt;
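&lt;p&gt;A minimal sketch of the pattern (pod name, labels, and the app image are hypothetical): the application and its Redis cache share a pod, so the cache is reachable over localhost:&lt;/p&gt;

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: web-with-cache
spec:
  containers:
    - name: app
      image: example.com/web-app:1.0     # hypothetical application image
      env:
        - name: REDIS_URL
          value: redis://localhost:6379  # containers in a pod share a network namespace
    - name: redis
      image: redis:7-alpine
      ports:
        - containerPort: 6379
```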

&lt;h2&gt;
  
  
  Reliability in Chaos
&lt;/h2&gt;

&lt;p&gt;It is possible to run almost every kind of application on Kubernetes, including all major databases. However, you need to check some critical database characteristics against the high volatility—i.e., chaos—in Kubernetes clusters. In other words, it is common to see nodes go down, pods being rescheduled, and networks being fragmented. You need to ensure that the database you deploy on Kubernetes will resist these events and have the following characteristics.&lt;/p&gt;

&lt;h3&gt;
  
  
  Failover and Replication
&lt;/h3&gt;

&lt;p&gt;It is common in Kubernetes to see some nodes being disconnected. If you have database instances running on these nodes, you and other database instances will lose access to them. Therefore, the database should support failover elections, data sharding, and replication to overcome any risks of data loss.&lt;/p&gt;

&lt;h3&gt;
  
  
  Caching
&lt;/h3&gt;

&lt;p&gt;Kubernetes is designed to run a high number of small applications; this means that databases with more caching and small data layers are more appropriate to run on the clusters. A well-known example is the native approach of &lt;a href="https://www.elastic.co/"&gt;Elasticsearch&lt;/a&gt; with its sharding of indices across instances in the cluster.&lt;/p&gt;

&lt;h3&gt;
  
  
  Operators
&lt;/h3&gt;

&lt;p&gt;Kubernetes and its API are designed to run with minimal human intervention. Because of this, it is beneficial to use a database with a &lt;a href="https://kubernetes.io/docs/concepts/extend-kubernetes/operator/"&gt;Kubernetes Operator&lt;/a&gt; to handle configurations, the creation of new databases, scaling instances up or down, backups, and restores.&lt;/p&gt;

&lt;h2&gt;
  
  
  Managed Database vs. Database on Kubernetes
&lt;/h2&gt;

&lt;p&gt;Sharing the same infrastructure and Kubernetes cluster for stateless applications and databases is tempting, but there are some factors you need to take into account before choosing this route. We’ve already discussed the Kubernetes-friendly characteristics your database should have, but other considerations matter as well. &lt;/p&gt;

&lt;p&gt;If you've already ensured that the database has Kubernetes-friendly features, you’ll need to next consider the database workload and consequences. &lt;/p&gt;

&lt;p&gt;If you expect high resource usage, running on a Kubernetes cluster could be pricey compared to a managed service. On the other hand, running on Kubernetes could be a better option if you need a real-time and latency-critical database. &lt;/p&gt;

&lt;p&gt;Finally, you need to consider the operational requirements and your team structure. Databases have their own lifecycles, with patches, maintenance windows, and backups. Patches are inevitable and especially essential for security, as you need them to comply with security policies and certifications. Even if you use package managers or operators, you still need to manually track down the next version that includes the required security patches.&lt;/p&gt;

&lt;p&gt;In a Kubernetes installation, you’re responsible for handling and managing these operations, whereas in a managed service, the cloud provider performs these functions for you. Using a managed database service could also be beneficial for your applications if you have critical performance indicators or SLAs. &lt;/p&gt;

&lt;p&gt;In short, you need to consider your requirements, budget, and operational capabilities before diving into a database installation on Kubernetes. &lt;/p&gt;

&lt;h2&gt;
  
  
  Best Practices for Running Databases on Kubernetes
&lt;/h2&gt;

&lt;p&gt;If you decide that Kubernetes is the best place to run your database, there are some best practices you should follow:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Scalability:&lt;/strong&gt; You should consider the horizontal and vertical scalability of your database in detail. It is suggested to use StatefulSets for horizontal scalability and Kubernetes features such as &lt;a href="https://cloud.google.com/kubernetes-engine/docs/concepts/verticalpodautoscaler"&gt;vertical Pod autoscalers&lt;/a&gt; to adjust CPU and memory allocations.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Operations:&lt;/strong&gt; Automation is vital for a successful Kubernetes installation, and it is suggested to deploy Kubernetes Operators first and then create database instances as &lt;a href="https://kubernetes.io/docs/concepts/extend-kubernetes/api-extension/custom-resources/"&gt;Kubernetes custom resources&lt;/a&gt;. However, if your databases are constantly built and destroyed, custom resources are not the best option since they can create residual Kubernetes resources in the cluster.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;GitOps and configuration as code:&lt;/strong&gt; Store every change in the source repository, and, following GitOps principles, deploy automatically when developers make changes.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Monitoring, visibility, and alerts:&lt;/strong&gt; Ensure that you collect metrics and logs and create alerts based on usage, user access, and database health. &lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Troubleshooting with tools and playbooks:&lt;/strong&gt; Failovers are unavoidable, and you need to be prepared to take action with predefined playbooks, tools, and helper scripts. &lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Kubernetes is the latest game-changer in cloud-native software development to run stateless applications and databases. However, where you deploy your database depends on your requirements, budget, and operational capabilities. &lt;/p&gt;

&lt;p&gt;If you don’t have solid use cases, it may not be worth it to jump on the database-on-Kubernetes wagon simply due to the hype or for a single-line Helm installation. &lt;/p&gt;

&lt;p&gt;If you do opt for Kubernetes, there is no silver-bullet approach. However, with the best practices and database characteristics discussed above, you should be able to successfully design your database deployments running on Kubernetes clusters. &lt;/p&gt;

&lt;p&gt;&lt;em&gt;This article was originally published on the &lt;a href="https://zesty.co/blog/deploy-databases-kubernetes/"&gt;Zesty Blog&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>kubernetes</category>
      <category>database</category>
      <category>devops</category>
    </item>
    <item>
      <title>EBS vs. EFS: Which Storage System Is Right For You?</title>
      <dc:creator>Zesty_tech</dc:creator>
      <pubDate>Fri, 07 Apr 2023 16:23:47 +0000</pubDate>
      <link>https://dev.to/zesty_tech/ebs-vs-efs-which-storage-system-is-right-for-you-2c77</link>
      <guid>https://dev.to/zesty_tech/ebs-vs-efs-which-storage-system-is-right-for-you-2c77</guid>
      <description>&lt;p&gt;&lt;a href="https://zesty.co/blog/ebs-vs-efs-which-is-right/"&gt;This post was originally on the Zesty blog&lt;br&gt;
&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;EBS vs. EFS–which makes the most sense for your business? Unfortunately, there’s no one-size-fits-all approach.&lt;/p&gt;

&lt;p&gt;Choosing the correct storage solution for your AWS workloads can sometimes be quite confusing–and this is especially true when you try to balance efficiency, performance, flexibility and costs for constantly changing applications. There are many services available with different storage types and feature sets, so it’s easy to get overwhelmed when you’re in the comparison stage of your &lt;a href="https://zesty.co/blog/early-cloud-cost-optimization/"&gt;cloud optimization journey&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;In this article, we’ll compare two major storage services: Amazon Elastic Block Store (Amazon EBS) and Amazon Elastic File System (Amazon EFS). Both of these services offer great solutions if your application needs access to data via a filesystem.&lt;/p&gt;

&lt;h2&gt;
  
  
  Amazon EBS
&lt;/h2&gt;

&lt;p&gt;Amazon Elastic Block Store (EBS) is a highly performant block storage service that creates standalone virtual hard drives in the cloud and attaches those volumes to Amazon Elastic Compute Cloud (EC2) virtual machines.&lt;/p&gt;

&lt;p&gt;AWS customers have been using EBS since its early days for almost all types of demanding workloads like databases, applications, email, file storage, backup, or websites. EBS volumes are easy to create and configure and can be scaled to deliver extremely high IO performance. These volumes are also highly available and durable. Although EBS volumes are not replicated across multiple Availability Zones, they are copied to multiple servers in the same AZ, thus offering 99.99% availability and up to 99.999% durability. Users can also encrypt EBS volumes for data security at rest.&lt;/p&gt;

&lt;h2&gt;
  
  
  Amazon Elastic File System (EFS)
&lt;/h2&gt;

&lt;p&gt;Amazon Elastic File System (EFS) is a &lt;a href="https://aws.amazon.com/efs/"&gt;managed Network File System (NFS)&lt;/a&gt; designed for Linux-based EC2 instances, selected AWS managed services, and on-premise servers. There’s a similar storage system for Windows hosts called the &lt;a href="https://aws.amazon.com/fsx/windows/"&gt;Amazon FSx for Windows File Server&lt;/a&gt;. FSx uses the Server Message Block (SMB) protocol while EFS uses NFS.&lt;/p&gt;

&lt;h2&gt;
  
  
  Comparing EBS vs. EFS Systems
&lt;/h2&gt;

&lt;p&gt;While EFS is a managed elastic file system designed for use across different machines and availability zones, EBS is designed as a fast and reliable block storage volume for single machines (although EBS Multi-Attach is an exception that applies only in very specialized scenarios).&lt;/p&gt;

&lt;p&gt;There are other differences between the two storage systems which we’ll specify below.&lt;/p&gt;

&lt;h3&gt;
  
  
  Data Access
&lt;/h3&gt;

&lt;p&gt;Like a physical hard drive, an EBS volume can be attached to a single EC2 instance (except for multi-attach use cases). The EC2 instance needs to be in the same availability zone as the EBS volume. Files in an EBS volume are accessible by filesystems like ext3, ext4, or xfs.&lt;/p&gt;

&lt;p&gt;EFS filesystems, on the other hand, can be mounted on multiple machines from any availability zone or even from on-premise servers. Thousands of machines can connect to the same EFS folder. File system access is via the NFS protocol.&lt;/p&gt;

&lt;p&gt;EBS volumes can be attached to both Windows and non-Windows EC2 machines, whereas EFS volumes are designed for Linux-based hosts only.&lt;/p&gt;
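&lt;p&gt;To make the two access models concrete, here’s a sketch of each workflow (all resource IDs, regions, and paths are hypothetical, and the AWS CLI must be configured). An EBS volume is attached to one instance and formatted like a local disk, while an EFS filesystem is simply mounted over NFS:&lt;/p&gt;

```shell
# Attach an EBS volume to a single instance in the same AZ.
aws ec2 attach-volume --volume-id vol-0123456789abcdef0 \
  --instance-id i-0123456789abcdef0 --device /dev/sdf

# On the instance: create a filesystem (first use only) and mount it.
sudo mkfs -t ext4 /dev/sdf
sudo mount /dev/sdf /mnt/data

# An EFS filesystem, by contrast, is mounted over NFS from any AZ:
sudo mount -t nfs4 -o nfsvers=4.1 \
  fs-0123456789abcdef0.efs.us-east-1.amazonaws.com:/ /mnt/efs
```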

&lt;h3&gt;
  
  
  Storage Size
&lt;/h3&gt;

&lt;p&gt;While the maximum size of an EBS volume can be up to 16 TB, EFS volume sizes are practically unlimited. The maximum size of a file in EFS is 47.9 TB.&lt;/p&gt;

&lt;h3&gt;
  
  
  Availability and Scalability
&lt;/h3&gt;

&lt;p&gt;As mentioned earlier, EBS volumes are not replicated across multiple Availability Zones, but they are copied to multiple servers in the same AZ, offering 99.99% availability and up to 99.999% durability.&lt;/p&gt;

&lt;p&gt;Like EBS, EFS also offers high durability. However, the main difference lies in scalability. EFS volumes can scale up quickly and automatically to meet abrupt spikes in workload demand and scale down with a decreased load. This makes EFS more flexible and better at handling dynamic workloads than EBS.&lt;/p&gt;

&lt;p&gt;This scalability also means EFS volumes don’t need to be pre-provisioned with a specific size for an anticipated load, which ultimately saves costs. Similar to EBS, you can also specify a provisioned throughput for EFS volumes.&lt;/p&gt;

&lt;h3&gt;
  
  
  Backup and Encryption
&lt;/h3&gt;

&lt;p&gt;Backups and encryption-at-rest are available for both systems.&lt;/p&gt;

&lt;p&gt;EFS also offers &lt;a href="https://docs.aws.amazon.com/efs/latest/ug/lifecycle-management-efs.html"&gt;lifecycle management&lt;/a&gt;, a price-saving feature similar to S3 lifecycle management. EFS lifecycle management enables the automatic and transparent transfer of infrequently accessed data to a separate storage class.&lt;/p&gt;

&lt;h3&gt;
  
  
  Performance
&lt;/h3&gt;

&lt;p&gt;You can configure EBS volumes to minimize disk latency. You can do this by choosing different types of storage (SSD, HDD, etc.), specifying provisioned IOPS, and selecting EBS-optimized EC2 instances.&lt;/p&gt;
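&lt;p&gt;For example (all values hypothetical), the volume type and provisioned IOPS are specified when the volume is created:&lt;/p&gt;

```shell
# Create a 100 GiB io2 volume with 4,000 provisioned IOPS.
aws ec2 create-volume --availability-zone us-east-1a \
  --size 100 --volume-type io2 --iops 4000
```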

&lt;p&gt;EFS, on the other hand, isn’t as configurable as EBS. Although the baseline performance is fast enough for most workloads, it can’t match the low per-operation disk latency of EBS. At the same time, EFS – being a distributed file storage system – can handle a much higher total throughput per second compared to EBS.&lt;/p&gt;

&lt;p&gt;Check &lt;a href="https://docs.aws.amazon.com/efs/latest/ug/performance.html"&gt;AWS documentation&lt;/a&gt; for more details on EFS performance, and download our EBS e-book to learn how to adjust EBS performance.&lt;/p&gt;

&lt;h3&gt;
  
  
  Cost
&lt;/h3&gt;

&lt;p&gt;Costs will increase in both EBS and EFS with increasing provisioned performance. However, as a rule of thumb, EBS will be less expensive than EFS for the same performance per GB.&lt;/p&gt;

&lt;p&gt;That said, mounting an EFS volume to multiple EC2 instances costs the same as mounting it to a single instance. In comparison, creating and attaching EBS volumes for every node may quickly add up on the bill.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;So where should you use one and not the other? Here’s a checklist.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;If you need to access data from different machines or from different availability zones, EFS is probably your best option.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;EFS volumes are best suited for enterprise-wide file servers, backup systems, Big Data clusters, Massively Parallel Processing (MPP) systems, Content Distribution Networks (CDN), and other such large use cases.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Systems requiring a lot of throughput can also benefit from EFS.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;If you need very low-latency disk operations, EBS is probably the best choice. EBS volumes are best suited for relational and NoSQL databases, enterprise applications like ERP systems, mail servers, SharePoint, web servers, directory servers, DNS servers, or middleware. That’s because these systems typically don’t run on large clusters and therefore don’t need a commonly mounted volume; replication between servers is done at the application level, not at the disk level. The performance requirements of these workloads can also be met by existing EBS volume types.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;We hope you enjoyed this overview of two of AWS’s most popular storage systems, EBS and EFS. Whichever storage system you choose, we wish you an efficient and smooth cloud experience that enables you to scale as quickly and cost-effectively as possible.&lt;/p&gt;

&lt;p&gt;Refer to this &lt;a href="https://zesty.co/blog/zesty-disk-ebs-management/"&gt;blog post&lt;/a&gt; to learn more about &lt;a href="https://zesty.co/products/zesty-disk/"&gt;Zesty Disk&lt;/a&gt;, our solution for making EBS disks more dynamic and flexible than ever. &lt;/p&gt;

</description>
      <category>aws</category>
      <category>storage</category>
      <category>cloud</category>
      <category>devops</category>
    </item>
  </channel>
</rss>
