
DataStax Developers

Testing Cloud Native Storage with Kubernetes and Cassandra on AWS

Response times for applications impact revenue and customer satisfaction, and are therefore mission critical. Whether your application is user-facing, performing computational analysis, or providing integration between services, no one wants to wait any longer than necessary.

Thousands of applications rely on Apache Cassandra to store and retrieve their data, and more enterprises are turning to Kubernetes for cloud native application management. But Kubernetes provides only basic storage primitives, with no built-in data management, so we at DataStax partnered with Arrikto to test whether a cloud native storage solution could improve performance, cost, and operational agility for Cassandra users.

We tested two scenarios:

  • Cloud-managed attached disk. A 3-node Cassandra cluster running on a 4-node Kubernetes cluster with AWS-managed Elastic Block Store (EBS) volumes.
  • Cloud native local storage. A 3-node Cassandra cluster running on a 4-node Kubernetes cluster with locally attached ephemeral NVMe disks and Arrikto Rok.

The result was a 15x faster response time and a 22% savings in cost per transaction when using Cassandra with Arrikto Rok on AWS.

Testing Architecture and Configuration

We ran the open source NoSQL benchmark tool, nosqlbench, and compared the two scenarios on AWS. The same architecture would apply to any other cloud provider or to on-premises deployments. In the case of AWS, the comparison took place between:

Common cloud-managed attached disk

A 3-node Cassandra cluster running on a 4-node Kubernetes cluster with AWS-managed Elastic Block Store (EBS) volumes. The fourth Kubernetes node enables a “recover from failure” scenario, where we terminated a Kubernetes node in AZ1 and recovered it in a different zone (AZ2), a more realistic example of desired customer architectures.

The Kubernetes nodes hosting the Cassandra pods run on AWS instances with AWS EBS volumes attached for cluster data storage. With EBS, all I/O requests go over the AWS backend storage network.
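As a rough sketch, this setup maps to an EBS-backed StorageClass referenced by the Cassandra StatefulSet’s volumeClaimTemplate. The class name, volume type, and size below are illustrative assumptions, not the exact settings from our test:

```yaml
# Illustrative EBS-backed StorageClass (names and parameters assumed).
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: cassandra-ebs
provisioner: ebs.csi.aws.com        # AWS EBS CSI driver
parameters:
  type: gp3
# Delay provisioning until the pod is scheduled, so the EBS volume
# is created in the same AZ as the node running the pod.
volumeBindingMode: WaitForFirstConsumer
```

The Cassandra StatefulSet would then request this class in its volumeClaimTemplates, for example a ReadWriteOnce claim with storageClassName: cassandra-ebs.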

Cassandra was scheduled to run with only a single pod per Kubernetes node, with a spare Kubernetes instance to handle recovery operations.
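One pod per Kubernetes node is typically enforced with pod anti-affinity. A minimal excerpt from the StatefulSet’s pod template, assuming the pods carry a hypothetical app: cassandra label:

```yaml
# Pod template excerpt (label name assumed): never co-schedule two
# Cassandra pods on the same Kubernetes node.
affinity:
  podAntiAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
    - labelSelector:
        matchLabels:
          app: cassandra
      topologyKey: kubernetes.io/hostname
```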

In this example, all nodes were in the same Availability Zone (AZ1), because an EBS volume can only be attached within its original AZ.

Figure 1: Cassandra on Kubernetes with EBS Architecture

When you need to recover a failed Cassandra node, you are reliant on the speed of EBS operations. EBS detach and reattach operations are notoriously slow and unreliable. Importantly, you can only detach and reattach an EBS volume within the same Availability Zone (AZ). This introduces significant latency to recovery operations, as well as operational overhead to ensure the disk is actually moved correctly.

Cloud native local storage

A 3-node Cassandra cluster running on a 4-node Kubernetes cluster with locally attached ephemeral NVMe disks and Arrikto Rok. The fourth Kubernetes node enables a “recover from failure” scenario, where we terminated a Kubernetes node in AZ1 and recovered it in a different zone (AZ2), a more realistic example of desired customer architectures.

In this architecture, Arrikto Rok is deployed via the Rok operator on each Kubernetes node and connects to AWS S3 to store snapshots of the local volumes. Cassandra consumes the locally attached ephemeral NVMe volumes, so all I/O requests stay local to the Kubernetes node and never traverse the AWS backend storage network. Data protection is provided automatically by snapshotting the local volumes to S3.
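From Cassandra’s point of view, adopting Rok is mostly a matter of pointing the volumeClaimTemplate at a Rok-backed StorageClass instead of an EBS-backed one. A minimal sketch, where the provisioner string is an assumption drawn from Arrikto’s published examples rather than from our test configuration:

```yaml
# Illustrative Rok-backed StorageClass; the provisioner name is an
# assumption, so check the class your Rok operator actually installs.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: rok
provisioner: rok.arrikto.com
volumeBindingMode: WaitForFirstConsumer
```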

Figure 2: Cassandra on Kubernetes with Rok Architecture

When you need to deploy a new Cassandra node, or recover from a failure, Arrikto Rok queries the versioned, immutable snapshots stored in AWS S3 and performs a fast pull to restore the data to local disk. This also lets you restore to any Availability Zone; you are not limited to the original AZ.
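Conceptually this follows the standard Kubernetes CSI snapshot-restore pattern: a new PersistentVolumeClaim is hydrated from a snapshot data source and can be bound wherever the replacement pod lands. A generic sketch with hypothetical names (Rok automates this flow, and its own object names may differ):

```yaml
# Generic CSI-style restore (hypothetical names): a new PVC whose
# data source is an existing VolumeSnapshot.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: cassandra-data-restored
spec:
  storageClassName: rok
  accessModes: ["ReadWriteOnce"]
  resources:
    requests:
      storage: 500Gi
  dataSource:
    apiGroup: snapshot.storage.k8s.io
    kind: VolumeSnapshot
    name: cassandra-data-snap   # hypothetical snapshot name
```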

Summary of Results

Using the same DataStax benchmark for both architectures, we found significant performance differences between the commonly used EBS model and the cloud native local storage approach.
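For context, a benchmark of this shape can be launched in-cluster as a Kubernetes Job running the public nosqlbench image. The service name, workload, and cycle counts below are illustrative assumptions, not the exact parameters of our runs:

```yaml
# Hypothetical nosqlbench run as a Kubernetes Job. The Cassandra
# service name, workload, and cycle counts are assumptions.
apiVersion: batch/v1
kind: Job
metadata:
  name: nosqlbench-run
spec:
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: nosqlbench
        image: nosqlbench/nosqlbench:latest
        args:
        - cql-keyvalue                  # bundled key-value workload
        - hosts=cassandra.default.svc   # in-cluster contact point
        - rampup-cycles=1000000
        - main-cycles=10000000
        - threads=auto
```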

Overall, using Arrikto Rok we found performance improvements across the board in Operations per Second, Read Latency, and Write Latency.

Operations per second improved across the range, starting at 10% faster and peaking at 55x faster than EBS before the test failed to complete.

Latency improvements across the read-heavy, write-heavy, and balanced read/write workloads were also significant. Read-heavy latency was 26x better with Arrikto Rok than with AWS EBS, an improvement of over 96%.

Write-intensive latency also improved by 52%, a 2x speedup.

Looking at the straight EC2 cost difference between NVMe-backed instances and EBS-backed instances, you can save approximately 15% on your AWS bill while also seeing massive performance increases. Consolidating into fewer, larger instances raised the savings to as much as 40%.

Drilling down further, the cost per transaction ($/ops) also dropped by 22% for write-intensive workloads.

Read the detailed results with graphs and charts that compare performance, cost, and operational overhead.

Conclusion

Cassandra users benefit from the integration of Arrikto Rok, which enables them to:

  • completely eliminate AWS EBS,
  • deliver high levels of availability across multiple Availability Zones,
  • reduce staff time wasted managing cloud disks,
  • deploy smaller clusters with the same performance at lower cost, and
  • slash software bills for Kubernetes.

It’s a true cloud native approach to containerized storage and data management.

Learn more about Apache Cassandra, DataStax Enterprise, and Arrikto.
