CAST AI

Posted on Oct 29, 2021 • Edited on Nov 4, 2021 • Originally published at cast.ai

CAST AI vs. GKE Autopilot: Where to manage Kubernetes on GKE?

#kubernetes #googlecloud #devops #cloud

Bonus content: Detailed simulation of cluster costs with CAST AI vs. GKE Autopilot

Running Kubernetes is a complex task, but luckily teams using Google Cloud Platform can choose from a few solutions that make the job easier.

Let’s take a closer look at two of them - CAST AI and the GKE Autopilot mode - to see which one is a better fit for efficient teams looking to automate their Kubernetes workloads and cut cloud costs.

TL;DR

CAST AI is a great serverless option for Kubernetes. It simplifies Kubernetes for both convenience and cost, and it reduces cloud expenses by at least 50%.

Autopilot is also a great serverless option for Kubernetes, but it simplifies Kubernetes for convenience only. It's not free and comes with a cost increase.

CAST AI - full-scale GKE automation and cost optimization

CAST AI is a cloud-native platform that automatically analyzes, monitors, and optimizes Kubernetes environments. Companies across e-commerce and adtech use CAST AI to cut their cloud bills by 50% to as much as 90%.

GKE Autopilot - cluster automation for a hands-off GKE experience

GKE Autopilot is one of the two modes of operation GKE offers to its users. In Autopilot, the provider both provisions and manages the cluster's underlying infrastructure to optimize the clusters running in GKE.

CAST AI vs. Google Autopilot - quick feature comparison

Detailed feature comparison of Google Autopilot and CAST AI

Observability, logging, and cost visibility
Automated cost optimization
Preemptible and Spot VMs automation
Full multi cloud optimization
Security
Pricing

1. Observability, logging, and cost visibility

Cost visibility

CAST AI divides cloud costs into project, cluster, namespace, and deployment levels. You can track expenses down to individual microservices before calculating the total cost of your cluster.

The solution uses industry-standard metrics that work with any cloud provider, not only Google Cloud Platform.

Cost allocation is also an option in CAST AI - it’s done per cluster and per node. The team plans to add features like control plane, network, egress, storage, and other cost dimensions soon.

GKE Autopilot comes with pre-configured use of Cloud Operations for GKE monitoring dashboards. Users can customize the system and workload logging to get the right metrics.

Multi cloud metrics

Many teams use more than one cloud platform today, so multi-cloud support is essential for visibility and optimization.

CAST AI comes with a range of multi-cloud capabilities. It works with any cloud service provider and provides cross-cloud visibility thanks to universal metrics from Grafana and Kibana.

GKE Autopilot only displays metrics for clusters running in GKE. If you also use Amazon Elastic Kubernetes Service (EKS) or Azure Kubernetes Service (AKS), you won’t be able to compare the metrics from all your clusters in one place.

2. Automated cost optimization

Automated instance selection for best cost/performance

CAST AI selects the most cost-effective instance types and sizes to meet your application's requirements while reducing cloud spending. When a cluster requires more nodes, the automation engine selects the instances with the highest performance at the lowest cost. Engineers don’t need to do anything here because everything is automated.

Since using the same instance shape for every node in a cluster can lead to overprovisioning, CAST AI also allows you to create multi-shape clusters. It gives the application the optimal mix of several instance types.

GKE Autopilot uses E2 standard and E2 shared-core machine types, which fail to offer an optimal balance between cost and performance. These instances are overcommitted and shared-core, so they suit only latency-insensitive workloads. GKE Autopilot also doesn’t allow you to use GPU instances or multi-shape clusters.

Horizontal and vertical pod autoscaling

CAST AI automates pod scaling parameters to help companies reduce cloud waste. The Horizontal Pod Autoscaler determines the correct number of pod instances based on business KPIs. If no work needs to be done, the platform scales the pod replica count all the way down to 0.

CAST AI also ensures that the number of nodes in use is always adequate for the application's needs, scaling nodes up and down dynamically.

GKE Autopilot automatically scales cluster resources based on the user’s pod specifications, but users need to configure Horizontal and Vertical Pod Autoscalers on their own. You can implement Horizontal pod autoscaling to automatically increase or decrease the number of pods via the standard Kubernetes CPU or memory metrics, or with custom metrics in Cloud Monitoring.
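To make the horizontal scaling step more concrete, here is a minimal sketch of the replica calculation documented for the standard Kubernetes Horizontal Pod Autoscaler. The function name and the sample numbers are illustrative assumptions, not something taken from GKE or CAST AI, and the real controller applies further rules (stabilization windows, min/max replicas) that are left out here.

```python
import math


def desired_replicas(current_replicas: int,
                     current_metric: float,
                     target_metric: float,
                     tolerance: float = 0.1) -> int:
    """Sketch of the core HPA formula:
    desired = ceil(current_replicas * current_metric / target_metric).

    The 0.1 tolerance mirrors the documented Kubernetes default; within
    that band the replica count is left unchanged.
    """
    ratio = current_metric / target_metric
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas  # close enough to target, do nothing
    return math.ceil(current_replicas * ratio)


if __name__ == "__main__":
    # Hypothetical workload: 4 pods at 90% average CPU, target 60%.
    print(desired_replicas(4, 90.0, 60.0))
    # Same workload idling at 30% average CPU.
    print(desired_replicas(4, 30.0, 60.0))
```

In Autopilot you still have to create the HorizontalPodAutoscaler object and choose the target metric yourself; the calculation above is what then drives the replica count.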

Note that GKE Autopilot is not a suitable solution for very small pods, since the minimum is 0.25 CPU and 0.5 GB RAM per pod. Tailoring exact requirements with the VPA is challenging as well, because CPU is allocated in increments of 0.25 - you can't assign 0.66 CPUs to a pod because only values like 0.5 or 0.75 CPUs are allowed.
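The effect of the 0.25 CPU increment can be sketched in a few lines. The minimum and the step come from the paragraph above; the assumption that a request is rounded up to the next allowed value (rather than rejected) is ours, and the function names are hypothetical.

```python
import math

MIN_CPU = 0.25   # minimum CPU per pod, as stated in the article
CPU_STEP = 0.25  # allowed CPU increment, as stated in the article


def allowed_cpu(requested_cpu: float) -> float:
    """Round a CPU request up to the next allowed value (assumed behavior)."""
    # round() guards against float noise such as 0.75 / 0.25 = 3.0000000001
    steps = math.ceil(round(requested_cpu / CPU_STEP, 9))
    return max(MIN_CPU, steps * CPU_STEP)


def padding_share(requested_cpu: float) -> float:
    """Fraction of the allowed CPU that the pod did not actually ask for."""
    billed = allowed_cpu(requested_cpu)
    return (billed - requested_cpu) / billed


if __name__ == "__main__":
    for cpu in (0.1, 0.5, 0.66):
        print(f"requested {cpu} CPU -> allowed {allowed_cpu(cpu)} CPU, "
              f"padding {padding_share(cpu):.0%}")
```

Under these assumptions a pod that needs 0.66 CPU lands on 0.75 CPU, and a pod that needs only 0.1 CPU still occupies the 0.25 CPU minimum.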

3. Preemptible and Spot VMs automation

Compared to pay-as-you-go VM instances, Preemptible VM instances provide considerable cost savings - up to 91%! But Google Cloud Platform can reclaim these instances at any time, so teams wishing to take advantage of Preemptible VMs need to automate how they replace them.

In CAST AI, the replacement of interrupted Spot VMs is fully automated. Teams no longer need to worry about the capacity of their application running out. The platform continuously looks for the best instance alternatives and spins up new instances in milliseconds to provide high availability.

GKE Autopilot doesn’t support Preemptible VMs at the moment.

4. Full multi cloud optimization

As we enter the era of multi cloud, it's more important than ever to monitor, manage, and optimize cloud costs across providers.

CAST AI provides a number of multi cloud capabilities to meet this need:

Active-Active Multi Cloud - the solution distributes apps and replicates data over many cloud services to ensure that even if one fails, the applications continue to operate, ensuring business continuity.
Traffic distribution - CAST AI distributes traffic among all cloud services in use and always picks endpoints that are up and healthy for global server load balancing.
Metrics across clouds - thanks to data from Grafana and Kibana, the platform provides cost allocation insights across cloud services.

GKE Autopilot doesn’t offer multi cloud support at the moment.

5. Security

GKE Autopilot comes with a number of security limitations:

The VMs in Autopilot don’t run in the customer tenancy, so you don't get security visibility at the level of the host operating system. In CAST AI, VMs run inside a customer tenancy or VPC and customers retain full access to the VM host (including security visibility).

This point is often overlooked. It’s uncomfortable that – as a DevOps engineer – you cannot control who else you may share a node with.

Just a few years ago, sharing the same Intel processors across several VMs resulted in memory leak issues. The Meltdown and Spectre vulnerabilities allowed a rogue process to read memory it wasn’t authorized to access.

GKE Autopilot clusters don’t support the Kubernetes PodSecurityPolicy. In GKE versions older than 1.21, OPA Gatekeeper and Policy Controller aren’t supported either.

GKE Autopilot doesn’t support security features such as Binary Authorization, Kubernetes Alpha APIs, or legacy authentication options.

CAST AI comes with a comprehensive set of security features such as encryption at rest/in transit, secrets management, network security, logging, visibility, and more. It provides automatic patching and upgrades to VMs and Kubernetes, so setups are always kept up to date and the chance of errors in your clusters is eliminated.

6. Pricing

To check for potential savings, users can run the free CAST AI Cluster Analyzer. The read-only agent evaluates their infrastructure and makes specific recommendations free of charge. Users can then implement these recommendations manually or turn automatic cost optimization features on, choosing between two options (both with a free trial): Growth and Enterprise. Cost reductions of at least 50% are guaranteed using CAST AI.

GKE Autopilot clusters come at a flat fee of $0.10/h per cluster for every cluster after the free tier, plus the cost of the CPU, memory, and ephemeral storage provisioned for the pods. At that rate, the control plane costs about $72 per month in both Autopilot and standard GKE. But compared to standard GKE, the CPU and RAM costs in Autopilot are roughly double.

For example, an e2-standard-2 machine costs $0.075462 per hour. With Autopilot, the same instance will cost $0.1445536 (calculated for the Northern Virginia region).
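The arithmetic behind these figures can be checked with a short sketch. The prices are the ones quoted above; the 720-hour month (30 days) is our assumption, and the function names are illustrative.

```python
CLUSTER_FEE_PER_HOUR = 0.10      # USD, flat per-cluster fee quoted above
HOURS_PER_MONTH = 24 * 30        # assumption: a 30-day month

STANDARD_E2_STANDARD_2 = 0.075462    # USD/hour on standard GKE (from the article)
AUTOPILOT_E2_STANDARD_2 = 0.1445536  # USD/hour equivalent on Autopilot (from the article)


def monthly_cluster_fee() -> float:
    """Flat per-cluster fee for one month."""
    return CLUSTER_FEE_PER_HOUR * HOURS_PER_MONTH


def autopilot_markup() -> float:
    """How many times more the same capacity costs on Autopilot."""
    return AUTOPILOT_E2_STANDARD_2 / STANDARD_E2_STANDARD_2


if __name__ == "__main__":
    print(f"Cluster fee per month: ${monthly_cluster_fee():.2f}")
    print(f"Autopilot vs. standard GKE: {autopilot_markup():.2f}x")
```

With these inputs the markup comes out at a little over 1.9x, which is where the "double" figure comes from.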

Pricing simulation

Here's a pricing simulation that explains the difference between the optimization results from applying CAST AI and GKE Autopilot.

Let's start with a look at GKE Autopilot pricing:

In this scenario, we have a manually optimized cluster that wastes only 25% of resources. We pay about $20k for the cluster, but the actual pod requests amount to about $15k. By switching to GKE Autopilot, the cluster costs rise to almost $30k, because the requested resources are billed at roughly double the rate. If you run the free analysis at CAST AI and implement its recommendations manually, you can slash the cost of your GKE cluster by 50%.
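Here is a rough sketch of the numbers in this scenario. It assumes that Autopilot bills only the requested resources, at the markup implied by the e2-standard-2 prices quoted in the Pricing section, and it ignores the flat cluster fee; the 50% reduction is the figure claimed in the article, not something the code derives.

```python
CLUSTER_COST = 20_000.0  # USD, current cluster bill in the scenario
WASTE_SHARE = 0.25       # share of paid-for resources not requested by pods

# Markup derived from the e2-standard-2 prices quoted in the article.
AUTOPILOT_MARKUP = 0.1445536 / 0.075462
CAST_AI_REDUCTION = 0.50  # reduction claimed in the article


def requested_cost(cluster_cost: float, waste_share: float) -> float:
    """Cost of the resources the pods actually request."""
    return cluster_cost * (1.0 - waste_share)


def autopilot_cost(cluster_cost: float, waste_share: float) -> float:
    """Assumed Autopilot bill: requested resources at the marked-up rate."""
    return requested_cost(cluster_cost, waste_share) * AUTOPILOT_MARKUP


def cast_ai_cost(cluster_cost: float) -> float:
    """Bill after applying the claimed 50% reduction."""
    return cluster_cost * (1.0 - CAST_AI_REDUCTION)


if __name__ == "__main__":
    print(f"Pod requests:  ${requested_cost(CLUSTER_COST, WASTE_SHARE):,.0f}")
    print(f"GKE Autopilot: ${autopilot_cost(CLUSTER_COST, WASTE_SHARE):,.0f}")
    print(f"CAST AI:       ${cast_ai_cost(CLUSTER_COST):,.0f}")
```

Raising WASTE_SHARE shows why Autopilot starts to pay off for badly overprovisioned clusters: once waste exceeds roughly half of the bill, paying about double for only the requested resources becomes cheaper than the original cluster.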

What if you're dealing with a much larger waste volume? Using GKE Autopilot helps to reduce the costs significantly. But CAST AI brings even greater savings, as visualized in this example:

Overall winner: CAST AI

Both GKE Autopilot and CAST AI are great solutions for automating many important features of workloads running on GKE.

While GKE Autopilot offers several helpful automation features, it comes with many limitations. CAST AI provides teams with a rich array of automation features and gives them customization opportunities for more flexibility. By picking the best VMs - including the heavily discounted Preemptible VMs - CAST AI guarantees cloud cost savings of at least 50%.

Combined with unique multi cloud functionality and cloud-native architecture, this positions CAST AI as the top cloud cost optimization platform.

P.S. If you'd like to start with something more hands-on, run the free CAST AI Cost Analyzer to check how much you could save and how to get there.
