Daya Shankar
Hosted control plane: when it simplifies operations and when it adds complexity

A hosted control plane moves Kubernetes control-plane components off your worker fleet, either into a provider-managed boundary (EKS) or onto a separate hosting cluster as pods (HyperShift).

It simplifies ops when you want predictable upgrades, less per-cluster snowflake work, and cleaner separation between “management” and “workloads.” 

It adds complexity when control-plane connectivity, IAM, and shared blast radius become your new failure modes, especially with private clusters.

Define hosted control plane in concrete terms

If you can’t say where the API server and etcd live, you can’t model risk.

“Hosted control plane” is a placement decision.

EKS: hosted by AWS in an EKS-managed VPC

AWS owns the masters; you own nodes and workloads.

AWS documents that the EKS-managed control plane runs inside an AWS-managed VPC and includes Kubernetes API server nodes and an etcd cluster. API server nodes run in an Auto Scaling group across at least two AZs; etcd nodes span three AZs. 

What that means operationally:

  • You don’t patch control-plane instances.
  • You don’t rebuild etcd.
  • You do still own access, RBAC, node lifecycle, and add-ons.

kubeadm on EC2: not hosted, you host it

You run the masters, the etcd, the upgrades, and the recovery drills.

Kubeadm HA requires you to pick a topology (stacked etcd vs external etcd) and wire up the endpoints (often via a load balancer DNS name). External etcd needs explicit endpoint configuration; stacked etcd is “managed automatically” by kubeadm’s topology. 

What that means operationally:

  • You patch and upgrade the control plane.
  • You own etcd snapshots and restore tests.
  • You own certificates and rotation edge cases.
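
The topology choice shows up directly in the kubeadm config. A minimal sketch of an external-etcd `ClusterConfiguration` (the DNS name, IPs, and file paths are placeholders; stacked etcd would simply omit the `etcd.external` block):

```yaml
apiVersion: kubeadm.k8s.io/v1beta3
kind: ClusterConfiguration
kubernetesVersion: v1.29.0
# DNS name of the load balancer fronting all API servers
controlPlaneEndpoint: "k8s-api.example.internal:6443"
etcd:
  external:
    endpoints:                       # explicit endpoints, one per etcd member
      - https://10.0.1.10:2379
      - https://10.0.2.10:2379
      - https://10.0.3.10:2379
    caFile: /etc/kubernetes/pki/etcd/ca.crt
    certFile: /etc/kubernetes/pki/apiserver-etcd-client.crt
    keyFile: /etc/kubernetes/pki/apiserver-etcd-client.key
```

Every line of this file is now an artifact you own: if an etcd member moves, you update and redistribute it.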

HyperShift (hosted control planes): control planes as pods on a hosting cluster

You consolidate many control planes onto one management cluster.

Red Hat’s hosted control planes model runs control planes as pods on a management/hosting cluster, without dedicated VMs per control plane. 

HyperShift then introduces a new question: where do those control plane pods land? Docs show “shared everything” by default, and you can dedicate nodes for control plane workloads via labels/taints. 
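
Dedicating hosting-cluster nodes is ordinary Kubernetes scheduling: label and taint the nodes, and let control-plane pods tolerate the taint. A generic sketch of such a node (the label and taint keys here are illustrative; check your HyperShift version's documentation for the exact keys it expects):

```yaml
apiVersion: v1
kind: Node
metadata:
  name: worker-cp-1
  labels:
    hypershift.openshift.io/control-plane: "true"   # illustrative key
spec:
  taints:
    - key: hypershift.openshift.io/control-plane    # keeps ordinary workloads off
      value: "true"
      effect: NoSchedule
```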

Side-by-side: what gets simpler, what gets harder

Feature lists lie. Ownership and failure modes don’t.

| Model | What simplifies | What gets harder | The new “pager line” |
| --- | --- | --- | --- |
| EKS hosted control plane | Control plane HA, scaling, replacement; less etcd babysitting | Endpoint access + SG design for private clusters; version planning | “Can we reach the API endpoint from the right networks?” |
| kubeadm on EC2 | Full control; no managed constraints | Everything: HA wiring, etcd ops, upgrades, certs | “etcd is sick” is your incident |
| HyperShift | Reduce per-cluster control-plane VMs; faster cluster churn; multi-tenant mgmt | Hosting cluster becomes shared blast radius; two-layer debugging | “Hosting cluster health” pages everyone |

When a hosted control plane simplifies operations

Hosted control planes help when your bottleneck is “running too many control planes.”

1) You operate many clusters (multi-tenant SaaS, env sprawl)

Cluster count is the multiplier.

If you run 20+ clusters, self-managed control planes become a tax:

  • patch windows multiply
  • certificate and etcd risk multiplies
  • “one-off cluster drift” becomes normal

EKS removes the control plane instances from your fleet and gives you a standardized control plane architecture across AZs. 

HyperShift goes further: it removes dedicated control-plane machines per cluster and runs them as pods on a hosting cluster. 

2) You want predictable control-plane availability without building an etcd practice

etcd is not hard until it’s hard at 3 AM.

kubeadm HA docs are clear: external etcd adds configuration surface area (explicit endpoints); stacked etcd is simpler but still your operational problem. 

If your team doesn’t want to own etcd restores as a practiced drill, a hosted control plane removes that class of work from your team’s backlog.

3) You need fast cluster create/delete (ephemeral clusters, tenant clusters)

Provisioning speed is operational leverage.

HyperShift is designed around creating control planes as pods on a management cluster, which reduces the need to spin up dedicated control-plane machines per hosted cluster.

That’s useful when:

  • you create short-lived clusters for CI
  • you provision tenant clusters and churn them
  • you want cluster lifecycle to look like deploying an app

4) You’re private-cluster-heavy and want a supported endpoint model

Private changes the operational shape more than any “feature.”

EKS lets you run a private-only API server endpoint (no public access), where kubectl must come from within the VPC or connected networks. Access to the private endpoint is controlled by rules on the cluster security group. 

That’s not “simpler” in absolute terms. It’s simpler because it’s a supported, documented pattern with fewer moving parts than self-hosting your own API endpoint VIP/LB and cert story. 
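
In Terraform, the endpoint mode is a pair of flags on the cluster resource. A minimal sketch, assuming hypothetical names and variables (`endpoint_private_access` and `endpoint_public_access` are the real `aws_eks_cluster` arguments):

```hcl
resource "aws_eks_cluster" "private" {
  name     = "payments-prod"              # hypothetical
  role_arn = aws_iam_role.cluster.arn     # hypothetical

  vpc_config {
    subnet_ids              = var.private_subnet_ids
    endpoint_private_access = true   # API reachable from inside the VPC
    endpoint_public_access  = false  # no internet-facing endpoint
  }
}
```

Flipping these two booleans is the whole "private cluster" decision at the IaC level; everything else is network plumbing around it.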

When a hosted control plane adds complexity

You trade “masters on VMs” for “network + IAM + shared blast radius.”

1) Control-plane connectivity becomes a first-class dependency

The API server is now “across a boundary,” and boundaries fail.

With EKS private-only clusters:

  • your kubectl, CI runners, and controllers must live inside the VPC or connected networks
  • your security group rules become part of cluster availability 

With public endpoint access, the default behavior has historically been public enabled / private disabled (and you can toggle both).  
Either way, endpoint mode is now a design choice you must document, test, and audit.

What changes for on-call:

  • “API is down” might really be “route to endpoint is broken”
  • DNS, TGW/peering, SG rules, and client network become suspects

2) Identity boundaries get sharper (and easier to misconfigure)

Hosted control planes push you into “who can reach what” decisions.

Private endpoint + security group control is good. It’s also easy to get wrong:

  • over-broad SG rules turn “private endpoint” into “private but reachable from everything”
  • too-tight rules break controllers and CI/CD in weird ways 

Hosted doesn’t remove IAM work. It moves it to the center of the blast radius.
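
Scoping the private endpoint correctly is one security group rule per trusted network, not a wildcard. A hedged sketch (IDs and CIDRs are hypothetical; the point is a named CIDR instead of `0.0.0.0/0`):

```hcl
resource "aws_security_group_rule" "api_from_ci" {
  type              = "ingress"
  from_port         = 443
  to_port           = 443
  protocol          = "tcp"
  cidr_blocks       = ["10.20.0.0/24"]           # CI runner subnet only
  security_group_id = var.cluster_security_group_id  # the EKS cluster SG
  description       = "kubectl/API access from CI runners"
}
```

One rule like this per network that legitimately needs the API (CI, admin VPN, in-cluster controllers) makes the access model auditable.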

3) HyperShift’s hosting cluster becomes shared infrastructure

You didn’t delete control planes. You consolidated them.

HyperShift runs control planes as pods on a hosting cluster.  
Docs show that hosted control plane pods can be scheduled broadly (“shared everything”), and you can taint/label nodes to dedicate capacity. 

This is the operational trade:

  • Pro: fewer dedicated control-plane machines per tenant cluster
  • Con: hosting cluster saturation, upgrades, or outages can hit multiple hosted clusters at once

If you adopt HyperShift, treat the hosting cluster like tier-0 infrastructure:

  • separate node pools
  • aggressive monitoring
  • strict change control
  • tested disaster recovery

4) Debug becomes two-layer

Symptoms show up in the guest cluster; root cause can live elsewhere.

With EKS, the control plane is managed. You troubleshoot via endpoint reachability, AWS telemetry, and cluster behavior. You can’t SSH into the masters, and that’s the point.

With HyperShift, you can often inspect control plane pods on the hosting cluster. That’s powerful, and it means your runbooks must cover two clusters:

  • guest cluster symptoms
  • hosting cluster root cause

Private clusters: the “hosted” decision that matters most

Private mode turns networking into part of the control plane.

EKS private endpoint: supported, but policy-heavy

SG rules are now part of cluster uptime.

AWS states that for private-only API servers:

  • there is no public access from the internet
  • kubectl must come from the VPC or connected network
  • cluster security group rules control private endpoint access 

This is clean if you already run:

  • TGW / VPC peering / Direct Connect
  • private DNS resolution patterns
  • locked-down egress

It’s messy if your ops tooling lives outside the network boundary and you aren’t ready to move it.

kubeadm private: you own the endpoint and its failure modes

You don’t get a managed endpoint; you build one.

kubeadm HA guides assume you configure a load balancer in front of the control plane nodes and wire up DNS names and endpoints. 

That’s flexible. It’s also more work:

  • API endpoint LB health checks
  • TLS/cert rotation
  • routing changes during upgrades
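
The "endpoint you build" is typically an internal NLB targeting every API server on 6443. A minimal Terraform sketch (names and variables are hypothetical; target attachments per control-plane instance are omitted):

```hcl
resource "aws_lb" "kube_api" {
  name               = "kube-api"
  internal           = true
  load_balancer_type = "network"
  subnets            = var.private_subnet_ids
}

resource "aws_lb_target_group" "kube_api" {
  name     = "kube-api"
  port     = 6443
  protocol = "TCP"
  vpc_id   = var.vpc_id

  health_check {
    protocol = "TCP"    # TCP check on 6443; HTTPS /healthz needs cert handling
    port     = "6443"
  }
}
```

This is exactly the layer EKS hides from you: health checks, target registration during upgrades, and the DNS name clients depend on.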

HyperShift private: you design exposure between hosting and guest clusters

Hosted control planes still need reachable endpoints.

Hosted control plane pods live on the hosting cluster. That’s good for consolidation. It also means you must design:

  • how guest nodes reach the hosted API server
  • how admins reach it (private networks, bastions, CI runners)
  • how you segment tenants

The exact networking patterns vary by environment, but the invariant is: private hosted control planes increase the importance of network design.

Terraform: what you actually manage in each model

IaC doesn’t disappear. The resource graph changes.

EKS Terraform surface area

You configure endpoint modes, SGs, node groups, and IAM.

Minimum Terraform concerns:

  • endpoint access mode (public/private/both)
  • cluster security group rules for private access 
  • node groups and AMI strategy
  • IRSA and IAM boundaries

Hosted control plane simplifies the “masters” part. It does not simplify the access-control part.

kubeadm Terraform surface area

Terraform becomes your control-plane installer, not just a cluster creator.

You end up managing:

  • control plane EC2 instances
  • LB/VIP in front of API servers (common HA pattern) 
  • etcd instances (external) or colocated etcd (stacked) 
  • bootstrap scripts, cert distribution, upgrade workflows

This can be clean if you have mature automation. If not, it’s a lot of state to keep consistent.

HyperShift Terraform surface area

You manage the hosting cluster like a platform, then declaratively create hosted clusters.

HyperShift adds:

  • hosting cluster lifecycle (upgrade, capacity, resilience)
  • hosted cluster objects and their infra mappings
  • scheduling policies for control plane pods (dedicated nodes via labels/taints) 

Terraform can drive parts of this, but you’ll also lean on cluster-native controllers.

Prometheus: what you need to watch so hosted doesn’t surprise you

Hosted control planes move failure modes. Your dashboards must follow.

At minimum, split monitoring into two planes:

  1. Workload plane (guest cluster apps)
     • request rates, latency, errors
     • node saturation
     • queue depth / retries
  2. Control plane
     • API server availability/latency from where your clients run
     • controller health signals
     • for HyperShift: hosting cluster resource pressure, because control planes are pods 
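
An alert on API server error rate is a reasonable starting point for the control-plane side. An illustrative Prometheus rule, assuming the default `apiserver_request_total` metric is being scraped (thresholds and labels are yours to tune):

```yaml
groups:
  - name: control-plane
    rules:
      - alert: APIServerHighErrorRate
        # fraction of API requests returning 5xx over 5 minutes
        expr: |
          sum(rate(apiserver_request_total{code=~"5.."}[5m]))
            / sum(rate(apiserver_request_total[5m])) > 0.05
        for: 10m
        labels:
          severity: page
```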

For private clusters, add synthetic checks from the networks that matter:

  • from CI runner network
  • from admin network
  • from in-cluster controllers
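
The blackbox exporter is the usual tool for these synthetic checks, but even a tiny TCP probe run from each network beats nothing. A minimal Python sketch (the host and port are whatever your private endpoint resolves to from that network):

```python
import socket

def api_endpoint_reachable(host: str, port: int = 443, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # DNS failure, connection refused, or timeout all count as unreachable
        return False
```

Run it from the CI network, the admin network, and inside the cluster; a probe that passes from only one of those is the "museum exhibit" scenario below.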

If the API endpoint is unreachable from your automation network, you don’t have a cluster. You have a museum exhibit.

Decision checklist for SaaS and platform teams

Answer these honestly and the right model usually falls out.

  1. How many clusters will you run in 12 months?
     If the number is growing fast, a hosted control plane saves toil.
  2. Do you have an etcd practice?
     If “restore drill” isn’t something you run quarterly, kubeadm HA is a risk trade.
  3. Is private-only mandatory?
     If yes, model endpoint reachability and SG rules as part of uptime.
  4. Can you tolerate shared blast radius?
     HyperShift consolidates control planes. Treat the hosting cluster as tier-0.
  5. What do you want to debug at 3 AM: VMs or networks?
     kubeadm tends toward VM-level debugging; hosted control planes tend toward network/identity debugging.

Where AceCloud fits

Hosted control plane only helps if the day-2 loop is owned and scripted.

If you’re buying hosted control plane benefits but don’t want to run the surrounding ops (endpoint policies, Terraform hygiene, Prometheus wiring, upgrade runbooks), a managed Kubernetes provider like AceCloud can own that platform loop while your team focuses on workload correctness and SLOs.

Bottom line

Hosted control plane is not “less complexity.” It’s different complexity.

Pick a hosted control plane (EKS) when you want AWS to own control plane HA, scaling, and replacement across AZs.  
Pick kubeadm when you need maximum control and you’re willing to own HA topology, etcd ops, and endpoint plumbing.  
Pick HyperShift when you need to run many clusters and you’re ready to operate a tier-0 hosting cluster that runs control planes as pods. 

The correct choice is the one that gives every failure mode a clear owner—and keeps your pager quiet for the right reasons.

 
