A hosted control plane moves Kubernetes control-plane components off your worker fleet, either into a provider-managed boundary (EKS) or onto a separate hosting cluster as pods (HyperShift).
It simplifies ops when you want predictable upgrades, less per-cluster snowflake work, and cleaner separation between “management” and “workloads.”
It adds complexity when control-plane connectivity, IAM, and shared blast radius become your new failure modes, especially with private clusters.
Define hosted control plane in concrete terms
If you can’t say where the API server and etcd live, you can’t model risk.
“Hosted control plane” is a placement decision.
EKS: hosted by AWS in an EKS-managed VPC
AWS owns the masters; you own nodes and workloads.
AWS documents that the EKS-managed control plane runs inside an AWS-managed VPC and includes Kubernetes API server nodes and an etcd cluster. API server nodes run in an Auto Scaling group across at least two AZs; etcd nodes span three AZs.
What that means operationally:
- You don’t patch control-plane instances.
- You don’t rebuild etcd.
- You do still own access, RBAC, node lifecycle, and add-ons.
kubeadm on EC2: not hosted, you host it
You run the masters, the etcd, the upgrades, and the recovery drills.
Kubeadm HA requires you to pick a topology (stacked etcd vs external etcd) and wire up the endpoints (often via a load balancer DNS name). External etcd needs explicit endpoint configuration; stacked etcd is “managed automatically” by kubeadm’s topology.
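As a minimal sketch, the topology choice shows up in kubeadm's ClusterConfiguration. The load balancer DNS name, etcd IPs, and cert paths below are illustrative placeholders:

```yaml
# kubeadm ClusterConfiguration sketch for the external-etcd topology.
# Endpoint DNS name, etcd IPs, and cert paths are placeholders.
apiVersion: kubeadm.k8s.io/v1beta3
kind: ClusterConfiguration
controlPlaneEndpoint: "api.example.internal:6443"  # LB in front of the masters
etcd:
  external:
    endpoints:
      - https://10.0.1.10:2379
      - https://10.0.1.11:2379
      - https://10.0.1.12:2379
    caFile: /etc/kubernetes/pki/etcd/ca.crt
    certFile: /etc/kubernetes/pki/apiserver-etcd-client.crt
    keyFile: /etc/kubernetes/pki/apiserver-etcd-client.key
```

With stacked etcd you drop the `etcd.external` block entirely; kubeadm then colocates an etcd member with each control-plane node it joins.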
What that means operationally:
- You patch and upgrade the control plane.
- You own etcd snapshots and restore tests.
- You own certificates and rotation edge cases.
HyperShift (hosted control planes): control planes as pods on a hosting cluster
You consolidate many control planes onto one management cluster.
Red Hat’s hosted control planes model runs control planes as pods on a management/hosting cluster, without dedicated VMs per control plane.
HyperShift then introduces a new question: where do those control plane pods land? Docs show “shared everything” by default, and you can dedicate nodes for control plane workloads via labels/taints.
Side-by-side: what gets simpler, what gets harder
Feature lists lie. Ownership and failure modes don’t.
| Model | What simplifies | What gets harder | The new “pager line” |
| --- | --- | --- | --- |
| EKS hosted control plane | Control plane HA, scaling, replacement; less etcd babysitting | Endpoint access + SG design for private clusters; version planning | “Can we reach the API endpoint from the right networks?” |
| kubeadm on EC2 | Full control; no managed constraints | Everything: HA wiring, etcd ops, upgrades, certs | “etcd is sick” is your incident |
| HyperShift | Fewer per-cluster control-plane VMs; faster cluster churn; multi-tenant management | Hosting cluster becomes shared blast radius; two-layer debugging | “Hosting cluster health” pages everyone |
When a hosted control plane simplifies operations
Hosted control planes help when your bottleneck is “running too many control planes.”
1) You operate many clusters (multi-tenant SaaS, env sprawl)
Cluster count is the multiplier.
If you run 20+ clusters, self-managed control planes become a tax:
- patch windows multiply
- certificate and etcd risk multiplies
- “one-off cluster drift” becomes normal
EKS removes the control plane instances from your fleet and gives you a standardized control plane architecture across AZs.
HyperShift goes further: it removes dedicated control-plane machines per cluster and runs them as pods on a hosting cluster.
2) You want predictable control-plane availability without building an etcd practice
etcd is not hard until it’s hard at 3 AM.
kubeadm HA docs are clear: external etcd adds configuration surface area (explicit endpoints); stacked etcd is simpler but still your operational problem.
If your team doesn’t want to own etcd restores as a practiced drill, a hosted control plane removes that class of work from your team’s backlog.
3) You need fast cluster create/delete (ephemeral clusters, tenant clusters)
Provisioning speed is operational leverage.
HyperShift is designed around creating control planes as pods on a management cluster, which removes the need to spin up dedicated control-plane machines per hosted cluster.
That’s useful when:
- you create short-lived clusters for CI
- you provision tenant clusters and churn them
- you want cluster lifecycle to look like deploying an app
4) You’re private-cluster-heavy and want a supported endpoint model
Private changes the operational shape more than any “feature.”
EKS lets you run a private-only API server endpoint (no public access), where kubectl must come from within the VPC or connected networks. Access to the private endpoint is controlled by rules on the cluster security group.
That’s not “simpler” in absolute terms. It’s simpler because it’s a supported, documented pattern with fewer moving parts than self-hosting your own API endpoint VIP/LB and cert story.
When a hosted control plane adds complexity
You trade “masters on VMs” for “network + IAM + shared blast radius.”
1) Control-plane connectivity becomes a first-class dependency
The API server is now “across a boundary,” and boundaries fail.
With EKS private-only clusters:
- your kubectl, CI runners, and controllers must live inside the VPC or connected networks
- your security group rules become part of cluster availability
With public endpoint access, the default behavior has historically been public enabled / private disabled (and you can toggle both).
Either way, endpoint mode is now a design choice you must document, test, and audit.
What changes for on-call:
- “API is down” might really be “route to endpoint is broken”
- DNS, TGW/peering, SG rules, and client network become suspects
2) Identity boundaries get sharper (and easier to misconfigure)
Hosted control planes push you into “who can reach what” decisions.
Private endpoint + security group control is good. It’s also easy to get wrong:
- over-broad SG rules turn “private endpoint” into “private but reachable from everything”
- too-tight rules break controllers and CI/CD in weird ways
Hosted doesn’t remove IAM work. It moves it to the center of the blast radius.
3) HyperShift’s hosting cluster becomes shared infrastructure
You didn’t delete control planes. You consolidated them.
HyperShift runs control planes as pods on a hosting cluster.
Docs show that hosted control plane pods can be scheduled broadly (“shared everything”), and you can taint/label nodes to dedicate capacity.
This is the operational trade:
- Pro: fewer dedicated control-plane machines per tenant cluster
- Con: hosting cluster saturation, upgrades, or outages can hit multiple hosted clusters at once
If you adopt HyperShift, treat the hosting cluster like tier-0 infrastructure:
- separate node pools
- aggressive monitoring
- strict change control
- tested disaster recovery
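One way to implement the “separate node pools” item is the label/taint pattern from the docs. The key names below follow HyperShift's naming conventions but should be checked against your version's documentation:

```yaml
# Sketch: a node dedicated to hosted control-plane pods.
# Label and taint keys are assumptions based on HyperShift docs conventions.
apiVersion: v1
kind: Node
metadata:
  name: cp-node-1
  labels:
    hypershift.openshift.io/control-plane: "true"
spec:
  taints:
    - key: hypershift.openshift.io/control-plane
      value: "true"
      effect: NoSchedule
```

Control-plane pods tolerate the taint and select the label; everything else is repelled from the dedicated capacity.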
4) Debug becomes two-layer
Symptoms show up in the guest cluster; root cause can live elsewhere.
With EKS, the control plane is managed: you troubleshoot via endpoint reachability, AWS telemetry, and cluster behavior. You can’t SSH into the masters, and that’s the point.
With HyperShift, you can often inspect control plane pods on the hosting cluster. That’s powerful, and it means your runbooks must cover two clusters:
- guest cluster symptoms
- hosting cluster root cause
Private clusters: the “hosted” decision that matters most
Private mode turns networking into part of the control plane.
EKS private endpoint: supported, but policy-heavy
SG rules are now part of cluster uptime.
AWS states that for private-only API servers:
- there is no public access from the internet
- kubectl must come from the VPC or connected network
- cluster security group rules control private endpoint access
This is clean if you already run:
- TGW / VPC peering / Direct Connect
- private DNS resolution patterns
- locked-down egress
It’s messy if your ops tooling lives outside the network boundary and you aren’t ready to move it.
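In Terraform terms, “SG rules are part of uptime” looks something like this sketch (the resource names and CI CIDR are assumptions):

```hcl
# Sketch: allow CI runners to reach the private EKS API endpoint on 443.
# The cluster reference and CIDR block are illustrative.
resource "aws_security_group_rule" "ci_to_eks_api" {
  type              = "ingress"
  from_port         = 443
  to_port           = 443
  protocol          = "tcp"
  cidr_blocks       = ["10.20.0.0/24"] # CI runner subnet
  security_group_id = aws_eks_cluster.main.vpc_config[0].cluster_security_group_id
}
```

If this rule is wrong or deleted, kubectl from CI fails even while the control plane itself is healthy.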
kubeadm private: you own the endpoint and its failure modes
You don’t get a managed endpoint; you build one.
kubeadm HA guides assume you configure a load balancer in front of the control-plane nodes and wire up DNS names and endpoints.
That’s flexible. It’s also more work:
- API endpoint LB health checks
- TLS/cert rotation
- routing changes during upgrades
HyperShift private: you design exposure between hosting and guest clusters
Hosted control planes still need reachable endpoints.
Hosted control plane pods live on the hosting cluster. That’s good for consolidation. It also means you must design:
- how guest nodes reach the hosted API server
- how admins reach it (private networks, bastions, CI runners)
- how you segment tenants
The exact networking patterns vary by environment, but the invariant is: private hosted control planes increase the importance of network design.
Terraform: what you actually manage in each model
IaC doesn’t disappear. The resource graph changes.
EKS Terraform surface area
You configure endpoint modes, SGs, node groups, and IAM.
Minimum Terraform concerns:
- endpoint access mode (public/private/both)
- cluster security group rules for private access
- node groups and AMI strategy
- IRSA and IAM boundaries
Hosted control plane simplifies the “masters” part. It does not simplify the access-control part.
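A minimal sketch of the endpoint-mode decision in Terraform (cluster name, role ARN, and subnet variables are placeholders):

```hcl
# Sketch: private-only EKS API endpoint. Names and references are assumed.
resource "aws_eks_cluster" "main" {
  name     = "platform"
  role_arn = aws_iam_role.eks.arn

  vpc_config {
    subnet_ids              = var.private_subnet_ids
    endpoint_private_access = true
    endpoint_public_access  = false # no internet-facing API endpoint
  }
}
```

Flipping those two booleans is the whole “endpoint mode” decision; document why they are set the way they are.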
kubeadm Terraform surface area
Terraform becomes your control-plane installer, not just a cluster creator.
You end up managing:
- control plane EC2 instances
- LB/VIP in front of API servers (common HA pattern)
- etcd instances (external) or colocated etcd (stacked)
- bootstrap scripts, cert distribution, upgrade workflows
This can be clean if you have mature automation. If not, it’s a lot of state to keep consistent.
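The “LB/VIP in front of API servers” pattern typically becomes something like this sketch: an internal NLB forwarding TCP 6443 to the control-plane nodes (all names and variables are assumptions):

```hcl
# Sketch: internal NLB in front of kubeadm control-plane nodes (port 6443).
resource "aws_lb" "kube_api" {
  name               = "kube-api"
  internal           = true
  load_balancer_type = "network"
  subnets            = var.private_subnet_ids
}

resource "aws_lb_target_group" "kube_api" {
  name     = "kube-api"
  port     = 6443
  protocol = "TCP"
  vpc_id   = var.vpc_id

  health_check {
    protocol = "TCP" # health = TCP connect to the API server port
  }
}

resource "aws_lb_listener" "kube_api" {
  load_balancer_arn = aws_lb.kube_api.arn
  port              = 6443
  protocol          = "TCP"

  default_action {
    type             = "forward"
    target_group_arn = aws_lb_target_group.kube_api.arn
  }
}
```

You still need one `aws_lb_target_group_attachment` per control-plane instance, plus a DNS name pointing kubeadm's control-plane endpoint at the NLB.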
HyperShift Terraform surface area
You manage the hosting cluster like a platform, then declaratively create hosted clusters.
HyperShift adds:
- hosting cluster lifecycle (upgrade, capacity, resilience)
- hosted cluster objects and their infra mappings
- scheduling policies for control plane pods (dedicated nodes via labels/taints)
Terraform can drive parts of this, but you’ll also lean on cluster-native controllers.
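The “hosted cluster objects” are Kubernetes custom resources. A heavily abbreviated sketch (name, namespace, and release image are placeholders; most required fields are omitted, so check your HyperShift version's API reference):

```yaml
# Sketch of a HostedCluster object; many required fields omitted for brevity.
apiVersion: hypershift.openshift.io/v1beta1
kind: HostedCluster
metadata:
  name: tenant-a
  namespace: clusters
spec:
  release:
    image: quay.io/openshift-release-dev/ocp-release:4.14.0-x86_64
  controllerAvailabilityPolicy: HighlyAvailable # replicated control-plane pods
```

Terraform can apply these via a Kubernetes provider, but reconciliation is done by HyperShift's controllers, not by Terraform.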
Prometheus: what you need to watch so hosted doesn’t surprise you
Hosted control planes move failure modes. Your dashboards must follow.
At minimum, split monitoring into two planes:
- Workload plane (guest cluster apps):
  - request rates, latency, errors
  - node saturation
  - queue depth / retries
- Control plane:
  - API server availability/latency from where your clients run
  - controller health signals
  - for HyperShift: hosting cluster resource pressure, because control planes are pods
For private clusters, add synthetic checks from the networks that matter:
- from CI runner network
- from admin network
- from in-cluster controllers
If the API endpoint is unreachable from your automation network, you don’t have a cluster. You have a museum exhibit.
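As a starting point, the two planes can map to Prometheus alert rules like these (the metric names are standard apiserver and blackbox-exporter metrics; the thresholds and job label are assumptions):

```yaml
# Sketch: alerting on control-plane latency and endpoint reachability.
groups:
  - name: control-plane
    rules:
      - alert: APIServerSlow
        expr: |
          histogram_quantile(0.99,
            sum(rate(apiserver_request_duration_seconds_bucket{verb!="WATCH"}[5m])) by (le)
          ) > 1
        for: 10m
      - alert: APIEndpointUnreachable
        # probe_success comes from blackbox-exporter probes run from the
        # networks that matter (CI, admin, in-cluster).
        expr: probe_success{job="api-endpoint-probe"} == 0
        for: 5m
```

Run the blackbox probes from each network listed above, not just from inside the cluster.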
Decision checklist for SaaS and platform teams
Answer these honestly and the right model usually falls out.
- How many clusters will you run in 12 months?
  - If the number is growing fast, a hosted control plane saves toil.
- Do you have an etcd practice?
  - If “restore drill” isn’t something you run quarterly, kubeadm HA is a risk trade.
- Is private-only mandatory?
  - If yes, model endpoint reachability and SG rules as part of uptime.
- Can you tolerate shared blast radius?
  - HyperShift consolidates control planes. Treat the hosting cluster as tier-0.
- What do you want to debug at 3 AM: VMs or networks?
  - kubeadm tends toward VM-level debugging.
  - hosted control planes tend toward network/identity debugging.
Where AceCloud fits
Hosted control plane only helps if the day-2 loop is owned and scripted.
If you’re buying hosted control plane benefits but don’t want to run the surrounding ops (endpoint policies, Terraform hygiene, Prometheus wiring, upgrade runbooks), a managed Kubernetes provider like AceCloud can own that platform loop while your team focuses on workload correctness and SLOs.
Bottom line
Hosted control plane is not “less complexity.” It’s different complexity.
Pick a hosted control plane (EKS) when you want AWS to own control plane HA, scaling, and replacement across AZs.
Pick kubeadm when you need maximum control and you’re willing to own HA topology, etcd ops, and endpoint plumbing.
Pick HyperShift when you need to run many clusters and you’re ready to operate a tier-0 hosting cluster that runs control planes as pods.
The correct choice is the one that gives every failure mode a clear owner—and keeps your pager quiet for the right reasons.