<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Andrew Bagan</title>
    <description>The latest articles on DEV Community by Andrew Bagan (@butlerlabs).</description>
    <link>https://dev.to/butlerlabs</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3941339%2F69d3e6fb-755c-43fc-841f-a1dcbb67c4d2.png</url>
      <title>DEV Community: Andrew Bagan</title>
      <link>https://dev.to/butlerlabs</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/butlerlabs"/>
    <language>en</language>
    <item>
      <title>Butler: Kubernetes-as-a-Service for the Rest of Us</title>
      <dc:creator>Andrew Bagan</dc:creator>
      <pubDate>Tue, 26 May 2026 23:34:19 +0000</pubDate>
      <link>https://dev.to/butlerlabs/butler-kubernetes-as-a-service-for-the-rest-of-us-2c9c</link>
      <guid>https://dev.to/butlerlabs/butler-kubernetes-as-a-service-for-the-rest-of-us-2c9c</guid>
      <description>&lt;p&gt;Butler is an open-source Kubernetes-as-a-Service platform. We built it because every platform team we've worked with spent twelve to eighteen months building the same internal tool, only for the platform to ossify when the engineer who built it left.&lt;/p&gt;

&lt;p&gt;This post walks through what Butler is, how the management cluster gets built from scratch, what happens when you create a TenantCluster, and how the pieces fit together: architecture, CRDs, and provisioning flows.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Platform Engineering Bottleneck
&lt;/h2&gt;

&lt;p&gt;Platform teams get asked to build Kubernetes-as-a-Service internally. Self-service cluster provisioning, multi-tenancy, day-2 operations, a console, a CLI, GitOps integration. The typical approach is to stitch together Cluster API with custom controllers, build a homegrown dashboard, and write months of glue code. Twelve to eighteen months later, you have something that works for the three teams who helped design it.&lt;/p&gt;

&lt;p&gt;The problem isn't Kubernetes itself. It's everything around it. Provisioning VMs on your infrastructure, bootstrapping nodes, installing CNI and storage and ingress, managing certificates, enforcing tenant isolation, providing a self-service interface that developers actually use. Each piece is solvable on its own. The integration is what kills you.&lt;/p&gt;

&lt;p&gt;Butler is our answer to that integration problem. Open source, CRD-driven, infrastructure-agnostic.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Is Butler
&lt;/h2&gt;

&lt;p&gt;You define a TenantCluster resource, and Butler provisions everything: hosted control plane via Steward, worker VMs through standard Cluster API providers, networking, storage, ingress, certificates. The entire API is Kubernetes CRDs. &lt;code&gt;kubectl apply -f&lt;/code&gt; and you have a cluster.&lt;/p&gt;

&lt;p&gt;Infrastructure-agnostic means the architecture supports any provider with a Cluster API implementation. Harvester and Nutanix are fully stable for both management and tenant clusters. AWS, GCP, and Azure are stable for bootstrapping a management cluster, with tenant cluster support in development. Proxmox and VMware are in development for both layers. Butler targets organizations that run their own infrastructure: the ones that need KaaS the most and have the fewest options.&lt;/p&gt;

&lt;p&gt;Butler is also a complete platform: it ships with an API server, a web console, a dual CLI, and a GitOps export pipeline, not a toolkit that requires assembly. Deploy Butler on a management cluster and start provisioning tenant clusters.&lt;/p&gt;

&lt;p&gt;Who is this for? Platform engineering teams building internal Kubernetes platforms. Infrastructure teams running on-prem or hybrid environments who need self-service cluster provisioning without buying a commercial product. Homelabbers and small teams who want the same multi-tenancy and lifecycle management that large organizations get from vendor platforms.&lt;/p&gt;

&lt;h2&gt;
  
  
  Bootstrapping the Management Cluster
&lt;/h2&gt;

&lt;p&gt;Before you can provision tenant clusters, you need a management cluster. &lt;code&gt;butleradm&lt;/code&gt; ships two ways to do this from zero: an interactive dashboard with a wizard, and a config-file path for CI.&lt;/p&gt;

&lt;p&gt;The interactive path: &lt;code&gt;butleradm tui&lt;/code&gt; launches the platform dashboard. For platform admins, tab 0 is a bootstrap wizard built on Charm &lt;code&gt;huh&lt;/code&gt; (forms), with Bubbletea, Bubbles, and Lipgloss for the surrounding dashboard and live progress views. The wizard walks you through provider selection (Harvester, Nutanix, Proxmox, AWS, Azure, GCP), credentials, async infrastructure discovery (networks, storage classes, VM images, machine sizes), resource selection, cluster sizing, and networking. It then shows a review screen, offers an optional image factory sync, and hands the assembled config to the orchestrator.&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxcaabyo4phhaq1fb45u7.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxcaabyo4phhaq1fb45u7.png" alt=" " width="800" height="440"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The headless path: &lt;code&gt;butleradm bootstrap --config cluster.yaml&lt;/code&gt; takes the same config from a file and drives the same orchestrator. Useful in CI or for repeatable infrastructure provisioning where you want the bring-up checked into Git.&lt;/p&gt;

&lt;p&gt;The orchestrator creates a temporary KIND cluster on your workstation, deploys CRDs, a bootstrap controller, and a lightweight provider controller, then creates a ClusterBootstrap resource. The bootstrap controller takes over and drives the management cluster through a series of phases:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;ProvisioningMachines&lt;/strong&gt; creates VMs on your infrastructure via internal MachineRequest CRDs. These are fulfilled by lightweight provider controllers (butler-provider-harvester and butler-provider-nutanix today, with additional providers following the same pattern) that only exist for this one-time bootstrap. Once the management cluster is running, tenant clusters use standard CAPI providers instead.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;ConfiguringTalos&lt;/strong&gt; generates declarative Talos Linux machine configs and applies them to each node. Talos is the immutable Linux distribution covered in the Immutable Infrastructure section below. Single-node and HA topologies are both supported.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;BootstrappingCluster&lt;/strong&gt; runs &lt;code&gt;talosctl bootstrap&lt;/code&gt; on the first control plane node to initialize etcd, then waits for the Kubernetes API to come up. Once it does, the bootstrap controller retrieves a kubeconfig and stores it in the ClusterBootstrap status.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;InstallingAddons&lt;/strong&gt; installs platform components in dependency order on the new cluster: Cilium for CNI, Steward for hosted control planes, CAPI infrastructure providers, butler-controller, butler-server, butler-console. When everything is healthy, the temporary KIND cluster is torn down. You have a self-managing management cluster running on Talos.&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fa2tk14g728d7s0gb9s6e.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fa2tk14g728d7s0gb9s6e.png" alt=" " width="799" height="372"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h2&gt;
  
  
  How It Works
&lt;/h2&gt;

&lt;p&gt;Here is a TenantCluster resource. This is the entire interface for requesting a Kubernetes cluster:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;apiVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;butler.butlerlabs.dev/v1alpha1&lt;/span&gt;
&lt;span class="na"&gt;kind&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;TenantCluster&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;production&lt;/span&gt;
  &lt;span class="na"&gt;namespace&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;platform-engineering&lt;/span&gt;
&lt;span class="na"&gt;spec&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;kubernetesVersion&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;v1.32.2&lt;/span&gt;
  &lt;span class="na"&gt;providerConfigRef&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;harvester&lt;/span&gt;
  &lt;span class="na"&gt;workers&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;replicas&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;3&lt;/span&gt;
    &lt;span class="na"&gt;machineTemplate&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;cpu&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;4&lt;/span&gt;
      &lt;span class="na"&gt;memory&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;8Gi&lt;/span&gt;
      &lt;span class="na"&gt;diskSize&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;25Gi&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Apply this. butler-controller reconciles it into a complete Kubernetes cluster:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A StewardControlPlane resource is created. Steward runs the tenant API server, controller-manager, and scheduler as pods in the management cluster.&lt;/li&gt;
&lt;li&gt;A standard CAPI Cluster and MachineDeployment are created. The CAPI infrastructure provider (CAPK for Harvester, CAPX for Nutanix, etc.) creates the VMs.&lt;/li&gt;
&lt;li&gt;TenantAddon resources reconcile the baseline addons onto the cluster. The set varies by provider: on-prem clusters get a CNI, a load balancer (MetalLB), a storage layer (Longhorn), an ingress controller (Traefik), and cert-manager. Cloud clusters skip the layers the cloud provider already gives you natively and keep the rest.&lt;/li&gt;
&lt;li&gt;IP allocations are requested from NetworkPools for node IPs and load balancer pools.&lt;/li&gt;
&lt;li&gt;Certificates are issued and rotated automatically.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;When all of this is done, you have a fully-provisioned tenant cluster. The TenantCluster status reflects every step.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Makes Steward Different
&lt;/h2&gt;

&lt;p&gt;Butler uses Steward for hosted control planes. Steward runs tenant API servers as Deployments in the management cluster. Each tenant gets its own kube-apiserver, controller-manager, and scheduler. Workers connect back via Konnectivity tunnels. etcd is shared (multi-tenant) or dedicated, your choice. Control planes run as pods, not VMs. With a three-node HA control plane per cluster, fifty tenant clusters need 150+ control plane VMs the traditional way, versus 50 Deployments with Steward.&lt;/p&gt;

&lt;p&gt;Steward is Apache 2.0 licensed and is targeting the CNCF Sandbox. Two of its capabilities are worth calling out.&lt;/p&gt;

&lt;h3&gt;
  
  
  Open-source Ingress and Gateway API exposure modes
&lt;/h3&gt;

&lt;p&gt;In most hosted control plane operators, exposing tenant API servers through a shared Ingress controller or a Gateway API listener is gated behind a commercial license. We needed it for our own deployments and shipped it open-source from day one.&lt;/p&gt;

&lt;p&gt;Steward supports three exposure modes, all in the Apache 2.0 build:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;LoadBalancer&lt;/strong&gt;: one IP per tenant cluster. The simplest case.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Ingress&lt;/strong&gt;: one IP shared by any number of tenant clusters via SNI routing through Traefik, HAProxy, NGINX, or any standard ingress controller.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Gateway API&lt;/strong&gt;: native Gateway API routing through any Gateway controller, with full ParentRef support.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;IP exhaustion is a real problem at scale. One hundred tenant clusters at one LoadBalancer IP each costs roughly $1,600/month of cloud LB charges, or a serious chunk of an on-prem IP pool. The math drives the feature. The license shouldn't.&lt;/p&gt;
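
&lt;p&gt;The arithmetic behind that figure is easy to reproduce. In this sketch the per-load-balancer price is an assumption derived from the post's own estimate: $1,600 for 100 load balancers works out to $16 each per month. Real prices vary by cloud and region, and the sketch ignores data-processing charges. &lt;code&gt;GatewayAPI&lt;/code&gt; is a hypothetical mode identifier.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Assumed flat monthly price per cloud load balancer, derived from the
# $1,600 / 100 clusters estimate above. Real pricing varies.
PRICE_PER_LB_MONTH = 16.0


def lb_ips_needed(tenant_clusters: int, mode: str) -&gt; int:
    """Load balancer IPs needed to expose tenant API servers in a given exposure mode."""
    if mode == "LoadBalancer":
        return tenant_clusters  # one IP per tenant cluster
    if mode in ("Ingress", "GatewayAPI"):
        return 1 if tenant_clusters else 0  # one shared IP, routed by SNI / hostname
    raise ValueError(f"unknown exposure mode: {mode}")


def monthly_cost(tenant_clusters: int, mode: str) -&gt; float:
    """Monthly load balancer cost for exposing all tenant API servers."""
    return lb_ips_needed(tenant_clusters, mode) * PRICE_PER_LB_MONTH
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;At 100 tenant clusters, LoadBalancer mode needs 100 IPs and costs $1,600/month. Ingress or Gateway API mode needs one shared IP, about $16/month.&lt;/p&gt;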

&lt;h3&gt;
  
  
  Talos Linux tenant workers on hosted control planes
&lt;/h3&gt;

&lt;p&gt;This is the more novel piece. As far as we know, Butler is the only platform shipping it today.&lt;/p&gt;

&lt;p&gt;Hosted control plane operators run the control plane as pods on a management cluster, then attach tenant worker VMs that connect back via Konnectivity. Standard Linux workers (Ubuntu, Rocky, Flatcar) work because the kubelet requests its certificates through Kubernetes CSRs, which the hosted control plane signs.&lt;/p&gt;

&lt;p&gt;Talos Linux workers do not work out of the box with hosted control planes. Talos relies on a &lt;code&gt;trustd&lt;/code&gt; gRPC service to sign apid certificates, and that service runs on a Talos control plane node by design. When the control plane is a pod instead of a Talos node, there is no trustd. Talos workers get stuck in the &lt;code&gt;Booting&lt;/code&gt; stage with recurring &lt;code&gt;secrets.APIController&lt;/code&gt; errors, and &lt;code&gt;talosctl&lt;/code&gt; against them fails.&lt;/p&gt;

&lt;p&gt;We wrote &lt;strong&gt;steward-trustd&lt;/strong&gt;: a Talos-compatible trustd service that Steward attaches as a sidecar inside the TenantControlPlane pod when the tenant cluster's workers are Talos. It implements the Talos &lt;code&gt;SecurityService&lt;/code&gt; gRPC interface and signs apid certificates with a Steward-managed OS CA. Tenant clusters on Rocky, Flatcar, or other supported OSes don't get the sidecar; the attachment is per-tenant, conditional on the worker OS. For Talos tenants, workers reach &lt;code&gt;Running&lt;/code&gt;, &lt;code&gt;talosctl&lt;/code&gt; works against them, and the immutable-Linux story holds top to bottom on every Talos tenant Butler provisions.&lt;/p&gt;

&lt;p&gt;steward-trustd lives at &lt;code&gt;github.com/butlerdotdev/steward-trustd&lt;/code&gt;, Apache 2.0.&lt;/p&gt;

&lt;h2&gt;
  
  
  The CRD Interface
&lt;/h2&gt;

&lt;p&gt;Butler's entire API is Kubernetes CRDs under &lt;code&gt;butler.butlerlabs.dev/v1alpha1&lt;/code&gt;. There is no separate proprietary API model and no database: the REST API that backs the console reads and writes the same resources. The same CRDs work via kubectl, the CLI, the console, or any GitOps pipeline pointed at the manifests.&lt;/p&gt;

&lt;p&gt;CRDs are the API contract. Because every interface operates on the same resources, there's no drift between what the UI shows and what exists in the cluster. Debugging uses standard tools: &lt;code&gt;kubectl get tenantclusters&lt;/code&gt;, &lt;code&gt;kubectl describe team backend&lt;/code&gt;, &lt;code&gt;kubectl get ipallocations&lt;/code&gt;. Migration to or from Butler is just YAML. Nothing lives in a proprietary database you'd have to translate out of.&lt;/p&gt;

&lt;h2&gt;
  
  
  Immutable Infrastructure, Top to Bottom
&lt;/h2&gt;

&lt;p&gt;We're opinionated about immutability. The management cluster runs Talos Linux: no SSH, no package manager, read-only filesystem. All configuration comes from declarative YAML. You can't log in and tweak things. That's the point.&lt;/p&gt;

&lt;p&gt;Steward runs tenant control planes as pods, not VMs. Control plane upgrades are pod replacements, not in-place package updates: roll out a new Kubernetes version and Steward updates the Deployment, which replaces its pods. If a control plane pod dies, Kubernetes restarts it. Standard container semantics.&lt;/p&gt;

&lt;p&gt;Tenant worker nodes can run Talos Linux, Rocky Linux, Flatcar Container Linux, Bottlerocket, or Kairos depending on what the team needs. Node upgrades are VM replacements: create new VM, drain old VM, destroy old VM. No SSH-ing in to run &lt;code&gt;apt upgrade&lt;/code&gt;. Any node can be replaced at any time because nothing on a node is special.&lt;/p&gt;
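
&lt;p&gt;The create, drain, destroy cycle amounts to a rolling replacement with a surge of one node. The sketch below is a simplified illustration, not Butler's or Cluster API's code. Butler's workers are managed by a CAPI MachineDeployment, which handles rollouts in practice. &lt;code&gt;create_vm&lt;/code&gt;, &lt;code&gt;wait_ready&lt;/code&gt;, &lt;code&gt;drain&lt;/code&gt;, and &lt;code&gt;destroy_vm&lt;/code&gt; are hypothetical callbacks standing in for the infrastructure provider and the Kubernetes API.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;from typing import Callable


def rolling_replace(
    nodes: list[str],
    create_vm: Callable[[], str],
    wait_ready: Callable[[str], None],
    drain: Callable[[str], None],
    destroy_vm: Callable[[str], None],
) -&gt; list[str]:
    """Replace every node with a fresh VM, one at a time (surge of 1).

    The new node is Ready before the old one is drained, so schedulable
    capacity never drops below the original node count.
    """
    replaced = []
    for old in nodes:
        new = create_vm()  # 1. create the replacement VM from the new image
        wait_ready(new)    # 2. wait until it joins the cluster and reports Ready
        drain(old)         # 3. cordon the old node and evict its workloads
        destroy_vm(old)    # 4. delete the old VM; nothing on it is special
        replaced.append(new)
    return replaced
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;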

&lt;p&gt;This is cattle, not pets. All the way down.&lt;/p&gt;

&lt;h2&gt;
  
  
  Teams and RBAC
&lt;/h2&gt;

&lt;p&gt;Multi-tenancy in Butler isn't a feature that got bolted on. It's the foundation.&lt;/p&gt;

&lt;p&gt;Butler has two layers of access control: &lt;strong&gt;platform-level roles&lt;/strong&gt; and &lt;strong&gt;team-level roles&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;At the platform level, there are two roles. A &lt;strong&gt;platform admin&lt;/strong&gt; has full platform access: manage all teams, users, providers, identity providers, addon catalog, and platform configuration. A &lt;strong&gt;platform viewer&lt;/strong&gt; (also referred to as shadow view or shadow mode in the console) has the same cross-team visibility but is read-only. Shadow mode exists for the people who need to see the whole platform without being able to change it: support engineers, auditors, on-call operators investigating an incident, executives who want to peek at fleet state without holding a destructive credential. The console UI changes color when you're in either platform-level role, so there's no ambiguity about the context you're operating in.&lt;/p&gt;

&lt;p&gt;Within a team, three roles control access: &lt;strong&gt;admin&lt;/strong&gt; (full team control including membership management), &lt;strong&gt;operator&lt;/strong&gt; (create and manage clusters and addons, can't change team settings), and &lt;strong&gt;viewer&lt;/strong&gt; (read-only). The console visually distinguishes all three contexts so users always know what role they're acting with.&lt;/p&gt;

&lt;p&gt;A Team resource owns a namespace. All resources in that namespace (TenantClusters, TenantAddons, team-scoped ProviderConfigs) belong to that team. Team membership is re-resolved on every API request. The SessionMiddleware in butler-server reads the JWT, then checks the current Team CRD to verify the user still has access. Remove someone from the Team resource and they lose access immediately. No cache, no delay, no waiting for a token to expire.&lt;/p&gt;
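
&lt;p&gt;The per-request membership check can be sketched as follows. This is a minimal illustration, not butler-server's actual SessionMiddleware, and it rests on a few assumptions. The JWT has already been signature-verified and decoded into a claims dict. &lt;code&gt;get_team&lt;/code&gt; is a hypothetical lookup that reads the current Team resource, with a hypothetical &lt;code&gt;members&lt;/code&gt; list, from the Kubernetes API on every call.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;from typing import Callable, Optional


class Forbidden(Exception):
    """Raised when the caller no longer has access to the team."""


def resolve_team_role(
    claims: dict,
    team_name: str,
    get_team: Callable[[str], Optional[dict]],
) -&gt; str:
    """Return the caller's current role in `team_name`, re-reading the Team each time.

    The token is trusted only for identity. Membership is never read from the
    token, so removing a user from the Team takes effect on their next request.
    """
    email = claims["email"]
    team = get_team(team_name)  # live read, no cache
    if team is None:
        raise Forbidden(f"team {team_name} not found")
    for member in team.get("members", []):
        if member.get("email") == email:
            return member["role"]  # "admin", "operator", or "viewer"
    raise Forbidden(f"{email} is not a member of {team_name}")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;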

&lt;p&gt;Resource quotas are enforced per team. Admission webhooks validate Team, TenantCluster, NetworkPool, and ProviderConfig resources on every create and update, so a team that has hit its cluster-count limit is rejected at the API server, not after the controller has already started provisioning.&lt;/p&gt;
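
&lt;p&gt;A cluster-count check in the admission path might look like the sketch below. The &lt;code&gt;quotas.maxClusters&lt;/code&gt; field and the shape of the inputs are hypothetical, chosen for illustration. The key property is that the decision happens before the object is persisted, so no controller ever starts provisioning a cluster that would exceed the limit.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;from dataclasses import dataclass


@dataclass
class AdmissionResult:
    allowed: bool
    message: str = ""


def validate_cluster_create(team: dict, existing_clusters: int) -&gt; AdmissionResult:
    """Reject a TenantCluster CREATE when the team is already at its cluster limit.

    `team` is a hypothetical Team spec dict with an optional quotas.maxClusters;
    `existing_clusters` is the number of TenantClusters already in the team namespace.
    """
    limit = team.get("quotas", {}).get("maxClusters")
    if limit is None:
        return AdmissionResult(True)  # no quota configured
    if existing_clusters &gt;= limit:
        return AdmissionResult(False, f"team cluster limit reached ({existing_clusters}/{limit})")
    return AdmissionResult(True)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;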

&lt;h3&gt;
  
  
  Environments
&lt;/h3&gt;

&lt;p&gt;Inside a team, you can carve clusters into named environments: dev, stage, prod, per-user sandboxes, shared utilities, whatever shape your delivery pipeline takes. Environments are an explicit feature on the Team CRD, not a naming convention you enforce by hand. Operators pick the names; Butler doesn't lock you into a fixed taxonomy.&lt;/p&gt;

&lt;p&gt;Each environment can carry its own limits (max clusters, max clusters per member, capped within the team ceiling), its own cluster defaults (Kubernetes version, machine sizing, OS, addons that get applied automatically when a cluster is created in that environment), and its own additive access block that elevates team members within just that environment without giving them broader team privileges. A developer can be an operator in dev and a viewer in prod; a service team can have its own sandbox environment without touching anyone else's.&lt;/p&gt;
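
&lt;p&gt;Environment access is additive, so a member's effective role in an environment can be read as the higher of their team role and any environment-specific grant. The sketch below assumes that reading of the rules. The &lt;code&gt;access&lt;/code&gt; field and the role ranking are illustrative, not the Team CRD's actual schema.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Team roles ordered from least to most privileged.
ROLE_RANK = {"viewer": 0, "operator": 1, "admin": 2}


def effective_role(team_role: str, environment: dict, email: str) -&gt; str:
    """Combine a member's team role with an additive per-environment grant.

    `environment` is a hypothetical dict such as
    {"name": "dev", "access": [{"email": "...", "role": "operator"}]}.
    A grant can only raise privileges inside this environment, never lower them.
    """
    role = team_role
    for grant in environment.get("access", []):
        if grant["email"] == email and ROLE_RANK[grant["role"]] &gt; ROLE_RANK[role]:
            role = grant["role"]
    return role
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;This reproduces the example above. A team viewer with an operator grant in dev acts as an operator in dev and stays a viewer in prod.&lt;/p&gt;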

&lt;p&gt;Clusters land in an environment via the &lt;code&gt;butler.butlerlabs.dev/environment&lt;/code&gt; label. The label is set at cluster creation by whichever interface the user picks (console form, CLI flag, or YAML). Moving a cluster between environments goes through a dedicated API (&lt;code&gt;PUT /clusters/{namespace}/{name}/environment&lt;/code&gt;) so the move respects the destination environment's limits and defaults. Existing unlabeled clusters keep working and count against the team total; &lt;code&gt;butleradm env migrate&lt;/code&gt; can backfill labels when you decide to start using environments.&lt;/p&gt;

&lt;p&gt;The console's team and cluster views are environment-aware. Switching environments scopes the cluster list, addon catalog, and observability views without leaving the team.&lt;/p&gt;

&lt;h2&gt;
  
  
  SSO and Authentication
&lt;/h2&gt;

&lt;p&gt;Butler supports OIDC-based single sign-on with any compliant provider: Google Workspace, Microsoft Entra ID, Okta, Keycloak, Auth0, or anything that implements OIDC Discovery. An IdentityProvider CRD configures the connection.&lt;/p&gt;
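
&lt;p&gt;OIDC Discovery is what lets one connector work with any compliant provider: from the issuer URL alone, a client can locate the provider's configuration document. This sketch derives that URL as described in OpenID Connect Discovery 1.0 and sanity-checks a fetched document. The HTTP fetch is omitted, and it is not shown here how Butler's IdentityProvider controller performs these steps.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;def discovery_url(issuer: str) -&gt; str:
    """Build the OpenID Provider Configuration URL for an issuer.

    OpenID Connect Discovery 1.0: a terminating "/" on the issuer is removed
    before the well-known path is appended.
    """
    return issuer.rstrip("/") + "/.well-known/openid-configuration"


# A subset of the metadata a relying party needs from the discovery document.
REQUIRED_METADATA = ("issuer", "authorization_endpoint", "token_endpoint", "jwks_uri")


def check_discovery_document(issuer: str, config: dict) -&gt; None:
    """Reject a fetched discovery document that is incomplete or names another issuer."""
    missing = [key for key in REQUIRED_METADATA if key not in config]
    if missing:
        raise ValueError(f"discovery document missing: {', '.join(missing)}")
    # The returned issuer must be identical to the configured one.
    if config["issuer"] != issuer:
        raise ValueError(f"issuer mismatch: {config['issuer']} != {issuer}")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;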

&lt;p&gt;When a user logs in via SSO, Butler auto-creates a User resource with their email, name, and SSO provider info. No manual user provisioning for SSO users.&lt;/p&gt;

&lt;p&gt;OIDC group sync maps identity provider groups to team roles. Google Workspace groups are fetched via Admin SDK (since Google doesn't include them in the OIDC token). Microsoft Entra and Okta groups come from the &lt;code&gt;groups&lt;/code&gt; claim in the JWT. Group name normalization handles domain suffixes and LDAP DN format. Your identity provider groups map directly to team roles.&lt;/p&gt;
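
&lt;p&gt;Group name normalization might look something like the sketch below. This post does not spell out Butler's exact rules. This hypothetical version lowercases the name, strips an email-style domain suffix, and takes the first &lt;code&gt;CN=&lt;/code&gt; component of an LDAP distinguished name.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;def normalize_group(raw: str) -&gt; str:
    """Normalize an IdP group identifier to a bare, lowercase group name.

    Illustrative rules (assumptions, not Butler's implementation):
      "Platform-Admins@example.com"                      -&gt; "platform-admins"
      "CN=Platform-Admins,OU=Groups,DC=example,DC=com"   -&gt; "platform-admins"
    Escaped commas inside DN values are not handled.
    """
    name = raw.strip()
    if "=" in name:
        # LDAP DN: use the value of the first CN= relative distinguished name.
        for rdn in name.split(","):
            key, _, value = rdn.strip().partition("=")
            if key.strip().upper() == "CN":
                name = value.strip()
                break
    elif "@" in name:
        # Email-style group: drop the domain suffix.
        name = name.split("@", 1)[0]
    return name.lower()
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;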

&lt;p&gt;Internal users (email and password) are supported too, with invite flow, bcrypt password hashing, and account lockout on failed attempts. Both auth methods produce the same JWT session.&lt;/p&gt;

&lt;p&gt;The CLI authenticates via OAuth Device Flow. &lt;code&gt;butlerctl login --server https://butler.example.com&lt;/code&gt; runs an RFC 8628 device authorization grant. butler-server returns a kubeconfig with credentials for a Kubernetes ServiceAccount scoped to your teams. The SA name encodes your Butler user identity (visible in Kubernetes audit logs). Roles and RoleBindings enforce your Butler role (admin, operator, viewer) at the Kubernetes API. The practical result: every CLI operation is subject to the same RBAC as the console. There is no path where a user can bypass team membership by holding a different kubeconfig.&lt;/p&gt;
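
&lt;p&gt;On the client side, an RFC 8628 device flow comes down to polling the token endpoint until the user approves the request. The sketch below follows the error codes and &lt;code&gt;slow_down&lt;/code&gt; back-off defined in RFC 8628 section 3.5. &lt;code&gt;request_token&lt;/code&gt; is a hypothetical stand-in for the HTTP POST to butler-server's token endpoint. The sleep function is injectable, so the loop can be exercised without a network. This is not butlerctl's code.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import time
from typing import Callable


class DeviceFlowError(Exception):
    pass


def poll_for_token(
    request_token: Callable[[str], dict],
    device_code: str,
    interval: int = 5,
    expires_in: int = 600,
    sleep: Callable[[float], None] = time.sleep,
) -&gt; dict:
    """Poll the token endpoint per RFC 8628 until approval, denial, or expiry.

    `request_token` is a hypothetical callable that POSTs the device_code grant
    and returns the parsed JSON body (a token response or an error response).
    """
    waited = 0
    while expires_in &gt; waited:
        sleep(interval)
        waited += interval
        response = request_token(device_code)
        error = response.get("error")
        if error is None:
            return response  # success: a token response
        if error == "authorization_pending":
            continue  # the user hasn't approved yet; keep polling
        if error == "slow_down":
            interval += 5  # RFC 8628: add 5 seconds to this and all later intervals
            continue
        # access_denied, expired_token, or any other error ends the flow.
        raise DeviceFlowError(error)
    raise DeviceFlowError("expired_token")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;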

&lt;h2&gt;
  
  
  Built-in IPAM
&lt;/h2&gt;

&lt;p&gt;On-premises Kubernetes has an IP address problem. Every cluster needs IPs for nodes and load balancers. Most teams manage this with spreadsheets or static assignments.&lt;/p&gt;

&lt;p&gt;Butler has a built-in IPAM system modeled after the PVC/PV pattern. NetworkPool CRDs define a CIDR with reserved ranges that should be excluded (management cluster IPs, infrastructure devices, anything else off-limits) and a &lt;code&gt;tenantAllocation&lt;/code&gt; block that bounds the actual allocatable sub-range. Per-tenant defaults specify how many node IPs and load balancer IPs a new tenant gets; the defaults are 5 node IPs and 8 LB IPs per tenant.&lt;/p&gt;

&lt;p&gt;IPAllocation CRDs represent individual allocations with a Pending to Allocated lifecycle, one allocation per type (nodes or load balancer) per tenant cluster. The NetworkPool controller is the sole allocator and uses a best-fit search across the pool's free blocks. No distributed locks are needed because controller-runtime never reconciles the same pool concurrently. Allocations are returned in CIDR notation when the range is a power-of-2 size aligned on its own boundary, or as &lt;code&gt;start-end&lt;/code&gt; when it isn't.&lt;/p&gt;
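
&lt;p&gt;Best-fit allocation over a pool's free blocks can be sketched with Python's standard &lt;code&gt;ipaddress&lt;/code&gt; module. This illustrates the technique described above, not the NetworkPool controller's code, and it simplifies the inputs. The allocatable sub-range is a single (first, last) pair. Reservations and existing allocations arrive as inclusive (start, end) pairs inside it.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import ipaddress
from ipaddress import IPv4Address
from typing import Iterator, Optional

Range = tuple[IPv4Address, IPv4Address]  # inclusive (start, end)


def free_blocks(first: IPv4Address, last: IPv4Address, used: list[Range]) -&gt; Iterator[Range]:
    """Yield the free (start, end) ranges in [first, last] not covered by `used`.

    Assumes every used range (reservations plus allocations) lies inside [first, last].
    """
    cursor = int(first)
    for start, end in sorted(used):
        if int(start) &gt; cursor:
            yield IPv4Address(cursor), IPv4Address(int(start) - 1)
        cursor = max(cursor, int(end) + 1)
    if int(last) &gt;= cursor:
        yield IPv4Address(cursor), last


def best_fit(first: IPv4Address, last: IPv4Address, used: list[Range], count: int) -&gt; Optional[Range]:
    """Allocate `count` addresses from the smallest free block that can hold them."""
    candidates = [
        (int(end) - int(start) + 1, start)
        for start, end in free_blocks(first, last, used)
        if int(end) - int(start) + 1 &gt;= count
    ]
    if not candidates:
        return None  # pool exhausted or too fragmented for this request
    _, start = min(candidates)  # smallest adequate block; ties go to the lower address
    return start, IPv4Address(int(start) + count - 1)


def render(block: Range) -&gt; str:
    """CIDR notation when the block is one aligned power-of-2 network, else start-end."""
    networks = list(ipaddress.summarize_address_range(*block))
    if len(networks) == 1:
        return str(networks[0])
    return f"{block[0]}-{block[1]}"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;A request for one address lands in a single-address gap instead of splitting a larger block. That preference keeps large contiguous ranges available for later requests.&lt;/p&gt;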

&lt;p&gt;NetworkPool status surfaces totalIPs, allocatedIPs, availableIPs, allocationCount, fragmentationPercent, and largestFreeBlock so a platform operator can see exactly how healthy the pool is at any moment. Load balancer pools grow and shrink elastically based on cluster utilization.&lt;/p&gt;

&lt;p&gt;Define your network once, Butler handles allocation for every cluster.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Addon Catalog
&lt;/h2&gt;

&lt;p&gt;Butler bootstraps tenant clusters with a set of reasonable defaults appropriate to the provider. On-prem clusters get a CNI (Cilium), a load balancer (MetalLB), a storage layer (Longhorn), cert-manager, and an ingress controller (Traefik). Cloud clusters skip the layers the cloud already covers natively: managed load balancers and managed ingress, for example, come with the cloud, so installing MetalLB and Traefik on top would be redundant. The defaults are pragmatic per-provider rather than a fixed list applied everywhere. Beyond that, the catalog includes optional addons across more than a dozen categories: observability, GitOps, backup, security, service mesh, databases, DNS, messaging, and more.&lt;/p&gt;

&lt;p&gt;The catalog is yours to customize. Every addon is an AddonDefinition CRD with a chart repository, default version, category, dependencies, and default Helm values. Add your own AddonDefinition resources for internal charts. Remove optional addons you don't use. Change default versions or values. The catalog is a starting point, not a fixed list.&lt;/p&gt;

&lt;p&gt;Installing an addon on a tenant cluster creates a TenantAddon resource. The controller respects dependency ordering (MetalLB waits for Cilium), handles Helm lifecycle (install, upgrade, rollback), and tracks phase progression. You can also install arbitrary Helm charts that aren't in the catalog by specifying the chart details directly in the TenantAddon spec.&lt;/p&gt;
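
&lt;p&gt;Dependency ordering of this kind is a topological sort. The sketch below uses Python's standard &lt;code&gt;graphlib&lt;/code&gt; module (3.9+) over a hypothetical dependency map. Only the MetalLB-waits-for-Cilium edge comes from this post. The other edges are illustrative assumptions, not Butler's AddonDefinition data.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;from graphlib import TopologicalSorter

# Hypothetical addon dependencies: each key lists the addons it must wait for.
# Only "metallb depends on cilium" is taken from the post; the rest is illustrative.
DEPENDENCIES = {
    "cilium": set(),
    "metallb": {"cilium"},
    "cert-manager": {"cilium"},
    "longhorn": {"cilium"},
    "traefik": {"metallb", "cert-manager"},
}


def install_waves(deps: dict[str, set[str]]) -&gt; list[list[str]]:
    """Group addons into waves where every addon's dependencies are already installed.

    Addons within one wave are independent and could be installed in parallel.
    Raises graphlib.CycleError if the dependencies contain a cycle.
    """
    sorter = TopologicalSorter(deps)
    sorter.prepare()
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)  # in a controller, this would follow a healthy install
    return waves
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;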

&lt;h2&gt;
  
  
  Three Ways In
&lt;/h2&gt;

&lt;p&gt;Butler has three interaction modes. They're equals, not a hierarchy. All three create the same CRDs. A cluster created via console is identical to one created via CLI or API.&lt;/p&gt;

&lt;h3&gt;
  
  
  Console
&lt;/h3&gt;

&lt;p&gt;The React web UI with real-time WebSocket updates:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Cluster creation wizard. Select provider, Kubernetes version, node sizing, OS, addons. Creates the same TenantCluster resource that kubectl would.&lt;/li&gt;
&lt;li&gt;Real-time cluster status. WebSocket connection for live phase updates, worker node health, addon installation progress.&lt;/li&gt;
&lt;li&gt;Addon management. Enable, disable, and configure addons per cluster. Each action creates or modifies TenantAddon resources.&lt;/li&gt;
&lt;li&gt;In-browser terminal. xterm.js-based kubectl access to tenant clusters directly from the console.&lt;/li&gt;
&lt;li&gt;Team switcher. Switch teams with a click. Every view is scoped to the active team.&lt;/li&gt;
&lt;li&gt;Audit log. Every console and API operation captured with the resolving Butler identity.&lt;/li&gt;
&lt;li&gt;GitOps export. One-click migration from imperative Helm installs to declarative Flux or ArgoCD manifests.&lt;/li&gt;
&lt;li&gt;Identity provider configuration. Presets for Google Workspace, Microsoft Entra ID, and Okta.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjq2xsdjrpmq0b0vvvv82.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjq2xsdjrpmq0b0vvvv82.png" alt=" " width="799" height="405"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  CLI
&lt;/h3&gt;

&lt;p&gt;Two binaries following the kubeadm/kubectl pattern:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;butlerctl&lt;/code&gt; for platform users: cluster operations, addon management, kubeconfig export, GitOps export&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;butleradm&lt;/code&gt; for platform operators: bootstrap, upgrade, provider and addon catalog management&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;code&gt;butlerctl login --server&lt;/code&gt; runs the OAuth device flow against butler-server and writes a kubeconfig scoped to your team and role. After login, the CLI talks directly to the Kubernetes API on the management cluster. The scoped kubeconfig means CLI operations enforce the same Butler RBAC as the console.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Login&lt;/span&gt;
butlerctl login &lt;span class="nt"&gt;--server&lt;/span&gt; https://butler.example.com

&lt;span class="c"&gt;# Create a cluster&lt;/span&gt;
butlerctl cluster create &lt;span class="nt"&gt;--name&lt;/span&gt; my-cluster &lt;span class="nt"&gt;--kubernetes-version&lt;/span&gt; v1.32.2

&lt;span class="c"&gt;# Get the kubeconfig&lt;/span&gt;
butlerctl cluster kubeconfig &lt;span class="nt"&gt;--name&lt;/span&gt; my-cluster &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; kubeconfig.yaml

&lt;span class="c"&gt;# Scale workers&lt;/span&gt;
butlerctl cluster scale &lt;span class="nt"&gt;--name&lt;/span&gt; my-cluster &lt;span class="nt"&gt;--workers&lt;/span&gt; 5

&lt;span class="c"&gt;# Enable an addon&lt;/span&gt;
butlerctl addon &lt;span class="nb"&gt;enable&lt;/span&gt; &lt;span class="nt"&gt;--cluster&lt;/span&gt; my-cluster &lt;span class="nt"&gt;--addon&lt;/span&gt; longhorn
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Server API
&lt;/h3&gt;

&lt;p&gt;The REST + WebSocket API backs the console, but it's also a first-class integration point. CI/CD pipelines, billing systems, CRM integrations, Slack bots, anything that speaks HTTP can create and manage clusters. The API uses the same JWT authentication and team-scoped RBAC as the console.&lt;/p&gt;

&lt;h2&gt;
  
  
  GitOps Export
&lt;/h2&gt;

&lt;p&gt;You have a tenant cluster running Helm releases. Some were installed by Butler, some by your team, some by a script someone wrote six months ago. You want to move to GitOps. Normally that means manually writing Flux HelmRelease manifests for every chart, getting the versions and values right, structuring your repo, and hoping you didn't miss anything.&lt;/p&gt;

&lt;p&gt;Butler's GitOps export automates the entire thing. Enable GitOps on a cluster from the console. Butler installs Flux on the target cluster, then queries Helm's secret storage to discover every release running there. Not just Butler-managed addons. Every Helm release, regardless of how it was installed. For releases that match Butler's addon catalog, it pulls repo URLs and categories automatically. For everything else, it extracts metadata from the chart itself. Either way, you get complete Flux HelmRelease, HelmRepository, and Kustomization manifests in a clean directory structure. ArgoCD Application manifests too, if that's your engine.&lt;/p&gt;
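
&lt;p&gt;Helm 3's default storage driver keeps each release revision in a Secret of type &lt;code&gt;helm.sh/release.v1&lt;/code&gt;. The release record is gzipped JSON that Helm base64-encodes, and the Kubernetes API base64-encodes it again as Secret data. Below is a minimal sketch of turning one such Secret into a Flux &lt;code&gt;HelmRelease&lt;/code&gt;. It is not Butler's exporter. The Secret is assumed to be already fetched as a dict, the &lt;code&gt;HelmRepository&lt;/code&gt; name and the &lt;code&gt;10m&lt;/code&gt; interval are hypothetical choices, and only the chart name, version, and user-supplied values are carried over.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import base64
import gzip
import json

GZIP_MAGIC = b"\x1f\x8b\x08"


def decode_release(secret: dict) -&gt; dict:
    """Decode a Helm 3 release record from a Secret as returned by the Kubernetes API.

    The API base64-encodes Secret data, and Helm stores base64(gzip(json)),
    so the payload is base64-decoded twice before decompressing.
    """
    helm_payload = base64.b64decode(secret["data"]["release"])
    compressed = base64.b64decode(helm_payload)
    raw = gzip.decompress(compressed) if compressed[:3] == GZIP_MAGIC else compressed
    return json.loads(raw)


def to_helm_release(release: dict, repository_name: str) -&gt; dict:
    """Build a Flux HelmRelease manifest (as a dict) for a decoded Helm release."""
    chart = release["chart"]["metadata"]
    return {
        "apiVersion": "helm.toolkit.fluxcd.io/v2",
        "kind": "HelmRelease",
        "metadata": {"name": release["name"], "namespace": release["namespace"]},
        "spec": {
            "interval": "10m",  # hypothetical reconcile interval
            "chart": {
                "spec": {
                    "chart": chart["name"],
                    "version": chart["version"],
                    "sourceRef": {"kind": "HelmRepository", "name": repository_name},
                }
            },
            # "config" holds only the user-supplied values, not the chart defaults.
            "values": release.get("config") or {},
        },
    }
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Finding the candidate Secrets is a label query. Helm labels them with &lt;code&gt;owner=helm&lt;/code&gt;, &lt;code&gt;name&lt;/code&gt;, &lt;code&gt;status&lt;/code&gt;, and &lt;code&gt;version&lt;/code&gt;, so a selector such as &lt;code&gt;owner=helm,status=deployed&lt;/code&gt; returns the currently deployed revision of each release.&lt;/p&gt;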

&lt;p&gt;Preview before committing. Export one release or bulk migrate everything at once. Commit directly to your Git repo or open a pull request. Supports GitHub, GitLab, and Bitbucket. From that point on, the cluster reconciles from Git. The same flow works for the management cluster itself.&lt;/p&gt;

&lt;p&gt;The state model being CRDs is what makes the export clean: nothing has to be translated out of a proprietary database. What lands in Git is portable YAML.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fihhzd1xq5wk9sve84spe.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fihhzd1xq5wk9sve84spe.png" alt=" " width="800" height="405"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  A Note on Licensing
&lt;/h2&gt;

&lt;p&gt;Multi-cluster Kubernetes platforms exist already, and several of them solve real engineering problems. The pattern we kept running into wasn't whether the engineering worked. It was the licensing.&lt;/p&gt;

&lt;p&gt;The open-core model has become the dominant business pattern in this space. A vendor open-sources a core project, builds a community around it, and over time moves the features that platform engineers actually need (SSO, RBAC, multi-tenancy, advanced exposure modes, fleet management at scale) behind a commercial license. The license itself comes in a half-dozen flavors: a per-node fee, a per-cluster fee, a per-control-plane fee, a "platform" tier that's quoted on a sales call, a "support subscription" that's required for any production use, a "free for development" tier with a hidden CPU-core cap. Each new release brings either a new gated feature or a re-tiering of an existing one.&lt;/p&gt;

&lt;p&gt;The features that get gated aren't novelties. They're the table-stakes capabilities for running Kubernetes in any organization with more than one team: identity sync, role-based access control, audit logging, hosted control planes that scale past trivial cluster counts, advanced ingress modes, and so on. Engineers know these are necessary. Procurement teams discover the cost only after the platform is already deployed and migrating away has become expensive.&lt;/p&gt;

&lt;p&gt;Butler is built the other way around. The features you'd need to operate Butler at any reasonable scale are in the Apache 2.0 source: SSO, RBAC, resource quotas, multi-tenancy, hosted control planes with all exposure modes, the addon catalog, the GitOps export. There is no enterprise tier. There is no "platform" SKU you find out about during evaluation. The code on GitHub is the code we run.&lt;/p&gt;

&lt;p&gt;Butler Labs isn't a product company. We're a boutique consultancy, a platform engineering accelerator. We help organizations build a platform engineering practice end to end: assessment, strategy, accelerated deployment, day-2 operations. Butler, Butler Portal, and the broader ecosystem we build are tools that accelerate the methodology; the practice is the point. The platforms underneath stay open-source because they're toolkits, not product lines.&lt;/p&gt;

&lt;h2&gt;
  
  
  Current State
&lt;/h2&gt;

&lt;p&gt;Honest accounting of where we are.&lt;/p&gt;

&lt;h3&gt;
  
  
  What's Running Today
&lt;/h3&gt;

&lt;p&gt;Butler Labs runs Butler in production on Harvester. Two enterprise customers run it in production on Nutanix.&lt;/p&gt;

&lt;p&gt;Provider support matrix today:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Harvester:&lt;/strong&gt; fully stable for management and tenant clusters&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Nutanix:&lt;/strong&gt; fully stable for management and tenant clusters&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;AWS, GCP, Azure:&lt;/strong&gt; stable for bootstrapping the management cluster; tenant cluster provisioning in development&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Proxmox, VMware:&lt;/strong&gt; in development for both management and tenant clusters&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Beyond infrastructure providers:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Hosted control planes via Steward&lt;/strong&gt;: LoadBalancer, Ingress, and Gateway API exposure modes&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Talos tenant workers on hosted control planes&lt;/strong&gt;: a steward-trustd sidecar is attached per tenant when the workers run Talos&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Web console&lt;/strong&gt;: cluster management, addon lifecycle, in-browser terminal, team management, audit log, certificate rotation status, GitOps export, admin dashboard&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;CLI&lt;/strong&gt;: &lt;code&gt;butlerctl&lt;/code&gt; and &lt;code&gt;butleradm&lt;/code&gt;. &lt;code&gt;butlerctl login&lt;/code&gt; runs the OAuth device flow and produces a scoped kubeconfig. &lt;code&gt;butleradm&lt;/code&gt; handles bootstrap, interactively via &lt;code&gt;butleradm tui&lt;/code&gt; (tab 0) or headlessly via &lt;code&gt;butleradm bootstrap --config&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;SSO&lt;/strong&gt;: OIDC with Google Workspace, Microsoft Entra, Okta, or any other OIDC provider. Internal users are also supported through an invite flow.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Multi-tenancy&lt;/strong&gt;: Teams with named environments (dev/stage/prod/sandboxes), three team-level roles (admin/operator/viewer) plus two platform-level roles (admin, and viewer, also called shadow mode), resource quotas enforced by admission webhooks, and OIDC group sync&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Addon catalog&lt;/strong&gt;: baseline addons chosen per provider (on-prem clusters get CNI, load balancer, storage, ingress, cert-manager; cloud clusters skip what the cloud already covers natively), optional addons across more than a dozen categories&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Management cluster bootstrap&lt;/strong&gt;: &lt;code&gt;butleradm tui&lt;/code&gt; (interactive dashboard, tab 0 is the wizard) or &lt;code&gt;butleradm bootstrap --config&lt;/code&gt; (headless from a file). Both drive the same orchestrator.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Immutable infrastructure&lt;/strong&gt;: Talos Linux management nodes, VM-replacement worker upgrades&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Built-in IPAM&lt;/strong&gt;: NetworkPool / IPAllocation CRDs with best-fit allocation and elastic LB pools&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;GitOps export&lt;/strong&gt;: one-command migration from imperative Helm installs to declarative Flux or ArgoCD manifests&lt;/li&gt;
&lt;/ul&gt;
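&lt;p&gt;The IPAM bullet mentions best-fit allocation without describing how Butler's controller implements it, so here is a generic, hypothetical Python sketch of the technique rather than Butler's code: compute the free gaps in a pool from its existing allocations, then carve the request out of the smallest gap that can hold it, which leaves larger contiguous gaps for later, larger requests. The IPv4 pool bounds, the &lt;code&gt;(start, end)&lt;/code&gt; string tuples, and the function names are assumptions for illustration, not the &lt;code&gt;NetworkPool&lt;/code&gt; or &lt;code&gt;IPAllocation&lt;/code&gt; schema.&lt;/p&gt;

```python
import ipaddress


def _ip(addr):
    """Convert a dotted IPv4 string to an integer for range arithmetic."""
    return int(ipaddress.IPv4Address(addr))


def free_ranges(pool_start, pool_end, allocated):
    """Return inclusive (start, end) integer gaps in the pool not covered by `allocated`.

    Assumes every allocated range lies inside the pool; overlapping ranges are tolerated.
    """
    cursor, end = _ip(pool_start), _ip(pool_end)
    gaps = []
    for a_start, a_end in sorted((_ip(s), _ip(e)) for s, e in allocated):
        if a_start > cursor:
            gaps.append((cursor, a_start - 1))
        cursor = max(cursor, a_end + 1)
    if end >= cursor:
        gaps.append((cursor, end))
    return gaps


def best_fit(pool_start, pool_end, allocated, count):
    """Allocate `count` consecutive addresses from the smallest free gap that fits."""
    fits = [g for g in free_ranges(pool_start, pool_end, allocated) if g[1] - g[0] + 1 >= count]
    if not fits:
        raise ValueError("no free range of %d addresses in pool" % count)
    # min() returns the first smallest gap, i.e. the lowest-addressed one on ties.
    start, _ = min(fits, key=lambda g: g[1] - g[0])
    return str(ipaddress.IPv4Address(start)), str(ipaddress.IPv4Address(start + count - 1))


if __name__ == "__main__":
    pool = ("10.0.0.10", "10.0.0.50")
    taken = [("10.0.0.10", "10.0.0.19"), ("10.0.0.25", "10.0.0.29")]
    print(best_fit(*pool, taken, 4))  # ('10.0.0.20', '10.0.0.23')
    print(best_fit(*pool, taken, 8))  # ('10.0.0.30', '10.0.0.37')
```

&lt;p&gt;In the usage example, the 4-address request lands in the 5-address gap between the two existing allocations, while the 8-address request skips that gap and takes the start of the larger one.&lt;/p&gt;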

&lt;h3&gt;
  
  
  What's Next
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Importing existing clusters&lt;/strong&gt;: bring clusters that were provisioned outside Butler under management without rebuilding them, so existing fleets can be brought onto the platform incrementally&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Standalone (non-hosted) control plane support&lt;/strong&gt;: an option to provision tenant clusters with full control planes deployed on dedicated nodes, for the teams that want it. Hosted control planes via Steward remain the default; standalone is for the cases where the resource-efficiency tradeoff goes the other way.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Cluster templates&lt;/strong&gt;: reusable cluster configurations via ClusterClass for standardized provisioning&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Tenant cluster support on AWS, GCP, Azure&lt;/strong&gt;: management cluster bootstrap is stable today; tenant provisioning is the next thing in flight&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Backup and restore&lt;/strong&gt;: scheduled cluster backups with point-in-time recovery&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;We're building in public. The code is on GitHub now, not "coming soon." You can watch our progress, read our commits, and see our TODOs. Every repository is public and accepting issues.&lt;/p&gt;

&lt;h2&gt;
  
  
  Get Involved
&lt;/h2&gt;

&lt;p&gt;All Butler repositories are open source under Apache 2.0. Here's how to get started:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Try it.&lt;/strong&gt; Follow the &lt;a href="https://docs.butlerlabs.dev/getting-started" rel="noopener noreferrer"&gt;getting started guide&lt;/a&gt; to bootstrap a management cluster and provision your first tenant cluster.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Read the code.&lt;/strong&gt; Every component is public on GitHub: butler-controller, butler-server, butler-console, butler-cli, butler-charts, steward, steward-trustd, and the rest.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;File issues.&lt;/strong&gt; Found a bug? Have a feature request? Open an issue on the relevant repo.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Join the discussion.&lt;/strong&gt; Bring architecture questions, use cases, and feedback to our Discord server or GitHub Discussions.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Butler Labs:&lt;/strong&gt; &lt;a href="https://butlerlabs.dev" rel="noopener noreferrer"&gt;https://butlerlabs.dev&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;GitHub:&lt;/strong&gt; &lt;a href="https://github.com/butlerdotdev" rel="noopener noreferrer"&gt;https://github.com/butlerdotdev&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Steward:&lt;/strong&gt; &lt;a href="https://github.com/butlerdotdev/steward" rel="noopener noreferrer"&gt;https://github.com/butlerdotdev/steward&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;steward-trustd:&lt;/strong&gt; &lt;a href="https://github.com/butlerdotdev/steward-trustd" rel="noopener noreferrer"&gt;https://github.com/butlerdotdev/steward-trustd&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Docs:&lt;/strong&gt; &lt;a href="https://docs.butlerlabs.dev" rel="noopener noreferrer"&gt;https://docs.butlerlabs.dev&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Discord:&lt;/strong&gt; &lt;a href="https://discord.gg/cAzWG9qz3K" rel="noopener noreferrer"&gt;https://discord.gg/cAzWG9qz3K&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;To keep up with the architecture deep dives as they ship (the TenantCluster reconciler internals, CAPI provider integration patterns, addon lifecycle, and the console experience), follow the &lt;a href="https://butlerlabs.dev/blog" rel="noopener noreferrer"&gt;butlerlabs.dev/blog&lt;/a&gt; feed.&lt;/p&gt;

</description>
      <category>platformengineering</category>
      <category>opensource</category>
      <category>kubernetes</category>
      <category>cloudnative</category>
    </item>
  </channel>
</rss>
