DEV Community

Noah Makau

The Disk-Pressure Incident That Taught Me to Always Set LimitRanges and Other Lessons from Mirroring EKS Locally.

Part 6 of 7 — The Mac Kubernetes Lab: A Production-Mirror Setup from Scratch.

Previously in Part 5: We installed Istio with revision-based upgrades, MetalLB for LoadBalancer IPs, and practised traffic management with Gateways, VirtualServices, and fault injection. The cluster behaves. Now we wire up the last three pieces that turn it from “a working local cluster” into “a real mirror of our production EKS.”


The cluster works. Istio is running. MetalLB is handing out IPs. But it’s still missing three layers that make the production parity actually meaningful:

  • Vault Kubernetes auth: pods authenticate to Vault with their service account tokens, the same way they do in production. No hardcoded secrets, no static credentials.
  • Crossplane with the AWS provider: infrastructure compositions you can develop and test locally before they touch real AWS resources (or any other provider; ever thought of OpenStack?).
  • LimitRanges: default resource requests on every namespace. This one comes from a real incident I want to talk about.

The LimitRange story is the most important of the three, so I’ll tell it properly when we get there. First, the auth layer.


Vault Kubernetes auth.

Vault’s Kubernetes auth method lets pods authenticate by presenting their service account JWT. Vault validates the token against the Kubernetes API server and exchanges it for a Vault token with the appropriate policies attached.

On the production EKS clusters at work, this is how microservices retrieve database credentials, API keys, and TLS certificates: no hard-coded secrets, no secret sprawl, every issuance audit-logged in Vault.

Setting it up locally means I can test the full injection workflow without a VPN, and debug failures on a cluster where the stakes are zero.

Vault Kubernetes auth flow: pod presents service account token, Vault validates, pod receives a Vault token.

Installing the Vault agent injector.

We deploy just the Vault agent injector in the lab cluster. It points to the external Vault VM rather than running its own Vault server:

# 💻 Mac
kubectx lab-cluster

helm repo add hashicorp https://helm.releases.hashicorp.com
helm repo update

# Get the vault VM IP - does not persist across sessions
export VAULT_IP=$(orb run -m vault hostname -I | awk '{print $1}')
echo "VAULT_IP=$VAULT_IP"
helm install vault hashicorp/vault \
  --namespace vault --create-namespace \
  --set "injector.externalVaultAddr=http://$VAULT_IP:8200"

kubectl get pods -n vault
# vault-agent-injector-xxx   1/1   Running   0   30s

Configuring K8s auth on Vault.

Run this on the vault VM, pointing Vault at the lab cluster’s API server:

# 🖥️ VM: vault

# Re-export - always required, doesn't persist across sessions
export VAULT_ADDR='http://127.0.0.1:8200'
export VAULT_ROOT_TOKEN=$(grep 'Initial Root Token' ~/vault-init.txt | awk '{print $NF}')

# If Vault is sealed after a reboot:
# vault operator unseal $(grep 'Unseal Key 1' ~/vault-init.txt | awk '{print $NF}')
vault login $VAULT_ROOT_TOKEN

# Get CP_IP from the Mac terminal: orb run -m cp01 hostname -I | awk '{print $1}'
export CP_IP="<cp01-ip>"   # replace the placeholder with the IP printed on the Mac

# Regenerate the CA cert if /tmp was cleared after reboot
vault read -field=certificate pki_k8s/issuer/default > /tmp/lab-ca.crt

# Enable Kubernetes auth (safe to re-run - ignores "already enabled")
vault auth enable -path=lab-k8s kubernetes 2>/dev/null || echo "already enabled"

# Configure - point Vault at the lab cluster API server
vault write auth/lab-k8s/config \
  kubernetes_host="https://$CP_IP:6443" \
  kubernetes_ca_cert=@/tmp/lab-ca.crt

vault read auth/lab-k8s/config
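One thing to watch with an external Vault: the config above sets no token_reviewer_jwt, so, as I understand Vault's behaviour, it validates each login by calling the Kubernetes TokenReview API with the pod's own JWT. That call only succeeds if the pod's service account is bound to the built-in system:auth-delegator ClusterRole. If logins fail with a permission error, a binding like the one below is one possible fix. This is a sketch under that assumption: render_auth_delegator_binding is a hypothetical helper name, and the binding name it generates is arbitrary.

```shell
# Print a ClusterRoleBinding that lets one service account call TokenReview.
# Arguments: service account name, namespace.
# Assumption: Vault reviews tokens with the client's own JWT (no token_reviewer_jwt set).
render_auth_delegator_binding() {
  cat <<EOF
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: ${1}-${2}-auth-delegator
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: system:auth-delegator
subjects:
- kind: ServiceAccount
  name: ${1}
  namespace: ${2}
EOF
}

# Hypothetical usage on the Mac, for the myapp service account used in the test below:
#   render_auth_delegator_binding myapp default | kubectl apply -f -
```

The alternative is to create a dedicated reviewer service account and pass its token to Vault as token_reviewer_jwt, which keeps the delegator permission off application service accounts.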

Testing K8s auth.

Create a simple role and test it from a pod:

# 🖥️ VM: vault

# Create a policy
vault policy write read-secrets - <<EOF
path "secret/data/myapp/*" {
  capabilities = ["read"]
}
EOF
# Create a K8s auth role
vault write auth/lab-k8s/role/myapp \
  bound_service_account_names=myapp \
  bound_service_account_namespaces=default \
  policies=read-secrets \
  ttl=1h

# Write a test secret
vault secrets enable -path=secret kv-v2 2>/dev/null || true
vault kv put secret/myapp/config db_password="supersecret"

# 💻 Mac: deploy a pod with Vault annotations
kubectl apply -f - <<EOF
apiVersion: v1
kind: ServiceAccount
metadata:
  name: myapp
  namespace: default
---
apiVersion: v1
kind: Pod
metadata:
  name: vault-test
  namespace: default
  annotations:
    vault.hashicorp.com/agent-inject: "true"
    vault.hashicorp.com/role: "myapp"
    vault.hashicorp.com/agent-inject-secret-config: "secret/data/myapp/config"
spec:
  serviceAccountName: myapp
  containers:
  - name: app
    image: busybox
    command: ["sleep", "3600"]
EOF

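# Wait for the injected pod to become Ready, then check the secret
kubectl wait --for=condition=Ready pod/vault-test --timeout=120s
kubectl exec vault-test -c app -- cat /vault/secrets/config
# With no custom template, the injector's default template prints the raw KV v2 maps:
# data: map[db_password:supersecret]
# metadata: map[...]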

If that output contains the password, the whole chain works: service account JWT → Vault validation → Vault token → secret retrieval → file injection. Each link in that chain is exactly what a real production app goes through.
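When the injected file does not appear, it can help to exercise the login step by hand instead of going through the injector. A minimal sketch, assuming curl is available on the Mac and VAULT_IP is still exported from earlier. vault_login_payload is a hypothetical helper name; the request itself is a POST to the login endpoint of the lab-k8s auth mount, with role and jwt fields in the body.

```shell
# Build the JSON body for a Kubernetes auth login request.
# Arguments: Vault role name, service account JWT.
# JWTs are base64url segments joined by dots, so no JSON escaping is needed.
vault_login_payload() {
  printf '{"role": "%s", "jwt": "%s"}\n' "$1" "$2"
}

# Hypothetical usage on the Mac (not run here):
#   JWT=$(kubectl create token myapp -n default)
#   vault_login_payload myapp "$JWT" |
#     curl -s --request POST --data @- \
#       "http://$VAULT_IP:8200/v1/auth/lab-k8s/login"
```

A successful response carries a client token under auth.client_token. An error at this step points at the role bindings or the auth config rather than at the injector.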


Crossplane.

Crossplane turns a Kubernetes cluster into a universal control plane for cloud infrastructure. Instead of Terraform modules or CloudFormation stacks, you define infrastructure as Kubernetes custom resources, and Crossplane reconciles them continuously.

I use it at work to provision AWS resources (EKS node groups, RDS, S3 buckets, IAM roles) and VMware Cloud Director resources through a custom provider. The lab version mirrors the AWS side of that.

Crossplane composites: an XR triggers a Composition that creates multiple Managed Resources across cloud providers.

Installation:

# 💻 Mac
helm repo add crossplane-stable https://charts.crossplane.io/stable
helm repo update

# Composition Functions are enabled by default in recent versions.
# The --enable-composition-functions flag was removed.
helm install crossplane crossplane-stable/crossplane \
  --namespace crossplane-system --create-namespace

kubectl get pods -n crossplane-system -w
# NAME                                       READY   STATUS    AGE
# crossplane-xxx                             1/1     Running   60s
# crossplane-rbac-manager-xxx               1/1     Running   60s

Installing the AWS provider

# 💻 Mac
kubectl apply -f - <<EOF
apiVersion: pkg.crossplane.io/v1
kind: Provider
metadata:
  name: provider-aws-ec2
spec:
  package: xpkg.upbound.io/upbound/provider-aws-ec2:latest
EOF

kubectl get pkg
# NAME               INSTALLED   HEALTHY   PACKAGE                                AGE
# provider-aws-ec2   True        True      xpkg.upbound.io/upbound/provider-...   60s

A minimal ProviderConfig:

This is the bare minimum needed to verify that the install is working:

# 💻 Mac
kubectl apply -f - <<EOF
apiVersion: aws.upbound.io/v1beta1
kind: ProviderConfig
metadata:
  name: default
spec:
  credentials:
    source: Secret
    secretRef:
      namespace: crossplane-system
      name: aws-creds
      key: creds
EOF

In a real setup, you would use IRSA (IAM Roles for Service Accounts) to authenticate the provider and grant it permission to create and monitor resources. For local validation, it is enough that the provider installs and that compositions can be validated structurally, without ever calling AWS.
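The ProviderConfig above points at a Secret named aws-creds with a key called creds, which has to exist before the provider can read it. A minimal sketch of one way to create it, assuming the provider reads the standard AWS shared-credentials INI format. render_aws_creds is a hypothetical helper name, and the values shown in the usage are deliberately fake placeholders that are only good for structural checks.

```shell
# Print an AWS shared-credentials style file for the "default" profile.
# Arguments: access key id, secret access key (fake placeholders are fine locally).
render_aws_creds() {
  printf '[default]\naws_access_key_id = %s\naws_secret_access_key = %s\n' "$1" "$2"
}

# Hypothetical usage on the Mac (not run here):
#   render_aws_creds "FAKEACCESSKEYID" "fake-secret-key" > aws-creds.txt
#   kubectl create secret generic aws-creds \
#     -n crossplane-system --from-file=creds=./aws-creds.txt
```

With placeholder credentials, any managed resource that actually reaches AWS will fail to authenticate, which is the intended safety net for a lab that should never touch a real account.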

The LimitRange story.

This is the one that came from a real incident at work.

We had repeated disk-pressure events in our production EKS cluster. Pods with no resource requests had crept into a few namespaces — someone deployed a YAML that omitted resources: entirely, and nobody caught it in review. The Kubernetes scheduler had no signal about their consumption, so nodes ended up overcommitted. Then ephemeral storage filled up, eviction kicked in, and a couple of unrelated pods went down with it. Total downtime measured in tens of minutes. Cause-and-effect chain that took a while to untangle.

The fix is one of the most boring features in Kubernetes: LimitRanges. They set default resource requests and limits at the namespace level. Any container that doesn’t specify its own requests gets the defaults applied automatically by the admission controller. The scheduler always has a signal. Overcommit becomes a deliberate choice, not an accident.

# 💻 Mac
kubectl apply -f - <<EOF
apiVersion: v1
kind: LimitRange
metadata:
  name: default-limits
  namespace: default
spec:
  limits:
  - default:
      memory: 512Mi
      cpu: 500m
    defaultRequest:
      memory: 128Mi
      cpu: 100m
    max:
      ephemeral-storage: 2Gi
    type: Container
EOF

Apply this to every namespace that hosts workloads. In production, I now apply it as a post-provisioning step on every new namespace:

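# 💻 Mac: apply to multiple namespaces
# limitrange.yaml is the manifest above saved to a file, with the
# metadata.namespace line removed so that -n decides the target namespace
for ns in default vault crossplane-system istio-system; do
  kubectl apply -f limitrange.yaml -n "$ns"
done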

The ephemeral-storage max is the part that specifically addresses the disk-pressure failure mode — it bounds how much scratch space a container can consume, which is what spirals when ephemeral storage runs unbounded.
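Because the LimitRange manifest shown earlier hardcodes namespace: default, kubectl rejects it when -n names a different namespace. One way around that is to render the manifest per namespace. A minimal sketch: render_limitrange is a hypothetical helper name, and the values are copied from the manifest above.

```shell
# Print the LimitRange manifest for a given namespace.
# Argument: target namespace.
render_limitrange() {
  cat <<EOF
apiVersion: v1
kind: LimitRange
metadata:
  name: default-limits
  namespace: ${1}
spec:
  limits:
  - type: Container
    default:
      memory: 512Mi
      cpu: 500m
    defaultRequest:
      memory: 128Mi
      cpu: 100m
    max:
      ephemeral-storage: 2Gi
EOF
}

# Hypothetical usage on the Mac (not run here):
#   for ns in default vault crossplane-system istio-system; do
#     render_limitrange "$ns" | kubectl apply -f -
#   done
```

To confirm the defaults took effect, start a pod with no resources block in one of those namespaces and inspect its spec: the requests and limits should be filled in.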

Verifying the complete EKS mirror.

Let’s confirm the whole stack is up:

# 💻 Mac
kubectl get nodes -o wide
# NAME       STATUS   ROLES           VERSION
# cp01       Ready    control-plane   v1.34.x
# worker01   Ready    <none>          v1.34.x
# worker02   Ready    <none>          v1.34.x

kubectl get pods -A
# Cilium/Calico, CoreDNS, istiod-1-26, MetalLB, Crossplane, Vault injector - all Running

Full EKS mirror stack running: all components healthy.


Local vs production — what’s the same and what differs:

The only meaningful differences are the CNI (because of OrbStack’s VM capabilities, as we covered in Part 4) and the LoadBalancer implementation. Everything else is identical in configuration. The mental model from this lab transfers directly to the production cluster, and vice versa.


In the final article: How to stop and start the lab without losing state, the CKS exam scenarios this cluster was purpose-built for, and the shell aliases that make the whole thing pleasant to live with.



I’m Noah Makau, a DevSecOps engineer based in Nairobi. I run a small DevOps consultancy and hold the CKA, CKAD, and AWS Solutions Architect Professional certifications, and I am currently preparing for CKS. I write about Kubernetes, Vault, Crossplane, and the day-to-day of running platforms that actually have to stay up.
Originally published at blog.arkilasystems.com
