Every DevOps engineer loves options.
Ingress Controller? "Let's evaluate Nginx, Traefik, HAProxy, Contour, and the AWS one."
Service Mesh? "Istio vs Linkerd vs Cilium."
Secrets? "Vault vs AWS Secrets Manager vs SOPS vs Sealed Secrets."
IaC? "Terraform vs Pulumi vs CDK vs Crossplane."
We call this "flexibility." But let's be honest about what it actually produces: drift.
After 15 years in DevOps, I've watched the same pattern repeat across dozens of companies:
- A small team (1-5 DevOps engineers) starts building an "Internal Developer Platform"
- They spend 12-18 months stitching together tools
- Every engineer picks their favorite for each layer
- By the time it's "done," half the choices are outdated
- Nobody dares touch it because nobody understands the full picture
- The original engineer leaves
- The new hire starts over
This is not platform engineering. This is infrastructure archaeology.
## The Factory Model
I started thinking about this differently when I looked at how actual factories work.
A car factory doesn't ask: "What kind of welding robot do you prefer?" It has one welding robot. It's tested. It's maintained. It produces consistent output.
The factory's value isn't in choice. It's in consistency and speed.
So I applied this to Kubernetes infrastructure:
## The Rules
### 1. Null-Choice Architecture
You don't pick your ingress controller. You don't pick your GitOps tool. You don't pick your policy engine.
The stack is:
- ALB via AWS Load Balancer Controller (not Nginx)
- ArgoCD for GitOps (not Flux)
- Kyverno for policy (not OPA/Gatekeeper)
- Karpenter for autoscaling (not Managed Node Groups)
- External Secrets + AWS Secrets Manager (not Vault)
- OpenTofu for IaC (not Terraform, due to licensing)
"But what if I want Nginx?"
Then this approach isn't for you. And that's okay. The constraint is the feature.
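To make "null choice" concrete, here is roughly what secrets consumption looks like in this stack: one pattern, an `ExternalSecret` that syncs from AWS Secrets Manager into a Kubernetes Secret. This is a sketch, not the product's actual manifest; the store name, namespace, and Secrets Manager key are hypothetical.

```yaml
# Sketch: the single secrets pattern in this stack.
# The ClusterSecretStore name ("aws-secrets-manager") and the
# Secrets Manager key ("prod/app/database") are hypothetical examples.
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: app-database
  namespace: app
spec:
  refreshInterval: 1h
  secretStoreRef:
    name: aws-secrets-manager
    kind: ClusterSecretStore
  target:
    name: app-database          # Kubernetes Secret created by the operator
  data:
    - secretKey: DATABASE_URL
      remoteRef:
        key: prod/app/database  # entry in AWS Secrets Manager
```

Developers never decide between Vault sidecars, CSI drivers, or env injection; there is exactly one way to get a secret into a pod.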
### 2. Glass Box, Not Black Box
This isn't a PaaS. You're not renting a platform from someone.
You own the Git repo. You own the OpenTofu modules. You own the Helm charts. You own the AWS account. Everything is visible, auditable, inspectable.
The difference from a PaaS:
- Black Box (Heroku, Vercel): "Trust us, it works." → Until it doesn't, and you can't debug it.
- Glass Box: "Here's exactly how it works. Don't modify it, but you can see everything."
If the vendor disappears tomorrow, your infrastructure keeps running.
### 3. Layered Architecture
The entire stack is split into four independent layers:

- **Layer 0 — Bootstrap:** S3 state bucket, DynamoDB lock table, IAM trust role. Run once, never touch again.
- **Layer 1 — Network:** VPC, Route53, ACM certificates. Persistent; survives cluster replacement.
- **Layer 2 — Data:** Aurora Serverless v2, ECR, S3, EFS. Persistent; survives cluster replacement.
- **Layer 3 — Cluster:** EKS, Karpenter, ArgoCD, Kyverno, addons. **Replaceable:** destroy and rebuild it without data loss.
The key insight: Layer 3 is disposable.
Instead of painful in-place EKS upgrades, you destroy the cluster layer and rebuild it. Your databases, DNS, and certificates stay untouched. ArgoCD re-syncs all workloads automatically.
Layers are connected by data sources and tags — no `terraform_remote_state`, no hardcoded IDs.
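The claim that ArgoCD re-syncs all workloads after a rebuild rests on automated sync being enabled on the root application. A minimal sketch of that entry point (the repo URL and path are placeholders, not the actual product repo):

```yaml
# Sketch: an app-of-apps root Application with automated sync,
# so a freshly rebuilt cluster converges to Git state on its own.
# The repoURL and path below are placeholders.
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: platform-apps
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/example-org/platform-gitops
    targetRevision: main
    path: apps
  destination:
    server: https://kubernetes.default.svc
    namespace: argocd
  syncPolicy:
    automated:
      prune: true     # delete resources removed from Git
      selfHeal: true  # revert manual drift in the cluster
```

With `prune` and `selfHeal` on, the Git repo is the only source of truth: a brand-new Layer 3 cluster pointed at this Application rebuilds every workload without manual intervention.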
### 4. Kyverno is Law
The factory has rules. If your deployment doesn't follow them, it gets rejected:
```yaml
# No deployment without resource limits
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: require-limits
spec:
  validationFailureAction: Enforce
  rules:
    - name: check-limits
      match:
        any:
          - resources:
              kinds:
                - Pod
      validate:
        message: "CPU and memory limits are required."
        pattern:
          spec:
            containers:
              - resources:
                  limits:
                    memory: "?*"
                    cpu: "?*"
```
No limits? Rejected. No liveness probe? Rejected. Image from untrusted registry? Rejected.
This isn't optional. This is how the factory works. Developers learn the rules once and never think about them again.
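For instance, the "untrusted registry" rule mentioned above could look roughly like this — a sketch in the same style as the limits policy, where the ECR registry prefix is a placeholder for your own account and region:

```yaml
# Sketch: reject any Pod whose images don't come from the approved ECR.
# The registry prefix below is a placeholder account/region.
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: restrict-image-registries
spec:
  validationFailureAction: Enforce
  rules:
    - name: trusted-registry-only
      match:
        any:
          - resources:
              kinds:
                - Pod
      validate:
        message: "Images must come from the approved ECR registry."
        pattern:
          spec:
            containers:
              - image: "123456789012.dkr.ecr.eu-central-1.amazonaws.com/*"
```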
## The Math
A typical small company spends:

| Item | Monthly cost |
|---|---|
| 2 DevOps engineers (EU average) | ~€15,500 |
| ~60% of their time spent on maintenance | ~€9,300 effectively wasted |
| A pre-built factory subscription | ~€990 |
You're not replacing your engineers. You're freeing them to work on product instead of plumbing.
## When This Doesn't Work
Let me be clear about the limitations:
- Multi-cloud: If you need Azure or GCP, this is AWS-only.
- Custom ingress: If you need Nginx or Traefik, the factory uses ALB.
- Self-managed Prometheus: The factory uses CloudWatch Container Insights.
- 24/7 on-call: The factory vendor provides business-hours support only.
- Massive scale: If you're running 500+ nodes, you probably need a dedicated platform team anyway.
## The Mindset Shift
The hardest part isn't the technology. It's accepting that less choice is more speed.
Every configuration option you add is a configuration option that can drift. Every alternative you support is an alternative you have to test, document, and maintain.
Constraints are not limitations. Constraints are what make the factory predictable.
If this resonates, I built a productized version of this approach called Paved Stack. It's a pre-built EKS platform you deploy into your own AWS account.
Details here → pavedstack.com