DEV Community

Jozef Polcik

Why Opinionated Beats Flexible: The Factory Model for Kubernetes

Every DevOps engineer loves options.

Ingress Controller? "Let's evaluate Nginx, Traefik, HAProxy, Contour, and the AWS one."
Service Mesh? "Istio vs Linkerd vs Cilium."
Secrets? "Vault vs AWS Secrets Manager vs SOPS vs Sealed Secrets."
IaC? "Terraform vs Pulumi vs CDK vs Crossplane."

We call this "flexibility." But let's be honest about what it actually produces: drift.

After 15 years in DevOps, I've watched the same pattern repeat across dozens of companies:

  1. A small team (1-5 DevOps engineers) starts building an "Internal Developer Platform"
  2. They spend 12-18 months stitching together tools
  3. Every engineer picks their favorite for each layer
  4. By the time it's "done," half the choices are outdated
  5. Nobody dares touch it because nobody understands the full picture
  6. The original engineer leaves
  7. The new hire starts over

This is not platform engineering. This is infrastructure archaeology.

The Factory Model

I started thinking about this differently when I looked at how actual factories work.

A car factory doesn't ask: "What kind of welding robot do you prefer?" It has one welding robot. It's tested. It's maintained. It produces consistent output.

The factory's value isn't in choice. It's in consistency and speed.

So I applied this to Kubernetes infrastructure:

The Rules

1. Null-Choice Architecture

You don't pick your ingress controller. You don't pick your GitOps tool. You don't pick your policy engine.

The stack is:

  • ALB via AWS Load Balancer Controller (not Nginx)
  • ArgoCD for GitOps (not Flux)
  • Kyverno for policy (not OPA/Gatekeeper)
  • Karpenter for autoscaling (not Managed Node Groups)
  • External Secrets + AWS Secrets Manager (not Vault)
  • OpenTofu for IaC (not Terraform, because of its move to the Business Source License)
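
To make the list concrete, here is a rough sketch of what a workload's manifests could look like on such a stack: an Ingress handled by the AWS Load Balancer Controller and an ExternalSecret backed by AWS Secrets Manager. Every name, host, and secret path below is a placeholder, the `alb` IngressClass and the `aws-secrets-manager` ClusterSecretStore are assumed to exist already, and the ExternalSecret API version depends on the External Secrets release installed.

```yaml
# Sketch only: names, hosts, and secret paths are placeholders.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: web
  annotations:
    # Provision an internet-facing ALB that routes straight to pod IPs.
    alb.ingress.kubernetes.io/scheme: internet-facing
    alb.ingress.kubernetes.io/target-type: ip
spec:
  ingressClassName: alb          # assumes an IngressClass named "alb" exists
  rules:
  - host: app.example.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: web
            port:
              number: 80
---
apiVersion: external-secrets.io/v1beta1   # newer releases may serve a different version
kind: ExternalSecret
metadata:
  name: db-credentials
spec:
  refreshInterval: 1h
  secretStoreRef:
    name: aws-secrets-manager    # assumed ClusterSecretStore pointing at Secrets Manager
    kind: ClusterSecretStore
  target:
    name: db-credentials         # Kubernetes Secret that gets created
  data:
  - secretKey: password
    remoteRef:
      key: prod/db               # placeholder secret name in AWS Secrets Manager
      property: password
```

Because there is only one ingress path and one secrets path, every service's manifests end up looking like these two, which is the point of the constraint.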

"But what if I want Nginx?"

Then this approach isn't for you. And that's okay. The constraint is the feature.

2. Glass Box, Not Black Box

This isn't a PaaS. You're not renting a platform from someone.

You own the Git repo. You own the OpenTofu modules. You own the Helm charts. You own the AWS account. Everything is visible, auditable, inspectable.

The difference from a PaaS:

  • Black Box (Heroku, Vercel): "Trust us, it works." → Until it doesn't, and you can't debug it.
  • Glass Box: "Here's exactly how it works. Don't modify it, but you can see everything."

If the vendor disappears tomorrow, your infrastructure keeps running.

3. Layered Architecture

The entire stack is split into 4 independent layers:

Layer 0 — Bootstrap
  S3 state bucket, DynamoDB lock, IAM trust role
  Run once. Never touch again.

Layer 1 — Network
  VPC, Route53, ACM certificates
  Persistent. Survives cluster replacement.

Layer 2 — Data
  Aurora Serverless v2, ECR, S3, EFS
  Persistent. Survives cluster replacement.

Layer 3 — Cluster
  EKS, Karpenter, ArgoCD, Kyverno, addons
  REPLACEABLE. Destroy and rebuild without data loss.

The key insight: Layer 3 is disposable.

Instead of painful in-place EKS upgrades, you destroy the cluster layer and rebuild it. Your databases, DNS, and certificates stay untouched. ArgoCD re-syncs all workloads automatically.

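Layers are connected by data sources and tags: each layer looks up what it needs from AWS (a VPC by its tag, for example) instead of reading another layer's state. No terraform_remote_state, no hardcoded IDs.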
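
The automatic re-sync after a cluster rebuild depends on the workloads being declared as ArgoCD Applications with automated sync enabled. A minimal sketch of one such Application follows; the repository URL, path, and names are placeholders, not the actual layout of any particular setup.

```yaml
# Sketch only: repoURL, path, and names are placeholders.
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: workloads
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://git.example.com/acme/platform-workloads.git
    targetRevision: main
    path: apps
  destination:
    server: https://kubernetes.default.svc   # the cluster ArgoCD itself runs in
    namespace: default
  syncPolicy:
    automated:
      prune: true      # delete resources that were removed from Git
      selfHeal: true   # revert manual changes made in the cluster
    syncOptions:
    - CreateNamespace=true
```

With a policy like this, a freshly built cluster only needs ArgoCD installed and pointed at the repo; it then converges to whatever Git declares without anyone re-applying manifests by hand.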

4. Kyverno is Law

The factory has rules. If your deployment doesn't follow them, it gets rejected:

# No deployment without resource limits
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: require-limits
spec:
  validationFailureAction: Enforce
  rules:
  - name: check-limits
    match:
      any:
      - resources:
          kinds:
          - Pod
    validate:
      message: "CPU and memory limits are required."
      pattern:
        spec:
          containers:
          - resources:
              limits:
                memory: "?*"
                cpu: "?*"

No limits? Rejected. No liveness probe? Rejected. Image from untrusted registry? Rejected.
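
The other two rejections can be written in the same style as the limits policy. This is a sketch under assumptions: the policy names are made up, the registry host is a placeholder ECR address, and the probe check relies on the API server filling in a default periodSeconds for any probe that is defined.

```yaml
# Sketch only: policy names and the registry host are placeholders.
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: require-liveness-probe
spec:
  validationFailureAction: Enforce
  rules:
  - name: check-liveness-probe
    match:
      any:
      - resources:
          kinds:
          - Pod
    validate:
      message: "A liveness probe is required."
      pattern:
        spec:
          containers:
          # A container without a livenessProbe block fails this pattern.
          - livenessProbe:
              periodSeconds: ">0"
---
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: restrict-image-registries
spec:
  validationFailureAction: Enforce
  rules:
  - name: check-registry
    match:
      any:
      - resources:
          kinds:
          - Pod
    validate:
      message: "Images must come from the approved registry."
      pattern:
        spec:
          containers:
          # Placeholder account ID and region; "*" matches any repository and tag.
          - image: "123456789012.dkr.ecr.eu-west-1.amazonaws.com/*"
```

Both patterns only inspect `containers`; covering `initContainers` as well would need an additional rule.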

This isn't optional. This is how the factory works. Developers learn the rules once and never think about them again.

The Math

A typical small company spends:

| Item | Monthly cost |
|---|---|
| 2 DevOps engineers (EU avg) | ~€15,500 |
| ~60% of their time goes to maintenance (0.60 × €15,500) | ~€9,300 wasted |
| A pre-built factory subscription | ~€990 |

You're not replacing your engineers. You're freeing them to work on product instead of plumbing.

When This Doesn't Work

Let me be clear about the limitations:

  • Multi-cloud: If you need Azure or GCP, this is AWS-only.
  • Custom ingress: If you need Nginx or Traefik, the factory uses ALB.
  • Self-managed Prometheus: The factory uses CloudWatch Container Insights.
  • 24/7 on-call: The factory vendor provides business-hours support only.
  • Massive scale: If you're running 500+ nodes, you probably need a dedicated platform team anyway.

The Mindset Shift

The hardest part isn't the technology. It's accepting that less choice is more speed.

Every configuration option you add is a configuration option that can drift. Every alternative you support is an alternative you have to test, document, and maintain.

Constraints are not limitations. Constraints are what make the factory predictable.


If this resonates, I built a productized version of this approach called Paved Stack. It's a pre-built EKS platform you deploy into your own AWS account.

Details here → pavedstack.com
