There's a blog post I've always found frustrating: the kind that shows you a perfect Dockerfile, a clean terraform apply, and a screenshot of everything working on the first try. No errors. No wrong turns.
This isn't that post.
I'm a DevOps and Cloud Engineer — in practice, I do DevOps and SRE work: Kubernetes clusters, CI/CD pipelines, AWS infrastructure. I containerise things regularly. But I'd never sat down and worked through five different stacks back to back, treating each one as a distinct challenge.
So I did. Here's what I built, what broke, and what I learned.
## The Setup
Five demo apps. Each represents a different monolith archetype. Each has a /health endpoint and one meaningful route — simple enough that the app isn't the distraction. AI helped with creating the apps so I could focus on the DevOps aspects of the project.
| App | Stack | What it teaches |
|---|---|---|
| `app-node-api` | Node.js + Express | `node_modules`, `.dockerignore` discipline |
| `app-python-api` | Python + Flask | Slim vs Alpine tradeoffs |
| `app-nestjs-api` | NestJS (TypeScript) | Three-stage builds: deps → compile → runtime |
| `app-react-spa` | React + Nginx | Static asset serving, SPA routing |
| `app-go-service` | Go | Distroless images, static binaries |
The full repo: eks-monolith-migration — IaC, Dockerfiles, k8s manifests, CI/CD, all of it.
## Phase 1: The Dockerfiles
### App 1 — Node.js: The .dockerignore Wake-Up Call
Node.js is where most people make the biggest beginner mistake: not using .dockerignore, so Docker sends your entire node_modules as build context on every build.
My .dockerignore:
```
node_modules
npm-debug.log
.env
.git
```
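A rough way to see the effect without Docker at all: treat "what tar would ship" as a stand-in for the build context. Everything below is synthetic — a dummy `node_modules` blob standing in for real dependencies:

```shell
#!/bin/sh
# Docker-free approximation of build-context size: tar up a toy project
# with and without node_modules. Paths and sizes are synthetic.
set -eu
mkdir -p demo/node_modules demo/src
dd if=/dev/zero of=demo/node_modules/blob.bin bs=1024 count=1024 2>/dev/null  # ~1MiB of "deps"
echo 'console.log("hi")' > demo/src/index.js

full=$(tar -cf - -C demo . | wc -c | tr -d ' ')
pruned=$(tar -cf - -C demo --exclude=node_modules . | wc -c | tr -d ' ')
echo "full context:   ${full} bytes"
echo "pruned context: ${pruned} bytes"
```

On a real project the gap is far larger — `node_modules` is routinely hundreds of megabytes, and Docker re-sends the whole context on every build.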
My Dockerfile: two-stage. Install deps in stage one, copy only what runs in stage two.
```dockerfile
FROM node:24-alpine AS deps
WORKDIR /app
COPY package*.json ./
RUN npm ci --only=production

FROM node:24-alpine AS runtime
WORKDIR /app
COPY --from=deps /app/node_modules ./node_modules
COPY src/ ./src/
ENV NODE_ENV=production
USER node
EXPOSE 3001
CMD ["node", "src/index.js"]
```
USER node is easy to forget and almost never done in tutorials. Running as root inside a container means a container escape gives an attacker root on the host. One line fixes it.
Image size: 235MB vs 1.1GB naive.
### App 2 — Python/Flask: The Alpine Trap
My first instinct: python:3.12-alpine. Alpine is tiny. Tiny equals good.
The problem: Alpine uses musl libc. Many Python packages with C extensions (numpy, psycopg2, cryptography) either have no Alpine-compatible wheels or compile from source — turning a 30-second build into a 5-minute build. Sometimes it just breaks.
Use python:3.12-slim instead. Debian-based, glibc, pre-compiled wheels.
```dockerfile
FROM python:3.12-slim AS builder
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir --prefix=/install -r requirements.txt

FROM python:3.12-slim AS runtime
WORKDIR /app
COPY --from=builder /install /usr/local
COPY app.py .
ENV PYTHONDONTWRITEBYTECODE=1 \
    PYTHONUNBUFFERED=1
RUN useradd -m appuser
USER appuser
EXPOSE 3002
CMD ["gunicorn", "--bind", "0.0.0.0:3002", "app:app"]
```
Two things worth noting:

- `PYTHONUNBUFFERED=1` — without this, stdout is buffered. Your logs go missing in Kubernetes until the buffer flushes.
- `gunicorn` instead of Flask's dev server — the dev server is single-threaded and literally prints "do not use in production" on startup. Take it at its word.
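You can see what the flag actually changes from any shell with `python3` on the PATH — it flips `write_through` on the stdout text layer (a quick local check, not part of the image):

```shell
#!/bin/sh
# Inspect Python's stdout buffering with and without PYTHONUNBUFFERED.
# When stdout is not a TTY (a pipe, a container log driver),
# write_through=False means writes sit in a buffer until it fills.
env -u PYTHONUNBUFFERED python3 -c 'import sys; print("write_through:", sys.stdout.write_through)'
PYTHONUNBUFFERED=1      python3 -c 'import sys; print("write_through:", sys.stdout.write_through)'
```

Run with output piped (as it is under a container runtime) and the flag should flip from `False` to `True` between the two invocations.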
Image size: 186MB.
### App 3 — NestJS: The Three-Stage Build
NestJS is TypeScript. TypeScript compiles to JavaScript. That means your build process is: install deps → compile → run. Three distinct stages.
The trap: carrying your TypeScript compiler, devDependencies, and .ts source files into the production image.
```dockerfile
FROM node:24-alpine AS deps
WORKDIR /app
COPY package*.json ./
RUN npm ci

FROM node:24-alpine AS build
WORKDIR /app
COPY --from=deps /app/node_modules ./node_modules
COPY . .
RUN npm run build

FROM node:24-alpine AS runtime
WORKDIR /app
COPY package*.json ./
RUN npm ci --only=production
COPY --from=build /app/dist ./dist
ENV NODE_ENV=production
USER node
EXPOSE 3003
CMD ["node", "dist/main.js"]
```
The runtime stage does its own npm ci --only=production. It doesn't copy node_modules from the build stage — those include devDependencies. Fresh install, production only, then just the compiled dist/.
Your TypeScript source never touches the production image. Your test runner isn't there. Your type definitions aren't there.
Image size: 295MB.
### App 4 — React + Nginx: The SPA Routing Gotcha
React apps are static files. After npm run build, you have an index.html and some JS bundles. You don't need Node at runtime — you need a web server.
Everyone knows this in theory. Fewer people get the Nginx config right.
The issue: React Router. If a user navigates directly to /about or refreshes on /dashboard, Nginx looks for a file at that path. There isn't one. 404.
The fix is one directive: try_files $uri $uri/ /index.html;
```nginx
server {
    listen 3004;
    root /usr/share/nginx/html;
    index index.html;

    location / {
        try_files $uri $uri/ /index.html;
    }
}
```
```dockerfile
FROM node:24-alpine AS build
WORKDIR /app
COPY package*.json ./
RUN npm ci
COPY . .
RUN npm run build

FROM nginx:alpine AS runtime
COPY --from=build /app/dist /usr/share/nginx/html
COPY nginx/nginx.conf /etc/nginx/conf.d/default.conf
EXPOSE 3004
```
Image size: 74.4MB. Smallest of the five — Nginx Alpine is tiny and we're just serving static files.
### App 5 — Go: The Satisfying One
Go compiles to a single static binary. No runtime, no interpreter, no VM. Just a binary.
This means you can build in one container and copy the binary into a container that has almost nothing — gcr.io/distroless/static.
```dockerfile
FROM golang:1.22-alpine AS builder
WORKDIR /app
COPY go.mod go.sum ./
RUN go mod download
COPY . .
RUN CGO_ENABLED=0 GOOS=linux go build -o server .

FROM gcr.io/distroless/static-debian12 AS runtime
WORKDIR /app
COPY --from=builder /app/server .
EXPOSE 3005
CMD ["/app/server"]
```
CGO_ENABLED=0 disables C bindings. GOOS=linux targets Linux explicitly. Together they ensure the binary is truly static and will run in distroless.
No shell in that container. No ls, no curl, no package manager. kubectl exec into it and try to run bash — nothing. This is the point. Near-zero attack surface.
Image size: 8.88MB.
## The Before vs. After Table
| App | Naive | Optimized |
|---|---|---|
| Node.js API | ~1.1GB | 235MB |
| Python API | ~920MB | 186MB |
| NestJS API | ~1.3GB | 295MB |
| React SPA | ~400MB | 74.4MB |
| Go Service | ~800MB | 8.88MB |
The Go number is not a typo. I was surprised myself.
## Phase 2: Infrastructure with Terraform
Three modules: vpc, eks, ecr.
The VPC creates public subnets for the load balancer, private subnets for worker nodes, and a NAT gateway so private nodes can pull images. Standard layout.
The EKS module provisions a managed node group (t3.medium — the minimum that comfortably runs EKS system pods alongside workloads) and enables OIDC, which is what makes IRSA work.
IRSA — IAM Roles for Service Accounts — is how you give pods AWS permissions without credentials anywhere. An IAM role attaches to a Kubernetes service account. Pods get temporary credentials via OIDC token exchange. No AWS_ACCESS_KEY_ID in your manifests.
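Concretely, the wiring is a single annotation on the Kubernetes service account. A sketch — the role name and account ID below are placeholders, not the repo's actual values:

```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: app-node-api
  namespace: default
  annotations:
    # IRSA: pods using this service account assume this IAM role via the
    # cluster's OIDC provider. The ARN here is a placeholder.
    eks.amazonaws.com/role-arn: arn:aws:iam::123456789012:role/app-node-api-irsa
```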
The ECR module creates five repos with lifecycle policies:
```hcl
policy = jsonencode({
  rules = [{
    rulePriority = 1
    description  = "Keep last 10 images"
    selection = {
      tagStatus   = "any"
      countType   = "imageCountMoreThan"
      countNumber = 10
    }
    action = { type = "expire" }
  }]
})
```
Without lifecycle policies, ECR accumulates images indefinitely. Ten images covers comfortable rollback. More than that is sentiment, not operations.
What broke: The AWS Load Balancer Controller IAM policy. The controller needs a specific IAM policy to provision ALBs. If the IRSA annotation on the controller's service account doesn't match the IAM role ARN exactly, the controller runs but your Ingress resources never get an ADDRESS. I spent an hour on this.
## Phase 3: Kubernetes Manifests
Two things I enforced on every deployment that most tutorials skip:
Resource requests and limits:
```yaml
resources:
  requests:
    memory: "128Mi"
    cpu: "100m"
  limits:
    memory: "256Mi"
    cpu: "500m"
```
Without requests, the scheduler can't place your pod intelligently. Without limits, a misbehaving pod starves everything else on the node.
Minimum two replicas via HPA:
```yaml
spec:
  minReplicas: 2
  maxReplicas: 5
```
One replica is a single point of failure. A node rotation takes down your service. Two replicas means you survive a pod restart gracefully.
## Phase 4: GitOps with ArgoCD
ArgoCD watches your Git repository for manifest changes and syncs them to the cluster. The cluster pulls its desired state from Git — Git is the source of truth. A rogue kubectl apply directly on the cluster? ArgoCD reverts it on the next sync cycle.
I used a single ApplicationSet instead of five separate Application resources:
```yaml
apiVersion: argoproj.io/v1alpha1
kind: ApplicationSet
metadata:
  name: monolith-apps
  namespace: argocd
spec:
  generators:
    - list:
        elements:
          - app: app-node-api
          - app: app-python-api
          - app: app-nestjs-api
          - app: app-react-spa
          - app: app-go-service
  template:
    metadata:
      name: '{{app}}'
    spec:
      source:
        path: 'k8s/apps/{{app}}'
      syncPolicy:
        automated:
          prune: true
          selfHeal: true
```
One manifest. Five apps. Adding a sixth is one line in the elements list.
Seeing all five apps show Synced and Healthy simultaneously in the ArgoCD dashboard was genuinely satisfying.
## Phase 5: GitHub Actions CI/CD
Two workflows:

- `ci-build-push.yml` — triggers on push to `main`. Matrix build across all five apps, tags each image with the git commit SHA, pushes to ECR. Authenticates with AWS via OIDC — no static credentials stored anywhere.
- `cd-update-manifests.yml` — runs after the build. Updates the image tag in `k8s/apps/<app>/deployment.yaml` and commits back to the repo. ArgoCD picks up the new commit and syncs within 30 seconds.
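The manifest update itself can be as small as a sed one-liner. A sketch against a synthetic manifest — the real workflow's paths and registry differ, and the SHA comes from `GITHUB_SHA`:

```shell
#!/bin/sh
# Sketch of the CD step: rewrite the image tag in a deployment manifest
# to the current commit SHA. The manifest below is a stand-in.
set -eu
mkdir -p k8s/apps/app-node-api
cat > k8s/apps/app-node-api/deployment.yaml <<'EOF'
      containers:
        - name: app-node-api
          image: 123456789012.dkr.ecr.us-east-1.amazonaws.com/app-node-api:oldsha
EOF

SHA=abc1234                      # in CI: SHA="${GITHUB_SHA}"
sed -i "s|\(image: .*/app-node-api:\).*|\1${SHA}|" \
  k8s/apps/app-node-api/deployment.yaml
grep "app-node-api:${SHA}" k8s/apps/app-node-api/deployment.yaml
```

After this the workflow commits and pushes; ArgoCD does the rest — no `kubectl` in CI at all.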
```yaml
- name: Authenticate to AWS
  uses: aws-actions/configure-aws-credentials@v4
  with:
    role-to-assume: arn:aws:iam::${{ secrets.AWS_ACCOUNT_ID }}:role/github-actions-ecr
    aws-region: us-east-1
```
The commit SHA as the image tag matters. latest tells you nothing about what's actually running. A commit SHA is immutable and traceable — you can answer "what code is in production?" with a single kubectl get deployment.
## What I'd Do Differently
Separate the manifests repo. When app code and k8s manifests live together, the CI commit that updates image tags triggers another CI run on the same repo. With branch filtering, you avoid a loop, but a dedicated *-k8s repo is cleaner.
Secrets management from the start. I used hardcoded values for the demo. Retrofitting the External Secrets Operator + AWS Secrets Manager later is painful. Wire it up day one.
Distroless everywhere, not just Go. Distroless images exist for Node.js and Python, too. I used Alpine variants for easier debugging — in production, I'd push toward distroless across the board.
## Takeaways
Five stacks. Five Dockerfiles. One EKS cluster. One ApplicationSet. One pipeline.
The image optimisation alone — from multi-gigabyte naive images down to under 300MB at the heaviest — is something concrete you can demonstrate. The Go service at 8.88MB in a distroless container with near-zero attack surface is something worth building just to see it work.
The deeper lesson is GitOps. Self-healing deployments, Git as the source of truth, no manual kubectl apply in production — these are what make Kubernetes manageable at scale, not just powerful on a laptop.
Full repo: github.com/poppyszn/eks-monolith-migration
Questions on any specific part — the OIDC setup, the ApplicationSet pattern, the ALB Controller IAM issue, the Nginx SPA config — drop them in the comments.