DEV Community

Aisalkyn Aidarova


Final work for batch #2: ENTERPRISE DEVOPS GOVERNANCE LAB

This lab is about enterprise-grade DevOps strategy, not individual tools.
Our focus is:

  • Shared pipelines
  • Version control strategy
  • Safe rollouts
  • Image security governance
  • Multi-service blue-green
  • Centralized base images
  • Distributed tracing
  • Rollback strategy
  • Enterprise routing design
  • Upgrade enforcement

So this is a FULL ENTERPRISE DEVOPS LAB

This lab simulates a real enterprise with:

  • 10 microservices
  • Shared CI/CD templates
  • Centralized base image repo
  • Blue-Green deployment
  • Image vulnerability enforcement
  • Versioned Terraform modules
  • ArgoCD GitOps
  • Distributed tracing
  • Rollback strategy

🏗️ Architecture Overview

Client
→ Load Balancer
→ Ingress
→ 10 Microservices
→ Shared Base Image
→ Centralized CI Templates
→ ArgoCD
→ EKS Cluster

Monitoring:

  • Prometheus
  • Grafana
  • Jaeger (Tracing)
  • ELK (Logging)

PART 1 — CENTRALIZED BASE IMAGE STRATEGY

Step 1: Create Base Image Repository

Repo: enterprise-base-images

Dockerfile:

FROM amazonlinux:2023

# Amazon Linux 2023 uses dnf; clean the cache to keep the image small
RUN dnf update -y && \
    dnf install -y python3 && \
    dnf clean all

LABEL maintainer="platform-team"

Push to ECR:

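aws ecr get-login-password --region <region> | docker login --username AWS --password-stdin <ecr>
# The local tag must include the registry prefix, or the push has nothing to push
docker build -t <ecr>/enterprise/python-base:1.0 .
docker push <ecr>/enterprise/python-base:1.0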

Step 2: Application Repo Uses Approved Base

Each microservice repo:

FROM <ecr>/enterprise/python-base:1.0

COPY app.py /app/
CMD ["python3", "/app/app.py"]

The base image is now centralized and controlled by the platform team.


PART 2 — SHARED CI/CD TEMPLATE

Create repo: ci-templates

ci-template.yml

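stages:
  - build
  - scan
  - deploy

build:
  stage: build
  script:
    - docker build -t $IMAGE_TAG .
    - docker push $IMAGE_TAG

scan:
  stage: scan
  script:
    # Fail the pipeline when HIGH or CRITICAL vulnerabilities are found
    - trivy image --exit-code 1 --severity HIGH,CRITICAL $IMAGE_TAG

deploy:
  stage: deploy
  script:
    # GitOps: assumes the manifest's image tag was updated earlier in this job;
    # ArgoCD picks up the pushed commit
    - git commit -am "update image to $IMAGE_TAG"
    - git push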

Each microservice repo references this template (for example, through GitLab CI's include keyword) rather than copying it.

Shared pipeline logic now lives in one place.
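
The scan stage only enforces anything if its result can fail the pipeline. As a minimal sketch, assuming the scan is run with `trivy image --format json` and the report follows Trivy's JSON layout (a top-level `Results` list whose entries may carry a `Vulnerabilities` list), a gate could look like this. The function name, the blocking severities, and the sample CVE IDs are made up for illustration.

```python
import json

# Assumed policy: these severities block a release. Adjust to your own rules.
BLOCKING = {"HIGH", "CRITICAL"}


def blocking_findings(report_json: str) -> list:
    """Return the IDs of vulnerabilities whose severity is in BLOCKING."""
    report = json.loads(report_json)
    found = []
    # "Results" and "Vulnerabilities" can be missing or null in a clean report.
    for result in report.get("Results") or []:
        for vuln in result.get("Vulnerabilities") or []:
            if vuln.get("Severity") in BLOCKING:
                found.append(vuln.get("VulnerabilityID", "unknown"))
    return found


# Hypothetical report with placeholder CVE IDs.
sample = json.dumps({"Results": [{"Vulnerabilities": [
    {"VulnerabilityID": "CVE-0000-0001", "Severity": "CRITICAL"},
    {"VulnerabilityID": "CVE-0000-0002", "Severity": "LOW"},
]}]})
print(blocking_findings(sample))
```

A CI job would exit non-zero whenever the returned list is not empty, which keeps the pass/fail rule in the shared template instead of in each service repo.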


PART 3 — VERSIONED MODULE STRATEGY

Terraform modules repo:

modules/
  vpc/
  eks/
  rds/

Applications reference:

module "eks" {
  source  = "git::https://repo//modules/eks?ref=v1.2.0"
}

Upgrade safely by bumping the version tag in ref and reviewing the resulting plan before applying.
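
To make version bumps reviewable, a small check can read the pinned tag out of a module source string and flag major-version jumps. This is a sketch that assumes tags follow a `vMAJOR.MINOR.PATCH` scheme, as in the `v1.2.0` example above. Both function names are hypothetical.

```python
import re

# Matches the "?ref=v1.2.0" suffix of a Git module source.
REF_PATTERN = re.compile(r"\?ref=v(\d+)\.(\d+)\.(\d+)")


def pinned_version(source: str) -> tuple:
    """Extract (major, minor, patch) from a module source string."""
    match = REF_PATTERN.search(source)
    if match is None:
        raise ValueError(f"module source is not pinned to a version tag: {source}")
    return tuple(int(part) for part in match.groups())


def is_breaking_upgrade(old_source: str, new_source: str) -> bool:
    """Treat a major-version increase as potentially breaking (assumed convention)."""
    return pinned_version(new_source)[0] > pinned_version(old_source)[0]


old = "git::https://repo//modules/eks?ref=v1.2.0"
new = "git::https://repo//modules/eks?ref=v2.0.0"
print(pinned_version(old), is_breaking_upgrade(old, new))
```

An unpinned source raises an error on purpose: a module without a `ref` tracks the default branch, which defeats the versioning strategy.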


PART 4 — BLUE GREEN DEPLOYMENT PER MICROSERVICE

For each service:

Blue Deployment:

# Labels go on the pod template so the Service selector can match the pods
spec:
  template:
    metadata:
      labels:
        app: users
        version: blue

Green Deployment:

# Same pod template labels, with version set to green
spec:
  template:
    metadata:
      labels:
        app: users
        version: green

Service:

spec:
  selector:
    app: users
    version: blue

Switch traffic to green by updating the Service selector:

spec:
  selector:
    app: users
    version: green
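
The switch is just a change to one selector value, and rollback is the same operation in reverse. A minimal sketch of that logic on a plain dictionary follows. The function name is hypothetical, and applying the result to the cluster (through a Git commit that ArgoCD syncs, for example) is left out.

```python
def switch_color(selector: dict) -> dict:
    """Return a copy of the selector pointing at the other color."""
    other = {"blue": "green", "green": "blue"}
    current = selector["version"]
    if current not in other:
        raise ValueError(f"unexpected version label: {current}")
    # Build a new dict so the caller's selector is left untouched.
    return {**selector, "version": other[current]}


live = {"app": "users", "version": "blue"}
cutover = switch_color(live)      # traffic now goes to green
rollback = switch_color(cutover)  # flipping again restores blue
print(cutover, rollback)
```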

PART 5 — 100 SERVICES SCENARIO

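Suppose the platform runs 100 services and a release modifies only 10 of them.

Only those 10 get green deployments.

Routing for the other 90 remains unchanged, so the blast radius is limited to the services that actually changed.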
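
The selection rule can be sketched as a small planning function. The service names and the action labels below are hypothetical; the point is only that the plan touches the changed services and nothing else.

```python
def plan_release(all_services: list, changed: set) -> dict:
    """Map every service to an action: only changed ones get a green deployment."""
    unknown = set(changed) - set(all_services)
    if unknown:
        raise ValueError(f"changed services not in inventory: {sorted(unknown)}")
    return {
        name: ("deploy-green" if name in changed else "unchanged")
        for name in all_services
    }


# Hypothetical inventory of 100 services, 10 of which were modified.
services = [f"svc-{i:03d}" for i in range(100)]
plan = plan_release(services, set(services[:10]))
print(sum(action == "deploy-green" for action in plan.values()))
```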


PART 6 — IMAGE GOVERNANCE SCENARIO

Interview question:
“How do you know which services use bad base image?”

Answer, in practice. Search the checked-out service repos for the tag:

grep -rF "python-base:1.0" .

Cluster check:

kubectl get pods -A -o jsonpath="{..image}"

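Enforce via policy:

Use OPA Gatekeeper or Kyverno at admission to reject pods whose images do not come from the approved registry. Admission policies see only the image reference, not the layers underneath it, so the base image check itself belongs in CI. The rule below is pseudocode for the intent, not a complete policy:

deny:
  message: "Unapproved base image"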
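
On the CI side, the same rule can be checked directly against each Dockerfile. The sketch below assumes approved images live under the `<ecr>/enterprise/` placeholder used earlier and that `python-base:1.0` is the tag being blocked. The constants and function names are hypothetical, and the parser is deliberately simple: it would also flag a multi-stage `FROM` that refers to an earlier stage by name.

```python
# Placeholder registry prefix from the examples above; substitute the real one.
APPROVED_PREFIX = "<ecr>/enterprise/"
# Hypothetical block list maintained by the platform team.
BLOCKED_TAGS = {"python-base:1.0"}


def base_images(dockerfile_text: str) -> list:
    """Return the image referenced by each FROM instruction."""
    images = []
    for line in dockerfile_text.splitlines():
        parts = line.split()
        if len(parts) >= 2 and parts[0].upper() == "FROM":
            # Skip flags such as --platform=linux/amd64.
            args = [p for p in parts[1:] if not p.startswith("--")]
            if args:
                images.append(args[0])
    return images


def violations(dockerfile_text: str) -> list:
    """List human-readable policy violations for one Dockerfile."""
    problems = []
    for image in base_images(dockerfile_text):
        if not image.startswith(APPROVED_PREFIX):
            problems.append(f"{image}: not from the approved registry")
        elif image[len(APPROVED_PREFIX):] in BLOCKED_TAGS:
            problems.append(f"{image}: blocked tag")
    return problems


print(violations("FROM <ecr>/enterprise/python-base:1.0\nCOPY app.py /app/"))
```

Running this across all service repos answers the interview question with a list rather than a guess, and running it in the shared template stops new violations from merging.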

PART 7 — DISTRIBUTED TRACING

Install Jaeger.

Each service reads the trace ID from the incoming request and forwards the same header on every outgoing call:

trace_id = request.headers.get("X-Trace-ID")

Every log line includes the trace ID.

Search in ELK:

trace_id: 12345

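The Jaeger UI shows the full request graph, provided each service also reports spans to Jaeger (for example, through OpenTelemetry instrumentation).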
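
The header handling and log correlation can be sketched with the standard library alone. This assumes the custom `X-Trace-ID` header from above and plain dictionaries for headers (real frameworks usually offer case-insensitive header objects). The helper names are hypothetical, and span reporting to Jaeger is not covered here.

```python
import logging
import uuid

TRACE_HEADER = "X-Trace-ID"


def get_or_create_trace_id(headers: dict) -> str:
    """Reuse the caller's trace ID, or start a new trace at the edge."""
    return headers.get(TRACE_HEADER) or uuid.uuid4().hex


def outgoing_headers(trace_id: str, extra: dict = None) -> dict:
    """Headers for a downstream call, carrying the same trace ID."""
    headers = dict(extra or {})
    headers[TRACE_HEADER] = trace_id
    return headers


def trace_logger(name: str, trace_id: str) -> logging.LoggerAdapter:
    """Logger whose records expose trace_id for use in the log format."""
    return logging.LoggerAdapter(logging.getLogger(name), {"trace_id": trace_id})


logging.basicConfig(format="%(levelname)s trace_id=%(trace_id)s %(message)s")
incoming = {"X-Trace-ID": "12345"}
trace_id = get_or_create_trace_id(incoming)
log = trace_logger("users", trace_id)
log.warning("calling orders-service")
print(outgoing_headers(trace_id, {"Accept": "application/json"}))
```

With the ID in every log line, the ELK search shown above returns the records for one request across all services.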


PART 8 — PATCHING + ROLLBACK LAB

Before the Linux upgrade, snapshot the volume (vol-xxxx is a placeholder):

aws ec2 create-snapshot --volume-id vol-xxxx --description "pre-patch snapshot"

If the upgrade fails, restore a volume from the snapshot in the instance's Availability Zone:

aws ec2 create-volume --snapshot-id snap-xxxx --availability-zone <az>

Or:

yum downgrade package-name

PART 9 — SAFE SHARED TEMPLATE UPDATE

  1. Create new template version v2
  2. Test against sample apps (3 representative apps)
  3. Validate in staging
  4. Gradual rollout
  5. Monitor
  6. Rollback if needed
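
Step 4 can be made concrete by splitting the consuming repos into waves. The sketch below assumes cumulative rollout targets of 10%, 50%, and 100%; those percentages, the repo names, and the function name are illustrative, not a prescribed policy.

```python
def rollout_waves(repos: list, percents: tuple = (10, 50, 100)) -> list:
    """Split repos into waves, where each percent is a cumulative target."""
    waves = []
    done = 0
    for percent in percents:
        # Integer ceiling of len(repos) * percent / 100, avoiding float rounding.
        target = (len(repos) * percent + 99) // 100
        if target > done:
            waves.append(repos[done:target])
            done = target
    return waves


# Hypothetical list of repos that include the shared template.
repos = [f"service-{i}" for i in range(1, 11)]
for number, wave in enumerate(rollout_waves(repos), start=1):
    print(number, wave)
```

Monitoring happens between waves: the next wave starts only if the previous one stays healthy, and a rollback means pointing the affected repos back at the previous template version.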

PART 10 — FORCED SECURITY UPGRADE SCENARIO

If the base image is found to be vulnerable:

  1. Build new base image 1.1
  2. Update CI policy to block old tag
  3. Auto-create PRs to update Dockerfiles
  4. ArgoCD deploys the rebuilt images
  5. Monitor
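
The change that step 3 puts into each pull request is a one-line edit to the `FROM` instruction. A minimal sketch of that rewrite follows, assuming the `enterprise/python-base` image and the 1.0 to 1.1 bump from the steps above. The function name is hypothetical, and opening the pull request itself is out of scope.

```python
import re


def bump_base_image(dockerfile_text: str,
                    image: str = "enterprise/python-base",
                    old: str = "1.0",
                    new: str = "1.1") -> str:
    """Rewrite FROM lines that pin image:old so they pin image:new instead."""
    pattern = re.compile(
        r"^(FROM\s+\S*" + re.escape(image) + r":)" + re.escape(old) + r"(\s|$)",
        flags=re.MULTILINE | re.IGNORECASE,
    )
    # A function replacement avoids ambiguity between group numbers and digits
    # in the new tag.
    return pattern.sub(lambda m: m.group(1) + new + m.group(2), dockerfile_text)


before = 'FROM <ecr>/enterprise/python-base:1.0\n\nCOPY app.py /app/\nCMD ["python3", "/app/app.py"]\n'
print(bump_base_image(before))
```

The trailing whitespace-or-end check keeps a tag such as `1.0.1` from being rewritten by accident.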

PART 11 — ENTERPRISE ROUTING

Single Ingress (simplified; a real manifest nests these entries under spec.rules[].http.paths):

rules:
  - path: /users
    backend: users-service
  - path: /orders
    backend: orders-service

Blue-green switching happens at Service level, not Ingress.
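
To illustrate why the Ingress never needs to change during a cutover, the sketch below models the routing table as a path-prefix lookup that returns a Service name. The matching roughly mirrors prefix matching on whole path segments. The table and function name are illustrative and are not how an ingress controller is implemented.

```python
# Path prefix -> Service name, taken from the rules above.
ROUTES = {"/users": "users-service", "/orders": "orders-service"}


def resolve_backend(path: str, routes: dict = ROUTES):
    """Return the Service for the longest matching prefix, or None."""
    best = None
    for prefix, backend in routes.items():
        # Match whole path segments only: /users/42 matches, /usersx does not.
        if path == prefix or path.startswith(prefix.rstrip("/") + "/"):
            if best is None or len(prefix) > len(best[0]):
                best = (prefix, backend)
    return best[1] if best else None


print(resolve_backend("/users/42"))
print(resolve_backend("/orders"))
```

The lookup ends at a Service name. Whether that Service currently selects blue or green pods is invisible at this layer, which is what keeps the routing design stable across releases.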


WHY AN INTERVIEWER ASKS THIS

  • Do you understand governance?
  • Do you understand scale?
  • Do you understand versioning?
  • Do you understand impact isolation?
  • Do you understand rollout safety?
  • Do you understand enterprise control model?

The question is not just whether you know Kubernetes commands.

You can now answer with:

  • Versioning strategy
  • Centralized control
  • Gradual rollout
  • Governance enforcement
  • Observability validation
  • Rollback plan
