💬 The Slack Message Nobody Wants to See

```
#security-incidents · Today at 4:47 AM

🚨 @channel CRITICAL SECURITY INCIDENT
Defender for Cloud detected cryptomining activity on aks-prod-eastus.
Pod 'web-proxy-7f8d9' in namespace 'default' is communicating with
known C2 server at 185.x.x.x. Containment in progress.
```
Welcome to DevSecOps, where we learn to catch attackers before they find your credit card processing system, steal your customer database, or turn your cluster into a Bitcoin mining farm.
This isn't theoretical. Every incident in this blog is based on real events. Let's make sure they don't happen to you.
⬅️ Shift-Left: Moving Security From "Their Problem" to "Our Problem"
Traditional security is a gate at the end: code is done, someone from security reviews it, finds 47 issues, and sends it back. The developer who wrote it three weeks ago barely remembers the context. Everything is late.
DevSecOps shifts security left, into every stage of the pipeline:
Traditional:

```
Code → Build → Test → ──── SECURITY GATE ──── → Deploy → 😱
                       (3-week bottleneck)
```

DevSecOps:

```
IDE         Pre-commit   PR Gate     Build        Deploy      Runtime
Secret      SAST lint    Full SAST   Container    Admission   WAF
detection   Dependency   SCA         image scan   control     Runtime
in editor   audit        scanning    SBOM         Image       protection
                                                 signing
```
The mindset shift: Security findings are bugs. Bugs have SLAs:
| Severity | SLA | Example |
|---|---|---|
| Critical | Fix within 24 hours | Known exploited CVE, leaked production secret |
| High | Fix within 7 days | SQL injection, missing auth check |
| Medium | Fix within 30 days | Missing HTTPS redirect, verbose error messages |
| Low | Fix within 90 days | Minor info disclosure, missing security headers |
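The SLA table above can be wired straight into pipeline tooling. A minimal sketch (the `sla_days` helper is ours, not a standard tool) that maps a finding's severity to its fix window:

```bash
# Map a finding's severity to its SLA in days (per the table above).
sla_days() {
  case "$1" in
    CRITICAL) echo 1  ;;   # fix within 24 hours
    HIGH)     echo 7  ;;
    MEDIUM)   echo 30 ;;
    LOW)      echo 90 ;;
    *) echo "unknown severity: $1" >&2; return 1 ;;
  esac
}

sla_days CRITICAL   # prints 1
sla_days MEDIUM     # prints 30
```

In a real pipeline you would feed this from your scanner's JSON output and fail the build when a finding's age exceeds its window.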
🔗 Supply Chain Security: The Attack You Don't See Coming
The Scariest Attacks in DevOps
These aren't hypothetical; they happened:
📦 SolarWinds (2020): Attackers compromised the BUILD SYSTEM. Backdoored code was part of the signed, legitimate update. 18,000 organizations affected.

📦 Codecov (2021): Attackers modified a bash uploader script. CI/CD pipelines sent environment variables (including secrets) to the attacker's server.

📦 ua-parser-js (2021): The maintainer's npm account was compromised. A malicious version published to npm installed a cryptominer and password stealer. 7M+ weekly downloads affected.

📦 Log4Shell (2021): CVE in the Log4j library. Remote code execution via a LOG MESSAGE. If your app logged user input (almost all do), that meant instant remote access.
Your Supply Chain: Attack Vectors & Defenses
| Vector | Attacks | Defenses |
|---|---|---|
| Source code | Unauthorized code change | Signed commits, branch protection, PR reviews, CODEOWNERS |
| Build process | Tampered build, compromised runner | Ephemeral runners, reproducible builds, provenance attestation |
| Dependencies | Malicious package, typosquatting, dependency confusion | Lock files (always), Dependabot / Snyk, private registry, version pinning |
🚨 Real-World Disaster #1: The Dependency Confusion
What Happened: A company had an internal npm package called @company/auth-utils hosted on their private registry. An attacker published auth-utils (without the scope) on the public npm registry with version 99.0.0.
When the CI pipeline ran npm install, npm's resolution logic found the public package with a higher version number and installed the attacker's package instead of the internal one. The malicious package exfiltrated all environment variables (including secrets) during the postinstall script.
The Fix:
```bash
# 1. Always use scoped packages with registry mapping
echo "@company:registry=https://company.pkgs.dev.azure.com/_packaging/feed/npm/registry/" > .npmrc

# 2. Use npm audit and lockfile-lint
npx lockfile-lint --path package-lock.json --type npm \
  --allowed-hosts npm company.pkgs.dev.azure.com

# 3. Enable upstream source restrictions in Azure Artifacts:
#    only allow specific public packages, not everything
```
🗝️ Secrets Management: The Tier System
Tier 1: Eliminate Secrets Entirely (Best Option)
```
App → Azure Resource?      Use MANAGED IDENTITY
  "Hey Azure, I'm this VM. Give me access to that SQL database."
  "OK, you're registered. Here's a short-lived token."
  → No password stored anywhere. Ever.

K8s Pod → Azure Resource?  Use WORKLOAD IDENTITY
  "Hey Azure, I'm this Kubernetes service account."
  "OK, your identity is federated. Here's a token."
  → No secret in the pod. No secret in Key Vault. Nothing to rotate.

CI/CD → Azure?             Use OIDC FEDERATION
  "Hey Azure, I'm this GitHub Actions workflow."
  "OK, your repo and branch are verified. Here's a token."
  → No client secret. Token lives for minutes.
```
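That OIDC exchange is what the `azure/login` GitHub Action does for you. A hedged sketch of the workflow side (the variable names are placeholders; note there is no client secret anywhere, only non-secret IDs):

```yaml
# Sketch: GitHub Actions job logging into Azure via OIDC federation.
permissions:
  id-token: write    # allow the workflow to request an OIDC token
  contents: read

jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: azure/login@v2
        with:
          client-id: ${{ vars.AZURE_CLIENT_ID }}
          tenant-id: ${{ vars.AZURE_TENANT_ID }}
          subscription-id: ${{ vars.AZURE_SUBSCRIPTION_ID }}
          # No client-secret input: GitHub's OIDC token is exchanged
          # for a short-lived Azure access token at runtime.
```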
Tier 2: Centralized Vault (When Secrets Are Unavoidable)
Sometimes you NEED a secret (third-party API key, legacy system password). In that case:
Azure Key Vault Configuration (non-negotiable settings):

- ✅ Soft delete: Enabled (30-day retention)
- ✅ Purge protection: Enabled (can't permanently delete)
- ✅ Network access: Private Endpoint ONLY (no public)
- ✅ Access model: RBAC (not access policies)
- ✅ Diagnostics: All logs → Log Analytics
- ✅ Rotation: Automated where possible
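Most of that checklist can be set at creation time with the Azure CLI. A sketch, assuming illustrative names (`kv-prod-eastus`, `rg-prod`); the flags are standard `az keyvault create` options:

```bash
# Create a vault with the non-negotiable settings baked in.
# Soft delete is always on for new vaults; retention is made explicit here.
az keyvault create \
  --name kv-prod-eastus \
  --resource-group rg-prod \
  --retention-days 30 \
  --enable-purge-protection true \
  --enable-rbac-authorization true \
  --public-network-access Disabled

# Then add a Private Endpoint and route diagnostic logs to Log Analytics
# as separate steps.
```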
Tier 3: Kubernetes Secrets (Acceptable With Encryption)
```yaml
# Better: Secrets Store CSI Driver (mounts Key Vault secrets as files)
apiVersion: secrets-store.csi.x-k8s.io/v1
kind: SecretProviderClass
metadata:
  name: azure-kv-secrets
spec:
  provider: azure
  parameters:
    keyvaultName: "kv-prod-eastus"
    objects: |
      array:
        - |
          objectName: db-connection-string
          objectType: secret
    tenantId: "xxx"
  secretObjects:              # Also sync to a K8s secret (if needed by app)
    - secretName: db-secret
      type: Opaque
      data:
        - objectName: db-connection-string
          key: connectionString
```
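For completeness, here is how a pod would consume that SecretProviderClass. The pod and volume names are illustrative; `secrets-store.csi.k8s.io` is the driver's actual name:

```yaml
# Sketch: pod mounting Key Vault secrets via the CSI driver
apiVersion: v1
kind: Pod
metadata:
  name: payment-api
spec:
  containers:
    - name: app
      image: myapp:latest
      volumeMounts:
        - name: kv-secrets
          mountPath: /mnt/secrets   # secrets appear here as files
          readOnly: true
  volumes:
    - name: kv-secrets
      csi:
        driver: secrets-store.csi.k8s.io
        readOnly: true
        volumeAttributes:
          secretProviderClass: azure-kv-secrets
```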
🚨 Real-World Disaster #2: The Git Commit That Leaked Production Credentials
The Git Log:

```
commit a1b2c3d
Author: dev@company.com
Message: "add database config"

+DATABASE_URL=postgresql://admin:SuperSecretP@ssw0rd!@prod-db.postgres.database.azure.com:5432/payments
```
Timeline:
- Developer commits connection string with password to Git
- Code review misses it (reviewer focused on logic, not config)
- PR merged to main
- 6 months later, company enables GitHub's public visibility for the repo (for open-sourcing)
- Bot scrapes public GitHub repos for credentials β finds the password
- Database compromised within 4 hours
The Fix (Multiple Layers):
```yaml
# Prevention Layer 1: Pre-commit hooks (.pre-commit-config.yaml)
repos:
  - repo: https://github.com/gitleaks/gitleaks
    rev: v8.18.0
    hooks:
      - id: gitleaks
```

Prevention Layer 2: GitHub Secret Scanning (free!). Enable it under Settings → Code security → Secret scanning.

```yaml
# Prevention Layer 3: Pipeline check (GitHub Actions step)
- name: Scan for secrets
  run: |
    # gitleaks exits non-zero when it finds secrets, which fails the step
    gitleaks detect --source . --verbose || {
      echo "🚨 Secrets detected in code! Fix before merging."
      exit 1
    }
```
If it's already committed: Rotating the secret is NOT enough. You must:
- Rotate the secret immediately (change the password)
- Revoke the old secret (disable old connection string)
- Audit access logs (did anyone use the leaked credential?)
- Rewrite Git history (otherwise the commit lives in the history forever)
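The history rewrite is usually done with `git filter-repo` (a real tool, installed separately; the file name below is ours). A sketch:

```bash
# Map the leaked literal to a placeholder everywhere in history.
echo 'SuperSecretP@ssw0rd!==>REMOVED' > replacements.txt

# Rewrite every commit; this changes all affected commit hashes.
git filter-repo --replace-text replacements.txt

# Force-push and have every clone re-fetch. Forks and caches may still
# hold the old objects, which is why rotation is still mandatory.
git push --force-with-lease origin main
```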
🐳 Container Security: What Lurks Inside Your Images
The Container Image is Just a Filesystem
Your "secure application" runs on top of an OS image that might contain hundreds of known vulnerabilities:
```
$ trivy image myapp:latest

myapp:latest (debian 12.4)
══════════════════════════
Total: 142 (CRITICAL: 3, HIGH: 28, MEDIUM: 67, LOW: 44)

┌──────────┬────────────────┬──────────┬───────────────────┐
│ Library  │ Vulnerability  │ Severity │ Fixed Version     │
├──────────┼────────────────┼──────────┼───────────────────┤
│ libssl3  │ CVE-2024-XXXX  │ CRITICAL │ 3.0.13-1          │
│ libcurl4 │ CVE-2024-YYYY  │ CRITICAL │ 7.88.1-10+deb12u5 │
│ zlib1g   │ CVE-2024-ZZZZ  │ HIGH     │ 1.2.13+dfsg-1     │
└──────────┴────────────────┴──────────┴───────────────────┘
```
The Container Security Checklist
```dockerfile
# 1. Use minimal base images
FROM node:20-alpine        # ✅ Alpine = ~5MB base
# NOT: FROM node:20        # ❌ Full Debian = ~350MB + 200 CVEs

# 2. Don't run as root
RUN addgroup -S app && adduser -S app -G app
USER app                   # ✅ Run as non-root user

# 3. Multi-stage builds (don't ship build tools)
FROM node:20-alpine AS builder
WORKDIR /app
COPY package*.json ./
RUN npm ci
COPY . .
RUN npm run build

FROM node:20-alpine AS runtime
COPY --from=builder /app/dist /app/dist
COPY --from=builder /app/node_modules /app/node_modules
USER 1000                  # Non-root
EXPOSE 8080
CMD ["node", "/app/dist/index.js"]

# 4. Pin versions and use digests in production
FROM node:20.11.1-alpine3.19@sha256:abc123...   # Immutable reference
```
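One way to resolve a tag to the digest you pin is `crane` from go-containerregistry (other tools, like `docker buildx imagetools inspect`, work too):

```bash
# Print the immutable sha256 digest currently behind a tag.
crane digest node:20.11.1-alpine3.19
# Pin it: FROM node:20.11.1-alpine3.19@sha256:<that digest>
```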
🚨 Real-World Disaster #3: The Log4Shell Panic (And How Scanning Would Have Caught It)
December 9, 2021. The Log4Shell vulnerability (CVE-2021-44228) was publicly disclosed. CVSS score: 10.0 (maximum severity). Any Java application using Log4j 2.x that logged user input was vulnerable to Remote Code Execution.
The Panic Timeline:
Hour 0: CVE published
Hour 2: Exploit code on GitHub
Hour 6: Mass scanning across the internet
Hour 12: "Is our app vulnerable?" "Uh... we don't know"
Hour 24: Still manually checking every service
Hour 48: "We THINK we found all instances..."
Hour 72: Third-party vendor says they were affected too
Teams WITH container scanning:
```
# Automated scan found it in 30 minutes
$ trivy image payment-service:v2.1.0

payment-service:v2.1.0 (java)
┌────────────┬────────────────┬──────────┐
│ Library    │ Vulnerability  │ Severity │
├────────────┼────────────────┼──────────┤
│ log4j-core │ CVE-2021-44228 │ CRITICAL │
│ 2.14.1     │                │          │
└────────────┴────────────────┴──────────┘

# SBOM showed exactly which services used Log4j
$ grype sbom:payment-service.spdx.json
✗ payment-service: AFFECTED
✓ user-service: NOT affected
✗ notification-service: AFFECTED (transitive dependency!)
```
The Lesson: SBOMs (Software Bill of Materials) let you answer "are we affected by CVE-X?" in minutes instead of days. Generate SBOMs in your pipeline:
```bash
# Generate SBOM during build
syft myapp:latest -o spdx-json > sbom.spdx.json

# Attach the SBOM to the container image as a signed attestation
cosign attest --type spdxjson --predicate sbom.spdx.json myacr.azurecr.io/myapp:v2.1.0

# Later: scan the SBOM for vulnerabilities
grype sbom:sbom.spdx.json
```
🏰 Zero-Trust Network Security
"Never Trust, Always Verify"
Traditional model:

```
Outside firewall = untrusted 🔴
Inside firewall  = trusted 🟢   ← This assumption kills you
```

Zero-trust model:

```
Everything = untrusted 🔴
Every request = verified ✅
Even internal services must authenticate and be authorized
```
Zero-Trust in Kubernetes
```yaml
# Step 1: Default deny ALL traffic in the namespace
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
  namespace: payments
spec:
  podSelector: {}
  policyTypes: [Ingress, Egress]
---
# Step 2: Explicitly allow only what's needed
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-api-to-payments
  namespace: payments
spec:
  podSelector:
    matchLabels:
      app: payment-service
  policyTypes: [Ingress]
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              name: api-gateway
      ports:
        - protocol: TCP
          port: 8080
---
# Step 3: Allow egress only to known destinations
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: payment-egress
  namespace: payments
spec:
  podSelector:
    matchLabels:
      app: payment-service
  policyTypes: [Egress]
  egress:
    - to:
        - namespaceSelector:
            matchLabels:
              name: databases
      ports:
        - protocol: TCP
          port: 5432    # PostgreSQL only
    - to:               # Allow DNS
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: kube-system
      ports:
        - protocol: UDP
          port: 53
```
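A quick way to check that the default-deny actually holds (pod and service names here are illustrative): launch a scratch pod in another namespace and try to reach the payment service directly.

```bash
# From the 'default' namespace, this should time out, not connect,
# because only the api-gateway namespace is allowed in.
kubectl run netcheck --rm -it --image=busybox --restart=Never \
  -- nc -zv -w 3 payment-service.payments.svc.cluster.local 8080
```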
🚨 Real-World Disaster #4: The Lateral Movement
What Happened: An attacker exploited a Server-Side Request Forgery (SSRF) vulnerability in a public-facing web app. From inside the cluster, they could reach every other service because there were no Network Policies. They moved laterally from the web app → internal API → database admin service → production database. Full customer data exfiltrated.
With Network Policies: The SSRF would still have worked, but the attacker couldn't reach anything beyond the web app's explicitly-allowed dependencies. Lateral movement blocked at step 1.
🛡️ Admission Control: The Last Line of Defense
Even if a developer writes an insecure deployment manifest, admission controllers can catch and block it before it reaches the cluster:
```yaml
# Kyverno policy: Block containers running as root
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: require-non-root
spec:
  validationFailureAction: Enforce
  rules:
    - name: check-non-root
      match:
        any:
          - resources:
              kinds: ["Pod"]
      validate:
        message: "Containers must not run as root. Set runAsNonRoot: true"
        pattern:
          spec:
            containers:
              - securityContext:
                  runAsNonRoot: true
```
What happens when you try to deploy as root:

```
$ kubectl apply -f bad-deployment.yaml

Error from server: admission webhook "validate.kyverno.svc-fail"
denied the request:

resource Deployment/default/bad-app was blocked due to the following
policies:

require-non-root:
  check-non-root: 'Containers must not run as root.
    Set runAsNonRoot: true'

# THE GATE HELD. 🛡️
```
🎯 Key Takeaways
- Supply chain attacks are the new frontier: SBOMs, image signing, and dependency pinning aren't optional
- Eliminate secrets first (Managed Identity, OIDC), vault them second, never commit them
- Container images are attack surface: minimal base images, non-root, scan everything
- Network Policies = micro-segmentation: default deny, explicit allow
- Shift-left doesn't mean dumping security on developers: automate it in the pipeline
- Pre-commit hooks catch secrets BEFORE they're in Git history, where they live forever
🔥 Homework
- Run `gitleaks detect --source .` on your repo right now. Fix what you find.
- Run `trivy image <your-production-image>` and count the CRITICAL vulnerabilities.
- Check if your production Kubernetes namespaces have Network Policies: `kubectl get networkpolicies -A`
- Find one service using a service principal + client secret. Replace it with Managed Identity.
Next up in the series: *SRE Explained: Because "It Works on My Machine" Is Not an SLO*, where we decode SLOs, error budgets, incident management, and chaos engineering.
💬 Ever found a secret in your Git history? How did you handle it? Share below. This is a judgment-free zone. (We've all been there. ALL of us.) 🫣