💬 The Slack Message Nobody Wants to See

```
#security-incidents · Today at 4:47 AM

🚨 @channel CRITICAL SECURITY INCIDENT
Defender for Cloud detected cryptomining activity on aks-prod-eastus.
Pod 'web-proxy-7f8d9' in namespace 'default' is communicating with
known C2 server at 185.x.x.x. Containment in progress.
```
Welcome to DevSecOps, where we learn to catch attackers before they find your credit card processing system, steal your customer database, or turn your cluster into a Bitcoin mining farm.
This isn't theoretical. Every incident in this blog is based on real events. Let's make sure they don't happen to you.
⬅️ Shift-Left: Moving Security From "Their Problem" to "Our Problem"
Traditional security is a gate at the end: code is done, someone from security reviews it, finds 47 issues, and sends it back. The developer who wrote it three weeks ago barely remembers the context. Everything is late.
DevSecOps shifts security left, into every stage of the pipeline:
Traditional:

```
Code → Build → Test → ──── SECURITY GATE ──── → Deploy → 😱
                       (3-week bottleneck)
```

DevSecOps:

```
IDE         Pre-commit   PR Gate     Build        Deploy      Runtime
Secret      SAST lint    Full SAST   Container    Admission   WAF
detection   Dependency   SCA         image scan   control     Runtime
in editor   audit        scanning    SBOM         Image       protection
                                                 signing
```
The mindset shift: Security findings are bugs. Bugs have SLAs:
| Severity | SLA | Example |
|---|---|---|
| Critical | Fix within 24 hours | Known exploited CVE, leaked production secret |
| High | Fix within 7 days | SQL injection, missing auth check |
| Medium | Fix within 30 days | Missing HTTPS redirect, verbose error messages |
| Low | Fix within 90 days | Minor info disclosure, missing security headers |
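The SLA table above can be wired straight into pipeline tooling. A minimal sketch (the `sla_days` helper is ours, not a standard tool) that maps a finding's severity to its fix window:

```bash
# Map a finding's severity to its SLA in days (per the table above).
sla_days() {
  case "$1" in
    CRITICAL) echo 1  ;;   # fix within 24 hours
    HIGH)     echo 7  ;;
    MEDIUM)   echo 30 ;;
    LOW)      echo 90 ;;
    *) echo "unknown severity: $1" >&2; return 1 ;;
  esac
}

sla_days CRITICAL   # prints 1
sla_days MEDIUM     # prints 30
```

In a real pipeline you would feed this from your scanner's JSON output and fail the build when a finding's age exceeds its window.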
🔗 Supply Chain Security: The Attack You Don't See Coming
The Scariest Attacks in DevOps
These aren't hypothetical; they happened:
📦 SolarWinds (2020): Attackers compromised the BUILD SYSTEM. Backdoored code was part of the signed, legitimate update. 18,000 organizations affected.

📦 Codecov (2021): Attackers modified a bash uploader script. CI/CD pipelines sent environment variables (including secrets) to the attacker's server.

📦 ua-parser-js (2021): The maintainer's npm account was compromised. A malicious version published to npm installed a cryptominer and password stealer. 7M+ weekly downloads affected.

📦 Log4Shell (2021): CVE in the Log4j library. Remote code execution via a LOG MESSAGE. If your app logged user input (almost all do), that meant instant remote access.
Your Supply Chain: Attack Vectors & Defenses
| Vector | Attacks | Defenses |
|---|---|---|
| Source code | Unauthorized code change | Signed commits, branch protection, PR reviews, CODEOWNERS |
| Build process | Tampered build, compromised runner | Ephemeral runners, reproducible builds, provenance attestation |
| Dependencies | Malicious package, typosquatting, dependency confusion | Lock files (always), Dependabot / Snyk, private registry, version pinning |
🚨 Real-World Disaster #1: The Dependency Confusion
What Happened: A company had an internal npm package called @company/auth-utils hosted on their private registry. An attacker published auth-utils (without the scope) on the public npm registry with version 99.0.0.
When the CI pipeline ran npm install, npm's resolution logic found the public package with a higher version number and installed the attacker's package instead of the internal one. The malicious package exfiltrated all environment variables (including secrets) during the postinstall script.
The Fix:
```bash
# 1. Always use scoped packages with registry mapping
echo "@company:registry=https://company.pkgs.dev.azure.com/_packaging/feed/npm/registry/" > .npmrc

# 2. Use npm audit and lockfile-lint
npx lockfile-lint --path package-lock.json --type npm \
  --allowed-hosts npm company.pkgs.dev.azure.com

# 3. Enable upstream source restrictions in Azure Artifacts:
#    only allow specific public packages, not everything
```
🗝️ Secrets Management: The Tier System
Tier 1: Eliminate Secrets Entirely (Best Option)
```
App → Azure Resource?      Use MANAGED IDENTITY
  "Hey Azure, I'm this VM. Give me access to that SQL database."
  "OK, you're registered. Here's a short-lived token."
  → No password stored anywhere. Ever.

K8s Pod → Azure Resource?  Use WORKLOAD IDENTITY
  "Hey Azure, I'm this Kubernetes service account."
  "OK, your identity is federated. Here's a token."
  → No secret in the pod. No secret in Key Vault. Nothing to rotate.

CI/CD → Azure?             Use OIDC FEDERATION
  "Hey Azure, I'm this GitHub Actions workflow."
  "OK, your repo and branch are verified. Here's a token."
  → No client secret. Token lives for minutes.
```
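That OIDC exchange is what the `azure/login` GitHub Action does for you. A hedged sketch of the workflow side (the variable names are placeholders; note there is no client secret anywhere, only non-secret IDs):

```yaml
# Sketch: GitHub Actions job logging into Azure via OIDC federation.
permissions:
  id-token: write    # allow the workflow to request an OIDC token
  contents: read

jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: azure/login@v2
        with:
          client-id: ${{ vars.AZURE_CLIENT_ID }}
          tenant-id: ${{ vars.AZURE_TENANT_ID }}
          subscription-id: ${{ vars.AZURE_SUBSCRIPTION_ID }}
          # No client-secret input: GitHub's OIDC token is exchanged
          # for a short-lived Azure access token at runtime.
```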
Tier 2: Centralized Vault (When Secrets Are Unavoidable)
Sometimes you NEED a secret (third-party API key, legacy system password). In that case:
Azure Key Vault Configuration (non-negotiable settings):

- ✅ Soft delete: Enabled (30-day retention)
- ✅ Purge protection: Enabled (can't permanently delete)
- ✅ Network access: Private Endpoint ONLY (no public)
- ✅ Access model: RBAC (not access policies)
- ✅ Diagnostics: All logs → Log Analytics
- ✅ Rotation: Automated where possible
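Most of that checklist can be set at creation time with the Azure CLI. A sketch, assuming illustrative names (`kv-prod-eastus`, `rg-prod`); the flags are standard `az keyvault create` options:

```bash
# Create a vault with the non-negotiable settings baked in.
# Soft delete is always on for new vaults; retention is made explicit here.
az keyvault create \
  --name kv-prod-eastus \
  --resource-group rg-prod \
  --retention-days 30 \
  --enable-purge-protection true \
  --enable-rbac-authorization true \
  --public-network-access Disabled

# Then add a Private Endpoint and route diagnostic logs to Log Analytics
# as separate steps.
```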
Tier 3: Kubernetes Secrets (Acceptable With Encryption)
```yaml
# Better: Secrets Store CSI Driver (mounts Key Vault secrets as files)
apiVersion: secrets-store.csi.x-k8s.io/v1
kind: SecretProviderClass
metadata:
  name: azure-kv-secrets
spec:
  provider: azure
  parameters:
    keyvaultName: "kv-prod-eastus"
    objects: |
      array:
        - |
          objectName: db-connection-string
          objectType: secret
    tenantId: "xxx"
  secretObjects:              # Also sync to a K8s secret (if needed by app)
    - secretName: db-secret
      type: Opaque
      data:
        - objectName: db-connection-string
          key: connectionString
```
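For completeness, here is how a pod would consume that SecretProviderClass. The pod and volume names are illustrative; `secrets-store.csi.k8s.io` is the driver's actual name:

```yaml
# Sketch: pod mounting Key Vault secrets via the CSI driver
apiVersion: v1
kind: Pod
metadata:
  name: payment-api
spec:
  containers:
    - name: app
      image: myapp:latest
      volumeMounts:
        - name: kv-secrets
          mountPath: /mnt/secrets   # secrets appear here as files
          readOnly: true
  volumes:
    - name: kv-secrets
      csi:
        driver: secrets-store.csi.k8s.io
        readOnly: true
        volumeAttributes:
          secretProviderClass: azure-kv-secrets
```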
🚨 Real-World Disaster #2: The Git Commit That Leaked Production Credentials
The Git Log:

```
commit a1b2c3d
Author: dev@company.com
Message: "add database config"

+DATABASE_URL=postgresql://admin:SuperSecretP@ssw0rd!@prod-db.postgres.database.azure.com:5432/payments
```
Timeline:
- Developer commits connection string with password to Git
- Code review misses it (reviewer focused on logic, not config)
- PR merged to main
- 6 months later, company enables GitHub's public visibility for the repo (for open-sourcing)
- Bot scrapes public GitHub repos for credentials β finds the password
- Database compromised within 4 hours
The Fix (Multiple Layers):
```yaml
# Prevention Layer 1: Pre-commit hooks (.pre-commit-config.yaml)
repos:
  - repo: https://github.com/gitleaks/gitleaks
    rev: v8.18.0
    hooks:
      - id: gitleaks
```

Prevention Layer 2: GitHub Secret Scanning (free!). Enable it under Settings → Code security → Secret scanning.

```yaml
# Prevention Layer 3: Pipeline check (GitHub Actions step)
- name: Scan for secrets
  run: |
    # gitleaks exits non-zero when it finds secrets, which fails the step
    gitleaks detect --source . --verbose || {
      echo "🚨 Secrets detected in code! Fix before merging."
      exit 1
    }
```
If it's already committed: Rotating the secret is NOT enough. You must:
- Rotate the secret immediately (change the password)
- Revoke the old secret (disable old connection string)
- Audit access logs (did anyone use the leaked credential?)
- Rewrite Git history (otherwise the commit lives in the history forever)
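The history rewrite is usually done with `git filter-repo` (a real tool, installed separately; the file name below is ours). A sketch:

```bash
# Map the leaked literal to a placeholder everywhere in history.
echo 'SuperSecretP@ssw0rd!==>REMOVED' > replacements.txt

# Rewrite every commit; this changes all affected commit hashes.
git filter-repo --replace-text replacements.txt

# Force-push and have every clone re-fetch. Forks and caches may still
# hold the old objects, which is why rotation is still mandatory.
git push --force-with-lease origin main
```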
🐳 Container Security: What Lurks Inside Your Images
The Container Image is Just a Filesystem
Your "secure application" runs on top of an OS image that might contain hundreds of known vulnerabilities:
```
$ trivy image myapp:latest

myapp:latest (debian 12.4)
══════════════════════════
Total: 142 (CRITICAL: 3, HIGH: 28, MEDIUM: 67, LOW: 44)

┌──────────┬────────────────┬──────────┬───────────────────┐
│ Library  │ Vulnerability  │ Severity │ Fixed Version     │
├──────────┼────────────────┼──────────┼───────────────────┤
│ libssl3  │ CVE-2024-XXXX  │ CRITICAL │ 3.0.13-1          │
│ libcurl4 │ CVE-2024-YYYY  │ CRITICAL │ 7.88.1-10+deb12u5 │
│ zlib1g   │ CVE-2024-ZZZZ  │ HIGH     │ 1.2.13+dfsg-1     │
└──────────┴────────────────┴──────────┴───────────────────┘
```
The Container Security Checklist
```dockerfile
# 1. Use minimal base images
FROM node:20-alpine        # ✅ Alpine = ~5MB base
# NOT: FROM node:20        # ❌ Full Debian = ~350MB + 200 CVEs

# 2. Don't run as root
RUN addgroup -S app && adduser -S app -G app
USER app                   # ✅ Run as non-root user

# 3. Multi-stage builds (don't ship build tools)
FROM node:20-alpine AS builder
WORKDIR /app
COPY package*.json ./
RUN npm ci
COPY . .
RUN npm run build

FROM node:20-alpine AS runtime
COPY --from=builder /app/dist /app/dist
COPY --from=builder /app/node_modules /app/node_modules
USER 1000                  # Non-root
EXPOSE 8080
CMD ["node", "/app/dist/index.js"]

# 4. Pin versions and use digests in production
FROM node:20.11.1-alpine3.19@sha256:abc123...   # Immutable reference
```
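One way to resolve a tag to the digest you pin is `crane` from go-containerregistry (other tools, like `docker buildx imagetools inspect`, work too):

```bash
# Print the immutable sha256 digest currently behind a tag.
crane digest node:20.11.1-alpine3.19
# Pin it: FROM node:20.11.1-alpine3.19@sha256:<that digest>
```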
🚨 Real-World Disaster #3: The Log4Shell Panic (And How Scanning Would Have Caught It)
December 9, 2021. The Log4Shell vulnerability (CVE-2021-44228) was publicly disclosed. CVSS score: 10.0 (maximum severity). Any Java application using Log4j 2.x that logged user input was vulnerable to Remote Code Execution.
The Panic Timeline:
Hour 0: CVE published
Hour 2: Exploit code on GitHub
Hour 6: Mass scanning across the internet
Hour 12: "Is our app vulnerable?" "Uh... we don't know"
Hour 24: Still manually checking every service
Hour 48: "We THINK we found all instances..."
Hour 72: Third-party vendor says they were affected too
Teams WITH container scanning:
```
# Automated scan found it in 30 minutes
$ trivy image payment-service:v2.1.0

payment-service:v2.1.0 (java)
┌────────────┬────────────────┬──────────┐
│ Library    │ Vulnerability  │ Severity │
├────────────┼────────────────┼──────────┤
│ log4j-core │ CVE-2021-44228 │ CRITICAL │
│ 2.14.1     │                │          │
└────────────┴────────────────┴──────────┘

# SBOM showed exactly which services used Log4j
$ grype sbom:payment-service.spdx.json
✗ payment-service: AFFECTED
✓ user-service: NOT affected
✗ notification-service: AFFECTED (transitive dependency!)
```
The Lesson: SBOMs (Software Bill of Materials) let you answer "are we affected by CVE-X?" in minutes instead of days. Generate SBOMs in your pipeline:
```bash
# Generate SBOM during build
syft myapp:latest -o spdx-json > sbom.spdx.json

# Attach the SBOM to the container image as a signed attestation
cosign attest --type spdxjson --predicate sbom.spdx.json myacr.azurecr.io/myapp:v2.1.0

# Later: scan the SBOM for vulnerabilities
grype sbom:sbom.spdx.json
```
🏰 Zero-Trust Network Security
"Never Trust, Always Verify"
Traditional model:

```
Outside firewall = untrusted 🔴
Inside firewall  = trusted 🟢   ← This assumption kills you
```

Zero-trust model:

```
Everything = untrusted 🔴
Every request = verified ✅
Even internal services must authenticate and be authorized
```
Zero-Trust in Kubernetes
```yaml
# Step 1: Default deny ALL traffic in the namespace
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
  namespace: payments
spec:
  podSelector: {}
  policyTypes: [Ingress, Egress]
---
# Step 2: Explicitly allow only what's needed
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-api-to-payments
  namespace: payments
spec:
  podSelector:
    matchLabels:
      app: payment-service
  policyTypes: [Ingress]
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              name: api-gateway
      ports:
        - protocol: TCP
          port: 8080
---
# Step 3: Allow egress only to known destinations
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: payment-egress
  namespace: payments
spec:
  podSelector:
    matchLabels:
      app: payment-service
  policyTypes: [Egress]
  egress:
    - to:
        - namespaceSelector:
            matchLabels:
              name: databases
      ports:
        - protocol: TCP
          port: 5432    # PostgreSQL only
    - to:               # Allow DNS
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: kube-system
      ports:
        - protocol: UDP
          port: 53
```
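A quick way to check that the default-deny actually holds (pod and service names here are illustrative): launch a scratch pod in another namespace and try to reach the payment service directly.

```bash
# From the 'default' namespace, this should time out, not connect,
# because only the api-gateway namespace is allowed in.
kubectl run netcheck --rm -it --image=busybox --restart=Never \
  -- nc -zv -w 3 payment-service.payments.svc.cluster.local 8080
```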
🚨 Real-World Disaster #4: The Lateral Movement
What Happened: An attacker exploited a Server-Side Request Forgery (SSRF) vulnerability in a public-facing web app. From inside the cluster, they could reach every other service because there were no Network Policies. They moved laterally from the web app → internal API → database admin service → production database. Full customer data exfiltrated.
With Network Policies: The SSRF would still have worked, but the attacker couldn't reach anything beyond the web app's explicitly-allowed dependencies. Lateral movement blocked at step 1.
🛡️ Admission Control: The Last Line of Defense
Even if a developer writes an insecure deployment manifest, admission controllers can catch and block it before it reaches the cluster:
```yaml
# Kyverno policy: Block containers running as root
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: require-non-root
spec:
  validationFailureAction: Enforce
  rules:
    - name: check-non-root
      match:
        any:
          - resources:
              kinds: ["Pod"]
      validate:
        message: "Containers must not run as root. Set runAsNonRoot: true"
        pattern:
          spec:
            containers:
              - securityContext:
                  runAsNonRoot: true
```
What happens when you try to deploy as root:

```
$ kubectl apply -f bad-deployment.yaml

Error from server: admission webhook "validate.kyverno.svc-fail"
denied the request:

resource Deployment/default/bad-app was blocked due to the following
policies:

require-non-root:
  check-non-root: 'Containers must not run as root.
    Set runAsNonRoot: true'

# THE GATE HELD. 🛡️
```
🎯 Key Takeaways
- Supply chain attacks are the new frontier: SBOMs, image signing, and dependency pinning aren't optional
- Eliminate secrets first (Managed Identity, OIDC), vault them second, never commit them
- Container images are attack surface: minimal base images, non-root, scan everything
- Network Policies = micro-segmentation: default deny, explicit allow
- Shift-left doesn't mean dumping security on developers: automate it in the pipeline
- Pre-commit hooks catch secrets BEFORE they're in Git history, where they live forever
🔥 Homework
- Run `gitleaks detect --source .` on your repo right now. Fix what you find.
- Run `trivy image <your-production-image>` and count the CRITICAL vulnerabilities.
- Check if your production Kubernetes namespaces have Network Policies: `kubectl get networkpolicies -A`
- Find one service using a service principal + client secret. Replace it with Managed Identity.
Next up in the series: *SRE Explained: Because "It Works on My Machine" Is Not an SLO*, where we decode SLOs, error budgets, incident management, and chaos engineering.
💬 Ever found a secret in your Git history? How did you handle it? Share below. This is a judgment-free zone. (We've all been there. ALL of us.) 🫣