DEV Community

Samson Tanimawo

Container Security for SREs: The Practical Checklist

Security Is Part of Reliability

SREs think about availability, latency, and throughput. But a security breach is just another type of incident, and often the worst kind. Here's the container security checklist I use.

The Base Image Problem

# Bad: 800MB image with everything, including gcc
FROM ubuntu:22.04
RUN apt-get update && apt-get install -y python3 python3-pip
COPY . /app
# Without a WORKDIR, pip would look for requirements.txt in / and fail
WORKDIR /app
RUN pip install -r requirements.txt

# Good: 50MB image with only what's needed
FROM python:3.11-slim AS builder
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

FROM python:3.11-slim
# Copy only the installed packages from the builder stage
COPY --from=builder /usr/local/lib/python3.11/site-packages /usr/local/lib/python3.11/site-packages
COPY . /app
# Run as an unprivileged user instead of root
USER nobody

Smaller image = smaller attack surface. The multi-stage build removes build tools from the final image.

The Security Checklist

1. Image Scanning

# GitHub Actions: Scan before pushing
# (in real pipelines, pin the action to a release tag or commit SHA rather than a branch)
- name: Scan image for vulnerabilities
  uses: aquasecurity/trivy-action@master
  with:
    image-ref: 'myapp:${{ github.sha }}'
    format: 'table'
    exit-code: '1' # Fail build on HIGH/CRITICAL
    severity: 'HIGH,CRITICAL'
    ignore-unfixed: true # Only fail on fixable vulns
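To make the gate's behavior concrete, here is a minimal Python sketch of the same decision applied to a Trivy JSON report (for example, one produced with `--format json`). It assumes the report's `Results[].Vulnerabilities[]` layout with `VulnerabilityID`, `Severity`, and `FixedVersion` fields; the function name `blocking_vulns` and the sample report are hypothetical.

```python
from typing import Any

BLOCKING = {"HIGH", "CRITICAL"}


def blocking_vulns(report: dict[str, Any], ignore_unfixed: bool = True) -> list[str]:
    """Return the IDs of vulnerabilities that should fail the build."""
    found = []
    for result in report.get("Results") or []:
        # "Vulnerabilities" may be missing or null when a target is clean.
        for vuln in result.get("Vulnerabilities") or []:
            if vuln.get("Severity") not in BLOCKING:
                continue
            if ignore_unfixed and not vuln.get("FixedVersion"):
                continue  # no patched version yet, so nothing to act on
            found.append(vuln["VulnerabilityID"])
    return found


# Hypothetical report, trimmed to the fields used above.
sample = {
    "Results": [
        {"Target": "myapp", "Vulnerabilities": [
            {"VulnerabilityID": "CVE-A", "Severity": "CRITICAL", "FixedVersion": "1.2.3"},
            {"VulnerabilityID": "CVE-B", "Severity": "HIGH"},
            {"VulnerabilityID": "CVE-C", "Severity": "LOW", "FixedVersion": "2.0"},
        ]},
        {"Target": "requirements.txt", "Vulnerabilities": None},
    ]
}

print(blocking_vulns(sample))                        # fixable HIGH/CRITICAL only
print(blocking_vulns(sample, ignore_unfixed=False))  # also counts unfixed findings
```

A non-empty result corresponds to the step exiting with code 1 and failing the build.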

2. Non-Root Container

# Always run as non-root
RUN addgroup --system app && adduser --system --ingroup app app
USER app

# Kubernetes: Enforce non-root (container-level securityContext)
securityContext:
  runAsNonRoot: true
  runAsUser: 1000
  readOnlyRootFilesystem: true
  allowPrivilegeEscalation: false
  capabilities:
    drop: ["ALL"]
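The hardening settings above are easy to check mechanically. The following is a minimal sketch, assuming a container spec already parsed into a Python dict (for example from `kubectl get pod -o json`); the helper name `hardening_gaps` is hypothetical, and the check deliberately ignores values inherited from the pod-level securityContext.

```python
def hardening_gaps(container: dict) -> list[str]:
    """List which of the four hardening settings a container spec is missing."""
    ctx = container.get("securityContext") or {}
    gaps = []
    if ctx.get("runAsNonRoot") is not True:
        gaps.append("runAsNonRoot")
    if ctx.get("readOnlyRootFilesystem") is not True:
        gaps.append("readOnlyRootFilesystem")
    # Privilege escalation must be switched off explicitly.
    if ctx.get("allowPrivilegeEscalation") is not False:
        gaps.append("allowPrivilegeEscalation")
    dropped = (ctx.get("capabilities") or {}).get("drop") or []
    if "ALL" not in dropped:
        gaps.append("capabilities.drop")
    return gaps


# Hypothetical container specs for illustration.
hardened = {
    "name": "api",
    "securityContext": {
        "runAsNonRoot": True,
        "runAsUser": 1000,
        "readOnlyRootFilesystem": True,
        "allowPrivilegeEscalation": False,
        "capabilities": {"drop": ["ALL"]},
    },
}
bare = {"name": "legacy", "image": "legacy:latest"}

print(hardening_gaps(hardened))
print(hardening_gaps(bare))
```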

3. Network Policies

# Default deny all traffic
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
spec:
  podSelector: {}
  policyTypes:
    - Ingress
    - Egress

---
# Allow only specific traffic
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: api-service-policy
spec:
  podSelector:
    matchLabels:
      app: api-service
  ingress:
    - from:
        # A podSelector on its own only matches pods in this namespace
        - podSelector:
            matchLabels:
              app: nginx-ingress
      ports:
        - port: 8080
  egress:
    # With default-deny egress, DNS (port 53) needs its own allow rule if the app resolves hostnames
    - to:
        - podSelector:
            matchLabels:
              app: postgres
      ports:
        - port: 5432
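NetworkPolicies are additive: once any policy selects a pod for ingress, only traffic matched by some rule in those policies gets through. The sketch below models that for the two policies above. It is a deliberately simplified model (single namespace, `podSelector` peers with `matchLabels`, numeric ports only), and the function names are hypothetical rather than part of any Kubernetes client.

```python
def matches(selector: dict, labels: dict) -> bool:
    """True if every matchLabels entry is present; an empty selector matches all pods."""
    wanted = (selector or {}).get("matchLabels") or {}
    return all(labels.get(k) == v for k, v in wanted.items())


def ingress_allowed(policies: list[dict], src: dict, dst: dict, port: int) -> bool:
    """Decide whether src may reach dst on port under the simplified model."""
    selecting = [
        p for p in policies
        if matches(p["spec"].get("podSelector"), dst)
        and "Ingress" in p["spec"].get("policyTypes", ["Ingress"])
    ]
    if not selecting:
        return True  # no policy selects the pod, so it is not isolated
    for policy in selecting:
        for rule in policy["spec"].get("ingress") or []:
            peers, ports = rule.get("from"), rule.get("ports")
            # A missing or empty "from"/"ports" list matches everything.
            peer_ok = not peers or any(
                "podSelector" in peer and matches(peer["podSelector"], src) for peer in peers
            )
            port_ok = not ports or any(p.get("port") == port for p in ports)
            if peer_ok and port_ok:
                return True
    return False


default_deny = {"spec": {"podSelector": {}, "policyTypes": ["Ingress", "Egress"]}}
api_policy = {"spec": {
    "podSelector": {"matchLabels": {"app": "api-service"}},
    "ingress": [{
        "from": [{"podSelector": {"matchLabels": {"app": "nginx-ingress"}}}],
        "ports": [{"port": 8080}],
    }],
}}
policies = [default_deny, api_policy]

print(ingress_allowed(policies, {"app": "nginx-ingress"}, {"app": "api-service"}, 8080))
print(ingress_allowed(policies, {"app": "batch-job"}, {"app": "api-service"}, 8080))
print(ingress_allowed(policies, {"app": "api-service"}, {"app": "postgres"}, 5432))
```

The last call is worth noting: under this model, traffic from api-service to postgres is still denied, because default-deny-all also isolates the postgres pod and no policy here allows ingress to it. The egress rule on api-service covers only one side of the connection.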

4. Secrets Management

# Bad: Secret hardcoded in the pod spec
env:
  - name: DB_PASSWORD
    value: "super-secret-password" # Visible in pod spec!

# Good: Secret referenced from a Kubernetes Secret
env:
  - name: DB_PASSWORD
    valueFrom:
      secretKeyRef:
        name: db-credentials
        key: password

# Best: Secrets injected from Vault
annotations:
  vault.hashicorp.com/agent-inject: "true"
  vault.hashicorp.com/role: "api-service"
  vault.hashicorp.com/agent-inject-secret-db: "secret/data/db"
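The "bad" pattern can be caught automatically. This is a rough sketch, assuming a pod spec parsed into a Python dict; the name pattern and the helper `inline_secrets` are hypothetical heuristics, so expect both false positives and misses.

```python
import re

# Heuristic: env var names that usually hold credentials.
SECRET_NAME = re.compile(r"PASSWORD|PASSWD|SECRET|TOKEN|API_?KEY", re.IGNORECASE)


def inline_secrets(pod_spec: dict) -> list[str]:
    """Return 'container/ENV_NAME' for secret-looking env vars set with a literal value."""
    hits = []
    containers = (pod_spec.get("containers") or []) + (pod_spec.get("initContainers") or [])
    for container in containers:
        for env in container.get("env") or []:
            # A literal uses "value"; Secret references use "valueFrom" instead.
            if "value" in env and SECRET_NAME.search(env.get("name", "")):
                hits.append(f'{container.get("name", "?")}/{env["name"]}')
    return hits


# Hypothetical pod spec for illustration.
pod_spec = {
    "containers": [{
        "name": "api",
        "env": [
            {"name": "DB_PASSWORD", "value": "super-secret-password"},
            {"name": "LOG_LEVEL", "value": "info"},
            {"name": "API_TOKEN", "valueFrom": {"secretKeyRef": {"name": "api", "key": "token"}}},
        ],
    }]
}

print(inline_secrets(pod_spec))
```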

5. Resource Limits (Security Aspect)

# Without limits, a compromised container can consume all resources
resources:
  requests:
    cpu: 100m
    memory: 128Mi
  limits:
    cpu: 500m
    memory: 256Mi
    ephemeral-storage: 100Mi # Prevent disk filling attacks
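A check for the limits above can be a few lines. This sketch assumes a container spec parsed into a Python dict and treats all three limits as required, which is this example's policy choice rather than a Kubernetes rule; `missing_limits` is a hypothetical helper name.

```python
REQUIRED_LIMITS = ("cpu", "memory", "ephemeral-storage")


def missing_limits(container: dict) -> list[str]:
    """Return the names of required resource limits that the container does not set."""
    limits = (container.get("resources") or {}).get("limits") or {}
    return [name for name in REQUIRED_LIMITS if name not in limits]


# Hypothetical container specs for illustration.
complete = {"resources": {"limits": {"cpu": "500m", "memory": "256Mi", "ephemeral-storage": "100Mi"}}}
partial = {"resources": {"requests": {"cpu": "100m"}, "limits": {"memory": "256Mi"}}}

print(missing_limits(complete))
print(missing_limits(partial))
```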

6. Pod Security Standards

# Enforce restricted security standard at namespace level
apiVersion: v1
kind: Namespace
metadata:
  name: production
  labels:
    pod-security.kubernetes.io/enforce: restricted
    pod-security.kubernetes.io/audit: restricted
    pod-security.kubernetes.io/warn: restricted
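Because the standard is applied per namespace through labels, it is worth checking which namespaces lack the `enforce` label. A minimal sketch, assuming the parsed output of `kubectl get namespaces -o json`; the helper name `unenforced_namespaces` and the sample data are hypothetical.

```python
ENFORCE_LABEL = "pod-security.kubernetes.io/enforce"


def unenforced_namespaces(namespace_list: dict, level: str = "restricted") -> list[str]:
    """Return namespaces whose enforce label is missing or set to a different level."""
    names = []
    for ns in namespace_list.get("items") or []:
        labels = ns.get("metadata", {}).get("labels") or {}
        if labels.get(ENFORCE_LABEL) != level:
            names.append(ns["metadata"]["name"])
    return names


# Hypothetical namespace list, trimmed to the fields used above.
namespaces = {"items": [
    {"metadata": {"name": "production", "labels": {ENFORCE_LABEL: "restricted"}}},
    {"metadata": {"name": "staging", "labels": {ENFORCE_LABEL: "baseline"}}},
    {"metadata": {"name": "sandbox"}},
]}

print(unenforced_namespaces(namespaces))
```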

The Audit Automation

I run this weekly:

#!/bin/bash
# weekly-security-audit.sh
# Requires kubectl, jq, and skopeo on the PATH.

echo "=== Image Age Check ==="
kubectl get pods -A -o json | jq -r '.items[] | .spec.containers[] | .image' | sort -u | while read -r img; do
  # skopeo can fail for private or unreachable registries; report "unknown" in that case
  age=$(skopeo inspect "docker://$img" 2>/dev/null | jq -r '.Created')
  echo "$img built: ${age:-unknown}"
done

echo "=== Privileged Containers ==="
# any() lists each pod once, even when several of its containers match
kubectl get pods -A -o json | jq -r '.items[] | select(any(.spec.containers[]; .securityContext.privileged == true)) | .metadata.namespace + "/" + .metadata.name'

echo "=== Containers Running as Root ==="
# Flags pods where runAsNonRoot is not enforced; a container-level value overrides the pod-level one
kubectl get pods -A -o json | jq -r '.items[] | .spec.securityContext.runAsNonRoot as $pod | select(any(.spec.containers[]; (.securityContext.runAsNonRoot | if . == null then $pod else . end) != true)) | .metadata.namespace + "/" + .metadata.name'

echo "=== Missing Resource Limits ==="
kubectl get pods -A -o json | jq -r '.items[] | select(any(.spec.containers[]; .resources.limits == null)) | .metadata.namespace + "/" + .metadata.name'

Every finding becomes a ticket. No exceptions.
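The image age check prints build dates but leaves the "too old" decision to whoever reads the output. One way to turn it into a finding is sketched below in Python. The 90-day threshold is an arbitrary assumption, the function names are hypothetical, and the parser assumes UTC timestamps like those in the `Created` field the script reads from `skopeo inspect`.

```python
import re
from datetime import datetime, timedelta, timezone


def parse_created(created: str) -> datetime:
    """Parse the date and time from an RFC 3339 timestamp, assuming UTC."""
    # Fractional seconds are dropped; they can have more digits than strptime accepts.
    match = re.match(r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}", created)
    if not match:
        raise ValueError(f"unrecognized timestamp: {created!r}")
    return datetime.strptime(match.group(0), "%Y-%m-%dT%H:%M:%S").replace(tzinfo=timezone.utc)


def stale_images(created_by_image: dict[str, str], now: datetime, max_age_days: int = 90) -> list[str]:
    """Return images built more than max_age_days before now, sorted by name."""
    cutoff = now - timedelta(days=max_age_days)
    return sorted(
        img for img, created in created_by_image.items() if parse_created(created) < cutoff
    )


# Hypothetical data: image reference -> "Created" value.
images = {
    "myapp:abc123": "2024-05-20T10:00:00.123456789Z",
    "postgres:15": "2023-11-01T00:00:00Z",
}
now = datetime(2024, 6, 1, tzinfo=timezone.utc)

print(stale_images(images, now))
```

Each image returned here would become one ticket under the rule above.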

If you want automated security monitoring for your container infrastructure, check out what we're building at Nova AI Ops.


Written by Dr. Samson Tanimawo
BSc · MSc · MBA · PhD
Founder & CEO, Nova AI Ops. https://novaaiops.com
