Production-Grade Container Hardening

#container #security #docker #aws

In modern DevOps, running containers as root isn't just sloppy — it's an open invitation. If your application is compromised while running as root, the attacker isn't just inside your app. They own the entire container. Every secret, every mounted volume, every network socket.

The good news? You can architect containers where a successful exploit lands an attacker in a box with as little as possible: no root, no kernel privileges, no writable filesystem outside disposable scratch space, and no build tooling. That's what this article is about.

We're building a hardened, production-grade container designed to run on AWS ECS Fargate, using defense-in-depth at every layer: the image, the process manager, the filesystem, and the task definition itself.

Layer 1: The Multi-Stage Build — Asset Stripping, Not Just Space Saving

Most developers know multi-stage builds shrink image size. Fewer realize they're also your first line of defense.
The strategy is simple: build dirty, run clean. Your first stage installs compilers, pulls npm packages, runs tests — all the messy work. Your final stage inherits none of it.

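```dockerfile
# Stage 1: The dirty build environment
FROM node:20-alpine AS builder
WORKDIR /app
COPY . .
RUN npm ci && npm run build

# Stage 2: The clean runtime (no npm, no git, no source code)
FROM nginx:1.25-alpine
# --chown resolves names against this stage's /etc/passwd and /etc/group,
# so create appuser/appgroup (the Layer 2 RUN step) before this COPY
COPY --from=builder --chown=appuser:appgroup /app/client/dist /usr/share/nginx/html
```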

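Notice the --chown flag on the COPY instruction. Files land with the correct ownership as they are copied, so there is no follow-up RUN chown -R step, which would run as root and duplicate every file into an extra image layer. The flag resolves names against the final stage's /etc/passwd and /etc/group, so the account from Layer 2 must exist before this COPY runs (numeric IDs avoid that ordering constraint).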
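A quick way to confirm that "run clean" actually holds is to unpack the final image and look for build-stage leftovers. The sketch below assumes the image filesystem has already been exported to a local directory (the docker create / docker export commands in the comments are one way to do that); find_build_residue and its list of suspicious names are hypothetical, illustrative choices rather than an exhaustive scanner.

```bash
# Hypothetical helper: list build-stage leftovers in an unpacked runtime image.
# Assumes the image filesystem was exported to a directory first, e.g.:
#   docker create --name audit your-app:tag
#   mkdir rootfs && docker export audit | tar -x -C rootfs
find_build_residue() {
  root=$1
  # Directories and files that belong to the build stage (source maps leak source code)
  find "$root" \( -name .git -o -name node_modules -o -name package.json \
    -o -name .env -o -name '*.map' \) -prune -print
  # Toolchain binaries an attacker could reuse after a compromise
  for tool in npm node gcc cc make git; do
    for dir in bin usr/bin usr/local/bin; do
      if [ -e "$root/$dir/$tool" ]; then echo "$root/$dir/$tool"; fi
    done
  done
}
```

Empty output is what a clean runtime stage should produce; every printed path is something the final image should not be carrying.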

Layer 2: The Ghost Account — Least Privilege as Architecture

Like most base images, nginx:1.25-alpine starts its main process as root unless told otherwise. We fix that immediately.

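```dockerfile
RUN addgroup -S appgroup && adduser -S -D -H -s /sbin/nologin -G appgroup appuser
```

The -S flag creates a system account, -D leaves it without a password, -H skips creating a home directory, and -s /sbin/nologin sets a shell that refuses logins. The result: no password, no login shell, no home directory with a .bashrc to backdoor. It's a ghost account: it exists only so the kernel has a non-root identity to assign to your process.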

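```dockerfile
USER appuser
```

This single line changes everything. From this point forward, every RUN, CMD, and ENTRYPOINT executes as appuser, and the kernel itself enforces that ceiling. That is also why USER belongs near the end of the Dockerfile: build steps that still need root, such as the Nginx edits and chown in the next layer, must run before it.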
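To check that the ghost account really is one, you can inspect the image's /etc/passwd. A minimal sketch: check_ghost_account is a hypothetical helper, and it assumes you either have the image filesystem unpacked locally (for example via docker export) or run it inside the container with / as the root. It checks the directory on disk rather than the passwd home field, because passwd may still list a home path that was never created.

```bash
# Hypothetical helper. Usage: check_ghost_account PASSWD_FILE USER [ROOTFS]
# Passes when USER has a non-zero UID, a nologin/false shell and, if ROOTFS is
# given (an exported image, or / inside the container), no home directory on disk.
check_ghost_account() {
  passwd_file=$1
  user=$2
  rootfs=${3:-}
  entry=$(grep "^$user:" "$passwd_file") || { echo "missing: $user"; return 1; }
  uid=$(echo "$entry" | cut -d: -f3)
  home=$(echo "$entry" | cut -d: -f6)
  shell=$(echo "$entry" | cut -d: -f7)
  rc=0
  if [ "$uid" -eq 0 ]; then echo "$user: uid is 0 (root)"; rc=1; fi
  case $shell in
    */nologin|*/false) ;;
    *) echo "$user: login shell $shell"; rc=1 ;;
  esac
  if [ -n "$rootfs" ] && [ -d "$rootfs$home" ]; then
    echo "$user: home directory $home exists"; rc=1
  fi
  if [ "$rc" -eq 0 ]; then echo "ok: $user (uid $uid)"; fi
  return "$rc"
}
```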

Layer 3: Taming Nginx — The Privileged Citizen Problem

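Here's where it gets interesting. Standard Nginx assumes it's running as root. It wants to write its PID file to /var/run/nginx.pid and its request-body and proxy scratch files under /var/cache/nginx/, and it listens on port 80, which normally requires the NET_BIND_SERVICE capability we drop in Layer 5. Our appuser can't write to either of those paths. (Logs in /var/log/nginx/ are less of a problem: the official image already symlinks access.log and error.log to stdout and stderr.)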
Rather than granting extra permissions, we patch Nginx to work within our constraints:

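```dockerfile
# Root-only steps, so keep them above USER: move the PID file, drop the root-only
# "user" directive, and listen on 8080. The stock pid line is padded with spaces
# ("pid        /var/run/nginx.pid;"), so match on "^pid " rather than a literal.
RUN sed -i -e 's|^pid .*|pid /tmp/nginx.pid;|' -e '/^user /d' /etc/nginx/nginx.conf \
    && sed -i 's|listen\(.*[ :]\)80;|listen\18080;|' /etc/nginx/conf.d/default.conf

# Move request-body and proxy scratch files from /var/cache/nginx to /tmp, owned by appuser
RUN printf '%s\n' 'client_body_temp_path /tmp/client_temp;' 'proxy_temp_path /tmp/proxy_temp;' \
        'fastcgi_temp_path /tmp/fastcgi_temp;' 'uwsgi_temp_path /tmp/uwsgi_temp;' \
        'scgi_temp_path /tmp/scgi_temp;' > /etc/nginx/conf.d/00-temp-paths.conf \
    && chown -R appuser:appgroup /tmp
```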
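The stock nginx.conf pads its pid directive with spaces, which is easy to miss: a sed pattern written with a single space matches nothing and leaves the PID path unchanged without any error. A minimal sketch for reviewing the rewrites on a copy of the config before baking them into the image with sed -i; patch_nginx_conf and patch_default_conf are hypothetical names. Besides moving the PID file, they delete the "user" directive (meaningful only when the master runs as root) and move the listen port from 80 to the unprivileged 8080.

```bash
# Hypothetical helpers: print patched copies of the Nginx configs for review.
patch_nginx_conf() {
  # Move the PID file to /tmp (tolerating the padded "pid        ..." line) and
  # delete the "user" directive, which only applies when the master runs as root
  sed -e 's|^pid .*|pid /tmp/nginx.pid;|' -e '/^user /d' "$1"
}

patch_default_conf() {
  # Rewrite "listen 80;" and "listen [::]:80;" to port 8080, keeping the spacing.
  # Requiring a space or colon before "80;" keeps a second run from touching 8080.
  sed -e 's|listen\(.*[ :]\)80;|listen\18080;|' "$1"
}
```

For example, save the stock file with docker run --rm nginx:1.25-alpine cat /etc/nginx/nginx.conf > nginx.conf and run patch_nginx_conf nginx.conf; if the pid line in the output still points at /var/run, the pattern did not match.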

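We're not lowering the security bar to accommodate Nginx; we're forcing Nginx to operate within our security model. The PID file and all scratch storage move to /tmp, which we then mount as an ephemeral, hardened tmpfs volume in the task definition. One caveat: the ECS documentation lists the tmpfs parameter as unsupported on the Fargate launch type, so the block below works as written on ECS with EC2. On Fargate, the usual substitute is a task volume with no host path (backed by the task's ephemeral storage) mounted at /tmp through mountPoints, and that route does not let you set noexec, nosuid, or nodev.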

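```hcl
# Container-definition fragment (Terraform-style HCL)
readonlyRootFilesystem = true # container-level field, not part of linuxParameters

linuxParameters = {
  # tmpfs sizes are in MiB; ECS honors tmpfs on the EC2 launch type only
  tmpfs = [
    { containerPath = "/tmp",      size = 128, mountOptions = ["noexec", "nosuid", "nodev"] },
    { containerPath = "/app/logs", size = 64,  mountOptions = ["noexec", "nosuid", "nodev"] },
  ]
}
```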

Those three mount options are doing serious work:

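noexec: Nothing in /tmp can be executed directly. Even if an attacker writes a binary there, the kernel refuses to run it (an interpreter already in the image can still read a script from the mount, so noexec narrows the attack rather than closing it).
nosuid: Blocks privilege escalation via setuid binaries dropped into the volume; setuid and setgid bits on the mount are ignored.
nodev: Device files on the mount are not honored, so a device node placed there cannot be used to reach hardware or kernel memory directly.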

And readonlyRootFilesystem = true is the crown jewel: the container's root filesystem is immutable at runtime. The only writable paths are the explicitly mounted scratch volumes, and with noexec on those tmpfs mounts, nothing written there can be executed directly.
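Whether this hardening actually took effect is visible from inside the running container in /proc/mounts. A minimal self-check sketch, assuming a shell is reachable in the container (for example through ECS Exec); check_mounts, mount_opts, and has_opt are hypothetical names. It reports a writable root filesystem and any scratch path that is missing noexec, nosuid, or nodev, including the case where the runtime silently did not apply them.

```bash
# Hypothetical runtime self-check. /proc/mounts fields: device mountpoint type options dump pass
mount_opts() {
  # Options of the last entry for mount point $2 in file $1 (a later mount hides earlier ones)
  awk -v p="$2" '$2 == p { opts = $4 } END { print opts }' "$1"
}

has_opt() {
  # True if the comma-separated option list $1 contains option $2 exactly
  case ",$1," in
    *",$2,"*) return 0 ;;
    *) return 1 ;;
  esac
}

check_mounts() {
  # Usage: check_mounts MOUNTS_FILE SCRATCH_PATH...  e.g. check_mounts /proc/mounts /tmp /app/logs
  mounts_file=$1
  shift
  rc=0
  has_opt "$(mount_opts "$mounts_file" /)" ro || { echo "/: root filesystem is writable"; rc=1; }
  for path in "$@"; do
    opts=$(mount_opts "$mounts_file" "$path")
    if [ -z "$opts" ]; then
      echo "$path: not a separate mount"; rc=1; continue
    fi
    for opt in noexec nosuid nodev; do
      has_opt "$opts" "$opt" || { echo "$path: missing $opt"; rc=1; }
    done
  done
  if [ "$rc" -eq 0 ]; then echo "mounts ok"; fi
  return "$rc"
}
```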

Layer 4: Supervisord Without the Crown

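In a traditional setup, a process manager like systemd runs as root. We use Supervisord instead (installed in the runtime stage, for example with apk add --no-cache supervisor, which also pulls in Python), and we strip its crown before it starts.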

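```ini
[supervisord]
; Stay in the foreground: in a container, supervisord is the main process
nodaemon=true
user=appuser
logfile=/tmp/supervisord.log
pidfile=/tmp/supervisord.pid

[program:nginx]
command=nginx -g 'daemon off;'
; /dev/stdout and /dev/stderr are not seekable, so rotation must be off (maxbytes=0)
stdout_logfile=/dev/stdout
stdout_logfile_maxbytes=0
stderr_logfile=/dev/stderr
stderr_logfile_maxbytes=0
```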

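user=appuser means even the manager of processes has no administrative power. Because the image already starts as appuser, the setting acts as a guard: if the container is ever launched as root, supervisord switches to appuser before spawning anything, and while it runs as appuser it cannot switch to any other account. It coordinates but cannot escalate.
The stdout_logfile=/dev/stdout line solves another problem quietly: Nginx's output goes straight to the container's stdout and stderr, where the task's log driver (for example awslogs) picks it up, and it is never written to disk inside the container. No sensitive log data sitting in a writable layer. No persistence for an attacker to mine.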
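Two supervisord settings are easy to forget in a container: nodaemon=true, without which supervisord forks into the background and the container's main process exits, and *_logfile_maxbytes=0 for any log sent to /dev/stdout or /dev/stderr, because supervisord cannot rotate a non-seekable device. A minimal lint sketch; lint_supervisord_conf is a hypothetical helper that understands only simple key=value lines, not the full supervisord configuration grammar.

```bash
# Hypothetical lint for a supervisord.conf destined for a container.
lint_supervisord_conf() {
  awk '
    /^[ \t]*[;#]/ { next }                      # skip comment lines
    /^\[/ { section = $0; next }                # remember the current [section]
    {
      line = $0; gsub(/[ \t]/, "", line)        # "key = value" -> "key=value"
      split(line, kv, "="); key = kv[1]
      val = substr(line, length(key) + 2)
      if (section == "[supervisord]" && key == "nodaemon" && val == "true") nodaemon = 1
      if (key ~ /_logfile$/ && (val == "/dev/stdout" || val == "/dev/stderr"))
        needs[section SUBSEP key "_maxbytes"] = 1
      if (key ~ /_logfile_maxbytes$/ && val == "0") have[section SUBSEP key] = 1
    }
    END {
      rc = 0
      if (!nodaemon) { print "[supervisord]: nodaemon=true is missing"; rc = 1 }
      for (k in needs) if (!(k in have)) {
        split(k, parts, SUBSEP); print parts[1] ": " parts[2] "=0 is missing"; rc = 1
      }
      if (rc == 0) print "supervisord.conf ok"
      exit rc
    }' "$1"
}
```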

Layer 5: Dropping Linux Capabilities — Cutting the Kernel's Leash

Even a non-root user can hold Linux capabilities — granular kernel permissions like the ability to bind low-numbered ports (NET_BIND_SERVICE), manipulate network interfaces (NET_ADMIN), or bypass file permission checks (DAC_OVERRIDE).

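Every capability your container holds is an attack surface. The principle is simple: if you don't need it, drop it. Dropping everything has one practical consequence for Nginx: binding a port below 1024 normally requires NET_BIND_SERVICE (unless the runtime lowers the net.ipv4.ip_unprivileged_port_start sysctl), so the simplest setup is to have Nginx listen on an unprivileged port such as 8080 and set the container port mapping to match.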

In your Fargate task definition:
```hcl
linuxParameters = {
  capabilities = {
    drop = ["ALL"]
  }
}
```

After dropping all capabilities, your container's /proc/self/status should show:

```text
CapEff: 0000000000000000
CapBnd: 0000000000000000
```

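CapEff at zero means the process has no active kernel privileges. CapBnd at zero means it can never acquire any: capabilities removed from the bounding set cannot be added back, not even by a setuid-root or file-capability binary. The bounding set is where dropping ALL makes the real difference; a non-root process like appuser's usually shows CapEff at zero anyway, while CapBnd still lists the runtime's default set until you drop it. The kernel's leash is cut.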
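The hex masks in /proc/&lt;pid&gt;/status are bitfields indexed by the capability numbers in &lt;linux/capability.h&gt;, so decoding them shows exactly what a process could still do. A minimal sketch; decode_caps and show_caps are hypothetical helpers, the name table stops at CHECKPOINT_RESTORE (bit 40), and it assumes a shell with 64-bit arithmetic.

```bash
# Capability names in bit order, following <linux/capability.h> (CAP_ prefix omitted)
CAP_NAMES="CHOWN DAC_OVERRIDE DAC_READ_SEARCH FOWNER FSETID KILL SETGID SETUID SETPCAP
LINUX_IMMUTABLE NET_BIND_SERVICE NET_BROADCAST NET_ADMIN NET_RAW IPC_LOCK IPC_OWNER
SYS_MODULE SYS_RAWIO SYS_CHROOT SYS_PTRACE SYS_PACCT SYS_ADMIN SYS_BOOT SYS_NICE
SYS_RESOURCE SYS_TIME SYS_TTY_CONFIG MKNOD LEASE AUDIT_WRITE AUDIT_CONTROL SETFCAP
MAC_OVERRIDE MAC_ADMIN SYSLOG WAKE_ALARM BLOCK_SUSPEND AUDIT_READ PERFMON BPF
CHECKPOINT_RESTORE"

decode_caps() {
  # Print the names of the bits set in hex mask $1, or "(none)"
  mask=$((0x$1))
  bit=0
  names=""
  for name in $CAP_NAMES; do
    if [ $(( (mask >> bit) & 1 )) -eq 1 ]; then names="$names $name"; fi
    bit=$((bit + 1))
  done
  if [ -n "$names" ]; then echo "${names# }"; else echo "(none)"; fi
}

show_caps() {
  # Usage: show_caps /proc/1/status   (PID 1 is supervisord in this design)
  for field in CapEff CapBnd; do
    hex=$(awk -v f="$field:" '$1 == f { print $2 }' "$1")
    printf '%s %s: %s\n' "$field" "$hex" "$(decode_caps "$hex")"
  done
}
```

For reference, 00000000a80425fb, the bounding set a container typically has when nothing is dropped, decodes to 14 capabilities including DAC_OVERRIDE, NET_BIND_SERVICE, and NET_RAW.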

Layer 6: The Task Definition as a Second Lock

Your Dockerfile hardens the image. Your Fargate Task Definition hardens the runtime. These are two independent locks on the same door.

```json
{
  "containerDefinitions": [
    {
      "privileged": false,
      "user": "appuser",
      "readonlyRootFilesystem": true,
      "linuxParameters": {
        "capabilities": { "drop": ["ALL"] }
      }
    }
  ]
}
```

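Why does this matter if the Dockerfile already sets USER appuser? Because ECS applies the task definition when Fargate launches the container, independently of what the image declares. Even if someone pushes a misconfigured image that forgot the USER directive, the process still starts as appuser, provided that account exists in the image's /etc/passwd; if it doesn't, the container fails to start, which is the safe way to fail. Pinning a numeric UID in the Dockerfile (adduser -u) and using that number here removes the dependency on the name. Defense-in-depth means each layer protects against the failure of the layer before it.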

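privileged: false is the explicit rejection of the Docker --privileged flag, which would otherwise give the container near-full host access. Fargate does not support privileged containers at all, so on Fargate this line documents intent; it starts to matter if the same definition is ever reused on the EC2 launch type, where the host is your own instance.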
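Because the task definition is a second lock, it is worth failing the pipeline when someone removes one of its settings. A deliberately crude sketch, assuming the task definition has been rendered to JSON (for example by aws ecs describe-task-definition); lint_task_def and td_has are hypothetical helpers. Stripping all whitespace lets the same patterns match compact and pretty-printed JSON, but the check cannot tell which container a setting belongs to, so a production gate would parse the JSON with a tool such as jq instead.

```bash
# Hypothetical pre-deploy lint for a rendered task definition (crude, grep-based).
td_has() {
  # True if the whitespace-stripped JSON in file $1 matches the extended regex $2
  tr -d '[:space:]' < "$1" | grep -Eq "$2"
}

lint_task_def() {
  f=$1
  rc=0
  td_has "$f" '"readonlyRootFilesystem":true' || { echo "missing: readonlyRootFilesystem=true"; rc=1; }
  td_has "$f" '"privileged":false'            || { echo "missing: privileged=false"; rc=1; }
  td_has "$f" '"drop":\[[^]]*"ALL"'           || { echo "missing: capabilities drop ALL"; rc=1; }
  td_has "$f" '"user":"[^"]+"'                || { echo "missing: user"; rc=1; }
  # Reject an explicit root user, by name or UID, with or without a group
  if td_has "$f" '"user":"(root|0)(:[^"]*)?"'; then echo "user is root"; rc=1; fi
  if [ "$rc" -eq 0 ]; then echo "task definition ok"; fi
  return "$rc"
}
```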

Layer 7: Trust But Verify — Container Image Signing with Notation

Hardening your runtime is only half the story. How do you know the image you're deploying is the one you built? Supply chain attacks — where a malicious image is substituted somewhere between your registry and your cluster — are a growing threat.

Notation, the command-line tool of the CNCF Notary Project, lets you cryptographically sign container images and verify those signatures before deployment.

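```bash
# Sign after pushing to ECR. Sign the digest, not a tag: tags like :latest are mutable.
# Assumes a signing key or signing plugin is already configured for notation.
notation sign <your-ecr-registry>/your-app@sha256:<digest>

# Verify before deploying (assumes a trust store and trust policy are configured)
notation verify <your-ecr-registry>/your-app@sha256:<digest>
```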

Integrate this into your CI/CD pipeline: sign on push, verify on deploy. If verification fails, notation verify exits with a non-zero status and the deployment stops. Deploy the same digest you verified, not a mutable tag, and you get cryptographic proof that what's running in Fargate is exactly what your pipeline built: no substitutions, no tampering.
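The sign-on-push, verify-on-deploy rule can be enforced with a small gate in the deploy job. A minimal sketch, assuming notation is installed with its trust store and trust policy configured; gated_deploy and deploy_image are hypothetical names, and deploy_image stands in for whatever step registers the task definition. Refusing tag references keeps the image that gets deployed identical to the one that was verified.

```bash
# Hypothetical deploy gate: only digest-pinned, successfully verified images get deployed.
deploy_image() {
  # Placeholder for the real step, e.g. registering a task definition that pins this digest
  echo "deploying $1"
}

gated_deploy() {
  ref=$1
  case $ref in
    *@sha256:*) ;;  # digest reference: immutable, so verify-then-deploy targets one artifact
    *) echo "refusing: $ref is not pinned by digest" >&2; return 2 ;;
  esac
  if notation verify "$ref"; then
    deploy_image "$ref"
  else
    echo "refusing: signature verification failed for $ref" >&2
    return 1
  fi
}
```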

🛡️ Security Hardening Matrix

| Layer | Focus | Risk | Threat | Mitigation |
| --- | --- | --- | --- | --- |
| L1 | Multi-Stage Build | Build-time residue (compilers, `.git`, secrets) left in image. | Lateral Movement: Attackers use leftover tools to compile malware or pivot deeper into the network. | Build Dirty, Run Clean: Separate build and runtime stages; only production assets move to the final image. |
| L2 | Non-Root Identity | Containers running as `root` (UID 0) by default. | Host Escape: Exploits (e.g., CVE-2024-21626) allow a root process to break out and control the host. | Ghost Accounts: Create a system `appuser` with no login shell or home directory to enforce a permission "ceiling." |
| L3 | Hardened Nginx | App requires root access to write to system paths like `/var/run`. | Runtime Tampering: Attackers overwrite configs or web files to serve malware or redirect traffic. | User-Owned Paths: Patch Nginx to use `/tmp` for PIDs/cache and `chown` those paths to the `appuser`. |
| L3/L6 | Immutable Filesystem | Writable runtime layers allow attackers to modify the OS environment. | Malware Persistence: Attackers download web shells or scripts that survive as long as the container runs. | The "DVD" Model: Enable `readonlyRootFilesystem` and, where the launch type supports them, use `tmpfs` mounts with `noexec` to block execution. |
| L4 | Unprivileged Manager | Process managers (Supervisord) traditionally running with root "crowns." | Privilege Escalation: A hijacked manager grants the attacker "God Mode" over all managed sub-processes. | Powerless Manager: Run `supervisord` as `appuser` and stream logs to `stdout` to prevent local data mining. |
| L5 | Kernel Capabilities | Granular kernel permissions (Capabilities) active by default. | Privilege Jumping: Attackers use `NET_RAW` or `DAC_OVERRIDE` to sniff traffic or bypass file security. | The Blackout: Use `drop = ["ALL"]` to zero out `CapEff` and `CapBnd`, stripping all kernel-level privileges. |
| L7 | Image Signing | Unverified images pulled from an untrusted or compromised registry. | Supply Chain Attack: An attacker swaps a legitimate image with a "poisoned" version containing a backdoor. | Notation (CNCF): Cryptographically sign images by digest in CI/CD and verify signatures before every deployment. |