Liam Romanis

I built a container escape audit tool — here's what v4.0 adds

canonical_url: https://github.com/liamromanis101/K8s-container_escape_audit

Container security tooling tends to fall into two camps: heavyweight scanners that run outside the container before deployment, and ad-hoc one-liners you paste into a shell when something looks wrong. container_escape_audit.sh sits in neither — it runs inside a live container, checks the actual runtime environment, and tells you exactly what an attacker who just landed in that container would be looking at.

Version 4.0 adds 12 kernel hardening checks, a config-driven CVE engine, and a database of 10 current kernel CVEs — including three that are actively exploited in the wild right now. This post walks through what's new and why the config-driven approach matters.

Repo: github.com/liamromanis101/K8s-container_escape_audit


The quick version

# grab both files
curl -sO https://raw.githubusercontent.com/liamromanis101/K8s-container_escape_audit/main/container_escape_audit.sh
curl -sO https://raw.githubusercontent.com/liamromanis101/K8s-container_escape_audit/main/cve_checks.conf
chmod +x container_escape_audit.sh
./container_escape_audit.sh

It runs in 15–45 seconds, writes a structured report, and exits. No installation. No root required (though running as root inside the container gives more complete results). Apart from the report file, nothing is written to the system: every check is read-only.

Output looks like this:

[CRIT]  Container appears PRIVILEGED (CapEff=0000003fffffffff)
[CRIT]  VULNERABLE to Copy Fail (CVE-2026-31431) — AEAD socket bindable, kernel 6.1.112
[WARN]  kptr_restrict=1 — kernel pointers visible to root processes
[WARN]  Unprivileged user namespaces enabled (kernel.unprivileged_userns_clone=1)
[ OK ]  cgroup v2 subtree_control is not writable
[ OK ]  No readable SSH private keys found

==================== SUMMARY ====================
  [CRITICAL] Container is running in privileged mode
  [CRITICAL] Copy Fail (CVE-2026-31431) AF_ALG exposure — CRITICAL [ITW] [CISA-KEV]
  [HIGH    ] kptr_restrict=1: kernel pointers visible to root
  [HIGH    ] Unprivileged user namespace creation is enabled

  CRITICAL: 2  |  HIGH: 5  |  MEDIUM: 3  |  INFO: 4

Every finding in the full report has four fields: what it is, impact, exploitability, and recommendation.


What it checks

The script runs 47 checks across four sections plus the CVE engine.

Checks 1–23 are the classic container escape vectors most people are familiar with — privileged mode, dangerous capabilities, host namespace sharing, dangerous mounts, /proc exposure, Kubernetes service account tokens, writable cron and auth files, runtime sockets, SUID binaries, and so on.

Checks 24–35 cover newer runtime attack surface: NVIDIAScape (CVE-2025-23266), the runc masked-path race trio (CVE-2025-31133/-52565/-52881), eBPF exposure, debugfs, Kubernetes RBAC active probing via SelfSubjectAccessReview, kernel keyring access, OCI hook injection paths, page cache write primitives (splice + pipe2), and procfs namespace FD leakage.

Checks 36–47 are new in v4.0 — kernel hardening posture. More on these below.

The CVE engine reads cve_checks.conf and runs compound checks against each entry. Also new in v4.0.
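
To make the privileged-mode and capability checks a bit more concrete, here is a minimal Python sketch of how a CapEff mask like the one in the sample output can be decoded. It is an illustration, not the script's actual code: the function names are hypothetical, the short list of capabilities treated as dangerous is an assumption, and the bit numbers are the ones defined in linux/capability.h.

```python
# Sketch only: decode a CapEff mask and report escape-relevant capabilities.
# Bit numbers follow linux/capability.h; the selection below is illustrative.
DANGEROUS_CAPS = {
    "CAP_DAC_READ_SEARCH": 2,
    "CAP_NET_ADMIN": 12,
    "CAP_SYS_MODULE": 16,
    "CAP_SYS_RAWIO": 17,
    "CAP_SYS_PTRACE": 19,
    "CAP_SYS_ADMIN": 21,
}


def read_cap_eff(status_path="/proc/self/status"):
    """Return the CapEff hex string from a /proc/<pid>/status file, or None."""
    try:
        with open(status_path) as fh:
            for line in fh:
                if line.startswith("CapEff:"):
                    return line.split()[1]
    except OSError:
        pass  # not on Linux, or /proc is not mounted
    return None


def dangerous_caps(cap_eff_hex):
    """List the dangerous capabilities whose bits are set in the mask."""
    mask = int(cap_eff_hex, 16)
    return sorted(name for name, bit in DANGEROUS_CAPS.items() if mask & (1 << bit))


if __name__ == "__main__":
    cap_eff = read_cap_eff()
    if cap_eff is not None:
        print(f"CapEff={cap_eff} dangerous={dangerous_caps(cap_eff)}")
```

Docker's default capability set (00000000a80425fb) produces an empty list here, while the privileged mask from the sample output sets every bit in the table.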


New: kernel hardening checks

Containers share the host kernel, so the sysctl values you can read from inside a container show directly which host mitigations are active and which are not. All of these are read from /proc/sys, with no writes and no side effects.

The ones that tend to be most impactful in practice:

kernel.kptr_restrict — if this is 0, every user on the box can read kernel symbol addresses from /proc/kallsyms. That's an instant KASLR bypass. Most exploits against kernel vulnerabilities need an address leak as step one; when kptr_restrict=0 you skip that step entirely. It should be 2.

kernel.unprivileged_userns_clone — unprivileged user namespaces are the prerequisite for the majority of container escape CVEs published since 2019. Flipping Pages (CVE-2024-1086), the Packet Socket Race (CVE-2025-38617), Copy Fail, Dirty Frag — all of them either require user namespaces or become significantly easier with them. Setting this to 0 on hosts that don't need rootless containers removes a huge amount of attack surface in one sysctl.

kernel.perf_event_paranoid — at value 0 or -1, unprivileged processes can access kernel-level performance counters. This is the foundation of Spectre-class side-channel attacks and enables cross-container information leakage on shared CPU nodes. It should be at least 2.

fs.protected_symlinks and fs.protected_hardlinks — classic /tmp race conditions. Still come up regularly in privilege escalation chains. Both should be 1.

The full list in v4.0:

| #  | Parameter                                       | Recommended   |
|----|-------------------------------------------------|---------------|
| 36 | kernel.kptr_restrict                            | 2             |
| 37 | kernel.dmesg_restrict                           | 1             |
| 38 | kernel.randomize_va_space                       | 2             |
| 39 | fs.protected_symlinks / fs.protected_hardlinks  | 1             |
| 40 | fs.protected_fifos / fs.protected_regular       | 2             |
| 41 | net.ipv4.tcp_syncookies                         | 1             |
| 42 | ICMP redirects / source routing / rp_filter     | 0 / 0 / 1     |
| 43 | IP forwarding                                   | informational |
| 44 | kernel.unprivileged_userns_clone                | 0             |
| 45 | kernel.perf_event_paranoid                      | ≥ 2           |
| 46 | Dirty Frag modules (esp4, esp6, rxrpc)          | not loaded    |
| 47 | Dangerous loaded modules audit                  | 14 modules    |
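
As a rough illustration of what a read-only sysctl check looks like, the sketch below reads a handful of the parameters above from /proc/sys and compares them with the recommended values. It is not the script's implementation: the function name, the result labels, and the choice to report unreadable parameters as SKIP are all assumptions made for the example.

```python
from pathlib import Path

# (sysctl name, predicate on the integer value, human-readable recommendation)
RECOMMENDED = [
    ("kernel.kptr_restrict", lambda v: v == 2, "2"),
    ("kernel.dmesg_restrict", lambda v: v == 1, "1"),
    ("kernel.randomize_va_space", lambda v: v == 2, "2"),
    ("fs.protected_symlinks", lambda v: v == 1, "1"),
    ("fs.protected_hardlinks", lambda v: v == 1, "1"),
    ("kernel.unprivileged_userns_clone", lambda v: v == 0, "0"),
    ("kernel.perf_event_paranoid", lambda v: v >= 2, ">= 2"),
]


def audit_sysctls(proc_sys="/proc/sys"):
    """Return (name, status, detail) tuples; only reads files, never writes."""
    results = []
    for name, is_ok, want in RECOMMENDED:
        path = Path(proc_sys, *name.split("."))
        try:
            value = int(path.read_text().split()[0])
        except (OSError, ValueError, IndexError):
            # Some sysctls do not exist on every kernel or are hidden in the container.
            results.append((name, "SKIP", "not readable"))
            continue
        status = "OK" if is_ok(value) else "WARN"
        results.append((name, status, f"{value} (want {want})"))
    return results


if __name__ == "__main__":
    for name, status, detail in audit_sysctls():
        print(f"[{status:4}] {name} = {detail}")
```

Taking the /proc/sys root as a parameter keeps the function easy to exercise against a temporary directory instead of a live kernel.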

New: config-driven CVE checks

Previously, CVE-specific checks were hardcoded functions in the script. Adding a new one meant modifying the script. That's fine for a handful of checks, but it doesn't scale well — and it means the script and the CVE data are tightly coupled when they really shouldn't be.

In v4.0, CVE checks are defined in cve_checks.conf. The script reads the file at runtime and dispatches the right test for each entry. To add a new CVE, you append a block to the config. The script doesn't change.

A config entry looks like this:

cve_id=CVE-2024-1086
name=Flipping Pages
cvss=7.8
severity=CRITICAL
check_type=compound
introduced=3.15
fixed_versions=5.15:5.15.149 6.1:6.1.76 6.6:6.6.15
itw=yes
poc_public=yes
cisa_kev=yes
subsystem=net/netfilter/nf_tables
module_names=nf_tables
mitigation=rmmod nf_tables 2>/dev/null; echo 'install nf_tables /bin/false' > /etc/modprobe.d/nftables.conf
socket_af=none
socket_type=none
socket_proto=none
what=CVE-2024-1086 is a use-after-free in nf_tables...
impact=Full local privilege escalation...
exploit=Public PoC, 99.4% success rate on Debian/Ubuntu/KernelCTF...
rec=Patch to v5.15.149+, v6.1.76+, or v6.6.15+...
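
For readers who want to consume the same file from their own tooling, here is a small Python sketch of a parser for this key=value layout. The script itself parses the file in Bash, so treat this purely as an illustration: the function names are hypothetical, and it assumes that every entry begins with a cve_id line and that lines starting with # are comments.

```python
def parse_cve_conf(text):
    """Parse key=value blocks into a list of dicts, one per cve_id line."""
    entries, current = [], None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # assumption: blank lines and '#' comments are ignored
        key, sep, value = line.partition("=")  # split on the first '=' only
        if not sep:
            continue  # not a key=value line
        if key == "cve_id":
            current = {}
            entries.append(current)
        if current is not None:
            current[key] = value
    return entries


def fixed_versions(entry):
    """'5.15:5.15.149 6.1:6.1.76' -> {'5.15': '5.15.149', '6.1': '6.1.76'}."""
    items = entry.get("fixed_versions", "").split()
    return dict(item.split(":", 1) for item in items if ":" in item)
```

Splitting on the first = only matters for fields such as mitigation, whose values are shell snippets and may contain almost any character.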

The check_type field drives what the engine actually does:

  • kernel_version — parses uname -r and compares against the introduced/fixed_versions ranges. Handles all current LTS series.
  • module_loaded — checks /proc/modules for the listed modules.
  • socket_family — tries to open a socket with the given AF/type/proto from Python, which tells you whether the attack surface is reachable from within this specific container regardless of kernel patch status.
  • compound — runs all three and synthesises a combined severity.
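
The kernel_version comparison is the fiddliest of these, so here is a hedged Python sketch of one way to do it. This is not the script's code. The function names are hypothetical, and the handling of series that are missing from fixed_versions is an assumption: series newer than every listed one are treated as already fixed, anything else at or above introduced is treated as affected.

```python
import re


def parse_release(release):
    """'6.1.112-0ubuntu1' -> (6, 1, 112); a missing patch level counts as 0."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", release)
    if not match:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    return tuple(int(part or 0) for part in match.groups())


def kernel_affected(release, introduced, fixed):
    """fixed maps a series such as '6.1' to its first fixed release, '6.1.76'."""
    ver = parse_release(release)
    if ver < parse_release(introduced):
        return False  # predates the bug
    series = f"{ver[0]}.{ver[1]}"
    if series in fixed:
        return ver < parse_release(fixed[series])
    if fixed:
        newest_fixed = max(parse_release(s)[:2] for s in fixed)
        if ver[:2] > newest_fixed:
            return False  # assumption: later series shipped with the fix
    return True  # assumption: unlisted older series never received a fix
```

With the Flipping Pages entry, 6.1.50 comes back as affected and 6.1.112 does not. Keep in mind that uname -r only reports a version string, so a distro kernel with a backported fix can still look affected to a comparison like this.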

The compound severity logic is worth spelling out because it avoids two failure modes: being too noisy and being too silent.

kernel in affected range + module loaded or socket reachable  →  CRITICAL
kernel in affected range + module not blacklisted             →  HIGH (auto-load risk)
kernel in affected range + module blacklisted                 →  MEDIUM (interim mitigation, patch needed)
kernel not in affected range                                  →  INFO

The HIGH case catches the thing that trips people up: a module that's not currently loaded but also not blacklisted can be auto-loaded by the kernel just by opening the right socket. "Not loaded" is not the same as "not exploitable."
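
The severity rules and the loaded-versus-blacklisted distinction can be sketched in Python as follows. The decision function mirrors the four rules listed above; the two helpers are hypothetical and assume that a module counts as blacklisted when a modprobe.d file contains either a blacklist line for it or an install line pointing at /bin/false or /bin/true.

```python
from pathlib import Path


def module_loaded(name, proc_modules="/proc/modules"):
    """True if the module appears in the first column of /proc/modules."""
    try:
        lines = Path(proc_modules).read_text().splitlines()
    except OSError:
        return False
    return any(line.split()[0] == name for line in lines if line.strip())


def module_blacklisted(name, modprobe_dir="/etc/modprobe.d"):
    """True if any *.conf file blocks the module (see assumptions above)."""
    for conf in sorted(Path(modprobe_dir).glob("*.conf")):
        try:
            text = conf.read_text()
        except OSError:
            continue
        for line in text.splitlines():
            words = line.split()
            if words[:2] == ["blacklist", name]:
                return True
            if words[:2] == ["install", name] and words[2:3] in (["/bin/false"], ["/bin/true"]):
                return True
    return False


def compound_severity(kernel_affected, loaded, socket_reachable, blacklisted):
    """Combine the individual results using the four rules listed above."""
    if not kernel_affected:
        return "INFO"
    if loaded or socket_reachable:
        return "CRITICAL"
    if not blacklisted:
        return "HIGH"  # module could still be auto-loaded on demand
    return "MEDIUM"  # interim mitigation in place, patch still needed
```

The order of the tests matters: a loaded module or a reachable socket wins over a blacklist entry, because a blacklist added after the module was loaded does not unload it.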


What's in the CVE database

The shipped cve_checks.conf has ten entries. Three are actively exploited right now.

Copy Fail (CVE-2026-31431, CVSS 7.8, CISA KEV) — a flaw in the algif_aead AF_ALG interface that gives any unprivileged user a 4-byte write into the page cache of any readable executable. A 732-byte Python PoC with no dependencies achieves reliable root. Affects kernels from 4.14. Interim mitigation: rmmod algif_aead && echo 'install algif_aead /bin/false' > /etc/modprobe.d/copyfail.conf.

Dirty Frag (CVE-2026-43284 + CVE-2026-43500, CVSS 8.8/7.8) — same bug class as Copy Fail but through the IPsec ESP and RxRPC subsystems. Provides full attacker-controlled page cache writes at any offset, not just 4 bytes. Two CVEs, typically chained. As of writing, no distro patch exists for the RxRPC path — blacklisting rxrpc is the only mitigation. Note: blacklisting esp4/esp6 kills IPsec — check before you fleet-deploy.

Flipping Pages (CVE-2024-1086, CVSS 7.8, CISA KEV) — use-after-free in nf_tables. 99.4% success rate PoC. Used in RansomHub and Akira ransomware campaigns. Needs unprivileged user namespaces + nf_tables loaded. Affects 3.15–6.6.14.

The remaining entries: Attack of the Vsock (CVE-2025-21756), Chronomaly (CVE-2025-38352), Packet Socket Race (CVE-2025-38617), OverlayFS SetUID Copy (CVE-2023-0386, CISA KEV), DirtyPipe (CVE-2022-0847), and DirtyCOW (CVE-2016-5195).
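
The "AEAD socket bindable" wording in the sample output comes from a socket reachability probe. A minimal Python sketch of such a probe is below. It is an assumption about the general technique, not a copy of the script: the function name is hypothetical, and the ("aead", "gcm(aes)") address is simply one AEAD algorithm picked for illustration.

```python
import socket


def socket_reachable(family, sock_type, proto=0, bind_addr=None):
    """True if the socket can be created (and optionally bound) from here."""
    try:
        sock = socket.socket(family, sock_type, proto)
    except (OSError, ValueError, TypeError):
        return False  # family unsupported, filtered by seccomp, and so on
    try:
        if bind_addr is not None:
            sock.bind(bind_addr)
        return True
    except OSError:
        return False
    finally:
        sock.close()


if __name__ == "__main__":
    af_alg = getattr(socket, "AF_ALG", None)  # only defined on Linux builds
    if af_alg is not None:
        bindable = socket_reachable(
            af_alg, socket.SOCK_SEQPACKET, 0, ("aead", "gcm(aes)")
        )
        print("AF_ALG aead bindable:", bindable)
```

One caveat: creating or binding a socket can itself prompt the kernel to auto-load the backing module, so a probe like this is not entirely passive on the host.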


Updating the database

The whole point of the config-driven design is that you shouldn't need to touch the script when new vulnerabilities drop. When a CVE gets a distro patch, update fixed_versions. When something starts being exploited in the wild, set itw=yes. To add a new CVE, append an entry:

cat >> cve_checks.conf << 'ENTRY'

cve_id=CVE-2025-XXXXX
name=Whatever it's called
cvss=8.1
severity=HIGH
check_type=compound
introduced=6.1
fixed_versions=6.6:6.6.50 6.12:6.12.5
itw=no
poc_public=yes
cisa_kev=no
subsystem=fs/btrfs
module_names=btrfs
mitigation=none
socket_af=none
socket_type=none
socket_proto=none
what=...
impact=...
exploit=...
rec=...
ENTRY
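
Before committing a new entry it can help to lint it. The Python sketch below checks an already-parsed entry (a dict of key to value) against the fields shown in the example above. The validator is hypothetical and not part of the repo, and treating every one of those fields as required is an assumption.

```python
import re

# Field names taken from the example entry; "required" is an assumption.
REQUIRED_KEYS = (
    "cve_id", "name", "cvss", "severity", "check_type", "introduced",
    "fixed_versions", "itw", "poc_public", "cisa_kev", "subsystem",
    "module_names", "mitigation", "socket_af", "socket_type", "socket_proto",
    "what", "impact", "exploit", "rec",
)
CHECK_TYPES = {"kernel_version", "module_loaded", "socket_family", "compound"}


def validate_entry(entry):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = [f"missing key: {key}" for key in REQUIRED_KEYS if key not in entry]
    if not re.fullmatch(r"CVE-\d{4}-\d{4,}", entry.get("cve_id", "")):
        problems.append("cve_id is not of the form CVE-YYYY-NNNN")
    if entry.get("check_type") not in CHECK_TYPES:
        problems.append("check_type is not one of the four known types")
    for flag in ("itw", "poc_public", "cisa_kev"):
        if entry.get(flag) not in ("yes", "no"):
            problems.append(f"{flag} must be yes or no")
    return problems
```

A template that still carries a placeholder ID such as CVE-2025-XXXXX is reported as a problem, which is a handy guard against committing an unfinished entry.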

Running in Kubernetes

Drop it into a pod directly:

kubectl cp container_escape_audit.sh mynamespace/mypod:/tmp/audit.sh
kubectl cp cve_checks.conf mynamespace/mypod:/tmp/cve_checks.conf
kubectl exec -n mynamespace mypod -- bash /tmp/audit.sh \
  --cve-conf /tmp/cve_checks.conf \
  --report /tmp/audit_report.txt
kubectl cp mynamespace/mypod:/tmp/audit_report.txt ./

Or as a Job that audits the cluster's default security context:

apiVersion: batch/v1
kind: Job
metadata:
  name: container-escape-audit
spec:
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: audit
          image: alpine:latest
          command:
            - sh
            - -c
            - |
              apk add --no-cache bash curl python3 && \
              curl -sO https://raw.githubusercontent.com/liamromanis101/K8s-container_escape_audit/main/container_escape_audit.sh && \
              curl -sO https://raw.githubusercontent.com/liamromanis101/K8s-container_escape_audit/main/cve_checks.conf && \
              chmod +x container_escape_audit.sh && \
              ./container_escape_audit.sh --json

Apply the Job, wait for it to finish, and read the logs:

kubectl apply -f audit-job.yaml
kubectl wait --for=condition=complete job/container-escape-audit --timeout=120s
kubectl logs job/container-escape-audit
kubectl delete job container-escape-audit

The Job runs with whatever security context the cluster assigns by default — which is the point. You want to see what a workload running in your cluster can actually reach, not what it could reach if you gave it extra permissions.


CI integration

The JSON output makes it straightforward to gate on CRITICAL findings:

CRITICAL_COUNT=$(./container_escape_audit.sh --json --no-report \
  | jq '[.findings[] | select(.severity=="CRITICAL")] | length')

if [ "$CRITICAL_COUNT" -gt 0 ]; then
  echo "FAILED: $CRITICAL_COUNT critical escape vectors detected"
  exit 1
fi
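
If jq is not available on the CI runner, the same gate can be sketched in Python. The only thing assumed about the report is what the jq filter above already implies: a top-level findings array whose items carry a severity field. The function name and the configurable threshold are additions for illustration.

```python
import json
from collections import Counter

SEVERITY_ORDER = ["INFO", "MEDIUM", "HIGH", "CRITICAL"]


def gate(report_json, fail_at="CRITICAL"):
    """Return (blocking_count, counts_by_severity) for a JSON report string."""
    findings = json.loads(report_json).get("findings", [])
    counts = Counter(f.get("severity", "INFO") for f in findings)
    threshold = SEVERITY_ORDER.index(fail_at)
    blocking = sum(
        n for sev, n in counts.items()
        if sev in SEVERITY_ORDER and SEVERITY_ORDER.index(sev) >= threshold
    )
    return blocking, dict(counts)


# Typical use in a pipeline step (commented out so the sketch has no side effects):
#   import sys
#   blocking, counts = gate(sys.stdin.read())
#   print(counts)
#   sys.exit(1 if blocking else 0)
```

Passing fail_at="HIGH" tightens the gate so that HIGH findings also fail the build.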

Requirements and limitations

Requirements: Bash 4.2+, Python 3. Everything else uses /proc, /sys, and standard POSIX tools. curl is optional (used for IMDS and Kubelet API checks).

What it isn't: this is a point-in-time audit, not continuous monitoring. It identifies attack surface — it doesn't exploit anything. A CRITICAL finding means the prerequisites for a known attack are present, not that you've been compromised. For continuous detection, pair it with Falco rules watching for writes to release_agent and core_pattern, AF_ALG socket creation from non-root processes, and LD_PRELOAD pointing to /tmp or /dev/shm.


Licence

CC BY-NC 4.0 — free for non-commercial use with attribution. If you're using this as part of a paid engagement or commercial product, we're happy to discuss sponsorship: github.com/sponsors/liamromanis101.


Feedback, issues, and PRs welcome. If you hit a false positive, a missed check, or a CVE that should be in the database, open an issue on GitHub.

github.com/liamromanis101/K8s-container_escape_audit
