Introduction
As modern applications increasingly run in containers, the security of the container runtime becomes paramount. One of the most powerful—but often overlooked—mechanisms for hardening Linux-based containers is seccomp (Secure Computing Mode). Seccomp enables fine-grained filtering of system calls, allowing you to limit what your applications can ask of the kernel.
In this post, we’ll go beyond surface-level explanations and dive into how seccomp works, why it matters in real-world production environments, and how to leverage it effectively using Docker. You’ll learn how to inspect, customize, and enforce syscall-level security, ensuring containers operate with the least privilege necessary.
What Is Seccomp?
Seccomp is a Linux kernel feature that allows processes to restrict the system calls (syscalls) they are permitted to make. By default, a Linux process can invoke hundreds of syscalls. However, most applications use only a small subset.
By blocking unnecessary syscalls, seccomp:
Reduces the kernel attack surface
Prevents certain classes of container breakout exploits
Enforces least privilege at the syscall level
This is especially important in containers, where applications run in shared-kernel environments. Even with other isolation mechanisms like namespaces and cgroups in place, unfiltered access to the syscall interface can lead to privilege escalation or host compromise.
Seccomp Operating Modes
Seccomp supports two modes:
1. Strict Mode
The original implementation (enabled via prctl) restricts a process to only four syscalls:
read(2)
write(2)
_exit(2)
sigreturn(2)
This mode is rarely used outside of specific academic or embedded contexts due to its extreme limitations.
2. Filter Mode (seccomp-bpf)
The modern and practical variant uses Berkeley Packet Filter (BPF) programs to evaluate each syscall at runtime. A filter sees the syscall number, the calling architecture, and the raw argument values (it cannot follow pointers into process memory), and it can allow the call, log it, return an error, or kill the process. This is the mode used by Docker and Kubernetes.
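To check which of these modes a process is actually running under, the kernel reports a Seccomp field in /proc/<pid>/status (0 = disabled, 1 = strict, 2 = filter). Below is a minimal Python sketch of reading it. The helper name seccomp_mode is made up for illustration, and the script assumes a Linux system with /proc mounted.

```python
from pathlib import Path

# Values of the "Seccomp:" field in /proc/<pid>/status.
MODES = {0: "disabled", 1: "strict", 2: "filter"}


def seccomp_mode(status_text: str) -> str:
    """Return the seccomp mode named in the text of a /proc/<pid>/status file."""
    for line in status_text.splitlines():
        # "Seccomp_filters:" is a different field, so match the colon as well.
        if line.startswith("Seccomp:"):
            value = int(line.split(":", 1)[1].strip())
            return MODES.get(value, f"unknown ({value})")
    return "not reported"  # field missing, e.g. kernel built without seccomp


if __name__ == "__main__":
    status = Path("/proc/self/status")
    if status.exists():
        print("seccomp mode:", seccomp_mode(status.read_text()))
    else:
        print("/proc/self/status is not available on this system")
```

Run inside a container started with Docker's default profile, this would be expected to report filter; with seccomp=unconfined it should report disabled.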
Why Seccomp Matters for Container Security
In containerized environments, the kernel is shared among containers and the host. A compromised container that can execute privileged syscalls becomes a potential entry point to the entire system. Seccomp provides:
Granular Syscall Control: Limit actions like creating raw sockets, mounting filesystems, or spawning new namespaces.
Exploit Mitigation: Many Linux kernel vulnerabilities rely on specific syscalls. Blocking them disrupts exploit chains.
Defense in Depth: Adds a syscall-level layer to existing security features like AppArmor, SELinux, and user namespaces.
Compliance and Auditability: You can demonstrate precise control over application behavior down to the syscall level.
How Docker Uses Seccomp
Docker applies a default seccomp profile to all containers unless overridden. This profile is maintained by the Moby project and is reasonably conservative: it blocks approximately 44 syscalls out of over 300 available on a typical x86_64 system.
Blocked syscalls include:
keyctl (used for kernel key management)
add_key, request_key
ptrace (on kernels older than 4.8)
mount (unless the container is granted CAP_SYS_ADMIN)
The default profile allows safe defaults for most containerized workloads while blocking dangerous or rarely-used syscalls.
You can view the official profile in the Moby repository on GitHub.
Anatomy of a seccomp.json Profile
A seccomp profile is a JSON document that defines syscall rules for a process. Key fields include:
defaultAction
What to do with syscalls that aren’t explicitly listed. Common options:
SCMP_ACT_ERRNO: Return a permission error
SCMP_ACT_KILL: Immediately terminate the process
SCMP_ACT_ALLOW: Allow the syscall
SCMP_ACT_TRACE, SCMP_ACT_LOG: Advanced debugging options
architectures
Specifies applicable architectures. Typical values:
["SCMP_ARCH_X86", "SCMP_ARCH_X86_64", "SCMP_ARCH_X32"]
syscalls
An array defining syscall rules, where each item includes:
name (or names): syscall name(s)
action: what to do if called
args (optional): match syscall arguments for conditional rules
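Minimal example:
{
  "defaultAction": "SCMP_ACT_ALLOW",
  "architectures": ["SCMP_ARCH_X86_64"],
  "syscalls": [
    {
      "name": "mkdir",
      "action": "SCMP_ACT_ERRNO",
      "args": []
    }
  ]
}
This profile allows every syscall by default and denies only mkdir, regardless of arguments, for all processes using the profile. Note that defaultAction must be SCMP_ACT_ALLOW here: with SCMP_ACT_ERRNO as the default, every unlisted syscall would be denied as well and the container could not start.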
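To make the roles of defaultAction, syscalls, and args concrete, here is a small Python model of how a profile maps one syscall invocation to an action. It is a deliberately simplified first-match sketch, not how the kernel or libseccomp evaluates compiled filters. The function name resolve_action is hypothetical, and only a subset of comparison operators is modeled.

```python
import operator

# A subset of the comparison operators that "args" rules may use.
OPS = {
    "SCMP_CMP_EQ": operator.eq,
    "SCMP_CMP_NE": operator.ne,
    "SCMP_CMP_LT": operator.lt,
    "SCMP_CMP_LE": operator.le,
    "SCMP_CMP_GT": operator.gt,
    "SCMP_CMP_GE": operator.ge,
}


def resolve_action(profile: dict, syscall: str, args: tuple = ()) -> str:
    """Return the action this simplified model assigns to one syscall call."""
    # Syscalls take at most six arguments; pad so any index 0..5 is valid.
    padded = tuple(args) + (0,) * (6 - len(args))
    for rule in profile.get("syscalls", []):
        names = rule.get("names") or [rule.get("name")]
        if syscall not in names:
            continue
        # An empty or missing "args" list matches every invocation.
        conditions = rule.get("args") or []
        if all(OPS[c["op"]](padded[c["index"]], c["value"]) for c in conditions):
            return rule["action"]
    # No rule matched: fall back to the profile-wide default.
    return profile["defaultAction"]


if __name__ == "__main__":
    profile = {
        "defaultAction": "SCMP_ACT_ALLOW",
        "syscalls": [{"name": "mkdir", "action": "SCMP_ACT_ERRNO", "args": []}],
    }
    print("mkdir ->", resolve_action(profile, "mkdir"))
    print("read  ->", resolve_action(profile, "read"))
```

The point of the model is the fall-through: a syscall with no matching rule always gets defaultAction, so the default determines whether a profile behaves as a denylist or an allowlist.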
Using Custom Seccomp Profiles in Docker
You can supply your own seccomp profile when running containers:
docker run \
--security-opt seccomp=/path/to/custom-seccomp.json \
your-image
To disable seccomp entirely (not recommended unless debugging):
docker run --security-opt seccomp=unconfined your-image
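If profiles are generated by tooling rather than written by hand, the same docker run invocation can be assembled programmatically. The sketch below writes a profile to disk and builds the command line as an argument list. The helper names, the file name custom-seccomp.json, and the image name your-image are placeholders, and the script only prints the command instead of executing it.

```python
import json
import shlex
import tempfile
from pathlib import Path


def write_profile(profile: dict, directory: str) -> Path:
    """Serialize a profile dict to custom-seccomp.json inside `directory`."""
    path = Path(directory) / "custom-seccomp.json"
    path.write_text(json.dumps(profile, indent=2))
    return path


def docker_run_command(image: str, profile_path: Path) -> list[str]:
    """Build the argv for `docker run` with a custom seccomp profile."""
    return [
        "docker", "run", "--rm",
        "--security-opt", f"seccomp={profile_path}",
        image,
    ]


if __name__ == "__main__":
    profile = {
        "defaultAction": "SCMP_ACT_ALLOW",
        "architectures": ["SCMP_ARCH_X86_64"],
        "syscalls": [{"names": ["mkdir"], "action": "SCMP_ACT_ERRNO"}],
    }
    with tempfile.TemporaryDirectory() as tmp:
        path = write_profile(profile, tmp)
        # Printed only; pass the list to subprocess.run() to actually start it.
        print(shlex.join(docker_run_command("your-image", path)))
```

Building the command as a list rather than a single string avoids shell-quoting problems if the profile path contains spaces.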
Example: Blocking Network-Related Syscalls
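Here’s a custom profile that blocks network access by denying socket and connect. It also denies mkdir, to show that unrelated syscalls can be restricted in the same profile:
{
  "defaultAction": "SCMP_ACT_ALLOW",
  "architectures": ["SCMP_ARCH_X86_64"],
  "syscalls": [
    { "name": "socket", "action": "SCMP_ACT_ERRNO" },
    { "name": "connect", "action": "SCMP_ACT_ERRNO" },
    { "name": "mkdir", "action": "SCMP_ACT_ERRNO" }
  ]
}
Applying this profile to a container means:
ping, curl, or anything that creates a socket will fail
mkdir operations will also fail, even as root (programs that call mkdirat instead are not affected unless that syscall is listed too)
All other syscalls remain allowed, because defaultAction is SCMP_ACT_ALLOW; with SCMP_ACT_ERRNO as the default, every unlisted syscall would fail and the container could not start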
Best Practices for Seccomp Usage
Start with the default profile and incrementally restrict based on runtime analysis.
Use syscall tracing tools like strace, sysdig, or bpftrace to identify which syscalls your app needs.
Don’t over-optimize early. Avoid blocking syscalls your app might need during boot or init.
Version and audit your profiles alongside your infrastructure code.
Combine seccomp with other controls like AppArmor, SELinux, cgroups, and capabilities for layered defense.
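The tracing step can be partly automated. As a rough sketch, the Python below extracts syscall names from plain strace output (for example, from strace -f -o trace.log your-command) and turns them into an allowlist-style profile. The function names and the file name trace.log are illustrative assumptions, and the regular expression assumes strace was run without timestamp options. A traced list is only a starting point: it will miss rarely exercised code paths and any syscalls the container runtime itself needs during startup.

```python
import json
import re

# Syscall name at the start of an strace line, with an optional
# "[pid 123] " or "123 " prefix as produced when following child processes.
SYSCALL_RE = re.compile(r"^(?:\[pid\s+\d+\]\s+|\d+\s+)?([a-z_][a-z0-9_]*)\(")


def syscalls_from_strace(lines) -> list[str]:
    """Return the sorted, de-duplicated syscall names seen in strace output."""
    names = set()
    for line in lines:
        match = SYSCALL_RE.match(line)
        if match:  # signal, exit, and "resumed" lines do not match
            names.add(match.group(1))
    return sorted(names)


def allowlist_profile(names, architectures=("SCMP_ARCH_X86_64",)) -> dict:
    """Build a profile that denies by default and allows only `names`."""
    return {
        "defaultAction": "SCMP_ACT_ERRNO",
        "architectures": list(architectures),
        "syscalls": [{"names": list(names), "action": "SCMP_ACT_ALLOW"}],
    }


if __name__ == "__main__":
    sample = [
        'execve("/bin/true", ["true"], 0x7ffc0 /* 20 vars */) = 0',
        'openat(AT_FDCWD, "/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3',
        '[pid  4242] read(3, "", 4096) = 0',
        "--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED} ---",
        "+++ exited with 0 +++",
    ]
    print(json.dumps(allowlist_profile(syscalls_from_strace(sample)), indent=2))
```

Keeping the generated JSON under version control alongside the infrastructure code makes changes to the allowlist reviewable like any other change.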
Advanced: Conditional Filtering with args
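You can apply fine-grained rules using the args field. For example, in a profile whose defaultAction denies, the following rule allows the socket syscall only for IPv4 TCP sockets (AF_INET = 2, SOCK_STREAM = 1):
{
  "name": "socket",
  "action": "SCMP_ACT_ALLOW",
  "args": [
    {
      "index": 0,
      "value": 2,
      "op": "SCMP_CMP_EQ"
    },
    {
      "index": 1,
      "value": 1,
      "op": "SCMP_CMP_EQ"
    }
  ]
}
The rule matches only when both arguments equal the expected values for an IPv4 TCP socket; any other socket call falls through to the profile’s default action. Because the comparison on the type argument is exact, calls that OR in flags such as SOCK_CLOEXEC or SOCK_NONBLOCK will not match.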
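The numeric values in an args rule are the platform's own constants, so they can be derived rather than hard-coded. The Python sketch below builds a socket rule from the standard library's socket constants. The helper name socket_arg_rule is hypothetical, the action is left as a parameter, and the values printed are those of the machine running the script, which should match the target only when both are Linux.

```python
import json
import socket


def socket_arg_rule(family: int, sock_type: int,
                    action: str = "SCMP_ACT_ALLOW") -> dict:
    """Build a rule that matches socket(family, sock_type, ...) exactly."""
    return {
        "names": ["socket"],
        "action": action,
        "args": [
            # Argument 0 of socket(2) is the address family.
            {"index": 0, "value": int(family), "op": "SCMP_CMP_EQ"},
            # Argument 1 is the socket type (exact match, no extra flags).
            {"index": 1, "value": int(sock_type), "op": "SCMP_CMP_EQ"},
        ],
    }


if __name__ == "__main__":
    rule = socket_arg_rule(socket.AF_INET, socket.SOCK_STREAM)
    print(json.dumps(rule, indent=2))
```

Deriving the values this way also documents intent: a reader sees AF_INET and SOCK_STREAM in the generator instead of bare integers in the JSON.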
Seccomp provides syscall-level control over application behavior, reducing attack surface and enforcing least privilege. In a containerized world where isolation boundaries are thinner than traditional virtual machines, mechanisms like seccomp are essential for building secure, production-grade systems.
Whether you’re securing microservices, isolating CI pipelines, or just building better defaults, understanding and applying seccomp is an important step in your journey toward mature container security.