1. Introduction to SecurityContext in Kubernetes
A SecurityContext in Kubernetes defines privilege and access control settings for pods or containers, allowing you to control how processes run, access resources, and interact with the system. It is a critical component for securing Kubernetes workloads by enforcing least-privilege principles.
- Pod-Level SecurityContext: Applies security settings to all containers in a pod and can affect the pod’s volumes. It’s defined under spec.securityContext.
- Container-Level SecurityContext: Applies to a specific container and can override pod-level settings for that container. It’s defined under spec.containers[].securityContext.
The key difference is scope:
- Pod-level settings provide a baseline for all containers and volumes in the pod.
- Container-level settings allow fine-grained customization for individual containers, overriding pod-level settings where applicable.
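As a rough sketch of where the two levels sit in a manifest (the pod name, container name, image, and values below are placeholders chosen for illustration, not taken from a real workload):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: scope-demo                      # hypothetical name
spec:
  securityContext:                      # pod level: baseline for every container
    runAsUser: 1000
    fsGroup: 2000                       # pod-only field; affects supported volumes
  containers:
  - name: app                           # hypothetical container
    image: busybox
    command: ["sh", "-c", "sleep 1h"]
    securityContext:                    # container level: this container only
      runAsUser: 2000                   # overrides the pod-level value here
      allowPrivilegeEscalation: false   # container-only field
```

Fields that exist at both levels, such as runAsUser, take the container-level value when both are set.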
2. Pod-Level SecurityContext
The pod-level securityContext is defined in the pod’s spec and applies to all containers in the pod unless overridden by a container-level securityContext. It also applies to certain volume-related settings (e.g., fsGroup and seLinuxOptions).
Fields in Pod-Level SecurityContext
Here are the most commonly used fields at the pod level, with their purpose and examples:
runAsUser:
- Purpose: Specifies the user ID (UID) for all containers’ processes in the pod.
- Use Case: Ensures containers don’t run as root, reducing the risk of privilege escalation.
- Example: A web server pod where all containers should run as a non-root user for security.

    apiVersion: v1
    kind: Pod
    metadata:
      name: web-server-pod
    spec:
      securityContext:
        runAsUser: 1000  # All containers run as UID 1000
      containers:
      - name: nginx
        image: nginx
        ports:
        - containerPort: 80

In this example, a web server (e.g., Nginx) runs as UID 1000, preventing root-level access even if the container is compromised. Note that the stock nginx image is built to start as root and may fail under UID 1000, so in practice an image prepared to run unprivileged is needed.
runAsGroup:
- Purpose: Sets the primary group ID (GID) for all containers’ processes.
- Use Case: Controls group ownership for files created by containers, useful for shared volumes.
- Example: A pod with a shared volume where files need consistent group ownership.

    apiVersion: v1
    kind: Pod
    metadata:
      name: shared-volume-pod
    spec:
      securityContext:
        runAsUser: 1000
        runAsGroup: 3000  # Primary group ID for processes
      volumes:
      - name: shared-data
        emptyDir: {}
      containers:
      - name: app
        image: busybox
        command: ["sh", "-c", "echo hello > /data/testfile && sleep 1h"]
        volumeMounts:
        - name: shared-data
          mountPath: /data

Files created in the /data volume will be owned by GID 3000, ensuring consistent group access.
runAsNonRoot:
- Purpose: Ensures all containers run as a non-root user (UID ≠ 0). If set to true, the kubelet refuses to start any container that would run as root.
- Use Case: Enforce a policy where no container in the pod can run as root.
- Example: A corporate policy requires all pods to run non-root for compliance.

    apiVersion: v1
    kind: Pod
    metadata:
      name: non-root-pod
    spec:
      securityContext:
        runAsNonRoot: true  # Enforces non-root user
      containers:
      - name: app
        image: nginx
        ports:
        - containerPort: 80

If the image’s default user is root and no runAsUser is set, as is the case here, the container is not started and the pod reports an error.
fsGroup:
- Purpose: Sets the group ID for volume ownership and permissions. Kubernetes applies this GID to volumes that support ownership management (e.g., emptyDir, persistentVolumeClaim).
- Use Case: Ensures files in a shared volume are accessible by a specific group, such as in a multi-container pod.
- Example: A pod with a shared volume for a data processing application.

    apiVersion: v1
    kind: Pod
    metadata:
      name: data-processing-pod
    spec:
      securityContext:
        runAsUser: 1000
        fsGroup: 2000  # Volume files owned by GID 2000
      volumes:
      - name: data-vol
        emptyDir: {}
      containers:
      - name: processor
        image: busybox
        command: ["sh", "-c", "echo data > /data/output && sleep 1h"]
        volumeMounts:
        - name: data-vol
          mountPath: /data

Files in /data will be owned by GID 2000, ensuring group-level access control.
supplementalGroups:
- Purpose: Adds additional group IDs to container processes, beyond the primary runAsGroup.
- Use Case: Grants access to resources owned by multiple groups, such as shared storage.
- Example: A pod accessing multiple shared volumes with different group ownerships.

    apiVersion: v1
    kind: Pod
    metadata:
      name: multi-group-pod
    spec:
      securityContext:
        runAsUser: 1000
        runAsGroup: 3000
        supplementalGroups: [4000, 5000]  # Additional group memberships
      containers:
      - name: app
        image: busybox
        command: ["sh", "-c", "sleep 1h"]

Processes in the container belong to GIDs 3000, 4000, and 5000, allowing access to resources owned by these groups.
supplementalGroupsPolicy (Kubernetes v1.33+, beta):
- Purpose: Controls how supplementary groups are calculated. Options are:
  - Merge: Merges groups from the container image’s /etc/group with fsGroup and supplementalGroups. This is the default.
  - Strict: Only uses groups specified in fsGroup, supplementalGroups, or runAsGroup, ignoring /etc/group.
- Use Case: Avoid unintended group memberships from the container image for stricter security.
- Example: A pod requiring strict group control for compliance.

    apiVersion: v1
    kind: Pod
    metadata:
      name: strict-groups-pod
    spec:
      securityContext:
        runAsUser: 1000
        runAsGroup: 3000
        supplementalGroups: [4000]
        supplementalGroupsPolicy: Strict  # Only specified groups are used
      containers:
      - name: app
        image: busybox
        command: ["sh", "-c", "sleep 1h"]

The container process will only have GIDs 3000 and 4000, ignoring any groups defined in the image’s /etc/group.
fsGroupChangePolicy:
- Purpose: Controls how Kubernetes changes ownership and permissions for volumes. Options are:
  - OnRootMismatch: Only changes permissions if the volume’s root directory doesn’t match the expected fsGroup.
  - Always: Always changes permissions when the volume is mounted.
- Use Case: Optimize pod startup time for large volumes by reducing unnecessary permission changes.
- Example: A pod with a large persistent volume.

    apiVersion: v1
    kind: Pod
    metadata:
      name: large-volume-pod
    spec:
      securityContext:
        runAsUser: 1000
        fsGroup: 2000
        fsGroupChangePolicy: OnRootMismatch  # Optimize permission changes
      volumes:
      - name: data
        persistentVolumeClaim:
          claimName: data-pvc
      containers:
      - name: app
        image: busybox
        volumeMounts:
        - name: data
          mountPath: /data

This reduces startup time by only changing permissions when necessary.
seLinuxOptions:
- Purpose: Assigns SELinux labels to containers and volumes for access control.
- Use Case: Enforce mandatory access control in environments with SELinux enabled (e.g., Red Hat systems).
- Example: A pod running in an SELinux-enabled cluster.

    apiVersion: v1
    kind: Pod
    metadata:
      name: selinux-pod
    spec:
      securityContext:
        seLinuxOptions:
          level: "s0:c123,c456"  # SELinux label for processes and volumes
      containers:
      - name: app
        image: busybox
        command: ["sh", "-c", "sleep 1h"]

All containers and volumes use the specified SELinux label, ensuring compliance with SELinux policies.
seLinuxChangePolicy (Kubernetes v1.33+, beta):
- Purpose: Controls SELinux relabeling behavior. Options are:
  - MountOption: Uses mount options for faster relabeling (requires SELinuxMount feature gate).
  - Recursive: Recursively relabels all files in the volume.
- Use Case: Optimize SELinux relabeling for performance or allow multiple pods with different labels to share a volume.
- Example: A pod opting out of mount-based relabeling for compatibility.

    apiVersion: v1
    kind: Pod
    metadata:
      name: selinux-recursive-pod
    spec:
      securityContext:
        seLinuxOptions:
          level: "s0:c123,c456"
        seLinuxChangePolicy: Recursive  # Recursive relabeling
      containers:
      - name: app
        image: busybox
        command: ["sh", "-c", "sleep 1h"]

This ensures recursive relabeling, allowing multiple pods with different SELinux labels to share a volume.
procMount (Kubernetes v1.33+, beta):
- Purpose: Controls the /proc filesystem’s mount behavior. This field is set in a container’s securityContext rather than the pod’s; it is listed here because the Unmasked option depends on the pod-level spec.hostUsers setting. Options are:
  - Default: Masks certain /proc paths (e.g., /proc/kcore) and makes others read-only.
  - Unmasked: Exposes all /proc paths, useful for nested container runtimes.
- Use Case: Running containers within containers (e.g., Docker-in-Docker).
- Example: A pod running a CI/CD pipeline with nested containers.

    apiVersion: v1
    kind: Pod
    metadata:
      name: dind-pod
    spec:
      hostUsers: false  # Pod spec field; required for Unmasked
      containers:
      - name: docker
        image: docker:dind
        command: ["dockerd"]
        securityContext:
          procMount: Unmasked  # Expose full /proc

This allows the Docker daemon to access the full /proc filesystem for container management.
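The pod-level securityContext also accepts a few fields not covered in the list above, such as seccompProfile and sysctls. The following is a minimal sketch with a hypothetical pod name; the sysctl shown is one that Kubernetes treats as safe by default, and whether it is appropriate depends on the workload.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: pod-defaults-demo             # hypothetical name
spec:
  securityContext:
    seccompProfile:
      type: RuntimeDefault            # applies to every container unless overridden
    sysctls:
    - name: kernel.shm_rmid_forced    # namespaced sysctl from the safe set
      value: "1"                      # sysctl values are always strings
  containers:
  - name: app
    image: busybox
    command: ["sh", "-c", "sleep 1h"]
```

Setting seccompProfile once at the pod level avoids repeating it in every container.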
Real-Life Example for Pod-Level SecurityContext
Scenario: A company runs a microservices application with multiple pods, each containing multiple containers (e.g., an app and a logging sidecar). To comply with security policies, all containers must run as non-root, and shared volumes must be accessible by a specific group.
    apiVersion: v1
    kind: Pod
    metadata:
      name: microservice-pod
    spec:
      securityContext:
        runAsUser: 1000
        runAsGroup: 3000
        fsGroup: 2000
        runAsNonRoot: true
      volumes:
      - name: logs
        emptyDir: {}
      containers:
      - name: app
        image: my-app:1.0
        volumeMounts:
        - name: logs
          mountPath: /logs
      - name: log-collector
        image: fluentd
        volumeMounts:
        - name: logs
          mountPath: /logs
Explanation:
- All containers run as UID 1000 and GID 3000.
- The logs volume is owned by GID 2000 (fsGroup), ensuring both containers can write to it.
- runAsNonRoot: true enforces non-root execution, aligning with compliance requirements.
3. Container-Level SecurityContext
The container-level securityContext is defined under spec.containers[].securityContext and applies only to the specific container. It can override pod-level settings for that container but doesn’t affect volumes.
Fields in Container-Level SecurityContext
Here’s a comprehensive list of fields available at the container level:
runAsUser:
- Purpose: Overrides the pod-level runAsUser for the specific container.
- Use Case: A specific container needs to run as a different user (e.g., root for administrative tasks).
- Example: A pod with a sidecar requiring root privileges.

    apiVersion: v1
    kind: Pod
    metadata:
      name: mixed-user-pod
    spec:
      securityContext:
        runAsUser: 1000
      containers:
      - name: app
        image: nginx
        ports:
        - containerPort: 80
      - name: admin-tool
        image: busybox
        command: ["sh", "-c", "sleep 1h"]
        securityContext:
          runAsUser: 0  # Runs as root, overriding pod-level setting

runAsGroup:
- Purpose: Overrides the pod-level runAsGroup for the container’s primary group ID.
- Use Case: A container needs a different primary group for specific access requirements.
- Example: A container accessing a volume with a unique group.

    apiVersion: v1
    kind: Pod
    metadata:
      name: custom-group-pod
    spec:
      securityContext:
        runAsGroup: 3000
      containers:
      - name: app
        image: busybox
        command: ["sh", "-c", "sleep 1h"]
        securityContext:
          runAsGroup: 4000  # Overrides pod-level runAsGroup

runAsNonRoot:
- Purpose: Enforces non-root execution for the specific container, overriding pod-level settings.
- Use Case: Ensure a specific container adheres to non-root policies, even if the pod allows root.
- Example: A sidecar container must run non-root for security.

    apiVersion: v1
    kind: Pod
    metadata:
      name: non-root-sidecar-pod
    spec:
      containers:
      - name: app
        image: nginx
      - name: sidecar
        image: busybox
        command: ["sh", "-c", "sleep 1h"]
        securityContext:
          runAsNonRoot: true
          runAsUser: 1000  # busybox defaults to root, so a non-root UID is set explicitly

capabilities:
- Purpose: Adds or drops Linux capabilities for the container.
- Use Case: Grant specific privileges (e.g., NET_ADMIN) without full root access.
- Example: A container needs to manage network interfaces.

    apiVersion: v1
    kind: Pod
    metadata:
      name: network-admin-pod
    spec:
      containers:
      - name: network-tool
        image: busybox
        command: ["sh", "-c", "sleep 1h"]
        securityContext:
          capabilities:
            add: ["NET_ADMIN"]  # Grants network administration privileges
            drop: ["ALL"]       # Drops all other capabilities

privileged:
- Purpose: Runs the container in privileged mode, granting full root privileges, similar to Docker’s --privileged flag.
- Use Case: Rare cases where a container needs unrestricted access (e.g., running a system utility).
- Example: A container running a system diagnostic tool.

    apiVersion: v1
    kind: Pod
    metadata:
      name: privileged-pod
    spec:
      containers:
      - name: diagnostic-tool
        image: busybox
        command: ["sh", "-c", "sleep 1h"]
        securityContext:
          privileged: true  # Full root privileges

allowPrivilegeEscalation:
- Purpose: Controls whether a process can gain more privileges than its parent (e.g., via setuid binaries). Set to false to prevent escalation.
- Use Case: Prevent containers from escalating privileges in sensitive environments.
- Example: A container running untrusted code.

    apiVersion: v1
    kind: Pod
    metadata:
      name: no-escalation-pod
    spec:
      containers:
      - name: app
        image: busybox
        command: ["sh", "-c", "sleep 1h"]
        securityContext:
          allowPrivilegeEscalation: false  # Prevents privilege escalation

readOnlyRootFilesystem:
- Purpose: Mounts the container’s root filesystem as read-only, preventing modifications.
- Use Case: Enhance security by ensuring the container cannot alter its filesystem.
- Example: A stateless application container.

    apiVersion: v1
    kind: Pod
    metadata:
      name: readonly-pod
    spec:
      containers:
      - name: app
        image: nginx
        securityContext:
          readOnlyRootFilesystem: true  # Root filesystem is read-only

seccompProfile:
- Purpose: Specifies a Seccomp profile to filter system calls, enhancing security.
- Options:
  - RuntimeDefault: Uses the container runtime’s default profile.
  - Unconfined: No Seccomp filtering.
  - Localhost: Uses a custom profile from the node.
- Use Case: Restrict dangerous system calls in a container.
- Example: A container with a default Seccomp profile.

    apiVersion: v1
    kind: Pod
    metadata:
      name: seccomp-pod
    spec:
      containers:
      - name: app
        image: busybox
        securityContext:
          seccompProfile:
            type: RuntimeDefault  # Apply default Seccomp profile

appArmorProfile:
- Purpose: Applies an AppArmor profile to restrict the container’s capabilities.
- Options: RuntimeDefault, Unconfined, or Localhost with a profile name.
- Use Case: Restrict a container’s access in an AppArmor-enabled environment.
- Example: A container with a custom AppArmor profile.

    apiVersion: v1
    kind: Pod
    metadata:
      name: apparmor-pod
    spec:
      containers:
      - name: app
        image: busybox
        securityContext:
          appArmorProfile:
            type: Localhost
            localhostProfile: k8s-apparmor-example-deny-write

seLinuxOptions:
- Purpose: Overrides pod-level SELinux labels for the container.
- Use Case: Apply a specific SELinux label to a container in an SELinux-enabled cluster.
- Example: A container requiring a unique SELinux label.

    apiVersion: v1
    kind: Pod
    metadata:
      name: selinux-container-pod
    spec:
      containers:
      - name: app
        image: busybox
        securityContext:
          seLinuxOptions:
            level: "s0:c789,c012"

procMount:
- Purpose: Sets the /proc mount type for the container. This field exists only at the container level, so there is no pod-level value to override.
- Use Case: A specific container needs an unmasked /proc for nested container runtimes.
- Example: A container running a nested Kubernetes cluster.

    apiVersion: v1
    kind: Pod
    metadata:
      name: nested-k8s-pod
    spec:
      hostUsers: false  # Required when procMount is Unmasked
      containers:
      - name: k8s
        image: kindest/node
        securityContext:
          procMount: Unmasked  # Full /proc access
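A read-only root filesystem, as described under readOnlyRootFilesystem above, often breaks applications that write temporary files. One common pattern, sketched here with a hypothetical pod name and an assumed scratch path, is to pair it with an emptyDir volume mounted at the location that must stay writable:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: readonly-scratch-pod    # hypothetical name
spec:
  volumes:
  - name: scratch
    emptyDir: {}
  containers:
  - name: app
    image: busybox
    # Writes land in the emptyDir mounted at /tmp; the rest of the
    # root filesystem stays read-only.
    command: ["sh", "-c", "echo ok > /tmp/probe && sleep 1h"]
    securityContext:
      readOnlyRootFilesystem: true
    volumeMounts:
    - name: scratch
      mountPath: /tmp             # assumed scratch path for this example
```

Which paths need to be writable depends on the image, so this usually takes some trial against the actual application.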
Real-Life Example for Container-Level SecurityContext
Scenario: A pod runs a web application (Nginx) and a monitoring tool requiring specific privileges (e.g., NET_ADMIN for network diagnostics).
    apiVersion: v1
    kind: Pod
    metadata:
      name: web-monitor-pod
    spec:
      securityContext:
        runAsUser: 1000
        runAsNonRoot: true
      containers:
      - name: nginx
        image: nginx
        ports:
        - containerPort: 80
      - name: monitor
        image: busybox
        command: ["sh", "-c", "sleep 1h"]
        securityContext:
          runAsUser: 2000  # Override pod-level runAsUser
          capabilities:
            add: ["NET_ADMIN"]  # Grant network privileges
          allowPrivilegeEscalation: false  # Prevent escalation
Explanation:
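- The pod-level runAsUser: 1000 applies to the Nginx container.
- The monitor container overrides this with runAsUser: 2000 and adds NET_ADMIN for diagnostics. Note that capabilities added to a container running as a non-root UID are not automatically effective for its processes, so this combination should be verified against the tool in use.
- allowPrivilegeEscalation: false ensures the monitor cannot gain additional privileges.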
4. Privileged Mode
Privileged mode (privileged: true) grants a container full root privileges, equivalent to Docker’s --privileged flag. It bypasses most security restrictions, giving the container access to the host’s resources.
When to Use Privileged Mode
- Use Case: Rare scenarios requiring unrestricted access, such as:
  - Running system utilities (e.g., kernel debugging tools).
  - Nested container runtimes (e.g., Docker-in-Docker).
  - Hardware access (e.g., GPU drivers).
- Risks: Highly insecure, as it allows the container to affect the host system. Avoid unless absolutely necessary.
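Where a tool needs only one or two elevated operations, granting specific capabilities is usually a narrower alternative to privileged mode. The sketch below assumes a hypothetical debugging container that only needs to trace processes; the capability a real tool requires has to be checked against its own documentation.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: debug-tool-pod          # hypothetical name
spec:
  containers:
  - name: debugger
    image: busybox              # placeholder image
    command: ["sh", "-c", "sleep 1h"]
    securityContext:
      privileged: false
      capabilities:
        drop: ["ALL"]           # start from no capabilities
        add: ["SYS_PTRACE"]     # assumed requirement: process tracing only
```

If the tool still fails with a targeted capability set, that is a signal to investigate what it actually needs before falling back to privileged: true.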
Example of Privileged Mode
Scenario: A pod running a Docker-in-Docker (DinD) setup for a CI/CD pipeline.
    apiVersion: v1
    kind: Pod
    metadata:
      name: dind-pod
    spec:
      containers:
      - name: docker
        image: docker:dind
        securityContext:
          privileged: true  # Full root privileges
        command: ["dockerd"]
Explanation:
- The docker:dind image requires privileged mode to run the Docker daemon, which needs access to the host’s kernel and devices.
- This setup is common in CI/CD pipelines (e.g., Jenkins) but should be tightly controlled due to security risks.
5. Pod-Level vs. Container-Level SecurityContext: Differences
| Aspect | Pod-Level SecurityContext | Container-Level SecurityContext |
|---|---|---|
| Scope | Applies to all containers in the pod and volumes. | Applies only to the specific container. |
| Fields Available | Includes fsGroup, fsGroupChangePolicy, supplementalGroups, supplementalGroupsPolicy, seLinuxOptions, seLinuxChangePolicy, and pod-wide defaults for runAsUser, runAsGroup, runAsNonRoot. | Includes capabilities, privileged, allowPrivilegeEscalation, readOnlyRootFilesystem, procMount, seccompProfile, appArmorProfile, and overrides for runAsUser, runAsGroup, runAsNonRoot, seLinuxOptions. |
| Volume Impact | Affects volume ownership and permissions (fsGroup, seLinuxOptions). | Does not affect volumes. |
| Override Behavior | Provides default settings for all containers. | Overrides pod-level settings for the container. |
| Use Case | Set baseline security for all containers and volumes (e.g., shared volume permissions). | Customize security for a specific container (e.g., add capabilities or run as root). |
Example of Pod vs. Container-Level Interaction:
    apiVersion: v1
    kind: Pod
    metadata:
      name: mixed-security-pod
    spec:
      securityContext:
        runAsUser: 1000
        runAsGroup: 3000
        fsGroup: 2000
      containers:
      - name: app
        image: nginx
      - name: privileged-tool
        image: busybox
        securityContext:
          runAsUser: 0  # Override to run as root
          privileged: true  # Full privileges
          capabilities:
            add: ["SYS_ADMIN"]
Explanation:
- The app container uses the pod-level settings (runAsUser: 1000, runAsGroup: 3000).
- The privileged-tool container overrides these with runAsUser: 0 and runs in privileged mode with additional capabilities.
- The fsGroup: 2000 applies to any shared volumes, unaffected by container-level settings.
6. When to Use Pod-Level vs. Container-Level SecurityContext
Use Pod-Level SecurityContext:
- When all containers in the pod share common security settings (e.g., non-root execution, volume ownership).
- For volume-related settings (fsGroup, seLinuxOptions) that apply across containers.
- Example: A pod with multiple containers sharing a volume, requiring consistent user and group settings.

Use Container-Level SecurityContext:
- When a specific container needs different settings (e.g., one container needs NET_ADMIN or root privileges).
- For container-specific restrictions like readOnlyRootFilesystem or seccompProfile.
- Example: A pod where one container runs a privileged task while others are restricted.
7. Best Practices and Real-Life Considerations
Minimize Privileges:
- Avoid privileged: true unless absolutely necessary.
- Use runAsNonRoot: true and drop unnecessary capabilities.

Use Read-Only Filesystems:
- Set readOnlyRootFilesystem: true for containers that don’t need to write to their filesystem.

Optimize Volume Permissions:
- Use fsGroupChangePolicy: OnRootMismatch for large volumes to reduce startup time.
- Use supplementalGroupsPolicy: Strict to avoid unintended group memberships.

Leverage Seccomp and AppArmor:
- Apply seccompProfile: RuntimeDefault and AppArmor profiles for additional security layers.

SELinux in Secure Environments:
- Use seLinuxOptions in SELinux-enabled clusters for fine-grained control, and set seLinuxChangePolicy: Recursive when pods with different labels must share a volume.

Monitor and Audit:
- Use tools like kubectl describe pod and metrics (e.g., selinux_warning_controller_selinux_volume_conflict) to detect misconfigurations.
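Several of these practices can be combined in a single manifest. The following is a sketch of a hardened baseline, not a drop-in policy: the pod name is hypothetical, the image name is reused from the earlier microservice example, and the UID and GID values are arbitrary.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: hardened-baseline-pod     # hypothetical name
spec:
  securityContext:                # pod-wide baseline
    runAsNonRoot: true
    runAsUser: 1000
    runAsGroup: 3000
    fsGroup: 2000
    fsGroupChangePolicy: OnRootMismatch
    seccompProfile:
      type: RuntimeDefault
  containers:
  - name: app
    image: my-app:1.0             # image name reused from the earlier example
    securityContext:              # per-container hardening
      allowPrivilegeEscalation: false
      readOnlyRootFilesystem: true
      capabilities:
        drop: ["ALL"]
```

Starting from a restrictive baseline like this and relaxing individual fields per container tends to be easier to audit than the reverse.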
8. Conclusion
Pod-level SecurityContext is ideal for setting baseline security policies and managing volume permissions across all containers in a pod. Container-level SecurityContext allows fine-grained customization for individual containers, overriding pod-level settings when needed. Privileged mode should be used sparingly due to its security risks.