omkar shelke

Posted on May 25

SecurityContext in Kubernetes

#kubernetes #security #containers #cybersecurity

1. Introduction to SecurityContext in Kubernetes

A SecurityContext in Kubernetes defines privilege and access control settings for pods or containers, allowing you to control how processes run, access resources, and interact with the system. It is a critical component for securing Kubernetes workloads by enforcing least-privilege principles.

Pod-Level SecurityContext: Applies security settings to all containers in a pod and can affect the pod’s volumes. It’s defined under spec.securityContext.
Container-Level SecurityContext: Applies to a specific container and can override pod-level settings for that container. It’s defined under spec.containers[].securityContext.

The key difference is scope:

Pod-level settings provide a baseline for all containers and volumes in the pod.
Container-level settings allow fine-grained customization for individual containers, overriding pod-level settings where applicable.

2. Pod-Level SecurityContext

The pod-level securityContext is defined in the pod’s spec and applies to all containers in the pod unless overridden by a container-level securityContext. It also carries volume-related settings such as fsGroup and seLinuxOptions.

Fields in Pod-Level SecurityContext

Here are the most commonly used pod-level fields, with their purpose and an example of each:

runAsUser:
- Purpose: Specifies the user ID (UID) for all containers’ processes in the pod.
- Use Case: Ensures containers don’t run as root, reducing the risk of privilege escalation.
- Example: A web server pod where all containers should run as a non-root user for security.
```
 apiVersion: v1
 kind: Pod
 metadata:
   name: web-server-pod
 spec:
   securityContext:
     runAsUser: 1000  # All containers run as UID 1000
   containers:
   - name: nginx
     image: nginx
     ports:
     - containerPort: 80
```
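In this example, every process in the pod runs as UID 1000 instead of root, so a compromised process has no root privileges inside the container. Note that the stock nginx image expects to start as root (it writes to root-owned paths such as /var/cache/nginx), so a non-root build such as nginxinc/nginx-unprivileged is typically used for this pattern.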

runAsGroup:

- Purpose: Sets the primary group ID (GID) for all containers’ processes.
- Use Case: Controls group ownership for files created by containers, useful for shared volumes.
- Example: A pod with a shared volume where files need consistent group ownership.

 apiVersion: v1
 kind: Pod
 metadata:
   name: shared-volume-pod
 spec:
   securityContext:
     runAsUser: 1000
     runAsGroup: 3000  # Primary group ID for processes
   volumes:
   - name: shared-data
     emptyDir: {}
   containers:
   - name: app
     image: busybox
     command: ["sh", "-c", "echo hello > /data/testfile && sleep 1h"]
     volumeMounts:
     - name: shared-data
       mountPath: /data

Files created in the /data volume will be owned by GID 3000, ensuring consistent group access.

runAsNonRoot:
- Purpose: Ensures all containers run as a non-root user (UID ≠ 0). If set to true, the kubelet checks each container at startup and refuses to start any container that would run as UID 0.
- Use Case: Enforce a policy where no container in the pod can run as root.
- Example: A corporate policy requires all pods to run non-root for compliance.
```
 apiVersion: v1
 kind: Pod
 metadata:
   name: non-root-pod
 spec:
   securityContext:
     runAsNonRoot: true  # Enforces non-root user
   containers:
   - name: app
     image: nginx
     ports:
     - containerPort: 80
```
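If a container would run as root, it fails to start and the pod reports CreateContainerConfigError. The stock nginx image runs as root by default, so this manifest fails as written until a non-root runAsUser or a non-root image is supplied.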
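
A minimal sketch of a pod that passes this check, assuming the busybox image and an arbitrary UID of 1000 (the pod name and UID are illustrative, not from the original example). Pairing runAsNonRoot with an explicit numeric runAsUser lets the kubelet verify the UID without relying on the image's metadata:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: non-root-explicit-uid-pod   # hypothetical name
spec:
  securityContext:
    runAsNonRoot: true   # kubelet refuses to start containers that would run as UID 0
    runAsUser: 1000      # explicit non-root UID (arbitrary value for this sketch)
  containers:
  - name: app
    image: busybox
    command: ["sh", "-c", "id && sleep 1h"]   # prints the effective UID/GID, then idles
```

Checking the container log after startup is a quick way to confirm which UID the process actually received.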

fsGroup:

- Purpose: Sets the group ID for volume ownership and permissions. Kubernetes applies this GID to volumes that support ownership management (e.g., emptyDir, persistentVolumeClaim).
- Use Case: Ensures files in a shared volume are accessible by a specific group, such as in a multi-container pod.
- Example: A pod with a shared volume for a data processing application.

 apiVersion: v1
 kind: Pod
 metadata:
   name: data-processing-pod
 spec:
   securityContext:
     runAsUser: 1000
     fsGroup: 2000  # Volume files owned by GID 2000
   volumes:
   - name: data-vol
     emptyDir: {}
   containers:
   - name: processor
     image: busybox
     command: ["sh", "-c", "echo data > /data/output && sleep 1h"]
     volumeMounts:
     - name: data-vol
       mountPath: /data

Files in /data will be owned by GID 2000, ensuring group-level access control.

supplementalGroups:
- Purpose: Adds additional group IDs to container processes, beyond the primary runAsGroup.
- Use Case: Grants access to resources owned by multiple groups, such as shared storage.
- Example: A pod accessing multiple shared volumes with different group ownerships.
```
 apiVersion: v1
 kind: Pod
 metadata:
   name: multi-group-pod
 spec:
   securityContext:
     runAsUser: 1000
     runAsGroup: 3000
     supplementalGroups: [4000, 5000]  # Additional group memberships
   containers:
   - name: app
     image: busybox
     command: ["sh", "-c", "sleep 1h"]
```
Processes in the container belong to GIDs 3000, 4000, and 5000, allowing access to resources owned by these groups.

supplementalGroupsPolicy (Kubernetes v1.33+, beta):
- Purpose: Controls how supplementary groups are calculated. Options are:
  - Merge: Merges groups from the container image’s /etc/group with fsGroup and supplementalGroups.
  - Strict: Only uses groups specified in fsGroup, supplementalGroups, or runAsGroup, ignoring /etc/group.
- Use Case: Avoid unintended group memberships from the container image for stricter security.
- Example: A pod requiring strict group control for compliance.
```
 apiVersion: v1
 kind: Pod
 metadata:
   name: strict-groups-pod
 spec:
   securityContext:
     runAsUser: 1000
     runAsGroup: 3000
     supplementalGroups: [4000]
     supplementalGroupsPolicy: Strict  # Only specified groups are used
   containers:
   - name: app
     image: busybox
     command: ["sh", "-c", "sleep 1h"]
```
The container process will only have GIDs 3000 and 4000, ignoring any groups defined in the image’s /etc/group.

fsGroupChangePolicy:

- Purpose: Controls how Kubernetes changes ownership and permissions for volumes. Options are:
  - OnRootMismatch: Only changes permissions if the volume’s root directory doesn’t match the expected fsGroup.
  - Always: Always changes permissions when the volume is mounted.
- Use Case: Optimize pod startup time for large volumes by reducing unnecessary permission changes.
- Example: A pod with a large persistent volume.

 apiVersion: v1
 kind: Pod
 metadata:
   name: large-volume-pod
 spec:
   securityContext:
     runAsUser: 1000
     fsGroup: 2000
     fsGroupChangePolicy: OnRootMismatch  # Optimize permission changes
   volumes:
   - name: data
     persistentVolumeClaim:
       claimName: data-pvc
   containers:
   - name: app
     image: busybox
     volumeMounts:
     - name: data
       mountPath: /data

This reduces startup time by only changing permissions when necessary.

seLinuxOptions:
- Purpose: Assigns SELinux labels to containers and volumes for access control.
- Use Case: Enforce mandatory access control in environments with SELinux enabled (e.g., Red Hat systems).
- Example: A pod running in an SELinux-enabled cluster.
```
 apiVersion: v1
 kind: Pod
 metadata:
   name: selinux-pod
 spec:
   securityContext:
     seLinuxOptions:
       level: "s0:c123,c456"  # SELinux label for processes and volumes
   containers:
   - name: app
     image: busybox
     command: ["sh", "-c", "sleep 1h"]
```
All containers and volumes use the specified SELinux label, ensuring compliance with SELinux policies.

seLinuxChangePolicy (Kubernetes v1.33+, beta):
- Purpose: Controls SELinux relabeling behavior. Options are:
  - MountOption: Uses mount options for faster relabeling (requires SELinuxMount feature gate).
  - Recursive: Recursively relabels all files in the volume.
- Use Case: Optimize SELinux relabeling for performance or allow multiple pods with different labels to share a volume.
- Example: A pod opting out of mount-based relabeling for compatibility.
```
 apiVersion: v1
 kind: Pod
 metadata:
   name: selinux-recursive-pod
 spec:
   securityContext:
     seLinuxOptions:
       level: "s0:c123,c456"
     seLinuxChangePolicy: Recursive  # Recursive relabeling
   containers:
   - name: app
     image: busybox
     command: ["sh", "-c", "sleep 1h"]
```
This ensures recursive relabeling, allowing multiple pods with different SELinux labels to share a volume.

procMount (Kubernetes v1.33+, beta):
- Purpose: Controls how the /proc filesystem is mounted for a container. Unlike the fields above, procMount is set in the container-level securityContext; the pod-level securityContext has no procMount field. Options are:
  - Default: Masks certain /proc paths (e.g., /proc/kcore) and makes others read-only.
  - Unmasked: Exposes all /proc paths, useful for nested container runtimes.
- Use Case: Running containers within containers (e.g., Docker-in-Docker).
- Example: A pod running a CI/CD pipeline with nested containers.
```
  apiVersion: v1
  kind: Pod
  metadata:
    name: dind-pod
  spec:
    hostUsers: false  # Pod-level setting required for Unmasked
    containers:
    - name: docker
      image: docker:dind
      command: ["dockerd"]
      securityContext:
        procMount: Unmasked  # Expose full /proc (container-level field)
```
This gives the Docker daemon an unmasked /proc filesystem for container management, while hostUsers: false runs the pod in its own user namespace.

Real-Life Example for Pod-Level SecurityContext

Scenario: A company runs a microservices application with multiple pods, each containing multiple containers (e.g., an app and a logging sidecar). To comply with security policies, all containers must run as non-root, and shared volumes must be accessible by a specific group.

apiVersion: v1
kind: Pod
metadata:
  name: microservice-pod
spec:
  securityContext:
    runAsUser: 1000
    runAsGroup: 3000
    fsGroup: 2000
    runAsNonRoot: true
  volumes:
  - name: logs
    emptyDir: {}
  containers:
  - name: app
    image: my-app:1.0
    volumeMounts:
    - name: logs
      mountPath: /logs
  - name: log-collector
    image: fluentd
    volumeMounts:
    - name: logs
      mountPath: /logs

Explanation:

All containers run as UID 1000 and GID 3000.
The logs volume is owned by GID 2000 (fsGroup), ensuring both containers can write to it.
runAsNonRoot: true enforces non-root execution, aligning with compliance requirements.

3. Container-Level SecurityContext

The container-level securityContext is defined under spec.containers[].securityContext and applies only to the specific container. It can override pod-level settings for that container but doesn’t affect volumes.

Fields in Container-Level SecurityContext

Here are the fields most commonly set at the container level:

runAsUser:

- Purpose: Overrides the pod-level runAsUser for the specific container.
- Use Case: A specific container needs to run as a different user (e.g., root for administrative tasks).
- Example: A pod with a sidecar requiring root privileges.

 apiVersion: v1
 kind: Pod
 metadata:
   name: mixed-user-pod
 spec:
   securityContext:
     runAsUser: 1000
   containers:
   - name: app
     image: nginx
     ports:
     - containerPort: 80
   - name: admin-tool
     image: busybox
     command: ["sh", "-c", "sleep 1h"]
     securityContext:
       runAsUser: 0  # Runs as root, overriding pod-level setting

runAsGroup:

- Purpose: Overrides the pod-level runAsGroup for the container’s primary group ID.
- Use Case: A container needs a different primary group for specific access requirements.
- Example: A container accessing a volume with a unique group.

 apiVersion: v1
 kind: Pod
 metadata:
   name: custom-group-pod
 spec:
   securityContext:
     runAsGroup: 3000
   containers:
   - name: app
     image: busybox
     command: ["sh", "-c", "sleep 1h"]
     securityContext:
       runAsGroup: 4000  # Overrides pod-level runAsGroup

runAsNonRoot:

- Purpose: Enforces non-root execution for the specific container, overriding pod-level settings.
- Use Case: Ensure a specific container adheres to non-root policies, even if the pod allows root.
- Example: A sidecar container must run non-root for security.

 apiVersion: v1
 kind: Pod
 metadata:
   name: non-root-sidecar-pod
 spec:
   containers:
   - name: app
     image: nginx
   - name: sidecar
     image: busybox
     command: ["sh", "-c", "sleep 1h"]
     securityContext:
       runAsNonRoot: true

capabilities:

- Purpose: Adds or drops Linux capabilities for the container.
- Use Case: Grant specific privileges (e.g., NET_ADMIN) without full root access.
- Example: A container needs to manage network interfaces.

 apiVersion: v1
 kind: Pod
 metadata:
   name: network-admin-pod
 spec:
   containers:
   - name: network-tool
     image: busybox
     command: ["sh", "-c", "sleep 1h"]
     securityContext:
       capabilities:
         add: ["NET_ADMIN"]  # Grants network administration privileges
         drop: ["ALL"]  # Drops all other capabilities

privileged:

- Purpose: Runs the container in privileged mode, similar to Docker’s --privileged flag: the container receives all Linux capabilities and can access the host’s devices.
- Use Case: Rare cases where a container needs unrestricted access (e.g., running a system utility).
- Example: A container running a system diagnostic tool.

 apiVersion: v1
 kind: Pod
 metadata:
   name: privileged-pod
 spec:
   containers:
   - name: diagnostic-tool
     image: busybox
     command: ["sh", "-c", "sleep 1h"]
     securityContext:
       privileged: true  # Full root privileges

allowPrivilegeEscalation:

- Purpose: Controls whether a process can gain more privileges than its parent (e.g., via setuid binaries). Set to false to prevent escalation.
- Use Case: Prevent containers from escalating privileges in sensitive environments.
- Example: A container running untrusted code.

 apiVersion: v1
 kind: Pod
 metadata:
   name: no-escalation-pod
 spec:
   containers:
   - name: app
     image: busybox
     command: ["sh", "-c", "sleep 1h"]
     securityContext:
       allowPrivilegeEscalation: false  # Prevents privilege escalation

readOnlyRootFilesystem:

- Purpose: Mounts the container’s root filesystem as read-only, preventing modifications.
- Use Case: Enhance security by ensuring the container cannot alter its filesystem.
- Example: A stateless application container.

 apiVersion: v1
 kind: Pod
 metadata:
   name: readonly-pod
 spec:
   containers:
   - name: app
     image: nginx
     securityContext:
       readOnlyRootFilesystem: true  # Root filesystem is read-only
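
Many applications still need somewhere to write temporary files. A minimal sketch of one common pattern, assuming a busybox container that only writes under /tmp (the pod name, volume name, and command are illustrative): keep the root filesystem read-only and mount an emptyDir at the one path that must stay writable.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: readonly-with-scratch-pod   # hypothetical name
spec:
  volumes:
  - name: scratch
    emptyDir: {}                    # writable scratch space outside the root filesystem
  containers:
  - name: app
    image: busybox
    command: ["sh", "-c", "echo ok > /tmp/probe && sleep 1h"]
    securityContext:
      readOnlyRootFilesystem: true  # everything except mounted volumes is read-only
    volumeMounts:
    - name: scratch
      mountPath: /tmp
```

Images that write to other locations (cache or PID directories, for example) would need an equivalent mount for each such path.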

seccompProfile:
- Purpose: Specifies a Seccomp profile to filter system calls, enhancing security.
- Options:
  - RuntimeDefault: Uses the container runtime’s default profile.
  - Unconfined: No Seccomp filtering.
  - Localhost: Uses a custom profile from the node.
- Use Case: Restrict dangerous system calls in a container.
- Example: A container with a default Seccomp profile.
```
 apiVersion: v1
 kind: Pod
 metadata:
   name: seccomp-pod
 spec:
   containers:
   - name: app
     image: busybox
     securityContext:
       seccompProfile:
         type: RuntimeDefault  # Apply default Seccomp profile
```
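
For the Localhost option, a hedged sketch is shown below. The profile file name is hypothetical: it stands for a JSON seccomp profile that an administrator has already placed on every node, and the path is interpreted relative to the kubelet's seccomp profile directory.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: seccomp-localhost-pod   # hypothetical name
spec:
  containers:
  - name: app
    image: busybox
    command: ["sh", "-c", "sleep 1h"]
    securityContext:
      seccompProfile:
        type: Localhost
        # Hypothetical file; must already exist on the node, relative to the
        # kubelet's seccomp profile directory
        localhostProfile: profiles/deny-list.json
```

If the referenced profile is missing on the node where the pod is scheduled, the container will not start, so this option suits clusters where profile distribution is already automated.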

appArmorProfile:

- Purpose: Applies an AppArmor profile that restricts what the container’s processes can access (for example, files, network operations, and capabilities).
- Options: RuntimeDefault, Unconfined, or Localhost with a profile name.
- Use Case: Restrict a container’s access in an AppArmor-enabled environment.
- Example: A container with a custom AppArmor profile.

 apiVersion: v1
 kind: Pod
 metadata:
   name: apparmor-pod
 spec:
   containers:
   - name: app
     image: busybox
     securityContext:
       appArmorProfile:
         type: Localhost
         localhostProfile: k8s-apparmor-example-deny-write

seLinuxOptions:

- Purpose: Overrides pod-level SELinux labels for the container.
- Use Case: Apply a specific SELinux label to a container in an SELinux-enabled cluster.
- Example: A container requiring a unique SELinux label.

  apiVersion: v1
  kind: Pod
  metadata:
    name: selinux-container-pod
  spec:
    containers:
    - name: app
      image: busybox
      securityContext:
        seLinuxOptions:
          level: "s0:c789,c012"

procMount:

- Purpose: Sets how /proc is mounted for this container. procMount exists only at the container level, so there is no pod-level value to override.
- Use Case: A specific container needs an unmasked /proc for nested container runtimes.
- Example: A container running a nested Kubernetes cluster.

  apiVersion: v1
  kind: Pod
  metadata:
    name: nested-k8s-pod
  spec:
    hostUsers: false  # Required when a container uses procMount: Unmasked
    containers:
    - name: k8s
      image: kindest/node
      securityContext:
        procMount: Unmasked  # Full /proc access

Real-Life Example for Container-Level SecurityContext

Scenario: A pod runs a web application (Nginx) and a monitoring tool requiring specific privileges (e.g., NET_ADMIN for network diagnostics).

apiVersion: v1
kind: Pod
metadata:
  name: web-monitor-pod
spec:
  securityContext:
    runAsUser: 1000
    runAsNonRoot: true
  containers:
  - name: nginx
    image: nginx
    ports:
    - containerPort: 80
  - name: monitor
    image: busybox
    command: ["sh", "-c", "sleep 1h"]
    securityContext:
      runAsUser: 2000  # Override pod-level runAsUser
      capabilities:
        add: ["NET_ADMIN"]  # Grant network privileges
      allowPrivilegeEscalation: false  # Prevent escalation

Explanation:

The pod-level runAsUser: 1000 applies to the Nginx container.
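The monitor container overrides this with runAsUser: 2000 and adds NET_ADMIN for diagnostics. One caveat: capabilities added this way are generally not effective for processes running as a non-root UID, so a tool that truly needs NET_ADMIN may have to run as root with all other capabilities dropped.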
allowPrivilegeEscalation: false ensures the monitor cannot gain additional privileges.

4. Privileged Mode

Privileged mode (privileged: true) removes most of the isolation between a container and its node, equivalent to Docker’s --privileged flag. The container receives all Linux capabilities and can access the host’s devices, so a root process inside it has close to the same power as root on the host.

When to Use Privileged Mode

Use Case: Rare scenarios requiring unrestricted access, such as:
- Running system utilities (e.g., kernel debugging tools).
- Nested container runtimes (e.g., Docker-in-Docker).
- Hardware access (e.g., GPU drivers).
Risks: Highly insecure, as it allows the container to affect the host system. Avoid unless absolutely necessary.

Example of Privileged Mode

Scenario: A pod running a Docker-in-Docker (DinD) setup for a CI/CD pipeline.

apiVersion: v1
kind: Pod
metadata:
  name: dind-pod
spec:
  containers:
  - name: docker
    image: docker:dind
    securityContext:
      privileged: true  # Full root privileges
    command: ["dockerd"]

Explanation:

The docker:dind image requires privileged mode to run the Docker daemon, which needs access to the host’s kernel and devices.
This setup is common in CI/CD pipelines (e.g., Jenkins) but should be tightly controlled due to security risks.

5. Pod-Level vs. Container-Level SecurityContext: Differences

| Aspect | Pod-Level SecurityContext | Container-Level SecurityContext |
|---|---|---|
| Scope | Applies to all containers in the pod and to its volumes. | Applies only to the specific container. |
| Fields Available | Pod-only fields such as `fsGroup`, `fsGroupChangePolicy`, `supplementalGroups`, `supplementalGroupsPolicy`, `seLinuxChangePolicy`, plus defaults for `runAsUser`, `runAsGroup`, `runAsNonRoot`, `seLinuxOptions`, `seccompProfile`, `appArmorProfile`. | Container-only fields such as `capabilities`, `privileged`, `allowPrivilegeEscalation`, `readOnlyRootFilesystem`, `procMount`, plus overrides for `runAsUser`, `runAsGroup`, `runAsNonRoot`, `seLinuxOptions`, `seccompProfile`, `appArmorProfile`. |
| Volume Impact | Affects volume ownership and permissions (`fsGroup`, `seLinuxOptions`). | Does not affect volumes. |
| Override Behavior | Provides default settings for all containers. | Overrides pod-level settings for the container. |
| Use Case | Set baseline security for all containers and volumes (e.g., shared volume permissions). | Customize security for a specific container (e.g., add capabilities or run as root). |

Example of Pod vs. Container-Level Interaction:

apiVersion: v1
kind: Pod
metadata:
  name: mixed-security-pod
spec:
  securityContext:
    runAsUser: 1000
    runAsGroup: 3000
    fsGroup: 2000
  containers:
  - name: app
    image: nginx
  - name: privileged-tool
    image: busybox
    securityContext:
      runAsUser: 0  # Override to run as root
      privileged: true  # Full privileges
      capabilities:
        add: ["SYS_ADMIN"]

Explanation:

The app container uses the pod-level settings (runAsUser: 1000, runAsGroup: 3000).
The privileged-tool container overrides runAsUser with 0 and runs in privileged mode. Because a privileged container already holds every capability, the capabilities.add entry for SYS_ADMIN is redundant here.
The fsGroup: 2000 applies to any shared volumes, unaffected by container-level settings.

6. When to Use Pod-Level vs. Container-Level SecurityContext

Use Pod-Level SecurityContext:
- When all containers in the pod share common security settings (e.g., non-root execution, volume ownership).
- For volume-related settings (fsGroup, seLinuxOptions) that apply across containers.
- Example: A pod with multiple containers sharing a volume, requiring consistent user and group settings.
Use Container-Level SecurityContext:
- When a specific container needs different settings (e.g., one container needs NET_ADMIN or root privileges).
- For container-specific restrictions like readOnlyRootFilesystem or seccompProfile.
- Example: A pod where one container runs a privileged task while others are restricted.

7. Best Practices and Real-Life Considerations

Minimize Privileges:
- Avoid privileged: true unless absolutely necessary.
- Use runAsNonRoot: true and drop unnecessary capabilities.
Use Read-Only Filesystems:
- Set readOnlyRootFilesystem: true for containers that don’t need to write to their filesystem.
Optimize Volume Permissions:
- Use fsGroupChangePolicy: OnRootMismatch for large volumes to reduce startup time.
- Use supplementalGroupsPolicy: Strict to avoid unintended group memberships.
Leverage Seccomp and AppArmor:
- Apply seccompProfile: RuntimeDefault and AppArmor profiles for additional security layers.
SELinux in Secure Environments:
- Use seLinuxOptions for fine-grained labels in SELinux-enabled clusters, and set seLinuxChangePolicy: Recursive for pods that must share a volume with pods using different labels.
Monitor and Audit:
- Use tools like kubectl describe pod and metrics (e.g., selinux_warning_controller_selinux_volume_conflict) to detect misconfigurations.
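
The practices above can be combined in a single manifest. The following is a sketch under stated assumptions rather than a drop-in policy: the pod name, UID/GID values, busybox image, and /tmp scratch volume are all illustrative, and a real workload may need additional writable paths or capabilities.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: hardened-baseline-pod   # hypothetical name
spec:
  securityContext:              # pod-level baseline
    runAsNonRoot: true
    runAsUser: 1000             # arbitrary non-root UID
    runAsGroup: 3000
    fsGroup: 2000
    fsGroupChangePolicy: OnRootMismatch
    seccompProfile:
      type: RuntimeDefault
  volumes:
  - name: scratch
    emptyDir: {}
  containers:
  - name: app
    image: busybox
    command: ["sh", "-c", "sleep 1h"]
    securityContext:            # container-level restrictions
      allowPrivilegeEscalation: false
      readOnlyRootFilesystem: true
      capabilities:
        drop: ["ALL"]
    volumeMounts:
    - name: scratch
      mountPath: /tmp
```

Starting from a baseline like this and relaxing individual settings only where a container demonstrably needs them keeps each exception visible in the manifest.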

8. Conclusion

Pod-level SecurityContext is ideal for setting baseline security policies and managing volume permissions across all containers in a pod. Container-level SecurityContext allows fine-grained customization for individual containers, overriding pod-level settings when needed. Privileged mode should be used sparingly due to its security risks.

Top comments (1)

Fraser Young • Jun 2

This is a detailed breakdown of SecurityContext in Kubernetes. Are there any typical scenarios where combining both pod-level and container-level security contexts becomes necessary, or is it generally better to stick to one?