Table of contents
- Pod Security Standards
- Built-in Pod Security admission enforcement
- Alternative option: use the Pod Security admission webhook independently
- Add Pod Security labels at the namespace level
- Warnings and audit log entries follow from the namespace-level Pod Security labels
- PodTemplate Resources
- Templated pod resources include:
- Let's learn about Pod Security Standards with Baseline - Minimally restrictive policy which prevents known privilege escalations. Allows the default (minimally specified) Pod configuration.
- Let's learn about Pod Security Standards with Restricted - Heavily restricted policy, following current Pod hardening best practices.
- disallow-capabilities-strict - Pod Security Standards (Restricted)
- disallow privilege escalation - Pod Security Standards (Restricted)
- require run as non root user - Pod Security Standards (Restricted)
PodSecurityPolicy (PSP) is dead: it was deprecated in Kubernetes v1.21 and removed in v1.25. Its replacement is a new built-in admission controller that enforces the Pod Security Standards.
Pod Security Standards
The Pod Security Standards define three levels, from most permissive to most restrictive:
Privileged: Unrestricted policy, providing the widest possible level of permissions. This policy allows for known privilege escalations.
Baseline: Minimally restrictive policy which prevents known privilege escalations. Allows the default (minimally specified) Pod configuration.
Restricted: Heavily restricted policy, following current Pod hardening best practices.
Built-in Pod Security admission enforcement
kubectl exec -it kube-apiserver-controlplane -n kube-system -- kube-apiserver -h | grep enable-admission-plugins
syntax:
kubectl exec -it kube-apiserver-<master-node> -n kube-system -- kube-apiserver -h | grep enable-admission-plugins
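This prints the help text for the flag, which lists the admission plugins that are enabled by default (PodSecurity among them) and the full set of available plugins.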
--admission-control strings Admission is divided into two phases. In the first phase, only mutating admission plugins run. In the second phase, only validating admission plugins run. The names in the below list may represent a validating plugin, a mutating plugin, or both. The order of plugins in which they are passed to this flag does not matter. Comma-delimited list of: AlwaysAdmit, AlwaysDeny, AlwaysPullImages, CertificateApproval, CertificateSigning, CertificateSubjectRestriction, DefaultIngressClass, DefaultStorageClass, DefaultTolerationSeconds, DenyServiceExternalIPs, EventRateLimit, ExtendedResourceToleration, ImagePolicyWebhook, LimitPodHardAntiAffinityTopology, LimitRanger, MutatingAdmissionWebhook, NamespaceAutoProvision, NamespaceExists, NamespaceLifecycle, NodeRestriction, OwnerReferencesPermissionEnforcement, PersistentVolumeClaimResize, PersistentVolumeLabel, PodNodeSelector, PodSecurity, PodSecurityPolicy, PodTolerationRestriction, Priority, ResourceQuota, RuntimeClass, SecurityContextDeny, ServiceAccount, StorageObjectInUseProtection, TaintNodesByCondition, ValidatingAdmissionWebhook. (DEPRECATED: Use --enable-admission-plugins or --disable-admission-plugins instead. Will be removed in a future version.)
--enable-admission-plugins strings admission plugins that should be enabled in addition to default enabled ones (NamespaceLifecycle, LimitRanger, ServiceAccount, TaintNodesByCondition, PodSecurity, Priority, DefaultTolerationSeconds, DefaultStorageClass, StorageObjectInUseProtection, PersistentVolumeClaimResize, RuntimeClass, CertificateApproval, CertificateSigning, CertificateSubjectRestriction, DefaultIngressClass, MutatingAdmissionWebhook, ValidatingAdmissionWebhook, ResourceQuota). Comma-delimited list of admission plugins: AlwaysAdmit, AlwaysDeny, AlwaysPullImages, CertificateApproval, CertificateSigning, CertificateSubjectRestriction, DefaultIngressClass, DefaultStorageClass, DefaultTolerationSeconds, DenyServiceExternalIPs, EventRateLimit, ExtendedResourceToleration, ImagePolicyWebhook, LimitPodHardAntiAffinityTopology, LimitRanger, MutatingAdmissionWebhook, NamespaceAutoProvision, NamespaceExists, NamespaceLifecycle, NodeRestriction, OwnerReferencesPermissionEnforcement, PersistentVolumeClaimResize, PersistentVolumeLabel, PodNodeSelector, PodSecurity, PodSecurityPolicy, PodTolerationRestriction, Priority, ResourceQuota, RuntimeClass, SecurityContextDeny, ServiceAccount, StorageObjectInUseProtection, TaintNodesByCondition, ValidatingAdmissionWebhook.
Alternative option: use the Pod Security admission webhook independently
git clone https://github.com/kubernetes/pod-security-admission.git
cd pod-security-admission/webhook
make certs
kubectl apply -k .
Add Pod Security labels at the namespace level
apiVersion: v1
kind: Namespace
metadata:
  name: my-baseline-namespace
  labels:
    pod-security.kubernetes.io/enforce: baseline
    pod-security.kubernetes.io/enforce-version: latest
    # We are setting warn and audit to our desired `enforce` level.
    pod-security.kubernetes.io/warn: baseline
    pod-security.kubernetes.io/warn-version: latest
    pod-security.kubernetes.io/audit: baseline
    pod-security.kubernetes.io/audit-version: latest
---
apiVersion: v1
kind: Namespace
metadata:
  name: my-restricted-namespace
  labels:
    pod-security.kubernetes.io/enforce: restricted
    pod-security.kubernetes.io/enforce-version: latest
    pod-security.kubernetes.io/warn: restricted
    pod-security.kubernetes.io/warn-version: latest
    pod-security.kubernetes.io/audit: restricted
    pod-security.kubernetes.io/audit-version: latest
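The label scheme above is regular enough to generate. The following is a minimal sketch, not part of any Kubernetes client library: the function names `pod_security_labels` and `namespace_manifest` are made up for illustration, and the sketch assumes you want the same level in all three modes, as both namespaces above do.

```python
MODES = ("enforce", "audit", "warn")
LEVELS = ("privileged", "baseline", "restricted")
LABEL_PREFIX = "pod-security.kubernetes.io"


def pod_security_labels(level, version="latest"):
    """Build the six Pod Security labels that apply `level` in every mode."""
    if level not in LEVELS:
        raise ValueError(f"unknown Pod Security level: {level!r}")
    labels = {}
    for mode in MODES:
        labels[f"{LABEL_PREFIX}/{mode}"] = level
        labels[f"{LABEL_PREFIX}/{mode}-version"] = version
    return labels


def namespace_manifest(name, level):
    """Return a Namespace manifest (as a dict) carrying those labels."""
    return {
        "apiVersion": "v1",
        "kind": "Namespace",
        "metadata": {"name": name, "labels": pod_security_labels(level)},
    }
```

Calling `namespace_manifest("my-baseline-namespace", "baseline")` yields a dict with the same labels as the first manifest above, which you could then serialize to YAML or JSON with the tool of your choice.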
Warnings and audit log entries follow from the namespace-level Pod Security labels
kubectl apply -f resource.yaml --namespace=my-baseline-namespace
PodTemplate Resources
Audit and Warn modes are also checked on resource types that embed a PodTemplate (enumerated below), but enforce mode only applies to actual pod resources.
Since users do not create pods directly in the typical deployment model, the warning mechanism is only effective if it can also warn on templated pod resources. Similarly, for audit it is useful to tie the audited violation back to the requesting user, so audit will also apply to templated pod resources. In the interest of supporting mutating admission controllers, policies will only be enforced on actual pods.
Templated pod resources include:
v1 ReplicationController
v1 PodTemplate
apps/v1 ReplicaSet
apps/v1 Deployment
apps/v1 StatefulSet
apps/v1 DaemonSet
batch/v1 CronJob
batch/v1 Job
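Each of the kinds listed above nests its pod spec at a different path. As a rough illustration (the helper name `pod_spec_of` is hypothetical, and resources are assumed to be plain dicts parsed from a manifest), locating the embedded pod spec could look like this:

```python
# Kinds whose pod template lives at spec.template
TEMPLATED_KINDS = {
    "ReplicationController", "ReplicaSet", "Deployment",
    "StatefulSet", "DaemonSet", "Job",
}


def pod_spec_of(resource):
    """Return the pod spec embedded in a resource dict, or None."""
    kind = resource.get("kind")
    if kind == "Pod":
        return resource.get("spec")
    if kind == "PodTemplate":
        # A v1 PodTemplate keeps its template at the top level.
        return resource.get("template", {}).get("spec")
    if kind in TEMPLATED_KINDS:
        return resource.get("spec", {}).get("template", {}).get("spec")
    if kind == "CronJob":
        # CronJob wraps a Job template, which wraps the pod template.
        job_spec = resource.get("spec", {}).get("jobTemplate", {}).get("spec", {})
        return job_spec.get("template", {}).get("spec")
    return None
```

A checker built on this helper can apply the same rule to a Pod and to a Deployment without caring which one it was handed.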
Let's learn about Pod Security Standards with Baseline - Minimally restrictive policy which prevents known privilege escalations. Allows the default (minimally specified) Pod configuration.
disallow capabilities
Adding capabilities beyond those listed in the policy must be disallowed.
Any capabilities added beyond the allowed list (AUDIT_WRITE, CHOWN, DAC_OVERRIDE, FOWNER, FSETID, KILL, MKNOD, NET_BIND_SERVICE, SETFCAP, SETGID, SETPCAP, SETUID, SYS_CHROOT) are disallowed.
NET_RAW is a default permissive setting in Kubernetes. It’s there to allow ICMP traffic between containers. But in addition to ICMP traffic, this capability lets an application craft raw packets, which opens the door to network attacks such as ARP and DNS spoofing.
apiVersion: v1
kind: Pod
metadata:
  name: badpod01
spec:
  containers:
  - name: container01
    image: dummyimagename
    securityContext:
      capabilities:
        add:
        - NET_RAW
One or more containers do not have resource limits - this could starve other processes
CAP_NET_RAW
* Use RAW and PACKET sockets;
* bind to any address for transparent proxying.
apiVersion: v1
kind: Pod
metadata:
  name: badpod02
spec:
  containers:
  - name: container01
    image: dummyimagename
    securityContext:
      capabilities:
        add:
        - NET_RAW
        - SETGID
You should run your container with privilege escalation turned off to prevent escalating privileges using setuid or setgid binaries.
apiVersion: v1
kind: Pod
metadata:
  name: badpod06
spec:
  initContainers:
  - name: initcontainer01
    image: dummyimagename
    securityContext:
      capabilities:
        add:
        - NET_RAW
  containers:
  - name: container01
    image: dummyimagename
    securityContext:
      capabilities:
        add:
        - SYS_ADMIN
CAP_SYS_ADMIN is the most privileged capability and should always be avoided
Capabilities permit certain named root actions without giving full root access. They are a more fine-grained permissions model, and all capabilities should be dropped from a pod, with only those required added back.
There are a large number of capabilities, with CAP_SYS_ADMIN bounding most. Never enable this capability - it’s equivalent to root and should always be avoided.
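To make the Baseline capability rule concrete, here is a small sketch that checks a pod spec (as a plain dict) against the allowed list quoted earlier. It is an illustration only, not how the admission controller is implemented, and the function name `disallowed_added_capabilities` is invented for this example.

```python
BASELINE_ALLOWED = frozenset({
    "AUDIT_WRITE", "CHOWN", "DAC_OVERRIDE", "FOWNER", "FSETID", "KILL",
    "MKNOD", "NET_BIND_SERVICE", "SETFCAP", "SETGID", "SETPCAP", "SETUID",
    "SYS_CHROOT",
})
CONTAINER_FIELDS = ("initContainers", "containers", "ephemeralContainers")


def disallowed_added_capabilities(pod_spec):
    """Return (container name, capability) pairs that Baseline would reject."""
    found = []
    for field in CONTAINER_FIELDS:
        for container in pod_spec.get(field, []):
            caps = container.get("securityContext", {}).get("capabilities", {})
            for cap in caps.get("add", []):
                if cap not in BASELINE_ALLOWED:
                    found.append((container["name"], cap))
    return found
```

Run against the spec of badpod06 above, it reports NET_RAW on the init container and SYS_ADMIN on the main container; SETGID in badpod02 passes because it is on the allowed list.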
disallow-host-namespaces
Host namespaces (process ID namespace, inter-process communication namespace, and network namespace) allow access to shared information and can be used to elevate privileges. Pods should not be allowed access to host namespaces. This policy ensures that the fields which make use of these host namespaces are unset or set to false.
Sharing the host namespaces is disallowed. The fields spec.hostNetwork, spec.hostIPC, and spec.hostPID must be unset or set to false.
As a Kyverno validation pattern:
spec:
  =(hostPID): "false"
  =(hostIPC): "false"
  =(hostNetwork): "false"
Don't set hostPID to true
Sharing the host’s PID namespace allows visibility of processes on the host, potentially leaking information such as environment variables and configuration.
apiVersion: v1
kind: Pod
metadata:
  name: badpod01
spec:
  hostPID: true
  containers:
  - name: container01
    image: dummyimagename
Don't set hostIPC to true
Sharing the host’s IPC namespace allows container processes to communicate with processes on the host.
Removing namespaces from pods reduces isolation and allows the processes in the pod to perform tasks as if they were running natively on the host.
This circumvents the protection models that containers are based on and should only be done with absolute certainty (for example, for low-level observation of other containers).
apiVersion: v1
kind: Pod
metadata:
  name: badpod02
spec:
  hostIPC: true
  containers:
  - name: container01
    image: dummyimagename
Don't set hostNetwork to true
Sharing the host’s network namespace permits processes in the pod to communicate with processes bound to the host’s loopback adapter.
apiVersion: v1
kind: Pod
metadata:
  name: badpod03
spec:
  hostNetwork: true
  containers:
  - name: container01
    image: dummyimagename
Set all three fields to false (or leave them unset):
spec:
  hostPID: false
  hostIPC: false
  hostNetwork: false
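The host namespace rule reduces to checking three booleans on the pod spec. A minimal sketch, assuming the spec is a plain dict and using a made-up function name:

```python
HOST_NAMESPACE_FIELDS = ("hostPID", "hostIPC", "hostNetwork")


def host_namespaces_used(pod_spec):
    """Return the host namespace fields that are set to true.

    Unset fields and fields set to false both count as compliant.
    """
    return [field for field in HOST_NAMESPACE_FIELDS if pod_spec.get(field, False)]
```

An empty result means the pod passes this particular check; a non-empty one names the offending fields.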
disallow host path
HostPath volumes let Pods use host directories and volumes in containers. Using host resources can be used to access shared data or escalate privileges and should not be allowed. This policy ensures no hostPath volumes are in use.
HostPath volumes are forbidden. The field spec.volumes[*].hostPath must be unset.
A volume is declared in a Pod’s Kubernetes YAML manifest. You specify .spec.volumes along with .spec.containers[*].volumeMounts to say what kind of volume it is and where to mount it inside the container. Here’s an example of a Pod that mounts the host’s /etc/udev directory at /data inside the container.
apiVersion: v1
kind: Pod
metadata:
  name: badpod01
spec:
  containers:
  - name: container01
    image: dummyimagename
    volumeMounts:
    - name: udev
      mountPath: /data
  volumes:
  - name: udev
    hostPath:
      path: /etc/udev
A Deployment that uses an emptyDir volume instead of hostPath passes the check:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: gooddeployment02
spec:
  replicas: 1
  selector:
    matchLabels:
      app: app
  template:
    metadata:
      labels:
        app: app
    spec:
      containers:
      - name: container01
        image: dummyimagename
        volumeMounts:
        - name: temp
          mountPath: /scratch
      volumes:
      - name: temp
        emptyDir: {}
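The hostPath rule can be sketched as a scan over spec.volumes. This is only an illustration under the assumption that the pod spec is a plain dict; `host_path_volumes` is a hypothetical name, not an existing API.

```python
def host_path_volumes(pod_spec):
    """Return the names of volumes that define hostPath."""
    return [
        volume["name"]
        for volume in pod_spec.get("volumes", [])
        if volume.get("hostPath") is not None
    ]
```

For the badpod01 spec it returns the udev volume, and for the pod template in gooddeployment02 it returns an empty list.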
There are a few ways to protect against potential misconfigurations relating to HostPath volumes.
Scope the HostPath volume to a specific directory.
Specify a spec.volumes[*].hostPath.path directory only when it is essential. Otherwise avoid using HostPaths altogether.
Ensure the HostPath volume is read only.
When mounting the volume you can set it to read-only mode.
volumeMounts:
- mountPath: /var/log/host
  name: test-volume
  readOnly: true
Bonus points: use a container-optimized OS such as Google’s Container-Optimized OS or AWS’s Bottlerocket, which include read-only root filesystems by default.
Restrict access to HostPath volumes through a Pod Security admission controller.
A collection of manifests that create pods with different elevated privileges. Quickly demonstrate the impact of allowing security sensitive pod attributes like hostNetwork, hostPID, hostPath, hostIPC, and privileged
disallow host ports
Access to host ports allows potential snooping of network traffic and should not be allowed, or at minimum restricted to a known list. This policy ensures the hostPort field is unset or set to 0.
The hostPort setting applies to Kubernetes containers. The container port is exposed to the external network at <hostIP>:<hostPort>, where hostIP is the IP address of the Kubernetes node where the container is running and hostPort is the port requested by the user.
We recommend that you do not specify a hostPort for a pod unless it is absolutely necessary. When you bind a pod to a hostPort, it limits the number of places the pod can be scheduled, because each <hostIP, hostPort, protocol> combination must be unique.
If you do not specify the hostIP and protocol explicitly, Kubernetes uses 0.0.0.0 as the default hostIP and TCP as the default protocol. The port is then bound on every interface of the node, which can expose it to external networks.
apiVersion: v1
kind: Pod
metadata:
  name: influxdb
spec:
  containers:
  - name: influxdb
    image: influxdb
    ports:
    - containerPort: 8086
      hostPort: 8086
The hostPort feature allows you to expose a single container port on the host IP. Using hostPort to expose an application outside the Kubernetes cluster has the same drawbacks as the hostNetwork approach: the host IP can change when the container is restarted, two containers using the same hostPort cannot be scheduled on the same node, and the use of hostPort is considered a privileged operation on OpenShift.
What is the hostPort used for? For example, the nginx based Ingress controller is deployed as a set of containers running on top of Kubernetes. These containers are configured to use hostPorts 80 and 443 to allow the inbound traffic on these ports from the outside of the Kubernetes cluster.
disallow-host-ports-range
Access to host ports allows potential snooping of network traffic and should not be allowed, or at minimum restricted to a known list. This policy ensures the hostPort field is set to one in the designated list.
The only permitted hostPorts are in the range 5000-6000.
apiVersion: v1
kind: Pod
metadata:
  name: goodpod02
spec:
  containers:
  - name: container01
    image: dummyimagename
    ports:
    - name: admin
      containerPort: 8000
      hostPort: 5555
      protocol: TCP
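A range check like the one described can be sketched as follows. The function name `host_ports_outside` is hypothetical, the 5000-6000 bounds come from the text above, and the sketch assumes that an unset hostPort (or 0) is still acceptable, as in the plain disallow-host-ports rule.

```python
CONTAINER_FIELDS = ("initContainers", "containers", "ephemeralContainers")


def host_ports_outside(pod_spec, low=5000, high=6000):
    """Return (container name, hostPort) pairs outside the inclusive range."""
    violations = []
    for field in CONTAINER_FIELDS:
        for container in pod_spec.get(field, []):
            for port in container.get("ports", []):
                host_port = port.get("hostPort", 0)
                # Assumption: 0 or unset means no host port was requested.
                if host_port != 0 and not (low <= host_port <= high):
                    violations.append((container["name"], host_port))
    return violations
```

The goodpod02 spec (hostPort 5555) produces no violations, while the earlier influxdb pod (hostPort 8086) would be flagged.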
disallow-host-process
Windows pods offer the ability to run HostProcess containers, which enables privileged access to the Windows node. Privileged access to the host is disallowed in the baseline policy. HostProcess pods were introduced as an alpha feature in Kubernetes v1.22. This policy ensures the hostProcess field, if present, is set to false.
HostProcess containers are disallowed. The fields spec.securityContext.windowsOptions.hostProcess, spec.containers[*].securityContext.windowsOptions.hostProcess, spec.initContainers[*].securityContext.windowsOptions.hostProcess, and spec.ephemeralContainers[*].securityContext.windowsOptions.hostProcess must either be undefined or set to false.
securityContext:
  windowsOptions:
    hostProcess: false
Set hostProcess to false. The following Pod sets it to true and violates the policy:
apiVersion: v1
kind: Pod
metadata:
  name: badpod01
spec:
  hostNetwork: true
  containers:
  - name: container01
    image: dummyimagename
    securityContext:
      windowsOptions:
        hostProcess: true
disallow-privileged-containers
Privileged mode disables most security mechanisms and must not be allowed. This policy ensures Pods do not call for privileged mode.
Privileged mode is disallowed. The fields spec.containers[*].securityContext.privileged and spec.initContainers[*].securityContext.privileged must be unset or set to false.
Set privileged to false
apiVersion: v1
kind: Pod
metadata:
  name: goodpod02
spec:
  containers:
  - name: container01
    image: dummyimagename
    securityContext:
      privileged: false
disallow-proc-mount
The default /proc masks are set up to reduce attack surface and should be required. This policy ensures nothing but the default procMount can be specified. Note that deviating from the Default procMount requires setting a feature gate at the API server.
Changing the proc mount from the default is not allowed. The fields spec.containers[*].securityContext.procMount, spec.initContainers[*].securityContext.procMount, and spec.ephemeralContainers[*].securityContext.procMount must be unset or set to Default.
Don't set procMount to Unmasked; always leave it as Default.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: gooddeployment02
spec:
  replicas: 1
  selector:
    matchLabels:
      app: app
  template:
    metadata:
      labels:
        app: app
    spec:
      containers:
      - name: container01
        image: dummyimagename
        securityContext:
          procMount: Default
disallow selinux
SELinux options can be used to escalate privileges and should be restricted. This policy ensures that the seLinuxOptions type is either unset or one of the allowed types.
Don't set the seLinuxOptions type to spc_t. Set it to container_t, container_kvm_t, or container_init_t:
securityContext:
  seLinuxOptions:
    type: container_t
or
securityContext:
  seLinuxOptions:
    type: container_kvm_t
or
securityContext:
  seLinuxOptions:
    type: container_init_t
restrict-apparmor-profiles
On supported hosts, the 'runtime/default' AppArmor profile is applied by default. The default policy should prevent overriding or disabling the policy, or restrict overrides to an allowed set of profiles. This policy ensures Pods do not specify any AppArmor profile other than runtime/default or localhost/*.
Specifying other AppArmor profiles is disallowed. The annotation container.apparmor.security.beta.kubernetes.io, if defined, must not be set to anything other than runtime/default or localhost/*.
Always set it as
metadata:
  annotations:
    container.apparmor.security.beta.kubernetes.io/container01: runtime/default
or
metadata:
  annotations:
    container.apparmor.security.beta.kubernetes.io/container01: localhost/foo
restrict-seccomp
The seccomp profile must not be explicitly set to Unconfined. This policy, requiring Kubernetes v1.19 or later, ensures that seccomp is unset or set to RuntimeDefault or Localhost.
Use of custom Seccomp profiles is disallowed. The fields spec.securityContext.seccompProfile.type, spec.containers[*].securityContext.seccompProfile.type, spec.initContainers[*].securityContext.seccompProfile.type, and spec.ephemeralContainers[*].securityContext.seccompProfile.type must be unset or set to RuntimeDefault or Localhost.
Set as
securityContext:
  seccompProfile:
    type: RuntimeDefault
or
securityContext:
  seccompProfile:
    type: Localhost
    localhostProfile: operator/default/profile1.json
restrict-sysctls
Sysctls can disable security mechanisms or affect all containers on a host, and should be disallowed except for an allowed "safe" subset. A sysctl is considered safe if it is namespaced in the container or the Pod, and it is isolated from other Pods or processes on the same Node. This policy ensures that only those "safe" subsets can be specified in a Pod.
Setting sysctls outside the allowed set is disallowed. The field spec.securityContext.sysctls must be unset or not use any names other than kernel.shm_rmid_forced, net.ipv4.ip_local_port_range, net.ipv4.ip_unprivileged_port_start, net.ipv4.tcp_syncookies, and net.ipv4.ping_group_range.
Set as
securityContext:
  sysctls:
  - name: net.ipv4.ip_unprivileged_port_start
    value: "2048"
or
securityContext:
  sysctls:
  - name: net.ipv4.ip_local_port_range
    value: "31000 60999"
or
securityContext:
  sysctls:
  - name: kernel.shm_rmid_forced
    value: "1"
or
securityContext:
  sysctls:
  - name: net.ipv4.tcp_syncookies
    value: "0"
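As a rough illustration of the sysctl rule, the sketch below compares the names in spec.securityContext.sysctls with the safe set listed above. The function name `unsafe_sysctls` is invented for this example and the pod spec is assumed to be a plain dict.

```python
SAFE_SYSCTLS = frozenset({
    "kernel.shm_rmid_forced",
    "net.ipv4.ip_local_port_range",
    "net.ipv4.ip_unprivileged_port_start",
    "net.ipv4.tcp_syncookies",
    "net.ipv4.ping_group_range",
})


def unsafe_sysctls(pod_spec):
    """Return the sysctl names in the pod spec that are not in the safe set."""
    sysctls = pod_spec.get("securityContext", {}).get("sysctls", [])
    return [entry["name"] for entry in sysctls if entry["name"] not in SAFE_SYSCTLS]
```

Only the names are inspected here; the values are left for the kubelet and the kernel to validate.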
Let's learn about Pod Security Standards with Restricted - Heavily restricted policy, following current Pod hardening best practices.
disallow-capabilities-strict - Pod Security Standards (Restricted)
Adding capabilities other than NET_BIND_SERVICE is disallowed. In addition, all containers must explicitly drop ALL capabilities.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: gooddeployment01
spec:
  replicas: 1
  selector:
    matchLabels:
      app: app
  template:
    metadata:
      labels:
        app: app
    spec:
      containers:
      - name: container01
        image: dummyimagename
        securityContext:
          capabilities:
            drop:
            - ALL
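The Restricted capability rule has two parts: every container drops ALL, and nothing other than NET_BIND_SERVICE is added back. A minimal sketch of that check, with a hypothetical function name and the pod spec taken as a plain dict:

```python
CONTAINER_FIELDS = ("initContainers", "containers", "ephemeralContainers")


def restricted_capability_violations(pod_spec):
    """Return human-readable violations of the strict capability rule."""
    problems = []
    for field in CONTAINER_FIELDS:
        for container in pod_spec.get(field, []):
            name = container["name"]
            caps = container.get("securityContext", {}).get("capabilities", {})
            if "ALL" not in caps.get("drop", []):
                problems.append(f"{name}: must drop ALL capabilities")
            extra = [cap for cap in caps.get("add", []) if cap != "NET_BIND_SERVICE"]
            if extra:
                problems.append(f"{name}: may not add {', '.join(extra)}")
    return problems
```

The pod template in gooddeployment01 yields no violations, while a container with no securityContext at all fails the first part of the rule.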
disallow privilege escalation - Pod Security Standards (Restricted)
Privilege escalation, such as via set-user-ID or set-group-ID file mode, should not be allowed. This policy ensures the allowPrivilegeEscalation field is set to false.
apiVersion: v1
kind: Pod
metadata:
  name: goodpod01
spec:
  containers:
  - name: container01
    image: dummyimagename
    securityContext:
      allowPrivilegeEscalation: false
require run as non root user - Pod Security Standards (Restricted)
Containers must be required to run as non-root users. This policy ensures runAsUser is either unset or set to a number greater than zero.
Running as root is not allowed. The fields spec.securityContext.runAsUser, spec.containers[*].securityContext.runAsUser, spec.initContainers[*].securityContext.runAsUser, and spec.ephemeralContainers[*].securityContext.runAsUser must be unset or set to a number greater than zero.
apiVersion: v1
kind: Pod
metadata:
  name: badpod02
spec:
  containers:
  - name: container01
    image: dummyimagename
    securityContext:
      runAsUser: 0
Set runAsUser to a non-zero UID, for example 1:
apiVersion: v1
kind: Pod
metadata:
  name: goodpod02
spec:
  containers:
  - name: container01
    image: dummyimagename
    securityContext:
      runAsUser: 1
require run as nonroot
Containers must be required to run as non-root users. This policy ensures runAsNonRoot is set to true. A known issue prevents a Kyverno policy such as this, which uses anyPattern, from being persisted properly in Kubernetes 1.23.0-1.23.2.
Running as root is not allowed. Either the field spec.securityContext.runAsNonRoot must be set to true, or the fields spec.containers[*].securityContext.runAsNonRoot, spec.initContainers[*].securityContext.runAsNonRoot, and spec.ephemeralContainers[*].securityContext.runAsNonRoot must be set to true.
Set runAsNonRoot to true
apiVersion: v1
kind: Pod
metadata:
  name: goodpod01
spec:
  containers:
  - name: container01
    image: dummyimagename
    securityContext:
      runAsNonRoot: true
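The "pod level or every container" wording can be read as: each container's effective runAsNonRoot, taken from the container if set and otherwise inherited from the pod, must be true. The sketch below implements that reading; it is one interpretation rather than the controller's own code, and `runs_as_non_root` is a hypothetical name.

```python
CONTAINER_FIELDS = ("initContainers", "containers", "ephemeralContainers")


def runs_as_non_root(pod_spec):
    """Return True if every container effectively has runAsNonRoot: true."""
    pod_level = pod_spec.get("securityContext", {}).get("runAsNonRoot")
    for field in CONTAINER_FIELDS:
        for container in pod_spec.get(field, []):
            # A container-level value overrides the pod-level one.
            effective = container.get("securityContext", {}).get("runAsNonRoot", pod_level)
            if effective is not True:
                return False
    return True
```

Under this reading, a container that explicitly sets runAsNonRoot to false fails even when the pod-level field is true.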
Restrict Seccomp (Strict)
The seccomp profile in the Restricted group must not be explicitly set to Unconfined and, in addition, must not be left unset. This policy, requiring Kubernetes v1.19 or later, ensures that seccomp is set to RuntimeDefault or Localhost. A known issue prevents a Kyverno policy such as this, which uses anyPattern, from being persisted properly in Kubernetes 1.23.0-1.23.2.
Use of custom Seccomp profiles is disallowed. The fields spec.securityContext.seccompProfile.type, spec.containers[*].securityContext.seccompProfile.type, spec.initContainers[*].securityContext.seccompProfile.type, and spec.ephemeralContainers[*].securityContext.seccompProfile.type must be set to RuntimeDefault or Localhost.
apiVersion: v1
kind: Pod
metadata:
  name: goodpod02
spec:
  containers:
  - name: container01
    image: dummyimagename
    securityContext:
      seccompProfile:
        localhostProfile: operator/default/profile1.json
        type: Localhost
or
apiVersion: v1
kind: Pod
metadata:
  name: goodpod03
spec:
  containers:
  - name: container01
    image: dummyimagename
    securityContext:
      seccompProfile:
        type: RuntimeDefault
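The difference between the Baseline and Restricted seccomp rules is whether an unset profile is tolerated. The following sketch shows both in one function; the name `seccomp_violations` and the `allow_unset` parameter are made up for illustration, and container-level settings are assumed to override the pod-level one.

```python
CONTAINER_FIELDS = ("initContainers", "containers", "ephemeralContainers")
ALLOWED_TYPES = ("RuntimeDefault", "Localhost")


def seccomp_violations(pod_spec, allow_unset=False):
    """Return names of containers whose effective seccomp type is not allowed.

    allow_unset=True approximates Baseline; False approximates Restricted.
    """
    pod_type = pod_spec.get("securityContext", {}).get("seccompProfile", {}).get("type")
    offenders = []
    for field in CONTAINER_FIELDS:
        for container in pod_spec.get(field, []):
            profile = container.get("securityContext", {}).get("seccompProfile", {})
            effective = profile.get("type", pod_type)
            if effective is None and allow_unset:
                continue
            if effective not in ALLOWED_TYPES:
                offenders.append(container["name"])
    return offenders
```

A pod with no seccompProfile anywhere passes with `allow_unset=True` but is reported with the default `allow_unset=False`.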
restrict volume types
In addition to restricting HostPath volumes, the restricted pod security profile limits usage of non-core volume types to those defined through PersistentVolumes. This policy blocks any other type of volume other than those in the allow list.
Only the following types of volumes may be used: configMap, csi, downwardAPI, emptyDir, ephemeral, persistentVolumeClaim, projected, and secret.
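The allow list above translates directly into a check over spec.volumes, where each entry has a name plus one key identifying its type. A minimal sketch, assuming a plain-dict pod spec and using an invented function name:

```python
ALLOWED_VOLUME_TYPES = frozenset({
    "configMap", "csi", "downwardAPI", "emptyDir", "ephemeral",
    "persistentVolumeClaim", "projected", "secret",
})


def disallowed_volumes(pod_spec):
    """Return (volume name, volume type) pairs not on the Restricted allow list."""
    rejected = []
    for volume in pod_spec.get("volumes", []):
        for key in volume:
            # Every key other than "name" identifies the volume source type.
            if key != "name" and key not in ALLOWED_VOLUME_TYPES:
                rejected.append((volume["name"], key))
    return rejected
```

A hostPath volume is reported, as is any other type outside the list, while emptyDir and secret volumes pass.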
For more, see kubernetes.io/docs/concepts/security/pod-se..