Adding individual kernel capabilities to a container:
Traditional UNIX implementations distinguished only between privileged and unprivileged processes, but Linux has long supported a much more fine-grained permission system through kernel capabilities.
Instead of making a container privileged and giving it unlimited permissions, a much safer method (from a security perspective) is to give it access only to the kernel features it really requires.
Kubernetes allows you to add capabilities to each container or drop some of them, which lets you fine-tune the container’s permissions and limit the impact of a potential intrusion by an attacker.
For example, a container usually isn’t allowed to change the system time (the hardware clock’s time).
You can confirm this by trying to set the time in your pod-hn pod (see the YAML for pod-hn in Part 1):
$ kubectl exec -it pod-hn -- date +%T -s "12:00:00"
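To allow the container to change the system time, add the SYS_TIME capability to the container’s securityContext.capabilities.add list.
NOTE: Linux kernel capabilities are usually prefixed with CAP_. But when specifying them in a pod spec, you must leave out the prefix.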
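A pod that grants this ability could look like the following minimal sketch. The pod name matches the commands used in this section, but the container name, the alpine image, and the sleep command are assumptions made for illustration; only the securityContext part is what matters here.

```yaml
# Sketch only: image, container name, and command are assumed, not taken from Part 1.
apiVersion: v1
kind: Pod
metadata:
  name: pod-add-settime-capability
spec:
  containers:
  - name: main                          # assumed container name
    image: alpine                       # assumed image
    command: ["/bin/sleep", "999999"]   # assumed: keeps the container running
    securityContext:
      capabilities:
        add:
        - SYS_TIME                      # CAP_SYS_TIME, written without the CAP_ prefix
```

If you save it as, say, pod-add-settime-capability.yml (a hypothetical file name), you can create the pod with kubectl apply -f pod-add-settime-capability.yml.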
If you run the same command in this new pod’s container, the system time is changed successfully:
$ kubectl exec -it pod-add-settime-capability -- date +%T -s "12:00:00"
$ kubectl exec -it pod-add-settime-capability -- date
If you try this yourself, be aware that it may cause your worker node to become unusable. In Minikube, although the system time was automatically reset by the Network Time Protocol (NTP) daemon, you may still need to reboot the VM before you can schedule new pods.
Similarly, you can confirm that the node’s time has been changed by checking the time on the node running the pod. In my case, I’m using Minikube, so I have only one node and I can get its time like this:
$ minikube ssh date
Sun Dec 12 00:50:07 UTC 2021
Adding capabilities like this is much better than giving a container full privileges with privileged: true. Admittedly, it does require you to know and understand what each capability does.
Dropping capabilities from a container:
You’ve seen how to add capabilities, but you can also drop capabilities that may otherwise be available to the container. For example, the default capabilities given to a container include the CAP_CHOWN capability, which allows processes to change the ownership of files in the filesystem.
For example, in your pod-hn pod (which still has the CHOWN capability), you can change the owner of the /tmp directory to the guest user:
$ kubectl exec pod-hn -- chown guest /tmp
$ kubectl exec pod-hn -- ls -la / | grep tmp
drwxrwxrwt 1 guest root 4096 Nov 24 09:20 tmp
To prevent the container from doing that, you need to drop the capability by listing it under the container’s securityContext.capabilities.drop property, as shown in the following listing.
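A minimal sketch of such a pod follows. The pod name matches the command used in this section; the container name, the alpine image, and the sleep command are assumptions made for illustration.

```yaml
# Sketch only: image, container name, and command are assumed.
apiVersion: v1
kind: Pod
metadata:
  name: pod-drop-chown-capability
spec:
  containers:
  - name: main                          # assumed container name
    image: alpine                       # assumed image
    command: ["/bin/sleep", "999999"]   # assumed: keeps the container running
    securityContext:
      capabilities:
        drop:
        - CHOWN                         # CAP_CHOWN, written without the CAP_ prefix
```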
With the CHOWN capability dropped, you’re no longer allowed to change the owner of the /tmp directory in this pod:
$ kubectl exec pod-drop-chown-capability -- chown guest /tmp
chown: /tmp: Operation not permitted
Preventing processes from writing to the container’s filesystem:
You may want to prevent the processes running in the container from writing to the container’s filesystem, and only allow them to write to mounted volumes.
Let’s imagine you’re running a PHP application with a hidden vulnerability that allows an attacker to write to the filesystem. The PHP files are added to the container image at build time and are served from the container’s filesystem. Because of the vulnerability, the attacker can modify those files and inject malicious code into them.
These types of attacks can be thwarted by preventing the container from writing to its filesystem, where the app’s executable code is normally stored.
You do this by setting the container’s securityContext.readOnlyRootFilesystem property to true, as shown in the following listing.
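A minimal sketch of such a pod follows. The pod name and the /volume mount path match the commands used in this section; the container name, volume name, emptyDir volume type, alpine image, and sleep command are assumptions made for illustration.

```yaml
# Sketch only: image, names, command, and volume type are assumed.
apiVersion: v1
kind: Pod
metadata:
  name: pod-with-readonly-filesystem
spec:
  containers:
  - name: main                          # assumed container name
    image: alpine                       # assumed image
    command: ["/bin/sleep", "999999"]   # assumed: keeps the container running
    securityContext:
      readOnlyRootFilesystem: true      # the container's own filesystem can't be written to
    volumeMounts:
    - name: my-volume                   # assumed volume name
      mountPath: /volume                # writes here go to the volume, so they are allowed
      readOnly: false
  volumes:
  - name: my-volume
    emptyDir: {}                        # assumed volume type
```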
When you deploy this pod, the container is running as root, which has write permissions to the / directory, but trying to write a file there fails:
$ kubectl exec -it pod-with-readonly-filesystem -- touch /t1.txt
touch: /t1.txt: Read-only file system
On the other hand, writing to the mounted volume is allowed:
$ kubectl exec -it pod-with-readonly-filesystem -- touch /volume/t1.txt
$ kubectl exec -it pod-with-readonly-filesystem -- ls -la volume/t1.txt
-rw-r--r-- 1 root root 0 Dec 12 01:11 volume/t1.txt
To increase security when running pods in production, set their containers’ readOnlyRootFilesystem property to true.
Setting security context options at the pod level:
In all these examples, you’ve set the security context of an individual container, but several of these options can also be set at the pod level (through the pod.spec.securityContext property).
They serve as a default for all the pod’s containers but can be overridden at the container level. The pod-level security context also allows you to set additional properties, which we’ll explain in Part 3.
To learn more, refer to How to secure cluster nodes and the network, Part 3.
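As a rough illustration of a pod-level default and a container-level override, consider the following sketch. Everything in it is hypothetical (the pod and container names, the alpine image, the sleep command, and the user IDs); it relies only on runAsUser being one of the options accepted at both levels.

```yaml
# Sketch only: all names, the image, the command, and the user IDs are hypothetical.
apiVersion: v1
kind: Pod
metadata:
  name: pod-level-security-context      # hypothetical pod name
spec:
  securityContext:
    runAsUser: 1000                     # pod-level default for every container
  containers:
  - name: uses-default                  # inherits runAsUser 1000 from the pod
    image: alpine
    command: ["/bin/sleep", "999999"]
  - name: overrides-default
    image: alpine
    command: ["/bin/sleep", "999999"]
    securityContext:
      runAsUser: 2000                   # container-level value takes precedence
```

Running the id command in each container with kubectl exec should then report a different user ID for each one.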