1.8. Under the hood
A closer look at the docker command and the runtime
We’ve learned that the term “Docker” is used somewhat imprecisely. It refers to various components such as the CLI (Command Line Interface), the Docker Engine, the OCI image format, and the Container runtime. Let’s take a closer look at what’s happening when we use the command:
docker run --rm -d --name sleep-container alpine sleep 900
We will see the meaning of --rm and the other arguments later. For now, we only need to know that we started a container that sleeps for 900 seconds on our host.
First let us get the process id of the sleep process we just started:
docker inspect --format '{{.State.Pid}}' sleep-container
Let us look at the running process. In the webshell, the Docker backend runs in another container, so let us first switch into it:
kubectl exec $(kubectl get pod -l "app.kubernetes.io/name"=webshell -o name) -it -c dind -- sh
Don’t worry, the command will make sense after the Kubernetes Security training.
Now let us look at the process on the host, together with its parents. The necessary tools are not installed, so we script it a bit:
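# Walk from the sleep process up to PID 1, printing one line per ancestor.
# -o picks the oldest match in case several sleep processes are running.
CUR=$(pgrep -o sleep)
echo "PID PPID USER(UIDN) CMD"
while [ -n "$CUR" ] && [ "$CUR" != "1" ]; do
    # PPID and UID are read-only in some shells (e.g. bash), so we use other names.
    PARENT=$(awk '/^PPid:/ {print $2}' "/proc/$CUR/status")
    OWNER=$(stat -c %U "/proc/$CUR"); UIDN=$(stat -c %u "/proc/$CUR")
    CMD=$(tr '\0' ' ' < "/proc/$CUR/cmdline")
    echo "$CUR $PARENT $OWNER($UIDN) $CMD"
    CUR=$PARENT
done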
We should see the same PID as reported by docker inspect, and output similar to this:
PID PPID USER(UIDN) CMD
3301 3277 rootless(1000) sleep 900
3277 1 rootless(1000) /usr/local/bin/containerd-shim-runc-v2 -namespace moby -id 08ce2ee7d4e3194f47ec6249360b51b02e6e36834f6791854ca2b24a4b15768c -address /run/user/1000/docker/containerd/containerd.sock
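Indeed, we see that Docker itself is not the container runtime: it uses containerd as the high-level runtime and runc as the low-level runtime. (By the way, moby is the containerd namespace in which Docker keeps its containers. It is named after the Moby open-source project behind Docker and is unrelated to Linux network namespaces.) The parent of each of these containerd-shim-runc-v2 processes is PID 1 on the system.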
The shim becomes the parent process of the containerized application. It is responsible for tasks such as reaping zombie processes, handling container process I/O (standard input, output, error), and ensuring proper container cleanup upon exit. As a result, containerd can upgrade and restart without affecting running containers.
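As a small sketch of how each shim can be mapped back to its container, the following loop scans /proc for containerd-shim-runc-v2 processes and prints their PID together with the values of the -namespace and -id arguments seen in the output above. The helper name flag_value is made up for this illustration, and the script assumes that the shim's command line has the shape shown above. On a system without running containers it prints only the header.

```shell
#!/bin/sh
# flag_value FLAG CMDLINE: print the word that follows FLAG in CMDLINE.
# (hypothetical helper, not part of Docker or containerd)
flag_value() {
    found=
    for word in $2; do              # intentional word splitting on spaces
        if [ -n "$found" ]; then
            echo "$word"
            return 0
        fi
        if [ "$word" = "$1" ]; then found=1; fi
    done
    return 1
}

echo "PID NAMESPACE ID"
for DIR in /proc/[0-9]*; do
    # A process may vanish while we scan, so ignore read errors.
    CMD=$(tr '\0' ' ' 2>/dev/null < "$DIR/cmdline") || continue
    # Only look at processes whose executable is the runc shim.
    case ${CMD%% *} in
        containerd-shim-runc-v2|*/containerd-shim-runc-v2)
            echo "${DIR#/proc/} $(flag_value -namespace "$CMD") $(flag_value -id "$CMD")"
            ;;
    esac
done
```

Run inside the Docker backend container, the ID column should correspond to the full container IDs listed by docker ps --no-trunc -q.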
Secondly, we see that in the end a container is just a process running on the host. Without rootless mode, that process would by default run as root!
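To check which user a process really runs as from the host's point of view, we can read the Uid: line of /proc/<pid>/status. This is only a minimal sketch, and the helper name proc_euid is made up for this illustration.

```shell
#!/bin/sh
# proc_euid PID: print the effective UID of a process.
# The "Uid:" line of /proc/PID/status lists the real, effective,
# saved and filesystem UIDs, in that order.
proc_euid() {
    awk '/^Uid:/ {print $3}' "/proc/$1/status"
}

echo "this shell runs with effective UID $(proc_euid $$)"
```

Inside the Docker backend container, proc_euid "$(pgrep -o sleep)" should print 1000 for the rootless setup shown above. On a non-rootless installation we would expect 0, unless the image or the docker run command selects another user.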
Let us look at the different isolation techniques in use (lsns is not installed, so we script this as well):
PID=$(pgrep -o sleep)
# Owner and command line are the same for every row, so we read them once.
# UID is read-only in some shells (e.g. bash), hence the name UIDN.
OWNER=$(stat -c %U "/proc/$PID"); UIDN=$(stat -c %u "/proc/$PID")
CMD=$(tr '\0' ' ' < "/proc/$PID/cmdline")
printf "%-12s %-6s %-6s %-6s %-12s %s\n" "NS" "TYPE" "NPROCS" "PID" "USER" "COMMAND"
for NSPATH in /proc/$PID/ns/*; do
    TYPE=$(basename "$NSPATH")
    # The namespace ID is the inode number behind the symbolic link.
    NS=$(stat -Lc %i "$NSPATH")
    # Count the visible processes whose namespace of this type has the same ID.
    NPROCS=$(stat -Lc %i /proc/[0-9]*/ns/"$TYPE" 2>/dev/null | grep -cx "$NS")
    printf "%-12s %-6s %-6s %-6s %-12s %s\n" "$NS" "$TYPE" "$NPROCS" "$PID" "$OWNER($UIDN)" "$CMD"
done
This shows the different (and newly created) namespaces used for this container. The IDs, PID, and process counts may differ on your system:
NS TYPE NPROCS PID USER COMMAND
4026532969 cgroup 0 3984 rootless(1000) sleep 900
4026532967 ipc 0 3984 rootless(1000) sleep 900
4026532965 mnt 0 3984 rootless(1000) sleep 900
4026532970 net 0 3984 rootless(1000) sleep 900
4026532968 pid 0 3984 rootless(1000) sleep 900
4026532968 pid_for_children 0 3984 rootless(1000) sleep 900
4026531834 time 0 3984 rootless(1000) sleep 900 # time isolation
4026531834 time_for_children 0 3984 rootless(1000) sleep 900
4026532832 user 0 3984 rootless(1000) sleep 900 # uid gid isolation (root inside is not root outside)
4026532966 uts 0 3984 rootless(1000) sleep 900 # hostname isolation
By comparison, a simple sleep command in the current shell would run in the same namespaces as its parent shell, giving no isolation. Also, thanks to rootless mode, this process runs with UID 1000 instead of root.
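The comparison with a plain sleep can be tried out directly. The following sketch starts sleep as an ordinary child of the current shell and compares every namespace link of the shell with that of the child. The variable names are chosen for this illustration only, and we assume a Linux system where /proc/<pid>/ns is readable.

```shell
#!/bin/sh
# Start a plain, non-containerized sleep as a child of this shell.
sleep 30 &
CHILD=$!

same=0
different=0
for NSPATH in /proc/$$/ns/*; do
    TYPE=$(basename "$NSPATH")
    # The link target looks like "pid:[4026531836]" and identifies the namespace.
    SHELL_NS=$(readlink "$NSPATH")
    CHILD_NS=$(readlink "/proc/$CHILD/ns/$TYPE")
    if [ "$SHELL_NS" = "$CHILD_NS" ]; then
        same=$((same + 1))
        RESULT=shared
    else
        different=$((different + 1))
        RESULT=DIFFERENT
    fi
    printf '%-20s %-28s %s\n' "$TYPE" "$SHELL_NS" "$RESULT"
done

# Clean up the helper process again.
kill "$CHILD"
wait "$CHILD" 2>/dev/null || true
echo "shared: $same, different: $different"
```

Every row should be reported as shared, because a child process inherits all namespaces of its parent unless new ones are explicitly requested, which is what the container runtime does when it starts a container.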
Don’t forget to exit our Docker backend container if you work in the webshell.
exit
🤔 Which time will be displayed when you execute uptime inside the container? Try it out and explain what you see and why.
Solution:
docker run --rm -i alpine uptime
uptime reads /proc/uptime
The /proc filesystem is a kernel-generated virtual filesystem, not something Docker emulates.
So for anything in /proc that is not namespaced (unlike, for example, PIDs or the hostname), you get information directly from the host. This includes:
/proc/uptime,
/proc/cpuinfo → all host CPUs visible,
/proc/meminfo → host memory,
parts of /proc/sys/.. → global kernel parameters
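To make the uptime comparison easier, here is a minimal sketch that reads the first field of /proc/uptime (seconds since boot) and prints it in days, hours, and minutes. The function name format_uptime is made up for this illustration.

```shell
#!/bin/sh
# format_uptime SECONDS: print a duration as days, hours and minutes.
format_uptime() {
    secs=${1%%.*}    # drop the fractional part
    printf '%dd %02dh %02dm\n' \
        "$((secs / 86400))" "$((secs % 86400 / 3600))" "$((secs % 3600 / 60))"
}

# /proc/uptime holds two numbers: seconds since boot and accumulated idle time.
if [ -r /proc/uptime ]; then
    read -r UP IDLE < /proc/uptime
    echo "up $(format_uptime "$UP")"
fi
```

Comparing cat /proc/uptime on the host with docker run --rm -i alpine cat /proc/uptime should give nearly identical numbers, assuming the container shares the host's time namespace as in the setup above.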