Using Linux Capabilities in AKS
Introduction
Right after Kubernetes 1.21, the way Capabilities(7) worked in Kubernetes changed. At that version, a change in the upstream code enforced that Capabilities will only work when runAsUser is set to 0 - meaning root. This is somewhat counterintuitive to what most of us would expect but code goes into the reasoning and how to work with Capabilities after 1.21.
The code that prevents any user other than root to have capabilities. This was added by the commit referenced here:
// Clear all ambient capabilities. The implication of non-root + caps
// is not clearly defined in Kubernetes.
// See https://github.com/kubernetes/kubernetes/issues/56374
// Keep docker's behavior for now.
specOpts = append(specOpts,
customopts.WithoutAmbientCaps,
customopts.WithSelinuxLabels(processLabel, mountLabel),
)
On the previous note, we can add/remove capabilities to root - which essentially removes a lot of the superpowers that root have on by default (e.g.: cap_net_admin).
Approaches
-
Using
RunAsUser: 0but restricting the capabilities in the accountIn the example below we will be granting
cap_ipc_lockto the running user (root) and nothing else.apiVersion: apps/v1
kind: Deployment
metadata:
labels:
app: gbbapp
name: gbbapp
namespace: ns-gbb
spec:
replicas: 1
selector:
matchLabels:
app: gbbapp
template:
metadata:
labels:
app: gbbapp
spec:
containers:
- name: gbbapp
image: gbbapp/k8s:cfa
command: ["/bin/bash"]
args: ["-c", "sleep 3600"]
securityContext:
runAsUser: 0
capabilities:
drop: ["ALL"]
add: ["IPC_LOCK"]- Adding capabilities to binaries during the build process
It is possible to add capabilities during the build process with docker/podman. With this approach you can remove the
RunAsUserparameter altogether. The capabilities added there will persist when the image runs as a container. The following example addscap_ipc_lockto python3.8-
Create a Dockerfile
FROM ubuntu
RUN apt-get update && apt-get install -y libcap2-bin && \
setcap cap_ipc_lock+eip /usr/bin/python3.8
CMD ["/bin/bash"] -
Create an ACR instance
RESOURCE_GROUP_NAME=rg-setcomp
LOCATION=westus
ACR_NAME=myacrname
az group create --name ${RESOURCE_GROUP_NAME} --location ${LOCATION}
az acr create -n ${ACR_NAME} -l ${LOCATION} -
Add the container to ACR
az acr build -r ${MY_ACR}/setcomp:{{.Run.ID}} . -
Exec into the container and verify that the capability was added by running the
getcapcommand against a binary, which in this case ispython3.8:$ getcap /usr/bin/python3.8
/usr/bin/python3.8 = cap_ipc_lock+eip
Conclusion
In this article we've explored how to enable Capabilities for a container and how to limit it's scope. You now have the pieces and bits needed to enable the minimum amount of capabilities for any given container.
