Accessing Docker from a Kubernetes Pod

LH Fong · Published in ESTL Lab Notes · Mar 14, 2018


Coursemology uses Docker to evaluate programming assignments from students. A container is created from a custom code-evaluation image, the student’s code is copied inside, and then the code is run. The worker uses the docker-api gem to control Docker, and by default the gem looks for the Docker socket file locally. On the VM deployment this was easy to solve: install Docker on the worker VMs, then give the worker process access to the socket by adding its user to the docker group.
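
For reference, granting that access on a VM is a single command; the username below is just a placeholder for whatever user the worker process runs as:

# Add the worker's user to the docker group, then restart the worker
# process so the new group membership takes effect.
sudo usermod -aG docker worker-user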

With the worker running on Kubernetes, there are two possible solutions:

  1. Create a separate Compute instance to run Docker and expose the socket.
  2. Use Docker on the nodes directly.

The steps for exposing the socket are not too hard, but securing it is more complicated, involving a lot of certificate manipulation. It would also mean a separate instance to create, configure, monitor and maintain. Some additional configuration is also needed for the docker-api gem to know where to find the Docker socket.
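
As a rough sketch, option 1 would have looked something like this on the worker side; the docker-api gem picks up standard Docker environment variables such as DOCKER_HOST and DOCKER_CERT_PATH, and the hostname, port and certificate path here are made-up placeholders:

# Point docker-api at a hypothetical TLS-secured Docker daemon running
# on a separate Compute instance instead of the local socket.
export DOCKER_HOST=tcp://docker-evaluator.example.internal:2376
export DOCKER_CERT_PATH=/etc/coursemology/docker-certs  # client cert, key and CA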

While working on configuring multiple writer storage, I noticed that the documentation for the hostPath volume gave “accessing Docker internals” as one of the use cases. It would be nice to use the Docker instance that is already running on the Kubernetes nodes. There will be less to maintain and the cluster autoscaler might even work automatically when there is more load!

Accessing the Docker Socket

Add a hostPath volume to the worker Deployment file and mount the volume to the worker container. The last few lines of the worker Deployment file now look like this:

        volumeMounts:
        - name: nfs
          mountPath: "/mountpath"
        - name: dockersock
          mountPath: "/var/run/docker.sock"
      volumes:
      - name: nfs
        persistentVolumeClaim:
          claimName: nfs
      - name: dockersock
        hostPath:
          path: /var/run/docker.sock

Note that the NFS mount from the previous post is there too.

Testing — Round 1

The test here is going to be very Coursemology-specific, so just follow along with the logic; you might have to do something totally different for your own application.

$ kubectl exec -it worker-pod /bin/sh
~/coursemology2 $ bundle exec rails console
irb(main):001:0> Docker.version
Excon::Error::Socket: Permission denied - connect(2) for /var/run/docker.sock (Errno::EACCES)
from (irb):1
# Let's check the permissions
~ $ ls -l /run
total 4
srw-rw---- 1 root 412 0 Mar 13 14:57 docker.sock

Ahh, this 412 is probably a group ID that does not exist in this container. We saw this before with the NFS permissions. We can verify this.

Get a shell onto a node and list the groups:

$ gcloud compute ssh node-hostname
...
user@node-hostname ~ $ cat /etc/group | grep docker
docker:!:412:user,gke-something

412 is the group ID of the docker group on the node.
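
This can also be double-checked from inside the worker container (output omitted): the worker’s user is not in group 412, and nothing in the container’s /etc/group maps to that GID.

$ kubectl exec -it worker-pod /bin/sh
~ $ id                    # the worker's user is not in group 412
~ $ grep 412 /etc/group   # no group in the container has GID 412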

Allowing the Container to talk to Docker

We can use the pod security context to set the fsGroup: add a securityContext key to the pod spec and specify the fsGroup there. Here is a partial listing of the worker’s Deployment definition:

apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: cm-worker
spec:
  replicas: 1
  template:
    metadata:
      name: cm-worker
      labels:
        app: worker
    spec:
      securityContext:
        fsGroup: 412 # Group ID of docker group on k8s nodes.
      containers:
      - name: sidekiq
        image: gcr.io/user/coursemology
        command: ["bundle"]
        args: ["exec", "sidekiq"]
        imagePullPolicy: Always
        env:
        - name: RAILS_ENV
          value: 'production'
        - name: DB_USER
          valueFrom:
            secretKeyRef:
              name: shhh-my-secrets
              key: DB_USER
        - name: DB_PASS
          valueFrom:
            secretKeyRef:
              name: shhh-my-secrets
              key: DB_PASS
        envFrom:
        - configMapRef:
            name: not-the-secrets
        volumeMounts:
        - name: nfs
          mountPath: "/mountpath"
        - name: dockersock
          mountPath: "/var/run/docker.sock"
      volumes:
      - name: nfs
        persistentVolumeClaim:
          claimName: nfs
      - name: dockersock
        hostPath:
          path: /var/run/docker.sock
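
Apply the updated manifest and wait for the new pod to be rolled out (the file name here is just whatever the Deployment is saved as):

$ kubectl apply -f worker-deployment.yaml
$ kubectl rollout status deployment/cm-worker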

Testing — Round 2

On the worker:

irb(main):001:0> Docker.version
=> {"Version"=>"17.03.2-ce", ...}
# Pull an image
irb(main):002:0> image = Docker::Image.create('fromImage' => 'ubuntu:14.04')

On the node, after pulling the image in the worker:

user@node-hostname ~ $ docker images
REPOSITORY          TAG       IMAGE ID       CREATED      SIZE
ubuntu              14.04     a35e70164dfb   6 days ago   222 MB

Pulling Evaluator Images

Coursemology will pull the Docker images for evaluation if they do not exist on the host. However, this can take a long time, as the images are bloated with commonly used libraries for teaching programming. In the VM deployment, all the images are pulled manually, and when a new version is released they are refreshed manually too. With Kubernetes, we would like all the images to be available to the worker without having to wait for the first evaluation.

Perhaps a Kubernetes CronJob can be used to keep the images on the nodes up to date. The job only needs to run a no-op command in each evaluator image; scheduling it is enough to force the image to be pulled onto the node.

apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: evaluator-image-pull
spec:
  schedule: "*/1 * * * *"
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: python-36-evaluator
            image: coursemology/evaluator-image-python:3.6
            command:
            - /bin/sh
            args:
            - -c
            - date; echo Hello from the Kubernetes cluster
          restartPolicy: OnFailure
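
Checking which node each job pod lands on is a matter of listing the pods with the node column included:

$ kubectl get pods -o wide   # the NODE column shows where each job pod ran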

Unfortunately, there is no control over where the job gets scheduled. It might always end up on the same node, while the worker pods end up on different nodes. Over time, there is no way to guarantee that all the pods will get updated Docker images for programming evaluation.

A DaemonSet runs pods on all nodes in a cluster! Perhaps that’s the ticket! Closer inspection shows that DaemonSets are meant for long-running processes, not short-lived commands like the ones in the CronJob. What is needed here is something that combines both properties.

Enter the CronJob DaemonSet! But there is no such feature, and the issue requesting it has been open for some time. There are some suggestions in the comments though, and one of them mentioned init containers. Init containers run in a pod before the app containers are started. This is actually perfect for the needs here: while not every node will get a copy of the Docker images, it is sufficient that the nodes where the worker pods end up have them.

Adding Init Containers to the Worker Pod

Add initContainers to the pod spec. Here is a partial resource definition:

apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: worker
spec:
  replicas: 1
  template:
    metadata:
      name: worker
      labels:
        app: worker
    spec:
      initContainers:
      - name: init-evaluator-python
        image: docker
        command: ['sh', '-c', 'docker pull coursemology/evaluator-image-python:3.6; docker pull coursemology/evaluator-image-python:3.5; docker pull coursemology/evaluator-image-python:3.4; docker pull coursemology/evaluator-image-python:2.7']
        volumeMounts:
        - name: dockersock
          mountPath: "/var/run/docker.sock"
      - name: init-evaluator-cpp
        image: docker
        command: ['sh', '-c', 'docker pull coursemology/evaluator-image-c_cpp']
        volumeMounts:
        - name: dockersock
          mountPath: "/var/run/docker.sock"
      - name: init-evaluator-java
        image: docker
        command: ['sh', '-c', 'docker pull coursemology/evaluator-image-java']
        volumeMounts:
        - name: dockersock
          mountPath: "/var/run/docker.sock"

Each init container mounts the Docker socket file so it can pull the images onto the host node. The mount and the security context have already been defined earlier in this post.

The docker image is used for the init containers as it contains the docker binary. Mounting the host node’s /usr/bin/docker into the init containers does not work: it fails with a “docker not found” error even though the binary is right there, presumably because the binary’s dynamic libraries are not available inside the container.

Now when the worker pod comes up, the init containers will run first:

$ kubectl get pods
NAME         READY     STATUS     RESTARTS   AGE
worker-pod   0/1       Init:0/3   0          24s
...
$ kubectl get pods
NAME         READY     STATUS     RESTARTS   AGE
worker-pod   0/1       Init:2/3   0          41s
...
$ kubectl get pods
NAME         READY     STATUS    RESTARTS   AGE
worker-pod   1/1       Running   0          59s
...
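
If any of them fails, the logs of a specific init container can be checked by name (names as defined in the manifest above):

$ kubectl logs worker-pod -c init-evaluator-python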

On the node, running docker images shows that the images have been pulled:

user@gke-node ~ $ docker images
REPOSITORY                            TAG       IMAGE ID       CREATED       SIZE
coursemology/evaluator-image-python   3.6       844650f7ea25   3 weeks ago   1.1 GB
coursemology/evaluator-image-python   2.7       02931e5199d7   3 weeks ago   407 MB
coursemology/evaluator-image-python   3.4       8f4c7623d89e   3 weeks ago   430 MB
coursemology/evaluator-image-python   3.5       eb97bcdfc1a8   3 weeks ago   541 MB
coursemology/evaluator-image-c_cpp    latest    4542d65d001a   3 weeks ago   505 MB
docker                                latest    cc2d9a7e463b   4 weeks ago   133 MB
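
Since init containers run every time the pod starts, refreshing the images later is just a matter of recreating the worker pod; the Deployment brings it back up and the init containers pull the latest images again:

$ kubectl delete pod worker-pod   # the Deployment recreates the pod, re-running the init containers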

Programming Evaluation Testing

Create a programming question in Coursemology. The evaluator must be running correctly for it to succeed.

Just to be sure, start the assessment and run some code:

[Screenshot: successful evaluation]
