Deploying Redis with Persistence on Google Kubernetes Engine

LH Fong
Published in ESTL Lab Notes · 11 min read · Mar 5, 2018

Google Kubernetes Engine (GKE) makes it very easy to create a Kubernetes cluster. It’s a few clicks away through Google Cloud Platform’s (GCP) web console, which even provides you with the corresponding gcloud command so you can just copy it into a script for next time.

After that comes the challenge of actually deploying stuff onto Kubernetes. The documentation isn’t too bad, but there are lots of concepts to pick up at once, and it’s not always immediately clear how they all come together. Some of the names have changed too; for example, ReplicationControllers have been superseded by Deployments. Sometimes different pages of the documentation say conflicting things. A lot of step-by-step tutorials use very simple toy applications, usually stateless ones.

Perhaps I have been unfortunate with the tutorials I’ve come across. They usually go something like this:

  1. Deploy this stateless app that returns “Hello World”.
  2. Scale up the number of pods. Look how easy it is to bring up lots of replicas and increase capacity!
  3. Scale them back down. That was easy too!
  4. Deploy v2 that returns “Hello World v2”. Rolling updates! No downtime! No fuss! Rainbows and unicorns!

For me, these tutorials were lacking because my app does have state to worry about. Tutorials that cover stateful applications usually don’t worry about how to keep the state beyond the lifecycle of the pod. For demonstration purposes, they declare a volume mount onto the host, so if the pod goes down and gets rescheduled onto another node, the data is toast. Not a problem in the tutorial, but unacceptable in real life.

This post will document how I deployed Redis with persistence for my own app. It’ll cover my exploration process, including what I found confusing, and links to the documentation which I found helpful.

Future posts will come as I get the rest of the app onto Kubernetes.

Why is Redis persistence so important?

My colleagues who already have their apps on Kubernetes don’t really need persistence for their Redis instance. They use it for sessions and caching and if it’s gone, it’s fine. They deploy Redis as a StatefulSet with one replica and only allow it to run on one node.

For Coursemology, Redis is used to store the ActiveJob queue. Rails makes this so easy for developers. Here’s an example from Coursemology:

# Setup closing reminders
closing_reminder_job_class
  .set(wait_until: end_at - 1.day)
  .perform_later(self, closing_reminder_token)

What this hides from developers is that the job has to be kept somewhere. In our case, it’s held in Redis and can be accessed through Sidekiq’s Scheduled Queue API. If the Redis data is lost, the scheduled job is lost and students will no longer be reminded that their assignments are due. Then my boss will send me an email asking what happened to the reminder emails. Not good.
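
If you’re curious, you can peek at these pending jobs directly in Redis. A minimal sketch, assuming Sidekiq’s default key layout, where scheduled jobs live in a sorted set named schedule, scored by their run-at timestamps:

127.0.0.1:6379> ZRANGE schedule 0 -1 WITHSCORES

Each entry is a serialised job payload, and its score is when Sidekiq should enqueue it. Lose the Redis data, and this set goes with it.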

Redis does not have built-in SSL support. Coursemology’s current deployment uses SSH tunnels to secure communications between the app and the Redis server, which is deployed on a separate VM. This could probably be accomplished with a sidecar container in the same pod maintaining the tunnel, but I wanted to reduce the number of pets.

Redis also holds session information so users can maintain their login session even if the load balancer assigns their current request to a different app server.

Prerequisites

You’ll need a Kubernetes cluster. The tutorials from the official documentation always have a section with instructions on how to get access to one to play with.

kubectl should also be installed and configured to talk to your cluster.

The examples below show what happens when Coursemology communicates with Redis, although some elements have been removed from the various listings for brevity. You can use your own app which needs Redis, or just use redis-cli to get and set keys for testing.
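
Before deploying anything, it’s worth confirming that kubectl can actually reach your cluster. These are all standard kubectl commands:

$ kubectl config current-context
$ kubectl cluster-info
$ kubectl get nodes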

Understanding YAML

Some lines in the sample YAML files have dashes, while others don’t. Some files seem to have more than one object inside because there’s more than one declaration of apiVersion. What’s that --- separating the sections?

I found answers to those questions in this blog post from Mirantis. If you’re new to YAML, it’s a great guide with specific examples for Kubernetes.

Deploying Redis

Let’s start with getting a Redis container running on the cluster. To ensure that everything is reproducible, all the configuration has been done with YAML files and sent to the cluster with the kubectl create -f somefile.yml command, even when kubectl could be used directly to achieve an objective.

I wrote this redis.yml file by referencing my colleague’s version, and also with the help of the StatefulSet tutorial. In particular, I was using a newer version of Kubernetes, so the pod selectors are compulsory. The documentation discusses the necessity of pod selectors outside of the StatefulSet tutorial.

apiVersion: v1
kind: Service
metadata:
  name: redis
spec:
  ports:
  - port: 6379
    name: redis
  clusterIP: None
  selector:
    app: redis
---
apiVersion: apps/v1beta2
kind: StatefulSet
metadata:
  name: redis
spec:
  selector:
    matchLabels:
      app: redis # has to match .spec.template.metadata.labels
  serviceName: redis
  replicas: 1
  template:
    metadata:
      labels:
        app: redis # has to match .spec.selector.matchLabels
    spec:
      containers:
      - name: redis
        image: redis:3.2-alpine
        imagePullPolicy: Always
        args: ["--requirepass", "$(REDIS_PASS)"]
        ports:
        - containerPort: 6379
          name: redis
        env:
        - name: REDIS_PASS
          valueFrom:
            secretKeyRef:
              name: env-secrets
              key: REDIS_PASS

Copy the YAML above into a file named redis.yml, then run kubectl create -f redis.yml to create the Service and the StatefulSet on the cluster.

Run kubectl get statefulsets, kubectl get services and kubectl get pods to check the status of the Redis service.

When everything is up and running, you should see output similar to the one shown below.

$ kubectl get statefulsets
NAME      DESIRED   CURRENT   AGE
redis     1         1         2m
$ kubectl get services
NAME         TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)    AGE
kubernetes   ClusterIP   10.x.x.x     <none>        443/TCP    5m
redis        ClusterIP   None         <none>        6379/TCP   4m
$ kubectl get pods
NAME      READY     STATUS    RESTARTS   AGE
redis-0   1/1       Running   0          5m

Let’s get a shell to the container and try it out.

$ kubectl exec -it redis-0 /bin/sh
/data # redis-cli
127.0.0.1:6379> AUTH <yourpassword>
OK
127.0.0.1:6379> PING
PONG

One question you might have now is which apiVersion to use for each object. I just followed the examples, but there’s a reference here with some explanation of the changes.
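
You can also ask your own cluster directly which API versions it serves:

$ kubectl api-versions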

Connecting the App

The location of the Redis host should be set by an environment variable in the app. In Kubernetes, environment variables can be set in config maps and passed to pods.

The relevant line should look something like this:

REDIS_HOST: 'redis.default.svc.cluster.local'
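
For completeness, here’s a minimal sketch of the config map that could hold this variable; the name app-config is illustrative, not from Coursemology’s actual setup:

apiVersion: v1
kind: ConfigMap
metadata:
  name: app-config # illustrative name
data:
  REDIS_HOST: 'redis.default.svc.cluster.local'

The app’s container spec can then reference the key with a configMapKeyRef under env, the same way REDIS_PASS is pulled from a Secret in redis.yml above.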

This was where I got confused by the documentation. According to the explanation of Services, environment variables and an optional DNS add-on are the two main ways of finding services. The environment variables worked, but the documentation warns that there is an ordering requirement: the Service must be created before the pods that reference it, or the variables won’t be populated. That’s not ideal, so I hoped to be able to use DNS instead.

DNS looked like it wasn’t there by default, and installing add-ons looked rather tricky. However, at the bottom of the paragraph about ExternalName services, there’s a link to DNS pods. This was where I found the correct form for the service address.

It turns out that GKE does have DNS support, as mentioned on a GCP Solutions page. So while all the necessary documentation does exist and individual pages are fairly well written, it does take some repeated reading or prior knowledge of the concepts to figure out what parts are applicable.

According to the WordPress tutorial, which I leaned on quite heavily for setting up persistence, I should have been able to just use the name redis. That’s also what the DNS section of the Services documentation suggests. However, I could not get it to work without the full hostname. Having just re-read the documentation, I finally noticed that the name lookup returns the cluster IP for the service. Since my Redis service definition has no cluster IP (clusterIP: None makes it a headless service), that could be why using redis as the Redis host address did not work for me.
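
One way to see what the cluster DNS actually returns is to run a throwaway pod and do the lookup yourself. A quick sketch, assuming the busybox image is acceptable in your cluster:

$ kubectl run -it --rm dns-test --image=busybox --restart=Never -- nslookup redis.default.svc.cluster.local

For a headless service like this one, the lookup should return the individual pod IPs rather than a single cluster IP.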

Testing — Round 1

No persistence has been configured yet, but Redis is running and the app can connect to it. I can check that jobs can be scheduled and that they are added to the queue.

Scheduled Job in Sidekiq Dashboard

Let’s try deleting the pod by running kubectl delete pod redis-0. The StatefulSet will notice that there are now no Redis instances running and bring up the pod again. As the StatefulSet documentation notes, pods in a StatefulSet have an ordinal index and a stable network identity. Another pod named redis-0 will come up to replace the deleted one.

$ kubectl get pods
NAME      READY     STATUS        RESTARTS   AGE
redis-0   0/1       Terminating   0          1m
$ kubectl get pods
NAME      READY     STATUS              RESTARTS   AGE
redis-0   0/1       ContainerCreating   0          3s
$ kubectl get pods
NAME      READY     STATUS    RESTARTS   AGE
redis-0   1/1       Running   0          7s
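
Instead of re-running kubectl get pods, you can also watch the transition happen live:

$ kubectl get pods -w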

After it came up, I reloaded the dashboard and noted that the job had disappeared.

Configuring Persistence in Redis

All the documentation for Redis persistence is available in one convenient place. There’s a point-in-time dump file configured with the save option, and an Append Only File. More details and a discussion of the trade-offs can be found in Redis’ documentation.

First, let’s figure out what save options have already been configured. Get a shell to Redis, then run config get <option> to see what’s going on. The relevant options for Redis persistence are save and appendonly:

$ kubectl exec -it redis-0 /bin/sh
/data # redis-cli
127.0.0.1:6379> AUTH <yourpassword>
OK
127.0.0.1:6379> config get save
1) "save"
2) ""
127.0.0.1:6379> config get appendonly
1) "appendonly"
2) "no"

Note that this configuration is not the same as Redis’ default configuration, which does specify some values for save. This is because we specified --requirepass to Redis. The Dockerfile for Redis’ official image points to an issue comment; it seems like specifying any argument to redis-server makes it assume you will specify everything. Thus, specifying a password has removed the configuration for persistence.

To specify persistence options, add them to Redis’ args in the redis.yml file. You can set multiple save points by specifying the option multiple times. For example:

args: ["--requirepass", "$(REDIS_PASS)", "--appendonly", "yes", "--save", "900", "1", "--save", "30", "2"]

This will create a snapshot after 900 seconds if at least 1 key changed, and after 30 seconds if at least 2 keys changed. It will also enable the Append Only File (AOF). For testing purposes, I have set a low save configuration of 30 2 so the dump file gets created with just two changes.

Run kubectl replace -f redis.yml to update the StatefulSet and restart the Redis container. Now when you check Redis’ configuration, you should see:

$ kubectl exec -it redis-0 /bin/sh
/data # redis-cli
127.0.0.1:6379> AUTH <yourpassword>
OK
127.0.0.1:6379> config get save
1) "save"
2) "900 1 30 2"
127.0.0.1:6379> config get appendonly
1) "appendonly"
2) "yes"

Testing — Round 2

Let’s view the files on the Redis container to see what’s going on.

When the container first comes up, a 0 byte appendonly.aof file is created:

$ kubectl exec -it redis-0 /bin/sh
/data # ls -l
total 0
-rw-r--r-- 1 redis redis 0 Mar 5 05:27 appendonly.aof

After visiting the login page (which creates some session related keys):

/data # ls -l
total 4
-rw-r--r-- 1 redis redis 815 Mar 5 05:28 appendonly.aof

The AOF file grows a little.

Wait a while (for the checkpoint to run):

/data # ls -l
total 8
-rw-r--r-- 1 redis redis 815 Mar 5 05:28 appendonly.aof
-rw-r--r-- 1 redis redis 261 Mar 5 05:28 dump.rdb

The dump file is created.
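
If you don’t want to wait for a save point to trigger, you can force a snapshot and confirm when the last one ran; the timestamp below is illustrative:

127.0.0.1:6379> BGSAVE
Background saving started
127.0.0.1:6379> LASTSAVE
(integer) 1520227683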

After logging in:

/data # ls -l
total 8
-rw-r--r-- 1 redis redis 1489 Mar 5 05:31 appendonly.aof
-rw-r--r-- 1 redis redis 290 Mar 5 05:31 dump.rdb

The file sizes change.

Now delete the pod:

$ kubectl delete pod redis-0
pod "redis-0" deleted

The StatefulSet recreates another pod with the same name. Let’s look inside again:

$ kubectl exec -it redis-0 /bin/sh
/data # ls -l
total 0
-rw-r--r-- 1 redis redis 0 Mar 5 05:32 appendonly.aof

The data is gone. This is expected, as no persistent storage was configured. I’ve also lost my session and have to log in again, and the job has disappeared from Sidekiq’s dashboard.

Now let’s make the data persistent!

Creating GCP Persistent Disks

The persistent disk tutorial gives a nice example of how to create and mount a persistent disk.

You can create the disk through the console. When you’ve filled in the details, click on the command line link at the bottom to get a helpful popup with the gcloud command to run.

Creating a persistent disk in GCE

On the web console, the minimum disk size is 10 GB. However, this minimum is not enforced on the command line, so you can create a 1 GB disk for testing purposes.

My Redis instance does not need much storage, so I can run the following command to get a disk. Fill in your own values for --project and --zone:

gcloud compute --project=PROJECT disks create redis-disk --zone=ZONE --type=pd-ssd --size=1GB

If you used the command above to create the disk, the following message will appear when the disk has been provisioned.

New disks are unformatted. You must format and mount a disk before it
can be used. You can find instructions on how to do this at:
https://cloud.google.com/compute/docs/disks/add-persistent-disk#formatting

There is no need to do this. GCP seems to automagically handle disk formatting when the disk is mounted to the container.
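
If you want to double-check the disk before wiring it up, gcloud can describe it, with the same PROJECT and ZONE placeholders as before:

gcloud compute --project=PROJECT disks describe redis-disk --zone=ZONE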

Using the Persistent Disk

Edit redis.yml to add the disk to the StatefulSet under the volumes key. Add the volumeMounts key to the Redis container so it will use the persistent disk.

The last part of redis.yml now looks like this:

    spec:
      containers:
      - name: redis
        image: redis:3.2-alpine
        imagePullPolicy: Always
        args: ["--requirepass", "$(REDIS_PASS)", "--appendonly", "yes", "--save", "900", "1", "--save", "30", "2"]
        ports:
        - containerPort: 6379
          name: redis
        env:
        - name: REDIS_PASS
          valueFrom:
            secretKeyRef:
              name: env-secrets
              key: REDIS_PASS
        volumeMounts:
        - name: redis-volume
          mountPath: /data
      volumes:
      - name: redis-volume
        gcePersistentDisk:
          pdName: redis-disk
          fsType: ext4

Replace the Redis setup with kubectl replace -f redis.yml. This takes a bit longer now, as the disk has to be attached to the node first.
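
To confirm the volume actually attached, describe the pod and look for redis-volume under the Volumes section:

$ kubectl describe pod redis-0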

Create a shell on the Redis container. There should be an additional lost+found folder in the /data directory.

$ kubectl exec -it redis-0 /bin/sh
/data # ls -l
total 16
-rw-r--r-- 1 redis redis 0 Mar 5 06:16 appendonly.aof
drwx------ 2 redis root 16384 Mar 5 06:16 lost+found

Testing — Round 3

Repeat the tests for Round 2. Check the file sizes in the /data directory of the Redis container. Try logging in and creating jobs through the app.

Delete the pod with the command kubectl delete pod redis-0. In the previous tests, the pod came up again automatically, but the session and job data were gone.

This time, when the pod comes up again, refresh the app and the Sidekiq dashboard. I’m still logged in and the scheduled jobs are still there.
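
As a final spot check from outside the app, the files in /data should also have survived the pod deletion this time, unlike in Round 2:

$ kubectl exec redis-0 -- ls -l /data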

Success!!

Conclusion

Google Cloud Platform and Google Kubernetes Engine make it very easy to spin up a Kubernetes cluster. Once you get the hang of the various API objects available and how to use them, it is very satisfying to put together a resilient, self-healing system.

However, getting started can be a bit confusing. I hope this post has helped to clear things up a little.
