I am currently tasked with investigating key-value stores for a host of services; they should be relatively fault tolerant and performant. These KV stores will be running in a kubernetes environment and will be used mostly for storing configuration data.
There are several products I'm evaluating - etcd, Infinispan, and redis (if you have other recommendations please drop them in the comments below). Part of this project includes adding proof-of-concept support just to show that things work.
Work on the POC needs to talk to these data stores. Since my Kubernetes setup runs inside a Linux virtual machine, the data stores need to be externally accessible during development, for easier debugging and the like on my host platform (Win10).
This was relatively easy with etcd; I spun up a few pods on my microk8s machine, added an ingress, and my project could talk to it. The Kubernetes ingress went to a load balancer, which balanced requests across the various etcd pods. Easy.
Infinispan was a bit more of a challenge. The Infinispan Hot Rod library for Java wants, by default, to connect directly to each of the nodes. This lets it be aware of the cluster, talk to nodes directly, know when nodes join or drop, etc. Very nice technology, if overkill for my needs, but that's fine.
A simple ingress alone doesn't work here, though. Since the client, by default, wants to talk to each of the nodes directly, going through a load-balanced ingress fails. This behavior can be changed by setting ClientIntelligence to BASIC, however, so that's what I did. Now the client no longer tried to connect to every node, and it didn't care where its calls landed.
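For reference, BASIC intelligence can be set in the Hot Rod client's properties file as well as programmatically. A minimal sketch, assuming the standard property names from the Infinispan client configuration; the hostname and port here are placeholders:

```
# hotrod-client.properties
# Point at the single load-balanced ingress (placeholder address)
infinispan.client.hotrod.server_list = my-ingress.example:11222
# BASIC: don't fetch cluster topology or connect to individual nodes
infinispan.client.hotrod.client_intelligence = BASIC
```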
Then came redis. Talking to a single redis node worked much the same way as etcd - connect to one thing and it's happy - but I need a cluster for fault tolerance and handling larger workloads.
Setting up a redis cluster in kubernetes is pretty easy - although this tutorial contains some Rancher-specific information, the basics are all there and it works just fine for a generic Kubernetes install.
The pods spun up, I got them to talk to each other, I learned a few Redis commands, and I was happy. But Redis in clustered mode has demands similar to Infinispan's: when a client connects, it queries the cluster for information on all of the nodes, then attempts to make a connection to each one.
This is particularly important for Redis because of how it is architected. Keys are not stored everywhere as they are in etcd and Infinispan; they are split into buckets (hash slots). A Redis master and slave group is responsible for a particular range of slots. When you perform a read or a write, that command is supposed to go to the instance responsible for that key's slot.
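To make the bucket idea concrete: Redis Cluster hashes each key with CRC16 (the XMODEM variant) modulo 16384 to pick its slot. This is my own small sketch of that mapping, not code from any Redis client library:

```java
import java.nio.charset.StandardCharsets;

// Sketch of Redis Cluster's key -> hash slot mapping: CRC16/XMODEM mod 16384.
public class HashSlot {
    // CRC16 with polynomial 0x1021, init 0, no reflection (the variant Redis uses)
    static int crc16(byte[] data) {
        int crc = 0;
        for (byte b : data) {
            crc ^= (b & 0xFF) << 8;
            for (int i = 0; i < 8; i++) {
                crc = ((crc & 0x8000) != 0) ? ((crc << 1) ^ 0x1021) : (crc << 1);
                crc &= 0xFFFF;
            }
        }
        return crc;
    }

    static int slot(String key) {
        return crc16(key.getBytes(StandardCharsets.UTF_8)) % 16384;
    }

    public static void main(String[] args) {
        // Same answer you'd get from CLUSTER KEYSLOT on a real node
        System.out.println("foo -> slot " + slot("foo"));
    }
}
```

(The real algorithm also hashes only the `{hash tag}` portion of a key when one is present; I've skipped that here.)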
I didn't find an easy work-around as I had with Infinispan - I was unable to find the redis equivalent to the Infinispan Basic Client Intelligence. My application, which was running outside of the Kubernetes cluster for development, needed to be able to talk directly to each of the redis pods.
This meant a few things:
- Each redis pod needed to be exposed to the outside.
- Since every pod would be exposed on the same IP address, each needed its own pair of ports (one for clients, one for inter-pod gossip).
This seemed like a simple problem at first. All I needed to do was expose and map the ports on each pod. Then it would work, right? Something like this:
```shell
kubectl expose pod redis-cluster-0 --name=redis-cluster-ingress-0 --port 50000 --target-port 6379 --external-ip=<EXTERNAL-IP>
kubectl expose pod redis-cluster-0 --name=redis-cluster-gossip-0 --port 50001 --target-port 16379 --external-ip=<EXTERNAL-IP>
```
I was forgetting something important, however: the Redis cluster was set up to use internal IP addresses and the standard ports, and that internal information is the discovery information sent to any client that wants to connect.
So, while my Java client could connect to an initial host, the cluster information it then received contained the internal IP addresses and ports. My service wasn't being given `<EXTERNAL-IP>:50000`; it was being given the pod's internal address with the default port 6379.
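The announced address doesn't only show up in the initial topology: whenever a command lands on a node that doesn't own the key's slot, the reply is a `MOVED <slot> <host>:<port>` redirect built from those same announced values, so the client keeps chasing unreachable internal addresses. A hypothetical parser, just to show the shape of the redirect (class and method names are mine):

```java
// Sketch: pulling the target address out of a Redis Cluster MOVED error.
// The error format is "MOVED <slot> <host>:<port>".
public class MovedRedirect {
    final int slot;
    final String host;
    final int port;

    MovedRedirect(int slot, String host, int port) {
        this.slot = slot;
        this.host = host;
        this.port = port;
    }

    static MovedRedirect parse(String error) {
        String[] parts = error.split(" ");        // ["MOVED", "<slot>", "<host>:<port>"]
        int colon = parts[2].lastIndexOf(':');    // split host from port
        return new MovedRedirect(
                Integer.parseInt(parts[1]),
                parts[2].substring(0, colon),
                Integer.parseInt(parts[2].substring(colon + 1)));
    }

    public static void main(String[] args) {
        MovedRedirect r = parse("MOVED 3999 10.1.77.12:6379");
        System.out.println(r.host + ":" + r.port + " owns slot " + r.slot);
    }
}
```

If that internal `10.1.77.12` is only routable inside the cluster, the redirect is a dead end for an external client.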
Another thing I didn't know: by default, the gossip port for Redis is always 10000 higher than the client port (this can be configured). That's fine, but it was one more item that I missed.
"Ah," I thought, "the Kubernetes manifest allows me to set the pod's IP address. Surely that will fix my problem." Nope; not quite so easy. Even with this set, things didn't seem to work. After a bit of thinking, I decided on what I needed to do:
- The pods needed to have ports on that public IP address.
- The pods needed to be actually using those ports, not just have them mapped.
- The pods needed to report this IP address so that clients would attempt to connect to pods using the correct information.
- The ports needed to be predictable and not random, for convenience.
For development and POC it didn't really matter HOW this happened, it just needed to happen. In a production environment the service would be running inside of Kubernetes and I could drop these shenanigans, but that's not what I was doing. My situation, as mentioned above:
Redis: inside kubernetes on a virtual machine. Eclipse with my Java project: Not.
Not the best solution, but certainly A solution: Be specific with the ports, have redis provide external ip addresses during cluster info queries. Should be easy now that I understood the problem.
If you look at the article above from the Rancher folks, you'll see that there is an `update-node.sh` script that does a bit of magic, then runs `redis-server`.
Note that dev.to won't let me use double curly braces; it interprets them as Liquid variables. So, pretend that the square brackets are curly braces for the purposes of kubetpl snippets down below.
Using kubetpl to process kubernetes templates, I updated the configuration with:
```
[[ if .REDIS_EXTERNAL_ACCESS ]]
MYPORT="5000"$(echo `hostname` | sed s/redis-cluster-//)
exec redis-server /conf/redis.conf --port $MYPORT
[[ else ]]
exec "$@"
[[ end ]]
```
This sets the client port to 50000 plus the ordinal of the pod within the stateful set (by string concatenation, so it only holds for single-digit ordinals). Pod 0 would be 50000, pod 1 would be 50001, and so on. Since Redis puts the gossip port 10000 above the client port, gossip lands in the 60000 range.
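In Java terms, the port scheme works out to the following (class and method names here are mine, purely to illustrate the arithmetic):

```java
// Sketch of the port scheme: client port 5000<ordinal>, gossip port 10000 higher.
public class RedisPorts {
    // Ordinal comes from the StatefulSet pod name, e.g. "redis-cluster-3" -> 3
    static int ordinal(String podName) {
        return Integer.parseInt(podName.substring(podName.lastIndexOf('-') + 1));
    }

    static int clientPort(String podName) {
        return 50000 + ordinal(podName);    // only valid for single-digit ordinals
    }

    static int gossipPort(String podName) {
        return clientPort(podName) + 10000; // Redis default: gossip = client port + 10000
    }

    public static void main(String[] args) {
        System.out.println(clientPort("redis-cluster-3")); // 50003
        System.out.println(gossipPort("redis-cluster-3")); // 60003
    }
}
```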
Then I added this to the bottom of the redis.conf entry in the same manifest:

```
[[ if .REDIS_EXTERNAL_ACCESS ]]
cluster-announce-ip [[ .REDIS_EXTERNAL_ACCESS ]]
[[ end ]]
```
Now, when asked for cluster information, the pod would report the external IP address. Combined with the individually curated ports, this should be enough to allow everything to talk.
So, when spinning redis up with `REDIS_EXTERNAL_ACCESS` empty, the redis pods spin up as normal: pods get internal IP addresses and use the default ports. Run the command in the Rancher article, the cluster sets itself up, and you're ready to go. Redis is accessible only from inside the kubernetes cluster.
However, when setting `REDIS_EXTERNAL_ACCESS` to my microk8s external IP address:
- Each pod tells the cluster to use that IP address to talk to it.
- The ports will be 5000x and 6000x, with x matching the pod's stateful set identifier, making them unique.
Now I needed to expose both ports for each pod, as explained above. A bit of bash to set up all six nodes:
```shell
for n in 0 1 2 3 4 5; do
  kubectl expose pod redis-cluster-$n --name=redis-cluster-gossip-$n --port=6000$n --external-ip=<EXTERNAL-IP>
  kubectl expose pod redis-cluster-$n --name=redis-cluster-ingress-$n --port=5000$n --external-ip=<EXTERNAL-IP>
done
```
Once this is done each pod is externally accessible, tells the cluster to use the external IP address, and tells the cluster to use the correct ports. When starting up the cluster in redis we specify the external ip address and the client port for each pod:
```shell
kubectl exec -it redis-cluster-0 -- redis-cli -p 50000 --verbose --cluster create --cluster-replicas 1 \
  <EXTERNAL-IP>:50000 <EXTERNAL-IP>:50001 <EXTERNAL-IP>:50002 \
  <EXTERNAL-IP>:50003 <EXTERNAL-IP>:50004 <EXTERNAL-IP>:50005
```
A bit inefficient since all traffic is now going out of kubernetes, into the network interface, then back in again, but this fully exposes the cluster externally and I could use it for development:
```java
JedisPoolConfig jedisPoolConfig = new JedisPoolConfig();
jedisPoolConfig.setMaxTotal(100);
jedisPoolConfig.setMinIdle(25);
HostAndPort hostAndPort = new HostAndPort(host, port);
clusterPool = new JedisCluster(Collections.singleton(hostAndPort), jedisPoolConfig);
```
I hope this helps someone. I'm certain that there's a better, more elegant solution - and you probably know what it is. Drop it in the comments below and help out this Kubernetes / Redis newbie.