Building a Kafka Cluster in a Kubernetes Cluster
Carl Sagan said, "If you wish to make an apple pie from scratch, you must first invent the universe."
Luckily the universe has already been invented, but we still need to build our Kubernetes cluster to host our Kafka cluster.
For this POC I chose GCP. They’ve done some good work in minimising the UI. Plus the networking works seamlessly behind the scenes.
Install gcloud and kubectl and run the following command to create the cluster:
gcloud beta container \
clusters create "[CLUSTER_NAME]" \
--project "[PROJECT_NAME]" \
--zone "[ZONE]" \
--machine-type "e2-medium" \
--num-nodes 9 \
--disk-size "100"
After about 5 minutes we should have a cluster.
Now we can add Zookeeper and Kafka.
Use the installation guide from this GitHub link.
As we have no Producers or Consumers configured yet, Kafka will be in a waiting state. No leader has been elected yet. This all happens when topics and partitions are created. More on that later.
Things to note in this cluster
Zookeeper is a StatefulSet of three replicas. We need consistently named pods in the cluster. If we used a Deployment, every time a new pod was instantiated, it would be randomly named with a deployment hash suffix.
With StatefulSets you get consistent pod names. In this case zookeeper-0, zookeeper-1 and zookeeper-2
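A minimal sketch of what that StatefulSet declaration looks like (the names and image tag here are illustrative assumptions, not the exact manifest from the linked repo):

```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: zookeeper
spec:
  serviceName: zookeeper   # headless Service giving each pod a stable DNS name
  replicas: 3              # yields pods zookeeper-0, zookeeper-1, zookeeper-2
  selector:
    matchLabels:
      app: zookeeper
  template:
    metadata:
      labels:
        app: zookeeper
    spec:
      containers:
        - name: zookeeper
          image: zookeeper:3.5   # illustrative image tag
```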
Likewise with the Kafka brokers.
The Kafka broker StatefulSet starts Kafka.
Lines 2–4 define the Dockerfile; the source can be viewed here.
I chose to roll my own Dockerfile to allow more control. There are many Kafka Docker images on Docker Hub, but for this project I needed to be able to start Kafka with overridden options.
Lines 5–8 are the resources allocated per pod. I used Kenneth Owens' excellent templates here to configure this POC.
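As an illustration, a per-pod resource stanza has this shape (the values below are placeholders, not the ones from the templates):

```yaml
resources:
  requests:
    memory: "1Gi"    # scheduler reserves this much per pod
    cpu: "500m"
  limits:
    memory: "1.5Gi"  # pod is killed if it exceeds this
    cpu: "1"
```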
Lines 12–20 are environment variables. Line 13 sets the internal IP of the newly created pod. Lines 17–20 are standard java options.
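Exposing a pod's own IP as an env var is done with the Kubernetes Downward API; a sketch (the variable name KAFKA_POD_IP comes from this manifest, the rest is the standard fieldRef form):

```yaml
env:
  - name: KAFKA_POD_IP
    valueFrom:
      fieldRef:
        fieldPath: status.podIP   # the pod's internal IP, set at start-up
```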
Lines 21 onward are Kafka start up options.
Line 24 starts Kafka with the default server properties file; we then override these properties in the following lines with a mix of Kubernetes and custom env vars.
Line 25 sets the broker id from the pod hostname, e.g. pod kafka-broker-1 is assigned broker number 1 (StatefulSets give us this naming convention as discussed above).
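The ordinal can be stripped from the pod hostname with plain shell parameter expansion; a sketch with a hard-coded hostname for illustration:

```shell
#!/bin/sh
# In the pod this would be HOSTNAME="$(hostname)"; hard-coded here for illustration.
HOSTNAME="kafka-broker-1"

# Delete everything up to and including the last '-' to get the ordinal.
KAFKA_BROKER_ID="${HOSTNAME##*-}"

echo "$KAFKA_BROKER_ID"   # prints 1
```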
Lines 26–29 set the listeners. The brokers need to know how to communicate with each other internally and with external clients. This is difficult to get right. These settings work and will save a lot of pain. Line 26 tells Kafka to use the default 0.0.0.0 which means listen on all interfaces. Line 27 uses the pod ip env var, KAFKA_POD_IP for the advertised listener. Confluent have a good article on listeners here.
Line 28 tells Kafka to use the listener named 'INSIDE' as the inter-broker listener; the name must match one of the listeners defined on the previous lines, which is how Kafka knows this is the internal listener.
Line 29: as this POC is an internal Kafka ensemble we can use PLAINTEXT for the security protocol, which means no encryption or authentication. This will be changed to SSL for any production cluster.
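Put together, the listener overrides amount to strings like these (the port and the example pod IP are assumptions based on common setups, not the exact values from the manifest):

```shell
#!/bin/sh
# Hypothetical pod IP; in the cluster this comes from the KAFKA_POD_IP env var.
KAFKA_POD_IP="10.8.0.5"

LISTENERS="INSIDE://0.0.0.0:9092"                      # listen on all interfaces
ADVERTISED_LISTENERS="INSIDE://${KAFKA_POD_IP}:9092"   # address other brokers/clients dial
INTER_BROKER_LISTENER_NAME="INSIDE"                    # must name a listener defined above
LISTENER_SECURITY_PROTOCOL_MAP="INSIDE:PLAINTEXT"      # no encryption/auth in this POC

echo "$ADVERTISED_LISTENERS"   # prints INSIDE://10.8.0.5:9092
```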
Line 30 uses Kubernetes env vars to point to our zookeeper pods. The zookeeper deployment uses a Kubernetes service for gateway access. This is all we need to configure the broker, and it gives us the option of *n* zookeepers; we're just using 3 in this POC.
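Kubernetes injects `<SERVICE>_SERVICE_HOST` and `<SERVICE>_SERVICE_PORT` env vars for every Service, so a Service named `zookeeper` could be wired up like this (the IP below is a made-up example):

```shell
#!/bin/sh
# Hypothetical values; Kubernetes sets these automatically for a Service named "zookeeper".
ZOOKEEPER_SERVICE_HOST="10.3.240.10"
ZOOKEEPER_SERVICE_PORT="2181"

# zookeeper.connect accepts host:port (comma-separated for multiple servers).
ZK_CONNECT="${ZOOKEEPER_SERVICE_HOST}:${ZOOKEEPER_SERVICE_PORT}"

echo "$ZK_CONNECT"   # prints 10.3.240.10:2181
```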
After running the steps on the GitHub page we will have a successfully running Kafka/Zookeeper ensemble within a Kubernetes cluster. The ensemble will be in a waiting state. Part 3 of these articles will discuss and demonstrate how to use the cluster, with Casper node events.