Microsoft Azure

Kafka on Kubernetes, the Strimzi way! (Part 1)

Abhishek Gupta ・ 8 min read

Kafka on Kubernetes (3 Part Series)

1) Kafka on Kubernetes, the Strimzi way! (Part 1)
2) Kafka on Kubernetes, the Strimzi way! (Part 2)
3) Kafka on Kubernetes, the Strimzi way! (Part 3)

Some of my previous blog posts (such as Kafka Connect on Kubernetes, the easy way!) demonstrate how to use Kafka Connect in a Kubernetes-native way. This is the first in a series of blog posts covering Apache Kafka on Kubernetes using the Strimzi Operator. In this post, we will start off with the simplest possible setup, i.e. a single-node Kafka (and Zookeeper) cluster, and learn:

  • Strimzi overview and setup
  • Kafka cluster installation
  • Kubernetes resources used/created behind the scenes
  • Testing the Kafka setup using clients within the Kubernetes cluster

The code is available on GitHub - https://github.com/abhirockzz/kafka-kubernetes-strimzi

What do I need to try this out?

kubectl - https://kubernetes.io/docs/tasks/tools/install-kubectl/

I will be using Azure Kubernetes Service (AKS) to demonstrate the concepts, but by and large it is independent of the Kubernetes provider (e.g. feel free to use a local setup such as minikube). If you want to use AKS, all you need is a Microsoft Azure account which you can get for FREE if you don't have one already.

Install Helm

I will be using Helm to install Strimzi. Here is the documentation to install Helm itself - https://helm.sh/docs/intro/install/

You can also use the YAML files directly to install Strimzi. Check out the quick start guide here - https://strimzi.io/docs/quickstart/latest/#proc-install-product-str

(optional) Set up Azure Kubernetes Service

Azure Kubernetes Service (AKS) makes it simple to deploy a managed Kubernetes cluster in Azure. It reduces the complexity and operational overhead of managing Kubernetes by offloading much of that responsibility to Azure. You can set up an AKS cluster using the Azure CLI or the Azure portal.

Once you set up the cluster, you can easily configure kubectl to point to it:

az aks get-credentials --resource-group <CLUSTER_RESOURCE_GROUP> --name <CLUSTER_NAME>

Wait, what is Strimzi?

From the Strimzi documentation:

Strimzi simplifies the process of running Apache Kafka in a Kubernetes cluster. Strimzi provides container images and Operators for running Kafka on Kubernetes. It is a part of the Cloud Native Computing Foundation as a Sandbox project (at the time of writing).

Strimzi Operators are fundamental to the project. These Operators are purpose-built with specialist operational knowledge to effectively manage Kafka. They simplify the process of:

  • Deploying and running Kafka clusters and components
  • Configuring and securing access to Kafka
  • Upgrading and managing Kafka
  • Managing topics and users

Here is a diagram showing a 10,000-foot overview of the Operator roles:

Install Strimzi

Installing Strimzi using Helm is pretty easy:

# add the Helm chart repo for Strimzi
helm repo add strimzi https://strimzi.io/charts/

# install it! (strimzi-kafka is the release name)
helm install strimzi-kafka strimzi/strimzi-kafka-operator

This will install the Strimzi Operator (which is nothing but a Deployment), Custom Resource Definitions (CRDs), and other Kubernetes components such as ClusterRoles, ClusterRoleBindings and ServiceAccounts

For more details, check out this link

To uninstall, simply run helm uninstall strimzi-kafka

To confirm that the Strimzi Operator has been deployed, check its Pod (it should transition to Running status after a while):

kubectl get pods -l=name=strimzi-cluster-operator

NAME                                        READY   STATUS    RESTARTS   AGE
strimzi-cluster-operator-5c66f679d5-69rgk   1/1     Running   0          43s

Check the Custom Resource Definitions as well:

kubectl get crd | grep strimzi

kafkabridges.kafka.strimzi.io           2020-04-13T16:49:36Z
kafkaconnectors.kafka.strimzi.io        2020-04-13T16:49:33Z
kafkaconnects.kafka.strimzi.io          2020-04-13T16:49:36Z
kafkaconnects2is.kafka.strimzi.io       2020-04-13T16:49:38Z
kafkamirrormaker2s.kafka.strimzi.io     2020-04-13T16:49:37Z
kafkamirrormakers.kafka.strimzi.io      2020-04-13T16:49:39Z
kafkas.kafka.strimzi.io                 2020-04-13T16:49:40Z
kafkatopics.kafka.strimzi.io            2020-04-13T16:49:34Z
kafkausers.kafka.strimzi.io             2020-04-13T16:49:33Z

The kafkas.kafka.strimzi.io CRD represents Kafka clusters in Kubernetes

Now that we have the "brain" (the Strimzi Operator) wired up, let's use it!

Time to create a Kafka cluster!

As mentioned, we will keep things simple and start off with the following setup (which we will incrementally update as a part of subsequent posts in this series):

  • A single node Kafka cluster (and Zookeeper)
  • Available internally to clients in the same Kubernetes cluster
  • No encryption, authentication or authorization
  • No persistence (uses an emptyDir volume)

To deploy a Kafka cluster all we need to do is create a Strimzi Kafka resource. This is what it looks like:

apiVersion: kafka.strimzi.io/v1beta1
kind: Kafka
metadata:
  name: my-kafka-cluster
spec:
  kafka:
    version: 2.4.0
    replicas: 1
    listeners:
      plain: {}
    config:
      offsets.topic.replication.factor: 1
      transaction.state.log.replication.factor: 1
      transaction.state.log.min.isr: 1
      log.message.format.version: "2.4"
    storage:
      type: ephemeral
  zookeeper:
    replicas: 1
    storage:
      type: ephemeral

For a detailed Kafka CRD reference, please check out the documentation - https://strimzi.io/docs/operators/master/using.html#type-Kafka-reference

We define the name of the cluster (my-kafka-cluster) in metadata.name. Here is a summary of the attributes in spec.kafka:

  • version - The Kafka broker version (defaults to 2.5.0 at the time of writing, but we're using 2.4.0)
  • replicas - Kafka cluster size i.e. the number of Kafka nodes (Pods in the cluster)
  • listeners - Configures the listeners of the Kafka brokers. In this example we are using the plain listener, which means that the cluster will be accessible to internal clients (in the same Kubernetes cluster) on port 9092 (no encryption, authentication or authorization involved). Supported types are plain, tls and external (see https://strimzi.io/docs/operators/master/using.html#type-KafkaListeners-reference). It is possible to configure multiple listeners (we will cover this in subsequent blog posts)
  • config - These are key-value pairs used as Kafka broker config properties
  • storage - Storage for the Kafka cluster. Supported types are ephemeral, persistent-claim and jbod. We are using ephemeral in this example, which means that an emptyDir volume is used and the data lives only as long as the Kafka broker Pod (a future blog post will cover persistent-claim storage)

Zookeeper cluster details (spec.zookeeper) are similar to those of Kafka. In this case we are just configuring the number of replicas and the storage type. Refer to https://strimzi.io/docs/operators/master/using.html#type-ZookeeperClusterSpec-reference for details
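
Ephemeral storage is fine for a first experiment, but the data disappears with the Pod. For comparison, here is a sketch of what the storage block could look like with persistent-claim storage (based on the Strimzi storage reference; the size is an arbitrary example value, and a later post in this series covers persistence properly):

```yaml
# sketch: persistent-claim storage instead of ephemeral
# (applies to spec.kafka.storage and spec.zookeeper.storage;
#  10Gi is an example value - adjust to your needs)
storage:
  type: persistent-claim
  size: 10Gi
  deleteClaim: false
```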

To create the Kafka cluster:

kubectl apply -f https://raw.githubusercontent.com/abhirockzz/kafka-kubernetes-strimzi/master/part-1/kafka.yaml

What's next?

The Strimzi Operator springs into action and creates many Kubernetes resources in response to the Kafka custom resource we just created.

The following resources are created:

  • StatefulSet - Kafka and Zookeeper clusters exist in the form of StatefulSets, which are used to manage stateful workloads in Kubernetes. Please refer to https://kubernetes.io/docs/concepts/workloads/controllers/statefulset/ and related material for details
  • Service - Kubernetes ClusterIP Service for internal access
  • ConfigMap - Kafka and Zookeeper configuration is stored in Kubernetes ConfigMaps
  • Secret - Kubernetes Secrets to store private keys and certificates for Kafka cluster components and clients. These are used for TLS encryption and authentication (covered in subsequent blog posts)

Kafka Custom Resource

kubectl get kafka

NAME               DESIRED KAFKA REPLICAS   DESIRED ZK REPLICAS
my-kafka-cluster   1                        1

StatefulSet and Pod

Check Kafka and Zookeeper StatefulSets using:

kubectl get statefulset/my-kafka-cluster-zookeeper
kubectl get statefulset/my-kafka-cluster-kafka

Kafka and Zookeeper Pods

kubectl get pod/my-kafka-cluster-zookeeper-0
kubectl get pod/my-kafka-cluster-kafka-0

ConfigMap

Individual ConfigMaps are created to store Kafka and Zookeeper configurations

kubectl get configmap

NAME                                DATA   AGE
my-kafka-cluster-kafka-config       4      19m
my-kafka-cluster-zookeeper-config   2      20m

Let's peek into the Kafka configuration

kubectl get configmap/my-kafka-cluster-kafka-config -o yaml

The output is quite lengthy, but I will highlight the important bits. The data section contains two entries for the Kafka broker - log4j.properties and server.config.

Here is a snippet of server.config. Notice advertised.listeners (internal access over port 9092) and the User provided configuration section (the config we specified in the YAML manifest):

    ##############################
    ##############################
    # This file is automatically generated by the Strimzi Cluster Operator
    # Any changes to this file will be ignored and overwritten!
    ##############################
    ##############################

    broker.id=${STRIMZI_BROKER_ID}
    log.dirs=/var/lib/kafka/data/kafka-log${STRIMZI_BROKER_ID}

    ##########
    # Plain listener
    ##########

    ##########
    # Common listener configuration
    ##########
    listeners=REPLICATION-9091://0.0.0.0:9091,PLAIN-9092://0.0.0.0:9092
    advertised.listeners=REPLICATION-9091://my-kafka-cluster-kafka-${STRIMZI_BROKER_ID}.my-kafka-cluster-kafka-brokers.default.svc:9091,PLAIN-9092://my-kafka-cluster-kafka-${STRIMZI_BROKER_ID}.my-kafka-cluster-kafka-brokers.default.svc:9092
    listener.security.protocol.map=REPLICATION-9091:SSL,PLAIN-9092:PLAINTEXT
    inter.broker.listener.name=REPLICATION-9091
    sasl.enabled.mechanisms=
    ssl.secure.random.implementation=SHA1PRNG
    ssl.endpoint.identification.algorithm=HTTPS

    ##########
    # User provided configuration
    ##########
    log.message.format.version=2.4
    offsets.topic.replication.factor=1
    transaction.state.log.min.isr=1
    transaction.state.log.replication.factor=1
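
One ConfigMap serves every broker in the StatefulSet; the ${STRIMZI_BROKER_ID} placeholders are resolved per Pod when the container starts. A toy shell sketch of that substitution (an illustration of the idea, not Strimzi's actual startup script):

```shell
# simulate the per-broker substitution applied to the shared config
STRIMZI_BROKER_ID=0   # broker 0 in our single-node cluster

echo "broker.id=${STRIMZI_BROKER_ID}"
echo "log.dirs=/var/lib/kafka/data/kafka-log${STRIMZI_BROKER_ID}"
```

Each Pod in the StatefulSet therefore ends up with a unique broker.id and log directory, even though they all mount the same ConfigMap.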

Service

If you query for Services, you should see something similar to this:

kubectl get svc

NAME                                TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)
my-kafka-cluster-kafka-bootstrap    ClusterIP   10.0.240.137   <none>        9091/TCP,9092/TCP
my-kafka-cluster-kafka-brokers      ClusterIP   None           <none>        9091/TCP,9092/TCP
my-kafka-cluster-zookeeper-client   ClusterIP   10.0.143.149   <none>        2181/TCP
my-kafka-cluster-zookeeper-nodes    ClusterIP   None           <none>        2181/TCP,2888/TCP,3888/TCP

my-kafka-cluster-kafka-bootstrap makes it possible for internal Kubernetes clients to access the Kafka cluster, and my-kafka-cluster-kafka-brokers is the headless Service backing the StatefulSet
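
These Service names follow a predictable pattern, which is handy when wiring up client configuration. A small shell sketch of how the internal addresses are derived (assuming the cluster runs in the default namespace, matching the advertised.listeners snippet earlier):

```shell
CLUSTER=my-kafka-cluster
NAMESPACE=default

# bootstrap address used by clients inside the Kubernetes cluster
echo "${CLUSTER}-kafka-bootstrap.${NAMESPACE}.svc:9092"

# per-broker address via the headless Service (broker 0 in our single-node setup)
echo "${CLUSTER}-kafka-0.${CLUSTER}-kafka-brokers.${NAMESPACE}.svc:9092"
```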

Secret

Although we're not using them, it's helpful to look at the Secrets created by Strimzi:

kubectl get secret

NAME                                      TYPE
my-kafka-cluster-clients-ca               Opaque
my-kafka-cluster-clients-ca-cert          Opaque
my-kafka-cluster-cluster-ca               Opaque
my-kafka-cluster-cluster-ca-cert          Opaque
my-kafka-cluster-cluster-operator-certs   Opaque
my-kafka-cluster-kafka-brokers            Opaque
my-kafka-cluster-kafka-token-vb2qt        kubernetes.io/service-account-token
my-kafka-cluster-zookeeper-nodes          Opaque
my-kafka-cluster-zookeeper-token-xq8m2    kubernetes.io/service-account-token

  • my-kafka-cluster-cluster-ca-cert - the cluster CA certificate used to sign Kafka broker certificates; connecting clients use it to establish a TLS-encrypted connection
  • my-kafka-cluster-clients-ca-cert - the clients CA certificate used by a user to sign their own client certificate, allowing mutual TLS authentication against the Kafka cluster

Ok, but does it work?

Let's take it for a spin!

Create a producer Pod:

export KAFKA_CLUSTER_NAME=my-kafka-cluster

kubectl run kafka-producer -ti --image=strimzi/kafka:latest-kafka-2.4.0 --rm=true --restart=Never -- bin/kafka-console-producer.sh --broker-list $KAFKA_CLUSTER_NAME-kafka-bootstrap:9092 --topic my-topic

In another terminal, create a consumer Pod:

export KAFKA_CLUSTER_NAME=my-kafka-cluster

kubectl run kafka-consumer -ti --image=strimzi/kafka:latest-kafka-2.4.0 --rm=true --restart=Never -- bin/kafka-console-consumer.sh --bootstrap-server $KAFKA_CLUSTER_NAME-kafka-bootstrap:9092 --topic my-topic --from-beginning

The above demonstration was taken from the Strimzi docs - https://strimzi.io/docs/operators/master/deploying.html#deploying-example-clients-str

You can use other clients as well
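
Whatever client you pick, the only connection detail an internal client needs is the bootstrap Service. A minimal client configuration sketch for the plain listener (no TLS or authentication, matching the setup in this post):

```properties
# minimal internal client config for the plain listener
bootstrap.servers=my-kafka-cluster-kafka-bootstrap:9092
```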

We're just getting started...

We started small, but we now have a working Kafka cluster on Kubernetes (hopefully it works for you as well!). As I mentioned before, this is the beginning of a multi-part blog series. Stay tuned for upcoming posts, where we will explore other aspects such as external client access, TLS, authentication, persistence, etc.


Posted on Jun 8 by Abhishek Gupta (@abhirockzz) - currently working with Kafka, Databases, Azure, Kubernetes and related open source projects

