<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Elias Vakkuri</title>
    <description>The latest articles on DEV Community by Elias Vakkuri (@eliasvakkuri).</description>
    <link>https://dev.to/eliasvakkuri</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F979064%2Fba495f13-557d-49d6-bf30-a68a3de0c7bd.png</url>
      <title>DEV Community: Elias Vakkuri</title>
      <link>https://dev.to/eliasvakkuri</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/eliasvakkuri"/>
    <language>en</language>
    <item>
      <title>Clusterception Part 3: Getting started with Kafka</title>
      <dc:creator>Elias Vakkuri</dc:creator>
      <pubDate>Mon, 19 Dec 2022 10:20:31 +0000</pubDate>
      <link>https://dev.to/eliasvakkuri/clusterception-part-3-getting-started-with-kafka-4ngh</link>
      <guid>https://dev.to/eliasvakkuri/clusterception-part-3-getting-started-with-kafka-4ngh</guid>
      <description>&lt;p&gt;&lt;em&gt;This post is part of a series on running Kafka on Kubernetes on Azure. You can find links to other posts in the series &lt;a href="https://dev.to/eliasvakkuri/clusterception-running-kafka-on-kubernetes-on-azure-49ce"&gt;here&lt;/a&gt;. All code is available in &lt;a href="https://github.com/evakkuri/blog/tree/main/clusterception"&gt;my Github&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;In this part, we'll install Kafka to our Kubernetes cluster using &lt;a href="https://strimzi.io/"&gt;Strimzi&lt;/a&gt; and try it out.&lt;/p&gt;




&lt;p&gt;As discussed in part 1 of this series, running Kafka on Kubernetes widens the choice of deployment platforms. Managed Kubernetes is offered by almost every cloud provider, whereas managed Kafka is much rarer.&lt;/p&gt;

&lt;p&gt;Strimzi makes it quite simple to start up your Kafka cluster. Note that Strimzi is not the only option for running Kafka in Kubernetes; another prominent alternative is &lt;a href="https://docs.confluent.io/operator/current/co-quickstart.html#co-long-quickstart"&gt;Confluent for Kubernetes&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Installing Strimzi
&lt;/h2&gt;

&lt;p&gt;I'll install Strimzi on the Azure Kubernetes Service (AKS) cluster that we set up in the previous part of this series. We'll follow &lt;a href="https://strimzi.io/quickstarts/"&gt;Strimzi's quickstart&lt;/a&gt; for installing Strimzi. Select the "Minikube" option; the same steps work against AKS as well.&lt;/p&gt;

&lt;p&gt;I'll first create a new namespace for our Kafka resources:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;kubectl create namespace kafka
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;With version 0.32.0, Strimzi has introduced a one-line command for installation:&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;kubectl create &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-f&lt;/span&gt; &lt;span class="s1"&gt;'https://strimzi.io/install/latest?namespace=kafka'&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;-n&lt;/span&gt; kafka
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;The command deploys the Strimzi Cluster Operator, which runs and administers the Kafka resources, along with the Kubernetes service accounts and RBAC rights it needs to function. In addition, the process installs several &lt;a href="https://kubernetes.io/docs/concepts/extend-kubernetes/api-extension/custom-resources/"&gt;Custom Resource Definitions&lt;/a&gt;, or CRDs. These enable declarative deployment of the Kafka resources that Strimzi supports.&lt;/p&gt;
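
&lt;p&gt;For example, with the CRDs in place, a topic can be declared as a Kubernetes resource and the operator will create it in Kafka. A minimal sketch (the topic name and settings are illustrative):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaTopic
metadata:
  name: my-topic
  labels:
    strimzi.io/cluster: my-cluster
spec:
  partitions: 1
  replicas: 1
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;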

&lt;p&gt;For convenience, I'll set &lt;code&gt;kafka&lt;/code&gt; as the default namespace to avoid having to write it out each time:&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;kubectl config set-context &lt;span class="nt"&gt;--current&lt;/span&gt; &lt;span class="nt"&gt;--namespace&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;kafka
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;You can also check out the CRDs that were installed:&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;kubectl get crd
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;The console shows several resources with the word &lt;code&gt;kafka&lt;/code&gt; in them. If interested, you can get further details with &lt;code&gt;kubectl describe&lt;/code&gt;. For example:&lt;br&gt;
&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;kubectl describe crd kafkas.kafka.strimzi.io
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;Now that is a long one! Luckily, you don't need to implement all that yourself. 😄&lt;/p&gt;
&lt;h2&gt;
  
  
  Creating a Kafka cluster
&lt;/h2&gt;

&lt;p&gt;With the CRDs created, I can deploy a Kafka cluster using a single YAML resource definition. I'll start with the sample YAML provided in Strimzi's quickstart, linked previously. All scripts used in this post are also available in the series' &lt;a href="https://github.com/evakkuri/blog/tree/main/clusterception"&gt;Github repository&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;The YAML is as follows:&lt;/p&gt;


&lt;div class="ltag_gist-liquid-tag"&gt;
  
&lt;/div&gt;
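
&lt;p&gt;For reference, the quickstart's single-node, ephemeral-storage cluster definition looks roughly like the following (abbreviated sketch; see the quickstart for the authoritative version):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;apiVersion: kafka.strimzi.io/v1beta2
kind: Kafka
metadata:
  name: my-cluster
spec:
  kafka:
    replicas: 1
    listeners:
      - name: plain
        port: 9092
        type: internal
        tls: false
      - name: tls
        port: 9093
        type: internal
        tls: true
    storage:
      type: ephemeral
  zookeeper:
    replicas: 1
    storage:
      type: ephemeral
  entityOperator:
    topicOperator: {}
    userOperator: {}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;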



&lt;p&gt;There are a lot of possible configurations when setting up the cluster. I'll not go into those in this post; that's a possible topic for the future. 🙂&lt;/p&gt;

&lt;p&gt;I'll create the cluster with &lt;code&gt;kubectl apply&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;kubectl apply &lt;span class="nt"&gt;-f&lt;/span&gt; kafka-cluster.yaml
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You can have a look at the Kubernetes Services that this created:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;kubectl get service
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;There are several services prefixed with the name of your cluster; if you used the sample YAML, the prefix is &lt;code&gt;my-cluster&lt;/code&gt;. These services include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;ZooKeeper&lt;/strong&gt;: ZooKeeper is Apache's general-purpose coordination service for distributed systems, used by Kafka and several other projects such as Hadoop. Note that you shouldn't need to interact with it directly; it just works in the background. Also, &lt;a href="https://www.youtube.com/watch?v=mT7dbLNCGtQ"&gt;Strimzi is working on removing this dependency&lt;/a&gt; to simplify the setup even further.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Kafka Brokers&lt;/strong&gt;: As discussed in part 1, brokers are the actual worker servers containing all the topics and messages in Kafka. You can think of them as comparable to "nodes" in most other distributed services. We only have one broker in our setup, but we could scale up our cluster by simply increasing the value in &lt;code&gt;spec.kafka.replicas&lt;/code&gt; in the YAML.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Bootstrap&lt;/strong&gt;: Strimzi simplifies connecting to Kafka by providing a bootstrap service. Clients only need to contact this one service; the Kafka protocol then directs them to the broker hosting the target topic.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Now, you don't see an External IP on any of these services, and you'll need one for connecting to Kafka from outside Kubernetes. For this, you need to add an external listener. Luckily, this is easy to do - I'll add the following entry to &lt;code&gt;spec.kafka.listeners&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;- name: external
  port: 9094
  &lt;span class="nb"&gt;type&lt;/span&gt;: loadbalancer
  tls: &lt;span class="nb"&gt;false&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If you now apply the YAML and list your services, you'll see a service with an external IP for your broker and an external bootstrap service. The external bootstrap works the same as the internal bootstrap already added - you can use this as the single entry point for clients.&lt;/p&gt;

&lt;p&gt;I now created an external listener of type "LoadBalancer", but there are other types. You can find more information in &lt;a href="https://strimzi.io/blog/2019/04/17/accessing-kafka-part-1/"&gt;this series of posts by Strimzi&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;WARNING:&lt;/strong&gt; You'll also see that &lt;code&gt;tls&lt;/code&gt; is set to false. This means that communication to Kafka &lt;em&gt;is not encrypted&lt;/em&gt;, so it's highly insecure. I'll return to this topic in the next part of this series, where I configure security for the Kafka cluster. &lt;/p&gt;

&lt;p&gt;For now, let's continue with these settings; however, don't send anything sensitive to your Kafka. After testing, you can remove the external listener or stop your AKS cluster to limit exposure.&lt;/p&gt;

&lt;h2&gt;
  
  
  Testing connectivity
&lt;/h2&gt;

&lt;p&gt;The Strimzi quickstart includes instructions for testing the Kafka cluster from inside Kubernetes. For this post, you can also try it out from outside Kubernetes via the external listener. You can do this with the same Docker image used in the quickstart, but with local Docker instead of Kubernetes - so you'll need Docker running.&lt;/p&gt;

&lt;p&gt;First, get the external IP of your external bootstrap service, for example, &lt;code&gt;my-cluster-kafka-external-bootstrap&lt;/code&gt;. With this in hand, start the console producer:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;docker run &lt;span class="nt"&gt;-it&lt;/span&gt; &lt;span class="nt"&gt;--rm&lt;/span&gt; &lt;span class="nt"&gt;--name&lt;/span&gt; kafka-producer quay.io/strimzi/kafka:0.32.0-kafka-3.3.1 bin/kafka-console-producer.sh &lt;span class="nt"&gt;--bootstrap-server&lt;/span&gt; EXTERNAL_BOOTSTRAP_IP:9094 &lt;span class="nt"&gt;--topic&lt;/span&gt; my-topic
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The setting &lt;code&gt;-it&lt;/code&gt; connects an interactive terminal to the running container, and &lt;code&gt;--rm&lt;/code&gt; automatically removes the container when you exit the console.&lt;/p&gt;

&lt;p&gt;In another terminal, start the consumer:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;docker run &lt;span class="nt"&gt;-it&lt;/span&gt; &lt;span class="nt"&gt;--rm&lt;/span&gt; &lt;span class="nt"&gt;--name&lt;/span&gt; kafka-consumer quay.io/strimzi/kafka:0.32.0-kafka-3.3.1 bin/kafka-console-consumer.sh &lt;span class="nt"&gt;--bootstrap-server&lt;/span&gt; EXTERNAL_BOOTSTRAP_IP:9094 &lt;span class="nt"&gt;--topic&lt;/span&gt; my-topic &lt;span class="nt"&gt;--from-beginning&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Write a message in the producer console, and you should see it appear in the consumer console. If so, you now have a functioning Kafka cluster in AKS! 🎉&lt;/p&gt;




&lt;p&gt;That's it for this post. Next time we'll look into setting up security for Kafka - see you then!&lt;/p&gt;

</description>
      <category>kubernetes</category>
      <category>apachekafka</category>
      <category>strimzi</category>
    </item>
    <item>
      <title>Clusterception Part 2: Initial Azure setup</title>
      <dc:creator>Elias Vakkuri</dc:creator>
      <pubDate>Sun, 04 Dec 2022 16:46:06 +0000</pubDate>
      <link>https://dev.to/eliasvakkuri/clusterception-part-2-initial-azure-setup-1bn6</link>
      <guid>https://dev.to/eliasvakkuri/clusterception-part-2-initial-azure-setup-1bn6</guid>
      <description>&lt;p&gt;&lt;em&gt;This post is part of a series on running Kafka on Kubernetes on Azure. You can find links to other posts in the series &lt;a href="https://dev.to/eliasvakkuri/clusterception-running-kafka-on-kubernetes-on-azure-49ce"&gt;here&lt;/a&gt;. All code is available in &lt;a href="https://github.com/evakkuri/blog/tree/main/clusterception" rel="noopener noreferrer"&gt;my Github&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;In part 2 of the series, I will set up a VNET, Azure Container Registry, and Azure Kubernetes Service (AKS). The end goal is to set up the infrastructure so that we're ready to start deploying Kafka in the next part.&lt;/p&gt;




&lt;h2&gt;
  
  
  Overview
&lt;/h2&gt;

&lt;p&gt;For AKS, there are a LOT of different settings that you can tweak when provisioning your cluster - just look at the length of the &lt;a href="https://learn.microsoft.com/en-us/azure/templates/microsoft.containerservice/managedclusters?pivots=deployment-language-bicep#resource-format" rel="noopener noreferrer"&gt;resource format definition&lt;/a&gt;. Networking in particular offers many options.&lt;/p&gt;

&lt;p&gt;To keep this post to a reasonable length, I will focus on AKS, and the following topics:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Securing access to AKS's control plane APIs (if there's one thing you secure, make it this one)&lt;/li&gt;
&lt;li&gt;Integration between AKS and Container Registry&lt;/li&gt;
&lt;li&gt;Expandability for future services&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Since I am creating a development cluster that won't hold sensitive data, I don't need production-grade settings. Instead, I'll look at configurations that are simple to implement and use while still increasing the cluster's security.&lt;/p&gt;

&lt;p&gt;By the end, my architecture will look like this (diagram created with draw.io and Azure stencil):&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkqcqqoo852unx0jjwf5t.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkqcqqoo852unx0jjwf5t.png" alt="Architecture diagram, created with Draw.io and Azure stencil"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I have drawn the AKS control plane outside of the VNET, as the control plane's nodes usually live in a separate VNET (and subscription) managed by Microsoft. There is a preview option to deploy &lt;a href="https://learn.microsoft.com/en-us/azure/aks/api-server-vnet-integration" rel="noopener noreferrer"&gt;the control plane into your own VNET as well&lt;/a&gt;, but I don't see a reason for that here.&lt;/p&gt;

&lt;h2&gt;
  
  
  Using Bicep
&lt;/h2&gt;

&lt;p&gt;I will set up the infrastructure in &lt;a href="https://learn.microsoft.com/en-us/azure/azure-resource-manager/bicep/overview?tabs=bicep" rel="noopener noreferrer"&gt;Bicep language&lt;/a&gt;. I mainly use Terraform in projects, but Bicep looks like a good alternative for Azure-only setups. Over the older ARM templates, Bicep has a number of benefits:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;em&gt;Easier to read and write:&lt;/em&gt; First and foremost, Bicep is much easier to read and write than ARM templates. I foresee much less wasted time figuring out where a curly brace is missing when validation won't pass. Also, the autocomplete and suggestions in VS Code work great.&lt;/li&gt;
&lt;li&gt;
&lt;em&gt;Decorators:&lt;/em&gt; You can use parameter decorators as one easy way to document your solution and add validations. In the full templates, I use decorators to add a description and to set a list of allowed values.&lt;/li&gt;
&lt;li&gt;
&lt;em&gt;Named subresources:&lt;/em&gt; We create the subnets as part of the VNET declaration, but then we immediately retrieve the subnet resources using the &lt;code&gt;existing&lt;/code&gt; keyword. This way, we can refer to the subnet directly, as we'll see when creating the AKS cluster.&lt;/li&gt;
&lt;li&gt;
&lt;em&gt;Simpler modules:&lt;/em&gt; When using ARM templates, you need to deploy linked templates somewhere Azure can reach them. With Bicep, the Azure CLI combines all modules into a single ARM template, so it's enough to have the templates available locally.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;One big drawback of Bicep against Terraform is that you can only manage Azure resources with Bicep. You need to manage Azure AD objects like users, groups, or service principals in another way, like directly with Azure CLI. Separating Azure and Azure AD feels like a weird division between services, but I guess Microsoft has its reasons. 🙂&lt;/p&gt;

&lt;h2&gt;
  
  
  Code
&lt;/h2&gt;

&lt;p&gt;I will not post the full templates in this post, as you can find them in &lt;a href="https://github.com/evakkuri/blog/tree/main/clusterception" rel="noopener noreferrer"&gt;my Github&lt;/a&gt;. Instead, I'll post smaller snippets below relevant to my topics.&lt;/p&gt;

&lt;p&gt;I'll deploy AKS and closely related resources via a separate template, which I call from my main template as a module. I also output the AKS identities from the module and use them to assign the relevant access rights to our Container Registry. I'll explain this more closely further down.&lt;/p&gt;

&lt;p&gt;Let's look at the most relevant pieces of the AKS template.&lt;/p&gt;

&lt;h3&gt;
  
  
  Linking the Container Registry
&lt;/h3&gt;


&lt;div class="ltag_gist-liquid-tag"&gt;
  
&lt;/div&gt;
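
&lt;p&gt;A condensed sketch of the identity wiring in Bicep (resource names, API versions, and parameters are illustrative; see the full template in the repository):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;resource kubeletIdentity 'Microsoft.ManagedIdentity/userAssignedIdentities@2023-01-31' = {
  name: 'aks-kubelet-identity'
  location: location
}

resource aks 'Microsoft.ContainerService/managedClusters@2022-09-01' = {
  name: aksName
  location: location
  identity: {
    type: 'UserAssigned'
    userAssignedIdentities: {
      '${clusterIdentity.id}': {}
    }
  }
  properties: {
    // ... other properties ...
    identityProfile: {
      kubeletidentity: {
        resourceId: kubeletIdentity.id
        clientId: kubeletIdentity.properties.clientId
        objectId: kubeletIdentity.properties.principalId
      }
    }
  }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Here &lt;code&gt;clusterIdentity&lt;/code&gt; stands for the separate user-assigned identity for the control plane mentioned above.&lt;/p&gt;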


&lt;p&gt;The container images I will run in AKS need to come from somewhere. Public images might come from Docker Hub, for example, but private images are usually stored in Azure Container Registry. This connection will need to be set up for AKS to work.&lt;/p&gt;

&lt;p&gt;AKS pods and deployments use a &lt;code&gt;kubelet&lt;/code&gt; identity to authenticate to container registries. When creating an AKS cluster via Azure CLI, there is the option &lt;code&gt;--attach-acr&lt;/code&gt; - this deploys a user-assigned managed identity, assigns it as the kubelet identity in AKS, and gives it the AcrPull role in the Container Registry. The managed identity is created in the cluster's resource group, so users might not have access to the actual resource.&lt;/p&gt;

&lt;p&gt;With Bicep, I'll need to manually create and assign the kubelet identity. In addition, the cluster's control plane identity needs specific RBAC rights to manage the kubelet identity. Therefore, I'll use user-assigned managed identities for both. I set the cluster identity in &lt;code&gt;identity&lt;/code&gt; and kubelet identity in &lt;code&gt;properties.identityProfile&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Finally, I'll output the identities so I can use them for role assignments in the main template.&lt;/p&gt;

&lt;h3&gt;
  
  
  Networking for services
&lt;/h3&gt;


&lt;div class="ltag_gist-liquid-tag"&gt;
  
&lt;/div&gt;
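
&lt;p&gt;A sketch of the corresponding Bicep network settings (subnet references and CIDR values are illustrative):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;properties: {
  networkProfile: {
    networkPlugin: 'azure'
    serviceCidr: '10.0.0.0/16'
    dnsServiceIP: '10.0.0.10'
  }
  agentPoolProfiles: [
    {
      name: 'nodepool1'
      mode: 'System'
      count: 2
      vmSize: 'Standard_DS2_v2'
      vnetSubnetID: nodeSubnet.id
      podSubnetID: podSubnet.id
    }
  ]
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;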


&lt;p&gt;AKS has two main networking modes: Kubenet and Azure CNI. On a very high level, with Kubenet, only nodes get IPs from your subnet, and for pods, some address translation magic happens. With Azure CNI, both nodes and pods get assigned IPs. There are benefits to pods having IPs, like more transparent networking, but it also means that you will burn through your IP ranges much faster and require more planning with the networks you use.&lt;/p&gt;

&lt;p&gt;Previously, only Azure CNI supported using a custom VNET, and Kubenet was suggested only for development and testing environments. Nowadays Kubenet also supports custom VNETs, and based on the documentation, both are fine for production deployments.&lt;/p&gt;

&lt;p&gt;Which to choose? It depends on your specific circumstances. I'm not an expert in the area, so I'll not go into too much detail now. I don't have to worry too much about IP ranges for this blog, so I'll go with Azure CNI. &lt;/p&gt;

&lt;p&gt;The only thing we need to ensure is that the AKS-internal IP ranges don't overlap with the IP ranges in our VNET. The actual values don't matter in our case; I just picked some I saw in the &lt;a href="https://learn.microsoft.com/en-us/azure/aks/configure-azure-cni#configure-networking---cli" rel="noopener noreferrer"&gt;documentation&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;We'll use &lt;a href="https://learn.microsoft.com/en-us/azure/aks/configure-azure-cni#dynamic-allocation-of-ips-and-enhanced-subnet-support" rel="noopener noreferrer"&gt;dynamic allocation of IPs for pods&lt;/a&gt;. With this option, Azure deploys nodes and pods to separate subnets, and pod IPs are assigned dynamically from their subnet. This has several benefits, as outlined in the documentation. In Bicep, I only need to create a separate subnet and assign it to pods in the &lt;code&gt;agentPoolProfiles&lt;/code&gt; section.&lt;/p&gt;

&lt;p&gt;Finally, I'll set &lt;code&gt;publicNetworkAccess&lt;/code&gt; as &lt;code&gt;Enabled&lt;/code&gt;, as I want to reach the cluster and its services from my laptop. As a side note, there are also separate settings to create a private cluster. I'm not entirely sure how these settings relate to one another - I might investigate this more in a future post.&lt;/p&gt;

&lt;h3&gt;
  
  
  Authentication: &lt;code&gt;aadProfile&lt;/code&gt; and &lt;code&gt;disableLocalAccounts&lt;/code&gt;
&lt;/h3&gt;


&lt;div class="ltag_gist-liquid-tag"&gt;
  
&lt;/div&gt;
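
&lt;p&gt;In Bicep, the relevant settings are roughly the following (the admin group object ID comes in as a parameter):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;properties: {
  disableLocalAccounts: true
  aadProfile: {
    managed: true
    enableAzureRBAC: true
    adminGroupObjectIDs: [
      adminGroupObjectId
    ]
  }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;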


&lt;p&gt;As mentioned earlier, we want to limit access to AKS's control plane. AKS has a pretty nice-looking integration with Azure AD nowadays, so I'll use that for both authentication and authorization for admin operations.&lt;/p&gt;

&lt;p&gt;I enable Azure AD for control plane operations in &lt;code&gt;aadProfile.enableAzureRBAC&lt;/code&gt; and disable local Kubernetes admin accounts with &lt;code&gt;disableLocalAccounts&lt;/code&gt;. This way, control plane operations are authorized via Azure AD, simplifying maintenance. &lt;/p&gt;

&lt;p&gt;The &lt;code&gt;aadProfile.managed&lt;/code&gt; setting relates to how AKS links with Azure AD internally. In terms of using the cluster, it shouldn't matter; however, &lt;code&gt;managed&lt;/code&gt; is the newer option, so I'll enable it.&lt;/p&gt;

&lt;p&gt;Finally, in &lt;code&gt;aadProfile.adminGroupObjectIDs&lt;/code&gt;, we assign an admin group for the cluster. We provide the object ID of the group as a parameter. You can achieve the same result by assigning the "Azure Kubernetes Service RBAC Cluster Admin" role to any Azure AD identity you wish.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;code&gt;apiServerAccessProfile&lt;/code&gt;
&lt;/h3&gt;


&lt;div class="ltag_gist-liquid-tag"&gt;
  
&lt;/div&gt;
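
&lt;p&gt;A sketch of the profile (the allowed ranges come in as a parameter):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;properties: {
  apiServerAccessProfile: {
    authorizedIPRanges: authorizedIpRanges // e.g. [ '203.0.113.0/24' ]
  }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;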


&lt;p&gt;I'll set some IP range restrictions for API server access so that &lt;code&gt;kubectl&lt;/code&gt; commands can only be run from these IP ranges. I provide the ranges as parameters to the main template and then pass them on to the AKS module.&lt;/p&gt;

&lt;p&gt;I could also disable access to the control plane from public networks altogether, which would be preferable for production deployments. However, for this series, we're not dealing with anything sensitive, plus I don't want the hassle of creating jump machines or VPN connections. As such, I consider Azure AD authentication plus IP range restriction a good combination.  &lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;code&gt;autoUpgradeProfile&lt;/code&gt;
&lt;/h3&gt;


&lt;div class="ltag_gist-liquid-tag"&gt;
  
&lt;/div&gt;
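
&lt;p&gt;In Bicep, this is a one-setting profile:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;properties: {
  autoUpgradeProfile: {
    upgradeChannel: 'patch'
  }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;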


&lt;p&gt;Here, I set automatic upgrades for the Kubernetes version. AKS supports specifying the desired level of automatic version upgrades - I'll go with the suggested "patch" level.&lt;/p&gt;

&lt;h3&gt;
  
  
  Deploying AKS as a module
&lt;/h3&gt;


&lt;div class="ltag_gist-liquid-tag"&gt;
  
&lt;/div&gt;
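
&lt;p&gt;A sketch of the module call and the AcrPull role assignment (the module path, output name, and role GUID are illustrative - check the AcrPull built-in role ID in the Azure docs):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;module aks 'modules/aks.bicep' = {
  name: 'aks-deployment'
  params: {
    // ...
  }
}

var acrPullRoleId = '7f951dda-4ed3-4680-a7ca-43fe172d538d' // AcrPull built-in role

resource acrPullAssignment 'Microsoft.Authorization/roleAssignments@2022-04-01' = {
  name: guid(acr.id, aks.outputs.kubeletIdentityObjectId, acrPullRoleId)
  scope: acr
  properties: {
    principalId: aks.outputs.kubeletIdentityObjectId
    roleDefinitionId: subscriptionResourceId('Microsoft.Authorization/roleDefinitions', acrPullRoleId)
    principalType: 'ServicePrincipal'
  }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;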


&lt;p&gt;Finally, as mentioned previously, I deploy AKS and related resources as a submodule and assign access rights to the Container Registry for the kubelet identity.&lt;/p&gt;

&lt;h3&gt;
  
  
  What I didn't do
&lt;/h3&gt;

&lt;p&gt;There's a lot more that you could do by tweaking the properties. One thing that might be a good addition would be enabling SSH access to the cluster nodes for maintenance scenarios. However, that also requires securing network access properly, which is outside this post's focus. We'll revisit the topic later if needed.&lt;/p&gt;

&lt;p&gt;Also, additional security options are available, like Microsoft Defender for Cloud. This sounds like a good idea for production deployments, but I don't see it as necessary for this post.&lt;/p&gt;




&lt;h2&gt;
  
  
  Testing
&lt;/h2&gt;

&lt;p&gt;Let's deploy a service to our cluster to verify that everything is running as expected. We'll follow the &lt;a href="https://learn.microsoft.com/en-us/azure/aks/tutorial-kubernetes-prepare-app" rel="noopener noreferrer"&gt;Microsoft tutorial&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;First, let's log in to the cluster API server with &lt;code&gt;kubelogin&lt;/code&gt; and check connectivity:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;az aks get-credentials \
  --resource-group clusterception \
  --name clusterceptionaks

KUBECONFIG=&amp;lt;path to kubeconfig&amp;gt; kubelogin convert-kubeconfig

kubectl get services
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;With that working correctly, let's log in to our Container Registry and push the tutorial frontend image there. This way, we can test connectivity between the cluster and the registry.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;az acr login --name clusterceptionacr

docker pull mcr.microsoft.com/azuredocs/azure-vote-front:v1

docker tag \
  mcr.microsoft.com/azuredocs/azure-vote-front:v1 \
  clusterceptionacr.azurecr.io/azure-vote-front:v1

docker push clusterceptionacr.azurecr.io/azure-vote-front:v1
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Finally, let's apply the &lt;a href="https://github.com/Azure-Samples/azure-voting-app-redis/blob/master/azure-vote-all-in-one-redis.yaml" rel="noopener noreferrer"&gt;YAML&lt;/a&gt; with the Deployment and Service definitions, using the image in our Container Registry as explained in &lt;a href="https://learn.microsoft.com/en-us/azure/aks/tutorial-kubernetes-deploy-application?tabs=azure-cli#update-the-manifest-file" rel="noopener noreferrer"&gt;the tutorial instructions&lt;/a&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;kubectl apply -f sample-app/azure-vote.yaml
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You can get the external IP of the frontend service by running the following:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;kubectl get service azure-vote-front
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If this command does not return an IP, wait a few minutes, then try again. Once you have the IP, navigate there, and the voting app frontend should greet you. Great stuff!&lt;/p&gt;




&lt;h2&gt;
  
  
  Closing words
&lt;/h2&gt;

&lt;p&gt;What a long article! This really goes to show the depth of configuration options in AKS. Microsoft has done much work to simplify the setup, but for long-term operation and production deployments, you often need a dedicated team that understands all the knobs and levers. This gap in the required level of knowledge is one of the reasons why I usually prefer PaaS services.&lt;/p&gt;

&lt;p&gt;In any case, I hope you got something out of this post! Please join me for the next part when we deploy Kafka to our AKS cluster.&lt;/p&gt;

</description>
      <category>kubernetes</category>
      <category>azure</category>
      <category>bicep</category>
    </item>
    <item>
      <title>Clusterception Part 1: Introduction</title>
      <dc:creator>Elias Vakkuri</dc:creator>
      <pubDate>Thu, 01 Dec 2022 19:46:17 +0000</pubDate>
      <link>https://dev.to/eliasvakkuri/clusterception-part-1-introduction-38g6</link>
      <guid>https://dev.to/eliasvakkuri/clusterception-part-1-introduction-38g6</guid>
      <description>&lt;p&gt;&lt;em&gt;This post is part of a series on running Kafka on Kubernetes on Azure. You can find links to other posts in the series &lt;a href="https://dev.to/eliasvakkuri/clusterception-running-kafka-on-kubernetes-on-azure-49ce"&gt;here&lt;/a&gt;. All code is available in &lt;a href="https://github.com/evakkuri/blog/tree/main/clusterception"&gt;my Github&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;In this first part, I'll introduce the central technologies used in the rest of the series.&lt;/p&gt;




&lt;h2&gt;
  
  
  Motivation
&lt;/h2&gt;

&lt;p&gt;I usually work with Platform-as-a-Service (PaaS) offerings by choice. In Azure, this means running applications in, for example, Azure Web Apps or Azure Functions instead of Kubernetes.&lt;/p&gt;

&lt;p&gt;I like having the cloud provider take care of hairy details like certificates and intra-cluster communications. I like using Azure AD for all authentication and authorization, both for users and services, instead of setting up certificate authorities and worrying about rotating keys. I like sensible defaults instead of a laundry list of possible configurations.&lt;/p&gt;

&lt;p&gt;I will need to look at many of these hairy details during this series. As such, I will not be exactly in my comfort zone.&lt;/p&gt;

&lt;p&gt;So why learn about all of this stuff? Well, firstly, both Kafka and Kubernetes are wildly popular technologies used in many organizations, big and small. So from a pure market-value point of view, there are worse things you could spend your time learning.&lt;/p&gt;

&lt;p&gt;Secondly, even though many PaaS services hide the details from us, getting to know what happens behind the curtain is useful. It will enable better decision-making with a better idea of the tradeoffs, help debug weird errors, and give an understanding of what it would take to run these technologies outside of a cloud environment.&lt;/p&gt;

&lt;p&gt;In summary, I am excited about what's to come!&lt;/p&gt;




&lt;h2&gt;
  
  
  The main characters
&lt;/h2&gt;

&lt;p&gt;Let's introduce the two main cluster types that I will be discussing. Now, I'm taking a very simplified, user-centric view of both of these. So don't get me wrong, I greatly appreciate both of these as feats of engineering, but I'll avoid details in this post.&lt;/p&gt;

&lt;h3&gt;
  
  
  Kafka
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://kafka.apache.org"&gt;Apache Kafka&lt;/a&gt; is, in its own words, a "distributed event streaming platform". On a very high level, you have a bunch of &lt;em&gt;topics&lt;/em&gt; hosted on &lt;em&gt;brokers&lt;/em&gt;, to which &lt;em&gt;producers&lt;/em&gt; send messages and from which &lt;em&gt;consumers&lt;/em&gt; read messages. From Kafka's point of view, messages are just bytes, so they can be almost anything - it's up to the producers and consumers to assign meaning to the byte stream. These core services allow you to build elaborate systems that pass and process messages between applications.&lt;/p&gt;

&lt;p&gt;Kafka is, by design, relatively simple in terms of its services. However, there is a Kafka ecosystem of other services that integrate with Kafka and offer crucial extensions to functionality. Examples include Schema Registry for defining message structure between producers and consumers and Kafka Connect for configuration-based integrations between Kafka and other systems. I will be looking at these in later parts of this blog series. &lt;/p&gt;

&lt;h3&gt;Kubernetes&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://kubernetes.io/"&gt;Kubernetes&lt;/a&gt;, on the other hand, is "an open-source system for automating deployment, scaling, and management of containerized applications". What Kubernetes solves is how to distribute available compute capacity to applications, how to keep those applications running through software and hardware failures, and how to expose them inside and outside the cluster in a structured way.&lt;/p&gt;

&lt;p&gt;In a high-level workflow, you put one or more containers that need to work together into a &lt;em&gt;pod&lt;/em&gt;, organize one or more pods into a &lt;em&gt;deployment&lt;/em&gt; that defines, for example, the replica count and resource allocation, and then expose the deployment as a &lt;em&gt;service&lt;/em&gt;. Again, this is very simplified - Kubernetes has many more core concepts, plus countless extensions and abstractions you can install into your cluster. I will discuss examples later in this series.&lt;/p&gt;
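&lt;p&gt;To make that workflow concrete, here's a minimal manifest sketch - the names, image, and sizes are purely illustrative - with a Deployment wrapping a pod template and a Service exposing it inside the cluster:&lt;/p&gt;

```yaml
# A Deployment wraps a pod template, replica count, and resource allocation.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: demo-app            # illustrative name
spec:
  replicas: 2
  selector:
    matchLabels:
      app: demo-app
  template:
    metadata:
      labels:
        app: demo-app
    spec:
      containers:
        - name: web
          image: nginx:1.25 # any container image would do here
          resources:
            requests:
              cpu: 100m
              memory: 128Mi
---
# A Service gives the pods a stable name and address inside the cluster.
apiVersion: v1
kind: Service
metadata:
  name: demo-app
spec:
  selector:
    app: demo-app
  ports:
    - port: 80
      targetPort: 80
```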

&lt;h2&gt;So why run Kafka on Kubernetes?&lt;/h2&gt;

&lt;p&gt;Every cloud provider has a managed Kubernetes offering available. However, managed Kafka is rare; out of the big players, only AWS has a managed Kafka offering. Therefore, Kafka on Kubernetes allows a broader selection of cloud service providers.&lt;/p&gt;

&lt;p&gt;There are also good implementations available to get started quickly. I will be using &lt;a href="https://strimzi.io/"&gt;Strimzi&lt;/a&gt; during this series.&lt;/p&gt;
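&lt;p&gt;As a taste of what's ahead: with the Strimzi operator installed, declaring a cluster boils down to applying a &lt;code&gt;Kafka&lt;/code&gt; custom resource. A minimal sketch could look like this - the cluster name, Kafka version, and sizes are illustrative, and a real setup would use persistent storage:&lt;/p&gt;

```yaml
# The Strimzi operator watches for Kafka custom resources and
# creates the broker and ZooKeeper pods, services, and config for us.
apiVersion: kafka.strimzi.io/v1beta2
kind: Kafka
metadata:
  name: my-cluster          # illustrative name
spec:
  kafka:
    version: 3.3.1          # illustrative; pick a version your Strimzi release supports
    replicas: 3
    listeners:
      - name: plain
        port: 9092
        type: internal
        tls: false
    storage:
      type: ephemeral       # fine for experiments, not for real data
  zookeeper:
    replicas: 3
    storage:
      type: ephemeral
  entityOperator:
    topicOperator: {}       # lets us manage topics as KafkaTopic resources
```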

&lt;h2&gt;Why not go with Azure Event Hubs for Kafka?&lt;/h2&gt;

&lt;p&gt;Based on the documentation, Azure Event Hubs offers transparent support for Kafka workloads, plus a schema registry to boot. So in principle, I could use Event Hubs and forget about running Kafka on Kubernetes altogether.&lt;/p&gt;

&lt;p&gt;However, there are two reasons why I'm going with Kubernetes at this stage. Firstly, in a hybrid scenario where your solution needs to run on actual Kafka, you'll eventually need to know about many things that you can ignore when using Event Hubs. So it's better to eat the frog early and develop against something as close to the runtime environment as possible.&lt;/p&gt;

&lt;p&gt;Secondly, through this series, I will look at several Kafka ecosystem components that need to run somewhere. So I'll need a platform for the other components, and Kubernetes is a sensible choice, especially for hybrid scenarios.&lt;/p&gt;

&lt;p&gt;If, however, you are migrating an on-premises Kafka cluster completely to Azure, then Event Hubs and, for example, Container Apps can make for an architecture that's easier to manage. That's something I might revisit in a later post. :)&lt;/p&gt;




&lt;p&gt;Hopefully, you found this short introduction interesting! Do join me for part 2 of this series, where I'll set up the initial Azure infrastructure (coming soon).&lt;/p&gt;

</description>
      <category>apachekafka</category>
      <category>kubernetes</category>
      <category>azure</category>
      <category>strimzi</category>
    </item>
    <item>
      <title>Clusterception: Running Kafka on Kubernetes (on Azure)</title>
      <dc:creator>Elias Vakkuri</dc:creator>
      <pubDate>Thu, 01 Dec 2022 18:26:54 +0000</pubDate>
      <link>https://dev.to/eliasvakkuri/clusterception-running-kafka-on-kubernetes-on-azure-49ce</link>
      <guid>https://dev.to/eliasvakkuri/clusterception-running-kafka-on-kubernetes-on-azure-49ce</guid>
      <description>&lt;p&gt;This is a collection of posts on running Kafka on Kubernetes, based on my journey into the topic. In the end, we will be running a cluster (or actually several different clusters) on a cluster - hence, "Clusterception".&lt;/p&gt;

&lt;p&gt;Please find the links to the individual posts below. I wish you safe travels and hopefully also some learnings!&lt;/p&gt;

&lt;p&gt;&lt;a href="https://dev.to/eliasvakkuri/clusterception-part-1-introduction-38g6"&gt;Part 1: Introduction&lt;/a&gt;&lt;br&gt;
&lt;a href="https://dev.to/eliasvakkuri/clusterception-part-2-initial-azure-setup-1bn6"&gt;Part 2: Initial Azure setup&lt;/a&gt;&lt;br&gt;
&lt;a href="https://dev.to/eliasvakkuri/clusterception-part-3-getting-started-with-kafka-4ngh"&gt;Part 3: Getting started with Kafka&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Further parts to be added!&lt;/p&gt;

</description>
      <category>apachekafka</category>
      <category>kubernetes</category>
      <category>azure</category>
    </item>
  </channel>
</rss>
