
Automating Kafka Topic & ACL Management

Shawn Seymour · Originally published at devshawn.com · 12 min read

This post was originally published on my personal blog.

Apache Kafka, a distributed streaming platform, has become a key component of many organizations' infrastructure and data platforms. As adoption of Kafka grows across an organization, it is important to manage the creation of topics and access control lists (ACLs) in a centralized and standardized manner. Without proper procedures around topic and ACL management, Kafka clusters can quickly become hard to manage from a data governance and security standpoint.

Today, I'll be discussing how to automate Kafka topic and ACL management and how it can be done with a continuous integration/continuous delivery (CI/CD) pipeline. I'll explain how to do this while following GitOps patterns: all topics and ACLs will be stored in version control. This is a model followed by many companies, both small and large, and can be applied to any Kafka cluster.

Although I'll be discussing these processes in terms of organizations, they can be applied to local development clusters and smaller Kafka implementations as well.

Background

As most developers who have used Kafka know, it is quite easy to create topics. They can be created through a single usage of the kafka-topics tool or with various user interfaces. Before jumping into our tutorial, let's dive into some background.
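
For example, creating a topic manually with the kafka-topics tool looks like this (assuming a broker reachable at localhost:9092; older versions of the tool use --zookeeper instead of --bootstrap-server):

```shell
kafka-topics --bootstrap-server localhost:9092 --create \
  --topic my-topic --partitions 6 --replication-factor 3
```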

Automatic Topic Creation

Outside of the tools mentioned above, it is even easier to create topics – they are automatically created because the broker configuration auto.create.topics.enable is set to true by default. Although this configuration makes it easy to create topics, it is considered by most to be a bad practice. On some platforms, such as Confluent Cloud, it is not even possible to enable automatic topic creation.

Allowing the automatic creation of topics can be problematic:

  • Security and access control become a lot harder to manage.
  • Test topics and unused topics end up in the cluster and likely do not get cleaned up.
  • Any developer or any service can create topics without giving thought to proper partitioning and potential overhead.

Outside of a development cluster, every topic should have a purpose that is understood and has an underlying business need to justify its existence. Additionally, allowing automatic topic creation does not solve the need for creating and managing ACLs.
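
On self-hosted clusters, automatic topic creation can be turned off by setting the relevant configuration in each broker's server.properties:

```
auto.create.topics.enable=false
```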

Manual Topic & ACL Creation

The next logical step most organizations take is to create topics manually through tools such as kafka-topics or Confluent Control Center. This usually happens when Kafka is fairly new to an organization or used by a small group of people, e.g. a team or two.

Manually creating topics and ACLs only works until the usage of Kafka within an organization starts to grow. There are typically two patterns that are followed with manual topic creation:

  • Anyone has access: All developers/operations team members who can access the cluster can create topics as well as ACLs. This leads to topic naming standards and security best practices being thrown out the window. If anyone can make ACLs, there is no real security on the cluster.

  • Operations has access: A centralized operations team manages topics & ACLs manually through a change management/request process. Although this allows for some governance to be enforced, it leaves an operations team doing manual work.

A major issue with manual topic & ACL creation is that it is not repeatable. It may be enticing to use a web interface to quickly create topics, but more often than not it becomes a pain point in the future.

Imagine a scenario where you want to migrate to a new cluster or spin up a new environment; how easy is it to re-create all of the topics, topic configurations, and ACLs if they are not defined and easily accessible? It's pretty hard.

Automated Topic & ACL Creation

After manual topic & ACL creation becomes a limiting factor, teams usually seek to build tooling and automation around it. Most organizations in today's world are automating as much as they can. We see automation around immutable infrastructure, deploying applications, managing business processes, and much more.

The first step in automating the creation of Kafka resources is usually a simple Python or Bash script. Teams might define their topics and ACLs in formats such as JSON or YAML. These scripts are then either run by teams themselves or included in a continuous integration process.

Unfortunately, these scripts are usually quick-and-dirty. They often cannot easily change topic configurations, delete unneeded topics, or provide insight into what your actual cluster has defined in terms of topics and ACLs. Lastly, ACLs can be quite verbose: it can be hard to understand the needed ACLs depending on the complexity of the application and its security needs (e.g. Kafka Connect is much more complicated than a simple consumer).

GitOps for Apache Kafka Management

GitOps, as commonly found in Kubernetes deployment models, is a pattern centered around using a version control system (such as Git) to house information and code describing a system. This information is then used in an automated fashion to make changes to infrastructure (such as deploying a new Kubernetes workload).

This pattern is essentially how most implementations of Terraform work: infrastructure gets defined in configuration files, a plan with the desired changes is generated, and then the plan is executed to apply the desired changes.

Note: This blog post describes how to manage topics & ACLs with GitOps, and not an actual Apache Kafka cluster deployment.

Kafka GitOps

In this tutorial, I'll be introducing a tool called kafka-gitops. This project is a resources-as-code tool which allows users to automate the management of Apache Kafka topics and ACLs. Before we dive in, I'd like to introduce some terminology:

  • Desired State: A file describing what your Kafka cluster state should look like.
  • Actual State: The live state of what your Kafka cluster currently looks like.
  • A Plan: A set of topic and/or ACL changes to apply to your Kafka cluster.

Topics and services are defined in a YAML desired state file. When run, kafka-gitops compares your desired state to the actual state of the cluster and generates a plan to execute against the cluster. The plan will include any creates, updates, or deletes to topics, topic configurations, and ACLs. After validating the plan looks correct, it can be applied and will make your topics and ACLs match your desired state.

On top of topic management, if your cluster has security, kafka-gitops can generate the needed ACLs for most applications. There is no need to manually define a bunch of ACLs for Kafka Connect or Kafka Streams. By defining your services, kafka-gitops will build the applicable ACLs.

Example kafka-gitops workflow

The major features of kafka-gitops compared to other management tools:

  • 🚀 Built For CI/CD: Made for CI/CD pipelines to automate the management of topics & ACLs.
  • 🔥 Configuration as code: Describe your desired state and manage it from a version-controlled declarative file.
  • 👍 Easy to use: Deep knowledge of Kafka administration or ACL management is NOT required.
  • ⚡️️ Plan & Apply: Generate and view a plan with or without executing it against your cluster.
  • 💻 Portable: Works across self-hosted clusters, managed clusters, and even Confluent Cloud clusters.
  • 🦄 Idempotency: Executing the same desired state file on an up-to-date cluster will yield the same result.
  • ☀️ Continue from failures: If a specific step fails during an apply, you can fix your desired state and re-run the command. You can execute kafka-gitops again without needing to roll back any partial successes.

Automating Topics & ACLs via Kafka GitOps

I'll provide an overview of how kafka-gitops works and how it can be applied to any Kafka cluster. An in-depth tutorial on how to use it will be posted in the next blog post; otherwise, the documentation has a great getting started guide.

Reminder: This tool works on all newer Kafka clusters, including self-hosted Kafka, managed Kafka solutions, and Confluent Cloud.

Desired State File

Topics and services that interact with your Kafka cluster are defined in a YAML file, named state.yaml by default.

Example desired state file:

topics:
  test-topic:
    partitions: 6
    replication: 3
    configs:
      cleanup.policy: compact

services:
  test-service:
    type: application
    principal: User:testservice
    produces:
      - test-topic

This state file defines two things:

  • A compacted topic named test-topic with six partitions and a replication factor of three.
  • An application service named test-service tied to the principal User:testservice.

The type of the service tells kafka-gitops what type of ACLs to generate. In the case of application, it will generate the needed ACLs for producing to and/or consuming from its specified topics. In this case, kafka-gitops will generate a WRITE ACL for the topic test-topic.
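
As a rough illustration (this command is my own, not output from kafka-gitops), creating that ACL by hand with the kafka-acls tool would look like:

```shell
kafka-acls --bootstrap-server localhost:9092 --add \
  --allow-principal User:testservice \
  --operation WRITE --topic test-topic
```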

Currently, we support three types of services: application, kafka-connect, and kafka-streams. Each service has a slightly different schema due to the nature of the service.

Example Kafka Streams service:

services:
  my-stream:
    type: kafka-streams
    principal: User:mystream
    consumes:
      - test-topic
    produces:
      - test-topic

Kafka Streams services have special ACLs included for managing internal streams topics.
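
By hand, covering a Streams application's internal topics typically means prefixed ACLs on resources starting with the application ID. As an illustrative sketch (not actual kafka-gitops output, and assuming the application ID matches the service name):

```shell
kafka-acls --bootstrap-server localhost:9092 --add \
  --allow-principal User:mystream \
  --operation All --topic my-stream \
  --resource-pattern-type prefixed
```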

Example Kafka Connect service:

services:
  my-connect-cluster:
    type: kafka-connect
    principal: User:myconnect
    connectors:
      rabbitmq-sink:
        consumes:
          - test-topic

Kafka Connect services have special ACLs for working with their internal topics as well as defined ACLs for each running connector.

Essentially, all topics and all services for a specific cluster get put into this YAML file. If you are not using security, such as on a local development cluster, you can omit the services block.

Note: For full examples and specific requirements for each service, read the services documentation page. The specification for the desired state file and its schema can be found on the specification documentation page.

Plan Changes To A Kafka Cluster

Once your desired state file is created, you can generate a plan of changes to be applied against the cluster.

Note: kafka-gitops is configured to connect to clusters via environment variables. See the documentation for more details.
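
As an illustration, a minimal configuration might only need a bootstrap server (I'm assuming the variable name here; check the documentation for the exact names and for security-related properties):

```shell
export KAFKA_BOOTSTRAP_SERVERS=localhost:9092
```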

This does NOT actually change the cluster. We can generate the plan by running:

kafka-gitops -f state.yaml plan -o plan.json

This will output a JSON file with the plan as well as a prettified output describing the changes. Here is an example plan for the first state.yaml file shown above, including only its topics block:

Generating execution plan...

An execution plan has been generated and is shown below.

Resource actions are indicated with the following symbols:
  + create
  ~ update
  - delete

The following actions will be performed:

Topics: 1 to create, 0 to update, 0 to delete.
+ [TOPIC] test-topic

ACLs: 0 to create, 0 to update, 0 to delete.

Plan: 1 to create, 0 to update, 0 to delete.

If there are topics or ACLs on the cluster that are not in the desired state file, the plan will include changes to update and/or delete them.

Note: It is possible to disable deletion by passing the --no-delete flag after -f state.yaml.

Apply Changes To A Kafka Cluster

Once the plan is created, we can apply the changes to the cluster.

Warning: This WILL change the cluster to match the plan generated from the desired state file. Without the --no-delete flag, this can be destructive.

Changes are applied using the apply command:

kafka-gitops -f state.yaml apply -p plan.json

This will execute the changes to the running Kafka cluster and output the results.

Executing apply...

Applying: [CREATE]

+ [TOPIC] test-topic

Successfully applied.

[SUCCESS] Apply complete! Resources: 1 created, 0 updated, 0 deleted.

If there is a partial failure, successes will not be rolled back. Instead, fix the error in the desired state file or manually within the cluster and rerun plan and apply.

After a successful apply, you can re-run the plan command to generate a new plan – except this time, there should be no changes, since your cluster is up to date with your desired state file!

Additional Features

On top of the brief description of the features above, kafka-gitops supports:

  • Automatically creating Confluent Cloud service accounts.
  • Splitting the topics and services blocks into their own files.
  • Ignoring specific topics from being deleted when not defined in the desired state file.
  • Defining custom ACLs to a specific service (e.g. for a service such as Confluent Control Center).

Kafka Topic & ACL Automation Workflow

Now that we've had an overview of how kafka-gitops works, we can examine how to put this workflow into action within an organization. First, we can define typical roles within an organization:

  • Developers: Engineers who are writing applications and services utilizing Kafka.
  • Operations: Engineers who manage, monitor, and maintain Kafka infrastructure.
  • Security: Engineers who are responsible for security operations within an organization.

Next, we can define an example setup and process for a GitOps workflow. This is not a one-size-fits-all answer – a lot depends on the organization and culture; however, this is a generalized approach that will work well if implemented correctly.

Automation Workflow Overview

A scalable implementation of the kafka-gitops workflow within an organization looks like this:

  1. All desired state files are stored within a repository owned by Operations.
  2. Operations owns the master branch, which should reflect the live state of every cluster.
  3. Developers fork this repository to make changes to their topics & services.
  4. Developers create a pull request with their changes and mark it ready to review by Operations and Security.
  5. Operations and Security review the changes and merge to master.
  6. A CI/CD system kicks off a kafka-gitops plan build to generate a new plan.
  7. (Optional) The plan output is reviewed by Operations, ensuring it looks correct.
  8. The plan is then applied, either manually by Operations or automatically, through kafka-gitops apply. The desired changes will then be reflected in the live cluster and the cluster will match the desired state file in master.

As described above, all topics and services (which includes ACLs) are defined in version-controlled code. Developers are responsible for their topic and service definitions. Operations is responsible for managing the changes to the cluster (e.g. ensuring teams are not doing crazy things) as well as responsible for deploying the changes. Security is responsible for ensuring sensitive data is being properly locked down to the services that require it.

Setting Up The Workflow

  1. Create a centralized git repository for storing Kafka cluster desired state files.
  2. In that repository, create folders for each environment and/or cluster.
  3. In each cluster's folder, create its state file. Define any existing topics, services, and ACLs.
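
A repository layout along these lines works well (the folder names are only an example):

```
kafka-state/
├── development/
│   └── state.yaml
├── staging/
│   └── state.yaml
└── production/
    └── state.yaml
```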

Note: If adding this workflow to an existing Kafka cluster, the easiest way to get set up is to repeatedly run plan against the live cluster while updating the desired state file, until no changes are planned.

Setting Up CI/CD

Setting up CI/CD is highly dependent on which build system you are using. This is a general outline of how it could be configured:

  1. Set up a main CI job that is triggered on changes to the master branch.
  2. The main job should look for changes in each desired state file.
  3. For each desired state file with a change, trigger a side job.
  4. The side job(s) should utilize kafka-gitops plan to generate an execution plan.
  5. (Optional) The side job(s) should then wait until Operations can review the generated plan.
  6. The side job(s) should then utilize kafka-gitops apply to execute the planned changes to the specified Kafka cluster.
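
As a sketch of such a pipeline, here is what this could look like in GitLab CI for a single cluster (the job names, stages, and manual gate are my own illustration, not something prescribed by kafka-gitops; credentials are assumed to come from CI variables):

```yaml
# .gitlab-ci.yml (sketch): plan on changes to master, apply after manual review
stages:
  - plan
  - apply

plan-production:
  stage: plan
  only:
    - master
  script:
    - cd production
    - kafka-gitops -f state.yaml plan -o plan.json
  artifacts:
    paths:
      - production/plan.json

apply-production:
  stage: apply
  only:
    - master
  when: manual   # operations reviews the generated plan, then triggers the apply
  script:
    - cd production
    - kafka-gitops -f state.yaml apply -p plan.json
```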

Benefits Of GitOps for Apache Kafka

Once the full process is in place, you gain many benefits that allow you to easily govern the clusters as the adoption of Kafka continues within an organization.

  • Developers have a well-defined process to follow to create topics & services.
  • Operations has control over what is changing within the cluster and can ensure standards are followed.
  • Security can easily audit and monitor access changes to data within the streaming platform.

Additionally, kafka-gitops provides:

  • A defined process to make any changes to the Kafka cluster; no manual steps.
  • A full audit log and history of changes to your cluster via version control.
  • Automatic ACL generation for common services, reducing time spent on security.
  • The ability to re-create a cluster's complete topic and ACL setup (e.g. for a new environment).

Limitations and Upcoming Features

Although kafka-gitops is actively being used in production, there are a few upcoming features to address some limitations:

  • The ability to set a custom group.id for consumers & streams applications (currently, this must match the service name)
  • The ability to set custom connect topic names (currently, this has a predefined pattern)
  • Tooling around creating the initial desired state file from existing clusters
  • Eventually, the optional ability to run it as-a-service to actively monitor for changes and source from locations such as git, AWS S3, etc.

Conclusion

Automating the management of Kafka topics and ACLs brings significant benefits to all teams working with Apache Kafka. Whether working with a large enterprise set of clusters or defining topics for your local development cluster, the GitOps pattern allows for easy, repeatable cluster resource definitions.

By adopting a GitOps pattern for managing your Kafka topics and ACLs, your organization can reduce time spent managing Kafka and spend more time providing value to your core business.

In some upcoming blog posts, I will be providing in-depth tutorials on using kafka-gitops with self-hosted clusters and with Confluent Cloud.

Posted on May 20 '19 by Shawn Seymour (@devshawn).
