<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Shawn Seymour</title>
    <description>The latest articles on DEV Community by Shawn Seymour (@devshawn).</description>
    <link>https://dev.to/devshawn</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F140372%2F4b9d86c8-069f-4c07-8913-4fa40ce7b883.jpeg</url>
      <title>DEV Community: Shawn Seymour</title>
      <link>https://dev.to/devshawn</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/devshawn"/>
    <language>en</language>
    <item>
      <title>Automating Kafka Topic &amp; ACL Management</title>
      <dc:creator>Shawn Seymour</dc:creator>
      <pubDate>Wed, 27 May 2020 12:56:38 +0000</pubDate>
      <link>https://dev.to/devshawn/automating-kafka-topic-acl-management-1a4o</link>
      <guid>https://dev.to/devshawn/automating-kafka-topic-acl-management-1a4o</guid>
      <description>&lt;p&gt;&lt;em&gt;This post was originally published on my &lt;a href="https://devshawn.com/blog/automating-kafka-topic-and-acl-mangement/"&gt;personal blog&lt;/a&gt;&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;Apache Kafka, a distributed streaming platform, has become a key component of many organizations' infrastructure and data platforms. As adoption of Kafka grows across an organization, it is important to manage the creation of topics and access control lists (ACLs) in a centralized and standardized manner. Without proper procedures around topic and ACL management, Kafka clusters can quickly become hard to manage from a data governance and security standpoint.&lt;/p&gt;

&lt;p&gt;Today, I'll be discussing how to automate Kafka topic and ACL management and how it can be done with a continuous integration/continuous delivery (CI/CD) pipeline. I'll explain how to do this while following GitOps patterns: all topics and ACLs will be stored in version control. This is a model followed by many companies, both small and large, and can be applied to any Kafka cluster.&lt;/p&gt;

&lt;p&gt;Although I'll be discussing these practices in terms of organizations, they can be applied to local development clusters and smaller Kafka deployments as well.&lt;/p&gt;

&lt;h2&gt;Background&lt;/h2&gt;

&lt;p&gt;As most developers who have used Kafka know, it is quite easy to create topics. They can be created with a single invocation of the &lt;code&gt;kafka-topics&lt;/code&gt; tool or through various user interfaces. Before jumping into our tutorial, let's dive into some background.&lt;/p&gt;
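&lt;p&gt;For example, creating a topic with &lt;code&gt;kafka-topics&lt;/code&gt; is a single command (the bootstrap server address and topic settings here are placeholders for your own cluster):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight shell"&gt;&lt;code&gt;kafka-topics --bootstrap-server localhost:9092 --create \
  --topic test-topic --partitions 6 --replication-factor 3
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;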

&lt;h3&gt;Automatic Topic Creation&lt;/h3&gt;

&lt;p&gt;Outside of the tools mentioned above, it is even easier to create topics – they are created automatically because the broker configuration &lt;code&gt;auto.create.topics.enable&lt;/code&gt; is set to &lt;code&gt;true&lt;/code&gt; by default. Although this configuration makes it easy to create topics, it is widely considered a bad practice. On some platforms, such as Confluent Cloud, it is even &lt;a href="https://riferrei.com/2020/03/17/why-the-property-auto-create-topics-enable-is-disabled-in-confluent-cloud/"&gt;impossible to enable auto topic creation&lt;/a&gt;.&lt;/p&gt;
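&lt;p&gt;To avoid this, the setting can be disabled explicitly in the broker configuration. A minimal &lt;code&gt;server.properties&lt;/code&gt; fragment (the rest of the broker configuration is omitted here):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Disable automatic topic creation on the broker
auto.create.topics.enable=false
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;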

&lt;p&gt;Allowing the automatic creation of topics can be problematic:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Security and access control become &lt;strong&gt;a lot&lt;/strong&gt; harder to manage.&lt;/li&gt;
&lt;li&gt;Test topics and unused topics end up in the cluster and likely do not get cleaned up.&lt;/li&gt;
&lt;li&gt;Any developer or any service can create topics without giving thought to proper partitioning and potential overhead.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Outside of a development cluster, &lt;strong&gt;every topic should have a purpose that is understood and has an underlying business need to justify its existence&lt;/strong&gt;. Additionally, allowing automatic topic creation does not solve the need for creating and managing ACLs.&lt;/p&gt;

&lt;h3&gt;Manual Topic &amp;amp; ACL Creation&lt;/h3&gt;

&lt;p&gt;The next logical step most organizations take is to create topics manually through tools such as &lt;code&gt;kafka-topics&lt;/code&gt; or Confluent Control Center. This usually happens when Kafka is fairly new to an organization or used by a small group of people, e.g. a team or two.&lt;/p&gt;

&lt;p&gt;Manually creating topics and ACLs only works until the usage of Kafka within an organization starts to grow. There are typically two patterns followed with manual topic creation:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Anyone has access&lt;/strong&gt;: All developers/operations team members who can access the cluster can create topics as well as ACLs. This leads to topic naming standards and security best practices being thrown out the window. If anyone can create ACLs, there is no real security on the cluster.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Operations has access&lt;/strong&gt;: A centralized operations team manages topics &amp;amp; ACLs manually through a change management/request process. Although this allows for some governance to be enforced, it leaves an operations team doing manual work. &lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A major issue with manual topic &amp;amp; ACL creation is that it is &lt;strong&gt;not&lt;/strong&gt; repeatable. It may be enticing to use a web interface to quickly create topics, but more often than not it becomes a pain point in the future.&lt;/p&gt;

&lt;p&gt;Imagine a scenario where you want to migrate to a new cluster or spin up a new environment; how easy is it to re-create all of the topics, topic configurations, and ACLs if they are not defined and easily accessible? It's pretty hard.&lt;/p&gt;

&lt;h3&gt;Automated Topic &amp;amp; ACL Creation&lt;/h3&gt;

&lt;p&gt;After manual topic &amp;amp; ACL creation becomes a limiting factor, teams usually seek to build tooling and automation around it. Most organizations in today's world are automating as much as they can. We see automation around immutable infrastructure, deploying applications, managing business processes, and much more.&lt;/p&gt;

&lt;p&gt;The first step in automating the creation of Kafka resources is usually a simple Python or Bash script. Teams might define their topics and ACLs in JSON or YAML files. These scripts are then either run by the teams themselves or included in a continuous integration process.&lt;/p&gt;
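&lt;p&gt;As a minimal sketch of that approach – the file name and topic settings here are illustrative assumptions – a Bash script might loop over a newline-delimited list of topics:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight shell"&gt;&lt;code&gt;#!/usr/bin/env bash
# Create each topic listed in topics.txt, skipping any that already exist.
while read -r topic; do
  kafka-topics --bootstrap-server localhost:9092 --create --if-not-exists \
    --topic "$topic" --partitions 6 --replication-factor 3
done &lt; topics.txt
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;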

&lt;p&gt;Unfortunately, these scripts are usually quick-and-dirty. They often cannot easily change topic configurations, delete unneeded topics, or provide insight into what your actual cluster has defined in terms of topics and ACLs. Lastly, ACLs can be quite verbose: it can be hard to understand the needed ACLs depending on the complexity of the application and its security needs (e.g. Kafka Connect is much more complicated than a simple consumer).&lt;/p&gt;

&lt;h2&gt;GitOps for Apache Kafka Management&lt;/h2&gt;

&lt;p&gt;GitOps, as commonly found in &lt;a href="https://thenewstack.io/what-is-gitops-and-why-it-might-be-the-next-big-thing-for-devops/"&gt;Kubernetes deployment models&lt;/a&gt;, is a pattern centered around using a version control system (such as Git) to house information and code describing a system. This information is then used in an automated fashion to make changes to infrastructure (such as deploying a new Kubernetes workload).&lt;/p&gt;

&lt;p&gt;This pattern is essentially how most implementations of &lt;a href="https://www.terraform.io/"&gt;Terraform&lt;/a&gt; work: infrastructure is defined in Terraform configuration files, a plan with the desired changes is generated, and then the plan is executed to apply those changes.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Note&lt;/strong&gt;: This blog post describes how to manage topics &amp;amp; ACLs with GitOps, and not an actual Apache Kafka cluster deployment.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;Kafka GitOps&lt;/h3&gt;

&lt;p&gt;In this tutorial, I'll be introducing a tool called &lt;a href="https://github.com/devshawn/kafka-gitops"&gt;kafka-gitops&lt;/a&gt;. This project is a resources-as-code tool which allows users to automate the management of Apache Kafka topics and ACLs. Before we dive in, I'd like to introduce some terminology:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Desired State&lt;/strong&gt;: A file describing what your Kafka cluster state should look like.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Actual State&lt;/strong&gt;: The live state of what your Kafka cluster currently looks like.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;A Plan&lt;/strong&gt;: A set of topic and/or ACL changes to apply to your Kafka cluster.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Topics and services are defined in a YAML desired state file. When run, &lt;code&gt;kafka-gitops&lt;/code&gt; compares your desired state to the actual state of the cluster and generates a plan to execute against the cluster. The plan will include any creates, updates, or deletes to topics, topic configurations, and ACLs. After validating the plan looks correct, it can be applied and will make your topics and ACLs match your desired state.&lt;/p&gt;

&lt;p&gt;On top of topic management, if your cluster has security enabled, &lt;code&gt;kafka-gitops&lt;/code&gt; can generate the needed ACLs for most applications. There is no need to manually define a long list of ACLs for Kafka Connect or Kafka Streams. By defining your services, &lt;code&gt;kafka-gitops&lt;/code&gt; will build the applicable ACLs.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--V2mQ_VYr--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/http://devshawn.com/content/images/2020/05/68747470733a2f2f692e696d6775722e636f6d2f6a6e44775970382e706e67.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--V2mQ_VYr--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/http://devshawn.com/content/images/2020/05/68747470733a2f2f692e696d6775722e636f6d2f6a6e44775970382e706e67.png" alt="Automating Kafka Topic &amp;amp; ACL Management"&gt;&lt;/a&gt;Example kafka-gitops workflow&lt;/p&gt;

&lt;p&gt;The major features of &lt;code&gt;kafka-gitops&lt;/code&gt; compared to other management tools:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;🚀 &lt;strong&gt;Built For CI/CD&lt;/strong&gt;: Made for CI/CD pipelines to automate the management of topics &amp;amp; ACLs.&lt;/li&gt;
&lt;li&gt;🔥 &lt;strong&gt;Configuration as code&lt;/strong&gt;: Describe your desired state and manage it from a version-controlled declarative file.&lt;/li&gt;
&lt;li&gt;👍 &lt;strong&gt;Easy to use&lt;/strong&gt;: Deep knowledge of Kafka administration or ACL management is &lt;strong&gt;NOT&lt;/strong&gt; required.&lt;/li&gt;
&lt;li&gt;⚡️️ &lt;strong&gt;Plan &amp;amp; Apply&lt;/strong&gt;: Generate and view a plan with or without executing it against your cluster.&lt;/li&gt;
&lt;li&gt;💻 &lt;strong&gt;Portable&lt;/strong&gt;: Works across self-hosted clusters, managed clusters, and even Confluent Cloud clusters.&lt;/li&gt;
&lt;li&gt;🦄 &lt;strong&gt;Idempotency&lt;/strong&gt;: Executing the same desired state file on an up-to-date cluster will yield the same result.&lt;/li&gt;
&lt;li&gt;☀️ &lt;strong&gt;Continue from failures&lt;/strong&gt;: If a specific step fails during an apply, you can fix your desired state and re-run the command. You can execute &lt;code&gt;kafka-gitops&lt;/code&gt; again without needing to roll back any partial successes.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;Automating Topics &amp;amp; ACLs via Kafka GitOps&lt;/h2&gt;

&lt;p&gt;I'll provide an overview of how &lt;code&gt;kafka-gitops&lt;/code&gt; works and how it can be applied to any Kafka cluster. An in-depth tutorial on how to use it will be posted in the next blog post; otherwise, the &lt;a href="https://devshawn.github.io/kafka-gitops/#/"&gt;documentation&lt;/a&gt; has a great &lt;a href="https://devshawn.github.io/kafka-gitops/#/quick-start"&gt;getting started guide&lt;/a&gt;.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Reminder&lt;/strong&gt;: This tool works on all newer Kafka clusters, including self-hosted Kafka, managed Kafka solutions, and Confluent Cloud.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;Desired State File&lt;/h3&gt;

&lt;p&gt;Topics and services that interact with your Kafka cluster are defined in a YAML file, named &lt;code&gt;state.yaml&lt;/code&gt; by default.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Example desired state file:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;topics&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;test-topic&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;partitions&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;6&lt;/span&gt;
    &lt;span class="na"&gt;replication&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;3&lt;/span&gt;
    &lt;span class="na"&gt;configs&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="s"&gt;cleanup.policy&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;compact&lt;/span&gt;

&lt;span class="na"&gt;services&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;test-service&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;application&lt;/span&gt;
    &lt;span class="na"&gt;principal&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;User:testservice&lt;/span&gt;
    &lt;span class="na"&gt;produces&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;test-topic&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;This state file defines two things:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A compacted topic named &lt;strong&gt;test-topic&lt;/strong&gt; with &lt;strong&gt;six&lt;/strong&gt; partitions and a replication factor of &lt;strong&gt;three&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;An &lt;strong&gt;application&lt;/strong&gt; service named &lt;strong&gt;test-service&lt;/strong&gt; tied to the principal &lt;code&gt;User:testservice&lt;/code&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The &lt;strong&gt;type&lt;/strong&gt; of the service tells &lt;code&gt;kafka-gitops&lt;/code&gt; what type of ACLs to generate. In the case of &lt;strong&gt;application&lt;/strong&gt;, it will generate the needed ACLs for producing to and/or consuming from its specified topics. In this case, &lt;code&gt;kafka-gitops&lt;/code&gt; will generate a &lt;code&gt;WRITE&lt;/code&gt; ACL for the topic &lt;strong&gt;test-topic&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Currently, we support three types of services: &lt;code&gt;application&lt;/code&gt;, &lt;code&gt;kafka-connect&lt;/code&gt;, and &lt;code&gt;kafka-streams&lt;/code&gt;. Each service has a slightly different schema due to the nature of the service.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Example Kafka Streams service&lt;/strong&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;services&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;my-stream&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;kafka-streams&lt;/span&gt;
    &lt;span class="na"&gt;principal&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;User:mystream&lt;/span&gt;
    &lt;span class="na"&gt;consumes&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;test-topic&lt;/span&gt;
    &lt;span class="na"&gt;produces&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;test-topic&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;Kafka Streams services have special ACLs included for managing internal streams topics.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Example Kafka Connect service&lt;/strong&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;services&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;my-connect-cluster&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;type&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;kafka-connect&lt;/span&gt;
    &lt;span class="na"&gt;principal&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;User:myconnect&lt;/span&gt;
    &lt;span class="na"&gt;connectors&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;rabbitmq-sink&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;consumes&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;test-topic&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;Kafka Connect services have special ACLs for working with their internal topics as well as defined ACLs for each running connector.&lt;/p&gt;

&lt;p&gt;Essentially, all topics and all services for a specific cluster get put into this YAML file. If you are not using security, such as on a local development cluster, you can omit the services block.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Note&lt;/strong&gt; : For full examples and &lt;strong&gt;specific requirements&lt;/strong&gt; for each service, read the &lt;a href="https://devshawn.github.io/kafka-gitops/#/services"&gt;services documentation page&lt;/a&gt;.  The specification for the desired state file and its schema can be found on the &lt;a href="https://devshawn.github.io/kafka-gitops/#/specification"&gt;specification documentation page&lt;/a&gt;.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;Plan Changes To A Kafka Cluster&lt;/h3&gt;

&lt;p&gt;Once your desired state file is created, you can generate a plan of changes to be applied against the cluster.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Note&lt;/strong&gt; : &lt;code&gt;kafka-gitops&lt;/code&gt; is configured to connect to clusters via environment variables. See the &lt;a href="https://devshawn.github.io/kafka-gitops/#/quick-start?id=configuration"&gt;documentation for more details&lt;/a&gt;.&lt;/p&gt;
&lt;/blockquote&gt;
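&lt;p&gt;As a sketch, connecting to a local, unsecured cluster might look like the following. Per the documentation, environment variables prefixed with &lt;code&gt;KAFKA_&lt;/code&gt; are translated into Kafka client properties; check the configuration docs for the variables your cluster's security setup requires.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight shell"&gt;&lt;code&gt;# KAFKA_BOOTSTRAP_SERVERS becomes the bootstrap.servers client property
export KAFKA_BOOTSTRAP_SERVERS=localhost:9092
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;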

&lt;p&gt;Generating a plan does &lt;strong&gt;NOT&lt;/strong&gt; actually change the cluster. We can generate a plan by running:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight shell"&gt;&lt;code&gt;kafka-gitops &lt;span class="nt"&gt;-f&lt;/span&gt; state.yaml plan &lt;span class="nt"&gt;-o&lt;/span&gt; plan.json
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;This will output a JSON file with the plan as well as a prettified description of the changes. This is an example plan for the first &lt;code&gt;state.yaml&lt;/code&gt; file above when it includes only the topics block:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Generating execution plan...

An execution plan has been generated and is shown below.

Resource actions are indicated with the following symbols:
  + create
  ~ update
  - delete

The following actions will be performed:

Topics: 1 to create, 0 to update, 0 to delete.
+ [TOPIC] test-topic

ACLs: 0 to create, 0 to update, 0 to delete.

Plan: 1 to create, 0 to update, 0 to delete.
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;If there are topics or ACLs on the cluster that are not in the desired state file, the plan will include changes to update and/or delete them.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Note&lt;/strong&gt;: It is possible to disable deletion by passing the &lt;code&gt;--no-delete&lt;/code&gt; flag after &lt;code&gt;-f state.yaml&lt;/code&gt;.&lt;/p&gt;
&lt;/blockquote&gt;
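&lt;p&gt;For example, to generate a plan that will never delete existing topics or ACLs:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight shell"&gt;&lt;code&gt;kafka-gitops -f state.yaml --no-delete plan -o plan.json
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;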

&lt;h3&gt;Apply Changes To A Kafka Cluster&lt;/h3&gt;

&lt;p&gt;Once the plan is created, we can apply the changes to the cluster.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Warning&lt;/strong&gt;: This &lt;strong&gt;WILL&lt;/strong&gt; change the cluster to match the plan generated from the desired state file. Without the &lt;code&gt;--no-delete&lt;/code&gt; flag, this can be &lt;strong&gt;destructive&lt;/strong&gt;.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Changes are applied using the apply command:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight shell"&gt;&lt;code&gt;kafka-gitops &lt;span class="nt"&gt;-f&lt;/span&gt; state.yaml apply &lt;span class="nt"&gt;-p&lt;/span&gt; plan.json
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;This will execute the changes to the running Kafka cluster and output the results.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Executing apply...

Applying: [CREATE]

+ [TOPIC] test-topic

Successfully applied.

[SUCCESS] Apply complete! Resources: 1 created, 0 updated, 0 deleted.
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;If there is a partial failure, successes will not be rolled back. Instead, fix the error in the desired state file or manually within the cluster and rerun &lt;strong&gt;plan&lt;/strong&gt; and &lt;strong&gt;apply&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;After a successful apply, you can re-run the plan command to generate a new plan – except this time, there should be no changes, since your cluster is up to date with your desired state file!&lt;/p&gt;

&lt;h3&gt;Additional Features&lt;/h3&gt;

&lt;p&gt;On top of the brief description of the features above, &lt;code&gt;kafka-gitops&lt;/code&gt; supports:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Automatically creating Confluent Cloud service accounts.&lt;/li&gt;
&lt;li&gt;Splitting the &lt;code&gt;topics&lt;/code&gt; and &lt;code&gt;services&lt;/code&gt; blocks into their own files.&lt;/li&gt;
&lt;li&gt;Ignoring specific topics from being deleted when not defined in the desired state file.&lt;/li&gt;
&lt;li&gt;Defining custom ACLs to a specific service (e.g. for a service such as Confluent Control Center).&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;Kafka Topic &amp;amp; ACL Automation Workflow&lt;/h2&gt;

&lt;p&gt;Now that we've had an overview of how &lt;a href="https://github.com/devshawn/kafka-gitops"&gt;kafka-gitops&lt;/a&gt; works, we can examine how to put this workflow into action within an organization. First, we can define typical roles within an organization:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Developers&lt;/strong&gt;: Engineers who are writing applications and services utilizing Kafka.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Operations&lt;/strong&gt;: Engineers who manage, monitor, and maintain Kafka infrastructure.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Security&lt;/strong&gt;: Engineers who are responsible for security operations within an organization.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Next, we can define an example setup and process for a GitOps workflow. This is not a one-size-fits-all answer – a lot depends on the organization and culture; however, this is a generalized approach that will work well if implemented correctly.&lt;/p&gt;

&lt;h3&gt;Automation Workflow Overview&lt;/h3&gt;

&lt;p&gt;A scalable implementation of the kafka-gitops workflow within an organization looks like this:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;All desired state files are stored within a repository owned by &lt;strong&gt;Operations&lt;/strong&gt;. &lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Operations&lt;/strong&gt; owns the &lt;code&gt;master&lt;/code&gt; branch, which &lt;em&gt;should&lt;/em&gt; reflect the live state of every cluster. &lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Developers&lt;/strong&gt; fork this repository to make changes to their topics &amp;amp; services. &lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Developers&lt;/strong&gt; create a pull request with their changes and mark it ready to review by &lt;strong&gt;Operations&lt;/strong&gt; and &lt;strong&gt;Security&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Operations&lt;/strong&gt; and &lt;strong&gt;Security&lt;/strong&gt; review the changes and merge to &lt;code&gt;master&lt;/code&gt;. &lt;/li&gt;
&lt;li&gt;A CI/CD system kicks off a &lt;code&gt;kafka-gitops plan&lt;/code&gt; build to generate a new plan.&lt;/li&gt;
&lt;li&gt;(&lt;em&gt;Optional&lt;/em&gt;) The plan output is reviewed by &lt;strong&gt;Operations&lt;/strong&gt;, ensuring it looks correct.&lt;/li&gt;
&lt;li&gt;The plan is then applied, either manually by &lt;strong&gt;Operations&lt;/strong&gt; or automatically, through &lt;code&gt;kafka-gitops apply&lt;/code&gt;. The desired changes will then be reflected in the live cluster and the cluster will match the desired state file in &lt;code&gt;master&lt;/code&gt;.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;As described above, all topics and services (which includes ACLs) are defined in version-controlled code. &lt;strong&gt;Developers&lt;/strong&gt; are responsible for their topic and service definitions. &lt;strong&gt;Operations&lt;/strong&gt; is responsible for managing the changes to the cluster (e.g. ensuring teams are not doing crazy things) as well as responsible for deploying the changes. &lt;strong&gt;Security&lt;/strong&gt; is responsible for ensuring sensitive data is being properly locked down to the services that require it.&lt;/p&gt;

&lt;h4&gt;Setting Up The Workflow&lt;/h4&gt;

&lt;ol&gt;
&lt;li&gt;Create a centralized git repository for storing Kafka cluster desired state files.&lt;/li&gt;
&lt;li&gt;In that repository, create folders for each environment and/or cluster.&lt;/li&gt;
&lt;li&gt;In each cluster's folder, create its state file. Define any existing topics, services, and ACLs. &lt;/li&gt;
&lt;/ol&gt;
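&lt;p&gt;An illustrative repository layout (the folder and environment names are arbitrary examples) might look like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight plaintext"&gt;&lt;code&gt;kafka-clusters/
├── development/
│   └── state.yaml
├── staging/
│   └── state.yaml
└── production/
    └── state.yaml
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;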

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Note&lt;/strong&gt;: If adding this workflow to an existing Kafka cluster, the easiest way to bootstrap it is to repeatedly run &lt;code&gt;plan&lt;/code&gt; against the live cluster while updating the desired state file, continuing until the plan shows no changes.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h4&gt;Setting Up CI/CD&lt;/h4&gt;

&lt;p&gt;Setting up CI/CD is highly dependent on which build system you are using. This is a general outline of how it could be configured:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Set up a &lt;strong&gt;main&lt;/strong&gt; CI job that is triggered on changes to the &lt;code&gt;master&lt;/code&gt; branch.&lt;/li&gt;
&lt;li&gt;The &lt;strong&gt;main&lt;/strong&gt; job should look for changes in each desired state file.&lt;/li&gt;
&lt;li&gt;For each desired state file with a change, trigger a &lt;strong&gt;side&lt;/strong&gt; job.&lt;/li&gt;
&lt;li&gt;The &lt;strong&gt;side&lt;/strong&gt; job(s) should utilize &lt;code&gt;kafka-gitops plan&lt;/code&gt; to generate an execution plan.&lt;/li&gt;
&lt;li&gt;(&lt;em&gt;Optional&lt;/em&gt;) The &lt;strong&gt;side&lt;/strong&gt; job(s) should then &lt;strong&gt;wait&lt;/strong&gt; until &lt;strong&gt;Operations&lt;/strong&gt; can review the generated plan.&lt;/li&gt;
&lt;li&gt;The &lt;strong&gt;side&lt;/strong&gt; job(s) should then utilize &lt;code&gt;kafka-gitops apply&lt;/code&gt; to execute the planned changes to the specified Kafka cluster. &lt;/li&gt;
&lt;/ol&gt;
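&lt;p&gt;As a rough sketch only – the job structure, runner image, secret name, state file path, and the assumption that &lt;code&gt;kafka-gitops&lt;/code&gt; is already installed on the runner are all illustrative, and the details will vary by build system – a single-cluster pipeline in a GitHub Actions-style YAML could look like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight yaml"&gt;&lt;code&gt;# Illustrative pipeline: plan and apply on changes to master
name: kafka-gitops
on:
  push:
    branches: [master]
jobs:
  plan-and-apply:
    runs-on: ubuntu-latest
    env:
      KAFKA_BOOTSTRAP_SERVERS: ${{ secrets.KAFKA_BOOTSTRAP_SERVERS }}
    steps:
      - uses: actions/checkout@v4
      - name: Generate plan
        run: kafka-gitops -f production/state.yaml plan -o plan.json
      - name: Apply plan
        run: kafka-gitops -f production/state.yaml apply -p plan.json
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;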

&lt;h3&gt;Benefits Of GitOps for Apache Kafka&lt;/h3&gt;

&lt;p&gt;Once the full process is in place, you gain many benefits that make it easy to govern your clusters as Kafka adoption grows within the organization.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Developers&lt;/strong&gt; have a well-defined process to follow to create topics &amp;amp; services.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Operations&lt;/strong&gt; has control over what is changing within the cluster and can ensure standards are followed.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Security&lt;/strong&gt; can easily audit and monitor access changes to data within the streaming platform.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Additionally, &lt;code&gt;kafka-gitops&lt;/code&gt; provides:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A defined process to make any changes to the Kafka cluster; no manual steps.&lt;/li&gt;
&lt;li&gt;A full audit log and history of changes to your cluster via version control.&lt;/li&gt;
&lt;li&gt;Automatic ACL generation for common services, reducing time spent on security.&lt;/li&gt;
&lt;li&gt;The ability to re-create a cluster's complete topic and ACL setup (e.g. for a new environment).&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;Limitations and Upcoming Features&lt;/h3&gt;

&lt;p&gt;Although &lt;code&gt;kafka-gitops&lt;/code&gt; is actively being used in production, there are a few upcoming features to address some limitations:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The ability to set a custom &lt;code&gt;group.id&lt;/code&gt; for consumers &amp;amp; streams applications (currently, this must match the service name)&lt;/li&gt;
&lt;li&gt;The ability to set custom connect topic names (currently, this has a predefined pattern)&lt;/li&gt;
&lt;li&gt;Tooling around creating the initial desired state file from existing clusters&lt;/li&gt;
&lt;li&gt;Eventually, the optional ability to run it as a service that actively monitors for changes and sources state from locations such as Git, AWS S3, etc.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;Conclusion&lt;/h2&gt;

&lt;p&gt;Automating the management of Kafka topics and ACLs brings significant benefits to all teams working with Apache Kafka. Whether working with a large enterprise set of clusters or defining topics for your local development cluster, the GitOps pattern allows for easy, repeatable cluster resource definitions.&lt;/p&gt;

&lt;p&gt;By adopting a GitOps pattern for managing your Kafka topics and ACLs, your organization can &lt;strong&gt;reduce time spent managing Kafka&lt;/strong&gt; and &lt;strong&gt;spend more time providing value to your core business&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;In some upcoming blog posts, I will be providing in-depth tutorials on using &lt;code&gt;kafka-gitops&lt;/code&gt; with self-hosted clusters and with Confluent Cloud.&lt;/p&gt;

</description>
      <category>kafka</category>
      <category>gitops</category>
      <category>cicd</category>
      <category>devops</category>
    </item>
    <item>
      <title>Apache Kafka: Topic Naming Conventions</title>
      <dc:creator>Shawn Seymour</dc:creator>
      <pubDate>Sun, 29 Mar 2020 20:39:02 +0000</pubDate>
      <link>https://dev.to/devshawn/apache-kafka-topic-naming-conventions-3do6</link>
      <guid>https://dev.to/devshawn/apache-kafka-topic-naming-conventions-3do6</guid>
      <description>&lt;p&gt;&lt;em&gt;This post was originally published on my &lt;a href="https://devshawn.com/blog/apache-kafka-topic-naming-conventions/"&gt;personal blog&lt;/a&gt;&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;Apache Kafka is an amazing system for building a scalable data streaming platform within an organization. It’s used in production everywhere from small startups to Fortune 500 companies. As the adoption of a core platform grows within an enterprise, it’s important to think about maintaining consistency and enforcing standards.&lt;/p&gt;

&lt;p&gt;Today, I’ll be discussing how one can do that for Apache Kafka and its core data structure: the topic. As most engineers who have used Kafka know, a topic is a category or feed to which messages are stored and published. Topics are similar to queues in message bus systems such as RabbitMQ or ActiveMQ.&lt;/p&gt;

&lt;h2&gt;
  
  
  Topic Naming: The Wild West
&lt;/h2&gt;

&lt;p&gt;Imagine a company building a simple order management system using Kafka as its backbone. They might create a couple of microservices that rely on a few core topics:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;orders&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;customers&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;payments&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;As the company grows, and as more teams are onboarded to the platform, more topics will be needed. The company may add data pipelines for inventory, fraud detection, and more. They might now have additional topics like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;inventory&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;pageviews&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;fulfillment&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;How do you manage to keep topic names consistent? How do people from other teams know exactly what the topic contains? It is easy when there are only a few topics and a small number of people using the platform. Once you are in a large organization with many teams creating and using topics, it becomes much harder.&lt;/p&gt;

&lt;h2&gt;
  
  
  Common Topic Naming Proposals
&lt;/h2&gt;

&lt;p&gt;A quick search leads to some great &lt;a href="https://riccomini.name/how-paint-bike-shed-kafka-topic-naming-conventions"&gt;blog posts&lt;/a&gt;, &lt;a href="https://stackoverflow.com/questions/43726571/what-is-the-best-practice-for-naming-kafka-topics"&gt;StackOverflow answers&lt;/a&gt;, and &lt;a href="http://grokbase.com/t/kafka/users/152r20xg4r/stream-naming-conventions"&gt;mailing list posts&lt;/a&gt; discussing how to name topics. There is also a vast number of opinions on the best way to do this. Here are some examples:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;&amp;lt;project&amp;gt;.&amp;lt;product&amp;gt;.&amp;lt;event-name&amp;gt;&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;&amp;lt;app-name&amp;gt;.&amp;lt;data-type&amp;gt;.&amp;lt;event-name&amp;gt;&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;&amp;lt;team-name&amp;gt;.&amp;lt;app-name&amp;gt;.&amp;lt;event-type&amp;gt;.&amp;lt;event-name&amp;gt;&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A decent topic naming strategy, proposed by Chris Riccomini in his popular blog post, &lt;a href="https://riccomini.name/how-paint-bike-shed-kafka-topic-naming-conventions"&gt;How to paint a bike shed: Kafka topic naming conventions&lt;/a&gt;, is:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;&amp;lt;message type&amp;gt;.&amp;lt;dataset name&amp;gt;.&amp;lt;data name&amp;gt;&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;At first glance, none of these look particularly bad – some even look great. Before we go in-depth on how to best name a Kafka topic, let’s discuss what makes a topic name good.&lt;/p&gt;

&lt;h3&gt;
  
  
  Naming Kafka Topics: Structure
&lt;/h3&gt;

&lt;p&gt;When it comes to naming a Kafka topic, two parts are important: the structure of the name and the semantics of the name.&lt;/p&gt;

&lt;p&gt;The structure of a name defines what characters are allowed and the format to use. In its topic names, Kafka allows alphanumeric characters, periods (&lt;strong&gt;.&lt;/strong&gt;), underscores (&lt;strong&gt;_&lt;/strong&gt;), and hyphens (&lt;strong&gt;-&lt;/strong&gt;).&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Although ‘_’ and ‘.’ are allowed to be used together, &lt;a href="https://github.com/apache/kafka/blob/2.3.0/core/src/main/scala/kafka/admin/TopicCommand.scala#L147-L148"&gt;they can collide&lt;/a&gt; due to limitations in metric names. It is best to pick one or the other, but not both.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;We cannot change what Kafka allows, but we can further define how dashes are used or enforce that all topics be lowercase.&lt;/p&gt;

&lt;h3&gt;
  
  
  Naming Kafka Topics: Semantics
&lt;/h3&gt;

&lt;p&gt;The semantics of a name define what fields should go in that name and in what order they should be placed. There are a few rules that should be applied to naming topics when it comes to semantics.&lt;/p&gt;

&lt;h4&gt;
  
  
  Do not use fields that can change
&lt;/h4&gt;

&lt;p&gt;Fields that can change should not be used in topic names. Fields such as team name, product name, and service owner should never be used in topic names.&lt;/p&gt;

&lt;p&gt;As most engineers know, over time, these things change as organizations evolve. It’s not an easy task to change a topic name once it is in use all over an enterprise, so it is best to leave those fields out from the beginning.&lt;/p&gt;

&lt;h4&gt;
  
  
  Do not tie topic names to services, consumers, or producers
&lt;/h4&gt;

&lt;p&gt;Topic names should not be tied to service names unless they are completely internal to a single service and are not meant to be produced to or consumed from any other service.&lt;/p&gt;

&lt;p&gt;Most topics eventually end up with more than one consumer and its producer could change in the future. It’s best to name topics after the data they hold rather than what is creating or reading the data.&lt;/p&gt;

&lt;h4&gt;
  
  
  Leave metadata out of the name if it can be found elsewhere
&lt;/h4&gt;

&lt;p&gt;Metadata that can be found elsewhere, such as in the data payload or in a schema registry, should be left out of the topic name. This includes things such as partition count, security information, schema information, etc.&lt;/p&gt;

&lt;h2&gt;
  
  
  Topic Naming Recommendations
&lt;/h2&gt;

&lt;p&gt;Okay, so you may be thinking: there are a ton of fields to pick from, and I’m not sure which semantics to enforce. Let’s get into the details of what a great topic naming convention should look like. My recommended rules to follow are:&lt;/p&gt;

&lt;h3&gt;
  
  
  Naming Format
&lt;/h3&gt;

&lt;p&gt;Topic names should be completely lowercase and adhere to the regular expression &lt;code&gt;^[a-z0-9.-]+$&lt;/code&gt;. All topics should follow &lt;code&gt;kebab-case&lt;/code&gt;, such as &lt;code&gt;my-awesome-topic-name&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Readability and ease-of-understanding play a huge role in proper topic naming. Lowercase topic names are easy to read and kebab-case flows nicely; we avoid the use of underscores due to metric naming collisions with periods. Additionally, periods make for a great separator between sections in a topic name, which is described below.&lt;/p&gt;
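As an aside, the format rule above is easy to check mechanically. Here is a minimal sketch in Python (an illustrative snippet, not part of any Kafka tooling), assuming names are lowercase kebab-case sections joined by periods:

```python
import re

# One section: lowercase letters/digits separated by single hyphens.
SECTION = r"[a-z0-9]+(?:-[a-z0-9]+)*"
# Full name: one or more sections joined by periods.
TOPIC_NAME = re.compile(rf"^{SECTION}(?:\.{SECTION})*$")

def is_valid_topic_name(name: str) -> bool:
    """Check a topic name against the lowercase kebab-case convention."""
    return TOPIC_NAME.fullmatch(name) is not None
```

For instance, `is_valid_topic_name("aws.analytics.fct.pageviews.0")` returns `True`, while names containing uppercase letters or underscores are rejected.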

&lt;h3&gt;
  
  
  Naming Structure
&lt;/h3&gt;

&lt;p&gt;My recommendation is to follow the following naming convention:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight plaintext"&gt;&lt;code&gt;&amp;lt;data-center&amp;gt;.&amp;lt;domain&amp;gt;.&amp;lt;classification&amp;gt;.&amp;lt;description&amp;gt;.&amp;lt;version&amp;gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;Let’s discuss what each part of the name means:&lt;/p&gt;

&lt;h5&gt;
  
  
  Data Center
&lt;/h5&gt;

&lt;p&gt;The data center in which the data resides. This is not required, but it is helpful once an organization reaches the size where it would like to do an active/active setup or replicate data between data centers. For example, if you have one cluster in AWS and one in Azure, your topics may be prefixed with &lt;code&gt;aws&lt;/code&gt; and &lt;code&gt;azure&lt;/code&gt;.&lt;/p&gt;

&lt;h5&gt;
  
  
  Domain
&lt;/h5&gt;

&lt;p&gt;A domain for the data is a well-understood, permanent name for the area of the system the data relates to. These &lt;strong&gt;should not&lt;/strong&gt; include any product names, team names, or service names.&lt;/p&gt;

&lt;p&gt;Examples of this vary wildly between industries. For example, in a transportation organization, some domains might be:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;comms&lt;/strong&gt;: all events relating to device communications&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;fleet&lt;/strong&gt;: all events relating to trucking fleet management&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;identity&lt;/strong&gt;: all events relating to identity and auth services&lt;/li&gt;
&lt;/ul&gt;

&lt;h5&gt;
  
  
  Classification
&lt;/h5&gt;

&lt;p&gt;The classification of data within a Kafka topic tells an end-user how it should be interpreted or used. This &lt;strong&gt;should not&lt;/strong&gt; tell us about data format or contents. I typically use the following classifications:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;fct&lt;/strong&gt;: Fact data is information about a thing that has happened. It is an immutable event at a specific point in time. Examples of this include data from sensors, user activity events, or notifications.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;cdc&lt;/strong&gt;: Change data capture (CDC) indicates this topic contains all instances of a specific &lt;em&gt;thing&lt;/em&gt; and receives all changes to those &lt;em&gt;things&lt;/em&gt;. These topics do not capture deltas and can be used to repopulate data stores or caches. These are commonly found as &lt;em&gt;compacted&lt;/em&gt; topics within Kafka.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;cmd&lt;/strong&gt;: Command topics represent operations that occur against the system. This is typically found as the request-response pattern, where you have a verb and a statement. Examples might include &lt;code&gt;UpdateUser&lt;/code&gt; and &lt;code&gt;UserUpdated&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;sys&lt;/strong&gt;: System topics are used for internal topics within a single service. They are operational topics that do not contain any relevant information outside of the owning system. These topics are not meant for public consumption.&lt;/li&gt;
&lt;/ul&gt;

&lt;h5&gt;
  
  
  Description
&lt;/h5&gt;

&lt;p&gt;The description is arguably the most important part of the name and is the event name that describes the type of data the topic holds. This is the subject of the data, such as &lt;code&gt;customers&lt;/code&gt;, &lt;code&gt;invoices&lt;/code&gt;, &lt;code&gt;users&lt;/code&gt;, &lt;code&gt;payments&lt;/code&gt;, etc.&lt;/p&gt;

&lt;h5&gt;
  
  
  Version
&lt;/h5&gt;

&lt;p&gt;The version of a topic is often the most forgotten section of a proper topic name. As data evolves within a topic, there may be breaking schema changes or a complete change in the format of the data. By versioning topics, you can allow a transition period where consumers switch to the new data without impacting any old consumers.&lt;/p&gt;

&lt;p&gt;By convention, it is preferred to version all topics and to start them at &lt;code&gt;0&lt;/code&gt;.&lt;/p&gt;

&lt;h4&gt;
  
  
  Examples
&lt;/h4&gt;

&lt;p&gt;Some examples using this convention:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;aws.analytics.fct.pageviews.0&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;azure.comms.fct.gps.0&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;dc1.identity.cdc.users.1&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;gcp.notifications.cmd.emails.3&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;gcp.notifications.sys.email-cache.0&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
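A tiny helper can compose such names from their sections. This is an illustrative sketch, assuming each section is lowercase kebab-case and the version is a number:

```python
import re

# Lowercase kebab-case: letters/digits separated by single hyphens.
SECTION_RE = re.compile(r"[a-z0-9]+(?:-[a-z0-9]+)*")

def topic_name(data_center: str, domain: str, classification: str,
               description: str, version: int = 0) -> str:
    """Build <data-center>.<domain>.<classification>.<description>.<version>."""
    sections = [data_center, domain, classification, description]
    for section in sections:
        if not SECTION_RE.fullmatch(section):
            raise ValueError(f"invalid section: {section!r}")
    return ".".join(sections + [str(version)])
```

Centralizing name construction in a shared helper like this keeps every team's topics consistent by construction rather than by convention alone.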

&lt;h2&gt;
  
  
  Enforcing Standards
&lt;/h2&gt;

&lt;p&gt;After a naming convention is decided upon and put into place, how does one enforce that topics conform to that convention? The first step is to ensure automatic topic creation is disabled on the broker side; this is done by setting the &lt;code&gt;auto.create.topics.enable&lt;/code&gt; property to &lt;code&gt;false&lt;/code&gt;. Ideally, your cluster should also have security enabled and disallow the creation of topics by services. This ensures topic creation is done in a standardized way and controlled by an operations team.&lt;/p&gt;

&lt;p&gt;The recommended approach is to create topics through a continuous integration pipeline, where topics are defined in source control and created through a build process. This allows scripts to validate that all topic names conform to the desired conventions before the topics are created. A helpful tool to manage topics within a Kafka cluster is &lt;a href="https://github.com/devshawn/kafka-dsf"&gt;kafka-dsf&lt;/a&gt;.&lt;/p&gt;
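Such a CI validation step might look like the following sketch, which checks each declared name against the full five-section convention. This is illustrative only; how topic names are declared and collected depends on your tooling.

```python
import re

# Lowercase kebab-case: letters/digits separated by single hyphens.
SECTION = re.compile(r"[a-z0-9]+(?:-[a-z0-9]+)*")
CLASSIFICATIONS = {"fct", "cdc", "cmd", "sys"}

def violations(name: str) -> list:
    """Return reasons a name fails <data-center>.<domain>.<classification>.<description>.<version>."""
    parts = name.split(".")
    if len(parts) != 5:
        return ["expected 5 dot-separated sections"]
    problems = [f"section {p!r} is not lowercase kebab-case"
                for p in parts if not SECTION.fullmatch(p)]
    if parts[2] not in CLASSIFICATIONS:
        problems.append(f"unknown classification {parts[2]!r}")
    if not parts[4].isdigit():
        problems.append(f"version {parts[4]!r} is not a number")
    return problems
```

A CI job can then fail the build whenever any declared topic returns a non-empty list of violations.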

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Hopefully reading this has provoked some thought about how to create useful topic naming conventions and how to prevent your Kafka cluster from becoming the Wild West. It’s important to enforce consistency early and put a standard process in place before it’s too late — because things like topic names are hard to change later, and probably never will be. :-)&lt;/p&gt;

</description>
      <category>kafka</category>
      <category>conventions</category>
      <category>standards</category>
    </item>
    <item>
      <title>Apache Kafka: Docker Quick Start</title>
      <dc:creator>Shawn Seymour</dc:creator>
      <pubDate>Mon, 25 Nov 2019 15:14:27 +0000</pubDate>
      <link>https://dev.to/devshawn/apache-kafka-docker-quick-start-3ikp</link>
      <guid>https://dev.to/devshawn/apache-kafka-docker-quick-start-3ikp</guid>
      <description>&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdevshawn.com%2Fcontent%2Fimages%2F2019%2F11%2Fdocker-bg.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdevshawn.com%2Fcontent%2Fimages%2F2019%2F11%2Fdocker-bg.png" alt="Apache Kafka: Docker Quick Start"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Apache Kafka is a distributed streaming platform that can act as a message broker, as the heart of a stream processing pipeline, or even as the backbone of a large enterprise data synchronization system. Kafka is not only a highly-available and fault-tolerant system; it also handles vastly higher throughput compared to other message brokers such as RabbitMQ or ActiveMQ.&lt;/p&gt;

&lt;p&gt;In this tutorial, you will utilize Docker &amp;amp; Docker Compose to run Apache Kafka &amp;amp; ZooKeeper. Docker with Docker Compose is the quickest way to get started with Apache Kafka and to experiment with clustering and the fault-tolerant properties Kafka provides. A full Docker Compose setup with 3 Kafka brokers and 1 ZooKeeper node can be &lt;a href="https://gist.github.com/devshawn/bf5d5afff02ea332d80fbe730e6d8e58" rel="noopener noreferrer"&gt;found here&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Prerequisites
&lt;/h2&gt;

&lt;p&gt;To complete this tutorial, you will need:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A UNIX environment (Mac or Linux)&lt;/li&gt;
&lt;li&gt;Docker &amp;amp; Docker Compose&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;em&gt;Note&lt;/em&gt;: Docker can be installed by following the &lt;a href="https://docs.docker.com/install/" rel="noopener noreferrer"&gt;official installation guide&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  System Architecture
&lt;/h2&gt;

&lt;p&gt;Before running Kafka with Docker, let's examine the architecture of a simple Apache Kafka setup.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Kafka Cluster&lt;/strong&gt; : A group of Kafka brokers forming a distributed system&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Kafka Broker&lt;/strong&gt; : An instance of Kafka that holds topics of data&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;ZooKeeper&lt;/strong&gt; : A centralized system for storing and managing configuration&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Producer&lt;/strong&gt; : A client that sends messages to a Kafka topic&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Consumer&lt;/strong&gt; : A client that reads messages from a Kafka topic&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Kafka utilizes ZooKeeper to manage and coordinate brokers within a cluster. Producers and consumers are the main clients that interact with Kafka, which we'll take a look at once we have a running Kafka broker.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdevshawn.com%2Fcontent%2Fimages%2F2019%2F09%2Fdocker-getting-started.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdevshawn.com%2Fcontent%2Fimages%2F2019%2F09%2Fdocker-getting-started.png" alt="Apache Kafka: Docker Quick Start"&gt;&lt;/a&gt;Architecture diagram of integrations used in this tutorial&lt;/p&gt;

&lt;p&gt;The above diagram shows the architecture of the systems we are going to run in this tutorial. It also helps demonstrate how Kafka brokers utilize ZooKeeper and shows the ports of the running services. In this tutorial, we'll start by running &lt;strong&gt;one&lt;/strong&gt; Apache Kafka broker and &lt;strong&gt;one&lt;/strong&gt; ZooKeeper node (seen above in blue). Later on, we'll form a &lt;strong&gt;three&lt;/strong&gt; node cluster by adding in &lt;strong&gt;two&lt;/strong&gt; more Kafka brokers (seen above in green).&lt;/p&gt;

&lt;h2&gt;
  
  
  Running ZooKeeper in Docker
&lt;/h2&gt;

&lt;p&gt;Ensure you have Docker installed and running. You can verify this by running the following command; you should see a similar output.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;docker -v
&amp;gt; Docker version 18.09.2, build 6247962
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Additionally, verify you have Docker Compose installed:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;docker-compose -v
&amp;gt; docker-compose version 1.23.2, build 1110ad01
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We're ready to begin! Create a directory, such as &lt;code&gt;~/kafka&lt;/code&gt;, to store our Docker Compose files. Using your favorite text editor or IDE, create a file named &lt;code&gt;docker-compose.yml&lt;/code&gt; in your new directory.&lt;/p&gt;

&lt;p&gt;We'll start by getting ZooKeeper running. In the Docker Compose YAML file, define a &lt;code&gt;zookeeper&lt;/code&gt; service as shown below:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;version: '3'

services:
  zookeeper:
    image: zookeeper:3.4.9
    hostname: zookeeper
    ports:
      - "2181:2181"
    environment:
        ZOO_MY_ID: 1
        ZOO_PORT: 2181
        ZOO_SERVERS: server.1=zookeeper:2888:3888
    volumes:
      - ./data/zookeeper/data:/data
      - ./data/zookeeper/datalog:/datalog
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A brief overview of what we're defining:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Line &lt;code&gt;1&lt;/code&gt;: docker compose file version number, set to &lt;code&gt;3&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Line &lt;code&gt;4&lt;/code&gt;: starting the definition of a ZooKeeper service&lt;/li&gt;
&lt;li&gt;Line &lt;code&gt;5&lt;/code&gt;: The docker image to use for ZooKeeper and its version&lt;/li&gt;
&lt;li&gt;Line &lt;code&gt;6&lt;/code&gt;: The hostname the container will use when running&lt;/li&gt;
&lt;li&gt;Lines &lt;code&gt;7-8&lt;/code&gt;: The ports to expose to the host; ZooKeeper's default port&lt;/li&gt;
&lt;li&gt;Line &lt;code&gt;10&lt;/code&gt;: The unique ID of this ZooKeeper instance, set to &lt;code&gt;1&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Line &lt;code&gt;11&lt;/code&gt;: The port this ZooKeeper instance should run with&lt;/li&gt;
&lt;li&gt;Line &lt;code&gt;12&lt;/code&gt;: The list of ZooKeeper servers; in our case just one&lt;/li&gt;
&lt;li&gt;Lines &lt;code&gt;13-15&lt;/code&gt;: Mapping volumes on the host to store ZooKeeper data&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;em&gt;Note&lt;/em&gt;: We've mapped &lt;code&gt;./data/zookeeper&lt;/code&gt; on the host to directories within the container. This allows ZooKeeper to persist data even if you destroy the container.&lt;/p&gt;

&lt;p&gt;We can now start ZooKeeper by running the following command in the directory containing the &lt;code&gt;docker-compose.yml&lt;/code&gt; file:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;docker-compose up
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Logs will start printing, and should end with a line similar to this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;zookeeper_1 | ... binding to port 0.0.0.0/0.0.0.0:2181
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Congrats! ZooKeeper is running and exposed on port 2181. You can verify this utilizing netcat in a new terminal window:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;echo ruok | nc localhost 2181
&amp;gt; imok
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Running Kafka In Docker
&lt;/h2&gt;

&lt;p&gt;We can now add our first &lt;code&gt;kafka&lt;/code&gt; service to our Docker Compose file. We're calling this &lt;code&gt;kafka2&lt;/code&gt; as it will have a broker id of &lt;code&gt;2&lt;/code&gt; and run on the default port of &lt;code&gt;9092&lt;/code&gt;. Later on, we'll add in &lt;code&gt;kafka1&lt;/code&gt; and &lt;code&gt;kafka3&lt;/code&gt;. This is to demonstrate that order does not matter and broker &lt;code&gt;id&lt;/code&gt;s are just for identification.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;version: '3'

services:
...
  kafka2:
    image: confluentinc/cp-kafka:5.3.0
    hostname: kafka2
    ports:
      - "9092:9092"
    environment:
      KAFKA_ADVERTISED_LISTENERS: LISTENER_DOCKER_INTERNAL://kafka2:19092,LISTENER_DOCKER_EXTERNAL://${DOCKER_HOST_IP:-127.0.0.1}:9092
      KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: LISTENER_DOCKER_INTERNAL:PLAINTEXT,LISTENER_DOCKER_EXTERNAL:PLAINTEXT
      KAFKA_INTER_BROKER_LISTENER_NAME: LISTENER_DOCKER_INTERNAL
      KAFKA_ZOOKEEPER_CONNECT: "zookeeper:2181"
      KAFKA_BROKER_ID: 2
      KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 1
    volumes:
      - ./data/kafka2/data:/var/lib/kafka/data
    depends_on:
      - zookeeper
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If you prefer, copy the &lt;a href="https://gist.github.com/devshawn/8e8397798b9e4a6c7fd299d723338084" rel="noopener noreferrer"&gt;full gist found here&lt;/a&gt;. A brief overview of what we're defining:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Line &lt;code&gt;6&lt;/code&gt;: The docker image to use for Kafka; we're using the Confluent image&lt;/li&gt;
&lt;li&gt;Line &lt;code&gt;7&lt;/code&gt;: The hostname this Kafka broker will use when running&lt;/li&gt;
&lt;li&gt;Line &lt;code&gt;8-9&lt;/code&gt;: The ports to expose; set to Kafka's default (&lt;code&gt;9092&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;Line &lt;code&gt;11&lt;/code&gt;: Kafka's advertised listeners. Robin Moffatt has a &lt;a href="https://rmoff.net/2018/08/02/kafka-listeners-explained/" rel="noopener noreferrer"&gt;great blog post&lt;/a&gt; about this.&lt;/li&gt;
&lt;li&gt;Line &lt;code&gt;12&lt;/code&gt;: Security protocols to use for each listener.&lt;/li&gt;
&lt;li&gt;Line &lt;code&gt;13&lt;/code&gt;: The inter-broker listener name (used for internal communication)&lt;/li&gt;
&lt;li&gt;Line &lt;code&gt;14&lt;/code&gt;: The list of ZooKeeper nodes Kafka should use&lt;/li&gt;
&lt;li&gt;Line &lt;code&gt;15&lt;/code&gt;: The broker ID of this Kafka broker.&lt;/li&gt;
&lt;li&gt;Line &lt;code&gt;16&lt;/code&gt;: The replication factor of the consumer offset topic (&lt;code&gt;1&lt;/code&gt; for one broker)&lt;/li&gt;
&lt;li&gt;Lines &lt;code&gt;17-18&lt;/code&gt;: Mapping volumes on the host to store Kafka data&lt;/li&gt;
&lt;li&gt;Lines &lt;code&gt;19-20&lt;/code&gt;: Start the ZooKeeper service before the Kafka service&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Let's start the Kafka broker! In a new terminal window, run the following command in the same directory:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;docker-compose up
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;ZooKeeper should still be running in another terminal, and if it isn't, Docker Compose will start it. You'll see a lot of logs being printed and then Kafka should be running! We can verify this by creating a topic.&lt;/p&gt;

&lt;p&gt;If you have the Kafka command line tools installed, run:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;kafka-topics --zookeeper localhost:2181 --create --topic new-topic --partitions 1 --replication-factor 1
&amp;gt; Created topic "new-topic".
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If you don't have the Kafka command line tools installed, you can run a command using Docker as well:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;docker exec -it kafka_kafka2_1 kafka-topics --zookeeper zookeeper:2181 --create --topic new-topic --partitions 1 --replication-factor 1
&amp;gt; Created topic "new-topic".
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If you get any errors, verify both Kafka and ZooKeeper are running with &lt;code&gt;docker ps&lt;/code&gt; and check the logs from the terminals running Docker Compose.&lt;/p&gt;

&lt;p&gt;Yay! You now have the simplest Kafka cluster running within Docker. Kafka with broker id &lt;code&gt;2&lt;/code&gt; is exposed on port &lt;code&gt;9092&lt;/code&gt; and ZooKeeper on port &lt;code&gt;2181&lt;/code&gt;. Data for this Kafka cluster is stored in &lt;code&gt;./data/kafka2&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;To stop the containers, you can use &lt;code&gt;ctrl + c&lt;/code&gt; or &lt;code&gt;cmd + c&lt;/code&gt; on the running Docker Compose terminal windows. If they don't stop, you can run &lt;code&gt;docker-compose down&lt;/code&gt;. To remove the containers if they don't get removed as a part of &lt;code&gt;down&lt;/code&gt;, you can run &lt;code&gt;docker-compose rm&lt;/code&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Running Three Kafka Brokers In Docker
&lt;/h2&gt;

&lt;p&gt;To run three brokers, we need to add two more &lt;code&gt;kafka&lt;/code&gt; services to our Docker Compose file. We'll run broker &lt;code&gt;1&lt;/code&gt; on port &lt;code&gt;9091&lt;/code&gt; and broker &lt;code&gt;3&lt;/code&gt; on port &lt;code&gt;9093&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Add two more services as so:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;version: "3"

services:
...
  kafka1:
    image: confluentinc/cp-kafka:5.3.0
    hostname: kafka1
    ports:
      - "9091:9091"
    environment:
      KAFKA_ADVERTISED_LISTENERS: LISTENER_DOCKER_INTERNAL://kafka1:19091,LISTENER_DOCKER_EXTERNAL://${DOCKER_HOST_IP:-127.0.0.1}:9091
      KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: LISTENER_DOCKER_INTERNAL:PLAINTEXT,LISTENER_DOCKER_EXTERNAL:PLAINTEXT
      KAFKA_INTER_BROKER_LISTENER_NAME: LISTENER_DOCKER_INTERNAL
      KAFKA_ZOOKEEPER_CONNECT: "zookeeper:2181"
      KAFKA_BROKER_ID: 1
      KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 1
    volumes:
      - ./data/kafka1/data:/var/lib/kafka/data
    depends_on:
      - zookeeper

  kafka3:
    image: confluentinc/cp-kafka:5.3.0
    hostname: kafka3
    ports:
      - "9093:9093"
    environment:
      KAFKA_ADVERTISED_LISTENERS: LISTENER_DOCKER_INTERNAL://kafka3:19093,LISTENER_DOCKER_EXTERNAL://${DOCKER_HOST_IP:-127.0.0.1}:9093
      KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: LISTENER_DOCKER_INTERNAL:PLAINTEXT,LISTENER_DOCKER_EXTERNAL:PLAINTEXT
      KAFKA_INTER_BROKER_LISTENER_NAME: LISTENER_DOCKER_INTERNAL
      KAFKA_ZOOKEEPER_CONNECT: "zookeeper:2181"
      KAFKA_BROKER_ID: 3
      KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 1
    volumes:
      - ./data/kafka3/data:/var/lib/kafka/data
    depends_on:
      - zookeeper
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You can find a &lt;a href="https://gist.github.com/devshawn/bf5d5afff02ea332d80fbe730e6d8e58" rel="noopener noreferrer"&gt;full gist with ZooKeeper and three Kafka brokers here&lt;/a&gt;. Essentially, we update the ports, the broker ID, and the data directory on the host.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Note&lt;/em&gt;: In a production setup, you'd want the offset topic replication factor to be set higher than 1, but for the purposes of this tutorial I've left it at one since we started with one broker.&lt;/p&gt;

&lt;p&gt;We can now verify that all three brokers are running by creating a topic with a replication factor of 3:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;docker exec -it kafka_kafka2_1 kafka-topics --zookeeper zookeeper:2181 --create --topic three-isr --partitions 1 --replication-factor 3
&amp;gt; Created topic "three-isr".
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If you receive an error, ensure all three Kafka brokers are running. Woohoo! You've now got a Kafka cluster with three brokers running.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Congrats! You've successfully started a local Kafka cluster using Docker and Docker Compose. Data is persisted outside of the containers on the local machine, which means you can delete containers and restart them without losing data. For next steps, I'd suggest playing around with Kafka's fault tolerance and replication features.&lt;/p&gt;

&lt;p&gt;For example, you could create a topic with a replication factor of &lt;code&gt;3&lt;/code&gt;, produce some data, remove broker &lt;code&gt;2&lt;/code&gt; and its data directory (&lt;code&gt;./data/kafka2&lt;/code&gt;), start broker &lt;code&gt;2&lt;/code&gt; again, and see that the data is replicated to the new broker. Pretty cool!&lt;/p&gt;

&lt;p&gt;For full sets of Docker Compose files for running various Kafka Cluster setups, check out Stephane Maarek's &lt;a href="https://github.com/simplesteph/kafka-stack-docker-compose" rel="noopener noreferrer"&gt;kafka-stack-docker-compose&lt;/a&gt; repository. This post was inspired by it. :-).&lt;/p&gt;

</description>
      <category>kafka</category>
      <category>docker</category>
      <category>beginners</category>
      <category>tutorial</category>
    </item>
    <item>
      <title>Apache Kafka: Quick Start</title>
      <dc:creator>Shawn Seymour</dc:creator>
      <pubDate>Mon, 20 May 2019 20:30:58 +0000</pubDate>
      <link>https://dev.to/devshawn/apache-kafka-quick-start-5amd</link>
      <guid>https://dev.to/devshawn/apache-kafka-quick-start-5amd</guid>
      <description>&lt;p&gt;&lt;em&gt;This post was originally published on my &lt;a href="https://devshawn.com/blog/apache-kafka-quick-start/"&gt;personal blog&lt;/a&gt;&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;Apache Kafka is a distributed streaming platform that can act as a message broker, as the heart of a stream processing pipeline, or even as the backbone of an enterprise data synchronization system. Kafka is not only a highly-available and fault-tolerant system; it also handles vastly higher throughput compared to other message brokers such as RabbitMQ or ActiveMQ.&lt;/p&gt;

&lt;p&gt;In this tutorial, you will install Apache Kafka, run three brokers in a cluster, and learn how to produce and consume messages from your cluster. This tutorial assumes that you have no existing Kafka or ZooKeeper installation.&lt;/p&gt;

&lt;h2&gt;
  
  
  Prerequisites
&lt;/h2&gt;

&lt;p&gt;To complete this tutorial, you will need:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A UNIX environment (Mac or Linux)&lt;/li&gt;
&lt;li&gt;Java 8+ installed&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;em&gt;Note&lt;/em&gt;: Java 7 support was dropped in Kafka 2.0.0; Java 11 support was added in Kafka 2.1.0.&lt;/p&gt;

&lt;h2&gt;
  
  
  Installation
&lt;/h2&gt;

&lt;p&gt;Download Apache Kafka and its related binaries from the Apache Kafka website. At the time of this article, the latest version is &lt;a href="https://www.apache.org/dyn/closer.cgi?path=/kafka/2.1.1/kafka_2.12-2.1.1.tgz"&gt;Apache Kafka 2.1.1&lt;/a&gt;. After downloading, extract the &lt;code&gt;.tgz&lt;/code&gt; file and move into the resulting directory:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;tar&lt;/span&gt; &lt;span class="nt"&gt;-xzf&lt;/span&gt; kafka_2.11-2.1.0.tgz
&lt;span class="nb"&gt;cd &lt;/span&gt;kafka_2.11-2.1.0
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;h2&gt;
  
  
  System Architecture
&lt;/h2&gt;

&lt;p&gt;Let's take a look at the architecture of a simple Apache Kafka setup.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Kafka Cluster&lt;/strong&gt;: A group of Kafka brokers forming a distributed system&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Kafka Broker&lt;/strong&gt;: An instance of Kafka that holds topics of data&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;ZooKeeper&lt;/strong&gt;: A centralized system for storing and managing configuration&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Producer&lt;/strong&gt;: A client that sends messages to a Kafka topic&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Consumer&lt;/strong&gt;: A client that reads messages from a Kafka topic&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Kafka utilizes ZooKeeper to manage and coordinate brokers within a cluster. Producers and consumers are the main components that interact with Kafka, which we'll take a look at once we have a running Kafka broker. In this tutorial, we'll be running &lt;strong&gt;three&lt;/strong&gt; Kafka brokers and &lt;strong&gt;one&lt;/strong&gt; ZooKeeper node.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--dZ45M3k4--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://devshawn.com/content/images/2019/05/getting-started-diagram.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--dZ45M3k4--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://devshawn.com/content/images/2019/05/getting-started-diagram.png" alt="Apache Kafka: Installation &amp;amp; Quick Start"&gt;&lt;/a&gt;Architecture diagram of integrations used in this tutorial&lt;/p&gt;

&lt;p&gt;The above diagram shows the architecture of the systems and tools used in this tutorial. It helps demonstrate how Kafka brokers utilize ZooKeeper, which components the command line tools we'll be using interact with, and shows the ports of the running services.&lt;/p&gt;

&lt;h2&gt;
  
  
  Starting Zookeeper
&lt;/h2&gt;

&lt;p&gt;ZooKeeper is a centralized service that is used to maintain naming and configuration data as well as to provide flexible and robust synchronization within distributed systems. Kafka &lt;strong&gt;requires&lt;/strong&gt; ZooKeeper, so we must start an instance of ZooKeeper before we start Kafka.&lt;/p&gt;

&lt;p&gt;Conveniently, the download for Apache Kafka includes an easy way to run a ZooKeeper instance. Inside of the &lt;code&gt;bin&lt;/code&gt; directory, there is a file named &lt;code&gt;zookeeper-server-start.sh&lt;/code&gt;. To start ZooKeeper, run the following command from the root directory of your download:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight shell"&gt;&lt;code&gt;bin/zookeeper-server-start.sh config/zookeeper.properties
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;In your terminal, ZooKeeper logs will start flowing and you will shortly see a line that states ZooKeeper is running on port &lt;code&gt;2181&lt;/code&gt;. This is ZooKeeper's default port, and can be changed in &lt;code&gt;config/zookeeper.properties&lt;/code&gt;.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;Note&lt;/em&gt;: The default directory where ZooKeeper stores its state is set to &lt;code&gt;/tmp/zookeeper&lt;/code&gt;. If you restart your machine, all ZooKeeper data will be lost.&lt;/p&gt;
&lt;/blockquote&gt;
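
&lt;p&gt;For reference, both defaults mentioned above come straight from &lt;code&gt;config/zookeeper.properties&lt;/code&gt;; the relevant lines in the file shipped with Kafka look like this:&lt;/p&gt;

```properties
# where ZooKeeper stores its snapshot data (under /tmp, so it is lost on reboot)
dataDir=/tmp/zookeeper
# the port that brokers and command line tools connect to
clientPort=2181
```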

&lt;p&gt;Lastly, open a new terminal window and let ZooKeeper continue running in your original terminal. Ensure you &lt;code&gt;cd&lt;/code&gt; to the root directory of your extracted Kafka download.&lt;/p&gt;

&lt;h2&gt;
  
  
  Setting Up A Kafka Cluster
&lt;/h2&gt;

&lt;p&gt;The official Kafka quick start guide only runs one broker, which isn't really a distributed system or a cluster, so we're going to run &lt;strong&gt;three&lt;/strong&gt; brokers! :)&lt;/p&gt;

&lt;p&gt;Let's examine the configuration file for a Kafka broker located at &lt;code&gt;config/server.properties&lt;/code&gt;. You can view the configuration file from your new terminal window by running:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;cat &lt;/span&gt;config/server.properties
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;There's quite a bit of configuration, but the main properties we care about are the following:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;broker.id=0&lt;/code&gt;: the unique id of the broker&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;listeners=PLAINTEXT://:9092&lt;/code&gt;: the protocol and port of the broker&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;log.dirs=/tmp/kafka-logs&lt;/code&gt;: the storage location for data in the broker&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;All three of these configuration properties &lt;strong&gt;must&lt;/strong&gt; be unique per broker. You can see the default broker id is &lt;code&gt;0&lt;/code&gt; and the default Kafka port is &lt;code&gt;9092&lt;/code&gt;. Since we're going to start three brokers, let's copy this file once per broker and leave &lt;code&gt;server.properties&lt;/code&gt; as-is for reference. We can do this by running:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;cp &lt;/span&gt;config/server.properties config/server-1.properties
&lt;span class="nb"&gt;cp &lt;/span&gt;config/server.properties config/server-2.properties
&lt;span class="nb"&gt;cp &lt;/span&gt;config/server.properties config/server-3.properties
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;Next, we need to modify the properties listed above to be unique per broker. You'll want to ensure you uncomment the &lt;code&gt;listeners&lt;/code&gt; property. Modify the files using your favorite text editor, or via a CLI program such as &lt;code&gt;vim&lt;/code&gt;. Make sure to only modify the lines below, and not to replace the whole file with them!&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;server-1.properties&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight properties"&gt;&lt;code&gt;&lt;span class="py"&gt;broker.id&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;1&lt;/span&gt;
&lt;span class="py"&gt;listeners&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;PLAINTEXT://:9091&lt;/span&gt;
&lt;span class="py"&gt;log.dirs&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;/tmp/kafka-1&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;server-2.properties&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight properties"&gt;&lt;code&gt;&lt;span class="py"&gt;broker.id&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;2&lt;/span&gt;
&lt;span class="py"&gt;listeners&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;PLAINTEXT://:9092&lt;/span&gt;
&lt;span class="py"&gt;log.dirs&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;/tmp/kafka-2&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;server-3.properties&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight properties"&gt;&lt;code&gt;&lt;span class="py"&gt;broker.id&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;3&lt;/span&gt;
&lt;span class="py"&gt;listeners&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;PLAINTEXT://:9093&lt;/span&gt;
&lt;span class="py"&gt;log.dirs&lt;/span&gt;&lt;span class="p"&gt;=&lt;/span&gt;&lt;span class="s"&gt;/tmp/kafka-3&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;Yay! We now have a configuration file for each broker. Each broker has a unique id, listens on a unique port, and stores data in a unique location.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;Note:&lt;/em&gt; As with ZooKeeper, the data is stored in the &lt;code&gt;/tmp&lt;/code&gt; directory. All data will be lost when you restart your machine.&lt;/p&gt;
&lt;/blockquote&gt;
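
&lt;p&gt;By the way, if you'd rather script the three edits above than make them by hand, a small &lt;code&gt;sed&lt;/code&gt; loop works. The snippet below is only a sketch: it builds a minimal stand-in &lt;code&gt;server.properties&lt;/code&gt; in a temporary directory so it can run anywhere; in a real Kafka directory you would drop the stand-in setup and run just the loop.&lt;/p&gt;

```shell
# Sketch: derive server-1/2/3.properties from server.properties with sed.
# The stand-in setup below exists only to make the example self-contained;
# in a real Kafka download, skip it and run the loop from the root directory.
cd "$(mktemp -d)" && mkdir config
printf 'broker.id=0\n#listeners=PLAINTEXT://:9092\nlog.dirs=/tmp/kafka-logs\n' \
  > config/server.properties

for i in 1 2 3; do
  sed -e "s/^broker\.id=.*/broker.id=${i}/" \
      -e "s|^#*listeners=.*|listeners=PLAINTEXT://:909${i}|" \
      -e "s|^log\.dirs=.*|log.dirs=/tmp/kafka-${i}|" \
      config/server.properties > "config/server-${i}.properties"
done

grep -H 'broker.id' config/server-?.properties
# config/server-1.properties:broker.id=1
# config/server-2.properties:broker.id=2
# config/server-3.properties:broker.id=3
```

Note that the &lt;code&gt;listeners&lt;/code&gt; substitution also strips the leading &lt;code&gt;#&lt;/code&gt;, which takes care of uncommenting the property.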

&lt;h2&gt;
  
  
  Starting Kafka
&lt;/h2&gt;

&lt;p&gt;In addition to your current terminal, open two more terminal windows and &lt;code&gt;cd&lt;/code&gt; to your Kafka download directory. You should have four terminals open at this point: one running ZooKeeper and three for running Kafka.&lt;/p&gt;

&lt;p&gt;To start Kafka, you'll want to run the following commands, with each one in a separate terminal:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight shell"&gt;&lt;code&gt;bin/kafka-server-start.sh config/server-1.properties
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;





&lt;div class="highlight"&gt;&lt;pre class="highlight shell"&gt;&lt;code&gt;bin/kafka-server-start.sh config/server-2.properties
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;





&lt;div class="highlight"&gt;&lt;pre class="highlight shell"&gt;&lt;code&gt;bin/kafka-server-start.sh config/server-3.properties
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;You'll start to see logs in each terminal for the brokers you started. If you look at your ZooKeeper terminal, you'll also see logs from the brokers connecting to ZooKeeper. Each terminal should end with a line similar to:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[2019-03-02 15:28:21,074] INFO [KafkaServer id=1] started (kafka.server.KafkaServer)
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;Congrats! You now have a Kafka cluster running, with three brokers exposed on ports &lt;code&gt;9091&lt;/code&gt;, &lt;code&gt;9092&lt;/code&gt;, and &lt;code&gt;9093&lt;/code&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Creating A Topic
&lt;/h2&gt;

&lt;p&gt;Now that we have a Kafka cluster running, let's send some messages! To do this, we must first create a topic. Kafka includes some command line tools to do this, located in the &lt;code&gt;bin&lt;/code&gt; directory. Open a new terminal window and &lt;code&gt;cd&lt;/code&gt; to the Kafka download directory.&lt;/p&gt;

&lt;p&gt;Let's create a topic named &lt;code&gt;test&lt;/code&gt;. We can do this by utilizing the &lt;code&gt;kafka-topics.sh&lt;/code&gt; script in the &lt;code&gt;bin&lt;/code&gt; directory.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight shell"&gt;&lt;code&gt;bin/kafka-topics.sh &lt;span class="nt"&gt;--create&lt;/span&gt; &lt;span class="nt"&gt;--zookeeper&lt;/span&gt; localhost:2181 &lt;span class="nt"&gt;--replication-factor&lt;/span&gt; 3 &lt;span class="nt"&gt;--partitions&lt;/span&gt; 1 &lt;span class="nt"&gt;--topic&lt;/span&gt; &lt;span class="nb"&gt;test&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;Let's analyze the arguments we're passing the script:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;--create&lt;/code&gt;: flag to create a topic&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;--zookeeper&lt;/code&gt;: the ZooKeeper connection string used by Kafka&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;--replication-factor&lt;/code&gt;: set the replication factor&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;--partitions&lt;/code&gt;: set the number of partitions&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;--topic&lt;/code&gt;: set the topic name&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In the command above, we create a single partition topic. We also set the replication factor to &lt;code&gt;3&lt;/code&gt;. This means that data will be replicated (copied for redundancy) to all of our brokers.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;Note&lt;/em&gt;: The maximum replication factor for a topic is the number of brokers you have running. In this case, we have a maximum replication factor of 3.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;We can now &lt;code&gt;describe&lt;/code&gt; our newly created topic to gain some insight into it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight shell"&gt;&lt;code&gt;bin/kafka-topics.sh &lt;span class="nt"&gt;--describe&lt;/span&gt; &lt;span class="nt"&gt;--zookeeper&lt;/span&gt; localhost:2181 &lt;span class="nt"&gt;--topic&lt;/span&gt; &lt;span class="nb"&gt;test&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;This will output something similar to:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Topic:test  PartitionCount:1    ReplicationFactor:3 Configs:
    Topic: test Partition: 0    Leader: 2   Replicas: 2,3,1 Isr: 2,3,1
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;This explains that our topic &lt;code&gt;test&lt;/code&gt; has one partition, a replication factor of three, and no non-default configurations set. It also shows for our one partition, partition &lt;code&gt;0&lt;/code&gt;, that the leader is broker &lt;code&gt;2&lt;/code&gt; and that we have &lt;code&gt;3&lt;/code&gt; in-sync replicas. Your leader may be different than broker &lt;code&gt;2&lt;/code&gt;, but you should have &lt;code&gt;3&lt;/code&gt; in-sync replicas.&lt;/p&gt;

&lt;p&gt;To learn more about what partitions, replicas, and in-sync replicas mean, go check out and read my post &lt;a href="https://devshawn.com/blog/apache-kafka-introduction/"&gt;Apache Kafka: An Introduction&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Producing Messages
&lt;/h2&gt;

&lt;p&gt;Now that we have a Kafka topic, let's send some messages to it! We can do this using the &lt;code&gt;kafka-console-producer.sh&lt;/code&gt; script in the &lt;code&gt;bin&lt;/code&gt; directory. This is a handy tool for producing messages from the command line.&lt;/p&gt;

&lt;p&gt;Run the console producer with the following command:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight shell"&gt;&lt;code&gt;bin/kafka-console-producer.sh &lt;span class="nt"&gt;--broker-list&lt;/span&gt; localhost:9091,localhost:9092,localhost:9093 &lt;span class="nt"&gt;--topic&lt;/span&gt; &lt;span class="nb"&gt;test&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;We pass the list of Kafka brokers with the &lt;code&gt;--broker-list&lt;/code&gt; argument and the name of the topic to produce to with the &lt;code&gt;--topic&lt;/code&gt; argument. You should now have a terminal line starting with &lt;code&gt;&amp;gt;&lt;/code&gt;. From here, you can type a message and hit enter to send it to Kafka. For example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight plaintext"&gt;&lt;code&gt;&amp;gt; hello world, this is my first message
&amp;gt; this is a second message
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;Once you've sent some messages, exit out of the console producer by using &lt;code&gt;cmd + c&lt;/code&gt; or &lt;code&gt;ctrl + c&lt;/code&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Consuming Messages
&lt;/h2&gt;

&lt;p&gt;We've successfully sent some messages to our Kafka topic, so the last thing we need to do is read those messages. We can do this by using the &lt;code&gt;kafka-console-consumer.sh&lt;/code&gt; script in the &lt;code&gt;bin&lt;/code&gt; directory. This is a handy tool for consuming messages from the command line.&lt;/p&gt;

&lt;p&gt;Run the console consumer against our topic with the following command:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight shell"&gt;&lt;code&gt;bin/kafka-console-consumer.sh &lt;span class="nt"&gt;--bootstrap-server&lt;/span&gt; localhost:9091,localhost:9092,localhost:9093 &lt;span class="nt"&gt;--topic&lt;/span&gt; &lt;span class="nb"&gt;test&lt;/span&gt; &lt;span class="nt"&gt;--from-beginning&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;We set the &lt;code&gt;--bootstrap-server&lt;/code&gt; argument to a comma-separated list of our brokers; this can be one or all of the brokers. I typically use all brokers for consistency. We also set the argument &lt;code&gt;--topic&lt;/code&gt; to our topic name and pass the &lt;code&gt;--from-beginning&lt;/code&gt; flag to read all messages in the topic. If you don't pass &lt;code&gt;--from-beginning&lt;/code&gt;, you'll only see messages that have been produced since starting the consumer.&lt;/p&gt;

&lt;p&gt;You should see the messages sent earlier appear in the output:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight"&gt;&lt;pre class="highlight plaintext"&gt;&lt;code&gt;hello world, this is my first message
this is a second message
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;



&lt;p&gt;To exit the consumer, use &lt;code&gt;cmd + c&lt;/code&gt; or &lt;code&gt;ctrl + c&lt;/code&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Congrats! You've successfully started a local Kafka cluster, created a topic, sent messages to it with a console producer, and read messages from it with a console consumer. For fun, you can start the console producer and console consumer in separate terminal windows and produce some more messages. You'd then be able to see messages get consumed and printed in real time! Sweet!&lt;/p&gt;

&lt;p&gt;You can stop the Kafka brokers and ZooKeeper node by using &lt;code&gt;cmd + c&lt;/code&gt; or &lt;code&gt;ctrl + c&lt;/code&gt; in their respective terminal windows. I hope this tutorial helped you in getting a local Kafka cluster set up, and now you should be ready to continue on in your Kafka journey!&lt;/p&gt;

</description>
      <category>kafka</category>
      <category>beginners</category>
      <category>tutorial</category>
    </item>
    <item>
      <title>Apache Kafka: An Introduction</title>
      <dc:creator>Shawn Seymour</dc:creator>
      <pubDate>Thu, 16 May 2019 21:39:07 +0000</pubDate>
      <link>https://dev.to/devshawn/apache-kafka-an-introduction-3d4o</link>
      <guid>https://dev.to/devshawn/apache-kafka-an-introduction-3d4o</guid>
      <description>&lt;p&gt;&lt;em&gt;This post was originally posted on my &lt;a href="https://devshawn.com/blog/apache-kafka-introduction/" rel="noopener noreferrer"&gt;personal blog&lt;/a&gt;&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;Apache Kafka is a distributed data streaming platform made for publishing, subscribing to, storing, and processing streams of events or records in real time. It is designed to take in data from multiple sources, store that data in a reliable way, and allow that data to be consumed from multiple systems. It is also designed to handle trillions of events per day. It was originally developed at LinkedIn and is now an open source Apache project.&lt;/p&gt;

&lt;h2&gt;
  
  
  Message Queues
&lt;/h2&gt;

&lt;p&gt;Apache Kafka is an alternative to traditional message queue systems, such as ActiveMQ or RabbitMQ. A &lt;em&gt;message queue&lt;/em&gt; is a form of asynchronous service-to-service communication. It allows a service to send messages to a queue, where another service can then read those messages. Services that write to a queue are typically called &lt;strong&gt;producers&lt;/strong&gt;. Services that subscribe and read from a queue are called &lt;strong&gt;consumers&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;This communication is called &lt;em&gt;asynchronous&lt;/em&gt; because once a service sends the message, it can continue doing other work instead of waiting for a response from another service. In a nutshell, message queues allow a number of systems to pull a message, or a batch of messages, from the end of the queue. Typically, after a message has been read, it is removed from the queue.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdevshawn.com%2Fcontent%2Fimages%2F2019%2F05%2Fmessage-queue.svg" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdevshawn.com%2Fcontent%2Fimages%2F2019%2F05%2Fmessage-queue.svg" alt="Apache Kafka: An Introduction"&gt;&lt;/a&gt;Simple message queue data flow example&lt;/p&gt;

&lt;p&gt;The most general implementation of a message queue is a task list, where a consumer reads a task off the queue and processes it. Multiple consumers can be added to add concurrency and improve task processing speed, but it does &lt;strong&gt;not&lt;/strong&gt; allow for multiple actions to happen based on the &lt;em&gt;same&lt;/em&gt; message. This can be generalized as a list of &lt;em&gt;commands&lt;/em&gt; where each command is only processed by one consumer.&lt;/p&gt;

&lt;p&gt;To improve upon this, the &lt;em&gt;publish/subscribe&lt;/em&gt; model was born (a.k.a. pub/sub). In the pub/sub model, multiple consumers can subscribe to the same queue and each consumer can read the same message independently.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;For example, imagine a queue that provides the latest stock price of a given stock. There could be many systems that would be interested in consuming the latest stock price. Those systems can subscribe to the queue and each system will read the latest stock price, even if another independent system has already read it.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;This can be generalized as a list of &lt;em&gt;events&lt;/em&gt; where each consumer can process every event.&lt;/p&gt;

&lt;h2&gt;
  
  
  Introducing Apache Kafka
&lt;/h2&gt;

&lt;p&gt;Apache Kafka, as stated above, is an alternative messaging system that encompasses the concepts of message queues, pub/sub, and even databases. A producer can publish a record to a &lt;em&gt;topic&lt;/em&gt;, rather than a queue. Consumers can then subscribe to and read messages from that topic. Unlike most message queues, messages from a topic are not deleted once they are consumed; rather, Kafka persists them to disk. This allows you to replay messages and allows a multitude of consumers to apply differing logic to each record or, as in the example above, each &lt;em&gt;event&lt;/em&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Benefits
&lt;/h3&gt;

&lt;p&gt;There are many benefits provided by Apache Kafka that most message queue systems were not built to provide.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Reliability&lt;/strong&gt;: Kafka is distributed, partitioned, replicated, and fault tolerant. We'll explore what this means later on.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Scalability&lt;/strong&gt;: Kafka scales easily to multiple nodes and allows for zero-downtime deployments and upgrades.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Durability&lt;/strong&gt;: Kafka's distributed commit log allows for messages to be persisted on disk.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Performance&lt;/strong&gt;: Kafka's high throughput for publishing and subscribing allows for highly performant distributed systems.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;As described above, Kafka provides a unique range of benefits over traditional message queues or pub/sub systems. Let's dig deeper into the internals of Kafka and how it works.&lt;/p&gt;

&lt;h3&gt;
  
  
  Kafka Terminology
&lt;/h3&gt;

&lt;p&gt;The architecture of Kafka is organized into a few key components. As a distributed system, Kafka runs as a &lt;em&gt;cluster&lt;/em&gt;. Each instance of Kafka within a cluster is called a &lt;em&gt;broker&lt;/em&gt;. All records within Kafka are stored in &lt;em&gt;topics&lt;/em&gt;. Topics are split into &lt;em&gt;partitions&lt;/em&gt; of data; more on that later. Lastly, &lt;em&gt;producers&lt;/em&gt; write to topics and &lt;em&gt;consumers&lt;/em&gt; read from topics.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdevshawn.com%2Fcontent%2Fimages%2F2019%2F05%2Fkafka-overview.svg" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdevshawn.com%2Fcontent%2Fimages%2F2019%2F05%2Fkafka-overview.svg" alt="Apache Kafka: An Introduction"&gt;&lt;/a&gt;Apache Kafka is the backbone of a data streaming platform&lt;/p&gt;

&lt;h3&gt;
  
  
  Commit Log
&lt;/h3&gt;

&lt;p&gt;At the heart of Apache Kafka lies a distributed, immutable commit log, which is quite similar to the &lt;em&gt;git log&lt;/em&gt; we all know and love. Each record published to a topic is committed to the end of a log and assigned a unique, sequential log-entry number. This is also often called a "write-ahead log". Essentially, we get an ordered list of events that tell us two things: &lt;em&gt;what&lt;/em&gt; happened and &lt;em&gt;when&lt;/em&gt; it happened. In distributed systems, agreeing on what happened and in what order is typically the heart of the problem.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdevshawn.com%2Fcontent%2Fimages%2F2019%2F05%2Flog-example.svg" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdevshawn.com%2Fcontent%2Fimages%2F2019%2F05%2Flog-example.svg" alt="Apache Kafka: An Introduction"&gt;&lt;/a&gt;Example of a write-ahead log with a sequential id for each entry&lt;/p&gt;

&lt;p&gt;As a side effect of Kafka topics being based around a commit log, we get &lt;em&gt;durability&lt;/em&gt;. Data is persisted to disk and is available for consumers to read as many times as they would like to. If desired, Kafka can then be used as a source of truth, much like a database.&lt;/p&gt;
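
&lt;p&gt;The append-and-number behavior is easy to sketch in plain shell. This is a toy stand-in for a partition log, not how Kafka actually stores data on disk:&lt;/p&gt;

```shell
# Toy commit log: every record is appended to the end and stamped with the
# next sequential offset, so the log captures what happened and in what order.
log=$(mktemp)
offset=0
for event in "user registered" "email changed" "user deleted"; do
  printf '%d %s\n' "$offset" "$event" >> "$log"
  offset=$((offset + 1))
done
cat "$log"
# 0 user registered
# 1 email changed
# 2 user deleted
```

Because records are only ever appended, any consumer can re-read the log from offset &lt;code&gt;0&lt;/code&gt; and replay history, which is exactly the durability property described above.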

&lt;blockquote&gt;
&lt;p&gt;For example, imagine a &lt;code&gt;users&lt;/code&gt; topic. Each time a new user registers within an application, an event is sent to Kafka. From here, one service can then read from the &lt;code&gt;users&lt;/code&gt; topic and persist it in a database. Another service might read the &lt;code&gt;users&lt;/code&gt; topic and send a welcome email. This allows us to decouple services from one another and often helps implement microservices and event-driven architectures.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h3&gt;
  
  
  Topics and Partitions within Kafka
&lt;/h3&gt;

&lt;p&gt;As described above, Kafka stores data within topics. Topics are then split into partitions. A &lt;em&gt;partition&lt;/em&gt; is an ordered, immutable log of records that is continually appended to. Each record in a partition is assigned a sequential id number, called the &lt;em&gt;offset,&lt;/em&gt; that uniquely identifies the record within the partition. A topic is made up of one or more partitions.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdevshawn.com%2Fcontent%2Fimages%2F2019%2F05%2Ftopic-partition-example.svg" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdevshawn.com%2Fcontent%2Fimages%2F2019%2F05%2Ftopic-partition-example.svg" alt="Apache Kafka: An Introduction"&gt;&lt;/a&gt;Example anatomy of a Kafka topic with three partitions&lt;/p&gt;

&lt;p&gt;Splitting topics into multiple partitions provides multiple benefits:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;Logs can scale larger than the size of one server; each partition must fit on a single server, but a topic with multiple partitions can spread across many servers&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Consumption of topics can be parallelized by having a consumer for each partition of a topic, which we will explain later on&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;A Kafka cluster persists all published records using a configurable retention period. This is true for records that have &lt;strong&gt;and&lt;/strong&gt; have not been consumed. Kafka's performance is effectively constant with respect to the size of the data on disk, so storing data for a long time is not a problem. The retention period can be set based on a length of time or the size of the topic.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;For example, if the retention policy is set to five days, then a record can be consumed for up to five days since being published. After those five days have passed, Kafka will discard the record to free up disk space.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Kafka can also persist data indefinitely based on the key of a message. This is very similar to a database table, where the latest record for each key is stored. This is called &lt;em&gt;log compaction&lt;/em&gt;, and leads to what is called a &lt;code&gt;compacted&lt;/code&gt; topic. Outdated records for a key will eventually be garbage collected and removed from the topic.&lt;/p&gt;
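
&lt;p&gt;The effect of compaction can be sketched with a one-liner: keep only the latest value seen for each key. This is a toy model of the outcome, not of Kafka's actual log cleaner:&lt;/p&gt;

```shell
# Toy log compaction: read "key value" records in order; for each key,
# only the most recently written value survives.
printf 'user1 alice@old.com\nuser2 bob@mail.com\nuser1 alice@new.com\n' |
  awk '{latest[$1] = $2} END {for (k in latest) print k, latest[k]}' |
  sort
# user1 alice@new.com
# user2 bob@mail.com
```

Note that &lt;code&gt;user1&lt;/code&gt;'s first value is gone after compaction; a consumer reading the compacted topic still sees the current state for every key.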

&lt;h3&gt;
  
  
  Distribution and Reliability within Kafka
&lt;/h3&gt;

&lt;p&gt;Each broker holds a set of partitions where each partition is either a &lt;em&gt;leader&lt;/em&gt; or a &lt;em&gt;replica&lt;/em&gt; for a given topic. All writes to and reads from a topic happen through the leader. The leader coordinates updates to replicas when new records are appended to a topic. If a leader fails, a replica takes over as a new leader. Additionally, a replica is said to be &lt;em&gt;in-sync&lt;/em&gt; if all data has been replicated from the leader. By default, only in-sync replicas can become a leader if the leader fails. Out-of-sync replicas can be a sign of broker failure or problems within Kafka.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdevshawn.com%2Fcontent%2Fimages%2F2019%2F05%2Fleader-replication-example.svg" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdevshawn.com%2Fcontent%2Fimages%2F2019%2F05%2Fleader-replication-example.svg" alt="Apache Kafka: An Introduction"&gt;&lt;/a&gt;Example diagram showing replication for a topic with two partitions and a replication factor of three&lt;/p&gt;

&lt;p&gt;By having multiple replicas of a topic, we help ensure data is not lost if a broker fails. For a cluster with &lt;code&gt;n&lt;/code&gt; brokers and topics with a replication factor of &lt;code&gt;n&lt;/code&gt;, Kafka will tolerate up to &lt;code&gt;n-1&lt;/code&gt; server failures before data loss occurs.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;For example, let's say you have a cluster with &lt;code&gt;3&lt;/code&gt; brokers. Imagine a &lt;code&gt;users&lt;/code&gt; topic with a replication factor of &lt;code&gt;3&lt;/code&gt;. If one broker is lost, &lt;code&gt;users&lt;/code&gt; will have &lt;code&gt;2&lt;/code&gt; in-sync replicas and no data loss occurs. Even further, if another broker is lost, &lt;code&gt;users&lt;/code&gt; will have &lt;code&gt;1&lt;/code&gt; replica and there is &lt;strong&gt;still no data loss&lt;/strong&gt;. Impressive!&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Cluster load is managed by distributing partition leadership across multiple brokers within the cluster. This allows Kafka to handle high volumes of reads and writes without putting all the strain on one broker – unless you only have &lt;code&gt;1&lt;/code&gt; broker!&lt;/p&gt;

&lt;h3&gt;
  
  
  Producing to Kafka
&lt;/h3&gt;

&lt;p&gt;Producers publish records to topics of their choosing and are responsible for choosing which partition within that topic each record is assigned to. This can be done in a round-robin fashion to balance load, or according to a semantic partitioning function (such as one based on a key within the record).&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;For example, the default partitioning strategy of the Java client uses a hash of the record's key to choose the partition. This preserves message order for messages with the same key. If the record's key is &lt;code&gt;null&lt;/code&gt;, the Java client partitions the data randomly, which can be useful for easily partitioning high-volume data where order does not matter.&lt;/p&gt;
&lt;/blockquote&gt;
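&lt;p&gt;The hash-then-modulo idea above can be sketched in a few lines of Python. This is only an illustration, not the real client code: the Java producer uses murmur2 hashing, while this sketch uses CRC-32 purely to show the mechanics.&lt;/p&gt;

```python
import random
import zlib

def choose_partition(key, num_partitions):
    """Illustrative partitioner: hash the key when present so that
    records sharing a key always land in the same partition; fall
    back to a random partition when the key is None."""
    if key is None:
        return random.randrange(num_partitions)
    return zlib.crc32(key.encode("utf-8")) % num_partitions

# Same key, same partition -- so per-key ordering is preserved.
p1 = choose_partition("user-42", 6)
p2 = choose_partition("user-42", 6)
```

&lt;p&gt;Because the partition is a pure function of the key, all records for &lt;code&gt;user-42&lt;/code&gt; end up in one partition, which is exactly what preserves their relative order.&lt;/p&gt;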

&lt;h3&gt;
  
  
  Consuming from Kafka
&lt;/h3&gt;

&lt;p&gt;Consumers in Kafka are organized into &lt;em&gt;consumer groups&lt;/em&gt;. A consumer group is a set of consumer instances that consume data from partitions in a topic.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdevshawn.com%2Fcontent%2Fimages%2F2019%2F05%2Fconsumer-groups.svg" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdevshawn.com%2Fcontent%2Fimages%2F2019%2F05%2Fconsumer-groups.svg" alt="Apache Kafka: An Introduction"&gt;&lt;/a&gt;Example consumer group with three consumers reading from a topic with three partitions&lt;/p&gt;

&lt;p&gt;Within a consumer group, each partition is read by exactly one consumer at a time, which allows us to scale the number of consumers up to the number of partitions to increase consumption throughput. The group as a whole then consumes all messages from the entire topic.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;For example, imagine a topic with &lt;code&gt;6&lt;/code&gt; partitions. If you have &lt;code&gt;6&lt;/code&gt; consumers in a consumer group, each consumer will read from &lt;code&gt;1&lt;/code&gt; partition. If you have &lt;code&gt;12&lt;/code&gt;, six of the consumers will be idle while the other six each consume from &lt;code&gt;1&lt;/code&gt; partition. If you have &lt;code&gt;3&lt;/code&gt; consumers, each consumer will read from &lt;code&gt;2&lt;/code&gt; partitions. If you have &lt;code&gt;1&lt;/code&gt; consumer, it will read from all of the partitions.&lt;/p&gt;
&lt;/blockquote&gt;
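&lt;p&gt;The arithmetic in the example above can be modeled as a simple round-robin. Real Kafka assignment is performed by a pluggable assignor via the group coordinator; this sketch only models the resulting spread of partitions over consumers.&lt;/p&gt;

```python
def assign_partitions(num_partitions, consumers):
    """Model of round-robin partition assignment within one consumer
    group: each partition goes to exactly one consumer, and any
    consumers beyond the partition count sit idle."""
    assignment = {c: [] for c in consumers}
    for partition in range(num_partitions):
        owner = consumers[partition % len(consumers)]
        assignment[owner].append(partition)
    return assignment

# 6 partitions, 3 consumers: each consumer reads 2 partitions.
result = assign_partitions(6, ["c1", "c2", "c3"])
```

&lt;p&gt;Running the same model with &lt;code&gt;12&lt;/code&gt; consumers leaves six of them with an empty assignment – the idle consumers from the example.&lt;/p&gt;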

&lt;p&gt;Each consumer group reads from a topic independent of any other consumer group. This allows for many systems (each having their own consumer group) to read every message in the topic, unlike consuming messages from a traditional message queue.&lt;/p&gt;

&lt;p&gt;It's important to note that &lt;strong&gt;ordering&lt;/strong&gt; within a topic is only guaranteed for each partition. Thus, if you care about the order of records, it's important to partition based on something that preserves ordering (such as a primary key) or to only use one partition.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;This post was only a simple introduction to the key concepts of Kafka. We'll dig deeper into the internals of Kafka, the guarantees it makes, real-world use cases, and in-depth tutorials on how to use Kafka in further posts.&lt;/p&gt;

&lt;p&gt;Overall, Kafka is quickly becoming the backbone of many organizations' data pipelines. It allows for massive message throughput while maintaining stability. It enables decoupling of producers and consumers for a flexible and adaptive architecture. Lastly, it provides reliability, consistency, and durability guarantees that many traditional message queue systems do not. I hope you enjoyed learning about how Kafka can be a useful tool when building large-scale data platforms!&lt;/p&gt;

</description>
      <category>kafka</category>
      <category>architecture</category>
      <category>beginners</category>
    </item>
    <item>
      <title>Kafka Shell - Supercharge Your Apache Kafka CLI</title>
      <dc:creator>Shawn Seymour</dc:creator>
      <pubDate>Wed, 20 Mar 2019 16:37:24 +0000</pubDate>
      <link>https://dev.to/devshawn/kafka-shell---supercharge-your-apache-kafka-cli-2keg</link>
      <guid>https://dev.to/devshawn/kafka-shell---supercharge-your-apache-kafka-cli-2keg</guid>
      <description>&lt;h1&gt;
  
  
  Kafka Shell
&lt;/h1&gt;

&lt;p&gt;Are you working with the Apache Kafka command line tools? Ever had trouble remembering what options are available, or remembering URLs for your clusters? Kafka shell to the rescue!&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/devshawn/kafka-shell" rel="noopener noreferrer"&gt;Kafka Shell&lt;/a&gt; is a supercharged, interactive Kafka shell that is built on top of the existing Kafka command line tools. It features auto completion, auto suggestion from history, key commands, and much more. It's an open source project I just released, built with Python.&lt;/p&gt;

&lt;h2&gt;
  
  
  Features
&lt;/h2&gt;

&lt;p&gt;Auto completion of Kafka commands, options, and configuration options.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://camo.githubusercontent.com/33e27a2cab87d8e8eddeb1ba10e6cc2f75ec980f/68747470733a2f2f692e696d6775722e636f6d2f666b777a4f6b762e706e67" class="article-body-image-wrapper"&gt;&lt;img src="https://camo.githubusercontent.com/33e27a2cab87d8e8eddeb1ba10e6cc2f75ec980f/68747470733a2f2f692e696d6775722e636f6d2f666b777a4f6b762e706e67" alt="Auto Completion"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Configuration of clusters, schema registries, and properties that will be automatically added to commands being run.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://camo.githubusercontent.com/5202bd45b24b607602fd10a7c80e0a8ebec13336/68747470733a2f2f692e696d6775722e636f6d2f334a6a4978794c2e706e67" class="article-body-image-wrapper"&gt;&lt;img src="https://camo.githubusercontent.com/5202bd45b24b607602fd10a7c80e0a8ebec13336/68747470733a2f2f692e696d6775722e636f6d2f334a6a4978794c2e706e67" alt="Configuration"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Supported Commands
&lt;/h2&gt;

&lt;p&gt;Kafka shell currently supports most of the popular Kafka command line tools, such as &lt;code&gt;kafka-topics&lt;/code&gt;, &lt;code&gt;kafka-console-consumer&lt;/code&gt;, and more. I plan to add the rest after I get some initial feedback!&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;kafka-topics&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;kafka-configs&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;kafka-console-consumer&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;kafka-console-producer&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;kafka-avro-console-consumer&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;kafka-avro-console-producer&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;kafka-verifiable-consumer&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;kafka-verifiable-producer&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;kafka-broker-api-versions&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;kafka-consumer-groups&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;kafka-delete-records&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;kafka-log-dirs&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;kafka-dump-log&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;kafka-acls&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;ksql&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Get It Now
&lt;/h2&gt;

&lt;p&gt;Let me know what you think -- and I hope this can help improve working with Apache Kafka. :) &lt;/p&gt;


&lt;div class="ltag-github-readme-tag"&gt;
  &lt;div class="readme-overview"&gt;
    &lt;h2&gt;
      &lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev.to%2Fassets%2Fgithub-logo-5a155e1f9a670af7944dd5e12375bc76ed542ea80224905ecaf878b9157cdefc.svg" alt="GitHub logo"&gt;
      &lt;a href="https://github.com/devshawn" rel="noopener noreferrer"&gt;
        devshawn
      &lt;/a&gt; / &lt;a href="https://github.com/devshawn/kafka-shell" rel="noopener noreferrer"&gt;
        kafka-shell
      &lt;/a&gt;
    &lt;/h2&gt;
    &lt;h3&gt;
      ⚡A supercharged, interactive Kafka shell built on top of the existing Kafka CLI tools.
    &lt;/h3&gt;
  &lt;/div&gt;
  &lt;div class="ltag-github-body"&gt;
    
&lt;div id="readme" class="md"&gt;
&lt;div class="markdown-heading"&gt;
&lt;h1 class="heading-element"&gt;kafka-shell&lt;/h1&gt;
&lt;/div&gt;
&lt;p&gt;&lt;a href="https://travis-ci.org/devshawn/kafka-shell" rel="nofollow noopener noreferrer"&gt;&lt;img src="https://camo.githubusercontent.com/d6ae0496f43511681bf420783146d97287e3124c35aa5073d8280b25d2a375e2/68747470733a2f2f7472617669732d63692e6f72672f646576736861776e2f6b61666b612d7368656c6c2e7376673f6272616e63683d6d6173746572" alt="Build Status"&gt;&lt;/a&gt; &lt;a href="https://codecov.io/gh/devshawn/kafka-shell" rel="nofollow noopener noreferrer"&gt;&lt;img src="https://camo.githubusercontent.com/ffee18c51aae9674ef1c68a8d6f37ac4296c66e42b0bca26224b339092c17684/68747470733a2f2f636f6465636f762e696f2f67682f646576736861776e2f6b61666b612d7368656c6c2f6272616e63682f6d61737465722f67726170682f62616467652e737667" alt="codecov"&gt;&lt;/a&gt; &lt;a rel="noopener noreferrer nofollow" href="https://camo.githubusercontent.com/1f004e73a074f98fe69abd4ab61842f08aef7c4c51dda76fcd2fcdde4372d2c4/68747470733a2f2f696d672e736869656c64732e696f2f707970692f762f6b61666b612d7368656c6c2e7376673f636f6c6f723d626c7565"&gt;&lt;img src="https://camo.githubusercontent.com/1f004e73a074f98fe69abd4ab61842f08aef7c4c51dda76fcd2fcdde4372d2c4/68747470733a2f2f696d672e736869656c64732e696f2f707970692f762f6b61666b612d7368656c6c2e7376673f636f6c6f723d626c7565" alt="PyPI"&gt;&lt;/a&gt; &lt;a rel="noopener noreferrer nofollow" href="https://camo.githubusercontent.com/a80cc2d49f6b15bd2520a5acd34b9fb572bd45b29d523a69fcbe475e6a5cbd34/68747470733a2f2f696d672e736869656c64732e696f2f707970692f707976657273696f6e732f6b61666b612d7368656c6c2e737667"&gt;&lt;img src="https://camo.githubusercontent.com/a80cc2d49f6b15bd2520a5acd34b9fb572bd45b29d523a69fcbe475e6a5cbd34/68747470733a2f2f696d672e736869656c64732e696f2f707970692f707976657273696f6e732f6b61666b612d7368656c6c2e737667" alt="PyPI - Python Version"&gt;&lt;/a&gt; &lt;a href="https://github.com/devshawn/kafka-shellLICENSE" rel="noopener noreferrer"&gt;&lt;img 
src="https://camo.githubusercontent.com/859a1a0bc85ce8bbd7a730a274fec5c9e77c4726ffdf6aa762a78685e26033a4/68747470733a2f2f696d672e736869656c64732e696f2f62616467652f4c6963656e73652d417061636865253230322e302d626c75652e737667" alt="License"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;A supercharged, interactive Kafka shell built on top of the existing Kafka CLI tools.&lt;/p&gt;
&lt;p&gt;
    &lt;a rel="noopener noreferrer nofollow" href="https://camo.githubusercontent.com/8cf8065a3fe5e43f08aeac082f0f429c73e7f30ce95017210252e948f79951de/68747470733a2f2f692e696d6775722e636f6d2f62316f4e545a5a2e706e67"&gt;&lt;img src="https://camo.githubusercontent.com/8cf8065a3fe5e43f08aeac082f0f429c73e7f30ce95017210252e948f79951de/68747470733a2f2f692e696d6775722e636f6d2f62316f4e545a5a2e706e67"&gt;&lt;/a&gt;
&lt;/p&gt;

&lt;p&gt;Kafka shell allows you to configure a list of clusters, and properties such as &lt;code&gt;--bootstrap-server&lt;/code&gt; and &lt;code&gt;--zookeeper&lt;/code&gt; for the currently selected cluster will automatically be added when the command is run. No more remembering long server addresses or ports!&lt;/p&gt;

&lt;div class="markdown-heading"&gt;
&lt;h2 class="heading-element"&gt;Installation&lt;/h2&gt;
&lt;/div&gt;

&lt;p&gt;Kafka shell requires &lt;code&gt;python&lt;/code&gt; and &lt;code&gt;pip&lt;/code&gt;. Kafka shell is a wrapper over the existing Kafka command-line tools, so
those must exist within your &lt;code&gt;PATH&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;You can install kafka-shell using pip:&lt;/p&gt;
&lt;div class="highlight highlight-source-shell notranslate position-relative overflow-auto js-code-highlight"&gt;
&lt;pre&gt;pip install kafka-shell&lt;/pre&gt;

&lt;/div&gt;
&lt;div class="markdown-heading"&gt;
&lt;h2 class="heading-element"&gt;Usage&lt;/h2&gt;
&lt;/div&gt;
&lt;p&gt;Kafka shell is an interactive shell. You can run it from the terminal by:&lt;/p&gt;
&lt;div class="highlight highlight-source-shell notranslate position-relative overflow-auto js-code-highlight"&gt;
&lt;pre&gt;kafka-shell&lt;/pre&gt;

&lt;/div&gt;
&lt;p&gt;From here, you can start typing &lt;code&gt;kafka&lt;/code&gt; and the autocomplete will kick in.&lt;/p&gt;
&lt;p&gt;&lt;strong&gt;Key Commands&lt;/strong&gt;&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Change Cluster&lt;/strong&gt;: The cluster that commands are run against can be cycled through by pressing &lt;code&gt;F2&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Fuzzy Search&lt;/strong&gt;: By default, fuzzy search of commands is enabled…&lt;/li&gt;
&lt;/ul&gt;
&lt;/div&gt;
  &lt;/div&gt;
  &lt;div class="gh-btn-container"&gt;&lt;a class="gh-btn" href="https://github.com/devshawn/kafka-shell" rel="noopener noreferrer"&gt;View on GitHub&lt;/a&gt;&lt;/div&gt;
&lt;/div&gt;


</description>
      <category>kafka</category>
      <category>apache</category>
      <category>cli</category>
      <category>showdev</category>
    </item>
  </channel>
</rss>
