ABOUT
In this article, we will cover how to automate Kafka topic creation based on user input, without enabling the auto.create.topics.enable setting in Kafka. To achieve this, we will use:
- Cloud platform : AWS
- IaC : Terraform
- Services : S3, EC2, IAM
- Scripting : Shell
WHAT ARE KAFKA TOPICS?
A topic is the base unit of the Kafka system: it holds a stream of related data, distributed across partitions (each of which can be treated as an ordered queue).
WHY IS THIS AUTOMATION REQUIRED?
Having automation in place to create topics based on user input removes the need to enable auto.create.topics.enable.
This provides us with the following benefits:
- Prevents inadvertently creating new topics, especially due to typos.
- Allows creation of topics with custom configurations for partitions, replication factor, etc.
- Minimizes costs associated with unused topics.
APPROACH: IN A NUTSHELL…
Terraform takes the topic configuration as input from the user, renders a shell script from it, uploads the script to an S3 bucket, and launches a temporary EC2 instance whose user data downloads and executes the script against the Kafka cluster.
DETAILED STEPS TO ACHIEVE AUTOMATION
Step 1: Shell script to create Kafka topic via Kafka CLI commands.
For reference,
- Create a variable in Terraform to take topic-related configuration as input from the user (an example of supplying it is sketched after the block below)
variable "topic_config" {
description = <<EOF
ex:
topic_config = [{
name = "topic_sample",
replication = 2,
partitions = 1,
retention = 604800000
}]
EOF
type = list(object({
name = string
replication = optional(number, 3)
partitions = optional(number, 1)
retention = optional(number, 604800000)
}))
}
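For example, the configuration could be supplied via a terraform.tfvars file; the topic names and values below are purely illustrative, and any omitted fields fall back to the defaults declared above:
topic_config = [
  {
    name        = "orders_events"
    replication = 2
    partitions  = 3
    retention   = 259200000 // 3 days, in ms
  },
  {
    name = "audit_logs" // replication, partitions and retention use the defaults
  }
]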
- Create a shell script template (or a language of your choice) that installs the Kafka CLI and creates the topics. The values for its variables are passed in from a data block when the script template is rendered in Terraform.
#!/bin/bash
# Install dependencies
sudo yum -y install java-11
# Install Kafka CLI
cd /opt
wget https://archive.apache.org/dist/kafka/${KAFKA_VERSION}/kafka_2.13-${KAFKA_VERSION}.tgz
tar -xzf kafka_2.13-${KAFKA_VERSION}.tgz
cd kafka_2.13-${KAFKA_VERSION}/libs
wget https://github.com/aws/aws-msk-iam-auth/releases/download/v1.1.1/aws-msk-iam-auth-1.1.1-all.jar
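# The kafka-topics command below reads MSK IAM auth settings from client.properties.
# A minimal sketch of that file, assuming IAM authentication is enabled on the MSK cluster:
cat > /opt/kafka_2.13-${KAFKA_VERSION}/bin/client.properties <<'PROPS'
security.protocol=SASL_SSL
sasl.mechanism=AWS_MSK_IAM
sasl.jaas.config=software.amazon.msk.auth.iam.IAMLoginModule required;
sasl.client.callback.handler.class=software.amazon.msk.auth.iam.IAMClientCallbackHandler
PROPS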
# Extract details from variables
data=$(echo '${TOPIC_DATA}' | sed 's/\[//;s/\]//;s/},{/}\n{/g')
# Loop through JSON objects
while IFS= read -r line; do
topic_name=$(echo "$line" | grep -o '"name":"[^"]*' | cut -d '"' -f 4)
replication=$(echo "$line" | grep -o '"replication":[0-9]*' | awk -F ':' '{print $2}')
partition=$(echo "$line" | grep -o '"partitions":[0-9]*' | awk -F ':' '{print $2}')
retention=$(echo "$line" | grep -o '"retention":[0-9]*' | awk -F ':' '{print $2}')
# Create kafka topic (skip any blank lines produced by the split above)
if [ -n "$topic_name" ]; then
  /opt/kafka_2.13-${KAFKA_VERSION}/bin/kafka-topics.sh \
    --create \
    --bootstrap-server ${BROKER_URI} \
    --command-config /opt/kafka_2.13-${KAFKA_VERSION}/bin/client.properties \
    --replication-factor $replication \
    --partitions $partition \
    --config retention.ms=$retention \
    --topic $topic_name
fi
done <<< "$data"
shutdown -h now  # After execution, shut down the EC2 instance to save cost
- Use a data block in Terraform to pass the user input on to the shell script template created above for Kafka topic creation
data "template_file" "user_data" {
template = file("${path.module}/templates/kafka_scripts.sh.tpl") // Path in your terraform script where the above created shell script is stored, followed by name of your script.
vars = {
TOPIC_DATA = jsonencode(var.topic_config) // user input
KAFKA_VERSION = var.msk_version // provide value according to MSK managed instance used on AWS
BROKER_URI = var.broker_uri // Broker URI can be fetched from AWS Console. AWS Managed Instance --> View Client Information
}
}
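Note: the hashicorp/template provider behind template_file is deprecated; on current Terraform versions the built-in templatefile() function renders the same template. A minimal sketch with the same variables (references to data.template_file.user_data.rendered would then become local.user_data):
locals {
  user_data = templatefile("${path.module}/templates/kafka_scripts.sh.tpl", {
    TOPIC_DATA    = jsonencode(var.topic_config)
    KAFKA_VERSION = var.msk_version
    BROKER_URI    = var.broker_uri
  })
}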
Step 2: Create an S3 bucket & upload the script to it
For reference, s3 bucket creation via terraform:
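- Create the S3 bucket itself if it is not already managed elsewhere; a minimal sketch, with an illustrative bucket name:
resource "aws_s3_bucket" "kafka_scripts" {
  bucket = "my-kafka-topic-scripts" // illustrative; bucket names must be globally unique
}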
- Create an S3 bucket object to upload the script (the resource name "kafka_script" below is illustrative; on AWS provider v4+ the resource is aws_s3_object, formerly aws_s3_bucket_object)
resource "aws_s3_object" "kafka_script" {
  bucket  = <enter your s3 bucket id, where the script needs to be stored>
  key     = "script.sh"
  content = data.template_file.user_data.rendered // Rendered form of the script template created above
}
Step 3: Create an EC2 instance and configure the user data to copy the script from S3 Bucket
[Tip] The EC2 instance type does not need to be large, as the instance is created only temporarily to execute the script
[Tip] Delete the instance once the script has executed to save costs
For reference, EC2 instance creation via terraform:
- Update the user data of the EC2 instance to copy the Kafka topic creation shell script from the S3 bucket and execute it
resource "aws_instance" "ephemeral_instance" {
. . .
instance_type = "t3a.medium"
# Terminate instance on shutdown
instance_initiated_shutdown_behavior = "terminate"
user_data = <<-EOF
#!/bin/bash
# Note: user data already runs as root, so switching users is not needed
aws s3 cp s3://<your-bucket-name>/<kafka-script-name>.sh .
chmod +x <kafka-script-name>.sh
./<kafka-script-name>.sh
EOF
. . .
}
PERMISSIONS
[Tip] Follow least privilege best practice - only allow actions that are necessary
The following permissions have to be granted for the EC2 instance and the S3 bucket to interact
- Allow EC2 to copy files from the S3 bucket (attach this to the IAM role assumed by the EC2 instance; a sketch of wiring the role and instance profile is shown after the action list below)
data "aws_iam_policy_document" "s3_allow" {
statement {
effect = "Allow"
actions = var.allowed_actions
resources = [
var.s3_bucket_arn, //arn of the s3 bucket where script is uploaded
"${var.s3_bucket_arn}/*", // Allow all objects inside it too
]
}
}
Where,
allowed_actions = [ // Modify the action list as per your requirement
"s3:GetObject",
"s3:PutObject",
"s3:ListBucket",
"s3:ListMultipartUploadParts",
"s3:ListBucketMultipartUploads",
"s3:AbortMultipartUpload"
]
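For completeness, a minimal sketch of wiring this policy document to the EC2 instance; the resource and policy names below are illustrative:
resource "aws_iam_role" "ephemeral_instance" {
  name = "kafka-topic-script-role"
  // Allow EC2 to assume this role
  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect    = "Allow"
      Action    = "sts:AssumeRole"
      Principal = { Service = "ec2.amazonaws.com" }
    }]
  })
}
resource "aws_iam_role_policy" "s3_allow" {
  name   = "allow-s3-script-access"
  role   = aws_iam_role.ephemeral_instance.id
  policy = data.aws_iam_policy_document.s3_allow.json
}
resource "aws_iam_instance_profile" "ephemeral_instance" {
  name = "kafka-topic-script-profile"
  role = aws_iam_role.ephemeral_instance.name
}
The EC2 instance from Step 3 then references the profile via iam_instance_profile = aws_iam_instance_profile.ephemeral_instance.name.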
- Update the S3 bucket policy to allow these actions for the IAM role attached to the EC2 instance (principal)
data "aws_iam_policy_document" "s3_bucket_policy" {
statement {
sid = "Allows3actions"
effect = "Allow"
actions = var.allowed_actions //specified above
resources = [
"arn:aws:s3:::<name-of-your-s3-bucket>",
"arn:aws:s3:::<name-of-your-s3-bucket>/*" // include objects inside bucket
]
principals {
type        = "AWS"
identifiers = [var.ec2_role_arn] // illustrative variable: ARN of the IAM role attached to the EC2 instance (the ec2.amazonaws.com service principal would not match requests signed with the instance role's credentials)
}
}
}
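To apply this policy document to the bucket, attach it with an aws_s3_bucket_policy resource; a minimal sketch, with the bucket reference kept as a placeholder:
resource "aws_s3_bucket_policy" "script_bucket" {
  bucket = "<name-of-your-s3-bucket>" // the bucket where the script is uploaded
  policy = data.aws_iam_policy_document.s3_bucket_policy.json
}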
CONCLUSION
In summary, automating Kafka topic creation on an AWS Managed Kafka cluster with Terraform offers a reliable solution for managing your Kafka environment. By using AWS services like S3, EC2, and IAM, you can avoid the risks of unintended topic creation, customize configurations, and reduce costs associated with unused topics. This approach not only streamlines your data management but also enhances control and flexibility, setting the stage for a more efficient and scalable Kafka implementation.