ABOUT
In this article, we will cover how to automate Kafka topic creation based on user input, without enabling the auto.create.topics.enable setting in Kafka. To achieve this, we will use:
- Cloud platform : AWS
- IaC : Terraform
- Services : S3, EC2, IAM
- Scripting : Shell
WHAT ARE KAFKA TOPICS?
A topic is the base unit of the Kafka system: it holds a stream of related data, distributed across partitions (each of which can be treated as an ordered queue).
WHY IS THIS AUTOMATION REQUIRED?
Having automation in place to create topics based on user input removes the need to enable auto.create.topics.enable.
This provides us with the following benefits:
- Prevents inadvertently creating new topics, especially due to typos.
- Allows creation of topics with custom configurations for partitions, replication factor, etc.
- Minimizes costs associated with unused topics.
APPROACH: IN A NUTSHELL…
Terraform takes the topic configuration as input from the user, renders a shell script from it, uploads the script to an S3 bucket, and launches a temporary EC2 instance whose user data downloads and executes the script against the Kafka cluster.
DETAILED STEPS TO ACHIEVE AUTOMATION
Step 1: Shell script to create Kafka topic via Kafka CLI commands.
For reference,
- Create a variable in Terraform to take topic-related configuration as input from the user (an example of supplying it is sketched after the block below)
variable "topic_config" {
description = <<EOF
ex:
topic_config = [{
name = "topic_sample",
replication = 2,
partitions = 1,
retention = 604800000
}]
EOF
type = list(object({
name = string
replication = optional(number, 3)
partitions = optional(number, 1)
retention = optional(number, 604800000)
}))
}
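For example, the configuration could be supplied via a terraform.tfvars file; the topic names and values below are purely illustrative, and any omitted fields fall back to the defaults declared above:
topic_config = [
  {
    name        = "orders_events"
    replication = 2
    partitions  = 3
    retention   = 259200000 // 3 days, in ms
  },
  {
    name = "audit_logs" // replication, partitions and retention use the defaults
  }
]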
- Create a shell script template (or a language of your choice) that installs the Kafka CLI and creates the topics. The values for its variables are passed in from a data block when the script template is rendered in Terraform.
#!/bin/bash
# Install dependencies
sudo yum -y install java-11
# Install Kafka CLI
cd /opt
wget https://archive.apache.org/dist/kafka/${KAFKA_VERSION}/kafka_2.13-${KAFKA_VERSION}.tgz
tar -xzf kafka_2.13-${KAFKA_VERSION}.tgz
cd kafka_2.13-${KAFKA_VERSION}/libs
wget https://github.com/aws/aws-msk-iam-auth/releases/download/v1.1.1/aws-msk-iam-auth-1.1.1-all.jar
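# The kafka-topics command below reads MSK IAM auth settings from client.properties.
# A minimal sketch of that file, assuming IAM authentication is enabled on the MSK cluster:
cat > /opt/kafka_2.13-${KAFKA_VERSION}/bin/client.properties <<'PROPS'
security.protocol=SASL_SSL
sasl.mechanism=AWS_MSK_IAM
sasl.jaas.config=software.amazon.msk.auth.iam.IAMLoginModule required;
sasl.client.callback.handler.class=software.amazon.msk.auth.iam.IAMClientCallbackHandler
PROPS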
# Extract details from variables
data=$(echo '${TOPIC_DATA}' | sed 's/\[//;s/\]//;s/},{/}\n{/g')
# Loop through JSON objects
while IFS= read -r line; do
topic_name=$(echo "$line" | grep -o '"name":"[^"]*' | cut -d '"' -f 4)
replication=$(echo "$line" | grep -o '"replication":[0-9]*' | awk -F ':' '{print $2}')
partition=$(echo "$line" | grep -o '"partitions":[0-9]*' | awk -F ':' '{print $2}')
retention=$(echo "$line" | grep -o '"retention":[0-9]*' | awk -F ':' '{print $2}')
# Create kafka topic (skip any blank lines produced by the split above)
if [ -n "$topic_name" ]; then
  /opt/kafka_2.13-${KAFKA_VERSION}/bin/kafka-topics.sh \
    --create \
    --bootstrap-server ${BROKER_URI} \
    --command-config /opt/kafka_2.13-${KAFKA_VERSION}/bin/client.properties \
    --replication-factor $replication \
    --partitions $partition \
    --config retention.ms=$retention \
    --topic $topic_name
fi
done <<< "$data"
shutdown -h now  # After execution, shut down the EC2 instance to save cost
- Use a data block in Terraform to pass the user input on to the shell script template created above for Kafka topic creation
data "template_file" "user_data" {
template = file("${path.module}/templates/kafka_scripts.sh.tpl") // Path in your terraform script where the above created shell script is stored, followed by name of your script.
vars = {
TOPIC_DATA = jsonencode(var.topic_config) // user input
KAFKA_VERSION = var.msk_version // provide value according to MSK managed instance used on AWS
BROKER_URI = var.broker_uri // Broker URI can be fetched from AWS Console. AWS Managed Instance --> View Client Information
}
}
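Note: the hashicorp/template provider behind template_file is deprecated; on current Terraform versions the built-in templatefile() function renders the same template. A minimal sketch with the same variables (references to data.template_file.user_data.rendered would then become local.user_data):
locals {
  user_data = templatefile("${path.module}/templates/kafka_scripts.sh.tpl", {
    TOPIC_DATA    = jsonencode(var.topic_config)
    KAFKA_VERSION = var.msk_version
    BROKER_URI    = var.broker_uri
  })
}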
Step 2: Create an S3 bucket & upload the script to it
For reference, s3 bucket creation via terraform:
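- Create the S3 bucket itself if it is not already managed elsewhere; a minimal sketch, with an illustrative bucket name:
resource "aws_s3_bucket" "kafka_scripts" {
  bucket = "my-kafka-topic-scripts" // illustrative; bucket names must be globally unique
}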
- Create an S3 bucket object to upload the script (the resource name "kafka_script" below is illustrative; on AWS provider v4+ the resource is aws_s3_object, formerly aws_s3_bucket_object)
resource "aws_s3_object" "kafka_script" {
  bucket  = <enter your s3 bucket id, where the script needs to be stored>
  key     = "script.sh"
  content = data.template_file.user_data.rendered // Rendered form of the script template created above
}
Step 3: Create an EC2 instance and configure the user data to copy the script from S3 Bucket
[Tip] The EC2 instance type does not need to be large, as the instance is created only temporarily to execute the script
[Tip] Delete the instance once the script has executed to save costs
For reference, EC2 instance creation via terraform:
- Update the user data of the EC2 instance to copy the Kafka topic creation shell script from the S3 bucket and execute it
resource "aws_instance" "ephemeral_instance" {
. . .
instance_type = "t3a.medium"
# Terminate instance on shutdown
instance_initiated_shutdown_behavior = "terminate"
user_data = <<-EOF
#!/bin/bash
# Note: user data already runs as root, so switching users is not needed
aws s3 cp s3://<your-bucket-name>/<kafka-script-name>.sh .
chmod +x <kafka-script-name>.sh
./<kafka-script-name>.sh
EOF
. . .
}
PERMISSIONS
[Tip] Follow least privilege best practice - only allow actions that are necessary
The following permissions have to be granted for the EC2 instance and the S3 bucket to interact
- Allow EC2 to copy files from the S3 bucket (attach this to the IAM role assumed by the EC2 instance; a sketch of wiring the role and instance profile is shown after the action list below)
data "aws_iam_policy_document" "s3_allow" {
statement {
effect = "Allow"
actions = var.allowed_actions
resources = [
var.s3_bucket_arn, //arn of the s3 bucket where script is uploaded
"${var.s3_bucket_arn}/*", // Allow all objects inside it too
]
}
}
Where,
allowed_actions = [ // Modify the action list as per your requirement
"s3:GetObject",
"s3:PutObject",
"s3:ListBucket",
"s3:ListMultipartUploadParts",
"s3:ListBucketMultipartUploads",
"s3:AbortMultipartUpload"
]
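For completeness, a minimal sketch of wiring this policy document to the EC2 instance; the resource and policy names below are illustrative:
resource "aws_iam_role" "ephemeral_instance" {
  name = "kafka-topic-script-role"
  // Allow EC2 to assume this role
  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect    = "Allow"
      Action    = "sts:AssumeRole"
      Principal = { Service = "ec2.amazonaws.com" }
    }]
  })
}
resource "aws_iam_role_policy" "s3_allow" {
  name   = "allow-s3-script-access"
  role   = aws_iam_role.ephemeral_instance.id
  policy = data.aws_iam_policy_document.s3_allow.json
}
resource "aws_iam_instance_profile" "ephemeral_instance" {
  name = "kafka-topic-script-profile"
  role = aws_iam_role.ephemeral_instance.name
}
The EC2 instance from Step 3 then references the profile via iam_instance_profile = aws_iam_instance_profile.ephemeral_instance.name.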
- Update the S3 bucket policy to allow these actions for the IAM role attached to the EC2 instance (principal)
data "aws_iam_policy_document" "s3_bucket_policy" {
statement {
sid = "Allows3actions"
effect = "Allow"
actions = var.allowed_actions //specified above
resources = [
"arn:aws:s3:::<name-of-your-s3-bucket>",
"arn:aws:s3:::<name-of-your-s3-bucket>/*" // include objects inside bucket
]
principals {
type        = "AWS"
identifiers = [var.ec2_role_arn] // illustrative variable: ARN of the IAM role attached to the EC2 instance (the ec2.amazonaws.com service principal would not match requests signed with the instance role's credentials)
}
}
}
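To apply this policy document to the bucket, attach it with an aws_s3_bucket_policy resource; a minimal sketch, with the bucket reference kept as a placeholder:
resource "aws_s3_bucket_policy" "script_bucket" {
  bucket = "<name-of-your-s3-bucket>" // the bucket where the script is uploaded
  policy = data.aws_iam_policy_document.s3_bucket_policy.json
}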
CONCLUSION
In summary, automating Kafka topic creation on an AWS Managed Kafka cluster with Terraform offers a reliable solution for managing your Kafka environment. By using AWS services like S3, EC2, and IAM, you can avoid the risks of unintended topic creation, customize configurations, and reduce costs associated with unused topics. This approach not only streamlines your data management but also enhances control and flexibility, setting the stage for a more efficient and scalable Kafka implementation.