DEV Community

poojave


All about Kafka

This post gives a brief introduction to Kafka and is suitable for a beginner audience. Moving ahead, I will be sharing more knowledge on top of it.

What should you expect from this post?

Background - Why is Kafka needed?
What does it take to understand Kafka?
Downsides of using Kafka
How does Kafka work?
Best practices
How can you become more familiar with Kafka?

1. Background - Why is Kafka needed?

"Realtime" "Ordering" "Persistence" "Scalability" "Distributed" are the core requirement use cases of Kafka.

A few examples of such use cases:

  • Financial transaction in stock exchanges
  • Building logging/analytics systems
  • Chat application
  • Flash sales

2. What does it take to understand Kafka?

Kafka was first developed at LinkedIn and later donated to Apache as an open-source project.
How did we operate before Kafka? We somehow used to manage with messaging queues only.
How are messaging queues different from Kafka?

  • No flexibility in the data retention period or persistence: a queue message is typically gone once consumed, whereas Kafka retains messages for a configurable period.
  • Kafka clients can read messages at their own convenience, unlike with a queue.
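The contrast can be sketched in a few lines of plain Python (a toy in-memory model, not real Kafka): an append-only log where each consumer keeps its own offset, so two clients can read the same data independently instead of competing for messages.

```python
# Toy sketch of Kafka's log-with-offsets model (not the real client API).

class Log:
    """Append-only log: messages are retained, never removed on read."""
    def __init__(self):
        self.messages = []

    def append(self, msg):
        self.messages.append(msg)

class Consumer:
    """Each consumer tracks its own read position (offset) in the log."""
    def __init__(self, log):
        self.log = log
        self.offset = 0

    def poll(self):
        batch = self.log.messages[self.offset:]
        self.offset = len(self.log.messages)
        return batch

log = Log()
for m in ["order-1", "order-2", "order-3"]:
    log.append(m)

fast, slow = Consumer(log), Consumer(log)
print(fast.poll())   # ['order-1', 'order-2', 'order-3']
slow.offset = 1      # a late client can start reading wherever it likes
print(slow.poll())   # ['order-2', 'order-3'] -- same data, read independently
```

In a traditional queue, the first consumer's read would have removed the messages; here both consumers see the full retained log at their own pace.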

This interface helps in understanding Kafka.

Basic demo
Data on a local setup
Use this as a reference

3. Downsides of using Kafka

  • Difficult to manage in production.
  • Difficult to manage during migration from on-premise to the cloud, or from one cloud provider to another.
  • In some cases you need experts such as Kafka developers or database developers. Hence, choose Kafka wisely, only when you see a specific need for it and the proper scale.

4. How does Kafka work?

Kafka comes in two flavours:

  1. [FREE] Self-managed - Apache Kafka
  2. [PAID] Fully managed - Confluent Kafka


Here Consumer Group 1 has two consumers, A and B, and the topic's partitions are distributed between them.
Consumer Group 2 has only one consumer, which subscribes to all the partitions.
To make Kafka highly available, the partitions are distributed across brokers and replicated copies of the data are maintained. This brings in the leader and follower concept, which essentially means all updates are first received by the leader and then propagated to the followers.
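The consumer-group behaviour above can be sketched in plain Python (a simplified round-robin assignment, not Kafka's actual assignor implementation): each partition goes to exactly one consumer within a group, so a two-consumer group splits the partitions while a one-consumer group takes them all.

```python
# Toy round-robin partition assignment within a consumer group
# (a simplification of what Kafka's group coordinator does).

def assign(partitions, consumers):
    """Give each partition to exactly one consumer in the group."""
    assignment = {c: [] for c in consumers}
    for i, p in enumerate(partitions):
        assignment[consumers[i % len(consumers)]].append(p)
    return assignment

partitions = [0, 1, 2, 3]
print(assign(partitions, ["A", "B"]))  # {'A': [0, 2], 'B': [1, 3]}
print(assign(partitions, ["C"]))       # {'C': [0, 1, 2, 3]}
```

Note that within one group a partition is never shared, which is how Kafka preserves per-partition ordering for each group.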

If the leader broker goes down, a new leader is elected from among the follower brokers. Please note that this is one of the responsibilities of ZooKeeper.

Kafka Simulation

5. Best practices

  1. Replication factor
  2. Partition count

A higher replication factor and a huge partition count will demand more CPU and memory. Be conscious when choosing them.
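As a rough illustration in plain Python (the topic name and values here are made up, and in practice these settings are passed to the `kafka-topics` CLI or an admin client), the two knobs multiply into the total number of partition replicas the cluster must host:

```python
# Illustrative topic settings only, not a real Kafka API call.
topic = {
    "name": "orders",          # hypothetical topic name
    "partitions": 6,           # more partitions => more parallelism, more overhead
    "replication_factor": 3,   # copies of each partition across brokers
}

# Every partition is stored replication_factor times across the cluster:
replicas = topic["partitions"] * topic["replication_factor"]
print(replicas)  # 18
```

Each of those 18 replicas costs disk, memory, and replication traffic on some broker, which is why both numbers should be chosen deliberately.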

  3. Retention period

The retention period can vary from minutes to hours to days, and in some cases it can even be infinite. But longer retention consumes extra disk on the brokers.
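For reference, retention is controlled by topic-level settings such as `retention.ms` (time-based) and `retention.bytes` (size-based). A minimal sketch, expressed as a plain Python dict purely for illustration:

```python
# Illustrative topic-level retention settings (not a live Kafka call).
retention_config = {
    "retention.ms": 7 * 24 * 60 * 60 * 1000,  # keep data for 7 days
    "retention.bytes": -1,                    # -1 = no size-based limit
}
print(retention_config["retention.ms"])  # 604800000
```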

  4. Clean-up policy

Disable features like automatic topic creation. You can also add policies such as automatic topic deletion if no data has been seen for the last 30 days.
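The first of these maps to the broker setting `auto.create.topics.enable`; clean-up behaviour is governed by the topic setting `cleanup.policy`. A sketch as an illustrative dict (the 30-day deletion policy mentioned above is custom tooling, not a stock setting):

```python
# Illustrative broker/topic settings for the practices above.
settings = {
    "auto.create.topics.enable": False,  # broker: no topics created implicitly
    "cleanup.policy": "delete",          # topic: drop old segments
                                         # ("compact" keeps the latest value per key)
}
```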

  5. Compression

Compression and decompression cost extra CPU cycles at both the producer and consumer ends.
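Compression is enabled on the producer via the `compression.type` setting; shown here as an illustrative dict rather than a live producer configuration:

```python
# Illustrative producer-side setting; the broker stores compressed batches
# and consumers decompress them on read.
producer_config = {
    "compression.type": "lz4",  # one of: none, gzip, snappy, lz4, zstd
}
```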

6. KIP-500 (Kafka Improvement Proposal)

  1. Still on the road to completion.
  2. Available in early access from Kafka version 2.8.0.
  3. One of the Kafka servers acts as the metadata management house, taking over the role of ZooKeeper.
  4. This removes the overhead of maintaining the same deployment in two places: the Kafka cluster and ZooKeeper.


7. How can you become more familiar with Kafka?

Read about Kafka via the official docs only.
You can contribute by solving issues using this JIRA dashboard.
Read and keep yourself up to date by following Kafka Summit, videos, and the awesome community.


Use Kadeck for free to maintain your data/topics and visualise them in a GUI.
