DEV Community

poojave


All about Kafka

This post gives a brief introduction to Kafka and is suitable for a beginner audience. Moving ahead, I will be sharing more knowledge on top of it.

What should you expect from this post?

Background - Why is Kafka needed?
What does it take to understand Kafka?
Downsides of using Kafka
How does Kafka work?
Best practices
How can you become more familiar with Kafka?

1. Background - Why is Kafka needed?

"Realtime" "Ordering" "Persistence" "Scalability" "Distributed" are the core requirement use cases of Kafka.

A few examples of such use cases:

  • Financial transaction in stock exchanges
  • Building logging/analytics systems
  • Chat application
  • Flash sales

2. What does it take to understand Kafka?

Kafka was first developed at LinkedIn and later donated to Apache as an open-source project.
How did we operate before Kafka? We somehow used to manage with messaging queues only.
How are messaging queues different from Kafka?

  • No flexibility in the data retention period or persistence: a queue message is typically gone once consumed, whereas Kafka retains messages for a configurable period.
  • Kafka clients can read messages at their own convenience, unlike with a queue.
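The contrast can be sketched in a few lines of plain Python (a toy in-memory model, not real Kafka): an append-only log where each consumer keeps its own offset, so two clients can read the same data independently instead of competing for messages.

```python
# Toy sketch of Kafka's log-with-offsets model (not the real client API).

class Log:
    """Append-only log: messages are retained, never removed on read."""
    def __init__(self):
        self.messages = []

    def append(self, msg):
        self.messages.append(msg)

class Consumer:
    """Each consumer tracks its own read position (offset) in the log."""
    def __init__(self, log):
        self.log = log
        self.offset = 0

    def poll(self):
        batch = self.log.messages[self.offset:]
        self.offset = len(self.log.messages)
        return batch

log = Log()
for m in ["order-1", "order-2", "order-3"]:
    log.append(m)

fast, slow = Consumer(log), Consumer(log)
print(fast.poll())   # ['order-1', 'order-2', 'order-3']
slow.offset = 1      # a late client can start reading wherever it likes
print(slow.poll())   # ['order-2', 'order-3'] -- same data, read independently
```

In a traditional queue, the first consumer's read would have removed the messages; here both consumers see the full retained log at their own pace.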

This interface helps in understanding Kafka.

Basic demo
Data on a local setup
Use this as a reference

3. Downsides of using Kafka

  • Difficult to manage in production.
  • Difficult to manage during migration from on-premise to the cloud, or from one cloud provider to another.
  • In some cases you need experts such as Kafka developers or database developers. Hence, choose Kafka wisely, only when you see a specific need for it and the proper scale.

4. How does Kafka work?

Kafka comes in two flavours:

  1. [FREE] Self-managed - Apache Kafka
  2. [PAID] Fully managed - Confluent Kafka


Here Consumer Group 1 has two consumers, A and B, and the topic's partitions are distributed between them.
Consumer Group 2 has only one consumer, which subscribes to all the partitions.
To make Kafka highly available, the partitions are distributed across brokers and replicated copies of the data are maintained. This brings in the leader and follower concept, which essentially means all updates are first received by the leader and then propagated to the followers.
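The consumer-group behaviour above can be sketched in plain Python (a simplified round-robin assignment, not Kafka's actual assignor implementation): each partition goes to exactly one consumer within a group, so a two-consumer group splits the partitions while a one-consumer group takes them all.

```python
# Toy round-robin partition assignment within a consumer group
# (a simplification of what Kafka's group coordinator does).

def assign(partitions, consumers):
    """Give each partition to exactly one consumer in the group."""
    assignment = {c: [] for c in consumers}
    for i, p in enumerate(partitions):
        assignment[consumers[i % len(consumers)]].append(p)
    return assignment

partitions = [0, 1, 2, 3]
print(assign(partitions, ["A", "B"]))  # {'A': [0, 2], 'B': [1, 3]}
print(assign(partitions, ["C"]))       # {'C': [0, 1, 2, 3]}
```

Note that within one group a partition is never shared, which is how Kafka preserves per-partition ordering for each group.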

If the leader broker goes down, a new leader is elected from among the follower brokers. Please note that this is one of the responsibilities of ZooKeeper.

Kafka Simulation

5. Best practices

  1. Replication factor
  2. Partition count

A higher replication factor and a huge partition count will demand more CPU and memory. Be conscious when choosing them.
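As a rough illustration in plain Python (the topic name and values here are made up, and in practice these settings are passed to the `kafka-topics` CLI or an admin client), the two knobs multiply into the total number of partition replicas the cluster must host:

```python
# Illustrative topic settings only, not a real Kafka API call.
topic = {
    "name": "orders",          # hypothetical topic name
    "partitions": 6,           # more partitions => more parallelism, more overhead
    "replication_factor": 3,   # copies of each partition across brokers
}

# Every partition is stored replication_factor times across the cluster:
replicas = topic["partitions"] * topic["replication_factor"]
print(replicas)  # 18
```

Each of those 18 replicas costs disk, memory, and replication traffic on some broker, which is why both numbers should be chosen deliberately.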

  3. Retention period

The retention period can vary from minutes to hours to days, and in some cases it can even be infinite. But longer retention consumes extra disk on the brokers.
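For reference, retention is controlled by topic-level settings such as `retention.ms` (time-based) and `retention.bytes` (size-based). A minimal sketch, expressed as a plain Python dict purely for illustration:

```python
# Illustrative topic-level retention settings (not a live Kafka call).
retention_config = {
    "retention.ms": 7 * 24 * 60 * 60 * 1000,  # keep data for 7 days
    "retention.bytes": -1,                    # -1 = no size-based limit
}
print(retention_config["retention.ms"])  # 604800000
```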

  4. Clean-up policy

Disable features like automatic topic creation. You can also add policies such as automatic topic deletion if no data has been seen for the last 30 days.
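The first of these maps to the broker setting `auto.create.topics.enable`; clean-up behaviour is governed by the topic setting `cleanup.policy`. A sketch as an illustrative dict (the 30-day deletion policy mentioned above is custom tooling, not a stock setting):

```python
# Illustrative broker/topic settings for the practices above.
settings = {
    "auto.create.topics.enable": False,  # broker: no topics created implicitly
    "cleanup.policy": "delete",          # topic: drop old segments
                                         # ("compact" keeps the latest value per key)
}
```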

  5. Compression

Compression and decompression cost extra CPU cycles at both the producer and consumer ends.
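Compression is enabled on the producer via the `compression.type` setting; shown here as an illustrative dict rather than a live producer configuration:

```python
# Illustrative producer-side setting; the broker stores compressed batches
# and consumers decompress them on read.
producer_config = {
    "compression.type": "lz4",  # one of: none, gzip, snappy, lz4, zstd
}
```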

6. KIP-500 (Kafka Improvement Proposal)

  1. Still on the road to completion.
  2. Available in early access from Kafka version 2.8.0.
  3. One of the Kafka servers acts as the metadata management house, taking over the role of ZooKeeper.
  4. This removes the overhead of maintaining the same deployment in two places: the Kafka cluster and ZooKeeper.


7. How can you become more familiar with Kafka?

Read about Kafka via the official docs only.
You can contribute by solving issues using this JIRA dashboard.
Read and keep yourself up to date by following Kafka Summit, videos, and the awesome community.


Use Kadeck for free to maintain your data/topics and visualise them in a GUI.
