Apache Kafka Deep Dive: Core Concepts, Data Engineering Applications, and Real-World Production Practices

Welcome to the world of data streaming: the world of Kafka. Set aside your previous data storage knowledge and prepare to "subscribe" to a whole new line of thought.

What is Kafka?

Apache Kafka was originally created at LinkedIn and was later donated to the Apache Software Foundation under an open-source license. This means you can take Kafka and modify it however you like to suit your needs.

It is an open-source data streaming platform that uses the publish-subscribe model to decouple applications and reduce the dependencies between them.
It does this by keeping durable logs of the events exchanged between the microservices in an application.

Scenario

Imagine you are the owner of a business whose application is split into microservices that call each other directly: orders, payments, analytics, and so on.

Everything might seem fine at first, but what happens when one node in the system fails?
Suppose, for instance, the payment microservice goes down. Your clients could be left waiting on a loading screen after placing an order, for a payment confirmation that will never come.
Worse still, that order might still be logged by the analytics service, corrupting your data.

This is where Kafka shines.
It places itself between these microservices and acts as a middleman, which:

  • Eliminates the single points of failure
  • Improves recoverability

Kafka core concepts

There are a few concepts you may need to wrap your head around when dealing with Kafka.
Publish-subscribe model - a messaging architectural pattern in which applications that generate data (publishers) send messages to an intermediary called a message broker, without knowing who will receive them.

  • This leads to the decoupling advantage above
  • It also brings about interoperability, as systems only need to talk to Kafka instead of building custom integrations with each other

Event-first architecture - This way of thinking shifts the focus from requesting data or calling services to reacting to facts and business events as they occur.

Kafka Architectural elements

OK, now that that's out of the way, let's peer into the inner workings of Kafka by briefly describing its building blocks.

  • Record - This, also known as an event, is the actual message that is produced by the publishing microservice
  • Producer - This is the publishing microservice that creates and sends the message to Kafka
  • Consumer - This is the subscribing microservice that listens for and receives messages that come from Kafka
  • Topic - This is the Kafka equivalent of a database table. It is an append-only, immutable log of events that producers publish to and consumers subscribe to. For example, you can have an orders topic for the orders microservice
  • Broker - This is the actual server on which Kafka runs
  • Kafka cluster - Kafka usually runs across multiple servers (nodes/brokers) working together. These servers make up the Kafka cluster
  • Partitioning - This is the logical subdivision of a topic into partitions that can be spread across the various nodes in the Kafka cluster
  • Replication - Partitions can be copied across multiple nodes to create replicas that can act as backups in case one node fails
  • Zookeeper - Partitioning and replication can be tricky business, especially when it comes to issues of data consistency. Zookeeper is an external resource that solves this problem, handling the coordination and synchronization of Kafka.
  • KRaft - This is a newer internal component of Kafka, production-ready as of Kafka 3.3, which takes over Zookeeper's functions and eliminates Kafka's dependence on it.
  • Consumer Group - Suppose your system replicates microservices to improve scalability and availability. Production is straightforward: each producer replica simply sends its records to the topic. But if each consumer replica subscribes to that topic independently, every replica receives every message, so the same record gets processed multiple times. Consumer groups solve this problem. Multiple consumer instances that belong to the same logical application or service are configured with the same group ID, which identifies them as part of the same consumer group. Kafka can then coordinate consumption so that the messages in a topic are divided across the group and processed in parallel by separate consumer replicas (see the console sketch after this list).
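As a rough console sketch of the idea (the group name order-processors is just a placeholder), you can start two consumers in separate terminals with the same group ID and let Kafka coordinate them:

# Terminal 1 and Terminal 2: two replicas in the same consumer group
bin/kafka-console-consumer.sh --topic test-topic \
    --bootstrap-server localhost:9092 \
    --group order-processors

If the topic has several partitions, Kafka assigns them across the two consumers so records are processed in parallel; with a single partition, only one group member receives messages at a time.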

Set up

Awesome, now that we're all caught up, let's get hands-on by installing and running an instance of Kafka in our terminal

Note
The Kafka console tools should not be used in a production environment unless it is absolutely necessary. In short: AVOID them in production.

But since we're running a simple single-partition Kafka instance on our PC, it should be fine. Follow the steps below.

Install Java

sudo apt install default-jre

Confirm Java installation

java --version

Make sure to use Java 11 or 17

The commands below download Kafka, extract it, then rename the extracted folder to the more readable name kafka, which will act as Kafka's home directory. Finally, move into the kafka directory.

wget https://archive.apache.org/dist/kafka/3.3.1/kafka_2.13-3.3.1.tgz
tar -xzvf kafka_2.13-3.3.1.tgz 
mv kafka_2.13-3.3.1/ kafka/
cd kafka/

Start up ZooKeeper

bin/zookeeper-server-start.sh config/zookeeper.properties

Start up Kafka itself (i.e., the Kafka server) in a separate terminal

bin/kafka-server-start.sh config/server.properties

Now, let's create a test topic in Kafka

bin/kafka-topics.sh --create \
    --topic test-topic \
    --bootstrap-server localhost:9092 \
    --partitions 1 \
    --replication-factor 1

Create a producer

bin/kafka-console-producer.sh --topic test-topic --bootstrap-server localhost:9092

This will open an interactive console where you can type messages into your newly created test topic.
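Each line you type after the > prompt becomes one record on the topic. For example (the messages below are made up):

> hello kafka
> {"orderId": 101, "status": "PLACED"}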

Create a consumer in a different terminal

bin/kafka-console-consumer.sh --topic test-topic --from-beginning --bootstrap-server localhost:9092

Now, when you type anything in the producer's interactive terminal, it will show up on the consumer's side. You are now streaming data with Kafka!
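If you want to double-check how the topic is configured, an optional extra step (against the same local broker) is to describe it:

bin/kafka-topics.sh --describe \
    --topic test-topic \
    --bootstrap-server localhost:9092

This prints the topic's partition count, replication factor, and which broker leads each partition.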

Use cases

Kafka console - (special case)
Consumers and producers are usually applications or microservices; however, the Kafka console allows the developer to be the producer.

This is not ideal in production because any wrong or misspelled record made by a developer or producer becomes an irreversible record in that topic.
Furthermore, if multiple microservices subscribe to this topic, a bad record can trigger an unwanted, potentially catastrophic chain of events.

Nevertheless, it's still a useful feature in some scenarios

Testing
You may need to confirm whether the Kafka service you depend on is working as expected, just as we did in the setup section; the console tools make this quick (see the sketch after this list).
This applies when:

  • You have deployed a new cluster and want to try it out
  • You are debugging an issue with an existing cluster.
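A minimal smoke test, assuming the broker is reachable at localhost:9092 and a test-topic exists, could be as simple as listing the topics and reading back a handful of records:

bin/kafka-topics.sh --list --bootstrap-server localhost:9092

bin/kafka-console-consumer.sh --topic test-topic \
    --from-beginning --max-messages 5 \
    --bootstrap-server localhost:9092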

Backfilling data
Suppose your orders microservice crashed, and a few orders were placed before they could be pushed to the topic. Not to worry: if you keep a backup saved in a CSV file for just such a failure, you can simply run

bin/kafka-console-producer.sh \
    --topic example-topic \
    --bootstrap-server localhost:9092 \
    < your_prepared_backorders.csv

This works provided the records in the file align with the schema the topic's consumers expect.
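For illustration, the console producer sends each line of the file as one record, so your_prepared_backorders.csv should contain only data rows in whatever format your consumers expect, for example (made-up values):

7001,keyboard,45.99
7002,monitor,199.00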

Real-world use cases

Netflix
When you think about Netflix, you think about instant access to your favorite movie or series. But what happens behind the scenes when you hit that “Play” button?
Netflix uses Kafka to handle real-time monitoring and event processing across its entire platform.

Every time you play, pause, fast-forward, or even hover over a title, an event is generated.

Kafka acts as the middleman, transporting billions of these events per day into different services:

  • Recommendations engine – to suggest what you should watch next

  • Quality of Service (QoS) monitoring – to make sure the video resolution adjusts smoothly to your network

  • Operational alerts – so engineers can act if something breaks in the delivery pipeline

Without Kafka, the massive amount of real-time events would overwhelm individual services. By centralizing these events, Netflix achieves scalability, resilience, and real-time personalization.

Uber
Uber is not just a ride-hailing app. It’s a real-time logistics platform moving people, food, and even packages around cities worldwide.

Here’s how Kafka fits in:

Every trip generates a constant stream of GPS events from both driver and rider apps.

Kafka ingests and streams these events in real-time to different services:

  • Matching service – to connect you with the nearest driver

  • ETA calculation – to update arrival times dynamically as traffic changes

  • Surge pricing – to adjust fares instantly during high demand

  • Fraud detection – to flag suspicious activity as it happens

Kafka enables Uber to handle millions of concurrent, low-latency events across geographies, ensuring rides, deliveries, and payments work seamlessly without bottlenecks.

Production Best Practices

Running Kafka on your laptop is fun for demos, but in production the story is very different. Major companies have learned (sometimes the hard way) that to keep Kafka reliable at scale, you need to follow certain best practices.

Partitioning and replication
In production, companies run clusters of multiple brokers spread across different machines or even data centers.

Topics are partitioned for horizontal scalability and replicated (usually with a replication factor of 3) for fault tolerance.

This way, even if one broker crashes, the cluster can continue without data loss.
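As a sketch (the topic name, broker address, and counts here are placeholders), a production-style topic with six partitions and a replication factor of 3 would be created like this:

bin/kafka-topics.sh --create \
    --topic orders \
    --bootstrap-server broker1:9092 \
    --partitions 6 \
    --replication-factor 3

With three copies of every partition, any single broker can go down and the remaining replicas keep serving producers and consumers.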

Retention policy and capacity
Topics can grow endlessly in production, which is bad when your machines have finite disk space. Therefore, it is good to have a sort of garbage-collection mechanism ready.
In production, teams set retention policies (e.g., 7 days or 30 days) to automatically delete old messages.

They tune log compaction to retain only the latest value for each key when needed.

Storage, disk throughput, and network bandwidth are carefully planned before scaling up.
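As an illustration (the topic names and values are examples only), both policies can be applied to existing topics with kafka-configs.sh:

# Keep records on the orders topic for 7 days (604800000 ms)
bin/kafka-configs.sh --bootstrap-server localhost:9092 \
    --entity-type topics --entity-name orders \
    --alter --add-config retention.ms=604800000

# Keep only the latest value per key on a changelog-style topic
bin/kafka-configs.sh --bootstrap-server localhost:9092 \
    --entity-type topics --entity-name customer-profiles \
    --alter --add-config cleanup.policy=compact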

References

Kafka Tutorial for Beginners | Everything you need to get started, TechWorld with Nana
Kafka Tutorial - Multi-chapter guide, Redpanda
Featuring Apache Kafka in the Netflix Studio and Finance World, Confluent
Kafka retention - Types, challenges, alternatives, Redpanda
