Hands-On & Under the Hood
In Part 1, we established that Kafka is the high-speed highway for data, handling real-time streams with high throughput. We covered the basic anatomy: Brokers, Topics, and Partitions.
Now, it’s time to get our hands dirty. In this part, we will spin up a local Kafka cluster, write code to produce and consume events, and—crucially—dive deeper into how Consumers actually track their progress using Offsets.
1. The Setup: Kafka via Docker Compose
Setting up Kafka manually can be complex (ZooKeeper, multiple brokers, etc.). To keep things clean, we’ll use Docker Compose. This allows us to spin up a Broker and a UI tool in a single command.
We will use the apache/kafka image, provectus/kafka-ui for monitoring, the Node.js runtime, and the kafkajs client library.
- Download the docker-compose file onto your system.
- Run the Docker Compose command:

docker compose up -d

That's it! We now have a running Kafka broker, ready to churn out events 🙌
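Curious what that compose file roughly looks like? Here is a minimal, illustrative sketch of a single-broker KRaft setup with Kafka UI attached. The actual compose file shared with this series is the source of truth; the ports (9094 for clients, 9012 for the UI) and image tags below are assumptions chosen to match the code samples later in this article.

```yaml
# Illustrative sketch only - use the compose file provided with this series.
services:
  kafka:
    image: apache/kafka:latest
    ports:
      - "9094:9094" # client port assumed by the code samples below
    environment:
      KAFKA_NODE_ID: 1
      KAFKA_PROCESS_ROLES: broker,controller
      KAFKA_CONTROLLER_QUORUM_VOTERS: 1@kafka:9093
      KAFKA_LISTENERS: PLAINTEXT://:9092,CONTROLLER://:9093,EXTERNAL://:9094
      KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://kafka:9092,EXTERNAL://localhost:9094
      KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: PLAINTEXT:PLAINTEXT,CONTROLLER:PLAINTEXT,EXTERNAL:PLAINTEXT
      KAFKA_CONTROLLER_LISTENER_NAMES: CONTROLLER
      KAFKA_INTER_BROKER_LISTENER_NAME: PLAINTEXT
      KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 1

  kafka-ui:
    image: provectuslabs/kafka-ui:latest
    ports:
      - "9012:8080" # UI port assumed; adjust to your setup
    environment:
      KAFKA_CLUSTERS_0_NAME: local
      KAFKA_CLUSTERS_0_BOOTSTRAPSERVERS: kafka:9092
    depends_on:
      - kafka
```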
2. Managing Topics (The Admin API)
Before we send data, we need a destination. While you can let Kafka auto-create topics, defining them via the Admin API gives you control over two critical factors:
- Partitions: How much parallelism do you need?
- Replication Factor: How many copies of the data do you want for fault tolerance?
import { Kafka } from "kafkajs";

const kafka = new Kafka({
  clientId: "my-app",
  brokers: ["localhost:9094"],
});

const createTopicIfNotExists = async () => {
  const admin = kafka.admin();
  const topicName = "Test-topic";

  try {
    await admin.connect();

    // 1. Get the list of existing topics
    const topics = await admin.listTopics();
    console.log("Existing topics:", topics);

    // 2. Check if the specific topic exists
    if (!topics.includes(topicName)) {
      console.log(`Topic "${topicName}" not found. Creating...`);

      // 3. Create the topic
      await admin.createTopics({
        topics: [
          {
            topic: topicName,
            numPartitions: 1, // Adjust based on your needs
            replicationFactor: 1, // Adjust based on your broker count
          },
        ],
      });

      console.log(`Topic "${topicName}" created successfully.`);
    } else {
      console.log(`Topic "${topicName}" already exists.`);
    }
  } catch (error) {
    console.error("Error in admin operation:", error);
  } finally {
    // 4. Always disconnect
    await admin.disconnect();
  }
};

createTopicIfNotExists();
3. The Producer: Keys vs. No Keys
Now, let's write some events. In Kafka, how you send the message determines where it lands.
Scenario A: Sending without a Key
If you send a message with key=null, the producer creates a "Round Robin" effect. It distributes messages evenly across all available partitions. This is great for load balancing but implies no guarantee of order relative to other messages.
import { Kafka } from "kafkajs";

const kafka = new Kafka({
  clientId: "my-app",
  brokers: ["localhost:9094"],
});

const producer = kafka.producer();

async function produceWithoutKey() {
  await producer.connect();

  await producer.send({
    topic: "Test-topic",
    messages: [
      {
        // No key: the partitioner spreads messages across partitions
        value: JSON.stringify({
          message: "Hello KafkaJS user!",
        }),
      },
    ],
  });

  await producer.disconnect();
}

produceWithoutKey(); // this will produce messages without a key
Scenario B: Sending with a Key
If you provide a Key (e.g., user_id or transaction_id), Kafka guarantees that all messages with the same key go to the same partition.
import { Kafka } from "kafkajs";

const kafka = new Kafka({
  clientId: "my-app",
  brokers: ["localhost:9094"],
});

const producer = kafka.producer();

async function produceWithKey() {
  await producer.connect();

  await producer.send({
    topic: "Test-topic",
    messages: [
      {
        // In practice, use a stable identifier (e.g. a user_id) so related
        // events always hash to the same partition; the random key here is
        // only for demo purposes.
        key: Math.random().toString(36).substring(2, 15),
        value: JSON.stringify({
          message: "Hello KafkaJS user!",
        }),
      },
    ],
  });

  await producer.disconnect();
}

produceWithKey(); // this will produce messages with a key
Why does this matter? If you are processing payment updates for User A, you need "Payment Initiated" to arrive before "Payment Completed." Sending both with key=User A ensures they land in the same partition and are read in order.
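To make this concrete, here is a small sketch that reuses the producer from Scenario B and sends both payment events with the same key in a single send call; the user-A key and the event payloads are purely illustrative.

```javascript
// Both events carry the same key, so they hash to the same partition
// and will be read back in exactly this order.
const userId = "user-A"; // illustrative stable identifier

await producer.connect();
await producer.send({
  topic: "Test-topic",
  messages: [
    { key: userId, value: JSON.stringify({ event: "Payment Initiated" }) },
    { key: userId, value: JSON.stringify({ event: "Payment Completed" }) },
  ],
});
await producer.disconnect();
```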
TIP 💡: Check out the script multiple-events.zsh; it generates events repeatedly to simulate high-volume data, giving you a feel for the scale Kafka is designed to handle.
4. The Consumer
Consuming messages is where the logic gets interesting. This isn't just about reading data; it's about tracking state. The groupId is an important parameter: it places the consumer in a consumer group, which lets multiple consumers share a topic's partitions in parallel, drives rebalancing when consumers join or leave, and ties offset tracking to the group. Typically, consumers run in a separate microservice so they can process events independently from the producer or other services.
import { Kafka } from "kafkajs";

const kafka = new Kafka({
  clientId: "my-app",
  brokers: ["localhost:9094"],
});

const consumer = kafka.consumer({ groupId: "Test-group" });

await consumer.connect();
await consumer.subscribe({ topic: "Test-topic" });

await consumer.run({
  eachMessage: async ({ topic, partition, message }) => {
    // processing as per need
    console.log({
      partition,
      offset: message.offset,
      value: JSON.parse(message.value.toString()),
      message: message,
      topic,
    });
  },
});
Understanding Offsets
As a consumer reads from a partition, it needs to keep track of its place. We call this the Offset.
TIP 💡: Consider the Offset as a bookmark while reading a book. It tells you exactly where you stopped reading so you can pick up from there next time (or after a crash).
There are three specific types of offsets you should know:
- Log-End Offset: The offset of the very last message written to the partition (the newest data).
- High-Watermark Offset: The highest offset that has been replicated to all in-sync replicas; consumers can only read messages up to this point.
- Committed Offset: The last offset the consumer successfully processed and reported done.
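To make the committed offset concrete, here is a sketch of committing manually with kafkajs. It is a variant of the consumer.run call from the script above, with auto-commit turned off; handleMessage is a hypothetical processing function.

```javascript
await consumer.run({
  autoCommit: false, // we report "done" ourselves
  eachMessage: async ({ topic, partition, message }) => {
    await handleMessage(message); // hypothetical processing step

    // The committed offset is the *next* offset to read, hence +1
    await consumer.commitOffsets([
      { topic, partition, offset: (Number(message.offset) + 1).toString() },
    ]);
  },
});
```

With the default auto-commit left on, kafkajs does this bookkeeping for you at a regular interval.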
Consumer Lag
One of the most important metrics to watch is Consumer Lag. This is essentially the distance between the writer (Producer) and the reader (Consumer).
Consumer Lag = LogEndOffset - CommittedOffset
If your lag is high, your consumer is falling behind the producer 🚨
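You can also measure this lag programmatically. Below is a rough sketch using the kafkajs Admin API, reusing the kafka instance from earlier; it assumes kafkajs v2, where fetchOffsets accepts a topics array.

```javascript
const admin = kafka.admin();
await admin.connect();

// Log-end offsets per partition for the topic
const topicOffsets = await admin.fetchTopicOffsets("Test-topic");

// Committed offsets for our consumer group
const [groupOffsets] = await admin.fetchOffsets({
  groupId: "Test-group",
  topics: ["Test-topic"],
});

for (const { partition, offset: logEnd } of topicOffsets) {
  const entry = groupOffsets.partitions.find((p) => p.partition === partition);
  // An offset of -1 means the group has not committed anything for this partition yet
  const committed = entry && Number(entry.offset) >= 0 ? Number(entry.offset) : 0;
  console.log(`Partition ${partition} lag: ${Number(logEnd) - committed}`);
}

await admin.disconnect();
```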
The Consumer Group Logic (Scaling)
When you start a consumer, you usually assign it to a Group. Kafka automatically balances the partitions among the consumers in that group.
Here is the relationship between Consumers n(C) and Partitions n(P):
- n(C) < n(P):
  - Result: Some consumers will read from multiple partitions.
  - Status: Heavy load on individual consumers.
- n(C) = n(P):
  - Result: The ideal state. Each consumer handles exactly one partition.
  - Status: Balanced.
- n(C) > n(P):
  - Result: Since a partition cannot be split, the extra consumers will have no work.
  - Status: Idle (useful only as failover backups).
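To watch this balancing happen, you could run the sketch below in two terminals. With the single-partition topic we created earlier, the second instance will simply sit idle (the n(C) > n(P) case); recreate the topic with more partitions to see the work split. The GROUP_JOIN instrumentation event logs which partitions each instance receives.

```javascript
import { Kafka } from "kafkajs";

const kafka = new Kafka({ clientId: "my-app", brokers: ["localhost:9094"] });
const consumer = kafka.consumer({ groupId: "Test-group" });

// Fires every time the group rebalances, e.g. when a second instance starts
consumer.on(consumer.events.GROUP_JOIN, ({ payload }) => {
  console.log("Partitions assigned to this instance:", payload.memberAssignment);
});

await consumer.connect();
await consumer.subscribe({ topic: "Test-topic" });

await consumer.run({
  eachMessage: async ({ partition, message }) => {
    console.log(`partition ${partition} -> offset ${message.offset}`);
  },
});
```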
5. Monitoring with Kafka UI
Finally, we can visualize everything we just built. By opening localhost:9012 (or your configured port), we can see the Kafka UI.
What to look for:
- Topics: Verify your topic exists with the correct partition count.
- Messages: Here we can see each message with its assigned partition and offset. How partitions are assigned is out of scope for now; we'll pick it up in a future part.
- Consumers: Check your Consumer Lag. If you see the lag growing, it means your consumer script can't keep up with the producer!
That's it for today. In the next part of this ongoing series, we'll go in depth by applying Kafka to some real projects.
See ya 👋, keep learning!
Follow me
X (Twitter): https://x.com/SuyashLade
LinkedIn: https://www.linkedin.com/in/suyash-lade/
Appendix


