Byron Hsieh

Understanding Kafka Producer: From Basics to Sticky Partitioner

Introduction

When I started learning Apache Kafka, one of the first questions I had was: "How exactly does a Producer work?" The documentation was thorough, but I wanted to understand it through practical examples.

In this article, I'll walk you through what I learned about Kafka Producers, from sending your first message to understanding the mysterious "Sticky Partitioner." We'll build two simple Java programs and explore some interesting behaviors along the way.

This guide is based on the excellent course "Apache Kafka Series - Learn Apache Kafka for Beginners v3".


Part 1: My First Kafka Producer

Let's start with the simplest possible Kafka Producer that sends a single message.

Setting Up Producer Properties

Every Kafka Producer needs configuration. We use Java's Properties class for this:

Properties properties = new Properties();
properties.setProperty("bootstrap.servers", "127.0.0.1:9092");
properties.setProperty("key.serializer", StringSerializer.class.getName());
properties.setProperty("value.serializer", StringSerializer.class.getName());

What's happening here?

  • bootstrap.servers: The address of your Kafka broker(s)
  • key.serializer: Converts your key object into bytes for transmission
  • value.serializer: Converts your value object into bytes

Note: java.util.Properties is similar to Python's dict, but it only stores string key-value pairs. It's part of the Java standard library, not specific to Kafka.

Creating the Producer

With properties configured, we can create our producer:

KafkaProducer<String, String> producer = new KafkaProducer<>(properties);

The generic types <String, String> represent the types for key and value respectively.

Creating and Sending a Message

ProducerRecord<String, String> producerRecord =
    new ProducerRecord<>("demo_java", "hello world");

producer.send(producerRecord);

Here, "demo_java" is the topic name, and "hello world" is our message. Notice we didn't specify a key - it defaults to null.
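For comparison, a key can be supplied through the three-argument constructor. This is a sketch only (the key "user-1" is a hypothetical example, not from the original program):

```java
// Topic, key, value - records with the same key always land on the same partition
ProducerRecord<String, String> keyedRecord =
    new ProducerRecord<>("demo_java", "user-1", "hello world");

producer.send(keyedRecord);
```

We'll see in Part 3 how the presence or absence of a key changes partitioning behavior.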

The Critical Trio: send(), flush(), close()

This is where things get interesting. Here's what each method does:

send() - Asynchronous Operation

producer.send(producerRecord);
  • Asynchronous: The message goes into a buffer, it's NOT sent immediately
  • If your program exits right after this, the message might never reach Kafka

flush() - Synchronous Operation

producer.flush();
  • Synchronous: Forces all buffered messages to be sent and blocks until complete
  • Useful for learning/demos to ensure messages are sent
  • Rarely used in production because it impacts performance

close() - Cleanup

producer.close();
  • Shuts down the Producer and releases resources
  • Internally calls flush() to ensure all messages are sent
  • MUST be called in production to prevent resource leaks

Here's the relationship:

send()          flush()         close()
  ↓               ↓               ↓
[Buffer] -----> [Send] -----> [Clean up]
(async)        (sync)         (includes flush)

Best Practice:

  • Always call close() in production
  • Avoid calling flush() unless absolutely necessary
  • Let Kafka handle batching automatically for better performance
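Putting the trio together: since KafkaProducer implements Closeable, a try-with-resources block guarantees close() runs even if an exception is thrown, which also flushes any buffered messages. A minimal sketch, assuming a local broker on port 9092:

```java
Properties properties = new Properties();
properties.setProperty("bootstrap.servers", "127.0.0.1:9092");
properties.setProperty("key.serializer", StringSerializer.class.getName());
properties.setProperty("value.serializer", StringSerializer.class.getName());

// try-with-resources calls close() automatically on exit,
// and close() flushes - no explicit flush() needed
try (KafkaProducer<String, String> producer = new KafkaProducer<>(properties)) {
    producer.send(new ProducerRecord<>("demo_java", "hello world"));
}
```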

Part 2: Producer with Callbacks

Now let's level up and add callbacks to track message metadata.

Why Use Callbacks?

Callbacks let you know when a message is successfully sent (or if it failed):

producer.send(producerRecord, new Callback() {
    @Override
    public void onCompletion(RecordMetadata metadata, Exception e) {
        if (e == null) {
            // Success!
            log.info("Received new metadata \n" +
                    "Topic:\t" + metadata.topic() + "\n" +
                    "Partition:\t" + metadata.partition() + "\n" +
                    "Offset:\t" + metadata.offset() + "\n" +
                    "Timestamp:\t" + metadata.timestamp());
        } else {
            // Something went wrong
            log.error("Error while producing", e);
        }
    }
});

What metadata can you get?

  • topic: Which topic the message was sent to
  • partition: Which partition number within that topic
  • offset: The position of this message in the partition
  • timestamp: When the message was created
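Since Callback declares a single method, onCompletion, the same code can be written more compactly as a lambda:

```java
producer.send(producerRecord, (metadata, e) -> {
    if (e == null) {
        log.info("Topic: " + metadata.topic() +
                 ", Partition: " + metadata.partition() +
                 ", Offset: " + metadata.offset());
    } else {
        log.error("Error while producing", e);
    }
});
```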

Part 3: Understanding the Sticky Partitioner

This is where it gets really interesting. When I ran my producer sending 100 messages to a topic with 3 partitions, I noticed something odd: all messages went to partition 0.

Was my code broken? Not quite. I needed to understand the Sticky Partitioner.

What is the Sticky Partitioner?

Introduced in Kafka 2.4, sticky partitioning is the default behavior for messages that don't have a key (originally via the DefaultPartitioner's sticky logic, with a standalone UniformStickyPartitioner class also available; since Kafka 3.0 the behavior is built into the producer itself).

How it works:

  • Messages "stick" to one partition until a batch is full
  • When the batch is sent, switch to a different partition
  • Goal: Improve performance by reducing network requests

When Does a Batch Get Sent?

A batch is sent when ANY of these conditions is met:

  1. batch.size - Batch reaches configured size (in bytes)
  2. linger.ms - Time limit reached (in milliseconds)
  3. flush() - Manually forced

Common Misconception: batch.size is NOT time!

I initially thought batch.size was related to time. It's not!

  • batch.size = Size in bytes (default: 16384 bytes or 16 KB)
  • linger.ms = Time in milliseconds (default: 0)
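Both settings can be tuned together. Using the ProducerConfig constants avoids typos in the property names (the values below are illustrative, not recommendations):

```java
// batch.size: maximum bytes per batch; linger.ms: how long to wait for a batch to fill
properties.setProperty(ProducerConfig.BATCH_SIZE_CONFIG, "32768"); // 32 KB
properties.setProperty(ProducerConfig.LINGER_MS_CONFIG, "20");     // wait up to 20 ms
```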

Demonstrating Sticky Partitioner Behavior

To observe partition switching, we need to make batches fill up quickly:

// Make batches smaller so they fill up faster
properties.setProperty("batch.size", "400");  // 400 bytes instead of 16KB

Why 400?

  • Each message payload is only ~13 bytes ("hello worldXX"), plus per-record overhead, so roughly 20 bytes each
  • 400 bytes ÷ ~20 bytes ≈ 20 messages per batch
  • When a batch fills up → it's sent → the producer switches to the next partition

The Loop with Delay

for (int i = 0; i < 100; i++) {
    if (i % 10 == 0) {
        Thread.sleep(1000);  // Pause every 10 messages
    }

    ProducerRecord<String, String> producerRecord =
        new ProducerRecord<>("demo_java", "hello world" + i);

    producer.send(producerRecord, callback);
}

Expected Behavior

With 3 partitions and batch.size=400:

  1. First ~20 messages → Partition 0
  2. Next ~20 messages → Partition 1
  3. Next ~20 messages → Partition 2
  4. Cycle continues...

You can observe this in the callback logs showing the partition number!
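The mechanics can be sketched without a broker at all. This is an illustrative simulation, not Kafka's actual code: it cycles partitions deterministically for clarity, whereas the real sticky partitioner picks the next partition semi-randomly.

```java
// Simulates sticky partitioning: messages accumulate in a batch, and when
// the batch would exceed batchSize bytes, it is "sent" and the producer
// moves on to the next partition.
public class StickySimulation {
    // Returns the partition assigned to each of numMessages messages.
    static int[] assignPartitions(int numMessages, int msgBytes,
                                  int batchSize, int numPartitions) {
        int[] assigned = new int[numMessages];
        int current = 0;     // partition the producer is currently "stuck" to
        int batchBytes = 0;  // bytes accumulated in the open batch
        for (int i = 0; i < numMessages; i++) {
            if (batchBytes + msgBytes > batchSize) {
                current = (current + 1) % numPartitions; // batch full: switch
                batchBytes = 0;
            }
            assigned[i] = current;
            batchBytes += msgBytes;
        }
        return assigned;
    }

    public static void main(String[] args) {
        // 100 messages of ~20 bytes, 400-byte batches, 3 partitions
        int[] p = assignPartitions(100, 20, 400, 3);
        System.out.println("msg 0  -> partition " + p[0]);
        System.out.println("msg 20 -> partition " + p[20]);
        System.out.println("msg 40 -> partition " + p[40]);
        System.out.println("msg 60 -> partition " + p[60]);
    }
}
```

Running this shows the switch point every ~20 messages, matching the expected behavior above.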


Key Configuration Parameters

Here's a quick reference table:

Property            Description                     Default   Unit
bootstrap.servers   Kafka broker address            None      -
key.serializer      Key serializer class            None      -
value.serializer    Value serializer class          None      -
batch.size          Max batch size                  16384     bytes
linger.ms           Max wait time before sending    0         milliseconds

Keyed vs Keyless Messages

The partitioning behavior changes based on whether you specify a key:

Scenario      Partitioner Behavior
With key      Messages with the same key go to the same partition (hash-based)
Without key   Uses the Sticky Partitioner (batch-based)
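The key-based route can be sketched in pure Java. Note the hedge: Kafka's DefaultPartitioner actually hashes the serialized key bytes with murmur2; the hash below is only an illustrative stand-in showing the "hash modulo partition count" idea.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class KeyedRouting {
    // Illustrative: hash the key bytes, map into the partition range.
    // Kafka uses murmur2 here, not Arrays.hashCode.
    static int partitionFor(String key, int numPartitions) {
        byte[] keyBytes = key.getBytes(StandardCharsets.UTF_8);
        int hash = Arrays.hashCode(keyBytes);
        return (hash & Integer.MAX_VALUE) % numPartitions; // non-negative result
    }

    public static void main(String[] args) {
        // Same key, same partition - every time
        System.out.println(partitionFor("user-42", 3) == partitionFor("user-42", 3));
    }
}
```

Whatever the hash function, the guarantee is the same: a given key always maps to the same partition (as long as the partition count doesn't change).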

Troubleshooting Common Issues

Error: Invalid value null for configuration value.serializer

Cause: Typo in the property name

  • Wrong: values.serializer (plural, with 's')
  • Correct: value.serializer (singular)

All Messages Going to Partition 0

Possible causes:

  1. Topic only has 1 partition
  2. batch.size is too large - all messages fit in one batch
  3. Wrong delay logic (i/10 instead of i%10)

Solution:
Check your topic configuration:

kafka-topics.sh --bootstrap-server localhost:9092 --describe --topic demo_java

Adjust batch size:

properties.setProperty("batch.size", "400");

Fix the delay logic:

if (i % 10 == 0) {  // Correct!
    Thread.sleep(1000);
}

Why Sticky Partitioner is Better Than Round-Robin

Old Way: Round-Robin (Pre-Kafka 2.4)

  • Each message goes to a different partition in sequence
  • Messages are spread thinly across all partitions, producing many small, half-empty batches and many more network requests
  • More overhead, more latency

New Way: Sticky Partitioner (Kafka 2.4+)

  • Messages stick to one partition until batch is full
  • 100 messages = maybe 5 network requests (assuming 20 messages per batch)
  • Fewer network calls = better throughput

When to Use flush()

Good use cases for flush():

  • Financial transaction systems (need immediate confirmation)
  • Critical logging where every message must be guaranteed

Avoid flush() in:

  • Regular log aggregation
  • High-throughput data pipelines
  • Real-time analytics (slight delay is acceptable)

Remember: Kafka's automatic batching is designed for performance. Only override it when you have a specific requirement!


Key Takeaways

  1. Properties configuration is the foundation - get it right
  2. send() is async - use flush() or close() to ensure delivery
  3. Callbacks provide metadata - use them to track message status
  4. Sticky Partitioner batches by partition - improves performance
  5. batch.size is in bytes, not time - linger.ms is the time setting
  6. Keyless messages use Sticky Partitioner
  7. Always call close() - prevents resource leaks

Conclusion

Understanding how Kafka Producers work is fundamental to building robust event-driven systems. The key insights I gained were:

  1. The asynchronous nature of send() and why close() is critical
  2. How callbacks provide visibility into message delivery
  3. The performance benefits of the Sticky Partitioner
  4. The importance of proper configuration (especially batch.size vs linger.ms)

These concepts form the foundation for more advanced Kafka patterns like transactions, idempotent producers, and exactly-once semantics.


This article is part of my learning journey through Apache Kafka. If you found it helpful, please give it a like and follow for more Kafka tutorials!

Course Reference: Apache Kafka Series - Learn Apache Kafka for Beginners v3
